Hardware Smoke Test - Conclusions¶
Experiment ID: SMOKE-HW Workstream: S (Shadows) Date: Nov 3, 2025
Key Findings¶
-
Hardware Integration Validated: QuartumSE successfully interfaces with IBM Quantum Runtime, completing end-to-end workflows (job submission, execution, retrieval, provenance capture).
-
Execution Performance Acceptable: 7.82 seconds for 100 shadows on 3-qubit GHZ meets runtime targets, with minimal queue wait (< 5 min) on ibm_fez.
-
Noise Impact Characterized: Hardware noise reduces ZZI observable from 1.0 (expected) to 0.54 (measured), a 46% degradation. ZIZ observable remains accurate (0.99), highlighting topology-dependent noise sensitivity.
-
Provenance System Works on Hardware: Full IBM calibration snapshot captured (T1, T2, gate/readout errors, timestamps), validating manifest generation under real-world conditions.
-
SSR Not Yet Demonstrated: 100-shadow budget insufficient to show shot efficiency gains (SSR ~1.0×). Extended validation with 500+ shadows + mitigation required.
Success Criteria Assessment¶
Primary Criteria¶
| Criterion | Target | Result | Status |
|---|---|---|---|
| Hardware Execution | Success | ✓ Completed on ibm_fez | ✅ PASS |
| Manifest Capture | Complete | ✓ Full provenance | ✅ PASS |
| Observable Estimation | 5 observables | ✓ Estimates with CIs | ✅ PASS |
| Runtime Compliance | < 10 min | 7.82s execution | ✅ PASS |
Secondary Criteria¶
| Criterion | Target | Result | Status |
|---|---|---|---|
| SSR Characterization | 1.1-2× | ~1.0× (inconclusive) | ⚠️ Need S-T01 |
| Noise Impact Analysis | Documented | ✓ 46% ZZI error | ✅ PASS |
| Calibration Metadata | Captured | ✓ T1/T2/errors | ✅ PASS |
| Queue Time Tracking | < 30 min | < 5 min | ✅ PASS |
OVERALL VERDICT: ✅ PASSED for hardware integration and operational validation. SSR target deferred to S-T01/S-T02 extended experiments.
Limitations and Caveats¶
Experimental Limitations¶
- Single Trial: One execution per backend (ibm_fez, ibm_torino), no statistical replication
- Small Shadow Budget: 100 shots insufficient for precise SSR measurement
- No Mitigation: v0 baseline only, no MEM/ZNE applied
- Limited Observables: 3-qubit GHZ Z/ZZ only, not comprehensive
- No Baseline Comparison: Direct measurement not run for true SSR computation
Hardware-Specific Caveats¶
- ibm_fez Specific: Results may not generalize to other IBM backends
- Calibration Drift: Experiments run < 1 hour after calibration (best-case scenario)
- Queue Variability: Low queue (77 jobs) not representative of typical conditions
- Qubit Selection: Used qubits 0-2 (high quality), may differ for other qubit regions
Interpretive Caveats¶
- Noise Attribution Uncertain: Cannot definitively separate gate vs. readout vs. decoherence errors
- SSR Estimate Unreliable: Based on single observable (ZZI) with large error, not statistically rigorous
- Topology Dependence: ZZI (46% error) vs. ZIZ (1% error) suggests qubit-pair heterogeneity, need controlled study
Implications for Phase 1 & Phase 2¶
Phase 1 Progression (Nov 2025)¶
Green Lights: 1. ✅ S-T01 Extended GHZ: Hardware access validated, proceed with ≥10 trials @ 500 shadows 2. ✅ S-T02 Noise-Aware: Use observed 46% ZZI error to benchmark MEM effectiveness 3. ✅ C-T01 H₂ Chemistry: Apply proven hardware pipeline to molecular Hamiltonian estimation 4. ✅ O/B/M Workstreams: Unblock cross-workstream starter experiments
Informed Expectations: - SSR ≥ 1.1× achievable with 500+ shadows + MEM (based on 46% raw noise headroom) - Expect CI coverage 60-80% without mitigation, improving to 80-90% with MEM - Budget 10-20 minutes per experiment (queue + execution + retrieval)
Phase 1 Completion Confidence: HIGH for hardware integration, MODERATE for SSR target (awaiting S-T01/S-T02).
Phase 2 Design Implications (Dec 2025)¶
v1 Noise-Aware Development: - 46% ZZI error provides clear target for MEM + inverse channel effectiveness - Calibration data (T1=63-209 μs, T2=49-199 μs, readout=0.77-2.22%) informs noise modeling
v2 Fermionic Shadows: - Observable-dependent noise (ZZI vs. ZIZ) suggests fermionic basis selection strategies - Chemistry Hamiltonians may benefit from topology-aware measurement allocation
v3 Adaptive Sampling: - ZIZ (1% error) vs. ZZI (46% error) motivates adaptive allocation: more shots for noisy observables - Qubit-pair quality should inform sampling policy
Hardware Campaign Planning: - ibm_fez validated as high-quality backend (queue depth 77, excellent qubits) - Establish backend selection criteria: queue < 200, calibration < 24 hours, readout error < 3%
Next Steps and Follow-Up Experiments¶
Immediate (Phase 1, Nov 2025)¶
- S-T01 Extended GHZ Validation [HIGH PRIORITY]
- Increase shadow_size to 500
- Run ≥10 independent trials on ibm_fez
- Compute SSR with statistical significance (mean ± std)
-
Target: SSR ≥ 1.1× with 80% CI coverage
-
S-T02 Noise-Aware Shadows with MEM [HIGH PRIORITY]
- Calibrate confusion matrices for qubits 0-2
- Run v0 vs. v1 on same ibm_fez backend
- Compare: variance reduction, bias correction, SSR improvement
-
Target: 20-30% variance reduction, SSR ≥ 1.1×
-
Baseline Direct Measurement [MEDIUM PRIORITY]
- Run grouped Pauli measurement with 1000 shots per observable
- Compute true SSR = (baseline_shots / shadow_shots) × (baseline_error / shadow_error)
-
Validate SSR calculation methodology
-
Multi-Backend Validation [LOW PRIORITY]
- Repeat on ibm_torino, ibm_marrakesh for backend diversity
- Compare: qubit quality vs. performance, calibration drift effects
- Document backend selection criteria for Phase 2 campaigns
Phase 2 Follow-Ups (Dec 2025 - Jan 2026)¶
- Hardware Campaign #1:
- Blocked time window (e.g., 8-hour session)
- Run S-T03, S-T04, C-T02, B-T02 in sequence
-
Control for calibration drift (all within 6-hour window)
-
Noise Model Validation:
- Compare observed errors to IBM calibration predictions
- Develop predictive model: estimated_error(observable, qubits, calibration_data)
-
Use for shot allocation in v3 adaptive sampling
-
Cross-Provider Comparison (Phase 4):
- Run identical 3-qubit GHZ on AWS Braket
- Compare: IBM vs. AWS noise profiles, SSR portability
- Validate backend-agnostic provenance schema
Part of Phase 1 Research Plan¶
This experiment is the hardware validation gate in the Phase 1 execution chain:
SMOKE-SIM ─✓─> SMOKE-HW ─✓─> Extended Validation
(17.37×) (this) │
├─> S-T01 (≥10 trials, 500 shadows)
├─> S-T02 (v1 + MEM)
├─> C-T01 (H₂ chemistry)
├─> O-T01 (QAOA MAX-CUT)
├─> B-T01 (RB/XEB)
└─> M-T01 (GHZ phase sensing)
Phase 1 Status: - ✅ SMOKE-SIM: Passed (SSR=17.37×, CI coverage=100%) - ✅ SMOKE-HW: Passed (hardware integration validated) - 🔄 S-T01/S-T02: In progress (awaiting extended validation) - 🔄 C/O/B/M: Ready to launch (hardware access proven)
Lessons Learned¶
Operational Insights¶
- Backend Selection Matters: ibm_fez (77 pending jobs) vs. ibm_brisbane (3175 jobs) → 5 min vs. hours queue wait
- Calibration Freshness Critical: Experiment run < 15 min after calibration → high-quality data
- Qubit Topology Affects Observables: ZZI (qubits 0-1) vs. ZIZ (qubits 0-2) show 46× difference in noise
- 100 Shadows Too Few: Need 500+ for reliable SSR measurement in noisy regime
Technical Insights¶
- Provenance System Scales: Manifest generation works identically for simulator vs. hardware
- IBM Calibration API Stable: Backend snapshot capture reliable across multiple runs
- Bootstrap CI Calibration: Confidence intervals correctly reflect sampling uncertainty (not hardware noise)
- Noise Budget Dominates: Hardware noise (46%) >> shot noise (~10%) at 100-shadow scale
Process Improvements for S-T01¶
- Pre-Calibration Check: Always run
quartumse runtime-statusto confirm fresh calibration - Baseline Concurrent Execution: Run direct measurement alongside shadows for true SSR
- Multiple Random Seeds: Test 3-5 seeds to assess seed-dependent variance
- Incremental Shadow Budgets: Test 100, 200, 500, 1000 to characterize SSR vs. shots curve
Final Assessment¶
The Hardware Smoke Test successfully validates QuartumSE's production readiness for IBM Quantum hardware: - ✅ Integration: End-to-end workflow validated - ✅ Provenance: Full calibration capture working - ✅ Execution: Runtime targets met (7.82s for 100 shadows) - ⚠️ Performance: SSR ≥ 1.1× not yet demonstrated (need S-T01/S-T02)
Recommendation: ✅ APPROVE progression to extended validation experiments (S-T01, S-T02) and cross-workstream launches (C-T01, O-T01, B-T01, M-T01). Hardware integration risk eliminated.
Risk Level: LOW for infrastructure, MODERATE for SSR target achievement (mitigable with larger shadow budgets + MEM).
Phase 1 Gate Review Readiness: SMOKE-HW + C-T01 provide sufficient hardware validation evidence. S-T01/S-T02 will complete SSR ≥ 1.1× demonstration for full Phase 1 closure.
Document Version: 1.0 Last Updated: November 3, 2025 Next Review: After S-T01 completion