# Automated Experiment Pipeline (Phase-1)
The Phase-1 pipeline automates the reproducible baseline → shadows v0 → shadows v1
runs, aggregates metrics, and emits artefacts that can be replayed later. Use this
guide when you need to run, extend, or debug the experiments/pipeline package.
## Metadata schema
experiments/pipeline/metadata_schema.py defines the structured metadata that every
pipeline run consumes. Provide the fields below in YAML or JSON (see
experiments/shadows/examples/extended_ghz/experiment_metadata.yaml for a template):
| Field | Type | Purpose |
|---|---|---|
| `experiment` | string | Human-readable experiment name that is echoed in manifests and reports. |
| `context` | string | One-paragraph background describing why the run exists. |
| `aims` | list[string] | Bullet points for the primary questions the run answers. |
| `success_criteria` | list[string] | Quantitative exit checks (used to infer default targets). |
| `methods` | list[string] | High-level procedure summary for provenance. |
| `budget.total_shots` | int | Equal-budget anchor: total shots to spend per approach (baseline, v0, v1). |
| `budget.calibration.shots_per_state` | int | Shots to allocate to each computational-basis state during MEM calibration. |
| `budget.calibration.total` | int | Total calibration shots (must equal `shots_per_state * 2^n`). |
| `budget.v0_shadow_size` | int | Measurement budget for the Phase-1 shadows v0 reference. |
| `budget.v1_shadow_size` | int | Measurement budget for the Phase-1 shadows v1 + MEM run (not including calibration shots). |
| `device` | string | Default backend descriptor (e.g. `aer_simulator` or `ibm:ibm_brisbane`). |
| `discussion_template` | string | Markdown template pre-populated in the generated HTML report. |
| `num_qubits` | int (optional) | Override for the inferred qubit count (normally derived from the calibration budget). |
| `targets` | map[string, number] (optional) | Explicit metric thresholds (`ssr_average`, `ci_coverage`, etc.). |
| `ground_truth` | mapping (optional) | Observable → expectation pairs for MAE/CI/SSR calculations. |
Tip: Leave `targets` blank if the success criteria already specify SSR/CI thresholds; `run_full_pipeline.py` automatically parses numbers from those strings.
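If you are drafting a metadata file from scratch, the sketch below shows one way to generate it programmatically. It is illustrative only: the field names come from the table above, while the experiment name, budgets, and output path are placeholder values, and it assumes PyYAML is available.

```python
# Illustrative Phase-1 metadata for a 4-qubit GHZ run, built in Python and
# dumped to YAML. All values are placeholders, not recommended budgets.
import yaml  # assumes PyYAML is installed

num_qubits = 4
shots_per_state = 256
calibration_total = shots_per_state * 2**num_qubits  # must equal shots_per_state * 2^n

metadata = {
    "experiment": "ghz4_phase1_demo",
    "context": "Compare the direct Pauli baseline against shadows v0/v1 on a 4-qubit GHZ state.",
    "aims": ["Quantify the shot-savings ratio of shadows v1 at equal budget."],
    "success_criteria": ["ssr_average >= 1.0", "ci_coverage >= 0.9"],
    "methods": [
        "Stage 1: direct Pauli baseline",
        "Stage 2: shadows v0 (no mitigation)",
        "Stage 3: shadows v1 with MEM",
    ],
    "budget": {
        "total_shots": 8192,
        "calibration": {"shots_per_state": shots_per_state, "total": calibration_total},
        "v0_shadow_size": 8192,
        # Calibration shots are deducted from the v1 measurement budget (equal-budget rule).
        "v1_shadow_size": 8192 - calibration_total,
    },
    "device": "aer_simulator",
    "discussion_template": "## Discussion\n\n_Add interpretation here._\n",
}

with open("experiment_metadata.yaml", "w") as fh:
    yaml.safe_dump(metadata, fh, sort_keys=False)
```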
## Phase-1 equal-budget rules
Phase-1 comparisons assume every approach consumes the same total shot budget and distributes resources uniformly across observables:
- Stage 1 (direct Pauli baseline) divides `budget.total_shots` evenly across the stabiliser observables using `_allocate_shots` in `experiments/pipeline/_runners.py`.
- Stage 2 (`ShadowVersion.V0_BASELINE`) records exactly `budget.v0_shadow_size` measurements with mitigation disabled.
- Stage 3 (`ShadowVersion.V1_NOISE_AWARE`) spends `budget.v1_shadow_size` on measurement shots and reuses calibration snapshots so that `calibration.total + v1_shadow_size = budget.total_shots`.
- The analysis layer (`experiments/pipeline/analyzer.py`) uses the equal-budget metrics from `src/quartumse/utils/metrics.py` (`compute_mae`, `compute_ci_coverage`, and `compute_ssr_equal_budget`), which treat each observable with uniform weight.
Violating the equal-budget assumption (for example, by changing the shot allocator or by skipping calibration reuse) invalidates SSR/CI comparisons, so keep the metadata and runner defaults aligned when you extend the pipeline.
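The bookkeeping behind those rules is simple enough to sanity-check before launching a run. The sketch below is not the actual `_allocate_shots` implementation; the helper names and the assumption that v0 spends its full budget on measurements are illustrative readings of the rules above.

```python
# Equal-budget sanity checks mirroring the Phase-1 rules above. These helpers
# are illustrative; they do not share code or signatures with
# experiments/pipeline/_runners.py.

def allocate_uniform(total_shots: int, num_observables: int) -> list[int]:
    """Split a shot budget as evenly as possible across observables."""
    base, remainder = divmod(total_shots, num_observables)
    # The first `remainder` observables absorb the leftover shots.
    return [base + (1 if i < remainder else 0) for i in range(num_observables)]

def check_equal_budget(budget: dict, num_qubits: int) -> None:
    """Raise if the metadata budget violates the Phase-1 invariants."""
    cal = budget["calibration"]
    assert cal["total"] == cal["shots_per_state"] * 2**num_qubits, \
        "calibration.total must equal shots_per_state * 2^n"
    assert cal["total"] + budget["v1_shadow_size"] == budget["total_shots"], \
        "v1 measurement shots plus calibration must exhaust the total budget"
    # Reading of the equal-budget anchor: v0 has no calibration overhead,
    # so its entire budget goes to measurements.
    assert budget["v0_shadow_size"] == budget["total_shots"]

check_equal_budget(
    {
        "total_shots": 8192,
        "calibration": {"shots_per_state": 256, "total": 4096},
        "v0_shadow_size": 8192,
        "v1_shadow_size": 4096,
    },
    num_qubits=4,
)
print(allocate_uniform(8192, 5))  # e.g. five stabiliser observables for a GHZ state
```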
## Calibration reuse and refresh windows
The pipeline and CLI share the ReadoutCalibrationManager so that measurement error
mitigation (MEM) snapshots are only regenerated when required:
Unix/macOS:
# Reuse cached confusion matrices unless forced or stale
quartumse calibrate-readout \
--backend ibm:ibm_brisbane \
--qubit 0 --qubit 1 --qubit 2 --qubit 3 \
--shots 256 \
--output-dir validation_data/calibrations \
--max-age-hours 6
Windows (PowerShell):
# Reuse cached confusion matrices unless forced or stale
quartumse calibrate-readout `
--backend ibm:ibm_brisbane `
--qubit 0 --qubit 1 --qubit 2 --qubit 3 `
--shots 256 `
--output-dir validation_data/calibrations `
--max-age-hours 6
Key behaviours (src/quartumse/cli.py):
- Reuse by default: `ensure_calibration` returns cached matrices and marks the manifest as `"reused": true` when the existing artefact matches the qubit set and backend descriptor.
- `--max-age-hours`: Provide a float to refresh calibrations that are older than the allowed window. Pass `--force` to ignore age checks entirely.
- Manifest trail: Each calibration writes `<confusion>.manifest.json` alongside the `.npz`, recording backend version, shot counts, and reuse flags for provenance.
The automated pipeline (experiments/pipeline/executor.py) points its calibration
manager at <output>/calibrations/ so reruns in the same directory automatically
reuse MEM artefacts as long as the age/force constraints permit.
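If you want to reason about reuse outside the CLI, the sketch below shows the kind of age-based check that `--max-age-hours` implies, using file modification times. It is an approximation: the real decision is made by `ensure_calibration` in `src/quartumse/cli.py`, and the artefact filename used here is hypothetical.

```python
# Approximate reuse decision in the spirit of --max-age-hours. The real logic
# lives in ensure_calibration (src/quartumse/cli.py); this version only looks
# at file modification times and the manifest's "reused" flag.
import json
import time
from pathlib import Path

def should_reuse(confusion_path: Path, max_age_hours: float | None, force: bool) -> bool:
    """Return True if a cached confusion matrix is present and fresh enough."""
    if force or not confusion_path.exists():
        return False                       # forced refresh, or nothing cached yet
    if max_age_hours is None:
        return True                        # no age limit: always reuse the cache
    age_hours = (time.time() - confusion_path.stat().st_mtime) / 3600.0
    return age_hours <= max_age_hours

cal_dir = Path("validation_data/calibrations")
npz = cal_dir / "ibm_brisbane_q0123.npz"   # hypothetical artefact name
if should_reuse(npz, max_age_hours=6.0, force=False):
    # Manifest naming follows the "<confusion>.manifest.json alongside the .npz"
    # convention described above; adjust if your filenames differ.
    manifest = json.loads(npz.with_suffix(".manifest.json").read_text())
    print("reusing calibration; reused flag:", manifest.get("reused"))
else:
    print("stale or missing calibration: rerun `quartumse calibrate-readout`")
```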
## Running the CLI pipeline
Use the run_full_pipeline.py entrypoint to execute the three stages, verify
artefacts, and render a report.
### Simulator (Aer)
Unix/macOS:
python -m experiments.pipeline.run_full_pipeline \
--metadata experiments/shadows/examples/extended_ghz/experiment_metadata.yaml \
--output validation_data/pipeline_runs/ghz4_aer \
--backend aer_simulator
Windows (PowerShell):
python -m experiments.pipeline.run_full_pipeline `
--metadata experiments/shadows/examples/extended_ghz/experiment_metadata.yaml `
--output validation_data/pipeline_runs/ghz4_aer `
--backend aer_simulator
- Produces manifests under `validation_data/pipeline_runs/ghz4_aer/manifests/`.
- Stores calibration artefacts in `validation_data/pipeline_runs/ghz4_aer/calibrations/`.
- Writes the result digest (`result_hash.txt`), analysis summary, and HTML report to the same directory.
### IBM Quantum hardware
Unix/macOS:
export QISKIT_IBM_TOKEN="<your-runtime-token>"
python -m experiments.pipeline.run_full_pipeline \
--metadata experiments/shadows/examples/extended_ghz/experiment_metadata.yaml \
--output data/pipeline_runs/ghz4_kyoto \
--backend ibm:ibm_kyoto
Windows (PowerShell):
$env:QISKIT_IBM_TOKEN="<your-runtime-token>"
python -m experiments.pipeline.run_full_pipeline `
--metadata experiments/shadows/examples/extended_ghz/experiment_metadata.yaml `
--output data/pipeline_runs/ghz4_kyoto `
--backend ibm:ibm_kyoto
Windows (Command Prompt):
set QISKIT_IBM_TOKEN=<your-runtime-token>
python -m experiments.pipeline.run_full_pipeline ^
--metadata experiments/shadows/examples/extended_ghz/experiment_metadata.yaml ^
--output data/pipeline_runs/ghz4_kyoto ^
--backend ibm:ibm_kyoto
- Set `QISKIT_RUNTIME_API_TOKEN` / `QISKIT_IBM_CHANNEL` / `QISKIT_IBM_INSTANCE` if your hub requires them (see the `quartumse connect ibm` hints in `src/quartumse/cli.py`).
- Hardware runs honour the same equal-budget accounting: MEM calibration shots are deducted from the total and reused when possible.
- The output directory is Git-ignored; move final manifests to `data/manifests/` when publishing results.
To override the backend encoded in the metadata file, pass a different --backend
value. Omit the flag to use metadata.device as-is.
## Artefacts and replay workflow
After a pipeline run completes, inspect the output directory:
Unix/macOS:
ls -R validation_data/pipeline_runs/ghz4_aer
cat validation_data/pipeline_runs/ghz4_aer/report_*.html | head
Windows (PowerShell):
Get-ChildItem -Recurse validation_data/pipeline_runs/ghz4_aer
Get-Content validation_data/pipeline_runs/ghz4_aer/report_*.html | Select-Object -First 10
You should see:
- `manifests/` – baseline, v0, and v1 JSON manifests (Stages 1–3).
- `calibrations/` – MEM confusion matrices plus `.manifest.json` metadata (only when Phase-1 runs require mitigation).
- `analysis_<hash>.json` – aggregated metrics (`ssr_average`, `ci_coverage`, MAE) and target evaluation flags.
- `report_<hash>.html` – full Phase-1 report ready for review or screenshots.
- `result_hash.txt` – stable digest derived from the manifest payloads.
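To pull the headline numbers without opening the HTML report, something like the following works; the top-level key names inside `analysis_<hash>.json` are an assumption here, so adjust them to whatever your run actually produced.

```python
# Print the aggregated metrics from the analysis summary. The key names are
# assumptions about the JSON layout; inspect the file if they do not match.
import json
from pathlib import Path

run_dir = Path("validation_data/pipeline_runs/ghz4_aer")
analysis_path = next(run_dir.glob("analysis_*.json"))   # the single analysis file
analysis = json.loads(analysis_path.read_text())

for key in ("ssr_average", "ci_coverage", "mae"):
    print(f"{key}: {analysis.get(key, '<not a top-level key in this schema>')}")
```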
Replaying artefacts does not require backend access:
Unix/macOS:
# Regenerate the HTML report after editing metadata or narrative sections
quartumse report validation_data/pipeline_runs/ghz4_aer/manifests/<manifest>.json \
--output validation_data/pipeline_runs/ghz4_aer/replay_report.html
# Programmatic replay: recompute observables from saved manifests
python - <<'PY'
from quartumse.estimator import ShadowEstimator
from quartumse.reporting.manifest import ProvenanceManifest
manifest_path = "validation_data/pipeline_runs/ghz4_aer/manifests/<manifest>.json"
manifest = ProvenanceManifest.from_json(manifest_path)
estimator = ShadowEstimator.replay_from_manifest(manifest)
print(estimator)
PY
Windows (PowerShell):
# Regenerate the HTML report after editing metadata or narrative sections
quartumse report validation_data/pipeline_runs/ghz4_aer/manifests/<manifest>.json `
--output validation_data/pipeline_runs/ghz4_aer/replay_report.html
# Programmatic replay: recompute observables from saved manifests
python -c "from quartumse.estimator import ShadowEstimator; from quartumse.reporting.manifest import ProvenanceManifest; manifest_path = 'validation_data/pipeline_runs/ghz4_aer/manifests/<manifest>.json'; manifest = ProvenanceManifest.from_json(manifest_path); estimator = ShadowEstimator.replay_from_manifest(manifest); print(estimator)"
The verifier stage (experiments/pipeline/verifier.py) checks that shot files and MEM
confusion matrices still exist and match their checksums. If you relocate artefacts,
update manifest.schema.shot_data_path or keep the directory layout intact so replay
continues to pass.
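If you need to spot-check an artefact by hand before rerunning the verifier, the sketch below does an integrity check of the same flavour. It is not the verifier's actual contract: the nesting of `shot_data_path` follows the wording above, while the checksum field name and the SHA-256 choice are assumptions.

```python
# Hand-rolled integrity check in the spirit of experiments/pipeline/verifier.py.
# The checksum field name and hash algorithm are assumptions, not the
# verifier's actual contract.
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

manifest_path = Path("validation_data/pipeline_runs/ghz4_aer/manifests/<manifest>.json")
manifest = json.loads(manifest_path.read_text())
schema = manifest.get("schema", manifest)               # layout assumption
shot_path = Path(schema["shot_data_path"])
expected = schema.get("shot_data_checksum")             # hypothetical field name

if not shot_path.exists():
    print("shot file missing: replay and verification would fail")
elif expected and sha256_of(shot_path) != expected:
    print("checksum mismatch: the artefact was modified or relocated")
else:
    print("shot data present" + (" and checksum matches" if expected else " (no checksum compared)"))
```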
Need more automation? Extend experiments/pipeline/executor.py with new stages, but
keep the metadata schema and equal-budget invariants intact so downstream analysis and
reports remain comparable.
## Connect the Dots
- Log new pipeline stages or campaign results in the experiment tracker.
- Share automation patterns or feature requests in the community hub.
- Reference supporting research in the literature library when proposing major workflow changes.