Phase 1 Reference Simulation Procedure
This document defines the procedure used during Phase 1 to generate and maintain reusable reference datasets for QuartumSE experiments. The goal is to capture a high-fidelity simulator baseline that downstream experiments can replay without resampling, ensuring that shot budgets are preserved for hardware executions.
Overview
Reference runs are executed on the Qiskit aer_simulator backend using the
classical-shadows estimator. Artifacts are stored under data/:
| Artifact | Location |
|---|---|
| Provenance manifests | data/manifests/ |
| Persisted shadow measurement parquet | data/shots/ |
| Reference manifest index | data/manifests/reference_index.json |
Each manifest contains a metadata.reference_dataset block and is tagged with
reference-dataset plus scenario-specific tags (e.g., phase1, ghz). The
index file is used by experiment scripts to look up a reference run before
generating new data.
Simulator configuration
Phase 1 reference runs use the following simulator settings:
- Backend: aer_simulator (default configuration, single shot queue)
- Shadow version: Baseline v0 for noise-free references and v1 when MEM calibration data is required.
- Random seed: Fixed to 42 to guarantee deterministic measurement bases.
- Measurement ensemble: Local random Clifford rotations (default for ShadowConfig).
- Output precision: 64-bit floating point expectations with 95% confidence intervals generated by the estimator.
Shot counts
| Scenario | Measurement shots | Calibration shots | Notes |
|---|---|---|---|
| GHZ reference (v0) | 4,096 | 0 | Three-qubit GHZ prepared via CNOT ladder. |
| GHZ reference (v1 + MEM) | 4,096 | 4,096 | MEM shots allocated as 512 per basis state on three qubits. |
Tip: When a configuration includes MEM calibration, the calibration shots are recorded in the manifest metadata and are not executed again when an existing reference run is replayed.
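The calibration total in the table follows from preparing each of the 2^3 = 8 computational basis states with 512 shots. The check below is a minimal sketch; the helper name mem_calibration_shots is hypothetical and not part of QuartumSE.

```python
def mem_calibration_shots(num_qubits: int, shots_per_basis_state: int) -> int:
    """Total MEM calibration shots when every computational basis state is prepared.

    Hypothetical helper for illustration only.
    """
    num_basis_states = 2 ** num_qubits  # 8 basis states for three qubits
    return num_basis_states * shots_per_basis_state


total = mem_calibration_shots(num_qubits=3, shots_per_basis_state=512)
print(total)  # 4096
```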
Naming and metadata
Reference datasets are keyed by a slug that uniquely identifies the scenario. Recommended convention:
{phase}-{circuit}-{variant}-n{num_qubits}-s{measurement_shots}
Example: phase1-ghz-v0-n3-s4096.
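The convention can be applied mechanically. The following sketch assumes a hypothetical helper, build_reference_slug, that simply fills in the template; it is not a QuartumSE function.

```python
def build_reference_slug(phase: str, circuit: str, variant: str,
                         num_qubits: int, measurement_shots: int) -> str:
    """Format a slug as {phase}-{circuit}-{variant}-n{num_qubits}-s{measurement_shots}.

    Hypothetical helper for illustration only.
    """
    return f"{phase}-{circuit}-{variant}-n{num_qubits}-s{measurement_shots}"


slug = build_reference_slug("phase1", "ghz", "v0", num_qubits=3, measurement_shots=4096)
print(slug)  # phase1-ghz-v0-n3-s4096
```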
When a reference run completes, the manifest metadata is populated with:
"reference_dataset": {
  "slug": "phase1-ghz-v0-n3-s4096",
  "phase": "phase1",
  "experiment": "ghz-reference",
  "run_name": "ghz-3q-baseline",
  "variant": "v0",
  "backend_descriptor": "aer_simulator",
  "shadow_size": 4096,
  "num_qubits": 3,
  "observable_count": 5,
  "calibration_shots": 0,
  "registered_at": "<UTC timestamp>",
  "last_used_at": "<UTC timestamp>",
  "tags": ["phase1", "reference", "ghz"]
}
The manifest tags field is also extended with the reference-dataset marker
and all scenario tags, allowing simple filtering.
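As an illustration of tag-based filtering, the sketch below selects reference manifests from a list of already-loaded manifest dictionaries. It assumes each manifest exposes a top-level tags list and a metadata.reference_dataset block as described above; the function name filter_reference_manifests is hypothetical.

```python
from typing import Iterable


def filter_reference_manifests(manifests: Iterable[dict],
                               required_tags: Iterable[str]) -> list:
    """Return manifests carrying the reference-dataset marker and all required tags.

    Hypothetical helper; assumes manifests are dicts with a top-level "tags" list.
    """
    wanted = {"reference-dataset"} | set(required_tags)
    return [m for m in manifests if wanted <= set(m.get("tags", []))]


manifests = [
    {"tags": ["reference-dataset", "phase1", "reference", "ghz"],
     "metadata": {"reference_dataset": {"slug": "phase1-ghz-v0-n3-s4096"}}},
    {"tags": ["phase1", "ghz"], "metadata": {}},  # lacks the marker, so it is skipped
]

matches = filter_reference_manifests(manifests, ["ghz"])
slugs = [m["metadata"]["reference_dataset"]["slug"] for m in matches]
print(slugs)  # ['phase1-ghz-v0-n3-s4096']
```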
Execution steps
- Prepare configuration – Either use the default YAML configuration embedded in the template CLI or author a custom config file describing the runs (see below).
- Invoke the template CLI – Run python experiments/reference/run_phase1_reference.py with optional overrides such as --config custom.yml or --backend aer_simulator.
- Replay when available – The CLI (through ReferenceDatasetRegistry) checks reference_index.json and manifest metadata. If an entry with the requested slug exists, it is replayed via the estimator without queuing new shots.
- Review outputs – The CLI prints the manifest and shot data paths for each run. Additional summary files can be generated later using quartumse report against the saved manifests.
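The replay-or-run decision in the steps above can be sketched as follows. This is not the ReferenceDatasetRegistry implementation: the index layout (a JSON object mapping each slug to a manifest path) and the names lookup_reference and replay_or_run are assumptions made for illustration, and the replay and run_new callables stand in for the estimator calls.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def lookup_reference(index_path: Path, slug: str) -> Optional[Path]:
    """Return the manifest path registered for `slug`, or None if unavailable."""
    if not index_path.exists():
        return None
    index = json.loads(index_path.read_text())
    entry = index.get(slug)  # assumed layout: {slug: manifest_path}
    if entry is None:
        return None
    manifest_path = Path(entry)
    # An indexed manifest that no longer exists on disk cannot be replayed.
    return manifest_path if manifest_path.exists() else None


def replay_or_run(index_path: Path, slug: str,
                  replay: Callable[[Path], str],
                  run_new: Callable[[str], str]) -> str:
    """Replay an existing reference run if one is indexed, else generate new data."""
    manifest_path = lookup_reference(index_path, slug)
    if manifest_path is not None:
        return replay(manifest_path)  # no new shots are queued
    return run_new(slug)
```

Passing the two actions as callables keeps the lookup logic independent of how the estimator is invoked, which also makes the branch easy to exercise with stand-in functions.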
Configuration schema
Custom configurations are authored as YAML or JSON with the shape:
experiment_name: ghz-reference
phase: phase1
default_tags: [phase1, reference]
runs:
  - name: ghz-3q-baseline
    reference_slug: phase1-ghz-v0-n3-s4096
    circuit: ghz
    num_qubits: 3
    variant: v0
    shadow_size: 4096
    backend: aer_simulator
    tags: [ghz]
    observables: &same_as_above
      - pauli: ZII
      - pauli: IZI
      - pauli: IIZ
      - pauli: ZZI
      - pauli: ZZZ
  - name: ghz-3q-mem
    reference_slug: phase1-ghz-v1-n3-s4096
    circuit: ghz
    num_qubits: 3
    variant: v1
    shadow_size: 4096
    mem_shots: 512
    backend: aer_simulator
    tags: [ghz, mem]
    observables: *same_as_above
Each run entry is processed independently. If mem_shots is omitted, the run falls back to the CLI argument (512). The ReferenceDatasetRegistry adds bookkeeping fields such as registered_at and last_used_at automatically.
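The per-run defaults described above can be sketched as a small normalization step. The function normalize_runs is hypothetical, and merging default_tags ahead of each run's own tags is an assumption inferred from the example metadata (phase1, reference, ghz); the actual CLI may differ.

```python
DEFAULT_MEM_SHOTS = 512  # mirrors the CLI value quoted in this section


def normalize_runs(config: dict, cli_mem_shots: int = DEFAULT_MEM_SHOTS) -> list:
    """Return a copy of each run entry with tags merged and mem_shots filled in.

    Hypothetical helper for illustration only.
    """
    default_tags = config.get("default_tags", [])
    normalized = []
    for run in config["runs"]:
        entry = dict(run)  # shallow copy so the input config is left untouched
        # Assumption: default tags come first, duplicates removed, order preserved.
        entry["tags"] = list(dict.fromkeys([*default_tags, *run.get("tags", [])]))
        # Fall back to the CLI value only when the run does not set mem_shots.
        entry.setdefault("mem_shots", cli_mem_shots)
        normalized.append(entry)
    return normalized


config = {
    "experiment_name": "ghz-reference",
    "phase": "phase1",
    "default_tags": ["phase1", "reference"],
    "runs": [
        {"name": "ghz-3q-mem", "variant": "v1", "tags": ["ghz", "mem"]},
        {"name": "ghz-3q-mem-explicit", "variant": "v1", "tags": ["ghz", "mem"],
         "mem_shots": 1024},
    ],
}

runs = normalize_runs(config)
print(runs[0]["tags"])                             # ['phase1', 'reference', 'ghz', 'mem']
print(runs[0]["mem_shots"], runs[1]["mem_shots"])  # 512 1024
```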
Manifest index maintenance
- The registry updates reference_index.json on every successful run.
- Stale entries are pruned automatically if the manifest file is removed.
- When editing manifests manually, keep the reference_dataset.slug field in sync with the filename or update the index by rerunning the CLI with --force to regenerate the dataset.
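To make the pruning rule concrete, here is a minimal sketch of removing index entries whose manifest file has disappeared. It assumes the index is a JSON object mapping each slug to a manifest path, which may not match the real reference_index.json layout, and prune_stale_entries is a hypothetical name rather than a registry method.

```python
import json
from pathlib import Path


def prune_stale_entries(index_path: Path) -> list:
    """Drop index entries whose manifest file no longer exists; return removed slugs.

    Hypothetical helper; assumed index layout is {slug: manifest_path}.
    """
    index = json.loads(index_path.read_text())
    stale = [slug for slug, manifest in index.items() if not Path(manifest).exists()]
    for slug in stale:
        del index[slug]
    if stale:
        # Rewrite the index only when something actually changed.
        index_path.write_text(json.dumps(index, indent=2, sort_keys=True))
    return stale
```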
Following this procedure ensures that all Phase 1 experiments share the same reference baselines and that manifests carry enough metadata for downstream automation to reason about provenance and reuse.