Phase 1 Reference Simulation Procedure

This document defines the procedure used during Phase 1 to generate and maintain reusable reference datasets for QuartumSE experiments. The goal is to capture a high-fidelity simulator baseline that downstream experiments can replay without resampling, ensuring that shot budgets are preserved for hardware executions.

Overview

Reference runs are executed on the Qiskit aer_simulator backend using the classical-shadows estimator. Artifacts are stored under data/:

| Artifact | Location |
| --- | --- |
| Provenance manifests | data/manifests/ |
| Persisted shadow measurement parquet | data/shots/ |
| Reference manifest index | data/manifests/reference_index.json |

Each manifest contains a metadata.reference_dataset block and is tagged with reference-dataset plus scenario-specific tags (e.g., phase1, ghz). The index file is used by experiment scripts to look up a reference run before generating new data.
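The lookup step can be sketched as follows. The index layout used here (a JSON object mapping each slug to a manifest path) and the function name find_reference are assumptions made for illustration; this document does not specify the actual schema of reference_index.json.

```python
import json
from pathlib import Path
from typing import Optional


def find_reference(index_path: Path, slug: str) -> Optional[Path]:
    """Return the manifest path registered for `slug`, or None.

    Assumes (hypothetically) that the index is a JSON object of the form
    {"<slug>": "<manifest path>", ...}.
    """
    if not index_path.exists():
        return None  # no index yet, so nothing can be replayed
    index = json.loads(index_path.read_text(encoding="utf-8"))
    manifest = index.get(slug)
    return Path(manifest) if manifest else None
```

A script would call find_reference(Path("data/manifests/reference_index.json"), "phase1-ghz-v0-n3-s4096") and generate new data only when the result is None.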

Simulator configuration

Phase 1 reference runs use the following simulator settings:

  • Backend: aer_simulator (default configuration, single shot queue)
  • Shadow version: Baseline v0 for noise-free references and v1 when measurement error mitigation (MEM) calibration data is required.
  • Random seed: Fixed to 42 to guarantee deterministic measurement bases.
  • Measurement ensemble: Local random Clifford rotations (default for ShadowConfig).
  • Output precision: 64-bit floating point expectations with 95% confidence intervals generated by the estimator.
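The effect of the fixed seed can be illustrated with a minimal sketch. It is not the ShadowConfig implementation: the function name sample_bases is hypothetical, and random Pauli bases stand in for local Clifford rotations (a random single-qubit Clifford followed by a computational-basis measurement amounts to measuring in a random Pauli basis).

```python
import random

PAULI_BASES = ("X", "Y", "Z")


def sample_bases(num_shots: int, num_qubits: int, seed: int = 42) -> list[str]:
    """Draw one measurement basis per qubit for every shot.

    Pauli bases are a simplified stand-in for local Clifford rotations.
    """
    rng = random.Random(seed)  # private generator, independent of global state
    return [
        "".join(rng.choice(PAULI_BASES) for _ in range(num_qubits))
        for _ in range(num_shots)
    ]


# Two runs with the same seed produce the same basis sequence.
first = sample_bases(num_shots=8, num_qubits=3)
second = sample_bases(num_shots=8, num_qubits=3)
```

Because the generator is seeded locally rather than through global state, other random draws in the same process do not disturb the basis sequence.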

Shot counts

| Scenario | Measurement shots | Calibration shots | Notes |
| --- | --- | --- | --- |
| GHZ reference (v0) | 4,096 | 0 | Three-qubit GHZ prepared via CNOT ladder. |
| GHZ reference (v1 + MEM) | 4,096 | 4,096 | MEM shots allocated as 512 per basis state on three qubits. |

Tip: When a configuration includes MEM calibration, the calibration shots are recorded in the manifest metadata, and calibration is skipped when an existing reference run is replayed.
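The calibration total for the v1 scenario follows from the per-state allocation: three qubits have 2^3 = 8 computational basis states, and 512 shots per state gives 4,096. A small sketch of that arithmetic (the helper name mem_calibration_shots is illustrative, not part of the CLI):

```python
def mem_calibration_shots(num_qubits: int, shots_per_state: int = 512) -> int:
    """Total calibration shots when every computational basis state is prepared."""
    return shots_per_state * 2 ** num_qubits


total = mem_calibration_shots(num_qubits=3)  # 512 * 8 = 4096
```

The count grows exponentially with the number of qubits, so the same allocation on four qubits would need 8,192 calibration shots.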

Naming and metadata

Reference datasets are keyed by a slug that uniquely identifies the scenario. Recommended convention:

{phase}-{circuit}-{variant}-n{num_qubits}-s{measurement_shots}

Example: phase1-ghz-v0-n3-s4096.
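The convention can be expressed as a small helper. The function name build_slug is an assumption for illustration; the document only prescribes the resulting string format.

```python
def build_slug(phase: str, circuit: str, variant: str,
               num_qubits: int, measurement_shots: int) -> str:
    """Assemble a slug as {phase}-{circuit}-{variant}-n{num_qubits}-s{measurement_shots}."""
    return f"{phase}-{circuit}-{variant}-n{num_qubits}-s{measurement_shots}"


slug = build_slug("phase1", "ghz", "v0", num_qubits=3, measurement_shots=4096)
```

Here slug evaluates to "phase1-ghz-v0-n3-s4096", matching the example.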

When a reference run completes, the manifest metadata is populated with:

"reference_dataset": {
  "slug": "phase1-ghz-v0-n3-s4096",
  "phase": "phase1",
  "experiment": "ghz-reference",
  "run_name": "ghz-3q-baseline",
  "variant": "v0",
  "backend_descriptor": "aer_simulator",
  "shadow_size": 4096,
  "num_qubits": 3,
  "observable_count": 5,
  "calibration_shots": 0,
  "registered_at": "<UTC timestamp>",
  "last_used_at": "<UTC timestamp>",
  "tags": ["phase1", "reference", "ghz"]
}

The manifest tags field is also extended with the reference-dataset marker and all scenario tags, allowing simple filtering.
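Filtering by tags could look like the sketch below. It assumes each manifest has been loaded into a dict with a top-level "tags" list; that layout and the function name filter_manifests are illustrative assumptions rather than the documented manifest schema.

```python
from typing import Iterable


def filter_manifests(manifests: Iterable[dict], *required_tags: str) -> list[dict]:
    """Keep manifests whose `tags` list contains every required tag."""
    wanted = set(required_tags)
    return [m for m in manifests if wanted <= set(m.get("tags", []))]


# Hypothetical, already-loaded manifests.
manifests = [
    {"name": "ghz-3q-baseline", "tags": ["reference-dataset", "phase1", "reference", "ghz"]},
    {"name": "scratch-run", "tags": ["phase1"]},
]
references = filter_manifests(manifests, "reference-dataset", "ghz")
```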

Execution steps

  1. Prepare configuration – Either use the default YAML configuration embedded in the template CLI or author a custom config file describing the runs (see below).
  2. Invoke the template CLI – Run python experiments/reference/run_phase1_reference.py with optional overrides such as --config custom.yml or --backend aer_simulator.
  3. Replay when available – The CLI (through ReferenceDatasetRegistry) checks reference_index.json and manifest metadata. If an entry with the requested slug exists, it is replayed via the estimator without queuing new shots.
  4. Review outputs – The CLI prints the manifest and shot data paths for each run. Additional summary files can be generated later using quartumse report against the saved manifests.
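The replay-or-execute decision in step 3 can be sketched as below. This is not the ReferenceDatasetRegistry interface, which is not shown in this document: the function replay_or_run, its callable parameters, and the dict-shaped index are all assumptions made to illustrate the control flow.

```python
from typing import Callable


def replay_or_run(
    slug: str,
    index: dict,
    replay: Callable[[str], dict],
    run: Callable[[str], dict],
    force: bool = False,
) -> tuple[str, dict]:
    """Replay a registered reference if one exists; otherwise execute a new run.

    `index` maps slugs to manifest paths (assumed layout). `force=True`
    skips the lookup and always regenerates the dataset.
    """
    if not force and slug in index:
        return "replayed", replay(index[slug])  # no new shots are queued
    return "executed", run(slug)
```

Keeping the decision separate from the replay and run callables makes the branch easy to test without a simulator.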

Configuration schema

Custom configurations are authored as YAML or JSON with the shape:

experiment_name: ghz-reference
phase: phase1
default_tags: [phase1, reference]
runs:
  - name: ghz-3q-baseline
    reference_slug: phase1-ghz-v0-n3-s4096
    circuit: ghz
    num_qubits: 3
    variant: v0
    shadow_size: 4096
    backend: aer_simulator
    tags: [ghz]
    observables:
      - pauli: ZII
      - pauli: IZI
      - pauli: IIZ
      - pauli: ZZI
      - pauli: ZZZ

  - name: ghz-3q-mem
    reference_slug: phase1-ghz-v1-n3-s4096
    circuit: ghz
    num_qubits: 3
    variant: v1
    shadow_size: 4096
    mem_shots: 512
    backend: aer_simulator
    tags: [ghz, mem]
    observables:
      - pauli: ZII
      - pauli: IZI
      - pauli: IIZ
      - pauli: ZZI
      - pauli: ZZZ

Each run entry is processed independently. When mem_shots is omitted, the run falls back to the value of the CLI argument (512). The ReferenceDatasetRegistry adds bookkeeping fields such as registered_at and last_used_at automatically.
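How defaults might be applied to each run entry is sketched below for a configuration that has already been parsed into a dict. The function resolve_runs is hypothetical. Two behaviours are assumptions: the mem_shots fallback is applied to every run, and default_tags are merged ahead of the run's own tags (which reproduces the tags shown in the metadata example, ["phase1", "reference", "ghz"]).

```python
def resolve_runs(config: dict, cli_mem_shots: int = 512) -> list[dict]:
    """Return a copy of each run entry with defaults filled in."""
    default_tags = config.get("default_tags", [])
    resolved = []
    for run in config.get("runs", []):
        entry = dict(run)  # shallow copy so the parsed config is not mutated
        entry.setdefault("mem_shots", cli_mem_shots)
        # dict.fromkeys removes duplicate tags while preserving order
        entry["tags"] = list(dict.fromkeys([*default_tags, *run.get("tags", [])]))
        resolved.append(entry)
    return resolved


config = {
    "default_tags": ["phase1", "reference"],
    "runs": [
        {"name": "ghz-3q-baseline", "variant": "v0", "tags": ["ghz"]},
        {"name": "ghz-3q-mem", "variant": "v1", "mem_shots": 256, "tags": ["ghz", "mem"]},
    ],
}
runs = resolve_runs(config)
```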

Manifest index maintenance

  • The registry updates reference_index.json on every successful run.
  • Stale entries are pruned automatically if the manifest file is removed.
  • When editing manifests manually, either keep the reference_dataset.slug field in sync with the filename, or rerun the CLI with --force to regenerate the dataset and update the index.
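The pruning rule can be illustrated with a minimal sketch. It assumes the index is a JSON object mapping slugs to manifest paths; that layout and the function name prune_stale_entries are illustrative, and the registry's actual pruning logic may differ.

```python
import json
from pathlib import Path


def prune_stale_entries(index_path: Path) -> list[str]:
    """Drop index entries whose manifest file is missing; return the removed slugs."""
    index = json.loads(index_path.read_text(encoding="utf-8"))
    stale = [slug for slug, manifest in index.items() if not Path(manifest).exists()]
    for slug in stale:
        del index[slug]
    if stale:  # rewrite the file only when something actually changed
        index_path.write_text(json.dumps(index, indent=2), encoding="utf-8")
    return stale
```

Returning the removed slugs lets a caller log what was pruned before any dataset is regenerated.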

Following this procedure ensures that all Phase 1 experiments share the same simulator reference baselines and that manifests carry enough metadata for downstream automation to reason about provenance and reuse.