# Writing a Config File

QUMPHY trains models from a single YAML config that is parsed by [`app/train.py`](https://gitlab.com/qumphy/qumphy-software/-/blob/main/app/train.py) and turned into a Lightning training run by {py:class}`qumphy.trainer.Trainer`. The config describes **what to build** (model, data module, trainer, callbacks, …); **how to run it** (`fit`, `test`, `predict`) is selected by the `tasks:` section of the config.

The best starting points are the up-to-date templates under [`app/configs/D2paper/`](https://gitlab.com/qumphy/qumphy-software/-/tree/main/app/configs/D2paper).

## The core idea: `class_path` + `init_args`

Every component in a QUMPHY config is constructed from the same shape:

```yaml
class_path: dotted.path.to.SomeClass
init_args:
  some_kwarg: value
  another_kwarg: 42
```

At runtime, {py:func}`qumphy.misc.misc.instantiate_class` imports the class named by `class_path` and instantiates it with the keyword arguments in `init_args`. This mirrors the `class_path` / `init_args` convention of `LightningCLI` configs, and it applies uniformly to optimizers, loss functions, networks, callbacks, loggers, and the trainer itself.

### Nesting components via `classes`

Many objects take other objects as constructor arguments: a model needs a network, a loss function, and optionally an output activation; a trainer needs callbacks and a logger. Those nested objects are listed under a sibling `classes:` key:

```yaml
class_path: qumphy.models.pulsedb.PulseDBModule
init_args:
  optimizer:
    class_path: torch.optim.AdamW
    init_args:
      lr: 1.e-5
      weight_decay: 1.e-3
classes:
  - keyword: net
    class_path: qumphy.models.alexnet.AlexNet1D
    init_args:
      input_size: 1250
      output_size: 4
  - keyword: loss_fn
    class_path: qumphy.models.pulsedb.PulseDBGaussianLoss
    init_args:
      num_distributions: 2
```

Each entry under `classes` carries a `keyword:` field, which names the constructor argument the instantiated object is passed to (so `net=AlexNet1D(...)`, `loss_fn=PulseDBGaussianLoss(...)`).
If the same `keyword` appears multiple times, the instances are collected into a list. This is how the trainer ends up with several callbacks (see the trainer example below).

`init_args: {}` is a valid empty value; use it when a class takes no arguments.

## Top-level structure

A complete training config has these top-level keys:

| Key | Required | What it is |
|---|---|---|
| `model` | yes | LightningModule definition (class + nested net, loss, etc.) |
| `data` | yes | LightningDataModule definition |
| `trainer` | yes | `lightning.Trainer` + its callbacks and logger |
| `ensemble` | no | Wraps the model + trainer in an N-member ensemble |
| `find_lr` | no | If `True`, run Lightning's LR finder before training |
| `ckpt_path` | no | Path to a checkpoint to resume from (or per-member list for ensembles) |
| `seed_everything` | no | Integer seed; defaults to `None` (no seeding) |
| `feature_extractor` | no | Optional pre-trained feature extractor applied to the data |
| `sweep_parameters` | no | Used by W&B sweeps to map top-level keys into nested locations |

`tasks` is consumed by the trainer to decide what to do (`fit`, `test`, `predict`); CLI flags (`--fit`, `--test`, `--predict`) act as additional toggles in [`app/train.py`](https://gitlab.com/qumphy/qumphy-software/-/blob/main/app/train.py).

## Walkthrough: a deep-ensemble regressor for PulseDB

The file [`app/configs/D2paper/calib_alexnet.yaml`](https://gitlab.com/qumphy/qumphy-software/-/blob/main/app/configs/D2paper/calib_alexnet.yaml) is a good reference. We'll go through it block by block.

### Header

```yaml
find_lr: False
ckpt_path: null
```

`find_lr` runs Lightning's learning-rate finder when `True`. `ckpt_path: null` means start from scratch; provide a string path here to resume.

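
Before the component blocks, it may help to see the `class_path` / `init_args` / `classes` rules from "The core idea" as code. The following is a minimal sketch and not the actual implementation of `qumphy.misc.misc.instantiate_class`: the helper name `build` is hypothetical, `init_args` are passed through unchanged, and `types.SimpleNamespace` stands in for real QUMPHY and Lightning classes so the example runs with the standard library only. Passing a keyword that occurs once as a single object (rather than a one-element list) is also an assumption.

```python
import importlib
from collections import defaultdict
from typing import Any


def build(spec: dict[str, Any]) -> Any:
    """Hypothetical stand-in for instantiate_class; illustrates the config shape only."""
    # Split "package.module.ClassName" into module path and class name, then import.
    module_name, _, class_name = spec["class_path"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)

    # init_args become plain keyword arguments (an empty or missing block means none).
    kwargs = dict(spec.get("init_args") or {})

    # Nested components are built first and grouped by their `keyword`.
    grouped: dict[str, list[Any]] = defaultdict(list)
    for entry in spec.get("classes", []):
        grouped[entry["keyword"]].append(build(entry))

    # A keyword that occurs once is passed as a single object (assumption),
    # a repeated keyword as a list.
    for keyword, objects in grouped.items():
        kwargs[keyword] = objects[0] if len(objects) == 1 else objects

    return cls(**kwargs)


spec = {
    "class_path": "types.SimpleNamespace",
    "init_args": {"max_epochs": 100},
    "classes": [
        {"keyword": "logger", "class_path": "types.SimpleNamespace",
         "init_args": {"project": "calib"}},
        {"keyword": "callbacks", "class_path": "types.SimpleNamespace",
         "init_args": {"monitor": "val_loss"}},
        {"keyword": "callbacks", "class_path": "types.SimpleNamespace",
         "init_args": {}},
    ],
}

trainer = build(spec)
print(trainer.max_epochs, trainer.logger.project, len(trainer.callbacks))
# 100 calib 2
```

The two `callbacks` entries end up in one list, while the single `logger` entry is passed as one object.
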
### `model`

```yaml
model:
  class_path: qumphy.models.pulsedb.PulseDBModule
  init_args:
    optimizer:
      class_path: torch.optim.AdamW
      init_args:
        lr: 1.e-5
        weight_decay: 1.e-3
  classes:
    - keyword: net
      class_path: qumphy.models.alexnet.AlexNet1D
      init_args:
        input_size: 1250
        output_size: 4
    - keyword: dataset
      class_path: qumphy.data.pulsedb.PulseDBDataModule
      init_args:
        data_directory: /gpu-scratch/pfeffe01/pulsedb
        batch_size: 32
        num_workers: 1
        dataset: calib
        source: vital
        load_data: False
    - keyword: loss_fn
      class_path: qumphy.models.pulsedb.PulseDBGaussianLoss
      init_args:
        num_distributions: 2
```

Notes:

- The optimizer (and an optional `lr_scheduler:` block) sit *inside* `init_args`, not under `classes`, because PyTorch optimizer objects are passed as ordinary keyword arguments to the LightningModule.
- `net.output_size: 4` matches `loss_fn.num_distributions: 2`: the Gaussian loss expects two parameters (`mu`, `sigma`) per output target, and PulseDB calibration has two targets (SBP, DBP). For pinball loss the output size is `len(quantiles) * num_targets` instead.
- The `dataset:` sub-class inside `model` is the data module used for internal calibration / normalisation only; the *actual* training data module is configured separately under the top-level `data:` key (see below).

#### Adding a learning-rate scheduler

```yaml
init_args:
  optimizer: { ... }
  lr_scheduler:
    class_path: torch.optim.lr_scheduler.ReduceLROnPlateau
    init_args:
      mode: max
      factor: 0.5
      patience: 8
    config:
      monitor: val_auc
```

The extra `config:` block at the scheduler level holds the Lightning scheduler config (which metric to monitor, etc.), separate from the constructor arguments. See [`app/configs/D2paper/deepbeat_alexnet.yaml`](https://gitlab.com/qumphy/qumphy-software/-/blob/main/app/configs/D2paper/deepbeat_alexnet.yaml).

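
To illustrate why `config:` is kept apart from `init_args`, recall that Lightning's `configure_optimizers` may return a dict whose `lr_scheduler` entry holds the scheduler object next to options such as `monitor`. The sketch below only shows that dict shape; how QUMPHY assembles it internally is an assumption, the helper name `to_lightning_optim_config` is hypothetical, and plain `object()` placeholders stand in for the real torch optimizer and scheduler.

```python
from typing import Any


def to_lightning_optim_config(
    optimizer: Any, scheduler: Any, scheduler_block: dict[str, Any]
) -> dict[str, Any]:
    """Assemble the dict shape that LightningModule.configure_optimizers may return."""
    lr_scheduler = {"scheduler": scheduler}
    # Keys under `config:` (e.g. monitor) sit next to the scheduler object;
    # they are not constructor arguments of the scheduler class.
    lr_scheduler.update(scheduler_block.get("config") or {})
    return {"optimizer": optimizer, "lr_scheduler": lr_scheduler}


scheduler_block = {
    "class_path": "torch.optim.lr_scheduler.ReduceLROnPlateau",
    "init_args": {"mode": "max", "factor": 0.5, "patience": 8},
    "config": {"monitor": "val_auc"},
}

# Placeholders for the already-built optimizer and scheduler objects.
optimizer, scheduler = object(), object()
result = to_lightning_optim_config(optimizer, scheduler, scheduler_block)
print(result["lr_scheduler"]["monitor"])
# val_auc
```
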
### `trainer`

```yaml
trainer:
  class_path: lightning.Trainer
  init_args:
    accelerator: auto
    fast_dev_run: False
    max_epochs: 100
    overfit_batches: 0
    log_every_n_steps: 1
    precision: "32"
    default_root_dir: /…/logs/
  classes:
    - keyword: callbacks
      class_path: qumphy.callbacks.pulsedb.PulseDBLogging_Ensemble
      init_args:
        log_quantities: ["mae", "std"]
        log_pressure: "both"
        save_predictions: True
    - keyword: logger
      class_path: lightning.pytorch.loggers.WandbLogger
      init_args:
        save_dir: /…/logs/
        offline: True
        project: calib
        name: alexnet
    - keyword: callbacks
      class_path: lightning.pytorch.callbacks.early_stopping.EarlyStopping
      init_args:
        monitor: val_loss
        patience: 15
        mode: min
    - keyword: callbacks
      class_path: qumphy.callbacks.progressbar.EpochProgressBar
      init_args: {}
    - keyword: callbacks
      class_path: lightning.pytorch.callbacks.ModelCheckpoint
      init_args:
        monitor: val_loss
        mode: min
        save_last: True
        save_top_k: 1
```

The four entries with `keyword: callbacks` are collected into a list and passed as `callbacks=[...]` to `lightning.Trainer`. The single `keyword: logger` entry is passed as `logger=WandbLogger(...)`. Replace it with e.g. `lightning.pytorch.loggers.TensorBoardLogger` if you don't use W&B.

`fast_dev_run: True` is the quickest way to smoke-test a new config: it runs one batch through fit/validate/test and writes no checkpoints.

### `data`

```yaml
data:
  class_path: qumphy.data.pulsedb.PulseDBDataModule
  init_args:
    data_directory: /gpu-scratch/pfeffe01/pulsedb
    batch_size: 32
    num_workers: 14
    dataset: calib
    source: vital
    load_data: True
```

Notes:

- `data_directory` should point at the local copy of the dataset (PulseDB / DeepBeat). Change this to match your machine.
- For PulseDB, valid `dataset:` values include `calib`, `calibfree`, and `mini` (the small dev split).
- For DeepBeat, see [`deepbeat_alexnet.yaml`](https://gitlab.com/qumphy/qumphy-software/-/blob/main/app/configs/D2paper/deepbeat_alexnet.yaml): `dataset: set_revised`, plus a `target_format` field (`class_index` for cross-entropy-style targets, `one_hot` for the KG-loss variants).
- `num_workers` should be increased on production runs; the value inside the model's nested `dataset` block can stay low (it's only used for instantiation, not the actual loader).

### `ensemble`

```yaml
ensemble:
  size: 5
  class_path: qumphy.models.pulsedb.PulseDBEnsemble
  init_args: {}
```

When `ensemble:` is present, the trainer creates `size` Lightning trainers and models, fits each independently, then wraps them in the class given by `class_path` for inference. Remove this block to train a single model.

For DeepBeat ensembles, the wrapper takes an extra `noise_samples:` argument; see [`deepbeat_alexnet.yaml`](https://gitlab.com/qumphy/qumphy-software/-/blob/main/app/configs/D2paper/deepbeat_alexnet.yaml).

## Uncertainty-quantification variants

Three approaches are represented in `D2paper/`, and each picks a different combination of network + loss:

### 1. Deep ensemble + Gaussian NLL (default)

- Network: `qumphy.models.alexnet.AlexNet1D` or `qumphy.models.xresnet1d.XResNet1d50`
- Loss: `qumphy.models.pulsedb.PulseDBGaussianLoss(num_distributions: 2)` → predicts `mu` and `sigma`; combined with `ensemble.size: 5` gives a deep ensemble.
- Output size: `2 × num_distributions = 4`.

Example: [`calib_alexnet.yaml`](https://gitlab.com/qumphy/qumphy-software/-/blob/main/app/configs/D2paper/calib_alexnet.yaml).

### 2. Monte-Carlo dropout (MCD)

- Module: `qumphy.models.pulsedb.PulseDBModule_MCD` with `MCD_samples: 50` (number of stochastic forward passes at inference).
- Network: the `_MCD` variant (e.g. `AlexNet1D_MCD`) with `dropout_rate: 0.05` and `mcdropout: True`.
- Logging callback: `qumphy.callbacks.pulsedb.PulseDBLogging_MCD`.
- No `ensemble:` block; uncertainty comes from dropout sampling.


Example: [`calib_alexnet_MCD.yaml`](https://gitlab.com/qumphy/qumphy-software/-/blob/main/app/configs/D2paper/calib_alexnet_MCD.yaml).

### 3. Quantile / pinball loss

- Loss: `qumphy.models.utils.pinballloss.PinballLoss`
- Loss args: `quantiles: [0.0228, 0.1587, 0.5, 0.8413, 0.9772]` and `num_targets: 2`.
- Network output size = `len(quantiles) * num_targets` (here, `10`).
- Logging callback: `qumphy.callbacks.pulsedb.PulseDBLogging_Pinballloss`.
- No `ensemble:` block; a single model emits the full quantile vector.

Example: [`calib_alexnet_pinball.yaml`](https://gitlab.com/qumphy/qumphy-software/-/blob/main/app/configs/D2paper/calib_alexnet_pinball.yaml).

When you change loss family, **three things must agree**: the loss class, the network's `output_size`, and the logging callback under the trainer. Mismatches surface as cryptic shape errors at the first training step.

## Common edits

| Goal | Change |
|---|---|
| Try a quick run | Set `trainer.init_args.fast_dev_run: True` |
| Use a different network | Swap the `keyword: net` entry's `class_path` and `init_args` (mind `input_size` / `output_size`) |
| Use a different dataset split | Change `data.init_args.dataset` (`calib`, `calibfree`, `mini`, `set_revised`, …) |
| Disable W&B | Remove the `WandbLogger` entry; setting `offline: True` keeps the logger but only stops syncing to the W&B servers |
| Stop earlier / later | Adjust the `EarlyStopping` `patience` and / or `Trainer.max_epochs` |
| Train a single model instead of an ensemble | Delete the `ensemble:` block |
| Resume from a checkpoint | Set `ckpt_path:` (string for single model, list of strings for an ensemble) |
| Reproducibility | Add `seed_everything: 42` (or any int) at the top level |

## CLI parameter overrides

`train.py` accepts `-p key.subkey:value` to overlay values onto the loaded config without editing the YAML.
The same syntax can be passed multiple times:

```sh
python app/train.py --config app/configs/D2paper/calib_alexnet.yaml \
    -p trainer.init_args.max_epochs:5 \
    -p data.init_args.batch_size:16
```

Unknown `--key:value` flags are also accepted (see {py:func}`qumphy.misc.misc.train_argument_parser`). This is the same mechanism the W&B sweep agent uses.

## Validating a config before a long run

The fastest sanity check is a single-batch fit:

```yaml
trainer:
  init_args:
    fast_dev_run: True
```

…or override from the CLI, substituting the path to your own config for the placeholder:

```sh
python app/train.py --config <config.yaml> -p trainer.init_args.fast_dev_run:True
```

If you use the GUI ({doc}`usage`), the **Config (editable)** tab lets you toggle that flag and hit Run without modifying the on-disk file.

## Where to put your own configs

`app/configs/` is part of the repository, and most subdirectories (`D1paper/`, `D2paper/`, …) are version-controlled curated sets. For ad-hoc experiments use [`app/configs/personal_configs/`](https://gitlab.com/qumphy/qumphy-software/-/tree/main/app/configs/personal_configs) or [`app/configs/working_configs/`](https://gitlab.com/qumphy/qumphy-software/-/tree/main/app/configs/working_configs), which are ignored by git via `app/configs/.gitignore`.
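
Returning to validation: alongside `fast_dev_run`, a static check can catch an `output_size` mismatch before anything is instantiated. The sketch below applies the output-size rules stated in this guide to a `model` block given as a plain dict (for example, the result of loading the YAML). The helper names `expected_output_size` and `check_output_size` are hypothetical and not part of QUMPHY, and the check covers only the loss and network, not the logging callback.

```python
from typing import Any


def expected_output_size(loss_entry: dict[str, Any]) -> int:
    """Output size implied by a `keyword: loss_fn` entry, per the rules in this guide."""
    args = loss_entry.get("init_args") or {}
    class_name = loss_entry["class_path"].rsplit(".", 1)[-1]
    if class_name == "PulseDBGaussianLoss":
        # Gaussian loss: mu and sigma for each distribution.
        return 2 * args["num_distributions"]
    if class_name == "PinballLoss":
        # Pinball loss: one output per quantile and target.
        return len(args["quantiles"]) * args["num_targets"]
    raise ValueError(f"no output-size rule known for {class_name}")


def check_output_size(model_block: dict[str, Any]) -> None:
    """Raise ValueError if net.output_size disagrees with the loss entry."""
    entries = {entry["keyword"]: entry for entry in model_block.get("classes", [])}
    expected = expected_output_size(entries["loss_fn"])
    actual = entries["net"]["init_args"]["output_size"]
    if actual != expected:
        raise ValueError(f"net.output_size is {actual}, but the loss implies {expected}")


# A deliberately inconsistent block: pinball loss with the Gaussian output size.
model_block = {
    "class_path": "qumphy.models.pulsedb.PulseDBModule",
    "classes": [
        {"keyword": "net", "class_path": "qumphy.models.alexnet.AlexNet1D",
         "init_args": {"input_size": 1250, "output_size": 4}},
        {"keyword": "loss_fn", "class_path": "qumphy.models.utils.pinballloss.PinballLoss",
         "init_args": {"quantiles": [0.0228, 0.1587, 0.5, 0.8413, 0.9772],
                       "num_targets": 2}},
    ],
}

try:
    check_output_size(model_block)
except ValueError as err:
    print(err)
# net.output_size is 4, but the loss implies 10
```
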