Hydra: Advanced | Haobin Tan

Extend Configs

A common pattern is to extend an existing config, overriding and/or adding new config values to it. The extension is done by first including the base configuration and then overriding the chosen values in the current config.

Example: Extending a config from the same config group

config.yaml

defaults:
  - db: mysql

db/mysql.yaml

defaults:
  - base_mysql  # We extend mysql on the base of base_mysql

user: omry
password: secret
port: 3307
encoding: utf8

db/base_mysql.yaml

host: localhost
port: 3306
user: ???
password: ???

$ python my_app.py
db:
  host: localhost   # from db/base_mysql
  port: 3307        # overridden by db/mysql.yaml 
  user: omry        # populated by db/mysql.yaml
  password: secret  # populated by db/mysql.yaml
  encoding: utf8    # added by db/mysql.yaml
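
The merge order above can be sketched in plain Python. This is only an illustration of "base first, then overrides", not Hydra's or OmegaConf's actual implementation; the helper name merge is hypothetical, and the dictionaries simply copy the values from the two YAML files.

```python
from copy import deepcopy


def merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` merged in; `override` wins."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Nested sections are merged key by key instead of being replaced.
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


# Values copied from db/base_mysql.yaml ("???" marks a mandatory missing value).
base_mysql = {"host": "localhost", "port": 3306, "user": "???", "password": "???"}
# Values copied from db/mysql.yaml (its defaults list is what pulls in the base).
mysql = {"user": "omry", "password": "secret", "port": 3307, "encoding": "utf8"}

db = merge(base_mysql, mysql)
print(db)
```

The resulting db dictionary has the same keys and values as the output of my_app.py shown above: host comes from the base, port is overridden, and encoding is added.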

Configuring Experiments

To support multiple configurations cleanly, each experiment config file specifies only its changes relative to the master (default) configuration.

Example:

The default configuration is:

The benchmark config files specify the deltas from the default configuration:

Key concepts:

# @package _global_
Changes specified in this config should be interpreted as relative to the _global_ package.
We could instead place nglite.yaml and aplite.yaml next to config.yaml and omit this line.
The overrides of /db and /server are absolute paths.
This is necessary because they are outside of the experiment directory.
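
The idea of an experiment as a set of deltas can be sketched in plain Python. Everything here is an assumption made for illustration: the default choices, the option contents, the port numbers and the compose helper are hypothetical, and only the experiment names aplite and nglite and the db and server groups come from the text. It is not how Hydra composes configs internally.

```python
# Hypothetical default choice per config group.
DEFAULT_CHOICES = {"db": "mysql", "server": "apache"}

# Hypothetical contents of each option in each config group.
OPTIONS = {
    "db": {"mysql": {"name": "mysql"}, "sqlite": {"name": "sqlite"}},
    "server": {
        "apache": {"name": "apache", "port": 80},
        "nginx": {"name": "nginx", "port": 80},
    },
}

# Each experiment lists only its deltas: which options to swap in,
# and which individual values to change afterwards.
EXPERIMENTS = {
    "aplite": {
        "choices": {"db": "sqlite", "server": "apache"},
        "values": {"server": {"port": 8080}},
    },
    "nglite": {
        "choices": {"db": "sqlite", "server": "nginx"},
        "values": {"server": {"port": 8080}},
    },
}


def compose(experiment=None) -> dict:
    """Build the final config from the defaults plus one optional experiment."""
    choices = dict(DEFAULT_CHOICES)
    value_overrides = {}
    if experiment is not None:
        delta = EXPERIMENTS[experiment]
        choices.update(delta["choices"])   # swap the selected options
        value_overrides = delta["values"]  # then patch individual values
    cfg = {group: dict(OPTIONS[group][name]) for group, name in choices.items()}
    for group, overrides in value_overrides.items():
        cfg[group].update(overrides)
    return cfg


print(compose())
print(compose("nglite"))
```

Calling compose() without an argument gives the default configuration, while compose("nglite") changes only what the experiment lists.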

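Running the experiments from the command line requires prefixing the experiment choice with a +, because the experiment config group is an addition to the defaults list, not an override of an existing entry.

$ python my_app.py +experiment=aplite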

Sweeping over experiments

This approach also enables sweeping over those experiments:

$ python my_app.py --multirun +experiment=aplite,nglite

To run all the experiments, use the glob syntax:

$ python my_app.py --multirun '+experiment=glob(*)'
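
As a rough mental model, a sweep expands the experiment choice into a list of jobs, one per value. The sketch below mimics that expansion with the standard library only. The expand helper and the AVAILABLE list are hypothetical, and Hydra's real override parser supports far more than this.

```python
from fnmatch import fnmatch

# Experiment names taken from the text; in Hydra these would correspond
# to the config files found in the experiment directory.
AVAILABLE = ["aplite", "nglite"]


def expand(choice: str) -> list:
    """Expand 'a,b' or 'glob(pattern)' into the experiment names to run."""
    if choice.startswith("glob(") and choice.endswith(")"):
        pattern = choice[len("glob("):-1]
        return [name for name in AVAILABLE if fnmatch(name, pattern)]
    return choice.split(",")


# One job per expanded value, numbered from 0.
for job_number, name in enumerate(expand("glob(*)")):
    print(f"job {job_number}: +experiment={name}")
```

With only two experiments available, the comma-separated form and the glob form expand to the same two jobs.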

Specializing Configuration

In some cases the desired configuration should depend on other configuration choices.

Example:

You may want to use only 5 layers in your Alexnet model if the dataset of choice is cifar10, and the default 7 otherwise.

We can start with a config that looks like this:

# config.yaml
defaults:
  - dataset: imagenet
  - model: alexnet

We want to specialize the config based on the choice of the selected dataset and model.

OmegaConf supports value interpolation, so we can construct a value that is, at runtime, a function of other values. The idea is to add another element to the defaults list that loads a file whose name depends on those two values.

Modify config.yaml:

defaults:
  - dataset: imagenet
  - model: alexnet
  - optional dataset_model: ${dataset}_${model}

The key dataset_model is an arbitrary directory name. It can be anything unique that makes sense, including a nested directory like dataset/model.
${dataset}_${model} uses OmegaConf’s variable interpolation syntax. At runtime, that value resolves to imagenet_alexnet or cifar10_resnet, depending on the values of defaults.dataset and defaults.model.
optional: By default, Hydra fails with an error if a config specified in the defaults list does not exist. In this case we only want to specialize cifar10 + alexnet, not all 4 combinations. The keyword optional tells Hydra to just continue if it can’t find this file.
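
The combination of interpolation and optional can be sketched with the standard library. This is an analogy only: it uses Python's string.Template, not OmegaConf, and the EXISTING set and the specialized_config helper are hypothetical stand-ins for the files in the dataset_model directory and for Hydra's lookup.

```python
from string import Template

# Hypothetical: the only specialization file that exists on disk.
EXISTING = {"cifar10_alexnet"}


def specialized_config(choices: dict, optional: bool = True):
    """Resolve the file name from the current choices and look it up."""
    name = Template("${dataset}_${model}").substitute(choices)
    if name in EXISTING:
        return name
    if optional:
        # With `optional`, a missing file is skipped silently.
        return None
    # Without `optional`, a missing file is an error.
    raise FileNotFoundError(f"dataset_model/{name}.yaml not found")


print(specialized_config({"dataset": "cifar10", "model": "alexnet"}))
print(specialized_config({"dataset": "imagenet", "model": "alexnet"}))
```

Only the cifar10 + alexnet combination finds a specialization. Every other combination falls through without an error, which is the behaviour the optional keyword asks for.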

For a code example, see hydra/examples/patterns/specializing_config in the facebookresearch/hydra repository on GitHub.

Configuring Hydra

Hydra is highly configurable. Many of its aspects and subsystems can be configured, including:

The Launcher
The Sweeper
Logging
Output directory patterns
Application help (--help and --hydra-help)

You can include some Hydra config snippet in your own config to override it directly, or compose in different configurations provided by plugins or by your own code. You can also override everything in Hydra from the command line just like with your own configuration.

Accessing the Hydra config

Hydra removes its own config from the config object it passes to the function annotated by @hydra.main(). There are two ways to access the Hydra config:

In your config, using the hydra resolver:
```
config_name: ${hydra:job.name}
```
The resolver name is hydra, and the key is passed after the colon.

In code, using the HydraConfig singleton:

import hydra
from hydra.core.hydra_config import HydraConfig
from omegaconf import DictConfig

@hydra.main()
def my_app(cfg: DictConfig) -> None:
    print(HydraConfig.get().job.name)

if __name__ == "__main__":
    my_app()

The following variables are populated at runtime:

hydra.job: Used for configuring some aspects of your job (for more information, see Job Configuration).
hydra.run: Used in single-run mode (i.e. when the --multirun command-line flag is omitted). See configuration for run.
hydra.sweep: Used in multi-run mode (i.e. when the --multirun command-line flag is given). See configuration for multirun.
hydra.runtime: Fields under hydra.runtime are populated automatically and should NOT be overridden.
hydra.overrides: Fields under hydra.overrides are populated automatically and should not be overridden.
hydra.mode

For other fields that are also present at the top level of the Hydra config, see Other Hydra settings.

Resolvers provided by Hydra

hydra: Interpolates into the hydra config node. e.g. Use ${hydra:job.name} to get the Hydra job name.
now: Creates a string representing the current time using strftime. e.g. to format the time you can use something like ${now:%H-%M-%S}.
python_version: Returns a string representing the runtime Python version by calling sys.version_info. Takes an optional string argument with the value major, minor or micro, e.g. ${python_version:major}.
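
To get a feel for what the now and python_version resolvers produce, here are rough standard-library equivalents. The function names mirror the resolver names for readability only. These are not Hydra's implementations, and the default level of minor is an assumption.

```python
import sys
from datetime import datetime


def now(fmt: str) -> str:
    """Format the current local time with a strftime pattern."""
    return datetime.now().strftime(fmt)


def python_version(level: str = "minor") -> str:
    """Return the running Python version, truncated at the given level."""
    info = sys.version_info
    parts = [str(info.major), str(info.minor), str(info.micro)]
    depth = {"major": 1, "minor": 2, "micro": 3}[level]
    return ".".join(parts[:depth])


print(now("%H-%M-%S"))          # same pattern as ${now:%H-%M-%S}
print(python_version("major"))  # major version only
print(python_version("micro"))  # major.minor.micro
```

The strftime pattern passed to now is the same string that follows the colon in the resolver syntax.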

Hydra + wandb

See: Configuring W&B Projects with Hydra

Reference

Hydra official tutorial

Last updated on 2024-09-05
