Configuration
Overview
Configuration is defined by config.py, and values are stored in YAML files within the configs/
directory. Configuration files can include other configuration files using the !include directive.
Each configuration file is associated with a Pydantic model. You can generate JSON schemas
for these models with uv run src/ocean_emulators/config_schema.py (this also runs automatically in pre-commit).
To associate a configuration file with a Pydantic model, generate the JSON schema (if it doesn't
already exist) and then add this line to the top of the config file:
The config_schema.py script uses this line to determine which model to validate the file against,
and it also enables autocomplete and type checking in VS Code via the YAML extension.
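Assuming the header line is the VS Code YAML extension's modeline (# yaml-language-server: $schema=<path>), a script could recover the target model from the top of a config file roughly as below. This is a hypothetical sketch: schema_model_name and the <ModelName>.json naming convention are assumptions, not taken from config_schema.py.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Modeline syntax recognised by the VS Code YAML extension (yaml-language-server).
_SCHEMA_LINE = re.compile(r"^#\s*yaml-language-server:\s*\$schema=(\S+)$")


def schema_model_name(config_text: str) -> Optional[str]:
    """Return the model name implied by the schema modeline, or None.

    Hypothetical convention: schemas are stored as <ModelName>.json, so the
    file stem of the $schema path names the Pydantic model to validate against.
    """
    for line in config_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        match = _SCHEMA_LINE.match(stripped)
        # Only the first non-blank line counts as the "top" of the file.
        return PurePosixPath(match.group(1)).stem if match else None
    return None


text = "# yaml-language-server: $schema=../schemas/TrainConfig.json\nlr: 0.001\n"
print(schema_model_name(text))  # TrainConfig
```

Keying off the first non-blank line keeps the check cheap and matches the instruction to put the line at the top of the file.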
Command Line Configuration
The train and eval modules accept the configuration file as a positional argument.
You can override arbitrary keys on the command line; see --help for details. When overriding
an object (as opposed to a single scalar value), you can supply either JSON,
like --data '{"key": "value"}', or a YAML file prefixed with @, like --data @configs/data/file.yaml.
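The two ways of passing an object can be illustrated with a minimal sketch, assuming flat --key value pairs: a leading @ hands the path to a file loader, and anything else is tried as JSON and otherwise kept as a plain string. parse_override_value, parse_overrides, and the load_file callback are hypothetical names; this is not the project's actual parser.

```python
import json
from typing import Any, Callable, Dict, List


def parse_override_value(raw: str, load_file: Callable[[str], Any]) -> Any:
    """Interpret one command-line override value.

    '@path' is handed to load_file (e.g. a YAML loader); anything else is
    parsed as JSON, falling back to the raw string for bare words like run1.
    """
    if raw.startswith("@"):
        return load_file(raw[1:])
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def parse_overrides(argv: List[str], load_file: Callable[[str], Any]) -> Dict[str, Any]:
    """Turn ['--key', 'value', ...] into {'key': parsed_value, ...}."""
    if len(argv) % 2:
        raise ValueError("every --key needs a value")
    overrides: Dict[str, Any] = {}
    for flag, raw in zip(argv[::2], argv[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"expected --key, got {flag!r}")
        overrides[flag[2:]] = parse_override_value(raw, load_file)
    return overrides


# A dict lookup stands in for reading a YAML file from disk.
files = {"configs/data/file.yaml": {"key": "from-file"}}
print(parse_overrides(["--data", '{"key": "value"}', "--lr", "0.01"], files.__getitem__))
# {'data': {'key': 'value'}, 'lr': 0.01}
```

Injecting load_file keeps the sketch independent of any YAML library; in practice it would be a function that opens the path and calls yaml.safe_load.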
Each training run writes the final configuration it used to a YAML file in the checkpoint directory.
You can reproduce the run by passing that file to train, e.g. uv run -m ocean_emulators.train path/to/config.yaml.
API Reference
ocean_emulators.config
JulianDate(s)
Represents a Julian date as a cftime.datetime at noon on the relevant day.
This is the format the OM4 data uses, so we match that here. TODO(jder): probably worth asserting the date format when opening the data.
Source code in src/ocean_emulators/config.py
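The "noon on the relevant day" convention can be sketched as follows. This is a stand-in, not the documented helper: it assumes the input string is ISO YYYY-MM-DD, and it returns a standard-library datetime (proleptic Gregorian) instead of a cftime.datetime so that it runs without cftime installed. julian_date_noon is a hypothetical name.

```python
from datetime import datetime


def julian_date_noon(s: str) -> datetime:
    """Parse an ISO 'YYYY-MM-DD' string to 12:00 on that day.

    Stand-in only: the documented JulianDate returns a cftime.datetime;
    stdlib datetime is used here so the sketch runs without cftime.
    """
    day = datetime.strptime(s, "%Y-%m-%d")  # midnight of the given day
    return day.replace(hour=12)             # shift to noon, matching the data's timestamps


print(julian_date_noon("2000-01-15"))  # 2000-01-15 12:00:00
```

Pinning every date to noon means a config date compares equal to the timestamp stored in the data for that day, instead of being off by half a day.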
TimeConfig
Bases: BaseConfig
Represents a time slice of the data.
Endpoints are Julian dates (not times), although cftime stores them as datetimes. The range is half-open: the final endpoint is exclusive.
overlaps(other)
Check if this time range overlaps with another time range.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| other | Self | Another TimeConfig to check for overlap | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the time ranges overlap, False otherwise |
Source code in src/ocean_emulators/config.py
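Because the final endpoint is exclusive, two ranges overlap exactly when each one starts before the other ends; ranges that merely touch (one ends on the day the other starts) do not overlap. A minimal sketch of that test, using a hypothetical TimeRange dataclass with start and end fields and datetime.date in place of cftime dates:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TimeRange:
    """Hypothetical stand-in for TimeConfig: the half-open range [start, end)."""
    start: date
    end: date  # exclusive

    def overlaps(self, other: "TimeRange") -> bool:
        # Half-open (non-empty) intervals overlap iff each starts before the other ends.
        return self.start < other.end and other.start < self.end


jan = TimeRange(date(2000, 1, 1), date(2000, 2, 1))
feb = TimeRange(date(2000, 2, 1), date(2000, 3, 1))
print(jan.overlaps(feb))  # False: January ends exactly where February starts
```

A check like this is handy, for example, for asserting that training and evaluation periods do not share any days.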
PerceiverConfig
Bases: BaseConfig
A standard config interface to various perceiver implementations.
Builds either a regular Perceiver (for the encoder, via build) or a
PerceiverIO (for the decoder, via build_io). Both respect the shared
implementation setting from FOMOConfig.perceiver_implementation.
build(in_channels, out_channels, max_patch_size, implementation)
Build a regular Perceiver (used by the encoder).
Source code in src/ocean_emulators/config.py
build_io(in_channels, queries_dim, out_channels, implementation)
Build a PerceiverIO (used by the decoder).
Source code in src/ocean_emulators/config.py
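Because both build and build_io take an implementation argument (set once via FOMOConfig.perceiver_implementation), the config can act as a name-based dispatcher. A hedged sketch of that pattern follows. The registries, the "reference" backend name, the depth field, and PerceiverConfigSketch are all hypothetical, and the builders return dicts rather than real modules.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Builder = Callable[..., Any]

# Hypothetical backend registries. Real entries would construct torch modules;
# these return plain dicts so the dispatch logic can be run in isolation.
PERCEIVER_BUILDERS: Dict[str, Builder] = {
    "reference": lambda **kw: {"kind": "Perceiver", **kw},
}
PERCEIVER_IO_BUILDERS: Dict[str, Builder] = {
    "reference": lambda **kw: {"kind": "PerceiverIO", **kw},
}


def _lookup(registry: Dict[str, Builder], implementation: str) -> Builder:
    try:
        return registry[implementation]
    except KeyError:
        known = ", ".join(sorted(registry))
        raise ValueError(
            f"unknown perceiver implementation {implementation!r}; known: {known}"
        ) from None


@dataclass
class PerceiverConfigSketch:
    depth: int = 4  # hypothetical hyperparameter shared by both builders

    def build(self, in_channels: int, out_channels: int,
              max_patch_size: int, implementation: str) -> Any:
        """Encoder side: a regular Perceiver."""
        make = _lookup(PERCEIVER_BUILDERS, implementation)
        return make(depth=self.depth, in_channels=in_channels,
                    out_channels=out_channels, max_patch_size=max_patch_size)

    def build_io(self, in_channels: int, queries_dim: int,
                 out_channels: int, implementation: str) -> Any:
        """Decoder side: a PerceiverIO that reads its output through queries."""
        make = _lookup(PERCEIVER_IO_BUILDERS, implementation)
        return make(depth=self.depth, in_channels=in_channels,
                    queries_dim=queries_dim, out_channels=out_channels)
```

Keeping the backend choice as a single string argument lets the encoder and decoder stay in sync when the shared setting changes.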
DecoderConfig
Bases: BaseConfig
A PerceiverIO-based decoder configuration.
Uses PerceiverIO (with an explicit query mechanism) rather than a regular
Perceiver. Output pixel positions are encoded as queries, so the output
size is determined by the number of queries, not by num_latents.
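Why the query count, not num_latents, fixes the output size can be seen in plain cross-attention: each query produces one output row by taking a softmax-weighted average over all latents. The sketch below is a deliberately stripped-down single-head version with no learned projections, with latents serving as both keys and values; it illustrates the query mechanism and is not the decoder's actual code. cross_attend is a hypothetical name.

```python
import math
from typing import List

Vector = List[float]


def cross_attend(queries: List[Vector], latents: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product cross-attention, no learned projections.

    Latents serve as both keys and values, so queries must share their width.
    The result has exactly one row per query, however many latents there are.
    """
    width = len(latents[0])
    scale = 1.0 / math.sqrt(width)
    outputs: List[Vector] = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in latents]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([sum(w * v[d] for w, v in zip(weights, latents)) / total
                        for d in range(width)])
    return outputs


latents = [[0.1 * i, 1.0 - 0.1 * i] for i in range(5)]  # 5 latent tokens
queries = [[float(x), 0.0] for x in range(12)]         # 12 "pixel position" queries
print(len(cross_attend(queries, latents)))  # 12
```

Doubling the number of latents changes what each output row summarizes, but not how many rows come out.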
When window_patches is set, the decoder tiles the output grid into
spatial blocks of that many patches per side. Each block's PerceiverIO
call receives only the overlapping latent tokens plus context_patches
extra rings of neighbors, keeping cost bounded even when the latent grid
is large (i.e., with a fine patch_extent).
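The windowing described above can be sketched as a tiling of the patch grid. Assuming one latent token per patch on the same grid as the output (an assumption), each output block of window_patches by window_patches patches attends only to the latents in that block grown by context_patches rings and clipped at the grid edge. Each call therefore sees at most (window_patches + 2 * context_patches)^2 latents, whatever the grid size. decoder_windows is a hypothetical name.

```python
from typing import Iterator, Tuple

Span = Tuple[int, int]  # half-open [start, stop) in patch units


def decoder_windows(grid_h: int, grid_w: int, window_patches: int,
                    context_patches: int) -> Iterator[Tuple[Span, Span, Span, Span]]:
    """Yield (out_rows, out_cols, latent_rows, latent_cols) for each output block."""
    for r0 in range(0, grid_h, window_patches):
        r1 = min(r0 + window_patches, grid_h)
        for c0 in range(0, grid_w, window_patches):
            c1 = min(c0 + window_patches, grid_w)
            # Grow the block by context_patches rings, clipped at the grid edge.
            lat_rows = (max(r0 - context_patches, 0), min(r1 + context_patches, grid_h))
            lat_cols = (max(c0 - context_patches, 0), min(c1 + context_patches, grid_w))
            yield (r0, r1), (c0, c1), lat_rows, lat_cols


blocks = list(decoder_windows(10, 10, window_patches=4, context_patches=1))
print(len(blocks))  # 9 blocks; each sees at most (4 + 2 * 1) ** 2 = 36 of the 100 latents
```

The context rings let patches near a block boundary still see their neighbors, at a cost that grows with the window rather than with the full latent grid.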
ocean_emulators.config_base
BaseConfig
Bases: BaseModel
Base class for all configs.
TopLevelConfig(*args, **kwargs)
Bases: BaseSettings
Base class for top-level configs (i.e., tasks such as train or eval).
Source code in src/ocean_emulators/config_base.py
from_yaml_and_cli(args_to_parse=None) (classmethod)
Load config from YAML & CLI with validation.
Source code in src/ocean_emulators/config_base.py
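The docstring only states that YAML and command-line values are combined and validated. A hedged sketch of that flow, assuming command-line overrides take precedence over YAML values and that unknown keys are rejected; TrainSketch and from_yaml_and_cli_sketch are hypothetical, and a small dataclass with type coercion stands in for the much richer validation pydantic performs.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Mapping


@dataclass
class TrainSketch:
    """Hypothetical top-level config standing in for a pydantic TopLevelConfig."""
    lr: float = 1e-3
    epochs: int = 1


def from_yaml_and_cli_sketch(yaml_values: Mapping[str, Any],
                             cli_overrides: Mapping[str, Any]) -> TrainSketch:
    """Merge YAML values with CLI overrides (CLI wins), then validate."""
    merged: Dict[str, Any] = {**yaml_values, **cli_overrides}
    known = {f.name for f in fields(TrainSketch)}
    unknown = sorted(set(merged) - known)
    if unknown:
        raise ValueError(f"unknown config keys: {unknown}")
    # Coerce each provided value to its annotated type, e.g. "5" -> 5 for epochs.
    coerced = {f.name: f.type(merged[f.name]) for f in fields(TrainSketch) if f.name in merged}
    return TrainSketch(**coerced)


print(from_yaml_and_cli_sketch({"lr": 0.01, "epochs": 2}, {"epochs": "5"}))
# TrainSketch(lr=0.01, epochs=5)
```

Validating after the merge, rather than validating each source separately, means a typo in either the YAML file or the command line is caught the same way.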
IncludeYamlCliSettingsSource(*args, **kwargs)
register_include_constructor()
Set up yaml.safe_load to include other yaml files via !include.
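A minimal PyYAML sketch of such a constructor: it registers an !include tag on yaml.SafeLoader, so plain yaml.safe_load calls pick it up. Resolving include paths relative to the including file's directory is an assumption (when loading from a string there is no file, so paths resolve against the current directory); the project's function may resolve them differently.

```python
from pathlib import Path
from typing import Any

import yaml  # PyYAML


def _include(loader: yaml.SafeLoader, node: yaml.ScalarNode) -> Any:
    """Load the YAML file named by an '!include path' scalar."""
    relative = loader.construct_scalar(node)
    # loader.name is the stream's name (the file path when loading a file),
    # so includes resolve relative to the including file's directory.
    base_dir = Path(loader.name).parent
    with open(base_dir / relative) as f:
        return yaml.safe_load(f)  # nested !include tags are handled recursively


def register_include_constructor() -> None:
    """Teach yaml.safe_load about the !include tag."""
    yaml.add_constructor("!include", _include, Loader=yaml.SafeLoader)
```

Registering on SafeLoader (rather than the full Loader) keeps the rest of safe_load's restrictions in place while adding just this one tag.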