diff --git a/README.md b/README.md
index 8cc5b438..0095564e 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,28 @@
-# Electrolyte Foundation Model
-Benchmarking RoBERTa model pre-training on molecular datasets.
+# MIST: Molecular Insight SMILES Transformer
+
+
+
+![GitHub License](https://img.shields.io/github/license/BattModels/mist)
+![arXiv:2409.15370](https://img.shields.io/badge/cs.LG-2409.15370-b31b1b?style=flat&logo=arxiv&logoColor=red)
+[![Model on HF](https://huggingface.co/datasets/huggingface/badges/resolve/main/model-on-hf-sm.svg)](https://huggingface.co/mist-models)
+
+
+
+MIST is a family of molecular foundation models for molecular property prediction.
+The models were pre-trained on [smirk tokenized](https://github.com/BattModels/smirk) SMILES strings from the [Enamine REAL Space](https://enamine.net/compound-collections/real-compounds/real-space-navigator) dataset using the Masked Language Modeling (MLM) objective, then fine-tuned for downstream prediction tasks.
 
 # Installation
 
 The following provides installation instructions for the top-level package (`electrolyte_fm`), optional add-ons for our
-various additional analysis and downstream applications (See `opt/`) may require additional configuration.
+various additional analysis and downstream applications (See [`./opt`](./opt/)) may require additional configuration.
+
+1. Install [uv](https://docs.astral.sh/uv/getting-started/installation/) and [julia](https://julialang.org/downloads/) (only needed for `/opt` tasks)
+2. Instantiate the environment: `uv sync`
+3. Use [`submit/submit.py`](./submit/submit.py) to submit a training job or check out one of our applications in [`./opt`](./opt)
+
+> You may need to install [rust](https://www.rust-lang.org/tools/install) if pre-built wheels for [smirk](https://github.com/BattModels/smirk) are not available on [PyPI](https://pypi.org/project/smirk/).
+> Feel free to [open an issue](https://github.com/BattModels/smirk/issues) to request additional pre-built wheels.
 
 ## Polaris
 
@@ -34,10 +52,13 @@ Same as above except:
 1. Build the image `bash container/build.sh`; once built, relocate the image `mv /tmp/mist.sif ./mist.sif`
 2.
Run training within the image `apptainer run --nv mist.sif python train.py ...`
-> See `submit/dgx.j2` or `submit/delta.j2` for a more complete example of using the container
+> See [`submit/dgx.j2`](./submit/dgx.j2) or [`submit/delta.j2`](./submit/delta.j2) for a more complete example of using the container
 
 # Submitting Jobs
 
+We use a Python script ([`submit/submit.py`](./submit/submit.py)) to template training jobs for submission on HPC systems across multiple sites.
+Templates may need to be modified for your particular HPC cluster, but should provide a starting point.
+
 ```shell
 source ./activate # Activate Environment
 ./submit/submit.py ./submit/polaris.j2 --data ./submit/pretrain.yaml | qsub
 
@@ -45,6 +66,8 @@ source ./activate # Activate Environment
 
 See `submit/submit.py --help` for more info
 
+> Note: [./activate](./activate) is used to activate the Python virtual environment *and* set various environment variables.
+
 # Development
 
 ## Pre-commit
diff --git a/opt/BayesianScaling/Project.toml b/opt/BayesianScaling/Project.toml
index 67cfde76..7841c29b 100644
--- a/opt/BayesianScaling/Project.toml
+++ b/opt/BayesianScaling/Project.toml
@@ -21,6 +21,7 @@ OnlineStats = "a15396b6-48d5-5d58-9928-6d29437db91e"
 Optimization = "7f7a1694-90dd-40f0-9382-eb1efda571ba"
 OptimizationOptimJL = "36348300-93cb-4f02-beb5-3c3902f8871e"
 Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
+ReTestItems = "817f1d60-ba6b-4fd5-9520-3cf149f6a823"
 Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
 StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
 StatsModels = "3eaba693-59b7-5ba5-a881-562e759f1c8d"
diff --git a/opt/BayesianScaling/README.md b/opt/BayesianScaling/README.md
new file mode 100644
index 00000000..c8d30acc
--- /dev/null
+++ b/opt/BayesianScaling/README.md
@@ -0,0 +1,20 @@
+# BayesianScaling
+
+A Julia package for fitting regression models using MCMC, used to fit the penalized neural scaling laws.
+To install:
+
+- Install Julia: https://julialang.org/downloads/
+- Instantiate the package: `julia --project -e 'using Pkg; Pkg.instantiate()'`
+- Download the wandb records or chains ([doi:10.5281/zenodo.17527149](https://doi.org/10.5281/zenodo.17527149))
+
+## Code Organization
+
+- `./scripts/`: scripts for fitting and analyzing the neural scaling laws
+- `./plots/`: plotting code for the paper and various conferences
+- `./src`: the MCMC regression and analysis package powering this work
+  - [ppl.jl](./src/ppl.jl): Defines a regression-first interface for fitting MCMC models,
+    plus single-pass algorithms for working with posterior samples
+  - [scaling.jl](./src/scaling.jl): functional forms for neural scaling laws and derived quantities
+  - [analysis.jl](./src/analysis.jl): Code for predicting the performance of models using fitted neural scaling laws
+- `./test/`: unit tests for the BayesianScaling.jl package
+- `./benchmark/`: benchmark suite for evaluating different AD backends using [PkgJogger.jl](https://github.com/awadell1/PkgJogger.jl)
diff --git a/opt/BayesianScaling/src/ppl.jl b/opt/BayesianScaling/src/ppl.jl
index 8bfe1e88..34608d79 100644
--- a/opt/BayesianScaling/src/ppl.jl
+++ b/opt/BayesianScaling/src/ppl.jl
@@ -216,7 +216,7 @@ function transform_samples(t::TransformVariables.AbstractTransform, x::Matrix{T}
 end
 
 function transform!(y::AbstractVector, tt::TransformVariables.TransformTuple, x::AbstractVector)
-    (; transformations) = tt
+    transformations = getfield(tt, :inner)
     @assert TransformVariables.dimension(tt) == length(y) == length(x)
     index = firstindex(y)
     for t in transformations
@@ -242,7 +242,7 @@ transform!(y::AbstractVector, t::TransformVariables.AbstractTransform, x::Abstra
 function transfrom_axis(tt::TransformVariables.TransformTuple{<:NamedTuple})
     ax_tt = []
     index = 1
-    for (k, t) in pairs(tt.transformations)
+    for (k, t) in pairs(getfield(tt, :inner))
         ax = transfrom_axis(t)
         n = TransformVariables.dimension(t)
         if ax isa
Union{ComponentArrays.ShapedAxis,ComponentArrays.Axis}
diff --git a/opt/FeatureMiner/README.md b/opt/FeatureMiner/README.md
new file mode 100644
index 00000000..7991a850
--- /dev/null
+++ b/opt/FeatureMiner/README.md
@@ -0,0 +1,13 @@
+# Feature Miner
+
+Code for evaluating fitted linear probes for their ability to predict various chemically meaningful features.
+
+# Replication
+
+1. Install [Julia](https://julialang.org/downloads/)
+2. Instantiate the project: `julia --project -e 'using Pkg; Pkg.instantiate()'`
+3. Train linear probes using [linear_probe.jsonnet](../../submit/linear_probe.jsonnet) and [submit/submit.py](../../submit/submit.py) on
+fine-tuned MIST models.
+4. Run `julia --project explore_probes.jl` to extract fitted probe weights from the checkpoints
+5. Instantiate the plotting code: `julia --project=plots -e 'using Pkg; Pkg.instantiate()'`
+6. Evaluate fitted probes: `julia --project=plots ./plots/lipinski_probes.jl`
diff --git a/opt/MISTStyle/README.md b/opt/MISTStyle/README.md
new file mode 100644
index 00000000..06c7ae16
--- /dev/null
+++ b/opt/MISTStyle/README.md
@@ -0,0 +1,3 @@
+# MISTStyle.jl
+
+A collection of plotting utilities and themes for [Makie.jl](https://docs.makie.org/stable/) used throughout the codebase to generate high-quality, publication-ready plots with a consistent visual theme.
diff --git a/opt/TokenizerStats/README.md b/opt/TokenizerStats/README.md
index 2c05d09d..783fe3ff 100644
--- a/opt/TokenizerStats/README.md
+++ b/opt/TokenizerStats/README.md
@@ -1,4 +1,13 @@
-# Analysis Code for "Smirk: An Atomically Complete Tokenizer for Molecular Foundation Models"
+# Analysis Code for "Tokenization for Molecular Foundation Models"
+
+
+
+![GitHub License](https://img.shields.io/github/license/BattModels/smirk)
+![paper](https://img.shields.io/badge/paper-10.1021%2Facs.jcim.5c01856-blue)
+![data](https://img.shields.io/badge/data-10.5281%2Fzenodo.13761262-blue)
+![arXiv:2409.15370](https://img.shields.io/badge/cs.LG-2409.15370-b31b1b?style=flat&logo=arxiv&logoColor=red)
+
 ## Installation
diff --git a/opt/design/README.md b/opt/design/README.md
new file mode 100644
index 00000000..3ad4c6ff
--- /dev/null
+++ b/opt/design/README.md
@@ -0,0 +1,12 @@
+# Evaluating Chemical Trends with MIST
+
+Source code for querying the MIST models on hydrocarbons and other templatable organic molecules.
+
+## Installation
+
+> All commands run from this directory
+
+1. Install [Julia](https://julialang.org/downloads/) and [uv](https://docs.astral.sh/uv/getting-started/installation/)
+2. Instantiate the project: `uv run julia --project -e 'using Pkg; Pkg.instantiate()'`
+3. Download the MIST models to `models/`
+4. Recreate the plots: `uv run julia --project plots.jl`
diff --git a/opt/interp_embeddings/README.md b/opt/interp_embeddings/README.md
index abd99f88..2e5e6d4e 100644
--- a/opt/interp_embeddings/README.md
+++ b/opt/interp_embeddings/README.md
@@ -5,6 +5,6 @@ Scripts for exploring MIST's embeddings and generating relevant figures from the
 ## Reproducing Analysis
 
 1. Install [julia](https://julialang.org/downloads/) and the base project (See [Project README](../../README.md))
-2. Instantiate the environment `julia --project -e 'using Pkg; Pkg.instantiate()'`
+2. Instantiate the environment `uv run julia --project -e 'using Pkg; Pkg.instantiate()'`
 3. Obtain model files and place at the appropriate path (see `plots.jl`)
-4. Run the script: `julia --project plots.jl`
+4. Run the script: `uv run julia --project plots.jl`
diff --git a/opt/mixtures/README.md b/opt/mixtures/README.md
new file mode 100644
index 00000000..f3e79ea0
--- /dev/null
+++ b/opt/mixtures/README.md
@@ -0,0 +1,17 @@
+# Mixtures
+
+Code for evaluating the MIST mixture models, exploring mixture space, and optimizing mixture composition.
+
+## Installation
+
+> All commands run from this directory
+
+1. Install [Julia](https://julialang.org/downloads/) and [uv](https://docs.astral.sh/uv/getting-started/installation/)
+2.
Instantiate the project: `uv run julia --project -e 'using Pkg; Pkg.instantiate()'`
+3. Obtain the mixtures dataset from [doi:10.5281/zenodo.17527149](https://doi.org/10.5281/zenodo.17527149)
+
+## Reproducing Plots
+
+Once installed, most of the scripts in the current directory can be run with:
+- python: `uv run