Lineage and reproducibility

What data trained the model that's running in production right now?

If the answer takes more than 10 seconds, the problem usually isn't the team. It's that the tracking happened in someone's notebook, a forgotten S3 folder, and a Slack thread from March. Valohai records every experiment, every dataset version, and every model automatically, so the answer is one query, not an investigation.

Start a free trial See it in action

01 · Automatic versioning

You track code in Git. Everything else falls through the cracks.

Code goes in Git. Models and datasets don't fit there, so they end up in object storage with timestamped filenames or in a registry that lost context two refactors ago. And the long list of context that actually determines whether a run is reproducible (the specific GPU, the CUDA version, the dependency tree, the config files, the preprocessing pipeline that produced the input data, the random seed) almost never gets versioned at all. Not because nobody cares, but because nobody has time to log it manually for every experiment.

Valohai versions all of it automatically. Every experiment is a complete record: code, dataset version, model artifacts, parameters, environment, dependencies, hardware, configs, and the upstream lineage of the data itself. Not because someone remembered. Because the platform does it by default.

Valohai execution record showing environment, commit, step, image, command, inputs, and parameters captured automatically
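To make "a complete record" concrete, here is a minimal Python sketch of what one such record could hold. It is an illustration only, not Valohai's data model or API: the `ExecutionRecord` class, its field names, and the sample values are all assumptions made for this example. The fingerprint shows why capturing everything matters: change any one field and the run is no longer the same run.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field, replace


@dataclass(frozen=True)
class ExecutionRecord:
    """Hypothetical record of one run; field names are illustrative only."""

    commit: str
    image: str
    command: str
    dataset_version: str
    parameters: dict = field(default_factory=dict)
    dependencies: dict = field(default_factory=dict)
    hardware: str = "unknown"
    random_seed: int = 0

    def fingerprint(self) -> str:
        # Serialize with sorted keys so equal records always hash identically.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Sample values are made up for the example.
record = ExecutionRecord(
    commit="3f2a9c1",
    image="python:3.11",
    command="python train.py",
    dataset_version="features-v7",
    parameters={"learning_rate": 0.01, "epochs": 10},
    dependencies={"numpy": "1.26.4"},
    hardware="1x GPU",
    random_seed=42,
)

same_run = replace(record)                    # identical copy
different_seed = replace(record, random_seed=7)

print(record.fingerprint() == same_run.fingerprint())
print(record.fingerprint() == different_seed.fingerprint())
```

A record like this is only useful if nobody has to fill it in by hand, which is the point of capturing it at execution time.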

02 · Full lineage

Something went wrong in production. Which model? Which data? Which experiment?

When a model misbehaves in production, the questions come fast. Which version is running? Which experiment produced it? What dataset trained it, and which version of that dataset? What code, what parameters, what dependencies, what hardware? Without lineage, this is a forensic investigation across notebooks, S3 buckets, and old commits. With lineage, it's a query.

Valohai links every model to the full chain behind it: the experiment that produced it, the dataset version it trained on, the code commit, the parameters, the environment, the hardware, and the preprocessing pipeline that produced the data in the first place. Trace from any production model back to any decision in the chain.

Lineage graph linking training executions through best-model promotion to the deployed model dataset
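Tracing a model back through its chain is, conceptually, a walk over a graph of "produced by" links. The sketch below assumes a plain dictionary of such links. The node names and the `trace_lineage` helper are hypothetical and stand in for whatever the platform stores internally.

```python
from collections import deque

# Hypothetical lineage edges: each item maps to the things that produced it.
PRODUCED_BY = {
    "model:v3": ["execution:train-42"],
    "execution:train-42": ["dataset:features-v7", "commit:3f2a9c1"],
    "dataset:features-v7": ["execution:preprocess-17"],
    "execution:preprocess-17": ["dataset:raw-v2", "commit:9b1d4e0"],
}


def trace_lineage(artifact, produced_by):
    """Return every upstream ancestor of `artifact`, nearest first."""
    seen = set()
    order = []
    queue = deque([artifact])
    while queue:
        node = queue.popleft()
        for parent in produced_by.get(node, []):
            if parent not in seen:  # guard against shared ancestors
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


ancestors = trace_lineage("model:v3", PRODUCED_BY)
for item in ancestors:
    print(item)
```

The breadth-first order means the closest causes (the training execution, then its inputs) come first, which is usually the order you want when debugging a production issue.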

03 · Reproducible by default

Reproducing last month's experiment took three engineers two days.

The environment had changed. The dependencies had updated. The data had moved. What should have been "run it again" became a multi-day effort to reconstruct the conditions that existed when the experiment originally ran. By the time the team got it working, half of them had forgotten what they were trying to debug.

Because Valohai captured the full state of every experiment when it ran (code, dataset version, parameters, environment, dependencies, hardware), reproducing one means rerunning what's already on file. Same setup, same conditions, same outputs.

New Execution form pre-filled with the original commit, environment, docker image, command, and step from a past run

04 · Audit and compliance

A regulator asked for the training history of every model in production. Your deadline is two weeks.

Regulated industries need to demonstrate exactly how a model was built and how it changed: which data, which version, what validation was performed, what changed since the previous version, and who approved the promotion. Reconstructing that from Git logs, chat threads, and memory takes weeks the deadline doesn't allow.

Valohai generates this record automatically as experiments run. Every dataset version, every pipeline execution, every model promotion, every approval, captured in queryable lineage that can be exported as an audit-ready report. When the auditor asks, the answer is already there.

harbor-model registry page with latest approved version, full version history, and approval timestamps
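Once promotions and approvals are stored as structured events, an audit export is little more than a filter and a serializer. The following sketch assumes a hypothetical list of promotion events and writes them as CSV. The event fields, the approver names, the dates, and the `audit_report` function are invented for illustration and do not reflect Valohai's actual report format.

```python
import csv
import io

FIELDS = ["model", "version", "dataset_version", "approved_by", "approved_at"]

# Hypothetical promotion events; every value here is made up.
EVENTS = [
    {"model": "harbor-model", "version": 2, "dataset_version": "features-v7",
     "approved_by": "reviewer-b", "approved_at": "2024-04-11"},
    {"model": "other-model", "version": 1, "dataset_version": "features-v3",
     "approved_by": "reviewer-a", "approved_at": "2024-02-20"},
    {"model": "harbor-model", "version": 1, "dataset_version": "features-v6",
     "approved_by": "reviewer-a", "approved_at": "2024-03-02"},
]


def audit_report(events, model):
    """Return the approval history of one model as CSV text, oldest first."""
    rows = sorted(
        (event for event in events if event["model"] == model),
        key=lambda event: event["version"],
    )
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


report = audit_report(EVENTS, "harbor-model")
print(report)
```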

05 · Collaboration

Three people are running experiments on the same project. Nobody knows what anyone else tried.

Teams duplicate work without a shared view of what's been tried. Someone runs a variation that was already tested last week. Someone overwrites a configuration that was carefully tuned. The "best model" is the one someone remembers being good, not the one the data shows.

The same lineage that makes individual experiments reproducible makes the team's full experiment history visible to everyone. Anyone on the project can see what's been run, what worked, what didn't, what changed between versions, and why. Compare results across runs, branch off someone else's experiment, and pick up where a teammate left off.

Shared executions list with multiple teammates' runs, statuses, metrics, and parameters in one view
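"What changed between versions" comes down to a diff over recorded parameters. A minimal sketch, assuming each run's parameters are available as a plain dictionary (the `diff_parameters` helper and the sample values are hypothetical):

```python
def diff_parameters(run_a, run_b):
    """Map each differing parameter to its (run_a, run_b) values.

    A parameter missing from one run is reported as None on that side,
    so an explicit None and an absent key are treated as equal here.
    """
    keys = sorted(set(run_a) | set(run_b))
    return {
        key: (run_a.get(key), run_b.get(key))
        for key in keys
        if run_a.get(key) != run_b.get(key)
    }


# Made-up parameter sets for two runs by different teammates.
baseline = {"learning_rate": 0.01, "batch_size": 32, "epochs": 10}
variant = {"learning_rate": 0.001, "batch_size": 32, "epochs": 10, "dropout": 0.2}

changes = diff_parameters(baseline, variant)
for name, (before, after) in changes.items():
    print(f"{name}: {before} -> {after}")
```

Checking a proposed run against this kind of diff before launching it is one way to avoid repeating a variation a teammate already tested.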

Track everything. Reproduce anything.

Start a free trial or see it in action.
