Save time and avoid recomputation with Pipeline Step Caching

by Tarek Oraby | on August 20, 2024

Machine learning pipelines often involve multiple steps, many of which are time-consuming and resource-intensive. These steps are frequently repeated unchanged across runs, which leads to redundant computation and wasted resources.

To address this issue, we’ve released a new feature in the Valohai MLOps platform. Executions Reuse is a form of pipeline caching that reuses intermediate results from previous pipeline executions.

With this feature, you can significantly accelerate your ML workflows, reduce redundant computational costs, and improve overall efficiency.

What is Executions Reuse?

Executions Reuse is a feature of Valohai Pipelines that allows you to reuse intermediate results from previous pipeline runs. When a step in the pipeline has already been executed and its input data, parameters, and code remain unchanged, Valohai can reuse the previous results instead of re-running the step. This prevents redundant computation and shortens the time needed to run the entire pipeline.

How does Executions Reuse work?

To ensure reproducibility and lineage tracking, Valohai automatically tracks the input data, parameters, source code, and output data of every pipeline step and every ad-hoc execution. You can learn more about traceability and reproducibility in Valohai in a dedicated blog post.

The new Executions Reuse feature leverages this pre-existing tracking capability to determine if a pipeline step can be reused from a previous run. When Executions Reuse is enabled, Valohai checks if the input data, parameters, and source code of a step match those of a previously run execution (regardless of whether it was part of a pipeline or not). If a match is found, Valohai reuses the results from the previous run, saving time and computational resources.
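As a rough illustration of the matching described above, here is a minimal Python sketch of fingerprint-based reuse. It is not Valohai's implementation: the `fingerprint` function, the `ReuseCache` class, the commit identifier `abc123`, and the S3 URI are all hypothetical names chosen for this example, and a real platform would identify input data by more than its URI.

```python
import hashlib
import json


def fingerprint(step, commit, parameters, inputs):
    """Return a stable hash of everything that determines a step's result."""
    payload = json.dumps(
        {
            "step": step,
            "commit": commit,            # stands in for the source code version
            "parameters": parameters,
            "inputs": sorted(inputs),    # input order does not matter in this sketch
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ReuseCache:
    """Hypothetical in-memory record of past executions, keyed by fingerprint."""

    def __init__(self):
        self._outputs = {}
        self.executed = []  # names of the steps that actually ran

    def run(self, step, commit, parameters, inputs, func, reuse=True):
        key = fingerprint(step, commit, parameters, inputs)
        if reuse and key in self._outputs:
            # Match found: return the stored outputs instead of re-running.
            return self._outputs[key]
        outputs = func(parameters, inputs)
        self.executed.append(step)
        self._outputs[key] = outputs
        return outputs


# Toy steps that only build strings, so the example runs instantly.
def preprocess(parameters, inputs):
    return [f"clean({uri})" for uri in inputs]


def train(parameters, inputs):
    return [f"model(lr={parameters['learning_rate']})"]


def run_pipeline(cache, learning_rate):
    data = cache.run(
        "preprocess-dataset", "abc123", {}, ["s3://bucket/raw.csv"], preprocess
    )
    return cache.run(
        "train-model", "abc123", {"learning_rate": learning_rate}, data, train
    )


cache = ReuseCache()
run_pipeline(cache, 0.01)   # first run: both steps execute
run_pipeline(cache, 0.01)   # identical run: both steps are reused
run_pipeline(cache, 0.001)  # only the training parameter changed
print(cache.executed)
# prints ['preprocess-dataset', 'train-model', 'train-model']
```

In the third run only the training step executes again, because the preprocessing step's code, parameters, and inputs still match an earlier execution.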

Key benefits of Executions Reuse

Faster iterations

Reduced computation time: By reusing stored results, Executions Reuse eliminates the need to re-run identical steps, leading to faster iterations and quicker model development.
Improved team productivity: With shorter iteration cycles, data scientists and machine learning engineers can experiment more freely, test different hypotheses, and refine models more efficiently.

Cost savings

Optimized resource usage: Executions Reuse minimizes redundant computations, which translates to lower costs, especially in cloud environments where you pay for the computing power you use.
Efficient resource allocation: By reusing stored results, you can utilize computational resources more effectively, allocating them to other tasks or spinning them down for cost savings.

Enhanced reproducibility

Consistent results: Because a reused step returns the stored outputs of the earlier execution, identical steps yield identical outputs across runs, which improves reproducibility.
Simplified debugging: With consistent intermediate outputs, debugging and troubleshooting issues in ML pipelines become more straightforward.

Getting Started with Executions Reuse

Enabling Executions Reuse in your Valohai Pipelines is straightforward: add the reuse-executions: true flag to your pipeline configuration in the valohai.yaml file.

Here is an example of how to enable Executions Reuse in your pipeline configuration:

    - pipeline:
        name: three-trainings-pipeline
        # Reuse results of earlier executions whose inputs, parameters, and code match
        reuse-executions: true
        nodes:
        - name: preprocess
          type: execution
          step: preprocess-dataset
        - name: train1
          type: execution
          step: train-model
          override:
        ...

Alternatively, you can enable or disable the feature in our web UI when creating a new machine-learning pipeline:

[Figure: Enabling Executions Reuse in the Valohai web UI]

By setting reuse-executions: true, you enable Executions Reuse for the entire pipeline, allowing Valohai to reuse stored results whenever possible.

Next steps

Executions Reuse can significantly accelerate your ML pipelines, reduce costs, and make your machine learning operations more efficient. Try it in your Valohai Pipelines today.

If you’re not a Valohai user yet, you can get a preview of Valohai’s capabilities using our self-service trial or chat with our Customer Team:

Start your Valohai trial: try out the MLOps platform for 14 days.