Caching
Kestra provides file caching, which is especially useful when you work with sizeable package dependencies that don't change often.
Cache files in a WorkingDirectory task
The file caching functionality on the WorkingDirectory task lets you cache a subset of files to speed up your workflow execution.
Kestra can only cache files installed or created by script tasks when the script uses a PROCESS runner. If the script uses a DOCKER runner, the files are not cached and the WorkingDirectory task throws an error: Unable to execute WorkingDirectory post actions.
Use cases for file caching
File caching is useful if you want to install pip or npm packages before running your script. You can cache the node_modules or Python venv folder to avoid re-installing the dependencies on each run.
To do that, add a cache to your WorkingDirectory task. The cache property accepts a list of glob patterns that match the files to cache. The cache is automatically invalidated once the time-to-live set by the ttl property has passed; ttl accepts an ISO 8601 duration such as PT1H.
id: caching_files
namespace: company.team
tasks:
  - id: working_dir
    type: io.kestra.plugin.core.flow.WorkingDirectory
    cache:
      patterns:
        - some_directory/**
      ttl: PT1H
How does it work under the hood
Kestra packages the files that need to be cached and stores them in the internal storage. When the task is executed again, the cached files are retrieved, initializing the working directory with their contents.
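The pack-and-restore cycle described above can be modeled with a short Python sketch. This is an illustration only, not Kestra's implementation: the function names `pack_cache` and `restore_cache` are hypothetical, a local zip file stands in for the internal storage, and `fnmatch` only approximates glob matching (its `*` also matches `/`).

```python
import fnmatch
import zipfile
from pathlib import Path


def pack_cache(working_dir: Path, patterns: list[str], archive: Path) -> list[str]:
    """Zip every file under working_dir whose relative path matches a pattern.

    Hypothetical helper: the archive stands in for Kestra's internal storage.
    """
    stored = []
    with zipfile.ZipFile(archive, "w") as zf:
        for path in sorted(working_dir.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(working_dir).as_posix()
            # Approximate glob matching: "node_modules/**" matches any file below it.
            if any(fnmatch.fnmatchcase(rel, pattern) for pattern in patterns):
                zf.write(path, rel)
                stored.append(rel)
    return stored


def restore_cache(archive: Path, working_dir: Path) -> None:
    """Initialize a fresh working directory with the cached files."""
    working_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(working_dir)
```

In this model, files that match no pattern (for example the script itself) are left out of the archive, so only the cached dependencies reappear in the next run's working directory.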
Node.js example
Here's an example of a flow that installs the colors package before running a Node.js script. The node_modules folder is cached for one hour.
id: node_cached_dependencies
namespace: company.team
tasks:
  - id: working_dir
    type: io.kestra.plugin.core.flow.WorkingDirectory
    cache:
      patterns:
        - node_modules/**
      ttl: PT1H
    tasks:
      - id: node_script
        type: io.kestra.plugin.scripts.node.Script
        beforeCommands:
          - npm install colors
        script: |
          const colors = require("colors");
          console.log(colors.red("Hello"));
Python example
Here's an example of a flow that installs the pandas package before running a Python script. The venv folder is cached for one day.
id: python_cached_dependencies
namespace: company.team
tasks:
  - id: working_dir
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
      - id: python_script
        type: io.kestra.plugin.scripts.python.Script
        taskRunner:
          type: io.kestra.plugin.core.runner.Process
        warningOnStdErr: false
        beforeCommands:
          - python -m venv venv
          - source venv/bin/activate
          - pip install pandas
        script: |
          import pandas as pd
          print(pd.__version__)
    cache:
      patterns:
        - venv/**
      ttl: PT24H
How to invalidate the cache
Here's how that works:
- After the first run, the files are cached.
- The next time the task is executed:
  - If the ttl didn't pass, the files are retrieved from cache.
  - If the ttl passed, the cache is invalidated and no files will be retrieved from cache; because the cache is no longer present, the npm install command from the beforeCommands property will take a bit longer to execute.
- If you edit the task and change the ttl to:
  - a longer duration, e.g. PT5H: the files will be cached for five hours using the new ttl duration.
  - a shorter duration, e.g. PT5M: the cache will be invalidated after five minutes using the new ttl duration.
The ttl is evaluated at runtime. If more time than the most recently set ttl duration has elapsed since the last task run's execution date, the cache is invalidated and the files are no longer retrieved from cache.
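The runtime check can be pictured with a minimal Python sketch. It is a model of the rule stated above, not Kestra's code: `parse_ttl` and `cache_is_valid` are hypothetical names, the parser handles only the `PT#H#M#S` subset of ISO 8601 durations, and treating a cache aged exactly one ttl as expired is an assumption.

```python
import re
from datetime import datetime, timedelta, timezone

# Only the PT<hours>H<minutes>M<seconds>S subset of ISO 8601 durations is handled.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_ttl(value: str) -> timedelta:
    """Convert a duration such as 'PT1H' or 'PT5M' into a timedelta."""
    match = _DURATION.match(value)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (int(group or 0) for group in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def cache_is_valid(last_run: datetime, ttl: str, now: datetime) -> bool:
    """Compare the time since the last run against the ttl currently set on the task.

    Assumption: a cache aged exactly one ttl counts as expired.
    """
    return now - last_run < parse_ttl(ttl)


last_run = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
now = datetime(2024, 1, 1, 14, 0, tzinfo=timezone.utc)  # two hours later

print(cache_is_valid(last_run, "PT1H", now))  # False: the original ttl has passed
print(cache_is_valid(last_run, "PT5H", now))  # True: a longer ttl keeps the cache
print(cache_is_valid(last_run, "PT5M", now))  # False: a shorter ttl invalidates it
```

Because the ttl is read at the time of the check rather than stored alongside the cache, editing the task to a longer or shorter duration changes the outcome for files that were cached earlier.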