Snakemake + SLURM
Get Snakemake to submit jobs via SLURM
Snakemake (version 8+) has a fantastic feature of submitting jobs to SLURM (or other schedulers) on your behalf. This feature existed in previous versions too, but v8 is a complete rewrite of how it works and it’s much, much simpler than before. This document will walk you through what you’ll need to know to get a Snakemake workflow to automate all of the job-submission logistics for working on the BioHPC.
1. Make sure it’s snakemake v8
Snakemake 8 has a much easier cluster submission system, so you’ll need to update to major version 8, which also requires Python 3.12 to be manually specified:
mamba update -c bioconda -c conda-forge snakemake=8 python=3.12
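Afterwards, you can sanity-check that the update took (the exact 8.x.x version printed will vary):
snakemake --version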
if you need to install Snakemake
If you don’t have Snakemake installed, replace mamba update with mamba install:
mamba install -c bioconda -c conda-forge snakemake=8 python=3.12
if you aren’t in a conda environment
If you aren’t in any conda environment whatsoever, replace mamba update with mamba create -n SOMENAME, where SOMENAME is the name you want to give your environment. Once created, activate the environment with mamba activate SOMENAME.
mamba create -n snakemake -c bioconda -c conda-forge snakemake=8
mamba activate snakemake
install the plugins
Then, add the SLURM executor plugin and the filesystem storage plugin:
mamba install -c bioconda -c conda-forge snakemake-executor-plugin-slurm snakemake-storage-plugin-fs
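You can confirm everything landed in the environment with something like:
mamba list | grep snakemake
# snakemake 8.x and both plugins should appear in the list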
2. Setting up a SLURM profile config
To use Snakemake’s SLURM integration, you’ll need a config file, in YAML format, often called a “profile”. This file specifies everything Snakemake needs to know about the cluster so that it can submit jobs on your behalf as governed by the workflow. Create a folder called profiles:
mkdir profiles
Within this folder, create a file named config.yaml that looks like this:
__use_yte__: true
executor: slurm
default-resources:
  slurm_account: nt246_0001
  slurm_partition: regular
  mem_mb: attempt * 1500
  runtime: 10
jobs: 30
latency-wait: 60
retries: 1
default-storage-provider: fs
local-storage-prefix: /home2/pd348
shared-fs-usage:
  - persistence
  - software-deployment
  - sources
  - source-cache
The lines, explained:
the basic configuration
- __use_yte__: true lets Snakemake know to use an enhanced YAML interpreting engine; don’t think too much about it
- executor: slurm tells Snakemake to use the slurm executor plugin you installed earlier, which is configured to work with SLURM schedulers
- default-resources is a catch-all group that lets you put in scheduler-specific parameters that don’t appear in Snakemake’s command line options
- slurm_account is the user account that will interact with the SLURM scheduler, which in most cases should be nt246_0001
- slurm_partition is the name of the BioHPC partition you are requesting, such as regular or long7. The partition list can be found here.
- mem_mb is the maximum number of megabytes of memory to request for a given job if not specified in the resources directive within a rule
- jobs is the maximum number of jobs Snakemake is allowed to submit to SLURM at a given time
- runtime is the time (in minutes) requested for the jobs if not specified in the resources directive within a rule
- latency-wait is the time, in seconds, Snakemake should wait after a job has finished to check if the expected outputs are present
- retries is the number of times Snakemake can try to resubmit a failed job. Works best when the workflow is set up to change things on retries, like increasing RAM per retry (e.g. attempt * 1000), requesting more time from the scheduler, etc. (see the example rule just after this list)
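As a sketch of what that per-rule escalation can look like (the rule name, file paths, and command below are hypothetical placeholders, not part of any real workflow), a rule can compute its memory request from the attempt number:
rule align:
    input:
        "data/{sample}.fastq.gz"
    output:
        "results/{sample}.bam"
    threads: 4
    resources:
        mem_mb=lambda wildcards, attempt: attempt * 4000,  # 4000 MB on attempt 1, 8000 MB on attempt 2, ...
        runtime=60
    shell:
        "some_aligner -t {threads} {input} > {output}"
With retries: 1 in the profile, a job that fails (for example, by running out of memory) gets resubmitted once with twice the memory request.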
automatic file I/O
This part can be omitted if you don’t need Snakemake to move files in or out of home2 (or another directory) between job submissions.
- default-storage-provider is the type of filesystem to use, where fs is a locally mounted filesystem where things are moved (by Snakemake) using rsync. Other options can be explored here.
- local-storage-prefix is a convenient setting to have Snakemake copy inputs to a SCRATCH-type directory, run the job there, then copy the expected outputs back to the project directory upon completion of the job. For the BioHPC, this can be seen as the automated way to move stuff into and out of home2/USER without having to think about it all that much.
- shared-fs-usage is a list of permissions/specifications for how Snakemake can interact with the filesystem. The options shown in the example above (persistence, software-deployment, sources, and source-cache) are necessary for Snakemake to automate input/output to and from the home2 (or other) directory. Unfortunately, the Snakemake documentation on these parameters is lacking, so there isn’t more to say about it until that information becomes publicly available.
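If a particular rule needs more than the defaults, resource overrides can also live in this same profile instead of the workflow itself. A minimal sketch, assuming your Snakemake 8 install supports the nested set-resources profile syntax (the rule name align here is hypothetical):
set-resources:
  align:
    mem_mb: 16000
    runtime: 240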
3. Prepare the working directory
At this point, you should have a profiles/config.yaml in your working directory. Keep in mind that the folder can be named anything, not specifically profiles, but the config file has to be named config.yaml.
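For reference, a working directory set up for the invocation in the next step might look something like this (the config/config.yaml and workflow.smk names match the example below; yours may differ):
.
├── profiles/
│   └── config.yaml    # the SLURM profile from step 2
├── config/
│   └── config.yaml    # the workflow's own configuration, if it has one
└── workflow.smk       # the snakefile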
4. Run the workflow
How you invoke Snakemake is specific to your workflow, but a typical invocation could look like this:
snakemake --cores 15 --sdm conda \
--workflow-profile profiles \
--configfile config/config.yaml \
--snakefile workflow.smk
- --cores is the number of cores/threads you’re allowing the scheduler to use.
- --workflow-profile is the relative path to the directory with the SLURM configuration you made above, which in this case should just be the folder name profiles.
- --configfile is the path to the workflow’s own configuration file, if it uses one.
- --snakefile is the path to the workflow snakefile.
- --sdm [optional] is a shortcut for --software-deployment-method, which in this case tells Snakemake to use the conda configurations written into your workflow to manage runtime software dependencies (it defaults to using mamba). This can be omitted if there are no conda directives inside any rules, or swapped for --sdm apptainer if the container directive is featured inside any rules.
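Before committing to a full run, it can be worth doing a dry run with the same options plus the -n (--dry-run) flag, which builds the job graph and reports what would be submitted without sending anything to SLURM:
snakemake -n --cores 15 --sdm conda \
    --workflow-profile profiles \
    --configfile config/config.yaml \
    --snakefile workflow.smk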