Snakemake + SLURM
Get Snakemake to submit jobs via SLURM
Snakemake (version 8+) has a fantastic feature of submitting jobs to SLURM (or other schedulers) on your behalf. This feature existed in previous versions too, but v8 is a complete rewrite of how it works and it's much, much simpler than before. This document will walk you through what you'll need to know to get a Snakemake workflow to automate all of the job-submission logistics on the BioHPC.
1. Make sure it’s snakemake v8
Snakemake 8 has a much easier cluster submission system, so your installation will need to be updated to major version 8, which also requires Python 3.12 to be specified manually:
mamba update -c bioconda -c conda-forge snakemake=8 python=3.12
if you need to install Snakemake
If you don’t have Snakemake installed, replace mamba update with mamba install
mamba install -c bioconda -c conda-forge snakemake=8 python=3.12
if you aren't in a conda environment
If you aren’t in any conda environment whatsoever, replace mamba update with mamba create -n SOMENAME where SOMENAME is the name you want to give your environment. Once created, activate the environment with mamba activate SOMENAME.
mamba create -n snakemake -c bioconda -c conda-forge snakemake=8 python=3.12
mamba activate snakemake
install the plugins
Then, add the SLURM executor plugin and filesystem plugin
mamba install -c bioconda -c conda-forge snakemake-executor-plugin-slurm snakemake-storage-plugin-fs
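To double-check that the update worked, ask Snakemake for its version; it should report something starting with 8:
snakemake --version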
2. Setting up a SLURM profile config
To use Snakemake's SLURM integration, you'll need a config file, in YAML format, often called a "profile". This file specifies everything Snakemake needs to know about the cluster so that it can submit jobs on your behalf as governed by the workflow. Create a folder called profiles:
mkdir profiles
Within this folder, create a file named config.yaml that looks like this:
__use_yte__: true
executor: slurm
default-resources:
  slurm_account: nt246_0001
  slurm_partition: regular
  mem_mb: attempt * 1500
  runtime: 10
jobs: 30
latency-wait: 60
retries: 1
default-storage-provider: fs
local-storage-prefix: /home2/pd348
shared-fs-usage:
  - persistence
  - software-deployment
  - sources
  - source-cache
The lines, explained:
the basic configuration
- __use_yte__: true lets Snakemake know to use an enhanced YAML interpreting engine (don't think too much about it)
- executor: slurm tells Snakemake to use the SLURM executor plugin you installed earlier, which is configured to work with SLURM schedulers
- default-resources is a catch-all group that lets you put in scheduler-specific parameters that don't appear in Snakemake's command line options
- slurm_account is the user account that will interact with the SLURM scheduler, which in most cases should be nt246_0001
- slurm_partition is the name of the BioHPC partition you are requesting, such as regular or long7. The partition list can be found here.
- mem_mb is the maximum number of megabytes of memory to request for a given job if not specified in the resources directive within a rule
- jobs is the maximum number of jobs Snakemake is allowed to submit to SLURM at a given time
- runtime is the time (in minutes) requested for jobs if not specified in the resources directive within a rule
- latency-wait is the time, in seconds, Snakemake should wait after a job has finished to check if the expected outputs are present
- retries is the number of times Snakemake can try to resubmit a failed job. This works best when the workflow is set up to change things on retries, like increasing RAM per retry (e.g. attempt * 1000), requesting more time from the scheduler, etc. (see the example rule after this list)
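For example, a rule that needs more than the profile defaults can set its own resources directive, scaling memory with the attempt number so that each retry requests more RAM. The rule below is only an illustrative sketch; the rule name, files, command, and numbers are placeholders, not part of any real workflow:
rule align_reads:
    input:
        "data/{sample}.fq.gz"
    output:
        "results/{sample}.bam"
    threads: 4
    resources:
        mem_mb=lambda wildcards, attempt: attempt * 4000,  # 4000 MB on the first try, 8000 MB on the first retry, ...
        runtime=120,                                       # minutes, overriding the 10-minute default in the profile
        slurm_partition="long7"                            # overriding the partition set in the profile
    shell:
        "my_aligner --threads {threads} {input} > {output}"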
automatic file I/O
This part can be omitted if you don't need Snakemake to move files into or out of home2 (or another directory) between job submissions.
- default-storage-provider is the type of filesystem to use, where fs is a locally mounted filesystem where things are moved (by Snakemake) using rsync. Other options can be explored here.
- local-storage-prefix is a convenient setting to have Snakemake copy inputs to a SCRATCH-type directory, run the job there, then copy the expected outputs back to the project directory upon completion of the job. For the BioHPC, this can be seen as the automated way to move stuff into and out of home2/USER without having to think about it all that much (the snippet after this list shows the one line you'd typically change).
- shared-fs-usage is a list of permissions/specifications for how Snakemake can interact with the filesystem. The options shown in the example above (persistence, software-deployment, sources, and source-cache) are necessary for Snakemake to automate input/output of the home2 (or other) directory. Unfortunately, the Snakemake documentation on these parameters is lacking, so there isn't more to say until that information becomes publicly available.
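If you're adapting this profile for yourself, these storage lines are the same as in the full example above; the only value you'd typically change is the prefix, pointing it at your own home2 directory (YOUR_NETID below is a placeholder):
default-storage-provider: fs
local-storage-prefix: /home2/YOUR_NETID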
3. Prepare the working directory
At this point, you should have a profiles/config.yaml in your working directory. Keep in mind that the folder can be named anything, not specifically profiles, but the config file has to be named config.yaml.
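Assuming the file names used in the invocation in the next step (the workflow and config file names are examples, not requirements), the working directory would look something like this:
profiles/
    config.yaml      <- the SLURM profile from step 2
config/
    config.yaml      <- the workflow's own configuration (passed with --configfile)
workflow.smk         <- the workflow itself (passed with --snakefile)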
4. Run the workflow
How you will invoke Snakemake is specific to your workflow, but a typical invocation of Snakemake could look like this:
snakemake --cores 15 --sdm conda \
--workflow-profile profiles \
--configfile config/config.yaml \
    --snakefile workflow.smk
- --cores is the number of cores/threads you're allowing the scheduler to use.
- --workflow-profile is the relative path to the directory with the SLURM configuration you made above, which in this case is just the folder name profiles.
- --snakefile is the path to the workflow snakefile.
- --sdm [optional] is a shortcut for --software-deployment-method, which in this case tells Snakemake to use the conda configurations written into your workflow to manage runtime software dependencies. It defaults to using mamba. This can be omitted if there are no conda directives inside any rules, or replaced with --sdm apptainer if the container directive is featured inside any rules.
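Before submitting anything to SLURM, it's often worth doing a dry run first; the -n (--dry-run) flag prints the jobs Snakemake would run without executing or submitting anything:
snakemake -n --workflow-profile profiles --configfile config/config.yaml --snakefile workflow.smk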