Revision 428015d2
Added by Chloe Quignot over 1 year ago
| README.md | ||
|---|---|---|
# Examples and solutions of the Snakemake BIOI2 training session

Instructions are on the BIOI2 website: https://bioi2.i2bc.paris-saclay.fr/training/snakemake/

To download this repository, open a terminal and type:
```bash
git clone https://forge.i2bc.paris-saclay.fr/git/bioi2_formations/snakemake_examples.git
```

## Organisation of this repository

```text
├── README.md
├── exercise0
├── exercise0_improved_after_1A
├── exercise0_improved_after_1B
├── exercise0_improved_after_1C
└── demo_advanced
```

### Exercise 0

The example Snakemake pipeline to execute in [Exercise 0](https://bioi2.i2bc.paris-saclay.fr/training/snakemake/exercises/exercise-0-objective) is in the `exercise0` folder.

The **exercise0_improved_after_1X** folders are examples of improvements to the initial Snakefile after applying what you've learnt in Exercises 1A, 1B and 1C. We advise you to have a look at them once you've finished the aforementioned exercises.

### Exercise 2

The `demo_advanced` folder is an example solution for Exercise 2 comprising several different kinds of syntax that you could encounter in Snakefiles.


## Executing the Snakefiles

If you're in the folder that contains the Snakefile (and if the Snakefile is named `Snakefile`), you can just type:

```bash
snakemake --cores 1
```

If you'd like to specify the Snakefile in the command line (because it's not in your current directory or because it's named differently):
```bash
snakemake --cores 1 -s /path/to/snakefile.smk
```
| exercise0_improved_after_1C/Snakefile | ||
|---|---|---|
import yaml

with open('samples.yaml', 'r') as file:
    content = yaml.safe_load(file)
    samples = content['samples']


rule targets:
    input:
        expand("fasta/{sample}.fasta", sample=samples),
        "fusionFasta/allSequences.fasta",
        "mafft/mafft_res.fasta",


# Update 1: add the threads directive to all rules specifying
# the maximum number of threads/CPUs/processors to
# use per rule
# Update 2: add the resources directive to all rules specifying
# the maximum amount of memory, walltime etc. to
# use per rule
rule loadData:
    output:
        "fasta/{sample}.fasta",
    params:
        dirFasta = "fasta",
    log:
        stdout = "logs/{sample}_wget.stdout",
        stderr = "logs/{sample}_wget.stderr",
    threads: 1
    resources:
        mem="1gb",
        time_min="00:05:00",
    shell:
        """
        wget --output-file {log.stderr} \
            --directory-prefix {params.dirFasta} \
            https://www.uniprot.org/uniprot/{wildcards.sample}.fasta > {log.stdout}
        """


rule fusionFasta:
    input:
        expand("fasta/{sample}.fasta", sample=samples),
    output:
        "fusionFasta/allSequences.fasta",
    log:
        "logs/fusionData.stderr",
    threads: 1
    resources:
        mem="1gb",
        time_min="00:05:00",
    shell:
        """
        cat {input} > {output} 2> {log}
        """


# Update 3: add the envmodules directive to rules that use
# non-standard tools such as mafft so that Snakemake
# automatically "activates" the tool on the cluster
# NB: use "module avail" to see the right syntax
rule mafft:
    input:
        "fusionFasta/allSequences.fasta",
    output:
        "mafft/mafft_res.fasta",
    log:
        "logs/whichMafft.txt",
    threads: 1
    resources:
        mem="1gb",
        time_min="00:05:00",
    envmodules:
        "nodes/mafft-7.475"
    shell:
        """
        mafft {input} > {output} 2> {log}
        """


# Update 4: add a profile configuration file: profile/config.yaml
# to specify options instead of specifying them in the
# command line at execution
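The `loadData` rule has no `input`: the value of `{sample}` comes from matching a requested file against the output pattern `fasta/{sample}.fasta`. The following is only a rough plain-Python sketch of that idea, not Snakemake's actual implementation. The helper name `infer_wildcards` is hypothetical, and the sketch assumes each wildcard name appears once in the pattern.

```python
import re


def infer_wildcards(pattern, target):
    """Return the wildcard values that make `pattern` equal `target`, or None.

    Hypothetical helper for illustration only; it assumes every wildcard
    name occurs a single time in the pattern.
    """
    # re.split with a capturing group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)       # literal text, matched as-is
        else:
            regex += f"(?P<{part}>.+)"     # wildcard, captured under its name
    match = re.fullmatch(regex, target)
    return match.groupdict() if match else None


# Requesting fasta/P01325.fasta fixes sample=P01325, which the shell
# command then reuses to build the download URL.
wildcards = infer_wildcards("fasta/{sample}.fasta", "fasta/P01325.fasta")
url = "https://www.uniprot.org/uniprot/{sample}.fasta".format(**wildcards)
print(wildcards)
print(url)
```

A file that does not fit the pattern, such as `mafft/mafft_res.fasta`, yields `None` here, which mirrors the fact that `loadData` cannot produce it.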
| exercise0_improved_after_1C/profile/config.yaml | ||
|---|---|---|
# cluster-specific options (for PBSpro environment):
jobs: 6
executor: cluster-generic
cluster-generic-submit-cmd: "qsub -l ncpus={threads} -l mem={resources.mem} -l walltime={resources.time_min}"
cluster-generic-cancel-cmd: "qdel"
# set default resources for each job to 1 cpu and 1Gb if not specified otherwise:
default-resources: [threads=1, mem="1Gb", time_min="02:00:00"]
# software option:
software-deployment-method: env-modules
# to avoid typing -p every time:
printshellcmds: True
# do not stop the whole run when one job fails; keep running independent jobs:
keep-going: True
# retry a failed job up to 3 times
restart-times: 3
# in case there is latency when jobs are run on the cluster, wait a while
latency-wait: 180
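The submit command in this profile contains placeholders (`{threads}`, `{resources.mem}`, `{resources.time_min}`) that are filled in per job from the rule's `threads` and `resources` directives. As an illustration only, the substitution can be imitated with Python's `str.format`. The function `render_submit_cmd` is a hypothetical name, and this is a sketch of the idea rather than the executor plugin's real code.

```python
from types import SimpleNamespace

# Template copied from cluster-generic-submit-cmd in profile/config.yaml.
SUBMIT_TEMPLATE = (
    "qsub -l ncpus={threads} -l mem={resources.mem} "
    "-l walltime={resources.time_min}"
)


def render_submit_cmd(threads, mem, time_min):
    """Fill the template with one job's values (hypothetical helper)."""
    # str.format supports attribute access, so {resources.mem} reads .mem
    # from the object passed in as `resources`.
    resources = SimpleNamespace(mem=mem, time_min=time_min)
    return SUBMIT_TEMPLATE.format(threads=threads, resources=resources)


# Values taken from the rules in the Snakefile (threads: 1, mem="1gb", 5 min).
cmd = render_submit_cmd(threads=1, mem="1gb", time_min="00:05:00")
print(cmd)
```

A rule that does not set these values would fall back on the `default-resources` entry of the profile.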
| exercise0_improved_after_1C/readme_runSnake.txt | ||
|---|---|---|
To run the pipeline, log in to a cluster node, then:

- load the snakemake environment:
    module load snakemake/snakemake-8.4.6
    module load nodes/mafft-7.475

- run the program: move into this folder and type:
    snakemake --cores 1
| exercise0_improved_after_1C/samples.yaml | ||
|---|---|---|
samples: ["P01325", "P01308"]
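The Snakefile reads this list and passes it to `expand("fasta/{sample}.fasta", sample=samples)` to build one target path per sample. A minimal plain-Python imitation of that step is sketched below. `expand_pattern` is a hypothetical stand-in, not Snakemake's `expand`, and the sample list is written inline instead of being loaded from `samples.yaml`.

```python
from itertools import product

# Same values as the "samples" key of samples.yaml, written inline here.
samples = ["P01325", "P01308"]


def expand_pattern(pattern, **wildcards):
    """Fill `pattern` with every combination of the given wildcard values.

    Hypothetical stand-in used only to illustrate what expand() produces.
    """
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


targets = expand_pattern("fasta/{sample}.fasta", sample=samples)
print(targets)
```

Adding an accession to the list in `samples.yaml` therefore adds one more `fasta/<accession>.fasta` target without touching the Snakefile.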
add README and exercise 0 improved after 1C