Revision 428015d2

Added by Chloe Quignot 7 months ago

add README and exercise 0 improved after 1C

README.md
# Examples and solutions of the Snakemake BIOI2 training session
Instructions are on the BIOI2 website: https://bioi2.i2bc.paris-saclay.fr/training/snakemake/
To download this repository, open a terminal and type:
```bash
git clone https://forge.i2bc.paris-saclay.fr/git/bioi2_formations/snakemake_examples.git
```
## Organisation of this repository
```text
├── README.md
├── exercise0
├── exercise0_improved_after_1A
├── exercise0_improved_after_1B
├── exercise0_improved_after_1C
└── demo_advanced
```
### Exercise 0
The example Snakemake pipeline to execute in [Exercise 0](https://bioi2.i2bc.paris-saclay.fr/training/snakemake/exercises/exercise-0-objective) is in the `exercise0` folder.
The **exercise0_improved_after_1X** folders are examples of improvements to the initial Snakefile after applying what you've learnt in Exercises 1A, 1B and 1C. We advise you to have a look at them once you've finished the aforementioned exercises.
### Exercise 2
The `demo_advanced` folder is an example solution for Exercise 2, illustrating several different syntax constructs that you could encounter in Snakefiles.
## Executing the Snakefiles
If you're in the folder that contains the Snakefile (and if the Snakefile is named Snakefile), you can just type:
```bash
snakemake --cores 1
```
If you'd like to specify the Snakefile on the command line (because it's not in your current directory or because it's named differently):
```bash
snakemake --cores 1 -s /path/to/snakefile.smk
```
exercise0_improved_after_1C/Snakefile
import yaml

# Load the list of UniProt accessions from samples.yaml
with open('samples.yaml', 'r') as file:
    content = yaml.safe_load(file)
samples = content['samples']

rule targets:
    input:
        expand("fasta/{sample}.fasta", sample=samples),
        "fusionFasta/allSequences.fasta",
        "mafft/mafft_res.fasta",

# Update 1: add the threads directive to all rules specifying
#           the maximum number of threads/CPUs/processors to
#           use per rule
# Update 2: add the resources directive to all rules specifying
#           the maximum amount of memory, walltime etc. to
#           use per rule
rule loadData:
    output:
        "fasta/{sample}.fasta",
    params:
        dirFasta = "fasta",
    log:
        stdout = "logs/{sample}_wget.stdout",
        stderr = "logs/{sample}_wget.stderr",
    threads: 1
    resources:
        mem="1gb",
        time_min="00:05:00",
    shell:
        """
        wget --output-file {log.stderr} \
             --directory-prefix {params.dirFasta} \
             https://www.uniprot.org/uniprot/{wildcards.sample}.fasta > {log.stdout}
        """

rule fusionFasta:
    input:
        expand("fasta/{sample}.fasta", sample=samples),
    output:
        "fusionFasta/allSequences.fasta",
    log:
        "logs/fusionData.stderr",
    threads: 1
    resources:
        mem="1gb",
        time_min="00:05:00",
    shell:
        """
        cat {input} > {output} 2> {log}
        """

# Update 3: add the envmodules directive to rules that use
#           non-standard tools such as mafft so that Snakemake
#           automatically "activates" the tool on the cluster
#           NB: use "module avail" to see the right syntax
rule mafft:
    input:
        "fusionFasta/allSequences.fasta",
    output:
        "mafft/mafft_res.fasta",
    log:
        "logs/whichMafft.txt",
    threads: 1
    resources:
        mem="1gb",
        time_min="00:05:00",
    envmodules:
        "nodes/mafft-7.475"
    shell:
        """
        mafft {input} > {output} 2> {log}
        """

# Update 4: add a profile configuration file (profile/config.yaml)
#           to specify options there instead of on the command
#           line at execution
exercise0_improved_after_1C/profile/config.yaml
# cluster-specific options (for PBSpro environment):
jobs: 6
executor: cluster-generic
cluster-generic-submit-cmd: "qsub -l ncpus={threads} -l mem={resources.mem} -l walltime={resources.time_min}"
cluster-generic-cancel-cmd: "qdel"
# set default resources for each job (1 CPU, 1 Gb, 2 h) if not specified otherwise:
default-resources: [threads=1, mem="1Gb", time_min="02:00:00"]
# software option:
software-deployment-method: env-modules
# to avoid typing -p every time:
printshellcmds: true
# do not stop the whole workflow when a single job fails:
keep-going: true
# retry failed jobs up to 3 times:
restart-times: 3
# wait up to 180 s for output files in case of filesystem latency on the cluster:
latency-wait: 180
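For each submitted job, Snakemake fills the `{threads}` and `{resources.*}` placeholders in `cluster-generic-submit-cmd` with that job's values. A hypothetical illustration of the substitution for one `loadData` job (values taken from the rule's `threads` and `resources` directives):

```shell
# Illustration only: emulate the placeholder substitution Snakemake
# performs on the submit command for a job with threads=1,
# mem=1gb and time_min=00:05:00.
threads=1
mem="1gb"
time_min="00:05:00"
echo "qsub -l ncpus=${threads} -l mem=${mem} -l walltime=${time_min}"
# → qsub -l ncpus=1 -l mem=1gb -l walltime=00:05:00
```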
exercise0_improved_after_1C/readme_runSnake.txt
To run the pipeline, connect to a cluster node, then:
- load the Snakemake environment:
    module load snakemake/snakemake-8.4.6
    module load nodes/mafft-7.475
- run the program: move into this folder and type:
    snakemake --cores 1
exercise0_improved_after_1C/samples.yaml
samples: ["P01325", "P01308"]
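The Snakefile reads this file with `yaml.safe_load`. A quick sketch of what that yields for the content above (assuming PyYAML is installed):

```python
import yaml

# Parse the same YAML content the Snakefile reads from samples.yaml.
content = yaml.safe_load('samples: ["P01325", "P01308"]')
samples = content["samples"]
print(samples)
# → ['P01325', 'P01308']
```

Adding an accession to the `samples` list is therefore all that is needed to make the pipeline fetch and align another sequence.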