Snakemake Pipeline Setup
========================

This guide covers setting up and configuring the Snakemake workflow for
automated analysis pipelines.

Installing Pipeline Dependencies
--------------------------------

Install the optional pipeline dependencies:

**Using uv:**

.. code-block:: bash

   uv add neurodent[pipeline]

**Using pip:**

.. code-block:: bash

   pip install neurodent[pipeline]

.. note::

   The ``pipeline`` extra includes Snakemake and related dependencies needed
   for running the automated analysis workflow. If you only need the core
   NeuRodent library for Python-based analysis, the basic installation is
   sufficient.

SLURM Cluster Configuration
---------------------------

If you're running the Snakemake workflow on a SLURM cluster, the setup
depends on your Snakemake version.

Snakemake 7.x (Python 3.10)
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Use a `Snakemake SLURM profile `_ generated with cookiecutter.

**Recommended log path setting:**

To place SLURM job logs alongside Snakemake logs (making debugging easier),
update your profile's ``CookieCutter.py``:

.. code-block:: python

   # ~/.config/snakemake/your-profile/CookieCutter.py
   def get_cluster_logpath() -> str:
       return "logs/%r/slurm_%j"  # puts slurm_{jobid}.{out,err} in logs/{rule}/

Run the workflow with:

.. code-block:: bash

   uv run snakemake --profile your-profile

Snakemake 8+ (Python 3.11+)
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Snakemake 8 uses a native `SLURM executor plugin `_ instead of cookiecutter
profiles.

Install the plugin:

.. code-block:: bash

   pip install snakemake-executor-plugin-slurm

Run the workflow with the ``--executor`` flag:

.. code-block:: bash

   uv run snakemake --executor slurm --default-resources --jobs 30

The native plugin automatically:

- Deletes SLURM log files for successful jobs (reduces clutter)
- Preserves logs for failed jobs for 10 days

It also supports a ``--slurm-logdir`` option to customize the log location.
To customize the log directory, add to your profile:

.. code-block:: yaml
   # ~/.config/snakemake/your-profile/config.yaml
   executor: slurm
   slurm-logdir: "logs/slurm"

See the `plugin documentation `_ for full configuration options.

Local Configuration Overrides
-----------------------------

You can override any setting from ``config/config.yaml`` using a local
configuration file. This is useful for adjusting analysis parameters or file
paths for your specific environment without modifying the main configuration
file (which is tracked by git).

To use local overrides:

1. Create a file named ``config/config.local.yaml``.
2. Add the specific configuration keys you wish to override.

You do *not* need to copy the entire configuration file; Snakemake performs a
"deep merge", so only the keys you specify will be updated.

**Example:** If you want to change the analysis sampling rate but keep all
other settings:

.. code-block:: yaml

   # config/config.local.yaml
   analysis:
     sampling_rate: 2000

The ``config/config.local.yaml`` file is included in ``.gitignore`` and will
not be pushed to the repository.

Testing with a Subset of Animals
--------------------------------

When testing the pipeline, you may want to run only a small number of animals
instead of the full dataset. Use the ``truncate_animals`` setting under
``samples`` to limit processing to the first *N* animals in the samples file:

.. code-block:: yaml

   # config/config.local.yaml
   samples:
     truncate_animals: 2  # only process the first 2 animals

Set ``truncate_animals`` to ``null`` (the default) to process all animals.

.. tip::

   Combine this with a fast dataset such as ``mini_real`` for quick
   smoke-testing:

   .. code-block:: bash

      NEURODENT_DATASET=mini_real uv run snakemake --cores all

Running the Pipeline
--------------------

Basic Usage
^^^^^^^^^^^

.. code-block:: bash
   # Dry run to see what would be executed
   uv run snakemake --dry-run

   # Run pipeline locally (for testing)
   uv run snakemake --cores all

   # Run on SLURM cluster
   uv run snakemake --profile your-profile

Useful Commands
^^^^^^^^^^^^^^^

.. code-block:: bash

   # Generate workflow visualization
   uv run snakemake --rulegraph | dot -Tpng > workflow.png

   # Clean results (be careful!)
   uv run snakemake --delete-all-output

   # Unlock workflow (if interrupted)
   uv run snakemake --unlock

   # Force re-run specific rule
   uv run snakemake --forcerun rule_name

See also: :doc:`dataset_configuration` for selecting and configuring
different datasets.
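The "deep merge" behavior described under *Local Configuration Overrides* can
be illustrated with a minimal Python sketch. This is not the code Snakemake
runs internally — the actual merge happens during Snakemake's config loading —
and the ``window`` key below is hypothetical, included only to show that
sibling keys survive an override:

.. code-block:: python

   # Minimal sketch of a recursive "deep merge": keys present in the local
   # override replace only the matching leaves of the base config, while
   # sibling keys in the base config are preserved.
   def deep_merge(base: dict, override: dict) -> dict:
       merged = dict(base)
       for key, value in override.items():
           if isinstance(value, dict) and isinstance(merged.get(key), dict):
               merged[key] = deep_merge(merged[key], value)  # recurse into nested dicts
           else:
               merged[key] = value  # leaf values are replaced outright
       return merged

   # Base config (as in config/config.yaml); "window" is a made-up example key.
   base = {
       "analysis": {"sampling_rate": 1000, "window": 4},
       "samples": {"truncate_animals": None},
   }
   # Local override (as in config/config.local.yaml).
   local = {"analysis": {"sampling_rate": 2000}}

   config = deep_merge(base, local)
   print(config["analysis"])  # {'sampling_rate': 2000, 'window': 4}

Note that only ``sampling_rate`` changed: ``window`` and the entire
``samples`` section carry over from the base config, which is why a
``config.local.yaml`` only needs the keys you actually want to override.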