Conda

Conda is an open-source package manager and environment manager that simplifies installation of software packages and creation of isolated software environments. While it was originally designed for Python, it can manage packages for many programming languages.

Using Conda allows users to maintain separate environments with different software versions, which is particularly useful when working on multiple projects or when software dependencies conflict.


Conda Distributions

The Conda package manager is available through two main distributions.

Miniconda

  • Minimal Conda installation
  • Installs only the core Conda system
  • Users install additional packages manually

Anaconda

  • Full scientific Python distribution
  • Includes hundreds of commonly used packages
  • Larger installation footprint

For HPC environments, Miniconda is generally recommended because it keeps installations small and lets users install only the packages they require.


Installing Miniconda

The Miniconda installer can be downloaded from the official repository.

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3

After installation, initialize Conda:

~/miniconda3/bin/conda init bash

Restart your terminal or reconnect to the cluster to apply the changes.
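
After reconnecting, it can be worth confirming that the installation is usable. The following is a minimal sketch that assumes the ~/miniconda3 prefix from the installer command above; the variable names CONDA_BIN and conda_status are only illustrative.

```shell
# Assumes the ~/miniconda3 prefix used by the installer command above.
CONDA_BIN="${HOME}/miniconda3/bin/conda"

if [ -x "${CONDA_BIN}" ]; then
    "${CONDA_BIN}" --version      # installed Conda version
    "${CONDA_BIN}" info --base    # path of the base environment
    conda_status="found"
else
    echo "No Conda executable at ${CONDA_BIN}" >&2
    conda_status="missing"
fi
echo "Conda installation: ${conda_status}"
```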


Conda Channels

Conda packages are distributed through channels, which are package repositories.

By default, Conda uses the defaults channel, but many scientific packages are available through conda-forge, a large community-maintained repository.

Example installation from conda-forge:

conda install <package> --channel conda-forge

To enable the channel permanently (the setting is stored in ~/.condarc):

conda config --add channels conda-forge
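
Note that --add places the channel at the top of the list (highest priority), whereas --append would place it at the bottom. As a sketch, the resulting configuration can be inspected as follows; the guard and the variable channels_checked are only there so the snippet also runs in a shell where Conda is not available.

```shell
if command -v conda >/dev/null 2>&1; then
    conda config --show channels    # configured channels, highest priority first
    conda config --show-sources     # configuration files the settings come from
    channels_checked="yes"
else
    echo "conda is not available in this shell" >&2
    channels_checked="no"
fi
```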

Creating and Managing Environments

Conda environments allow isolated software installations.

Create a new environment:

conda create --name myenv

Activate or deactivate the environment:

conda activate myenv
conda deactivate

List available environments:

conda info --envs

Remove an environment:

conda remove --name myenv --all

Verify removal:

conda info --envs
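
The create command shown earlier produces an empty environment. Packages and a Python version can also be requested at creation time. The sketch below wraps this in a helper; the name create_project_env and the versions in the usage comment are hypothetical choices for illustration, not a site recommendation.

```shell
# Hypothetical helper: create a named environment with the given package specs.
create_project_env() {
    env_name="$1"
    shift
    # --yes skips the interactive confirmation prompt
    conda create --yes --name "${env_name}" "$@"
}

# Usage (versions are illustrative):
#   create_project_env myenv python=3.11 numpy
#   conda activate myenv
```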

Creating Environments from YAML Files

Conda environments can be defined in a YAML specification file. This is useful for sharing environments or reproducing software setups.

Create an environment from a YAML file:

conda env create -f my-env.yml

A typical YAML file contains the environment name, dependencies, and channels used for installation.
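
As an illustration, the snippet below writes a small specification file of this kind. The environment name, channel, and package versions are assumptions chosen for the example.

```shell
# Write a minimal environment specification (contents are illustrative).
cat > my-env.yml <<'EOF'
name: myenv
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy
  - matplotlib
EOF

# Show the file; it can then be passed to: conda env create -f my-env.yml
cat my-env.yml
```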


Sharing Conda Environments

To export the currently active environment to a YAML file:

conda env export > my-env.yml

The generated file lists every package installed in the environment together with its exact version.
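
A full export also records platform-specific build strings and a prefix: line containing the exporting user's local path, which can make the file harder to reuse on another system. The following is a sketch of a more portable export; the helper name export_portable_env is hypothetical.

```shell
# Hypothetical helper: export an environment without build strings
# and without the machine-specific "prefix:" line.
export_portable_env() {
    conda env export --name "$1" --no-builds | grep -v '^prefix:'
}

# Usage:
#   export_portable_env myenv > my-env.yml
# Alternative that lists only the packages that were explicitly requested:
#   conda env export --name myenv --from-history > my-env.yml
```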

Another user can recreate the environment with:

conda env create -f my-env.yml

If the environment requires additional channels, they can be added with:

conda config --add channels <channel-name>

Installing Packages

Packages can be installed into an environment either by activating the environment or by specifying it explicitly.

Base environment

Avoid installing many packages into the default base environment, because dependency conflicts tend to accumulate there over time.

Recommended workflow:

  • Keep the base environment minimal
  • Create separate environments for each project
  • Export environments using YAML files for reproducibility
  • Periodically clean package caches with conda clean --all

Using isolated environments helps prevent dependency conflicts and makes it easier to reproduce software setups across different systems.

Activate the environment, then install the package:

conda activate myenv
conda install matplotlib

Install into a specific environment:

conda install --name myenv matplotlib

List installed packages:

conda list

List packages in a specific environment:

conda list -n myenv

Check whether a specific package is installed in an environment:

conda list -n myenv <package>

Cleaning Conda Package Cache

Conda stores downloaded packages locally. Over time this cache may consume a significant amount of disk space in your home directory.

To remove unused packages and caches:

conda clean --all

This command removes unused package files, caches, and temporary data.
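
Before cleaning, it can help to see how much space the cache occupies and what would be removed. This is a minimal sketch that assumes the cache lives in ~/miniconda3/pkgs (the default for the installation described on this page); the helper name dir_size_kb is hypothetical.

```shell
# Print the size of a directory in kilobytes (0 if it does not exist).
dir_size_kb() {
    if [ -d "$1" ]; then
        du -sk "$1" | cut -f1
    else
        echo 0
    fi
}

# Assumed cache location for the ~/miniconda3 installation used on this page.
PKGS_DIR="${HOME}/miniconda3/pkgs"
echo "Package cache: $(dir_size_kb "${PKGS_DIR}") kB"

# Preview what would be removed, without deleting anything.
if command -v conda >/dev/null 2>&1; then
    conda clean --all --dry-run
fi
```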


Using Conda in Submission Scripts

Since all computationally heavy operations must be performed on compute nodes, Conda environments can also be used in jobs submitted through the SLURM scheduler:

Conda submission script

  #!/bin/bash
  #SBATCH -J "sample_job"   # name of job in SLURM
  #SBATCH --account=<project>   # project number
  #SBATCH --partition=short     # select partition short, medium, long, ngpu
  #SBATCH --nodes=      # number of nodes
  #SBATCH --ntasks=     # number of mpi ranks, needs to be tested for the best performance
  #SBATCH --cpus-per-task=  # number of cpus per mpi rank, needs to be tested for the best performance
  #SBATCH --time=hh:mm:ss   # time limit for a job
  #SBATCH -o stdout.%J.out  # standard output
  #SBATCH -e stderr.%J.out  # error output

  echo "Launched at $(date)"
  echo "Job ID: ${SLURM_JOBID}"
  echo "Node list: ${SLURM_NODELIST}"
  echo "Submit dir.: ${SLURM_SUBMIT_DIR}"
  echo "CPUs per task: ${SLURM_CPUS_PER_TASK}"

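  # Load Conda's shell functions; a batch shell may not read ~/.bashrc.
  # The path assumes the ~/miniconda3 installation described above.
  source ~/miniconda3/etc/profile.d/conda.sh
  conda activate myenv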

  # Continue with your code ...

  # Example
  export SRUN_CPUS_PER_TASK="${SLURM_CPUS_PER_TASK}"
  export OMP_NUM_THREADS=1

  srun ...
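
As a sketch, a job script can confirm that the intended environment is really active before launching the workload, so that a failed activation stops the job early instead of running with the wrong software. The helper name check_conda_env is hypothetical; CONDA_DEFAULT_ENV and CONDA_PREFIX are variables set by conda activate.

```shell
# Hypothetical helper: succeed only if the named Conda environment is active.
check_conda_env() {
    expected_env="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" != "${expected_env}" ]; then
        echo "ERROR: expected Conda environment '${expected_env}', got '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
    echo "Conda environment: ${CONDA_DEFAULT_ENV} (${CONDA_PREFIX:-unknown prefix})"
}

# Usage inside a job script, right after "conda activate myenv":
#   check_conda_env myenv || exit 1
```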
Created by: Andrej Sec