
Apptainer

Apptainer is a tool for creating and running containers.

We will install Apptainer using Pixi. Before proceeding, ensure that you have followed the Pixi installation guide. Once Pixi is installed in your environment, you can install Apptainer globally by running:

Terminal window
pixi global install apptainer
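Before moving on, you may want to confirm that your shell can find the apptainer binary that Pixi installed. The sketch below defines a hypothetical helper, check_apptainer, that prints the installed version or, if the command is missing, suggests checking that Pixi's global bin directory (assumed here to be Pixi's default, $HOME/.pixi/bin) is on your PATH.

```shell
# Hypothetical helper: report the Apptainer version, or explain why it is missing.
# Assumes Pixi's default global bin directory ($HOME/.pixi/bin).
check_apptainer() {
  if command -v apptainer >/dev/null 2>&1; then
    # Prints a line such as "apptainer version 1.x.y".
    apptainer --version
  else
    echo "apptainer not found; check that $HOME/.pixi/bin is on your PATH" >&2
    return 1
  fi
}
```

Run check_apptainer in a new terminal after the install; a non-zero exit status means the shell cannot see the binary yet.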

InterProScan is a widely used tool for annotating protein sequences. It can classify sequences into families and predict the presence of domains, repeats, and functional sites. By integrating several analysis tools, InterProScan compares input sequences against reference entries from the InterPro consortium’s member databases, providing comprehensive functional annotations in a single run.

That said, installing InterProScan can be challenging due to its many dependencies, and recent versions are no longer available through Bioconda. A practical solution is to run it inside a container, which makes the installation process much easier and keeps the environment self‑contained.

This guide shows how to run InterProScan 5.75-106.0 locally with Apptainer and use it to annotate the proteome of Promethearchaeum syntrophicum.

  1. Create a directory within your $HOME to store SIF images:

    Terminal window
    mkdir -p $HOME/images
  2. Pull the InterProScan image from Docker Hub. Apptainer converts it into a single SIF file in the images directory:

    Terminal window
    apptainer pull "$HOME/images/interproscan.sif" "docker://interpro/interproscan:5.75-106.0"
  3. Download and extract the InterProScan data. Extraction creates the interproscan-5.75-106.0 directory that step 6 passes to --data-dir:

    Terminal window
    wcurl "http://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.75-106.0/alt/interproscan-data-5.75-106.0.tar.gz"
    tar zxfv interproscan-data-5.75-106.0.tar.gz
  4. To simplify running InterProScan, we'll define a wrapper function named interproscan, so you can run the tool with a short command instead of typing the full apptainer invocation each time.

    To set it up, edit your .bashrc file, which is located in your home directory, as shown below:

    • $HOME/
      • .bashrc

    Next, add the function below to the end of the file:

    .bashrc
    interproscan() {
      local output_dir=""
      local data_dir=""
      local i=1
      # If -help or --help is given, run interproscan.sh --help and exit
      if [[ "$1" == "-help" || "$1" == "--help" ]]; then
        apptainer --silent exec \
          "$HOME/images/interproscan.sif" \
          /opt/interproscan/interproscan.sh --help
        return 0
      fi
      # Show usage if no arguments are given
      if [[ $# -eq 0 ]]; then
        echo "Usage:"
        echo "  interproscan --data-dir <path> [interproscan.sh arguments]"
        echo
        echo "Example:"
        echo "  interproscan --data-dir /path/to/data --output-dir /path/to/output \\"
        echo "    --applications Pfam,NCBIfam --disable-precalc \\"
        echo "    --cpu 16 --input /path/to/input.faa"
        echo
        echo "Documentation for interproscan.sh:"
        echo "  https://interproscan-docs.readthedocs.io/en/v5/HowToRun.html"
        return 1
      fi
      # Parse arguments to find --output-dir and --data-dir
      while [[ $i -le $# ]]; do
        if [[ "${!i}" == "--output-dir" ]] && [[ $((i+1)) -le $# ]]; then
          ((i++))
          output_dir="${!i}"
        elif [[ "${!i}" == "--data-dir" ]] && [[ $((i+1)) -le $# ]]; then
          ((i++))
          data_dir="${!i}"
        fi
        ((i++))
      done
      if [[ -z "$data_dir" ]]; then
        echo "Error: --data-dir is required" >&2
        return 1
      fi
      # Refuse to write into an existing, non-empty output directory
      if [[ -d "$output_dir" ]] && [[ -n "$(ls -A "$output_dir" 2>/dev/null)" ]]; then
        echo "Error: Output directory '$output_dir' is not empty" >&2
        return 1
      fi
      # Verify that the data/ folder mounted below actually exists
      if [[ ! -d "$data_dir/data" ]]; then
        echo "Error: '$data_dir/data' does not exist; --data-dir must point to the extracted interproscan-<version> directory" >&2
        return 1
      fi
      # Create the output directory, but only if --output-dir was given
      if [[ -n "$output_dir" ]]; then
        mkdir -p "$output_dir" || return 1
      fi
      # Filter out --data-dir and its value: the wrapper uses it for the
      # bind mount below, so it is not passed to interproscan.sh
      local args=()
      i=1
      while [[ $i -le $# ]]; do
        if [[ "${!i}" == "--data-dir" ]]; then
          ((i++)) # Skip the flag
          ((i++)) # Skip the value
        else
          args+=("${!i}")
          ((i++))
        fi
      done
      # Run interproscan.sh with the data directory mounted at /opt/interproscan/data
      apptainer --silent exec \
        -B "$data_dir/data:/opt/interproscan/data" \
        "$HOME/images/interproscan.sif" \
        /opt/interproscan/interproscan.sh \
        "${args[@]}"
    }

    Then, source .bashrc to load the new function into your active shell:

    Terminal window
    source $HOME/.bashrc
  5. Download the proteome of Promethearchaeum syntrophicum from UniProt:

    Terminal window
    wcurl --output UP000321408.faa "https://rest.uniprot.org/uniprotkb/stream?format=fasta&query=%28%28proteome%3AUP000321408%29%29"
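  6. Run InterProScan to annotate the proteins against the Pfam, NCBIfam, CDD, and HAMAP databases. The --iprlookup option maps matches to InterPro entries, while --goterms and --pathways add Gene Ontology terms and pathway annotations: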

    Terminal window
    interproscan \
      --applications Pfam,NCBIfam,CDD,HAMAP --iprlookup --goterms --pathways \
      --data-dir interproscan-5.75-106.0 --output-dir UP000321408_interproscan \
      --cpu 16 --input UP000321408.faa

    InterProScan writes the results in four formats (GFF3, JSON, TSV, and XML), each to a separate file:

    • UP000321408_interproscan/
      • UP000321408.faa.gff3
      • UP000321408.faa.json
      • UP000321408.faa.tsv
      • UP000321408.faa.xml
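Re-running step 2 fails once the SIF file exists, because apptainer pull does not overwrite an existing image unless it is given --force. If you script the setup, a guard like the hypothetical pull_if_missing function below (a sketch, not part of Apptainer) keeps the step idempotent by skipping the download when the target file is already there.

```shell
# Hypothetical helper: pull a container image into a SIF file only if it is missing.
# Usage: pull_if_missing <sif-path> <image-uri>
pull_if_missing() {
  local sif="$1" uri="$2"
  if [ -f "$sif" ]; then
    echo "Skipping pull: $sif already exists"
    return 0
  fi
  # Make sure the target directory exists before pulling.
  mkdir -p "$(dirname "$sif")" || return 1
  apptainer pull "$sif" "$uri"
}
```

With the image and path from step 2, the call would be pull_if_missing "$HOME/images/interproscan.sif" "docker://interpro/interproscan:5.75-106.0".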
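The wrapper from step 4 mounts <data-dir>/data into the container, so the directory given to --data-dir must be the extracted interproscan-5.75-106.0 folder, not the data folder inside it. A quick sanity check after step 3 might look like the sketch below; check_ipr_data is a hypothetical name, and the expected layout (a non-empty data subdirectory) is inferred from the wrapper's bind mount.

```shell
# Hypothetical helper: confirm that a directory looks like an extracted
# InterProScan data release, i.e. it contains a non-empty "data" subdirectory.
check_ipr_data() {
  local dir="$1"
  if [ ! -d "$dir/data" ]; then
    echo "No data/ subdirectory in '$dir'" >&2
    return 1
  fi
  if [ -z "$(ls -A "$dir/data")" ]; then
    echo "'$dir/data' is empty" >&2
    return 1
  fi
  echo "OK: $dir/data"
}
```

Run it from the directory where the archive was extracted, for example check_ipr_data interproscan-5.75-106.0.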
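The second loop in the step 4 wrapper removes --data-dir and its value before the remaining arguments reach interproscan.sh. The same filtering can be written as a small standalone function and checked on its own; filter_data_dir below is a hypothetical, POSIX-style sketch of that loop that prints the forwarded arguments one per line.

```shell
# Hypothetical sketch of the wrapper's filtering step: print every argument
# except "--data-dir" and the value that follows it, one per line.
filter_data_dir() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --data-dir)
        # Drop the flag and, if present, its value.
        if [ "$#" -ge 2 ]; then shift 2; else shift; fi
        ;;
      *)
        printf '%s\n' "$1"
        shift
        ;;
    esac
  done
}
```

For example, filter_data_dir --data-dir interproscan-5.75-106.0 --cpu 16 --input UP000321408.faa prints --cpu, 16, --input, and UP000321408.faa on separate lines. Printing one argument per line is only for inspection; the wrapper itself collects them in a bash array so that values containing spaces are passed through intact.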
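Of the four output files, the TSV is the easiest to inspect from the shell. In the InterProScan 5 TSV layout, column 1 holds the protein accession and column 4 the analysis (member database) that produced the match. The sketch below, using the hypothetical function name summarize_ips_tsv, counts matches per analysis and the number of distinct proteins with at least one match; it assumes that tab-separated layout with no header line.

```shell
# Hypothetical helper: summarize an InterProScan TSV file.
# Assumes the InterProScan 5 TSV layout with no header line:
# column 1 = protein accession, column 4 = analysis (e.g. Pfam, NCBIfam).
summarize_ips_tsv() {
  local tsv="$1"
  # Number of matches reported by each analysis, sorted by analysis name.
  awk -F'\t' '{ n[$4]++ } END { for (a in n) print a "\t" n[a] }' "$tsv" | LC_ALL=C sort
  # Number of distinct proteins with at least one match.
  awk -F'\t' '!seen[$1]++ { c++ } END { print "proteins_with_matches\t" (c + 0) }' "$tsv"
}
```

For the run above, the call would be summarize_ips_tsv UP000321408_interproscan/UP000321408.faa.tsv.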
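Apptainer can run other bioinformatics tools in the same way. For example, the commands below pull a samtools image from the BioContainers organization on the Quay.io registry into the same images directory and run samtools --help inside the container:

Terminal window
mkdir -p $HOME/images
apptainer pull "$HOME/images/samtools.sif" "docker://quay.io/biocontainers/samtools:1.22.1--h96c455f_0"
apptainer --silent exec "$HOME/images/samtools.sif" samtools --help

As with InterProScan, a short wrapper function (for example, in your .bashrc) lets you call the containerized tool as if it were installed locally, passing all arguments through with "$@":

Terminal window
samtools() {
  apptainer --silent exec "$HOME/images/samtools.sif" samtools "$@"
}

Apptainer keeps a cache of the layers it downloads when pulling images. Once the SIF files are built, you can free that disk space with:

Terminal window
apptainer cache clean --force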
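The samtools function above follows the same pattern as the interproscan wrapper: it forwards all arguments to the program inside the image. If you containerize several tools, a generic helper can avoid repeating that line and give a clearer error when a SIF file has not been pulled yet. In the sketch below, container_run and the IMAGES_DIR variable are hypothetical names; the default of $HOME/images matches the directory used throughout this guide.

```shell
# Hypothetical generic helper: run a command from $IMAGES_DIR/<image>.sif.
# Usage: container_run <image> <command> [args...]
container_run() {
  if [ "$#" -lt 2 ]; then
    echo "Usage: container_run <image> <command> [args...]" >&2
    return 2
  fi
  local images_dir="${IMAGES_DIR:-$HOME/images}"
  local sif="$images_dir/$1.sif"
  if [ ! -f "$sif" ]; then
    echo "Image not found: $sif (pull it first with apptainer pull)" >&2
    return 1
  fi
  shift
  apptainer --silent exec "$sif" "$@"
}

# Each tool then needs only a one-line wrapper, for example:
samtools() { container_run samtools samtools "$@"; }
```

Adding another tool then means pulling its image into the images directory and defining one more single-line function.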