Introduction

About taxopy

taxopy is a Python package that provides an interface for accessing NCBI-formatted taxonomic databases. It supports operations on taxonomic data such as obtaining complete lineages, determining lowest common ancestors (LCAs), and retrieving taxon names from taxonomic identifiers.
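To make the lineage and LCA operations concrete, here is a minimal pure-Python sketch of the underlying idea. It is not taxopy's API or implementation. The `PARENT` and `NAME` tables are a hand-written, heavily abridged toy subset standing in for the child-to-parent and identifier-to-name relations of an NCBI-formatted database, and the helper names `lineage` and `lowest_common_ancestor` are hypothetical.

```python
# Toy child -> parent table. Abridged for illustration: the ranks between
# Hominidae and the root are omitted, so this is NOT a faithful taxonomy.
PARENT = {
    1: 1,          # root (its own parent)
    9604: 1,       # Hominidae
    207598: 9604,  # Homininae
    9605: 207598,  # Homo
    9606: 9605,    # Homo sapiens
    9596: 207598,  # Pan
    9598: 9596,    # Pan troglodytes
}

# Toy identifier -> scientific name table.
NAME = {
    1: "root",
    9604: "Hominidae",
    207598: "Homininae",
    9605: "Homo",
    9606: "Homo sapiens",
    9596: "Pan",
    9598: "Pan troglodytes",
}


def lineage(taxid):
    """Return the taxids from `taxid` up to the root, inclusive."""
    path = [taxid]
    while PARENT[path[-1]] != path[-1]:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_ancestor(taxids):
    """Return the deepest taxid shared by the lineages of all `taxids`."""
    lineages = [lineage(t) for t in taxids]
    shared = set(lineages[0]).intersection(*lineages[1:])
    # Walking one lineage from leaf to root, the first shared node is the deepest.
    return next(t for t in lineages[0] if t in shared)


print([NAME[t] for t in lineage(9606)])
print(NAME[lowest_common_ancestor([9606, 9598])])
```

Walking each taxon up to the root and intersecting the paths is the simplest way to express an LCA. A real library working on the full database would typically precompute or cache lineages rather than rebuild them per query.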

Installation

You can install taxopy with pip or uv, or with the pixi, conda, or mamba package managers:

With pip:

$ pip install taxopy

With uv:

$ uv init example
$ cd example
$ uv add taxopy

With pixi:

$ pixi init --channel conda-forge --channel bioconda example
$ cd example
$ pixi add taxopy

With conda:

$ conda create -n taxopy-env -c conda-forge -c bioconda taxopy
$ conda activate taxopy-env

With mamba:

$ mamba create -n taxopy-env -c conda-forge -c bioconda taxopy
$ mamba activate taxopy-env
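Whichever installer you choose, you may want to confirm that the package is visible to the Python interpreter you intend to use. The following is a small sketch that relies only on the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical and not part of taxopy.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string of `distribution`, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints the version if taxopy is installed in this environment.
print("taxopy:", installed_version("taxopy") or "not installed")
```

Run it inside the activated environment (or the uv or pixi project), since each of those keeps its own set of installed packages.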

Enabling fuzzy search of taxon names

taxopy supports fuzzy string matching to search for taxa with names that are similar but not identical to the queries. This feature is not enabled by default to avoid additional dependencies. However, you can enable it by installing the fuzzy-matching extra using pip or uv:

With pip (the quotes keep shells such as zsh from interpreting the square brackets):

$ pip install "taxopy[fuzzy-matching]"

With uv:

$ uv init example
$ cd example
$ uv add taxopy --extra fuzzy-matching

Alternatively, you can install the rapidfuzz library alongside taxopy:

With pip:

$ pip install taxopy rapidfuzz

With uv:

$ uv init example
$ cd example
$ uv add taxopy rapidfuzz

With pixi:

$ pixi init --channel conda-forge --channel bioconda example
$ cd example
$ pixi add taxopy rapidfuzz

With conda:

$ conda create -n taxopy-env -c conda-forge -c bioconda taxopy rapidfuzz
$ conda activate taxopy-env

With mamba:

$ mamba create -n taxopy-env -c conda-forge -c bioconda taxopy rapidfuzz
$ mamba activate taxopy-env
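To illustrate what matching names that are "similar but not identical" means in practice, the sketch below uses `difflib` from the Python standard library as a stand-in. It is not taxopy's fuzzy search and does not use rapidfuzz. The `KNOWN_NAMES` list, the `fuzzy_lookup` helper, and the 0.8 cutoff are all assumptions chosen for illustration.

```python
from difflib import get_close_matches

# A tiny, hand-picked list of scientific names standing in for a name table.
KNOWN_NAMES = [
    "Homo sapiens",
    "Pan troglodytes",
    "Gorilla gorilla",
    "Escherichia coli",
]


def fuzzy_lookup(query, names=KNOWN_NAMES, cutoff=0.8):
    """Return up to three known names whose similarity to `query` is >= `cutoff`."""
    return get_close_matches(query, names, n=3, cutoff=cutoff)


print(fuzzy_lookup("Homo sapien"))       # query with a missing final letter
print(fuzzy_lookup("Escherichia coly"))  # query with a misspelled epithet
print(fuzzy_lookup("Bacillus"))          # nothing similar enough in the list
```

The cutoff trades recall for precision: a lower value tolerates more typos but returns more unrelated names, which is why fuzzy search is usually exposed as an opt-in with a tunable threshold.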

Acknowledgements

Some of the code used to parse taxdump files in taxopy was adapted from CAT/BAT [1], a tool for taxonomic assignment of contigs and metagenome-assembled genomes.

1. Von Meijenfeldt, F. A. B., Arkhipova, K., Cambuy, D. D., Coutinho, F. H. & Dutilh, B. E. "Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT". Genome Biology 20, 217 (2019).