Introduction

About taxopy

taxopy is a Python package that provides an interface for accessing NCBI-formatted taxonomic databases. It supports a range of operations on taxonomic data, such as obtaining complete lineages, determining lowest common ancestors (LCAs), and retrieving taxon names from taxonomic identifiers.
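To make the terms "lineage" and "lowest common ancestor" concrete, here is a minimal, self-contained sketch that works on a tiny hand-written taxonomy. It does not use taxopy's API. The table, the function names (`lineage`, `lowest_common_ancestor`, `name`), and the abridged set of ranks are all assumptions made for illustration only.

```python
# Hand-written, abridged toy table in the spirit of NCBI's nodes.dmp/names.dmp:
# taxid -> (parent taxid, rank, name). The root is its own parent.
# This is illustrative data, not a real taxonomic database.
TOY_TAXONOMY = {
    1: (1, "no rank", "root"),
    9604: (1, "family", "Hominidae"),
    207598: (9604, "subfamily", "Homininae"),
    9605: (207598, "genus", "Homo"),
    9606: (9605, "species", "Homo sapiens"),
    9596: (207598, "genus", "Pan"),
    9598: (9596, "species", "Pan troglodytes"),
}


def lineage(taxid, taxonomy=TOY_TAXONOMY):
    """Return the taxids from `taxid` up to the root, inclusive."""
    path = [taxid]
    while taxonomy[taxid][0] != taxid:  # stop once a node is its own parent
        taxid = taxonomy[taxid][0]
        path.append(taxid)
    return path


def lowest_common_ancestor(taxids, taxonomy=TOY_TAXONOMY):
    """Return the deepest taxid shared by the lineages of all `taxids`."""
    lineages = [lineage(t, taxonomy) for t in taxids]
    shared = set(lineages[0]).intersection(*lineages[1:])
    # Walking one lineage from leaf to root, the first shared node is the deepest.
    return next(t for t in lineages[0] if t in shared)


def name(taxid, taxonomy=TOY_TAXONOMY):
    """Look up the name stored for a taxid."""
    return taxonomy[taxid][2]


human_lineage = [name(t) for t in lineage(9606)]
lca = lowest_common_ancestor([9606, 9598])
print(human_lineage)
print(name(lca))
```

In this toy table, the lineages of Homo sapiens and Pan troglodytes first meet at Homininae, so that node is their LCA. A real database has many more intermediate ranks.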

Installation

You can install taxopy with pip or uv, or through the pixi, conda, or mamba package managers. The command groups below are alternatives, so run only the one for your tool:

With pip:

$ pip install taxopy

With uv:

$ uv init example
$ cd example
$ uv add taxopy

With pixi:

$ pixi init --channel conda-forge --channel bioconda example
$ cd example
$ pixi add taxopy

With conda:

$ conda create -n taxopy-env -c conda-forge -c bioconda taxopy
$ conda activate taxopy-env

With mamba:

$ mamba create -n taxopy-env -c conda-forge -c bioconda taxopy
$ mamba activate taxopy-env

Enabling fuzzy search of taxon names

taxopy supports fuzzy string matching to search for taxa whose names are similar, but not identical, to the queries. This feature is disabled by default to avoid additional dependencies. You can enable it by installing the fuzzy-matching extra with pip or uv:

With pip (the quotes keep shells such as zsh from interpreting the square brackets):

$ pip install "taxopy[fuzzy-matching]"

With uv:

$ uv init example
$ cd example
$ uv add taxopy --extra fuzzy-matching

Alternatively, you can install the rapidfuzz library alongside taxopy:

With pip:

$ pip install taxopy rapidfuzz

With uv:

$ uv init example
$ cd example
$ uv add taxopy rapidfuzz

With pixi:

$ pixi init --channel conda-forge --channel bioconda example
$ cd example
$ pixi add taxopy rapidfuzz

With conda:

$ conda create -n taxopy-env -c conda-forge -c bioconda taxopy rapidfuzz
$ conda activate taxopy-env

With mamba:

$ mamba create -n taxopy-env -c conda-forge -c bioconda taxopy rapidfuzz
$ mamba activate taxopy-env
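Because fuzzy search depends on the optional rapidfuzz library, it can be useful to confirm that the library is importable in the active environment before relying on the feature. The sketch below uses only the Python standard library. `has_module` is a hypothetical helper written for this example and is not part of taxopy.

```python
import importlib.util


def has_module(module_name):
    """Return True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# Report which of the two packages are visible to this interpreter.
for module_name in ("taxopy", "rapidfuzz"):
    status = "available" if has_module(module_name) else "not installed"
    print(f"{module_name}: {status}")
```

If rapidfuzz is reported as not installed, any of the installation commands above should make it available in that environment.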

Acknowledgements

Some of the code used to parse taxdump files in taxopy was adapted from CAT/BAT [1], a tool for taxonomic assignment of contigs and metagenome-assembled genomes.

  1. Von Meijenfeldt, F. A. B., Arkhipova, K., Cambuy, D. D., Coutinho, F. H. & Dutilh, B. E. "Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT". Genome Biology 20, 217 (2019).