
Introduction

About taxopy

taxopy is a Python package that provides an interface for accessing NCBI-formatted taxonomic databases. It supports operations on taxonomic data such as obtaining complete lineages, determining lowest common ancestors (LCAs), and retrieving taxon names from taxonomic identifiers.
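To make the terms "lineage" and "lowest common ancestor" concrete, here is a minimal standard-library sketch. It does not use taxopy and is not taxopy's implementation. The `PARENT` and `NAME` tables are a hand-written, heavily abbreviated toy taxonomy (intermediate ranks are skipped), and the helper names `lineage` and `lowest_common_ancestor` are hypothetical.

```python
# Toy taxonomy: taxid -> parent taxid, loosely modelled on the idea of an
# NCBI nodes table. Assumption: this table is hand-written for illustration
# and omits most intermediate ranks; it is not loaded from a real taxdump.
PARENT = {
    1: 1,            # root points to itself
    9604: 1,
    207598: 9604,
    9605: 207598,
    9606: 9605,
    9596: 207598,
    9598: 9596,
    9599: 9604,
    9601: 9599,
}

NAME = {
    1: "root",
    9604: "Hominidae",
    207598: "Homininae",
    9605: "Homo",
    9606: "Homo sapiens",
    9596: "Pan",
    9598: "Pan troglodytes",
    9599: "Pongo",
    9601: "Pongo abelii",
}


def lineage(taxid):
    """Return the taxids from `taxid` up to the root, inclusive."""
    path = [taxid]
    while PARENT[taxid] != taxid:
        taxid = PARENT[taxid]
        path.append(taxid)
    return path


def lowest_common_ancestor(taxids):
    """Return the deepest taxid shared by the lineages of all `taxids`."""
    lineages = [lineage(t) for t in taxids]
    shared = set(lineages[0]).intersection(*lineages[1:])
    # Walking up from the first taxon, the first shared node is the deepest.
    return next(t for t in lineages[0] if t in shared)


if __name__ == "__main__":
    print([NAME[t] for t in lineage(9606)])
    print(NAME[lowest_common_ancestor([9606, 9598])])
    print(NAME[lowest_common_ancestor([9606, 9598, 9601])])
```

Adding a more distantly related taxon to the query moves the LCA closer to the root, which is why LCAs are a conservative way to summarize a set of taxonomic assignments.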

Installation

You can install taxopy on your computer using Python's pip or through the conda or mamba package managers:

$ pip install taxopy
$ conda install -c conda-forge -c bioconda taxopy
$ mamba install -c conda-forge -c bioconda taxopy

Enabling fuzzy search of taxon names

taxopy supports fuzzy string matching to search for taxa with names that are similar but not identical to the queries. This feature is not enabled by default to avoid additional dependencies. However, you can enable it by installing the fuzzy-matching extra using pip:

$ pip install "taxopy[fuzzy-matching]"

Alternatively, you can install the rapidfuzz library alongside taxopy:

$ pip install taxopy rapidfuzz
$ conda install -c conda-forge -c bioconda taxopy rapidfuzz
$ mamba install -c conda-forge -c bioconda taxopy rapidfuzz
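As a rough illustration of what fuzzy matching of taxon names means, the sketch below finds names that are close to a misspelled query. It deliberately swaps in Python's standard-library `difflib` instead of rapidfuzz, so it shows the general idea only and says nothing about how taxopy scores matches. The `TAXON_NAMES` list, the `fuzzy_find` helper, and the `cutoff` value are all assumptions made for this example.

```python
from difflib import get_close_matches

# Assumption: a tiny hand-written name list standing in for a real database.
TAXON_NAMES = [
    "Escherichia coli",
    "Escherichia albertii",
    "Homo sapiens",
    "Pan troglodytes",
]


def fuzzy_find(query, names=TAXON_NAMES, cutoff=0.8):
    """Return names similar to `query`, best match first (case-insensitive)."""
    # Map lowercase forms back to the original spelling.
    lowered = {name.lower(): name for name in names}
    hits = get_close_matches(query.lower(), list(lowered), n=3, cutoff=cutoff)
    return [lowered[hit] for hit in hits]


if __name__ == "__main__":
    print(fuzzy_find("Escherichia colli"))  # misspelled species epithet
    print(fuzzy_find("homo sapien"))        # wrong case, missing final letter
    print(fuzzy_find("zzzz"))               # nothing similar
```

A higher `cutoff` returns fewer but more reliable matches; a lower one tolerates more typos at the cost of more false hits.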

Acknowledgements

Some of the code used to parse taxdump files in taxopy was adapted from CAT/BAT [1], a tool for taxonomic assignment of contigs and metagenome-assembled genomes.


  1. Von Meijenfeldt, F. A. B., Arkhipova, K., Cambuy, D. D., Coutinho, F. H. & Dutilh, B. E. "Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT". Genome Biology 20, 217 (2019).