
Reference

taxopy.TaxDb

Create an object of the TaxDb class.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| taxdb_dir | str | A directory to download NCBI's taxonomy database files to. If the directory does not exist, it will be created. If no directory is supplied, the current working directory is used. | None |
| nodes_dmp | str | The path to a pre-downloaded nodes.dmp file. If both nodes.dmp and names.dmp are supplied, NCBI's taxonomy database won't be downloaded. | None |
| names_dmp | str | The path to a pre-downloaded names.dmp file. If both names.dmp and nodes.dmp are supplied, NCBI's taxonomy database won't be downloaded. | None |
| merged_dmp | str | The path to a pre-downloaded merged.dmp file. | None |
| taxdump_url | str | The URL of the taxdump file (in .tar.gz format) to be downloaded. By default, the latest version of NCBI's taxdump will be fetched. | None |
| keep_files | bool | Keep the nodes.dmp and names.dmp files after the TaxDb object is created. When the files are deleted, the taxdb_dir directory is also removed if that leaves it empty and it is not the current working directory. By default the files are deleted, unless nodes_dmp, names_dmp, or taxdb_dir were manually supplied. | False |
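
The interaction between the path-related parameters can be sketched without the library. The helper below, `resolve_sources`, is a hypothetical name introduced only for illustration; it mirrors the checks visible in the constructor source and is not part of taxopy.

```python
import os
import tempfile


def resolve_sources(taxdb_dir=None, nodes_dmp=None, names_dmp=None):
    """Hypothetical helper: report where the taxonomy files would come from."""
    # Both paths must be given for the supplied files to be used.
    if nodes_dmp and names_dmp:
        return "supplied files"
    # Without a directory, the current working directory is searched.
    directory = taxdb_dir or os.getcwd()
    found = all(
        os.path.isfile(os.path.join(directory, filename))
        for filename in ("nodes.dmp", "names.dmp")
    )
    return "files in directory" if found else "download"


with tempfile.TemporaryDirectory() as tmp:
    empty_dir_result = resolve_sources(taxdb_dir=tmp)
    # Create empty placeholder files so the directory check succeeds.
    for filename in ("nodes.dmp", "names.dmp"):
        open(os.path.join(tmp, filename), "w").close()
    populated_dir_result = resolve_sources(taxdb_dir=tmp)
    # Supplying only one of the two paths falls back to the directory check.
    partial_result = resolve_sources(taxdb_dir=tmp, nodes_dmp="nodes.dmp")

explicit_result = resolve_sources(nodes_dmp="nodes.dmp", names_dmp="names.dmp")
print(empty_dir_result, populated_dir_result, partial_result, explicit_result, sep=" | ")
```

Note that passing only one of nodes_dmp and names_dmp has the same effect as passing neither, since both are required to skip the directory lookup.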

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| taxid2name | dict | A dictionary where the keys are taxonomic identifiers and the values are their corresponding scientific names. |
| taxid2all_names | dict | A two-level dictionary where the keys are the taxonomic identifiers, yielding a dictionary mapping the kinds of names from the NCBI taxonomy (e.g. "scientific name", "common name") to the corresponding names. |
| taxid2parent | dict | A dictionary where the keys are taxonomic identifiers and the values are the taxonomic identifiers of their corresponding parent taxon. |
| taxid2rank | dict | A dictionary where the keys are taxonomic identifiers and the values are their corresponding ranks. |
| oldtaxid2newtaxid | dict or None | A dictionary where the keys are legacy taxonomic identifiers and the values are their corresponding new identifiers. If pre-downloaded nodes.dmp and names.dmp files were provided but the merged.dmp file was not supplied, this attribute will be None. |
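
To make the shape of these dictionaries concrete, here is a minimal sketch that parses a toy three-node tree written in the tab-and-pipe layout of the dump files. The sample text is heavily simplified (real nodes.dmp lines carry more fields and real lineages have many more nodes), and `parse_nodes` and `parse_names` are hypothetical names that follow the field positions used in the source, not functions exposed by taxopy.

```python
import io

# Toy stand-ins for nodes.dmp and names.dmp (illustrative, not real dump content).
NODES_DMP = (
    "1\t|\t1\t|\tno rank\t|\n"
    "9605\t|\t1\t|\tgenus\t|\n"
    "9606\t|\t9605\t|\tspecies\t|\n"
)
NAMES_DMP = (
    "1\t|\troot\t|\t\t|\tscientific name\t|\n"
    "9605\t|\tHomo\t|\t\t|\tscientific name\t|\n"
    "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n"
    "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n"
)


def parse_nodes(handle):
    """Build taxid -> parent and taxid -> rank from nodes.dmp-style lines."""
    taxid2parent, taxid2rank = {}, {}
    for line in handle:
        fields = line.split("\t")
        taxid = int(fields[0])
        taxid2parent[taxid] = int(fields[2])
        taxid2rank[taxid] = fields[4].strip()
    return taxid2parent, taxid2rank


def parse_names(handle):
    """Build taxid -> scientific name and taxid -> {kind: [names]}."""
    taxid2name, taxid2all_names = {}, {}
    for line in handle:
        fields = line.strip("\n").split("\t")
        taxid, name, kind = int(fields[0]), fields[2].strip(), fields[6]
        if kind == "scientific name":
            taxid2name[taxid] = name
        taxid2all_names.setdefault(taxid, {}).setdefault(kind, []).append(name)
    return taxid2name, taxid2all_names


taxid2parent, taxid2rank = parse_nodes(io.StringIO(NODES_DMP))
taxid2name, taxid2all_names = parse_names(io.StringIO(NAMES_DMP))
print(taxid2name[9606], taxid2rank[9606], taxid2parent[9606])
print(taxid2all_names[9606])
```

The root node is its own parent, which is what lets a lineage walk know where to stop.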

Raises:

| Type | Description |
| --- | --- |
| DownloadError | If the download of the taxonomy database fails. |
| ExtractionError | If the decompression of the taxonomy database fails. |

Source code in taxopy/core.py
class TaxDb:
    """
    Create an object of the TaxDb class.

    Parameters
    ----------
    taxdb_dir : str, optional
        A directory to download NCBI's taxonomy database files to. If the
        directory does not exist it will be created.
    nodes_dmp : str, optional
        The path for a pre-downloaded `nodes.dmp` file. If both `nodes.dmp` and
        `names.dmp` are supplied NCBI's taxonomy database won't be downloaded.
    names_dmp : str, optional
        The path for a pre-downloaded `names.dmp` file. If both `names.dmp` and
        `nodes.dmp` are supplied NCBI's taxonomy database won't be downloaded.
    merged_dmp : str, optional
        The path for a pre-downloaded `merged.dmp` file.
    taxdump_url : str, optional
        The URL of the taxdump file (in .tar.gz) to be downloaded. By default,
        the latest version of NCBI's taxdump will be fetched.
    keep_files : bool, default False
        Keep the `nodes.dmp` and `names.dmp` files after the TaxDb object is
        created. If `taxdb_dir` was supplied the whole directory will be deleted.
        By default the files are deleted, unless `nodes_dmp`, `names_dmp`, or
        `taxdb_dir` were manually supplied.

    Attributes
    ----------
    taxid2name : dict
        A dictionary where the keys are taxonomic identifiers and the values are
        their corresponding scientific names.
    taxid2all_names : dict
        A two-level dictionary where the keys are the taxonomic identifiers,
        yielding a dictionary mapping the kinds of names from the NCBI
        taxonomy (e.g. "scientific name", "common name") to the corresponding
        names.
    taxid2parent: dict
        A dictionary where the keys are taxonomic identifiers and the values are
        the taxonomic identifiers of their corresponding parent taxon.
    taxid2rank: dict
        A dictionary where the keys are taxonomic identifiers and the values are
        their corresponding ranks.
    oldtaxid2newtaxid: dict or None
        A dictionary where the keys are legacy taxonomic identifiers and the
        values are their corresponding new identifiers. If pre-downloaded
        `nodes.dmp` and `names.dmp` files were provided but the `merged.dmp`
        file was not supplied, this attribute will be `None`.

    Raises
    ------
    DownloadError
        If the download of the taxonomy database fails.
    ExtractionError
        If the decompression of the taxonomy database fails.
    """

    def __init__(
        self,
        *,
        taxdb_dir: Optional[str] = None,
        taxdump_url: Optional[str] = None,
        nodes_dmp: Optional[str] = None,
        names_dmp: Optional[str] = None,
        merged_dmp: Optional[str] = None,
        keep_files: bool = False,
    ):
        if not taxdb_dir:
            self._taxdb_dir = os.getcwd()
        elif not os.path.isdir(taxdb_dir):
            os.makedirs(taxdb_dir)
            self._taxdb_dir = taxdb_dir
        else:
            self._taxdb_dir = taxdb_dir
        # If `nodes_dmp` and `names_dmp` were not provided:
        if not nodes_dmp or not names_dmp:
            nodes_dmp_path = os.path.join(self._taxdb_dir, "nodes.dmp")
            names_dmp_path = os.path.join(self._taxdb_dir, "names.dmp")
            merged_dmp_path = os.path.join(self._taxdb_dir, "merged.dmp")
            # If the `nodes.dmp` and `names.dmp` files are not in the `taxdb_dir` directory,
            # download the taxonomy from NCBI:
            if not os.path.isfile(nodes_dmp_path) or not os.path.isfile(names_dmp_path):
                (
                    self._nodes_dmp,
                    self._names_dmp,
                    self._merged_dmp,
                ) = self._download_taxonomy(taxdump_url)
            else:
                self._nodes_dmp, self._names_dmp = nodes_dmp_path, names_dmp_path
                # If `merged.dmp` is not in the `taxdb_dir` directory, set the `_merged_dmp`
                # attribute to `None`:
                self._merged_dmp = (
                    merged_dmp_path if os.path.isfile(merged_dmp_path) else None
                )
        else:
            self._nodes_dmp, self._names_dmp = nodes_dmp, names_dmp
            # If `merged_dmp` was not provided, set the `_merged_dmp` attribute to `None`:
            self._merged_dmp = merged_dmp or None
        # If a `merged.dmp` file was provided or downloaded, create the oldtaxid2newtaxid
        # dictionary:
        self._oldtaxid2newtaxid = self._import_merged() if self._merged_dmp else None
        # Create the taxid2parent, taxid2rank, and taxid2name dictionaries:
        self._taxid2parent, self._taxid2rank = self._import_nodes()
        self._taxid2name, self._taxid2all_names = self._import_names()
        # Delete temporary files if `keep_files` is set to `False`, unless
        # `nodes_dmp`, `names_dmp`, or `taxdb_dir` were manually supplied:
        if not keep_files and (not nodes_dmp or not names_dmp or not taxdb_dir):
            self._delete_files()

    @property
    def taxid2name(self) -> Dict[int, str]:
        return self._taxid2name

    @property
    def taxid2all_names(self) -> Dict[int, dict[str, List[str]]]:
        return self._taxid2all_names

    @property
    def taxid2parent(self) -> Dict[int, int]:
        return self._taxid2parent

    @property
    def taxid2rank(self) -> Dict[int, str]:
        return self._taxid2rank

    @property
    def oldtaxid2newtaxid(self) -> Optional[Dict[int, int]]:
        return self._oldtaxid2newtaxid

    def _download_taxonomy(self, url: Optional[str] = None):
        if not url:
            url = "ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz"
        tmp_taxonomy_file = os.path.join(self._taxdb_dir, "taxdump.tar.gz")
        try:
            urllib.request.urlretrieve(url, tmp_taxonomy_file)
        except Exception as error:
            raise DownloadError(
                "Download of taxonomy files failed. The server may be offline."
            ) from error
        try:
            with tarfile.open(tmp_taxonomy_file) as tf:
                for member in tf.getmembers():
                    if not member.isfile():
                        continue
                    filename = os.path.basename(member.name)
                    if filename not in ("nodes.dmp", "names.dmp", "merged.dmp"):
                        continue
                    m = tf.extractfile(member)
                    with open(os.path.join(self._taxdb_dir, filename), "wb") as fo:
                        while True:
                            chunk = m.read(1024)
                            if not chunk:
                                break
                            fo.write(chunk)
        except Exception as error:
            raise ExtractionError(
                "Something went wrong while extracting the taxonomy files."
            ) from error
        os.remove(tmp_taxonomy_file)
        return (
            os.path.join(self._taxdb_dir, "nodes.dmp"),
            os.path.join(self._taxdb_dir, "names.dmp"),
            os.path.join(self._taxdb_dir, "merged.dmp"),
        )

    def _import_merged(self):
        oldtaxid2newtaxid = {}
        with open(self._merged_dmp, "r") as f:
            for line in f:
                line = line.split("\t")
                taxid = int(line[0])
                merged = int(line[2])
                oldtaxid2newtaxid[taxid] = merged
        return oldtaxid2newtaxid

    def _import_nodes(self):
        taxid2parent = {}
        taxid2rank = {}
        with open(self._nodes_dmp, "r") as f:
            for line in f:
                line = line.split("\t")
                taxid = int(line[0])
                parent = int(line[2])
                rank = line[4].strip()
                taxid2parent[taxid] = parent
                taxid2rank[taxid] = rank
        if self._merged_dmp:
            for oldtaxid, newtaxid in self._oldtaxid2newtaxid.items():
                taxid2rank[oldtaxid] = taxid2rank[newtaxid]
                taxid2parent[oldtaxid] = taxid2parent[newtaxid]
        return taxid2parent, taxid2rank

    def _import_names(self) -> Tuple[Dict[int, str], Dict[int, dict[str, List[str]]]]:
        taxid2name: Dict[int, str] = {}
        taxid2all_names: Dict[int, dict[str, List[str]]] = {}

        with open(self._names_dmp, "r") as f:
            for line in f:
                fields = line.strip("\n").split("\t")
                kind = fields[6]
                taxid = int(fields[0])
                name = fields[2].strip()

                if kind == "scientific name":
                    taxid2name[taxid] = name

                if taxid not in taxid2all_names:
                    taxid2all_names[taxid] = {}

                if kind not in taxid2all_names[taxid]:
                    taxid2all_names[taxid][kind] = []

                taxid2all_names[taxid][kind].append(name)

        if self._merged_dmp:
            for oldtaxid, newtaxid in self._oldtaxid2newtaxid.items():
                taxid2name[oldtaxid] = taxid2name[newtaxid]
                taxid2all_names[oldtaxid] = taxid2all_names[newtaxid]

        return taxid2name, taxid2all_names

    def _delete_files(self):
        os.remove(self._nodes_dmp)
        os.remove(self._names_dmp)
        if self._merged_dmp:
            os.remove(self._merged_dmp)
        if not os.listdir(self._taxdb_dir) and self._taxdb_dir != os.getcwd():
            os.rmdir(self._taxdb_dir)

taxopy.Taxon

Create an object of the Taxon class.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| taxid | int | An NCBI taxonomic identifier. | required |
| taxdb | TaxDb | A TaxDb object. | required |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| taxid | int | The NCBI taxonomic identifier the object represents (e.g., 9606). |
| name | str | The name of the taxon (e.g., 'Homo sapiens'). |
| all_names | dict | All names of the taxon as a dictionary, mapping kind to the list of names (e.g., all_names['authority'] = ['Homo sapiens Linnaeus, 1758'], all_names['genbank common name'] = ['human']). In many cases only one name is provided, but some kinds, such as common name, may have several. |
| rank | str | The rank of the taxon (e.g., 'species'). |
| legacy_taxid | bool or None | A boolean that represents whether the NCBI taxonomic identifier was merged into another identifier (True) or not (False). If pre-downloaded nodes.dmp and names.dmp files were provided to build taxdb but the merged.dmp file was not supplied, this attribute will be None. |
| taxid_lineage | list | An ordered list containing the taxonomic identifiers of the whole lineage of the taxon, from the most specific to the most general. |
| name_lineage | list | An ordered list containing the names of the whole lineage of the taxon, from the most specific to the most general. |
| rank_lineage | list | An ordered list containing the rank names of the whole lineage of the taxon, from the most specific to the most general. |
| ranked_name_lineage | list | An ordered list of tuples, where each tuple represents a rank in the lineage, with the first element denoting the rank name and the second indicating the taxon's name. |
| ranked_taxid_lineage | list | An ordered list of tuples, where each tuple represents a rank in the lineage, with the first element denoting the rank name and the second indicating the taxon's taxonomic identifier. |
| rank_taxid_dictionary | dict | A dictionary where the keys are named ranks and the values are the taxids of the taxa that correspond to each of the named ranks in the lineage. |
| rank_name_dictionary | dict | A dictionary where the keys are named ranks and the values are the names of the taxa that correspond to each of the named ranks in the lineage. |
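
The lineage attributes all derive from one walk up the parent dictionary. The sketch below reproduces that walk on a toy three-node tree (the tree is simplified for illustration, and `find_lineage` is a hypothetical standalone function), then builds the derived lists and the rank dictionary, which skips "no rank" entries as the source does.

```python
from collections import OrderedDict

# Toy dictionaries standing in for the TaxDb attributes (illustrative data).
taxid2parent = {1: 1, 9605: 1, 9606: 9605}
taxid2rank = {1: "no rank", 9605: "genus", 9606: "species"}
taxid2name = {1: "root", 9605: "Homo", 9606: "Homo sapiens"}


def find_lineage(taxid, taxid2parent):
    """Walk from the taxon to the root, which is its own parent."""
    lineage = [taxid]
    while taxid2parent[taxid] != taxid:
        taxid = taxid2parent[taxid]
        lineage.append(taxid)
    return lineage


taxid_lineage = find_lineage(9606, taxid2parent)
name_lineage = [taxid2name[taxid] for taxid in taxid_lineage]
rank_lineage = [taxid2rank[taxid] for taxid in taxid_lineage]
ranked_name_lineage = list(zip(rank_lineage, name_lineage))

# Only named ranks are kept in the dictionary form.
rank_name_dictionary = OrderedDict(
    (rank, name) for rank, name in ranked_name_lineage if rank != "no rank"
)

# String form used by the class: first letter of the rank, most general first.
lineage_string = ";".join(
    reversed([f"{rank[0]}__{name}" for rank, name in rank_name_dictionary.items()])
)
print(taxid_lineage)
print(ranked_name_lineage)
print(lineage_string)
```

Because the lists run from the most specific taxon to the most general, the string form has to reverse them to read root-first.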

Methods:

| Name | Description |
| --- | --- |
| parent | Returns a Taxon object of the parent node. |

Raises:

| Type | Description |
| --- | --- |
| TaxidError | If the input integer is not a valid NCBI taxonomic identifier. |
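
The three possible states of legacy_taxid, and the condition that raises TaxidError, can be sketched as follows. The `TaxidError` class here is a local stand-in, `lookup` is a hypothetical helper, and the legacy identifier 900000001 is made up for illustration.

```python
class TaxidError(Exception):
    """Local stand-in so the sketch runs without the library."""


def lookup(taxid, taxid2name, oldtaxid2newtaxid=None):
    """Hypothetical helper returning (name, legacy flag) for a taxid."""
    if taxid not in taxid2name:
        raise TaxidError("The input integer is not a valid NCBI taxonomic identifier.")
    # Without a merged mapping, the legacy status is unknown (None).
    legacy = (taxid in oldtaxid2newtaxid) if oldtaxid2newtaxid else None
    return taxid2name[taxid], legacy


taxid2name = {9606: "Homo sapiens"}
merged = {900000001: 9606}  # hypothetical legacy identifier, for illustration only

# Legacy identifiers inherit the entry of the identifier they were merged into.
for old_taxid, new_taxid in merged.items():
    taxid2name[old_taxid] = taxid2name[new_taxid]

current = lookup(9606, taxid2name, merged)
legacy = lookup(900000001, taxid2name, merged)
unknown = lookup(9606, taxid2name)
print(current, legacy, unknown)

try:
    lookup(-1, taxid2name, merged)
except TaxidError as error:
    print(error)
```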

Source code in taxopy/core.py
class Taxon:
    """
    Create an object of the Taxon class.

    Parameters
    ----------
    taxid : int
        An NCBI taxonomic identifier.
    taxdb : TaxDb
        A TaxDb object.

    Attributes
    ----------
    taxid : int
        The NCBI taxonomic identifier the object represents (e.g., 9606).
    name: str
        The name of the taxon (e.g., 'Homo sapiens').
    all_names: dict
        All names of the taxon as a dictionary, mapping kind to the list of names
        (e.g., `all_names['authority'] = ['Homo sapiens Linnaeus, 1758']`,
        `all_names['genbank common name'] = ['human']`). In many cases, only one
        name is provided, but e.g. for `common name` multiple names may be available.
    rank: str
        The rank of the taxon (e.g., 'species').
    legacy_taxid: bool or None
        A boolean that represents whether the NCBI taxonomic identifier was
        merged to another identifier (`True`) or not (`False`). If pre-downloaded
        `nodes.dmp` and `names.dmp` files were provided to build `taxdb` but the
        `merged.dmp` file was not supplied, this attribute will be `None`.
    taxid_lineage: list
        An ordered list containing the taxonomic identifiers of the whole lineage
        of the taxon, from the most specific to the most general.
    name_lineage: list
        An ordered list containing the names of the whole lineage of the taxon,
        from the most specific to the most general.
    rank_lineage: list
        An ordered list containing the rank names of the whole lineage of the
        taxon, from the most specific to the most general.
    ranked_name_lineage : list
        An ordered list of tuples, where each tuple represents a rank in the
        lineage, with the first element denoting the rank name and the second
        indicating the taxon's name.
    ranked_taxid_lineage : list
        An ordered list of tuples, where each tuple represents a rank in the
        lineage, with the first element denoting the rank name and the second
        indicating the taxon's taxonomic identifier.
    rank_taxid_dictionary: dict
        A dictionary where the keys are named ranks and the values are the taxids
        of the taxa that correspond to each of the named ranks in the lineage.
    rank_name_dictionary: dict
        A dictionary where the keys are named ranks and the values are the names
        of the taxa that correspond to each of the named ranks in the lineage.

    Methods
    -------
    parent(taxdb)
        Returns a Taxon object of the parent node.

    Raises
    ------
    TaxidError
        If the input integer is not a valid NCBI taxonomic identifier.
    """

    _legacy_taxid: Optional[bool]

    def __init__(self, taxid: int, taxdb: TaxDb):
        self._taxid = taxid
        if self.taxid not in taxdb.taxid2name:
            raise TaxidError(
                "The input integer is not a valid NCBI taxonomic identifier."
            )
        self._name = taxdb.taxid2name[self.taxid]
        self._all_names = taxdb.taxid2all_names[self.taxid]
        self._rank = taxdb.taxid2rank[self.taxid]
        if taxdb.oldtaxid2newtaxid:
            self._legacy_taxid = self.taxid in taxdb.oldtaxid2newtaxid
        else:
            self._legacy_taxid = None
        self._taxid_lineage = self._find_lineage(taxdb.taxid2parent)
        self._name_lineage = self._convert_to_names(taxdb.taxid2name)
        self._rank_lineage = [taxdb.taxid2rank[taxid] for taxid in self.taxid_lineage]
        (
            self._rank_taxid_dictionary,
            self._rank_name_dictionary,
        ) = self._convert_to_rank_dictionary(taxdb.taxid2rank, taxdb.taxid2name)

    @property
    def taxid(self) -> int:
        return self._taxid

    @property
    def name(self) -> str:
        return self._name

    @property
    def all_names(self) -> Dict[str, List[str]]:
        return self._all_names

    @property
    def rank(self) -> str:
        return self._rank

    @property
    def legacy_taxid(self) -> Optional[bool]:
        return self._legacy_taxid

    @property
    def taxid_lineage(self) -> List[int]:
        return self._taxid_lineage

    @property
    def name_lineage(self) -> List[str]:
        return self._name_lineage

    @property
    def rank_lineage(self) -> List[str]:
        return self._rank_lineage

    @property
    def ranked_taxid_lineage(self) -> List[Tuple[str, int]]:
        return list(zip(self.rank_lineage, self.taxid_lineage))

    @property
    def ranked_name_lineage(self) -> List[Tuple[str, str]]:
        return list(zip(self.rank_lineage, self.name_lineage))

    @property
    def rank_taxid_dictionary(self) -> Dict[str, int]:
        return self._rank_taxid_dictionary

    @property
    def rank_name_dictionary(self) -> Dict[str, str]:
        return self._rank_name_dictionary

    def parent(self, taxdb) -> Taxon:
        """
        Returns the parent node of the taxon.

        Returns
        -------
        Taxon
            The Taxon object of the parent node.
        """
        parent_taxid = taxdb.taxid2parent[self.taxid]
        return Taxon(parent_taxid, taxdb)

    def _find_lineage(self, taxid2parent):
        current_taxid = self.taxid
        lineage = [current_taxid]
        while taxid2parent[current_taxid] != current_taxid:
            current_taxid = taxid2parent[current_taxid]
            lineage.append(current_taxid)
        return lineage

    def _convert_to_names(self, taxid2name):
        return [taxid2name[taxid] for taxid in self.taxid_lineage]

    def _convert_to_rank_dictionary(self, taxid2rank, taxid2name):
        rank_taxid_dictionary = OrderedDict()
        rank_name_dictionary = OrderedDict()
        for taxid in self.taxid_lineage:
            rank = taxid2rank[taxid]
            if rank != "no rank":
                rank_taxid_dictionary[rank] = taxid
                rank_name_dictionary[rank] = taxid2name[taxid]
        return rank_taxid_dictionary, rank_name_dictionary

    def __str__(self) -> str:
        lineage = [
            f"{rank[0]}__{name}" for rank, name in self.rank_name_dictionary.items()
        ]
        return ";".join(reversed(lineage))

    def __repr__(self) -> str:
        return str(self)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Taxon):
            return NotImplemented
        return self.taxid_lineage == other.taxid_lineage

    def __hash__(self) -> int:
        return hash(self.taxid)

parent

parent(taxdb) -> Taxon

Returns the parent node of the taxon.

Returns:

| Type | Description |
| --- | --- |
| Taxon | The Taxon object of the parent node. |

Source code in taxopy/core.py
def parent(self, taxdb) -> Taxon:
    """
    Returns the parent node of the taxon.

    Returns
    -------
    Taxon
        The Taxon object of the parent node.
    """
    parent_taxid = taxdb.taxid2parent[self.taxid]
    return Taxon(parent_taxid, taxdb)

taxopy.find_lca

find_lca(taxon_list: List[Taxon], taxdb: TaxDb) -> Taxon

Takes a list of multiple Taxon objects and returns their lowest common ancestor (LCA).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| taxon_list | list of Taxon | A list containing at least two Taxon objects. | required |
| taxdb | TaxDb | A TaxDb object. | required |

Returns:

| Type | Description |
| --- | --- |
| _AggregatedTaxon | The _AggregatedTaxon object of the lowest common ancestor (LCA) of the inputs. |

Raises:

| Type | Description |
| --- | --- |
| LCAError | When the input list contains fewer than two Taxon objects or when no taxa are common across the provided lineages. |
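
The LCA search reduces to a set intersection followed by a scan of one lineage from its most specific end. The sketch below applies that idea to plain lists of taxids; the lineages are abbreviated for illustration, and `lowest_common_ancestor` is a hypothetical function that raises ValueError where the library raises LCAError.

```python
def lowest_common_ancestor(lineages):
    """Return the most specific taxid present in every lineage.

    Each lineage is ordered from the most specific taxon to the root.
    """
    if len(lineages) < 2:
        raise ValueError("The input list must contain at least two lineages.")
    shared = set.intersection(*map(set, lineages))
    # The first shared taxid met while walking up a lineage is the LCA.
    for taxid in lineages[0]:
        if taxid in shared:
            return taxid
    raise ValueError("No taxon is shared by the input lineages.")


# Abbreviated lineages: species, genus, subfamily, root.
human = [9606, 9605, 207598, 1]
chimpanzee = [9598, 9596, 207598, 1]

lca = lowest_common_ancestor([human, chimpanzee])
print(lca)
```

Scanning any single lineage is enough, because every lineage lists the shared taxa in the same relative order.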

Source code in taxopy/utilities.py
def find_lca(taxon_list: List[Taxon], taxdb: TaxDb) -> Taxon:
    """
    Takes a list of multiple Taxon objects and returns their lowest common
    ancestor (LCA).

    Parameters
    ----------
    taxon_list : list of Taxon
        A list containing at least two Taxon objects.
    taxdb : TaxDb
        A TaxDb object.

    Returns
    -------
    _AggregatedTaxon
        The _AggregatedTaxon object of the lowest common ancestor (LCA) of the
        inputs.

    Raises
    ------
    LCAError
        When the input list contains fewer than two Taxon objects or when no
        taxa are common across the provided lineages.
    """
    if len(taxon_list) < 2:
        raise LCAError("The input list must contain at least two Taxon objects.")
    lineage_list = [taxon.taxid_lineage for taxon in taxon_list]
    overlap = set.intersection(*map(set, lineage_list))
    for taxid in lineage_list[0]:
        if taxid in overlap:
            aggregated_taxa = [taxon.taxid for taxon in taxon_list]
            return _AggregatedTaxon(taxid, taxdb, 1.0, aggregated_taxa)
    raise LCAError("No taxon is shared by the input lineages.")

taxopy.find_majority_vote

find_majority_vote(taxon_list: List[Taxon], taxdb: TaxDb, fraction: float = 0.5, weights: Optional[List[float]] = None) -> Taxon

Takes a list of multiple Taxon objects and returns the most specific taxon that is shared by more than the chosen fraction of the input lineages.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| taxon_list | list of Taxon | A list containing at least two Taxon objects. | required |
| taxdb | TaxDb | A TaxDb object. | required |
| fraction | float | The returned taxon will be shared by more than fraction of the input taxa lineages. This value must be greater than 0.0 and less than 1.0. | 0.5 |
| weights | Optional[List[float]] | A list of weights associated with the taxa lineages in taxon_list. These values are used to weight the votes of their associated lineages. | None |

Returns:

| Type | Description |
| --- | --- |
| _AggregatedTaxon | The _AggregatedTaxon object of the most specific taxon that is shared by more than the chosen fraction of the input lineages. |

Raises:

| Type | Description |
| --- | --- |
| MajorityVoteError | If any of the following conditions occur: the input taxon list contains fewer than two Taxon objects; the fraction parameter is less than or equal to 0.0 or greater than or equal to 1.0; the weights list does not have the same length as the taxon list; or there are no taxa common to the input lineages. |
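
One way to picture the vote is to sum the weight of every lineage that contains a given taxid and keep the deepest taxid whose share exceeds fraction. The sketch below is an assumption about how such a vote can be computed, not the library's internal implementation (the private helpers it calls are not shown in this section). `majority_vote` is a hypothetical function, the lineages are abbreviated, and ValueError stands in for MajorityVoteError.

```python
from collections import defaultdict


def majority_vote(lineages, fraction=0.5, weights=None):
    """Return the deepest taxid supported by more than `fraction` of the votes."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be greater than 0.0 and less than 1.0.")
    if len(lineages) < 2:
        raise ValueError("At least two lineages are required.")
    if weights is None:
        weights = [1.0] * len(lineages)
    if len(weights) != len(lineages):
        raise ValueError("lineages and weights must have the same length.")
    total = sum(weights)
    support = defaultdict(float)
    depth = {}
    for lineage, weight in zip(lineages, weights):
        for index, taxid in enumerate(lineage):
            support[taxid] += weight
            # Depth counts the steps between the taxid and the root.
            depth[taxid] = len(lineage) - 1 - index
    candidates = [taxid for taxid, votes in support.items() if votes / total > fraction]
    if not candidates:
        raise ValueError("No taxon is shared by the input lineages.")
    # Prefer the deepest candidate; break ties by the amount of support.
    return max(candidates, key=lambda taxid: (depth[taxid], support[taxid]))


# Abbreviated lineages: species, genus, subfamily, root.
human = [9606, 9605, 207598, 1]
chimpanzee = [9598, 9596, 207598, 1]
votes = [human, human, chimpanzee]

unweighted = majority_vote(votes)
weighted = majority_vote(votes, weights=[1, 1, 5])
strict = majority_vote(votes, fraction=0.9)
print(unweighted, weighted, strict)
```

With equal weights two of the three lineages agree on the species, a heavy weight on the third lineage flips the result, and a stricter fraction pushes the answer up to the shared subfamily.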

Source code in taxopy/utilities.py
def find_majority_vote(
    taxon_list: List[Taxon],
    taxdb: TaxDb,
    fraction: float = 0.5,
    weights: Optional[List[float]] = None,
) -> Taxon:
    """
    Takes a list of multiple Taxon objects and returns the most specific taxon
    that is shared by more than the chosen fraction of the input lineages.

    Parameters
    ----------
    taxon_list : list of Taxon
        A list containing at least two Taxon objects.
    taxdb : TaxDb
        A TaxDb object.
    fraction: float, default 0.5
        The returned taxon will be shared by more than `fraction` of the input
        taxa lineages. This value must be greater than 0.0 and less than 1.0.
    weights: list of float, optional
        A list of weights associated with the taxa lineages in `taxon_list`.
        These values are used to weight the votes of their associated lineages.

    Returns
    -------
    _AggregatedTaxon
        The _AggregatedTaxon object of the most specific taxon that is shared by
        more than the chosen fraction of the input lineages.

    Raises
    ------
    MajorityVoteError
        If any of the following conditions occur: the input taxon list contains
        fewer than two Taxon objects; the fraction parameter is less than or
        equal to 0.0 or greater than or equal to 1.0; the weights list does not
        have the same length as the taxon list; or there are no taxa common to
        the input lineages.
    """
    if fraction <= 0.0 or fraction >= 1:
        raise MajorityVoteError(
            "The `fraction` parameter must be greater than 0.0 and less than 1."
        )
    if len(taxon_list) < 2:
        raise MajorityVoteError(
            "The input taxon list must contain at least two Taxon objects."
        )
    if weights and len(taxon_list) != len(weights):
        raise MajorityVoteError(
            "The input taxon and weights lists must have the same length."
        )
    if weights:
        majority_vote = _weighted_majority_vote(taxon_list, taxdb, fraction, weights)
    else:
        majority_vote = _unweighted_majority_vote(taxon_list, taxdb, fraction)
    if majority_vote:
        return majority_vote
    else:
        raise MajorityVoteError("No taxon is shared by the input lineages.")

taxopy.taxid_from_name

taxid_from_name(names: Union[str, List[str]], taxdb: TaxDb, fuzzy: bool = False, score_cutoff: float = 0.9) -> Union[List[int], List[List[int]]]

Takes one or more taxon names and returns a list (or a list of lists) containing the taxonomic identifiers associated with each name.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| names | str or list of str | The name of the taxon whose taxonomic identifier will be returned. A list of names can also be provided. | required |
| taxdb | TaxDb | A TaxDb object. | required |
| fuzzy | bool | If True, the input name will be matched to the taxa names in the database using fuzzy string matching. | False |
| score_cutoff | float | The minimum score required for a match to be considered valid when fuzzy string matching is used. This value must be between 0.0 and 1.0. | 0.9 |

Returns:

| Type | Description |
| --- | --- |
| list or list of list | A list of all the taxonomic identifiers associated with the input taxon name. If a list of names is provided, a list of lists is returned. |
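
A name lookup is a reverse search over the taxid-to-name dictionary. The sketch below shows exact and fuzzy matching on a toy dictionary. It uses `difflib` from the standard library as a stand-in matcher, which is an assumption made so the example runs on its own; the matcher the library actually uses is not shown in this section and may score names differently. `taxids_from_name` is a hypothetical function.

```python
import difflib

# Toy dictionary standing in for TaxDb.taxid2name.
taxid2name = {9606: "Homo sapiens", 9605: "Homo", 9598: "Pan troglodytes"}


def taxids_from_name(name, taxid2name, fuzzy=False, score_cutoff=0.9):
    """Return every taxid whose name matches `name` (possibly none)."""
    if fuzzy:
        # Keep only the single best candidate at or above the cutoff (0.0 to 1.0).
        candidates = sorted(set(taxid2name.values()))
        matches = set(
            difflib.get_close_matches(name, candidates, n=1, cutoff=score_cutoff)
        )
    else:
        matches = {name}
    return [taxid for taxid, known in taxid2name.items() if known in matches]


exact = taxids_from_name("Homo sapiens", taxid2name)
misspelled = taxids_from_name("Homo sapien", taxid2name)
recovered = taxids_from_name("Homo sapien", taxid2name, fuzzy=True)
print(exact, misspelled, recovered)
```

An empty list signals that nothing matched, which is the case in which the library function emits a warning rather than raising an exception.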

Source code in taxopy/utilities.py
def taxid_from_name(
    names: Union[str, List[str]],
    taxdb: TaxDb,
    fuzzy: bool = False,
    score_cutoff: float = 0.9,
) -> Union[List[int], List[List[int]]]:
    """
    Takes one or more taxon names and returns a list (or a list of lists)
    containing the taxonomic identifiers associated with each name.

    Parameters
    ----------
    names : str or list of str
        The name of the taxon whose taxonomic identifier will be returned. A
        list of names can also be provided.
    taxdb : TaxDb
        A TaxDb object.
    fuzzy : bool, default False
        If True, the input name will be matched to the taxa names in the
        database using fuzzy string matching.
    score_cutoff : float, default 0.9
        The minimum score required for a match to be considered valid when
        fuzzy string matching is used. This value must be between 0.0 and 1.0.

    Returns
    -------
    list or list of list
        A list of all the taxonomic identifiers associated with the input taxon
        name. If a list of names is provided, a list of lists is returned.
    """
    score_cutoff = score_cutoff * 100
    if isinstance(names, list):
        taxid_list = _get_taxid_from_multiple_names(names, taxdb, fuzzy, score_cutoff)
        if not all(len(taxids) for taxids in taxid_list):
            warnings.warn(
                "At least one of the input names was not found in the taxonomy database.",
                Warning,
            )
    else:
        taxid_list = _get_taxid_from_single_name(names, taxdb, fuzzy, score_cutoff)
        if not taxid_list:
            warnings.warn(
                "The input name was not found in the taxonomy database.", Warning
            )
    return taxid_list