Usage
from lcastar import LcaStar, Lineage
Taxonomic inputs can be obtained from any source
Via scientific name (genus species)
Conversion to a taxonomic hierarchy is performed with the ETE toolkit.
orf_hits = [
    "Muribaculaceae bacterium",
    "Muribaculaceae bacterium",
    "Bacteroidales bacterium",
    "Muribaculaceae bacterium",
    "Alistipes senegalensis",
]
tree = LcaStar()
for sci_name in orf_hits:
    lin = Lineage.FromSciName(sci_name)
    assert lin is not None
    tree.NewObservation(lin)
Via NCBI taxonomy ID
orf_hits = [
    2498093,
    2498093,
    2030927,
    2498093,
    1288121,
]
tree = LcaStar()
for tax_id in orf_hits:
    lin = Lineage.FromTaxID(tax_id)
    assert lin is not None
    tree.NewObservation(lin)
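The loops above assert that each lookup returned a lineage, so a single unresolved hit stops the run. A minimal sketch of a more forgiving pattern follows, assuming that a `None` return means the name or ID could not be resolved (the asserts suggest this, but it is not confirmed here). `resolve_all` is a hypothetical helper, not part of lcastar; it takes the lookup function as an argument so it could be used with either `Lineage.FromSciName` or `Lineage.FromTaxID`.

```python
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

K = TypeVar("K")  # hit identifier: a scientific name or a taxonomy ID
V = TypeVar("V")  # whatever the lookup returns (a Lineage in practice)


def resolve_all(
    hits: Iterable[K], lookup: Callable[[K], Optional[V]]
) -> Tuple[List[V], List[K]]:
    """Hypothetical helper: split hits into resolved values and unresolved hits."""
    resolved: List[V] = []
    unresolved: List[K] = []
    for hit in hits:
        lin = lookup(hit)
        if lin is None:  # assumption: None signals a failed lookup
            unresolved.append(hit)
        else:
            resolved.append(lin)
    return resolved, unresolved


# Stand-in lookup for demonstration only; real code would pass Lineage.FromTaxID.
known = {2498093: "lineage-A", 2030927: "lineage-B"}
resolved, unresolved = resolve_all([2498093, 2030927, -1], known.get)
```

With lcastar installed, each item in `resolved` would be passed to `tree.NewObservation`, and `unresolved` could be logged for inspection.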
Via custom taxonomy
Custom hierarchies can be used by providing, for each hit, the traversal path from the root node as a list of tuples in the format (rank, name). Paths need not be complete or reach a leaf; LCA* will do its best to construct the needed portions of the reference hierarchy from the paths it receives. In this mode, LCA* is not limited to biological taxonomies.
orf_hits = [
    [
        ("D", "Bacteria"),
        ("C", "Gammaproteobacteria"),
        ("G", "Escherichia"),
    ],
    [
        ("D", "Bacteria"),
        ("C", "Gammaproteobacteria"),
        ("G", "Pseudomonas"),
    ],
    [
        ("D", "Bacteria"),
        ("C", "Gammaproteobacteria"),
    ],
]
tree = LcaStar()
for path_from_root in orf_hits:
    lin = Lineage(path_from_root)
    tree.NewObservation(lin)
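Because a custom lineage is a list of (rank, name) tuples, paths can in principle be derived from any root-to-node description. A minimal sketch follows, assuming hits arrive as delimiter-separated strings and that rank labels are free-form (the depth-based `L1`, `L2`, ... labels are an illustrative choice, not an lcastar convention). `path_from_string` is a hypothetical helper, not part of lcastar.

```python
from typing import List, Tuple


def path_from_string(path: str, sep: str = "/") -> List[Tuple[str, str]]:
    """Hypothetical helper: turn 'a/b/c' into [(rank, name), ...] from the root down."""
    parts = [p for p in path.split(sep) if p]  # drop empty segments
    # Label each level by its depth so ranks stay consistent across paths.
    return [(f"L{depth}", name) for depth, name in enumerate(parts, start=1)]


# A non-biological hierarchy: file paths, one of which stops short of a leaf.
hits = ["src/parser/lexer.py", "src/parser/ast.py", "src/parser"]
paths = [path_from_string(h) for h in hits]
for p in paths:
    print(p)
```

Each entry in `paths` has the same shape as the `path_from_root` lists above, so it could be wrapped with `Lineage(...)` and passed to `tree.NewObservation`.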
Example output
for node in tree.BestLineage():
    print(node.level, node.name, node.fraction_votes, node.p_value)
Expected output for the NCBI taxonomy ID and scientific name examples:
superkingdom Bacteria 1.0 0.08273697918531309
clade FCB group 1.0 0.08273697918531309
clade Bacteroidota/Chlorobiota group 1.0 0.08273697918531309
phylum Bacteroidota 1.0 0.08273697918531309
class Bacteroidia 1.0 0.08273697918531309
order Bacteroidales 1.0 0.08273697918531309
species Bacteroidales bacterium 0.2 1.0
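One way to reduce this output to a single assignment is to keep the deepest node whose vote fraction clears a threshold. The sketch below is an illustration under stated assumptions, not lcastar's own method: `Node` is a stand-in carrying the four attributes printed above, `deepest_supported` and the `min_fraction` cutoff are hypothetical, and the nodes are assumed to be ordered from root to leaf as in the listing.

```python
from typing import Iterable, NamedTuple, Optional


class Node(NamedTuple):
    """Stand-in for the objects yielded by BestLineage(); same four attributes."""
    level: str
    name: str
    fraction_votes: float
    p_value: float


def deepest_supported(lineage: Iterable[Node], min_fraction: float = 0.5) -> Optional[Node]:
    """Hypothetical helper: last node (root to leaf) with enough vote support."""
    best = None
    for node in lineage:
        if node.fraction_votes >= min_fraction:
            best = node
    return best


# Three rows copied from the expected output above.
rows = [
    Node("superkingdom", "Bacteria", 1.0, 0.08273697918531309),
    Node("order", "Bacteroidales", 1.0, 0.08273697918531309),
    Node("species", "Bacteroidales bacterium", 0.2, 1.0),
]
pick = deepest_supported(rows)
print(pick.level, pick.name)
```

With the default cutoff of 0.5, the species-level node (0.2 of the votes) is passed over and the order-level node is selected.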