Fuzzy string matching

import bionty as bt
ct = bt.CellType()
ct.fuzzy_match("T cells", ct.name)
ontology_id definition synonyms children __ratio__
name
T cell CL:0000084 A Type Of Lymphocyte Whose Defining Characteri... T-lymphocyte|T lymphocyte|T-cell [CL:0002419, CL:0000789, CL:0000798, CL:0002420] 92.307692
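The `__ratio__` column looks like a normalized similarity score on a 0 to 100 scale. The sketch below assumes a score of the form 2 * M / T, where M is the number of matching characters and T is the combined length of both strings. Under that assumption, Python's standard-library `difflib` reproduces the 92.307692 shown above. bionty's actual scorer may be implemented differently, and `fuzzy_ratio` is a hypothetical helper, not part of bionty:

```python
from difflib import SequenceMatcher


def fuzzy_ratio(a: str, b: str) -> float:
    """Hypothetical helper: 0-100 similarity, computed as 2 * M / T * 100 via difflib."""
    return SequenceMatcher(None, a, b).ratio() * 100


# "T cells" (7 chars) vs "T cell" (6 chars) share 6 matching characters,
# so the score is 2 * 6 / 13 * 100.
print(round(fuzzy_ratio("T cells", "T cell"), 6))  # 92.307692
```

The same formula also reproduces the other scores on this page. For example, "P cell" against "GIP cell" gives 2 * 6 / 14 * 100 = 85.714286.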

By default, fuzzy_match also matches against synonyms. That is why "P cell" finds nodal myocyte, which lists P cell as one of its synonyms:

ct.fuzzy_match("P cell", ct.name)
ontology_id definition synonyms children __ratio__
name
nodal myocyte CL:0002072 A Specialized Cardiac Myocyte In The Sinoatria... cardiac pacemaker cell|myocytus nodalis|P cell [CL:1000410, CL:1000409] 100.0

You can turn off synonym matching with synonyms_field=None:

ct.fuzzy_match("P cell", ct.name, synonyms_field=None)
ontology_id definition synonyms children __ratio__
name
PP cell CL:0000696 A Cell That Stores And Secretes Pancreatic Pol... type F enteroendocrine cell [CL:0002680] 92.307692
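The two calls above suggest a simple rule. With synonyms enabled, a record scores its best match across its name and every synonym. With `synonyms_field=None`, only the name is compared. The sketch below illustrates that rule under the same difflib-based ratio assumption. It uses a few records copied from the outputs above and splits synonyms on "|" as they appear there. `best_match` and its `use_synonyms` flag are hypothetical stand-ins for `synonyms_field`, not bionty's API:

```python
from difflib import SequenceMatcher

# Hypothetical subset of records copied from the outputs above:
# (name, pipe-separated synonyms)
RECORDS = [
    ("T cell", "T-lymphocyte|T lymphocyte|T-cell"),
    ("nodal myocyte", "cardiac pacemaker cell|myocytus nodalis|P cell"),
    ("PP cell", "type F enteroendocrine cell"),
    ("GIP cell", "type K enteroendocrine cell"),
]


def ratio(a: str, b: str) -> float:
    """0-100 similarity score based on difflib's 2 * M / T ratio."""
    return SequenceMatcher(None, a, b).ratio() * 100


def best_match(query: str, use_synonyms: bool = True) -> tuple[str, float]:
    """Return (name, score) of the best-scoring record.

    With use_synonyms=True, a record's score is the maximum over its name and
    each synonym. With use_synonyms=False, only the name is compared.
    """
    best_name, best_score = "", -1.0
    for name, synonyms in RECORDS:
        candidates = [name]
        if use_synonyms and synonyms:
            candidates += synonyms.split("|")
        score = max(ratio(query, c) for c in candidates)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


name, score = best_match("P cell")
print(name, round(score, 6))  # nodal myocyte 100.0
name, score = best_match("P cell", use_synonyms=False)
print(name, round(score, 6))  # PP cell 92.307692
```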

To return all results ranked by matching ratio, rather than only the best match, pass return_ranked_results=True:

ct.fuzzy_match("P cell", ct.name, return_ranked_results=True).head()
ontology_id definition synonyms children __ratio__
name
nodal myocyte CL:0002072 A Specialized Cardiac Myocyte In The Sinoatria... cardiac pacemaker cell|myocytus nodalis|P cell [CL:1000410, CL:1000409] 100.000000
double-positive, alpha-beta thymocyte CL:0000809 A Thymocyte Expressing The Alpha-Beta T Cell R... double-positive, alpha-beta immature T lymphoc... [CL:0002427, CL:0002428, CL:0002429, CL:000243... 92.307692
PP cell CL:0000696 A Cell That Stores And Secretes Pancreatic Pol... type F enteroendocrine cell [CL:0002680] 92.307692
pigmented ciliary epithelial cell CL:0002303 A Cell That Is Part Of Pigmented Ciliary Epith... PE cell [] 92.307692
GIP cell CL:0002278 An Enteroendocrine Cell Of Duodenum And Jejunu... type K enteroendocrine cell [] 85.714286
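Ranking comes down to scoring every candidate and sorting by score in descending order. The sketch below uses the same difflib-based ratio assumption in pure Python, with no pandas. `ranked` is a hypothetical helper that approximates `return_ranked_results=True` followed by `.head()`. It compares names only, so it misses matches that come through synonyms, such as nodal myocyte above:

```python
from difflib import SequenceMatcher

# Hypothetical candidate names taken from the outputs on this page.
NAMES = ["T cell", "B cell", "PP cell", "GIP cell", "nodal myocyte"]


def ratio(a, b):
    """0-100 similarity score based on difflib's 2 * M / T ratio."""
    return SequenceMatcher(None, a, b).ratio() * 100


def ranked(query, names, n=5):
    """Score every name and return the top n (name, score) pairs, best first."""
    scored = [(name, ratio(query, name)) for name in names]
    # sorted() is stable even with reverse=True, so ties keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


for name, score in ranked("P cell", NAMES, n=3):
    print(name, round(score, 6))
# PP cell 92.307692
# GIP cell 85.714286
# T cell 83.333333
```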

If several results tie for the best ratio, all of them are returned:

ct.fuzzy_match("A cell", ct.name, synonyms_field=None)
ontology_id definition synonyms children __ratio__
name
T cell CL:0000084 A Type Of Lymphocyte Whose Defining Characteri... T-lymphocyte|T lymphocyte|T-cell [CL:0002419, CL:0000789, CL:0000798, CL:0002420] 83.333333
B cell CL:0000236 A Lymphocyte Of B Lineage That Is Capable Of B... B lymphocyte|B-cell|B-lymphocyte [CL:0009114, CL:0001201] 83.333333
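To return ties, keep every candidate whose score equals the top score instead of only the first maximum. The sketch below assumes the difflib-based ratio again and scores names only. `best_with_ties` is a hypothetical helper, not bionty's implementation. "A cell" differs from both "T cell" and "B cell" by one character, so both score 2 * 5 / 12 * 100:

```python
import math
from difflib import SequenceMatcher

# Hypothetical candidate names taken from the outputs on this page.
NAMES = ["T cell", "B cell", "PP cell", "GIP cell"]


def ratio(a, b):
    """0-100 similarity score based on difflib's 2 * M / T ratio."""
    return SequenceMatcher(None, a, b).ratio() * 100


def best_with_ties(query, names):
    """Return (all names sharing the top score, top score)."""
    scores = {name: ratio(query, name) for name in names}
    top = max(scores.values())
    # isclose guards against floating-point noise when comparing scores.
    winners = [name for name, s in scores.items() if math.isclose(s, top)]
    return winners, top


winners, top = best_with_ties("A cell", NAMES)
print(winners, round(top, 6))  # ['T cell', 'B cell'] 83.333333
```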