StructSense Pipeline Upgrade: Specialized Tools, Robust Chunking, and BioPortal Integration #62
tekrajchhetri wants to merge 97 commits into main
Note: to disable traces run "crewai traces disable"
…han just spacy NER
sample output
{
"text": "photopic spectral sensitivity curves",
"label": "CONCEPT",
"occurrences": [
{
"start": 51,
"end": 87,
"global_start": 1127,
"global_end": 1163,
"sentence": "Recently, electroretinogram (ERG) responses of the photopic spectral sensitivity curves of photoreceptors of rats and mice were measured throughout the UV-visible spectrum (300-700 nm) (Rocha et al., 2016)."
}
]
},
{
"text": "photoreceptors",
"label": "ANATOMICAL-CONCEPT",
"occurrences": [
{
"start": 91,
"end": 105,
"global_start": 1167,
"global_end": 1181,
"sentence": "Recently, electroretinogram (ERG) responses of the photopic spectral sensitivity curves of photoreceptors of rats and mice were measured throughout the UV-visible spectrum (300-700 nm) (Rocha et al., 2016)."
}
]
},
{
"text": "rats and mice",
"label": "ORGANISM",
"occurrences": [
{
"start": 109,
"end": 122,
"global_start": 1185,
"global_end": 1198,
"sentence": "Recently, electroretinogram (ERG) responses of the photopic spectral sensitivity curves of photoreceptors of rats and mice were measured throughout the UV-visible spectrum (300-700 nm) (Rocha et al., 2016)."
}
]
},
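The two offset pairs in each occurrence appear to use different origins: `start`/`end` index into `sentence`, while `global_start`/`global_end` index into the whole document. The Python sketch below checks that reading against the values shown above. It is only a consistency check on this sample, not StructSense code, and the helper name `sentence_offset` is hypothetical.

```python
# Values copied from the sample output above.
SENTENCE = (
    "Recently, electroretinogram (ERG) responses of the photopic spectral "
    "sensitivity curves of photoreceptors of rats and mice were measured "
    "throughout the UV-visible spectrum (300-700 nm) (Rocha et al., 2016)."
)

OCCURRENCES = [
    {"text": "photopic spectral sensitivity curves",
     "start": 51, "end": 87, "global_start": 1127, "global_end": 1163},
    {"text": "photoreceptors",
     "start": 91, "end": 105, "global_start": 1167, "global_end": 1181},
    {"text": "rats and mice",
     "start": 109, "end": 122, "global_start": 1185, "global_end": 1198},
]


def sentence_offset(occ):
    """Document position of the sentence implied by one occurrence (hypothetical helper)."""
    return occ["global_start"] - occ["start"]


for occ in OCCURRENCES:
    # Sentence-relative offsets slice the entity text out of the sentence.
    assert SENTENCE[occ["start"]:occ["end"]] == occ["text"]
    # The sentence-relative and document-relative spans have the same length.
    assert occ["end"] - occ["start"] == occ["global_end"] - occ["global_start"]

# All three occurrences share one sentence, so they imply one sentence offset.
offsets = {sentence_offset(occ) for occ in OCCURRENCES}
```

A check like this can flag occurrences whose offsets do not line up with the quoted sentence.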
…loading of models.
I was thinking that tutorial/Readme.md could be used in the But I would also include the information about everything else that has to be run before running the structsense command, e.g., edit
@djarecka After your comment, I had put the content of
first comments based on testing tutorial:
pyproject.toml: Add en_core_web_sm as a direct dependency via URL so the model is installed at project setup time rather than downloaded at runtime via spacy.cli.download(), which uses pip internally and fails in non-pip environments (e.g. uv). Also narrow litellm version upper bound from <1.80.0 to <1.60.0 to avoid a regression where newer versions accidentally import litellm.proxy.proxy_server, pulling in proxy-only dependencies (apscheduler) even when the proxy is not used. ner_tool.py: Remove automatic spaCy model download at runtime; raise an explicit error if the model is not installed, prompting the user to install it manually.
@tekrajchhetri @djarecka, I have made a few changes to resolve some of the issues I faced when running. #64
pyproject.toml: Tighten spacy version constraint from ^3.8.11 to >=3.8.11,<3.9.0 to ensure compatibility with the pinned en_core_web_sm-3.8.0 model, which requires spaCy 3.8.x. ner_tool.py: Update error message to direct users to run 'poetry install' or 'uv sync' instead of 'python -m spacy download', consistent with the model now being a project dependency.
@tekrajchhetri - please take a look. As I mentioned before, I think the dependencies were not updated; I had to install
@tekrajchhetri - I'm wondering how many entities you would expect when running the examples from the tutorial (i.e., I also ran it on a longer paper that @puja-trivedi was testing, and I got up to 3
@djarecka I did not run in non-chunking mode.
Do you think chunking should be required with such a short text? But I also ran with chunking, i.e., Content of resource_extraction_example_chunk_2.json
@djarecka for the resource extraction, the output looks like what you've shared. It's not NER. And regarding chunking, if the provided text is small, it will not be processed in chunked mode.
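The chunking behaviour described in this comment could be sketched roughly as follows. This is only an illustration, assuming a simple character-count threshold. The names `prepare_input`, `split_into_chunks`, and `max_chars`, and the default of 4000, are hypothetical and not taken from the StructSense code, which may use a different criterion and splitter.

```python
from __future__ import annotations


def split_into_chunks(text: str, max_chars: int) -> list[str]:
    """Naive fixed-width splitter, used here only for illustration."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def prepare_input(text: str, max_chars: int = 4000) -> list[str]:
    """Return the text unchanged when it is short, several chunks otherwise.

    max_chars is an assumed threshold, not a documented StructSense setting.
    """
    if len(text) <= max_chars:
        # Short input: processed in a single pass, chunked mode is skipped.
        return [text]
    return split_into_chunks(text, max_chars)
```

Under this assumption a short tutorial text always yields a single item, which would match the `"total_chunks": 1` seen in small runs.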
It looks like the resource config explicitly states:
Not clear why it should be a single resource entry per input.
@satra That needs to be updated, but I don't think it has much impact: it's taken from the old config, and we were able to extract multiple resources in the past and still can now. For example, the output from the current implementation. @djarecka
pyproject.toml: Remove en-core-web-sm direct dependency; the spaCy model will be downloaded at runtime via spacy.cli.download() instead, which works in uv environments by running 'uv add pip'. ner_tool.py: Restore spacy.cli.download() fallback and re-add the from spacy.cli import download import.
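The two loading strategies discussed in these commits (fail with an explicit error, or fall back to a runtime download) can be sketched as below. This is not the code in ner_tool.py. The function name `load_spacy_model` and the parameters `allow_download`, `loader`, and `downloader` are hypothetical and exist only to make the sketch testable without spaCy installed. The only spaCy calls used are `spacy.load` and `spacy.cli.download`.

```python
def load_spacy_model(name="en_core_web_sm", allow_download=True,
                     loader=None, downloader=None):
    """Load a spaCy model, optionally downloading it if it is missing.

    loader/downloader are injection points for this sketch (hypothetical);
    by default they resolve to spacy.load and spacy.cli.download.
    """
    if loader is None:
        import spacy  # imported lazily so the sketch runs with stand-ins
        loader = spacy.load
    try:
        return loader(name)
    except OSError as exc:  # spacy.load raises OSError when the model is not installed
        if not allow_download:
            # Strict variant: explicit error instead of a runtime download.
            raise RuntimeError(
                f"spaCy model '{name}' is not installed; "
                "run 'poetry install' or 'uv sync'."
            ) from exc
        if downloader is None:
            from spacy.cli import download as downloader
        # Fallback variant: spacy.cli.download uses pip internally, so in a uv
        # environment pip has to be available (for example via 'uv add pip').
        downloader(name)
        return loader(name)
```

Calling `load_spacy_model(allow_download=False)` corresponds to the strict behaviour from the earlier commit, and the default corresponds to the restored fallback.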
Update dependencies and spaCy model loading
…into ups/improvement
@tekrajchhetri - coming back to the tutorial, since I'm trying to understand what you want to show there and what would be useful, I have a few comments:
@djarecka What you ran and the output you got are okay. My only point is that the reason why the output is not exhaustive is because the Regarding the config file, what do you mean by how I came up with it? It follows the crew.ai standard, and it's the same old config files, just updated to remove things that are no longer necessary. Below is the NER for
{
"entities": [
{
"entity": "TH94252310589",
"label": "ORGANIZATION",
"start": 30,
"end": 43,
"weighted_score": 0.565,
"model_count": 2,
"occurrences": [
{
"start": 30,
"end": 38,
"global_start": 30,
"global_end": 38,
"sentence": "Untitled Section 1\n['Defense (TH94252310589) to DJZ and VEK."
},
{
"start": 30,
"end": 43,
"global_start": 30,
"global_end": 43,
"sentence": "Untitled Section 1\n['Defense (TH94252310589) to DJZ and VEK."
}
],
"provenance": [
{
"label": "ORGANIZATION",
"vote_weight": 3.9,
"sources": [
{
"source_model": "llm_ner",
"weight": 3.9,
"entity": "TH94252310589"
}
]
},
{
"label": "bio",
"vote_weight": 3.0,
"sources": [
{
"source_model": "mobashgr/BC5CDR-chem-WLT-384-BioELECTRA-Pubmed-ENS-20-5",
"weight": 3.0,
"entity": "th942523"
}
]
}
],
"ontology_id": null,
"ontology_label": null,
"ontology": null,
"concept_mapping_provenance": "tool",
"judge_score": 0.8,
"remarks": "Entity correctly identified as an organization, but more context could clarify its role.",
"label_ontology_id": "http://purl.obolibrary.org/obo/OBI_0000245",
"label_ontology_label": "organization",
"label_ontology": "CCONT",
"label_concept_mapping_provenance": "tool"
},
{
"entity": "DJZ",
"label": "PERSON",
"start": 48,
"end": 51,
"weighted_score": 0.796,
"model_count": 2,
"occurrences": [
{
"start": 48,
"end": 51,
"global_start": 48,
"global_end": 51,
"sentence": "Untitled Section 1\n['Defense (TH94252310589) to DJZ and VEK."
}
],
"provenance": [
{
"label": "PERSON",
"vote_weight": 3.9,
"sources": [
{
"source_model": "llm_ner",
"weight": 3.9,
"entity": "DJZ"
}
]
},
{
"label": "ORG",
"vote_weight": 1.0,
"sources": [
{
"source_model": "en_core_web_sm",
"weight": 1.0,
"entity": "DJZ"
}
]
}
],
"ontology_id": null,
"ontology_label": null,
"ontology": null,
"concept_mapping_provenance": "tool",
"judge_score": 0.9,
"remarks": "Strong identification of DJZ as a person, supported by context, yet clarity in role could enhance understanding.",
"label_ontology_id": "http://www.semanticweb.org/mca/ontologies/2018/8/untitled-ontology-47#Person",
"label_ontology_label": "Person",
"label_ontology": "IBD",
"label_concept_mapping_provenance": "tool"
},
{
"entity": "VEK",
"label": "PERSON",
"start": 56,
"end": 59,
"weighted_score": 0.796,
"model_count": 2,
"occurrences": [
{
"start": 56,
"end": 59,
"global_start": 56,
"global_end": 59,
"sentence": "Untitled Section 1\n['Defense (TH94252310589) to DJZ and VEK."
}
],
"provenance": [
{
"label": "PERSON",
"vote_weight": 3.9,
"sources": [
{
"source_model": "llm_ner",
"weight": 3.9,
"entity": "VEK"
}
]
},
{
"label": "ORG",
"vote_weight": 1.0,
"sources": [
{
"source_model": "en_core_web_sm",
"weight": 1.0,
"entity": "VEK"
}
]
}
],
"ontology_id": null,
"ontology_label": null,
"ontology": null,
"concept_mapping_provenance": "tool",
"judge_score": 0.9,
"remarks": "VEK is well identified as a person, contextually supported, though further role clarification is needed.",
"label_ontology_id": "http://www.semanticweb.org/mca/ontologies/2018/8/untitled-ontology-47#Person",
"label_ontology_label": "Person",
"label_ontology": "IBD",
"label_concept_mapping_provenance": "tool"
},
{
"entity": "imaging",
"label": "Diagnostic_procedure",
"start": 61,
"end": 68,
"weighted_score": 1.0,
"model_count": 1,
"occurrences": [
{
"start": 0,
"end": 7,
"global_start": 61,
"global_end": 68,
"sentence": "Imaging was performed using the MICA Leica microscope in the Division of Neuropathology, supported by the Johns Hopkins Alzheimers Disease Research Center (ADRC; P30 AG066507), and support was also provided by P30 EY001765."
}
],
"provenance": [
{
"label": "Diagnostic_procedure",
"vote_weight": 5.0,
"sources": [
{
"source_model": "d4data/biomedical-ner-all",
"weight": 5.0,
"entity": "imaging"
}
]
}
],
"ontology_id": "http://purl.bioontology.org/ontology/PMR.owl#Imaging",
"ontology_label": "Imaging",
"ontology": "PMR",
"concept_mapping_provenance": "tool",
"judge_score": 1.0,
"remarks": "Well-defined as a diagnostic procedure with strong supportive context.",
"label_ontology_id": "http://www.semanticweb.org/ontologies/2010/10/BPO.owl#diagnostic_procedure",
"label_ontology_label": "diagnostic_procedure",
"label_ontology": "BHO",
"label_concept_mapping_provenance": "tool"
},
{
"entity": "mica leica microscope",
"label": "Detailed_description",
"start": 93,
"end": 114,
"weighted_score": 0.505,
"model_count": 3,
"occurrences": [
{
"start": 32,
"end": 53,
"global_start": 93,
"global_end": 114,
"sentence": "Imaging was performed using the MICA Leica microscope in the Division of Neuropathology, supported by the Johns Hopkins Alzheimers Disease Research Center (ADRC; P30 AG066507), and support was also provided by P30 EY001765."
}
],
"provenance": [
{
"label": "Detailed_description",
"vote_weight": 5.0,
"sources": [
{
"source_model": "d4data/biomedical-ner-all",
"weight": 5.0,
"entity": "mica leica microscope"
}
]
},
{
"label": "CELL_TYPE",
"vote_weight": 3.9,
"sources": [
{
"source_model": "llm_ner",
"weight": 3.9,
"entity": "MICA Leica microscope"
}
]
},
{
"label": "PRODUCT",
"vote_weight": 1.0,
"sources": [
{
"source_model": "en_core_web_sm",
"weight": 1.0,
"entity": "Leica"
}
]
}
],
"ontology_id": "http://purl.obolibrary.org/obo/GAZ_00382380",
"ontology_label": "Leica",
"ontology": "GAZ",
"concept_mapping_provenance": "tool",
"judge_score": 0.7,
"remarks": "Described adequately but lacks specificity in context as a detailed description.",
"label_ontology_id": null,
"label_ontology_label": null,
"label_ontology": null,
"label_concept_mapping_provenance": "tool"
},
{
"entity": "Division of Neuropathology",
"label": "ORGANIZATION",
"start": 118,
"end": 148,
"weighted_score": 0.796,
"model_count": 2,
"occurrences": [
{
"start": 57,
"end": 87,
"global_start": 118,
"global_end": 148,
"sentence": "Imaging was performed using the MICA Leica microscope in the Division of Neuropathology, supported by the Johns Hopkins Alzheimers Disease Research Center (ADRC; P30 AG066507), and support was also provided by P30 EY001765."
}
],
"provenance": [
{
"label": "ORGANIZATION",
"vote_weight": 3.9,
"sources": [
{
"source_model": "llm_ner",
"weight": 3.9,
"entity": "Division of Neuropathology"
}
]
},
{
"label": "ORG",
"vote_weight": 1.0,
"sources": [
{
"source_model": "en_core_web_sm",
"weight": 1.0,
"entity": "the Division of Neuropathology"
}
]
}
],
"ontology_id": "http://sbmi.uth.tmc.edu/ontology/ochv#C0876934",
"ontology_label": "neuropathology",
"ontology": "OCHV",
"concept_mapping_provenance": "tool",
"judge_score": 0.9,
"remarks": "Accurately identified as an organization with substantial contextual support.",
"label_ontology_id": "http://purl.obolibrary.org/obo/OBI_0000245",
"label_ontology_label": "organization",
"label_ontology": "CCONT",
"label_concept_mapping_provenance": "tool"
},
{
"entity": "Johns Hopkins Alzheimers Disease Research Center",
"label": "ORGANIZATION",
"start": 167,
"end": 215,
"weighted_score": 0.661,
"model_count": 2,
"occurrences": [
{
"start": 106,
"end": 154,
"global_start": 167,
"global_end": 215,
"sentence": "Imaging was performed using the MICA Leica microscope in the Division of Neuropathology, supported by the Johns Hopkins Alzheimers Disease Research Center (ADRC; P30 AG066507), and support was also provided by P30 EY001765."
}
],
"provenance": [
{
"label": "ORGANIZATION",
"vote_weight": 3.9,
"sources": [
{
"source_model": "llm_ner",
"weight": 3.9,
"entity": "Johns Hopkins Alzheimers Disease Research Center"
}
]
},
{
"label": "bio",
"vote_weight": 2.0,
"sources": [
{
"source_model": "mobashgr/NCBI-disease-WLT-256-SciBERT-13INS",
"weight": 2.0,
"entity": "alzheimers disease"
}
]
}
],
"ontology_id": "http://purl.obolibrary.org/obo/GAZ_00145328",
"ontology_label": "Johns Hopkins Glacier",
"ontology": "GAZ",
"concept_mapping_provenance": "tool",
"judge_score": 0.8,
"remarks": "Identified as an organization with relevant context but could benefit from further specificity.",
"label_ontology_id": "http://purl.obolibrary.org/obo/OBI_0000245",
"label_ontology_label": "organization",
"label_ontology": "CCONT",
"label_concept_mapping_provenance": "tool"
},
{
"entity": "ADRC",
"label": "ORGANIZATION",
"start": 217,
"end": 221,
"weighted_score": 1.0,
"model_count": 1,
"occurrences": [
{
"start": 156,
"end": 160,
"global_start": 217,
"global_end": 221,
"sentence": "Imaging was performed using the MICA Leica microscope in the Division of Neuropathology, supported by the Johns Hopkins Alzheimers Disease Research Center (ADRC; P30 AG066507), and support was also provided by P30 EY001765."
}
],
"provenance": [
{
"label": "ORGANIZATION",
"vote_weight": 3.9,
"sources": [
{
"source_model": "llm_ner",
"weight": 3.9,
"entity": "ADRC"
}
]
}
],
"ontology_id": "http://purl.bioontology.org/ontology/PDQ/CDR0000589135",
"ontology_label": "adipose-derived regenerative cells",
"ontology": "PDQ",
"concept_mapping_provenance": "tool",
"judge_score": 1.0,
"remarks": "Accurately identified and well-contextualized as an organization.",
"label_ontology_id": "http://purl.obolibrary.org/obo/OBI_0000245",
"label_ontology_label": "organization",
"label_ontology": "CCONT",
"label_concept_mapping_provenance": "tool"
},
{
"entity": "P30 AG066507",
"label": "GRANT_NUMBER",
"start": 223,
"end": 235,
"weighted_score": 0.796,
"model_count": 2,
"occurrences": [
{
"start": 162,
"end": 165,
"global_start": 223,
"global_end": 226,
"sentence": "Imaging was performed using the MICA Leica microscope in the Division of Neuropathology, supported by the Johns Hopkins Alzheimers Disease Research Center (ADRC; P30 AG066507), and support was also provided by P30 EY001765."
}
],
"provenance": [
{
"label": "GRANT_NUMBER",
"vote_weight": 3.9,
"sources": [
{
"source_model": "llm_ner",
"weight": 3.9,
"entity": "P30 AG066507"
}
]
},
{
"label": "ORG",
"vote_weight": 1.0,
"sources": [
{
"source_model": "en_core_web_sm",
"weight": 1.0,
"entity": "P30"
}
]
}
],
"ontology_id": "http://purl.bioontology.org/ontology/MESH/C106367",
"ontology_label": "cytochrome p30",
"ontology": "MESH",
"concept_mapping_provenance": "tool",
"judge_score": 0.9,
"remarks": "Correctly identified as a grant number with solid context provided.",
"label_ontology_id": null,
"label_ontology_label": null,
"label_ontology": null,
"label_concept_mapping_provenance": "tool"
},
{
"entity": "P30 EY001765",
"label": "GRANT_NUMBER",
"start": 271,
"end": 283,
"weighted_score": 0.494,
"model_count": 3,
"occurrences": [
{
"start": 210,
"end": 222,
"global_start": 271,
"global_end": 283,
"sentence": "Imaging was performed using the MICA Leica microscope in the Division of Neuropathology, supported by the Johns Hopkins Alzheimers Disease Research Center (ADRC; P30 AG066507), and support was also provided by P30 EY001765."
}
],
"provenance": [
{
"label": "GRANT_NUMBER",
"vote_weight": 3.9,
"sources": [
{
"source_model": "llm_ner",
"weight": 3.9,
"entity": "P30 EY001765"
}
]
},
{
"label": "bio",
"vote_weight": 3.0,
"sources": [
{
"source_model": "mobashgr/BC5CDR-chem-WLT-384-BioELECTRA-Pubmed-ENS-20-5",
"weight": 3.0,
"entity": "p30 ey001765"
}
]
},
{
"label": "PRODUCT",
"vote_weight": 1.0,
"sources": [
{
"source_model": "en_core_web_sm",
"weight": 1.0,
"entity": "P30 EY001765"
}
]
}
],
"ontology_id": "http://purl.bioontology.org/ontology/MESH/C106367",
"ontology_label": "cytochrome p30",
"ontology": "MESH",
"concept_mapping_provenance": "tool",
"judge_score": 0.6,
"remarks": "Partially identified as a grant number, though context could be improved for clarity.",
"label_ontology_id": null,
"label_ontology_label": null,
"label_ontology": null,
"label_concept_mapping_provenance": "tool"
},
{
"entity": "ax",
"label": "Disease_disorder",
"start": 502,
"end": 504,
"weighted_score": 1.0,
"model_count": 1,
"occurrences": [
{
"start": 81,
"end": 83,
"global_start": 502,
"global_end": 504,
"sentence": "]\n\nSignificance\n['Homotypic collateral sprouting -the process by which uninjured axons from the same neuronal source extend new branches to reinnervate targets deprived of their original connections-is a fundamental yet understudied mechanism for CNS repair following injury."
}
],
"provenance": [
{
"label": "Disease_disorder",
"vote_weight": 5.0,
"sources": [
{
"source_model": "d4data/biomedical-ner-all",
"weight": 5.0,
"entity": "ax"
}
]
}
],
"ontology_id": "http://purl.bioontology.org/ontology/HCPCS/AX",
"ontology_label": "Item furnished in conjunction with dialysis services",
"ontology": "HCPCS",
"concept_mapping_provenance": "tool",
"judge_score": 1.0,
"remarks": "Well-defined as a disease/disorder, clearly supported by context.",
"label_ontology_id": null,
"label_ontology_label": null,
"label_ontology": null,
"label_concept_mapping_provenance": "tool"
},
{
"entity": "source",
"label": "Detailed_description",
"start": 531,
"end": 537,
"weighted_score": 1.0,
"model_count": 1,
"occurrences": [
{
"start": 110,
"end": 116,
"global_start": 531,
"global_end": 537,
"sentence": "]\n\nSignificance\n['Homotypic collateral sprouting -the process by which uninjured axons from the same neuronal source extend new branches to reinnervate targets deprived of their original connections-is a fundamental yet understudied mechanism for CNS repair following injury."
}
],
"provenance": [
{
"label": "Detailed_description",
"vote_weight": 5.0,
"sources": [
{
"source_model": "d4data/biomedical-ner-all",
"weight": 5.0,
"entity": "source"
}
]
}
],
"ontology_id": "http://gbol.life/0.1/Source",
"ontology_label": "Source",
"ontology": "GBOL",
"concept_mapping_provenance": "tool",
"judge_score": 1.0,
"remarks": "Successfully identified as a detailed description with strong contextual links.",
"label_ontology_id": null,
"label_ontology_label": null,
"label_ontology": null,
"label_concept_mapping_provenance": "tool"
},
{
"entity": "branches",
"label": "Detailed_description",
"start": 549,
"end": 557,
"weighted_score": 1.0,
"model_count": 1,
"occurrences": [
{
"start": 128,
"end": 136,
"global_start": 549,
"global_end": 557,
"sentence": "]\n\nSignificance\n['Homotypic collateral sprouting -the process by which uninjured axons from the same neuronal source extend new branches to reinnervate targets deprived of their original connections-is a fundamental yet understudied mechanism for CNS repair following injury."
}
],
"provenance": [
{
"label": "Detailed_description",
"vote_weight": 5.0,
"sources": [
{
"source_model": "d4data/biomedical-ner-all",
"weight": 5.0,
"entity": "branches"
}
]
}
],
"ontology_id": "http://www.semanticweb.org/Terrorism#Branches",
"ontology_label": "Branches",
"ontology": "INTO",
"concept_mapping_provenance": "tool",
"judge_score": 1.0,
"remarks": "Correctly identified with relevant context supporting its description.",
"label_ontology_id": null,
"label_ontology_label": null,
"label_ontology": null,
"label_concept_mapping_provenance": "tool"
},
{
"entity": "CNS",
"label": "ORGANIZATION",
"start": 668,
"end": 671,
"weighted_score": 0.796,
"model_count": 2,
"occurrences": [
{
"start": 247,
"end": 250,
"global_start": 668,
"global_end": 671,
"sentence": "]\n\nSignificance\n['Homotypic collateral sprouting -the process by which uninjured axons from the same neuronal source extend new branches to reinnervate targets deprived of their original connections-is a fundamental yet understudied mechanism for CNS repair following injury."
}
],
"provenance": [
{
"label": "ORGANIZATION",
"vote_weight": 3.9,
"sources": [
{
"source_model": "llm_ner",
"weight": 3.9,
"entity": "CNS"
}
]
},
{
"label": "ORG",
"vote_weight": 1.0,
"sources": [
{
"source_model": "en_core_web_sm",
"weight": 1.0,
"entity": "CNS"
}
]
}
],
"ontology_id": "http://purl.jp/bio/4/id/200906047768068685",
"ontology_label": "CNS",
"ontology": "IOBC",
"concept_mapping_provenance": "tool",
"judge_score": 0.7,
"remarks": "Identified as an organization with relevant associations, though clarity of context could improve.",
"label_ontology_id": "http://purl.obolibrary.org/obo/OBI_0000245",
"label_ontology_label": "organization",
"label_ontology": "CCONT",
"label_concept_mapping_provenance": "tool"
},
{
"entity": "repair",
"label": "Sign_symptom",
"start": 672,
"end": 678,
"weighted_score": 1.0,
"model_count": 1,
"occurrences": [
{
"start": 251,
"end": 257,
"global_start": 672,
"global_end": 678,
"sentence": "]\n\nSignificance\n['Homotypic collateral sprouting -the process by which uninjured axons from the same neuronal source extend new branches to reinnervate targets deprived of their original connections-is a fundamental yet understudied mechanism for CNS repair following injury."
}
],
"provenance": [
{
"label": "Sign_symptom",
"vote_weight": 5.0,
"sources": [
{
"source_model": "d4data/biomedical-ner-all",
"weight": 5.0,
"entity": "repair"
}
]
}
],
"ontology_id": "http://www.projecthalo.com/aura#Repair",
"ontology_label": "Repair",
"ontology": "AURA",
"concept_mapping_provenance": "tool",
"judge_score": 1.0,
"remarks": "Well-defined and contextualized as a sign/symptom of a process.",
"label_ontology_id": null,
"label_ontology_label": null,
"label_ontology": null,
"label_concept_mapping_provenance": "tool"
},
{
"entity": "injury",
"label": "Disease_disorder",
"start": 689,
"end": 695,
"weighted_score": 1.0,
"model_count": 1,
"occurrences": [
{
"start": 268,
"end": 274,
"global_start": 689,
"global_end": 695,
"sentence": "]\n\nSignificance\n['Homotypic collateral sprouting -the process by which uninjured axons from the same neuronal source extend new branches to reinnervate targets deprived of their original connections-is a fundamental yet understudied mechanism for CNS repair following injury."
}
],
"provenance": [
{
"label": "Disease_disorder",
"vote_weight": 5.0,
"sources": [
{
"source_model": "d4data/biomedical-ner-all",
"weight": 5.0,
"entity": "injury"
}
]
}
],
"ontology_id": "http://www.icn.ch/icnp#Injury",
"ontology_label": "Injury",
"ontology": "ICNP",
"concept_mapping_provenance": "tool",
"judge_score": 1.0,
"remarks": "Clearly defined as a disease/disorder, strongly supported by context.",
"label_ontology_id": null,
"label_ontology_label": null,
"label_ontology": null,
"label_concept_mapping_provenance": "tool"
},
{
"entity": "homo",
"label": "Detailed_description",
"start": 772,
"end": 776,
"weighted_score": 1.0,
"model_count": 1,
"occurrences": [
{
"start": 75,
"end": 79,
"global_start": 772,
"global_end": 776,
"sentence": "Unlike heterotypic sprouting, involving sprouting from unrelated pathways, homotypic sprouting offers potential to restore circuit architecture after partial lesions."
}
],
"provenance": [
{
"label": "Detailed_description",
"vote_weight": 10.0,
"sources": [
{
"source_model": "d4data/biomedical-ner-all",
"weight": 5.0,
"entity": "homo"
},
{
"source_model": "d4data/biomedical-ner-all",
"weight": 5.0,
"entity": "homo"
}
]
}
],
"ontology_id": "http://sbmi.uth.tmc.edu/ontology/ochv#53984",
"ontology_label": "homo",
"ontology": "OCHV",
"concept_mapping_provenance": "tool",
"judge_score": 1.0,
"remarks": "Successfully identified as a detailed description with relevant supporting evidence.",
"label_ontology_id": null,
"label_ontology_label": null,
"label_ontology": null,
"label_concept_mapping_provenance": "tool"
},
{
"entity": "diffuse",
"label": "Detailed_description",
"start": 893,
"end": 914,
"weighted_score": 0.714,
"model_count": 2,
"occurrences": [
{
"start": 29,
"end": 36,
"global_start": 893,
"global_end": 900,
"sentence": "Here, we employed a model of diffuse axonal injury in the mouse visual system to examine this mechanism."
}
],
"provenance": [
{
"label": "Detailed_description",
"vote_weight": 5.0,
"sources": [
{
"source_model": "d4data/biomedical-ner-all",
"weight": 5.0,
"entity": "diffuse"
}
]
},
{
"label": "bio",
"vote_weight": 2.0,
"sources": [
{
"source_model": "mobashgr/NCBI-disease-WLT-256-SciBERT-13INS",
"weight": 2.0,
"entity": "diffuse axonal injury"
}
]
}
],
"ontology_id": "http://www.co-ode.org/ontologies/galen#diffuse",
"ontology_label": "diffuse",
"ontology": "GALEN",
"concept_mapping_provenance": "tool",
"judge_score": 0.8,
"remarks": "Identified reasonably well as a detailed description but lacks full context clarity.",
"label_ontology_id": null,
"label_ontology_label": null,
"label_ontology": null,
"label_concept_mapping_provenance": "tool"
},
{
"entity": "mouse visual system",
"label": "BRAIN_REGION",
"start": 922,
"end": 941,
"weighted_score": 1.0,
"model_count": 1,
"occurrences": [
{
"start": 58,
"end": 77,
"global_start": 922,
"global_end": 941,
"sentence": "Here, we employed a model of diffuse axonal injury in the mouse visual system to examine this mechanism."
}
],
"provenance": [
{
"label": "BRAIN_REGION",
"vote_weight": 3.9,
"sources": [
{
"source_model": "llm_ner",
"weight": 3.9,
"entity": "mouse visual system"
}
]
}
],
"ontology_id": "http://purl.obolibrary.org/obo/PR_Q91V10",
"ontology_label": "visual system homeobox 1 (mouse)",
"ontology": "PR",
"concept_mapping_provenance": "tool",
"judge_score": 1.0,
"remarks": "Clearly defined as a brain region and well-contextualized.",
"label_ontology_id": "http://www.semanticweb.org/rjyy/ontologies/2015/5/ESSO#Brain_Region",
"label_ontology_label": "Brain_Region",
"label_ontology": "ESSO",
"label_concept_mapping_provenance": "tool"
},
{
"entity": "retinal ganglion cell",
"label": "CELL_TYPE",
"start": 1005,
"end": 1026,
"weighted_score": 1.0,
"model_count": 1,
"occurrences": [
{
"start": 36,
"end": 57,
"global_start": 1005,
"global_end": 1026,
"sentence": "Our research demonstrates surviving retinal ganglion cell axons can re-establish terminal fields, achieving structural and functional connectivity."
}
],
"provenance": [
{
"label": "CELL_TYPE",
"vote_weight": 3.9,
"sources": [
{
"source_model": "llm_ner",
"weight": 3.9,
"entity": "retinal ganglion cell"
}
]
}
],
"ontology_id": "http://purl.obolibrary.org/obo/TAO_0009310",
"ontology_label": "retinal ganglion cell",
"ontology": "TAO",
"concept_mapping_provenance": "tool",
"judge_score": 1.0,
"remarks": "Accurately identified as a cell type with good contextual description.",
"label_ontology_id": "http://www.ebi.ac.uk/efo/EFO_0000324",
"label_ontology_label": "cell type",
"label_ontology": "EFO",
"label_concept_mapping_provenance": "tool"
},
{
"entity": "female",
"label": "Detailed_description",
"start": 1173,
"end": 1184,
"weighted_score": 0.562,
"model_count": 2,
"occurrences": [
{
"start": 56,
"end": 62,
"global_start": 1173,
"global_end": 1179,
"sentence": "Importantly, we discovered significant sex differences: female mice showed delayedincomplete recovery compared to males."
}
],
"provenance": [
{
"label": "Detailed_description",
"vote_weight": 5.0,
"sources": [
{
"source_model": "d4data/biomedical-ner-all",
"weight": 5.0,
"entity": "female"
}
]
},
{
"label": "PERSON",
"vote_weight": 3.9,
"sources": [
{
"source_model": "llm_ner",
"weight": 3.9,
"entity": "female mice"
}
]
}
],
"ontology_id": "http://purl.bioontology.org/ontology/PMR.owl#Female",
"ontology_label": "Female",
"ontology": "PMR",
"concept_mapping_provenance": "tool",
"judge_score": 0.7,
"remarks": "Identified as a detailed description; however, more specificity about the context could benefit the entry.",
"label_ontology_id": null,
"label_ontology_label": null,
"label_ontology": null,
"label_concept_mapping_provenance": "tool"
},
{
"entity": "delayed",
"label": "Sign_symptom",
"start": 1192,
"end": 1199,
"weighted_score": 1.0,
"model_count": 1,
"occurrences": [
{
"start": 75,
"end": 82,
"global_start": 1192,
"global_end": 1199,
"sentence": "Importantly, we discovered significant sex differences: female mice showed delayedincomplete recovery compared to males."
}
],
"provenance": [
{
"label": "Sign_symptom",
"vote_weight": 5.0,
"sources": [
{
"source_model": "d4data/biomedical-ner-all",
"weight": 5.0,
"entity": "delayed"
}
]
}
],
"ontology_id": "http://www.icn.ch/icnp#Delayed",
"ontology_label": "Delayed",
"ontology": "ICNP",
"concept_mapping_provenance": "tool",
"judge_score": 1.0,
"remarks": "Well-defined as a sign/symptom with clear contextual backing.",
"label_ontology_id": null,
"label_ontology_label": null,
"label_ontology": null,
"label_concept_mapping_provenance": "tool"
},
{
"entity": "recovery",
"label": "Sign_symptom",
"start": 1210,
"end": 1218,
"weighted_score": 1.0,
"model_count": 1,
"occurrences": [
{
"start": 93,
"end": 101,
"global_start": 1210,
"global_end": 1218,
"sentence": "Importantly, we discovered significant sex differences: female mice showed delayedincomplete recovery compared to males."
}
],
"provenance": [
{
"label": "Sign_symptom",
"vote_weight": 5.0,
"sources": [
{
"source_model": "d4data/biomedical-ner-all",
"weight": 5.0,
"entity": "recovery"
}
]
}
],
"ontology_id": "http://sbmi.uth.tmc.edu/ontology/ochv#C0237820",
"ontology_label": "recovery",
"ontology": "OCHV",
"concept_mapping_provenance": "tool",
"judge_score": 1.0,
"remarks": "Strongly identified as a sign/symptom with appropriate context.",
"label_ontology_id": null,
"label_ontology_label": null,
"label_ontology": null,
"label_concept_mapping_provenance": "tool"
},
{
"entity": "brain",
"label": "Biological_structure",
"start": 1283,
"end": 1288,
"weighted_score": 1.0,
"model_count": 1,
"occurrences": [
{
"start": 45,
"end": 50,
"global_start": 1283,
"global_end": 1288,
"sentence": "These findings provide evidence of repair of brain circuits perturbed by TBI and the role of homotypic sprouting."
}
],
"provenance": [
{
"label": "Biological_structure",
"vote_weight": 5.0,
"sources": [
{
"source_model": "d4data/biomedical-ner-all",
"weight": 5.0,
"entity": "brain"
}
]
}
],
"ontology_id": "http://www.icn.ch/icnp#Brain",
"ontology_label": "Brain",
"ontology": "ICNP",
"concept_mapping_provenance": "tool",
"judge_score": 1.0,
"remarks": "Clearly defined as a biological structure, well-supported by contextual references.",
"label_ontology_id": null,
"label_ontology_label": null,
"label_ontology": null,
"label_concept_mapping_provenance": "tool"
},
{
"entity": "tb",
"label": "Disease_disorder",
"start": 1311,
"end": 1314,
"weighted_score": 0.505,
"model_count": 3,
"occurrences": [
{
"start": 73,
"end": 76,
"global_start": 1311,
"global_end": 1314,
"sentence": "These findings provide evidence of repair of brain circuits perturbed by TBI and the role of homotypic sprouting."
}
],
"provenance": [
{
"label": "Disease_disorder",
"vote_weight": 5.0,
"sources": [
{
"source_model": "d4data/biomedical-ner-all",
"weight": 5.0,
"entity": "tb"
}
]
},
{
"label": "DISEASE",
"vote_weight": 3.9,
"sources": [
{
"source_model": "llm_ner",
"weight": 3.9,
"entity": "TBI"
}
]
},
{
"label": "ORG",
"vote_weight": 1.0,
"sources": [
{
"source_model": "en_core_web_sm",
"weight": 1.0,
"entity": "TBI"
}
]
}
],
"ontology_id": "http://purl.bioontology.org/ontology/HCPCS/TB",
"ontology_label": "Drug or biological acquired with 340b drug pricing program discount, reported for informational purposes for select entities",
"ontology": "HCPCS",
"concept_mapping_provenance": "tool",
"judge_score": 0.7,
"remarks": "Identified as a disease/disorder, but clarification on context could improve understanding.",
"label_ontology_id": null,
"label_ontology_label": null,
"label_ontology": null,
"label_concept_mapping_provenance": "tool"
}
],
"key_terms": [],
"metadata": {
"total_chunks": 1,
"chunks_with_entities": 1,
"entities_before_merge": 41,
"entities_after_merge": 25,
"verification": {
"entities_present": 25,
"entities_dropped": 0,
"entities_dropped_detail": [],
"key_terms_present": 0,
"key_terms_dropped": 0,
"all_entities_present_in_text": true,
"all_key_terms_present_in_text": true
}
},
"verification": {
"entities_present": 25,
"entities_dropped": 0,
"entities_dropped_detail": [],
"key_terms_present": 0,
"key_terms_dropped": 0,
"all_entities_present_in_text": true,
"all_key_terms_present_in_text": true
},
"errors": [],
"task_type": "ner",
"elapsed_time": 558.1336400508881
}
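The `weighted_score` values in this output are numerically consistent with dividing the winning label's `vote_weight` by the sum of all `vote_weight` values for the entity, rounded to three decimals. That is an observation from the numbers above, not a confirmed description of how StructSense computes the score. The function name `weighted_score` below is hypothetical.

```python
def weighted_score(provenance):
    """Return (winning label, share of total vote weight), assuming the rule above."""
    total = sum(group["vote_weight"] for group in provenance)
    winner = max(provenance, key=lambda group: group["vote_weight"])
    return winner["label"], round(winner["vote_weight"] / total, 3)


# Vote weights copied from three entities in the output above.
th_entity = [
    {"label": "ORGANIZATION", "vote_weight": 3.9},
    {"label": "bio", "vote_weight": 3.0},
]
djz = [
    {"label": "PERSON", "vote_weight": 3.9},
    {"label": "ORG", "vote_weight": 1.0},
]
microscope = [
    {"label": "Detailed_description", "vote_weight": 5.0},
    {"label": "CELL_TYPE", "vote_weight": 3.9},
    {"label": "PRODUCT", "vote_weight": 1.0},
]

print(weighted_score(th_entity))   # ('ORGANIZATION', 0.565)
print(weighted_score(djz))         # ('PERSON', 0.796)
print(weighted_score(microscope))  # ('Detailed_description', 0.505)
```

These match the reported scores for "TH94252310589", "DJZ", and "mica leica microscope". Entities with a single label group get 1.0 under the same rule, as in the output.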
But were you planning to add the PDF that you used in the tutorial to the repo?
You added a tutorial to the repo, which is great, but I'm assuming that you want to be able to point people to this tutorial to learn how to use it, even if they haven't used There are also some additional decisions you made about the format of the output, etc. I think some explanation, even a short one, of one entry would be very useful. I could expand the text later, but would like to have something to start from. Right now the tutorial has no description of even the general task you want to accomplish. btw. I edited your comment and added
@djarecka for the output structure, you would need to check structsense/src/utils/postprocessing.py, line 395 in 9c0c9d3.
NER publications:
Resource Extraction:
Pdf2Reproschema:
…into ups/improvement
removing tutorial output
WIP: search and reranking for ontology alignment.
This pull request introduces major improvements to StructSense, including the addition of specialized tools, fixes to the chunking mechanism, and integration with BioPortal as the ontology database.
What’s Included
Issues this PR addresses