
Reworking ECOD data wrangling to use ECOD hierarchy IDs instead of manually constructed ones #19

Open
piehld wants to merge 3 commits into master from dev-ecod-mar2026

Conversation

@piehld
Collaborator

@piehld piehld commented Mar 18, 2026

Previously, the code manually constructed and assigned hierarchy/lineage IDs to the different levels of ECOD annotations (http://prodata.swmed.edu/ecod/af2_pdb/documentation). However, ECOD provides a hierarchy file with predefined IDs that we could use instead: http://prodata.swmed.edu/ecod/distributions/ecod.v294.hierarchy.txt

Another issue with the existing code is that it could overwrite a child's parent ID when the child has multiple parents.
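A minimal sketch of how the overwrite can be avoided, assuming the relationships are available as (child, parent) ID pairs. The function name `build_parent_map` and the sample IDs are hypothetical and not taken from the existing code; the point is only that each child maps to a set of parents rather than a single value.

```python
from collections import defaultdict


def build_parent_map(edges):
    """Collect every parent for each child instead of keeping only the last one seen.

    `edges` is an iterable of (child_id, parent_id) pairs (hypothetical input shape).
    """
    parents = defaultdict(set)
    for child_id, parent_id in edges:
        # A plain dict assignment (parents[child_id] = parent_id) would silently
        # replace an earlier parent; adding to a set keeps all of them.
        parents[child_id].add(parent_id)
    return dict(parents)


# Placeholder IDs for illustration only.
edges = [("c1", "p1"), ("c1", "p2"), ("c2", "p1")]
parent_map = build_parent_map(edges)
```

Here `parent_map["c1"]` holds both `p1` and `p2`, whereas a one-value-per-key dict would have kept only `p2`.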

Also, the parameter names are extremely confusing and are reused with different meanings throughout the code.

This PR is intended to replace the manually constructed ID process with the use of predefined ECOD IDs, allow for multiple parents, and clean up the code.

I got through most of this, but at the last moment realized that I am inadvertently overwriting the mappings between certain ECOD group names and their IDs, because apparently even within a group level (say, X), the same name can be used with a different ID. So, I think the solution will be to create the child: parent mapping based on the dotted ID string: for 1.1.1.1 the parent is 1.1.1, and so forth up the hierarchy. Then, A group IDs (which are of a different format) would need to be included in the mapping where relevant (based on the content of the ECOD flat file).
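A rough sketch of the ID-based approach, assuming each record provides a dotted ID and a name. The helper names (`parent_of`, `build_maps`) and the sample records are hypothetical, and A group IDs are deliberately left out because their format differs. Keying everything by ID rather than by name means two groups that share a name no longer collide.

```python
def parent_of(ecod_id):
    """Return the parent of a dotted ID ("1.1.1.1" -> "1.1.1"), or None at the top level."""
    parts = ecod_id.split(".")
    if len(parts) <= 1:
        return None
    return ".".join(parts[:-1])


def build_maps(records):
    """Build ID-keyed lookups from (ecod_id, name) pairs (hypothetical input shape)."""
    id_to_name = {}
    child_to_parent = {}
    for ecod_id, name in records:
        # Keyed by ID, so identical names under different IDs are kept apart.
        id_to_name[ecod_id] = name
        parent_id = parent_of(ecod_id)
        if parent_id is not None:
            child_to_parent[ecod_id] = parent_id
    return id_to_name, child_to_parent


# Placeholder records: two different IDs that share the same name.
records = [
    ("1", "Group one"),
    ("1.1", "Shared name"),
    ("1.2", "Shared name"),
    ("1.1.1", "Leaf group"),
]
id_to_name, child_to_parent = build_maps(records)
```

With this layout, `1.1` and `1.2` both keep the name "Shared name" without overwriting each other, and the parent of `1.1.1` is derived purely from the string as `1.1`.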

There is still a lot of commented-out code and notes in there, but it will all be removed once I figure out this last bit. I just want to create this PR so I don't lose track of it.
