chore(indexers): 80 rename vectorstore id column to label #144
Open
frayle-ons wants to merge 5 commits into main
lukeroantreeONS (Collaborator) requested changes on Mar 17, 2026
This is done well, and works.
This update is a good point to bring the servers module's naming conventions in sync with the indexers module though, so I've requested a few further changes.
Collaborator
Can you update the columns in the response objects in the servers module to have a 1-1 match with the column names in the indexers module?
See the differences for the same input query here (from search, but to be reflected in embed/reverse_search too).
This would mean:
- rename 'input_label' -> 'query_label'
- add a 'query_text' field at the same level as 'query_label'
- rename 'label' -> 'doc_label'
- rename 'description' -> 'doc_text'
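As a rough illustration of the renames listed above, here is a minimal sketch that converts one old-style response entry to the proposed names. The helper names (`rename_keys`, `migrate_search_entry`), the `"response"` key, and the overall nesting are assumptions made for the example; they are not taken from the servers module.

```python
from typing import Any

# Old -> new field names, copied from the rename list above.
QUERY_RENAMES = {"input_label": "query_label"}
DOC_RENAMES = {"label": "doc_label", "description": "doc_text"}


def rename_keys(record: dict[str, Any], renames: dict[str, str]) -> dict[str, Any]:
    """Return a copy of `record` with keys renamed; other keys pass through unchanged."""
    return {renames.get(key, key): value for key, value in record.items()}


def migrate_search_entry(entry: dict[str, Any], query_text: str) -> dict[str, Any]:
    """Convert one old-style search entry to the proposed naming.

    The entry shape (a query-level dict holding a "response" list of
    document-level dicts) is hypothetical and only used for illustration.
    """
    migrated = rename_keys(entry, QUERY_RENAMES)
    # 'query_text' is added at the same level as 'query_label'.
    migrated["query_text"] = query_text
    migrated["response"] = [
        rename_keys(doc, DOC_RENAMES) for doc in entry.get("response", [])
    ]
    return migrated
```

The same two mapping tables could be reused for the embed and reverse_search responses if they share these field names, which this sketch does not verify.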
frayle-ons (Contributor, Author)
Refactored the Pydantic models for the server in the latest commit.
✨ Summary
These suggested changes update the naming conventions of the VectorStore class. Previously, VectorStores contained row entries with values for ['id', 'text', 'embedding'] (as well as a UUID column). This PR renames the 'id' column to 'label'.

This was proposed for semantic reasons: for most use cases of ClassifAI, a label for each entry in a VectorStore is easier to understand as the relevance/classification label associated with that entry than a row id, which can be confused with the UUID column.

Corresponding to this change in the VectorStore and the vectors.parquet file, the dataclasses have also been updated to refer to the new 'label' name. For example, the VectorStoreSearchResult dataclass previously had a column doc_id, which has now been replaced by doc_label. Several other dataclasses have been updated as well, and this is reflected in new VectorStore and Server code logic for processing the different vectorstore operations.

Note: I updated the dataclasses and vectorstore logic for this PR, and then made changes to the Server module because the data it converts from VectorStore output to an API response has changed. I have not fully reconfigured the API Pydantic models. We may want to consider a rework of the API endpoints we build in the servers module, because the current setup seems to have out-of-date examples and remains very close to the original implementation of the ClassifAI app. I left this out of this PR as it seemed out of scope for the ticket.

📜 Changes Introduced
✅ Checklist
terraform fmt & terraform validate)

🔍 How to Test
Use a standard environment setup with this branch of the repo installed.
I ran through each DEMO notebook, including the server deployment DEMO script, and verified that all the notebook cells and endpoints ran correctly. I adjusted the notebooks for the new dataclass object format.
Running these notebooks or another test script and seeing the VectorStore.search() method return a dataframe with the column 'doc_label' shows the new features working externally. The reverse search method also has a new input object and return object.
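For a quick check outside the notebooks, a minimal sketch along these lines could confirm the column naming. It only inspects a list of column names, so it makes no assumptions about how the VectorStore is constructed; the helper name `find_naming_problems` is hypothetical.

```python
from typing import Iterable

# Column names taken from the PR description: 'doc_id' was replaced by 'doc_label'.
OLD_COLUMN = "doc_id"
NEW_COLUMN = "doc_label"


def find_naming_problems(columns: Iterable[str]) -> list[str]:
    """Return human-readable problems; an empty list means the names look right."""
    names = set(columns)
    problems = []
    if NEW_COLUMN not in names:
        problems.append(f"missing expected column {NEW_COLUMN!r}")
    if OLD_COLUMN in names:
        problems.append(f"old column {OLD_COLUMN!r} is still present")
    return problems
```

With a dataframe returned by VectorStore.search(), this might be called as `find_naming_problems(result.columns)`, assuming the result exposes a pandas-style `columns` attribute.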