
Ref api#3191

Open
YishaiGlasner wants to merge 44 commits into master from ref-api

@YishaiGlasner commented Mar 25, 2026

Summary

  • Add a new /api/ref/<tref> endpoint that returns structured metadata for any Sefaria Ref
  • Provide consistent navigation and structure metadata across node types (JaggedArrayNode, SchemaNode, Dictionary, Sheets, etc.)
  • Avoid redundant MongoDB queries by ensuring a single vstate fetch per request for standard text refs (i.e. not virtual nodes)
  • Fix prev_segment_ref / next_segment_ref to correctly handle virtual nodes (e.g. Siddur)
  • Add optional state_ja parameter to avoid redundant DB calls when state is already available
  • Add a pymongo QueryCounter listener for asserting query counts in tests
  • Add comprehensive test coverage and OpenAPI documentation

API Details

New endpoint: GET /api/ref/<tref>

Returns a JSON object with:

  • is_ref — whether the input resolves to a valid ref (returns {is_ref: false} for invalid input)
  • normalized, hebrew, url_ref — normalized representations
  • index_title, node_type — index and node metadata
  • depth, address_types, section_names — structure info (for JaggedArrayNode / DictionaryEntryNode)
  • start_indexes, start_labels, end_indexes, end_labels — section position
  • navigation_refs — contextual navigation:
    • lineage_refs_top_down — ancestor refs from root to parent
    • first_available_section_ref — first section with content
    • first_subref / last_subref — child navigation (non-segment, non-range)
    • prev_section_ref / next_section_ref — section-level navigation
    • prev_segment_ref / next_segment_ref — segment-level navigation
  • children — child node titles (for SchemaNode)
  • default_child_node — default child metadata when applicable
  • sheet_id, lexicon_name, headword — type-specific fields
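
As a rough illustration of how a consumer might use this response, the sketch below builds the request URL and branches on is_ref. The base URL, the percent-encoding of the tref, the helper names, and the sample payload values are all assumptions made for the example; only the field names come from the list above.

```python
from urllib.parse import quote


def ref_api_url(base_url: str, tref: str) -> str:
    """Build a URL for GET /api/ref/<tref> (percent-encoding is an assumption)."""
    return f"{base_url.rstrip('/')}/api/ref/{quote(tref, safe='')}"


def describe_ref(payload: dict) -> str:
    """Summarize a response, handling the {is_ref: false} case first."""
    if not payload.get("is_ref"):
        return "not a ref"
    nav = payload.get("navigation_refs", {})
    return (
        f"{payload['normalized']} ({payload['node_type']}), "
        f"next section: {nav.get('next_section_ref')}"
    )


# Hypothetical payload; the values are illustrative, not real API output.
sample = {
    "is_ref": True,
    "normalized": "Genesis 1",
    "node_type": "JaggedArrayNode",
    "navigation_refs": {"prev_section_ref": None, "next_section_ref": "Genesis 2"},
}

print(ref_api_url("https://example.org", "Genesis 1:1"))
print(describe_ref(sample))
```

Checking is_ref before touching any other key matters because an invalid input returns only that one field.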

Considerations

  • Navigation scope
    prev_* and next_* are only defined for section-level and segment-level refs.
    Navigation at higher levels is intentionally not exposed to avoid ambiguity. Consumers can traverse upward (via lineage_refs_top_down) and derive such relationships if needed.
  • Field presence
    Fields that are not applicable to a given ref type are omitted.
    Fields that are applicable but have no value (e.g. no previous or next ref exists) are returned as null.
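
The omitted-versus-null distinction can be checked on the client side roughly as follows. This is only a sketch: the FieldStatus enum and field_status helper are hypothetical names, and the sample dictionary is made up for illustration.

```python
from enum import Enum


class FieldStatus(Enum):
    NOT_APPLICABLE = "omitted"   # key absent: field does not apply to this ref type
    NO_VALUE = "null"            # key present with null: applies, but nothing to return
    PRESENT = "present"


def field_status(obj: dict, key: str) -> FieldStatus:
    """Classify a response field according to the field-presence rule above."""
    if key not in obj:
        return FieldStatus.NOT_APPLICABLE
    if obj[key] is None:
        return FieldStatus.NO_VALUE
    return FieldStatus.PRESENT


# Hypothetical navigation_refs for a first section: no previous section exists.
nav = {"prev_section_ref": None, "next_section_ref": "Genesis 2"}

print(field_status(nav, "prev_section_ref"))   # applicable, no value
print(field_status(nav, "prev_segment_ref"))   # not applicable at this level
```

Using `key in obj` instead of `obj.get(key)` is the important design choice here, since `get` would collapse the two cases into one.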

Changes in Ref

  • Fix prev_segment_ref and next_segment_ref to support DictionaryEntryNode
  • Add optional state_ja parameter to selected methods (already supported in others) to improve performance
  • Add get_subrefs_count and refactor all_subrefs, prev_segment_ref, and next_segment_ref to use it
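
The optional state_ja pattern can be illustrated with a toy model. Nothing below is Sefaria code: FakeStateStore, segment_count, and is_section_empty are hypothetical stand-ins that only show how passing an already-loaded state through helper calls keeps the number of fetches at one.

```python
class FakeStateStore:
    """Stand-in for the database; counts how often the state is loaded."""

    def __init__(self, state):
        self._state = state
        self.fetches = 0

    def load(self):
        self.fetches += 1
        return self._state


def segment_count(store, section_index, state_ja=None):
    # Fetch only when the caller did not supply an already-loaded state.
    state_ja = store.load() if state_ja is None else state_ja
    return len(state_ja[section_index])


def is_section_empty(store, section_index, state_ja=None):
    state_ja = store.load() if state_ja is None else state_ja
    # Forward the state so the nested helper does not fetch again.
    return segment_count(store, section_index, state_ja=state_ja) == 0


store = FakeStateStore([[1, 1], []])
state = store.load()                      # single fetch for the whole request
segment_count(store, 0, state_ja=state)
is_section_empty(store, 1, state_ja=state)
print(store.fetches)
```

Callers that do not have the state in hand can still omit the argument, at the cost of one fetch per top-level call.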

pymongo listener

Adds QueryCounter, a pymongo CommandListener used in tests to:

  • Count MongoDB queries per request
  • Record query tracebacks for debugging

Tests reset the counter before each API call and assert on QueryCounter.count. On failure, full query tracebacks are printed to help identify unnecessary database hits.

The listener is only registered in test environments (sys._called_from_test), so there is zero production overhead.
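
A minimal sketch of what such a listener could look like is shown below. It is not the PR's implementation: the class name QueryCounterSketch, the reset helper, and the stored traceback format are assumptions. The pymongo import is guarded only so the sketch runs where pymongo is not installed.

```python
import traceback

try:
    from pymongo import monitoring
    _ListenerBase = monitoring.CommandListener
except ImportError:  # lets the sketch run without pymongo installed
    monitoring = None
    _ListenerBase = object


class QueryCounterSketch(_ListenerBase):
    """Counts started MongoDB commands and records where each was issued."""

    count = 0
    tracebacks = []

    @classmethod
    def reset(cls):
        cls.count = 0
        cls.tracebacks = []

    def started(self, event):
        cls = type(self)
        cls.count += 1
        # Keep the command name plus a short stack for debugging failures.
        stack = "".join(traceback.format_stack(limit=8))
        cls.tracebacks.append((event.command_name, stack))

    def succeeded(self, event):
        pass

    def failed(self, event):
        pass


# Registration would happen once, before any client is created, and only in
# test environments (assumed guard, following the description above):
#
#   import sys
#   if monitoring is not None and getattr(sys, "_called_from_test", False):
#       monitoring.register(QueryCounterSketch())
```

A test would call `QueryCounterSketch.reset()`, issue the request, and then assert on `QueryCounterSketch.count`, printing the recorded tracebacks if the assertion fails.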

Note on tests

api/tests.py is currently not part of the CI suite (historical decision).
All new tests were added there and can be run locally.

…o not return the sections (section are not defined well as parts when the ref is range).
…a` param, but call other functions that use `vstate`.
@YishaiGlasner YishaiGlasner requested a review from akiva10b March 25, 2026 12:27

Copilot AI left a comment

Pull request overview

Adds a new GET /api/ref/<tref> API endpoint to validate and introspect Sefaria refs, returning structured node/structure metadata and navigation refs, with accompanying OpenAPI documentation and tests. The PR also updates core Ref navigation helpers to better support virtual nodes and to reduce redundant DB work by allowing callers to pass a pre-fetched VersionState.

Changes:

  • Add RefView (/api/ref/<tref>) returning normalized/hebrew/url forms, node metadata, structure fields, and navigation refs.
  • Introduce a pymongo QueryCounter listener (test-only) to assert Mongo command counts in API tests.
  • Extend/refine Ref navigation/state helpers (prev_segment_ref, next_segment_ref, first_available_section_ref, get_state_ja, is_empty) to accept an optional vstate.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| sefaria/urls_shared.py | Routes the new /api/ref/<tref> endpoint to RefView. |
| api/views.py | Implements RefView response construction and navigation metadata. |
| sefaria/model/text.py | Updates Ref navigation + state access to support vstate and virtual-node behavior. |
| sefaria/system/database.py | Adds QueryCounter and registers it as a pymongo listener in test environments. |
| api/tests.py | Adds comprehensive tests for the new endpoint + query-count assertions. |
| docs/openAPI.json | Documents /api/ref/{tref} and the RefJSON response schema. |

Comment thread sefaria/model/text.py Outdated
Comment thread sefaria/model/text.py Outdated
# return db.texts.find(self.condition_query(), {"_id": 1}).count() == 0
if vstate and not self.index_node.is_virtual:
    state_ja = self.get_state_ja(vstate=vstate)
    return state_ja.sub_array_length([i - 1 for i in self.sections]) in (0, None)

Copilot AI Mar 25, 2026

is_empty()'s new vstate fast-path is incorrect for many refs. VersionState.state_node(...).ja() returns a JaggedIntArray whose leaf values are ints; calling sub_array_length() after indexing down to a leaf hits the TypeError path and returns 0, which makes segment-level refs (and other fully-specified refs) appear empty even when text exists. Use a content check that works at arbitrary depth (e.g., state_ja.subarray_with_ref(self).is_empty() or get_element() for segment-level) instead of sub_array_length(self.sections).

Suggested change
- return state_ja.sub_array_length([i - 1 for i in self.sections]) in (0, None)
+ subarray = state_ja.subarray_with_ref(self)
+ return subarray.is_empty()
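
The failure mode described in this comment can be reproduced with a toy jagged array. The functions below are illustrative stand-ins written for this sketch, not the Sefaria JaggedIntArray methods; the toy sub_array_length simply mirrors the behaviour the comment describes (a TypeError at an int leaf mapped to 0).

```python
def sub_array_length(ja, indexes):
    """Toy length lookup: 0 when the target is an int leaf, None when out of range."""
    node = ja
    try:
        for i in indexes:
            node = node[i]
        return len(node)
    except IndexError:
        return None
    except TypeError:
        # Reached an int leaf: indexing it or taking len() of it fails.
        return 0


def has_content(ja, indexes):
    """Depth-agnostic check: is there any positive count at or below indexes?"""
    node = ja
    try:
        for i in indexes:
            node = node[i]
    except (IndexError, TypeError):
        return False
    stack = [node]
    while stack:
        cur = stack.pop()
        if isinstance(cur, int):
            if cur > 0:
                return True
        else:
            stack.extend(cur)
    return False


# One section with three segments (the last one empty), then an empty section.
state = [[1, 1, 0], []]

print(sub_array_length(state, [0, 0]) in (0, None))  # segment wrongly looks empty
print(has_content(state, [0, 0]))                    # segment actually has text
```

The length-based test is only meaningful above the leaf level, which is why a check that works at arbitrary depth is suggested.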

Comment thread api/views.py Outdated
Comment thread sefaria/system/database.py
Comment thread docs/openAPI.json Outdated
Comment thread sefaria/model/text.py Outdated
if not r:
    return None
if self.index_node.is_virtual:
    return r.all_subrefs()[0]

Copilot AI Mar 25, 2026

In prev_segment_ref() for virtual nodes, when the current ref is the first segment of a section, the previous segment should be the last segment of the previous section. Returning r.all_subrefs()[0] returns the first segment instead (and can also raise IndexError if the previous section has no subrefs). Adjust this to return the last available subref (and handle empty subref lists).

Suggested change
- return r.all_subrefs()[0]
+ subrefs = r.all_subrefs()
+ if not subrefs:
+     # No subrefs available in the previous section; fall back to the section ref itself
+     return r
+ return subrefs[-1]

Contributor Author

Changing the index to -1.
The non-virtual code path makes the same assumption that the previous section has segments.
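
The boundary case discussed in this thread can be sketched on plain nested lists. This is a toy model, not the Ref implementation: prev_segment is a hypothetical helper, and skipping back over empty sections is an assumption of the sketch, not something the PR is stated to do.

```python
from typing import List, Optional, Tuple


def prev_segment(
    sections: List[List[str]], sec: int, seg: int
) -> Optional[Tuple[int, int]]:
    """Return (section, segment) indexes of the segment before (sec, seg)."""
    if seg > 0:
        return (sec, seg - 1)
    # First segment of a section: step back to the LAST segment ([-1], not [0])
    # of the nearest earlier section that has any segments.
    for prev_sec in range(sec - 1, -1, -1):
        if sections[prev_sec]:
            return (prev_sec, len(sections[prev_sec]) - 1)
    return None


sections = [["a", "b", "c"], [], ["d"]]
print(prev_segment(sections, 2, 0))  # last segment of section 0
print(prev_segment(sections, 0, 0))  # nothing precedes the first segment
```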

Comment thread docs/openAPI.json
Comment thread sefaria/model/text.py Outdated
@YishaiGlasner
Contributor Author

Spec'ing the ref API and implementing it [sc-40554]

@stevekaplan123 left a comment

I suggested a possible refactor that would let us avoid checking the node type in next_segment_ref and prev_segment_ref. Curious what you think.
