feat: In GTFS feed validation, only check consistency against disruption data #1392

Merged
jzimbel-mbta merged 13 commits into master from jz-faster-external-fk-only-gtfs-feed-validations
Jun 10, 2026
Conversation

@jzimbel-mbta

@jzimbel-mbta jzimbel-mbta commented Jun 3, 2026

Member

Summary of changes

Asana Ticket: 🏹 Prevent feed validation from blocking most application functionality

The meat of the change:

  • The validation transaction now checks only the feed's "external" FK constraints: those that originate in Arrow's disruption data tables. For example, shuttle_route_stops.gtfs_stop_id references gtfs_stops.id.
    • Only feed data for the GTFS tables referenced by those "external" FKs is imported.
    • The transaction still locks certain application-critical tables, but it runs much faster now, so the locking is no longer a practical problem.

In less abstract terms, this means validation only looks at lines.txt, routes.txt, and stops.txt now.
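
As a rough illustration of the idea (not the Elixir code in this PR), the selection of "external" FKs and the resulting consistency check could be sketched in Python as below. Only the shuttle_route_stops.gtfs_stop_id -> gtfs_stops.id constraint comes from the description above; the other constraints, the `gtfs_` prefix rule, the sample rows, and all function names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignKey:
    table: str       # table that holds the FK column
    column: str      # FK column name
    ref_table: str   # table being referenced
    ref_column: str  # referenced column


# Assumption: GTFS tables can be recognized by a "gtfs_" name prefix.
GTFS_PREFIX = "gtfs_"


def external_fks(fks):
    """Keep only FKs that point into a GTFS table from a non-GTFS table."""
    return [
        fk for fk in fks
        if fk.ref_table.startswith(GTFS_PREFIX)
        and not fk.table.startswith(GTFS_PREFIX)
    ]


def tables_to_import(fks):
    """GTFS tables that must be loaded to check the external FKs."""
    return sorted({fk.ref_table for fk in external_fks(fks)})


def find_violations(fks, app_rows, feed_ids):
    """Return (table, column, value) for each dangling external reference.

    Simplification: feed_ids maps a GTFS table name to the set of values
    of its referenced column, so ref_column is not consulted here.
    """
    violations = []
    for fk in external_fks(fks):
        known = feed_ids.get(fk.ref_table, set())
        for row in app_rows.get(fk.table, []):
            value = row.get(fk.column)
            if value is not None and value not in known:
                violations.append((fk.table, fk.column, value))
    return violations


FKS = [
    # From the PR description:
    ForeignKey("shuttle_route_stops", "gtfs_stop_id", "gtfs_stops", "id"),
    # Hypothetical feed-internal FKs, skipped by the external-only check:
    ForeignKey("gtfs_stop_times", "stop_id", "gtfs_stops", "id"),
    ForeignKey("gtfs_stop_times", "trip_id", "gtfs_trips", "id"),
]

# Hypothetical sample data.
APP_ROWS = {
    "shuttle_route_stops": [
        {"gtfs_stop_id": "stop-a"},
        {"gtfs_stop_id": "stop-removed"},
        {"gtfs_stop_id": None},  # a null FK references nothing
    ]
}
NEW_FEED_IDS = {"gtfs_stops": {"stop-a", "stop-b"}}

needed = tables_to_import(FKS)
dangling = find_violations(FKS, APP_ROWS, NEW_FEED_IDS)
```

In this sketch `needed` contains only `gtfs_stops`, because the two feed-internal constraints are filtered out before any table is loaded, which is where the speedup would come from.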

It now completes in about half a second locally, and it should be about as fast in prod.

Reviewer Checklist

  • Meets ticket's acceptance criteria
  • Any new or changed functions have typespecs
  • Tests were added for any new functionality (don't just rely on Codecov)
  • This branch was deployed to the staging environment and is currently running with no unexpected increase in warnings, and no errors or crashes.

@jzimbel-mbta jzimbel-mbta requested a review from rudiejd June 3, 2026 18:17
@jzimbel-mbta jzimbel-mbta requested a review from a team as a code owner June 3, 2026 18:17
Comment thread lib/arrow/gtfs/import_worker.ex
@rudiejd

rudiejd commented Jun 3, 2026

Member

noting that this change, if merged, would get rid of the only way we have to validate that each checkpoint_id in stop_times shows up in checkpoints.txt. We should have a follow-up to add this validation to GTFS creator
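
For illustration, the check in question could look something like the following Python sketch. It is not gtfs_creator's implementation: the function name and sample data are hypothetical, and it assumes that an empty checkpoint_id in stop_times means "no checkpoint" and that both files use a `checkpoint_id` column.

```python
import csv
import io


def missing_checkpoint_ids(stop_times_txt, checkpoints_txt):
    """Return checkpoint_ids used in stop_times but absent from checkpoints."""
    known = {
        row["checkpoint_id"]
        for row in csv.DictReader(io.StringIO(checkpoints_txt))
    }
    used = {
        row["checkpoint_id"]
        for row in csv.DictReader(io.StringIO(stop_times_txt))
        if row.get("checkpoint_id")  # skip empty values (assumed optional)
    }
    return sorted(used - known)


# Hypothetical sample file contents.
STOP_TIMES = (
    "trip_id,stop_id,checkpoint_id\n"
    "t1,s1,cp1\n"
    "t1,s2,\n"
    "t1,s3,cp9\n"
)
CHECKPOINTS = (
    "checkpoint_id,checkpoint_name\n"
    "cp1,First Checkpoint\n"
)

missing = missing_checkpoint_ids(STOP_TIMES, CHECKPOINTS)
```

Here `missing` lists `cp9`, the one checkpoint_id that stop_times uses without a matching row in checkpoints.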

@jzimbel-mbta

Member Author

> noting that this change, if merged, would get rid of the only way we have to validate that each checkpoint_id in stop_times shows up in checkpoints.txt. We should have a follow-up to add this validation to GTFS creator

Thanks for calling that one out. Yeah, that's the only "feed-internal" consistency check by Arrow that doesn't duplicate what's already validated by other gtfs_creator CI steps.

I'll create a ticket.

@jzimbel-mbta

Member Author

Ticket to implement checkpoints<->stop_times validation in gtfs_creator.

Comment thread lib/arrow/repo/foreign_key_constraint.ex
@jzimbel-mbta

Member Author

I'm working on splitting the diff into meaningful commits to ease review. Will also try to add some integration tests for validation and import.

@jzimbel-mbta jzimbel-mbta force-pushed the jz-faster-external-fk-only-gtfs-feed-validations branch from 7544b50 to 3362cb3 June 4, 2026 20:46
Comment thread lib/arrow/gtfs.ex Outdated
Comment thread lib/arrow/gtfs.ex Outdated
Comment thread lib/arrow/gtfs.ex Outdated
Comment thread test/arrow/repo/foreign_key_constraint_test.exs Outdated
@jzimbel-mbta

jzimbel-mbta commented Jun 5, 2026

Member Author

Related gtfs_creator PR to add stop_times.checkpoint_id <-> checkpoints.checkpoint_id validation
Now merged.

@jzimbel-mbta jzimbel-mbta merged commit 082f450 into master Jun 10, 2026
5 checks passed
@jzimbel-mbta jzimbel-mbta deleted the jz-faster-external-fk-only-gtfs-feed-validations branch June 10, 2026 14:27
2 participants