HRINT-4776 - paginate RSS feeds + add ETag #137
Open
VladShpilMan wants to merge 1 commit into
I deployed this PR to the staging environment (staging.ayende.com) and ran the following checks:
omer@Omer-Ratsaby-NLP:~$
BASE="https://staging.ayende.com/blog/rss"
TOK_INTERNAL_1="http://staging.ayende.com/blog/rss?token=mg9...yqufpRwg%3D%3D"
TOK_INTERNAL_2="http://staging.ayende.com/blog/rss?token=mg9...kAWBfFlw%3D%3D"
TOK_NET="http://staging.ayende.com/blog/rss?token=mg9yE..............qC2z4hYx"
TOK_ACCLAIM="http://staging.ayende.com/blog/rss?token=mg................zE%3D" # ---> expired
TOK_DANIEL="http://staging.ayende.com/blog/rss?token=mg9yE...........BMudjis8" # ---> expired
# public feed: status + headers. Expect 200, ETag and Last-Modified present.
# Who: any RSS reader (Feedly, crawlers, etc ..) subscribed to https://ayende.com/blog/rss.
# Scenario: A reader opens their app in the morning and it fetches the feed for the first time today.
# Verified: Feed returns properly, with cache headers so future polls can be cheap.
omer@Omer-Ratsaby-NLP:~$ curl -sSLI -X GET "$BASE"
HTTP/2 200
date: Sun, 17 May 2026 08:42:34 GMT
content-type: text/xml
server: nginx/1.28.3
etag: "2026-05-15T13:00:25.8153737Zff196a28-5f0e-4ce5-a326-172dbe25f4af"
last-modified: Tue, 05 May 2026 12:00:00 GMT
# public feed body size. Expect tens of KB (was megabytes for some feeds before fix).
# Who: same as above - measures what the reader actually downloads.
# Scenario: first poll of the day, body bytes count.
# Verified: response body is small thanks to pagination cap (PublicFeedPageSize = 25).
omer@Omer-Ratsaby-NLP:~$ curl -sSL -o /dev/null -w "size=%{size_download}\n" "$BASE"
size=267633
# public feed conditional GET. Expect HTTP=304, size=0.
# Who: Same RSS readers as above, but on their repeat polls (every 15–30 min).
# Scenario: Reader polls again 30 minutes later. Nothing changed. Should get an empty response.
# Verified: Repeat poll returns 304 with zero bytes - the actual memory fix.
omer@Omer-Ratsaby-NLP:~$ curl -sSL -o /dev/null -w "HTTP=%{http_code} size=%{size_download}\n" \
-H "If-None-Match: $(curl -sSLI -X GET "$BASE" | grep -i ^etag | cut -d' ' -f2- | tr -d '\r')" "$BASE"
HTTP=304 size=0
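The inline `grep | cut | tr` pipeline above can be wrapped in a small helper. This is a minimal sketch, not part of the PR: `extract_etag` is a hypothetical name, and it assumes the header dump comes from `curl -sSLI`, where redirects can produce more than one header block.

```shell
# Print the ETag value (quotes included) of the last response in a header dump.
# With `curl -L`, each redirect hop adds a header block; the last ETag seen wins.
extract_etag() {
  tr -d '\r' | awk '
    tolower($1) == "etag:" { sub(/^[^:]*:[ \t]*/, ""); etag = $0 }
    END { if (etag != "") print etag }
  '
}

# Hypothetical usage against the staging feed (needs network, so left commented):
#   etag=$(curl -sSLI -X GET "$BASE" | extract_etag)
#   curl -sSL -o /dev/null -w "HTTP=%{http_code} size=%{size_download}\n" \
#        -H "If-None-Match: $etag" "$BASE"
```

Keeping the quotes in the extracted value matters, because the value is sent back verbatim in `If-None-Match`.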
# category feed headers. Expect 200, ETag and Last-Modified present.
# Who: Readers subscribed to a specific category (e.g. only RavenDB/database posts).
# Scenario: Category-specific reader polling for new DB-related posts.
# Verified: Same small-body + ETag treatment as the public feed.
omer@Omer-Ratsaby-NLP:~$ curl -sSLI -X GET "$BASE/databases"
HTTP/2 200
date: Sun, 17 May 2026 08:45:18 GMT
content-type: text/xml
server: nginx/1.28.3
etag: "2026-05-15T13:00:25.8153737Zff196a28-5f0e-4ce5-a326-172dbe25f4af"
last-modified: Tue, 11 Mar 2025 12:00:00 GMT
# category feed body size. Expect tens of KB (was 1.9 MB before fix).
# Who: same category-feed readers.
# Scenario: first poll on the category URL - measure actual bytes downloaded.
# Verified: pagination applied to category feeds too.
omer@Omer-Ratsaby-NLP:~$ curl -sSL -o /dev/null -w "size=%{size_download}\n" "$BASE/databases"
size=2673386
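The `size=` values above are raw byte counts from curl's `%{size_download}`. A minimal sketch for converting them to KiB and comparing them with a size budget follows; `bytes_to_kib`, `within_budget`, and any budget passed to them are illustrative and do not come from the PR.

```shell
# Integer KiB, rounded up, for a byte count reported by curl's %{size_download}.
bytes_to_kib() {
  echo $(( ($1 + 1023) / 1024 ))
}

# Succeeds when a body of $1 bytes fits within a budget of $2 KiB.
# The budget is whatever the caller considers acceptable; none is defined in the PR.
within_budget() {
  [ "$(bytes_to_kib "$1")" -le "$2" ]
}
```

With the values from the transcript, `bytes_to_kib 267633` prints 262 and `bytes_to_kib 2673386` prints 2611.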
# active token feed headers (no page= parameter). Expect 200, ETag and Last-Modified present, title contains "for ravendb-net".
# Who: FeedBurner proxy (feeds.feedburner.com/AyendeRahien) and the ravendb-net integration polling their existing URLs.
# Scenario: Existing feed reader polls the URL it has used for years - no client-side change needed.
# Verified: Existing URL shape still works after the pagination PR.
omer@Omer-Ratsaby-NLP:~$ curl -sSLI -X GET "$TOK_NET"
HTTP/1.1 302 Moved Temporarily
Server: awselb/2.0
Date: Sun, 17 May 2026 08:47:24 GMT
Content-Type: text/html
Content-Length: 110
Connection: keep-alive
Location: https://staging.ayende.com:443/blog/rss?token=mg9...
HTTP/2 200
date: Sun, 17 May 2026 08:47:25 GMT
content-type: text/xml
server: nginx/1.28.3
etag: "2026-05-15T13:00:25.8153737Zff196a28-5f0e-4ce5-a326-172dbe25f4af"
last-modified: Tue, 05 May 2026 12:00:00 GMT
omer@Omer-Ratsaby-NLP:~$ curl -sSL "$TOK_NET" | grep -m1 '<title>'
<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel><title>Ayende @ Rahien for ravendb-net</title><link>http://staging.ayende.com/</...
# active token feed body size. Expect tens of KB (was 6.7 MB before fix).
# Who: same FeedBurner / ravendb-net pollers.
# Scenario: a single feed fetch - measure actual bytes returned now vs the 6.7 MB baseline.
# Verified: token feed is now capped at 50 items per response (TokenAuthorizedFeedPageSize = 50).
omer@Omer-Ratsaby-NLP:~$ curl -sSL -o /dev/null -w "size=%{size_download}\n" "$TOK_NET"
size=625683
omer@Omer-Ratsaby-NLP:~$ curl -sSL "$TOK_NET" | grep -c '<item>'
46
omer@Omer-Ratsaby-NLP:~$
# active token conditional GET. Expect HTTP=304, size=0.
# Who: FeedBurner on its repeat polls (~every 16 min), and any other reader on the token feed.
# Scenario: 4 polls/hour where nothing has changed since last time.
# Verified: 304 round-trip works - this is what kills the LOH pressure that crashed the EB instance every 2 hours.
omer@Omer-Ratsaby-NLP:~$ curl -sSL -o /dev/null -w "HTTP=%{http_code} size=%{size_download}\n" \
-H "If-None-Match: $(curl -sSLI -X GET "$TOK_NET" | grep -i ^etag | cut -d' ' -f2- | tr -d '\r')" "$TOK_NET"
HTTP=304 size=0
omer@Omer-Ratsaby-NLP:~$
# active token page=2 headers. Expect 200, ETag ending with _p2, Last-Modified older than page 1.
# Who: future clients that want to walk historical posts beyond the latest 50 (e.g. one-time-import scripts).
# Scenario: backfill script asking for the next batch of posts within the token's numberOfDays window.
# Verified: pagination opt-in works; each page has its own ETag so it caches independently.
omer@Omer-Ratsaby-NLP:~$ curl -sSLI -X GET "${TOK_NET}&page=2"
HTTP/1.1 302 Moved Temporarily
Server: awselb/2.0
Date: Sun, 17 May 2026 08:56:32 GMT
Content-Type: text/html
Content-Length: 110
Connection: keep-alive
Location: https://staging.ayende.com:443/blog/rss?token=mg...x&page=2
HTTP/2 200
date: Sun, 17 May 2026 08:56:34 GMT
content-type: text/xml
server: nginx/1.28.3
etag: "2026-05-15T13:00:25.8153737Zff196a28-5f0e-4ce5-a326-172dbe25f4af_p2"
last-modified: Mon, 21 Jul 2025 12:00:00 GMT
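The headers above suggest that page 1 keeps the bare ETag while later pages append a `_p<N>` suffix. The sketch below mirrors that observed shape only; `page_etag` is a hypothetical helper and the server's actual formatting code is not shown in this conversation.

```shell
# Build the quoted ETag for a page from an unquoted base value.
# Assumption (inferred from the transcript): page 1 has no suffix, page N > 1 gets _p<N>.
page_etag() {
  local base=$1 page=${2:-1}
  if [ "$page" -gt 1 ]; then
    printf '"%s_p%s"\n' "$base" "$page"
  else
    printf '"%s"\n' "$base"
  fi
}
```

Because the suffix makes each page's validator distinct, a cached page 2 is never revalidated against the page 1 ETag.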
# active token page=2 body size + different content than page 1.
# Who: same backfill scripts.
# Scenario: confirming that page=2 actually returns the next 50 items, not duplicates of page 1.
# Verified: Skip = (page-1) * 50 is applied correctly.
omer@Omer-Ratsaby-NLP:~$ curl -sSL -o /dev/null -w "size=%{size_download}\n" "${TOK_NET}&page=2"
size=2302127
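The `Skip = (page-1) * 50` rule mentioned above can be worked through as a small function. This is only a sketch of the arithmetic: `skip_for_page` is a hypothetical name, the default of 50 follows the `TokenAuthorizedFeedPageSize = 50` value quoted earlier, and the fallback for values below 1 follows the guard quoted in the edge-case check.

```shell
# Number of items to skip for a 1-based page number.
# Pages below 1 fall back to page 1, so they skip nothing.
skip_for_page() {
  local page=$1 page_size=${2:-50}
  if [ "$page" -lt 1 ]; then page=1; fi
  echo $(( (page - 1) * page_size ))
}
```

For example, `skip_for_page 2` prints 50 and `skip_for_page 0` prints 0.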
# other active tokens - sanity check that all active tokens behave the same shape as TOK_NET.
# Who: the two ravendb-internal token holders.
# Scenario: confirming consistent behavior across active tokens.
# Verified: each returns 200 with ETag and Last-Modified.
omer@Omer-Ratsaby-NLP:~$ curl -sSLI -X GET "$TOK_INTERNAL_1"
HTTP/1.1 302 Moved Temporarily
Server: awselb/2.0
Date: Sun, 17 May 2026 08:58:41 GMT
Content-Type: text/html
Content-Length: 110
Connection: keep-alive
Location: https://staging.ayende.com:443/blog/rss?token=mg...
HTTP/2 200
date: Sun, 17 May 2026 08:58:41 GMT
content-type: text/xml
server: nginx/1.28.3
etag: "2026-05-15T13:00:25.8153737Zff196a28-5f0e-4ce5-a326-172dbe25f4af"
last-modified: Mon, 01 Jan 0001 00:00:00 GMT
omer@Omer-Ratsaby-NLP:~$ curl -sSLI -X GET "$TOK_INTERNAL_2"
HTTP/1.1 302 Moved Temporarily
Server: awselb/2.0
Date: Sun, 17 May 2026 08:58:47 GMT
Content-Type: text/html
Content-Length: 110
Connection: keep-alive
Location: https://staging.ayende.com:443/blog/rss?token=mg9...
HTTP/2 200
date: Sun, 17 May 2026 08:58:47 GMT
content-type: text/xml
server: nginx/1.28.3
etag: "2026-05-15T13:00:25.8153737Zff196a28-5f0e-4ce5-a326-172dbe25f4af"
last-modified: Tue, 05 May 2026 12:00:00 GMT
# expired tokens - must not 500. Expect HTTP=200 (body would say "EXPIRED TOKEN" in title).
# Who: old token holders whose tokens have expired but their feed reader is still polling (daniel expired 2024, acclaim expired 2021).
# Scenario: long-forgotten reader still hits the URL daily despite token being invalid for years.
# Verified: server returns 200, no log spam, the reader sees a feed labeled EXPIRED TOKEN instead of crashing.
omer@Omer-Ratsaby-NLP:~$ curl -sSL -o /dev/null -w "acclaim HTTP=%{http_code}\n" "$TOK_ACCLAIM"
acclaim HTTP=200
omer@Omer-Ratsaby-NLP:~$ curl -sSL "$TOK_ACCLAIM" | grep -m1 '<title>'
<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel><title>Ayende @ Rahien for acclaim-wordpress-access-one-time-import EXPIRED TOKEN</....
omer@Omer-Ratsaby-NLP:~$ curl -sSL -o /dev/null -w "daniel HTTP=%{http_code}\n" "$TOK_DANIEL"
daniel HTTP=200
omer@Omer-Ratsaby-NLP:~$ curl -sSL "$TOK_DANIEL" | grep -m1 '<title>'
<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel><title>Ayende @ Rahien for daniel EXPIRED TOKEN</...
# edge cases - bad page= values must not 500. Expect HTTP=200 for all.
# Who: bots, vulnerability scanners, broken bookmarks, scrapers sending garbage in query parameters.
# Scenario: someone probes the endpoint with malformed page values to look for crashes.
# Verified: server treats invalid pages as page 1 (per the `if (page < 1) page = 1;` guard) and never 500s.
omer@Omer-Ratsaby-NLP:~$ for P in 0 -1 9999 abc; do echo -n "page=$P "; curl -sSL -o /dev/null -w "HTTP=%{http_code}\n" "${TOK_NET}&page=$P"; done
page=0 HTTP=200
page=-1 HTTP=200
page=9999 HTTP=200
page=abc HTTP=200
omer@Omer-Ratsaby-NLP:~$
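To turn the loop above into a pass/fail check rather than reading the output by eye, its lines can be piped through a small filter. A minimal sketch, assuming the `page=<value> HTTP=<code>` output format used above; `no_server_errors` is a hypothetical name.

```shell
# Read "page=<value> HTTP=<code>" lines on stdin.
# Returns non-zero and reports the offending lines if any status code is 5xx.
no_server_errors() {
  local bad
  bad=$(grep -E 'HTTP=5[0-9][0-9]' || true)
  if [ -n "$bad" ]; then
    printf 'server error on: %s\n' "$bad" >&2
    return 1
  fi
}

# Hypothetical usage (needs network, so left commented):
#   for P in 0 -1 9999 abc; do
#     echo -n "page=$P "
#     curl -sSL -o /dev/null -w "HTTP=%{http_code}\n" "${TOK_NET}&page=$P"
#   done | no_server_errors
```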
TheGoldenPlatypus approved these changes on May 17, 2026
Pull request overview
This PR updates RSS endpoints to support token-feed pagination and conditional HTTP caching with ETag and Last-Modified headers.
Changes:
- Adds configurable RSS page sizes and a `page` parameter for token-authorized RSS feed pagination.
- Quotes and parses ETags using HTTP header utilities, and adds page-specific ETags.
- Adds `Last-Modified` handling to XML and 304 responses.
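The conditional-request behavior summarized above can be sketched as a decision function. This is not the PR's C# code: `conditional_status` is a hypothetical name, it compares ETags as exact strings (no weak comparison or comma-separated lists), and it gives `If-None-Match` precedence over `If-Modified-Since` as RFC 9110 specifies, which may differ from the order used in the controller.

```shell
# Decide the status for a conditional GET: 304 when the client's validator still matches.
# Arguments: current ETag, If-None-Match value, Last-Modified, If-Modified-Since.
# Both dates are epoch seconds; on a host with GNU date an HTTP date could be
# converted with: date -u -d "Tue, 05 May 2026 12:00:00 GMT" +%s
conditional_status() {
  local etag=$1 if_none_match=$2 last_modified=$3 if_modified_since=$4
  if [ -n "$if_none_match" ]; then
    # When If-None-Match is present, If-Modified-Since is ignored.
    if [ "$if_none_match" = "$etag" ]; then echo 304; else echo 200; fi
    return
  fi
  if [ -n "$if_modified_since" ] && [ "$last_modified" -le "$if_modified_since" ]; then
    echo 304
    return
  fi
  echo 200
}
```

A client holding the page 1 ETag that requests page 2 gets 200, since the `_p2` value does not match.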
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| RaccoonBlog.Web/Controllers/SyndicationController.cs | Adds RSS pagination logic and conditional request handling for ETag/Last-Modified. |
| RaccoonBlog.Web/Controllers/RaccoonController.cs | Extends shared XML and 304 helpers to emit cache headers. |
| RaccoonBlog.Web/Extensions/XmlResult.cs | Emits ETag and Last-Modified headers on XML responses. |
Comment on lines +246 to +247

    if (DateTimeOffset.TryParse(Request.Headers["If-Modified-Since"].ToString(), out var ifModifiedSince) && lastModified <= ifModifiedSince)
        return true;

    string responseETagHeader;
    if (CheckEtag(stats, out responseETagHeader))
        return HttpNotModified();
    var lastModified = posts.Count > 0 ? posts[0].PublishAt : (DateTimeOffset)stats.Timestamp;

    if (CheckEtag(stats, out var responseETagHeader))
        return HttpNotModified();
    DateTimeOffset? lastModified = commentsTuples.Count > 0
        ? commentsTuples[0].Item1.CreatedAt
Issue:
https://issues.hibernatingrhinos.com/issue/HRINT-4776/ayende.com-Paginate-RSS-feeds-add-ETag-Last-Modified-headers