HRINT-4776 - paginate RSS feeds + add ETag #137

Open
VladShpilMan wants to merge 1 commit into ayende:DOTNET8-MIGRATION from VladShpilMan:HRINT-4776

@TheGoldenPlatypus


I deployed this PR to the staging environment (staging.ayende.com):

omer@Omer-Ratsaby-NLP:~$ 
BASE="https://staging.ayende.com/blog/rss"
TOK_INTERNAL_1="http://staging.ayende.com/blog/rss?token=mg9...yqufpRwg%3D%3D"
TOK_INTERNAL_2="http://staging.ayende.com/blog/rss?token=mg9...kAWBfFlw%3D%3D"
TOK_NET="http://staging.ayende.com/blog/rss?token=mg9yE..............qC2z4hYx"
TOK_ACCLAIM="http://staging.ayende.com/blog/rss?token=mg................zE%3D" # ---> expired
TOK_DANIEL="http://staging.ayende.com/blog/rss?token=mg9yE...........BMudjis8" # ---> expired


# public feed: status + headers. Expect 200, ETag and Last-Modified present.
# Who: any RSS reader (Feedly, crawlers, etc.) subscribed to https://ayende.com/blog/rss.
# Scenario: A reader opens their app in the morning and it fetches the feed for the first time today.
# Verified: Feed returns properly, with cache headers so future polls can be cheap.
omer@Omer-Ratsaby-NLP:~$ curl -sSLI -X GET "$BASE"
HTTP/2 200
date: Sun, 17 May 2026 08:42:34 GMT
content-type: text/xml
server: nginx/1.28.3
etag: "2026-05-15T13:00:25.8153737Zff196a28-5f0e-4ce5-a326-172dbe25f4af"
last-modified: Tue, 05 May 2026 12:00:00 GMT


# public feed body size. Expect tens of KB (was megabytes for some feeds before fix).
# Who: same as above - measures what the reader actually downloads.
# Scenario: first poll of the day, body bytes count.
# Verified: response body is small thanks to pagination cap (PublicFeedPageSize = 25).
omer@Omer-Ratsaby-NLP:~$ curl -sSL -o /dev/null -w "size=%{size_download}\n" "$BASE"
size=267633
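
The "tens of KB" expectation can be turned into an explicit check on curl's `%{size_download}` value. This is a minimal sketch: `to_kib` and `within_budget` are hypothetical helper names, and the budget passed in is the caller's choice, since the PR does not define one.

```shell
# Convert a byte count (as printed by curl's %{size_download}) to whole KiB.
to_kib() {
  echo $(( $1 / 1024 ))
}

# Succeed when the byte count fits inside a budget given in KiB.
# The budget is an assumption supplied by the caller, not a value from the PR.
within_budget() {
  [ "$(to_kib "$1")" -le "$2" ]
}

to_kib 267633    # prints 261
```

With an assumed budget of 100 KiB, `within_budget 267633 100 || echo "over budget"` prints `over budget`, because the 267633-byte response above is about 261 KiB.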


# public feed conditional GET. Expect HTTP=304, size=0.
# Who: Same RSS readers as above, but on their repeat polls (every 15–30 min).
# Scenario: Reader polls again 30 minutes later. Nothing changed. Should get an empty response.
# Verified: Repeat poll returns 304 with zero bytes - the actual memory fix.
omer@Omer-Ratsaby-NLP:~$ curl -sSL -o /dev/null -w "HTTP=%{http_code} size=%{size_download}\n" \
  -H "If-None-Match: $(curl -sSLI -X GET "$BASE" | grep -i ^etag | cut -d' ' -f2- | tr -d '\r')" "$BASE"
HTTP=304 size=0
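
The nested `grep | cut | tr` pipeline above can be split into two steps so the captured ETag is visible before it is replayed. This is a sketch only; `extract_etag` is a hypothetical helper that reads response headers on stdin.

```shell
# Read response headers on stdin and print the value of the last ETag header.
# With -L, curl prints one header block per hop, and the final response comes last,
# so taking the last match skips any redirect responses.
extract_etag() {
  tr -d '\r' | awk '
    /^[Ee][Tt][Aa][Gg][ \t]*:/ { sub(/^[^:]*:[ \t]*/, ""); value = $0 }
    END { if (value != "") print value }
  '
}

# Usage against a live endpoint (not run here):
#   etag=$(curl -sSLI -X GET "$BASE" | extract_etag)
#   echo "captured: $etag"
#   curl -sSL -o /dev/null -w "HTTP=%{http_code} size=%{size_download}\n" \
#     -H "If-None-Match: $etag" "$BASE"
```

The surrounding double quotes are kept in the output because they are part of the ETag value and must be sent back unchanged in `If-None-Match`.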


# category feed headers. Expect 200, ETag and Last-Modified present.
# Who: Readers subscribed to a specific category (e.g. only RavenDB/database posts).
# Scenario: Category-specific reader polling for new DB-related posts.
# Verified: Same small-body + ETag treatment as the public feed.
omer@Omer-Ratsaby-NLP:~$ curl -sSLI -X GET "$BASE/databases"
HTTP/2 200
date: Sun, 17 May 2026 08:45:18 GMT
content-type: text/xml
server: nginx/1.28.3
etag: "2026-05-15T13:00:25.8153737Zff196a28-5f0e-4ce5-a326-172dbe25f4af"
last-modified: Tue, 11 Mar 2025 12:00:00 GMT


# category feed body size. Expect tens of KB (was 1.9 MB before fix).
# Who: same category-feed readers.
# Scenario: first poll on the category URL - measure actual bytes downloaded.
# Verified: pagination applied to category feeds too.
omer@Omer-Ratsaby-NLP:~$ curl -sSL -o /dev/null -w "size=%{size_download}\n" "$BASE/databases"
size=2673386


# active token feed headers (no page= parameter). Expect 200, ETag and Last-Modified present, title contains "for ravendb-net".
# Who: FeedBurner proxy (feeds.feedburner.com/AyendeRahien) and the ravendb-net integration polling their existing URLs.
# Scenario: Existing feed reader polls the URL it has used for years - no client-side change needed.
# Verified: Existing URL shape still works after the pagination PR.
omer@Omer-Ratsaby-NLP:~$ curl -sSLI -X GET "$TOK_NET"
HTTP/1.1 302 Moved Temporarily
Server: awselb/2.0
Date: Sun, 17 May 2026 08:47:24 GMT
Content-Type: text/html
Content-Length: 110
Connection: keep-alive
Location: https://staging.ayende.com:443/blog/rss?token=mg9...

HTTP/2 200
date: Sun, 17 May 2026 08:47:25 GMT
content-type: text/xml
server: nginx/1.28.3
etag: "2026-05-15T13:00:25.8153737Zff196a28-5f0e-4ce5-a326-172dbe25f4af"
last-modified: Tue, 05 May 2026 12:00:00 GMT

omer@Omer-Ratsaby-NLP:~$ curl -sSL "$TOK_NET" | grep -m1 '<title>'
<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel><title>Ayende @ Rahien for ravendb-net</title><link>http://staging.ayende.com/</...
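
The token URLs use `http://`, so each request passes through the load balancer's 302 before reaching the feed. A small sketch for summarizing that hop chain from `curl -sSLI` output; `status_chain` is a hypothetical helper name.

```shell
# Read one or more header blocks on stdin (as printed by curl -sSLI)
# and print the status codes in order, e.g. "302 -> 200".
status_chain() {
  tr -d '\r' | awk '
    /^HTTP\// { printf "%s%s", sep, $2; sep = " -> " }
    END { if (sep != "") print "" }
  '
}

# Usage against a live endpoint (not run here):
#   curl -sSLI -X GET "$TOK_NET" | status_chain
```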


# active token feed body size. Expect tens of KB (was 6.7 MB before fix).
# Who: same FeedBurner / ravendb-net pollers.
# Scenario: a single feed fetch - measure actual bytes returned now vs the 6.7 MB baseline.
# Verified: token feed is now capped at 50 items per response (TokenAuthorizedFeedPageSize = 50).
omer@Omer-Ratsaby-NLP:~$ curl -sSL -o /dev/null -w "size=%{size_download}\n" "$TOK_NET"
size=625683
omer@Omer-Ratsaby-NLP:~$ curl -sSL "$TOK_NET" | grep -c '<item>'
46
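
`grep -c` counts matching lines, not occurrences, so it under-counts whenever several `<item>` elements share a line. A sketch of an occurrence-based count; `count_items` is a hypothetical helper and assumes items open with a bare `<item>` tag, as in the output above.

```shell
# Count <item> occurrences in a feed read from stdin, regardless of line breaks.
count_items() {
  grep -o '<item>' | wc -l | tr -d ' '
}

# Usage against a live endpoint (not run here):
#   n=$(curl -sSL "$TOK_NET" | count_items)
#   [ "$n" -le 50 ] && echo "within TokenAuthorizedFeedPageSize"
```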
omer@Omer-Ratsaby-NLP:~$


# active token conditional GET. Expect HTTP=304, size=0.
# Who: FeedBurner on its repeat polls (~every 16 min), and any other reader on the token feed.
# Scenario: 4 polls/hour where nothing has changed since last time.
# Verified: 304 round-trip works - this is what kills the LOH pressure that crashed the EB instance every 2 hours.
omer@Omer-Ratsaby-NLP:~$ curl -sSL -o /dev/null -w "HTTP=%{http_code} size=%{size_download}\n" \
  -H "If-None-Match: $(curl -sSLI -X GET "$TOK_NET" | grep -i ^etag | cut -d' ' -f2- | tr -d '\r')" "$TOK_NET"
HTTP=304 size=0
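
For reference, the decision behind that 304 can be sketched as a comparison between the `If-None-Match` value and the current ETag. This illustrates the HTTP rule (weak comparison, a comma-separated list, and `*`), not the PR's C# code; `etag_matches` is a hypothetical name.

```shell
# Succeed (exit 0) when an If-None-Match value matches the current ETag,
# meaning a 304 response would be appropriate.
# Limitation of this sketch: ETags that themselves contain commas are not handled.
etag_matches() {
  current=${2#W/}                      # weak comparison ignores a W/ prefix
  [ -n "$current" ] || return 1
  [ "$1" = "*" ] && return 0           # "*" matches any current representation
  printf '%s\n' "$1" | tr ',' '\n' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e 's|^W/||' \
    | grep -Fxq -- "$current"
}

etag_matches '"abc"' '"abc"' && echo "304"    # prints 304
```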
omer@Omer-Ratsaby-NLP:~$


# active token page=2 headers. Expect 200, ETag ending with _p2, Last-Modified older than page 1.
# Who: future clients that want to walk historical posts beyond the latest 50 (e.g. one-time-import scripts).
# Scenario: backfill script asking for the next batch of posts within the token's numberOfDays window.
# Verified: pagination opt-in works; each page has its own ETag so it caches independently.
omer@Omer-Ratsaby-NLP:~$ curl -sSLI -X GET "${TOK_NET}&page=2"
HTTP/1.1 302 Moved Temporarily
Server: awselb/2.0
Date: Sun, 17 May 2026 08:56:32 GMT
Content-Type: text/html
Content-Length: 110
Connection: keep-alive
Location: https://staging.ayende.com:443/blog/rss?token=mg...x&page=2

HTTP/2 200
date: Sun, 17 May 2026 08:56:34 GMT
content-type: text/xml
server: nginx/1.28.3
etag: "2026-05-15T13:00:25.8153737Zff196a28-5f0e-4ce5-a326-172dbe25f4af_p2"
last-modified: Mon, 21 Jul 2025 12:00:00 GMT
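
The page-specific ETag shape can be inferred from the two responses: page 1 carries the base value and page 2 carries the same value with `_p2` appended inside the quotes. A sketch of that naming rule, assuming it generalizes to `_p<N>`; `page_etag` is a hypothetical helper and is not taken from the PR's code.

```shell
# Build a quoted ETag for a given page from a quoted base ETag.
# Page 1 keeps the base value; later pages get a _p<N> suffix inside the quotes.
page_etag() {
  base=${1%\"}
  base=${base#\"}
  if [ "$2" -gt 1 ]; then
    printf '"%s_p%s"\n' "$base" "$2"
  else
    printf '"%s"\n' "$base"
  fi
}

page_etag '"abc"' 2    # prints "abc_p2"
```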


# active token page=2 body size + different content than page 1.
# Who: same backfill scripts.
# Scenario: confirming that page=2 actually returns the next 50 items, not duplicates of page 1.
# Verified: Skip = (page-1) * 50 is applied correctly.
omer@Omer-Ratsaby-NLP:~$ curl -sSL -o /dev/null -w "size=%{size_download}\n" "${TOK_NET}&page=2"
size=2302127
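
Body size alone does not show that page 2 differs from page 1. One way to check for duplicates is to compare item links across the two pages. This is a sketch under assumptions: each item has a `<link>` element, and the channel-level `<link>` comes first, as in the title output earlier. `item_links` and `overlap_count` are hypothetical helpers.

```shell
# Print item <link> values from a feed on stdin, one per line.
# The first <link> is assumed to be the channel-level link and is dropped.
item_links() {
  grep -o '<link>[^<]*</link>' | sed -e 's|<link>||' -e 's|</link>||' | tail -n +2
}

# Print how many lines of file $2 also appear (as whole lines) in file $1.
overlap_count() {
  grep -Fxc -f "$1" "$2"
}

# Usage against a live endpoint (not run here):
#   curl -sSL "$TOK_NET" | item_links > /tmp/p1.txt
#   curl -sSL "${TOK_NET}&page=2" | item_links > /tmp/p2.txt
#   overlap_count /tmp/p1.txt /tmp/p2.txt    # 0 means no duplicated items
```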


# other active tokens - sanity check that all active tokens behave the same way as TOK_NET.
# Who: the two ravendb-internal token holders.
# Scenario: confirming consistent behavior across active tokens.
# Verified: each returns 200 with ETag and Last-Modified.
omer@Omer-Ratsaby-NLP:~$  curl -sSLI -X GET "$TOK_INTERNAL_1"
HTTP/1.1 302 Moved Temporarily
Server: awselb/2.0
Date: Sun, 17 May 2026 08:58:41 GMT
Content-Type: text/html
Content-Length: 110
Connection: keep-alive
Location: https://staging.ayende.com:443/blog/rss?token=mg...

HTTP/2 200
date: Sun, 17 May 2026 08:58:41 GMT
content-type: text/xml
server: nginx/1.28.3
etag: "2026-05-15T13:00:25.8153737Zff196a28-5f0e-4ce5-a326-172dbe25f4af"
last-modified: Mon, 01 Jan 0001 00:00:00 GMT

omer@Omer-Ratsaby-NLP:~$ curl -sSLI -X GET "$TOK_INTERNAL_2"
HTTP/1.1 302 Moved Temporarily
Server: awselb/2.0
Date: Sun, 17 May 2026 08:58:47 GMT
Content-Type: text/html
Content-Length: 110
Connection: keep-alive
Location: https://staging.ayende.com:443/blog/rss?token=mg9...

HTTP/2 200
date: Sun, 17 May 2026 08:58:47 GMT
content-type: text/xml
server: nginx/1.28.3
etag: "2026-05-15T13:00:25.8153737Zff196a28-5f0e-4ce5-a326-172dbe25f4af"
last-modified: Tue, 05 May 2026 12:00:00 GMT
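
The TOK_INTERNAL_1 response above reports `last-modified: Mon, 01 Jan 0001 00:00:00 GMT`, which is the value an unset .NET `DateTime` formats to. Whether that is the cause here is an assumption. A sketch for flagging it during header checks; `has_default_last_modified` is a hypothetical helper.

```shell
# Read response headers on stdin; succeed if the final Last-Modified header
# carries the year-0001 date, which suggests the value was never populated.
has_default_last_modified() {
  tr -d '\r' | grep -i '^last-modified:' | tail -n 1 | grep -q '01 Jan 0001'
}

# Usage against a live endpoint (not run here):
#   curl -sSLI -X GET "$TOK_INTERNAL_1" | has_default_last_modified \
#     && echo "Last-Modified looks unset"
```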


# expired tokens - must not 500. Expect HTTP=200 (body would say "EXPIRED TOKEN" in title).
# Who: old token holders whose tokens have expired but their feed reader is still polling (daniel expired 2024, acclaim expired 2021).
# Scenario: long-forgotten reader still hits the URL daily despite token being invalid for years.
# Verified: server returns 200, no log spam, the reader sees a feed labeled EXPIRED TOKEN instead of crashing.
omer@Omer-Ratsaby-NLP:~$ curl -sSL -o /dev/null -w "acclaim HTTP=%{http_code}\n" "$TOK_ACCLAIM"
acclaim HTTP=200
omer@Omer-Ratsaby-NLP:~$ curl -sSL "$TOK_ACCLAIM" | grep -m1 '<title>'
<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel><title>Ayende @ Rahien for acclaim-wordpress-access-one-time-import EXPIRED TOKEN</....

omer@Omer-Ratsaby-NLP:~$ curl -sSL -o /dev/null -w "daniel  HTTP=%{http_code}\n" "$TOK_DANIEL"
daniel  HTTP=200
omer@Omer-Ratsaby-NLP:~$ curl -sSL "$TOK_DANIEL"  | grep -m1 '<title>'
<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel><title>Ayende @ Rahien for daniel EXPIRED TOKEN</...
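
The expired-token check can be scripted instead of read by eye. This sketch assumes the marker is always the literal text `EXPIRED TOKEN` at the end of the channel title, as in the two outputs above; `feed_title` and `is_expired_feed` are hypothetical helpers.

```shell
# Print the first <title> value (the channel title) from a feed on stdin.
feed_title() {
  grep -o '<title>[^<]*' | head -n 1 | sed 's/<title>//'
}

# Succeed when the channel title ends with the EXPIRED TOKEN marker.
is_expired_feed() {
  feed_title | grep -q 'EXPIRED TOKEN$'
}

# Usage against a live endpoint (not run here):
#   curl -sSL "$TOK_DANIEL" | is_expired_feed && echo "expired"
```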


# edge cases - bad page= values must not 500. Expect HTTP=200 for all.
# Who: bots, vulnerability scanners, broken bookmarks, scrapers sending garbage in query parameters.
# Scenario: someone probes the endpoint with malformed page values to look for crashes.
# Verified: server treats invalid pages as page 1 (per the `if (page < 1) page = 1;` guard) and never 500s.
omer@Omer-Ratsaby-NLP:~$ for P in 0 -1 9999 abc; do echo -n "page=$P "; curl -sSL -o /dev/null -w "HTTP=%{http_code}\n" "${TOK_NET}&page=$P"; done
page=0 HTTP=200
page=-1 HTTP=200
page=9999 HTTP=200
page=abc HTTP=200
omer@Omer-Ratsaby-NLP:~$
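
The page handling described above, the `page < 1` guard plus `Skip = (page-1) * 50`, works out as follows. This sketch mirrors the described behavior rather than the PR's C# code; `normalize_page` and `page_skip` are hypothetical names, and treating non-numeric input as page 1 is an assumption based on the `page=abc` result.

```shell
# Map a raw page parameter to a usable page number.
# Empty, negative, or non-numeric values fall back to page 1.
normalize_page() {
  case "$1" in
    ''|*[!0-9]*) echo 1 ;;
    *) if [ "$1" -lt 1 ]; then echo 1; else echo "$1"; fi ;;
  esac
}

# Number of items to skip for a given raw page value and page size.
page_skip() {
  echo $(( ( $(normalize_page "$1") - 1 ) * $2 ))
}

page_skip 2 50    # prints 50
```

For the four probes above, `0`, `-1`, and `abc` all normalize to page 1 (skip 0), while `9999` stays as-is and yields a skip of 499900, which is past the end of any realistic feed.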

Copilot AI left a comment


Pull request overview

This PR updates RSS endpoints to support token-feed pagination and conditional HTTP caching with ETag and Last-Modified headers.

Changes:

  • Adds configurable RSS page sizes and a page parameter for token-authorized RSS feed pagination.
  • Quotes and parses ETags using HTTP header utilities, and adds page-specific ETags.
  • Adds Last-Modified handling to XML and 304 responses.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| RaccoonBlog.Web/Controllers/SyndicationController.cs | Adds RSS pagination logic and conditional request handling for ETag/Last-Modified. |
| RaccoonBlog.Web/Controllers/RaccoonController.cs | Extends shared XML and 304 helpers to emit cache headers. |
| RaccoonBlog.Web/Extensions/XmlResult.cs | Emits ETag and Last-Modified headers on XML responses. |


Comment on lines +246 to +247
if (DateTimeOffset.TryParse(Request.Headers["If-Modified-Since"].ToString(), out var ifModifiedSince) && lastModified <= ifModifiedSince)
return true;
string responseETagHeader;
if (CheckEtag(stats, out responseETagHeader))
return HttpNotModified();
var lastModified = posts.Count > 0 ? posts[0].PublishAt : (DateTimeOffset)stats.Timestamp;
if (CheckEtag(stats, out var responseETagHeader))
return HttpNotModified();
DateTimeOffset? lastModified = commentsTuples.Count > 0
? commentsTuples[0].Item1.CreatedAt