A comprehensive HTTP request throttling middleware with flexible configuration from JSON/YAML files and both rate limiting and in-flight request limiting capabilities.
- Rate Limiting: Controls the frequency of requests over time using leaky bucket (with burst support) or sliding window algorithms
- In-Flight Limiting: Controls the number of concurrent requests being processed
- Flexible Key Extraction: Global, per-client IP, per-header value, or custom identity-based throttling
- Nginx-style Route Matching: Location-based route selection with exact matches and patterns
- Request Backlogging: Queue requests when limits are reached with configurable timeouts
- Dry-Run Mode: Test throttling configurations without enforcement
- Comprehensive Metrics: Prometheus metrics for monitoring and alerting
- Tag-Based Rules: Apply different throttling rules to different middleware instances
- Key Filtering: Include/exclude specific keys with glob pattern support
- Auto Retry-After: Automatic calculation of retry intervals for rate limits
Please see the testable example to understand how to configure and use the middleware.
The throttling configuration is usually stored in a JSON or YAML file. The configuration consists of the following parts:
- rateLimitZones. Each zone has a rate limit, burst limit, and other parameters.
- inFlightLimitZones. Each zone has an in-flight limit, backlog limit, and other parameters.
- rules. Each rule contains a list of routes and the rate/in-flight limiting zones that should be applied to these routes.
Global throttling assesses all traffic coming into an API from all sources and ensures that the overall rate/concurrency limit is not exceeded. Overwhelming an endpoint with traffic is an easy and efficient way to carry out a denial-of-service attack. By using a global rate/concurrency limit, you can ensure that all incoming requests are within a specific limit.
rateLimitZones:
  rl_total:
    rateLimit: 5000/s
    burstLimit: 10000
    responseStatusCode: 503
    responseRetryAfter: 5s
inFlightLimitZones:
  ifl_total:
    inFlightLimit: 5000
    backlogLimit: 10000
    backlogTimeout: 30s
    responseStatusCode: 503
    responseRetryAfter: 1m
rules:
  - routes:
      - path: "/"
    rateLimits:
      - zone: rl_total
    inFlightLimits:
      - zone: ifl_total
With this configuration, all HTTP requests are limited by rate: no more than 5000 requests per second (rateLimit). Excessive requests within the burst limit (10000 here, burstLimit) will be served immediately regardless of the specified rate; requests above the burst limit will be rejected with a 503 error (responseStatusCode) and a "Retry-After: 5" HTTP header (responseRetryAfter).
Additionally, there is a concurrency limit. If 5000 requests (inFlightLimit) are being processed right now, new incoming requests will be backlogged (suspended).
If there are more than 10000 such backlogged requests (backlogLimit), the rest will be rejected immediately. A request can stay in the backlog for no more than 30 seconds (backlogTimeout); after that, it will be rejected. The response for a rejected request contains the 503 HTTP error code (responseStatusCode) and a "Retry-After: 60" HTTP header (responseRetryAfter).
backlogLimit and backlogTimeout can be specified for rate-limiting zones too.
Per-client throttling is focused on controlling traffic from individual sources and making sure that API clients stay within their prescribed limits. It helps avoid cases where one client exhausts the resources of the entire backend service (for example, by using all connections from the DB pool) while all other clients have to wait for their release.
To implement per-client throttling, the package uses the concept of "identity". If the client is identified by a unique key, the package can throttle requests per this key.
The MiddlewareOpts struct has a GetKeyIdentity callback that should return the key for the current request. It may be a user ID, a JWT "sub" claim, or any other unique identifier.
If any rate/in-flight limiting zone's key.type field is set to identity, the GetKeyIdentity callback must be implemented.
Example of per-client throttling configuration:
rateLimitZones:
  rl_identity:
    rateLimit: 50/s
    burstLimit: 100
    responseStatusCode: 429
    responseRetryAfter: auto
    key:
      type: identity
    maxKeys: 50000
inFlightLimitZones:
  ifl_identity:
    inFlightLimit: 64
    backlogLimit: 128
    backlogTimeout: 30s
    responseStatusCode: 429
    key:
      type: identity
    maxKeys: 50000
  ifl_identity_expensive_op:
    inFlightLimit: 4
    backlogLimit: 8
    backlogTimeout: 30s
    responseStatusCode: 429
    key:
      type: identity
    maxKeys: 50000
    excludedKeys:
      - "150853ab-322c-455d-9793-8d71bf6973d9" # Exclude root admin.
rules:
  - routes:
      - path: "/"
    rateLimits:
      - zone: rl_identity
    inFlightLimits:
      - zone: ifl_identity
    alias: per_identity
  - routes:
      - path: "= /api/v1/do_expensive_op_1"
        methods: POST
      - path: "= /api/v1/do_expensive_op_2"
        methods: POST
    inFlightLimits:
      - zone: ifl_identity_expensive_op
    alias: per_identity_expensive_ops
All throttling counters are stored in an in-memory LRU cache (maxKeys determines its size).
For the rate-limiting zone, responseRetryAfter may be specified as "auto". In this case, the time when a client may retry the request will be calculated automatically.
Each throttling rule may contain an unlimited number of rate/in-flight limiting zones. All of a rule's zones are applied to all of its routes. A route is described as a path plus a list of HTTP methods. To select a route, exactly the same algorithm is used as for selecting a location in Nginx (http://nginx.org/en/docs/http/ngx_http_core_module.html#location). A route may also have an alias that is used in the Prometheus metrics label (see the example below).
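The selection behavior can be illustrated with a minimal sketch. This is a deliberately simplified model of Nginx's rules, not the package's actual matcher: an entry prefixed with "= " matches exactly and wins outright; otherwise the longest matching prefix path is chosen.

```go
package main

import (
	"fmt"
	"strings"
)

// selectRoute is a simplified sketch of Nginx-style location selection:
// "= /path" entries match exactly and take priority; plain entries are
// treated as prefixes, and the longest matching prefix wins.
func selectRoute(routes []string, reqPath string) string {
	best := ""
	for _, route := range routes {
		if exact, found := strings.CutPrefix(route, "= "); found {
			if reqPath == exact {
				return route // an exact match always wins
			}
			continue
		}
		if strings.HasPrefix(reqPath, route) && len(route) > len(best) {
			best = route
		}
	}
	return best
}

func main() {
	routes := []string{"/", "/api/v1/", "= /api/v1/do_expensive_op_1"}
	fmt.Println(selectRoute(routes, "/api/v1/do_expensive_op_1")) // = /api/v1/do_expensive_op_1
	fmt.Println(selectRoute(routes, "/api/v1/items"))             // /api/v1/
	fmt.Println(selectRoute(routes, "/healthz"))                  // /
}
```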
rateLimitZones:
  rl_identity:
    alg: sliding_window
    rateLimit: 15/m
    responseStatusCode: 429
    responseRetryAfter: auto
    key:
      type: identity
    maxKeys: 50000
rules:
  - routes:
      - path: "/"
    rateLimits:
      - zone: rl_identity
    alias: per_identity
In this example, the sliding window algorithm is used for rate limiting (the alg parameter has the "leaky_bucket" value by default). It means only 15 requests are allowed per minute. They may even be sent simultaneously, but all exceeding requests received within the same minute will be rejected.
Example of throttling by an HTTP header value (here, known "bad" User-Agent values are rate-limited separately):
rateLimitZones:
  rl_bad_user_agents:
    rateLimit: 500/s
    burstLimit: 1000
    responseStatusCode: 503
    responseRetryAfter: 15s
    key:
      type: header
      headerName: "User-Agent"
      noBypassEmpty: true
    includedKeys:
      - ""
      - "Go-http-client/1.1"
      - "python-requests/*"
      - "Python-urllib/*"
    maxKeys: 1000
rules:
  - routes:
      - path: "/"
    rateLimits:
      - zone: rl_bad_user_agents
Example of throttling by client IP address (remote_addr key type):
rateLimitZones:
  rl_by_remote_addr:
    rateLimit: 100/s
    burstLimit: 1000
    responseStatusCode: 503
    responseRetryAfter: auto
    key:
      type: remote_addr
    maxKeys: 10000
rules:
  - routes:
      - path: "/"
    rateLimits:
      - zone: rl_by_remote_addr
The package collects several metrics in the Prometheus format:
- rate_limit_rejects_total. Type: counter. Labels: dry_run, rule.
- in_flight_limit_rejects_total. Type: counter. Labels: dry_run, rule, backlogged.
Tags are useful when different rules of the same configuration should be used by different middlewares. For example, suppose you want to have two different throttling rules:
- A rule for all requests.
- A rule for all identity-aware (authorized) requests.
Tags can be specified at two levels:
Tags can be specified at the rule level. This approach is useful when you want different middlewares to process completely different sets of rules:
# ...
rules:
  - routes:
      - path: "/hello"
        methods: GET
    rateLimits:
      - zone: rl_zone1
    tags: all_reqs
  - routes:
      - path: "/feedback"
        methods: POST
    inFlightLimits:
      - zone: ifl_zone1
    tags: all_reqs
  - routes:
      - path: /api/1/users
        methods: PUT
    rateLimits:
      - zone: rl_zone2
    tags: require_auth_reqs
# ...
In your code, you will have two middlewares that are executed at different steps of the HTTP request serving process. Each middleware should apply only its own throttling rules.
allMw := MiddlewareWithOpts(cfg, "my-app-domain", throttleMetrics, MiddlewareOpts{Tags: []string{"all_reqs"}})
requireAuthMw := MiddlewareWithOpts(cfg, "my-app-domain", throttleMetrics, MiddlewareOpts{Tags: []string{"require_auth_reqs"}})
You can specify tags per zone within a rule, which allows fine-grained control over which zones are applied by different middlewares. This approach avoids route duplication when the same routes need different zones for different middlewares:
# ...
rules:
  - routes:
      - path: "/"
    excludedRoutes:
      - path: "/healthz"
      - path: "/metrics"
    rateLimits:
      - zone: rl_total
        tags: all_reqs
      - zone: rl_identity
        tags: authn_reqs
    inFlightLimits:
      - zone: ifl_total
        tags: all_reqs
      - zone: ifl_identity
        tags: authn_reqs
# ...
Different middlewares can selectively apply zones based on their tags:
allMw := MiddlewareWithOpts(cfg, "my-app-domain", throttleMetrics, MiddlewareOpts{Tags: []string{"all_reqs"}})
authnMw := MiddlewareWithOpts(cfg, "my-app-domain", throttleMetrics, MiddlewareOpts{Tags: []string{"authn_reqs"}})
When both rule-level and zone-level tags are specified, rule-level tags take precedence:
- If the middleware's tags match the rule-level tags, all zones in that rule are applied (regardless of zone-level tags).
- If the middleware's tags don't match the rule-level tags, then zone-level tags are checked for each zone individually.
- If neither rule-level nor zone-level tags match, the rule is skipped entirely.
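The precedence rules above can be sketched as a small pure function. This is an illustration of the described behavior, not the package's actual code:

```go
package main

import "fmt"

// zoneApplies sketches the tag-precedence rules: middleware tags are matched
// against rule-level tags first; if they match, every zone in the rule
// applies regardless of zone-level tags. Otherwise the zone's own tags decide.
func zoneApplies(mwTags, ruleTags, zoneTags []string) bool {
	if intersects(mwTags, ruleTags) {
		return true // rule-level match: all zones in the rule apply
	}
	return intersects(mwTags, zoneTags) // fall back to zone-level tags
}

func intersects(a, b []string) bool {
	for _, x := range a {
		for _, y := range b {
			if x == y {
				return true
			}
		}
	}
	return false
}

func main() {
	mw := []string{"authn_reqs"}
	// Rule-level tags match: zone-level tags are ignored.
	fmt.Println(zoneApplies(mw, []string{"authn_reqs"}, []string{"all_reqs"})) // true
	// No rule-level match: zone-level tags decide.
	fmt.Println(zoneApplies(mw, nil, []string{"authn_reqs"})) // true
	// Neither level matches: the zone is skipped.
	fmt.Println(zoneApplies(mw, nil, []string{"all_reqs"})) // false
}
```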
Before configuring real-life throttling, it's usually a good idea to try the dry-run mode first. It doesn't affect the request processing flow; however, all excessive requests are still counted and logged. Dry-run mode allows you to better understand how your API is used and to determine the right throttling parameters.
The dry-run mode can be enabled using the dryRun configuration parameter. Example:
rateLimitZones:
  rl_identity:
    rateLimit: 50/s
    burstLimit: 100
    responseStatusCode: 429
    responseRetryAfter: auto
    key:
      type: identity
    maxKeys: 50000
    dryRun: true
inFlightLimitZones:
  ifl_identity:
    inFlightLimit: 64
    backlogLimit: 128
    backlogTimeout: 30s
    responseStatusCode: 429
    key:
      type: identity
    maxKeys: 50000
    dryRun: true
rules:
  - routes:
      - path: "/"
    rateLimits:
      - zone: rl_identity
    inFlightLimits:
      - zone: ifl_identity
    alias: per_identity
If the specified limits are exceeded, the corresponding messages will be logged.
For rate-limiting:
{"msg": "too many requests, serving will be continued because of dry run mode", "rate_limit_key": "ee9a0dd8-7396-5478-8b83-ab7402d6746b"}
For in-flight limiting:
{"msg": "too many in-flight requests, serving will be continued because of dry run mode", "in_flight_limit_key": "3c00e780-5721-59f8-acad-f0bf719777d4"}