
feat(weather): HKO Open Data API — daily-temp source to unblock deferred Hong Kong settlement #47

@helloiamvu


TL;DR

The HKO Open Data API (data.weather.gov.hk/weatherAPI/opendata/opendata.php) exposes daily max / min / mean temperature for the Hong Kong Observatory (HKO) headquarters and ~34 other HK stations, back to 1884, as JSON or CSV.

This matters because Hong Kong's Polymarket market settles against HKO (the Observatory), which has no airport ICAO code and no METAR feed. That is exactly why HK is currently deferred to v0.2:

  • DEFERRED_STATIONS = frozenset({"HKO", "RCSS"}) (packages/core/src/mostlyright/international.py:68)
  • "Hong Kong settles against HKO (the Observatory) which has no airport ICAO and is not a registry station, so it carries no tag (v0.2-deferred source)." (packages/core/src/mostlyright/_internal/_stations.py:1069)
  • "previously HK-high routed via VHHH METAR" (the airport, i.e. the wrong station) (international.py:64)

Our METAR-only sources (IEM ASOS, AWC) structurally cannot serve the Observatory. The HKO Open Data API's station=HKO is the literal settlement source. So this is not "just another international source" — it is the unblock path for the deferred HK market.

Bottom line on "same data as IEM": HKO provides the settlement-critical daily high/low directly and authoritatively, but it does not replicate the full IEM observation record (no sub-daily obs, no time-of-extreme, no raw_metar, no dewpoint/wind/precip from these endpoints). It is structurally a daily-climate source (like the IEM CLI path), not a sub-daily observation source (like IEM ASOS).

All findings below are empirically verified against the live API on 2026-05-29.


1. The API

| | |
|---|---|
| Endpoint | https://data.weather.gov.hk/weatherAPI/opendata/opendata.php |
| Method | GET |
| Formats | JSON or CSV (`rformat=json` or `rformat=csv`) |
| Auth | None (public open data) |
| Doc | HKO Open Data API Documentation v1.13 (Sep 2025), §3 "Open Data (Climate and Weather Information) API" |

Temperature dataTypes

| dataType | Meaning | IEM analogue |
|---|---|---|
| CLMMAXT | Daily maximum temperature (°C) | obs_high_f |
| CLMMINT | Daily minimum temperature (°C) | obs_low_f |
| CLMTEMP | Daily mean temperature (°C) | obs_mean_f |

Query parameters

| Param | Values | Notes |
|---|---|---|
| dataType | CLMMAXT / CLMMINT / CLMTEMP | required |
| station | HKO station code (2 to 3 characters), e.g. HKO, HKA, KP | required for temp datasets |
| rformat | json / csv | optional, default CSV |
| year | e.g. 2024 | optional; omit → entire history in one response |
| month | 1-12 | optional, pass with year |
| lang | en / tc / sc | optional, default en |
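
The parameter rules above can be sketched as a small URL builder. This is only an illustration: `build_hko_url` and `TEMP_DATA_TYPES` are hypothetical names, not existing code in this repo, and the validation reflects only what is listed in the table.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.weather.gov.hk/weatherAPI/opendata/opendata.php"
TEMP_DATA_TYPES = {"CLMMAXT", "CLMMINT", "CLMTEMP"}


def build_hko_url(data_type, station, year=None, month=None,
                  rformat="json", lang="en"):
    """Build a request URL for one of the three daily-temperature datasets."""
    if data_type not in TEMP_DATA_TYPES:
        raise ValueError(f"unsupported dataType: {data_type!r}")
    if month is not None and year is None:
        raise ValueError("month must be passed together with year")
    if month is not None and not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month!r}")

    params = {"dataType": data_type, "rformat": rformat,
              "station": station, "lang": lang}
    # Omitting year returns the entire history in one response.
    if year is not None:
        params["year"] = year
    if month is not None:
        params["month"] = month
    return f"{BASE_URL}?{urlencode(params)}"
```

For example, `build_hko_url("CLMMAXT", "HKO", year=2024, month=1)` produces the January 2024 request used in the verified example.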

Verified request/response

GET .../opendata.php?dataType=CLMMAXT&rformat=json&station=HKO&year=2024&month=1
{"type":["","Daily Maximum Temperature (°C) at the Hong Kong Observatory"],
 "fields":["年/Year","月/Month","日/Day","數值/Value","數據完整性/data Completeness"],
 "data":[["2024","1","1","22.0","C"],["2024","1","2","20.5","C"], …],
 "legend":["*** 沒有數據/unavailable","# 數據不完整/data incomplete","C 數據完整/data Complete"]}

CSV form is the same data with a UTF-8 BOM, two title rows, a header row, the data rows, then three legend rows appended (legend rows must be stripped by any parser).
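
A minimal sketch of a CSV parser that tolerates the layout described above. Rather than counting title and legend rows, it keeps only rows whose first three cells are integers, so the BOM, title rows, header row and legend rows all fall away. `parse_hko_csv` is a hypothetical name, and the exact CSV layout is assumed from the description here, not from a recorded fixture.

```python
import csv
import io


def parse_hko_csv(text):
    """Return (year, month, day, value, flag) tuples from an HKO CSV body.

    The value is kept as a string because it may be '***' (unavailable).
    """
    text = text.lstrip("\ufeff")  # drop the UTF-8 BOM if present
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if len(record) < 5:
            continue  # title rows and legend rows have fewer cells
        year, month, day, value, flag = (cell.strip() for cell in record[:5])
        if not (year.isdigit() and month.isdigit() and day.isdigit()):
            continue  # header row
        rows.append((int(year), int(month), int(day), value, flag))
    return rows
```

Filtering on the shape of a data row keeps the parser working if the number of title or legend rows ever changes, at the cost of silently skipping a malformed data row; a stricter parser might count skipped rows and fail on unexpected totals.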


2. Empirical verification (2026-05-29)

History depth — excellent for max/min

| Series | Earliest verified | Latest | Notes |
|---|---|---|---|
| CLMMAXT @ HKO | 1884-01-01 | 2026-04-30 | 49,428 rows, single pull = 838 KB |
| CLMMINT @ HKO | 1884-01-01 | | confirmed back to 1884 |
| CLMTEMP @ HKO (mean) | later than 1884 | | year=1884 returns data:[] → mean history is shallower than max/min (exact start TBD) |
| CLMMAXT @ HKA (airport) | 1997-06-01 | 2026-03-31 | ~28 yrs (airport opened 1998); first row is ***,# |

  • WWII gap: rows for 1940–1946 are entirely absent (not flagged) at HKO; 1938, 1939, 1947 are present.
  • Completeness flags: C complete / # incomplete / *** unavailable. HKO HQ max-temp is ~100% C across 142 years; impossible dates (e.g. 1900-02-29) emit a *** row.
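
Because the WWII gap shows up as absent rows, not flagged rows, a client cannot rely on the completeness flag alone. A minimal sketch of a gap check, assuming rows shaped like the JSON `data` entries (`find_absent_dates` is a hypothetical helper):

```python
from datetime import date, timedelta


def find_absent_dates(rows, start, end):
    """List calendar dates in [start, end] that have no row at all."""
    present = set()
    for year, month, day, _value, _flag in rows:
        try:
            present.add(date(int(year), int(month), int(day)))
        except ValueError:
            continue  # impossible date emitted by the API, e.g. 1900-02-29

    absent = []
    current = start
    while current <= end:
        if current not in present:
            absent.append(current)
        current += timedelta(days=1)
    return absent
```

Rows carrying `***` or `#` are still counted as present here; separating "row missing" from "row present but unusable" is left to the caller.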

Latency — monthly, NOT real-time (major gap vs IEM)

  • year=2026&month=5 returns data:[] for all temp datasets as of 2026-05-29 → current month is not published.
  • Data is published monthly after month-end. Per-station lag varies (HKO had through April; HKA only through March on the same day).
  • Contrast: IEM ASOS is near-real-time (hourly METAR).
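
One way to act on the latency findings above is a conservative cache rule: only cache a month that is strictly before the current month and that actually returned rows, since per-station lag means a past month can still be empty. This is a sketch under the assumption that a published month is never revised; `is_cacheable` is a hypothetical name.

```python
from datetime import date


def is_cacheable(year, month, rows, today):
    """Decide whether a fetched (year, month) response is safe to cache."""
    if (year, month) >= (today.year, today.month):
        return False  # current or future month: not published yet
    # A past month with no rows may simply not be published for this station.
    return len(rows) > 0
```

With `today = date(2026, 5, 29)`, an April 2026 response with rows would be cached, while an empty April response (as HKA returned on that day) would be retried later.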

Station coverage — 35 stations confirmed returning daily temp data

Settlement-relevant: HKO = Hong Kong Observatory (settlement station), HKA = Hong Kong International Airport (≈ VHHH).
Full verified set (CLMMAXT): CCH HKA HKO HKS HPV JKB KAT KFB KLT KP KTG LFS NGP NLS PEN PLC SEK SHA SKG SSP STY TAP TC TKL TMS TU1 TW TWN TY1 VP1 WGL WLP WTS YCT YLP.
(Tested codes SE SF SHL TBT TMT TPK TPO TUN returned the invalid-param error. Note: the temperature station set differs from the tide station set documented in the PDF.)

Units / precision

  • Values are °C at 0.1° resolution (e.g. 22.0, 19.6). Conversion: °F = °C × 9/5 + 32 (parity-sensitive rounding — cf. the recent GHCNh integer-°F precision work, commits 01d81cd / b9fed51).
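
To illustrate why the rounding policy is parity-sensitive, the sketch below converts with `decimal` so the 0.1 °C inputs stay exact, and exposes the rounding mode as a parameter. The helper names are hypothetical, and which mode the project should adopt is an open decision, not something this example settles.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def c_to_f(celsius_text):
    """Exact °C → °F conversion from the API's string value."""
    return Decimal(celsius_text) * Decimal(9) / Decimal(5) + Decimal(32)


def to_integer_f(celsius_text, rounding=ROUND_HALF_UP):
    """Round the converted value to an integer °F with an explicit mode."""
    return int(c_to_f(celsius_text).quantize(Decimal("1"), rounding=rounding))
```

For 22.5 °C the exact result is 72.5 °F, so `to_integer_f("22.5")` gives 73 with half-up rounding but 72 with `ROUND_HALF_EVEN`, which is also what Python's built-in `round(72.5)` returns. A one-degree difference of this kind can flip a settlement bucket.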

Error behavior

  • Invalid station/params → trilingual HTML error text (not JSON), returned with HTTP 200:

    Please include valid parameters in API request. For details… <a href="…HKO_Open_Data_API_Documentation.pdf">…

  • A client must treat "non-JSON body" and "empty data array" as distinct failure/empty signals.
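
Since the HTTP status is 200 in both cases, the body itself has to be classified. A minimal sketch, with `classify_hko_body` as a hypothetical helper returning one of three states:

```python
import json


def classify_hko_body(body):
    """Return ("ok", rows), ("empty", []) or ("error", []) for a response body."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # The invalid-parameter error text is not JSON.
        return "error", []
    if not isinstance(payload, dict) or "data" not in payload:
        return "error", []
    rows = payload["data"]
    if not rows:
        # Valid request, but nothing published (e.g. the current month).
        return "empty", []
    return "ok", rows
```

Keeping "error" and "empty" separate lets a fetcher raise on a bad station code while treating an unpublished month as a normal, retryable condition.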

Timezone

  • HKO daily extremes are calendar-day figures in HKT (UTC+8, no DST), conventionally midnight-to-midnight met-day. No timestamps are returned. ⚠️ The exact daily-window definition should be confirmed against HKO methodology — it is settlement-critical.
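
If the midnight-to-midnight HKT window is confirmed, mapping an HKO calendar day onto UTC (for comparison with UTC-stamped sub-daily data such as VHHH METAR) is a fixed-offset shift. The sketch below assumes that window; it is not a statement of HKO methodology. `hkt_day_window_utc` is a hypothetical helper.

```python
from datetime import date, datetime, time, timedelta, timezone

# Hong Kong Time is UTC+8 with no DST, so a fixed offset is sufficient.
HKT = timezone(timedelta(hours=8), "HKT")


def hkt_day_window_utc(day):
    """Return the [start, end) UTC bounds of an HKT calendar day.

    Assumes the daily window runs midnight to midnight HKT (unconfirmed).
    """
    start_local = datetime.combine(day, time.min, tzinfo=HKT)
    end_local = start_local + timedelta(days=1)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)
```

Under this assumption, the HKT day 2024-01-01 spans 2023-12-31 16:00 UTC up to, but not including, 2024-01-01 16:00 UTC.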

3. IEM parity gap analysis

IEM (ASOS observation path): iem_asos.py:63 (…/cgi-bin/request/asos.py?data=all&tz=Etc/UTC&format=comma) → _iem.py parser → daily aggregation in _internal/_pairs.py (_obs_aggregates, line 142) → merged via SOURCE_PRIORITY = {"awc": 3, "iem": 2, "ghcnh": 1} (_internal/merge/observations.py:18), dedup key (station_code, observed_at, observation_type).

| Capability / field | IEM ASOS | HKO Open Data API | Verdict |
|---|---|---|---|
| obs_high_f (daily max) | max(temp_f) from METAR | CLMMAXT (°C→°F) | direct + authoritative for HK |
| obs_low_f (daily min) | min(temp_f) from METAR | CLMMINT (°C→°F) | direct + authoritative for HK |
| obs_mean_f | mean(temp_f) | CLMTEMP | ⚠️ available but shallower history; computed differently (continuous mean vs mean of hourly) |
| obs_mean_dewpoint_f | from METAR | not available from these endpoints | |
| obs_max_wind_kt / obs_max_gust_kt | from METAR | — (separate HKO wind datasets, not these endpoints) | |
| obs_total_precip_in | sum(precip_1hr) | — (HKO daily rainfall is a separate dataset) | |
| obs_count | # obs in window | n/a (pre-aggregated) | |
| obs_high_at / obs_low_at (time of extreme) | sub-daily-derived (weather/obs.py:82) | unattainable (no sub-daily) | |
| raw_metar (MetPy re-parse) | preserved | — (HKO HQ has no METAR) | |
| sub-daily observations | yes (hourly + SPECI) | no | |
| units | °F native | °C (convert) | ⚠️ |
| timezone semantics | UTC obs → LST settlement window | HKT calendar-day, pre-aggregated | ⚠️ different aggregation model |
| latency | near-real-time | monthly; current month absent | ⚠️ major |
| history | varies by station | 1884 (max/min) | ✅ excellent |

Conclusion: For the daily high/low that drives settlement, HKO is a clean YES, and for Hong Kong specifically it is more authoritative than IEM, because IEM's only HK option (VHHH airport METAR) is the wrong station. For the full IEM observation record (sub-daily rows, time-of-extreme, raw_metar, dewpoint/wind/precip), HKO is a NO from these endpoints. HKO behaves more like a daily-climate source (closer to the IEM CLI path) than like a sub-daily observation source.


4. Integration considerations (inputs for a later GSD phase — not a committed plan)

  • Register HKO as a station. It is currently not in the registry (_stations.py:1069, :1106). Likely: code="HKO", icao="", ghcnh_id="", tz="Asia/Hong_Kong", country="HK", lat/lon of Observatory HQ (~22.302, 114.174 — confirm), venues={"polymarket"}.
  • New fetcher + parser (packages/weather/.../_fetchers/hko.py) returning daily rows — does not flow through merge_observations (which dedups sub-daily by observed_at).
  • Column semantics decision: does HKO populate obs_high_f/obs_low_f, a new hko_* namespace, or the cli_* slots? (This is the key design question.)
  • Source priority: for HK, HKO is the sole/authoritative source (no AWC/IEM/GHCNh for the Observatory), so priority interplay is largely moot — but matters if HKO's HKA is used to cross-check VHHH METAR.
  • Remove "HKO" from DEFERRED_STATIONS (international.py:68) and adjust DeferredMarketError routing.
  • Units & rounding: °C→°F policy (parity-sensitive — mirror the GHCNh integer-°F precedent).
  • Cadence/cache: monthly publication, current month absent → historical months are immutable and cache cleanly.
  • Tests: live + recorded cassettes; parity fixtures if research() output changes.

5. Open questions

  1. Exact HKO daily-window definition (HKT midnight–midnight?) — settlement-critical, confirm vs HKO methodology.
  2. CLMTEMP (mean) history start year.
  3. Per-station publication latency / SLA (HKO vs HKA differ).
  4. Does Polymarket settle HK on max, min, or both? (drives required dataTypes)
  5. HKO Open Data licensing/attribution + any rate limits.
  6. Relationship to the v0.2 MCP / international roadmap sequencing.

6. TS Parity (per CLAUDE.md dual-SDK rule)

Touches the public research() surface for HK. TS SDK would need identical research() behavior for HK; implementation adapted (fetch HKO Open Data API — simple GET JSON/CSV, no Node-only deps, browser-friendly). Same-phase vs parity-ticket per CROSS-SDK-SYNC.md — TBD at planning.


Investigation + empirical verification by Claude Code on 2026-05-29. Next step: GSD planning pass.
