
terrafloww/rasteret

Rasteret

Build a collection once. Query it like a table. Read pixels 20x faster from cloud COGs.


Rasteret is an index-first reader for cloud-hosted tiled GeoTIFFs and COGs. It builds a queryable Arrow/Parquet collection with scene metadata, asset URLs, CRS sidecars, and parsed COG header metadata. Pixels stay in the original COGs.

After that, you can filter, join, and enrich the collection as a table, then read only the pixels you need into NumPy, xarray, GeoPandas, TorchGeo, or Arrow point-sample tables.

STAC / Parquet / Arrow table      ->  Rasteret Collection  ->  NumPy / xarray / GeoPandas / TorchGeo
external labels / plots / points      filter/join/share        read pixels on demand

Why Rasteret

Remote raster workflows often repeat the same setup work: STAC loops, COG header parsing, tile byte-range planning, CRS transforms, retries, and output assembly.

Rasteret moves the expensive raster metadata discovery into a Collection build step and reuses that metadata for later reads.

That helps when you:

  • train or evaluate models over many remote COG scenes
  • repeatedly sample the same imagery with different AOIs, points, labels, or splits
  • avoid rediscovering raster header metadata in new notebooks, containers, or machines
  • want one source collection to feed TorchGeo, xarray, NumPy, GeoPandas, and Arrow tools
  • need DuckDB, Polars, PyArrow, or GeoPandas to work on metadata and external geometries before pixel reads
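The tile byte-range planning mentioned above is a good example of work that cached header metadata makes cheap. The sketch below is not Rasteret's implementation. It assumes a hypothetical CachedHeader record that holds the TIFF TileOffsets and TileByteCounts arrays for one band (tiles in row-major order), and shows how a pixel window maps to tile indices and then to merged HTTP Range spans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedHeader:
    """Hypothetical per-band header metadata captured once at build time."""
    image_width: int
    image_height: int
    tile_width: int
    tile_height: int
    tile_offsets: tuple[int, ...]      # TIFF TileOffsets, row-major tile order
    tile_byte_counts: tuple[int, ...]  # TIFF TileByteCounts, same order


def tiles_for_window(h: CachedHeader, col_off: int, row_off: int,
                     width: int, height: int) -> list[int]:
    """Return row-major indices of the tiles that intersect a pixel window."""
    tiles_across = -(-h.image_width // h.tile_width)  # ceiling division
    first_col = col_off // h.tile_width
    last_col = (col_off + width - 1) // h.tile_width
    first_row = row_off // h.tile_height
    last_row = (row_off + height - 1) // h.tile_height
    return [r * tiles_across + c
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


def plan_byte_ranges(h: CachedHeader, tile_ids: list[int],
                     max_gap: int = 0) -> list[tuple[int, int]]:
    """Turn tile byte spans into inclusive (start, end) HTTP Range spans.

    Spans separated by at most `max_gap` unused bytes are coalesced so that
    neighbouring tiles can be fetched with a single request.
    """
    spans = sorted((h.tile_offsets[t], h.tile_offsets[t] + h.tile_byte_counts[t] - 1)
                   for t in tile_ids)
    merged: list[tuple[int, int]] = []
    for start, end in spans:
        if merged and start - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Toy 1024 x 1024 image stored as four 512 x 512 tiles.
header = CachedHeader(
    image_width=1024, image_height=1024, tile_width=512, tile_height=512,
    tile_offsets=(1000, 5000, 9000, 13000),
    tile_byte_counts=(4000, 4000, 4000, 4000),
)
ids = tiles_for_window(header, col_off=500, row_off=100, width=100, height=100)
# ids -> [0, 1]  (columns 500..599 cross the tile boundary at 512)
ranges = plan_byte_ranges(header, ids)
# ranges -> [(1000, 8999)]  (tiles 0 and 1 are contiguous, so one request)
```

Because the offsets and byte counts come from the stored header, this planning step needs no network round trip; only the final range reads touch the remote COG.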

Quick Example

import rasteret

sentinel2_collection = rasteret.build(
    "earthsearch/sentinel-2-l2a",
    name="s2_bangalore",
    bbox=(77.5, 12.9, 77.7, 13.1),
    date_range=("2024-01-01", "2024-01-31"),
)

clear = sentinel2_collection.subset(cloud_cover_lt=50)

arr = clear.get_numpy(
    geometries=(77.55, 13.01, 77.58, 13.08),
    bands=["B04", "B08"],
)
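Passing a bbox to get_numpy(...) means each scene's read starts by turning geographic bounds into a pixel window. A minimal sketch of that step, assuming a north-up raster and a bbox that has already been reprojected into the raster's CRS (bbox_to_window and its parameters are hypothetical, not Rasteret's API):

```python
import math


def bbox_to_window(bbox, origin_x, origin_y, pixel_size_x, pixel_size_y):
    """Map (minx, miny, maxx, maxy) to (col_off, row_off, width, height).

    Assumes a north-up raster whose top-left corner is (origin_x, origin_y),
    positive pixel sizes, and a bbox in the raster's CRS. The window is
    expanded outward to whole pixels and is not clipped to the raster extent.
    """
    minx, miny, maxx, maxy = bbox
    col_start = math.floor((minx - origin_x) / pixel_size_x)
    col_stop = math.ceil((maxx - origin_x) / pixel_size_x)
    # Rows count downward from the top edge, so maxy gives the first row.
    row_start = math.floor((origin_y - maxy) / pixel_size_y)
    row_stop = math.ceil((origin_y - miny) / pixel_size_y)
    return col_start, row_start, col_stop - col_start, row_stop - row_start


# Made-up projected coordinates with 10 m pixels.
window = bbox_to_window((600095, 1499805, 600205, 1499905), 600000, 1500000, 10, 10)
# window -> (9, 9, 12, 11)
```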

The same collection can feed TorchGeo:

dataset = clear.to_torchgeo_dataset(
    bands=["B04", "B03", "B02", "B08"],
    chip_size=256,
)
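chip_size=256 asks for fixed-size training chips. TorchGeo's own samplers (for example GridGeoSampler, with its size and stride arguments) work in map coordinates; the sketch below is only a pixel-space illustration of a regular chip grid with an optional stride. chip_windows is a hypothetical helper, not how Rasteret or TorchGeo implement chipping, and it simply skips partial chips at the edges.

```python
def chip_windows(width, height, chip_size, stride=None):
    """Yield (col_off, row_off) of every full chip in a width x height window.

    stride defaults to chip_size (non-overlapping chips). Chips that would
    run past the right or bottom edge are skipped in this sketch.
    """
    if chip_size <= 0:
        raise ValueError("chip_size must be positive")
    stride = chip_size if stride is None else stride
    if stride <= 0:
        raise ValueError("stride must be positive")
    for row_off in range(0, height - chip_size + 1, stride):
        for col_off in range(0, width - chip_size + 1, stride):
            yield col_off, row_off


# A 600 x 520 pixel AOI cut into 256 x 256 chips.
chips = list(chip_windows(600, 520, 256))
# chips -> [(0, 0), (256, 0), (0, 256), (256, 256)]
overlapping = list(chip_windows(600, 520, 256, stride=128))
# len(overlapping) -> 9
```

A stride smaller than chip_size yields overlapping chips, which trades more reads for more training samples per AOI.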

Bring Your Own Geometry And Metadata

Rasteret works well with the table tools you already use. External labels, farm plots, asset locations, fire boundaries, or point samples can stay in GeoPandas, DuckDB, Polars, or PyArrow until you need pixels.

import duckdb
import geopandas as gpd
import rasteret
from shapely.geometry import box

plots = gpd.GeoDataFrame(
    {
        "plot_id": ["plot-a"],
        "crop": ["rice"],
    },
    geometry=[box(77.55, 13.01, 77.58, 13.08)],
    crs="OGC:CRS84",
)
plots_arrow = plots.to_arrow(geometry_encoding="WKB")

con = duckdb.connect()
con.sql("INSTALL spatial; LOAD spatial;")
con.register("sen2_rasteret", clear)
con.register("plots", plots_arrow)

# Bring your own geometries
plot_aois = con.sql("""
    -- DISTINCT: a plot that intersects several scenes is returned only once
    SELECT DISTINCT
        plots.plot_id,
        plots.crop,
        plots.geometry AS plot_geometry
    FROM sen2_rasteret, plots
    WHERE sen2_rasteret."eo:cloud_cover" < 10
      AND ST_Intersects(
          ST_GeomFromWKB(sen2_rasteret.geometry),
          ST_GeomFromWKB(plots.geometry)
      )
""")

plot_patches = clear.get_gdf(
    geometries=plot_aois,
    geometry_column="plot_geometry",
    geometry_crs=4326,
    bands=["B04", "B08"],
)

The same pattern works with Polars or PyArrow for split or label columns, and with sample_points(...) when your external data is point-based. Both get_gdf(...) and sample_points(...) carry business columns such as plot_id through to their outputs.
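Conceptually, point sampling maps each point to the one pixel that contains it through the raster's geotransform. A minimal NumPy sketch, assuming a north-up raster with square pixels and points already in the raster's CRS; sample_at_points and its None-for-outside convention are hypothetical, not Rasteret's behavior:

```python
import math

import numpy as np


def sample_at_points(band, points, origin_x, origin_y, pixel_size, nodata=None):
    """Return the value of the pixel containing each (x, y) point.

    Assumes a north-up raster with square pixels whose top-left corner is
    (origin_x, origin_y). Points that fall outside the array get `nodata`.
    """
    n_rows, n_cols = band.shape
    values = []
    for x, y in points:
        col = math.floor((x - origin_x) / pixel_size)
        row = math.floor((origin_y - y) / pixel_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            values.append(band[row, col].item())
        else:
            values.append(nodata)
    return values


# Toy 3 x 4 band with 10-unit pixels and its top-left corner at (0, 30).
band = np.arange(12).reshape(3, 4)
labels = [("p1", 5.0, 25.0), ("p2", 35.0, 5.0), ("p3", 45.0, 5.0)]
values = sample_at_points(band, [(x, y) for _, x, y in labels], 0.0, 30.0, 10.0)
# Keep each label's ID next to its sampled value, like a business column.
samples = [{"point_id": pid, "value": v} for (pid, _, _), v in zip(labels, values)]
# samples -> [{'point_id': 'p1', 'value': 0}, {'point_id': 'p2', 'value': 11},
#             {'point_id': 'p3', 'value': None}]
```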

What You Can Do

| Task | Rasteret surface |
| --- | --- |
| Build from a registered dataset | rasteret.build("catalog/id", ...) |
| Build from your own Parquet, GeoParquet, DuckDB, Polars, or Arrow record table | rasteret.build_from_table(...) |
| Reopen a saved or prebuilt Collection | rasteret.load(path_or_dataset_id) |
| Re-wrap a read-ready Arrow object | rasteret.as_collection(...) |
| Get NumPy arrays | Collection.get_numpy(...) |
| Get an xarray Dataset | Collection.get_xarray(...) |
| Get GeoPandas rows with pixel arrays | Collection.get_gdf(...) |
| Sample pixels at points | Collection.sample_points(...) |
| Train/infer with TorchGeo | Collection.to_torchgeo_dataset(...) |

Dataset Catalog

Rasteret ships with dataset IDs so you do not have to remember STAC endpoints, band maps, license metadata, or cloud access settings. Most catalog entries are recipes for rasteret.build(...): Rasteret searches the source catalog, parses the COG metadata once, and writes a reusable local Collection.

Only one built-in ID is already a read-ready Rasteret Collection: aef/v1-annual (AlphaEarth Foundation Embeddings). Open it with rasteret.load("aef/v1-annual"); the built-in alias loads the Collection that Rasteret maintains on Source Cooperative, so you do not need to call build() for this dataset.

| ID | Dataset | Coverage | Auth | Use |
| --- | --- | --- | --- | --- |
| aef/v1-annual | AlphaEarth Foundation Embeddings (Annual) | global | none | rasteret.load(...) |
| earthsearch/sentinel-2-l2a | Sentinel-2 Level-2A | global | none | rasteret.build(...) |
| earthsearch/landsat-c2-l2 | Landsat Collection 2 Level-2 | global | required | rasteret.build(...) |
| earthsearch/naip | NAIP | north-america | required | rasteret.build(...) |
| earthsearch/cop-dem-glo-30 | Copernicus DEM 30m | global | none | rasteret.build(...) |
| earthsearch/cop-dem-glo-90 | Copernicus DEM 90m | global | none | rasteret.build(...) |
| pc/sentinel-2-l2a | Sentinel-2 Level-2A (Planetary Computer) | global | required | rasteret.build(...) |
| pc/io-lulc-annual-v02 | ESRI 10m Land Use/Land Cover | global | required | rasteret.build(...) |
| pc/alos-dem | ALOS World 3D 30m DEM | global | required | rasteret.build(...) |
| pc/nasadem | NASADEM | global | required | rasteret.build(...) |
| pc/esa-worldcover | ESA WorldCover | global | required | rasteret.build(...) |
| pc/usda-cdl | USDA Cropland Data Layer | conus | required | rasteret.build(...) |

You can browse the same list from the CLI:

rasteret datasets list
rasteret datasets info aef/v1-annual
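The Auth and Use columns of the catalog table amount to a small lookup: prebuilt entries are loaded, recipes are built, and some recipes need cloud access configured first. A toy sketch of that decision, using a hypothetical CatalogEntry record and a few rows transcribed from the table. This is not Rasteret's internal registry; rasteret datasets info is the place to check real metadata.

```python
from typing import NamedTuple


class CatalogEntry(NamedTuple):
    """Hypothetical record mirroring the Auth and Use columns of the table."""
    dataset: str
    auth_required: bool
    read_ready: bool  # True only for prebuilt Collections such as aef/v1-annual


# A few rows transcribed from the catalog table; not Rasteret's registry.
CATALOG = {
    "aef/v1-annual": CatalogEntry("AlphaEarth Foundation Embeddings (Annual)", False, True),
    "earthsearch/sentinel-2-l2a": CatalogEntry("Sentinel-2 Level-2A", False, False),
    "earthsearch/landsat-c2-l2": CatalogEntry("Landsat Collection 2 Level-2", True, False),
    "pc/usda-cdl": CatalogEntry("USDA Cropland Data Layer", True, False),
}


def how_to_open(dataset_id: str) -> str:
    """Return the entry point the table lists, plus an auth reminder."""
    entry = CATALOG[dataset_id]
    call = "rasteret.load" if entry.read_ready else "rasteret.build"
    return f"{call} (auth required)" if entry.auth_required else call


print(how_to_open("aef/v1-annual"))              # rasteret.load
print(how_to_open("earthsearch/landsat-c2-l2"))  # rasteret.build (auth required)
```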

To register your own dataset ID for a reusable local Collection or Parquet record table, see the Register A Local Collection into Dataset Catalog guide.

Performance

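Rasteret is roughly 10x to 20x faster than rasterio/GDAL. In the measured scenarios below, speedups over TorchGeo with rasterio ranged from 8.0x to 21.3x.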

| Scenario | TorchGeo/rasterio | Rasteret | Speedup |
| --- | --- | --- | --- |
| Single AOI, 15 scenes | 9.08 s | 1.14 s | 8.0x |
| Multi-AOI, 30 scenes | 42.05 s | 2.25 s | 18.7x |
| Cross-CRS, 12 scenes | 12.47 s | 0.59 s | 21.3x |
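The Speedup column is the TorchGeo/rasterio time divided by the Rasteret time. Recomputing it from the rounded timings in the benchmark table is a quick sanity check; the 0.586 s value below is an illustrative assumption, not a measured number.

```python
# (TorchGeo/rasterio seconds, Rasteret seconds), copied from the benchmark table.
timings = {
    "Single AOI, 15 scenes": (9.08, 1.14),
    "Multi-AOI, 30 scenes": (42.05, 2.25),
    "Cross-CRS, 12 scenes": (12.47, 0.59),
}

# Speedup = baseline time / Rasteret time, rounded to one decimal place.
speedups = {name: round(base / fast, 1) for name, (base, fast) in timings.items()}
# speedups -> {'Single AOI, 15 scenes': 8.0, 'Multi-AOI, 30 scenes': 18.7,
#              'Cross-CRS, 12 scenes': 21.1}

# Assumed unrounded Rasteret time that still displays as 0.59 s.
plausible_speedup = round(12.47 / 0.586, 1)
# plausible_speedup -> 21.3
```

The first two rows match the table exactly. The cross-CRS row recomputes to 21.1x from the displayed 0.59 s, while the listed 21.3x is consistent with an unrounded Rasteret time of about 0.586 s.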

[Figure: Processing time comparison]

For the measured time-series setup, Rasteret also compares well against Google Earth Engine and thread-pooled rasterio:

| Library | First run (cold) | Subsequent runs (hot) |
| --- | --- | --- |
| Rasterio + ThreadPool | 32 s | 24 s |
| Google Earth Engine | 10-30 s | 3-5 s |
| Rasteret | 3 s | 3 s |

[Figure: Single request performance]

See the Benchmarks guide for methodology, environment details, and additional comparisons with Hugging Face datasets.

Install

uv pip install rasteret

Optional integrations:

uv pip install "rasteret[torchgeo]"
uv pip install "rasteret[aws]"
uv pip install "rasteret[azure]"
uv pip install "rasteret[all]"  # all optional integrations for exploration

Rasteret requires Python 3.12 or later.


License

Code: Apache-2.0

About

Rasteret is a library for reading GeoTIFFs 20x+ faster than Rasterio/GDAL, with interop for TorchGeo, xarray, DuckDB, and Polars.
