Build a collection once. Query it like a table. Read pixels up to 20x faster from cloud COGs.
Rasteret is an index-first reader for cloud-hosted tiled GeoTIFFs and COGs. It builds a queryable Arrow/Parquet collection with scene metadata, asset URLs, CRS sidecars, and parsed COG header metadata. Pixels stay in the original COGs.
After that, you can filter, join, and enrich the collection as a table, then read only the pixels you need into NumPy, xarray, GeoPandas, TorchGeo, or Arrow point-sample tables.
```
STAC / Parquet / Arrow table  ->  Rasteret Collection   ->  NumPy / xarray / GeoPandas / TorchGeo
(external labels, plots, points)  (filter, join, share)     (read pixels on demand)
```
Remote raster workflows often repeat the same setup work: STAC loops, COG header parsing, tile byte-range planning, CRS transforms, retries, and output assembly.
Rasteret moves the expensive raster metadata discovery into a Collection build
step and reuses that metadata for later reads.
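The payoff of caching that metadata is that each later read is a plain HTTP range request: once tile offsets and byte counts are known from the COG header, no further header round-trips are needed. A minimal stdlib illustration of such a ranged request, with placeholder offsets (in Rasteret the real offsets come from the parsed COG header stored in the Collection):

```python
import urllib.request

# Placeholder values; in practice the tile offset and byte count come
# from the tile index parsed out of the COG header at build time.
tile_offset = 4096
tile_byte_count = 65536

req = urllib.request.Request(
    "https://example.com/scene/B04.tif",
    headers={"Range": f"bytes={tile_offset}-{tile_offset + tile_byte_count - 1}"},
)
# Opening this request returns only that tile's compressed bytes.
range_header = req.get_header("Range")  # "bytes=4096-69631"
```

Skipping the header round-trip on every read is where most of the per-scene savings come from.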
That helps when you:
- train or evaluate models over many remote COG scenes
- repeatedly sample the same imagery with different AOIs, points, labels, or splits
- avoid rediscovering raster header metadata in new notebooks, containers, or machines
- want one source collection to feed TorchGeo, xarray, NumPy, GeoPandas, and Arrow tools
- need DuckDB, Polars, PyArrow, or GeoPandas to work on metadata and external geometries before pixel reads
```python
import rasteret

sentinel2_collection = rasteret.build(
    "earthsearch/sentinel-2-l2a",
    name="s2_bangalore",
    bbox=(77.5, 12.9, 77.7, 13.1),
    date_range=("2024-01-01", "2024-01-31"),
)

clear = sentinel2_collection.subset(cloud_cover_lt=50)

arr = clear.get_numpy(
    geometries=(77.55, 13.01, 77.58, 13.08),
    bands=["B04", "B08"],
)
```

The same collection can feed TorchGeo:
```python
dataset = clear.to_torchgeo_dataset(
    bands=["B04", "B03", "B02", "B08"],
    chip_size=256,
)
```

Rasteret works well with the table tools you already use. External labels, farm plots, asset locations, fire boundaries, or point samples can stay in GeoPandas, DuckDB, Polars, or PyArrow until you need pixels.
```python
import duckdb
import geopandas as gpd
import rasteret
from shapely.geometry import box

plots = gpd.GeoDataFrame(
    {
        "plot_id": ["plot-a"],
        "crop": ["rice"],
    },
    geometry=[box(77.55, 13.01, 77.58, 13.08)],
    crs="OGC:CRS84",
)
plots_arrow = plots.to_arrow(geometry_encoding="WKB")

con = duckdb.connect()
con.sql("INSTALL spatial; LOAD spatial;")
con.register("sen2_rasteret", clear)
con.register("plots", plots_arrow)

# Bring your own geometries
plot_aois = con.sql("""
    SELECT
        plots.plot_id,
        plots.crop,
        plots.geometry AS plot_geometry
    FROM sen2_rasteret, plots
    WHERE sen2_rasteret."eo:cloud_cover" < 10
      AND ST_Intersects(
          ST_GeomFromWKB(sen2_rasteret.geometry),
          ST_GeomFromWKB(plots.geometry)
      )
""")
```
```python
plot_patches = clear.get_gdf(
    geometries=plot_aois,
    geometry_column="plot_geometry",
    geometry_crs=4326,
    bands=["B04", "B08"],
)
```

The same pattern works with Polars or PyArrow for split/label columns, and with `sample_points(...)` when your external data is point-based. `get_gdf(...)` and `sample_points(...)` keep business columns such as `plot_id` in their outputs.
| Task | Rasteret surface |
|---|---|
| Build from a registered dataset | `rasteret.build("catalog/id", ...)` |
| Build from your own Parquet, GeoParquet, DuckDB, Polars, or Arrow record table | `rasteret.build_from_table(...)` |
| Reopen a saved or prebuilt Collection | `rasteret.load(path_or_dataset_id)` |
| Re-wrap a read-ready Arrow object | `rasteret.as_collection(...)` |
| Get NumPy arrays | `Collection.get_numpy(...)` |
| Get an xarray dataset | `Collection.get_xarray(...)` |
| Get GeoPandas rows with pixel arrays | `Collection.get_gdf(...)` |
| Sample pixels at points | `Collection.sample_points(...)` |
| Train/infer with TorchGeo | `Collection.to_torchgeo_dataset(...)` |
Rasteret ships with dataset IDs so you do not have to remember STAC endpoints,
band maps, license metadata, or cloud access settings. Most catalog entries are
recipes for rasteret.build(...): Rasteret searches the source catalog, parses
the COG metadata once, and writes a reusable local Collection.
Only one built-in ID is already a read-ready Rasteret Collection:
aef/v1-annual. Use rasteret.load("aef/v1-annual") for AlphaEarth Foundation
Embeddings. The built-in alias loads Rasteret's maintained Source Cooperative
Collection. You do not need to call build() for this dataset.
| ID | Dataset | Coverage | Auth | Use |
|---|---|---|---|---|
| `aef/v1-annual` | AlphaEarth Foundation Embeddings (Annual) | global | none | `rasteret.load(...)` |
| `earthsearch/sentinel-2-l2a` | Sentinel-2 Level-2A | global | none | `rasteret.build(...)` |
| `earthsearch/landsat-c2-l2` | Landsat Collection 2 Level-2 | global | required | `rasteret.build(...)` |
| `earthsearch/naip` | NAIP | north-america | required | `rasteret.build(...)` |
| `earthsearch/cop-dem-glo-30` | Copernicus DEM 30m | global | none | `rasteret.build(...)` |
| `earthsearch/cop-dem-glo-90` | Copernicus DEM 90m | global | none | `rasteret.build(...)` |
| `pc/sentinel-2-l2a` | Sentinel-2 Level-2A (Planetary Computer) | global | required | `rasteret.build(...)` |
| `pc/io-lulc-annual-v02` | ESRI 10m Land Use/Land Cover | global | required | `rasteret.build(...)` |
| `pc/alos-dem` | ALOS World 3D 30m DEM | global | required | `rasteret.build(...)` |
| `pc/nasadem` | NASADEM | global | required | `rasteret.build(...)` |
| `pc/esa-worldcover` | ESA WorldCover | global | required | `rasteret.build(...)` |
| `pc/usda-cdl` | USDA Cropland Data Layer | conus | required | `rasteret.build(...)` |
You can browse the same list from the CLI:

```shell
rasteret datasets list
rasteret datasets info aef/v1-annual
```

To make your own dataset ID for a reusable local collection or Parquet record table, see Register A Local Collection into Dataset Catalog.
In the scenarios measured below, Rasteret is roughly 8x to 21x faster than rasterio/GDAL:
| Scenario | TorchGeo/rasterio | Rasteret | Speedup |
|---|---|---|---|
| Single AOI, 15 scenes | 9.08 s | 1.14 s | 8.0x |
| Multi-AOI, 30 scenes | 42.05 s | 2.25 s | 18.7x |
| Cross-CRS, 12 scenes | 12.47 s | 0.59 s | 21.3x |
Rasteret also compares well against time-series workflows that use Google Earth Engine or thread-pooled rasterio for the measured setup:
| Library | First run (cold) | Subsequent runs (hot) |
|---|---|---|
| Rasterio + ThreadPool | 32 s | 24 s |
| Google Earth Engine | 10-30 s | 3-5 s |
| Rasteret | 3 s | 3 s |
See the Benchmarks guide for methodology, environment details, and additional comparisons on Hugging Face datasets.
```shell
uv pip install rasteret
```

Optional integrations:

```shell
uv pip install "rasteret[torchgeo]"
uv pip install "rasteret[aws]"
uv pip install "rasteret[azure]"
uv pip install "rasteret[all]"  # all optional integrations for exploration
```

Rasteret requires Python 3.12 or later.
- Getting Started
- Build from Parquet and Arrow Tables
- Bring Your Own AOIs, Points, And Metadata
- TorchGeo Integration
- Benchmarks
- API Reference
Code: Apache-2.0

