6 changes: 6 additions & 0 deletions .gitignore
@@ -1,3 +1,9 @@
data
node_modules
package-lock.json
.npm-cache
.geonames-build
.DS_Store
tmp
*.sqlite
*.db
252 changes: 246 additions & 6 deletions README.md
@@ -1,8 +1,8 @@
# Offline Geocoder

Node library for reverse geocoding. Designed to be used offline (for example
embedded in a desktop or mobile application) - no web requests are made to
perform a lookup.
Node and React Native library for offline geocoding. Designed to be used
offline (for example embedded in a desktop or mobile application) - no web
requests are made to perform a lookup.

## Data

@@ -32,20 +32,44 @@ lookups per second with a single process.
npm install --save offline-geocoder
```

For Node you also need `sqlite3`:

```
npm install --save sqlite3
```

For Expo / React Native, install `expo-sqlite` instead:

```
npx expo install expo-sqlite
```

You also need to obtain a database, which isn't included in the package. To
generate your own take a look in `scripts`.
generate your own take a look at the [Generating the database](#generating-the-database)
section below.

## Usage

When you initialize the library you need to pass the location of the database:

```javascript
const geocoder = require('offline-geocoder')({ database: 'data/geodata.db' })
const geocoder = require('offline-geocoder')({ database: 'data/geocoder.sqlite' })
```

To enable boundary-aware reverse geocoding, pass `reverseMode: 'boundary'`
(default is `centroid` for backward compatibility):

```javascript
const geocoder = require('offline-geocoder')({
  database: 'data/geocoder.sqlite',
  reverseMode: 'boundary',
  boundary: { basePrecision: 4, maxPrecision: 7 }
})
```

### Reverse Geocoding

To perform a revese geocode lookup just pass the coordinates:
To perform a reverse geocode lookup just pass the coordinates:

```javascript
geocoder.reverse(41.89, 12.49)
@@ -76,6 +100,222 @@ geocoder.reverse(41.89, 12.49, function(error, result) {
})
```

Boundary mode keeps the same return payload shape and supports two boundary
storage modes:
- compact lookup (`compact_places` + `compact_geohash_lookup`)
- full polygon mode (`places` + `place_geohash_cover` + `place_geometry`)
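
To make the compact mode more concrete, here is a minimal sketch of a geohash-prefix lookup. It assumes the lookup tries the longest geohash prefix first and falls back toward `basePrecision`; the library's actual query logic may differ. `encodeGeohash`, `lookupCell`, and the in-memory `Map` are hypothetical stand-ins for the `compact_geohash_lookup` table, not part of the package API.

```javascript
// Hypothetical sketch: standard geohash encoding plus a longest-prefix lookup.
const BASE32 = '0123456789bcdefghjkmnpqrstuvwxyz'

function encodeGeohash(lat, lon, precision) {
  let latMin = -90, latMax = 90, lonMin = -180, lonMax = 180
  let hash = ''
  let bits = 0
  let bitCount = 0
  let useLon = true // geohash interleaves bits, starting with longitude

  while (hash.length < precision) {
    if (useLon) {
      const mid = (lonMin + lonMax) / 2
      if (lon >= mid) { bits = bits * 2 + 1; lonMin = mid } else { bits = bits * 2; lonMax = mid }
    } else {
      const mid = (latMin + latMax) / 2
      if (lat >= mid) { bits = bits * 2 + 1; latMin = mid } else { bits = bits * 2; latMax = mid }
    }
    useLon = !useLon
    bitCount += 1
    if (bitCount === 5) { // every 5 bits become one base32 character
      hash += BASE32[bits]
      bits = 0
      bitCount = 0
    }
  }
  return hash
}

// Assumption: cells are keyed by geohash strings between basePrecision and
// maxPrecision characters long; the most specific matching cell wins.
function lookupCell(lookup, lat, lon, basePrecision, maxPrecision) {
  const full = encodeGeohash(lat, lon, maxPrecision)
  for (let precision = maxPrecision; precision >= basePrecision; precision--) {
    const hit = lookup.get(full.slice(0, precision))
    if (hit !== undefined) return hit
  }
  return undefined
}
```

With a map that holds a single precision-4 cell for Rome, `lookupCell(map, 41.89, 12.49, 4, 7)` would find it after the precision 7, 6 and 5 prefixes miss.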

### Forward Geocoding

Forward geocoding matches a city name to its canonical entry. Requires a
database generated with the updated schema (see below).

```javascript
geocoder.forward('rome')
  .then(function(result) {
    console.log(result)
  })

Returns `undefined` when no match is found, or when using an older database
without the required columns.

### Location Lookup

Look up a city by its GeoNames id:

```javascript
geocoder.location().find(3169070)
geocoder.location().find('geonames:3169070')
```

Returns `undefined` when the id doesn't exist. Both numeric ids and
`geonames:<id>` strings are accepted — use the prefixed form as a stable
grouping key across datasets.
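
As an illustration of the two accepted id forms, the sketch below normalizes either one to a numeric id and builds the prefixed grouping key. `parseLocationId` and `groupingKey` are hypothetical helpers written for this example; they are not exported by the library, and its internal parsing may be stricter or looser.

```javascript
// Hypothetical helpers, not part of the offline-geocoder API.
function parseLocationId(id) {
  if (typeof id === 'number') {
    return Number.isInteger(id) && id > 0 ? id : undefined
  }
  if (typeof id !== 'string') return undefined
  // Accept "3169070" or "geonames:3169070"; reject other prefixes.
  const match = /^(?:geonames:)?(\d+)$/.exec(id.trim())
  return match ? Number(match[1]) : undefined
}

function groupingKey(id) {
  const numericId = parseLocationId(id)
  return numericId === undefined ? undefined : 'geonames:' + numericId
}
```

Both `groupingKey(3169070)` and `groupingKey('geonames:3169070')` produce the same key, which is what makes the prefixed form usable for grouping.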

## Expo / React Native

The React Native entrypoint avoids Node-only modules:

```javascript
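const SQLite = require('expo-sqlite')
const createGeocoder = require('offline-geocoder/expo')

// `await` needs an async context, so the lookup is wrapped in a function.
async function lookup() {
  const db = await SQLite.openDatabaseAsync('geocoder.sqlite')
  const geocoder = createGeocoder({ db: db })

  const result = await geocoder.reverse(41.89, 12.49)
  console.log(result)
}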
```

You'll need to bundle the SQLite database file with your app assets and copy
it to a location accessible by `expo-sqlite` on first launch.

## Generating the database

The repo includes a script to generate a SQLite database from GeoNames dumps:

```bash
./scripts/generate_geonames.sh data/geocoder.sqlite
```

Environment variables for customization:

| Variable | Default | Description |
|---|---|---|
| `GEONAMES_DATASET` | `cities1000` | GeoNames dump file to use |
| `GEONAMES_WORKDIR` | current directory | Working directory for temp files |
| `GEONAMES_DOWNLOAD` | `1` | Set to `0` to skip downloads |
| `GEONAMES_FEATURE_CODES` | `PPLA,PPLA2,PPLA3,PPLA4,PPLA5,PPLC` | Feature codes to keep |
| `GEONAMES_MIN_POPULATION` | `0` | Minimum population filter |
| `GEONAMES_INCLUDE_ADMIN1` | `1` | Set to `0` to skip admin1 data |

The default feature codes exclude `PPL` which can include neighbourhood-like
populated places. The schema is defined in [`scripts/schema.sql`](scripts/schema.sql).
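
The effect of `GEONAMES_FEATURE_CODES` and `GEONAMES_MIN_POPULATION` can be pictured with a small filter. This is a sketch only: the real filtering happens inside the shell script, and `readFilterOptions`, `keepPlace`, and the `{ featureCode, population }` row shape are assumptions made for illustration.

```javascript
// Hypothetical sketch of the feature-code and population filter.
const DEFAULT_FEATURE_CODES = 'PPLA,PPLA2,PPLA3,PPLA4,PPLA5,PPLC'

function readFilterOptions(env) {
  const codes = env.GEONAMES_FEATURE_CODES || DEFAULT_FEATURE_CODES
  const minPopulation = Number(env.GEONAMES_MIN_POPULATION || 0)
  return {
    featureCodes: new Set(codes.split(',').map(function (code) { return code.trim() }).filter(Boolean)),
    minPopulation: Number.isFinite(minPopulation) ? minPopulation : 0
  }
}

// Assumed row shape: { featureCode: 'PPLC', population: 2318895 }
function keepPlace(row, options) {
  return options.featureCodes.has(row.featureCode) &&
    row.population >= options.minPopulation
}
```

With the defaults, a `PPL` row is dropped regardless of population, while a `PPLC` row is kept.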

### Generating a Boundary Index

Build boundary-aware reverse lookup tables from a polygon source (GeoJSON
FeatureCollection/Feature or newline-delimited GeoJSON):

```bash
node scripts/generate_boundary_index.js \
  --database data/geocoder.sqlite \
  --input data/localities.geojson \
  --index-mode compact \
  --include-region true \
  --min-population 10000 \
  --base-precision 4 \
  --max-precision 7
```

You can also run `npm run build:boundary -- --database ... --input ...`.

You can point the builder directly at directories of WOF GeoJSON files:

```bash
node scripts/generate_boundary_index.js \
  --database data/geocoder.sqlite \
  --input-dir tmp/wof-build/extracted/fr/.../data \
  --index-mode compact \
  --include-region true \
  --min-population 10000 \
  --base-precision 4 \
  --max-precision 7 \
  --drop-contained-localities true
```

`--drop-contained-localities true` removes `locality` polygons that are fully
contained in larger localities within the same country/admin1 group. This is
intended to suppress duplicate neighbourhood-like localities while keeping
small isolated places (for example islands) that are not contained.

#### Place selection pipeline

The builder uses a multi-stage pipeline to decide which localities make it into the index:

1. **Primary filter** (`--min-population`): localities at or above this threshold are always included. Country capitals are always included regardless of population.
2. **Isolation pass** (`--isolation-min-population`): localities between the isolation floor and the primary threshold are evaluated as candidates. A candidate is promoted if at least one of its geohash cover cells (at base precision) is not already claimed by a primary locality. This ensures small but geographically isolated places like islands, remote towns, and oases get their own label without adding noise in dense urban areas.
3. **Country guarantee** (`--ensure-country-locality`): after the isolation pass, any country that still has zero localities gets its highest-population candidate promoted unconditionally.
4. **Contained-locality pruning** (`--drop-contained-localities`): removes localities whose polygon is fully contained inside a larger locality in the same country/admin1 group.
5. **Dominant-city rollup**: in the geohash index, when a major city (population >= `--dominant-locality-population`) dominates its neighbours by a ratio of `--dominant-locality-ratio`, smaller nearby localities are absorbed into the major city label.
6. **Locality-over-region promotion**: when a locality and a region compete for the same parent geohash cell, the locality wins if it covers >= `--parent-locality-min-share` of child cells.
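
Stages 1 to 3 of the pipeline above can be sketched as follows. This is an illustrative reading of the description, not the builder's code: `selectPlaces`, the option names, and the place shape (`country`, `population`, `isCapital`, and `cells` as the base-precision geohash cover) are all assumptions, and the sketch draws country-guarantee candidates only from places that failed the isolation pass.

```javascript
// Hypothetical sketch of the primary filter, isolation pass, and country guarantee.
function selectPlaces(places, options) {
  const selected = []
  const candidates = []
  const claimedCells = new Set()

  // Stage 1: primary filter. Capitals are kept regardless of population.
  for (const place of places) {
    if (place.population >= options.minPopulation || place.isCapital) {
      selected.push(place)
      place.cells.forEach(function (cell) { claimedCells.add(cell) })
    } else if (place.population >= options.isolationMinPopulation) {
      candidates.push(place)
    }
  }

  // Stage 2: isolation pass. Promote a candidate if at least one of its
  // cells is not claimed by a primary locality.
  const leftovers = []
  for (const place of candidates) {
    const isolated = place.cells.some(function (cell) { return !claimedCells.has(cell) })
    if (isolated) selected.push(place)
    else leftovers.push(place)
  }

  // Stage 3: country guarantee. Countries with no selected locality get
  // their highest-population leftover candidate.
  const covered = new Set(selected.map(function (place) { return place.country }))
  const bestByCountry = new Map()
  for (const place of leftovers) {
    if (covered.has(place.country)) continue
    const best = bestByCountry.get(place.country)
    if (!best || place.population > best.population) bestByCountry.set(place.country, place)
  }
  return selected.concat(Array.from(bestByCountry.values()))
}
```

The later stages (contained-locality pruning, dominant-city rollup, locality-over-region promotion) operate on polygons and the geohash index, so they are left out of this sketch.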

Builder notes:

- Keeps current records only (drops deprecated/superseded where source metadata is present)
- Includes `locality` placetypes by default (`localadmin` optional via `--include-localadmin true`)
- Optional `region` fallback polygons via `--include-region true`
- `--min-population` applies to `locality` only, so low-pop localities can roll up to broader admin areas when `region` is included
- Point-only capital localities are retained (single-cell locality fallback) so country/admin capitals are not dropped by polygon-only filtering
- Per-placetype precision caps are supported:
  - `--locality-max-precision`
  - `--localadmin-max-precision`
  - `--region-max-precision`
  - `--region-sparse-max-precision` + `--region-sparse-min-area-km2` for very large sparse regions (for example geohash-3 in Amazon-like interiors)
- `--promote-locality-over-region` (default `true`) prefers locality labels in shared parent cells when there is no competing locality (keeps city labels sticky against region-only outskirts)
- Dominant-city rollup keeps broad city labels sticky in mixed city/suburb cells unless there is competing major-city pressure:
  - `--dominant-locality-population` (default `100000`)
  - `--dominant-locality-ratio` (default `3`)
- Parent-cell takeover guard:
  - `--parent-locality-min-share` (default `0.5`) requires locality ownership of at least that child-cell share before replacing a parent cell label
- Excludes neighbourhood-like placetypes from default reverse output
- `--index-mode compact` (default) stores only geohash-to-place mappings (`compact_geohash_lookup`) and no runtime geometry payloads.
  Compact schema uses `compact_places(id,name,country_id,admin1_id,placetype_code,latitude,longitude)`.
- `--index-mode full` stores geohash cover + geometry for runtime point-in-polygon
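
The dominant-city rollup and the parent-cell takeover guard both reduce to simple threshold checks. The sketch below shows one plausible reading using the documented defaults (`100000`, `3`, `0.5`); `pickDominantLocality` and `localityTakesParent` are hypothetical names, and the builder may compare neighbours differently.

```javascript
// Hypothetical sketch of the two threshold checks described in the notes above.

// Returns the dominant locality for a cell, or undefined when no rollup applies.
function pickDominantLocality(localities, options) {
  const sorted = localities.slice().sort(function (a, b) {
    return b.population - a.population
  })
  const top = sorted[0]
  if (!top || top.population < options.dominantPopulation) return undefined
  const next = sorted[1]
  // A competing locality blocks the rollup unless the top one is at least
  // `dominantRatio` times larger.
  if (next && top.population < options.dominantRatio * next.population) return undefined
  return top
}

// A geohash parent cell has 32 children; the locality replaces the parent
// label only if it owns at least `minShare` of them.
function localityTakesParent(localityChildCells, totalChildCells, minShare) {
  return totalChildCells > 0 && localityChildCells / totalChildCells >= minShare
}
```

For example, a city of 900,000 next to a suburb of 40,000 absorbs the suburb, while two cities of 900,000 and 500,000 keep separate labels because the ratio is below 3.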

### Building From Who's On First (WOF)

Use the WOF helper script to download country admin repos and build in one step:

```bash
WOF_COUNTRIES=FR,IT \
WOF_BASE_PRECISION=4 \
WOF_MAX_PRECISION=5 \
WOF_INCLUDE_REGION=1 \
WOF_MIN_POPULATION=10000 \
./scripts/generate_wof_boundary.sh data/geocoder.sqlite
```

Equivalent npm script:

```bash
npm run build:wof -- data/geocoder.sqlite
```

Useful WOF build env vars:

- `WOF_COUNTRIES` comma-separated country codes (default `FR,IT`)
- `WOF_WORKDIR` working directory for downloads/extracted files (default `tmp/wof-build`)
- `WOF_DOWNLOAD=0` reuse existing archives only
- `WOF_REF` branch/ref to download (default `master`)
- `WOF_REF_LOCK_FILE` optional per-country pinned refs (`<iso2> <ref>` per line); when set, this overrides `WOF_REF` per country
- `WOF_LOCALITY_MAX_PRECISION` locality precision cap
- `WOF_REGION_MAX_PRECISION` region precision cap (default `4`)
- `WOF_REGION_SPARSE_MAX_PRECISION` sparse very-large-region precision (default `3`)
- `WOF_REGION_SPARSE_MIN_AREA_KM2` area threshold for sparse region precision (default `80000`)
- `WOF_PROMOTE_LOCALITY_OVER_REGION=1|0` prefer locality labels over region in shared parent cells (default `1`)
- `WOF_DOMINANT_LOCALITY_POPULATION` major-locality threshold for dominant-city rollup (default `100000`)
- `WOF_DOMINANT_LOCALITY_RATIO` dominant-vs-next locality population ratio (default `3`)
- `WOF_PARENT_LOCALITY_MIN_SHARE` minimum child-cell share for locality parent takeover (default `0.5`)
- `WOF_GEOMETRY_DECIMALS` round coordinates before storage/indexing (for example `4`)
- `WOF_MIN_POPULATION` filter out places below threshold (for example `10000`)
- `WOF_ISOLATION_MIN_POPULATION` lower population floor for isolated localities (default `500`). Places between this and `WOF_MIN_POPULATION` are included only if they occupy otherwise-empty geohash cells
- `WOF_ENSURE_COUNTRY_LOCALITY=1|0` guarantee at least one locality per country (default `1`)
- `WOF_INCLUDE_REGION=1|0` include/exclude region fallback boundaries
- `WOF_MAX_PLACES` cap places for experiment runs
- `WOF_DROP_CONTAINED_LOCALITIES=1|0` enable/disable contained-locality pruning
- `WOF_SKIP_INVALID_REPOS=1|0` skip malformed/unexpected WOF admin repos during bulk runs (default `1`)
- `WOF_APPEND=1|0` append to an existing compact DB instead of replacing schema (default `0`)
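
The `WOF_REF_LOCK_FILE` format (`<iso2> <ref>` per line, overriding `WOF_REF` per country) could be read as in the sketch below. The real parsing lives in the shell script; `parseRefLock` and `refForCountry` are hypothetical, and skipping malformed lines and upper-casing country codes are assumptions.

```javascript
// Hypothetical sketch of resolving a per-country ref from a lock file.
function parseRefLock(text) {
  const refs = new Map()
  for (const line of text.split(/\r?\n/)) {
    const parts = line.trim().split(/\s+/)
    if (parts.length !== 2) continue // skip blank or malformed lines (assumption)
    refs.set(parts[0].toUpperCase(), parts[1])
  }
  return refs
}

// A pinned ref wins; otherwise fall back to the global default (WOF_REF).
function refForCountry(country, refs, defaultRef) {
  return refs.get(country.toUpperCase()) || defaultRef
}
```

Given a lock file that pins only `FR`, a build for `FR,IT` would use the pinned ref for France and `WOF_REF` for Italy.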

Boundary runtime modes:

- `reverseMode: 'centroid'` (default): legacy nearest-centroid reverse lookup
- `reverseMode: 'boundary'`: boundary tables lookup.
  - Uses compact `compact_geohash_lookup` when present (fast geohash-to-place).
  - Falls back to full polygon-aware tables when compact rows are absent.

### External Reverse Validation (LocationIQ)

Use this script to compare local reverse results against LocationIQ at sampled
coordinates, with persistent SQLite caching so requests are not repeated:

```bash
LOCATIONIQ_API_KEY=... node scripts/validate_with_locationiq.js \
  --database tmp/wof-fr-it-compact-p5-d3-pop10k-region.sqlite \
  --samples 300 \
  --export-csv tmp/locationiq-validation-fr-it.csv
```

It creates/updates:

- `sample_points` (coordinates sampled from your geohash table)
- `locationiq_cache` (raw LocationIQ responses keyed by coordinate)
- `validation_results` (local vs LocationIQ comparison verdicts)

By default, the cache database path is derived from the input database name: `tmp/locationiq-validation-<database-basename>.sqlite`.
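
As a rough illustration of that naming pattern, the sketch below derives the cache path from a database path. `defaultCachePath` is a hypothetical helper, and treating `<database-basename>` as the file name without its extension is an assumption; the script may keep the extension.

```javascript
// Hypothetical sketch: derive the default cache path from the database path.
const path = require('path')

function defaultCachePath(databasePath) {
  // Assumption: the basename excludes the file extension.
  const base = path.basename(databasePath, path.extname(databasePath))
  return 'tmp/locationiq-validation-' + base + '.sqlite'
}
```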

## License

This library is licensed under [the MIT license](https://github.com/lucaspiller/offline-geocoder/blob/master/LICENSE).
21 changes: 20 additions & 1 deletion bin/geocoder
@@ -3,7 +3,26 @@

"use strict";

const geocoder = require('../src/index.js')()
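// Parse a numeric environment variable; unset, blank, or non-numeric values yield undefined.
function parseOptionalNumber(value) {
  if (value === undefined || value.trim() === '') return undefined
  var parsed = Number(value)
  return Number.isFinite(parsed) ? parsed : undefined
}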

var options = {}
if (process.env.GEOCODER_REVERSE_MODE) {
  options.reverseMode = process.env.GEOCODER_REVERSE_MODE
}

var boundaryBase = parseOptionalNumber(process.env.GEOCODER_BOUNDARY_BASE_PRECISION)
var boundaryMax = parseOptionalNumber(process.env.GEOCODER_BOUNDARY_MAX_PRECISION)
if (boundaryBase !== undefined || boundaryMax !== undefined) {
  options.boundary = {}
  if (boundaryBase !== undefined) options.boundary.basePrecision = boundaryBase
  if (boundaryMax !== undefined) options.boundary.maxPrecision = boundaryMax
}

const geocoder = require('../src/index.js')(options)
const args = process.argv.slice(2)

if (args.length != 2) {
21 changes: 20 additions & 1 deletion bin/geocoder-bench
@@ -3,7 +3,26 @@

"use strict";

const geocoder = require('../src/index.js')()
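// Parse a numeric environment variable; unset, blank, or non-numeric values yield undefined.
function parseOptionalNumber(value) {
  if (value === undefined || value.trim() === '') return undefined
  var parsed = Number(value)
  return Number.isFinite(parsed) ? parsed : undefined
}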

var options = {}
if (process.env.GEOCODER_REVERSE_MODE) {
  options.reverseMode = process.env.GEOCODER_REVERSE_MODE
}

var boundaryBase = parseOptionalNumber(process.env.GEOCODER_BOUNDARY_BASE_PRECISION)
var boundaryMax = parseOptionalNumber(process.env.GEOCODER_BOUNDARY_MAX_PRECISION)
if (boundaryBase !== undefined || boundaryMax !== undefined) {
  options.boundary = {}
  if (boundaryBase !== undefined) options.boundary.basePrecision = boundaryBase
  if (boundaryMax !== undefined) options.boundary.maxPrecision = boundaryMax
}

const geocoder = require('../src/index.js')(options)
const args = process.argv.slice(2)

if (args.length != 2) {
4 changes: 4 additions & 0 deletions bin/geocoder-build-boundary
@@ -0,0 +1,4 @@
#!/usr/bin/env node
"use strict";

require('../scripts/generate_boundary_index')
14 changes: 14 additions & 0 deletions bin/geocoder-build-wof
@@ -0,0 +1,14 @@
#!/usr/bin/env node
"use strict";

const { spawnSync } = require('child_process')
const path = require('path')

const script = path.join(__dirname, '..', 'scripts', 'generate_wof_boundary.sh')
const args = process.argv.slice(2)

const result = spawnSync(script, args, { stdio: 'inherit' })
if (result.error) {
  throw result.error
}
process.exit(result.status === null ? 1 : result.status)