onsdecodeR

onsdecodeR is an R package for decoding UK Office for National Statistics (ONS) geography codes into human-readable names.

It is designed for public health, population health, and geographic analysis workflows where datasets contain ONS geography codes (e.g. LSOA, MSOA, wards) and you need fast, consistent name lookups.

The package ships with an internal ONS codelist and performs lazy, cached lookups, so repeated decoding is fast.


Installation

From GitHub

# install.packages("devtools")
devtools::install_github("BCC-PHM/onsdecodeR")

What the package does

onsdecodeR:

  • Detects ONS geography code columns automatically
  • Matches codes based on their standard prefixes (e.g. E01, E02, E05)
  • Adds human-readable name columns (e.g. lsoa_name)
  • Uses a cached lookup so performance stays fast for large datasets

The codelist is stored inside the package at inst/extdata/codelist.xlsx and is loaded only when needed.
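
To illustrate the prefix-based detection described above, here is a minimal base R sketch. It is not the package's internal code: `prefix_map` and `detect_geography()` are hypothetical names, and the real implementation may recognise more prefixes or apply different rules.

```r
# Hypothetical prefix-to-geography map; the package's own table may differ.
prefix_map <- c(E01 = "lsoa", E02 = "msoa", E05 = "ward")

# Return the geography type of a column, or NA if the column does not
# hold exactly one recognised type.
detect_geography <- function(x) {
  x <- as.character(x[!is.na(x)])
  # ONS codes are nine characters: a letter followed by eight digits
  if (length(x) == 0 || !all(grepl("^[A-Z][0-9]{8}$", x))) {
    return(NA_character_)
  }
  prefixes <- unique(substr(x, 1, 3))
  if (length(prefixes) != 1 || !prefixes %in% names(prefix_map)) {
    return(NA_character_)
  }
  unname(prefix_map[prefixes])
}

detect_geography(c("E01000001", "E01000002"))  # "lsoa"
detect_geography(c("E01000001", "E02000001"))  # NA (mixed types)
```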


Basic usage

library(onsdecodeR)

df <- data.frame(
  lsoa = c("E01000001", "E01000002"),
  stringsAsFactors = FALSE
)

add_geography_names(df)

Output

       lsoa           lsoa_name
1 E01000001 City of London 001A
2 E01000002 City of London 001B

Multiple geography columns

If your data contains more than one geography code column, add_geography_names() will attempt to decode each column independently.

df <- data.frame(
  lsoa = c("E01000001", "E01000002"),
  ward = c("E05000001", "E05000002"),
  stringsAsFactors = FALSE
)

add_geography_names(df)

If a matching lookup exists, a corresponding *_name column is added.


Important limitation

Each column passed to add_geography_names() must contain only one type of geography code.

If a single column contains multiple different geography types (for example, a mix of LSOA and MSOA codes), the function will not be able to reliably detect the correct lookup and names will not be added for that column.

In such cases, geography codes should be separated into distinct columns before decoding.
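
One way to separate a mixed column is to split it on the three-character prefix, as in the base R sketch below. The column name `area_code` and the choice of output columns are assumptions made for illustration.

```r
# Example data: one column mixing LSOA (E01) and MSOA (E02) codes
df <- data.frame(
  area_code = c("E01000001", "E02000001", "E01000002"),
  stringsAsFactors = FALSE
)

# The first three characters identify the geography type
prefix <- substr(df$area_code, 1, 3)

# One column per type; rows holding the other type become NA
df$lsoa <- ifelse(prefix == "E01", df$area_code, NA_character_)
df$msoa <- ifelse(prefix == "E02", df$area_code, NA_character_)
df$area_code <- NULL
```

Each of the resulting lsoa and msoa columns holds a single code type. How add_geography_names() treats the NA entries is not documented here, so it is worth checking the result on a small sample first.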


Behaviour when a lookup is unavailable

If a column contains codes for which no codelist is available:

  • the original column is left unchanged
  • no *_name column is added

This allows the function to be used safely on datasets in which only some columns contain recognised geography codes.
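
The skip-when-unavailable behaviour can be pictured with the self-contained sketch below. It is an illustration only: `lookups` and `add_names_sketch()` are hypothetical stand-ins, not the package's data or functions.

```r
# Hypothetical lookups keyed by prefix; only LSOA (E01) is available here
lookups <- list(
  E01 = c(E01000001 = "City of London 001A",
          E01000002 = "City of London 001B")
)

add_names_sketch <- function(df) {
  for (col in names(df)) {
    codes <- as.character(df[[col]])
    prefix <- unique(substr(codes, 1, 3))
    # Skip columns with mixed prefixes or with no matching lookup
    if (length(prefix) != 1 || is.null(lookups[[prefix]])) next
    df[[paste0(col, "_name")]] <- unname(lookups[[prefix]][codes])
  }
  df
}

df <- data.frame(
  lsoa = c("E01000001", "E01000002"),
  site = c("A12", "B07"),
  stringsAsFactors = FALSE
)

names(add_names_sketch(df))  # "lsoa" "site" "lsoa_name"
```

The site column has no lookup, so it passes through untouched and gains no site_name column.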


Performance notes

  • Lookups are lazy-loaded (the codelist is not read until first use)
  • Lookups are cached in memory for the R session
  • Decoding uses vectorised matching rather than repeated joins

This makes the function suitable for large datasets.
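
The three points above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's source: `lookup_cache`, `load_lookup()`, `get_lookup()` and `decode()` are hypothetical names, and the loader returns a small hard-coded table instead of reading codelist.xlsx.

```r
# Private environment that holds lookups for the rest of the R session
lookup_cache <- new.env(parent = emptyenv())

# Stand-in loader (hypothetical); a real one would read the workbook
load_lookup <- function(type) {
  message("loading ", type, " lookup")
  data.frame(
    code = c("E01000001", "E01000002"),
    name = c("City of London 001A", "City of London 001B"),
    stringsAsFactors = FALSE
  )
}

# Lazy load: read on first request only, then reuse the cached copy
get_lookup <- function(type) {
  if (!exists(type, envir = lookup_cache, inherits = FALSE)) {
    assign(type, load_lookup(type), envir = lookup_cache)
  }
  get(type, envir = lookup_cache, inherits = FALSE)
}

# match() resolves all codes in one vectorised call; unknown codes give NA
decode <- function(codes, type) {
  lookup <- get_lookup(type)
  lookup$name[match(codes, lookup$code)]
}

decode(c("E01000002", "E01000001", "E01999999"), "lsoa")
# "City of London 001B" "City of London 001A" NA
```

Calling decode() a second time with the same type reuses the cached table, so the loading message appears only once per session.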


Included data

The package includes an internal Excel workbook containing ONS geography lookups:

inst/extdata/codelist.xlsx

Each worksheet corresponds to a geography type and is keyed by the standard ONS geography code.


Development status

This package is under active development.
The API is stable, but additional helpers and geography utilities may be added.


Bug reports and feature requests

Please report issues or suggestions via GitHub:

https://github.com/BCC-PHM/onsdecodeR/issues

License

MIT License
