This repository provides code to aggregate the 2021/22 Output Area Classification (OAC) from the lowest geographical level (Output Areas) to:
- 2021 Middle Layer Super Output Areas (MSOA) in England & Wales
- 2022 Intermediate Zones (IZ) in Scotland
Within each MSOA or IZ, the code sums the total population of the Output Areas belonging to each OAC Subgroup and selects the Subgroup with the largest population total. Each mid-level geography is therefore classified by its most populous Subgroup.
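As an illustration of the dominant-Subgroup rule, here is a minimal sketch on toy data. The column names (`oa_code`, `area_code`, `subgroup`, `population`) and all values are assumptions made for this example and may not match the names used in the script. Ties are broken by taking the first row, which is also an assumption.

```r
library(dplyr)

# Toy data: one row per Output Area (names and values are illustrative only)
oa <- data.frame(
  oa_code    = c("OA1", "OA2", "OA3", "OA4", "OA5"),
  area_code  = c("M1", "M1", "M1", "M2", "M2"),
  subgroup   = c("1a1", "2b1", "2b1", "3a2", "1a1"),
  population = c(300, 250, 200, 400, 150),
  stringsAsFactors = FALSE
)

dominant <- oa %>%
  # Sum population per Subgroup within each mid-level geography
  group_by(area_code, subgroup) %>%
  summarise(population = sum(population), .groups = "drop") %>%
  # Keep the Subgroup with the largest population total in each area
  group_by(area_code) %>%
  slice_max(population, n = 1, with_ties = FALSE) %>%
  ungroup()

print(dominant)
```

In this toy example, area `M1` is assigned Subgroup `2b1` (250 + 200 = 450 people) even though the single largest Output Area in `M1` belongs to `1a1` (300 people).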
The workflow is as follows:
- Merge lookup tables of Output Areas to MSOA/IZ.
- Import OAC classifications for all Output Areas.
- Join total population counts.
- Aggregate OAC Subgroups to the mid-level geography.
- Generate a final aggregated classification.
- Compare original OA-level classification with aggregated classification using an alluvial plot.
- Export final outputs as CSV, Parquet, and GeoPackage files.
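The first three workflow steps can be sketched roughly as follows. This is a toy example rather than the script itself: the lookup column names (`OA21CD`, `MSOA21CD`, `OA22`, `IZ22`), the harmonised names (`oa_code`, `area_code`) and all values are assumptions and may differ from the real files.

```r
library(dplyr)

# Hypothetical lookups; real column names may differ
ew_lookup <- data.frame(
  OA21CD   = c("E00000001", "E00000002"),
  MSOA21CD = c("E02000001", "E02000001"),
  stringsAsFactors = FALSE
)
sc_lookup <- data.frame(
  OA22 = "S00000001",
  IZ22 = "S02000001",
  stringsAsFactors = FALSE
)

# Step 1: harmonise column names and stack the two lookups
gb_lookup <- bind_rows(
  ew_lookup %>% transmute(oa_code = OA21CD, area_code = MSOA21CD),
  sc_lookup %>% transmute(oa_code = OA22, area_code = IZ22)
)

# Toy OAC classifications and population counts keyed by Output Area
oac <- data.frame(
  oa_code  = c("E00000001", "E00000002", "S00000001"),
  subgroup = c("1a1", "2b1", "3a2"),
  stringsAsFactors = FALSE
)
pop <- data.frame(
  oa_code    = c("E00000001", "E00000002", "S00000001"),
  population = c(310, 280, 120),
  stringsAsFactors = FALSE
)

# Steps 2 and 3: attach classifications and population to every Output Area
oa_full <- gb_lookup %>%
  left_join(oac, by = "oa_code") %>%
  left_join(pop, by = "oa_code")

print(oa_full)
```

Using `left_join` from the lookup keeps every Output Area in the result, so any row with a missing `subgroup` or `population` points to an unmatched code in the inputs.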
The script uses the following R packages:
- tidyverse
- arrow
- magrittr
- sf
- ggalluvial
Make sure these packages are installed before running the script:
install.packages(c("tidyverse", "magrittr", "sf", "ggalluvial"))
# For 'arrow', install from CRAN or the appropriate binary source
install.packages("arrow")
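To avoid reinstalling packages that are already present, a small check like the one below may help. The helper `missing_packages()` is hypothetical and not part of the repository.

```r
# Packages listed in this README
required <- c("tidyverse", "arrow", "magrittr", "sf", "ggalluvial")

# Hypothetical helper: return the packages whose namespace cannot be loaded
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them
} else {
  message("All required packages are installed.")
}
```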
- Lookup: OA to MSOA (England & Wales)
  - Retrieved via ArcGIS REST API: OA_LSOA_MSOA_EW_DEC_2021_LU_v3
- Lookup: OA to IZ (Scotland)
  - Downloaded from NRS Scotland: OA22_DZ22_IZ22.zip
- OAC Input
  - Parquet file: ./data/UK_OAC_Final.parquet
- Total Population Data
  - England & Wales: ts001.parquet (GitHub)
  - Scotland: UV101b.parquet (GitHub)
- Geographical Boundaries
  - MSOA/IZ boundaries in GeoPackage format: ./data/MSOA_IZ.gpkg
- Clone or download this repository.
- Place the required data files in the correct directories, as indicated in the script (e.g., ./data/UK_OAC_Final.parquet, ./data/MSOA_IZ.gpkg).
- Install the required R packages (see Requirements).
- Open the R script (or copy-paste it into an R environment).
- Run the script from start to finish.
The script will read data from the specified sources, perform the aggregation, and produce outputs including CSV, Parquet, GeoPackages, and a comparison plot.
The script generates several key outputs:
- ./data/GB_OA_Lookup.parquet: Combined lookup table for Great Britain Output Areas to MSOA/IZ
- ./data/MSOA_IZ_Lookup.csv and ./data/MSOA_IZ_Lookup.parquet: Final aggregated OAC classifications for MSOA/IZ
- ./data/MSOA_IZ_SF_Counts.gpkg: Spatial data with OAC classifications and diversity counts
- ./plot/Comparison.png: Alluvial plot showing classification flows
The script produces an alluvial plot (Comparison.png) showing how Supergroups at the Output Area level flow into the aggregated Supergroups at the MSOA/IZ level.
- Left Axis: Original OA-level Supergroups
- Right Axis: Aggregated MSOA/IZ Supergroups
This helps visualize the degree of alignment or shifts in classification that occur during aggregation.
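A two-axis alluvial plot of this kind can be sketched with `ggalluvial` as shown below. The data frame `flows`, its column names and its counts are made up for illustration, and the script's own plotting code may differ in its details.

```r
library(ggplot2)
library(ggalluvial)

# Toy flow table: number of Output Areas per (OA-level, aggregated) Supergroup pair
flows <- data.frame(
  oa_supergroup  = c("Baseline UK", "Baseline UK", "Legacy Communities", "Legacy Communities"),
  agg_supergroup = c("Baseline UK", "Legacy Communities", "Baseline UK", "Legacy Communities"),
  n              = c(120, 30, 25, 90),
  stringsAsFactors = FALSE
)

p <- ggplot(flows, aes(axis1 = oa_supergroup, axis2 = agg_supergroup, y = n)) +
  # Flows are coloured by the original OA-level Supergroup
  geom_alluvium(aes(fill = oa_supergroup)) +
  # Boxes for each Supergroup on the left and right axes
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("OA", "MSOA/IZ"), expand = c(0.1, 0.1)) +
  labs(y = "Output Areas", fill = "OA-level Supergroup")

# ggsave("./plot/Comparison.png", p)  # uncomment to write the plot to disk
```

Flows that stay level between the two axes represent Output Areas whose Supergroup matches that of their MSOA/IZ, while crossing flows represent Output Areas reclassified by aggregation.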
Several tables and data frames show how many distinct Subgroups, Groups, and Supergroups are contained within each MSOA or IZ. This reveals how homogeneous or diverse each mid-level geography is in terms of OAC classes.
- Compare_Subgroup, Compare_Group, and Compare_Supergroup: Show distributions of how many different classes exist per MSOA/IZ.
- n_all: Merges the counts of distinct Supergroups, Groups, and Subgroups.
These outputs are joined to the spatial data frame and written to MSOA_IZ_SF_Counts.gpkg.
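Diversity counts of this kind can be computed with `n_distinct()`. The sketch below uses toy data and hypothetical object and column names (`class_counts`, `subgroup_distribution`, `area_code`), which only loosely mirror `n_all` and `Compare_Subgroup` and should not be read as the script's actual structure.

```r
library(dplyr)

# Toy data: one row per Output Area with its three OAC levels
oa <- data.frame(
  area_code  = c("M1", "M1", "M1", "M2", "M2"),
  supergroup = c("1", "1", "2", "3", "3"),
  group      = c("1a", "1a", "2b", "3a", "3a"),
  subgroup   = c("1a1", "1a2", "2b1", "3a2", "3a2"),
  stringsAsFactors = FALSE
)

# Number of distinct classes at each level within every mid-level geography
class_counts <- oa %>%
  group_by(area_code) %>%
  summarise(
    n_supergroup = n_distinct(supergroup),
    n_group      = n_distinct(group),
    n_subgroup   = n_distinct(subgroup),
    .groups = "drop"
  )

# Distribution: how many areas contain 1, 2, 3, ... distinct Subgroups
subgroup_distribution <- class_counts %>%
  count(n_subgroup, name = "n_areas")

print(class_counts)
print(subgroup_distribution)
```

An area with a count of 1 at every level is fully homogeneous, and higher counts indicate a more mixed area.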
The analysis uses 8 OAC Supergroups with descriptive labels:
- Retired Professionals
- Suburbanites & Peri-Urbanites
- Multicultural & Educated Urbanites
- Low-Skilled Migrant & Student Communities
- Ethnically Diverse Suburban Professionals
- Baseline UK
- Semi & Un-Skilled Workforce
- Legacy Communities
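For labelling plots or tables, the list above could be held as a named vector. The sketch assumes that Supergroup codes are the characters "1" to "8" in the order listed, which should be checked against the OAC input file before use.

```r
# Assumed mapping from Supergroup code to label (codes "1" to "8" in list order)
supergroup_labels <- c(
  "1" = "Retired Professionals",
  "2" = "Suburbanites & Peri-Urbanites",
  "3" = "Multicultural & Educated Urbanites",
  "4" = "Low-Skilled Migrant & Student Communities",
  "5" = "Ethnically Diverse Suburban Professionals",
  "6" = "Baseline UK",
  "7" = "Semi & Un-Skilled Workforce",
  "8" = "Legacy Communities"
)

# Example: translate a vector of codes into labels
codes  <- c("3", "6", "1")
labels <- unname(supergroup_labels[codes])
print(labels)
```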
- ONS Output Area Classification: Data provided under the Open Government Licence.
- Geography Boundaries: Sourced from ONS and NRS Scotland.
- Code Contributors: @alexsingleton, Geographic Data Service.
For any questions or issues, please open a GitHub issue or reach out to the authors.