Analyses to identify climate analogs and implement climate-analog impact models
This repository contains dual implementations (R and Julia) for identifying climate analogs using Mahalanobis distance and applying them to predict vegetation shifts under climate change.
- Overview
- Project Workflow
- File Structure
- Key Variables & Concepts
- Core Functions
- Dependencies
- Usage Examples
Reverse climate analogs are geographic locations that currently experience climate conditions similar to what a focal location will experience under future climate change. Forward analogs are the reverse case: locations whose projected future climate resembles the focal location's current climate. This project:
- Calculates the climate dissimilarity between a focal location's future climate and the contemporary climate means of surrounding analog candidates, using Mahalanobis distance (MD) transformed to sigma (σ) values
- Identifies the best analogs from a contemporary climate pool
- Predicts future impacts (e.g., vegetation shifts) by extracting vegetation at analog locations
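The dissimilarity step can be sketched in base R. This is a minimal illustration on simulated data, not the repository's calc_mahalanobis() or calc_sigma(). The helper name md_to_sigma and all values are hypothetical, and the conversion assumes that the squared Mahalanobis distance over k climate variables is treated as chi-squared with k degrees of freedom and re-expressed as the equivalent single-variable distance, which is one reading of the approach in Mahony et al. 2017.

```r
# Hypothetical helper: convert squared Mahalanobis distances to sigma.
# Assumption: with k climate variables, the squared distance is treated as
# chi-squared with k degrees of freedom; its percentile is re-expressed as
# the equivalent distance for a single variable (1 degree of freedom).
md_to_sigma <- function(md_sq, k) {
  # Upper-tail probabilities keep precision for large distances
  p_upper <- pchisq(md_sq, df = k, lower.tail = FALSE)
  sqrt(qchisq(p_upper, df = 1, lower.tail = FALSE))
}

set.seed(42)
k <- 2  # two climate variables, e.g. tmax and tmin

# Simulated future annual observations at one focal location (30 years)
focal_annuals <- cbind(tmax = rnorm(30, mean = 25, sd = 1.5),
                       tmin = rnorm(30, mean = 10, sd = 1.0))
cov_i      <- cov(focal_annuals)       # covariance of the annuals
focal_mean <- colMeans(focal_annuals)  # stand-in for the future normal

# Simulated contemporary normals for five analog candidates
analog_pool <- cbind(tmax = c(25.2, 24.0, 27.5, 21.0, 30.0),
                     tmin = c(10.1,  9.0, 11.5,  7.0, 14.0))

# stats::mahalanobis() returns the squared distance
md    <- mahalanobis(analog_pool, center = focal_mean, cov = cov_i)
sigma <- md_to_sigma(md, k)
print(round(data.frame(md = md, sigma = sigma), 2))
```

With a single variable (k = 1) the conversion reduces to the square root of the squared distance, which is a quick sanity check on the helper.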
- focal_data_cov - Stack of future climate data (annual observations) used to calculate the covariance matrix for each focal location. Currently this should be a list/vector of dataframes.
- focal_data_mean - Future climate normals at focal locations, as a dataframe
- analog_data / analog_pool - Contemporary climate normals at candidate analog locations, as a dataframe
- var_names - Climate variable names (e.g., tmax, tmin)
- x, y - Geographic coordinates (longitude, latitude) in decimal degrees
- f_x, f_y - Focal point coordinates
- a_x, a_y - Analog point coordinates
- md - Mahalanobis distance (squared multivariate distance measure)
- sigma (σ) - Climate dissimilarity metric (0-∞ scale, derived from the chi-squared distribution)
  - Lower σ = better analog
  - σ < 2.0 typically considered a good match
- dist_km - Geographic distance between focal and analog points (kilometers)
- cov_i - Covariance matrix calculated from future climate annuals
- n_analog_pool - Size of the randomly sampled analog pool (e.g., 1000-10000)
- n_analog_use - Number of best analogs to retain (e.g., 100-1000)
- min_dist - Minimum geographic distance filter (km); excludes nearby points
- max_dist - Maximum search radius (km, or Inf for unlimited)
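The way dist_km, min_dist, max_dist, and n_analog_use interact can be sketched as follows. This is a minimal base R illustration with made-up coordinates and sigma values. The helper haversine_km is hypothetical, assumes a spherical Earth with a mean radius of 6371 km, and is not the repository's great_circle_distance().

```r
# Hypothetical helper: Haversine great-circle distance in km.
# Assumption: spherical Earth with mean radius 6371 km.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Toy focal point and analog candidates with made-up sigma values
f_x <- -114.0
f_y <- 46.9
candidates <- data.frame(
  a_x   = c(-114.01, -113.0, -110.5, -120.0, -105.0),
  a_y   = c(46.9, 46.5, 45.0, 44.0, 40.0),
  sigma = c(0.1, 0.8, 0.5, 1.9, 0.3)
)

min_dist     <- 50   # km; drops candidates too close to the focal point
max_dist     <- 500  # km; search radius
n_analog_use <- 2    # number of best analogs to keep

candidates$dist_km <- haversine_km(f_x, f_y, candidates$a_x, candidates$a_y)

# Keep candidates inside the distance window, then the lowest-sigma rows
in_window <- candidates$dist_km >= min_dist & candidates$dist_km <= max_dist
kept <- candidates[in_window, ]
kept <- head(kept[order(kept$sigma), ], n_analog_use)
print(kept)
```

Note that the candidate with the lowest sigma is discarded here because it lies within min_dist of the focal point, which is the intended effect of the minimum distance filter.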
- climate_analogs.R or climate_analogs.jl - Must be run with source() or include(), respectively, to load all the functions needed for calculating climate analogs
- find_analogs() - Computes climate analogs. This should be the only function you need to call directly.
- calculate_analogs() - Takes an input pixel and computes its climate analogs
- calculate_analogs_distributed() - Runs calculate_analogs() distributed over multiple cores
- calc_mahalanobis() - Computes Mahalanobis distance from focal to analog pool
- calc_sigma() - Converts Mahalanobis distance to sigma (chi-squared transform)
- great_circle_distance() - Calculates geographic distance using the Haversine formula
- max_distance_coordinates() - Creates a bounding box at max_dist radius from the focal point
- create_bitVector() - Fast spatial filter to subset the analog pool within the bounding box
- sample_analogs() - Random sample of n_analog_pool from filtered data
- spatial_partition() - Tiles the study area for memory-efficient processing
- check_memory() - Validates available memory vs. required memory
- setup_veg_prediction_bps() - Takes the outputs of the primary analysis, a study area border, a template raster, and the BPS raster, and creates a clean dataset of every focal pixel, each analog for each focal pixel, and the vegetation group at that pixel.
- calculate_top_sigma() - Tallies sigma-weighted votes and chooses the vegetation group with the most votes.
- build_accuracy_stats() - Used in validation: takes vegetation predictions made from contemporary normals and annuals and calculates their agreement with actual BPS.
- rasterize_predicted_veg() - Converts predictions to categorical rasters
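A sigma-weighted vote of the kind calculate_top_sigma() is described as performing might look like the sketch below. The weighting is an assumption: each analog here counts as 1 / max(sigma, 0.01), so closer climate matches count more, and the repository may weight votes differently. The helper name weighted_veg_vote, the column names focal_id and veg_group, and the data are all hypothetical.

```r
# Hypothetical helper: pick the vegetation group with the largest
# sigma-weighted vote for each focal pixel.
# Assumption: weight = 1 / max(sigma, 0.01); the repository's actual
# weighting may differ.
weighted_veg_vote <- function(analogs) {
  analogs$weight <- 1 / pmax(analogs$sigma, 0.01)
  # Sum the weights per focal pixel and vegetation group
  votes <- aggregate(weight ~ focal_id + veg_group, data = analogs, FUN = sum)
  # Keep the top-weighted group within each focal pixel
  winners <- lapply(split(votes, votes$focal_id), function(v) {
    v[which.max(v$weight), c("focal_id", "veg_group")]
  })
  out <- do.call(rbind, winners)
  rownames(out) <- NULL
  out
}

# Toy analog table: one row per focal pixel / analog pair
analogs <- data.frame(
  focal_id  = c(1, 1, 1, 1, 2, 2, 2),
  veg_group = c("conifer", "conifer", "shrubland", "grassland",
                "grassland", "shrubland", "shrubland"),
  sigma     = c(1.5, 1.8, 0.4, 2.5, 0.5, 1.0, 1.2)
)
print(weighted_veg_vote(analogs))
```

In this toy data, pixel 1 is assigned shrubland even though conifer has more analogs, because the single shrubland analog is a much closer climate match. That is the behavior that distinguishes a weighted vote from a simple majority.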
R packages:
Core Data Manipulation:
- data.table - High-performance data frames
- dplyr - Data wrangling
- purrr - Functional programming tools
Spatial/GIS:
- terra - Raster and vector spatial data
- sf - Simple features for vector data
Parallel Processing:
- future - Parallel execution backend
- furrr - Future-based apply functions
- progressr - Progress bars
Statistics:
- caret - Classification and accuracy metrics
- scales - Data rescaling
Utilities:
- tidyr - Data reshaping
- tictoc - Timing/benchmarking
Julia packages:
Core Data:
- DataFrames - Tabular data structures
- DataFramesMeta - DataFrame macros
- CSV - CSV file I/O
- CodecZlib - Compression support
Statistics & Math:
- Distributions - Chi-squared distribution functions
- Distances - Mahalanobis distance calculations
- Statistics - Basic statistical functions
- StatsBase - Extended statistics
- LinearAlgebra - Covariance, matrix operations
Performance:
- Base.Threads - Multi-threading
- Distributed - Distributed computing
- ThreadPools - Advanced threading (@bthreads)
- Suppressor - Suppress warning output
Utilities:
- ProgressMeter - Progress tracking
- Random - Random sampling
The primary analysis was run in code/reverse_analogs/tile_script.jl (or .R), with variables changed
as needed for future and contemporary analog predictions. It is currently a script that takes a
tile_id as input. Please refer to the Julia CLI manual or the Rscript manual for other flags, such as
those that set the number of cores and threads for the analysis. The scripts are easy to edit for
interactive use: replace any use of ARGS or commandArgs() with the tile number or the name of your
input identifier. The tile script uses the tile ID and hard-coded paths to load the rds files
created by code/reverse_analogs/create_wna_pool_reverse_tiles.R. Those rds files are lightly
prepped, and other settings, such as the proportion of the landscape to sample and the maximum
distance at which to sample analogs, can then be set. The find_analogs() function then computes the
climate analogs and writes them to your output_dir under the name output_file as a gzipped CSV
(the file extension is handled for you).
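The command-line versus interactive pattern described above can be sketched in base R. This is an illustration only: get_tile_id is a hypothetical helper, the output path and file name are made up, and find_analogs() is not called, so the written table is a stand-in for its output.

```r
# Hypothetical helper: take the tile ID from the command line when the
# script is run with Rscript, otherwise fall back to a value set by hand.
get_tile_id <- function(args = commandArgs(trailingOnly = TRUE),
                        default = NULL) {
  if (length(args) >= 1 && nzchar(args[1])) {
    return(args[1])
  }
  if (is.null(default)) {
    stop("No tile ID supplied; pass one as the first argument or set `default`.")
  }
  as.character(default)
}

# Interactive use: supply the tile by hand instead of via the command line
tile_id <- get_tile_id(args = character(0), default = 12)

# Hypothetical output location; the real scripts use their own paths
output_dir  <- tempdir()
output_file <- paste0("analogs_tile_", tile_id)
out_path    <- file.path(output_dir, paste0(output_file, ".csv.gz"))

# Stand-in for analog results; base R can write a gzipped CSV through a
# gzfile() connection
result <- data.frame(f_x = -114.0, f_y = 46.9, a_x = -113.0, a_y = 46.5,
                     sigma = 0.8)
con <- gzfile(out_path, open = "wt")
write.csv(result, con, row.names = FALSE)
close(con)

# read.csv() decompresses gzipped files transparently
print(read.csv(out_path))
```

Run non-interactively, the same helper would pick up the first trailing argument, for example from `Rscript tile_script.R 12`, without any edits to the script body.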
Key Algorithm: Adapted from Mahony et al. 2017, Global Change Biology: https://doi.org/10.1111/gcb.13645