An R package for survey research data processing. Works with exported CSVs from Qualtrics or SurveyMonkey -- no API required to get started. API integration is available for automated downloads and codebook generation.
- ⚡ Faster project setup -- From hours to minutes with scaffolded project structure and starter templates
- 📋 Consistent workflows -- Standardized processes across team members, with easy-reference config files
- 📦 Works with CSV exports -- No API access required; just export from your survey platform and load
- 🔄 End-to-end processing -- Complete pipeline from raw export to analysis-ready dataset
- 📖 Intelligent codebook system -- Self-documenting data processing with institutional memory through the maximal codebook library
- 📊 Standardized instrument scoring -- Automated scoring for validated scales with subscale support and reverse coding
- ✅ Quality control built in -- Attention checks, straightlining detection, duration filtering, and response validation
- 📈 HTML quality dashboard -- QC summaries, demographic plots, and scored scale distributions at a glance
- 🔌 Multi-platform support -- Qualtrics and SurveyMonkey, with API integration for both
- 🧩 Modular architecture -- Reusable functions following tidyverse conventions; use the full pipeline or individual steps
- 🔐 Secure data handling -- Token-based authentication for API access, credentials stored outside the repo
- Two Ways to Use It
- Getting Started
- Codebook Generation
- The Codebook System
- Processing Pipeline
- Function Reference
- Platform Setup Guides
Set up a config/parameters.R with your study settings, then process everything at once. Best for routine studies where the standard QC + scoring steps apply.

    library(surveyUtils)
    library(readr)  # provides read_csv()

    source("config/parameters.R")  # defines `params`

    df_raw <- read_csv(file.path(params$raw_dir, params$raw_filename))
    codebook <- load_codebook(params$codebook_path)
    result <- process_survey_data(df_raw, codebook, config = params)
    generate_dashboard(params)

Call each function directly for full control. Best for custom analyses, exploratory work, or when you only need specific steps.

    library(surveyUtils)
    library(readr)  # provides read_csv() and write_csv()

    df <- read_csv("data/raw/my_survey.csv")
    codebook <- load_codebook("config/codebook.csv")

    # Remove test/QA responses outside the date range
    df <- filter_dates(df, start_date_col = "StartDate", min_date = "2026-01-15")

    # Score attention checks
    attn_checks <- get_attention_checks(codebook)
    df <- score_individual_attention_checks(df, attn_checks)
    df <- calculate_attention_score(df, attn_checks)

    # Recode responses, then score every instrument found in the codebook
    df <- convert_response_text_to_codes(df, codebook)
    df <- convert_to_numeric_from_codebook(df, codebook)
    df <- score_surveys(df, codebook, get_available_surveys(codebook))

    write_csv(df, "data/processed/my_survey_scored.csv")

Both approaches use the same underlying functions and the same codebook format.

Install from GitHub:

    # install.packages("devtools")
    devtools::install_github("sashasomms/surveyUtils")

Then scaffold a new project:

    library(surveyUtils)
    setup_new_survey_project("My Study")

This creates a project directory with config/, data/raw/, data/processed/, data/results/, and starter scripts at the project root. Output directories for dashboards, QC reports, etc. are created automatically when the pipeline writes to them.
Copy config/parameters_template.R to config/parameters.R and customize. It calls survey_config() with your study-specific settings -- only specify what differs from the defaults:

    # config/parameters.R
    library(surveyUtils)

    params <- survey_config(
      survey_name = "My Study",
      platform = "qualtrics",
      codebook_path = "config/codebook.csv",
      raw_filename = "my_survey.csv",
      min_date = "2026-01-15",
      duration_threshold_method = "absolute",
      duration_lower_threshold = 3,
      attention_threshold = 0.8,
      # Dashboard plot specs
      demo_hist_vars = list(demo_age = "Age"),
      demo_waffle_vars = list(demo_gender = "Gender"),
      demo_bar_vars = list(demo_race = list(label = "Race", sort_by_count = TRUE))
    )

See ?survey_config for all available options and their defaults.
A self-contained example project is bundled with synthetic data, a codebook, and a config file:

    source(system.file("examples", "run_example.R", package = "surveyUtils"))

surveyUtils generates codebook skeletons through three enrichment tiers that stack together.

Without API access, build a skeleton from the column names in your raw data and fill in the remaining fields by hand:

    df_raw <- read_csv("data/raw/my_survey.csv")
    skeleton <- build_codebook_skeleton(df_raw)
    write_csv(skeleton, "config/codebook.csv")
    # Fill in: scale_abbrev, response_choices, response_coding, etc.

With API access, surveyUtils pulls question text, response options, and coding directly from the platform:

    # Qualtrics
    load_qualtrics_credentials("../qualtrics_secrets.csv")
    survey_id <- get_qualtrics_survey_id("My Study")
    codebook <- build_qualtrics_codebook_skeleton(survey_id)

    # SurveyMonkey
    token <- load_sm_token("../.survey_monkey_secrets.csv")
    codebook <- generate_enhanced_codebook_template(
      token,
      survey_name = "My Study",
      output_path = "config/"
    )

The maximal codebook is an institutional knowledge base of known survey instruments. surveyUtils matches your questions against this library using fuzzy text matching, automatically filling in scale_abbrev, coding_direction, and scoring info for recognized instruments.

    codebook <- generate_qualtrics_codebook(
      survey_id,
      survey_name = "My Study",
      output_dir = "config/",
      maximal_codebook_path = "path/to/maximal_codebook.csv"
    )

Tiers stack: API + maximal codebook gives the richest result. The 00_generate_codebook.R template handles all three paths. See Qualtrics API Setup and SurveyMonkey API Setup for credential configuration.
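To give a feel for fuzzy text matching against a library of known items, here is a minimal sketch in base R. It is not the matching algorithm surveyUtils uses, which this README does not specify. It uses `utils::adist()` (Levenshtein edit distance); the two-row library, the helper names `normalize_text()` and `match_to_library()`, and the 0.85 similarity threshold are all hypothetical.

```r
# Hypothetical mini-library of known items (not the real maximal codebook)
library_items <- data.frame(
  question_text = c(
    "Little interest or pleasure in doing things",
    "Feeling nervous, anxious, or on edge"
  ),
  scale_abbrev = c("phq", "gad"),
  stringsAsFactors = FALSE
)

# Normalize text so casing and punctuation do not affect the distance
normalize_text <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9 ]", "", x)
  trimws(gsub(" +", " ", x))
}

# Return the best-matching scale_abbrev, or NA if nothing is similar enough.
# Similarity = 1 - (edit distance / length of the longer string).
match_to_library <- function(question, library_items, threshold = 0.85) {
  q <- normalize_text(question)
  lib <- normalize_text(library_items$question_text)
  d <- as.vector(utils::adist(q, lib))
  similarity <- 1 - d / pmax(nchar(q), nchar(lib))
  best <- which.max(similarity)
  if (similarity[best] >= threshold) {
    library_items$scale_abbrev[best]
  } else {
    NA_character_
  }
}

match_to_library("LITTLE interest or pleasure in doing things!", library_items)
match_to_library("What is your age?", library_items)
```

The first call differs from the library entry only in casing and punctuation, so it resolves to the `phq` entry; the second has no close match and returns `NA`, leaving the row to be filled in by hand.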
The codebook is the central organizing principle -- it serves as both documentation (complete metadata for every variable) and processing configuration (machine-readable instructions for cleaning, recoding, and scoring).
| Column | Description |
|---|---|
| col_num | Column number for ordering |
| question_text | Full question text as presented to participants |
| variable_name | Variable name in the raw data |
| short_variable_name | Concise analysis-friendly variable name |
| cat | Variable category: demo, attention_check, survey_items, computed_scores |
| scale_abbrev | Short scale name (phq, gad, pss, etc.) |
| scale_full | Complete instrument name |
| subscale | Subscale designation for multi-factor instruments |
| coding_direction | 1 = forward, -1 = reverse |
| min / max | Valid response range |
| response_choices | Semicolon-separated response options |
| response_coding | Semicolon-separated numeric codes matching response_choices |
| response_format | Format for conversion (numeric, text, etc.) |
| correct_response | Correct answer for attention check items |

Each new study can contribute back to the library via update_maximal_codebook(), so matching improves over time.
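To make the pairing between response_choices and response_coding concrete, the following base R sketch splits both semicolon-separated fields and uses them as a text-to-code lookup. The codebook row and the helper `split_field()` are hypothetical, and trimming whitespace around each piece is an assumption about the file format. It illustrates the idea only and is not how convert_response_text_to_codes() is implemented.

```r
# One hypothetical codebook row, using the column names from the table above
codebook_row <- list(
  short_variable_name = "phq_1",
  response_choices = "Not at all; Several days; More than half the days; Nearly every day",
  response_coding = "0; 1; 2; 3"
)

# Split a semicolon-separated field into trimmed pieces
split_field <- function(x) trimws(strsplit(x, ";", fixed = TRUE)[[1]])

choices <- split_field(codebook_row$response_choices)
codes <- as.numeric(split_field(codebook_row$response_coding))
stopifnot(length(choices) == length(codes))  # the two fields must line up

# Named lookup vector: response text -> numeric code
lookup <- setNames(codes, choices)

# Unrecognized or missing responses come back as NA
responses <- c("Several days", "Nearly every day", "Not at all", NA)
recoded <- unname(lookup[responses])
recoded
```

Because the two fields are matched by position, a codebook row with four choices and three codes is a data-entry error, which is why the sketch checks the lengths before building the lookup.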
Whether called individually or via process_survey_data(), the standard pipeline is:
| Step | Function | What it does |
|---|---|---|
| 1 | clean_names() | Standardize column names, strip HTML |
| 2 | apply_short_names() | Rename columns using codebook short names |
| 3 | process_double_headers() | Handle SurveyMonkey double-header format |
| 4 | create_multiselect_summary_cols() | Summarize multi-select questions |
| 5 | filter_dates() | Remove test/QA responses outside date range |
| 6 | process_survey_duration() | Calculate and flag completion times |
| 7 | process_attention_checks() | Score and flag inattentive responders |
| 8 | convert_response_text_to_codes() | Map text responses to numeric codes |
| 9 | convert_to_numeric_from_codebook() | Convert columns to numeric type |
| 10 | detect_straightlining() | Flag identical responses across scales |
| 11 | reverse_code_from_codebook() | Reverse-code items marked in codebook |
| 12 | score_surveys() | Score instruments with subscales and prorating |

QC steps (6, 7, 10) flag but don't remove rows by default. Removal is controlled by config settings (remove_attention_fails, remove_duration_outliers, remove_straightliners) and applied together after all flags are set.
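The flag-first, remove-later pattern described above can be sketched in base R as follows. The three `remove_*` names come from the config settings listed above; the flag column names and the mapping between settings and columns are hypothetical, since the README does not name the columns the pipeline creates.

```r
# Toy data with QC flag columns (column names are hypothetical)
df <- data.frame(
  id = 1:5,
  flag_attention_fail = c(FALSE, TRUE, FALSE, FALSE, TRUE),
  flag_duration_outlier = c(FALSE, FALSE, TRUE, FALSE, FALSE),
  flag_straightliner = c(FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Removal switches, named after the config settings
removal <- list(
  remove_attention_fails = TRUE,
  remove_duration_outliers = FALSE,
  remove_straightliners = TRUE
)

# Hypothetical mapping from each setting to the flag column it controls
flag_for <- c(
  remove_attention_fails = "flag_attention_fail",
  remove_duration_outliers = "flag_duration_outlier",
  remove_straightliners = "flag_straightliner"
)

# Combine only the enabled flags, then drop rows in a single pass
enabled <- unname(flag_for[names(removal)[unlist(removal)]])
drop_row <- Reduce(`|`, df[enabled], rep(FALSE, nrow(df)))
df_clean <- df[!drop_row, ]
df_clean$id
```

Respondent 3 is flagged as a duration outlier but stays in `df_clean` because that switch is off. Keeping the flags as columns until the end means every exclusion decision can still be audited in the processed data.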
- survey_config() -- Create a validated config object for the pipeline
- is_survey_config() -- Check if an object is a survey_config

- process_double_headers() -- Handle SurveyMonkey double-header format
- apply_short_names() -- Rename columns using codebook mappings
- create_multiselect_summary_cols() -- Create summary columns for multi-select questions

Date Filtering (qc_date_filters.R)
- parse_date_columns() -- Auto-detect and parse date formats
- filter_dates() -- Remove responses outside date range

Duration (qc_duration.R)
- calculate_survey_duration() -- Compute completion time
- filter_duration() -- Flag/remove duration outliers (mean+SD, percentile, or absolute)
- process_survey_duration() -- Combined duration pipeline

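The three threshold styles named for filter_duration() (mean+SD, percentile, absolute) can be compared with a small base R sketch. This is not the package's implementation: the function `flag_fast_responders()`, its argument names, and the default values are hypothetical. It only handles a lower bound, and it assumes durations are in minutes.

```r
# Flag completion times that fall below a lower cutoff.
# `method` mirrors the three options named above; everything else is hypothetical.
flag_fast_responders <- function(minutes,
                                 method = c("absolute", "mean_sd", "percentile"),
                                 lower = 3, n_sd = 2, lower_pct = 0.05) {
  method <- match.arg(method)
  cutoff <- switch(method,
    absolute = lower,
    mean_sd = mean(minutes, na.rm = TRUE) - n_sd * sd(minutes, na.rm = TRUE),
    percentile = unname(quantile(minutes, lower_pct, na.rm = TRUE))
  )
  # Missing durations are never flagged
  !is.na(minutes) & minutes < cutoff
}

minutes <- c(1.5, 2.9, 3, 8, 10, 12, 15, 45)

flag_fast_responders(minutes, method = "absolute", lower = 3)
flag_fast_responders(minutes, method = "mean_sd", n_sd = 2)
flag_fast_responders(minutes, method = "percentile", lower_pct = 0.05)
```

With this skewed toy sample the mean-minus-2-SD cutoff is negative and flags nobody, which is one reason an absolute floor can be the more predictable choice for short surveys.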

Attention Checks (qc_attention_checks.R)
- get_attention_checks() -- Extract attention check items from codebook
- score_individual_attention_checks() -- Score each attention item
- calculate_attention_score() -- Compute overall attention score
- process_attention_checks() -- Combined attention check pipeline

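As an illustration of per-item scoring followed by an overall score, the base R sketch below compares each response with its correct answer and averages the results. It assumes the overall score is the proportion of checks passed, which fits the attention_threshold = 0.8 setting in the config example but is not confirmed by this README. The item names, answers, and output column names are hypothetical.

```r
# Toy responses to two attention-check items (names are hypothetical)
df <- data.frame(
  attn_1 = c("Strongly agree", "Agree", "Strongly agree"),
  attn_2 = c("Blue", "Blue", "Red"),
  stringsAsFactors = FALSE
)

# Correct answers, as they might appear in the codebook's correct_response column
correct <- c(attn_1 = "Strongly agree", attn_2 = "Blue")

# Score each item: TRUE when the response matches the correct answer
item_pass <- sapply(names(correct), function(v) df[[v]] == correct[[v]])

# Overall score = proportion of checks passed; flag respondents below the threshold
attention_threshold <- 0.8
df$attention_score <- rowMeans(item_pass)
df$attention_fail <- df$attention_score < attention_threshold

df[, c("attention_score", "attention_fail")]
```

With only two checks, a single miss drops the score to 0.5, so a 0.8 threshold effectively requires a perfect record; the threshold becomes more forgiving as the number of checks grows.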

Straightlining (qc_straightlining.R)
- detect_straightlining() -- Flag identical responses within scales

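A simple way to think about straightlining within one scale is "every answered item has the same value". The base R sketch below implements that rule; it is an illustration rather than the logic of detect_straightlining(), and the helper `is_straightline()`, the `min_items` guard, and the column names are hypothetical.

```r
# Toy item responses for one scale (column names are hypothetical)
df <- data.frame(
  pss_1 = c(3, 2, 4, NA),
  pss_2 = c(3, 1, 4, 2),
  pss_3 = c(3, 4, 4, 2),
  pss_4 = c(3, 2, 4, 2)
)

# TRUE when a respondent gave the same answer to every item they answered,
# provided they answered at least `min_items` of them
is_straightline <- function(items, min_items = 4) {
  apply(items, 1, function(row) {
    answered <- row[!is.na(row)]
    length(answered) >= min_items && length(unique(answered)) == 1
  })
}

df$straightline_pss <- is_straightline(df[, c("pss_1", "pss_2", "pss_3", "pss_4")])
df$straightline_pss
```

The last respondent answered only three items identically, so the `min_items` guard keeps them unflagged. On scales that include reverse-worded items, identical raw answers are a stronger signal of inattention, which is why this kind of check is usually run before reverse coding.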

- convert_response_text_to_codes() -- Map text to numeric using codebook
- convert_to_numeric_from_codebook() -- Type conversion with validation
- reverse_code_from_codebook() -- Reverse-code items per codebook

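Reverse coding driven by the codebook's coding_direction, min, and max columns can be sketched with the usual formula, reversed = (min + max) - x. The helper `reverse_code()`, the two codebook rows, and the data are hypothetical, and the package's own function may handle edge cases differently.

```r
# Reverse-code x on a scale running from scale_min to scale_max
reverse_code <- function(x, scale_min, scale_max) {
  (scale_min + scale_max) - x
}

# Hypothetical codebook rows for two items on a 0-4 scale
items <- data.frame(
  short_variable_name = c("pss_1", "pss_4"),
  coding_direction = c(1, -1),
  min = c(0, 0),
  max = c(4, 4),
  stringsAsFactors = FALSE
)

df <- data.frame(pss_1 = c(0, 2, 4), pss_4 = c(0, 1, 4))

# Apply the reversal only to items marked -1 in coding_direction
for (i in seq_len(nrow(items))) {
  if (items$coding_direction[i] == -1) {
    v <- items$short_variable_name[i]
    df[[v]] <- reverse_code(df[[v]], items$min[i], items$max[i])
  }
}

df$pss_4
```

Using min + max rather than a hard-coded constant keeps the formula correct for scales that start at 1 as well as those that start at 0.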

- var_score() -- Core scoring with prorating and missing data handling
- score_surveys() -- Batch score all instruments with subscales
- get_available_surveys() -- List scoreable instruments in codebook

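Prorating generally means estimating a sum score from the items a respondent did answer, as long as enough of them are present. The base R sketch below shows one common version of that rule. It is not the implementation of var_score(); the function `prorated_sum()` and its `min_prop` parameter (minimum proportion of items answered) are hypothetical.

```r
# Sum score with prorating: if enough items are answered, scale the mean of the
# answered items up to the full item count; otherwise return NA.
prorated_sum <- function(items, min_prop = 0.8) {
  n_items <- ncol(items)
  n_answered <- rowSums(!is.na(items))
  score <- rowMeans(items, na.rm = TRUE) * n_items
  score[n_answered / n_items < min_prop] <- NA
  score
}

items <- data.frame(
  q1 = c(1, 2, NA),
  q2 = c(2, 2, NA),
  q3 = c(3, NA, 1),
  q4 = c(0, 2, 1),
  q5 = c(1, 2, 3)
)

prorated_sum(items)
```

The first respondent answered everything and gets a plain sum. The second skipped one item, so their mean of the four answered items is scaled to five items. The third answered only three of five, falls below `min_prop`, and is left missing rather than guessed.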

- build_codebook_skeleton() -- Build codebook skeleton from data column names
- load_codebook() -- Load and validate codebook CSV
- load_maximal_codebook() -- Load institutional knowledge base
- generate_enhanced_codebook_template() -- Generate codebook from SurveyMonkey API
- build_qualtrics_codebook_skeleton() -- Generate codebook from Qualtrics API
- generate_qualtrics_codebook() -- Generate + enrich codebook from Qualtrics API
- update_maximal_codebook() -- Add new items to knowledge base

- load_sm_token() -- Load OAuth token
- process_survey_responses() -- Download and flatten survey data
- flatten_survey_responses() -- Transform nested API responses to flat data frame

- load_qualtrics_credentials() -- Load API credentials
- get_qualtrics_survey_id() -- Look up survey ID by name
- fetch_qualtrics_questions() -- Retrieve survey metadata

- process_survey_data() -- Full processing pipeline (accepts config object)
- complete_processing_pipeline() -- Pipeline wrapper for params-based workflow
- complete_survey_workflow() -- End-to-end: download, generate codebook, process
- setup_new_survey_project() -- Scaffold a new study project directory

- generate_dashboard() -- Render HTML quality dashboard
- create_variable_summary() -- Export variable specs for dashboard

- make_hist() -- Histograms with mean/SD annotations
- plot_horizontal_bars() -- Horizontal bar charts with percentages
- plot_waffle() -- Waffle charts for categorical distributions
- plot_state_heatmap() -- US state choropleth maps

- Qualtrics API Setup -- API key, base URL, credential file format
- SurveyMonkey API Setup -- Developer app, OAuth token setup