surveyUtils

An R package for survey research data processing. Works with exported CSVs from Qualtrics or SurveyMonkey -- no API required to get started. API integration is available for automated downloads and codebook generation.

What You Get

⚡ Faster project setup -- From hours to minutes with scaffolded project structure and starter templates
📋 Consistent workflows -- Standardized processes across team members, with easy-reference config files
📦 Works with CSV exports -- No API access required; just export from your survey platform and load
🔄 End-to-end processing -- Complete pipeline from raw export to analysis-ready dataset
📖 Intelligent codebook system -- Self-documenting data processing; the maximal codebook library carries instrument knowledge from one study to the next
📊 Standardized instrument scoring -- Automated scoring for validated scales with subscale support and reverse coding
✅ Quality control built in -- Attention checks, straightlining detection, duration filtering, and response validation
📈 HTML quality dashboard -- QC summaries, demographic plots, and scored scale distributions at a glance
🔌 Multi-platform support -- Qualtrics and SurveyMonkey, with API integration for both
🧩 Modular architecture -- Reusable functions following tidyverse conventions; use the full pipeline or individual steps
🔐 Secure data handling -- Token-based authentication for API access, credentials stored outside the repo

Two Ways to Use It

Full pipeline -- one call, config-driven

Set up a config/parameters.R with your study settings, then process everything at once. Best for routine studies where the standard QC + scoring steps apply.

library(surveyUtils)
library(readr)  # read_csv() comes from readr
source("config/parameters.R")

df_raw   <- read_csv(file.path(params$raw_dir, params$raw_filename))
codebook <- load_codebook(params$codebook_path)

result <- process_survey_data(df_raw, codebook, config = params)
generate_dashboard(params)

Individual functions -- step by step

Call each function directly for full control. Best for custom analyses, exploratory work, or when you only need specific steps.

library(surveyUtils)
library(readr)  # read_csv() and write_csv() come from readr

df <- read_csv("data/raw/my_survey.csv")
codebook <- load_codebook("config/codebook.csv")

df <- filter_dates(df, start_date_col = "StartDate", min_date = "2026-01-15")
attn_checks <- get_attention_checks(codebook)
df <- score_individual_attention_checks(df, attn_checks)
df <- calculate_attention_score(df, attn_checks)

df <- convert_response_text_to_codes(df, codebook)
df <- convert_to_numeric_from_codebook(df, codebook)
df <- score_surveys(df, codebook, get_available_surveys(codebook))

write_csv(df, "data/processed/my_survey_scored.csv")

Both approaches use the same underlying functions and the same codebook format.

Getting Started

Install

# install.packages("devtools")
devtools::install_github("sashasomms/surveyUtils")

Scaffold a new project

library(surveyUtils)
setup_new_survey_project("My Study")

This creates a project directory with config/, data/raw/, data/processed/, data/results/, and starter scripts at the project root. Output directories for dashboards, QC reports, etc. are created automatically when the pipeline writes to them.

Configure your study

Copy config/parameters_template.R to config/parameters.R and customize it. The file calls survey_config() with your study-specific settings; specify only what differs from the defaults:

# config/parameters.R
library(surveyUtils)

params <- survey_config(
  survey_name   = "My Study",
  platform      = "qualtrics",
  codebook_path = "config/codebook.csv",
  raw_filename  = "my_survey.csv",
  min_date      = "2026-01-15",
  duration_threshold_method = "absolute",
  duration_lower_threshold  = 3,
  attention_threshold       = 0.8,

  # Dashboard plot specs
  demo_hist_vars   = list(demo_age = "Age"),
  demo_waffle_vars = list(demo_gender = "Gender"),
  demo_bar_vars    = list(demo_race = list(label = "Race", sort_by_count = TRUE))
)

See ?survey_config for all available options and their defaults.
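The "only specify what differs from the defaults" pattern can be pictured with base R's modifyList(). This is a minimal sketch, not the package's implementation: make_config() and the default values below are hypothetical stand-ins, and the real defaults are the ones listed in ?survey_config.

```r
# Hypothetical defaults for illustration; the real ones are in ?survey_config.
default_config <- list(
  platform                  = "qualtrics",
  duration_threshold_method = "absolute",
  duration_lower_threshold  = 5,
  attention_threshold       = 1.0
)

# make_config() is an illustrative helper, not part of surveyUtils.
make_config <- function(...) {
  overrides <- list(...)
  # Reject misspelled option names instead of silently ignoring them
  unknown <- setdiff(names(overrides), names(default_config))
  if (length(unknown) > 0) {
    stop("Unknown config option(s): ", paste(unknown, collapse = ", "))
  }
  # Overrides replace matching defaults; everything else is kept
  modifyList(default_config, overrides)
}

cfg <- make_config(duration_lower_threshold = 3, attention_threshold = 0.8)
str(cfg)
```

Validating option names up front is one way a config constructor can catch typos before the pipeline runs.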

Run the example

A self-contained example project is bundled with synthetic data, a codebook, and a config file:

source(system.file("examples", "run_example.R", package = "surveyUtils"))

Codebook Generation

surveyUtils generates codebook skeletons through three enrichment tiers that stack together:

Tier 1: From your data (no API needed)

library(surveyUtils)
library(readr)  # read_csv() and write_csv() come from readr

df_raw <- read_csv("data/raw/my_survey.csv")
skeleton <- build_codebook_skeleton(df_raw)
write_csv(skeleton, "config/codebook.csv")
# Fill in: scale_abbrev, response_choices, response_coding, etc.
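To give a feel for what a data-only skeleton contains, here is a rough base R sketch. build_skeleton_sketch() is a hypothetical stand-in, not the package's build_codebook_skeleton(); it simply lays out one row per column with a few of the codebook fields left blank for manual entry.

```r
# Illustrative only: a hypothetical stand-in for a data-derived skeleton.
build_skeleton_sketch <- function(df) {
  blank <- rep(NA_character_, ncol(df))
  data.frame(
    col_num          = seq_along(df),
    variable_name    = names(df),
    question_text    = blank,
    scale_abbrev     = blank,
    response_choices = blank,
    response_coding  = blank,
    stringsAsFactors = FALSE
  )
}

# Toy export with three columns
df_raw <- data.frame(StartDate = "2026-01-20", Q1 = "Often", Q2 = "Never",
                     stringsAsFactors = FALSE)
skeleton <- build_skeleton_sketch(df_raw)
print(skeleton[, c("col_num", "variable_name")])
```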

Tier 2: From API metadata

With API access, surveyUtils pulls question text, response options, and coding directly from the platform:

# Qualtrics
load_qualtrics_credentials("../qualtrics_secrets.csv")
survey_id <- get_qualtrics_survey_id("My Study")
codebook <- build_qualtrics_codebook_skeleton(survey_id)

# SurveyMonkey
token <- load_sm_token("../.survey_monkey_secrets.csv")
codebook <- generate_enhanced_codebook_template(token, survey_name = "My Study",
                                                 output_path = "config/")

Tier 3: Maximal codebook matching

The maximal codebook is an institutional knowledge base of known survey instruments. surveyUtils matches your questions against this library using fuzzy text matching -- automatically filling in scale_abbrev, coding_direction, and scoring info for recognized instruments.

codebook <- generate_qualtrics_codebook(
  survey_id,
  survey_name           = "My Study",
  output_dir            = "config/",
  maximal_codebook_path = "path/to/maximal_codebook.csv"
)

Tiers stack: API + maximal codebook gives the richest result. The 00_generate_codebook.R template handles all three paths. See Qualtrics API Setup and SurveyMonkey API Setup for credential configuration.
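The fuzzy text matching behind Tier 3 can be approximated with edit distance. The sketch below uses utils::adist() from base R and is only an illustration of the idea: match_item_sketch(), the toy library, and the 0.2 relative-distance cutoff are assumptions, not the package's matching algorithm.

```r
# Hypothetical library of known item texts with their scale abbreviations.
library_items <- data.frame(
  question_text = c("Little interest or pleasure in doing things",
                    "Feeling nervous, anxious, or on edge"),
  scale_abbrev  = c("phq", "gad"),
  stringsAsFactors = FALSE
)

# match_item_sketch() is illustrative; the 0.2 cutoff is an arbitrary assumption.
match_item_sketch <- function(text, library_items, max_rel_dist = 0.2) {
  # Levenshtein distance from the question to every library item
  d <- utils::adist(tolower(text), tolower(library_items$question_text))[1, ]
  # Normalize by the longer string so long items are not penalized
  rel <- d / pmax(nchar(text), nchar(library_items$question_text))
  best <- which.min(rel)
  if (rel[best] <= max_rel_dist) library_items$scale_abbrev[best] else NA_character_
}

match_item_sketch("Little interest or pleasure in doing things.", library_items)
match_item_sketch("What is your favorite color?", library_items)
```

A relative cutoff keeps small wording differences (punctuation, capitalization) from blocking a match while leaving unrelated questions unmatched.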

The Codebook System

The codebook is the central organizing principle -- it serves as both documentation (complete metadata for every variable) and processing configuration (machine-readable instructions for cleaning, recoding, and scoring).

Codebook Structure

Column	Description
`col_num`	Column number for ordering
`question_text`	Full question text as presented to participants
`variable_name`	Variable name in the raw data
`short_variable_name`	Concise analysis-friendly variable name
`cat`	Variable category: `demo`, `attention_check`, `survey_items`, `computed_scores`
`scale_abbrev`	Short scale name (`phq`, `gad`, `pss`, etc.)
`scale_full`	Complete instrument name
`subscale`	Subscale designation for multi-factor instruments
`coding_direction`	`1` = forward, `-1` = reverse
`min` / `max`	Valid response range
`response_choices`	Semicolon-separated response options
`response_coding`	Semicolon-separated numeric codes matching response_choices
`response_format`	Format for conversion (`numeric`, `text`, etc.)
`correct_response`	Correct answer for attention check items
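As an illustration of how the semicolon-separated response_choices and response_coding fields can drive recoding, here is a small base R sketch. text_to_codes_sketch() is a hypothetical helper, not the package's convert_response_text_to_codes(), and the example labels are made up.

```r
# One codebook row's response fields, in the semicolon-separated format above.
response_choices <- "Not at all; Several days; More than half the days; Nearly every day"
response_coding  <- "0; 1; 2; 3"

# text_to_codes_sketch() is a hypothetical helper for illustration.
text_to_codes_sketch <- function(x, choices, coding) {
  labels <- trimws(strsplit(choices, ";", fixed = TRUE)[[1]])
  codes  <- as.numeric(trimws(strsplit(coding, ";", fixed = TRUE)[[1]]))
  stopifnot(length(labels) == length(codes))
  # match() yields NA for any response that is not listed in the codebook
  codes[match(trimws(x), labels)]
}

coded <- text_to_codes_sketch(c("Several days", "Nearly every day", "Unsure"),
                              response_choices, response_coding)
coded
```

Responses that do not appear in response_choices come back as NA, which makes unexpected text easy to spot.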

Maximal Codebook Library

Each new study can contribute back to the library via update_maximal_codebook(), so matching improves over time.

Processing Pipeline

Whether called individually or via process_survey_data(), the standard pipeline is:

Step	Function	What it does
1	`clean_names()`	Standardize column names, strip HTML
2	`apply_short_names()`	Rename columns using codebook short names
3	`process_double_headers()`	Handle SurveyMonkey double-header format
4	`create_multiselect_summary_cols()`	Summarize multi-select questions
5	`filter_dates()`	Remove test/QA responses outside date range
6	`process_survey_duration()`	Calculate and flag completion times
7	`process_attention_checks()`	Score and flag inattentive responders
8	`convert_response_text_to_codes()`	Map text responses to numeric codes
9	`convert_to_numeric_from_codebook()`	Convert columns to numeric type
10	`detect_straightlining()`	Flag identical responses within scales
11	`reverse_code_from_codebook()`	Reverse-code items marked in codebook
12	`score_surveys()`	Score instruments with subscales and prorating

QC steps (6, 7, 10) flag but don't remove rows by default. Removal is controlled by config settings (remove_attention_fails, remove_duration_outliers, remove_straightliners) and applied together after all flags are set.
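The flag-first, remove-later design can be sketched as follows. The flag column names and apply_removals_sketch() are hypothetical; only the three remove_* setting names come from the text above.

```r
# Toy data with QC flags already set by earlier steps (column names are hypothetical).
df <- data.frame(
  id                = 1:5,
  flag_attention    = c(FALSE, TRUE,  FALSE, FALSE, TRUE),
  flag_duration     = c(FALSE, FALSE, TRUE,  FALSE, TRUE),
  flag_straightline = c(FALSE, FALSE, FALSE, FALSE, FALSE)
)

# Removal switches named after the config settings mentioned above.
config <- list(remove_attention_fails   = TRUE,
               remove_duration_outliers = FALSE,
               remove_straightliners    = FALSE)

# Combine only the enabled flags, then drop rows in a single pass.
apply_removals_sketch <- function(df, config) {
  drop <- rep(FALSE, nrow(df))
  if (isTRUE(config$remove_attention_fails))   drop <- drop | df$flag_attention
  if (isTRUE(config$remove_duration_outliers)) drop <- drop | df$flag_duration
  if (isTRUE(config$remove_straightliners))    drop <- drop | df$flag_straightline
  df[!drop, , drop = FALSE]
}

kept <- apply_removals_sketch(df, config)
kept$id
```

Because the flags stay in the data, the same flagged dataset can be re-filtered under different removal settings without rerunning the QC steps.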

Function Reference

Configuration (`survey_config.R`)

survey_config() -- Create a validated config object for the pipeline
is_survey_config() -- Check if an object is a survey_config

Column Naming (`column_naming.R`)

process_double_headers() -- Handle SurveyMonkey double-header format
apply_short_names() -- Rename columns using codebook mappings
create_multiselect_summary_cols() -- Create summary columns for multi-select questions

Quality Control

Date Filtering (qc_date_filters.R)

parse_date_columns() -- Auto-detect and parse date formats
filter_dates() -- Remove responses outside date range
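Date filtering of this kind amounts to parsing the start timestamp and keeping rows inside a window. A minimal sketch, assuming ISO-style timestamps (YYYY-MM-DD first); filter_dates_sketch() and its arguments are hypothetical, not the package function's signature.

```r
# filter_dates_sketch() is hypothetical; it assumes timestamps begin with YYYY-MM-DD.
filter_dates_sketch <- function(df, date_col, min_date, max_date = NULL) {
  # Keep only the date part so a trailing time does not affect parsing
  d <- as.Date(substr(as.character(df[[date_col]]), 1, 10), format = "%Y-%m-%d")
  keep <- !is.na(d) & d >= as.Date(min_date)
  if (!is.null(max_date)) keep <- keep & d <= as.Date(max_date)
  df[keep, , drop = FALSE]
}

df <- data.frame(
  id        = 1:3,
  StartDate = c("2026-01-10 09:30:00", "2026-01-15 14:02:11", "2026-02-01 08:00:00"),
  stringsAsFactors = FALSE
)
filtered <- filter_dates_sketch(df, "StartDate", min_date = "2026-01-15")
filtered$id
```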

Duration (qc_duration.R)

calculate_survey_duration() -- Compute completion time
filter_duration() -- Flag/remove duration outliers (mean+SD, percentile, or absolute)
process_survey_duration() -- Combined duration pipeline
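The three threshold styles named above (mean+SD, percentile, absolute) can be compared side by side. This is a sketch under assumptions: flag_duration_sketch(), its argument names, and its default cutoffs are hypothetical and are not taken from the package.

```r
# flag_duration_sketch() is hypothetical; argument names and defaults are assumptions.
flag_duration_sketch <- function(minutes,
                                 method = c("absolute", "sd", "percentile"),
                                 lower = 3, n_sd = 2, probs = c(0.05, 0.95)) {
  method <- match.arg(method)
  bounds <- switch(method,
    # Fixed lower cutoff in minutes, no upper cutoff
    absolute   = c(lower, Inf),
    # Mean plus or minus n_sd standard deviations
    sd         = mean(minutes, na.rm = TRUE) +
                   c(-1, 1) * n_sd * sd(minutes, na.rm = TRUE),
    # Empirical lower and upper quantiles
    percentile = unname(quantile(minutes, probs, na.rm = TRUE))
  )
  minutes < bounds[1] | minutes > bounds[2]
}

minutes <- c(1.5, 8, 9, 10, 11, 12, 45)
flag_duration_sketch(minutes, "absolute")
flag_duration_sketch(minutes, "sd")
flag_duration_sketch(minutes, "percentile")
```

Note that the methods disagree on this toy data: the absolute rule catches only the speeder, the SD rule only the very slow response, and the percentile rule both.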

Attention Checks (qc_attention_checks.R)

get_attention_checks() -- Extract attention check items from codebook
score_individual_attention_checks() -- Score each attention item
calculate_attention_score() -- Compute overall attention score
process_attention_checks() -- Combined attention check pipeline
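One plausible reading of "score each item, then compute an overall score" is the proportion of attention checks answered correctly. The sketch below follows that reading; the column names and expected answers are made up, and the package's actual scoring rule may differ.

```r
# Toy responses to two attention-check items, plus the expected answers
# (in a codebook these would come from the correct_response column).
df <- data.frame(
  attn_1 = c("Strongly agree", "Strongly agree", "Disagree"),
  attn_2 = c("Blue", "Red", "Red"),
  stringsAsFactors = FALSE
)
correct <- c(attn_1 = "Strongly agree", attn_2 = "Blue")

# Score each item (TRUE = correct), giving a respondents-by-items matrix
item_correct <- sapply(names(correct), function(col) df[[col]] == correct[[col]])

# Overall score = proportion of checks passed per respondent
df$attention_score <- rowMeans(item_correct, na.rm = TRUE)

# Flag respondents below a threshold (0.8, as in the config example earlier)
df$attention_fail <- df$attention_score < 0.8
df
```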

Straightlining (qc_straightlining.R)

detect_straightlining() -- Flag identical responses within scales
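A simple way to picture straightlining detection is to check whether a respondent gave one and the same answer to every item of a scale. is_straightlined_sketch() and its min_items guard are hypothetical; the package may use a different rule.

```r
# is_straightlined_sketch() is hypothetical: TRUE when a respondent gave the
# same answer to every answered item in one scale (needs at least min_items answers).
is_straightlined_sketch <- function(df, items, min_items = 3) {
  flags <- apply(df[, items, drop = FALSE], 1, function(row) {
    answered <- row[!is.na(row)]
    length(answered) >= min_items && length(unique(answered)) == 1
  })
  unname(flags)
}

df <- data.frame(
  pss_1 = c(3, 1, 2, NA),
  pss_2 = c(3, 2, 2, NA),
  pss_3 = c(3, 4, 2, 1),
  pss_4 = c(3, 0, 2, 1)
)
df$straightline_pss <- is_straightlined_sketch(df, c("pss_1", "pss_2", "pss_3", "pss_4"))
df$straightline_pss
```

The min_items guard keeps respondents with mostly missing data from being flagged on the strength of two matching answers.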

Response Coding (`response_coding.R`)

convert_response_text_to_codes() -- Map text to numeric using codebook
convert_to_numeric_from_codebook() -- Type conversion with validation
reverse_code_from_codebook() -- Reverse-code items per codebook
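Reverse coding conventionally mirrors a response within its valid range, i.e. (min + max) - x. The sketch below applies that formula to items marked -1 in a toy codebook; reverse_code_sketch() is a hypothetical helper, and whether the package uses exactly this formula is an assumption.

```r
# reverse_code_sketch() is hypothetical; it mirrors a score within [min, max].
reverse_code_sketch <- function(x, min_val, max_val) {
  (min_val + max_val) - x
}

# Toy codebook rows using the coding_direction, min, and max columns.
codebook <- data.frame(
  short_variable_name = c("pss_1", "pss_4"),
  coding_direction    = c(1, -1),
  min                 = c(0, 0),
  max                 = c(4, 4),
  stringsAsFactors    = FALSE
)
df <- data.frame(pss_1 = c(0, 3, 4), pss_4 = c(0, 3, NA))

# Reverse only the items whose coding_direction is -1
for (i in which(codebook$coding_direction == -1)) {
  col <- codebook$short_variable_name[i]
  df[[col]] <- reverse_code_sketch(df[[col]], codebook$min[i], codebook$max[i])
}
df
```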

Scoring (`survey_scoring_utils.R`)

var_score() -- Core scoring with prorating and missing data handling
score_surveys() -- Batch score all instruments with subscales
get_available_surveys() -- List scoreable instruments in codebook
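Prorating usually means estimating a total from the items that were answered, provided enough of them were. A minimal sketch of that idea follows; prorated_sum_sketch() and the 80% completeness cutoff are assumptions, not the behavior of var_score().

```r
# prorated_sum_sketch() is hypothetical; the 80% completeness cutoff is an assumption.
prorated_sum_sketch <- function(items, min_prop_answered = 0.8) {
  items      <- as.matrix(items)
  n_items    <- ncol(items)
  n_answered <- rowSums(!is.na(items))
  # Mean of the answered items, scaled back up to the full item count
  score <- rowMeans(items, na.rm = TRUE) * n_items
  # Too much missing data: leave the score missing rather than guess
  score[n_answered / n_items < min_prop_answered] <- NA_real_
  unname(score)
}

df <- data.frame(
  gad_1 = c(1, 2, NA), gad_2 = c(1, 2, NA), gad_3 = c(1, NA, NA),
  gad_4 = c(1, 2, 3),  gad_5 = c(1, 2, 3)
)
gad_total <- prorated_sum_sketch(df)
gad_total
```

Here the second respondent skipped one of five items and still receives a prorated total, while the third skipped three and gets NA.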

Codebook Management (`codebook_io.R`, `codebook_generate.R`)

build_codebook_skeleton() -- Build codebook skeleton from data column names
load_codebook() -- Load and validate codebook CSV
load_maximal_codebook() -- Load institutional knowledge base
generate_enhanced_codebook_template() -- Generate codebook from SurveyMonkey API
build_qualtrics_codebook_skeleton() -- Generate codebook from Qualtrics API
generate_qualtrics_codebook() -- Generate + enrich codebook from Qualtrics API
update_maximal_codebook() -- Add new items to knowledge base

SurveyMonkey API (`sm_auth.R`, `sm_api.R`, `sm_format.R`)

load_sm_token() -- Load OAuth token
process_survey_responses() -- Download and flatten survey data
flatten_survey_responses() -- Transform nested API responses to flat data frame

Qualtrics API (`qualtrics_utils.R`)

load_qualtrics_credentials() -- Load API credentials
get_qualtrics_survey_id() -- Look up survey ID by name
fetch_qualtrics_questions() -- Retrieve survey metadata

Workflow (`workflow_utils.R`)

process_survey_data() -- Full processing pipeline (accepts config object)
complete_processing_pipeline() -- Pipeline wrapper for params-based workflow
complete_survey_workflow() -- End-to-end: download, generate codebook, process
setup_new_survey_project() -- Scaffold a new study project directory

Dashboard (`dashboard_utils.R`, `inst/dashboard/`)

generate_dashboard() -- Render HTML quality dashboard
create_variable_summary() -- Export variable specs for dashboard

Visualization (`plot_utils.R`)

make_hist() -- Histograms with mean/SD annotations
plot_horizontal_bars() -- Horizontal bar charts with percentages
plot_waffle() -- Waffle charts for categorical distributions
plot_state_heatmap() -- US state choropleth maps

Platform Setup Guides

Qualtrics API Setup -- API key, base URL, credential file format
SurveyMonkey API Setup -- Developer app, OAuth token setup
