surveyUtils

An R package for survey research data processing. Works with exported CSVs from Qualtrics or SurveyMonkey -- no API required to get started. API integration is available for automated downloads and codebook generation.

What You Get

  • Faster project setup -- From hours to minutes with scaffolded project structure and starter templates
  • 📋 Consistent workflows -- Standardized processes across team members, with easy-reference config files
  • 📦 Works with CSV exports -- No API access required; just export from your survey platform and load
  • 🔄 End-to-end processing -- Complete pipeline from raw export to analysis-ready dataset
  • 📖 Intelligent codebook system -- Self-documenting data processing with institutional memory through the maximal codebook library
  • 📊 Standardized instrument scoring -- Automated scoring for validated scales with subscale support and reverse coding
  • Quality control built in -- Attention checks, straightlining detection, duration filtering, and response validation
  • 📈 HTML quality dashboard -- QC summaries, demographic plots, and scored scale distributions at a glance
  • 🔌 Multi-platform support -- Qualtrics and SurveyMonkey, with API integration for both
  • 🧩 Modular architecture -- Reusable functions following tidyverse conventions; use the full pipeline or individual steps
  • 🔐 Secure data handling -- Token-based authentication for API access, credentials stored outside the repo

Two Ways to Use It

Full pipeline -- one call, config-driven

Set up a config/parameters.R with your study settings, then process everything at once. Best for routine studies where the standard QC + scoring steps apply.

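library(surveyUtils)
library(readr)  # provides read_csv()
source("config/parameters.R")  # defines `params` via survey_config()

# Load the raw export and the codebook named in the config
df_raw   <- read_csv(file.path(params$raw_dir, params$raw_filename))
codebook <- load_codebook(params$codebook_path)

# Run QC, recoding, and scoring in one call, then render the HTML dashboard
result <- process_survey_data(df_raw, codebook, config = params)
generate_dashboard(params)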

Individual functions -- step by step

Call each function directly for full control. Best for custom analyses, exploratory work, or when you only need specific steps.

library(surveyUtils)
library(readr)  # provides read_csv() and write_csv()

df <- read_csv("data/raw/my_survey.csv")
codebook <- load_codebook("config/codebook.csv")

# QC: drop responses dated before min_date, then score the attention checks
df <- filter_dates(df, start_date_col = "StartDate", min_date = "2026-01-15")
attn_checks <- get_attention_checks(codebook)
df <- score_individual_attention_checks(df, attn_checks)
df <- calculate_attention_score(df, attn_checks)

# Recode text responses, convert to numeric, and score every available instrument
df <- convert_response_text_to_codes(df, codebook)
df <- convert_to_numeric_from_codebook(df, codebook)
df <- score_surveys(df, codebook, get_available_surveys(codebook))

write_csv(df, "data/processed/my_survey_scored.csv")

Both approaches use the same underlying functions and the same codebook format.


Getting Started

Install

# install.packages("devtools")
devtools::install_github("sashasomms/surveyUtils")

Scaffold a new project

library(surveyUtils)
setup_new_survey_project("My Study")

This creates a project directory with config/, data/raw/, data/processed/, data/results/, and starter scripts at the project root. Output directories for dashboards, QC reports, etc. are created automatically when the pipeline writes to them.

Configure your study

Copy config/parameters_template.R to config/parameters.R and customize. It calls survey_config() with your study-specific settings -- only specify what differs from the defaults:

# config/parameters.R
library(surveyUtils)

params <- survey_config(
  survey_name   = "My Study",
  platform      = "qualtrics",
  codebook_path = "config/codebook.csv",
  raw_filename  = "my_survey.csv",
  min_date      = "2026-01-15",
  duration_threshold_method = "absolute",
  duration_lower_threshold  = 3,
  attention_threshold       = 0.8,

  # Dashboard plot specs
  demo_hist_vars   = list(demo_age = "Age"),
  demo_waffle_vars = list(demo_gender = "Gender"),
  demo_bar_vars    = list(demo_race = list(label = "Race", sort_by_count = TRUE))
)

See ?survey_config for all available options and their defaults.
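
The "only specify what differs from the defaults" pattern can be pictured with base R's modifyList(). This is a minimal sketch of the idea, not how survey_config() is implemented; the helper make_config() and the default values shown are hypothetical.

```r
# Hypothetical defaults for illustration; the real ones are listed in ?survey_config.
default_settings <- list(
  platform                  = "qualtrics",
  duration_threshold_method = "absolute",
  attention_threshold       = 0.8
)

# Hypothetical helper: overlay user-supplied values on the defaults and
# reject names that are not known settings.
make_config <- function(...) {
  overrides <- list(...)
  unknown <- setdiff(names(overrides), names(default_settings))
  if (length(unknown) > 0) {
    stop("Unknown setting(s): ", paste(unknown, collapse = ", "))
  }
  modifyList(default_settings, overrides)
}

cfg <- make_config(attention_threshold = 0.9)
str(cfg)
```

Rejecting unknown names up front is one way a config constructor can catch typos before the pipeline runs.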

Run the example

A self-contained example project is bundled with synthetic data, a codebook, and a config file:

source(system.file("examples", "run_example.R", package = "surveyUtils"))

Codebook Generation

surveyUtils generates codebook skeletons through three enrichment tiers that stack together:

Tier 1: From your data (no API needed)

library(readr)  # provides read_csv() and write_csv()

df_raw <- read_csv("data/raw/my_survey.csv")
skeleton <- build_codebook_skeleton(df_raw)
write_csv(skeleton, "config/codebook.csv")
# Fill in: scale_abbrev, response_choices, response_coding, etc.
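
To give a feel for what a data-derived skeleton might contain, here is a rough base R sketch that builds one codebook row per data column and leaves the descriptive fields blank. The helper sketch_skeleton() and the toy data are hypothetical, and the actual output of build_codebook_skeleton() may include more columns.

```r
# Toy raw export; column names are made up for illustration.
df_raw <- data.frame(
  StartDate = "2026-01-20",
  Q1        = "Agree",
  Q2        = "Disagree",
  stringsAsFactors = FALSE
)

# Hypothetical helper: one codebook row per data column, with blank
# fields left for manual completion.
sketch_skeleton <- function(df) {
  data.frame(
    col_num          = seq_along(df),
    variable_name    = names(df),
    scale_abbrev     = NA_character_,
    response_choices = NA_character_,
    response_coding  = NA_character_,
    stringsAsFactors = FALSE
  )
}

skeleton <- sketch_skeleton(df_raw)
print(skeleton)
```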

Tier 2: From API metadata

With API access, surveyUtils pulls question text, response options, and coding directly from the platform:

# Qualtrics
load_qualtrics_credentials("../qualtrics_secrets.csv")
survey_id <- get_qualtrics_survey_id("My Study")
codebook <- build_qualtrics_codebook_skeleton(survey_id)

# SurveyMonkey
token <- load_sm_token("../.survey_monkey_secrets.csv")
codebook <- generate_enhanced_codebook_template(token, survey_name = "My Study",
                                                 output_path = "config/")

Tier 3: Maximal codebook matching

The maximal codebook is an institutional knowledge base of known survey instruments. surveyUtils matches your questions against this library using fuzzy text matching -- automatically filling in scale_abbrev, coding_direction, and scoring info for recognized instruments.

codebook <- generate_qualtrics_codebook(
  survey_id,
  survey_name           = "My Study",
  output_dir            = "config/",
  maximal_codebook_path = "path/to/maximal_codebook.csv"
)
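
The fuzzy matching idea can be sketched with base R's utils::adist(), which computes edit distances between strings. The README does not say which matching algorithm the package uses, so treat the normalized edit distance, the 0.2 cutoff, the helper match_scale(), and the two-row library below as assumptions for illustration only.

```r
# Hypothetical mini-library of known items (not the real maximal codebook).
library_items <- data.frame(
  question_text = c(
    "Little interest or pleasure in doing things",
    "Feeling nervous, anxious, or on edge"
  ),
  scale_abbrev = c("phq", "gad"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the scale of the closest library item, or NA
# when the normalized edit distance exceeds max_dist.
match_scale <- function(text, library_items, max_dist = 0.2) {
  d <- utils::adist(tolower(text), tolower(library_items$question_text))
  # Normalize by the longer string so the distance falls in [0, 1].
  denom  <- pmax(nchar(text), nchar(library_items$question_text))
  norm_d <- as.vector(d) / denom
  best   <- which.min(norm_d)
  if (norm_d[best] <= max_dist) library_items$scale_abbrev[best] else NA_character_
}

match_scale("Little interest or pleasure in doing things.", library_items)
match_scale("What is your favorite color?", library_items)
```

A near-identical question (here differing only by a trailing period) inherits the library's scale_abbrev, while an unrelated question is left unmatched for manual review.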

Tiers stack: API + maximal codebook gives the richest result. The 00_generate_codebook.R template handles all three paths. See Qualtrics API Setup and SurveyMonkey API Setup for credential configuration.


The Codebook System

The codebook is the central organizing principle -- it serves as both documentation (complete metadata for every variable) and processing configuration (machine-readable instructions for cleaning, recoding, and scoring).

Codebook Structure

| Column | Description |
| --- | --- |
| col_num | Column number for ordering |
| question_text | Full question text as presented to participants |
| variable_name | Variable name in the raw data |
| short_variable_name | Concise analysis-friendly variable name |
| cat | Variable category: demo, attention_check, survey_items, computed_scores |
| scale_abbrev | Short scale name (phq, gad, pss, etc.) |
| scale_full | Complete instrument name |
| subscale | Subscale designation for multi-factor instruments |
| coding_direction | 1 = forward, -1 = reverse |
| min / max | Valid response range |
| response_choices | Semicolon-separated response options |
| response_coding | Semicolon-separated numeric codes matching response_choices |
| response_format | Format for conversion (numeric, text, etc.) |
| correct_response | Correct answer for attention check items |
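
Because response_choices and response_coding are parallel semicolon-separated lists, they can be paired into a lookup table. The following base R sketch shows one way to do that; the Likert labels are made up, and this is not necessarily how convert_response_text_to_codes() works internally.

```r
# One codebook row's fields, using made-up Likert options.
response_choices <- "Not at all;Several days;More than half the days;Nearly every day"
response_coding  <- "0;1;2;3"

# Split both fields on semicolons and pair them up as a named lookup vector.
choices <- trimws(strsplit(response_choices, ";", fixed = TRUE)[[1]])
codes   <- as.numeric(strsplit(response_coding, ";", fixed = TRUE)[[1]])
stopifnot(length(choices) == length(codes))
lookup <- setNames(codes, choices)

# Map text responses to codes; text missing from the lookup becomes NA.
responses <- c("Several days", "Nearly every day", "Prefer not to say")
coded <- unname(lookup[responses])
coded
```

Checking that both lists have the same length catches a common codebook editing mistake, where an option is added without a matching code.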

Maximal Codebook Library

Each new study can contribute back to the library via update_maximal_codebook(), so matching improves over time.


Processing Pipeline

Whether called individually or via process_survey_data(), the standard pipeline is:

| Step | Function | What it does |
| --- | --- | --- |
| 1 | clean_names() | Standardize column names, strip HTML |
| 2 | apply_short_names() | Rename columns using codebook short names |
| 3 | process_double_headers() | Handle SurveyMonkey double-header format |
| 4 | create_multiselect_summary_cols() | Summarize multi-select questions |
| 5 | filter_dates() | Remove test/QA responses outside date range |
| 6 | process_survey_duration() | Calculate and flag completion times |
| 7 | process_attention_checks() | Score and flag inattentive responders |
| 8 | convert_response_text_to_codes() | Map text responses to numeric codes |
| 9 | convert_to_numeric_from_codebook() | Convert columns to numeric type |
| 10 | detect_straightlining() | Flag identical responses across scales |
| 11 | reverse_code_from_codebook() | Reverse-code items marked in codebook |
| 12 | score_surveys() | Score instruments with subscales and prorating |

QC steps (6, 7, 10) flag but don't remove rows by default. Removal is controlled by config settings (remove_attention_fails, remove_duration_outliers, remove_straightliners) and applied together after all flags are set.
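
The flag-first, remove-later design can be sketched in base R as follows. The flag column names are assumptions; only the three remove_* setting names come from the text above.

```r
# Toy data with QC flag columns; the column names are assumptions.
df <- data.frame(
  id              = 1:5,
  flag_attention  = c(FALSE, TRUE,  FALSE, FALSE, FALSE),
  flag_duration   = c(FALSE, FALSE, TRUE,  FALSE, FALSE),
  flag_straightln = c(FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Removal switches named after the config settings mentioned above.
settings <- list(
  remove_attention_fails   = TRUE,
  remove_duration_outliers = FALSE,
  remove_straightliners    = TRUE
)

# Build one combined drop mask after all flags exist, then filter once.
drop <- (settings$remove_attention_fails   & df$flag_attention) |
        (settings$remove_duration_outliers & df$flag_duration) |
        (settings$remove_straightliners    & df$flag_straightln)
df_clean <- df[!drop, ]
df_clean$id
```

Filtering once at the end means every QC flag is computed on the same set of rows, so the order of the QC steps does not change which respondents are flagged.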


Function Reference

Configuration (survey_config.R)

  • survey_config() -- Create a validated config object for the pipeline
  • is_survey_config() -- Check if an object is a survey_config

Column Naming (column_naming.R)

  • process_double_headers() -- Handle SurveyMonkey double-header format
  • apply_short_names() -- Rename columns using codebook mappings
  • create_multiselect_summary_cols() -- Create summary columns for multi-select questions

Quality Control

Date Filtering (qc_date_filters.R)

  • parse_date_columns() -- Auto-detect and parse date formats
  • filter_dates() -- Remove responses outside date range

Duration (qc_duration.R)

  • calculate_survey_duration() -- Compute completion time
  • filter_duration() -- Flag/remove duration outliers (mean+SD, percentile, or absolute)
  • process_survey_duration() -- Combined duration pipeline
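
As an illustration of the three threshold methods named above (mean+SD, percentile, absolute), the sketch below computes lower and upper cutoffs in base R. The helper duration_cutoffs(), its parameter names, and the use of minutes as the unit are assumptions, not the signature of filter_duration().

```r
# Completion times in minutes (made-up values).
duration_min <- c(2, 8, 9, 10, 11, 12, 13, 45)

# Hypothetical helper: return lower/upper cutoffs for each method.
duration_cutoffs <- function(x, method = c("absolute", "sd", "percentile"),
                             lower = 3, upper = Inf, n_sd = 2,
                             probs = c(0.05, 0.95)) {
  method <- match.arg(method)
  switch(method,
    absolute   = c(lower = lower, upper = upper),
    sd         = c(lower = mean(x) - n_sd * sd(x),
                   upper = mean(x) + n_sd * sd(x)),
    percentile = setNames(as.numeric(quantile(x, probs)), c("lower", "upper"))
  )
}

# Flag responses outside the absolute cutoffs (faster than 3 minutes here).
cutoffs <- duration_cutoffs(duration_min, method = "absolute", lower = 3)
flag_duration <- duration_min < cutoffs[["lower"]] | duration_min > cutoffs[["upper"]]
flag_duration
```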

Attention Checks (qc_attention_checks.R)

  • get_attention_checks() -- Extract attention check items from codebook
  • score_individual_attention_checks() -- Score each attention item
  • calculate_attention_score() -- Compute overall attention score
  • process_attention_checks() -- Combined attention check pipeline
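
A rough base R sketch of attention-check scoring: each item is compared with its correct answer, and the overall score is taken as the proportion of checks passed. Defining the score as a proportion is an assumption, as are the item names and answers; the 0.8 threshold mirrors the attention_threshold value in the config example.

```r
# Toy responses to two attention-check items (item names are made up).
df <- data.frame(
  attn_1 = c("Strongly agree", "Strongly agree", "Disagree"),
  attn_2 = c("Blue", "Red", "Red"),
  stringsAsFactors = FALSE
)

# Correct answers, as the codebook's correct_response column would supply.
correct <- c(attn_1 = "Strongly agree", attn_2 = "Blue")

# Score each item as TRUE/FALSE, then take the proportion passed per row.
passed <- sapply(names(correct), function(item) df[[item]] == correct[[item]])
attention_score <- rowMeans(passed)

# Flag respondents whose score falls below the threshold.
attention_fail <- attention_score < 0.8
data.frame(attention_score, attention_fail)
```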

Straightlining (qc_straightlining.R)

  • detect_straightlining() -- Flag identical responses within scales
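
One simple way to detect straightlining is to check whether every answered item in a scale has the same value. The sketch below does this in base R; the helper is_straightlined(), the min_items rule, and the item names are assumptions rather than the package's actual logic.

```r
# Toy item responses for one scale (column names are made up).
items <- data.frame(
  pss_1 = c(3, 2, 4),
  pss_2 = c(3, 1, 4),
  pss_3 = c(3, 4, NA),
  pss_4 = c(3, 2, 4)
)

# A row is flagged when every answered item has the same value and at least
# min_items were answered.
is_straightlined <- function(row, min_items = 3) {
  answered <- row[!is.na(row)]
  length(answered) >= min_items && length(unique(answered)) == 1
}

flag_straightline <- apply(as.matrix(items), 1, is_straightlined)
flag_straightline
```

Requiring a minimum number of answered items avoids flagging respondents who answered only one or two questions in the scale.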

Response Coding (response_coding.R)

  • convert_response_text_to_codes() -- Map text to numeric using codebook
  • convert_to_numeric_from_codebook() -- Type conversion with validation
  • reverse_code_from_codebook() -- Reverse-code items per codebook
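
Reverse coding on a bounded scale is conventionally computed as (min + max) - x. The sketch below applies that formula to items whose coding_direction is -1, using the codebook columns described earlier; the helper reverse_code() and the toy data are hypothetical.

```r
# Hypothetical helper: reverse-code x on a scale running from min_val to max_val.
reverse_code <- function(x, min_val, max_val) {
  (min_val + max_val) - x
}

# Toy codebook rows: coding_direction is 1 (forward) or -1 (reverse).
codebook <- data.frame(
  short_variable_name = c("pss_1", "pss_4"),
  coding_direction    = c(1, -1),
  min                 = c(0, 0),
  max                 = c(4, 4),
  stringsAsFactors = FALSE
)
df <- data.frame(pss_1 = c(0, 3, 4), pss_4 = c(0, 3, NA))

# Apply the reversal only to items marked -1; NA stays NA.
for (i in which(codebook$coding_direction == -1)) {
  v <- codebook$short_variable_name[i]
  df[[v]] <- reverse_code(df[[v]], codebook$min[i], codebook$max[i])
}
df
```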

Scoring (survey_scoring_utils.R)

  • var_score() -- Core scoring with prorating and missing data handling
  • score_surveys() -- Batch score all instruments with subscales
  • get_available_surveys() -- List scoreable instruments in codebook
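
Prorating usually means scaling the mean of the answered items up to the full item count, with a minimum share of items required. The base R sketch below shows that idea; the helper prorated_sum() and the 80% rule are assumptions and may differ from what var_score() does.

```r
# Hypothetical helper: sum score prorated for missing items.
# Rows with fewer than min_prop of the items answered get NA.
prorated_sum <- function(items, min_prop = 0.8) {
  items      <- as.matrix(items)
  n_items    <- ncol(items)
  n_answered <- rowSums(!is.na(items))
  score <- rowMeans(items, na.rm = TRUE) * n_items
  score[n_answered / n_items < min_prop] <- NA_real_
  score
}

# Toy item data: row 2 misses one item, row 3 misses three.
items <- data.frame(
  q1 = c(1, 2, NA),
  q2 = c(2, 2, NA),
  q3 = c(3, NA, 1),
  q4 = c(0, 2, NA),
  q5 = c(4, 2, 2)
)
scores <- prorated_sum(items)
scores
```

With complete data the prorated score equals the plain sum, so the same function can be used whether or not items are missing.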

Codebook Management (codebook_io.R, codebook_generate.R)

  • build_codebook_skeleton() -- Build codebook skeleton from data column names
  • load_codebook() -- Load and validate codebook CSV
  • load_maximal_codebook() -- Load institutional knowledge base
  • generate_enhanced_codebook_template() -- Generate codebook from SurveyMonkey API
  • build_qualtrics_codebook_skeleton() -- Generate codebook from Qualtrics API
  • generate_qualtrics_codebook() -- Generate + enrich codebook from Qualtrics API
  • update_maximal_codebook() -- Add new items to knowledge base

SurveyMonkey API (sm_auth.R, sm_api.R, sm_format.R)

  • load_sm_token() -- Load OAuth token
  • process_survey_responses() -- Download and flatten survey data
  • flatten_survey_responses() -- Transform nested API responses to flat data frame

Qualtrics API (qualtrics_utils.R)

  • load_qualtrics_credentials() -- Load API credentials
  • get_qualtrics_survey_id() -- Look up survey ID by name
  • fetch_qualtrics_questions() -- Retrieve survey metadata

Workflow (workflow_utils.R)

  • process_survey_data() -- Full processing pipeline (accepts config object)
  • complete_processing_pipeline() -- Pipeline wrapper for params-based workflow
  • complete_survey_workflow() -- End-to-end: download, generate codebook, process
  • setup_new_survey_project() -- Scaffold a new study project directory

Dashboard (dashboard_utils.R, inst/dashboard/)

  • generate_dashboard() -- Render HTML quality dashboard
  • create_variable_summary() -- Export variable specs for dashboard

Visualization (plot_utils.R)

  • make_hist() -- Histograms with mean/SD annotations
  • plot_horizontal_bars() -- Horizontal bar charts with percentages
  • plot_waffle() -- Waffle charts for categorical distributions
  • plot_state_heatmap() -- US state choropleth maps

Platform Setup Guides

License

MIT (LICENSE.md). The repository also contains a LICENSE file of unrecognized type.