
Comprehensive data analysis pipelines for experimental linguistic research, covering EEG, speech recordings, eye-tracking, and reaction time studies. These notebooks demonstrate reproducible workflows for processing and analyzing diverse experimental data types commonly used in phonetics, psycholinguistics, and cognitive science research.


chemvatho/XLinCoLab


Experimental Linguistics Analysis Pipelines

Python License: MIT Open In Colab


Author: Chem Vatho, PhD
Affiliation: University of Cologne, IfL – Phonetik
Contact: chemvatho@gmail.com


📋 Overview

This repository contains analysis pipelines developed for the Experimental Service Hub at the Data Center for the Humanities (DCH), University of Cologne. Each notebook provides a complete, reproducible workflow from data loading to statistical analysis and visualization.


Pipeline           | Data type       | Key methods
EEG Analysis       | EEG recordings  | ERP analysis, time-frequency decomposition
Speech Recording   | Audio files     | F0 extraction, formant analysis, voice quality
Eye-tracking       | Gaze data       | Fixation analysis, time course visualization
Reaction Time      | Behavioral data | RT distributions, mixed-effects modeling

🧠 EEG Analysis

Notebook: EEG_Analysis_Pipeline.ipynb

A comprehensive pipeline for processing electroencephalography (EEG) data in linguistic experiments, with a focus on event-related potentials (ERPs) relevant to language processing.

Features

  • Preprocessing: Bandpass filtering (0.1–30 Hz), baseline correction, artifact rejection
  • ERP Analysis: N400 and P600 component extraction for semantic/syntactic processing
  • Time-Frequency Analysis: Morlet wavelet decomposition for oscillatory dynamics
  • Statistical Analysis: t-tests, effect sizes (Cohen's d), condition comparisons
  • Visualization: Multi-channel ERP plots, topographic maps, TFR spectrograms
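The preprocessing steps above can be sketched with NumPy and SciPy alone. This is a minimal illustration under stated assumptions, not the notebook's code: it assumes continuous data as a (channels × samples) array in volts and epochs as (epochs × channels × samples), it uses a zero-phase Butterworth IIR filter (MNE's `raw.filter(l_freq=0.1, h_freq=30.0)` defaults to an FIR design instead), and the 100 µV peak-to-peak rejection threshold is a hypothetical choice. All function names are illustrative.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(data, sfreq, l_freq=0.1, h_freq=30.0, order=4):
    """Zero-phase Butterworth band-pass along the last (time) axis.

    Assumption: IIR filter in second-order sections; apply it to continuous
    data, not to short epochs, because the 0.1 Hz edge needs long signals.
    """
    sos = butter(order, [l_freq, h_freq], btype="bandpass", fs=sfreq, output="sos")
    return sosfiltfilt(sos, data, axis=-1)


def baseline_correct(epochs, times, tmin=-0.2, tmax=0.0):
    """Subtract each epoch's per-channel mean over the pre-stimulus window.

    epochs: (n_epochs, n_channels, n_times); times: (n_times,) in seconds.
    """
    mask = (times >= tmin) & (times <= tmax)
    return epochs - epochs[..., mask].mean(axis=-1, keepdims=True)


def reject_by_amplitude(epochs, threshold=100e-6):
    """Drop epochs whose peak-to-peak amplitude exceeds threshold (volts) on any channel.

    The 100 µV default is a hypothetical value, not taken from the notebook.
    """
    ptp = epochs.max(axis=-1) - epochs.min(axis=-1)  # shape (n_epochs, n_channels)
    keep = (ptp < threshold).all(axis=1)
    return epochs[keep], keep
```

Filtering before epoching avoids edge artifacts from the very low high-pass cutoff; baseline correction and rejection then operate on the epoched array.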

[Figure: ERP comparison]

Sample Output

Statistical Results at Electrode Pz
══════════════════════════════════════════════════════════
N400 Component (300-500 ms):
  Congruent: M = -2.15 µV (SD = 1.82)
  Incongruent: M = -5.43 µV (SD = 2.01)
  t = 8.234, p < .001, Cohen's d = 1.72
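A sketch of how statistics like these can be obtained from per-subject ERPs at a single electrode: average each subject's waveform over the 300–500 ms window, then compare conditions with a paired t-test across subjects. The README does not say which variant of Cohen's d is reported; this sketch assumes d_z (mean of the paired differences divided by their SD), for which t = d_z·√n holds exactly. A pooled-SD d computed from the condition means and SDs would give a different number. Function names are hypothetical.

```python
import numpy as np
from scipy import stats


def window_mean(erp, times, tmin=0.3, tmax=0.5):
    """Mean amplitude in [tmin, tmax] seconds for each row of erp (n_subjects, n_times)."""
    mask = (times >= tmin) & (times <= tmax)
    return erp[:, mask].mean(axis=1)


def paired_effect(cond_a, cond_b):
    """Paired t-test of cond_a vs cond_b plus Cohen's d_z.

    Assumption: d_z = mean(diff) / SD(diff), one of several Cohen's d variants.
    """
    diff = np.asarray(cond_a) - np.asarray(cond_b)
    res = stats.ttest_rel(cond_a, cond_b)
    d_z = diff.mean() / diff.std(ddof=1)
    return res.statistic, res.pvalue, d_z
```

Usage: pass one row per subject, e.g. `paired_effect(window_mean(congruent, times), window_mean(incongruent, times))`; a positive t then means the incongruent condition is more negative, as expected for an N400.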

Dependencies

mne, numpy, pandas, scipy, matplotlib, seaborn, scikit-learn

🎤 Speech Recording Analysis

Notebook: Speech_Recording_Analysis.ipynb


A complete acoustic phonetics pipeline using Praat (via Parselmouth) and librosa for analyzing speech recordings.

Features

  • F0 Extraction: Multiple algorithms (Praat autocorrelation, pYIN, CREPE)
  • Formant Analysis: F1–F4 tracking with bandwidth measurements
  • Voice Quality: Jitter, shimmer, harmonics-to-noise ratio (HNR)
  • Visualization: Spectrograms, formant tracks, F1-F2 vowel space plots
  • Batch Processing: Automated pipeline for multiple audio files
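To illustrate the idea behind autocorrelation-based F0 extraction, here is a deliberately simplified single-frame estimator: it picks the lag with the highest autocorrelation between sr/fmax and sr/fmin samples and converts it to Hz. It is not Praat's algorithm, which also windows the frame, corrects for the window's own autocorrelation, interpolates peaks and tracks candidates across frames; in the pipeline that work is done by Parselmouth (for example `Sound.to_pitch()`) or `librosa.pyin`. The 75–500 Hz search range is an assumption.

```python
import numpy as np


def autocorr_f0(frame, sr, fmin=75.0, fmax=500.0):
    """Estimate F0 (Hz) of one frame from its highest autocorrelation peak.

    Simplified sketch: no windowing, interpolation or voicing decision beyond
    a silence check. The frame should span at least two periods of fmin.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[frame.size - 1:]  # lags 0..N-1
    if ac[0] <= 0:
        return np.nan  # silent frame: no energy, no pitch
    ac = ac / ac[0]
    lag_min = int(np.floor(sr / fmax))
    lag_max = int(np.ceil(sr / fmin))
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag
```

Because the unnormalized sum shrinks with lag, the first period peak usually wins over its multiples, which limits (but does not eliminate) octave errors.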


Sample Output

Voice Quality Measures:
══════════════════════════════════════════════════════════
Jitter (local): 0.892%
Shimmer (local): 3.241%
HNR: 18.45 dB

Formant Statistics:
  F1: M = 512 Hz (SD = 89)
  F2: M = 1543 Hz (SD = 234)
  F3: M = 2651 Hz (SD = 187)
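The voice-quality percentages follow Praat's "local" definitions: jitter (local) is the mean absolute difference between consecutive periods divided by the mean period, and shimmer (local) is the same ratio computed on per-period peak amplitudes. HNR can be derived from the normalized autocorrelation r at the pitch period as 10·log10(r / (1 − r)). The sketch below assumes the period and amplitude sequences were already extracted (for example from a Praat PointProcess); unlike Praat's "Get jitter (local)" command, it applies no shortest-period, longest-period or maximum-period-factor filtering.

```python
import numpy as np


def jitter_local(periods):
    """Jitter (local) in %: mean |T[i+1] - T[i]| divided by mean T."""
    p = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(p))) / np.mean(p)


def shimmer_local(amplitudes):
    """Shimmer (local) in %: mean |A[i+1] - A[i]| divided by mean A."""
    a = np.asarray(amplitudes, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(a))) / np.mean(a)


def hnr_db(r):
    """HNR in dB from the normalized autocorrelation r (0 < r < 1) at the pitch period."""
    return 10.0 * np.log10(r / (1.0 - r))
```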

Dependencies

parselmouth, librosa, numpy, pandas, scipy, matplotlib, seaborn, soundfile

πŸ‘οΈ Eye-tracking Analysis

Notebook: Peekbank_Eyetracking_Analysis.ipynb

Analysis pipeline for visual world paradigm eye-tracking experiments using the Peekbank database framework.

Features

  • Data Processing: Fixation extraction, area-of-interest (AOI) mapping
  • Time Course Analysis: Proportion of looks over time
  • Growth Curve Analysis: Polynomial and GAM modeling of looking behavior
  • Statistical Analysis: Cluster-based permutation tests, bootstrapped CIs
  • Visualization: Time course plots, heatmaps, individual differences
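A minimal pandas sketch of the time-course step: bin gaze samples, compute the proportion of target looks per trial and bin, then average over trials within each subject and over subjects. The column names (`subject`, `trial`, `t_ms`, `aoi`) and the 50 ms bin width are assumptions for illustration; adapt them to the actual Peekbank export. Samples labelled anything other than target or distractor (off-screen, missing) are dropped from the denominator, which is one common choice rather than the only one.

```python
import pandas as pd


def proportion_looks(samples, bin_ms=50):
    """Proportion of target looks per time bin (mean and SEM across subjects).

    samples: one row per gaze sample with hypothetical columns
    subject, trial, t_ms (time from target onset) and aoi.
    """
    # Keep only looks to the two pictures; other AOI codes are excluded.
    df = samples[samples["aoi"].isin(["target", "distractor"])].copy()
    df["bin"] = (df["t_ms"] // bin_ms) * bin_ms
    df["is_target"] = (df["aoi"] == "target").astype(float)
    # Average within trial, then within subject, so every subject weighs equally.
    by_trial = df.groupby(["subject", "trial", "bin"], as_index=False)["is_target"].mean()
    by_subj = by_trial.groupby(["subject", "bin"], as_index=False)["is_target"].mean()
    return by_subj.groupby("bin")["is_target"].agg(["mean", "sem"]).reset_index()
```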

Repository

See the dedicated repository: github.com/chemvatho/Peekbank-Analysis


⏱️ Reaction Time Analysis

Notebook: Reaction_Time_Analysis.ipynb


A psycholinguistics-focused pipeline for analyzing reaction time data from lexical decision and similar paradigms.

Features

  • Preprocessing: Outlier removal (absolute bounds + SD-based), accuracy filtering
  • Distribution Analysis: RT histograms, Q-Q plots, log transformation assessment
  • Effect Analysis: Lexicality, word frequency, semantic priming effects
  • ANOVA: Repeated measures with pairwise comparisons (Bonferroni correction)
  • Mixed-Effects Models: Linear mixed models with random slopes for subjects/items
  • Visualization: Interaction plots, by-subject distributions, effect size plots
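The two-stage outlier procedure can be sketched as follows. The cut-offs (200–2000 ms absolute bounds, 2.5 SD around each subject's mean) and the column names (`subject`, `rt`, `correct`) are hypothetical placeholders, since the README does not give the notebook's values. The SD criterion is computed after the accuracy filter and absolute trim, so extreme values do not inflate it.

```python
import pandas as pd


def trim_rts(df, lower=200, upper=2000, sd_cut=2.5):
    """Keep correct trials within absolute bounds, then drop RTs beyond
    sd_cut SDs of each subject's mean (all defaults are hypothetical)."""
    out = df[(df["correct"] == 1) & df["rt"].between(lower, upper)].copy()
    grp = out.groupby("subject")["rt"]
    z = (out["rt"] - grp.transform("mean")) / grp.transform("std")
    # Subjects with a single remaining trial have an undefined SD (NaN z) and are dropped.
    return out[z.abs() <= sd_cut]
```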


Sample Output

Repeated Measures ANOVA (Frequency × Priming):
══════════════════════════════════════════════════════════════════════
Source          SS        DF       MS         F       p-unc    η²
──────────────────────────────────────────────────────────────────────
frequency    125432.1      1   125432.1   89.234    <.001   0.124
priming       45621.8      1    45621.8   32.451    <.001   0.045
freq*prime     2341.2      1     2341.2    1.665     .207   0.002
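In a 2 × 2 within-subjects design like this one, every effect has one numerator degree of freedom, so each repeated-measures F equals the squared one-sample t of the corresponding per-subject contrast, with df = (1, n − 1). The sketch below uses that identity as an independent cross-check of ANOVA output; it assumes one cell mean per subject and condition, stored in an array of shape (n_subjects, 2, 2). It reports partial eta squared, F / (F + df_error); the η² column above may be a different effect size (for example generalized eta squared), so the values need not match.

```python
import numpy as np
from scipy import stats


def rm_anova_2x2(y):
    """2x2 repeated-measures ANOVA via per-subject contrasts (F = t**2).

    y: array (n_subjects, 2, 2); axis 1 = frequency level, axis 2 = priming level.
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    contrasts = {
        "frequency": y[:, 0, :].mean(axis=1) - y[:, 1, :].mean(axis=1),
        "priming": y[:, :, 0].mean(axis=1) - y[:, :, 1].mean(axis=1),
        "freq*prime": (y[:, 0, 0] - y[:, 0, 1]) - (y[:, 1, 0] - y[:, 1, 1]),
    }
    df_err = n - 1
    results = {}
    for name, c in contrasts.items():
        res = stats.ttest_1samp(c, 0.0)
        F = res.statistic ** 2
        results[name] = {"F": F, "df": (1, df_err), "p": res.pvalue,
                         "eta_p2": F / (F + df_err)}
    return results
```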

Dependencies

pandas, numpy, scipy, statsmodels, pingouin, matplotlib, seaborn


🚀 Quick Start


Option 1: Google Colab (Recommended)

Click the "Open in Colab" badge on any notebook to run it directly in your browser; no installation is required.

Option 2: Local Installation

# Clone the repository
git clone https://github.com/chemvatho/experimental-linguistics-pipelines.git
cd experimental-linguistics-pipelines

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Launch Jupyter
jupyter notebook

Requirements File

# requirements.txt
numpy>=1.21.0
pandas>=1.3.0
scipy>=1.7.0
matplotlib>=3.4.0
seaborn>=0.11.0
statsmodels>=0.13.0
pingouin>=0.5.0
mne>=1.0.0
librosa>=0.9.0
praat-parselmouth>=0.4.0   # PyPI name of Parselmouth (imported as parselmouth)
scikit-learn>=1.0.0
soundfile>=0.10.0

📊 Datasets

These pipelines are designed to work with the following open datasets:

Data type     | Dataset                 | Source
EEG           | EEG Dataset             | Kaggle
EEG           | ZuCo (reading EEG)      | OSF
Speech        | Speech Accent Archive   | Kaggle
Speech        | LJSpeech                | Kaggle
Eye-tracking  | Peekbank                | peekbank.stanford.edu
Reaction time | British Lexicon Project | crr.ugent.be
Reaction time | MALD Database           | Springer

πŸ“ Repository Structure

experimental-linguistics-pipelines/
β”‚
β”œβ”€β”€ README.md                          # This file
β”œβ”€β”€ requirements.txt                   # Python dependencies
β”œβ”€β”€ LICENSE                            # MIT License
β”‚
β”œβ”€β”€ notebooks/
β”‚   β”œβ”€β”€ EEG_Analysis_Pipeline.ipynb
β”‚   β”œβ”€β”€ Speech_Recording_Analysis.ipynb
β”‚   └── Reaction_Time_Analysis.ipynb
β”‚
β”œβ”€β”€ data/                              # Sample data (gitignored for large files)
β”‚   └── .gitkeep
β”‚
β”œβ”€β”€ outputs/                           # Analysis outputs
β”‚   β”œβ”€β”€ figures/
β”‚   └── results/
β”‚
└── utils/                             # Helper functions
    β”œβ”€β”€ __init__.py
    β”œβ”€β”€ preprocessing.py
    └── visualization.py

📚 Related Publications

  • Chem, Vatho. (in prep.). Adapting forced alignment for Khmer, a low-resource language.
  • Chem, Vatho. (in prep.). The Illustration of IPA: Khmer (Phnom Penh Dialect). Journal of the IPA.
  • Chem, Vatho. (2020). Khmer Vowel System: Structure and Variation. CJBAR, 2(2).

🔗 Related Resources


🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.


📄 License

This project is licensed under the MIT License; see the LICENSE file for details.


πŸ™ Acknowledgments

  • Supervisors: Prof. Dr. Martine Grice & PD Dr. Constantijn Kaland (University of Cologne)
  • Mentors: Prof. Dr. Reinhold Greisbach & T. Mark Ellison (University of Cologne)
  • Funding: KAAD (Katholischer Akademischer Ausländer-Dienst)
  • Tools: Praat, MNE-Python, librosa

πŸ™ AI Assisted tools

The repository was developed with the assistance of Claude (Anthropic), which supported logical reasoning and algorithm design. Its integration with NotebookLM led to the creation of a pilot project, which is documented on GitHub.


Developed for the Experimental Service Hub, Data Center for the Humanities (DCH)
University of Cologne, Germany
