I am testing out the Samplics library by comparing its output to that of R's survey and srvyr packages. I'm finding very slight variations in p-values between Samplics and R when running chi-squared tests, and I'd love your help in figuring out the cause of these variations.
Here's my initial data import code:
from samplics.estimation import TaylorEstimator
from samplics.utils.types import PopParam
from samplics.categorical import CrossTabulation
import pandas as pd
import numpy as np
# Reading in dataset
df_car_survey = pd.read_csv(
'https://raw.githubusercontent.com/ifstudies/carsurveydata/\
refs/heads/main/car_survey.csv')
df_car_survey['Enjoy_Driving_Fast'] = df_car_survey[
'Enjoy_Driving_Fast'].str.replace(' ','_').copy()
print(df_car_survey.head())
Output:
Car_Color Weight Enjoy_Driving_Fast Count Response_Sort_Map
0 Red 1.975884 Strongly_Agree 1 0
1 Red 0.943725 Strongly_Agree 1 0
2 Red 1.342593 Strongly_Agree 1 0
3 Red 1.704274 Strongly_Agree 1 0
4 Red 0.348622 Strongly_Agree 1 0
Here is the code I used to estimate proportions and run a chi-squared test within Samplics:
# Creating filtered copy for use within Samplics
df_car_survey_filtered = df_car_survey.query(
"Car_Color in ['Red', 'Black']").copy()
print("Samplics proportion estimates:")
df_survey_estimator = TaylorEstimator(param=PopParam.prop)
df_survey_estimator.estimate(
y=df_car_survey_filtered["Enjoy_Driving_Fast"],
domain=df_car_survey_filtered["Car_Color"],
samp_weight=df_car_survey_filtered['Weight'],
remove_nan=True
)
print(df_survey_estimator.to_dataframe())
print("\nSamplics chi squared test:")
rb_chisq_initial = CrossTabulation(param=PopParam.prop)
rb_chisq_initial.tabulate(
vars=df_car_survey_filtered[["Car_Color", "Enjoy_Driving_Fast"]],
samp_weight=df_car_survey_filtered['Weight'],
remove_nan=True
)
print(rb_chisq_initial)
Samplics output:
Samplics proportion estimates:
_param _domain _level _estimate _stderror _lci \
0 PopParam.prop Black Agree 0.238712 0.024280 0.194339
1 PopParam.prop Black Disagree 0.038357 0.010993 0.021736
2 PopParam.prop Black Slightly_Agree 0.084976 0.015647 0.058874
3 PopParam.prop Black Slightly_Disagree 0.049802 0.012735 0.029975
4 PopParam.prop Black Strongly_Agree 0.475709 0.028543 0.420207
5 PopParam.prop Black Strongly_Disagree 0.112444 0.018153 0.081427
6 PopParam.prop Red Agree 0.190626 0.026221 0.144352
7 PopParam.prop Red Disagree 0.029421 0.011059 0.013973
8 PopParam.prop Red Slightly_Agree 0.128908 0.022078 0.091400
9 PopParam.prop Red Slightly_Disagree 0.040227 0.012731 0.021466
10 PopParam.prop Red Strongly_Agree 0.481333 0.032954 0.417302
11 PopParam.prop Red Strongly_Disagree 0.129485 0.021744 0.092436
_uci _cv
0 0.289574 0.101711
1 0.066820 0.286600
2 0.121160 0.184140
3 0.081639 0.255706
4 0.531818 0.060001
5 0.153305 0.161438
6 0.247444 0.137553
7 0.060892 0.375888
8 0.178780 0.171270
9 0.074140 0.316491
10 0.545983 0.068465
11 0.178463 0.167926
Samplics chi squared test:
Cross-tabulation of Car_Color and Enjoy_Driving_Fast
Number of strata: 1
Number of PSUs: 718
Number of observations: 718
Degrees of freedom: 717.00
Car_Color Enjoy_Driving_Fast PopParam.prop stderror lower_ci upper_ci
Black Agree 0.135314 0.014665 0.109015 0.166771
Black Disagree 0.021743 0.006286 0.012289 0.038188
Black Slightly_Agree 0.048169 0.009040 0.033221 0.069361
Black Slightly_Disagree 0.028230 0.007305 0.016931 0.046712
Black Strongly_Agree 0.269657 0.019140 0.233767 0.308836
Black Strongly_Disagree 0.063739 0.010575 0.045870 0.087929
Red Agree 0.082569 0.012114 0.061692 0.109685
Red Disagree 0.012744 0.004830 0.006038 0.026697
Red Slightly_Agree 0.055836 0.009952 0.039222 0.078910
Red Slightly_Disagree 0.017424 0.005577 0.009269 0.032520
Red Strongly_Agree 0.208488 0.017628 0.175985 0.245208
Red Strongly_Disagree 0.056086 0.009785 0.039695 0.078692
Pearson (with Rao-Scott adjustment):
Unadjusted - chi2(5): 6.3434 with p-value of 0.2742
Adjusted - F(4.99, 3579.82): 0.9571 with p-value of 0.4427
Likelihood ratio (with Rao-Scott adjustment):
Unadjusted - chi2(5): 6.3364 with p-value of 0.2748
Adjusted - F(4.99, 3579.82): 0.9561 with p-value of 0.4434
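As a sanity check on this output (and to confirm I'm reading it correctly): the CrossTabulation table appears to report overall cell proportions, which sum to 1 across all 12 cells, whereas TaylorEstimator with domain= reports within-color proportions. Dividing each cell proportion by its color's total recovers the estimator's figures, and the "Unadjusted" p-value is an ordinary chi-square tail probability. Here's a quick sketch verifying both from the printed numbers (assuming scipy is available; the inputs are the rounded values printed above, so agreement is only approximate):

```python
from scipy.stats import chi2

# Cell proportions for Black, copied from the CrossTabulation output above
black_cells = [0.135314, 0.021743, 0.048169, 0.028230, 0.269657, 0.063739]
black_total = sum(black_cells)  # share of the filtered sample that is Black

# Dividing the Black "Agree" cell by the Black total should recover the
# within-domain proportion that TaylorEstimator reported (0.238712)
print(black_cells[0] / black_total)

# The "Unadjusted" p-value should be the chi2(5) tail probability of the
# statistic 6.3434; compare with the reported 0.2742
print(chi2.sf(6.3434, df=5))
```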
Meanwhile, here are my tests within R. (This code was executed within the same Python notebook using rpy2's %%R and %R magics.)
Initializing R and loading survey dataset:
%load_ext rpy2.ipython
%R -i df_car_survey # Imports DataFrame into R
Here is my R code for estimating proportions and running a chi-squared test. (This code is based on Chapters 5 and 6 of Exploring Complex Survey Data Analysis Using R.)
%%R
library(survey)
library(srvyr)
print("R proportion estimates:")
# Creating a survey design object:
df_des <- df_car_survey %>% as_survey_design(weights=Weight)
# Filtering this survey design object:
df_des_filtered <- df_des %>% filter(Car_Color %in% c('Red', 'Black'))
# df_des_filtered %>% group_by(Car_Color, Enjoy_Driving_Fast) %>%
# summarize(row_count = n())
df_proportions <- df_des_filtered %>% group_by(
Car_Color, Enjoy_Driving_Fast) %>%
summarize(proportion = survey_prop(
vartype=c('ci', 'var', 'cv')))
print(df_proportions)
print("R chi squared test:")
df_chi2 <- df_des_filtered %>% svychisq(
formula = ~ Car_Color + Enjoy_Driving_Fast,
design = .,
statistic = "Chisq",
na.rm = TRUE)
print(df_chi2)
R Output:
[1] "R proportion estimates:"
# A tibble: 12 × 7
# Groups: Car_Color [2]
Car_Color Enjoy_Driving_Fast proportion proportion_low proportion_upp
<chr> <chr> <dbl> <dbl> <dbl>
1 Black Agree 0.239 0.194 0.290
2 Black Disagree 0.0384 0.0217 0.0668
3 Black Slightly_Agree 0.0850 0.0589 0.121
4 Black Slightly_Disagree 0.0498 0.0300 0.0816
5 Black Strongly_Agree 0.476 0.420 0.532
6 Black Strongly_Disagree 0.112 0.0814 0.153
7 Red Agree 0.191 0.144 0.247
8 Red Disagree 0.0294 0.0140 0.0609
9 Red Slightly_Agree 0.129 0.0914 0.179
10 Red Slightly_Disagree 0.0402 0.0215 0.0741
11 Red Strongly_Agree 0.481 0.417 0.546
12 Red Strongly_Disagree 0.129 0.0924 0.178
# ℹ 2 more variables: proportion_var <dbl>, proportion_cv <dbl>
[1] "R chi squared test:"
Pearson's X^2: Rao & Scott adjustment
data: NextMethod()
X-squared = 6.3434, df = 5, p-value = 0.4464
The p-value reported by R for the Pearson chi-squared test (0.4464) differs slightly from the equivalent Samplics p-value (0.4427), even though both R and Samplics report the same chi-squared statistic (6.3434). Do you know why this might be the case? I know the difference is slight, but depending on the underlying p-value and confidence level, a discrepancy like this could mean the difference between a significant and non-significant result.
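One thing I tried in order to narrow down where each p-value comes from: Samplics prints its Rao-Scott adjusted result as an F statistic with F(4.99, 3579.82) degrees of freedom, which suggests it evaluates the adjusted statistic against an F reference distribution, while R's svychisq with statistic = "Chisq" seems to stay on the chi-square scale. This sketch (assuming scipy, and using the rounded figures Samplics printed, so the match is only approximate) shows that the F tail probability lands near Samplics' 0.4427 rather than R's 0.4464, which makes me suspect the discrepancy is in the reference distribution applied to the adjusted statistic rather than in the statistic itself:

```python
from scipy.stats import f

# Adjusted F statistic and degrees of freedom as printed by Samplics;
# both are rounded, so this only approximately reproduces its p-value
f_stat, df1, df2 = 0.9571, 4.99, 3579.82

p_from_f = f.sf(f_stat, df1, df2)
print(p_from_f)  # close to Samplics' adjusted p-value of 0.4427
```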