I am testing out the Samplics library by comparing its output to that of R's survey and srvyr packages. I'm finding very slight variations in p-values between Samplics and R when running chi-squared tests, and I'd love your help in figuring out the cause of these variations.
Here's my initial data import code:
from samplics.estimation import TaylorEstimator
from samplics.utils.types import PopParam
from samplics.categorical import CrossTabulation
import pandas as pd
import numpy as np
# Reading in dataset
df_car_survey = pd.read_csv(
'https://raw.githubusercontent.com/ifstudies/carsurveydata/\
refs/heads/main/car_survey.csv')
df_car_survey['Enjoy_Driving_Fast'] = df_car_survey[
'Enjoy_Driving_Fast'].str.replace(' ','_').copy()
print(df_car_survey.head())
Output:
Car_Color Weight Enjoy_Driving_Fast Count Response_Sort_Map
0 Red 1.975884 Strongly_Agree 1 0
1 Red 0.943725 Strongly_Agree 1 0
2 Red 1.342593 Strongly_Agree 1 0
3 Red 1.704274 Strongly_Agree 1 0
4 Red 0.348622 Strongly_Agree 1 0
Here is the code I used to estimate proportions and run a chi-squared test within Samplics:
# Creating filtered copy for use within Samplics
df_car_survey_filtered = df_car_survey.query(
"Car_Color in ['Red', 'Black']").copy()
print("Samplics proportion estimates:")
df_survey_estimator = TaylorEstimator(param=PopParam.prop)
df_survey_estimator.estimate(
y=df_car_survey_filtered["Enjoy_Driving_Fast"],
domain=df_car_survey_filtered["Car_Color"],
samp_weight=df_car_survey_filtered['Weight'],
remove_nan=True
)
print(df_survey_estimator.to_dataframe())
print("\nSamplics chi squared test:")
rb_chisq_initial = CrossTabulation(param=PopParam.prop)
rb_chisq_initial.tabulate(
vars=df_car_survey_filtered[["Car_Color", "Enjoy_Driving_Fast"]],
samp_weight=df_car_survey_filtered['Weight'],
remove_nan=True
)
print(rb_chisq_initial)
Samplics output:
Samplics proportion estimates:
_param _domain _level _estimate _stderror _lci \
0 PopParam.prop Black Agree 0.238712 0.024280 0.194339
1 PopParam.prop Black Disagree 0.038357 0.010993 0.021736
2 PopParam.prop Black Slightly_Agree 0.084976 0.015647 0.058874
3 PopParam.prop Black Slightly_Disagree 0.049802 0.012735 0.029975
4 PopParam.prop Black Strongly_Agree 0.475709 0.028543 0.420207
5 PopParam.prop Black Strongly_Disagree 0.112444 0.018153 0.081427
6 PopParam.prop Red Agree 0.190626 0.026221 0.144352
7 PopParam.prop Red Disagree 0.029421 0.011059 0.013973
8 PopParam.prop Red Slightly_Agree 0.128908 0.022078 0.091400
9 PopParam.prop Red Slightly_Disagree 0.040227 0.012731 0.021466
10 PopParam.prop Red Strongly_Agree 0.481333 0.032954 0.417302
11 PopParam.prop Red Strongly_Disagree 0.129485 0.021744 0.092436
_uci _cv
0 0.289574 0.101711
1 0.066820 0.286600
2 0.121160 0.184140
3 0.081639 0.255706
4 0.531818 0.060001
5 0.153305 0.161438
6 0.247444 0.137553
7 0.060892 0.375888
8 0.178780 0.171270
9 0.074140 0.316491
10 0.545983 0.068465
11 0.178463 0.167926
Samplics chi squared test:
Cross-tabulation of Car_Color and Enjoy_Driving_Fast
Number of strata: 1
Number of PSUs: 718
Number of observations: 718
Degrees of freedom: 717.00
Car_Color Enjoy_Driving_Fast PopParam.prop stderror lower_ci upper_ci
Black Agree 0.135314 0.014665 0.109015 0.166771
Black Disagree 0.021743 0.006286 0.012289 0.038188
Black Slightly_Agree 0.048169 0.009040 0.033221 0.069361
Black Slightly_Disagree 0.028230 0.007305 0.016931 0.046712
Black Strongly_Agree 0.269657 0.019140 0.233767 0.308836
Black Strongly_Disagree 0.063739 0.010575 0.045870 0.087929
Red Agree 0.082569 0.012114 0.061692 0.109685
Red Disagree 0.012744 0.004830 0.006038 0.026697
Red Slightly_Agree 0.055836 0.009952 0.039222 0.078910
Red Slightly_Disagree 0.017424 0.005577 0.009269 0.032520
Red Strongly_Agree 0.208488 0.017628 0.175985 0.245208
Red Strongly_Disagree 0.056086 0.009785 0.039695 0.078692
Pearson (with Rao-Scott adjustment):
Unadjusted - chi2(5): 6.3434 with p-value of 0.2742
Adjusted - F(4.99, 3579.82): 0.9571 with p-value of 0.4427
Likelihood ratio (with Rao-Scott adjustment):
Unadjusted - chi2(5): 6.3364 with p-value of 0.2748
Adjusted - F(4.99, 3579.82): 0.9561 with p-value of 0.4434
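As a sanity check on this output (and to confirm I'm reading it correctly): the CrossTabulation table appears to report overall cell proportions, which sum to 1 across all 12 cells, whereas TaylorEstimator with domain= reports within-color proportions. Dividing each cell proportion by its color's total recovers the estimator's figures, and the "Unadjusted" p-value is an ordinary chi-square tail probability. Here's a quick sketch verifying both from the printed numbers (assuming scipy is available; the inputs are the rounded values printed above, so agreement is only approximate):

```python
from scipy.stats import chi2

# Cell proportions for Black, copied from the CrossTabulation output above
black_cells = [0.135314, 0.021743, 0.048169, 0.028230, 0.269657, 0.063739]
black_total = sum(black_cells)  # share of the filtered sample that is Black

# Dividing the Black "Agree" cell by the Black total should recover the
# within-domain proportion that TaylorEstimator reported (0.238712)
print(black_cells[0] / black_total)

# The "Unadjusted" p-value should be the chi2(5) tail probability of the
# statistic 6.3434; compare with the reported 0.2742
print(chi2.sf(6.3434, df=5))
```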
Meanwhile, here are my tests within R. (This code was executed within the same Python notebook using rpy2's %%R and %R magics.)
Initializing R and loading survey dataset:
%load_ext rpy2.ipython
%R -i df_car_survey # Imports DataFrame into R
Here is my R code for estimating proportions and running a chi-squared test. (This code is based on Chapters 5 and 6 of Exploring Complex Survey Data Analysis Using R.)
%%R
library(survey)
library(srvyr)
print("R proportion estimates:")
# Creating a survey design object:
df_des <- df_car_survey %>% as_survey_design(weights=Weight)
# Filtering this survey design object:
df_des_filtered <- df_des %>% filter(Car_Color %in% c('Red', 'Black'))
# df_des_filtered %>% group_by(Car_Color, Enjoy_Driving_Fast) %>%
# summarize(row_count = n())
df_proportions <- df_des_filtered %>% group_by(
Car_Color, Enjoy_Driving_Fast) %>%
summarize(proportion = survey_prop(
vartype=c('ci', 'var', 'cv')))
print(df_proportions)
print("R chi squared test:")
df_chi2 <- df_des_filtered %>% svychisq(
formula = ~ Car_Color + Enjoy_Driving_Fast,
design = .,
statistic = "Chisq",
na.rm = TRUE)
print(df_chi2)
R Output:
[1] "R proportion estimates:"
# A tibble: 12 × 7
# Groups: Car_Color [2]
Car_Color Enjoy_Driving_Fast proportion proportion_low proportion_upp
<chr> <chr> <dbl> <dbl> <dbl>
1 Black Agree 0.239 0.194 0.290
2 Black Disagree 0.0384 0.0217 0.0668
3 Black Slightly_Agree 0.0850 0.0589 0.121
4 Black Slightly_Disagree 0.0498 0.0300 0.0816
5 Black Strongly_Agree 0.476 0.420 0.532
6 Black Strongly_Disagree 0.112 0.0814 0.153
7 Red Agree 0.191 0.144 0.247
8 Red Disagree 0.0294 0.0140 0.0609
9 Red Slightly_Agree 0.129 0.0914 0.179
10 Red Slightly_Disagree 0.0402 0.0215 0.0741
11 Red Strongly_Agree 0.481 0.417 0.546
12 Red Strongly_Disagree 0.129 0.0924 0.178
# ℹ 2 more variables: proportion_var <dbl>, proportion_cv <dbl>
[1] "R chi squared test:"
Pearson's X^2: Rao & Scott adjustment
data: NextMethod()
X-squared = 6.3434, df = 5, p-value = 0.4464
The p-value reported by R for the Pearson chi-squared test (0.4464) differs slightly from the equivalent Samplics p-value (0.4427), even though both R and Samplics report the same chi-squared statistic (6.3434). Do you know why this might be the case? I know the difference is slight, but depending on the underlying p-value and confidence level, a discrepancy like this could mean the difference between a significant and non-significant result.
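One thing I tried in order to narrow down where each p-value comes from: Samplics prints its Rao-Scott adjusted result as an F statistic with F(4.99, 3579.82) degrees of freedom, which suggests it evaluates the adjusted statistic against an F reference distribution, while R's svychisq with statistic = "Chisq" seems to stay on the chi-square scale. This sketch (assuming scipy, and using the rounded figures Samplics printed, so the match is only approximate) shows that the F tail probability lands near Samplics' 0.4427 rather than R's 0.4464, which makes me suspect the discrepancy is in the reference distribution applied to the adjusted statistic rather than in the statistic itself:

```python
from scipy.stats import f

# Adjusted F statistic and degrees of freedom as printed by Samplics;
# both are rounded, so this only approximately reproduces its p-value
f_stat, df1, df2 = 0.9571, 4.99, 3579.82

p_from_f = f.sf(f_stat, df1, df2)
print(p_from_f)  # close to Samplics' adjusted p-value of 0.4427
```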