contrast for multiple conditions in DS analysis #417

@feanaros

Dear Helena, thank you for the package.
I have a dataset with multiple conditions and I would like to perform a DS analysis, but I am not sure whether the data are prepared properly.

> ei
   sample_id condition patient_id n_cells
1   HealthyA   Healthy        IDA   57406
2   HealthyB   Healthy        IDB   57360
3   HealthyE   Healthy        IDE  186564
4      NAFL1      NAFL        ID1  129166
5      NAFL2      NAFL        ID2   84568
6      NAFL3      NAFL        ID4  144629
7      NAFL4      NAFL        ID5  328842
8      NAFL5      NAFL       ID10  209022
9       NAS1       NAS        ID8   84714
10      NAS2       NAS        ID3  216991
11      NAS3       NAS        ID7   85073
12     NASH1      NASH        ID6   95879
13     NASH2      NASH       ID11   67581
14     NASH3      NASH       ID12   47626
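To see which comparison `c(0, 1, 0, 0)` encodes, the design columns can be inspected directly. The sketch below rebuilds `ei` from the table above and calls base R `model.matrix()`. It assumes that diffcyt builds the fixed-effects design with R's default treatment coding and that the `condition` levels are ordered Healthy, NAFL, NAS, NASH (the alphabetical default, matching the printout). Under those assumptions the second column is `conditionNAFL`, so `c(0, 1, 0, 0)` would test NAFL against Healthy. The sketch also checks that every `patient_id` occurs in only one sample.

```r
# Rebuild the sample metadata printed above (n_cells omitted; not needed for the design)
ei <- data.frame(
  sample_id  = c("HealthyA", "HealthyB", "HealthyE",
                 "NAFL1", "NAFL2", "NAFL3", "NAFL4", "NAFL5",
                 "NAS1", "NAS2", "NAS3",
                 "NASH1", "NASH2", "NASH3"),
  condition  = factor(rep(c("Healthy", "NAFL", "NAS", "NASH"), times = c(3, 5, 3, 3)),
                      levels = c("Healthy", "NAFL", "NAS", "NASH")),
  patient_id = c("IDA", "IDB", "IDE", "ID1", "ID2", "ID4", "ID5", "ID10",
                 "ID8", "ID3", "ID7", "ID6", "ID11", "ID12")
)

# Default treatment coding: intercept = Healthy (first level), one column per other level
X <- model.matrix(~ condition, data = ei)
print(colnames(X))
# "(Intercept)" "conditionNAFL" "conditionNAS" "conditionNASH"

# TRUE if each patient contributes exactly one sample
print(anyDuplicated(ei$patient_id) == 0)
```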

> ds_formula1 <- createFormula(ei, cols_fixed = "condition")
> ds_formula1
$formula
y ~ condition
<environment: 0x7fbba1a1a2a8>

$data
   condition
1    Healthy
2    Healthy
3    Healthy
4       NAFL
5       NAFL
6       NAFL
7       NAFL
8       NAFL
9        NAS
10       NAS
11       NAS
12      NASH
13      NASH
14      NASH

$random_terms
[1] FALSE

> contrast <- createContrast(c(0, 1, 0, 0))
> contrast
     [,1]
[1,]    0
[2,]    1
[3,]    0
[4,]    0
> 
> ds_res4 <- diffcyt(
+   sce, 
+   formula = ds_formula1, 
+   contrast = contrast, 
+   analysis_type = "DS", 
+   method_DS = c("diffcyt-DS-LMM"),
+   clustering_to_use = "meta14", 
+   subsampling = 10000, 
+   verbose = TRUE
+ )
using SingleCellExperiment object from CATALYST as input
using cluster IDs from clustering stored in column 'meta14' of 'cluster_codes' data frame in 'metadata' of SingleCellExperiment object from CATALYST
calculating features...
calculating DS tests using method 'diffcyt-DS-LMM'...
There were 50 or more warnings (use warnings() to see the first 50)
> diffcyt::topTable(ds_res4, format_vals = TRUE, top_n = 1000, order_by = "p_adj")
DataFrame with 504 rows and 4 columns
    cluster_id   marker_id     p_val     p_adj
      <factor>    <factor> <numeric> <numeric>
6           6        CD69   0.002060     0.147
6           6        CXCR3  0.003700     0.147
9           9        CXCR3  0.002050     0.147
12          12       CXCR3  0.002580     0.147
3           3        FoxP3  0.000912     0.147
...        ...         ...       ...       ...
11          11 CD223_LAG-3        NA        NA
12          12 CD223_LAG-3        NA        NA
13          13 CD223_LAG-3        NA        NA
14          14 CD223_LAG-3        NA        NA
13          13 CD16               NA        NA
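With four condition levels, one contrast vector of length 4 selects one comparison. A minimal sketch for building vectors for other comparisons by coefficient name, assuming the coefficient order of R's default treatment coding with Healthy as the reference level. The helper `make_contrast()` is hypothetical and not part of diffcyt.

```r
# Coefficient names assumed from default treatment coding with Healthy as reference
coefs <- c("(Intercept)", "conditionNAFL", "conditionNAS", "conditionNASH")

# Hypothetical helper: place named weights on the coefficients, zero elsewhere
make_contrast <- function(weights, coef_names = coefs) {
  stopifnot(all(names(weights) %in% coef_names))
  v <- setNames(numeric(length(coef_names)), coef_names)
  v[names(weights)] <- weights
  v
}

contrast_list <- list(
  NAFL_vs_Healthy = make_contrast(c(conditionNAFL = 1)),
  NAS_vs_Healthy  = make_contrast(c(conditionNAS = 1)),
  NASH_vs_Healthy = make_contrast(c(conditionNASH = 1)),
  # Two non-reference levels are compared via the difference of their coefficients
  NASH_vs_NAFL    = make_contrast(c(conditionNASH = 1, conditionNAFL = -1))
)

# One row per comparison, one column per coefficient
print(do.call(rbind, contrast_list))
```

Each vector, for example `unname(contrast_list$NASH_vs_NAFL)` (that is, `c(0, -1, 0, 1)`), could then be passed to `createContrast()` in place of `c(0, 1, 0, 0)`, with one `diffcyt()` call per comparison.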

If I try diffcyt-DS-limma instead:

> ds_design <- createDesignMatrix(ei, cols_design = "condition")
> ds_formula1 <- createFormula(ei, cols_fixed = "condition")
> contrast <- createContrast(c(0, 1, 0, 0))
> ds_res3 <- diffcyt(
+   sce, 
+   design = ds_design, 
+   contrast = contrast, 
+   analysis_type = "DS", 
+   clustering_to_use = "meta14", 
+   subsampling = 10000, 
+   verbose = TRUE,
+   transform = FALSE
+ )
using SingleCellExperiment object from CATALYST as input
using cluster IDs from clustering stored in column 'meta14' of 'cluster_codes' data frame in 'metadata' of SingleCellExperiment object from CATALYST
calculating features...
calculating DS tests using method 'diffcyt-DS-limma'...
Warning messages:
1: In fitFDist(var, df1 = df, covariate = covariate) :
  More than half of residual variances are exactly zero: eBayes unreliable
2: In splines::ns(covariate, df = splinedf, intercept = TRUE) :
  shoving 'interior' knots matching boundary knots to inside

> diffcyt::topTable(ds_res3, format_vals = TRUE, show_logFC = TRUE, top_n = 1000, order_by = "marker_id")
DataFrame with 504 rows and 5 columns
    cluster_id marker_id     logFC     p_val     p_adj
      <factor>  <factor> <numeric> <numeric> <numeric>
1            1      CD45     0.057    0.7820     1.000
2            2      CD45     0.303    0.5810     1.000
3            3      CD45    -0.150    0.4660     1.000
4            4      CD45    -0.140    0.2630     1.000
5            5      CD45    -0.295    0.0535     0.655
...        ...       ...       ...       ...       ...
10          10      CD16     0.000     1.000         1
11          11      CD16     0.000     1.000         1
12          12      CD16    -0.023     0.235         1
13          13      CD16     0.000     1.000         1
14          14      CD16     0.000     1.000         1
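The limma warning says that more than half of the residual variances are exactly zero, and rows such as CD16 show `logFC` 0.000 with `p_val` 1.000. That pattern would be consistent with many cluster-marker features being constant within every condition. The sketch below, on hypothetical toy numbers rather than this dataset, shows one way to count such features: fit `y ~ condition` to each row with base R `qr.resid()` and flag rows whose residual sum of squares is zero up to a small tolerance. The matrix layout (features in rows, samples in columns) and the row names are assumptions for illustration.

```r
# Hypothetical toy data: rows = cluster-marker features, columns = samples.
# Values stand in for per-sample marker medians; they are NOT from this dataset.
condition <- factor(c("Healthy", "Healthy", "NAFL", "NAFL", "NASH", "NASH"))
Y <- rbind(
  CD45_c1  = c(1.2, 1.0, 1.5, 1.4, 0.9, 1.1),  # varies within groups
  CD16_c10 = c(0,   0,   0,   0,   0,   0),    # constant in every sample
  CD16_c12 = c(0,   0,   0.1, 0,   0,   0)     # one non-zero sample
)

X <- model.matrix(~ condition)

# Residuals of the per-feature linear model y ~ condition (samples x features)
res <- qr.resid(qr(X), t(Y))
rss <- colSums(res^2)
names(rss) <- rownames(Y)

# Flag features whose residual variance is zero up to floating-point tolerance
zero_resid <- rss < 1e-12
print(zero_resid)
cat("fraction with zero residual variance:", mean(zero_resid), "\n")
```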


I don't understand why it does not work with either method. I hope you can help, thanks!
