Multilevel (cross-over) PCA - artifact?

Dear all,
I would like to say that mixOmics is a great contribution and, as far as I know, one of the very few options in Bioconductor for performing PCA and PLS-DA with cross-over designs, along with all its omics integration features.
The reason I am contacting you is that I used mixOmics to generate a PCA for a simple cross-over design without additional factors: the same subjects had blood collected at two time points (before and after), and all subjects received the same treatment. Using the function ‘withinVariation’ as in the data(vac18) example, I noticed that the PCA plot is symmetric, like a mirrored image (vertically and horizontally). I may have misinterpreted something, since I have never performed PCA on a cross-over design before, but this does not make sense to me. If this is not the way to do it, how could I specify that each pair of samples is from the same individual?

I could reproduce the plots generated with data(vac18), but with my data frame of 25 subjects at two time points (50 samples) and ~300 measured features, I got a suspiciously symmetric plot (like an artifact). I apologize for contacting you about this specific matter, and I hope you do not mind sharing some thoughts on it.

Simulated data:
matD <- data.frame(replicate(50, sample(1:100000, 317, replace = TRUE))) # generate data frame: 317 features x 50 samples
tmatD <- t(matD) # samples in each row
library(mixOmics)
X <- tmatD # just to use same lines below
samplesID <- rep(1:25, 2) # 25 individuals, each with blood measured twice
design <- data.frame(sample = samplesID)
pca.multilevel.1 <- pca(X, ncomp = 3, scale = TRUE, center = TRUE, multilevel = design) # not centered, or not scaled, will present the symmetry
plotIndiv(pca.multilevel.1)
# same using withinVariation()
Xw <- withinVariation(X = X, design = design)
res.pca.1level <- pca(Xw, ncomp = 3, scale = TRUE, center = TRUE) # PCA on the within-subject variation matrix
plotIndiv(res.pca.1level)

sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
[1] LC_CTYPE=pt_BR.UTF-8 LC_NUMERIC=C
[3] LC_TIME=pt_BR.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=pt_BR.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=pt_BR.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=pt_BR.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] mixOmics_6.8.5 ggplot2_3.3.2 lattice_0.20-41 MASS_7.3-53

loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 RSpectra_0.16-0 pillar_1.4.6 compiler_3.6.3
[5] RColorBrewer_1.1-2 plyr_1.8.6 tools_3.6.3 digest_0.6.25
[9] lifecycle_0.2.0 tibble_3.0.3 gtable_0.3.0 pkgconfig_2.0.3
[13] rlang_0.4.7 Matrix_1.2-18 igraph_1.2.5 parallel_3.6.3
[17] gridExtra_2.3 withr_2.2.0 dplyr_1.0.2 stringr_1.4.0
[21] generics_0.0.2 vctrs_0.3.2 grid_3.6.3 tidyselect_1.1.0
[25] glue_1.4.1 ellipse_0.4.2 R6_2.4.1 rARPACK_0.11-0
[29] purrr_0.3.4 tidyr_1.1.1 reshape2_1.4.4 farver_2.0.3
[33] corpcor_1.6.9 magrittr_1.5 scales_1.1.1 ellipsis_0.3.1
[37] matrixStats_0.56.0 colorspace_1.4-1 labeling_0.3 stringi_1.4.6
[41] munsell_0.5.0 crayon_1.3.4

Hi @iglezer,

I just added a few lines of code, and I am happy to dig further into how the multilevel decomposition works. Basically, you can consider it a sort of normalisation per sample. Why it creates this symmetry I am not sure yet, but you would only use that approach if you had visualised a strong individual variation (e.g. the 2 samples from the same individual cluster together). This is not the case here, and that could explain these outputs.

matD<-data.frame(replicate(50, sample(1:100000, 317, rep=TRUE))) #generate data frame
tmatD<-t(matD) #samples in each row
library(mixOmics)
X <- tmatD #just to use same lines below
samplesID<-rep(1:25,2) # assign 25 blood samples, measured twice in the same individuals
design <- data.frame(sample = samplesID)
pca.result <- pca(X, ncomp = 3, scale = TRUE, center = TRUE)
# sample ID info:
plotIndiv(pca.result, group = samplesID, ellipse = TRUE)
# 'repeat' info
plotIndiv(pca.result, group = c(rep(1, 25), rep(2, 25)))

pca.multilevel.1 <- pca(X, ncomp = 3, scale = TRUE, center = TRUE, multilevel=design) #not centered, or not scaled will present the symmetry
plotIndiv(pca.multilevel.1, group = samplesID, ellipse = TRUE)
plotIndiv(pca.multilevel.1, group = c(rep(1, 25), rep(2, 25)))
#same using withinVariation()
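Xw <- withinVariation(X = X, design = design)
res.pca.1level <- pca(Xw, ncomp = 3, scale = TRUE, center = TRUE) # PCA on the within-subject variation matrix
plotIndiv(res.pca.1level)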
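
To make the "normalisation per sample" idea concrete, here is a minimal base-R sketch. It assumes that, for a single repeated-measures factor, the within-subject matrix is simply each row minus the mean of that individual's rows; `within_center` is a hypothetical helper written for this illustration, not a mixOmics function, and the simulated data, seed, and effect sizes are arbitrary. Unlike the random data above, this simulation includes an individual-specific offset, which is the situation the multilevel approach is meant for.

```r
set.seed(42)  # arbitrary seed, for reproducibility only
n_subj  <- 25
p       <- 20                        # illustrative number of features
subject <- rep(seq_len(n_subj), 2)   # same layout as samplesID above
time    <- rep(c(1, 2), each = n_subj)

# Simulated data WITH an individual effect (an assumption of this sketch):
# each individual has its own offset per feature, plus a shift at time 2.
subj_effect <- matrix(rnorm(n_subj * p, sd = 5), nrow = n_subj)
noise       <- matrix(rnorm(2 * n_subj * p), nrow = 2 * n_subj)
X_sim       <- subj_effect[subject, ] + 2 * (time == 2) + noise

# Hypothetical helper (not a mixOmics function): subtract each
# individual's mean from that individual's rows, feature by feature.
within_center <- function(X, subject) {
  apply(X, 2, function(col) col - ave(col, subject))
}
Xw_sim <- within_center(X_sim, subject)

# Individual offsets are removed: each individual's rows now sum to ~0.
max(abs(rowsum(Xw_sim, subject)))

# The before/after difference is left untouched by the centering.
diff_raw <- colMeans(X_sim[time == 2, ]) - colMeans(X_sim[time == 1, ])
diff_w   <- colMeans(Xw_sim[time == 2, ]) - colMeans(Xw_sim[time == 1, ])
all.equal(diff_raw, diff_w)
```

Under that assumption, the centering only discards the part of the signal that is constant within an individual, so it helps when that part dominates and hides the time effect.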

Kim-Anh

Thanks Kim-Anh,
This is intriguing. I am surprised that random values could generate a mirrored image, no matter how many features are included in the data frame, 300 or 30,000.
Best, Isaias

My hunch is that the multilevel decomposition usually tries to disentangle the individual variation. In your simulated data, however, there is no such relationship between samples from the same individual, and so that may create this result. I'll try to check it out in the next few days.
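
One possible explanation for the mirroring can be sketched in base R, under the assumption that the within-subject matrix for a one-factor design is each row minus its individual's mean. With exactly two measurements per individual, the two centred rows are (x1 - x2)/2 and (x2 - x1)/2, that is, exact negatives of each other, so any PCA of that matrix would place each pair at opposite points about the origin, whatever the data. `within_center` below is a hypothetical helper, not a mixOmics function, and `prcomp` stands in for `pca()` purely for illustration.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
n_subj  <- 25
subject <- rep(seq_len(n_subj), 2)  # rows 1-25: first measure, rows 26-50: second

# Purely random data, in the spirit of the simulated example in this thread
X_rand <- matrix(sample(1:100000, 2 * n_subj * 30, replace = TRUE),
                 nrow = 2 * n_subj)

# Hypothetical helper (not a mixOmics function): per-individual centering
within_center <- function(X, subject) {
  apply(X, 2, function(col) col - ave(col, subject))
}
Xw_rand <- within_center(X_rand, subject)

first  <- seq_len(n_subj)
second <- n_subj + first

# The two rows of every individual cancel out feature by feature.
max(abs(Xw_rand[first, ] + Xw_rand[second, ]))

# PCA scores are linear in the (already zero-mean) rows, so the scores of
# each pair are negatives of each other as well: a point-symmetric plot.
scores <- prcomp(Xw_rand, center = TRUE, scale. = TRUE)$x
max(abs(scores[first, 1:3] + scores[second, 1:3]))
```

If this assumption about the decomposition holds, the symmetry would follow from having only two time points per individual rather than from the random values or the number of features.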

Kim-Anh