Explained variance PCA plot not monotonic

jgiemza · April 8, 2021, 2:19pm

Hello,

I got a strange explained variance plot from the mixOmics pca function: the explained variance decreases up to PC4, then increases for PC5 and PC6, and then decreases again.

I wonder how this is possible. The dataset is not omics data, just clinical data (14 variables) with many NAs.

aljabadi · April 14, 2021, 12:28am

Hi @jgiemza,

It’s caused by a combination of 1) an abundance of missing values in your data and 2) the heuristic we use to calculate the explained variance, which in practice replaces the missing values in the centered matrix by 0 (the feature mean).
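To make point 2 concrete, here is a minimal base-R sketch. It is not the mixOmics internals, and the toy matrix and variable names are made up for illustration. It only shows that setting the missing entries of the column-centered matrix to 0 is equivalent to imputing each missing value with its feature mean:

```r
set.seed(1)
# Toy data (hypothetical): 5 samples, 4 features, 3 missing entries
X <- matrix(rnorm(20), nrow = 5, ncol = 4)
X[c(2, 8, 15)] <- NA

# Route 1: center the columns (scale() ignores NAs when computing the means),
# then set the remaining NAs to 0
Xc <- scale(X, center = TRUE, scale = FALSE)
Xc.zero <- Xc
Xc.zero[is.na(Xc.zero)] <- 0

# Route 2: replace each NA by the mean of its column, then center
X.mean <- X
for (j in seq_len(ncol(X.mean))) {
  miss <- is.na(X.mean[, j])
  X.mean[miss, j] <- mean(X.mean[, j], na.rm = TRUE)
}
X.mean.c <- scale(X.mean, center = TRUE, scale = FALSE)

# Both routes give the same matrix
same <- isTRUE(all.equal(Xc.zero, X.mean.c, check.attributes = FALSE))
same
```

With many NAs, a large share of the matrix used for the explained variance calculation is therefore filled with feature means rather than observed values.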

You can either work with only the first few PCs or impute the missing values using a method of your choice. mixOmics provides the impute.nipals function, which does just that. Below is an example:

library(mixOmics)
data("nutrimouse")
X <- data.matrix(nutrimouse$lipid)
X <- scale(X, center = TRUE, scale = TRUE)
## add missing values to X to impute and compare to actual values
set.seed(42)
na.ind <- sample(seq_along(X), size = 10)
true.values <- X[na.ind]
X[na.ind] <- NA
X.impute <- impute.nipals(X, ncomp = 8)
## compare
rbind('actual' = round(true.values, 2),
      'imputed' = round(X.impute[na.ind], 2)
      )
#>          [,1]  [,2] [,3]  [,4]  [,5] [,6] [,7] [,8]  [,9] [,10]
#> actual  -0.73 -0.43 1.73 -0.80 -0.58 1.41 0.77 0.65 -1.39 -0.17
#> imputed -0.70 -0.20 1.20 -1.48 -0.32 1.27 1.04 0.96 -0.96  1.11
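Once the matrix has no missing values, a standard PCA orders its components by singular value, so the proportion of explained variance cannot increase from one PC to the next. Here is a minimal base-R check of that property. It uses prcomp and plain mean imputation purely as a stand-in for whichever imputation method you choose; the toy data and variable names are assumptions for illustration:

```r
set.seed(42)
# Toy data (hypothetical): 30 samples, 6 features, 40 missing entries
X <- matrix(rnorm(30 * 6), nrow = 30, ncol = 6)
X[sample(seq_along(X), size = 40)] <- NA

# Stand-in imputation: replace each NA by the mean of its column
X.complete <- X
for (j in seq_len(ncol(X.complete))) {
  miss <- is.na(X.complete[, j])
  X.complete[miss, j] <- mean(X.complete[, j], na.rm = TRUE)
}

# PCA on the completed matrix
pc <- prcomp(X.complete, center = TRUE, scale. = FALSE)
prop.var <- pc$sdev^2 / sum(pc$sdev^2)

# Explained variance never increases from one PC to the next
monotonic <- all(diff(prop.var) <= 0)
round(prop.var, 3)
monotonic
```

The same check can be applied to the proportions of explained variance you get after imputing your own data.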
