Explained variance PCA plot not monotonic

Hello,

I got a strange explained-variance plot from the mixOmics pca function: the explained variance decreases up to PC4, increases for PC5 and PC6, and then decreases again.

I wonder how this is possible. The dataset is not omics, just clinical data (14 variables) with many NAs.

Hi @jgiemza,

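It’s caused by a combination of 1) an abundance of missing values in your data and 2) the heuristic we use to calculate the explained variance, which in practice replaces the missing values in the centered matrix with 0 (the feature mean). The components are not fitted to that zero-filled matrix, so the variance they explain in it is no longer guaranteed to decrease from one component to the next.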
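To make the mechanism above concrete, here is a minimal base-R sketch. It is not the mixOmics source code: `nipals_loadings` is a hypothetical, simplified NIPALS written for this illustration, the data are simulated, and the variance formula is only one plausible reading of the zero-fill heuristic described above. Whether the printed values are non-monotonic depends on the data and the amount of missingness.

```r
## Sketch only: simplified NIPALS plus a zero-fill variance heuristic.
## NOT the mixOmics implementation; all names here are hypothetical.
set.seed(1)
n <- 60; p <- 14
X <- matrix(rnorm(n * 3), n, 3) %*% matrix(rnorm(3 * p), 3, p) +
  matrix(rnorm(n * p), n, p)
X[sample(length(X), size = round(0.3 * length(X)))] <- NA  # 30% missing
X <- scale(X, center = TRUE, scale = FALSE)  # column means ignore NAs

nipals_loadings <- function(X, ncomp, max_iter = 500, tol = 1e-9) {
  W <- ifelse(is.na(X), 0, 1)  # 1 = observed, 0 = missing
  R <- ifelse(is.na(X), 0, X)  # residuals, missing cells held at 0
  P <- matrix(0, ncol(X), ncomp)
  for (h in seq_len(ncomp)) {
    t_h <- R[, which.max(colSums(R^2))]  # starting score vector
    for (i in seq_len(max_iter)) {
      # loadings and scores use observed cells only
      p_h <- crossprod(R, t_h) / crossprod(W, t_h^2)
      p_h <- p_h / sqrt(sum(p_h^2))
      t_new <- (R %*% p_h) / (W %*% p_h^2)
      converged <- sum((t_new - t_h)^2) < tol
      t_h <- t_new
      if (converged) break
    }
    P[, h] <- p_h
    R <- (R - tcrossprod(t_h, p_h)) * W  # deflate, keep missing cells at 0
  }
  P
}

P <- nipals_loadings(X, ncomp = 6)

## Heuristic: zero-fill the centered matrix (0 = feature mean), project it
## onto the loadings, and take each component's share of the total variance.
X0 <- ifelse(is.na(X), 0, X)
expl <- colSums((X0 %*% P)^2) / sum(X0^2)
print(round(expl, 3))
print(which(diff(expl) > 0) + 1)  # components explaining more than the previous one
```

Because the loadings are fitted to the observed cells while the variance is measured on the zero-filled matrix, nothing forces `expl` to be decreasing. On complete data the two matrices coincide and the usual ordering is recovered.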

You can either work with only the first few PCs or impute the missing values using a method of your choice. The impute.nipals function in mixOmics does just that. Below is an example:

library(mixOmics)
data("nutrimouse")
X <- data.matrix(nutrimouse$lipid)
X <- scale(X, center = TRUE, scale = TRUE)
## add missing values to X to impute and compare to actual values
set.seed(42)
na.ind <- sample(seq_along(X), size = 10)
true.values <- X[na.ind]
X[na.ind] <- NA
X.impute <- impute.nipals(X, ncomp = 8)
## compare
rbind('actual' = round(true.values, 2),
      'imputed' = round(X.impute[na.ind], 2)
      )
#>          [,1]  [,2] [,3]  [,4]  [,5] [,6] [,7] [,8]  [,9] [,10]
#> actual  -0.73 -0.43 1.73 -0.80 -0.58 1.41 0.77 0.65 -1.39 -0.17
#> imputed -0.70 -0.20 1.20 -1.48 -0.32 1.27 1.04 0.96 -0.96  1.11