LOO CV when running a sPCA

Hi,

I have a proteomics data set with a small number of samples (10 samples x 13872 proteins), and I was trying to run an sPCA on it. Because the sample size is so small, I tried to use leave-one-out cross-validation (LOO-CV) for tuning, so I set up the following code:

# define ncomp
ncomp = 2

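# set a grid of keepX values
# (seq(), not rep(): rep(5, 100, 5) only repeats the value 5;
#  the second grid starts at 110 so that 100 is not tested twice)
keepX <- c(seq(5, 100, by = 5),
           seq(110, 500, by = 10))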

# tune sPCA with parameters
set.seed(1001)
tune.spca.sol <- tune.spca(X = soluble.imput,
                           ncomp = ncomp,
                           test.keepX = keepX,
                           folds = nrow(soluble.imput),
                           nrepeat = 100)

However, I get the following message:

Error in tune.spca(X = soluble.imput, ncomp = ncomp, test.keepX = keepX,  : 
  'folds' must be an integer > 2 and <= floor(nrow(X)/3)=3

If I run with folds = 3, it seems to run fine… Any advice on what I should do in this case?
Thanks 🙂
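
For reference, the limit quoted in the error message can be reproduced from the sample count alone. This is only a minimal base-R sketch of the arithmetic in that message, not the function's internal check; the variable names are made up for illustration.

```r
# Sample count taken from the post (10 samples x 13872 proteins)
n.samples <- 10

# Upper bound on 'folds' as stated in the error message: floor(nrow(X)/3)
max.folds <- floor(n.samples / 3)

# Leave-one-out would need one fold per sample
loo.folds <- n.samples
loo.allowed <- loo.folds <= max.folds

print(max.folds)
print(loo.allowed)
```

With 10 samples the bound is 3, which is consistent with folds = 3 running while folds = nrow(soluble.imput) is rejected.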

Hi @MychelMorais,

With only 10 samples, I think you are hitting the limits of the tune.spca function: as the error message indicates, folds cannot exceed floor(nrow(X)/3) = 3, so leave-one-out is not an option here. Given that PCA and sPCA are primarily exploratory tools, I am not sure you need any tuning at all. What you want to show is that both PCA and sPCA highlight meaningful sources of variation in the data and, if you use sPCA, which (top) variables are driving this variation (use selectVar and plotLoadings). This will help you formulate new hypotheses about your data.
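
An untuned, exploratory run along these lines might look like the sketch below. It assumes the mixOmics package is installed, uses simulated data as a stand-in for soluble.imput, and picks keepX = c(20, 20) arbitrarily; none of these values come from the original analysis.

```r
library(mixOmics)  # assumed to be installed (Bioconductor)

set.seed(1001)

# Simulated stand-in for soluble.imput: 10 samples x 200 "proteins"
X <- matrix(rnorm(10 * 200), nrow = 10,
            dimnames = list(paste0("S", 1:10), paste0("P", 1:200)))

# sPCA without tuning; the keepX values here are arbitrary, for exploration only
spca.res <- spca(X, ncomp = 2, keepX = c(20, 20))

# Names of the variables selected on component 1
top.comp1 <- selectVar(spca.res, comp = 1)$name
print(head(top.comp1))

# Loading weights of the selected variables on component 1
plotLoadings(spca.res, comp = 1)
```

Trying a few different keepX values by hand and checking whether the selected variables and the sample plot stay broadly similar is one informal way to explore the data without cross-validation.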

Kim-Anh
