How to determine the number of variables to use when 𝑛≪𝑝

We know that PLS regression is a good modelling method when 𝑛≪𝑝, but how do we determine the right number of variables? How do we know whether the number of variables we are using is appropriate, or whether it is harming the model when it is very large (say 20k)? Is there a reference on statistical power, or on how many of the p variables a feature selection method should focus on when deriving a subset of features from large datasets such as transcriptomics or metabolomics, where variables number in the thousands (up to 20k)? I would appreciate it if somebody could point me to a reference that I can use and cite properly in my analysis.

Hi @amnah,

The number of variables to select can be tuned using the tune.spls function. It uses cross-validation to find the best number of features to keep on each component of the model. Please refer to ?tune.spls for more details.
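Here is a minimal sketch of what that tuning could look like, assuming a recent mixOmics release and using the liver.toxicity example data shipped with the package in place of your own X and Y. The grid of candidate values, the number of components, folds and repeats are illustrative choices, not recommendations, and the accepted values for `measure` differ between mixOmics versions, so please check ?tune.spls for the version you have installed.

```r
library(mixOmics)

# Example data shipped with mixOmics (stand-in for your own X and Y)
data(liver.toxicity)
X <- liver.toxicity$gene    # 64 samples x 3116 genes, so n << p
Y <- liver.toxicity$clinic  # 64 samples x 10 clinical measures

# Candidate numbers of X variables to keep per component (illustrative grid)
keepX_grid <- c(5, 10, 20, 50, 100)

set.seed(42)  # cross-validation folds are drawn at random

# Cross-validated tuning of keepX; 'measure' options depend on the
# mixOmics version (assumed here: "cor", as in recent releases)
tuned <- tune.spls(X, Y,
                   ncomp = 2,
                   test.keepX = keepX_grid,
                   validation = "Mfold",
                   folds = 5,
                   nrepeat = 3,
                   mode = "regression",
                   measure = "cor")

# Number of variables selected by cross-validation for each component
tuned$choice.keepX

# Refit the sparse PLS model with the tuned values
final <- spls(X, Y, ncomp = 2,
              keepX = tuned$choice.keepX,
              mode = "regression")

# Names of the X variables retained on component 1
selectVar(final, comp = 1)$X$name
```

With 20k variables you would typically widen the grid, and increase nrepeat so the choice is more stable across random fold assignments.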

Hope it helps,

Al