I’m using PLS-DA and PLS to analyze metabolomic data. To select the important variables I keep those with a VIP higher than 1 and loadings different from zero. However, I need the standard error associated with each estimated loading so I can check whether the interval around the loading contains zero. If it does, the variable is not important for my analysis. Where or how can I obtain this error for the estimated loadings?
The method calculates the loadings directly through an iterative optimisation algorithm, so there is no error associated with the estimates. However, if it’s of any help, you can always get a cross-validated estimate of these values (loadings and importance) by randomly splitting the samples into multiple (>2) sub-groups and fitting the model on each sub-group. You can then calculate summary statistics (for example the mean and standard deviation) of those estimates (loadings and/or importance) across the sub-groups.
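As a rough illustration of the summary step, here is a minimal sketch in plain Python (standard library only, so it is not mixOmics code). It assumes you have already extracted one loading vector per sub-group fit for the component of interest. The function names, the toy numbers and the mean ± 2 SD interval are all assumptions made for the example, and the spread across sub-groups is only an empirical indication of stability, not a formal standard error. The sign alignment is there because the sign of a PLS component is arbitrary and can flip from one fit to the next.

```python
from statistics import mean, stdev

def align_sign(vec, ref):
    """Flip vec if it points away from ref (component signs are arbitrary)."""
    dot = sum(v * r for v, r in zip(vec, ref))
    return [-v for v in vec] if dot < 0 else list(vec)

def summarize_loadings(fits, names):
    """Per-variable mean, SD and a rough mean +/- 2*SD interval across fits."""
    ref = fits[0]  # the first fit serves as the sign reference
    aligned = [align_sign(f, ref) for f in fits]
    summary = {}
    for j, name in enumerate(names):
        values = [f[j] for f in aligned]
        m, s = mean(values), stdev(values)
        low, high = m - 2 * s, m + 2 * s  # assumed rule of thumb, not a formal CI
        summary[name] = {"mean": m, "sd": s, "low": low, "high": high,
                         "spans_zero": low <= 0 <= high}
    return summary

# Made-up loadings for three variables from four sub-group fits (illustration only).
toy_fits = [
    [0.62, 0.05, -0.48],
    [0.58, -0.03, -0.52],
    [-0.60, 0.02, 0.50],  # same component with a flipped sign
    [0.64, -0.04, -0.46],
]
toy_summary = summarize_loadings(toy_fits, ["met_A", "met_B", "met_C"])
for name, stats in toy_summary.items():
    print(name, round(stats["mean"], 3), round(stats["sd"], 3), stats["spans_zero"])
```

With these toy values only the second variable has an interval that spans zero. With real data you would want many more than four sub-group fits before reading much into the spread.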
For PLS-DA methods, you must ensure that each sample sub-group represents all classes proportionately. You can use the function mixOmics:::stratified.subsampling for that (it is an internal, unexported function, hence the triple colon).
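To show what a proportionate split means in practice, here is a minimal sketch in plain Python (standard library only). It is not the mixOmics function mentioned above and makes no claim about how that function works internally. The name `stratified_subgroups` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict

def stratified_subgroups(labels, n_groups, seed=0):
    """Split sample indices into n_groups that keep the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    groups = [[] for _ in range(n_groups)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class out round-robin; every sub-group receives the class
        # as long as that class has at least n_groups samples.
        for pos, idx in enumerate(indices):
            groups[pos % n_groups].append(idx)
    return groups

# Made-up class labels: 9 controls and 6 treated samples (illustration only).
toy_labels = ["control"] * 9 + ["treated"] * 6
toy_groups = stratified_subgroups(toy_labels, n_groups=3)
for group in toy_groups:
    classes = [toy_labels[i] for i in group]
    print(sorted(group), classes.count("control"), classes.count("treated"))
```

Each of the three sub-groups here ends up with 3 controls and 2 treated samples, matching the 9:6 ratio of the full set. The model would then be fitted on each sub-group and the loadings summarized across the fits.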