Number of variables per component in tuning vs checking stability

Dear Mixomics Team,

I am very new to the field of metabolomics and to your package, and I am keen to explore sPLS-DA for my data. I have been adapting the model script for the SRBCT data to my own data. I am now at the stage of checking the stability of the variables and have a question about the number of variables.

  • From my understanding, in the tutorial we check and select the optimal number of variables using tune.splsda.srbct$choice.keepX. This gives us a table of variables for each component for the optimal number of components. Is this correct?

If so, then when I check the stability of the variables (perf.splsda.srbct$features$stable), the number of features/variables per component should be less than or equal to the numbers in tune.splsda.srbct$choice.keepX. However, this is not the case. I would be very grateful for clarification, as I seem to be missing something here.

Hi @Meghsw,

  • From my understanding, in the tutorial we check and select the optimal number of variables using tune.splsda.srbct$choice.keepX. This gives us a table of variables for each component for the optimal number of components. Is this correct?

Yes, this will output the optimal number of variables to select on each component.

Then, when you use the perf() function, remember that your sPLS-DA is run with a given keepX (the one you chose above) on a subset of the samples, because you are using cross-validation. The variable selection therefore does not overlap perfectly between CV runs: the samples change, and this creates some variability (and some instability, which is what we want to assess here). The stability output lists every variable that was selected in at least one run, which is why it often contains more variables than keepX.
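If it helps, here is a minimal sketch of how you could compare the two counts yourself on the SRBCT data. The keepX values below are hypothetical placeholders and not a tuning result (in your script you would use tune.splsda.srbct$choice.keepX), and the folds and nrepeat settings are assumptions that you should match to your own perf() call.

```r
library(mixOmics)  # assumes mixOmics is installed (Bioconductor)

data(srbct)
X <- srbct$gene
Y <- srbct$class

# Hypothetical keepX, standing in for tune.splsda.srbct$choice.keepX
keepX <- c(10, 50, 30)

splsda.srbct <- splsda(X, Y, ncomp = length(keepX), keepX = keepX)

# Repeated M-fold CV: each run refits the model on a subset of the samples
set.seed(42)
perf.splsda.srbct <- perf(splsda.srbct, validation = "Mfold",
                          folds = 5, nrepeat = 10, progressBar = FALSE)

# Per component: how many distinct variables were selected in at least one run
n.stable <- sapply(perf.splsda.srbct$features$stable, length)
print(rbind(keepX = keepX, n.in.stability = n.stable))

# Selection frequencies on component 1, most stable variables first
stab.comp1 <- perf.splsda.srbct$features$stable[[1]]
print(head(sort(stab.comp1, decreasing = TRUE)))
```

Variables with a frequency close to 1 were selected in nearly every run. Those with a low frequency entered the selection only for some subsets of samples, and they are the ones that push the count above keepX.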

Kim-Anh

Ah, I see. Thank you very much for the clarification!

Meghna