Variable importance in sPLS-DA

I’m using sPLS-DA to try to classify my samples into 2 groups. The error rate is lowest with a single component. I have 800 variables, and the tuning results from tune.splsda suggest using 790 of them on that component. My main interest is to find out which variables are meaningful for classifying the groups. Is there a good way to choose a cutoff for selecting variables?
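One common way to pick a cutoff is selection stability. Record which variables the sparse model keeps in each cross-validation run, then retain only the variables selected in at least some fraction of runs. The sketch below is plain Python, not mixOmics. The helper names, the toy variable sets and the 0.8 threshold are hypothetical. As far as I know, perf() in mixOmics reports a comparable per-variable stability for sPLS-DA models, but check its documentation for the exact output.

```python
from collections import Counter
from typing import Dict, List, Set


def selection_frequency(runs: List[Set[str]]) -> Dict[str, float]:
    """Fraction of CV runs in which each variable was selected."""
    counts = Counter(var for run in runs for var in run)
    return {var: c / len(runs) for var, c in counts.items()}


def stable_variables(runs: List[Set[str]], threshold: float = 0.8) -> List[str]:
    """Variables selected in at least `threshold` of the runs, most stable first."""
    freq = selection_frequency(runs)
    kept = [var for var, f in freq.items() if f >= threshold]
    # Sort by descending frequency, then by name so the order is deterministic.
    return sorted(kept, key=lambda var: (-freq[var], var))


# Hypothetical variables kept by the sparse model in 5 CV runs.
runs = [
    {"g1", "g2", "g3"},
    {"g1", "g2", "g4"},
    {"g1", "g2", "g3"},
    {"g1", "g3", "g5"},
    {"g1", "g2", "g3"},
]
print(stable_variables(runs, threshold=0.8))  # ['g1', 'g2', 'g3']
```

With keepX at 790 of 800, almost every variable will look stable. This check is most informative once the tuning grid also includes much smaller keepX values.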

Hey sophia, what is your sample size? I ask because it sounds like you may be running into a very flat grid search, where the error rate barely changes across the keepX values tested. If so, you could try tuning criterion two from “A novel approach for biomarker selection and the integration of repeated measures experiments from two assays”.
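When the tuning curve is flat, a general model-selection heuristic can help: the one-standard-error rule. Among the keepX values tested, it picks the smallest one whose mean CV error is within one standard error of the minimum. As far as I know, tune.splsda does not apply this rule by default. The function name and the error/SE numbers below are made up for illustration.

```python
from typing import Dict, Tuple


def one_se_keepx(results: Dict[int, Tuple[float, float]]) -> int:
    """Smallest keepX whose mean error is within one SE of the best keepX.

    `results` maps keepX -> (mean CV error, standard error across repeats).
    """
    best_keepx = min(results, key=lambda k: results[k][0])
    best_mean, best_se = results[best_keepx]
    cutoff = best_mean + best_se
    # Among all keepX values "as good as" the best, take the sparsest one.
    return min(k for k, (mean, _) in results.items() if mean <= cutoff)


# Hypothetical, nearly flat tuning results: keepX -> (mean error, SE)
results = {
    10: (0.45, 0.02),
    50: (0.40, 0.02),
    200: (0.395, 0.02),
    790: (0.39, 0.02),
}
print(one_se_keepx(results))  # 50
```

The standard error would come from the spread of the error across repeated cross-validation, so this needs nrepeat greater than 1.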

Hi, my sample size is 100. Also, the group separation is not very good: my error rate is around 40% even with sPLS-DA. I will take a look at the paper you mentioned!

Tuning criterion two in that paper is intended for very small sample sizes, where cross-validation is not practical, so I’m not sure how useful it would be here.

Maybe an alternative method, such as elastic net, would give better results.
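For reference, the glmnet R package (one implementation you could use; this choice is an assumption) fits the elastic net for a two-group outcome as penalised logistic regression. Here $y_i \in \{0,1\}$, $x_i$ is the vector of variables for sample $i$, and $N$ is the sample size:

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}\Big[\,y_i\,(\beta_0 + x_i^\top\beta)
  - \log\!\big(1 + e^{\beta_0 + x_i^\top\beta}\big)\Big]
+ \lambda\Big[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Big]
```

Setting $\alpha = 1$ gives the lasso and $\alpha = 0$ gives ridge. The $\ell_1$ term sets some coefficients exactly to zero, so the selected variables are those with $\beta_j \neq 0$ at the chosen $\lambda$. The $\ell_2$ term tends to keep groups of correlated variables together rather than picking one arbitrarily.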

Hi @sophia, what does your list.keepX look like? (Testing up to all 800 variables seems excessive.) Did you use Mfold or leave-one-out cross-validation? If Mfold, what values of nrepeat and folds did you use?
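To show why folds and nrepeat matter, here is a minimal plain-Python sketch of repeated stratified M-fold assignment. In each repeat, the samples of each class are shuffled and dealt round-robin into M folds, so every fold keeps roughly the same class balance. This is not mixOmics' internal code. The function name, the seed, the balanced 50/50 labels, the 5 folds × 10 repeats setting and the keepX grid are all hypothetical.

```python
import random
from collections import defaultdict
from typing import List


def repeated_stratified_folds(
    labels: List[str], folds: int, nrepeat: int, seed: int = 0
) -> List[List[int]]:
    """For each repeat, return the fold index (0..folds-1) of every sample."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)

    assignments = []
    for _ in range(nrepeat):
        fold_of = [0] * len(labels)
        offset = 0
        for lab in sorted(by_class):
            idx = by_class[lab][:]
            rng.shuffle(idx)  # a new random split in every repeat
            for j, sample in enumerate(idx):
                # Deal samples round-robin so each fold gets a similar class mix.
                fold_of[sample] = (offset + j) % folds
            offset += len(idx)
        assignments.append(fold_of)
    return assignments


labels = ["A"] * 50 + ["B"] * 50  # hypothetical: 100 samples, 2 balanced groups
folds, nrepeat = 5, 10
assign = repeated_stratified_folds(labels, folds, nrepeat)

# Class-A samples in each fold of the first repeat.
print([sum(1 for i, f in enumerate(assign[0]) if f == k and labels[i] == "A")
       for k in range(folds)])  # [10, 10, 10, 10, 10]

# Hypothetical keepX grid: each value is refit in every fold of every repeat.
test_keepx = [5, 10, 20, 50, 100, 200, 400, 790]
print(folds * nrepeat * len(test_keepx))  # 400 fits for one component
```

Averaging over several repeats smooths out the luck of any single split, and the spread across repeats gives a standard error for the error rate. Leave-one-out uses as many folds as samples and is deterministic, so repeating it adds nothing.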

It would be easier to help you if you shared the perf and tune outputs and the code you used.

  • Christopher