Difference between PLS-DA and sPLS-DA

Hello,

I am quite new to the mixOmics community (and to PLS-DA analysis), but I am very happy with the website and R tutorials you make available. Thanks!

I have a question about my analysis. I have a binary response variable Y, 22 continuous predictor variables X, and around 400k rows. I want to perform a PLS-DA on these variables. I used the R script from the tutorial and started straight away with sPLS-DA (splsda()), without paying attention to the 'sparse' part of the name. I included all X variables in the analysis.

So what I ran was splsda(X, Y); I didn't set any other arguments.

But after a while I realised that I did not use the regular PLS-DA function (plsda()). Does it make a big difference when you include all variables (and don’t set keepX values or ncomp)?

In addition, it would be nice to know which variables lead to the best performance / are the most important.

What do you suggest?

Many thanks!
Annelies

hi @Lizzie

If you run splsda(X, Y) with no other arguments, it will fit a PLS-DA with all variables.
If you do perform variable selection, then yes, the classification performance might be different. You can use perf() (the stability output in particular), selectVar(), and the various graphical outputs to assess the importance of the variables.
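For example, a minimal sketch (assuming X is your numeric matrix of 22 variables and Y a factor with two levels; the keepX and nrepeat values are only illustrative):

```r
library(mixOmics)

# sPLS-DA with variable selection: e.g. keep 10 variables on each of 2 components
res.splsda <- splsda(X, Y, ncomp = 2, keepX = c(10, 10))

# Cross-validated performance; with nrepeat > 1 the output also reports how
# consistently each variable is selected across folds (the stability output)
res.perf <- perf(res.splsda, validation = "Mfold", folds = 10, nrepeat = 10)
res.perf$error.rate        # classification error rate per component
res.perf$features$stable   # selection frequency of each variable

# Variables selected on component 1, with their loading weights
selectVar(res.splsda, comp = 1)
```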

Kim-Anh

@kimanh.lecao

Dear Kim-Anh,

Thank you for your answer!

I only have 22 variables (in contrast to the mixOmics examples, which contain 2000+ variables). I do not know whether it is necessary to select certain variables; I think it would be fine to look at the loadings of each component and use those to evaluate the variables.

In that case, can I perform a PLS-DA with 10 components to start with, and use that to identify the optimal number of components based on the overall error rate per component?
So is there a difference between:

splsda(X, Y, ncomp = 10) and plsda(X, Y, ncomp = 10)?

Is it the keepX argument that defines the difference between PLS-DA and sPLS-DA?

I have one last question:
The idea is to work with three datasets: a training dataset, whose PLS-DA loadings will be applied to the validation dataset to identify the optimal number of components (perf() function with 10-fold CV); this optimal number of components will then be used for a final model on the test dataset, which will be used to evaluate the performance (again with the perf() function and 10-fold CV).
My question is: how can I apply the loadings of the PLS-DA components from the training dataset to the data of the validation dataset? Is there a way to do that in R?

Many thanks for your answer!

Kind regards,
Annelies

@Lizzie

In that case, can I perform a PLS-DA with 10 components to start with, and use that to identify the optimal number of components based on the overall error rate per component?
So is there a difference between:

splsda(X, Y, ncomp = 10) and plsda(X, Y, ncomp = 10)?

Is it the keepX argument that defines the difference between PLS-DA and sPLS-DA?

Yes. I recommend you read our associated publication if you are still unclear about the difference between the two. If you run splsda(X, Y, ncomp = 10) but do not specify keepX, then it reverts to a classical PLS-DA.
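To make that concrete, a small sketch (the keepX values here are arbitrary examples):

```r
library(mixOmics)

# Without keepX, splsda() fits exactly the same model as plsda():
model.plsda  <- plsda(X, Y, ncomp = 10)
model.splsda <- splsda(X, Y, ncomp = 10)   # no keepX -> all 22 variables kept

# Only keepX makes the model sparse, e.g. keep 5 variables per component:
model.sparse <- splsda(X, Y, ncomp = 10, keepX = rep(5, 10))

# Choose ncomp from the cross-validated error rate per component:
res.perf <- perf(model.plsda, validation = "Mfold", folds = 10)
plot(res.perf)   # error rates vs. number of components
```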

I have one last question:

The idea is to work with three datasets: a training dataset, whose PLS-DA loadings will be applied to the validation dataset to identify the optimal number of components (perf() function with 10-fold CV); this optimal number of components will then be used for a final model on the test dataset, which will be used to evaluate the performance (again with the perf() function and 10-fold CV).
My question is: how can I apply the loadings of the PLS-DA components from the training dataset to the data of the validation dataset? Is there a way to do that in R?

Have a look at the example here on how to use the predict function: http://mixomics.org/mixdiablo/case-study-tcga/
There you won't use block.splsda() but a normal plsda() or splsda(); the call to the predict function is the same, though. I might be able to provide further examples in the new year.
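As a rough sketch of that predict call on a plsda fit (X.train, Y.train and X.valid are placeholder names for your own data split):

```r
library(mixOmics)

# Fit on the training set only
train.plsda <- plsda(X.train, Y.train, ncomp = 10)

# Project the validation samples onto the training loadings
pred <- predict(train.plsda, newdata = X.valid)

pred$variates               # predicted component scores for the validation samples
pred$class$max.dist[, 10]   # predicted class using 10 components (max.dist rule)
```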

Kim-Anh