Hello,

I am quite new to the mixOmics community (and to PLS-DA), but I am very happy with the website and the R tutorials you make available. Thanks!

I have a question about my analysis. My data set has a binary response Y, 22 continuous predictors in X, and around 400k rows. I want to run a PLS-DA with these variables. I used the R script from the tutorial and went straight to sPLS-DA (splsda()) without paying attention to the ‘sparse’ in the name. I included all X variables in the analysis.

So what I ran was simply splsda(X, Y); I didn’t set any other arguments.

After a while I realised that I had not used the regular PLS-DA function, plsda(). Does it make a big difference which of the two I use when all variables are included and neither keepX nor ncomp is set?
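For concreteness, here is a minimal sketch of the two calls I am comparing, written with the arguments spelled out. It assumes the mixOmics package is installed and that, as I read the splsda() help page, leaving out keepX keeps all variables and ncomp defaults to 2 in both functions. X and Y here are simulated stand-ins, not my real data. The vip() and perf() calls at the end are how I would try to look at variable importance and performance, but I am not sure this is the recommended approach.

```r
# Sketch only: assumes the mixOmics package is installed.
# X and Y are simulated stand-ins for the real data (22 predictors, binary Y).
library(mixOmics)

set.seed(1)
n <- 500                                  # small n for speed; the real data has ~400k rows
X <- matrix(rnorm(n * 22), nrow = n,
            dimnames = list(NULL, paste0("x", 1:22)))
Y <- factor(sample(c("case", "control"), n, replace = TRUE))

ncomp <- 2                                # assumed default ncomp for both functions

# Regular PLS-DA: every variable contributes to every component
fit_plsda <- plsda(X, Y, ncomp = ncomp)

# sPLS-DA with keepX set to "keep all 22 variables on each component",
# assumed to be what splsda(X, Y) does when keepX is omitted
fit_splsda <- splsda(X, Y, ncomp = ncomp, keepX = rep(ncol(X), ncomp))

# Variable importance in projection (VIP) scores: one row per variable,
# one column per component; show the five highest on component 1
vip_scores <- vip(fit_plsda)
head(vip_scores[order(vip_scores[, 1], decreasing = TRUE), ], 5)

# Cross-validated classification error rates (likely slow with 400k rows)
cv <- perf(fit_plsda, validation = "Mfold", folds = 5, nrepeat = 3)
cv$error.rate
```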

In addition, I would like to know which variables give the best performance, i.e. which ones are the most important.

What do you suggest?

Many thanks!

Annelies