PLS-DA predictions over 100 splits of the data

kalemdjr · June 3, 2022, 5:08pm

Hi everyone.

I am using PLS-DA to predict a two-category outcome. I used the perf() function on the original dataset to determine the number of components for the final model (ncomp = 2). Now, for prediction, I split the data into training and test sets (80% and 20%). I want to repeat this process 100 times and compute the average AUC at the end. My questions are:
1) Should I use ncomp = 2 for each split?
2) Or should I determine the number of components (using perf()) on each of the 100 training sets? If so, how can I choose these numbers, given that with 100 splits I cannot inspect the perf plot for each one?

Thanks for your advice.

MaxBladen · June 9, 2022, 12:55am

I would certainly say option (1) is preferable. If every model uses the same ncomp, the models are comparable, and the average AUC reflects the performance of that one model specification. Option (2) is possible (explore the choice.ncomp component of the perf() output), but it would really only be used when evaluating how well perf() selects the optimal ncomp.
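A minimal sketch of option (1), assuming a binary outcome: 100 stratified 80/20 splits, a model refitted on each training part with ncomp fixed at 2, and the test AUCs averaged. This is not mixOmics code. It is Python/NumPy, and the PLS fit is a hand-rolled PLS1 (NIPALS) on a 0/1-coded response used as a stand-in for plsda(). For two classes this should give essentially the same components as PLS on the two-column dummy matrix, but it only centres X (plsda() also scales variables by default). The helper names (pls1_fit, auc, stratified_split, repeated_split_auc) and the synthetic data are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, ncomp):
    """Fit PLS1 by NIPALS on a 0/1-coded response (stand-in for PLS-DA).

    Predictions are (X_new - x_mean) @ coef + y_mean.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean           # centre only (no scaling)
    W, P, q = [], [], []
    for _ in range(ncomp):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)                # unit-norm weight vector
        t = Xk @ w                            # component scores
        tt = t @ t
        p = Xk.T @ t / tt                     # X loadings
        c = (yk @ t) / tt                     # y loading
        Xk = Xk - np.outer(t, p)              # deflate X
        yk = yk - c * t                       # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)    # regression coefficients in X space
    return x_mean, y_mean, coef


def auc(scores, labels):
    """AUC = P(random positive scores above random negative), ties count 1/2."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()


def stratified_split(labels, test_frac, rng):
    """Boolean train/test masks keeping roughly test_frac of each class in test."""
    test_idx = []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = max(1, int(round(test_frac * idx.size)))
        test_idx.extend(idx[:n_test])
    test_mask = np.zeros(labels.size, dtype=bool)
    test_mask[test_idx] = True
    return ~test_mask, test_mask


def repeated_split_auc(X, y, ncomp=2, n_splits=100, test_frac=0.2, seed=0):
    """Option (1): same ncomp in every split, one test AUC per split."""
    rng = np.random.default_rng(seed)
    aucs = []
    for _ in range(n_splits):
        train, test = stratified_split(y, test_frac, rng)
        x_mean, y_mean, coef = pls1_fit(X[train], y[train].astype(float), ncomp)
        scores = (X[test] - x_mean) @ coef + y_mean   # continuous prediction
        aucs.append(auc(scores, y[test]))
    return np.array(aucs)


# Synthetic two-class data (hypothetical, not from the thread): 60 samples,
# 50 variables, class 1 shifted by 2 units on the first 5 variables.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 50))
X[y == 1, :5] += 2.0

aucs = repeated_split_auc(X, y, ncomp=2)
print(f"mean AUC over {aucs.size} splits: {aucs.mean():.3f} (sd {aucs.std(ddof=1):.3f})")
```

Reporting the spread of the 100 AUCs alongside the mean shows how much the estimate depends on the particular split.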

kalemdjr · June 9, 2022, 1:12am

Thanks, Max, for the quick answer. I was leaning towards option (1) and wanted the mixOmics support team's point of view.
