Perf() and tune() producing different optimal component counts

MaxBladen · May 15, 2022, 10:08pm

The choice.ncomp value returned by both tune() and perf() uses a t-test to determine whether adding another component reduces the BER (balanced error rate) to a statistically significant degree. However, these functions are not evaluating the same models at each step. You provided a standard plsda to perf(), meaning all features were used for each component. In comparison, you provided the object keepXlist to tune.splsda(), meaning that the number of features used for each component is restricted compared to perf().

It’s likely that in this case a difference in component composition between the models created by perf() and tune() (due to different sets of features being considered) is causing the mismatch between outputs. For instance, the decrease in BER between 1 and 2 components seen via perf() reflects a model using all features and is hence statistically significant. When tune() uses only a subset of these features, the BER change between 1 and 2 components becomes less statistically significant. Hence, tune() suggests keeping just 1 component whereas perf(), with access to all features, suggests using 6 components.
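To make the reasoning above concrete, here is a minimal base R sketch of a "add a component only if the BER drops significantly" rule. This is a simplified illustration, not the actual mixOmics implementation: choose_ncomp(), the BER values and the alpha of 0.01 are all made up for the example.

```r
# Simplified sketch (NOT the mixOmics internals): keep adding components
# only while a one-sided t-test says the BER is significantly lower.
# 'ber' is a matrix with one row per CV repeat and one column per component.
choose_ncomp <- function(ber, alpha = 0.01) {
  best <- 1
  for (j in seq_len(ncol(ber))[-1]) {
    # H1: BER at the current best component count is greater than at j
    p <- t.test(ber[, best], ber[, j], alternative = "greater")$p.value
    if (p < alpha) best <- j
  }
  best
}

# Hypothetical BER per repeat for a model using all features:
# a large, consistent drop from 1 to 2 components, none from 2 to 3.
ber_full <- cbind(
  comp1 = c(0.40, 0.42, 0.38, 0.41, 0.39),
  comp2 = c(0.20, 0.22, 0.19, 0.21, 0.18),
  comp3 = c(0.20, 0.21, 0.19, 0.22, 0.18)
)

# Hypothetical BER per repeat for a sparse model:
# a smaller, noisier drop from 1 to 2 components.
ber_sparse <- cbind(
  comp1 = c(0.40, 0.30, 0.45, 0.25, 0.35),
  comp2 = c(0.30, 0.38, 0.22, 0.36, 0.24)
)

print(choose_ncomp(ber_full))    # 2
print(choose_ncomp(ber_sparse))  # 1
```

In the second case the mean BER still goes down with the extra component, but the drop is not significant across repeats, so the rule stays at 1 component. That is the kind of mismatch described above.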

As an added note, one might conclude from this that perf() should be preferred because it takes in “the most information” to make its decision. This is not the case. There are many downsides to using all (in your case, 300 000) features, such as bloated runtime, potential overfitting and a lack of interpretability. The general pipeline I always suggest is:

1. Use perf() on a model with an arbitrarily high number of components. The output of this will indicate the maximum ncomp to worry about
2. Determine roughly how much you want to simplify your model (i.e. how many features to consider per component)
3. Run the tune() function iteratively with the ncomp from perf(). From this, determine the exact number of features and components that yield the optimal model. I’d suggest following my advice on this post for determining the test.keepX values
4. Use your final, optimised model
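The pipeline above could be sketched roughly as follows. Treat this as a starting point under assumptions rather than a definitive recipe: run_pipeline() is a hypothetical helper, the keepX grid, folds, nrepeat and the choice of "max.dist" are placeholder values, and the srbct example data shipped with mixOmics stands in for your own X and Y.

```r
# Hypothetical helper wrapping the suggested pipeline; all default values
# below are illustrative assumptions, not recommendations.
run_pipeline <- function(X, Y, ncomp_start = 6, keepX_grid = c(5, 10, 25, 50),
                         folds = 5, nrepeat = 5) {
  # Step 1: non-sparse PLS-DA with a deliberately high ncomp, then perf()
  full_model <- mixOmics::plsda(X, Y, ncomp = ncomp_start)
  full_perf <- mixOmics::perf(full_model, validation = "Mfold", folds = folds,
                              nrepeat = nrepeat, progressBar = FALSE)
  # Upper bound on the number of components worth tuning
  ncomp_max <- full_perf$choice.ncomp["BER", "max.dist"]

  # Steps 2-3: tune the number of features per component up to ncomp_max
  tuned <- mixOmics::tune.splsda(X, Y, ncomp = ncomp_max,
                                 test.keepX = keepX_grid,
                                 validation = "Mfold", folds = folds,
                                 dist = "max.dist", measure = "BER",
                                 nrepeat = nrepeat, progressBar = FALSE)
  ncomp_final <- tuned$choice.ncomp$ncomp
  # Fall back to the perf() value if tune.splsda() returns no component choice
  if (is.null(ncomp_final)) ncomp_final <- ncomp_max
  keepX_final <- tuned$choice.keepX[seq_len(ncomp_final)]

  # Step 4: final sparse model
  mixOmics::splsda(X, Y, ncomp = ncomp_final, keepX = keepX_final)
}

# Usage on the srbct example data (only runs if mixOmics is installed)
if (requireNamespace("mixOmics", quietly = TRUE)) {
  data("srbct", package = "mixOmics")
  set.seed(42)
  final_model <- run_pipeline(srbct$gene, srbct$class)
  print(final_model$ncomp)
  print(final_model$keepX)
}
```

The small nrepeat is only there to keep the example quick. With real data you would want more repeats, and a keepX grid chosen to match how sparse you want the model to be.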

Hope this was of some help.

Max
