perf() and tune() producing different optimal component counts

The choice.ncomp value returned by both tune() and perf() uses a t-test to determine whether the addition of another component reduces the BER to a statistically significant degree. However, these functions are not using the same models at each step. You provided a standard plsda object to perf(), meaning all features were used for each component. In comparison, you provided keepXlist to tune.splsda() as the candidate keepX values, meaning that the number of features used for each component is restricted compared to perf().
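Schematically, the two calls are therefore evaluating quite different models. A minimal sketch, assuming X is your predictor matrix, Y your class factor and keepXlist your candidate keepX values (the object names, ncomp and cross-validation settings are placeholders, not your exact call):

```r
library(mixOmics)

## perf() cross-validates the model you hand it: a full PLS-DA in which
## every one of the ~300 000 features contributes to every component
plsda.full <- plsda(X, Y, ncomp = 10)
perf.res   <- perf(plsda.full, validation = "Mfold", folds = 5, nrepeat = 10)

## tune.splsda() instead refits sparse models internally, keeping only the
## candidate numbers of features in keepXlist on each component
tune.res <- tune.splsda(X, Y, ncomp = 10,
                        test.keepX = keepXlist,
                        validation = "Mfold", folds = 5, nrepeat = 10)
```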

It’s likely that, in this case, a difference in component composition between the models created by perf() and tune() (due to different sets of features being considered) is causing the mismatch between outputs. For instance, the decrease in BER between 1 and 2 components seen via perf() reflects a model using all features and is hence statistically significant. Once only a subset of these features is used via the tune() function, the BER change between 1 and 2 components is no longer significant. Hence, tune() suggests keeping just 1 component whereas perf(), with access to all features, suggests using 6 components.
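If you want to see this directly, both result objects report the chosen number of components and the error rates the t-test was run on (again using the hypothetical object names from above; exact slot names may differ slightly between mixOmics versions):

```r
perf.res$choice.ncomp         # ncomp chosen on the full-feature model, per distance
perf.res$error.rate$BER       # BER per component, all features
tune.res$choice.ncomp$ncomp   # ncomp chosen on the keepX-restricted models
tune.res$error.rate           # error rate per keepX value and component
plot(perf.res)                # error curves for the full model
plot(tune.res)                # error curves across the keepX grid
```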

As an added note, one might take from this that perf() should be used in preference as it takes in “the most information” to make its decision. This is not the case. There are many downsides to using all (in your case, 300 000) features, such as bloated runtime, potential overfitting and a lack of interpretability. The general pipeline I always suggest (sketched in R after the list) is:

  • Use perf() on a model built with an arbitrarily high number of components
  • The output of this will indicate your maximum ncomp to worry about
  • Determine roughly how much you want to simplify your model (i.e. how many features to consider per component)
  • Run the tune() function iteratively with the ncomp from perf(). From this, determine the exact number of features and components to yield the optimal model. I’d suggest following my advice in this post for determining the test.keepX values
  • Use your final, optimised model
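Putting the last few steps together, and continuing from the hypothetical X, Y and cross-validation settings above (the keepX grid and the 6-component bound are only illustrative; use the values perf() and your own iterations give you):

```r
## Steps 1-2: take the upper bound on components from perf()
perf.res$choice.ncomp              # e.g. the BER / max.dist entry
ncomp.max <- 6                     # whatever perf() suggested for your data

## Steps 3-4: tune the number of features per component up to that bound,
## refining the keepX grid over successive runs
keepX.grid <- c(5, 10, 25, 50, 100, 200)
tune.res <- tune.splsda(X, Y, ncomp = ncomp.max,
                        test.keepX = keepX.grid,
                        validation = "Mfold", folds = 5, nrepeat = 10,
                        dist = "max.dist", measure = "BER")

ncomp.opt <- tune.res$choice.ncomp$ncomp       # components kept after the t-test
keepX.opt <- tune.res$choice.keepX[1:ncomp.opt] # features kept per component

## Step 5: fit the final, optimised sparse model
final.splsda <- splsda(X, Y, ncomp = ncomp.opt, keepX = keepX.opt)
```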

Hope this was of some help.

Max