I just have a quick question about what exactly $choice.keepX and $choice.keepY mean. After running perf and then tune.spls (with ncomp=10 and a list of keepX values to test), tune.spls$choice.keepX returns all the components and the number of features for each one.

I thought this function is supposed to return the optimal number of components to be used?

Or is it just telling me the optimal number of features to use with each comp?

For example this would be my output:

```
> tune.spls.cor$choice.kncomp
> tune.spls.cor$choice.keepX
 comp1  comp2  comp3  comp4  comp5  comp6  comp7  comp8  comp9 comp10
   100     25     50    500     25     25     25     25    500    300
```

So, how do I know which comp to actually use?

Here is the figure that is produced:

Am I using the comp with the highest correlation value? I cannot find a clear description on what this figure is supposed to show me. From the figure I would think to use comp 1 with any of the keepX values because they’re all about cor=1?

> I thought this function is supposed to return the optimal number of components to be used?
>
> Or is it just telling me the optimal number of features to use with each comp?

So `tune()` actually does both of these things! As you identified, the `$choice.keepX` object tells you the optimal number of features to use to construct each given component. The `$choice.ncomp` object (which you’ve typed as `choice.kncomp`, presumably just a typo) will tell you the optimal number of components to use for your model. You can also use the `perf()` function to determine the optimal number of components, but it’s usually easier to just use `tune()`.

Based on your figure, it seems that a single component is optimal, because the correlation is already maximised on the first component. I would advise against using visual inspection to determine this though; make sure you use `choice.ncomp`. A t-test is used to determine the optimal component count, so the figure can sometimes be misleading (though that is not the case in your example).
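To make that concrete, here is a minimal R sketch of how the two outputs fit together, using the `choice.keepX` values from your post. The value of `ncomp.opt` is an assumption standing in for whatever `choice.ncomp` reports on your data (check its structure with `str()` first), and `X` and `Y` in the commented-out last line are placeholders for your own data.

```r
# choice.keepX as printed in the question: one tuned value per component
choice.keepX <- c(comp1 = 100, comp2 = 25, comp3 = 50, comp4 = 500, comp5 = 25,
                  comp6 = 25, comp7 = 25, comp8 = 25, comp9 = 500, comp10 = 300)

# Assumed here: the tuning reports a single optimal component
ncomp.opt <- 1

# Keep only the keepX entries for the components you will actually use
keepX.final <- choice.keepX[seq_len(ncomp.opt)]
print(keepX.final)

# The final model would then be fitted along these lines (X and Y are placeholders):
# final.model <- spls(X, Y, ncomp = ncomp.opt, keepX = keepX.final)
```

The `keepX` values for components beyond `ncomp.opt` are simply not used in the final model.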

> Am I using the comp with the highest correlation value? I cannot find a clear description on what this figure is supposed to show me. From the figure I would think to use comp 1 with any of the keepX values because they’re all about cor=1?

You’ve mostly got the right idea! When building models, we are engaging in a constant balancing act. In most scenarios, adding more components, each using more features, will increase model accuracy. However, a simpler model (fewer components and features) is preferable. So we’re trying to maximise model accuracy (measured here by correlation) while using the minimal number of features and components.

For example, on your first component, 100 features are selected even though using 300, 500, or 1000 increases the correlation value. This is because the aforementioned t-test determined that adding features beyond 100 improves accuracy only negligibly while vastly increasing complexity, which is not optimal. Hence, 100 features strikes the balance between accuracy and complexity.
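The idea can be sketched in a few lines of base R. This is a simplified illustration of the principle, not mixOmics’s own code: the correlation values are made up, and `choose.keepX()` is a hypothetical helper. It starts from the smallest candidate and only moves to a larger `keepX` when a one-sided t-test says the gain in correlation across repeats is significant.

```r
# Hypothetical cross-validated correlations (one value per repeat) for each
# candidate keepX on a single component. These numbers are invented.
cv.cor <- list(
  "25"  = c(0.70, 0.72, 0.71, 0.69, 0.73),
  "100" = c(0.90, 0.92, 0.91, 0.89, 0.93),
  "300" = c(0.91, 0.92, 0.90, 0.93, 0.90)
)

# Hypothetical helper: walk through candidates from smallest to largest and
# switch to a larger keepX only if it is significantly better than the
# current choice (one-sided Welch t-test).
choose.keepX <- function(cv.cor, alpha = 0.05) {
  best <- 1
  for (j in seq_along(cv.cor)[-1]) {
    p <- t.test(cv.cor[[j]], cv.cor[[best]], alternative = "greater")$p.value
    if (p < alpha) best <- j
  }
  as.numeric(names(cv.cor)[best])
}

selected <- choose.keepX(cv.cor)
print(selected)
```

With these numbers, 300 features has a slightly higher mean correlation than 100 (0.912 vs 0.910), but the difference is not significant, so 100 is retained.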

I hope this all helps with your understanding a little bit. Please reach out if not.

Thank you so much! Your explanations are extremely helpful. I understand this a lot better now.