Spls / keepx / keep specific variables

Hello,

Thank you for creating this tool.

I’m using the “spls” function to select variables, but I want to keep specific variables in my model (i.e. not have them assigned a coefficient of 0). Is it possible to achieve that, please?

Have a nice day,
Jérémy Tournayre

I was working on an implementation for this exact feature some weeks ago. Unfortunately, the conclusion I came to was that the most mathematically valid way to achieve this is the simplest. Hence, here’s what I’d do:

  • Use tune.spls to determine the optimal number of features.
  • Take these values and pass them to spls to give you a list of the optimal features for your analysis.
  • Extract these features (by name or index), along with your desired features to keep, from your input dataframe.
  • Run pls (the NON-sparse variant) using just this subset dataframe which includes your desired features and your selection of the optimal features selected by the previous model.
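The steps above can be sketched roughly as follows. This is a minimal, illustrative R sketch using toy data: the matrices `X`/`Y`, the `keepX` values, and the `must.keep` feature names are all placeholders for your own, and the tuning step is shown only in a comment to keep the example fast.

```r
library(mixOmics)
set.seed(42)

# Toy data standing in for your own: 40 samples, 30 named predictors
X <- matrix(rnorm(40 * 30), 40, dimnames = list(NULL, paste0("gene", 1:30)))
Y <- matrix(rnorm(40 * 2), 40)

# Step 1 (tuning) would be something like:
#   tune.res <- tune.spls(X, Y, ncomp = 2, test.keepX = c(5, 10, 15),
#                         validation = "Mfold", folds = 5)
# and you would take tune.res$choice.keepX; hard-coded here as a placeholder.
choice.keepX <- c(10, 10)

# Step 2: fit spls with the tuned keepX and list the selected features
spls.res <- spls(X, Y, ncomp = 2, keepX = choice.keepX)
selected <- unique(unlist(lapply(1:2, function(k)
  selectVar(spls.res, comp = k)$X$name)))

# Step 3: union of the optimal features and those you must keep
must.keep <- c("gene25", "gene30")          # placeholder names
subset.features <- union(selected, must.keep)

# Step 4: refit the NON-sparse pls() on this subset only
pls.res <- pls(X[, subset.features], Y, ncomp = 2)
```

The key point is that the forced-in features only enter at the final, non-sparse fit, so the earlier tuning and selection steps are left mathematically untouched.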

I hope this makes sense. If not, let me know and I can go into a bit more depth for you.

Hello,

Thanks!
If I understand correctly, the specific features will be added artificially on top of the optimal features. So the model will not be “sparse”, no?

To be truly optimal, I think the specific features should be declared within spls itself (i.e. as early as possible), so that some of the optimal features can be removed when the specific features replace them.

Maybe I misunderstood something?

Have a nice day,
Jérémy Tournayre

If I understand correctly, the specific features will be added artificially on top of the optimal features. So the model will not be “sparse”, no?

In the context of mixOmics, “sparse” just refers to methods that select a subset of features rather than using all input features. By taking the features selected by spls() in addition to your specific features, the model is still “sparse”, as only a subset of the input features is being used.

To be truly optimal, I think the specific features should be declared within spls itself (i.e. as early as possible), so that some of the optimal features can be removed when the specific features replace them.

I believed the same when attempting this implementation. However, it turns out to be entirely inappropriate. The short version is that by artificially setting some features aside, the mathematical consistency of the function is lost. The long version you can find by reading my responses here and here.

A potential amendment to my previous comment in this thread: in my last bullet point I suggested using the NON-sparse method on the subset of selected plus desired features. Using the sparse method (e.g. spls()) might be better here to address your concern: with the sparse method, the specific features can “replace” the optimal features if this results in a better model.
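To illustrate the amended suggestion as a hedged sketch: here `subset.features` stands in for the union of a tuned selection and your forced-in features (the names, data, and `keepX` values are all placeholders). Because `keepX` is smaller than the subset size, the second sparse fit is free to drop some of the originally “optimal” features in favour of your specific ones.

```r
library(mixOmics)
set.seed(1)

# Toy data: 40 samples, 30 named predictors
X <- matrix(rnorm(40 * 30), 40, dimnames = list(NULL, paste0("gene", 1:30)))
Y <- matrix(rnorm(40 * 2), 40)

# Suppose these came from tune.spls/selectVar plus your forced features
subset.features <- c(paste0("gene", 1:10), "gene25", "gene30")

# Refit the sparse method on just this subset; keepX (per component) is
# smaller than the subset size, so some features can still be dropped
spls.sub <- spls(X[, subset.features], Y, ncomp = 2, keepX = c(8, 8))

# Which features survived on component 1
selectVar(spls.sub, comp = 1)$X$name
```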

Let me know if this all makes sense.