sPLS tuning always selecting minimum in range of test values

I have a dataset of around 3000 preprocessed features for 164 observations (Y) that I want to relate to a dataset of 17 exposures for the same 164 people (X). Following the PLS2 documentation, I am trying to run an sPLS model to answer the question “if I consider the features as response data, can I model the features given the predictor variables of the exposure data?”. However, whenever I specify a range of test values for X and Y, the smallest value in the range is always returned as the choice.keep, no matter what range I put in. Has anyone had similar issues? Is there anything else that I can try?

Hi @bolomics,

First, I would probably swap the roles of the two datasets and consider:
Y = exposure
X = your ‘pre-processed features’
The question then becomes: which features, in combination, can explain specific (or all) exposures? (You can then try Y = a specific exposure type, or select several exposures in combination with sPLS2.)

Next, I’d start with a PCA on the exposure data and look at the plotVar correlation circle plot to understand the correlations and associations between exposure types.
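
A rough sketch of that PCA step, in case it helps. The `exposure` matrix here is simulated stand-in data (164 x 17, with made-up column names), and `ncomp = 3` and `scale = TRUE` are only assumptions on my side, so adjust them to your own data:

```r
library(mixOmics)

set.seed(42)

# Simulated stand-in for the real exposure data: 164 people x 17 exposures.
# Replace this with your own matrix or data frame.
n <- 164
exposure <- matrix(rnorm(n * 17), nrow = n, ncol = 17)
colnames(exposure) <- paste0("exposure_", 1:17)

# PCA on the exposures; scaling is assumed because exposures are
# often measured in different units.
pca.exposure <- pca(exposure, ncomp = 3, center = TRUE, scale = TRUE)

# Correlation circle plot: exposures that sit close together near the
# circle are positively correlated on the plotted components.
plotVar(pca.exposure, comp = c(1, 2))
```

Groups of exposures that cluster together on this plot give a first idea of how many exposure types are worth keeping per component later on.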

Then, since tuning PLS in that context is very tricky, I would adopt a more lenient approach and select, say, 50 features per component and about 5 exposure types (depending on what you find above). Again, I’d look at the plotVar plot to see if the selection makes sense. Finally, I’d use the perf function to evaluate how well this model is doing.
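
As a minimal sketch of this approach, not a definitive recipe: the data below are simulated (300 toy features rather than ~3000, to keep it quick), and `ncomp = 2`, `folds = 5` and `nrepeat = 10` are arbitrary assumptions to be adapted:

```r
library(mixOmics)

set.seed(42)

# Simulated stand-in data: X = features, Y = exposures, same 164 people.
n <- 164
X <- matrix(rnorm(n * 300), nrow = n)
colnames(X) <- paste0("feature_", 1:300)
Y <- matrix(rnorm(n * 17), nrow = n)
colnames(Y) <- paste0("exposure_", 1:17)

# Add a shared latent signal so that the toy data have something to find.
latent <- rnorm(n)
X[, 1:50] <- X[, 1:50] + latent
Y[, 1:5] <- Y[, 1:5] + latent

# Fix the number of selected variables instead of tuning it:
# 50 features and 5 exposures on each component.
ncomp <- 2
spls.model <- spls(X, Y, ncomp = ncomp, mode = "regression",
                   keepX = rep(50, ncomp), keepY = rep(5, ncomp))

# Correlation circle plot of the selected features and exposures.
plotVar(spls.model, comp = c(1, 2))

# Repeated M-fold cross-validation to assess the model.
perf.spls <- perf(spls.model, validation = "Mfold", folds = 5, nrepeat = 10)

# Names of the features selected on component 1.
selected.features <- selectVar(spls.model, comp = 1)$X$name
```

The `keepX` and `keepY` values can then be nudged up or down by hand, checking the plotVar plot and the perf output each time, rather than relying on the automatic tuning.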

Kim-Anh