Proportion explained variance in PLS vs sPLS model

Hi Eva,

This was very helpful, thank you very much! I spent some more time reading around the topic and your

I have a few follow-up questions, if that’s OK. Please let me know if I should create a new post for one or more of them!

What is the right thing to do when the initial step, e.g. in my case tune_pls1_plasma <- mixOmics::tune(df_plasma, df_plasma_traits[, "age"]), suggests 1 component, but the next step to select metabolites suggests 2 components? Should I keep 1 or 2 in this case?

In terms of using perf() to assess model performance, how do I interpret MSEP/RMSEP/R2/Q2? I understand that lower MSEP/RMSEP indicate a better predictive accuracy, a higher R2 suggests a better fit to the training data and a higher Q2 suggests a better predictive ability. However, what does getting an MSEP of 1.04 ± 0.23 mean? Also, as I have so few samples, my Q2 starts close to 0 or even negative, and adding a 2nd component as suggested by the tuning process always makes it smaller/more negative. Does this mean I am overfitting my data?
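
To make sure I am reading these numbers correctly, here are the definitions I am working from. This is only my understanding of the usual cross-validation formulas, written in my own notation, so please correct me if perf() computes any of them differently:

```latex
% y_i           : observed response (age) for sample i, i = 1..n
% \hat{y}_{(i)} : prediction for sample i from a model fitted without sample i's fold
% h             : number of components in the model
\[
\mathrm{MSEP}_h = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_{(i)}\bigr)^2,
\qquad
\mathrm{RMSEP}_h = \sqrt{\mathrm{MSEP}_h}
\]
\[
Q^2_h = 1 - \frac{\mathrm{PRESS}_h}{\mathrm{RSS}_{h-1}},
\qquad
\mathrm{PRESS}_h = \sum_{i=1}^{n}\bigl(y_i - \hat{y}_{(i)}\bigr)^2
\]
% RSS_{h-1} : residual sum of squares of the fitted model with h-1 components
```

If this is right, Q2 becomes negative whenever PRESS for h components is larger than RSS for h-1 components, i.e. the extra component predicts held-out samples worse than the previous model fits the data. Also, if age is scaled to unit variance (an assumption on my part), an MSEP of around 1 would be roughly what I would get by always predicting the mean age.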

Finally, I have a more general question so I can understand which parts of the dataset to apply these techniques to. I have paired data, i.e. samples collected before and after an intervention, and for each subject I have their sex and age.
Should I perform ALL of the following?

  1. PLS for age and PLSDA for sex in the baseline data
  2. PLS for age and PLSDA for sex in the post data
  3. PLS for age, PLSDA for sex and PLSDA for the intervention effect in the full data

I was thinking that if I had a simpler study design that wasn’t metabolomics, for example if I had measured weight before and after a marathon race, I’d analyse whether there were age or sex differences before the race and after the race, as well as analyse the weight change that occurred as a result of the race.

Thank you in advance for your help! :slight_smile:

Best wishes,
Evelyn