Hi Eva,
This was very helpful, thank you very much! I spent some more time reading around the topic and your
I have a few follow-up questions, if that’s OK. Please let me know if I should create a new post for one or more of them!
What is the right thing to do when the initial step, e.g. in my case tune_pls1_plasma <- mixOmics::tune(df_plasma, df_plasma_traits[, "age"]), suggests 1 component, but the next step to select metabolites suggests 2 components? Should I keep 1 or 2 in this case?
When using perf() to assess model performance, how do I interpret MSEP/RMSEP/R2/Q2? I understand that a lower MSEP/RMSEP indicates better predictive accuracy, a higher R2 suggests a better fit to the training data, and a higher Q2 suggests better predictive ability. However, what does getting an MSEP of 1.04 ± 0.23 mean? Also, as I have so few samples, my Q2 starts close to 0 or even negative, and adding a 2nd component as suggested by the tuning process always makes it smaller/more negative. Does this mean I am overfitting my data?
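To make sure I am reading these metrics correctly, here is a minimal sketch of how I understand them, written in plain Python rather than R. It uses the textbook definitions (MSEP as the mean squared prediction error, Q2 as 1 - PRESS/TSS); I am assuming these match perf() only approximately, since perf() may compute Q2 per component in a different way. The function names and toy numbers are made up for illustration and are not my real data.

```python
import math

def msep(y_true, y_pred):
    """Mean squared error of prediction over held-out samples."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmsep(y_true, y_pred):
    """Root of MSEP, expressed in the same units as the response."""
    return math.sqrt(msep(y_true, y_pred))

def q2(y_true, y_pred_cv):
    """Textbook Q2 = 1 - PRESS / TSS, using cross-validated predictions.

    Assumption: this is the generic definition, not necessarily the
    exact per-component formula used by mixOmics.
    """
    mean_y = sum(y_true) / len(y_true)
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred_cv))
    tss = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - press / tss

# Toy response values (hypothetical).
y = [1.0, 2.0, 3.0, 4.0]

# Always predicting the mean gives Q2 = 0: no predictive ability.
mean_only = [2.5, 2.5, 2.5, 2.5]

# Predictions that are worse than the mean give a negative Q2.
worse_than_mean = [4.0, 3.0, 2.0, 1.0]

print(msep(y, mean_only), q2(y, mean_only))
print(msep(y, worse_than_mean), q2(y, worse_than_mean))
```

If this is right, a Q2 near 0 would mean the cross-validated predictions are about as good as always predicting the mean, and a negative Q2 would mean they are worse than that.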
Finally, I have a more general question so I can understand which parts of the dataset to apply these techniques to. I have paired data, i.e. samples collected before and after an intervention, and for the subjects I have their sex and age.
Should I perform ALL of the following?
- PLS for age and PLSDA for sex in the baseline data
- PLS for age and PLSDA for sex in the post data
- PLS for age, PLSDA for sex and PLSDA for the intervention effect in the full data
I was thinking that if I had a simpler study design that wasn’t metabolomics, for example if I had measured weight before and after a marathon race, I’d analyse if there were age or sex differences before the race and after the race, as well as analyse the weight change that occurred as a result of the race.
Thank you in advance for your help!
Best wishes,
Evelyn