Hi,
I have just started using mixOmics and I’m really enjoying exploring the package so far. I’m a PhD student working on a multi-omics study with longitudinal samples collected at 12-week intervals from 15 patients during a clinical trial.
Currently, I’m concerned with proteomics data (250 proteins). I want to answer ‘Alterations in which proteins are associated with treatment response?’ After studying your website and forum, I have decided to use sPLS1 with multilevel adjustment for repeated samples from individual patients.
My samples
Patients = 15
Timepoints = 4
My sPLS variables
X = Proteomics data (counts of 250 proteins)
y = Total Improvement Score (a continuous compound clinical measurement used to assess response of patients to drug - 0 is minimum, 100 is maximum)
Multilevel = Patient ID
My question for the forum is: will looking at the proteins with the greatest loadings on sPLS component 1 tell me, e.g., ‘an increase/decrease in protein 1 is most important for an increase in y’? Or is there another way you’d suggest I answer this question?
Yes, the proteins with the greatest loadings on the sPLS components are the ones that are most informative in distinguishing your samples based on improvement score. You can also try tuning with a range of smaller test.keepX values (I see you settled on 250, but perhaps even fewer will work?). The variables that remain after tuning will also be the ones that are best at distinguishing the total improvement score outcome.
I’ve tried tuning with 100 proteins and found that the optimal keepX was 14 proteins within this range.
However, the MSEP for this model was 0.60, compared to 0.50 with keepX = 250 proteins.
Would you agree that this suggests I should stick with keepX = 250?
I think the thing to consider here is what you would like your end goal to be.
If you would like a short list of proteins (say 10 or so) that you can dig deeper into, you can take the proteins that have the highest loadings in your sPLS1 model. You can visualise and extract these using the plotLoadings() function.
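To make the "highest loadings" idea concrete, here is a minimal, language-agnostic sketch in plain Python (not mixOmics code). The protein names and loading values are hypothetical, made up purely for illustration. It ranks the selected proteins by the absolute value of their component 1 loading. The sign of a loading only tells you the direction of that protein's weight on the component, and the overall sign of a component is arbitrary, so it is worth checking the direction against how the component relates to y before reading it as "increase" or "decrease".

```python
# Hypothetical component-1 loadings; in practice these would be read
# from the fitted sPLS model rather than typed in by hand.
loadings = {
    "protein_A": 0.62,
    "protein_B": -0.48,
    "protein_C": 0.05,
    "protein_D": 0.00,  # zero loading: not selected by the sparse model
    "protein_E": -0.31,
}


def top_loadings(loadings, n):
    """Return the n selected proteins with the largest absolute loading."""
    # Drop proteins with a zero loading (not selected), then rank by magnitude.
    selected = [(name, w) for name, w in loadings.items() if w != 0]
    selected.sort(key=lambda item: abs(item[1]), reverse=True)
    return selected[:n]


top = top_loadings(loadings, 3)
for name, weight in top:
    direction = "positive" if weight > 0 else "negative"
    print(f"{name}: loading {weight:+.2f} ({direction} weight on component 1)")
```

Ranking by magnitude rather than raw value matters here: a strongly negative loading is just as informative as a strongly positive one.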
If instead you would like to build a model that can accurately predict improvement score from your proteomics data, I would recommend running tuning between 1 and 250 proteins until you find the optimal number of proteins. You can run a broad range first (like every 20) and then a more fine-grained grid around the values where you saw the lowest error.
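The broad-then-fine search could be set up roughly as follows. This is only a sketch in plain Python of how the two candidate grids might be built; the helper names `coarse_grid` and `fine_grid` and the value of `best_coarse` are assumptions for illustration, not part of mixOmics. The resulting lists are the kind of values you would pass as test.keepX in each tuning round.

```python
def coarse_grid(n_vars, step):
    """Candidate keepX values from 1 up to n_vars in jumps of `step`,
    always including n_vars itself as the last candidate."""
    grid = list(range(1, n_vars + 1, step))
    if grid[-1] != n_vars:
        grid.append(n_vars)
    return grid


def fine_grid(best, step, n_vars):
    """Every integer within one coarse step of the best coarse value,
    clipped to the valid range [1, n_vars]."""
    low = max(1, best - step)
    high = min(n_vars, best + step)
    return list(range(low, high + 1))


# Round 1: broad grid over all 250 proteins, every 20.
coarse = coarse_grid(250, 20)

# Hypothetical result of round 1: the coarse value with the lowest
# cross-validated error. Replace with whatever your own tuning reports.
best_coarse = 21

# Round 2: fine grid around that value.
fine = fine_grid(best_coarse, 20, 250)

print(coarse)
print(fine[0], fine[-1], len(fine))
```

Keeping 250 in the coarse grid means the full, non-sparse model is always one of the candidates, so its error can be compared directly with the sparser options.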