Need help in reviewing data analysis

Hi mixOmics team,

I have run a multivariate analysis to compare the metabolomic profiles of authentic and adulterated rice samples, using the mixOmics package (version 6.15.0) in R (version 4.0.4). The analysis completed without any errors, but I am not sure the analysis pipeline is sound, and there is no one on my team who can help me check the results. Does the mixOmics team provide help in such situations?

Best regards,
Hoa

Hi @hoanq8x,
Please feel free to ask all the mixOmics-related questions you have. If you want feedback on your pipeline, you can just post your script, outputs, etc. here, and we will do our best to help you.

  • Christopher

Dear Christopher,

Thank you for your kind help. Here is my script and outputs:

[Script]
(7.9 KB file on MEGA)

[Outputs]
(536.9 KB file on MEGA)

I tried to upload the files directly to this post, but without success. Could you help me check my pipeline and the relevant results? Thank you again!

Hi @hoanq8x,

Everything seems to be done correctly. I have some minor comments:

  1. You can set a cutoff in the biplot or fit a sparse PCA model to avoid so many overlapping arrows.
  2. The final sPLS-DA model has 3 components, but you don’t show what’s happening on component 3.
  3. What does most “significant” metabolites mean? Did you filter out metabolites based on what is significant in the volcano plot, or did you set a threshold for the VIP score? You can also use the cim function to create a heatmap with clustering for the variables selected on comp 1, comp 2 and comp 3 separately, and a heatmap for all the selected variables combined.
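
For anyone following along, comments 1 and 3 could look roughly like the sketch below. It is only an illustration under stated assumptions: the srbct example data shipped with mixOmics stand in for the rice data, and the cutoff, ncomp and keepX values are arbitrary placeholders rather than tuned settings.

```r
library(mixOmics)

# Stand-in data (assumption): the srbct example set shipped with mixOmics,
# used only because the rice data from this thread are not available.
data(srbct)
X <- srbct$gene   # samples x variables matrix
Y <- srbct$class  # factor of group labels

# Comment 1, option a: PCA biplot that draws only variables whose absolute
# correlation with a plotted component reaches the cutoff (0.7 is arbitrary).
pca_res <- pca(X, ncomp = 2, center = TRUE, scale = TRUE)
biplot(pca_res, group = Y, ind.names = FALSE, cutoff = 0.7)

# Comment 1, option b: sparse PCA keeps only keepX variables per component,
# so far fewer arrows are drawn.
spca_res <- spca(X, ncomp = 2, keepX = c(10, 10))
biplot(spca_res, group = Y, ind.names = FALSE)

# Comment 3: clustered heatmaps of the variables selected by sPLS-DA,
# one per component and one for all three components combined.
splsda_res <- splsda(X, Y, ncomp = 3, keepX = c(10, 10, 10))
for (k in 1:3) {
  cim(splsda_res, comp = k, title = paste("Component", k))
}
cim(splsda_res, comp = 1:3, title = "Components 1 to 3")
```

With real data, keepX would normally come from tuning rather than being fixed at 10 per component.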
  • Christopher

Hi Christopher,

Thanks for your quick response!

  1. You can set a cutoff in the biplot or do a sparse PCA model to avoid so many overlapping arrows.
    => Good suggestion!

  2. The final sPLS-DA model has 3 components, but you don’t show what’s happening on component 3.
    => Yep, that’s true. The first 2 components show quite clear separation among the groups, so I did not plot component 3 against the others. I should include those plots anyway.

  3. What does most “significant” metabolites mean? Did you filter away metabolites based on what is significant in the volcano plot, or did you set a threshold for vip score? You can also use the cim function to create a heatmap with clustering for the selected variables on comp 1, comp 2 and comp 3 each, and a heatmap for all the selected variables combined.
    => I will try the heatmap again using cim. I have used it for my RNAseq data. The most “significant” metabolites are those that could actually differentiate among the groups, and also the targeted metabolites we want to filter. We required a VIP score greater than 1 and an AUC greater than 0.7. After filtering, we ran a further logistic regression analysis to confirm their effects on group differentiation. What do you think about this?
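
For reference, the VIP part of such a filter could be sketched as follows. This is not the script from the linked files: it assumes the srbct example data as a stand-in, placeholder ncomp and keepX values, and a "VIP above 1 on at least one component" rule that is only one possible reading of the threshold.

```r
library(mixOmics)

# Stand-in data (assumption): srbct example set shipped with mixOmics.
data(srbct)
X <- srbct$gene
Y <- srbct$class

# Placeholder model settings, not tuned.
model <- splsda(X, Y, ncomp = 3, keepX = c(10, 10, 10))

# VIP scores: one row per variable, one column per component.
vip_scores <- vip(model)

# Variables whose VIP exceeds 1 on at least one component.
vip_keep <- rownames(vip_scores)[apply(vip_scores, 1, max) > 1]

# Variables selected by the sparse model on any of the three components.
selected <- unique(unlist(lapply(1:3, function(k) {
  selectVar(model, comp = k)$name
})))

# Overlap between the two lists.
intersect(vip_keep, selected)
```

In a sparse model the variables that are never selected have zero loadings, so the two lists are expected to overlap heavily.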

Best,
Hoa

Hi @hoanq8x,

I am not sure about this one. @aljabadi maybe you can answer this? :smiley:

  • Christopher

Hi @hoanq8x,

Christopher provided some great feedback on your pipeline, so I’ll just encourage you to check out our updated mixOmics vignette, which expands on all the available methods and their functionalities.

On a side note, I noticed that the first 2 components do not separate two of the groups. I recommend looking into the 3rd component, either with another 2D plot or with plotIndiv(style = '3d').
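
A minimal sketch of those plots, assuming the srbct example data as a stand-in for the rice data and placeholder keepX values. The 3D call is guarded because it needs the rgl package and an interactive session.

```r
library(mixOmics)

# Stand-in data (assumption): srbct example set shipped with mixOmics.
data(srbct)
X <- srbct$gene
Y <- srbct$class

# Placeholder model settings, not tuned.
model <- splsda(X, Y, ncomp = 3, keepX = c(10, 10, 10))

# 2D sample plots that bring in the third component.
plotIndiv(model, comp = c(1, 3), group = Y, ind.names = FALSE,
          ellipse = TRUE, legend = TRUE,
          title = "sPLS-DA: components 1 and 3")
plotIndiv(model, comp = c(2, 3), group = Y, ind.names = FALSE,
          ellipse = TRUE, legend = TRUE,
          title = "sPLS-DA: components 2 and 3")

# 3D sample plot of all three components (requires rgl).
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  plotIndiv(model, comp = 1:3, group = Y, ind.names = FALSE, style = "3d")
}
```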

I think “signature metabolites” or “selected metabolites” would be a better term to describe the selected features.

Hope it helps.

Al


Hi @aljabadi,

Christopher provided some great feedback on your pipeline, so I’ll just encourage you to check out our updated mixOmics vignette, which expands on all the available methods and their functionalities.

I will check the updated vignette once again.

I think “signature metabolites” or “selected metabolites” would be a better term to describe the selected features.

Thanks for your suggestion!

Also, can you help me answer my question regarding further statistical analysis on detected signature metabolites?

The most “significant” metabolites are those that could actually differentiate among the groups, and also the targeted metabolites we want to filter. We required a VIP score greater than 1 and an AUC greater than 0.7. After filtering, we ran a further logistic regression analysis to confirm their effects on group differentiation.

Thank you!

Best regards,
Hoa

Hi @hoanq8x,

We don’t recommend using AUC on its own as a criterion for model performance. You might want to cross-validate the model with the perf function and look at the classification error rates first. AUC can then serve as a complementary measure.
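
Something along these lines, as a rough sketch only. It assumes the srbct example data as a stand-in and placeholder choices for keepX, the number of folds and the number of repeats.

```r
library(mixOmics)

# Stand-in data (assumption): srbct example set shipped with mixOmics.
data(srbct)
X <- srbct$gene
Y <- srbct$class

# Placeholder model settings, not tuned.
model <- splsda(X, Y, ncomp = 3, keepX = c(10, 10, 10))

# Repeated M-fold cross-validation of the fitted model.
set.seed(42)  # folds are drawn at random
cv <- perf(model, validation = "Mfold", folds = 5, nrepeat = 10,
           progressBar = FALSE)

# Error rates per component and per prediction distance.
cv$error.rate$overall  # overall classification error rate
cv$error.rate$BER      # balanced error rate, useful with unequal group sizes
plot(cv)

# Complementary measure: ROC curves and AUC using the first two components.
auc_res <- auroc(model, roc.comp = 2)
```

If the error rates stop improving after two components, that would also tell you whether the third component is worth keeping.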

Also, you can certainly compare the multivariate analysis outcomes with those from logistic regression and investigate any differences/agreements. However, it’s outside the scope of what we can advise you in detail.

Hope it helps,

Al