PLS-DA for continuous outcomes

Hello Team,

I am currently using PLS-DA to identify metabolites associated with diet exposures. The outcome is consumption of a particular food (a continuous variable), and I would like to identify the top metabolites associated with this food. I have looked into elastic net regression; however, due to multicollinearity, it flips the sign for some highly correlated metabolites. I would like to identify the best metabolites associated with the outcome, so if two metabolites are correlated with each other and with the outcome, I want to identify both, and both should have a positive slope. I understand that I can change the penalties in elastic net to reduce the impact of collinearity, but then the model does not give me the top metabolites correlated with the outcome. I would really appreciate it if you could share your thoughts on this and any advice on the best analytical approach to answer this question.

Thank you!

Hello,

I would use PLS (metabolites, y = food consumption) rather than PLS-DA, since PLS-DA is meant for a categorical outcome. It’s called a PLS1 model because you have only one outcome variable, and it is continuous.

I would use spls(metabolites, y = food consumption, keepX = 5), i.e. specify arbitrarily how many metabolites you are hoping to extract. We have a few tuning criteria, but they are a bit fiddly. They are featured on our website and in the vignette.
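
To give a feel for what keepX does, here is a minimal sketch in plain Python. It is not mixOmics code: the function name `sparse_pls1_weights`, the simulated data, and the exact thresholding rule are assumptions made for illustration. It relies on the fact that, for a single continuous outcome, the first-component weights are proportional to the covariance of each scaled metabolite with the outcome, and it soft-thresholds them so that only keepX remain non-zero.

```python
import random
from statistics import mean, pstdev


def standardize(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def sparse_pls1_weights(columns, y, keep_x):
    """Illustrative first-component weights for one continuous outcome."""
    n = len(y)
    y_std = standardize(y)
    # Covariance of each scaled metabolite with the scaled outcome.
    cov = [sum(a * b for a, b in zip(standardize(col), y_std)) / n
           for col in columns]
    # Soft-threshold at the (keep_x + 1)-th largest |covariance| so that
    # exactly keep_x weights stay non-zero (assuming no ties).
    ranked = sorted((abs(c) for c in cov), reverse=True)
    threshold = ranked[keep_x] if keep_x < len(cov) else 0.0
    shrunk = [(abs(c) - threshold) * (1 if c > 0 else -1)
              if abs(c) > threshold else 0.0
              for c in cov]
    # Normalize the weight vector to unit length.
    norm = sum(w * w for w in shrunk) ** 0.5
    return [w / norm for w in shrunk]


# Simulated data (hypothetical): two correlated metabolites track the
# outcome, the other eight are pure noise.
random.seed(0)
n = 200
latent = [random.gauss(0, 1) for _ in range(n)]
food = [z + 0.5 * random.gauss(0, 1) for z in latent]
metabolites = [[z + 0.1 * random.gauss(0, 1) for z in latent] for _ in range(2)]
metabolites += [[random.gauss(0, 1) for _ in range(n)] for _ in range(8)]

weights = sparse_pls1_weights(metabolites, food, keep_x=2)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print(selected, [round(weights[j], 3) for j in selected])
```

In this sketch each first-component weight carries the sign of that metabolite's own covariance with the outcome, so two correlated metabolites that both increase with food consumption both receive positive weights. Later components and the fitted regression coefficients need not behave this way.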

Note that the methods you’ve used (elastic net) are multivariate, meaning that it is in combination that these metabolites may yield a positive slope value. If you want to look at one metabolite at a time, then you need to use a univariate approach.
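
The difference between a joint and a one-at-a-time slope can be seen on simulated data. The sketch below is an assumption-laden toy example in plain Python (two hypothetical metabolites `m0` and `m1`, ordinary least squares instead of elastic net, made-up effect sizes): both metabolites have a positive slope when fitted alone, yet one gets a negative coefficient when both are fitted together.

```python
import random
from statistics import mean


def centered(values):
    """Subtract the mean so that slopes can be computed without an intercept."""
    m = mean(values)
    return [v - m for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Simulated data (hypothetical): m0 and m1 share a latent signal, so they
# are highly correlated with each other.
random.seed(1)
n = 200
latent = [random.gauss(0, 1) for _ in range(n)]
m0 = centered([z + 0.3 * random.gauss(0, 1) for z in latent])
m1 = centered([z + 0.3 * random.gauss(0, 1) for z in latent])
y = centered([a - 0.5 * b + 0.3 * random.gauss(0, 1) for a, b in zip(m0, m1)])

# Univariate approach: one metabolite at a time.
uni0 = dot(m0, y) / dot(m0, m0)
uni1 = dot(m1, y) / dot(m1, m1)

# Multivariate approach: both metabolites in one least-squares fit
# (closed-form solution of the 2 x 2 normal equations).
s00, s11, s01 = dot(m0, m0), dot(m1, m1), dot(m0, m1)
s0y, s1y = dot(m0, y), dot(m1, y)
det = s00 * s11 - s01 ** 2
joint0 = (s11 * s0y - s01 * s1y) / det
joint1 = (s00 * s1y - s01 * s0y) / det

print("univariate slopes:", round(uni0, 2), round(uni1, 2))
print("joint coefficients:", round(joint0, 2), round(joint1, 2))
```

The joint coefficient of `m1` answers a different question (its association with the outcome once `m0` is accounted for), which is why its sign can differ from the univariate slope.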

Kim-Anh