Multilevel approach possible?

Dear mixOmics community and team,

I have a general question about when to use the multilevel approach. I am involved in the analysis of two human studies and am trying to figure out for which of the two a multilevel analysis would make sense.

Study 1:
Intervention study with baseline and endline (6 months) metabolomics and metagenomics data. The interventions are three diets (parallel, not randomized) and the outcomes are different neurodevelopmental measures (count or continuous variables). The aim is to find metabolites or microbes that are associated with better neurodevelopmental scores and with diet. In recent literature, the analysis was primarily univariate, and I would like to approach this more holistically. I was thinking of doing a multi-block PLS-DA with the three diets as the outcome to find microbes or metabolites that discriminate the three dietary groups. Subsequently, I would test whether those ‘markers’ are associated with the neurodevelopmental scores. I would do this only at the endline, but I wondered if a multilevel PLS-DA would work in this scenario to incorporate the baseline information?

Study 2: This study is more exploratory.
Longitudinal study with 18 participants, each with 2-5 time points over 1 year (metabolomic and metagenomic data). The problem with this study is that the sampling intervals differ considerably both between and within participants (two visits could be 90 days apart, but also 30 or 120 days). The intention was to sample every 3 months, but this was not strictly followed.
The outcome is a single continuous variable. Alternatively, I group the participants into one of two groups based on this continuous variable (e.g., control vs. disease group). The aim is to find microbial or metabolic biomarkers that are associated with the outcome variable (correlation) or the outcome group (discrimination).

I was wondering if multi-block PLS-DA with multilevel would work in this scenario to identify metabolites or taxa that discriminate my two groups, or if there are better approaches for this. I saw the timeOmics package, but I was unsure if it is a better approach, since it would only show me microbes and metabolites that follow the same trend over time.

Thank you very much in advance and best regards,
Niklas

Hi @Niklas,

Study 1: yes, a multilevel decomposition would be appropriate here. As the multilevel argument is not available in block.splsda, you need to apply the withinVariation() function to each data set beforehand. Have a look at our website on how to process count data for the metagenomics. One thing to keep in mind is that the multilevel decomposition acts as a kind of normalisation, so you may want to stick to the decomposed data for your downstream analysis too (there are many existing posts on ‘multilevel’ in this forum!).
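To make the idea of the multilevel decomposition concrete, here is a minimal sketch in Python/NumPy rather than R. It only illustrates the principle for a single repeated-measures factor (subtracting each participant's own mean from their samples so that only within-participant variation remains); it is not the mixOmics implementation, and the function name `within_subject_variation` and the toy numbers are made up for this example.

```python
import numpy as np

def within_subject_variation(X, subject_ids):
    """Remove each subject's mean profile, keeping within-subject variation only.

    Hypothetical helper for illustration; not part of mixOmics.
    X: samples x variables matrix; subject_ids: one subject label per row.
    """
    X = np.asarray(X, dtype=float)
    subject_ids = np.asarray(subject_ids)
    Xw = np.empty_like(X)
    for s in np.unique(subject_ids):
        rows = subject_ids == s
        # Subtract the subject-specific mean of every variable
        Xw[rows] = X[rows] - X[rows].mean(axis=0)
    return Xw

# Toy data (made up): 2 subjects, baseline and endline, 2 variables
X = np.array([[1.0, 10.0],   # subject A, baseline
              [3.0, 14.0],   # subject A, endline
              [5.0, 20.0],   # subject B, baseline
              [9.0, 22.0]])  # subject B, endline
subjects = np.array(["A", "A", "B", "B"])

Xw = within_subject_variation(X, subjects)
print(Xw)
```

With only two time points per participant, each within-subject row is plus or minus half of the endline-minus-baseline change, which is how the baseline information enters the analysis.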

Study 2: the multilevel decomposition is not well suited for multiple time points.
timeOmics/lmms might be more appropriate because you can use linear mixed model splines to interpolate missing time points before you input the data into block.splsda. You can have a look at the timeOmics vignette. The lmms package has not been maintained, so there are a few hiccups (or you need to use an earlier version of R; using RStudio Cloud is the best way to do this). You will be able to test for differences across time and groups too, see this paper: A Linear Mixed Model Spline Framework for Analysing Time Course ‘Omics’ Data
Also have a look at our latest review, because, as you say you can analyse longitudinal omics data in very different ways, depending on your question!
Statistical challenges in longitudinal microbiome data analysis | Briefings in Bioinformatics | Oxford Academic
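As a rough illustration of what putting irregular visits onto a common time grid means, here is a small Python/NumPy sketch. Note that it uses plain per-participant linear interpolation as a simple stand-in, not the linear mixed model splines fitted by lmms, and it does not extrapolate beyond a participant's observed visits. The function name, participant IDs, and values are hypothetical.

```python
import numpy as np

def interpolate_to_grid(days, values, grid):
    """Linearly interpolate one participant's measurements onto a shared grid.

    Hypothetical helper; a simple stand-in for spline-based interpolation.
    Grid points outside the observed visit range are returned as NaN.
    """
    days = np.asarray(days, dtype=float)
    values = np.asarray(values, dtype=float)
    order = np.argsort(days)  # np.interp expects increasing x-coordinates
    return np.interp(grid, days[order], values[order],
                     left=np.nan, right=np.nan)

# Target grid in days (the intended 3-monthly schedule)
grid = np.array([0.0, 90.0, 180.0])

# Made-up visits for one variable: (sampling days, measured values)
visits = {
    "P01": ([0, 120, 180], [1.0, 5.0, 8.0]),
    "P02": ([0, 30, 120], [2.0, 2.6, 4.4]),
}

profiles = {pid: interpolate_to_grid(d, v, grid)
            for pid, (d, v) in visits.items()}
for pid, profile in profiles.items():
    print(pid, profile)
```

Participants with very few visits leave gaps on the grid (NaN here), which is one reason a model that borrows information across participants, such as the spline approach above, can be preferable.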

My feeling is that you will not be able to analyse the data as you wish (i.e. take time into account in order to discriminate an outcome), so you may have to analyse one specific time point at a time (based on lmms-interpolated data if you are missing time points), or look at how different patterns across time explain differences in outcome. Happy to discuss further on that point, as we are currently developing new methods in that space.

Kim-Anh