N-integration across multiple cohorts

ravibot · May 13, 2022, 10:44pm

Hello,

First of all, thank you so much for your hard work on the mixOmics package and statistical framework. I am interested in doing N-integration using 4+ datatypes for multiple cohorts. The same N-integration features will be applied across cohorts, but the cohorts are aged differently across the lifespan (children vs. young adults vs. older adults). I was interested to see if similar features could predict individual across cohorts. Do you have a framework you suggest for this or another approach?

Thanks in advance!
Ravi

MaxBladen · May 15, 2022, 10:20pm

Sorry, I just want to clarify a few things:

Firstly, do you have access to data on each cohort for all types, i.e. 12 datasets (4 datatypes x 3 cohorts)?
Within one datatype, do you have measurements across all the same features? For example, if sequencing, was each cohort sequenced over the same regions/loci?
Was this a longitudinal study, i.e. does each cohort have the same set of samples at different times? Or are the individuals measured in each cohort different?

The reason I ask is that you mention using an N-integrative framework. However, I’m not sure the N-integrative methodology will apply here. Once I have a bit more info about your study design, I can help guide you in the right direction.

Cheers,
Max.

ravibot · May 18, 2022, 4:18pm

Hi Max,

Sorry, let me clarify a bit more.

I do have access to data for each cohort for all types. So I have 4 datatypes for the 3 cohorts.
Within each datatype, all the features are the same for all the cohorts. They are all the same brain imaging features for each cohort.
This is not a longitudinal study. The individuals in each cohort are different.

Hope this helps.

Best,
Ravi

MaxBladen · May 25, 2022, 12:17am

While it may not be the absolute best method to analyse your data, I believe the NP-integration framework may apply here. The relevant functions to explore are mint.block.plsda() and mint.block.splsda(); use the latter if you want to reduce the number of features in your model.

You could essentially treat each cohort as one of the “independent studies” referenced in the documentation and each datatype as one of the “blocks”. Hence, pass in the four data frames (ensuring the same sample names and order in each) as a named list to the X parameter and your cohort vector as the study parameter.

Alternatively (or additionally), you could run block.splsda() on each cohort individually and look for homogeneity across the results, though I’d only say this is necessary if the mint.block.splsda() method fails to produce anything worthwhile.
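
A rough sketch of both options, run on simulated toy data rather than anything real. Everything named here is a placeholder: the block names (type1 to type4), the two-class outcome Y (the outcome was not specified in this thread), and the keepX values, which are arbitrary and would normally be tuned. The mixOmics calls are wrapped in a requireNamespace() check so the data preparation still runs without the package installed.

```r
set.seed(1)

# Toy stand-in for the real data: 3 cohorts x 20 individuals, 4 datatypes.
cohorts <- c("children", "young_adults", "older_adults")
n_per_cohort <- 20
n <- n_per_cohort * length(cohorts)
sample_ids <- paste0("s", seq_len(n))

# One entry per sample giving the cohort ("study") it belongs to.
study <- factor(rep(cohorts, each = n_per_cohort), levels = cohorts)

# Placeholder categorical outcome (an assumption, not from the thread).
# Both classes are present in every cohort.
Y <- factor(rep(c("A", "B"), times = n / 2))

# Simulate one block; feature names are prefixed so they stay unique across blocks.
make_block <- function(prefix, p) {
  m <- matrix(rnorm(n * p), nrow = n)
  dimnames(m) <- list(sample_ids, paste0(prefix, "_f", seq_len(p)))
  m
}
X <- list(
  type1 = make_block("type1", 30),
  type2 = make_block("type2", 25),
  type3 = make_block("type3", 20),
  type4 = make_block("type4", 15)
)

# Every block must hold the same samples in the same order.
stopifnot(all(vapply(X, function(b) identical(rownames(b), sample_ids), logical(1))))

# Arbitrary sparsity: keep 5 features per block on each of 2 components.
keepX <- lapply(X, function(b) c(5, 5))

if (requireNamespace("mixOmics", quietly = TRUE)) {
  library(mixOmics)

  # Option 1: cohorts as studies, datatypes as blocks.
  fit_mint <- mint.block.splsda(X = X, Y = Y, study = study,
                                ncomp = 2, keepX = keepX)

  # Option 2: one block.splsda() model per cohort.
  fit_by_cohort <- lapply(cohorts, function(co) {
    idx <- study == co
    block.splsda(X = lapply(X, function(b) b[idx, , drop = FALSE]),
                 Y = droplevels(Y[idx]), ncomp = 2, keepX = keepX)
  })
  names(fit_by_cohort) <- cohorts

  # type1 features selected on component 1 in each cohort, then their overlap.
  selected <- lapply(fit_by_cohort, function(f)
    selectVar(f, block = "type1", comp = 1)$type1$name)
  print(Reduce(intersect, selected))
}
```

On random data like this the overlap between cohorts means nothing; with real data, consistent selections across the per-cohort models would be one informal sign of the homogeneity mentioned above.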

Best of luck and feel free to reach out if you have any more questions.

Cheers,
Max.
