N-integration across multiple cohorts


First of all, thank you so much for your hard work on the mixOmics package and statistical framework. I am interested in doing N-integration using 4+ datatypes across multiple cohorts. The same features will be used for N-integration in every cohort, but the cohorts are at different stages of the lifespan (children vs. young adults vs. older adults). I would like to see whether similar features are predictive for individuals across cohorts. Is there a framework you would suggest for this, or another approach?

Thanks in advance!

Sorry, I just want to clarify a few things:

  • First, do you have data for every cohort on all datatypes, i.e. 12 datasets (4 datatypes × 3 cohorts)?
  • Within one datatype, were the same features measured in every cohort? For example, if sequencing, was each cohort sequenced over the same regions/loci?
  • Was this a longitudinal study, i.e. are the same individuals measured in each cohort at different times? Or are the individuals in each cohort different?

I ask because you mention using an N-integration framework, but I'm not sure the N-integration methodology applies here. Once I have a bit more information about your study design, I can point you in the right direction.


Hi Max,

Sorry, let me clarify a bit more.

  1. Yes, I have data for each cohort on all types, so I have 4 datatypes for each of the 3 cohorts.

  2. Within each datatype, the features are the same for all cohorts: they are the same brain imaging features in each cohort.

  3. This is not a longitudinal study. The individuals in each cohort are different.

Hope this helps.


While it may not be the best possible method for your data, I believe the NP-integration framework applies here. The relevant functions to explore are mint.block.plsda() and mint.block.splsda(); use the latter if you want to reduce the number of features in your model (controlled through its keepX argument).

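You could essentially treat each cohort as one of the "independent studies" referenced in the documentation and each datatype as a "block". In practice, that means passing the four data frames (one per datatype, each containing the samples from all three cohorts, with identical sample names and row order across blocks) as a list to the X parameter, your outcome as Y, and your cohort vector as the study parameter.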
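The fiddly part of that setup is the data arrangement: each block has to stack all three cohorts with samples in the same order, and the study vector has to line up row for row. Below is a minimal sketch of that step in Python, assuming (hypothetically) that your data sits in nested dicts of the form data[cohort][datatype][sample_id] -> feature values; the helper name stack_blocks and the toy cohort/datatype names are illustrative only. The resulting matrices and study vector would still need to be exported to R for the actual mint.block.splsda() call.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: data[cohort][datatype][sample_id] -> list of feature values.
Data = Dict[str, Dict[str, Dict[str, List[float]]]]


def stack_blocks(data: Data) -> Tuple[List[str], Dict[str, List[List[float]]], List[str]]:
    """Stack cohorts into one samples-by-features matrix per datatype.

    Returns (sample_ids, blocks, study): every block lists rows in the order of
    sample_ids, and study[i] is the cohort of sample_ids[i].
    """
    cohorts = sorted(data)
    datatypes = sorted(next(iter(data.values())))
    sample_ids: List[str] = []
    study: List[str] = []
    blocks: Dict[str, List[List[float]]] = {dt: [] for dt in datatypes}
    n_features: Dict[str, int] = {}

    for cohort in cohorts:
        if sorted(data[cohort]) != datatypes:
            raise ValueError(f"cohort {cohort!r} does not have the same datatypes")
        # Within a cohort, every datatype must cover exactly the same samples.
        ids = sorted(data[cohort][datatypes[0]])
        for dt in datatypes:
            if sorted(data[cohort][dt]) != ids:
                raise ValueError(f"sample mismatch in cohort {cohort!r}, datatype {dt!r}")
        for sid in ids:
            sample_ids.append(sid)
            study.append(cohort)
            for dt in datatypes:
                row = data[cohort][dt][sid]
                # A given datatype must have the same number of features in every cohort.
                expected = n_features.setdefault(dt, len(row))
                if len(row) != expected:
                    raise ValueError(f"feature count mismatch in datatype {dt!r}")
                blocks[dt].append(row)

    if len(set(sample_ids)) != len(sample_ids):
        raise ValueError("sample IDs must be unique across cohorts")
    return sample_ids, blocks, study


# Toy example with two cohorts and two imaging datatypes (made-up values).
toy = {
    "children": {
        "imaging_a": {"c1": [0.1, 0.2], "c2": [0.3, 0.4]},
        "imaging_b": {"c1": [1.0], "c2": [2.0]},
    },
    "older_adults": {
        "imaging_a": {"o1": [0.5, 0.6]},
        "imaging_b": {"o1": [3.0]},
    },
}
ids, blocks, study = stack_blocks(toy)
print(ids)    # ['c1', 'c2', 'o1']
print(study)  # ['children', 'children', 'older_adults']
```

Failing loudly on mismatched samples or feature counts is deliberate: a silently misaligned row would pair one individual's imaging data with another's outcome.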

Alternatively (or additionally), you could run block.splsda() on each cohort individually and look for homogeneity across the results, though I'd only consider this necessary if the mint.block.splsda() approach fails to produce anything worthwhile.
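"Homogeneity across the results" can be judged informally, but one simple way to put a number on it is to compare the sets of features each per-cohort model selected, block by block. The sketch below assumes you have already extracted the selected feature names per cohort and per block (for example with selectVar() in R) into a hypothetical dict selected[cohort][block] -> set of names. It uses pairwise Jaccard overlap, which is a generic set-similarity measure chosen here for illustration, not something mixOmics itself provides; feature and block names are made up.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# Hypothetical input: selected[cohort][block] -> set of feature names chosen by
# a per-cohort sparse model (e.g. exported from R after block.splsda()).
Selected = Dict[str, Dict[str, Set[str]]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(selected: Selected) -> Dict[str, Dict[Tuple[str, str], float]]:
    """For each block, the Jaccard overlap of selected features for every cohort pair."""
    cohorts = sorted(selected)
    blocks = sorted({b for per_cohort in selected.values() for b in per_cohort})
    result: Dict[str, Dict[Tuple[str, str], float]] = {}
    for block in blocks:
        result[block] = {}
        for c1, c2 in combinations(cohorts, 2):
            s1 = selected[c1].get(block, set())
            s2 = selected[c2].get(block, set())
            result[block][(c1, c2)] = jaccard(s1, s2)
    return result


def shared_features(selected: Selected, block: str) -> Set[str]:
    """Features of one block that were selected in every cohort."""
    sets = [per_cohort.get(block, set()) for per_cohort in selected.values()]
    return set.intersection(*sets) if sets else set()


# Toy example with made-up selections for a single block.
example = {
    "children": {"imaging_a": {"f1", "f2", "f3"}},
    "young_adults": {"imaging_a": {"f2", "f3", "f4"}},
    "older_adults": {"imaging_a": {"f3", "f5"}},
}
overlap = pairwise_overlap(example)
print(overlap["imaging_a"][("children", "young_adults")])  # 0.5
print(shared_features(example, "imaging_a"))  # {'f3'}
```

Keep in mind that the overlap depends on how many features each model was allowed to keep (keepX), so comparisons are only meaningful when the per-cohort models were tuned in a comparable way.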

Best of luck and feel free to reach out if you have any more questions.