N-integration from different sample groups

We have proteomic and metabolomic data from a transgenic mouse model, comparing wild-type to transgenics. I would like to perform N-integration, however the data was not collected with multi-omics in mind.

We have 5 WT and 5 +/- animals for proteomics, and 10 WT and 11 +/- animals for metabolomics, all different animals, all hippocampal tissue. Transcriptome data may be upcoming, again from different animals.

Ideally we would of course have liked to have a larger number of animals and to have taken samples from the same animals, so any conclusions are going to be pretty tentative. However, I am wondering whether the DIABLO model could put out anything useful if samples are not individually matched, or whether this is a case of garbage in, garbage out?

If it is, might it be better to look at a simpler, more manual pathway analysis from KEGG, for example? Or some other method, e.g. 3Omics or PaintOmics?

One additional question from this bioinformatics newbie: can PCA be used to pull out relevant proteins/metabolites? I have done PCA with some other experimental groups, and if I see that e.g. Principal Component 2 accounts for the split in genotype, would the top proteins/metabolites on that component then be indicative of the pathways involved? If so, could I weight them for pathway analysis? And if so, how?

All advice gratefully received.

Hi @Peptoabysmal,

Unfortunately, as the name suggests, N-integration is performed on the same N samples. This is because we need to calculate the covariance between the different data sets, and that is only possible when the N (sample) dimension is common to all of them.
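
To make the point about the common N dimension concrete, here is a minimal NumPy sketch (not mixOmics code). The function `cross_covariance` and the feature counts are made up for illustration; only the sample counts (10 and 21 animals) come from this thread.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical blocks: rows are animals, columns are features.
# 10 animals for proteomics, 21 different animals for metabolomics.
proteins = rng.normal(size=(10, 50))
metabolites = rng.normal(size=(21, 30))


def cross_covariance(X, Y):
    """Feature-by-feature covariance between two blocks.

    Row i of X and row i of Y must describe the same sample.
    """
    if X.shape[0] != Y.shape[0]:
        raise ValueError(
            f"blocks must share the same samples: "
            f"{X.shape[0]} rows vs {Y.shape[0]} rows"
        )
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    return Xc.T @ Yc / (X.shape[0] - 1)


# A block measured on the same 10 animals works: result is 50 x 30.
matched = rng.normal(size=(10, 30))
cov_ok = cross_covariance(proteins, matched)

# The unmatched blocks from this thread cannot be paired row by row.
try:
    cross_covariance(proteins, metabolites)
    unmatched_error = None
except ValueError as err:
    unmatched_error = str(err)
```

Note that equal row counts alone would not be enough: the rows also have to be the same animals in the same order, otherwise the covariance is computed between unrelated samples.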

As you suggested, you will have to analyse each omics separately, and then do some interpretative integration from the results. I am not really familiar with the other methods you mention.

You can use PCA or sparse PCA to identify the variables driving most of the variance in the data. It remains an exploratory approach of course, so there is no clear-cut criterion for how many variables you should look at. Remember to center and scale your data (in the PCA arguments), unless you are specifically interested in pulling out the variables with the largest variance across your samples.
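
As a rough illustration of that workflow, here is a NumPy sketch on simulated data (it is not the mixOmics implementation). The helper `pca_loadings`, the simulated genotype effect, and the choice of five top features are all assumptions made for the example.

```python
import numpy as np


def pca_loadings(X, n_components=2, scale=True):
    """PCA via SVD on centered (and optionally unit-variance) columns."""
    Xc = X - X.mean(axis=0)
    if scale:
        # Assumes no feature is constant, otherwise this divides by zero.
        Xc = Xc / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # samples x components
    loadings = Vt[:n_components].T                    # features x components
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]


# Simulated data: 10 samples, 40 features, 12 features shifted by genotype.
rng = np.random.default_rng(1)
genotype = np.array([0] * 5 + [1] * 5)
X = rng.normal(size=(10, 40))
X[:, :12] += 4.0 * genotype[:, None]

scores, loadings, explained = pca_loadings(X, n_components=2)

# Pick the component whose scores differ most between the two genotypes.
gap = np.abs(
    scores[genotype == 1].mean(axis=0) - scores[genotype == 0].mean(axis=0)
)
best_pc = int(np.argmax(gap))

# Rank features by absolute loading on that component.
top = np.argsort(-np.abs(loadings[:, best_pc]))[:5]
print("component:", best_pc + 1, "top feature indices:", top)
```

The absolute loadings on the genotype-separating component give a ranking of features. With only a handful of samples that ranking is noisy, so it is better treated as a shortlist to inspect than as a set of firm weights.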

You could have a look at a (rather old) method called Pathifier to do the weighting directly. We had some good results with it for a pathway analysis in a breast cancer data set: "Personalised pathway analysis reveals association between DNA repair pathway dysregulation and chromosomal instability in sporadic breast cancer" (ScienceDirect).
Also look at the concept of 'eigengenes', which is based on the same PCA principles.
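
For the eigengene idea, a minimal sketch is below: the features belonging to one pathway are summarised by their first principal component, giving one score per sample. This is a plain NumPy illustration, not Pathifier and not any particular package; the function `eigengene` and the pathway membership indices are hypothetical.

```python
import numpy as np


def eigengene(X, member_idx):
    """First-PC score of the selected features, one value per sample."""
    sub = X[:, member_idx]
    # Center and scale each feature (assumes none is constant).
    sub = (sub - sub.mean(axis=0)) / sub.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(sub, full_matrices=False)
    score = U[:, 0] * s[0]
    # The sign of a PC is arbitrary: orient it along the mean profile.
    if np.corrcoef(score, sub.mean(axis=1))[0, 1] < 0:
        score = -score
    var_explained = s[0] ** 2 / np.sum(s ** 2)
    return score, var_explained


rng = np.random.default_rng(2)
X = rng.normal(size=(10, 40))      # simulated: 10 samples x 40 features
pathway = [0, 3, 7, 12, 20]        # hypothetical pathway membership
score, var_explained = eigengene(X, pathway)
```

The per-sample scores can then be compared between genotypes within one omics data set, for example with a simple two-group test, which keeps the analysis separate for each data set as suggested above.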


Thank you so much Kim-Anh, that answers my question regarding N-integration. I will check out the Pathifier package and look at eigengenes.
Sparse PCA sounds useful; I may try it with my current script in R, using either nsprcomp or mixOmics.