I have a quick question about designing X. Does the order of my datasets matter? Should the most influential one come first? With transcriptomics, proteomics and metabolomics it makes sense to start with RNA, then proteins and then metabolites. But what if the order of influence is unknown? E.g. metagenomics and metabolomics in a dietary intervention, where it is not known whether the change in metabolites impacts the microbiota or vice versa (most likely the interaction is bidirectional). Does it matter in that case whether the taxa or the metabolites come first in X?
Thank you very much in advance!
The methods are designed to select the most influential variables in each data set. Each data set is treated equally, so the input order does not matter. For a multi-block analysis (DIABLO), however, you can change the design matrix to put more weight on a particular pair of data sets and maximise their correlation. The relationship is bidirectional in this case, with all data sets explaining the outcome.
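To make the design matrix idea concrete, here is a minimal sketch in plain Python (not mixOmics code). It assumes the design is a symmetric block-by-block table of weights between 0 and 1 with a zero diagonal. The helper names `make_design` and `as_rows`, the block names, and the weights 0.1 and 0.9 are all made up for illustration. It shows that a pairwise weight has no direction and that listing the blocks in a different order leaves every weight unchanged.

```python
from itertools import combinations


def make_design(blocks, default=0.1, pair_weights=None):
    """Return a symmetric block-by-block weight table as nested dicts.

    Hypothetical helper for illustration only; not part of any library.
    """
    pair_weights = pair_weights or {}
    # Diagonal stays 0: a block is not linked to itself.
    design = {a: {b: 0.0 for b in blocks} for a in blocks}
    for a, b in combinations(blocks, 2):
        weight = pair_weights.get(frozenset((a, b)), default)
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight for {a}/{b} must lie in [0, 1]")
        # Same value in both cells: the link has no direction.
        design[a][b] = weight
        design[b][a] = weight
    return design


def as_rows(design, order):
    """Lay the table out as a list of rows in the given block order."""
    return [[design[a][b] for b in order] for a in order]


blocks = ["rna", "proteins", "metabolites"]  # assumed block names
emphasis = {frozenset(("rna", "proteins")): 0.9}  # assumed weights

forward = make_design(blocks, pair_weights=emphasis)
backward = make_design(list(reversed(blocks)), pair_weights=emphasis)

# Compare the two tables; dict equality ignores insertion order.
print(forward == backward)
for row in as_rows(forward, blocks):
    print(row)
```

In the actual analysis you would build the equivalent matrix in R and pass it through the design argument of the block functions; the sketch only illustrates the structure of that matrix.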
I suggest you start with a 2-dataset integration using PLS, where you can choose a uni- or bidirectional relationship, to investigate the data first.
Have a look at our website and bookdown doc for examples. There have also been a few questions about the design matrix in this forum.