Order of list X

stepra · November 10, 2020, 11:27am

Hi,
I have a quick question about designing X. Does it matter in which order I have my datasets? Should I have the most influential first? With transcriptomics, proteomics and metabolomics it makes sense to start with RNA, then proteins and then metabolites. But what if the order of influence is unknown? E.g. metagenomics and metabolomics in a dietary intervention, where it is not known whether the change in metabolites impacts the microbiota or vice versa (most likely the interaction is bidirectional). Does it matter in that case whether the taxa or the metabolites come first in X?
Thank you very much in advance!
/Stefanie

kimanh.lecao · November 10, 2020, 9:32pm

Dear @stepra,

The methods are designed to select the most influential variables in each data set. Each data set is treated equally and the input order does not matter. For a multi-block analysis (DIABLO), you can also change the design matrix to put more weight on a particular pair of data sets and so maximise their correlation. The relationship is bidirectional in this case, with all data sets explaining the outcome.
I suggest you start with a 2-dataset integration with PLS, where you can choose a uni- or bidirectional relationship, to first investigate the data.

Have a look at our website and bookdown doc for examples. There have also been a few questions about the design matrix in this forum.

Kim-Anh
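
To make the design matrix idea concrete, here is a minimal sketch in plain Python (it is not mixOmics code). It assumes the usual convention of a square, symmetric matrix with one row and one column per block, weights between 0 and 1, and zeros on the diagonal. The helper names `make_design` and `as_matrix`, the third block `clinical`, and the weights 0.1 and 1.0 are illustrative assumptions only.

```python
def make_design(blocks, default=0.1, overrides=None):
    """Build a symmetric design keyed by block name, with a zero diagonal.

    `overrides` maps a (block_a, block_b) pair to the weight for that pair.
    """
    design = {
        a: {b: (0.0 if a == b else default) for b in blocks}
        for a in blocks
    }
    for (a, b), weight in (overrides or {}).items():
        # Set both entries so the matrix stays symmetric (bidirectional link).
        design[a][b] = weight
        design[b][a] = weight
    return design


def as_matrix(design, order):
    """Lay the design out as a list of rows for a given block order."""
    return [[design[a][b] for b in order] for a in order]


# Hypothetical block names; "clinical" is made up for the example.
blocks = ["metagenomics", "metabolomics", "clinical"]

# Put more weight on the metagenomics-metabolomics pair.
design = make_design(
    blocks,
    default=0.1,
    overrides={("metagenomics", "metabolomics"): 1.0},
)

by_original_order = as_matrix(design, blocks)
by_shuffled_order = as_matrix(
    design, ["metabolomics", "clinical", "metagenomics"]
)

for row in by_original_order:
    print(row)
print()
for row in by_shuffled_order:
    print(row)
```

Keying the weights by block name means that reordering the blocks only permutes rows and columns; the weight attached to each pair stays the same. If the matrix is built by position instead, it has to be reordered together with X.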

annaol · June 26, 2025, 7:45pm

Dear @kimanh.lecao ,

I was going through old topics to check if anyone had the same question I am having now, which brought me to this topic.

Does your answer hold true for multiple datasets using block.spls in canonical mode?

Thanks,
Ana

kimanh.lecao · July 18, 2025, 12:14am

Hi @annaol,

I would say so, although I would need to dig deeper into the code to see what canonical mode means in this case, since it is a block.spls that still asks for a Y variable. But the order of your X datasets should not matter. You can test this easily: run a model with 2 components, run a second model where the X datasets have been shuffled, and compare the 2 components from each model. Let me know if you see any major differences!

Kim-Anh
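
One way to do the comparison suggested above, sketched in plain Python under some assumptions: the component scores of each block have been exported from both fitted models, and they are matched by block name rather than by position. Because the sign of a PLS component is arbitrary, the sketch compares absolute correlations. The function names and all numbers below are made up for illustration; they are not output from a real model.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length score vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v)


def compare_components(model_a, model_b):
    """Return |correlation| per block and component.

    Each model is a dict: block name -> list of component score vectors.
    Blocks are matched by name, so the input order of X does not matter here.
    """
    return {
        block: [abs(pearson(a, b)) for a, b in zip(comps, model_b[block])]
        for block, comps in model_a.items()
    }


# Toy scores (4 samples, 2 components per block); values are invented.
model_original = {
    "taxa": [[0.5, -1.2, 0.3, 0.4], [0.1, 0.2, -0.6, 0.3]],
    "metabolites": [[1.1, -0.9, 0.2, -0.4], [-0.3, 0.5, 0.4, -0.6]],
}
# Same scores with the blocks listed in the other order and one sign flipped.
model_shuffled = {
    "metabolites": [[1.1, -0.9, 0.2, -0.4], [0.3, -0.5, -0.4, 0.6]],
    "taxa": [[0.5, -1.2, 0.3, 0.4], [0.1, 0.2, -0.6, 0.3]],
}

report = compare_components(model_original, model_shuffled)
for block, correlations in report.items():
    print(block, [round(c, 6) for c in correlations])
```

Absolute correlations close to 1 for every block and component would indicate that the two models agree up to sign; clearly lower values would be the "major differences" worth reporting back.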
