Possible bug in block.splsda + various questions

pvgeende · November 25, 2021, 12:56pm

Dear mixOmics community

For the past month I’ve been working on data integration of several metabolomics/lipidomics datasets with microbiome (16S) data. Recently I encountered some strange behaviour of the block.splsda function. I create a model using this list.keepX object:
$Pol_Met_Feces
[1] 17 6

$Pol_Met_Urine
[1] 14 5

$Lip_Met_Feces
[1] 20 12

$Otu
[1] 6 5

The list indicates that my OTU (microbiome) dataset should have 6 and 5 variables selected on components 1 and 2, respectively. However, the resulting model has 51 and 5 variables for the OTU dataset! In another run with the same input, it had 52 and 6 variables!

Any idea on why this might be the case, or what is happening here?

While I’m at it, I have some other (not as important) questions:

1 - For all my datasets, in both multi-omics and single-omics analyses, the optimal number of components is always 1. Is this erratic behaviour, or is it not a problem? For visualization purposes I always construct models with ncomp = 2. Also, the error rate of my models stays high, even after tuning, so I guess my data are not suited for prediction but can still be used for pathway analysis / searching for biologically relevant correlations?
2 - Is there a way to disregard “within block correlations” when plotting the circosPlot or networks?

Many thanks in advance if someone takes the time to read and answer this!

Kind regards
Pablo
PhD Student @ Laboratory for Chemical Analysis, Faculty of Veterinary Medicine, Ghent University

kimanh.lecao · December 2, 2021, 11:49pm

Hi @pvgeende,

@aljabadi may want to ask you for more details (and data), in case this is related to a bug in the function. My intuition is that your OTU data are highly collinear, so that ~51 variables are considered equally important (potentially with exactly the same values) on the first component. Have another look at the data and the variables selected, and let us know.
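To illustrate the intuition above, here is a minimal base R sketch of how tied values can lead to more than keepX variables being retained. It is a toy rank-based soft-thresholding function written for this post, not the mixOmics internals, and the loading vectors and the matrix `X` are made up. The commented line at the end assumes a fitted model stored in a hypothetical object `model` with a block named `Otu`.

```r
# Toy illustration only (assumption: NOT the actual mixOmics code).
# Keep the keepX largest absolute loadings; tied values share one rank,
# so a tie at the cut-off keeps every tied variable.
soft_threshold_keep <- function(a, keepX) {
  n_drop <- length(a) - keepX              # number of loadings to set to zero
  if (n_drop <= 0) return(a)
  abs_a <- abs(a)
  r <- rank(abs_a, ties.method = "max")    # tied values all get the highest rank
  drop <- r <= n_drop
  if (!any(drop)) return(a)
  threshold <- max(abs_a[drop])
  ifelse(drop, 0, sign(a) * (abs_a - threshold))
}

# Hypothetical loadings without ties: exactly keepX = 6 survive
a_distinct <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05)
selected_distinct <- sum(soft_threshold_keep(a_distinct, keepX = 6) != 0)

# Hypothetical loadings with 7 exactly tied values straddling the cut-off
a_tied <- c(0.9, 0.8, 0.7, rep(0.5, 7), 0.1, 0.05)
selected_tied <- sum(soft_threshold_keep(a_tied, keepX = 6) != 0)

# Quick check for exactly duplicated variables (columns) in a made-up block
X <- cbind(otu1 = c(1, 0, 2, 0), otu2 = c(1, 0, 2, 0), otu3 = c(0, 3, 1, 1))
n_duplicated_cols <- sum(duplicated(t(X)))

# On a real fit (hypothetical object name), the non-zero loadings per
# component could be counted with: colSums(model$loadings$Otu != 0)
```

With the tied vector, 10 loadings stay non-zero even though keepX = 6, which is the kind of pattern worth looking for among the OTU variables selected on component 1.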

1 - It means that the discrimination only happens on the first component (as you will see on the plot), and after that you are only adding noise. The performance results indicate that yes, it is difficult to separate the groups. It might be better not to tune, and instead to choose ad hoc a reasonable number of variables that allows for exploration and interpretation using pathway analysis etc. For visualisation, you can still use 2 components, but focus your interpretation on the variables selected on component 1.

2 - No, but you can extract the similarity matrix from circosPlot (see post here) and use Cytoscape for customised plots.
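As a rough sketch of that workaround, the base R code below takes a similarity matrix, discards within-block pairs, and builds an edge list that could be exported for Cytoscape. The matrix, the variable names, the block labels and the 0.7 cutoff are all made up for illustration; with real data the matrix would be the one extracted from circosPlot, and the block membership of each variable would have to be built from your own blocks.

```r
# Hypothetical similarity matrix for 4 variables from 2 blocks
vars  <- c("met1", "met2", "otu1", "otu2")
block <- c("Pol_Met_Feces", "Pol_Met_Feces", "Otu", "Otu")
sim <- matrix(c( 1.0,  0.9,   0.8, -0.2,
                 0.9,  1.0,   0.1, -0.75,
                 0.8,  0.1,   1.0,  0.6,
                -0.2, -0.75,  0.6,  1.0),
              nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

cutoff <- 0.7                                  # assumed correlation cutoff

# TRUE where both variables belong to the same block
same_block <- outer(block, block, "==")

# Keep each pair once (upper triangle), between blocks only, above the cutoff
keep <- upper.tri(sim) & !same_block & abs(sim) >= cutoff
idx  <- which(keep, arr.ind = TRUE)

edges <- data.frame(from   = vars[idx[, "row"]],
                    to     = vars[idx[, "col"]],
                    weight = sim[idx],
                    row.names = NULL,
                    stringsAsFactors = FALSE)

# The edge list can then be written out and imported into Cytoscape, e.g.:
# write.csv(edges, "between_block_edges.csv", row.names = FALSE)
```

Here only the met1/otu1 and met2/otu2 pairs remain; the strong met1/met2 correlation is dropped because both variables come from the same block.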

Kim-Anh

pvgeende · December 6, 2021, 7:53am

Dear Kim-Anh

Thanks a lot for taking the time to respond to my questions.
I’ll take time to investigate my data / selected variables again and will post more info here later if needed.

Kind regards
Pablo
