Observing the same variable in Comp 1 and Comp 2


Thanks again for the wonderful mixOmics package and DIABLO. In another project, where I am integrating mRNA (~3000), protein (~1300), and metabolite (~160) data across 31 samples, I am seeing an issue with the features selected from the metabolite panel that discriminate my categorical outcome.
I ran DIABLO using list.keepX with a full design. This list.keepX was obtained with the tune.block.splsda function, as shown below.

test.tune.BBM <- tune.block.splsda(X = data, Y = Y, ncomp = 2, test.keepX = test.keepX,
                                   design = design2, validation = 'Mfold', folds = 5,
                                   nrepeat = 1, dist = "centroids.dist")
list.keepX <- test.tune.BBM$choice.keepX
list.keepX
# $mRNA
# [1] 30  5

# $protein
# [1] 14 16

# $metabolite
# [1] 9 5

MyResult.diablo.tune <- block.splsda(X, Y, keepX = list.keepX, design = design2)

> metabolite.c1$metabolite$value
Cystine                   -0.5932985
citrulline                -0.4469610
succinate                 -0.4053713
Hydroxyphenylacetic.acid  -0.3018932
fructose.6.phosphate      -0.2679044
Hydroxyisocaproic.acid    -0.2487210
S.adenosyl.L.homoCysteine -0.1651630
arginine                  -0.1488625
myo.inositol              -0.0984053
> metabolite.c2$metabolite$value
anthranilate         -0.68325915
trehalose.sucrose    -0.62615245
Mevalonic.Acid       -0.27213253
fructose.6.phosphate -0.25845751
dGTP                 -0.01528562

When I print the lists of metabolites that discriminate my categorical outcome, I find that one analyte is repeated: as shown above, fructose.6.phosphate is selected on both component 1 and component 2 of the metabolite block, with a different value.var on each. This has not happened in my earlier projects. Could you give some input on why this is happening and how to interpret it?
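A quick way to confirm which variables are selected on more than one component is to intersect the names of the two loading vectors. The base-R sketch below rebuilds them from the values printed above as named numeric vectors (the names `comp1` and `comp2` are illustrative only); in an actual session, the row names of the two `$metabolite$value` tables would presumably play the same role.

```r
# Loadings copied from the printed output above (metabolite block),
# stored as named numeric vectors; comp1/comp2 are illustrative names.
comp1 <- c(Cystine = -0.5932985, citrulline = -0.4469610,
           succinate = -0.4053713, Hydroxyphenylacetic.acid = -0.3018932,
           fructose.6.phosphate = -0.2679044, Hydroxyisocaproic.acid = -0.2487210,
           S.adenosyl.L.homoCysteine = -0.1651630, arginine = -0.1488625,
           myo.inositol = -0.0984053)
comp2 <- c(anthranilate = -0.68325915, trehalose.sucrose = -0.62615245,
           Mevalonic.Acid = -0.27213253, fructose.6.phosphate = -0.25845751,
           dGTP = -0.01528562)

# Variables with a non-zero loading on both components
shared <- intersect(names(comp1), names(comp2))
print(shared)  # [1] "fructose.6.phosphate"

# Side-by-side loadings of the shared variable(s)
print(cbind(comp1 = comp1[shared], comp2 = comp2[shared]))

# Total selections versus distinct metabolites across both components
n_selections <- length(comp1) + length(comp2)
n_distinct <- length(union(names(comp1), names(comp2)))
print(c(selections = n_selections, distinct = n_distinct))
```

With keepX = c(9, 5) on the metabolite block, the 14 selections therefore correspond to 13 distinct metabolites.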

Hi @vd4mmind,

Please note that a given variable can be selected on more than one component of a discriminant model.

I’ll expand with a simple example. Say you have classes 1, 2, and 3, where component 1 mainly discriminates class 1 from classes 2 and 3, while component 2 mainly discriminates class 2 from class 3.

In that case, fructose.6.phosphate would be important in discriminating not only class 1 from classes 2 and 3 but also class 2 from class 3. The hypothetical case below illustrates such behaviour:

library(ggplot2)

set.seed(42)  # make the simulated values reproducible
class <- rep(c(1, 2, 3), 30)
## hypothetical values for fructose.6.phosphate:
## class means 1, 13, 18 with SDs 2, 1, 1 (mean and sd are recycled over the 90 draws)
value <- rnorm(n = 90, mean = c(1, 13, 18), sd = c(2, 1, 1))

## plot the density of these values across classes
ggplot(data.frame(class = factor(class), fructose.6.phosphate = value),
       aes(class, fructose.6.phosphate, fill = class)) +
    geom_violin() + theme_bw()

Created on 2021-01-15 by the reprex package (v0.3.0)
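To put numbers on the violin plot, the sketch below regenerates the same kind of hypothetical values (the seed is an arbitrary assumption added for reproducibility) and computes Welch t-statistics for the two contrasts described above: class 1 versus classes 2 and 3, and class 2 versus class 3. Both statistics being large is what it means for a single variable to be useful on both components. This is a univariate illustration only, not how DIABLO computes its loadings.

```r
set.seed(2021)  # arbitrary seed so the simulation is reproducible
class <- factor(rep(c(1, 2, 3), 30))
# hypothetical fructose.6.phosphate values: class means 1, 13, 18; SDs 2, 1, 1
value <- rnorm(n = 90, mean = c(1, 13, 18), sd = c(2, 1, 1))

# Contrast attributed to component 1: class 1 vs classes 2 and 3
t_comp1 <- t.test(value[class == "1"], value[class != "1"])$statistic
# Contrast attributed to component 2: class 2 vs class 3
t_comp2 <- t.test(value[class == "2"], value[class == "3"])$statistic

print(round(c(comp1 = unname(t_comp1), comp2 = unname(t_comp2)), 1))
```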

Hi @aljabadi,

Thanks for this response. However, in my case I have only two categorical classes, defined a priori as my outcome of interest.

Does this mean that these data (at least the n = 14 metabolites selected by DIABLO) are indicative of discriminating more than two categorical classes, even though I have two fixed ones, and that there could be multiple discriminant components even though I am using only 2 components? Is this a fair interpretation?

Hi @vd4mmind,

Generally, if you have n classes, at most n - 1 components should be enough to achieve an optimal discriminant model. In your case, 1 component is enough. I understand that you might want to use 2 components for visualisation purposes, in which case I wouldn’t read much into the second component’s loadings/variates.
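The n - 1 bound can be seen in classical Fisher linear discriminant analysis, used here purely as an analogy (DIABLO is a sparse PLS-based method, not LDA). The between-class scatter matrix is built from the n class-mean deviations, which sum to zero when weighted by class size, so its rank is at most n - 1. The base-R sketch below uses simulated two-class data (all values hypothetical) and shows that only one eigenvalue of the between-class scatter is non-zero: any direction orthogonal to that single eigenvector gives identical projected class means.

```r
# Analogy with classical Fisher LDA scatter matrices, not DIABLO itself.
set.seed(1)  # arbitrary seed for reproducibility
n_per_class <- 15
p <- 5  # hypothetical number of variables
y <- factor(rep(c("A", "B"), each = n_per_class))
X <- matrix(rnorm(2 * n_per_class * p), ncol = p)
X[y == "B", 1] <- X[y == "B", 1] + 3  # class B shifted on variable 1 only

# Between-class scatter: sum over classes of n_k (mu_k - mu)(mu_k - mu)'
grand_mean <- colMeans(X)
S_b <- matrix(0, p, p)
for (k in levels(y)) {
  Xk <- X[y == k, , drop = FALSE]
  d <- colMeans(Xk) - grand_mean
  S_b <- S_b + nrow(Xk) * tcrossprod(d)
}

ev <- eigen(S_b, symmetric = TRUE)$values
print(signif(ev, 3))

# Numerical rank: eigenvalues above a small relative tolerance
rank_Sb <- sum(ev > 1e-8 * max(ev))
print(rank_Sb)  # n - 1 = 1 for two classes
```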