network() similarity matrix computation

Hello,

Thank you for this package! I'm currently using (s)PLS to predict metabolite concentrations from infant gut microbiome data. I'm also using DIABLO to integrate these omics datasets to classify the infants' arsenic exposure status.

I would like to write my own custom network function, as the one provided is difficult to customize when the feature names of my microbes are really long. However, when I calculate the similarity matrix following the help page for network(), the resulting correlations are much lower than the ones I get when I simply use your network() function.

Below is how I’m computing the similarity matrix. Could you point out where I’m going wrong? Thank you!

Extract components for X

spls.comp <- spls.model$variates$X

Extract important variables VIP >= 1

vip.tune.pls <- data.frame(vip(spls.model)) %>% filter(comp1 >= 1)

Extract stability measures for selected variables

if (validation == "Mfold") {
  perf.final.spls <- perf(spls.model, validation = validation, folds = 5, nrepeat = 5,
                          progressBar = TRUE, seed = seed, near.zero.var = TRUE)
} else {
  perf.final.spls <- perf(spls.model, validation = "loo",
                          progressBar = TRUE, seed = seed, near.zero.var = TRUE)
}

stab.spls2.microbes <- data.frame(stab = perf.final.spls$features$stability.X$comp1) %>%
  filter(stab >= 0.8)

stab.spls2.metabs <- data.frame(stab = perf.final.spls$features$stability.Y$comp1) %>%
  filter(stab >= 0.8)

vip.tune.spls.final <- vip.tune.pls %>% filter(rownames(vip.tune.pls) %in% rownames(stab.spls2.microbes))

Extract feature weights for X (microbiome) and Y (metabolites)

microbiome_loadings <- data.frame(comp1 = spls.model$loadings$X)
metabolite_loadings <- data.frame(comp1 = spls.model$loadings$Y)

Filter microbiome loadings to the VIP variables

microbiome_loadings <- microbiome_loadings %>%
  filter(rownames(microbiome_loadings) %in% rownames(vip.tune.spls.final))

Filter metabolite loadings to the most stable ones (>= 0.8)

metabolite_loadings <- metabolite_loadings %>%
  filter(rownames(metabolite_loadings) %in% rownames(stab.spls2.metabs))

Collect the names of the retained microbiome and metabolite features

top_microbiome_features <- microbiome_loadings %>%
  rownames_to_column("taxon")

top_metabolite_features <- metabolite_loadings %>%
  rownames_to_column("metab")

Xvars <- top_microbiome_features$taxon
Yvars <- top_metabolite_features$metab

Pearson's correlation of the original training matrices with the X components

Xt2 <- data.frame(spls.model$X) %>% select(all_of(Xvars))
Yt2 <- data.frame(spls.model$Y) %>% select(all_of(Yvars))

corX <- cor(Xt2, spls.comp)
corY <- cor(Yt2, spls.comp)

Calculate the similarity matrix between the correlation matrices

simmat <- corX %*% t(corY)

Hi @kookd18 ,

I think this might be because network() adds the similarities across components.
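To make the "adding across components" point concrete, here is a small base R sketch on simulated data. It only illustrates the algebra: a product of the form corX %*% t(corY) over several components equals the sum of the per-component similarity matrices, so a matrix built from one component will generally differ from one built from several. The data, the object names and the use of prcomp() scores as stand-in components are assumptions for illustration, not the internals of network().

```r
set.seed(1)

# Simulated stand-ins for the X (microbiome) and Y (metabolite) blocks
X <- matrix(rnorm(30 * 4), nrow = 30, dimnames = list(NULL, paste0("taxon", 1:4)))
Y <- matrix(rnorm(30 * 3), nrow = 30, dimnames = list(NULL, paste0("metab", 1:3)))

# Stand-in for two latent components (here simply PCA scores of X)
comps <- prcomp(X)$x[, 1:2]

# Correlation of each feature with each component
corX <- cor(X, comps)
corY <- cor(Y, comps)

# Similarity over both components at once
sim_all <- corX %*% t(corY)

# Similarity for each component separately
sim_c1 <- corX[, 1, drop = FALSE] %*% t(corY[, 1, drop = FALSE])
sim_c2 <- corX[, 2, drop = FALSE] %*% t(corY[, 2, drop = FALSE])

# The multi-component similarity is the sum of the per-component ones
all.equal(sim_all, sim_c1 + sim_c2)
```

So when comparing a hand-computed matrix with the one from network(), it is worth checking that both use the same set of components.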

It might be better for you to extract the similarity matrix directly from the network() function by saving the result into an object, and then amend the row / column names. Alternatively, amend the colnames of your data before you run the entire analysis.
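A minimal sketch of that first suggestion, using the liver.toxicity example data shipped with mixOmics as a stand-in for the microbiome and metabolite data. The model settings, the object names and the use of abbreviate() to shorten names are assumptions for illustration; the similarity matrix is taken from the M element of the list that network() returns, as described on its help page.

```r
library(mixOmics)

# Example data from the package (stand-in for the real X and Y blocks)
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic

toy.spls <- spls(X, Y, ncomp = 3, keepX = c(50, 50, 50), keepY = c(10, 10, 10))

# Save the result into an object instead of only drawing the plot
net <- network(toy.spls, comp = 1:3, cutoff = 0.8)

# Similarity matrix used by network()
# (expected layout: X features in rows, Y features in columns)
sim <- net$M

# Shorten long feature names before passing the matrix to a custom plot
rownames(sim) <- abbreviate(rownames(sim), minlength = 10)
```

The matrix in sim can then be passed to whichever plotting code you prefer, with names as short as needed.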

Kim-Anh