Hi mixOmics team,
I am analyzing 16S amplicon microbiome data from soil samples.
My intention is to classify my samples based on the sampling depth of the soil horizons.
Specifically, my dataset has unbalanced group sizes: it consists of 10 samples at depth D1, 12 samples at depth D2 and 12 samples at depth D3.
- My OTU table was converted to percentages, giving the relative abundance of each feature per sample.
- The OTU table was then centered log-ratio (CLR) transformed with the function:
bac_otutab.t.log <- logratio.transfo(bac_otu_count.t, logratio = "CLR", offset = 1)
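For reference, this is my understanding of what that call computes. It is a sketch in my own notation, not taken from the mixOmics source: $x_{ij}$ is the value of feature $j$ in sample $i$, $p$ is the number of features, and $c$ is the offset (here 1). The offset is added to every entry, then each sample is log-transformed and centered by its own mean log value:

```latex
\[
\mathrm{clr}(x_i)_j = \log(x_{ij} + c) - \frac{1}{p}\sum_{k=1}^{p} \log(x_{ik} + c),
\qquad j = 1,\dots,p .
\]
```

If this is right, the transformed values of each sample sum to zero.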
- Then I performed the PLS-DA:
Y <- cl_depth
bac.plsX <- plsda(bac_otutab.t.log, Y, ncomp = 10)
color.per.group <- c("darkgreen", "darkorange", "darkviolet") # assign colors to Y groups
bac_pls_2d <- plotIndiv(bac.plsX, comp = 1:2, cex = 2,
                        pch = 16, ellipse = TRUE,
                        ind.names = FALSE, col = color.per.group,
                        ellipse.level = 0.8, star = TRUE, legend = TRUE, centroid = TRUE,
                        title = "PLS-DA: ZOTUs ~ Depth") # 2D plot of PLS-DA
- I then assessed the performance of the classification:
set.seed(999)
MyPerf.bac.plsX <- perf(bac.plsX, validation = "Mfold",
                        folds = 5,
                        progressBar = FALSE, auc = TRUE,
                        nrepeat = 50, cpus = 8) # we suggest nrepeat = 50-100
plot(MyPerf.bac.plsX, col = color.mixo(5:7), sd = TRUE, legend.position = "horizontal")
My question is about the classification error rate, which seems to increase with the number of components.
I don’t understand how this is possible, and I am wondering whether it is a data problem. Is there a conceptual mistake I don’t see, or is it just a mistake in my script or workflow?
Thanks in advance for your assistance,
Cheers, Alberto.