Hello,
I’ve been trying to run sPLS on my gene expression and microbial abundance data. No matter how I tune keepX, I run into errors. My code is below:
```r
library(mixOmics)

dim(meta); dim(filtered_gene)
# 136 148
# 136 4953

ncomptry <- 10
MyResult.spls <- spls(meta, filtered_gene, ncomp = ncomptry, near.zero.var = FALSE)  # this finishes with no problem

# The snippet below runs fine. From it I determined that ncomp = 5 should be kept.
set.seed(22)
perf.pls <- perf(MyResult.spls, validation = "Mfold", folds = 5, progressBar = TRUE, nrepeat = 50)
plot(perf.pls, criterion = 'Q2.total')
X <- seq_len(ncomptry)
Y <- perf.pls$measures$Q2.total$summary$sd
par(mar = c(5, 5, 5, 5))
plot(X, Y)
abline(h = 0.0975)  # keep 5 components to test

# Tuning
list.keepX <- c(25, 50, 100, 500, 1000, 2500, 3000)
set.seed(22)
tune.spls.cor <- tune.spls(meta, filtered_gene, ncomp = 5,
                           test.keepX = list.keepX,
                           validation = "Mfold", folds = 5,
                           nrepeat = 50, progressBar = TRUE,
                           measure = NULL)
```
Below is the output I always get:

```
tuning component: 1
[======= ] 14%Error in X.test %*% a.cv : non-conformable arguments
```
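One thing I am unsure about: keepX is the number of variables selected from X, which in my call is `meta` (148 taxa), yet several values in `list.keepX` (500 to 3000) are larger than that. I don’t know whether this is what triggers the error, but a quick sketch to flag such values could look like this (`check_keepX` is my own hypothetical helper, not a mixOmics function, and `toy_meta` is just a placeholder with the same shape as `meta`):

```r
# check_keepX() is a hypothetical helper, not part of mixOmics.
# keepX counts variables kept from X (here the taxa in `meta`), so any
# candidate larger than ncol(X) cannot actually be selected.
check_keepX <- function(X, test.keepX) {
  too_big <- test.keepX[test.keepX > ncol(X)]
  if (length(too_big) > 0) {
    message("keepX values larger than ncol(X) = ", ncol(X), ": ",
            paste(too_big, collapse = ", "))
  }
  test.keepX[test.keepX <= ncol(X)]  # keep only feasible candidates
}

# Placeholder with the same shape as `meta` (136 samples x 148 taxa)
toy_meta <- matrix(0, nrow = 136, ncol = 148)
list.keepX <- c(25, 50, 100, 500, 1000, 2500, 3000)
valid.keepX <- check_keepX(toy_meta, list.keepX)
print(valid.keepX)  # [1]  25  50 100
```

On the real data this would be `check_keepX(meta, list.keepX)`, and the returned values could be passed as `test.keepX`.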
It always crashes at 14%. Even if I change the set.seed value, it fails no matter what I do. The only time this has ever worked was with a very small subset of my data (31 taxa instead of 148), but I have several other microbial abundance datasets that I want to run sPLS on.
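My unconfirmed guess is that the crash is tied to sparse taxa: with 5-fold cross-validation, a taxon that is non-zero in only a few samples can end up all zero in the training part of a fold, so it has zero variance there. The sketch below counts how many taxa are constant in the training rows of at least one random split. The function name and the fold scheme are my own assumptions; the splits will not reproduce the exact folds tune.spls() draws internally.

```r
# count_fold_constant_taxa() is a hypothetical diagnostic, not a mixOmics function.
# It flags columns that have zero variance in the training rows of any fold.
count_fold_constant_taxa <- function(X, folds = 5, nrepeat = 10, seed = 22) {
  set.seed(seed)
  flagged <- rep(FALSE, ncol(X))
  for (r in seq_len(nrepeat)) {
    # Random, roughly balanced fold assignment for every sample (row)
    fold_id <- sample(rep(seq_len(folds), length.out = nrow(X)))
    for (k in seq_len(folds)) {
      train <- X[fold_id != k, , drop = FALSE]
      col_sd <- apply(train, 2, sd)
      flagged <- flagged | col_sd == 0  # constant (e.g. all zero) in training
    }
  }
  sum(flagged)
}

# Toy data: "dense" is never constant; "rare" is non-zero in one sample only,
# so it is all zero in the training rows of whichever fold holds that sample.
toy <- cbind(dense = seq_len(136),
             rare  = c(5, rep(0, 135)))
count_fold_constant_taxa(toy)  # 1
```

On the real data this would be something like `count_fold_constant_taxa(meta, folds = 5, nrepeat = 50)`.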
I realize my microbial abundance data have a lot of zeros, and this seems to be affecting it somehow: when I first ran spls, it failed unless I removed every taxon with fewer than 10 counts across the samples. The problem is that zeros are a normal value in abundance data, so I don’t want to keep arbitrarily removing low-count taxa; that is meaningful data that I want to compare with host gene expression.
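One way to keep the zeros instead of filtering taxa might be a centred log-ratio (CLR) transform with a pseudocount, which is commonly used for compositional microbiome data (I believe mixOmics also offers `logratio.transfo()` for this). The base R sketch below assumes `meta` holds raw counts with samples in rows; `clr_transform` is a hypothetical helper and the offset of 1 is an arbitrary choice. Because each CLR value is log(count + offset) minus that sample’s mean log abundance, a taxon that is zero in every sample of a fold is generally no longer constant.

```r
# clr_transform() is a hypothetical helper. The offset (pseudocount) of 1 is
# an assumption, not a recommendation.
clr_transform <- function(counts, offset = 1) {
  log_counts <- log(counts + offset)
  # Subtract each sample's (row's) mean log abundance
  sweep(log_counts, 1, rowMeans(log_counts), "-")
}

# Toy counts: taxon 3 is zero in every sample, yet after CLR it varies,
# because each sample's mean log abundance differs.
toy_counts <- rbind(c(10,  0, 0),
                    c( 3,  7, 0),
                    c( 0, 20, 0))
toy_clr <- clr_transform(toy_counts)
apply(toy_clr, 2, sd) > 0  # [1] TRUE TRUE TRUE
rowSums(toy_clr)           # approximately 0 for every sample
```

The transformed matrix, e.g. `clr_transform(meta)`, would then replace `meta` in the spls() and tune.spls() calls.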
How can I overcome this?