Hello everyone,
I’m trying to run an N-integration analysis with DIABLO to identify responders in a clinical trial. I’m tuning the model over the design values c(0, 0.25, 0.5, 1.0) and want to pick the one that maximizes AUC. For each design value d, I run:
design = matrix(d,
                ncol = length(data),
                nrow = length(data),
                dimnames = list(names(data),
                                names(data)))
diag(design) = 0

basic.diablo.model = block.splsda(X = data,
                                  Y = Y,
                                  ncomp = 4,
                                  design = design)

perf.diablo = perf(basic.diablo.model,
                   validation = 'Mfold',
                   folds = 5,
                   nrepeat = 50)

optimal_ncomp = perf.diablo$choice.ncomp$WeightedVote["Overall.BER", "max.dist"]

test.keepX = list(df_metabolic = c(seq(1, ncol(data$df_metabolic), 5)),
                  df_lipids = c(seq(1, ncol(data$df_lipids), 25)),
                  df_pgs = c(seq(1, ncol(data$df_pgs), 1)),
                  df_microbiome = c(seq(1, ncol(data$df_microbiome), 20))
)

tune.TCGA = tune.block.splsda(X = data,
                              Y = Y,
                              ncomp = optimal_ncomp,
                              test.keepX = test.keepX,
                              design = design,
                              validation = 'Mfold',
                              folds = 5,
                              nrepeat = 50,
                              dist = "max.dist",
                              progressBar = TRUE,
                              BPPARAM = BiocParallel::SnowParam(workers = 20))
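To make the outer loop over design values explicit, here is a minimal base-R sketch of how the design matrix is built for each d. The block names are the ones in my data list; the helper name make_design and the "d_" labels are only illustrative and not part of mixOmics.

```r
# Block names in the same order as my `data` list.
block_names <- c("df_metabolic", "df_lipids", "df_microbiome", "df_pgs")

# Illustrative helper (not a mixOmics function): square design matrix with
# off-diagonal value d and a zero diagonal, named by block.
make_design <- function(d, block_names) {
  design <- matrix(d,
                   ncol = length(block_names),
                   nrow = length(block_names),
                   dimnames = list(block_names, block_names))
  diag(design) <- 0
  design
}

# One design matrix per candidate value.
design_values <- c(0, 0.25, 0.5, 1.0)
designs <- lapply(design_values, make_design, block_names = block_names)
names(designs) <- paste0("d_", design_values)

print(designs[["d_0.25"]])
```

Each element of designs is then used as the design argument in the block.splsda, perf and tune.block.splsda calls above.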
Across all designs, the optimal number of components is either 3 or 4. However, after the tuning function has been running for a while, I get the following error for every design:
Error: BiocParallel errors
20 remote errors, element index: 1, 2, 3, 4, 5, 6, ...
30 unevaluated and other errors
first remote error:
Error in get.keepA(X = X, keepX = keepX, ncomp = ncomp): each component of 'keepX[[4]]'
must be lower or equal to ncol(X[[4]])=4.
I am clueless about where this error comes from. The dimensions of my datasets are the following:
lapply(data, dim)
$df_metabolic
[1] 138 49
$df_lipids
[1] 138 165
$df_microbiome
[1] 138 154
$df_pgs
[1] 138 4
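One sanity check that might narrow this down is to compare each test.keepX grid against the block it would be paired with. The sketch below uses toy random matrices with the same names, order and dimensions as my data, and it assumes the grids are matched to the blocks by position rather than by name (the keepX[[4]] / X[[4]] wording of the error suggests this, but I have not confirmed it in the mixOmics source). The helper name check_keepX is hypothetical.

```r
# Toy blocks with the same names, order and dimensions as my data
# (random values; only the shapes matter for this check).
dims <- c(df_metabolic = 49, df_lipids = 165, df_microbiome = 154, df_pgs = 4)
data <- lapply(dims, function(p) matrix(rnorm(138 * p), nrow = 138, ncol = p))

# Same grids and same list order as in my tuning call.
test.keepX <- list(df_metabolic  = seq(1, ncol(data$df_metabolic), 5),
                   df_lipids     = seq(1, ncol(data$df_lipids), 25),
                   df_pgs        = seq(1, ncol(data$df_pgs), 1),
                   df_microbiome = seq(1, ncol(data$df_microbiome), 20))

# Hypothetical helper: pair blocks and grids BY POSITION (an assumption) and
# flag name mismatches and grids that exceed the block's column count.
check_keepX <- function(data, test.keepX) {
  ncol_block <- unname(sapply(data, ncol))
  max_keepX  <- unname(sapply(test.keepX, max))
  data.frame(block       = names(data),
             grid        = names(test.keepX),
             ncol_block  = ncol_block,
             max_keepX   = max_keepX,
             names_match = names(data) == names(test.keepX),
             fits        = max_keepX <= ncol_block)
}

print(check_keepX(data, test.keepX))

# Reordering the grids to follow the block order, in case order is the issue.
print(check_keepX(data, test.keepX[names(data)]))
```

If the fourth row shows fits = FALSE in the first table and TRUE in the second, the list order would be the likely culprit, and passing test.keepX[names(data)] to tune.block.splsda would be the thing to try.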
It would be really helpful if you could point me to a solution.
Thanks a lot!
Best,
Carolina