Perf function's plot does not seem correct

Hello, I am trying to analyze three datasets (methylation, expression and proteomics) for drug sensitivity with DIABLO. In the first part of the code, the plot of the classification error rate does not look right: all the lines sit flat at y = 0 (see the attached picture). Does anybody have an idea what the problem might be? Additionally, when I increase nrepeat, the plot barely changes.

[Attached screenshot: classification error rate plot, Screen Shot 2020-08-26 at 7.48.24 PM]

Here is my code up to this plot:





Data1 = read.csv("expression_traindata.csv", sep = ",", row.names = 1, header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)
Data2 = read.csv("methylation1_traindata.csv", sep = ",", row.names = 1, header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)
Data3 = read.csv("proteomics_traindata.csv", sep = ",", row.names = 1, header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

data = list(expression = as.matrix(Data1), methylation = as.matrix(Data2), proteomics = as.matrix(Data3))
lapply(data, dim)
Y = read.csv("sensitive_resistant.csv", sep = ",", row.names = 1, header = TRUE, stringsAsFactors = TRUE)


design = matrix(0.1, ncol = length(data), nrow = length(data),
                dimnames = list(names(data), names(data)))
diag(design) = 0



sgccda.res = block.splsda(X = data, Y = Y$subtype, ncomp = 10, design = design)


perf.diablo = perf(sgccda.res, validation = "Mfold", folds = 3, nrepeat = 100)

#perf.diablo # lists the different outputs


Hi @sugeyigi

Do you happen to have the sensitive/resistant labels (Y) in one or more of the X blocks as well, by any chance? If so, the model may be using Y alongside the other data to predict Y, which would explain a zero error rate from the first component.
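One way to check for this kind of leak, as a minimal sketch in base R: flag any column in a block that takes exactly one distinct value per class, since such a column reproduces Y on its own. The toy `data` and `Y` below are made-up stand-ins for your blocks and labels, and the helper names `leaks_label` and `find_leaks` are hypothetical (they are not part of mixOmics).

```r
# Toy stand-ins for the real blocks and labels (hypothetical data).
set.seed(1)
n <- 20
Y <- factor(rep(c("sensitive", "resistant"), each = n / 2))
data <- list(
  expression = matrix(rnorm(n * 5), nrow = n,
                      dimnames = list(NULL, paste0("gene", 1:5))),
  proteomics = cbind(matrix(rnorm(n * 3), nrow = n,
                            dimnames = list(NULL, paste0("prot", 1:3))),
                     subtype = as.numeric(Y))  # label deliberately leaked into X
)

# A column "leaks" Y if it has as many distinct values as Y has classes
# and every distinct value occurs in exactly one class.
leaks_label <- function(x, y) {
  tab <- table(x, y)
  nrow(tab) == nlevels(y) && all(rowSums(tab > 0) == 1)
}

# Return, for each block, the names of the columns that leak Y.
find_leaks <- function(blocks, y) {
  lapply(blocks, function(b) colnames(b)[apply(b, 2, leaks_label, y = y)])
}

leaks <- find_leaks(data, Y)
print(leaks)
```

With your own objects this would be `find_leaks(data, Y$subtype)`. The check only catches columns that copy the label exactly (possibly recoded), so an empty result does not rule out other causes.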



Replied in another topic post.