Perf MFold cross-validation error

Hi,

I’m getting the following error message:

Error in Check.entry.single(newdata[[q]], ncomp[q], q = q) :
samples should have a unique identifier/rowname

I’ve checked that all the row names are unique, and I’ve tried the fixes suggested in a similar thread on the forum (Diablo perf error - #7 by aljabadi), such as updating Bioconductor and the mixOmics package.

I wasn’t getting this error initially on a much smaller dataset. Any help would be appreciated, thank you!

Hi @Tom,

Thanks for sharing your data. As in the other post you linked (Diablo perf error - #7 by aljabadi), you have one sample that is in a class of its own. This causes the error when running perf with Mfold cross-validation (but not with leave-one-out cross-validation). Unfortunately, the error message is not very informative, so I will flag that as something to improve in the perf function!

The offending sample is sample 23, which has the class ‘#N/A’. Removing this sample avoids the error:

# model building with original data, perf returns error
basic.diablo.model = block.splsda(X = data, Y = Y, ncomp = 5, design = design, near.zero.var = TRUE) 
perf.diablo <- perf(basic.diablo.model) 
# Error in Check.entry.single(newdata[[q]], ncomp[q], q = q) : 
#   samples should have a unique identifier/rowname

# this error does not occur during LOO cross-validation
perf.diablo <- perf(basic.diablo.model, validation = "loo") 

# one sample is in its own category '#N/A'
sort(table(Y))
#    #N/A   IV+IV AZLI+IV 
#       1      19      20 

# identify the offending sample - sample 23
Y
# [1] "IV+IV"   "IV+IV"   "IV+IV"   "IV+IV"   "IV+IV"   "IV+IV"   "IV+IV"   "IV+IV"   "AZLI+IV" "AZLI+IV" "AZLI+IV" "AZLI+IV" "AZLI+IV" "AZLI+IV" "AZLI+IV" "AZLI+IV"
# [17] "AZLI+IV" "AZLI+IV" "AZLI+IV" "AZLI+IV" "AZLI+IV" "AZLI+IV" "#N/A"    "IV+IV"   "IV+IV"   "IV+IV"   "IV+IV"   "IV+IV"   "IV+IV"   "IV+IV"   "IV+IV"   "IV+IV"  
# [33] "IV+IV"   "IV+IV"   "AZLI+IV" "AZLI+IV" "AZLI+IV" "AZLI+IV" "AZLI+IV" "AZLI+IV"

# remove sample 23 from data
Y <- Y[-23]
Bacteria_filtered <- Bacteria[-23, ]
Metabolites_table_filtered <- Metabolites_table[-23, ]
data = list(Bacteria=Bacteria_filtered, 
            Metabolites_table=Metabolites_table_filtered)

# re-run model building and perf without errors
basic.diablo.model = block.splsda(X = data, Y = Y, ncomp = 5, design = design, near.zero.var = TRUE) 
perf.diablo <- perf(basic.diablo.model) 
plot(perf.diablo)
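
If you would rather not hard-code the sample index, the same filtering can be done programmatically. Below is a minimal base R sketch that uses made-up toy data in place of your real blocks. The names min_n, small_classes, keep, Y_filtered and data_filtered are only illustrative and are not part of mixOmics, and the threshold of 2 is an assumption that flags only classes with a single sample, so you may want a larger value depending on the number of folds.

```r
# Toy stand-ins for the real objects (hypothetical data, not the original dataset)
set.seed(1)
Y <- c(rep("IV+IV", 4), "#N/A", rep("AZLI+IV", 4))
data <- list(
  Bacteria          = matrix(rnorm(9 * 3), nrow = 9,
                             dimnames = list(paste0("s", 1:9), NULL)),
  Metabolites_table = matrix(rnorm(9 * 2), nrow = 9,
                             dimnames = list(paste0("s", 1:9), NULL))
)

# Classes with fewer than min_n samples (min_n = 2 flags single-sample classes)
min_n <- 2
class_counts <- table(Y)
small_classes <- names(class_counts)[class_counts < min_n]

# Logical mask of the samples to keep; which(!keep) gives the dropped positions
keep <- !(Y %in% small_classes)
which(!keep)

# Apply the same mask to Y and to every block so the rows stay aligned
Y_filtered <- Y[keep]
data_filtered <- lapply(data, function(block) block[keep, , drop = FALSE])

# Note: if Y is a factor, droplevels(Y_filtered) also removes the empty level
```

Y_filtered and data_filtered can then be passed to block.splsda in place of Y and data. Using a logical mask instead of a negative index also behaves sensibly when there is nothing to remove.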

Cheers,
Eva

This has fixed it, thank you!
Tom