Diablo perf error

Hi,

I am trying to integrate 5 datasets. I have filtered the data frames to the samples that match across all 5 datasets and combined them into a list. I am able to run block.splsda(X = data, Y = Y, ncomp = 5, design = design) with no issues and get the following output:

Call:
block.splsda(X = data, Y = Y, ncomp = 5, design = design)

sGCCA with 5 components on block 1 named RNAseq
sGCCA with 5 components on block 2 named Flow
sGCCA with 5 components on block 3 named Lipoproteins
sGCCA with 5 components on block 4 named Small_metabolites
sGCCA with 5 components on block 5 named Amino_acids
sGCCA with 5 components on the outcome Y

Dimension of block 1 is 259 14736
Dimension of block 2 is 259 26
Dimension of block 3 is 259 100
Dimension of block 4 is 259 41
Dimension of block 5 is 259 37
Outcome Y has 19 levels

Selection of 14736 14736 14736 14736 14736 variables on each of the sGCCA components on the block 1
Selection of 26 26 26 26 26 variables on each of the sGCCA components on the block 2
Selection of 100 100 100 100 100 variables on each of the sGCCA components on the block 3
Selection of 40 40 40 40 40 variables on each of the sGCCA components on the block 4
Selection of 37 37 37 37 37 variables on each of the sGCCA components on the block 5

Main numerical outputs:

loading vectors: see object$loadings
variates: see object$variates
variable names: see object$names

Functions to visualise samples:

plotIndiv, plotArrow, cimDiablo, plotDiablo

Functions to visualise variables:

plotVar, plotLoadings, network, circosPlot

Other functions:

selectVar, perf, auc

However, when I run:

perf.diablo = perf(sgccda.res, validation = 'Mfold', folds = 10, nrepeat = 10)

I get the following error:
Error in Check.entry.single(newdata[[q]], ncomp[q], q = q) : samples should have a unique identifier/rowname

There are no duplicate rownames, and the rownames match across all blocks.

Thank you

hi @Santi
What is your mixOmics version? (Type sessionInfo() to find out.) This bug may already be resolved if you update to the latest version of the package; see that link.

Kim-Anh

Hi @Santi,

As @kimanh.lecao mentioned, it would be helpful to know which version you are using and whether you have tried updating to the latest Bioconductor version.

Additionally, please make sure the sample names are standard characters without any special characters (space, !@#$() etc.). You can paste here the output of the following code:
lapply(X, rownames)
for us to try and diagnose the issue.
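
The rowname checks discussed above can be sketched in base R. This is a minimal illustration on toy data, not part of mixOmics: the helper `check_rownames` and the toy list `X` are hypothetical stand-ins, and the set of characters treated as "standard" (letters, digits, underscore, dot) is an assumption.

```r
# Toy stand-in for the list of blocks (assumption: the real input is a named
# list of matrices or data frames with samples in rows).
ids <- c("CV0003_D14", "CV0003_D21", "CV0003_D7", "CV0006_D0")
X <- list(
  RNAseq = matrix(rnorm(12), nrow = 4, dimnames = list(ids, NULL)),
  Flow   = matrix(rnorm(8),  nrow = 4, dimnames = list(ids, NULL))
)

# Hypothetical helper: report rowname problems for a single block.
check_rownames <- function(block) {
  rn <- rownames(block)
  list(
    missing      = is.null(rn),                        # no rownames at all
    duplicated   = rn[duplicated(rn)],                 # repeated sample names
    non_standard = rn[grepl("[^A-Za-z0-9_.]", rn)]     # spaces, !@#$() etc.
  )
}

checks <- lapply(X, check_rownames)

# TRUE only if every block has identical rownames in the same order.
same_order <- all(vapply(
  X,
  function(block) identical(rownames(block), rownames(X[[1]])),
  logical(1)
))

str(checks)
same_order
```

If every block reports empty `duplicated` and `non_standard` vectors and `same_order` is TRUE, the rownames themselves are unlikely to be the cause of the error.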

Alternatively, you can send us your data by following step (5) at Reproducible example to clarify issues, so that we can spot the problem.

Best,

Al

Thanks Kim-Anh and Al,

My session info is: mixOmics_6.12.2

My rownames just have an underscore:

[1] "CV0003_D14" "CV0003_D21" "CV0003_D7" "CV0006_D0" "CV0010_D0"
[6] "CV0010_D28" "CV0013_D0" "CV0027_D7" "CV0037_D0" "CV0037_D28"
[11] "CV0039_D0" "CV0039_D7" "CV0042_D0" "CV0042_D28" "CV0043_D0"
[16] "CV0043_D28" "CV0045_D0" "CV0045_D28" "CV0047_D14" "CV0047_D7"
[21] "CV0048_D0" "CV0048_D7" "CV0050_D0" "CV0050_D28" "CV0050_D7"
[26] "CV0052_D0" "CV0058_D0" "CV0058_D14" "CV0058_D28" "CV0058_D7"
[31] "CV0062_D0" "CV0062_D28" "CV0067_D0" "CV0068_D0" "CV0068_D14"
[36] "CV0068_D28" "CV0069_D0" "CV0069_D14" "CV0069_D28" "CV0069_D7"
[41] "CV0071_D0" "CV0071_D28" "CV0071_D7" "CV0073_D0" "CV0074_D0"
[46] "CV0074_D28" "CV0075_D7" "CV0076_D0" "CV0076_D28" "CV0077_D7"
[51] "CV0080_D0" "CV0080_D28" "CV0084_D0" "CV0086_D0" "CV0088_D0"

Thank you,

Santi

Hi @Santi, please send us your data for us to spot the problem.

Your files will be kept completely confidential and safely destroyed after the issue is resolved. You can click on this text to send us an email. Alternatively, you can right-click on the above text and choose 'Copy Email Address'.

Best,

Al

Email from user:

Hi Al,

Thanks for the email. Here is the input that I would feed into the code below:
 X_Diablo_input.rds
 Y_Diablo_input.rds

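# Design matrix: weak (0.1) links between all blocks, no self-links
design = matrix(0.1, ncol = length(X_Diablo_input), nrow = length(X_Diablo_input),
                dimnames = list(names(X_Diablo_input), names(X_Diablo_input)))
diag(design) = 0
sgccda.res = block.splsda(X = X_Diablo_input, Y = Y_Diablo_input, ncomp = 5,
                          design = design)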


Thank you,

Santi

Hi @Santi,

Thank you for the email and the data.

I ran the code using the same version but did not have any issues. See below.

X_Diablo_input <- readRDS('buildignore/X_Diablo_input.rds')
Y_Diablo_input <- readRDS('buildignore/Y_Diablo_input.rds')

sgccda.res = block.splsda(X = X_Diablo_input, Y = Y_Diablo_input, ncomp = 2)
#> Warning in cor(A[[k]], variates.A[[k]]): the standard deviation is zero

plotIndiv(sgccda.res, pch=16)
#> Warning in shape.input.plotIndiv(object = object, n = n, blocks = blocks, :
#> 'ind.names' is set to FALSE as 'pch' overrides it

packageVersion('mixOmics')
#> [1] '6.12.2'

I also did not face the issue using previous and later versions.

Unfortunately, it’s difficult to find the issue if it is specific to your system. You can try on another system and/or on RStudio Cloud to see if you still face the problem. If not, it should be specific to your machine.

Please keep us updated on how you go.

Best,

Al

Hi @Santi,

I realised that you ultimately wanted to run perf, and that is where the problem occurs. The reason is that one of your classes has only one sample, which triggers a glitch in the code; we also need to produce a more informative error on our end. See below.

sort(table(Y_Diablo_input))
#> Y_Diablo_input
#> C_25_36 B_37_48 A_13_24 A_25_36 D_37_48  E_0_12 C_37_48 C_13_24 B_25_36 B_13_24 
#>       1       4       7       9       9       9      10      11      12      13 
#> D_13_24 E_13_24  A_0_12 E_37_48  D_0_12  C_0_12 E_25_36   HC_HC  B_0_12 
#>      13      14      16      17      18      19      20      28      29

I recommend you either merge that sample with the closest class or perform the analysis without it.
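
The two options above can be sketched in base R as follows. This is only an illustration on toy data: the objects `X`, `Y`, `X_filtered`, `Y_filtered` and `Y_merged` are hypothetical names, and the choice of "B" as the closest class for the merge is an assumption made purely for the example.

```r
# Toy data: two blocks with matching rownames and an outcome in which
# class "C" has a single sample (mirroring the C_25_36 case in the thread).
set.seed(1)
ids <- paste0("S", 1:9)
X <- list(
  block1 = matrix(rnorm(27), nrow = 9, dimnames = list(ids, paste0("g", 1:3))),
  block2 = matrix(rnorm(18), nrow = 9, dimnames = list(ids, paste0("m", 1:2)))
)
Y <- factor(c(rep("A", 4), rep("B", 4), "C"))

# Find classes represented by fewer than two samples.
class_sizes <- table(Y)
singletons  <- names(class_sizes)[class_sizes < 2]

# Option 1: drop those samples from every block and from the outcome,
# then remove the now-empty factor level.
keep       <- !(Y %in% singletons)
X_filtered <- lapply(X, function(block) block[keep, , drop = FALSE])
Y_filtered <- droplevels(Y[keep])

# Option 2: merge the singleton class into another class
# (assumption: "B" is the closest class in this toy example).
Y_merged <- Y
levels(Y_merged)[levels(Y_merged) == "C"] <- "B"

table(Y_filtered)
table(Y_merged)
```

The filtered (or merged) inputs would then be passed to block.splsda and perf in place of the originals. With Mfold validation it is probably also worth comparing the smallest class size against the number of folds, since very small classes cannot be represented in every fold.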

Best,
Al

Hi Al,

Thanks so much.

I would never have worked that out.

Santi