Number of items to replace is not a multiple of replacement length error in perf() function

Dear mixOmics team,

I am really excited by the package you have created and am currently trying to integrate transcriptomic, metabolomic and microbiomic data. I already ran through your tutorials and everything worked fine, but now I am trying to perform sPLS pairwise integration on my data sets, which have been imputed beforehand, and it seems like something is not quite right.

library(readxl)    # read_excel()
library(mixOmics)  # spls(), perf()

# Read in imputed data
# Transcriptomics
tran_imp <- read_excel("~/Documents/AIT/Transcriptomics/tran_mad_imp.xlsx")
tran_imp <- as.data.frame(tran_imp)
rownames(tran_imp) <- tran_imp[,1]
tran_imp <- tran_imp[,-1]
# Metabolomics
meta_imp <- read_excel("~/Documents/AIT/Metabolomics/t0_metabolites.xlsx",sheet = 3)
meta_imp <- as.data.frame(meta_imp)
rownames(meta_imp) <- meta_imp[,1]
meta_imp <- meta_imp[,-1]
# Microbiomics
mibi_imp <- read_excel("~/Documents/AIT/Microbiomics/asv_imp.xlsx")
mibi_imp <- as.data.frame(mibi_imp)
rownames(mibi_imp) <- mibi_imp[,1]
mibi_imp <- mibi_imp[,-1]

# Storability Classes
storability <- read_excel("~/Documents/AIT/Beetroot Files/Metatable_Multiomics.xlsx", sheet = 1, col_names = TRUE)
storability <- as.factor(storability$Storability)

dim(tran_imp) #28 10000
dim(meta_imp) #28 25
dim(mibi_imp) #28 4398

# Transcriptomics & Metabolomics
X <- tran_imp
Y <- meta_imp
cbind(rownames(X), rownames(Y)) 
      [,1]      [,2]     
 [1,] "V1_1_t0" "V1_1_t0"
 [2,] "V1_2_t0" "V1_2_t0"
 [3,] "V1_3_t0" "V1_3_t0"
 [4,] "V1_4_t0" "V1_4_t0"
 [5,] "V2_1_t0" "V2_1_t0"
 [6,] "V2_2_t0" "V2_2_t0"
 [7,] "V2_3_t0" "V2_3_t0"
 [8,] "V2_4_t0" "V2_4_t0"
 [9,] "V3_1_t0" "V3_1_t0"
[10,] "V3_2_t0" "V3_2_t0"
[11,] "V3_3_t0" "V3_3_t0"
[12,] "V3_4_t0" "V3_4_t0"
[13,] "V4_1_t0" "V4_1_t0"
[14,] "V4_2_t0" "V4_2_t0"
[15,] "V4_3_t0" "V4_3_t0"
[16,] "V4_4_t0" "V4_4_t0"
[17,] "V5_1_t0" "V5_1_t0"
[18,] "V5_2_t0" "V5_2_t0"
[19,] "V5_3_t0" "V5_3_t0"
[20,] "V5_4_t0" "V5_4_t0"
[21,] "V6_1_t0" "V6_1_t0"
[22,] "V6_2_t0" "V6_2_t0"
[23,] "V6_3_t0" "V6_3_t0"
[24,] "V6_4_t0" "V6_4_t0"
[25,] "V7_1_t0" "V7_1_t0"
[26,] "V7_2_t0" "V7_2_t0"
[27,] "V7_3_t0" "V7_3_t0"
[28,] "V7_4_t0" "V7_4_t0"

tranmeta.spls <- spls(X, Y, ncomp = 3)
# Named tranmeta.perf so the result does not mask the mixOmics function tune.spls()
tranmeta.perf <- perf(tranmeta.spls, validation = "Mfold", folds = 5, progressBar = TRUE, nrepeat = 100)

The spls() function works on all but the microbiomics data, where it warns “In cor(A[[k]], variates.A[[k]]) : the standard deviation is zero”. That might be because of the many zeros in my ASV data.
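One way to check whether constant ASV columns are behind the zero standard deviation warning is to compute the per-column standard deviation and list the columns where it is exactly zero. The following is a minimal sketch in base R on a made-up toy matrix (`asv` and its values are illustrative assumptions, not the real data):

```r
# Toy stand-in for an ASV table: 6 samples x 4 ASVs (illustrative values only).
# matrix() fills column by column, so each row of numbers below is one ASV.
asv <- matrix(
  c(0, 0, 0, 0, 0, 0,   # ASV1: all zeros, so the standard deviation is zero
    1, 0, 3, 0, 2, 0,   # ASV2: varies across samples
    4, 4, 4, 4, 4, 4,   # ASV3: constant non-zero value
    0, 0, 0, 7, 0, 0),  # ASV4: a single non-zero count
  nrow = 6,
  dimnames = list(paste0("S", 1:6), paste0("ASV", 1:4))
)

# Per-column standard deviation; exactly zero means the column is constant.
col_sd <- apply(asv, 2, sd)
zero_var <- names(col_sd)[col_sd == 0]
zero_var

# Drop the constant columns before fitting a model.
asv_filtered <- asv[, col_sd > 0, drop = FALSE]
dim(asv_filtered)
```

Note that a column like ASV4 passes this filter on the full data but can still become constant once samples are held out during cross-validation.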

The main thing that bothers me is that the perf() function always gives me the error "number of items to replace is not a multiple of replacement length".
I suspect it might have something to do with my data sets, but I have checked them multiple times and am still none the wiser.

Transcriptomics: 28 subjects, 3 classes, 10000 genes
Metabolomics: 28 subjects, 3 classes, 25 metabolites
Microbiomics: 28 subjects, 3 classes, 4398 ASVs (with a lot of zeros)
Since the datasets were imputed, there are no missing values in them.
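For anyone wanting to verify these properties programmatically, here is a small sketch of a sanity check on one data block. The helper name `check_block` and the toy data frame are hypothetical; the idea is only to count missing values, columns that were imported as text rather than numbers (which can happen with spreadsheet imports), and constant columns:

```r
# Hypothetical helper: summarise common problems in one data block.
check_block <- function(df) {
  c(
    n_missing     = sum(is.na(df)),
    n_non_numeric = sum(!vapply(df, is.numeric, logical(1))),
    n_constant    = sum(vapply(df, function(v) length(unique(v)) == 1, logical(1)))
  )
}

# Toy data frame standing in for an imported sheet (illustrative values only).
toy <- data.frame(
  gene_a = c(1.2, 3.4, 2.2),
  gene_b = c("0.5", "0.7", "0.9"),  # numbers stored as text
  gene_c = c(0, 0, 0),              # constant column
  row.names = c("V1_1_t0", "V1_2_t0", "V1_3_t0"),
  stringsAsFactors = FALSE
)

res <- check_block(toy)
res
```

On the real data the same call would be `check_block(tran_imp)`, `check_block(meta_imp)` and `check_block(mibi_imp)`; all three counts being zero would rule out these particular causes.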

Do you know what could be causing the problem, or do you have any suggestions on what to try to make it work?
Thank you so much for your assistance and thank you again for all the work that has been done!

Hi @lisatucek,

Thank you for reporting this issue. Can you please send us the data so we can fix this issue?

You can click on this text to send us an email.
Alternatively, you can right-click on the above text and choose ‘Copy Email Address’.

Thanks

Al

Thank you so much, Al. I sent you an email on the listed address.
Kind regards,
Lisa

Hello @aljabadi and @lisatucek,

Has this been solved? I am facing the same issue now with my data.

I have to say that my issue is not consistent.

When I set validation = 'loo' in the tune.spls function, the error disappears.

Other observations:

If I define a vector for test.keepX, then validation = 'Mfold' usually crashes with the same error. If I don’t define the vector, the function runs normally under the default parameters. I say "usually" because changing the seed passed to set.seed and using nrepeat = 10 sometimes allowed the function to run to completion, even with a vector defined for test.keepX. But generally, changing to nrepeat = 50 causes the function to throw the error when using validation = 'Mfold'.

Let me know if I can help you with more information.

Best wishes,
Miguel

Hi @miguelcos,
My feeling is (but I have not had access to the data, so @aljabadi can confirm) that there are special cases where the variables are highly collinear (i.e. have identical values) except for a few samples. With loo this is fine, because removing a single sample still leaves some variability in the variables. But if you remove a larger chunk of samples, a large number of variables may end up with exactly the same value. This can happen randomly, depending on the folds in the CV.
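To illustrate the idea, here is a minimal base R sketch on made-up data. It uses a simple random fold assignment, which is an assumption and not the fold logic used inside mixOmics. One variable is zero in all but two of 28 samples; leave-one-out can never make it constant, whereas a random 5-fold split sometimes puts both non-zero samples into the same held-out fold:

```r
# Toy data: 28 samples, one variable that is zero except in two samples.
set.seed(1)
n <- 28
x <- numeric(n)
x[c(3, 17)] <- c(5, 9)

# TRUE if x is constant once the samples in `test_idx` are held out.
constant_without <- function(x, test_idx) {
  length(unique(x[-test_idx])) == 1
}

# Leave-one-out: hold out each sample in turn.
loo_constant <- vapply(seq_len(n), function(i) constant_without(x, i), logical(1))

# M-fold: random assignment to 5 folds, repeated 100 times.
# Each repeat records whether any of its training sets has a constant x.
mfold_constant <- replicate(100, {
  fold_id <- sample(rep(seq_len(5), length.out = n))
  any(vapply(seq_len(5),
             function(f) constant_without(x, which(fold_id == f)),
             logical(1)))
})

any(loo_constant)     # FALSE: one non-zero sample always remains
mean(mfold_constant)  # fraction of repeats with a degenerate training set
```

Because the fraction is small but not zero, more repeats make it more likely that at least one degenerate fold occurs, which would be consistent with the error showing up more often at higher nrepeat.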

I hope that helps.
Kim-Anh
