Features with zero variance in block

psolev · November 20, 2024, 1:05pm

Good morning everyone,

I am quite new to mixOmics. I have been working with DIABLO for a few days. I managed to run the whole analysis, produce some plots, and obtain reasonable results.
However, this morning I updated the package (I was working with version 6.22), and now the perf() function I was using to select ncomp no longer works with my data.

I keep getting the error:
Error: There are features with zero variance in block 'transcriptomics'. If nearZeroVar() function or 'near.zero.var' parameter hasn't been used, please use it. If you have used one of these, you may need to manually filter out these features

However, I checked whether there were actually variables with zero variance, and nearZeroVar() reports that the ‘transcriptomics’ block has no such features:

nearZeroVar(basic.diablo.model$X$transcriptomics)
$Position
integer(0)

$Metrics
[1] freqRatio     percentUnique
<0 rows> (or 0-length row.names)

I don’t know if I am missing something, but I don’t know how to solve this error.

Thank you very much in advance for your help.

evahamrud · November 22, 2024, 12:56am

Hi @psolev,

Unfortunately, I haven’t been able to reproduce your error, and looking into the code I can’t find the point at which perf() looks for features with zero variance. This check should happen earlier, when you create the DIABLO model using block.splsda().

I have created this short piece of code using the TCGA test data to show how features with 0 variance throw an error during block.splsda(). When there are no features with 0 variance, I don’t get any errors when running block.splsda() or when running perf() on the resulting DIABLO model.

To see whether the error you’re getting is due to something specific to your data or something else, could you please 1) let me know which mixOmics version you are using (sharing the output of sessionInfo() would be ideal) and 2) could you please copy-paste the following code and try running it on your computer and let me know if you get any errors? Then we can take it from there!

## create a normal diablo model - runs without errors
data("breast.TCGA")
data = list(mrna = breast.TCGA$data.train$mrna, mirna = breast.TCGA$data.train$mirna,
            protein = breast.TCGA$data.train$protein)
design = matrix(1, ncol = length(data), nrow = length(data),
                dimnames = list(names(data), names(data)))
diag(design) =  0
list.keepX = list(mrna = rep(8,2), mirna = rep(8,2), protein = rep(8,2))

TCGA.block.splsda = block.splsda(X = data, Y = breast.TCGA$data.train$subtype, 
                                 ncomp = 2, keepX = list.keepX, design = design)

perf(TCGA.block.splsda) # runs without errors

# check that none of DIABLO blocks have features with 0 variance
nearZeroVar(TCGA.block.splsda$X$mrna)
nearZeroVar(TCGA.block.splsda$X$mirna)
nearZeroVar(TCGA.block.splsda$X$protein)

## edit the input data to introduce 0 variance to 20 of the mRNA variables
nearZeroVar(breast.TCGA$data.train$mrna)
zero_variance_mrna_data <- breast.TCGA$data.train$mrna
zero_variance_mrna_data[,1:20] <- 1
nearZeroVar(zero_variance_mrna_data)

# create a new DIABLO model with this new data - block.splsda() fails
data = list(mrna = zero_variance_mrna_data, 
            mirna = breast.TCGA$data.train$mirna,
            protein = breast.TCGA$data.train$protein)
TCGA.block.splsda = block.splsda(X = data, Y = breast.TCGA$data.train$subtype, 
                                 ncomp = 2, keepX = list.keepX, design = design)

Cheers,
Eva

psolev · November 22, 2024, 8:54am

Hello @evahamrud,

First of all thank you so much for your response. I really appreciate your help.

I’m using mixOmics_6.30.0:

sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-apple-darwin20
Running under: macOS Monterey 12.7.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Madrid
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] igraph_2.1.1    gridExtra_2.3   readxl_1.4.3    mixOmics_6.30.0 ggplot2_3.5.1  
[6] lattice_0.22-6  MASS_7.3-61    

loaded via a namespace (and not attached):
 [1] utf8_1.2.4          generics_0.1.3      tidyr_1.3.1         stringi_1.8.4      
 [5] digest_0.6.37       magrittr_2.0.3      evaluate_1.0.1      grid_4.4.2         
 [9] RColorBrewer_1.1-3  fastmap_1.2.0       cellranger_1.1.0    plyr_1.8.9         
[13] Matrix_1.7-1        ggrepel_0.9.6       RSpectra_0.16-2     purrr_1.0.2        
[17] fansi_1.0.6         scales_1.3.0        codetools_0.2-20    cli_3.6.3          
[21] crayon_1.5.3        rlang_1.1.4         munsell_0.5.1       withr_3.0.2        
[25] yaml_2.3.10         ellipse_0.5.0       tools_4.4.2         parallel_4.4.2     
[29] reshape2_1.4.4      BiocParallel_1.40.0 dplyr_1.1.4         colorspace_2.1-1   
[33] corpcor_1.6.10      vctrs_0.6.5         R6_2.5.1            matrixStats_1.4.1  
[37] lifecycle_1.0.4     stringr_1.5.1       pkgconfig_2.0.3     pillar_1.9.0       
[41] gtable_0.3.6        glue_1.8.0          rARPACK_0.11-0      Rcpp_1.0.13-1      
[45] xfun_0.49           tibble_3.2.1        tidyselect_1.2.1    knitr_1.49         
[49] farver_2.1.2        htmltools_0.5.8.1   labeling_0.4.3      rmarkdown_2.29     
[53] compiler_4.4.2

The code you provided works as expected: the first part doesn’t return any error, but the second DIABLO model, built on the data with 0 variance variables, returns the error:

Error: There are features with zero variance in block 'mrna'. If nearZeroVar() function or 'near.zero.var' parameter hasn't been used,  please use it. If you have used one of these, you may need to manually filter out these features.

So I guess the problem is specific to my data.
I don’t know where it could be, as I filtered out the variables with 0 variance before creating the model. And I don’t understand why the error occurs in perf() but not in block.splsda().

I share my code so you can see if I’m doing something wrong:

data <- read.table("PA_dataframe_log.txt", header=TRUE, sep="\t", row.names=1)

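# Delete columns with NA
# (logical indexing, so nothing is dropped by accident when no column has NA)
columns_NA <- colSums(is.na(data)) > 0
data <- data[, !columns_NA, drop=FALSE]

# Delete variables with 0 variance
# (negative indexing with an empty Position would drop every column, so guard it)
nzv <- nearZeroVar(data)
if (length(nzv$Position) > 0) data <- data[, -nzv$Position, drop=FALSE]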

# Subset data to create the different blocks
physiological <- data[, (1:3)]
transcriptomics <- data[, grepl("SOLYC", colnames(data))]
metabolomics <- data[, grepl("met", colnames(data))]

# Create a list with the 3 blocks
data = list(physiological = physiological,
                transcriptomics = transcriptomics, 
                metabolomics = metabolomics)
# Define Y 
Y <- as.factor(groups)
summary(Y)
LUK_C_PA LUK_S_PA NOT_C_PA NOT_S_PA 
       3        3        3        3 

# Create the design matrix
design = matrix(0.8, ncol = length(data), nrow = length(data), 
                dimnames = list(names(data), names(data)))
diag(design) = 0
design 
                physiological transcriptomics metabolomics
physiological             0.0             0.8          0.8
transcriptomics           0.8             0.0          0.8
metabolomics              0.8             0.8          0.0

basic.diablo.model = block.splsda(X = data, Y = Y, ncomp = 3, design = design) 
Design matrix has changed to include Y; each block will be
            linked to Y.

perf.diablo <- perf(basic.diablo.model, validation = "loo") 
Error: There are features with zero variance in block 'transcriptomics'. If nearZeroVar() function or 'near.zero.var' parameter hasn't been used,  please use it. If you have used one of these, you may need to manually filter out these features.

Maybe the problem is that I use validation = "loo" in the perf() function? Or is it something with my data?
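One way to probe the leave-one-out idea is with a toy feature in base R. This is only a sketch of a possible cause, not something confirmed in this thread: it assumes that the zero-variance check is applied to the training subset of each fold, and the values below are made up. With 12 samples, a feature that is identical in 11 of them has positive variance on the full data, but becomes constant as soon as the one differing sample is left out. Such a feature would not necessarily be flagged by nearZeroVar() on the full data.

```r
# Toy feature: 12 samples, identical in 11 of them (values are made up).
feature <- c(rep(0, 11), 2.5)

# On the full data the variance is positive, so a whole-data check passes.
full_data_ok <- var(feature) > 0

# Variance of the training subset for each leave-one-out fold.
loo_var <- sapply(seq_along(feature), function(i) var(feature[-i]))

# Folds whose training subset is constant: only the fold that leaves out
# the single differing sample (sample 12).
constant_folds <- which(loo_var == 0)
print(constant_folds)
```

If this is what happens in the real data, the features involved would be those where a single sample differs from all the others.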

Thank you.

Paula

psolev · November 22, 2024, 11:46am

Hello again,

I have been trying different things and filtering my data in different ways, but nothing worked. Just now, though, I realized that after adding the parameter near.zero.var = TRUE to the block.splsda() call, perf() runs without any error.

New version:
basic.diablo.model = block.splsda(X, Y, ncomp = 3, design = design, near.zero.var = TRUE)

Maybe the solution was as simple as that, but I didn’t realize before. It may be useful for other people having the same problem.
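For anyone who would rather see which features are involved than rely only on near.zero.var = TRUE, here is a rough base R sketch of the manual filtering the error message mentions. The helper name constant_in_some_loo_fold() and the two small blocks are hypothetical and not part of mixOmics. It assumes numeric blocks without missing values, and it only mimics a leave-one-out split; it does not reproduce whatever check perf() performs internally.

```r
# Hypothetical helper (not part of mixOmics): indices of the columns that
# become constant when any single row is removed. A column qualifies when
# its most frequent value occurs in at least n - 1 of the n rows.
# Assumes a numeric block with no missing values.
constant_in_some_loo_fold <- function(block) {
  block <- as.matrix(block)
  n <- nrow(block)
  flagged <- apply(block, 2, function(x) max(table(x)) >= n - 1)
  unname(which(flagged))
}

# Made-up blocks with 6 samples each, standing in for the real data.
blocks <- list(
  a = cbind(ok = c(1, 2, 3, 4, 5, 6), risky = c(0, 0, 0, 0, 0, 9)),
  b = cbind(flat = rep(1, 6), fine = c(2, 4, 6, 8, 10, 12))
)

# Column indices to inspect, per block.
flagged <- lapply(blocks, constant_in_some_loo_fold)

# Drop the flagged columns; the length check avoids the pitfall where
# negative indexing with an empty vector removes every column.
filtered <- Map(function(block, idx) {
  if (length(idx) > 0) block[, -idx, drop = FALSE] else block
}, blocks, flagged)

print(flagged)
print(lapply(filtered, colnames))
```

The filtered list could then be passed as X when building the model, although whether that makes the perf() error go away would still need to be checked on the real data.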

Thank you very much and congrats for this incredible bioinformatics tool.

evahamrud · November 27, 2024, 1:15am

That is interesting, and thank you for posting the solution you found! I will look into this a bit more in the future to see what’s going on. In the meantime, your workaround seems like the best solution. Feel free to add another post if you encounter any other issues!

Cheers,
Eva
