Reducing number of features DIABLO

Jose_Gracia · February 14, 2023, 10:51am

Hi everyone,

I have a small dataset, Its composed of resistant cell lines and their parental. There’s only one parental per line. And there are no more than 5 clones, in some cases just one.

Also, MUTATION block does have NA values, but I still get the same error message wherever I use It or not.

When I try to run tune.block.splsda

test.keepX = list (CNV = c(25,25), 
                   MUTATION = c(25,25),
                   RNA = c(25,25),
                   ENHANCER = c(25,25),
                   PROMOTER = c(25,25))

tune.TCGA = tune.block.splsda(X = diablo_input, Y = diablo_metadata, ncomp = 2, 
                              test.keepX = test.keepX, design = design,
                              validation = 'loo', folds = 5, nrepeat = 1,
                              dist = "centroids.dist",max.iter = 200)

Design matrix has changed to include Y; each block will be
linked to Y.

You have provided a sequence of keepX of length: 2 for block CNV and 2 for block MUTATION and 2 for block RNA and 2 for block ENHANCER and 2 for block PROMOTER.
This results in 32 models being fitted for each component and each nrepeat, this may take some time to run, be patient!

You can look into the ‘BPPARAM’ argument to speed up computation time.
Error in 1:n : NA/NaN argument

Does anybody have some idea? I leave a link to the Robject.

Also, I would like to ask If, with this many samples, this has any statistical sense at all.

sessionInfo()

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.14.4 forcats_0.5.2 stringr_1.4.1 purrr_0.3.5 readr_2.1.3 tidyverse_1.3.2 pheatmap_1.0.12
[8] tibble_3.1.8 tidyr_1.2.1 mixOmics_6.19.3 ggplot2_3.3.6 lattice_0.20-45 MASS_7.3-58 dplyr_1.0.10

loaded via a namespace (and not attached):
[1] matrixStats_0.62.0 fs_1.5.2 lubridate_1.8.0 METACLUSTER_1.0.0 bit64_4.0.5 RColorBrewer_1.1-3
[7] httr_1.4.4 rprojroot_2.0.3 tools_4.2.1 backports_1.4.1 utf8_1.2.2 R6_2.5.1
[13] DBI_1.1.3 colorspace_2.0-3 withr_2.5.0 prettyunits_1.1.1 processx_3.7.0 tidyselect_1.2.0
[19] gridExtra_2.3 curl_4.3.3 bit_4.0.4 compiler_4.2.1 cli_3.4.1 rvest_1.0.3
[25] xml2_1.3.3 labeling_0.4.2 scales_1.2.1 callr_3.7.2 digest_0.6.30 rmarkdown_2.17
[31] pkgconfig_2.0.3 htmltools_0.5.4 dbplyr_2.2.1 fastmap_1.1.0 rlang_1.0.6 readxl_1.4.1
[37] rstudioapi_0.14 farver_2.1.1 generics_0.1.3 jsonlite_1.8.3 BiocParallel_1.30.4 vroom_1.6.0
[43] googlesheets4_1.0.1 magrittr_2.0.3 Matrix_1.5-1 Rcpp_1.0.9 munsell_0.5.0 fansi_1.0.3
[49] lifecycle_1.0.3 stringi_1.7.8 yaml_2.3.6 pkgbuild_1.3.1 plyr_1.8.7 grid_4.2.1
[55] parallel_4.2.1 ggrepel_0.9.1 crayon_1.5.2 haven_2.5.1 hms_1.1.2 ps_1.7.1
[61] knitr_1.40 pillar_1.8.1 igraph_1.4.0 corpcor_1.6.10 reshape2_1.4.4 codetools_0.2-18
[67] reprex_2.0.2 glue_1.6.2 evaluate_0.17 remotes_2.4.2 BiocManager_1.30.18 modelr_0.1.9
[73] vctrs_0.5.0 tzdb_0.3.0 cellranger_1.1.0 gtable_0.3.1 assertthat_0.2.1 xfun_0.34
[79] broom_1.0.1 RSpectra_0.16-1 googledrive_2.0.0 gargle_1.2.1 rARPACK_0.11-0 ellipse_0.4.3
[85] ellipsis_0.3.2

kimanh.lecao · February 23, 2023, 11:49pm

Hi @Jose_Gracia,

My hint is because some of your groups only include one sample, when this is removed in cross-validation, it breaks the system. You would need to regroup some samples so that all classes are represented in leave-one-out cross validation (i.e at least 2 samples per class).

the MUTATION data may also posing a problem, as you noticed.

And also you are using the tune.keepX parameters incorrectly, they should specify the parameters to try for each component, e.g.

test.keepX = list(CNV = c(5,10, 25), 
                  RNA = c(5,10, 25),
                  ENHACER = c(5,10, 25),  # note here that is should also match exactly how you named your datasets
                  PROMOTER = c(5,10, 25))

I hope that helps, thanks for sharing your data.

Kim-Anh

Jose_Gracia · February 28, 2023, 12:35pm

Hi @kimanh.lecao ,

Thank you for your response.

I have another question, this time regarding the heatmap.

Is there a way of getting the legend out of the figure?
How can I keep my gene names from being cut?
Is there a way of extracting this data so I can plot It with other libraries?

Thank you for your time!

Jose.

kimanh.lecao · March 2, 2023, 10:34pm

hi @Jose_Gracia

Assuming you are using the function cimDiablo(), you will see in the help file of that function that you can extract the M object from that function.

e.g.

data(nutrimouse)
Y = nutrimouse$diet
data = list(gene = nutrimouse$gene, lipid = nutrimouse$lipid)
design = matrix(c(0,1,1,1,0,1,1,1,0), ncol = 3, nrow = 3, byrow = TRUE)


nutrimouse.sgccda <- block.splsda(X = data,
Y = Y,
design = design,
keepX = list(gene = c(10,10), lipid = c(15,15)),
ncomp = 2,
scheme = "centroid")

output = cimDiablo(nutrimouse.sgccda)
oputput$M

I am not sure how to solve the legend / cropping issues. You could transpose, and change the name of your genes. See the help file which lists these arguments:

`margins`	numeric vector of length two containing the margins (see `par(mar)`) for column and row names respectively.
`legend.position`	position of the legend, one of “bottomright”, “bottom”, “bottomleft”, “left”, “topleft”, “top”, “topright”, “right” and “center”.
`transpose`	logical indicating if the matrix should be transposed for plotting. Defaults to `FALSE`.
`row.names, col.names`	logical, should the name of rows and/or columns of `mat` be shown? If `TRUE` (defaults) `rownames(mat)` and/or `colnames(mat)` are used. Possible character vectors with row and/or column labels can be used.

Kim-Anh

Topic		Replies	Views
Tune.block.splsda() allowing 0 and 1 Bugs	1	255	November 10, 2022
I Have a problem/error with tune.block.splsda Support	5	1319	July 27, 2022
Excessive runtime using tune.block.splsda() Analysis	13	2172	December 14, 2023
Perf on DIABLO with one component Support	5	1026	December 7, 2020
Error when trying to tune number of features Support	1	68	December 8, 2024

Reducing number of features DIABLO

Related topics