Error with perf() for a block.plsda analysis: clash with epiDisplay

Hi everyone,

First, many thanks for the great package!
I am using the package version 6.24.0.
I am running a DIABLO analysis in 223 individuals with the following data sets:

  • 150 lipid-related metabolites (~normal distributions)
  • 35 proteins (~normal distributions)
  • 51 genetic variants (0/1/2 values)
    The outcome is a 3-category variable (sample size per category = 103, 64 and 56).

X ← list(metabo = alldata[, c(104:155,173:270)],
prot = alldata[, c(2:36)],
SNPs = alldata[, c(37:59,76:103)])
Y ← alldata$Beck.orig; table(Y)

All the X variables are coded in numeric, scaled and centered (even the SNPs). The outcome variable is coded as a factor. I have chosen a 0.5 design matrix given the correlations between the first component of the different data sets (using a PLS analysis).

design ← matrix(0.5, ncol = length(X), nrow = length(X),
dimnames = list(names(X), names(X)))
diag(design) ← 0; design

The “quick start” code works fine:
diablo.result1 ← block.plsda(X, Y)
plotIndiv(diablo.result1)
plotVar(diablo.result1)

However, here is the error message that I get when I use the perf() function:
tune.diablo ← block.plsda(X, Y, ncomp = 5, design = design)
perf.diablo ← perf(tune.diablo, validation = ‘Mfold’, folds = 5, nrepeat = 10)
Error in filter():
:information_source: In argument: row_number(x) == n().
:information_source: In group 1: Group.2 = "100".
Caused by error in vec_rank():
! Unsupported vctrs type null.
:information_source: In file type-info.c at line 189.
:information_source: This is an internal error that was detected in the vctrs package.
Please report it at https://github.com/r-lib/vctrs/issues with a reprex and the full backtrace.
Backtrace:

  1. ├─mixOmics::perf(…)
  2. ├─mixOmics:::perf.sgccda(…)
  3. │ └─base::lapply(…)
  4. │ └─mixOmics (local) FUN(X[[i]], …)
  5. │ └─mixOmics (local) repeat_cv_perf.diablo(nrep)
  6. │ └─base::lapply(…)
  7. │ └─mixOmics (local) FUN(X[[i]], …)
  8. │ └─mixOmics:::predict.block.spls(model[], X.test[], dist = “all”)
  9. │ └─mixOmics:::internal_predict.DA(…)
  10. │ ├─dplyr::filter(.data = data_max, row_number(x) == n())
  11. │ └─dplyr:::filter.data.frame(.data = data_max, row_number(x) == n())
  12. │ └─dplyr:::filter_rows(.data, dots, by)
  13. │ └─dplyr:::filter_eval(…)
  14. │ ├─base::withCallingHandlers(…)
  15. │ └─mask$eval_all_filter(dots, env_filter)
  16. │ └─dplyr (local) eval()
  17. ├─dplyr::row_number(x)
  18. │ └─vctrs::vec_rank(x, ties = “sequential”, incomplete = “na”)
  19. └─rlang:::stop_internal_c_lib(…)
  20. └─rlang::abort(message, call = call, .internal = TRUE, .frame = frame)

I have performed PLSDA analyses between each X data set and Y, and didn’t encounter this issue.

I have unsuccessfully tried the followings:

  • removing the SNPs from X to see if these variables could have caused the issue.
  • change ncomp and folds.
  • change the design matrix.

Any idea about what could be causing this issue?
Many thanks in advance.
Simon Nusinovici

For the same question , I put here the reprex version of the code:

library(mixOmics)
#> Loading required package: MASS
#> Loading required package: lattice
#> Loading required package: ggplot2
#> 
#> Loaded mixOmics 6.24.0
#> Thank you for using mixOmics!
#> Tutorials: http://mixomics.org
#> Bookdown vignette: https://mixomicsteam.github.io/Bookdown
#> Questions, issues: Follow the prompts at http://mixomics.org/contact-us
#> Cite us:  citation('mixOmics')

source('C:/Users/Common/Documents/PROJECTS SERI/Metabolic profile ED/Effect statin/SCRIPTS/Proteomics/new scripts/prepData_multiOmics.R')
#> Loading required package: foreign
#> Loading required package: survival
#> Loading required package: nnet
#> 
#> Attaching package: 'epiDisplay'
#> The following object is masked from 'package:ggplot2':
#> 
#>     alpha
#> The following object is masked from 'package:lattice':
#> 
#>     dotplot
#> Loading required package: zoo
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
#> 
#> Attaching package: 'lmtest'
#> The following object is masked from 'package:epiDisplay':
#> 
#>     lrtest
#> Loading required package: carData
#> Use the command
#>     lattice::trellis.par.set(effectsTheme())
#>   to customize lattice options for effects plots.
#> See ?effectsTheme for details.
#> Loading required package: Matrix
#> Loaded glmnet 4.1-8
#> corrplot 0.92 loaded
#> 
#> Attaching package: 'Hmisc'
#> The following objects are masked from 'package:base':
#> 
#>     format.pval, units
#> Loading required package: grid
#> Loading required package: checkmate
#> Loading required package: abind
#> 
#> Attaching package: 'survey'
#> The following object is masked from 'package:Hmisc':
#> 
#>     deff
#> The following object is masked from 'package:graphics':
#> 
#>     dotchart
#>  cobalt (Version 4.5.1, Build Date: 2023-04-27)
#> 
#> Attaching package: 'cobalt'
#> The following object is masked from 'package:MatchIt':
#> 
#>     lalonde
#> Warning in read.dta13(paste(path, "Data SEED/SEED DATA/SEED-1/SiMES-SINDI-SCES-Essential-13Sep18.dta", : 
#>    Missing factor labels for variables
#> 
#>    L_NDR_cat
#> 
#>    No labels have been assigned.
#>    Set option 'generate.factors=TRUE' to generate labels.

## prepare the data
alldata <- subset(alldata, Beck.orig!='Late AMD')
alldata$Beck.orig <- droplevels(alldata$Beck.orig)
alldata[, c(37:59,76:103)] <- data.frame(scale(alldata[, c(37:59,76:103)], center = T, scale = T))
X <- list(metabo = alldata[, c(104:155,173:270)]  
          , prot = alldata[, c(2:36)]
          , SNPs = alldata[, c(37:59,76:103)]
)
Y <- alldata$Beck.orig; table(Y)
#> Y
#>     No AMD  Early AMD Interm AMD 
#>        103         64         56

## matrix design
design <- matrix(0.3, ncol = length(X), nrow = length(X), 
                 dimnames = list(names(X), names(X)))
diag(design) <- 0; design 
#>        metabo prot SNPs
#> metabo    0.0  0.3  0.3
#> prot      0.3  0.0  0.3
#> SNPs      0.3  0.3  0.0

## run the model
tune.diablo <- block.plsda(X, Y, ncomp = 5, design = design)
#> Design matrix has changed to include Y; each block will be
#>             linked to Y.
perf.diablo <- perf(tune.diablo, validation = 'Mfold', folds = 5, nrepeat = 10)
#> Error in `filter()`:
#> ℹ In argument: `row_number(x) == n()`.
#> ℹ In group 1: `Group.2 = "11"`.
#> Caused by error in `vec_rank()`:
#> ! Unsupported vctrs type `null`.
#> ℹ In file 'type-info.c' at line 189.
#> ℹ This is an internal error that was detected in the vctrs package.
#>   Please report it at <https://github.com/r-lib/vctrs/issues> with a reprex (<https://tidyverse.org/help/>) and the full backtrace.
#> Backtrace:
#>      ▆
#>   1. ├─mixOmics::perf(...)
#>   2. ├─mixOmics:::perf.sgccda(...)
#>   3. │ └─base::lapply(...)
#>   4. │   └─mixOmics (local) FUN(X[[i]], ...)
#>   5. │     └─mixOmics (local) repeat_cv_perf.diablo(nrep)
#>   6. │       └─base::lapply(...)
#>   7. │         └─mixOmics (local) FUN(X[[i]], ...)
#>   8. │           └─mixOmics:::predict.block.spls(model[[x]], X.test[[x]], dist = "all")
#>   9. │             └─mixOmics:::internal_predict.DA(...)
#>  10. │               ├─dplyr::filter(.data = data_max, row_number(x) == n())
#>  11. │               └─dplyr:::filter.data.frame(...)
#>  12. │                 └─dplyr:::filter_rows(.data, dots, by)
#>  13. │                   └─dplyr:::filter_eval(...)
#>  14. │                     ├─base::withCallingHandlers(...)
#>  15. │                     └─mask$eval_all_filter(dots, env_filter)
#>  16. │                       └─dplyr (local) eval()
#>  17. ├─dplyr::row_number(x)
#>  18. │ └─vctrs::vec_rank(x, ties = "sequential", incomplete = "na")
#>  19. └─rlang:::stop_internal_c_lib(...)
#>  20.   └─rlang::abort(message, call = call, .internal = TRUE, .frame = frame)

Created on 2023-09-28 with reprex v2.0.2

I contacted the authors of vctrs package (as suggested in the message error), here is his response:

“> ! Unsupported vctrs type null.
suggests that somewhere in perf() you end up with a data frame that has a NULL column in it, which is a corrupt data frame. I’d suggest asking the mixOmics author about this”

I dont know why but if I dont call the epiDisplay package (in the source function), then no more error message.
Cheers!

hi @Simon65

Ah this is great that you found the issue, because your code looked fine to me! (and we are very stretched on the maintenance of the package at the moment).

I’ll keep that in mind, thanks for the feedback!

Kim-Anh