Unfortunately, that function is pretty old and isnβt the most intuitive to use. It doesnβt quite work the same as the other tune()
functions. Youβre right in that the tutorial does not cover how to use it.
The way youβd select these values is to compute the optimal values for each component individually and iteratively. This is tedious and prone to error.
What Iβve done for you (and anyone else who stumbles along here) is created a little function for you to copy to your script:
tune.splslevel.wrapper <- function(X,
Y,
design,
ncomp,
test.keepX,
test.keepY) {
ncomps <- 1:ncomp # set range of ncomp values to use
final.obj <- list() # initialise returned object
# default values are -1 for debugging
atX <- rep(-1, length(ncomps)-1) # already.tested.X
atY <- rep(-1, length(ncomps)-1) # already.tested.Y
for (ncomp in ncomps) {
# previously calculated optimal keepX/Y value
tmp.atX <- c(atX[1:ncomp-1])
tmp.atY <- c(atY[1:ncomp-1])
# need to be NULL for first iteration
if (ncomp==1) { tmp.atX <- NULL; tmp.atY <- NULL }
cat("=== NCOMP:", ncomp, "===\n")
model <- suppressMessages(tune.splslevel(X, Y, multilevel=design,
mode="regression",
ncomp=ncomp,
test.keepX = test.keepX,
test.keepY = test.keepY,
already.tested.X <- tmp.atX,
already.tested.Y <- tmp.atY))
# extract position in cor.value which corresponds to max
opt.pos <- which(model$cor.value==max(model$cor.value), arr.ind=T)
# set these for future iterations
atX[ncomp] <- test.keepX[opt.pos[1]]
atY[ncomp] <- test.keepY[opt.pos[2]]
# add to returned object
final.obj[["X"]][ncomp] <- test.keepX[opt.pos[1]]
final.obj[["Y"]][ncomp] <- test.keepY[opt.pos[2]]
final.obj[["cor"]][ncomp] <- max(model$cor.value)
}
# determine ncomp with optimal correlation
cor.max <- which(final.obj[["cor"]]==max(final.obj[["cor"]]))
final.obj[["opt.ncomp"]] <- cor.max
return(final.obj)
}
tune.splslevel.wrapper()
will allow you to use your input as you would in other tune()
functions. Hereβs an example of its use:
library(mixOmics)
data(vac18)
X <- vac18$genes[,1:500]
Y <- vac18$genes[,501:1000]
ML <- vac18$sample
design <- data.frame(sample = ML)
ncomp <- 5
test.keepX = seq(10,100,10)
test.keepY = seq(10,100,10)
tune.splslevel.wrapper(X, Y, design,
ncomp, test.keepX, test.keepY)
#> $X
#> [1] 100 100 100 80 100
#>
#> $Y
#> [1] 100 100 50 100 100
#>
#> $cor
#> [1] 0.9942615 0.9857412 0.9857662 0.9841769 0.9828682
#>
#> $opt.ncomp
#> [1] 1
Created on 2022-08-08 by the reprex package (v2.0.1)
Session info
sessioninfo::session_info()
#> β Session info βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
#> setting value
#> version R version 4.2.1 (2022-06-23 ucrt)
#> os Windows 10 x64 (build 19044)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_United States.utf8
#> ctype English_United States.utf8
#> tz Australia/Sydney
#> date 2022-08-08
#> pandoc 2.18 @ C:/Users/Work/AppData/Local/Pandoc/ (via rmarkdown)
#>
#> β Packages βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
#> package * version date (UTC) lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.1)
#> BiocParallel 1.30.3 2022-06-07 [1] Bioconductor
#> cli 3.3.0 2022-04-25 [1] CRAN (R 4.2.1)
#> codetools 0.2-18 2020-11-04 [1] CRAN (R 4.2.1)
#> colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.2.1)
#> corpcor 1.6.10 2021-09-16 [1] CRAN (R 4.2.0)
#> DBI 1.1.3 2022-06-18 [1] CRAN (R 4.2.1)
#> digest 0.6.29 2021-12-01 [1] CRAN (R 4.2.1)
#> dplyr 1.0.9 2022-04-28 [1] CRAN (R 4.2.1)
#> ellipse 0.4.3 2022-05-31 [1] CRAN (R 4.2.1)
#> evaluate 0.15 2022-02-18 [1] CRAN (R 4.2.1)
#> fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.1)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.1)
#> fs 1.5.2 2021-12-08 [1] CRAN (R 4.2.1)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.1)
#> ggplot2 * 3.3.6 2022-05-03 [1] CRAN (R 4.2.1)
#> ggrepel 0.9.1 2021-01-15 [1] CRAN (R 4.2.1)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.1)
#> gridExtra 2.3 2017-09-09 [1] CRAN (R 4.2.1)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.2.1)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.2.1)
#> htmltools 0.5.3 2022-07-18 [1] CRAN (R 4.2.1)
#> igraph 1.3.4 2022-07-19 [1] CRAN (R 4.2.1)
#> knitr 1.39 2022-04-26 [1] CRAN (R 4.2.1)
#> lattice * 0.20-45 2021-09-22 [1] CRAN (R 4.2.1)
#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.2.1)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.1)
#> MASS * 7.3-57 2022-04-22 [1] CRAN (R 4.2.1)
#> Matrix 1.4-1 2022-03-23 [1] CRAN (R 4.2.1)
#> matrixStats 0.62.0 2022-04-19 [1] CRAN (R 4.2.1)
#> mixOmics * 6.20.0 2022-04-26 [1] Bioconductor (R 4.2.0)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.1)
#> pillar 1.8.0 2022-07-18 [1] CRAN (R 4.2.1)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.1)
#> plyr 1.8.7 2022-03-24 [1] CRAN (R 4.2.1)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.2.1)
#> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.2.1)
#> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.2.0)
#> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.2.0)
#> R.utils 2.12.0 2022-06-28 [1] CRAN (R 4.2.1)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.1)
#> rARPACK 0.11-0 2016-03-10 [1] CRAN (R 4.2.1)
#> RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.2.0)
#> Rcpp 1.0.9 2022-07-08 [1] CRAN (R 4.2.1)
#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.2.1)
#> reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.2.1)
#> rlang 1.0.4 2022-07-12 [1] CRAN (R 4.2.1)
#> rmarkdown 2.14 2022-04-25 [1] CRAN (R 4.2.1)
#> RSpectra 0.16-1 2022-04-24 [1] CRAN (R 4.2.1)
#> scales 1.2.0 2022-04-13 [1] CRAN (R 4.2.1)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.1)
#> stringi 1.7.8 2022-07-11 [1] CRAN (R 4.2.1)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.2.1)
#> styler 1.7.0 2022-03-13 [1] CRAN (R 4.2.1)
#> tibble 3.1.8 2022-07-22 [1] CRAN (R 4.2.1)
#> tidyr 1.2.0 2022-02-01 [1] CRAN (R 4.2.1)
#> tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.2.1)
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.1)
#> vctrs 0.4.1 2022-04-13 [1] CRAN (R 4.2.1)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.1)
#> xfun 0.31 2022-05-10 [1] CRAN (R 4.2.1)
#> yaml 2.3.5 2022-02-21 [1] CRAN (R 4.2.0)
#>
#> [1] D:/Programs/Work Programs/R-4.2.1/library
#>
#> ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
A brief explanation of whats going (follow along with code comments if needed):
- Over the desired range of
ncomp
values (ncomps
), we calculate the optimal keepX
and keepY
value.
- These values are stored in
atX
and atY
. These vectors use -1
as a default. If you see a -1 in the output, something has gone wrong.
- the
opt.ncomp
object in the output is determined via the maximisation of the cor
output object.
This last point requires a bit of explanation if youβre wanting to properly understand it. Essentially, when calculating the optimal tuneX/Y
and ncomp
values, we use the maximisation of βcorrelationβ as our criteria. What is this βcorrelationβ?
When forming an sPLS model (using a given ncomp
, keepX
and keepY
), the tune
functions split the data into training (majority) and testing subsets. A model is trained using the training data. This model then predicts the component projection values of the testing data. The agreement between the predictions of the test samples (predicted component) and the true test sample values in the component space (actual component) is measure by Pearson correlation.
Hope all this helps a bit. Let me know if you have any follow up questions