Continuous response variable Y in DIABLO?

Santi · September 9, 2019, 6:41am

I am looking to integrate a large transcriptomic data set using DIABLO. I was just wondering, can my outcome data be continuous (such as age) or do I need to make it categorical? Thank you

aljabadi · September 9, 2019, 7:26am

Hi Santi,

Thanks for using mixOmics!

If you wish to use continuous Y you can use block.spls without variable selection on Y and mode = "regression". You must ensure you provide Y (age) as a matrix with one column. See example below:

library(mixOmics)
data("breast.TCGA")
# this is the X data as a list of mRNA and miRNA
X_block = list(mrna = breast.TCGA$data.train$mrna, mirna = breast.TCGA$data.train$mirna)
numeric_Y = as.matrix(breast.TCGA$data.train$protein[,1]) ## your "age" data goes here
dim(numeric_Y)
# set up a full design where every block is connected
# (the blocks live in X_block, so size and name the matrix from it)
design = matrix(1, ncol = length(X_block), nrow = length(X_block),
                dimnames = list(names(X_block), names(X_block)))
diag(design) =  0
# set number of components per data set
ncomp = 2
# set number of variables to select, per component and per data set (this is set arbitrarily)
list.keepX = list(mrna = rep(20, 2), mirna = rep(10,2))

TCGA.block.spls = block.spls(X = X_block, Y = numeric_Y, mode = "regression",
                             ncomp = ncomp, keepX = list.keepX, design = design)
TCGA.block.spls

By using such a method, you assume there’s a continuous (and, since PLS regression is a linear method, roughly linear) relationship between predictors and response (age). You have to ensure this is a valid assumption (for example, an average person’s height and weight increase with age, but only up to a certain age).

As for DIABLO, using age directly as the response in a discriminant analysis is not advisable, unless you derive relevant and distinct categories (baby, adult, elderly, etc.) from it and use those instead.
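A minimal sketch of that binning step in base R, in case it is useful. The ages, cut points and group labels below are illustrative assumptions only; choose boundaries that make sense for your study and check that each group has enough samples.

```r
# Hypothetical ages in years; replace with your own vector
age <- c(0.5, 3, 17, 18, 42, 64, 65, 80)

# Cut points and labels are assumptions for illustration, not a recommendation.
# right = FALSE makes the intervals [0, 18), [18, 65), [65, Inf)
age_group <- cut(age,
                 breaks = c(0, 18, 65, Inf),
                 labels = c("child", "adult", "elderly"),
                 right = FALSE)

# Check how many samples fall into each class before modelling
table(age_group)
```

The resulting factor can then be supplied as Y to block.splsda in place of the numeric age.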

Hope it helps.

kimanh.lecao · September 9, 2019, 8:40am

I would like to add to Al’s answer that choosing the optimal keepX values in the context of block.spls is not straightforward, so we have neither developed nor implemented a tuning function for it (whereas one is implemented for block.splsda, a.k.a. DIABLO).

However, if you are only interested in a first-pass exploratory analysis (looking at the plots, identifying the top variables), setting your own keepX values may be enough. Let us know if you wish to go further in the analysis.

Kim-Anh

YEE99 · February 21, 2023, 3:45am

Hi, I have a question. If I were to use DIABLO with a continuous response, it seems best to use the “block.spls” function. Is there a recommended method, such as cross-validation, for choosing the optimal keepX values?

kimanh.lecao · March 2, 2023, 10:41pm

Hi @YEE99,

It’s a bit tricky (or at least the output of the analysis is not straightforward). You could use cross-validation and assess the quality of the prediction of Y. But so far we have only implemented the tuning function for sPLS, because I think the multi-block case requires further methodological development first (you can have a look at predict / tune / perf for some inspiration).
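As a rough starting point, here is a sketch of what a hand-rolled cross-validation could look like. It is not a mixOmics function: cv_rmse, fit_block_spls and predict_block_spls are hypothetical helper names, RMSE is just one possible criterion, and the sketch assumes that predict() on a block.spls object returns an AveragedPredict array indexed as samples x Y columns x components. Please check that against the predict documentation for your mixOmics version.

```r
# Hypothetical helper: K-fold cross-validated RMSE for a multi-block model.
# fit_fun(X_list, Y) must return a fitted model;
# predict_fun(model, X_list) must return one numeric prediction per row.
cv_rmse <- function(X_list, Y, fit_fun, predict_fun, folds = 5, seed = 1) {
  set.seed(seed)
  n <- nrow(Y)
  fold_id <- sample(rep(seq_len(folds), length.out = n))
  sq_err <- numeric(n)
  for (k in seq_len(folds)) {
    test <- fold_id == k
    # split every block on the same rows
    X_train <- lapply(X_list, function(b) b[!test, , drop = FALSE])
    X_test  <- lapply(X_list, function(b) b[test, , drop = FALSE])
    model <- fit_fun(X_train, Y[!test, , drop = FALSE])
    sq_err[test] <- (Y[test, 1] - predict_fun(model, X_test))^2
  }
  sqrt(mean(sq_err))
}

# Callbacks for block.spls (assumed usage; the default design is used here)
fit_block_spls <- function(X_list, Y, keepX, ncomp = 2) {
  mixOmics::block.spls(X = X_list, Y = Y, ncomp = ncomp,
                       keepX = keepX, mode = "regression")
}
predict_block_spls <- function(model, X_list, ncomp = 2) {
  # assumption: AveragedPredict is samples x Y columns x components
  predict(model, newdata = X_list)$AveragedPredict[, 1, ncomp]
}

# Example comparison of two arbitrary keepX candidates (requires mixOmics):
# cand1 <- list(mrna = c(20, 20), mirna = c(10, 10))
# cand2 <- list(mrna = c(50, 50), mirna = c(20, 20))
# cv_rmse(X_block, numeric_Y, function(X, Y) fit_block_spls(X, Y, cand1), predict_block_spls)
# cv_rmse(X_block, numeric_Y, function(X, Y) fit_block_spls(X, Y, cand2), predict_block_spls)
```

Averaging the predictions over blocks is only one choice; how best to combine block-wise predictions is exactly the kind of open question mentioned above, so treat the resulting numbers as a guide rather than a definitive tuning.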

Kim-Anh
