Continuous response variable Y in DIABLO?

I am looking to integrate a large transcriptomic data set using DIABLO. I was just wondering, can my outcome data be continuous (such as age) or do I need to make it categorical? Thank you

Hi Santi,

Thanks for using mixOmics!

If you wish to use a continuous Y, you can use block.spls with mode = "regression" and no variable selection on Y. Make sure you provide Y (age) as a matrix with a single column. See the example below:

library(mixOmics)
data("breast.TCGA")
# this is the X data as a list of mRNA and miRNA
X_block = list(mrna = breast.TCGA$data.train$mrna, mirna = breast.TCGA$data.train$mirna)
numeric_Y = as.matrix(breast.TCGA$data.train$protein[,1]) ## your "age" data goes here
dim(numeric_Y)
# set up a full design where every block is connected
design = matrix(1, ncol = length(X_block), nrow = length(X_block),
                dimnames = list(names(X_block), names(X_block)))
diag(design) =  0
# set number of component per data set
ncomp = 2
# set number of variables to select, per component and per data set (this is set arbitrarily)
list.keepX = list(mrna = rep(20, 2), mirna = rep(10,2))

TCGA.block.spls = block.spls(X = X_block, Y = numeric_Y, mode = "regression",
                             ncomp = ncomp, keepX = list.keepX, design = design)
TCGA.block.spls

By using such a method, you assume there is a continuous relationship between the predictors and the response (age). You have to ensure this is a valid assumption (for example, an average person's height and weight increase with age, but only up to a certain age).
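
One informal way to check this kind of assumption is to plot a variable against age and look for a plateau. The sketch below uses simulated data only: `age` and `height` are hypothetical stand-ins, not part of breast.TCGA, and the plateau at 18 years is an arbitrary choice for illustration.

```r
# Simulated data only: 'age' and 'height' are hypothetical stand-ins
set.seed(42)
age <- runif(200, min = 1, max = 80)
# Height rises until about 18 years, then stays flat (plus noise)
height <- 80 + 5 * pmin(age, 18) + rnorm(200, sd = 5)

# Scatter plot with a lowess smoother to make the plateau visible
plot(age, height, xlab = "Age (years)", ylab = "Height (cm)")
lines(lowess(age, height), col = "red", lwd = 2)

# Correlation over the full age range versus adults only
cor_all    <- cor(age, height)
cor_adults <- cor(age[age >= 18], height[age >= 18])
c(all = cor_all, adults = cor_adults)
```

If the relationship only holds over part of the age range, as in this toy case, a regression-type model over the full range may be misleading.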

As for DIABLO, using age directly as the response variable in a discriminant analysis is not advisable, unless you derive relevant and distinct categories from it (baby, adult, elderly, etc.) and use those instead.
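
A minimal sketch of that categorical route, assuming you bin age with base R's `cut()` and pass the resulting factor to `block.splsda`. The ages here are simulated purely so the code runs (breast.TCGA has no age variable), and the cut-offs and group labels are arbitrary placeholders that you would replace with categories that make sense for your study.

```r
library(mixOmics)
data("breast.TCGA")

X_block <- list(mrna  = breast.TCGA$data.train$mrna,
                mirna = breast.TCGA$data.train$mirna)

# Hypothetical ages, simulated only so the sketch runs; use your real ages here
set.seed(1)
age <- round(runif(nrow(X_block$mrna), min = 1, max = 90))

# Bin age into groups; the cut-offs (18, 65) and labels are arbitrary placeholders
age_group <- cut(age, breaks = c(0, 18, 65, Inf),
                 labels = c("young", "adult", "elderly"), right = FALSE)
table(age_group)

# Full design between the X blocks, as in the regression example
design <- matrix(1, ncol = length(X_block), nrow = length(X_block),
                 dimnames = list(names(X_block), names(X_block)))
diag(design) <- 0

# keepX values are set arbitrarily here
list.keepX <- list(mrna = rep(20, 2), mirna = rep(10, 2))

TCGA.diablo <- block.splsda(X = X_block, Y = age_group, ncomp = 2,
                            keepX = list.keepX, design = design)
```

Because the ages are random, the fitted model in this sketch has no biological meaning; it only shows the shape of the inputs.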

Hope it helps.

I would like to add to Al’s answer that choosing the optimal keepX values in the context of block.spls is not straightforward, so we have neither developed nor implemented a tuning function for it (whereas one is implemented for block.splsda, a.k.a. DIABLO).

However, if you are only interested in a first-pass exploratory analysis (looking at the plots, identifying the top variables), setting your own keepX values may be enough. Otherwise, let us know if you wish to go further in the analysis.
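
For instance, a first pass could look roughly like the sketch below. It refits the regression model with arbitrary keepX values and then uses `plotIndiv`, `plotVar` and `selectVar`. The object name `res` is just for illustration, the first protein column again stands in for age, and restricting the plots to the two X blocks is a choice made here for readability.

```r
library(mixOmics)
data("breast.TCGA")

X_block <- list(mrna  = breast.TCGA$data.train$mrna,
                mirna = breast.TCGA$data.train$mirna)
# Stand-in for a continuous outcome such as age
numeric_Y <- as.matrix(breast.TCGA$data.train$protein[, 1])

design <- matrix(1, ncol = length(X_block), nrow = length(X_block),
                 dimnames = list(names(X_block), names(X_block)))
diag(design) <- 0

# keepX values chosen arbitrarily, not tuned
res <- block.spls(X = X_block, Y = numeric_Y, mode = "regression", ncomp = 2,
                  keepX = list(mrna = rep(20, 2), mirna = rep(10, 2)),
                  design = design)

# Sample plots, one panel per X block
plotIndiv(res, blocks = c("mrna", "mirna"))

# Correlation circle for the selected variables of the X blocks
plotVar(res, blocks = c("mrna", "mirna"))

# Names of the mRNA variables selected on component 1
top_mrna <- selectVar(res, block = "mrna", comp = 1)$mrna$name
head(top_mrna)
```

Since keepX is not tuned, the selected variables should be read as exploratory candidates rather than an optimal signature.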

Kim-Anh