I am looking to integrate a large transcriptomic data set using DIABLO. Can my outcome data be continuous (such as age), or do I need to make it categorical? Thank you.
Thanks for using mixOmics!
If you wish to use a continuous Y, you can use `block.spls` with `mode = "regression"` and without variable selection on Y (i.e. keep all Y variables). Make sure you provide Y (age) as a matrix with one column. See the example below:
```r
library(mixOmics)
data("breast.TCGA")

# X data: a named list of the mRNA and miRNA blocks
X_block = list(mrna  = breast.TCGA$data.train$mrna,
               mirna = breast.TCGA$data.train$mirna)

# Continuous response as a one-column matrix
numeric_Y = as.matrix(breast.TCGA$data.train$protein[, 1])  ## your "age" data goes here
dim(numeric_Y)

# Set up a full design where every X block is connected to every other X block
design = matrix(1, ncol = length(X_block), nrow = length(X_block),
                dimnames = list(names(X_block), names(X_block)))
diag(design) = 0

# Number of components per data set
ncomp = 2

# Number of variables to select, per component and per data set (set arbitrarily here)
list.keepX = list(mrna = rep(20, 2), mirna = rep(10, 2))

TCGA.block.spls = block.spls(X = X_block, Y = numeric_Y, mode = "regression",
                             ncomp = ncomp, keepX = list.keepX, design = design)
TCGA.block.spls
```
By using such a method, you assume there is a continuous (linear) relationship between the predictors and the response (age). You have to check that this assumption is valid: for example, a person's height and weight increase with age on average, but only up to a certain age.
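To illustrate why the assumption matters, here is a toy sketch in base R using simulated, hypothetical data (not from any real study): height rises by 5 cm per year until age 18 and then plateaus. The squared correlation is essentially 1 over ages 1 to 18, but it drops sharply once adults are included, so a linear model such as PLS would describe the full age range much less well.

```r
# Toy, simulated data (hypothetical): height grows 5 cm/year until age 18, then plateaus
age_all <- 1:80
height_all <- 50 + 5 * pmin(age_all, 18)

# Squared Pearson correlation as a quick measure of how linear the relationship is
young <- age_all <= 18
r2_young <- cor(age_all[young], height_all[young])^2
r2_all   <- cor(age_all, height_all)^2

print(c(young_only = r2_young, all_ages = r2_all))
```

With real data, a similar quick check is to plot each selected variable against age, e.g. `plot(age, x)` followed by `lines(lowess(age, x))`, and look for plateaus or other non-linear trends.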
As for DIABLO, using age directly as the response in a discriminant analysis is not advisable, unless you derive relevant and distinct categories from it (baby, adult, elderly, etc.) and use those as the outcome.
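A minimal sketch of that binning step in base R, assuming a hypothetical `age` vector (in years) and illustrative cut-offs; the category boundaries should come from your own study design. `cut()` returns a factor, which is the form of Y that `block.splsda` expects.

```r
# Hypothetical ages for 8 samples (replace with your own age vector)
age <- c(0.5, 2, 15, 30, 45, 67, 72, 88)

# Illustrative, hypothetical cut-offs; right = FALSE gives intervals [0,3), [3,18), [18,65), [65,Inf)
age_group <- cut(age,
                 breaks = c(0, 3, 18, 65, Inf),
                 labels = c("baby", "child", "adult", "elderly"),
                 right = FALSE)
table(age_group)

# With mixOmics loaded, X_block, list.keepX and design defined as in the example above,
# and one age_group entry per sample, the factor can then be used as the DIABLO outcome:
# diablo_fit <- block.splsda(X = X_block, Y = age_group, ncomp = 2,
#                            keepX = list.keepX, design = design)
```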
Hope it helps.
I would like to add to Al's answer that choosing the optimal keepX values in the context of `block.spls` is not straightforward, so we have neither developed nor implemented a tuning function for it (whereas one is implemented for `block.splsda`, a.k.a. DIABLO).
However, if you are only interested in a first-pass exploratory analysis (looking at the plots, identifying the top variables), setting your own keepX values may be sufficient. Let us know if you wish to take the analysis further.