Working on TCGA data using mixOmics

Dear mixOmics team,

Many thanks for developing the mixOmics package, which helps us a lot in applying different data mining approaches to RNA-seq data. I am very interested in doing an analysis like the one you did with the TCGA breast cancer data and extending it to another cancer type.
In your workflow you mention that the data were downloaded from TCGA, but as far as I could tell you do not say how you divided them into data.train and data.test (I also checked section S2 of the DIABLO paper). I have downloaded the data. Do you have any scripts that could help me with the further analysis/normalization and with building the same kind of training data set for other cancer types?

Many many thanks in advance,
Kind Regards,

The protein data served as the splitting factor, because it had the fewest samples available. The training data comprised four data types (mRNA, miRNA, CpGs and proteins), whereas the test data had no proteomics data (see Table 1 in this paper). Since predictions are made separately for each dataset (block) and then combined by average or majority vote into a consensus prediction, DIABLO allows entire datasets to be missing from the test set.
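
To make the splitting idea concrete, here is a minimal, language-agnostic sketch in Python. It is not the script used for the manuscript and does not call mixOmics. The function names, the toy sample IDs and the class labels are hypothetical, and returning None on a tied vote is an assumption made only for this illustration. It assumes that samples with protein data go to the training set and that the remaining samples, which still have every other block, go to the test set.

```python
from collections import Counter


def split_by_block(samples_per_block, splitting_block):
    """Split sample IDs into train/test using one block as the splitting factor.

    samples_per_block maps a block name to the sample IDs measured in it.
    Train: samples present in every block, including the splitting block.
    Test: samples present in every other block but not in the splitting block.
    """
    other_blocks = [b for b in samples_per_block if b != splitting_block]
    in_all_others = set.intersection(
        *(set(samples_per_block[b]) for b in other_blocks)
    )
    with_split = set(samples_per_block[splitting_block])
    train = sorted(in_all_others & with_split)
    test = sorted(in_all_others - with_split)
    return train, test


def majority_vote(block_predictions):
    """Combine per-block class predictions for one sample by majority vote.

    Returns None when the top two classes are tied (an assumption of this sketch).
    """
    counts = Counter(block_predictions.values()).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Toy data: proteins is the block with the fewest samples.
samples = {
    "mRNA": ["s1", "s2", "s3", "s4"],
    "miRNA": ["s1", "s2", "s3", "s4"],
    "CpGs": ["s1", "s2", "s3", "s4", "s5"],
    "proteins": ["s1", "s3"],
}
train_ids, test_ids = split_by_block(samples, "proteins")

# A test sample has no protein block, so only three blocks vote.
consensus = majority_vote({"mRNA": "Basal", "miRNA": "Basal", "CpGs": "LumA"})
```

With the toy data, s1 and s3 end up in the training set and s2 and s4 in the test set; s5 is dropped because it is missing from the mRNA and miRNA blocks.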

You can find links to the data compilation and manuscript analysis repositories here.
