Some doubts about the Case Study of DIABLO with Breast TCGA Dataset

ada · March 1, 2024, 11:00am

This case study achieves particularly good classification performance: the classification error rate is only 0.02539683. I suspect this may be related to high correlations among the omics data sets.
The correlations between the first components of each pair of data sets in all three PLS models are very strong: 0.88, 0.83 and 0.93.
In the omics data we usually work with, correlations between data sets do not reach such a high level. Was any method used during data processing to select variables with strong correlations? The article does not describe this aspect in much detail. I would be very grateful if someone could help me.

kimanh.lecao · March 7, 2024, 10:13pm

hi @ada

You might be interpreting the results incorrectly. A low classification error rate is good.

The correlations are between the PLS components, which summarise the information from both data sets. The criterion in block.splsda is to maximise this correlation, so it makes sense that it is high. It is not the correlation between pairs of variables. Variables were filtered to keep the most highly variable ones, mostly so that we could store the data in Bioconductor, not to bias the results. You can read more about PLS methods etc. in our resources.
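To see why the correlation between components can be high even when individual pairs of variables are only weakly correlated, here is a minimal base-R sketch on simulated data. It is an illustration under assumptions, not block.splsda: it computes only the first component of a plain PLS-SVD (leading singular vectors of X'Y), with no sparsity, design matrix or deflation, and the shared latent signal `z` with its loading of 0.3 is invented for the example.

```r
# Illustration only: first PLS component from the SVD of X'Y.
# NOT mixOmics::block.splsda (no sparsity, no design, no deflation).
set.seed(42)
n <- 100; p <- 100; q <- 100

# Hypothetical shared latent signal z, spread thinly over many variables
z <- rnorm(n)
X <- scale(outer(z, rep(0.3, p)) + matrix(rnorm(n * p), n, p))
Y <- scale(outer(z, rep(0.3, q)) + matrix(rnorm(n * q), n, q))

# Correlations between individual X and Y variables (p x q matrix)
pairwise <- cor(X, Y)

# Leading singular vectors of X'Y are the loadings that maximise cov(X a, Y b)
s <- svd(crossprod(X, Y), nu = 1, nv = 1)
tX <- X %*% s$u   # first component (scores) for the X block
tY <- Y %*% s$v   # first component (scores) for the Y block

comp_cor <- cor(tX, tY)[1, 1]
cat("median |cor| between single variables:", round(median(abs(pairwise)), 2), "\n")
cat("max |cor| between single variables:   ", round(max(abs(pairwise)), 2), "\n")
cat("cor between first components:         ", round(comp_cor, 2), "\n")
```

Because each component pools a weak shared signal over many variables, and the loadings are chosen to maximise the association, the component correlation typically ends up well above that of any single pair of variables.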

Kim-Anh

ada · March 11, 2024, 8:01am

Thank you for your reply, but I still have some questions. In the breast cancer data included in the mixOmics package, the mRNA data have 220 samples and 200 variables. However, when I use the preprocessing method described in "DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays", I obtain many more than 200 variables. How was the breast cancer data processed in mixOmics? Thanks.

kimanh.lecao · March 14, 2024, 12:22am

hi @ada

From memory, for this data example we randomly selected 220 of the most variable mRNAs because of storage issues in the package.

For your own data, you should simply pre-filter the normalised data, keep the most highly variable features (between 500 and 5,000 for each omics data set), and then run the analysis.
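A minimal base-R sketch of this variance-based pre-filter, assuming samples in rows and features in columns (the layout mixOmics uses) and data that are already normalised. The helper name `keep_most_variable` and the simulated matrix are hypothetical; you would apply it separately to each omics block, with a cut-off in the 500 to 5,000 range mentioned above.

```r
# Hypothetical helper: keep the n_keep features (columns) with the largest variance.
# Assumes samples in rows, features in columns, and already-normalised values.
keep_most_variable <- function(X, n_keep) {
  v <- apply(X, 2, var)                       # per-feature variance
  n_keep <- min(n_keep, ncol(X))              # cannot keep more than exist
  X[, order(v, decreasing = TRUE)[seq_len(n_keep)], drop = FALSE]
}

# Toy example on simulated data (not the TCGA data): 50 samples x 2000 genes
set.seed(1)
mrna <- matrix(rnorm(50 * 2000, sd = rep(runif(2000, 0.2, 3), each = 50)),
               nrow = 50,
               dimnames = list(paste0("sample", 1:50), paste0("gene", 1:2000)))
mrna_filtered <- keep_most_variable(mrna, 500)
dim(mrna_filtered)  # 50 samples x 500 genes
```

Because this filter never looks at the class labels, it does not leak outcome information into the later classification.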

Kim-Anh

ada · March 14, 2024, 1:09am

Thank you. But why does the mRNA data use normalized data, while the miRNA data use raw data?
mRNA: illuminahiseq_rnaseqv2-RSEM_genes_normalized;
miRNA: illuminahiseq_mirnaseqmiR_gene_expression and illuminaga_mirnaseq-miR_gene_expression

kimanh.lecao · March 21, 2024, 9:55pm

Hi @ada

I am not sure what you are referring to, but in the example itself, all data are normalised, and this is how they should be before being input into mixOmics analyses.

```r
library(mixOmics)
data("breast.TCGA")
# First 5 samples x first 5 features of the normalised training data
breast.TCGA$data.train$mirna[1:5, 1:5]
breast.TCGA$data.train$mrna[1:5, 1:5]
```
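If you start from raw miRNA read counts, such as the Firebrowse miR_gene_expression files, they need to be normalised before being input into mixOmics. One common option, shown here purely as an assumption and not as the documented preprocessing of breast.TCGA, is a log2 counts-per-million (log-CPM) transform of the kind limma's voom computes, log2((count + 0.5) / (library size + 1) * 1e6). The function name `log_cpm` and the simulated counts are hypothetical.

```r
# Assumption: voom-style log-CPM, log2((count + 0.5) / (lib_size + 1) * 1e6).
# Illustration only, not the documented preprocessing of breast.TCGA.
log_cpm <- function(counts) {
  # counts: features in rows, samples in columns (usual layout of count files)
  lib_size <- colSums(counts)                         # total reads per sample
  log2(sweep(counts + 0.5, 2, lib_size + 1, "/") * 1e6)
}

set.seed(7)
# Simulated raw miRNA counts: 300 miRNAs (rows) x 20 samples (columns)
counts <- matrix(rpois(300 * 20, lambda = 50), nrow = 300)
mirna <- t(log_cpm(counts))  # transpose: mixOmics expects samples in rows
dim(mirna)  # 20 samples x 300 miRNAs
```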

Kim-Anh

ada · March 22, 2024, 12:34am

Sorry, maybe I didn't describe it clearly. The raw data come from http://firebrowse.org/. Why does the mRNA data use illuminahiseq_rnaseqv2-RSEM_genes_normalized (which is normalized data), while the miRNA data use illuminahiseq_mirnaseqmiR_gene_expression and illuminaga_mirnaseq-miR_gene_expression (which are raw data)?

I would also like to know whether the data were screened for differentially expressed genes using limma voom, or only normalized using limma voom.

Thank you very much.

kimanh.lecao · April 18, 2024, 11:00pm

hi @ada

No, the data were not pre-filtered based on differentially expressed (DE) genes. We do not recommend doing this, as it introduces overfitting into the analysis.
All data were normalised.
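The overfitting warned about here can be reproduced on pure noise with a small base-R simulation (the classic selection-bias effect). This sketch is an illustration under stated assumptions, not anything from mixOmics: Welch-type t statistics stand in for a DE analysis such as limma, a nearest-centroid classifier stands in for DIABLO, and all function names are hypothetical. Selecting the "DE" features on all samples before cross-validation gives a misleadingly low error, whereas repeating the selection inside each fold gives an error around chance level, which is the honest answer for data with no signal.

```r
# Selection bias on pure noise: there is NO real difference between classes.
set.seed(123)
n <- 40; p <- 2000; k <- 20
X <- matrix(rnorm(n * p), n, p)
y <- factor(rep(c("A", "B"), each = n / 2))

# Hypothetical stand-in for DE filtering: top k features by |Welch t|
top_k <- function(X, y, k) {
  a <- X[y == levels(y)[1], , drop = FALSE]
  b <- X[y == levels(y)[2], , drop = FALSE]
  ma <- colMeans(a); mb <- colMeans(b)
  va <- colSums(sweep(a, 2, ma)^2) / (nrow(a) - 1)
  vb <- colSums(sweep(b, 2, mb)^2) / (nrow(b) - 1)
  tstat <- (ma - mb) / sqrt(va / nrow(a) + vb / nrow(b))
  order(abs(tstat), decreasing = TRUE)[seq_len(k)]
}

# Hypothetical stand-in classifier: closest class centroid (Euclidean)
nearest_centroid <- function(X_train, y_train, x_new) {
  d <- sapply(levels(y_train), function(l)
    sum((x_new - colMeans(X_train[y_train == l, , drop = FALSE]))^2))
  names(which.min(d))
}

# Leave-one-out error; select_inside = FALSE reproduces the leak
loo_error <- function(X, y, k, select_inside) {
  leaky_keep <- top_k(X, y, k)  # selection that has seen every sample
  wrong <- 0
  for (i in seq_len(nrow(X))) {
    keep <- if (select_inside) top_k(X[-i, ], y[-i], k) else leaky_keep
    pred <- nearest_centroid(X[-i, keep, drop = FALSE], y[-i], X[i, keep])
    wrong <- wrong + (pred != as.character(y[i]))
  }
  wrong / nrow(X)
}

err_leaky  <- loo_error(X, y, k, select_inside = FALSE)
err_proper <- loo_error(X, y, k, select_inside = TRUE)
cat("LOO error, DE filter on all samples:  ", err_leaky, "\n")
cat("LOO error, DE filter inside each fold:", err_proper, "\n")
```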

Kim-Anh
