DIABLO inputs and optimal number of components

hellofuture · December 8, 2021, 3:53pm

Hi DIABLO team,
I have two questions about the input data and the optimal number of components. Could you help me answer them? Thank you so much!

What is the optimal number of variables for each omics dataset before integrating them with DIABLO? For example, I have host mRNA (20,000 transcripts), host miRNA (1,200 miRNAs) and taxonomy data (350 species). Do I need to reduce the 20,000 mRNAs to around 1,000 before running DIABLO?
How should I interpret the performance plot when tuning the number of components? My interpretation is that, generally, the lower the line, the better (i.e., a smaller classification error rate), and that the error rate should ideally decrease from 1 to 2 components. Also, is centroids.dist generally better than max.dist? Are these interpretations correct?

Thank you and best,
ZZ

christoa · December 9, 2021, 3:11pm

Hi @hellofuture,

It is definitely recommended to filter the transcripts, but there is no precise answer on how many variables to keep. You could try keeping the 10-20% most variable transcripts, or filtering out transcripts that are not present in at least 70% of the samples within the groups of interest.
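The two filtering strategies above can be sketched as follows. This is a minimal, language-agnostic illustration in Python (DIABLO itself runs in R via mixOmics); the function names are hypothetical. It assumes the data are a samples × features matrix given as a list of rows, that a feature counts as "present" when its value is above 0, and that "within groups" means a feature is kept if it reaches the 70% threshold in at least one group. For count data you would typically compute variances on log-transformed or normalized values.

```python
from statistics import pvariance


def top_variable_features(matrix, feature_names, keep_fraction=0.15):
    """Return the names of the most variable features (hypothetical helper).

    matrix: list of samples, each a list of feature values (samples x features).
    keep_fraction: fraction of features to keep, e.g. 0.10-0.20.
    """
    n_features = len(feature_names)
    # Population variance of each feature (column) across all samples
    variances = [pvariance([row[j] for row in matrix]) for j in range(n_features)]
    n_keep = max(1, round(keep_fraction * n_features))
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    # Return the kept features in their original column order
    return [feature_names[j] for j in sorted(ranked[:n_keep])]


def present_in_groups(matrix, feature_names, groups, min_fraction=0.7, threshold=0.0):
    """Keep features detected (value > threshold) in at least min_fraction
    of the samples of at least one group (assumed interpretation)."""
    kept = []
    group_labels = sorted(set(groups))
    for j, name in enumerate(feature_names):
        for g in group_labels:
            values = [row[j] for row, lab in zip(matrix, groups) if lab == g]
            detected = sum(v > threshold for v in values)
            if detected / len(values) >= min_fraction:
                kept.append(name)
                break  # one qualifying group is enough
    return kept
```

With 20,000 transcripts and `keep_fraction=0.15`, the first helper would keep 3,000 transcripts. The two filters can also be chained: first drop rarely detected transcripts, then keep the most variable ones among the rest.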

Yes, this is correct. If the error rate increases when more components are added, it is a sign that the additional components are mostly adding noise.
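The reading of the performance plot described above (lower is better, and stop adding components once the error no longer decreases) can be written as a simple heuristic. This is a hypothetical helper in Python, not the selection rule implemented in mixOmics; it assumes you have already extracted one cross-validated error rate per number of components for a given distance.

```python
def choose_ncomp(error_rates, tolerance=0.0):
    """Pick a number of components from cross-validated error rates.

    error_rates[k] is the error rate obtained with k + 1 components.
    Returns the smallest number of components after which adding one more
    component no longer lowers the error by more than `tolerance`.
    """
    for k in range(len(error_rates) - 1):
        improvement = error_rates[k] - error_rates[k + 1]
        if improvement <= tolerance:
            # The next component does not help (or makes things worse)
            return k + 1
    # The error kept decreasing up to the largest number tested
    return len(error_rates)
```

For example, error rates of 0.30, 0.18, 0.20 and 0.22 for 1 to 4 components give 2 components: the drop from 1 to 2 is worthwhile, while the rise afterwards suggests the extra components mostly add noise. A small positive `tolerance` avoids keeping components that improve the error only marginally.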

Yes, centroids.dist and especially mahalanobis.dist seem to be more accurate than max.dist for N-integration. You can read more about it in the supplementary material (Section 1.3) of this paper.
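To make the difference between the three prediction distances more concrete, here is a simplified Python sketch of the idea behind each rule for a single block with two components. It is not the mixOmics implementation (which also combines the predictions across blocks); all function names are hypothetical, and the covariance matrix used for the Mahalanobis distance is assumed to be supplied by the caller.

```python
from statistics import mean


def centroids(scores, labels):
    """Mean component score per class; scores is a list of (t1, t2) tuples."""
    out = {}
    for c in sorted(set(labels)):
        pts = [s for s, lab in zip(scores, labels) if lab == c]
        out[c] = (mean(p[0] for p in pts), mean(p[1] for p in pts))
    return out


def max_dist_predict(dummy_scores):
    """max.dist idea: pick the class with the largest predicted indicator value."""
    return max(dummy_scores, key=dummy_scores.get)


def centroid_dist_predict(new_score, cents):
    """centroids.dist idea: nearest class centroid in Euclidean distance."""
    return min(cents, key=lambda c: (new_score[0] - cents[c][0]) ** 2
               + (new_score[1] - cents[c][1]) ** 2)


def mahalanobis_predict(new_score, cents, cov):
    """mahalanobis.dist idea: nearest centroid once the covariance of the
    components is taken into account. cov is a symmetric 2x2 matrix."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    inv = ((d / det, -b / det), (-b / det, a / det))  # inverse of a 2x2 matrix

    def dist2(c):
        dx = new_score[0] - cents[c][0]
        dy = new_score[1] - cents[c][1]
        return dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)

    return min(cents, key=dist2)
```

With class centroids at (0, 0) and (2, 2), a new sample at (2.2, 0) is closer to the second centroid in Euclidean distance. If the first component has a much larger variance than the second (covariance `((9, 0), (0, 1))`), the Mahalanobis distance down-weights the difference along the first component and assigns the sample to the first class instead. This is why the choice of distance can change the error rates shown in the performance plot.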

Best
Christopher

hellofuture · December 9, 2021, 4:16pm

Thank you so much, Christopher!

hellofuture · December 10, 2021, 3:46pm

Hi Christopher, I have generated a performance plot, but I cannot upload it to this website to show you. I would like to know whether I should choose the centroid or the Mahalanobis distance based on this plot. Do you have an email address where I can send you the plot? Thank you!

Zhaohzong

christoa · December 10, 2021, 9:48pm

Hi @hellofuture, yes of course. My email is cabo@hst.aau.dk

-Christopher

Related topics

Topic	Category	Replies	Views	Activity
Choice of components for DIABLO	Analysis	5	230	May 16, 2024
Generic questions about DIABLO: perf, keepX and no variable selection	Support	5	1506	December 11, 2022
DIABLO: Handling high dimensionality and tuning keepX	Analysis	10	1183	December 11, 2022
Analytical issues using DIABLO	Analysis	2	785	April 13, 2022
DIABLO data transformation and tuning	Analysis	1	539	February 28, 2022