Diablo outputs for publication

gracava96 · June 3, 2025, 2:31pm

Hello, I am currently running a multilevel DIABLO analysis to identify possible biomarkers of inflammation, looking at the features that best separate the time points after infection. My design has five blocks: 11 plasma metabolites, 11 saliva metabolites, 38 plasma amino acids, 40 saliva amino acids and 19 blood cell counts. Each block has 40 observations: 8 subjects at each of 5 time points. When I run perf() with folds = 5 and nrepeat = 50, the optimal number of components is 5.
I am unsure how to produce the graphs for publication, such as the correlation circle plot, the circos plot and the loadings plot. Since the lowest BER was obtained with ncomp = 5, does this mean that for publication I should plot comp1 vs comp5 and focus only on the loading weights on comp5?
Or is it advisable to only plot the first two components?

Thank you in advance for your time.

evahamrud · June 13, 2025, 4:24am

Hi @gracava96,

Exactly which visualisations you generate for publication will depend very much on your personal preference and the message you are trying to convey. However, I would highlight that if perf() found that the optimal number of components is 5, this means that a model made of components 1, 2, 3, 4 and 5 is the optimal model (not just component 5).

In general, the components are ordered by importance, with component 1 containing the most information, which is why plotting components 1 and 2 is common. However, if one of the other components separates one of your groups of interest more clearly, you can also make sample plots with any of those components in any combination (1 and 5, 1 and 2, 2 and 5, etc).

I hope that helps.
Eva

gracava96 · June 24, 2025, 1:31pm

Hi Eva, thank you very much for your answer.
I still have a doubt about the following situation:

lflps.perf.diablo = perf(lflps.final.diablo.model, validation = 'Mfold',
                         folds = 5, nrepeat = 50,
                         dist = 'centroids.dist')

lflps.perf.diablo$MajorityVote.error.rate
$centroids.dist
             comp1  comp2  comp3  comp4  comp5
0           0.5050 0.0075 0.0000 0.0025 0.0025
2           0.5250 0.2800 0.2400 0.2425 0.2475
4           0.6125 0.5800 0.6150 0.5175 0.4500
6           0.6000 0.4125 0.3250 0.2650 0.2100
12          0.2750 0.0575 0.0175 0.0250 0.0300
Overall.ER  0.5035 0.2675 0.2395 0.2105 0.1880
Overall.BER 0.5035 0.2675 0.2395 0.2105 0.1880

lflps.perf.diablo$WeightedVote.error.rate
$centroids.dist
             comp1  comp2  comp3  comp4  comp5
0           0.4175 0.0075 0.0000 0.0025 0.0025
2           0.3075 0.1050 0.1300 0.1175 0.1475
4           0.4825 0.4075 0.3700 0.2750 0.2100
6           0.3750 0.1900 0.1475 0.1275 0.0750
12          0.2100 0.0475 0.0150 0.0200 0.0200
Overall.ER  0.3585 0.1515 0.1325 0.1085 0.0910
Overall.BER 0.3585 0.1515 0.1325 0.1085 0.0910
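
As a rough cross-check on the tables above, the Overall.BER rows can be compared in base R to see which number of components gives the lowest error and how much each added component contributes. This is only a sketch: the values are copied by hand from the output above, and the names ber, best_ncomp and gain are illustrative rather than part of mixOmics.

```r
# Overall.BER values copied from the perf() output above (centroids.dist).
ber <- rbind(
  MajorityVote = c(0.5035, 0.2675, 0.2395, 0.2105, 0.1880),
  WeightedVote = c(0.3585, 0.1515, 0.1325, 0.1085, 0.0910)
)
colnames(ber) <- paste0("comp", 1:5)

# Number of components with the lowest BER for each voting scheme.
best_ncomp <- apply(ber, 1, which.min)

# Reduction in BER obtained by adding each successive component.
gain <- -t(apply(ber, 1, diff))
colnames(gain) <- paste0("comp", 1:4, "->", 2:5)

print(best_ncomp)
print(round(gain, 4))
```

On these values both voting schemes reach their minimum at 5 components, while the largest single drop in BER comes from adding component 2.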

Given that a model with only components 1 and 2 has a higher overall BER, would it be wrong or unreliable to publish plots of components 1 and 2 on the grounds that they contain the most information?

I am trying to understand the best approach, since including every combination of the 5 components would take too much space.

Thank you again for your help!
