Hello, I am currently using a multilevel DIABLO analysis to extract possible biomarkers of inflammation, looking at the most important features separating the time points after infection. My design has five blocks: 11 plasma metabolites, 11 saliva metabolites, 38 plasma amino acids, 40 saliva amino acids and 19 blood cell counts. Each of the 5 blocks has 40 observations spread over 5 time points, so 8 subjects per group (time point). When running the perf() function with 5 folds and nrepeat = 50, the optimal number of components is 5.
I am unsure how to proceed with the graphs for publication, such as the correlation circle plot, the circos plot and the loadings plot. Since the ncomp associated with the lowest BER value was 5, does this imply that for publication I should plot comp1 vs comp5 and focus only on the loading weights on comp5?
Or is it advisable to only plot the first two components?
Thank you in advance for your time.
Hi @gracava96,
Exactly which visualisations you generate for publication will depend very much on your personal preference and the message you are trying to convey. However, I would highlight that if perf() found that the optimal ncomp is 5, this means that a model made of components 1, 2, 3, 4 and 5 is the optimal model (not just component 5).
In general, the components are ordered by importance, with component 1 containing the most information, which is why plotting components 1 and 2 is common. However, if one of the other components separates one of your groups of interest more clearly, you can also make sample plots with any of those components in any combination (1 and 5, 1 and 2, 2 and 5, etc.).
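As a rough sketch of how the component choice is passed to the plotting functions, here is a small example. It is only an illustration: it uses the breast.TCGA data shipped with mixOmics as a stand-in for your blocks, it is not a multilevel model, and the ncomp, keepX and design values are arbitrary assumptions rather than tuned settings.

```r
library(mixOmics)

# Stand-in data shipped with mixOmics (NOT the data from this thread)
data(breast.TCGA)
X <- list(mRNA    = breast.TCGA$data.train$mrna,
          miRNA   = breast.TCGA$data.train$mirna,
          protein = breast.TCGA$data.train$protein)
Y <- breast.TCGA$data.train$subtype

# Arbitrary illustrative settings: 3 components, 10 features kept
# per block and per component, weak links between blocks
ncomp  <- 3
keepX  <- list(mRNA    = rep(10, ncomp),
               miRNA   = rep(10, ncomp),
               protein = rep(10, ncomp))
design <- matrix(0.1, nrow = 3, ncol = 3,
                 dimnames = list(names(X), names(X)))
diag(design) <- 0

model <- block.splsda(X, Y, ncomp = ncomp, keepX = keepX, design = design)

# Sample plots: any pair of components can be requested via 'comp'
plotIndiv(model, comp = c(1, 2), ind.names = FALSE, legend = TRUE)
plotIndiv(model, comp = c(1, 3), ind.names = FALSE, legend = TRUE)

# Loadings of the selected features, one component at a time
plotLoadings(model, comp = 1, contrib = "max", method = "median")
plotLoadings(model, comp = 3, contrib = "max", method = "median")

# Correlation circle plot on a chosen pair of components
plotVar(model, comp = c(1, 3), var.names = FALSE, legend = TRUE)
```

With your own model you would pass your fitted object instead of model and pick the pairs of components that best show the time points you want to highlight.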
I hope that helps.
Eva
Hi Eva, thank you very much for your answer.
I still have a doubt about the following situation:
lflps.perf.diablo = perf(lflps.final.diablo.model, validation = 'Mfold',
                         folds = 5, nrepeat = 50,
                         dist = 'centroids.dist')
lflps.perf.diablo$MajorityVote.error.rate
$centroids.dist
             comp1  comp2  comp3  comp4  comp5
0           0.5050 0.0075 0.0000 0.0025 0.0025
2           0.5250 0.2800 0.2400 0.2425 0.2475
4           0.6125 0.5800 0.6150 0.5175 0.4500
6           0.6000 0.4125 0.3250 0.2650 0.2100
12          0.2750 0.0575 0.0175 0.0250 0.0300
Overall.ER  0.5035 0.2675 0.2395 0.2105 0.1880
Overall.BER 0.5035 0.2675 0.2395 0.2105 0.1880
lflps.perf.diablo$WeightedVote.error.rate
$centroids.dist
             comp1  comp2  comp3  comp4  comp5
0           0.4175 0.0075 0.0000 0.0025 0.0025
2           0.3075 0.1050 0.1300 0.1175 0.1475
4           0.4825 0.4075 0.3700 0.2750 0.2100
6           0.3750 0.1900 0.1475 0.1275 0.0750
12          0.2100 0.0475 0.0150 0.0200 0.0200
Overall.ER  0.3585 0.1515 0.1325 0.1085 0.0910
Overall.BER 0.3585 0.1515 0.1325 0.1085 0.0910
Given that the overall BER is higher with only comp 1 and 2, would it be wrong or unreliable to publish plots using comp 1 and 2 on the grounds that they contain the most information?
I am trying to understand what is best to do, since including all the combinations of the 5 components would take too much space.
Thank you again for your help!