Hello, how can I extract the prediction values for each sample in my training dataset (with internal cross-validation) using DIABLO analysis? I have the overall performance error, but would like to know which samples are misclassified and which are borderline. Thank you so much.
Hi @Andreas,
You can use the perf function to evaluate the model performance and inspect the predictions for each class. See the example below:
library(mixOmics)
data("breast.TCGA")
# this is the X data as a list of mRNA, miRNA and proteins
data = list(mrna = breast.TCGA$data.train$mrna, mirna = breast.TCGA$data.train$mirna,
protein = breast.TCGA$data.train$protein)
# set up a full design where every block is connected
design = matrix(1, ncol = length(data), nrow = length(data),
dimnames = list(names(data), names(data)))
diag(design) = 0
design
#> mrna mirna protein
#> mrna 0 1 1
#> mirna 1 0 1
#> protein 1 1 0
# set the number of components
ncomp = 2
# set number of variables to select, per component and per data set (this is set arbitrarily)
list.keepX = list(mrna = rep(20, 2), mirna = rep(10,2), protein = rep(10, 2))
TCGA.block.splsda = block.splsda(X = data, Y = breast.TCGA$data.train$subtype,
ncomp = ncomp, keepX = list.keepX, design = design)
#> Design matrix has changed to include Y; each block will be
#> linked to Y.
# evaluate the performance using repeated CV and the 'max.dist' distance
perf.res <- perf(TCGA.block.splsda, folds = 5, nrepeat = 3, dist = 'max.dist')
## view the predicted classes using the mrna block for the first repeat
perf.res$class$nrep1$mrna
#> $max.dist
#> comp1 comp2
#> A0JL "Basal" "Basal"
#> A12V "Basal" "Basal"
#> A0AT "Basal" "Basal"
#> A0B3 "Basal" "Basal"
#> A04D "Basal" "Basal"
#> A1AY "Basal" "Basal"
#> A0AR "Basal" "Basal"
#> A04T "Basal" "Basal"
#> A0RT "Basal" "Basal"
#> A0TX "LumA" "Her2"
#> A09G "LumA" "Her2"
#> A12L "Basal" "Her2"
#> A152 "LumA" "Her2"
#> A12Q "LumA" "Her2"
#> A0RH "LumA" "Her2"
#> A0CT "LumA" "LumA"
#> A0BM "LumA" "LumA"
#> A0CD "LumA" "LumA"
#> A0I8 "LumA" "LumA"
#> A12Y "LumA" "LumA"
#> A03L "LumA" "Her2"
#> A1AL "LumA" "LumA"
#> A1AV "LumA" "LumA"
#> A0EX "LumA" "LumA"
#> A0FS "LumA" "LumA"
#> A0RO "LumA" "LumA"
#> A0JF "LumA" "LumA"
#> A0RV "LumA" "LumA"
#> A0XN "LumA" "LumA"
#> A15R "LumA" "LumA"
#> A1AI "Basal" "Basal"
#> A13E "Basal" "Basal"
#> A04U "Basal" "Basal"
#> A0D0 "Basal" "Basal"
#> A0SX "Basal" "Basal"
#> A131 "Basal" "Her2"
#> A0AL "Basal" "Basal"
#> A0FL "Basal" "Basal"
#> A04P "Basal" "Basal"
#> A0EE "Basal" "Her2"
#> A14P "Basal" "Her2"
#> A12T "LumA" "LumA"
#> A07I "LumA" "LumA"
#> A08X "Basal" "Her2"
#> A135 "Basal" "Her2"
#> A140 "LumA" "LumA"
#> A0CS "LumA" "LumA"
#> A18N "LumA" "LumA"
#> A0BQ "LumA" "LumA"
#> A086 "LumA" "LumA"
#> A08O "LumA" "LumA"
#> A0EW "LumA" "LumA"
#> A0SH "LumA" "LumA"
#> A1AU "LumA" "LumA"
#> A0DV "LumA" "LumA"
#> A0SU "LumA" "LumA"
#> A0T6 "LumA" "LumA"
#> A0RG "LumA" "LumA"
#> A0B0 "LumA" "LumA"
#> A0EI "LumA" "LumA"
#> A143 "Basal" "Her2"
#> A07R "Basal" "Basal"
#> A128 "Basal" "Basal"
#> A0E0 "Basal" "Her2"
#> A0I2 "Basal" "Basal"
#> A0CE "Basal" "Basal"
#> A0SK "Basal" "Basal"
#> A147 "Basal" "Basal"
#> A0DA "Basal" "Basal"
#> A18R "Basal" "Basal"
#> A09X "LumA" "Her2"
#> A1B0 "Basal" "Her2"
#> A12P "Basal" "Her2"
#> A0EQ "Basal" "Basal"
#> A137 "LumA" "Her2"
#> A0TZ "LumA" "LumA"
#> A0DK "LumA" "LumA"
#> A08A "LumA" "LumA"
#> A0T7 "LumA" "LumA"
#> A0J5 "LumA" "LumA"
#> A0YL "LumA" "LumA"
#> A08T "LumA" "LumA"
#> A12B "LumA" "LumA"
#> A1BD "LumA" "LumA"
#> A1AP "LumA" "LumA"
#> A0XW "LumA" "LumA"
#> A1AK "LumA" "LumA"
#> A1B1 "LumA" "LumA"
#> A18S "LumA" "LumA"
#> A0IO "LumA" "LumA"
#> A0RX "Basal" "Basal"
#> A0CM "Basal" "Basal"
#> A0D2 "Basal" "Basal"
#> A0G0 "Basal" "Basal"
#> A0AV "Basal" "Basal"
#> A0U4 "Basal" "Basal"
#> A0B9 "Basal" "Basal"
#> A124 "Basal" "Basal"
#> A14X "Basal" "Basal"
#> A18P "LumA" "Her2"
#> A0D1 "Basal" "Her2"
#> A094 "LumA" "Her2"
#> A0I9 "LumA" "Her2"
#> A12D "LumA" "Her2"
#> A1AT "LumA" "LumA"
#> A0EU "LumA" "LumA"
#> A133 "LumA" "LumA"
#> A07Z "LumA" "LumA"
#> A0W5 "LumA" "LumA"
#> A0DP "LumA" "LumA"
#> A15L "LumA" "LumA"
#> A15E "LumA" "LumA"
#> A06P "LumA" "LumA"
#> A0DS "LumA" "LumA"
#> A04A "LumA" "LumA"
#> A0ES "LumA" "LumA"
#> A0AS "LumA" "Her2"
#> A0BP "LumA" "LumA"
#> A0EA "LumA" "LumA"
#> A146 "LumA" "LumA"
#> A0XU "Basal" "Basal"
#> A0YM "Basal" "Basal"
#> A0T0 "Basal" "Basal"
#> A0WX "Basal" "Basal"
#> A0FJ "Basal" "Basal"
#> A1B6 "Basal" "Basal"
#> A150 "Basal" "Basal"
#> A0T2 "Basal" "Basal"
#> A1AZ "Basal" "Basal"
#> A13Z "Basal" "Her2"
#> A0IK "LumA" "Her2"
#> A0T1 "LumA" "Her2"
#> A04W "LumA" "Her2"
#> A0A7 "LumA" "LumA"
#> A08L "Basal" "Her2"
#> A0XS "LumA" "LumA"
#> A0AZ "LumA" "LumA"
#> A0X0 "LumA" "LumA"
#> A09A "LumA" "LumA"
#> A12X "LumA" "LumA"
#> A0E1 "LumA" "LumA"
#> A18F "LumA" "LumA"
#> A12H "LumA" "LumA"
#> A0RM "LumA" "LumA"
#> A0FD "LumA" "LumA"
#> A0BS "LumA" "LumA"
#> A12A "LumA" "LumA"
#> A0H7 "LumA" "LumA"
#> A08Z "LumA" "LumA"
#> A0W4 "LumA" "LumA"
However, it is not currently possible to directly assess whether a given predicted class was a ‘borderline’ case using either the predicted classes or the predicted values. That said, provided you have enough repeats, you can create confusion matrices from the predicted classes to investigate this.
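As a rough sketch of that idea, the helper below counts how often each sample is assigned to each class across the repeats and reports the proportion of repeats in which it was classified correctly. Note that tally_cv_predictions is a hypothetical helper, not a mixOmics function. It assumes the class output is nested as repeat > block > distance, as in the printout above, and that the row names of each prediction matrix are the sample IDs. Samples are matched by name because the row order may not follow the original data. The toy list only stands in for perf.res$class so that the snippet runs on its own.

```r
# Hypothetical helper (not part of mixOmics): tally the predicted classes of
# each sample across CV repeats for one block, distance and component.
tally_cv_predictions <- function(class.list, block, dist, comp, truth) {
  ids <- names(truth)   # truth must be a factor named by sample ID
  lev <- levels(truth)

  # one column of predictions per repeat, rows aligned by sample name
  pred <- sapply(class.list, function(rep) {
    as.character(rep[[block]][[dist]][ids, comp])
  })
  pred <- matrix(pred, nrow = length(ids),
                 dimnames = list(ids, names(class.list)))

  # samples x classes matrix of vote counts across repeats
  votes <- unclass(table(
    sample    = factor(rep(ids, times = ncol(pred)), levels = ids),
    predicted = factor(as.vector(pred), levels = lev)
  ))

  # proportion of repeats in which the sample received its true class
  prop.correct <- votes[cbind(ids, as.character(truth))] / ncol(pred)

  data.frame(truth = truth, votes, prop.correct = prop.correct,
             row.names = ids, check.names = FALSE)
}

# Toy stand-in for perf.res$class: 3 samples, 2 repeats, 1 block
truth <- factor(c(s1 = "A", s2 = "B", s3 = "B"))
toy <- list(
  nrep1 = list(mrna = list(max.dist = cbind(
    comp1 = c(s1 = "A", s2 = "B", s3 = "A"),
    comp2 = c(s1 = "A", s2 = "B", s3 = "B")))),
  nrep2 = list(mrna = list(max.dist = cbind(
    comp1 = c(s3 = "A", s1 = "A", s2 = "A"),
    comp2 = c(s3 = "A", s1 = "A", s2 = "B"))))
)
res <- tally_cv_predictions(toy, "mrna", "max.dist", "comp2", truth)
print(res)

# With the objects from the example above, the call would look like:
# truth <- setNames(breast.TCGA$data.train$subtype,
#                   rownames(breast.TCGA$data.train$mrna))
# tally_cv_predictions(perf.res$class, "mrna", "max.dist", "comp2", truth)
```

A prop.correct of 0 points to a sample that is misclassified in every repeat, while intermediate values flag samples whose prediction depends on the fold split. This is a measure of stability across repeats rather than a class probability, and with only 3 repeats it is quite coarse, so a larger nrepeat would be needed for it to be informative.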
Hope it helps.
Al
Thank you so much, Al. It would be great to get some “probability” of belonging to a group per sample. Looking forward to seeing how this super-valuable tool develops.
Thanks, @Andreas, for your valuable suggestion. We’ll certainly keep that in mind for future developments.