I am using DIABLO to integrate transcriptomics and proteomics in order to identify a signature made up of ~2 transcripts and ~2 proteins.
I have identified the signature using block.splsda, and now I would like to use leave-one-out cross-validation to estimate the AUC of the combined signature (genes and proteins).
I have used the perf function, but it does not return the AUCs. This is the code I ran:
perf.null <- perf(diablo.res.null, validation = 'loo', auc = TRUE,
                  nrepeat = 2, dist = 'mahalanobis.dist')
When I look at perf.null$auc, it is empty.
Is there an alternative way to run cross-validation and then obtain the combined cross-validated AUC?
Hi Heather,
Once you have your parameters (keepX etc.), run a full block.splsda on the whole data set and then call auroc(object).
However, auroc() does not perform cross-validation, so we will follow up on the perf() code in the meantime.
Note: if you do leave-one-out cross-validation, you do not need to repeat the CV; a single run covers all possibilities!
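To illustrate why repeating leave-one-out CV adds nothing, here is a minimal base R sketch (the sample size n = 6 is made up for illustration). Unlike M-fold CV, there is no random partition: each sample is held out exactly once, so a second repeat would produce the same folds.

```r
# Hypothetical sample size, for illustration only
n <- 6

# Leave-one-out: fold i tests on sample i and trains on all the others
loo_folds <- lapply(seq_len(n), function(i) {
  list(test = i, train = setdiff(seq_len(n), i))
})

# Collect the held-out sample of every fold
test_idx <- vapply(loo_folds, function(f) f$test, integer(1))

length(loo_folds)  # one fold per sample
test_idx           # each sample is held out exactly once
```

Since nothing here involves random sampling, nrepeat can be left at 1 when validation = 'loo'.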
You can now install the latest version in which the perf function calculates the combined AUC (averaged across all blocks) for each component. Also, feel free to use the cpus argument for faster computation.
When I run the perf() function, it apparently uses the sgccda class instead of block.splsda. I am not sure whether that is the reason why I cannot run the auroc() function on the result. So far I have not been able to figure out why it does not use block.splsda or how to fix it.
The auroc function can be applied to the diablo object directly. It cannot be applied to the perf object. You essentially use the perf function to choose the number of components and then evaluate the final diablo model using auroc. That is:
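A minimal sketch of that workflow, assuming the breast.TCGA example data shipped with mixOmics stands in for your own blocks. The block names, keepX values, design weight of 0.1 and ncomp = 2 are illustrative choices, not recommendations, and the exact content of the perf output's auc element may depend on your mixOmics version.

```r
library(mixOmics)

# Example data shipped with mixOmics, used here as a stand-in for your own blocks
data(breast.TCGA)
X <- list(mRNA = breast.TCGA$data.train$mrna,
          protein = breast.TCGA$data.train$protein)
Y <- breast.TCGA$data.train$subtype

# Illustrative design matrix: weak link between blocks, zero on the diagonal
design <- matrix(0.1, nrow = length(X), ncol = length(X),
                 dimnames = list(names(X), names(X)))
diag(design) <- 0

# Illustrative keepX: 5 variables per block on each of 2 components
keepX <- list(mRNA = c(5, 5), protein = c(5, 5))
diablo.res <- block.splsda(X, Y, ncomp = 2, keepX = keepX, design = design)

# Step 1: cross-validate with perf() to help choose the number of components
perf.diablo <- perf(diablo.res, validation = "loo", auc = TRUE,
                    dist = "mahalanobis.dist")
perf.diablo$WeightedVote.error.rate
perf.diablo$auc

# Step 2: fit the final model (ncomp = 2 assumed here) and call auroc() on it,
# not on the perf object
diablo.final <- block.splsda(X, Y, ncomp = 2, keepX = keepX, design = design)
auroc(diablo.final, roc.block = "mRNA", roc.comp = 2)
```

Note that auroc() on the final model is computed on the data used to fit it, so it is not a cross-validated estimate.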
Thanks for your reply!
OK, then I misunderstood the line you wrote above: "You can now install the latest version in which the perf function calculates the combined AUC (averaged across all blocks) for each component."
How can I get the combined AUC for each component? That is the point where I understood I had to run the auroc after the perf function. I unfortunately cannot find the combined AUC of ‘diablo.final’ when using the auroc function.
Thanks for the tweak to compute the combined AUC per component. But the auroc plot function still plots per block per component?
I am currently using mixOmics_6.15.1. I tried looking into the auroc function, but its current parameters do not seem to allow plotting the combined AUC. Is there something I might be missing?
Thanks again for your amazing suite of methods in mixOmics.
Hi @kimanh.lecao
I split my data into a training set and a test set.
I used the perf function to get the average AUC for the training set (diablo object).
Now I would like to know how to calculate the combined average AUC for my test set,
since perf cannot be applied to a predict object.
You can use the predict function on your test set, as shown at the end of the DIABLO TCGA Case Study vignette on the mixOmics website, and then calculate your prediction error, sensitivity, specificity etc.
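A minimal sketch of that test-set evaluation, following the pattern of the vignette and assuming the breast.TCGA example data shipped with mixOmics. Its test set contains only the mRNA and miRNA blocks, so those two are used here; the keepX values, design weight and ncomp = 2 are illustrative.

```r
library(mixOmics)

# Example data shipped with mixOmics; its test set has mrna and mirna only
data(breast.TCGA)
X.train <- list(mRNA = breast.TCGA$data.train$mrna,
                miRNA = breast.TCGA$data.train$mirna)
Y.train <- breast.TCGA$data.train$subtype
X.test <- list(mRNA = breast.TCGA$data.test$mrna,
               miRNA = breast.TCGA$data.test$mirna)
Y.test <- breast.TCGA$data.test$subtype

# Illustrative design and keepX values
design <- matrix(0.1, nrow = length(X.train), ncol = length(X.train),
                 dimnames = list(names(X.train), names(X.train)))
diag(design) <- 0
keepX <- list(mRNA = c(5, 5), miRNA = c(5, 5))

# Fit on the training set only
diablo.final <- block.splsda(X.train, Y.train, ncomp = 2,
                             keepX = keepX, design = design)

# Predict the test samples; block names in newdata must match the model
pred <- predict(diablo.final, newdata = X.test)

# Column 2 holds the prediction from the model using components 1 and 2
predicted <- pred$WeightedVote$centroids.dist[, 2]

# Confusion matrix and balanced error rate on the test set
conf.mat <- get.confusion_matrix(truth = Y.test, predicted = predicted)
conf.mat
get.BER(conf.mat)
```

Sensitivity and specificity per class can then be read off the confusion matrix.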
I am confused about how to interpret the combined AUC of DIABLO, and I would be grateful for your advice.
As mentioned in the comments above, the output of the perf() function when we set auc = TRUE is the combined AUC for each component.
I wonder whether this means that, when performing the classification, the model uses only the predicted scores from exactly that latent component, OR that the model uses all components up to that number for classification.
For example, if I run the perf() function to calculate the combined AUC of a block.plsda model with two components, and the output combined AUC of comp 1 = 0.9 and of comp 2 = 0.8. Does it mean that the model with one component has AUC = 0.9 and the model with two components has AUC = 0.8, OR does it mean that in the two-component model, comp 1 gives AUC = 0.9 and comp 2 gives AUC = 0.8?
If the AUC is given separately for each component, is there a way to combine the predictions of all components of the model to obtain a single prediction? Of note, I don't have a test set to use with the predict() function, as my sample size is too small. I find it quite hard to interpret the AUC of each component separately.
Thank you so much in advance for your kind support. I look forward to your response.
For example, if I run the perf() function to calculate the combined AUC of a block.plsda model with two components, and the output combined AUC of comp 1 = 0.9 and of comp 2 = 0.8. Does it mean that the model with one component has AUC = 0.9 and the model with two components has AUC = 0.8, OR does it mean that in the two-component model, comp 1 gives AUC = 0.9 and comp 2 gives AUC = 0.8?
Our legend is poorly worded. All our PLS models are sequential, meaning that what you learn from component 2 includes any information learnt previously for component 1.
So in your case:
AUC with component 1 only = 0.9
AUC with components 1 and 2 = 0.8 (a decrease in performance when you add a second component). See my previous post about AUC and how it can be a bit optimistic compared to the classification error rate.