Confidence ellipsoids vs. CV-based permutation test

Hello everybody,
I was interested in whether two types of trees, “R” and “S”, can be differentiated based on their phytochemical profiles.
To do this, I ran a PLS-DA following the book “Multivariate data integration using R with the mixOmics package”.

To assess whether the two types can be separated, I produced a plot with 95% confidence ellipses and ran a CV-based permutation test (RVAideMemoire package).
My ellipse plots do not indicate a significant separation (the ellipses overlap).
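For reference, here is a minimal base-R sketch of how a 95% ellipse around one group's 2-D component scores can be computed, assuming the scores are roughly bivariate normal. The names `ellipse_points` and `for_mean` are hypothetical, not mixOmics functions. The sketch makes one distinction explicit: an ellipse built from the group's sample covariance is expected to contain about 95% of the individual samples, whereas an approximate region for the group mean is about sqrt(n) times smaller. If the plotted ellipses are of the first kind (as I understand `plotIndiv(..., ellipse = TRUE)` in mixOmics draws them, but please check its documentation), overlapping ellipses do not by themselves rule out a difference between the groups.

```r
# Sketch: points on a level-`level` ellipse for 2-D component scores,
# assuming the group's scores are roughly bivariate normal.
# `ellipse_points` and `for_mean` are hypothetical names, not mixOmics API.
ellipse_points <- function(scores, level = 0.95, n_points = 100,
                           for_mean = FALSE) {
  centre <- colMeans(scores)
  S <- cov(scores)
  # Radius in Mahalanobis units from the chi-square quantile with 2 df
  r <- sqrt(qchisq(level, df = 2))
  if (for_mean) {
    # Large-sample approximation for a region around the group mean:
    # the mean's covariance is S / n (no Hotelling's T^2 correction)
    r <- r / sqrt(nrow(scores))
  }
  e <- eigen(S, symmetric = TRUE)
  A <- e$vectors %*% diag(sqrt(e$values))  # S = A %*% t(A)
  theta <- seq(0, 2 * pi, length.out = n_points)
  circle <- r * cbind(cos(theta), sin(theta))
  # Map the circle through A and shift it to the group centre
  sweep(circle %*% t(A), 2, centre, "+")
}
```

On an existing score plot, `lines(ellipse_points(scores_R), col = "red")` would add a sample ellipse for a hypothetical score matrix `scores_R`, and `for_mean = TRUE` gives the much tighter approximate region for that group's mean.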

The permutation test, however, is significant (p = 0.009, CER = 0.30), indicating that S and R trees differ in their phytochemical profiles.

These two results seem to contradict each other. How would you interpret them? There seems to be a slight separation pattern on the ellipse plots, but it does not lie along either the first or the second axis (does that matter?)

As a non-expert, I would have interpreted the results as follows:
Phytochemical profiles differ between S and R trees (significant permutation test). However, the effects are very subtle (overlapping ellipses). Is that correct?

Thanks for your help
Michael

hi @michal.oskiera,

You did not specify which function you are using for the permutation test, so I can’t offer much advice. If the permutation test is performed on the already learnt PLS-DA model (i.e. based on its components), then it will overfit.
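To illustrate a permutation test that avoids this: for every permutation the labels are shuffled and the whole procedure (fitting plus cross-validated prediction) is rerun, so each permuted error rate is computed exactly like the observed one. To stay in base R, this sketch uses a nearest-centroid classifier as a hypothetical stand-in for PLS-DA. `nc_predict`, `cv_error` and `perm_test` are illustrative names, not RVAideMemoire or mixOmics functions. The p-value convention, which counts the observed labelling as one of the permutations, is a common one; I am assuming it rather than claiming it is what MVA.test uses.

```r
# Nearest-centroid classifier: hypothetical base-R stand-in for PLS-DA
nc_predict <- function(Xtr, ytr, Xte) {
  cls <- levels(ytr)
  # One row per class: the mean of that class's training samples
  cent <- do.call(rbind, lapply(cls, function(cl)
    colMeans(Xtr[ytr == cl, , drop = FALSE])))
  # Squared Euclidean distance from each test sample to each centroid
  d <- vapply(seq_along(cls), function(j)
    rowSums(sweep(Xte, 2, cent[j, ])^2), numeric(nrow(Xte)))
  d <- matrix(d, nrow = nrow(Xte))
  cls[max.col(-d, ties.method = "first")]
}

# Stratified k-fold cross-validated classification error rate (CER)
cv_error <- function(X, y, k = 5) {
  folds <- integer(length(y))
  for (lv in levels(y)) {
    idx <- which(y == lv)
    folds[idx] <- sample(rep(seq_len(k), length.out = length(idx)))
  }
  pred <- character(length(y))
  for (f in seq_len(k)) {
    te <- folds == f
    if (!any(te)) next
    # The model is refitted on the training folds only
    pred[te] <- nc_predict(X[!te, , drop = FALSE], y[!te],
                           X[te, , drop = FALSE])
  }
  mean(pred != as.character(y))
}

# Permutation test: rerun the full CV procedure on shuffled labels
perm_test <- function(X, y, nperm = 999, k = 5) {
  y <- factor(y)
  obs <- cv_error(X, y, k)
  perm <- replicate(nperm, cv_error(X, sample(y), k))
  list(CER = obs, p.value = (sum(perm <= obs) + 1) / (nperm + 1))
}
```

With a numeric matrix `scale_X` and a factor `Y`, `perm_test(scale_X, Y)` would be the analogous call, using nearest centroid instead of PLS-DA.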

Given the graphical output that you show, I would say the separation is subtle. I am a bit doubtful of your permutation test result!

Kim-Anh

Dear Kim-Anh,
Thanks for replying.

I am using the RVAideMemoire package and the following function for my permutation test.
MVA.test(scale_X, Y, cmv = TRUE, ncomp = 2, model = "PLS-DA")

I chose ncomp = 2 because I had determined in previous steps that this is the optimal number of components for my PLS-DA model.
Was that my mistake? Do I have to provide MVA.test with an unbiased number of components, say 10 (as I used at the beginning for my PLS-DA model)?

In this case, I get the following output:

data: scale_X and Y
Model: PLS-DA
10 components maximum
999 permutations
CER = 0.33333, p-value = 0.022

It is still significant, but less so, so the interpretation would remain the same.

scale_X and Y are the same data I used for my PLS-DA model (no new data).
Is that wrong? I don't really understand why the test would overfit like that, as it is “independent” of my previous PLS-DA analysis (i.e. it does not receive any “information” from my original PLS-DA model) apart from the number of components (see above).
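On the number of components: if `ncomp` was chosen by cross-validating on the same samples, that choice has already seen every sample, so reusing it when estimating the error can make the estimate optimistic. A double (nested) cross-validation avoids this by tuning the number of components only on the training part of each outer fold. The output line "10 components maximum" suggests that MVA.test with `cmv = TRUE` treats `ncomp` as a maximum and works this way, but check `?MVA.test` to confirm. The sketch below shows the idea in base R. It substitutes PCA on the training fold followed by nearest-centroid classification in the first `nc` scores for PLS-DA; `pca_nc_predict`, `make_folds`, `cv_error_nc` and `double_cv_error` are hypothetical names, and the fold counts are illustrative.

```r
# Hypothetical stand-in for PLS-DA: PCA fitted on the training data,
# then nearest-centroid classification on the first `nc` PC scores
pca_nc_predict <- function(Xtr, ytr, Xte, nc) {
  pc <- prcomp(Xtr, center = TRUE, scale. = FALSE)
  nc <- min(nc, ncol(pc$x))                 # cannot exceed available PCs
  Ttr <- pc$x[, seq_len(nc), drop = FALSE]
  Tte <- predict(pc, Xte)[, seq_len(nc), drop = FALSE]
  cls <- levels(ytr)
  cent <- do.call(rbind, lapply(cls, function(cl)
    colMeans(Ttr[ytr == cl, , drop = FALSE])))
  d <- vapply(seq_along(cls), function(j)
    rowSums(sweep(Tte, 2, cent[j, ])^2), numeric(nrow(Tte)))
  cls[max.col(-matrix(d, nrow = nrow(Tte)), ties.method = "first")]
}

# Stratified fold assignment
make_folds <- function(y, k) {
  folds <- integer(length(y))
  for (lv in levels(y)) {
    idx <- which(y == lv)
    folds[idx] <- sample(rep(seq_len(k), length.out = length(idx)))
  }
  folds
}

# Plain k-fold CV error for a fixed number of components
cv_error_nc <- function(X, y, nc, k) {
  folds <- make_folds(y, k)
  pred <- character(length(y))
  for (f in seq_len(k)) {
    te <- folds == f
    if (!any(te)) next
    pred[te] <- pca_nc_predict(X[!te, , drop = FALSE], y[!te],
                               X[te, , drop = FALSE], nc)
  }
  mean(pred != as.character(y))
}

# Double CV: the inner loop picks nc, the outer loop estimates the error
double_cv_error <- function(X, y, max_nc = 10, k_out = 5, k_in = 5) {
  y <- factor(y)
  folds <- make_folds(y, k_out)
  pred <- character(length(y))
  chosen <- integer(0)
  for (f in seq_len(k_out)) {
    te <- folds == f
    if (!any(te)) next
    Xtr <- X[!te, , drop = FALSE]
    ytr <- y[!te]
    # Held-out samples of this outer fold play no part in choosing nc
    inner <- vapply(seq_len(max_nc), function(nc)
      cv_error_nc(Xtr, ytr, nc, k_in), numeric(1))
    best <- which.min(inner)
    chosen <- c(chosen, best)
    pred[te] <- pca_nc_predict(Xtr, ytr, X[te, , drop = FALSE], best)
  }
  list(CER = mean(pred != as.character(y)), ncomp_chosen = chosen)
}
```

Each outer fold may settle on a different number of components, which is expected: the reported CER describes the whole tuning-plus-fitting procedure, not one fixed model.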

How should I specify my test instead?

Many thanks,
Michael

Hi there,
Any feedback regarding my last post/questions would be very much appreciated.
Thanks
Mike

hi @eimichae,

I only monitor this forum once a week (or less) because of other time commitments.

What I see from the MVA.test output is that your classification error rate (CER) is still quite high, which reflects what you observe on the PLS-DA plot. The permutation p-value tells you that the difference between the groups is significantly different from what you would get with random labels; it does not necessarily reflect the quality of the separation between the groups (which is reflected by the CER).

Kim-Anh

Dear Kim-Anh,
Thanks a lot for your answer. Good to know, and much appreciated that you check the forum once a week.

A CER of 0.33 in this model would mean that roughly every third sample is misclassified, right? What CER would you consider to indicate a high-quality classification? CER < 0.1?
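To check the arithmetic: the overall CER is the number of misclassified samples divided by the total, so 0.33 means about one sample in three is misclassified. This minimal sketch uses a hypothetical confusion table (the counts are made up for illustration, not taken from this data set) in which 10 errors out of 30 samples give CER = 10/30 ≈ 0.333. It also shows the per-class errors and their mean (the balanced error rate), because with unequal group sizes a low overall CER can hide a poorly predicted smaller group.

```r
# Hypothetical confusion table: rows = true class, columns = predicted class
conf <- matrix(c(12, 4,
                  6, 8),
               nrow = 2, byrow = TRUE,
               dimnames = list(true = c("R", "S"), predicted = c("R", "S")))

# Overall classification error rate: off-diagonal counts / total
cer <- 1 - sum(diag(conf)) / sum(conf)

# Per-class error rates and their mean (balanced error rate)
class_err <- 1 - diag(conf) / rowSums(conf)
ber <- mean(class_err)

round(c(CER = cer, class_err, BER = ber), 3)
```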

Cheers,
Michael

hi @eimichae
There is no hard threshold for classification performance; it depends on your data. Sometimes 0.3 can be considered pretty good.
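One way to make "it depends on your data" concrete is to compare the observed CER with what trivial rules would achieve for the same class proportions. Always predicting the majority class gives an error equal to the minority-class proportion. Guessing at random in proportion to the class frequencies gives 1 - sum(p_k^2). The permuted error rates in a permutation test play a roughly similar role as a chance-level reference. The helper name `chance_cer` and the group sizes below are hypothetical.

```r
# Chance-level error rates for a classification problem, given class labels
chance_cer <- function(y) {
  p <- as.numeric(table(y)) / length(y)   # class proportions
  c(majority = 1 - max(p),                # always predict the largest class
    proportional = 1 - sum(p^2))          # guess classes at their observed rates
}

# Hypothetical group sizes: 18 "R" trees and 12 "S" trees
y <- factor(rep(c("R", "S"), times = c(18, 12)))
chance_cer(y)
```

With two balanced groups both baselines are 0.5, so a CER of 0.3 is clearly better than chance there. With strongly unbalanced groups, the same 0.3 may be barely better than always predicting the larger group.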

Kim-Anh