Strange prediction background and increasing error rate

christoa · September 22, 2020, 6:30am

Hi, I am analysing a data set of ≈39 samples that are divided into four independent and fairly equal groups (9 to 11 subjects in each group). When I tune the sPLS-DA, the BER increases when adding comp 2 and decreases again after adding comp 3. Is there a rational explanation for why this is happening? Also, when I compute and plot the prediction background, it looks very strange. See code and output below:

Code:

list.keepX <- seq(5, 100, 5)
tune.splsda.PMR <- tune.splsda(X, Y, ncomp = 5,
                               validation = "Mfold",
                               folds = 7,
                               dist = "centroids.dist",
                               progressBar = TRUE,
                               measure = "BER",
                               test.keepX = list.keepX,
                               nrepeat = 50,
                               cpus = 2)

Output:

Background: comp.predicted = 1

background.predict(MyResult.splsda.tuned, comp.predicted = 1, dist = "centroids.dist")

Background: comp.predicted = 2

background.predict(MyResult.splsda.tuned, comp.predicted = 2, dist = "centroids.dist")

kimanh.lecao · September 22, 2020, 11:34pm

Hi @christoa,
thanks for highlighting this new bug with background.predict. There is a colour issue, and we will get onto it with @aljabadi.

Regarding your unstable error rate, it could be due to your small sample size and the increasing level of noise as you add more components. Your folds value is very high for n = 39; consider folds = 3 instead, with 50 repeats.

Kim-Anh

aljabadi · October 5, 2020, 5:50am

Hi @christoa,

The background colour issue should be fixed now in the latest devel.
Cheers

Al

christoa · October 5, 2020, 6:08am

Thank you @kimanh.lecao and @aljabadi for your help. Decreasing the folds from 7 to 5 helped a lot. However, I am a little confused as to why this solved the problem. I remember that @aljabadi wrote that n/k should stay above 5, and it does if I use 7-fold CV. Did I misinterpret this? Also, I am wondering whether there is any literature on how to choose the optimal number of folds.

Kind regards
Christopher

kimanh.lecao · October 18, 2020, 10:39pm

@christoa
Your interpretation is correct, provided n/k >= 5. There is no literature on this, I believe, but you just need to consider how many samples you would like to have in the test set, and also consider the number of repeats. For n = 39 I’d say folds = 7 is a bit high (we rarely go beyond 5, unless we have a sample size of 100).

Kim-Anh
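
To make the n/k rule of thumb above concrete, here is a minimal sketch in base R of how many samples land in each test fold for n = 39. The helper `fold_sizes` is hypothetical and written only for this illustration. It assumes the samples are split as evenly as possible, which may differ from how mixOmics actually partitions the folds.

```r
# Sample size taken from the thread; candidate fold counts discussed above.
n <- 39
folds <- c(3, 5, 7)

# Hypothetical helper (not part of mixOmics): sizes of k folds when
# n samples are split as evenly as possible.
fold_sizes <- function(n, k) {
  base <- n %/% k   # minimum number of samples per fold
  extra <- n %% k   # number of folds that receive one extra sample
  c(rep(base + 1, extra), rep(base, k - extra))
}

# One row per fold count: the n/k ratio and the range of test-set sizes.
summary_table <- data.frame(
  folds = folds,
  n_over_k = round(n / folds, 2),
  min_test_size = sapply(folds, function(k) min(fold_sizes(n, k))),
  max_test_size = sapply(folds, function(k) max(fold_sizes(n, k)))
)
print(summary_table)
```

With folds = 7 each test set holds only 5 or 6 samples, so with four groups that is on average fewer than two samples per group, whereas folds = 3 gives 13 samples per test set.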

christoa · October 19, 2020, 5:18am

Once again, thank you very much for the clarification!

christoa · December 23, 2020, 11:38am

Dear @aljabadi and @kimanh.lecao, I just noticed that the prediction backgrounds in the following examples are affected by the old colour issue:

Christopher
