Strange prediction background and increasing error rate

Hi, I am analysing a data set of ≈39 samples, divided into four independent and fairly equal groups (9 to 11 subjects in each group). When I tune the PLS-DA, the balanced error rate (BER) increases when adding comp2 and decreases again after adding comp3. Is there a rational explanation for why this is happening? Also, when I compute and plot the prediction background, it looks very strange. See code and output below:

Code:

list.keepX <- seq(5, 100, 5)
tune.splsda.PMR <- tune.splsda(X, Y, ncomp = 5,
                               validation = "Mfold",
                               folds = 7,
                               dist = "centroids.dist",
                               progressBar = TRUE,
                               measure = "BER",
                               test.keepX = list.keepX,
                               nrepeat = 50,
                               cpus = 2)

Output:

Background: comp.predicted = 1

background.predict(MyResult.splsda.tuned, comp.predicted = 1, dist = "centroids.dist")

Background: comp.predicted = 2

background.predict(MyResult.splsda.tuned, comp.predicted = 2, dist = "centroids.dist")

Hi @christoa,
Thanks for highlighting this new bug with background.predict. There is a color issue, and we will look into it with @aljabadi.

Regarding your unstable error rate, it could be due to your small sample size and the increasing level of noise as you add more components. Your folds value is very high for n = 39. Consider folds = 3 instead, with 50 repeats.
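A minimal sketch of that suggestion, re-running the tuning with folds = 3 and nrepeat = 50. The simulated X and Y below are hypothetical stand-ins (39 samples in four groups of 9 to 11) and should be replaced with the real data. The cpus argument from the original call is left out, on the assumption that the parallel option depends on the installed mixOmics version.

```r
library(mixOmics)

set.seed(42)  # reproducible simulated data and fold assignment

# Hypothetical stand-in data: 39 samples, 200 variables, four groups.
# Replace X and Y with your own data matrix and class factor.
n.per.group <- c(9, 10, 10, 10)
n <- sum(n.per.group)
X <- matrix(rnorm(n * 200), nrow = n)
rownames(X) <- paste0("sample", seq_len(n))
colnames(X) <- paste0("var", seq_len(ncol(X)))
Y <- factor(rep(paste0("group", 1:4), times = n.per.group))

list.keepX <- seq(5, 100, 5)

# Same settings as the original call, but with 3 folds instead of 7
tune.res <- tune.splsda(X, Y, ncomp = 5,
                        validation = "Mfold",
                        folds = 3,
                        dist = "centroids.dist",
                        measure = "BER",
                        test.keepX = list.keepX,
                        nrepeat = 50,
                        progressBar = FALSE)

# Number of variables selected on each component
tune.res$choice.keepX

# Mean BER for each tested keepX value (rows) and component (columns)
head(tune.res$error.rate)
```

With random noise as input the error rates are not meaningful. The sketch only shows the shape of the call and where to read the results.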

Kim-Anh

Hi @christoa,

The background colour issue should be fixed now in the latest devel.
Cheers

Al

Thank you @kimanh.lecao and @aljabadi for your help. Decreasing the folds from 7 to 5 helped a lot. However, I am a little confused about why this solved the problem. I remember that @aljabadi wrote that n/k should stay above 5, and it does if I use 7-fold CV. Did I misinterpret this? Also, I am wondering if there is any literature on how to choose the optimal number of folds?

Kind regards
Christopher

@christoa
Your interpretation is correct regarding n/k >= 5. I don't believe there is any literature on this, but you just need to consider how many samples you would like to have in the test set, and also the number of repeats. For n = 39 I’d say folds = 7 is a bit high (we rarely go beyond 5, unless we have a sample size of 100).
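The arithmetic behind this can be sketched in base R. The snippet assumes n = 39 and four roughly equal groups, as described in the first post. The variable names are illustrative only.

```r
n <- 39                # total number of samples
n.classes <- 4         # four roughly equal groups
folds <- c(3, 5, 7)    # candidate numbers of folds (k)

# Average number of samples left out in each test fold (n/k)
test.size <- n / folds

# Average number of test samples per group in each fold
per.class <- test.size / n.classes

fold.summary <- data.frame(folds = folds,
                           test.size = test.size,
                           per.class = per.class,
                           rule.ok = test.size >= 5)  # n/k >= 5 rule of thumb
print(fold.summary)
```

All three settings satisfy n/k >= 5, but with 7 folds each test set holds fewer than 6 samples, which is between one and two per group. With 3 folds each test set holds 13 samples, roughly three per group, so the error estimate for each fold rests on more test samples.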

Kim-Anh


Once again, thank you very much for the clarification!