Centroids dist vs Centroid in arrow plot

Hi!

I would like to know what exactly “centroid” means in the arrow plot; I can't find that information in the manual. In addition, when assessing the performance of the model, I can check it with the max, centroids and Mahalanobis distances. Are both centroids the same? If so, is there any way to show max.dist in the arrow plots instead of centroids? I am asking because the model performs better with max.dist than with centroids.dist.

Thanks!

The term “centroid” means something different in the arrow plot and in model prediction, but the two uses are related: both rest on the same basic concept of a centroid as the average of a set of points in N-dimensional space.

In an arrow plot, each sample has a centroid whose X and Y positions are the averages of that sample's component values across blocks. E.g., if the x axis is component 1 and the y axis is component 2, sample 1's component 1 values are averaged across all blocks, and that average is used as the x-axis position of sample 1's centroid.
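As a minimal sketch of how such a centroid could be computed, assuming each block's scores on components 1 and 2 are already available as NumPy arrays (the block names and score values below are hypothetical, and this is not the plotting package's own code):

```python
import numpy as np

def arrow_plot_centroids(block_scores):
    """Average each sample's component scores across blocks.

    block_scores: dict mapping a (hypothetical) block name to an array of
    shape (n_samples, 2) holding component 1 and component 2 scores.
    Returns an (n_samples, 2) array of centroid (x, y) positions.
    """
    stacked = np.stack(list(block_scores.values()))  # (n_blocks, n_samples, 2)
    return stacked.mean(axis=0)                      # average over blocks

# Hypothetical scores for 2 samples on 3 blocks
scores = {
    "mRNA":    np.array([[1.0, 2.0], [-1.0, 0.5]]),
    "miRNA":   np.array([[3.0, 0.0], [-2.0, 1.5]]),
    "protein": np.array([[2.0, 1.0], [0.0, -2.0]]),
}
centroids = arrow_plot_centroids(scores)
print(centroids)
```

Here sample 1's centroid sits at x = (1 + 3 + 2) / 3 = 2 and y = (2 + 0 + 1) / 3 = 1. Each arrow then runs from a sample's centroid to that sample's position in one block, i.e. from `centroids[i]` to `scores[block][i]`.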

In contrast, prediction uses the term “centroid” to refer to a different process. I think the best explanation of it can be found on our website.
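As a simplified sketch of the prediction-side idea, assuming the centroids.dist rule assigns a new sample to the class whose centroid (the mean of that class's training samples in component space) is nearest. The function name and toy data are hypothetical, and plain Euclidean distance is used; mahalanobis.dist would instead use a covariance-weighted distance.

```python
import numpy as np

def centroids_dist_predict(train_scores, train_labels, test_scores):
    """Assign each test sample to the class with the nearest centroid.

    Simplified sketch: class centroids are the means of the training
    component scores, and distance is plain Euclidean distance.
    """
    train_labels = np.asarray(train_labels)
    classes = np.unique(train_labels)
    # One centroid per class, shape (n_classes, n_components)
    centroids = np.array(
        [train_scores[train_labels == c].mean(axis=0) for c in classes]
    )
    # Distances from every test sample to every centroid: (n_test, n_classes)
    d = np.linalg.norm(test_scores[:, None, :] - centroids[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]

# Hypothetical component scores for 4 training and 2 test samples
train = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [4.0, 5.0]])
labels = ["A", "A", "B", "B"]
test = np.array([[1.0, 1.0], [3.0, 4.0]])
predictions = centroids_dist_predict(train, labels, test)
print(predictions)
```

This centroid summarises a whole class of training samples, whereas the arrow-plot centroid summarises one sample across blocks.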

If max.dist is yielding better results, it's likely that there is a clear decision boundary between the samples of your different groups in the reduced-dimensional space, which is great! However, if the centroids and Mahalanobis distances show a significant drop in performance, it may be a case of overfitting.
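For contrast, max.dist does not use class centroids at all: the predicted class is the one with the largest value in the model's predicted dummy (indicator) matrix. A minimal sketch, assuming those predicted values are already available as a NumPy array (the function name and numbers are hypothetical):

```python
import numpy as np

def max_dist_predict(y_pred, classes):
    """Pick, for each sample, the class whose predicted dummy value is largest."""
    return np.asarray(classes)[np.argmax(y_pred, axis=1)]

classes = ["A", "B", "C"]
# Hypothetical predicted dummy matrix: one row per sample, one column per class
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.3, 0.4, 0.3]])
print(max_dist_predict(y_pred, classes))  # ['A' 'B']
```

Because this rule and the centroid-based rules draw their decision boundaries differently, they can disagree on the same samples, which is one reason their performance estimates can differ.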