Is PLS supervised or unsupervised?

[redacted] I have heard the terms “supervised” and “unsupervised” applied to PLS-DA and PLS respectively, since the former sets predefined classes to inform the model in advance, while the latter doesn’t. However, I find it interesting that, when PLS and PLS-DA are used to group samples by inspecting the score plots, some senior members of the community accept calling the PLS-DA grouping “supervised classification (or clustering)” but do not accept calling the PLS grouping “unsupervised classification (or clustering)”. This is despite the fact that one can see how the samples classify (cluster) in the PLS score plot of a model where no classes were set (assuming, of course, a good design and preprocessing). To my mind, this is essentially “letting the data talk”: I simply run a PLS regression and observe how the samples relate to each other in the hyperspace of the score plot. So I call it “unsupervised” because I do not set any classes in advance (unlike PLS-DA), and “classification” because I can see how the samples are grouped according to their properties. I don’t think I can call this “pattern recognition”, since that term seems to me to fit non-regression methods such as PCA better. It seems that “classification” has been narrowed down to KNN, SIMCA, PLS-DA, … but the word implies more…

The point is that I would like your opinion on how something that seems so natural to me, an “unsupervised classification using PLS”, can be defended, or whether combining “unsupervised” and “classification” looks somewhat contradictory given their definitions, in which case I should consider another name. I am addressing this to you because I noticed that your mixOmics package (a good one, by the way, congratulations!) places PLS in the category of “unsupervised analysis” (again, “analysis” rather than “classification”), so perhaps you could help me judge whether “unsupervised classification” is wrong or acceptable in my case. In my work, we target a field that favours unsupervised methods but, at the same time, uses one type of spectroscopy as the Y matrix and other types of spectroscopy as X. This makes it a regression problem that, because of our collaborators’ preferences, cannot be handled by PCA or other X-only methods. And here, I think, is another interesting point: if we use a Y (as in any regression), e.g. in PLS, where the latent variables are oriented not only to capture variance in X but also to maximize the covariance between X and Y, then we would have to call any regression supervised (Y “guides” or “supervises” X, so to speak). So, are there two ways to understand the concept of “supervised”: one that arises naturally from regression, and another that results from setting classes?
I paste below the text that I saw on your website:

[Screenshot of the mixOmics website text: “Unsup”]

Another interesting point: if we do the same with a hierarchical PLS model (not DA), using the scores of individual PLS models as inputs, can we then call it “unsupervised classification” when we see the samples grouping in the super-score plot?
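The question’s point that a continuous Y “supervises” X in any regression can be made concrete with a small numerical sketch. It is written in Python with NumPy (not mixOmics, which is an R package), uses made-up toy data, and computes only the first component. The PCA direction depends on X alone, whereas the first PLS weight is the leading left singular vector of XᵀY (the unit direction maximizing the covariance between X scores and Y scores), so it moves when Y changes. The helper names `center`, `first_pca_loading` and `first_pls_weight` are hypothetical, not part of any library.

```python
import numpy as np


def center(a):
    """Return a copy of `a` with each column mean-centred."""
    a = np.asarray(a, dtype=float)
    return a - a.mean(axis=0)


def first_pca_loading(X):
    """Unit direction of maximum variance in X alone (Y is never consulted)."""
    _, _, vt = np.linalg.svd(center(X), full_matrices=False)
    return vt[0]


def first_pls_weight(X, Y):
    """Unit X-weight w of the first PLS component.

    w is the leading left singular vector of Xc^T Yc, i.e. the unit vector
    that maximises cov(Xc w, Yc c) over unit vectors w and c.
    """
    Xc, Yc = center(X), center(Y)
    if Yc.ndim == 1:  # a single response column
        Yc = Yc[:, None]
    u, _, _ = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    return u[:, 0]


# Hypothetical toy X block: two orthogonal, centred columns,
# with column 0 carrying far more variance than column 1.
X = np.array([
    [3.0, 1.0],
    [3.0, -1.0],
    [3.0, 0.0],
    [-3.0, 1.0],
    [-3.0, -1.0],
    [-3.0, 0.0],
])

y_a = X[:, 0] / 3.0  # a response that tracks the high-variance column
y_b = X[:, 1]        # a response that tracks the low-variance column

pca_dir = first_pca_loading(X)
pls_dir_a = first_pls_weight(X, y_a)
pls_dir_b = first_pls_weight(X, y_b)

# Singular-vector signs are arbitrary, so compare absolute values.
print("PCA direction:     ", np.round(np.abs(pca_dir), 3))
print("PLS direction, y_a:", np.round(np.abs(pls_dir_a), 3))
print("PLS direction, y_b:", np.round(np.abs(pls_dir_b), 3))
```

With the same X, PCA always picks column 0, while PLS picks column 0 or column 1 depending on which one Y covaries with. In that sense Y steers the latent space even though no classes were set.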

There are different schools of thought on where PLS fits. Originally (on our website), we classed PLS as ‘unsupervised’ to distinguish it from PLS-DA, which is a classification technique. In our book, based on some feedback from the reviewers, we decided to use ‘regression’ instead, because a regression analysis can be seen as ‘guiding’ the fit, similarly to PLS-DA (except that the y response is continuous). So I think we both agree here, although this could turn into a philosophical discussion :slight_smile:
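The point that PLS-DA and PLS regression both let Y guide the fit becomes visible if PLS-DA is written as PLS regression on a dummy-coded (0/1 indicator) Y, which is the usual textbook formulation. The Python/NumPy sketch below is a first-component-only illustration under that assumption, on made-up toy data; the helper names (`center`, `dummy_code`, `first_pls_weight`, `first_plsda_weight`) are hypothetical and this is not mixOmics’ implementation.

```python
import numpy as np


def center(a):
    """Return a copy of `a` with each column mean-centred."""
    a = np.asarray(a, dtype=float)
    return a - a.mean(axis=0)


def dummy_code(labels):
    """Turn class labels into an n x K 0/1 indicator matrix (one column per class)."""
    classes = sorted(set(labels))
    Y = np.array([[1.0 if lab == c else 0.0 for c in classes] for lab in labels])
    return Y, classes


def first_pls_weight(X, Y):
    """Unit X-weight of the first PLS component (leading left singular vector of Xc^T Yc)."""
    Xc, Yc = center(X), center(Y)
    if Yc.ndim == 1:  # a single response column
        Yc = Yc[:, None]
    u, _, _ = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    return u[:, 0]


def first_plsda_weight(X, labels):
    """PLS-DA sketch: the same PLS step, with Y replaced by the dummy-coded labels."""
    Y, _ = dummy_code(labels)
    return first_pls_weight(X, Y)


# Hypothetical toy data: column 0 has high variance, column 1 low variance.
X = np.array([
    [3.0, 1.0],
    [3.0, -1.0],
    [3.0, 0.0],
    [-3.0, 1.0],
    [-3.0, -1.0],
    [-3.0, 0.0],
])
labels = ["ctrl", "trt", "ctrl", "ctrl", "trt", "ctrl"]

w = first_plsda_weight(X, labels)
scores = center(X) @ w  # first-component scores t = Xc w

print("PLS-DA direction:", np.round(np.abs(w), 3))
for lab, t in zip(labels, scores):
    print(lab, round(float(t), 3))
```

Here the class labels pull the first component towards column 1, the low-variance direction that separates `trt` from `ctrl`, whereas PCA on the same X would follow column 0, which does not separate them. Replacing the dummy matrix with a continuous Y gives ordinary PLS regression with no other change, which is the sense in which both are guided by Y.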

Here is our diagram from the book:

We will update the website in the coming months to reflect this point. Thanks for the feedback.

Kim-Anh