Categorical data in sPLS

Hi Seung,
regarding your question:
‘My metadata is categorical. It can be converted to a numeric type, but the values are just group names with no continuity.
I analyzed the data following the “Case study with PCA and sPLS on Liver Toxicity data set” and got the following result. The result shows almost no difference between the variables. Is this because the metadata is categorical? Is there any way to see the influence between variables with categorical metadata?’

If you input the categorical data as is, sPLS will treat it as integer values, which is not ideal: it imposes an order and a spacing on group labels that have neither.
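To see why integer coding is misleading, here is a small base R illustration (the food variable is the one used in the example further down). By default R sorts factor levels alphabetically, so the integer codes follow that sorting rather than any meaningful order:

```r
food <- factor(c("low", "high", "medium", "high", "low", "medium", "high"))

# Levels are sorted alphabetically by default
levels(food)      # [1] "high"   "low"    "medium"

# Coerced to integers: high = 1, low = 2, medium = 3
as.integer(food)  # [1] 2 1 3 1 2 3 1
```

With this coding, "medium" ends up numerically farther from "high" than "low" is, which is an artefact of the alphabetical order and not a property of the data.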
One solution is to use dummy matrices to map each categorical variable into an indicator matrix: for example, a variable with 3 categories becomes a matrix of 0s and 1s with 3 columns (try the function unmap(variable)). You can transform all your categorical variables this way and concatenate the dummy matrices into Y, as long as you do not perform selection on the Y matrix (i.e. specify keepX = .... but do not specify keepY). Otherwise, the interpretation will be tricky, as sPLS might select one category of a given variable but not the others.
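For readers who want to see what such a dummy (indicator) matrix looks like without loading mixOmics, base R's model.matrix() gives the same kind of 0/1 coding. Note this is a swapped-in base R technique for illustration, not the unmap() implementation; the formula `~ sex - 1` drops the intercept so that every level gets its own column:

```r
sex <- factor(c("male", "female", "female", "male", "female", "male", "female"))

# "~ sex - 1" removes the intercept, so each level becomes its own 0/1 column
sex.dummy <- model.matrix(~ sex - 1)
colnames(sex.dummy) <- levels(sex)  # rename "sexfemale"/"sexmale" to the level names

print(sex.dummy)

# Each row contains exactly one 1: every sample belongs to exactly one category
stopifnot(all(rowSums(sex.dummy) == 1))
```

A variable with k categories therefore becomes k columns, which is why selecting only some of these columns with keepY would be hard to interpret.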

So, in short, we have a small workaround, but it is not ideal! Here is an example of how to create your concatenated dummy Y matrix:


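library(mixOmics)  # provides unmap() and spls()

# Example categorical variables measured on 7 samples
sex <- factor(c("male", "female", "female", "male", "female", "male", "female"))
food <- factor(c("low", "high", "medium", "high", "low", "medium", "high"))

# One 0/1 indicator column per category; prefixing with the variable name
# keeps column names unique if two variables share a level name
sex.dummy <- unmap(sex)
colnames(sex.dummy) <- paste0("sex.", levels(sex))
food.dummy <- unmap(food)
colnames(food.dummy) <- paste0("food.", levels(food))

# Concatenate the dummy matrices into a single numeric Y matrix
Y <- cbind(sex.dummy, food.dummy)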
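A minimal sketch of the corresponding sPLS call, assuming mixOmics is installed. The X matrix here is hypothetical random data (7 samples by 20 variables named gene1 to gene20) standing in for your own data; only keepX is set, and keepY is deliberately left unspecified so that all dummy columns of Y are kept:

```r
library(mixOmics)

set.seed(42)

# Categorical variables coded as concatenated dummy matrices
sex <- factor(c("male", "female", "female", "male", "female", "male", "female"))
food <- factor(c("low", "high", "medium", "high", "low", "medium", "high"))
Y <- cbind(unmap(sex), unmap(food))
colnames(Y) <- c(paste0("sex.", levels(sex)), paste0("food.", levels(food)))

# Hypothetical X: random numeric data, for illustration only
X <- matrix(rnorm(7 * 20), nrow = 7,
            dimnames = list(NULL, paste0("gene", 1:20)))

# Select 5 X variables on each of the 2 components; no keepY,
# so no selection is performed on the dummy Y matrix
fit <- spls(X, Y, ncomp = 2, keepX = c(5, 5))

# Names of the X variables selected on component 1
selectVar(fit, comp = 1)$X$name
```

Because no selection is done on Y, every category of every variable stays in the model, which avoids the partial-category problem described above.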