plotIndiv() legend doubles up on factors

Hi there,

After running PLS-DA and clustering, I am trying to see whether the samples that fall outside the confidence ellipses can be explained by other factors in the data, such as where the samples were collected. Here, I have left off the ellipses and instead recolored the samples by collection site.

plotIndiv(plsda_clean_fin,
          ind.names = FALSE,
          legend = TRUE,
          # ellipse = TRUE,
          title = "PLS-DA of clean dataset with NAs",
          group = clean_y_1$landscape)

The landscape variable is a factor whose levels are the locations ("City East Metro", etc.).

str(clean_y_1$landscape)
 Factor w/ 19 levels "APY Lands","Remote Far North",..: 14 14 14 14 14 14 14 14 14 14 ...
 - attr(*, "label")= chr " Location Landscape"

The legend shows both the numeric entry in the column, and then also the factor levels separately.
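
For context, an R factor stores each value as an integer code alongside a set of character labels, which is presumably where the two legends come from: one built from the codes and one from the labels. A minimal base-R sketch with a made-up factor (the `site` values are illustrative only, not the real data):

```r
# Toy factor standing in for clean_y_1$landscape (values are made up)
site <- factor(c("City East Metro", "APY Lands",
                 "City East Metro", "Remote Far North"))

levels(site)       # character labels, sorted alphabetically by default
as.integer(site)   # integer codes that index into levels(site)

# Mapping the codes back through the levels recovers the labels
site_labels <- levels(site)[as.integer(site)]
identical(site_labels, as.character(site))
```

So the numbers and the level names carry the same information, just in two representations.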

Is there a way to combine them, so that a single legend assigns one color and one symbol to each level name instead of showing the numbers?

Thanks!

hi @lizak

Yes, I think there is a way, but I am a bit unclear about the overlap between the Y in your PLS-DA object (which should already be shown by default in the plot) and clean_y_1$landscape. It seems to me that they are the same.

Have a look at the help file for plotIndiv for a PLS-DA object (?plotIndiv), as it shows numerous examples, some of which might be useful for your case:

## plot of individuals for objects of class 'plsda' or 'splsda'
# ----------------------------------------------------
library(mixOmics)
data(breast.tumors)
X <- breast.tumors$gene.exp
Y <- breast.tumors$sample$treatment

splsda.breast <- splsda(X, Y, keepX = c(10, 10), ncomp = 2)

# default option: note the outcome color is included by default!
plotIndiv(splsda.breast)

# also check ?background.predict to visualise the prediction
# area with a plsda or splsda object!



# default option with no ind name: pch and color are set automatically
plotIndiv(splsda.breast, ind.names = FALSE, comp = c(1, 2))

# default option with no ind name: pch and color are set automatically, 
# with legend
plotIndiv(splsda.breast, ind.names = FALSE, comp = c(1, 2), legend = TRUE)

# trying the different styles
plotIndiv(splsda.breast, ind.names = TRUE, comp = c(1, 2),
          ellipse = TRUE, style = "ggplot2", cex = c(1, 1))
plotIndiv(splsda.breast, ind.names = TRUE, comp = c(1, 2),
          ellipse = TRUE, style = "lattice", cex = c(1, 1))

# changing pch of the two groups
plotIndiv(splsda.breast, ind.names = FALSE, comp = c(1, 2),
          pch = c(15,16), legend = TRUE)

# creating a second grouping factor with a pch of length 3,
#  which is recycled to obtain a vector of length n
plotIndiv(splsda.breast, ind.names = FALSE, comp = c(1, 2),
          pch = c(15,16,17), legend = TRUE)

# same thing as
pch.indiv <- c(rep(15:17, 15), 15, 16) # length n
plotIndiv(splsda.breast, ind.names = FALSE, comp = c(1, 2),
          pch = pch.indiv, legend = TRUE)

# change the names of the second legend with pch.levels
plotIndiv(splsda.breast, ind.names = FALSE, comp = c(1, 2),
          pch = 15:17, pch.levels = c("a","b","c"),legend = TRUE)

Kim-Anh

Hi Kim-Anh,

Thanks for the response =)

You’ve hit the nail on the head: clean_y_1$landscape contains the information for both legends, and for some reason it is being split into two different legends. I’ve looked through the help file and online help forums and haven’t found a way to consolidate the two legends.
The issue, as far as I can see, is that clean_y_1$landscape is a factor, and the symbols legend shows it as numbers (1 - 18) while the character labels for the factor levels (colors) are pulled out as a separate legend. Ideally there would be a single legend assigning a color and a symbol to each level label.

Perhaps I should have mentioned this from the get-go, but the discriminant for the PLS-DA passed to plotIndiv() was not clean_y_1$landscape but a different variable relating to disease status. Here I wanted to see whether the lack of distinct clustering between disease states found in the PLS-DA came from the sample collection sites.
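
One quick check alongside the plot is to cross-tabulate disease status against collection site, since a site that contributes samples from only one disease state cannot be separated from that state. A minimal base-R sketch on made-up data; the `meta` data frame and its `disease` column are hypothetical stand-ins, as the real disease-status variable is not shown in this thread:

```r
# Hypothetical sample metadata (made-up values, for illustration only)
meta <- data.frame(
  disease   = factor(c("case", "case", "control",
                       "control", "case", "control")),
  landscape = factor(c("APY Lands", "APY Lands", "City East Metro",
                       "City East Metro", "Remote Far North", "Remote Far North"))
)

# Rows are disease states, columns are collection sites
site_by_disease <- table(meta$disease, meta$landscape)
site_by_disease

# Sites whose samples all share a single disease state
single_state_sites <- colnames(site_by_disease)[colSums(site_by_disease > 0) == 1]
single_state_sites
```

With the real data, the same two lines would run on the disease-status vector used for the PLS-DA and on clean_y_1$landscape.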