Biplot from PLSDA

Hi,

I’m trying to make a biplot from PLSDA results. I know I can use biplot function, but the customisation is really very limited. I cannot plot by grouping data or change colors.

Is there a way to create a biplot more customised? I would like to do something like that.

Thanks very much.

Enzo

Hi @enzo

The current biplot function we have implemented is for pca functions only. Can you please give a reproducible example on how you made such plot for a plsda object?

#---------------01. SCORES MATRIX-----------------------------------
plsda.scores<-plsda.res$variates[[1]] #extracting X scores
plsda.scores<-as.data.frame(plsda.scores) #convert to dataframe

plsda.scores<-cbind(info,plsda.scores) #merging plsda score results and info data

#----------------02. LOADINGS MATRIX----------------------------------

plsda.loadings<-plsda.res$loadings[[1]] #extracting loadings
plsda.loadings<-as.data.frame(plsda.loadings) #convert to dataframe
plsda.loadings<-rownames_to_column(plsda.loadings,var=“Type”) #label column to plot

#--------------03. BIPLOT--------------------------------

#------------prepare loading data---------------------
plsda.loadings2<-plsda.loadings
plsda.loadings2$x<-“Element” #variables in my study are chemical elements, so variables group
plsda.loadings2[,“Deposit subtype”]<-“non” #this column is empty by I need it in order to have same columns that scores table
plsda.loadings2[,“Altered rocks”]<-"" #this column is empty by I need it in order to have same columns that scores table
#------------prepare scores data---------------------
plsda.scores2<-plsda.scores
plsda.scores2$Type<-"" #observations column is empty because I label only variables
plsda.scores2$x<-“Sample” #group of samples
plsda.scores2<- mutate_if(plsda.scores2,is.numeric,~ . / 9) # scores values are modified in order to scale them to loadings

#-----merge scores and loadings--------------
biplot.data<-rbind(plsda.loadings2,plsda.scores2[,c(5,11,13,18:22)]) # merge

I hope that can help you.

Really this is not a biplot, it’s just a way to display scores and loadings in same plot.

Hi @enzo,

Thanks for the code. We have in fact implemented a customisable biplot function which is still in development :slight_smile:.

Below is an example with installation instructions:

BiocManager::install('ajabadi/mixOmics@devel', ask=FALSE, update = FALSE)
library(mixOmics)
data(breast.tumors)
X <- breast.tumors$gene.exp
colnames(X) <- paste0('GENE_', colnames(X))
rownames(X) <- paste0('SAMPLE_', rownames(X))
Y <- breast.tumors$sample$treatment

plsda.breast <- plsda(X, Y, ncomp = 2)
biplot(plsda.breast, cutoff = 0.72)

## remove arrows
biplot(plsda.breast, cutoff = 0.72, var.arrow.col = NULL, var.names.size = 4)

Created on 2020-10-27 by the reprex package (v0.3.0)

You can see more examples a ?mixOmics::biplot. I’d appreciate your feedback on what I could further improve :slight_smile: .

Hope it helps,

Al

Dear @aljabadi ,

Many thanks for the code. After install your “addon”, I am able to produce a biplot with PLS-DA.
However, I noticed that there were some other changes when running with mixOmics, for example, mypls$explained_variance was changed to mypls$prop_expl_var, and that created some issues my old code.
I also cannot run the perf function normally.
For the example:
data(nutrimouse)
X ← nutrimouse$gene
Y ← nutrimouse$lipid
MyResult.pls ← pls(X,Y, ncomp = 4)
perf.pls ← perf(MyResult.pls, validation = “loo”)

perf.pls$Q2.total now gives me NULL

Is there anyway that I can go back with the “original” function of mixOmics.
Many thanks,
Tuan

Hi @Tuan173,

We did indeed add some substantial changes to the perf function. The measures can now be accessed via $measure where you can also see new measures as described in the docs.

However, I added a new issue to reflect these changes in the docs.

If you wish to install the snapshot of the package at the time this feature was implemented, you can simply do:

BiocManager::install('mixOmicsTeam/mixOmics@1923e1a3987c7bd4de1c9610dd157e61639c472e')

Then restart R and ensure that the right version is loaded:

> packageVersion('mixOmics')
[1] ‘6.13.114’

Fingers crossed that dependencies won’t cause an issue as they might be too new for the package.

Dear @aljabadi ,

I managed to obtain a biplot for my PLSDA, but I am working with a huge number of variables, so the graph shows a cloud of arrows.
I would like to be able to select which variables are shown in the biplot, because I’m interested in some groups more than in others. Is this possible with the current version of the code?

Many thanks,
Joana

Hi @Joana,

Have you tried sPLSDA which performs variables selection?

Hi @aljabadi, Many thanks for the reply!

I have tried sPLSDA in previous projects, but I am not looking for a way to select variables, maybe I didn’t explain myself very well.

I have a matrix of thousands of microbes, and I did PLSDA, and everything works fine, and I can create your biplots, but I can’t see anything interesting because I have too many variables, therefore, too many arrows in the plot.

What I want to do is to keep exactly the same PLSDA that includes thousands of microbes in the X matrix, but I want the plot to only show for example the methanogens. I know their names, and I wanted to be able to just feed the code with a vector of names, and have these appear in the biplot.

Many thanks!
Joana

Hi @aljabadi,

Many thanks for helping me.

I will let you know if I still have the issue.

Best regards,
Tuan

Hi @Joana,

You can accomplish that quite simply by setting the loadings of other variables to zero post-hoc. See eaxmple below:

data("nutrimouse")
pls.nutrimouse <- pls(X = nutrimouse$gene, Y = nutrimouse$lipid, ncomp = 2)
biplot(pls.nutrimouse, group = nutrimouse$genotype, block = 'X',
       legend.title = 'Genotype', cutoff = 0.8) 
#> Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider
#> increasing max.overlaps

## say we are only interested in c("PMDCI", "AOX", "SIAT4c", "VDR")
## we artifically set the loadings for the rest to zero
pls.nutrimouse.alter <- pls.nutrimouse
feat.show <- c("PMDCI", "AOX", "SIAT4c", "VDR")
feat.hide <- !colnames(nutrimouse$gene) %in% feat.show
pls.nutrimouse.alter$loadings$X[feat.hide,] <- 0
pls.nutrimouse.alter$loadings$X[feat.show,] 
#>              comp1        comp2
#> PMDCI   0.16551226 -0.159677385
#> AOX     0.08467599 -0.160603787
#> SIAT4c -0.13005703  0.006401503
#> VDR    -0.13533221 -0.021680345
pls.nutrimouse.alter$loadings$X[feat.hide,] 
#>           comp1 comp2
#> X36b4         0     0
#> ACAT1         0     0
#> ACAT2         0     0
#> ACBP          0     0
#> ACC1          0     0
#> ACC2          0     0
#> ACOTH         0     0
#> ADISP         0     0
#> ADSS1         0     0
#> ALDH3         0     0
#> AM2R          0     0
#> BACT          0     0
#> BIEN          0     0
#> BSEP          0     0
#> Bcl.3         0     0
#> C16SR         0     0
#> CACP          0     0
#> CAR1          0     0
#> CBS           0     0
#> CIDEA         0     0
#> COX1          0     0
#> COX2          0     0
#> CPT2          0     0
#> CYP24         0     0
#> CYP26         0     0
#> CYP27a1       0     0
#> CYP27b1       0     0
#> CYP2b10       0     0
#> CYP2b13       0     0
#> CYP2c29       0     0
#> CYP3A11       0     0
#> CYP4A10       0     0
#> CYP4A14       0     0
#> CYP7a         0     0
#> CYP8b1        0     0
#> FAS           0     0
#> FAT           0     0
#> FDFT          0     0
#> FXR           0     0
#> G6PDH         0     0
#> G6Pase        0     0
#> GK            0     0
#> GS            0     0
#> GSTa          0     0
#> GSTmu         0     0
#> GSTpi2        0     0
#> HMGCoAred     0     0
#> HPNCL         0     0
#> IL.2          0     0
#> L.FABP        0     0
#> LCE           0     0
#> LDLr          0     0
#> LPK           0     0
#> LPL           0     0
#> LXRa          0     0
#> LXRb          0     0
#> Lpin          0     0
#> Lpin1         0     0
#> Lpin2         0     0
#> Lpin3         0     0
#> M.CPT1        0     0
#> MCAD          0     0
#> MDR1          0     0
#> MDR2          0     0
#> MRP6          0     0
#> MS            0     0
#> MTHFR         0     0
#> NGFiB         0     0
#> NURR1         0     0
#> Ntcp          0     0
#> OCTN2         0     0
#> PAL           0     0
#> PDK4          0     0
#> PECI          0     0
#> PLTP          0     0
#> PON           0     0
#> PPARa         0     0
#> PPARd         0     0
#> PPARg         0     0
#> PXR           0     0
#> Pex11a        0     0
#> RARa          0     0
#> RARb2         0     0
#> RXRa          0     0
#> RXRb2         0     0
#> RXRg1         0     0
#> S14           0     0
#> SHP1          0     0
#> SPI1.1        0     0
#> SR.BI         0     0
#> THB           0     0
#> THIOL         0     0
#> TRa           0     0
#> TRb           0     0
#> Tpalpha       0     0
#> Tpbeta        0     0
#> UCP2          0     0
#> UCP3          0     0
#> VLDLr         0     0
#> Waf1          0     0
#> ap2           0     0
#> apoA.I        0     0
#> apoB          0     0
#> apoC3         0     0
#> apoE          0     0
#> c.fos         0     0
#> cHMGCoAS      0     0
#> cMOAT         0     0
#> eif2g         0     0
#> hABC1         0     0
#> i.BABP        0     0
#> i.BAT         0     0
#> i.FABP        0     0
#> i.NOS         0     0
#> mABC1         0     0
#> mHMGCoAS      0     0

biplot(pls.nutrimouse.alter, group = nutrimouse$genotype, block = 'X',
       legend.title = 'Genotype', cutoff = 0.8) 

Created on 2021-06-24 by the reprex package (v2.0.0)