Problem with perf() function on PLS-DA

Hello! I am attempting to carry out performance assessment on a PLS-DA dataset:

En.GenNoC.plsda.perf <- perf(En.GenNoC.plsda, validation = 'Mfold', folds = 2, progressBar = TRUE, nrepeat = 10)

But keep returning the following error:

Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent

Here is the code I had used to transform my filtered data to plsda format:

En.GenNoC.plsda <- plsda(En.GenNoC.keep, SamplesNoC, logratio = 'CLR', ncomp = 8)

where X = En.GenNoC.keep, Y = SamplesNoC.

Below are the dimensions of X (genus reads per sample), which is a list:

> dim(En.GenNoC.keep)
[1] 8 433

> is.list(En.GenNoC.keep)
[1] TRUE

And Y is a factor of sample types with length 8:

> is.factor(SamplesNoC)
[1] TRUE

> length(SamplesNoC)
[1] 8

I am confused by the error message as length of dimnames would be 2 since there a set of dimnames each for the rows and columns:

> length(dimnames(En.GenNoC.keep))
[1] 2

> length(dimnames(En.GenNoC.keep)[2])
[1] 1

There would be no dimnames(Y) as Y is a factor:

> length(dimnames(SamplesNoC))
[1] 0

I had compared my code and data format to the Koren PLS-DA example from the mixOmics workshop and could not find anything different. By the simplicity of the error message, I am probably overlooking something simple, but I’m at a loss as to what needs to be done so I’d appreciate any advice possible.

Thanks very much!

Hi @dscheah,

Thanks for using mixOmics.

Please note that PLSDA is not a data format, but a supervised discriminant model which takes a numeric matrix as X (while your example provides a list). The output of this model is a mixo_plsda (formerly known as plsda) object. The perf function then takes this object and assesses the performance of the PSLDA model.

You can always refer to ?mixOmics::perf.mixo_plsda and go over the Arguments section thoroughly.

Hope it helps

Al

Hi @aljabadi,

Thanks for your reply. I converted the X object into a matrix but got the same error when assessing the PLSDA model using the perf() function:

En.GenNoC.keepMatrix <- as.matrix(En.GenNoC.keep)
is.matrix(En.GenNoC.keepMatrix)
[1] TRUE

En.GenNoC.plsda <- plsda(En.GenNoC.keepMatrix, SamplesNoC, logratio = 'CLR', ncomp = 8)
En.GenNoC.plsda.perf <- perf(En.GenNoC.plsda, validation = 'Mfold', folds = 2, progressBar = TRUE, nrepeat = 10)
Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent

The reason I had kept the X object as a list because the Koren example I was following for PLSDA also had the X object as a list:

> is.matrix(Koren.16S$data.raw)
[1] FALSE
> is.list(Koren.16S$data.raw)
[1] TRUE

Koren.plsda <- plsda(Koren.16S$data.raw, Koren.16S$bodysite, logratio = 'CLR', ncomp = 10)
Koren.perf.plsda <- perf(Koren.plsda, validation = 'Mfold', folds = 5, progressBar = TRUE, nrepeat = 10)

I’m not sure if I’ve misunderstood the original example or made a mistake in my code…

Thanks very much for your time!

Hi @dscheah,

Thanks for the followup.

I can certainly look into the issue if I can reproduce it on my side. You’re welcome to send us an email with the script and data.

Best wishes,

Al

Hi @aljabadi,

Yes, I can certainly send you my scripts and related data, as well as the scripts I had reproduced when working through the Koren.16S example dataset. I couldn’t find an email address on the website – where should I send my message to?

Hi @dscheah,

You can click on this blue text to send us an email.
Alternatively, you can right-click on the above text and choose ‘Copy Email Address’.

Best wishes,

Al

Hi @aljabadi, thanks for letting me know. I’ll compile the script and data and forward them to you asap.

Thanks!

Hi @aljabadi and @dscheah

Was this problem solved? I seem to have the exact same problem

Kind regards
Thor

Hi @Thor,

I could not find a record of an email following this issue and if you’re using the latest version ( >= 6.12.0) it should mean that it has not been resolved. In that case, can you please send us an email with data and scripts?

Best,

Al

Hi @Thor,

Thanks for sending through the code and data. In line 82, I could see that you used Sample.site as both Y and multilevel in splsda analysis. I think you meant to use Sample.id for multilevel.

Also, for cim function you need to input a splsda model output as opposed to the result of the perf function in your code. See ?cim for more info. Below is the corrected pseudo code:

diverse.splsda = splsda(data.filter, ...)
diverse.perf.splsda.perf = perf(diverse.splsda, ...)

## your code
diverse.perf.splsda = cim(perf(diverse.splsda, validation = "loo",
                               progressBar = FALSE, dist = 'max.dist'))#not working
## right way to do it
diverse.splsda.cim = cim(diverse.splsda) ## ensure your plot pane is clear and large enough

Hope that helps.

Best,

Al

Hi Al,

Thanks for your swift and excellent reply - works perfect now :slightly_smiling_face:

Hi Al,

Another question which I hoped you could answer. In contrast to the tutorial some of the plotloadings in my contribution plot is above 0.0, i.e 0.2 - as opposed to the the tutorial where all are negative or below 0.0. Hence the interpretation is difficult for me. I guess these are still considered signature OTUs/ASVs for the specific sample site, but what is the interpretation of positive versus negative?

Updated script attached, same data as before.

Kind regards
Thor

(Attachment Rscriptformixomics.R is missing)

Hi @Thor,

Your script could not be attached to Discourse threads and that’s why it’s missing.

As for the plotLoadings’ contributions, the barplots show the values of variable loadings (weights) on components. The importance of a variable primarily depends on the magnitude (absolute value) so you don’t have to worry about signs. The variable with similar signs potentially behave similarly (upregulate/downregulate) with respect to the outcome variable (Y) which you can further look into.

Hope it helps,

Al

Thanks Al! However may I ask for clarification considering the sample site tutorial where nothing is up/downregulated.

In the case of the HMP tutorial with different sample sites across the body, i.e. samplesite as Y(outcome). If you had different signs as opposed to the plotloadings in the tutorial which are all negative, i.e. positive and negative values in the same plot - Some ASVs/OTUs would be considered positively or negatively associated with the sample site? Or how do you translate it?

Hi @thor,
It refers more to a high abundance in a specific sample group (if you set contrib = 'max'). You can read the publication: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0160169 for more details.

Kim-Anh