Unable to understand selectVar() output in sPLS-DA

srikantverma · May 22, 2020, 1:37pm

Hello,

I am trying to find class-specific features in a gene expression data set using an sPLS-DA model. After tuning for the optimal number of components and variables, I built the final sPLS-DA model using the code below:

splsda.res.final = mixOmics::splsda(X = training_data, Y = response.variable.training, ncomp = 6, keepX = c(743, 124, 372, 268, 619, 248), mode = "regression", scale = FALSE, tol = 1e-06, max.iter = 100, near.zero.var = FALSE, all.outputs = TRUE);

>splsda.res.final
sPLS-DA (regression mode) with 6 sPLS-DA components. 
You entered data X of dimensions: 380 12368 

 You entered data Y with 5 classes. 

Selection of [743] [124] [372] [248] [619] [248] variables on each of the sPLS-DA components on the X data set. 

No Y variables can be selected.

I expected 743 variables on component 1, but when I used selectVar(), I got 995 variables.

length(mixOmics::selectVar(object = splsda.res.final, comp = 1)$name)
[1] 995

Can someone help me to understand this observation?

aljabadi · May 25, 2020, 12:57am

Hi @srikantverma,

Thanks for using mixOmics and reporting this issue.

I have observed this issue before but, unfortunately, I haven't been able to reproduce it. It could potentially be caused by too many missing values in many features, in which case you can filter out some of them before the analysis.

Running the following could give you some insight into the proportion of missing values in features.

# Percentage of missing values in each feature (column)
col_na <- 100 * colMeans(is.na(training_data))
hist(col_na, main = 'Histogram of NA percentages in features')
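
If many features do turn out to be mostly missing, the filtering step mentioned above could look like the sketch below. It uses a small made-up matrix (`toy_data`) rather than the real `training_data`, and the 20% cut-off (`max_na_pct`) is an arbitrary assumption, not a mixOmics recommendation.

```r
# Toy matrix standing in for training_data (samples in rows, features in columns).
set.seed(1)
toy_data <- matrix(rnorm(50), nrow = 10, ncol = 5,
                   dimnames = list(NULL, paste0("gene", 1:5)))
toy_data[1:6, "gene2"] <- NA   # 6 of 10 values missing
toy_data[1, "gene4"] <- NA     # 1 of 10 values missing

# Percentage of missing values per feature (column).
col_na_pct <- 100 * colMeans(is.na(toy_data))

# Hypothetical cut-off: drop features with more than 20% missing values.
max_na_pct <- 20
filtered_data <- toy_data[, col_na_pct <= max_na_pct, drop = FALSE]

print(col_na_pct)
print(colnames(filtered_data))
```

With the real data, `toy_data` would be replaced by `training_data` and the cut-off chosen after looking at the histogram.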

In any case, this is a problem we need to fix and/or inform the users. Would you be able to send your data so we can reproduce and fix this please?

You can click on this text to send us an email.
Alternatively, you can right-click on the above text and choose ‘Copy Email Address’

Best,

Al

srikantverma · May 25, 2020, 4:15pm

Thanks a lot @aljabadi for your reply.
Regarding missing values, there are none in the data that I have used.
I have shared the data along with an R script for reproducing the observation. I am hopeful that the team will be able to resolve the issue soon. However, meanwhile, could you please suggest if I should work with 995 features, or sort them on their loading weights to get top 743 features for downstream analysis?
Regards
Srikant

aljabadi · June 9, 2020, 7:54am

Hi @srikantverma,

Thanks for the email with the data and fully reproducible and well-described code!

The reason more variables appear to be selected is that some of the feature loadings are so small (< 1e-14) that some R functions cannot distinguish them from 0. The underlying cause is that the tune function is not quite optimal at recommending the number of features to keep, which is something we are working on at the moment. My recommendation is to use the plot function on the tune object to see what the optimal number of features is, especially on the first and second components, where the gain in accuracy can be minimal compared to the complexity added by selecting many more features. The optimal number could be lower than what the algorithm recommends. In general, the recommended parameters are advisory and limited by other hyper-parameters (folds, repeats, etc.), so it is important to use the diagnostic summaries and visualisations provided throughout your analysis.
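
To illustrate how a few negligible loadings can inflate the count, here is a minimal base-R sketch on a simulated loading vector. It does not call mixOmics: `loadings_comp1`, the target of 5 features (`n_keep`) and the assumption that the selection is reported as every loading that is not exactly zero are all illustrative, and the 1e-14 tolerance is simply the figure quoted above. The sketch compares the two ways of counting and then ranks features by absolute loading.

```r
# Simulated loading vector for one component (NOT produced by mixOmics).
loadings_comp1 <- c(gene1 = 0.62, gene2 = -0.48, gene3 = 3e-16, gene4 = 0,
                    gene5 = 0.35, gene6 = -7e-15, gene7 = 0.29, gene8 = -0.41)

# Counting every loading that is not exactly zero includes the negligible ones.
n_nonzero <- sum(loadings_comp1 != 0)

# Counting only loadings whose magnitude exceeds a tolerance ignores them.
tol <- 1e-14   # assumption: the threshold quoted in the post
n_above_tol <- sum(abs(loadings_comp1) > tol)

# Rank features by absolute loading and keep the top n_keep.
n_keep <- 5    # hypothetical target, analogous to the intended keepX value
ranked <- sort(abs(loadings_comp1), decreasing = TRUE)
top_features <- names(ranked)[seq_len(n_keep)]

print(c(nonzero = n_nonzero, above_tol = n_above_tol))
print(top_features)
```

On a fitted model, the analogous vector would presumably be the first column of `splsda.res.final$loadings$X`; whether to threshold or to rank is a judgment call that the tuning plots can help with.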

Hope it helps

Al

srikantverma · June 9, 2020, 12:01pm

Thanks a lot @aljabadi !
Your recommendation will certainly help the entire mixOmics user community.
