Question regarding list.keepX

NilsM · July 2, 2024, 8:59am

Dear mixOmics team,

I have a question regarding the list.keepX object defined in the vignette:

list.keepX <- list(mRNA = c(8, 25), miRNA = c(14, 5), protein = c(10, 5))

More specifically, what exactly do the values inside c( refer to?
For example, if I have a group of 20 individuals in which 1000 different proteins are analyzed, does the first number in protein = c( represent the number of individuals selected (in this case 10) and the second number the number of proteins included in the analysis?

Yours sincerely,
Nils

NilsM · July 2, 2024, 9:58am

In addition to my previous question:
I have a lot of features (20,000 genes and 3,000 metabolites), while the groups are relatively small (about N=4). What I understand from reading the forum is that it is best to use a more exploratory approach based on the separation shown by the plotIndiv and plotDiablo functions. Using all features versus a small number of features (100-500) gives more or less the same separation between groups, and that separation is perfectly in line with what we expect. As a result, I understand it is better to use a lower number of features.

However, I want to extract the features that correlate most strongly with a specific variable (e.g. the features most correlated with glutamine). To do this, I saved the circosPlot output into an object to extract the similarity matrix. I noticed, however, that using 5000 genes instead of 500 genes yields many more features that are highly correlated with glutamine. I understand that this makes sense, as the variables that best represent the variation in the data do not necessarily overlap with those that correlate with a specific feature. Therefore, I was wondering if there is a better way to extract the correlation between a specific feature and all other variables (both genes and metabolites).

Kind regards,
Nils

kimanh.lecao · July 4, 2024, 10:35pm

hi @NilsM

Have a look at the tutorials on our website and at the vignette, as we explain there what those parameters mean.
In this case, c(10, 5) specifies selecting 10 proteins on component 1 and 5 on component 2.
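
To make that structure concrete, here is a minimal base-R sketch using the list.keepX values from the question. It only tabulates how many variables are kept per block and per component. The name keep_table and the comp1/comp2 column labels are made up for illustration, and the commented-out block.splsda call assumes that X (a named list of data blocks) and Y (the outcome) have already been prepared as in the vignette.

```r
# Values from the question: one entry per data block,
# one number per component within each entry.
list.keepX <- list(mRNA = c(8, 25), miRNA = c(14, 5), protein = c(10, 5))

# Every block should give one value per component, so all lengths must agree.
ncomp <- unique(lengths(list.keepX))
stopifnot(length(ncomp) == 1)

# Tabulate: rows are blocks, columns are components (labels are illustrative).
keep_table <- do.call(rbind, list.keepX)
colnames(keep_table) <- paste0("comp", seq_len(ncol(keep_table)))
print(keep_table)

# Not run: how the list would typically be passed to the model,
# assuming X and Y exist as in the vignette.
# library(mixOmics)
# fit <- block.splsda(X, Y, ncomp = ncomp, keepX = list.keepX)
```

The numbers count variables (features), not individuals: all samples are always used, and keepX only controls how many variables of each block are selected on each component.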

Kim-Anh

kimanh.lecao · July 4, 2024, 10:37pm

hi @NilsM

There is a correlation cutoff value you can set in circosPlot if you would like to consider only the features most correlated with glutamine, but you will have to choose the threshold yourself. I’d recommend you only consider the top genes or metabolites, as there is still a risk that you end up with spurious correlations.
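
As a rough illustration of thresholding by hand, the sketch below ranks all features by their correlation with one target feature and keeps those above a chosen cutoff. It is an assumption-laden example, not the computation behind circosPlot: it uses plain pairwise Pearson correlations on simulated data, and every name in it (genes, metabolites, all_features, the cutoff of 0.7) is hypothetical. With groups as small as N=4, many features will pass such a cutoff by chance alone.

```r
set.seed(1)

# Simulated data standing in for the real blocks (hypothetical sizes and names).
n <- 12
genes <- matrix(rnorm(n * 50), nrow = n,
                dimnames = list(NULL, paste0("gene", 1:50)))
metabolites <- matrix(rnorm(n * 5), nrow = n,
                      dimnames = list(NULL, c("glutamine", paste0("met", 1:4))))

# Make one gene track glutamine so there is a true signal to recover.
genes[, "gene1"] <- metabolites[, "glutamine"] + rnorm(n, sd = 0.1)

# Combine both blocks and correlate the target with every other feature.
all_features <- cbind(genes, metabolites)
target <- "glutamine"
others <- all_features[, colnames(all_features) != target]
cors <- drop(cor(all_features[, target], others))  # named numeric vector

# Keep only features whose absolute correlation reaches the chosen cutoff.
cutoff <- 0.7
top <- sort(cors[abs(cors) >= cutoff], decreasing = TRUE)
print(top)

# Not run: the equivalent threshold in mixOmics, assuming a fitted
# DIABLO model stored in a hypothetical object called diablo.model.
# circosPlot(diablo.model, cutoff = 0.7)
```

Restricting the candidate set to the variables selected by the model before applying a cutoff, as suggested above, reduces the number of comparisons and therefore the number of chance hits.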

The tuning function (see your previous question) should also help in selecting what might be an optimal number of genes or metabolites.

Kim-Anh
