More specifically, what exactly do the values following c( refer to?
For example, if I have a group of 20 individuals in which 1000 different proteins are analyzed, does the first number following protein = c( represent the number of individuals selected (in this case 10) and the second number the number of proteins included in the analysis?
In addition to my previous question:
I have a lot of features (20,000 genes and 3000 metabolites), while the groups are relatively small (about N=4). What I understand from reading the forum is that it is best to use a more exploratory approach based on the separation achieved by the plotIndiv and plotDiablo functions. Using all features versus using a small number of features (100-500) gives more or less the same separation between groups, and the separation is perfectly in line with what we expect. As a result, I understand it is better to use a lower number of features.
However, I want to extract the features that correlate most strongly with a specific variable (e.g. the features correlating most strongly with glutamine). To do this, I saved the circosPlot output into an object to extract the similarity matrix. However, I noticed that using 5000 genes versus 500 genes results in many more features that correlate highly with glutamine. I understand that this makes sense, as the variables that best represent the variation in the data do not necessarily overlap with those correlating with a specific feature. Therefore, I was wondering if there is a better way to extract the correlation between a specific feature and all other variables (both genes and metabolites).
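Have a look at the tutorials on our website and the vignette, as we explain there what those parameters mean.
In this case, c(10, 5) specifies selecting 10 proteins on component 1 and 5 proteins on component 2. So the numbers are counts of variables to keep per component, not numbers of individuals.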
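To make the meaning of keepX concrete, here is a minimal sketch. It assumes the breast.TCGA example data shipped with mixOmics and the block names mRNA and protein; these are stand-ins for illustration, not your own data, and the default design is used.

```r
library(mixOmics)

# Example data shipped with mixOmics (an assumption; replace with your own blocks)
data(breast.TCGA)
X <- list(mRNA    = breast.TCGA$data.train$mrna,
          protein = breast.TCGA$data.train$protein)
Y <- breast.TCGA$data.train$subtype

# One vector per block, one entry per component:
# keep 10 variables on component 1 and 5 on component 2
keepX <- list(mRNA = c(10, 5), protein = c(10, 5))

model <- block.splsda(X, Y, ncomp = 2, keepX = keepX)

# Count the proteins with a non-zero loading on each component
n_selected <- colSums(model$loadings$protein != 0)
print(n_selected)

# Names of the proteins selected on component 1
selected_comp1 <- selectVar(model, block = "protein", comp = 1)$protein$name
print(selected_comp1)
```

All individuals are used to fit the model; keepX only controls how many variables per block get a non-zero loading on each component.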
There is a correlation cutoff you can set in circosPlot if you would like to consider only the variables most strongly correlated with glutamine, but you will have to choose the threshold yourself. I’d recommend you only consider the top genes or metabolites, as there is still a risk that you end up with spurious correlations.
The tuning function (your previous question) should also help in selecting what might be an optimal number of genes or metabolites.
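If the goal is simply to rank every feature against one variable, a plain Pearson correlation on the data itself is one option. Note that this is not the same as the similarity matrix from circosPlot, which is derived from the model components. The sketch below uses simulated data in base R; the feature names, the sample size and the 0.7 cutoff are all assumptions for illustration.

```r
set.seed(1)
n <- 12  # hypothetical number of samples

# Simulated blocks standing in for the real genes and metabolites
genes <- matrix(rnorm(n * 50), nrow = n,
                dimnames = list(NULL, paste0("gene", 1:50)))
metabolites <- matrix(rnorm(n * 10), nrow = n,
                      dimnames = list(NULL, paste0("met", 1:10)))
colnames(metabolites)[1] <- "glutamine"

# Plant one gene that truly tracks glutamine, so there is something to find
genes[, "gene1"] <- metabolites[, "glutamine"] + rnorm(n, sd = 0.1)

# Correlate glutamine with every other feature in both blocks
all_features <- cbind(genes, metabolites)
target <- all_features[, "glutamine"]
others <- all_features[, colnames(all_features) != "glutamine", drop = FALSE]
r <- cor(others, target)[, 1]

# Rank by absolute correlation and apply a user-chosen cutoff
ranked <- r[order(abs(r), decreasing = TRUE)]
cutoff <- 0.7
top <- ranked[abs(ranked) >= cutoff]
print(head(ranked, 5))
```

With very few samples per group, many features will pass any cutoff by chance alone, so the ranked list is best read as a set of candidates rather than as confirmed associations.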