Comparing PCA/mixOmics tools with other methods

mixOmics_user · August 5, 2020, 11:41pm

[user via email]

We have developed an omic integration tool to find clusters in large data sets, as a way to complement current methods that work very well with smaller sample sizes. We wish to describe the similar tools available, to highlight the context in which each existing statistical tool works best. We also want to determine whether our tool performs adequately by comparing it with mixOmics.

However, to be fair and to acknowledge the capabilities of current methods, we want to ensure we are using the appropriate version and the combination of arguments (parameters) that gives your package the highest possible performance.

Below are the version and arguments we have used so far.

The idea here is to do sparse PCA on a matrix x representing concatenated omic blocks, assuming 479 “signal” features.

out_spca <- spca(X = x, ncomp = 2, keepX = c(479,479))

mixOmics_6.13.11

We would appreciate it if you could give us feedback on the above.

Thank you very much,

Agustin

kimanh.lecao · August 6, 2020, 10:26pm

Hi Agustin,
Here:

out_spca <- spca(X = x, ncomp = 2, keepX = c(479,479))

you are assuming that:

- the correlation / variance structure of the data can be summarised in 2 dimensions, i.e. there are 2 distinct sources of variation to extract, one on each dimension / component
- the 479 variables selected in component 1 should be non-overlapping (according to the PC definition) with the other 479 variables selected in component 2.
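
To check the second point on your own output, one option is to compare the variables with non-zero loadings on each component. A minimal sketch, assuming a small simulated matrix as a stand-in for x (the data, the feature names and keepX = 20 are made up for illustration, not taken from the analysis above):

```r
library(mixOmics)

set.seed(1)
# Hypothetical stand-in for x: 50 samples x 200 features of pure noise
x <- matrix(rnorm(50 * 200), nrow = 50,
            dimnames = list(paste0("s", 1:50), paste0("feat_", 1:200)))

# Same call shape as in the question, with a smaller (arbitrary) keepX
out_spca <- spca(X = x, ncomp = 2, keepX = c(20, 20))

# Names of the variables with non-zero loadings on each component
sel_comp1 <- selectVar(out_spca, comp = 1)$name
sel_comp2 <- selectVar(out_spca, comp = 2)$name

# Features selected on both components, if any
overlap <- intersect(sel_comp1, sel_comp2)
length(overlap)
```

If length(overlap) is large relative to keepX, the two components are drawing on much the same features, which is worth knowing before interpreting them as separate sources of variation.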

You did not mention how many variables you have in x. In many of our papers where we benchmark approaches, we do one of the following:

- define the variable selection size according to prior knowledge, sometimes focusing only on dimension / component 1
- define the variable selection size according to a tuning process (relevant for sPLS-DA or block.splsda(), where we have such functions available; see the SRBCT case study or the mixDIABLO examples on our website).
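
The tuning route can be sketched as below for sPLS-DA. This assumes a toy two-class data set: the class labels, the shifted features, the test.keepX grid and the cross-validation settings are all made up for illustration and are not recommendations.

```r
library(mixOmics)

set.seed(1)
# Hypothetical data: 40 samples, 100 features, 2 classes
X <- matrix(rnorm(40 * 100), nrow = 40,
            dimnames = list(paste0("s", 1:40), paste0("feat_", 1:100)))
Y <- factor(rep(c("A", "B"), each = 20))
# Shift the first 10 features in class B so there is some signal to find
X[Y == "B", 1:10] <- X[Y == "B", 1:10] + 2

# Cross-validate over a grid of candidate keepX values per component
tune_res <- tune.splsda(X, Y, ncomp = 2,
                        test.keepX = c(5, 10, 20, 50),
                        validation = "Mfold", folds = 5,
                        dist = "max.dist", nrepeat = 3,
                        progressBar = FALSE)

# Selected number of variables to keep on each component
tune_res$choice.keepX

# Refit the final model with the tuned keepX
final_splsda <- splsda(X, Y, ncomp = 2, keepX = tune_res$choice.keepX)
```

On real data you would normally use a finer test.keepX grid and more repeats; the small values here only keep the example quick to run.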

Kim-Anh

agustin20 · August 7, 2020, 2:55pm

Thank you very much for your prompt reply! Indeed, we are assuming the matrix x has a simple structure that can be summarized with two PCs. The original dimensions of x were 500 and 3000 (it is a simple simulation where only 479 features contribute to the variability across subjects). After considering your answer, we will adapt the benchmark to use sPLS-DA and include the tuning process.

Best wishes
