How to design rows in the input of single-cell multi-omics data

Hello, author. I have single-cell multi-omics data (RNA-seq, EM-seq, ATAC-seq) from one sample. I would like to analyze the interactions among the three omics layers at the single-cell level or at the cluster level. I have a few questions:

  1. Is the N-Integration method capable of analyzing the interactions within a single state (only the normal group or only the disease group)?

  2. The data contain 6000 cells divided into 12 cell types. I do not know how many rows the input should have: 6000 or 12. With 12 rows (one per cell type) there are no replicates, and perf() reports an error. With 6000 rows (one per cell) the matrices are too sparse, and I cannot tell whether the coefficients I calculate with cor() are high or low.

  3. Following on from the second point, how should I set the folds of perf()? Thank you.

Hi @QuietgraceH,

Thanks for using mixOmics!

  1. Yes. If you want to integrate your modalities and identify features that distinguish your states (normal vs disease), you can use DIABLO. If you want to integrate your datasets only within samples collected from the normal group or from the disease group, you can use multiblock (s)PLS.
  2. Sparsity is a common issue with single-cell data. Whilst the mixOmics realm doesn’t extend to dealing with sparsity, there are lots of other tools out there that you can use to address it. One example would be grouping cells into metacells. A recent user on the forum asked a similar question, so have a look at the suggestions there.
  3. The number of folds to choose depends on your number of samples, which in turn depends on point 2. In general we recommend choosing the number of folds so that each fold includes at least 5-6 samples. See more details here.
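
To make points 2 and 3 concrete, here is a minimal sketch in plain Python (not mixOmics code, and not a specific metacell tool). It assumes a hypothetical helper `make_metacells` that sums counts over consecutive groups of cells within each cell type, and a hypothetical helper `max_folds` that gives the largest number of folds leaving at least a chosen minimum of samples per fold. Dedicated metacell tools group cells by similarity rather than by order, so treat the grouping rule here as a placeholder.

```python
from collections import defaultdict


def make_metacells(counts, cell_types, cells_per_metacell):
    """Sum per-cell count vectors into metacells within each cell type.

    counts: list of per-cell feature vectors (one row per cell).
    cell_types: one label per cell, same order as counts.
    Returns (metacell_rows, metacell_labels).
    Simplification: cells are grouped in the order they appear, and the
    last metacell of a type may contain fewer cells than requested.
    """
    by_type = defaultdict(list)
    for row, label in zip(counts, cell_types):
        by_type[label].append(row)

    meta_rows, meta_labels = [], []
    for label, rows in by_type.items():
        for start in range(0, len(rows), cells_per_metacell):
            chunk = rows[start:start + cells_per_metacell]
            # Column-wise sum over the cells in this chunk.
            meta_rows.append([sum(col) for col in zip(*chunk)])
            meta_labels.append(label)
    return meta_rows, meta_labels


def max_folds(n_samples, min_per_fold=5):
    """Largest fold count such that every fold holds >= min_per_fold samples."""
    return n_samples // min_per_fold


# Toy data: 6 cells, 2 features, 2 cell types (values are made up).
counts = [[1, 0], [0, 2], [3, 0], [0, 0], [1, 1], [2, 0]]
cell_types = ["T", "T", "T", "B", "B", "B"]

rows, labels = make_metacells(counts, cell_types, cells_per_metacell=2)
print(rows)    # [[1, 2], [3, 0], [1, 1], [2, 0]]
print(labels)  # ['T', 'T', 'B', 'B']

# Upper bound on folds for, e.g., 120 metacells with >= 5 per fold.
print(max_folds(120, min_per_fold=5))  # 24
```

For N-integration the rows have to match across blocks, so the same cell-to-metacell grouping would need to be applied to each modality. Summing suits count data; for methylation levels an average may be more appropriate. `max_folds` is only an upper bound, and a result below 2 means there are too few samples for cross-validation at that fold size.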

Hope that helps!
Eva