Small number of samples (n=4)


I am contacting you about some problems I have had when applying mixOmics to a set of four different sequencing experiments that I generated in four samples (two replicates from the same cell type). I summarise the structure of my data below:

lapply(data, dim)
$meRIP_seq
[1]     4 20544

$cheRNA_seq
[1]     4 20544

$RNAPII_GB_seq
[1]     4 20544

$RNAPII_PR_seq
[1]     4 20544

Y
[1] WU WU TU TU
Levels: TU WU

Do you know whether I can apply the N-integration approach given the small number of samples I have? I have tried several approaches, but none gave clear results. For instance, when running pairwise sPLS between two datasets I get almost perfect correlations (0.999), and in the correlation circle plots from these analyses the genes lie only along the axes. Also, when performing block sPLS, the arrow plots show no distance between datasets (no arrows are visible).

I would be very grateful for any advice on how to handle my data and interpret the results.

Many thanks in advance.

Best regards,

Hi Alicia,

With a sample size of n = 4 (per omic), you are reaching the limit of the method. It also looks like your datasets are highly (almost perfectly) correlated; did you normalise the data? Usually you would expect some level of noise.
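Part of what you are seeing may simply be the sample size. Below is a minimal sketch in base R with simulated data (the dimensions, the seed and the helper name `first_pls_cor` are assumptions for illustration, not your data or a mixOmics function). It computes the first component of plain, non-sparse PLS, whose weight vectors are the leading singular vectors of X'Y, for two independent blocks of random noise measured on n = 4 samples. mixOmics' spls() adds sparsity on top of this, so treat it as an illustration rather than the package's algorithm.

```r
# Two independent blocks of pure noise on n = 4 samples (simulated, assumption)
set.seed(42)
n <- 4; p <- 500; q <- 500
X <- scale(matrix(rnorm(n * p), nrow = n), center = TRUE, scale = FALSE)
Y <- scale(matrix(rnorm(n * q), nrow = n), center = TRUE, scale = FALSE)

# Hypothetical helper: correlation between the first pair of PLS latent
# variables, using the leading singular vectors of X'Y as weight vectors
first_pls_cor <- function(X, Y) {
  s <- svd(crossprod(X, Y), nu = 1, nv = 1)
  score_x <- X %*% s$u  # latent variable for block X
  score_y <- Y %*% s$v  # latent variable for block Y
  cor(drop(score_x), drop(score_y))
}

r <- first_pls_cor(X, Y)
print(r)  # typically close to 1, although X and Y are unrelated
```

Because the centred blocks span only n - 1 = 3 dimensions in sample space, the latent variables of the two blocks can almost always be lined up, so a first-component correlation near 1 says little about a real relationship at this sample size.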

I would suggest you:

  • normalise, then filter the data (keep only the 5,000 most variable features)
  • inspect the sources of variation in the data (PCA)
  • try an unsupervised approach for two datasets (e.g. sPLS).
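The first two steps could be sketched in base R as follows. This is only a sketch under assumptions: the count matrix is simulated, log2(counts + 1) stands in for whatever normalisation suits each assay, and `filter_top_variance` is a hypothetical helper. mixOmics' own pca() function can be used in place of prcomp().

```r
# Simulated stand-in for one omics block: 4 samples x 20544 genes (assumption)
set.seed(1)
X_counts <- matrix(rpois(4 * 20544, lambda = 50), nrow = 4,
                   dimnames = list(paste0("s", 1:4), paste0("gene", 1:20544)))

# Placeholder normalisation (assumption): log2-transform the counts
X_log <- log2(X_counts + 1)

# Hypothetical helper: keep the `top` features with the largest variance
filter_top_variance <- function(X, top = 5000) {
  v <- apply(X, 2, var)
  keep <- order(v, decreasing = TRUE)[seq_len(min(top, ncol(X)))]
  X[, keep, drop = FALSE]
}

X_filt <- filter_top_variance(X_log, top = 5000)

# PCA on centred, scaled features to inspect the main sources of variation
pc <- prcomp(X_filt, center = TRUE, scale. = TRUE)
summary(pc)$importance
```

With four samples, centring leaves at most three components with non-zero variance, so the fourth PC is numerically zero.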

But generally, I think you will have to rely on fold changes and broad exploratory analysis for your data.