Input data structure

Hi all,

Sorry, I am quite new to R and I am having trouble setting up the correct structure for my input data. I am trying to generate a list of data frames, but I run into problems in the downstream analyses, and I think my data may not be organized correctly.

Could you please provide an example of how to build this data from scratch? I have seen some of your examples (e.g. the nutrimouse data) but, since that data set is already assembled, I do not understand how to arrive at its final structure.

Thank you very much in advance.

Hi @agallego,

I am not sure which method you are planning to use, but say you want to start first with PCA or PLS:

library(mixOmics)
data(nutrimouse)
?nutrimouse

# There are 40 mouse samples (liver cells) in which lipid and gene expression levels were measured. There are 2 data sets, gene expression (40 x 120) and lipids (40 x 21), as well as metadata giving other information about the mice, such as diet and genotype.

head(nutrimouse$gene) # this is the gene expression data frame, you can store this as the object 'X'
head(nutrimouse$lipid) # this is the lipid data frame, you can store this as the object 'Y'
nutrimouse$genotype # this is what your metadata should look like, you can store this as the object 'meta.data'

# store the data to get started with the analysis:
X = nutrimouse$gene
Y = nutrimouse$lipid
meta.data = nutrimouse$genotype
dim(X) # check the dimensions
dim(Y)

res.pca = pca(X)

# in your case, you would read the data first, e.g.
# (if the first column of a file holds the sample names, add row.names = 1 so it is not treated as data)
X = read.csv('mygenes.csv', header = TRUE)
Y = read.csv('mylipids.csv', header = TRUE)
meta.data = read.csv('mygenotype.csv', header = TRUE)
dim(X) # check the dimensions
dim(Y)

res.pca = pca(X)
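
Since you asked how to get to a list like nutrimouse from the beginning, here is a minimal sketch in base R. Everything in it is made up for illustration (the sample names, the gene and lipid names, the genotype labels and the random values are all hypothetical), so treat it as one possible layout rather than the required one. The idea is simply: samples in rows, variables in columns, the same samples in the same order in every block, and one metadata label per sample.

```r
# Hypothetical toy data: 6 samples, 4 genes, 3 lipids (random numbers, illustration only)
set.seed(1)
samples = paste0("mouse", 1:6)

# Samples in rows, variables in columns; row names carry the sample IDs
genes = as.data.frame(matrix(rnorm(6 * 4), nrow = 6,
                             dimnames = list(samples, paste0("gene", 1:4))))
lipids = as.data.frame(matrix(rnorm(6 * 3), nrow = 6,
                              dimnames = list(samples, paste0("lipid", 1:3))))

# One (made-up) genotype label per sample, stored as a factor
genotype = factor(rep(c("wt", "ppar"), each = 3))

# Bundle everything into one named list, similar in spirit to nutrimouse
mydata = list(gene = genes, lipid = lipids, genotype = genotype)

# Sanity checks before any analysis
stopifnot(
  identical(rownames(mydata$gene), rownames(mydata$lipid)), # same samples, same order
  nrow(mydata$gene) == length(mydata$genotype),             # one label per sample
  all(sapply(mydata$gene, is.numeric)),                     # numeric columns only
  all(sapply(mydata$lipid, is.numeric))
)

X = mydata$gene
Y = mydata$lipid
dim(X) # 6 samples x 4 genes
dim(Y) # 6 samples x 3 lipids
```

With your own files, the objects returned by read.csv() would take the place of genes, lipids and genotype here, and the same checks should pass before you call pca(X).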

If someone in your lab or institute could guide you through these first steps, that would be easier for you, as it is difficult to explain here.

Kim-Anh