I am trying to adapt the DIABLO tutorial to my own analysis, and I get the error message below when fitting a basic DIABLO model:
basic.diablo.model = block.splsda(X = data, Y = Y, ncomp = 5, design = design)
Design matrix has changed to include Y; each block will be
linked to Y.
Error in Check.entry.wrapper.mint.block(X = X, Y = Y, indY = indY, ncomp = ncomp, :
Please check the rownames of the data, there seems to be some
discrepancies
I filtered the two tables earlier on so that their sample names match:
Bacteria=Bacteria[rownames(Bacteria) %in% rownames(Metabolites_table), ]
Metabolites_table=Metabolites_table[rownames(Metabolites_table) %in% rownames(Bacteria), ]
As you’ve already figured out, the error you’re getting does seem to be due to a mismatch between the rownames of the data frames you are using to build the DIABLO model. Each data frame in the list you pass as the X argument should hold a different data modality, measured on the same samples. It looks like you have two data modalities (also called ‘blocks’): metabolites and bacteria. Each of these two data frames should have its variables (metabolites or bacteria) in the columns and the samples in the rows, and the samples need to be identical across the two. Here is some code to double check that they are identical:
# First check that the number of rows in your two data blocks is the same
nrow(Metabolites_table)
nrow(Bacteria)
# If they are the same, let's now check that the sample names (i.e. rownames) are identical
setequal(rownames(Bacteria), rownames(Metabolites_table))
# If they are not identical, this will print out any mismatches
cat("Bacteria row names not in Metabolites_table:\n")
print(setdiff(rownames(Bacteria), rownames(Metabolites_table)))
cat("Metabolites_table row names not in Bacteria:\n")
print(setdiff(rownames(Metabolites_table), rownames(Bacteria)))
Let me know how you get on with these checks and if this solves your problem!
Cheers,
Eva
# If they are not identical, this will print out any mismatches
cat("Bacteria row names not in Metabolites_table:\n")
Bacteria row names not in Metabolites_table:
print(setdiff(rownames(Bacteria), rownames(Metabolites_table)))
character(0)
cat("Metabolites_table row names not in Bacteria:\n")
Metabolites_table row names not in Bacteria:
print(setdiff(rownames(Metabolites_table), rownames(Bacteria)))
character(0)
Thanks for sending through your data. I’ve found the problem: Bacteria and Metabolites_table have the same rownames, but they weren’t in the same order, and block.splsda needs the samples in every block to be in the same order (check out the help file for the function, which details the requirements for each argument). The setdiff checks came back empty because they compare the rownames as sets and ignore their order.
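To see the difference between the two kinds of check, here is a minimal sketch on toy data. The objects toy_bacteria and toy_metabolites and their values are hypothetical, made up only for illustration; they are not your data. It shows setequal() passing while an order-aware comparison with identical() fails:

```r
# Hypothetical toy blocks: same three samples, but in a different row order
toy_bacteria <- data.frame(taxonA = c(1, 2, 3),
                           row.names = c("S1", "S2", "S3"))
toy_metabolites <- data.frame(metA = c(10, 30, 20),
                              row.names = c("S1", "S3", "S2"))

# setequal() treats the rownames as sets, so the order is ignored
same_set <- setequal(rownames(toy_bacteria), rownames(toy_metabolites))

# identical() compares the rownames position by position
same_order <- identical(rownames(toy_bacteria), rownames(toy_metabolites))

# Row positions where the two blocks disagree on the sample name
mismatched <- which(rownames(toy_bacteria) != rownames(toy_metabolites))

print(same_set)
print(same_order)
print(mismatched)
```

On your own data you would swap in Bacteria and Metabolites_table; if same_set is TRUE but same_order is FALSE, the blocks only need reordering, not filtering.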
You can reorder the rows in your Bacteria dataframe by running: