Block.plsda: X block names error and data format?

atan · July 25, 2024, 9:17pm

Hello,

I’m interested in using block.plsda() and DIABLO to integrate RNAseq, ATACseq, 16S, and morphological data. When I try to run the block.plsda step, I get the following error:

Error in Check.entry.wrapper.mint.block(X = X, Y = Y, indY = indY, ncomp = ncomp,  : 
Each block of 'X' must have a unique name.

My data frames of sequencing results are formatted with the samples as row names (in the same order in every data frame) and the gene names/OTUs as columns (with unique identifiers appended to deal with some duplicated gene names).
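A quick base-R way to confirm that layout, sketched with hypothetical toy matrices standing in for rna, atac and bacseq: every block should carry identical row (sample) names in identical order. The helper samples_aligned() is made up for illustration and is not part of mixOmics.

```r
# Hypothetical toy blocks standing in for rna, atac and bacseq:
# rows are samples, columns are features (genes, peaks or OTUs).
rna    <- matrix(1:6, nrow = 2, dimnames = list(c("s1", "s2"), c("g1", "g2", "g3")))
atac   <- matrix(1:4, nrow = 2, dimnames = list(c("s1", "s2"), c("g1", "g2")))
bacseq <- matrix(1:4, nrow = 2, dimnames = list(c("s1", "s2"), c("otu1", "otu2")))

# Hypothetical helper: TRUE only if every block has exactly the same
# sample (row) names, in the same order, as the first block.
samples_aligned <- function(blocks) {
  ref <- rownames(blocks[[1]])
  all(vapply(blocks, function(b) identical(rownames(b), ref), logical(1)))
}

samples_aligned(list(rna, atac, bacseq))         # TRUE
samples_aligned(list(rna, atac[2:1, ], bacseq))  # FALSE: atac rows reordered
```

The same function works on data frames, since rownames() applies to both.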

I’m running:

X <- list(rna, atac, bacseq)
Y <- morph$Length.measurement_.micron.

result.diablo.MD <- block.plsda(X, Y)

where rna, atac, and bacseq hold the counts for each gene/OTU, and Y is a vector of body lengths, one per sample.
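One detail worth checking here: the error message refers to the names of the blocks, i.e. the elements of the list X, rather than the feature names inside each block. In base R, list(rna, atac, bacseq) builds an unnamed list, so names(X) is NULL, whereas the DIABLO examples in the mixOmics documentation pass a named list with one entry per omics type (e.g. mRNA, miRNA, proteomics). The sketch below uses hypothetical toy matrices; has_unique_block_names() is a made-up helper that mirrors the wording of the error message and is not mixOmics' internal check.

```r
# Hypothetical toy stand-ins for the three blocks in the question
rna    <- matrix(1:6, nrow = 2, dimnames = list(c("s1", "s2"), c("g1", "g2", "g3")))
atac   <- matrix(1:4, nrow = 2, dimnames = list(c("s1", "s2"), c("g1", "g2")))
bacseq <- matrix(1:4, nrow = 2, dimnames = list(c("s1", "s2"), c("otu1", "otu2")))

X_unnamed <- list(rna, atac, bacseq)
names(X_unnamed)  # NULL: the list elements (blocks) have no names

# Naming each element gives every block its own label
X <- list(rna = rna, atac = atac, bacseq = bacseq)
names(X)          # "rna" "atac" "bacseq"

# Hypothetical helper mirroring the error message: every block needs a
# non-empty name, and no two blocks may share one. Not mixOmics' own code.
has_unique_block_names <- function(blocks) {
  nms <- names(blocks)
  !is.null(nms) && all(nms != "") && anyDuplicated(nms) == 0
}

has_unique_block_names(X_unnamed)  # FALSE
has_unique_block_names(X)          # TRUE
# With mixOmics loaded, the call from the question would then be:
# result.diablo.MD <- block.plsda(X, Y)
```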

Any insight into where I’m going wrong with the input? Thanks!

kimanh.lecao · July 25, 2024, 10:14pm

hi @atan

Have you checked that your column names in each of the data sets are unique?

i.e. check that length(unique(colnames(rna))) equals ncol(rna) for each data set. There might still be some duplicated names somewhere.
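To check the column names of every block at once, here is a minimal base-R sketch with hypothetical toy data; count_duplicated_colnames() is a made-up helper. A result of 0 for a block means all of its column names are unique.

```r
# Hypothetical toy blocks; rna deliberately contains a repeated gene name
rna  <- matrix(1:6, nrow = 2, dimnames = list(c("s1", "s2"), c("g1", "g2", "g2")))
atac <- matrix(1:4, nrow = 2, dimnames = list(c("s1", "s2"), c("g1", "g2")))

# Hypothetical helper: number of repeated column names in each block
count_duplicated_colnames <- function(blocks) {
  vapply(blocks, function(b) sum(duplicated(colnames(b))), integer(1))
}

count_duplicated_colnames(list(rna = rna, atac = atac))
# rna: 1, atac: 0

# Equivalent single-block check
length(unique(colnames(rna))) == ncol(rna)  # FALSE
```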

Have you run each data set on its own with sPLSDA? If you get the same error, but only for one data set, that might point you to the issue (plus, we recommend analysing each data set individually first for a better understanding of your data).

Kim-Anh

atan · July 26, 2024, 3:31pm

Hi Kim-Anh,

I’ve double-checked the column names in each set, and they’re all unique. Running sPLSDA on each data set individually works fine!

I was also able to run the block.plsda on the breast cancer TCGA dataset without errors.

Am I missing something about how the X sets need to be distinguished from each other? The RNAseq and ATACseq sets use the same gene name format; would that be a problem? The Y set has a single value for each sample because each sample was generated from a pool of larvae. Should Y be provided in some other way (for example, a mean value per treatment group, listed for each sample)?
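On the Y question, a hedged note: block.plsda() is the discriminant (classification) variant of DIABLO and is documented as taking a factor or class vector as Y, so a continuous body length is not the kind of outcome it is designed for. The base-R sketch below, with made-up lengths and made-up treatment labels, shows what treating a numeric vector as classes amounts to. For a continuous outcome, mixOmics also provides block.pls() and block.spls(), which take a numeric Y; whether that suits this design is a judgement call for the analysis.

```r
# Hypothetical body lengths (microns), one value per pooled sample
length_um <- c(812.4, 790.1, 812.4, 845.0)

# A discriminant analysis works with classes. Coercing the numeric vector
# to a factor turns every distinct length into its own "class".
y_as_classes <- factor(length_um)
nlevels(y_as_classes)  # 3 classes from 4 samples

# If the question is about groups, a factor of (hypothetical) treatment
# labels is the kind of outcome a PLS-DA model expects
treatment <- factor(c("control", "control", "heat", "heat"))
nlevels(treatment)     # 2
table(treatment)
```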

kimanh.lecao · August 1, 2024, 10:15pm

hi @atan

It’s difficult for me to say, but the error may come from your column names being repeated across the data sets. It would be better to prefix them, for example r_genename and a_genename for the RNA-seq and ATAC-seq data.
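The renaming suggested above can be done in a few lines of base R. A minimal sketch, assuming each block is a matrix or data frame with features as columns; prefix_colnames() and the gene names are hypothetical.

```r
# Hypothetical helper: prefix every column name in each block with a
# block-specific tag, e.g. "r_" for RNA-seq and "a_" for ATAC-seq.
prefix_colnames <- function(blocks, prefixes) {
  stopifnot(length(blocks) == length(prefixes))
  Map(function(b, p) {
    colnames(b) <- paste0(p, colnames(b))
    b
  }, blocks, prefixes)
}

# Toy blocks that share the same (hypothetical) gene names
rna  <- matrix(1:4, nrow = 2, dimnames = list(c("s1", "s2"), c("geneA", "geneB")))
atac <- matrix(5:8, nrow = 2, dimnames = list(c("s1", "s2"), c("geneA", "geneB")))

X <- prefix_colnames(list(rna = rna, atac = atac), c("r_", "a_"))
colnames(X$rna)   # "r_geneA" "r_geneB"
colnames(X$atac)  # "a_geneA" "a_geneB"
names(X)          # "rna" "atac": block names are kept
```

Map() keeps the names of the first list it iterates over, so the block names survive the renaming.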

Kim-Anh

Related topics

- Error in Check.entry.wrapper.mint.block · 3 replies · 33 views · February 13, 2025
- Diablo perf error (Support) · 7 replies · 1683 views · February 16, 2021
- Perf block.plsda error (Analysis) · 5 replies · 309 views · January 10, 2023
- Setting up data for PLS · 6 replies · 967 views · March 10, 2023
- Error with perf() for a block.plsda analysis: clash with epiDisplay (Bugs) · 4 replies · 176 views · October 5, 2023
