Setting up data for PLS

wilk0211 · September 21, 2021, 8:40pm

Hello,
I was hoping you could help with how to properly set up data for PLS analysis. I am trying to run a simple analysis using gene expression data from two groups of samples (control vs case). While I can get through the tutorial OK, when I try to run it with my own data, I am having issues with the X and Y data matrices. Could you please direct me to an actual example of how the data should be set up so sPLS-DA reads my files appropriately? I have tried viewing the srbct dataset and modeling my data after the example, with no luck.

Thank you!

christoa · September 23, 2021, 11:00am

Hi @wilk0211,

X should be a matrix with unique gene identifiers as column names and row names should be the sample identifier. Y should be a vector with all your classes. If you share the error, a screenshot and/or the code, I can help you identify the problem.
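
As a minimal sketch of that layout: the sample names, gene names and counts below are invented for illustration, and `ncomp = 2` and `keepX = c(2, 2)` are arbitrary choices for this toy data, not recommendations.

```r
# Toy data: 6 samples (rows) x 4 genes (columns); all values are invented.
set.seed(1)
X <- matrix(rpois(24, lambda = 100), nrow = 6, ncol = 4,
            dimnames = list(c(paste0("Control", 1:3), paste0("Case", 1:3)),
                            paste0("gene", 1:4)))

# Y: one class label per sample, in the same order as the rows of X.
Y <- factor(c(rep("Control", 3), rep("Case", 3)))

# Sanity checks worth running before any PLS-DA call.
stopifnot(is.matrix(X), is.numeric(X))
stopifnot(is.factor(Y), length(Y) == nrow(X))
stopifnot(!anyDuplicated(rownames(X)), !anyDuplicated(colnames(X)))

# Fit only if mixOmics is installed.
if (requireNamespace("mixOmics", quietly = TRUE)) {
  fit <- mixOmics::splsda(X, Y, ncomp = 2, keepX = c(2, 2))
}
```

If any of the `stopifnot()` checks fail on your own data, that is usually where the problem is.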

Christopher

wilk0211 · September 23, 2021, 5:21pm

Thank you so much. I really appreciate the response. I will try to execute the procedure again in the next couple of days and keep you posted.

Thank you!

Jordan

wilk0211 · September 25, 2021, 4:47pm

OK, so I have tried a few different approaches and still seem to have an issue. In RStudio, I am importing two Excel datasets.
X is genes and identifiers:

	gene1	gene2
Control1	893	64
Case1	354	73

Y is class:
x
Control
Case

Here are the errors I get:

I have also gotten:

MyResult.splsda <- splsda(X, Y, keepX = c(50,50)) # 1 Run the method
Error in `[<-`(`*tmp*`, classification == groups[j], j, value = 1) :
  subscript out of bounds

Thanks again for the help.

christoa · October 1, 2021, 6:52am

Hi @wilk0211,

It seems there is something wrong with Y. It should be a factor/class vector with the same length as the number of samples, but this is not the case when you assign RNA_PLS_Y to Y. Doing it manually as you did in the lower lines, Y <- c("control", "control" .....), should work, which tells me that there is something wrong with X as well. Try the import function in RStudio (File → Import Dataset → From Excel), make sure to set “First Row as Names”, and check that all the columns are numeric.
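
A rough sketch of the conversion step after import, assuming the import gives you two data frames. `X_df` and `Y_df` are hypothetical names, built by hand here from the two example rows you posted so the snippet runs on its own:

```r
# Stand-ins for what an Excel import returns: data frames, not a numeric
# matrix and a factor. X_df and Y_df are hypothetical names.
X_df <- data.frame(sample = c("Control1", "Case1"),
                   gene1  = c(893, 354),
                   gene2  = c(64, 73))
Y_df <- data.frame(x = c("Control", "Case"))

# X: drop the ID column, convert to a numeric matrix, keep IDs as row names.
X <- as.matrix(X_df[, -1])
rownames(X) <- X_df$sample

# Y: pull out the single column so Y is a factor, not a data frame.
Y <- factor(Y_df$x)

stopifnot(is.numeric(X), is.factor(Y), length(Y) == nrow(X))
```

The two rows only show the shape of the objects; a real analysis needs several samples per class.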

Christopher

yrtc · March 10, 2023, 6:45am

I have the same problem. Any chance you can share a .csv file of the srbct dataset? It would at least give us an example to follow. A sample csv file would help us define X and Y to perform PLS-DA with the mixOmics package.

kimanh.lecao · March 10, 2023, 6:49am

@yrtc,

You can type in R:

library(mixOmics)
data(srbct)
write.csv(srbct$gene, 'data_gene.csv')
write.csv(srbct$class, 'data_class.csv')

This will write the data into your working directory, and you can inspect how the data should look. If in doubt, I recommend you talk to someone in your team who has some experience with R.
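
To go the other way and read files of this shape back into R, something like the sketch below should work. It uses small invented stand-ins (`gene`, `cls`) and temporary files rather than the real srbct data, so it runs without mixOmics:

```r
# Toy stand-ins for srbct$gene and srbct$class (invented values), written
# with write.csv as above but to temporary files.
gene <- matrix(c(0.5, 1.2, 0.9, 2.1), nrow = 2,
               dimnames = list(c("s1", "s2"), c("g1", "g2")))
cls <- factor(c("A", "B"))
gene_file  <- tempfile(fileext = ".csv")
class_file <- tempfile(fileext = ".csv")
write.csv(gene, gene_file)
write.csv(cls, class_file)

# Read back: the first CSV column holds the row names, and the class file
# has a single data column named "x".
X <- as.matrix(read.csv(gene_file, row.names = 1))
Y <- factor(read.csv(class_file, row.names = 1)$x)
```

Note that `write.csv` on a factor produces a one-column file with the header `x`, so the class file has to be reduced to that column before it can be used as Y.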

Kim-Anh
