Setting up data for PLS

I was hoping you could help with how to properly set up data for PLS analysis. I am trying to run a simple analysis using gene expression data from two groups of samples (control vs case). While I can get through the tutorial OK, when I try to run it with my own data, I have issues with the X and Y data matrices. Could you please direct me to an actual example of how the data should be set up so that sPLS-DA reads my files appropriately? I have tried viewing the srbct dataset and modeling my data after the example, with no luck.

Thank you!

Hi @wilk0211,

X should be a numeric matrix with samples in rows and genes in columns: the column names should be unique gene identifiers and the row names should be the sample identifiers. Y should be a vector with the class of each sample, in the same order as the rows of X. If you share the error, a screenshot and/or the code, I can help you identify the problem.
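As a minimal sketch of that layout, X and Y could be built directly in base R as below. The sample names, gene names and values are made up for illustration and are not taken from srbct.

```r
# Toy expression values: 4 samples (rows) x 3 genes (columns); all values are made up.
X <- matrix(
  c(893, 64, 120,
    354, 73, 310,
    910, 58, 101,
    402, 80, 295),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("Control1", "Case1", "Control2", "Case2"),  # row names: sample identifiers
    c("gene1", "gene2", "gene3")                  # column names: unique gene identifiers
  )
)

# One class label per sample, in the same order as the rows of X.
Y <- factor(c("control", "case", "control", "case"))

# Sanity checks worth running before calling splsda(X, Y, ...).
stopifnot(is.numeric(X), length(Y) == nrow(X), !anyDuplicated(colnames(X)))
```

If these checks pass on your own objects, the shapes at least match what is described above.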

  • Christopher

Thank you so much. I really appreciate the response. I will try to execute the procedure again in the next couple of days and keep you posted.

Thank you!


OK, so I have tried a few different approaches and still seem to have an issue. In RStudio, I am importing two Excel datasets:
X is genes and identifiers:

          gene1  gene2
Control1    893     64
Case1       354     73

Y is class:

Here are the errors I get:

I have also gotten:

MyResult.splsda <- splsda(X, Y, keepX = c(50,50)) # 1 Run the method
Error in `[<-`(`*tmp*`, classification == groups[j], j, value = 1) :
subscript out of bounds

Thanks again for the help.

Hi @wilk0211,

It seems there is something wrong with Y. It should be a factor/class vector with the same length as the number of samples, but this is not the case when you assign RNA_PLS_Y to Y. Doing it manually as you did in the lower lines: Y <- c("control", "control" .....) should work, which tells me that there is something wrong with X as well. Try using the import function in RStudio (File → Import Dataset → From Excel), and make sure to set “First Row as Names” and that all the columns are numeric.
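A rough sketch of the conversion after import, assuming the import gives you data frames with the sample names in an ordinary first column. RNA_PLS_X and the column names `sample` and `class` are hypothetical, and the two data frames below are only stand-ins for the imported sheets:

```r
# Stand-ins for the imported Excel sheets (hypothetical names and made-up values).
# An import usually returns a data frame with the sample IDs as an ordinary column.
RNA_PLS_X <- data.frame(
  sample = c("Control1", "Case1", "Control2", "Case2"),
  gene1  = c(893, 354, 910, 402),
  gene2  = c(64, 73, 58, 80)
)
RNA_PLS_Y <- data.frame(class = c("control", "case", "control", "case"))

# X: drop the ID column, convert to a numeric matrix, restore the IDs as row names.
X <- as.matrix(RNA_PLS_X[, -1])
rownames(X) <- RNA_PLS_X$sample

# Y: pull the single column out of the data frame so Y is a factor, not a table.
Y <- factor(RNA_PLS_Y$class)

# Checks: all values numeric, one class label per sample.
stopifnot(is.numeric(X), length(Y) == nrow(X))
```

If `is.numeric(X)` fails on your data, at least one column was probably read as text, which is worth checking in the import preview.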

  • Christopher

I have the same problem. Any chance you can share the .csv file of the srbct dataset? At least it would set an example for us. A sample csv file would guide us in defining X and Y to perform PLS-DA analysis with the mixOmics package.


You can type in R:

library(mixOmics)
data(srbct)
write.csv(srbct$gene, 'data_gene.csv')
write.csv(srbct$class, 'data_class.csv')

This writes the data to your working directory so you can inspect what the data should look like. If in doubt, I recommend you talk to someone in your team who has some experience with R.
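For files laid out like that, reading them back into X and Y might look like the sketch below. It uses small made-up stand-ins (toy_gene, toy_class) and temporary files so that it runs without mixOmics; with the real files you would pass 'data_gene.csv' and 'data_class.csv' instead.

```r
# Toy stand-ins for srbct$gene and srbct$class (made-up values, hypothetical names).
toy_gene <- matrix(c(893, 354, 64, 73), nrow = 2,
                   dimnames = list(c("Control1", "Case1"), c("gene1", "gene2")))
toy_class <- factor(c("control", "case"))

# Write them the same way as above, but to temporary files.
gene_file  <- tempfile(fileext = ".csv")
class_file <- tempfile(fileext = ".csv")
write.csv(toy_gene, gene_file)
write.csv(toy_class, class_file)

# Read back: the first CSV column holds the row names written by write.csv().
X <- as.matrix(read.csv(gene_file, row.names = 1))
Y <- factor(read.csv(class_file, row.names = 1)[[1]])

# Same checks as before: numeric matrix, one class label per sample.
stopifnot(is.numeric(X), length(Y) == nrow(X))
```

The `row.names = 1` argument matters here: without it, the sample names come back as a regular text column and X is no longer numeric.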