Hello,
I was hoping you could help with how to properly set up data for PLS analysis. I am trying to run a simple analysis using gene expression data from two groups of samples (control vs case). While I can get through the tutorial OK, when I try to run it with my own data, I am having issues with the X and Y data matrices. Could you please direct me to an actual example of how the data should be set up so that sPLS-DA reads my files appropriately? I have tried viewing the srbct dataset and modeling my data after the example, with no luck.
X should be a numeric matrix with unique gene identifiers as column names and sample identifiers as row names. Y should be a vector with the class of each sample. If you share the error, a screenshot and/or the code, I can help you identify the problem.
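As a minimal sketch of that layout (the sample names, gene names and values below are made up for illustration, not taken from any real dataset):

```r
# Toy expression values; sample and gene names are hypothetical.
X <- matrix(
  c(893, 64,
    354, 73,
    910, 58,
    340, 80),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("Control1", "Case1", "Control2", "Case2"),  # rows = samples
    c("gene1", "gene2")                           # columns = genes
  )
)

# One class label per sample, in the same order as the rows of X.
Y <- factor(c("Control", "Case", "Control", "Case"))

# Sanity checks before passing X and Y to a PLS-DA function.
stopifnot(is.matrix(X), is.numeric(X))
stopifnot(is.factor(Y), length(Y) == nrow(X))
stopifnot(!anyDuplicated(colnames(X)))
```

If any of the checks fails, the problem is in how the data were built or imported rather than in the analysis call itself.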
OK, so I have tried a few different approaches and still seem to have an issue. In RStudio, I am importing two Excel datasets:
X is genes and identifiers:
            gene1   gene2
Control1    893     64
Case1       354     73
Y is class:
x
Control
Case
Here are the errors I get:
I have also gotten:
MyResult.splsda <- splsda(X, Y, keepX = c(50,50)) # 1 Run the method
Error in `[<-`(`*tmp*`, classification == groups[j], j, value = 1) :
  subscript out of bounds
It seems there is something wrong with Y. It should be a factor/class vector with the same length as the number of samples, but this is not the case when you assign RNA_PLS_Y to Y. Doing it manually as you did in the lower lines, Y <- c("control", "control", ...), should work, which tells me that there is something wrong with X as well. Try the import function in RStudio (File → Import Dataset → From Excel), and make sure to set “First Row as Names” and that all the columns are numeric.
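To illustrate the point about object types, here is a sketch of turning imported tables into a numeric matrix and a factor. The data frames stand in for what an Excel import might return; the names X_raw and Y_raw and the column name sample are hypothetical.

```r
# Stand-ins for the imported tables (an Excel import returns data frames).
X_raw <- data.frame(
  sample = c("Control1", "Case1"),
  gene1  = c(893, 354),
  gene2  = c(64, 73)
)
Y_raw <- data.frame(x = c("Control", "Case"))

# Move the sample identifiers into row names and keep only the numeric columns.
X <- as.matrix(X_raw[, -1])
rownames(X) <- X_raw$sample

# Extract the single column as a factor instead of passing the whole data frame.
Y <- factor(Y_raw$x)

# X must be numeric, and Y needs one label per row of X.
stopifnot(is.numeric(X), length(Y) == nrow(X))
```

Once both checks pass on the real data, X and Y can be passed to splsda() as in the call above.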
I have the same problem. Is there any chance you could share a .csv file of the srbct dataset? It would at least give us an example to follow when defining X and Y for PLS-DA analysis with the mixOmics package.
Exporting the srbct data from R will write these data into your working directory, where you can inspect what the data should look like. If in doubt, I recommend you talk to someone in your team who has some experience with R.
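One way such an export might look is sketched below. This is not necessarily the code originally posted: the helper export_example and the output file names are hypothetical, and the sketch assumes srbct$gene holds the expression matrix and srbct$class the class labels, as in the mixOmics example data.

```r
# Hypothetical helper: write an expression matrix and its class labels to CSV files.
export_example <- function(X, Y, dir = getwd()) {
  x_file <- file.path(dir, "example_X.csv")   # hypothetical file name
  y_file <- file.path(dir, "example_Y.csv")   # hypothetical file name
  write.csv(X, x_file)                        # row names (sample IDs) are kept
  write.csv(data.frame(class = Y), y_file, row.names = FALSE)
  c(X = x_file, Y = y_file)
}

if (requireNamespace("mixOmics", quietly = TRUE)) {
  # srbct ships with mixOmics: $gene is the expression matrix, $class the labels.
  data("srbct", package = "mixOmics")
  files <- export_example(srbct$gene, srbct$class)
} else {
  # Fallback toy data so the sketch still runs without mixOmics installed.
  X_toy <- matrix(c(893, 64, 354, 73), nrow = 2, byrow = TRUE,
                  dimnames = list(c("Control1", "Case1"), c("gene1", "gene2")))
  files <- export_example(X_toy, factor(c("Control", "Case")))
}
```

Opening the two files in a spreadsheet program shows samples in rows and genes in columns for X, and a single column of class labels for Y.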