Hi,
I posted this question previously and received some advice, but I still cannot get the analysis to work. In short, I am running a PLS-DA model to classify a binary trait (0/1) using a number of predictors. Most of the predictors are numeric, and with those the model runs normally. I now have one categorical predictor, season, with four levels (summer, autumn, winter, spring), which I coded as 1, 2, 3, 4 respectively.
Previously, Kim Anh suggested that I use unmap(data$season), so I did. This produced 4 indicator predictors, which I combined with the other predictors to run the model. Is this the right way? It does not seem to improve the model accuracy compared with the multilevel approach described below.
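To make the dummy-coding option concrete, here is a minimal sketch on toy data. The data values and the column names x1 and x2 are made up for illustration, and base R's model.matrix is used as a stand-in for unmap, on the assumption that both produce one 0/1 indicator column per season.

```r
# Toy data standing in for fert.train (hypothetical values)
set.seed(1)
toy <- data.frame(
  Season = c(1, 2, 3, 4, 1, 2, 3, 4),
  x1 = rnorm(8),
  x2 = rnorm(8)
)

# Treat the 1..4 codes as categories, not as a numeric scale
season <- factor(toy$Season, levels = 1:4,
                 labels = c("summer", "autumn", "winter", "spring"))

# One indicator (0/1) column per season; "- 1" drops the intercept
season.dummy <- model.matrix(~ season - 1)
colnames(season.dummy) <- levels(season)

# Combine the indicator columns with the numeric predictors
X.toy <- cbind(as.matrix(toy[, c("x1", "x2")]), season.dummy)
dim(X.toy)
```

Each row has exactly one season column set to 1, so the four columns together carry the same information as the original 1 to 4 coding without implying an order between seasons.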
Multilevel option: I found on the website that we might be able to use the multilevel option (Multilevel Vac18 Case Study | mixOmics)
I followed this approach as shown in the code below:
design.train <- data.frame(sample = fert.train$Season)
Y <- fert.train[, 1]
X <- fert.train[, 2:556]
plsda.fert <- plsda(X, Y, ncomp = 10, scale = TRUE, multilevel = design.train)
# External validation
Y <- fert.test[, 1]
X <- fert.test[, 2:555]
design.test <- data.frame(season = fert.test$Season)
predR <- predict(plsda.fert, X, ncomp = ccomp, multilevel = design.test)
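As far as I understand it, with a single-column design the multilevel option treats that column as a sample ID and removes the per-ID mean from every predictor before fitting. The sketch below reproduces that idea in base R on toy data (hypothetical values and names), so passing Season as the ID would amount to centring each predictor within season. This is an assumption about the behaviour, not the mixOmics code itself.

```r
# Toy predictors and a season label per record (hypothetical values)
set.seed(1)
X.toy <- matrix(rnorm(12), nrow = 6,
                dimnames = list(NULL, c("x1", "x2")))
season <- factor(c(1, 1, 2, 2, 3, 3))

# Per-season mean of each column, expanded back to one row per record
group.means <- apply(X.toy, 2, function(v) ave(v, season))

# Within-season part: each record minus its season mean
X.within <- X.toy - group.means

# Column sums within every season are now (numerically) zero
round(rowsum(X.within, season), 10)
```

If this reading is right, the approach adjusts for season rather than using season as a predictor, which may explain why the two options give different accuracy.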
The model ran well, but for some validation sets in which some records were not repeated, it produced an error:
Error in FUN(X[[i]], …) :
A multilevel analysis can not be performed when at least one some sample is not repeated.
This makes me think I am using the wrong code, since this option is meant for repeated records. However, it did improve prediction accuracy.
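A quick way to see which validation sets would trigger the error is to count how often each ID value occurs. The sketch below uses a toy vector in place of fert.test$Season (hypothetical values) and assumes the check is applied to the first column of the design.

```r
# Toy stand-in for fert.test$Season (hypothetical values)
season.test <- c(1, 1, 2, 2, 3, 4, 4)

# Count how often each level of the multilevel ID occurs
counts <- table(season.test)

# Levels that occur only once have no repeat to centre against
not.repeated <- names(counts)[counts < 2]
not.repeated
```

Here season 3 appears only once, so a validation set like this one would be expected to fail with the message above.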
Can you please give me some comments about these 2 options?
Thanks,
Phuong