Hi,
I am running a model on 71% of my data, i.e. 13 samples, which I know is very few. I then have 4 samples in my test set whose outcome I want to predict.
The model I get has an AUROC of 1 and 100% accuracy in predicting the outcome of the 13 samples used. This indicates serious overfitting to me. I thus find it essential to keep a few samples out and run predict on those. While the code works perfectly fine for my X.train data:
pred = predict(MyResult.splsda.final, X.train)
Prediction <- pred$class$mahalanobis.dist[, 1] # using just comp1
giving me:
  PA.04   PA.06   PA.08   PA.09   PA.10   PA.12   PA.13   PA.14   PA.18   PA.35   PA.38
 "R_Ep"  "R_Ep" "NR_Ep"  "R_Ep" "NR_Ep" "NR_Ep"  "R_Ep"  "R_Ep"  "R_Ep"  "R_Ep" "NR_Ep"
  PA.39   PA.40
 "R_Ep" "NR_Ep"
it does not work for my X.test:
I do not understand why the other samples do not have a prediction. I have looked into the input file and I do not think that anything is wrong there. R correctly assigns the outcome when splitting the data:
prop.table(table(Resp.test$Epilepsy))
giving me:
NR_Ep R_Ep
0.5 0.5
So it does not seem to be a format or spelling issue. I do not know where to look further and I hope you can help me solve this issue.
Thank you so much for your time!
/Stef
I know you guys are probably very busy! However, I have been stuck at this step for over a week now and really hope you can help me solve this, so I can move on with my analysis.
I very much appreciate your help @MaxBladen @kimanh.lecao @aljabadi
Best wishes,
Stef
To help, I’d first need a reproducible example (as described here and in the banner). From there, we can potentially discuss me accessing your data and scripts to evaluate them for issues. Feel free to message me directly via the forum.
Thank you @MaxBladen! I can indeed reproduce your code. I can also predict my train set without a problem. Just when it comes to the test set, it is not working. This is what I get from
pred2 = predict(MyResult.splsda.final, X.test)
But I have no NAs in my test set:
which(is.na(X.test))
integer(0)
So I am not sure why I get NaN as predict and variables.
Thank you very much for looking into this!
/Stef
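The thread does not establish what causes the NaN values, so the following is only a sketch of generic input checks that might help narrow it down. It uses toy matrices in place of X.train and X.test, and check_newdata is a hypothetical helper, not a mixOmics function. Note that is.na() does not flag Inf or -Inf, and that a feature which is constant in the training data, or feature names that differ between the two sets, are plausible (unconfirmed) sources of trouble at prediction time.

```r
# Toy stand-ins for X.train / X.test (hypothetical data, not the poster's).
set.seed(1)
X.train.toy <- matrix(rnorm(13 * 5), nrow = 13,
                      dimnames = list(paste0("S", 1:13), paste0("F", 1:5)))
X.test.toy  <- matrix(rnorm(4 * 5), nrow = 4,
                      dimnames = list(paste0("T", 1:4), paste0("F", 1:5)))
X.train.toy[, "F3"] <- 2  # make one feature constant in the training set

# Hypothetical helper: gathers a few sanity checks into one list.
check_newdata <- function(train, test) {
  train <- as.matrix(train)
  test  <- as.matrix(test)
  list(
    # counts NA, NaN, Inf and -Inf (is.na() alone misses the infinities)
    non_finite_test = sum(!is.finite(test)),
    # features with zero variance in the training data
    zero_var_train  = names(which(apply(train, 2, var) == 0)),
    # training features that are absent from the test data
    missing_in_test = setdiff(colnames(train), colnames(test)),
    # TRUE only if both sets have the same features in the same order
    same_order      = identical(colnames(train), colnames(test))
  )
}

res <- check_newdata(X.train.toy, X.test.toy)
str(res)
```

With real data, the call would be check_newdata(X.train, X.test); any non-empty entry in the result is a place to look first.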
I don’t know what’s causing this. While I would usually offer to take your data and debug it for you, I’ve got a contract starting this week which will prevent me from doing any mixOmics work for a month or so.
The size and content of the B.hat component make no sense to me. My gut says this is where the issue is arising.
I understand @MaxBladen and I appreciate all the support you have been giving me so far. You helped me resolve several issues in the past and I am very grateful for it! Thanks to that, I have been able to publish one article with mixOmics analyses, a second is close to acceptance, and this is part of the third manuscript. Also, I have convinced my colleagues to use it.
Is there anyone else in the team who could take a look at this in the meantime? If so, could you tag that person here, please?
Best wishes,
PS: What is B.hat?
I’m really glad I’ve been able to help. It’s a fantastic package but can be a bit tricky to get into initially.
Unfortunately, I’m pretty much the only one who monitors the forums these days. I can ask some of my colleagues about your issue - but I don’t know when they’d get back to me.
B.hat holds the regression coefficients used to generate the predicted values for the novel data. It is a three-dimensional array: the first dimension represents each feature (995), the second dimension represents each level of your response vector and the third dimension represents each component.
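To make that layout concrete, here is a minimal sketch with a toy array of the same shape (features x response levels x components). The names B.hat.toy and X.new and all values are invented for illustration, and the sketch ignores the centering and scaling that a real prediction would involve. It shows how a single non-finite coefficient spreads to every predicted value for that response level, and how which(..., arr.ind = TRUE) can locate such entries.

```r
# Toy dimensions (hypothetical): 6 features, 2 response levels, 2 components.
set.seed(42)
p <- 6; K <- 2; H <- 2
B.hat.toy <- array(rnorm(p * K * H), dim = c(p, K, H),
                   dimnames = list(paste0("F", 1:p),
                                   c("NR_Ep", "R_Ep"),
                                   paste0("comp", 1:H)))

# Toy new data: 4 samples measured on the same 6 features.
X.new <- matrix(rnorm(4 * p), nrow = 4,
                dimnames = list(paste0("T", 1:4), paste0("F", 1:p)))

# Predicted scores from the component-1 coefficients: samples x levels.
Y.hat.comp1 <- X.new %*% B.hat.toy[, , 1]

# Inject one NaN coefficient to show how it propagates.
B.hat.toy["F2", "R_Ep", 1] <- NaN
Y.hat.bad <- X.new %*% B.hat.toy[, , 1]
print(Y.hat.bad)  # the R_Ep column is NaN for every sample

# Locate non-finite coefficients as (feature, level, component) indices.
bad <- which(!is.finite(B.hat.toy), arr.ind = TRUE)
print(bad)
```

Assuming the B.hat element of the predict output is an array as described, the same which(!is.finite(...), arr.ind = TRUE) call on it would show which features, levels and components carry the NaN values.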