Hi there,
I am trying to implement sPLS-DA and Diablo on my omics data and have a few questions; I hope you can help.
So, first, I am using sPLS-DA to identify age-associated cognitive, methylation and transcriptomic alterations (I have categorized age into 3 groups: [50, 60), [60, 70) and [70, …)). I did not split my data into training and test sets because my goal is not prediction but only to identify age-associated alterations.
Next, I want to take the features selected by the sPLS-DA models and integrate them in Diablo. The idea here is to test whether the age-associated alterations (methylation | transcriptomic) identified previously with sPLS-DA can discriminate individuals with higher versus lower cognitive performance, and to look at the correlations between these variables.
Question 1: Is it correct to split my data set into training/test sets for the Diablo model and predict cognitive performance, given that I used all samples in the sPLS-DA models?
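The concern behind Question 1 is selection bias: if the samples that later form the test set also helped choose the features, test performance can look better than it really is. A minimal Python sketch of that effect, as an illustration only (not mixOmics, not the Diablo workflow): it uses pure-noise data, ranks features by absolute difference in class means, and classifies with a nearest-centroid rule. The data, the ranking and the classifier are all assumptions chosen to keep the example small.

```python
import random
import statistics


def make_noise_data(n, p, rng):
    # Pure noise: no feature carries any information about the binary label.
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]
    rng.shuffle(y)
    return X, y


def top_features(X, y, rows, k):
    # Rank features by |difference in class means|, computed on `rows` only.
    def score(j):
        a = [X[i][j] for i in rows if y[i] == 0]
        b = [X[i][j] for i in rows if y[i] == 1]
        return abs(statistics.fmean(a) - statistics.fmean(b))
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]


def centroid_accuracy(X, y, train, test, feats):
    # Nearest-centroid classifier fitted on `train`, scored on `test`.
    cent = {}
    for c in (0, 1):
        rows = [i for i in train if y[i] == c]
        cent[c] = [statistics.fmean([X[i][j] for i in rows]) for j in feats]
    correct = 0
    for i in test:
        x = [X[i][j] for j in feats]
        dist = {c: sum((u - v) ** 2 for u, v in zip(x, cent[c])) for c in cent}
        if min(dist, key=dist.get) == y[i]:
            correct += 1
    return correct / len(test)


def simulate(n=40, p=500, k=10, reps=20, seed=1):
    rng = random.Random(seed)
    leaky, honest = [], []
    for _ in range(reps):
        X, y = make_noise_data(n, p, rng)
        order = list(range(n))
        rng.shuffle(order)
        train, test = order[: n // 2], order[n // 2:]
        # Leaky: features chosen with ALL samples, including the future test set.
        leaky.append(centroid_accuracy(X, y, train, test, top_features(X, y, order, k)))
        # Honest: features chosen with the training samples only.
        honest.append(centroid_accuracy(X, y, train, test, top_features(X, y, train, k)))
    return statistics.fmean(leaky), statistics.fmean(honest)


if __name__ == "__main__":
    leaky, honest = simulate()
    print(f"mean test accuracy, selection on all samples:   {leaky:.2f}")
    print(f"mean test accuracy, selection on training only: {honest:.2f}")
```

On noise data the first accuracy typically comes out well above 0.5 while the second stays near 0.5. That gap is the kind of optimism to watch for when the features fed into Diablo were chosen using every sample, including those later held out for testing.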
Question 2: What is the rationale for choosing the number of folds? Does it depend on the number of samples? I have tested 5-fold, 10-fold and leave-one-out, and noticed that the stability of variable selection differs between them. I can't figure out which is most appropriate.
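"Stability" here can be read as the fraction of cross-validation fits (folds × repeats) in which a feature is selected. A rough Python sketch of that calculation, assuming two classes for simplicity and using a hypothetical `select` function (top features by absolute mean difference) as a stand-in for sPLS-DA's selection; it is not mixOmics code. One thing it helps show: as the number of folds grows, the training sets overlap more (under leave-one-out they differ by a single sample), so the selected sets agree more and stability tends to look higher for that reason alone.

```python
import random
import statistics
from collections import Counter


def kfold_indices(n, k, rng):
    # Shuffle sample indices and deal them into k folds (k = n gives leave-one-out).
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]


def select(X, y, rows, k_feat):
    # Hypothetical stand-in for sPLS-DA selection: top features by |mean difference|.
    def score(j):
        a = [X[i][j] for i in rows if y[i] == 0]
        b = [X[i][j] for i in rows if y[i] == 1]
        return abs(statistics.fmean(a) - statistics.fmean(b))
    return sorted(range(len(X[0])), key=score, reverse=True)[:k_feat]


def selection_frequency(X, y, n_folds, n_repeats, k_feat, seed=0):
    # Fraction of CV training fits in which each feature was selected.
    rng = random.Random(seed)
    n = len(X)
    counts = Counter()
    fits = 0
    for _ in range(n_repeats):
        for fold in kfold_indices(n, n_folds, rng):
            held_out = set(fold)
            train = [i for i in range(n) if i not in held_out]
            counts.update(select(X, y, train, k_feat))
            fits += 1
    return {j: c / fits for j, c in counts.items()}


def demo_data(n=35, p=200, n_informative=3, shift=2.0, seed=42):
    # Simulated two-class data: the first n_informative features are shifted in class 1.
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(0.0, 1.0) + (shift if j < n_informative and y[i] == 1 else 0.0)
          for j in range(p)] for i in range(n)]
    return X, y


if __name__ == "__main__":
    X, y = demo_data()
    for label, n_folds, n_repeats in [("5-fold", 5, 10), ("10-fold", 10, 10), ("LOO", len(X), 1)]:
        freq = selection_frequency(X, y, n_folds, n_repeats, k_feat=10)
        stable = sorted(j for j, f in freq.items() if f >= 0.8)
        print(label, "features selected in >= 80% of fits:", stable)
```

With leave-one-out there is nothing to repeat (every run gives the same folds), which is why `n_repeats` is 1 in that case.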
P.S. Number of samples | features per data set: methylation, 41 | 734,668; transcriptomic, 75 | 26,000; methylation and transcriptomic combined, 35 | roughly 200.
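With these sample sizes each held-out fold is very small. A quick Python check of the fold sizes, assuming folds are made as equal in size as possible (the exact split used by any given package may differ):

```python
def fold_sizes(n, k):
    # Sizes of k nearly equal folds: the first n % k folds get one extra sample.
    base, extra = divmod(n, k)
    return [base + 1] * extra + [base] * (k - extra)


datasets = {"methylation": 41, "transcriptomic": 75, "methylation + transcriptomic": 35}

if __name__ == "__main__":
    for name, n in datasets.items():
        for k in (5, 10):
            sizes = fold_sizes(n, k)
            print(f"{name} (n={n}), {k}-fold: test folds of {min(sizes)}-{max(sizes)} samples, "
                  f"training sets of {n - max(sizes)}-{n - min(sizes)}")
        print(f"{name} (n={n}), leave-one-out: {n} folds of 1 sample, training sets of {n - 1}")
```

For example, 10-fold CV on the 35 combined samples leaves test folds of only 3 or 4 samples, so with three age groups each fold holds about one sample per class, and each fold's error estimate is correspondingly noisy.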
Question 3: Is it correct to use all features selected by sPLS-DA in the Diablo model, or should I include only the stable ones?
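If "stable" is taken to mean "selected in at least a given fraction of CV fits", restricting to stable features is a simple threshold filter. A minimal Python sketch; the 0.8 cut-off and the feature names in the usage example are hypothetical, and the right threshold is a judgment call rather than a fixed rule:

```python
def stable_features(freq, threshold=0.8):
    # Keep features selected in at least `threshold` of the CV fits, most stable first.
    keep = [(f, name) for name, f in freq.items() if f >= threshold]
    return [name for f, name in sorted(keep, key=lambda t: (-t[0], t[1]))]


if __name__ == "__main__":
    # Hypothetical selection frequencies, for illustration only.
    freq = {"cpg_1": 1.0, "cpg_2": 0.62, "gene_1": 0.9, "gene_2": 0.3}
    print(stable_features(freq))        # ['cpg_1', 'gene_1']
    print(stable_features(freq, 0.5))   # ['cpg_1', 'gene_1', 'cpg_2']
```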
Question 4: I get very high classification error rates during parameter tuning (number of components and number of features). Can I still proceed with these analyses?
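Whether an error rate is "very high" is easier to judge against chance. A minimal Python sketch, assuming the error is reported either as an overall error rate or as a balanced error rate (BER, the mean of the per-class error rates). Both random guessing and always predicting one class give a BER of (K - 1)/K: about 0.67 for three age groups, and 0.5 if cognitive performance is coded as two groups.

```python
def error_rates(y_true, y_pred):
    # Overall error rate and balanced error rate (mean of per-class error rates).
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    overall = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_class = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        per_class.append(sum(y_pred[i] != c for i in idx) / len(idx))
    return overall, sum(per_class) / len(per_class)


def chance_ber(n_classes):
    # BER of random guessing or of always predicting a single class: (K - 1) / K.
    return (n_classes - 1) / n_classes


if __name__ == "__main__":
    # Hypothetical labels: a model that always predicts the majority class.
    y_true = ["low", "low", "low", "high", "high"]
    y_pred = ["low"] * 5
    print(error_rates(y_true, y_pred))  # (0.4, 0.5)
    print(chance_ber(2), chance_ber(3))
```

The usage example shows why the balanced rate matters with unequal groups: the overall error (0.4) looks better than chance, but the BER (0.5) shows the model is no better than always guessing the same class.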
Thanks for your time.
Sonya