I am following the "Case study: HMP bodysites repeated measures" tutorial for a microbiome study with amplicon sequencing. I have 3 dietary groups (S, B and Y, where S is the control), with 15 cows in the control group and 16 cows in each of the other two groups. Each cow was sampled at 3 time points, and all cows were fed the control diet at the first time point (W2).
First, based on the PCA, the inter-individual variation was stronger than the variation over time, so I applied the multilevel approach from the HMP tutorial: the outcome is the dietary group combined with the time point (FeedWeek), and cowID is the unique individual ID used for the multilevel analysis.
I want to compare the archaeal composition between the treatment groups and the control group at the different time points, and select the discriminative ASVs with sPLS-DA. Although all samples from the first time point come from the same diet, I have labelled them with the treatment group each cow was later assigned to, so that I can see any differences at the start. This gives 9 groups in total, with 141 samples and 126 ASVs after pre-processing.
Hope this is the correct approach for our design.
I ran the sPLS-DA tuning to choose the keepX and ncomp parameters, but the BER was very high even at the 8th component, and the error rate for the first component increased with the number of features selected. The output from plot(diverse.tune.splsda):
And the error rates from perf (I increased nrepeat to 1000, as suggested in another thread, "Help understanding high error rate using PLS-DA"):
I wonder whether this is the right approach and what causes such a high error rate. Could it be because the groups are so similar to each other, given that they did not separate much on the sPLS-DA plot of components 1 and 2?
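To put the BER values in context, here is a minimal sketch of how I understand the balanced error rate: the per-class error rates averaged so that every group counts equally. It is plain Python rather than mixOmics code. The function name balanced_error_rate and the time point labels "T2" and "T3" are placeholders of mine, and only the group sizes come from my design. It shows that with 9 groups, a classifier that ignores the data entirely already scores a BER of about 0.89, so that is the baseline I should compare my error rates against.

```python
from collections import Counter


def balanced_error_rate(y_true, y_pred):
    """Mean of the per-class error rates, so each class counts equally."""
    totals = Counter(y_true)
    errors = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    return sum(errors[c] / n for c, n in totals.items()) / len(totals)


# Group sizes from the design: 15 control (S) cows, 16 each for B and Y,
# each sampled at 3 time points. "W2" is the first time point; "T2" and
# "T3" are placeholder names for the later two.
cows_per_diet = {"S": 15, "B": 16, "Y": 16}
time_points = ["W2", "T2", "T3"]
y_true = [
    f"{diet}_{tp}"
    for tp in time_points
    for diet, n in cows_per_diet.items()
    for _ in range(n)
]

# A classifier that ignores the data and always predicts the same group.
y_constant = ["S_W2"] * len(y_true)

n_groups = len(set(y_true))
ber_constant = balanced_error_rate(y_true, y_constant)
chance_ber = 1 - 1 / n_groups  # expected BER of uniform random guessing

print(len(y_true), n_groups, round(ber_constant, 3), round(chance_ber, 3))
# prints: 141 9 0.889 0.889
```

If this reading is right, an error rate that looks very high by two-class standards may still be better than chance for 9 groups, although it would not explain why the error rises as more features are kept.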
I am quite new to mixOmics and hope my questions make sense to you. Looking forward to your feedback.
Thank you very much for your time!