Find DE genes downstream of PLS-DA

Hi all,

I have a human dataset in which I am investigating the effect of disease on my favourite cell type.
The samples looked intertwined on a PCA, so I decided to run a PLS-DA to discriminate on the presence
(or absence) of the disease. My ultimate goal is to identify DE genes between the two conditions and
validate them experimentally. When I tune the PLS-DA, I end up with 8 features on Component 1 and 7
features on Component 2, which feels like very few to me. Is there a way to get the maximum number of
features that can still discriminate the two conditions sufficiently? Or am I missing the principle of
PLS-DA?

Thanks in advance,
Theo

Hi Theo,

If your primary aim is to identify DE genes, then PLS-DA is not appropriate: it evaluates a signature as a whole, rather than testing each gene individually and independently (which is what a classical univariate analysis does).
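
To make the contrast concrete, here is a minimal sketch of a univariate analysis in base R, where each gene is tested on its own and the p-values are then corrected for multiple testing. The data are simulated purely for illustration (40 samples, 200 genes, the first 20 shifted in the disease group), and the plain t-test stands in for whatever DE method suits your data; none of the names below come from your dataset.

```r
# Simulated example data (an assumption, not real expression values)
set.seed(1)
n_genes <- 200
X <- matrix(rnorm(40 * n_genes), nrow = 40,
            dimnames = list(paste0("sample", 1:40), paste0("gene", 1:n_genes)))
Y <- factor(rep(c("control", "disease"), each = 20))

# Shift the first 20 genes in the disease group so there is something to find
X[Y == "disease", 1:20] <- X[Y == "disease", 1:20] + 1.5

# Univariate analysis: one Welch t-test per gene, independent of all other genes
pvals <- apply(X, 2, function(g) {
  t.test(g[Y == "control"], g[Y == "disease"])$p.value
})

# Benjamini-Hochberg correction across all genes
padj <- p.adjust(pvals, method = "BH")

# Genes called DE at a 5% false discovery rate
de_genes <- names(padj)[padj < 0.05]
de_genes
```

Each gene gets its own test statistic here, whereas PLS-DA scores genes only through their joint contribution to a component.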

But assuming you are still interested in PLS-DA, then yes, you can set keepX (the number of variables to select per component) as you wish. The tuning gives you an indication, but ultimately you can vary this parameter. Once you have fitted your final sPLS-DA model, it is worthwhile to run perf() to estimate its performance. This will tell you how well the model actually classifies, as evaluated by cross-validation.
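
As a rough sketch of what that could look like, the code below fits an sPLS-DA with a hand-picked keepX and then runs perf() with repeated M-fold cross-validation. The data are simulated for illustration, and the values keepX = c(30, 30), folds = 5 and nrepeat = 10 are arbitrary assumptions, not recommendations for your dataset.

```r
library(mixOmics)

# Simulated example data (an assumption, not real expression values)
set.seed(1)
X <- matrix(rnorm(40 * 200), nrow = 40,
            dimnames = list(paste0("sample", 1:40), paste0("gene", 1:200)))
Y <- factor(rep(c("control", "disease"), each = 20))
X[Y == "disease", 1:20] <- X[Y == "disease", 1:20] + 1.5

# Fit sPLS-DA with a user-chosen number of genes per component
# (larger than what tuning suggested, to see how performance changes)
fit <- splsda(X, Y, ncomp = 2, keepX = c(30, 30))

# Estimate classification performance by repeated M-fold cross-validation
cv <- perf(fit, validation = "Mfold", folds = 5, nrepeat = 10,
           progressBar = FALSE)

# Balanced error rate per component and prediction distance
cv$error.rate$BER

# Genes selected on component 1
selectVar(fit, comp = 1)$name
```

Refitting with a few different keepX values and comparing the error rates shows how many genes you can keep before the discrimination starts to degrade.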

Kim-Anh