a) Dealing with confounders: One of the risk factors for the disease I am working on is age, and certain RNA transcripts also increase with age. So if I find a top candidate whose expression increases, I cannot be sure whether the increase is due to age or to the disease. Is there a way to deal with this?
b) Integration of clinical data with RNA-seq: I have a dataset that contains categorical variables, e.g. diabetic (yes/no), and continuous variables, e.g. white blood cell count. What is the recommended way to integrate such data with RNA-seq expression data from the same samples?
a) mixOmics models cannot currently account for confounding effects such as batch effects. If you find a strong age effect in your data that confounds your biological effect of interest (you can test this in mixOmics by using age as your Y variable), you may need to correct for it upstream of mixOmics. See this post for some suggestions.
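One common upstream correction is to regress each transcript on age and disease status together, then subtract only the fitted age component. This is not a mixOmics feature and is only one of several options. Fitting both terms jointly means the disease signal is not stripped out along with age. The pure-Python sketch below assumes one transcript at a time, a numeric age and a 0/1 disease indicator. The function names `ols` and `remove_age_effect` are hypothetical. In R, limma's `removeBatchEffect()` works on a similar principle: age is passed through its `covariates` argument and the disease design through `design`.

```python
from typing import Any, List, Sequence


def ols(X: List[List[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients for y ~ X via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    m = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        p = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[p] = m[p], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (m[i][k] - sum(m[i][j] * b[j] for j in range(i + 1, k))) / m[i][i]
    return b


def remove_age_effect(expr: Sequence[float], age: Sequence[Any],
                      disease: Sequence[Any]) -> List[float]:
    """Fit expr ~ 1 + age + disease, then subtract only the fitted age term.

    Age is centred so adjusted values stay on the original scale. The disease
    effect is estimated jointly with age and left in the data.
    """
    X = [[1.0, float(a), float(d)] for a, d in zip(age, disease)]
    beta_age = ols(X, expr)[1]
    mean_age = sum(age) / len(age)
    return [y - beta_age * (a - mean_age) for y, a in zip(expr, age)]


# Hypothetical noise-free example: expression = 2 + 0.5*age + 3*disease
age = [30, 40, 50, 60, 35, 45, 55, 65]
disease = [0, 0, 0, 0, 1, 1, 1, 1]
expr = [2 + 0.5 * a + 3 * d for a, d in zip(age, disease)]
adjusted = remove_age_effect(expr, age, disease)
# controls ≈ 25.75 each, cases ≈ 28.75 each: the age trend is gone, the disease gap of 3 remains
```

Fitting age alone and keeping the residuals would also remove any disease signal that is correlated with age. That is exactly the confounded situation in the question, so the disease term stays in the model. Correcting with the outcome in the model and then fitting a supervised model on the same data can also make downstream performance estimates somewhat optimistic.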
b) Mixing categorical and continuous variables is challenging, so we recommend separating them into two matrices. Convert the categorical variables (like diabetic yes/no) into dummy numeric variables using the function unmap(). Check out these posts where similar questions have been answered: here and here. If you run a DIABLO model to integrate your RNA-seq and clinical data with disease as the outcome, it should identify the relevant variables (e.g. diabetic) that explain the disease outcome.
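The two-matrix layout can be sketched as follows, assuming the clinical data arrive as one record per sample. Each categorical variable becomes one 0/1 indicator column per level, and the continuous variables go into a separate block. The sketch is plain Python for illustration only. `split_clinical` and the `wbc` column name are hypothetical. Both blocks keep the input row order, so the records must be in the same sample order as the RNA-seq matrix.

```python
from typing import Any, Dict, List, Sequence, Tuple

Block = Tuple[List[str], List[List[float]]]  # (column names, rows per sample)


def split_clinical(records: Sequence[Dict[str, Any]],
                   categorical: Sequence[str],
                   continuous: Sequence[str]) -> Tuple[Block, Block]:
    """Return a dummy-coded categorical block and a numeric continuous block."""
    levels: Dict[str, List[str]] = {}
    cat_names: List[str] = []
    for var in categorical:
        # Levels sorted alphabetically so column order is reproducible
        levels[var] = sorted({str(r[var]) for r in records})
        cat_names += [f"{var}_{lvl}" for lvl in levels[var]]
    # One 0/1 indicator column per level, one row per sample
    cat_rows = [
        [1.0 if str(r[var]) == lvl else 0.0
         for var in categorical for lvl in levels[var]]
        for r in records
    ]
    cont_rows = [[float(r[var]) for var in continuous] for r in records]
    return (cat_names, cat_rows), (list(continuous), cont_rows)


records = [
    {"diabetic": "yes", "wbc": 6.1},
    {"diabetic": "no", "wbc": 7.4},
    {"diabetic": "yes", "wbc": 5.0},
]
(cat_names, cat), (cont_names, cont) = split_clinical(records, ["diabetic"], ["wbc"])
# cat_names == ['diabetic_no', 'diabetic_yes']
# cat == [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
```

Keeping the indicator columns apart from the continuous measurements lets each block be scaled and weighted on its own when both are passed to a multi-block model alongside the expression data.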