Valid data types?

David · February 24, 2021, 11:20pm

Dear forum members,

I am new to mixOmics and I am wondering whether it is good practice to use continuous data wherever possible or whether, on the contrary, it is better to keep categorical/binary data as it is.

I mean, I have some data that would normally be coded as categorical (or binary) values but that can also be coded as continuous values. This is the case for genomic variants, which can be coded as 1 or 0 depending on whether the gene of interest is mutated or not, but can also be modeled as allele frequencies ranging between 0 and 1.

Another example is copy number variation (CNV), which can take values of 0 (absence), 1 (one copy), 2 (diploid, so the normal state in humans), 3 (one extra copy), 4 (two extra copies), and so on. These values can be transformed into log2 ratios, making them continuous too.
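To make the log2-ratio idea concrete, here is a minimal sketch in plain Python. It assumes the ratio is taken against a diploid reference of 2 copies. The function name `cnv_to_log2_ratio` and the `floor` value that stands in for zero copies (since log2 of 0 is undefined) are hypothetical choices for illustration, not something prescribed by mixOmics.

```python
import math

def cnv_to_log2_ratio(copy_number, ploidy=2, floor=0.05):
    """Convert an integer copy number to a log2 ratio against `ploidy`.

    `floor` is an assumed small positive value substituted for a copy
    number of 0, because log2(0) is undefined.
    """
    if copy_number < 0:
        raise ValueError("copy number cannot be negative")
    # Clamp to the floor so homozygous deletions map to a large negative value
    return math.log2(max(copy_number, floor) / ploidy)

copy_numbers = [0, 1, 2, 3, 4]
log2_ratios = [cnv_to_log2_ratio(cn) for cn in copy_numbers]

for cn, ratio in zip(copy_numbers, log2_ratios):
    print(cn, round(ratio, 3))
```

With this coding the diploid state maps to 0, losses are negative and gains are positive, so the ordering of the original copy numbers is preserved.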

So here’s my question: what’s more desirable? Is there any specific type of data modality that we should avoid?

Thank you!

aljabadi · February 25, 2021, 11:54pm

Hi @David,

Welcome to the mixOmics community, and thank you for your query.

In short, our methods assume that the data are continuous, even if you input binary variables. One exception is the Y variable in discriminant analyses (PLS-DA), which is assumed to be categorical. So yes, it’s best to keep the continuous nature of the data, except in that case. That said, you can generally also incorporate categorical variables in PLS models. If such a variable has more than two categories, it should be ordinal. For instance, it is not possible to incorporate tissue type as a continuous variable (we typically can’t assume lung > liver > pancreas), whereas the CNV example you mentioned is an ordinal categorical variable.
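The ordinal versus nominal distinction can be sketched in plain Python. The helper names `ordinal_codes` and `indicator_columns` are hypothetical. Splitting a nominal variable into one binary column per level is a general workaround assumed here for illustration; it is not something this thread confirms as the recommended mixOmics approach.

```python
def ordinal_codes(values, ordered_levels):
    """Map each value to its rank in `ordered_levels` (meaningful order only)."""
    index = {level: i for i, level in enumerate(ordered_levels)}
    return [index[v] for v in values]

def indicator_columns(values):
    """Build one 0/1 column per level, so no order is imposed on the levels."""
    levels = sorted(set(values))
    return {level: [1 if v == level else 0 for v in values] for level in levels}

# CNV states have a natural order, so integer codes are defensible
cnv_states = ["loss", "normal", "gain", "normal"]
cnv_numeric = ordinal_codes(cnv_states, ["loss", "normal", "gain"])

# Tissue types have no order; coding them 0, 1, 2 would imply
# lung > liver > pancreas (or similar), so use binary columns instead
tissues = ["lung", "liver", "pancreas", "lung"]
tissue_indicators = indicator_columns(tissues)

print(cnv_numeric)
print(tissue_indicators)
```

Each indicator column has only two categories, which avoids imposing an arbitrary ranking on the tissue types.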

Hope it helps.

Al
