Error performing rCCA

Hello,
I’m have been using mixOmics to perform Kernel PCA’s. Now, I’m trying to use rCCA to assess the correlation between a presence/absence matrix of venoms (97 masses in columns, 23 individuals in rows) and a matrix of behaviours of the same individuals (7 continuous behavioural variables, 23 individuals). When I performed both cross-validation and shrinkage methods, I obtained the variates, loadings etc.

CV.rcc.valida<-rcc(Xv,Yb, ncomp = 3,
lambda1 = opt.l1_val, lambda2 = opt.l2_val)

However, when I tried to do the plotVar to observe the correlation structure I got this error:

plotVar(results.rCCA.valida2, var.names = c(TRUE, TRUE),
cex = c(4, 4), cutoff = 0.5,
title = ‘(b) H. valida, rCCA shrinkage comp 1 - 2’)

Error in (ind.group[i] + 1):ind.group[i + 1] : NA/NaN argument
In addition: Warning message:
In cor(object$Y, object$variates$X[, c(comp1, comp2)] + object$variates$Y[, :
**** the standard deviation is zero****

Could someone please help me?

When I tried to run the plotVar like this
plotVar(results.rCCA.valida2)

I got the following error
Warning messages:
1: In cor(object$Y, object$variates$X[, c(comp1, comp2)] + object$variates$Y[, :
the standard deviation is zero
2: Removed 14 rows containing missing values (geom_point).
3: Removed 14 rows containing missing values (geom_text).

I’d really appreciate some help

Attached is a sample of both behaviour and venoms datasets.

Hi @Linda_Duran,

Sorry to hear about your issue. I’m unable to reproduce your error for a few reasons. Firstly, the following call you provided:

CV.rcc.valida<-rcc(Xv,Yb, ncomp = 3,
lambda1 = opt.l1_val, lambda2 = opt.l2_val)

is not what you have inputted into the plotVar() function: the object name you use here is results.rCCA.valida2. I can’t confirm if you have used the same rcc() call.

Next, I can assume (but can’t be sure) that you’ve used the tune.rcc() function to yield the opt.l1_val and opt.l2_val values. This function can not handle NA's internally, meaning your inputted Xv and Yb dataframes shouldn’t have any missing values.

I have a few questions for you:

  1. In your second attempt noted above (plotVar(results.rCCA.valida2)) where only warnings were raised, was a plot produced?
  2. Do either of your inputted dataframes contain NA's? This can be checked via which(is.na(Xv)) and which(is.na(Yb)). Also, why are they named Xv and Yb?
  3. Have you used the tune.rcc() function or another method to yield your lambda values?
  4. What does the resulting object from rcc() look like? Does the results.rCCA.valida2$variates or results.rCCA.valida2$loadings contain any NA's?

Preferably, you can send me an email (mbladen19@gmail.com) with your data and your scripts and I can try and figure this out. Without more information I am unable to reproduce, and therefore solve, your issue.

Cheers,
Max.

After exploring the data, I have determined what the issue is. Within the Xv dataframe, there are a total of 14 features which have the same values for all samples (determined via the code which(apply(Xv, 2, sd) == 0)). This is a serious problem when calculating correlations (especially the Pearson correlation - that which is used by plotVar()). Consider its formula:

image

When we have zero variance, the mean (x.bar) will always be equal to each sample value (x.i). Hence, in the above equation, x.i - x.bar will always equal 0. This creates an issue: such that the denominator is 0 (which is impossible in mathematics). Hence, the correlation cannot be calculated which leaves an NA in that position in the dataframe.

It can be seen in your second attempt in this post (plotVar(results.rCCA.valida2)) that the package we use (ggplot) can figure this out and removes those 14 rows of correlations which correspond to the 14 features with zero variance. Hence, it was able to produce a plot but raised warnings:
Removed 14 rows containing missing values (geom_point).

In your first attempt however, due to the specific logic of the plotVar() function, the presence of the cutoff parameter does not allow ggplot to remove these rows automatically. The correlation dataframe is then passed to ggplot with NA's within it which raises a fatal error - resulting in no plot being produced at all:
In cor(object$Y, object$variates$X[, c(comp1, comp2)] + object$variates$Y[, : the standard deviation is zero

Below, I’ll add the code which I used to determine this. I’d recommend you run this for yourself so you can see what I’m talking about in action.

Hope this helped.

Cheers,
Max.

library(mixOmics)
#library(devtools)
#setwd("~/GitHub/mixOmics/R")
#load_all()

# ============================================================================ #
# Set up data

# read in csv's
Xv<- read.csv("Val_sum_venomt.csv") # 30 x 98
Yb<- read.csv("Behaviours_val.csv") # 23 x 7

rownames(Xv) <- Xv$ID # set ID feature as rownames and remove ID feature
Xv <- Xv[,-1]

rownames(Yb) <- Yb$ID # set ID feature as rownames and remove ID feature
Yb <- Yb[,-1]

# identify which rows of Xv are also in Yb
# remove any rows from Xv that don't have equivalent sample in Yb
usable.idx <- which(rownames(Xv) %in% rownames(Yb))
Xv <- Xv[usable.idx, ]

# ============================================================================ #
# Using the ridge methodology without removing any features

# set the grid
grid <- seq(0.001,0.2,length=10)

# tune the rcc object to yield the optimal lambda values
cv.tunercc.val<-tune.rcc(Xv,Yb,
                         grid1 = grid,
                         grid2 = grid,
                         validation = "loo")

# extract these optimal lambda values
opt.l1_val<-cv.tunercc.val$opt.lambda1
opt.l2_val<-cv.tunercc.val$opt.lambda2


CV.rcc.valida.ridge <- rcc(Xv,Yb, method = "ridge",
                           lambda1 = opt.l1_val, lambda2 = opt.l2_val)
### produces no errors

plot(CV.rcc.valida.ridge, type = "barplot") 
### produces no errors

plotVar(CV.rcc.valida.ridge)
### PRODUCES THE FOLLOWING WARNINGS (but still plots):
# Warning messages:
# 1: In cor(object$X, object$variates$X[, c(comp1, comp2)] + object$variates$Y[,  : 
#       the standard deviation is zero
# 2: Removed 14 rows containing missing values (geom_point). 
# 3: Removed 14 rows containing missing values (geom_text).

# ============================================================================ #
# Using the shrinkage methodology without removing any features

CV.rcc.valida.shrink <- rcc(Xv,Yb, method = "shrinkage")
### PRODUCES THE FOLLOWING WARNINGS:
# Warning messages:
#   1: 14 instances of variables with zero scale detected! 
#   2: 14 instances of variables with zero scale detected! 
#   3: 14 instances of variables with zero scale detected! 

plot(CV.rcc.valida.shrink, type = "barplot")
### produces no errors

plotVar(CV.rcc.valida.shrink)
### PRODUCES THE FOLLOWING WARNINGS (but still plots):
# Warning messages:
# 1: In cor(object$X, object$variates$X[, c(comp1, comp2)] + object$variates$Y[,  : 
#       the standard deviation is zero
# 2: Removed 14 rows containing missing values (geom_point). 
# 3: Removed 14 rows containing missing values (geom_text).


# ============================================================================ #
# Let's now try removing rows which have all 0s or all 1s, meaning that they
# have no variation (and hence no standard deviation)

# which features have zero variance (all values are the same)
zero.var.feats <- as.vector(which(apply(Xv, 2, sd) == 0))
Xv <- Xv[, -zero.var.feats] # remove them from the Xv dataframe

# ============================================================================ #
# Using the ridge methodology AFTER removing zero-variance features

CV.rcc.valida2.ridge <- rcc(Xv,Yb, method = "ridge",
                            lambda1 = opt.l1_val, lambda2 = opt.l2_val)
### produces no errors

plot(CV.rcc.valida2.ridge, type = "barplot")
### produces no errors

plotVar(CV.rcc.valida2.ridge)
### produces no errors

# ============================================================================ #
# Using the shrinkage methodology AFTER removing zero-variance features

CV.rcc.valida2.shrink <- rcc(Yb,Xv, method = 'shrinkage')
### produces no errors

plot(CV.rcc.valida2.shrink, type = "barplot")
### produces no errors

plotVar(CV.rcc.valida2.shrink)
### produces no errors

plotVar(CV.rcc.valida2.shrink, var.names = c(T,F),
        cex = c(4, 4), cutoff = 0.5,
        title = "(b) H. valida, rCCA shrinkage comp 1 - 2")
### produces no errors

cim(CV.rcc.valida2.shrink, comp = 1:2, xlab = "Venom masses", ylab = "behaviours")
### produces no errors


Hi Mark,

Thank you very much for your time and help.

I ran the code and I didn’t have any issues. I have another question. CCA use chi-square to obtained the correlation matrix; however one of my matrix is binary. Is it possible to change the distance when is calculating the rCCA?.
Similarly, is it possible, after getting rCCA variates of both matrices, to calculate a permutation test to determine if the constrained axes obtained between both matrices are important?

Thank you,

Linda Duran