After exploring the data, I have determined what the issue is. Within the Xv dataframe, 14 features have the same value for all samples (found via which(apply(Xv, 2, sd) == 0)). This is a serious problem when calculating correlations, in particular the Pearson correlation, which is what plotVar() uses. Consider its formula:

$$ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$

When a feature has zero variance, its mean (x.bar) is equal to every sample value (x.i). Hence, in the above equation, x.i - x.bar always equals 0. This makes the denominator 0 (and the numerator too), and division by zero is undefined. Hence, the correlation cannot be calculated, which leaves an NA in that position in the dataframe.
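To see this behaviour outside of mixOmics, here is a minimal base-R sketch. The vectors x_const, x_vary and variate are made-up stand-ins, not your actual Xv columns or rCCA variates.

```r
set.seed(1)

x_const <- rep(1, 10)   # a zero-variance "feature": every sample has the same value
x_vary  <- rnorm(10)    # an ordinary feature with some spread
variate <- rnorm(10)    # stand-in for a component/variate to correlate against

sd(x_const)             # 0

# cor() warns "the standard deviation is zero" and returns NA for the constant
# feature; the warning is suppressed here only to keep the output quiet
r_const <- suppressWarnings(cor(x_const, variate))
r_vary  <- cor(x_vary, variate)

is.na(r_const)          # TRUE
is.na(r_vary)           # FALSE
```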
It can be seen in your second attempt in this post (plotVar(results.rCCA.valida2)) that the plotting package we use (ggplot2) can figure this out and removes the 14 rows of correlations which correspond to the 14 features with zero variance. Hence, it was able to produce a plot but raised warnings:
Removed 14 rows containing missing values (geom_point).
In your first attempt, however, due to the specific logic of the plotVar() function, the presence of the cutoff parameter prevents ggplot2 from removing these rows automatically. The correlation dataframe is then passed to ggplot2 with NAs in it, which raises a fatal error, so no plot is produced at all:
In cor(object$Y, object$variates$X[, c(comp1, comp2)] + object$variates$Y[, : the standard deviation is zero
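I have not reproduced the internals of plotVar() here, so treat the following only as an illustration of why NA values and a threshold can interact badly in R in general. The cors dataframe and the cutoff value are invented for the example.

```r
# Toy correlation table (made-up values); feature "f2" stands in for a
# zero-variance feature whose correlation came out as NA
cors <- data.frame(feature = c("f1", "f2", "f3"),
                   r       = c(0.9, NA, 0.2))
cutoff <- 0.5

# Logical indexing: comparing NA with the cutoff gives NA, and an NA index
# selects a row made entirely of NAs
kept.logical <- cors[abs(cors$r) > cutoff, ]

# which() discards NA comparisons, so only real rows above the cutoff remain
kept.which <- cors[which(abs(cors$r) > cutoff), ]

nrow(kept.logical)  # 2: "f1" plus one all-NA row
nrow(kept.which)    # 1: just "f1"
```

Removing the zero-variance features before fitting, as in the script further down, sidesteps the question entirely.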
Below, I’ll add the code which I used to determine this. I’d recommend you run this for yourself so you can see what I’m talking about in action.
Hope this helped.
Cheers,
Max.
library(mixOmics)
#library(devtools)
#setwd("~/GitHub/mixOmics/R")
#load_all()
# ============================================================================ #
# Set up data
# read in csv's
Xv <- read.csv("Val_sum_venomt.csv") # 30 x 98
Yb <- read.csv("Behaviours_val.csv") # 23 x 7
rownames(Xv) <- Xv$ID # set ID feature as rownames and remove ID feature
Xv <- Xv[,-1]
rownames(Yb) <- Yb$ID # set ID feature as rownames and remove ID feature
Yb <- Yb[,-1]
# identify which rows of Xv are also in Yb
# remove any rows from Xv that don't have equivalent sample in Yb
usable.idx <- which(rownames(Xv) %in% rownames(Yb))
Xv <- Xv[usable.idx, ]
# ============================================================================ #
# Using the ridge methodology without removing any features
# set the grid
grid <- seq(0.001, 0.2, length.out = 10)
# tune the rcc object to yield the optimal lambda values
cv.tunercc.val <- tune.rcc(Xv, Yb,
                           grid1 = grid,
                           grid2 = grid,
                           validation = "loo")
# extract these optimal lambda values
opt.l1_val <- cv.tunercc.val$opt.lambda1
opt.l2_val <- cv.tunercc.val$opt.lambda2
CV.rcc.valida.ridge <- rcc(Xv, Yb, method = "ridge",
                           lambda1 = opt.l1_val, lambda2 = opt.l2_val)
### produces no errors
plot(CV.rcc.valida.ridge, type = "barplot")
### produces no errors
plotVar(CV.rcc.valida.ridge)
### PRODUCES THE FOLLOWING WARNINGS (but still plots):
# Warning messages:
# 1: In cor(object$X, object$variates$X[, c(comp1, comp2)] + object$variates$Y[, :
# the standard deviation is zero
# 2: Removed 14 rows containing missing values (geom_point).
# 3: Removed 14 rows containing missing values (geom_text).
# ============================================================================ #
# Using the shrinkage methodology without removing any features
CV.rcc.valida.shrink <- rcc(Xv,Yb, method = "shrinkage")
### PRODUCES THE FOLLOWING WARNINGS:
# Warning messages:
# 1: 14 instances of variables with zero scale detected!
# 2: 14 instances of variables with zero scale detected!
# 3: 14 instances of variables with zero scale detected!
plot(CV.rcc.valida.shrink, type = "barplot")
### produces no errors
plotVar(CV.rcc.valida.shrink)
### PRODUCES THE FOLLOWING WARNINGS (but still plots):
# Warning messages:
# 1: In cor(object$X, object$variates$X[, c(comp1, comp2)] + object$variates$Y[, :
# the standard deviation is zero
# 2: Removed 14 rows containing missing values (geom_point).
# 3: Removed 14 rows containing missing values (geom_text).
# ============================================================================ #
# Let's now try removing features (columns) which are all 0s or all 1s, meaning
# that they have no variation (and hence a standard deviation of zero)
# which features have zero variance (all values are the same)
zero.var.feats <- as.vector(which(apply(Xv, 2, sd) == 0))
# remove them from the Xv dataframe; the length check matters because
# Xv[, -integer(0)] would drop every column if nothing was found
if (length(zero.var.feats) > 0) Xv <- Xv[, -zero.var.feats]
# ============================================================================ #
# Using the ridge methodology AFTER removing zero-variance features
CV.rcc.valida2.ridge <- rcc(Xv, Yb, method = "ridge",
                            lambda1 = opt.l1_val, lambda2 = opt.l2_val)
### produces no errors
plot(CV.rcc.valida2.ridge, type = "barplot")
### produces no errors
plotVar(CV.rcc.valida2.ridge)
### produces no errors
# ============================================================================ #
# Using the shrinkage methodology AFTER removing zero-variance features
CV.rcc.valida2.shrink <- rcc(Yb,Xv, method = 'shrinkage')
### produces no errors
plot(CV.rcc.valida2.shrink, type = "barplot")
### produces no errors
plotVar(CV.rcc.valida2.shrink)
### produces no errors
plotVar(CV.rcc.valida2.shrink, var.names = c(TRUE, FALSE),
        cex = c(4, 4), cutoff = 0.5,
        title = "(b) H. valida, rCCA shrinkage comp 1 - 2")
### produces no errors
cim(CV.rcc.valida2.shrink, comp = 1:2, xlab = "Venom masses", ylab = "behaviours")
### produces no errors
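
If you want to reuse the zero-variance filter on other datasets, a logical mask is a bit safer than negative indexing with which(): when which() finds nothing, -integer(0) selects zero columns rather than all of them. Below is a small sketch in base R. drop_zero_var is a hypothetical helper name of my own (not a mixOmics function), and toy is made-up data.

```r
# Hypothetical helper: keep only columns whose standard deviation is
# defined and greater than zero
drop_zero_var <- function(df) {
  sds  <- vapply(df, sd, numeric(1), na.rm = TRUE)
  keep <- !is.na(sds) & sds > 0
  df[, keep, drop = FALSE]
}

# Made-up data: columns b and c are constant
toy <- data.frame(a = c(1, 2, 3),
                  b = c(0, 0, 0),
                  c = c(1, 1, 1),
                  d = c(2, 5, 9))

cleaned <- drop_zero_var(toy)
names(cleaned)                  # "a" "d"

# When nothing needs removing, every column is kept
unchanged <- drop_zero_var(cleaned)
ncol(unchanged)                 # 2

# The pitfall with negative indexing: an empty index drops everything
ncol(cleaned[, -integer(0)])    # 0
```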