How does tune.rcc handle NAs?

While running the following line

tune.rcc(X, Y, grid1 = grid1, grid2 = grid2, validation = "Mfold")

I got the following warning:

Calls: tune.rcc → apply → FUN → Mfold → rcc → explained_variance
Warning in explained_variance(result$Y, result$variates$Y, ncomp) :
NA values put to zero, results will differ from PCA methods
used with NIPALS

Does it mean the NAs were filled with zero?

Hi @blueskypie,

Thanks for getting in touch regarding your question.

The missing values are replaced by 0 only when calculating the explained variance of the components. They are ignored in the iterative algorithm that derives those components. Because the explained variance calculation centres the data matrices first, a 0 on the centred scale corresponds to the column mean, so technically the missing values are replaced by the mean of their column, which disregards them in this calculation as well. Note, however, that this mean is computed from the observed values only, so it may differ from what it would have been had the missing values been known.

That being said, our warning message used to mention only the PCA calculations, although it applies to all explained variance calculations where there are missing values. This has been fixed in the latest development version.
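To make the centring argument concrete, here is a small base R check on a made-up matrix. It is only a sketch of the arithmetic, not the mixOmics code: the toy data and all variable names are assumptions for illustration. It shows that putting NAs to zero after centring gives the same matrix as imputing each NA with its column mean before centring.

```r
# Toy matrix with two missing values (hypothetical data, filled column-wise).
X <- matrix(c(1, 2, NA, 4,
              10, NA, 30, 40),
            ncol = 2)

# Centre each column; scale() computes the column means over observed values only.
Xc <- scale(X, center = TRUE, scale = FALSE)

# Route 1: put the NAs to zero on the centred scale.
Xc_zero <- Xc
Xc_zero[is.na(Xc_zero)] <- 0

# Route 2: impute each NA with its column mean, then centre with the same means.
col_means <- colMeans(X, na.rm = TRUE)
X_imp <- X
X_imp[is.na(X)] <- col_means[col(X)[is.na(X)]]
X_imp_c <- sweep(X_imp, 2, col_means)

# Compare the two routes, ignoring the attributes that scale() attaches.
same <- isTRUE(all.equal(Xc_zero, X_imp_c, check.attributes = FALSE))
print(same)
```

The column means here come from the observed entries only, which is the caveat mentioned above.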

Hope it helps.

Al

Thank you so much for the quick response, I really appreciate it! So “put NAs to zero” effectively ignores the missing values when computing the variance, since the column mean is zero after centering. Is it then correct to say that missing values are not imputed, and are in fact ignored, in tune.rcc?

Hi @blueskypie,

This is correct, but not because of how the explained variance calculations handle missing values (the explained variance of a component is not what rcc is trying to optimise). The explained variance of a component is only calculated as a summary statistic after the rcc algorithm has extracted the components. Basically, the steps are:

i) Extract the component while ignoring missing values
THEN
ii) Calculate the explained variance of the component while ignoring the missing values
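
The order of the two steps can be sketched in base R as below. This is a generic illustration and not the rcc or explained_variance code from mixOmics: a PCA-like component stands in for the rcc component, and the data, the pairwise-complete covariance and the score formula are all assumptions chosen to keep the example short.

```r
set.seed(1)
X <- matrix(rnorm(40), nrow = 10)   # hypothetical data: 10 samples, 4 variables
X[2, 3] <- NA
X[7, 1] <- NA

Xc  <- scale(X, center = TRUE, scale = FALSE)  # column means over observed values
obs <- !is.na(Xc)

# Step i: extract a component while ignoring the missing values.
# Loadings come from a covariance built on pairwise-complete observations.
p <- eigen(cov(Xc, use = "pairwise.complete.obs"), symmetric = TRUE)$vectors[, 1]

# Each sample's score uses only the variables observed for that sample.
scores <- sapply(seq_len(nrow(Xc)), function(i) {
  o <- obs[i, ]
  sum(Xc[i, o] * p[o]) / sum(p[o]^2)
})

# Step ii: compute the explained variance afterwards, as a summary statistic.
# Missing entries are put to zero on the centred scale for this step only.
X0 <- Xc
X0[!obs] <- 0
a <- crossprod(scores, X0)                      # 1 x 4 projection of X0 on the scores
expl_var <- sum(a^2) / sum(scores^2) / sum(X0^2)
print(expl_var)
```

The zero-filling happens only in step ii, so it has no influence on the component extracted in step i.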

Hope it helps.

Al