Hello,
I am running a simple PLS1 analysis where my X matrix contains a moderate amount of missing data. When I try to tune the number of components with perf(), I get the error message “Error: missing data in ‘X’ and/or ‘Y’. Use ‘nipals’ for dealing with NAs.”
Is it a reasonable solution to first impute my X matrix using impute.nipals() and then run perf() on the imputed data? I’m a little concerned, because the pls1 (fitted on X with NAs) and pls2 (fitted on imputed X) objects below have quite different components from dimension 2 upwards.
# fit on the original X, which still contains NAs
pls1 <- pls(CGdat[, var_names_trans], CGdat$timeGrazed, scale = TRUE, ncomp = 20, mode = "classic")
# tuning number of components: impute X with NIPALS first, then refit
X.impute <- impute.nipals(X = CGdat[, var_names_trans], ncomp = 20)
pls2 <- pls(X.impute, CGdat$time, scale = TRUE, ncomp = 20, mode = "classic")
perf.pls <- perf(pls2, validation = 'Mfold',
                 folds = 10, nrepeat = 5)
plot(perf.pls, criterion = 'R2')
I also get some very strange results for criteria other than R2, with the SDs exploding after about 2 or 3 components. I assume this must be related to the missingness, but any advice is welcome. Note that my data set is relatively small:
dim(CGdat[, var_names_trans])
[1] 112 32
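
In case it helps to reproduce the issue, here is a minimal sketch of how the two fits could be compared component by component. It uses simulated data of the same size as mine (not my real data). The 10% missingness rate, the choice of 5 components, and the names X.na, fit.na, fit.imp and comp.cor are all assumptions for illustration. It reports the absolute correlation between matching X-variates, since the sign of a component is arbitrary.

```r
library(mixOmics)

set.seed(1)

# Simulated stand-in for the real data: 112 samples, 32 predictors (assumed).
n <- 112
p <- 32
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- X[, 1] - X[, 2] + rnorm(n)

# Knock out roughly 10% of the entries at random (assumed missingness rate).
X.na <- X
X.na[sample(n * p, size = round(0.1 * n * p))] <- NA

ncomp <- 5  # kept small for the illustration

# Fit on X with NAs, then on the NIPALS-imputed X.
fit.na  <- pls(X.na, y, ncomp = ncomp, scale = TRUE, mode = "classic")
X.imp   <- impute.nipals(X = X.na, ncomp = ncomp)
fit.imp <- pls(X.imp, y, ncomp = ncomp, scale = TRUE, mode = "classic")

# Absolute correlation between matching X-variates of the two fits.
# Values near 1 mean the component is essentially the same in both fits.
comp.cor <- sapply(seq_len(ncomp), function(k) {
  abs(cor(fit.na$variates$X[, k], fit.imp$variates$X[, k],
          use = "complete.obs"))
})
print(round(comp.cor, 2))
```

If the correlations drop sharply from the second component onwards on the real data as well, that would at least confirm the difference comes from the imputation step rather than from the tuning.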