Perf.pls with missing data

kirsty.hassall · August 24, 2023, 4:02pm

Hello,
I am running a simple PLS1 analysis where my X matrix contains a reasonable level of missing data. When I try to tune the number of components using perf, I get the error message “Error: missing data in ‘X’ and/or ‘Y’. Use ‘nipals’ for dealing with NAs.”

Is it a reasonable solution to first impute my X matrix using impute.nipals and then run perf? I’m a little concerned as the results in pls1 and pls2 objects (see below) have quite different components from dimension 2 upwards.

# PLS fitted on the raw X (contains NAs)
pls1 <- pls(CGdat[, var_names_trans], CGdat$timeGrazed, scale = TRUE, ncomp = 20, mode = "classic")

# impute X with NIPALS, then refit
X.impute <- impute.nipals(X = CGdat[, var_names_trans], ncomp = 20)
pls2 <- pls(X.impute, CGdat$time, scale = TRUE, ncomp = 20, mode = "classic")

# tuning number of components
perf.pls <- perf(pls2, validation = 'Mfold',
                 folds = 10, nrepeat = 5)

plot(perf.pls, criterion = 'R2')

I also get some very strange results for criteria other than R2, with SDs exploding after about 2 or 3 components. I assume this must be related to the missingness, but any advice is welcome. Note that my data set is relatively small:
dim(CGdat[, var_names_trans])
[1] 112 32

kimanh.lecao · August 31, 2023, 11:47pm

Hi @kirsty.hassall,

I can’t see your outputs, but yes, you should impute with NIPALS first. Also check beforehand that fewer than 20% of the values in your dataset are missing, as NIPALS can only do so much. If more than that are missing, you may have to remove the variables with too many missing values.
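
A rough sketch of that check in base R, in case it helps. The matrix X here is simulated as a stand-in for CGdat[, var_names_trans], and applying the 20% cut-off per variable (rather than only to the dataset as a whole) is an assumption made for illustration:

```r
# Simulated stand-in for CGdat[, var_names_trans]: 112 samples x 32 variables
set.seed(1)
X <- matrix(rnorm(112 * 32), nrow = 112, ncol = 32,
            dimnames = list(NULL, paste0("V", 1:32)))
X[sample(length(X), 300)] <- NA   # scatter missing values at random
X[1:60, "V5"] <- NA               # make one variable mostly missing

overall_na <- mean(is.na(X))      # proportion of missing cells in the whole matrix
na_by_var  <- colMeans(is.na(X))  # proportion of missing values per variable

max_na <- 0.20                    # assumed per-variable cut-off
keep   <- na_by_var <= max_na
X.filtered <- X[, keep, drop = FALSE]

round(overall_na, 3)              # overall missingness
names(na_by_var)[!keep]           # variables dropped for excess missingness
```

X.filtered would then be the matrix passed to impute.nipals.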

Regarding perf, you may be including too many components; you should aim for a small number (I don't know anything about your data, but up to 10 is probably enough).
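
As an illustration only, the same workflow with a smaller number of components could look like the sketch below. The data are simulated (X and Y are stand-ins, not the real CGdat), and ncomp = 10 is just the upper bound mentioned above, not a recommendation for this particular dataset:

```r
library(mixOmics)

# Simulated stand-in data: 112 samples x 32 variables, univariate response
set.seed(1)
X <- matrix(rnorm(112 * 32), nrow = 112, ncol = 32,
            dimnames = list(NULL, paste0("V", 1:32)))
Y <- X[, 1] - 0.5 * X[, 2] + rnorm(112, sd = 0.5)
X[sample(length(X), 300)] <- NA   # introduce missing values after building Y

ncomp <- 10                       # assumed upper bound, see the reply above

# Impute the missing values with NIPALS, then fit PLS on the completed matrix
X.impute <- impute.nipals(X = X, ncomp = ncomp)
fit <- pls(X.impute, Y, ncomp = ncomp, scale = TRUE, mode = "classic")

# Repeated 10-fold cross-validation to assess each number of components
set.seed(2)
perf.fit <- perf(fit, validation = "Mfold", folds = 10, nrepeat = 5)
plot(perf.fit, criterion = "R2")
```

Using the same ncomp for the imputation and for the PLS fit is a simplification made here; the two do not have to match.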

Kim-Anh
