I have a question about the output of sPCA tuning. When plotting the result of “tune.spca”, the y-axis is labelled “correlation of components”, and the tutorial describes this as the “average correlation between the predicted and actual components”. It is a bit ambiguous to me how exactly this correlation is calculated, and what the “predicted” components are.
All the tune functions work like this (including
- Data randomly segmented into folds
- For each fold:
  - Use this one as testing and all remaining as training.
  - Build sPCA model using training data. Use this to predict variate values of all testing samples. These are the “predicted components”.
  - Build sPCA model using all data (training and testing). Extract variate values for samples contained in testing data. These are the “actual components”.
  - Record the correlation between the predicted components and the actual components
- Repeat this for all repeats and
So the “average correlation between the predicted and actual components” is the mean value of the correlation between the samples’ variates predicted by a training-data model and the samples’ true variates (“true” here refers to the model which uses all samples). Hope this clears things up!
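In case it helps to see the steps above written out, here is a rough sketch in Python with NumPy. It is not the mixOmics code: it uses plain (non-sparse) PCA via SVD as a stand-in for sPCA, a single repeat, and the helper names (`fit_pca`, `variates`, `cv_component_correlation`) are made up for illustration. Taking the absolute correlation is also an assumption on my part, made because the sign of a principal component is arbitrary.

```python
import numpy as np

def fit_pca(X, ncomp):
    """Plain PCA stand-in for sPCA: return column means and loadings (p x ncomp)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:ncomp].T

def variates(X, mean, loadings):
    """Project samples onto loadings: one variate value per sample per component."""
    return (X - mean) @ loadings

def cv_component_correlation(X, ncomp=2, folds=5, seed=0):
    """Mean correlation, per component, between predicted and actual variates."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    fold_ids = rng.permutation(n) % folds  # random, balanced fold assignment

    # "Actual" components: model built on all samples.
    full_mean, full_load = fit_pca(X, ncomp)
    actual_all = variates(X, full_mean, full_load)

    cors = np.zeros((folds, ncomp))
    for k in range(folds):
        test = fold_ids == k
        # "Predicted" components: model built on training samples only,
        # then applied to the held-out samples.
        train_mean, train_load = fit_pca(X[~test], ncomp)
        predicted = variates(X[test], train_mean, train_load)
        actual = actual_all[test]
        for c in range(ncomp):
            # Assumption: absolute value, since a component's sign is arbitrary.
            cors[k, c] = abs(np.corrcoef(predicted[:, c], actual[:, c])[0, 1])
    return cors.mean(axis=0)  # average over folds

# Toy data with two strong underlying directions plus a little noise.
rng = np.random.default_rng(1)
scores = rng.normal(size=(40, 2)) * [5.0, 2.0]
directions = np.linalg.qr(rng.normal(size=(6, 2)))[0]
X = scores @ directions.T + 0.1 * rng.normal(size=(40, 6))
cor = cv_component_correlation(X, ncomp=2, folds=5)
print(cor)
```

In this sketch the “prediction” is just a projection: the held-out samples are centred with the training means and multiplied by the loadings estimated from the training samples. The result has one value per component, each the fold-averaged correlation.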
Thanks for your reply.
So, by “variate values”, do you mean loadings?
Also, if I understand it correctly, PCA is a dimensionality reduction technique in which we go from a dataset to a covariance matrix and perform eigendecomposition on that. So, how can this procedure be used to “predict” variates for another dataset (test set)?