Greetings,
I am hoping to use sparse PLS (sPLS) in mixOmics in a train/test setting. I first build an sPLS model on my training dataset, using cross-validation within that training set to tune the hyperparameters (the number of components and the sparsity levels for my X and Y matrices).
I then want to apply the final trained sPLS model to a test dataset. To do so, I take the loading vectors from the trained model and apply them to the X and Y matrices of my test data. I then compute the correlation between the first X and Y components of the test data and want to assess whether it is significant through permutation testing.
For each permutation, I shuffle the Y data of the training set, fit an sPLS model on the training X and the shuffled Y, apply the resulting loadings to my test dataset, and recompute the correlation between the first components. I want to see where my “real” test correlation falls in this distribution of correlations obtained with shuffled data.
My concern is that the histogram of permuted correlation values looks clearly bimodal.
I want to ask 1) whether this is to be expected when using permutation tests with PLS and applying a trained model to a test dataset, and 2) if not, what I could be doing wrong.
Thank you in advance.