# Best way to structure multilevel PLS?

Hi all, great package. I am trying to figure out the best way to structure my data analysis. I have read and worked through the tutorials, but I still have some questions.

I have a set of plants (different genotypes, but for now I can treat them as individuals). From each plant, I have analyzed every leaf (these are maize plants, so the leaves act roughly as a time variable). On each leaf, I have collected four different spectra, so I have spectra nested within leaves nested within plants. Each spectrum has ~800 data points, each plant has ~15 leaves, and I have 16 plants in total. This is also a big p > n problem.

What I would like to do is predict a second matrix I have collected, which contains direct measurements of the plants. PLS is appropriate for this, but the multilevel aspect is new to me. I would greatly appreciate any input or advice (or additional resources to point me toward). I am also not sure whether DIABLO would be appropriate here if I treat each spectral dataset as an ‘omics’ set.

Here is some sample data to demonstrate the structure. Columns b1:b7 represent example spectral bands. All bands are present in every spectrum, but the spectra are obtained in different ways, so they are not truly independent and there is a good deal of collinearity.

```r
structure(list(Plant = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L), Leaf = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L,
1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L
), Spec = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "a",
"a", "a", "b", "b", "b", "c", "c", "c", "a", "a", "a", "b", "b",
"b", "c", "c", "c"), b1 = c(0.7, 0.384, 0.187, 0.685, 0.655,
0.929, 0.292, 0.551, 0.678, 0.156, 0.412, 0.695, 0.543, 0.892,
0.612, 0.578, 0.207, 0.496, 0.735, 0.43, 0.484, 0.722, 0.994,
0.018, 0.621, 0.383, 0.395), b2 = c(0.777, 0.26, 0.647, 0.451,
0.064, 0.885, 0.828, 0.337, 0.035, 0.425, 0.764, 0.465, 0.501,
0.054, 0.275, 0.645, 0.604, 0.678, 0.172, 0.446, 0.778, 0.965,
0.911, 0.497, 0.596, 0.978, 0.244), b3 = c(0.715, 0.572, 0.571,
0.47, 0.03, 0.776, 0.82, 0.756, 0.75, 0.835, 0.933, 0.633, 0.918,
0.603, 0.608, 0.15, 0.94, 0.049, 0.9, 0.284, 0.558, 0.301, 0.516,
0.874, 0.477, 0.342, 0.04), b4 = c(0.544, 0.987, 0.603, 0.32,
0.658, 0.462, 0.301, 0.499, 0.285, 0.965, 0.21, 0.185, 0.377,
0.968, 0.665, 0.539, 0.219, 0.866, 0.836, 0.261, 0.772, 0.032,
0.51, 0.043, 0.883, 0.038, 0.204), b5 = c(0.889, 0.532, 0.006,
0.248, 0.551, 0.118, 0.498, 0.081, 0.633, 0.006, 0.075, 0.085,
0.058, 0.417, 0.531, 0.418, 0.883, 0.977, 0.322, 0.465, 0.944,
0.558, 0.922, 0.926, 0.106, 0.013, 0.451), b6 = c(0.387, 0.209,
0.081, 0.92, 0.592, 0.324, 0.185, 0.885, 0.735, 0.431, 0.507,
0.642, 0.658, 0.689, 0.301, 0.12, 0.242, 0.558, 0.361, 0.243,
0.999, 0.512, 0.186, 0.978, 0.552, 0.415, 0.444), b7 = c(0.769,
0.448, 0.304, 0.123, 0.079, 0.573, 0.376, 0.505, 0.256, 0.813,
0.675, 0.794, 0.116, 0.705, 0.049, 0.188, 0.435, 0.367, 0.478,
0.996, 0.212, 0.495, 0.499, 0.148, 0.172, 0.731, 0.748)), class = "data.frame", row.names = c(NA,
-27L))
```
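To make the multilevel idea concrete: in a one-factor multilevel decomposition, each variable is split into a between-plant part (that plant's mean) and a within-plant part (the deviation from that mean), and PLS is then fitted on the within-plant part. Below is a minimal base-R sketch of that split, assuming Plant is the only repeated-measures factor (the Leaf and Spec levels are ignored). The `within_variation()` helper is hypothetical; mixOmics documents a comparable step through `withinVariation()` and the `multilevel` argument of `pls()`/`spls()`, which is what to check for the real analysis. The rows below are copied from the sample data above (plants 1 and 2, leaf 1, bands b1 and b2).

```r
# Hypothetical helper (not a mixOmics function): split X into
# X = X_between + X_within, where X_between holds each group's column means,
# and return only the within-group part.
within_variation <- function(X, group) {
  X <- as.matrix(X)
  # ave() replaces every value with the mean of its group, column by column
  between <- apply(X, 2, function(col) ave(col, group))
  X - between
}

# Rows copied from the sample data (plants 1 and 2, leaf 1, bands b1:b2)
plant <- c(1, 1, 1, 2, 2, 2)
bands <- cbind(
  b1 = c(0.7, 0.384, 0.187, 0.156, 0.412, 0.695),
  b2 = c(0.777, 0.26, 0.647, 0.425, 0.764, 0.465)
)

Xw <- within_variation(bands, plant)

# Group means of the within-plant matrix: every plant now averages to zero
round(rowsum(Xw, plant) / as.vector(table(plant)), 10)
```

For the full data, `group` would be the Plant column and `X` the band columns. A nested Plant/Leaf design would need a two-factor decomposition, which this sketch does not attempt.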

Hi @hanlo, I am having similar issues with a multilevel sPLS analysis. I have field data on seawater microbial communities, collected on 4 different sampling trips. I am trying to correlate microbial abundances with nutrient data (essentially looking for correlations between two continuous datasets), but I want to ‘subtract’ the effect of sampling trip. If you have any advice, please let me know!

Thank you and very best wishes,
Marko
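For the sampling-trip case, one simple way to remove the trip effect before correlating two continuous blocks is to center every variable within trip in both blocks. Regressing each block on trip as a factor and keeping the residuals does exactly that for a single factor. The sketch below uses small simulated toy matrices with hypothetical names (`microbes`, `nutrients`, `trip`), not real data, and plain base R rather than any mixOmics function.

```r
set.seed(1)

# Hypothetical toy design: 4 sampling trips x 5 samples each
trip <- factor(rep(c("T1", "T2", "T3", "T4"), each = 5))
n <- length(trip)
trip_shift <- c(0, 2, -1, 3)[as.integer(trip)]  # simulated trip effect

# Simulated "microbial" and "nutrient" blocks, partly shifted by trip
microbes  <- cbind(taxonA = rnorm(n) + trip_shift, taxonB = rnorm(n) - trip_shift)
nutrients <- cbind(NO3 = rnorm(n) + trip_shift, PO4 = rnorm(n))

# Residuals from a multivariate lm on trip = each column centered within trip
microbes_w  <- resid(lm(microbes ~ trip))
nutrients_w <- resid(lm(nutrients ~ trip))

# Between-block correlations with the trip means removed
cor(microbes_w, nutrients_w)
```

The centered matrices could then replace the raw blocks as input to sPLS. Because trip is a batch-like factor rather than repeated measures on the same individuals, it is worth confirming in the mixOmics documentation whether its `multilevel` argument is intended for that case.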

@hanlo I would refer you to my responses to @MarkoTerzin’s post (found here).