Morning @DjamilaE,

No apologies required!

> While I know you can colour by groups in some of the plot options, I'm not sure if it is possible to somehow account for group membership in the analysis itself

Unfortunately, there isn't a way to do this *directly* within the `spls()` function. However, we can explore the data in certain ways to achieve something similar. I'll start by setting up the data and such:

```
library(mixOmics)
data("liver.toxicity")
X <- liver.toxicity$gene      # gene expression (64 samples x 3116 genes)
Y <- liver.toxicity$clinic    # clinical measurements (64 samples x 10 variables)
group <- as.factor(liver.toxicity$treatment$Dose.Group)  # dose group of each sample
# taken from the sPLS Case Study (mixOmics.org)
optimal.ncomp <- 2
optimal.keepX <- c(35, 45)
optimal.keepY <- c(4, 4)
liver.spls <- spls(X, Y, mode = "canonical",
                   ncomp = optimal.ncomp,
                   keepX = optimal.keepX,
                   keepY = optimal.keepY)
```
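For reference, the colouring you mention only changes how the existing model is displayed; the group never enters the fit. A minimal sketch, assuming the same data and tuned parameters as above, that colours the sample plot of `liver.spls` by dose group:

```r
library(mixOmics)
data("liver.toxicity")
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic
group <- as.factor(liver.toxicity$treatment$Dose.Group)

liver.spls <- spls(X, Y, mode = "canonical", ncomp = 2,
                   keepX = c(35, 45), keepY = c(4, 4))

# group only sets the colours here; spls() itself never sees it
plotIndiv(liver.spls, group = group, legend = TRUE, ind.names = FALSE,
          title = "sPLS components, coloured by dose")
```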

One way I can think of to incorporate the group into your analysis would be to run a separate `splsda()` on each of your input data frames (X and Y), with the group as the outcome, and then to visualise the relationship between the features selected in each case.

Firstly, we can look at `plotLoadings()` for each dataset against the group (read more about the colouring here).

```
par(mfrow = c(1, 2))  # two panels side by side
# sPLS-DA of the gene data against dose group
gene.splsda <- splsda(X, group,
                      ncomp = optimal.ncomp,
                      keepX = optimal.keepX)
# each bar is coloured by the group with the highest median for that gene
plotLoadings(gene.splsda, contrib = "max", method = "median",
             title = "Figure 1a, comp1")
plotLoadings(gene.splsda, contrib = "max", method = "median", comp = 2,
             title = "Figure 1b, comp2")
```
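To see which genes sit behind Figure 1, `selectVar()` lists the variables selected on a given component. A short sketch; it refits the same model so it runs on its own:

```r
library(mixOmics)
data("liver.toxicity")
X <- liver.toxicity$gene
group <- as.factor(liver.toxicity$treatment$Dose.Group)

gene.splsda <- splsda(X, group, ncomp = 2, keepX = c(35, 45))

# selectVar() returns the selected variables for one component,
# ordered by the absolute value of their loading
genes.comp1 <- selectVar(gene.splsda, comp = 1)$name
genes.comp2 <- selectVar(gene.splsda, comp = 2)$name
head(selectVar(gene.splsda, comp = 1)$value)

# genes picked on both components, if any
intersect(genes.comp1, genes.comp2)
```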

```
par(mfrow = c(1, 2))  # two panels side by side
# sPLS-DA of the clinical data (Y) against dose group;
# Y is the predictor block of this model, hence keepX = optimal.keepY
treatment.splsda <- splsda(Y, group,
                           ncomp = optimal.ncomp,
                           keepX = optimal.keepY)
plotLoadings(treatment.splsda, contrib = "max", method = "median",
             title = "Figure 2a, comp1")
plotLoadings(treatment.splsda, contrib = "max", method = "median", comp = 2,
             title = "Figure 2b, comp2")
```
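The sample plot of the clinical sPLS-DA is another quick check of how well the few selected clinical variables separate the dose groups. A sketch; `ellipse = TRUE` adds a confidence ellipse (95% by default) per group:

```r
library(mixOmics)
data("liver.toxicity")
Y <- liver.toxicity$clinic
group <- as.factor(liver.toxicity$treatment$Dose.Group)

treatment.splsda <- splsda(Y, group, ncomp = 2, keepX = c(4, 4))

# for an (s)PLS-DA model the samples are coloured by the outcome factor automatically
plotIndiv(treatment.splsda, ind.names = FALSE, legend = TRUE, ellipse = TRUE,
          title = "sPLS-DA on clinical variables")
```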

The next idea I can think of would be to produce a heatmap of correlations between the variables selected by each of these `splsda()` calls. I use `heatmap()` here so it can be shown in the forum; for your analysis, I'd recommend using `cim()`. You could also explore using the `network()` function.

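```
# names of variables with a non-zero loading on at least one component;
# unique() stops a variable selected on both components appearing twice
selected.genes <- unique(rownames(which(gene.splsda$loadings$X != 0, arr.ind = TRUE)))
selected.treatments <- unique(rownames(which(treatment.splsda$loadings$X != 0, arr.ind = TRUE)))
X.s <- X[, selected.genes]
Y.s <- Y[, selected.treatments]
# correlation of each selected gene (rows) with each selected clinical variable (columns)
heatmap(cor(X.s, Y.s))
```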
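The `cim()` version of the same heatmap could look like the following. This is a sketch that refits both sPLS-DA models so it runs on its own; it keeps any variable with a non-zero loading on at least one component:

```r
library(mixOmics)
data("liver.toxicity")
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic
group <- as.factor(liver.toxicity$treatment$Dose.Group)

gene.splsda <- splsda(X, group, ncomp = 2, keepX = c(35, 45))
treatment.splsda <- splsda(Y, group, ncomp = 2, keepX = c(4, 4))

# variables with a non-zero loading on at least one component
sel.genes <- rownames(gene.splsda$loadings$X)[rowSums(gene.splsda$loadings$X != 0) > 0]
sel.clinic <- rownames(treatment.splsda$loadings$X)[rowSums(treatment.splsda$loadings$X != 0) > 0]
X.s <- X[, sel.genes]
Y.s <- Y[, sel.clinic]

# clustered image map of gene-by-clinical-variable correlations
cor.mat <- cor(X.s, Y.s)
cim(cor.mat)
```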

The subsetted data (`X.s` and `Y.s`) can then be fed into its own `spls()` call and analysed.
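A minimal sketch of that `spls()` on the subset, colouring the samples by dose and drawing the correlation circle. `ncomp = 2` is an assumption here (the tuning below is how you would choose it properly), and `selected_names()` is a hypothetical helper, not a mixOmics function:

```r
library(mixOmics)
data("liver.toxicity")
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic
group <- as.factor(liver.toxicity$treatment$Dose.Group)

# hypothetical helper: names of variables with a non-zero loading on any component
selected_names <- function(model) {
  L <- model$loadings$X
  rownames(L)[rowSums(L != 0) > 0]
}

X.s <- X[, selected_names(splsda(X, group, ncomp = 2, keepX = c(35, 45)))]
Y.s <- Y[, selected_names(splsda(Y, group, ncomp = 2, keepX = c(4, 4)))]

# no keepX/keepY given, so every pre-selected variable stays in the model
subset.spls <- spls(X.s, Y.s, mode = "canonical", ncomp = 2)

# group is used for colouring only, as with the full model
plotIndiv(subset.spls, group = group, legend = TRUE, ind.names = FALSE)
plotVar(subset.spls)
```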

> Could you suggest a way to select the optimal number of components in this case?

You can definitely still use the `tune()` and `perf()` functions when using sPLS in canonical mode. The two code chunks below show how you could go about this.

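```
# fit more components than you expect to need, then assess each one;
# canonical mode deflates Y, so ncomp should not exceed ncol(Y.s)
subset.spls <- spls(X.s, Y.s, mode = "canonical", ncomp = min(5, ncol(Y.s)))
# repeated (5 x 5-fold) cross-validation
sub.spls.perf <- perf(subset.spls, validation = "Mfold", folds = 5, nrepeat = 5)
plot(sub.spls.perf, criterion = "cor.tpred") # explore different criteria
```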

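```
# tune the number of genes kept per component on the subsetted data
sub.spls.tune <- tune.spls(X.s, Y.s, test.keepX = c(1:10), folds = 5,
                           ncomp = min(5, ncol(Y.s)), mode = "canonical")
plot(sub.spls.tune)
```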

> How would I best go about setting this range for both sets of data?

I would suggest an iterative approach: start with a broad range of values at large intervals, then repeat the tuning with a finer `test.keepX` grid around the best values each time. I go into it in more detail in this post.
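The iterative approach can be sketched in base R. `refine_grid()` is a hypothetical helper (not part of mixOmics): given the best value from the previous tuning round and that round's step size, it returns a finer grid centred on that value, clipped to the number of available variables. The "best" values below are placeholders for whatever you read off `plot()` of your own tuning result, and `n.genes` stands in for `ncol(X.s)`. The same helper can build the grid for the Y side, since `tune.spls()` also takes a `test.keepY` argument in recent mixOmics versions (check `?tune.spls` for yours).

```r
# Hypothetical helper, not a mixOmics function: build a finer test.keepX
# (or test.keepY) grid around the best value from a coarser tuning round.
refine_grid <- function(best, old.step, n.max) {
  new.step <- max(1, old.step %/% 4)        # roughly four times finer
  lower <- max(1, best - old.step)          # one old step either side,
  upper <- min(n.max, best + old.step)      # kept within 1..n.max
  seq(lower, upper, by = new.step)
}

n.genes <- 80                               # placeholder for ncol(X.s)

# round 1: coarse grid
round1 <- seq(5, n.genes, by = 15)          # 5 20 35 50 65 80
best1 <- 20                                 # placeholder: best keepX from round 1

# round 2: finer grid around the round-1 winner
round2 <- refine_grid(best1, old.step = 15, n.max = n.genes)
round2                                      # 5 8 11 14 17 20 23 26 29 32 35

# round 3: single-variable resolution
best2 <- 17                                 # placeholder: best keepX from round 2
round3 <- refine_grid(best2, old.step = 3, n.max = n.genes)
round3                                      # 14 15 16 17 18 19 20
```

Each round's grid would then be passed as `test.keepX` to a fresh `tune.spls()` call.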

Hope these answers help.

Max.