We generate data from a very simple model.
For each individual $j=1,\ldots,N$, we generate $i=1,\ldots,n$ observations of covariates $X_{i,j}\in\mathbb{R}^p$, where the covariates are drawn independently from a uniform distribution $U[0,1]$. The outcome is constructed by the following linear model
$$Y_{i,j} = X_{i,j}^T \beta_j + \varepsilon_{i,j},$$
where $\varepsilon_{i,j}\sim N(0,1)$ and
$$\beta_j = \begin{pmatrix}1\\ \vdots\\ 1\end{pmatrix} I\{j\le N/2\}.$$
set.seed(42)
n <- 100; N <- 10; p <- 3
# covariates: independent draws from U[0,1]
X <- matrix(runif(N * n * p), nrow = n * N, ncol = p)
# randomly permuted individual labels
ind <- as.factor(rep(1:N, n)[sample(1:(n * N), n * N)])
# outcome: beta = (1,...,1) for the first N/2 individuals, zero otherwise
Y <- X %*% rep(1, p) * (as.numeric(ind) <= N/2) + rnorm(n * N, 0, 1)
data <- data.frame(X, "y" = Y, "ind" = ind)
The data is saved into a standard data.frame(), with a column "ind" (as a factor()) that identifies the individuals.
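As a quick check, the resulting structure can be inspected, for example via str(); it shows the covariate columns X1 to X3, the outcome y and the factor ind:
str(data)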
First, we initialize the adaptive discrete smoothing model with a valid learner. Calling the ADS$fit() method then estimates the corresponding parameters.
library(AdaptiveDiscreteSmoothing)
library(mlr3)
library(mlr3learners)
learner <- mlr_learners$get("regr.lm")
model <- ADS$new(data = data,
                 target = "y",
                 individ = "ind",
                 learner = learner,
                 iterations = 4)
model$fit()
Finally, we can predict new values by using the ADS$predict() method.
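As a minimal sketch, predictions on the training data itself can be compared to the observed outcomes; any data.frame with the same columns could be passed as newdata, as is done with a separate test set further below.
# in-sample predictions and mean squared error (illustrative only)
y_hat <- model$predict(newdata = data)
mean((data$y - y_hat)^2)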
One can easily take a look at the estimated weights via the ADS$heatmap() method.
model$heatmap()
For each individual $j=1,\ldots,N$, we generate $i=1,\ldots,n$ observations of covariates $X_{i,j}\in\mathbb{R}^p$, where the covariates are generated by a normal distribution $N(0,\Sigma)$ (here we consider $\Sigma=I_p$). The outcome is constructed by the following linear model
$$Y_{i,j} = X_{i,j}^T \beta_j + \varepsilon_{i,j},$$
where $\varepsilon_{i,j}\sim N(0,\sigma_\varepsilon^2)$. The coefficient vector $\beta_j$ is generated to depend on an unobserved quantity $\alpha_j\sim U[0,1]$:
$$\beta_j = \begin{pmatrix}1\\ \vdots\\ 1\end{pmatrix} + \begin{pmatrix}\alpha_j\\ \alpha_j^2\\ -\alpha_j\\ -\alpha_j^2\\ \alpha_j\\ \alpha_j/2\\ \alpha_j/3\\ \vdots\end{pmatrix}.$$
set.seed(42)
n <- 200; N <- 20; p <- 10
ind <- as.factor(rep(1:N, each = n))
alpha_vec <- sort(runif(N, min = 0, max = 1))
X <- matrix(nrow = 0, ncol = p)
Y <- numeric()
for (individual in seq_len(N)) {
  # covariates: standard normal
  X_ind <- matrix(rnorm(n * p, 0, 1), ncol = p)
  X <- rbind(X, X_ind)
  alpha <- alpha_vec[individual]
  # individual-specific coefficients depending on alpha
  beta <- rep(1, p) + c(alpha, alpha^2, -alpha, -alpha^2, alpha * 1 / seq_len(p - 4))
  Y <- append(Y, X_ind %*% beta + rnorm(n, 0, 1))
}
data <- data.frame(X, "y" = Y, "ind" = ind)
The values of $\alpha_j$ are sorted in increasing order to make the estimated weights easier to interpret: neighboring individuals have similar coefficients, so the weights should be largest on and near the diagonal of the heatmap.
We can apply the same learner as above or work with different learners. For example, we can use random forests from the ranger package or
learner <- mlr_learners$get("regr.ranger")
learner$param_set$values <- list(num.trees = 50, mtry = 3)
model_rf <- ADS$new(data = data,
                    target = "y",
                    individ = "ind",
                    learner = learner,
                    iterations = 4)
model_rf$fit()
a cross-validated elastic net from the glmnet package.
model_glmnet <- ADS$new(data = data,
                        target = "y",
                        individ = "ind",
                        learner = mlr_learners$get("regr.cv_glmnet"),
                        iterations = 4)
model_glmnet$fit()
Next, we generate a test dataset data_test with the same coefficients and evaluate the performance of the estimators.
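The generation of data_test is not shown here; a minimal sketch, reusing alpha_vec and the coefficient construction from above with a hypothetical test sample size n_test per individual, could look as follows (the reported errors below were computed on a different test sample, so exact numbers will differ):
n_test <- 100
ind_test <- as.factor(rep(1:N, each = n_test))
X_test <- matrix(nrow = 0, ncol = p)
Y_test <- numeric()
for (individual in seq_len(N)) {
  X_ind <- matrix(rnorm(n_test * p, 0, 1), ncol = p)
  X_test <- rbind(X_test, X_ind)
  alpha <- alpha_vec[individual]
  beta <- rep(1, p) + c(alpha, alpha^2, -alpha, -alpha^2, alpha * 1 / seq_len(p - 4))
  Y_test <- append(Y_test, X_ind %*% beta + rnorm(n_test, 0, 1))
}
data_test <- data.frame(X_test, "y" = Y_test, "ind" = ind_test)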
print(mean((data_test$y - model_rf$predict(newdata = data_test))^2))
#> [1] 4.555694
print(mean((data_test$y - model_glmnet$predict(newdata = data_test))^2))
#> [1] 1.25361
model_rf$heatmap()
model_glmnet$heatmap()
For each individual $j=1,\ldots,N$, we generate $i=1,\ldots,n$ observations of covariates $X_{i,j}\in\mathbb{R}^p$, where the covariates are generated by a normal distribution $N(0,\Sigma)$ (here we consider $\Sigma=I_p$). The outcome is constructed by the following model
$$Y_{i,j} = 2\sin(\alpha_j X_{i,j,1}) + \alpha_j^2 X_{i,j,2} + \varepsilon_{i,j},$$
where $\varepsilon_{i,j}\sim N(0,\sigma_\varepsilon^2)$ and $\alpha_j\sim U[0,1]$.
set.seed(42)
n <- 200; N <- 10; p <- 10
ind <- as.factor(rep(1:N, each = n))
alpha_vec <- sort(runif(N, min = 0, max = 1))
X <- matrix(nrow = 0, ncol = p)
Y <- numeric()
for (individual in seq_len(N)) {
  X_ind <- matrix(rnorm(n * p, 0, 1), ncol = p)
  X <- rbind(X, X_ind)
  alpha <- alpha_vec[individual]
  # nonlinear outcome depending on alpha and the first two covariates
  Y <- append(Y, 2 * sin(alpha * X_ind[, 1]) + alpha^2 * X_ind[, 2] + rnorm(n, 0, 1))
}
data <- data.frame(X, "y" = Y, "ind" = ind)
learner <- mlr_learners$get("regr.ranger")
learner$param_set$values <- list(num.trees = 50, mtry = 3)
model_rf <- ADS$new(data = data,
                    target = "y",
                    individ = "ind",
                    learner = learner,
                    iterations = 4)
model_rf$fit()
model_rf$heatmap()