Sequential Covariate Modeling
After fitting a nonlinear mixed-effects (NLME) model, some individual variability will have been captured by the random effects. Here we describe how to leverage machine learning to predict part of the individual variability that was previously captured by the random effects. This workflow can be used with any fitted NLME model, regardless of whether covariate information was used in the initial fitting. We call this model 'augmentation'.
The basic idea is to treat the modeling as a sequential process with the following steps:
- Fit an NLME model (with or without any embedded machine learning or covariate effects).
- Fit a machine learning model that uses baseline covariates to predict the empirical Bayes estimates of the fitted NLME model. This can be done in a few simple steps:
  - Extract a target input-output mapping.
  - Define and train a supervised machine learning model on this target.
- Augment the fitted NLME model by embedding the newly trained covariate model into it.
These steps can be accomplished with only a few lines of code:
```julia
# Assuming fpm is a FittedPumasModel obtained from Step 1
target = preprocess(fpm) # Step 2.1
nn = MLPDomain(numinputs(target), 10, 10, (numoutputs(target), identity); reg = L2())
fitted_nn = fit(nn, target) # Step 2.2
fpm_augment = augment(fpm, fitted_nn) # Step 3
```
We provide more details on each of these steps below.
Even though neural networks embedded in NLME models can process covariates and be fitted jointly, the sequential fitting approach has a few advantages. First, it is much more computationally efficient and delivers results more quickly. Moreover, it keeps the NLME modeling and the covariate modeling modular, which makes an iterative model development process simpler and faster. Finally, it can be useful for splitting up the modeling work across team members: some might have more domain-specific knowledge and be better suited to the NLME modeling, while others might have more experience using machine learning to process potentially complex covariates.
We now turn to a more detailed description of the steps involved in this workflow.
Data Preprocessing
The first step in the augment workflow is the creation of a fitting target that will later be used to optimize a neural network. We create the fitting target using the `preprocess` function.
DeepPumas.preprocess Function
```julia
preprocess(fpm; ηs, covs)
preprocess(model, pop, param, approx; ηs, covs)
```
Preprocess covariates and posterior η approximations into a target `x` → `y` mapping for fitting.

`x` is a standardized matrix of the covariate values for each `Subject`. `y` is derived from empirical Bayes estimates (EBEs) that have been standardized and then rescaled such that their standard deviations approximate the sensitivity of the loglikelihood with respect to the given EBE. This informs the trade-off that a `fit` or a `hyperopt` needs to make when it is impossible to fit everything at once: some random effects impact patient outcomes more than others, so it is better to emphasize getting those right.
The resulting `FitTarget` contains these computed inputs (`x`) and outputs (`y`) for the target mapping as well as the transformations (`pre` and `post`) that facilitate going back and forth between this transformed mapping and the `Subject` → ηs mapping.
Arguments
Either

- `fpm::FittedPumasModel` - a fitted Pumas model. The embedded data will be used to generate the input data and some version of the posterior η distribution will be used to generate the target output.

or

- `model::AbstractPumasModel` - a Pumas model.
- `pop::Population` - a Population of training data.
- `param::NamedTuple` - fitted parameters.
- `approx::AbstractLikelihoodApproximation` - a likelihood approximation, e.g., `Pumas.FOCE()`.
Keyword arguments
- `ηs` - specify the posterior ηs to be used as targets. This should be an iterator of `Symbol`s, for example `ηs = (:η_CL, :η_Ka)`. (η is typed by `\eta<tab>`.)
- `covs` - specify which covariates of `fpm.data.covariates()` to use when generating input data. This should be an iterator of `Symbol`s, for example `covs = (:AGE, :SEX)`.
Returns

A `FitTarget` containing the computed inputs (`x`), outputs (`y`), and the `pre` and `post` transformations described above.
Example
Let `model` be a `PumasModel` and `pop` be a `Population`.
```julia
fpm = fit(model, pop, sample_params(model), MAP(FOCE()))
target = preprocess(fpm)
```
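The keyword arguments described above can restrict which ηs and covariates enter the mapping. A small sketch reusing the example values from the keyword documentation (the η and covariate names are illustrative and must exist in your model and data):

```julia
# Use only the η_CL posterior as target and AGE/SEX as inputs (illustrative names).
target = preprocess(fpm; ηs = (:η_CL,), covs = (:AGE, :SEX))
```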
```julia
preprocess(X, Y; standardize = false)
```

Process the `X` and `Y` matrices into a `FitTarget` for use with `fit` or `hyperopt`.
The number of columns in your matrices reflects the number of data points while the number of rows gives the dimensionality of your input (`X`) and output (`Y`).
`standardize` toggles a z-score standardization of `X`. This transformation is stored and can be applied to new input data fed to a model fitted towards this generated `FitTarget`.
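For this matrix-based method, a minimal sketch with synthetic data (assuming 5 input features and 2 outputs over 100 data points):

```julia
X = randn(5, 100)  # 5 input features × 100 data points
Y = randn(2, 100)  # 2 outputs × 100 data points
target_xy = preprocess(X, Y; standardize = true)
```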
Model Definition
The second step is to define a neural network that will be trained to link baseline covariates to EBEs. The basic functionality for creating neural networks is described in Neural Network Model Domains. Briefly, having defined a `target` via `preprocess`, we can simply use `MLPDomain` to create a basic multilayer perceptron:
```julia
nn = MLPDomain(numinputs(target), 10, 10, 10, (numoutputs(target), identity); reg = L2(1.0))
```
Currently, only neural networks with basic layers (e.g., `Flux.Dense`) are supported.
Model Fitting
Once a neural network that links the baseline covariates to the EBEs has been defined, we need to fit it. Here we specify how to do a single fit; below, we show how to use hyper-optimization to fit the neural network while also optimizing hyperparameters, such as the regularization constant. Having defined a fitting `target` via `preprocess` and a machine learning model `ml` (e.g., via `MLPDomain`, but not exclusively), we simply use the `fit` function:
StatsAPI.fit Function
```julia
fit(ml, target[, init]; kwargs...)
```

Fit a machine learning model, `ml`, to a `target` input-output mapping.
Arguments
- `ml::Union{FluxDomain, Chain, HoldoutValidationResult}` - A machine learning model, typically coming out of a call to `MLPDomain` or a `fit` thereof.
- `target::FitTarget` - a target input-output mapping, typically generated by `preprocess`.
- `init = sample_params(nn)` - the initial parameters to start from. Defaults to a random sample from a Glorot normal distribution. See also `init_params`, which also samples from a Glorot distribution but with a seed for reproducibility.
Keyword arguments
- `training_fraction = 1.0` - The fraction of data that should be used for training. The rest is withheld for validation.
- `shuffle = true` - Should the data be shuffled before splitting into a training and a validation set?
- `optim_alg = nothing` - Specify the optimization algorithm. Only `Adam` is supported for all neural network backends. `Flux`-based models can use optimization algorithms from `Optimisers.jl`. If `nothing` is passed, then a fresh `fit` will use `Adam` while a continued fit (where `ml` is the result of a previous `fit`) will re-use the previous algorithm.
- `optim_options` - A NamedTuple of options to be passed on to the fitting. Valid options:
  - `rng` - a random number generator.
  - `loss` - the loss function; available options are `l1` (default) and `l2`.
  - `lambda` - the regularization constant. Defaults to `0` or to any regularization or prior specified in `ml`.
  - `alpha` - the regularization mode: `0` for `L2` regularization and `1` for `L1`. Defaults to `0` or to any prior or regularization specified in `ml`.
  - `verbosity` - tune the amount of printouts during the optimization. Defaults to `1`. Set to `0` to avoid printouts.
  - `batch_size` - Specify the batch size for the optimization algorithm (typically `Adam`). Batching determines how many data points to use each time the gradient is calculated and a step in parameter space is taken. Batching introduces an often beneficial stochasticity into the optimization. This helps avoid local optima but can be computationally costly. Defaults to including all data in one batch.
  - `epochs` - the number of iterations to optimize for. Defaults to `10_000`.
```julia
fit(model, pop, alg; kwargs...)
```

Tries to fit an augmented PumasModel with a provided `AbstractDataDrivenAlgorithm`. See the DataDrivenDiffEq documentation for available packages, algorithms, and options.
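For the neural-network method, the keyword arguments documented above can be combined in a single call. A hedged sketch (the specific option values are illustrative, not recommendations, and it assumes the `l2` loss option is available as documented):

```julia
# Withhold 20% of the data for validation and tweak a couple of optimization options.
fitted_nn = fit(
    nn,
    target;
    training_fraction = 0.8,
    optim_options = (; loss = l2, epochs = 2_000),
)
```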
Model Hyper-optimization
Hyper-optimization is an important step in model development for obtaining good validation and generalization performance. Therefore, as part of the augment workflow we provide a `hyperopt` function that can hyper-optimize a neural network towards a given fitting `target`. Currently, only the regularization constant is optimized.
DeepPumas.hyperopt Function
```julia
hyperopt(ml::FluxDomain, target::FitTarget[, alg])
hyperopt(ml::FluxModel, target::FitTarget[, alg])
```

Optimize the parameters and hyperparameters of `ml`.
Arguments
- `ml` - Typically a `FluxDomain` resulting from a call to `MLPDomain`, but plain `FluxModel`s are also supported.
- `target::FitTarget` - the target input-output mapping for the optimization. Typically the result of `preprocess`.
- `alg` (optional) - a specification of how to conduct the hyperparameter optimization. Currently, only `GridSearch` is supported. Defaults to `GridSearch()`.
Returns
A `HyperoptResult` which can be used to augment a `PumasModel` (see `augment`).
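Putting this together with the earlier steps, a minimal sketch of a hyper-optimized fit might look like this (assuming `fpm` and `target` from the preprocessing step):

```julia
nn = MLPDomain(numinputs(target), 10, 10, (numoutputs(target), identity); reg = L2())
fitted_nn = hyperopt(nn, target)       # defaults to GridSearch()
fpm_augment = augment(fpm, fitted_nn)  # a HyperoptResult can be passed to augment
```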
Model Augmentation
Having fitted a neural network to a `target`, we need to insert it into a `FittedPumasModel`. We do this using the `augment` function.
DeepPumas.augment Function
```julia
augment(fpm, fitted_ml[, prior][, paramname])
```
Return an ML-augmented `FittedPumasModel`.
Insert `fitted_ml` between the `@random` and `@pre` blocks of the PumasModel in `fpm`. `fitted_ml` is evaluated for each `Subject` and the resulting output is additively combined with the random effects to form the 'augmented random effects', where part of the inter-subject variation is captured by the random effects and part by the `fitted_ml`. These augmented random effects replace the ordinary random effects used in the `@pre`, `@dosecontrol`, `@derived`, and `@observed` blocks of the `fpm` model.
`fitted_ml` will be added as a `Domain` to the `@param` block, just like any other parameter, and it will have the name `paramname`. Since the same name cannot be used twice, you may have to pass a different `paramname` symbol if there is a conflict.
The resulting model will initialize its parameter values to the fitted coefficients of `fpm` and the fitted parameter values of `fitted_ml`, such that `init_params` often results in very good parameters.
The returned model is a normal `FittedPumasModel` that can be used in any Pumas workflow. Its fitted coefficients are the same as those of `fpm` with the addition of the fitted parameter values of `fitted_ml`, often resulting in a good fit already.
Arguments
- `fpm::FittedPumasModel` - a fitted `@model` to be augmented.
- `fitted_ml` - the machine learning model to do the augmentation. This `fitted_ml` could be the output of `fit`, `hyperopt`, or `AugmentML`.
- `prior::Prior` (optional) - specify the prior associated with the parameters of the embedded machine learning model. Defaults to the prior used in `fit` or the best prior coming out of `hyperopt`, depending on how `fitted_ml` was created.
- `paramname::Symbol = :nn` - A name to identify the embedded ML within the augmented model.
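As a quick sketch of the default behavior (assuming `fpm` and `fitted_nn` from the previous steps), the embedded ML appears among the fitted coefficients under the default `paramname` of `:nn`:

```julia
fpm_augment = augment(fpm, fitted_nn)
coef(fpm_augment).nn  # parameters of the embedded ML, stored under the default name :nn
```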
Refitting the Random Effect Prior
Just to recap, the augment workflow so far consisted of the following steps:
1. Get a `FittedPumasModel` by fitting to some data without using the covariates.
2. Use the `preprocess` function on the `FittedPumasModel` to create a fitting `target`.
3. Define a suitable neural network `ml` model that will link baseline covariates to random effect EBEs.
4. Fit the `ml` model to the `target`.
5. Insert the trained `ml` into a `FittedPumasModel` via `augment`.
Having gone through steps 1-5, note that the augmented `FittedPumasModel` has two sources of information about the EBEs (aside from the data): the `ml` that was fit in step 4, and the random effect prior, parameterized by `Ω`. The `ml` now captures some of the information (via baseline covariates) about the EBEs that were generated using the prior learned in step 1. Therefore, it is advisable to re-fit the random effect prior. We do this with the `fit` function, using the `constantcoef` keyword argument to fix everything except `Ω`:
```julia
fpm_refitted = fit(
    augmented_fpm.model,
    augmented_fpm.data,
    coef(augmented_fpm),
    MAP(FOCE()),
    constantcoef = filter(i -> i !== :Ω, keys(coef(augmented_fpm)))
)
```
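After this refit, only `Ω` has been re-estimated. Since the embedded `ml` now explains part of the inter-subject variability through the baseline covariates, the refitted `Ω` will typically be smaller than the prior learned in step 1.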