
Sequential Covariate Modeling

After fitting a nonlinear mixed effects (NLME) model, some individual variability will have been captured by the random effects. Here we describe how to leverage machine learning to predict, from baseline covariates, part of the individual variability that was previously captured by the random effects. This workflow can be used with any fitted NLME model, regardless of whether it used covariate information in the initial fitting. We call this 'model augmentation'.

The basic idea is to treat the modeling as a sequential process with the following steps:

  1. Fit an NLME model (with or without any embedded machine learning or covariate effects).
  2. Fit a machine learning model that uses baseline covariates to predict the empirical Bayes estimates of the fitted NLME model. This can be done in a few simple steps:
    1. Extract a target input-output mapping.
    2. Define and train a supervised machine learning model on this target.
  3. Augment the fitted NLME by embedding the newly trained covariate model into it.
These steps can be accomplished in only a few lines of code:
julia
# Assuming fpm is a FittedPumasModel obtained from Step 1
target = preprocess(fpm) # Step 2.1
nn = MLPDomain(numinputs(target), 10, 10, (numoutputs(target), identity); reg=L2())
fitted_nn = fit(nn, target) # Step 2.2
fpm_augment = augment(fpm, fitted_nn) # Step 3

We provide more details on each of these steps below.

Even though neural networks embedded in NLME models can process covariates and be fitted jointly, the sequential fitting approach has a few advantages. First, it is much more computationally efficient and gives results more quickly. Moreover, it separates the NLME and covariate modeling processes into modules, which keeps an iterative model development process simpler and faster. Finally, it makes it easier to split the modeling work across team members: some may have more domain-specific knowledge and be better suited to the NLME modeling, while others may have more experience using machine learning to process potentially complex covariates.

We now turn to a more detailed description of the steps involved in this workflow.

Data Preprocessing

The first step in the augment workflow is the creation of a fitting target that will be used to optimize a neural network later on. We create the fitting target using the preprocess function.

DeepPumas.preprocess Function
julia
preprocess(fpm; ηs, covs)
preprocess(model, pop, param, approx; ηs, covs)

Preprocess covariates and posterior η approximations into a target x → y mapping for fitting.

x is a standardized matrix of the covariate values for each Subject. y is derived from empirical Bayes estimates (EBEs) that have been standardized and then rescaled such that their standard deviations approximate the sensitivity of the loglikelihood with respect to the given EBE. This informs the trade-off that a fit or a hyperopt needs to make when it is impossible to fit everything at once: some random effects impact patient outcomes more than others, and it is better to emphasize getting those right.

The resulting FitTarget contains these computed inputs (x) and outputs (y) for the target mapping as well as the transformations (pre and post) that facilitate going back and forth between this transformed mapping and the Subject → ηs mapping.

Arguments

Either

  • fpm::FittedPumasModel - a fitted Pumas model. The embedded data will be used to generate the input data, and some version of the posterior η distribution will be used to generate the target output.

or

  • model::AbstractPumasModel - a Pumas model

  • pop::Population - a Population of training data.

  • param::NamedTuple - fitted parameters.

  • approx::AbstractLikelihoodApproximation - a likelihood approximation, e.g., Pumas.FOCE().

Keyword arguments

  • ηs - specify the posterior ηs to be used as targets. This should be an iterator of Symbols. For example ηs = (:η_CL, :η_Ka). (η is typed by \eta<tab>)

  • covs - specify which covariates of fpm.data.covariates() to use when generating input data. This should be an iterator of Symbols. For example covs = (:AGE, :SEX).

Returns

  • A FitTarget containing the computed inputs (x) and outputs (y) as well as the transformations between this transformed mapping and the original Subject → ηs mapping.

Example

Let model be a PumasModel and pop be a Population.

julia
fpm = fit(model, pop, sample_params(model), MAP(FOCE()))
target = preprocess(fpm)

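For example, to build a target from only a subset of the random effects and covariates, the keyword arguments can be passed explicitly. The names below are illustrative and must match your model and data:

julia
# Hypothetical names: :η_CL must be a random effect of the model and
# :AGE, :SEX must be covariates of the subjects in the data.
target = preprocess(fpm; ηs = (:η_CL,), covs = (:AGE, :SEX))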

julia
preprocess(X, Y; standardize = false)

Process the X and Y matrices into a FitTarget for use with fit or hyperopt.

The number of columns in your matrices reflects the number of data points, while the number of rows gives the dimensionality of your input (X) and output (Y).

standardize toggles a z-score standardization of X. This transformation is stored and can be applied to new input data fed to a model fitted towards this generated FitTarget.

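As a minimal sketch of this matrix-based method, suppose we have 100 data points with three input features and two outputs (the data here is random and purely illustrative):

julia
X = randn(3, 100)   # rows = input dimensions, columns = data points
Y = randn(2, 100)   # rows = output dimensions, columns = data points
target_xy = preprocess(X, Y; standardize = true)  # z-score standardize X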

Model Definition

The second step is to define a neural network that will be trained to link baseline covariates to EBEs. The basic functionality to create neural networks has been described in Neural Network Model Domains. Briefly, having defined a target via preprocess, we can simply use MLPDomain to create a basic multilayer perceptron:

julia
nn = MLPDomain(numinputs(target), 10, 10, 10, (numoutputs(target), identity); reg=L2(1.))

Currently, only neural networks with basic layers (e.g. Flux.Dense) are supported.
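The width, depth, and regularization strength of the network are modeling choices. With few subjects, for instance, a smaller and more strongly regularized network might generalize better. The layer size and regularization constant below are illustrative, not a recommendation:

julia
# A smaller perceptron with a single hidden layer and stronger L2 regularization.
nn_small = MLPDomain(numinputs(target), 6, (numoutputs(target), identity); reg = L2(5.0))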

Model Fitting

Once a neural network that links the baseline covariates and the EBEs has been defined, we need to fit it. Here we specify how to do a single fit; below we show how to use hyper-optimization to fit the neural network while also optimizing hyper-parameters such as the regularization constant. Having defined a fitting target via preprocess and a machine learning model ml (e.g., via MLPDomain, but not exclusively), we simply use the fit function:

StatsAPI.fit Function
julia
fit(ml, target[, init]; kwargs...)

Fit a machine learning model, ml, to a target input-output mapping.

Arguments

  • ml::Union{FluxDomain, Chain, HoldoutValidationResult} - A machine learning model, typically coming out of a call to MLPDomain or a fit thereof.

  • target::FitTarget - a target input-output mapping, typically generated by preprocess.

  • init=sample_params(nn) - the initial parameters to start from. Defaults to a random sample from a Glorot normal distribution. See also init_params, which also samples from a Glorot distribution but with a seed for reproducibility.

Keyword arguments

  • training_fraction = 1.0 - The fraction of data that should be used for training. The rest is withheld for validation.

  • shuffle = true - Should the data be shuffled before splitting into a training and a validation set?

  • optim_alg = nothing - Specify the optimization algorithm. Only Adam is supported for all neural network backends. Flux-based models can use optimization algorithms from Optimisers.jl. If nothing is passed, then a fresh fit will use Adam while a continued fit (where ml is the result of a previous fit) will re-use the previous algorithm.

  • optim_options - A NamedTuple of options to be passed on to the fitting. Valid options:

    • rng - a random number generator.

    • loss - the loss function, available options are l1 (default) and l2.

    • lambda - the regularization constant. Defaults to 0 or to any regularization or prior specified in ml.

    • alpha - the regularization mode. 0 for L2 regularization and 1 for L1. Defaults to 0 or to any Prior or regularization specified in ml.

    • verbosity - tune the amount of printouts during hyperoptimization. Defaults to 1. Set to 0 to avoid printouts.

    • batch_size - Specify the batch sizes for the optimization algorithm (typically Adam). Batching determines how many data points to use each time the gradient is calculated and a step in parameter space is taken. Batching introduces an often beneficial stochasticity in the optimization. This is good for avoiding local optima but is sometimes computationally costly. Defaults to including all data in one batch.

    • epochs - the number of iterations to optimize for. Defaults to 10_000.

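For example, a fit that withholds part of the data for validation and adjusts a few optimization options could look like the following sketch; the keyword values are illustrative:

julia
fitted_nn = fit(
  nn,
  target;
  training_fraction = 0.8,                        # hold out 20% of the data for validation
  optim_options = (; epochs = 5_000, batch_size = 32),
)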

julia
fit(model, pop, alg; kwargs...)

Tries to fit an augmented PumasModel with a provided AbstractDataDrivenAlgorithm. See the DataDrivenDiffEq documentation for available packages, algorithms and options.


Model Hyper-optimization

Hyper-optimization is an important step in model development in order to obtain good validation and generalization performance. Therefore, as part of the augment workflow we provide a hyperopt function that can hyper-optimize a neural network to a given fitting target. Currently, only the regularization constant is optimized.

DeepPumas.hyperopt Function
julia
hyperopt(ml::FluxDomain, target::FitTarget[, alg])
hyperopt(ml::FluxModel, target::FitTarget[, alg])

Optimize the parameters and hyperparameters of ml.

Arguments

  • ml - Typically a FluxDomain resulting from a call to MLPDomain but plain FluxModels are also supported.

  • target::FitTarget - the target input-output mapping for the optimization. Typically the result of preprocess.

  • alg (optional) - a specification of how to conduct the hyperparameter optimization. Currently, only GridSearch is supported. Defaults to GridSearch().

Returns

  • HyperoptResult which can be used to augment a PumasModel (see augment)

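Putting this together, a hyper-optimization run is a drop-in replacement for the single fit above, and its result can be passed directly to augment (a sketch using the defaults described above):

julia
ho = hyperopt(nn, target)       # grid search over the regularization constant (default GridSearch())
fpm_augment = augment(fpm, ho)  # the HyperoptResult can be used to augment the model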

Model Augmentation

Having fitted a neural network to a target, we need to insert it into a FittedPumasModel. We do this using the augment function:

DeepPumas.augment Function
julia
augment(fpm, fitted_ml[, prior][, paramname])

Return an ML-augmented FittedPumasModel.

Insert fitted_ml between the @random and @pre blocks of the PumasModel in fpm. fitted_ml is evaluated for each Subject and the resulting output is additively combined with the random effects to form the 'augmented random effects', where part of the inter-subject variation is captured by the random effects and part by fitted_ml. These augmented random effects replace the ordinary random effects used in the @pre, @dosecontrol, @derived and @observed blocks of the fpm model.

fitted_ml will be added as a Domain to the @param block, just like any other parameter, and it will have the name paramname. Since the same name cannot be used twice, you may have to pass a different paramname symbol if there is a conflict.

The resulting model will initialize its parameter values to the fitted coefficients of fpm and the fitted parameter values of fitted_ml such that init_params often results in very good parameters.

The returned model is a normal FittedPumasModel that can be used for any Pumas workflow. Its fitted coefficients are the same as those of fpm, with the addition of the fitted parameter values of fitted_ml, often resulting in a good fit already.

Arguments

  • fpm::FittedPumasModel - a fitted @model to be augmented.

  • fitted_ml - the machine learning model to do the augmentation. This fitted_ml could be the output of either fit or hyperopt or AugmentML.

  • prior::Prior (optional) - specify the prior associated with the parameters of the embedded machine learning model. Defaults to the prior used in fit or the best prior coming out of hyperopt depending on how fitted_ml was created.

  • paramname::Symbol = :nn - A name to identify the embedded ML within the augmented model.

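Since the result is a normal FittedPumasModel, standard Pumas post-processing applies to it. For instance, a sketch using common Pumas functions:

julia
fpm_augment = augment(fpm, fitted_nn)
coef(fpm_augment)            # fitted coefficients, now including the embedded ML parameters
pred = predict(fpm_augment)  # predictions from the augmented model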

Refitting the Random Effect Prior

To recap, the augment workflow so far has consisted of the following steps:

  1. Get a FittedPumasModel by fitting to some data without using the covariates

  2. Use the preprocess function on the FittedPumasModel to create a fitting target

  3. Define a suitable neural network ml model that will link baseline covariates to random effect EBEs

  4. Train ml to the target either via fit or hyperopt

  5. Insert the trained ml into a FittedPumasModel via augment

Having gone through steps 1-5, note that the augmented FittedPumasModel has two sources of information about the EBEs (aside from the data): ml, which was fit in step 4, and the random effect prior, parameterized by Ω, which was learned in step 1. However, ml now captures some of the information (via baseline covariates) about the EBEs that were generated using the prior learned in step 1. It is therefore advisable to re-fit Ω to take this into account; otherwise the random effect prior may be too wide. This can be done by using the fit function with the constantcoef keyword argument, fixing everything except Ω:

julia
fpm_refitted = fit(
  augmented_fpm.model,
  augmented_fpm.data,
  coef(augmented_fpm),
  MAP(FOCE()),
  # Fix every parameter except the random effect covariance
  # (assuming it is named :Ω in the @param block).
  constantcoef = filter(i -> i !== :Ω, keys(coef(augmented_fpm)))
)