# Sequential Covariate Modeling
After fitting a nonlinear mixed effects (NLME) model, some individual variability will have been captured by the random effects. Here we describe how to leverage machine learning to predict, from baseline covariates, part of the individual variability that was previously captured by the random effects. This workflow can be used with any fitted NLME model, regardless of whether covariate information was used in the initial fitting. We call this model augmentation.
The basic idea is to treat the modeling as a sequential process with the following steps:
1. Fit an NLME model (with or without any embedded machine learning or covariate effects).
2. Fit a machine learning model that uses baseline covariates to predict the empirical Bayes estimates of the fitted NLME model. This can be done in a few simple steps:
   1. Extract a target input-output mapping.
   2. Define and train a supervised machine learning model on this target.
3. Augment the fitted NLME model by embedding the newly trained covariate model into it.
These steps can be accomplished with only a few lines of code:
```julia
# Assuming fpm is a FittedPumasModel obtained from Step 1
target = preprocess(fpm)                # Step 2.1: extract the target mapping
nn = MLPDomain(numinputs(target), 10, 10, (numoutputs(target), identity); reg=L2())
fitted_nn = fit(nn, target)             # Step 2.2: train the covariate model
fpm_augment = augment(fpm, fitted_nn)   # Step 3: embed it in the NLME model
```

We provide more details on each of these steps below.
Even though neural networks embedded in NLME models can process covariates and be fitted jointly, the sequential fitting approach has a few advantages. First, it is much more computationally efficient and delivers results faster. Moreover, it keeps the NLME and covariate modeling processes modular, which makes an iterative model development process simpler and faster. Finally, it can be useful for splitting up the modeling work across team members, some of whom might have more domain-specific knowledge and be better suited to the NLME modeling, while others might have more experience using machine learning to process potentially complex covariates.
We now turn to a more detailed description of the steps involved in this workflow.
## Data Preprocessing
The first step in the Sequential Covariate Modeling workflow is the creation of a fitting target that will be used to optimize a neural network later on. We create the fitting target using the `preprocess` function.
### DeepPumas.preprocess Function
```julia
preprocess(fpm; ηs, covs)
preprocess(model, pop, param, approx; ηs, covs)
```

Preprocess covariates and posterior η approximations into a target x→y mapping for fitting.
`x` is a standardized matrix of the covariate values for each `Subject`. `y` is derived from empirical Bayes estimates (EBEs) that have been standardized and then rescaled such that their standard deviations approximate the sensitivity of the loglikelihood with respect to the given EBE. This informs the trade-off that a `fit` or a `hyperopt` needs to make when it is impossible to fit everything at once; some random effects impact patient outcomes more than others, and it is better to emphasize getting those right.
The resulting `FitTarget` contains these computed inputs (`x`) and outputs (`y`) for the target mapping as well as the transformations (pre and post) that facilitate going back and forth between this transformed mapping and the `Subject`→ηs mapping.
**Arguments**

Either

- `fpm::FittedPumasModel` - a fitted Pumas model. The embedded data will be used to generate the input data and some version of the posterior η distribution will be used to generate the target output.

or

- `model::AbstractPumasModel` - a Pumas model.
- `pop::Population` - a `Population` of training data.
- `param::NamedTuple` - fitted parameters.
- `approx::AbstractLikelihoodApproximation` - a likelihood approximation, e.g., `Pumas.FOCE()`.
**Keyword arguments**

- `ηs` - specify the posterior ηs to be used as targets. This should be an iterator of `Symbol`s, for example `ηs = (:η_CL, :η_Ka)`. (η is typed by `\eta<tab>`.)
- `covs` - specify which covariates of `fpm.data.covariates()` to use when generating input data. This should be an iterator of `Symbol`s, for example `covs = (:AGE, :SEX)`.
**Returns**

A `FitTarget` for use with `fit` or `hyperopt`.

**Example**

Let `model` be a `PumasModel` and `pop` be a `Population`.
```julia
fpm = fit(model, pop, sample_params(model), MAP(FOCE()))
target = preprocess(fpm)
```

```julia
preprocess(X, Y; standardize = false)
```

Process the `X` and `Y` matrices into a `FitTarget` for use with `fit` or `hyperopt`.
The number of columns in your matrices reflects the number of datapoints while the number of rows gives the dimensionality of your input (`X`) and output (`Y`).

`standardize` toggles a z-score standardization of `X`. This transformation is stored and can be applied to new input data fed to a model fitted towards the generated `FitTarget`.
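The matrix-based method can be useful when the data does not come from a `Population`. A minimal sketch with made-up dimensions (the data here is purely illustrative):

```julia
# 3 input covariates and 2 outputs across 100 datapoints (one column per datapoint).
X = randn(3, 100)
Y = randn(2, 100)

# Standardize X; the stored transformation will also be applied to new inputs.
target_xy = preprocess(X, Y; standardize = true)
```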
## Model Definition
The second step is to define a neural network that will be trained to link baseline covariates to EBEs. The basic functionality to create neural networks has been described in Neural Network Model Domains. Briefly, having defined a target via preprocess, we can simply use MLPDomain to create a basic multilayer perceptron:
```julia
nn = MLPDomain(numinputs(target), 10, 10, 10, (numoutputs(target), identity); reg=L2(1.))
```

Currently, only neural networks with basic layers (e.g. `Flux.Dense`) are supported.
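The width, depth, and regularization strength are all modeling choices. As an illustration (the layer size and regularization constant here are arbitrary, not recommendations), a smaller network could be defined by:

```julia
# One hidden layer of 8 neurons and a weaker L2 penalty.
nn_small = MLPDomain(numinputs(target), 8, (numoutputs(target), identity); reg = L2(0.1))
```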
## Model Fitting
Once a neural network that links the baseline covariates and EBEs is defined, we need to fit it. Here we specify how to do a single fit, and below we show how to use hyper-optimization to fit the neural network while also optimizing hyper-parameters such as the regularization constant. Having defined a fitting target via `preprocess` and a machine learning model `ml` (e.g. via `MLPDomain`, but not exclusively), we simply use the `fit` function:
### StatsAPI.fit Function
```julia
fit(ml, target[, init]; kwargs...)
```

Fit a machine learning model, `ml`, to a target input-output mapping.
**Arguments**

- `ml::Union{FluxDomain, Chain, HoldoutValidationResult}` - A machine learning model, typically coming out of a call to `MLPDomain` or a `fit` thereof.
- `target::FitTarget` - a target input-output mapping, typically generated by `preprocess`.
- `init = sample_params(nn)` - the initial parameters to start from. Defaults to a random sampling of a Glorot normal distribution. See also `init_params`, which also samples from a Glorot distribution but with a seed for reproducibility.
**Keyword arguments**

- `training_fraction = 1.0` - The fraction of data that should be used for training. The rest is withheld for validation.
- `shuffle = true` - Should the data be shuffled before splitting into a training and a validation set?
- `optim_alg = nothing` - Specify the optimization algorithm. Only `Adam` is supported for all neural network backends. `Flux`-based models can use optimization algorithms from `Optimisers.jl`. If `nothing` is passed, then a fresh `fit` will use `Adam` while a continued fit (where `ml` is the result of a previous `fit`) will re-use the previous algorithm.
- `optim_options` - A `NamedTuple` of options to be passed on to the fitting. Valid options:
  - `rng` - a random number generator.
  - `loss` - the loss function; available options are `l1` (default) and `l2`.
  - `lambda` - the regularization constant. Defaults to `0` or to any regularization or prior specified in `ml`.
  - `alpha` - the regularization mode: `0` for L2 regularization and `1` for L1. Defaults to `0` or to any prior or regularization specified in `ml`.
  - `verbosity` - tune the amount of printouts during the fit. Defaults to `1`. Set to `0` to avoid printouts.
  - `batch_size` - Specify the batch size for the optimization algorithm (typically `Adam`). Batching determines how many data points to use each time the gradient is calculated and a step in parameter space is taken. Batching introduces an often beneficial stochasticity in the optimization that is good for avoiding local optima but sometimes computationally costly. Defaults to including all data in one batch.
  - `epochs` - the number of iterations to optimize for. Defaults to `10_000`.
```julia
fit(model, pop, alg; kwargs...)
```

Tries to fit an augmented `PumasModel` with a provided `AbstractDataDrivenAlgorithm`. See the DataDrivenDiffEq documentation for available packages, algorithms and options.
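Putting the first method together with the target from `preprocess`, a single fit with a validation split might look as follows. This is a sketch; the keyword values are illustrative choices rather than recommendations, and it assumes the `l2` loss option documented above:

```julia
# Withhold 20% of the datapoints for validation and train with an l2 loss.
fitted_nn = fit(
    nn,
    target;
    training_fraction = 0.8,
    optim_options = (; loss = l2, epochs = 5_000),
)
```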
## Model Hyper-optimization
Hyper-optimization is an important step in model development in order to obtain good validation and generalization performance. Therefore, as part of the Sequential Covariate Modeling workflow we provide a hyperopt function that can hyper-optimize a neural network to a given fitting target. Currently, only the regularization constant is optimized.
### DeepPumas.hyperopt Function
```julia
hyperopt(ml::FluxDomain, target::FitTarget[, alg])
hyperopt(ml::FluxModel, target::FitTarget[, alg])
```

Optimize the parameters and hyperparameters of `ml`.
**Arguments**

- `ml` - Typically a `FluxDomain` resulting from a call to `MLPDomain`, but plain `FluxModel`s are also supported.
- `target::FitTarget` - the target input-output mapping for the optimization. Typically the result of `preprocess`.
- `alg` (optional) - a specification of how to conduct the hyperparameter optimization. Currently, only `GridSearch` is supported. Defaults to `GridSearch()`.
**Returns**

A `HyperoptResult`, which can be used to augment a `PumasModel` (see `augment`).
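For example, reusing the `nn` and `target` from above, the hyper-optimized result can be passed straight to `augment`:

```julia
# Grid search (the default algorithm) over the regularization constant.
ho = hyperopt(nn, target)

# The HyperoptResult can be embedded into the fitted NLME model.
fpm_augment = augment(fpm, ho)
```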
## Model Augmentation
Having fitted a neural network to a target, we need to insert it into a `FittedPumasModel`. We do this using the `augment` function.
### DeepPumas.augment Function
```julia
augment(fpm, fitted_ml[, prior][, paramname])
```

Return an ML-augmented `FittedPumasModel`.
Insert `fitted_ml` between the `@random` and `@pre` blocks of the `PumasModel` in `fpm`. `fitted_ml` is evaluated for each `Subject` and the resulting output is additively combined with the random effects to form the 'augmented random effects', where part of the inter-subject variation is captured by the random effects and part by the `fitted_ml`. These augmented random effects replace the ordinary random effects used in the `@pre`, `@dosecontrol`, `@derived` and `@observed` blocks of the `fpm` model.
`fitted_ml` will be added as a `Domain` to the `@param` block, just like any other parameter, and it will have the name `paramname`. Since the same name cannot be used twice, you may have to pass a different `paramname` symbol if there is a conflict.
The resulting model will initialize its parameter values to the fitted coefficients of `fpm` and the fitted parameter values of `fitted_ml`, such that `init_params` often results in very good parameters.

The returned model is a normal `FittedPumasModel` that can be used for any Pumas workflow. Its fitted coefficients are the same as those of `fpm` with the addition of the fitted parameter values of `fitted_ml`, often resulting in a good fit already.
**Arguments**

- `fpm::FittedPumasModel` - a fitted `@model` to be augmented.
- `fitted_ml` - the machine learning model to do the augmentation. This `fitted_ml` could be the output of either `fit`, `hyperopt`, or `AugmentML`.
- `prior::Prior` (optional) - specify the prior associated with the parameters of the embedded machine learning model. Defaults to the prior used in `fit` or the best prior coming out of `hyperopt`, depending on how `fitted_ml` was created.
- `paramname::Symbol = :nn` - A name to identify the embedded ML within the augmented model.
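Since the returned model behaves like any other `FittedPumasModel`, standard Pumas post-processing applies to it. A hedged sketch (assuming the usual Pumas workflow functions such as `predict` and `inspect`):

```julia
fpm_augment = augment(fpm, fitted_nn)

# The augmented model supports ordinary Pumas workflows, e.g.:
pred = predict(fpm_augment)    # predictions from the augmented model
insp = inspect(fpm_augment)    # diagnostics for plotting
```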
## Refitting the Random Effect Prior
To recap, the Sequential Covariate Modeling workflow so far consists of the following steps:

1. Get a `FittedPumasModel` by fitting to some data, with or without using the covariates.
2. Use the `preprocess` function on the `FittedPumasModel` to create a fitting `target`.
3. Define a suitable neural network `ml` that will link baseline covariates to random effect EBEs.
4. Fit (or hyper-optimize) `ml` towards the `target`.
5. Insert the trained `ml` into a `FittedPumasModel` via `augment`.
Having gone through steps 1-5, note that the augmented `FittedPumasModel` has two sources of information (aside from the data) about the EBEs: the `ml` that was fit in step 4, and the random effect prior learned in step 1. `ml` captures some information (via the baseline covariates) about the EBEs that were generated using the prior from step 1, so that prior now overstates the remaining unexplained inter-subject variability. Therefore, it is advisable to re-fit the model using the `fit` function with the `constantcoef` keyword argument, fixing everything except the random effect prior (here, `Ω`):
```julia
# Re-fit only the random effect prior Ω, holding all other coefficients
# (including the embedded neural network parameters) constant.
fpm_refitted = fit(
    fpm_augment.model,
    fpm_augment.data,
    coef(fpm_augment),
    MAP(FOCE()),
    constantcoef = filter(i -> i !== :Ω, keys(coef(fpm_augment)))
)
```
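After this refit, the prior reflects only the inter-subject variability that the covariate model leaves unexplained. As a quick sanity check (a sketch, assuming the standard Pumas `loglikelihood` accessor for fitted models):

```julia
# Compare fit quality before and after augmentation plus refitting.
loglikelihood(fpm)            # the original NLME fit
loglikelihood(fpm_refitted)   # the augmented, refitted model
```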