Clustering Functions
Clustering, which groups similar elements in a collection, is one of the most common approaches in exploratory data analysis. In pharmacometrics, it is usually applied either to subjects or to the variables (covariates, biomarkers, etc.) associated with them. Accordingly, DeepPumas.jl provides the functions cluster_subjects and cluster_variables, respectively. The ClusterResults type and the functions medoids, cluster_count and total_cost are provided as utilities.
DeepPumas.cluster_subjects Function
julia
cluster_subjects(
popDF::DataFrame,
variables::Union{Symbol, Vector{Symbol}},
k::Integer;
standardize,
baseline,
init,
maxiter,
tol,
display,
)
Group subjects in population popDF into k clusters based on one or more variables (covariates, biomarkers, etc.) using K-Medoids clustering. The pairwise distance matrix uses dynamic time warping (handles varying numbers of measurements). Variables have to be numeric, finite and without missing values. A ClusterResults object is returned. Other keyword arguments:
- standardize = true: standardize each variable
- baseline = falses(length(variables)): per variable, indicate if only baseline values should be used, which are taken from the first row associated with each subject
- init = :kmpp: initialization of medoids. Can be a vector of k subject IDs, or a Symbol indicating a seeding algorithm. For more details see Clustering.kmedoids
- maxiter = 200: maximum number of iterations
- tol = 1e-8: minimum change in objective value until convergence
- display = :none: verbosity. :none shows nothing. :final summarizes results after clustering. :iter shows the progress at each iteration.
The related function cluster_variables is used to cluster similar variables (e.g., covariates, biomarkers) in a population. See also medoids, cluster_count, total_cost.
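To make the role of dynamic time warping concrete, the following is a minimal sketch of the classic dynamic-programming formulation in plain Julia. It is not DeepPumas's internal implementation: the function name dtw_distance, the absolute-difference local cost and the sample series are all assumptions chosen for illustration. It shows why two series with different numbers of measurements can still be compared.

```julia
# Classic dynamic time warping between two numeric series.
# Illustrative only; this is NOT the DeepPumas implementation.
function dtw_distance(a::AbstractVector{<:Real}, b::AbstractVector{<:Real})
    n, m = length(a), length(b)
    # D[i + 1, j + 1] is the minimal cumulative cost of aligning a[1:i] with b[1:j]
    D = fill(Inf, n + 1, m + 1)
    D[1, 1] = 0.0
    for i in 1:n, j in 1:m
        local_cost = abs(a[i] - b[j])  # assumed local cost: absolute difference
        D[i + 1, j + 1] = local_cost + min(D[i, j + 1], D[i + 1, j], D[i, j])
    end
    return D[n + 1, m + 1]
end

# Hypothetical series: same trajectory, sampled 3 times versus 5 times
sparse_obs = [1.0, 2.0, 3.0]
dense_obs = [1.0, 1.0, 2.0, 3.0, 3.0]

dtw_distance(sparse_obs, dense_obs)        # 0.0: the warping path absorbs the repeated samples
dtw_distance(sparse_obs, [2.0, 3.0, 4.0])  # positive: the trajectories differ in level
```

A pointwise distance such as the Euclidean norm would not be defined for the first pair, because the two vectors have different lengths.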
DeepPumas.cluster_variables Function
julia
cluster_variables(
popDF::DataFrame,
variables::Vector{Symbol},
k::Integer;
standardize,
baseline,
init,
maxiter,
tol,
display,
cluster_negative,
)
Group variables (covariates, biomarkers, etc.) in population popDF into k clusters using K-Medoids clustering. The pairwise distance matrix uses dynamic time warping (handles varying numbers of measurements). Variables have to be numeric, finite and without missing values. Other keyword arguments:
- standardize = true: standardize each variable
- baseline = falses(length(variables)): per variable, indicate if only baseline values should be used, which are taken from the first row associated with each subject
- init = :kmpp: initialization of medoids. Can be a vector of k variable indices, or a Symbol indicating a seeding algorithm. For more details see Clustering.kmedoids
- maxiter = 200: maximum number of iterations
- tol = 1e-8: minimum change in objective value until convergence
- display = :none: verbosity. :none shows nothing. :final summarizes results after clustering. :iter shows the progress at each iteration
- cluster_negative = false: whether negatively correlated variables should be clustered together.
The related function cluster_subjects clusters subjects according to the given variables. See also medoids, cluster_count, total_cost and ClusterResults.
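The effect of standardization when comparing variables can be sketched in plain Julia. This assumes that standardizing means z-scoring (zero mean, unit sample standard deviation); the documentation above does not state the exact transformation, and the helper zscore and the sample values are hypothetical.

```julia
using Statistics

# Assumed form of standardization: z-score with the sample standard deviation.
zscore(x) = (x .- mean(x)) ./ std(x)

# Hypothetical variables on very different scales but with the same shape
weight_kg = [60.0, 70.0, 80.0]
crcl = [0.6, 0.7, 0.8]

# On the raw scale the two variables look far apart...
raw_gap = sum(abs.(weight_kg .- crcl))

# ...but after standardization they are (numerically) identical
std_gap = sum(abs.(zscore(weight_kg) .- zscore(crcl)))
```

Without this step, distances between variables would mostly reflect their units rather than the similarity of their profiles.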
DeepPumas.ClusterResults Type
julia
ClusterResults
Object returned by cluster_subjects and cluster_variables. Contains the following fields:
- assignments: DataFrame with columns subject (or variable), cluster (assignments), cost (distance from point to cluster medoid), cluster_center (medoid of respective cluster)
- iterations: number of iterations the algorithm ran for
- converged: Boolean indicating whether the algorithm converged.
See also medoids, cluster_count, total_cost.
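How the cluster, cost and cluster_center columns relate to one another can be sketched with the assignment step of K-Medoids on a small distance matrix. The matrix values and the choice of medoids below are made up for illustration, and plain vectors stand in for the DataFrame columns; this is not how DeepPumas builds the object internally.

```julia
# Hypothetical symmetric pairwise distance matrix for four elements
dist = [0.0 1.0 4.0 5.0;
        1.0 0.0 3.0 4.0;
        4.0 3.0 0.0 1.0;
        5.0 4.0 1.0 0.0]

# Hypothetical medoids: elements 1 and 3
medoid_idx = [1, 3]

# Each element joins the cluster of its nearest medoid
cluster = [argmin(dist[i, medoid_idx]) for i in axes(dist, 1)]

# Cost is the distance from the element to the medoid of its cluster
cost = [dist[i, medoid_idx[cluster[i]]] for i in axes(dist, 1)]

# The cluster center is the medoid element itself
cluster_center = medoid_idx[cluster]
```

Here cluster is [1, 1, 2, 2], cost is [0.0, 1.0, 0.0, 1.0] and cluster_center is [1, 1, 3, 3]. A medoid always has cost zero, since it is an actual element of the collection.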
DeepPumas.medoids Function
julia
medoids(cr::ClusterResults)
Return medoids of clusters.
See also cluster_variables, cluster_subjects and DeepPumas.ClusterResults.
DeepPumas.cluster_count Function
julia
cluster_count(cr::ClusterResults)
Return number of elements in each cluster.
See also cluster_variables, cluster_subjects and ClusterResults.
DeepPumas.total_cost Function
julia
total_cost(cr::ClusterResults)
Return sum of distances from each element to the medoid of its cluster.
See also cluster_variables, cluster_subjects and ClusterResults.
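The quantities reported by cluster_count and total_cost can be reproduced by hand from per-element cluster labels and costs. The sketch below uses plain vectors with made-up values in place of the assignments table; it illustrates the definitions only and does not call DeepPumas.

```julia
# Hypothetical per-element data mirroring the cluster and cost columns
cluster = [1, 2, 1, 3, 2, 1]
cost = [0.0, 0.4, 1.2, 0.0, 0.0, 0.7]

# Number of elements in each cluster
k = maximum(cluster)
counts = [count(==(c), cluster) for c in 1:k]  # [3, 2, 1]

# Sum of distances from each element to the medoid of its cluster
total = sum(cost)
```

Comparing the total across several values of k is one common way to judge how many clusters a data set supports, keeping in mind that the total cost can only decrease as k grows.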