Predictions and Realisations

Usage on several models can be seen in the examples section, such as for the Logistic Model.

Predictions

LikelihoodBasedProfileWiseAnalysis.check_dimensional_prediction_coverage - Function
check_dimensional_prediction_coverage(data_generator::Function, 
    generator_args::Union{Tuple, NamedTuple},
    t::AbstractVector,
    model::LikelihoodModel, 
    N::Int, 
    num_points_to_sample::Union{Int, Vector{<:Int}},
    θtrue::AbstractVector{<:Real}, 
    θindices::Union{Vector{Vector{Int}}, Vector{Vector{Symbol}}},
    θinitialguess::AbstractVector{<:Real}=θtrue;
    <keyword arguments>)

Performs a simulation to estimate the prediction coverage of dimensional confidence samples (including full likelihood samples) for parameters in θindices given a model by:

  1. Repeatedly drawing new observed data using data_generator for fixed true parameter values, θtrue, and fixed true prediction value.
  2. Fitting the model.
  3. Sampling points using sample_type.
  4. Evaluating predictions from the points in the samples and finding the prediction extrema.
  5. Checking whether the prediction extrema contain the true prediction value(s), in a pointwise and simultaneous fashion. The estimated simultaneous coverage is returned with a default 95% confidence interval within a DataFrame.
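
Steps 4 and 5 can be sketched in a few lines of plain Julia. This is an illustration only, not the package's implementation: the helper names `prediction_extrema` and `covers`, and the layout of `predictions` (one row per time point, one column per sampled parameter point), are assumptions made for the example.

```julia
# Hypothetical helper: `predictions` is an (n_times x n_samples) matrix whose
# columns are model predictions evaluated at each sampled parameter point.
function prediction_extrema(predictions::AbstractMatrix{<:Real})
    lower = vec(minimum(predictions, dims=2))  # smallest prediction at each time point
    upper = vec(maximum(predictions, dims=2))  # largest prediction at each time point
    return lower, upper
end

# Pointwise coverage: one Bool per time point.
# Simultaneous coverage: true only if every time point is covered.
function covers(lower, upper, ytrue)
    pointwise = (lower .<= ytrue) .& (ytrue .<= upper)
    return pointwise, all(pointwise)
end

predictions = [1.0 1.2 0.9;
               2.0 2.4 2.1;
               3.0 3.1 3.3]
ytrue = [1.1, 2.2, 3.4]   # assumed true prediction at three time points

lower, upper = prediction_extrema(predictions)
pointwise, simultaneous = covers(lower, upper, ytrue)
```

Here the third time point falls outside the extrema, so the trajectory is covered pointwise at two of three time points but not simultaneously. Averaging these indicators over the N simulation iterations gives the estimated coverage.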

The prediction coverage from combining the prediction sets of multiple confidence profiles, choosing 1 to length(θindices) random combinations of θindices, is also evaluated (i.e. the final result is the union over all profiles in θindices).

Arguments

  • data_generator: a function with two arguments which generates data for fixed time points and true model parameters corresponding to the log-likelihood function contained in model. The two arguments must be the vector of true model parameters, θtrue, and a Tuple or NamedTuple, generator_args. Outputs a data Tuple or NamedTuple that corresponds to the log-likelihood function contained in model.
  • generator_args: a Tuple or NamedTuple containing any additional information required by both the log-likelihood function and data_generator, such as the time points to be evaluated at. If evaluating the log-likelihood function requires more than just the simulated data, arguments for the data output of data_generator should be passed in via generator_args.
  • t: a vector of time points to compute predictions and evaluate coverage at.
  • model: a LikelihoodModel containing model information.
  • N: a positive number of coverage simulations.
  • num_points_to_sample: integer number of points to sample (for UniformRandomSamples and LatinHypercubeSamples sample types). For the UniformGridSamples sample type, if integer it is the number of points to grid over in each parameter dimension. If it is a vector of integers each index of the vector is the number of points to grid over in the corresponding parameter dimension. For example, [1,2] would mean a single point in dimension 1 and two points in dimension 2.
  • θtrue: a vector of true parameter values of the model for simulating data with.
  • θindices: a vector of vectors of parameter indexes for the combinations of interest parameters to sample points from.
  • θinitialguess: a vector containing the initial guess for the values of each parameter. Used to find the MLE point in each iteration of the simulation. Default is θtrue.
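
To make the two forms of num_points_to_sample concrete for a grid, the sketch below builds a uniform grid in plain Julia. The helper `uniform_grid` is hypothetical, and placing points at the midpoints of equal-width cells is an assumption of this example; the package's UniformGridSamples may place its points differently.

```julia
# Hypothetical helper, not part of the package: build a uniform grid with
# num_points[i] points in dimension i, placed at the midpoints of equal-width cells.
function uniform_grid(lb::Vector{Float64}, ub::Vector{Float64}, num_points::Vector{Int})
    axis_points = [[lb[i] + (k - 0.5) * (ub[i] - lb[i]) / num_points[i]
                    for k in 1:num_points[i]] for i in eachindex(num_points)]
    # Cartesian product of the per-dimension points, one vector per grid point.
    return vec([collect(p) for p in Iterators.product(axis_points...)])
end

# An integer is expanded to the same number of points in every dimension.
uniform_grid(lb::Vector{Float64}, ub::Vector{Float64}, n::Int) =
    uniform_grid(lb, ub, fill(n, length(lb)))

grid = uniform_grid([0.0, 0.0], [1.0, 4.0], [1, 2])   # one point in dim 1, two in dim 2
square_grid = uniform_grid([0.0, 0.0], [1.0, 1.0], 3) # three points in each dimension
```

The total number of grid points is the product of the per-dimension counts, so an integer n in d dimensions produces n^d points.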

Keyword Arguments

  • confidence_level: a number ∈ (0.0, 1.0) for the confidence level to find samples within and evaluate coverage at. Default is 0.95 (95%).
  • sample_type: the sampling method used to sample parameter space. Available sample types are UniformGridSamples, UniformRandomSamples and LatinHypercubeSamples. Default is LatinHypercubeSamples() (LatinHypercubeSamples).
  • lb: optional vector of lower bounds on parameters. Use to specify parameter lower bounds to sample over that are different than those contained in model.core. Default is Float64[] (use lower bounds from model.core).
  • ub: optional vector of upper bounds on parameters. Use to specify parameter upper bounds to sample over that are different than those contained in model.core. Default is Float64[] (use upper bounds from model.core).
  • θlb_nuisance: a vector of lower bounds on nuisance parameters, require θlb_nuisance .≤ model.core.θmle. Default is model.core.θlb.
  • θub_nuisance: a vector of upper bounds on nuisance parameters, require θub_nuisance .≥ model.core.θmle. Default is model.core.θub.
  • coverage_estimate_confidence_level: a number ∈ (0.0, 1.0) for the level of a confidence interval of the estimated coverage. Default is 0.95 (95%).
  • optimizationsettings: an OptimizationSettings containing the optimisation settings used to find optimal values of nuisance parameters for a given interest parameter value. Default is missing (will use default_OptimizationSettings(); see default_OptimizationSettings).
  • show_progress: boolean variable specifying whether to display progress bars on the percentage of simulation iterations completed and estimated time of completion. Default is model.show_progress.
  • distributed_over_parameters: boolean variable specifying whether to distribute the workload of the simulation across simulation iterations (false) or across the individual confidence profile calculations within each iteration (true). Default is false.
  • manual_GC_calls: boolean variable specifying whether to manually call garbage collection, GC.gc(), after every 10 iterations (distributed_over_parameters=true) or after every iteration on that worker (distributed_over_parameters=false). May be important to correctly free up memory for coverage simulations that use distributed or threaded workloads for Julia versions prior to v1.10.0. Default is false.

Details

This simulated coverage check is used to estimate the performance of propagating dimensional samples into prediction space. The simulation uses Distributed.jl to parallelise the workload.

The uncertainty in estimates of the prediction coverage under the simulated model will decrease as the number of simulations, N, is increased. Confidence intervals for the coverage estimate are provided to quantify this uncertainty. The confidence interval for the estimated coverage is a Clopper-Pearson interval on a binomial test generated using HypothesisTests.jl.
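
To illustrate how the width of a Clopper-Pearson interval shrinks with N, the following self-contained sketch computes the interval by bisection on the binomial CDF. It is not the package's code path, which relies on HypothesisTests.jl; the function names `binomial_cdf`, `bisect` and `clopper_pearson` are made up for this example.

```julia
# Binomial(n, p) CDF at k. Each term's log-probability is updated recursively
# so that very small probabilities do not break the recursion for large n.
function binomial_cdf(k::Int, n::Int, p::Float64)
    k < 0 && return 0.0
    logpmf = n * log1p(-p)              # log P(X = 0)
    logodds = log(p) - log1p(-p)
    total = exp(logpmf)
    for j in 1:k
        logpmf += log((n - j + 1) / j) + logodds
        total += exp(logpmf)
    end
    return min(total, 1.0)
end

# Bisection for the root of an increasing function f on (lo, hi).
function bisect(f, lo::Float64, hi::Float64; iters::Int=100)
    for _ in 1:iters
        mid = (lo + hi) / 2
        f(mid) < 0 ? (lo = mid) : (hi = mid)
    end
    return (lo + hi) / 2
end

# Exact (Clopper-Pearson) interval for x successes out of n trials.
function clopper_pearson(x::Int, n::Int; level::Float64=0.95)
    α = 1 - level
    # Lower bound solves P(X >= x | p) = α/2; upper bound solves P(X <= x | p) = α/2.
    lower = x == 0 ? 0.0 : bisect(p -> (1 - binomial_cdf(x - 1, n, p)) - α / 2, 0.0, 1.0)
    upper = x == n ? 1.0 : bisect(p -> α / 2 - binomial_cdf(x, n, p), 0.0, 1.0)
    return lower, upper
end

lo_small, hi_small = clopper_pearson(95, 100)    # 95 of 100 simulations covered
lo_large, hi_large = clopper_pearson(950, 1000)  # same proportion, ten times more simulations
```

Both calls have the same observed coverage of 0.95, but the interval from N = 1000 is noticeably narrower than the one from N = 100.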

Support for `dof`

Setting the degrees of freedom of a sampled parameter confidence set to a value other than the interest parameter dimensionality is not currently supported (e.g. as supported for univariate and bivariate profiles). Support may be added in the future.

Recommended setting for distributed_over_parameters
  • If the number of processes available to use is significantly greater than the number of model parameters or only a few parameter combinations are being checked for coverage, false is recommended.
  • If system memory or model size in system memory is a concern, or the number of processes available is similar to or less than the number of parameter combinations being checked, true will likely be more appropriate.
  • When set to false, a separate LikelihoodModel struct will be used by each process, as opposed to only one when set to true, which could cause a memory issue for larger models.

LikelihoodBasedProfileWiseAnalysis.check_univariate_prediction_coverage - Function
check_univariate_prediction_coverage(data_generator::Function, 
    generator_args::Union{Tuple, NamedTuple},
    t::AbstractVector,
    model::LikelihoodModel, 
    N::Int, 
    θtrue::AbstractVector{<:Real}, 
    θs::AbstractVector{<:Int64},
    θinitialguess::AbstractVector{<:Real}=θtrue; 
    <keyword arguments>)

Performs a simulation to estimate the prediction coverage of univariate confidence profiles for parameters in θs given a model by:

  1. Repeatedly drawing new observed data using data_generator for fixed true parameter values, θtrue, and fixed true prediction value.
  2. Fitting the model and univariate confidence intervals.
  3. Sampling points along the profile within the confidence intervals.
  4. Evaluating predictions from the points in the profile and finding the prediction extrema.
  5. Checking whether the prediction extrema contain the true prediction value(s), in a pointwise and simultaneous fashion. The estimated simultaneous coverage is returned with a default 95% confidence interval within a DataFrame.

The prediction coverage from combining the prediction sets of multiple confidence profiles, choosing 1 to length(θs) random combinations of θs, is also evaluated (i.e. the final result is the union over all profiles in θs).

Arguments

  • data_generator: a function with two arguments which generates data for fixed time points and true model parameters corresponding to the log-likelihood function contained in model. The two arguments must be the vector of true model parameters, θtrue, and a Tuple or NamedTuple, generator_args. Outputs a data Tuple or NamedTuple that corresponds to the log-likelihood function contained in model.
  • generator_args: a Tuple or NamedTuple containing any additional information required by both the log-likelihood function and data_generator, such as the time points to be evaluated at. If evaluating the log-likelihood function requires more than just the simulated data, arguments for the data output of data_generator should be passed in via generator_args.
  • t: a vector of time points to compute predictions and evaluate coverage at.
  • model: a LikelihoodModel containing model information.
  • N: a positive number of coverage simulations.
  • θtrue: a vector of true parameter values of the model for simulating data with.
  • θs: a vector of parameters to profile, as a vector of model parameter indexes.
  • θinitialguess: a vector containing the initial guess for the values of each parameter. Used to find the MLE point in each iteration of the simulation. Default is θtrue.
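
A data_generator of the two-argument form described above might look like the following sketch. The logistic parameterisation θ = [λ, K, C0], the additive Gaussian noise, and the field names `t`, `σ` and `y_obs` are all assumptions made for illustration; the fields must match whatever the log-likelihood function in model actually expects.

```julia
using Random

# Logistic growth solution; θ = [λ, K, C0] is an assumed parameterisation.
logistic(t, θ) = θ[2] * θ[3] / ((θ[2] - θ[3]) * exp(-θ[1] * t) + θ[3])

# Two-argument generator: the true parameters and a NamedTuple of extra
# information. The field names `t`, `σ` and `y_obs` are illustrative only.
function data_generator(θtrue, generator_args::NamedTuple)
    y_true = logistic.(generator_args.t, Ref(θtrue))
    y_obs = y_true .+ generator_args.σ .* randn(length(y_true))
    # Return the simulated observations together with the fixed arguments.
    return merge((y_obs=y_obs,), generator_args)
end

Random.seed!(1)
θtrue = [0.01, 100.0, 10.0]
generator_args = (t=collect(0.0:100.0:1000.0), σ=10.0)
data = data_generator(θtrue, generator_args)
```

Returning the fixed arguments alongside the simulated observations keeps everything the log-likelihood needs in a single NamedTuple.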

Keyword Arguments

  • num_points_in_interval: an integer number of points to optionally evaluate within the confidence interval for each interest parameter using get_points_in_intervals!. Points are linearly spaced in the interval. Useful for predictions from univariate profiles. Default is 0.
  • confidence_level: a number ∈ (0.0, 1.0) for the confidence level to evaluate the confidence interval coverage at. Default is 0.95 (95%).
  • dof: an integer ∈ [1, model.core.num_pars] for the degrees of freedom used to define the asymptotic threshold (LikelihoodBasedProfileWiseAnalysis.get_target_loglikelihood) which defines the extremities of the univariate profile, i.e. the confidence interval. For parameter confidence intervals that are considered individually, it should be set to 1. For intervals that are considered simultaneously, it should be set to the number of intervals that are being calculated, i.e. model.core.num_pars when we wish the confidence interval for every parameter to hold simultaneously. Default is 1. Setting it to model.core.num_pars should be reasonable when making predictions for well-identified models with <10 parameters. Note: values other than 1 and model.core.num_pars may not have a clear statistical interpretation.
  • profile_type: whether to use the true log-likelihood function or an ellipse approximation of the log-likelihood function centred at the MLE (with optional use of parameter bounds). Available profile types are LogLikelihood, EllipseApprox and EllipseApproxAnalytical. Default is LogLikelihood() (LogLikelihood).
  • θlb_nuisance: a vector of lower bounds on nuisance parameters, require θlb_nuisance .≤ model.core.θmle. Default is model.core.θlb.
  • θub_nuisance: a vector of upper bounds on nuisance parameters, require θub_nuisance .≥ model.core.θmle. Default is model.core.θub.
  • coverage_estimate_confidence_level: a number ∈ (0.0, 1.0) for the level of a confidence interval of the estimated coverage. Default is 0.95 (95%).
  • optimizationsettings: an OptimizationSettings containing the optimisation settings used to find optimal values of nuisance parameters for a given interest parameter value. Default is missing (will use default_OptimizationSettings(); see default_OptimizationSettings).
  • show_progress: boolean variable specifying whether to display progress bars on the percentage of simulation iterations completed and estimated time of completion. Default is model.show_progress.
  • distributed_over_parameters: boolean variable specifying whether to distribute the workload of the simulation across simulation iterations (false) or across the individual confidence interval calculations within each iteration (true). Default is false.
  • manual_GC_calls: boolean variable specifying whether to manually call garbage collection, GC.gc(), after every 10 iterations (distributed_over_parameters=true) or after every iteration on that worker (distributed_over_parameters=false). May be important to correctly free up memory for coverage simulations that use distributed or threaded workloads for Julia versions prior to v1.10.0. Default is false.

Details

This simulated coverage check is used to estimate the performance of propagating univariate parameter confidence intervals into prediction space. The simulation uses Distributed.jl to parallelise the workload.

The uncertainty in estimates of the prediction coverage under the simulated model will decrease as the number of simulations, N, is increased. Confidence intervals for the coverage estimate are provided to quantify this uncertainty. The confidence interval for the estimated coverage is a Clopper-Pearson interval on a binomial test generated using HypothesisTests.jl.

Recommended setting for distributed_over_parameters
  • If the number of processes available to use is significantly greater than the number of model parameters or only a few model parameters are being checked for coverage, false is recommended.
  • If system memory or model size in system memory is a concern, or the number of processes available is similar to or less than the number of model parameters being checked, true will likely be more appropriate.
  • When set to false, a separate LikelihoodModel struct will be used by each process, as opposed to only one when set to true, which could cause a memory issue for larger models.
Not intended for use on bimodal univariate profile likelihoods

The current implementation only considers two extremes of the log-likelihood and whether the truth is between these two points. If the profile likelihood function is bimodal, it is possible that the method has found only one of the correct confidence intervals (estimated coverage will be correct, but lower than expected) or has found one extremum on each of two distinct intervals (estimated coverage may be incorrect, and will be either larger than expected or much lower than expected).


LikelihoodBasedProfileWiseAnalysis.check_bivariate_prediction_coverage - Function
check_bivariate_prediction_coverage(data_generator::Function, 
    generator_args::Union{Tuple, NamedTuple},
    t::AbstractVector,
    model::LikelihoodModel, 
    N::Int, 
    num_points::Union{Int, Vector{<:Int}},
    θtrue::AbstractVector{<:Real}, 
    θcombinations::Union{Vector{Vector{Int}}, Vector{Tuple{Int,Int}}},
    θinitialguess::AbstractVector{<:Real}=θtrue; 
    <keyword arguments>)

Performs a simulation to estimate the prediction coverage of bivariate confidence profiles for parameters in θcombinations given a model by:

  1. Repeatedly drawing new observed data using data_generator for fixed true parameter values, θtrue, and fixed true prediction value.
  2. Fitting the model and bivariate confidence boundaries.
  3. Sampling points within the polygon hull of the confidence boundaries.
  4. Evaluating predictions from the points in the profile and finding the prediction extrema.
  5. Checking whether the prediction extrema contain the true prediction value(s), in a pointwise and simultaneous fashion. The estimated simultaneous coverage is returned with a default 95% confidence interval within a DataFrame.

The prediction coverage from combining the prediction sets of multiple confidence profiles, choosing 1 to length(θcombinations) random combinations of θcombinations, is also evaluated (i.e. the final result is the union over all profiles in θcombinations).
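
As a rough picture of what combining prediction sets means, the sketch below merges the prediction extrema of several profiles by taking the smallest lower bound and largest upper bound at each time point. This is the envelope of the union, which is an assumption of this example; `union_extrema` is a hypothetical helper, not a package function.

```julia
# Each profile contributes a (lower, upper) pair of prediction extrema,
# with one entry per time point.
function union_extrema(extrema_by_profile)
    lower = reduce((a, b) -> min.(a, b), first.(extrema_by_profile))
    upper = reduce((a, b) -> max.(a, b), last.(extrema_by_profile))
    return lower, upper
end

profile_a = ([1.0, 2.0, 3.0], [1.5, 2.5, 3.5])
profile_b = ([0.8, 2.1, 3.2], [1.4, 2.9, 3.6])

lower, upper = union_extrema([profile_a, profile_b])
```

Because the combined set can only grow as more profiles are added, its coverage is never lower than that of any single profile it contains.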

Arguments

  • data_generator: a function with two arguments which generates data for fixed time points and true model parameters corresponding to the log-likelihood function contained in model. The two arguments must be the vector of true model parameters, θtrue, and a Tuple or NamedTuple, generator_args. Outputs a data Tuple or NamedTuple that corresponds to the log-likelihood function contained in model.
  • generator_args: a Tuple or NamedTuple containing any additional information required by both the log-likelihood function and data_generator, such as the time points to be evaluated at. If evaluating the log-likelihood function requires more than just the simulated data, arguments for the data output of data_generator should be passed in via generator_args.
  • t: a vector of time points to compute predictions and evaluate coverage at.
  • model: a LikelihoodModel containing model information.
  • N: a positive number of coverage simulations.
  • num_points: a positive number of points to find on the boundary at the specified confidence level using a single method, or a vector of positive numbers of boundary points to find for each method in method (if method is a vector of AbstractBivariateMethod). Set to at least 3 within the function as some methods need at least three points to work.
  • θtrue: a vector of true parameter values of the model for simulating data with.
  • θcombinations: a vector of pairs of parameters to profile, as a vector of vectors of model parameter indexes.
  • θinitialguess: a vector containing the initial guess for the values of each parameter. Used to find the MLE point in each iteration of the simulation. Default is θtrue.

Keyword Arguments

  • num_internal_points: an integer number of points to optionally evaluate within a polygon hull approximation of a bivariate boundary for each interest parameter pair using sample_bivariate_internal_points!. Default is 0.
  • hullmethod: method of type AbstractBivariateHullMethod used to create a 2D polygon hull that approximates the bivariate boundary from a set of boundary points and internal points (method dependent). For available methods see bivariate_hull_methods(). Default is MPPHullMethod() (MPPHullMethod).
  • sample_type: either a UniformRandomSamples or LatinHypercubeSamples struct for how to sample internal points from the polygon hull. UniformRandomSamples are homogeneously sampled from the polygon and LatinHypercubeSamples use the intersection of a heuristically optimised Latin Hypercube sampling plan with the polygon. Default is LatinHypercubeSamples() (LatinHypercubeSamples).
  • confidence_level: a number ∈ (0.0, 1.0) for the confidence level on which to find the profile_type boundary. Default is 0.95 (95%).
  • dof: an integer ∈ [2, model.core.num_pars] for the degrees of freedom used to define the asymptotic threshold (LikelihoodBasedProfileWiseAnalysis.get_target_loglikelihood) which defines the boundary of the bivariate profile. For bivariate profiles that are considered individually, it should be set to 2. For profiles that are considered simultaneously, it should be set to model.core.num_pars. Default is 2. Setting it to model.core.num_pars should be reasonable when making predictions for well-identified models with <10 parameters. Note: values other than 2 and model.core.num_pars may not have a clear statistical interpretation.
  • θlb_nuisance: a vector of lower bounds on nuisance parameters, require θlb_nuisance .≤ model.core.θmle. Default is model.core.θlb.
  • θub_nuisance: a vector of upper bounds on nuisance parameters, require θub_nuisance .≥ model.core.θmle. Default is model.core.θub.
  • coverage_estimate_confidence_level: a number ∈ (0.0, 1.0) for the level of a confidence interval of the estimated coverage. Default is 0.95 (95%).
  • optimizationsettings: an OptimizationSettings containing the optimisation settings used to find optimal values of nuisance parameters for a given interest parameter value. Default is missing (will use default_OptimizationSettings(); see default_OptimizationSettings).
  • profile_type: whether to use the true log-likelihood function or an ellipse approximation of the log-likelihood function centred at the MLE (with optional use of parameter bounds). Available profile types are LogLikelihood, EllipseApprox and EllipseApproxAnalytical. Default is LogLikelihood() (LogLikelihood).
  • show_progress: boolean variable specifying whether to display progress bars on the percentage of simulation iterations completed and estimated time of completion. Default is model.show_progress.
  • distributed_over_parameters: boolean variable specifying whether to distribute the workload of the simulation across simulation iterations (false) or across the individual confidence interval calculations within each iteration (true). Default is false.
  • manual_GC_calls: boolean variable specifying whether to manually call garbage collection, GC.gc(), after every 10 iterations (distributed_over_parameters=true) or after every iteration on that worker (distributed_over_parameters=false). May be important to correctly free up memory for coverage simulations that use distributed or threaded workloads for Julia versions prior to v1.10.0. Default is false.
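
For the default dof of 2, the asymptotic threshold mentioned under dof has a closed form, which the sketch below evaluates. It assumes the standard likelihood-ratio convention in which the log-likelihood is normalised to zero at the MLE and the boundary sits at minus half the chi-squared quantile; how get_target_loglikelihood shifts or stores this value internally is not shown in this section, and the function names here are hypothetical.

```julia
# Closed-form quantile of a chi-squared distribution with 2 degrees of freedom,
# obtained by inverting its CDF, 1 - exp(-x / 2).
chisq2_quantile(q) = -2 * log(1 - q)

# Assumed convention: log-likelihood normalised to 0 at the MLE, so the
# confidence boundary lies where it drops to -quantile / 2.
target_loglikelihood_dof2(confidence_level) = -chisq2_quantile(confidence_level) / 2

threshold_95 = target_loglikelihood_dof2(0.95)
threshold_99 = target_loglikelihood_dof2(0.99)
```

Raising the confidence level lowers the threshold, which enlarges the region enclosed by the bivariate boundary. Raising dof has the same qualitative effect, although quantiles for other dof values need a chi-squared quantile function such as the one in Distributions.jl.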

Details

This simulated coverage check is used to estimate the performance of propagating bivariate parameter confidence intervals into prediction space. The simulation uses Distributed.jl to parallelise the workload.

The uncertainty in estimates of the prediction coverage under the simulated model will decrease as the number of simulations, N, is increased. Confidence intervals for the coverage estimate are provided to quantify this uncertainty. The confidence interval for the estimated coverage is a Clopper-Pearson interval on a binomial test generated using HypothesisTests.jl.

Recommended setting for distributed_over_parameters
  • If the number of processes available to use is significantly greater than the number of model parameters or only a few pairs of model parameters are being checked for coverage, false is recommended.
  • If system memory or model size in system memory is a concern, or the number of processes available is similar to or less than the number of pairs of model parameters being checked, true will likely be more appropriate.
  • When set to false, a separate LikelihoodModel struct will be used by each process, as opposed to only one when set to true, which could cause a memory issue for larger models.
May not work correctly on bimodal confidence boundaries

The current implementation constructs a single polygon with minimum polygon perimeter from the set of boundary points as the confidence boundary. If there are multiple distinct boundaries represented, then there will be edges connecting the distinct boundaries which the true parameter might be inside (but not inside either of the distinct boundaries).


Realisations

LikelihoodBasedProfileWiseAnalysis.check_dimensional_prediction_realisations_coverage - Function
check_dimensional_prediction_realisations_coverage(data_generator::Function,
    reference_set_generator::Function,
    training_generator_args::Union{Tuple,NamedTuple},
    testing_generator_args::Union{Tuple,NamedTuple},
    t::AbstractVector,
    model::LikelihoodModel, 
    N::Int, 
    num_points_to_sample::Union{Int, Vector{<:Int}},
    θtrue::AbstractVector{<:Real}, 
    θindices::Union{Vector{Vector{Int}}, Vector{Vector{Symbol}}},
    θinitialguess::AbstractVector{<:Real}=θtrue;
    <keyword arguments>)

Performs a simulation to estimate the prediction reference set and realisation coverage of dimensional confidence samples (including full likelihood samples) for parameters in θindices given a model by:

  1. Constructing the confidence_level reference set for predictions from the fixed true parameter values, θtrue.
  2. Repeatedly drawing new observed training data using data_generator and training_generator_args for fixed true parameter values, θtrue, and fixed true prediction value.
  3. Fitting the model using training data.
  4. Sampling points using sample_type.
  5. Evaluating predictions from the points in the samples and finding the prediction extrema (reference tolerance sets).
  6. Drawing new observed testing data using data_generator and testing_generator_args for fixed true parameter values, θtrue, and fixed true prediction value.
  7. Checking whether the prediction extrema (reference tolerance set) contains the prediction reference set from Step 1, in a pointwise and simultaneous fashion.
  8. Checking whether the prediction extrema contain the observed testing data, in a pointwise and simultaneous fashion.
  9. Returning the estimated simultaneous coverage of the reference set and the prediction realisations (observed testing data) with a default 95% confidence interval, alongside pointwise coverage, within a DataFrame. An alternate 'simultaneous' statistic for prediction realisation coverage is also provided: rather than testing whether 100% of prediction realisations are covered, it tests whether at least a proportion simultaneous_alternate_proportion of prediction realisations are covered.
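
The difference between the strict and alternate 'simultaneous' statistics for one simulation iteration can be sketched as follows. The function `realisation_coverage` and its return fields are hypothetical names chosen for this example, not the package's internals.

```julia
# Compare observed test data against prediction extrema for one iteration.
function realisation_coverage(lower, upper, y_test; simultaneous_alternate_proportion=0.95)
    covered = (lower .<= y_test) .& (y_test .<= upper)
    proportion = count(covered) / length(covered)
    return (pointwise=covered,
            simultaneous=all(covered),                                        # 100% covered
            simultaneous_alternate=proportion >= simultaneous_alternate_proportion)
end

lower = fill(0.0, 40)
upper = fill(1.0, 40)
y_test = fill(0.5, 40)
y_test[7] = 1.3          # a single realisation falls outside the extrema

result = realisation_coverage(lower, upper, y_test)
```

With 39 of 40 realisations covered, the strict simultaneous check fails while the alternate check at a proportion of 0.95 passes, which is why the alternate statistic is less sensitive to the number of test points.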

The prediction coverage from combining the prediction sets of multiple confidence profiles, choosing 1 to length(θindices) random combinations of θindices, is also evaluated (i.e. the final result is the union over all profiles in θindices).

Arguments

  • data_generator: a function with two arguments which generates data for fixed time points and true model parameters corresponding to the log-likelihood function contained in model. The two arguments must be the vector of true model parameters, θtrue, and a Tuple or NamedTuple, generator_args. When used with training_generator_args, it outputs a data Tuple or NamedTuple that corresponds to the log-likelihood function contained in model. When used with testing_generator_args, it outputs an array containing the observed data to use as the test data set.
  • reference_set_generator: a function with three arguments which generates the confidence_level data reference set for fixed time points and true model parameters corresponding to the log-likelihood function contained in model. The three arguments must be the vector of true model parameters, θtrue, a Tuple or NamedTuple, generator_args, and a number ∈ (0.0, 1.0) for the confidence level at which to evaluate the reference set. When used with testing_generator_args, it outputs a tuple of two arrays, (lq, uq), which contain the lower and upper quantiles of the reference set.
  • training_generator_args: a Tuple or NamedTuple containing any additional information required by both the log-likelihood function and data_generator, such as the time points to be evaluated at, used to create the training set of data. If evaluating the log-likelihood function requires more than just the simulated data, arguments for the data output of data_generator should be passed in via training_generator_args.
  • testing_generator_args: a Tuple or NamedTuple containing any additional information required by both the log-likelihood function and data_generator, such as the time points to be evaluated at, used to create the test data set. If evaluating the log-likelihood function requires more than just the simulated data, arguments for the data output of data_generator should be passed in via testing_generator_args.
  • t: a vector of time points to compute predictions and evaluate coverage at, which are the same as the time points used to create the test data set.
  • model: a LikelihoodModel containing model information.
  • N: a positive number of coverage simulations.
  • num_points_to_sample: integer number of points to sample (for UniformRandomSamples and LatinHypercubeSamples sample types). For the UniformGridSamples sample type, if integer it is the number of points to grid over in each parameter dimension. If it is a vector of integers each index of the vector is the number of points to grid over in the corresponding parameter dimension. For example, [1,2] would mean a single point in dimension 1 and two points in dimension 2.
  • θtrue: a vector of true parameter values of the model for simulating data with.
  • θindices: a vector of vectors of parameter indexes for the combinations of interest parameters to sample points from.
  • θinitialguess: a vector containing the initial guess for the values of each parameter. Used to find the MLE point in each iteration of the simulation. Default is θtrue.
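
A reference_set_generator with the three-argument form described above could be sketched as below. The logistic model, the Gaussian error model, the field names `t`, `σ` and `num_draws`, and the use of Monte Carlo empirical quantiles are all assumptions of this example; with a known error distribution the quantiles would normally be computed exactly, for instance with Distributions.jl.

```julia
using Random, Statistics

# Logistic growth solution; θ = [λ, K, C0] is an assumed parameterisation.
logistic(t, θ) = θ[2] * θ[3] / ((θ[2] - θ[3]) * exp(-θ[1] * t) + θ[3])

# Three-argument generator returning (lq, uq), the lower and upper quantiles
# of the data distribution at each time point.
function reference_set_generator(θtrue, generator_args::NamedTuple, confidence_level::Float64)
    y_true = logistic.(generator_args.t, Ref(θtrue))
    α = 1 - confidence_level
    # Monte Carlo approximation: simulate many realisations per time point
    # (rows are time points, columns are draws) and take empirical quantiles.
    draws = y_true .+ generator_args.σ .* randn(length(y_true), generator_args.num_draws)
    lq = [quantile(row, α / 2) for row in eachrow(draws)]
    uq = [quantile(row, 1 - α / 2) for row in eachrow(draws)]
    return (lq, uq)
end

Random.seed!(42)
θtrue = [0.01, 100.0, 10.0]
testing_generator_args = (t=collect(0.0:100.0:1000.0), σ=10.0, num_draws=20_000)
lq, uq = reference_set_generator(θtrue, testing_generator_args, 0.95)
```

For Gaussian noise with standard deviation σ, the width uq - lq at each time point should be close to 2 × 1.96 × σ, which gives a quick sanity check on the generator.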

Keyword Arguments

  • confidence_level: a number ∈ (0.0, 1.0) for the confidence level to find samples within and evaluate coverage at. Default is 0.95 (95%).
  • region: a Real number ∈ [0, 1] specifying the proportion of the density of the error model from which to evaluate the highest density region. Default is 0.95.
  • sample_type: the sampling method used to sample parameter space. Available sample types are UniformGridSamples, UniformRandomSamples and LatinHypercubeSamples. Default is LatinHypercubeSamples() (LatinHypercubeSamples).
  • lb: optional vector of lower bounds on parameters. Use to specify parameter lower bounds to sample over that are different than those contained in model.core. Default is Float64[] (use lower bounds from model.core).
  • ub: optional vector of upper bounds on parameters. Use to specify parameter upper bounds to sample over that are different than those contained in model.core. Default is Float64[] (use upper bounds from model.core).
  • θlb_nuisance: a vector of lower bounds on nuisance parameters, require θlb_nuisance .≤ model.core.θmle. Default is model.core.θlb.
  • θub_nuisance: a vector of upper bounds on nuisance parameters, require θub_nuisance .≥ model.core.θmle. Default is model.core.θub.
  • coverage_estimate_confidence_level: a number ∈ (0.0, 1.0) for the level of a confidence interval of the estimated coverage. Default is 0.95 (95%).
  • simultaneous_alternate_proportion: a number ∈ (0.0, 1.0) for the alternate 'simultaneous' coverage statistic, testing whether at least this proportion of prediction realisations are covered. Recommended to be equal to region. Default is 0.95 (95%).
  • optimizationsettings: an OptimizationSettings containing the optimisation settings used to find optimal values of nuisance parameters for a given interest parameter value. Default is missing (will use default_OptimizationSettings(); see default_OptimizationSettings).
  • show_progress: boolean variable specifying whether to display progress bars on the percentage of simulation iterations completed and estimated time of completion. Default is model.show_progress.
  • distributed_over_parameters: boolean variable specifying whether to distribute the workload of the simulation across simulation iterations (false) or across the individual confidence profile calculations within each iteration (true). Default is false.
  • manual_GC_calls: boolean variable specifying whether to manually call garbage collection, GC.gc(), after every 10 iterations (distributed_over_parameters=true) or after every iteration on that worker (distributed_over_parameters=false). May be important to correctly free up memory for coverage simulations that use distributed or threaded workloads for Julia versions prior to v1.10.0. Default is false.

Details

This simulated coverage check is used to estimate the performance of propagating dimensional samples into prediction realisation space. The simulation uses Distributed.jl to parallelise the workload.

The uncertainty in estimates of the prediction realisation coverage under the simulated model will decrease as the number of simulations, N, is increased. Confidence intervals for the coverage estimate are provided to quantify this uncertainty. The confidence interval for the estimated coverage is a Clopper-Pearson interval on a binomial test generated using HypothesisTests.jl.

Support for `dof`

Setting the degrees of freedom of a sampled parameter confidence set to a value other than the interest parameter dimensionality is not currently supported (unlike for univariate and bivariate profiles, where it is supported). Support may be added in the future.

Recommended setting for distributed_over_parameters
  • If the number of processes available to use is significantly greater than the number of model parameters or only a few sets of model parameters are being checked for coverage, false is recommended.
  • If system memory or model size in system memory is a concern, or the number of processes available is similar to or less than the number of sets of model parameters being checked, true will likely be more appropriate.
  • When set to false, a separate LikelihoodModel struct will be used by each process, as opposed to only one when set to true, which could cause a memory issue for larger models.
LikelihoodBasedProfileWiseAnalysis.check_univariate_prediction_realisations_coverage - Function
check_univariate_prediction_realisations_coverage(data_generator::Function, 
    reference_set_generator::Function,
    training_generator_args::Union{Tuple, NamedTuple},
    testing_generator_args::Union{Tuple, NamedTuple},
    t::AbstractVector,
    model::LikelihoodModel, 
    N::Int, 
    θtrue::AbstractVector{<:Real}, 
    θs::AbstractVector{<:Int64},
    θinitialguess::AbstractVector{<:Real}=θtrue; 
    <keyword arguments>)

Performs a simulation to estimate the prediction reference set and realisation coverage of univariate confidence profiles for parameters in θs given a model by:

  1. Constructing the confidence_level reference set for predictions from the fixed true parameter values, θtrue.
  2. Repeatedly drawing new observed training data using data_generator and training_generator_args for fixed true parameter values, θtrue, and fixed true prediction value.
  3. Fitting the model using training data.
  4. Computing univariate confidence intervals for the parameters in θs using training data.
  5. Sampling points along the profile within the confidence intervals.
  6. Evaluating predictions from the points in the profile and finding the prediction extrema (reference tolerance sets).
  7. Drawing new observed testing data using data_generator and testing_generator_args for fixed true parameter values, θtrue, and fixed true prediction value.
  8. Checking whether the prediction extrema (reference tolerance set) contain the prediction reference set from Step 1, in a pointwise and simultaneous fashion.
  9. Checking whether the prediction extrema contain the observed testing data, in a pointwise and simultaneous fashion.
  10. The estimated simultaneous coverage of the reference set and the prediction realisations (observed testing data) is returned with a default 95% confidence interval, alongside pointwise coverage, within a DataFrame. An alternate 'simultaneous' statistic for prediction realisation coverage is also provided: rather than testing whether 100% of prediction realisations are covered, it tests whether at least a proportion simultaneous_alternate_proportion of prediction realisations is covered.

The coverage from combining the prediction reference sets of multiple confidence profiles, choosing 1 to length(θs) random combinations of θs, is also evaluated (i.e. the final result is the union over all profiles in θs).
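
The checks in Steps 8 to 10 and the union over profiles can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's internal code: the function names union_extrema, reference_set_coverage and realisation_coverage are hypothetical, and the prediction extrema are assumed to be stored as one lower and one upper vector per profile, evaluated at the same time points as the test data.

```julia
using Statistics

# Union of prediction sets across profiles: elementwise minimum of the lower
# extrema and elementwise maximum of the upper extrema.
function union_extrema(lowers::Vector{Vector{Float64}}, uppers::Vector{Vector{Float64}})
    lower = reduce((a, b) -> min.(a, b), lowers)
    upper = reduce((a, b) -> max.(a, b), uppers)
    return lower, upper
end

# Step 8: do the extrema contain the reference set (lq, uq) at each time point?
function reference_set_coverage(lower, upper, lq, uq)
    pointwise = (lower .<= lq) .& (uq .<= upper)
    return (pointwise=pointwise, simultaneous=all(pointwise))
end

# Step 9 and the alternate statistic of Step 10 for one set of testing data.
function realisation_coverage(lower, upper, y_test; simultaneous_alternate_proportion=0.95)
    pointwise = (lower .<= y_test) .& (y_test .<= upper)
    return (pointwise=pointwise,
            simultaneous=all(pointwise),
            alternate=mean(pointwise) >= simultaneous_alternate_proportion)
end

# Two hypothetical profiles evaluated at three time points.
lowers = [[1.0, 2.0, 3.0], [0.5, 2.5, 3.5]]
uppers = [[2.0, 3.0, 4.0], [1.5, 3.5, 4.2]]
lower, upper = union_extrema(lowers, uppers)

y_test = [1.0, 3.4, 4.5]   # hypothetical testing data
result = realisation_coverage(lower, upper, y_test)
println(result)
```

Across N simulations, the estimated simultaneous coverage is then the fraction of iterations in which the simultaneous flag is true.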

Arguments

  • data_generator: a function with two arguments which generates data for fixed time points and true model parameters corresponding to the log-likelihood function contained in model. The two arguments must be the vector of true model parameters, θtrue, and a Tuple or NamedTuple, generator_args. When used with training_generator_args, it outputs a data Tuple or NamedTuple that corresponds to the log-likelihood function contained in model. When used with testing_generator_args, it outputs an array containing the observed data to use as the test data set.
  • reference_set_generator: a function with three arguments which generates the confidence_level data reference set for fixed time points and true model parameters corresponding to the log-likelihood function contained in model. The three arguments must be the vector of true model parameters, θtrue, a Tuple or NamedTuple, generator_args, and a number ∈ (0.0, 1.0) for the confidence level at which to evaluate the reference set. When used with testing_generator_args, it outputs a tuple of two arrays, (lq, uq), which contain the lower and upper quantiles of the reference set.
  • training_generator_args: a Tuple or NamedTuple containing any additional information required by both the log-likelihood function and data_generator, such as the time points to be evaluated at, used to create the training set of data. If evaluating the log-likelihood function requires more than just the simulated data, arguments for the data output of data_generator should be passed in via training_generator_args.
  • testing_generator_args: a Tuple or NamedTuple containing any additional information required by both the log-likelihood function and data_generator, such as the time points to be evaluated at, used to create the test data set. If evaluating the log-likelihood function requires more than just the simulated data, arguments for the data output of data_generator should be passed in via testing_generator_args.
  • t: a vector of time points to compute predictions and evaluate coverage at, which are the same as the time points used to create the test data set.
  • model: a LikelihoodModel containing model information.
  • N: a positive number of coverage simulations.
  • θtrue: a vector of true parameter values of the model for simulating data with.
  • θs: a vector of parameters to profile, as a vector of model parameter indexes.
  • θinitialguess: a vector containing the initial guess for the values of each parameter. Used to find the MLE point in each iteration of the simulation. Default is θtrue.
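
To make the data_generator contract above concrete, here is a small sketch of a generator that returns a NamedTuple for training data and a plain array for testing data. Everything specific in it is an assumption for illustration: the logistic growth model with θ = [λ, K, C0], the additive Gaussian noise, and the generator_args fields t, σ and is_test_set are hypothetical choices and not required by the package.

```julia
using Random

# Hypothetical logistic growth model with θ = [λ, K, C0].
logistic(t, θ) = θ[2] * θ[3] / ((θ[2] - θ[3]) * exp(-θ[1] * t) + θ[3])

# Two-argument generator. The fields t, σ and is_test_set are assumptions of this sketch.
function data_generator(θtrue, generator_args::NamedTuple)
    y_true = logistic.(generator_args.t, Ref(θtrue))
    y_obs = y_true .+ generator_args.σ .* randn(length(y_true))
    # Testing data: return only the array of observations.
    generator_args.is_test_set && return y_obs
    # Training data: return the observations together with everything else
    # the log-likelihood function needs.
    return merge((y_obs=y_obs,), generator_args)
end

θtrue = [0.01, 100.0, 10.0]
training_generator_args = (t=collect(0.0:100.0:1000.0), σ=10.0, is_test_set=false)
testing_generator_args = (t=collect(0.0:50.0:1000.0), σ=10.0, is_test_set=true)

training_data = data_generator(θtrue, training_generator_args)
testing_data = data_generator(θtrue, testing_generator_args)
```

Using a flag inside generator_args is only one way to let a single function serve both roles; the testing time points here would also be the ones passed as t.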

Keyword Arguments

  • num_points_in_interval: an integer number of points to optionally evaluate within the confidence interval for each interest parameter using get_points_in_intervals!. Points are linearly spaced in the interval. Useful for predictions from univariate profiles. Default is 0.
  • confidence_level: a number ∈ (0.0, 1.0) for the confidence level to evaluate the confidence interval coverage at. Default is 0.95 (95%).
  • dof: an integer ∈ [1, model.core.num_pars] for the degrees of freedom used to define the asymptotic threshold (LikelihoodBasedProfileWiseAnalysis.get_target_loglikelihood) which defines the extremities of the univariate profile, i.e. the confidence interval. For parameter confidence intervals that are considered individually, it should be set to 1. For intervals that are considered simultaneously, it should be set to the number of intervals that are being calculated, i.e. model.core.num_pars when we wish the confidence interval for every parameter to hold simultaneously. Default is 1. Setting it to model.core.num_pars should be reasonable when making predictions for well-identified models with <10 parameters. Note: values other than 1 and model.core.num_pars may not have a clear statistical interpretation.
  • region: a Real number ∈ [0, 1] specifying the proportion of the density of the error model from which to evaluate the highest density region. Default is 0.95.
  • profile_type: whether to use the true log-likelihood function or an ellipse approximation of the log-likelihood function centred at the MLE (with optional use of parameter bounds). Available profile types are LogLikelihood, EllipseApprox and EllipseApproxAnalytical. Default is LogLikelihood() (LogLikelihood).
  • θlb_nuisance: a vector of lower bounds on nuisance parameters; requires θlb_nuisance .≤ model.core.θmle. Default is model.core.θlb.
  • θub_nuisance: a vector of upper bounds on nuisance parameters; requires θub_nuisance .≥ model.core.θmle. Default is model.core.θub.
  • coverage_estimate_confidence_level: a number ∈ (0.0, 1.0) for the level of a confidence interval of the estimated coverage. Default is 0.95 (95%).
  • simultaneous_alternate_proportion: a number ∈ (0.0, 1.0) for the alternate 'simultaneous' coverage statistic, testing whether at least this proportion of prediction realisations are covered. Recommended to be equal to region. Default is 0.95 (95%).
  • optimizationsettings: an OptimizationSettings containing the optimisation settings used to find optimal values of nuisance parameters for a given interest parameter value. Default is missing (will use default_OptimizationSettings(); see default_OptimizationSettings).
  • show_progress: boolean variable specifying whether to display progress bars on the percentage of simulation iterations completed and estimated time of completion. Default is model.show_progress.
  • distributed_over_parameters: boolean variable specifying whether to distribute the workload of the simulation across simulation iterations (false) or across the individual confidence interval calculations within each iteration (true). Default is false.
  • manual_GC_calls: boolean variable specifying whether to manually call garbage collection, GC.gc(), after every 10 iterations (distributed_over_parameters=true) or after every iteration on that worker (distributed_over_parameters=false). May be important to correctly free up memory for coverage simulations that use distributed or threaded workloads for Julia versions prior to v1.10.0. Default is false.

Details

This simulated coverage check is used to estimate the performance of propagating univariate parameter confidence intervals into prediction realisation space. The simulation uses Distributed.jl to parallelise the workload.

The uncertainty in estimates of the prediction realisation coverage under the simulated model will decrease as the number of simulations, N, is increased. Confidence intervals for the coverage estimate are provided to quantify this uncertainty. The confidence interval for the estimated coverage is a Clopper-Pearson interval on a binomial test generated using HypothesisTests.jl.

Recommended setting for distributed_over_parameters
  • If the number of processes available to use is significantly greater than the number of model parameters or only a few model parameters are being checked for coverage, false is recommended.
  • If system memory or model size in system memory is a concern, or the number of processes available is similar to or less than the number of model parameters being checked, true will likely be more appropriate.
  • When set to false, a separate LikelihoodModel struct will be used by each process, as opposed to only one when set to true, which could cause a memory issue for larger models.
Not intended for use on bimodal univariate profile likelihoods

The current implementation only considers the two extremes of the profile log-likelihood and whether the truth lies between these two points. If the profile likelihood function is bimodal, the method may have found only one of the correct confidence intervals (the estimated coverage will be correct, but lower than expected), or one extremum from each of two distinct intervals (the estimated coverage may be incorrect, and will be either larger than expected or much lower than expected).

LikelihoodBasedProfileWiseAnalysis.check_bivariate_prediction_realisations_coverage - Function
check_bivariate_prediction_realisations_coverage(data_generator::Function, 
    reference_set_generator::Function,
    training_generator_args::Union{Tuple, NamedTuple},
    testing_generator_args::Union{Tuple, NamedTuple},
    t::AbstractVector,
    model::LikelihoodModel, 
    N::Int, 
    num_points::Union{Int, Vector{<:Int}},
    θtrue::AbstractVector{<:Real}, 
    θcombinations::Union{Vector{Vector{Int}}, Vector{Tuple{Int,Int}}},
    θinitialguess::AbstractVector{<:Real}=θtrue; 
    <keyword arguments>)

Performs a simulation to estimate the prediction reference set and realisation coverage of bivariate confidence profiles for parameters in θcombinations given a model by:

  1. Constructing the confidence_level reference set for predictions from the fixed true parameter values, θtrue.
  2. Repeatedly drawing new observed training data using data_generator and training_generator_args for fixed true parameter values, θtrue, and fixed true prediction value.
  3. Fitting the model using training data.
  4. Computing bivariate confidence boundaries for the parameter pairs in θcombinations using training data.
  5. Sampling points within the polygon hull of the confidence boundaries.
  6. Evaluating predictions from the points in the profile and finding the prediction extrema (reference tolerance sets).
  7. Drawing new observed testing data using data_generator and testing_generator_args for fixed true parameter values, θtrue, and fixed true prediction value.
  8. Checking whether the prediction extrema (reference tolerance set) contain the prediction reference set from Step 1, in a pointwise and simultaneous fashion.
  9. Checking whether the prediction extrema contain the observed testing data, in a pointwise and simultaneous fashion.
  10. The estimated simultaneous coverage of the reference set and the prediction realisations (observed testing data) is returned with a default 95% confidence interval, alongside pointwise coverage, within a DataFrame. An alternate 'simultaneous' statistic for prediction realisation coverage is also provided: rather than testing whether 100% of prediction realisations are covered, it tests whether at least a proportion simultaneous_alternate_proportion of prediction realisations is covered.

The prediction coverage from combining the prediction reference sets of multiple confidence profiles, choosing 1 to length(θcombinations) random combinations of θcombinations, is also evaluated (i.e. the final result is the union over all profiles in θcombinations).
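
Step 5, sampling points within the polygon hull of a confidence boundary, can be illustrated with a simple rejection sampler. This is a sketch under stated assumptions and not how the package's UniformRandomSamples or LatinHypercubeSamples are implemented: the function names inpolygon and sample_in_polygon are hypothetical, the boundary is assumed to be an ordered list of vertices, and points are drawn uniformly from the bounding box and kept only if a ray-casting test places them inside the polygon.

```julia
using Random

# Ray-casting point-in-polygon test for a polygon with ordered vertices (xs[i], ys[i]).
function inpolygon(x::Real, y::Real, xs::AbstractVector, ys::AbstractVector)
    inside = false
    j = length(xs)
    for i in eachindex(xs)
        # Does the edge from vertex j to vertex i cross the horizontal ray from (x, y)?
        if (ys[i] > y) != (ys[j] > y) &&
           x < (xs[j] - xs[i]) * (y - ys[i]) / (ys[j] - ys[i]) + xs[i]
            inside = !inside
        end
        j = i
    end
    return inside
end

# Draw n points uniformly from the polygon by rejection from its bounding box.
function sample_in_polygon(rng::AbstractRNG, xs, ys, n::Int)
    xmin, xmax = extrema(xs)
    ymin, ymax = extrema(ys)
    points = Matrix{Float64}(undef, 2, n)
    accepted = 0
    while accepted < n
        x = xmin + rand(rng) * (xmax - xmin)
        y = ymin + rand(rng) * (ymax - ymin)
        if inpolygon(x, y, xs, ys)
            accepted += 1
            points[1, accepted] = x
            points[2, accepted] = y
        end
    end
    return points
end

# Hypothetical triangular "boundary" in the space of two interest parameters.
xs = [0.0, 1.0, 0.0]
ys = [0.0, 0.0, 1.0]
points = sample_in_polygon(MersenneTwister(1), xs, ys, 50)
```

Each column of points would then be passed through the model to obtain predictions, whose extrema form the prediction set in Step 6.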

Arguments

  • data_generator: a function with two arguments which generates data for fixed time points and true model parameters corresponding to the log-likelihood function contained in model. The two arguments must be the vector of true model parameters, θtrue, and a Tuple or NamedTuple, generator_args. When used with training_generator_args, it outputs a data Tuple or NamedTuple that corresponds to the log-likelihood function contained in model. When used with testing_generator_args, it outputs an array containing the observed data to use as the test data set.
  • reference_set_generator: a function with three arguments which generates the confidence_level data reference set for fixed time points and true model parameters corresponding to the log-likelihood function contained in model. The three arguments must be the vector of true model parameters, θtrue, a Tuple or NamedTuple, generator_args, and a number ∈ (0.0, 1.0) for the confidence level at which to evaluate the reference set. When used with testing_generator_args, it outputs a tuple of two arrays, (lq, uq), which contain the lower and upper quantiles of the reference set.
  • training_generator_args: a Tuple or NamedTuple containing any additional information required by both the log-likelihood function and data_generator, such as the time points to be evaluated at, used to create the training set of data. If evaluating the log-likelihood function requires more than just the simulated data, arguments for the data output of data_generator should be passed in via training_generator_args.
  • testing_generator_args: a Tuple or NamedTuple containing any additional information required by both the log-likelihood function and data_generator, such as the time points to be evaluated at, used to create the test data set. If evaluating the log-likelihood function requires more than just the simulated data, arguments for the data output of data_generator should be passed in via testing_generator_args.
  • t: a vector of time points to compute predictions and evaluate coverage at, which are the same as the time points used to create the test data set.
  • model: a LikelihoodModel containing model information.
  • N: a positive number of coverage simulations.
  • num_points: a positive number of points to find on the boundary at the specified confidence level using a single method, or a vector of positive numbers of boundary points to find for each method in method (if method is a vector of AbstractBivariateMethod). Set to at least 3 within the function, as some methods need at least three points to work.
  • θtrue: a vector of true parameter values of the model for simulating data with.
  • θcombinations: a vector of pairs of parameters to profile, as a vector of vectors of model parameter indexes.
  • θinitialguess: a vector containing the initial guess for the values of each parameter. Used to find the MLE point in each iteration of the simulation. Default is θtrue.

Keyword Arguments

  • num_internal_points: an integer number of points to optionally evaluate within a polygon hull approximation of a bivariate boundary for each interest parameter pair using sample_bivariate_internal_points!. Default is 0.
  • hullmethod: method of type AbstractBivariateHullMethod used to create a 2D polygon hull that approximates the bivariate boundary from a set of boundary points and internal points (method dependent). For available methods see bivariate_hull_methods(). Default is MPPHullMethod() (MPPHullMethod).
  • sample_type: either a UniformRandomSamples or LatinHypercubeSamples struct for how to sample internal points from the polygon hull. UniformRandomSamples are homogeneously sampled from the polygon and LatinHypercubeSamples use the intersection of a heuristically optimised Latin Hypercube sampling plan with the polygon. Default is LatinHypercubeSamples() (LatinHypercubeSamples).
  • confidence_level: a number ∈ (0.0, 1.0) for the confidence level on which to find the profile_type boundary. Default is 0.95 (95%).
  • dof: an integer ∈ [2, model.core.num_pars] for the degrees of freedom used to define the asymptotic threshold (LikelihoodBasedProfileWiseAnalysis.get_target_loglikelihood) which defines the boundary of the bivariate profile. For bivariate profiles that are considered individually, it should be set to 2. For profiles that are considered simultaneously, it should be set to model.core.num_pars. Default is 2. Setting it to model.core.num_pars should be reasonable when making predictions for well-identified models with <10 parameters. Note: values other than 2 and model.core.num_pars may not have a clear statistical interpretation.
  • region: a Real number ∈ [0, 1] specifying the proportion of the density of the error model from which to evaluate the highest density region. Default is 0.95.
  • profile_type: whether to use the true log-likelihood function or an ellipse approximation of the log-likelihood function centred at the MLE (with optional use of parameter bounds). Available profile types are LogLikelihood, EllipseApprox and EllipseApproxAnalytical. Default is LogLikelihood() (LogLikelihood).
  • θlb_nuisance: a vector of lower bounds on nuisance parameters; requires θlb_nuisance .≤ model.core.θmle. Default is model.core.θlb.
  • θub_nuisance: a vector of upper bounds on nuisance parameters; requires θub_nuisance .≥ model.core.θmle. Default is model.core.θub.
  • coverage_estimate_confidence_level: a number ∈ (0.0, 1.0) for the level of a confidence interval of the estimated coverage. Default is 0.95 (95%).
  • simultaneous_alternate_proportion: a number ∈ (0.0, 1.0) for the alternate 'simultaneous' coverage statistic, testing whether at least this proportion of prediction realisations are covered. Recommended to be equal to region. Default is 0.95 (95%).
  • optimizationsettings: an OptimizationSettings containing the optimisation settings used to find optimal values of nuisance parameters for a given interest parameter value. Default is missing (will use default_OptimizationSettings(); see default_OptimizationSettings).
  • show_progress: boolean variable specifying whether to display progress bars on the percentage of simulation iterations completed and estimated time of completion. Default is model.show_progress.
  • distributed_over_parameters: boolean variable specifying whether to distribute the workload of the simulation across simulation iterations (false) or across the individual confidence interval calculations within each iteration (true). Default is false.
  • manual_GC_calls: boolean variable specifying whether to manually call garbage collection, GC.gc(), after every 10 iterations (distributed_over_parameters=true) or after every iteration on that worker (distributed_over_parameters=false). May be important to correctly free up memory for coverage simulations that use distributed or threaded workloads for Julia versions prior to v1.10.0. Default is false.
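
The role of dof in the asymptotic threshold can be illustrated for the bivariate default, dof = 2. The sketch assumes the standard chi-squared calibration of the likelihood ratio, in which the boundary lies where the log-likelihood drops below its maximum by half the confidence_level quantile of a chi-squared distribution with dof degrees of freedom; whether get_target_loglikelihood uses exactly this form is an assumption here. The function names are hypothetical. For dof = 2 the quantile has a closed form, so no extra packages are needed.

```julia
using Random

# Closed-form quantile of a chi-squared distribution with 2 degrees of freedom.
chisq2_quantile(level) = -2 * log(1 - level)

# Assumed threshold relative to the maximum log-likelihood for dof = 2.
# It simplifies to log(1 - level).
loglikelihood_threshold_dof2(level) = -chisq2_quantile(level) / 2

level = 0.95
q = chisq2_quantile(level)
threshold = loglikelihood_threshold_dof2(level)

# Monte Carlo check of the closed form: the sum of two squared standard normal
# draws is chi-squared with 2 degrees of freedom.
rng = MersenneTwister(1)
draws = [sum(abs2, randn(rng, 2)) for _ in 1:200_000]
empirical_level = count(<=(q), draws) / length(draws)

println("threshold = ", threshold, ", empirical level = ", empirical_level)
```

A larger dof, such as model.core.num_pars, gives a larger quantile and therefore a lower threshold, which widens the boundary.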

Details

This simulated coverage check is used to estimate the performance of propagating bivariate parameter confidence boundaries into prediction realisation space. The simulation uses Distributed.jl to parallelise the workload.

The uncertainty in estimates of the prediction realisation coverage under the simulated model will decrease as the number of simulations, N, is increased. Confidence intervals for the coverage estimate are provided to quantify this uncertainty. The confidence interval for the estimated coverage is a Clopper-Pearson interval on a binomial test generated using HypothesisTests.jl.

Recommended setting for distributed_over_parameters
  • If the number of processes available to use is significantly greater than the number of model parameters or only a few pairs of model parameters are being checked for coverage, false is recommended.
  • If system memory or model size in system memory is a concern, or the number of processes available is similar to or less than the number of pairs of model parameters being checked, true will likely be more appropriate.
  • When set to false, a separate LikelihoodModel struct will be used by each process, as opposed to only one when set to true, which could cause a memory issue for larger models.
May not work correctly on bimodal confidence boundaries

The current implementation constructs a single polygon with minimum polygon perimeter from the set of boundary points as the confidence boundary. If there are multiple distinct boundaries represented, then there will be edges connecting the distinct boundaries which the true parameter might be inside (but not inside either of the distinct boundaries).
