Parameter Confidence Intervals

Usage on several models can be seen in the examples section, such as for the Logistic Model.

LikelihoodBasedProfileWiseAnalysis.check_univariate_parameter_coverage - Function
check_univariate_parameter_coverage(data_generator::Function, 
    generator_args::Union{Tuple, NamedTuple},
    model::LikelihoodModel, 
    N::Int, 
    θtrue::AbstractVector{<:Real}, 
    θs::AbstractVector{<:Int64},
    θinitialguess::AbstractVector{<:Real}=θtrue; 
    <keyword arguments>)

Performs a simulation to estimate the coverage of univariate confidence intervals for parameters in θs given a model by:

  1. Repeatedly drawing new observed data using data_generator for fixed true parameter values, θtrue.
  2. Fitting the model and univariate confidence intervals.
  3. Checking whether the confidence interval for each of the parameters of interest contains the true parameter value in θtrue. The estimated coverage is returned with a default 95% confidence interval within a DataFrame.
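
The three steps above can be sketched without the package for a toy problem. The following is a minimal illustration only, assuming normally distributed data with a known standard deviation, for which the likelihood-ratio 95% interval for the mean has a closed form. The names generate_data, interval_for_mean and simulate_coverage are hypothetical and are not part of LikelihoodBasedProfileWiseAnalysis.

```julia
using Random, Statistics

# Hypothetical data generator: n noisy observations around the true mean μ.
generate_data(rng, μ, σ, n) = μ .+ σ .* randn(rng, n)

# 95% interval for the mean with known σ. For this toy model the
# likelihood-ratio interval coincides with the usual z-interval.
function interval_for_mean(y, σ; z=1.959963984540054)
    halfwidth = z * σ / sqrt(length(y))
    return (mean(y) - halfwidth, mean(y) + halfwidth)
end

# Steps 1 to 3: draw data, compute the interval, check whether it contains the truth.
function simulate_coverage(rng, μtrue, σ, n, N)
    hits = 0
    for _ in 1:N
        y = generate_data(rng, μtrue, σ, n)   # step 1
        lo, hi = interval_for_mean(y, σ)      # step 2
        hits += (lo <= μtrue <= hi)           # step 3
    end
    return hits / N
end

coverage = simulate_coverage(MersenneTwister(1), 2.0, 1.0, 25, 2000)
println("estimated coverage: ", coverage)
```

The printed proportion should be close to 0.95; check_univariate_parameter_coverage performs the analogous loop using the model's own log-likelihood function and profile-based intervals.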

Arguments

  • data_generator: a function with two arguments which generates data for fixed time points and true model parameters corresponding to the log-likelihood function contained in model. The two arguments must be the vector of true model parameters, θtrue, and a Tuple or NamedTuple, generator_args. Outputs a data Tuple or NamedTuple that corresponds to the log-likelihood function contained in model.
  • generator_args: a Tuple or NamedTuple containing any additional information required by both the log-likelihood function and data_generator, such as the time points to be evaluated at. If evaluating the log-likelihood function requires more than just the simulated data, arguments for the data output of data_generator should be passed in via generator_args.
  • model: a LikelihoodModel containing model information, saved profiles and predictions.
  • N: a positive integer giving the number of coverage simulations.
  • θtrue: a vector of true parameter values of the model, used to simulate data.
  • θs: a vector of parameters to profile, as a vector of model parameter indexes.
  • θinitialguess: a vector containing the initial guess for the values of each parameter. Used to find the MLE point in each iteration of the simulation. Default is θtrue.
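
As an illustration of the data_generator and generator_args arguments described above, the following is a minimal sketch assuming a logistic growth mean function with additive Gaussian noise. The function logistic, the field names t, σ and y_obs, and the parameter values are assumptions chosen for this example; the fields actually required depend on the log-likelihood function stored in model.

```julia
using Random

# Hypothetical mean function: logistic growth with θ = [λ, K, C0].
logistic(t, θ) = θ[2] * θ[3] / ((θ[2] - θ[3]) * exp(-θ[1] * t) + θ[3])

# Two arguments, as required: the true parameters and a NamedTuple of extras.
function data_generator(θtrue, generator_args::NamedTuple)
    y_true = logistic.(generator_args.t, Ref(θtrue))
    y_obs = y_true .+ generator_args.σ .* randn(length(y_true))
    # Return the simulated data together with everything else the
    # log-likelihood function needs, as a single NamedTuple.
    return merge((y_obs=y_obs,), generator_args)
end

generator_args = (t=collect(0.0:100.0:1000.0), σ=10.0)
data = data_generator([0.01, 100.0, 10.0], generator_args)
```

Merging generator_args into the output keeps the time points and noise level available to the log-likelihood function on every simulated data set.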

Keyword Arguments

  • confidence_level: a number ∈ (0.0, 1.0) for the confidence level to evaluate the confidence interval coverage at. Default is 0.95 (95%).
  • profile_type: whether to use the true log-likelihood function or an ellipse approximation of the log-likelihood function centred at the MLE (with optional use of parameter bounds). Available profile types are LogLikelihood, EllipseApprox and EllipseApproxAnalytical. Default is LogLikelihood() (LogLikelihood).
  • θlb_nuisance: a vector of lower bounds on nuisance parameters; requires θlb_nuisance .≤ model.core.θmle. Default is model.core.θlb.
  • θub_nuisance: a vector of upper bounds on nuisance parameters; requires θub_nuisance .≥ model.core.θmle. Default is model.core.θub.
  • coverage_estimate_confidence_level: a number ∈ (0.0, 1.0) for the level of a confidence interval of the estimated coverage. Default is 0.95 (95%).
  • optimizationsettings: an OptimizationSettings containing the optimisation settings used to find optimal values of nuisance parameters for a given interest parameter value. Default is missing (will use default_OptimizationSettings(); see default_OptimizationSettings).
  • show_progress: boolean variable specifying whether to display progress bars on the percentage of simulation iterations completed and estimated time of completion. Default is model.show_progress.
  • distributed_over_parameters: boolean variable specifying whether to distribute the workload of the simulation across simulation iterations (false) or across the individual confidence interval calculations within each iteration (true). Default is false.

Details

This simulated coverage check is used to estimate the performance of parameter confidence intervals. The simulation uses Distributed.jl to parallelise the workload.

For a 95% confidence interval of an interest parameter θi, it is expected that, under repeated experiments from an underlying true model (data generation), 95% of the intervals constructed for θi using the method in univariate_confidenceintervals! would contain the true value of θi. In the simulation, where the values of the true parameters, θtrue, are known, this amounts to checking whether the confidence interval for θi contains the value θtrue[θi].

The uncertainty in estimates of the coverage under the simulated model will decrease as the number of simulations, N, is increased. Confidence intervals for the coverage estimate are provided to quantify this uncertainty. The confidence interval for the estimated coverage is a Clopper-Pearson interval on a binomial test generated using HypothesisTests.jl.
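
A Clopper-Pearson interval of this kind can be computed directly with HypothesisTests.jl. The sketch below uses illustrative counts (N and hits are made up for the example) and assumes a recent HypothesisTests.jl release in which confint accepts the level keyword; it is not necessarily the exact call made inside the package.

```julia
using HypothesisTests

N = 1000      # number of coverage simulations (illustrative)
hits = 940    # intervals that contained the true value (illustrative)

# Clopper-Pearson interval for the binomial proportion hits / N.
lo, hi = confint(BinomialTest(hits, N); level=0.95, method=:clopper_pearson)
println("coverage ≈ ", hits / N, ", 95% interval: (", lo, ", ", hi, ")")
```

Increasing N narrows this interval, which is the sense in which more simulations reduce the uncertainty of the coverage estimate.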

Simultaneous confidence intervals

Calculating the coverage of simultaneous confidence intervals (i.e. for dof ≠ 1) is not currently supported.

Recommended setting for distributed_over_parameters

  • If the number of processes available is significantly greater than the number of model parameters, or only a few model parameters are being checked for coverage, false is recommended.
  • If system memory or model size in system memory is a concern, or the number of processes available is similar to or less than the number of model parameters being checked, true will likely be more appropriate.
  • When set to false, a separate LikelihoodModel struct will be used by each process, as opposed to only one when set to true, which could cause a memory issue for larger models.

Not intended for use on bimodal univariate profile likelihoods

The current implementation only considers the two extremes of the log-likelihood and whether the truth lies between these two points. If the profile likelihood function is bimodal, it is possible that the method has found only one set of the correct confidence intervals (estimated coverage will be correct, but less than expected) or has found one extremum on each of two distinct sets (estimated coverage may be incorrect and will be either larger than expected or much lower than expected).
