The Statistical Analysis of Multivariate Failure Time Data
A Marginal Modeling Approach
MONOGRAPHS ON STATISTICS AND APPLIED PROBABILITY
Ross L. Prentice
Shanshan Zhao
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
Preface
3.2.3 Simulation evaluation
3.2.4 Asymptotic distributional results
3.3 Maximum Likelihood and Estimating Equation Approaches
3.4 Nonparametric Assessment of Dependency
3.4.1 Cross ratio and concordance function estimators
3.4.2 Australian twin study illustration
3.4.3 Simulation evaluation
3.5 Additional Estimators and Estimation Perspectives
3.5.1 Additional bivariate survivor function estimators
3.5.2 Estimation perspectives
Bibliography
PREFACE

Over the past several decades, statistical methods for the analysis of univariate failure time data have become well established, with Kaplan–Meier (KM) curves, censored data rank tests, and especially Cox regression as core analytic tools. These tools arise substantially from nonparametric and semiparametric likelihood considerations. The likelihood developments have mostly started with a focus on the survivor function $F$, defined by $F(t) = P(T > t)$, for a random failure time variate $T > 0$; or, in conjunction with a corresponding covariate $p$-vector $z = (z_1, z_2, \ldots, z_p)$, on the survivor function given $z$, defined by $F(t; z) = P(T > t; z)$.

Because of the usual presence of right censoring, the hazard function $\lambda$, defined with absolutely continuous failure times by $\lambda(t; z) = f(t; z)/F(t; z)$, where $f(t; z) = -dF(t; z)/dt$ is the failure time density (given $z$) at follow-up time $t$, plays a key role in the likelihood-based developments. In fact, a focus on hazard rates over time allows considerable modeling flexibility compared to that for $F$ given $z$, since hazard rates can be allowed to depend not just on a set of “baseline” covariates, but also on covariate changes and on the occurrence of events of various types arising during the follow-up of study subjects. As a result, hazard, and more specifically intensity, rate regression analyses are sometimes referred to as event history analyses. The Cox (1972) model stands as the major tool for univariate failure time regression analysis. Under this model $\lambda(t; z)$ is written as a product of a baseline rate $\lambda\{t; z = (0, \ldots, 0)\}$, hereafter $\lambda(t; 0)$, and a hazard ratio factor that is usually expressed in exponential form as $\exp(z\beta)$, with column vector $\beta = (\beta_1, \ldots, \beta_p)'$. The hazard ratio parameter $\beta$ is often the principal target of estimation in univariate failure time analysis, and efficient and reliable procedures are available for its estimation via Cox’s (1975) partial likelihood method. There are other classes of hazard rate regression models, including accelerated failure time models that allow covariates to affect the “speed” at which an individual traverses the failure time axis, and transformation models that combine these and other classes of regression models, but the vast majority of applications in various disciplines and settings focus on the Cox model. This model is quite comprehensive when the time-dependent covariate feature is fully exercised. While it is clearly useful to have more than a single framework for failure time regression analysis, other classes of semiparametric regression models typically do not enjoy the same utility of parameter interpretation, ease of modeling, and computational efficiency as does the Cox model. There are a number of books that provide excellent accounts of many of these univariate failure time analysis developments, including early books by Kalbfleisch and Prentice (1980, 2002), Breslow and Day (1980, 1987), Lawless (1983, 2002), Cox and Oakes (1984), Fleming and Harrington (1991), and Andersen, Borgan, Gill, and Keiding (1993), among several others. These books vary in the extent of their coverage and in their degree of technical formality. Some of them provide a rather thorough account of asymptotic distribution theory via counting process and martingale methods. For example, Andersen et al. (1993) present comprehensive distribution theory using martingale and counting process convergence theory for a fairly broad range of topics, from an event history analysis perspective.
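As a minimal sketch of the partial likelihood estimation of $\beta$ mentioned above, the following Python code (NumPy assumed) fits a Cox model with a single time-independent covariate by Newton-Raphson. It assumes distinct failure times and omits tie handling, time-dependent covariates, and variance estimation; the function names and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np


def cox_partial_likelihood(beta, time, event, z):
    """Log partial likelihood, score and information for a scalar coefficient beta.

    Assumes distinct failure times (no ties); the risk set at an observed
    failure time t_i is {j : time_j >= t_i}.
    """
    loglik, score, info = 0.0, 0.0, 0.0
    for i in np.flatnonzero(event):
        zr = z[time >= time[i]]                 # covariates of subjects still at risk
        w = np.exp(zr * beta)                   # hazard ratio factors exp(z * beta)
        s0, s1, s2 = w.sum(), (w * zr).sum(), (w * zr ** 2).sum()
        loglik += z[i] * beta - np.log(s0)
        score += z[i] - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return loglik, score, info


def fit_cox(time, event, z, tol=1e-10, max_iter=50):
    """Maximize the partial likelihood by Newton-Raphson, starting from beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        _, score, info = cox_partial_likelihood(beta, time, event, z)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical toy data: follow-up times, failure indicators (0 = censored), binary covariate
time = np.array([2.0, 3.0, 5.0, 7.0, 11.0, 13.0])
event = np.array([1, 1, 0, 1, 1, 0])
z = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 0.0])

beta_hat = fit_cox(time, event, z)
print(f"estimated log hazard ratio: {beta_hat:.3f}")
```

The score and information are accumulated over risk sets only, so the baseline rate $\lambda(t; 0)$ never has to be specified; this is the feature that makes the estimation semiparametric.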
The methods just alluded to can be obtained alternatively by plugging empirical hazard rate estimators into a representation for $F$ given $Z$, where $Z$ is a possibly evolving covariate history, or by using a mean parameter estimating equation approach (e.g., Liang & Zeger, 1986) for parameter estimation. The utility of each of these approaches can be considered for the analysis of multivariate failure time data.
A mature theory for the analysis of multivariate failure time data has been slow to develop. Most of the books just mentioned include some account of analysis methods for select types of multivariate failure time data; for example, Kalbfleisch and Prentice (2002) include chapters on correlated failure time methods and on recurrent event methods. However, consensus is still lacking on such basic topics as the preferred means of estimating the joint survivor function $F$, defined by $F(t_1, t_2) = P(T_1 > t_1, T_2 > t_2)$, for a pair of failure time variates $(T_1, T_2)$, even in the homogeneous (non-regression) situation. Also, a compelling semiparametric approach to the conceptualization and modeling of dependency among failure time variates that is suited to right censored data has yet to be established.
In more recent times, specialized books have been devoted to multivariate failure time data analysis. Hougaard (2000) provides a thorough account of multivariate failure time methods proposed up to that point in time, with considerable emphasis on the use of frailty models to induce dependencies, and on the use of transformation models for marginal distributions in conjunction with copula models for dependency. Crowder (2012) provides an update on these same approaches along with a detailed account of parametric methods for multivariate failure time data analysis. Cook and Lawless (2007) provide an excellent account of the more specialized topic of recurrent event data analysis methods.
The frailty approach has also been the subject of specialized books (Duchateau & Janssen, 2010; Wienke, 2011) on data modeling and analysis. Frailty models are usually formulated by multiplying univariate hazard rate models by a positive random effect, or frailty, variate that is shared by failure time outcomes that may be dependent. These methods are well suited to assessing comparative hazard rates for individuals in predefined clusters, which could be an important application goal. Frailty models, on the other hand, are not so well suited to estimating population-averaged regression effects. Regression coefficients in frailty models have an interpretation that is conditional on the random effect for the correlated outcomes, while corresponding marginal regression associations typically have a complex form. Additionally, frailty factors may need to be allowed to vary over the follow-up period of the study to span a broad range of dependencies among failure times, adding further complexity. Nevertheless, much useful information can often be extracted by the careful application of frailty models. Therneau and Grambsch (2000) provide a valuable account of Cox model extensions using frailty models, with emphasis on model testing and the adaptation of existing software.
The copula approach to multivariate failure time data analysis has also received considerable attention. This approach embraces univariate survivor function models, including those incorporating time-dependent covariates, for marginal survivor functions, and brings these together using a parametric copula model to yield a joint survivor function for a multivariate failure time variate given covariates. As such, marginal regression parameters retain the usual population-averaged interpretation that attends corresponding univariate data analyses. The models are typically applied in a two-stage fashion (e.g., Shih & Louis, 1995), so that dependency assumptions imposed through the choice of copula neither bias nor enhance the efficiency of marginal hazard rate regression analyses compared to univariate failure time data analyses. With a choice of copula function that is in good agreement with the data, this analytic approach can lead to useful summary measures of dependence between failure times, as is a major goal in some application settings, such as family studies in genetic epidemiology. As typically applied, however, copula models tend to impose strong assumptions on the nature of dependencies among failure times, and are not convenient for allowing such dependencies to depend on covariates that may be evolving over the study follow-up period(s).
Counting process intensity modeling provides an important approach to the regression analysis of multivariate failure time data, with Andersen and Gill’s (1982) development of distribution theory for semiparametric intensity models of multiplicative (Cox model) form standing out as a major development. When individual study subjects have multiple failure times, of the same or different types, on the same failure time axis, intensity modeling can lead to valuable insights into how failure rates at a given follow-up time depend on the preceding covariate and failure histories for the individual. Often, however, primary interest resides in the dependence of hazard rates on the preceding covariate history, but not on the preceding “counting process” history, requiring a different regression methodology. Also, counting process intensity modeling is not suited to multiple failure types for an individual with failure types each having different potential censoring times, as may occur if failure types have their own outcome ascertainment processes. Furthermore, the nice martingale convergence results that attend multivariate failures on a single failure time axis have not been extended to multiple time axes (see Andersen et al., 1993, for discussion). Also see Aalen, Borgan, and Gjessing (2010) for an account of the various modeling approaches mentioned above, with a major emphasis on counting process intensity modeling, as well as the comprehensive recent book by Cook and Lawless (2018) on multistate models for life history data analysis.
A marginal modeling approach can substantially fill the gap left by the previously mentioned modeling approaches. This approach, championed by Danyu Lin, L. J. Wei, and colleagues, considers semiparametric models for the single failure hazard rates for a multivariate failure time response (e.g., Spiekerman & Lin, 1998; Lin, Wei, Yang, & Ying, 2000). Estimating equations have been developed for hazard rate parameters, and empirical process convergence results lead to corresponding distribution theory. The present authors have recently proposed an extension of these methods using semiparametric models for both single and dual outcome hazard rates for correlated failure time outcomes. This extension leads to novel survivor function estimators for pairs of failure time outcomes that may be accompanied by evolving covariates, and to semiparametric estimates of dependency between pairs of failure time variates. Importantly, these dual outcome hazard rate methods can provide additional insight into the effects of the treatments or exposures under study, beyond single outcome analyses. A primary aim of the presentation here is to provide a summary of these marginal methods. The frailty, copula, and counting process intensity approaches have been well covered in other venues, but we provide sufficient coverage here to offer the reader an introduction to these methods, and to provide some comparisons and contrasts with the marginal modeling approach.
While it is not feasible to provide a thorough account of asymptotic distribution theory for the estimators presented, we sketch the main elements of such theory development, based mainly on empirical process theory, and make some use also of martingale convergence results. While our account does not provide a complete and rigorous development, we hope to give enough detail to provide insight into associated distributional results, and to provide a useful linkage to sources that supply such rigor and completeness. Also, our presentation emphasizes the modeling of observable quantities, with little attention given, for example, to the counterfactual approach to causal inference with observational data, the latent failure time approach to competing risks, or even to random effects models generally. These approaches each involve assumptions about unobservable quantities. As such, they may be best thought of as adjuncts to observable data modeling methods; they may be considered to address further data analytic questions, but typically do so under additional untestable model assumptions.
The present effort is essentially a research monograph. We do not attempt to present a compendium of the rather voluminous set of methods that have been proposed for some aspect of multivariate failure time data analysis, and we apologize to other authors if their methodologic contributions are not discussed comprehensively, or even at all. Furthermore, the data analysis methods emphasized require moderately large numbers of pairs of outcome events during cohort follow-up, precluding useful illustrations using classroom-type data sets. Partly for this reason we have chosen to illustrate these multivariate methods using data from large cohorts in the national Women’s Health Initiative in which we are engaged. The methods emphasized may require novel software. We provide a description of, and link to, the software used in most of our illustrations in appendix materials.
This book is intended primarily for statistical and biostatistical researchers as a source of useful and interpretable data analysis methods, and as a basis for ideas for further methodology development. The book could also serve as a text for a graduate-level course in statistics or biostatistics among students having a reasonable command of calculus and probability theory. Prior exposure to univariate failure time methods for such students would also be helpful, though a presentation of the “core” univariate methods mentioned above is included here. A set of exercises is included with each chapter to enhance the usefulness of the presentation as a graduate text. Many of the examples used to illustrate the methods described are drawn from epidemiologic cohort studies or from randomized controlled clinical trials, and this book may also serve as a useful reference source for quantitative epidemiologists, and for biomedical scientists more generally, who are working with data of the type discussed here. Statistical and biostatistical practitioners can derive utility from our presentation, since we attempt to describe the analytic procedures and underlying concepts in terms that are mostly not highly technical.
The authors would like to thank Aaron Aragaki for help with Women’s Health
Initiative illustrations, including creation of the platter plots shown on the front cover;
and to thank Noelle Noble for tremendous help with manuscript preparation.
INTRODUCTION AND CHARACTERIZATION
Failure time methods have application in many subject matter and research areas, including biomedical, behavioral, physical, and engineering sciences, and various industrial settings. Most of the illustrations in this book will be drawn from the biomedical research area in which the authors are engaged.
A major reason for specialized statistical methods for failure time data analysis is
the usual presence of right censoring since some, or perhaps most, study subjects will
not have experienced the event or events of interest at the cutoff date for data analysis,
and some may have discontinued participation in study follow-up procedures used to
ascertain failures prior to such cutoff date. The usual assumption about censoring
is that of independence, which requires the set of subjects who are uncensored and
continuing to be monitored for failures at any follow-up time to be representative of
individuals at risk for failure in the study population in terms of their failure rates,
conditional on a specified set of covariates.
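The independent censoring assumption is what justifies estimating the hazard at each failure time from the subjects still under observation. A minimal Kaplan–Meier sketch in Python (NumPy assumed; the function name and data are hypothetical) makes this explicit: at each distinct failure time the current estimate of $F$ is multiplied by one minus the number of failures divided by the number at risk, with subjects censored at that time counted as at risk, following the usual convention.

```python
import numpy as np


def kaplan_meier(time, event):
    """Kaplan-Meier estimate of F(t) = P(T > t) at each distinct observed failure time.

    Subjects censored at a failure time t are counted as still at risk at t.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    fail_times = np.unique(time[event])          # sorted distinct failure times
    surv, estimates = 1.0, []
    for t in fail_times:
        n_at_risk = np.sum(time >= t)            # uncensored and still monitored just before t
        d = np.sum((time == t) & event)          # failures observed at t
        surv *= 1.0 - d / n_at_risk              # multiply by 1 - (estimated hazard at t)
        estimates.append(surv)
    return fail_times, np.array(estimates)


# Hypothetical data: follow-up times with failure indicators (0 = right censored)
t_fail, F_hat = kaplan_meier([3, 5, 5, 8, 10, 12], [1, 1, 0, 1, 0, 1])
print(t_fail)
print(F_hat)
```

If censoring depended on failure risk in a way not captured by the conditioning covariates, the ratio of failures to subjects at risk would no longer estimate the population hazard, and the resulting product would be biased.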
Most failure time methods have focused on a single failure time variate $T > 0$. The distribution of $T$ can be characterized by its survivor function $F$, where $F(t) = P(T > t)$ and $P$ denotes probability, which can be thought of in terms of frequencies in the underlying, typically conceptual, study population. To accommodate continuous, discrete, and mixed failure times, $F$ is usually assumed to be continuous from the right with left-hand limits, whence the probability function corresponding to $F$ is defined at time $t$ by $-F(dt)$, where

$$
-F(dt) =
\begin{cases}
\lim_{\Delta t \downarrow 0} \left[\{F(t) - F(t + \Delta t)\}/\Delta t\right] dt & \text{if } t \text{ is a continuity point of } F,\\
-F(\Delta t) = F(t^-) - F(t) & \text{if } t \text{ is a mass point of } F,
\end{cases}
$$

$$\Lambda(dt) = -F(dt)/F(t^-).$$
$\Lambda(dt)$ is referred to as the hazard rate at follow-up time $t$. It is the failure probability, or failure probability element if $t$ is a continuity point of $F$, at time $t$ conditional on lack of failure prior to $t$. The absolutely continuous, discrete, and mixed special cases can be combined using Stieltjes integrals so that, for example, the survivor function, which is one minus the probability distribution function $P(T \le t)$, and the (cumulative) hazard function at time $t$ are given, respectively, by

$$F(t) = 1 - P(T \le t) = 1 + \int_0^t F(ds) \quad \text{and} \quad \Lambda(t) = \int_0^t \Lambda(ds).$$

The survivor function can also be recovered from its hazard rates as the product integral $F(t) = \prod_0^t \{1 - \Lambda(ds)\}$, which reduces to $F(t) = \exp\{-\Lambda(t)\}$ if $T$ is absolutely continuous. Product integrals are quite useful for likelihood construction, and for data analysis more generally, with failure time data. In part this utility arises because product integration allows discrete and continuous components of the distribution of $T$ to be included in a single notation. Appendix A provides some background on the definition and properties of Stieltjes integrals and product integration.
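The following small Python sketch (NumPy assumed; the mass points and probabilities are arbitrary illustrative values) checks these relationships for a purely discrete failure time. It computes $\Lambda(dt) = -F(dt)/F(t^-)$ at each mass point, forms the cumulative hazard as a sum, and recovers $F$ by the finite product $\prod\{1 - \Lambda(ds)\}$ over mass points $s \le t$, which is what a product integral reduces to in the discrete case.

```python
import numpy as np

# Hypothetical discrete failure time: mass points t_k and probabilities P(T = t_k)
support = np.array([1.0, 2.0, 4.0, 7.0])
pmf = np.array([0.1, 0.3, 0.2, 0.4])

F = 1.0 - np.cumsum(pmf)                       # F(t_k) = P(T > t_k), right-continuous values
F_left = np.concatenate(([1.0], F[:-1]))       # left limits F(t_k-) = P(T >= t_k)

hazard = (F_left - F) / F_left                 # Lambda(dt_k) = -F(dt_k) / F(t_k-)
cum_hazard = np.cumsum(hazard)                 # Lambda(t_k) as a Stieltjes sum over mass points

F_product = np.cumprod(1.0 - hazard)           # product over s <= t_k of {1 - Lambda(ds)}

for t, lam, f1, f2 in zip(support, hazard, F, F_product):
    print(f"t={t:.0f}  hazard={lam:.4f}  F={f1:.4f}  product integral={f2:.4f}")
```

For this discrete distribution $\exp\{-\Lambda(t)\}$ does not equal $F(t)$ (for example, $\exp(-0.1) \approx 0.905$ versus $F(1) = 0.9$), which is one reason the product integral, rather than the exponential form, serves as the general notation.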
One could speculate that failure time data would be analyzed mostly using linear models, or generalized linear models, or other standard statistical procedures, were it not for the presence of right censoring. This would be an oversimplification, however, since a focus on hazard functions is helpful for statistical modeling and leads to useful parameter interpretation regardless of the presence of censoring. Importantly, hazard rate modeling allows inference to be made on failure rates that evolve over the cohort follow-up period in a manner that may depend on covariates that are also changing over time, or on other types of events that are experienced by individuals during the study follow-up period. Also, in some contexts the inclusion of time-varying covariates may be necessary for an independent censoring assumption to be plausible. The simplest type of covariate is a fixed vector $z = (z_1, \ldots, z_q)$ ascertained at time zero (e.g., at or before the date of enrollment in a cohort study) for each study subject. Such a “baseline” covariate $q$-vector can include various aspects of the individual’s preceding history that may be of interest in relation to subsequent survival probabilities or hazard rates. In an observational epidemiology context, for example, $z$ may include exposures or characteristics that may be associated with the risk of occurrence of a study disease. In an industrial accelerated lifetime product testing situation, $z$ may include the temperature, or stress level, applied to a product to produce early breakdowns of various types. In a randomized controlled trial (RCT), $z$ would typically include indicator variables for treatment assignments, possibly along with product terms between these indicator variables and other factors that may influence the magnitude of any treatment effect on a failure time outcome.
In general there may be a covariate process, $Z$, that evolves over the study follow-up period. For notation, one can denote by $z(s) = \{z_1(s), z_2(s), \ldots\}$ the covariate value at follow-up time $s \ge 0$ and write $Z(t) = z(0) \vee \{z(s); 0 < s < t\}$ to denote the collection of an individual’s covariate data up to time $t$, including baseline covariates that can include exposures and characteristics that pertain to the time period prior to study enrollment. Here $Z(t)$ is usually defined to involve sample paths that are continuous from the left; that is, $Z(t) = \lim_{s \uparrow t} Z(s)$ when $t > 0$, so that the history $Z(t)$ does not include covariate “process” jumps occurring at time $t$. It is often useful to specify a statistical model for $\Lambda$ given $Z$. In fact many, perhaps most, univariate failure time data analysis applications involve the modeling and estimation of hazard rates of this type. This statistical approach, and the use of the Cox (1972) regression model specifically with its nonparametric “baseline” hazard function, will be
described in Chapter 2.

There are many variants of the simple cohort study design in which data are ascertained on a random sample of a study population conditional on $Z(0)$; for example, study subjects may enter the cohort late ($t > 0$), or be ascertained with varying selection probabilities according to their covariate histories, or according to their failure experience during study follow-up; there may be variations in how and when covariates are measured, and in the reliability of those measurements. Also, subjects may cease continued study participation, censoring their failure times, for complex reasons, possibly including their experiences during the study follow-up period. These and other study features necessitate a correspondingly rich class of statistical models and methods for data analysis. Hazard rate models provide a major unifying concept, and related methods have come to be known as event history analysis methods. The usual presence of right censoring has implications for the types of statistical models that can be reliably applied, even if the covariates are time-independent. For example, in the absence of censoring or other forms of missing or incomplete data, linear regression models and maximum likelihood-based regression parameter estimators have valuable robustness properties, in that consistent estimators of regression parameters typically arise even if the parametric form for the linear model error variable is misspecified. Unfortunately, this robustness is not retained in the presence of right censoring, motivating the application of nonparametric or semiparametric models, and estimation procedures that will have good behavior regardless of the value of a nonparametric model component. This topic, too, will be elaborated on in Chapter 2 for univariate failure time data.
defining a Volterra integral equation for $F$, in terms of $F(\cdot, 0)$, $F(0, \cdot)$, and $\Lambda_{11}$, that has a unique solution. That solution is in a rather inconvenient Péano series form. Nevertheless, the fact that $F$ is determined by its marginal hazard rates and its double failure hazard rate provides useful background for bivariate failure time data modeling. See Appendix A for the definition of two-dimensional Stieltjes integrals. The Péano series solution to (1.4) is given in §3.2.1.
It is natural to consider the joint probability distribution for $(T_1, T_2)$ as composed of its marginal survivor functions, or equivalently its marginal hazard functions, and a component that measures dependency between $T_1$ and $T_2$ given the marginal distributions. In fact, the copula approach to bivariate distribution modeling uses this conceptualization through specification of a parametric model that brings together the marginal survivor functions for $T_1$ and $T_2$ to form their joint survivor function.
Dependency between $T_1$ and $T_2$ at follow-up time $(t_1, t_2)$ can also be characterized by comparing the double failure hazard rate $\Lambda_{11}(dt_1, dt_2)$ to the product of corresponding single failure hazard rates at $(t_1, t_2)$. The single failure hazard rate functions $\Lambda_{10}$ and $\Lambda_{01}$ are defined respectively by $\Lambda_{10}(dt_1, t_2^-) = -F(dt_1, t_2^-)/F(t_1^-, t_2^-)$ and $\Lambda_{01}(t_1^-, dt_2) = -F(t_1^-, dt_2)/F(t_1^-, t_2^-)$ for $t_1 \ge 0$ and $t_2 \ge 0$; and by $\Lambda_{10}(t_1, t_2) = \int_0^{t_1} \Lambda_{10}(ds_1, t_2)$ and $\Lambda_{01}(t_1, t_2) = \int_0^{t_2} \Lambda_{01}(t_1, ds_2)$. In particular, one can compare $\Lambda_{11}(dt_1, dt_2)$ to $\Lambda_{10}(dt_1, t_2^-)\Lambda_{01}(t_1^-, dt_2)$ at any $t_1 > 0$, $t_2 > 0$ through the ratio

$$\alpha(t_1, t_2) = \Lambda_{11}(dt_1, dt_2)/\{\Lambda_{10}(dt_1, t_2^-)\Lambda_{01}(t_1^-, dt_2)\}, \tag{1.5}$$

which expresses double failure rate departure from local independence on a relative scale. The function $\alpha$ is referred to as the cross ratio function, terminology that reflects the expression

$$\alpha(t_1, t_2) = \frac{F(dt_1, dt_2)\,F(t_1^-, t_2^-)}{F(dt_1, t_2^-)\,F(t_1^-, dt_2)}.$$

It provides a possible means of characterizing dependency between $T_1$ and $T_2$ in that its set of possible values is essentially unrestricted by the corresponding marginal hazard rates $\Lambda_{10}(dt_1, 0)$ and $\Lambda_{01}(0, dt_2)$. In fact, $\alpha(t_1, t_2)$ may take any value in $[0, \infty)$ for absolutely continuous failure times.
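The cross ratio expression can be checked numerically. The Python sketch below (NumPy assumed) approximates $\alpha(t_1, t_2)$ by replacing each differential in $F(dt_1, dt_2)F(t_1^-, t_2^-)/\{F(dt_1, t_2^-)F(t_1^-, dt_2)\}$ with a difference over a small grid cell of side $h$. As an illustrative assumption it uses unit exponential marginals, joined either independently or through the Clayton–Oakes form (1.7) with $\theta = 2$, a model whose cross ratio is constant and equal to $1 + \theta$; the function names and evaluation points are hypothetical.

```python
import numpy as np


def clayton_oakes(t1, t2, theta):
    """Clayton-Oakes survivor function (1.7) with unit exponential marginals, theta > 0.

    With F(t, 0) = exp(-t), the marginal term F(t, 0) ** (-theta) equals exp(theta * t).
    """
    return (np.exp(theta * t1) + np.exp(theta * t2) - 1.0) ** (-1.0 / theta)


def independent(t1, t2):
    """Joint survivor function of independent unit exponential failure times."""
    return np.exp(-t1) * np.exp(-t2)


def cross_ratio(F, t1, t2, h=1e-4):
    """Approximate alpha(t1, t2) = F(dt1,dt2) F(t1-,t2-) / {F(dt1,t2-) F(t1-,dt2)}.

    Each differential is replaced by a difference over a cell of side h.
    """
    f_12 = F(t1, t2) - F(t1 - h, t2) - F(t1, t2 - h) + F(t1 - h, t2 - h)  # F(dt1, dt2)
    f_1 = F(t1, t2 - h) - F(t1 - h, t2 - h)                              # F(dt1, t2-)
    f_2 = F(t1 - h, t2) - F(t1 - h, t2 - h)                              # F(t1-, dt2)
    return f_12 * F(t1 - h, t2 - h) / (f_1 * f_2)


for t1, t2 in [(0.2, 0.5), (1.0, 0.3), (1.5, 2.0)]:
    a_ind = cross_ratio(independent, t1, t2)
    a_co = cross_ratio(lambda s1, s2: clayton_oakes(s1, s2, theta=2.0), t1, t2)
    print(f"({t1}, {t2}): independent {a_ind:.4f}, Clayton-Oakes(theta=2) {a_co:.4f}")
```

The independent case returns 1 to within rounding error, while the Clayton–Oakes case should return values close to $1 + \theta = 3$ at each point; the small departures come from the finite cell size $h$.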
It turns out that the bivariate survivor function $F$ is also completely determined by its marginal hazard rates $\Lambda_{10}(\cdot, 0)$ and $\Lambda_{01}(0, \cdot)$ and its cross ratio function. The joint modeling and estimation of marginal hazard rates and cross ratios, however, is complicated by the lack of a closed form expression for $F$ in terms of these distributional components.
There is, however, a closed form expression for the survivor function in terms of its single failure hazard rates more generally and its cross ratio function. Specifically, $F$ is given, in product integral notation (see Appendix A), by

$$
\begin{aligned}
F(t_1, t_2) &= \prod_0^{t_1}\{1 - \Lambda_{10}(ds_1, 0)\}\prod_0^{t_2}\{1 - \Lambda_{01}(0, ds_2)\}
\prod_0^{t_1}\prod_0^{t_2}\left[1 + \frac{\Lambda_{11}(ds_1, ds_2) - \Lambda_{10}(ds_1, s_2^-)\Lambda_{01}(s_1^-, ds_2)}{\{1 - \Lambda_{10}(\Delta s_1, s_2^-)\}\{1 - \Lambda_{01}(s_1^-, \Delta s_2)\}}\right]\\
&= \prod_0^{t_1}\{1 - \Lambda_{10}(ds_1, 0)\}\prod_0^{t_2}\{1 - \Lambda_{01}(0, ds_2)\}
\prod_0^{t_1}\prod_0^{t_2}\left[1 + \frac{\{\alpha(s_1, s_2) - 1\}\Lambda_{10}(ds_1, s_2^-)\Lambda_{01}(s_1^-, ds_2)}{\{1 - \Lambda_{10}(\Delta s_1, s_2^-)\}\{1 - \Lambda_{01}(s_1^-, \Delta s_2)\}}\right].
\end{aligned}
\tag{1.6}
$$
from which one sees that $\Lambda_{10}(dt_1, t_2)$ is determined by the marginal hazard rate $\Lambda_{10}(dt_1, 0)$, the cross ratios $\alpha(t_1, s_2)$ for $0 < s_2 \le t_2$, the single failure hazard rates $\Lambda_{10}(dt_1, s_2)$ for $0 < s_2 < t_2$, and the single failure hazard rates $\Lambda_{01}(t_1^-, ds_2)$ for $0 < s_2 \le t_2$. Note that the right side of this expression does not involve single failure hazard rates $\Lambda_{10}(ds_1, s_2)$ or $\Lambda_{01}(s_1, ds_2)$ at $(s_1, s_2) = (t_1, t_2)$. This, and a corresponding expression for $\Lambda_{01}(t_1, dt_2)$, support an inductive proof that $F$ is determined by its marginal hazard rate and cross ratio functions for discrete failure times $(T_1, T_2)$. The induction hypothesis specifies this to be true for all points $\{(s_1, s_2);\ 0 \le s_1 \le t_1,\ 0 \le s_2 \le t_2,\ (s_1, s_2) \ne (t_1, t_2)\}$. The single failure hazard rate expressions at $(t_1, t_2)$ then show the hypothesis to hold also at $(t_1, t_2)$. Since the induction hypothesis holds trivially along the coordinate axes, it follows that it holds also throughout the set of grid points $(t_1, t_2)$ where $P(T_1 = t_1) > 0$ and $P(T_2 = t_2) > 0$, and hence for the discrete failure time survivor function as a whole.
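The discrete-time argument can be illustrated numerically. The Python sketch below (NumPy assumed) uses an arbitrary, randomly generated joint probability mass function on a small integer grid, an assumption made purely for illustration, computes the marginal, single, and double failure hazard rates from $F$, and checks that the first product integral form of (1.6) reproduces $F$ at every grid point where the survival probabilities involved are positive. The function names are hypothetical.

```python
import numpy as np


def survivor_from_pmf(p):
    """F[i, j] = P(T1 > i, T2 > j), i = 0..m, j = 0..n, for p[a-1, b-1] = P(T1 = a, T2 = b)."""
    m, n = p.shape
    F = np.zeros((m + 1, n + 1))
    # Reverse cumulative sums give P(T1 >= i + 1, T2 >= j + 1) = P(T1 > i, T2 > j)
    F[:m, :n] = p[::-1, ::-1].cumsum(axis=0).cumsum(axis=1)[::-1, ::-1]
    return F


def survivor_via_1_6(F, t1, t2):
    """Evaluate the first product integral form of (1.6) at integer grid point (t1, t2)."""
    out = 1.0
    for i in range(1, t1 + 1):                   # marginal factor {1 - Lambda10(ds1, 0)}
        out *= 1.0 - (F[i - 1, 0] - F[i, 0]) / F[i - 1, 0]
    for j in range(1, t2 + 1):                   # marginal factor {1 - Lambda01(0, ds2)}
        out *= 1.0 - (F[0, j - 1] - F[0, j]) / F[0, j - 1]
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            at_risk = F[i - 1, j - 1]            # P(T1 >= i, T2 >= j) = F(s1-, s2-)
            lam11 = (F[i - 1, j - 1] - F[i, j - 1] - F[i - 1, j] + F[i, j]) / at_risk
            lam10 = (F[i - 1, j - 1] - F[i, j - 1]) / at_risk    # Lambda10(ds1, s2-)
            lam01 = (F[i - 1, j - 1] - F[i - 1, j]) / at_risk    # Lambda01(s1-, ds2)
            out *= 1.0 + (lam11 - lam10 * lam01) / ((1.0 - lam10) * (1.0 - lam01))
    return out


rng = np.random.default_rng(seed=1)
p = rng.random((4, 5))                           # hypothetical joint pmf on {1..4} x {1..5}
p /= p.sum()
F = survivor_from_pmf(p)

# Grid points strictly inside the support, so all survival probabilities used are positive
max_err = max(abs(survivor_via_1_6(F, t1, t2) - F[t1, t2]) for t1 in range(4) for t2 in range(5))
print(f"largest discrepancy between (1.6) and F: {max_err:.2e}")
```

Each factor uses only hazard rates at grid points up to $(t_1, t_2)$, mirroring the inductive argument, and the discrepancy is at the level of floating-point rounding.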
If the distribution of $(T_1, T_2)$ includes continuity points, then the $T_1$ and $T_2$ axes can each be partitioned, with $F$, by definition of the product integrals in (1.6), given by the limit of discrete distributions formed by these partitions as the mesh of the grid decreases to zero. Each such approximating discrete distribution can be characterized
BIVARIATE FAILURE TIME DATA AND DISTRIBUTIONS 7
by its marginal hazard rate and cross ratio functions, and these functions converge to those from $F$ as the partition mesh becomes small. It follows that $F$ is uniquely determined generally by its marginal hazard rate and cross ratio functions. In fact, one could regard marginal hazard rates and cross ratios as key building blocks for bivariate survival data modeling. Note that if $F$ is absolutely continuous, (1.6) simplifies to

$$
\begin{aligned}
F(t_1, t_2) &= \exp\left[-\int_0^{t_1}\Lambda_{10}(ds_1, 0) - \int_0^{t_2}\Lambda_{01}(0, ds_2) + \int_0^{t_1}\!\!\int_0^{t_2}\{\Lambda_{11}(ds_1, ds_2) - \Lambda_{10}(ds_1, s_2)\Lambda_{01}(s_1, ds_2)\}\right]\\
&= \exp\left[-\int_0^{t_1}\Lambda_{10}(ds_1, 0) - \int_0^{t_2}\Lambda_{01}(0, ds_2) + \int_0^{t_1}\!\!\int_0^{t_2}\{\alpha(s_1, s_2) - 1\}\Lambda_{10}(ds_1, s_2)\Lambda_{01}(s_1, ds_2)\right].
\end{aligned}
$$
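For absolutely continuous failure times, this exponential form can be checked approximately by discretizing $(0, t_1] \times (0, t_2]$ on a fine grid. The Python sketch below (NumPy assumed) takes, for illustration only, the Clayton–Oakes survivor function (1.7) with unit exponential marginals and $\theta = 1$; the double sum over grid cells approximates the integral of $\Lambda_{11}(ds_1, ds_2) - \Lambda_{10}(ds_1, s_2)\Lambda_{01}(s_1, ds_2)$. The function names, grid step $h$, and evaluation point are hypothetical choices.

```python
import numpy as np


def clayton_oakes(t1, t2, theta):
    """Clayton-Oakes survivor function (1.7) with unit exponential marginals, theta > 0."""
    return (np.exp(theta * t1) + np.exp(theta * t2) - 1.0) ** (-1.0 / theta)


def exponential_form(t1, t2, theta, h=1e-3):
    """Grid approximation to the exponential expression for F(t1, t2) in the continuous case."""
    s1 = np.linspace(0.0, t1, int(round(t1 / h)) + 1)
    s2 = np.linspace(0.0, t2, int(round(t2 / h)) + 1)
    G = clayton_oakes(s1[:, None], s2[None, :], theta)      # F evaluated on the grid
    a, b, c, d = G[:-1, :-1], G[1:, :-1], G[:-1, 1:], G[1:, 1:]
    lam11 = (a - b - c + d) / a        # double failure hazard element for each grid cell
    lam10 = (a - b) / a                # single failure hazard element for T1
    lam01 = (a - c) / a                # single failure hazard element for T2
    marg1 = (G[:-1, 0] - G[1:, 0]) / G[:-1, 0]   # marginal hazard elements Lambda10(ds1, 0)
    marg2 = (G[0, :-1] - G[0, 1:]) / G[0, :-1]   # marginal hazard elements Lambda01(0, ds2)
    exponent = -marg1.sum() - marg2.sum() + (lam11 - lam10 * lam01).sum()
    return np.exp(exponent)


t1, t2, theta = 0.5, 0.8, 1.0
approx = exponential_form(t1, t2, theta)
exact = clayton_oakes(t1, t2, theta)
print(f"grid approximation {approx:.5f} vs F(t1, t2) = {exact:.5f}")
```

In this sketch the discrepancy should shrink roughly in proportion to $h$, consistent with the discrete representation (1.6) approaching the exponential form as the partition mesh becomes small.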
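The survivor function $F$ can be characterized also in terms of its marginal hazard rate functions $\Lambda_{10}(\cdot, 0)$ and $\Lambda_{01}(0, \cdot)$ and its “covariance rate” function $\Omega_{11}$, defined by $\Omega_{11}(t_1, t_2) = \int_0^{t_1}\int_0^{t_2} \Omega_{11}(ds_1, ds_2)$, where

$$\Omega_{11}(ds_1, ds_2) = \frac{\Lambda_{11}(ds_1, ds_2) - \Lambda_{10}(ds_1, s_2^-)\Lambda_{01}(s_1^-, ds_2)}{\{1 - \Lambda_{10}(\Delta s_1, s_2^-)\}\{1 - \Lambda_{01}(s_1^-, \Delta s_2)\}},$$

so that $F(t_1, t_2) = F(t_1, 0)F(0, t_2)\prod_0^{t_1}\prod_0^{t_2}\{1 + \Omega_{11}(ds_1, ds_2)\}$.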
This characterization, from (1.6), expresses $F(t_1, t_2)$ as a product of its marginal survival probabilities $F(t_1, 0)$ and $F(0, t_2)$ and a factor reflecting dependency between $T_1$ and $T_2$ over $(0, t_1] \times (0, t_2]$. With absolutely continuous failure times, the denominator terms in $\Omega_{11}$ equal one, and $\Omega_{11}(dt_1, dt_2)$ is simply the difference between the double failure hazard element at $(t_1, t_2)$ and the “local independence” product of the corresponding single failure hazard elements.
Often with bivariate failure time data, primary interest will focus on marginal
hazard rates and their dependence on covariates. The reader might logically ask: why
not simply apply the well-established univariate failure time methods mentioned
previously for inference on marginal hazard rates, while bringing in a complementary
dependency function only if there is additional interest in the nature of
any dependency between T1 and T2? In fact, much of the available literature on copula
models uses this type of two-stage modeling with a parametric “copula” model for F
given its marginal hazard rates. This same approach will be considered in Chapter 4,
but with semiparametric and parametric regression models for marginal hazard rates
and for cross ratios, respectively.
A simple, but important, special case of a bivariate survivor function is provided
by the Clayton–Oakes model
$$F(t_1,t_2)=\{F(t_1,0)^{-\theta}+F(0,t_2)^{-\theta}-1\}^{-1/\theta}\vee 0,\quad\text{for }\theta\ge -1. \tag{1.7}$$
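A quick numerical check of the constant cross ratio of (1.7), which Exercise 1.1 asks to prove: for an absolutely continuous F the cross ratio equals F·∂²F/∂t1∂t2 divided by (∂F/∂t1)(∂F/∂t2). The sketch below assumes unit exponential marginal survivor functions purely for illustration and approximates these derivatives by central finite differences; the function names are hypothetical.

```python
import math

def clayton_oakes(u1, u2, theta):
    """Clayton-Oakes survivor (1.7) from marginal survival probabilities u1, u2.

    Intended for theta >= -1; theta == 0 is taken as the independence limit.
    """
    if theta == 0:
        return u1 * u2
    base = u1 ** -theta + u2 ** -theta - 1.0
    # The "v 0" in (1.7): the survivor function is 0 wherever the base is not positive
    return base ** (-1.0 / theta) if base > 0 else 0.0

def cross_ratio(t1, t2, theta, h=1e-4):
    """Finite-difference cross ratio F * F_12 / (F_1 * F_2), unit exponential margins."""
    F = lambda a, b: clayton_oakes(math.exp(-a), math.exp(-b), theta)
    f1 = (F(t1 + h, t2) - F(t1 - h, t2)) / (2 * h)
    f2 = (F(t1, t2 + h) - F(t1, t2 - h)) / (2 * h)
    f12 = (F(t1 + h, t2 + h) - F(t1 + h, t2 - h)
           - F(t1 - h, t2 + h) + F(t1 - h, t2 - h)) / (4 * h * h)
    return F(t1, t2) * f12 / (f1 * f2)

print(cross_ratio(0.5, 1.0, theta=2.0))    # close to 1 + theta = 3
print(cross_ratio(0.3, 0.4, theta=-0.5))   # close to 1 + theta = 0.5
```

The choice of margins does not matter for the result, since (1.7) depends on (t1, t2) only through F(t1, 0) and F(0, t2).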
This joint survivor function (Exercise 1.1) has marginal survivor functions
given by F(t1, 0) and F(0, t2) and is an example of a copula model, wherein the
marginal survival probabilities are brought together through a copula function C,
here $C(u_1,u_2)=\{u_1^{-\theta}+u_2^{-\theta}-1\}^{-1/\theta}\vee 0$, to give the joint survivor function. The
$$\Lambda_{10}(dt_1,0;z)=-F(dt_1,0;z)/F(t_1^-,0;z)\quad\text{and}\quad\Lambda_{01}(0,dt_2;z)=-F(0,dt_2;z)/F(0,t_2^-;z).$$
$$Z(t_1,t_2)=\{z(s_1,s_2);\ s_1=0\text{ if }t_1=0,\ s_1<t_1\text{ if }t_1>0;\ \text{and }s_2=0\text{ if }t_2=0,\ s_2<t_2\text{ if }t_2>0\}$$
the covariate history up to (t1, t2). One can define single and double failure hazard
rates at (t1, t2) given Z(t1, t2), respectively, by
$$
\begin{aligned}
\Lambda_{10}\{dt_1,t_2^-;Z(t_1,t_2)\}&=P\{T_1\in[t_1,t_1+dt_1);\ T_1\ge t_1,T_2\ge t_2,Z(t_1,t_2)\},\\
\Lambda_{01}\{t_1^-,dt_2;Z(t_1,t_2)\}&=P\{T_2\in[t_2,t_2+dt_2);\ T_1\ge t_1,T_2\ge t_2,Z(t_1,t_2)\},\ \text{and}\\
\Lambda_{11}\{dt_1,dt_2;Z(t_1,t_2)\}&=P\{T_1\in[t_1,t_1+dt_1),T_2\in[t_2,t_2+dt_2);\ T_1\ge t_1,T_2\ge t_2,Z(t_1,t_2)\},
\end{aligned}
$$
and corresponding single and double failure hazard processes Λ10, Λ01 and Λ11 by
$$
\begin{aligned}
\Lambda_{10}\{t_1,t_2^-;Z(t_1,t_2)\}&=\int_0^{t_1}\Lambda_{10}\{ds_1,t_2^-;Z(s_1,t_2)\},\\
\Lambda_{01}\{t_1^-,t_2;Z(t_1,t_2)\}&=\int_0^{t_2}\Lambda_{01}\{t_1^-,ds_2;Z(t_1,s_2)\},\ \text{and}\\
\Lambda_{11}\{t_1,t_2;Z(t_1,t_2)\}&=\int_0^{t_1}\!\!\int_0^{t_2}\Lambda_{11}\{ds_1,ds_2;Z(s_1,s_2)\}.
\end{aligned}
$$
$$
\begin{aligned}
\Omega_{111}(ds_1,ds_2,ds_3)&=\{-\partial^3\log F(s_1,s_2,s_3)/\partial s_1\partial s_2\partial s_3\}\,ds_1ds_2ds_3\\
&=\Lambda_{111}(ds_1,ds_2,ds_3)-\Omega_{110}(ds_1,ds_2,s_3)\Lambda_{001}(s_1,s_2,ds_3)\\
&\quad-\Omega_{101}(ds_1,s_2,ds_3)\Lambda_{010}(s_1,ds_2,s_3)-\Omega_{011}(s_1,ds_2,ds_3)\Lambda_{100}(ds_1,s_2,s_3)\\
&\quad-\Lambda_{100}(ds_1,s_2,s_3)\Lambda_{010}(s_1,ds_2,s_3)\Lambda_{001}(s_1,s_2,ds_3).
\end{aligned}
$$
In this expression Λ111 is the triple failure hazard rate function given by
$$\Lambda_{111}(ds_1,ds_2,ds_3)=[\{-\partial^3F(s_1,s_2,s_3)/\partial s_1\partial s_2\partial s_3\}/F(s_1,s_2,s_3)]\,ds_1ds_2ds_3.$$
Note that Ω111 (ds1 , ds2 , ds3 ) contrasts the triple failure hazard rate at (s1 , s2 , s3 )
with that under local independence after allowing for dependencies at (s1 , s2 , s3 ) that
emanate from the pairwise marginal covariance rates.
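The second line of the Ω111 expression can be verified directly at continuity points. Write ℓ = log F and use subscripts for partial derivatives, suppressing arguments. Assume, by analogy with the absolutely continuous Ω11 (the double failure hazard minus the product of single failure hazards, which equals ∂²log F/∂s1∂s2), that Ω110(ds1, ds2, s3) = ℓ12 ds1 ds2, and similarly for the other pairs; also Λ100(ds1, s2, s3) = −ℓ1 ds1, and so on. Then, differentiating F = e^ℓ three times,

$$
\begin{aligned}
\frac{\partial^3F/\partial s_1\partial s_2\partial s_3}{F}&=\ell_{123}+\ell_{12}\ell_3+\ell_{13}\ell_2+\ell_{23}\ell_1+\ell_1\ell_2\ell_3,\\
-\ell_{123}\,ds_1ds_2ds_3&=\Lambda_{111}+(\ell_{12}\ell_3+\ell_{13}\ell_2+\ell_{23}\ell_1+\ell_1\ell_2\ell_3)\,ds_1ds_2ds_3\\
&=\Lambda_{111}-\Omega_{110}\Lambda_{001}-\Omega_{101}\Lambda_{010}-\Omega_{011}\Lambda_{100}-\Lambda_{100}\Lambda_{010}\Lambda_{001},
\end{aligned}
$$

since, for example, ℓ12ℓ3 ds1ds2ds3 = Ω110·(−Λ001) and ℓ1ℓ2ℓ3 ds1ds2ds3 = −Λ100Λ010Λ001.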
In Chapter 5 we will consider modeling the trivariate survivor function, and re-
gression extensions thereof. As a specific survivor function consider
$$
\begin{aligned}
F(t_1,t_2,t_3)=\{&F(t_1,t_2,0)^{-\theta}+F(t_1,0,t_3)^{-\theta}+F(0,t_2,t_3)^{-\theta}\\
&-F(t_1,0,0)^{-\theta}-F(0,t_2,0)^{-\theta}-F(0,0,t_3)^{-\theta}+1\}^{-1/\theta}\vee 0
\end{aligned}
\tag{1.9}
$$
$$F(t_1,t_2,0)F(t_1,0,t_3)F(0,t_2,t_3)/\{F(t_1,0,0)F(0,t_2,0)F(0,0,t_3)\}.$$
Also (1.9) approaches the upper Fréchet bound F(t1 ,t2 , 0) ∧ F(t1 , 0,t3 ) ∧ F(0,t2 ,t3 )
as θ → ∞, and the lower Fréchet bound
$$\{F(t_1,t_2,0)+F(t_1,0,t_3)+F(0,t_2,t_3)-F(t_1,0,0)-F(0,t_2,0)-F(0,0,t_3)+1\}\vee 0$$
as θ → −1. All (t1 ,t2 ,t3 ) values away from this lower bound are continuity points
for (T1 , T2 , T3 ) for any θ ∈ [−1, ∞).
Straightforward calculations from (1.9), at continuity points, show
$$
\begin{aligned}
\Omega_{111}(dt_1,dt_2,dt_3)=\theta\{&\Omega_{110}(dt_1,dt_2,t_3)\Lambda_{001}(t_1,t_2,dt_3)+\Omega_{101}(dt_1,t_2,dt_3)\Lambda_{010}(t_1,dt_2,t_3)\\
&+\Omega_{011}(t_1,dt_2,dt_3)\Lambda_{100}(dt_1,t_2,t_3)\}-\theta^2\Lambda_{100}(dt_1,t_2,t_3)\Lambda_{010}(t_1,dt_2,t_3)\Lambda_{001}(t_1,t_2,dt_3)
\end{aligned}
\tag{1.10}
$$
under (1.9), so that the parameter θ governs the magnitude of any trivariate
dependency among the three variates.
In the very special case in which each of the three pairwise marginal survivor
functions adheres to (1.7) with the same θ value as in (1.9), this trivariate survivor
function reduces to
$$F(t_1,t_2,t_3)=\{F(t_1,0,0)^{-\theta}+F(0,t_2,0)^{-\theta}+F(0,0,t_3)^{-\theta}-2\}^{-1/\theta}\vee 0,$$
which some authors have considered as the trivariate generalization of (1.7). How-
ever, this survivor function may be too specialized for many applications: not only
are the pairwise cross ratios independent of their respective time arguments, but these
marginal cross ratios take the identical value (1 + θ ) for each pair of failure times.
Generalization of (1.9) to an arbitrary number of failure time variates will be given
in Chapter 6, along with estimation procedures for trivariate and higher dimensional
failure time data analysis more generally.
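The reduction to the three-variable form can be checked numerically. The sketch below uses hypothetical function names, takes each pairwise margin to be the Clayton–Oakes form (1.7) with the same θ > 0 (the special case just discussed), evaluates (1.9), and compares it with the reduced expression; it also confirms that setting t3 = 0 in (1.9) returns the pairwise survivor function F(t1, t2, 0).

```python
def clayton_pair(u, v, theta):
    """Bivariate Clayton-Oakes survivor (1.7) from marginal survival probabilities (theta > 0)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def trivariate_survivor(f12, f13, f23, f1, f2, f3, theta):
    """Trivariate survivor (1.9) from pairwise and single marginal survival probabilities."""
    base = (f12 ** -theta + f13 ** -theta + f23 ** -theta
            - f1 ** -theta - f2 ** -theta - f3 ** -theta + 1.0)
    return base ** (-1.0 / theta) if base > 0 else 0.0

theta = 1.5
f1, f2, f3 = 0.8, 0.6, 0.7               # F(t1,0,0), F(0,t2,0), F(0,0,t3)
f12 = clayton_pair(f1, f2, theta)        # pairwise margins obeying (1.7)
f13 = clayton_pair(f1, f3, theta)
f23 = clayton_pair(f2, f3, theta)
full = trivariate_survivor(f12, f13, f23, f1, f2, f3, theta)
reduced = (f1 ** -theta + f2 ** -theta + f3 ** -theta - 2.0) ** (-1.0 / theta)
print(abs(full - reduced) < 1e-12)       # True

# t3 = 0 means F(0,0,t3) = 1, F(t1,0,t3) = F(t1,0,0) and F(0,t2,t3) = F(0,t2,0)
print(abs(trivariate_survivor(f12, f1, f2, f1, f2, 1.0, theta) - f12) < 1e-12)  # True
```

Algebraically, each pairwise term contributes fi^(−θ) + fj^(−θ) − 1, so the base of (1.9) collapses to f1^(−θ) + f2^(−θ) + f3^(−θ) − 2, with every pairwise cross ratio fixed at 1 + θ.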
where the conditioning event H (t) includes not only the covariate history Z(t) prior
to time t, but also the failure history for the individual prior to time t. This latter
history is conveniently described by the counting process N, where N(dt) equals the
number of failures experienced by the individual at time t and $N(t)=\int_0^t N(ds)$. Recurrent
event data frequently involve a large number of failures on individual study
subjects, and the modeling of (1.11) sometimes involves simplifying assumptions as
to how the intensity at time t depends on the preceding failure history {N(s), s < t}
for the individual, with Markov and semi-Markov assumptions commonly imposed.
Of course, there may be recurrent events for several types of failure time variates, in
which case the concepts of this and the preceding sections can be combined, leading
to the modeling of marginal failure rate processes (1.11) along with, for example,
corresponding cross ratio processes; in each case the conditioning event includes
not only the covariate history but also the preceding failure history for each event type. The
modeling and analysis of recurrent event data will be discussed in Chapter 7. Importantly,
marginal modeling approaches in which one models failure rates at follow-up
time t as a function of the preceding covariate history, but not the preceding counting process
history, will also be considered in Chapter 7, along with generalizations to include
failures of various types on the same or different failure time axes.
Chapter 8 considers a variety of additional important topics in the modeling and
analysis of multivariate failure time data, including censoring schemes that are “de-
pendent;” cohort sampling procedures where some components of covariate histories
are assembled only for failing individuals and a subcohort of individuals who are
without failure during certain follow-up periods; data analysis procedures when
covariate values are subject to measurement error or missing values, and joint models
for covariate histories and failure times, among other topics.
We end this chapter by describing a few application areas encountered in our
applied work. These settings will be used to illustrate modeling and analysis methods
in subsequent chapters.
Table 1.1 Time in days to severe (stage ≥ 2) acute graft versus host disease (A-GVHD), death,
or last contact for bone marrow transplant patients treated with cyclosporine and methotrexate
(CSP + MTX) or with MTX only^a

CSP+MTX                               | MTX
Time   LAF  Age    Time   LAF  Age    | Time   LAF  Age    Time   LAF  Age
3∗     0    40     324∗   0    23     | 9      1    35     104∗   1    27
8      1    21     356∗   1    13     | 11     1    27     106∗   1    19
10     1    18     378∗   1    34     | 12     0    22     156∗   1    15
12∗    0    42     408∗   1    27     | 20     1    21     218∗   1    26
16     0    23     411∗   1    5      | 20     1    30     230∗   0    11
17     0    21     420∗   1    23     | 22     0    7      231∗   1    14
22     1    13     449∗   1    37     | 25     1    36     316∗   1    15
64∗    0    20     490∗   1    37     | 25     1    38     393∗   7    27
65∗    1    15     528∗   1    32     | 25∗    0    20     395∗   0    2
77∗    1    34     547∗   1    32     | 28     0    25     428∗   0    3
82∗    1    14     691∗   1    38     | 28     0    28     469∗   1    14
98∗    1    10     769∗   0    18     | 31     1    17     602∗   1    18
155∗   0    27     1111∗  0    20     | 35     1    21     681∗   0    23
189∗   1    9      1173∗  0    12     | 35     1    25     690∗   1    9
199∗   1    19     1213∗  0    12     | 46     1    35     1112∗  1    11
247∗   1    14     1357∗  0    29     | 49     0    19     1180∗  0    11

Source: Kalbfleisch and Prentice (2002, Table 1.2)
^a Asterisks indicate that time to severe A-GVHD is right censored; that is, the patient died
without severe A-GVHD or was without severe A-GVHD at last contact.
Thiotepa Group
1 3 1 8 3 36 26, 35
1 1 1 1 1 38
8 1 5 5 1 1 39 22, 23, 27, 32
1 2 9 6 1 39 4, 16, 23, 27, 33, 36, 37
1 1 10 3 1 40 24, 26, 29, 40
1 1 13 3 2 41
2 6 14 3 1 1 41
5 3 17 1, 3, 5, 7, 10 1 1 43 1, 27
5 1 18 1 1 44
1 3 18 17 6 1 44 22, 20, 23, 27, 38
5 1 19 2 1 2 45
1 1 21 17, 19 1 4 46 2
1 1 22 1 4 46
1 3 25 3 3 49
1 5 25 1 1 50
1 1 25 4 1 50 4, 24, 47
1 1 26 6, 12, 13 3 4 54
1 1 27 6 2 1 54 38
2 1 29 2 1 3 59
Source: Kalbfleisch and Prentice (2002, p. 292)
^a Initial number of tumors of 8 denotes 8 or more; Size denotes size of largest such tumor in centimeters.
^b Censoring and recurrence times are measured in months.
been grouped into months, giving a moderate number of tied recurrence times. Data
analysis methods that can accommodate these types of tied times without incurring
appreciable bias are needed for this and other applications.
1.7.5 Women’s Health Initiative dietary modification trial
As introduced in §1.7.3, a total of 48,835 post-menopausal US women were assigned
to either a low-fat dietary pattern intervention (40%) or to a usual diet comparison
group (60%) as a part of the multifaceted Women’s Health Initiative clinical trial. In-
tervention group women were taught nutritional and behavioral approaches to mak-
ing a major dietary change, in groups of size 10–15 led by nutritionists. The dietary
goals included fat reduction to 20% of energy (calories), fruit and vegetable increase
to 5 servings/day, and grains increase to 6 servings/day. The comparison group re-
ceived printed health-related materials only. Breast and colorectal cancer incidence
were designated primary outcomes for disease risk reduction, and coronary heart
disease (CHD) and total cardiovascular disease (CVD) incidence were designated
secondary trial outcomes.
The trial proceeded to its planned termination (March 31, 2005) at which time
breast cancer incidence results were in the favorable direction for intervention versus
comparison-group women, but not statistically significant (p = 0.09) at conventional
levels. There was no evidence of an intervention influence on colorectal cancer inci-
dence; and CHD and overall CVD results were also neutral in spite of evidence of
favorable change in low-density lipoprotein cholesterol among intervention, but not
comparison group, women.
These findings were somewhat disappointing, given the magnitude of effort re-
quired to mount such a large, complex trial. However, the adherence of intervention
women to dietary fat goals was only about 70% of that anticipated in the trial design,
resulting in loss of power for trial outcomes, and the differential breast cancer
incidence in the intervention versus the comparison group was also about 70% of
that projected in the trial design. Also, this nutritional and behavioral intervention
can be projected to favorably influence a range of other important outcomes, includ-
ing disease-specific and total mortality. This opens the possibility of more definitive
results for composite outcomes, such as breast cancer followed by death from any
cause. In spite of much reduced incidence rates for composite outcomes (double fail-
ures) of this type, randomization comparisons may have greater power than either
of the marginal hazard rate comparisons, depending on the strength of relationship
between the double failure hazard rates and the randomization indicator variable. WHI
investigators have recently conducted further trial analyses of this type, finding
nominally significant intention-to-treat effects on breast cancer followed by death, and on
diabetes requiring insulin injections. The related bivariate failure time analyses will
provide illustration in Chapters 4 and 7. Another valuable development in a recent
round of data analysis was the identification of post-randomization confounding by
differential use of statins between randomization groups among women who were
hypertensive at baseline or who had prior cardiovascular disease. Statin use in the
trial cohort increased markedly during the trial follow-up period, and these potent
preparations are known to have a strong influence on low-density lipoprotein
cholesterol concentrations in the blood, and on coronary heart disease incidence. In contrast,
there was no evidence of such confounding among baseline healthy (normotensive,
without prior CVD) women, and in this stratum, intervention group women
experienced a Bonferroni-adjusted significantly lower CHD incidence compared to
comparison group women. Trial women have continued to be followed for clinical
outcomes during the post-intervention period, and new information is still emerging on
the effects of this low-fat intervention program on clinical outcomes during the com-
bined intervention and post-intervention follow-up periods. Statistical methods are
needed to take full advantage of the wealth of data obtained in this type of enterprise,
including analyses of single and double failure hazard rates in relation to
randomization indicator variables and in relation to other study subject characteristics and
exposures over the trial follow-up period.
BIBLIOGRAPHIC NOTES
There is a long history of modeling failure time data using survivor and hazard
functions. The preface lists a number of books that describe these functions, and
their estimation under independent censorship, in some detail, including early books
by Kalbfleisch and Prentice (1980, 2002), Breslow and Day (1980, 1987), Lawless
(1983, 2002), Cox and Oakes (1984), Fleming and Harrington (1991), and Andersen
et al. (1993). A thorough account of product integration, as in (1.1), is given by
Gill and Johansen (1990) with applications to failure time data (see Appendix A).
Dabrowska (1988) provided the nice representation (1.6), which expresses the sur-
vivor function in terms of its marginal hazard rates and dependency rates that contrast
the double failure hazard rate to the product of corresponding single failure hazard
rates locally. Dabrowska (1988) also alludes to higher dimensional representation
from which (1.8) derives. See also Gill and Johansen (1990) and Prentice and Zhao
(2018) for such higher dimensional representation. Clayton (1978) introduced the
bivariate survivor function model (1.7) with θ > 0, which was further developed
by Oakes (1982, 1986, 1989). This model was generalized to higher dimensions in
Prentice (2016). The focus here on modeling marginal hazard rates, and on marginal
single and double failure hazard rates, will be used in an attempt to provide a unified
presentation throughout this book. Key references for marginal hazard rate analyses
for single failure hazard rates include Wei, Lin, and Weissfeld (1989), Spiekerman
and Lin (1998), and Lin et al. (2000). The literature on the modeling and analysis of
recurrent events is described in some detail in Cook and Lawless (2007) while the
same authors have recently provided (Cook & Lawless, 2018) a detailed account of
multistate models for event history analyses more generally. Anderson (1984) pro-
vides a unified record of multivariate normal-based modeling and estimation proce-
dures. Key references for mean and covariance estimation with (uncensored) discrete
and continuous data include Liang and Zeger (1986), Zeger and Liang (1986), and
Prentice and Zhao (1991).
EXERCISES AND COMPLEMENTS
Exercise 1.1
Show that the Clayton–Oakes bivariate survivor function (1.7) for failure time
variates T1 and T2, given by
$$F(t_1,t_2)=\{F(t_1,0)^{-\theta}+F(0,t_2)^{-\theta}-1\}^{-1/\theta}$$
for θ > 0, has marginal survivor functions given by F(t1, 0) and F(0, t2) and a
time-independent cross ratio function equal to 1 + θ. Also, by considering log F(t1, t2),
show this survivor function converges to the independence special case F(t1 ,t2 ) =
F(t1, 0)F(0, t2) as θ ↓ 0. Further show that this survivor function can be extended
to allow negative dependencies (θ < 0), that this distribution approaches the upper
Fréchet bound of F(t1 , 0) ∧ F(0,t2 ) for maximal positive dependency as θ → ∞, and
approaches the lower Fréchet bound of {F(t1 , 0) + F(0,t2 ) − 1} ∨ 0 for maximal neg-
ative dependency as θ → −1. Comment on the extent to which absolute continuity
is retained as θ becomes increasingly negative.
Exercise 1.2
The trivariate survivor function, for failure time variates T1 , T2 and T3 given by
$$F(t_1,t_2,t_3)=\{F(t_1,0,0)^{-\theta}+F(0,t_2,0)^{-\theta}+F(0,0,t_3)^{-\theta}-2\}^{-1/\theta}\vee 0$$
Integrate this latter expression over (0,t1 ] × (0,t2 ] × (0,t3 ] and combine with the
former expression to yield the absolutely continuous survivor function representation
(1.8).
Exercise 1.4
Derive a test for whether the censoring times in Table 1.2, in a given treatment group,
depend on the preceding bladder tumor recurrence pattern for each patient as ob-
served during trial follow-up.
Exercise 1.5
Consider discrete failure time variates (T1, T2). Show that the quantity in square
brackets in the double product integral on the right side of (1.6) can be expressed
as $F(s_1^-,s_2^-)F(s_1,s_2)/\{F(s_1^-,s_2)F(s_1,s_2^-)\}$ and thereby show, through massive
cancellation, that this double product integral reduces to F(t1, t2)/{F(t1, 0)F(0, t2)}, so
that the right side of (1.6) equals F(t1, t2).
Exercise 1.6
Consider m > 2 failure time variates T1 , . . . , Tm having joint survivor function (θ ≥
−1) (Prentice, 2016) given by
Exercise 1.7
Suppose that failure time variates T1 and T2 are statistically independent given the
value of a shared random effect W , where W is a gamma variate rescaled to have
mean one and variance θ > 0 that acts multiplicatively on the hazard rate. Show that
the Clayton (1978) model
$$F(t_1,t_2)=\{F(t_1,0)^{-\theta}+F(0,t_2)^{-\theta}-1\}^{-1/\theta}$$
then arises by integrating over the joint distribution of the random effect, which is
often referred to as a “frailty” variate in this context. Generalize this result to m > 2
failure time variates and derive pairwise marginal cross ratio functions for each pair
of the m variates.
Exercise 1.8
Consider the WHI hormone therapy trial context of §1.7.3 with T1 defined as time
from randomization to CHD and T2 time from randomization to stroke. Describe
the difference in interpretation between the marginal T1 hazard process Λ10 given
by Λ10 {dt1 , 0; Z(t1 , 0)} and the recurrent event intensity process for T1 , given by
Λ1 {dt1 ; Ht } = P{CHD event in [t,t + dt); Ht } as in (1.11).
Exercise 1.9
In the same WHI hormone therapy trial context (§1.7.3) define T1 as time from ran-
domization to breast cancer diagnosis, and T2 as time from breast cancer diagnosis
to death following breast cancer. Write down expressions for hazard rate processes
for T1 and for T2 given T1 ≤ t1 . Can you develop an expression for the hazard rate
for the composite time from randomization to death following breast cancer outcome
T3 = T1 + T2 in terms of these component hazard functions? Discuss the advantages
and disadvantages of comparing randomization groups in terms of T3 hazard rates
versus separate analyses for T1 and T2 hazard rates.
Chapter 2
2.1 Overview 25
2.2 Nonparametric Survivor Function Estimation 25
2.3 Hazard Ratio Regression Estimation Using the Cox Model 27
2.4 Cox Model Properties and Generalizations 31
2.5 Censored Data Rank Tests 32
2.6 Cohort Sampling and Dependent Censoring 33
2.7 Aplastic Anemia Clinical Trial Application 35
2.8 WHI Postmenopausal Hormone Therapy Application 35
2.9 Asymptotic Distribution Theory 39
2.10 Additional Univariate Failure Time Models and Methods 44
2.11 A Cox-Logistic Model for Continuous, Discrete or Mixed Failure Time Data 45
2.1 Overview
There is an extensive literature on statistical modeling and estimation for a univari-
ate failure time variate T > 0. In this chapter some core methods, which we will
build upon in subsequent multivariate failure time methods presentations, will be de-
scribed. The core methods include Kaplan–Meier survivor function estimation, Cox
model hazard ratio parameter estimation with its associated logrank test, as well as
other censored data rank tests. The presentation will focus on nonparametric and
semiparametric likelihood formulations for estimator development for reasons men-
tioned in Chapter 1. A brief account of asymptotic distribution theory for these testing
and estimation procedures will also be given.
population, and are followed forward from t = 0 to observe individual failure times,
subject to independent right censoring. Here, independent right censoring means that
the set of individuals without prior failure or censoring has a hazard rate equal to that
for the study population, at any follow-up time t > 0. Denote by t1 < t2 < · · · < tI
the ordered distinct failure times in the sample, and suppose that di individuals fail
at ti , out of the ri individuals who are without failure or censoring prior to time
ti , i = 1, . . . , I. A nonparametric likelihood function for F can be written
$$L=\prod_{k=1}^n\left[\{-F(ds_k)\}^{\delta_k}F(s_k)^{1-\delta_k}\right], \tag{2.1}$$
where sk is observed and is the smaller of the failure or censoring time for the kth
individual in the sample and δk takes a value of 1 if sk is uncensored and a value
0 if sk is censored. Expression (2.1) can be maximized within the class of discrete,
continuous and mixed survivor functions by placing mass (probability) only at the
observed uncensored failure times, or on the half line beyond the largest sk value if that
value is censored. Doing so yields a discrete, step function estimator of F, starting with
F(0) = 1. Substituting
using the discrete special case of (1.1) and collecting terms, then gives a partially
maximized likelihood of
$$L=\prod_{i=1}^I\left[\Lambda(dt_i)^{d_i}\{1-\Lambda(dt_i)\}^{r_i-d_i}\right], \tag{2.2}$$
Of course F is not identifiable at times where no individuals are “at risk” for
failure, so F̂ is undefined beyond the largest follow-up time (i.e., the largest sk value)
observed in the sample. The corresponding hazard function estimator Λ̂ is given by
$$\hat\Lambda(t)=\sum_{t_i\le t}d_i/r_i,$$
which is often referred to as the Nelson–Aalen estimator. Note that, in keeping with
(1.1),
$$\hat F(t)=\prod_0^t\{1-\hat\Lambda(ds)\}.$$
Informally, the ith factor in (2.2) can be recognized as a binomial likelihood for
Λ(dti). Conditioning on all failure and censoring information prior to ti, thereby
fixing ri, shows Λ̂(dti) = di/ri to have a conditional, and hence an unconditional, mean
of Λ(dti). Similarly, for i < j, conditioning on all failure and censoring information
prior to tj fixes Λ̂(dti) and shows Λ̂(dti) and Λ̂(dtj) to be conditionally, and hence
unconditionally, uncorrelated. Application of the delta method then leads to
$$\widehat{\operatorname{var}}\,\hat F(t)=\hat F(t)^2\sum_{t_i\le t}\frac{d_i}{r_i(r_i-d_i)}$$
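A minimal sketch of the estimators above, assuming right-censored data supplied as arrays of follow-up times sk and noncensoring indicators δk. It computes the Nelson–Aalen increments di/ri, the Kaplan–Meier product F̂ and the delta-method variance in its familiar Greenwood form at the distinct failure times; the function name is illustrative.

```python
import numpy as np

def km_nelson_aalen(s, delta):
    """Kaplan-Meier, Nelson-Aalen and Greenwood variance at the distinct failure times."""
    s = np.asarray(s, dtype=float)
    delta = np.asarray(delta, dtype=int)
    t = np.unique(s[delta == 1])                                   # t_1 < ... < t_I
    d = np.array([np.sum((s == ti) & (delta == 1)) for ti in t])   # failures d_i at t_i
    r = np.array([np.sum(s >= ti) for ti in t])                    # r_i: at risk just before t_i
    haz = d / r                                                    # Lambda-hat(dt_i)
    surv = np.cumprod(1.0 - haz)                                   # F-hat(t_i)
    cumhaz = np.cumsum(haz)                                        # Nelson-Aalen Lambda-hat(t_i)
    # Greenwood variance; nan once a risk set is exhausted (r_i = d_i, so F-hat = 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        var = surv ** 2 * np.cumsum(d / (r * (r - d)))
    return t, surv, cumhaz, var

t, surv, cumhaz, var = km_nelson_aalen([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
print(t)      # [1. 2. 3. 5.]
print(surv)   # approximately 5/6, 2/3, 4/9, 0
```

Note that the subject censored at t = 2 is counted in the risk set at t = 2, in keeping with the definition of ri as the number without failure or censoring prior to ti.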
for i = 1, ..., I, for each k = 1, ..., n. Under a discrete failure time model with hazard
rate Λ(dti) at T = ti, i = 1, ..., I, one has
$$\mu_{ki}=E(W_{ki})=Y_{ki}\Lambda(dt_i),\qquad\partial\mu_{ki}/\partial\Lambda(dt_i)=Y_{ki},$$
and
$$\operatorname{cov}(W_{ki},W_{kj})=\begin{cases}\mu_{ki}(1-\mu_{ki})&\text{if } i=j,\\ 0&\text{otherwise.}\end{cases}$$
where x(t) = {x1(t), ..., xp(t)} is a modeled covariate p-vector comprised of
data-analyst-defined functions of Z(t) and possibly product terms between such functions
and t. This modeled regression variable, with sample paths that are continuous from the
left with limits from the right, is intended to “capture” the dependence of the
hazard rate at time t on the preceding covariate history, through the value of the
hazard ratio parameter β′ = (β1, ..., βp), where a prime (′) denotes vector transpose.
The function λ0 in (2.5) is referred to as the baseline hazard function, and λ0(t) is the
hazard rate at a reference covariate history Z0(t) for which the modeled covariate is
x(t) ≡ 0, a zero vector for all t. The hazard process model given Z is semiparametric
with the p-vector β and the nonparametric function Λ0, where $\Lambda_0(t)=\int_0^t\lambda_0(s)\,ds$, as
parameters to be estimated.
Denote by T1 , . . . , Tn the underlying failure times for a random sample of size
n from a study population followed forward in time from t = 0. Suppose that Tk is
subject to right censoring by a variate Ck , so that one observes Sk = Tk ∧Ck and non-
censoring indicator variable δk = I[Sk = Tk ]. Suppose also that covariate histories
Zk (Sk ) are recorded, k = 1, . . . , n. An independent censoring assumption requires the
hazard rate λ {t; Z(t)} in (2.5) to equal the same hazard rate, but with Ck ≥ t added to
the conditioning event. That is, independent censorship implies that the subset of in-
dividuals who are without prior failure or censoring at any follow-up time t, referred
to as the “risk set” at time t and denoted R(t), is representative of the study popu-
lation in terms of hazard rate at t given Z(t). This assumption needs to be carefully
considered in the context of specific applications, and in some settings can be relaxed
as necessary.
A semiparametric likelihood function, analogous to (2.1), can be written
$$L=\prod_{k=1}^n\left\{\lambda_0(s_k)e^{x_k(s_k)\beta}\right\}^{\delta_k}\exp\left\{-\int_0^{s_k}e^{x_k(u)\beta}\lambda_0(u)\,du\right\},$$
where (sk , δk ) and Zk (sk ), k = 1, . . . , n are the observed data in the study sample.
Several approaches have been considered for dealing with the nonparametric as-
pect of this model, including partial likelihood (Cox, 1972, 1975), marginal likeli-
hood (Kalbfleisch & Prentice, 1973), and approximate likelihood (Breslow, 1974)
methods.
The Breslow (1974) approach begins by noting that the above likelihood can be
maximized by placing all failure probability within the risk region of the data, de-
fined by R = {t; sk ≥ t for some k ∈ (1, . . . , n)}, on the observed uncensored failure
times t1 < t2 < · · · < tI in the sample. This implies that censored sk values can be
replaced by censored values at the immediately preceding uncensored failure time in
the sample without diminishing the likelihood. Now if one approximates the base-
line rate function by λ0 (t) = λi whenever t ∈ (ti−1 ,ti ], i = 1, . . . , I one can write the
likelihood, following the censored data shifting just mentioned, as
$$L=\prod_{i=1}^I\left[\prod_{k\in D(t_i)}\lambda_ie^{x_k(t_i)\beta}\prod_{k\in R(t_i)}\exp\left\{-e^{x_k(t_i)\beta}\lambda_i\Delta t_i\right\}\right] \tag{2.6}$$
where D(ti ) denotes the set of di individuals having uncensored failures at T = ti , and
∆ti = ti − ti−1, i = 1, ..., I, with t0 = 0. In this form L can be recognized as a
parametric likelihood, to which standard likelihood methods can be applied.
Specifically, one can solve the equations ∂ log L/∂λi = 0, i = 1, ..., I, explicitly,
giving
$$\hat\lambda_i(\beta)=d_i\Big/\Big\{\Delta t_i\sum_{k\in R(t_i)}e^{x_k(t_i)\beta}\Big\},\quad i=1,\dots,I,$$
which can be inserted into (2.6) to give the profile likelihood
$$L(\beta)=\prod_{i=1}^I\left[\prod_{k\in D(t_i)}e^{x_k(t_i)\beta}\Big/\Big\{\sum_{k\in R(t_i)}e^{x_k(t_i)\beta}\Big\}^{d_i}\right], \tag{2.7}$$
is such that $n^{1/2}\{\hat\Lambda_0(\cdot)-\Lambda_0(\cdot)\}$ converges jointly with $n^{1/2}(\hat\beta-\beta)$ to a zero mean
Gaussian process over a time period [0, τ], where τ is in the support of the follow-up
times (S values). The estimator β̂ has also been shown to be semiparametric efficient
under (2.5) in the special case of time-independent covariates.
The Cox likelihood can be derived similarly by setting the overall sample empirical
hazard rates di/ri equal to the model-based average $\sum_{\ell\in R(t_i)}\lambda_0(t_i)e^{x_\ell(t_i)\beta}/r_i$ and
solving for λ0 (ti ) at each i = 1, . . . , I. Plugging these values into the semiparametric
likelihood gives (2.7).
A mean parameter estimating equation development can also be considered: As
above, set
$$W_{ki}=\begin{cases}1&S_k=t_i\text{ and }\delta_k=1\\ 0&\text{otherwise}\end{cases},\qquad Y_{ki}=\begin{cases}1&S_k\ge t_i\\ 0&\text{otherwise}\end{cases},\qquad\text{for all }(k,i).$$
Under a model of the form (2.5) one has $\mu_{ki}=E(W_{ki})=Y_{ki}e^{\alpha_i+x_k(t_i)\beta}$, where
$\alpha_i=\log\Lambda_0(dt_i)$, the Wki, i = 1, ..., I, are uncorrelated for each k, and the mean parameter
estimating equations (Appendix A, A.5) for (α1, ..., αI) solve
$$\sum_{k=1}^n(W_{ki}-\mu_{ki})=0,\quad i=1,\dots,I.$$
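Solving the αi equations explicitly shows the link with the Breslow estimator. Since Σk Wki = di, the ith equation has the closed-form solution below; substituting it into a β equation of the form Σk Σi xk(ti)′(Wki − μki) = 0 (an assumed form, consistent with the expression derived in the text) reproduces the Cox score:

$$
\sum_{k=1}^n W_{ki}=\sum_{k=1}^n Y_{ki}e^{\alpha_i+x_k(t_i)\beta}
\;\Longrightarrow\;
e^{\hat\alpha_i}=\hat\Lambda_0(dt_i)=\frac{d_i}{\sum_{k=1}^nY_{ki}e^{x_k(t_i)\beta}},
$$

$$
\sum_{k=1}^n\sum_{i=1}^I x_k(t_i)'(W_{ki}-\hat\mu_{ki})
=\sum_{k=1}^n\sum_{i=1}^I W_{ki}x_k(t_i)'-\sum_{i=1}^I d_i\,\frac{\sum_{k=1}^nY_{ki}x_k(t_i)'e^{x_k(t_i)\beta}}{\sum_{k=1}^nY_{ki}e^{x_k(t_i)\beta}}.
$$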
which can be inserted into the estimating equation for β, giving
$$\sum_{k=1}^n\sum_{i=1}^I W_{ki}x_k(t_i)'-\sum_{i=1}^I d_i\left\{\frac{\sum_{k=1}^nY_{ki}x_k(t_i)'e^{x_k(t_i)\beta}}{\sum_{k=1}^nY_{ki}e^{x_k(t_i)\beta}}\right\}=0,$$
yielding (2.8) and (2.10). Also the model-based variance estimator, using the notation of Appendix A, is
$$\left(\sum_{k=1}^n\sum_{i=1}^IY_{ki}\hat D_{ki}'\hat V_{ki}^{-1}\hat D_{ki}\right)^{-1},$$
where ˆ denotes evaluation at (α̂1, ..., α̂I, β̂) solving these equations, which equals
$I(\hat\beta)^{-1}$ from (2.9).
It follows that maximum likelihood, empirical hazard plug-in, and mean param-
eter estimating functions again agree for this rather general regression estimation
problem. Generalizations of these approaches will be considered in later chapters
to manage nonparametric aspects of models specified for multivariate failure time
regression estimation.
The asymptotic developments mentioned above assume absolutely continuous
failure times, so that technically we should have di = 1, i = 1, . . . , I. However (β̂ , Λ̂0 )
as described above can tolerate some tied failure times without incurring appreciable
asymptotic bias. A rule of thumb may be that the number of ties di should not be
more than a few percent (e.g., 5%) of the size, ri , of the corresponding risk set at
uncensored failure times. Nearly all available computer software for the Cox model
allows tied failure times, and applies the expressions given above, even though more
sophisticated approximations have been proposed for handling tied failure times.
at time t, and does not depend on the choice of baseline covariate history Z0 .
Also, it is worth commenting that the time-varying feature of (2.5) can be quite
powerful. This feature allows hazard ratios for a specific covariate to vary in a user-
defined fashion as a function of follow-up time, and it allows hazard rates to be
defined that condition on stochastic covariates that are recorded during study follow-
up.
From (2.10) one can specify
$$\hat F\{t;Z(t)\}=\prod_{t_i\le t}\left[1-d_i\exp\{x(t_i)\hat\beta\}\Big/\sum_{k\in R(t_i)}e^{x_k(t_i)\hat\beta}\right],$$
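For time-independent covariates (an assumption made here for simplicity), the estimator above needs only β̂, the risk sets and the failure counts. The sketch below takes β̂ as an input, so it can come from any Cox fit, and evaluates F̂{t; Z(t)} at the distinct failure times for a fixed covariate vector x0 (a hypothetical argument name). With β̂ = 0 it reduces to the Kaplan–Meier estimator.

```python
import numpy as np

def cox_survivor(s, delta, X, beta_hat, x0):
    """Covariate-specific survivor estimate at the distinct failure times.

    Time-independent covariates are assumed; x0 is the covariate vector of interest.
    """
    s = np.asarray(s, float)
    delta = np.asarray(delta, int)
    X = np.asarray(X, float).reshape(len(s), -1)
    beta_hat = np.asarray(beta_hat, float).reshape(-1)
    w = np.exp(X @ beta_hat)                                    # exp{x_k beta-hat}
    rel = np.exp(np.asarray(x0, float).reshape(-1) @ beta_hat)  # exp{x0 beta-hat}
    times = np.unique(s[delta == 1])
    factors = []
    for t in times:
        d = np.sum((s == t) & (delta == 1))
        # One factor of the product; it is not truncated at 0, so for large x0 beta-hat
        # it can go negative, exactly as the displayed formula allows
        factors.append(1.0 - d * rel / w[s >= t].sum())
    return times, np.cumprod(factors)

times, F_x0 = cox_survivor([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1],
                           [0, 1, 0, 1, 1, 0], beta_hat=[0.0], x0=[1])
```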
by introducing a weight function h, where h(t) can depend on failure and censoring
information prior to t, in order to provide, for example, greater sensitivity to hazard
rate differences early versus late in the follow-up period, or vice versa. A
corresponding variance estimator under β = 0 is given by
$$V(0)=\sum_{i=1}^I h(t_i)^2V_i,$$
where $(V_i)_{jj}=r_{ij}(r_i-r_{ij})d_i(r_i-d_i)r_i^{-2}(r_i-1)^{-1}$, j = 1, ..., p, and
$(V_i)_{jk}=-r_{ij}r_{ik}d_i(r_i-d_i)r_i^{-2}(r_i-1)^{-1}$, j ≠ k, where, for example, rij is the size of the risk set
in the jth sample at ti. Tests of the form (2.14) are known as weighted logrank tests.
In addition to h(t) ≡ 1, another commonly used test specifies h(t) = F̂(t), with F̂
a survivor function estimator under the null hypothesis. This generalized Wilcoxon
test applies greater weight to early failures, compared to the logrank test. Weight
functions h that may depend on failure, but not censoring, distribution estimates are
preferable for test interpretation.
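The weighted logrank computation can be sketched as follows. Since the statistic (2.14) itself is not reproduced here, the sketch assumes the standard form U = Σi h(ti){dij − di rij/ri}, j = 1, ..., p, over the p = K − 1 non-reference samples, compared with V(0) through U′V(0)⁻¹U; the covariance Vi is the one given above. Setting weight="km" uses the pooled Kaplan–Meier estimate just before ti, one way of choosing a generalized Wilcoxon weight that depends only on information prior to ti. The function name and arguments are illustrative.

```python
import numpy as np

def weighted_logrank(s, delta, group, weight=None):
    """K-sample weighted logrank statistic U' V(0)^{-1} U (about chi-square, K - 1 df, under H0)."""
    s = np.asarray(s, float)
    delta = np.asarray(delta, int)
    group = np.asarray(group)
    others = np.unique(group)[1:]            # the first sorted label is the reference sample
    p = len(others)
    U, V = np.zeros(p), np.zeros((p, p))
    km = 1.0                                 # pooled Kaplan-Meier just before t_i
    for t in np.unique(s[delta == 1]):
        risk = s >= t
        fail = risk & (s == t) & (delta == 1)
        r, d = risk.sum(), fail.sum()
        rj = np.array([(risk & (group == g)).sum() for g in others], float)
        dj = np.array([(fail & (group == g)).sum() for g in others], float)
        h = km if weight == "km" else 1.0    # h(t) = 1 gives the ordinary logrank test
        U += h * (dj - d * rj / r)
        if r > 1:                            # V_i is 0 when a single subject is at risk
            c = d * (r - d) / (r ** 2 * (r - 1))
            V += h ** 2 * c * (np.diag(rj) * r - np.outer(rj, rj))
        km *= 1.0 - d / r
    return U, V, float(U @ np.linalg.solve(V, U))

U, V, chi2 = weighted_logrank([1, 3, 2, 4], [1, 1, 1, 1], [0, 0, 1, 1])
```

For this four-subject example one can check by hand that U = −2/3 and V(0) = 13/18, so the statistic is 8/13.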
HOGMANAY, HOGMENAY, s.
1. The last day of the year, S.
2. The entertainment given to a visitor on this day; or a gift
conferred on those who apply for it, S.
J. Nicol.
The origin is quite uncertain.
To HOY, v. a.
1. To incite, a term used as to dogs, S.
Burns.
2. To chase or drive away.
Lyndsay.
Isl. ho-a, greges convocare vel agere.
HOYES, s.
1. A term used in public proclamations, calling attention, S.
Skene.
O. Fr. oyez, hear ye.
2. Used as equivalent to hue, in the phrase hue and cry.
Stat. Rob. I.
To HOIST, v. n. To cough.
V. Host.
To HOLL, v. n. To excavate, S.
A. S. hol-ian, id.
Holl, Howe, adj.
1. Hollow, deep; how, S.
Palice Hon.
2. Concave.
Douglas.
3. Giving a hollow sound, S.
Burns.
Isl. hol-ur, cavus, concavus.
Holl, s. Hold of a ship.
Wallace.
HOLM, HOWM, s. The level low ground on the banks of a river, S.,
hoam, S. B.
Isl. hwam-r, a little valley.
Wyntown.
HOLT, s. A wood; as in E.
HOLT, s.
1. High and barren ground.
Douglas.
Isl. hollt, terra aspera et sterilis.
2. A very small hay cock, or a small quantity of manure before it is
spread, Dumfr.
Statist. Acc.
HOO, s. Delay.
V. Hove.
Wallace.
HOO, s. Cap.
V. How.
HOP, HOPE, s. A sloping hollow between two hills, or the hollow that
forms two ridges between one hill, South of S.
Wallace.
Celt. hope, petite vallée entre des montagnes.
HORRING, s. Abhorrence.
Buchanan.
HORSE, s. A faucet, S. B.
HORSE-COUPER, s. A horse-dealer, S.
Colvil.
HOSE-NET, s.
1. A small net, affixed to a pole, resembling a stocking, S.
2. In a hose-net, in an entanglement, S.
R. Bruce.
To HOST, HOIST, v. n.
1. To cough, S.
Henrysone.
2. Metaph. to belch up; applied to the effusions of grief or
displeasure.
Doug.
3. To hem, S.
A. S. hweost-an, Su. G. host-a, id.
Host, Hoast, Hoist, s.
1. A single act of coughing, S.
Dunbar.
2. A settled cough, S.
K. Hart.
3. A hem, S.
4. Denoting what is attended with no difficulty or hesitation. It did
na cost him a host, S.
Ross.
A. S. hweost, Belg. hoest, id.
HOSTA, interj. Expressing surprise, and perhaps hesitation, Ang.
Shirrefs.
Moes. G. haus-jan, audire.
To HOSTAY, v. a. To besiege.
Fr. hostoyer, id.
Wyntown.
To HOUD, v. n.
1. To wriggle, S.
2. To move by succussation, Loth.
Houd, s. The act of wriggling, S. B.
To HOVE, v. n.
1. To swell, S.
Hogg.
2. To rise, to ascend.
Polwart.
Dan. hov-er, to swell.
HOUFF, s. A haunt.
V. Hoif.
To Houff, v. n. To take shelter, S.
HOUFFIT, part. Heaved.
K. Hart.
HOURIS, s. pl.
1. Matins.
Bellenden.
2. Metaph. the chanting of birds.
Dunbar.
Fr. heures, a book of prayers for certain hours.
HOW, s.
1. A coif or hood. S. B. pron. hoo.
Kelly.
Belg. huyve, Dan. hue, id.
2. A chaplet.
Douglas.
A. S. hufe, tiara.
3. Sely how, also happy how, a membrane on the head, with which
some children are born; pron. hoo, S. B.
Ruddiman.
HOW, HOU, HOO, s. A piece of wood, which joins the couple-wings
together at the top, on which rests the roof-tree of a thatched
house, S.
Ramsay.
Su. G. huf, summitas tecti.
HOW, s. A hoe, S.
Fr. houe.
Barbour.
HOW, HOU, s.
1. The sound made by the owl.
Fr. hu-er, to hoot.
Doug.
2. A sea cheer.
Complaynt S.
To HUD, v. n. To hide.
V. Hod.
Leg. St Androis.
HUD-PYKE, s. A miser.
Dunbar.
Su. G. pick-hogad, qui avide desiderat.
HUKEBANE, s. Huckle-bone, S. B.
Su. G. Isl. huk-a, inclinare se.
Dunbar.
HUM, s. A sham, S.
Su. G. hum, an uncertain rumour.
HUMDRUM, s. Dejection, S. B.
Ross.
Isl. humm-a, admurmurare, and drom-a, tarde et lente
gradi.
HUMLOIK, s. Hemlock.
Lyndsay.
HUMMEL, s. A drone.
Dunbar.
Germ. hummel, fucus.
HUMSTRUM, s. A pet.
Gl. Shirr.
Hum, as in hum-drum, and strum, q. v.
HUND, s.
1. A dog, S.
Dunbar.
Moes. G. hunds, A. S. hund, canis.
2. An avaricious person, S.
Teut. hond, homo avarus.
HUNE, s. Delay.
V. Hone.
Dunbar.
To HUR, v. n. To snarl.
Muses Thren.
Lat. hirr-ire, id.
HURCHEON, s. A hedgehog, S.