1. Moderators: DR. A. S. DORLE
Mrs. K.R. KULKARNI
Presenter: DR. VIJAYLAKSHMI
2. Introduction
What is Survival Analysis?
Survival data, survival time
Censoring
Survival curve
Life table analysis
Kaplan-Meier method
3. Five Approaches To Expressing Prognosis
1. Case-fatality rate
2. 5-year survival
3. Observed survival
4. Median survival time
5. Relative survival
4. Case-fatality is calculated as the number of deaths from a specific
disease during a specific time period divided by the number of
cases of the disease, usually expressed per 100 (as a percentage).
If a person has the disease, what is the likelihood that he or
she will die of the disease?
Denominator for case-fatality is the number of people who
have the disease.
Ideally suited to diseases that are short-term, acute conditions.
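The definition above can be sketched as a short calculation. The function name and the outbreak figures are hypothetical, chosen only to illustrate the arithmetic.

```python
def case_fatality_rate(deaths: int, cases: int) -> float:
    """Deaths from the disease divided by cases of the disease, per 100 cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases


# Hypothetical outbreak: 12 deaths among 300 people who have the disease.
cfr = case_fatality_rate(deaths=12, cases=300)
print(f"Case-fatality rate: {cfr:.1f}%")
```

Note that the denominator is the number of people with the disease, not the whole population.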
5. This term is frequently used particularly in evaluating
treatments for cancer.
It is the percentage of patients who are alive 5 years after
treatment begins or 5 years after diagnosis.
It is a proportion.
Most deaths from cancer occur during this period after
diagnosis, so 5-year survival has been used as an index of
success in cancer treatment.
7. Another approach is to use the actual observed survival
over time. For this purpose, we use a life table.
8. It is defined as the length of time that half of the study
population survives.
Median survival offers two advantages over mean survival:
a. It is less affected by extremes.
b. To calculate mean survival, we would have to observe all of the deaths
in the study population; to calculate median survival, we need to
observe the deaths of only half of the group.
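A small illustration of the first advantage, using made-up survival times in which one patient is an extreme long-term survivor. The data are hypothetical and assume every death has been observed.

```python
from statistics import mean, median

# Hypothetical survival times in months; the last value is an extreme.
survival_months = [3, 5, 6, 8, 9, 11, 120]

mean_survival = mean(survival_months)      # pulled upward by the single extreme value
median_survival = median(survival_months)  # the middle observation, unaffected by it

print(f"mean = {mean_survival:.1f} months, median = {median_survival} months")
```

Here the median is already fixed once the fourth of the seven deaths has been observed, whereas the mean cannot be computed until the last patient has died.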
9. For any group of people with a disease, we want to compare their
survival to the survival we would expect in this age group even if
they did not have the disease. This is known as the relative survival.
Relative survival is thus defined as the ratio of the observed survival
to the expected survival:
$$\text{Relative survival} = \frac{\text{Observed survival in people with the disease}}{\text{Expected survival if the disease were absent}}$$
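As a rough sketch, the ratio can be computed directly. The survival proportions below are hypothetical and the function name is an assumption for illustration.

```python
def relative_survival(observed: float, expected: float) -> float:
    """Observed survival divided by the survival expected without the disease."""
    if not 0.0 < expected <= 1.0:
        raise ValueError("expected survival must be in (0, 1]")
    if not 0.0 <= observed <= 1.0:
        raise ValueError("observed survival must be in [0, 1]")
    return observed / expected


# Hypothetical: 60% observed 5-year survival in patients,
# 80% expected 5-year survival in people of the same age without the disease.
rs = relative_survival(observed=0.60, expected=0.80)
print(f"Relative survival: {rs:.2f}")
```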
11. Survival distribution is often skewed, or far from being normal.
Thus there is a need for new statistical techniques.
Survival analysis requires that the dependent (outcome) variable be
dichotomous (e.g., survival/death, success/failure).
Methods are parametric if the distribution of survival times is known to be
normal,
and nonparametric if the distribution is unknown.
These standard methods assume that the survival times of all the
subjects are exact and known, but some survival times are not.
12. Survival analysis is a collection of statistical methods that are used to
describe, explain, or predict the occurrence and timing of events.
Other names for survival analysis include
Event history analysis,
Failure time analysis,
Hazard analysis,
Transition analysis,
Duration analysis.
13. To estimate time-to-event for a group of individuals,
such as time until second heart attack for a group of MI
patients. (Median test)
To compare time-to-event between two or more
groups, such as treated vs. placebo MI patients in a
randomized controlled trial. (Log-rank test)
To assess the relationship of covariates to time-to-event,
e.g., do weight, insulin resistance, or cholesterol influence
survival time of MI patients? (Cox regression)
Note: expected time-to-event = 1/incidence rate
Objectives of survival analysis
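The note that expected time-to-event equals 1/incidence rate can be sketched as follows. This relationship assumes a constant event rate over time; the event count and person-time are hypothetical.

```python
def expected_time_to_event(events: int, person_time: float) -> float:
    """Return 1 / incidence rate, where incidence rate = events / person_time.

    Assumes the event rate is constant over the follow-up period.
    """
    if events <= 0 or person_time <= 0:
        raise ValueError("events and person_time must be positive")
    incidence_rate = events / person_time
    return 1.0 / incidence_rate


# Hypothetical: 25 second heart attacks observed over 500 person-years.
years = expected_time_to_event(events=25, person_time=500.0)
print(f"Expected time to event: {years:.1f} years")
```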
14. Mortality studies ( death is event)
Remission of symptoms ( therapy trials)
Disease surveillance ( when did it occur)
Quality Control & Reliability in Manufacturing (e.g.,
The amount of force needed to damage a part such
that it is not useable)
15. Survival analysis requires that each individual be observed
over some defined interval of time
If events occurred during that interval, their times are
recorded.
Time is usually measured in discrete units (e.g., years,
months).
In theory, time can be measured in continuous units
(e.g., hours, minutes, seconds).
16. Survival time can be defined broadly as the time to the occurrence of a
given event.
This event can be the
development of a disease,
response to a treatment or
death.
Therefore, survival time can be tumor-free time, the time from the start of
treatment to response, length of remission, and time to death.
17. Survival data can include
survival time
response to a given treatment
patient characteristics related to response and survival
development of a disease.
It focuses on predicting the probability of response, survival, or
mean lifetime, comparing the survival distributions of human
patients
Survival data
18. Outcome variable: time until an event occurs
Time: years, months, weeks, or days
Event: death, disease, relapse, recovery
[Figure: timeline running from start of follow-up to the event; axis: time]
19. Time ≡ survival time
It gives the time that an individual has survived over some follow up period
Event ≡ failure
The event of interest usually is death, disease or any other individual
experience.
Failure can also be a positive event (e.g., recovery).
20. The lifetime of electronic devices, components, or
systems (reliability engineering)
Felons' time to parole (criminology)
Duration of first marriage (sociology)
Length of newspaper or magazine subscription
(marketing)
Workers' compensation claims (insurance) and their
various influencing risk or prognostic factors.
21. 1) Leukemia patients/time in remission (weeks)
Event: Going out of remission
Outcome: Time in weeks until a person goes out of remission
2) Disease-free cohort/time until heart disease (years)
Event: Developing heart disease
Outcome: time in years until a person develops heart disease.
22. Some patients may still be alive or disease-free at the end of the
study period. The exact survival times of these subjects are
unknown. These are called censored observations or censored times.
Censoring can also occur when people are lost to follow-up after a period of
study.
When there are no censored observations, the set of survival times is
complete.
There are three types of censoring.
23. 1) A person does not experience the event before the study
ends.
2) A person is lost to follow-up during the study
period.
3) A person withdraws from the study because
of death (if death is not the event of interest) or
some other reason.
25. Right censoring
Left censoring
Interval censoring
Right Censoring is also categorised by
1. Fixed study length/ Type I censoring
2. Fixed number of events/ Type II censoring
3. Random entry to study/ Type III censoring
•The plus indicates a censored observation
26. Subjects are observed for a
fixed period of time, say six
months
Survival times are recorded for the
animals that died (exact or
uncensored observations).
F died accidentally.
The survival data are 10, 15,
30+, 25, 30+ and 19+ weeks.
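One way to hold such data in code is as (time, event observed) pairs, reading a trailing plus as a censored time. This is a minimal sketch; the helper name `parse_survival` is hypothetical.

```python
from typing import List, Tuple


def parse_survival(value: str) -> Tuple[float, bool]:
    """Return (time, event_observed); a trailing '+' marks a censored time."""
    value = value.strip()
    censored = value.endswith("+")
    return float(value.rstrip("+")), not censored


# The six survival times (weeks) from the fixed-study-length example.
raw = ["10", "15", "30+", "25", "30+", "19+"]
data: List[Tuple[float, bool]] = [parse_survival(v) for v in raw]

n_events = sum(1 for _, observed in data if observed)
n_censored = len(data) - n_events
print(f"{n_events} exact times, {n_censored} censored times")
```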
27. An alternative is to wait until a fixed portion
of the animals have died,
after which the surviving
animals are sacrificed.
Ex: the study ends when four of the six rats have
developed tumors.
The survival data are 10, 15, 35+,
25, 35, and 19+ weeks.
28. The period of study is fixed and
patients enter the study at
different times during that
period.
Hence the censored times are
also different.
The respective remission times
of the six patients are 4, 4+, 6,
8+, 3, and 3+ months.
29. Type I and type II censored observations are also called
singly censored data
Type III censoring is called progressively censored data.
Another name is random censoring.
30. Left censoring is the least common type of censoring.
It happens when an event is known to have occurred before some
particular time, but the exact time is unknown.
A 50-year-old participant was found to have developed
retinopathy, but there is no record of the exact time at which initial
evidence was found. Diagnosis is at 50 years.
We do not know the origin time
31. Interval censoring means that an individual is
known to have had an event between two points in time (a and
b), but the exact time is unknown.
Ex: if medical records indicate that at age 45 the patient in
the example above did not have retinopathy, his age at
diagnosis is between 45 and 50 years.
32. The survivorship function, denoted by S(t), is defined as the probability that an
individual survives longer than t.
Let T denote the survival time. Then
S(t) = P(an individual survives longer than t)
= P(T > t)
Berkson (1942) recommended a graphic presentation of S(t). The
graph of S(t) is called the survival curve.
33. A steep survival curve, as in
Fig 2.1a, represents low
survival rate or short
survival time.
A gradual or flat survival
curve such as in Fig2.1b
represents high survival
rate or longer survival.
35. The survival curve is used to find the 50th percentile (the median)
and other percentiles (e.g., 25th and 75th) of survival time, and
to compare survival distributions of two or more groups.
If there are no censored observations,
$$S(t) = \frac{\text{number of patients surviving longer than } t}{\text{total number of patients}}$$
36. Example: consider the following set of survival data: 4, 6, 6+, 10+, 15, 20.
S(5) = 5/6 = 0.833.
We cannot obtain S(11), since the exact number of patients surviving longer
than 11 is unknown.
When censored observations are present, nonparametric methods are used.
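The example can be sketched in code. The function below (a hypothetical helper, not a standard estimator) returns the simple proportion surviving longer than t, and returns `None` whenever a subject was censored at or before t, because that subject's status at t is unknown.

```python
from typing import List, Optional, Tuple

# (time, event_observed); False marks a censored time such as 6+ or 10+.
data: List[Tuple[float, bool]] = [
    (4, True), (6, True), (6, False), (10, False), (15, True), (20, True),
]


def simple_survival(obs: List[Tuple[float, bool]], t: float) -> Optional[float]:
    """Proportion surviving longer than t, or None if censoring makes it unknown."""
    if any(time <= t and not observed for time, observed in obs):
        return None  # someone was censored at or before t
    return sum(1 for time, _ in obs if time > t) / len(obs)


s5 = simple_survival(data, 5)    # five of the six patients survive longer than 5
s11 = simple_survival(data, 11)  # 6+ and 10+ were censored before 11
print(s5, s11)
```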
37. Let S1(t) and S2(t) be the survival functions of the two
groups.
The null hypothesis is
H0: S1(t) =S2(t), for all t > 0
The alternative hypothesis is:
H1: S1(t) ≠ S2(t), for some t > 0
40. SPSS, SAS, S-Plus and many other statistical software
packages have the capability of analyzing survival data
The log-rank test can be used to compare two survival curves.
A p-value of less than 0.05 based on the log-rank test
indicates a statistically significant difference between the two survival curves.
43. To compare survival curves, a log-rank test creates 2x2
tables at each event time and combines across the tables.
It provides a χ² statistic with 1 degree of freedom (for a
two-sample comparison) and a p-value.
The usual procedure for hypothesis testing applies.
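The 2x2-table procedure can be sketched in plain Python for two groups. This is an illustrative implementation under the usual hypergeometric variance, not the code of any particular package; the remission times are hypothetical.

```python
import math
from typing import List, Tuple

Obs = Tuple[float, bool, int]  # (time, event_observed, group 0 or 1)


def logrank_test(data: List[Obs]) -> Tuple[float, float]:
    """Two-sample log-rank test; returns (chi-square with 1 df, p-value)."""
    event_times = sorted({t for t, observed, _ in data if observed})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # One 2x2 table per event time: group by (event at t / still at risk).
        n = sum(1 for x, _, _ in data if x >= t)
        n1 = sum(1 for x, _, g in data if x >= t and g == 1)
        d = sum(1 for x, e, _ in data if x == t and e)
        d1 = sum(1 for x, e, g in data if x == t and e and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; both groups must be at risk at some event")
    chi2 = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value


# Hypothetical remission times in weeks: group 1 = treated, group 0 = placebo.
sample: List[Obs] = [
    (6, True, 1), (7, True, 1), (10, False, 1), (13, True, 1), (16, False, 1),
    (1, True, 0), (2, True, 0), (3, True, 0), (5, True, 0), (8, True, 0),
]
chi2, p = logrank_test(sample)
print(f"chi2(1) = {chi2:.2f}, p = {p:.4f}")
```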
44. H0: S1(t) = S2(t)
Time-to-event outcome, dichotomous predictor
Log-rank test
Test statistic: χ² = 4.4
p-value = 0.036
Since the p-value is less than 0.05, we reject the
null hypothesis.
We conclude that there is a significant difference in
survival time between the treated and
untreated groups.
45. . sts test group, logrank
        failure _d:  event
  analysis time _t:  weeks
Log-rank test for equality of survivor functions
group | Events observed | Events expected
0     |  7              |  3.81
1     |  6              |  9.19
Total | 13              | 13.00
chi2(1) = 4.38
Pr>chi2 = 0.0364 (p-value)
46. Hazard: the event of interest occurring.
The hazard might be death, engine breakdown, adoption of an innovation,
etc.
Hazard rate: the instantaneous rate at which the given event
occurs at a point in time among those still at risk. It can be plotted against time on the
X axis, forming a graph of the hazard rate over time.
Hazard function: the equation that describes this plotted line.
Hazard ratio: the ratio of two hazard rates, interpreted in a similar way to a relative risk.
47. The hazard function is also known as the
instantaneous failure rate,
force of mortality,
conditional mortality rate
age-specific failure rate.
48. The hazard function h(t) of survival time T gives the
conditional failure rate.
This is defined as the probability of failure during a very
small time interval, assuming that the individual has
survived to the beginning of the interval, or
as the limit of the probability that an individual fails in
a very short interval, (t, t + Δt), per unit time, given that the individual has
survived to time t:
49. $$h(t) = \lim_{\Delta t \to 0} \frac{P[\text{an individual dies in the interval } (t, t + \Delta t) \text{ given the individual has survived to } t]}{\Delta t}$$
$$h(t) = \frac{f(t)}{1 - F(t)}$$
where F(t) = P(an individual fails before time t) and f(t) is the corresponding density function.
50. In practice, the hazard function is estimated as
$$h(t) = \frac{\text{number of patients dying per unit time in the interval}}{(\text{number of patients surviving at } t) - \frac{1}{2}(\text{number of deaths in the interval})}$$
The cumulative hazard function is defined as
$$H(t) = -\log S(t)$$
Thus, at t = 0, S(t) = 1 and H(t) = 0, and at t = ∞, S(t) = 0 and H(t) = ∞.
The cumulative hazard function can take any value between zero
and infinity.
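The relationship H(t) = -log S(t) and its two limiting values can be checked numerically. This is a minimal sketch; the function name is an assumption for illustration.

```python
import math


def cumulative_hazard(survival_probability: float) -> float:
    """Return H = -log(S) for a survival probability S in [0, 1]."""
    if not 0.0 <= survival_probability <= 1.0:
        raise ValueError("S(t) must lie between 0 and 1")
    if survival_probability == 0.0:
        return math.inf  # S(t) = 0 corresponds to H(t) = infinity
    return math.log(1.0 / survival_probability)  # same as -log(S)


for s in (1.0, 0.5, 0.1, 0.0):
    print(f"S = {s:.1f}  ->  H = {cumulative_hazard(s):.3f}")
```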
51. h3(t) is the risk of healthy persons
between 18 and 40 years of age,
whose main risks of death are
accidents.
h4(t): the process of human life.
h5(t): risks that increase initially,
then decrease after treatment (e.g.,
TB).
52. As the length of observation varies from participant to
participant, person-time methods are used.
If 1 person is observed for 3 years and another person for
1 year, the total duration of observation is 4
person-years.
It is useful if risk of death/ outcome does not greatly
change over the follow up period
It is not useful if the risk of death changes with amount of
time elapsed since baseline
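The person-years example can be written out as a short calculation. The single death used to form a rate is hypothetical, added only to show how the denominator is used.

```python
# Follow-up time per participant, in years (the two people from the example).
follow_up_years = [3, 1]
person_years = sum(follow_up_years)

# Hypothetical: one death observed during that follow-up.
deaths = 1
mortality_rate = deaths / person_years  # deaths per person-year

print(f"{person_years} person-years, rate = {mortality_rate:.2f} per person-year")
```

A single rate like this treats every person-year as equivalent, which is why the method suits situations where the risk does not change much over follow-up.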
55. The life-table method is one of the oldest techniques for measuring
mortality and describing the survival experience of a population.
It has been used by actuaries, demographers, governmental agencies,
and medical researchers in
studies of survival,
population growth,
fertility,
migration,
length of married life
length of survival
56. The life tables, summarizing the mortality experience of a
specific population for a specific period of time, are called
population life tables.
Life tables constructed for patients with a given disease
who have been followed for a period of time are called clinical
life tables.
Although population and clinical life tables are similar in
calculation, the sources of required data are different.
57. Two kinds of population life tables:
1. Cohort life table
2. Current life table.
The cohort life table describes the survival or mortality
experience from birth to death of a specific cohort of
persons who were born at about the same time, for
example, all persons born in 1950.
58. The cohort has to be followed from 1950 until all of them
die.
The proportion of deaths (or survivors) is then used to
construct life tables for successive calendar years.
This type of table, useful in population projection and
prospective studies, is not often constructed since it
requires a long follow-up period.
59. The current life table is constructed by applying the age-specific
mortality rates of a population in a given period of time to a
hypothetical cohort of 100,000 or 1,000,000 persons.
The starting point is birth at year 0.
Based on the life experience of an actual population over a short
period of time
One of the most often reported statistics from current life tables is
the life expectancy.
The term population life table is often used to refer to the current
life table.
62. 1. AGE (x)
2. AGE-SPECIFIC MORTALITY RATE (qx)
3. NUMBER ALIVE AT BEGINNING OF YEAR (lx)
4. NUMBER DYING IN THE YEAR (dx)
63. PROCEDURE:
We use column 2 multiplied by column 3 to obtain column 4.
Then column 4 is subtracted from column 3 to obtain the
next row’s entry in column 3.
64. EXAMPLE:
100,000 births (row 1, column 3) have an infant mortality rate
of 46.99 per thousand (row 1, column 2), so there are 4,699 infant
deaths (row 1, column 4). This leaves 95,301 (100,000 -
4,699) to begin the second year of life (row 2, column 3).
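The procedure for columns 2 to 4 can be sketched as a loop. Only the first mortality rate (46.99 per thousand) comes from the example; the later rates are hypothetical placeholders, and rounding deaths to whole persons is an assumption of this sketch.

```python
from typing import List, Tuple


def build_life_table(q: List[float], radix: int = 100_000) -> List[Tuple[int, float, int, int]]:
    """Return rows of (age x, q_x, l_x, d_x) for a hypothetical cohort of `radix` births."""
    rows = []
    alive = radix
    for age, qx in enumerate(q):
        deaths = round(alive * qx)              # column 4 = column 2 x column 3
        rows.append((age, qx, alive, deaths))
        alive -= deaths                         # next row's column 3
    return rows


# 0.04699 is the infant mortality rate from the example; the rest are made up.
rates = [0.04699, 0.00300, 0.00200]
table = build_life_table(rates)
for age, qx, lx, dx in table:
    print(f"age {age}: q={qx:.5f}  l={lx:>7,}  d={dx:>5,}")
```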
65. If we stopped with the first four columns, we
could still find out the probability of surviving to
any given age.
e.g. in this table, we see that 90.27% of non-white
males survived to age 30.
66. Column:
5. THE NUMBER OF YEARS LIVED BY THE
POPULATION IN YEAR X (Lx)
6. THE NUMBER OF YEARS LIVED BY THE
POPULATION IN YEAR X AND IN ALL
SUBSEQUENT YEARS (Tx)
7. THE LIFE EXPECTANCY FROM THE
BEGINNING OF YEAR X (ex)
67. The total number of years lived in each year is listed in
column 5, Lx.
It is based on two sources. One source is persons who
survived the year, who are listed in column 3 of the row
below. They each contributed one year.
Each person who died during the year (column 4 of the
same row) contributed a part of year, depending on when
they died.
For most purposes, we simply assume they contributed ½ a
year.
68. The entry for column 5, Lx in this table for age 8-9 is
94,321. Where does this number come from?
1. 94,291 children survived to age 9 (column 3 of
age 9-10), contributing 94,291 years.
2. 60 children died (column 4 of age 8-9) , so they
contributed ½ year each, or 30 years.
3. 94,291 + 30 = 94,321.
69. Because deaths in year 1 are not evenly
distributed during the year (they are closer to
birth), infant deaths contribute less than ½ a
year.
Can you figure out what fraction of a year are
contributed by infant deaths (0-1) in this table?
70. 1. Lx = 96,254
2. 95,301 contributed one year
3. 96,254 - 95,301 = 953 years, which must come from infants who died 0-1
4. 4,699 infants died 0-1
5. 953/4,699 = 0.203, or about 1/5 of a year, or about 2.4 months
71. The formula is
$$L_x = \frac{n\,(l_x + l_{x+n})}{2}$$
where n = age interval.
For the open-ended interval of 75 years and over:
L75+ = D75 / M75
M75 for India is 0.1779
L75+ = 17345/0.1779 = 97499
72. The top line of Column 6, or Tx=0 , is obtained by
summing up all of the rows in column 5.
It is the total number of years of life lived by all members of
the cohort.
This number is the key calculation in life expectancy,
because, if we divide it by the number of people in the cohort,
we get the average life expectancy at birth, ex=0, which is
column 7.
73. For any year, column 6, Tx, provides the number of years
yet to be lived by the entire cohort, and
Column 7 provides the number of years yet to be lived on average by any
individual in the cohort (Tx/lx).
Thus column 7 is the final product of the life table, life
expectancy at birth, or life expectancy at any other
specified age.
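Columns 5 to 7 can be sketched as follows. The first call reproduces the age 8-9 figure quoted earlier (94,291 survivors plus 60 deaths at half a year each); the three-row cohort used for life expectancy is a made-up toy, and the function names are hypothetical.

```python
from typing import List


def years_lived(survivors_next: float, deaths: float, fraction: float = 0.5) -> float:
    """L_x: one year per survivor plus a fraction of a year per death."""
    return survivors_next + fraction * deaths


def life_expectancy(L: List[float], l: List[float]) -> List[float]:
    """e_x = T_x / l_x, where T_x is the sum of L_x from age x onward."""
    T: List[float] = []
    running = 0.0
    for Lx in reversed(L):
        running += Lx
        T.append(running)
    T.reverse()
    return [Tx / lx for Tx, lx in zip(T, l)]


print(years_lived(94_291, 60))  # the age 8-9 entry discussed above

# Toy cohort (hypothetical): 100 alive at age 0, 60 at age 1, 20 at age 2, none at age 3.
l = [100, 60, 20]
d = [40, 40, 20]
L = [years_lived(l_next, dx) for l_next, dx in zip(l[1:] + [0], d)]
e = life_expectancy(L, l)
print(L, e)
```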
74. Life expectancy at birth in the US now is 77.3
years. This means that a baby born now will live
77.3 years on average if
that baby experiences the same age-specific
mortality rates as are currently operating in the US.
75. Applied to clinical data for many decades.
Berkson and Gage (1950) and Cutler and Ederer (1958)
give a life-table method for estimating the survivorship
function;
Life-table technique uses incomplete data such as losses to
follow-up and persons withdrawn alive as well as
complete death data.
76. Two important assumptions are made in using life tables.
1. Over the period of the study, there has been no improvement
in treatment and that survivorship in one calendar year of the
study is the same as in another calendar year of the study.
2. The survival experience of people who are lost to follow-up
is the same as the experience of those who are followed up.
77. It is used to calculate the survival rates of patients during fixed
intervals, such as years
It determines the number of people surviving to the beginning
of each interval
It assumes that the individuals who were censored were
observed for only half the interval.
Mortality rate of the interval is calculated by dividing the
number of deaths in the interval by the total person years of
observations in that interval for all those who began the interval
80. P2 = probability of surviving the 2nd year = 71/(197 - 43) = 0.461
P3 = probability of surviving the 3rd year = 36/(71 - 16) = 0.655
P4 = probability of surviving the 4th year = 16/(36 - 13) = 0.696
P5 = probability of surviving the 5th year = 8/(16 - 6) = 0.800
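The arithmetic behind these conditional probabilities can be sketched using the year 2 and year 3 figures. The subtracted number is read here as the people who could not be observed through that year, which is an interpretation of the slide rather than something it states; the function name is hypothetical.

```python
def conditional_survival(alive_at_end: int, alive_at_start: int, not_observed: int) -> float:
    """Probability of surviving the year among those actually observed through it."""
    return alive_at_end / (alive_at_start - not_observed)


p2 = conditional_survival(alive_at_end=71, alive_at_start=197, not_observed=43)
p3 = conditional_survival(alive_at_end=36, alive_at_start=71, not_observed=16)

# Chance of surviving years 2 and 3, given survival to the end of year 1.
p2_and_3 = p2 * p3

print(f"P2 = {p2:.3f}, P3 = {p3:.3f}, P2 x P3 = {p2_and_3:.3f}")
```

Multiplying successive conditional probabilities in this way is what produces a cumulative survival figure.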
83. The columns are as follows:
Col(1): The interval since beginning treatment.
Col (2): The number of study subjects who were alive at
the beginning of each interval.
Col (3): The number of study subjects who died during
that interval.
Col (4): The number who “withdrew” during the interval
(lost to follow-up)
84. Column (5): The number of people who are effectively at risk of
dying during the interval.
Column (6): The proportion who died during the interval,
calculated as
(number who died during the interval, column 3) /
(number who were effectively at risk of dying during the
interval, column 5)
85. Column (7): The proportion who did not die during the
interval= 1.0 − proportion who died during the interval
(column 6).
Column (8): The proportion who survived from the point
at which they were enrolled in the study to the end of this
interval (cumulative survival)
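The eight columns can be sketched as a small function. It assumes, as stated earlier for the life-table method, that people who withdrew were observed for half the interval, so column 5 is column 2 minus half of column 4. The counts are hypothetical.

```python
from typing import Dict, List, Tuple


def clinical_life_table(n_start: int, counts: List[Tuple[int, int]]) -> List[Dict[str, float]]:
    """counts holds (died, withdrew) for each successive interval."""
    rows = []
    alive = n_start
    cumulative = 1.0
    for interval, (died, withdrew) in enumerate(counts, start=1):
        at_risk = alive - withdrew / 2      # col 5: effective number at risk
        q = died / at_risk                  # col 6: proportion who died
        p = 1.0 - q                         # col 7: proportion who did not die
        cumulative *= p                     # col 8: cumulative survival
        rows.append({"interval": interval, "alive": alive, "died": died,
                     "withdrew": withdrew, "at_risk": at_risk,
                     "q": q, "p": p, "cum_surv": cumulative})
        alive -= died + withdrew            # col 2 of the next interval
    return rows


# Hypothetical study: 100 patients; (deaths, withdrawals) in years 1 and 2.
rows = clinical_life_table(100, [(20, 10), (10, 10)])
for r in rows:
    print(f"interval {r['interval']}: at risk {r['at_risk']:.1f}, "
          f"q={r['q']:.3f}, p={r['p']:.3f}, cumulative={r['cum_surv']:.3f}")
```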
87. It has become the most commonly used approach to
survival analysis in medicine
It is usually referred to as Kaplan-Meier Life Table
Method
It is also referred to as the product-limit (PL) method because it
takes advantage of the N-year survival rate (PN) being equal
to the product of all the survival rates of the individual
intervals (e.g., p1, p2) leading up to time N.
88. PL method of estimating the survivorship function was
developed by Kaplan and Meier (1958).
This method is applicable to small, moderate, and large
samples.
If data is grouped into intervals and sample size is very large,
say in the thousands, it may be more convenient to perform a
life-table analysis.
PL estimate is based on individual survival times, whereas in
the life-table method, survival times are grouped into intervals.
89. It calculates a new line of the life table every time a new death
occurs
Because deaths occur unevenly over time, the intervals are
uneven and there are many intervals.
Hence the graph looks like uneven stair steps
A death produces an instantaneous drop in the proportion
surviving and another death free period begins
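A minimal sketch of the product-limit calculation, with one new step per distinct death time. It is applied to the small data set used earlier (4, 6, 6+, 10+, 15, 20) and treats a subject censored at the same time as a death as still at risk at that time, which is the usual convention. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def kaplan_meier(data: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Return [(death_time, S(t))]; data holds (time, event_observed) pairs."""
    curve = []
    s = 1.0
    for t in sorted({time for time, observed in data if observed}):
        at_risk = sum(1 for time, _ in data if time >= t)
        deaths = sum(1 for time, observed in data if time == t and observed)
        s *= 1 - deaths / at_risk  # multiply by this interval's survival rate
        curve.append((t, s))
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First death time at which S(t) falls to 0.5 or below, if it ever does."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


data = [(4, True), (6, True), (6, False), (10, False), (15, True), (20, True)]
curve = kaplan_meier(data)
for t, s in curve:
    print(f"t = {t:>2}: S(t) = {s:.3f}")
print("median survival:", median_survival(curve))
```

Each printed row corresponds to one drop in the stair-step graph; the curve reaches zero at the last time because the largest observation is a death rather than a censored time.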
90. The PL estimate is a nonparametric maximum likelihood estimator.
Median survival is the point in time at which S(t) is 0.5.
Mean survival is equal to the area under the survival curve.
If the largest observation is uncensored, the PL estimate at
that time equals zero.
Inspection of the Kaplan-Meier curves suggests
which of the groups had the longer survival time; whether the
difference is significant is assessed with a test such as the log-rank test.
94. •Kaplan-Meier analysis of
overall survival among 126
patients with asymptomatic,
but severe, aortic stenosis,
compared with age- and sex-
matched persons in the general
population.
•This analysis included
perioperative and postoperative
deaths among patients who
required valve replacement
during follow-up.
95. •Kaplan- Meier analysis
of event-free survival
among 25 patients with
no or mild aortic valve
calcification,
compared with 101
patients with moderate
or severe calcification.
•The vertical bars
indicate standard errors.
96. • It shows survival for
white and black
children over a 16-year
period.
•No black children
survived longer than 4
yr, but some white
children survived as
long as 11 years in this
16-year period of
observation.
97. •In whites survivorship
increased in each
successive period.
• For example, if we
examine 3-year survival
by looking at the 3-year
point on each successive
curve, the survival
improved from 8% to
25% to 58%.
98. •In blacks there was
much less improvement
in survival over time;
•The curves for the two
later 5-year periods
almost overlap.
99. Means and Median for Survival time
Since there is a lot of overlap in the confidence intervals, it is unlikely that there is much
difference in the "average" survival time.
If confidence intervals do not overlap between levels, differences in effect on time to event
can be inferred.
100. Overall comparison
This table provides overall tests of the equality of survival times across
groups. Since the significance values of the tests are all greater than 0.05,
there is no statistically significant difference between two treatments in
survival time.
101. The Cox Regression procedure is useful for modeling the time to
a specified event, based upon the values of given covariates.
One or more covariates are used to predict a status (event).
The central statistical output is the hazard ratio.
Data contain censored and uncensored cases.
Similar to logistic regression, but Cox regression assesses
relationship between survival time and covariates
102. The time-dependent covariate has a significance value less
than 0.05, which means it contributes to the model, but the
value of the coefficient is very small.
We found that the effect of age on recidivism is time-
dependent, and added a term to the model that helps to
account for that dependence.
103. Data is vulnerable to errors of one sort or another.
Potential danger comes from fitting an incorrect model.
The omission of variables (confounders) that affect the
outcome and that are also correlated with the included
variables.
104. Survival Analysis
Life table analysis
Kaplan-Meier method
Survival analysis accounts for censoring in time to event data
Log rank test: difference in survival between 2 groups
Cox proportional hazard model
More complex/powerful models available
SPSS, R, SAS, Stata