Chapter 1
STATISTICAL CONCEPTS AND MARKET RETURNS
Statistical methods provide a set of tools for analyzing data and drawing conclusions on
asset returns, earnings, growth rates, commodity prices, or any other financial data. In this
chapter, we will study the four properties of return distributions: a) where the returns are
centered (central tendency); b) how far returns are dispersed (dispersion); c) whether the
distribution of returns is symmetric or lopsided (skewness); and, d) whether extreme outcomes
are likely (kurtosis).
Terminology:
Population – all elements of a specific group
Parameter – quantity used to describe a population, such as the mean, range, or variance
Sample – a representation or a subset of a population
Statistic – quantity used to describe a sample
Measurement Scales
1. Nominal scale - categorizes each member of the population or sample using an integer for
each category.
2. Ordinal scale - each member of the population or sample is placed into a category and these
categories are ranked/ordered with respect to some characteristic
3. Interval scale - each member is assigned a number from a scale. It provides not only ranking
but also assurance that the differences between values are equal.
4. Ratio scale - has all the characteristics of the interval scale as well as a true
zero point as the origin.
Frequency Distribution
A frequency distribution is a tabular display of data summarized into intervals. It helps in
the analysis of large amounts of statistical data for all types of measurement scales. We use
frequency distribution to summarize rates of return, the fundamental units that analysts and
portfolio managers use for making investment decisions. To analyze rates of return, we first
compute the total return or the holding period return for time period t, Rt:
\[ R_t = \frac{P_t - P_{t-1} + D_t}{P_{t-1}} \]
where: Pt = price per share at the end of time period t
Pt – 1 = price per share at the end of time period t – 1, the time period immediately
preceding time period t
Dt = cash distributions received during time period t
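The holding period return formula can be checked with a short Python sketch. The function name and the prices below are hypothetical and chosen only for illustration.

```python
def holding_period_return(price_end, price_begin, distribution=0.0):
    """Total return for one period: (P_t - P_{t-1} + D_t) / P_{t-1}."""
    if price_begin <= 0:
        raise ValueError("beginning price must be positive")
    return (price_end - price_begin + distribution) / price_begin

# Hypothetical figures: price 50.00 at t-1, price 53.00 at t, 1.00 distributed during t.
r = holding_period_return(price_end=53.0, price_begin=50.0, distribution=1.0)
print(f"{r:.2%}")  # 8.00%
```

The return combines the price change (3.00) and the cash distribution (1.00), both measured against the beginning price.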
Construction of a Frequency Distribution
1. Calculate the range of the data.
2. Decide on the number of classes in the frequency distribution, k, usually between 5 and
15.
3. Determine the interval width as Range/k.
4. Determine class boundaries.
5. Count the number of observations falling in each interval.
6. Compute the class midpoints.
7. Construct a table of the intervals listed from smallest to largest that shows the number of
observations falling in each interval.
Relative Frequency
Relative frequency is the absolute frequency of each interval divided by the total number
of observations.
Cumulative Relative Frequency
Cumulative relative frequency adds up the relative frequencies from the first class through
the current class. It tells us the fraction of observations that are less than the upper limit
of each class interval.
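The construction steps and the two frequency measures can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name and the return data are hypothetical, and the code assumes intervals of the form [lower, upper) with the maximum value placed in the last class.

```python
def frequency_distribution(data, k):
    """Build a k-class frequency table with interval width = Range / k."""
    lo, hi = min(data), max(data)
    data_range = hi - lo                      # step 1: range
    if k < 1 or data_range == 0:
        raise ValueError("need k >= 1 and at least two distinct values")
    width = data_range / k                    # step 3: interval width
    counts = [0] * k
    for x in data:
        # Assumption: intervals are [lower, upper); the maximum goes in the last class.
        idx = min(int((x - lo) / width), k - 1)
        counts[idx] += 1                      # step 5: absolute frequency
    table, cumulative = [], 0.0
    for i, count in enumerate(counts):
        lower = lo + i * width                # step 4: class boundaries
        upper = lower + width
        rel = count / len(data)               # relative frequency
        cumulative += rel                     # cumulative relative frequency
        table.append({
            "lower": lower, "upper": upper,
            "midpoint": (lower + upper) / 2,  # step 6: class midpoint
            "count": count, "rel_freq": rel, "cum_rel_freq": cumulative,
        })
    return table

# Hypothetical monthly returns in percent (not from the text).
returns = [-4.0, -1.5, 0.5, 1.0, 2.5, 3.0, 4.5, 6.0]
table = frequency_distribution(returns, k=4)
for row in table:
    print(f"{row['lower']:6.2f} to {row['upper']:6.2f}  n={row['count']}  "
          f"rel={row['rel_freq']:.3f}  cum={row['cum_rel_freq']:.3f}")
```

The cumulative relative frequency of the last class is always 1, which is a quick check that every observation was counted.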
Graphical Presentation of Data
Histogram – a bar chart of data from a frequency distribution
Frequency Polygon – a line graph of the class midpoint (on the x-axis) and the absolute
frequency of the class interval (on the y-axis).
Cumulative Frequency graph – a line graph of the cumulative relative frequency (y-axis) and the
upper class limit (x-axis).
Measures of Position/Location
1. Measures of Central Tendency – summarize the location on which the data are centered.
a. Mean
i. Arithmetic mean – the sum of the observations divided by the number of
observations
Population mean is the arithmetic mean value of a population. For N observations,
where Xi is the ith observation, the population mean, μ, is computed as
\[ \mu = \frac{\sum_{i=1}^{N} X_i}{N} \]
Sample mean is the arithmetic mean computed for a sample. For n observations in
the sample, the sample mean, X̄, is computed as
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \]
ii. Weighted mean – weights each value in the distribution according to its importance
Weighted mean for a set of observations X1, X2, ..., Xn with corresponding weights
w1, w2, ..., wn is computed as
\[ \bar{X}_w = \sum_{i=1}^{n} w_i X_i \]
Example 1.1: An investment manager with P10,000,000 to invest allocates
P6,500,000 to equities and P3,500,000 to bonds. If the portfolio has a weight of .7 on
equities and .3 on bonds, what is the return on this portfolio?
Solution:
X̄w = 0.7(P6,500,000) + 0.3(P3,500,000) = P5,600,000
This value is the portfolio’s return on the stock and bond investments.
This example illustrates the general principle that a portfolio return (past data) is
a weighted sum. The weighted mean can also be used for forward-looking data, in which
case it is called the expected value or expected return. The weights are then
probabilities taken from forecasts.
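A small Python sketch shows the weighted mean used both ways. All inputs are hypothetical: the asset-class returns and scenario forecasts are not given in the text, and the check that the weights sum to 1 is an assumption of this sketch.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * X_i); weights are assumed to sum to 1."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * x for w, x in zip(weights, values))

# Past data: hypothetical asset-class returns (not given in the text).
asset_returns = [0.10, 0.04]          # equities, bonds
portfolio_weights = [0.7, 0.3]
portfolio_return = weighted_mean(asset_returns, portfolio_weights)

# Forward-looking data: hypothetical scenario forecasts; weights are probabilities.
scenario_returns = [-0.05, 0.06, 0.12]
probabilities = [0.2, 0.5, 0.3]
expected_return = weighted_mean(scenario_returns, probabilities)

print(f"portfolio return: {portfolio_return:.3f}")   # 0.082
print(f"expected return:  {expected_return:.3f}")    # 0.056
```

The same function serves both cases; only the interpretation of the weights changes.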
iii. Geometric mean – most frequently used to average rates of change over time or to
compute the growth rate of a variable, such as time series rates of return on an
asset or a portfolio, or the growth rate of a financial variable such as
earnings or sales.
Geometric mean, G, of a set of observations X1, X2, ..., Xn, where each Xi is
greater than or equal to 0, is computed as
\[ G = \sqrt[n]{X_1 X_2 \cdots X_n} \]
When the data are returns in a time series, we compute the geometric mean
return, RG, or compound return over the time period spanned by returns R1 through
RT, as
\[ 1 + R_G = \sqrt[T]{(1 + R_1)(1 + R_2) \cdots (1 + R_T)} \]
or
\[ R_G = \sqrt[T]{(1 + R_1)(1 + R_2) \cdots (1 + R_T)} - 1 \]
Example 1.2: Calculate the geometric mean return of the total returns of the ABC
company from 2000 to 2005: 16.2%, 20.3%, 9.8%, -11%, 1.6%, -13.5%.
RG = [(1.162)(1.203)(1.098)(0.89)(1.016)(0.865)]^(1/6) - 1
= 0.030929545 = 3.093%
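The calculation in Example 1.2 can be reproduced with a few lines of Python. The function name is hypothetical; `math.prod` requires Python 3.8 or later.

```python
import math

def geometric_mean_return(returns):
    """Compound (geometric mean) return: [prod(1 + R_t)]^(1/T) - 1."""
    if any(1.0 + r < 0 for r in returns):
        raise ValueError("each 1 + R_t must be non-negative")
    growth = math.prod(1.0 + r for r in returns)
    return growth ** (1.0 / len(returns)) - 1.0

# Total returns of the ABC company, 2000 to 2005 (Example 1.2).
abc_returns = [0.162, 0.203, 0.098, -0.11, 0.016, -0.135]
rg = geometric_mean_return(abc_returns)
arithmetic = sum(abc_returns) / len(abc_returns)

print(f"geometric mean return:  {rg:.3%}")   # 3.093%
print(f"arithmetic mean return: {arithmetic:.3%}")
```

Printing the arithmetic mean alongside shows that the geometric mean return is the smaller of the two whenever returns vary from period to period.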
iv. Harmonic mean – a special type of weighted mean in which an observation’s weight
is inversely proportional to its magnitude. It is appropriate when averaging ratios
(amount per unit). It is used in cost averaging, which involves the periodic
investment of a fixed amount of money.
Harmonic mean of a set of observations X1, X2, ..., Xn, where each Xi > 0, is
computed as
\[ \bar{X}_H = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}} \]
Example 1.3: Suppose an investor purchases $5000 of a security each month for 3
months. The share prices are $15, $10, and $12 at the three purchase dates. What
is the average price paid for the security?
Solution:
X̄H = 3 / (1/15 + 1/10 + 1/12) = 3 / (15/60) = $12, the average price paid for the security
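The Python sketch below reproduces Example 1.3 and cross-checks the harmonic mean against total money invested divided by total shares bought. The function and variable names are hypothetical.

```python
def harmonic_mean(values):
    """n divided by the sum of reciprocals; every value must be positive."""
    if any(x <= 0 for x in values):
        raise ValueError("all observations must be greater than 0")
    return len(values) / sum(1.0 / x for x in values)

prices = [15.0, 10.0, 12.0]      # share price at each purchase date (Example 1.3)
monthly_investment = 5000.0      # fixed amount invested each month

avg_price = harmonic_mean(prices)

# Cross-check: total money invested divided by total shares purchased.
shares_bought = sum(monthly_investment / p for p in prices)
avg_cost = monthly_investment * len(prices) / shares_bought

print(round(avg_price, 2))  # 12.0
print(round(avg_cost, 2))   # 12.0
```

The two figures agree because investing a fixed amount buys more shares when the price is low, which is exactly the weighting the harmonic mean applies.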
b. Median – the value of the middle item of a set of items that has been sorted into ascending or
descending order. In an odd-numbered sample of n items, the median is in the (n+1)/2 position.
In an even-numbered sample, the median is the mean of the values in the n/2 and (n+2)/2
positions.
c. Mode – most frequently occurring value in a distribution or the value or values with the
highest frequency.
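The position rules for the median and the definition of the mode translate directly into Python. This is a sketch with hypothetical function names and data; positions in the text are 1-based, so the code subtracts 1 for Python's 0-based lists.

```python
from collections import Counter

def median(values):
    """Median using the (n+1)/2 rule (odd n) or the n/2 and (n+2)/2 rule (even n)."""
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]
    return (s[n // 2 - 1] + s[(n + 2) // 2 - 1]) / 2

def modes(values):
    """All values that share the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

sample = [3, 1, 4, 1, 5, 9, 2, 6]    # hypothetical data, n = 8
print(median(sample))                # 3.5
print(median([7, 2, 5]))             # 5
print(modes(sample))                 # [1]
```

Returning a list of modes reflects the definition above: a distribution may have more than one value with the highest frequency.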
2. Quantiles – describe the location of data by identifying the values at or below which
specified proportions of the data lie. They are used in portfolio performance evaluation as well as in
investment strategy development and research.
a. Percentiles – divide the n observations of the distribution into hundredths
b. Quartiles – divide the n observations of the distribution into quarters where the divisions are
(Q1 or P25), (P50 or Md) and (Q3 or P75)
c. Quintiles – divide the n observations of the distribution into fifths where the divisions are P20,
P40, P60, and P80
d. Deciles – divide the n observations of the distribution into tenths where the divisions are P10,
P20, P30, P40, P50, P60, P70, P80, and P90
Measures of Dispersion
1. Range (R) – the difference between the maximum and minimum values in a set of observations
2. Mean Absolute Deviation (MAD) – average of the absolute deviations around the mean.
\[ MAD = \frac{\sum_{i=1}^{n} \left| X_i - \bar{X} \right|}{n} \]
3. Variance – average of the squared deviations around the mean
a. Population variance
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \]
where: Xi = ith observation
μ = population mean
b. Sample variance
\[ s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \]
where: Xi = ith observation
X̄ = sample mean
4. Standard Deviation – positive square root of the variance
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \]
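5. Semivariance – average squared deviation below the mean
\[ s_b^2 = \frac{\sum_{i=1}^{n^*} (X_i - \bar{X})^2}{n^* - 1}, \quad X_i < \bar{X} \]
where: n* = number of observations below the mean (Xi < X̄)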
6. Semideviation or semistandard deviation – positive square root of semivariance
\[ s_b = \sqrt{\frac{\sum_{i=1}^{n^*} (X_i - \bar{X})^2}{n^* - 1}}, \quad X_i < \bar{X} \]
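The dispersion measures in this section can be computed together in one Python sketch. The function name and data are hypothetical. The semivariance follows this section's formula, reading n* as the number of observations below the mean, which is an interpretation rather than something the text states explicitly.

```python
import math

def dispersion_measures(sample):
    """Range, MAD, sample variance, standard deviation, semivariance, semideviation."""
    n = len(sample)
    mean = sum(sample) / n
    below = [x for x in sample if x < mean]   # observations with X_i < mean
    if n < 2 or len(below) < 2:
        raise ValueError("need n >= 2 and at least two observations below the mean")
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    semivariance = sum((x - mean) ** 2 for x in below) / (len(below) - 1)
    return {
        "range": max(sample) - min(sample),
        "mad": sum(abs(x - mean) for x in sample) / n,
        "variance": variance,
        "std_dev": math.sqrt(variance),
        "semivariance": semivariance,
        "semideviation": math.sqrt(semivariance),
    }

# Hypothetical returns in percent (not from the text); the mean is 5.
stats = dispersion_measures([2.0, 4.0, 4.0, 5.0, 10.0])
for name, value in stats.items():
    print(f"{name:14s} {value:.4f}")
```

For this data the deviations from the mean are -3, -1, -1, 0, and 5, so the MAD is 2, the sample variance is 36/4 = 9, and the semivariance is 11/2 = 5.5.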
Chebyshev’s Inequality
The proportion of the observations within k standard deviations of the arithmetic mean is
at least 1 - 1/k^2 for all k > 1.
The table below illustrates the proportion of the observations that must lie within a
certain number of standard deviations around the sample mean.
Proportions from Chebyshev’s Inequality
k       Interval around the sample mean   Proportion
1.25    X̄ ± 1.25s                         36%
1.50    X̄ ± 1.50s                         56%
2.00    X̄ ± 2.00s                         75%
2.50    X̄ ± 2.50s                         84%
3.00    X̄ ± 3.00s                         89%
4.00    X̄ ± 4.00s                         94%
Example 1.4: The arithmetic mean and the standard deviation of monthly returns of ABC
investments were 0.95% and 6.5%, respectively, over 1950-2009, a period of 720 monthly observations.
a) Determine the interval that must contain at least 75% of monthly returns.
b) What are the minimum and maximum number of observations that must lie in the interval in
(a) ?
Solution:
a) At least 75% of the observations must lie within 2 standard deviations of the mean, since
1 - 1/2^2 = 0.75. Thus, the interval that must contain at least 75% of the observations for the
monthly return series is 0.95% ± 2(6.5%) = 0.95% ± 13%, or -12.05% to 13.95%.
b) For a sample size of 720, at least 0.75(720) = 540 observations must lie in the interval from
-12.05% to 13.95%, and at most all 720 observations can lie in it.
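Example 1.4 can be reproduced with the Python sketch below. The function names are hypothetical; the only formula used is the bound 1 - 1/k^2 and its inverse.

```python
import math

def chebyshev_min_proportion(k):
    """Minimum proportion of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's inequality is stated here for k > 1")
    return 1.0 - 1.0 / k ** 2

def k_for_proportion(p):
    """Smallest k whose Chebyshev bound equals p, from p = 1 - 1/k^2."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return 1.0 / math.sqrt(1.0 - p)

mean, std, n = 0.95, 6.5, 720        # monthly figures from Example 1.4, in percent

k = k_for_proportion(0.75)
low, high = mean - k * std, mean + k * std
min_obs = math.ceil(chebyshev_min_proportion(k) * n)

print(k)                                   # 2.0
print(f"{low:.2f}% to {high:.2f}%")        # -12.05% to 13.95%
print(min_obs)                             # 540
```

The bound holds for any distribution with a finite variance, which is why no assumption about the shape of the return distribution appears in the code.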
Coefficient of Variation (CV) – the ratio of the standard deviation of a set of observations to
their mean value. It is a measure of relative dispersion, that is, the amount of dispersion relative
to a reference value or benchmark. When the observations are returns, the coefficient of
variation measures the amount of risk (standard deviation) per unit of mean return. Thus, in
financial analysis, the greater the CV of returns, the greater the risk borne per unit of mean return.
\[ CV = \frac{s}{\bar{X}} \]
Example 1.5: The table summarizes the annual mean returns and standard deviations for
several major US asset classes from 1926-2002.
Asset Class                Arithmetic Mean Return (%)   Standard Deviation of Return (%)
S & P 500                  12.3                         21.9
US small stock             16.9                         35.1
US long-term corporate      6.1                          7.2
US long-term government     5.8                          8.2
US 30-day T-bill            3.8                          0.9
a) Determine the coefficient of variation for each asset class
b) Which asset class is most risky? least risky?
c) Determine whether there is more difference between the absolute risk (standard
deviation) or the relative risk (CV) of the S&P 500 and US small stocks.
Solution:
a) S & P 500 CV = 21.9/12.3 = 1.78
US small stock CV = 35.1/16.9 = 2.077
US long-term corporate CV = 7.2/6.1 = 1.18
US long-term government CV = 8.2/5.8 = 1.414
US 30-day T-bill CV = 0.9/3.8 = 0.237
b) US small stock is most risky while US 30-day T-bill is least risky.
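c) The standard deviation of US small stock returns is (35.1 - 21.9)/21.9 = 0.603 = 60.3% larger than
that of S&P 500 returns, whereas the CV of US small stocks is only (2.077 - 1.78)/1.78 = 0.167 = 16.7%
larger. The two asset classes therefore differ more in absolute risk than in relative risk.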
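The coefficients of variation in Example 1.5 can be recomputed from the table with a short Python sketch. The dictionary layout and variable names are hypothetical; the figures are the ones in the table.

```python
# (mean return %, standard deviation %) for each asset class, from Example 1.5.
asset_classes = {
    "S&P 500": (12.3, 21.9),
    "US small stock": (16.9, 35.1),
    "US long-term corporate": (6.1, 7.2),
    "US long-term government": (5.8, 8.2),
    "US 30-day T-bill": (3.8, 0.9),
}

# CV = standard deviation / mean.
cv = {name: sd / mean for name, (mean, sd) in asset_classes.items()}
for name, value in cv.items():
    print(f"{name:24s} CV = {value:.3f}")

most_risky = max(cv, key=cv.get)
least_risky = min(cv, key=cv.get)

# Relative gap between US small stocks and the S&P 500.
sd_gap = (35.1 - 21.9) / 21.9
cv_gap = (cv["US small stock"] - cv["S&P 500"]) / cv["S&P 500"]
print(most_risky, least_risky)
print(f"{sd_gap:.3f} {cv_gap:.3f}")
```

With unrounded CVs the relative gap in CV comes out at about 0.166; the 16.7% in the solution results from using the rounded values 2.077 and 1.78.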
Sharpe Ratio or Reward-to-Variability Ratio
Sharpe ratio is widely used for investment performance measurement to measure excess return
per unit of risk. The Sharpe ratio for a portfolio is calculated as
\[ S_h = \frac{\bar{R}_p - \bar{R}_F}{s_p} \]
where: R̄p = mean return to the portfolio, p
R̄F = mean return to a risk-free asset
sp = standard deviation of return on the portfolio
The numerator of the Sharpe ratio (called mean excess return on portfolio p) measures the
extra reward that investors receive for the added risk taken. Moreover, a portfolio’s Sharpe ratio
decreases if we increase risk, all else equal. Risk-averse investors who make decisions based
on mean return and standard deviation of return prefer portfolios with larger Sharpe ratios to
those with smaller Sharpe ratios.
Example 1.6: Using the table in Example 1.5, consider the performance of the S&P 500
and US small stocks. Taking the mean US T-bill return to represent the risk-free rate (the least
risky asset class), we find the Sharpe ratios as
S&P 500: Sh = (12.3 – 3.8)/21.9 = 0.39
US small stocks: Sh = (16.9 – 3.8)/35.1 = 0.37
US small stocks earned higher mean returns but, per unit of risk, performed slightly less well than the S&P 500.
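The two Sharpe ratios in Example 1.6 can be verified with the Python sketch below. The function name is hypothetical; the inputs are the table values from Example 1.5, in percent.

```python
def sharpe_ratio(mean_return, risk_free_return, std_dev):
    """Mean excess return per unit of standard deviation."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean_return - risk_free_return) / std_dev

risk_free = 3.8   # mean US 30-day T-bill return

sp500 = sharpe_ratio(12.3, risk_free, 21.9)
small_stock = sharpe_ratio(16.9, risk_free, 35.1)

print(f"S&P 500:         {sp500:.2f}")        # 0.39
print(f"US small stocks: {small_stock:.2f}")  # 0.37
```

Because the ratio divides by the standard deviation, the higher mean return of small stocks is more than offset by their higher risk.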
Measures of Shape – measures the degree of symmetry in return distributions.
1. Normal Distribution – a symmetrical, bell-shaped distribution in which the mean, median, and
mode are equal; it is fully described by two parameters, the mean and the variance. If a return
distribution is symmetric about the mean, equal loss and gain intervals exhibit the same frequencies,
that is, losses from -4% to -2% occur with about the same frequency as gains from 2% to 4%.
[Figure: bell-shaped normal curve; horizontal axis ticks at -3, -2, -1, 1, 2, 3]
2. Skewed distribution – a distribution that is not symmetric about its mean. Sample skewness,
SK, is computed by the following formula:
\[ S_K = \left[ \frac{n}{(n-1)(n-2)} \right] \frac{\sum_{i=1}^{n} (X_i - \bar{X})^3}{s^3} \]
where: n = the number of observations in the sample
s = the sample standard deviation
a. Positively skewed distribution
A return distribution with positive skew has frequent small losses and a few extreme
gains. The mode is less than the median which is less than the mean.
SK > 0
b. Negatively skewed distribution
A return distribution with negative skew has frequent small gains and a few extreme
losses. The mean is less than the median which is less than the mode.
SK < 0
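The sample skewness formula can be sketched in Python as follows. The function name and the data are hypothetical; s is the sample standard deviation with divisor n - 1, as defined earlier in the chapter.

```python
import math

def sample_skewness(sample):
    """[n / ((n-1)(n-2))] * sum((X_i - mean)^3) / s^3, with s the sample std. dev."""
    n = len(sample)
    if n < 3:
        raise ValueError("skewness needs at least 3 observations")
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all observations are equal")
    cubed = sum((x - mean) ** 3 for x in sample)
    return n / ((n - 1) * (n - 2)) * cubed / s ** 3

# Hypothetical returns in percent: one large gain pulls the right tail out.
right_tailed = [2.0, 4.0, 4.0, 5.0, 10.0]
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]

sk_right = sample_skewness(right_tailed)
sk_sym = sample_skewness(symmetric)
print(f"{sk_right:.4f}")   # 1.4815
print(f"{sk_sym:.4f}")     # 0.0000
```

For the first sample the mean is 5, s is 3, and the cubed deviations sum to 96, so SK = (5/12)(96/27), a positive value consistent with a few extreme gains.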
3. Kurtosis – measures the peakedness of a distribution and provides information about the
probability of extreme outcomes. A return distribution may differ from a normal distribution by having
more returns clustered closely around the mean and more returns with large deviations from the
mean (fatter tails). Investors would perceive a greater chance of extremely large
deviations from the mean as increasing risk. Excess kurtosis, KE, is kurtosis minus 3, the kurtosis
of a normal distribution, and is calculated by the following formulas:
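\[ K_E = \left[ \frac{n(n+1)}{(n-1)(n-2)(n-3)} \cdot \frac{\sum_{i=1}^{n} (X_i - \bar{X})^4}{s^4} \right] - \frac{3(n-1)^2}{(n-2)(n-3)}, \quad n < 100 \]
\[ K_E = \frac{1}{n} \cdot \frac{\sum_{i=1}^{n} (X_i - \bar{X})^4}{s^4} - 3, \quad n \ge 100 \]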
a. Leptokurtic (B) – a distribution that is more peaked than the normal distribution, K > 3 or
KE > 0.
b. Mesokurtic (A) – a distribution identical to a normal distribution, K = 3 or KE = 0
c. Platykurtic (C) – a distribution that is less peaked than the normal distribution, K < 3 or
KE<0.
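Both excess kurtosis formulas can be combined in one Python sketch that switches on the sample size. The function name and data are hypothetical; s^2 is the sample variance with divisor n - 1.

```python
def excess_kurtosis(sample):
    """Excess kurtosis: small-sample formula for n < 100, simpler formula otherwise."""
    n = len(sample)
    if n < 4:
        raise ValueError("the small-sample formula needs at least 4 observations")
    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)   # sample variance
    if s2 == 0:
        raise ValueError("kurtosis is undefined when all observations are equal")
    fourth = sum((x - mean) ** 4 for x in sample)
    ratio = fourth / s2 ** 2                              # sum of 4th powers over s^4
    if n < 100:
        scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
        correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
        return scale * ratio - correction
    return ratio / n - 3

# Hypothetical returns in percent (not from the text).
ke = excess_kurtosis([2.0, 4.0, 4.0, 5.0, 10.0])
print(f"{ke:.4f}")   # 2.9259

if ke > 0:
    print("leptokurtic")
elif ke < 0:
    print("platykurtic")
else:
    print("mesokurtic")
```

For this sample the fourth-power deviations sum to 708 and s^4 is 81, giving 1.25(708/81) - 8, a positive excess kurtosis.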
Exercise 1.1 Name: __________________________
Year & Sec: _____________________
Score: ____________
State the type of scale used to measure the following sets of data:
1. sales
2. investment style of mutual funds
3. Analyst’s rating of a stock in a portfolio as underweight, market weight, or overweight
4. a measure of the risk of portfolios on a scale of 1 (very conservative) to 5 (very risky).
5. credit ratings for bond issues
6. cash dividends per share
7. hedge fund classification
8. bond maturity in years
Exercise 1.2 Name: __________________________
Year & Sec: _____________________
Score: ____________
The table below gives the deviations of a hypothetical portfolio’s annual total returns (gross of
fees) from its benchmark’s annual returns, for a 12-year period.
Portfolio’s Deviations from Benchmark Return
Year    Deviation from benchmark (%)
1992    -7.14
1993     1.62
1994     2.48
1995    -2.59
1996     9.37
1997    -0.55
1998    -0.89
1999    -9.19
2000    -5.11
2001    -0.49
2002     6.84
2003     3.04
a. Make a frequency distribution for the portfolio’s deviations from benchmark return using
k = 6.
b. Calculate the frequency, cumulative frequency, relative frequency, and cumulative
relative frequency for the portfolio’s deviations from benchmark return.
c. Construct a histogram using the data.
d. Identify the modal interval of the grouped data.
Exercise 1.3 Name: __________________________
Year & Sec: _____________________
Score: ____________
Use the table from Exercise 1.2, which gives the deviations of a hypothetical portfolio’s annual
total returns (gross of fees) from its benchmark’s annual returns for the 12-year period 1992-2003.
a. Calculate the sample mean return.
b. Calculate the median return.
c. Calculate the geometric mean.
d. Calculate the P25, P40, P80.
e. Determine the range, MAD, variance, and standard deviation
f. Determine the semivariance and semideviation.
Exercise 1.4 Name: __________________________
Year & Sec: _____________________
Score: ____________
Use the table from Exercise 1.2, which gives the deviations of a hypothetical portfolio’s annual
total returns (gross of fees) from its benchmark’s annual returns for the 12-year period 1992-2003.
a. Calculate the skewness
b. Calculate the excess kurtosis.
Exercise 1.5 Name: __________________________
Year & Sec: _____________________
Score: ____________
An analyst has estimated the following parameters for the annual returns distributions for four
portfolios:
Portfolio   Mean Return   Variance of Returns   Skewness   Kurtosis
A           10%           625                    1.8        0
B           14%           900                    0.0        3
C           16%           1250                  -0.85       5
D           19%           2000                   1.4        2
The analyst has been asked to evaluate the portfolios’ risk and return characteristics. Assume
that a risk-free investment will earn 5%.
a. Which portfolio would be preferred based on the Sharpe performance measure?
b. Which portfolio would be the most preferred based on the coefficient of variation?
c. Which portfolio/s is/are symmetric?
d. Which portfolio/s has/have fatter tails than a normal distribution?
e. Which portfolio is the riskiest based on its skewness?
f. Which portfolio is the riskiest based on its kurtosis?
g. Which portfolio will likely be considered more risky when judged by its semivariance
rather than by its variance?