This document discusses descriptive statistics and analysis. It provides definitions of key terms like data, variable, statistic, and parameter. It also describes common measures of central tendency like mean, median and mode. Additionally, it covers measures of variability such as range, variance and standard deviation. Various graphical and numerical methods for summarizing and presenting sample data are presented, including tables, charts and distributions.
2. • Descriptive statistics were used in censuses taken by the
Babylonians and Egyptians between 4500 and 3000
B.C.
• In addition, the Roman Emperor Augustus
(27 B.C.—A.D. 17) conducted surveys on births and
deaths of the citizens of the empire, as well as the
number of livestock each owned and the crops each
citizen harvested yearly.
3. Data: the information that has been collected from an
experiment, a survey, a historical record, etc.
A variable is a characteristic or attribute that can
assume different values.
A statistic is a characteristic or measure obtained by
using the data values from a sample.
A parameter is a characteristic or measure obtained
by using all the data values from a
specific population.
4. Descriptive statistics consists of the
• Collection
• Organization
• Summarization
• Presentation of data.
5. Descriptive statistics are used to:
• Summarize, describe, and characterize the sample being
studied
• Determine whether the sample is approximately normally distributed (bell
curve); many common statistical tests assume a
normal distribution
• Determine whether the sample can be compared to the larger
population
Results are displayed as tables, charts, percentages, and frequency
distributions, and are reported as measures of central tendency
6. Measures of central tendency: the sample mean, median, and mode
Measures of Position
Measures of variability: range, variance, and
standard deviation
Exploratory Data Analysis
7. The Mean
The Mode
The Median
The Midrange
8. The mean is the sum of the values, divided by the
total number of values.
Sample Mean
The symbol $\bar{X}$ represents the sample mean:
$\bar{X} = \frac{\sum X}{n}$
9. $\sum X$ = sum of all data values
$n$ = number of data items in the sample
$N$ = number of data items in the population
The mean is sensitive to extreme scores
(outliers) in the sample.
Population Mean
For a population, the Greek letter $\mu$ (mu) is
used for the mean:
$\mu = \frac{\sum X}{N}$
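The definition of the mean, and its sensitivity to outliers, can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `mean` and the sample scores are hypothetical.

```python
def mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


scores = [10, 12, 11, 13, 14]   # hypothetical sample
with_outlier = scores + [100]   # same sample plus one extreme score

print(mean(scores))        # 12.0
print(mean(with_outlier))  # about 26.67: one outlier pulls the mean upward
```

The same function serves for a sample or a population; only the symbol changes.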
12. The median is the midpoint of the data array. The
symbol for the median is MD.
• The median is the middle value, or 50th percentile: the value of the
observation that divides the sorted data into two nearly
equal parts.
• The median is not sensitive to extreme scores.
• In a sorted data array of n values, the median is at position $\frac{n+1}{2}$.
[Figure: mode, median, and mean]
13. • When n is odd: the median is the middle observation
• When n is even: the median is the average of the values of the two
middle observations
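The odd and even cases can be sketched as follows. This is an illustrative implementation under the definition given here; the function name `median` and the sample scores are hypothetical.

```python
def median(values):
    """Return the middle value of the sorted data (average of the two
    middle values when the number of observations is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # n odd: middle observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # n even: average of the two


scores = [10, 12, 11, 13, 14]   # hypothetical sample, n = 5
with_outlier = scores + [100]   # n = 6, one extreme score added

print(median(scores))        # 12
print(median(with_outlier))  # 12.5: the extreme score barely moves the median
```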
14. The value that occurs most often in a data set is called the mode.
The mode=10
15. The midrange is defined as the sum of the lowest
and highest values in the data set,
divided by 2. The symbol MR is used for the
midrange.
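As a quick sketch of the midrange definition (the function name `midrange` and the data values are hypothetical):

```python
def midrange(values):
    """Return (lowest value + highest value) / 2."""
    if not values:
        raise ValueError("midrange requires at least one value")
    return (min(values) + max(values)) / 2


data = [2, 3, 6, 8, 4, 1]   # hypothetical data set
print(midrange(data))       # 4.5, since (1 + 8) / 2 = 4.5
```

Because it uses only the two extreme values, the midrange shifts whenever the lowest or highest value changes.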
19. The range is the highest value minus the lowest value.
The symbol R is used for the range.
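20. The variance is the average of the squares of the distance
each value is from the mean. The symbol for the
population variance is $\sigma^2$:
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
where $\mu$ is the population mean and $N$ is the population size.
The standard deviation is the square root of the variance.
The symbol for the population standard deviation is $\sigma$:
$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$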
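21. The formula for the sample variance, denoted by $s^2$,
is
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
where $\bar{X}$ is the sample mean and $n$ is the sample size.
The standard deviation of a sample (denoted by s)
is
$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$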
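The population and sample versions differ only in the divisor: the population variance divides by the number of values, while the sample variance divides by one less than that. A minimal sketch, with hypothetical function names and data:

```python
import math


def population_variance(values):
    """Average squared distance from the mean (divide by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared distances from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance requires at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [10, 60, 50, 30, 40, 20]   # hypothetical data, mean = 35

print(population_variance(data))          # 1750 / 6, about 291.67
print(sample_variance(data))              # 1750 / 5 = 350.0
print(math.sqrt(sample_variance(data)))   # sample standard deviation, about 18.71
```

Python's standard `statistics` module offers `pvariance`, `variance`, `pstdev`, and `stdev` for the same quantities.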
22. Applications of the Variance and Standard
Deviation
1. To determine the spread of the data
2. To determine the consistency of a variable
Example: in the manufacture of fittings, such as nuts and
bolts, the variation in the diameters must be small,
or the parts will not fit together.
3. To determine the number of data values that fall
within a specified interval in a distribution
23. About 68% of the values in a normal distribution are within
1 standard deviation of the mean
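The 68% figure can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later) and takes the standard normal distribution as the illustration; the variable names are hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Probability of falling between 1 standard deviation below
# and 1 standard deviation above the mean.
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

print(round(within_one_sd, 4))  # 0.6827
```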
24. The coefficient of variation, denoted by CVar, is the
standard deviation divided by the
mean. The result is expressed as a percentage.
For a sample: $CVar = \frac{s}{\bar{X}} \cdot 100\%$
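A short sketch of the coefficient of variation for a sample, using `mean` and `stdev` from Python's standard `statistics` module. The function name `coefficient_of_variation` and the data are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return stdev(values) / m * 100


data = [10, 60, 50, 30, 40, 20]   # hypothetical sample, mean = 35

print(round(coefficient_of_variation(data), 2))  # 53.45 (percent)
```

Because the result is a ratio, it allows the variability of variables measured in different units to be compared.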
26. A z score or standard score
Percentiles
Quartiles and Deciles
Outliers
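27. A z score or standard score for a value is obtained by
subtracting the mean from the
value and dividing the result by the standard deviation.
The symbol for a standard score
is z. The formula is
$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$
For a sample, $z = \frac{X - \bar{X}}{s}$; for a population, $z = \frac{X - \mu}{\sigma}$.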
28. The z score represents the number of standard
deviations that a data value falls above or below the
mean.
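As an illustration, a z score can be computed as below. The function name `z_score` and the numbers (a mean of 50 and a standard deviation of 10) are hypothetical.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / std_dev


print(z_score(65, 50, 10))  # 1.5: the value is 1.5 standard deviations above the mean
print(z_score(30, 50, 10))  # -2.0: the value is 2 standard deviations below the mean
```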
30. Quartiles divide the distribution into four groups,
separated by Q1, Q2, and Q3.
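One way to obtain the three quartiles is `statistics.quantiles` from Python's standard library (Python 3.8 or later). This is a sketch only: textbooks and software packages use slightly different conventions for locating Q1 and Q3, so the values may differ from a hand calculation. The data are hypothetical.

```python
from statistics import median, quantiles

data = [5, 6, 12, 13, 15, 18, 22, 50]   # hypothetical data set

# n=4 requests the three cut points that split the data into four groups.
q1, q2, q3 = quantiles(data, n=4)

print(q1, q2, q3)
print(q2 == median(data))  # True: Q2 is the median
```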
31. An outlier is an extremely high or an extremely low
data value when compared with the rest of the data
values.
An outlier can strongly affect the mean and standard
deviation of a variable.
Descriptive statistics consists of the collection, organization, summarization, and
presentation of data.
A data set that has only one value that occurs with the greatest frequency is said to
be unimodal.
If a data set has two values that occur with the same greatest frequency, both values
are considered to be the mode and the data set is said to be bimodal. If a data set has more
than two values that occur with the same greatest frequency, each value is used as the
mode, and the data set is said to be multimodal. When no data value occurs more than
once, the data set is said to have no mode. A data set can have more than one mode or no
mode at all. These situations will be shown in some of the examples that follow.
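The four situations can be sketched in Python as follows. The function names `modes` and `classify` and the data sets are hypothetical; the "no mode" rule follows the definition above (no value occurs more than once).

```python
from collections import Counter


def modes(values):
    """Return the values that occur with the greatest frequency,
    or an empty list when no value occurs more than once."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []   # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


def classify(values):
    """Label a data set by how many modes it has."""
    labels = {0: "no mode", 1: "unimodal", 2: "bimodal"}
    return labels.get(len(modes(values)), "multimodal")


print(modes([8, 9, 9, 14, 8, 8, 10]), classify([8, 9, 9, 14, 8, 8, 10]))  # [8] unimodal
print(modes([1, 2, 2, 3, 3, 4]), classify([1, 2, 2, 3, 3, 4]))            # [2, 3] bimodal
print(modes([1, 1, 2, 2, 3, 3, 4]), classify([1, 1, 2, 2, 3, 3, 4]))      # [1, 2, 3] multimodal
print(modes([1, 2, 3]), classify([1, 2, 3]))                              # [] no mode
```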