SlideShare une entreprise Scribd logo
1  sur  39
Télécharger pour lire hors ligne
Data Exploration and Visualization with R 
Yanchang Zhao 
http://www.RDataMining.com 
30 September 2014 
1 / 39
Outline 
Introduction 
Have a Look at Data 
Explore Individual Variables 
Explore Multiple Variables 
More Explorations 
Save Charts to Files 
Further Readings and Online Resources 
2 / 39
Data Exploration and Visualization with R 1 
Data Exploration and Visualization 
I Summary and stats 
I Various charts like pie charts and histograms 
I Exploration of multiple variables 
I Level plot, contour plot and 3D plot 
I Saving charts into
les of various formats 
1Chapter 3: Data Exploration, in book R and Data Mining: Examples and 
Case Studies. http://www.rdatamining.com/docs/RDataMining.pdf 
3 / 39
Outline 
Introduction 
Have a Look at Data 
Explore Individual Variables 
Explore Multiple Variables 
More Explorations 
Save Charts to Files 
Further Readings and Online Resources 
4 / 39
Size and Structure of Data 
dim(iris) 
## [1] 150 5 
names(iris) 
## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Wid... 
## [5] "Species" 
str(iris) 
## 'data.frame': 150 obs. of 5 variables: 
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ... 
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1... 
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1... 
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0... 
## $ Species : Factor w/ 3 levels "setosa","versicolor",.... 
5 / 39
Attributes of Data 
attributes(iris) 
## $names 
## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Wid... 
## [5] "Species" 
## 
## $row.names 
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 ... 
## [16] 16 17 18 19 20 21 22 23 24 25 26 27 28 ... 
## [31] 31 32 33 34 35 36 37 38 39 40 41 42 43 ... 
## [46] 46 47 48 49 50 51 52 53 54 55 56 57 58 ... 
## [61] 61 62 63 64 65 66 67 68 69 70 71 72 73 ... 
## [76] 76 77 78 79 80 81 82 83 84 85 86 87 88 ... 
## [91] 91 92 93 94 95 96 97 98 99 100 101 102 103 1... 
## [106] 106 107 108 109 110 111 112 113 114 115 116 117 118 1... 
## [121] 121 122 123 124 125 126 127 128 129 130 131 132 133 1... 
## [136] 136 137 138 139 140 141 142 143 144 145 146 147 148 1... 
## 
## $class 
## [1] "data.frame" 
6 / 39
First Rows of Data 
iris[1:3, ] 
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species 
## 1 5.1 3.5 1.4 0.2 setosa 
## 2 4.9 3.0 1.4 0.2 setosa 
## 3 4.7 3.2 1.3 0.2 setosa 
head(iris, 3) 
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species 
## 1 5.1 3.5 1.4 0.2 setosa 
## 2 4.9 3.0 1.4 0.2 setosa 
## 3 4.7 3.2 1.3 0.2 setosa 
tail(iris, 3) 
## Sepal.Length Sepal.Width Petal.Length Petal.Width Spe... 
## 148 6.5 3.0 5.2 2.0 virgi... 
## 149 6.2 3.4 5.4 2.3 virgi... 
## 150 5.9 3.0 5.1 1.8 virgi... 
7 / 39
A Single Column 
The
rst 10 values of Sepal.Length 
iris[1:10, "Sepal.Length"] 
## [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 
iris$Sepal.Length[1:10] 
## [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 
8 / 39
Outline 
Introduction 
Have a Look at Data 
Explore Individual Variables 
Explore Multiple Variables 
More Explorations 
Save Charts to Files 
Further Readings and Online Resources 
9 / 39
Summary of Data 
Function summary() 
I numeric variables: minimum, maximum, mean, median, and 
the
rst (25%) and third (75%) quartiles 
I categorical variables (factors): frequency of every level 
summary(iris) 
## Sepal.Length Sepal.Width Petal.Length Petal.Width 
## Min. :4.30 Min. :2.00 Min. :1.00 Min. :0.1 
## 1st Qu.:5.10 1st Qu.:2.80 1st Qu.:1.60 1st Qu.:0.3 
## Median :5.80 Median :3.00 Median :4.35 Median :1.3 
## Mean :5.84 Mean :3.06 Mean :3.76 Mean :1.2 
## 3rd Qu.:6.40 3rd Qu.:3.30 3rd Qu.:5.10 3rd Qu.:1.8 
## Max. :7.90 Max. :4.40 Max. :6.90 Max. :2.5 
## Species 
## setosa :50 
## versicolor:50 
## virginica :50 
## 
## 
## 
10 / 39
library(Hmisc) 
describe(iris[, c(1, 5)]) # check columns 1 & 5 
## iris[, c(1, 5)] 
## 
## 2 Variables 150 Observations 
## -----------------------------------------------------------... 
## Sepal.Length 
## n missing unique Info Mean .05 .10 ... 
## 150 0 35 1 5.843 4.600 4.800 5... 
## .50 .75 .90 .95 
## 5.800 6.400 6.900 7.255 
## 
## lowest : 4.3 4.4 4.5 4.6 4.7, highest: 7.3 7.4 7.6 7.7 7.9 
## -----------------------------------------------------------... 
## Species 
## n missing unique 
## 150 0 3 
## 
## setosa (50, 33%), versicolor (50, 33%) 
## virginica (50, 33%) 
## -----------------------------------------------------------... 
11 / 39
Mean, Median, Range and Quartiles 
I Mean, median and range: mean(), median(), range() 
I Quartiles and percentiles: quantile() 
range(iris$Sepal.Length) 
## [1] 4.3 7.9 
quantile(iris$Sepal.Length) 
## 0% 25% 50% 75% 100% 
## 4.3 5.1 5.8 6.4 7.9 
quantile(iris$Sepal.Length, c(0.1, 0.3, 0.65)) 
## 10% 30% 65% 
## 4.80 5.27 6.20 
12 / 39
Variance and Histogram 
var(iris$Sepal.Length) 
## [1] 0.6857 
hist(iris$Sepal.Length) 
Histogram of iris$Sepal.Length 
iris$Sepal.Length 
Frequency 
4 5 6 7 8 
0 5 10 15 20 25 30 
13 / 39
Density 
plot(density(iris$Sepal.Length)) 
4 5 6 7 8 
0.0 0.1 0.2 0.3 0.4 
density.default(x = iris$Sepal.Length) 
N = 150 Bandwidth = 0.2736 
Density 
14 / 39
Pie Chart 
Frequency of factors: table() 
table(iris$Species) 
## 
## setosa versicolor virginica 
## 50 50 50 
pie(table(iris$Species)) 
setosa 
versicolor 
virginica 15 / 39
Bar Chart 
barplot(table(iris$Species)) 
setosa versicolor virginica 
0 10 20 30 40 50 
16 / 39
Outline 
Introduction 
Have a Look at Data 
Explore Individual Variables 
Explore Multiple Variables 
More Explorations 
Save Charts to Files 
Further Readings and Online Resources 
17 / 39
Correlation 
Covariance and correlation: cov() and cor() 
cov(iris$Sepal.Length, iris$Petal.Length) 
## [1] 1.274 
cor(iris$Sepal.Length, iris$Petal.Length) 
## [1] 0.8718 
cov(iris[, 1:4]) 
## Sepal.Length Sepal.Width Petal.Length Petal.Width 
## Sepal.Length 0.68569 -0.04243 1.2743 0.5163 
## Sepal.Width -0.04243 0.18998 -0.3297 -0.1216 
## Petal.Length 1.27432 -0.32966 3.1163 1.2956 
## Petal.Width 0.51627 -0.12164 1.2956 0.5810 
# cor(iris[,1:4]) 
18 / 39
Aggreation 
Stats of Sepal.Length for every Species with aggregate() 
aggregate(Sepal.Length ~ Species, summary, data = iris) 
## Species Sepal.Length.Min. Sepal.Length.1st Qu. 
## 1 setosa 4.30 4.80 
## 2 versicolor 4.90 5.60 
## 3 virginica 4.90 6.22 
## Sepal.Length.Median Sepal.Length.Mean Sepal.Length.3rd Qu. 
## 1 5.00 5.01 5.20 
## 2 5.90 5.94 6.30 
## 3 6.50 6.59 6.90 
## Sepal.Length.Max. 
## 1 5.80 
## 2 7.00 
## 3 7.90 
19 / 39
Boxplot 
I The bar in the middle is median. 
I The box shows the interquartile range (IQR), i.e., range 
between the 75% and 25% observation. 
boxplot(Sepal.Length ~ Species, data = iris) 
setosa versicolor virginica 
4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 
20 / 39
Scatter Plot 
with(iris, plot(Sepal.Length, Sepal.Width, col = Species, 
pch = as.numeric(Species))) 
4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 
2.0 2.5 3.0 3.5 4.0 
Sepal.Length 
Sepal.Width 
21 / 39
Scatter Plot with Jitter 
Function jitter(): add a small amount of noise to the data 
plot(jitter(iris$Sepal.Length), jitter(iris$Sepal.Width)) 
4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 
2.0 2.5 3.0 3.5 4.0 
jitter(iris$Sepal.Length) 
jitter(iris$Sepal.Width) 
22 / 39
A Matrix of Scatter Plots 
pairs(iris) 
Sepal.Length 
2.0 3.0 4.0 0.5 1.5 2.5 
4.5 5.5 6.5 7.5 
2.0 3.0 4.0 
Sepal.Width 
Petal.Length 
1 2 3 4 5 6 7 
0.5 1.5 2.5 
Petal.Width 
4.5 5.5 6.5 7.5 1 2 3 4 5 6 7 1.0 2.0 3.0 
1.0 2.0 3.0 
Species 
23 / 39
Outline 
Introduction 
Have a Look at Data 
Explore Individual Variables 
Explore Multiple Variables 
More Explorations 
Save Charts to Files 
Further Readings and Online Resources 
24 / 39
3D Scatter plot 
library(scatterplot3d) 
scatterplot3d(iris$Petal.Width, iris$Sepal.Length, iris$Sepal.Width) 
0.0 0.5 1.0 1.5 2.0 2.5 
2.0 2.5 3.0 3.5 4.0 4.5 
4 
5 
6 
7 
8 
iris$Petal.Width 
iris$Sepal.Length 
iris$Sepal.Width 
25 / 39
Interactive 3D Scatter Plot 
Package rgl supports interactive 3D scatter plot with plot3d(). 
library(rgl) 
plot3d(iris$Petal.Width, iris$Sepal.Length, iris$Sepal.Width) 
26 / 39
Heat Map 
Calculate the similarity between dierent 
owers in the iris data 
with dist() and then plot it with a heat map 
dist.matrix - as.matrix(dist(iris[, 1:4])) 
heatmap(dist.matrix) 
2423 194 3439143426 376 438 1346 1455169 2312 2245 2447 1337 3479 1212 2407 2316 3305 1308451 1502 2408289 118 110169 112323 111318 110108 113306 110236 110414 112415 6919 5948 6805 8812 6833 6938 6700 5904 18057 5667 6722 8919 9967 19050 5726 5667 5559 6889 7958 7869 7924 16049 110357 112415 114426 111403 110348 111167 112499 111335 111325 111418 5738 5817 7847 114507 112344 112278 17319 17230 111224 110423 
2423 194 3439 143 246 376 348 1346 1455 169 2312 2245 2447 1337 3479 1212 2407 2316 3305 1308 451 1502 2408 289 118 110169 112323 111318 110108 113306 110236 110414 112415 6919 5948 6805 8812 6833 6938 6700 5904 18057 5667 6722 8919 9967 19050 5726 5667 5559 6889 7958 7869 7924 16049 110357 112415 114426 111403 110348 111167 112499 111335 111325 111418 5738 5817 7847 114507 112344 112278 17139 17230 111224 110423 
27 / 39
Level Plot 
Function rainbow() creates a vector of contiguous colors. 
library(lattice) 
levelplot(Petal.Width ~ Sepal.Length * Sepal.Width, iris, cuts = 9, 
col.regions = rainbow(10)[10:1]) 
Sepal.Length 
Sepal.Width 
4.0 
3.5 
3.0 
2.5 
2.0 
5 6 7 
2.5 
2.0 
1.5 
1.0 
0.5 
0.0 
28 / 39
Contour 
contour() and filled.contour() in package graphics 
contourplot() in package lattice 
filled.contour(volcano, color = terrain.colors, asp = 1, plot.axes = contour(volcano, 
add = T)) 
180 
160 
140 
120 
100 
100 
100 
100 
110 
110 
110 
110 
130 
120 
140 
150 
160 
170 
160 
170 
180 
180 
190 
29 / 39
3D Surface 
persp(volcano, theta = 25, phi = 30, expand = 0.5, col = lightblue) 
volcano 
Y 
Z 
30 / 39
Parallel Coordinates 
library(MASS) 
parcoord(iris[1:4], col = iris$Species) 
Sepal.Length Sepal.Width Petal.Length Petal.Width 
31 / 39
Parallel Coordinates with Package lattice 
library(lattice) 
parallelplot(~iris[1:4] | Species, data = iris) 
Petal.Width 
Petal.Length 
Sepal.Width 
Petal.Width 
Petal.Length 
Sepal.Width 
Sepal.Length 
setosa versicolor 
Min Max 
Sepal.Length 
virginica 
32 / 39
Visualization with Package ggplot2 
library(ggplot2) 
qplot(Sepal.Length, Sepal.Width, data = iris, facets = Species ~ .) 
4.5 
4.0 
3.5 
3.0 
2.5 
2.0 
4.5 
4.0 
3.5 
3.0 
2.5 
2.0 
4.5 
4.0 
3.5 
3.0 
2.5 
2.0 
setosa versicolor virginica 
5 6 7 8 
Sepal.Length 
Sepal.Width 
33 / 39
Outline 
Introduction 
Have a Look at Data 
Explore Individual Variables 
Explore Multiple Variables 
More Explorations 
Save Charts to Files 
Further Readings and Online Resources 
34 / 39
Save Charts to Files 
I Save charts to PDF and PS
les: pdf() and postscript() 
I BMP, JPEG, PNG and TIFF

Contenu connexe

Tendances

Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...Edureka!
 
1.7 data reduction
1.7 data reduction1.7 data reduction
1.7 data reductionKrish_ver2
 
lazy learners and other classication methods
lazy learners and other classication methodslazy learners and other classication methods
lazy learners and other classication methodsrajshreemuthiah
 
1.2 steps and functionalities
1.2 steps and functionalities1.2 steps and functionalities
1.2 steps and functionalitiesKrish_ver2
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysisVishwas N
 
Introduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisIntroduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisJaclyn Kokx
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data miningDataminingTools Inc
 
The Data Science Process
The Data Science ProcessThe Data Science Process
The Data Science ProcessVishal Patel
 
CART – Classification & Regression Trees
CART – Classification & Regression TreesCART – Classification & Regression Trees
CART – Classification & Regression TreesHemant Chetwani
 
Classification and prediction in data mining
Classification and prediction in data miningClassification and prediction in data mining
Classification and prediction in data miningEr. Nawaraj Bhandari
 
R Programming: Importing Data In R
R Programming: Importing Data In RR Programming: Importing Data In R
R Programming: Importing Data In RRsquared Academy
 
12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMSkoolkampus
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning Gopal Sakarkar
 
Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualizationDr. Hamdan Al-Sabri
 
Breadth First Search & Depth First Search
Breadth First Search & Depth First SearchBreadth First Search & Depth First Search
Breadth First Search & Depth First SearchKevin Jadiya
 

Tendances (20)

Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
 
1.7 data reduction
1.7 data reduction1.7 data reduction
1.7 data reduction
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
lazy learners and other classication methods
lazy learners and other classication methodslazy learners and other classication methods
lazy learners and other classication methods
 
1.2 steps and functionalities
1.2 steps and functionalities1.2 steps and functionalities
1.2 steps and functionalities
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysis
 
Introduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisIntroduction to Linear Discriminant Analysis
Introduction to Linear Discriminant Analysis
 
Metadata ppt
Metadata pptMetadata ppt
Metadata ppt
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data mining
 
The Data Science Process
The Data Science ProcessThe Data Science Process
The Data Science Process
 
CART – Classification & Regression Trees
CART – Classification & Regression TreesCART – Classification & Regression Trees
CART – Classification & Regression Trees
 
Classification and prediction in data mining
Classification and prediction in data miningClassification and prediction in data mining
Classification and prediction in data mining
 
R Programming: Importing Data In R
R Programming: Importing Data In RR Programming: Importing Data In R
R Programming: Importing Data In R
 
12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
Data Mining
Data MiningData Mining
Data Mining
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualization
 
Breadth First Search & Depth First Search
Breadth First Search & Depth First SearchBreadth First Search & Depth First Search
Breadth First Search & Depth First Search
 

En vedette

Regression and Classification with R
Regression and Classification with RRegression and Classification with R
Regression and Classification with RYanchang Zhao
 
R Reference Card for Data Mining
R Reference Card for Data MiningR Reference Card for Data Mining
R Reference Card for Data MiningYanchang Zhao
 
Association Rule Mining with R
Association Rule Mining with RAssociation Rule Mining with R
Association Rule Mining with RYanchang Zhao
 
Time Series Analysis and Mining with R
Time Series Analysis and Mining with RTime Series Analysis and Mining with R
Time Series Analysis and Mining with RYanchang Zhao
 
Data Clustering with R
Data Clustering with RData Clustering with R
Data Clustering with RYanchang Zhao
 
Text Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataText Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataYanchang Zhao
 
Cheat sheets for data scientists
Cheat sheets for data scientistsCheat sheets for data scientists
Cheat sheets for data scientistsAjay Ohri
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RYanchang Zhao
 
An Introduction to Data Mining with R
An Introduction to Data Mining with RAn Introduction to Data Mining with R
An Introduction to Data Mining with RYanchang Zhao
 
脱rainbow():RColorBrewerとcolorRampPalette()で、地図描画
脱rainbow():RColorBrewerとcolorRampPalette()で、地図描画脱rainbow():RColorBrewerとcolorRampPalette()で、地図描画
脱rainbow():RColorBrewerとcolorRampPalette()で、地図描画Takehisa Yamakita
 
Time series-mining-slides
Time series-mining-slidesTime series-mining-slides
Time series-mining-slidesYanchang Zhao
 
Scikit-learn Cheatsheet-Python
Scikit-learn Cheatsheet-PythonScikit-learn Cheatsheet-Python
Scikit-learn Cheatsheet-PythonDr. Volkan OBAN
 
Statistical Test
Statistical TestStatistical Test
Statistical Testguestdbf093
 
A+ cheat sheet
A+ cheat sheetA+ cheat sheet
A+ cheat sheetabnmi
 
Naive Bayes Example using R
Naive Bayes Example using  R Naive Bayes Example using  R
Naive Bayes Example using R Dr. Volkan OBAN
 
Python Pandas for Data Science cheatsheet
Python Pandas for Data Science cheatsheet Python Pandas for Data Science cheatsheet
Python Pandas for Data Science cheatsheet Dr. Volkan OBAN
 

En vedette (20)

Regression and Classification with R
Regression and Classification with RRegression and Classification with R
Regression and Classification with R
 
R Reference Card for Data Mining
R Reference Card for Data MiningR Reference Card for Data Mining
R Reference Card for Data Mining
 
Association Rule Mining with R
Association Rule Mining with RAssociation Rule Mining with R
Association Rule Mining with R
 
Time Series Analysis and Mining with R
Time Series Analysis and Mining with RTime Series Analysis and Mining with R
Time Series Analysis and Mining with R
 
Data Clustering with R
Data Clustering with RData Clustering with R
Data Clustering with R
 
Text Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataText Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter Data
 
Cheat sheets for data scientists
Cheat sheets for data scientistsCheat sheets for data scientists
Cheat sheets for data scientists
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in R
 
An Introduction to Data Mining with R
An Introduction to Data Mining with RAn Introduction to Data Mining with R
An Introduction to Data Mining with R
 
脱rainbow():RColorBrewerとcolorRampPalette()で、地図描画
脱rainbow():RColorBrewerとcolorRampPalette()で、地図描画脱rainbow():RColorBrewerとcolorRampPalette()で、地図描画
脱rainbow():RColorBrewerとcolorRampPalette()で、地図描画
 
Time series-mining-slides
Time series-mining-slidesTime series-mining-slides
Time series-mining-slides
 
Follow up SPARK
Follow up SPARKFollow up SPARK
Follow up SPARK
 
Scikit-learn Cheatsheet-Python
Scikit-learn Cheatsheet-PythonScikit-learn Cheatsheet-Python
Scikit-learn Cheatsheet-Python
 
Statistical Test
Statistical TestStatistical Test
Statistical Test
 
A+ cheat sheet
A+ cheat sheetA+ cheat sheet
A+ cheat sheet
 
Linux cheat-sheet
Linux cheat-sheetLinux cheat-sheet
Linux cheat-sheet
 
Naive Bayes Example using R
Naive Bayes Example using  R Naive Bayes Example using  R
Naive Bayes Example using R
 
Python
PythonPython
Python
 
Python Pandas for Data Science cheatsheet
Python Pandas for Data Science cheatsheet Python Pandas for Data Science cheatsheet
Python Pandas for Data Science cheatsheet
 
Advanced R cheat sheet
Advanced R cheat sheetAdvanced R cheat sheet
Advanced R cheat sheet
 

Similaire à Data Exploration and Visualization with R

RDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisationRDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisationYanchang Zhao
 
01_introduction_lab.pdf
01_introduction_lab.pdf01_introduction_lab.pdf
01_introduction_lab.pdfzehiwot hone
 
Data manipulation and visualization in r 20190711 myanmarucsy
Data manipulation and visualization in r 20190711 myanmarucsyData manipulation and visualization in r 20190711 myanmarucsy
Data manipulation and visualization in r 20190711 myanmarucsySmartHinJ
 
Writing Readable Code with Pipes
Writing Readable Code with PipesWriting Readable Code with Pipes
Writing Readable Code with PipesRsquared Academy
 
Table of Useful R commands.
Table of Useful R commands.Table of Useful R commands.
Table of Useful R commands.Dr. Volkan OBAN
 
関東第3回ゼロはじめるからR言語勉強会ー グラフ
関東第3回ゼロはじめるからR言語勉強会ー グラフ関東第3回ゼロはじめるからR言語勉強会ー グラフ
関東第3回ゼロはじめるからR言語勉強会ー グラフPaweł Rusin
 
PLOTCON NYC: Behind Every Great Plot There's a Great Deal of Wrangling
PLOTCON NYC: Behind Every Great Plot There's a Great Deal of WranglingPLOTCON NYC: Behind Every Great Plot There's a Great Deal of Wrangling
PLOTCON NYC: Behind Every Great Plot There's a Great Deal of WranglingPlotly
 
Data manipulation on r
Data manipulation on rData manipulation on r
Data manipulation on rAbhik Seal
 
R interview questions
R interview questionsR interview questions
R interview questionsAjay Tech
 
Cloudera - A Taste of random decision forests
Cloudera - A Taste of random decision forestsCloudera - A Taste of random decision forests
Cloudera - A Taste of random decision forestsDataconomy Media
 
Data Science for Folks Without (or With!) a Ph.D.
Data Science for Folks Without (or With!) a Ph.D.Data Science for Folks Without (or With!) a Ph.D.
Data Science for Folks Without (or With!) a Ph.D.Douglas Starnes
 
Creating a Custom Serialization Format (Gophercon 2017)
Creating a Custom Serialization Format (Gophercon 2017)Creating a Custom Serialization Format (Gophercon 2017)
Creating a Custom Serialization Format (Gophercon 2017)Scott Mansfield
 
Chapter3_Visualizations2.pdf
Chapter3_Visualizations2.pdfChapter3_Visualizations2.pdf
Chapter3_Visualizations2.pdfMekiyaShigute1
 
한국어와 NLTK, Gensim의 만남
한국어와 NLTK, Gensim의 만남한국어와 NLTK, Gensim의 만남
한국어와 NLTK, Gensim의 만남Eunjeong (Lucy) Park
 
Introduction to R
Introduction to RIntroduction to R
Introduction to RStacy Irwin
 
A Simple Introduction to R for Market Researchers
A Simple Introduction to R for Market ResearchersA Simple Introduction to R for Market Researchers
A Simple Introduction to R for Market ResearchersRay Poynter
 
Deep Learning for JavaScript Folks Without (or With!) a Ph.D.
Deep Learning for JavaScript Folks Without (or With!) a Ph.D.Deep Learning for JavaScript Folks Without (or With!) a Ph.D.
Deep Learning for JavaScript Folks Without (or With!) a Ph.D.Douglas Starnes
 

Similaire à Data Exploration and Visualization with R (20)

RDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisationRDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisation
 
R part iii
R part iiiR part iii
R part iii
 
01_introduction_lab.pdf
01_introduction_lab.pdf01_introduction_lab.pdf
01_introduction_lab.pdf
 
Graphics in R
Graphics in RGraphics in R
Graphics in R
 
Data manipulation and visualization in r 20190711 myanmarucsy
Data manipulation and visualization in r 20190711 myanmarucsyData manipulation and visualization in r 20190711 myanmarucsy
Data manipulation and visualization in r 20190711 myanmarucsy
 
Writing Readable Code with Pipes
Writing Readable Code with PipesWriting Readable Code with Pipes
Writing Readable Code with Pipes
 
Table of Useful R commands.
Table of Useful R commands.Table of Useful R commands.
Table of Useful R commands.
 
関東第3回ゼロはじめるからR言語勉強会ー グラフ
関東第3回ゼロはじめるからR言語勉強会ー グラフ関東第3回ゼロはじめるからR言語勉強会ー グラフ
関東第3回ゼロはじめるからR言語勉強会ー グラフ
 
Introduction to tibbles
Introduction to tibblesIntroduction to tibbles
Introduction to tibbles
 
PLOTCON NYC: Behind Every Great Plot There's a Great Deal of Wrangling
PLOTCON NYC: Behind Every Great Plot There's a Great Deal of WranglingPLOTCON NYC: Behind Every Great Plot There's a Great Deal of Wrangling
PLOTCON NYC: Behind Every Great Plot There's a Great Deal of Wrangling
 
Data manipulation on r
Data manipulation on rData manipulation on r
Data manipulation on r
 
R interview questions
R interview questionsR interview questions
R interview questions
 
Cloudera - A Taste of random decision forests
Cloudera - A Taste of random decision forestsCloudera - A Taste of random decision forests
Cloudera - A Taste of random decision forests
 
Data Science for Folks Without (or With!) a Ph.D.
Data Science for Folks Without (or With!) a Ph.D.Data Science for Folks Without (or With!) a Ph.D.
Data Science for Folks Without (or With!) a Ph.D.
 
Creating a Custom Serialization Format (Gophercon 2017)
Creating a Custom Serialization Format (Gophercon 2017)Creating a Custom Serialization Format (Gophercon 2017)
Creating a Custom Serialization Format (Gophercon 2017)
 
Chapter3_Visualizations2.pdf
Chapter3_Visualizations2.pdfChapter3_Visualizations2.pdf
Chapter3_Visualizations2.pdf
 
한국어와 NLTK, Gensim의 만남
한국어와 NLTK, Gensim의 만남한국어와 NLTK, Gensim의 만남
한국어와 NLTK, Gensim의 만남
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
A Simple Introduction to R for Market Researchers
A Simple Introduction to R for Market ResearchersA Simple Introduction to R for Market Researchers
A Simple Introduction to R for Market Researchers
 
Deep Learning for JavaScript Folks Without (or With!) a Ph.D.
Deep Learning for JavaScript Folks Without (or With!) a Ph.D.Deep Learning for JavaScript Folks Without (or With!) a Ph.D.
Deep Learning for JavaScript Folks Without (or With!) a Ph.D.
 

Plus de Yanchang Zhao

RDataMining slides-time-series-analysis
RDataMining slides-time-series-analysisRDataMining slides-time-series-analysis
RDataMining slides-time-series-analysisYanchang Zhao
 
RDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rRDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rYanchang Zhao
 
RDataMining slides-regression-classification
RDataMining slides-regression-classificationRDataMining slides-regression-classification
RDataMining slides-regression-classificationYanchang Zhao
 
RDataMining slides-r-programming
RDataMining slides-r-programmingRDataMining slides-r-programming
RDataMining slides-r-programmingYanchang Zhao
 
RDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rRDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rYanchang Zhao
 
RDataMining slides-clustering-with-r
RDataMining slides-clustering-with-rRDataMining slides-clustering-with-r
RDataMining slides-clustering-with-rYanchang Zhao
 
RDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rRDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rYanchang Zhao
 
RDataMining-reference-card
RDataMining-reference-cardRDataMining-reference-card
RDataMining-reference-cardYanchang Zhao
 

Plus de Yanchang Zhao (8)

RDataMining slides-time-series-analysis
RDataMining slides-time-series-analysisRDataMining slides-time-series-analysis
RDataMining slides-time-series-analysis
 
RDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rRDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-r
 
RDataMining slides-regression-classification
RDataMining slides-regression-classificationRDataMining slides-regression-classification
RDataMining slides-regression-classification
 
RDataMining slides-r-programming
RDataMining slides-r-programmingRDataMining slides-r-programming
RDataMining slides-r-programming
 
RDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rRDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-r
 
RDataMining slides-clustering-with-r
RDataMining slides-clustering-with-rRDataMining slides-clustering-with-r
RDataMining slides-clustering-with-r
 
RDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rRDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-r
 
RDataMining-reference-card
RDataMining-reference-cardRDataMining-reference-card
RDataMining-reference-card
 

Dernier

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Dernier (20)

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 

Data Exploration and Visualization with R

  • 1. Data Exploration and Visualization with R Yanchang Zhao http://www.RDataMining.com 30 September 2014 1 / 39
  • 2. Outline Introduction Have a Look at Data Explore Individual Variables Explore Multiple Variables More Explorations Save Charts to Files Further Readings and Online Resources 2 / 39
  • 3. Data Exploration and Visualization with R 1 Data Exploration and Visualization I Summary and stats I Various charts like pie charts and histograms I Exploration of multiple variables I Level plot, contour plot and 3D plot I Saving charts into
  • 4. les of various formats 1Chapter 3: Data Exploration, in book R and Data Mining: Examples and Case Studies. http://www.rdatamining.com/docs/RDataMining.pdf 3 / 39
  • 5. Outline Introduction Have a Look at Data Explore Individual Variables Explore Multiple Variables More Explorations Save Charts to Files Further Readings and Online Resources 4 / 39
  • 6. Size and Structure of Data dim(iris) ## [1] 150 5 names(iris) ## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Wid... ## [5] "Species" str(iris) ## 'data.frame': 150 obs. of 5 variables: ## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ... ## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1... ## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1... ## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0... ## $ Species : Factor w/ 3 levels "setosa","versicolor",.... 5 / 39
  • 7. Attributes of Data attributes(iris) ## $names ## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Wid... ## [5] "Species" ## ## $row.names ## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 ... ## [16] 16 17 18 19 20 21 22 23 24 25 26 27 28 ... ## [31] 31 32 33 34 35 36 37 38 39 40 41 42 43 ... ## [46] 46 47 48 49 50 51 52 53 54 55 56 57 58 ... ## [61] 61 62 63 64 65 66 67 68 69 70 71 72 73 ... ## [76] 76 77 78 79 80 81 82 83 84 85 86 87 88 ... ## [91] 91 92 93 94 95 96 97 98 99 100 101 102 103 1... ## [106] 106 107 108 109 110 111 112 113 114 115 116 117 118 1... ## [121] 121 122 123 124 125 126 127 128 129 130 131 132 133 1... ## [136] 136 137 138 139 140 141 142 143 144 145 146 147 148 1... ## ## $class ## [1] "data.frame" 6 / 39
  • 8. First Rows of Data iris[1:3, ] ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## 1 5.1 3.5 1.4 0.2 setosa ## 2 4.9 3.0 1.4 0.2 setosa ## 3 4.7 3.2 1.3 0.2 setosa head(iris, 3) ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## 1 5.1 3.5 1.4 0.2 setosa ## 2 4.9 3.0 1.4 0.2 setosa ## 3 4.7 3.2 1.3 0.2 setosa tail(iris, 3) ## Sepal.Length Sepal.Width Petal.Length Petal.Width Spe... ## 148 6.5 3.0 5.2 2.0 virgi... ## 149 6.2 3.4 5.4 2.3 virgi... ## 150 5.9 3.0 5.1 1.8 virgi... 7 / 39
  • 10. rst 10 values of Sepal.Length iris[1:10, "Sepal.Length"] ## [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 iris$Sepal.Length[1:10] ## [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 8 / 39
  • 11. Outline Introduction Have a Look at Data Explore Individual Variables Explore Multiple Variables More Explorations Save Charts to Files Further Readings and Online Resources 9 / 39
  • 12. Summary of Data Function summary() I numeric variables: minimum, maximum, mean, median, and the
  • 13. rst (25%) and third (75%) quartiles I categorical variables (factors): frequency of every level summary(iris) ## Sepal.Length Sepal.Width Petal.Length Petal.Width ## Min. :4.30 Min. :2.00 Min. :1.00 Min. :0.1 ## 1st Qu.:5.10 1st Qu.:2.80 1st Qu.:1.60 1st Qu.:0.3 ## Median :5.80 Median :3.00 Median :4.35 Median :1.3 ## Mean :5.84 Mean :3.06 Mean :3.76 Mean :1.2 ## 3rd Qu.:6.40 3rd Qu.:3.30 3rd Qu.:5.10 3rd Qu.:1.8 ## Max. :7.90 Max. :4.40 Max. :6.90 Max. :2.5 ## Species ## setosa :50 ## versicolor:50 ## virginica :50 ## ## ## 10 / 39
  • 14. library(Hmisc) describe(iris[, c(1, 5)]) # check columns 1 & 5 ## iris[, c(1, 5)] ## ## 2 Variables 150 Observations ## -----------------------------------------------------------... ## Sepal.Length ## n missing unique Info Mean .05 .10 ... ## 150 0 35 1 5.843 4.600 4.800 5... ## .50 .75 .90 .95 ## 5.800 6.400 6.900 7.255 ## ## lowest : 4.3 4.4 4.5 4.6 4.7, highest: 7.3 7.4 7.6 7.7 7.9 ## -----------------------------------------------------------... ## Species ## n missing unique ## 150 0 3 ## ## setosa (50, 33%), versicolor (50, 33%) ## virginica (50, 33%) ## -----------------------------------------------------------... 11 / 39
  • 15. Mean, Median, Range and Quartiles I Mean, median and range: mean(), median(), range() I Quartiles and percentiles: quantile() range(iris$Sepal.Length) ## [1] 4.3 7.9 quantile(iris$Sepal.Length) ## 0% 25% 50% 75% 100% ## 4.3 5.1 5.8 6.4 7.9 quantile(iris$Sepal.Length, c(0.1, 0.3, 0.65)) ## 10% 30% 65% ## 4.80 5.27 6.20 12 / 39
  • 16. Variance and Histogram var(iris$Sepal.Length) ## [1] 0.6857 hist(iris$Sepal.Length) Histogram of iris$Sepal.Length iris$Sepal.Length Frequency 4 5 6 7 8 0 5 10 15 20 25 30 13 / 39
  • 17. Density plot(density(iris$Sepal.Length)) 4 5 6 7 8 0.0 0.1 0.2 0.3 0.4 density.default(x = iris$Sepal.Length) N = 150 Bandwidth = 0.2736 Density 14 / 39
  • 18. Pie Chart Frequency of factors: table() table(iris$Species) ## ## setosa versicolor virginica ## 50 50 50 pie(table(iris$Species)) setosa versicolor virginica 15 / 39
  • 19. Bar Chart barplot(table(iris$Species)) setosa versicolor virginica 0 10 20 30 40 50 16 / 39
  • 20. Outline Introduction Have a Look at Data Explore Individual Variables Explore Multiple Variables More Explorations Save Charts to Files Further Readings and Online Resources 17 / 39
  • 21. Correlation Covariance and correlation: cov() and cor() cov(iris$Sepal.Length, iris$Petal.Length) ## [1] 1.274 cor(iris$Sepal.Length, iris$Petal.Length) ## [1] 0.8718 cov(iris[, 1:4]) ## Sepal.Length Sepal.Width Petal.Length Petal.Width ## Sepal.Length 0.68569 -0.04243 1.2743 0.5163 ## Sepal.Width -0.04243 0.18998 -0.3297 -0.1216 ## Petal.Length 1.27432 -0.32966 3.1163 1.2956 ## Petal.Width 0.51627 -0.12164 1.2956 0.5810 # cor(iris[,1:4]) 18 / 39
  • 22. Aggreation Stats of Sepal.Length for every Species with aggregate() aggregate(Sepal.Length ~ Species, summary, data = iris) ## Species Sepal.Length.Min. Sepal.Length.1st Qu. ## 1 setosa 4.30 4.80 ## 2 versicolor 4.90 5.60 ## 3 virginica 4.90 6.22 ## Sepal.Length.Median Sepal.Length.Mean Sepal.Length.3rd Qu. ## 1 5.00 5.01 5.20 ## 2 5.90 5.94 6.30 ## 3 6.50 6.59 6.90 ## Sepal.Length.Max. ## 1 5.80 ## 2 7.00 ## 3 7.90 19 / 39
  • 23. Boxplot I The bar in the middle is median. I The box shows the interquartile range (IQR), i.e., range between the 75% and 25% observation. boxplot(Sepal.Length ~ Species, data = iris) setosa versicolor virginica 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 20 / 39
  • 24. Scatter Plot with(iris, plot(Sepal.Length, Sepal.Width, col = Species, pch = as.numeric(Species))) 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 2.0 2.5 3.0 3.5 4.0 Sepal.Length Sepal.Width 21 / 39
  • 25. Scatter Plot with Jitter Function jitter(): add a small amount of noise to the data plot(jitter(iris$Sepal.Length), jitter(iris$Sepal.Width)) 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 2.0 2.5 3.0 3.5 4.0 jitter(iris$Sepal.Length) jitter(iris$Sepal.Width) 22 / 39
  • 26. A Matrix of Scatter Plots pairs(iris) Sepal.Length 2.0 3.0 4.0 0.5 1.5 2.5 4.5 5.5 6.5 7.5 2.0 3.0 4.0 Sepal.Width Petal.Length 1 2 3 4 5 6 7 0.5 1.5 2.5 Petal.Width 4.5 5.5 6.5 7.5 1 2 3 4 5 6 7 1.0 2.0 3.0 1.0 2.0 3.0 Species 23 / 39
  • 27. Outline Introduction Have a Look at Data Explore Individual Variables Explore Multiple Variables More Explorations Save Charts to Files Further Readings and Online Resources 24 / 39
  • 28. 3D Scatter plot library(scatterplot3d) scatterplot3d(iris$Petal.Width, iris$Sepal.Length, iris$Sepal.Width) 0.0 0.5 1.0 1.5 2.0 2.5 2.0 2.5 3.0 3.5 4.0 4.5 4 5 6 7 8 iris$Petal.Width iris$Sepal.Length iris$Sepal.Width 25 / 39
  • 29. Interactive 3D Scatter Plot Package rgl supports interactive 3D scatter plot with plot3d(). library(rgl) plot3d(iris$Petal.Width, iris$Sepal.Length, iris$Sepal.Width) 26 / 39
  • 30. Heat Map Calculate the similarity between dierent owers in the iris data with dist() and then plot it with a heat map dist.matrix - as.matrix(dist(iris[, 1:4])) heatmap(dist.matrix) 2423 194 3439143426 376 438 1346 1455169 2312 2245 2447 1337 3479 1212 2407 2316 3305 1308451 1502 2408289 118 110169 112323 111318 110108 113306 110236 110414 112415 6919 5948 6805 8812 6833 6938 6700 5904 18057 5667 6722 8919 9967 19050 5726 5667 5559 6889 7958 7869 7924 16049 110357 112415 114426 111403 110348 111167 112499 111335 111325 111418 5738 5817 7847 114507 112344 112278 17319 17230 111224 110423 2423 194 3439 143 246 376 348 1346 1455 169 2312 2245 2447 1337 3479 1212 2407 2316 3305 1308 451 1502 2408 289 118 110169 112323 111318 110108 113306 110236 110414 112415 6919 5948 6805 8812 6833 6938 6700 5904 18057 5667 6722 8919 9967 19050 5726 5667 5559 6889 7958 7869 7924 16049 110357 112415 114426 111403 110348 111167 112499 111335 111325 111418 5738 5817 7847 114507 112344 112278 17139 17230 111224 110423 27 / 39
  • 31. Level Plot Function rainbow() creates a vector of contiguous colors. library(lattice) levelplot(Petal.Width ~ Sepal.Length * Sepal.Width, iris, cuts = 9, col.regions = rainbow(10)[10:1]) Sepal.Length Sepal.Width 4.0 3.5 3.0 2.5 2.0 5 6 7 2.5 2.0 1.5 1.0 0.5 0.0 28 / 39
  • 32. Contour contour() and filled.contour() in package graphics contourplot() in package lattice filled.contour(volcano, color = terrain.colors, asp = 1, plot.axes = contour(volcano, add = T)) 180 160 140 120 100 100 100 100 110 110 110 110 130 120 140 150 160 170 160 170 180 180 190 29 / 39
  • 33. 3D Surface persp(volcano, theta = 25, phi = 30, expand = 0.5, col = lightblue) volcano Y Z 30 / 39
  • 34. Parallel Coordinates library(MASS) parcoord(iris[1:4], col = iris$Species) Sepal.Length Sepal.Width Petal.Length Petal.Width 31 / 39
  • 35. Parallel Coordinates with Package lattice library(lattice) parallelplot(~iris[1:4] | Species, data = iris) Petal.Width Petal.Length Sepal.Width Petal.Width Petal.Length Sepal.Width Sepal.Length setosa versicolor Min Max Sepal.Length virginica 32 / 39
  • 36. Visualization with Package ggplot2 library(ggplot2) qplot(Sepal.Length, Sepal.Width, data = iris, facets = Species ~ .) 4.5 4.0 3.5 3.0 2.5 2.0 4.5 4.0 3.5 3.0 2.5 2.0 4.5 4.0 3.5 3.0 2.5 2.0 setosa versicolor virginica 5 6 7 8 Sepal.Length Sepal.Width 33 / 39
  • 37. Outline Introduction Have a Look at Data Explore Individual Variables Explore Multiple Variables More Explorations Save Charts to Files Further Readings and Online Resources 34 / 39
  • 38. Save Charts to Files I Save charts to PDF and PS
  • 39. les: pdf() and postscript() I BMP, JPEG, PNG and TIFF
  • 40. les: bmp(), jpeg(), png() and tiff() I Close
  • 41. les (or graphics devices) with graphics.off() or dev.off() after plotting # save as a PDF file pdf(myPlot.pdf) x - 1:50 plot(x, log(x)) graphics.off() # Save as a postscript file postscript(myPlot2.ps) x - -20:20 plot(x, x^2) graphics.off() 35 / 39
  • 42. Outline Introduction Have a Look at Data Explore Individual Variables Explore Multiple Variables More Explorations Save Charts to Files Further Readings and Online Resources 36 / 39
  • 43. Further Readings I Examples of ggplot2 plotting: http://had.co.nz/ggplot2/ I Package iplots: interactive scatter plot, histogram, bar plot, and parallel coordinates plot (iplots) http://stats.math.uni-augsburg.de/iplots/ I Package googleVis: interactive charts with the Google Visualisation API http://cran.r-project.org/web/packages/googleVis/vignettes/ googleVis_examples.html I Package ggvis: interactive grammar of graphics http://ggvis.rstudio.com/ I Package rCharts: interactive javascript visualizations from R http://rcharts.io/ 37 / 39
  • 44. Online Resources I Chapter 3: Data Exploration, in book R and Data Mining: Examples and Case Studies http://www.rdatamining.com/docs/RDataMining.pdf I R Reference Card for Data Mining http://www.rdatamining.com/docs/R-refcard-data-mining.pdf I Free online courses and documents http://www.rdatamining.com/resources/ I RDataMining Group on LinkedIn (7,000+ members) http://group.rdatamining.com I RDataMining on Twitter (1,700+ followers) @RDataMining 38 / 39
  • 45. The End Thanks! Email: yanchang(at)rdatamining.com 39 / 39