dplyr
#rcatladies
verb
A function that takes a data frame as its first argument
Examples of R verbs
head, tail, …
verb( subject, … )
> head( iris, n = 4 )
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
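Any function with this shape can act as a verb, not only built-ins such as head and tail. A minimal sketch in base R, where top_rows is a hypothetical helper written for illustration (it is not part of base R or dplyr):

```r
# Hypothetical verb: return the n rows with the largest values in one column.
# The data frame comes first, so the call reads verb( subject, ... ).
top_rows <- function(data, column, n = 3) {
  ranked <- order(data[[column]], decreasing = TRUE)
  data[head(ranked, n), , drop = FALSE]
}

longest <- top_rows(iris, "Sepal.Length", n = 2)
longest$Sepal.Length
```

Because the data frame is the first argument, a function like this also slots into a %>% pipeline without changes.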
%>%
the pipe operator, from magrittr
Classic R code 
mean( rnorm( 100, mean = 4, sd = 4), trim = .1 ) 
Pipeline R code with %>% 
100 %>%
  rnorm( mean = 4, sd = 4) %>%
  mean( trim = .1 )
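The two forms compute the same thing: x %>% f(y) is evaluated as f(x, y), so the left-hand side becomes the first argument of the next call. A small check, assuming only that magrittr is installed (the seed value is arbitrary):

```r
library(magrittr)

# Classic nested call.
set.seed(42)
classic <- mean(rnorm(100, mean = 4, sd = 4), trim = .1)

# Same computation as a pipeline: 100 is passed as the first argument of rnorm(),
# and the resulting vector is passed as the first argument of mean().
set.seed(42)
piped <- 100 %>%
  rnorm(mean = 4, sd = 4) %>%
  mean(trim = .1)

identical(classic, piped)
```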
nycflights13: Data about flights departing NYC in 2013
> library("dplyr")
> library("nycflights13")
> flights
Source: local data frame [336,776 x 16]

   year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight
1  2013     1   1      517         2      830        11      UA  N14228   1545
2  2013     1   1      533         4      850        20      UA  N24211   1714
3  2013     1   1      542         2      923        33      AA  N619AA   1141
4  2013     1   1      544        -1     1004       -18      B6  N804JB    725
5  2013     1   1      554        -6      812       -25      DL  N668DN    461
6  2013     1   1      554        -4      740        12      UA  N39463   1696
7  2013     1   1      555        -5      913        19      B6  N516JB    507
8  2013     1   1      557        -3      709       -14      EV  N829AS   5708
9  2013     1   1      557        -3      838        -8      B6  N593JB     79
10 2013     1   1      558        -2      753         8      AA  N3ALAA    301
..  ...   ... ...      ...       ...      ...       ...     ...     ...    ...
Variables not shown: origin (chr), dest (chr), air_time (dbl), distance (dbl),
  hour (dbl), minute (dbl)
tbl_df
A data frame that does not print all of itself by default
> data <- tbl_df(mtcars)
> data
Source: local data frame [32 x 11]

    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
..  ... ...   ... ...  ...   ...   ... .. ..  ...  ...
filter
keep a subset of the rows of the data frame
filter(flights, month == 1, day == 1) 
flights %>%
  filter( dep_delay < 10 )
flights %>%
  filter( arr_delay < dep_delay )
flights %>%
  filter( hour < 12, arr_delay <= 0 )
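Two details of filter are easy to miss: conditions separated by commas are combined with AND, and rows where a condition evaluates to NA are dropped. A sketch on a made-up stand-in for flights (the toy data frame and its values are assumptions, chosen so the result is easy to check by hand):

```r
library(dplyr)

# Toy stand-in for a few columns of flights; the values are invented.
toy <- data.frame(
  hour      = c(6, 9, 13, 7),
  dep_delay = c(2, -3, 15, NA),
  arr_delay = c(-5, 4, 20, -1)
)

# Comma-separated conditions behave like a single condition joined with &.
with_commas <- toy %>% filter(hour < 12, arr_delay <= 0)
with_and    <- toy %>% filter(hour < 12 & arr_delay <= 0)

# filter() drops the row whose dep_delay is NA ...
kept <- toy %>% filter(dep_delay < 10)

# ... whereas base subsetting keeps it as a row of NAs.
base_kept <- toy[toy$dep_delay < 10, ]
```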
arrange
reorder the rows of a data frame
flights %>%
  filter( hour < 8 ) %>%
  arrange( year, month, day )
flights %>%
  arrange( desc(dep_delay) )
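A short sketch of how arrange treats several sort keys and missing values, using an invented toy data frame (not taken from flights): later keys break ties in earlier ones, and NA is sorted to the end even inside desc().

```r
library(dplyr)

# Invented data; stringsAsFactors = FALSE keeps carrier as character in older R.
toy <- data.frame(
  carrier   = c("UA", "AA", "B6", "AA"),
  dep_delay = c(2, NA, -6, 30),
  stringsAsFactors = FALSE
)

# Largest delay first; the NA row ends up last.
by_delay <- toy %>% arrange(desc(dep_delay))

# Sort by carrier, then by decreasing delay within each carrier.
by_carrier <- toy %>% arrange(carrier, desc(dep_delay))
```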
select 
select certain columns from the data frame
# Select columns by name 
select(flights, year, month, day) 
# Select all columns between year and day 
select(flights, year:day) 
# Select all columns except those from year to day (inclusive)
select(flights, -(year:day))
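The three forms can be checked on a one-row toy data frame (invented values, with column names borrowed from flights). The last call uses ends_with, a selection helper available in dplyr, as one more way to pick columns by name pattern.

```r
library(dplyr)

# One invented row with a few of the flights column names.
toy <- data.frame(year = 2013, month = 1, day = 1, dep_delay = 2, arr_delay = 11)

by_name  <- names(select(toy, year, month, day))
by_range <- names(select(toy, year:day))
dropped  <- names(select(toy, -(year:day)))
delays   <- names(select(toy, ends_with("delay")))
```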
mutate 
modify or create columns based on others
d <- flights %>%
  mutate(
    gain = arr_delay - dep_delay,
    speed = distance / air_time * 60
  ) %>%
  filter( gain > 0 ) %>%
  arrange( desc(speed) )
d %>%
  select( year, month, day, dest, gain, speed )
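A worked version of the same mutate on two invented rows, so the arithmetic can be followed by hand. The third column, gain_per_hour, is a hypothetical addition that shows an expression reusing a column created earlier in the same mutate call.

```r
library(dplyr)

# Two invented flights: distance in miles, air_time and delays in minutes.
toy <- data.frame(
  distance  = c(1400, 200),
  air_time  = c(210, 40),
  dep_delay = c(2, -6),
  arr_delay = c(11, -25)
)

out <- toy %>%
  mutate(
    gain          = arr_delay - dep_delay,      # 11 - 2 = 9 and -25 - (-6) = -19
    speed         = distance / air_time * 60,   # miles per hour: 400 and 300
    gain_per_hour = gain / (air_time / 60)      # reuses gain from the line above
  )
```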
summarise 
collapse a data frame into one row …
summarise(flights,
  delay = mean(dep_delay, na.rm = TRUE))
flights %>%
  filter( dep_delay > 0 ) %>%
  summarise(arr_delay = mean(arr_delay, na.rm = TRUE))
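The na.rm = TRUE argument matters because delay columns can contain missing values. A sketch on invented data showing the difference:

```r
library(dplyr)

# Invented delays with one missing value.
toy <- data.frame(dep_delay = c(2, NA, -6, 10))

# Without na.rm, a single NA makes the whole summary NA.
with_na <- summarise(toy, delay = mean(dep_delay))

# With na.rm = TRUE the mean is taken over 2, -6 and 10.
without_na <- summarise(toy, delay = mean(dep_delay, na.rm = TRUE))
```

Either way the result is a data frame with a single row.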
group_by 
Group observations by one or more variables
flights %>%
  group_by( tailnum ) %>%
  summarise(
    count = n(),
    dist = mean(distance, na.rm = TRUE),
    delay = mean(arr_delay, na.rm = TRUE)
  ) %>%
  filter( is.finite(delay) ) %>%
  arrange( desc(count) )
flights %>%
  group_by(dest) %>%
  summarise(
    planes = n_distinct(tailnum),
    flights = n()
  ) %>%
  arrange( desc(flights) )
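After group_by, summarise returns one row per group instead of one row overall. The same pipeline on four invented flights (toy values, not from nycflights13) makes the counts easy to verify:

```r
library(dplyr)

# Four invented flights: three to ORD on two different planes, one to ATL.
toy <- data.frame(
  dest    = c("ORD", "ORD", "ATL", "ORD"),
  tailnum = c("N1", "N2", "N1", "N1"),
  stringsAsFactors = FALSE
)

per_dest <- toy %>%
  group_by(dest) %>%
  summarise(
    planes  = n_distinct(tailnum),  # distinct tail numbers within the group
    flights = n()                   # number of rows in the group
  ) %>%
  arrange(desc(flights))
```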
joins 
joining two data frames
inner_join
all rows from x where there are matching values in y, and all columns from x and y.
If there are multiple matches between x and y, all combinations of the matches are returned.
destinations <- flights %>%
  group_by(dest) %>%
  summarise(
    planes = n_distinct(tailnum),
    flights = n()
  ) %>%
  arrange( desc(flights) ) %>%
  rename( faa = dest )
inner_join( destinations, airports, by = "faa")
inner_join
the same join without rename: a named vector in by matches the dest column of x to the faa column of y.
destinations <- flights %>%
  group_by(dest) %>%
  summarise(
    planes = n_distinct(tailnum),
    flights = n()
  ) %>%
  arrange( desc(flights) )
inner_join( destinations, airports,
  by = c( "dest" = "faa" ) )
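A sketch of the named-vector form on two small invented tables (stand-ins for destinations and airports; the names and counts are made up). Only keys present in both tables survive, and the key column keeps its name from x.

```r
library(dplyr)

# Invented stand-ins; "XXX" has no airport record and "LAX" has no flights.
destinations <- data.frame(
  dest    = c("ORD", "ATL", "XXX"),
  flights = c(3, 1, 5),
  stringsAsFactors = FALSE
)
airports <- data.frame(
  faa  = c("ORD", "ATL", "LAX"),
  name = c("airport A", "airport B", "airport C"),
  stringsAsFactors = FALSE
)

# Match destinations$dest against airports$faa.
joined <- inner_join(destinations, airports, by = c("dest" = "faa"))
```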
other joins 
See ?join 
• left_join, right_join 
• inner_join, full_join
• semi_join 
• anti_join
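The joins differ mainly in which rows they keep. A sketch comparing them on two invented tables (the data and names are assumptions for illustration):

```r
library(dplyr)

# Invented tables: "XXX" appears only in x, "LAX" only in y.
x <- data.frame(
  dest    = c("ORD", "ATL", "XXX"),
  flights = c(3, 1, 5),
  stringsAsFactors = FALSE
)
y <- data.frame(
  faa  = c("ORD", "ATL", "LAX"),
  name = c("airport A", "airport B", "airport C"),
  stringsAsFactors = FALSE
)
key <- c("dest" = "faa")

left <- left_join(x, y, by = key)  # every row of x; name is NA where y has no match
full <- full_join(x, y, by = key)  # every row of x and of y
semi <- semi_join(x, y, by = key)  # rows of x with a match in y; columns of x only
anti <- anti_join(x, y, by = key)  # rows of x without a match in y
```

semi_join and anti_join never add columns from y, which makes them useful as filters.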
dplyr %>% summary 
• Simple verbs: filter, mutate, select, summarise, 
arrange 
• Grouping with group_by 
• Joins with *_join 
• Convenient with %>% 
• F✈️ST
dplyr 
Romain François 
@romain_francois 
romain@r-enthusiasts.com

Data manipulation with dplyr

  • 3. dplyr #rcatladies
  • 4. verb: a function that takes a data frame as its first argument
  • 5. Examples of R verbs: head, tail, …
       Pattern: verb subject …
       > head( iris, n = 4 )
         Sepal.Length Sepal.Width Petal.Length Petal.Width Species
       1          5.1         3.5          1.4         0.2  setosa
       2          4.9         3.0          1.4         0.2  setosa
       3          4.7         3.2          1.3         0.2  setosa
       4          4.6         3.1          1.5         0.2  setosa
  • 6. %>% from magrittr
  • 7. Classic R code:
         mean( rnorm( 100, mean = 4, sd = 4), trim = .1 )
       Pipeline R code with %>%:
         100 %>%
           rnorm( mean = 4, sd = 4) %>%
           mean( trim = .1 )
  • 8. nycflights13: Data about flights departing NYC in 2013
  • 9. > library("nycflights13")
       > flights
       Source: local data frame [336,776 x 16]
          year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight
       1  2013     1   1      517         2      830        11      UA  N14228   1545
       2  2013     1   1      533         4      850        20      UA  N24211   1714
       3  2013     1   1      542         2      923        33      AA  N619AA   1141
       4  2013     1   1      544        -1     1004       -18      B6  N804JB    725
       5  2013     1   1      554        -6      812       -25      DL  N668DN    461
       6  2013     1   1      554        -4      740        12      UA  N39463   1696
       7  2013     1   1      555        -5      913        19      B6  N516JB    507
       8  2013     1   1      557        -3      709       -14      EV  N829AS   5708
       9  2013     1   1      557        -3      838        -8      B6  N593JB     79
       10 2013     1   1      558        -2      753         8      AA  N3ALAA    301
       ..  ...   ... ...      ...       ...      ...       ...     ...     ...    ...
       Variables not shown: origin (chr), dest (chr), air_time (dbl), distance (dbl),
         hour (dbl), minute (dbl)
  • 10. tbl_df: a data frame that does not print all of itself by default
        > data <- tbl_df(mtcars)
        > data
        Source: local data frame [32 x 11]
            mpg cyl  disp  hp drat    wt  qsec vs am gear carb
        1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
        2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
        3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
        4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
        5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
        6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
        7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
        8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
        9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
        10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
        ..  ... ...   ... ...  ...   ...   ... .. ..  ...  ...
  • 11. filter: a subset of the rows of the data frame
  • 12. filter(flights, month == 1, day == 1)
        flights %>% filter( dep_delay < 10 )
        flights %>% filter( arr_delay < dep_delay )
        flights %>% filter( hour < 12, arr_delay <= 0 )
  • 13. arrange: reorder the rows of a data frame
  • 14. flights %>%
          filter( hour < 8 ) %>%
          arrange( year, month, day )
        flights %>%
          arrange( desc(dep_delay) )
  • 15. select: select certain columns from the data frame
  • 16. # Select columns by name
        select(flights, year, month, day)
        # Select all columns between year and day (inclusive)
        select(flights, year:day)
        # Select all columns except those from year to day (inclusive)
        select(flights, -(year:day))
  • 17. mutate: modify or create columns based on others
  • 18. d <- flights %>%
          mutate(
            gain = arr_delay - dep_delay,
            speed = distance / air_time * 60
          ) %>%
          filter( gain > 0 ) %>%
          arrange( desc(speed) )
        d %>% select( year, month, day, dest, gain, speed )
  • 19. summarise: collapse a data frame into one row …
  • 20. summarise(flights, delay = mean(dep_delay, na.rm = TRUE))
        flights %>%
          filter( dep_delay > 0 ) %>%
          summarise(arr_delay = mean(arr_delay, na.rm = TRUE))
  • 21. group_by: group observations by one or more variables
  • 22. flights %>%
          group_by( tailnum ) %>%
          summarise(
            count = n(),
            dist = mean(distance, na.rm = TRUE),
            delay = mean(arr_delay, na.rm = TRUE)
          ) %>%
          filter( is.finite(delay) ) %>%
          arrange( desc(count) )
  • 23. flights %>%
          group_by(dest) %>%
          summarise(
            planes = n_distinct(tailnum),
            flights = n()
          ) %>%
          arrange( desc(flights) )
  • 24. joins: joining two data frames
  • 25. inner_join: all rows from x where there are matching values in y, and all columns from x and y. If there are multiple matches between x and y, all combinations of the matches are returned.
        destinations <- flights %>%
          group_by(dest) %>%
          summarise(
            planes = n_distinct(tailnum),
            flights = n()
          ) %>%
          arrange( desc(flights) ) %>%
          rename( faa = dest )
        inner_join( destinations, airports, by = "faa")
  • 26. inner_join with differently named keys: the same summary without the rename, matching dest in x to faa in y.
        destinations <- flights %>%
          group_by(dest) %>%
          summarise(
            planes = n_distinct(tailnum),
            flights = n()
          ) %>%
          arrange( desc(flights) )
        inner_join( destinations, airports, by = c( "dest" = "faa" ) )
  • 27. other joins: see ?join
        • left_join, right_join
        • inner_join, full_join
        • semi_join
        • anti_join
  • 28. dplyr %>% summary
        • Simple verbs: filter, mutate, select, summarise, arrange
        • Grouping with group_by
        • Joins with *_join
        • Convenient with %>%
        • F✈️ST
  • 29. dplyr Romain François @romain_francois romain@r-enthusiasts.com
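
The slides run every example against the full nycflights13 tables. As a rough, self-contained sketch, the same verbs, grouping, and joins can be tried on two tiny data frames. The names toy_flights and toy_airports and all of their values are invented for illustration and are not part of the original deck; the only assumption about the environment is that dplyr is installed.

```r
library(dplyr)  # also makes the %>% pipe available

# Hypothetical miniature of nycflights13::flights; all values are invented.
toy_flights <- data.frame(
  dest      = c("ORD", "ORD", "ORD", "LAX", "LAX", "BOS"),
  tailnum   = c("N1", "N2", "N1", "N1", "N3", "N2"),
  dep_delay = c(5, -2, 10, 30, 0, 12),
  arr_delay = c(-3, 4, 10, 25, -10, 20),
  distance  = c(733, 733, 733, 2475, 2475, 187),
  air_time  = c(120, 125, 118, 330, 320, 40),
  stringsAsFactors = FALSE
)

# Hypothetical miniature of nycflights13::airports; SEA has no flights above.
toy_airports <- data.frame(
  faa  = c("ORD", "LAX", "BOS", "SEA"),
  name = c("Chicago O'Hare", "Los Angeles Intl", "Boston Logan", "Seattle Tacoma"),
  stringsAsFactors = FALSE
)

# The pipe passes its left-hand side as the first argument of the next call,
# so both spellings compute the same trimmed mean.
x <- c(1, 2, 3, 100)
same_result <- identical(mean(x, trim = 0.25), x %>% mean(trim = 0.25))

# Single-table verbs chained as on slide 18. With gain defined this way,
# gain > 0 keeps flights whose arrival delay exceeds their departure delay.
lost_time <- toy_flights %>%
  mutate(gain = arr_delay - dep_delay, speed = distance / air_time * 60) %>%
  filter(gain > 0) %>%
  arrange(desc(speed)) %>%
  select(dest, tailnum, gain, speed)

# Grouped summary as on slide 23: one row per destination.
destinations <- toy_flights %>%
  group_by(dest) %>%
  summarise(planes = n_distinct(tailnum), flights = n()) %>%
  arrange(desc(flights))

# Join on differently named key columns, as on slide 26.
joined <- inner_join(destinations, toy_airports, by = c("dest" = "faa"))

# anti_join keeps the rows of x that have no match in y: airports never flown to.
unserved <- anti_join(toy_airports, destinations, by = c("faa" = "dest"))

print(lost_time)
print(joined)
print(unserved)
```

Swapping toy_flights and toy_airports for flights and airports from nycflights13 should give the pipelines shown on the slides, provided that package is installed.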