Data Visualization using R 
How to get, manage, and present data to tell a compelling science story 
William Gunn 
@mrgunn 
Head of Academic Outreach, Mendeley 
Access point: NRC Visitor
1. A short history of graphical presentation of data 
2. Introduction to R 
3. Finding, cleaning, and presenting data 
4. Reproducibility and data sharing
Data viz has a long history 
John Snow’s cholera map helped communicate the idea that cholera was a water-borne disease.
Florence Nightingale used dataviz
Modernization of dataviz
Chart junk: good, bad, and ugly 
Which presentation is better?
It can be elegant…
Tufte
Tufte
How our eyes and brain perceive 
It takes 200 ms to initiate an eye movement, but the red dot can be found in 100 ms or less. This is due to pre-attentive processing.
Shape is a little slower than color!
Pre-attentive processing fails!
There are many “primitive” properties which we perceive 
•Length 
•Width 
•Size 
•Density 
•Hue 
•Color intensity 
•Depth 
•3-D orientation
Length
Width
Density
Hue
Color Intensity
Depth
3D orientation
Types of color schemes 
•Sequential – suited for ordered data that progress from low to high. Use light colors for low values and dark colors for high values. 
•Diverging – uses hue to show the breakpoint and intensity to show divergent extremes. 
•Qualitative – uses different colors to represent different categories. Beware of using hue/saturation to highlight unimportant categories.
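The three scheme types can be tried out in R without extra packages. This is a minimal sketch, assuming R 3.6 or later (for `hcl.colors`); the palette names are one illustrative choice per scheme type, not the ones used in the slides, and the light-to-dark direction is worth checking visually before use.

```r
# Base R (>= 3.6) ships HCL palettes in grDevices, so no extra package is needed.
# Palette names are illustrative picks, one per scheme type.
sequential  <- hcl.colors(5, palette = "Blues 3", rev = TRUE)  # ordered data; rev = TRUE flips the default order
diverging   <- hcl.colors(7, palette = "Blue-Red")             # two hues on either side of a midpoint
qualitative <- hcl.colors(4, palette = "Dark 3")               # unordered categories

# Each call returns a character vector of hex colour codes.
print(sequential)
print(diverging)
print(qualitative)
```

`hcl.pals()` lists every available palette name, grouped by type.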
Sequential 
http://colorbrewer2.org/
Diverging
Qualitative
Tips for maps 
•Keep it to 5-7 data classes 
•~8% of men are red-green colorblind 
•Diverging schemes don’t do well when printed or photocopied 
•Colors will often render differently on different screens, especially low-end LCD screens 
•http://colorbrewer2.org
Part 2 
Introduction to R
Why R? 
•Open source tool 
•Huge variety of packages for any kind of analysis 
•Saves time repeating data processing steps 
•Allows working with more diverse types of data and much larger datasets than Excel 
•Processing is much faster than Excel 
•Scripts are easily shareable, promoting reproducible work
.csv and .xls / xlsx 
•Excel files are designed to hold the appearance of the spreadsheet in addition to the data. 
•R just wants the data, so always save as .csv if you have tabular data
data structures 
•x<-c(1,2,3,4,5,6,7,8,9,10) 
•x 
•length(x) 
•x[1] 
•x[2] 
•x<-c(1:10) 
•x
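The commands on this slide, run with the values they return noted as comments. The variable names `x_typed` and `x_range` are introduced here only to keep both versions side by side.

```r
x_typed <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)  # typed out: stored as numeric (double)
x_range <- c(1:10)                            # colon shorthand: stored as integer

length(x_typed)   # 10
x_typed[1]        # 1; R indexes from 1, so this is the first element
x_typed[2]        # 2

class(x_typed)    # "numeric"
class(x_range)    # "integer"
all(x_typed == x_range)  # TRUE: same values, different storage type
```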
types of data 
•y<-c("abc", "def", "g", "h", "i") 
•y 
•class(y) 
•y[2] 
•length(y) 
•data can be integer (1L, 2L, 3L, …), numeric (1.0, 2.3, …), character ("a", "b", "c", …), logical (TRUE, FALSE) or other things
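A quick way to see these types is to ask `class()` about a literal of each kind. The sketch below also shows what happens when types are mixed in one vector; the variable `mixed` is illustrative only.

```r
class(2L)      # "integer"   (the L suffix asks for an integer)
class(2.3)     # "numeric"
class("abc")   # "character"
class(TRUE)    # "logical"

# A vector holds one type, so mixing types coerces everything to the most general one.
mixed <- c(1, "def", TRUE)
class(mixed)   # "character"
mixed          # "1" "def" "TRUE"
```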
Vectors 
•R can hold data organized a few different ways 
•vectors – every element has the same type: c(1,2,3,4) works as expected, but c(1,2,3,"x","y","z") is coerced to all character 
•lists – can hold heterogeneous data 
–1 
–2 
–a 
•x 
•arrays – multi-dimensional 
•data frames – lists of equal-length vectors, like spreadsheets
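One small instance of each structure, as a sketch. All object names and values are made up for illustration.

```r
v  <- c(1, 2, 3, 4)                   # vector: one type throughout
l  <- list(1, 2, "a", c(1, 2, 3))     # list: elements may differ in type and length
a  <- array(1:24, dim = c(2, 3, 4))   # array: here 2 x 3 x 4
df <- data.frame(id = 1:3,
                 label = c("a", "b", "c"),
                 stringsAsFactors = FALSE)  # data frame: one vector per column

str(l)        # shows the type of each list element
dim(a)        # 2 3 4
is.list(df)   # TRUE: a data frame is a list of columns
df$label      # "a" "b" "c"
```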
Vector operations 
•x + 1 
•x 
•sum(x) 
•mean(x) 
•mean(x+1) 
•x[2]<-x[2]+1 
•x 
•x+c(2:3) 
•x[2:10] + c(2:3)
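The last two lines on this slide show recycling: when two vectors differ in length, R repeats the shorter one. A sketch starting from a fresh `x` (without the `x[2]` update above); the names `even_fit` and `uneven_fit` are illustrative.

```r
x <- 1:10

x + 1                      # the single value is recycled across all ten elements

even_fit <- x + c(2, 3)    # length 2 divides length 10: recycled silently
even_fit                   # 3 5 5 7 7 9 9 11 11 13

# Length 9 is not a multiple of 2: R still recycles, but raises a warning.
uneven_fit <- withCallingHandlers(
  x[2:10] + c(2, 3),
  warning = function(w) {
    message("warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")   # continue after reporting the warning
  }
)
uneven_fit                 # 4 6 6 8 8 10 10 12 12
```

The silent case is the one to watch for, since a length mismatch that happens to divide evenly produces no message at all.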
working with lists 
•y<-list(name = "Bob", age = 24) 
•y 
•y$name 
•y[1] 
•y[[1]] 
•class(y[1]) 
•class(y[[1]]) 
•y<-list(y$name, "Sue") 
•y$name 
•y$age[2]<-list(33)
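What the bracket forms return, and why `y$name` comes back empty after the list is rebuilt. This is a sketch; `unnamed` and `people` are names introduced here, and storing one vector per field is just one possible way to keep both people.

```r
y <- list(name = "Bob", age = 24)

class(y[1])     # "list": single brackets return a sub-list
class(y[[1]])   # "character": double brackets return the element itself
y$name          # "Bob"; same as y[["name"]]

# Rebuilding the list without names drops them, so $name finds nothing.
unnamed <- list(y$name, "Sue")
is.null(unnamed$name)   # TRUE

# One way to keep the fields: store a vector per field.
people <- list(name = c(y$name, "Sue"), age = c(y$age, 33))
people$name[2]          # "Sue"
people$age[2]           # 33
```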
Loading data 
•data<-read.csv("C:/Users/William Gunn/Desktop/Dropbox/Scripting/Data/traffic_accidents/accidents2010_all.csv", header = TRUE, stringsAsFactors = FALSE)
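The path above only exists on the presenter's machine. To try the same call anywhere, the sketch below writes a tiny CSV to a temporary file first; the column names and rows are invented for illustration and are not taken from the accidents dataset.

```r
# Hypothetical stand-in for a real CSV file; columns and values are invented.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("accident_id,severity,road_type",
             "A1,2,Single carriageway",
             "A2,3,Roundabout",
             "A3,1,Dual carriageway"), csv_path)

accidents <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

str(accidents)    # text columns stay character rather than becoming factors
head(accidents)   # first rows, to check the file was parsed as expected
```

Forward slashes in the path work on Windows as well, which is why the original call uses them.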
Selecting subsets of data 
•"[" 
•"$" 
•which 
•grep and grepl 
•subset
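Each of these tools applied to the same small data frame. The data frame and its columns (`road`, `deaths`) are invented for illustration.

```r
# Small invented data frame to subset.
df <- data.frame(
  road   = c("A10", "M25", "A406", "M1"),
  deaths = c(0, 2, 1, 3),
  stringsAsFactors = FALSE
)

df[2, ]                    # "[" selects by row and column position
df$deaths                  # "$" pulls one column out as a vector
which(df$deaths > 1)       # 2 4: positions where the condition holds
grep("^M", df$road)        # 2 4: positions of strings matching the pattern
grepl("^M", df$road)       # FALSE TRUE FALSE TRUE: one logical per element

motorways <- df[grepl("^M", df$road), ]               # logical vector used as a row filter
serious   <- subset(df, deaths > 1, select = road)    # filter rows and pick columns in one call
```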
PLOTS 
•ggplot2 – an implementation of the “grammar of graphics” in R 
•a set of graph types and a way of mapping variables to graph features 
•graph types are called “geoms” 
•mappings are “aesthetics” 
•graphs are built up by layering geoms
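A sketch of how aesthetics and layered geoms fit together, assuming the ggplot2 package is installed. It uses `mtcars`, a dataset that ships with R, rather than any data from the talk.

```r
library(ggplot2)   # assumes ggplot2 is installed

# mtcars ships with R; wt and mpg are two of its numeric columns.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +   # aesthetics: map variables to x and y
  geom_point() +                              # first layer: one point per car
  geom_smooth(method = "lm", se = FALSE)      # second layer: fitted straight line

print(p)   # draws the plot on the active graphics device
```

Each `+` adds a layer on top of the previous one, so the order of the geoms is the drawing order.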
Types of geoms 
•point – scatterplot – takes x, y coordinates of points 
•abline – line layer – takes slope, intercept 
•line – connect points with a line 
•smooth – fit a curve 
•bar – bar chart of counts per category (the histogram geom is the binned equivalent for continuous data) – takes a vector of data 
•boxplot – box and whiskers 
•density – to show relative distributions 
•errorbar – what it says on the tin
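A few of these geoms side by side, again as a sketch on the built-in `mtcars` data and assuming ggplot2 is installed. The plot object names are illustrative.

```r
library(ggplot2)   # assumes ggplot2 is installed

# geom_bar counts rows per category; geom_histogram bins a continuous variable.
bar_plot  <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()
hist_plot <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(bins = 10)

# Box and whiskers of mpg for each cylinder count.
box_plot  <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()

# Overlaid density curves, one per cylinder count, made translucent with alpha.
dens_plot <- ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
  geom_density(alpha = 0.4)

print(bar_plot)   # print any of the four objects to draw it
```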
