Integrating R & Hadoop - Text Mining & Sentiment Analysis
Why R on Hadoop?
Storing and processing large amounts of data is a challenge for existing statistical applications such as R.
• Statistical applications on their own are not designed to handle Big Data
• Data management tools lack analytical and statistical capabilities
• R and Hadoop each have their own working environment
• R provides the analytics and statistics functionality
• Hadoop provides a framework for storing and processing distributed data
Integrating R with Hadoop bridges the gap between these two applications.
Analyse Hadoop data using R
Because R is one of the best-known statistical software packages, an analyst working with Hadoop may also want to use existing R packages with Hadoop.
• R is one of the most comprehensive statistical analysis packages available
• R is free and open source software
• R packages are powerful and widely used for statistical and data analysis
• R can be used for parallel computing across a number of cores and clusters
Integration combines the analytical power of R with the processing power of Hadoop, making the pair suitable for Big Data analytics.
Enabling R on Hadoop
Functionality from open source R packages can be used when writing mapper and reducer functions.
Options for R on Hadoop
R and Hadoop can be integrated using:
• RHadoop
• RHIPE
• Segue
• R with Hadoop Streaming
RHadoop Overview
RHadoop is an open source project that allows programmers to use MapReduce functionality directly in R code.
• Collection of R packages: rhdfs, rmr2, rhbase, plyrmr
• Mostly implemented in native R
When to use RHadoop
• For data exploration
• For data aggregation
• To make use of the parallel framework in Hadoop
• To sample data
RHadoop is mainly used for managing data and performing data analysis tasks with the Hadoop framework.
RHadoop Packages Overview
About rhdfs
This R package provides basic connectivity to HDFS.
• Helps to browse, read, write, and modify files stored in HDFS
• Functions largely replicate standard HDFS commands
• File manipulation: hdfs.copy, hdfs.move, hdfs.delete, hdfs.put, hdfs.get
• Directory handling: hdfs.dircreate, hdfs.mkdir
RHadoop Packages Overview
Sample rhdfs functions
• library(rhdfs) # Load the rhdfs package
• hdfs.init() # Initialize the rhdfs package
• hdfs.ls('/') # List the files and directories under the HDFS root
• hdfs.mkdir() # Create a new directory in HDFS (pass the path)
• hdfs.rm() # Remove a directory from HDFS (pass the path)
• help('rhdfs') # List all functions of the rhdfs package
More examples later...
RHadoop Packages Overview
About rmr2
This R package allows an R programmer to perform statistical analysis via MapReduce on a Hadoop cluster.
• Focuses on the analysis of very large data sets
• An alternative to Java for writing MapReduce programs
• Uses the Hadoop Streaming API to run MapReduce jobs written in R
• All components communicate via key-value pairs
• Supports some HDFS data loading functions by default
MapReduce workflow in rmr2
The rmr2 package creates a client-side environment in which MapReduce executes the map and reduce functions.
• Allows these functions to access variables outside their scope
• Works with the inputs and outputs of MapReduce
• Enables programmers to write R variables to HDFS and read them back
Function Categories in rmr2
• For storing and retrieving data
  ✓ to.dfs: writes R objects to HDFS
  ✓ from.dfs: reads MapReduce output from HDFS into the R session
• For MapReduce
  ✓ mapreduce(): defines and executes MapReduce jobs
  ✓ keyval(): creates key-value pairs (keys() and values() extract the two parts)
MapReduce function syntax in rmr2
Syntax of the rmr2 mapreduce function:

mapreduce(input, output, map, reduce, input.format, output.format)

• input: HDFS path for the input data
• output: HDFS path for the output data
• map/reduce: map and reduce functions applied to the data
• input.format/output.format: data format, e.g. text, csv, json
• Typically, the map and reduce functions call the keyval helper to ensure their output is key-value pairs
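A minimal sketch of the map, shuffle, and reduce steps behind such a call, written in base R so that it runs without a Hadoop cluster. The input lines and the names map_fn, pairs, grouped, and word_counts are illustrative assumptions and not part of rmr2; the sketch only imitates locally what the framework does with key-value pairs.

```r
# Word count as map, shuffle, and reduce steps, simulated in base R.
# No Hadoop or rmr2 is needed; all names here are hypothetical.
lines <- c("r on hadoop", "hadoop stores data", "r analyses data")

# Map: emit one (word, 1) pair per word in a line.
map_fn <- function(line) {
  words <- strsplit(tolower(line), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]
  data.frame(key = words, value = rep(1L, length(words)),
             stringsAsFactors = FALSE)
}

# Shuffle: collect every pair and group the values by key.
pairs <- do.call(rbind, lapply(lines, map_fn))
grouped <- split(pairs$value, pairs$key)

# Reduce: sum the values that share a key.
word_counts <- vapply(grouped, sum, integer(1))
print(word_counts)
```

With rmr2 on a cluster, the same logic would presumably be supplied as the map and reduce arguments of mapreduce(), with keyval() wrapping the output of each function.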

Text Analytics using RHadoop
How Text Mining Works with R and Hadoop
• Lexical statistics: the study of word frequencies
• Data mining techniques are used to identify relationships and patterns
• Sentiment analysis is used to understand the underlying attitude
• Tools like R and SAS offer statistical functionality
• Handling large databases needs newer technologies such as Hadoop
Text Analysis Process
Steps Involved
• Information Extraction
• Data Mining
• Text Data Pre-Processing
• Post Processing Analysis
Sentiment Analysis
The study of analysing people’s opinions, sentiments, attitudes, appraisals, and evaluations
• Also known as opinion mining
• An important component of text mining
• Extracts opinion sentiment from end-user reviews
• Sentiment is further classified as positive, negative, or neutral
Parameters used in Sentiment Analysis
The process of sentiment analysis involves classifying a given text on the basis of the following parameters:
• Polarity, which can be positive, negative, or neutral
• Emotional states, which can be sad, angry, or happy
• Scaling system or numeric values
• Subjectivity/objectivity
• Features based on key entities, such as the durability of furniture, the screen size of a cell phone, or the lens quality of a camera
How Sentiment Analysis Works
A Simple Sentiment Algorithm: this algorithm assigns a sentiment score by simply counting the number of occurrences of “positive” and “negative” words in a text.
“I bought an iPhone few days back. It is really nice. The touch screen and voice quality are really cool. It is so better than my old Blackberry phone which was so hard to type with tiny keys. However iPhone is a bit expensive.”
Positive Words: nice, cool, better
Negative Words: hard, expensive
Sentiment Score: Total Positive - Total Negative (3 - 2 = 1)
Sentiment Polarity: Positive
Overall Score: Sum of all sentence sentiment scores
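The word-counting algorithm above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the function name score_sentiment and the tokenisation rule (lower-case, split on non-letters) are hypothetical choices, and the word lists are just the five words named on the slide rather than a real sentiment lexicon.

```r
# Simple lexicon-based sentiment scoring in base R.
# The word lists come from the slide; everything else is a hypothetical sketch.
positive_words <- c("nice", "cool", "better")
negative_words <- c("hard", "expensive")

# Score = number of positive words minus number of negative words.
score_sentiment <- function(text, pos = positive_words, neg = negative_words) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]
  sum(words %in% pos) - sum(words %in% neg)
}

review <- paste(
  "I bought an iPhone few days back. It is really nice.",
  "The touch screen and voice quality are really cool.",
  "It is so better than my old Blackberry phone which was so hard to type with tiny keys.",
  "However iPhone is a bit expensive."
)

# Score each sentence, then sum the sentence scores for the overall score.
sentences <- strsplit(review, "[.!?]+\\s*")[[1]]
sentence_scores <- vapply(sentences, score_sentiment, integer(1),
                          USE.NAMES = FALSE)
overall_score <- sum(sentence_scores)

polarity <- if (overall_score > 0) {
  "Positive"
} else if (overall_score < 0) {
  "Negative"
} else {
  "Neutral"
}
print(sentence_scores)  # 0 1 1 0 -1
print(overall_score)    # 1
print(polarity)         # "Positive"
```

The fourth sentence scores 0 because "better" and "hard" cancel out, which shows how coarse a pure word count is.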
Process workflow of this Sentiment Analysis
Workflow
Any questions?

Integrating R & Hadoop - Text Mining & Sentiment Analysis

  • 2. Why R on Hadoop?
    Storing and processing large amounts of data is a challenging job for existing statistical applications such as R.
      • Statistical applications on their own cannot handle Big Data
      • Data management tools lack analytical and statistical capabilities
      • R and Hadoop each have their own working environment
      • R provides the analytics and statistics functionality
      • Hadoop provides a framework for storing and processing distributed data
    Integrating R with Hadoop bridges the gap between these two applications.

  • 3. Analyse Hadoop data using R
    Because R is one of the best-known statistical software packages, an analyst working with Hadoop may also want to use existing R packages with Hadoop.
      • R is one of the most comprehensive statistical analysis packages available
      • R is free and open source software
      • R packages are powerful and widely used for statistical and data analysis
      • R can be used for parallel computing across a number of cores and clusters
    Integration combines the analytical power of R with the processing power of Hadoop, making the pair suitable for Big Data analytics.
  • 4. Enabling R on Hadoop
    Functionality from open source R packages can be used when writing mapper and reducer functions.
    Options for R on Hadoop: R and Hadoop can be integrated through
      • RHadoop
      • RHIPE
      • Segue
      • R with Hadoop Streaming
  • 5. RHadoop Overview
    RHadoop is an open source project that allows programmers to use MapReduce functionality directly in R code.
      • Collection of R packages: rhdfs, rmr2, rhbase, plyrmr
      • Mostly implemented in native R
  • 6. When to use RHadoop
      • For data exploration
      • For data aggregation
      • To make use of the parallel framework in Hadoop
      • To sample data
    RHadoop is mainly used for managing data and performing data analysis tasks with the Hadoop framework.
  • 7. RHadoop Packages Overview: About rhdfs
    This R package provides basic connectivity to HDFS.
      • Helps to browse, read, write, and modify files stored in HDFS
      • Functions roughly mirror the standard HDFS commands
      • File manipulation: hdfs.copy, hdfs.move, hdfs.delete, hdfs.put, hdfs.get
      • Directory handling: hdfs.dircreate, hdfs.mkdir
  • 8. RHadoop Packages Overview: Sample rhdfs functions
      • library(rhdfs)   # Load the rhdfs library
      • hdfs.init()      # Initialize the rhdfs package
      • hdfs.ls('/')     # List the files and directories under the HDFS root
      • hdfs.mkdir()     # Create a new directory in HDFS
      • hdfs.rm()        # Remove a directory from HDFS
      • help('rhdfs')    # List the functions of the rhdfs package
    More examples later...
  • 9. RHadoop Packages Overview: About rmr2
    This R package allows an R programmer to perform statistical analysis via MapReduce on a Hadoop cluster.
      • Focuses on the data analysis of very large data sets
      • An alternative to Java for writing MapReduce programs
      • Uses the Hadoop Streaming API to run MapReduce jobs written in R
      • All components communicate via key-value pairs
      • Supports some HDFS data loading functions by default
  • 10. MapReduce workflow in rmr2
    The rmr2 package creates a client-side environment for MapReduce to execute map and reduce functions.
      • Allows these functions to access variables outside their scope
      • Works with the inputs and outputs of MapReduce
      • Enables programmers to write R variables to HDFS and read them back

  • 11. Function Categories in rmr2
      • For storing and retrieving data
          • to.dfs: writes R objects to HDFS
          • from.dfs: reads MapReduce output from HDFS into the R session
      • For MapReduce
          • mapreduce(): defines and executes MapReduce jobs
          • keyval(): creates key-value pairs (keys() and values() extract them)
  • 12. MapReduce function syntax in rmr2
    Syntax of the rmr2 mapreduce function:
      mapreduce(input, output, map, reduce, input.format, output.format)
      • input: HDFS path of the input data
      • output: HDFS path for the output data
      • map / reduce: the map and reduce functions applied to the data
      • input.format / output.format: the data format, e.g. text, csv, json
      • Typically, the map and reduce functions call the keyval helper function to ensure that their output consists of key-value pairs
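As a rough illustration of the syntax above, the following sketch counts words with rmr2. It assumes that rmr2 is installed and switches to the local backend with rmr.options(backend = "local") so that it can be tried without a cluster. The sample lines and the names wc_map and wc_reduce are made up for illustration and do not come from the slides.

```r
library(rmr2)

# Assumption: use rmr2's local backend so no Hadoop cluster is needed.
rmr.options(backend = "local")

# Hypothetical sample input; on a cluster this would be data stored in HDFS.
lines <- c("R on Hadoop", "Hadoop runs MapReduce", "R runs statistics")

# Map: split each line into lowercase words and emit (word, 1) pairs.
wc_map <- function(k, v) {
  words <- unlist(strsplit(tolower(v), "[^a-z]+"))
  words <- words[nzchar(words)]
  keyval(words, 1)
}

# Reduce: sum the counts collected for each word.
wc_reduce <- function(word, counts) {
  keyval(word, sum(counts))
}

result <- mapreduce(
  input  = to.dfs(lines),  # write the R vector to the (local) DFS
  map    = wc_map,
  reduce = wc_reduce
)

# Read the key-value pairs back into R as a named vector of counts.
out <- from.dfs(result)
word_counts <- setNames(values(out), keys(out))
```

On a real cluster the input would normally be an HDFS path combined with an input.format such as "text", rather than an in-memory vector passed through to.dfs.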

  • 14. How Text Mining Works with R and Hadoop
      • Lexical statistics: the study of measuring the frequency of words
      • Data mining techniques are used to identify relationships and patterns
      • Sentiment analysis is used to understand the underlying attitude
      • Tools like R and SAS offer statistical functionality
      • Handling large databases needs new technologies (Hadoop)
  • 15. Text Analysis Process
    Steps involved (figure labels): Information Extraction, Data Mining, Text Data Pre-Processing, Post Processing, Analysis
  • 16. Sentiment Analysis
    The study of analysing people’s opinions, sentiments, attitudes, appraisals, and evaluations.
      • Also known as opinion mining
      • An important component of text mining
      • Extracts opinion sentiment from end-user reviews
      • Sentiment is further classified as positive, negative, or neutral
  • 17. Parameters used in Sentiment Analysis
    The process of sentiment analysis involves classifying a given text on the basis of the following parameters:
      • Polarity, which can be positive, negative, or neutral
      • Emotional states, which can be sad, angry, or happy
      • Scaling system or numeric values
      • Subjectivity/objectivity
      • Features based on key entities, such as the durability of furniture, the screen size of a cell phone, the lens quality of a camera, etc.
  • 18. How Sentiment Analysis Works
    A Simple Sentiment Algorithm: this algorithm assigns a sentiment score by simply counting the number of occurrences of “positive” and “negative” words in a sentence.
    “I bought an iPhone few days back. It is really nice. The touch screen and voice quality are really cool. It is so better than my old Blackberry phone which was so hard to type with tiny keys. However iPhone is a bit expensive.”
      • Positive words: nice, cool, better
      • Negative words: hard, expensive
      • Sentence sentiment score: total positive - total negative (3 - 2 = 1)
      • Sentence sentiment polarity: positive
      • Overall score: sum of all sentence sentiment scores
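The counting scheme on this slide can be sketched in a few lines of base R. This is only a minimal illustration: the word lists are the ones named on the slide, while the function name score_sentence, the manual sentence split, and the rule for mapping the overall score to a polarity are assumptions made for the example.

```r
# Word lists taken from the slide.
positive_words <- c("nice", "cool", "better")
negative_words <- c("hard", "expensive")

# Hypothetical helper: score = number of positive words - number of negative words.
score_sentence <- function(sentence, pos = positive_words, neg = negative_words) {
  words <- unlist(strsplit(tolower(sentence), "[^a-z]+"))
  words <- words[nzchar(words)]
  sum(words %in% pos) - sum(words %in% neg)
}

# The review from the slide, split into sentences by hand.
review <- c(
  "I bought an iPhone few days back.",
  "It is really nice.",
  "The touch screen and voice quality are really cool.",
  "It is so better than my old Blackberry phone which was so hard to type with tiny keys.",
  "However iPhone is a bit expensive."
)

# Score every sentence, then sum the sentence scores for the overall score.
sentence_scores <- vapply(review, score_sentence, integer(1), USE.NAMES = FALSE)
overall_score <- sum(sentence_scores)

# Assumed rule: a positive total means positive polarity, a negative total negative.
polarity <- if (overall_score > 0) "positive" else if (overall_score < 0) "negative" else "neutral"
```

Counting over the whole review gives three positive and two negative words, so the overall score of 1 agrees with the 3 - 2 = 1 shown on the slide.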
  • 19. Process workflow of this Sentiment Analysis