Natural Language Processing
in R (rNLP)
Fridolin Wild, The Open University, UK
Tutorial to the Doctoral School
at the Institute of Business Informatics
of the Goethe University Frankfurt
Structure of this tutorial
• An introduction to R and cRunch
• Language basics in R
• Basic I/O in R
• Social Network Analysis
• Latent Semantic Analysis
• Twitter
• Sentiment
• (Advanced I/O in R: MySQL, SPARQL)
Introduction
cRunch
• is an infrastructure
• for computationally intensive learning
analytics
• supporting researchers
• in investigating big data
• generated in the co-construction of
knowledge
… and beyond
…
Architecture
(Thiele & Lehner, 2011)
[Architecture diagram: living reports, data shop, cron jobs, R web services]
Reports
Living reports
• reports with embedded
scripts and data
• knitr and Sweave
• render to html, PDF, …
• visualisations:
– ggplot2, trellis, graphix
– jpg, png, eps, pdf
png("n.png"); plot(network(m)); dev.off()  # plot into the PNG device, then close it to write the file (needs library(network))
• Fill-in-the-blanks:
Dropout rate went down to
<<echo=FALSE>>=
doquote["OU", "2011"]
@
\documentclass[a4paper]{article}
\title{Sweave Example 1}
\author{Friedrich Leisch}
\begin{document}
\maketitle
In this example we embed parts of the examples from the
\texttt{kruskal.test} help page into a \LaTeX{} document:
<<>>=
data(airquality)
## kruskal.test() now lives in package stats (formerly ctest), attached by default
kruskal.test(Ozone ~ Month, data = airquality)
@
which shows that the location parameter of the Ozone
distribution varies significantly from month to month. Finally we
include a boxplot of the data:
\begin{center}
<<fig=TRUE,echo=FALSE>>=
boxplot(Ozone ~ Month, data = airquality)
@
\end{center}
\end{document}
Example PDF report
Example html5 report
Example Report
=============
This is an example of embedded scripts and
data.
```{r}
a = "hello world"
print(a)
```
And here is an example of how to embed a chart.
```{r fig.width=7, fig.height=6}
plot( 5:20 )
```
Shiny Widgets (1)
• Widgets: use-case-sized
encapsulations
of mini apps
• HTML5
• Two files:
ui.R, server.R
• Still missing:
manifest files
(info.plist, config.xml)
Shiny Widgets (2)
From http://www.rstudio.com/shiny/
Web Services
harmonization &
data warehousing
Example R web service
cat("hello world\n")
More complex R web service
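# rApache handler: draw a plot into a temporary PNG file and stream it to the client
setContentType("image/png")      # response header: the body is a PNG image
a = c(1,3,5,12,13,15)
image_file = tempfile()
png(file=image_file)             # open a PNG graphics device writing to image_file
plot(a,
main = "The magic image",
ylab = "", xlab = "",
col = c("darkred", "darkblue", "darkgreen")
)
dev.off()                        # close the device so the file is complete
sendBin(readBin(image_file,'raw',n=file.info(image_file)$size))  # send the raw bytes
unlink(image_file)               # delete the temporary file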
R web services
• Uses the Apache module
mod_R.so
• See http://Rapache.net
• Common server functions:
– GET and POST variables
– setContentType
– sendBin
– …
A word on memory mgmt.
• Advanced memory management
(see p. 70 of Dietl's diploma thesis):
– Use package bigmemory
(for shared memory across
threads)
– Use package Rserve (for shared
read-only access across threads)
– Swap out memory objects with
save() and load()
– The latter is typically sufficient
(hard disks are fast!)
• Data management abstraction
layer for mod_R.so:
configure the handler in httpd.conf,
specify a directory match and load specific
data management routines at start-up:
REvalOnStartup
"source('/dbal.R');"
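A minimal sketch of the save()/load() swap described above, in base R. The object name big_matrix and the temporary file are illustrative assumptions, not part of cRunch:

```r
# Swap a large object out of memory with save() and bring it back with load().
# 'big_matrix' and the temporary file are illustrative only.
big_matrix <- matrix(runif(1e6), nrow = 1000)   # roughly 8 MB of doubles
swap_file <- tempfile(fileext = ".rda")

save(big_matrix, file = swap_file)   # write the object to disk
rm(big_matrix)                       # drop it from the workspace
invisible(gc())                      # let R reclaim the memory

loaded <- load(swap_file)            # restores 'big_matrix' and returns the names it restored
print(loaded)
dim(big_matrix)
```

load() returns the restored object names invisibly, which makes it easy to check what came back from disk.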
Harvesting
data acquisition
Job scheduling
• crontab entries for R webservices
• e.g. harvest feeds
• e.g. store in local DB
data shop
sharing
Data shop and the community
• You have a 'public/' folder :)
– 'public/data': save() any .rda file and
it will be indexed within the hour
– 'public/services': use this to execute
your scripts; indexed within the hour
– 'public/gallery': use this to store
your public visualisations
– code sharing: the source of any .R script
in your 'public/' folder is readable from
the web
Not covered
The useful pointer
More NLP packages
install.packages("ctv")
library("ctv")
install.views("NaturalLanguageProcessing")
RStudio: exploratory programming
Social Network Analysis
The Idea
The basic concept
• Precursors date back to the 1920s, the mathematics to
Euler's 'Seven Bridges of Königsberg'
• Social networks consist of:
• Actors (people, groups, media, tags, …)
• Ties (interactions, relationships, …)
• Actors and ties form a graph
• The graph has measurable structural
properties
• Betweenness,
• Degree centrality,
• Density,
• Cohesion
• Structural patterns
Forum Messages
message_id forum_id parent_id author
130 2853483 2853445 N 2043
131 1440740 785876 N 1669
132 2515257 2515256 N 5814
133 4704949 4699874 N 5810
134 2597170 2558273 N 2054
135 2316951 2230821 N 5095
136 3407573 3407568 N 36
137 2277393 2277387 N 359
138 3394136 3382201 N 1050
139 4603931 4167338 N 453
140 6234819 6189254 6231352 5400
141 806699 785877 804668 2177
142 4430290 3371246 3380313 48
143 3395686 3391024 3391129 35
144 6270213 6024351 6265378 5780
145 2496015 2491522 2491536 2774
146 4707562 4699873 4707502 5810
147 2574199 2440094 2443801 5801
148 4501993 4424215 4491650 5232
message_id forum_id parent_id author
60 734569 31117 N 2491
221 762702 31117 1
317 762717 31117 762702 1927
1528 819660 31117 793408 1197
1950 840406 31117 839998 1348
1047 841810 31117 767386 1879
2239 862709 31117 N 1982
2420 869839 31117 862709 2038
2694 884824 31117 N 5439
2503 896399 31117 862709 1982
2846 901691 31117 895022 992
3321 951376 31117 N 5174
3384 952895 31117 951376 1597
1186 955595 31117 767386 5724
3604 958065 31117 N 716
2551 960734 31117 862709 1939
4072 975816 31117 N 584
2574 986038 31117 862709 2043
2590 987842 31117 862709 1982
Incidence Matrix
• Each msg_id is an incident; authors appear in incidents
Derive Adjacency Matrix
• adjacency matrix = t(im) %*% im
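As a small illustration of this product, assuming im has one row per message (incident) and one column per author; the toy data and author names are hypothetical:

```r
# Hypothetical incidence matrix: one row per message (incident), one column per author;
# a 1 means the author took part in that incident
im <- matrix(c(1, 1, 0, 0,
               0, 1, 1, 0,
               1, 0, 1, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("msg1", "msg2", "msg3"),
                             c("ann", "bob", "cat", "dan")))

# Author-by-author matrix: cell [i, j] counts the incidents authors i and j share;
# the diagonal holds each author's own number of incidents
am <- t(im) %*% im
am
```

crossprod(im) computes the same matrix and is usually faster. The diagonal is typically ignored (or set to zero) before drawing the sociogram.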
Visualization: Sociogram
Degree
Betweenness
Network Density
• Total edges = 29
• Possible edges = 18 * (18 - 1) / 2 = 153
• Density = 0.19
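Degree and density can be computed directly from an adjacency matrix in base R. The sketch below uses a hypothetical four-author network and re-checks the slide's arithmetic; packages such as igraph provide ready-made degree(), betweenness() and edge_density() functions, which this sketch does not use.

```r
# Hypothetical undirected author network: ann-bob, bob-cat, bob-dan
authors <- c("ann", "bob", "cat", "dan")
adj <- matrix(0, 4, 4, dimnames = list(authors, authors))
ties <- rbind(c("ann", "bob"), c("bob", "cat"), c("bob", "dan"))
adj[ties] <- 1            # index cells by (row name, column name) pairs
adj[ties[, 2:1]] <- 1     # mirror each tie: the graph is undirected

degree <- function(a) {
  a <- a > 0
  diag(a) <- FALSE        # self-loops (e.g. from t(im) %*% im) do not count
  rowSums(a)
}

network_density <- function(a) {
  n <- nrow(a)
  edges <- sum(a[upper.tri(a)] > 0)   # each undirected tie counted once
  edges / (n * (n - 1) / 2)           # observed ties / possible ties
}

degree(adj)                   # ann 1, bob 3, cat 1, dan 1
network_density(adj)          # 3 / 6 = 0.5
round(29 / (18 * 17 / 2), 2)  # the slide's figures: 29 of 153 possible edges
```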
kmeans Cluster (k=3)
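The slide does not say what the k-means clustering was run on; one plausible reading, sketched here under that assumption, clusters authors by their rows of the adjacency matrix. The nine-author matrix is hypothetical:

```r
set.seed(42)                      # kmeans() starts from random centres
# Hypothetical adjacency matrix: 9 authors in three tightly knit groups
authors <- LETTERS[1:9]
adj <- matrix(0, 9, 9, dimnames = list(authors, authors))
groups <- list(1:3, 4:6, 7:9)
for (g in groups) adj[g, g] <- 1  # everyone in a group is tied to everyone else
diag(adj) <- 0

# Each author is described by their row of ties; cluster those profiles into k = 3 groups
km <- kmeans(adj, centers = 3, nstart = 10)
km$cluster
```

nstart = 10 reruns the algorithm from several random starts and keeps the best solution, which makes the result less sensitive to the seed.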
Analysis
• Mix
• Match
• Optimise
Tutorials
• Starter: sna-simple.Rmd
• Real: sna-blog.Rmd
• Advanced: sna-forum.Rmd
Latent Semantic Analysis
Latent Semantic Analysis
• “Humans learn word meanings and how to combine
them into passage meaning through experience
with ~paragraph unitized verbal environments.”
• “They don't remember all the separate words of a
passage; they remember its overall gist or
meaning.”
• “LSA learns by 'reading' ~paragraph unitized
texts that represent the environment.”
• “It doesn't remember all the separate words of a
text; it remembers its overall gist or meaning.”
(Landauer, 2007)
Word choice is over-rated
• An educated adult understands ~100,000 word forms
• An average sentence contains 20 tokens
• Thus there are $100{,}000^{20}$ possible combinations of words in a
sentence
• i.e. a maximum of $\log_2 100{,}000^{20} \approx 332$ bits in word choice alone
• $20! \approx 2.4 \times 10^{18}$ possible orders of 20 words,
i.e. a maximum of $\log_2 20! \approx 61$ bits from the order of the words
• $332/(61 + 332) \approx 84\%$ word choice
(Landauer, 2007)
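The figures above can be re-checked in R; log2(20!) is computed via lfactorial() so the calculation stays on the log scale:

```r
word_choice_bits <- 20 * log2(100000)         # log2(100000^20)
word_order_bits  <- lfactorial(20) / log(2)   # log2(20!), via the natural-log factorial
share <- word_choice_bits / (word_choice_bits + word_order_bits)
round(c(choice = word_choice_bits, order = word_order_bits, share = share), 2)
```

This gives roughly 332 bits for word choice, 61 bits for word order and a share of about 0.84.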
LSA (2)
• Assumption: texts have a semantic structure
• However, this structure is obscured by word
usage (noise, synonymy, polysemy, …)
• Proposed LSA Solution:
– map doc-term matrix
– using conceptual indices
– derived statistically (truncated SVD)
– and make similarity comparisons using angles
Input (e.g., documents): [figure: term-by-document matrix M]
Deerwester, Dumais, Furnas, Landauer, and Harshman (1990):
Indexing by Latent Semantic Analysis, In: Journal of the American
Society for Information Science, 41(6):391-407
Only the red terms appear in more
than one document, so strip the rest.
term = feature
vocabulary = ordered set of features
TEXTMATRIX
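A base-R sketch of building such a term-by-document matrix and stripping terms that occur in only one document. The four toy documents are hypothetical, and this is an illustration rather than the lsa package's textmatrix(), which reads documents from files:

```r
# Hypothetical toy documents
docs <- c(d1 = "Human machine interface computer",
          d2 = "User computer system response time",
          d3 = "Graph trees minors",
          d4 = "Graph minors survey")

tokens <- strsplit(tolower(docs), "[^a-z]+")   # crude tokenisation on non-letters
vocab  <- sort(unique(unlist(tokens)))          # vocabulary = ordered set of features

# Term-by-document counts: one row per term, one column per document
tm <- sapply(tokens, function(tk) tabulate(match(tk, vocab), nbins = length(vocab)))
dimnames(tm) <- list(vocab, names(docs))

# Keep only terms that appear in more than one document
tm <- tm[rowSums(tm > 0) > 1, , drop = FALSE]
tm
```

Only "computer", "graph" and "minors" survive the filter here.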
Singular Value Decomposition
$M = T S D^\top$
Truncated SVD
latent-semantic space
Reconstructed, Reduced Matrix
m4: Graph minors: A survey
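The lsa package's lsa() computes the truncated factors for you; the base-R sketch below only illustrates the arithmetic on a hypothetical random count matrix, keeping k = 2 dimensions (an arbitrary choice):

```r
set.seed(1)
# Hypothetical 12-term x 9-document count matrix
M <- matrix(rpois(12 * 9, lambda = 1), nrow = 12)

s <- svd(M)            # M = T S D': s$u = T, s$d = diagonal of S, s$v = D
k <- 2                 # number of latent dimensions kept (illustrative)
Tk <- s$u[, 1:k]
Sk <- diag(s$d[1:k])
Dk <- s$v[, 1:k]

Mk <- Tk %*% Sk %*% t(Dk)   # reconstructed, reduced (rank-k) matrix
dim(Mk)                     # same shape as M
```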
Similarity in a Latent-Semantic Space
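[Figure: a query and two target vectors in a two-dimensional latent-semantic space (X and Y dimensions); similarity is read off the angles between the query and each target]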
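To make the angle comparison concrete, here is a minimal sketch with hypothetical two-dimensional coordinates for the query and the two targets; the smaller angle marks the more similar target:

```r
# Cosine of the angle between two vectors
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
angle_deg <- function(a, b) acos(cosine(a, b)) * 180 / pi

# Hypothetical positions in a two-dimensional latent-semantic space (X, Y)
query   <- c(1, 1)
target1 <- c(2, 1.5)
target2 <- c(-1, 2)

round(c(target1 = angle_deg(query, target1),
        target2 = angle_deg(query, target2)), 1)
```

Here target1 lies about 8 degrees from the query and target2 about 72 degrees, so target1 is the closer match.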
doc2doc similarities
• Unreduced = pure vector space model
– based on $M = T S D^\top$
– Pearson correlation over document vectors
• Reduced
– based on $M_2 = T S_2 D^\top$
– Pearson correlation over document vectors
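A minimal base-R sketch of both variants on a hypothetical count matrix: cor() on a matrix correlates its columns, i.e. the document vectors:

```r
set.seed(2)
# Hypothetical 12-term x 6-document count matrix
M <- matrix(rpois(12 * 6, lambda = 1), nrow = 12)

s  <- svd(M)
# M_2 = T S_2 D': all but the first two singular values set to zero
M2 <- s$u[, 1:2] %*% diag(s$d[1:2]) %*% t(s$v[, 1:2])

unreduced <- cor(M)    # Pearson correlation between document (column) vectors of M
reduced   <- cor(M2)   # the same, after reduction to two latent dimensions
round(reduced, 2)
```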
Ex Post Updating: Folding-In
• SVD factor stability
– SVD calculates factors over a given text base
– Different texts – different factors
– Challenge: avoid unwanted factor changes
(e.g., bad essays)
– Solution: folding-in of essays instead of recalculating
• SVD is computationally expensive
Folding-In in Detail
(1) convert the original vector $v$ to "$D_k$" format: $d_i = v^\top T_k S_k^{-1}$
(2) convert the "$D_k$"-format vector to "$M_k$" format: $m_i = T_k S_k d_i^\top$
with $M_k = T_k S_k D_k^\top$
(Berry et al., 1995)
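The lsa package's fold_in() performs this step for whole textmatrices. The sketch below spells out the two conversion steps (1) and (2) for a single vector in base R, using a hypothetical random matrix, and checks that folding in a document already in the matrix returns its column of M_k:

```r
set.seed(3)
# Hypothetical 10-term x 6-document count matrix and a k = 3 latent space
M <- matrix(rpois(10 * 6, lambda = 2), nrow = 10)
k <- 3
s  <- svd(M)
Tk <- s$u[, 1:k]
Sk <- diag(s$d[1:k])
Dk <- s$v[, 1:k]
Mk <- Tk %*% Sk %*% t(Dk)

fold_in_vector <- function(v, Tk, Sk) {
  d <- t(v) %*% Tk %*% solve(Sk)   # (1) original vector -> "Dk" format (1 x k)
  Tk %*% Sk %*% t(d)               # (2) "Dk" format -> "Mk" format (terms x 1)
}

new_doc <- rpois(10, lambda = 2)   # a hypothetical new document, e.g. an essay
folded  <- fold_in_vector(new_doc, Tk, Sk)
```

The SVD is not recomputed, so the factors stay stable no matter what is folded in.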
LSA Process & Driving Parameters
4 x 12 x 7 x 2 x 3
= 2016 Combinations
Pre-Processing
• Stemming
– Porter Stemmer (snowball.tartarus.org)
– 'move', 'moving', 'moves' => 'move'
– in German even more important (more inflections)
• Stop Word Elimination
– 373 Stop Words in German
• Stemming plus Stop Word Elimination
• Unprocessed ('raw') Terms
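Stop word elimination amounts to filtering the token stream against a list. The six-word German list below is a hypothetical excerpt, not the 373-word list from the slide. For Porter stemming in R, the SnowballC package's wordStem() is one option, and lsa's textmatrix() also offers stemming and stop-word options (check its documentation for the exact arguments); neither is used here, to keep the sketch dependency-free.

```r
# Hypothetical excerpt of a German stop word list
stoplist <- c("der", "die", "das", "und", "in", "ist")

tokens <- c("die", "eu", "und", "jahr", "ist", "in", "wien")
kept <- tokens[!tokens %in% stoplist]   # drop every token found in the stop list
kept
```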
Term Weighting Schemes
• Global Weights (GW)
– None ('raw' tf)
– Normalisation: $\mathrm{norm}_i = \dfrac{1}{\sqrt{\sum_j tf_{ij}^2}}$
– Inverse Document Frequency (IDF): $\mathrm{idf}_i = \log_2\left(\dfrac{\mathrm{numdocs}}{\mathrm{docfreq}_i}\right) + 1$
– 1 + Entropy: $\mathrm{entplusone}_i = 1 + \sum_j \dfrac{p_{ij} \log p_{ij}}{\log \mathrm{numdocs}}$, where $p_{ij} = \dfrac{tf_{ij}}{\sum_j tf_{ij}}$
• $\mathrm{weight}_{ij} = lw(tf_{ij}) \cdot gw(tf_{ij})$
• Local Weights (LW)
– None ('raw' tf)
– Binary Term Frequency
– Logarithmized Term Frequency (log)
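A base-R sketch of these weights on a hypothetical 3 x 3 term-by-document matrix. The lsa package ships functions such as lw_logtf() and gw_idf() for this; the function names below are hypothetical, and log(tf + 1) is an assumed form of the logarithmic local weight:

```r
# Hypothetical term-by-document counts
tm <- matrix(c(2, 0, 1,
               1, 1, 1,
               0, 3, 0), nrow = 3, byrow = TRUE,
             dimnames = list(c("t1", "t2", "t3"), c("d1", "d2", "d3")))

# Local weights (element-wise)
lw_bin <- function(m) (m > 0) * 1
lw_log <- function(m) log(m + 1)   # assumed form of the log local weight

# Global weights (one value per term, i.e. per row)
gw_norm <- function(m) 1 / sqrt(rowSums(m^2))
gw_idf  <- function(m) log2(ncol(m) / rowSums(m > 0)) + 1
gw_ent  <- function(m) {
  p <- m / rowSums(m)                     # p_ij = tf_ij / sum_j tf_ij
  plogp <- ifelse(p > 0, p * log(p), 0)   # treat 0 * log(0) as 0
  1 + rowSums(plogp) / log(ncol(m))
}

# weight_ij = lw(tf_ij) * gw_i; the per-row global weight is recycled down each column
weighted <- lw_log(tm) * gw_idf(tm)
round(weighted, 3)
```

A term spread evenly over all documents (t2) gets an entropy weight of 0, while a term concentrated in one document (t3) gets 1.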
SVD-Dimensionality
• Many different proposals (see package)
• 80% variance is a good estimator
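A minimal sketch of the 80% variance criterion, treating the variance captured by each dimension as proportional to its squared singular value. The lsa package offers several dimension estimators (e.g. dimcalc_share(), used in the core workflow), which may use a different criterion; the function name here is hypothetical:

```r
# Smallest k whose first k singular values capture at least `share` of the variance
dims_for_variance <- function(d, share = 0.8) {
  explained <- cumsum(d^2) / sum(d^2)   # cumulative share of squared singular values
  which(explained >= share)[1]
}

d <- c(10, 6, 3, 1)       # hypothetical singular values, e.g. svd(M)$d
dims_for_variance(d)      # 2: (100 + 36) / 146 is about 93%
```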
Proximity Measures
• Pearson Correlation
• Cosine Correlation
• Spearman's Rho
pics: http://davidmlane.com/hyperstat/A62891.html
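The three measures can disagree on the same pair of vectors. A sketch with two hypothetical document vectors: Pearson is the cosine of the mean-centred vectors, and Spearman is Pearson on ranks:

```r
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Hypothetical weights of two documents over five terms
x <- c(1, 2, 3, 4, 10)
y <- c(2, 4, 6, 8, 9)

c(pearson  = cor(x, y),                        # linear association of centred values
  cosine   = cosine(x, y),                     # angle between the raw vectors
  spearman = cor(x, y, method = "spearman"))   # Pearson on ranks: 1 here, as both orderings agree
```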
Pair-wise dis/similarity
Convergence expected: ‘eu’, ‘österreich’ Divergence expected: ‘jahr’, ‘wien’
The Package
• Available via CRAN, e.g.:
http://cran.r-project.org/web/packages/lsa/index.html
• Higher-level Abstraction to Ease Use
– Core methods:
textmatrix() / query()
lsa()
fold_in()
as.textmatrix()
– Support methods for term weighting, dimensionality
calculation, correlation measurement, …
Core Workflow
• tm = textmatrix("dir/")
• tm = lw_logtf(tm) *
gw_idf(tm)
• space = lsa(tm,
dims=dimcalc_share())
• tm3 = fold_in(tm, space)
• as.textmatrix(space)
Pre-Processing Chain
Tutorials
• Starter: lsa-indexing.Rmd
• Real: lsa-essayscoring.Rmd
• Advanced: lsa-sparse.Rmd
Additional tutorials
Tutorials
• Advanced I/O: twitter.Rmd
• Advanced I/O: sparql.Rmd
• Advanced NLP: twitter-sentiment.Rmd
• Evaluation: interrater-agreement.Rmd

ARLEM draft spec - overviewfridolin.wild
 
AR community meeting, Seoul, Korea, October 6, 2015
AR community meeting, Seoul, Korea, October 6, 2015AR community meeting, Seoul, Korea, October 6, 2015
AR community meeting, Seoul, Korea, October 6, 2015fridolin.wild
 
IEEE p1589 'ARLEM' virtual meeting, September 9, 2015
IEEE p1589 'ARLEM' virtual meeting, September 9, 2015IEEE p1589 'ARLEM' virtual meeting, September 9, 2015
IEEE p1589 'ARLEM' virtual meeting, September 9, 2015fridolin.wild
 
IEEE p1589 'ARLEM' virtual meeting, July 8, 2015
IEEE p1589 'ARLEM' virtual meeting, July 8, 2015IEEE p1589 'ARLEM' virtual meeting, July 8, 2015
IEEE p1589 'ARLEM' virtual meeting, July 8, 2015fridolin.wild
 
Special Interest Group on Wearables-Enhanced Leanring (SIG WELL)
Special Interest Group on Wearables-Enhanced Leanring (SIG WELL)Special Interest Group on Wearables-Enhanced Leanring (SIG WELL)
Special Interest Group on Wearables-Enhanced Leanring (SIG WELL)fridolin.wild
 
IEEE augmented reality learning experience model (ARLEM)
IEEE augmented reality learning experience model (ARLEM)IEEE augmented reality learning experience model (ARLEM)
IEEE augmented reality learning experience model (ARLEM)fridolin.wild
 
Learning from meaningful, purposive interaction
Learning from meaningful, purposive interactionLearning from meaningful, purposive interaction
Learning from meaningful, purposive interactionfridolin.wild
 
IEEE augmented reality learning experience model (ARLEM)
IEEE augmented reality learning experience model (ARLEM)IEEE augmented reality learning experience model (ARLEM)
IEEE augmented reality learning experience model (ARLEM)fridolin.wild
 
ARgh! kinesthetic learning
ARgh! kinesthetic learningARgh! kinesthetic learning
ARgh! kinesthetic learningfridolin.wild
 
Lab rats-and-the-moral-maze-v2
Lab rats-and-the-moral-maze-v2Lab rats-and-the-moral-maze-v2
Lab rats-and-the-moral-maze-v2fridolin.wild
 
Quantifying reflection
Quantifying reflectionQuantifying reflection
Quantifying reflectionfridolin.wild
 
What if...? Technology and Knowledge in the University of the Future
What if...? Technology and Knowledge in the University of the FutureWhat if...? Technology and Knowledge in the University of the Future
What if...? Technology and Knowledge in the University of the Futurefridolin.wild
 
The Grand Research Challenges for TEL. A shortlist.
The Grand Research Challenges for TEL. A shortlist.The Grand Research Challenges for TEL. A shortlist.
The Grand Research Challenges for TEL. A shortlist.fridolin.wild
 

Plus de fridolin.wild (20)

Performance Augmentation (Keynote, SIG LT, XR4ALL)
Performance Augmentation (Keynote, SIG LT, XR4ALL)Performance Augmentation (Keynote, SIG LT, XR4ALL)
Performance Augmentation (Keynote, SIG LT, XR4ALL)
 
Professional TEL 4.0: Performance Augmentation for Industry 4.0
Professional TEL 4.0: Performance Augmentation for Industry 4.0 Professional TEL 4.0: Performance Augmentation for Industry 4.0
Professional TEL 4.0: Performance Augmentation for Industry 4.0
 
Performance Augmentation
Performance AugmentationPerformance Augmentation
Performance Augmentation
 
Reality As A Knowledge Medium
Reality As A Knowledge MediumReality As A Knowledge Medium
Reality As A Knowledge Medium
 
ARLEM draft spec - overview
ARLEM draft spec - overviewARLEM draft spec - overview
ARLEM draft spec - overview
 
AR community meeting, Seoul, Korea, October 6, 2015
AR community meeting, Seoul, Korea, October 6, 2015AR community meeting, Seoul, Korea, October 6, 2015
AR community meeting, Seoul, Korea, October 6, 2015
 
IEEE p1589 'ARLEM' virtual meeting, September 9, 2015
IEEE p1589 'ARLEM' virtual meeting, September 9, 2015IEEE p1589 'ARLEM' virtual meeting, September 9, 2015
IEEE p1589 'ARLEM' virtual meeting, September 9, 2015
 
IEEE p1589 'ARLEM' virtual meeting, July 8, 2015
IEEE p1589 'ARLEM' virtual meeting, July 8, 2015IEEE p1589 'ARLEM' virtual meeting, July 8, 2015
IEEE p1589 'ARLEM' virtual meeting, July 8, 2015
 
Special Interest Group on Wearables-Enhanced Leanring (SIG WELL)
Special Interest Group on Wearables-Enhanced Leanring (SIG WELL)Special Interest Group on Wearables-Enhanced Leanring (SIG WELL)
Special Interest Group on Wearables-Enhanced Leanring (SIG WELL)
 
IEEE augmented reality learning experience model (ARLEM)
IEEE augmented reality learning experience model (ARLEM)IEEE augmented reality learning experience model (ARLEM)
IEEE augmented reality learning experience model (ARLEM)
 
Learning from meaningful, purposive interaction
Learning from meaningful, purposive interactionLearning from meaningful, purposive interaction
Learning from meaningful, purposive interaction
 
Reality as a Medium
Reality as a MediumReality as a Medium
Reality as a Medium
 
IEEE augmented reality learning experience model (ARLEM)
IEEE augmented reality learning experience model (ARLEM)IEEE augmented reality learning experience model (ARLEM)
IEEE augmented reality learning experience model (ARLEM)
 
ARgh! kinesthetic learning
ARgh! kinesthetic learningARgh! kinesthetic learning
ARgh! kinesthetic learning
 
learning by doing.
learning by doing.learning by doing.
learning by doing.
 
Lab rats-and-the-moral-maze-v2
Lab rats-and-the-moral-maze-v2Lab rats-and-the-moral-maze-v2
Lab rats-and-the-moral-maze-v2
 
Quantifying reflection
Quantifying reflectionQuantifying reflection
Quantifying reflection
 
What if...? Technology and Knowledge in the University of the Future
What if...? Technology and Knowledge in the University of the FutureWhat if...? Technology and Knowledge in the University of the Future
What if...? Technology and Knowledge in the University of the Future
 
Widget- based PLEs
Widget-based PLEsWidget-based PLEs
Widget- based PLEs
 
The Grand Research Challenges for TEL. A shortlist.
The Grand Research Challenges for TEL. A shortlist.The Grand Research Challenges for TEL. A shortlist.
The Grand Research Challenges for TEL. A shortlist.
 

Dernier

So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Karmanjay Verma
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 

Dernier (20)

So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 

Natural Language Processing in R (rNLP)

  • 1. Natural Language Processing in R (rNLP) Fridolin Wild, The Open University, UK Tutorial to the Doctoral School at the Institute of Business Informatics of the Goethe University Frankfurt
  • 2. Structure of this tutorial • An introduction to R and cRunch • Language basics in R • Basic I/O in R • Social Network Analysis • Latent Semantic Analysis • Twitter • Sentiment • (Advanced I/O in R: MySQL, SparQL)
  • 4. cRunch • is an infrastructure • for computationally intensive learning analytics • supporting researchers • in investigating big data • generated in the co-construction of knowledge … and beyond …
  • 6. Architecture (Thiele & Lehner, 2011): living reports, data shop, cron jobs, R web services
  • 8. Living reports
    • reports with embedded scripts and data
    • knitr and Sweave
    • render to HTML, PDF, …
    • visualisations:
      – ggplot2, trellis, graphix
      – jpg, png, eps, pdf
      – png(file = "n.png"); plot(network(m)); dev.off()
    • Fill-in-the-blanks: "Drop-out rate went down to" followed by the chunk
<<echo=FALSE>>=
doquote["OU", "2011"]
@
    • Sweave example (Friedrich Leisch):
\documentclass[a4paper]{article}
\title{Sweave Example 1}
\author{Friedrich Leisch}
\begin{document}
\maketitle
In this example we embed parts of the examples from the
\texttt{kruskal.test} help page into a \LaTeX{} document:
<<>>=
data(airquality)
# kruskal.test() is in the stats package (formerly ctest), attached by default
kruskal.test(Ozone ~ Month, data = airquality)
@
which shows that the location parameter of the Ozone
distribution varies significantly from month to month. Finally we
include a boxplot of the data:
\begin{center}
<<fig=TRUE,echo=FALSE>>=
boxplot(Ozone ~ Month, data = airquality)
@
\end{center}
\end{document}
  • 10. Example html5 report
Example Report
=============
This is an example of embedded scripts and data.
```{r}
a <- "hello world"
print(a)
```
And here is an example of how to embed a chart.
```{r fig.width=7, fig.height=6}
plot(5:20)
```
  • 11. Shiny Widgets (1) • Widgets: use-case sized encapsulations of mini apps • HTML5 • Two files: ui.R, server.R • Still missing: manifest files (info.plist, config.xml)
  • 12. Shiny Widgets (2) From http://www.rstudio.com/shiny/
  • 14. Example R web service: cat("hello world")
  • 15. More complex R web service
setContentType("image/png")          # rApache: declare the response type
a <- c(1, 3, 5, 12, 13, 15)
image_file <- tempfile()
png(file = image_file)               # render the plot into a temporary PNG file
plot(a, main = "The magic image", ylab = "", xlab = "",
     col = c("darkred", "darkblue", "darkgreen"))
dev.off()
sendBin(readBin(image_file, "raw", n = file.info(image_file)$size))  # stream the raw bytes
unlink(image_file)                   # remove the temporary file
  • 16. R web services • Uses the Apache module mod_R.so • See http://Rapache.net • Common server functions: – GET and POST variables – setContentType – sendBin – …
  • 17. A word on memory mgmt. • Advanced memory management (see p. 70 of the Dietl diploma thesis): – Use package bigmemory (for shared memory across threads) – Use package Rserve (for shared read-only access across threads) – Swap out memory objects with save() and load() – The latter is typically sufficient (hard disks are fast!) • Data management abstraction layer for mod_R.so: configure a handler in httpd.conf, specify a directory match, and load specific data management routines at start-up: REvalOnStartup "source('/dbal.R');"
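The save()/load() swap-out mentioned on this slide can be sketched in base R. `swap_out()` and `swap_in()` are hypothetical helper names, and writing to `tempdir()` is an assumption made so the sketch runs anywhere; on a server you would point `dir` at persistent storage.

```r
# Hypothetical helpers: park a large object on disk and drop it from memory,
# then restore it later under its original name with load().
swap_out <- function(name, envir = globalenv(), dir = tempdir()) {
  path <- file.path(dir, paste0(name, ".rda"))
  save(list = name, envir = envir, file = path)  # serialise the object
  rm(list = name, envir = envir)                 # remove the binding
  invisible(gc())                                # let R reclaim the memory
  invisible(path)
}

swap_in <- function(path, envir = globalenv()) {
  load(path, envir = envir)  # recreates the object in `envir`
}

# Usage
big <- matrix(runif(1e6), nrow = 1000)
path <- swap_out("big")
exists("big")   # FALSE: the object now lives only on disk
swap_in(path)
dim(big)
```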
  • 19. Job scheduling • crontab entries for R webservices • e.g. harvest feeds • e.g. store in local DB
  • 21. Data shop and the community • You have a 'public/' folder :) – 'public/data': save() any .rda file and it will be indexed within the hour – 'public/services': use this to execute your scripts; indexed within the hour – 'public/gallery': use this to store your public visualisations – code sharing: any .R script in your 'public/' folder is readable as source over the web
  • 26. Social Network Analysis Fridolin Wild, The Open University, UK
  • 30. The basic concept • Precursors date back to the 1920s, the math to Euler's 'Seven Bridges of Koenigsberg' • Social networks are: • Actors (people, groups, media, tags, …) • Ties (interactions, relationships, …) • Actors and ties form a graph • The graph has measurable structural properties • Betweenness, • Degree centrality, • Density, • Cohesion • Structural patterns
  • 31. Forum Messages
    Excerpt 1:
          message_id  forum_id  parent_id  author
     130     2853483   2853445          N    2043
     131     1440740    785876          N    1669
     132     2515257   2515256          N    5814
     133     4704949   4699874          N    5810
     134     2597170   2558273          N    2054
     135     2316951   2230821          N    5095
     136     3407573   3407568          N      36
     137     2277393   2277387          N     359
     138     3394136   3382201          N    1050
     139     4603931   4167338          N     453
     140     6234819   6189254    6231352    5400
     141      806699    785877     804668    2177
     142     4430290   3371246    3380313      48
     143     3395686   3391024    3391129      35
     144     6270213   6024351    6265378    5780
     145     2496015   2491522    2491536    2774
     146     4707562   4699873    4707502    5810
     147     2574199   2440094    2443801    5801
     148     4501993   4424215    4491650    5232
    Excerpt 2 (forum_id 31117):
          message_id  forum_id  parent_id  author
      60      734569     31117          N    2491
     221      762702     31117          1
     317      762717     31117     762702    1927
    1528      819660     31117     793408    1197
    1950      840406     31117     839998    1348
    1047      841810     31117     767386    1879
    2239      862709     31117          N    1982
    2420      869839     31117     862709    2038
    2694      884824     31117          N    5439
    2503      896399     31117     862709    1982
    2846      901691     31117     895022     992
    3321      951376     31117          N    5174
    3384      952895     31117     951376    1597
    1186      955595     31117     767386    5724
    3604      958065     31117          N     716
    2551      960734     31117     862709    1939
    4072      975816     31117          N     584
    2574      986038     31117     862709    2043
    2590      987842     31117     862709    1982
  • 32. Incidence Matrix • msg_id = incident, authors appear in incidents
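A minimal base-R sketch of this step, using those replies in the forum 31117 excerpt (slide 31) whose parent post is also in the excerpt. It assumes an author "appears" in an incident (a reply's msg_id) if they wrote either the reply or the post it answers; that rule and all variable names are illustrative assumptions, not necessarily the tutorial's exact definition.

```r
# Messages from the forum 31117 excerpt (NA = no parent, i.e. a thread start)
msgs <- data.frame(
  message_id = c(862709, 869839, 896399, 960734, 986038, 987842, 951376, 952895),
  parent_id  = c(NA, 862709, 862709, 862709, 862709, 862709, NA, 951376),
  author     = c(1982, 2038, 1982, 1939, 2043, 1982, 5174, 1597)
)

# Every reply is an incident; its author and the parent post's author appear in it
replies <- msgs[!is.na(msgs$parent_id), ]
parent_author <- msgs$author[match(replies$parent_id, msgs$message_id)]
incidents <- unique(data.frame(
  msg_id = rep(replies$message_id, 2),
  author = c(replies$author, parent_author)
))

# Incidence matrix: authors (rows) x incidents (columns), 1 = author appears
im <- unclass(table(author = incidents$author, msg_id = incidents$msg_id))

# Project onto authors: two authors are tied if they share an incident
adj <- im %*% t(im)
diag(adj) <- 0
adj
```

The resulting symmetric `adj` could then be handed to a network package, for example `network::network(adj, directed = FALSE)`, for plotting and further measures.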
  • 37. Network Density • Total edges = 29 • Possible edges = 18 * (18-1)/2 = 153 • Density = 29/153 ≈ 0.19
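The density arithmetic on this slide (29 observed ties out of 153 possible ones among 18 actors in an undirected graph) as a small base-R helper. The function names are illustrative, and the adjacency-matrix variant assumes a symmetric matrix with an empty diagonal.

```r
# Density of an undirected simple graph: observed edges / possible edges
graph_density <- function(n_edges, n_nodes) {
  n_edges / (n_nodes * (n_nodes - 1) / 2)
}

# Same measure from a symmetric adjacency matrix (each tie counted once)
density_from_adj <- function(adj) {
  graph_density(sum(adj[upper.tri(adj)] != 0), nrow(adj))
}

round(graph_density(29, 18), 2)  # 0.19, as on the slide

# A 4-node path a-b-c-d has 3 of 6 possible ties
path <- matrix(0, 4, 4)
path[cbind(1:3, 2:4)] <- 1
path <- path + t(path)
density_from_adj(path)           # 0.5
```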
  • 40. Tutorials • Starter: sna-simple.Rmd • Real: sna-blog.Rmd • Advanced: sna-forum.Rmd
  • 41. Latent Semantic Analysis Fridolin Wild, The Open University, UK
  • 42. Latent Semantic Analysis • “Humans learn word meanings and how to combine them into passage meaning through experience with ~paragraph unitized verbal environments.” • “They don't remember all the separate words of a passage; they remember its overall gist or meaning.” • “LSA learns by 'reading' ~paragraph unitized texts that represent the environment.” • “It doesn't remember all the separate words of a text; it remembers its overall gist or meaning.” (Landauer, 2007)
  • 43. Word choice is over-rated • An educated adult understands ~100,000 word forms • An average sentence contains 20 tokens • Thus there are 100,000^20 possible combinations of words in a sentence • a maximum of log2(100,000^20) = 332 bits in word choice alone • 20! = 2.4 × 10^18 possible orders of 20 words = a maximum of 61 bits from the order of the words • 332/(61 + 332) = 84% word choice (Landauer, 2007)
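The slide's back-of-the-envelope numbers can be re-derived in base R, taking the vocabulary size and sentence length as stated above (Landauer, 2007). `lfactorial()` returns the natural log of the factorial, so dividing by `log(2)` converts it to bits.

```r
vocab    <- 1e5   # word forms an educated adult understands
n_tokens <- 20    # tokens in an average sentence

bits_choice <- n_tokens * log2(vocab)          # log2(100,000^20)
bits_order  <- lfactorial(n_tokens) / log(2)   # log2(20!)
share       <- bits_choice / (bits_choice + bits_order)

round(c(choice = bits_choice, order = bits_order, share = 100 * share))
# choice = 332, order = 61, share = 84 (%)
```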
  • 44. LSA (2) • Assumption: texts have a semantic structure • However, this structure is obscured by word usage (noise, synonymy, polysemy, …) • Proposed LSA Solution: – map doc-term matrix – using conceptual indices – derived statistically (truncated SVD) – and make similarity comparisons using angles
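A minimal base-R sketch of the truncated SVD behind LSA, using base `svd()` on the Deerwester et al. (1990) example matrix (12 terms × 9 titles) that the next slides draw on. Keeping k = 2 dimensions mirrors that two-dimensional example and is otherwise an arbitrary choice; the lsa package's `lsa()` performs this decomposition for you.

```r
# Deerwester et al. (1990) term-document matrix: 12 terms x 9 titles
M <- rbind(
  human     = c(1, 0, 0, 1, 0, 0, 0, 0, 0),
  interface = c(1, 0, 1, 0, 0, 0, 0, 0, 0),
  computer  = c(1, 1, 0, 0, 0, 0, 0, 0, 0),
  user      = c(0, 1, 1, 0, 1, 0, 0, 0, 0),
  system    = c(0, 1, 1, 2, 0, 0, 0, 0, 0),
  response  = c(0, 1, 0, 0, 1, 0, 0, 0, 0),
  time      = c(0, 1, 0, 0, 1, 0, 0, 0, 0),
  eps       = c(0, 0, 1, 1, 0, 0, 0, 0, 0),
  survey    = c(0, 1, 0, 0, 0, 0, 0, 0, 1),
  trees     = c(0, 0, 0, 0, 0, 1, 1, 1, 0),
  graph     = c(0, 0, 0, 0, 0, 0, 1, 1, 1),
  minors    = c(0, 0, 0, 0, 0, 0, 0, 1, 1)
)
colnames(M) <- c(paste0("c", 1:5), paste0("m", 1:4))

# Full SVD: M = T S D'
s <- svd(M)

# Keep only the k largest singular values ("conceptual indices")
k  <- 2
Tk <- s$u[, 1:k, drop = FALSE]   # term vectors
Sk <- diag(s$d[1:k])             # singular values on the diagonal
Dk <- s$v[, 1:k, drop = FALSE]   # document vectors

# Rank-k reconstruction M_k = T_k S_k D_k'
Mk <- Tk %*% Sk %*% t(Dk)
dimnames(Mk) <- dimnames(M)
round(Mk["human", ], 2)
```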
  • 45. Input (e.g., documents) → term-document matrix M (a TEXTMATRIX). Example from Deerwester, Dumais, Furnas, Landauer, and Harshman (1990): Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 41(6):391–407. Only the red terms appear in more than one document, so the rest are stripped. Term = feature; vocabulary = ordered set of features.
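How the Deerwester input matrix can be derived from the nine titles, as a dependency-free base-R sketch: tokenise, drop a handful of function words, count, and keep only the terms that occur in more than one document. The tiny stop-word list is an assumption chosen for this example; the lsa package's `textmatrix()` builds such a matrix from a directory of files.

```r
titles <- c(
  c1 = "Human machine interface for ABC computer applications",
  c2 = "A survey of user opinion of computer system response time",
  c3 = "The EPS user interface management system",
  c4 = "System and human system engineering testing of EPS",
  c5 = "Relation of user perceived response time to error measurement",
  m1 = "The generation of random, binary, ordered trees",
  m2 = "The intersection graph of paths in trees",
  m3 = "Graph minors IV: Widths of trees and well-quasi-ordering",
  m4 = "Graph minors: A survey"
)
stop_words <- c("a", "and", "for", "in", "of", "the", "to")  # illustrative list

# Lower-case, split on anything that is not a letter, drop stop words
tokens <- lapply(titles, function(txt) {
  words <- strsplit(tolower(txt), "[^a-z]+")[[1]]
  words[nzchar(words) & !(words %in% stop_words)]
})

# Raw term frequencies: terms (rows) x documents (columns)
vocab <- sort(unique(unlist(tokens)))
M <- sapply(tokens, function(w) as.vector(table(factor(w, levels = vocab))))
rownames(M) <- vocab

# Keep only terms that appear in more than one document
M <- M[rowSums(M > 0) > 1, ]
M
```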
  • 48. Reconstructed, Reduced Matrix (figure; e.g. document m4: 'Graph minors: A survey')
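  • 49. Similarity in a Latent-Semantic Space (figure: a query vector and two target vectors, Target 1 and Target 2, plotted in the X and Y dimensions; similarity is read off Angle 1 and Angle 2 between the query and each target)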
  • 50. doc2doc similarities • Unreduced = pure vector space model – based on $M = T S D^T$ – Pearson correlation over document vectors • Reduced – based on $M_2 = T S_2 D^T$ ($S_2$: $S$ with all but the two largest singular values set to zero) – Pearson correlation over document vectors
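A base-R sketch of the two variants on this slide, again on the Deerwester et al. example matrix: Pearson correlations between the document columns of M (unreduced) and of its rank-2 reconstruction M_2 (reduced). `cor()` applied to a matrix correlates its columns, which are the documents here.

```r
# Deerwester et al. (1990) term-document matrix: 12 terms x 9 titles
M <- rbind(
  human     = c(1, 0, 0, 1, 0, 0, 0, 0, 0),
  interface = c(1, 0, 1, 0, 0, 0, 0, 0, 0),
  computer  = c(1, 1, 0, 0, 0, 0, 0, 0, 0),
  user      = c(0, 1, 1, 0, 1, 0, 0, 0, 0),
  system    = c(0, 1, 1, 2, 0, 0, 0, 0, 0),
  response  = c(0, 1, 0, 0, 1, 0, 0, 0, 0),
  time      = c(0, 1, 0, 0, 1, 0, 0, 0, 0),
  eps       = c(0, 0, 1, 1, 0, 0, 0, 0, 0),
  survey    = c(0, 1, 0, 0, 0, 0, 0, 0, 1),
  trees     = c(0, 0, 0, 0, 0, 1, 1, 1, 0),
  graph     = c(0, 0, 0, 0, 0, 0, 1, 1, 1),
  minors    = c(0, 0, 0, 0, 0, 0, 0, 1, 1)
)
colnames(M) <- c(paste0("c", 1:5), paste0("m", 1:4))

# Rank-2 reconstruction M_2 = T S_2 D'
s  <- svd(M)
M2 <- s$u[, 1:2] %*% diag(s$d[1:2]) %*% t(s$v[, 1:2])
dimnames(M2) <- dimnames(M)

sim_unreduced <- cor(M)    # pure vector space model
sim_reduced   <- cor(M2)   # latent-semantic space with 2 dimensions

docs <- c("c1", "c2", "m4")
round(sim_unreduced[docs, docs], 2)
round(sim_reduced[docs, docs], 2)
```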
  • 51. Ex Post Updating: Folding-In • SVD factor stability – SVD calculates factors over a given text base – Different texts – different factors – Challenge: avoid unwanted factor changes (e.g., bad essays) – Solution: folding-in of essays instead of recalculating • SVD is computationally expensive
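  • 52. Folding-In in Detail (Berry et al., 1995) • (1) convert the original vector $v_i$ to 'D_k' format: $\hat{d}_i = v_i^T T_k S_k^{-1}$ • (2) convert the 'D_k'-format vector to 'M_k' format: $\hat{m}_i = T_k S_k \hat{d}_i^T$ • where $M_k = T_k S_k D_k^T$ is the rank-k reconstruction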
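The two folding-in steps as a base-R sketch on the Deerwester et al. example matrix with k = 2. `fold_in_sketch()` is a hypothetical helper, not the lsa package's `fold_in()`. Folding a document that was already part of the training matrix back in returns exactly its column of M_k, which makes a convenient sanity check.

```r
# Deerwester et al. (1990) term-document matrix: 12 terms x 9 titles
M <- rbind(
  human     = c(1, 0, 0, 1, 0, 0, 0, 0, 0),
  interface = c(1, 0, 1, 0, 0, 0, 0, 0, 0),
  computer  = c(1, 1, 0, 0, 0, 0, 0, 0, 0),
  user      = c(0, 1, 1, 0, 1, 0, 0, 0, 0),
  system    = c(0, 1, 1, 2, 0, 0, 0, 0, 0),
  response  = c(0, 1, 0, 0, 1, 0, 0, 0, 0),
  time      = c(0, 1, 0, 0, 1, 0, 0, 0, 0),
  eps       = c(0, 0, 1, 1, 0, 0, 0, 0, 0),
  survey    = c(0, 1, 0, 0, 0, 0, 0, 0, 1),
  trees     = c(0, 0, 0, 0, 0, 1, 1, 1, 0),
  graph     = c(0, 0, 0, 0, 0, 0, 1, 1, 1),
  minors    = c(0, 0, 0, 0, 0, 0, 0, 1, 1)
)
colnames(M) <- c(paste0("c", 1:5), paste0("m", 1:4))

k  <- 2
s  <- svd(M)
Tk <- s$u[, 1:k]
sk <- s$d[1:k]
Mk <- Tk %*% diag(sk) %*% t(s$v[, 1:k])

# Hypothetical helper: fold a raw term vector v into the k-dimensional space
fold_in_sketch <- function(v, Tk, sk) {
  d_hat <- drop(crossprod(v, Tk)) / sk   # step (1): d = v' T_k S_k^-1
  drop(Tk %*% (sk * d_hat))              # step (2): m = T_k S_k d'
}

# A new "document" containing only the terms 'human' and 'computer'
new_doc <- setNames(numeric(nrow(M)), rownames(M))
new_doc[c("human", "computer")] <- 1
folded <- fold_in_sketch(new_doc, Tk, sk)
names(folded) <- rownames(M)
round(folded, 2)
```

No new SVD is computed here, which is the point made on the previous slide: the factors stay stable while new essays are projected into the existing space.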
  • 53. LSA Process & Driving Parameters 4 x 12 x 7 x 2 x 3 = 2016 Combinations
  • 54. Pre-Processing • Stemming – Porter Stemmer (snowball.tartarus.org) – 'move', 'moving', 'moves' => 'move' – even more important in German (more inflections) • Stop Word Elimination – 373 stop words in German • Stemming plus Stop Word Elimination • Unprocessed ('raw') Terms
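Stop-word elimination can be sketched in a few lines of base R. The short English list is a hypothetical stand-in for a real list such as the 373 German stop words mentioned above, and the `[^a-z]+` split only handles ASCII letters; German text would need a Unicode-aware pattern such as `[^[:alpha:]]+` in a UTF-8 locale. For stemming, the SnowballC package's `wordStem()` wraps the Snowball stemmers; it is left out here so the sketch has no dependencies.

```r
# Illustrative stop-word elimination with a tiny hypothetical English list
stop_en <- c("a", "and", "for", "in", "of", "the", "to")

remove_stop_words <- function(text, stop_words) {
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]   # ASCII letters only
  tokens[nzchar(tokens) & !(tokens %in% stop_words)]
}

remove_stop_words("The generation of random, binary, ordered trees", stop_en)
# "generation" "random" "binary" "ordered" "trees"
```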
  • 55. Term Weighting Schemes • Global Weights (GW) – None ('raw' tf) – Normalisation: $\mathrm{norm}_i = \frac{1}{\sqrt{\sum_j tf_{ij}^2}}$ – Inverse Document Frequency (IDF): $\mathrm{idf}_i = \log_2 \frac{\mathrm{numdocs}}{\mathrm{docfreq}(i)} + 1$ – 1 + Entropy: $\mathrm{entplusone}_i = 1 + \sum_j \frac{p_{ij} \log p_{ij}}{\log \mathrm{numdocs}}$, where $p_{ij} = \frac{tf_{ij}}{\sum_j tf_{ij}}$ • $\mathrm{weight}_{ij} = lw(tf_{ij}) \cdot gw(tf_{ij})$ • Local Weights (LW) – None ('raw' tf) – Binary Term Frequency – Logarithmized Term Frequency (log)
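The weighting formulas above, transcribed into base R on a toy 3 × 3 matrix. The helper names are hypothetical and chosen not to clash with the lsa package's own `lw_*`/`gw_*` functions; using the natural log of tf + 1 for the logarithmized local weight is an assumption.

```r
# Toy term-document matrix of raw term frequencies
tf <- matrix(c(2, 0, 1,
               1, 1, 1,
               0, 3, 0),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("lsa", "the", "svd"), c("d1", "d2", "d3")))

# Local weights (applied cell by cell)
local_bin <- function(m) (m > 0) * 1
local_log <- function(m) log(m + 1)          # assumption: natural log of tf + 1

# Global weights (one value per term, i.e. per row)
global_norm <- function(m) 1 / sqrt(rowSums(m^2))
global_idf  <- function(m) log2(ncol(m) / rowSums(m > 0)) + 1
global_entropy <- function(m) {
  p <- m / rowSums(m)                        # p_ij = tf_ij / sum_j tf_ij
  plogp <- ifelse(p > 0, p * log(p), 0)      # treat 0 * log(0) as 0
  1 + rowSums(plogp) / log(ncol(m))
}

# weight_ij = lw(tf_ij) * gw_i: the per-row vector recycles down each column
weighted <- local_log(tf) * global_idf(tf)
round(weighted, 3)
```

A term spread evenly over all documents ("the") gets an entropy weight of 0 and the minimum IDF of 1, while a term concentrated in one document ("svd") keeps an entropy weight of 1.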
  • 56. SVD-Dimensionality • Many different proposals (see package) • 80% variance is a good estimator
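One way to turn the "80% variance" rule of thumb into code, as a hedged base-R sketch: pick the smallest k whose squared singular values reach the requested share of the total. `dims_for_share()` is a hypothetical helper; the lsa package's `dimcalc_share()` may define its share over the singular values themselves rather than their squares, so the two can return different k.

```r
# Hypothetical helper: smallest k whose squared singular values reach `share`
dims_for_share <- function(d, share = 0.8) {
  explained <- cumsum(d^2) / sum(d^2)
  which(explained >= share - 1e-12)[1]   # small tolerance guards against rounding
}

dims_for_share(c(4, 2, 1, 1))         # 2, since 16/22 < 0.8 <= 20/22
dims_for_share(c(4, 2, 1, 1), 0.95)   # 3
# On a real term-document matrix M: dims_for_share(svd(M)$d)
```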
  • 57. Proximity Measures • Pearson Correlation • Cosine Correlation • Spearman's Rho pics: http://davidmlane.com/hyperstat/A62891.html
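The three proximity measures side by side in base R. `cosine_sim()` is a hypothetical helper (the lsa package ships its own `cosine()`); Pearson and Spearman come from base `cor()`. The toy vectors show where the measures disagree: cosine ignores the mean, Pearson centres the vectors first, and Spearman only looks at rank order.

```r
# Hypothetical helper: cosine of the angle between two vectors
cosine_sim <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

x <- c(1, 0, 0)
y <- c(0, 1, 0)
cosine_sim(x, y)                 # 0: orthogonal vectors
cor(x, y)                        # -0.5: Pearson centres the vectors first

a <- c(1, 2, 3, 4)
b <- c(1, 4, 9, 100)
cor(a, b, method = "spearman")   # 1: identical rank order
cor(a, b)                        # below 1: the relation is not linear
```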
  • 58. Pair-wise dis/similarity Convergence expected: ‘eu’, ‘österreich’ Divergence expected: ‘jahr’, ‘wien’
  • 59. The Package • Available via CRAN, e.g.: http://cran.r-project.org/web/packages/lsa/index.html • Higher-level Abstraction to Ease Use – Core methods: textmatrix() / query() lsa() fold_in() as.textmatrix() – Support methods for term weighting, dimensionality calculation, correlation measurement, …
  • 60. Core Workflow • tm = textmatrix("dir/") • tm = lw_logtf(tm) * gw_idf(tm) • space = lsa(tm, dims = dimcalc_share()) • tm3 = fold_in(tm, space) • as.textmatrix(space)
  • 62. Tutorials • Starter: lsa-indexing.Rmd • Real: lsa-essayscoring.Rmd • Advanced: lsa-sparse.Rmd
  • 63. Additional tutorials Fridolin Wild, The Open University, UK
  • 64. Tutorials • Advanced I/O: twitter.Rmd • Advanced I/O: sparql.Rmd • Advanced NLP: twitter-sentiment.Rmd • Evaluation: interrater-agreement.Rmd