Natural Language Processing—An Introduction
Colleen M. Farrelly, Staticlysm
Brief bio –
Colleen M. Farrelly is a machine learning scientist whose expertise includes
supervised learning, unsupervised learning, psychometrics, topological data
analysis, and natural language processing. She has an analytics book in review
that touches upon the analysis of text data with topological data analysis tools.
Introduction
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
Text Data and Applications
• What do all of these have in common?
• Clinical case notes
• Chatbot conversations
• Client email interactions
• Court case summaries/transcripts
• Published research articles
• Tweets
• Voice recordings
Text Data and Applications
• Commonalities
• Text data
• Contain potentially informative features for predicting an outcome or categorizing data
• May contain information not available in structured datasets
• Linguistic insight on the speaker/writer
Example
Legal
• Imagine both the witness and the robber in these two examples.
• How might these observations impact the outcome of a police investigation?
• Statement 1:
• She pulled the gun, took the money, and ran.
• Statement 2:
• The petite blonde pulled a shotgun on the clerk at station 2, filled a bag with cash from the register, and absconded with the money and a handful of pens.
• How many suspects might the police have to stop to find Bonnie and Clyde? Which witness statement might have more impact on a jury?
• How might differences in clinical case notes by clinicians inform health outcome models? How might they reflect on the individual clinician?
Making Sense of Text Data
• Natural language processing (NLP)
• Collection of tools to parse human language into something understandable by algorithms
• What is said
• Computational linguistics
• Deriving insight about human behavior or traits based on text data
• How it’s said
Common NLP Tools
An Overview
Parsing Documents/Sentences
An Example
• Tokens (words or punctuation)
• Punctuation (non-word tokens)
• Stop words (less important words)
• Root words (stemming/lemmatizing)
Bonnie hopped into Clyde’s new car.
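The four steps above can be sketched in plain Python on the example sentence. This is a minimal illustration, not how any particular toolkit does it: the stop word list and the `naive_stem` suffix stripper are hypothetical stand-ins for the much more careful resources that packages such as NLTK or spaCy provide.

```python
import re

# Hypothetical, deliberately tiny stop word list; real toolkits ship larger ones.
STOP_WORDS = {"a", "an", "the", "into", "in", "on", "of", "and", "to"}

def tokenize(text):
    """Split text into word tokens (keeping internal apostrophes) and punctuation tokens."""
    return re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)*|[^\sA-Za-z]", text)

def is_punctuation(token):
    """A token with no letters or digits is treated as punctuation."""
    return not any(ch.isalnum() for ch in token)

def naive_stem(word):
    """Crude suffix stripping for illustration only; not a real stemmer."""
    word = word.lower()
    word = re.sub(r"['’]s$", "", word)          # drop a possessive ending
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Collapse a doubled final consonant left by stripping (hopp -> hop).
    if len(word) > 2 and word[-1] == word[-2] and word[-1] not in "aeiouls":
        word = word[:-1]
    return word

sentence = "Bonnie hopped into Clyde’s new car."
tokens = tokenize(sentence)                                   # words and punctuation
words = [t for t in tokens if not is_punctuation(t)]          # punctuation removed
content_words = [w for w in words if w.lower() not in STOP_WORDS]  # stop words removed
stems = [naive_stem(w) for w in content_words]                # root words

print(tokens)
print(stems)
```

Here "into" is dropped as a stop word and "hopped" is reduced to "hop"; a production stemmer or lemmatizer would handle far more cases than this sketch.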
Tagging Features
• Parts of speech
• Clauses
• Grammatical relations
• Entity recognition
Bonnie hopped into Clyde’s new car.
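To show what tagged output can look like, here is a toy sketch for the same sentence. The two lookup tables are hypothetical and hand-labelled for this one example; they stand in for a trained tagger, which is what a real pipeline would use. The part-of-speech labels follow the Universal POS tag names (PROPN, VERB, ADP, and so on).

```python
# Hypothetical hand-labelled lexicons for this one sentence only.
# A real pipeline would use a trained tagger instead of lookup tables.
POS_LEXICON = {
    "Bonnie": "PROPN", "hopped": "VERB", "into": "ADP",
    "Clyde": "PROPN", "’s": "PART", "new": "ADJ", "car": "NOUN", ".": "PUNCT",
}
ENTITY_LEXICON = {"Bonnie": "PERSON", "Clyde": "PERSON"}

# Tokens are written out by hand, with the possessive split off as its own token.
tokens = ["Bonnie", "hopped", "into", "Clyde", "’s", "new", "car", "."]

# Each token becomes (text, part of speech, entity label or empty string).
tagged = [
    (tok, POS_LEXICON.get(tok, "X"), ENTITY_LEXICON.get(tok, ""))
    for tok in tokens
]

for text, pos, entity in tagged:
    print(f"{text:8} {pos:6} {entity}")
```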
Deriving Sentiment
• Language-dependent
• Sentiment dictionaries
• Positive/negative/neutral (AFINN, for instance)
• Emotion groups from psychological models
Bonnie hopped into Clyde’s new car.
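A dictionary-based sentiment score can be sketched as a sum of per-word valences. The mini-lexicon below is hypothetical: it imitates the integer-score style of AFINN, but the words and values are made up for illustration and are not taken from AFINN itself.

```python
import re

# Hypothetical mini-lexicon in the style of an integer valence dictionary.
# These scores are invented for illustration and are NOT AFINN's values.
TOY_LEXICON = {
    "love": 3, "great": 3, "happy": 2,
    "bad": -2, "robbed": -3, "terrible": -3,
}

def sentiment_score(text, lexicon=TOY_LEXICON):
    """Sum the valence of every known word; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(w, 0) for w in words)

def label(score):
    """Map a numeric score to positive/negative/neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

examples = [
    "Bonnie hopped into Clyde’s new car.",
    "The clerk had a terrible, bad day.",
    "What a great car!",
]
results = [(text, sentiment_score(text), label(sentiment_score(text))) for text in examples]
for text, score, tag in results:
    print(score, tag, "|", text)
```

Because the lexicon is tied to English words, the same code would need a different dictionary for each language, which is the language-dependence noted above.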
Vectorizing/Summarizing Results
• Many options for turning NLP results into usable data in machine learning and statistical tools:
• Vectorization
• Word frequency matrices
• Summary tables
Bonnie hopped into Clyde’s new car.
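As one example of these options, a word frequency matrix has one row per document and one column per vocabulary word. The sketch below builds one with the standard library only, using two sentences from earlier slides; the simple regex tokenizer is an assumption made for brevity.

```python
import re
from collections import Counter

docs = [
    "She pulled the gun, took the money, and ran.",
    "Bonnie hopped into Clyde’s new car.",
]

def words(text):
    """Lowercased word tokens; an apostrophe inside a word is kept."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)?", text.lower())

# Count words per document, then fix a shared, sorted vocabulary.
counts = [Counter(words(doc)) for doc in docs]
vocab = sorted(set().union(*counts))

# One row per document, one column per vocabulary word.
matrix = [[c[w] for w in vocab] for c in counts]

print(vocab)
for row in matrix:
    print(row)
```

Each row is a numeric vector that a statistical or machine learning tool can consume directly; dedicated libraries add options such as weighting and sparse storage.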
Using Statistical Tools to Understand NLP
An Overview
Summary Statistics
• Common summary statistic uses
1. Conversation length (example: engagement metric)
2. Swear count (example: escalation marker)
3. Conversation sentiment over time (example: engagement and satisfaction)
4. Keyword frequency (example: products with most issues)
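The four statistics above can be sketched on a single conversation. Everything here is hypothetical: the chat turns, the swear and product word lists, and the valence scores are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical support chat; each entry is one customer turn, in order.
conversation = [
    "Hi, my router keeps dropping the connection.",
    "I already restarted the router twice.",
    "This damn router is still broken!",
    "OK, the new firmware fixed it, thanks.",
]

# Hypothetical word lists and valence scores, for illustration only.
SWEAR_WORDS = {"damn", "hell"}
PRODUCT_KEYWORDS = {"router", "modem", "firmware"}
VALENCE = {"broken": -2, "dropping": -1, "damn": -2, "fixed": 2, "thanks": 2}

def words(text):
    return re.findall(r"[a-z']+", text.lower())

turn_words = [words(turn) for turn in conversation]

# 1. Conversation length, measured here in words.
conversation_length = sum(len(ws) for ws in turn_words)

# 2. Swear count across the whole conversation.
swear_count = sum(w in SWEAR_WORDS for ws in turn_words for w in ws)

# 3. Sentiment over time: one score per turn, in order.
sentiment_over_time = [sum(VALENCE.get(w, 0) for w in ws) for ws in turn_words]

# 4. Keyword frequency for product names.
keyword_frequency = Counter(
    w for ws in turn_words for w in ws if w in PRODUCT_KEYWORDS
)

print(conversation_length, swear_count, sentiment_over_time, keyword_frequency)
```

In this toy chat the per-turn sentiment dips at the third turn and recovers at the fourth, the kind of trajectory a satisfaction metric might track.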
Use as Machine Learning Features
• Examples combining NLP data with data from structured databases
1. Clustering (example: types of churn from client feedback and account data)
2. Predictive modeling (example: patient outcomes from case notes and medical records)
Psychometric Applications
• Some published papers:
1. Personality trait identification in industrial psychology research
2. Author identification in plagiarism software
3. Quantification of release risk in justice systems
4. Quantification of relapse risk in mental health applications
Other Uses of NLP
Other Common NLP Applications
• Chatbots
• Personal assistants
• Translation services
• Sentence completion
In General
Useful References/Software
Main NLP Software Options
• NLTK (Python)
• spaCy (Python)
• Stanford CoreNLP (Java)
• John Snow Labs/Spark NLP (Spark)
Some NLP Literature
• Dunnmon, J. A., Ratner, A. J., Saab, K., Khandwala, N., Markert, M., Sagreiya, H., ... & Ré, C. (2020). Cross-modal data programming enables rapid medical machine learning. Patterns, 100019.
• Maas, A., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011, June). Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (pp. 142-150).
• Pennebaker, J. W. (2011). The secret life of pronouns. New Scientist, 211(2828), 42-45.
• Polsley, S., Jhunjhunwala, P., & Huang, R. (2016, December). CaseSummarizer: A system for automated summarization of legal texts. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations (pp. 258-262).
• Velupillai, S., Suominen, H., Liakata, M., Roberts, A., Shah, A. D., Morley, K., ... & Chapman, W. (2018). Using clinical natural language processing for health outcomes research: Overview and actionable suggestions for future advances. Journal of Biomedical Informatics, 88, 11-19.
Thank you!
Contact Information
cfarrelly@med.miami.edu
SAS Global 2021 Introduction to Natural Language Processing

  • 4. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. Text Data and Applications • What do all of these have in common? • Clinical case notes • Chatbot conversations • Client email interactions • Court case summaries/transcripts • Published research articles • Tweets • Voice recordings
  • 5. Text Data and Applications • Commonalities • Text data • Contain potentially informative features for predicting an outcome or categorizing data • May contain information not available in structured datasets • Linguistic insight on the speaker/writer
  • 6. Example: Legal • Imagine both the witness and the robber in these two examples. • How might these observations impact the outcome of a police investigation? • Statement 1: She pulled the gun, took the money, and ran. • Statement 2: The petite blonde pulled a shotgun on the clerk at station 2, filled a bag with cash from the register, and absconded with the money and a handful of pens. • How many suspects might the police have to stop to find Bonnie and Clyde? Which witness statement might have more impact on a jury? • How might differences in clinical case notes by clinicians inform health outcome models? How might they reflect on the individual clinician?
  • 7. Making Sense of Text Data • Natural language processing (NLP) • Collection of tools to parse human language into something understandable by algorithms • What is said • Computational linguistics • Deriving insight about human behavior or traits based on text data • How it’s said
  • 9. Parsing Documents/Sentences: An Example • Tokens (words or punctuation) • Punctuation (non-word tokens) • Stop words (less important words) • Root words (stemming/lemmatizing) • Example sentence: Bonnie hopped into Clyde’s new car.
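The parsing steps on this slide can be sketched in plain Python. This is a minimal illustration, not the presenter's code: the stop-word list is a hypothetical, very short one, and `crude_stem` is a toy suffix stripper standing in for a real stemmer or lemmatizer such as those shipped with NLTK or spaCy.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "into", "of", "and", "to", "in"}

def tokenize(text):
    """Split text into word tokens (keeping internal apostrophes) and punctuation tokens."""
    return re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)?|[^\w\s]", text)

def is_punctuation(token):
    """A token with no letters or digits is treated as punctuation."""
    return not any(ch.isalnum() for ch in token)

def crude_stem(word):
    """Toy root-word finder: lowercases, drops a possessive, strips a trailing 'ed'."""
    word = word.lower()
    word = re.sub(r"['’]s$", "", word)               # clyde’s -> clyde
    if word.endswith("ed") and len(word) > 4:
        word = word[:-2]                             # hopped -> hopp
        if len(word) > 2 and word[-1] == word[-2]:
            word = word[:-1]                         # hopp -> hop
    return word

sentence = "Bonnie hopped into Clyde’s new car."
tokens = tokenize(sentence)
words = [t for t in tokens if not is_punctuation(t)]
content_words = [w for w in words if w.lower() not in STOP_WORDS]
stems = [crude_stem(w) for w in content_words]

print(tokens)
print(stems)
```

A production pipeline would normally replace each helper with a library call, since suffix rules like the one above break on irregular forms.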
  • 10. Tagging Features • Parts of speech • Clauses • Grammatical relations • Entity recognition • Example sentence: Bonnie hopped into Clyde’s new car.
  • 11. Deriving Sentiment • Language-dependent • Sentiment dictionaries • Positive/negative/neutral (AFINN, for instance) • Emotion groups from psychological models • Example sentence: Bonnie hopped into Clyde’s new car.
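A dictionary-based sentiment score can be sketched as a lookup and a sum. The lexicon below is hypothetical: it imitates the AFINN convention of signed integer scores per word, but the words and values are invented for illustration and are not taken from AFINN.

```python
import re

# Hypothetical mini-lexicon in the AFINN style (signed integer score per word).
# These entries are invented for illustration; they are not real AFINN values.
TOY_LEXICON = {"happy": 3, "great": 3, "new": 1, "robbed": -3, "angry": -3}

def sentiment_score(text, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of all words in the text; unknown words score 0."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)

def sentiment_label(score):
    """Map a numeric score onto positive/negative/neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for text in ["Bonnie hopped into Clyde’s new car.",
             "The angry clerk was robbed.",
             "She ran."]:
    score = sentiment_score(text)
    print(text, score, sentiment_label(score))
```

Because the scores live in the dictionary, the approach is language-dependent, as the slide notes: a different language needs a different lexicon.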
  • 12. Vectorizing/Summarizing Results • Many options for turning NLP results into usable data in machine learning and statistical tools: • Vectorization • Word frequency matrices • Summary tables • Example sentence: Bonnie hopped into Clyde’s new car.
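One way to picture a word frequency matrix is as one row per document and one column per vocabulary word. The sketch below builds such a matrix with the standard library only, using two sentences from the slides as the documents; the function and variable names are illustrative assumptions, not part of any particular package.

```python
import re
from collections import Counter

docs = [
    "She pulled the gun, took the money, and ran.",
    "Bonnie hopped into Clyde’s new car.",
]

def words(text):
    """Lowercase word tokens, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)?", text.lower())

# One Counter of word frequencies per document.
counts = [Counter(words(doc)) for doc in docs]

# Shared vocabulary: every distinct word across all documents, in sorted order.
vocabulary = sorted(set().union(*counts))

# Document-term matrix: matrix[i][j] is how often vocabulary[j] occurs in docs[i].
matrix = [[doc_counts[term] for term in vocabulary] for doc_counts in counts]

print(vocabulary)
for row in matrix:
    print(row)
```

Each row is a numeric vector that can be passed to a statistical or machine learning tool alongside structured variables.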
  • 13. Using Statistical Tools to Understand NLP: An Overview
  • 14. Summary Statistics • Common summary statistic uses 1. Conversation length (example: engagement metric) 2. Swear count (example: escalation marker) 3. Conversation sentiment over time (example: engagement and satisfaction) 4. Key word frequency (example: products with most issues)
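The four summary statistics listed above could be computed along the following lines. Everything in this sketch is an assumption made for illustration: the three-turn conversation, the flagged-word list, the product keywords, and the toy sentiment lexicon are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical support conversation, one string per customer turn.
conversation = [
    "My router keeps dropping the connection.",
    "This damn router is useless.",
    "Thanks, the new router works great.",
]

FLAGGED_WORDS = {"damn"}                      # hypothetical swear list
PRODUCT_KEYWORDS = {"router", "modem"}        # hypothetical product names
TOY_LEXICON = {"dropping": -1, "damn": -2, "useless": -2, "thanks": 2, "great": 3}

def words(text):
    return re.findall(r"[a-z']+", text.lower())

turns = [words(turn) for turn in conversation]
all_words = [w for turn in turns for w in turn]

# 1. Conversation length in words (a possible engagement metric).
conversation_length = len(all_words)

# 2. Swear count (a possible escalation marker).
swear_count = sum(1 for w in all_words if w in FLAGGED_WORDS)

# 3. Sentiment per turn, in order (sentiment over time).
sentiment_over_time = [sum(TOY_LEXICON.get(w, 0) for w in turn) for turn in turns]

# 4. Keyword frequency (which products come up most).
keyword_frequency = Counter(w for w in all_words if w in PRODUCT_KEYWORDS)

print(conversation_length, swear_count, sentiment_over_time, dict(keyword_frequency))
```

Statistics like these are plain numbers per conversation, so they can be tracked over time or joined to account records without further text processing.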
  • 15. Use as Machine Learning Features • Examples combining NLP data with data from structured databases 1. Clustering (example: types of churn from client feedback and account data) 2. Predictive modeling (example: patient outcomes from case notes and medical records)
  • 16. Psychometric Applications • Some published papers: 1. Personality trait identification in industrial psychology research 2. Author identification in plagiarism software 3. Quantification of release risk in justice systems 4. Quantification of relapse risk in mental health applications
  • 18. Other Common NLP Applications • Chatbots • Personal assistants • Translation services • Sentence completion
  • 19. In General
  • 21. Main NLP Software Options • NLTK (Python) • spaCy (Python) • Stanford CoreNLP (Java) • John Snow Labs/Spark NLP (Spark)
  • 22. Some NLP Literature • Dunnmon, J. A., Ratner, A. J., Saab, K., Khandwala, N., Markert, M., Sagreiya, H., ... & Ré, C. (2020). Cross-modal data programming enables rapid medical machine learning. Patterns, 100019. • Maas, A., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011, June). Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (pp. 142-150). • Pennebaker, J. W. (2011). The secret life of pronouns. New Scientist, 211(2828), 42-45. • Polsley, S., Jhunjhunwala, P., & Huang, R. (2016, December). CaseSummarizer: A system for automated summarization of legal texts. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations (pp. 258-262). • Velupillai, S., Suominen, H., Liakata, M., Roberts, A., Shah, A. D., Morley, K., ... & Chapman, W. (2018). Using clinical natural language processing for health outcomes research: Overview and actionable suggestions for future advances. Journal of Biomedical Informatics, 88, 11-19.