Does Size Matter?


Amparo Elizabeth Cano Basave


Does Size Matter? When Small is Good Enough

A.L. Gentile, A.E. Cano, A.-S. Dadzie, V. Lanfranchi and N. Ireson

The Oak Group, Department of Computer Science, The University of Sheffield

Outline: Introduction; Email Corpus; Dynamic Topic Classification of Short Texts; Experiments; Conclusions

Hypothesis: Results obtained using longer texts may be approximated by short texts of micropost size, i.e., a maximum length of 140 characters.

Methodology: corpus-driven topic extraction / document topic classification.

Micropost services are sometimes perceived negatively in the workplace, as they may be seen to reduce productivity [TNS US Group, 2009] and/or pose threats to security and privacy.

Where restrictions on use are in place, alternatives are sought that obtain the same benefits. The same communication patterns then appear in alternative media: email as a short message service for communication via, e.g., mailing lists.

Introduction: Research Questions

Content analysis of emails as microposts, to evaluate to what degree the knowledge content of truncated or abbreviated messages can be compared to the complete message.

Topic Classification

Chosen task: text classification on non-predefined topics.

Test bed: generated by preprocessing the Oak email corpus to obtain several fixed-size corpora.
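The test bed construction can be sketched as follows. This is a minimal illustration, assuming each email body is a plain string and that each fixed-size corpus keeps the first n × 140 characters of every message; the function names and the `max_blocks` parameter are hypothetical and do not come from the slides.

```python
MICROPOST_SIZE = 140  # characters, as stated in the hypothesis


def truncate_corpus(emails, n_blocks):
    """Keep at most n_blocks * 140 characters of each email body."""
    limit = n_blocks * MICROPOST_SIZE
    return [body[:limit] for body in emails]


def fixed_size_corpora(emails, max_blocks):
    """Map each size limit (140, 280, ...) to the truncated corpus."""
    return {
        n * MICROPOST_SIZE: truncate_corpus(emails, n)
        for n in range(1, max_blocks + 1)
    }
```

Emails shorter than a given limit are left unchanged, so the larger corpora converge towards the full-text corpus.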

Method:

- Corpus-driven topic extraction: a number of topics are automatically extracted from a document collection; each topic is represented as a weighted vector of terms.
- Document topic classification: each document is labelled with the topic it is most similar to, and classified into the corresponding cluster.
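The classification step can be illustrated with a short sketch. The slides do not name the similarity measure, so cosine similarity over raw term counts is an assumption here, as are the function names and the toy topic vectors.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def doc_vector(text):
    """Assumed weighting: raw term counts on whitespace tokens."""
    return Counter(text.lower().split())


def classify(text, topics):
    """Label a document with the topic whose vector it is most similar to."""
    vec = doc_vector(text)
    return max(topics, key=lambda name: cosine(vec, topics[name]))
```

A usage example with two hypothetical topics:

```python
topics = {
    "meetings": {"meeting": 1.0, "room": 0.5},
    "papers": {"paper": 1.0, "deadline": 0.8},
}
label = classify("paper deadline extended", topics)
```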

Topic Extraction: Proximity-based Clustering

Each document di = {t1,...,tv} is a vector of weighted terms.

Term clusters C = {c1,...,ck} (clustering is performed using the inverted index of D as the feature space).



17. each cluster ck = {t1,...,tn} is a vector of weighted terms


22. Experiments: Results

23. Conclusions

- A fair portion of the emails exchanged, for the corpus generated from a mailing list, are very short: approximately 40% fall within the single micropost size, and 65% within two microposts.
- For the text classification task described, the accuracy of classification for micropost-size texts is an acceptable approximation of classification performed on longer texts, with a decrease of only ∼5% for up to the second micropost block within a long email.
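The length statistics quoted in the conclusions can be computed along these lines. This is a sketch under the assumption that an email's size is its character count and that a message of up to 140 characters counts as one micropost block; the function names are hypothetical.

```python
import math
from collections import Counter

MICROPOST_SIZE = 140  # characters per micropost block


def blocks_needed(body):
    """Number of 140-character blocks needed to hold the whole message."""
    return max(1, math.ceil(len(body) / MICROPOST_SIZE))


def cumulative_share(emails, max_blocks):
    """Fraction of emails that fit within 1, 2, ..., max_blocks blocks."""
    if not emails:
        return {}
    counts = Counter(blocks_needed(body) for body in emails)
    shares, running = {}, 0
    for n in range(1, max_blocks + 1):
        running += counts.get(n, 0)
        shares[n] = running / len(emails)
    return shares
```

On the corpus described in the slides, the entries for one and two blocks would correspond to the reported figures of roughly 40% and 65%.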

24. Conclusions: Future Work

- Enriching the micro-emails with semantic information (e.g., concepts extracted from domain and standard ontologies) would improve the results obtained using unannotated text.
- Investigate the influence of other similarity measures.
- Application to expert finding tasks, exploiting dynamic topic extraction as a means to determine authors’ and recipients’ areas of expertise.
- Formal evaluation of topic validity will be required, including the human (expert) annotator in the loop.

25. References

[1] Herbsleb, J. D., Atkins, D. L., Boyer, D. G., Handel, M., and Finholt, T. A. (2002). Introducing instant messaging and chat in the workplace. In Proc., SIGCHI conference on Human factors in computing systems: Changing our world, changing ourselves, pages 171–178.

[2] Isaacs, E., Walendowski, A., Whittaker, S., Schiano, D. J., and Kamm, C. (2002). The character, functions, and styles of instant messaging in the workplace. In Proc., ACM conference on Computer supported cooperative work, pages 11–20.

[3] TNS US Group (2009). Social media exploding: More than 40% use online social networks. http://www.tns-us.com/news/social_media_exploding_more_than.php.

Editor's Notes

Comment on why the blt sub is different for each person. Stress that the lightweight ontologies emerging are personal ontologies, the way in which a user referred to or characterised a set of entities. Go more slowly in the evaluation. Say what you
In this work we were interested in exploring the influence of the size of documents on the accuracy of a given text processing task. Our hypothesis was that results obtained using longer texts could be comparable to those obtained from microblog-size texts, that is, from texts containing at most 140 characters. To test this hypothesis we generated corpora consisting of emails truncated at 140 characters and at successive multiples thereof. The text processing task we evaluated was topic extraction and topic classification.
We were particularly interested in studying this hypothesis within email-based corpora because previous research has shown that, although micropost services are also used in formal environments such as work, there is also a tendency to perceive these services negatively, as they seem to reduce productivity. In some environments where restrictions are placed on this type of service, there is a tendency to use email as an alternative to micropost services, with the same patterns of communication, which favour brevity.
So within the email-based corpus we were interested in determining whether email is indeed used as a short messaging service, and we wanted to evaluate to what extent the knowledge content provided by a truncated message is comparable to that contained in a full message. We also wanted to determine whether the knowledge content of short emails may be used to obtain useful information about, e.g., topics of interest or expertise within an organisation, as a basis for carrying out tasks such as expert finding or content-based social network analysis (SNA).
Our data set consisted of the Oak mailing list: 659 emails taken from July to January 2010. As we can see in the graph, the corpus is already characterised by a high frequency of short messages, from zero to 280 characters.
To compare the degree to which the knowledge content of a shorter message matches that of a full message, we chose a text classification task on non-predefined topics. First we partitioned the emails to obtain several fixed-size corpora. Then we performed a corpus-driven topic extraction, in which each topic consists of a weighted vector of keywords. Each document was labelled with the topic it is most similar to.
Based on the definition of Naaman, Wagner and Strohmaier introduce a notation for characterising these streams. They refer to this notation as a Tweetonomy. We’ll talk about it later on. They also introduce the term Personal Awareness Stream for referring to the c
