SlideShare une entreprise Scribd logo
1  sur  2
Télécharger pour lire hors ligne
Apollo: Towards Factfinding in Participatory Sensing
H. Khac Le, J. Pasternack, H. Ahmadi, M. Gupta, Y. Sun, T. Abdelzaher, J. Han, D. Roth
University of Illinois at Urbana Champaign
Urbana, IL 61801
B. Szymanski, S. Adali
Rensselaer Polytechnic Institute
Troy, NY 12180
ABSTRACT
This demonstration presents Apollo, a new sensor informa-
tion processing tool for uncovering likely facts in noisy par-
ticipatory sensing data1
. Participatory sensing, where users
proactively document and share their observations, has re-
ceived significant attention in recent years as a paradigm for
crowd-sourcing observation tasks. However, it poses inter-
esting challenges in assessing confidence in the information
received. By borrowing clustering and ranking tools from
data mining literature, we show how to group data into sets
(or claims), corroborating specific events or observations,
then iteratively assess both claim and source credibility, ulti-
mately leading to a ranking of described claims by their like-
lihoold of occurrence. Apollo belongs to a category of tools
called fact-finders. It is the first fact-finder designed and im-
plemented specifically for participatory sensing. Apollo uses
Twitter as the underlying engine for sharing participatory
sensing data. Twitter is widely popular, can be interfaced to
cell-phones that share sensor data, and comes with a pow-
erful search API, as well as a publish-subscribe mechanism.
We evaluate it using a participatory sensing application that
collects and posts noisy vehicular traffic data on Twitter, as
well as a set of 60,000 (human) tweets collected during the
Haiti tsunami and a set of 500,000 tweets collected about
Cairo during its recent unrest. Viewers of the demonstra-
tion will interact with Apollo for various fact-finding tasks.
Categories and Subject Descriptors
D.2.5 [Software Engineering]:
General Terms
Design, Reliability, Experimentation
Keywords
Data fusion, Participatory sensing, Quality of information
1
Named after the Greek god of light and truth, among other
designations
IPSN 2011 Chicago, IL USA
1. INTRODUCTION
This demonstration illustrates a fact-finding tool designed
to uncover most likely truth in noisy participatory sensing
data. Participatory sensing [1], where participants proac-
tively report their observations, has become an increasingly
important data collection paradigm, where humans act as
the sensors, or employ devices they own (such as cell phones)
to perform sensing tasks. Its growing importance stems from
the large penetration of sensor-enabled devices with com-
munication capabilities in human populations. However, it
poses challenges in that data are collected from individu-
als and devices who may not always be accurate. Filtering
the deluge of data for correctness is an important challenge,
commonly known in machine learning and knowledge discov-
ery literature as fact-finding. Apollo is the first fact-finder
designed specifically for participatory sensing data.
A fact-finder [6] maintains the abstraction of sources and
claims. In general, starting with no a priori information,
it iteratively computes their credibility: The credibility of
claims depends on the credibility of sources that make them.
Similarly, the credibility of sources depends on the credibil-
ity of claims they make. Iterations continue until they con-
verge. Enhancements of this basic iterative scheme include
a more general notion of claim assertion, where weights de-
scribe how confidently a source asserts a claim [5], incorpo-
ration of prior knowledge in the analysis [4], and clustering
of claims by subject [3] since the credibility of a source may
depend on the subject matter. The impact of dependence
between sources on credibility (e.g., when one spreads claims
made by others) can also be accounted for [2].
Apollo is a general framework for fact-finding in participa-
tory sensing data. Data-type-dependent modules first con-
vert source data (a set of source, observation tuples) into a
common representation of sources and claims. Clustering is
performed on input tuples first, by similarity of their obser-
vations, to generate a smaller number of claims for scalabil-
ity. Their credibility is then assessed in an iterative fashion,
together with the credibility of each source. Figure 1 illus-
trates the architecture of Apollo.
We evaluate our fact-finding algorithm in three distinct sce-
narios. In the first, a controlled experiment is conducted
where a set of participants report (tweet) traffic speed data
from GPS sensors in vehicles. We intentially corrupt a frac-
tion of observations. We also allow some participants to
report traffic information they heard from others, as op-
Sensor
Data
Front-end
Text
Front-end
Image
Front-end
Distance
M etric, S
Distance
M etric, T
Distance
M etric, I
Claim Clusters
A Network
of Claims
and Sources
Claim
Credibility
Assessment
Source
Credibility
Assessment
Figure 1: The Architecture of Apollo
posed to reporting their own (i.e., spread rumors). We show
that computing average statistics from such noisy data is not
always accurate. However, when data tuples are clustered
and ranked by Apollo, the quality of reported observations
increases considerably. The second and third scenarios use
Twitter data sets collected during the Haiti Tsunami (60,000
reports) and the recent Cairo unrest (500,000 reports), rep-
sectively. In each case, the significant number of reports
describe a much smaller number of events, some real and
some rumored. We use our algorithm to identify the dis-
tinct events and rank them by credibility. The results are
then compared to media reports.
2. SYSTEM OVERVIEW
We consider a system composed of human or sensory data
sources that send their observations as Twitter feeds. The
sink has two modes of operation. A follower mode, where it
follows only explicitly named sources, and a crawler mode,
where it uses the Twitter search API to download all tweets
on the topic of a query (subject to API-specific volume con-
straints). A fact-finding algorithm is then applied to clean
up the data. The algorithm has the following reconfigurable
parts:
• The parser: It uses a configuration file (that describes
the format of the input data stream) to convert in-
put data into a standard JSON format. Conceptually,
the data stream is composed of source, observation tu-
ples, where source identifies the data source (e.g., the
Twitter user ID), and the observation content may be
structured, as in the case of phones sharing sensor val-
ues, or unstructured, as in the case of people sharing
text. The configuration file describes the observation
format.
• The distance metrics: A library of different distance
metrics is provided for clustering of observations. For
unstructured text, these metrics reflect text similar-
ity. For structured data, metrics compute differences
in data vectors. Data can be multidimensional. For ex-
ample, when cell-phones report sensor values at given
locations, both measurements and locations can be el-
ements of the data vector.
• The cluster credibility metrics: When observations are
clustered by the distance metric, one needs to rank
different clusters in terms of credibility. A library
of ranking functions are provided, inspired by current
fact-finding literature, ranging from simple ones (how
many tuples are in the cluster) to more complex ones
(that take into account the social network relation be-
tween sources). For example, the rank of a cluster may
be lower if observations in the cluster are reported by a
group of tweeters who are followers of the same source.
• Source credibility metrics: More credible sources are
those whose observations are found more often in more
credible clusters. This basic intuition leads to a library
of simple credibility metrics to rank sources.
With the above parameters, the rest of the algorithm simply
iterates on computing clusters, cluster credibility metrics,
source credibility metrics, until stable results are obtained.
3. THE DEMONSTRATION SCRIPT
During the demo, a laptop will be used that runs Apollo on
three different sets of Twitter data; a set obtained in a con-
trolled vehicular traffic speed measurement experiment, a set
collected from Haiti during its tsunami, and a set collected
from Cairo during its recent unrest. Users will be allowed to
reconfigure the experiments by changing the used distance
and credibility metrics, clustering algorithm parameters, as
well as the amount of data operated on, and observing their
effects on final results. This will help answer questions such
as: what is the impact (in the given scenario) of considering
social network relations between data sources on improv-
ing quality of information in participatory sensing systems?
What are the trade-offs between different distance metrics
among data vectors? How coarse-grained can observation
clustering be without impacting correctness of results? How
much data is needed to get reliable results? Results will
be displayed as lists of credible events and credible sources.
“Ground truth” data will also be shown (which is known for
our controlled experiment and estimated from other media
in the other two case studies).
4. REFERENCES
[1] J. Burke, D. Estrin, M. Hansen, A. Parker, N. Ramanathan,
S. Reddy, and M. B. Srivastava. Participatory sensing. In In:
Workshop on World-Sensor-Web (WSW): Mobile Device
Centric Sensor Networks and Applications, pages 117–134,
2006.
[2] X. L. Dong, L. Berti-Equille, and D. Srivastava. Integrating
conflicting data: the role of source dependence. Proc. VLDB
Endow., 2:550–561, August 2009.
[3] M. Gupta, Y. Sun, and J. Han. Trust analysis with
clustering (poster paper). In World Wide Web Conference
(WWW), Hyderabad, India, March 2011.
[4] J. Pasternack and D. Roth. Knowing what to believe (when
you already know something). In Proc. the International
Conference on Computational Linguistics (COLING),
Beijing, China, August 2010.
[5] J. Pasternack and D. Roth. Generalized fact-finding (poster
paper). In World Wide Web Conference (WWW),
Hyderabad, India, March 2011.
[6] X. Yin, J. Han, and P. S. Yu. Truth discovery with multiple
conflicting information providers on the web. IEEE Trans.
on Knowl. and Data Eng., 20:796–808, June 2008.

Contenu connexe

Similaire à Apollo Towards Factfinding In Participatory Sensing

Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...Editor IJAIEM
 
Event detection and summarization based on social networks and semantic query...
Event detection and summarization based on social networks and semantic query...Event detection and summarization based on social networks and semantic query...
Event detection and summarization based on social networks and semantic query...ijnlc
 
Data mining java titles adrit solutions
Data mining java titles adrit solutionsData mining java titles adrit solutions
Data mining java titles adrit solutionsAdrit Techno Solutions
 
Data repository for sensor network a data mining approach
Data repository for sensor network  a data mining approachData repository for sensor network  a data mining approach
Data repository for sensor network a data mining approachijdms
 
On nonmetric similarity search problems in complex domains
On nonmetric similarity search problems in complex domainsOn nonmetric similarity search problems in complex domains
On nonmetric similarity search problems in complex domainsunyil96
 
Nonmetric similarity search
Nonmetric similarity searchNonmetric similarity search
Nonmetric similarity searchunyil96
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentIJERD Editor
 
Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...Cuong Tran Van
 
Information retrieval
Information retrievalInformation retrieval
Information retrievalhplap
 
On Machine Learning and Data Mining
On Machine Learning and Data MiningOn Machine Learning and Data Mining
On Machine Learning and Data Miningbutest
 
Inspection of Certain RNN-ELM Algorithms for Societal Applications
Inspection of Certain RNN-ELM Algorithms for Societal ApplicationsInspection of Certain RNN-ELM Algorithms for Societal Applications
Inspection of Certain RNN-ELM Algorithms for Societal ApplicationsIRJET Journal
 
Novel Methodology of Data Management in Ad Hoc Network Formulated using Nanos...
Novel Methodology of Data Management in Ad Hoc Network Formulated using Nanos...Novel Methodology of Data Management in Ad Hoc Network Formulated using Nanos...
Novel Methodology of Data Management in Ad Hoc Network Formulated using Nanos...Drjabez
 
Outlier Detection using Reverse Neares Neighbor for Unsupervised Data
Outlier Detection using Reverse Neares Neighbor for Unsupervised DataOutlier Detection using Reverse Neares Neighbor for Unsupervised Data
Outlier Detection using Reverse Neares Neighbor for Unsupervised Dataijtsrd
 
A predictive model for mapping crime using big data analytics
A predictive model for mapping crime using big data analyticsA predictive model for mapping crime using big data analytics
A predictive model for mapping crime using big data analyticseSAT Journals
 
Comparative Analysis of K-Means Data Mining and Outlier Detection Approach fo...
Comparative Analysis of K-Means Data Mining and Outlier Detection Approach fo...Comparative Analysis of K-Means Data Mining and Outlier Detection Approach fo...
Comparative Analysis of K-Means Data Mining and Outlier Detection Approach fo...IJCSIS Research Publications
 
IRJET- Predicting Outcome of Judicial Cases and Analysis using Machine Le...
IRJET-  	  Predicting Outcome of Judicial Cases and Analysis using Machine Le...IRJET-  	  Predicting Outcome of Judicial Cases and Analysis using Machine Le...
IRJET- Predicting Outcome of Judicial Cases and Analysis using Machine Le...IRJET Journal
 
Meliorating usable document density for online event detection
Meliorating usable document density for online event detectionMeliorating usable document density for online event detection
Meliorating usable document density for online event detectionIJICTJOURNAL
 

Similaire à Apollo Towards Factfinding In Participatory Sensing (20)

Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
 
Event detection and summarization based on social networks and semantic query...
Event detection and summarization based on social networks and semantic query...Event detection and summarization based on social networks and semantic query...
Event detection and summarization based on social networks and semantic query...
 
Data mining java titles adrit solutions
Data mining java titles adrit solutionsData mining java titles adrit solutions
Data mining java titles adrit solutions
 
Data repository for sensor network a data mining approach
Data repository for sensor network  a data mining approachData repository for sensor network  a data mining approach
Data repository for sensor network a data mining approach
 
Lspnew (1)
Lspnew (1)Lspnew (1)
Lspnew (1)
 
On nonmetric similarity search problems in complex domains
On nonmetric similarity search problems in complex domainsOn nonmetric similarity search problems in complex domains
On nonmetric similarity search problems in complex domains
 
Nonmetric similarity search
Nonmetric similarity searchNonmetric similarity search
Nonmetric similarity search
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...
 
Information retrieval
Information retrievalInformation retrieval
Information retrieval
 
On Machine Learning and Data Mining
On Machine Learning and Data MiningOn Machine Learning and Data Mining
On Machine Learning and Data Mining
 
Inspection of Certain RNN-ELM Algorithms for Societal Applications
Inspection of Certain RNN-ELM Algorithms for Societal ApplicationsInspection of Certain RNN-ELM Algorithms for Societal Applications
Inspection of Certain RNN-ELM Algorithms for Societal Applications
 
Ijetcas14 446
Ijetcas14 446Ijetcas14 446
Ijetcas14 446
 
Novel Methodology of Data Management in Ad Hoc Network Formulated using Nanos...
Novel Methodology of Data Management in Ad Hoc Network Formulated using Nanos...Novel Methodology of Data Management in Ad Hoc Network Formulated using Nanos...
Novel Methodology of Data Management in Ad Hoc Network Formulated using Nanos...
 
Outlier Detection using Reverse Neares Neighbor for Unsupervised Data
Outlier Detection using Reverse Neares Neighbor for Unsupervised DataOutlier Detection using Reverse Neares Neighbor for Unsupervised Data
Outlier Detection using Reverse Neares Neighbor for Unsupervised Data
 
A predictive model for mapping crime using big data analytics
A predictive model for mapping crime using big data analyticsA predictive model for mapping crime using big data analytics
A predictive model for mapping crime using big data analytics
 
Comparative Analysis of K-Means Data Mining and Outlier Detection Approach fo...
Comparative Analysis of K-Means Data Mining and Outlier Detection Approach fo...Comparative Analysis of K-Means Data Mining and Outlier Detection Approach fo...
Comparative Analysis of K-Means Data Mining and Outlier Detection Approach fo...
 
IRJET- Predicting Outcome of Judicial Cases and Analysis using Machine Le...
IRJET-  	  Predicting Outcome of Judicial Cases and Analysis using Machine Le...IRJET-  	  Predicting Outcome of Judicial Cases and Analysis using Machine Le...
IRJET- Predicting Outcome of Judicial Cases and Analysis using Machine Le...
 
Meliorating usable document density for online event detection
Meliorating usable document density for online event detectionMeliorating usable document density for online event detection
Meliorating usable document density for online event detection
 
Linked sensor data
Linked sensor dataLinked sensor data
Linked sensor data
 

Plus de Carmen Pell

Abstract Paper Sample Format - 010 Essay Examp
Abstract Paper Sample Format - 010 Essay ExampAbstract Paper Sample Format - 010 Essay Examp
Abstract Paper Sample Format - 010 Essay ExampCarmen Pell
 
Introducing A Topic Opinion Writing Writing Introductions
Introducing A Topic Opinion Writing  Writing IntroductionsIntroducing A Topic Opinion Writing  Writing Introductions
Introducing A Topic Opinion Writing Writing IntroductionsCarmen Pell
 
Routemybook - Buy MLA Handbook For Writers Of R
Routemybook - Buy MLA Handbook For Writers Of RRoutemybook - Buy MLA Handbook For Writers Of R
Routemybook - Buy MLA Handbook For Writers Of RCarmen Pell
 
The Fascinating Recommendation Report T
The Fascinating Recommendation Report TThe Fascinating Recommendation Report T
The Fascinating Recommendation Report TCarmen Pell
 
Example Of Abstract In Thesis Writing. Writing An Abstract
Example Of Abstract In Thesis Writing. Writing An AbstractExample Of Abstract In Thesis Writing. Writing An Abstract
Example Of Abstract In Thesis Writing. Writing An AbstractCarmen Pell
 
Policy Brief Format Template
Policy Brief Format TemplatePolicy Brief Format Template
Policy Brief Format TemplateCarmen Pell
 
Sample Annotated Bibliography Free Download
Sample Annotated Bibliography Free DownloadSample Annotated Bibliography Free Download
Sample Annotated Bibliography Free DownloadCarmen Pell
 
Components Of Thesis
Components Of ThesisComponents Of Thesis
Components Of ThesisCarmen Pell
 
TEP025 Writing The Aims And Hypotheses Of Your Laboratory Report
TEP025 Writing The Aims And Hypotheses Of Your Laboratory ReportTEP025 Writing The Aims And Hypotheses Of Your Laboratory Report
TEP025 Writing The Aims And Hypotheses Of Your Laboratory ReportCarmen Pell
 
Animal nutrition 1 (a.pdf
Animal nutrition 1 (a.pdfAnimal nutrition 1 (a.pdf
Animal nutrition 1 (a.pdfCarmen Pell
 
A Review of Sustainable Finance and Financial Performance Literature Mini-Re...
A Review of Sustainable Finance and Financial Performance Literature  Mini-Re...A Review of Sustainable Finance and Financial Performance Literature  Mini-Re...
A Review of Sustainable Finance and Financial Performance Literature Mini-Re...Carmen Pell
 
Alimentacion De Camelidos Sudamericanos
Alimentacion De Camelidos SudamericanosAlimentacion De Camelidos Sudamericanos
Alimentacion De Camelidos SudamericanosCarmen Pell
 
Anisha- Brand Audit
Anisha- Brand AuditAnisha- Brand Audit
Anisha- Brand AuditCarmen Pell
 
Author Holger Polch (Version 2.0 -11 2011) SAP Transportation Management 9.X...
Author  Holger Polch (Version 2.0 -11 2011) SAP Transportation Management 9.X...Author  Holger Polch (Version 2.0 -11 2011) SAP Transportation Management 9.X...
Author Holger Polch (Version 2.0 -11 2011) SAP Transportation Management 9.X...Carmen Pell
 
After The Content Course An Expert-Novice Study Of Disciplinary Literacy Pra...
After The Content Course  An Expert-Novice Study Of Disciplinary Literacy Pra...After The Content Course  An Expert-Novice Study Of Disciplinary Literacy Pra...
After The Content Course An Expert-Novice Study Of Disciplinary Literacy Pra...Carmen Pell
 
A Study Of Visible Light Communication With Li- Fi Technology
A Study Of Visible Light Communication With Li- Fi TechnologyA Study Of Visible Light Communication With Li- Fi Technology
A Study Of Visible Light Communication With Li- Fi TechnologyCarmen Pell
 
Animal Amp Cat Anatomy Coloring Book
Animal  Amp  Cat Anatomy Coloring BookAnimal  Amp  Cat Anatomy Coloring Book
Animal Amp Cat Anatomy Coloring BookCarmen Pell
 
Alan Turing Mathematical Mechanist
Alan Turing  Mathematical MechanistAlan Turing  Mathematical Mechanist
Alan Turing Mathematical MechanistCarmen Pell
 
Analysis Of GMO Food Products Companies Financial Risks And Opportunities In...
Analysis Of GMO Food Products Companies  Financial Risks And Opportunities In...Analysis Of GMO Food Products Companies  Financial Risks And Opportunities In...
Analysis Of GMO Food Products Companies Financial Risks And Opportunities In...Carmen Pell
 
Adults And Children Share Literacy Practices The Case Of Homework
Adults And Children Share Literacy Practices  The Case Of HomeworkAdults And Children Share Literacy Practices  The Case Of Homework
Adults And Children Share Literacy Practices The Case Of HomeworkCarmen Pell
 

Plus de Carmen Pell (20)

Abstract Paper Sample Format - 010 Essay Examp
Abstract Paper Sample Format - 010 Essay ExampAbstract Paper Sample Format - 010 Essay Examp
Abstract Paper Sample Format - 010 Essay Examp
 
Introducing A Topic Opinion Writing Writing Introductions
Introducing A Topic Opinion Writing  Writing IntroductionsIntroducing A Topic Opinion Writing  Writing Introductions
Introducing A Topic Opinion Writing Writing Introductions
 
Routemybook - Buy MLA Handbook For Writers Of R
Routemybook - Buy MLA Handbook For Writers Of RRoutemybook - Buy MLA Handbook For Writers Of R
Routemybook - Buy MLA Handbook For Writers Of R
 
The Fascinating Recommendation Report T
The Fascinating Recommendation Report TThe Fascinating Recommendation Report T
The Fascinating Recommendation Report T
 
Example Of Abstract In Thesis Writing. Writing An Abstract
Example Of Abstract In Thesis Writing. Writing An AbstractExample Of Abstract In Thesis Writing. Writing An Abstract
Example Of Abstract In Thesis Writing. Writing An Abstract
 
Policy Brief Format Template
Policy Brief Format TemplatePolicy Brief Format Template
Policy Brief Format Template
 
Sample Annotated Bibliography Free Download
Sample Annotated Bibliography Free DownloadSample Annotated Bibliography Free Download
Sample Annotated Bibliography Free Download
 
Components Of Thesis
Components Of ThesisComponents Of Thesis
Components Of Thesis
 
TEP025 Writing The Aims And Hypotheses Of Your Laboratory Report
TEP025 Writing The Aims And Hypotheses Of Your Laboratory ReportTEP025 Writing The Aims And Hypotheses Of Your Laboratory Report
TEP025 Writing The Aims And Hypotheses Of Your Laboratory Report
 
Animal nutrition 1 (a.pdf
Animal nutrition 1 (a.pdfAnimal nutrition 1 (a.pdf
Animal nutrition 1 (a.pdf
 
A Review of Sustainable Finance and Financial Performance Literature Mini-Re...
A Review of Sustainable Finance and Financial Performance Literature  Mini-Re...A Review of Sustainable Finance and Financial Performance Literature  Mini-Re...
A Review of Sustainable Finance and Financial Performance Literature Mini-Re...
 
Alimentacion De Camelidos Sudamericanos
Alimentacion De Camelidos SudamericanosAlimentacion De Camelidos Sudamericanos
Alimentacion De Camelidos Sudamericanos
 
Anisha- Brand Audit
Anisha- Brand AuditAnisha- Brand Audit
Anisha- Brand Audit
 
Author Holger Polch (Version 2.0 -11 2011) SAP Transportation Management 9.X...
Author  Holger Polch (Version 2.0 -11 2011) SAP Transportation Management 9.X...Author  Holger Polch (Version 2.0 -11 2011) SAP Transportation Management 9.X...
Author Holger Polch (Version 2.0 -11 2011) SAP Transportation Management 9.X...
 
After The Content Course An Expert-Novice Study Of Disciplinary Literacy Pra...
After The Content Course  An Expert-Novice Study Of Disciplinary Literacy Pra...After The Content Course  An Expert-Novice Study Of Disciplinary Literacy Pra...
After The Content Course An Expert-Novice Study Of Disciplinary Literacy Pra...
 
A Study Of Visible Light Communication With Li- Fi Technology
A Study Of Visible Light Communication With Li- Fi TechnologyA Study Of Visible Light Communication With Li- Fi Technology
A Study Of Visible Light Communication With Li- Fi Technology
 
Animal Amp Cat Anatomy Coloring Book
Animal  Amp  Cat Anatomy Coloring BookAnimal  Amp  Cat Anatomy Coloring Book
Animal Amp Cat Anatomy Coloring Book
 
Alan Turing Mathematical Mechanist
Alan Turing  Mathematical MechanistAlan Turing  Mathematical Mechanist
Alan Turing Mathematical Mechanist
 
Analysis Of GMO Food Products Companies Financial Risks And Opportunities In...
Analysis Of GMO Food Products Companies  Financial Risks And Opportunities In...Analysis Of GMO Food Products Companies  Financial Risks And Opportunities In...
Analysis Of GMO Food Products Companies Financial Risks And Opportunities In...
 
Adults And Children Share Literacy Practices The Case Of Homework
Adults And Children Share Literacy Practices  The Case Of HomeworkAdults And Children Share Literacy Practices  The Case Of Homework
Adults And Children Share Literacy Practices The Case Of Homework
 

Dernier

How to Analyse Profit of a Sales Order in Odoo 17
How to Analyse Profit of a Sales Order in Odoo 17How to Analyse Profit of a Sales Order in Odoo 17
How to Analyse Profit of a Sales Order in Odoo 17Celine George
 
The Story of Village Palampur Class 9 Free Study Material PDF
The Story of Village Palampur Class 9 Free Study Material PDFThe Story of Village Palampur Class 9 Free Study Material PDF
The Story of Village Palampur Class 9 Free Study Material PDFVivekanand Anglo Vedic Academy
 
philosophy and it's principles based on the life
philosophy and it's principles based on the lifephilosophy and it's principles based on the life
philosophy and it's principles based on the lifeNitinDeodare
 
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjj
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjjStl Algorithms in C++ jjjjjjjjjjjjjjjjjj
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjjMohammed Sikander
 
How To Create Editable Tree View in Odoo 17
How To Create Editable Tree View in Odoo 17How To Create Editable Tree View in Odoo 17
How To Create Editable Tree View in Odoo 17Celine George
 
Improved Approval Flow in Odoo 17 Studio App
Improved Approval Flow in Odoo 17 Studio AppImproved Approval Flow in Odoo 17 Studio App
Improved Approval Flow in Odoo 17 Studio AppCeline George
 
Basic Civil Engineering notes on Transportation Engineering, Modes of Transpo...
Basic Civil Engineering notes on Transportation Engineering, Modes of Transpo...Basic Civil Engineering notes on Transportation Engineering, Modes of Transpo...
Basic Civil Engineering notes on Transportation Engineering, Modes of Transpo...Denish Jangid
 
When Quality Assurance Meets Innovation in Higher Education - Report launch w...
When Quality Assurance Meets Innovation in Higher Education - Report launch w...When Quality Assurance Meets Innovation in Higher Education - Report launch w...
When Quality Assurance Meets Innovation in Higher Education - Report launch w...Gary Wood
 
MOOD STABLIZERS DRUGS.pptx
MOOD     STABLIZERS           DRUGS.pptxMOOD     STABLIZERS           DRUGS.pptx
MOOD STABLIZERS DRUGS.pptxPoojaSen20
 
An overview of the various scriptures in Hinduism
An overview of the various scriptures in HinduismAn overview of the various scriptures in Hinduism
An overview of the various scriptures in HinduismDabee Kamal
 
8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital Management8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital ManagementMBA Assignment Experts
 
ANTI PARKISON DRUGS.pptx
ANTI         PARKISON          DRUGS.pptxANTI         PARKISON          DRUGS.pptx
ANTI PARKISON DRUGS.pptxPoojaSen20
 
Đề tieng anh thpt 2024 danh cho cac ban hoc sinh
Đề tieng anh thpt 2024 danh cho cac ban hoc sinhĐề tieng anh thpt 2024 danh cho cac ban hoc sinh
Đề tieng anh thpt 2024 danh cho cac ban hoc sinhleson0603
 
Analyzing and resolving a communication crisis in Dhaka textiles LTD.pptx
Analyzing and resolving a communication crisis in Dhaka textiles LTD.pptxAnalyzing and resolving a communication crisis in Dhaka textiles LTD.pptx
Analyzing and resolving a communication crisis in Dhaka textiles LTD.pptxLimon Prince
 
會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文
會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文
會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文中 央社
 
Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...EduSkills OECD
 
diagnosting testing bsc 2nd sem.pptx....
diagnosting testing bsc 2nd sem.pptx....diagnosting testing bsc 2nd sem.pptx....
diagnosting testing bsc 2nd sem.pptx....Ritu480198
 

Dernier (20)

How to Analyse Profit of a Sales Order in Odoo 17
How to Analyse Profit of a Sales Order in Odoo 17How to Analyse Profit of a Sales Order in Odoo 17
How to Analyse Profit of a Sales Order in Odoo 17
 
The Story of Village Palampur Class 9 Free Study Material PDF
The Story of Village Palampur Class 9 Free Study Material PDFThe Story of Village Palampur Class 9 Free Study Material PDF
The Story of Village Palampur Class 9 Free Study Material PDF
 
philosophy and it's principles based on the life
philosophy and it's principles based on the lifephilosophy and it's principles based on the life
philosophy and it's principles based on the life
 
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjj
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjjStl Algorithms in C++ jjjjjjjjjjjjjjjjjj
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjj
 
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
 
How To Create Editable Tree View in Odoo 17
How To Create Editable Tree View in Odoo 17How To Create Editable Tree View in Odoo 17
How To Create Editable Tree View in Odoo 17
 
Improved Approval Flow in Odoo 17 Studio App
Improved Approval Flow in Odoo 17 Studio AppImproved Approval Flow in Odoo 17 Studio App
Improved Approval Flow in Odoo 17 Studio App
 
Basic Civil Engineering notes on Transportation Engineering, Modes of Transpo...
Basic Civil Engineering notes on Transportation Engineering, Modes of Transpo...Basic Civil Engineering notes on Transportation Engineering, Modes of Transpo...
Basic Civil Engineering notes on Transportation Engineering, Modes of Transpo...
 
When Quality Assurance Meets Innovation in Higher Education - Report launch w...
When Quality Assurance Meets Innovation in Higher Education - Report launch w...When Quality Assurance Meets Innovation in Higher Education - Report launch w...
When Quality Assurance Meets Innovation in Higher Education - Report launch w...
 
MOOD STABLIZERS DRUGS.pptx
MOOD     STABLIZERS           DRUGS.pptxMOOD     STABLIZERS           DRUGS.pptx
MOOD STABLIZERS DRUGS.pptx
 
An overview of the various scriptures in Hinduism
An overview of the various scriptures in HinduismAn overview of the various scriptures in Hinduism
An overview of the various scriptures in Hinduism
 
8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital Management8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital Management
 
ANTI PARKISON DRUGS.pptx
ANTI         PARKISON          DRUGS.pptxANTI         PARKISON          DRUGS.pptx
ANTI PARKISON DRUGS.pptx
 
Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"
 
Đề tieng anh thpt 2024 danh cho cac ban hoc sinh
Đề tieng anh thpt 2024 danh cho cac ban hoc sinhĐề tieng anh thpt 2024 danh cho cac ban hoc sinh
Đề tieng anh thpt 2024 danh cho cac ban hoc sinh
 
Analyzing and resolving a communication crisis in Dhaka textiles LTD.pptx
Analyzing and resolving a communication crisis in Dhaka textiles LTD.pptxAnalyzing and resolving a communication crisis in Dhaka textiles LTD.pptx
Analyzing and resolving a communication crisis in Dhaka textiles LTD.pptx
 
“O BEIJO” EM ARTE .
“O BEIJO” EM ARTE                       .“O BEIJO” EM ARTE                       .
“O BEIJO” EM ARTE .
 
會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文
會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文
會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文會考英文
 
Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...
 
diagnosting testing bsc 2nd sem.pptx....
diagnosting testing bsc 2nd sem.pptx....diagnosting testing bsc 2nd sem.pptx....
diagnosting testing bsc 2nd sem.pptx....
 

Apollo Towards Factfinding In Participatory Sensing

  • 1. Apollo: Towards Factfinding in Participatory Sensing H. Khac Le, J. Pasternack, H. Ahmadi, M. Gupta, Y. Sun, T. Abdelzaher, J. Han, D. Roth University of Illinois at Urbana Champaign Urbana, IL 61801 B. Szymanski, S. Adali Rensselaer Polytechnic Institute Troy, NY 12180 ABSTRACT This demonstration presents Apollo, a new sensor informa- tion processing tool for uncovering likely facts in noisy par- ticipatory sensing data1 . Participatory sensing, where users proactively document and share their observations, has re- ceived significant attention in recent years as a paradigm for crowd-sourcing observation tasks. However, it poses inter- esting challenges in assessing confidence in the information received. By borrowing clustering and ranking tools from data mining literature, we show how to group data into sets (or claims), corroborating specific events or observations, then iteratively assess both claim and source credibility, ulti- mately leading to a ranking of described claims by their like- lihoold of occurrence. Apollo belongs to a category of tools called fact-finders. It is the first fact-finder designed and im- plemented specifically for participatory sensing. Apollo uses Twitter as the underlying engine for sharing participatory sensing data. Twitter is widely popular, can be interfaced to cell-phones that share sensor data, and comes with a pow- erful search API, as well as a publish-subscribe mechanism. We evaluate it using a participatory sensing application that collects and posts noisy vehicular traffic data on Twitter, as well as a set of 60,000 (human) tweets collected during the Haiti tsunami and a set of 500,000 tweets collected about Cairo during its recent unrest. Viewers of the demonstra- tion will interact with Apollo for various fact-finding tasks. Categories and Subject Descriptors D.2.5 [Software Engineering]: General Terms Design, Reliability, Experimentation Keywords Data fusion, Participatory sensing, Quality of information 1 Named after the Greek god of light and truth, among other designations IPSN 2011 Chicago, IL USA 1. INTRODUCTION This demonstration illustrates a fact-finding tool designed to uncover most likely truth in noisy participatory sensing data. Participatory sensing [1], where participants proac- tively report their observations, has become an increasingly important data collection paradigm, where humans act as the sensors, or employ devices they own (such as cell phones) to perform sensing tasks. Its growing importance stems from the large penetration of sensor-enabled devices with com- munication capabilities in human populations. However, it poses challenges in that data are collected from individu- als and devices who may not always be accurate. Filtering the deluge of data for correctness is an important challenge, commonly known in machine learning and knowledge discov- ery literature as fact-finding. Apollo is the first fact-finder designed specifically for participatory sensing data. A fact-finder [6] maintains the abstraction of sources and claims. In general, starting with no a priori information, it iteratively computes their credibility: The credibility of claims depends on the credibility of sources that make them. Similarly, the credibility of sources depends on the credibil- ity of claims they make. Iterations continue until they con- verge. Enhancements of this basic iterative scheme include a more general notion of claim assertion, where weights de- scribe how confidently a source asserts a claim [5], incorpo- ration of prior knowledge in the analysis [4], and clustering of claims by subject [3] since the credibility of a source may depend on the subject matter. The impact of dependence between sources on credibility (e.g., when one spreads claims made by others) can also be accounted for [2]. Apollo is a general framework for fact-finding in participa- tory sensing data. Data-type-dependent modules first con- vert source data (a set of source, observation tuples) into a common representation of sources and claims. Clustering is performed on input tuples first, by similarity of their obser- vations, to generate a smaller number of claims for scalabil- ity. Their credibility is then assessed in an iterative fashion, together with the credibility of each source. Figure 1 illus- trates the architecture of Apollo. We evaluate our fact-finding algorithm in three distinct sce- narios. In the first, a controlled experiment is conducted where a set of participants report (tweet) traffic speed data from GPS sensors in vehicles. We intentially corrupt a frac- tion of observations. We also allow some participants to report traffic information they heard from others, as op-
  • 2. Sensor Data Front-end Text Front-end Image Front-end Distance M etric, S Distance M etric, T Distance M etric, I Claim Clusters A Network of Claims and Sources Claim Credibility Assessment Source Credibility Assessment Figure 1: The Architecture of Apollo posed to reporting their own (i.e., spread rumors). We show that computing average statistics from such noisy data is not always accurate. However, when data tuples are clustered and ranked by Apollo, the quality of reported observations increases considerably. The second and third scenarios use Twitter data sets collected during the Haiti Tsunami (60,000 reports) and the recent Cairo unrest (500,000 reports), rep- sectively. In each case, the significant number of reports describe a much smaller number of events, some real and some rumored. We use our algorithm to identify the dis- tinct events and rank them by credibility. The results are then compared to media reports. 2. SYSTEM OVERVIEW We consider a system composed of human or sensory data sources that send their observations as Twitter feeds. The sink has two modes of operation. A follower mode, where it follows only explicitly named sources, and a crawler mode, where it uses the Twitter search API to download all tweets on the topic of a query (subject to API-specific volume con- straints). A fact-finding algorithm is then applied to clean up the data. The algorithm has the following reconfigurable parts: • The parser: It uses a configuration file (that describes the format of the input data stream) to convert in- put data into a standard JSON format. Conceptually, the data stream is composed of source, observation tu- ples, where source identifies the data source (e.g., the Twitter user ID), and the observation content may be structured, as in the case of phones sharing sensor val- ues, or unstructured, as in the case of people sharing text. The configuration file describes the observation format. • The distance metrics: A library of different distance metrics is provided for clustering of observations. For unstructured text, these metrics reflect text similar- ity. For structured data, metrics compute differences in data vectors. Data can be multidimensional. For ex- ample, when cell-phones report sensor values at given locations, both measurements and locations can be el- ements of the data vector. • The cluster credibility metrics: When observations are clustered by the distance metric, one needs to rank different clusters in terms of credibility. A library of ranking functions are provided, inspired by current fact-finding literature, ranging from simple ones (how many tuples are in the cluster) to more complex ones (that take into account the social network relation be- tween sources). For example, the rank of a cluster may be lower if observations in the cluster are reported by a group of tweeters who are followers of the same source. • Source credibility metrics: More credible sources are those whose observations are found more often in more credible clusters. This basic intuition leads to a library of simple credibility metrics to rank sources. With the above parameters, the rest of the algorithm simply iterates on computing clusters, cluster credibility metrics, source credibility metrics, until stable results are obtained. 3. THE DEMONSTRATION SCRIPT During the demo, a laptop will be used that runs Apollo on three different sets of Twitter data; a set obtained in a con- trolled vehicular traffic speed measurement experiment, a set collected from Haiti during its tsunami, and a set collected from Cairo during its recent unrest. Users will be allowed to reconfigure the experiments by changing the used distance and credibility metrics, clustering algorithm parameters, as well as the amount of data operated on, and observing their effects on final results. This will help answer questions such as: what is the impact (in the given scenario) of considering social network relations between data sources on improv- ing quality of information in participatory sensing systems? What are the trade-offs between different distance metrics among data vectors? How coarse-grained can observation clustering be without impacting correctness of results? How much data is needed to get reliable results? Results will be displayed as lists of credible events and credible sources. “Ground truth” data will also be shown (which is known for our controlled experiment and estimated from other media in the other two case studies). 4. REFERENCES [1] J. Burke, D. Estrin, M. Hansen, A. Parker, N. Ramanathan, S. Reddy, and M. B. Srivastava. Participatory sensing. In In: Workshop on World-Sensor-Web (WSW): Mobile Device Centric Sensor Networks and Applications, pages 117–134, 2006. [2] X. L. Dong, L. Berti-Equille, and D. Srivastava. Integrating conflicting data: the role of source dependence. Proc. VLDB Endow., 2:550–561, August 2009. [3] M. Gupta, Y. Sun, and J. Han. Trust analysis with clustering (poster paper). In World Wide Web Conference (WWW), Hyderabad, India, March 2011. [4] J. Pasternack and D. Roth. Knowing what to believe (when you already know something). In Proc. the International Conference on Computational Linguistics (COLING), Beijing, China, August 2010. [5] J. Pasternack and D. Roth. Generalized fact-finding (poster paper). In World Wide Web Conference (WWW), Hyderabad, India, March 2011. [6] X. Yin, J. Han, and P. S. Yu. Truth discovery with multiple conflicting information providers on the web. IEEE Trans. on Knowl. and Data Eng., 20:796–808, June 2008.