Examples of Working with Streaming Data
Yi-Shin Chen
Institute of Information Systems and Applications
Department of Computer Science
National Tsing Hua University
yishin@gmail.com
Hello
陳宜欣 Yi-Shin Chen
 Currently
 Associate professor at NTHU CS
 Director of IDEA Lab
 Education
 Ph.D. in Computer Science, USC, USA
 M.B.A. in Information Management, NCU, TW
 B.B.A. in Information Management, NCU, TW
 Courses
 Introduction to Database Systems
 Advanced Database Systems
  Data Mining: Concepts, Techniques, and Applications
Research Focus from 2000
Storage
Index
Optimization
Query
Mining
DB
Streaming Data
What should we know?
Streaming Data
Continuous flow
  E.g., stock volume, sensor data, social streams
Infinite length
  Impractical to store and use all historical data
Concept drift
  Not wise to use all historical data
Continuous Queries
[Diagram: Stream DB; Acquisition Process (raw data and transformation of raw stream); Continuous Query Process (transformation of raw stream); Crowd Wisdom; Rules/Patterns; continuously provide feedback]
Three major approaches for continuous queries
•Fast on-line classification/clustering
•Sliding window
•Range aggregation
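Of the three approaches, the sliding window is the simplest to illustrate. The following is a minimal Python sketch, not taken from the slides: the class name `SlidingWindowAverage` and the window size are illustrative assumptions. It keeps only the most recent values of a stream and updates a running average in constant time per item, so no full history is stored.

```python
from collections import deque


class SlidingWindowAverage:
    """Running average over the last `size` stream items (illustrative name)."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self.window = deque(maxlen=size)
        self.total = 0.0

    def push(self, value: float) -> float:
        # When the window is full, the oldest value is about to be evicted,
        # so remove it from the running total first.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)


avg = SlidingWindowAverage(size=3)
results = [avg.push(v) for v in [10, 20, 30, 40]]
print(results)
```

The fourth result is computed over 20, 30, and 40 only, because the value 10 has already left the window.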
Example 1
Auto-identify the Influence of Events Based on
Stock News
Framework of Off-line Training Module
[Diagram labels: Acquisition Process; Crowd Wisdom; Rules/Patterns]
Alignment
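Industry: Finance (companies, related words)
Industry: Textile (companies, related words)
Industry: Car (companies, related words)
...
$belong_n = [P_{finance}, P_{textile}, \ldots, P_{car}]$
Example news article: "The Luxgen Neora concept car, which debuted at the Shanghai Auto Show in April 2011, is not only the first concept car launched by the domestic brand Luxgen since its founding ..."
$P_{finance} = 0$, $P_{textile} = 0$, $P_{car} = 3$
$belong_n = [0, 0, \ldots, 3]$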
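The alignment step can be sketched as a per-industry count of matching terms. This is a hedged illustration only: the term lists below are hypothetical (the slide does not give the real company and related-word lists), and counting each distinct matching term once is an assumption about how the $P$ values are obtained.

```python
from typing import Dict, List

# Hypothetical term lists; the real lists are not shown on the slide.
INDUSTRY_TERMS: Dict[str, List[str]] = {
    "finance": ["bank", "interest rate"],
    "textile": ["fabric", "yarn"],
    "car": ["Luxgen", "concept car", "auto show"],
}


def belong_vector(text: str, industry_terms: Dict[str, List[str]]) -> List[int]:
    """Return one count per industry: how many of its terms occur in the text."""
    lowered = text.lower()
    return [
        sum(1 for term in terms if term.lower() in lowered)
        for terms in industry_terms.values()
    ]


article = "The Luxgen Neora concept car debuted at the Shanghai Auto Show."
print(belong_vector(article, INDUSTRY_TERMS))
```

With these placeholder lists the article matches only car-related terms, so the vector is zero for the other industries.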
Itemset Production
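Transactions (one group per line):
Japan+earthquake  Japan+disaster relief
Japan+earthquake  Japan+flooding
Japan+earthquake  Japan+impact
Japan+earthquake  Japan+forecast
Japan+earthquake  Japan+damage
Japan+purchase  Japan+travel
...
The confidence of Japan+earthquake:
  Number of transactions in which Japan+earthquake appears: $u_s$
  Number of transactions in which Japan appears: $n_p$
$confidence = \frac{u_s}{n_p} = \frac{5}{6}$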
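The confidence calculation for the Japan+earthquake (日本+地震) itemset can be reproduced in a few lines. This is a minimal sketch using the six transactions shown on the slide, with the words translated to English; the function name `confidence` and the data layout (a list of word pairs per transaction) are assumptions made for illustration.

```python
from fractions import Fraction
from typing import List, Tuple

Itemset = Tuple[str, str]

# The six transactions from the slide, translated to English.
TRANSACTIONS: List[List[Itemset]] = [
    [("Japan", "earthquake"), ("Japan", "disaster relief")],
    [("Japan", "earthquake"), ("Japan", "flooding")],
    [("Japan", "earthquake"), ("Japan", "impact")],
    [("Japan", "earthquake"), ("Japan", "forecast")],
    [("Japan", "earthquake"), ("Japan", "damage")],
    [("Japan", "purchase"), ("Japan", "travel")],
]


def confidence(itemset: Itemset, word: str, transactions: List[List[Itemset]]) -> Fraction:
    """u_s / n_p: transactions containing the itemset over those containing the word."""
    u_s = sum(1 for t in transactions if itemset in t)
    n_p = sum(1 for t in transactions if any(word in pair for pair in t))
    if n_p == 0:
        raise ValueError("word does not appear in any transaction")
    return Fraction(u_s, n_p)


print(confidence(("Japan", "earthquake"), "Japan", TRANSACTIONS))
```

Using `Fraction` keeps the result exact, which makes it easy to compare against the 5/6 on the slide.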
Representative Itemset Selection
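Select itemsets with high confidence as candidates for the representative itemset.
$weight = x \cdot tfidf_1 + y \cdot tfidf_2 + z \cdot confidence$
Candidate itemsets: Japan+earthquake, Japan+forecast, nuclear+leak, crisis+occurrence
tf-idf of each word:
Japan  earthquake  forecast  nuclear  leak  crisis  occurrence
0.22   0.25        0.03      0.18     0.2   0.10    0.001
Confidence of each itemset:
Japan+earthquake  Japan+forecast  nuclear+leak  crisis+occurrence
0.833             1               0.667         0.667
Selected concept: Japan, earthquake, nuclear, leak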
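The weight formula can be evaluated directly on the numbers shown on the slide. The sketch below assumes that the per-word values are tf-idf scores and that the per-itemset values are confidences; the coefficients x, y, and z are not given in the slides, so the values used here are arbitrary placeholders. Words are translated to English (日本 = Japan, 地震 = earthquake, 預估 = forecast, 核能 = nuclear, 外洩 = leak, 危機 = crisis, 發生 = occurrence).

```python
from typing import Dict, List, Tuple

Itemset = Tuple[str, str]

# Values copied from the slide; their interpretation as tf-idf is an assumption.
TFIDF: Dict[str, float] = {
    "Japan": 0.22, "earthquake": 0.25, "forecast": 0.03, "nuclear": 0.18,
    "leak": 0.2, "crisis": 0.10, "occurrence": 0.001,
}
CONFIDENCE: Dict[Itemset, float] = {
    ("Japan", "earthquake"): 0.833,
    ("Japan", "forecast"): 1.0,
    ("nuclear", "leak"): 0.667,
    ("crisis", "occurrence"): 0.667,
}


def weight(itemset: Itemset, x: float, y: float, z: float) -> float:
    """weight = x * tfidf_1 + y * tfidf_2 + z * confidence."""
    first, second = itemset
    return x * TFIDF[first] + y * TFIDF[second] + z * CONFIDENCE[itemset]


def rank(x: float, y: float, z: float) -> List[Itemset]:
    """Itemsets ordered from highest to lowest weight."""
    return sorted(CONFIDENCE, key=lambda s: weight(s, x, y, z), reverse=True)


# Placeholder coefficients, not taken from the slides.
for itemset in rank(x=1.0, y=1.0, z=0.2):
    print(itemset, round(weight(itemset, 1.0, 1.0, 0.2), 4))
```

The ranking depends on the coefficients: with equal coefficients the order of Japan+forecast and nuclear+leak is swapped, so x, y, and z would need to be tuned for the task.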
Concept Verification
By considering:
  The daily frequency of concept $C_j$
  The concept index $CI_j$ of $C_j$
  Regression model based on price within sliding windows
If the p-value leads to rejecting $H_0$, the concept $C_j$ is considered an influential event
On-line Prediction Module
Regression prediction
  Use the most frequent event.
Adjust regression prediction
  Include other events that are not the most frequent.
Pheromone prediction
  Include the past influence.
Continuous
Query
Process
Experimental Data
 Stock data
 Industry index from TWSE.
 2012-01-01 to 2012-05-11
 News data
  Crawled from news websites.
  Yahoo!, udn, Libertytimes, PCHome, etc.
  Total 13 websites.
  2012-01-01 to 2012-05-11
  More than 150,000 news articles.
 All the news is in Traditional Chinese.
Experimental Setup
Four methods to predict the market:
 Pheromone prediction model
 Adjust regression prediction model
 Regression prediction model
 Blind test.
Prediction policy: fall, rise, or NSM (no significant move)
Performance
Accuracy of four methods:
Method               Average accuracy
Pheromone            0.5784574
Adjust regression    0.5323214
Regression           0.5134457
Blind test           0.3045479
Performance
Does it work on the whole market?
  We examined whether events can predict the whole market by aggregating all industries into one.
Type                 Accuracy
Pheromone            0.6315789
Adjust regression    0.6896511
Regression           0.5714285
Example 2
An Interactive Conducting System
Using Motion Detector
Motivation
Diversify human computer interaction
technology with multimedia
 Music education
 Music experiment
 Amateur and professional conductors
 Composers
 Personal amusement
Devices
 Build an interactive conducting system using motion
Microsoft Kinect
3D Depth Sensors
Challenges
[Figure: two diagrams, each with points numbered 1 to 4]
Conducting Data (Data Streams)
 Cartesian coordinate (x,y,z)
 30 Frames per second under 320x240 resolution
 delay 33 ms (1/30 second)
 Human eyes can process 10 to 12 frames per second [2]
 delay ≈ 100 ms (1/10 second)
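The delay figures above follow from the frame rates. As a small worked check (the function name `frame_interval_ms` is illustrative, not from the slides), the sketch below converts frames per second into a per-frame delay and shows how many sensor frames arrive per frame the eye can process, taking the lower bound of 10 frames per second.

```python
def frame_interval_ms(fps: float) -> float:
    """Time between two consecutive frames, in milliseconds."""
    return 1000.0 / fps


SENSOR_FPS = 30  # sensor frame rate at 320x240, from the slide
HUMAN_FPS = 10   # lower bound of what the human eye can process, from the slide

sensor_delay = frame_interval_ms(SENSOR_FPS)    # about 33 ms
human_delay = frame_interval_ms(HUMAN_FPS)      # 100 ms
frames_per_human_frame = SENSOR_FPS // HUMAN_FPS

print(round(sensor_delay), round(human_delay), frames_per_human_frame)
```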
[Figure: coordinate axes (+X, -X, +Y, -Y, Z) relative to the sensor direction]
Framework
[Flowchart]
  Initial system: PlayStatus = False
  Conducting data received
  Is PlayStatus true?
  No: start gesture recognition; if Start is true, PlayStatus = True
  Yes: stop gesture recognition; if Stop is true, PlayStatus = False
  Beat pattern recognition (whole measure)
  Volume (relative height of hand)
  Identify instrument emphasis (tilt, Z-mapping)
  Volume adjustment according to instrument emphasis
  Tempo adjustment according to instrument emphasis
[Diagram labels: Acquisition Process; Crowd Wisdom; Rules/Patterns; Offline Analysis; Continuous Query Process]
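The PlayStatus logic in the framework can be read as a two-state machine. The sketch below is one possible interpretation of the flowchart labels, not the authors' implementation: the class name `ConductingSession`, the method `on_frame`, and the returned status strings are all hypothetical, and gesture recognition itself is assumed to happen elsewhere and arrive as boolean flags.

```python
class ConductingSession:
    """Tracks PlayStatus across incoming frames (hypothetical interpretation)."""

    def __init__(self) -> None:
        # Initial system: PlayStatus = False
        self.play_status = False

    def on_frame(self, start_gesture: bool = False, stop_gesture: bool = False) -> str:
        if not self.play_status:
            # Not playing: only the start gesture matters.
            if start_gesture:
                self.play_status = True
                return "started"
            return "idle"
        # Playing: check for the stop gesture before anything else.
        if stop_gesture:
            self.play_status = False
            return "stopped"
        # Otherwise the frame would go on to beat, volume, and emphasis recognition.
        return "tracking"


session = ConductingSession()
events = [
    session.on_frame(),
    session.on_frame(start_gesture=True),
    session.on_frame(),
    session.on_frame(stop_gesture=True),
    session.on_frame(),
]
print(events)
```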
Experiments
 Evaluation
 Beat pattern and measure recognition
 Volume control and instrument emphasis recognition
 Response time
 Experimental Setup
 Participants
 1 professional
 8 had no experience
 Practice
 30 minutes
Beat Pattern and Measure Recognition Evaluation
[Bar chart: recognition rate (recall and precision) for the professional and no-experience participants; data labels: 0.7826, 0.8648, 0.8438, 0.8821]
Instrument Emphasis
  Adjust volume in the correct instrument sections
[Bar chart: recognition rate (recall and precision); data labels: 1, 0.9375, 1, 0.8666, 1, 1, 1, 1, 1, 0.9286, 1, 1]
Example 3
Social Stream Analysis for Location Identification
Goal
Identify the location of a particular Twitter
user at a given time
 Using exclusively the content of his/her tweets
Major Challenges
Twitter Challenges
 Tweets are noisy
 Extensive use of non-standard vocabulary
 Bots and spammers
Geo-locational Challenges
 Users might have several associated locations
 Toponyms
 Scarce information
 False profile information
Framework
Acquisition
Process
Crowd Wisdom
Rules/Patterns
Continuous
Query
Process
Experimental Setup
 Original Dataset 1.53 M Twitter users and 13 M tweets
 3,314 Twitter users and 2.2 M tweets
 104,054 geo-tagged tweets
 Although we collected and processed data carefully, it still
needed to be validated
• Use of Local Experts
– People familiar with the geography of the country
[Funnel chart: Original Tweets, Subject Identification, Location Discovery, Toponyms Removal, Timeline Sorting, Final Results; tweet counts in order of appearance: 329,814; 57,153; 18,662; 9,093; 6,928; 2,165]
Evaluation
Recruited an international workforce from
  A crowdsourcing service with a good reputation
General Statistics
Example 4
Social Stream Analysis for Event Identification
Introduction
Analyzing social streams can benefit
 Emergency control
 Crowd opinion analysis
 Unreported events detection
Motivation: event identification from social
streams
Methodology
[Pipeline diagram: Tweets Data; Preprocess; Keyword Selection; Event Candidate Recognition; Event Candidates; User Social Structures; Evolving Social Graph Analysis; Event Identification]
[Diagram labels: Acquisition Process; Continuous Query Process; Offline Analysis; Crowd Wisdom; Rules/Patterns]
Methodology – Keyword Selection
Well-noticed criterion
  A word is well-noticed if it is suddenly mentioned by many more users than in the past
  Time Frame – a unit period of time
  Sliding Window – a certain number of past time frames
[Timeline: time frames tf0, tf1, tf2, tf3, tf4]
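The well-noticed criterion can be sketched as a comparison between the number of distinct users mentioning a word in the current time frame and its average over the sliding window of past frames. This is an illustrative sketch only: the class name `KeywordSelector`, the `ratio` and `min_users` thresholds, and the rule that nothing is reported before at least one past frame exists are all assumptions, since the slides do not state the exact test.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Set, Tuple


class KeywordSelector:
    """Flags words whose user count jumps relative to past time frames."""

    def __init__(self, window_size: int = 3, ratio: float = 3.0, min_users: int = 5) -> None:
        self.window: Deque[Dict[str, int]] = deque(maxlen=window_size)
        self.ratio = ratio          # hypothetical burst threshold
        self.min_users = min_users  # hypothetical minimum support

    def push_frame(self, posts: Iterable[Tuple[int, str]]) -> List[str]:
        """posts: (user_id, word) pairs seen in the current time frame."""
        users_per_word: Dict[str, Set[int]] = {}
        for user, word in posts:
            users_per_word.setdefault(word, set()).add(user)
        counts = {word: len(users) for word, users in users_per_word.items()}

        noticed: List[str] = []
        if self.window:  # need at least one past frame to compare against
            for word, n in counts.items():
                baseline = sum(f.get(word, 0) for f in self.window) / len(self.window)
                if n >= self.min_users and n > self.ratio * baseline:
                    noticed.append(word)

        self.window.append(counts)  # the oldest frame drops out automatically
        return sorted(noticed)


selector = KeywordSelector(window_size=2, ratio=3.0, min_users=3)
frame0 = [(u, "weather") for u in range(4)] + [(0, "boston")]
frame1 = [(u, "weather") for u in range(4)] + [(1, "boston")]
frame2 = [(u, "weather") for u in range(5)] + [(u, "boston") for u in range(10)]
noticed_per_frame = [selector.push_frame(f) for f in (frame0, frame1, frame2)]
print(noticed_per_frame)
```

A steadily popular word such as "weather" is not flagged, because its user count stays close to its own baseline.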
Methodology –
Event Candidate Recognition
Idea: group one keyword with its most relevant
keywords into one event candidate
[Keyword graph: boston, explosion, confirm, prayer, bombing, boston-marathon, threat, iraq, jfk, hospital, victim, afghanistan, bomb, america]
Methodology –
Evolving Social Graph Analysis
 Information decay:
 Vertex weight, edge weight
 Decay mechanism
 Concept-Based Evolving Graph Sequences (cEGS):
a sequence of directed graphs that demonstrate
information propagation
[Figure: graph sequence over time frames tf1, tf2, tf3]
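The slides name a decay mechanism for vertex and edge weights but do not specify it. One common choice is exponential decay per time frame, sketched below for edge weights only. Everything here is an assumption for illustration: the function name `advance_frame`, the decay factor, the pruning threshold, and the rule that each new interaction adds 1.0 to an edge.

```python
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]  # directed edge: (source user, target user)


def advance_frame(
    weights: Dict[Edge, float],
    new_edges: Iterable[Edge],
    decay: float = 0.5,        # hypothetical decay factor per time frame
    prune_below: float = 0.1,  # hypothetical threshold for forgetting an edge
) -> Dict[Edge, float]:
    """Decay existing edge weights, drop faded edges, then add new interactions."""
    updated: Dict[Edge, float] = {}
    for edge, w in weights.items():
        decayed = w * decay
        if decayed >= prune_below:
            updated[edge] = decayed
    for edge in new_edges:
        updated[edge] = updated.get(edge, 0.0) + 1.0
    return updated


g1 = advance_frame({}, [("a", "b")])
g2 = advance_frame(g1, [("b", "c")])
g3 = advance_frame(g2, [])
print(g1, g2, g3)
```

Edges that receive no new interactions fade and eventually disappear, so each graph in the sequence reflects recent propagation rather than the full history.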
Experiment
Testing
 Events identified in November 2013
 Evaluated by 7 human experts
Average precision: 86.64%
[Bar chart: precision by date, Nov_2 to Nov_30]
Example 5
Social Stream Analysis for Mental Disorder Detection
Introduction
18.1% of people in the United States suffer from a mental disorder (*)
Using social networks to research mental disorders
(*) National Institute of Mental Health:
http://www.nimh.nih.gov/health/statistics/prevalence/index.shtml
Analyze
Background
Bipolar Disorder:
*Unstable and impulsive emotions
Cycling between manic and depressive episodes
Borderline Personality Disorder:
*Unstable and impulsive emotions
Impaired social interactions
Framework
[Diagram labels: Acquisition Process; Crowd Wisdom; Rules/Patterns]
Collect Patient Data
  Support group
  Followers
Wait! Control group needed
Collect Data from Ordinary People
Basic Guidelines
  Identify the commonalities and differences between the experimental and control groups
 Word/pattern frequency
 Emotion related data (e.g., flipping rates, occurrence rates)
 Social interaction (e.g., retweet, reply)
 Lifestyle (e.g., online time, stay-up or not)
 Age and gender
Features
Apply Classifiers (Online)
  Utilizing the extracted features
 Various classifiers
 Neural Networks
 Naïve Bayes and Bayesian Belief Networks
 Support Vector Machines
 Random forest
Continuous
Query
Process
Precisions
Possible Continuous Query Results
More in the future…
Thank you.
Contact me at:
yishin@gmail.com

Contenu connexe

Tendances

Barga Data Science lecture 7
Barga Data Science lecture 7Barga Data Science lecture 7
Barga Data Science lecture 7Roger Barga
 
Machine Learning for Forecasting: From Data to Deployment
Machine Learning for Forecasting: From Data to DeploymentMachine Learning for Forecasting: From Data to Deployment
Machine Learning for Forecasting: From Data to DeploymentAnant Agarwal
 
Perception Analyzer Overview
Perception Analyzer OverviewPerception Analyzer Overview
Perception Analyzer Overviewmdulle
 
Machine Learning 101
Machine Learning 101Machine Learning 101
Machine Learning 101Setu Chokshi
 
LinkedIn talk at Netflix ML Platform meetup Sep 2019
LinkedIn talk at Netflix ML Platform meetup Sep 2019LinkedIn talk at Netflix ML Platform meetup Sep 2019
LinkedIn talk at Netflix ML Platform meetup Sep 2019Faisal Siddiqi
 
RecSys Challenge 2016
RecSys Challenge 2016RecSys Challenge 2016
RecSys Challenge 2016Fabian Abel
 

Tendances (6)

Barga Data Science lecture 7
Barga Data Science lecture 7Barga Data Science lecture 7
Barga Data Science lecture 7
 
Machine Learning for Forecasting: From Data to Deployment
Machine Learning for Forecasting: From Data to DeploymentMachine Learning for Forecasting: From Data to Deployment
Machine Learning for Forecasting: From Data to Deployment
 
Perception Analyzer Overview
Perception Analyzer OverviewPerception Analyzer Overview
Perception Analyzer Overview
 
Machine Learning 101
Machine Learning 101Machine Learning 101
Machine Learning 101
 
LinkedIn talk at Netflix ML Platform meetup Sep 2019
LinkedIn talk at Netflix ML Platform meetup Sep 2019LinkedIn talk at Netflix ML Platform meetup Sep 2019
LinkedIn talk at Netflix ML Platform meetup Sep 2019
 
RecSys Challenge 2016
RecSys Challenge 2016RecSys Challenge 2016
RecSys Challenge 2016
 

Similaire à Examples of working with streaming data

D1 design and analysis approaches to evaluate cardiovascular risk - 2012 eugm
D1   design and analysis approaches to evaluate cardiovascular risk - 2012 eugmD1   design and analysis approaches to evaluate cardiovascular risk - 2012 eugm
D1 design and analysis approaches to evaluate cardiovascular risk - 2012 eugmtherealreverendbayes
 
Eugm 2012 gaydos - design and analysis approaches to evaluate cardiovascula...
Eugm 2012   gaydos - design and analysis approaches to evaluate cardiovascula...Eugm 2012   gaydos - design and analysis approaches to evaluate cardiovascula...
Eugm 2012 gaydos - design and analysis approaches to evaluate cardiovascula...Cytel USA
 
2012-05-30 EUGM | GAYDOS | Design & Analysis Approaches to Evaluate Cardiovas...
2012-05-30 EUGM | GAYDOS | Design & Analysis Approaches to Evaluate Cardiovas...2012-05-30 EUGM | GAYDOS | Design & Analysis Approaches to Evaluate Cardiovas...
2012-05-30 EUGM | GAYDOS | Design & Analysis Approaches to Evaluate Cardiovas...Cytel USA
 
Power and sample size calculations for survival analysis webinar Slides
Power and sample size calculations for survival analysis webinar SlidesPower and sample size calculations for survival analysis webinar Slides
Power and sample size calculations for survival analysis webinar SlidesnQuery
 
Statistical Process Control
Statistical Process ControlStatistical Process Control
Statistical Process ControlNicola Mezzetti
 
Did Something Change? - Using Statistical Techniques to Interpret Service and...
Did Something Change? - Using Statistical Techniques to Interpret Service and...Did Something Change? - Using Statistical Techniques to Interpret Service and...
Did Something Change? - Using Statistical Techniques to Interpret Service and...Joao Galdino Mello de Souza
 
Six sigma tools an overview
Six sigma tools  an overviewSix sigma tools  an overview
Six sigma tools an overviewKomal Kamble
 
FASTER PROCESS DEVELOPMENT WITH HYBRID MODELING AND KNOWLEDGE TRANSFER
FASTER PROCESS DEVELOPMENT WITH HYBRID MODELING AND KNOWLEDGE TRANSFERFASTER PROCESS DEVELOPMENT WITH HYBRID MODELING AND KNOWLEDGE TRANSFER
FASTER PROCESS DEVELOPMENT WITH HYBRID MODELING AND KNOWLEDGE TRANSFERiQHub
 
Machine learning and Internet of Things, the future of medical prevention
Machine learning and Internet of Things, the future of medical preventionMachine learning and Internet of Things, the future of medical prevention
Machine learning and Internet of Things, the future of medical preventionPierre Gutierrez
 
information retrival evaluation.ppt
information retrival evaluation.pptinformation retrival evaluation.ppt
information retrival evaluation.pptBonnieKabiru
 
The math behind big systems analysis.
The math behind big systems analysis.The math behind big systems analysis.
The math behind big systems analysis.Theo Schlossnagle
 
Biosurveillance 2.0: Lecture at Emory University
Biosurveillance 2.0: Lecture at Emory UniversityBiosurveillance 2.0: Lecture at Emory University
Biosurveillance 2.0: Lecture at Emory UniversityTaha Kass-Hout, MD, MS
 
An SPRT Procedure for an Ungrouped Data using MMLE Approach
An SPRT Procedure for an Ungrouped Data using MMLE ApproachAn SPRT Procedure for an Ungrouped Data using MMLE Approach
An SPRT Procedure for an Ungrouped Data using MMLE ApproachIOSR Journals
 
Modern Management Techniques.pptx
Modern Management Techniques.pptxModern Management Techniques.pptx
Modern Management Techniques.pptxImmanuel Joshua
 
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...Greg Makowski
 

Similaire à Examples of working with streaming data (20)

D1 design and analysis approaches to evaluate cardiovascular risk - 2012 eugm
D1   design and analysis approaches to evaluate cardiovascular risk - 2012 eugmD1   design and analysis approaches to evaluate cardiovascular risk - 2012 eugm
D1 design and analysis approaches to evaluate cardiovascular risk - 2012 eugm
 
Eugm 2012 gaydos - design and analysis approaches to evaluate cardiovascula...
Eugm 2012   gaydos - design and analysis approaches to evaluate cardiovascula...Eugm 2012   gaydos - design and analysis approaches to evaluate cardiovascula...
Eugm 2012 gaydos - design and analysis approaches to evaluate cardiovascula...
 
2012-05-30 EUGM | GAYDOS | Design & Analysis Approaches to Evaluate Cardiovas...
2012-05-30 EUGM | GAYDOS | Design & Analysis Approaches to Evaluate Cardiovas...2012-05-30 EUGM | GAYDOS | Design & Analysis Approaches to Evaluate Cardiovas...
2012-05-30 EUGM | GAYDOS | Design & Analysis Approaches to Evaluate Cardiovas...
 
Power and sample size calculations for survival analysis webinar Slides
Power and sample size calculations for survival analysis webinar SlidesPower and sample size calculations for survival analysis webinar Slides
Power and sample size calculations for survival analysis webinar Slides
 
BPI@BPM2011
BPI@BPM2011BPI@BPM2011
BPI@BPM2011
 
Data mining intro-2009-v2
Data mining intro-2009-v2Data mining intro-2009-v2
Data mining intro-2009-v2
 
Statistical Process Control
Statistical Process ControlStatistical Process Control
Statistical Process Control
 
Did Something Change? - Using Statistical Techniques to Interpret Service and...
Did Something Change? - Using Statistical Techniques to Interpret Service and...Did Something Change? - Using Statistical Techniques to Interpret Service and...
Did Something Change? - Using Statistical Techniques to Interpret Service and...
 
Six sigma tools an overview
Six sigma tools  an overviewSix sigma tools  an overview
Six sigma tools an overview
 
FASTER PROCESS DEVELOPMENT WITH HYBRID MODELING AND KNOWLEDGE TRANSFER
FASTER PROCESS DEVELOPMENT WITH HYBRID MODELING AND KNOWLEDGE TRANSFERFASTER PROCESS DEVELOPMENT WITH HYBRID MODELING AND KNOWLEDGE TRANSFER
FASTER PROCESS DEVELOPMENT WITH HYBRID MODELING AND KNOWLEDGE TRANSFER
 
Machine learning and Internet of Things, the future of medical prevention
Machine learning and Internet of Things, the future of medical preventionMachine learning and Internet of Things, the future of medical prevention
Machine learning and Internet of Things, the future of medical prevention
 
information retrival evaluation.ppt
information retrival evaluation.pptinformation retrival evaluation.ppt
information retrival evaluation.ppt
 
The math behind big systems analysis.
The math behind big systems analysis.The math behind big systems analysis.
The math behind big systems analysis.
 
Swat
SwatSwat
Swat
 
Biosurveillance 2.0: Lecture at Emory University
Biosurveillance 2.0: Lecture at Emory UniversityBiosurveillance 2.0: Lecture at Emory University
Biosurveillance 2.0: Lecture at Emory University
 
An SPRT Procedure for an Ungrouped Data using MMLE Approach
An SPRT Procedure for an Ungrouped Data using MMLE ApproachAn SPRT Procedure for an Ungrouped Data using MMLE Approach
An SPRT Procedure for an Ungrouped Data using MMLE Approach
 
Modern Management Techniques.pptx
Modern Management Techniques.pptxModern Management Techniques.pptx
Modern Management Techniques.pptx
 
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
 
Workshop for newcomers
Workshop for newcomersWorkshop for newcomers
Workshop for newcomers
 
Vikram emerging technologies
Vikram emerging technologiesVikram emerging technologies
Vikram emerging technologies
 

Plus de Yi-Shin Chen

從自然語言處理到文字探勘
從自然語言處理到文字探勘從自然語言處理到文字探勘
從自然語言處理到文字探勘Yi-Shin Chen
 
從人工智慧反思教育現場
從人工智慧反思教育現場從人工智慧反思教育現場
從人工智慧反思教育現場Yi-Shin Chen
 
From NLP to text mining
From NLP to text mining From NLP to text mining
From NLP to text mining Yi-Shin Chen
 
2017大數據情緒分析的經驗分享
2017大數據情緒分析的經驗分享2017大數據情緒分析的經驗分享
2017大數據情緒分析的經驗分享Yi-Shin Chen
 
照海華德福教育簡介
照海華德福教育簡介照海華德福教育簡介
照海華德福教育簡介Yi-Shin Chen
 
新竹實驗教育的新契機
新竹實驗教育的新契機新竹實驗教育的新契機
新竹實驗教育的新契機Yi-Shin Chen
 
一名女科技人的反思
一名女科技人的反思一名女科技人的反思
一名女科技人的反思Yi-Shin Chen
 
2017 ncu experience sharing
2017 ncu experience sharing2017 ncu experience sharing
2017 ncu experience sharingYi-Shin Chen
 
Quick Tour of Text Mining
Quick Tour of Text MiningQuick Tour of Text Mining
Quick Tour of Text MiningYi-Shin Chen
 
Quick tour all handout
Quick tour all handoutQuick tour all handout
Quick tour all handoutYi-Shin Chen
 
TAAI 2016 Keynote Talk: Contention and Disruption
TAAI 2016 Keynote Talk: Contention and DisruptionTAAI 2016 Keynote Talk: Contention and Disruption
TAAI 2016 Keynote Talk: Contention and DisruptionYi-Shin Chen
 
TAAI 2016 Keynote Talk: Intercultural Collaboration as a Multi‐Agent System
TAAI 2016 Keynote Talk: Intercultural Collaboration as a Multi‐Agent SystemTAAI 2016 Keynote Talk: Intercultural Collaboration as a Multi‐Agent System
TAAI 2016 Keynote Talk: Intercultural Collaboration as a Multi‐Agent SystemYi-Shin Chen
 
TAAI 2016 Keynote Talk: It is all about AI
TAAI 2016 Keynote Talk: It is all about AITAAI 2016 Keynote Talk: It is all about AI
TAAI 2016 Keynote Talk: It is all about AIYi-Shin Chen
 
2016 datascience emotion analysis - english version
2016 datascience emotion analysis - english version2016 datascience emotion analysis - english version
2016 datascience emotion analysis - english versionYi-Shin Chen
 
大數據下的情緒分析
大數據下的情緒分析大數據下的情緒分析
大數據下的情緒分析Yi-Shin Chen
 
照海華德福教育簡介
照海華德福教育簡介照海華德福教育簡介
照海華德福教育簡介Yi-Shin Chen
 

Plus de Yi-Shin Chen (17)

從自然語言處理到文字探勘
從自然語言處理到文字探勘從自然語言處理到文字探勘
從自然語言處理到文字探勘
 
從人工智慧反思教育現場
從人工智慧反思教育現場從人工智慧反思教育現場
從人工智慧反思教育現場
 
From NLP to text mining
From NLP to text mining From NLP to text mining
From NLP to text mining
 
2017大數據情緒分析的經驗分享
2017大數據情緒分析的經驗分享2017大數據情緒分析的經驗分享
2017大數據情緒分析的經驗分享
 
照海華德福教育簡介
照海華德福教育簡介照海華德福教育簡介
照海華德福教育簡介
 
新竹實驗教育的新契機
新竹實驗教育的新契機新竹實驗教育的新契機
新竹實驗教育的新契機
 
一名女科技人的反思
一名女科技人的反思一名女科技人的反思
一名女科技人的反思
 
2017 ncu experience sharing
2017 ncu experience sharing2017 ncu experience sharing
2017 ncu experience sharing
 
Quick Tour of Text Mining
Quick Tour of Text MiningQuick Tour of Text Mining
Quick Tour of Text Mining
 
Quick tour all handout
Quick tour all handoutQuick tour all handout
Quick tour all handout
 
TAAI 2016 Keynote Talk: Contention and Disruption
TAAI 2016 Keynote Talk: Contention and DisruptionTAAI 2016 Keynote Talk: Contention and Disruption
TAAI 2016 Keynote Talk: Contention and Disruption
 
TAAI 2016 Keynote Talk: Intercultural Collaboration as a Multi‐Agent System
TAAI 2016 Keynote Talk: Intercultural Collaboration as a Multi‐Agent SystemTAAI 2016 Keynote Talk: Intercultural Collaboration as a Multi‐Agent System
TAAI 2016 Keynote Talk: Intercultural Collaboration as a Multi‐Agent System
 
TAAI 2016 Keynote Talk: It is all about AI
TAAI 2016 Keynote Talk: It is all about AITAAI 2016 Keynote Talk: It is all about AI
TAAI 2016 Keynote Talk: It is all about AI
 
Research and life
Research and lifeResearch and life
Research and life
 
2016 datascience emotion analysis - english version
2016 datascience emotion analysis - english version2016 datascience emotion analysis - english version
2016 datascience emotion analysis - english version
 
大數據下的情緒分析
大數據下的情緒分析大數據下的情緒分析
大數據下的情緒分析
 
照海華德福教育簡介
照海華德福教育簡介照海華德福教育簡介
照海華德福教育簡介
 

Dernier

20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxellehsormae
 

Dernier (20)

20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx

Examples of working with streaming data

  • 1. Examples of Working with Streaming Data Yi-Shin Chen Institute of Information Systems and Applications Department of Computer Science National Tsing Hua University yishin@gmail.com
  • 2. Hello: 陳宜欣 Yi-Shin Chen. Currently: Associate Professor at NTHU CS; Director of IDEA Lab. Education: Ph.D. in Computer Science, USC, USA; M.B.A. in Information Management, NCU, TW; B.B.A. in Information Management, NCU, TW. Courses: Introduction to Database Systems; Advanced Database Systems; Data Mining: Concepts, Techniques, and Applications.
  • 3. Research Focus from 2000 (DB): Storage, Index, Optimization, Query, Mining.
  • 5. Streaming Data. Continuous flow (e.g., stock volume, sensor data, social streams). Infinite length: impractical to store and use all historical data. Concept drift: not wise to use all historical data.
  • 6. Continuous Queries. Diagram labels: Stream DB; Acquisition Process; Raw data & Transformation of Raw Stream; Continuous Query Process; Crowd Wisdom; Rules/Patterns; Continuously Provide Feedback. Three major approaches for continuous queries: fast on-line classification/clustering; sliding window; range aggregation.
  • 7. Example 1 Auto-identify the Influence of Events Based on Stock News
  • 8. Framework of Off-line Training Module. Diagram labels: Acquisition Process; Crowd Wisdom; Rules/Patterns.
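  • 9. Alignment. Industries: Finance, Textile, Car, …. Each industry has a list of companies and related words. Each news article $n$ gets a vector $\mathit{belong}_n = [P_{finance}, P_{textile}, \ldots, P_{car}]$. Example article (translated from Chinese): "The Luxgen Neora concept car, which made its debut at the Shanghai Auto Show in April 2011, is not only the first concept car launched by the domestic brand Luxgen since its founding……" For this article, $P_{finance} = 0$, $P_{textile} = 0$, $P_{car} = 3$, so $\mathit{belong}_n = [0, 0, \ldots, 3]$.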
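  • 10. Itemset Production. Example group of transactions (translated from Chinese), one pair of itemsets per row: Japan+earthquake, Japan+disaster relief; Japan+earthquake, Japan+flooding; Japan+earthquake, Japan+impact; Japan+earthquake, Japan+estimate; Japan+earthquake, Japan+damage; Japan+purchase, Japan+travel; …. Confidence of Japan+earthquake (日本+地震): let $u_s$ be the number of transactions in which Japan+earthquake appears and $n_p$ the number of transactions in which Japan (日本) appears. Then $\mathit{confidence} = \frac{u_s}{n_p} = \frac{5}{6}$.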
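  • 11. Representative Itemset Selection. Select itemsets with high confidence as candidate representative itemsets. $\mathit{weight} = x \cdot \mathit{tfidf}_1 + y \cdot \mathit{tfidf}_2 + z \cdot \mathit{confidence}$. Example (translated from Chinese). Candidate itemsets: Japan+earthquake, Japan+estimate, nuclear+leak, crisis+occur. TF-IDF per word, in slide order: Japan 0.22, earthquake 0.25, estimate 0.03, nuclear 0.18, leak 0.2, crisis 0.10, occur 0.001. Confidence per itemset, in slide order: Japan+earthquake 0.833, Japan+estimate 1, nuclear+leak 0.667, crisis+occur 0.667. Resulting concept: Japan, earthquake, nuclear, leak.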
  • 12. Concept Verification. By considering: the daily frequency of concept $C_j$; the concept index $CI_j$ of $C_j$; a regression model based on price within sliding windows. If the p-value rejects $H_0$, the concept $C_j$ is considered an influential event.
  • 13. On-line Prediction Module (Continuous Query Process). Regression prediction: use the most frequent event. Adjust regression prediction: also include other events that are not the most frequent. Pheromone prediction: also include the past influence.
  • 14. Experimental Data. Stock data: industry index from TWSE; 2012-01-01 to 2012-05-11. News data: crawled from news websites (Yahoo!, udn, Libertytimes, PCHome, etc.; 13 websites in total); 2012-01-01 to 2012-05-11; more than 150,000 news articles; all news is in Traditional Chinese.
  • 15. Experimental Setup. Four methods to predict the market: pheromone prediction model; adjust regression prediction model; regression prediction model; blind test. Prediction policy: fall, rise, or NSM (no significant move).
  • 16. Performance. Average accuracy of the four methods: Pheromone 0.5784574; Adjust regression 0.5323214; Regression 0.5134457; Blind test 0.3045479.
  • 17. Performance. Does it work on the whole market? We were interested in using events to predict the whole market by aggregating all industries into one. Accuracy by type: Pheromone 0.6315789; Adjust regression 0.6896511; Regression 0.5714285.
  • 18. Example 2: An Interactive Conducting System Using Motion Detector
  • 19. Motivation. Diversify human-computer interaction technology with multimedia: music education; music experiments; amateur and professional conductors; composers; personal amusement.
  • 20. Devices. Build an interactive conducting system using the Microsoft Kinect motion detector. Figure label: 3D depth sensors.
  • 22. Conducting Data (Data Streams). Cartesian coordinates (x, y, z). 30 frames per second at 320x240 resolution: delay of 33 ms (1/30 second). Human eyes can process 10 to 12 frames per second [2]: delay ≈ 100 ms (1/10 second). Figure: coordinate axes (+X/-X, +Y/-Y, and Z along the sensor direction).
  • 23. Framework (flowchart). Initialize the system with PlayStatus = False. When conducting data is received, check whether PlayStatus is true. If it is not, run start gesture recognition; if a start gesture is recognized, set PlayStatus = True. If it is, run stop gesture recognition; if a stop gesture is recognized, set PlayStatus = False. Processing boxes: Beat Pattern Recognition; Whole Measure; Volume; Identify Instrument Emphasis; Relative height of hand; Tilt; Z-Mapping; Volume Adjustment According to Instrument Emphasis; Tempo Adjustment According to Instrument Emphasis. Stage labels: Acquisition Process; Crowd Wisdom; Rules/Patterns; Offline Analysis; Continuous Query Process.
  • 24. Experiments. Evaluation: beat pattern and measure recognition; volume control and instrument emphasis recognition; response time. Experimental setup: participants: 1 professional conductor and 8 people with no conducting experience; practice: 30 minutes.
  • 25. Beat Pattern and Measure Recognition Evaluation. Bar chart: recognition rate (recall and precision) for the professional and no-experience groups. Data labels in slide order: 0.7826, 0.8648, 0.8438, 0.8821.
  • 26. Instrument Emphasis. Adjust volume in the correct instrument sections. Bar chart: recognition rate (recall and precision) per instrument section. Data labels in slide order: 1, 0.9375, 1, 0.8666, 1, 1, 1, 1, 1, 0.9286, 1, 1.
  • 27. Example 3: Social Stream Analysis for Location Identification
  • 28. Goal. Identify the location of a particular Twitter user at a given time, using exclusively the content of his/her tweets.
  • 29. Major Challenges. Twitter challenges: tweets are noisy; extensive use of non-standard vocabulary; bots and spammers. Geo-locational challenges: users might have several associated locations; toponyms; scarce information; false profile information.
  • 31. Experimental Setup. Original dataset: 1.53 M Twitter users and 13 M tweets; 3,314 Twitter users and 2.2 M tweets; 104,054 geo-tagged tweets. Although we collected and processed the data carefully, it still needed to be validated by local experts (people familiar with the geography of the country). Filtering diagram labels: Original Tweets; Subject Identification; Location Discovery; Tweets; Toponyms Removal; Timeline Sorting; Final Results. Counts in slide order: 329,814; 57,153; 18,662; 9,093; 6,928; 2,165.
  • 32. Evaluation. Recruited an international workforce with good reputation through crowdsourcing.
  • 34. Example 4: Social Stream Analysis for Event Identification
  • 35. Introduction. Analyzing social streams can benefit: emergency control; crowd opinion analysis; detection of unreported events. Motivation: event identification from social streams.
  • 36. Methodology. Pipeline diagram labels: Tweets; Data Preprocess; Keyword Selection; Event Candidate Recognition; Event Candidates; User Social Structures; Evolving Social Graph Analysis; Event Identification. Stage labels: Acquisition Process; Continuous Query Process; Offline Analysis; Crowd Wisdom; Rules/Patterns.
  • 37. Methodology – Keyword Selection. Well-noticed criterion: if a word is suddenly mentioned by many more users than in the past, it is well-noticed. Time frame: a unit of time. Sliding window: a certain number of past time frames. Figure: timeline of time frames tf0, tf1, tf2, tf3, tf4.
  • 38. Methodology – Event Candidate Recognition. Idea: group one keyword with its most relevant keywords into one event candidate. Example keywords from the figure: boston, explosion, confirm, prayer, bombing, boston-marathon, threat, iraq, jfk, hospital, victim, afghanistan, bomb, america.
  • 39. Methodology – Evolving Social Graph Analysis. Information decay: vertex weight and edge weight; decay mechanism. Concept-Based Evolving Graph Sequences (cEGS): a sequence of directed graphs that demonstrates information propagation. Figure: graphs at time frames tf1, tf2, tf3.
  • 40. Experiment Testing. Events identified in November 2013; evaluated by 7 human experts; average precision 86.64%. Chart: precision (0 to 1) by date, Nov 2 to Nov 30.
  • 41. Example 5 Social Stream Analysis for Mental Disorder Detection
  • 42. Introduction. 18.1% of people in the United States suffer from a mental disorder (*). Using social networks to research mental disorders. (*) National Institute of Mental Health: http://www.nimh.nih.gov/health/statistics/prevalence/index.shtml
  • 43. Background. Bipolar Disorder: unstable and impulsive emotions (*); cycling between manic and depressive episodes. Borderline Personality Disorder: unstable and impulsive emotions (*); impaired social interactions.
  • 50. Collect Data from Ordinary People
  • 51. Collect Data from Ordinary People
  • 52. Collect Data from Ordinary People
  • 53. Basic Guidelines. Identify the commonalities and differences between the experimental and control groups. Features: word/pattern frequency; emotion-related data (e.g., flipping rates, occurrence rates); social interaction (e.g., retweet, reply); lifestyle (e.g., online time, whether the user stays up late); age and gender.
  • 54. Apply Classifiers (Online) (Continuous Query Process). Using the extracted features, apply various classifiers: neural networks; Naïve Bayes and Bayesian belief networks; support vector machines; random forest.
  • 57. More in the future… Thank you. Contact me at: yishin@gmail.com
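
Slides 10 and 11 define itemset confidence and a weighted score for selecting representative itemsets. The following is a minimal Python sketch of those two formulas, not the authors' implementation. The function names, the English keyword labels, the pairing of the TF-IDF values 0.22 and 0.25 with "Japan" and "earthquake", and the default coefficients x = y = z = 1 are all assumptions made for illustration; the editor's notes say the weights are meant to be found by training.

```python
def confidence(transactions, antecedent, consequent):
    """confidence = u_s / n_p, following slide 10.

    n_p: number of transactions containing the antecedent keyword.
    u_s: number of transactions containing both keywords.
    """
    n_p = sum(1 for t in transactions if antecedent in t)
    u_s = sum(1 for t in transactions if antecedent in t and consequent in t)
    return u_s / n_p if n_p else 0.0


def itemset_weight(tfidf1, tfidf2, conf, x=1.0, y=1.0, z=1.0):
    """weight = x * tfidf1 + y * tfidf2 + z * confidence, following slide 11.

    x, y, z default to 1.0 only as placeholders (an assumption).
    """
    return x * tfidf1 + y * tfidf2 + z * conf


# Six transactions mirroring the slide's example group: "Japan" occurs in
# all six, and "Japan" together with "earthquake" occurs in five of them.
transactions = [
    {"Japan", "earthquake", "disaster relief"},
    {"Japan", "earthquake", "flooding"},
    {"Japan", "earthquake", "impact"},
    {"Japan", "earthquake", "estimate"},
    {"Japan", "earthquake", "damage"},
    {"Japan", "purchase", "travel"},
]

conf = confidence(transactions, "Japan", "earthquake")
weight = itemset_weight(0.22, 0.25, conf)
print(round(conf, 3))    # 0.833
print(round(weight, 3))  # 1.303
```

The computed confidence reproduces the 5/6 ≈ 0.833 shown on the slides. The weight value depends entirely on the placeholder coefficients.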

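Slide 37's well-noticed criterion (a word is suddenly mentioned by many more users than in the past sliding window of time frames) can be sketched as follows. The slides do not give the exact test, so the rule used here is an assumption: a word is flagged when the number of distinct users mentioning it in the current frame exceeds the window mean plus `k` standard deviations and reaches at least `min_users`. The class name and all parameters are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class WellNoticedDetector:
    """Flags words whose per-frame user count jumps above the recent past."""

    def __init__(self, window_size=3, k=2.0, min_users=3):
        self.window_size = window_size  # number of past time frames kept
        self.k = k                      # assumed sensitivity of the threshold
        self.min_users = min_users      # assumed floor against tiny bursts
        self.frames_seen = 0
        self.history = {}               # word -> deque of past user counts

    def process_frame(self, posts):
        """posts: iterable of (user, words) pairs for one time frame."""
        users_per_word = {}
        for user, words in posts:
            for word in set(words):
                users_per_word.setdefault(word, set()).add(user)

        noticed = set()
        for word in set(users_per_word) | set(self.history):
            current = len(users_per_word.get(word, ()))
            past = self.history.get(word)
            if past is None:
                # A word never seen before had zero users in earlier frames.
                pad = min(self.frames_seen, self.window_size)
                past = deque([0] * pad, maxlen=self.window_size)
                self.history[word] = past
            if len(past) == self.window_size:
                threshold = mean(past) + self.k * pstdev(past)
                if current >= self.min_users and current > threshold:
                    noticed.add(word)
            past.append(current)
            if not any(past):
                # Forget words that stayed silent for the whole window.
                del self.history[word]
        self.frames_seen += 1
        return noticed


detector = WellNoticedDetector()
frames = [
    [("a", ["boston"])],
    [("a", ["boston"])],
    [("b", ["boston"])],
    [(u, ["boston", "explosion"]) for u in ("a", "b", "c", "d")],
]
results = [detector.process_frame(frame) for frame in frames]
print(results[-1] == {"boston", "explosion"})  # True
```

Only one window of counts per active word is kept, which fits the point on slide 5 that storing all historical stream data is impractical.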
Editor's notes

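  1. Because we want to analyze how events affect different industries, category alignment separates the news by industry.
  2. After collecting the related companies and related words for each industry, we use them as the basis for alignment. When a news article arrives, we assign it a belong vector whose values represent how well the article fits each industry. Each time a word from the article appears in an industry's word list, that industry's fitness is incremented by one. The larger a value in the vector, the better the fit. Our initial strategy was to assign the article to the industry with the largest value. This has a flaw, because an event does not affect only one industry. We therefore want to pick the relatively salient industries from the vector. The method we use is outlier detection: a boxplot selects the high-valued outliers as the industries to which the article is assigned.
  3. We want to find the keyword sets that frequently appear in these groups; these sets are our itemsets.
  4. The features of an itemset are shown below. We look for itemsets that not only have high confidence but whose keywords are also relatively important. We hope to use an automatic training method to find better weights for selecting itemsets.
  5. Control variable; response variable.
  6. We have three different validation methods in total, each with different assumptions.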
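  7. One purpose of a conducting system is to diversify the ways we interact with multimedia information. For example, a conducting system at school lets student conductors practice on their own. Amateur conductors, professional conductors, and arrangers can all use a conducting system to simulate and create musical variety. Conducting systems are also becoming a trend in personal entertainment, as Wii Music shows.
  8. My research goal is therefore to use Microsoft Kinect to build an interactive conducting system in which the conductor does not need to wear any electronic device and which avoids some limitations of image processing. We chose Microsoft Kinect because of its advanced skeleton tracking: Kinect has two 3D depth sensors that return depth data for objects in the environment and can track the skeletons of people in the scene. It is also inexpensive, so a system built on it is easier to popularize.
  9. In these two figures, the left shows the ideal 4-beat pattern and the right shows the 4-beat trajectory read from Kinect. Some points of the Kinect trajectory cannot be recognized because the hand overlaps the body. We therefore cannot find the beat pattern from the x, y, z coordinates alone; our algorithm must recognize different beat patterns and keep working when data is missing. We compiled the features that distinguish the beat patterns to help identify them; these features are covered later in the implementation.
  10. Kinect can detect 20 joints on a user's body at the same time. It returns these joints in a Cartesian coordinate system whose origin is the spine, so our input data is a sequence of joint coordinates. We use Kinect's depth image at the lowest resolution, 320x240, because at this resolution Kinect returns the most frames per second, namely 30, so data passing from Kinect to our system has a delay of 33 milliseconds. The human eye can process 10 to 12 frames per second, a delay of about 100 milliseconds, so Kinect's transmission delay does not affect the system much. The algorithm design matters more, so we minimize computation to keep the system real-time. Conducting systems generally track the conductor's upper body; our algorithm uses only 6 upper-body joints: the left and right hands, the left and right shoulders, the head, and the spine. (Human reaction and response time delay is 100 milliseconds.)
  11. This is the framework of my whole conducting system. At the start, the piece to be conducted is not yet playing, so on initialization the system sets the PlayStatus flag to false. When a user stands in front of the Kinect, Kinect tracks the user's skeleton and returns the coordinates to the system. The system then checks whether the piece is already playing; if not, it goes to start gesture recognition and waits for the user's start gesture. Once the start gesture is recognized, the system plays the music, sets PlayStatus to true, and then adjusts the volume, controls the instrument sections, and adjusts the tempo according to the user's conducting trajectory. While the piece is playing, the system checks whether the user makes a stop gesture; if one is recognized, the system sets PlayStatus to false and waits for the user's next start gesture. Now let us look at the data returned by Kinect.
  12. Our experiments have three main parts: first, beat pattern and measure recognition; next, volume adjustment and instrument emphasis recognition; finally, the algorithm's response time. Because of time constraints, only 1 professional conductor and 8 students with no conducting experience took part; we will later recruit other professional conductors to test the system. Before testing, each user used the system for half an hour to become familiar with it. First, let us look at beat pattern and measure recognition.
  13. The left shows the recognition results for the professional conductor, and the right shows the results for participants with no conducting experience. The professional conductor's recall is relatively low because we use only one Kinect: when the conductor turns left or right to emphasize a section, the beat trajectory cannot be detected.
  14. Recall = number of successful changes / number of changes the user intended. Precision = number of successful changes / number of changes the system detected. For volume adjustment and instrument emphasis, only percussion and brass have relatively low precision and recall. Both sections sit at the back of the virtual orchestra, so when the conductor emphasizes a back section and then draws the hand back, it is easy to accidentally change the section in front of it. Because the left and right sections have buffer zones, their recognition error rate is lower than that of the middle sections.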
  15. Similarly, to measure the effectiveness of our method, the results of the Hometown dataset were split into "Factual" and "Empty | Fictional". The first category refers to profiles in which the user has explicitly stated a valid location. The second category contains profiles whose location is empty, fictional, or overbroad. WMAE: workers' MAE. Tw MAE: tweet MAE. Workers would usually agree on the city but not on the area, as a result of their perception. In general, the error distance remained low. For reallocated tweets, Tw MAE also remained low compared with the area of the United States (3.1 million square miles).
  16. diagram
  17. 1.No decay example 2.Decay example
  18. Y to max 1
  19. Stronger motivation: We can actively
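
Note 2 above describes building a belong vector per news article and then using a boxplot to pick the industries whose values are high outliers. A minimal Python sketch of that idea follows. The word lists, the eight-industry scores, and the conventional fence Q3 + 1.5·IQR are assumptions for illustration; the notes do not state which boxplot rule or quartile method was used.

```python
from statistics import quantiles


def belong_vector(tokens, industry_words):
    """Count, per industry, how many tokens appear in that industry's word list."""
    return {
        industry: sum(1 for tok in tokens if tok in words)
        for industry, words in industry_words.items()
    }


def salient_industries(belong, k=1.5):
    """Return industries whose score is a high boxplot outlier (> Q3 + k * IQR).

    Needs at least two industries; k = 1.5 is the usual boxplot convention,
    assumed here rather than taken from the slides.
    """
    q1, _, q3 = quantiles(belong.values(), n=4, method="inclusive")
    fence = q3 + k * (q3 - q1)
    return [name for name, score in belong.items() if score > fence]


# Hypothetical word lists; the real lists of companies and related words
# per industry are not given in the slides.
industry_words = {
    "finance": {"bank", "loan"},
    "textile": {"fabric", "yarn"},
    "car": {"Luxgen", "concept car"},
}
tokens = ["Luxgen", "Neora", "concept car", "Shanghai", "auto show", "Luxgen"]
belong = belong_vector(tokens, industry_words)
print(belong)  # {'finance': 0, 'textile': 0, 'car': 3}

# Outlier selection needs more industries to be meaningful; these scores
# are made up for the example.
scores = {
    "finance": 1, "textile": 0, "car": 9, "electronics": 2,
    "food": 1, "steel": 1, "tourism": 0, "shipping": 1,
}
print(salient_industries(scores))  # ['car']
```

Unlike taking the single maximum, the outlier rule can return several industries for one article, or none when no industry stands out, which matches the flaw the note points out.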