29/05/19 1Stefan Dietze
From (Web) Data to Knowledge: on the Complementarity
of Human and Artificial Intelligence
Prof. Dr. Stefan Dietze
Inaugural Lecture, 28 May 2019
Heinrich-Heine-Universität Düsseldorf
Finding “things” on the Web
• Resources
• Facts
• Claims
• Opinions
We'll try to use AI to "answer" that question at the end of the talk.
Finding social sciences research data on the Web
Artificial vs human intelligence: a simplistic Web search perspective
Artificial Intelligence
• Information retrieval (crawling, indexing, ranking etc.)
• Natural language processing
• (Hyperlink) graph analysis (e.g. PageRank et al.)
• Statistics and (deep) learning from user interactions
o Query interpretation & intent prediction
o Classification of users, documents, queries
o Reranking & personalisation
o …
Human/Crowd Intelligence
"Supervising AI" with user-generated data & knowledge ("making machines smarter")
Facilitating search, retrieval & knowledge gain of users ("making humans smarter")
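The hyperlink graph analysis mentioned on this slide can be illustrated with a minimal power-iteration PageRank sketch. The toy graph, the damping factor of 0.85 and the function name are assumptions for illustration, not taken from the talk.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same "teleport" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy Web graph: "c" is linked from three pages, "d" from none.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_links)
```

In this toy graph the page with the most incoming links ends up with the highest score, and the scores sum to 1.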
Overview
Part I
Symbolic & subsymbolic AI on the Web – a brief introduction
Part II
Extracting machine-interpretable knowledge ("making machines smarter")
Part III
Facilitating search, retrieval & knowledge gain of users ("making humans smarter")
Symbols, data & knowledge on the Web
Unstructured data: e.g. web pages, user interactions/behavior, clickstreams, sensor data
Machine-interpretable knowledge: e.g. knowledge graphs, Web markup
[Figure: example knowledge graph. Nodes: dbr:Tim_Berners-Lee, dbo:Person, dbo:Scientist, yago:LegalActor, dbo:Organisation, dbr:MIT, dbr:WWW_Foundation, dbr:Washington_DC, dbr:Jakarta, "Tim Berners-Lee"@en, 1955-06-08^^xsd:date. Edge labels: rdf:type, rdfs:subClassOf, foaf:name, dbo:birthDate, dbo:workplaces, dbo:keyPersonOf, dbo:location.]
DBpedia (eng.): 200 million facts
Google KG: 18 billion facts
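To make "machine-interpretable knowledge" concrete, here is a minimal sketch of a knowledge graph as subject-predicate-object triples with wildcard pattern matching and a simple subclass inference. Which node is connected to which edge is an assumption guessed from the labels on the slide; the helper names `match` and `types_of` are hypothetical.

```python
# Triples as (subject, predicate, object) tuples.
# ASSUMPTION: the node/edge pairing is guessed from the slide labels.
TRIPLES = [
    ("dbr:Tim_Berners-Lee", "rdf:type", "dbo:Scientist"),
    ("dbr:Tim_Berners-Lee", "foaf:name", '"Tim Berners-Lee"@en'),
    ("dbr:Tim_Berners-Lee", "dbo:birthDate", '"1955-06-08"^^xsd:date'),
    ("dbr:Tim_Berners-Lee", "dbo:workplaces", "dbr:MIT"),
    ("dbr:Tim_Berners-Lee", "dbo:keyPersonOf", "dbr:WWW_Foundation"),
    ("dbr:WWW_Foundation", "rdf:type", "dbo:Organisation"),
    ("dbr:WWW_Foundation", "dbo:location", "dbr:Washington_DC"),
    ("dbo:Scientist", "rdfs:subClassOf", "dbo:Person"),
]


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def types_of(triples, entity):
    """Direct rdf:type values plus superclasses reached via rdfs:subClassOf."""
    found = set()
    frontier = [o for _, _, o in match(triples, s=entity, p="rdf:type")]
    while frontier:
        cls = frontier.pop()
        if cls not in found:
            found.add(cls)
            frontier.extend(
                o for _, _, o in match(triples, s=cls, p="rdfs:subClassOf")
            )
    return found


workplaces = match(TRIPLES, s="dbr:Tim_Berners-Lee", p="dbo:workplaces")
tbl_types = types_of(TRIPLES, "dbr:Tim_Berners-Lee")
```

The second helper shows the kind of inference a symbolic representation allows: the graph only states that the entity is a dbo:Scientist, and dbo:Person follows from the subclass axiom.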
Symbolic vs subsymbolic AI
Symbolic AI
• AI = manipulation and interpretation of symbols (eventually: "knowledge")
• Top-down: knowledge representation, logics, inference, knowledge graphs
• "Strong AI hypothesis" or "Physical Symbol System Hypothesis" (Newell & Simon, 1976), "GOFAI"
Subsymbolic AI
• AI = emulating/engineering human intelligence, e.g. through cognitive computing ("perceptron", Frank Rosenblatt 1957)
• Bottom-up: neural networks, machine/deep learning, distributional semantics
• Also called: "weak AI hypothesis" (Russell & Norvig, 1995)
[Figure: layers (top to bottom): Knowledge, Information, Data, Symbols]
Horse ⊓ ¬RockingHorse ⊑ Animal ⊓ ∀(=4)hasLegs
"Intelligence is ten million rules" (Douglas Lenat, founder of Cyc)
Subsymbolic AI & deep learning for language understanding
[Figure: percentage of deep learning papers in major NLP conferences. Source: Young et al., Recent Trends in Deep Learning Based Natural Language Processing]
• Distributional semantics & embeddings: predicting low-dimensional vector representations of words & text, e.g. Word2Vec [Mikolov et al., 2013]
• Efficient attention-based alternatives to RNN/CNN architectures in encoder/decoder settings (e.g. for machine translation) [Vaswani et al., 2017]
• Pretraining language models for task-specific transfer learning, e.g. BERT (Bidirectional Encoder Representations from Transformers) [Devlin et al., 2018]
T. Mikolov et al., Distributed Representations of Words and Phrases and their Compositionality, NIPS (2013)
A. Vaswani et al., Attention Is All You Need, NIPS (2017)
J. Devlin et al., BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)
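The idea behind distributional semantics (words that occur in similar contexts get similar vectors) can be sketched without any neural model. The example below uses plain co-occurrence counts and cosine similarity as a stand-in; it is not Word2Vec, and the toy corpus and window size are assumptions.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a sparse vector of context-word counts."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "the stock market fell sharply",
]
vecs = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vecs["cat"], vecs["dog"])
sim_cat_market = cosine(vecs["cat"], vecs["market"])
```

Here "cat" and "dog" share most of their contexts and therefore score higher than "cat" and "market". Learned embeddings such as Word2Vec compress this kind of signal into dense, low-dimensional vectors.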
Learning without semantics
• Biases in human interactions can be learned and amplified by ML models
• Meaning/semantics are crucial to facilitate interpretation by/of machines & ML models
Source: https://techcrunch.com/2016/03/24/microsoft-silences-its-new-a-i-bot-tay-after-twitter-users-teach-it-racism/
Semantics and knowledge: a brief (and incomplete) history
• Deductive reasoning, syllogism & categorisation (Aristotle, 384 BC – 322 BC)
• Formal logic & calculus ratiocinator (reasoning, symbol manipulation) (G.W. Leibniz, 1646 – 1716)
• "Begriffsschrift", technically: predicate logic (Gottlob Frege, 1848 – 1925)
• Frames for representing stereotyped situations (Marvin Minsky, 1974)
• Rules & expert systems
• Ontologies (Leibniz, Kant, Gruber 1994)
• Description Logics (Baader & Hollunder, 1991 et al.)
• Semantic Web (Berners-Lee, Hendler, Lassila, 2001) & Linked Data & Knowledge Graphs
Symbolic & subsymbolic AI: e.g. linking Web documents & KGs
• Robust methods for named entity disambiguation (NED), e.g. Ambiverse [Hoffart et al., 2011], TagMe [Ferragina et al., 2010], Babelfy [Moro et al., 2014]
• Time- and corpus-specific entity relatedness; prior probabilities and meaning of entities change over time, e.g. "Deutschland" during World Cup [DL4KGS 2018]
• Meta-EL: supervised ensemble learner exploiting results of different NED systems [SAC19, CIKM19]
o Considers features of terms, mentions/occurrences, dynamics/temporal drift etc.
o Outperforms individual NED systems across diverse documents/corpora
• Problem: "completeness" & coverage of KGs?
Fafalios, P., Joao, R.S., Dietze, S., Same but Different: Distant Supervision for Predicting and Understanding Entity Linking Difficulty, ACM SAC19
Mohapatra, N., Iosifidis, V., Ekbal, A., Dietze, S., Fafalios, P., Time-Aware and Corpus-Specific Entity Relatedness, DL4KGS at ESWC2018.
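As a rough illustration of combining several NED systems, the sketch below uses a weighted vote over their outputs. This is a deliberately simplified stand-in and not the Meta-EL method, which is a supervised learner over richer features; the system names, weights and the example mention are hypothetical.

```python
from collections import defaultdict


def ensemble_link(predictions, weights):
    """Pick the entity with the highest summed weight among NED outputs.

    predictions: {system_name: entity_uri or None}
    weights:     {system_name: float}, e.g. per-system validation accuracy
    """
    scores = defaultdict(float)
    for system, entity in predictions.items():
        if entity is not None:
            scores[entity] += weights.get(system, 0.0)
    if not scores:
        return None  # no system linked the mention
    return max(scores, key=scores.get)


# Hypothetical outputs for the mention "Deutschland" in a World Cup tweet.
predictions = {
    "system_a": "dbr:Germany",
    "system_b": "dbr:Germany_national_football_team",
    "system_c": "dbr:Germany_national_football_team",
}
weights = {"system_a": 0.9, "system_b": 0.5, "system_c": 0.5}
linked = ensemble_link(predictions, weights)
```

Two weaker systems that agree can outvote a single stronger one. A supervised ensemble would learn such trade-offs from features of the mention and its context instead of using fixed weights.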
Overview
Part I
Symbolic & subsymbolic AI on the Web – a brief introduction
Part II
Extracting machine-interpretable knowledge ("making machines smarter")
Part III
Facilitating search, retrieval & knowledge gain of users ("making humans smarter")
Knowledge about: facts, claims, stances & opinions on the Web
Facts & claims | Stances, opinions, interactions
<"Tim Berners-Lee" s:founderOf "Solid">
Mining (long-tail) facts from the Web?
<"Tim Berners-Lee" s:founderOf "Solid">
• Obtaining verified facts (or a knowledge graph) for a given entity?
• Application of NLP (e.g. NER, relation extraction) at Web scale (Google index: 50 trn pages)?
• Exploiting entity-centric embedded Web page markup (schema.org), prevalent in roughly 40% of Web pages (44 bn "facts" in Common Crawl 2016 / 3.2 bn Web pages)
• Challenges
o Errors: factual errors, annotation errors (see also [Meusel et al., ESWC2015])
o Ambiguity & coreferences: e.g. 18,000 entity descriptions of "iPhone 6" in Common Crawl 2016 & ambiguous literals (e.g. "Apple")
o Redundancies & conflicts: vast amounts of equivalent or conflicting statements
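To show what "embedded Web page markup" looks like in practice, here is a minimal sketch that pulls schema.org statements out of a page and flattens them into fact triples. It only handles the JSON-LD syntax (Microdata and RDFa are ignored) and only flat literal values; the sample page and the helper names are assumptions.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # markup on the Web is noisy; skip unparsable blocks


def extract_facts(html):
    """Flatten JSON-LD objects into (subject, property, value) triples."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    facts = []
    for doc in parser.documents:
        if not isinstance(doc, dict):
            continue  # this sketch ignores top-level arrays
        subject = doc.get("@id") or doc.get("name") or "_:node"
        for prop, value in doc.items():
            if not prop.startswith("@") and isinstance(value, (str, int, float)):
                facts.append((subject, prop, value))
    return facts


# Hypothetical sample page.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Book",
 "name": "Brideshead Revisited", "author": "Evelyn Waugh",
 "publisher": "Chapman & Hall"}
</script>
</head><body></body></html>
"""
facts = extract_facts(SAMPLE_HTML)
```

Facts extracted this way inherit all the problems listed above (errors, ambiguity, redundancy), which is why a cleansing and fusion step is needed afterwards.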
KnowMore: data fusion on markup
• 0. Noise: data cleansing (node URIs, deduplication etc.)
• 1.a) Scale: blocking (BM25 entity retrieval) on markup index
• 1.b) Relevance: supervised coreference resolution
• 2.) Quality & redundancy: data fusion through supervised fact classification (SVM, kNN, RF, LR, NB), diverse feature set (authority, relevance etc.), considering source-level (e.g. PageRank), entity-level & fact-level features
Pipeline: Web crawl (Common Crawl, 44 bn facts) → Web page markup → 1. Blocking & coreference resolution → 2. Fusion / fact selection (supervised)
Query entity: Brideshead Revisited, type:(Book)
Candidate facts (approx. 5,000 facts for "Brideshead Revisited"; compare: 125,000 facts for "iPhone6"):
node1 | publisher | Chapman & Hall
node1 | releaseDate | 1945
node1 | publishDate | 1961
node2 | country | UK
node2 | publisher | Black Bay Books
node3 | country | US
node3 | copyrightHolder | Evelyn Waugh
… | … | …
Entity description (20 correct/non-redundant facts for "Brideshead Rev."):
author | Evelyn Waugh
priorWork | Put Out More Flags
ISBN | 978031874803074
copyrightHolder | Evelyn Waugh
releaseDate | 1945
… | …
New query entities:
BBC Audio, type:(Organization)
Chapman & Hall, type:(Publisher)
Put Out More Flags, type:(Book)
Yu, R., [..], Dietze, S., KnowMore - Knowledge Base Augmentation with Structured Web Markup, Semantic Web Journal 2019 (SWJ2019)
Tempelmeier, N., Demidova, E., Dietze, S., Inferring Missing Categorical Information in Noisy and Sparse Web Markup, The Web Conf. 2018 (WWW2018)
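The blocking step (1.a) retrieves candidate entity descriptions for a query entity with BM25. Below is a minimal sketch of Okapi BM25 scoring over tokenised descriptions; the parameter values (k1 = 1.2, b = 0.75), the non-negative IDF variant and the toy descriptions are assumptions, not details from the KnowMore pipeline.

```python
import math
from collections import Counter


def bm25_rank(query, documents, k1=1.2, b=0.75):
    """Rank documents (lists of tokens) for a query with Okapi BM25.

    Returns a list of (document_index, score), best match first.
    """
    n = len(documents)
    avg_len = sum(len(d) for d in documents) / n
    doc_freq = Counter(term for d in documents for term in set(d))
    scored = []
    for idx, doc in enumerate(documents):
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant (as used by e.g. Lucene).
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scored.append((idx, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical, already tokenised entity descriptions from markup.
descriptions = [
    "brideshead revisited book evelyn waugh chapman hall".split(),
    "put out more flags book evelyn waugh".split(),
    "iphone 6 smartphone apple".split(),
]
ranking = bm25_rank("brideshead revisited".split(), descriptions)
```

Only the top-ranked candidates would then be passed to the more expensive supervised coreference resolution and fusion steps.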
Fusion performance
• Baselines: BM25, CBFS [ESWC2015], PreRecCorr [Pochampally et al., ACM SIGMOD 2014]; strong variance across types
Knowledge Graph Augmentation
• Experiments on books, movies, products
• New facts (w.r.t. DBpedia, Wikidata, Freebase):
o On average 60% - 70% of all facts for books & movies new (across KBs)
o 100% new facts for long-tail entities (e.g. products)
• Additional experiments on learning new categorical features (e.g. product categories or movie genres) [WWW2018]
Beyond facts: claims, opinions and misinformation on the Web
• Investigations into misinformation and opinion forming received massive attention across a wide range of disciplines and industries (e.g. [Vosoughi et al. 2018])
• Insights, mostly (computational) social sciences, e.g.
o Spreading of claims and misinformation
o Effect of biased and fake news on public opinions
o Reinforcement of biases and echo chambers
• Methods, mostly in computer science, e.g. for
o Claim/fact detection and verification ("fake news detection"), e.g. CLEF 2018 Fact Checking Lab (http://alt.qcri.org/clef2018-factcheck/)
o Stance detection, e.g. Fake News Challenge (FNC) http://www.fakenewschallenge.org/
• Some recent work
o Large-scale public research corpora for replicating/improving methods/insights
o TweetsKB: 9 bn annotated tweets
o ClaimsKG: 30 K annotated claims & truth ratings
o ML models for stance detection of Web documents (towards given claims)
Stance detection of Web documents
Motivation
• Problem: detecting stance of documents (Web pages) towards a given claim (unbalanced class distribution)
• Motivation: stance of documents (in particular disagreement) useful (a) as signal for fake news detection and (b) Website classification
Approach
• Cascading binary classifiers: addressing individual issues (e.g. misclassification costs) per step
• Features, e.g. textual similarity (Word2Vec etc.), sentiments, LIWC, etc.
• Best-performing models: 1) SVM with class-wise penalty, 2) CNN, 3) SVM with class-wise penalty
• Experiments on FNC-1 dataset (and FNC baselines)
Results
• Minor overall performance improvement
• Improvement on disagree class by 27% (but still far from robust)
A. Roy, A. Ekbal, S. Dietze, P. Fafalios, Step-by-Step: A three-stage Pipeline for Stance Classification of Documents towards Claims, CIKM19 under review.
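The control flow of cascading binary classifiers can be sketched as follows, using the four FNC-1 labels (unrelated, discuss, agree, disagree). The order of the three stages is an assumption, and the keyword-based stage functions are hypothetical placeholders for the trained SVM/CNN models named on the slide.

```python
def cascade_stance(claim, document, is_related, takes_stance, agrees):
    """Three binary decisions applied in sequence (assumed stage order)."""
    if not is_related(claim, document):
        return "unrelated"
    if not takes_stance(claim, document):
        return "discuss"
    return "agree" if agrees(claim, document) else "disagree"


# Placeholder stage classifiers; real stages would be trained models.
NEGATIVE_CUES = {"false", "hoax", "denied", "debunked"}
STANCE_CUES = NEGATIVE_CUES | {"confirmed", "true"}


def toy_is_related(claim, document):
    shared = set(claim.lower().split()) & set(document.lower().split())
    return len(shared) >= 2


def toy_takes_stance(claim, document):
    return bool(STANCE_CUES & set(document.lower().split()))


def toy_agrees(claim, document):
    return not (NEGATIVE_CUES & set(document.lower().split()))


def classify(claim, document):
    return cascade_stance(claim, document, toy_is_related, toy_takes_stance, toy_agrees)


# Hypothetical claim and documents.
claim = "the bridge is closed"
labels = [
    classify(claim, "officials confirmed the bridge is closed"),
    classify(claim, "the claim that the bridge is closed is a hoax"),
    classify(claim, "the bridge is closed according to some reports"),
    classify(claim, "stock markets rallied today"),
]
```

Splitting the problem this way lets each stage use its own model, features and misclassification costs, which is useful when a class such as disagree is rare.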
Mining opinions & interactions (the case of Twitter)
• Heterogeneity: multimodal, multilingual, informal, "noisy" language
• Context dependence: interpretation of tweets/posts (entities, sentiments) requires consideration of context (e.g. time, linked content), e.g. "Dusseldorf" => city or football team
• Dynamics & scale: e.g. 6000 tweets per second, plus interactions (retweets etc.) and context (e.g. 25% of tweets contain URLs)
• Evolution and temporal aspects: evolution of interactions over time crucial for many social sciences questions
• Representativity and bias: demographic distributions not known a priori in archived data collections
[Figure: example tweet annotations with the entities http://dbpedia.org/resource/Tim_Berners-Lee and http://dbpedia.org/resource/Solid, the emotions wna:positive-emotion and wna:negative-emotion, and onyx:hasEmotionIntensity values "0.75" and "0.0"]
P. Fafalios, V. Iosifidis, E. Ntoutsi, and S. Dietze, TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets, ESWC'18.
Mining knowledge about opinions & interactions: TweetsKB
http://l3s.de/tweetsKB
• Harvesting & archiving of 9 bn tweets over 5 years (permanent collection from Twitter 1% sample since 2013)
• Information extraction pipeline (distributed via Hadoop Map/Reduce)
o Entity linking with knowledge graph/DBpedia (Yahoo's FEL [Blanco et al. 2015]), e.g. "president"/"potus"/"trump" => dbp:DonaldTrump, to disambiguate text and use background knowledge (e.g. US politicians? Republicans?); high precision (.85), low recall (.39)
o Sentiment analysis/annotation using SentiStrength [Thelwall et al., 2012], F1 approx. .80
o Extraction of metadata and lifting into established schemas (SIOC, schema.org), publication using W3C standards (RDF/SPARQL)
Use cases
• Aggregating sentiments towards topics/entities, e.g. about CDU vs SPD politicians in a particular time period
• Temporal analytics: evolution of popularity of entities/topics over time (e.g. for detecting events or trends, such as rise of populist parties)
• Twitter archives as general corpus for understanding temporal entity relatedness (e.g. "austerity" & "Greece" 2010-2015)
Limitations
• Bias & representativity: demographic distributions of users (not known a priori and not representative)
• Cf. use case at the end of the talk
[Figure: sentiment chart for Cologne and Düsseldorf, y-axis from -0.4 to 0.4]
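The temporal analytics use case (popularity of an entity over time) boils down to counting entity mentions per time bucket. The sketch below does this per month over a list of annotated tweets; the record layout (`created`, `entities`) is a hypothetical simplification and not the TweetsKB RDF schema.

```python
from collections import Counter


def monthly_popularity(annotated_tweets, entity):
    """Count tweets mentioning `entity` per month ("YYYY-MM")."""
    counts = Counter(
        tweet["created"][:7]  # ISO date string, e.g. "2016-01-03"
        for tweet in annotated_tweets
        if entity in tweet["entities"]
    )
    return dict(sorted(counts.items()))


# Hypothetical annotated tweets after entity linking.
tweets = [
    {"created": "2016-01-03", "entities": {"dbr:Cologne"}},
    {"created": "2016-01-20", "entities": {"dbr:Cologne", "dbr:Dusseldorf"}},
    {"created": "2016-02-11", "entities": {"dbr:Cologne"}},
    {"created": "2016-02-12", "entities": {"dbr:Greece"}},
]
cologne_by_month = monthly_popularity(tweets, "dbr:Cologne")
```

Sudden jumps in such a series are one possible signal for events or trends. On the full corpus the same aggregation would more likely be expressed as a SPARQL query over the published RDF.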
Overview
Part I
Symbolic & subsymbolic AI on the Web – a brief introduction
Part II
Extracting machine-interpretable knowledge ("making machines smarter")
Part III
Facilitating search, retrieval & knowledge gain of users ("making humans smarter")
Knowledge (gain) while searching the Web (“Search As Learning”)?
Challenges & results
• Detecting coherent search missions?
• Detecting learning throughout search? Detecting "informational" search missions (as opposed to "transactional" or "navigational" missions [Broder, 2002])
o Search mission classification with average F1 score 75%
• How competent is the user? Predict/understand knowledge state of users based on in-session behavior/interactions
• How well does a user achieve his/her learning goal/information need? Predict knowledge gain throughout search missions
o Correlation of user behavior (queries, browsing, mouse traces, etc.) & user knowledge gain/state in search [CHIIR18]
o Prediction of knowledge gain/state through supervised models [SIGIR18]
Understanding knowledge gain/state of user during search?
Data collection
• Crowdsourced collection of search session data
• 10 search topics (e.g. "Altitude sickness", "Tornados"), incl. pre- and post-tests
• Approx. 1000 distinct crowd workers & 100 sessions per topic
• Tracking of user behavior through 76 features in 5 categories (session, query, SERP (search engine result page), browsing, mouse traces)
Some results
• 70% of users exhibited a knowledge gain (KG)
• Negative relationship between KG of users and topic popularity (avg. accuracy of workers in knowledge tests) (R = -.87)
• Amount of time users actively spent on web pages explains 7% of the variance in their KG
• Query complexity explains 25% of the variance in the KG of users
• Topic-dependent behavior: search behavior correlates more strongly with search topic than with KG/KS
Gadiraju, U., Yu, R., Dietze, S., Holtz, P., Analyzing Knowledge Gain of Users in Informational Search Sessions on the Web. ACM CHIIR 2018.
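The results above are stated as a correlation coefficient and as shares of explained variance. As a reminder of how the two relate, here is a small sketch computing Pearson's r and its square on made-up numbers; the data points are hypothetical and do not come from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: topic popularity vs. average knowledge gain.
popularity = [1, 2, 3, 4, 5]
knowledge_gain = [5, 4, 4, 2, 1]
r = pearson_r(popularity, knowledge_gain)
explained_variance = r ** 2  # share of variance explained by a linear fit
```

For a simple linear relationship the explained variance equals r squared, so a correlation of -.87 would correspond to roughly 76% of the variance. Whether the 7% and 25% figures on the slide were obtained this way is not stated.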
Predicting knowledge gain/state of user during search?
• Stratification into classes: user knowledge state (KS) and knowledge gain (KG) into {low, moderate, high} using (low < (mean ± 0.5 SD) < high)
• Supervised multiclass classification (Naive Bayes, logistic regression, SVM, random forest, multilayer perceptron)
• KG prediction performance results (after 10-fold cross-validation)
• Feature importance (KG prediction)
Yu, R., Gadiraju, U., Holtz, P., Rokicki, M., Kemkes, P., Dietze, S., Predicting User Knowledge Gain in Informational Search Sessions. ACM SIGIR 2018.
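The stratification rule (low < mean ± 0.5 SD < high) can be written down in a few lines. The sketch below assumes the population standard deviation and strict inequalities at the boundaries; the slide does not specify either, and the scores are made up.

```python
import statistics


def stratify(scores):
    """Label each score as low / moderate / high around mean ± 0.5 SD."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # ASSUMPTION: population SD
    lower, upper = mean - 0.5 * sd, mean + 0.5 * sd
    labels = []
    for score in scores:
        if score < lower:
            labels.append("low")
        elif score > upper:
            labels.append("high")
        else:
            labels.append("moderate")
    return labels


# Hypothetical knowledge gain scores (post-test minus pre-test).
gains = [0, 1, 2, 3, 4]
classes = stratify(gains)
```

The resulting class labels are what the supervised multiclass classifiers would be trained to predict from the behavioral features.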
Shortcomings & future work
• Lab studies to obtain more reliable data (controlled environment, longer sessions) & additional features (eye-tracking)
• Resource features (complexity, analytic/emotional language, multimodality etc.) as additional signals [CIKM2019, under review]
• Improving ranking/retrieval in Web search or other archives (SALIENT project, Leibniz Cooperative Excellence)
Applications: social sciences research data on the Web
Improving findability of (social science) research data
Mining novel (social science) research data from the Web
http://l3s.de/tweetsKB
https://data.gesis.org/claimskg
Finally: can we use AI & the Web to answer THE question?
Cologne vs Dusseldorf: a pseudoscientific "answer" using TweetsKB
Recap: "Web-mined opinions" in TweetsKB (http://l3s.de/tweetsKB)
[Figure: example tweet annotations with the entities http://dbpedia.org/resource/Tim_Berners-Lee and http://dbpedia.org/resource/Solid, the emotions wna:positive-emotion and wna:negative-emotion, and onyx:hasEmotionIntensity values "0.75" and "0.0"]
Total # tweets mentioning (K, D) in 1.5 bn tweets:
• # dbp:Cologne: 89,564
• # dbp:Dusseldorf: 4,723
• Opinions in terms of expressed sentiments?
• "Happiness(X) = mean of sentiment score delta (positive - negative) of all tweets mentioning X"
Mean sentiment scores (2013-2017):
• Happiness(Cologne) = 0.09281
• Happiness(Dusseldorf) = 0.04056
• Positive(Cologne) = 0.17297
• Positive(Dusseldorf) = 0.1245
• Negative(Cologne) = 0.07948
• Negative(Dusseldorf) = 0.09030
[Figure: Happiness(dbp:Cologne) and Happiness(dbp:Dusseldorf), y-axis from -0.4 to 0.4; annotated events: January 2016, Cologne NYE 2015/2016 aftermath; March 2017, axe attack in D?]
Key findings
• Cologne happier (no significance testing yet)
• Cologne & Dusseldorf happy overall (positive sentiments)
Limitations
• Bias: Twitter users not representative
• Bias: Cologne cathedral => distribution of tourists & residents among Twitter users likely different for both cities
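The Happiness(X) definition on this slide translates directly into code. The sketch below computes it over a handful of made-up tweets; the record layout and the sentiment values are hypothetical and unrelated to the reported TweetsKB numbers.

```python
def happiness(annotated_tweets, entity):
    """Mean of (positive - negative) sentiment over tweets mentioning entity."""
    deltas = [
        tweet["positive"] - tweet["negative"]
        for tweet in annotated_tweets
        if entity in tweet["entities"]
    ]
    return sum(deltas) / len(deltas) if deltas else None


# Hypothetical annotated tweets with sentiment scores in [0, 1].
tweets = [
    {"entities": {"dbp:Cologne"}, "positive": 0.5, "negative": 0.25},
    {"entities": {"dbp:Cologne"}, "positive": 0.25, "negative": 0.25},
    {"entities": {"dbp:Dusseldorf"}, "positive": 0.25, "negative": 0.0},
    {"entities": {"dbp:Dusseldorf"}, "positive": 0.0, "negative": 0.25},
]
happiness_cologne = happiness(tweets, "dbp:Cologne")
happiness_dusseldorf = happiness(tweets, "dbp:Dusseldorf")
```

As the slide itself notes, a difference between two such means says little without significance testing and without accounting for who is tweeting about each city.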
Source: https://theculturetrip.com/europe/germany/articles/8-fascinating-things-didnt-know-colognes-cathedral/ © freedom100m
Acknowledgements
Co-authors
• Katarina Boland (GESIS, Germany)
• Elena Demidova (L3S, Germany)
• Asif Ekbal (IIT Patna, India)
• Pavlos Fafalios (L3S, Germany)
• Ujwal Gadiraju (L3S, Germany)
• Peter Holtz (IWM, Germany)
• Eirini Ntoutsi (LUH, Germany)
• Vasilis Iosifidis (L3S, Germany)
• Markus Rokicki (L3S, Germany)
• Arjun Roy (IIT Patna, India)
• Renato Stoffalette Joao (L3S, Germany)
• Davide Taibi (CNR, ITD, Italy)
• Nicolas Tempelmeier (L3S, Germany)
• Konstantin Todorov (LIRMM, France)
• Ran Yu (GESIS, Germany)
• Benjamin Zapilko (GESIS, Germany)
From (Web) Data to Knowledge: on the Complementarity
of Human and Artificial Intelligence
Prof. Dr. Stefan Dietze
Heinrich-Heine-Universität Düsseldorf
GESIS Leibniz Institute for the Social Sciences

Ralph schroeder and eric meyer
 

From Web Data to Knowledge: on the Complementarity of Human and Artificial Intelligence

  • 1. From (Web) Data to Knowledge: on the Complementarity of Human and Artificial Intelligence. Prof. Dr. Stefan Dietze, Inaugural Lecture, 28 May 2019, Heinrich-Heine-Universität Düsseldorf. Slides dated 29/05/19.
  • 2. Finding “things” on the Web • Resources • Facts • Claims • Opinions
  • 3. Finding “things” on the Web • Resources • Facts • Claims • Opinions
  • 4. Finding “things” on the Web • Resources • Facts • Claims • Opinions
  • 5. Finding “things” on the Web • Resources • Facts • Claims • Opinions. We'll try to use AI to “answer” that question at the end of the talk.
  • 6. Finding social sciences research data on the Web
  • 7. Artificial vs human intelligence: a simplistic Web search perspective. Human/Crowd Intelligence and Artificial Intelligence: “Supervising AI” with user-generated data & knowledge (“making machines smarter”); facilitating search, retrieval & knowledge gain of users (“making humans smarter”). Artificial Intelligence: • Information retrieval (crawling, indexing, ranking etc.) • Natural language processing • (Hyperlink) graph analysis (e.g. PageRank et al.) • Statistics and (deep) learning from user interactions o Query interpretation & intent prediction o Classification of users, documents, queries o Reranking & personalisation o …
  • 8. Overview. Part I: Symbolic & subsymbolic AI on the Web – a brief introduction. Part II: Extracting machine-interpretable knowledge (“making machines smarter”). Part III: Facilitating search, retrieval & knowledge gain of users (“making humans smarter”).
  • 9. Symbols, data & knowledge on the Web. Unstructured data: e.g. web pages, user interactions/behavior, clickstreams, sensor data. Machine-interpretable knowledge: e.g. knowledge graphs, Web markup. Example graph nodes: dbr:Tim_Berners-Lee, dbo:Person, “Tim Berners-Lee”@en, 1955-06-08^^xsd:date, dbr:MIT, dbr:Washington_DC, dbr:WWW_Foundation, dbr:Jakarta, dbo:Organisation, yago:LegalActor, dbo:Scientist. Edge labels: dbo:keyPersonOf, rdf:type, rdfs:subClassOf, foaf:name, dbo:birthDate, dbo:workplaces, dbo:location. Scale: DBpedia (English): 200 million facts; Google KG: 18 billion facts.
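The graph on slide 9 can be read as a set of subject-predicate-object triples. The following is a minimal sketch, assuming an in-memory list of triples and a hypothetical `match` helper; the pairing of nodes and edge labels is inferred from the slide labels and is not stated in the transcript.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Illustrative triples only: identifiers come from the slide, but which edge
# connects which nodes is an assumption made for this sketch.
TRIPLES: List[Triple] = [
    ("dbr:Tim_Berners-Lee", "rdf:type", "dbo:Person"),
    ("dbr:Tim_Berners-Lee", "foaf:name", '"Tim Berners-Lee"@en'),
    ("dbr:Tim_Berners-Lee", "dbo:birthDate", '"1955-06-08"^^xsd:date'),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Usage: look up the type assertion for an entity.
types = match(TRIPLES, s="dbr:Tim_Berners-Lee", p="rdf:type")
```

A real deployment would use an RDF store and SPARQL rather than list scans; the sketch only illustrates the data model.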
  • 10. Symbolic vs subsymbolic AI. Symbolic AI • AI = manipulation and interpretation of symbols (eventually: “knowledge”) • Top-down: knowledge representation, logics, inference, knowledge graphs • “Strong AI hypothesis” or “Physical Symbol System Hypothesis” (Newell & Simon, 1976), “GOFAI”. Subsymbolic AI • AI = emulating/engineering human intelligence, e.g. through cognitive computing (“perceptron”, Frank Rosenblatt 1957) • Bottom-up: neural networks, machine/deep learning, distributional semantics • Also called: “weak AI hypothesis” (Russell & Norvig, 1995). Layers: Knowledge, Information, Data, Symbols. Example axiom: Horse ⊓ ¬RockingHorse ⊑ Animal ⊓ ∀(=4)hasLegs. “Intelligence is ten million rules” (Douglas Lenat, founder of Cyc).
  • 11. Subsymbolic AI & deep learning for language understanding • Distributional semantics & embeddings: predicting low-dimensional vector representations of words & text, e.g. Word2Vec [Mikolov et al., 2013] • Efficient encoder/decoder architectures (RNN/CNN, attention-based Transformers), e.g. for machine translation [Vaswani et al., 2017] • Pretraining language models for task-specific transfer learning, e.g. BERT, Bidirectional Encoder Representations from Transformers [Devlin et al., 2018]. Figure: percentage of deep learning papers in major NLP conferences (Source: Young et al., Recent Trends in Deep Learning Based Natural Language Processing). References: T. Mikolov et al., Distributed Representations of Words and Phrases and their Compositionality, NIPS (2013); J. Devlin et al., BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018); A. Vaswani et al., Attention Is All You Need, NIPS (2017).
  • 12. Learning without semantics • Biases in human interactions can be learned and amplified by ML models • Meaning/semantics are crucial to facilitate interpretation by/of machines & ML models. Example (Tay chatbot): [N-word]. Source: https://techcrunch.com/2016/03/24/microsoft-silences-its-new-a-i-bot-tay-after-twitter-users-teach-it-racism/
  • 13. Semantics and knowledge: a brief (and incomplete) history • Deductive reasoning, syllogism & categorisation (Aristotle, 384 BC – 322 BC) • Formal logic & calculus ratiocinator (reasoning, symbol manipulation) (G.W. Leibniz, 1646 – 1716) • “Begriffsschrift”, technically: predicate logic (Gottlob Frege, 1848 – 1925) • Frames for representing stereotyped situations (Marvin Minsky, 1974) • Rules & expert systems • Ontologies (Leibniz, Kant, Gruber 1994) • Description Logics (Baader & Hollunder, 1991 et al.) • Semantic Web (Berners-Lee, Hendler, Lassila, 2001) & Linked Data & Knowledge Graphs
  • 14. Symbolic & subsymbolic AI: e.g. linking Web documents & KGs (example entity: dbr:Tim_Berners-Lee) • Robust methods for named entity disambiguation (NED), e.g. Ambiverse [Hoffart et al., 2011], TagMe [Ferragina et al., 2010], Babelfy [Moro et al., 2014] • Time- and corpus-specific entity relatedness; prior probabilities and meaning of entities change over time, e.g. “Deutschland” during the World Cup [DL4KGS 2018] • Meta-EL: supervised ensemble learner exploiting results of different NED systems [SAC19, CIKM19] o Considers features of terms, mentions/occurrences, dynamics/temporal drift etc. o Outperforms individual NED systems across diverse documents/corpora • Problem: “completeness” & coverage of KGs? References: Fafalios, P., Joao, R.S., Dietze, S., Same but Different: Distant Supervision for Predicting and Understanding Entity Linking Difficulty, ACM SAC19; Mohapatra, N., Iosifidis, V., Ekbal, A., Dietze, S., Fafalios, P., Time-Aware and Corpus-Specific Entity Relatedness, DL4KGS at ESWC2018.
  • 15. Overview. Part I: Symbolic & subsymbolic AI on the Web – a brief introduction. Part II: Extracting machine-interpretable knowledge (“making machines smarter”). Part III: Facilitating search, retrieval & knowledge gain of users (“making humans smarter”).
  • 16. Knowledge about: facts, claims, stances & opinions on the Web • Facts & claims • Stances, opinions, interactions • Example: <“Tim Berners-Lee” s:founderOf “Solid”>
  • 17. Mining (long-tail) facts from the Web? Example: <“Tim Berners-Lee” s:founderOf “Solid”> • Obtaining verified facts (or a knowledge graph) for a given entity? • Application of NLP (e.g. NER, relation extraction) at Web scale (Google index: 50 trillion pages)? • Exploiting entity-centric embedded Web page markup (schema.org), prevalent in roughly 40% of Web pages (44 bn “facts” in Common Crawl 2016 / 3.2 bn Web pages) • Challenges o Errors: factual errors, annotation errors (see also [Meusel et al., ESWC2015]) o Ambiguity & coreferences: e.g. 18,000 entity descriptions of “iPhone 6” in Common Crawl 2016 & ambiguous literals (e.g. “Apple”) o Redundancies & conflicts: vast amounts of equivalent or conflicting statements
  • 18. KnowMore: data fusion on markup. Pipeline • 0. Noise: data cleansing (node URIs, deduplication etc.) • 1.a) Scale: blocking (BM25 entity retrieval) on markup index • 1.b) Relevance: supervised coreference resolution • 2.) Quality & redundancy: data fusion through supervised fact classification (SVM, kNN, RF, LR, NB) with a diverse feature set (authority, relevance etc.), considering source-level (e.g. PageRank), entity-level & fact-level features. Example: Web page markup from a Web crawl (Common Crawl, 44 bn facts) → 1. Blocking & coreference resolution → 2. Fusion / fact selection (supervised). Query entity: Brideshead Revisited, type:(Book). Approx. 5,000 facts for “Brideshead Revisited” (compare: 125,000 facts for “iPhone6”). Candidate facts: node1 publisher Chapman & Hall; node1 releaseDate 1945; node1 publishDate 1961; node2 country UK; node2 publisher Black Bay Books; node3 country US; node3 copyrightHolder Evelyn Waugh; … Entity description (20 correct/non-redundant facts for “Brideshead Rev.”): author Evelyn Waugh; priorWork Put Out More Flags; ISBN 978031874803074; copyrightHolder Evelyn Waugh; releaseDate 1945; … New query entities: BBC Audio, type:(Organization); Chapman & Hall, type:(Publisher); Put Out More Flags, type:(Book). Fusion performance • Baselines: BM25, CBFS [ESWC2015], PrecRecCorr [Pochampally et al., ACM SIGMOD 2014]; strong variance across types. Knowledge graph augmentation • Experiments on books, movies, products • New facts (wrt DBpedia, Wikidata, Freebase): o On average 60% - 70% of all facts for books & movies new (across KBs) o 100% new facts for long-tail entities (e.g. products) • Additional experiments on learning new categorical features (e.g. product categories or movie genres) [WWW2018]. References: Yu, R., [..], Dietze, S., KnowMore: Knowledge Base Augmentation with Structured Web Markup, Semantic Web Journal 2019 (SWJ2019); Tempelmeier, N., Demidova, E., Dietze, S., Inferring Missing Categorical Information in Noisy and Sparse Web Markup, The Web Conf. 2018 (WWW2018).
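The blocking step (1.a) on slide 18 narrows a large markup index down to a few candidate entity descriptions before the more expensive coreference and fusion steps. The sketch below illustrates the idea with a plain Okapi BM25 scorer; it is not the KnowMore implementation. The function names, the whitespace tokenizer, the parameter defaults `k1` and `b`, and the three sample descriptions are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Naive whitespace tokenizer (assumption; real systems normalise more)."""
    return text.lower().split()


def bm25_rank(query: str, docs: List[str],
              k1: float = 1.5, b: float = 0.75) -> List[Tuple[int, float]]:
    """Score every document against the query with Okapi BM25.

    Returns (document index, score) pairs sorted by descending score.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n if n else 0.0
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for i, d in enumerate(tokenized):
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def block(query: str, docs: List[str], top_k: int = 2) -> List[int]:
    """Blocking: keep at most top_k candidates that share a term with the query."""
    return [i for i, s in bm25_rank(query, docs)[:top_k] if s > 0]


# Hypothetical entity descriptions flattened to text.
DESCRIPTIONS = [
    "Brideshead Revisited Evelyn Waugh Chapman & Hall",
    "Put Out More Flags Evelyn Waugh",
    "iPhone 6 Apple smartphone",
]
candidates = block("Brideshead Revisited", DESCRIPTIONS)
```

Only the surviving candidates would then be passed on to coreference resolution and fact classification, which keeps the supervised steps tractable at crawl scale.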
  • 19. Beyond facts: claims, opinions and misinformation on the Web • Investigations into misinformation and opinion forming have received massive attention across a wide range of disciplines and industries (e.g. [Vosoughi et al., 2018]) • Insights, mostly from (computational) social sciences, e.g. o Spreading of claims and misinformation o Effect of biased and fake news on public opinions o Reinforcement of biases and echo chambers • Methods, mostly in computer science, e.g. for o Claim/fact detection and verification (“fake news detection”), e.g. CLEF 2018 Fact Checking Lab (http://alt.qcri.org/clef2018-factcheck/) o Stance detection, e.g. Fake News Challenge (FNC) http://www.fakenewschallenge.org/ • Some recent work o Large-scale public research corpora for replicating/improving methods/insights o TweetsKB: 9 bn annotated tweets o ClaimsKG: 30 K annotated claims & truth ratings o ML models for stance detection of Web documents (towards given claims)
  • 20. Stance detection of Web documents. Motivation. Problem: detecting the stance of documents (Web pages) towards a given claim (unbalanced class distribution). The stance of documents (in particular disagreement) is useful (a) as a signal for fake news detection and (b) for Website classification. Approach. Cascading binary classifiers: addressing individual issues (e.g. misclassification costs) per step. Features, e.g. textual similarity (Word2Vec etc.), sentiments, LIWC, etc. Best-performing models: 1) SVM with class-wise penalty, 2) CNN, 3) SVM with class-wise penalty. Experiments on the FNC-1 dataset (and FNC baselines). Results. Minor overall performance improvement. Improvement on the disagree class by 27% (but still far from robust). Reference: A. Roy, A. Ekbal, S. Dietze, P. Fafalios, Step-by-Step: A three-stage Pipeline for Stance Classification of Documents towards Claims, CIKM19, under review.
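A cascade of binary classifiers turns one unbalanced multi-class problem into a sequence of yes/no decisions, so each step can be tuned separately (for instance with its own misclassification costs). The sketch below shows only the control flow. The stage order (unrelated, then discuss, then agree vs. disagree, following the four FNC-1 labels) is an assumption, and the keyword rules are hypothetical stand-ins for the trained SVM and CNN models named on the slide.

```python
# Hypothetical cue words; trained models would replace these rules.
AGREE_CUES = {"confirm", "confirms", "true"}
DISAGREE_CUES = {"false", "hoax", "deny", "denies", "debunked"}


def words(text):
    """Lowercased set of whitespace-separated tokens."""
    return set(text.lower().split())


def is_unrelated(claim, document):
    """Stage 1 stand-in: no shared vocabulary means the document is unrelated."""
    return not (words(claim) & words(document))


def is_discuss(claim, document):
    """Stage 2 stand-in: related but without any stance cue."""
    return not (words(document) & (AGREE_CUES | DISAGREE_CUES))


def is_disagree(claim, document):
    """Stage 3 stand-in: separates disagree from agree."""
    return bool(words(document) & DISAGREE_CUES)


# Each stage is (binary classifier, label emitted when it fires).
STAGES = [
    (is_unrelated, "unrelated"),
    (is_discuss, "discuss"),
    (is_disagree, "disagree"),
]


def cascade_predict(claim, document, stages=STAGES, default_label="agree"):
    """Run the stages in order; the first one that fires decides the label."""
    for fires, label in stages:
        if fires(claim, document):
            return label
    return default_label


claim = "the moon is made of cheese"
label = cascade_predict(claim, "experts say the moon cheese story is a hoax")
```

Because the rare disagree class is only decided in the last stage, after the dominant unrelated class has been filtered out, that stage sees a far less skewed distribution than a single four-way classifier would.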
  • 21. Mining opinions & interactions (the case of Twitter). Heterogeneity: multimodal, multilingual, informal, "noisy" language. Context dependence: interpretation of tweets/posts (entities, sentiments) requires consideration of context (e.g. time, linked content), e.g. "Dusseldorf" => city or football team. Dynamics & scale: e.g. 6000 tweets per second, plus interactions (retweets etc.) and context (e.g. 25% of tweets contain URLs). Evolution and temporal aspects: evolution of interactions over time is crucial for many social sciences questions. Representativity and bias: demographic distributions are not known a priori in archived data collections. [Figure: example RDF annotation of a tweet with the entities http://dbpedia.org/resource/Tim_Berners-Lee and http://dbpedia.org/resource/Solid, the emotions wna:positive-emotion and wna:negative-emotion, and onyx:hasEmotionIntensity values "0.75" and "0.0".] Reference: P. Fafalios, V. Iosifidis, E. Ntoutsi, and S. Dietze, TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets, ESWC'18.
  • 22. Mining knowledge about opinions & interactions: TweetsKB (http://l3s.de/tweetsKB). Harvesting & archiving of 9 bn tweets over 5 years (permanent collection from the Twitter 1% sample since 2013). Information extraction pipeline (distributed via Hadoop Map/Reduce): entity linking with knowledge graph/DBpedia (Yahoo's FEL [Blanco et al. 2015]) ("president"/"potus"/"trump" => dbp:DonaldTrump), to disambiguate text and use background knowledge (e.g. US politicians? Republicans?), high precision (.85), low recall (.39); sentiment analysis/annotation using SentiStrength [Thelwall et al., 2012], F1 approx. .80; extraction of metadata and lifting into established schemas (SIOC, schema.org), publication using W3C standards (RDF/SPARQL). Use cases: aggregating sentiments towards topics/entities, e.g. about CDU vs SPD politicians in a particular time period; temporal analytics: evolution of popularity of entities/topics over time (e.g. for detecting events or trends, such as the rise of populist parties); Twitter archives as a general corpus for understanding temporal entity relatedness (e.g. "austerity" & "Greece" 2010-2015). Limitations: bias & representativity: demographic distributions of users (not known a priori and not representative). Cf. use case at the end of the talk. [Figure: sentiment chart for Cologne and Düsseldorf, y-axis from -0.4 to 0.4.] Reference: P. Fafalios, V. Iosifidis, E. Ntoutsi, and S. Dietze, TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets, ESWC'18.
  • 23. Overview. Part I: Symbolic & subsymbolic AI on the Web – a brief introduction. Part II: Extracting machine-interpretable knowledge ("making machines smarter"). Part III: Facilitating search, retrieval & knowledge gain of users ("making humans smarter").
  • 24. Knowledge (gain) while searching the Web ("Search As Learning")? Challenges & results. Detecting coherent search missions? Detecting learning throughout search? Detecting "informational" search missions (as opposed to "transactional" or "navigational" missions [Broder, 2002]): search mission classification with an average F1 score of 75%. How competent is the user? Predict/understand the knowledge state of users based on in-session behavior/interactions. How well does a user achieve his/her learning goal/information need? Predict knowledge gain throughout search missions: correlation of user behavior (queries, browsing, mouse traces, etc.) & user knowledge gain/state in search [CHIIR18]; prediction of knowledge gain/state through supervised models [SIGIR18].
  • 25. Understanding knowledge gain/state of user during search? Data collection. Crowdsourced collection of search session data. 10 search topics (e.g. "Altitude sickness", "Tornados"), incl. pre- and post-tests. Approx. 1000 distinct crowd workers & 100 sessions per topic. Tracking of user behavior through 76 features in 5 categories (session, query, SERP (search engine result page), browsing, mouse traces). Some results. 70% of users exhibited a knowledge gain (KG). Negative relationship between the KG of users and topic popularity (avg. accuracy of workers in knowledge tests) (R = -.87). The amount of time users actively spent on web pages explains 7% of the variance in their KG. Query complexity explains 25% of the variance in the KG of users. Topic-dependent behavior: search behavior correlates more strongly with the search topic than with KG/KS. Reference: Gadiraju, U., Yu, R., Dietze, S., Holtz, P., Analyzing Knowledge Gain of Users in Informational Search Sessions on the Web. ACM CHIIR 2018.
  • 26. Predicting knowledge gain/state of user during search? Stratification into classes: user knowledge state (KS) and knowledge gain (KG) are each divided into {low, moderate, high}, where low is below mean - 0.5 SD, high is above mean + 0.5 SD, and moderate lies in between. Supervised multiclass classification (Naive Bayes, logistic regression, SVM, random forest, multilayer perceptron). KG prediction performance results (after 10-fold cross-validation). Feature importance (KG prediction). Reference: Yu, R., Gadiraju, U., Holtz, P., Rokicki, M., Kemkes, P., Dietze, S., Analyzing Knowledge Gain of Users in Informational Search Sessions on the Web. ACM SIGIR 2018.
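The stratification rule on this slide can be made concrete in a few lines. This is a sketch under stated assumptions rather than the authors' code: the function name `stratify`, the use of the sample standard deviation, and the handling of scores that fall exactly on a boundary (treated as moderate) are illustrative choices the slide does not specify.

```python
from statistics import mean, stdev


def stratify(scores, width=0.5):
    """Map each score to 'low', 'moderate' or 'high' using mean +/- width * SD.

    Assumes at least two scores, since the sample SD is undefined otherwise.
    """
    centre = mean(scores)
    spread = stdev(scores)  # sample SD; whether the study used this is an assumption
    lower = centre - width * spread
    upper = centre + width * spread
    labels = []
    for score in scores:
        if score < lower:
            labels.append("low")
        elif score > upper:
            labels.append("high")
        else:
            labels.append("moderate")
    return labels


# Toy knowledge gain scores: mean 5, SD 5, so the moderate band is [2.5, 7.5].
classes = stratify([0, 5, 10])
```

The resulting labels are what a multiclass classifier would be trained to predict from the behavioral features; a narrower `width` moves more users into the low and high classes.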
  • 27. Predicting knowledge gain/state of user during search? (continued) Shortcomings & future work. Lab studies to obtain more reliable data (controlled environment, longer sessions) & additional features (eye-tracking). Resource features (complexity, analytic/emotional language, multimodality, etc.) as additional signals [CIKM2019, under review]. Improving ranking/retrieval in Web search or other archives (SALIENT project, Leibniz Cooperative Excellence).
  • 28. Applications: social sciences research data on the Web. Improving findability of (social science) research data. Mining novel (social science) research data from the Web: http://l3s.de/tweetsKB, https://data.gesis.org/claimskg
  • 29. Finally: can we use AI & the Web to answer THE question?
  • 30. Recap: "Web-mined opinions" in TweetsKB (http://l3s.de/tweetsKB). Total number of tweets mentioning Cologne (K) or Dusseldorf (D) in 1.5 bn tweets: dbp:Cologne: 89,564; dbp:Dusseldorf: 4,723. Opinions in terms of expressed sentiments? "Happiness(X) = mean of the sentiment score delta (positive - negative) of all tweets mentioning X". [Figure: example RDF annotation of a tweet with the entities http://dbpedia.org/resource/Tim_Berners-Lee and http://dbpedia.org/resource/Solid, the emotions wna:positive-emotion and wna:negative-emotion, and onyx:hasEmotionIntensity values "0.75" and "0.0".] Reference: P. Fafalios, V. Iosifidis, E. Ntoutsi, and S. Dietze, TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets, ESWC'18.
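The Happiness(X) definition above amounts to a filtered average. A minimal sketch follows, assuming each tweet has already been reduced to a dictionary with a set of linked entities and separate positive and negative sentiment scores; the field names and the toy tweets are hypothetical and do not reflect the TweetsKB schema or its data.

```python
from statistics import mean


def happiness(tweets, entity):
    """Mean of (positive - negative) over all tweets mentioning `entity`.

    Returns None when no tweet mentions the entity.
    """
    deltas = [
        tweet["positive"] - tweet["negative"]
        for tweet in tweets
        if entity in tweet["entities"]
    ]
    return mean(deltas) if deltas else None


# Hypothetical annotated tweets, for illustration only.
toy_tweets = [
    {"entities": {"dbp:Cologne"}, "positive": 0.75, "negative": 0.25},
    {"entities": {"dbp:Cologne", "dbp:Dusseldorf"}, "positive": 0.5, "negative": 0.5},
    {"entities": {"dbp:Dusseldorf"}, "positive": 0.25, "negative": 0.0},
]
cologne = happiness(toy_tweets, "dbp:Cologne")
dusseldorf = happiness(toy_tweets, "dbp:Dusseldorf")
```

A tweet that mentions both cities contributes to both averages, which is one reason such scores are better read as a rough signal than as a measurement.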
  • 31. Cologne vs Dusseldorf: a pseudoscientific "answer" using TweetsKB. Mean sentiment scores (2013-2017): Happiness(Cologne) = 0.09281; Happiness(Dusseldorf) = 0.04056; Positive(Cologne) = 0.17297; Positive(Dusseldorf) = 0.1245; Negative(Cologne) = 0.07948; Negative(Dusseldorf) = 0.09030. Key findings: Cologne happier (no significance testing yet); Cologne & Dusseldorf happy overall (positive sentiments). Limitations: bias: Twitter users not representative; bias: Cologne cathedral => distribution of tourists & residents among Twitter users likely differs between the two cities. [Figure: Happiness(dbp:Cologne) and Happiness(dbp:Dusseldorf) over time, y-axis from -0.4 to 0.4; annotated events: January 2016, Cologne NYE 2015/2016 aftermath; March 2017, axe attack in D?] Image source: https://theculturetrip.com/europe/germany/articles/8-fascinating-things-didnt-know-colognes-cathedral/ © freedom100m
  • 32. Acknowledgements. Co-authors: Katarina Boland (GESIS, Germany); Elena Demidova (L3S, Germany); Asif Ekbal (IIT Patna, India); Pavlos Fafalios (L3S, Germany); Ujwal Gadiraju (L3S, Germany); Peter Holtz (IWM, Germany); Eirini Ntoutsi (LUH, Germany); Vasilis Iosifidis (L3S, Germany); Markus Rokicki (L3S, Germany); Arjun Roy (IIT Patna, India); Renato Stoffalette Joao (L3S, Germany); Davide Taibi (CNR, ITD, Italy); Nicolas Tempelmeier (L3S, Germany); Konstantin Todorov (LIRMM, France); Ran Yu (GESIS, Germany); Benjamin Zapilko (GESIS, Germany).
  • 33. From (Web) Data to Knowledge: on the Complementarity of Human and Artificial Intelligence Prof. Dr. Stefan Dietze Heinrich-Heine-Universität Düsseldorf GESIS Leibniz Institute for the Social Sciences