3. A motivating story
● Alzheimer’s Context *
○ Dr. Trumble and the Tsimané ** Project ***
■ Anthropologist studying evolutionary medicine
■ Indigenous people, Bolivia
■ Elderly carriers of a copy of the ApoE4 gene show higher cognitive performance
○ Dr. Liddelow studying immune response in brains
■ Some people die without dementia but with brains clogged with Alzheimer’s pathology
● A Quote (emphasis mine):
“I asked Dr. Liddelow whether he was familiar with the Tsimané research. He admitted that he was not — the field of
evolutionary biology is distant from his own. But he said the hypothesis that the ApoE4 gene evolved to protect our brains
from the effects of parasitic infection made perfect sense. “That’s absolutely in line with what we found. For our ancestors,
an ApoE4 gene could have been beneficial,” Dr. Liddelow said, in part because it would have helped the astrocytes go on
the attack.”
* https://www.nytimes.com/2017/07/14/opinion/sunday/alzheimers-cure-south-america.html
** https://en.wikipedia.org/wiki/Tsiman%C3%A9
*** http://www.unm.edu/~tsimane/
4. From documents to augmenting knowledge work
[Diagram: stages Documents, Structured Documents, Basic claim discovery, Entity identification, Augmented Claim Craft; tools CoronaWhy, OpenSherlock, Spacy; the diagram ends with a “?”]
5. Claim representation in HyperKnowledge
Aim: To be able to bring claims together: compare and federate claims,
make claims about claims...
The data model should be rich enough to express claims found in the
literature, and claims about those claims.
7. Basic claim representation
Hydroxychloroquine is used to treat Covid-19
Subject - Predicate - Object (used in RDF)
Each concept (topic) has an identifier (URI) to reduce ambiguity
Subject: Hydroxychloroquine, wiki:Q421094
Predicate: Drug used for treatment, wikip:P2176
Object: Covid-19, wiki:Q84263196
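As a minimal sketch of the subject-predicate-object form, the claim can be written as a plain Python tuple of prefixed identifiers. The `Triple` type, the `expand` helper and the prefix expansions are illustrative assumptions, as is the item number used here for hydroxychloroquine.

```python
from typing import NamedTuple

# Prefix expansions are assumptions made for this sketch.
PREFIXES = {
    "wiki": "http://www.wikidata.org/entity/",
    "wikip": "http://www.wikidata.org/prop/direct/",
}


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


def expand(curie: str) -> str:
    """Turn a prefixed name such as 'wiki:Q84263196' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


claim = Triple(
    subject="wiki:Q421094",    # hydroxychloroquine (assumed item number)
    predicate="wikip:P2176",   # drug used for treatment
    object="wiki:Q84263196",   # Covid-19
)
uris = tuple(expand(term) for term in claim)
```

Using identifiers rather than names means the same claim can be recognised regardless of which label a source uses.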
8. “Citation needed”: qualifying claims
The paper “Hydroxychloroquine and azithromycin as a treatment of COVID‐19: results of an open‐label
non‐randomized clinical trial” describes a study where Hydroxychloroquine was used to treat Covid-19.
9. “Citation needed”: qualifying claims
Give an identity to the claim itself, so we can make further claims about that claim, such as
provenance, authority, etc.
RDF does this through reification; Wikidata gives each statement (built from snaks) its own identifier, to which qualifiers and references attach.
[Diagram: reified claim]
Claim cc00feb7-4b9b-121d-898b-7c6652b2b406
rdf:subject: Hydroxychloroquine, wiki:Q421094
rdf:predicate: Drug used for treatment, wikip:P2176
rdf:object: Covid-19, wiki:Q84263196
Stated in (wikip:P248): Hydroxychloroquine and azithromycin …, DOI:10.1016/J.IJANTIMICAG.2020.105949
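The following sketch shows one way to give a claim its own identity and then state provenance about it. It is not the HyperKnowledge data model: the `Claim` class and `reify` helper are hypothetical, and the hydroxychloroquine item number is an assumption.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Claim:
    subject: str
    predicate: str
    object: str
    # Each claim gets its own identifier so other claims can point at it.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


base = Claim(
    "wiki:Q421094",            # hydroxychloroquine (assumed item number)
    "wikip:P2176",             # drug used for treatment
    "wiki:Q84263196",          # Covid-19
    id="cc00feb7-4b9b-121d-898b-7c6652b2b406",
)
# A claim about the claim: where it was stated.
provenance = Claim(base.id, "wikip:P248", "DOI:10.1016/J.IJANTIMICAG.2020.105949")


def reify(claim):
    """Expand one claim into RDF-style reification triples."""
    return [
        (claim.id, "rdf:type", "rdf:Statement"),
        (claim.id, "rdf:subject", claim.subject),
        (claim.id, "rdf:predicate", claim.predicate),
        (claim.id, "rdf:object", claim.object),
    ]


triples = reify(base) + [
    (provenance.subject, provenance.predicate, provenance.object)
]
```

Because `provenance` is itself a `Claim` with an identifier, further claims (authority, disagreement) can be layered on top in the same way.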
10. Complex claims
The experimental protocol involved oral absorption of 200 mg of hydroxychloroquine three times a
day for 10 days, for 20 patients whose average age was 51 years (σ=19)
Many claims involve many entities in complex relationships, and should be represented as such.
11. Complex claims
Topic mapping, frames (Minsky), KIF
[Diagram: Medical Protocol frame]
substance: Hydroxychloroquine, wiki:Q421094
disease: Covid-19, wiki:Q84263196
amount: 200 mg
frequency: 3x/day
duration: 10 days
Group 1, Group 2: a study group and a control group
study group size: 20 patients
study group age: μ=51, σ=19
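A frame-style representation of this protocol can be sketched with Python dataclasses, one slot per role in the claim. The class and field names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Group:
    role: str                        # "study" or "control"
    size: Optional[int] = None
    mean_age: Optional[float] = None
    age_sd: Optional[float] = None


@dataclass
class MedicalProtocol:
    substance: str
    disease: str
    dose_mg: int
    doses_per_day: int
    duration_days: int
    groups: List[Group]

    def total_dose_mg(self) -> int:
        """Cumulative dose per patient over the whole protocol."""
        return self.dose_mg * self.doses_per_day * self.duration_days


protocol = MedicalProtocol(
    substance="Hydroxychloroquine",
    disease="Covid-19",
    dose_mg=200,
    doses_per_day=3,
    duration_days=10,
    groups=[Group("study", size=20, mean_age=51, age_sd=19), Group("control")],
)
```

Keeping each quantity in its own slot, rather than flattening the sentence into one triple, lets later claims refer to the dose, the duration or the study group individually.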
12. Introducing Topic Maps
● A Topic Map is like a library without all the books
○ A Topic Map is indexical
■ Like a card catalog
■ Each topic has its own representation
■ Improving on a card catalog, a topic can be identified many different ways
■ Captures metadata and optional content
○ A Topic Map is relational
■ Like a good road map
■ Topics are connected by associations (relations)
■ Topics point to their occurrences in the territory
○ A Topic Map is organized
■ Multiple records on the same topic are co-located (stored as one topic) in the map
14. Some claims are hypothetical
If social distancing measures are not followed, we risk a second wave.
15. Some claims are hypothetical
We need a way to represent hypothetical scenarios.
The hypothetical world is a whole separate universe of discourse, which we represent as a
subgraph. (Sowa’s Conceptual graphs)
[Diagram: subgraph labelled “Hypothetical situation”: the compliance level of the target population with social distancing norms is < 80%; linked by “consequence” to Event: infection rate rise > 50 %]
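One way to keep a hypothetical scenario apart from asserted facts is to store it as a named subgraph. The sketch below is loosely inspired by conceptual-graph contexts; the `Context` and `Graph` classes and the relation names are assumptions, not part of any existing library.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Context:
    """A named subgraph: its triples hold only inside this context."""
    label: str
    triples: List[Triple] = field(default_factory=list)


@dataclass
class Graph:
    asserted: List[Triple] = field(default_factory=list)
    contexts: List[Context] = field(default_factory=list)

    def holds(self, triple: Triple) -> bool:
        # Only top-level triples count as asserted facts.
        return triple in self.asserted


scenario = Context("hypothetical situation", [
    ("target population", "compliance level", "< 80%"),
    ("compliance level", "applies to", "social distancing norms"),
])
graph = Graph()
graph.contexts.append(scenario)
# The link from the scenario to its consequence is what is actually claimed.
graph.asserted.append(
    (scenario.label, "consequence", "infection rate rise > 50%")
)
```

Querying `graph.holds(...)` for a triple inside the scenario returns `False`, so the low compliance level is never mistaken for an observed fact.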
16. Points of view should be explicit
Covid-19, as depicted by Fox News, is not more serious than a minor cold.
Epidemiologists’ estimates of Covid-19 transmission rates have not been explained by virologists.
The results of laboratory X have been contested.
Claims are made by agents, and adopted by communities. It is sometimes important to distinguish
references to a topic as it is understood by a specific agent or community.
17. Points of view should be explicit
Each claim has to be identified as coming from a specific source, maintained by agents. The
properties and links attributed to a topic can be different for each source. Source federation is
explicit.
[Diagram: topics A, B and C linked per source, with values 7, 6 and 8]
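A small sketch of source-tagged claims follows, in which federation happens only when the caller names the sources to combine. The source names other than Fox News, and all the values, are made up for illustration.

```python
from collections import defaultdict

# (source, topic, property, value); "source_b" and its values are invented.
claims = [
    ("fox_news", "Covid-19", "severity", "minor cold"),
    ("source_b", "Covid-19", "severity", "serious"),
    ("source_b", "Covid-19", "transmission", "high"),
]


def view(claims, sources):
    """Federate the chosen sources explicitly, keeping who said what."""
    topic_view = defaultdict(lambda: defaultdict(dict))
    for source, topic, prop, value in claims:
        if source in sources:
            topic_view[topic][prop][source] = value
    return topic_view


fox_only = view(claims, {"fox_news"})
both = view(claims, {"fox_news", "source_b"})
```

The combined view keeps both severity values side by side under their sources instead of silently picking one.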
18. Claims are made and retracted
Lab X claimed to find reinfection after remission, but those cases were due to false negative testing
in an asymptomatic phase.
People can change their minds; a claim can be a correction to an earlier claim.
19. Claims are made and retracted
We view claims from a source as an event stream. Some events in the stream can explicitly
contradict earlier events.
[Diagram: event stream for topic A: (A, x, B), (A, y, 3), (A, x, C), (A, y, 5), (A, x, D), ... The resulting state of A:]
{
"@id": "A",
"x": ["B", "D"],
"y": 5
}
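The event-stream view can be sketched as a fold over assert and retract events. The event format, the explicit retraction of C and the split between multi-valued and single-valued properties are all assumptions chosen so that the fold arrives at the state shown for A.

```python
events = [
    ("assert", "A", "x", "B"),
    ("assert", "A", "y", 3),
    ("assert", "A", "x", "C"),
    ("assert", "A", "y", 5),
    ("retract", "A", "x", "C"),   # assumed: an explicit correction
    ("assert", "A", "x", "D"),
]

MULTI_VALUED = {"x"}  # assumption: "x" accumulates values, "y" is replaced


def fold(events, topic):
    """Replay the stream in order and return the current state of one topic."""
    state = {"@id": topic}
    for kind, subject, prop, value in events:
        if subject != topic:
            continue
        if prop in MULTI_VALUED:
            values = state.setdefault(prop, [])
            if kind == "assert" and value not in values:
                values.append(value)
            elif kind == "retract" and value in values:
                values.remove(value)
        elif kind == "assert":
            state[prop] = value          # a later assertion supersedes
        elif state.get(prop) == value:
            del state[prop]              # retract a single-valued property
    return state


state = fold(events, "A")
```

Because the stream itself is never edited, the earlier claim and its retraction both remain available for audit.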
20. Different communities use different names or identifiers
English names for Covid-19 in Wikidata: 2019-nCoV acute respiratory disease ; coronavirus disease 2019 ; COVID19 ; COVID 19 ;
Covid-19 ; 2019 novel coronavirus pneumonia ; Coronavirus disease 2019 ; nCOVD19 ; nCOVD 19 ; nCOVD-19 ; COVID-2019 ; seafood
market pneumonia ; Wuhan pneumonia ; 2019 NCP ; WuRS ; severe acute respiratory syndrome type 2 ; SARS-CoV-2 infection ; 2019 novel
coronavirus respiratory syndrome ; Wuhan respiratory syndrome ; novel coronavirus ; coronavirus
Of course we’d want to also search for 2019冠状病毒病 etc.
RDF identifiers in Wikidata:
<http://www.wikidata.org/wiki/Q84263196>
<https://catalogue.bnf.fr/ark:/12148/cb17874453m>
<https://d-nb.info/gnd/1206347392>
<https://id.loc.gov/authorities/sh2020000570>
<https://meshb.nlm.nih.gov/#/record/ui?ui=C000657245>
<http://id.nlm.nih.gov/mesh/T001007884>
<http://id.nlm.nih.gov/mesh/M000681578>
<https://www.courrierinternational.com/sujet/covid-19>
<http://www.disease-ontology.org/?id=DOID:0080600>
<http://www.diseasesdatabase.com/ddb60833.htm>
<http://emedicine.medscape.com/article/2500114-overview>
<https://www.britannica.com/science/COVID-19>
<https://www.enciclopedia.cat/EC-GEC-23470930.xml>
<https://icd.who.int/browse10/2019/en#/U07.1>
<https://icd.who.int/browse10/2019/en#/U07.2>
<https://icd.who.int/dev11/f/en#/http://id.who.int/icd/entity/1790791774>
<https://www.malacards.org/card/2019_novel_coronavirus>
<https://www.ne.se/uppslagsverk/encyklopedi/lång/covid-19>
<https://www.nhs.uk/conditions/coronavirus-covid-19>
<http://www.omegawiki.org/DefinedMeaning:1733730>
<https://philpapers.org/browse/covid-19>
<https://www.quora.com/topic/COVID>
<http://snomed.info/id/840539006>
<https://sml.snl.no/covid-19>
<https://www.reddit.com/r/Coronavirus/>
<https://www.reddit.com/r/COVID19/>
<http://www.treccani.it/enciclopedia/ricerca/COVID>
<https://tvtropes.org/pmwiki/pmwiki.php/UsefulNotes/CoronavirusDisease2019Pandemic>
<http://www.yso.fi/onto/yso/p38829>
<https://denstoredanske.lex.dk/COVID-19>
Note missing: kg:/m/01cpyy (Google)
21. Different communities use different names or identifiers
Many concepts share the same name. Many names refer to the same concept.
Names have to be disambiguated. Global concept identifiers can be tentatively
identified, but all identifiers are tagged with their source, and the identifier X as
used by source A may not correspond to the concept referred to by X in source B.
Unifying topics is the domain of topic mapping.
22. Topic Map as a federation platform
● A topic map aggressively works to ensure that, for each individual subject represented in the map,
there will be one and only one location for that subject.
● To accomplish that, when a decision is made that two subject representations in the map are about
the same subject, a new representation, a VirtualProxy, is created. It contains, without redundancy,
the information from both, and from any other topic about that subject which later enters the topic map.
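The merge step can be sketched as follows. This is not the TopicQuests implementation: the `Proxy` and `VirtualProxy` classes are hypothetical, and the `mesh:` prefix is an assumed shorthand for one of the identifiers listed earlier.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Proxy:
    source: str
    identifiers: Set[str]
    properties: Dict[str, Set[str]] = field(default_factory=dict)


@dataclass
class VirtualProxy:
    """One location for a subject, built from the proxies judged equivalent."""
    members: List[Proxy]

    @property
    def identifiers(self) -> Set[str]:
        return set().union(*(m.identifiers for m in self.members))

    @property
    def properties(self) -> Dict[str, Set[str]]:
        merged: Dict[str, Set[str]] = {}
        for member in self.members:
            for key, values in member.properties.items():
                merged.setdefault(key, set()).update(values)  # sets drop repeats
        return merged


a = Proxy("wikidata", {"wiki:Q84263196"}, {"name": {"Covid-19", "COVID19"}})
b = Proxy("mesh", {"mesh:C000657245"}, {"name": {"Covid-19"}})
merged = VirtualProxy([a, b])
```

The original proxies are left untouched, so a merge decision can be revisited later by changing the member list.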
23. Federating Silos: introduction
● Siloed Research Topics
○ Raynaud’s Syndrome Therapies
○ Fish Oil
● Machine Reading collects graph structures from different sources
○ Form tuple-like structures which are graphs
24. Federating Silos: Topic Mapping
● TopicMap Process
○ Rule:
■ One Location in the Map for each Subject
■ Federates (merges topics about the same subject) collected from different resources
25. Topic merging opens questions and creates events
● Does Fish Oil qualify as a Raynaud’s therapy?
○ Turns out Yes
● Topic Merge events feed back into the HyperKnowledge ecosystem
26. Distributed federation in HyperKnowledge
Each source maintains its own table of topic merges, and federated queries must
keep track of those equivalences.
This can be expanded (with normalization) to identification of composite topics.
The plan is for the HyperKnowledge (HK) ecosystem to maintain a probabilistic map (a Bloom filter) of which
sources maintain information about which topics.
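To illustrate the probabilistic map, here is a minimal Bloom filter sketch with one filter per source. The filter size, hash count and source names are arbitrary assumptions; a Bloom filter can report false positives but never false negatives, so a hit only means "worth querying".

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits: int = 1024, hashes: int = 3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str):
        # Derive several bit positions by salting the hash input.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# One filter per source: "might this source know about this topic?"
sources = {"source_a": BloomFilter(), "source_b": BloomFilter()}
sources["source_a"].add("wiki:Q84263196")


def candidates(topic: str):
    """Sources that may hold information about the topic."""
    return [name for name, bloom in sources.items() if topic in bloom]
```

A federated query would then contact only the sources returned by `candidates`, and consult each one's merge table for equivalent identifiers.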
27. Comparing claims
The research on hydroxychloroquine in study X was contradicted by study Y.
132/203 virologists consulted believe hydroxychloroquine’s side effects to be too severe for Covid-19
treatment.
28. Comparing claims
Once claims have an identity, we can compare claims and make higher-level claims.
[Diagram: claims about claims]
Claim 1 (DOI:10.1016/J.ijantimicag.2020.105949): Hydroxychloroquine, Drug used for treatment, Covid-19
Claim 2 (DOI:10.1080/15563650500514558): Hydroxychloroquine, Side-effect, refractory ventricular arrhythmia
risk/benefit analysis: risks = Claim 2, benefits = Claim 1, outcome = Risks outweigh benefits
29. Comparing claims
Claim streams representing individual points of view can be combined into “community” streams,
and into combined values.
30. So what can be a stream?
Comparing claims allows combining them into larger aggregates
● Base case: One person’s point of view
● One team (guild), with a procedure for combining its members’ point-of-view streams
(can be a rule like majority, consent, etc.)
● A thematic collation, with points of dis/agreement marked without resolution
● A curated thematic overview, with data-driven evidence
● Eventually: global federation
Opposite end of the spectrum: Casual small streams (like git branches)
● A thought experiment or hypothetical situation
● A computed slice (query) of a stream can be treated like a stream
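As a sketch of the team case, the function below combines members' stances on claims with a simple majority rule and marks ties as contested. The member names, claim identifiers and the three outcome labels are hypothetical; other rules (consent, weighting) would slot in the same way.

```python
from collections import Counter


def majority_merge(member_views):
    """member_views: {member: {claim_id: True/False}} -> team-level stance."""
    tallies = {}
    for stances in member_views.values():
        for claim_id, accepted in stances.items():
            tallies.setdefault(claim_id, Counter())[accepted] += 1
    team = {}
    for claim_id, votes in tallies.items():
        if votes[True] > votes[False]:
            team[claim_id] = "accepted"
        elif votes[False] > votes[True]:
            team[claim_id] = "rejected"
        else:
            team[claim_id] = "contested"   # a tie is left unresolved
    return team


views = {
    "alice": {"claim-1": True, "claim-2": False},
    "bob": {"claim-1": True, "claim-2": True},
    "carol": {"claim-1": False},
}
team_view = majority_merge(views)
```

The result is itself a point of view, so a team stream can feed into a larger aggregate in the same way an individual stream does.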
31. Inference engine ecosystem
Event sourcing as a backbone for knowledge-based microservices
Services subscribe to claims and produce calculations; the main queue subscribes to those calculations
Reactive calculations
E.g.: Rule-based inference, Live query maintenance, Machine learning, Inference combination, etc.
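The subscribe-and-publish loop can be sketched with an in-memory event bus. This is a toy stand-in for a real message queue; the `EventBus` class, the event kinds and the symmetry rule are assumptions made for illustration.

```python
from collections import defaultdict, deque


class EventBus:
    def __init__(self):
        self.log = []                         # append-only event log
        self.subscribers = defaultdict(list)  # event kind -> handlers
        self.pending = deque()

    def subscribe(self, kind, handler):
        self.subscribers[kind].append(handler)

    def publish(self, kind, payload):
        self.pending.append((kind, payload))
        while self.pending:
            next_kind, next_payload = self.pending.popleft()
            self.log.append((next_kind, next_payload))
            for handler in self.subscribers[next_kind]:
                handler(next_payload)


bus = EventBus()


def symmetry_rule(claim):
    """Toy rule-based service: treat 'associated with' as symmetric."""
    subject, predicate, obj = claim
    if predicate == "associated with":
        bus.publish("calculation", (obj, predicate, subject))


bus.subscribe("claim", symmetry_rule)
bus.publish("claim", ("obesity", "associated with", "palm oil"))
```

After the last line, the log holds the original claim followed by the derived calculation, which is what lets other services react to inferences as well as to source claims.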
32. Inference engine ecosystem
Synthesis as a service
Synthesis can be simple statistics (who believes this), sample size, Bayesian, etc.
Simple awareness of which claims are established or contested (and by whom) is useful
35. HyperKnowledge
From documents to augmenting knowledge work
[Diagram: stages Documents, Structured Documents, Basic claim discovery, Entity identification, Augmented Claim Craft; tools CoronaWhy, OpenSherlock, Spacy; the diagram ends with a “!”]
Augmented Claim Craft:
- Higher order claim discovery
- Claim combination
- Rule-based claim micro-services
- ML-based claims
- Human claim identification
36. Structured documents to claims with OpenSherlock
● Basic Setup
○ Each document is
■ mapped to a JSON structure and transferred to a Document database
■ broken into individual paragraphs
○ Each paragraph becomes a Kafka event
● Machine Reading
○ From paragraph Kafka events, each paragraph is
■ Broken into sentences by SpaCy
○ Each sentence is
■ Parsed by SpaCy
■ Parsed by LinkGrammar parser
■ Parse results are processed by a tuple detector to identify claims
37. OpenSherlock: example sentence
The pandemic of obesity, type 2 diabetes
mellitus (T2DM) and nonalcoholic fatty
liver disease (NAFLD) has frequently been
associated with dietary intake of
saturated fats (1) and specifically with
dietary palm oil (PO) (2).
Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5272194/
38. OpenSherlock: expected claims from that sentence
Obesity associated with saturated fats
Obesity associated with palm oil
T2DM associated with saturated fats
Type 2 Diabetes mellitus has acronym T2DM
T2DM associated with palm oil
NAFLD associated with saturated fats
Nonalcoholic Fatty Liver Disease has acronym NAFLD
NAFLD associated with palm oil
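The expected claims amount to a cross product of the coordinated conditions and the coordinated dietary factors, plus the acronym definitions. The sketch below enumerates them by hand from already-extracted phrases; it is not OpenSherlock's tuple detector, and the variable names are hypothetical.

```python
from itertools import product

# Conditions named in the sentence, with their acronyms where given.
conditions = {
    "obesity": None,
    "type 2 diabetes mellitus": "T2DM",
    "nonalcoholic fatty liver disease": "NAFLD",
}
exposures = ["saturated fats", "palm oil"]

claims = []
for name, acronym in conditions.items():
    if acronym:
        claims.append((name, "has acronym", acronym))
# Every condition is paired with every dietary factor.
for (name, acronym), exposure in product(conditions.items(), exposures):
    claims.append((acronym or name, "associated with", exposure))
```

This yields the same eight claims as the list above: two acronym claims and six association claims.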
42. Next steps
Higher-order claims are still beyond current NLP techniques, but deep learning
tools can augment the intelligence of researchers identifying claims, and symbolic AI
can be used to identify logical connections and contradictions.
The HyperKnowledge federation can help researchers craft higher-order claims
by identifying both the logical and social neighbourhood of claims.
We would like this ecosystem to be how the next Drs. Liddelow and Trumble become
aware of one another.
43. References
https://hyperknowledge.org
https://topicquests.org
RDF, W3C
Wikidata data model primer
Patrick Durusau, Steven R. Newcomb, and Robert Barta. Topic Maps Reference Model. ISO standard 13250-5 CD, November 2007.
John F. Sowa. Conceptual Graphs. In Handbook of Knowledge Representation, pages 213–237. Elsevier, 2008. ISBN 9780444522115.
Knowledge Interchange Format, Stanford
https://ipld.io