An overview of open source "cognitive computing" systems, specifically OpenSherlock. It describes the HyperMembrane structure, a kind of information fabric, for machine reading, literature-based discovery, and deep question answering. The platform is open source and uses ElasticSearch, topic maps, JSON, link-grammar parsing, and qualitative process models.
2. The Present Situation
Upon this gifted age, in its dark hour,
Rains from the sky a meteoric shower
Of facts . . . they lie unquestioned, uncombined.
Wisdom enough to leech us of our ill
Is daily spun; but there exists no loom
To weave it into fabric
Edna St. Vincent Millay, 1939
3. Topics To Cover
• Discovery, learning, problem solving
• Topic Maps
• OpenSherlock
• HyperMembranes
• Open Source
• Key reasons for building open source cognitive
systems
4. Cognitive Computing: My View
• Cognitive Computing is:
– Far less about what a computer knows
– Far more about how computers can
augment human cognitive capabilities
– Based on the augmentation work of J.C.R. Licklider
and Douglas Engelbart
J.C.R. Licklider
Douglas Engelbart
Images: Wikipedia
5. A Domain-specific Problem Statement
• An Example:
– Do these two sentences say the same thing?
• CO2 is a causal factor in climate change.
• Climate change is caused by carbon dioxide.
• Problem Statement
– Software agents need elegant methods for
reading, representing, organizing, and modeling
information resources to support discovery and
answering questions.
6. A Framing Thought
• From [1]
– The understanding of global brain organization
and its large-scale integration remains a challenge
for modern neurosciences.
• To
– The understanding of global conversations about
topics that matter and their large-scale federation
remains a challenge for modern information
technology.
[1] Petri G, Expert P, Turkheimer F, Carhart-Harris R, Nutt D, Hellyer PJ, Vaccarino F. (2014)
Homological scaffolds of brain functional networks. J. R. Soc. Interface 11: 20140873.
7. Our Goals
• Improve Human-Tool Capabilities
• Augment existing analytic methods
– Increase opportunities for discovery
– Improve already sophisticated methods
• Build Looms
– Read documents
– Map and model topics read
– Weave information fabrics
Douglas Engelbart
8. Discovery
• Is it really possible for people to see
everything?
– Part of discovery is connecting dots not
yet connected.
– “Cognitive Agents” can help increase
chances of serendipity.
“Discovery consists of
seeing what everybody
has seen and thinking
what nobody has
thought.”
–Albert Szent-Györgyi
9. Related Work
• Commercial
– IBM Watson
– Wolfram Alpha
– Viv
– Saffron 10
– Clueda
– Siri
– Google Now
– Cortana
– …
• Open Source
– OAQA
– DeepDive
– OpenCog
– OpenNARS
– Watsonsim
– YodaQA
– AKSW OpenQA
– AKSW QA
– AquaLog
– OpenSherlock
– OpenIRIS (CALO)
– …
• Research
– Project Aristo
– Project Halo
– FREyA
– CASIA
– NLP-Reduce
– EIS Sina
– WDAqua ITM
– Intui2
– …
10. Biologically Inspired Design
• Humans are blessed with:
– Memory to keep concepts organized and
connected
– Internal mechanisms which map sensor data into
memory for processing and storage
– The abilities of complex, adaptive, anticipatory
systems
11. Memory: Introducing Topic Maps
• A Topic Map is like a library without all the books*
– A Topic Map is indexical
• Like a card catalog
– Each topic has its own representation
• Improving on a card catalog, a topic can be identified many
different ways
• Captures metadata and optionally content
– A Topic Map is relational
• Like a good road map
– Topics are connected by associations (relations)
– Topics point to their occurrences in the territory
– A Topic Map is organized
• Multiple records on the same topic are co-located (stored as one
topic) in the map
*a map is not its territory
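The three properties above (indexical, relational, organized) can be illustrated with a minimal Python sketch. This is not OpenSherlock's implementation; the class names `Topic` and `TopicMap` and their methods are hypothetical, chosen only to show a topic that can be identified in several ways, points to its occurrences, and is merged with other records on the same subject.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """One subject: many identifiers, one merged record (hypothetical)."""
    names: set = field(default_factory=set)         # ways to identify the subject
    occurrences: set = field(default_factory=set)   # pointers into the "territory"


class TopicMap:
    def __init__(self):
        self._by_name = {}       # index: lower-cased name -> Topic
        self.associations = []   # (subject name, relation, object name)

    def add(self, names, occurrence=None):
        """Add a record and co-locate it with any topic sharing a name."""
        keys = {n.lower() for n in names}
        merged = Topic(names=set(keys))
        if occurrence:
            merged.occurrences.add(occurrence)
        # Fold in every existing topic that shares an identifier.
        for key in keys:
            existing = self._by_name.get(key)
            if existing is not None:
                merged.names |= existing.names
                merged.occurrences |= existing.occurrences
        for key in merged.names:
            self._by_name[key] = merged
        return merged

    def get(self, name):
        return self._by_name.get(name.lower())

    def associate(self, subject, relation, obj):
        """Connect two topics; names are resolved through the index on lookup."""
        self.associations.append((subject.lower(), relation, obj.lower()))


tm = TopicMap()
tm.add(["CO2"], occurrence="doc-1")
tm.add(["Carbon Dioxide", "CO2"], occurrence="doc-2")   # same subject, new name
tm.add(["Climate change"], occurrence="doc-1")
tm.associate("CO2", "cause", "Climate change")
```

Here the second `add` call lands on the existing CO2 topic because the two records share an identifier, which is the "stored as one topic" behavior described above.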
13. Processing Mechanisms
• Typically, software processes take the form of
variants of NLP (natural language processing)
– Parsers
– Cluster analysis
– Entity recognition
– Relation detection
– Role recognition
– Probabilistic methods
14. A Key Question in My Research
• Can a Topic Map learn (construct itself) by “reading” literature?
– Relevant issues:
• Bootstrapping
• Machine reading
– NLP
– Linguistics
– Statistics
– Analogy & Metaphor
– …
• Knowledge representation
• Model building
– Anticipation
• Weaving information fabrics
• Literature-based discovery
• Deep Question Answering
15. A Simple Example
• Read this sentence:
– Gene expression is caused by insoluble hormones
binding to a plasma membrane hormone receptor
• Topic Map recognizes:
– Gene expression → GeneExpression
– insoluble hormones → InsolubleHormone
– plasma membrane hormone receptor → PlasmaMembraneReceptor
• Software agents transform:
– is caused by → Cause
– binding to → Binds
• Final semantic structure:
• { {InsolubleHormone, Binds, PlasmaMembraneReceptor},
Cause, GeneExpression }
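The example above can be sketched in Python as follows. This is a deliberately narrow illustration, assuming a single hard-coded sentence pattern ("X is caused by Y binding to Z"); the lookup tables and the functions `topic` and `read` are hypothetical stand-ins for the topic map and the software agents, not OpenSherlock code.

```python
# Hypothetical lookup tables standing in for the topic map and the agents.
TOPICS = {
    "gene expression": "GeneExpression",
    "insoluble hormones": "InsolubleHormone",
    "plasma membrane hormone receptor": "PlasmaMembraneReceptor",
}
PREDICATES = {"is caused by": "Cause", "binding to": "Binds"}
ARTICLES = {"a", "an", "the"}


def topic(phrase):
    """Recognize a phrase as a topic, ignoring leading articles."""
    words = [w for w in phrase.split() if w not in ARTICLES]
    return TOPICS[" ".join(words)]


def read(sentence):
    """Handle only the pattern 'X is caused by Y binding to Z'."""
    text = sentence.lower().rstrip(".")
    effect, cause_clause = text.split(" is caused by ")
    agent, target = cause_clause.split(" binding to ")
    inner = (topic(agent), PREDICATES["binding to"], topic(target))
    return (inner, PREDICATES["is caused by"], topic(effect))


structure = read(
    "Gene expression is caused by insoluble hormones "
    "binding to a plasma membrane hormone receptor"
)
```

The result is a nested tuple whose subject is itself a tuple, mirroring the final semantic structure on the slide.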
16. Introducing OpenSherlock
• OpenSherlock is:
– A Topic Map for information resource identity and organization
– A HyperMembrane information fabric structure
– A society of agents system which can
• Read documents
• Process information resources
– Maintain the topic map
– Maintain the HyperMembrane
– Build and maintain models
– Perform discovery tasks
– Answer questions
– Agents are coordinated by:
• A blackboard system
• A dynamic task-based agenda
• Event propagation and handling
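The coordination scheme above (blackboard, task agenda, event propagation) can be sketched in a few lines of Python. This is only an illustration of the general pattern under simple assumptions: the `Blackboard` class, its method names, and the two toy agents are hypothetical, and the "mapper" agent is a trivial stand-in that treats the first word of a sentence as a topic.

```python
import heapq
from collections import defaultdict


class Blackboard:
    """Shared workspace plus a priority agenda (hypothetical sketch)."""

    def __init__(self):
        self.entries = []                    # everything posted so far
        self.agenda = []                     # heap of (priority, seq, agent, payload)
        self.listeners = defaultdict(list)   # event kind -> [(priority, agent)]
        self._seq = 0                        # tie-breaker keeps FIFO order

    def subscribe(self, kind, agent, priority=5):
        self.listeners[kind].append((priority, agent))

    def post(self, kind, payload):
        """Record an entry and propagate the event as tasks on the agenda."""
        self.entries.append((kind, payload))
        for priority, agent in self.listeners[kind]:
            heapq.heappush(self.agenda, (priority, self._seq, agent, payload))
            self._seq += 1

    def run(self):
        """Run tasks until the agenda is empty; agents may post new events."""
        while self.agenda:
            _, _, agent, payload = heapq.heappop(self.agenda)
            agent(self, payload)


def reader(board, document):
    # Toy reading agent: naive sentence split on ". ".
    for sentence in document.split(". "):
        board.post("sentence", sentence.strip(". "))


def mapper(board, sentence):
    # Toy topic-map agent: pretends the first word is the topic.
    board.post("topic", sentence.split()[0])


board = Blackboard()
board.subscribe("document", reader)
board.subscribe("sentence", mapper)
board.post("document", "CO2 causes climate change. Climate change affects health.")
board.run()
topics = [payload for kind, payload in board.entries if kind == "topic"]
```

The agenda is dynamic in the sense that running one task (reading a document) places further tasks (mapping each sentence) on the queue.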
17. Observations 1
• A Topic Map is central to the key question, and
therefore to a thesis entailed by this research
– It serves as a kind of memory for social processes
– It provides a robust platform for subject identity
– It can also serve as a repository for domain-
specific vocabularies (ontologies, taxonomies,
naming conventions,…)
18. Observations 2
• A Topic Map is necessary but not sufficient to support
discovery, learning, or problem solving
– It really only provides a powerful indexical structure related to
the key artifacts in any universe of discourse:
• Actors
• Their relations
• Their states
• Rules, laws, theories,…
• To model those key artifacts, other representation
strategies are required
– Conceptual Graphs
– Qualitative Process Theory
– Belief Networks
– …
19. A Research Question
• What processes are available which, if
performed while harvesting (reading)
documents, can reduce the amount of
processing required later during question
answering?
– The question entails
• Synthesis of ontology
• Co-reference resolution
• Re-representation during question lifting
• …
20. A Working Hypothesis
• Process
– Build and maintain a content-addressable memory
of questions, claims, arguments, and evidence
fields.
• We call that a HyperMembrane
– Note:
• Every text object passed into the system is processed by
the same algorithms
– Sentences harvested from text
– Questions and responses posed by humans
21. Key Concept: HyperMembrane
• HyperMembrane is a key concept in the
working hypothesis that OpenSherlock seeks
to explore and demonstrate
– A growing graph as a collection of woven and
intersecting fabrics
• constructed from normalized tuples (n-tuples) which
are designed to reduce the amount of NLP required to
read documents
• such that intersections of fabrics occur where named
entities in the graph of n-tuples are the same
– Inspired by Ted Nelson’s ZigZag Architecture
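One way to picture fabrics intersecting at shared named entities is an index from each entity to the n-tuples that mention it. The Python sketch below assumes that reading; the `HyperMembrane` class here, with its `weave` and `intersections` methods, is a hypothetical illustration and not the OpenSherlock engine.

```python
from collections import defaultdict


class HyperMembrane:
    """Graph of n-tuples indexed by the entities they contain (sketch)."""

    def __init__(self):
        self.tuples = set()
        self.by_entity = defaultdict(set)   # entity -> n-tuples mentioning it

    def weave(self, subject, predicate, obj):
        """Add one n-tuple and index it under its subject and object."""
        ntuple = (subject, predicate, obj)
        self.tuples.add(ntuple)
        for entity in (subject, obj):
            self.by_entity[entity].add(ntuple)
        return ntuple

    def intersections(self):
        """Entities where two or more n-tuples meet."""
        return {e: ts for e, ts in self.by_entity.items() if len(ts) > 1}


hm = HyperMembrane()
hm.weave("antioxidants", "kill", "free radicals")
hm.weave("macrophages", "use", "free radicals")
hm.weave("CO2", "cause", "climate change")
crossings = hm.intersections()
```

In this toy data the only intersection is at "free radicals", where the antioxidant and macrophage statements meet.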
22. Machine Reading in OpenSherlock
• Goals:
– Grow the topic map
• Topic Map then serves to support fabrication of higher-order
knowledge structures
– Conceptual Graphs
– Belief Networks
– QP Theory Models
– HyperMembrane
– …
• Process Loop:
– For a given document
• For every paragraph in that document
– For every sentence in each paragraph
» Read the sentence
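The process loop above can be sketched in Python as three nested levels. The splitting rules are assumptions made for illustration (paragraphs separated by blank lines, sentences ending in ".", "!" or "?"), and `read_sentence` is a hypothetical callback standing in for the sentence-reading step.

```python
import re


def paragraphs(document):
    """Assumption: paragraphs are separated by blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def sentences(paragraph):
    """Naive split after '.', '!' or '?'; abbreviations will break it."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def read_document(document, read_sentence):
    """For a given document, every paragraph, every sentence: read it."""
    for paragraph in paragraphs(document):
        for sentence in sentences(paragraph):
            read_sentence(sentence)


seen = []
read_document(
    "CO2 is a causal factor in climate change. "
    "Climate change is caused by carbon dioxide.\n\n"
    "Gene expression is caused by insoluble hormones "
    "binding to a plasma membrane hormone receptor.",
    seen.append,
)
```

A real reader would need a more careful sentence splitter; the point here is only the shape of the loop.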
23. Sentence Reading
• First Step:
– Process sentence into word grams*
• Second Step:
– Where possible
• Transform word grams into n-tuples**
• n-tuples form the HyperMembrane
* A container of words, from 1 to 8 words per container
** A container of symbols based on words in word grams
24. Process Sentence into WordGrams
• Approach
– Break sentence into word grams*
• WordGram objects are shared across sentences
– Count of sentence identifiers associated with each object
serves as basis for probabilistic models
– Either
• TopicMap recognizes terms
– Or
• Sentence is parsed by Link-Grammar Parser**
• TopicMap learns from parse results
* http://en.wikipedia.org/wiki/W-shingling
** http://www.link.cs.cmu.edu/link/
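As a rough Python sketch of the word gram step, the code below shingles a sentence into containers of 1 to 8 words and keeps, for each shared word gram, the set of sentence identifiers it appeared in. The function and class names are hypothetical, lower-casing and whitespace tokenization are simplifying assumptions, and the topic map and Link-Grammar branches are not shown.

```python
from collections import defaultdict

MAX_WORDS = 8   # the slides give 1 to 8 words per container


def word_grams(sentence, max_words=MAX_WORDS):
    """All contiguous runs of 1..max_words words (w-shingling)."""
    words = sentence.lower().rstrip(".?!").split()
    return [
        " ".join(words[i:i + n])
        for n in range(1, max_words + 1)
        for i in range(len(words) - n + 1)
    ]


class WordGramIndex:
    """Word grams shared across sentences, with their sentence ids."""

    def __init__(self):
        self.sentences = defaultdict(set)   # word gram -> sentence ids

    def add(self, sentence_id, sentence):
        for gram in word_grams(sentence):
            self.sentences[gram].add(sentence_id)

    def count(self, gram):
        """Number of sentences containing the gram: a basis for statistics."""
        return len(self.sentences.get(gram.lower(), ()))


index = WordGramIndex()
index.add("s1", "CO2 is a cause of climate change.")
index.add("s2", "Climate change is caused by carbon dioxide.")
```

Here "climate change" is shared by both sentences, while "co2" appears in only one; counts like these are the raw material for the probabilistic models mentioned above.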
25. Transform WordGrams to N-Tuples
• Normalized tuple (N-Tuple)
– A structure where the subject, predicate, and object are normalized
• Nouns and verbs transformed
– CO2, Carbon Dioxide, … → CO2
– causes, is caused by, … → cause
• Two sentence example
– CO2 is a cause of climate change.
– Climate change is caused by carbon dioxide.
– Result:
» { CO2, cause, climate change }
– Normalization processes include general and domain specific lenses
• Rule-based interpreters which detect structures
– Taxonomy
– Causality
– Biomedical
– Geophysical
– …
• Process models
– Built and maintained while reading
– Predict while reading – Anticipatory Reading
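The two sentence example above can be reproduced with a small rule-based causality lens in Python. This is a sketch under stated assumptions: the synonym table `NOUNS`, the regular-expression patterns, and the function names are hypothetical, and a real lens would draw its vocabulary from the topic map rather than a literal dictionary.

```python
import re

# Hypothetical noun normalization table (the topic map's job in practice).
NOUNS = {"co2": "CO2", "carbon dioxide": "CO2"}

# Causality lens rules: (pattern, True if the cause is the first group).
CAUSAL_PATTERNS = [
    (re.compile(r"^(.+?) is caused by (.+)$"), False),
    (re.compile(r"^(.+?) is a cause of (.+)$"), True),
    (re.compile(r"^(.+?) is a causal factor in (.+)$"), True),
    (re.compile(r"^(.+?) causes (.+)$"), True),
]


def normalize_noun(phrase):
    phrase = phrase.strip().lower()
    return NOUNS.get(phrase, phrase)


def causality_lens(sentence):
    """Return a normalized (cause, 'cause', effect) tuple, or None."""
    text = sentence.strip().rstrip(".").lower()
    for pattern, cause_first in CAUSAL_PATTERNS:
        match = pattern.match(text)
        if match:
            left, right = (normalize_noun(g) for g in match.groups())
            cause, effect = (left, right) if cause_first else (right, left)
            return (cause, "cause", effect)
    return None


first = causality_lens("CO2 is a cause of climate change.")
second = causality_lens("Climate change is caused by carbon dioxide.")
```

Both sentences normalize to the same tuple, which is what lets the HyperMembrane treat them as one statement.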
26. About N-Tuples
• An N-Tuple is a structured record of
– Topics in the topic map
– Those topics are harvested from text
• An N-Tuple takes the form:
– { Subject, Predicate, Object }
– Where
• Subject and/or Object can be one of:
– A topic from the topic map
– Another N-Tuple
• An N-Tuple is identified by the identities of the terms it contains
– For terms (words) read from documents, the identities (numeric
representations) of those terms together form the identity of the
N-Tuple object.
• N-Tuples are content addressable
• Disambiguation of subjects is a topic mapping process
– Learning means continuous refinement of subject identity
– Ambiguities can also be resolved through human intervention
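A minimal Python sketch of content addressing follows, assuming each term is given a small integer identity on first sight and an N-Tuple's identity is the tuple of its members' identities. The slides say only that term identities are numeric; the numbering scheme, the `TupleStore` class, and its methods are hypothetical.

```python
class TupleStore:
    """Content-addressable store of N-Tuples (hypothetical sketch)."""

    def __init__(self):
        self._term_ids = {}   # term -> numeric identity
        self._tuples = {}     # N-Tuple identity -> N-Tuple

    def term_id(self, term):
        """Assign the next integer to a term the first time it is seen."""
        if term not in self._term_ids:
            self._term_ids[term] = len(self._term_ids) + 1
        return self._term_ids[term]

    def identity(self, item):
        """A term's identity is its number; a tuple's is built from its parts."""
        if isinstance(item, tuple):
            return tuple(self.identity(part) for part in item)
        return self.term_id(item)

    def put(self, subject, predicate, obj):
        """Store an N-Tuple under the identity derived from its content."""
        ntuple = (subject, predicate, obj)
        key = self.identity(ntuple)
        self._tuples.setdefault(key, ntuple)
        return key

    def get(self, key):
        return self._tuples.get(key)


store = TupleStore()
inner = ("InsolubleHormone", "Binds", "PlasmaMembraneReceptor")
k1 = store.put(*inner)
k2 = store.put(inner, "Cause", "GeneExpression")   # subject is another N-Tuple
k3 = store.put(*inner)                             # same content, same identity
```

Because the key is computed from the content, storing the same statement twice yields the same identity, and a nested N-Tuple carries the identity of its inner tuple inside its own.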
28. Current State of OpenSherlock
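[Architecture diagram; component labels as shown]
• Importers and readers: Ontology Importer (Ontologies), PubMed Reader (PubMed Abstracts), UMLS Importer (UMLS)
• HyperMembrane Engine
• TellAsk
• Persistence: ElasticSearch, Titan or Blazegraph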
29. Observations 3
• HyperMembrane is a reminding system
– HyperMembrane is a record of federated human
conversation
• Harvested from books, papers, and recorded
conversation
• Includes statistical properties of recorded utterances
– HyperMembrane records:
• That which is common
• That which is novel
– Possibly wrong
– Possibly game changing
30. TellAsk Interface
[Screenshot of the TellAsk interface, with these callouts]
• Conversation Tree: user can click a node to select it as the parent for any user response
• Response Type Selectors: selection required before response
• User types here
• Linear conversation flow
• Entry Forms Selector List
• Map starts a new conversation with entered topic
31. The Open Source Stack
• Persistence
– ElasticSearch
– Considering Titan
– Considering Blazegraph (Bigdata™ RDF Store)
• Libraries
– Many from Apache Foundation and others
– LinkGrammarParser (Java version)
– XML PullParser
– Simple JSON Parser
• Tools
– Eclipse
33. Current State of Development
• Aim to answer simple questions about
causality
– Current focus on biomedical domain
– Current focus on two lenses
• Taxonomy
• Causality
– No Conceptual Graphs
– No Process Models
– No Probabilistic Models
34. Future Work
• Aim to complete an anticipatory system
– Process models for anticipation
– Conceptual graphs
– Probabilistic models
– More lenses
• Pluggable lenses
• Adaptive lenses
– More domains
35. Why Do This?
• Augment human capabilities in problem
solving
• Participate in Open Science
38. Key Context for Open Science
• A planet-wide, collaborative quest for Global
Thrivability*.
– Issues include
• Sociological events
– Health, epidemics, wars,…
• Geophysical events
– Climate change, earthquakes, volcanoes, …
• Astrophysical events
– Asteroids, our Sun, …
* Let’s call the quest: EarthMoonshot
39. Completed Representation
[Diagram; labels as shown]
• Statements: "antioxidants kill free radicals"; "macrophages use free radicals to kill bacteria"
• Nodes: Bacterial Infection, Antioxidants, Compromised Host
• Relation labels: Contraindicates, Because, Appropriate For
Let us co-create Cognitive Agents for Discovery
jackpark@topicquests.org
OpenSherlock documents at: http://debategraph.org/OpenSherlock
Code emerging at: https://github.com/opensherlock/
Slides online at http://slideshare.net/jackpark/
Acknowledgments:
Bob Gleichauf
David Alexander Price
Arun Majumdar
Robert S. Stephenson
Mark Szpakowski
Martin Radley
Sherry Jones
Alexander Wenzowski
Ted Kahn
Patrick Durusau