3. Solution research under way?
• Collaboration and User Experience
• Knowledge gardens
• Subject of a different talk
• Open Source: OpenSherlock
• Information Silos
• Literature-based discovery
• Machine reading
• Knowledge federation
• Knowledge representation
• A range of representation systems working together
• Hybrid architectures
• Pandemonium-style architectures
4. On Information Silos
• An insular management system†
• Silo effect arises from
• Silo mentality
• Incompatible data and information systems
• Domain specialization
• …
• Silo effect creates
• Reduced signal to noise ratio in public information systems
• Redundant resources
• Unconnected concepts (dots)
• ...
†https://en.wikipedia.org/wiki/Information_silo
5. Silo Effect: An Example
• Alzheimer’s Context†
• Dr. Trumble and the Tsimané Project‡
• Anthropologist studying evolutionary medicine
• Indigenous people, Bolivia
• Higher elderly cognitive performance with the ApoE4 gene
• Dr. Liddelow studying immune response in brains
• Some people die without dementia but with brains clogged with Alzheimer’s pathology
• A Quote (emphasis mine):
• “I asked Dr. Liddelow whether he was familiar with the Tsimane research. He admitted that he was not
— the field of evolutionary biology is distant from his own. But he said the hypothesis that the ApoE4
gene evolved to protect our brains from the effects of parasitic infection made perfect sense. “That’s
absolutely in line with what we found. For our ancestors, an ApoE4 gene could have been beneficial,”
Dr. Liddelow said, in part because it would have helped the astrocytes go on the attack.”
† Kennedy, P. (2017, July 14). An Ancient Cure for Alzheimer's? The New York Times.
‡ University of New Mexico. The Tsimane Health and Life History Project. http://www.unm.edu/~tsimane/
7. Swanson’s ABC Linking Method†
1. Pick a topic of interest (Raynaud’s Disease)
2. Search to find literature A = {Raynaud’s}
3. Hypothesize that B (e.g., blood factors) should be studied in relation to
Raynaud’s
4. Search literature A’ = A ∩ {blood}
5. Notice two common descriptors: blood viscosity, red blood cell rigidity
6. Search literature C = {blood viscosity} ∪ {red blood cell rigidity}
7. Notice the term “Fish Oil”
8. Search literature C = {Fish Oil}
9. Show {Fish Oil} ∩ {Raynaud’s} = { }
10. Show plausible connection between Raynaud’s and Fish Oil
Note on step 9: show that the literature has not yet made the connection (empty set)
†Adapted from: http://www.dimacs.rutgers.edu/~billp/pubs/JASISTLBD.pdf
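The linking steps above can be sketched with set operations over a toy corpus. This is only an illustration: the document ids, the descriptor tags, and the `search` and `abc_link` helpers are invented here, not taken from Swanson's work or from OpenSherlock.

```python
# Toy corpus: document id -> set of descriptors (all invented for illustration).
CORPUS = {
    "d1": {"raynaud's", "blood viscosity"},
    "d2": {"raynaud's", "red blood cell rigidity"},
    "d3": {"fish oil", "blood viscosity"},
    "d4": {"fish oil", "red blood cell rigidity"},
    "d5": {"raynaud's", "cold exposure"},
}


def search(term):
    """Return the ids of documents tagged with the given descriptor."""
    return {doc for doc, tags in CORPUS.items() if term in tags}


def abc_link(a_term, c_term):
    """Return the B descriptors bridging A and C, or None if A and C already co-occur."""
    lit_a, lit_c = search(a_term), search(c_term)
    if lit_a & lit_c:
        # The literature has already made the connection; nothing to discover.
        return None
    tags_a = set().union(*(CORPUS[d] for d in lit_a)) - {a_term}
    tags_c = set().union(*(CORPUS[d] for d in lit_c)) - {c_term}
    return tags_a & tags_c


bridges = abc_link("raynaud's", "fish oil")
print(sorted(bridges))
```

An empty intersection of the two literatures together with a non-empty set of shared descriptors is what makes the pair a candidate hypothesis; judging plausibility (step 10) remains a human task in this sketch.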
9. On Knowledge Federation
• Knowledge Federation on the Web†
• International Workshop, Dagstuhl Castle, Germany, May 1-6, 2005
• First Semiannual Knowledge Federation Conference‡
• Dubrovnik, Croatia, October 20-22, 2008
† Jantke, K.P., Lunzer, A., Spyratos, N. and Tanaka, Y. (Editors; 2006) Federation over the Web. Springer-Verlag, Berlin Heidelberg.
‡ Karabeg, D. and Park, J. (2008) First International Workshop on Knowledge Federation. CEUR workshop proceedings, http://ceur-ws.org.
11. Research Questions
• Can a topic map learn by reading?
• Can reading by topic maps (the process of topic mapping) perform
literature-based discovery?
• Perhaps a form of automating Swanson’s ABC method
• What software architectures support machine reading in the context
of topic mapping?
15. Topic Maps: Long Term Memory (LTM)
• A Topic Map is like a library without all the books
• A container for a universe of topics
• A Topic Map is indexical
• Like a card catalog
• Each topic has its own representation (proxy)
• Improving on a card catalog, a topic can be identified many different
ways
• Captures metadata and optionally content
• A Topic Map is relational
• Like a good road map
• Topics are connected by associations (relations)
• Topics point to their occurrences in the territory
• A Topic Map is organized
• Multiple records on the same topic are co-located (stored as one
topic) in the map
• One location in the map for each topic
Img: Jack Park
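A minimal sketch of the three properties above (indexical, relational, organized). The class and method names are assumptions made for illustration; they are not the OpenSherlock API.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    locator: str                                   # the one location in the map
    identifiers: set = field(default_factory=set)  # names, codes, URIs, ...
    occurrences: set = field(default_factory=set)  # pointers into the territory


class TopicMap:
    def __init__(self):
        self.topics = {}           # locator -> Topic (organized)
        self.index = {}            # identifier -> locator (indexical, the card catalog)
        self.associations = set()  # (locator, relation, locator) (relational)

    def add(self, locator, identifiers, occurrences=()):
        """Add or extend a topic; repeated records land on the same Topic object."""
        topic = self.topics.setdefault(locator, Topic(locator))
        topic.identifiers |= set(identifiers)
        topic.occurrences |= set(occurrences)
        for ident in identifiers:
            self.index[ident] = locator
        return topic

    def find(self, identifier):
        """Look a topic up by any of its identifiers; None if unknown."""
        return self.topics.get(self.index.get(identifier))

    def associate(self, a, relation, b):
        """Connect two known topics, each named by any of its identifiers."""
        self.associations.add((self.index[a], relation, self.index[b]))


tm = TopicMap()
tm.add("t:co2", ["CO2", "carbon dioxide"], ["doc-17"])
tm.add("t:climate", ["climate change"])
tm.associate("carbon dioxide", "cause", "climate change")
```

Both "CO2" and "carbon dioxide" resolve to the same topic, which is the improvement over a card catalog that the slide describes.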
16. Observations 1
• A Topic Map is central to the key
research question
• It serves as a kind of memory for
social processes
• It provides a robust platform for
subject identity
• It also serves as a repository for
domain-specific vocabularies
(ontologies, taxonomies, naming
conventions,…)
• It federates those vocabularies
Figure: Cultural Algorithm Framework†
† After Figure 1: Reynolds, R.G. and Peng, B. (2005) Cultural Algorithms: Computational
Modeling of How Cultures Learn to Solve Problems: An Engineering Example.
Cybernetics and Systems: An International Journal, Volume 36, Issue 8.
17. Observations 2
• A Topic Map is necessary but not sufficient to support discovery, learning,
or problem solving
• It really only provides a powerful indexical structure related to the key artifacts in any
universe of discourse:
• Actors
• Their relations
• Their states
• Rules, laws, theories,…
• To model those key artifacts, other representation strategies are required
• Conceptual Graphs
• Qualitative Process Theory
• Belief Networks
• Deep Learning
• …
19. asA?
• For taxonomy, we have isA
• And its variants
• subclassOf
• instanceOf
• For roles, we do not have asA
• Instead, we model roles
• Ulam quote
• Towards an ideal structure
• { subject symbol, subject role symbol, predicate
symbol, object symbol, object role symbol }
• Stanislaw Ulam to Gian-Carlo Rota † :
• Convinced that perception is the key to
intelligence, Ulam was trying to explain the
subtlety of human perception by showing
how subjective it is, how influenced by
context. He said to Rota, “When you perceive
intelligently, you always perceive a function,
never an object in the physical sense.
Cameras always register objects, but human
perception is always the perception of
functional roles. The two processes could not
be more different.... Your friends in AI are
now beginning to trumpet the role of
contexts, but they are not practicing their
lesson. They still want to build machines that
see by imitating cameras, perhaps with some
feedback thrown in. Such an approach is
bound to fail...”
†Hofstadter, DR (1995). On Seeing A's and Seeing As.
https://web.stanford.edu/group/SHR/4-2/text/hofstadter.html
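The five-part structure proposed above could be held in a record like the following. This is a sketch only: the `RoleTuple` name, the field names, and the role labels in the examples are hypothetical.

```python
from typing import NamedTuple


class RoleTuple(NamedTuple):
    """{ subject, subject role, predicate, object, object role } as symbols."""
    subject: str
    subject_role: str
    predicate: str
    obj: str
    obj_role: str


# The same subject symbol perceived "as" two different things (illustrative roles).
a = RoleTuple("CO2", "greenhouse gas", "cause", "climate change", "effect")
b = RoleTuple("CO2", "carbon source", "feed", "photosynthesis", "process")
```

Keeping the role next to the symbol lets one subject take part in different statements under different functional roles, which is the point of the Ulam quote.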
23. Machine Reading: Topic Mapping
• TopicMap Process
• Rule:
• One Location in the Map for each
Subject
• Federates (merges topics about
the same subject) collected from
different resources
• Federation does a Union on
topics from disparate sources
• Federation Intersects silos
around common topics
• Topic Map
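A small sketch of the rule above: records from different silos that concern the same subject are merged into one topic. It assumes the records already share a subject identifier, which is the hard part in practice; the `federate` function and the statement strings (paraphrased from the Alzheimer's example earlier in the talk) are illustrative.

```python
def federate(silos):
    """Merge records so that each subject ends up in exactly one topic.

    silos: mapping of silo name -> list of (subject_id, set of statements).
    """
    merged = {}
    for silo, records in silos.items():
        for subject_id, statements in records:
            topic = merged.setdefault(subject_id, {"statements": set(), "silos": set()})
            topic["statements"] |= statements  # union of what is said about the subject
            topic["silos"].add(silo)           # remember where it was said
    return merged


silos = {
    "evolutionary medicine": [
        ("ApoE4", {"higher elderly cognitive performance (Tsimane)"}),
    ],
    "neuroscience": [
        ("ApoE4", {"may help astrocytes go on the attack"}),
        ("astrocyte", {"part of the immune response in brains"}),
    ],
}

merged = federate(silos)
# Topics where silos intersect: the same subject was seen in more than one source.
shared = {subject for subject, topic in merged.items() if len(topic["silos"]) > 1}
```

Here `shared` holds the one topic on which the two silos meet; its merged statements are the union of what each silo said.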
24. Towards a “Cognitive” Platform: OpenSherlock*
• OpenSherlock is:
• A Topic Map for information resource identity and organization
• A HyperMembrane information fabric structure
• A society of agents system which can
• Read documents
• Process information resources
• Maintain the topic map
• Maintain the HyperMembrane
• Build and maintain models
• Perform discovery tasks
• Physical process discovery
• Literature-based discovery
• Theory formation
• Answer questions
• Agents are coordinated by:
• A blackboard system
• A dynamic task-based agenda
• Event propagation and handling
• Stream processing
*OpenSherlock is an open source, mostly Java-based research platform.
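The coordination scheme listed above (blackboard, task agenda, event propagation) can be sketched in a few lines. This is a generic toy, written in Python for brevity, and not OpenSherlock's actual design: the `Blackboard` class, the event names, and the two agents are all assumptions.

```python
import heapq
from collections import defaultdict


class Blackboard:
    def __init__(self):
        self.facts = []                    # shared workspace visible to all agents
        self.handlers = defaultdict(list)  # event type -> [(priority, agent)]
        self.agenda = []                   # heap of (priority, seq, agent, payload)
        self._seq = 0                      # tie-breaker keeps FIFO order per priority

    def subscribe(self, event_type, agent, priority=5):
        self.handlers[event_type].append((priority, agent))

    def post(self, event_type, payload):
        """Propagate an event: schedule one task per subscribed agent."""
        for priority, agent in self.handlers[event_type]:
            heapq.heappush(self.agenda, (priority, self._seq, agent, payload))
            self._seq += 1

    def run(self):
        """Work through the agenda; agents may post new events while running."""
        while self.agenda:
            _, _, agent, payload = heapq.heappop(self.agenda)
            agent(self, payload)


def reader(board, document):
    # Reading agent: splits a document into sentences and posts each one.
    for sentence in document.split(". "):
        board.post("sentence", sentence.strip(". "))


def mapper(board, sentence):
    # Stand-in for topic map maintenance: just records the sentence.
    board.facts.append(sentence)


board = Blackboard()
board.subscribe("document", reader, priority=1)
board.subscribe("sentence", mapper, priority=2)
board.post("document", "CO2 is a cause of climate change. Climate change is caused by carbon dioxide.")
board.run()
```

The agenda is dynamic in the sense that running one task can add further tasks, as the reader does for each sentence it finds.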
27. Sentence Reading
• First Step:
• Process sentence into WordGrams†
• WordGram objects are shared across sentences
• Count of sentence identifiers associated with each object serves as basis for probabilistic models
• WordGrams can represent topic labels (names)
• Second Step:
• Where possible
• Transform WordGrams into N-Tuples‡
• By way of Triple Structures: { Subject WordGram, Predicate WordGram, Object WordGram }
• N-Tuples form the HyperMembrane
• N-Tuples reflect structures in the TopicMap
• Where not possible
• Try to:
• Locate noun phrases
• Locate verb phrases
† A container of words, from 1 to 8 words per container
‡ A container of symbols based on words in WordGrams and
topics from the TopicMap. A collection of N-Tuples forms
what we call a HyperMembrane. It is the final step between
reading and adding resources to the topic map.
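The first step can be sketched as follows, reading a WordGram as a run of 1 to 8 consecutive words, per the footnote. Lowercasing, whitespace tokenization, and the `wordgrams`, `read`, and `index` names are simplifying assumptions, not a description of the OpenSherlock code.

```python
from collections import defaultdict

MAX_WORDS = 8  # per the footnote: from 1 to 8 words per container


def wordgrams(sentence, max_words=MAX_WORDS):
    """Yield every run of 1..max_words consecutive words in the sentence."""
    words = sentence.lower().rstrip(".").split()
    for n in range(1, min(max_words, len(words)) + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


# One shared entry per WordGram: text -> ids of the sentences that contain it.
index = defaultdict(set)


def read(sentence_id, sentence):
    for gram in wordgrams(sentence):
        index[gram].add(sentence_id)


read("s1", "CO2 is a cause of climate change.")
read("s2", "Climate change is caused by carbon dioxide.")
```

The size of each set in `index` is the sentence count that the slide names as the basis for probabilistic models; "climate change" is shared by both sentences here.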
31. Transform WordGram Triples to N-Tuples
• Normalized N-Tuple
• A structure where the subject, predicate, and object are normalized
• Nouns and verbs transformed
• CO2, Carbon Dioxide, … → CO2
• causes, is caused by, … → cause
• Two sentence example
• CO2 is a cause of climate change.
• Climate change is caused by carbon dioxide.
• Both result in the same structure:
• { CO2, cause, climate change }
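A minimal sketch of the normalization, assuming the sentence has already been split into a subject phrase, a predicate phrase, and an object phrase. The lookup tables and the `normalize` function are illustrative, with only the entries needed for the two sentences.

```python
# Canonical symbol for each known noun phrase (illustrative entries only).
SYNONYMS = {"co2": "CO2", "carbon dioxide": "CO2", "climate change": "climate change"}

# Canonical predicate and whether the surface form is passive (subject and object swap).
PREDICATES = {
    "causes": ("cause", False),
    "is a cause of": ("cause", False),
    "is caused by": ("cause", True),
}


def normalize(subject, predicate, obj):
    """Map a surface triple onto a normalized { subject, predicate, object }."""
    canonical, passive = PREDICATES[predicate.lower()]
    s = SYNONYMS.get(subject.lower(), subject)
    o = SYNONYMS.get(obj.lower(), obj)
    return (o, canonical, s) if passive else (s, canonical, o)


first = normalize("CO2", "is a cause of", "climate change")
second = normalize("Climate change", "is caused by", "carbon dioxide")
```

Both calls return `("CO2", "cause", "climate change")`, matching the single structure shown on the slide.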
32. About N-Tuples
• An N-Tuple is a structured record of
• Topics in the topic map
• Those topics are harvested from text
• An N-Tuple takes the form:
• { Subject, Predicate, Object }
• Where
• Subject and/or Object can be one of:
• A topic from the topic map
• Another N-Tuple
• An N-Tuple is identified by the identities of the terms it contains
• When thinking in terms of terms (words) read from documents, the identities (numeric representations) of
those terms form the identity of the N-Tuple object.
• N-Tuples are content addressable
• Disambiguation of subjects is a topic mapping process
• Learning means continuous refinement of subject identity
• Ambiguities can also be solved through machine learning and/or human intervention
Note: as a system of symbols, a subject, an object, or another N-Tuple refers to the
symbol which serves as that object’s identifier
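One way to make N-Tuples content addressable is to derive each tuple's identifier from the identifiers of its terms. The sketch below uses a SHA-1 digest for that purpose; the hashing scheme, the `nt:` prefix, and the placeholder symbols `T1`, `P1`, and so on are assumptions, not the numeric representation OpenSherlock uses.

```python
import hashlib

store = {}  # identifier -> (subject, predicate, object), all held as symbols


def ntuple_id(subject, predicate, obj):
    """Derive an identifier from the identifiers of the three terms."""
    key = "|".join((subject, predicate, obj))
    return "nt:" + hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]


def intern(subject, predicate, obj):
    """Store the N-Tuple once and return its identifier."""
    ident = ntuple_id(subject, predicate, obj)
    store.setdefault(ident, (subject, predicate, obj))
    return ident


inner = intern("T1", "P1", "T2")
again = intern("T1", "P1", "T2")   # same terms, so the same identifier
outer = intern(inner, "P2", "T3")  # an N-Tuple whose subject is another N-Tuple
```

Because a nested N-Tuple is referred to by its identifier, the outer tuple stays a flat record of three symbols, in line with the note above.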
33. A Simple Example
• Read this sentence:
• Gene expression is caused by soluble hormones binding to a
plasma membrane hormone receptor
• Topic Map recognizes:
• Gene expression ← GeneExpression
• soluble hormones ← SolubleHormone
• plasma membrane hormone receptor ← PlasmaMembraneReceptor
• Software agents transform:
• is caused by ← Cause
• binding to ← Binds
• Final semantic structure (N-Tuple):
• { {SolubleHormone, Binds, PlasmaMembraneReceptor},
Cause, GeneExpression }
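The example can be traced in code, assuming the phrases have already been located in the sentence. The tables hold only the labels shown on this slide, and the `triple` helper is hypothetical.

```python
# Labels the topic map recognizes, and the predicate transforms from the slide.
TOPICS = {
    "gene expression": "GeneExpression",
    "soluble hormones": "SolubleHormone",
    "plasma membrane hormone receptor": "PlasmaMembraneReceptor",
}
PREDICATES = {"is caused by": ("Cause", True), "binding to": ("Binds", False)}


def triple(subject, predicate, obj):
    """Build one triple; either end may be a phrase or an already built triple."""
    symbol, passive = PREDICATES[predicate]
    s = TOPICS.get(subject, subject) if isinstance(subject, str) else subject
    o = TOPICS.get(obj, obj) if isinstance(obj, str) else obj
    # A passive predicate reverses the direction of the relation.
    return (o, symbol, s) if passive else (s, symbol, o)


inner = triple("soluble hormones", "binding to", "plasma membrane hormone receptor")
outer = triple("gene expression", "is caused by", inner)
```

`outer` reproduces the final structure on the slide, with the binding triple nested in the subject position.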
34. A Complex Example
• The Sentence:
• The pandemic of obesity, type 2 diabetes mellitus (T2DM) and
nonalcoholic fatty liver disease (NAFLD) has frequently been
associated with dietary intake of saturated fats (1) and
specifically with dietary palm oil (PO) (2).
• Expected WordGram Triples:
• Obesity associated with saturated fats
• Obesity associated with palm oil
• T2DM associated with saturated fats
• T2DM associated with palm oil
• NAFLD associated with saturated fats
• NAFLD associated with palm oil
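The six expected triples are the cross product of the three coordinated subjects and the two coordinated objects. The sketch below assumes those two lists and the predicate have already been extracted from the sentence, which is the difficult part and is not shown.

```python
from itertools import product

# Assumed output of an earlier parsing step (not shown).
subjects = ["Obesity", "T2DM", "NAFLD"]
objects = ["saturated fats", "palm oil"]
predicate = "associated with"

# Distribute the predicate over every subject/object pair.
triples = [(s, predicate, o) for s, o in product(subjects, objects)]

for s, p, o in triples:
    print(s, p, o)
```

The printed lines match the expected WordGram triples listed above, in the same order.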
39. WordGrams: Pattern Recognition Demons
• A demon is a sensor
• Individual demons focus on one thing.
• A WordGram represents a word or a phrase
• It serves as a sensor for instances of self in sentences being read
• A demon shouts according to the strength of its response to a situation
• A WordGram shouts according to the degree to which it is known to the
TopicMap
• If a WordGram represents the label for a single topic, it wins
• If a WordGram represents the label for multiple topics, ambiguity exists
• Expectation failure event
• If a WordGram is a known noun/noun phrase, it shouts but does not necessarily win
• If a WordGram is not yet known, it is relatively silent
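The four cases above can be written as a small decision function. The numeric strengths are arbitrary placeholders chosen only to order the cases, and the `shout` function, the label index, and the sample entries are hypothetical.

```python
def shout(wordgram, label_index, known_nouns):
    """Return (strength, outcome) for one WordGram demon.

    label_index: label -> set of topic ids carrying that label.
    known_nouns: noun phrases seen before but not yet topic labels.
    """
    topics = label_index.get(wordgram, set())
    if len(topics) == 1:
        return 1.0, "wins"
    if len(topics) > 1:
        return 0.8, "ambiguous"  # would raise an expectation failure event
    if wordgram in known_nouns:
        return 0.5, "shouts"     # heard, but does not necessarily win
    return 0.1, "silent"


label_index = {"co2": {"t:co2"}, "mercury": {"t:planet", "t:element"}}
known_nouns = {"viscosity"}

results = {w: shout(w, label_index, known_nouns)
           for w in ("co2", "mercury", "viscosity", "zyx")}
```

In a fuller Pandemonium-style design, a decision demon would compare these strengths across all WordGrams in a sentence; here each demon is scored in isolation.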
40. Pandemonium Again
• An Opportunity to Explore Related and Hybrid Architectures
• More and different pattern recognition demons
• Neural Nets
• Analogy
• Anaphoric Reference
• Subject Disambiguation
• …
• Probabilistic Graphs
• Bayesian belief networks
• Neural-symbolic hybrids†
† Tarek R. Besold, Artur d’Avila Garcez, Isaac Noble (eds.) (2017) Pre-Proceedings
of the 12th International Workshop on Neural-Symbolic
Learning and Reasoning NeSy’17
http://daselab.cs.wright.edu/nesy/NeSy17/nesy17_pre-proceedings.pdf
41. Looking Ahead
• Technologies I would like to explore with the OpenSherlock ecosystem:
• Distributed Prolog
• Probabilistic Prolog
• UIMA-based ecosystems to include
• Renaud Richardet’s Sherlok
• BlueBrain’s Bluima
• Petr Baudiš’s YodaQA
• CMU’s OAQA
• WordGrams
• TensorFlow
• …
• A grand, unifying wishlist:
• A global ecosystem of open source “cognitive” systems, all working with humans on
quests to solve and/or prevent complex, urgent problems.
• Collaboration with commercial (closed source) platforms is included in that wish.