3. Context: Two Kinds of Discovery
• Data-based
– Harvesting nuggets from collected data
• Literature-based: Deep Question Answering
– Discovering connections between dots in the
literature
4. Target: Deep Question Answering
[Diagram: Breadth vs. Depth, locating Information Retrieval, Semantic Representation, and the Goal]
Diagram adapted from a talk by Percy Liang at Stanford, 20140407
5. Our Goals
• Improve Human-Tool Capabilities
• Augment existing analytic methods
– Increase opportunities for discovery
– Improve already sophisticated methods
“Discovery consists of seeing what everybody has seen and thinking what nobody has thought.”
–Albert Szent-Györgyi
6. Our Approach
• Explore and develop the technologies of so-called Cognitive Agents
– Current examples
• IBM’s Watson
• SIRI
• An opportunity
– Couple two platforms
• Berkeley Data Analytics Stack (BDAS)
• SolrSherlock
7. Berkeley Data Analytics Stack
Deep QA Issues*
• Low latency queries
– Perform faster inferences
– Explore larger spaces
– Better decisions
• Sophisticated analysis
– Better forecasts
– Better decisions
• Unification of existing data computation models
– Integrate interactive queries, batch and streaming
processing
*http://strata.oreilly.com/2013/02/the-future-of-big-data-with-bdas-the-berkeley-data-analytics-stack.html
9. Literature-based Discovery
• Forming bisociative links* between information in
different literature sources which are not known
to be related
• Swanson example (simplified)**:
– Literature associated with Raynaud’s
• Raynaud’s therapy linked to blood thinners
– Literature associated with fish oils
• Fish oil linked to blood thinners
– “Blood thinners” as an implicit link between fish oil
and Raynaud’s Syndrome
• Akin to the wormholes formed by tags on web pages or
hashtags
*Arthur Koestler (1964). The Act of Creation
** Swanson, Don (1986) "Fish oil, Raynaud's syndrome, and undiscovered
public knowledge." Perspectives in Biology and Medicine 30(1): 7-18.
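The simplified Swanson example can be sketched as a search for intermediate terms shared by two literatures that are not known to be related. This is a minimal illustration, not Swanson's actual method: the `literatures` mapping, the `implicit_links` function, and the "unrelated topic" entry are hypothetical names and data introduced here.

```python
# Each topic maps to terms found in its literature. The first two entries
# follow the simplified slide example; the third is a made-up control.
literatures = {
    "Raynaud's syndrome": {"blood thinners"},
    "fish oil": {"blood thinners"},
    "unrelated topic": {"some other term"},
}

def implicit_links(literatures, known_pairs=frozenset()):
    """Return {(a, c): shared terms} for topic pairs not already known to be related."""
    links = {}
    topics = sorted(literatures)
    for i, a in enumerate(topics):
        for c in topics[i + 1:]:
            if frozenset((a, c)) in known_pairs:
                continue  # already linked in the literature, nothing to discover
            shared = literatures[a] & literatures[c]
            if shared:
                links[(a, c)] = shared
    return links

links = implicit_links(literatures)
for (a, c), terms in links.items():
    print(f"{a} <-> {c} via {sorted(terms)}")
```

Passing a pair in `known_pairs` suppresses it, which mirrors the requirement that the two literatures are "not known to be related".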
10. Cognitive Agents
• Examples
– Proprietary
• IBM’s Watson
• SIRI
• SRI’s CALO
– Part of which, IRIS, was made open source as OpenIRIS
• Others…
– Open Source
• Cougaar
– http://www.cougaar.org/
• OpenCog
– http://opencog.org/
• Open Advancement of Question Answering Systems
– Closely related to IBM’s Watson
– http://oaqa.github.io/
• SolrSherlock
– http://debategraph.org/SolrSherlock
• Many others…
11. Use Cases for Big Data Harvesting
• Resource Collection
– Federation
• bring together and organize without filters
• Resource Augmentation
– Tagging
– Annotating
– Debate
• Knowledge Cartography
– Connecting resources
– Map maintenance
– More Debate
• Research Augmentation
– Crowd-sourced discovery
– Harvesting
– Automated inference/reasoning
– Knowledge sharing
[Diagram: Federated Information Resources; Harvesting Activities]
Adapted from http://www.slideshare.net/jackpark/big-datasciencemeetup-final Slide 29
12. A Strong Conjecture
• A Knowledge Federation’s topic map provides
a Rosetta Stone-like substrate
– Reasoning by analogy
– Big Data mined for clues
– Map:
• Where we have been
• Where we haven’t (Dragons be here)
Adapted from http://www.slideshare.net/jackpark/big-datasciencemeetup-final Slide 33
13. Topic Maps for Knowledge Federation
• Maintain a well-organized, topic-based structure
• Key issue:
– For any given information resource added to a
map:
• Agents must answer this question:
– Have I seen this before by any other name or description?
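One way to picture the "have I seen this before by any other name?" check is a name index consulted before a new topic is created. The sketch below is an assumption-laden toy: `TopicMap`, `normalize`, `find`, and `add` are hypothetical names, and real topic map merging also compares descriptions and subject identifiers, not just names.

```python
import re

def normalize(name):
    """Lowercase and collapse punctuation/whitespace so trivial variants compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()

class TopicMap:
    def __init__(self):
        self.topics = {}      # topic id -> set of normalized names
        self.name_index = {}  # normalized name -> topic id

    def find(self, names):
        """Return the id of an existing topic known by any of these names, else None."""
        for n in names:
            tid = self.name_index.get(normalize(n))
            if tid is not None:
                return tid
        return None

    def add(self, names):
        """Attach the names to a matching topic, or create a new topic if none matches."""
        tid = self.find(names)
        if tid is None:
            tid = len(self.topics) + 1
            self.topics[tid] = set()
        for n in names:
            key = normalize(n)
            self.topics[tid].add(key)
            # Keep the first owner of a name; a fuller version would merge
            # topics when the names match more than one existing topic.
            self.name_index.setdefault(key, tid)
        return tid

tm = TopicMap()
first = tm.add(["Raynaud's Syndrome", "Raynaud's phenomenon"])
again = tm.add(["raynaud's  syndrome"])   # same subject, different spelling
other = tm.add(["Fish oil"])
print(first == again, first == other)
```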
14. Are We There Yet?
• We are now at the edges of discovery:
– Deeper ways of representing
– Deeper ways of knowing
• Relational Biology
15. Relational Biology
• Paraphrasing Nicholas Rashevsky*:
– We can tease open a living cell and count all its
components, but we cannot put it back together
and we have no clue why
• Interpreting Robert Rosen**:
– Rashevsky’s quest for a relational mathematics for
biology (complex systems) entails topological
algebras (Category Theory)
• Category theory is said to facilitate modeling
the social lives of members of the categories
*http://en.wikipedia.org/wiki/Nicholas_Rashevsky
**http://en.wikipedia.org/wiki/Robert_Rosen_(theoretical_biologist)
16. Relational Modeling 1
• Starts with Ontologies
– Ontologies grant uniform vocabularies to
universes of discourse
• Including describing data
– Ontology-based frameworks provide ways to
model social and other relational structures
• SIOC: Semantically Interlinked Online Communities*
• SWAN: Semantic Web Applications in Neuromedicine**
*http://www.sioc-project.org/
**http://www.w3.org/TR/hcls-swan/
17. SIOC Closer Look
• A way to model
components entailed by a
situation (blog post in this
case)
– Uniform vocabulary
– Structural relations
• Creates a foundation for
much deeper modeling
– Including:
• Other ontologies
• Other structures
• Feedback loops
SIOC Blog Post*
*http://rdfs.org/sioc/spec/
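To make the "uniform vocabulary plus structural relations" idea concrete, a blog post can be written down as subject-predicate-object triples using SIOC terms. This is only a sketch: the `example.org` resources and the `objects` helper are hypothetical, and plain Python tuples stand in for a real RDF library.

```python
SIOC = "http://rdfs.org/sioc/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical resources, for illustration only.
post = "http://example.org/blog/post/1"
reply = "http://example.org/blog/post/1#comment-1"
author = "http://example.org/user/alice"
blog = "http://example.org/blog"

triples = {
    (post, RDF_TYPE, SIOC + "Post"),
    (post, SIOC + "has_creator", author),
    (post, SIOC + "has_container", blog),
    (post, SIOC + "has_reply", reply),
    (reply, RDF_TYPE, SIOC + "Post"),
    (author, RDF_TYPE, SIOC + "UserAccount"),
    (blog, RDF_TYPE, SIOC + "Forum"),
}

def objects(triples, subject, predicate):
    """Return every object linked to the subject by the given predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}

print(objects(triples, post, SIOC + "has_creator"))
print(objects(triples, post, SIOC + "has_reply"))
```

Because every relation is just another triple, terms from other ontologies can be added to the same set without changing the structure.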
18. Massive Connectivity and Feedback
http://geography.oii.ox.ac.uk/?page=home
Complex Communication Processes
19. Feedback Loops: Crucial to Learning
Image: FEDERAL HEALTH FUTURES SUMMIT LEADERSHIP LEARNING for TRANSFORMATIONAL
CHANGE. September 10-11, 2012 Washington DC Metro Region Page 23
20. Relational Biology: Context
• Context is about Relations among the
components themselves
• Context is about Relations among the
components and their environment
• Context is about Feedback
21. Example from Breast Cancer 1
Extracellular Matrix (EM) as Context
Complex Communication Processes
Milk producing tissue
http://www.ted.com/talks/mina_bissell_experiments_that_point_to_a_new_understanding_of_cancer
22. Example from Breast Cancer 2
[Images: Cells missing their EM | Cells with restored EM]
http://www.ted.com/talks/mina_bissell_experiments_that_point_to_a_new_understanding_of_cancer
23. Towards Cognitive Agents
• Harvest and represent
– Patterns
• Actors
• Relations
• States
– Context in which patterns exist
• Discover
– Processes
– Unrecognized connections
– …
24. Watson’s Architecture* (Simplified)
• Analysis determines
answer type and topics in
play
• Hypothesis formation
seeks candidate answers
from sources
– Pattern matching
• Hypothesis scoring
weighs evidence for each
hypothesis
• Answer ranking uses
models to select answer
[Diagram: Question → Analysis → Hypothesis Formation (Answer Sources) → Hypothesis Scoring (Evidence Sources) → Answer Ranking → Answer]
*http://www.aaai.org/Magazine/Watson/watson.php
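The four stages above can be illustrated with a toy pipeline. This is not Watson's implementation: the keyword matching, the scoring rule, and all function names and sample data here are simplifying assumptions chosen only to show how the stages hand off to one another.

```python
def tokens(text):
    """Lowercased word set with simple punctuation stripped."""
    return {w.strip("?.,").lower() for w in text.split()} - {""}

def analyze(question):
    """Analysis: guess an answer type and the topics in play."""
    answer_type = "person" if question.lower().startswith("who") else "thing"
    stop = {"who", "what", "the", "a", "of", "is", "was"}
    return answer_type, tokens(question) - stop

def form_hypotheses(topics, answer_sources):
    """Hypothesis formation: candidates whose source text matches any topic."""
    return [c for c, text in answer_sources.items() if topics & tokens(text)]

def score(candidate, topics, evidence_sources):
    """Hypothesis scoring: topic overlap in passages that mention the candidate."""
    return sum(len(topics & tokens(p))
               for p in evidence_sources if candidate.lower() in p.lower())

def answer(question, answer_sources, evidence_sources):
    """Answer ranking: return the best-scoring candidate, or None if there are none."""
    _, topics = analyze(question)
    candidates = form_hypotheses(topics, answer_sources)
    if not candidates:
        return None
    return max(candidates, key=lambda c: score(c, topics, evidence_sources))

# Illustrative sources built from the two works cited earlier in this deck.
answer_sources = {
    "Arthur Koestler": "wrote The Act of Creation",
    "Don Swanson": "wrote on fish oil and undiscovered public knowledge",
}
evidence_sources = [
    "Arthur Koestler wrote The Act of Creation in 1964.",
    "Don Swanson wrote about fish oil and Raynaud's syndrome in 1986.",
]
best = answer("Who wrote The Act of Creation?", answer_sources, evidence_sources)
print(best)
```

Both candidates survive hypothesis formation because both sources contain "wrote"; the evidence passages are what separate them.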
25. SolrSherlock Architecture (Simplified)
[Diagram components: Topic Map; Conceptual Graphs; Harvested Documents; Harvester: HyperMembrane, Information Fabrics, Agents]
• Literature-based Discovery
– Process documents into structures (information fabrics) from which patterns are harvested
• Federate Data Analysis with Literature
– Federate data observations and predictions with concepts and relations harvested from the literature
• Model Processes, Structures, and Analogies
27. Looking Forward
• Coupling Literature-based research with Big Data analysis
– Common ontologies
– Hypothesis formation
– Evidence gathering
– Relation discovery
28. Completed Representation
[Diagram: conceptual graph. Nodes: Antioxidants; Bacterial Infection; Compromised Host; "antioxidants kill free radicals"; "macrophages use free radicals to kill bacteria". Link labels: Contraindicates; Because; Appropriate For]
Let us co-create Cognitive Agents for Discovery
jackpark@topicquests.org
Thanks to Martin Radley, Patrick Durusau, Sherry Jones, and Mark Szpakowski for valuable comments
SolrSherlock at:
http://debategraph.org/SolrSherlock and https://github.com/SolrSherlock