Chem4Word Wade

Director, Scholarly Communications, Microsoft Research Connections
Corporate VP, Microsoft Research Connections

http://research.microsoft.com/connections/
GEPS2011

Envisioning a New Era of Research Reporting
Imagine…
• Live research reports that had multiple end-user ‘views’ and which could
dynamically tailor their presentation to each user
• An authoring environment that absorbs and encapsulates research workflows
and outputs from the lab experiments
• A report that can be dropped into an electronic lab workbench in order to
reconstitute an entire experiment
• A researcher working with multiple reports on a Surface and having the
ability to mash up data and workflows across experiments
• The ability to apply new analyses and visualizations and to perform new
in silico experiments

[Diagram labels: Reproducible Research; Interactive Data; Collaboration; Dynamic Documents; Reputation & Influence]

Words & Pictures
• Papers/reports today describe chemical reactions/entities in a variety
of ways:
– common (or brand-name) labels
– identifiers and shorthand notations
– chemical formulae
– two- (and three-) dimensional graphical images of molecular structure.
• Describing chemical data becomes an exercise in typesetting and/or
graphics, and cross- and re-referencing existing chemical entities is
labor intensive.
– The resulting text is usually interpretable by humans but chemical data are
lost in the process, making it difficult to programmatically extract
meaningful information from such reports.
• The goals of Chem4Word are to:
– simplify the task of authoring a chemical document, and
– do so in a way that produces a semantically meaningful document, facilitating
downstream tasks such as publishers' workflows, entity extraction, and semantic
applications.

Chemistry Add-in for Word
aka Chem4Word
• Chem4Word allows chemists to create, edit and manipulate
chemistry in the Word environment, by
– Providing a built-in dictionary of chemical structures
– Enabling online lookup of further structures via web services (e.g. PubChem)
– Facilitating linking/embedding of chemical structures inside a Word document
– Supporting modification of chemical structures and of their representations
• Authoring is backed by semantic data in
Chemical Markup Language (CML), enabling:
– novel functionality in data checking during the authoring process
– chemistry-centric article reading support
– data-mining applications.
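
As a rough illustration of what "backed by semantic data" makes possible, the sketch below parses a small hand-written CML fragment and runs two simple checks on it. The water molecule, the function names `element_counts` and `dangling_bond_refs`, and the checks themselves are assumptions made up for this example; they are not taken from Chem4Word.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Namespace used by the CML schema, in ElementTree's {uri}tag form.
CML = "{http://www.xml-cml.org/schema}"

# Hypothetical example data: water with explicit hydrogens.
WATER = """<cml xmlns="http://www.xml-cml.org/schema">
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="O"/>
      <atom id="a2" elementType="H"/>
      <atom id="a3" elementType="H"/>
    </atomArray>
    <bondArray>
      <bond atomRefs2="a1 a2" order="1"/>
      <bond atomRefs2="a1 a3" order="1"/>
    </bondArray>
  </molecule>
</cml>"""


def element_counts(cml_text):
    """Count atoms per element, a minimal data-mining step."""
    root = ET.fromstring(cml_text)
    return Counter(atom.get("elementType") for atom in root.iter(CML + "atom"))


def dangling_bond_refs(cml_text):
    """Return bond atom references that match no atom id, a minimal data check."""
    root = ET.fromstring(cml_text)
    atom_ids = {atom.get("id") for atom in root.iter(CML + "atom")}
    return [
        ref
        for bond in root.iter(CML + "bond")
        for ref in bond.get("atomRefs2", "").split()
        if ref not in atom_ids
    ]


counts = element_counts(WATER)
problems = dangling_bond_refs(WATER)
print(dict(counts), problems)
```

Neither check is possible when the structure exists only as a picture or as typeset text, which is the point of keeping CML behind the document.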

• Open source project (Outercurve Foundation); Apache 2.0 license
• ~500K downloads to date

Word UI Extensibility

• Ribbon
• Task Pane
• Gallery
• Templates
• Recognizers
• Applications

FILE FORMATS:
OFFICE OPEN XML DOCUMENTS

Thanks to: http://www.slideshare.net/HollowKnight/a-quick-tour-of-open-xml-format

Binary format vs. Office Open XML format

Office Open XML
is a ZIP file …

Images stored in
native format
(JPEG, PNG, GIF, …)

Programmer View of Open XML Files

• ZIP Archive
• Document Parts
– XML Parts
– Binary Parts
– Typed by content type (media types per RFC 2616)
• Relationships
– Connections between parts
• Content Type Stream
– A specially-named stream
– Defines mappings from part names to content types
– Not itself a part, not URI addressable

• Folder structure for convenience only
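
The programmer view above can be sketched with Python's standard library alone. The example builds a tiny stand-in package in memory (not a complete Word document) and then lists its parts, reads the content type stream, and follows the package-level relationships. The helper names `describe_package` and `build_demo_package` are hypothetical; the namespace URIs are the ones used by the Open Packaging Conventions.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
CONTENT_TYPES = "[Content_Types].xml"


def describe_package(source):
    """Return (parts, default_types, override_types, root_targets)."""
    with zipfile.ZipFile(source) as pkg:
        # The content type stream is a specially named entry, not a part.
        parts = [n for n in pkg.namelist() if n != CONTENT_TYPES]
        types = ET.fromstring(pkg.read(CONTENT_TYPES))
        rels = ET.fromstring(pkg.read("_rels/.rels"))
    defaults = {d.get("Extension"): d.get("ContentType")
                for d in types.findall(CT_NS + "Default")}
    overrides = {o.get("PartName"): o.get("ContentType")
                 for o in types.findall(CT_NS + "Override")}
    targets = [r.get("Target") for r in rels.findall(REL_NS + "Relationship")]
    return parts, defaults, overrides, targets


def build_demo_package():
    """Build a minimal illustrative package in memory (hypothetical content)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr(CONTENT_TYPES, """<Types
  xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels"
    ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="png" ContentType="image/png"/>
  <Override PartName="/word/document.xml"
    ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>""")
        pkg.writestr("_rels/.rels", """<Relationships
  xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
    Target="word/document.xml"/>
</Relationships>""")
        pkg.writestr(
            "word/document.xml",
            '<w:document xmlns:w="http://schemas.openxmlformats.org/'
            'wordprocessingml/2006/main"/>')
        # Binary part stored in its native format (placeholder bytes only).
        pkg.writestr("word/media/image1.png", b"\x89PNG")
    buf.seek(0)
    return buf


parts, defaults, overrides, targets = describe_package(build_demo_package())
for name in parts:
    print(name)
```

Passing the path of a real .docx file to `describe_package` should work the same way, since `zipfile.ZipFile` accepts either a path or a file-like object.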

Multiple ‘views’ backed by
a single CML data file

EXAMPLE OF GETTING CML DATA
BACK OUT OF A DOCUMENT
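
One way this might look in code is sketched below. It assumes the CML is stored as custom XML parts under customXml/ inside the .docx package; that storage location is an assumption of this sketch and is not stated on the slide. The helper names `extract_cml` and `build_demo_docx` and the demo content are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"


def extract_cml(source):
    """Return {part_name: root_element} for custom XML parts rooted in CML."""
    found = {}
    with zipfile.ZipFile(source) as pkg:
        for name in pkg.namelist():
            # Assumption: CML lives in XML parts under customXml/.
            if not (name.startswith("customXml/") and name.endswith(".xml")):
                continue
            try:
                root = ET.fromstring(pkg.read(name))
            except ET.ParseError:
                continue  # skip parts that are not well-formed XML
            if root.tag.startswith("{" + CML_NS + "}"):
                found[name] = root
    return found


def build_demo_docx():
    """Build an in-memory stand-in for a document carrying one CML part."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr("customXml/item1.xml", """<cml
  xmlns="http://www.xml-cml.org/schema">
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="O"/>
      <atom id="a2" elementType="H"/>
      <atom id="a3" elementType="H"/>
    </atomArray>
  </molecule>
</cml>""")
        # A non-CML custom XML part, which the extractor should ignore.
        pkg.writestr("customXml/itemProps1.xml", """<ds:datastoreItem
  xmlns:ds="http://schemas.openxmlformats.org/officeDocument/2006/customXml"
  ds:itemID="{00000000-0000-0000-0000-000000000000}"/>""")
    buf.seek(0)
    return buf


found = extract_cml(build_demo_docx())
for part_name, root in found.items():
    atoms = list(root.iter("{" + CML_NS + "}atom"))
    print(part_name, len(atoms), "atoms")
```

Because the chemistry travels as data rather than as a rendered image, a downstream tool needs nothing more than a ZIP reader and an XML parser to recover it.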

To conclude…

Current publishing … is broken for data-rich science
• Data publication difficult and unsupported
• Insufficient data to fully support research

With Chem4Word … the cycle is closed
• Data preparation integrated into user workflow
• Open Standards promote Open Semantic Science

Important Details
• Project Site
– http://research.microsoft.com/chem4word

• Binaries and source code
– http://chem4word.codeplex.com
• Facebook Page
– http://www.facebook.com/groups/186300551397797/
• Outercurve Foundation
– http://www.outercurve.org

Contributors
University of Cambridge
• Peter Murray-Rust
• Jim Downing
• Joe Townsend

Microsoft Research
• Alex D. Wade
• Savas Parastatidis
• Oscar Naim
• Pablo Fernicola
• Murray Sargent
• Geraldine Wade
• Tola Chhoeun
• Anthony Hanses
• Jim McGill
