1. dans.knaw.nl
DANS is an institute of KNAW and NWO
API economy
Transformation from closed to open innovation
Data Archiving and Network Services (DANS)
Vyacheslav Tykhonov
Senior Data Scientist
vyacheslav.tykhonov@dans.knaw.nl
The Hague, 20.09.2016
2. What is API?
Concept
• An application programming interface (API) is an interface between different
software systems that facilitates their interaction, much as a user interface
does between a human and a computer
• APIs are the only way to overcome technological disruption
Benefits
• APIs can provide direct programmable access to systems and processes inside
any organisation or company
• In the modern world, API-driven usually means data-driven
• Building APIs is the way to improve any digital ecosystem, continue
development in a new agile way, and speed up rapid prototyping
• An organisation’s core assets can be reused, shared, and monetised through
data services offering APIs
3. Problems
• No integrated approach
• Systems should talk to each other in order to share
information across physical and logical boundaries
• New development should be driven by increasingly
sophisticated ecosystems and business processes that are
supported by complex interactions across multiple
endpoints
• It is very difficult to build efficient communication channels
between communities in different digital ecosystems
4. Vision and concept
• APIs provide a better way to encapsulate and share
information and to enable transaction processing between
elements in the solution stack
• APIs should be managed like a product, one built on top of
a potentially complex technical footprint that includes
legacy and third-party systems and data
• Agile (lean) delivery models have put an emphasis on rapid
experimentation and development
• Data and services are the currency that will fuel the new
API economy
• All connected devices will have inherent internal and
external dependencies in underlying data and services
7. Open vs closed innovation
Henry Chesbrough’s Open Innovation: The New Imperative for Creating and Profiting from Technology
8. Open innovation in the digital ecosystem
• Open innovation is the process by which organizations use
both internal and external knowledge to drive and accelerate
their internal innovation strategy
• APIs can help companies innovate faster and lead to new
products and new customers
• APIs make data available in useful and reusable ways
• Staying competitive means trading on the insights and services
across your balance sheet
• It also means promoting things you already do well and bringing
them to the broadest possible audience, both inside and outside
the organisation
• APIs are the key to unlocking the value of data and driving internal
innovation to make the organisation more agile
9. Open innovation and community engagement
• Social Engagement is the ability to work constructively within
and between social groups or individuals to create more
resilient and sustainable communities
• Community engagement is the set of communications,
interactions and participation among different groups of people
excited by the same idea or vision
• The API economy will drive open innovation to create
opportunities that can be seeded in communities in order to
generate even more ideas and value
• New insights can help to build bridges to other communities and
combine knowledge from different ecosystems
10. API Recycling as open innovation
• Communities can benefit from a digital ecosystem that holds all previously
created data and services
• There is no need to break everything and build it again from scratch
• No knowledge needs to be lost, so future generations can benefit from it
• Costs, time and human resources are saved by reusing rather than redeveloping
the same product every time
• Netflix and Uber are great examples of open innovation with Open Data
• Analog vs digital content, individual vs shared information
• New ideas and knowledge from different fields will drive singularity and
community engagement
• The added value is the transformation from closed, commercially restricted
values to publicly open community values
• Combining knowledge from different disciplines will drive the invention of
something fundamentally new
11. Data and tools mean RESTful API
• REST stands for REpresentational State Transfer, an architectural
style for distributed hypermedia systems (first presented by
Roy Fielding in 2000)
• The key abstraction of information in REST is a resource (any
information that can be named can be a resource)
• REST uses a resource identifier to identify the particular resource
involved in an interaction between components
• A resource representation consists of data, metadata describing the
data, and hypermedia links that help clients transition to the
next desired state (see Dataverse APIs)
• The resource methods used to perform the desired transition form
a uniform interface, such as the GET/PUT/POST/DELETE methods of
the HTTP protocol
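The bullets above can be illustrated with a minimal in-memory sketch. It is not the Dataverse API or any real service: the `ResourceStore` class, the `/datasets` base URI and the field names are assumptions made up for illustration. It only shows how a small, fixed set of methods acts on identified resources and returns a representation made of data, metadata and links.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceStore:
    """Hypothetical in-memory stand-in for a server exposing named resources."""
    base: str = "/datasets"                     # assumed collection URI
    items: dict = field(default_factory=dict)   # resource id -> stored data
    next_id: int = 1

    def _represent(self, rid):
        # A representation bundles data, metadata and hypermedia links.
        return {
            "data": self.items[rid],
            "metadata": {"id": rid, "type": "dataset"},
            "links": {"self": f"{self.base}/{rid}", "collection": self.base},
        }

    def handle(self, method, rid=None, body=None):
        """Dispatch one request; returns (status_code, representation)."""
        if method not in ("GET", "PUT", "POST", "DELETE"):
            return 405, None                    # outside the uniform interface
        if method == "POST":                    # create a new resource
            rid = self.next_id
            self.next_id += 1
            self.items[rid] = body
            return 201, self._represent(rid)
        if rid not in self.items:
            return 404, None
        if method == "GET":                     # read the current state
            return 200, self._represent(rid)
        if method == "PUT":                     # replace the stored state
            self.items[rid] = body
            return 200, self._represent(rid)
        del self.items[rid]                     # DELETE
        return 204, None
```

A client would call `store.handle("POST", body={...})`, then follow the `self` link from the returned representation instead of constructing the URI itself.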
12. Guiding Principles of REST (white paper)
1 Client–server – By separating the user interface concerns from the data storage concerns, we
improve the portability of the user interface across multiple platforms and improve scalability by
simplifying the server components.
2 Stateless – Each request from client to server must contain all of the information necessary to
understand the request, and cannot take advantage of any stored context on the server.
Session state is therefore kept entirely on the client.
3 Cacheable – Cache constraints require that the data within a response to a request be
implicitly or explicitly labeled as cacheable or non-cacheable. If a response is cacheable, then a
client cache is given the right to reuse that response data for later, equivalent requests.
4 Uniform interface – By applying the software engineering principle of generality to the
component interface, the overall system architecture is simplified and the visibility of
interactions is improved. In order to obtain a uniform interface, multiple architectural
constraints are needed to guide the behavior of components. REST is defined by four interface
constraints: identification of resources; manipulation of resources through representations; self-
descriptive messages; and, hypermedia as the engine of application state.
5 Layered system – The layered system style allows an architecture to be composed of
hierarchical layers by constraining component behavior such that each component cannot “see”
beyond the immediate layer with which they are interacting.
6 Code on demand (optional) – REST allows client functionality to be extended by downloading
and executing code in the form of applets or scripts. This simplifies clients by reducing the
number of features required to be pre-implemented.
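The cacheable constraint (principle 3) can be sketched as follows. This is a simplification under stated assumptions: real HTTP labels cacheability through response headers, whereas here a hypothetical `max_age` field on the response plays that role, and `origin` is any callable standing in for the server. Each call passes the full method and URI, so the origin needs no stored session context, in line with principle 2.

```python
import time


class ClientCache:
    """Hypothetical client-side cache that reuses responses labeled cacheable."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (method, uri) -> (expires_at, response)

    def fetch(self, method, uri, origin):
        """Return (response, served_from_cache)."""
        key = (method, uri)
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1], True               # equivalent request, still fresh
        response = origin(method, uri)          # self-contained request
        max_age = response.get("max_age", 0)    # assumed cacheability label
        if method == "GET" and max_age > 0:
            self._entries[key] = (self._clock() + max_age, response)
        return response, False
```

Responses without a positive `max_age` are treated as non-cacheable and always fetched again.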
13. Customer Development Methodology in API economy
• Developed by serial entrepreneur and business school
professor Steve Blank for startup communities
• The same principles can be applied to different digital
ecosystems to drive innovation and create added value
• It builds high-level communication layers between
participants who contribute different things to the community:
vision, ideas, technical solutions and requirements all have the
same value
• Balanced communities form sustainable ecosystems that
help exchange valuable information with other
communities, so everybody benefits from the
collaboration
14. Use case: CLARIAH’s data and tools
• Goal: automatically create an overview of data and tools and
deliver it to the research community to drive open innovation
• Process of data collection: harvesting public datasets from
all partners to collect metadata and dataset links (data)
• Software methods: aggregating activity from CLARIAH
GitHub accounts to generate an overview of development
across all work packages (tools)
• Generating RESTful APIs to deliver information on data
and tools in a standardised form (JSON), ready for export
to other libraries or external discovery services
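The last step of the use case, delivering a combined overview as JSON, might look roughly like the sketch below. It is not the CLARIAH implementation: the function name `build_overview` and every field name (`partner`, `title`, `link`, `name`, `commits`) are assumptions, and the harvesting and GitHub aggregation steps are taken as already done, so the inputs are plain lists of records.

```python
import json
from collections import defaultdict


def build_overview(datasets, repos):
    """Merge harvested dataset records and repository activity into one JSON document.

    Both arguments are lists of dicts with hypothetical keys; no network
    access happens here.
    """
    by_partner = defaultdict(lambda: {"data": [], "tools": []})
    for record in datasets:
        by_partner[record["partner"]]["data"].append(
            {"title": record["title"], "link": record["link"]}
        )
    for repo in repos:
        by_partner[repo["partner"]]["tools"].append(
            {"name": repo["name"], "commits": repo["commits"]}
        )
    # Sorted keys keep the export stable between runs.
    return json.dumps({"partners": dict(by_partner)}, indent=2, sort_keys=True)
```

The returned string could then be served by a GET endpoint or handed to an external discovery service.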
http://www.clariah.nl/over/geschiedenis/voorstel pdf p. 3
In order to profit from the potential offered by the Digital Turn, the following problems must be tackled:
-there is no integrated approach for the humanities in dealing with digital data and tools
-existing data sets are not connected and tools often apply to idiosyncratic formats only
-there is a lack of training of researchers and students in using digital methods to analyse large data sets
The availability of massive quantities of digital sources (textual, audio-visual and structured data) for research is revolutionizing the humanities. Top-quality humanities scholarship of today and tomorrow is therefore only possible with the use of sophisticated ICT tools. CLARIAH aims to offer humanities scholars a Common Lab that provides them access to large collections of digital resources and innovative user-friendly processing tools, thus enabling them to carry out ground-breaking research to discover the nature of human culture.
To build such an infrastructure, called the Common Lab, is the goal of the CLARIAH project
(e-science infrastructure for the humanities)
CLARIAH is the national counterpart for the European infrastructure programmes CLARIN-EU and DARIAH-EU
http://www.clariah.nl/over/geschiedenis/voorstel
Proposal was
-submitted to NWO on 1 October 2013 by a core team of applicants, who all wrote a Letter of Intent
-supported by a large number of institutes (universities, research-, heritage-, and public institutions, and companies), who all wrote a Letter of Support
Blue: core infrastructure
-based on LOD
-WPs are responsible for the selection and curation of datasets and for exposing these as graphs
-these graphs form domain-specific linked data clouds
-these graphs expose resources through an API and offer search endpoints
-WP2 will index the domain-specific graphs based on generic ontologies
Orange: software for user interaction with the core infrastructure. Analytical tools, visualisation tools, work environments
-avoid customisation, but rather attempt to generalise as much as possible
Basic information such as the amount of data, the format, a metadata description, etc.
CKAN is a powerful data management system that makes data accessible
Rinke on CKAN: it is a plan that is doomed to fail unless the CKAN instance is populated automatically from the repositories where the data actually resides. If this does not happen, CKAN will quickly fall behind the actual situation.