Unit 10: XML and Beyond (Semantic Web, Web Services, ...)
1. Unit 10: XML and Web and Beyond
XML
DTD, XMLSchema
XSL, Xquery
Web Services
SOAP, WSDL
RESTful Web Services
Semantic Web
Introduction
RDF, RDF Schema, OWL, SPARQL
dsbw 2011/2012 q1 1
2. eXtensible Markup Language
“... is a simple, very flexible text format derived from SGML
(ISO 8879). Originally designed to meet the challenges of
large-scale electronic publishing, XML is also playing an
increasingly important role in the exchange of a wide variety
of data on the Web and elsewhere. ”
W3 Consortium
XML …
is not a solution but a tool to build solutions
is not a language but a meta-language that requires
interoperating applications that use it to adopt clear
conventions on how to use it
is a standardized text format that is used to represent
structured information
3. SGML, XML and their applications
SGML (meta-markup language)
Applications (markup languages): HyTime, HTML
XML (meta-markup language, a subset of SGML)
Applications (markup languages): XHTML, SMIL, SOAP, WML
4. Well-Formed XML Documents
The document has exactly one root element
The root element can be preceded by an optional XML declaration
Non-empty elements are delimited by both a start-tag and an end-tag.
Empty elements are marked with an empty-element (self-closing) tag
Tags may be nested but must not overlap
All attribute values are quoted with either single (') or double (") quotes
<?xml version="1.0" encoding="UTF-8"?>
<address>
<street>
<line>123 Pine Rd.</line>
</street>
<city name="Lexington"/>
<state abbrev="SC"/>
<zip base="19072" plus4=""/>
</address>
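The well-formedness rules above can be checked mechanically with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed is an assumption made for illustration, not part of the slides.

```python
import xml.etree.ElementTree as ET

# The address document from the slide, unchanged apart from indentation.
ADDRESS_XML = """<?xml version="1.0" encoding="UTF-8"?>
<address>
  <street>
    <line>123 Pine Rd.</line>
  </street>
  <city name="Lexington"/>
  <state abbrev="SC"/>
  <zip base="19072" plus4=""/>
</address>"""


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        # Encode to bytes so the encoding declaration is honoured by the parser.
        ET.fromstring(text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


print(is_well_formed(ADDRESS_XML))        # the slide's document
print(is_well_formed("<a><b></a></b>"))   # overlapping tags
print(is_well_formed("<a/><b/>"))         # more than one root element
```

A parser only checks well-formedness here; it says nothing about validity against a schema.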
5. Valid XML Documents
Are well-formed XML documents
Are documents that conform to the rules defined by a given
schema
Schema: defines the legal building blocks of an XML
document. It defines the document structure with a list of
legal elements. Two ways to define a schema:
DTD: Document Type Definition
XML Schema
6. DTD Example: Embedded and External Definitions
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE address [
<!ELEMENT address (street, city, state, zip)>
<!ELEMENT street (line+)>
<!ELEMENT line (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)> ]>
<address> ... </address>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE address SYSTEM
"http://dtd.mycompany.com/address.dtd">
<address> ... </address>
7. DTD Limitations
DTD is not integrated with Namespace technology so users
cannot import and reuse code
DTD does not support data types other than character data
DTD syntax is not XML compliant
DTD language constructs are not extensible
9. Processing XML Documents
Using a programming language and the SAX API.
SAX is a lexical, event-driven interface in which a document is
read serially and its contents are reported as "callbacks" to
various methods on a handler object of the user's design
Using a programming language and the DOM API.
DOM allows for navigation of the entire document as if it were
a tree of "Node" objects representing the document's contents.
Using a transformation engine and a filter
XSLT, XQuery, etc
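The difference between the SAX and DOM styles described above can be sketched with Python's standard library (xml.sax and xml.dom.minidom). The sample document and the ElementLogger class are assumptions made for illustration.

```python
import xml.sax
from xml.dom import minidom

# Small sample document (an assumption, abridged from the address example).
DOC = b"<address><street><line>123 Pine Rd.</line></street><city name='Lexington'/></address>"


class ElementLogger(xml.sax.ContentHandler):
    """SAX style: the parser calls back into this handler while reading serially."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        # Called once per start-tag, in document order.
        self.seen.append(name)


handler = ElementLogger()
xml.sax.parseString(DOC, handler)
sax_names = handler.seen

# DOM style: the whole document is loaded as a tree that can be navigated freely.
dom = minidom.parseString(DOC)
dom_root = dom.documentElement.tagName
dom_city_name = dom.getElementsByTagName("city")[0].getAttribute("name")

print(sax_names)
print(dom_root, dom_city_name)
```

SAX keeps no tree in memory, which suits large documents; DOM trades memory for random access to any node.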
10. XML Uses
Alternative/complement to HTML
XML + CSS, XML + XSL, XHTML
Declarative application programming/configuration
Configuration files, descriptors, etc.
Data exchange among heterogeneous systems
B2B, e-commerce: ebXML
Data Integration from heterogeneous sources
Schema mediation
Data storage and processing
XML Databases, XQuery (XPath)
Protocol definition
SOAP, WAP, WML, etc.
11. XPath
Expression language to address elements of an XML
document (used in XSLT, XQuery, …)
A location path is a sequence of location steps separated by a
slash (/)
Various navigation axes such as child, parent, following
etc.
XPath expressions look similar to file pathnames:
/bib/book
/bib/book[year>2008]/title
//author[3]
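As a rough illustration of location paths, Python's xml.etree.ElementTree supports a limited subset of XPath. In this sketch the sample data is an abridged, assumed version of a bibliography document; paths are written relative to the root element, so ./book plays the role of /bib/book. Comparison predicates such as [year>2008] are not part of that subset, so the numeric filter is done in Python instead.

```python
import xml.etree.ElementTree as ET

# Abridged sample bibliography (an assumption for illustration).
BIB = ET.fromstring("""
<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last></author></book>
  <book year="2000"><title>Data on the Web</title>
    <author><last>Abiteboul</last></author>
    <author><last>Suciu</last></author></book>
</bib>""")

# Child axis: every title of every book directly under the root.
titles = [t.text for t in BIB.findall("./book/title")]

# Attribute predicate: titles of books whose year attribute equals 2000.
year_2000 = [t.text for t in BIB.findall("./book[@year='2000']/title")]

# Descendant axis (//): all author elements anywhere below the root.
authors = [a.findtext("last") for a in BIB.findall(".//author")]

# Numeric comparison done in Python, since ElementTree lacks it.
recent = [b.findtext("title") for b in BIB.findall("./book") if int(b.get("year")) > 1995]

print(titles, year_2000, authors, recent)
```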
12. eXtensible Stylesheet Language: XSL
XSL serves the dual purpose of
transforming XML documents
exhibiting control over document rendering
XSL consists of two parts:
XSL Transformations (XSLT):
An XML language for transforming XML documents
It uses XPath to search and traverse the element hierarchy of
XML documents
XSL Formatting Objects (XSL-FO):
An XML language for specifying the visual formatting of an XML
document.
It is functionally a superset of CSS, designed to support print
layouts.
13. XQuery (XML Query): Example (source)
<bib>
<book year="1994">
<title>TCP/IP Illustrated</title>
<author><last>Stevens</last><first>W.</first></author>
<publisher>Addison-Wesley</publisher>
<price>65.95</price>
</book>
<book year="1992">
<title>Advanced Programming in the Unix environment</title>
<author><last>Stevens</last><first>W.</first></author>
<publisher>Addison-Wesley</publisher>
<price>65.95</price>
</book>
<book year="2000">
<title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Suciu</last><first>Dan</first></author>
<publisher>Morgan Kaufmann Publishers</publisher>
<price>39.95</price>
</book>
</bib>
14. XQuery (XML Query): Example (query)
For each author, retrieve the last and first names as well as the titles
of the author's books, ordered by last and first name:
<results>
{ let $a := doc("http://bstore1.example.com/bib/bib.xml")//author
for $last in distinct-values($a/last),
$first in distinct-values($a[last=$last]/first)
order by $last, $first
return
<author>
<name>
<last>{ $last }</last><first>{ $first }</first>
</name>
{ for $b in doc("http://bstore1.example.com/bib.xml")/bib/book
where some $ba in $b/author
satisfies ($ba/last = $last and $ba/first=$first)
return $b/title }
</author> }
</results>
15. XQuery (XML Query): Example (result)
<results>
<author>
<name>
<last>Abiteboul</last><first>Serge</first>
</name>
<title>Data on the Web</title>
</author>
<author>
<name>
<last>Stevens</last><first>W.</first>
</name>
<title>TCP/IP Illustrated</title>
<title>Advanced Programming in the Unix environment</title>
</author>
<author>
<name>
<last>Suciu</last><first>Dan</first>
</name>
<title>Data on the Web</title>
</author>
</results>
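For readers without an XQuery engine at hand, the grouping logic of the query can be approximated in Python with xml.etree.ElementTree. This is only a sketch of the same idea, not an XQuery implementation; the function name titles_by_author is an assumption.

```python
import xml.etree.ElementTree as ET

# The bibliography from the source slide, condensed onto fewer lines.
BIB_XML = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book year="2000"><title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Suciu</last><first>Dan</first></author></book>
</bib>"""


def titles_by_author(bib_xml: str) -> dict:
    """Map (last, first) to that author's titles; keys sorted by last, then first name."""
    bib = ET.fromstring(bib_xml)
    grouped = {}
    for book in bib.findall("book"):
        title = book.findtext("title")
        for author in book.findall("author"):
            key = (author.findtext("last"), author.findtext("first"))
            grouped.setdefault(key, []).append(title)  # titles stay in document order
    return dict(sorted(grouped.items()))


for (last, first), titles in titles_by_author(BIB_XML).items():
    print(last, first, titles)
```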
16. A Smarter Web Is Possible
People and communities have data stores and applications to share
Vision:
Expand the Web to include more machine-understandable resources
Enable global interoperability between resources you know should be
interoperable as well as those you don't yet know should be
interoperable
Key Web technologies:
Web Services: Web of Programs
Standards for interactions between programs, linked on the Web
Easier to Expose and Use services (and data they provide)
Semantic Web: Web of Data
Standards for things, relationships and descriptions, linked on the Web
Easier to Understand, Search for, Share, Re-Use, Aggregate, Extend
information
17. Web Services
“A Web service is a software system designed to support interoperable
machine-to-machine interaction over a network. It has an interface
described in a machine-processable format (specifically WSDL). Other
systems interact with the Web service in a manner prescribed by its
description using SOAP messages, typically conveyed using HTTP with an
XML serialization in conjunction with other Web-related standards”. Web
Services Glossary, W3C, http://www.w3.org/TR/ws-gloss/
UDDI: Universal Description, Discovery and Integration
18. Simple Object Access Protocol (SOAP)
SOAP is a simple XML-based protocol to let applications
exchange information over HTTP.
A SOAP message is an XML document containing the following
elements:
A required Envelope element that identifies the XML document
as a SOAP message
An optional Header element that contains header information
A required Body element that contains call and response
information
An optional Fault element, carried inside the Body, that provides
information about errors that occurred while processing the message
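The element structure listed above can be sketched by building a message with Python's xml.etree.ElementTree. The envelope namespace is the SOAP 1.2 one; the application namespace, the GetStockPrice operation and the Symbol parameter are hypothetical names chosen only for illustration.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
APP_NS = "http://example.org/stock"                   # hypothetical application namespace


def build_request(symbol: str) -> bytes:
    """Build a SOAP message: Envelope containing an (empty) Header and a Body."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")          # required
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")          # optional, left empty here
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")     # required
    call = ET.SubElement(body, f"{{{APP_NS}}}GetStockPrice")  # hypothetical operation
    ET.SubElement(call, f"{{{APP_NS}}}Symbol").text = symbol
    return ET.tostring(envelope, encoding="utf-8")


message = build_request("IBM")
print(message.decode("utf-8"))
```

In a real exchange these bytes would typically be sent as the body of an HTTP POST; that transport step is omitted here.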
21. Web Services Description Language (WSDL)
A WSDL document describes a web service using these major elements:
<portType>: The operations performed by the web service
<message>: The messages used by the web service
<types>: The data types used by the web service
<binding>: The communication protocols used by the web service
<definitions>
<types>
type definition ......
</types>
<message>
message definition ...
</message>
<portType>
port definition ....
</portType>
<binding>
binding definition ..
</binding>
</definitions>
23. RESTful Web Services
RESTful Web Services expose their
data and functionality through
resources identified by URIs
Uniform Interface Principle: Clients
interact with resources through a
fixed set of verbs. Example HTTP:
GET (read), PUT (update), DELETE, POST (catch all)
Multiple representations (MIME types) for the same resource:
XML, JSON, …
Hyperlinks model resource relationships and valid state
transitions for dynamic protocol description and discovery
24. Representational State Transfer (REST)
REST is an architectural style for networked systems based on the
following principles:
Client-server
Stateless
no client context being stored on the server between requests
Cacheable
Layered System
Any number of connectors (e.g., clients, servers, caches, firewalls,
tunnels, etc.) can mediate the request, but each does so without
being concerned about anything but its own request
Code-on-demand (optional)
Servers can extend or customize the functionality of a client by
transferring to it logic that it can execute.
Uniform Interface
25. REST: Uniform Interface
All important resources are identified by one (uniform)
resource identifier mechanism (e.g. URI)
Access methods mean the same for all resources (universal
semantics; e.g.: GET, POST, DELETE, PUT)
Hypertext as the engine of application state (HATEOAS):
A successful response indicates (or contains) a current
representation of the state of the identified resource
The resource remains hidden behind the interface.
Some representations contain links to potential next
application states, including direction on how to transition
to those states when a transition is selected.
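The idea that a representation carries links to the possible next states can be sketched as follows. The order resource, its states and the link layout are all hypothetical; real services choose their own link format.

```python
import json


def order_representation(order_id: str, status: str) -> dict:
    """Build a representation whose links depend on the resource's current state."""
    links = [{"rel": "self", "href": f"/orders/{order_id}", "method": "GET"}]
    if status == "pending":
        # Transitions offered only while the (hypothetical) order is still pending.
        links.append({"rel": "cancel", "href": f"/orders/{order_id}", "method": "DELETE"})
        links.append({"rel": "payment", "href": f"/orders/{order_id}/payment", "method": "PUT"})
    return {"id": order_id, "status": status, "links": links}


print(json.dumps(order_representation("42", "pending"), indent=2))
print(json.dumps(order_representation("42", "paid"), indent=2))
```

A client following this style does not hard-code the URIs for cancelling or paying; it discovers them from the links in each response.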
26. RESTful WS: URI Design Guidelines
Only two base URIs per resource:
Collection: /stocks (plural noun)
Element: /stocks/{stock_id} (e.g. /stocks/IBM )
Complex variations:
/dogs?color=red&state=running&location=park
Versioning:
/v1/stocks
Positioning:
/stocks?limit=25&offset=50
Non-resources (e.g. calculate, convert, …):
/convert?from=EUR&to=CNY&amount=100 (verbs, not nouns)
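A server following these guidelines has to take such URIs apart again. The sketch below uses Python's urllib.parse; the function name and the default values for limit and offset are assumptions for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def parse_collection_uri(uri: str) -> dict:
    """Split a URI into version, collection, element and paging parameters."""
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    query = parse_qs(parts.query)

    version = None
    # A leading segment such as "v1" is treated as the API version.
    if segments and segments[0].startswith("v") and segments[0][1:].isdigit():
        version = segments.pop(0)

    return {
        "version": version,
        "collection": segments[0] if segments else None,
        "element": segments[1] if len(segments) > 1 else None,
        "limit": int(query.get("limit", ["25"])[0]),    # assumed default page size
        "offset": int(query.get("offset", ["0"])[0]),   # assumed default offset
    }


print(parse_collection_uri("/v1/stocks?limit=25&offset=50"))
print(parse_collection_uri("/stocks/IBM"))
```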
27. RESTful WS: Example (adapted from Wikipedia)
Resource: http://www.stock.org/stocks (collection)
GET: List the members (URIs and perhaps other details) of the collection. For example, list all the stocks.
PUT: Replace the entire collection with another collection.
POST: Create a new entry in the collection. The new entry's ID is assigned automatically and is usually returned by the operation.
DELETE: Delete the entire collection.
Resource: http://www.stock.org/stocks/IBM (element)
GET: Retrieve a representation of the addressed member of the collection, expressed in an appropriate Internet media type.
PUT: Update the addressed member of the collection, or if it doesn't exist, create it.
POST: Treat the addressed member as a collection in its own right and create a new entry in it.
DELETE: Delete the addressed member of the collection.
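The verb semantics in this example can be sketched as an in-memory dispatcher, with no HTTP server involved. The class name, the generated IDs and the status codes are assumptions (conventional choices, not taken from the slides); POST on a single member is left out and answered with 405.

```python
import itertools


class StockCollection:
    """In-memory stand-in for the /stocks collection."""

    def __init__(self):
        self.items = {}
        self._ids = itertools.count(1)

    def handle(self, method, stock_id=None, body=None):
        """Return (status_code, payload) for a verb applied to the collection or a member."""
        if stock_id is None:                       # /stocks
            if method == "GET":
                return 200, sorted(self.items)     # list member IDs
            if method == "PUT":
                self.items = dict(body)            # replace the whole collection
                return 200, sorted(self.items)
            if method == "POST":
                new_id = f"s{next(self._ids)}"     # ID assigned automatically
                self.items[new_id] = body
                return 201, new_id
            if method == "DELETE":
                self.items.clear()
                return 204, None
        else:                                      # /stocks/{stock_id}
            if method == "GET":
                if stock_id in self.items:
                    return 200, self.items[stock_id]
                return 404, None
            if method == "PUT":
                created = stock_id not in self.items
                self.items[stock_id] = body        # update, or create if missing
                return (201 if created else 200), body
            if method == "DELETE":
                existed = stock_id in self.items
                self.items.pop(stock_id, None)
                return (204 if existed else 404), None
        return 405, None                           # verb not supported here


store = StockCollection()
print(store.handle("PUT", "IBM", {"price": 100}))
print(store.handle("GET"))
```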
29. Semantic Web = The Web of Data
“The Web was designed as an information space, with the goal that
it should be useful not only for human-human communication, but
also that machines would be able to participate and help. One of
the major obstacles to this has been the fact that most information
on the Web is designed for human consumption, and even if it was
derived from a database with well defined meanings (in at least
some terms) for its columns, that the structure of the data is not
evident to a robot browsing the web. Leaving aside the artificial
intelligence problem of training machines to behave like people, the
Semantic Web approach instead develops languages for expressing
information in a machine processable form”.
"If HTML and the Web made all the online documents look like one
huge book, RDF, schema, and inference languages will make all the
data in the world look like one huge database"
Tim Berners-Lee
30. The Current Web (1/2)
Resources:
Identified by URIs
untyped
Links:
href, src, ...
limited, non-descriptive
Users:
A lot of information, but its
meaning must be interpreted
and deduced from the
content, as has been done
for millennia
Machines:
They don’t understand.
31. The Current Web (2/2)
The Public Web
The web found when searching and browsing
At least 21 billion pages indexed by standard search engines
The Deep Web
Large data repositories that require their own internal searches.
About 6 trillion documents not indexed by standard search
engines.
The Private Web
Password-protected sites and data: corporate intranets, private
networks, subscription-based services, etc.
About 3 trillion documents not indexed by standard search
engines.
32. The Semantic Web
Resources:
Globally identified by URIs
or locally (Blank)
Extensible
Relational
Links:
Identified by URIs
Extensible
Relational
Users:
More and better information
Machines:
More processable
information (Data Web)
33. Semantic Web: How?
Make web resources more accessible to automated processes
Extend existing rendering markup with semantic markup
Metadata (data about data) annotations that describe
content/function of web accessible resources
Use Ontologies to provide vocabulary for annotations
“Formal specification” accessible to machines
A prerequisite is a standard web ontology language
Need to agree on a common syntax before we can share
semantics
Syntactic web based on standards such as HTTP and HTML
35. Semantic Web: W3C Standards and Tools
RDF (Resource Description
Framework): simple data model to
describe resources and their
relationships
RDF Schema: a language for
declaring basic classes and types for
describing the terms used in RDF;
it allows defining class
hierarchies
SPARQL: SPARQL Protocol and RDF
Query Language
OWL: Web Ontology Language.
Allows enriching the description of
properties and classes, including,
among others, class disjunction,
association cardinality, richer data
types, property features (e.g.
symmetry), etc.
36. Resource Description Framework (RDF)
RDF is a graphical formalism ( + XML syntax + semantics)
for representing metadata
for describing the semantics of information in a machine-accessible
way
RDF Statements are <subject, predicate, object> triples that
describe properties of resources :
<Carles,hasColleague,Ernest>
XML representation:
<Description about="some.uri/person/carles_farre">
<hasColleague
resource="some.uri/person/ernest_teniente"/>
</Description>
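The triple model can be sketched in a few lines of Python: a graph is a set of (subject, predicate, object) tuples, and a query is a pattern with wildcards. The second and third triples and the match helper are assumptions added for illustration; only the first triple comes from the slide.

```python
# A graph is simply a set of (subject, predicate, object) triples.
TRIPLES = {
    ("Carles", "hasColleague", "Ernest"),      # from the slide
    ("Carles", "hasColleague", "Cristina"),    # hypothetical extra statement
    ("Ernest", "hasColleague", "Carles"),      # hypothetical extra statement
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Who are Carles's colleagues?
colleagues = [o for _, _, o in match(TRIPLES, s="Carles", p="hasColleague")]
print(colleagues)
```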
37. RDF Schema
RDF Schema allows you to define vocabulary terms and the
relations between those terms
it gives “extra meaning” to particular RDF predicates and resources
this “extra meaning”, or semantics, specifies how a term should be
interpreted
Examples:
<Person,type,Class>
<hasColleague,type,Property>
<Professor,subClassOf,Person>
<Cristina,type,Professor>
<hasColleague,range,Person>
<hasColleague,domain,Person>
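The "extra meaning" of these schema terms shows up as new triples that can be derived from them. The sketch below applies three entailment patterns (subClassOf, domain, range) to the examples above; it is a simplified illustration rather than a complete RDFS reasoner, and the function name rdfs_closure is an assumption.

```python
GRAPH = {
    ("Person", "type", "Class"),
    ("hasColleague", "type", "Property"),
    ("Professor", "subClassOf", "Person"),
    ("Cristina", "type", "Professor"),
    ("hasColleague", "range", "Person"),
    ("hasColleague", "domain", "Person"),
    ("Carles", "hasColleague", "Ernest"),   # instance data from the RDF slide
}


def rdfs_closure(triples):
    """Apply three RDFS-style entailment patterns until nothing new is derived."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for a, q, b in inferred:
                # x type C  and  C subClassOf D   =>  x type D
                if p == "type" and q == "subClassOf" and a == o:
                    new.add((s, "type", b))
                # s p o  and  p domain C          =>  s type C
                if q == "domain" and a == p:
                    new.add((s, "type", b))
                # s p o  and  p range C           =>  o type C
                if q == "range" and a == p:
                    new.add((o, "type", b))
        if new <= inferred:
            return inferred
        inferred |= new


closure = rdfs_closure(GRAPH)
print(sorted(closure - GRAPH))
```

The derived triples state that Cristina, Carles and Ernest are all of type Person, none of which was asserted directly.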
38. Problems with RDFS
RDFS too weak to describe resources in sufficient detail
No localized range and domain constraints
Can’t say that the range of hasChild is person when applied to
persons and elephant when applied to elephants
No existence/cardinality constraints
Can’t say that all instances of person have a mother that is also a
person, or that persons have exactly 2 parents
No transitive, inverse or symmetrical properties
Can’t say that isPartOf is a transitive property, that hasPart is the
inverse of isPartOf or that touches is symmetrical
…
Difficult to provide reasoning support
No “native” reasoners for non-standard semantics
May be possible to reason via FO axiomatization
39. Web Ontology Language (OWL)
OWL is RDF(S), adding vocabulary to specify:
Relations between classes
Cardinality
Equality
More typing of and characteristics of properties
Enumerated classes
Three species of OWL
OWL Full is the union of OWL syntax and RDF
OWL DL restricted to FOL fragment (≅ SHIQ Description Logic)
OWL Lite is “easier to implement” subset of OWL DL
OWL DL benefits from many years of DL research
Well defined semantics
Formal properties well understood (complexity, decidability)
Known reasoning algorithms
Implemented systems (highly optimised)
41. SPARQL Protocol And RDF Query Language
Designed to query collections of triples…
…and to easily traverse relationships
Vaguely SQL-like syntax (SELECT, WHERE)
“Matches graph patterns”
SELECT ?sal
WHERE { emps:e13954 HR:salary ?sal }
42. SQL vs SPARQL
Relational table (employees):
EMP_ID NAME HIRE_DATE SALARY
13954 Joe 2000-04-14 48000
10335 Mary 1998-11-23 52000
… … … …
04182 Bob 2005-02-10 21750
RDF triples:
emps:e13954 HR:name 'Joe'
emps:e13954 HR:hire-date 2000-04-14
emps:e13954 HR:salary 48000
emps:e10335 HR:name 'Mary'
emps:e10335 HR:hire-date 1998-11-23
emps:e10335 HR:salary 52000
…
SQL:
SELECT hire_date
FROM employees
WHERE salary >= 21750
SPARQL:
SELECT ?hdate
WHERE
{ ?id HR:salary ?sal .
?id HR:hire-date ?hdate .
FILTER (?sal >= 21750) }
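The correspondence between the two queries can be sketched in Python: the SQL side runs against an in-memory SQLite table, and the SPARQL side is imitated by joining two triple patterns on the shared variable ?id. Only the three rows visible on the slide are used, and the function names are assumptions; no SPARQL engine is involved.

```python
import sqlite3

# The three rows visible on the slide (IDs kept as text to preserve leading zeros).
ROWS = [
    ("13954", "Joe", "2000-04-14", 48000),
    ("10335", "Mary", "1998-11-23", 52000),
    ("04182", "Bob", "2005-02-10", 21750),
]


def hire_dates_sql(min_salary):
    """Relational version: one row per employee, filter on a column."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE employees (emp_id TEXT, name TEXT, hire_date TEXT, salary INTEGER)")
    con.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", ROWS)
    rows = con.execute(
        "SELECT hire_date FROM employees WHERE salary >= ?", (min_salary,)
    ).fetchall()
    con.close()
    return sorted(r[0] for r in rows)


# The same facts as triples: one triple per table cell.
TRIPLES = [
    (f"emps:e{emp_id}", f"HR:{column}", value)
    for emp_id, name, hire_date, salary in ROWS
    for column, value in (("name", name), ("hire-date", hire_date), ("salary", salary))
]


def hire_dates_triples(min_salary):
    """Graph version: join '?id HR:salary ?sal' with '?id HR:hire-date ?hdate' on ?id."""
    salaries = {s: o for s, p, o in TRIPLES if p == "HR:salary"}
    return sorted(
        o for s, p, o in TRIPLES
        if p == "HR:hire-date" and s in salaries and salaries[s] >= min_salary
    )


print(hire_dates_sql(50000), hire_dates_triples(50000))
```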
43. Semantic Web Services
Static: WWW (URI, HTML, HTTP) → Semantic Web (RDF, RDF(S), OWL)
Dynamic: Web Services (UDDI, WSDL, SOAP) → Semantic Web Services
The main aim is to enable highly flexible Web services
architectures, where new services can be quickly discovered,
orchestrated and composed into workflows, by
creating a semantic markup of Web services that makes them
machine-understandable and use-apparent
developing an agent technology that exploits this semantic markup to
support automated Web service composition and interoperability
44. References
KAPPEL, Gerti et al. Web Engineering, John Wiley & Sons,
2006. Chapter 14.
SHKLAR, Leon and ROSEN, Rich. Web Application
Architecture: Principles, Protocols and Practices, 2nd Edition.
John Wiley & Sons, 2009. Chapters 5 and 13.
RAY, Kate. Web 3.0 (video) http://vimeo.com/11529540
www.w3.org
www.w3schools.com