Integrating the Cloud into Content
Using Semantics to Enhance Content Publishing

                    Jamie Taylor
        http://semprog.com/presentations/web20ny
What do y'all mean
  "Semantics"
[Figure: word graph centered on KNOCK, linked to critique, roast, misfortune, bad luck, occurrence, sound, zing, knocking, zizz, vroom, bang, belt, bash, bump, rap, whack, blow]
[Figure: the same word graph with the central word replaced by LJOMF]
IBM
[Figure: graph of facts about IBM; each labeled edge leads from the central IBM node to a value]
    Legal Structure:     Publicly Listed Company
    Headquarters:        1 New Orchard Road, Armonk, New York
    CIK:                 0000051143
    Ticker Symbol:       NYSE:IBM
    Date Founded:        1889
    Founders:            Thomas Watson
    CEO:                 Sam Palmisano
    SIC:                 3571:Electronic Computers
    NAICS:               334111:Electronic Computer Manufacturing
    Subsidiaries:        Cognos, Cross Worlds
    Software Developed:  SANSF, ViaVoice, Lotus Notes
    Operating Income:    17,604,000,000 USD 2006
[Figure: the IBM fact graph again, with the same labeled edges and values (Legal Structure, Headquarters, CIK, Ticker Symbol, Date Founded, Founders, CEO, SIC, NAICS, Subsidiaries, Software Developed, Operating Income) but with the name removed from the central node]
http://www.flickr.com/photos/pacroon/
http://www.flickr.com/photos/soldiersmediacenter/
PageRank™
[Figure: the values from the IBM graph with the edge labels and central node removed: Publicly Listed Company; 1 New Orchard Road, Armonk, New York; 0000051143; NYSE:IBM; 1889; Thomas Watson; Sam Palmisano; 3571:Electronic Computers; 334111:Electronic Computer Manufacturing; 17,604,000,000 USD 2006; Cognos, Cross Worlds; SANSF, ViaVoice, Lotus Notes]
Earlier this year, the AP slashed prices to try to hold on to
subscribers.

That's not the answer, says Jeff Jarvis, journalism professor at
City University of New York.

  JEFF JARVIS: The fundamentals of the media economy
are changing, from a content economy to a link-based
economy.

Jarvis says the AP needs to become the broker for those links,
like helping the Baltimore Sun link to a story about GM from the
Detroit Free Press.
Jarvis resorts to the concept of a "gift economy" to explain the link economy




http://www.flickr.com/photos/pagedooley/
I am a behavioral
economist.

Gift economics are
frequently used as
explanations for
what we don't
understand
Worse, I am a
Behaviorist

Only talk about
what you can
observe
Semantics



Process of communicating enough
  meaning to result in an action
Link Economy


•   Enriching links focuses meaning
    •   Improves "findability" (SEO)
    •   Increased usability
    •   Better ad selection
Link Economy
At the end of this talk, you should be able to say how semantics benefits each of these groups




•   Semantics Benefit
    •   Site owners
    •   Site users
    •   Developers
    •   You
Wish it were real
Might be real
Is real, but don't believe it
Is very useful



                  Build Flexible
                 Applications with
                   Graph Data
Not Your Typical
Semantic Web Talk
The W3C Layer Cake




     The Cake
       taken from http://www.w3.org/2007/Talks/0130-sb-W3CTechSemWeb/layerCake-4.png
AI Agents




      http://www.flickr.com/photos/matthewtownsend/
Ontologies
RDF Serialization Formats
<http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000005b7ab1a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.freebase.com/ns/business.employment_tenure>.
<http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000005b7ab1a> <http://rdf.freebase.com/ns/business.employment_tenure.company> <http://rdf.freebase.com/ns/en.determine_software>.
<http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000007e53e16> <http://rdf.freebase.com/ns/education.education.institution> <http://rdf.freebase.com/ns/en.mounds_view_high_school>.
<http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000007e53e16> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.freebase.com/ns/education.education>.
<http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000007e53e16> <http://rdf.freebase.com/ns/education.education.student> <http://rdf.freebase.com/ns/en.jamie_taylor>.
<http://rdf.freebase.com/ns/en.jamie_taylor> <http://rdf.freebase.com/ns/business.company_founder.companies_founded> <http://rdf.freebase.com/ns/en.mobius_net>.
<http://rdf.freebase.com/ns/en.jamie_taylor> <http://creativecommons.org/ns#attributionName> "Source: Freebase - The World's database".
<http://rdf.freebase.com/ns/en.jamie_taylor> <http://rdf.freebase.com/ns/people.person.nationality> <http://rdf.freebase.com/ns/en.united_states>.
<http://rdf.freebase.com/ns/en.jamie_taylor> <http://rdf.freebase.com/ns/common.topic.image> <http://rdf.freebase.com/ns/en.jamie_headshot>.
<http://rdf.freebase.com/ns/en.jamie_taylor> <http://rdf.freebase.com/ns/type.object.name> "Jamie Taylor"@en.
<http://rdf.freebase.com/ns/en.jamie_taylor> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.freebase.com/ns/user.skud.freebase_events.tshirt_recipient>.
<http://rdf.freebase.com/ns/en.jamie_taylor> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.freebase.com/ns/user.skud.freebase_events.topic>.
<http://rdf.freebase.com/ns/en.jamie_taylor> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.freebase.com/ns/book.author>.
<http://rdf.freebase.com/ns/en.jamie_taylor> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.freebase.com/ns/people.person>.
Instead....
Part I: so you can explain it to others
Part II: so you can do what you say

• Part I
  • Why
    • Uses, Benefits

• Part II
  • How
    • Representation, Concepts
Part I
Why
Is very useful



                  Build Flexible
                 Applications with
                   Graph Data
The Office (US)                                       Leatherheads
TV Program                                           Film




                 stars in                        starred in




                            John Krasinski
                            Person, Actor




                                      attended




                            Brown University
                            College/university




                 Graph Data Model
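A minimal sketch of the graph on this slide, written as subject/predicate/object triples in plain Python. The "is a" predicate for the type labels, the direction chosen for each edge, and the helper names `match` and `objects` are assumptions made for illustration; none of them comes from Freebase or any library.

```python
# Each fact is a (subject, predicate, object) triple mirroring one edge of the slide.
# "is a" is an assumed predicate used here to carry the type labels.
triples = [
    ("John Krasinski", "is a", "Person"),
    ("John Krasinski", "is a", "Actor"),
    ("John Krasinski", "stars in", "The Office (US)"),
    ("John Krasinski", "starred in", "Leatherheads"),
    ("John Krasinski", "attended", "Brown University"),
    ("The Office (US)", "is a", "TV Program"),
    ("Leatherheads", "is a", "Film"),
    ("Brown University", "is a", "College/university"),
]


def match(pattern, facts):
    """Return the triples that agree with pattern; None acts as a wildcard."""
    return [
        fact
        for fact in facts
        if all(want is None or want == got for want, got in zip(pattern, fact))
    ]


def objects(subject, predicate, facts=triples):
    """Return every object reachable from subject along predicate."""
    return [obj for _, _, obj in match((subject, predicate, None), facts)]


attended = objects("John Krasinski", "attended")
types = objects("John Krasinski", "is a")
typed_nodes = match((None, "is a", None), triples)
```

Adding a new kind of fact is just another tuple in the list; no table definition has to change first.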
A socially managed semantic database
Freebase has Many Types of Things
9,547,107 Topics
Contributions over $50000 made to members of the
US congress in the 2008 election cycle by companies
    headquartered outside of the United States

                                        topic:                                                topic:
                                  Barack Obama                                             Switzerland



                   government position held      took money from             is based in



          topic:
                                                                    topic:
      United States
                                                                   UBS AG
        Senator




                                                        Freebase
Industry Browser Identity Model
Industry (USCB)         Company              Company              Donations
    NAICS                Ticker        CRP    CRP ID     CRP       CRP ID

       NAICS/SIC Map
                            SEC
          Freebase


Industry (SEC)          Company               People               Person
     SIC          SEC     CIK          SEC     CIK     Freebase   Wikipedia

                            Freebase                                  Wikipedia


                        Location                                   Article
                        ZIP Code
Industry Browser




http://kiwitobes.com/industry_mashup/
Barriers between science and the humanities impede solving
humanity's important problems




Web 2.0 + Semantics
"Smoov"
Ankolekar et al. 2007
Topic Blocks
http://www.freebase.com/topicblocks/index?id=/en/pirates_of_the_caribbean_3
http://www.freebase.com/widget/topic?mode=i&pane=image,article_props&id=/en/pirates_of_the_caribbean_3
http://www.freebase.com/widget/topic?mode=i&pane=image,article_props&id=/en/blade_runner
Patrick Sinclair (BBC)
About the Content (and visitor?)
MIT Simile
Simile




http://dev.mqlx.com/~jamie/simile/timeline.html
Data Portability
[Figure: several separate "Data" sources]
Semantics allows data to be utilized by unanticipated new applications
Simile
MIT Simile: Exhibit
User Experience
Topic Hubs
Open Calais
Open Calais
http://p.opencalais.com/er/company/ralg-tr1r/9e3f6c34-aa6b-3a3b-b221-a07aa7933633


                          Open Calais
<rdf:Description rdf:nodeID="A1">
  <att:lastupdated>2009-06-18T21:22:28</att:lastupdated>
  <att:text>IBM Corporation And Siemens Announce Integrated Solutions To Help Companies</att:text>
</rdf:Description>
<rdf:Description rdf:nodeID="A2">
  <att:code>3577</att:code>
  <att:description>Computer Periph'L Equipment, Nec</att:description>
</rdf:Description>
<rdf:Description rdf:nodeID="A3">
  <att:code>7371</att:code>
  <att:description>Computer Programming Services</att:description>
</rdf:Description>
<rdf:Description rdf:nodeID="A4">
  <att:age>46</att:age>
  <att:lastname>Iwata</att:lastname>
  <att:officerurl rdf:resource="http://www.reuters.com/finance/stocks/officerProfile?symbol=IBM.N&amp;officerId=222727"/>
  <att:firstname>Jon</att:firstname>
  <att:title>Senior Vice President - Marketing and Communications</att:title>
  <att:middle>C.</att:middle>
</rdf:Description>




http://p.opencalais.com/er/company/ralg-tr1r/9e3f6c34-aa6b-3a3b-b221-a07aa7933633


                          Open Calais
<owl:sameAs rdf:resource="http://dbpedia.org/resource/IBM"/>
<owl:sameAs rdf:resource="http://cb.semsol.org/company/ibm#self"/>
<owl:sameAs rdf:resource="http://p.opencalais.com/er/company/ralg-tr1r/9e3f6c34-aa6b-3a3b-b221-a07aa7933633"/>

A Graph of Graphs
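One way to read the owl:sameAs links above is as instructions to merge identifiers from different graphs into one identity. The sketch below does that with a small union-find in plain Python. The slide does not show which resource the links are attached to, so `SUBJECT` is a hypothetical placeholder, and `find` and `identity_groups` are illustrative names rather than part of any RDF library.

```python
from collections import defaultdict

# Hypothetical subject: the slide shows only the sameAs targets.
SUBJECT = "http://example.org/company/ibm"

same_as = [
    (SUBJECT, "http://dbpedia.org/resource/IBM"),
    (SUBJECT, "http://cb.semsol.org/company/ibm#self"),
    (SUBJECT, "http://p.opencalais.com/er/company/ralg-tr1r/9e3f6c34-aa6b-3a3b-b221-a07aa7933633"),
]


def find(parent, uri):
    """Return the representative URI for uri, registering it on first sight."""
    parent.setdefault(uri, uri)
    while parent[uri] != uri:
        parent[uri] = parent[parent[uri]]  # path halving keeps chains short
        uri = parent[uri]
    return uri


def identity_groups(links):
    """Group URIs that are connected, directly or indirectly, by sameAs links."""
    parent = {}
    for left, right in links:
        root_left, root_right = find(parent, left), find(parent, right)
        if root_left != root_right:
            parent[root_right] = root_left
    groups = defaultdict(set)
    for uri in list(parent):
        groups[find(parent, uri)].add(uri)
    return list(groups.values())


groups = identity_groups(same_as)
```

Facts stated about any URI in a group can then be treated as facts about the same company, which is what makes the separate graphs behave like one.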
Epispider




Herman Tolentino et al. http://epispider.net/index.php
Chris Thorpe




guardian.co.uk Open Platform
Vocabulary


Do you understand the words that are
coming out of my mouth?
         -Chris Tucker, Rush Hour
[Figure: the IBM fact graph repeated, showing the edge labels Legal Structure, Headquarters, CIK, Ticker Symbol, Date Founded, Founders, CEO, SIC, NAICS, Subsidiaries, Software Developed and Operating Income connecting the same values]
Epispider




Herman Tolentino et al. http://epispider.net/index.php
vocabularies...are
   everywhere
The Twitter Vocabulary
@     #     Short URLs
Pivot on an @ tag
Pivot on a # tag
http://bit.ly/info/3zyJ8g




Pivot on a Short URL
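The three pivots can be sketched as a tiny vocabulary extractor: find the @ tags, # tags and short URLs in each message, then index messages by any one of them. The sample messages, the account name and the bit.ly hash below are invented for illustration, and the regular expressions are a rough approximation, not Twitter's actual parsing rules.

```python
import re
from collections import defaultdict

# Rough patterns for the three vocabulary terms; real tweet parsing is stricter.
PATTERNS = {
    "mention": re.compile(r"@(\w+)"),
    "hashtag": re.compile(r"#(\w+)"),
    "short_url": re.compile(r"https?://bit\.ly/\w+"),
}


def annotate(text):
    """Return every vocabulary term found in text, grouped by kind."""
    return {kind: pattern.findall(text) for kind, pattern in PATTERNS.items()}


def pivot(messages, kind):
    """Index messages by one kind of term, e.g. all messages per hashtag."""
    index = defaultdict(list)
    for message in messages:
        for value in annotate(message)[kind]:
            index[value].append(message)
    return dict(index)


# Invented sample data.
messages = [
    "@alice slides are at http://bit.ly/abc123 #web20",
    "Enjoyed the talk by @alice #web20 #semantics",
]

by_hashtag = pivot(messages, "hashtag")
by_mention = pivot(messages, "mention")
by_url = pivot(messages, "short_url")
```

Each pivot is the same operation on a different term, which is the sense in which a shared vocabulary turns plain text into links.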
Vocabularies make links
 more understandable

...and thus content more
         findable
microformats


Annotate existing HTML so the content can be "extracted by
software and indexed, searched for, saved, cross-referenced or
combined."
microformats
microformats
<div class="vcard">
.....
  <div id="view">
    <div id="home">

      <table>
        <tr>
          <td class="f">address</td>
          <td class="v">
            <div class="adr">
              <span class="locality">Berkeley</span>,
              <span class="region">CA</span>
              <div class="country-name">United States</div>

            </div>
          </td>
        </tr>
        <tr>
          <td class="f">aim</td>
          <td class="v"><a id="aim" class="url im offline" href="aim:goim?screenname=jaredhanson@mac.com">jaredhanson@mac.com</a></td>
        </tr>
microformats.org
microformats


• (Relatively) easy to use
• Small, fixed vocabulary
• No standard parsing pattern
• No strong identifiers
  • Limits utility
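Because there is no standard parsing pattern, each consumer ends up writing its own class-matching code. The sketch below shows what that might look like for the adr fields in the vcard markup, using Python's standard `html.parser`. The `AdrParser` class and the trimmed `SAMPLE` markup are illustrative assumptions; the parser knows only three class names and assumes well-nested markup with no unclosed void elements such as a bare `<br>`.

```python
from html.parser import HTMLParser

# The only class names this sketch understands.
ADR_FIELDS = {"locality", "region", "country-name"}


class AdrParser(HTMLParser):
    """Collect text from elements whose class names an adr field."""

    def __init__(self):
        super().__init__()
        self.stack = []    # one entry per open element: its field name or None
        self.address = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        field = next((c for c in classes if c in ADR_FIELDS), None)
        self.stack.append(field)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        field = self.stack[-1] if self.stack else None
        if field and data.strip():
            self.address[field] = self.address.get(field, "") + data.strip()


# Trimmed version of the slide's markup, closed so that it is well formed.
SAMPLE = """
<div class="vcard">
  <div class="adr">
    <span class="locality">Berkeley</span>,
    <span class="region">CA</span>
    <div class="country-name">United States</div>
  </div>
</div>
"""

parser = AdrParser()
parser.feed(SAMPLE)
parser.close()
address = parser.address
```

Nothing in the result says which Berkeley is meant, which is the "no strong identifiers" limitation in practice.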
RDFa




Annotate HTML with machine readable RDF
RDFa



<div xmlns:fb="http://rdf.freebase.com/ns/"
    about="http://rdf.freebase.com/ns/en.jamie_taylor"
    rel="fb:people.person.place_of_birth">

 <span resource="http://rdf.freebase.com/ns/en.saint_paul"/>

</div>
RDFa

• Unambiguous identifiers
• Extensible vocabulary
• Standard parsing pattern
  • Produces RDF
• Hard to use
  • Rules about formatting based on RDF
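To make "produces RDF" concrete, here is a deliberately tiny reader for the RDFa example, built on Python's standard `html.parser`. It is a sketch under strong assumptions: `TinyRdfaParser` is a hypothetical name, it understands only `about`, `rel`, `resource` and `xmlns:` prefixes, and it ignores element scoping, so it is nowhere near a conforming RDFa processor.

```python
from html.parser import HTMLParser


class TinyRdfaParser(HTMLParser):
    """Handles only about/rel/resource and xmlns: prefixes, not full RDFa."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}
        self.subject = None
        self.predicate = None
        self.triples = []

    def expand(self, curie):
        """Expand prefix:local using the declared prefixes, if the prefix is known."""
        prefix, sep, local = curie.partition(":")
        if sep and prefix in self.prefixes:
            return self.prefixes[prefix] + local
        return curie

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name, value in attrs.items():
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        if "about" in attrs:
            self.subject = attrs["about"]
        if "rel" in attrs:
            self.predicate = self.expand(attrs["rel"])
        if "resource" in attrs and self.subject and self.predicate:
            self.triples.append((self.subject, self.predicate, attrs["resource"]))


# The slide's example, with straight quotes and a quoted rel value.
SAMPLE = """
<div xmlns:fb="http://rdf.freebase.com/ns/"
     about="http://rdf.freebase.com/ns/en.jamie_taylor"
     rel="fb:people.person.place_of_birth">
  <span resource="http://rdf.freebase.com/ns/en.saint_paul"/>
</div>
"""

parser = TinyRdfaParser()
parser.feed(SAMPLE)
parser.close()
triples = parser.triples
```

The single triple that comes out has full URIs in all three positions, which is where the unambiguous identifiers come from.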
What “concepts” are covered in content
Like existing tagging, but with strong identifiers!

    <resource>  --tagged-->       Tag
    Tag         --taggingDate-->  "2001-01-01"
    Tag         --label-->        "text"
    Tag         --means-->        <resource>    (Strong identifier goes here!)
[Figure: the tag graph (<resource> tagged Tag; Tag has taggingDate, label "text", and means <resource>) shown beside the markup]

<div class="rdfa"
     xmlns:ctag="http://commontag.org/ns#">

    NASA's
    <a typeof="ctag:Tag"
         rel="ctag:means"
         href="http://rdf.freebase.com/ns/en.phoenix_mars_mission"
         property="ctag:label">Phoenix Mars Lander</a>
    has deployed its robotic arm.

</div>
And the winner is....
HTML5 MicroData



•   Annotate HTML with machine
    readable data
•   Simple Name-Value Pair design
HTML5 MicroData
Sometimes, it is desirable to annotate
content with specific machine-readable
labels, e.g. to allow generic scripts to
provide services that are customised to
the page, or to enable content from a
variety of cooperating authors to be
processed by a single script in a
consistent manner.
HTML5




        Simple! 15 pages of 657 page spec
HTML5 MicroData

<section itemscope itemtype="http://example.org/animals#cat"
                   itemid="http://semprog.com/jamiestuff/hedral">

     <h1 itemprop="name">Hedral</h1>
     <p itemprop="desc">Hedral is a male american domestic
                  shorthair, that is
     <span itemprop="http://example.com/color">black</span> and
     <span itemprop="http://example.com/color">white</span>.</p>

     <img itemprop="img" src="hedral.jpeg"
                     alt="" title="Hedral, age 18 months">

</section>
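The name-value design can be illustrated with a small extractor for the example above, again using Python's standard `html.parser`. This is a simplified sketch, not the algorithm from the HTML5 specification: `TinyMicrodataParser` is a hypothetical name, it handles a single top-level item, takes `src` as the value of an `img`, uses whitespace-collapsed text for everything else, and leaves relative URLs unresolved.

```python
from html.parser import HTMLParser

VOID = {"img", "br", "hr", "meta", "link", "input"}  # elements with no end tag


class TinyMicrodataParser(HTMLParser):
    """Collect itemprop name-value pairs for one item; a sketch, not the spec."""

    def __init__(self):
        super().__init__()
        self.item = {"type": None, "id": None, "properties": {}}
        self.depth = 0
        self.open_props = []  # (depth, name, text parts) for open itemprop elements

    def add(self, name, value):
        self.item["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item["type"] = attrs.get("itemtype")
            self.item["id"] = attrs.get("itemid")
        name = attrs.get("itemprop")
        if tag in VOID:
            if name is not None and tag == "img":
                self.add(name, attrs.get("src"))
            return
        self.depth += 1
        if name is not None:
            self.open_props.append((self.depth, name, []))

    def handle_data(self, data):
        # Text belongs to every enclosing itemprop element, not just the innermost.
        for _, _, parts in self.open_props:
            parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.open_props and self.open_props[-1][0] == self.depth:
            _, name, parts = self.open_props.pop()
            self.add(name, " ".join("".join(parts).split()))
        self.depth -= 1


SAMPLE = '''
<section itemscope itemtype="http://example.org/animals#cat"
         itemid="http://semprog.com/jamiestuff/hedral">
  <h1 itemprop="name">Hedral</h1>
  <p itemprop="desc">Hedral is a male american domestic
     shorthair, that is
  <span itemprop="http://example.com/color">black</span> and
  <span itemprop="http://example.com/color">white</span>.</p>
  <img itemprop="img" src="hedral.jpeg"
       alt="" title="Hedral, age 18 months">
</section>
'''

parser = TinyMicrodataParser()
parser.feed(SAMPLE)
parser.close()
item = parser.item
```

A property that appears twice, like the color here, simply yields two values under the same name.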
MicroData Widgets
HTML5 MicroData
•   Easy to use
•   Strong identifiers
•   Extensible vocabulary
•   Easy to parse


•   In last call for comments stage!
    •   Usable! Now!
Vocabulary Powered Search
                     Search Applications:
                     - Enhanced results
                     - Info Bar
<div class="hReview-aggregate">
<div class="item vcard">
  <h1 class="fn org">Taylor&#39;s Automatic Refresher</h1>
  <div class=rating>
    <img class="stars_3_half rating average" width="83" height="325" title="3.5 star rating" alt="3.5 star rating"
         src="http://static1.px.yelp.com/static/2843250757/i/new/ico/stars/stars_map.png"/></div>
  <em>based on <span class="count">888</span> reviews</em>
</div>

<div id="bizInfoContent">
  <p id="bizCategories">Category:
  <span id="cat_display"><a href="/c/sf/burgers">Burgers</a> </span>
<address class="adr">
  Neighborhood: Embarcadero<br/>
  <span class="street-address">1 Ferry Bldg<br />Marketplace Shop #6</span><br />
  <span class="locality">San Francisco</span>,
  <span class="region">CA</span>
  <span class="postal-code">94111</span><br />
</address>
<span id="bizPhone" class="tel">(866) 328-3663</span>
Search Monkey Vocabulary
Search Monkey Vocabulary
DBPedia Place Vocabulary
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<rdf:Description rdf:about="http://dbpedia.org/ontology/areaTotal"><rdfs:domain rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description>
<rdf:Description rdf:nodeID="b29203"><rdf:first rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description>
<rdf:Description rdf:about="http://dbpedia.org/ontology/Place/nickname"><rdfs:domain rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description>
<rdf:Description rdf:about="http://dbpedia.org/ontology/Place/location"><rdfs:range rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description>
<rdf:Description rdf:about="http://dbpedia.org/ontology/maximumDepth"><rdfs:domain rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description>
<rdf:Description rdf:about="http://dbpedia.org/ontology/Place/maximumElevation"><rdfs:domain rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description>
<rdf:Description rdf:nodeID="b29250"><rdf:first rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description>
<rdf:Description rdf:about="http://dbpedia.org/ontology/nearestCity"><rdfs:domain rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description>
<rdf:Description rdf:about="http://dbpedia.org/ontology/PopulatedPlace"><rdfs:subClassOf rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description>
<rdf:Description rdf:about="http://dbpedia.org/ontology/Place/maximumDepth"><rdfs:domain rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description>
<rdf:Description rdf:about="http://dbpedia.org/ontology/Place/location"><rdfs:domain rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description>
<rdf:Description rdf:nodeID="b29225"><rdf:first rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description>
Rich Snippet Vocabulary
•   name 	
•   affiliation 	
•   nickname 	 	
•   price	
•   postal-code 	
•   dtReviewed
•   photo 	 	
•   country-name
•   locality
•   reviewer
•   region
•   count
•   address
•   itemReviewed
•   title
•   brand
•   category
•   role

              http://data-vocabulary.org
Rich Snippet Vocabulary
<rdf:Property rdf:ID="affiliation">
 <rdfs:comment>An affiliation can be specified by a string literal or an Organization instance.</rdfs:comment>
 <rdfs:domain rdf:resource="#Person"/>
 <rdfs:range>
   <owl:Class>
    <owl:unionOf rdf:parseType="Collection">
      <owl:Class rdf:about="#Organization"/>
      <owl:Class rdf:about="xsd:string"/>
    </owl:unionOf>
   </owl:Class>
 </rdfs:range>
</rdf:Property>

<rdf:Property rdf:ID="brand">
 <rdfs:domain rdf:resource="#Product"/>
</rdf:Property>

<rdf:Property rdf:ID="category">
 <rdfs:domain>
   <owl:Class>
    <owl:unionOf rdf:parseType="Collection">
      <owl:Class rdf:about="#Organization"/>
      <owl:Class rdf:about="#Product"/>
    </owl:unionOf>
   </owl:Class>
 </rdfs:domain>
</rdf:Property>
HTML5 Vocabularies
Vocab Hub
http://microdata.freebaseapps.com/
Part II
          How

(or why we wrote the book)
The Office (US)                                       Leatherheads
TV Program                                           Film




                 stars in                        starred in




                            John Krasinski
                            Person, Actor




                                      attended




                            Brown University
                            College/university




                 Rich Graph Data
Connected to other rich sources
Where does your data live?
Traditional data-modeling
Tabular data

Restaurant          Address          Cuisine          Price   Open
Deli Llama          Peachtree Rd     Deli             $       Mon, Tue, Wed, Thu, Fri
Peking Inn          Lake St          Chinese          $$$     Thu, Fri, Sat
Thai Tanic          Branch Dr        Thai             $$      Tue, Wed, Thu, Fri, Sat, Sun
Lord of the Fries   Flower Ave       Fast food        $$      Tue, Wed, Thu, Fri, Sat, Sun
Marquis de Salade   Main St          French           $$$     Thu, Fri, Sat
Wok this way        Second St        Chinese          $       Mon, Tue, Wed, Thu, Fri, Sat, Sun
Luna Sea            Autumn Dr        Seafood          $$$     Tue, Thu, Fri, Sat
Pita Pan            Thunder Rd       Middle Eastern   $$      Mon, Tue, Wed, Thu, Fri, Sat, Sun
Award Weiners       Dorfold Mews     Fast food        $       Mon, Tue, Wed, Thu, Fri, Sat
Lettuce Eat         Rustic Parkway   Deli             $$      Mon, Tue, Wed, Thu, Fri




                     The beloved spreadsheet
Tabular Data



Restaurant     Address        Cuisine   Price   Open
Deli Llama     Peachtree Rd   Deli      $       Mon (11a-4p), Tue (11a-4p), Wed (11a-4p), Thu (11a-7p), Fri (11a-8p)
Peking Inn     Lake St        Chinese   $$$     Thu (5p-10p), Fri (5p-1a), Sat (5p-1a)
etc…




           Too much information, not enough cells
A simple schema

 Restaurant            Hours
 id                    restaurant_id
 name                  day
 address               open
 cuisine_id            close




Cuisine
id
name

     Allows for simple queries
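
A minimal sketch of the schema above, using Python's built-in sqlite3 module. The column names open_hour and close_hour, the 24-hour encoding, and the helper open_at are assumptions made for illustration; the sample rows follow the restaurant table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cuisine (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE restaurant (
    id INTEGER PRIMARY KEY,
    name TEXT,
    address TEXT,
    cuisine_id INTEGER REFERENCES cuisine(id)
);
CREATE TABLE hours (
    restaurant_id INTEGER REFERENCES restaurant(id),
    day TEXT,
    open_hour INTEGER,   -- assumed: 24-hour clock
    close_hour INTEGER
);
INSERT INTO cuisine VALUES (1, 'Deli'), (2, 'Chinese');
INSERT INTO restaurant VALUES
    (1, 'Deli Llama', 'Peachtree Rd', 1),
    (2, 'Peking Inn', 'Lake St', 2);
INSERT INTO hours VALUES
    (1, 'Mon', 11, 16), (1, 'Tue', 11, 16), (1, 'Thu', 11, 19),
    (2, 'Fri', 17, 23);
""")


def open_at(day, hour):
    """Return names of restaurants open on the given day at the given hour."""
    rows = conn.execute(
        "SELECT r.name FROM restaurant r "
        "JOIN hours h ON h.restaurant_id = r.id "
        "WHERE h.day = ? AND h.open_hour <= ? AND h.close_hour > ? "
        "ORDER BY r.name",
        (day, hour, hour),
    )
    return [name for (name,) in rows]


def by_cuisine(cuisine):
    """Return names of restaurants serving the given cuisine."""
    rows = conn.execute(
        "SELECT r.name FROM restaurant r "
        "JOIN cuisine c ON c.id = r.cuisine_id "
        "WHERE c.name = ? ORDER BY r.name",
        (cuisine,),
    )
    return [name for (name,) in rows]
```

Questions the schema anticipated, such as open_at("Thu", 18), stay one join away. The slides that follow show what happens when the data stops fitting.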
A simple schema


Restaurant
id    name          address        price
1     Deli Llama    Peachtree Rd   $
2     Peking Inn    Lake St        $$$
...

Hours
restaurant_id   day   open   close
1               Mon   11     16
1               Tue   11     16
1               Thu   11     19
2               Fri   5      23
...




                               Filled with data
Some new data


      Bar              Address      DJ    Best Drink
  The Bitter End       14th Ave     No        Beer
   Peking Inn           Lake St     No    Scorpion Bowl
  Hammer Time          Wildcat Dr   Yes    Hennessy
Marquis de Salade       Main St     Yes      Martini




      This doesn’t fit into our schema...
Half-empty columns

  Restaurant           Address          Price   DJ    Best Drink
  Deli Llama           Peachtree Rd     $
  Peking Inn           Lake St          $$$     No    Scorpion Bowl
  Thai Tanic           Branch Dr        $$
  Lord of the Fries    Flower Ave       $$
  Marquis de Salade    Main St          $$$     Yes   Martini
  Wok this way         Second St        $
  Luna Sea             Autumn Dr        $$$
  Pita Pan             Thunder Rd       $$
  Award Weiners        Dorfold Mews     $
  Lettuce Eat          Rustic Parkway   $$
  Hammer Time          Wildcat Dr               Yes   Hennessy
  The Bitter End       14th Ave                 No    Beer


Maybe ok now, but can’t this keep happening?
Link the tables


Restaurant       RB_Link          Bar
id               restaurant_id    id
name             bar_id           name
address                           dj
cuisine_id                        best_drink




But now the information is duplicated :(
Split place / purpose
Venue        Hours        Restaurant     Bar
id           venue_id     id             id
name         day          venue_id       venue_id
address      open         cuisine_id     dj
             close                       best_drink


   Better, but now we have to “migrate”
Large schemas




A small section of a limited product
A flexible schema

Venue          Properties
id             venue_id
name           field_id
address        value



               field
               id
               name


Does this look familiar?
Add some data
Venue
id    name          address
1     Deli Llama    Peachtree Rd
2     Peking Inn    Lake St
...

Properties
venue_id   field_id   value
1          1          Deli
1          2          $
2          1          Chinese
2          2          $$$
2          3          Scorpion Bowl
2          4          No

Field
id   name
1    Cuisine
2    Price
3    Specialty Cocktail
4    DJ?

                     simple enough...
Add live music info
Venue
id    name          address
1     Deli Llama    Peachtree Rd
2     Peking Inn    Lake St
3     Thai Tanic    Branch Dr

Properties
venue_id   field_id   value
1          1          Deli
1          2          $
2          1          Chinese
2          2          $$$
2          3          Scorpion Bowl
2          4          No
3          5          Yes
3          6          Jazz

Field
id   name
1    Cuisine
2    Price
3    Specialty Cocktail
4    DJ?
5    Live Music
6    Music Genre
            No schema change required
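
The flexible schema can be sketched with Python's built-in sqlite3 module. Table and column names follow the slides; the helpers describe and columns are hypothetical names introduced here. The point to check is that adding live-music information only inserts rows and leaves the table definitions untouched.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE venue (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE field (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE properties (venue_id INTEGER, field_id INTEGER, value TEXT);
INSERT INTO venue VALUES
    (1, 'Deli Llama', 'Peachtree Rd'),
    (2, 'Peking Inn', 'Lake St'),
    (3, 'Thai Tanic', 'Branch Dr');
INSERT INTO field VALUES
    (1, 'Cuisine'), (2, 'Price'), (3, 'Specialty Cocktail'), (4, 'DJ?');
INSERT INTO properties VALUES
    (1, 1, 'Deli'), (1, 2, '$'),
    (2, 1, 'Chinese'), (2, 2, '$$$'), (2, 3, 'Scorpion Bowl'), (2, 4, 'No');
""")


def describe(venue_id):
    """Collect every (field name, value) pair recorded for one venue."""
    rows = db.execute(
        "SELECT f.name, p.value FROM properties p "
        "JOIN field f ON f.id = p.field_id WHERE p.venue_id = ?",
        (venue_id,),
    )
    return dict(rows)


def columns(table):
    """List the column names of a table (table name is trusted here)."""
    return [row[1] for row in db.execute(f"PRAGMA table_info({table})")]


columns_before = columns("properties")

# New kinds of facts arrive as rows, not as ALTER TABLE statements.
db.executescript("""
INSERT INTO field VALUES (5, 'Live Music'), (6, 'Music Genre');
INSERT INTO properties VALUES (3, 5, 'Yes'), (3, 6, 'Jazz');
""")

columns_after = columns("properties")
```

The trade-off is that every value is stored as text and the database no longer knows which fields a venue is supposed to have.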
Explicit semantics
The basic data unit




subject     predicate         object



   Remember this from grammar class?
Restaurants as triples
            subject   predicate          object
              S1         cuisine          “Deli”
              S1          price             “$”
              S1          name         “Deli Llama”
              S2         cuisine        “Chinese”
              S2          price            “$$$”
              S2          name         “Peking Inn”
              S2       best drink    “Scorpion Bowl”
              S2        address         “Lake St”
              S2           DJ?             “No”
              S4          name         “Fendalton”
              S4      contained-by          S5
              S5          name        “Christchurch”
              S1        location            S4
              S6          name         “Downtown”
              S6      contained-by          S7
              S7          name       “Wellington, NZ”
              S2        location            S6

Machine readable and almost human readable
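
As a rough illustration, the triples above fit in a plain Python list, and a query is just a filter over it. The helper names objects and city_of are hypothetical; Peking Inn's price follows the restaurant table.

```python
TRIPLES = [
    ("S1", "cuisine", "Deli"),
    ("S1", "price", "$"),
    ("S1", "name", "Deli Llama"),
    ("S2", "cuisine", "Chinese"),
    ("S2", "price", "$$$"),
    ("S2", "name", "Peking Inn"),
    ("S2", "best drink", "Scorpion Bowl"),
    ("S2", "address", "Lake St"),
    ("S2", "DJ?", "No"),
    ("S4", "name", "Fendalton"),
    ("S4", "contained-by", "S5"),
    ("S5", "name", "Christchurch"),
    ("S1", "location", "S4"),
    ("S6", "name", "Downtown"),
    ("S6", "contained-by", "S7"),
    ("S7", "name", "Wellington, NZ"),
    ("S2", "location", "S6"),
]


def objects(triples, subject, predicate):
    """Return every object stated for a (subject, predicate) pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def city_of(triples, restaurant):
    """Follow location -> contained-by -> name to find a restaurant's city."""
    for neighbourhood in objects(triples, restaurant, "location"):
        for city in objects(triples, neighbourhood, "contained-by"):
            names = objects(triples, city, "name")
            if names:
                return names[0]
    return None
```

An object that is itself an identifier (S4, S5) can be used as the subject of further triples, which is what turns the list into a graph.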
...or as a graph
                     Deli Llama
         Name

         Cuisine
    S1                  Deli
             Price

                         $
Restaurant Graph
     Peking Inn                                  Deli Llama
                                   Name

                                    Cuisine
           Name            S1                        Deli
                                        Price
          S2
                                                      $
                            Location
Cuisine         Location


Chinese                         Contained-by
                                                Christchurch
                           S4
                                  Name           Fendalton
Extending The Restaurant Model
                                                   Deli Llama
Urban Chic                           Name
                     Decor
                                      Cuisine
                             S1                        Deli
             Music                        Price

                                                        $
                              Location

  Live DJ

                                  Contained-by
                                                  Christchurch
                             S4
                                    Name           Fendalton
Integrating Graph Data Models
                   Deli Llama
         Name
                                             Deli Llama
                                 Name
A2
                                 Cuisine
                        S1                      Deli
                                     Price
 OnTap
                                                 $


Z6       Brand
                 Leinenkugel
     Brand
                 Pabst BR
What Went Wrong?
                            Scripting languages
                            facilitate change...

                            ...where is the data
                            model that does the
                            same?

Things change
Requirements change
User expectations change
Data structures change

Our data models aren’t keeping up
Semantic Representation


Relationships are represented explicitly
Schema can be represented as a graph
Data integration is the union of two graphs
This makes creating, extending, and
combining data much easier than before
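
A minimal sketch of "integration is the union of two graphs", using Python sets of triples. It assumes both sources already identify the restaurant with the same identifier (S1); the "on tap" and "brand" predicates and the Z6 node are read off the beer example in the earlier diagram, and the helper names are hypothetical.

```python
# Source 1: the restaurant guide.
restaurant_graph = {
    ("S1", "name", "Deli Llama"),
    ("S1", "cuisine", "Deli"),
    ("S1", "price", "$"),
}

# Source 2: a beer listing that (by assumption) reuses the identifier S1.
bar_graph = {
    ("S1", "on tap", "Z6"),
    ("Z6", "brand", "Leinenkugel"),
    ("Z6", "brand", "Pabst BR"),
}

# Integration is set union: no schema migration, no link table.
merged = restaurant_graph | bar_graph


def objects(graph, subject, predicate):
    """Return the sorted objects stated for a (subject, predicate) pair."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def brands_on_tap(graph, restaurant):
    """Walk restaurant -> on tap -> brand across whatever the graph holds."""
    return sorted(
        brand
        for tap in objects(graph, restaurant, "on tap")
        for brand in objects(graph, tap, "brand")
    )
```

The union only connects the two sources because they agree on the identifier, which is why the next part of the talk turns to URIs.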
Just enough RDF
Just Enough RDF

RDF is a Data Model
   A very simple model!
Cosmos was written by Carl Sagan
Subject Predicate Object



(Cosmos) (was written by) (Carl Sagan)


               author       Carl
   Cosmos
                           Sagan
Subject   Which Cosmos?




(Cosmos)
Identifiers are Everywhere




#w2e
The humble URI

•URIs provide strong references
 •Much like pointing in the physical
  world
             “this is red”
            “this is a pen”
 •A URIref is an unambiguous pointer
  to something of meaning
Subject                      Which Cosmos?




(Cosmos)



 http://rdf.freebase.com/ns/authority.openlibrary.book.OL3568862M
What do you mean, author?

http://rdf.freebase.com/ns/book.written_work.author




                   author                Carl
Cosmos
                                        Sagan

              vocabulary
There are billions of Carl Sagans...
      http://rdf.freebase.com/ns/en.carl_sagan




  Cosmos            author
            published “1980”

            author                 Carl
Cosmos
                                  Sagan
RDF Data Model


Nodes (“Subjects”)
connect via Links (“Predicates”)
to Objects
 •   either Nodes or Literals
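
The model can be sketched with a few Python dataclasses. The three Freebase URIs are the ones shown on the preceding slides; the class names and the "published" predicate URI (under example.org) are hypothetical, and blank nodes are left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Triple:
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]

    def __post_init__(self):
        # Subjects and predicates are nodes/links, never plain strings.
        if not isinstance(self.subject, URI) or not isinstance(self.predicate, URI):
            raise TypeError("subject and predicate must be URIs")
        # Objects may be either another node or a literal value.
        if not isinstance(self.obj, (URI, Literal)):
            raise TypeError("object must be a URI or a Literal")


FB = "http://rdf.freebase.com/ns/"
cosmos = URI(FB + "authority.openlibrary.book.OL3568862M")
author = URI(FB + "book.written_work.author")
sagan = URI(FB + "en.carl_sagan")
published = URI("http://example.org/vocab/published")  # hypothetical predicate

graph = [
    Triple(cosmos, author, sagan),               # object is a node
    Triple(cosmos, published, Literal("1980")),  # object is a literal
]
```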
Expressions of RDF


RDF has many (inconvenient) serializations
   •RDF/XML
   •N3
   •Turtle
   •N-Triples
   •RDFa
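
Of these, N-Triples is the simplest to write by hand: one triple per line, URIs in angle brackets, literals in double quotes, and a terminating period. The sketch below is not a full implementation (no language tags, datatypes, or Unicode escapes); the ("uri", ...) / ("literal", ...) tagging and the function names are assumptions made for illustration, and the name literal is illustrative.

```python
FB = "http://rdf.freebase.com/ns/"


def nt_term(term):
    """Render one tagged term, ("uri", value) or ("literal", value)."""
    kind, value = term
    if kind == "uri":
        return f"<{value}>"
    # Escape backslashes first, then quotes and newlines.
    escaped = (
        value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples, one per line."""
    return "\n".join(
        f"{nt_term(s)} {nt_term(p)} {nt_term(o)} ." for s, p, o in triples
    )


triples = [
    (
        ("uri", FB + "authority.openlibrary.book.OL3568862M"),
        ("uri", FB + "book.written_work.author"),
        ("uri", FB + "en.carl_sagan"),
    ),
    (
        ("uri", FB + "en.carl_sagan"),
        ("uri", FB + "type.object.name"),
        ("literal", "Carl Sagan"),  # illustrative value
    ),
]

document = to_ntriples(triples)
```

The same triples could be written in any of the other serializations; the data model does not change, only the syntax.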
URIs provide identity
http://rdf.freebase.com/ns/en.robert_cook



  Stability
  Simplicity
  Manageability
Not all URLs are good identifiers
Pluggable Data

                 Semantics allows an
                 application to utilize
                 unanticipated new
                 data sources
Pluggable Data
Data Portability

                Semantics allows data to
                be utilized by
                unanticipated new
                applications
Data Portability




     http://dev.mqlx.com/~jamie/simile/timeline.html
Data Portability
Why Does This Work?

 Semantics facilitate shared meaning through
• Subject Identity
• Strong and Consistent Semantics
• Open APIs + Open Data
 These principles make it much easier to
 extend, combine, and integrate data
RDF Graphs
 Carrie Fisher   --Starred In-->   Star Wars
 Harrison Ford   --Starred In-->   Star Wars
 Harrison Ford   --Starred In-->   Blade Runner
 Daryl Hannah    --Starred In-->   Blade Runner
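
A small sketch of a query over this graph, in plain Python. The four edges are read off the diagram (each actor linked to a film by "Starred In"); films_of and co_stars are hypothetical helper names.

```python
STARRED_IN = [
    ("Carrie Fisher", "Star Wars"),
    ("Harrison Ford", "Star Wars"),
    ("Harrison Ford", "Blade Runner"),
    ("Daryl Hannah", "Blade Runner"),
]


def films_of(actor):
    """Follow Starred In edges outward from an actor."""
    return {film for person, film in STARRED_IN if person == actor}


def co_stars(actor):
    """Go actor -> film -> other actors: a two-hop walk over the graph."""
    films = films_of(actor)
    return sorted(
        {person for person, film in STARRED_IN if film in films and person != actor}
    )
```

Here co_stars("Harrison Ford") walks through both films, which is the kind of multi-hop question a triple store is built to answer.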
Triple Stores
(aka Graph Stores)
AllegroGraph
Keep your data as flexible as the source
Strong Identifiers

Strong Semantics
(strong vocabularies)

    Open Data
Can describe?!
At the end of this talk, you should be able to say
how semantics benefits each of these groups




•   Semantics Benefit
    •   Site owners
    •   Site users
    •   Developers
    •   You
Using Semantics to Enhance Content Publishing

  • 1. Integrating the Cloud into Content Using Semantics to Enhance Content Publishing Jamie Taylor http://semprog.com/presentations/web20ny
  • 2. What do y'all mean "Semantics"
  • 3. critique misfortune bad luck occurrence roast KNOCK sound zing knocking zizz vroom bang belt bash bump rap whack blow
  • 4. critique misfortune bad luck occurrence roast LJOMF sound zing knocking zizz vroom bang belt bash bump rap whack blow
  • 5. IBM
  • 6. 1 New Orchard Road Publicaly Listed Armonk, New York Company rs Le 0000051143 arte ga lS NYSE:IBM dqu tru ol Hea ctu 1889 b K ym CI re Dat S e Fou e r nde Ti ck d Thomas Watson Founders Sam Palmisano IBM CEO SIC O pe 3571:Electronic ra IC t Soft es Computers NA diari in g In war co m i e De Subs e 334111:Electronic 17,604,000,000 Computer Manufacturing velo ped USD 2006 Cognos Cross Worlds SANSF, ViaVoice Lotus Notes
  • 7. 1 New Orchard Road Publicaly Listed Armonk, New York Company rs Le 0000051143 arte ga lS NYSE:IBM dqu tru ol Hea ctu 1889 b K ym CI re Dat S e Fou e r nde Ti ck d Thomas Watson Founders Sam Palmisano CEO SIC O pe 3571:Electronic ra IC t Soft es Computers NA diari in g In war co m i e De Subs e 334111:Electronic 17,604,000,000 Computer Manufacturing velo ped USD 2006 Cognos Cross Worlds SANSF, ViaVoice Lotus Notes
  • 8.
  • 9.
  • 10.
  • 12. PageRank tm
  • 13. 1 New Orchard Road Publicaly Listed Armonk, New York Company 0000051143 NYSE:IBM 1889 Thomas Watson Sam Palmisano 3571:Electronic Computers 334111:Electronic 17,604,000,000 Computer Manufacturing USD 2006 Cognos Cross Worlds SANSF, ViaVoice Lotus Notes
  • 14.
  • 15. Earlier this year, the AP slashed prices to try to hold on to subscribers. That's not the answer, says Jeff Jarvis, journalism professor at City University of New York. JEFF JARVIS: The fundamentals of the media economy are changing, from a content economy to a link-based economy. Jarvis says the AP needs to become the broker for those links, like helping the Baltimore Sun link to a story about GM from the Detroit Free Press.
  • 16. Jarvis resorts to the concept of a "gift economy" to explain the link economy http://www.flickr.com/photos/pagedooley/
  • 17. I am a behavioral economist. Gift economics are frequently used as explanations for what we don't understand
  • 18. Worse I am a Behaviorist Only talk about what you can observe
  • 19. Semantics Process of communicating enough meaning to result in an action
  • 20. Link Economy • Enriching links focuses meaning • Improves "findability" (SEO) • Increased usability • Better ad selection
  • 21. Link Economy At the end of this talk - you should be able to say how semantics benefits each of these groups • Semantics Benefit • Site owners • Site users • Developers • You
  • 22.
  • 23.
  • 24. Wish it were real
  • 26. Is real, but don't believe it
  • 27. Is very useful Build Flexible Applications with Graph Data
  • 29. The W3C Layer Cake The Cake taken from http://www.w3.org/2007/Talks/0130-sb-W3CTechSemWeb/layerCake-4.png
  • 30. AI Agents http://www.flickr.com/photos/matthewtownsend/
  • 32. RDF Serialization Formats <http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000005b7ab1a> <http://www.w3.org/1999/02/22-rdf-syntax- ns#type> <http://rdf.freebase.com/ns/business.employment_tenure>. <http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000005b7ab1a> <http://rdf.freebase.com/ns/ business.employment_tenure.company> <http://rdf.freebase.com/ns/en.determine_software>. <http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000007e53e16> <http://rdf.freebase.com/ns/ education.education.institution> <http://rdf.freebase.com/ns/en.mounds_view_high_school>.<http://rdf.freebase.com/ns/ guid.9202a8c04000641f8000000007e53e16> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http:// rdf.freebase.com/ns/education.education>. <http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000007e53e16> <http://rdf.freebase.com/ns/ education.education.student> <http://rdf.freebase.com/ns/en.jamie_taylor>. <http://rdf.freebase.com/ns/en.jamie_taylor> <http://rdf.freebase.com/ns/business.company_founder.companies_founded> <http://rdf.freebase.com/ns/en.mobius_net>. <http://rdf.freebase.com/ns/en.jamie_taylor> <http://creativecommons.org/ns#attributionName> "Source: Freebase - The World's database". <http://rdf.freebase.com/ns/en.jamie_taylor> <http://rdf.freebase.com/ns/people.person.nationality> <http:// rdf.freebase.com/ns/en.united_states>. <http://rdf.freebase.com/ns/en.jamie_taylor> <http://rdf.freebase.com/ns/common.topic.image> <http://rdf.freebase.com/ ns/en.jamie_headshot>. <http://rdf.freebase.com/ns/en.jamie_taylor> <http://rdf.freebase.com/ns/type.object.name> "Jamie Taylor"@en. <http://rdf.freebase.com/ns/en.jamie_taylor> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http:// rdf.freebase.com/ns/user.skud.freebase_events.tshirt_recipient>. <http://rdf.freebase.com/ns/en.jamie_taylor> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http:// rdf.freebase.com/ns/user.skud.freebase_events.topic>. 
<http://rdf.freebase.com/ns/en.jamie_taylor> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http:// rdf.freebase.com/ns/book.author>. <http://rdf.freebase.com/ns/en.jamie_taylor> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http:// rdf.freebase.com/ns/people.person>.
  • 33. Instead.... Part I - so you can explain to other Part II - so you can do what you say • Part I • Why • Uses, Benefits • Part II • How • Representation, Concepts
  • 35. Is very useful Build Flexible Applications with Graph Data
  • 36. The Office (US) Leatherheads TV Program Film stars in starred in John Krasinski Person, Actor attended Brown University College/university Graph Data Model
  • 37. A socially managed semantic database
  • 38. Freebase has Many Types of Things
  • 39.
  • 40.
  • 42. Contributions over $50000 made to members of the US congress in the 2008 election cycle by companies headquartered outside of the United States topic: topic: Barack Obama Switzerland government position held took money from is based in topic: topic: United States UBS AG Senator Freebase
  • 43. Industry Browser Identity Model Industry (USCB) Company Company Donations NAICS Ticker CRP CRP ID CRP CRP ID NAICS/SIC Map SEC Freebase Industry (SEC) Company People Person SIC SEC CIK SEC CIK Freebase Wikipedia Freebase Wikipedia Location Article ZIP Code
  • 45. Barriers between science and the humanities impede solving humanities important problems Web 2.0 + Semantics
  • 48. http://www.freebase.com/widget/topic? mode=i&pane=image,article_props& id=/en/pirates_of_the_caribbean_3 http://www.freebase.com/widget/topic? mode=i&pane=image,article_props&id=/en/blade_runner
  • 49.
  • 50.
  • 52. About the Content (and visitor?)
  • 55. Data Portability Data Data Semantics allows data to be utilized by Data unanticipated new applications Data
  • 63. <rdf:Description rdf:nodeID="A1"> <att:lastupdated>2009-06-18T21:22:28</att:lastupdated> <att:text>IBM Corporation And Siemens Announce Integrated Solutions To Help Companies</att:text> </rdf:Description> <rdf:Description rdf:nodeID="A2"> <att:code>3577</att:code> <att:description>Computer Periph'L Equipment, Nec</att:description> </rdf:Description> <rdf:Description rdf:nodeID="A3"> <att:code>7371</att:code> <att:description>Computer Programming Services</att:description> </rdf:Description> <rdf:Description rdf:nodeID="A4"> <att:age>46</att:age> <att:lastname>Iwata</att:lastname> <att:officerurl rdf:resource="http://www.reuters.com/finance/stocks/ officerProfile?symbol=IBM.N&amp;officerId=222727"/> <att:firstname>Jon</att:firstname> <att:title>Senior Vice President - Marketing and Communications</att:title> <att:middle>C.</att:middle> </rdf:Description> http://p.opencalais.com/er/company/ralg-tr1r/9e3f6c34-aa6b-3a3b-b221-a07aa7933633 Open Calais
  • 64. <owl:sameAs rdf:resource="http://dbpedia.org/resource/IBM"/> <owl:sameAs rdf:resource="http://cb.semsol.org/company/ibm#self"/> A Graph of Graphs <owl:sameAs rdf:resource="http://p.opencalais.com/er/company/ralg- tr1r/9e3f6c34-aa6b-3a3b-b221-a07aa7933633"/>
  • 65. Epispider Herman Tolentino et al. http://epispider.net/index.php
  • 67. Vocabulary Do you understand the words that are coming out of my mouth? -Chris Tucker, Rush Hour
  • 68. IBM (graph of properties) • Legal Structure: Publicly Listed Company • Headquarters: 1 New Orchard Road, Armonk, New York • CIK: 0000051143 • Ticker Symbol: NYSE:IBM • Date Founded: 1889 • Founders: Thomas Watson • CEO: Sam Palmisano • SIC: 3571:Electronic Computers • NAICS: 334111:Electronic Computer Manufacturing • Operating Income: 17,604,000,000 USD (2006) • Subsidiaries: Cognos, Cross Worlds • Software Developed: SANSF, ViaVoice, Lotus Notes
  • 69. Epispider Herman Tolentino et al. http://epispider.net/index.php
  • 70. vocabularies...are everywhere
  • 71. The Twitter Vocabulary: @, #, Short URLs
  • 72. Pivot on an @ tag
  • 73. Pivot on a # tag
  • 75. Vocabularies make links more understandable ...and thus content more findable
  • 76. microformats: Annotate existing HTML so the content can be "extracted by software and indexed, searched for, saved, cross-referenced or combined."
  • 78. microformats <div class="vcard"> ..... <div id="view"> <div id="home"> <table> <tr> <td class="f">address</td> <td class="v"> <div class="adr"> <span class="locality">Berkeley</span>, <span class="region">CA</span> <div class="country-name">United States</div> </div> </td> </tr> <tr> <td class="f">aim</td> <td class="v"><a id="aim" class="url im offline" href="aim:goim?screenname=jaredhanson@mac.com">jaredhanson@mac.com</a></td> </tr>
  • 80. microformats • (Relatively) easy to use • Small, fixed vocabulary • No standard parsing pattern • No strong identifiers • Limits utility
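Because microformats reuse ordinary class attributes, each consumer ends up writing its own extraction logic. A minimal sketch of that, using only Python's standard library and the `adr` class names from the hCard slide; the `AdrParser` class and `extract_adr` helper are illustrative names, not part of any microformats toolkit, and the parser assumes well-nested markup without unclosed void tags such as a bare `<br>`.

```python
from html.parser import HTMLParser

# Class names taken from the hCard "adr" markup shown on the slide.
ADR_FIELDS = {"locality", "region", "country-name"}


class AdrParser(HTMLParser):
    """Collect the text of elements whose class list names an adr sub-property."""

    def __init__(self):
        super().__init__()
        self.open_fields = []  # one entry per open element: field name or None
        self.found = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        field = next((c for c in classes if c in ADR_FIELDS), None)
        self.open_fields.append(field)

    def handle_endtag(self, tag):
        if self.open_fields:
            self.open_fields.pop()

    def handle_data(self, data):
        field = self.open_fields[-1] if self.open_fields else None
        if field and data.strip():
            self.found[field] = data.strip()


def extract_adr(html):
    parser = AdrParser()
    parser.feed(html)
    return parser.found


SAMPLE = """
<div class="vcard">
  <div class="adr">
    <span class="locality">Berkeley</span>,
    <span class="region">CA</span>
    <div class="country-name">United States</div>
  </div>
</div>
"""

ADDRESS = extract_adr(SAMPLE)
```

Note that the extracted keys are bare strings such as `locality`; nothing in the markup says which vocabulary they belong to, which is the "no strong identifiers" limitation listed above.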
  • 81. RDFa Annotate HTML with machine readable RDF
  • 82. RDFa <div xmlns:fb="http://rdf.freebase.com/ns/" about="http://rdf.freebase.com/ns/en.jamie_taylor" rel="fb:people.person.place_of_birth"> <span resource="http://rdf.freebase.com/ns/en.saint_paul"/> </div>
  • 83. RDFa • Unambiguous identifiers • Extensible vocabulary • Standard parsing pattern • Produces RDF • Hard to use • Rules about formatting based on RDF
  • 84. What “concepts” are covered in content? Like existing tagging, but with strong identifiers! Diagram: <resource> --tagged--> Tag; Tag --taggingDate--> "2001-01-01"; Tag --label--> "text"; Tag --means--> <resource> (strong identifier goes here!)
  • 85. <div class="rdfa" xmlns:ctag="http://commontag.org/ns#"> NASA's <a typeof="ctag:Tag" rel="ctag:means" href="http://rdf.freebase.com/ns/en.phoenix_mars_mission" property="ctag:label">Phoenix Mars Lander</a> has deployed its robotic arm. </div>
  • 87. And the winner is....
  • 88. HTML5 MicroData • Annotate HTML with machine readable data • Simple Name-Value Pair design
  • 89. HTML5 MicroData Sometimes, it is desirable to annotate content with specific machine-readable labels, e.g. to allow generic scripts to provide services that are customised to the page, or to enable content from a variety of cooperating authors to be processed by a single script in a consistent manner.
  • 90. HTML5 Simple! 15 pages of 657 page spec
  • 91. HTML5 MicroData <section itemscope itemtype="http://example.org/animals#cat" itemid="http://semprog.com/jamiestuff/hedral"> <h1 itemprop="name">Hedral</h1> <p itemprop="desc">Hedral is a male american domestic shorthair, that is <span itemprop="http://example.com/color">black</span> and <span itemprop="http://example.com/color">white</span>.</p> <img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months"> </section>
  • 93. HTML5 MicroData • Easy to use • Strong identifiers • Extensible vocabulary • Easy to parse • In last call for comments stage! • Usable! Now!
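To illustrate the "easy to parse" claim, here is a rough sketch that pulls name-value pairs out of the Hedral example with Python's standard `html.parser`. It is a simplification under stated assumptions, not a conforming microdata parser: it ignores nested items, takes an element's text content as its value, and uses the raw `src` attribute for images. `MicrodataParser` and `extract_itemprops` are illustrative names.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID = {"img", "br", "hr", "meta", "link", "input"}


class MicrodataParser(HTMLParser):
    """Collect (itemprop, value) pairs in the order the elements close."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (itemprop or None, list of text chunks) per open element
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("itemprop")
        if tag in VOID:
            if prop and "src" in attributes:
                self.pairs.append((prop, attributes["src"]))
            return
        self.stack.append((prop, []))

    def handle_endtag(self, tag):
        if tag in VOID or not self.stack:
            return
        prop, chunks = self.stack.pop()
        if prop:
            # Join the collected text and normalize runs of whitespace.
            self.pairs.append((prop, " ".join("".join(chunks).split())))

    def handle_data(self, data):
        # Text belongs to every open element that carries an itemprop.
        for prop, chunks in self.stack:
            if prop:
                chunks.append(data)


def extract_itemprops(html):
    parser = MicrodataParser()
    parser.feed(html)
    return parser.pairs


HEDRAL = """
<section itemscope itemtype="http://example.org/animals#cat"
         itemid="http://semprog.com/jamiestuff/hedral">
  <h1 itemprop="name">Hedral</h1>
  <p itemprop="desc">Hedral is a male american domestic shorthair, that is
  <span itemprop="http://example.com/color">black</span> and
  <span itemprop="http://example.com/color">white</span>.</p>
  <img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months">
</section>
"""

PAIRS = extract_itemprops(HEDRAL)
```

The `color` properties are full URIs, so two sites using different vocabularies for "color" would not collide, while short names like `name` stay convenient for the common case.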
  • 94. Vocabulary Powered Search Search Applications: - Enhanced results - Info Bar
  • 95. <div class="hReview-aggregate"> <div class="item vcard"> <h1 class="fn org">Taylor&#39;s Automatic Refresher</h1> <div class=rating> <img class="stars_3_half rating average" width="83" height="325" title="3.5 star rating" alt="3.5 star rating" src="http://static1.px.yelp.com/static/2843250757/i/new/ico/stars/stars_map.png"/></div> <em>based on <span class="count">888</span> reviews</em> </div> <div id="bizInfoContent"> <p id="bizCategories">Category: <span id="cat_display"><a href="/c/sf/burgers">Burgers</a> </span> <address class="adr"> Neighborhood: Embarcadero<br/> <span class="street-address">1 Ferry Bldg<br />Marketplace Shop #6</span><br /> <span class="locality">San Francisco</span>, <span class="region">CA</span> <span class="postal-code">94111</span><br /> </address> <span id="bizPhone" class="tel">(866) 328-3663</span>
  • 97. <div class="hReview-aggregate"> <div class="item vcard"> <h1 class="fn org">Taylor&#39;s Automatic Refresher</h1> <div class=rating> <img class="stars_3_half rating average" width="83" height="325" title="3.5 star rating" alt="3.5 star rating" src="http://static1.px.yelp.com/static/2843250757/i/new/ico/stars/stars_map.png"/></div> <em>based on <span class="count">888</span> reviews</em> </div> <div id="bizInfoContent"> <p id="bizCategories">Category: <span id="cat_display"><a href="/c/sf/burgers">Burgers</a> </span> <address class="adr"> Neighborhood: Embarcadero<br/> <span class="street-address">1 Ferry Bldg<br />Marketplace Shop #6</span><br /> <span class="locality">San Francisco</span>, <span class="region">CA</span> <span class="postal-code">94111</span><br /> </address> <span id="bizPhone" class="tel">(866) 328-3663</span>
  • 100. DBpedia Place Vocabulary <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"> <rdf:Description rdf:about="http://dbpedia.org/ontology/areaTotal"><rdfs:domain rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description> <rdf:Description rdf:nodeID="b29203"><rdf:first rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description> <rdf:Description rdf:about="http://dbpedia.org/ontology/Place/nickname"><rdfs:domain rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description> <rdf:Description rdf:about="http://dbpedia.org/ontology/Place/location"><rdfs:range rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description> <rdf:Description rdf:about="http://dbpedia.org/ontology/maximumDepth"><rdfs:domain rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description> <rdf:Description rdf:about="http://dbpedia.org/ontology/Place/maximumElevation"><rdfs:domain rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description> <rdf:Description rdf:nodeID="b29250"><rdf:first rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description> <rdf:Description rdf:about="http://dbpedia.org/ontology/nearestCity"><rdfs:domain rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description> <rdf:Description rdf:about="http://dbpedia.org/ontology/PopulatedPlace"><rdfs:subClassOf rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description> <rdf:Description rdf:about="http://dbpedia.org/ontology/Place/maximumDepth"><rdfs:domain rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description> <rdf:Description rdf:about="http://dbpedia.org/ontology/Place/location"><rdfs:domain rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description> <rdf:Description rdf:nodeID="b29225"><rdf:first rdf:resource="http://dbpedia.org/ontology/Place"/></rdf:Description>
  • 101. Rich Snippet Vocabulary • name • affiliation • nickname • price • postal-code • dtReviewed • photo • country-name • locality • reviewer • region • count • address • itemReviewed • title • brand • category • role http://data-vocabulary.org
  • 102. Rich Snippet Vocabulary <rdf:Property rdf:ID="affiliation"> <rdfs:comment>An affiliation can be specified by a string literal or an Organization instance.</rdfs:comment> <rdfs:domain rdf:resource="#Person"/> <rdfs:range> <owl:Class> <owl:unionOf rdf:parseType="Collection"> <owl:Class rdf:about="#Organization"/> <owl:Class rdf:about="xsd:string"/> </owl:unionOf> </owl:Class> </rdfs:range> </rdf:Property> <rdf:Property rdf:ID="brand"> <rdfs:domain rdf:resource="#Product"/> </rdf:Property> <rdf:Property rdf:ID="category"> <rdfs:domain> <owl:Class> <owl:unionOf rdf:parseType="Collection"> <owl:Class rdf:about="#Organization"/> <owl:Class rdf:about="#Product"/> </owl:unionOf> </owl:Class> </rdfs:domain> </rdf:Property>
  • 105. Part II How (or why we wrote the book)
  • 107. Rich Graph Data: John Krasinski (Person, Actor) --stars in--> The Office (US) (TV Program); John Krasinski --starred in--> Leatherheads (Film); John Krasinski --attended--> Brown University (College/university)
  • 108. Connected to other rich sources
  • 109. Where does your data live?
  • 111. Tabular data
        Restaurant | Address | Cuisine | Price | Open
        Deli Llama | Peachtree Rd | Deli | $ | Mon, Tue, Wed, Thu, Fri
        Peking Inn | Lake St | Chinese | $$$ | Thu, Fri, Sat
        Thai Tanic | Branch Dr | Thai | $$ | Tue, Wed, Thu, Fri, Sat, Sun
        Lord of the Fries | Flower Ave | Fast food | $$ | Tue, Wed, Thu, Fri, Sat, Sun
        Marquis de Salade | Main St | French | $$$ | Thu, Fri, Sat
        Wok this way | Second St | Chinese | $ | Mon, Tue, Wed, Thu, Fri, Sat, Sun
        Luna Sea | Autumn Dr | Seafood | $$$ | Tue, Thu, Fri, Sat
        Pita Pan | Thunder Rd | Middle Eastern | $$ | Mon, Tue, Wed, Thu, Fri, Sat, Sun
        Award Weiners | Dorfold Mews | Fast food | $ | Mon, Tue, Wed, Thu, Fri, Sat
        Lettuce Eat | Rustic Parkway | Deli | $$ | Mon, Tue, Wed, Thu, Fri
        The beloved spreadsheet
  • 112. Tabular Data
        Restaurant | Address | Cuisine | Price | Open
        Deli Llama | Peachtree Rd | Deli | $ | Mon (11a-4p), Tue (11-4), Wed (11-4), Thu (11-7), Fri (11-8)
        Peking Inn | Lake St | Chinese | $$$ | Thu (5p-10p), Fri (5p-1a), Sat (5p-1a)
        etc…
        Too much information, not enough cells
  • 113. A simple schema
        Restaurant: id, name, address, cuisine_id
        Hours: restaurant_id, day, open, close
        Cuisine: id, name
        Allows for simple queries
  • 114. A simple schema
        Restaurant: id | name | address | price
        1 | Deli Llama | Peachtree Rd | $
        2 | Peking Inn | Lake St | $$$
        ...
        Hours: restaurant_id | day | open | close
        1 | Mon | 11 | 16
        1 | Tue | 11 | 16
        1 | Thu | 11 | 19
        2 | Fri | 5 | 23
        ...
        Filled with data
  • 115. Some new data
        Bar | Address | DJ | Best Drink
        The Bitter End | 14th Ave | No | Beer
        Peking Inn | Lake St | No | Scorpion Bowl
        Hammer Time | Wildcat Dr | Yes | Hennessey
        Marquis de Salade | Main St | Yes | Martini
        This doesn’t fit into our schema...
  • 116. Half-empty columns
        Restaurant | Address | Price | DJ | Best Drink
        Deli Llama | Peachtree Rd | $ | |
        Peking Inn | Lake St | $$$ | No | Scorpion Bowl
        Thai Tanic | Branch Dr | $$ | |
        Lord of the Fries | Flower Ave | $$ | |
        Marquis de Salade | Main St | $$$ | Yes | Martini
        Wok this way | Second St | $ | |
        Luna Sea | Autumn Dr | $$$ | |
        Pita Pan | Thunder Rd | $$ | |
        Award Weiners | Dorfold Mews | $ | |
        Lettuce Eat | Rustic Parkway | $$ | |
        Hammer Time | Wildcat Dr | | Yes | Hennessey
        The Bitter End | 14th St | | No | Beer
        Maybe ok now, but can’t this keep happening?
  • 117. Link the tables
        Restaurant: id, name, address, cuisine_id
        RB_Link: restaurant_id, bar_id
        Bar: id, name, dj, best_drink
        But now the information is duplicated :(
  • 118. Split place / purpose
        Venue: id, name, address
        Hours: venue_id, day, open, close
        Restaurant: id, venue_id, cuisine_id
        Bar: id, venue_id, dj, best_drink
        Better, but now we have to “migrate”
  • 119. Large schemas A small section of a limited product
  • 120. A flexible schema
        Venue: id, name, address
        Properties: venue_id, field_id, value
        Field: id, name
        Does this look familiar?
  • 121. Add some data
        Venue: id | name | address
        1 | Deli Llama | Peachtree Rd
        2 | Peking Inn | Lake St
        ...
        Properties: venue_id | field_id | value
        1 | 1 | Deli
        1 | 2 | $
        2 | 1 | Chinese
        2 | 2 | $$$
        2 | 3 | Scorpion Bowl
        2 | 4 | No
        Field: id | name
        1 | Cuisine
        2 | Price
        3 | Specialty Cocktail
        4 | DJ?
        simple enough...
  • 122. Add live music info
        Venue: id | name | address
        1 | Deli Llama | Peachtree Rd
        2 | Peking Inn | Lake St
        3 | Thai Tanic | Branch Dr
        Properties: venue_id | field_id | value
        1 | 1 | Deli
        1 | 2 | $
        2 | 1 | Chinese
        2 | 2 | $$$
        2 | 3 | Scorpion Bowl
        2 | 4 | No
        3 | 5 | Yes
        3 | 6 | Jazz
        Field: id | name
        1 | Cuisine
        2 | Price
        3 | Specialty Cocktail
        4 | DJ?
        5 | Live Music
        6 | Music Genre
        No schema change required
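The flexible schema can be tried out directly. Below is a minimal sketch using Python's built-in `sqlite3` with the Venue / Properties / Field layout from the slides; the lower-case table names and the helper functions `add_property` and `describe` are illustrative choices, not something the talk specifies.

```python
import sqlite3


def build_db():
    """Create the three-table flexible schema and load the slide's sample rows."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE venue (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
        CREATE TABLE field (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE properties (venue_id INTEGER, field_id INTEGER, value TEXT);
    """)
    db.executemany("INSERT INTO venue VALUES (?, ?, ?)", [
        (1, "Deli Llama", "Peachtree Rd"),
        (2, "Peking Inn", "Lake St"),
        (3, "Thai Tanic", "Branch Dr"),
    ])
    db.executemany("INSERT INTO field VALUES (?, ?)", [
        (1, "Cuisine"), (2, "Price"), (3, "Specialty Cocktail"), (4, "DJ?"),
    ])
    db.executemany("INSERT INTO properties VALUES (?, ?, ?)", [
        (1, 1, "Deli"), (1, 2, "$"),
        (2, 1, "Chinese"), (2, 2, "$$$"), (2, 3, "Scorpion Bowl"), (2, 4, "No"),
    ])
    return db


def add_property(db, venue_id, field_name, value):
    """Attach a value to a venue, creating the field row if it is new."""
    row = db.execute("SELECT id FROM field WHERE name = ?", (field_name,)).fetchone()
    if row is None:
        cursor = db.execute("INSERT INTO field (name) VALUES (?)", (field_name,))
        field_id = cursor.lastrowid
    else:
        field_id = row[0]
    db.execute("INSERT INTO properties VALUES (?, ?, ?)", (venue_id, field_id, value))
    return field_id


def describe(db, venue_name):
    """Return all properties of a venue as a dict of field name to value."""
    rows = db.execute("""
        SELECT f.name, p.value
        FROM properties AS p
        JOIN venue AS v ON v.id = p.venue_id
        JOIN field AS f ON f.id = p.field_id
        WHERE v.name = ?
        ORDER BY f.id
    """, (venue_name,)).fetchall()
    return dict(rows)


DB = build_db()
# New kinds of data arrive as rows, not as ALTER TABLE statements.
LIVE_MUSIC_ID = add_property(DB, 3, "Live Music", "Yes")
GENRE_ID = add_property(DB, 3, "Music Genre", "Jazz")
```

The trade-off is that every value is stored as text and every read needs joins, which is part of why the next slides move from this layout to triples.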
  • 124. The basic data unit: subject, predicate, object. Remember this from grammar class?
  • 125. Restaurants as triples
        subject | predicate | object
        S1 | cuisine | "Deli"
        S1 | price | "$"
        S1 | name | "Deli Llama"
        S2 | cuisine | "Chinese"
        S2 | price | "$"
        S2 | name | "Peking Inn"
        S2 | best drink | "Scorpion Bowl"
        S2 | address | "Lake St"
        S2 | DJ? | "No"
        S4 | name | "Fendalton"
        S4 | contained-by | S5
        S5 | name | "Christchurch"
        S1 | location | S4
        S6 | name | "Downtown"
        S6 | contained-by | S7
        S7 | name | "Wellington, NZ"
        S2 | location | S6
        Machine readable and almost human readable
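A small sketch of how these triples can be stored and queried, assuming plain Python tuples and a wildcard-matching helper. The `match` and `restaurants_in` functions are illustrative names, and representing both node ids and literal values as plain strings is a simplification; a real triple store distinguishes the two.

```python
# A subset of the triples from the slide, as (subject, predicate, object).
TRIPLES = [
    ("S1", "cuisine", "Deli"),
    ("S1", "price", "$"),
    ("S1", "name", "Deli Llama"),
    ("S2", "cuisine", "Chinese"),
    ("S2", "name", "Peking Inn"),
    ("S2", "best drink", "Scorpion Bowl"),
    ("S4", "name", "Fendalton"),
    ("S4", "contained-by", "S5"),
    ("S5", "name", "Christchurch"),
    ("S1", "location", "S4"),
    ("S6", "name", "Downtown"),
    ("S6", "contained-by", "S7"),
    ("S7", "name", "Wellington, NZ"),
    ("S2", "location", "S6"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def restaurants_in(triples, place_name):
    """Names of subjects whose location is a node with the given name."""
    names = []
    for place, _, _ in match(triples, p="name", o=place_name):
        for subject, _, _ in match(triples, p="location", o=place):
            names.extend(obj for _, _, obj in match(triples, s=subject, p="name"))
    return names


IN_FENDALTON = restaurants_in(TRIPLES, "Fendalton")
IN_DOWNTOWN = restaurants_in(TRIPLES, "Downtown")
```

Adding a new kind of fact means appending another tuple; the query helper does not change.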
  • 126. ...or as a graph: S1 --Name--> Deli Llama; S1 --Cuisine--> Deli; S1 --Price--> $
  • 127. Restaurant Graph: S1 (Name: Deli Llama; Cuisine: Deli; Price: $; Location edge); S2 (Name: Peking Inn; Cuisine: Chinese; Location edge); S4 (Name: Fendalton; Contained-by: Christchurch)
  • 128. Extending The Restaurant Model: S1 --Name--> Deli Llama; S1 --Decor--> Urban Chic; S1 --Cuisine--> Deli; S1 --Music--> Live DJ; S1 --Price--> $; S1 --Location--> S4; S4 --Name--> Fendalton; S4 --Contained-by--> Christchurch
  • 129. Integrating Graph Data Models: S1 --Name--> Deli Llama; S1 --Cuisine--> Deli; S1 --Price--> $; A2 --Name--> Deli Llama; A2 --OnTap--> Z6; Z6 --Brand--> Leinenkugel; Z6 --Brand--> Pabst BR
  • 130. What Went Wrong? Scripting languages facilitate change... where is the data model that does the same? • Things change • Requirements change • User expectations change • Data structures change • Our data models aren’t keeping up
  • 131. Semantic Representation • Relationships are represented explicitly • Schema can be represented as a graph • Data integration is the union of two graphs • This makes creating, extending, and combining data much easier than before
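"Data integration is the union of two graphs" can be taken quite literally. A rough sketch, assuming each graph is a set of (subject, predicate, object) tuples like the restaurant and on-tap graphs from the earlier slide. The `merge` function and the `same_as` mapping are illustrative: deciding that node A2 and node S1 denote the same restaurant is an identity decision made outside this code.

```python
def merge(graph_a, graph_b, same_as=None):
    """Union two triple sets, renaming nodes of graph_b via the same_as map."""
    same_as = same_as or {}

    def rename(node):
        return same_as.get(node, node)

    renamed_b = {(rename(s), p, rename(o)) for s, p, o in graph_b}
    return set(graph_a) | renamed_b


RESTAURANTS = {
    ("S1", "name", "Deli Llama"),
    ("S1", "cuisine", "Deli"),
    ("S1", "price", "$"),
}

ON_TAP = {
    ("A2", "name", "Deli Llama"),
    ("A2", "on_tap", "Z6"),
    ("Z6", "brand", "Leinenkugel"),
    ("Z6", "brand", "Pabst BR"),
}

# Assumption: A2 and S1 have been identified as the same venue.
MERGED = merge(RESTAURANTS, ON_TAP, same_as={"A2": "S1"})
```

After the merge, the duplicated name triple collapses into one and S1 gains the on-tap information, with no schema migration on either side.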
  • 133. Just Enough RDF RDF is a Data Model A very simple model!
  • 134. Cosmos was written by Carl Sagan
  • 135. Subject (Cosmos), Predicate (was written by), Object (Carl Sagan). As a graph: Cosmos --author--> Carl Sagan
  • 136. Subject Which Cosmos? (Cosmos)
  • 137. Subject Which Cosmos? (Cosmos)
  • 139. The humble URI • URIs provide strong references • Much like pointing in the physical world: “this is red”, “this is a pen” • A URIref is an unambiguous pointer to something of meaning
  • 140. Subject Which Cosmos? (Cosmos) http://rdf.freebase.com/ns/authority.openlibrary.book.OL3568862M
  • 141. What do you mean, author? Cosmos --author--> Carl Sagan, where author comes from a vocabulary: http://rdf.freebase.com/ns/book.written_work.author
  • 142. There are billions of Carl Sagans... Cosmos --author--> http://rdf.freebase.com/ns/en.carl_sagan
  • 143. Cosmos --author--> Carl Sagan; Cosmos --published--> “1980”
  • 144. RDF Data Model: Nodes (“Subjects”) connect via Links (“Predicates”) to Objects, which are either Nodes or Literals
  • 145. Expressions of RDF: RDF has many (inconvenient) serializations • RDF/XML • N3 • Turtle • N-Triples • RDFa
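Of the serializations listed, N-Triples is the simplest: one statement per line, URIs in angle brackets, literals in double quotes, and a terminating period. A minimal sketch that writes the Cosmos example in that form, assuming a small `URI` wrapper class to tell nodes from literals. The subject, author predicate, and Carl Sagan URIs come from the slides; the `http://example.org/published` predicate is a hypothetical placeholder, and the escaping covers only backslashes, quotes, and newlines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    """Marks a value as a node identifier rather than a literal."""
    value: str


def term(x):
    """Render one term in N-Triples syntax."""
    if isinstance(x, URI):
        return f"<{x.value}>"
    escaped = str(x).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


COSMOS = URI("http://rdf.freebase.com/ns/authority.openlibrary.book.OL3568862M")
AUTHOR = URI("http://rdf.freebase.com/ns/book.written_work.author")
SAGAN = URI("http://rdf.freebase.com/ns/en.carl_sagan")
PUBLISHED = URI("http://example.org/published")  # hypothetical predicate URI

DOCUMENT = to_ntriples([
    (COSMOS, AUTHOR, SAGAN),
    (COSMOS, PUBLISHED, "1980"),
])
```

Printing `DOCUMENT` gives two lines, the first linking the book to its author by URI and the second attaching the literal "1980".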
  • 147. Not all URLs are good identifiers
  • 148. Pluggable Data: Semantics allows an application to utilize unanticipated new data sources
  • 150. Data Portability: Semantics allows data to be utilized by unanticipated new applications
  • 151. Data Portability http://dev.mqlx.com/~jamie/simile/timeline.html
  • 153. Why Does This Work? Semantics facilitate shared meaning through • Subject Identity • Strong and Consistent Semantics • Open APIs + Open Data. These principles make it much easier to extend, combine, and integrate data
  • 154. RDF Graphs: Carrie Fisher --Starred In--> Star Wars; Harrison Ford --Starred In--> Star Wars; Harrison Ford --Starred In--> Blade Runner; Daryl Hannah --Starred In--> Blade Runner
  • 157. Keep your data as flexible as the source
  • 159. Can describe?! At the end of this talk - you should be able to say how semantics benefits each of these groups • Semantics Benefit • Site owners • Site users • Developers • You