Taxonomy Design Best Practices
Workshop
VOGIN-IP-lezing “Zoeken & vinden”
21 March 2019
Amsterdam
Presented by
Heather Hedden
About Heather Hedden
▪ Controlled vocabulary editor at a library database vendor, Gale/Cengage
▪ Taxonomy consultant
▪ Book indexer
▪ Taxonomy online course instructor
▪ Author of The Accidental Taxonomist, 2nd ed. (Information Today, Inc.)
Outline
1. Introduction to taxonomies and other knowledge organization systems
• Types and uses
• Taxonomies and metadata
• Standards and models
2. Taxonomies in support of search
• Searching on taxonomies
• Taxonomies for post-search refinements
• Knowledge graphs
• Faceted taxonomies
• Options for search on taxonomies
3. Term creation
• Wording of terms
• Synonyms/alternative labels
4. Term relationships
• Hierarchical relationships
• Associative relationships
• Customized, semantic relationships
5. Structural design
• Hierarchical taxonomy design
• Faceted taxonomy design
6. User displays
• Hierarchical display options
• Faceted taxonomy displays
1. Introduction to taxonomies and other knowledge organization systems
Knowledge organization system (controlled vocabulary)
The most general, broadest designation, covering all applications
▪ An authoritative, restricted list of terms (words or phrases) mainly used for
indexing/tagging content to support retrieval
▪ Controlled in terms of who can add new terms and when
▪ Usually makes use of variants/synonyms/alternative labels to point to the
correct term names
▪ May or may not have structured relationships between terms
Introduction: Types and Uses
Types of knowledge organization systems (controlled vocabularies)
▪ Simple term list
▪ Synonym ring (search-support “thesaurus”)
▪ Authority file (controlled list with variants; no hierarchy)
▪ Taxonomy
− Hierarchical taxonomy
− Faceted taxonomy
▪ Thesaurus
▪ Ontology
“Taxonomy” sometimes means any controlled vocabulary.
Term list
▪ A simple list of terms
▪ Usually alphabetical, but could be in another logical order
▪ Lacking synonyms, it is usually short enough for
quick browsing
▪ Can appear in drop-down scroll boxes
▪ May be used for various metadata values
▪ Part of a larger set of controlled vocabularies
Synonym ring
▪ A controlled vocabulary with synonyms or
near-synonyms for each concept
▪ No designated “preferred” term: All terms
are equal and point to each other, as in a
ring.
▪ Synonyms are usually not displayed to
the user.
▪ Usually used to support search.
▪ Also called a “search thesaurus.”
Example synonym ring: Software, Computer programs, Tools, Applications
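A synonym ring can be sketched as a lookup in which every member points to the whole ring, so a search on any one term is expanded to all of them. The following is a minimal illustration, not a description of any particular search engine; the function names `build_synonym_rings` and `expand_query` are hypothetical, and the ring uses the terms shown on the slide.

```python
from typing import Dict, FrozenSet, List


def build_synonym_rings(rings: List[List[str]]) -> Dict[str, FrozenSet[str]]:
    """Map every term (lowercased) to the full ring it belongs to."""
    index: Dict[str, FrozenSet[str]] = {}
    for ring in rings:
        members = frozenset(term.lower() for term in ring)
        for term in members:
            # No preferred term: each member points to the same ring.
            index[term] = members
    return index


def expand_query(query: str, index: Dict[str, FrozenSet[str]]) -> List[str]:
    """Return all ring members for the query; unknown queries pass through."""
    key = query.strip().lower()
    return sorted(index.get(key, frozenset({key})))


# Ring taken from the slide's example.
RINGS = [["Software", "Computer programs", "Tools", "Applications"]]
INDEX = build_synonym_rings(RINGS)
print(expand_query("Tools", INDEX))
```

Because all members are equal, the expansion is the same whichever term the user enters, and none of the synonyms has to be displayed.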
Taxonomy
▪ A controlled vocabulary with broader/narrower (parent/child) term
relationships that include all terms to create a hierarchical structure
▪ With a focus on categorizing and organizing concepts
▪ May or may not have “synonyms” to point to the correct, preferred terms
▪ May comprise several hierarchies or facets
(A facet can be considered a hierarchy.)
Taxonomy Examples
Hierarchical taxonomy example:
Leisure and culture
. Arts and entertainment venues
. . Museums and galleries
. Children's activities
. Culture and creativity
. . Architecture
. . Crafts
. . Heritage
. . Literature
. . Music
. . Performing arts
. . Visual arts
. Entertainment and events
. Gambling and lotteries
. Hobbies and interests
. Parks and gardens
. Sports and recreation
. . Team sports
. . . Cricket
. . . Football
. . . Rugby
. . Water sports
. . Winter sports
. Sports and recreation facilities
. Tourism
. . Passports and visas
. Young people's activities
Faceted taxonomy example:
Career Level
• Student
• Entry Level
• Experienced
• Manager
• Director
• Executive
Function
• Customer Service & Support
• Delivery
• Engineering
• Finance
• General Management
• Legal & Regulatory Affairs
• Marketing & Advertising
[more]
Industry
• Agriculture
• Apparel & Fashion
• Automotive
• Aviation & Aerospace
• Banking
• Biotechnology
• Broadcast Media
• Chemicals
[more]
Thesaurus
▪ A controlled vocabulary that has standard structured relationships between terms
‒ Hierarchical: broader term/narrower term (BT/NT)
‒ Associative: related terms (RT)
‒ Equivalence: preferred term (“use for” or “used for”)/non-preferred term (use)
(USE/UF)
▪ Created in accordance with standards:
‒ ANSI/NISO Z39.19 Guidelines for Construction, Format, and Management of
Monolingual Controlled Vocabularies
‒ ISO 25964-1 Part 1, Thesauri and interoperability with other vocabularies
▪ “Thesaurus” is most often the kind of controlled vocabulary used in indexing
periodical literature
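The three standard relationship types can be sketched as a small data structure in which each relationship is stored in both directions, which is what makes a thesaurus entry readable from either term. This is a minimal sketch under assumptions: the `Thesaurus` class and its method names are hypothetical, and the sample terms are borrowed from the "materials acquisitions" thesaurus entry example in this deck.

```python
from collections import defaultdict


class Thesaurus:
    """Toy thesaurus storing reciprocal BT/NT, RT and USE/UF relationships."""

    def __init__(self):
        self.broader = defaultdict(set)    # term -> its broader terms (BT)
        self.narrower = defaultdict(set)   # term -> its narrower terms (NT)
        self.related = defaultdict(set)    # term -> its related terms (RT)
        self.used_for = defaultdict(set)   # preferred -> non-preferred (UF)
        self.use = {}                      # non-preferred -> preferred (USE)

    def add_hierarchical(self, broader, narrower):
        self.narrower[broader].add(narrower)
        self.broader[narrower].add(broader)

    def add_associative(self, term_a, term_b):
        # RT is symmetrical, so it is recorded on both terms.
        self.related[term_a].add(term_b)
        self.related[term_b].add(term_a)

    def add_equivalence(self, preferred, non_preferred):
        self.used_for[preferred].add(non_preferred)
        self.use[non_preferred] = preferred

    def resolve(self, term):
        """Follow USE from a non-preferred term to its preferred term."""
        return self.use.get(term, term)

    def entry(self, term):
        """Render a term's entry in UF / BT / NT / RT order."""
        lines = [term]
        for code, table in (("UF", self.used_for), ("BT", self.broader),
                            ("NT", self.narrower), ("RT", self.related)):
            for value in sorted(table.get(term, ())):
                lines.append(f"  {code} {value}")
        return "\n".join(lines)


thesaurus = Thesaurus()
thesaurus.add_equivalence("materials acquisitions", "library acquisitions")
thesaurus.add_hierarchical("collection development", "materials acquisitions")
thesaurus.add_hierarchical("materials acquisitions", "approval plans")
thesaurus.add_associative("materials acquisitions", "book vendors")
print(thesaurus.entry("materials acquisitions"))
```

Storing both directions at write time keeps the reciprocity rule from depending on every editor remembering to add the inverse.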
Thesaurus entry examples (Gale thesaurus; ASIS&T thesaurus)
materials acquisitions
  UF acquisitions (of materials)
     library acquisitions
  BT collection development
  NT accessions
     approval plans
     gifts and exchanges
     materials claims
     materials orders
     subscriptions
  RT book vendors
     jobbers
     subscription agencies
     subscription cancellations
Summary of Controlled Vocabulary Types
Controlled Vocabularies - Complexity (from less to more)
Pick List: Ambiguity control
Synonym Ring: Ambiguity control; Synonym control
Authority File: Ambiguity control; Synonym control
Taxonomy: Ambiguity control; (Synonym control); Hierarchical relationships
Thesaurus: Ambiguity control; Synonym control; Hierarchical relationships; Associative relationships
Ontology: Ambiguity control; (Synonym control); Semantic relationships; Classes
Applications and uses
1. Indexing support
a) Manual indexing
b) Automated indexing
2. Findability or retrieval support
a) In browsing
• Alphabetical browsing
• Hierarchical browsing
b) In searching
Indexing Support
▪ As a structured list of agreed-upon terms to ensure consistent indexing
• Across multiple documents or content items, where different
synonyms describe the same concepts
• By multiple indexers working on the same collection
• By machine-aided indexing / autoclassification, where taxonomy
terms have rules, clues, or sample tuned documents
Manual indexing example: Cengage/Gale Subject Thesaurus, internal indexer alphabetical browse view
Retrieval Support: in browsing
a) Alphabetical browse
Display method for thesauri, name/proper noun lists, and book-style indexes
Example of an alphabetical browse thesaurus:
UNESCO Thesaurus
http://vocabularies.unesco.org/browser/thesaurus/en/
Retrieval Support browse example: Books and Authors
Retrieval Support: in browsing
b) Hierarchical browse
Categorization scheme for information organization, classification, guided
search
▪ More often for end-user guidance than for indexers, though also used by
database indexers
▪ Example of hierarchical browse taxonomy:
Getty Art & Architecture Thesaurus
http://www.getty.edu/research/tools/vocabularies/aat/
Introduction: Taxonomies and Metadata
Metadata Types and Uses
▪ Descriptive
• for content discovery/retrieval (via searching or browsing)
• for content identification (to cite sources, contact authors, compare content, resolve
content issues, etc.)
▪ Administrative
• for content management (for building collections, information products, web pages
and websites; for content maintenance)
• for workflow/process management (assigning content, archiving, preserving)
▪ Structural
• for content navigation (within a large content resource)
• for content presentation (as markup/style metadata)
Metadata Standards – Examples
Dublin Core for generic online networked resources
DDI (Data Documentation Initiative) for describing data from the social, behavioral,
and economic sciences
IPTC (International Press Telecommunications Council) for photographs
MARC (Machine Readable Cataloging) for bibliographic data for library materials
PREMIS (Preservation Metadata: Implementation Strategies) for repositories of
digital objects
SDMX (Statistical Data and Metadata Exchange) for the exchange of statistical data.
VRA Core (Visual Resources Association) for describing images of cultural heritage
Introduction: Standards and Models
Knowledge organization systems (controlled vocabulary) standards / models
▪ For best practices in forming terms and their relationships:
• ISO 25964 (2011, 2013) Thesauri and Interoperability with Other Vocabularies
• ANSI/NISO Z39.19 (2005, renewed 2010) Guidelines for Construction, Format,
and Management of Monolingual Controlled Vocabularies
http://www.niso.org/apps/group_public/download.php/12591/z39-19-2005r2010.pdf
▪ For a set of specifications for making controlled vocabularies exchangeable:
Simple Knowledge Organization System (SKOS)
A World Wide Web Consortium (W3C) recommendation
“A common data model for sharing and linking knowledge organization systems
via the web” https://www.w3.org/TR/skos-reference/
Knowledge organization systems (controlled vocabulary) standards / models
▪ ANSI/NISO Z39.19 / ISO 25964
Provides guidelines for term format/style and creating relationships:
• Hierarchical: broader term/narrower term (BT/NT)
• Associative: related terms (RT)
• Equivalence (synonyms): preferred term/non-preferred term (USE/UF)
▪ SKOS
Provides specifications for designating terms and relationships:
• Lexical labels: skos:prefLabel, skos:altLabel and skos:hiddenLabel
• Semantic relations:
• skos:broader (broader concept)
• skos:narrower (narrower concept)
• skos:related (related concept)
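The SKOS labels and semantic relations listed above can be made concrete by writing one concept out in Turtle syntax. The sketch below builds the text by hand rather than using an RDF library, so it is only an illustration; the `ex:` namespace, the helper names, and the broader concept `natural-disasters` are assumptions introduced for the example, while "Floods" and "Inundations" come from the nonpreferred-term example in this deck.

```python
# The skos: namespace is the one defined by the W3C recommendation;
# the ex: namespace is a placeholder for a real vocabulary's base URI.
SKOS_PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/vocab/> .\n"
)


def turtle_literal(text, lang="en"):
    """Quote a label as a language-tagged Turtle string."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_to_turtle(local_name, pref_label, alt_labels=(),
                      broader=(), related=(), lang="en"):
    """Serialize one concept with its labels and relations."""
    statements = ["a skos:Concept",
                  f"skos:prefLabel {turtle_literal(pref_label, lang)}"]
    statements += [f"skos:altLabel {turtle_literal(label, lang)}"
                   for label in alt_labels]
    statements += [f"skos:broader ex:{name}" for name in broader]
    statements += [f"skos:related ex:{name}" for name in related]
    return f"ex:{local_name} " + " ;\n    ".join(statements) + " ."


# "natural-disasters" is a hypothetical broader concept for illustration.
FLOODS = concept_to_turtle("floods", "Floods",
                           alt_labels=["Inundations"],
                           broader=["natural-disasters"])
print(SKOS_PREFIXES + FLOODS)
```

In this model the alternative label is an attribute of the concept, not a separate term linked by a relationship, which is the main difference from the USE/UF model of the thesaurus standards.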
2. Taxonomies in support of search
Trends in taxonomy implementation, search integration and display
1. Originally, full taxonomy hierarchical browse or thesaurus alphabetical browse.
Search on content was separate from browsing taxonomies.
2. Full large taxonomies came to be displayed less. Search on taxonomy terms.
3. Search on more than just the taxonomy terms.
Search on a combination of taxonomy terms and words in titles, texts, etc.
4. Faceted taxonomies – combining search and limited browsing
5. Post-search filters – especially for larger taxonomies or thesauri
6. Knowledge graphs
Taxonomies in Support of Search
Originally: full taxonomy browse (Gale former display, ca. 2011)
Taxonomies in Support of Search: Search on Taxonomy Terms
Search on words/phrases in the Subject field
Gale current displays
Search on terms in a thesaurus
In Basic Search: search subjects and titles in the same field
Faceted taxonomies (facets are also called aspects, filters, dimensions, limiters, or refinements)
Examples of ecommerce facets: for clothes, for books, for software, for furniture
Taxonomies in Support of Search: Facets
Taxonomy terms to refine post-search results
▪ A display of taxonomy terms that have been used to index the content
items in the search result set (not all taxonomy terms).
▪ A display of selected terms from the taxonomy, not the taxonomy itself.
▪ Any relationships between terms are not indicated.
▪ Displayed in order of usage frequency within the search result set.
Suitable when a large taxonomy or thesaurus does not fit into a facet.
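The post-search refinement display described above can be sketched as a frequency count over the subjects indexed to the items in the result set only. This is an illustrative sketch, not Gale's implementation; the function name `refinement_terms`, the record layout, and the sample documents are all hypothetical.

```python
from collections import Counter


def refinement_terms(result_set, limit=5):
    """Return (term, count) pairs for the subjects used in the result set,
    most frequent first, with ties broken alphabetically."""
    counts = Counter()
    for item in result_set:
        # A set guards against a subject being listed twice on one item.
        counts.update(set(item["subjects"]))
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:limit]


# Hypothetical search results, each already indexed with taxonomy terms.
RESULTS = [
    {"id": "doc1", "subjects": ["Influenza", "Vaccination"]},
    {"id": "doc2", "subjects": ["Influenza", "Public health"]},
    {"id": "doc3", "subjects": ["Vaccination", "Influenza"]},
]
print(refinement_terms(RESULTS))
```

Terms that were never used to index any item in the result set do not appear at all, and no hierarchy is shown, which matches the flat, frequency-ordered display the slide describes.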
Taxonomies in Support of Search: Post-Search Refinements
Subjects for post-searching filtering in Gale search results
Displayed subjects on an individual Gale content item
Knowledge graphs
▪ A representation of knowledge as a graph (a network of nodes and links, not
tables of rows and columns)
▪ Usually based on data in graph databases, rather than relational databases
▪ Usually includes, but is not limited to, the visualization of
– An output of graph analytics
– Display of interconnected nodes and links
– Display of related data in a "fact box"
▪ Improve search results beyond machine learning and algorithms
Taxonomies in Support of Search: Knowledge Graphs
Taxonomy search options
▪ Different types of search on taxonomy
▪ Displayed taxonomy terms in type-ahead or search-suggest
▪ Alternative labels / nonpreferred terms / synonyms
Taxonomies in Support of Search: Search Options
Different types of search on terms
▪ Exact
▪ Contains
▪ Begins
▪ Smart
Implementations
▪ Exact – option for experienced or repeat users
▪ Contains – also called phrase search. Can be done with quotation marks.
▪ Begins – option if there is a type-ahead display of terms
▪ Smart – sometimes the default, if no other options, and terms are not
displayed
▪ Exact – exact match
▪ Contains – exact match phrase with additional words before or after
▪ Begins – alphabetical from start, but allows end truncation
▪ Smart – words within the term in any order and also internal word
stemming (singular/plural)
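The four search types can be sketched as four small matching functions applied to taxonomy terms. This is one possible reading of the definitions on the slide, not the behavior of any specific product; the function names are hypothetical, and the singular/plural stemmer is deliberately crude. The sample terms come from the Gale Subject Thesaurus examples in this deck.

```python
import re


def _words(text):
    """Lowercase and split into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def _stem(word):
    """Very rough singular/plural stemming, for illustration only."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def exact(query, term):
    """Exact: the whole term matches the whole query."""
    return _words(query) == _words(term)


def contains(query, term):
    """Contains: the query phrase appears intact inside the term."""
    q, t = _words(query), _words(term)
    size = len(q)
    return size > 0 and any(t[i:i + size] == q
                            for i in range(len(t) - size + 1))


def begins(query, term):
    """Begins: the term starts with the query; the end may be truncated."""
    q = " ".join(_words(query))
    return bool(q) and " ".join(_words(term)).startswith(q)


def smart(query, term):
    """Smart: all query words occur in the term, in any order, after stemming."""
    q = {_stem(word) for word in _words(query)}
    t = {_stem(word) for word in _words(term)}
    return bool(q) and q <= t


TERMS = ["Environmental management", "Natural resource management",
         "Telephone services industry"]
MATCHERS = {"exact": exact, "contains": contains,
            "begins": begins, "smart": smart}


def search(query, mode):
    """Return the taxonomy terms matching the query under the given mode."""
    return [term for term in TERMS if MATCHERS[mode](query, term)]


print(search("management", "contains"))
print(search("environ", "begins"))
print(search("industries telephone", "smart"))
```

The looser the mode, the more terms a query reaches, which is why the slide pairs "Begins" with a type-ahead display and treats "Smart" as a default when no terms are shown.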
Displayed taxonomy terms: type-ahead and search-suggest
Benefits of taxonomies over search alone
▪ Indexing and retrieval based on concepts, not just words/phrases, improves
search results in both precision (accuracy) and recall (comprehensiveness)
▪ Taxonomy terms allow limiting/filtering search by topic.
▪ Broader categories/terms allow users to choose a broad subject first and
then limit by other metadata.
▪ Relationships between terms allow users to explore related topics.
▪ Subjects displayed on search results (post-search) allow users to refine and
focus their searches by precise topic or explore related topics.
▪ Taxonomies support the indexing of nontext content (images, video, audio).
▪ Multilingual taxonomies support accurate search and retrieval across
multilingual content.
3. Term creation
Term creation issues
▪ Deciding whether to include a concept
▪ Choosing a preferred term/label name
▪ Term format and style
Term Creation: Wording of Terms
Whether a concept should be included as a term
1. Is it within the subject-area scope of the controlled vocabulary?
• In book indexing, off-topic is OK; not in a controlled vocabulary
2. Is there enough information on the subject?
• In book indexing, text tells something about the subject; sufficient
number of sentences on the topic
• In a controlled vocabulary: sufficient number of anticipated documents
or articles on the topic
3. Is it important, likely to be looked up? Do users want and expect it?
Choosing the preferred term wording
▪ Unlike book indexes, there are no “double-posts” in taxonomies and
thesauri
▪ You must always choose a “preferred term” (except in synonym rings)
▪ Variants and synonyms are designated as “non-preferred terms” or
“alternative labels”
▪ Wording is based on user expectations and needs, more so than on
content, which varies.
Choosing the preferred term/label wording (the displayed form)
Choosing between two “synonyms”:
Doctors vs. Physicians
Movies vs. Motion pictures
Cars vs. Automobiles
Consider:
1. Wording of terms most likely looked up by the intended users/audience,
especially in browsed controlled vocabulary
2. Enforcing organizational/enterprise controlled vocabulary
3. Conforming to academic or professional standards
4. Consistency in style throughout the controlled vocabulary
5. Wording within the documents/content indexed
Choosing the preferred term/label wording
The other becomes a nonpreferred term/alternative label.
Differentiate closely related terms, or use one as preferred:
Foreign policy vs. International relations
Differentiate topics from actions, or use one as preferred:
Contracts vs. Contracting
Differentiate broader and more specific concepts, or use one:
Electric power plants vs. Hydroelectric power plants
Consider likely occurrences of the more specific topic in the content.
Term format and style
▪ Consistent capitalization: lower case or initial capitalization; not title caps
Corporate finance; corporate finance; not Corporate Finance
▪ Single words or multi-word phrases
▪ Nouns or noun phrases
▪ Adjectives alone can be terms in special circumstances and where noun
is obvious from context.
▪ Parenthetical qualifiers may be used for disambiguation, not modification.
▪ Countable nouns are usually plural.
▪ Avoid term inversions (e.g., noun, adjective); they are unnecessary because terms are searchable
Term Creation: Variants, Nonpreferred Terms, or Alternative Labels
▪ Defined: Approximately synonymous words or phrases that refer to an
equivalent concept, in the context of the controlled vocabulary and the set of
content.
▪ Purpose: To capture the different wordings that different people might use to
describe or look up the same concept or idea, for use as alternative entries.
➢ Differences between the wording of the author and that of the user/reader
➢ Differences between the wording of the indexers and that of the end-users
➢ Differences among different users/readers
▪ Serving as “multiple entry points” to look up and retrieve the desired content,
as do double posts or See references in an index.
▪ Enabling consistent indexing
Examples from Gale Subject Thesaurus
Conflict management
  Conflict resolution
  Managing conflict
Wills
  Codicils
  Last will and testament
  Testaments (Wills)
Influenza
  Flu
  Grippe
Movies
  Cinema
  Films (Movies)
  Motion pictures
  Movie genres
Telecommunications industry
  Communications industry
  Digital transmission industry
  Interexchange carriers
  Telecommunications services industry
  Telephone holding companies
  Telephone industry
  Telephone services industry
Environmental management
  Adaptive management (Environmental management)
  Environmental control
  Environmental stewardship
  Natural resource management
  Stewardship (Environmental management)
Piano music [no variants]
Nonpreferred Term
▪ Formal designation in thesauri, in accordance with the ANSI/NISO Z39.19
and ISO 25964 standards.
▪ Shortened as NPT.
▪ Associated with a Preferred term.
▪ Considered a kind of “relationship” of the Equivalency type.
▪ Reciprocity of relationship, pointing in both directions:
USE and UF (use and used for/use for).
▪ Example: Inundations USE Floods
Floods UF Inundations
▪ Both preferred terms and non-preferred terms are “terms.”
Alternative Label
▪ Formal designation for SKOS (Simple Knowledge Organization System)
vocabularies, a World Wide Web Consortium (W3C) recommendation.
▪ Shortened as altLabel.
▪ Associated with a Preferred label.
▪ Instead of terms, there are concepts, each with any number of labels
▪ Concepts have a preferred label (for each language).
▪ Concepts have any number of alternative labels and hidden labels (for
each language).
▪ Alternative labels are part of a concept’s attributes, not equivalent terms
and not connected by “relationships.”
Thesaurus model: Synaptica
SKOS model: PoolParty
Sources for variant terms
▪ Same sources as for concepts and preferred terms
➢ Survey/audit of the content and terms used
➢ Search query logs and other internal usage data
➢ External sources: websites, Wikipedia, other taxonomies and controlled
vocabularies, book tables of contents, etc.
▪ Creative changes of terms (after verification of variant term usage in search)
▪ Not to be used as a source:
Dictionary-type thesaurus, such as Roget's Thesaurus of English Words and
Phrases or thesaurus-dictionary websites
4. Term relationships
Term Relationships
Types of relationships between terms
Between preferred and nonpreferred terms in a thesaurus:
1. Equivalence: Use (USE) / Used for (UF)
Between concepts or preferred terms:
2. Hierarchical: Broader term (BT) / Narrower term (NT)
3. Associative: Related term (RT)
4. Customized relationships: More specific types of BT/NT or RT
Relationships are reciprocal between terms/concepts.
Term Relationships: Hierarchical
Hierarchical relationships
▪ Broader-narrower / Topic-subtopic / Parent-child / Superordinate-Subordinate
▪ Required feature of both thesauri and taxonomies
▪ Thesaurus designation of BT / NT (broader term / narrower term)
▪ SKOS designation: Broader concept / Narrower concept
▪ Terms usually have more than one narrower term (NT), unless they are the
most specific terms in the vocabulary.
▪ On occasion, a term may have more than one broader term (BT),
referred to as polyhierarchy.
Hierarchical relationships
Reciprocal (bi-directional) relationships, but asymmetrical
Broader term (BT): Fruits
Narrower term (NT): Oranges
All-and-some test: SOME fruits are oranges; ALL oranges are fruits.
Fruits NT Oranges; Oranges BT Fruits
Three types:
1. Generic – Specific
2. Generic – Named entity instance: Common noun – Proper noun
3. Whole – Part
Term Relationships: Associative
Associative relationships
▪ Suggestions to the user of possible related terms of interest
▪ Like See also in an index
▪ Required feature of thesauri
▪ Optional feature of taxonomies
▪ Thesaurus designation of RT (Related term)
▪ SKOS designation: Related concept
▪ Symmetrical bi-directional relationship
▪ Between terms within the same hierarchy or in different hierarchies
Term Relationships: Customized, Semantic
Specific/customized relationships
▪ Relationships containing meaning: “semantic”
▪ Variations on equivalence (USE/UF), hierarchical (BT/NT) or associative
(RT) relationships, but usually associative.
▪ Reciprocal, but asymmetrical, or directional, not plain RT.
▪ Specific enough to convey the necessary meaning, but not uniquely specific.
▪ Relationships are between terms of different types, across different
designated categories or classes.
▪ Taxonomist defines the relationships and their codes and the categories.
▪ A defining characteristic of ontologies or an “ontology lite.”
5. Structural design
Structural Design: Hierarchies
Hierarchies
▪ The extension of hierarchical relationships (BT/NT) to include all terms
▪ More important for taxonomies than for thesauri
▪ Emphasize categorization, classification, sorting
▪ Involve working from the top down
▪ Also known as “tree” structures
A single taxonomy may have more than one top-term hierarchy.
Examples of hierarchies for classification
Classifying of things – can only go in one place
▪ Linnaean taxonomy of classification of organisms
National Center for Biotechnology Information of the National Library of
Medicine
http://www.ncbi.nlm.nih.gov/Taxonomy
▪ Dewey Decimal Classification system for library materials
http://www.oclc.org/dewey/resources/summaries/deweysummaries.pdf
▪ NAICS codes for industries
http://www.census.gov/naics/2007/NAICOD07.HTM
Examples of hierarchies
Linnaean taxonomy:
National Center for
Biotechnology Information,
National Library of Medicine
Taxonomy Browser
Dewey Decimal Classification, 100s level
000 Computer science, knowledge & systems
010 Bibliographies
020 Library & information sciences
030 Encyclopedias & books of facts
040 [Unassigned]
050 Magazines, journals & serials
060 Associations, organizations & museums
070 News media, journalism & publishing
080 Quotations
090 Manuscripts & rare books
100 Philosophy
110 Metaphysics
120 Epistemology
130 Parapsychology & occultism
140 Philosophical schools of thought
150 Psychology
160 Logic
170 Ethics
180 Ancient, medieval & eastern philosophy
190 Modern western philosophy
200 Religion
210 Philosophy & theory of religion
220 The Bible
230 Christianity & Christian theology
240 Christian practice & observance
250 Christian pastoral practice & religious orders
260 Christian organization, social work & worship
270 History of Christianity
280 Christian denominations
290 Other religions
300 Social sciences, sociology & anthropology
310 Statistics
320 Political science
330 Economics
340 Law
350 Public administration & military science
360 Social problems & social services
370 Education
380 Commerce, communications & transportation
390 Customs, etiquette & folklore
400 Language
410 Linguistics
420 English & Old English languages
430 German & related languages
440 French & related languages
450 Italian, Romanian & related languages
460 Spanish & Portuguese languages
470 Latin & Italic languages
480 Classical & modern Greek languages
490 Other languages
500 Science
510 Mathematics
520 Astronomy
530 Physics
540 Chemistry
550 Earth sciences & geology
560 Fossils & prehistoric life
570 Life sciences; biology
580 Plants (Botany)
590 Animals (Zoology)
600 Technology
610 Medicine & health
620 Engineering
630 Agriculture
640 Home & family management
650 Management & public relations
660 Chemical engineering
670 Manufacturing
680 Manufacture for specific uses
690 Building & construction
700 Arts
710 Landscaping & area planning
720 Architecture
730 Sculpture, ceramics & metalwork
740 Drawing & decorative arts
750 Painting
760 Graphic arts
770 Photography & computer art
780 Music
790 Sports, games & entertainment
800 Literature, rhetoric & criticism
810 American literature in English
820 English & Old English literatures
830 German & related literatures
840 French & related literatures
850 Italian, Romanian & related literatures
860 Spanish & Portuguese literatures
870 Latin & Italic literatures
880 Classical & modern Greek literatures
890 Other literatures
900 History
910 Geography & travel
920 Biography & genealogy
930 History of ancient world (to ca. 499)
940 History of Europe
950 History of Asia
960 History of Africa
970 History of North America
980 History of South America
990 History of other areas
North American Industry Classification System (NAICS)
Hierarchy purpose
1. Serving users who are browsing, exploring, discovering, not searching, to
whom the hierarchy is displayed.
➢ Users don’t even have to know the first word or few letters, as in
alphabetical browsing.
2. Instructing users on classification
3. Enabling “recursive”/“rolled up” retrieval results
(A term retrieves what is indexed to it and what is indexed to each one of its
narrower terms, all together.)
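"Rolled up" retrieval can be sketched as collecting a term's narrower terms at every level and merging everything indexed to any of them. This is a minimal sketch under assumptions: the function names and the document IDs are hypothetical, and the hierarchy is a fragment of the "Leisure and culture" example shown earlier in this deck.

```python
def descendants(term, narrower):
    """Collect all narrower terms of a term, at every level below it."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child in narrower.get(current, ()):
            # The seen-set also keeps a term with two broader terms
            # from being visited twice.
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def rolled_up(term, narrower, index):
    """Items indexed to the term plus items indexed to its narrower terms."""
    hits = set(index.get(term, ()))
    for narrower_term in descendants(term, narrower):
        hits |= set(index.get(narrower_term, ()))
    return hits


# Fragment of the hierarchy from the taxonomy example slide.
NARROWER = {
    "Sports and recreation": ["Team sports", "Water sports", "Winter sports"],
    "Team sports": ["Cricket", "Football", "Rugby"],
}
# Hypothetical index: taxonomy term -> IDs of content items tagged with it.
INDEX = {
    "Sports and recreation": {"doc1"},
    "Cricket": {"doc2"},
    "Rugby": {"doc2", "doc3"},
    "Water sports": {"doc4"},
}
print(sorted(rolled_up("Team sports", NARROWER, INDEX)))
print(sorted(rolled_up("Sports and recreation", NARROWER, INDEX)))
```

Using sets means an item indexed to two narrower terms is returned once, so the rolled-up count reflects distinct content items.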
Polyhierarchies
Sometimes a term can have two or more broader terms.
▪ Polyhierarchy is permitted if the
hierarchical relationship is valid
in both/all cases
▪ Remember “All-and-Some” test
for each generic hierarchical
relationship
▪ Systems may or may not
support it.
Example: Online Banking has two broader terms, Online Services and Banking.
Structural Design: Facets
Facets
▪ For serving faceted classification, which allows the assignment of multiple
classifications to an object
▪ A “dimension” of a query; a type of concept
▪ Intended for searching with multiple terms in combination (post-
coordination), one from each facet
▪ A refinement, filter, limit by, narrow by
▪ Can be for topics or for named entities, but generally not both
▪ Reflect the domain of content
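Searching with terms from several facets in combination can be sketched as a filter that requires a match in every facet the user has narrowed by, while accepting any of the values ticked within a single facet. This is an illustrative sketch only; the function names and the job-posting records are hypothetical, and the facet names and values are taken from the faceted taxonomy example earlier in this deck.

```python
from collections import Counter


def filter_items(items, selections):
    """Keep items matching every selected facet (AND across facets),
    where any chosen value within one facet is enough (OR within a facet)."""
    def matches(item):
        return all(item.get(facet) in chosen
                   for facet, chosen in selections.items() if chosen)
    return [item for item in items if matches(item)]


def facet_counts(items, facet):
    """Count items per value of one facet, as shown beside facet values."""
    return Counter(item[facet] for item in items if facet in item)


# Hypothetical content items, each tagged with one value per facet.
ITEMS = [
    {"id": "job1", "Career Level": "Manager", "Function": "Finance", "Industry": "Banking"},
    {"id": "job2", "Career Level": "Entry Level", "Function": "Engineering", "Industry": "Automotive"},
    {"id": "job3", "Career Level": "Manager", "Function": "Engineering", "Industry": "Automotive"},
    {"id": "job4", "Career Level": "Director", "Function": "Finance", "Industry": "Banking"},
]

narrowed = filter_items(ITEMS, {"Career Level": {"Manager"},
                                "Industry": {"Automotive"}})
print([item["id"] for item in narrowed])
print(facet_counts(ITEMS, "Industry"))
```

Because each facet is applied independently, the user can add or drop selections in any order and arrive at the same result set.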
Examples of ecommerce facets: for clothes, for books, for software, for furniture
Facet advantages
▪ Supports more complex search queries by users
▪ Allows users to control the search refinement, narrowing or broadening in
any manner or order
▪ Familiar to novice users; suitable for expert users
Facet disadvantages
▪ Suitable only for somewhat structured, uniform content whose items all share
the same multiple facets
▪ Not practical for extremely large topical controlled vocabularies
▪ May not support “advanced search” of multiple terms selected at once
(“or”) from the same facet
▪ Requires investment of thorough indexing/tagging
6. User displays
User Displays: Hierarchy Options
End-user browse display options
Hierarchy end-user displays may be implemented in different ways:
▪ Expandable tree
− Accommodates inconsistent numbers of terms per level
− Insufficient for very large taxonomies or large numbers of terms at the same
level
▪ One level per web page
− Large number of terms can display at each level
− Less appropriate for taxonomies with varied levels or levels containing just
one or a few terms
▪ Fly-out subcategories
− Not so suitable for more than 3 levels or large taxonomies
Expandable hierarchies
Amazon.com:
One level per web page
TESCO:
Fly-out subcategories
User Displays: Facets
Facet display features
▪ Collapsible displays of values to display more facets
▪ Graphical options: quantity sliders, color selections
▪ Counts of content items
▪ Tick boxes to make multiple selections
Resources
Books
Abbas, June. (2010) Structures for Organizing Knowledge. New York: Neal-
Schuman Publishers.
Harpring, Patricia. (2010) Introduction to Controlled Vocabularies: Terminology for
Art, Architecture, and Other Cultural Works. Los Angeles: Getty Research Institute.
Hedden, Heather. (2016) The Accidental Taxonomist, 2nd edition. Medford, NJ:
Information Today Inc. http://www.hedden-information.com/accidental-taxonomist/
Hlava, Marjorie M.K. (2015) The Taxobook. Morgan & Claypool Publishers.
Lambe, Patrick. (2007). Organising Knowledge: Taxonomies, Knowledge and
Organisational Effectiveness. Oxford, England: Chandos Publishing.
Standards and Guidelines
ANSI/NISO Z39.19-2005 (2010) Guidelines for Construction, Format, and
Management of Monolingual Controlled Vocabularies. Bethesda, MD: NISO
Press. http://www.niso.org/apps/group_public/download.php/12591/z39-19-2005r2010.pdf
NISO TR-06-2017 Issues in Vocabulary Management
http://www.niso.org/publications/tr/tr-06-2017
ISO 25964-1 Thesauri and Interoperability with other Vocabularies: Part 1:
Thesauri for Information Retrieval https://www.iso.org/standard/53657.html
Websites
Accidental Taxonomist book websites
http://www.hedden-information.com/accidental-taxonomist/websites/
Taxonomy Warehouse
www.taxonomywarehouse.com
Construction of Controlled Vocabularies: A Primer
http://marciazeng.slis.kent.edu/Z3919/index.htm
Thesaurus Construction tutorial by Tim Craven
http://publish.uwo.ca/~craven/677/thesaur/main00.htm
The Accidental Taxonomist Blog
http://accidental-taxonomist.blogspot.com
Hedden Information Management past presentations
http://www.hedden-information.com/presentations/
Courses, Workshops, Webinars
“Taxonomies and Controlled Vocabularies” self-paced online course from
Hedden Information Management
http://www.hedden-information.com/courses-workshops/taxonomy-course/
Taxonomy Boot Camp London
pre-conference workshops October 14; conference October 15-16, 2019
http://www.taxonomybootcamp.com/london
“Practical Taxonomy Creation” 3-part webinar course recording, through the
American Society for Indexing
http://www.asindexing.org/online-learning/taxonomy-hedden
SLA Taxonomy Division webinars
http://taxonomy.sla.org
Heather Hedden
Hedden Information Management
Carlisle, MA USA
www.hedden-information.com
accidental-taxonomist.blogspot.com
heather@hedden.net
+1-978-371-0822
Questions/Contact
84

Taxonomy design best practices

  • 1. Taxonomy Design Best Practices Workshop VOGIN-IP-lezing “Zoeken & vinden” 21 March 2019 Amsterdam Presented by Heather Hedden
  • 2. ▪ Controlled vocabulary editor at a library database vendor, Gale/Cengage ▪ Taxonomy consultant ▪ Book indexer ▪ Taxonomy online course instructor ▪ Author of The Accidental Taxonomist, 2nd ed. (Information Today, Inc.) About Heather Hedden 2
  • 3. 1. Introduction to taxonomies and other knowledge organization systems • Types and uses • Taxonomies and metadata • Standards and models 2. Taxonomies in support of search • Searching on taxonomies • Taxonomies for post-search refinements • Knowledge graphs • Faceted taxonomies • Options for search on taxonomies 3. Term creation • Wording of terms • Synonyms/alternative labels Outline 3 4. Term relationships • Hierarchical relationships • Associative relationships • Customized, semantic relationships 5. Structural design • Hierarchical taxonomy design • Faceted taxonomy design 6. User displays • Hierarchical display options • Faceted taxonomy displays
  • 4. 1. Introduction to taxonomies and other knowledge organization systems 2. Taxonomies in support of search 3. Term creation 4. Term relationships 5. Structural design 6. User displays Outline 4
  • 5. Knowledge organization system (controlled vocabulary) The most general designation, with the broadest range of applications ▪ An authoritative, restricted list of terms (words or phrases) mainly used for indexing/tagging content to support retrieval ▪ Controlled with respect to who can add new terms and when ▪ Usually makes use of variants/synonyms/alternative labels to point to the correct term names ▪ May or may not have structured relationships between terms Introduction: Types and Uses 5
  • 6. Types of knowledge organization systems (controlled vocabularies) ▪ Simple term list ▪ Synonym ring (search-support “thesaurus”) ▪ Authority file (controlled list with variants; no hierarchy) ▪ Taxonomy − Hierarchical taxonomy − Faceted taxonomy ▪ Thesaurus ▪ Ontology “Taxonomy” sometimes means any controlled vocabulary. Introduction: Types and Uses 6
  • 7. Term list ▪ A simple list of terms ▪ Usually alphabetical, but could be in other logical order ▪ Lacking synonyms, it is usually short enough for quick browsing ▪ Can appear in drop-down scroll boxes ▪ May be used for various metadata values ▪ Part of a larger set of controlled vocabularies Introduction: Types and Uses 7
  • 8. Synonym ring ▪ A controlled vocabulary with synonyms or near-synonyms for each concept ▪ No designated “preferred” term: All terms are equal and point to each other, as in a ring. ▪ Synonyms are usually not displayed to the user. ▪ Usually used to support search. ▪ Also called a “search thesaurus.” Example ring: Software, Computer programs, Tools, Applications. Introduction: Types and Uses 8
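A synonym ring can be sketched as a set of equal terms used to expand a search query. The following is a minimal illustration only: the ring contents come from the slide's example, while the names `SYNONYM_RINGS` and `expand_query` are hypothetical and the lower-casing is an assumption about how a search engine might normalize input.

```python
from __future__ import annotations

# One ring per concept; no term is "preferred", every member points to all others.
SYNONYM_RINGS = [
    {"software", "computer programs", "tools", "applications"},
]


def expand_query(query: str) -> set[str]:
    """Return the query plus every term that shares a ring with it."""
    normalized = query.strip().lower()
    expanded = {normalized}
    for ring in SYNONYM_RINGS:
        if normalized in ring:
            expanded |= ring
    return expanded
```

A query for "Tools" would then be run against all four terms, while a query that belongs to no ring is passed through unchanged.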
  • 9. Taxonomy ▪ A controlled vocabulary with broader/narrower (parent/child) term relationships that include all terms to create a hierarchical structure ▪ With a focus on categorizing and organizing concepts ▪ May or may not have “synonyms” to point to the correct, preferred terms ▪ May comprise several hierarchies or facets (A facet can be considered a hierarchy.) Introduction: Types and Uses 9
  • 10. Introduction: Types and Uses 10 Taxonomy Examples
    Hierarchical Taxonomy Example:
      Leisure and culture
      . Arts and entertainment venues
      . . Museums and galleries
      . Children's activities
      . Culture and creativity
      . . Architecture
      . . Crafts
      . . Heritage
      . . Literature
      . . Music
      . . Performing arts
      . . Visual arts
      . Entertainment and events
      . Gambling and lotteries
      . Hobbies and interests
      . Parks and gardens
      . Sports and recreation
      . . Team sports
      . . . Cricket
      . . . Football
      . . . Rugby
      . . Water sports
      . . Winter sports
      . Sports and recreation facilities
      . Tourism
      . . Passports and visas
      . Young people's activities
    Faceted Taxonomy Example:
      Career Level • Student • Entry Level • Experienced • Manager • Director • Executive
      Function • Customer Service & Support • Delivery • Engineering • Finance • General Management • Legal & Regulatory Affairs • Marketing & Advertising [more]
      Industry • Agriculture • Apparel & Fashion • Automotive • Aviation & Aerospace • Banking • Biotechnology • Broadcast Media • Chemicals [more]
  • 11. Thesaurus ▪ A controlled vocabulary that has standard structured relationships between terms ‒ Hierarchical: broader term/narrower term (BT/NT) ‒ Associative: related terms (RT) ‒ Equivalence: preferred term (“use for” or “used for”)/non-preferred term (use) (USE/UF) ▪ Created in accordance with standards: ‒ ANSI/NISO Z39.19 Guidelines for Construction, Format, and Management of Monolingual Controlled Vocabularies ‒ ISO 25964-1 Part 1, Thesauri and interoperability with other vocabularies ▪ “Thesaurus” is most often the kind of controlled vocabulary used in indexing periodical literature Introduction: Types and Uses 11
  • 12. Introduction: Types and Uses 12 Thesaurus entry examples (sources shown: Gale thesaurus; ASIS&T thesaurus)
      materials acquisitions
        UF acquisitions (of materials)
           library acquisitions
        BT collection development
        NT accessions
           approval plans
           gifts and exchanges
           materials claims
           materials orders
           subscriptions
        RT book vendors
           jobbers
           subscription agencies
           subscription cancellations
  • 13. Introduction: Types and Uses 13 Controlled Vocabularies - Complexity: Summary of Controlled Vocabulary Types (from less to more complex)
      Pick List: Ambiguity control
      Synonym Ring: Ambiguity control; Synonym control
      Authority File: Ambiguity control; Synonym control
      Taxonomy: Ambiguity control; (Synonym control); Hierarchical relationships
      Thesaurus: Ambiguity control; Synonym control; Hierarchical relationships; Associative relationships
      Ontology: Ambiguity control; (Synonym control); Semantic relationships; Classes
  • 14. Applications and uses 1. Indexing support a) Manual indexing b) Automated indexing 2. Findability or retrieval support a) In browsing • Alphabetical browsing • Hierarchical browsing b) In searching Introduction: Types and Uses 14
  • 15. Indexing Support ▪ As a structured list of agreed-upon terms to ensure consistent indexing • Across multiple documents or content items, where different synonyms describe the same concepts • By multiple indexers working on the same collection • By machine-aided indexing / autoclassification, where taxonomy terms have rules, clues, or sample tuned documents Introduction: Types and Uses 15
  • 16. Indexing Support Introduction: Types and Uses 16 Manual indexing example Cengage/Gale Subject Thesaurus Internal indexer alphabetical browse view
  • 17. Retrieval Support: in browsing a) Alphabetical browse Display method for thesauri, name/proper noun lists, and book-style indexes Example of an alphabetical browse thesaurus: UNESCO Thesaurus http://vocabularies.unesco.org/browser/thesaurus/en/ Introduction: Types and Uses 17
  • 18. Retrieval Support Introduction: Types and Uses 18 Browse example Books and Authors
  • 19. Retrieval Support: in browsing b) Hierarchical browse Categorization scheme for information organization, classification, guided search ▪ More often for end-user guidance than for indexers, but also for database indexers ▪ Example of hierarchical browse taxonomy: Getty Art & Architecture Thesaurus http://www.getty.edu/research/tools/vocabularies/aat/ Introduction: Types and Uses 19
  • 21. Introduction: Taxonomies and Metadata 21 Metadata Types and Uses ▪ Descriptive • for content discovery/retrieval (via searching or browsing) • for content identification (to cite sources, contact authors, compare content, resolve content issues, etc.) ▪ Administrative • for content management (for building collections, information products, web pages and websites; for content maintenance) • for workflow/process management (assigning content, archiving, preserving) ▪ Structural • for content navigation (within a large content resource) • for content presentation (as markup/style metadata)
  • 22. Metadata Standards – Examples ▪ Dublin Core: for generic online networked resources ▪ DDI (Data Documentation Initiative): for describing data from the social, behavioral, and economic sciences ▪ IPTC (International Press Telecommunications Council): for photographs ▪ MARC (Machine Readable Cataloging): for bibliographic data for library materials ▪ PREMIS (Preservation Metadata: Implementation Strategies): for repositories of digital objects ▪ SDMX (Statistical Data and Metadata Exchange): for the exchange of statistical data ▪ VRA Core (Visual Resources Association): for describing images of cultural heritage Introduction: Standards and Models 22
  • 23. Knowledge organization systems (controlled vocabulary) standards / models ▪ For best practices in forming terms and their relationships: • ISO 25964 (2011, 2013) Thesauri and Interoperability with Other Vocabularies • ANSI/NISO Z39.19 (2005, renewed 2010) Guidelines for Construction, Format, and Management of Monolingual Controlled Vocabularies http://www.niso.org/apps/group_public/download.php/12591/z39-19-2005r2010.pdf ▪ For a set of specifications for making controlled vocabularies exchangeable: Simple Knowledge Organization System (SKOS) A World Wide Web Consortium (W3C) recommendation “A common data model for sharing and linking knowledge organization systems via the web” https://www.w3.org/TR/skos-reference/ Introduction: Standards and Models 23
  • 24. Knowledge organization systems (controlled vocabulary) standards / models ▪ ANSI/NISO Z39.19 / ISO 25964 Provides guidelines for term format/style and creating relationships: • Hierarchical: broader term/narrower term (BT/NT) • Associative: related terms (RT) • Equivalence (synonyms): preferred term/non-preferred term (USE/UF) ▪ SKOS Provides specifications for designating terms and relationships: • Lexical labels: skos:prefLabel, skos:altLabel and skos:hiddenLabel • Semantic relations: • skos:broader (broader concept) • skos:narrower (narrower concept) • skos:related (related concept) Introduction: Standards and Models 24
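The SKOS properties listed on this slide can be illustrated by writing one concept out in Turtle syntax. This is a rough sketch, not a full serializer: the `ex:` namespace, the `concept_to_turtle` helper, and the `natural-disasters` parent are hypothetical, and labels containing quotation marks are not escaped. Only the SKOS namespace and property names are taken from the W3C recommendation.

```python
# Prefix declarations that would precede the concepts in a Turtle file.
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/vocab/> .\n"  # hypothetical namespace
)


def concept_to_turtle(slug, pref_label, alt_labels=(), broader=(), lang="en"):
    """Serialize one concept as a Turtle snippet using SKOS properties."""
    lines = [f"ex:{slug} a skos:Concept"]
    lines.append(f'    skos:prefLabel "{pref_label}"@{lang}')
    for label in alt_labels:
        lines.append(f'    skos:altLabel "{label}"@{lang}')
    for parent in broader:
        lines.append(f"    skos:broader ex:{parent}")
    # Predicates of the same subject are separated by ";" and closed with "."
    return " ;\n".join(lines) + " .\n"


if __name__ == "__main__":
    print(PREFIXES)
    print(concept_to_turtle("floods", "Floods", ["Inundations"], ["natural-disasters"]))
```

In practice a taxonomy management tool or an RDF library would generate this output; the sketch only shows how preferred labels, alternative labels, and broader links attach to a single concept.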
  • 25. 1. Introduction to taxonomies and other knowledge organization systems 2. Taxonomies in support of search 3. Term creation 4. Term relationships 5. Structural design 6. User displays Outline 25
  • 26. Trends in taxonomy implementation, search integration and display 1. Originally, full taxonomy hierarchical browse or thesaurus alphabetical browse. Search on content was separate from browsing taxonomies. 2. Full large taxonomies came to be displayed less. Search on taxonomy terms. 3. Search on more than just the taxonomy terms. Search on a combination of taxonomy terms and words in titles, texts, etc. 4. Faceted taxonomies – combining search and limited browsing 5. Post-search filters – especially for larger taxonomies or thesauri 6. Knowledge graphs Taxonomies in Support of Search 26
  • 27. Taxonomies in Support of Search 27 Originally full taxonomy browse
  • 28. Taxonomies in Support of Search 28 Gale former display, ca. 2011 Originally full taxonomy browse
  • 29. Taxonomies in Support of Search: Search on Taxonomy Terms 29 Search on words/phrases in the Subject field Gale current displays Search on terms in a thesaurus
  • 30. Taxonomies in Support of Search: Search on Taxonomy Terms 30 In Basic Search Search subjects and titles in the same field
  • 31. 31 Faceted Taxonomies Aspects Filters Dimensions Limiters Refinements Examples of ecommerce facets For clothes For books For software For furniture Taxonomies in Support of Search: Facets
  • 32. 32 Taxonomy terms to refine post-search results ▪ A display of taxonomy terms that have been used to index the content items in the search result set (not all taxonomy terms). ▪ A display of selected terms from the taxonomy, not the taxonomy itself. ▪ Any relationships between terms are not indicated. ▪ Displayed in order of usage frequency within the search result set. Suitable when a large taxonomy or thesaurus does not fit into a facet. Taxonomies in Support of Search: Post-Search Refinements
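The frequency-ordered refinement list described on this slide can be sketched by counting the subject terms attached to the items in a result set. The sample records and the `refinement_terms` function are hypothetical; the sketch assumes each content item carries a list of indexed subject terms.

```python
from collections import Counter

# Hypothetical search result set: each item lists the taxonomy terms indexed to it.
RESULTS = [
    {"title": "River levels rise", "subjects": ["Floods", "Climate change"]},
    {"title": "Claims after the storm", "subjects": ["Floods", "Insurance"]},
    {"title": "Wetter winters ahead", "subjects": ["Climate change", "Floods"]},
]


def refinement_terms(result_set, limit=10):
    """Return (term, item count) pairs for the result set, most used first."""
    counts = Counter()
    for item in result_set:
        counts.update(set(item["subjects"]))  # count each term once per item
    return counts.most_common(limit)
```

Only terms that actually occur in the result set appear, and no hierarchy is shown, which matches the slide's description of refinements as a flat selection from the taxonomy.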
  • 33. Subjects for post-search filtering in Gale search results
  • 34. Displayed subjects on an individual Gale content item
  • 35. Knowledge graphs ▪ A representation of knowledge as a graph (a network of nodes and links, not tables of rows and columns) ▪ Usually based on data in graph databases, rather than relational databases ▪ Usually includes, but is not limited to, the visualization of – An output of graph analytics – Display of interconnected nodes and links – Display of related data in a "fact box" ▪ Improves search results beyond machine learning and algorithms Taxonomies in Support of Search: Knowledge Graphs 35
  • 36. Taxonomy search options ▪ Different types of search on taxonomy ▪ Displayed taxonomy terms in type-ahead or search-suggest ▪ Alternative labels / nonpreferred terms / synonyms Taxonomies in Support of Search: Search Options 36
  • 37. Different types of search on terms ▪ Exact – exact match ▪ Contains – exact match phrase with additional words before or after ▪ Begins – alphabetical from start, but allows end truncation ▪ Smart – words within the term in any order and also internal word stemming (singular/plural) Implementations ▪ Exact – option for experienced or repeat users ▪ Contains – also called phrase search. Can be done with quotation marks. ▪ Begins – option if there is a type-ahead display of terms ▪ Smart – sometimes the default, if no other options, and terms are not displayed Taxonomies in Support of Search: Search Options 37
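The four search types can be compared side by side in a small matcher. This is a simplified sketch under stated assumptions: matching is case-insensitive and word-based, the `matches` function and its mode names are hypothetical, and the `_stem` helper only strips a trailing "s", which is far cruder than the stemming a real search engine would apply.

```python
def _stem(word):
    """Crude plural stripping; a real system would use a proper stemmer."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def matches(query, term, mode):
    """Return True if `query` matches taxonomy `term` under the given mode."""
    q_words = query.lower().split()
    t_words = term.lower().split()
    if not q_words:
        return False
    if mode == "exact":      # the whole term, nothing more
        return q_words == t_words
    if mode == "contains":   # the phrase, with extra words allowed before or after
        n = len(q_words)
        return any(t_words[i:i + n] == q_words for i in range(len(t_words) - n + 1))
    if mode == "begins":     # match from the start, end may be truncated
        return " ".join(t_words).startswith(" ".join(q_words))
    if mode == "smart":      # words in any order, singular/plural ignored
        stems = {_stem(w) for w in t_words}
        return all(_stem(w) in stems for w in q_words)
    raise ValueError(f"unknown mode: {mode}")
```

For example, "services industry" would match "Telephone services industry" in contains mode, while "plant electric power" would match "Electric power plants" only in smart mode.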
  • 38. Taxonomies in Support of Search: Search Options 38 Different types of search on terms
  • 39. Taxonomies in Support of Search: Search Options 39 Type-ahead Search-suggest
  • 40. Benefits of taxonomies over search alone ▪ Indexing and retrieval based on concepts, not just words/phrases improves search results in both precision (accuracy) and recall (comprehensiveness) ▪ Taxonomy terms allow limiting/filtering search by topic. ▪ Broader categories/terms allow users to choose a broad subject first and then limit by other metadata. ▪ Relationships between terms allow users to explore related topics. ▪ Subjects displayed on search results (post-search) allow users to refine and focus their searches by precise topic or explore related topics. ▪ Taxonomies support the indexing of nontext content (images, video, audio). ▪ Multilingual taxonomies support accurate search and retrieval across multilingual content. Taxonomies in Support of Search 40
  • 41. 1. Introduction to taxonomies and other knowledge organization systems 2. Taxonomies in support of search 3. Term creation 4. Term relationships 5. Structural design 6. User displays Outline 41
  • 42. Term creation issues ▪ Deciding whether to include a concept ▪ Choosing a preferred term/label name ▪ Term format and style Term Creation: Wording of Terms 42
  • 43. Whether a concept should be included as a term 1. Is it within the subject-area scope of the controlled vocabulary? • In book indexing, off-topic is OK; not in a controlled vocabulary 2. Is there enough information on the subject? • In book indexing, text tells something about the subject; sufficient number of sentences on the topic • In a controlled vocabulary: sufficient number of anticipated documents or articles on the topic 3. Is it important, likely to be looked up? Do users want and expect it? Term Creation: Wording of Terms 43
  • 44. Choosing the preferred term wording ▪ Unlike book indexes, there are no “double-posts” in taxonomies and thesauri ▪ You must always choose a “preferred term” (except in synonym rings) ▪ Variants and synonyms are designated as “non-preferred terms” or “alternative labels” ▪ Wording is based on user expectations and needs, more so than on content, which varies. Term Creation: Wording of Terms 44
  • 45. Choosing the preferred term/label wording (the displayed form) Choosing between two “synonyms”: Doctors vs. Physicians Movies vs. Motion pictures Cars vs. Automobiles Consider: 1. Wording of terms most likely looked up by the intended users/audience, especially in browsed controlled vocabulary 2. Enforcing organizational/enterprise controlled vocabulary 3. Conforming to academic or professional standards 4. Consistency in style throughout the controlled vocabulary 5. Wording within the documents/content indexed Term Creation: Wording of Terms 45
  • 46. Choosing the preferred term/label wording The other becomes a nonpreferred term/alternative label. Differentiate closely related terms, or use one as preferred: Foreign policy vs. International relations Differentiate topics from actions, or use one as preferred: Contracts vs. Contracting Differentiate broader and more specific concepts, or use one: Electric power plants vs. Hydroelectric power plants Consider likely occurrences of the more specific topic in the content. Term Creation: Wording of Terms 46
  • 47. Term format and style ▪ Consistent capitalization: lower case or initial capitalization; not title caps Corporate finance; corporate finance; not Corporate Finance ▪ Single words or multi-word phrases ▪ Nouns or noun phrases ▪ Adjectives alone can be terms in special circumstances and where the noun is obvious from context. ▪ Parenthetical qualifiers may be used for disambiguation, not modification. ▪ Countable nouns are usually plural. ▪ Avoid term inversions (e.g. noun, adjective); they are unnecessary when the vocabulary is searchable Term Creation: Wording of Terms 47
  • 48. ▪ Defined: Approximately synonymous words or phrases to refer to an equivalent concept, for the context of the controlled vocabulary and the set of content. ▪ Purpose: To capture different wordings of how different people might describe or look up the same concept or idea and used as alternative entries. ➢ Differences between that of the author and the user/reader ➢ Differences between that of the indexers and the end-users ➢ Differences among different users/readers ▪ Serving as “multiple entry points” to look up and retrieve the desired content, as do double posts or See references in an index. ▪ Enabling consistent indexing Term Creation: Variants, Nonpreferred Terms, or Alternative Labels 48
  • 49. Examples from Gale Subject Thesaurus (preferred term: variants) ▪ Conflict management: Conflict resolution; Managing conflict ▪ Wills: Codicils; Last will and testament; Testaments (Wills) ▪ Influenza: Flu; Grippe ▪ Movies: Cinema; Films (Movies); Motion pictures; Movie genres ▪ Telecommunications industry: Communications industry; Digital transmission industry; Interexchange carriers; Telecommunications services industry; Telephone holding companies; Telephone industry; Telephone services industry ▪ Environmental management: Adaptive management (Environmental management); Environmental control; Environmental stewardship; Natural resource management; Stewardship (Environmental management) ▪ Piano music: [no variants] Term Creation: Variants, Nonpreferred Terms, or Alternative Labels 49
  • 50. Term Creation: Variants, Nonpreferred Terms, or Alternative Labels 50 Nonpreferred Term ▪ Formal designation in thesauri, in accordance with the ANSI/NISO Z39.19 and ISO 25964 standards. ▪ Shortened as NPT. ▪ Associated with a Preferred term. ▪ Considered a kind of “relationship” of the Equivalency type. ▪ Reciprocity of relationship, pointing in both directions: USE and UF (use and used for/use for). ▪ Example: Inundations USE Floods; Floods UF Inundations ▪ Both preferred terms and non-preferred terms are “terms.”
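The reciprocity of USE and UF can be sketched as two lookups that are always updated together. The `EquivalenceIndex` class is a hypothetical illustration, not the data model of any particular thesaurus tool; the example terms come from the slides.

```python
from collections import defaultdict


class EquivalenceIndex:
    """Stores USE/UF pairs so each relationship is recorded in both directions."""

    def __init__(self):
        self.use = {}                     # nonpreferred term -> preferred term (USE)
        self.used_for = defaultdict(set)  # preferred term -> nonpreferred terms (UF)

    def add(self, nonpreferred, preferred):
        """Record 'nonpreferred USE preferred' and 'preferred UF nonpreferred'."""
        self.use[nonpreferred] = preferred
        self.used_for[preferred].add(nonpreferred)

    def resolve(self, term):
        """Return the preferred term for any entry term."""
        return self.use.get(term, term)


index = EquivalenceIndex()
index.add("Inundations", "Floods")
index.add("Flu", "Influenza")
index.add("Grippe", "Influenza")
```

An indexer or searcher who enters "Grippe" is then pointed to "Influenza", and the entry for "Influenza" can list both of its nonpreferred terms.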
  • 51. Term Creation: Variants, Nonpreferred Terms, or Alternative Labels 51 Alternative Label ▪ Formal designation for SKOS (Simple Knowledge Organization System) vocabularies, a World Wide Web Consortium (W3C) recommendation. ▪ Shortened as altLabel. ▪ Associated with a Preferred label. ▪ Instead of terms, there are concepts, each with any number of labels ▪ Concepts have a preferred label (for each language). ▪ Concepts have any number of alternative labels and hidden labels (for each language). ▪ Alternative labels are part of a concept’s attributes, not equivalent terms and not connected by “relationships.”
  • 52. Variants, Nonpreferred Terms, or Alternative Labels 52 Thesaurus model: Synaptica
  • 53. Variants, Nonpreferred Terms, or Alternative Labels 53 SKOS model: PoolParty
  • 54. Term Creation: Variants, Nonpreferred Terms, or Alternative Labels 54 Sources for variant terms ▪ Same sources as for concepts and preferred terms ➢ Survey/audit of the content and terms used ➢ Search query logs and other internal usage data ➢ External sources: websites, Wikipedia, other taxonomies and controlled vocabularies, book tables of contents, etc. ▪ Creative changes of terms (after verification of variant term usage in search) ▪ Not to be used as a source: Dictionary-type thesaurus, such as Roget's Thesaurus of English Words and Phrases or thesaurus-dictionary websites
  • 55. 1. Introduction to taxonomies and other knowledge organization systems 2. Taxonomies in support of search 3. Term creation 4. Term relationships 5. Structural design 6. User displays Outline 55
  • 56. Term Relationships 56 Types of relationships between terms Between preferred and nonpreferred terms in a thesaurus: 1. Equivalence: Use (USE) / Used for (UF) Between concepts or preferred terms: 2. Hierarchical: Broader term (BT) / Narrower term (NT) 3. Associative: Related term (RT) 4. Customized relationships: More specific types of BT/NT or RT Relationships are reciprocal between terms/concepts.
  • 57. Term Relationships: Hierarchical 57 Hierarchical relationships ▪ Broader-narrower / Topic-subtopic / Parent-child / Superordinate-Subordinate ▪ Required feature of both thesauri and taxonomies ▪ Thesaurus designation of BT / NT (broader term / narrower term) ▪ SKOS designation: Broader concept / Narrower concept ▪ Terms usually have more than one narrower term (NT), unless they are the most specific terms in the vocabulary. ▪ On occasion, a term may have more than one broader term (BT), referred to as polyhierarchy.
  • 58. Term Relationships: Hierarchical 58 Hierarchical relationships Reciprocal (bi-directional) relationships, but asymmetrical. Example: Fruits NT Oranges; Oranges BT Fruits. All-and-some test: SOME Fruits (broader term) are Oranges; ALL Oranges (narrower term) are Fruits. Three types: 1. Generic – Specific 2. Generic – Named entity instance: Common noun – Proper noun 3. Whole – Part
  • 59. Term Relationships: Associative 59 Associative relationships ▪ Suggestions to the user of possible related terms of interest ▪ Like See also in an index ▪ Required feature of thesauri ▪ Optional feature of taxonomies ▪ Thesaurus designation of RT (Related term) ▪ SKOS designation: Related concept ▪ Symmetrical bi-directional relationship ▪ Between terms within the same hierarchy or in different hierarchies
  • 60. Term Relationships: Customized, Semantic 60 Specific/customized relationships ▪ Relationships containing meaning: “semantic” ▪ Variations on equivalence (USE/UF), hierarchical (BT/NT) or associative (RT) relationships, but usually associative. ▪ Reciprocal, but asymmetrical, or directional, not plain RT. ▪ Specific enough to convey the necessary meaning, but not uniquely specific. ▪ Relationships are between terms of different types, across different designated categories or classes. ▪ Taxonomist defines the relationships and their codes and the categories. ▪ A defining characteristic of ontologies or an “ontology lite.”
  • 61. 1. Introduction to taxonomies and other knowledge organization systems 2. Taxonomies in support of search 3. Term creation 4. Term relationships 5. Structural design 6. User displays Outline 61
  • 62. Structural Design: Hierarchies 62 Hierarchies ▪ The extension of hierarchical relationships (BT/NT) to include all terms ▪ More important for taxonomies than for thesauri ▪ Emphasize categorization, classification, sorting ▪ Involve working from the top down ▪ Also known as “tree” structures A single taxonomy may have more than one top-term hierarchy.
  • 63. Structural Design: Hierarchies 63 Examples of hierarchies for classification Classifying of things – can only go in one place ▪ Linnaean taxonomy of classification of organisms National Center for Biotechnology Information of the National Library of Medicine http://www.ncbi.nlm.nih.gov/Taxonomy ▪ Dewey Decimal Classification system for library materials http://www.oclc.org/dewey/resources/summaries/deweysummaries.pdf ▪ NAICS codes for industries http://www.census.gov/naics/2007/NAICOD07.HTM
  • 64. Structural Design: Hierarchies 64 Examples of hierarchies Linnaean taxonomy: National Center for Biotechnology Information, National Library of Medicine Taxonomy Browser
  • 65. 65 Dewey Decimal Classification 100s level
      000 Computer science, knowledge & systems 010 Bibliographies 020 Library & information sciences 030 Encyclopedias & books of facts 040 [Unassigned] 050 Magazines, journals & serials 060 Associations, organizations & museums 070 News media, journalism & publishing 080 Quotations 090 Manuscripts & rare books
      100 Philosophy 110 Metaphysics 120 Epistemology 130 Parapsychology & occultism 140 Philosophical schools of thought 150 Psychology 160 Logic 170 Ethics 180 Ancient, medieval & eastern philosophy 190 Modern western philosophy
      200 Religion 210 Philosophy & theory of religion 220 The Bible 230 Christianity & Christian theology 240 Christian practice & observance 250 Christian pastoral practice & religious orders 260 Christian organization, social work & worship 270 History of Christianity 280 Christian denominations 290 Other religions
      300 Social sciences, sociology & anthropology 310 Statistics 320 Political science 330 Economics 340 Law 350 Public administration & military science 360 Social problems & social services 370 Education 380 Commerce, communications & transportation 390 Customs, etiquette & folklore
      400 Language 410 Linguistics 420 English & Old English languages 430 German & related languages 440 French & related languages 450 Italian, Romanian & related languages 460 Spanish & Portuguese languages 470 Latin & Italic languages 480 Classical & modern Greek languages 490 Other languages
      500 Science 510 Mathematics 520 Astronomy 530 Physics 540 Chemistry 550 Earth sciences & geology 560 Fossils & prehistoric life 570 Life sciences; biology 580 Plants (Botany) 590 Animals (Zoology)
      600 Technology 610 Medicine & health 620 Engineering 630 Agriculture 640 Home & family management 650 Management & public relations 660 Chemical engineering 670 Manufacturing 680 Manufacture for specific uses 690 Building & construction
      700 Arts 710 Landscaping & area planning 720 Architecture 730 Sculpture, ceramics & metalwork 740 Drawing & decorative arts 750 Painting 760 Graphic arts 770 Photography & computer art 780 Music 790 Sports, games & entertainment
      800 Literature, rhetoric & criticism 810 American literature in English 820 English & Old English literatures 830 German & related literatures 840 French & related literatures 850 Italian, Romanian & related literatures 860 Spanish & Portuguese literatures 870 Latin & Italic literatures 880 Classical & modern Greek literatures 890 Other literatures
      900 History 910 Geography & travel 920 Biography & genealogy 930 History of ancient world (to ca. 499) 940 History of Europe 950 History of Asia 960 History of Africa 970 History of North America 980 History of South America 990 History of other areas
  • 67. Structural Design: Hierarchies 67 Hierarchy purpose 1. Serving users who are browsing, exploring, discovering, not searching, to whom the hierarchy is displayed. ➢ Users don’t even have to know the first word or few letters, as in alphabetical browsing. 2. Instructing users on classification 3. Enabling “recursive”/“rolled up” retrieval results (A term retrieves what is indexed to it and what is indexed to each one of its narrower terms, all together.)
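The "rolled up" retrieval in point 3 can be sketched as a recursive walk down the narrower terms. The hierarchy below reuses terms from the hierarchical taxonomy example earlier in the deck; the document ids, the `NARROWER` and `INDEXED` tables, and the `rolled_up` function are hypothetical.

```python
# Broader term -> narrower terms (subset of the "Leisure and culture" example).
NARROWER = {
    "Sports and recreation": ["Team sports", "Water sports", "Winter sports"],
    "Team sports": ["Cricket", "Football", "Rugby"],
}

# Term -> ids of the content items indexed directly to it (made-up ids).
INDEXED = {
    "Team sports": {"doc1"},
    "Cricket": {"doc2"},
    "Rugby": {"doc3", "doc4"},
    "Water sports": {"doc5"},
}


def rolled_up(term, seen=None):
    """Return items indexed to `term` or to any of its narrower terms."""
    seen = set() if seen is None else seen
    if term in seen:  # a term reachable through two parents is visited only once
        return set()
    seen.add(term)
    docs = set(INDEXED.get(term, set()))
    for child in NARROWER.get(term, []):
        docs |= rolled_up(child, seen)
    return docs
```

Selecting "Team sports" would return doc1 through doc4, even though only doc1 is indexed directly to that term.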
  • 68. Structural Design: Hierarchies 68 Polyhierarchies Sometimes a term can have two or more broader terms. ▪ Polyhierarchy is permitted if the hierarchical relationship is valid in both/all cases ▪ Remember “All-and-Some” test for each generic hierarchical relationship ▪ Systems may or may not support it. Example: Online Banking has two broader terms, Online Services and Banking.
  • 69. Structural Design: Facets 69 Facets ▪ For serving faceted classification, which allows the assignment of multiple classifications to an object ▪ A “dimension” of a query; a type of concept ▪ Intended for searching with multiple terms in combination (post-coordination), one from each facet ▪ A refinement, filter, limit by, narrow by ▪ Can be for topics or for named entities, but generally not both ▪ Reflect the domain of content
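Post-coordination across facets can be sketched as filtering items on one selected value per facet, combined with a logical AND. The facet names come from the faceted taxonomy example earlier in the deck; the job records and the functions `filter_by_facets` and `facet_counts` are hypothetical.

```python
from collections import Counter

# Hypothetical content items, each tagged with one value per facet.
ITEMS = [
    {"id": "job1", "Career Level": "Entry Level", "Function": "Engineering", "Industry": "Automotive"},
    {"id": "job2", "Career Level": "Manager", "Function": "Engineering", "Industry": "Automotive"},
    {"id": "job3", "Career Level": "Manager", "Function": "Finance", "Industry": "Banking"},
]


def filter_by_facets(items, selections):
    """Keep items that match every selected facet value (logical AND)."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(items, facet):
    """Count remaining items per value of one facet, for display beside each value."""
    return Counter(item[facet] for item in items if facet in item)
```

Because each selection is applied independently, users can add or remove facet values in any order, and the counts can be recomputed on the filtered set after every change.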
  • 70. 70 Examples of ecommerce facets For clothes For books For software For furniture Structural Design: Facets
  • 71. Structural Design: Facets 71 Facet advantages ▪ Supports more complex search queries by users ▪ Allows users to control the search refinement, narrowing or broadening in any manner or order ▪ Familiar to novice users; suitable for expert users Facet disadvantages ▪ Only suitable for somewhat structured, unified type of content that all share the same multiple facets ▪ Not practical for extremely large topical controlled vocabularies ▪ May not support “advanced search” of multiple terms selected at once (“or”) from the same facet ▪ Requires investment of thorough indexing/tagging
  • 72. 1. Introduction to taxonomies and other knowledge organization systems 2. Taxonomies in support of search 3. Term creation 4. Term relationships 5. Structural design 6. User displays Outline 72
  • 73. User Displays: Hierarchy Options 73 End-user browse display options Hierarchy end-user displays may be implemented in different ways: ▪ Expandable tree − Accommodates inconsistent numbers of terms per level − Insufficient for very large taxonomies or large numbers of terms at the same level ▪ One level per web page − Large number of terms can display at each level − Less appropriate for taxonomies with varied levels or levels containing just one or a few terms ▪ Fly-out subcategories − Not so suitable for more than 3 levels or large taxonomies
  • 74. User Displays: Hierarchy Options 74 Expandable hierarchies
  • 79. User Displays: Facets 79 Facet display features ▪ Collapsible displays of values to display more facets ▪ Graphical options: quantity sliders, color selections ▪ Counts of content items ▪ Tick boxes to make multiple selections
  • 80. Resources 80 Books Abbas, June. (2010) Structures for Organizing Knowledge. New York: Neal-Schuman Publishers. Harpring, Patricia. (2010) Introduction to Controlled Vocabularies: Terminology for Art, Architecture, and Other Cultural Works. Los Angeles: Getty Research Institute. Hedden, Heather. (2016) The Accidental Taxonomist, 2nd edition. Medford, NJ: Information Today Inc. http://www.hedden-information.com/accidental-taxonomist/ Hlava, Marjorie M.K. (2015) The Taxobook. Morgan & Claypool Publishers. Lambe, Patrick. (2007) Organising Knowledge: Taxonomies, Knowledge and Organisational Effectiveness. Oxford, England: Chandos Publishing.
  • 81. Resources 81 Standards and Guidelines ANSI/NISO Z39.19-2005 (2010) Guidelines for Construction, Format, and Management of Monolingual Controlled Vocabularies. Bethesda, MD: NISO Press. http://www.niso.org/apps/group_public/download.php/12591/z39-19-2005r2010.pdf NISO TR-06-2017 Issues in Vocabulary Management http://www.niso.org/publications/tr/tr-06-2017 ISO 25964-1 Thesauri and Interoperability with other Vocabularies: Part 1: Thesauri for Information Retrieval https://www.iso.org/standard/53657.html
  • 82. Resources 82 Websites Accidental Taxonomist book websites http://www.hedden-information.com/accidental-taxonomist/websites/ Taxonomy Warehouse www.taxonomywarehouse.com Construction of Controlled Vocabularies: A Primer http://marciazeng.slis.kent.edu/Z3919/index.htm Thesaurus Construction tutorial by Tim Craven http://publish.uwo.ca/~craven/677/thesaur/main00.htm The Accidental Taxonomist Blog http://accidental-taxonomist.blogspot.com Hedden Information Management past presentations http://www.hedden-information.com/presentations/
  • 83. Resources 83 Courses, Workshops, Webinars “Taxonomies and Controlled Vocabularies” self-paced online course from Hedden Information Management http://www.hedden-information.com/courses-workshops/taxonomy-course/ Taxonomy Boot Camp London pre-conference workshops October 14; conference October 15-16, 2019 http://www.taxonomybootcamp.com/london “Practical Taxonomy Creation” 3-part webinar course recording, through the American Society for Indexing http://www.asindexing.org/online-learning/taxonomy-hedden SLA Taxonomy Division webinars http://taxonomy.sla.org
  • 84. Heather Hedden Hedden Information Management Carlisle, MA USA www.hedden-information.com accidental-taxonomist.blogspot.com heather@hedden.net +1-978-371-0822 Questions/Contact 84