Automated metadata projects at Yle
1. Automated metadata generation projects at Yle
Elina Selkälä
Manager, archive publishing and metadata
Yle Archives
elina.selkala@yle.fi
FIAT/IFTA Media Management Seminar
Lugano
8.-9.6.2017
2. Agenda
• Yle in a nutshell
• Yle Archives, collections and materials
• Production of metadata at Yle
• What we experimented on: examples of automatic content analysis projects
• What we learned
• What is happening next
• What is the role of the information professional in the age of AI
3. This is Yle
• Public service broadcasting company
• 3 nationwide television & 6 radio channels, 24 regional radio stations
• Extensive online presence: yle.fi, svenska.yle.fi, Yle Areena, Yle Elävä arkisto
• In addition to Finnish and Swedish, broadcasts in 11 languages, e.g. Sami,
English and Russian
• National programming hours per year:
50,000 hours of radio programming
20,000 hours of TV programming
5,000 hours of audio content online
15,000 hours of video content online
4. Yle Archives
• Archives and catalogues radio and TV programmes produced and co-produced by Yle
• Fosters and curates the archive collections of Yle
• Offers information services and training for Yle staff
• Publishes archive material online
Collections
• TV and radio materials, photographs, sound effects and music
• Archived in Media Asset Management System ”Metro” (Avid)
• Represents an important part of Finnish cultural heritage
• The archive also holds sheet music, books and online resources, e.g. papers,
magazines and databases
5. Radio and TV Archive collections
TV materials
• TV programmes and raw material from
1957 onwards & film materials from 1906
onward
• Collection consists of around 700,000
programmes and clips
• All Yle productions / co-productions have
been systematically archived since 1984
• Archiving in native digital form since 2009
• Around 10,000 hours of video content is
archived / year
• Relatively good metadata
Radio materials
• Yle-produced programmes and raw
material, oldest surviving clip from 1935
• The collection consists of around 2 million
programmes and clips
• Currently around 10% of radio
transmissions are archived (e.g. News and
works of art)
• Archiving in native digital form from the
beginning of the 2000s
• Around 20,000 hours of audio content is
archived / year
• Metadata of varying quality
6. Metadata production at Yle
Archived radio and TV programmes
• Yle’s archive materials are widely used as whole programmes (reruns) and clips
• Metadata incomplete or insufficient for many reasons → hinders findability and safe
re-use
• Alongside tape collection digitization projects, the related programme metadata is
updated and improved
• Huge endeavour, therefore prioritization is needed (most used, customer orders)
• Descriptive metadata is done manually
• Done by Archives’ information specialists (about 15 people)
7. Metadata production at Yle
New audio and video content
• Metadata production decentralized
Metadata added and stored throughout the production and publishing process
Some metadata from production and publishing systems, descriptive metadata
filled out manually
Done by Yle staff; production coordinators, editors, producers, etc.
• Company-wide Archiving Policy
Defines the responsibilities, contents to be archived, metadata and formats
• Growing amount of published content
• Metadata is used for archiving and reuse purposes, as well as reporting
• New needs for metadata: improve discoverability and visibility on
8. Automated content analysis projects at Yle
Fall 2016
• Automated content analysis (virtual) team with participants from different parts of Yle
• Improve discoverability on web services (Yle Areena)
• Improve discoverability from archive databases
• New ways to subtitle video content
• Management of raw materials and versions
• Team’s goals were to:
• Learn about AI, machine learning and automatic content analysis methods in
theory and practice
• Carry out pilot projects (PoCs) with some companies
• Find solutions for automated metadata production in practice
9. Case 1
Automatic content analysis of TV programmes (1/2)
Pilot project with Valossa Labs
Goal
• Test and evaluate the quality and
suitability of automatically produced
(descriptive) metadata in Yle’s metadata
production
Tested methods
• Text analysis of subtitles → tagging,
annotation
• Image recognition: object and face
recognition
• OCR of captions
• Automatic segmentation
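The first tested method, text analysis of subtitles leading to tags, can be illustrated with a minimal sketch. The slides do not describe how the pilot system performed its text analysis, so the frequency-based approach, the function name `suggest_tags`, the stop-word list, and the sample subtitles below are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "on", "for", "at"}


def suggest_tags(subtitle_lines, max_tags=3):
    """Return the most frequent non-stop-words as candidate tags."""
    words = []
    for line in subtitle_lines:
        # Match runs of letters (including non-ASCII ones such as ä and ö).
        words.extend(re.findall(r"[^\W\d_]+", line.lower()))
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    # Sort by descending count, then alphabetically, for a deterministic result.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:max_tags]]


# Invented sample subtitles, not taken from any Yle programme.
subtitles = [
    "The election results are in.",
    "Helsinki voters chose a new mayor in the election.",
    "The mayor thanked Helsinki.",
]
print(suggest_tags(subtitles))  # ['election', 'helsinki', 'mayor']
```

A production system would more likely lemmatize words and map them to a controlled vocabulary; plain word counts are used here only to keep the sketch self-contained.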
10. Case 1
Automatic content analysis of TV programmes (2/2)
Results
• Face recognition works well, object recognition is somewhat unreliable and too detailed
• Subtitles could also be used for content analysis
• Automatic segmentation (scenes, inserts) works well
• Test period was too short, so no experience was gained of the system's learning capabilities
• Speech recognition alongside image recognition would probably be beneficial, but the
tested application did not support this feature
11. Case 2
Automatic content analysis of audio content (1/2)
Pilot project with Lingsoft
Goal
• Test and evaluate the quality of speech &
music recognition and automatic
annotation
Tested methods
• Speech recognition → textual data for text
analysis
• Automatic annotation and indexing
• Music recognition (distinguish music from
speech)
12. Case 2
Automatic content analysis of audio content (2/2)
Results
• Audio quality and the speaker's manner of speaking have a significant impact on the results
• Accuracy of the transcription is sufficient for annotation → relevant keywords, tags
• Music recognition works fairly well
• Speaker recognition would be useful, but the tested service did not support this feature
13. Case 3
Automatic content analysis of Yle Areena content (1/3)
Pilot project with Qvik, Valossa Labs and Aalto University
Goal
• Improve findability and usability of audio and video content in Yle Areena online service
Three experiments
• Speech recognition: Time-code based transcriptions of audio files
• Image / structure recognition: fast forward opening & closing credits, inserts
• Text analysis: automatic annotation
[Diagram labels: Yle Areena content; Media; Automatic content analysis; Metadata; New functionalities for the end user]
14. Case 3
Automatic content analysis of Yle Areena content (2/3)
Speech-to-text & text analysis
• Time-coded transcription and
automatic annotation of audio and
video content
Results
• Transcriptions were added to the Yle
Areena web page, so search engines
were able to index the content →
searching spoken content became
possible
• Identification of relevant concepts
was successful
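To show why time-coded transcriptions make spoken content searchable, here is a minimal sketch. The `Segment` structure, the function names, and the sample transcript are hypothetical; the slides do not specify the transcript format used in the pilot.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed passage with its start time (assumed structure)."""
    start_seconds: float
    text: str


def find_mentions(segments, query):
    """Return the start times of all segments whose text contains the query."""
    needle = query.lower()
    return [s.start_seconds for s in segments if needle in s.text.lower()]


def format_timecode(seconds):
    """Format a number of seconds as HH:MM:SS (fractions are truncated)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Invented sample transcript.
transcript = [
    Segment(0.0, "Good evening, here is the news."),
    Segment(75.5, "The parliament voted on the budget today."),
    Segment(3720.0, "Finally, the budget debate continues tomorrow."),
]

hits = find_mentions(transcript, "budget")
print([format_timecode(t) for t in hits])  # ['00:01:15', '01:02:00']
```

Because every match carries a time code, a player could jump straight to the passage where the search term is spoken.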
15. Case 3
Automatic content analysis of
Yle Areena content (3/3)
Identifying the structure of the content
• Automatic segmentation and identification
of recurrent elements (opening & closing
credits)
• Object recognition
Results
• Recurring elements (based on images) and
topics (based on subtitling) can be
identified → intelligent fast forward is
possible (Demo)
• Object recognition is somewhat unreliable
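Once recurring elements have been identified, an intelligent fast forward can be reduced to a lookup over labeled segments. The sketch below assumes that segmentation has already produced start and end times with labels; the label names, the `LabeledSegment` structure, and the function `skip_target` are hypothetical and do not reflect the actual demo implementation.

```python
from dataclasses import dataclass


@dataclass
class LabeledSegment:
    """A time span of a programme with a label (assumed structure)."""
    start: float
    end: float
    label: str


# Hypothetical labels for elements a viewer may want to skip.
SKIPPABLE = {"opening_credits", "closing_credits"}


def skip_target(segments, position):
    """Return the time to jump to if position lies inside a skippable
    segment, otherwise None."""
    for seg in segments:
        if seg.label in SKIPPABLE and seg.start <= position < seg.end:
            return seg.end
    return None


# Invented example: a 25.5-minute programme with credits at both ends.
programme = [
    LabeledSegment(0.0, 30.0, "opening_credits"),
    LabeledSegment(30.0, 1500.0, "programme"),
    LabeledSegment(1500.0, 1530.0, "closing_credits"),
]

print(skip_target(programme, 10.0))   # 30.0
print(skip_target(programme, 600.0))  # None
```

The player would call `skip_target` with the current playback position and offer a skip button whenever the result is not `None`.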
16. Lessons learned
Define needs, requirements, and goals
• What is needed and who needs it
• Costs and benefits
Define how success is measured
• Evaluation criteria
Plan how projects are carried out
• Time and other resources
Cooperation with outside partners
• Ready-made test material packages
Contract and copyright issues
Share your information
17. On-going projects
Production
• Robot journalism, Voitto-robotti (pilot project)
• Automatic annotation of Yle’s web articles (in production)
Publishing
• Automatic metadata production by speech recognition and
image recognition (PoC)
• Speech recognition in subtitling (PoC)
Consumption / use
• Recommendation for Yle Areena content (in production)
• Yle Uutisvahti application, recommendation engine (in
production)
• Automatic moderation of web discussions (PoC)
• Deduction of customer demographics (in production)
18. Information professionals' changing role
What is the role of information professionals in the age of AI?
• Machine’s teacher
• Quality assessor, quality control manager
• Curator and valuer of metadata
• Customer value assessor
• Publisher of (archived) content
New skills are needed
• Comprehension of the methods to assess the opportunities available
• Technical know-how
Information professional and the machine need to coexist