Automatic metadata generation in the Finnish broadcasting company
1. Identifying the business cases for automatic metadata in the Finnish Broadcasting Company
Kim Viljanen, Elina Selkälä
The Finnish Broadcasting Company
2. Get to know Yle in 120 seconds
Yle – Your Story
• Founded 1926
• CEO Merja Ylä-Anttila
• Four television channels and three channel slots
• Six nationwide radio channels, three radio
services and varied web services
• Present throughout Finland, in 25 areas
• Newsrooms: 18 Finnish, 5 Swedish and one Sami-speaking
• 10 regional television news broadcasts
• 2,811 permanent employees (12/2018)
• 90 % of employees create content and
programmes
• The company is 99.9 percent state-owned and
operates under the Act on Yleisradio Oy.
• Yle is financed by a tax paid by both individuals
and companies.
3. Yle - The AI Company
Planning → Production → Publication → Consumption
- Robot Journalist Voitto:
Automatic news content
generation IN PRODUCTION
- Automated tagging for
online texts IN PRODUCTION
- Image analysis and tagging,
stills library POC*
- On demand speech
recognition tool POC*
- Automatically created
transcriptions as an aid in
archive database searches
POC*
- Automatic metadata
generation: speech & image
recognition, text analysis &
automatic annotation etc.
POC*
- Automatic content analysis of
Yle Areena publications POC
- Speech recognition in
subtitling POC
- Audio description POC
- Music identification POC
- ASR based content
recommendation, Yle Areena IN
PRODUCTION
- News Watch App content
recommendation, Yle News IN
PRODUCTION
- Inference of customer demographics IN PRODUCTION
- Automatic moderation of web
discussions POC
- 360° view of content and users with metadata
- Editor's assistant Onnibot predicts an article's performance IN PRODUCTION
Functions done or assisted by AI (* Done at the Archives)
4. Content analysis → metadata → new functionalities for the end user
5. What's next?
A company-wide automatic metadata processor: analyse every content item at the right phase of the process
6. Vision: The Metadata Machine
All audiovisual content is automatically analysed as early as possible
Content creation
(raw material)
Procurement
(ready-made content)
Publishing
(published content)
Archiving
(what do we have?)
Automatic content analysis engine:
- Speech recognition
- Image recognition
- Person identifier
- Fingerprinting
- Sound identification
- Video frame color analysis
- Music identifier
- Text analysis
- Language identifier
- ...
Company-wide metadata database on all content items
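The engine sketched above fans each content item out to independent extractors and merges their results into one record in a shared metadata store. A minimal sketch of that orchestration pattern follows; the extractor names, stub results, and the in-memory dictionary standing in for the company-wide database are all assumptions, not Yle's actual implementation:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ContentItem:
    content_id: str
    media_path: str

# Placeholder extractors. Real ones would call speech recognition,
# image recognition, fingerprinting, etc. on the media file.
def language_identifier(item: ContentItem) -> dict:
    return {"language": "fi"}          # stub result

def music_identifier(item: ContentItem) -> dict:
    return {"contains_music": True}    # stub result

EXTRACTORS: Dict[str, Callable[[ContentItem], dict]] = {
    "language": language_identifier,
    "music": music_identifier,
}

def analyse(item: ContentItem, metadata_db: dict) -> dict:
    """Run every registered extractor and store the merged record."""
    record = {name: extractor(item) for name, extractor in EXTRACTORS.items()}
    metadata_db[item.content_id] = record
    return record

db: dict = {}
analyse(ContentItem("yle-123", "/media/yle-123.mp4"), db)
```

Keeping extractors behind a single registry is what lets the same engine serve content creation, procurement, publishing, and archiving: each phase only decides *when* to call `analyse`, not *how* each extractor works.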
7. A growing and fast moving market of
automatic metadata extractors (AME)
- Cloud companies
- Companies focusing on one or several extractors
- AME orchestration companies
- Media product vendors (e.g. MAMs) that
incorporate AME as part of the service
- ...
Differentiators between service providers:
- Which metadata extractors does each provide?
- Focus only on media business?
- Onsite vs. cloud
- Ready-made vs. tailored
- Pricing model
- Ease of integration into supply chain
- Quality of metadata results
- Ability to train Machine Learning models
- Speed of developing their products and services
- ...
- Speech recognition
- Face recognition
- Optical character
recognition (OCR)
- Sound detection
- Language detection
- Visual object detection
- Landmark detection
- Logo detection
- Automatic translation
- ...
8. Do we have use cases for metadata that contains errors?
9. Metadata machine - Proof of concept project, spring 2019
1. Buy a metadata machine (Graymeta Curio)
2. Involve as many teams as possible around the company to identify and test their
business cases for the Metadata machine.
3. Run lots of Yle content through the machine, extracting as much metadata as possible
4. Run a wide variety of Yle content to test the capabilities of the extractors
5. Collect the results from the teams, identify the most prominent business cases
6. Final verdict: are the combined benefits bigger than the required investment?
Timeline: test round 1 → test round 2 → test round 3 → analysis & next steps
10. A production-ready platform that powers automated metadata collection using best-of-breed and custom machine learning services.
11. Identifying the business cases for automatic metadata
100+ ideas 10+ proof of concepts 1+ to production
12. How to evaluate individual ideas?
What kind of metadata
does it require?
Does it improve existing
processes?
Does it enable something
completely new?
How much money or time
does it save?
How much does it increase
customer satisfaction?
Is the technology solution
available today?
What are the direct and
indirect costs involved?
How to optimize the costs?
How does it affect the
surrounding production
process / way of working?
How to combine human work
with automation?
What are the success criteria
/ KPIs … ?
...
13. Use case: sport (editing)
- The need: speed up making a video
compilation around a specific topic.
- Test case: Make a compilation about the
Finnish athlete Iivo Niskanen.
- Material: All Yle content about the Seefeld
competitions 2019
- Extractors: faces, OCR, speech recognition,
…
- Possible next step: EU leader identification
(to speed up editing of reports from EU
meetings in Helsinki later this year)
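The sport-editing use case above boils down to a query over time-coded extractor output: find the segments whose face, OCR, or transcript metadata mentions the athlete, then merge adjacent hits into candidate clips. A minimal sketch under stated assumptions; the segment layout, field names, and the 2-second merge gap are all hypothetical:

```python
def find_clips(segments, name, gap=2.0):
    """Return (start, end) spans whose metadata mentions `name`.

    Each segment is a dict with start/end times (seconds) plus optional
    extractor hits: recognised faces, OCR text, a speech transcript.
    """
    surname = name.split()[-1].lower()
    hits = [
        (s["start"], s["end"]) for s in segments
        if name in s.get("faces", [])
        or surname in s.get("transcript", "").lower()
        or surname in s.get("ocr", "").lower()
    ]
    # Merge hits separated by at most `gap` seconds into longer clips,
    # so the editor gets a few coherent spans instead of many fragments.
    merged = []
    for start, end in sorted(hits):
        if merged and start - merged[-1][1] <= gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged

segments = [
    {"start": 0.0, "end": 5.0, "faces": ["Iivo Niskanen"]},
    {"start": 5.0, "end": 10.0, "transcript": "Niskanen leads before the final lap"},
    {"start": 40.0, "end": 45.0, "ocr": "IIVO NISKANEN"},
]
clips = find_clips(segments, "Iivo Niskanen")
```

The same query shape would serve the EU-leader idea: only the name list changes, not the search over extractor metadata.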
14. Use case: Content ingest and processing
- Tools to automate quality checking of incoming material. Is this the media we ordered?
- automatic slate identification
- black and silence detection
- end credit detection
- …
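Silence detection, one of the ingest checks listed above, can be approximated by flagging runs of audio frames whose RMS level stays below a threshold. A minimal sketch with synthetic audio; the frame size, threshold, and minimum run length are made-up values, not production settings:

```python
import math

def detect_silence(samples, rate=48000, frame=0.1, threshold=0.01, min_len=0.5):
    """Return (start, end) spans, in seconds, of sustained silence."""
    n = max(1, int(rate * frame))          # samples per analysis frame
    spans, run_start = [], None
    for i in range(0, len(samples), n):
        chunk = samples[i:i + n]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        t = i / rate
        if rms < threshold:                # quiet frame: extend the run
            if run_start is None:
                run_start = t
        elif run_start is not None:        # run ended: keep if long enough
            if t - run_start >= min_len:
                spans.append((run_start, t))
            run_start = None
    if run_start is not None:              # run reaching end of media
        end = len(samples) / rate
        if end - run_start >= min_len:
            spans.append((run_start, end))
    return spans

# One second of 440 Hz tone, one second of silence, one second of tone.
rate = 8000
tone = [math.sin(2 * math.pi * 440 * i / rate) for i in range(rate)]
samples = tone + [0.0] * rate + tone
spans = detect_silence(samples, rate=rate)
```

Black-frame detection follows the same pattern with per-frame average luminance in place of RMS.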
15. Use case: Understanding the content for Archive
and Analytics
- “All” metadata potentially useful
for archive and analytics use.
- face recognition
- speech recognition
- visual description
- main topics
- natural language processing
- contains music?
- …
- In addition to having lots of
metadata, it is important to make
the data easy to use and view.
16. Use case: Spoken language detection
- The need: Identification of what languages are spoken in which parts of a program.
Important information for multiple teams inside Yle, e.g. the translation
department.
- Current commercial services (to our knowledge) can identify the main language of
the whole media, but not individual language segments inside the media.
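One way to approximate the missing segment-level capability is to classify short fixed windows independently and merge equal neighbours into language segments. A minimal sketch of that windowing idea; `classify_window` is a deliberately naive keyword-lookup stub standing in for a real language-ID model, and the word lists and window size are assumptions:

```python
# Tiny function-word lists as a stand-in for a real language-ID model.
FI_WORDS = {"ja", "on", "että"}
SV_WORDS = {"och", "är", "att"}

def classify_window(words):
    """Stub classifier: vote by function-word counts (fi vs. sv)."""
    fi = sum(w in FI_WORDS for w in words)
    sv = sum(w in SV_WORDS for w in words)
    return "fi" if fi >= sv else "sv"

def language_segments(timed_words, window=3):
    """timed_words: list of (time_sec, word) pairs from a transcript.

    Classify each window of `window` words, then keep only the points
    where the detected language changes: a list of (start, lang) marks.
    """
    segments = []
    for i in range(0, len(timed_words), window):
        chunk = timed_words[i:i + window]
        lang = classify_window([w for _, w in chunk])
        if not segments or segments[-1][1] != lang:
            segments.append((chunk[0][0], lang))
    return segments

words = [(0, "ja"), (1, "on"), (2, "että"),
         (3, "och"), (4, "är"), (5, "att")]
segs = language_segments(words)
```

A production version would classify audio windows directly rather than transcript words, but the segmentation-by-merging step stays the same.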
17. Three horizons of automatic metadata
Horizon 1: Improve core business. The possible, and in production at Yle. Example: Finnish speech recognition (in limited production since 2017). Question: how to improve the existing?
Horizon 2: New opportunities. The possible, but not yet in production at Yle. Example: the Metadata machine project (and other projects). Questions: what can/should we buy now? What are the business cases?
Horizon 3: Visionaries. The impossible; not near production yet. Example: the MeMAD project. Questions: what does the future look like? How to co-operate with ML researchers/visionaries?
18. MeMAD project has received funding from the European Union’s Horizon 2020 research and innovation
programme under grant agreement No 780069. This presentation has been produced by the MeMAD
project. The content in this presentation represents the views of the authors, and the European
Commission has no liability in respect of the content.
- Three year research project in the Horizon 2020 program.
- Started in 2018, ends at the end of 2020.
- Four universities, four companies: Aalto University, University of
Helsinki, University of Surrey, Eurecom, INA, Limecraft, Lingsoft, Yle
- MeMAD is about:
- methods for efficient re-use and re-purposing of multilingual audiovisual content for
video management and digital storytelling in broadcasting and media production.
- combines automatic efficiency with human accuracy.
- produces a rich description of moving images, speech and audio.
- www.memad.eu
The MeMAD project in brief
19. Example: Multimodal translation
Image caption translation
● WMT 2018 multimodal translation task
● EN text + image → DE/FR text (skipped Czech)
Speech-to-text translation
● IWSLT 2018, English audio (TED talks) to
German text
● Is the translation more accurate if part of the
speech recognition system itself?
20. Example: Automatic captioning of video image
● Currently: create a human readable
natural language description of what
is happening in each shot.
● Towards automatic recognition of the
narrational structure of a shot (and
the whole program).
● MeMAD project / Aalto University
● Based on deep neural network
features and LSTM language model
21. Potential use case: Automatic Audio Description
● “Steven” is a producer in charge of
delivering audio descriptions for
documentaries.
Thanks to automatically generated audio
descriptions that are reviewed and
corrected manually, Steven can deliver
audio descriptions to end users on a
smaller budget, enabling more content to
be audio described. (UC4)
22. Outcome so far (work in progress)
- Better understanding of:
- our needs
- what can be solved with current ML and AME technologies
- the limitations of the technology - and how to work around the problems
- (what should not be solved with ML and AME technologies)
- what is available in the market and what is not; different ways to buy services
- how to work with automatic metadata companies and the academics
- how to share data for ML research (legal, technical, process, …)
- how to integrate a Metadata machine (Graymeta Curio) to Yle systems
- how to combine human metadata work with automated metadata
- …
- We are starting to understand the impacts and new requirements on our processes,
human skills, ...
- Company wide commitment and involvement ⇒ AME relevant to many departments!
- Our most practical attempt at AME so far! We are moving from visions to reality.
23. Lessons learned
- The technology is tempting, but identifying the business cases is difficult and
requires lots of work. ⇒ Learning what the realistic expectations for AME are.
- Current off-the-shelf automatic metadata extraction services work and provide
value ⇒ Work in progress to estimate the business value of individual cases.
- Tuning the settings of “ready to use” ML services requires time and skills. ⇒ Impact
on future skill requirements for Yle personnel.
- Machine learning requires a lot of training data, preferably paired data (e.g. parallel
texts in two languages) ⇒ The “huge” Yle archive turns out to be relatively small and limited
from an ML point of view.
- Taking a company wide perspective on automatic metadata seems to work for the
time being ― instead of solving each business case as an individual project.
- Accept the imperfect! Start now!
24. The future
- Analyse the whole archive? Where to start? How to optimize the costs?
- Legal and privacy impacts. Can our material become too easy to find? (e.g. face
recognition)
- Continue following the markets and technologies
- Deciding on the next steps:
- What should we implement right away?
- What should we test more?
- What should we research more?
- Who should we do co-operation with in this area?