Recommender systems
evaluation: a 3D benchmark
  Alan Said (1), Domonkos Tikk (2), Yue Shi (3),
  Martha Larson (3), Klára Stumpf (2),
  Paolo Cremonesi (4)

1: TU Berlin
2: Gravity R&D
3: TU Delft
4: Politecnico di Milano/Moviri
Motivation
• Current recsys evaluation benchmarks are
  insufficient
  – mostly focused on IR measures (RMSE,
    MAP@X, precision/recall)
  – do not consider the needs of all stakeholders
    (users, content provider, recsys vendor)
  – technological and business requirements are
    mostly overlooked
• 3D Recommender System Benchmarking
  Model
Stakeholders
• users
• content or service provider
• recommender
The Proposed 3D model
Recent benchmarks (1)

• pros:
  – Large scale
  – very well organized
• cons:
  – qualitative assessment of recommendation:
    simplified to RMSE
  – rating prediction (not ranking)
  – no focus on direct business and technical
    parameters (scalability, robustness, reactivity)
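As a reminder of what this single number captures, here is a minimal sketch of RMSE for rating prediction. The ratings are invented for illustration and do not come from any benchmark; the function name is our own.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and observed ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical ratings on a 1-5 scale, invented for illustration.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [5.0, 3.0, 4.0, 2.0]
print(rmse(predicted, actual))
```

RMSE only measures how close the predicted ratings are to the observed ones; it says nothing about the order in which items are shown, nor about response time or revenue.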
Recent benchmarks (2)


• pros:
  – constraints on training and response time
  – real traffic (only planned)
  – major driver: revenue increase
• cons:
  – only business goals, but otherwise unclear
    optimization criteria
  – user needs are neglected
  – organization
Recent benchmarks (3)


• pros:
  – availability of additional metadata (compared to
    KDD Cup 2011)
  – not rating based (implicit feedback)
  – ranking based evaluation metric (MAP@500)
• cons:
  – offline evaluation
  – size does not matter anymore (lower interest)
  – no business requirements or technical constraints
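For readers unfamiliar with the ranking metric named above, the following is a minimal sketch of MAP@k. The slides do not give the benchmark's exact definition, so the normalization by min(number of relevant items, k) is an assumption (one common convention), and the function names and toy data are hypothetical. Each ranked list is assumed to contain no duplicate items.

```python
def average_precision_at_k(ranked_items, relevant_items, k):
    """AP@k for one user: precision at each hit, summed and normalized."""
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    # Assumed normalization: best achievable number of hits within k.
    return precision_sum / min(len(relevant), k)

def mean_average_precision_at_k(rankings, relevance, k):
    """MAP@k: mean of AP@k over all users that received a ranking."""
    users = list(rankings)
    if not users:
        return 0.0
    total = sum(
        average_precision_at_k(rankings[u], relevance.get(u, ()), k)
        for u in users
    )
    return total / len(users)

# Toy data, invented for illustration.
rankings = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y", "z"]}
relevance = {"u1": {"a", "c"}, "u2": {"z"}}
print(mean_average_precision_at_k(rankings, relevance, k=3))
```

Unlike RMSE, this metric rewards placing relevant items near the top of the list, which is why it suits implicit-feedback data.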
3D MODEL
User requirements
• functional (quality-related)
  – relevant, interesting, novel, diverse,
    serendipitous, context-aware, ethical, etc.
• non-functional (technology-related)
  – real-time
  – usability-related
Business requirements
• Business model
  – for-profit: revenue stream
  – NP-style: award driven (reputation,
    community building)
• KPI depends on the application area
  – Revenue increase
  – CTR
  – Raise awareness of content or service
Technical constraints
• data driven
  – availability of user feedback (e.g. satellite TV)
• system driven
  – hardware/software limitations
    (device-dependent)
• scalability
  – typical response time
• robustness
Example
• VoD recommendation scenario (TV)
  – user: easy content exploration,
    context-awareness (time, viewer identification)
  – business: increase VoD sales & awareness
    (user base)
  – technical: middleware, HW/SW of the
    provider, response time
Conclusion
• Recommendation tasks have many aspects
  that are typically overlooked
• Tasks define the important user, business,
  and technical quality measures
  – all of them must be fulfilled to a certain level
  – a trade-off is usually required
• Proposal: our 3D evaluation concept enables
  a more comprehensive evaluation
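The slides do not specify how the three axes would be scored or combined. Purely as an illustration of "fulfilled to a certain level" plus "trade-off", the sketch below assumes each axis yields a normalized score in [0, 1], a minimum acceptable level, and a weight. All names, thresholds, and numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AxisScore:
    name: str       # "user", "business", or "technical"
    value: float    # assumed normalized score in [0, 1], higher is better
    minimum: float  # assumed minimum acceptable level for this axis
    weight: float   # assumed importance of this axis in the trade-off

def evaluate_3d(scores):
    """Check per-axis minimum levels, then compute a weighted overall score."""
    total_weight = sum(s.weight for s in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    failed = [s.name for s in scores if s.value < s.minimum]
    overall = sum(s.weight * s.value for s in scores) / total_weight
    return {"acceptable": not failed, "failed_axes": failed, "overall": overall}

# Hypothetical numbers for a VoD-like scenario, invented for illustration.
result = evaluate_3d([
    AxisScore("user", value=0.8, minimum=0.6, weight=0.4),
    AxisScore("business", value=0.5, minimum=0.4, weight=0.4),
    AxisScore("technical", value=0.3, minimum=0.5, weight=0.2),
])
print(result)
```

In this toy run the weighted score looks moderate, yet the system is rejected because the technical axis falls below its minimum level. A weighted sum alone would hide that failure, which is why the thresholds are checked separately.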

Recommender Systems Evaluation: A 3D Benchmark - presented at RUE 2012 workshop at ACM Recsys 2012
