SlideShare une entreprise Scribd logo
1  sur  15
Télécharger pour lire hors ligne
Leveraging Hadoop in Wikimart
             Roman Zykov
          Head of analytics
          http://wikimart.ru




London, Big Data World Europe, 20th September 2012
Key problem


To be or not to be….

Hadoop

Introduction
Key tasks for Wikimart

What
• BI tasks
• Web analytics (in-house solution)
• Recommendations on site
• Data services for marketing

Who
• Core analytics team
• Analytics members in other departments
• IT site operations
Problem

Too time consuming or too
expensive?
• Data volume
• # of data services
Map Reduce



       Standalone


DATA


       Map Reduce
Our idea

New platform for “Big Data” tasks only

• Start research on Map Reduce software
• First patient - recommendation engine

Difficulties
- no planned budget ->   Hadoop is free
- no experts        ->   learn it
- no hardware       ->   virtual cluster
Requirements for Hadoop


•   Easy scalable
•   Easy deployment
•   Easy integration
•   Less low level Java coding
•   SQL-like querries
Data flow




DWH
         Data feeds
Accomplishments

Recommendations
• Collaborative filtering (item-to-item on browsing history, PIG)
• Similar products (items attributes, PIG)
• Most popular items (browsing history + orders, HiveQL)
• Internal and external search recommendations (HiveQL)

Some statistics after 1 year
• >10% of revenue
• 3 months to launch
• Tens of gigabytes are processed 2 hours daily
• 1 crash only (cluster lost power)

Decision: Invest to Hardware cluster
End user

Internal high-level languages
• HiveQL
• Pig

Reporting
• Pre-aggregated data for OLAP
• RDBMS - front end
• OLAP and Reporting software should
  support HiveQL
Data Integration

• SQOOP
  • Parallel data exchange with RDBMS
    (MS SQL, MySQL, Oracle, Teradata… )
  • Incremental updates
  • HDFS, Hive, HBASE

• Talend Open Studio
Hadoop vs RDBMS

• Never replace RDBMS:
   • Latency
   • Weak capabilities of HiveQL vs SQL
• Only some tasks with offline processing:
   • Machine learning
   • Queries to Big tables
   • ….
• Real time: NOSQL
Hadoop myth


      Terabytes?
      Petabytes?

      Big tasks!
Conclusion

• Hadoop is not Rocket Science
• Intermediate data can be Big Data

Starter kit
• Hadoop management system
• Virtual hardware (cloud, virtual servers, etc)
• Offline data tasks
• Pig or HiveQL
• Sqoop: import data from existing data sources
Thank you!!!

     rzykov@gmail.com
linkedin.com/in/romanzykov

Contenu connexe

Plus de Roman Zykov

Big data europe 2012 brochure (3)
Big data europe 2012 brochure (3)Big data europe 2012 brochure (3)
Big data europe 2012 brochure (3)Roman Zykov
 
Wikimart recommendations
Wikimart recommendationsWikimart recommendations
Wikimart recommendationsRoman Zykov
 
Hadoop implementation in Wikimart
Hadoop implementation in WikimartHadoop implementation in Wikimart
Hadoop implementation in WikimartRoman Zykov
 
Google Analytics vs Omniture SiteCatalyst vs In-ouse Webanalytics at iMetrics
Google Analytics vs Omniture SiteCatalyst vs In-ouse Webanalytics at iMetricsGoogle Analytics vs Omniture SiteCatalyst vs In-ouse Webanalytics at iMetrics
Google Analytics vs Omniture SiteCatalyst vs In-ouse Webanalytics at iMetricsRoman Zykov
 
MIPhT presentation about BI
MIPhT presentation about BIMIPhT presentation about BI
MIPhT presentation about BIRoman Zykov
 
Owox rzykov kp_iexamples
Owox rzykov kp_iexamplesOwox rzykov kp_iexamples
Owox rzykov kp_iexamplesRoman Zykov
 
Roman zykovcertificates
Roman zykovcertificatesRoman zykovcertificates
Roman zykovcertificatesRoman Zykov
 
Wpaper 005 functionalism_new_approach
Wpaper 005 functionalism_new_approachWpaper 005 functionalism_new_approach
Wpaper 005 functionalism_new_approachRoman Zykov
 
Searchpatterns 100519055231-phpapp02
Searchpatterns 100519055231-phpapp02Searchpatterns 100519055231-phpapp02
Searchpatterns 100519055231-phpapp02Roman Zykov
 
Metrics drivendesign
Metrics drivendesignMetrics drivendesign
Metrics drivendesignRoman Zykov
 
Ozon в высшей школе экономики часть 4
Ozon в высшей школе экономики часть 4Ozon в высшей школе экономики часть 4
Ozon в высшей школе экономики часть 4Roman Zykov
 
Ozon в высшей школе экономики часть 3
Ozon в высшей школе экономики часть 3Ozon в высшей школе экономики часть 3
Ozon в высшей школе экономики часть 3Roman Zykov
 
Ozon в высшей школе экономики часть 2
Ozon в высшей школе экономики часть 2Ozon в высшей школе экономики часть 2
Ozon в высшей школе экономики часть 2Roman Zykov
 
Ozon в высшей школе экономики часть 1
Ozon в высшей школе экономики часть 1Ozon в высшей школе экономики часть 1
Ozon в высшей школе экономики часть 1Roman Zykov
 
Roman Zykov Certificates
Roman Zykov CertificatesRoman Zykov Certificates
Roman Zykov CertificatesRoman Zykov
 
Связной клуб
Связной клубСвязной клуб
Связной клубRoman Zykov
 
Complete Ga Power User Web
Complete Ga Power User WebComplete Ga Power User Web
Complete Ga Power User WebRoman Zykov
 
RIW2009 Анализ продвижения
RIW2009 Анализ продвиженияRIW2009 Анализ продвижения
RIW2009 Анализ продвиженияRoman Zykov
 

Plus de Roman Zykov (20)

Big data europe 2012 brochure (3)
Big data europe 2012 brochure (3)Big data europe 2012 brochure (3)
Big data europe 2012 brochure (3)
 
Wikimart recommendations
Wikimart recommendationsWikimart recommendations
Wikimart recommendations
 
Hadoop implementation in Wikimart
Hadoop implementation in WikimartHadoop implementation in Wikimart
Hadoop implementation in Wikimart
 
Google Analytics vs Omniture SiteCatalyst vs In-ouse Webanalytics at iMetrics
Google Analytics vs Omniture SiteCatalyst vs In-ouse Webanalytics at iMetricsGoogle Analytics vs Omniture SiteCatalyst vs In-ouse Webanalytics at iMetrics
Google Analytics vs Omniture SiteCatalyst vs In-ouse Webanalytics at iMetrics
 
MIPhT presentation about BI
MIPhT presentation about BIMIPhT presentation about BI
MIPhT presentation about BI
 
Owox rzykov kp_iexamples
Owox rzykov kp_iexamplesOwox rzykov kp_iexamples
Owox rzykov kp_iexamples
 
Owox rzykov
Owox rzykovOwox rzykov
Owox rzykov
 
Roman zykovcertificates
Roman zykovcertificatesRoman zykovcertificates
Roman zykovcertificates
 
Wpaper 005 functionalism_new_approach
Wpaper 005 functionalism_new_approachWpaper 005 functionalism_new_approach
Wpaper 005 functionalism_new_approach
 
Searchpatterns 100519055231-phpapp02
Searchpatterns 100519055231-phpapp02Searchpatterns 100519055231-phpapp02
Searchpatterns 100519055231-phpapp02
 
Metrics drivendesign
Metrics drivendesignMetrics drivendesign
Metrics drivendesign
 
E-commerce KPIs
E-commerce KPIsE-commerce KPIs
E-commerce KPIs
 
Ozon в высшей школе экономики часть 4
Ozon в высшей школе экономики часть 4Ozon в высшей школе экономики часть 4
Ozon в высшей школе экономики часть 4
 
Ozon в высшей школе экономики часть 3
Ozon в высшей школе экономики часть 3Ozon в высшей школе экономики часть 3
Ozon в высшей школе экономики часть 3
 
Ozon в высшей школе экономики часть 2
Ozon в высшей школе экономики часть 2Ozon в высшей школе экономики часть 2
Ozon в высшей школе экономики часть 2
 
Ozon в высшей школе экономики часть 1
Ozon в высшей школе экономики часть 1Ozon в высшей школе экономики часть 1
Ozon в высшей школе экономики часть 1
 
Roman Zykov Certificates
Roman Zykov CertificatesRoman Zykov Certificates
Roman Zykov Certificates
 
Связной клуб
Связной клубСвязной клуб
Связной клуб
 
Complete Ga Power User Web
Complete Ga Power User WebComplete Ga Power User Web
Complete Ga Power User Web
 
RIW2009 Анализ продвижения
RIW2009 Анализ продвиженияRIW2009 Анализ продвижения
RIW2009 Анализ продвижения
 

Dernier

Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 

Dernier (20)

Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

Leveraging Hadoop to mine customer insights in a developing market

  • 1. Leveraging Hadoop in Wikimart Roman Zykov Head of analytics http://wikimart.ru London, Big Data World Europe, 20th September 2012
  • 2. Key problem To be or not to be…. Hadoop Introduction
  • 3. Key tasks for Wikimart What • BI tasks • Web analytics (in-house solution) • Recommendations on site • Data services for marketing Who • Core analytics team • Analytics members in other departments • IT site operations
  • 4. Problem Too time consuming or too expensive? • Data volume • # of data services
  • 5. Map Reduce Standalone DATA Map Reduce
  • 6. Our idea New platform for “Big Data” tasks only • Start research on Map Reduce software • First patient - recommendation engine Difficulties - no planned budget -> Hadoop is free - no experts -> learn it - no hardware -> virtual cluster
  • 7. Requirements for Hadoop • Easy scalable • Easy deployment • Easy integration • Less low level Java coding • SQL-like querries
  • 8. Data flow DWH Data feeds
  • 9. Accomplishments Recommendations • Collaborative filtering (item-to-item on browsing history, PIG) • Similar products (items attributes, PIG) • Most popular items (browsing history + orders, HiveQL) • Internal and external search recommendations (HiveQL) Some statistics after 1 year • >10% of revenue • 3 months to launch • Tens of gigabytes are processed 2 hours daily • 1 crash only (cluster lost power) Decision: Invest to Hardware cluster
  • 10. End user Internal high-level languages • HiveQL • Pig Reporting • Pre-aggregated data for OLAP • RDBMS - front end • OLAP and Reporting software should support HiveQL
  • 11. Data Integration • SQOOP • Parallel data exchange with RDBMS (MS SQL, MySQL, Oracle, Teradata… ) • Incremental updates • HDFS, Hive, HBASE • Talend Open Studio
  • 12. Hadoop vs RDBMS • Never replace RDBMS: • Latency • Weak capabilities of HiveQL vs SQL • Only some tasks with offline processing: • Machine learning • Queries to Big tables • …. • Real time: NOSQL
  • 13. Hadoop myth Terabytes? Petabytes? Big tasks!
  • 14. Conclusion • Hadoop is not Rocket Science • Intermediate data can be Big Data Starter kit • Hadoop management system • Virtual hardware (cloud, virtual servers, etc) • Offline data tasks • Pig or HiveQL • Sqoop: import data from existing data sources
  • 15. Thank you!!! rzykov@gmail.com linkedin.com/in/romanzykov