Current Trends and Challenges in
Big Data Benchmarking
Kai Sachs - SPEC Research Group
May 2014
© 2014 Kai Sachs. All rights reserved. 2
Benchmark Use Cases & Stakeholders
• Hardware & software vendors: publishing results & marketing
  Example: 27,500 results submitted for the SPEC CPU2006 benchmarks alone
• Developer: analysis & product quality
  Example: regression performance testing
• Consumer: comparing different products
  Example: finding the best video card for gaming
• IT architect: cloud & hardware sizing
  Example: choosing a configuration
• Researcher:
  Example: evaluating one's own implementation using a standardized workload
Standard Performance Evaluation Corporation (SPEC)
Founded 1988; > 80 member organizations & associates
Development of industry standard benchmarks:
• OSG (Open Systems Group): CPU, Java, Virtualization, Power, …
• HPG (High Performance Group): OpenMP, MPI, …
• GWPG (Graphics and Workstation Performance Group)
Research platform:
• RG (Research Group): Cloud, Intrusion Detection Systems, Big Data
SPEC Research Group: Mission Statement
• Provide a platform for collaborative research efforts in the areas of
  • computer benchmarking and
  • quantitative system analysis
• Portal for all kinds of benchmarking-related resources
• Provide research benchmarks, tools, metrics and scenarios
Performance
Performance in a broad sense:
• Classical performance metrics
  Example: response time, throughput, scalability, efficiency, and elasticity
• Non-functional system properties under the term dependability
  Example: availability, reliability, and security
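As a rough illustration of two of the classical metrics listed above, the following Python sketch derives throughput and response-time statistics from a list of measured request latencies. The function name `summarize`, the sample values, and the nearest-rank percentile rule are assumptions made for this example; they are not taken from the talk or from any SPEC tool.

```python
import math


def summarize(response_times_s, window_s):
    """Summarize one measurement window.

    response_times_s: response times (seconds) of the requests that
                      completed during the window (hypothetical input).
    window_s:         length of the measurement window in seconds.
    """
    if not response_times_s or window_s <= 0:
        raise ValueError("need at least one sample and a positive window")

    ordered = sorted(response_times_s)

    def percentile(p):
        # Nearest-rank percentile: smallest sample such that at least
        # p percent of all samples are less than or equal to it.
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    return {
        "throughput_per_s": len(ordered) / window_s,  # completed requests per second
        "mean_response_s": sum(ordered) / len(ordered),
        "p95_response_s": percentile(95),
    }


if __name__ == "__main__":
    print(summarize([0.1, 0.2, 0.3, 0.4], window_s=2.0))
```

Reporting a high percentile next to the mean is a common choice because a few slow requests can hide behind a good average.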
Towards a Big Data Standard Benchmark
Big Data Benchmarking Community (BDBC)
• ‘Incubator’ for Big Data standard benchmark(s) for industry
• > 200 members on the mailing list
Workshop on Big Data Benchmarking (WBDB) series
• 2012 in San Jose, CA and Pune, India; 2013 in San Jose, CA and Xi'an, China; 2014 in Potsdam, Germany
• Post-proceedings published in LNCS
BDBC is joining the SPEC Research Group
• RG working group focusing on Big Data in preparation
• Working group chairs: Chaitan Baru, Tilmann Rabl
WBDB 2012 Report: Setting the Direction for Big Data Benchmark Standards
C. Baru, M. Bhandarkar, R. Nambiar, M. Poess, T. Rabl. TPCTC 2012, collocated with VLDB 2012
Other Benchmark Organizations
Transaction Processing Performance Council (TPC)
• Focus: transaction processing and database benchmarks
• Most famous benchmarks: TPC-C (OLTP benchmark), TPC-E (OLTP benchmark), TPC-H (decision support benchmark)
Embedded Microprocessor Benchmark Consortium (EEMBC)
• Focus: hardware and software used in embedded systems
Business Applications Performance Corporation (BAPCo)
• Focus: performance benchmarks for personal computers based on popular computer applications and industry standard operating systems
General Chairs: Chaitan Baru (UC San Diego), Tilmann Rabl (U Toronto), Kai Sachs (SAP)
Local Arrangements: Matthias Uflacker (Hasso Plattner Institute)
Publicity Chair: Henning Schmitz (SAP Innovations Center)
Publication Chair: Meikel Poess (Oracle)
Program Committee
Milind Bhandarkar (Pivotal)
Anja Bog (SAP Labs)
Dhruba Borthakur (Facebook)
Joos-Hendrik Böse (Amazon)
Tobias Bürger (Payback)
Tyson Condie (UCLA)
Kshitij Doshi (Intel)
Pedro Furtado (U Coimbra)
Bhaskar Gowda (Intel)
Goetz Graefe (HP)
Martin Grund (Exascale)
Alfons Kemper (TU München)
Donald Kossmann (ETH Zürich)
Tim Kraska (Brown University)
Wolfgang Lehner (TU Dresden)
Christof Leng (UC Berkeley)
Stefan Manegold (CWI)
Raghu Nambiar (Cisco)
Manoj K. Nambiar (TCS)
Glenn Paulley (Conestoga Col.)
Scott Pearson (CLDS Industry Fellow)
Andreas Polze (HPI)
Alexander Reinefeld (HU Berlin)
Berni Schiefer (IBM Labs Toronto)
Saptak Sen (Hortonworks)
Florian Stegmaier (University of Passau)
Till Westmann (Oracle Labs)
Jianfeng Zhan (Chinese Academy of Sciences)
Keynote Speakers: Umesh Dayal, Alexandru Iosup
Submission: May 30, 2014 (6pm PDT); short versions of papers (4-8 LNCS pages)
Benchmark Engineering
Past & Present
Past:
• It was common to write a for-loop and call it a benchmark.
Present:
• Benchmarks are complex pieces of software and specifications.
• Benchmark development has turned into a complex team effort.
The Whetstone Benchmark (1974 – 284 lines)
Curnow, H.J., Wichmann, B.A. "A Synthetic Benchmark", Computer Journal, Volume 19, Issue 1, Feb. 1976, pp. 43-49
SPEC CPU Benchmark Suite – Lines of Code
Henning, J. "SPEC CPU suite growth: an historical perspective", SIGARCH Comput. Archit. News 35, Issue 1, March 2007
Example Components of a Standard Benchmark
• Workload
• Reporter
• Run rules
• Implementation & framework (optional)
• Documentation
• Metrics
Workload specification is the most important part
Performance evaluation of message-oriented middleware using the SPECjms2007 benchmark
Kai Sachs, Samuel Kounev, Jean Bacon, Alejandro Buchmann. Performance Evaluation, 2009
Performance Modeling and Benchmarking of Event-Based Systems
Kai Sachs, PhD Thesis, TU Darmstadt, 2010
Workload Requirements
• Representativeness
• Comprehensiveness
• Focus
• Scalability
• Configurability
Resilience Benchmarking
Marco Vieira, Henrique Madeira, Kai Sachs, Samuel Kounev, in Resilience Assessment and Evaluation, Springer, 2012
Workload Description ‘Level’
From TPC-C to Big Data Benchmarks: A Functional Workload Model
Yanpei Chen, Francois Raab, and Randy Katz in Workshop on Big Data Benchmarks, 2012.
Current Trends &
Challenges in Big Data
Benchmarking
Current Trends & Challenges in Benchmarking
Technology:
• Virtualization
• Cloud
• (Big) Data: MapReduce, Mixed Workload (OLAP / OLTP), Data / Event Streaming, …
Benchmarking methodology:
• Large Scale Systems
Tools:
• Data / workload generator
• Power consumption
• Simulation frameworks
• Generic benchmarking frameworks
Benchmark Methodology: System Under Test
Past & Present
• Single node
• Multiple nodes
Isolated systems
Benchmark Methodology
System Under Test
http://instagram.com/p/W2FCksR9-e/
St. Peter's Square
2005 vs. 2013
Benchmark Methodology: System Under Test
Challenge: Large Scale Systems
• Isolation is not guaranteed (or impossible)
• High number of nodes
• Very large data volumes
• Repeatability is an issue
How can we benchmark such systems?
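The repeatability concern can at least be quantified. The Python sketch below repeats a workload several times and reports the run-to-run variation as a coefficient of variation (standard deviation divided by mean). It does not answer the open question of how to benchmark large scale systems. The helper names `repeat_benchmark` and `variability` and the toy workload are assumptions for illustration only.

```python
import statistics
import time


def repeat_benchmark(workload, runs=5):
    """Run a callable `workload` several times and return wall-clock durations."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def variability(durations):
    """Return mean, sample standard deviation and coefficient of variation."""
    if len(durations) < 2:
        raise ValueError("need at least two runs to estimate variation")
    mean = statistics.mean(durations)
    stdev = statistics.stdev(durations)
    return {"mean_s": mean, "stdev_s": stdev, "cv": stdev / mean}


if __name__ == "__main__":
    # Toy workload standing in for a real benchmark run (hypothetical).
    runs = repeat_benchmark(lambda: sum(range(100_000)), runs=5)
    print(variability(runs))
```

A large coefficient of variation across nominally identical runs suggests that a single result should not be reported on its own, for example when other tenants share the system under test.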
“Big Data should be Interesting Data!
There are various definitions of Big Data; most center around a number of
V’s like volume, velocity, variety, veracity – in short: interesting data
(interesting in at least one aspect). However, when you look into research
papers on Big Data, in SIGMOD, VLDB, or ICDE, the data that you see
here in experimental studies is utterly boring. Performance and scalability
experiments are often based on the TPC-H benchmark: completely
synthetic data with a synthetic workload that has been beaten to death for
the last twenty years. Data quality, data cleaning, and data integration
studies are often based on bibliographic data from DBLP, usually old
versions with less than a million publications, prolific authors, and curated
records. I doubt that this is a real challenge for tasks like entity linkage or
data cleaning. So where’s the – interesting – data in Big Data research?”
Where’s the Data in the Big Data Wave? – SIGMOD Blog March 2013
Gerhard Weikum
Big Data Benchmark: Issues and Challenges
‘Big Data World’
Communities
Benchmark Design
• Single benchmark vs. benchmark collection
• Component vs. end-to-end scenario
• Specification vs. implementation
• Metric
System under Test
Workload
Abstractions of the Big Data World from WBDB
Enterprise warehouse + agglomeration of other data
• Structured enterprise data warehouse
• Extended to incorporate data from other, not fully structured data sources (e.g. weblogs, text, streams)
Pool of data with sequence of processing
• Enterprise data processing as a pipeline from data ingestion to transformation, extraction, subsetting, machine learning, predictive analytics
• Data from multiple structured and non-structured sources
Introduction to the 4th Workshop on Big Data Benchmarking
Chaitan Baru
BigBench: A Big Data Analytics Benchmark
Data Model
Scenario:
• Retail domain
Data:
• Structured: based on TPC-DS
• Semi-structured: click streams
• Unstructured: product reviews
• PDGF used to generate data
BigBench: Towards an Industry Standard Benchmark for Big Data Analytics
A. Ghazal, Minqing Hu, T. Rabl, F. Raab, M. Poess, A. Crolotte, H. Jacobsen. SIGMOD 2013
BigBench: A Big Data Analytics Benchmark
Data Generation – Unstructured Data
Extended version of the Parallel Data Generation Framework (PDGF)
Separate review generator
BigBench: Towards an Industry Standard Benchmark for Big Data Analytics
A. Ghazal, Minqing Hu, T. Rabl, F. Raab, M. Poess, A. Crolotte, H. Jacobsen. SIGMOD 2013
Deep Analytics Pipeline
An end-to-end data processing pipeline:
• Data from multiple sources
• Loose, flexible schema
• Data requires structuring
Application characteristics
• Processing pipelines
• Running models with data
Introduction to the 4th Workshop on Big Data Benchmarking
Chaitan Baru
Example of an Application:
Determine User Interest Profile by Mining Activities
Scalable distributed inference of dynamic user interests for behavioral targeting
A. Ahmed, Y. Low, M. Aly, V. Josifovski, A.J. Smola, SIGKDD 2011
Composite Benchmark for Transactions and Reporting (CBTR)
OLTP & OLAP Benchmark based on Current and Real Enterprise
Order-to-cash scenario:
18 tables with 5 - 327 columns
2316 columns in total
Variable workload mix
Notation: S = share; T = transactional, A = analytical; r = read-only, m = mixed
• OLTP sub-workload: $S_T \in \{x \in \mathbb{R} \mid 0 \le x \le 1\}$
• OLAP sub-workload: $S_A = 1 - S_T$
• Read-only OLTP queries: $S_{rT} \in \{x \in \mathbb{R} \mid 0 \le x \le 1\}$
• Mixed OLTP queries: $S_{mT} = 1 - S_{rT}$
Benchmarking Composite Transaction and Analytical Processing Systems
Anja Bog, PhD Thesis, University of Potsdam, 2012
Interactive Performance Monitoring of a Composite OLTP & OLAP Workload
Anja Bog, Kai Sachs, Hasso Plattner. SIGMOD 2012 (Demo)
Normalization in a Mixed OLTP and OLAP Workload Scenario
Anja Bog, Kai Sachs, Alexander Zeier, Hasso Plattner. TPCTC 2011, collocated with VLDB 2011
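To make the variable workload mix concrete, the Python sketch below turns the two free parameters (the OLTP share and the read-only share) into shares of the overall workload. It assumes that the read-only and mixed shares subdivide the OLTP sub-workload, so their overall shares are obtained by multiplying with the OLTP share. That reading, the function name `workload_mix`, and the example values are assumptions, not part of the CBTR specification.

```python
def workload_mix(s_t, s_rt):
    """Derive overall workload shares from the two free parameters.

    s_t:  share of the OLTP sub-workload, in [0, 1]
    s_rt: share of read-only queries, in [0, 1]
          (assumed here to be a share *within* the OLTP sub-workload)
    """
    for name, value in (("s_t", s_t), ("s_rt", s_rt)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")

    s_a = 1 - s_t    # OLAP share:  S_A  = 1 - S_T
    s_mt = 1 - s_rt  # mixed share: S_mT = 1 - S_rT

    return {
        "olap": s_a,
        "oltp_read_only": s_t * s_rt,
        "oltp_mixed": s_t * s_mt,
    }


if __name__ == "__main__":
    # Example: 80 % OLTP, of which a quarter is read-only (hypothetical values).
    print(workload_mix(0.8, 0.25))
```

Under this reading the three resulting shares always sum to 1, so a workload driver could use them directly as sampling probabilities.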
Big Data & Cloud Benchmark
Related Work – Virtualization Benchmarking
Other activities
TPC-BD
• TPC announced a Big Data working group (November 2013)
Graph 500
• Driven by HPC community
• Cooperating with SPEC CPU group
• Green Graph 500 list
SPEC OSG
• Big Data as part of a cloud benchmark
CloudSuite 2.0, CH-benCHmark, BigDataBench, HiBench, LinkBench, …
SPEC RG – Big Data Working Group
Potential Topics
Target group
• Researchers & developers
Data categories
• Structured, unstructured and semi-structured; events & streams; graphs; geospatial, retail, astronomy & genomic; …
Benchmark scenario & metrics
• Realistic use cases & workload mixes
• Big Data classification schema
(Research) Standard Benchmarks
• BigBench, Deep Analytics Pipeline, …
Data generation
• Real-world traces & synthetic data, tooling
Conclusions
Conclusions
Benchmarking is more than throughput
Meaningful workloads are most important
More research is needed
• Benchmarking of large scale systems
• “Big Data World”: workloads & scenarios
• Benchmarks for Big Data
“We Don’t Know Enough to Make a Big Data Benchmark Suite”
Yanpei Chen, WBDB 2012
Thank you
Contact information:
Kai Sachs
Email: Kai.Sachs@sap.com
Disclaimer:
SPEC, the SPEC logo, the SPEC Research Group logo and the tool and names SERT, SPECjms2007, SPECpower_ssj2008, SPECweb2009 and
SPECvirt_sc2010 are registered trademarks of the Standard Performance Evaluation Corporation (SPEC). Reprint with permission.
© 2014 Kai Sachs. All rights reserved. 41
General Chairs: Chaitan Baru (UC San Diego), Tilmann Rabl (U Toronto), Kai Sachs (SAP)
Local Arrangements: Matthias Uflacker (Hasso Plattner Institute)
Publicity Chair: Henning Schmitz (SAP Innovations Center)
Publication Chair: Meikel Poess (Oracle)
Program Committee
Milind Bhandarkar (Pivotal)
Anja Bog (SAP Labs)
Dhruba Borthakur (Facebook)
Joos-Hendrik Böse (Amazon)
Tobias Bürger (Payback)
Tyson Condi (UCLA)
Kshitij Doshi (Intel)
Pedro Furtado (U Coimbra)
Bhaskar Gowda (Intel)
Goetz Graefe (HP)
Martin Grund (Exascale)
Alfons Kemper (TU München)
Donald Kossmann (ETH Zürich)
Tim Kraska (Brown University)
Wolfgang Lehner (TU Dresden)
Christof Leng (UC Berkeley)
Stefan Manegold (CWI)
Raghu Nambiar (Cisco)
Manoj K. Nambiar (TCS)
Glenn Paulley (Conestoga Col.)
Keynote Speakers: Umesh Dayal, Alexandru Iosup
Scott Pearson (CLDS Industry Fellow)
Andreas Polze (HPI)
Alexander Reinefeld (HU Berlin)
Berni Schiefer (IBM Labs Toronto)
Saptak Sen (Hortonworks)
Florian Stegmaier (University of Passau)
Till Westmann (Oracle Labs)
Jianfeng Zhan (Chinese Academy of Science)
Platinum Sponsor: Gold Sponsors:
Submission: May 30, 2014 (6pm PDT) Short versions of papers (4-8 LNCS pages)

Contenu connexe

Tendances

H2O PySparkling Water
H2O PySparkling WaterH2O PySparkling Water
H2O PySparkling WaterSri Ambati
 
Airflow at lyft for Airflow summit 2020 conference
Airflow at lyft for Airflow summit 2020 conferenceAirflow at lyft for Airflow summit 2020 conference
Airflow at lyft for Airflow summit 2020 conferenceTao Feng
 
Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...
Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...
Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...Cloudera, Inc.
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceDataWorks Summit/Hadoop Summit
 
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptxDowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptxLex Avstreikh
 
Jethro data meetup index base sql on hadoop - oct-2014
Jethro data meetup    index base sql on hadoop - oct-2014Jethro data meetup    index base sql on hadoop - oct-2014
Jethro data meetup index base sql on hadoop - oct-2014Eli Singer
 
Scaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache SparkScaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache SparkDatabricks
 
Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)
Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)
Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)Spark Summit
 
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemWhy Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemCloudera, Inc.
 
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...VMware Tanzu
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeSpark Summit
 
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...Big Data Value Association
 
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabScalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabSri Ambati
 
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Agile Testing Alliance
 
袁晓如:大数据时代可视化和可视分析的机遇与挑战
袁晓如:大数据时代可视化和可视分析的机遇与挑战袁晓如:大数据时代可视化和可视分析的机遇与挑战
袁晓如:大数据时代可视化和可视分析的机遇与挑战hdhappy001
 
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...Databricks
 
Distributed Heterogeneous Mixture Learning On Spark
Distributed Heterogeneous Mixture Learning On SparkDistributed Heterogeneous Mixture Learning On Spark
Distributed Heterogeneous Mixture Learning On SparkSpark Summit
 

Tendances (20)

Automated Analytics at Scale
Automated Analytics at ScaleAutomated Analytics at Scale
Automated Analytics at Scale
 
Scaling hadoopapplications
Scaling hadoopapplicationsScaling hadoopapplications
Scaling hadoopapplications
 
H2O PySparkling Water
H2O PySparkling WaterH2O PySparkling Water
H2O PySparkling Water
 
Airflow at lyft for Airflow summit 2020 conference
Airflow at lyft for Airflow summit 2020 conferenceAirflow at lyft for Airflow summit 2020 conference
Airflow at lyft for Airflow summit 2020 conference
 
Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...
Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...
Cloudera Federal Forum 2014: The Evolution of Machine Learning from Science t...
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open Source
 
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptxDowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
Dowling buso-feature-store-logical-clocks-spark-ai-summit-2020.pptx
 
Jethro data meetup index base sql on hadoop - oct-2014
Jethro data meetup    index base sql on hadoop - oct-2014Jethro data meetup    index base sql on hadoop - oct-2014
Jethro data meetup index base sql on hadoop - oct-2014
 
Scaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache SparkScaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache Spark
 
Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)
Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)
Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)
 
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemWhy Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
 
Neo4j vs giraph
Neo4j vs giraphNeo4j vs giraph
Neo4j vs giraph
 
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
Machine Learning, Graph, Text and Geospatial on Postgres and Greenplum - Gree...
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
 
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ...
 
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabScalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
 
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
 
袁晓如:大数据时代可视化和可视分析的机遇与挑战
袁晓如:大数据时代可视化和可视分析的机遇与挑战袁晓如:大数据时代可视化和可视分析的机遇与挑战
袁晓如:大数据时代可视化和可视分析的机遇与挑战
 
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
 
Distributed Heterogeneous Mixture Learning On Spark
Distributed Heterogeneous Mixture Learning On SparkDistributed Heterogeneous Mixture Learning On Spark
Distributed Heterogeneous Mixture Learning On Spark
 

En vedette

Influencer Olmanın 11 Altın Kuralı
Influencer Olmanın 11 Altın KuralıInfluencer Olmanın 11 Altın Kuralı
Influencer Olmanın 11 Altın KuralıRenerald
 
"Trends and Strategies for Benchmarking and Best Practice Sharing" by Dr. Hol...
"Trends and Strategies for Benchmarking and Best Practice Sharing" by Dr. Hol..."Trends and Strategies for Benchmarking and Best Practice Sharing" by Dr. Hol...
"Trends and Strategies for Benchmarking and Best Practice Sharing" by Dr. Hol...Dubai Quality Group
 
Historic background of ISO standards
Historic background of ISO standardsHistoric background of ISO standards
Historic background of ISO standardsAneel Arshad Ali
 
International organization for standardization
International organization for standardization International organization for standardization
International organization for standardization Chirag Tewari
 
ppt of basic concept of iso 9000 & 14000
ppt of basic concept of iso 9000 & 14000ppt of basic concept of iso 9000 & 14000
ppt of basic concept of iso 9000 & 14000Ayush Upadhyay
 
Başarılı Influencer Marketing Örnekleri
Başarılı Influencer Marketing ÖrnekleriBaşarılı Influencer Marketing Örnekleri
Başarılı Influencer Marketing ÖrnekleriRenerald
 
SuisseID Forum 2014 | Rechtliche Grundlagen der elektronischen Signatur
SuisseID Forum 2014 | Rechtliche Grundlagen der elektronischen SignaturSuisseID Forum 2014 | Rechtliche Grundlagen der elektronischen Signatur
SuisseID Forum 2014 | Rechtliche Grundlagen der elektronischen SignaturTrägerverein SuisseID
 
Calidad líquida. La calidad de los servicios
Calidad líquida. La calidad de los serviciosCalidad líquida. La calidad de los servicios
Calidad líquida. La calidad de los serviciosMindProject
 
Hoja informativa cgt 4 7-2014
Hoja informativa cgt 4 7-2014Hoja informativa cgt 4 7-2014
Hoja informativa cgt 4 7-2014Cgtseat Barcelona
 
Traxess Traveller Eye
Traxess Traveller EyeTraxess Traveller Eye
Traxess Traveller Eyeabustar
 
"e-Patients are Changing Healthcare": Public keynote at NI2016, Geneva
"e-Patients are Changing Healthcare": Public keynote at NI2016, Geneva"e-Patients are Changing Healthcare": Public keynote at NI2016, Geneva
"e-Patients are Changing Healthcare": Public keynote at NI2016, Genevae-Patient Dave deBronkart
 
Keynote capitals india morning note 20 november-12
Keynote capitals india morning note 20 november-12Keynote capitals india morning note 20 november-12
Keynote capitals india morning note 20 november-12Keynote Capitals Ltd.
 
Unifying Search with Performance Media By Jon Myers #SEJSummit
Unifying Search with Performance Media By Jon Myers #SEJSummitUnifying Search with Performance Media By Jon Myers #SEJSummit
Unifying Search with Performance Media By Jon Myers #SEJSummitSearch Engine Journal
 
RenForm Profile Email Low Res[1]
RenForm Profile Email Low Res[1]RenForm Profile Email Low Res[1]
RenForm Profile Email Low Res[1]Erika Hearnshaw
 
Programa final cvei
Programa final cveiPrograma final cvei
Programa final cveiJean Sanchez
 

En vedette (20)

Q2
Q2Q2
Q2
 
Influencer Olmanın 11 Altın Kuralı
Influencer Olmanın 11 Altın KuralıInfluencer Olmanın 11 Altın Kuralı
Influencer Olmanın 11 Altın Kuralı
 
"Trends and Strategies for Benchmarking and Best Practice Sharing" by Dr. Hol...
"Trends and Strategies for Benchmarking and Best Practice Sharing" by Dr. Hol..."Trends and Strategies for Benchmarking and Best Practice Sharing" by Dr. Hol...
"Trends and Strategies for Benchmarking and Best Practice Sharing" by Dr. Hol...
 
Benchmarking
BenchmarkingBenchmarking
Benchmarking
 
What is ISO?
What is ISO?What is ISO?
What is ISO?
 
Historic background of ISO standards
Historic background of ISO standardsHistoric background of ISO standards
Historic background of ISO standards
 
International organization for standardization
International organization for standardization International organization for standardization
International organization for standardization
 
ppt of basic concept of iso 9000 & 14000
ppt of basic concept of iso 9000 & 14000ppt of basic concept of iso 9000 & 14000
ppt of basic concept of iso 9000 & 14000
 
Başarılı Influencer Marketing Örnekleri
Başarılı Influencer Marketing ÖrnekleriBaşarılı Influencer Marketing Örnekleri
Başarılı Influencer Marketing Örnekleri
 
SuisseID Forum 2014 | Rechtliche Grundlagen der elektronischen Signatur
SuisseID Forum 2014 | Rechtliche Grundlagen der elektronischen SignaturSuisseID Forum 2014 | Rechtliche Grundlagen der elektronischen Signatur
SuisseID Forum 2014 | Rechtliche Grundlagen der elektronischen Signatur
 
Calidad líquida. La calidad de los servicios
Calidad líquida. La calidad de los serviciosCalidad líquida. La calidad de los servicios
Calidad líquida. La calidad de los servicios
 
Hoja informativa cgt 4 7-2014
Hoja informativa cgt 4 7-2014Hoja informativa cgt 4 7-2014
Hoja informativa cgt 4 7-2014
 
Traxess Traveller Eye
Traxess Traveller EyeTraxess Traveller Eye
Traxess Traveller Eye
 
"e-Patients are Changing Healthcare": Public keynote at NI2016, Geneva
"e-Patients are Changing Healthcare": Public keynote at NI2016, Geneva"e-Patients are Changing Healthcare": Public keynote at NI2016, Geneva
"e-Patients are Changing Healthcare": Public keynote at NI2016, Geneva
 
Keynote capitals india morning note 20 november-12
Keynote capitals india morning note 20 november-12Keynote capitals india morning note 20 november-12
Keynote capitals india morning note 20 november-12
 
Daily Health Update for October 10/16/15 Poway Chiropractor
Daily Health Update for October 10/16/15 Poway Chiropractor Daily Health Update for October 10/16/15 Poway Chiropractor
Daily Health Update for October 10/16/15 Poway Chiropractor
 
Unifying Search with Performance Media By Jon Myers #SEJSummit
Unifying Search with Performance Media By Jon Myers #SEJSummitUnifying Search with Performance Media By Jon Myers #SEJSummit
Unifying Search with Performance Media By Jon Myers #SEJSummit
 
RenForm Profile Email Low Res[1]
RenForm Profile Email Low Res[1]RenForm Profile Email Low Res[1]
RenForm Profile Email Low Res[1]
 
Programa final cvei
Programa final cveiPrograma final cvei
Programa final cvei
 
Comunicación política para gremialistas
Comunicación política para gremialistasComunicación política para gremialistas
Comunicación política para gremialistas
 

Current Trends and Challenges in Big Data Benchmarking

  • 1. Current Trends and Challenges in Big Data Benchmarking. Kai Sachs, SPEC Research Group, May 2014
  • 2. Benchmark Use Cases & Stakeholders
      - Hardware & software vendors: publish results & marketing
        Example: 27,500 results submitted for the SPEC CPU2006 benchmarks alone
      - Developer: analysis & product quality
        Example: regression performance testing
      - Consumer: compare different products
        Example: find the best video card for gaming
      - IT architect: cloud & hardware sizing
        Example: choosing a configuration
      - Researcher
        Example: evaluate own implementation using a standardized workload
  • 3. Standard Performance Evaluation Corporation
      - Founded 1988; > 80 member organizations & associates
      - OSG: Open Systems Group
      - HPG: High Performance Group
      - GWPG: Graphics and Workstation Performance Group
      - RG: Research Group
  • 4. Standard Performance Evaluation Corporation: Development of Industry Standard Benchmarks
      - Founded 1988; > 80 member organizations & associates
      - OSG: Open Systems Group (CPU, Java, Virtualization, Power, …)
      - HPG: High Performance Group (OpenMP, MPI, …)
      - GWPG: Graphics and Workstation Performance Group
      - RG: Research Group
  • 5. Standard Performance Evaluation Corporation: Research Platform
      - Founded 1988; > 80 member organizations & associates
      - RG: Research Group (Cloud, Intrusion Detection Systems, Big Data)
      - OSG: Open Systems Group
      - HPG: High Performance Group
      - GWPG: Graphics and Workstation Performance Group
  • 6. SPEC Research Group: Mission Statement
      - Provide a platform for collaborative research efforts in the areas of
          - computer benchmarking and
          - quantitative system analysis
      - Portal for all kinds of benchmarking-related resources
      - Provide research benchmarks, tools, metrics and scenarios
  • 7. Performance
      Performance in a broad sense:
      - Classical performance metrics
        Examples: response time, throughput, scalability, efficiency, and elasticity
      - Non-functional system properties under the term dependability
        Examples: availability, reliability, and security
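The two most common classical metrics on this slide, response time and throughput, can be sketched in a few lines. This is a minimal illustration, not part of the talk: `measure` and `sample_operation` are hypothetical names, the operation is a stand-in for a request to a system under test, and the 95th percentile uses a simple index-based approximation.

```python
import statistics
import time


def measure(operation, requests):
    """Call `operation` `requests` times and derive basic metrics."""
    latencies = []
    start = time.perf_counter()
    for _ in range(requests):
        t0 = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    ordered = sorted(latencies)
    return {
        # Response time: how long a single request takes.
        "mean_response_time_s": statistics.mean(latencies),
        # Simple index-based approximation of the 95th percentile.
        "p95_response_time_s": ordered[int(0.95 * (requests - 1))],
        # Throughput: completed requests per unit of wall-clock time.
        "throughput_ops_per_s": requests / elapsed,
    }


def sample_operation():
    # Hypothetical stand-in for one request to the system under test.
    sum(range(1000))


result = measure(sample_operation, 200)
print(result)
```

Scalability, efficiency and elasticity would require varying the load or the resources between runs, which this sketch does not attempt.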
  • 8. Towards a Big Data Standard Benchmark
      Big Data Benchmarking Community (BDBC)
      - 'Incubator' for Big Data standard benchmark(s) for industry
      - > 200 members on the mailing list
      Workshop on Big Data Benchmarking Series
      - 2012: San Jose, CA and Pune, India; 2013: San Jose, CA and Xi'an, China; 2014: Potsdam, Germany
      - Post-proceedings published in LNCS
      BDBC is joining the SPEC Research Group
      - RG working group focusing on Big Data in preparation
      - Working group chairs: Chaitan Baru, Tilmann Rabl
      Reference: C. Baru, M. Bhandarkar, R. Nambiar, M. Poess, T. Rabl, "Setting the Direction for Big Data Benchmark Standards" (WBDB 2012 report), TPCTC 2012, co-located with VLDB 2012
  • 9. Other Benchmark Organizations
      Transaction Processing Performance Council (TPC)
      - Focus: transaction processing and database benchmarks
      - Most famous benchmarks: TPC-C (OLTP benchmark), TPC-E (OLTP benchmark), TPC-H (decision support benchmark)
      Embedded Microprocessor Benchmark Consortium (EEMBC)
      - Focus: hardware and software used in embedded systems
      Business Applications Performance Corporation (BAPCo)
      - Focus: performance benchmarks for personal computers based on popular computer applications and industry standard operating systems
  • 10. General Chairs: Chaitan Baru (UC San Diego), Tilmann Rabl (U Toronto), Kai Sachs (SAP)
      Local Arrangements: Matthias Uflacker (Hasso Plattner Institute)
      Publicity Chair: Henning Schmitz (SAP Innovation Center)
      Publication Chair: Meikel Poess (Oracle)
      Keynote Speakers: Umesh Dayal, Alexandru Iosup
      Program Committee:
      - Milind Bhandarkar (Pivotal)
      - Anja Bog (SAP Labs)
      - Dhruba Borthakur (Facebook)
      - Joos-Hendrik Böse (Amazon)
      - Tobias Bürger (Payback)
      - Tyson Condie (UCLA)
      - Kshitij Doshi (Intel)
      - Pedro Furtado (U Coimbra)
      - Bhaskar Gowda (Intel)
      - Goetz Graefe (HP)
      - Martin Grund (Exascale)
      - Alfons Kemper (TU München)
      - Donald Kossmann (ETH Zürich)
      - Tim Kraska (Brown University)
      - Wolfgang Lehner (TU Dresden)
      - Christof Leng (UC Berkeley)
      - Stefan Manegold (CWI)
      - Raghu Nambiar (Cisco)
      - Manoj K. Nambiar (TCS)
      - Glenn Paulley (Conestoga Col.)
      - Scott Pearson (CLDS Industry Fellow)
      - Andreas Polze (HPI)
      - Alexander Reinefeld (HU Berlin)
      - Berni Schiefer (IBM Labs Toronto)
      - Saptak Sen (Hortonworks)
      - Florian Stegmaier (University of Passau)
      - Till Westmann (Oracle Labs)
      - Jianfeng Zhan (Chinese Academy of Sciences)
      Sponsors: [Platinum and Gold sponsor logos]
      Submission: May 30, 2014 (6pm PDT); short versions of papers (4-8 LNCS pages)
  • 12. Past & Present
      Past:
      - It was common to write a for-loop and call it a benchmark.
      Present:
      - Benchmarks are complex pieces of software and specifications.
      - Benchmark development has turned into a complex team effort.
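For contrast with today's benchmarks, the "for-loop" style mentioned above might look like the following sketch. It is purely illustrative: the arithmetic in the loop body and the constant `ITERATIONS` are assumptions, not taken from any historical benchmark.

```python
import time

# Hypothetical workload size, chosen only for illustration.
ITERATIONS = 100_000

start = time.perf_counter()
x = 1.0
for i in range(ITERATIONS):
    # Arbitrary floating-point work; the value stays bounded.
    x = (x + i) * 0.5
elapsed = time.perf_counter() - start

loops_per_second = ITERATIONS / elapsed
print(f"{loops_per_second:.0f} iterations/s")
```

Such a loop has no workload specification, no run rules, no defined metric beyond raw loop speed, and no reporting requirements, which is what separates it from a standard benchmark.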
  • 13. The Whetstone Benchmark (1974, 284 lines)
      Reference: Curnow, H.J., Wichmann, B.A., "A Synthetic Benchmark", Computer Journal, Volume 19, Issue 1, Feb. 1976, pp. 43-49
  • 14. SPEC CPU Benchmark Suite: Lines of Code
      Reference: Henning, J., "SPEC CPU suite growth: an historical perspective", SIGARCH Comput. Archit. News 35, Issue 1, March 2007
  • 15. Example Components of a Standard Benchmark
      - Workload
      - Reporter
      - Run Rules
      - Implementation & Framework (opt.)
      - Documentation
      - Metrics
      Workload specification is the most important part.
      References:
      - Kai Sachs, Samuel Kounev, Jean Bacon, Alejandro Buchmann, "Performance evaluation of message-oriented middleware using the SPECjms2007 benchmark", Performance Evaluation, 2009
      - Kai Sachs, "Performance Modeling and Benchmarking of Event-Based Systems", PhD Thesis, TU Darmstadt, 2010
  • 16. Workload Requirements
      - Representativeness
      - Comprehensiveness
      - Focus
      - Scalability
      - Configurability
      - Source: Marco Vieira, Henrique Madeira, Kai Sachs, Samuel Kounev, "Resilience Benchmarking", in Resilience Assessment and Evaluation, Springer, 2012
  • 17. Workload Description 'Level'. Source: Yanpei Chen, Francois Raab, and Randy Katz, "From TPC-C to Big Data Benchmarks: A Functional Workload Model", in Workshop on Big Data Benchmarks, 2012
  • 18. Current Trends & Challenges in Big Data Benchmarking
  • 19. Current Trends & Challenges in Benchmarking
      - Technology: Virtualization; Cloud; (Big) Data: Map Reduce, Mixed Workload (OLAP / OLTP), Data / Event Streaming, …
      - Benchmarking methodology: Large Scale Systems
      - Tools: Data / workload generator; Power consumption; Simulation frameworks; Generic benchmarking frameworks
  • 21. Benchmark Methodology: System Under Test, Past & Present
      - Single node
      - Multiple nodes
      - Isolated systems
  • 22. Benchmark Methodology: System Under Test. Image: St. Peter's Square 2005 vs. 2013 (http://instagram.com/p/W2FCksR9-e/)
  • 23. Benchmark Methodology: System Under Test. Challenge: Large Scale Systems
      - Isolation is not guaranteed (or impossible)
      - High number of nodes
      - Data amount is very high
      - Repeatability is an issue
      - How can we benchmark such systems?
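One way to make the repeatability concern concrete is to repeat a run several times and look at the relative spread of the measurements. The following is only a minimal sketch under stated assumptions: the function names, the use of the coefficient of variation, and the 5 % threshold are illustrative choices, not something the slides prescribe.

```python
import statistics
import time
from typing import Callable, List


def run_trials(workload: Callable[[], None], trials: int = 5) -> List[float]:
    """Run the workload `trials` times and return wall-clock durations in seconds."""
    durations = []
    for _ in range(trials):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def coefficient_of_variation(samples: List[float]) -> float:
    """Sample standard deviation divided by the mean; 0.0 means identical runs."""
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("mean of samples is zero; relative spread is undefined")
    return statistics.stdev(samples) / mean


def is_repeatable(samples: List[float], threshold: float = 0.05) -> bool:
    """Hypothetical acceptance rule: relative spread at or below the threshold."""
    return coefficient_of_variation(samples) <= threshold


if __name__ == "__main__":
    # Stand-in workload; a real system under test would be driven here instead.
    measured = run_trials(lambda: sum(range(100_000)), trials=5)
    print("durations:", measured)
    print("repeatable:", is_repeatable(measured))
```

On a shared, non-isolated system the spread between runs is typically larger, which is why a result from a single run says little about such a system.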
  • 24. “Big Data should be Interesting Data! There are various definitions of Big Data; most center around a number of V’s like volume, velocity, variety, veracity – in short: interesting data (interesting in at least one aspect). However, when you look into research papers on Big Data, in SIGMOD, VLDB, or ICDE, the data that you see here in experimental studies is utterly boring. Performance and scalability experiments are often based on the TPC-H benchmark: completely synthetic data with a synthetic workload that has been beaten to death for the last twenty years. Data quality, data cleaning, and data integration studies are often based on bibliographic data from DBLP, usually old versions with less than a million publications, prolific authors, and curated records. I doubt that this is a real challenge for tasks like entity linkage or data cleaning. So where’s the – interesting – data in Big Data research?” Source: Gerhard Weikum, "Where’s the Data in the Big Data Wave?", SIGMOD Blog, March 2013
  • 26. Big Data Benchmark: Issues and Challenges
      - ‘Big Data World’
      - Communities
      - Benchmark Design: Single benchmark vs. benchmark collection; Component vs. end-to-end scenario; Specification vs. implementation; Metric
      - System under Test
      - Workload
  • 27. Abstractions of the Big Data World from WBDB
      - Enterprise Warehouse + Agglomeration of other data: structured enterprise data warehouse, extended to incorporate data from other non-fully structured data sources (e.g. weblogs, text, streams)
      - Pool of data with sequence of processing: enterprise data processing as a pipeline from data ingestion to transformation, extraction, subsetting, machine learning, predictive analytics; data from multiple structured and non-structured sources
      - Source: Chaitan Baru, Introduction to the 4th Workshop on Big Data Benchmarking
  • 28. BigBench: A Big Data Analytics Benchmark. Data Model
      - Scenario: Retail domain
      - Data, structured: based on TPC-DS
      - Data, semi-structured: click streams
      - Data, unstructured: product reviews
      - PDGF used to generate data
      - Source: A. Ghazal, Minqing Hu, T. Rabl, F. Raab, M. Poess, A. Crolotte, H. Jacobsen, "BigBench: Towards an Industry Standard Benchmark for Big Data Analytics", SIGMOD 2013
  • 29. BigBench: A Big Data Analytics Benchmark. Data Generation: Unstructured Data
      - Extended version of the parallel data generation framework (PDGF)
      - Separate review generator
      - Source: A. Ghazal, Minqing Hu, T. Rabl, F. Raab, M. Poess, A. Crolotte, H. Jacobsen, "BigBench: Towards an Industry Standard Benchmark for Big Data Analytics", SIGMOD 2013
  • 30. Deep Analytics Pipeline
      - An end-to-end data processing pipeline: data from multiple sources; loose, flexible schema; data requires structuring
      - Application characteristics: processing pipelines; running models with data
      - Source: Chaitan Baru, Introduction to the 4th Workshop on Big Data Benchmarking
  • 31. Example of an Application: Determine User Interest Profile by Mining Activities. Source: A. Ahmed, Y. Low, M. Aly, V. Josifovski, A.J. Smola, "Scalable distributed inference of dynamic user interests for behavioral targeting", SIGKDD 2011
  • 32. Composite Benchmark for Transactions and Reporting (CBTR)
      - OLTP & OLAP benchmark based on a current and real enterprise order-to-cash scenario: 18 tables with 5 to 327 columns each, 2316 columns in total
      - Variable workload mix (S: share; T: transactional; A: analytical; r: read-only; m: mixed)
      - OLTP sub-workload: $S_T \in \{x \in \mathbb{R} \mid 0 \le x \le 1\}$
      - OLAP sub-workload: $S_A = 1 - S_T$
      - Read-only OLTP queries: $S_{rT} \in \{x \in \mathbb{R} \mid 0 \le x \le 1\}$
      - Mixed OLTP queries: $S_{mT} = 1 - S_{rT}$
      - Source: Anja Bog, "Benchmarking Composite Transaction and Analytical Processing Systems", PhD Thesis, University of Potsdam, 2012
      - Source: Anja Bog, Kai Sachs, Hasso Plattner, "Interactive Performance Monitoring of a Composite OLTP & OLAP Workload", SIGMOD 2012 (Demo)
      - Source: Anja Bog, Kai Sachs, Alexander Zeier, Hasso Plattner, "Normalization in a Mixed OLTP and OLAP Workload Scenario", TPCTC 2011, co-located with VLDB 2011
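The variable workload mix on this slide boils down to two free parameters, the OLTP share and the read-only share within OLTP, with the other two shares derived as complements. The sketch below illustrates only that arithmetic; the class name `WorkloadMix`, the `split` helper, and the idea of distributing a client count are hypothetical and are not taken from the CBTR implementation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class WorkloadMix:
    """Workload shares as on the slide: S_T and S_rT are chosen freely in [0, 1]."""

    oltp_share: float       # S_T: share of the transactional (OLTP) sub-workload
    read_only_share: float  # S_rT: share of read-only queries within OLTP

    def __post_init__(self) -> None:
        # Both free parameters must lie in the closed interval [0, 1].
        for name in ("oltp_share", "read_only_share"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")

    @property
    def olap_share(self) -> float:
        """S_A = 1 - S_T."""
        return 1.0 - self.oltp_share

    @property
    def mixed_share(self) -> float:
        """S_mT = 1 - S_rT."""
        return 1.0 - self.read_only_share

    def split(self, total_clients: int) -> Dict[str, int]:
        """Hypothetical helper: distribute a client count according to the shares."""
        oltp = round(total_clients * self.oltp_share)
        read_only = round(oltp * self.read_only_share)
        return {
            "olap": total_clients - oltp,
            "oltp_read_only": read_only,
            "oltp_mixed": oltp - read_only,
        }


if __name__ == "__main__":
    mix = WorkloadMix(oltp_share=0.75, read_only_share=0.5)
    print(mix.olap_share, mix.mixed_share, mix.split(200))
```

Deriving the complements instead of storing them keeps each pair of shares summing to 1 by construction.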
  • 33. Big Data & Cloud Benchmark. Related Work: Virtualization Benchmarking
  • 35. Other activities
      - TPC-BD: TPC announced a Big Data working group (November 2013)
      - Graph 500: driven by the HPC community; cooperating with the SPEC CPU group; Green Graph 500 list
      - SPEC OSG: Big Data as part of a cloud benchmark
      - Further benchmarks: CloudSuite 2.0, CH-benCHmark, BigDataBench, HiBench, LinkBench, …
  • 36. SPEC RG Big Data Working Group: Potential Topics
      - Target group: researchers & developers
      - Data categories: structured, unstructured and semi-structured; events & streams; graphs; geospatial, retail, astronomy & genomic; …
      - Benchmark scenario & metrics: realistic use cases & workload mixes; Big Data classification schema
      - (Research) standard benchmarks: BigBench, Deep Analytics Pipeline, …
      - Data generation: real-world traces & synthetic data, tooling
  • 39. Conclusions
      - Benchmarking is more than throughput
      - Meaningful workloads are most important
      - More research is needed: benchmarking of large scale systems; “Big Data World” workloads & scenarios; benchmarks for Big Data
      - Source: Yanpei Chen, "We Don’t Know Enough to make a Big Data Benchmark Suite", WBDB 2012
  • 40. Thank you. Contact information: Kai Sachs, Email: Kai.Sachs@sap.com. Disclaimer: SPEC, the SPEC logo, the SPEC Research Group logo, and the tool and benchmark names SERT, SPECjms2007, SPECpower_ssj2008, SPECweb2009 and SPECvirt_sc2010 are registered trademarks of the Standard Performance Evaluation Corporation (SPEC). Reprinted with permission.