Demo - Smart City Use-case
Using ODPi Hadoop, Spark, H2O and Sparkling water
Ganesh Raju
ENGINEERS AND DEVICES
WORKING TOGETHER
ODPi
● Simplifies and standardizes the big data ecosystem with a common reference
specification and test suites.
● Reduces cost and complexity, and accelerates the development of big data
solutions.
● Provides cross-compatibility between different distributions of Hadoop and big data
technologies.
● Has two stacks: Runtime and Operations.
● V2.0 alpha release coming soon.
● Linaro is a member of ODPi.
www.odpi.org
Spark
● Distributed, fast in-memory data processing engine
● Provides development APIs to efficiently execute iterative, streaming, machine
learning or SQL workloads
● Developed as an alternative to MapReduce, with ease of use in mind
● Code in Java, Scala, or Python
H2O Sparkling Water
● H2O is an in-memory, user-friendly machine learning API
● Compatible with Hadoop and Spark
● Spark + H2O is Sparkling Water
● Sparkling Water lets users combine the fast, scalable machine learning algorithms
of H2O with the high-performance distributed processing capabilities of the Spark
engine.
● Spark’s RDD and DataFrame and H2O’s H2OFrame are interoperable
● Users can use the H2O Flow UI to drive Scala / R / Python computation from
Spark
Demo
● Uses ODPi v1 based native Hadoop, Spark, H2O Sparkling Water and H2O Flow.
● All compiled on ARM: ODPi Hadoop 2.7, Spark 1.6 with Scala 2.10 (Scala 2.11 is
not supported with Sparkling Water)
● 3-node cluster running on Linaro Developer Cloud (HP Moonshot machines)
● Dataset files are stored in HDFS.
● Spark uses YARN as its resource manager.
● H2O Sparkling Water uses Spark as its execution engine.
● H2O Flow uses the Spark SQL API and Scala code.
● Data flow: .csv data -> HDFS -> Spark RDD -> H2O H2OFrame
https://wiki.linaro.org/LEG/Engineering/BigData
Benchmarking Big Data
Ganesh Raju and Naresh Bhat
Abstract
● Various benchmarking tools
● Types of benchmarks and standards
● Challenges of big data benchmarking on ARM
● The tools covered include the TPC (Transaction Processing
Performance Council) benchmarks TPCx-HS, TPC-DS and TPC-H, HiBench
(TestDFSIO), Spark-Bench for Apache Spark, MRBench for MapReduce,
NNBench for HDFS, etc.
Why Benchmarking?
● Measure performance and scale
● Simulate higher load
○ Find bottlenecks/limits
● Evaluate different hardware/software
○ OS, Java, VM
○ Hadoop, Spark, Pig, Hive, ...
● Validate reliability
● Validate assumptions and configurations
● Compare two different deployments
● Performance tuning
Challenges of BigData benchmarking
● System Diversity
○ Variety of Solutions - Data Read, I/O, Streaming, Data warehousing,
Machine Learning
● Rapid Data Evolution - Velocity.
● System and Data Scale
● System Complexity
○ Multiple pipelines (layers of Transformations)
Types of benchmarks and standards
● Micro benchmarks: evaluate specific lower-level system operations
○ E.g. Hadoop workload examples (sort, grep, wordcount, TeraSort,
GridMix, PigMix), HiBench, HDFS DFSIO, AMPLab Big Data Benchmark
● Functional/component benchmarks: specific to a low-level function
○ E.g. basic SQL queries (select, join, etc.)
○ Synthetic benchmarks
● Application level
○ Bigbench
○ Spark bench
Benchmark Efforts - Microbenchmarks

| Benchmark | Workloads | Software stacks | Metrics |
|---|---|---|---|
| HiBench | Sort, WordCount, TeraSort, PageRank, K-means, Bayes classification, Index | Hadoop and Hive | Execution time, throughput, resource utilization |
| DFSIO | Generate, read, write, append, and remove data for MapReduce jobs | Hadoop | Execution time, throughput |
| AMPLab benchmark | Part of CALDA workloads (scan, aggregate and join) and PageRank | Hive, Tez | Execution time |
Benchmark Efforts - TPC

| Benchmark | Workloads | Software stacks | Metrics |
|---|---|---|---|
| TPCx-HS | HSGen, HSDataCheck, HSSort and HSValidate | Hadoop | Performance, price and energy |
| TPC-H | Data warehousing operations | Hive, Pig | Execution time, throughput |
| TPC-DS | Decision support benchmark: data loading, queries and maintenance | Hive, Pig | Execution time, throughput |
Benchmark Efforts - Synthetic

| Benchmark | Workloads | Software stacks | Metrics |
|---|---|---|---|
| SWIM | Synthetic user-generated MapReduce jobs of reading, writing, shuffling and sorting | Hadoop | Multiple metrics |
| GridMix | Synthetic and basic operations to stress test the job scheduler, compression and decompression | Hadoop | Memory, execution time, throughput |
| PigMix | 17 Pig-specific queries | Hadoop, Pig | Execution time |
| MRBench | MapReduce benchmark complementary to TeraSort: data warehouse operations with 22 TPC-H queries | Hadoop | Execution time |
| NNBench and NNBenchWithoutMR | Load testing the NameNode and HDFS I/O with small payloads | Hadoop | I/O |
| SparkBench | CPU-, memory-, shuffle- and I/O-intensive workloads: machine learning, streaming, graph computation and SQL workloads | Spark | Execution time, data process rate |
| BigBench | Interactive queries based on synthetic data | Hadoop, Spark | Execution time |
Benchmark Efforts

| Benchmark | Workloads | Software stacks | Metrics |
|---|---|---|---|
| BigDataBench | 1. Micro benchmarks (sort, grep, WordCount); 2. Search engine workloads (index, PageRank); 3. Social network workloads (connected components (CC), K-means and BFS); 4. E-commerce site workloads (relational database queries (select, aggregate and join), collaborative filtering (CF) and Naive Bayes); 5. Multimedia analytics workloads (speech recognition, ray tracing, image segmentation, face detection); 6. Bioinformatics workloads | Hadoop, DBMSs, NoSQL systems, Hive, Impala, HBase, MPI, Libc, and other real-time analytics systems | Throughput, memory, CPU (MIPS, MPKI: misses per kilo-instruction) |
Hadoop benchmark and test tools
● The Hadoop distribution comes with a number of benchmarks
● TestDFSIO, nnbench and mrbench are in hadoop-*test*.jar
● TeraGen, TeraSort and TeraValidate are in hadoop-*examples*.jar
● Run either jar without arguments to list the programs it contains:
$ cd /usr/local/hadoop
$ bin/hadoop jar hadoop-*test*.jar
$ bin/hadoop jar hadoop-*examples*.jar
● While running the benchmarks you might want to use the time command, which
measures the elapsed time. This saves you the hassle of navigating to the
Hadoop JobTracker interface. The relevant metric is the real value in the first row.
$ time hadoop jar hadoop-*examples*.jar ...
[...]
real 9m15.510s
user 0m7.075s
sys 0m0.584s
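When several runs are compared, the `real` value can be converted to seconds programmatically. The following is a minimal Python sketch, assuming the bash `time` format shown above (`<minutes>m<seconds>s`); the function name `real_seconds` is made up for this illustration.

```python
import re


def real_seconds(time_output: str) -> float:
    """Return the elapsed ("real") time in seconds from bash `time` output."""
    # Matches a line such as "real 9m15.510s": whole minutes, then seconds.
    match = re.search(r"^real\s+(\d+)m([\d.]+)s\s*$", time_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'real' line found in time output")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# The sample is the output quoted on the slide.
sample = """real 9m15.510s
user 0m7.075s
sys 0m0.584s"""

elapsed = real_seconds(sample)
print(f"elapsed: {elapsed:.2f} s")
```

For the sample on the slide this gives 9 x 60 + 15.51 = 555.51 seconds.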
TeraGen, TeraSort and TeraValidate
● This is the best-known Hadoop benchmark
● The goal of TeraSort is to sort the data as fast as possible
● The test suite exercises both the HDFS and MapReduce layers of a Hadoop cluster
● The TeraSort benchmark consists of 3 steps
○ Generate input via TeraGen
○ Run TeraSort on input data
○ Validate sorted output data via TeraValidate
https://wiki.linaro.org/LEG/Engineering/BigData/HadoopBuildInstallAndRunGuide
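The three steps can be scripted. The Python sketch below only builds the command lines; it does not run them. The jar name `hadoop-examples.jar` and the HDFS directories under `/benchmarks` are placeholders chosen for this example, and the row count relies on TeraGen writing 100-byte rows.

```python
import shlex

ROW_BYTES = 100  # TeraGen takes a row count; each row is 100 bytes


def terasort_commands(total_bytes, jar, base_dir):
    """Build the TeraGen -> TeraSort -> TeraValidate command lines."""
    rows = total_bytes // ROW_BYTES
    gen_dir = f"{base_dir}/terasort-input"         # placeholder directory
    sort_dir = f"{base_dir}/terasort-output"       # placeholder directory
    report_dir = f"{base_dir}/terasort-validate"   # placeholder directory
    prefix = ["hadoop", "jar", jar]
    return [
        prefix + ["teragen", str(rows), gen_dir],
        prefix + ["terasort", gen_dir, sort_dir],
        prefix + ["teravalidate", sort_dir, report_dir],
    ]


# 10**9 bytes of input data, i.e. ten million rows.
commands = terasort_commands(10**9, "hadoop-examples.jar", "/benchmarks")
for argv in commands:
    print(shlex.join(argv))
```

Each step reads the directory written by the previous one, so the commands have to run in this order.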
HiBench
● Contains 9 typical Hadoop and Spark workloads (including micro benchmarks, HDFS benchmarks,
web search benchmarks, machine learning benchmarks using Mahout, and data analytics
benchmarks)
● Sort, WordCount, TeraSort, TestDFSIO, Nutch indexing (search indexing using the Nutch engine),
PageRank (an implementation of Google’s web page ranking algorithm), hivebench
● Uses zlib compression for input and output
● Metrics: time (sec) and throughput (bytes/sec), memory partitions, parallelism
● Cons: lack of AArch64 support, lack of documentation
https://wiki.linaro.org/LEG/Engineering/BigData/HiBench
TestDFSIO
● It is part of hadoop-mapreduce-client-jobclient.jar
● Stress tests I/O performance (throughput and latency) on a clustered setup.
● This test will shake out the hardware, OS and Hadoop setup on your cluster
machines (NameNode/DataNode)
● The tests run as a MapReduce job using a 1:1 mapping (1 map task per file)
● Helpful for discovering performance bottlenecks in your network
● Run the write test first, then follow up with the read test
● Use -write for write tests and -read for read tests.
● The results are stored in TestDFSIO_results.log. Use -resFile to choose a different file
name
Hive Testbench
● Based on TPC-H and TPC-DS benchmarks
● Experiment with Apache Hive at any data scale
● Contains a data generator and a set of queries
● Tests basic Hive performance on large data sets
https://wiki.linaro.org/LEG/Engineering/BigData/HiveTestBench
MRBench (MapReduce benchmark)
● Loops a small job a number of times
● Checks whether small job runs are responsive and running efficiently on your
cluster
● Focuses on the MapReduce layer, as its impact on the HDFS layer is very limited
● The issue with running multiple MRBench instances in parallel is resolved, so you can run it from
different boxes
● Test command to run 50 small test jobs:
$ hadoop jar hadoop-*test*.jar mrbench -numRuns 50
● Example output, showing that a job took about 31 seconds on average:
DataLines Maps Reduces AvgTime (milliseconds)
1 2 1 31414
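The summary is easy to post-process when results from several clusters are collected. A minimal Python sketch, assuming the two-line layout shown above (a header line followed by one line with four integers); the function name `parse_mrbench` and the dictionary keys are made up for this illustration.

```python
def parse_mrbench(summary: str) -> dict:
    """Parse the MRBench summary and convert AvgTime to seconds."""
    lines = summary.strip().splitlines()
    # The last line holds the four numeric columns.
    data_lines, maps, reduces, avg_ms = (int(v) for v in lines[-1].split())
    return {
        "data_lines": data_lines,
        "maps": maps,
        "reduces": reduces,
        "avg_time_s": avg_ms / 1000,  # AvgTime is reported in milliseconds
    }


# The sample is the output quoted on the slide.
sample = """DataLines Maps Reduces AvgTime (milliseconds)
1 2 1 31414"""

result = parse_mrbench(sample)
print(result)
```

For the sample on the slide, `avg_time_s` is 31.414, matching the "about 31 seconds" reading.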
NNBench and NNBenchWithoutMR
● Load tests the NameNode through continuous read, write, rename and delete
operations on small files
● Stress tests HDFS (I/O)
● To increase stress, run multiple instances of NNBenchWithoutMR
simultaneously from several machines, or increase the number of map tasks for NNBench
● All write tests are run first, followed by the read tests
● Test command: the command below runs a NameNode benchmark that
creates 1000 files using 12 maps and 6 reducers.
$ hadoop jar hadoop-*test*.jar nnbench -operation create_write \
    -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 \
    -replicationFactorPerFile 3 -readFileAfterOpen true \
    -baseDir /benchmarks/NNBench-`hostname -s`
TPC Benchmark
● TPCx-HS - https://wiki.linaro.org/LEG/Engineering/BigData/TPCxHS
○ Currently facing problems with cluster shell configuration
● TPC-H
○ TPC-H benchmark focuses on ad-hoc queries
● TPC-DS
○ “the” standard benchmark for decision support
● TPC-C
○ Is an on-line transaction processing (OLTP) benchmark
TPCx-HS Benchmark
X: Express, H: Hadoop, S: Sort
The TPCx-HS kit contains
● TPCx-HS specification documentation
● TPCx-HS User's guide documentation
● Scripts to run benchmarks
● Java code to execute the benchmark load
TPCx-HS Execution
● A valid run consists of 5 separate phases run sequentially, with no overlap in their execution
● The benchmark test consists of 2 runs (the runs with the lower and the higher TPCx-HS Performance Metric)
● No configuration or tuning changes or reboots are allowed between the two runs
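The two-run rule can be illustrated with a short calculation. This Python sketch assumes the performance metric defined in the TPCx-HS specification, HSph@SF = SF / (T / 3600), where SF is the scale factor and T is the total elapsed time of a run in seconds, and assumes that the run with the lower metric is the one reported. The elapsed times are invented for the example and are not measurements.

```python
def hsph(scale_factor: float, elapsed_seconds: float) -> float:
    """Performance metric, assuming HSph@SF = SF / (T / 3600)."""
    return scale_factor / (elapsed_seconds / 3600.0)


def reported_metric(scale_factor: float, run_times_s: list) -> float:
    """Return the lower of the two run metrics (assumed to be the reported one)."""
    if len(run_times_s) != 2:
        raise ValueError("the benchmark test consists of exactly 2 runs")
    return min(hsph(scale_factor, t) for t in run_times_s)


# Hypothetical elapsed times for Run 1 and Run 2 at scale factor 1.
runs = [3600.0, 1800.0]
for number, seconds in enumerate(runs, start=1):
    print(f"run {number}: {hsph(1, seconds):.2f}")
print(f"reported: {reported_metric(1, runs):.2f}")
```

Reporting the lower value means a single lucky run cannot inflate the published result.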
TPC vs SPEC models
TPC model
● Specification based
● Performance, Price, energy in one
benchmark
● End-to-End
● Multiple tests (ACID, Load)
● Independent Review
● Full disclosure
● TPC Technology conference
SPEC model
● Kit based
● Performance and energy in
separate benchmarks
● Server centric
● Single test
● Summary disclosure
● SPEC research group ICPE
BigBench
● BigBench is a joint effort with partners in industry and academia on creating a comprehensive
and standardized BigData benchmark.
● BigBench builds upon and borrows elements from existing benchmarking efforts (such as
TPCx-HS, GridMix, PigMix, HiBench, Big Data Benchmark, YCSB and TPC-DS).
● BigBench is a specification-based benchmark with an open-source reference implementation
kit.
● As a specification-based benchmark, it is technology-agnostic and provides the
necessary formalism and flexibility to support multiple implementations.
● Focuses on execution time
● Consists of 30 queries/workloads (10 of them are from TPC)
● Drawback: it is structured-data-intensive
Spark Bench for Apache Spark
● The build on ARM works
● FAIL: when the Spark Bench examples are run, a KILL signal is observed that
terminates all workers.
● This is still under investigation, as there are no useful logs to debug. The lack of a proper
error description and of documentation is a challenge.
● A ticket has been filed on the Spark Bench Git repository and is still unresolved.
● Con: lack of documentation.
GridMix
● Mix of synthetic MapReduce jobs (sorting text data and SequenceFiles)
● Evaluates MapReduce and HDFS performance
● The input file needs to be in JSON format
● Jobs can be either LOADJOB (driven by a trace of history logs processed with Rumen) or SLEEPJOB (a synthetic job where
each task does *nothing* but sleep for a certain duration)
● Jobs can be run in STRESS, REPLAY or SERIAL mode
● You can emulate the number of users, the number of job queries and resource usage (CPU, memory, JVM
heap)
● Basic command-line usage (provided as part of the hadoop command):
$ hadoop gridmix [-generate <size>] [-users <users-list>] <iopath> <trace>
● Con: Challenging to explore the performance impact of combining or separating workloads, e.g.,
through consolidating from many clusters.
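The usage line above has two optional options followed by two required positional arguments. The Python sketch below assembles such a command line without running it. The paths and the `100g` size are placeholders chosen for this example, and `gridmix_argv` is a made-up helper name.

```python
def gridmix_argv(iopath, trace, generate=None, users=None):
    """Build: hadoop gridmix [-generate <size>] [-users <users-list>] <iopath> <trace>."""
    argv = ["hadoop", "gridmix"]
    if generate is not None:
        argv += ["-generate", generate]  # optional: size of input data to generate
    if users is not None:
        argv += ["-users", users]        # optional: users list
    # The two positional arguments always come last, in this order.
    return argv + [iopath, trace]


# Placeholder working directory, trace file and data size.
minimal = gridmix_argv("/gridmix/io", "/gridmix/trace.json")
with_data = gridmix_argv("/gridmix/io", "/gridmix/trace.json", generate="100g")
print(" ".join(minimal))
print(" ".join(with_data))
```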
PigMix
● PigMix is a set of queries used to test Apache Pig performance
● Some queries test latency (how long does it take to run this query?)
● Others test scalability (how many fields or records can Pig handle before
it fails?)
● Usage: run the commands below from the Pig home directory
$ ant -Dharness.hadoop.home=$HADOOP_HOME pigmix-deploy    # generate the test dataset
$ ant -Dharness.hadoop.home=$HADOOP_HOME pigmix           # run the PigMix benchmark
SWIM (Statistical Workload Injector for MapReduce)
● Enables rigorous performance measurement of MapReduce systems
● Contains suites of workloads of thousands of jobs, with complex data, arrival,
and computation patterns
● Informs highly targeted, workload-specific optimizations
● Highly recommended for MapReduce operators
● Performance measurement:
https://github.com/SWIMProjectUCB/SWIM/wiki/Performance-measurement-by-executing-synthetic-or-historical-workloads
AmpLab
● The Big Data Benchmark from AMPLab, UC Berkeley provides quantitative and qualitative
comparisons of five systems
○ Redshift – a hosted MPP database offered by Amazon.com based on the ParAccel data
warehouse
○ Hive – a Hadoop-based data warehousing system
○ Shark – a Hive-compatible SQL engine which runs on top of the Spark computing framework
○ Impala – a Hive-compatible* SQL engine with its own MPP-like execution engine
○ Stinger/Tez – Tez is a next-generation Hadoop execution engine, used by Hive under the Stinger initiative
● This benchmark measures response time on a handful of relational queries (scans, aggregations, joins
and UDFs) across different data sizes.
BigDataBench
BigDataBench is a benchmark suite for scale-out workloads, different from SPEC
CPU (sequential workloads), and PARSEC (multithreaded workloads). Currently, it
simulates five typical and important big data applications: search engine, social
network, e-commerce, multimedia data analytics, and bioinformatics.
Includes 15 real-world data sets, and 34 big data workloads.
References
https://www2.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-21.pdf
Terasort, TestDFSIO, NNBench, MRBench
https://wiki.linaro.org/LEG/Engineering/BigData
https://wiki.linaro.org/LEG/Engineering/BigData/HadoopTuningGuide
https://wiki.linaro.org/LEG/Engineering/BigData/HadoopBuildInstallAndRunGuide
http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
GridMix3, PigMix, HiBench, TPCx-HS, SWIM, AMPLab, BigBench
https://hadoop.apache.org/docs/current/hadoop-gridmix/GridMix.html
https://cwiki.apache.org/confluence/display/PIG/PigMix
https://wiki.linaro.org/LEG/Engineering/BigData/HiBench
https://wiki.linaro.org/LEG/Engineering/BigData/TPCxHS
https://github.com/SWIMProjectUCB/SWIM/wiki
https://github.com/amplab
https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench
Thank you
ganesh.raju@linaro.org
naresh.bhat@linaro.org
#LAS16
For further information: www.linaro.org
LAS16 keynotes and videos on: connect.linaro.org

Contenu connexe

Tendances

LAS16-108: JerryScript and other scripting languages for IoT
LAS16-108: JerryScript and other scripting languages for IoTLAS16-108: JerryScript and other scripting languages for IoT
LAS16-108: JerryScript and other scripting languages for IoTLinaro
 
BKK16-302: Android Optimizing Compiler: New Member Assimilation Guide
BKK16-302: Android Optimizing Compiler: New Member Assimilation GuideBKK16-302: Android Optimizing Compiler: New Member Assimilation Guide
BKK16-302: Android Optimizing Compiler: New Member Assimilation GuideLinaro
 
LAS16-201: ART JIT in Android N
LAS16-201: ART JIT in Android NLAS16-201: ART JIT in Android N
LAS16-201: ART JIT in Android NLinaro
 
LAS16-106: GNU Toolchain Development Lifecycle
LAS16-106: GNU Toolchain Development LifecycleLAS16-106: GNU Toolchain Development Lifecycle
LAS16-106: GNU Toolchain Development LifecycleLinaro
 
LAS16-109: LAS16-109: The status quo and the future of 96Boards
LAS16-109: LAS16-109: The status quo and the future of 96BoardsLAS16-109: LAS16-109: The status quo and the future of 96Boards
LAS16-109: LAS16-109: The status quo and the future of 96BoardsLinaro
 
LAS16-500: The Rise and Fall of Assembler and the VGIC from Hell
LAS16-500: The Rise and Fall of Assembler and the VGIC from HellLAS16-500: The Rise and Fall of Assembler and the VGIC from Hell
LAS16-500: The Rise and Fall of Assembler and the VGIC from HellLinaro
 
BKK16-409 VOSY Switch Port to ARMv8 Platforms and ODP Integration
BKK16-409 VOSY Switch Port to ARMv8 Platforms and ODP IntegrationBKK16-409 VOSY Switch Port to ARMv8 Platforms and ODP Integration
BKK16-409 VOSY Switch Port to ARMv8 Platforms and ODP IntegrationLinaro
 
BKK16-106 ODP Project Update
BKK16-106 ODP Project UpdateBKK16-106 ODP Project Update
BKK16-106 ODP Project UpdateLinaro
 
LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...
LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...
LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...Linaro
 
BUD17-104: Scripting Languages in IoT: Challenges and Approaches
BUD17-104: Scripting Languages in IoT: Challenges and ApproachesBUD17-104: Scripting Languages in IoT: Challenges and Approaches
BUD17-104: Scripting Languages in IoT: Challenges and ApproachesLinaro
 
LAS16-TR02: Upstreaming 101
LAS16-TR02: Upstreaming 101LAS16-TR02: Upstreaming 101
LAS16-TR02: Upstreaming 101Linaro
 
LAS16-310: Introducing the first 96Boards TV Platform: Poplar by Hisilicon
LAS16-310: Introducing the first 96Boards TV Platform: Poplar by HisiliconLAS16-310: Introducing the first 96Boards TV Platform: Poplar by Hisilicon
LAS16-310: Introducing the first 96Boards TV Platform: Poplar by HisiliconLinaro
 
BUD17-310: Introducing LLDB for linux on Arm and AArch64
BUD17-310: Introducing LLDB for linux on Arm and AArch64 BUD17-310: Introducing LLDB for linux on Arm and AArch64
BUD17-310: Introducing LLDB for linux on Arm and AArch64 Linaro
 
HKG15-300: Art's Quick Compiler: An unofficial overview
HKG15-300: Art's Quick Compiler: An unofficial overviewHKG15-300: Art's Quick Compiler: An unofficial overview
HKG15-300: Art's Quick Compiler: An unofficial overviewLinaro
 
BKK16-213 Where's the Hardware?
BKK16-213 Where's the Hardware?BKK16-213 Where's the Hardware?
BKK16-213 Where's the Hardware?Linaro
 
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...Linaro
 
Debugging Numerical Simulations on Accelerated Architectures - TotalView fo...
 Debugging Numerical Simulations on Accelerated Architectures  - TotalView fo... Debugging Numerical Simulations on Accelerated Architectures  - TotalView fo...
Debugging Numerical Simulations on Accelerated Architectures - TotalView fo...Rogue Wave Software
 
P4 to OpenDataPlane Compiler - BUD17-304
P4 to OpenDataPlane Compiler - BUD17-304P4 to OpenDataPlane Compiler - BUD17-304
P4 to OpenDataPlane Compiler - BUD17-304Linaro
 
BKK16-400B ODPI - Standardizing Hadoop
BKK16-400B ODPI - Standardizing HadoopBKK16-400B ODPI - Standardizing Hadoop
BKK16-400B ODPI - Standardizing HadoopLinaro
 
Las16 200 - firmware summit - ras what is it- why do we need it
Las16 200 - firmware summit - ras what is it- why do we need itLas16 200 - firmware summit - ras what is it- why do we need it
Las16 200 - firmware summit - ras what is it- why do we need itLinaro
 

Tendances (20)

LAS16-108: JerryScript and other scripting languages for IoT
LAS16-108: JerryScript and other scripting languages for IoTLAS16-108: JerryScript and other scripting languages for IoT
LAS16-108: JerryScript and other scripting languages for IoT
 
BKK16-302: Android Optimizing Compiler: New Member Assimilation Guide
BKK16-302: Android Optimizing Compiler: New Member Assimilation GuideBKK16-302: Android Optimizing Compiler: New Member Assimilation Guide
BKK16-302: Android Optimizing Compiler: New Member Assimilation Guide
 
LAS16-201: ART JIT in Android N
LAS16-201: ART JIT in Android NLAS16-201: ART JIT in Android N
LAS16-201: ART JIT in Android N
 
LAS16-106: GNU Toolchain Development Lifecycle
LAS16-106: GNU Toolchain Development LifecycleLAS16-106: GNU Toolchain Development Lifecycle
LAS16-106: GNU Toolchain Development Lifecycle
 
LAS16-109: LAS16-109: The status quo and the future of 96Boards
LAS16-109: LAS16-109: The status quo and the future of 96BoardsLAS16-109: LAS16-109: The status quo and the future of 96Boards
LAS16-109: LAS16-109: The status quo and the future of 96Boards
 
LAS16-500: The Rise and Fall of Assembler and the VGIC from Hell
LAS16-500: The Rise and Fall of Assembler and the VGIC from HellLAS16-500: The Rise and Fall of Assembler and the VGIC from Hell
LAS16-500: The Rise and Fall of Assembler and the VGIC from Hell
 
BKK16-409 VOSY Switch Port to ARMv8 Platforms and ODP Integration
BKK16-409 VOSY Switch Port to ARMv8 Platforms and ODP IntegrationBKK16-409 VOSY Switch Port to ARMv8 Platforms and ODP Integration
BKK16-409 VOSY Switch Port to ARMv8 Platforms and ODP Integration
 
BKK16-106 ODP Project Update
BKK16-106 ODP Project UpdateBKK16-106 ODP Project Update
BKK16-106 ODP Project Update
 
LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...
LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...
LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...
 
BUD17-104: Scripting Languages in IoT: Challenges and Approaches
BUD17-104: Scripting Languages in IoT: Challenges and ApproachesBUD17-104: Scripting Languages in IoT: Challenges and Approaches
BUD17-104: Scripting Languages in IoT: Challenges and Approaches
 
LAS16-TR02: Upstreaming 101
LAS16-TR02: Upstreaming 101LAS16-TR02: Upstreaming 101
LAS16-TR02: Upstreaming 101
 
LAS16-310: Introducing the first 96Boards TV Platform: Poplar by Hisilicon
LAS16-310: Introducing the first 96Boards TV Platform: Poplar by HisiliconLAS16-310: Introducing the first 96Boards TV Platform: Poplar by Hisilicon
LAS16-310: Introducing the first 96Boards TV Platform: Poplar by Hisilicon
 
BUD17-310: Introducing LLDB for linux on Arm and AArch64
BUD17-310: Introducing LLDB for linux on Arm and AArch64 BUD17-310: Introducing LLDB for linux on Arm and AArch64
BUD17-310: Introducing LLDB for linux on Arm and AArch64
 
HKG15-300: Art's Quick Compiler: An unofficial overview
HKG15-300: Art's Quick Compiler: An unofficial overviewHKG15-300: Art's Quick Compiler: An unofficial overview
HKG15-300: Art's Quick Compiler: An unofficial overview
 
BKK16-213 Where's the Hardware?
BKK16-213 Where's the Hardware?BKK16-213 Where's the Hardware?
BKK16-213 Where's the Hardware?
 
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...
 
Debugging Numerical Simulations on Accelerated Architectures - TotalView fo...
 Debugging Numerical Simulations on Accelerated Architectures  - TotalView fo... Debugging Numerical Simulations on Accelerated Architectures  - TotalView fo...
Debugging Numerical Simulations on Accelerated Architectures - TotalView fo...
 
P4 to OpenDataPlane Compiler - BUD17-304
P4 to OpenDataPlane Compiler - BUD17-304P4 to OpenDataPlane Compiler - BUD17-304
P4 to OpenDataPlane Compiler - BUD17-304
 
BKK16-400B ODPI - Standardizing Hadoop
BKK16-400B ODPI - Standardizing HadoopBKK16-400B ODPI - Standardizing Hadoop
BKK16-400B ODPI - Standardizing Hadoop
 
Las16 200 - firmware summit - ras what is it- why do we need it
Las16 200 - firmware summit - ras what is it- why do we need itLas16 200 - firmware summit - ras what is it- why do we need it
Las16 200 - firmware summit - ras what is it- why do we need it
 

En vedette

Big data, iot &amp; smart city
Big data, iot &amp; smart cityBig data, iot &amp; smart city
Big data, iot &amp; smart cityimran2017
 
BKK16-400A LuvOS and ACPI Compliance Testing
BKK16-400A LuvOS and ACPI Compliance TestingBKK16-400A LuvOS and ACPI Compliance Testing
BKK16-400A LuvOS and ACPI Compliance TestingLinaro
 
01 nosql and multi model database
01   nosql and multi model database01   nosql and multi model database
01 nosql and multi model databaseMahdi Atawneh
 
Towards smart city making government data work with big data analysis
Towards smart city making government data work with big data analysisTowards smart city making government data work with big data analysis
Towards smart city making government data work with big data analysisCharles Mok
 
LAS16-300K2: Geoff Thorpe - IoT Zephyr
LAS16-300K2: Geoff Thorpe - IoT ZephyrLAS16-300K2: Geoff Thorpe - IoT Zephyr
LAS16-300K2: Geoff Thorpe - IoT ZephyrShovan Sargunam
 
LAS16-307: Benchmarking Schedutil in Android
LAS16-307: Benchmarking Schedutil in AndroidLAS16-307: Benchmarking Schedutil in Android
LAS16-307: Benchmarking Schedutil in AndroidLinaro
 
Chapter 3 introduction to the smart city concept, AUST 2015
Chapter 3 introduction to the smart city concept, AUST 2015Chapter 3 introduction to the smart city concept, AUST 2015
Chapter 3 introduction to the smart city concept, AUST 2015Isam Shahrour
 
Q2.12: Existing Linux Mechanisms to Support big.LITTLE
Q2.12: Existing Linux Mechanisms to Support big.LITTLEQ2.12: Existing Linux Mechanisms to Support big.LITTLE
Q2.12: Existing Linux Mechanisms to Support big.LITTLELinaro
 
Linux Preempt-RT Internals
Linux Preempt-RT InternalsLinux Preempt-RT Internals
Linux Preempt-RT Internals哲豪 康哲豪
 
Big Data for Smart City
Big Data for Smart CityBig Data for Smart City
Big Data for Smart CityKoltiva
 
Connected IO: Smart Cities
Connected IO: Smart CitiesConnected IO: Smart Cities
Connected IO: Smart Citiesdanielpwardmbd
 
Next Generation Intelligent Transportation: Solutions for Smart Cities
Next Generation Intelligent Transportation: Solutions for Smart CitiesNext Generation Intelligent Transportation: Solutions for Smart Cities
Next Generation Intelligent Transportation: Solutions for Smart CitiesUGPTI
 
Transforming Governments in the Cloud
Transforming Governments in the CloudTransforming Governments in the Cloud
Transforming Governments in the CloudAmazon Web Services
 
Innovative Approaches for Smart City Development
Innovative Approaches for Smart City DevelopmentInnovative Approaches for Smart City Development
Innovative Approaches for Smart City DevelopmentMartin Venzky-Stalling
 
Overcoming the cybersecurity challenges of smart cities
Overcoming the cybersecurity challenges of smart citiesOvercoming the cybersecurity challenges of smart cities
Overcoming the cybersecurity challenges of smart citiesSaeed Al Dhaheri
 
151116 smart city furniture trends
151116    smart city furniture trends151116    smart city furniture trends
151116 smart city furniture trendsYANG DESIGN
 
Global City Teams Challenge Overview
Global City Teams Challenge OverviewGlobal City Teams Challenge Overview
Global City Teams Challenge OverviewInternet of Things DC
 

En vedette (20)

Big data, iot &amp; smart city
Big data, iot &amp; smart cityBig data, iot &amp; smart city
Big data, iot &amp; smart city
 
BKK16-400A LuvOS and ACPI Compliance Testing
BKK16-400A LuvOS and ACPI Compliance TestingBKK16-400A LuvOS and ACPI Compliance Testing
BKK16-400A LuvOS and ACPI Compliance Testing
 
01 nosql and multi model database
01   nosql and multi model database01   nosql and multi model database
01 nosql and multi model database
 
Multi model-databases
Multi model-databasesMulti model-databases
Multi model-databases
 
Towards smart city making government data work with big data analysis
Towards smart city making government data work with big data analysisTowards smart city making government data work with big data analysis
Towards smart city making government data work with big data analysis
 
Smart city
Smart citySmart city
Smart city
 
LAS16-300K2: Geoff Thorpe - IoT Zephyr
LAS16-300K2: Geoff Thorpe - IoT ZephyrLAS16-300K2: Geoff Thorpe - IoT Zephyr
LAS16-300K2: Geoff Thorpe - IoT Zephyr
 
LAS16-305: Smart City Big Data Visualization on 96Boards

  • 1. Demo - Smart City Use-case: Using ODPi Hadoop, Spark, H2O and Sparkling Water. Ganesh Raju
  • 2. ODPi
    ● Simplifies and standardizes the big data ecosystem with a common reference specification and test suites.
    ● Reduces cost and complexity and accelerates the development of Big Data solutions.
    ● Provides cross-compatibility between different distributions of Hadoop and big data technologies.
    ● Has two stacks: Runtime and Operations.
    ● V2.0 alpha release coming soon.
    ● Linaro is a member of ODPi.
    www.odpi.org
  • 3. Spark
    ● Distributed and fast in-memory data processing engine.
    ● Provides development APIs to efficiently execute iterative streaming, machine learning or SQL workloads.
    ● Spark was developed as an alternative approach to MapReduce with ease of use in mind.
    ● Code in Java, Scala, or Python.
  • 4. H2O Sparkling Water
    ● H2O is an in-memory, user-friendly machine learning API.
    ● Compatible with Hadoop and Spark.
    ● Spark + H2O is Sparkling Water.
    ● Sparkling Water combines the fast, scalable machine learning algorithms of H2O with the high-performance distributed processing capabilities of the Spark engine.
    ● Spark's RDD and DataFrame and H2O's H2OFrame are interoperable.
    ● Users can utilize the H2O Flow UI to drive Scala / R / Python computation from Spark.
  • 5. Demo
    ● Uses ODPi v1 based native Hadoop, Spark, H2O Sparkling Water and H2O Flow.
    ● All compiled on ARM: ODPi Hadoop 2.7, Spark 1.6 with Scala 2.10 (Scala 2.11 is not supported with Sparkling Water).
    ● 3-node cluster running on Linaro Developer Cloud (HP Moonshot machines).
    ● Dataset files are stored in HDFS.
    ● Spark uses YARN as its resource manager.
    ● H2O Sparkling Water uses Spark as its execution engine.
    ● H2O Flow uses the Spark SQL API and Scala code.
    ● .csv data -> HDFS -> Spark RDD -> H2O H2OFrame
    https://wiki.linaro.org/LEG/Engineering/BigData
  • 6. Benchmarking Big Data. Ganesh Raju and Naresh Bhat
  • 7. Abstract
    ● Various benchmarking tools
    ● Types of benchmarks and standards
    ● Challenges of Big Data benchmarking on ARM
    ● Tools covered include the TPC (Transaction Processing Performance Council) benchmarks TPCx-HS, TPC-DS and TPC-H, HiBench (TestDFSIO), Spark-Bench for Apache Spark, MRBench for MapReduce, and NNBench for HDFS, among others.
  • 8. Why Benchmarking?
    ● Measure performance and scale
    ● Simulate higher load
      ○ Find bottlenecks/limits
    ● Evaluate different hardware/software
      ○ OS, Java, VM
      ○ Hadoop, Spark, Pig, Hive, etc.
    ● Validate reliability
    ● Validate assumptions / configurations
    ● Compare two different deployments
    ● Performance tuning
  • 9. Challenges of Big Data benchmarking
    ● System diversity
      ○ Variety of solutions: data read, I/O, streaming, data warehousing, machine learning
    ● Rapid data evolution (velocity)
    ● System and data scale
    ● System complexity
      ○ Multiple pipelines (layers of transformations)
  • 10. Types of benchmarks and standards
    ● Micro benchmarks: evaluate specific lower-level system operations
      ○ E.g. Hadoop workload examples (sort, grep, wordcount, TeraSort, GridMix, PigMix), HiBench, HDFS DFSIO, AMPLab Big Data Benchmark
    ● Functional/component benchmarks: specific to a low-level function
      ○ E.g. basic SQL queries (select, join, etc.)
      ○ Synthetic benchmarks
    ● Application level
      ○ BigBench
      ○ SparkBench
  • 11. Benchmark Efforts - Microbenchmarks
    Benchmark | Workloads | Software Stacks | Metrics
    HiBench | Sort, WordCount, TeraSort, PageRank, K-means, Bayes classification, Index | Hadoop and Hive | Execution time, throughput, resource utilization
    DFSIO | Generate, read, write, append, and remove data for MapReduce jobs | Hadoop | Execution time, throughput
    AMPLab benchmark | Part of CALDA workloads (scan, aggregate and join) and PageRank | Hive, Tez | Execution time
  • 12. Benchmark Efforts - TPC
    Benchmark | Workloads | Software Stacks | Metrics
    TPCx-HS | HSGen, HSDataCheck, HSSort and HSValidate | Hadoop | Performance, price and energy
    TPC-H | Data warehousing operations | Hive, Pig | Execution time, throughput
    TPC-DS | Decision support benchmark: data loading, queries and maintenance | Hive, Pig | Execution time, throughput
  • 13. Benchmark Efforts - Synthetic
    Benchmark | Workloads | Software Stacks | Metrics
    SWIM | Synthetic user-generated MapReduce jobs of reading, writing, shuffling and sorting | Hadoop | Multiple metrics
    GridMix | Synthetic and basic operations to stress test the job scheduler, compression and decompression | Hadoop | Memory, execution time, throughput
    PigMix | 17 Pig-specific queries | Hadoop, Pig | Execution time
    MRBench | MapReduce benchmark complementary to TeraSort: data warehouse operations with 22 TPC-H queries | Hadoop | Execution time
    NNBench and NNBenchWithoutMR | Load testing the NameNode and HDFS I/O with small payloads | Hadoop | I/O
    SparkBench | CPU, memory, shuffle and I/O intensive workloads; machine learning, streaming, graph computation and SQL workloads | Spark | Execution time, data process rate
    BigBench | Interactive queries based on synthetic data | Hadoop, Spark | Execution time
  • 14. Benchmark Efforts
    Benchmark | Workloads | Software Stacks | Metrics
    BigDataBench | 1. Micro benchmarks (sort, grep, WordCount); 2. Search engine workloads (index, PageRank); 3. Social network workloads (connected components (CC), K-means and BFS); 4. E-commerce site workloads (relational database queries (select, aggregate and join), collaborative filtering (CF) and Naive Bayes); 5. Multimedia analytics workloads (speech recognition, ray tracing, image segmentation, face detection); 6. Bioinformatics workloads | Hadoop, DBMSs, NoSQL systems, Hive, Impala, HBase, MPI, Libc, and other real-time analytics systems | Throughput, memory, CPU (MIPS, MPKI: misses per kilo-instruction)
  • 15. Hadoop benchmark and test tools
    ● The Hadoop distribution comes with a number of benchmarks
    ● TestDFSIO, nnbench and mrbench are in hadoop-*test*.jar
    ● TeraGen, TeraSort and TeraValidate are in hadoop-*examples*.jar
    ● You can list them with the commands
      $ cd /usr/local/hadoop
      $ bin/hadoop jar hadoop-*test*.jar
      $ bin/hadoop jar hadoop-*examples*.jar
    ● While running the benchmarks you may want to use the time command, which measures the elapsed time. This saves you the hassle of navigating to the Hadoop JobTracker interface. The relevant metric is the real value in the first row.
      $ time hadoop jar hadoop-*examples*.jar ...
      [...]
      real 9m15.510s
      user 0m7.075s
      sys 0m0.584s
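    When many runs are compared, it can help to convert the real line printed by time into plain seconds. The following is a minimal sketch, assuming bash-style time output of the form shown on the slide; the function name real_seconds is hypothetical and not part of any Hadoop tool.

```python
import re


def real_seconds(time_output: str) -> float:
    """Return wall-clock seconds from the `real` line of bash-style `time` output."""
    # Matches lines such as "real 9m15.510s" (whitespace may be a tab or spaces).
    match = re.search(r"^real\s+(\d+)m([\d.]+)s", time_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'real <min>m<sec>s' line found")
    minutes, seconds = int(match.group(1)), float(match.group(2))
    return round(minutes * 60 + seconds, 3)


# Sample taken from the slide above.
sample = """real 9m15.510s
user 0m7.075s
sys 0m0.584s
"""
print(real_seconds(sample))
```

    Only the real value is extracted because it is the elapsed wall-clock time; user and sys describe only the local client process, not the cluster.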
  • 16. TeraGen, TeraSort and TeraValidate
    ● This is the best-known Hadoop benchmark
    ● The goal of TeraSort is to sort the data as fast as possible
    ● The test suite exercises both the HDFS and MapReduce layers of a Hadoop cluster
    ● The TeraSort benchmark consists of 3 steps
      ○ Generate input via TeraGen
      ○ Run TeraSort on the input data
      ○ Validate the sorted output data via TeraValidate
    https://wiki.linaro.org/LEG/Engineering/BigData/HadoopBuildInstallAndRunGuide
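    The three steps can be sketched as a small dry-run helper that only assembles the command lines. The HDFS directory names, the helper name terasort_commands and the row count are assumptions for illustration; the jar pattern is the one quoted on the previous slide, and TeraGen's first argument is the number of 100-byte rows to generate.

```python
def terasort_commands(rows, base_dir, examples_jar="hadoop-*examples*.jar"):
    """Build the TeraGen -> TeraSort -> TeraValidate command lines (dry run only)."""
    # Hypothetical HDFS layout under base_dir; adjust to your cluster.
    gen_dir = f"{base_dir}/input"
    sort_dir = f"{base_dir}/output"
    report_dir = f"{base_dir}/validate"
    prefix = ["hadoop", "jar", examples_jar]
    return [
        prefix + ["teragen", str(rows), gen_dir],          # step 1: generate input
        prefix + ["terasort", gen_dir, sort_dir],          # step 2: sort it
        prefix + ["teravalidate", sort_dir, report_dir],   # step 3: check ordering
    ]


# 10,000,000 rows of 100 bytes each is roughly 1 GB of input.
for command in terasort_commands(10_000_000, "/benchmarks/terasort"):
    print(" ".join(command))
```

    Each printed line can be prefixed with time when run on a cluster, so that the three phases are measured separately.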
  • 17. HiBench
    ● Contains 9 typical Hadoop and Spark workloads (including micro benchmarks, HDFS benchmarks, web search benchmarks, machine learning benchmarks using Mahout, and data analytics benchmarks)
    ● Sort, WordCount, TeraSort, TestDFSIO, Nutch indexing (search indexing using the Nutch engine), PageRank (an implementation of Google's web page ranking algorithm), hivebench
    ● Uses zlib compression for input and output
    ● Metrics: time (sec) and throughput (bytes/sec), memory partitions, parallelism
    ● Cons: lack of AArch64 bits, lack of documentation
    https://wiki.linaro.org/LEG/Engineering/BigData/HiBench
  • 18. TestDFSIO
    ● It is part of hadoop-mapreduce-client-jobclient.jar
    ● Stress tests I/O performance (throughput and latency) on a clustered setup
    ● This test will shake out the hardware, OS and Hadoop setup on your cluster machines (NameNode/DataNode)
    ● The tests are run as a MapReduce job using 1:1 mapping (1 map per file)
    ● Helpful for discovering performance bottlenecks in your network
    ● Run the write test first, then follow up with the read test
    ● Use -write for write tests and -read for read tests
    ● The results are stored in TestDFSIO_results.log. Use -resFile to choose a different file name
  • 19. Hive Testbench
    ● Based on the TPC-H and TPC-DS benchmarks
    ● Lets you experiment with Apache Hive at any data scale
    ● Contains a data generator and a set of queries
    ● Tests basic Hive performance on large data sets
    https://wiki.linaro.org/LEG/Engineering/BigData/HiveTestBench
  • 20. MRBench (MapReduce benchmark)
    ● Loops a small job a number of times
    ● Checks whether small job runs are responsive and running efficiently on your cluster
    ● Puts the focus on the MapReduce layer, as its impact on the HDFS layer is very limited
    ● The issue with multiple parallel MRBench runs is resolved, so you can run it from different boxes
    ● Test command to run 50 small test jobs
      $ hadoop jar hadoop-*test*.jar mrbench -numRuns 50
    ● Example output, which means the job finished in about 31 seconds on average
      DataLines Maps Reduces AvgTime (milliseconds)
      1 2 1 31414
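    The result row can be read programmatically when runs are collected from several boxes. The sketch below assumes the output ends with a whitespace-separated row of four integers, as in the example on the slide; parse_mrbench is a hypothetical helper name.

```python
def parse_mrbench(output: str) -> dict:
    """Extract the numeric result row from MRBench output."""
    fields = ["data_lines", "maps", "reduces", "avg_time_ms"]
    for line in output.splitlines():
        parts = line.split()
        # The result row is the only line made of exactly four integers.
        if len(parts) == 4 and all(part.isdigit() for part in parts):
            return dict(zip(fields, (int(part) for part in parts)))
    raise ValueError("no MRBench result row found")


# Sample taken from the slide above.
sample = """DataLines Maps Reduces AvgTime (milliseconds)
1 2 1 31414
"""
result = parse_mrbench(sample)
print(result["avg_time_ms"] / 1000, "seconds on average")
```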
  • 21. NNBench and NNBenchWithoutMR
    ● Load tests the NameNode through continuous read, write, rename and delete operations on small files
    ● Stress tests HDFS (I/O)
    ● To increase stress, run multiple instances of NNBenchWithoutMR simultaneously from several machines, or increase the number of map tasks for NNBench
    ● All write tests are run first, followed by the read tests
    ● Test command: the command below runs a NameNode benchmark that creates 1000 files using 12 maps and 6 reducers.
      $ hadoop jar hadoop-*test*.jar nnbench -operation create_write -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 -replicationFactorPerFile 3 -readFileAfterOpen true -baseDir /benchmarks/NNBench-`hostname -s`
  • 22. TPC Benchmark
    ● TPCx-HS - https://wiki.linaro.org/LEG/Engineering/BigData/TPCxHS
      ○ Currently facing problems with cluster shell configuration
    ● TPC-H
      ○ Focuses on ad-hoc queries
    ● TPC-DS
      ○ "The" standard benchmark for decision support
    ● TPC-C
      ○ An on-line transaction processing (OLTP) benchmark
  • 23. TPCx-HS Benchmark
    X: Express, H: Hadoop, S: Sort
    The TPCx-HS kit contains
    ● TPCx-HS specification documentation
    ● TPCx-HS user's guide documentation
    ● Scripts to run the benchmarks
    ● Java code to execute the benchmark load
    TPCx-HS execution
    ● A valid run consists of 5 separate phases run sequentially, with no overlap in their execution
    ● The benchmark test consists of 2 runs (the runs with the lower and the higher TPCx-HS Performance Metric)
    ● No configuration or tuning changes or reboots are allowed between the two runs
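    The slide does not define the TPCx-HS Performance Metric. The sketch below assumes the definition as commonly quoted from the TPCx-HS specification, HSph@SF = SF / (T / 3600), where SF is the scale factor and T is the total elapsed time of a run in seconds, and it assumes the reported result is the run with the lower metric. Both points should be checked against the current specification; the function names and the timings are hypothetical.

```python
def hsph(scale_factor: float, elapsed_seconds: float) -> float:
    """Assumed TPCx-HS performance metric: HSph@SF = SF / (T / 3600)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return scale_factor / (elapsed_seconds / 3600.0)


def reported_metric(scale_factor: float, run_times: tuple) -> float:
    """Of the two runs, return the lower metric (assumed to be the reported one)."""
    first, second = run_times
    return min(hsph(scale_factor, first), hsph(scale_factor, second))


# Hypothetical timings for a scale factor of 1: 1800 s and 2400 s.
print(hsph(1, 1800))
print(reported_metric(1, (1800, 2400)))
```

    A shorter elapsed time gives a higher metric, so the slower of the two runs determines the value returned by reported_metric.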
  • 24. TPC vs SPEC models
    TPC model
    ● Specification based
    ● Performance, price and energy in one benchmark
    ● End-to-end
    ● Multiple tests (ACID, load)
    ● Independent review
    ● Full disclosure
    ● TPC Technology Conference
    SPEC model
    ● Kit based
    ● Performance and energy in separate benchmarks
    ● Server centric
    ● Single test
    ● Summary disclosure
    ● SPEC research group, ICPE
  • 25. BigBench
    ● BigBench is a joint effort with partners in industry and academia to create a comprehensive and standardized Big Data benchmark.
    ● BigBench builds upon and borrows elements from existing benchmarking efforts (such as TPCx-HS, GridMix, PigMix, HiBench, Big Data Benchmark, YCSB and TPC-DS).
    ● BigBench is a specification-based benchmark with an open-source reference implementation kit.
    ● As a specification-based benchmark, it is technology-agnostic and provides the necessary formalism and flexibility to support multiple implementations.
    ● Focused on execution time calculation
    ● Consists of 30 queries/workloads (10 of them are from TPC)
    ● Drawback: it is structured-data-intensive
  • 26. Spark Bench for Apache Spark
    ● The build on ARM works
    ● FAIL: when the Spark Bench examples are run, a KILL signal is observed which terminates all workers.
    ● This is still under investigation, as there are no useful logs to debug with. The absence of a proper error description and the lack of documentation are a challenge.
    ● A ticket has been filed on the Spark Bench git repository and is still unresolved.
    ● Con: lack of documentation.
  • 27. GridMix
    ● Mix of synthetic MapReduce jobs (sorting text data and SequenceFiles)
    ● Evaluates MapReduce and HDFS performance
    ● The input file needs to be in JSON format
    ● Jobs can be either LOADJOB (trace of history logs using Rumen) or SLEEPJOB (a synthetic job where each task does *nothing* but sleep for a certain duration)
    ● Jobs can be run in STRESS, REPLAY or SERIAL mode
    ● You can emulate the number of users, the number of job queues and resource usage (CPU, memory, JVM heap)
    ● Basic command line usage (provided as part of the hadoop command):
      $ hadoop gridmix [-generate <size>] [-users <users-list>] <iopath> <trace>
    ● Con: challenging to explore the performance impact of combining or separating workloads, e.g., through consolidating from many clusters.
  • 28. PigMix
    ● PigMix is a set of queries used to test Apache Pig performance
    ● Some queries test latency (how long does it take to run this query?)
    ● Other queries test scalability (how many fields or records can Pig handle before it fails?)
    ● Usage: run the commands below from the Pig home directory
      ant -Dharness.hadoop.home=$HADOOP_HOME pigmix-deploy (generate the test dataset)
      ant -Dharness.hadoop.home=$HADOOP_HOME pigmix (run the PigMix benchmark)
  • 29. SWIM (Statistical Workload Injector for MapReduce)
    ● Enables rigorous performance measurement of MapReduce systems
    ● Contains suites of workloads of thousands of jobs, with complex data, arrival, and computation patterns
    ● Informs highly targeted, workload-specific optimizations
    ● Highly recommended for MapReduce operators
    ● Performance measurement: https://github.com/SWIMProjectUCB/SWIM/wiki/Performance-measurement-by-executing-synthetic-or-historical-workloads
  • 30. AMPLab
    ● The Big Data Benchmark from AMPLab, UC Berkeley provides quantitative and qualitative comparisons of five systems
      ○ Redshift – a hosted MPP database offered by Amazon.com, based on the ParAccel data warehouse
      ○ Hive – a Hadoop-based data warehousing system
      ○ Shark – a Hive-compatible SQL engine which runs on top of the Spark computing framework
      ○ Impala – a Hive-compatible* SQL engine with its own MPP-like execution engine
      ○ Stinger/Tez – Tez is a next-generation Hadoop execution engine, used here by Hive (Stinger)
    ● This benchmark measures response time on a handful of relational queries (scans, aggregations, joins, and UDFs) across different data sizes.
  • 31. BigDataBench
    BigDataBench is a benchmark suite for scale-out workloads, different from SPEC CPU (sequential workloads) and PARSEC (multithreaded workloads). Currently, it simulates five typical and important big data applications: search engine, social network, e-commerce, multimedia data analytics, and bioinformatics. It includes 15 real-world data sets and 34 big data workloads.
  • 32. References
    https://www2.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-21.pdf
    TeraSort, TestDFSIO, NNBench, MRBench
    https://wiki.linaro.org/LEG/Engineering/BigData
    https://wiki.linaro.org/LEG/Engineering/BigData/HadoopTuningGuide
    https://wiki.linaro.org/LEG/Engineering/BigData/HadoopBuildInstallAndRunGuide
    http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
    GridMix3, PigMix, HiBench, TPCx-HS, SWIM, AMPLab, BigBench
    https://hadoop.apache.org/docs/current/hadoop-gridmix/GridMix.html
    https://cwiki.apache.org/confluence/display/PIG/PigMix
    https://wiki.linaro.org/LEG/Engineering/BigData/HiBench
    https://wiki.linaro.org/LEG/Engineering/BigData/TPCxHS
    https://github.com/SWIMProjectUCB/SWIM/wiki
    https://github.com/amplab
    https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench
  • 33. Thank you ganesh.raju@linaro.org naresh.bhat@linaro.org #LAS16 For further information: www.linaro.org LAS16 keynotes and videos on: connect.linaro.org