From Device to Data Center to Insights
Architectural Considerations for the Internet of Anything
P. Taylor Goetz, Hortonworks
@ptgoetz
About Me
• Tech Staff @ Hortonworks
• PMC Chair, Apache Storm
• ASF Member
• PMC, Apache Incubator, Apache Arrow, Apache Kylin, Apache Apex
• Mentor/PPMC, Apache Eagle (Incubating), Apache Mynewt (Incubating), Apache Metron (Incubating), Apache Gossip (Incubating)
26 billion IoT devices by 2020
-Gartner
http://www.gartner.com/newsroom/id/2636073
IPv4 Address Space: ~4.3 billion addresses (2^32)
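A quick back-of-the-envelope check of this comparison, using only the 32-bit size of an IPv4 address and the 26 billion device projection quoted on this slide:

```python
# Back-of-the-envelope comparison of projected IoT devices vs. IPv4 addresses.
IPV4_ADDRESSES = 2 ** 32            # IPv4 addresses are 32 bits wide
PROJECTED_DEVICES = 26_000_000_000  # Gartner projection for 2020, from the slide

devices_per_address = PROJECTED_DEVICES / IPV4_ADDRESSES

print(f"IPv4 addresses: {IPV4_ADDRESSES:,}")                 # IPv4 addresses: 4,294,967,296
print(f"Devices per IPv4 address: {devices_per_address:.1f}")  # Devices per IPv4 address: 6.1
```

That is roughly six projected devices for every possible IPv4 address, before even subtracting reserved and private ranges.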
IoT Growth
• Everyone here should know IoT is huge
• Sensors, Phones, Connected Cars, Wearables, Software-as-a-Sensor, ...
• Cuts across virtually all industries
IoT Architecture
Key Architectural Tiers
• Origin: Devices and Data Sources
• Transport: Orchestrating Bi-Directional Data Flow Between Sources and Destinations
• Analytics: Analysis of Unbounded (Streaming) and Bounded (Batch) Data, and Acting in Response
Origin Tier
Birthplace of IoT Data
• Where data is born, but also a destination
• Sensors and Devices
• Constrained Hubs/Gateways
Devices are getting smaller, cheaper, and increasingly network-enabled.
Examples:
• Raspberry Pi ($35, Full OS)
• ESP8266 (<$5 WiFi-enabled microcontroller)
Devices in the Origin Tier both transmit and receive data.
• Command and Control
• Actuators (interaction with the physical environment)
• End-user alerts and notifications
IoT Protocol Considerations
• Device-Device / Device-Gateway Communication
• Radio Frequency Protocols
• IP-based Protocols
Radio Frequency Protocols
• Typically for very resource-constrained devices (Ex: wireless sensors in a home security system)
• Usually involve an intermediary hub/gateway as a protocol bridge (Ex: the main panel in a home security system)
• Short range
• Low power
ZigBee
• Intended for low power applications (~2 yr. battery life)
• Low data rates
• Simpler and less expensive than other WPANs such as Bluetooth
• Range: 10–100 meters line-of-sight (LOS) between nodes, but messages can hop further in a mesh network
• Data Rate: 250 kbit/s
• Supports Star, Tree, and Mesh network topologies
• Requires a coordinator device for every network (usually the hub/gateway)
Z-Wave
• Targets home automation
• Low power/Low data rate
• Proprietary
• Sole chip vendor
• Range: ~30 meters LOS between nodes, but messages can hop
• Data Rate: 100 kbit/s
• Forms source-routed mesh networks (can route around failures/obstacles)
• Devices must be paired
• Requires a primary controller (e.g. the hub/gateway)
• Max 232 devices per network (but networks can be bridged)
Bluetooth/Bluetooth LE
• Targets wireless computer and device accessories
• Higher data rates than ZigBee or Z-Wave
• Does not form routed networks like ZigBee and Z-Wave
• Usually pairs one host with many devices
• Range: 0.5 m (Class 4) to 100 m (Class 1)
• Data Rate: 1 Mbit/s to 24 Mbit/s
Thread
• Newer wireless protocol backed by Nest (Google/Alphabet), Samsung, ARM, Qualcomm, and others
• Built on the same radio specification (IEEE 802.15.4) as ZigBee
• IPv6-based
• Mesh network with multi-hop routing
• ~250 devices per network
• Very low power (reportedly years of operation on a single AA battery, using deep sleep modes)
• Very new, with an uncertain future: WiFi, Bluetooth, etc. are already ubiquitous
IP-Based Protocols
• Require a full IP stack
• Higher power consumption
• Longer range (e.g. WiFi)
CoAP - Constrained Application Protocol
• Designed to be used on microcontrollers with as little as 10 KiB of RAM
• Simple request/response protocol
• Much like HTTP but based on UDP
• Based on the REST model (GET, PUT, POST, DELETE)
• Strong security via DTLS (Datagram Transport Layer Security)
• Simple 4-byte header
• Subset of MIME types and HTTP response codes
• Data model agnostic
• One-to-one
• Layering (bottom to top): Transport (UDP), then Base Messaging (simple Confirmable/Non-Confirmable message transfer), then REST Semantics
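The fixed 4-byte header is small enough to build by hand. The sketch below packs and unpacks it with Python's `struct` module, following the RFC 7252 layout (2-bit version, 2-bit message type, 4-bit token length, 8-bit code, 16-bit message ID). It covers only the fixed header; tokens, options, and payload are left out, and the helper names are illustrative, not part of any CoAP library.

```python
import struct

COAP_VERSION = 1  # RFC 7252 requires version 1
# Message types (2 bits): Confirmable, Non-confirmable, Acknowledgement, Reset
CON, NON, ACK, RST = 0, 1, 2, 3


def make_code(code_class: int, detail: int) -> int:
    """Pack a c.dd code (3-bit class, 5-bit detail) into one byte."""
    return (code_class << 5) | detail


GET = make_code(0, 1)      # request method 0.01
CONTENT = make_code(2, 5)  # response 2.05, roughly HTTP 200


def encode_header(msg_type: int, token_length: int, code: int, message_id: int) -> bytes:
    """Build the fixed 4-byte CoAP header in network byte order."""
    if not 0 <= token_length <= 8:
        raise ValueError("token length must be 0-8 bytes")
    first_byte = (COAP_VERSION << 6) | (msg_type << 4) | token_length
    return struct.pack("!BBH", first_byte, code, message_id)


def decode_header(data: bytes) -> dict:
    """Split the first 4 bytes of a CoAP message back into its fields."""
    first_byte, code, message_id = struct.unpack("!BBH", data[:4])
    return {
        "version": first_byte >> 6,
        "type": (first_byte >> 4) & 0b11,
        "token_length": first_byte & 0x0F,
        "code": f"{code >> 5}.{code & 0x1F:02d}",
        "message_id": message_id,
    }


header = encode_header(CON, 0, GET, 0x1234)
print(header.hex())  # 40011234
print(decode_header(header))
```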
MQTT - MQ Telemetry Transport
• Pub/Sub messaging protocol
• Requires a broker (though brokers can be lightweight)
• Many-to-many (publish/subscribe) messaging
• Message == Topic + Payload
• Topics: users/ptgoetz/office/thermostat
• Topic wildcards:
• Single level (+): users/ptgoetz/+/thermostat
• Multi-level (#): users/ptgoetz/office/#
• Payload: Just a bunch of bytes (you define the schema)
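Subscription matching on the broker is what makes the wildcards useful. A minimal sketch of the matching rules, assuming plain string topics: `+` matches exactly one level, and `#` (valid only as the last level) matches its parent level and everything below it. It deliberately skips details such as the special handling of `$`-prefixed topics and full filter validation.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic matches a subscription filter."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last level; it matches
            # the parent level itself and any number of levels below it.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False  # filter has more levels than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal mismatch ("+" accepts any single level)

    # Without a trailing "#", the topic must not have extra levels.
    return len(filter_levels) == len(topic_levels)


print(topic_matches("users/ptgoetz/+/thermostat", "users/ptgoetz/office/thermostat"))   # True
print(topic_matches("users/ptgoetz/+/thermostat", "users/ptgoetz/office/lamp"))         # False
print(topic_matches("users/ptgoetz/office/#", "users/ptgoetz/office/thermostat/temp"))  # True
```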
• Delivery guarantees (QoS):
• 0: At-most-once
• 1: At-least-once
• 2: Exactly-once
• Last Will and Testament: a message the broker publishes on a client's behalf if that client disconnects unexpectedly (e.g. a device goes offline)
• Security via SSL/TLS
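At-least-once (QoS 1) delivery means a subscriber can receive the same message twice, for example when an acknowledgement is lost and the sender resends. One common way to cope is an idempotent consumer that skips messages it has already processed. The sketch assumes each payload carries an application-level message ID (MQTT packet identifiers are reused once an exchange completes, so they are a poor long-term key); the redelivery is simulated and no MQTT client library is involved.

```python
from dataclasses import dataclass, field


@dataclass
class DedupingConsumer:
    """Idempotent handler for at-least-once delivery (hypothetical example)."""
    seen_ids: set = field(default_factory=set)
    processed: list = field(default_factory=list)
    duplicates: int = 0

    def on_message(self, app_msg_id: str, payload: bytes) -> None:
        if app_msg_id in self.seen_ids:
            self.duplicates += 1  # redelivered copy: already handled, so ignore it
            return
        self.seen_ids.add(app_msg_id)
        self.processed.append(payload)


def simulate_qos1(consumer: DedupingConsumer, messages, lost_acks: set) -> None:
    """Deliver each message once; resend those whose ack was (pretend) lost."""
    for app_msg_id, payload in messages:
        consumer.on_message(app_msg_id, payload)
        if app_msg_id in lost_acks:
            consumer.on_message(app_msg_id, payload)  # sender times out and resends


consumer = DedupingConsumer()
simulate_qos1(consumer, [("m1", b"21.5"), ("m2", b"21.7")], lost_acks={"m1"})
print(consumer.processed, consumer.duplicates)  # [b'21.5', b'21.7'] 1
```

In a long-running system the set of seen IDs would need a bound, such as expiring IDs after a time window.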
Apache Mynewt (incubating)
• Real-time, modular OS for IoT devices
• Designed for use in devices with power, memory, and storage constraints
• Support for many ARM Cortex-M based boards (including some Arduino boards)
• Hardware abstraction layer (HAL) for unified access to MCU features
• Connectivity with Bluetooth LE
• WiFi, CoAP, and Thread support (roadmap)
• Remote Firmware Upgrades
• Command-line tools for package management
Transport Tier
Data Flow From Device to Data Center
• Connecting Edge Devices:
• To and from the Analytics Tier (data center)
• To and from one another (inter-device communication)
• Bridging Protocols:
• e.g. WPAN to IP
• Collecting/Transforming/Enriching Data in Motion
Apache NiFi
• Data flow orchestration tool
• Guaranteed Delivery
• Data provenance (important in the Analytics Tier)
• Back pressure and pressure release
• Flow-specific QoS
• Web-based UI for editing data flows
• Data flows modifiable at runtime
• Supports bi-directional data flows
• Integrates with just about any system
Basic Concepts
• FlowFile: Unit of user data with associated key-value metadata (attributes)
• Processor: Component for creating, sending, receiving, transforming, routing, etc. FlowFiles
• Connection: Acts as the link (a queue) between processors
• Flow Controller: Brokers the exchange of data between processors
• Process Group: Set of Processors and Connections with Input/Output ports. New components can be created by composition.
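To make the vocabulary concrete, here is a toy, single-threaded model of how FlowFiles, Processors, and Connections relate, including back pressure. It is not NiFi's (Java) API; every class and function name here is hypothetical. Real NiFi connections can apply back pressure by object count or by data size; this sketch only uses a count.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FlowFile:
    """A unit of data plus key-value attributes (metadata)."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Queue linking two processors; refuses new work once it hits its threshold."""

    def __init__(self, back_pressure_threshold: int):
        self.queue: deque = deque()
        self.threshold = back_pressure_threshold

    def is_full(self) -> bool:
        return len(self.queue) >= self.threshold

    def offer(self, flow_file: FlowFile) -> bool:
        if self.is_full():
            return False
        self.queue.append(flow_file)
        return True

    def poll(self) -> Optional[FlowFile]:
        return self.queue.popleft() if self.queue else None


Processor = Callable[[FlowFile], FlowFile]


def run_once(upstream: Connection, processor: Processor, downstream: Connection) -> bool:
    """One scheduling tick of a toy 'flow controller'."""
    if downstream.is_full():
        return False  # back pressure: leave work queued upstream
    flow_file = upstream.poll()
    if flow_file is None:
        return False  # nothing to do
    downstream.offer(processor(flow_file))
    return True


def tag_unit(flow_file: FlowFile) -> FlowFile:
    """Example processor: enrich attributes without touching content."""
    return FlowFile(flow_file.content, {**flow_file.attributes, "unit": "celsius"})


raw = Connection(back_pressure_threshold=10)
enriched = Connection(back_pressure_threshold=2)
for reading in (b"21.5", b"21.7", b"22.0", b"22.4"):
    raw.offer(FlowFile(reading, {"device.type": "thermostat"}))

moved = 0
while run_once(raw, tag_unit, enriched):
    moved += 1
print(moved, len(raw.queue), len(enriched.queue))  # 2 2 2
```

Only two FlowFiles move before the downstream connection fills; the rest wait upstream instead of being dropped, which is the point of back pressure.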
Apache NiFi MiNiFi
• Supplement to NiFi for constrained devices/environments
• More suitable for edge devices
• Small footprint
• Designed to collect data near where it originates and integrate with NiFi
Apache NiFi
For more information:
• https://nifi.apache.org
Some of the best technical documentation I’ve ever seen:
• https://nifi.apache.org/docs.html
Analytics Tier
Acting on Insights
• Where IoT data often (but not always) intersects with Big Data platforms and Cloud Computing
• Vertical scaling may suffice
• Many, many options…
• [insert your definition of Hadoop here]
Key Platform Considerations:
• Unbounded (Stream) data processing frequently necessary
• Apache Storm, Apache Flink, etc.
• Bounded (Batch) data processing frequently necessary
• e.g. Training machine learning models, etc.
• Apache Hadoop MapReduce, Apache Flink, Apache Spark
• Time-series databases are a common requirement
• Apache HBase, Apache Cassandra, etc.
• Latency matters for many use cases
• Latency can add up quickly, depending on the number of “hops”
• Windowing semantics and flexibility
When?
The importance of event time(s).
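What is Event Time and why is it so important?
• Event Times: Origin Time vs. Processing Time
• Ex: Airplane Mode (a device buffers readings while offline and sends them later, so origin time and processing time can differ widely)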
• Other types of Event Time:
• Enrichment Time
• Ingest Time
• Processing Time 1, 2, n…
• Exit Time (e.g. “return” events, command and control (C2), bi-directional communication)
Choose a platform/API that gives you the most flexibility in dealing with various event times.
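The airplane-mode case is easy to see with a simple tumbling (fixed) window count. In this sketch each event carries both an origin time and a processing time (hypothetical field names, in seconds); bucketing by one or the other puts the delayed reading in a completely different window.

```python
from collections import Counter
from dataclasses import dataclass

WINDOW_SECONDS = 60  # tumbling (fixed) window size


@dataclass(frozen=True)
class Event:
    device_id: str
    origin_time: int      # when the device produced the reading (seconds)
    processing_time: int  # when the analytics tier processed it (seconds)


def window_start(ts: int, size: int = WINDOW_SECONDS) -> int:
    """Start of the fixed window containing timestamp ts."""
    return ts - (ts % size)


def counts_per_window(events, time_of) -> dict:
    """Count events per window, using whichever notion of time `time_of` selects."""
    return dict(Counter(window_start(time_of(e)) for e in events))


events = [
    Event("phone-1", origin_time=10, processing_time=12),
    Event("phone-1", origin_time=20, processing_time=22),
    # Recorded in airplane mode at t=30, uploaded about an hour later
    Event("phone-1", origin_time=30, processing_time=3605),
    Event("phone-2", origin_time=70, processing_time=71),
]

by_origin = counts_per_window(events, lambda e: e.origin_time)
by_processing = counts_per_window(events, lambda e: e.processing_time)
print(by_origin)      # {0: 3, 60: 1}
print(by_processing)  # {0: 2, 3600: 1, 60: 1}
```

By origin time, all three of phone-1's readings fall in the first minute; by processing time, one of them lands in a spurious window an hour later.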
Future-Proofing and Scaling
Small to Medium Scale:
• Not Big Data
• Investment in large-scale distributed system infrastructure wouldn’t make sense.
• YAGNI (Yet…)
• Vertical scaling may suffice
Medium to Large Scale:
• A single server is no longer cutting it
• The “V”s (volume, velocity, variety) are starting to pile up
• Need to move to a distributed architecture to scale with increasing demand
• Your data is now Big
Apache Beam (incubating)
• Unified API for dealing with bounded/unbounded data sources (i.e. batch/streaming)
• One API, multiple implementations (execution engines), called “Runners” in Beamspeak
• Major focus on Windowing and properly dealing with Event Time(s)
• Sliding Windows, Tumbling Windows, Session Windows, etc.
• Watermark capabilities for dealing with late data
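The window types differ mainly in how a timestamp is assigned to windows. This plain-Python sketch shows the assignment rules only; it is not Beam's API, the function names are illustrative, times are in seconds, and windows are half-open `[start, end)`. Beam itself also handles per-key grouping, triggers, and watermarks, which are omitted here.

```python
def tumbling_window(ts: int, size: int) -> tuple[int, int]:
    """Fixed, non-overlapping windows: each timestamp lands in exactly one."""
    start = ts - (ts % size)
    return (start, start + size)


def sliding_windows(ts: int, size: int, period: int) -> list[tuple[int, int]]:
    """Overlapping windows of length `size` that start every `period`."""
    last_start = ts - (ts % period)
    # Walk back through window starts while the window still covers ts.
    starts = range(last_start, ts - size, -period)
    return sorted((s, s + size) for s in starts)


def session_windows(timestamps: list[int], gap: int) -> list[tuple[int, int]]:
    """Merge activity into sessions that close after `gap` seconds of silence."""
    sessions: list[tuple[int, int]] = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            start, end = sessions[-1]
            sessions[-1] = (start, max(end, ts + gap))  # extend the open session
        else:
            sessions.append((ts, ts + gap))  # start a new session
    return sessions


print(tumbling_window(65, size=60))                    # (60, 120)
print(sliding_windows(65, size=60, period=30))         # [(30, 90), (60, 120)]
print(session_windows([0, 10, 25, 100, 105], gap=30))  # [(0, 55), (100, 135)]
```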
• Runner/Execution Engine Availability
• Local runner (single machine)
• Runners for Google Cloud Dataflow, Flink, and Spark
• Others underway: Apache Storm, Apache Apex, and more
• Choose the right runner for your current scaling and organizational needs (you can switch later as necessary)
• Understand the limits of different runner implementations
• Outside of Google Cloud Dataflow, the Flink runner is currently the most feature-complete (this will change)
For a technical deep dive into Apache Beam:
Apache Beam: A Unified Model for Batch and Streaming Data Processing
- Davor Bonaci, Google Inc.
Thursday 4:10PM, Ballroom A
Firmware, Parsers, and Schemas (Oh my!)
Problem: Data Formats
• Many IoT devices transmit data as a raw array of bytes
• The format of that data may be proprietary
• To be of any use, it must be parsed into a structured, machine-readable format (i.e. one that follows a schema)
• Once parsed, downstream consumers need to know the schema
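As an illustration, suppose a device sends an 8-byte big-endian record: a 32-bit Unix timestamp, a signed 16-bit temperature in hundredths of a degree Celsius, and two unsigned bytes for humidity and battery percentage. The layout and field names are invented for this example; the point is that the bytes mean nothing until a parser maps them to named, typed fields.

```python
import struct

# Hypothetical firmware layout: "!" = network (big-endian) byte order, no padding;
# I = uint32 timestamp, h = int16 temperature x100, B = uint8 humidity, B = uint8 battery
RECORD_FORMAT = "!IhBB"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 8 bytes


def parse_reading(raw: bytes) -> dict:
    """Turn a raw 8-byte payload into a structured record."""
    if len(raw) != RECORD_SIZE:
        raise ValueError(f"expected {RECORD_SIZE} bytes, got {len(raw)}")
    timestamp, temp_centi, humidity, battery = struct.unpack(RECORD_FORMAT, raw)
    return {
        "timestamp": timestamp,
        "temperature_c": temp_centi / 100,
        "humidity_pct": humidity,
        "battery_pct": battery,
    }


raw = struct.pack(RECORD_FORMAT, 1464800000, 2150, 40, 97)  # what the device might send
print(parse_reading(raw))
```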
Problem: Firmware Versions
• Deployed IoT devices may be running any number of firmware versions
• Data formats may differ between firmware versions
• Multiple parsers may be necessary to accommodate different device types and firmware versions
Solution: Parser Registry
• Allow manufacturers to supply proprietary parsers, loaded at runtime
• Parser API should include a way to discover the schema
• Tag data with device type + firmware version at the hub/gateway
• Look up the associated parser when data arrives
• (This can be done in either the Transport or Analytics tier)
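A minimal in-memory sketch of the parser-registry idea, assuming the hub/gateway has already tagged each message with `device_type` and `firmware` metadata. The two thermostat firmware formats, the class names, and the message shape are all hypothetical; a real registry would also need to load manufacturer parsers at runtime (for example, as plugins), which is out of scope here.

```python
import struct
from dataclasses import dataclass
from typing import Callable


@dataclass
class Parser:
    """A manufacturer-supplied parser that also exposes its output schema."""
    schema: dict                    # field name -> type name, discoverable by callers
    parse: Callable[[bytes], dict]


class ParserRegistry:
    """Maps (device type, firmware version) to the parser for that payload format."""

    def __init__(self) -> None:
        self._parsers: dict = {}

    def register(self, device_type: str, firmware: str, parser: Parser) -> None:
        self._parsers[(device_type, firmware)] = parser

    def lookup(self, device_type: str, firmware: str) -> Parser:
        try:
            return self._parsers[(device_type, firmware)]
        except KeyError:
            raise LookupError(f"no parser for {device_type} firmware {firmware}") from None


def parse_v1(raw: bytes) -> dict:
    (temp,) = struct.unpack("!h", raw)  # firmware 1.0: temperature only
    return {"temperature_c": temp / 100}


def parse_v2(raw: bytes) -> dict:
    temp, humidity = struct.unpack("!hB", raw)  # firmware 2.0 adds humidity
    return {"temperature_c": temp / 100, "humidity_pct": humidity}


registry = ParserRegistry()
registry.register("thermostat", "1.0", Parser({"temperature_c": "float"}, parse_v1))
registry.register("thermostat", "2.0",
                  Parser({"temperature_c": "float", "humidity_pct": "int"}, parse_v2))


def handle(message: dict) -> dict:
    """Parse a gateway-tagged message with the matching registered parser."""
    parser = registry.lookup(message["device_type"], message["firmware"])
    return parser.parse(message["payload"])


print(handle({"device_type": "thermostat", "firmware": "1.0", "payload": struct.pack("!h", 2150)}))
print(handle({"device_type": "thermostat", "firmware": "2.0", "payload": struct.pack("!hB", 2150, 40)}))
```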
Solution: Schema Registry
• When parsers are registered, also register the associated schema
• Downstream components (Transport/Analytics Tier) discover the schema based on metadata
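On the consuming side, a downstream component needs only the metadata to find the schema. This sketch keeps schemas in an in-memory dictionary keyed by a subject name built from device type and firmware version (an assumed naming convention) and uses them to validate parsed records before further processing. The type vocabulary is deliberately tiny and all names are hypothetical.

```python
from typing import Dict

# Type names used in registered schemas -> Python types used for validation
TYPE_CHECKS = {"float": float, "int": int, "str": str}


class SchemaRegistry:
    """In-memory schema store; subjects look like 'thermostat/2.0' (assumed convention)."""

    def __init__(self) -> None:
        self._schemas: Dict[str, Dict[str, str]] = {}

    def register(self, subject: str, schema: Dict[str, str]) -> None:
        self._schemas[subject] = dict(schema)

    def get(self, subject: str) -> Dict[str, str]:
        return self._schemas[subject]


def subject_for(metadata: dict) -> str:
    """Derive the registry subject from gateway-supplied metadata."""
    return f"{metadata['device_type']}/{metadata['firmware']}"


def conforms(record: dict, schema: Dict[str, str]) -> bool:
    """True if the record has exactly the schema's fields, each of the right type."""
    if set(record) != set(schema):
        return False
    return all(isinstance(record[name], TYPE_CHECKS[type_name])
               for name, type_name in schema.items())


schemas = SchemaRegistry()
schemas.register("thermostat/2.0", {"temperature_c": "float", "humidity_pct": "int"})

metadata = {"device_type": "thermostat", "firmware": "2.0"}
schema = schemas.get(subject_for(metadata))
print(conforms({"temperature_c": 21.5, "humidity_pct": 40}, schema))  # True
print(conforms({"temperature_c": 21.5}, schema))                      # False
```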
Who owns your IoT data?
Hint: It may not be you.
Who owns your data?
• Beware of 3rd-party device manufacturers
• Data is valuable, and everyone wants it
• Manufacturers frequently have exclusive access
• Device manufacturers may hoard data.
• Retention policies may limit how long you can store the data.
• Aggregate/derivative data may be okay, but what’s the definition?
Thank you!
Questions?
P. Taylor Goetz, Hortonworks
@ptgoetz

How to submit a standout Adobe Champion Application
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Odoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting ServiceOdoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting Service
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineering
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 

From Device to Data Center to Insights: Architectural Considerations for the Internet of Anything

  • 1. From Device to Data Center to Insights Architectural Considerations for the Internet of Anything P. Taylor Goetz, Hortonworks @ptgoetz
  • 2. About Me • Tech Staff @ Hortonworks • PMC Chair, Apache Storm • ASF Member • PMC, Apache Incubator, Apache Arrow, Apache Kylin, Apache Apex • Mentor/PPMC, Apache Eagle (Incubating), Apache Mynewt (Incubating), Apache Metron (Incubating), Apache Gossip (Incubating)
  • 3. 26 billion IoT devices by 2020 -Gartner http://www.gartner.com/newsroom/id/2636073
  • 4. IPv4 Address Space: ~4.3 billion (2^32 addresses)
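Here is a quick check on the arithmetic. IPv4 addresses are 32-bit numbers, so the full address space is 2^32, before reserved and private ranges are subtracted. This minimal Python sketch compares that figure with Gartner's 26 billion forecast:

```python
# IPv4 addresses are 32-bit integers, so there are 2**32 possible values.
# This ignores reserved/private ranges, which shrink the usable pool further.
IPV4_ADDRESS_SPACE = 2 ** 32

# Gartner's forecast of connected IoT devices by 2020.
FORECAST_DEVICES = 26_000_000_000

ratio = FORECAST_DEVICES / IPV4_ADDRESS_SPACE

print(f"IPv4 address space: {IPV4_ADDRESS_SPACE:,}")           # IPv4 address space: 4,294,967,296
print(f"Forecast devices per IPv4 address: {ratio:.2f}")       # Forecast devices per IPv4 address: 6.05
```

That works out to roughly six forecast devices for every possible IPv4 address, even before reserved ranges are excluded.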
  • 5. IoT Growth • Everyone here should know IoT is huge • Sensors, Phones, Connected Cars, Wearables, Software-as-a-Sensor, ... • Cuts across virtually all industries
  • 7. Key Architectural Tiers • Origin: Devices and Data Sources • Transport: Orchestrating Bi-Directional Data Flow Between Sources • Analytics: Analysis of Unbounded (Streaming) and Bounded (Batch) Data, and Acting in Response
  • 9. Origin Tier • Where data is born, but also a destination • Sensors and Devices • Constrained Hubs/Gateways
  • 10. Origin Tier Devices are getting smaller, cheaper, and increasingly network-enabled. Examples: • Raspberry Pi ($35, Full OS) • ESP8266 (<$5 WiFi-enabled microcontroller)
  • 11. Origin Tier Devices in the Origin Tier both transmit and receive data. • Command and Control • Actuators (interaction with the physical environment) • End user alerts and notifications
  • 13. IoT Protocol Considerations • Device-Device / Device-Gateway Communication • Radio Frequency Protocols • IP-based Protocols
  • 14. IoT Protocol Considerations Radio Frequency Protocols • Typically for very resource-constrained devices (Ex: Wireless sensors in a home security system) • Usually involve an intermediary hub/gateway as a protocol bridge (Ex: Main panel in a home security system) • Short range • Low Power
  • 15. Radio Frequency Protocols ZigBee • Intended for low power applications (~2 yr. battery life) • Low data rates • Simpler and less expensive than WPANs like Bluetooth
  • 16. Radio Frequency Protocols ZigBee • Range: 10–100 meters LOS (between nodes, but messages can hop in a mesh network) • Data Rate: 250 kbit/s • Supports Star, Tree, and Mesh network topologies • Requires a coordinator device for every network (usually the hub/gateway)
  • 17. Radio Frequency Protocols Z-Wave • Targets home automation • Low power/Low data rate • Proprietary • Sole chip vendor
  • 18. Radio Frequency Protocols Z-Wave • Range: ~30 meters LOS (between nodes, but messages can hop) • Data Rate: 100 kbit/s • Forms source-routed mesh networks (can route around failures/obstacles) • Devices must be paired • Requires a primary controller (e.g. the hub/gateway) • Max 232 devices per network (but networks can be bridged)
  • 19. Radio Frequency Protocols Bluetooth/Bluetooth LE • Targets wireless computer and device accessories • High data rates • Does not form routed networks like ZigBee and Z-Wave • Usually one host paired with many devices • Range: 0.5 m (Class 4) to 100 m (Class 1) • Data Rate: 1 Mbit/s to 24 Mbit/s
  • 20. Radio Frequency Protocols Thread • New wireless protocol introduced by Nest (Google/Alphabet), Samsung, ARM, Qualcomm • Built on top of the same specification (IEEE 802.15.4) as ZigBee • IPv6-based • Mesh network with hops supported • ~250 devices per network • Very low power (purported years of operation on a single AA battery using deep sleep modes) • Very new, uncertain future: WiFi, Bluetooth, etc. are already ubiquitous
  • 21. IoT Protocol Considerations IP-Based Protocols • Require a full IP stack • Higher power consumption • Longer range (e.g. WiFi)
  • 22. IP-Based Protocols CoAP - Constrained Application Protocol • Designed to be used on microcontrollers with as little as 10 KiB of RAM • Simple request/response protocol • Much like HTTP, but based on UDP • Based on the REST model (GET, PUT, POST, DELETE) • Strong security via DTLS (Datagram Transport Layer Security)
  • 23. IP-Based Protocols CoAP - Constrained Application Protocol • Simple 4-byte header • Subset of MIME types and HTTP response codes • Data model agnostic • Primarily one-to-one • Layering: Transport (UDP) <- Base Messaging (simple Confirmable/Non-Confirmable message transfer) <- REST Semantics
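To make the "simple 4-byte header" concrete, here is a minimal Python sketch that decodes only the fixed header defined in RFC 7252. That header carries the version, message type, token length, code, and message ID. The sketch deliberately ignores the token, options, and payload that may follow, and the example datagram bytes are made up for illustration.

```python
import struct
from dataclasses import dataclass

# Message types encoded in the 2-bit "T" field of the fixed header.
MESSAGE_TYPES = {0: "CON", 1: "NON", 2: "ACK", 3: "RST"}


@dataclass
class CoapHeader:
    version: int       # 2-bit protocol version (1 for RFC 7252)
    msg_type: str      # CON, NON, ACK or RST
    token_length: int  # 4-bit length of the token that follows the header
    code: str          # "class.detail", e.g. "0.01" (GET) or "2.05" (Content)
    message_id: int    # 16-bit ID used for duplicate detection and ACK matching


def parse_coap_header(datagram: bytes) -> CoapHeader:
    """Decode the fixed 4-byte CoAP header (network byte order)."""
    if len(datagram) < 4:
        raise ValueError("CoAP messages start with a fixed 4-byte header")
    first, code, message_id = struct.unpack("!BBH", datagram[:4])
    return CoapHeader(
        version=first >> 6,
        msg_type=MESSAGE_TYPES[(first >> 4) & 0x03],
        token_length=first & 0x0F,
        code=f"{code >> 5}.{code & 0x1F:02d}",  # 3-bit class, 5-bit detail
        message_id=message_id,
    )


# A Confirmable GET with no token and message ID 0x1234.
print(parse_coap_header(bytes([0x40, 0x01, 0x12, 0x34])))
# CoapHeader(version=1, msg_type='CON', token_length=0, code='0.01', message_id=4660)
```

The code byte packs a 3-bit class and a 5-bit detail, which is how CoAP mirrors HTTP response codes. For example, a `2.05 Content` response arrives as the byte `0x45`.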
  • 24. IP-Based Protocols MQTT - Message Queue Telemetry Transport • Pub/Sub messaging protocol • Requires a broker (though brokers can be lightweight) • many-to-many broadcast
  • 25. IP-Based Protocols MQTT - Message Queue Telemetry Transport • Message == Topic + Payload • Topics: users/ptgoetz/office/thermostat • Topic wildcards: • Single level (+): users/ptgoetz/+/thermostat • Multi-level (#): users/ptgoetz/office/# • Payload: Just a bunch of bytes (you define the schema)
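The wildcard rules fit in a few lines. The Python sketch below is a simplified version of MQTT topic-filter matching. It assumes well-formed filters: it does not validate filter syntax, and it does not apply the special handling MQTT gives to topics that begin with `$`.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic matches a subscription filter.

    '+' matches exactly one level; '#' matches all remaining levels
    (including none, so 'a/#' also matches 'a').
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: everything from here down matches.
            return True
        if i >= len(topic_levels):
            # The filter has more levels than the topic.
            return False
        if level != "+" and level != topic_levels[i]:
            # Literal level mismatch ('+' accepts any single level).
            return False
    # All filter levels consumed: the level counts must agree exactly.
    return len(filter_levels) == len(topic_levels)


print(topic_matches("users/ptgoetz/+/thermostat", "users/ptgoetz/office/thermostat"))   # True
print(topic_matches("users/ptgoetz/+/thermostat", "users/ptgoetz/office/lamp"))         # False
print(topic_matches("users/ptgoetz/office/#", "users/ptgoetz/office/thermostat/temp"))  # True
```

The `#` wildcard also matches the parent level itself, so `users/ptgoetz/office/#` matches `users/ptgoetz/office`.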
  • 26. IP-Based Protocols MQTT - Message Queue Telemetry Transport • Delivery guarantees (QoS): • 0: At-most-once • 1: At-least-once • 2: Exactly-once • Last will and testament (when a device goes offline) • Security via SSL/TLS
  • 27. Apache Mynewt (incubating) • Real-time, modular OS for IoT devices • Designed for use in devices with power, memory and storage constraints • Support for many ARM Cortex-M based boards (including Arduino) • HAL for unified access to MCU features • Connectivity with Bluetooth LE • WiFi, CoAP, and Thread support (roadmap) • Remote Firmware Upgrades • Command-line tools for package management
  • 28. Transport Tier Data Flow From Device to Data Center
  • 29. Transport Tier • Connecting Edge Devices: • To and from the Analytics Tier (data center) • To and from one another (inter-device communication) • Bridging Protocols: • e.g. WPAN to IP • Collecting/Transforming/Enriching Data in Motion
  • 31. Apache NiFi • Data flow orchestration tool • Guaranteed Delivery • Data provenance (important in the Analytics Tier) • Backpressure with release • Flow-specific QoS • Web-based UI for editing data flows • Data flows modifiable at runtime • Supports bi-directional data flows • Integrates with just about any system
  • 32. Apache NiFi Basic Concepts • FlowFile: Unit of user data with associated key-value metadata (attributes) • Processor: Component that creates, sends, receives, transforms, routes (etc.) FlowFiles • Connection: Acts as the link between processors • Flow Controller: Brokers the exchange of data between processors • Process Group: Set of Processors and Connections with Input/Output ports. New components can be created by composition.
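A toy model can show how these concepts relate. The Python sketch below is not NiFi's API, since NiFi itself is written in Java and provides its own processor framework. The processor names, the `device.type` attribute, and the linear routing are all hypothetical. They are chosen only to show a FlowFile's content and attributes moving through processors, with a queue standing in for a Connection.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Tuple


@dataclass
class FlowFile:
    """Unit of user data: opaque content bytes plus key-value attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


# Toy processor: takes one FlowFile, returns (relationship name, FlowFile).
Processor = Callable[[FlowFile], Tuple[str, FlowFile]]


def tag_device_type(ff: FlowFile) -> Tuple[str, FlowFile]:
    """Enrich metadata only; the content bytes are left untouched."""
    ff.attributes["device.type"] = "thermostat"  # hypothetical attribute
    return "success", ff


def reject_empty(ff: FlowFile) -> Tuple[str, FlowFile]:
    """Route empty payloads to a 'failure' relationship."""
    return ("success" if ff.content else "failure"), ff


def run_flow(inputs: List[FlowFile], processors: List[Processor]) -> Dict[str, List[FlowFile]]:
    """Plays the Flow Controller role: drains a queue through a linear chain of processors."""
    connection: Deque[FlowFile] = deque(inputs)  # stands in for a Connection queue
    routed: Dict[str, List[FlowFile]] = {"success": [], "failure": []}
    while connection:
        ff = connection.popleft()
        relationship = "success"
        for processor in processors:
            relationship, ff = processor(ff)
            if relationship != "success":
                break  # stop at the first non-success relationship
        routed[relationship].append(ff)
    return routed


result = run_flow([FlowFile(b"21.5"), FlowFile(b"")], [tag_device_type, reject_empty])
print(len(result["success"]), len(result["failure"]))  # 1 1
```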
  • 33. Apache NiFi MiNiFi • Supplement to NiFi for constrained devices/environments • More suitable for edge devices • Small footprint • Designed to collect data near where it originates and integrate with NiFi
  • 34. Apache NiFi For more information: • https://nifi.apache.org Some of the best technical documentation I’ve ever seen: • https://nifi.apache.org/docs.html
  • 36. Analytics Tier • Where IoT data often (but not always) intersects with Big Data platforms and Cloud Computing • Vertical scaling may suffice
  • 37. Analytics Tier • Many, many options… • [insert your definition of Hadoop here]
  • 38. Analytics Tier Key Platform Considerations: • Unbounded (Stream) data processing frequently necessary • Apache Storm, Apache Flink, etc. • Bounded (Batch) data processing frequently necessary • e.g. Training machine learning models, etc. • Apache Hadoop M/R, Apache Flink, Apache Spark • Time Series DB a common requirement • Apache HBase, Apache Cassandra, etc.
  • 39. Analytics Tier Key Platform Considerations: • Latency matters for many use cases • Latency can add up quickly, depending on the number of “hops” • Windowing semantics and flexibility
  • 40. When? The importance of event time(s).
  • 41. What is Event Time and why is it so important? • Event Times: Origin Time vs. Processing Time • Ex: Airplane Mode • Other types of Event Time: • Enrichment Time • Ingest Time • Processing Time 1, 2, n… • Exit Time (e.g. “return” events, C2, bi-directional communication)
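The airplane-mode example can be made concrete. In this Python sketch, the readings and the one-minute tumbling window are hypothetical. A device buffers readings while offline and then uploads them all at once. Counting by origin time and counting by processing time produce very different windows.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

WINDOW_SECONDS = 60  # hypothetical one-minute tumbling windows


@dataclass
class Reading:
    value: float
    origin_time: int      # seconds; stamped by the device when measured
    processing_time: int  # seconds; stamped when the pipeline saw it


def window_start(ts: int) -> int:
    """Tumbling windows: each timestamp falls into exactly one aligned window."""
    return ts - ts % WINDOW_SECONDS


def count_by_window(readings: List[Reading], use_origin_time: bool) -> Dict[int, int]:
    counts: Dict[int, int] = defaultdict(int)
    for r in readings:
        ts = r.origin_time if use_origin_time else r.processing_time
        counts[window_start(ts)] += 1
    return dict(counts)


# Airplane mode: three readings taken at t=0..120s are all uploaded around t=600s.
readings = [Reading(20.0, 0, 600), Reading(20.5, 60, 600), Reading(21.0, 120, 601)]

print(count_by_window(readings, use_origin_time=True))   # {0: 1, 60: 1, 120: 1}
print(count_by_window(readings, use_origin_time=False))  # {600: 3}
```

Aggregating by processing time would report a burst of three readings in one minute and no data before it. Aggregating by origin time recovers what actually happened.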
  • 42. Choose a platform/API that gives you the most flexibility with respect to dealing with various event times.
  • 43. Future-Proofing and Scaling Small to Medium Scale: • Not Big Data • Investment in large-scale distributed system infrastructure wouldn’t make sense. • YAGNI (Yet…) • Vertical scaling may suffice
  • 44. Future-Proofing and Scaling Medium to Large Scale: • A single server is no longer cutting it • “V”s are starting to pile up • Need to move to a distributed architecture to scale with increasing demand • Your data is now Big
  • 45. Apache Beam (incubating) • Unified API for dealing with bounded/unbounded data sources (i.e. batch/streaming) • One API. Multiple implementations (execution engines). Called “Runners” in Beamspeak.
  • 46. Apache Beam (incubating) • Major focus on Windowing and properly dealing with Event Time(s) • Sliding Windows, Tumbling Windows, Session Windows, etc. • Watermark capabilities for dealing with late data
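Two of these ideas are small enough to sketch directly. The Python below is not Beam's API. `session_windows` gives each event a window `[t, t + gap)` and merges windows that overlap, which matches the usual definition of gap-based sessions. `split_late` uses one simple heuristic watermark (the largest event time seen so far, minus a fixed delay) to flag late arrivals. Real runners derive watermarks from their sources and offer allowed-lateness and triggering controls beyond this.

```python
from typing import List, Tuple


def session_windows(event_times: List[int], gap: int) -> List[Tuple[int, int]]:
    """Give each event the window [t, t + gap) and merge overlapping windows.

    Returns merged (start, end) pairs in event-time order.
    """
    sessions: List[Tuple[int, int]] = []
    for t in sorted(event_times):
        if sessions and t < sessions[-1][1]:
            start, end = sessions[-1]
            sessions[-1] = (start, max(end, t + gap))  # extend the open session
        else:
            sessions.append((t, t + gap))  # gap exceeded: start a new session
    return sessions


def split_late(arrivals: List[int], max_delay: int) -> Tuple[List[int], List[int]]:
    """Flag late data using a heuristic watermark.

    `arrivals` holds event times in the order they reached the pipeline. The
    watermark is the largest event time seen so far minus `max_delay`; an event
    whose time is already behind the watermark when it arrives is late.
    """
    watermark = float("-inf")
    on_time: List[int] = []
    late: List[int] = []
    for t in arrivals:
        (late if t < watermark else on_time).append(t)
        watermark = max(watermark, t - max_delay)
    return on_time, late


print(session_windows([1, 3, 20, 4, 30], gap=5))           # [(1, 9), (20, 25), (30, 35)]
print(split_late([10, 12, 11, 30, 15, 29], max_delay=5))   # ([10, 12, 11, 30, 29], [15])
```

The event at t=11 arrives out of order but is still on time, because the watermark (7) has not passed it. The event at t=15 arrives after t=30 has pushed the watermark to 25, so it is flagged as late.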
  • 47. Apache Beam (incubating) • Runner/Execution Engine Availability • Local runner (single machine) • Runners for Google Cloud Dataflow, Flink and Spark • Others underway: Apache Storm, Apache Apex and others
  • 48. Apache Beam (incubating) • Choose the right runner for your current scaling and organizational needs (you can switch later as necessary) • Understand the limits of different runner implementations • Outside of Google Cloud Dataflow, the Flink runner is currently the most feature-complete (this will change)
  • 49. Apache Beam (incubating) For a technical deep dive into Apache Beam: Apache Beam: A Unified Model for Batch and Streaming Data Processing - Davor Bonaci, Google Inc. Thursday 4:10PM, Ballroom A
  • 51. Problem: Data Formats • Many IoT devices transmit data as a raw array of bytes • The format of that data may be proprietary • To be of any use it must be parsed into a machine-readable format (i.e. Schema) • Once parsed, you need to know the schema
  • 52. Problem: Firmware Versions • Deployed IoT devices may be running any number of versions • Data formats may differ between firmware versions • Multiple parsers may be necessary to accommodate different device types and firmware versions
  • 53. Solution: Parser Registry • Allow manufacturers to supply proprietary parsers, loaded at runtime • Parser API to include a way to discover the schema • Tag data with device type + firmware version at the hub/gateway • Look up the associated parser when data arrives • (This can be done in either the Transport or the Analytics tier)
  • 54. Solution: Schema Registry • When parsers are registered, also register the associated schema • Downstream components (Transport/Analytics Tier) discover schema based on metadata
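Here is a minimal Python sketch of a combined parser and schema registry. It assumes the hub/gateway has tagged each payload with hypothetical `device.type` and `firmware.version` attributes. The two thermostat firmware formats are invented for illustration. "Loading at runtime" is reduced to an in-process `register` call rather than a real plugin mechanism.

```python
import struct
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

RegistryKey = Tuple[str, str]  # (device type, firmware version)


@dataclass
class RegistryEntry:
    parse: Callable[[bytes], Dict[str, Any]]
    schema: Dict[str, str]  # field name -> type, registered alongside the parser


class ParserRegistry:
    """Hypothetical registry keyed by (device type, firmware version)."""

    def __init__(self) -> None:
        self._entries: Dict[RegistryKey, RegistryEntry] = {}

    def register(self, device_type: str, firmware: str,
                 parse: Callable[[bytes], Dict[str, Any]], schema: Dict[str, str]) -> None:
        # Registering the parser and its schema together keeps them in sync.
        self._entries[(device_type, firmware)] = RegistryEntry(parse, schema)

    def _lookup(self, metadata: Dict[str, str]) -> RegistryEntry:
        key = (metadata["device.type"], metadata["firmware.version"])
        if key not in self._entries:
            raise LookupError(f"no parser registered for {key}")
        return self._entries[key]

    def schema_for(self, metadata: Dict[str, str]) -> Dict[str, str]:
        """Let downstream components discover the schema from the gateway's tags."""
        return self._lookup(metadata).schema

    def parse(self, payload: bytes, metadata: Dict[str, str]) -> Dict[str, Any]:
        return self._lookup(metadata).parse(payload)


# Invented formats: v1 sends a big-endian int16 in tenths of a degree C;
# v2 appends an unsigned humidity byte.
def parse_v1(payload: bytes) -> Dict[str, Any]:
    (tenths,) = struct.unpack("!h", payload)
    return {"temp_c": tenths / 10}


def parse_v2(payload: bytes) -> Dict[str, Any]:
    tenths, humidity = struct.unpack("!hB", payload)
    return {"temp_c": tenths / 10, "humidity_pct": humidity}


registry = ParserRegistry()
registry.register("thermostat", "1.0", parse_v1, {"temp_c": "float"})
registry.register("thermostat", "2.0", parse_v2, {"temp_c": "float", "humidity_pct": "int"})

tags_v2 = {"device.type": "thermostat", "firmware.version": "2.0"}
print(registry.parse(struct.pack("!hB", 215, 40), tags_v2))  # {'temp_c': 21.5, 'humidity_pct': 40}
print(registry.schema_for(tags_v2))  # {'temp_c': 'float', 'humidity_pct': 'int'}
```

Because the key includes the firmware version, devices of the same type running different firmware can coexist in the same pipeline without one parser having to handle every format.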
  • 55. Who owns your IoT data? Hint: It may not be you.
  • 56. Who owns your data? • Beware of 3rd-party device manufacturers • Data is valuable, and everyone wants it • Frequently exclusive access
  • 57. Who owns your data? • Device manufacturers may hoard data. • Retention policies limit how long you can store the data. • Aggregate/Derivative data okay, but what’s the definition?
  • 58. Thank you! Questions? P. Taylor Goetz, Hortonworks @ptgoetz

Editor's Notes

  1. That’s a lot of devices, generating a lot of data.
  2. To put that in perspective, that’s over 5 1/2 times the size of the entire IPv4 address space.
  3. Devices, Phones, Gateways and Hubs typically act as a bridge between devices and the cloud.
  4. Communication is frequently bi-directional.
  5. Most IoT devices are wireless, and there are a number of protocols that need to be considered.
  6. They fall loosely into two categories.
  7. Compare to Arduino: with Arduino you write C++ code but don't necessarily know it.
  8. And there’s actually one Apache project that can handle all this very well…
  9. It’s impossible to do NiFi justice in three slides.
  10. time-based aggregations
  11. The Google Cloud Dataflow API was recently open-sourced and donated to Apache (as Apache Beam).