Internet of Things (IoT) data frequently has a location and time component. Getting value out of this "geotemporal" data can be tricky. We'll explore when and how to leverage Cassandra, DSE Search and DSE Analytics to surface meaningful information from your geotemporal data.
DataStax and Esri: Geotemporal IoT Search and Analytics
1. When and Where are all the Things:
Geotemporal IoT Search and Analytics
2. Esri
Geographic Information System (GIS)
• Environmental Systems Research Institute (ESRI) was founded in 1969
• Esri develops GIS software
• Global company with over 350,000 user organizations worldwide
• Headquarters in Redlands, CA
• 80 Esri distributors worldwide
7. Ingestion
of high velocity & volume geotemporal IoT data
[Diagram: Ingestion]
• Sustain a single node throughput of at least tens of thousands of events per second
• Achieve near linear scalability of throughput when adding additional nodes
• Gracefully handle bursty data
8. Apache Kafka
Publish-subscribe messaging rethought as a distributed commit log
• Fast
- single broker can handle hundreds of MBs of reads and writes per second
• Scalable
- data streams are partitioned and spread over a cluster of machines
• Durable
- messages are persisted to disk and replicated within the cluster
• Distributed
- cluster-centric design that offers strong durability and fault-tolerance guarantees
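The "partitioned and spread over a cluster" point can be illustrated with a minimal sketch of key-based partition assignment. This is not Kafka's own partitioner; the `partitionFor` and `assign` helpers are hypothetical, and the only idea shown is that a stable hash of the record key decides which partition (and therefore which machine) receives the event.

```scala
object PartitionSketch {
  // Map a record key to one of `numPartitions` partitions.
  // Math.floorMod keeps the result non-negative even when hashCode is negative.
  def partitionFor(key: String, numPartitions: Int): Int =
    Math.floorMod(key.hashCode, numPartitions)

  // Group keys by the partition they would land on.
  def assign(keys: Seq[String], numPartitions: Int): Map[Int, Seq[String]] =
    keys.groupBy(k => partitionFor(k, numPartitions))
}
```

Because the same key always maps to the same partition, all events from one device stay in order relative to each other while different devices spread across the cluster.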
9. Apache Spark
A fast and general engine for large-scale data processing
• Unified big data processing
- write streaming jobs the same way you write batch jobs
- can combine streaming with batch and interactive queries
• Spark apps can be written in Java, Scala, Python, and R
13. Streaming Analytics
of high velocity & volume geotemporal IoT data
[Diagram: Ingestion, Streaming Analytics]
• Configure the flow of events:
- the filtering and analytic steps to perform
- which ingestion stream(s) to apply them to
- where to send the results
14. Streaming Analytics
of high velocity & volume geotemporal IoT data
KafkaUtils.createStream(ssc, …)
  .map( event => FieldEnricher.enrich(event, …) )
  .filter( event => IncidentDetector.evaluate(event, …) )
  .map( event => FieldEnricher.enrich(event, …) )
  .map( event => FieldMapper(event, …) )
  .saveTo…
=> DAG (Directed Acyclic Graph)
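The map / filter / map chain on this slide can be tried without a cluster. The sketch below applies the same shape of pipeline to an in-memory batch using plain Scala collections. The `Event` fields, the zone rule, and the speed threshold are all assumptions made for illustration; the deck elides the real arguments of `FieldEnricher`, `IncidentDetector`, and `FieldMapper`, so the functions here are stand-ins, not their actual behavior.

```scala
object PipelineSketch {
  // Hypothetical event shape; the deck does not show the real one.
  final case class Event(id: String, lon: Double, lat: Double, speedKph: Double,
                         attrs: Map[String, String] = Map.empty)

  // Stand-in for an enrichment step: tag each event with a coarse zone label.
  def enrich(e: Event): Event =
    e.copy(attrs = e.attrs + ("zone" -> (if (e.lat >= 40.75) "north" else "south")))

  // Stand-in for incident detection: keep only events above a speed threshold.
  def isIncident(e: Event, maxSpeedKph: Double): Boolean = e.speedKph > maxSpeedKph

  // Stand-in for field mapping: project the event into an output row.
  def toRow(e: Event): (String, String, Double) =
    (e.id, e.attrs.getOrElse("zone", "unknown"), e.speedKph)

  // The same map / filter / map chain, applied to an in-memory batch.
  def run(batch: Seq[Event], maxSpeedKph: Double): Seq[(String, String, Double)] =
    batch.map(enrich).filter(isIncident(_, maxSpeedKph)).map(toRow)
}
```

Each step is a pure function of one event, which is what lets a streaming engine arrange the steps into a DAG and run them in parallel across partitions.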
15. GIS Tools for Hadoop
http://esri.github.io/gis-tools-for-hadoop/
• Esri Geometry API for Java:
- Geometry objects: points, lines, polygons
- Spatial relations: intersects, touches, overlaps, …
- Spatial operations: buffer, cut, union, …
• Spatial Framework for Hadoop
- Includes Spatial UDFs (User Defined Functions) that extend Hive
• GeoProcessing Tools for Hadoop
Ch. 8 Geospatial & Temporal Data Analysis
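To make "spatial relations" concrete, here is a minimal point-in-polygon test written in plain Scala. It does not use the Esri Geometry API; `Point` and `contains` are hypothetical names, and the ray-casting method shown is just one common way to answer a "polygon contains point" question on planar coordinates.

```scala
object SpatialSketch {
  final case class Point(x: Double, y: Double)

  // Ray-casting test: count how many polygon edges a horizontal ray from `p` crosses.
  // `ring` lists the polygon's vertices in order; the closing edge is implied.
  def contains(ring: IndexedSeq[Point], p: Point): Boolean = {
    var inside = false
    var j = ring.length - 1
    for (i <- ring.indices) {
      val a = ring(i)
      val b = ring(j)
      // The edge a-b matters only if its endpoints lie on opposite sides of the ray.
      val straddles = (a.y > p.y) != (b.y > p.y)
      if (straddles && p.x < (b.x - a.x) * (p.y - a.y) / (b.y - a.y) + a.x) inside = !inside
      j = i
    }
    inside
  }
}
```

A production library also has to handle holes, multipart geometries, and points that lie exactly on a boundary, which this sketch does not attempt.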
16. Demo
New York Taxi Cab Location Density Monitoring
High Velocity Geotemporal Analytics
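The demo itself is not reproduced in this transcript. As a rough idea of what location density monitoring involves, the sketch below counts positions per square grid cell. The grid scheme, the cell size in degrees, and the names `cellOf` and `density` are assumptions for illustration, not the demo's actual implementation.

```scala
object DensitySketch {
  // Snap a lon/lat position to the integer indices of a square grid cell.
  def cellOf(lon: Double, lat: Double, cellSizeDeg: Double): (Long, Long) =
    (math.floor(lon / cellSizeDeg).toLong, math.floor(lat / cellSizeDeg).toLong)

  // Count positions per cell; positions are (lon, lat) pairs.
  def density(positions: Seq[(Double, Double)], cellSizeDeg: Double): Map[(Long, Long), Int] =
    positions
      .groupBy { case (lon, lat) => cellOf(lon, lat, cellSizeDeg) }
      .map { case (cell, ps) => cell -> ps.size }
}
```

In a streaming setting the same per-cell count would typically be computed over a time window, so that the density map reflects recent positions only.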
18. Storage
of high velocity & volume geotemporal IoT data
[Diagram: Ingestion, Streaming Analytics, Storage + Query]
• Sustain a single-node write throughput of at least tens of thousands of events per second
• Achieve growth in volume capacity & write throughput when adding additional nodes
19. Cassandra
A Distributed Database with real-world Scalability
• Distributed, Scalable, and Highly Available
• Continuous Availability
- no single point of failure
• Easy data distribution across multiple data centers
• Spark Cassandra Connector
- https://github.com/datastax/spark-cassandra-connector
21. Search
high velocity & volume geotemporal IoT data
[Diagram: Ingestion, Streaming Analytics, Search, Storage + Query]
22. Search
high velocity & volume geotemporal IoT data
• Efficiently access and search a large volume of data
- Query by any combination of id, time, space, and attributes
- Made possible via DSE Search = Cassandra (C*)/Solr + Lucene spatial types
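"Any combination of id, time, space, and attributes" means every criterion is optional and the ones that are present are ANDed together. The in-memory sketch below shows only that query shape. It says nothing about how DSE Search or Solr index or execute such queries, and the `Event`, `BBox`, and `Query` types are hypothetical.

```scala
object QuerySketch {
  final case class Event(id: String, timeMs: Long, lon: Double, lat: Double,
                         attrs: Map[String, String])

  // Axis-aligned bounding box in lon/lat.
  final case class BBox(minLon: Double, minLat: Double, maxLon: Double, maxLat: Double) {
    def contains(lon: Double, lat: Double): Boolean =
      lon >= minLon && lon <= maxLon && lat >= minLat && lat <= maxLat
  }

  // Every criterion is optional; an unset criterion matches everything.
  final case class Query(id: Option[String] = None,
                         timeRange: Option[(Long, Long)] = None,
                         bbox: Option[BBox] = None,
                         attrs: Map[String, String] = Map.empty) {
    def matches(e: Event): Boolean =
      id.forall(_ == e.id) &&
        timeRange.forall { case (from, to) => e.timeMs >= from && e.timeMs <= to } &&
        bbox.forall(_.contains(e.lon, e.lat)) &&
        attrs.forall { case (k, v) => e.attrs.get(k).contains(v) }
  }

  // Linear scan; a search index exists precisely to avoid this at scale.
  def search(events: Seq[Event], q: Query): Seq[Event] = events.filter(q.matches)
}
```

For example, `Query(id = Some("a"), bbox = Some(BBox(-75, 40, -73, 41)))` asks for one device's events inside a box, with no constraint on time or attributes.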
24. Visualization
of high velocity & volume geotemporal IoT data
[Diagram: Ingestion, Streaming Analytics, Search, Storage + Query, Visualization (Desktop, Web, Device)]
• ArcGIS API for JavaScript
- A lightweight way to embed maps in web apps
- Renders any Map or Feature Service compliant source
- https://www.esri.com/library/whitepapers/pdfs/geoservices-rest-spec.pdf
25. High Velocity & Volume Visualization
Requirements
• Render with the ability to aggregate
- Aggregations are calculated at various levels of detail and are specific to each user session
- When zoomed in, raw features are returned and rendered
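One way to read these requirements is as a per-request switch between aggregated bins and raw features. The sketch below is a guess at that logic under stated assumptions: a grid whose cell size halves with each zoom level, and a hypothetical `rawFromZoom` threshold at which the server stops aggregating. None of these names or rules come from the deck.

```scala
object LodSketch {
  type Pos = (Double, Double) // (lon, lat)

  sealed trait Rendered
  final case class Raw(features: Seq[Pos]) extends Rendered
  final case class Binned(counts: Map[(Long, Long), Int]) extends Rendered

  // Assumed rule: start at 1 degree and halve the cell size with each zoom level.
  def cellSizeDeg(zoom: Int): Double = 1.0 / (1 << zoom)

  // Return raw features once the user has zoomed in far enough,
  // otherwise return per-cell counts at the current level of detail.
  def render(inView: Seq[Pos], zoom: Int, rawFromZoom: Int): Rendered =
    if (zoom >= rawFromZoom) Raw(inView)
    else {
      val size = cellSizeDeg(zoom)
      Binned(
        inView
          .groupBy { case (lon, lat) =>
            (math.floor(lon / size).toLong, math.floor(lat / size).toLong)
          }
          .map { case (cell, ps) => cell -> ps.size }
      )
    }
}
```

Because the result depends on the viewport and zoom of the request, it is naturally specific to each user session rather than precomputed once for everyone.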
44. When and Where are all the Things
Geotemporal IoT Search and Analytics Summary
• When working with high velocity & volume geotemporal IoT data, we have found the best technology selections to be:
- Ingestion = Spark Streaming + Kafka
- Streaming Analytics = Spark Streaming + GIS Tools for Hadoop
- Storage & Search = DataStax Enterprise + Spark Cassandra Connector
- Batch Analytics = DataStax Enterprise + Spark Core + GIS Tools for Hadoop
- Visualization = ArcGIS API for JavaScript
• GIS Tools for Hadoop
- Can be used as a basis to add spatial geometries, relations, and operators to Spark
- http://esri.github.io/gis-tools-for-hadoop/