Watch this webinar to understand how Hortonworks DataFlow (HDF) has evolved into the new Cloudera DataFlow (CDF). Learn about the key capabilities that CDF delivers, such as:
- Powerful data ingestion powered by Apache NiFi
- Edge data collection with Apache MiNiFi
- IoT-scale streaming data processing with Apache Kafka
- Enterprise services offering unified security and governance from edge to enterprise
Data ingestion, transformation, and routing done visually, with no code, using Apache NiFi and its 260+ processors
Build streaming apps and analytics from edge to data lake / EDW using a visual builder
Enable edge data collection and intelligence through MiNiFi agents
Support massive IoT infrastructures
Deliver perishable insights with pattern matching and Complex Event Processing (CEP) on real-time streams
Manage, monitor, secure, and govern streaming data
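NiFi itself is configured visually rather than in code, but the routing model behind processors such as RouteOnAttribute can be sketched in plain Python. Everything below — the FlowFile stand-in, the rule names, and the attribute keys — is an illustrative assumption, not NiFi's actual API:

```python
from dataclasses import dataclass, field

# Illustrative stand-in for a NiFi FlowFile: content plus key/value
# attributes. This is NOT the real NiFi API; it only sketches the idea
# behind attribute-based routing.
@dataclass
class FlowFile:
    content: bytes
    attributes: dict = field(default_factory=dict)

def route_on_attribute(flowfile, rules):
    """Return the first relationship whose predicate matches the
    flowfile's attributes, or 'unmatched' if none do."""
    for relationship, predicate in rules.items():
        if predicate(flowfile.attributes):
            return relationship
    return "unmatched"

# Hypothetical rules: send syslog records one way, sensor readings another.
rules = {
    "syslog": lambda a: a.get("source") == "syslog",
    "sensors": lambda a: a.get("source") == "sensor" and a.get("unit") == "C",
}

ff = FlowFile(b'{"temp": 21.5}', {"source": "sensor", "unit": "C"})
print(route_on_attribute(ff, rules))  # sensors
```

In the real product, the same decision is expressed by dragging a processor onto the canvas and typing an expression, which is the "no code" point the bullet above is making.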
What is it, actually, and what is the main use or goal of [product]?
Provide context for why we added this to our stack at the time. For CDF, it was to (a) create more value from HDP by making it easier to get data into HDP, (b) take advantage of growing IoT market opportunities, and (c) address a more encompassing view of data. It was then foundational for the next step (DataPlane). History can help strengthen mental models of where this fits.
TALK TRACK
We usually help our customers get started with one of these CDF use cases:
They augment their Splunk systems with a wider variety of data (via CDF).
They ingest logs for cybersecurity and threat detection.
They feed data to streaming analytics engines such as Apache Spark or Apache Storm.
They move their own data internally between on-premises data centers or to the cloud.
And of course, they capture data from the Internet of Things. CDF was originally designed to be robust, so that it could continue to move data despite varying device footprints or fluctuating power or connectivity levels. The data keeps flowing, without being lost in transit.
[NEXT SLIDE]
Clearsense public case study, https://hortonworks.com/customers/clearsense/
Challenge
Needed a viable, economical, and secure platform that could combine multi-format streaming data
Data scarcity and latency problems for healthcare organizations
Clinicians wanted to use machine learning and data science to store and analyze data, but the technology didn't exist.
Solution
First to deliver SMART real-time streaming data to healthcare customers.
Its Inception product makes data available for clinical, financial, and operational decisions.
Customers have access to all data sources, ingested with CDF, stored in HDP, delivered to the point of decision.
Result
Doctors and nurses now have a new level of mission-critical data and relevant insight that can be incorporated into clinical decisions.
Cost efficiencies from running in the cloud have allowed Clearsense to offer healthcare predictive analytics to 2,000 rural providers that otherwise wouldn’t have access.
Real-time data is displayed on a "Mission Control" dashboard, which helps prevent Code Blue events with patients.
TMW/Trimble case study, https://hortonworks.com/customers/tmw-systems/
Challenge:
Small carriers needed accurate data to improve business results
95% of small carriers have a deficit in the data available to them
They are estimating price points and revenue-based opportunities, and struggling to control fuel costs
Solution:
A new approach enables advanced analytics leveraging big data. Analytics such as market rate index, national rate, fuel surcharge, and maintenance cost matter because small carriers are growing at a fast rate.
Leveraging big data and machine learning, with a blockchain-driven architecture, to transform the transportation and logistics industries
Analyzed fuel data; can consolidate data sets from small carriers into a community data lake to drive revenue, fuel and freight cost, lane analysis, and pricing ranges.
Results:
Double-digit revenue growth year over year
Managing 4M trucks on national and state roads, daily
$31 billion in freight movement guides customers to profitability
Blockchain driven architecture
Web-based user interface
- Design, control, feedback & monitoring
Highly configurable
- Loss tolerant vs. guaranteed delivery
- Low latency vs. high throughput
- Dynamic prioritization
- Flow can be modified at runtime
- Back pressure
Data provenance
- Track dataflow from beginning to end
Designed for extension
- Build your own processors
Secure
- SSL, SSH, HTTPS, etc.