SlideShare a Scribd company logo
1 of 21
Download to read offline
A Gentle
Introduction to
Big Data
Presenter: Mehmet Ali Akyol
April 03, 2018
How much data is big data?
Big data happens when the data you
have to process is bigger than what
you can process in the given time
What is Big Data?
Field dedicated to the analysis, processing, and storage of large collections of
data that frequently originate from disparate sources
Required when traditional data analysis, processing and storage technologies
and techniques are insufficient
Addresses distinct requirements, such as the combining of multiple unrelated
datasets, processing of large amounts of unstructured data and harvesting of
hidden information in a time-sensitive manner
Characteristics
of Big Data
● Volume
● Variety
● Velocity
● Veracity
● Value
Volume
About the scale of data
Terabytes, petabytes, exabytes, zettabytes…
Airbus generates 640TB of data in a flight
Self driving cars will generate 2PB of data every year
Variety
Various types of data formats and types
Structured, unstructured, and semi-structured
Text files, media files such as sound and video
Velocity
The speed of data is produced
The delay before the data must be consumed
Streaming data; Social media posts (Tweets, Facebook posts), IOT (Internet of
Things)
Veracity
Quality of data
Meaningful results
Understandability
Importance of data source
Value
Usefulness of data
Amount of knowledge that can be extracted from data
Making informed decisions
Types of Data
Structured Data Unstructured
Data
Semi-structured
data
Metadata
Structured Data
Unstructured Data
Semi-structured Data
Metadata
Data about data
Details of the dataset
such as source, date,
and type
Data Processing in Big Data
Batch Processing
- Stored data is processed at
certain time intervals
- Hadoop
Stream Processing
- Processing continuous streams of
data in real time
- Apache Storm
Big Data
Processing
Architectures
Processing architectures that takes
advantage of both batch and stream
processing
- Lambda Architecture
- Apache Spark
- Kappa Architecture
- Apache Flink
Big Data Tools
Hadoop: A framework
that allows for the
distributed storage &
processing of large
datasets across clusters
of computers
Hive: A data warehouse
infrastructure that
provides data
summarization and
querying
Flink: Stream
processing framework
for distributed,
high-performing,
always-available, and
accurate data streaming
applications
Spark: A fast and
general engine for
large-scale data
processing for both
batch and streaming
data
Big Data Tools
Storm: A distributed
realtime computation
system for unbounded
data processing
Beam: A framework for
batch and streaming
data processing jobs
that run on any
execution engine such
as Flink and Spark
Zeppelin: Web-based
notebook that enables
data-driven, interactive
data analytics, and data
visualization
Kafka: A distributed
message queue for
building real-time data
pipelines and streaming
apps
Takeaways
Big Data is a research field that deals with processing of large amount of data
that traditional data processing techniques cannot handle in a timely and
efficient manner
5 Vs of Big Data: Volume, Velocity, Variety, Veracity, and Value
Data Types: Structured, Unstructured, Semi-structured, and Metadata
Big Data Processing Methods: Batch and Stream Processing
There are tens of open-source big data tools out there
Thanks for listening!

More Related Content

What's hot

Introduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & ApplicationsIntroduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & ApplicationsNguyen Cao
 
Big Tools for Big Data
Big Tools for Big DataBig Tools for Big Data
Big Tools for Big DataLewis Crawford
 
Big data converted
Big data convertedBig data converted
Big data convertedABINAYAM20
 
Analysis of big data in pandemic case
Analysis of big data in pandemic case Analysis of big data in pandemic case
Analysis of big data in pandemic case Muh Saleh
 
Useful data presentation from DataShaka
Useful data presentation from DataShakaUseful data presentation from DataShaka
Useful data presentation from DataShakaagenda21
 
Big data – a brief overview
Big data – a brief overviewBig data – a brief overview
Big data – a brief overviewDorai Thodla
 
Nyc web perf-final-july-23
Nyc web perf-final-july-23Nyc web perf-final-july-23
Nyc web perf-final-july-23Dan Boutin
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data EcosystemIvo Vachkov
 
big data overview ppt
big data overview pptbig data overview ppt
big data overview pptVIKAS KATARE
 
Bigdata and Hadoop Bootcamp
Bigdata and Hadoop BootcampBigdata and Hadoop Bootcamp
Bigdata and Hadoop BootcampSpotle.ai
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2RojaT4
 
Hadoop Big data Solution Provider
Hadoop Big data Solution ProviderHadoop Big data Solution Provider
Hadoop Big data Solution ProviderAgileiss
 
Дмитрий Попович "How to build a data warehouse?"
Дмитрий Попович "How to build a data warehouse?"Дмитрий Попович "How to build a data warehouse?"
Дмитрий Попович "How to build a data warehouse?"Fwdays
 

What's hot (20)

Abi
AbiAbi
Abi
 
Introduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & ApplicationsIntroduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & Applications
 
Big Tools for Big Data
Big Tools for Big DataBig Tools for Big Data
Big Tools for Big Data
 
View on big data technologies
View on big data technologiesView on big data technologies
View on big data technologies
 
Big data converted
Big data convertedBig data converted
Big data converted
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
 
Analysis of big data in pandemic case
Analysis of big data in pandemic case Analysis of big data in pandemic case
Analysis of big data in pandemic case
 
Useful data presentation from DataShaka
Useful data presentation from DataShakaUseful data presentation from DataShaka
Useful data presentation from DataShaka
 
Big data – a brief overview
Big data – a brief overviewBig data – a brief overview
Big data – a brief overview
 
Overview of Bigdata Analytics
Overview of Bigdata Analytics Overview of Bigdata Analytics
Overview of Bigdata Analytics
 
Big Data
Big DataBig Data
Big Data
 
Nyc web perf-final-july-23
Nyc web perf-final-july-23Nyc web perf-final-july-23
Nyc web perf-final-july-23
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
 
Big Data Tech Stack
Big Data Tech StackBig Data Tech Stack
Big Data Tech Stack
 
big data overview ppt
big data overview pptbig data overview ppt
big data overview ppt
 
Bigdata and Hadoop Bootcamp
Bigdata and Hadoop BootcampBigdata and Hadoop Bootcamp
Bigdata and Hadoop Bootcamp
 
INTRODUCTION OF BIG DATA
INTRODUCTION OF BIG DATAINTRODUCTION OF BIG DATA
INTRODUCTION OF BIG DATA
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
 
Hadoop Big data Solution Provider
Hadoop Big data Solution ProviderHadoop Big data Solution Provider
Hadoop Big data Solution Provider
 
Дмитрий Попович "How to build a data warehouse?"
Дмитрий Попович "How to build a data warehouse?"Дмитрий Попович "How to build a data warehouse?"
Дмитрий Попович "How to build a data warehouse?"
 

Similar to A Gentle Introduction to Big Data

Introduction Big Data
Introduction Big DataIntroduction Big Data
Introduction Big DataFrank Kienle
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amirydatastack
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptxElsonPaul2
 
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015Cloud Native Day Tel Aviv
 
BD_Architecture and Charateristics.pptx.pdf
BD_Architecture and Charateristics.pptx.pdfBD_Architecture and Charateristics.pptx.pdf
BD_Architecture and Charateristics.pptx.pdferamfatima43
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathYahoo Developer Network
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap IT Strategy Group
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFAmazon Web Services
 
INF2190_W1_2016_public
INF2190_W1_2016_publicINF2190_W1_2016_public
INF2190_W1_2016_publicAttila Barta
 
Modernizing upstream workflows with aws storage - john mallory
Modernizing upstream workflows with aws storage -  john malloryModernizing upstream workflows with aws storage -  john mallory
Modernizing upstream workflows with aws storage - john malloryAmazon Web Services
 
Internet of Things & Big Data
Internet of Things & Big DataInternet of Things & Big Data
Internet of Things & Big DataArun Rajput
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsAbhishekKumarAgrahar2
 

Similar to A Gentle Introduction to Big Data (20)

Introduction Big Data
Introduction Big DataIntroduction Big Data
Introduction Big Data
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
 
Hadoop HDFS.ppt
Hadoop HDFS.pptHadoop HDFS.ppt
Hadoop HDFS.ppt
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015
Yaron Haviv, Iguaz.io - OpenStack and BigData - OpenStack Israel 2015
 
BD_Architecture and Charateristics.pptx.pdf
BD_Architecture and Charateristics.pptx.pdfBD_Architecture and Charateristics.pptx.pdf
BD_Architecture and Charateristics.pptx.pdf
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
 
BDA ( haoop ).pptx
BDA ( haoop ).pptxBDA ( haoop ).pptx
BDA ( haoop ).pptx
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
 
Big data architecture
Big data architectureBig data architecture
Big data architecture
 
Data analytics & its Trends
Data analytics & its TrendsData analytics & its Trends
Data analytics & its Trends
 
Big data
Big dataBig data
Big data
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Lecture1
Lecture1Lecture1
Lecture1
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SF
 
INF2190_W1_2016_public
INF2190_W1_2016_publicINF2190_W1_2016_public
INF2190_W1_2016_public
 
Modernizing upstream workflows with aws storage - john mallory
Modernizing upstream workflows with aws storage -  john malloryModernizing upstream workflows with aws storage -  john mallory
Modernizing upstream workflows with aws storage - john mallory
 
Internet of Things & Big Data
Internet of Things & Big DataInternet of Things & Big Data
Internet of Things & Big Data
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in details
 

Recently uploaded

BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 

Recently uploaded (20)

BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 

A Gentle Introduction to Big Data

  • 1. A Gentle Introduction to Big Data Presenter: Mehmet Ali Akyol April 03, 2018
  • 2. How much data is big data?
  • 3. Big data happens when the data you have to process is bigger than what you can process in the given time
  • 4. What is Big Data? Field dedicated to the analysis, processing, and storage of large collections of data that frequently originate from disparate sources Required when traditional data analysis, processing and storage technologies and techniques are insufficient Addresses distinct requirements, such as the combining of multiple unrelated datasets, processing of large amounts of unstructured data and harvesting of hidden information in a time-sensitive manner
  • 5. Characteristics of Big Data ● Volume ● Variety ● Velocity ● Veracity ● Value
  • 6. Volume About the scale of data Terabytes, petabytes, exabytes, zettabytes… Airbus generates 640TB of data in a flight Self driving cars will generate 2PB of data every year
  • 7. Variety Various types of data formats and types Structured, unstructured, and semi-structured Text files, media files such as sound and video
  • 8. Velocity The speed of data is produced The delay before the data must be consumed Streaming data; Social media posts (Tweets, Facebook posts), IOT (Internet of Things)
  • 9. Veracity Quality of data Meaningful results Understandability Importance of data source
  • 10. Value Usefulness of data Amount of knowledge that can be extracted from data Making informed decisions
  • 11. Types of Data Structured Data Unstructured Data Semi-structured data Metadata
  • 15. Metadata Data about data Details of the dataset such as source, date, and type
  • 16. Data Processing in Big Data Batch Processing - Stored data is processed at certain time intervals - Hadoop Stream Processing - Processing continuous streams of data in real time - Apache Storm
  • 17. Big Data Processing Architectures Processing architectures that takes advantage of both batch and stream processing - Lambda Architecture - Apache Spark - Kappa Architecture - Apache Flink
  • 18. Big Data Tools Hadoop: A framework that allows for the distributed storage & processing of large datasets across clusters of computers Hive: A data warehouse infrastructure that provides data summarization and querying Flink: Stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications Spark: A fast and general engine for large-scale data processing for both batch and streaming data
  • 19. Big Data Tools Storm: A distributed realtime computation system for unbounded data processing Beam: A framework for batch and streaming data processing jobs that run on any execution engine such as Flink and Spark Zeppelin: Web-based notebook that enables data-driven, interactive data analytics, and data visualization Kafka: A distributed message queue for building real-time data pipelines and streaming apps
  • 20. Takeaways Big Data is a research field that deals with processing of large amount of data that traditional data processing techniques cannot handle in a timely and efficient manner 5 Vs of Big Data: Volume, Velocity, Variety, Veracity, and Value Data Types: Structured, Unstructured, Semi-structured, and Metadata Big Data Processing Methods: Batch and Stream Processing There are tens of open-source big data tools out there