Vision
To become a leading consulting and training provider in the fields of Data Analytics, Machine Learning, and Big Data in India and overseas.
Mission
To create value for our customers by providing consulting services, and to deliver high-quality training and skill-enhancement programs for employability.
About Us
Emerging India is promoted by professionals from IITs and IIMs, MBAs, and experts from the education and IT industries. We are one of India's fastest-growing analytics/IT consulting and training companies. We offer services in both the consulting and training domains, including NASSCOM-certified professional programs (designed to bridge the gap between academia and industry) and Data Analytics/Cyber Security/IoT/Robotics/AI/Blockchain consulting solutions. We are also a proud NASSCOM member and a NASSCOM SSC Licensed Training Partner for the northern region of India. As a NASSCOM licensed training partner, Emerging India is taking NASSCOM SSC initiatives to the next level in the field of Data Analytics to enhance the technical skills of students and working professionals.
Speakers:
Mrs. Rakhi Singh, Delivery Head (NASSCOM Certified Trainer)
Mr. Mayank Jain, Big Data Developer and Analyst
Mr. Kapil Sharma, Center Head cum Trainer (Certified by North-western University)
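What is HDFS
• HDFS stands for Hadoop Distributed File System.
• It runs on top of each node's native file system (for example, Ext3/Ext4).
• It is designed to store large amounts of data reliably and efficiently.
• It targets high data availability (High Availability cluster); block replication protects data against node failures.
• It does not permit in-place updates: files are written once (appends are possible) and read many times.
• It is built for OLAP (batch analytics) workloads, not for OLTP.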
Architecture of HDFS
[Figure: HDFS architecture. A Name Node (master) and a Secondary Name Node sit above twelve Data Nodes (slaves); HDFS is layered on top of the Ext3/Ext4 file system.]
MR Operation
[Figure: MR operation. Client1 submits code1 against a.txt (100 MB); the label a.txt appears on two boxes in the diagram.]
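The 100 MB a.txt in the MR Operation slide is stored in HDFS as fixed-size blocks. The sketch below shows how such a split could be computed. The 64 MB block size is an assumption (it matches the two blocks drawn on the later slides; Hadoop 2 and later default to 128 MB), and `split_into_blocks` is a hypothetical helper, not a Hadoop API.

```python
def split_into_blocks(file_size_mb, block_size_mb):
    """Return (block_number, size_mb) pairs for a file cut into fixed-size blocks."""
    blocks = []
    remaining = file_size_mb
    number = 1
    while remaining > 0:
        # Every block is full-sized except possibly the last one.
        size = min(block_size_mb, remaining)
        blocks.append((number, size))
        remaining -= size
        number += 1
    return blocks


# Assumed block size: 64 MB (hypothetical value chosen for this sketch).
layout = split_into_blocks(100, 64)
print(layout)  # [(1, 64), (2, 36)]
```

With a 128 MB block size the same file would fit in a single block, so the number of mappers started for a job depends on this setting.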
MR Framework
Input -> Mapper <Key_In, Val_In, Key_Out, Val_Out> -> Partitioner -> Reducer <Key_In, Val_In, Key_Out, Val_Out> -> Output
Input (a.txt): Hi Hello Hi Hadoop World Hive Hadoop Hive Hi Hello World
block1: Hi Hello Hi Hadoop World
block2: Hive Hadoop Hive Hi Hello World
Mapper1 input (block1): 1,Hi 2,Hello 3,Hi 4,Hadoop 5,World
Mapper2 input (block2): 1,Hive 2,Hadoop 3,Hive 4,Hi 5,Hello 6,World
Mapper output sent to Reducer1: Hi,1 Hello,1 Hi,1 Hadoop,1 World,1 Hive,1 Hadoop,1 Hive,1 Hi,1 Hello,1 World,1
MR Framework
Reducer1 input: Hi,1 Hello,1 Hi,1 Hadoop,1 World,1 Hive,1 Hadoop,1 Hive,1 Hi,1 Hello,1 World,1
Grouped by key: Hi,<1,1,1> Hello,<1,1> Hadoop,<1,1> World,<1,1> Hive,<1,1>
Output: Hi,3 Hello,2 Hadoop,2 World,2 Hive,2
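The word-count flow on the MR Framework slides can be simulated in plain Python. This is only a single-process sketch of the map, group-by-key, and reduce steps; the function names `mapper`, `shuffle`, and `reducer` are illustrative and are not the Hadoop API.

```python
from collections import defaultdict


def mapper(block):
    """Emit a (word, 1) pair for every word in one input block."""
    return [(word, 1) for word in block]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reducer(key, values):
    """Sum the list of ones collected for a single word."""
    return key, sum(values)


# The two blocks of a.txt, as split on the slide.
block1 = ["Hi", "Hello", "Hi", "Hadoop", "World"]
block2 = ["Hive", "Hadoop", "Hive", "Hi", "Hello", "World"]

mapped = mapper(block1) + mapper(block2)   # Hi,1 Hello,1 Hi,1 ...
grouped = shuffle(mapped)                  # Hi,<1,1,1> Hello,<1,1> ...
counts = dict(reducer(key, values) for key, values in grouped.items())
print(counts)  # {'Hi': 3, 'Hello': 2, 'Hadoop': 2, 'World': 2, 'Hive': 2}
```

In a real cluster each mapper runs on the node holding its block, and the partitioner decides which reducer receives each key; here there is only one reducer, as on the slide.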
HIVE
[Figure: Hive architecture. The client submits SQL to the Driver; the Compiler/Translator converts the SQL to MapReduce; the Execution Engine is MapReduce; the Metastore is backed by Derby.]
Hive
• Database
• Table types: internal (managed) and external
• Internal (default): a drop operation deletes both the data and the metadata
• External: a drop operation deletes only the metadata; the data files remain in place
• Optimization techniques: partitioning and bucketing
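The two optimization techniques above can be illustrated with a small sketch. Partitioning stores each distinct value of the partition column in its own `column=value` subdirectory, and bucketing spreads rows over a fixed number of files by hashing a column. The table path, column names, rows, and bucket count below are all hypothetical, and using the integer id as its own hash is a simplifying assumption; Hive's actual hash function depends on the column type and version.

```python
def partition_dir(table_dir, column, value):
    """Partitioning: one subdirectory per distinct value of the partition column."""
    return f"{table_dir}/{column}={value}"


def bucket_index(key_hash, num_buckets):
    """Bucketing: non-negative hash modulo the number of buckets."""
    return (key_hash & 0x7FFFFFFF) % num_buckets


# Hypothetical rows of (user_id, country).
rows = [(101, "IN"), (102, "US"), (103, "IN"), (104, "IN")]
NUM_BUCKETS = 2  # hypothetical bucket count

for user_id, country in rows:
    # Assumption: the integer id is used directly as its hash value.
    directory = partition_dir("/warehouse/users", "country", country)
    print(directory, "bucket", bucket_index(user_id, NUM_BUCKETS))
```

A query that filters on the partition column only needs to read the matching directory, which is why partitioning on a frequently filtered column can reduce the amount of data scanned.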
Introduction to HBase
HBase is a NoSQL, non-relational, distributed, column-oriented database that runs on top of Hadoop.
NoSQL: NoSQL databases are databases that do not use a SQL engine as their query engine.
HBase Daemons
Daemons are services that run on individual machines and communicate with each other.
HMaster: master server of HBase; manages metadata operations and region assignment.
HRegionServer: slave server of HBase; holds the actual data.
HQuorumPeer: ZooKeeper daemon that provides the coordination service.
Advantages of using HBase
Provides a highly scalable database with native Hadoop integration.
Nodes can be added on the fly.
HBase (LSM Tree)
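The slide content for this heading did not survive extraction. As a rough illustration of the LSM-tree idea only, the sketch below buffers writes in memory, flushes them to immutable sorted segments, and reads from the newest data first. The class `ToyLSM` and its `memstore_limit` parameter are hypothetical and greatly simplified; they are not HBase code.

```python
class ToyLSM:
    """Toy LSM store: an in-memory buffer plus immutable sorted segments."""

    def __init__(self, memstore_limit=2):
        self.memstore = {}      # recent writes, held in memory
        self.segments = []      # flushed sorted segments, newest last
        self.memstore_limit = memstore_limit

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.memstore_limit:
            self.flush()

    def flush(self):
        """Write the buffer out as one sorted, immutable segment."""
        if self.memstore:
            self.segments.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        """Check the buffer first, then segments from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for segment in reversed(self.segments):
            for k, v in segment:
                if k == key:
                    return v
        return None


store = ToyLSM(memstore_limit=2)
store.put("a", 1)
store.put("b", 2)   # buffer is full, so it is flushed to a segment
store.put("a", 3)   # newer value for "a" stays in the buffer
print(store.get("a"), store.get("b"), len(store.segments))  # 3 2 1
```

Writes never modify an existing segment, which is what makes this layout a good fit for a file system such as HDFS that does not allow in-place updates.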
Normalization vs Denormalization
HBase Data Model
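The slide content for this heading did not survive extraction. As a hedged sketch, the HBase data model can be pictured as a sorted, nested map: row key, then column (family:qualifier), then timestamp, then value. `ToyTable` and the sample rows below are hypothetical and only model the shape of the data, not the HBase client API.

```python
from collections import defaultdict


class ToyTable:
    """Toy model: row key -> "family:qualifier" -> {timestamp: value}."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, value, timestamp):
        # Each write adds a new timestamped version of the cell.
        self.rows[row_key][column][timestamp] = value

    def get(self, row_key, column):
        """Return the version with the newest timestamp, or None if absent."""
        versions = self.rows.get(row_key, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self):
        """Return row keys in sorted order, as rows are kept sorted by key."""
        return sorted(self.rows)


# Hypothetical sample data.
table = ToyTable()
table.put("user2", "info:name", "Asha", timestamp=1)
table.put("user1", "info:name", "Ravi", timestamp=1)
table.put("user1", "info:name", "Ravi K", timestamp=2)
print(table.get("user1", "info:name"))  # Ravi K
print(table.scan())                     # ['user1', 'user2']
```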
Introduction to Spark
Key Features
• RDD (Resilient Distributed Dataset)
• DAG (Directed Acyclic Graph) execution
• DataFrame
• Lazy evaluation
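Lazy evaluation means that transformations are only recorded, and nothing runs until an action asks for a result. The pure-Python sketch below mimics that behaviour without Spark installed. `LazyDataset` is a hypothetical toy class, not the Spark API, and its recorded steps form a simple linear chain rather than a general DAG.

```python
class LazyDataset:
    """Toy stand-in for an RDD: transformations are recorded, not executed."""

    def __init__(self, data, steps=()):
        self._data = data
        self._steps = tuple(steps)

    def map(self, fn):
        # Transformation: returns a new dataset with one more recorded step.
        return LazyDataset(self._data, self._steps + (("map", fn),))

    def filter(self, fn):
        return LazyDataset(self._data, self._steps + (("filter", fn),))

    def collect(self):
        """Action: only now is the recorded chain applied to the data."""
        items = iter(self._data)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)


calls = []


def double(x):
    calls.append(x)  # record that the function actually ran
    return x * 2


ds = LazyDataset([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
print(calls)         # [] because no work has been done yet
print(ds.collect())  # [6, 8]
print(calls)         # [1, 2, 3, 4]
```

Because the whole chain is known before anything runs, an engine can plan the work as a graph of stages instead of executing each step eagerly.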
Questions & Feedback!
Get in Touch with Us
We would be glad to hear from you!
Our Location
H-196, 304, IInd Floor, Sector 63, Noida – 201301
Our Phone
+91 120-4169097
+91 8860599698
Email / Website
info@emergingindiagroup.com
https://www.emergingindiagroup.com

Big data technologies by Emerging India Analytics
