SlideShare une entreprise Scribd logo
1  sur  27
Diomin Aliaksey
R&D
2014, Minsk
OpenSource

Monitoring

Target Group

Apache Hadoop

Yes

X

Developers

Cloudera

Yes

Good

All

Hortonworks

Yes

Good

All

MapR

No

Bad

Enterprise

PivotalHD

No

Bad

Enterprise

3
How to find the bottleneck?

4
5
6
8
9
10
11
12
1. Increase size of cluster
2. Increase input block size
3. Increase buffer size

13
1. Increase size of cluster
2. Increase input block size
3. Increase buffer size

14
15
16
17
1. Increase size of cluster
2. Increase input block size
3. Increase buffer size

18
19
1. Increase size of cluster
2. Increase input block size
3. Increase buffer size

20
1. Compression

21
1. Compression
2. Combiner

22
Wordcount

Reduce function as Combine
combine 1:

<a, 1> <b, 1> <a, 1>

=> <a, 2> <b, 1>

combine 2:

<a, 1> <b, 1>

=> <a, 1> <b, 1>

Reduce:

<a, {1, 2}> <b, {1, 1}> => <a, 3> <b, 2>

23
Mean

combine 1: <k,40> <k,30> <k,20> =>

<k, 30>

combine 2: <k,2> <k,8>

=>

<k, 5>

Reduce:

=>

<k, 17.5>

<k, {30, 5}>

24
Mean

combine 1: <k,40> <k,30> <k,20> =>

<k, 30>

combine 2: <k,2> <k,8>

=>

<k, 5>

Reduce:

=>

<k, 17.5>

<k, {30, 5}>

(40 + 30 + 20 + 2 + 8)/5 = 17.5

25
Mean

combine 1:

<k,<40,1>> <k,<30,1>>, <k,<20,1>>

=>

<k, <90,3> >

<k,<2,1>> <k, <8,1>>

=>

<k, <10, 2> >

Reduce:

=>

<k, 20>

combine 2:

<k, {<90,3>, <10,2>} >

26
27

Contenu connexe

En vedette

Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performancePostgreSQL-Consulting
 
Linux BPF Superpowers
Linux BPF SuperpowersLinux BPF Superpowers
Linux BPF SuperpowersBrendan Gregg
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Brendan Gregg
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsBrendan Gregg
 
Velocity 2015 linux perf tools
Velocity 2015 linux perf toolsVelocity 2015 linux perf tools
Velocity 2015 linux perf toolsBrendan Gregg
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016Brendan Gregg
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at NetflixBrendan Gregg
 
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingAdvanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingImpetus Technologies
 
Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuningVitthal Gogate
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and moreBrendan Gregg
 

En vedette (11)

Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
 
Linux BPF Superpowers
Linux BPF SuperpowersLinux BPF Superpowers
Linux BPF Superpowers
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
Velocity 2015 linux perf tools
Velocity 2015 linux perf toolsVelocity 2015 linux perf tools
Velocity 2015 linux perf tools
 
Powered by the Sun
Powered by the SunPowered by the Sun
Powered by the Sun
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
 
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingAdvanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
 
Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuning
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
 

Similaire à Hadoop Distributions: Bottlenecks and Tuning

Обзор Hadoop-дистрибутивов. Тюнинг «узких мест» Hadoop
Обзор Hadoop-дистрибутивов. Тюнинг «узких мест» HadoopОбзор Hadoop-дистрибутивов. Тюнинг «узких мест» Hadoop
Обзор Hadoop-дистрибутивов. Тюнинг «узких мест» HadoopOlga Lavrentieva
 
PostgreSQL: Joining 1 million tables
PostgreSQL: Joining 1 million tablesPostgreSQL: Joining 1 million tables
PostgreSQL: Joining 1 million tablesHans-Jürgen Schönig
 
How we switched to columnar at SpendHQ
How we switched to columnar at SpendHQHow we switched to columnar at SpendHQ
How we switched to columnar at SpendHQMariaDB plc
 
Vcs slides on or 2014
Vcs slides on or 2014Vcs slides on or 2014
Vcs slides on or 2014Shakti Ranjan
 
My mapreduce1 presentation
My mapreduce1 presentationMy mapreduce1 presentation
My mapreduce1 presentationNoha Elprince
 
Big data meetup 2012 01-18 - stripped
Big data meetup 2012 01-18 - strippedBig data meetup 2012 01-18 - stripped
Big data meetup 2012 01-18 - strippedMalcolm Box
 
SEMLA_logging_infra
SEMLA_logging_infraSEMLA_logging_infra
SEMLA_logging_infraswy351
 
Big query - Command line tools and Tips - (MOSG)
Big query - Command line tools and Tips - (MOSG)Big query - Command line tools and Tips - (MOSG)
Big query - Command line tools and Tips - (MOSG)Soshi Nemoto
 
COCOA: Communication-Efficient Coordinate Ascent
COCOA: Communication-Efficient Coordinate AscentCOCOA: Communication-Efficient Coordinate Ascent
COCOA: Communication-Efficient Coordinate Ascentjeykottalam
 

Similaire à Hadoop Distributions: Bottlenecks and Tuning (11)

Обзор Hadoop-дистрибутивов. Тюнинг «узких мест» Hadoop
Обзор Hadoop-дистрибутивов. Тюнинг «узких мест» HadoopОбзор Hadoop-дистрибутивов. Тюнинг «узких мест» Hadoop
Обзор Hadoop-дистрибутивов. Тюнинг «узких мест» Hadoop
 
PostgreSQL: Joining 1 million tables
PostgreSQL: Joining 1 million tablesPostgreSQL: Joining 1 million tables
PostgreSQL: Joining 1 million tables
 
How we switched to columnar at SpendHQ
How we switched to columnar at SpendHQHow we switched to columnar at SpendHQ
How we switched to columnar at SpendHQ
 
Vcs slides on or 2014
Vcs slides on or 2014Vcs slides on or 2014
Vcs slides on or 2014
 
My mapreduce1 presentation
My mapreduce1 presentationMy mapreduce1 presentation
My mapreduce1 presentation
 
Big data meetup 2012 01-18 - stripped
Big data meetup 2012 01-18 - strippedBig data meetup 2012 01-18 - stripped
Big data meetup 2012 01-18 - stripped
 
Hadoop Interview Questions and Answers
Hadoop Interview Questions and AnswersHadoop Interview Questions and Answers
Hadoop Interview Questions and Answers
 
SEMLA_logging_infra
SEMLA_logging_infraSEMLA_logging_infra
SEMLA_logging_infra
 
Big query - Command line tools and Tips - (MOSG)
Big query - Command line tools and Tips - (MOSG)Big query - Command line tools and Tips - (MOSG)
Big query - Command line tools and Tips - (MOSG)
 
COCOA: Communication-Efficient Coordinate Ascent
COCOA: Communication-Efficient Coordinate AscentCOCOA: Communication-Efficient Coordinate Ascent
COCOA: Communication-Efficient Coordinate Ascent
 
LalitBDA2015V3
LalitBDA2015V3LalitBDA2015V3
LalitBDA2015V3
 

Plus de Altoros

Maturing with Kubernetes
Maturing with KubernetesMaturing with Kubernetes
Maturing with KubernetesAltoros
 
Kubernetes Platform Readiness and Maturity Assessment
Kubernetes Platform Readiness and Maturity AssessmentKubernetes Platform Readiness and Maturity Assessment
Kubernetes Platform Readiness and Maturity AssessmentAltoros
 
Journey Through Four Stages of Kubernetes Deployment Maturity
Journey Through Four Stages of Kubernetes Deployment MaturityJourney Through Four Stages of Kubernetes Deployment Maturity
Journey Through Four Stages of Kubernetes Deployment MaturityAltoros
 
SGX: Improving Privacy, Security, and Trust Across Blockchain Networks
SGX: Improving Privacy, Security, and Trust Across Blockchain NetworksSGX: Improving Privacy, Security, and Trust Across Blockchain Networks
SGX: Improving Privacy, Security, and Trust Across Blockchain NetworksAltoros
 
Using the Cloud Foundry and Kubernetes Stack as a Part of a Blockchain CI/CD ...
Using the Cloud Foundry and Kubernetes Stack as a Part of a Blockchain CI/CD ...Using the Cloud Foundry and Kubernetes Stack as a Part of a Blockchain CI/CD ...
Using the Cloud Foundry and Kubernetes Stack as a Part of a Blockchain CI/CD ...Altoros
 
A Zero-Knowledge Proof: Improving Privacy on a Blockchain
A Zero-Knowledge Proof:  Improving Privacy on a BlockchainA Zero-Knowledge Proof:  Improving Privacy on a Blockchain
A Zero-Knowledge Proof: Improving Privacy on a BlockchainAltoros
 
Crap. Your Big Data Kitchen Is Broken.
Crap. Your Big Data Kitchen Is Broken.Crap. Your Big Data Kitchen Is Broken.
Crap. Your Big Data Kitchen Is Broken.Altoros
 
Containers and Kubernetes
Containers and KubernetesContainers and Kubernetes
Containers and KubernetesAltoros
 
Distributed Ledger Technology for Over-the-Counter Trading
Distributed Ledger Technology for Over-the-Counter TradingDistributed Ledger Technology for Over-the-Counter Trading
Distributed Ledger Technology for Over-the-Counter TradingAltoros
 
5-Step Deployment of Hyperledger Fabric on Multiple Nodes
5-Step Deployment of Hyperledger Fabric on Multiple Nodes5-Step Deployment of Hyperledger Fabric on Multiple Nodes
5-Step Deployment of Hyperledger Fabric on Multiple NodesAltoros
 
Deploying Kubernetes on GCP with Kubespray
Deploying Kubernetes on GCP with KubesprayDeploying Kubernetes on GCP with Kubespray
Deploying Kubernetes on GCP with KubesprayAltoros
 
UAA for Kubernetes
UAA for KubernetesUAA for Kubernetes
UAA for KubernetesAltoros
 
Troubleshooting .NET Applications on Cloud Foundry
Troubleshooting .NET Applications on Cloud FoundryTroubleshooting .NET Applications on Cloud Foundry
Troubleshooting .NET Applications on Cloud FoundryAltoros
 
Continuous Integration and Deployment with Jenkins for PCF
Continuous Integration and Deployment with Jenkins for PCFContinuous Integration and Deployment with Jenkins for PCF
Continuous Integration and Deployment with Jenkins for PCFAltoros
 
How to Never Leave Your Deployment Unattended
How to Never Leave Your Deployment UnattendedHow to Never Leave Your Deployment Unattended
How to Never Leave Your Deployment UnattendedAltoros
 
Cloud Foundry Monitoring How-To: Collecting Metrics and Logs
Cloud Foundry Monitoring How-To: Collecting Metrics and LogsCloud Foundry Monitoring How-To: Collecting Metrics and Logs
Cloud Foundry Monitoring How-To: Collecting Metrics and LogsAltoros
 
Smart Baggage Tracking: End-to-End Sensor-Based Solution
Smart Baggage Tracking: End-to-End Sensor-Based SolutionSmart Baggage Tracking: End-to-End Sensor-Based Solution
Smart Baggage Tracking: End-to-End Sensor-Based SolutionAltoros
 
Navigating the Ecosystem of Pivotal Cloud Foundry Tiles
Navigating the Ecosystem of Pivotal Cloud Foundry TilesNavigating the Ecosystem of Pivotal Cloud Foundry Tiles
Navigating the Ecosystem of Pivotal Cloud Foundry TilesAltoros
 
AI as a Catalyst for IoT
AI as a Catalyst for IoTAI as a Catalyst for IoT
AI as a Catalyst for IoTAltoros
 
Over-Engineering: Causes, Symptoms, and Treatment
Over-Engineering: Causes, Symptoms, and TreatmentOver-Engineering: Causes, Symptoms, and Treatment
Over-Engineering: Causes, Symptoms, and TreatmentAltoros
 

Plus de Altoros (20)

Maturing with Kubernetes
Maturing with KubernetesMaturing with Kubernetes
Maturing with Kubernetes
 
Kubernetes Platform Readiness and Maturity Assessment
Kubernetes Platform Readiness and Maturity AssessmentKubernetes Platform Readiness and Maturity Assessment
Kubernetes Platform Readiness and Maturity Assessment
 
Journey Through Four Stages of Kubernetes Deployment Maturity
Journey Through Four Stages of Kubernetes Deployment MaturityJourney Through Four Stages of Kubernetes Deployment Maturity
Journey Through Four Stages of Kubernetes Deployment Maturity
 
SGX: Improving Privacy, Security, and Trust Across Blockchain Networks
SGX: Improving Privacy, Security, and Trust Across Blockchain NetworksSGX: Improving Privacy, Security, and Trust Across Blockchain Networks
SGX: Improving Privacy, Security, and Trust Across Blockchain Networks
 
Using the Cloud Foundry and Kubernetes Stack as a Part of a Blockchain CI/CD ...
Using the Cloud Foundry and Kubernetes Stack as a Part of a Blockchain CI/CD ...Using the Cloud Foundry and Kubernetes Stack as a Part of a Blockchain CI/CD ...
Using the Cloud Foundry and Kubernetes Stack as a Part of a Blockchain CI/CD ...
 
A Zero-Knowledge Proof: Improving Privacy on a Blockchain
A Zero-Knowledge Proof:  Improving Privacy on a BlockchainA Zero-Knowledge Proof:  Improving Privacy on a Blockchain
A Zero-Knowledge Proof: Improving Privacy on a Blockchain
 
Crap. Your Big Data Kitchen Is Broken.
Crap. Your Big Data Kitchen Is Broken.Crap. Your Big Data Kitchen Is Broken.
Crap. Your Big Data Kitchen Is Broken.
 
Containers and Kubernetes
Containers and KubernetesContainers and Kubernetes
Containers and Kubernetes
 
Distributed Ledger Technology for Over-the-Counter Trading
Distributed Ledger Technology for Over-the-Counter TradingDistributed Ledger Technology for Over-the-Counter Trading
Distributed Ledger Technology for Over-the-Counter Trading
 
5-Step Deployment of Hyperledger Fabric on Multiple Nodes
5-Step Deployment of Hyperledger Fabric on Multiple Nodes5-Step Deployment of Hyperledger Fabric on Multiple Nodes
5-Step Deployment of Hyperledger Fabric on Multiple Nodes
 
Deploying Kubernetes on GCP with Kubespray
Deploying Kubernetes on GCP with KubesprayDeploying Kubernetes on GCP with Kubespray
Deploying Kubernetes on GCP with Kubespray
 
UAA for Kubernetes
UAA for KubernetesUAA for Kubernetes
UAA for Kubernetes
 
Troubleshooting .NET Applications on Cloud Foundry
Troubleshooting .NET Applications on Cloud FoundryTroubleshooting .NET Applications on Cloud Foundry
Troubleshooting .NET Applications on Cloud Foundry
 
Continuous Integration and Deployment with Jenkins for PCF
Continuous Integration and Deployment with Jenkins for PCFContinuous Integration and Deployment with Jenkins for PCF
Continuous Integration and Deployment with Jenkins for PCF
 
How to Never Leave Your Deployment Unattended
How to Never Leave Your Deployment UnattendedHow to Never Leave Your Deployment Unattended
How to Never Leave Your Deployment Unattended
 
Cloud Foundry Monitoring How-To: Collecting Metrics and Logs
Cloud Foundry Monitoring How-To: Collecting Metrics and LogsCloud Foundry Monitoring How-To: Collecting Metrics and Logs
Cloud Foundry Monitoring How-To: Collecting Metrics and Logs
 
Smart Baggage Tracking: End-to-End Sensor-Based Solution
Smart Baggage Tracking: End-to-End Sensor-Based SolutionSmart Baggage Tracking: End-to-End Sensor-Based Solution
Smart Baggage Tracking: End-to-End Sensor-Based Solution
 
Navigating the Ecosystem of Pivotal Cloud Foundry Tiles
Navigating the Ecosystem of Pivotal Cloud Foundry TilesNavigating the Ecosystem of Pivotal Cloud Foundry Tiles
Navigating the Ecosystem of Pivotal Cloud Foundry Tiles
 
AI as a Catalyst for IoT
AI as a Catalyst for IoTAI as a Catalyst for IoT
AI as a Catalyst for IoT
 
Over-Engineering: Causes, Symptoms, and Treatment
Over-Engineering: Causes, Symptoms, and TreatmentOver-Engineering: Causes, Symptoms, and Treatment
Over-Engineering: Causes, Symptoms, and Treatment
 

Dernier

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Dernier (20)

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 

Hadoop Distributions: Bottlenecks and Tuning

Notes de l'éditeur

  1. вывод map, если в буфер не влазит то сброс на диск, потом merge-sort.в определенный момент 2х кратное превышение использования диска относительно вывода map
  2. данные гоняются по сети, нагрузка на io – disk read &amp; network
  3. вывод map, если в буфер не влазит то сброс на диск, потом merge-sort.в определенный момент 2х кратное превышение использования диска относительно вывода map
  4. Задачка: сколько записей и чтений на диск можно получить имея вывод X.идеально: X записали из map, X считали на этапе fetchсуровая реальность: write: X(spill) + X (merge-sort) + X (fetch/spill) = 3 Xread: X (merge-sort) + X (fetch) + X (toreducer) = 3 X
  5. Задачка: сколько записей и чтений на диск можно получить имея вывод X.идеально: X записали из map, X считали на этапе fetchсуровая реальность: write: X(spill) + X (merge-sort) + X (fetch/spill) = 3 Xread: X (merge-sort) + X (fetch) + X (toreducer) = 3 X
  6. увеличим количество машин в 2 раза, а заодно и в параметрах проставим в 2 раза больше map и reducemap и reduce =&gt; eachother =&gt; в 4 раза больше коннектов на получение данных =&gt; лимиты на обработку handlers, на самой датанодеВЫВОД: количество одновременно запущенных map/reduceинстансов должно определяться в первую очередь задачей, линейное масштабирование это сказка
  7. увеличим количество машин в 2 раза, а заодно и в параметрах проставим в 2 раза больше map и reducemap и reduce =&gt; eachother =&gt; в 4 раза больше коннектов на получение данных =&gt; лимиты на обработку handlers, на самой датанодеВЫВОД: количество одновременно запущенных map/reduceинстансов должно определяться в первую очередь задачей, линейное масштабирование это сказка
  8. 2) увеличим блок данных для map =&gt; выскочили за размеры буфера =&gt; лишний spill на диск =&gt; больше дискового io =&gt; все медленней. ВЫВОД: размер блока для обработки на вход map должен быть достаточно большим чтобы заполнить буфер, но не больше, иначе лишняя активность на диске
  9. 2) увеличим блок данных для map =&gt; выскочили за размеры буфера =&gt; лишний spill на диск =&gt; больше дискового io =&gt; все медленней. ВЫВОД: размер блока для обработки на вход map должен быть достаточно большим чтобы заполнить буфер, но не больше, иначе лишняя активность на диске
  10. 3) увеличим размер кеша на map/reduce =&gt; ограничения размера для буфера в jvm (больше 2х гб на массив не выделить)Тут уже ничего не поделать, нужно учитывать что у map/reduce функций есть свои лимиты и они легко достижимы
  11. компрессия =&gt; размен cpu на diskio =&gt; snappy, достаточно шустрое решение для потокового сжатия
  12. Combiner - не всегда возможно использовать в лоб (например мы считаем с помощью hive/pig) или у нас веселая функция
  13. incorrect
  14. incorrect
  15. правильное решение, но требует дополнительных манипуляций на всех уровнях: 1) меняем MapOutputFormat (в значении не просто число, а сумма свернутых чисел и количество чисел для получения текущей суммы)2) отдельная функция для Combine