BIG DATA 
By Kay Burn
Big Data is the ocean of information we swim in every day: vast zettabytes of data flowing from our computers, mobile devices and machine sensors.

With the right solutions, organizations can dive into all of that data and gain valuable insights that were previously unimaginable.

More data may lead to more accurate analyses, and more accurate analyses may lead to more confident decision making. Better decisions can in turn mean greater operational efficiency, lower costs and reduced risk.
• Lower data storage costs by keeping less important data on Hadoop clusters and combining the data warehouse with Hadoop clusters.
• Unearth patterns behind leakages and issues by identifying their true causes and catching fraud and abuse cases.
• Make informed decisions by pinpointing product buzz on social media and the web, lowering the cost of bringing products to market and of the product development lifecycle.
• Differentiate from competitors by using insight to align with customers' needs.
• Increase the customer base with targeted campaigns via social media and market analysis: spot complaints from competitors' customers on social media, then identify and target those unsatisfied customers. (See: Big Data Blog - Kay Burn)
• Generate sales leads by identifying customer needs from web pages, blogs and social media.
• Gain a deeper understanding of customers' personalities, personas and profiles from Facebook, LinkedIn and Twitter, which helps to create new service streams.
• Build accurate action plans for new products, business strategy and complaint handling.
Big Data Velocity deals with the pace at which data flows in from sources such as business processes, machines, networks and human interaction with social media sites, mobile devices, etc.

Volume refers to the enormous quantity of data involved. Data is generated by machines, networks and human interaction on systems like social media, so the volume of data to be analysed is massive.

Variety refers to the many sources and types of data, both structured and unstructured. We used to store data from sources like spreadsheets and databases; now data also comes in the form of emails, photos, videos, monitoring-device readings, PDFs, audio, etc.

Veracity refers to the biases, noise and abnormalities in data: is the data being stored and mined actually meaningful to the problem being analysed?
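Veracity is the V that is easiest to check in code before any analysis starts. A minimal sketch in Python (3.9+), assuming readings arrive as dictionaries with hypothetical `sensor_id`, `timestamp` and `temp_c` fields and that a plausible range of -40 to 60 °C is known in advance; it flags missing, out-of-range and duplicate records rather than silently analysing them.

```python
# Hypothetical plausible range for a temperature sensor, in degrees Celsius.
VALID_RANGE = (-40.0, 60.0)


def veracity_issues(records: list[dict]) -> list[tuple[int, str]]:
    """Return (index, reason) pairs for records that look unreliable."""
    issues = []
    seen = set()
    for i, rec in enumerate(records):
        value = rec.get("temp_c")
        key = (rec.get("sensor_id"), rec.get("timestamp"))
        if value is None:
            issues.append((i, "missing value"))
        elif not VALID_RANGE[0] <= value <= VALID_RANGE[1]:
            issues.append((i, "out of range"))
        if key in seen:  # same sensor reporting the same timestamp twice
            issues.append((i, "duplicate"))
        seen.add(key)
    return issues


readings = [
    {"sensor_id": "a", "timestamp": 1, "temp_c": 21.5},
    {"sensor_id": "a", "timestamp": 2, "temp_c": None},
    {"sensor_id": "b", "timestamp": 1, "temp_c": 999.0},
    {"sensor_id": "a", "timestamp": 1, "temp_c": 21.5},
]
print(veracity_issues(readings))
# [(1, 'missing value'), (2, 'out of range'), (3, 'duplicate')]
```

Real pipelines usually add statistical outlier detection on top of fixed range rules, but simple rules like these already catch a lot of noise.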
[Chart: Big Data Contract Jobs. Big Data Contract Job Landscape in London, 7 days, source: www.jobsite.co.uk. Skills shown: Big Data, Scala, Hadoop, Spark, NoSQL, MongoDB, Cassandra, MapReduce, Cloudera, CouchDB.]

[Chart: Big Data Contract Jobs. Big Data Contract Job Landscape in London, 7 days, source: www.indeed.co.uk. Skills shown: MapReduce, Spark, Cassandra, Scala, Hadoop, MongoDB, NoSQL, Big Data (axis 0 to 100).]
[Chart: Big Data Candidates. Big Data Contract Candidate Landscape in London, 7 days, source: www.jobsite.co.uk.]

[Chart: Big Data Candidate Landscape. Number of candidates in Project People's rapidly growing, qualified and clean Big Data Contract Candidate Database, for MongoDB, Big Data, Scala, NoSQL, Hadoop, Cassandra and Spark. Values shown: 4918, 3748, 2630, 2232, 1953, 815, 147.]
Hadoop is a natural career progression route for Java professionals.
Hadoop is a Java-based framework, written mostly in Java.
The combination of Hadoop and Java skills is the number one combination in demand among all Hadoop jobs.
Java skills come into play when writing the following in Hadoop:
• MapReduce programs in Java
• User Defined Functions (UDFs) for Pig and Hive scripts in Hadoop applications
• Client applications for HBase
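The MapReduce model that these Java jobs implement can be sketched in a few lines. This is a single-process Python illustration of the map, shuffle and reduce phases for a word count, not Hadoop's Java API; on a real cluster Hadoop runs many mappers and reducers in parallel and performs the shuffle across the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as Hadoop does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


print(word_count(["big data is big", "data is everywhere"]))
# {'big': 2, 'data': 2, 'everywhere': 1, 'is': 2}
```

Because each mapper only sees its own lines and each reducer only sees one key's values, both phases can be spread over many machines, which is what Hadoop does.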
Hadoop is a free, open-source, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.

Instead of relying on expensive, proprietary hardware and separate systems to store and process data, Hadoop enables distributed parallel processing of huge amounts of data across inexpensive, industry-standard servers that both store and process the data. A cluster scales out simply by adding more servers, so even very large data sets become manageable.

Hadoop was initially inspired by papers Google published on its approach to handling an avalanche of data (the Google File System and MapReduce papers), and has since become the de facto standard for storing, processing and analysing hundreds of terabytes, and even petabytes, of data.

Hadoop can provide fast and reliable analysis of both structured and unstructured data.

Imagine you had a file that was larger than your PC's capacity. You could not store that file on that one machine. Hadoop's file system, HDFS, splits a file into fixed-size blocks and spreads them, with copies, across the nodes of a cluster, so you can store files bigger than any single node or server could hold.
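A rough sketch of that idea in Python: split a file's bytes into fixed-size blocks and give each block several replicas on different nodes. The block size and node names are tiny hypothetical values for illustration; HDFS defaults to 128 MB blocks and three replicas, and its real placement policy is rack-aware rather than the simple round-robin used here.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the data into consecutive blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin (simplified)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


file_bytes = b"x" * 10  # pretend this file is too big for any one node
blocks = split_into_blocks(file_bytes, block_size=4)
print([len(b) for b in blocks])  # [4, 4, 2]
print(place_replicas(len(blocks), ["node1", "node2", "node3", "node4"], replication=3))
# {0: ['node1', 'node2', 'node3'], 1: ['node2', 'node3', 'node4'], 2: ['node3', 'node4', 'node1']}
```

Keeping several copies of every block means the loss of one server does not lose any part of the file.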
There are four categories of NoSQL database:
• Key Value
• Document
• Column Family
• Graph

Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or "key") together with its value.

Document databases pair each key with a complex data structure known as a document.

Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets. They organize data into column families rather than fixed table schemas, so each row can hold a large, flexible set of columns.

Graph stores are used to store information about networks, such as social connections.

With NoSQL databases you can mix and match to create a database solution that is tailored to the business's needs.
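The four data models can be pictured with plain Python (3.9+) structures. This illustrates how each model shapes the data, not the API of any particular database; the keys, customers and fields are hypothetical.

```python
# Key-value: an opaque value looked up by a single key.
kv_store: dict[str, str] = {"session:42": "user=ann;cart=3"}

# Document: each key maps to a nested, self-describing structure.
documents: dict[str, dict] = {
    "customer:1": {"name": "Ann", "orders": [{"id": 7, "total": 19.99}]},
}

# Column family: row key -> column name -> value; rows may have different columns.
column_family: dict[str, dict[str, str]] = {
    "customer:1": {"name": "Ann", "city": "Bristol"},
    "customer:2": {"name": "Raj", "twitter": "@raj"},
}

# Graph: who follows whom, i.e. edges describing social connections.
follows: dict[str, set[str]] = {"ann": {"raj"}, "raj": {"lee"}, "lee": {"ann"}}


def friends_of_friends(graph: dict[str, set[str]], person: str) -> set[str]:
    """People reachable in exactly two hops who are not already direct contacts."""
    direct = graph.get(person, set())
    two_hops = {f2 for f in direct for f2 in graph.get(f, set())}
    return two_hops - direct - {person}


print(kv_store["session:42"])                       # user=ann;cart=3
print(documents["customer:1"]["orders"][0]["total"])  # 19.99
print(sorted(column_family["customer:2"]))          # ['name', 'twitter']
print(friends_of_friends(follows, "ann"))           # {'lee'}
```

The graph query shows why graph stores suit social data: following relationships hop by hop is the natural operation, whereas a key-value store only supports lookup by key.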
Python is an excellent choice for data scientists' day-to-day activities because it provides extensive libraries.

Python is a powerful, flexible, open-source language that is easy to learn and easy to use, and it has powerful libraries for data manipulation and analysis.

It is a general-purpose programming language that is also easy to use for analytical and quantitative computing.

Python has been used in scientific computing for many years.

Python is one of the most popular languages in the world, ranking higher than Perl, Ruby and JavaScript in some popularity indexes.
Scala, a scalable language combining functional and object-oriented programming, has been running on the Java Virtual Machine for several years now and enjoys adoption from enterprises and start-ups alike.
• It Runs on the Java Virtual Machine
• It is More Concise and Readable than Java
• It is Easy to Learn and "Exciting"
• It Solves Functional Problems

Functional programming
Functional programming favours pure functions with no side effects: a function takes input and produces output, and that is all. This makes it easier to write programs with fewer errors that can scale or be executed in parallel. Scala code can process data whether it is structured or unstructured.

Object-oriented programming (OOP)
This helps produce programs that are easier to read and maintain.

Brevity
Less code means fewer bugs and less time spent on maintenance.

Static types
Scala is statically typed but supports type inference that goes further than Java's: the compiler works out the types of most values and expressions itself, so you write fewer type annotations while type errors are still caught at compile time.
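The point that side-effect-free functions are easy to run in parallel holds in any language. A minimal sketch, in Python rather than Scala to keep one language across these examples: because the hypothetical `normalise` function depends only on its input, mapping it over the data in a thread pool gives exactly the same result as a plain sequential map.

```python
from concurrent.futures import ThreadPoolExecutor


def normalise(word: str) -> str:
    """A pure function: the output depends only on the input, and nothing else changes."""
    return word.strip().lower()


words = ["  Hadoop", "SPARK ", "Scala", " NoSQL "]

sequential = list(map(normalise, words))
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(normalise, words))  # results come back in input order

print(sequential == parallel)  # True
print(parallel)  # ['hadoop', 'spark', 'scala', 'nosql']
```

If `normalise` instead updated some shared state, the parallel run could interleave those updates unpredictably; purity is what makes the two runs interchangeable.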
Apache Spark can run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk, according to the Spark project.
Write applications quickly in Java, Scala or Python: Spark provides simple, easy-to-understand programming APIs for building applications at a rapid pace.
Combine SQL, streaming and complex analytics: Spark powers a stack of high-level tools (Spark SQL, Spark Streaming, MLlib and GraphX).
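Spark programs are written as a chain of transformations (such as `map` and `filter`) that are only evaluated when an action (such as `collect` or `reduce`) asks for a result; in PySpark this looks like `sc.parallelize(data).map(f).filter(g).collect()`. The `MiniRDD` class below is a hypothetical, single-machine imitation of that style in plain Python, written only to show the lazy-evaluation idea. It is not Spark and does not distribute or cache anything.

```python
import functools
from typing import Callable, Iterable, Iterator


class MiniRDD:
    """Hypothetical toy imitation of Spark's RDD: lazy transformations, eager actions."""

    def __init__(self, source: Callable[[], Iterator]):
        self._source = source  # zero-argument function producing a fresh iterator

    @classmethod
    def parallelize(cls, data: Iterable) -> "MiniRDD":
        items = list(data)
        return cls(lambda: iter(items))

    # Transformations: return a new MiniRDD without computing anything yet.
    def map(self, f: Callable) -> "MiniRDD":
        return MiniRDD(lambda: (f(x) for x in self._source()))

    def filter(self, pred: Callable) -> "MiniRDD":
        return MiniRDD(lambda: (x for x in self._source() if pred(x)))

    # Actions: actually run the pipeline and return a result.
    def collect(self) -> list:
        return list(self._source())

    def reduce(self, f: Callable):
        return functools.reduce(f, self._source())


calls = []


def square(x: int) -> int:
    calls.append(x)  # record when the function really runs
    return x * x


pipeline = MiniRDD.parallelize(range(1, 6)).map(square).filter(lambda x: x % 2 == 1)
print(calls)                                # [] (nothing has run yet)
print(pipeline.collect())                   # [1, 9, 25]
print(pipeline.reduce(lambda a, b: a + b))  # 35
```

Deferring work until an action is called lets a real engine like Spark see the whole pipeline first and plan how to run it across a cluster, keeping intermediate data in memory where possible.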
[Figure: Hadoop Ecosystem Components Example]
Kay Burn
My name is Kayleigh, or Kay for short. I am a Senior Big Data Consultant at Project People, providing global recruitment solutions within Big Data, Data Science, Business Intelligence & Insight.
Call me on 01179087000 or 07803415865 to discuss your next Big Data project.
Email kay.burn@projectpeople.com
Check out my blog here: http://kayburn.wix.com/southwest
