What is Big Data?

Big Data is the ocean of information we swim in every day: zettabytes of data flowing from our computers, mobile devices, and machine sensors.
With the right solutions, organizations can dive into all of this data and gain valuable insights that were previously unimaginable.
More data may lead to more accurate analyses, and more accurate analyses may lead to more confident decision making. Better decisions can mean greater operational efficiencies, cost reductions and reduced risk.

• Lower data storage costs by keeping less important data on Hadoop clusters and amalgamating the data warehouse with Hadoop clusters.
• Unearth patterns for leakages and issues by identifying the true causes of issues and catching fraud and abuse cases.
• Make informed decisions by pinpointing product buzz from social media and the web, to lower the cost of bringing a product to market and of the product development lifecycle.
• Differentiate from competitors by using insight to align to customers' needs.
• Increase the customer base with targeted campaigns via social media and market analysis: identify competitors' customers' complaints via social media, and identify and target unsatisfied customers.
• Generate sales leads by identifying customer needs from web pages, blogs and social media.
• Gain a deeper understanding of customers' personalities, personas and profiles from Facebook, LinkedIn and Twitter, which helps to create new service streams.
• Build accurate action plans for new products, business strategy and complaints.

Velocity deals with the pace at which data flows in from sources such as business processes, machines, networks and human interaction with things like social media sites and mobile devices.
Volume refers to the enormous quantity of data. Data is generated by machines, networks and human interaction on systems like social media, so the volume of data to be analysed is massive.
Variety refers to the many sources and types of data, both structured and unstructured. We used to store data from sources like spreadsheets and databases. Now data comes in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc.
Veracity refers to the biases, noise and abnormality in data. Is the data that is being stored and mined meaningful to the problem being analysed?

[Chart: Big Data Contract Job Landscape in London, 7 days (source: www.jobsite.co.uk). Share of Big Data contract jobs by skill: Big Data, Scala, Hadoop, Spark, NoSQL, MongoDB, Cassandra, MapReduce, Cloudera, CouchDB.]
[Chart: Big Data Contract Job Landscape in London, 7 days (source: www.indeed.co.uk). Number of Big Data contract jobs by skill: MapReduce, Spark, Cassandra, Scala, Hadoop, MongoDB, NoSQL, Big Data.]

[Chart: Big Data Contract Candidate Landscape in London, 7 days (source: www.jobsite.co.uk). Number of Big Data candidates.]
Project People’s rapidly growing, qualified and clean Big Data Contract Candidate Database
[Chart: Big Data Candidate Landscape, number of candidates. Categories shown: MongoDB, Big Data, Scala, NoSQL, Hadoop, Cassandra, Spark. Data labels: 4918, 3748, 2630, 2232, 1953, 815, 147.]

Hadoop is a natural career progression route for Java professionals.
Hadoop is a Java-based framework, written mainly in Java.
The combination of Hadoop and Java skills is the number one combination in demand among all Hadoop jobs.
Java skills come in handy when writing code for the following in Hadoop:
• MapReduce programming using Java
• User Defined Functions in Pig and Hive scripts of Hadoop applications
• Client applications in HBase
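Production MapReduce jobs on Hadoop are usually written in Java, but the programming model itself can be sketched in a few lines of plain Python. The example below is only an illustration of the map, shuffle and reduce phases running on a single machine; the function names and the sample sentences are made up for this sketch and are not part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single (key, total) result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, invented for this sketch.
SAMPLE_LINES = ["big data is big", "data flows in"]

if __name__ == "__main__":
    print(word_count(SAMPLE_LINES))
```

In a real cluster the map and reduce functions would run on many servers at once, each working on its own slice of the data; here everything runs in one process to keep the idea visible.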

Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
Instead of relying on expensive, proprietary hardware and different systems to store and process data, Hadoop enables distributed parallel processing of huge amounts of data across inexpensive, industry-standard servers that both store and process the data, and it scales out as more servers are added. With Hadoop, no data is too big.
Hadoop was initially inspired by papers published by Google outlining its approach to handling an avalanche of data, and has since become the de facto standard for storing, processing and analysing hundreds of terabytes, and even petabytes, of data.
Hadoop can provide fast and reliable analysis of both structured and unstructured data.
Imagine you had a file that was larger than your PC's capacity. You could not store that file, right? Hadoop lets you store files bigger than what can be stored on one particular node or server, so you can store very large files.
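The idea of storing a file that is bigger than any single server can be sketched as follows. This is a simplified illustration, not how Hadoop's file system is actually implemented: the block size, the node names and the round-robin placement are all assumptions chosen for the example, and replication is left out entirely.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into consecutive fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, nodes):
    """Assign blocks to nodes in round-robin order (an assumed, simplistic policy)."""
    placement = {node: [] for node in nodes}
    for index, block in enumerate(blocks):
        node = nodes[index % len(nodes)]
        placement[node].append((index, block))
    return placement


def read_file(placement):
    """Reassemble the original bytes by collecting blocks from every node in index order."""
    indexed = [item for blocks in placement.values() for item in blocks]
    indexed.sort(key=lambda item: item[0])
    return b"".join(block for _, block in indexed)


if __name__ == "__main__":
    data = bytes(range(10))                      # a 10-byte "file"
    blocks = split_into_blocks(data, block_size=4)
    placement = place_blocks(blocks, ["node-a", "node-b"])  # hypothetical node names
    for node, held in placement.items():
        print(node, sum(len(block) for _, block in held), "bytes")
    print(read_file(placement) == data)
```

No single node holds the whole file, yet the file can still be read back in full by fetching the blocks in order.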

There are four categories of NoSQL database:
• Key-value
• Document
• Column family (wide-column)
• Graph
With NoSQL databases you can mix and match to create a database solution that is tailored to the business's needs.
Document databases pair each key with a complex data structure known as a document.
Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.
Graph stores are used to store information about networks, such as social connections.
Key-value stores are the simplest NoSQL databases.
Every single item in the database is stored as an
attribute name (or "key"), together with its value.
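The four data models described above can be pictured with ordinary Python dictionaries. This is only a sketch of the shapes of the data, following the descriptions on this slide; it does not use any real NoSQL product, and all keys, names and values are invented for the example.

```python
# Key-value: every item is an attribute name ("key") together with its value.
key_value_store = {"session:42": "logged_in"}

# Document: each key is paired with a complex, nested data structure.
document_store = {
    "customer:1": {
        "name": "Ada",
        "emails": ["ada@example.com"],
        "address": {"city": "London"},
    },
}

# Column family: the values of one column are kept together, instead of rows.
column_family_store = {
    "name": {"row1": "Ada", "row2": "Grace"},
    "city": {"row1": "London", "row2": "Bristol"},
}

# Graph: each person maps to the set of people they are connected to.
graph_store = {
    "Ada": {"Grace"},
    "Grace": {"Ada", "Linus"},
    "Linus": {"Grace"},
}


def column_values(store, column):
    """Read a whole column without touching any other column."""
    return list(store[column].values())


def friends_of_friends(graph, person):
    """Return people two steps away who are not already direct connections."""
    direct = graph[person]
    reachable = set()
    for friend in direct:
        reachable |= graph[friend]
    return reachable - direct - {person}


if __name__ == "__main__":
    print(key_value_store["session:42"])
    print(document_store["customer:1"]["address"]["city"])
    print(column_values(column_family_store, "city"))
    print(friends_of_friends(graph_store, "Ada"))
```

Each model makes a different kind of question cheap to answer: a lookup by key, a nested record, a scan of one column, or a walk along connections.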

Python is an excellent choice for Data Scientists to do their day-to-day activities as it provides extensive libraries.
Python is a powerful, flexible, open-source language that is easy to learn, easy to use, and has powerful libraries for data manipulation and analysis.
It is a general-purpose programming language as well as being easy to use for analytical and quantitative computing.
Python has been used in scientific computing for many years.
Python is one of the most popular languages in the world, ranking higher than Perl, Ruby, and JavaScript by a wide margin.
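As a small taste of data manipulation in Python, the sketch below loads a few rows of CSV text and summarises them using only modules from the standard library. The sales figures are made up for the example, and the function names are hypothetical; data scientists would often reach for third-party libraries for larger jobs.

```python
import csv
import io
import statistics
from collections import Counter

# Invented sample data for this sketch.
RAW = """region,sales
north,120
south,80
north,100
east,60
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def sales_by_region(rows):
    """Total the sales column for each region."""
    totals = Counter()
    for row in rows:
        totals[row["region"]] += int(row["sales"])
    return dict(totals)


def mean_sales(rows):
    """Average sale value across all rows."""
    return statistics.mean([int(row["sales"]) for row in rows])


if __name__ == "__main__":
    rows = load_rows(RAW)
    print(sales_by_region(rows))
    print(mean_sales(rows))
```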

Scala, a scalable language combining functional and object-oriented programming, has been running on the Java Virtual Machine for several years now, enjoying adoption from enterprises and start-ups alike.
• It runs on the Java Virtual Machine
• It is more concise and readable than Java
• It is easy to learn and "exciting"
• It solves functional problems
Functional programming
The advantage of functional programming is that pure functions have no side effects: a function takes input and produces output, that is all. This makes it easier to write correct programs that can scale or be executed in parallel. Scala does not need to know whether the data is structured or unstructured.
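The point about side effects is not specific to Scala, so it is sketched here in Python for illustration. The function below depends only on its input and changes nothing outside itself, which is why running it sequentially or across several threads gives the same answer. The function name and the sample strings are invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def normalise(text):
    """Pure function: the result depends only on the argument, and nothing else is modified."""
    return text.strip().lower()


def run_sequential(items):
    """Apply the function to each item one after another."""
    return [normalise(item) for item in items]


def run_parallel(items, workers=3):
    """Apply the same function across a pool of threads; map preserves input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(normalise, items))


# Hypothetical sample input.
SAMPLE = ["  Hadoop ", "SPARK", " Scala"]

if __name__ == "__main__":
    print(run_sequential(SAMPLE))
    print(run_parallel(SAMPLE))
    print(SAMPLE)  # the input list is left untouched
```

Because `normalise` never touches shared state, no locks or coordination are needed when the work is spread across workers.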
Object-oriented programming (OOP)
This helps produce programs that are easier to read and maintain.
Brevity
Less code means fewer bugs and less time spent on maintenance.
Static types
Scala is statically typed and supports type inference, which means the compiler can work out the type of most values and expressions without explicit type annotations. Its type inference is more extensive than Java's.

• Spark can run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
• Write applications quickly in Java, Scala or Python: Spark provides simple and easy-to-understand programming APIs that can be used to build applications at a rapid pace.
• Combine SQL, streaming, and complex analytics. Spark powers a stack of high-level tools.

Hadoop Ecosystem Components Example

Kay Burn
My name is Kayleigh, or Kay for short. I am a Senior Big Data Consultant at Project People, providing global recruitment solutions within Big Data, Data Science, Business Intelligence & Insight.
Call me on 01179087000 or 07803415865 to discuss your next Big Data project.
Email kay.burn@projectpeople.com
Check out my blog here:
http://kayburn.wix.com/southwest
