Finding business value in Big Data
“What exactly is Big Data and why should I care?”
James Serra
Big Data Evangelist
Microsoft
JamesSerra3@gmail.com
Other Presentations
 Building an Effective Data Warehouse Architecture
Reasons for building a DW and the various approaches and DW concepts (Kimball vs Inmon)
 Building a Big Data Solution (Building an Effective Data Warehouse
Architecture with Hadoop, the cloud and MPP)
Explains what Big Data is, its benefits including use cases, and how Hadoop, the cloud, and MPP fit in
 Finding business value in Big Data (What exactly is Big Data and why
should I care?)
Very similar to “Building a Big Data Solution” but target audience is business users/CxO instead of architects
 How does Microsoft solve Big Data?
Covers the Microsoft products that can be used to create a Big Data solution
 Modern Data Warehousing with the Microsoft Analytics Platform System
The next step in data warehouse performance is APS, an MPP appliance
 Power BI, Azure ML, Azure HDInsight, Azure Data Factory, etc.
Deep dives into the various Microsoft Big Data related products
About Me
 Business Intelligence Consultant, in IT for 28 years
 Microsoft, Big Data Evangelist
 Owner of Serra Consulting Services, specializing in end-to-end Business Intelligence and Data
Warehouse solutions using the Microsoft BI stack
 Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM
architect, PDW developer
 Been perm, contractor, consultant, business owner
 Presenter at PASS Business Analytics Conference and PASS Summit
 MCSE for SQL Server 2012: Data Platform and BI
 SME for SQL Server 2012 certs
 Contributing writer for SQL Server Pro magazine
 Blog at JamesSerra.com
 SQL Server MVP
 Author of book “Reporting with Microsoft SQL Server 2012”
I tried understanding Big Data…
And ended up passed-out drunk in a Denny’s
parking lot
Let’s prevent that from happening…
Agenda
 Overview of Big Data and Analytics
 Use cases
 Data Lake
 Hadoop and its role
 IoT and real-time data
 Modern data warehouse
 Federated querying
 Data warehouse and the cloud
 Symmetric Multiprocessing (SMP) vs. Massively Parallel Processing (MPP)
Overview of Big Data and Analytics
What differentiates today’s
thriving organizations?
Data.
What is Big Data, really?
Data in all forms & sizes
is being generated
faster than ever before
Capture & combine it
for new insights & better,
faster decisions
Harness the growing and changing nature of data
Collect any data
Structured, Streaming, Unstructured
Challenge is combining transactional data stored in relational databases with less structured data
Big Data = All Data
“Get the right information to the right people at the right time in the right format”
An illustration of the velocity of data created
Kalakota, R. (2012, October 22). Sizing “Mobile + Social” Big Data Stats. Retrieved from http://practicalanalytics.wordpress.com/
The three V’s
Complex implementations
Enterprise data warehouse
Spreadmarts
Siloed data
Hadoop
Dashboards
Ad hoc analysis
Machine learning
OLAP
Any data
In-memory
Internet of Things
Innovation
Transactional systems
ETL
Operational reporting
Value
Technology innovation accelerates value
Discover and connect
Answering new questions
Value
Put data to work for everyone
in your organization
Inspire innovation
Accelerate decision-making
Learn from & share insights
[Chart: Units Sold, Discounts, and Profit before Tax]
Embrace Big Data across your business
[Sample dashboards: Revenue and Target by Region; Departments Headcount; XT2000 Status List; Discounts (Millions) by product]
Sales: Improve revenue performance
HR: Maximize employee engagement
Marketing: Build deeper customer relationships
Finance: Impact your company’s bottom line
The Data Divide
80% of data stored
70% of data generated by customers
<0.5% being operationalized
0.5% being analyzed
3% prepared for analysis
IDC says that right now, about 22% of data is useful. By 2020 that number will climb to 37%.
Major Fail
Gartner: “Through 2017, 60% of big-data projects will fail to go beyond piloting and experimentation”
Paradigm4: 76% of those who have used Hadoop or Apache Spark complained of significant limitations
Analytics Solution
Capture and integrate data from multiple internal and external sources
Derive insight from data with rich, interactive dashboards and reports using the tools you know
Put insight into action to increase efficiency and constituent satisfaction
Advanced Analytics Defined
The end result of Big Data - Icing on the cake
Use Cases
Let’s set off light bulbs in your head
Recommendation engines
Smart meter monitoring
Equipment monitoring
Advertising analysis
Life sciences research
Fraud detection
Healthcare outcomes
Weather forecasting for business planning
Oil & Gas exploration
Social network analysis
Churn analysis
Traffic flow optimization
IT infrastructure & Web App optimization
Legal discovery and document archiving
Data Analytics is needed everywhere
Intelligence Gathering
Location-based tracking & services
Pricing Analysis
Personalized Insurance
Personalized policies can reduce costs & better meet customer needs
Insurance companies can help (and some have already started helping) their customers with truly personalized insurance plans tailored to their needs and risks
Personalized Insurance
Insurance companies can collect real-time data from in-car sensors and combine it with geolocation and in-house systems. With information such as distance and speed, they can provide personalized insurance offers based on driving amount, risk, and other factors, for a truly personalized plan that may often save drivers money
$1,600/yr.: US national avg. car insurance premium
The vast amount of current and ever-growing customer purchase, rating and click data can all be collected and managed with a Hadoop-based solution to pinpoint preferences based on purchase history and demographics, and to serve useful and compelling cross-sell and up-sell recommendations.
Recommendation Engines
Significantly improve up-sell and cross-sell opportunities
Retailers can use customer purchase & rating information to serve recommendations to current customers, based on similarities across many dimensions
158 items sold/second by Amazon.com on 11/29/2010 (Cyber Monday)
Retailers – whether large, small, online or in-store – can improve margins with more detailed pricing analysis. When a customer is in range of a transaction (either in the store, online or perhaps passing by), make personalized offers, real-time price quotes, or other frequent-buyer perks to help bring more customers to the store and improve repeat business.
Pricing Analysis
Significantly improve sales and customer satisfaction
Retailers can use customer past purchase, preference, and demographic information to serve real-time custom pricing and instant discounts when the customer is near the store.
Up to 30%: additional price Mac users accepted for travel from Orbitz
Using data from the Weather Channel, Walmart can create targeted ads based on local weather, products in their nearby stores, and seasonal consumer desires. Walmart increased berry and steak sales as much as threefold when weather-targeted ads were run
Using Big Data to determine the best train schedules
Data Lake
What is a data lake?
A storage repository, usually Hadoop, that holds a vast amount of raw data in its native
format until it is needed.
• A place to store unlimited amounts of data in any format inexpensively
• Allows collection of data that you may or may not use later: “just in case”
• A way to describe any large data pool in which the schema and data requirements are not
defined until the data is queried: “just in time” or “schema on read”
• Complements EDW and can be seen as a data source for the EDW – capturing all data but
only passing relevant data to the EDW
• Frees up expensive EDW resources (storage and processing), especially for data refinement
• Allows for data exploration to be performed without waiting for the EDW team to model
and load the data
• Some processing is better done on Hadoop than in ETL tools like SSIS
• Also called bit bucket, staging area, landing zone or enterprise data hub (Cloudera)
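The “schema on read” idea above can be sketched in a few lines. This is a minimal illustration only: the records, field names, and the `read_with_schema` helper are all made up for the example, and a plain Python list stands in for files sitting in the lake.

```python
import json

# Raw events land in the lake exactly as produced; no schema is enforced on write.
# (Hypothetical records; the field names are assumptions for illustration.)
RAW_LINES = [
    '{"device": "meter-1", "kwh": "12.5", "ts": "2015-06-01T00:00:00"}',
    '{"device": "meter-2", "kwh": 7, "extra": {"firmware": "1.2"}}',
    'not even json',
]


def read_with_schema(lines, schema):
    """Apply a schema only at query time ("schema on read").

    `schema` maps a field name to a function that converts the raw value.
    Lines that cannot be parsed or lack a required field are skipped here,
    but they remain untouched in the raw store for other readers.
    """
    for line in lines:
        try:
            record = json.loads(line)
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (ValueError, KeyError, TypeError):
            continue


# The reader decides which fields matter and how to type them.
usage_schema = {"device": str, "kwh": float}
rows = list(read_with_schema(RAW_LINES, usage_schema))
total_kwh = sum(row["kwh"] for row in rows)
print(rows)
print(total_kwh)
```

A different consumer could read the same raw lines with a different schema (for example, only `device` and `extra`), which is the “just in time” flexibility the bullets describe.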
Current state of a data warehouse
Traditional Approaches
DATA SOURCES: CRM, ERP, OLTP, LOB
ETL
DATA WAREHOUSE: Star schemas, views, other read-optimized structures
BI AND ANALYTICS: Emailed, centrally stored Excel reports and dashboards
Well manicured, often relational sources
Known and expected data volume and formats
Little to no change
Complex, rigid transformations
Required extensive monitoring
Transformed historical data into read structures
Flat, canned or multi-dimensional access to historical data
Many reports, multiple versions of the truth
24 to 48h delay
MONITORING AND TELEMETRY
Current state of a data warehouse
Traditional Approaches
DATA SOURCES: CRM, ERP, OLTP, LOB
ETL
DATA WAREHOUSE: Star schemas, views, other read-optimized structures
BI AND ANALYTICS: Emailed, centrally stored Excel reports and dashboards
Increase in variety of data sources
Increase in data volume
Increase in types of data
Pressure on the ingestion engine
Complex, rigid transformations can no longer keep pace
Monitoring is abandoned
Delay in data, inability to transform volumes, or react to new sources
Repair, adjust and redesign ETL
Reports become invalid or unusable
Delay in preserved reports increases
Users begin to “innovate” to relieve starvation
MONITORING AND TELEMETRY
INCREASING DATA VOLUME
NON-RELATIONAL DATA
INCREASE IN TIME
STALE REPORTING
Data Lake Transformation (ELT not ETL)
New Approaches
All data sources are considered
Leverages the power of on-prem technologies and the cloud for storage and capture
Native formats, streaming data, big data
Extract and load, no/minimal transform
Storage of data in near-native format
Orchestration becomes possible
Streaming data accommodation becomes possible
Refineries transform data on read
Produce curated data sets to integrate with traditional warehouses
Users discover published data sets/services using familiar tools
DATA SOURCES: CRM, ERP, OLTP, LOB; FUTURE DATA SOURCES; NON-RELATIONAL DATA
EXTRACT AND LOAD
DATA LAKE
DATA REFINERY PROCESS (TRANSFORM ON READ): Transform relevant data into data sets
OTHER REFINERY PROCESSES
DATA WAREHOUSE: Star schemas, views, other read-optimized structures
BI AND ANALYTICS: Discover and consume predictive analytics, data sets and other reports
Hadoop and its role
What is Hadoop?
 Distributed, scalable system on commodity HW
 Composed of a few parts:
 HDFS – Distributed file system
 MapReduce – Programming model
 Other tools: Hive, Pig, SQOOP, HCatalog, HBase,
Flume, Mahout, YARN, Tez, Spark, Stinger, Oozie,
ZooKeeper, Storm
 Main players are Hortonworks, Cloudera, MapR
 WARNING: Hadoop, while ideal for processing huge
volumes of data, is inadequate for analyzing that
data in real time (companies do batch analytics
instead)
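To make the MapReduce programming model a little more concrete, here is a single-process word-count sketch. It is not Hadoop code: the function names are invented for this example, and a real cluster would run the map and reduce steps as distributed tasks over data stored in HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over all input splits."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Each string stands in for one input split processed by a separate mapper.
splits = ["big data is all data", "data in all forms"]
counts = word_count(splits)
print(counts)
```

Because each map call sees only its own split and each reduce call sees only one key, the work can be spread across many machines, which is why the model suits batch processing of large volumes.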
Core Services
[Diagram: Operational Services and Data Services. Components shown: HDFS, Sqoop, Flume, NFS, WebHDFS (load & extract), Oozie, Ambari, YARN, MapReduce, Hive & HCatalog, Pig, HBase, Falcon]
Hadoop Cluster
[Diagram: many nodes, each combining compute & storage]
Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware
Hortonworks Data Platform 2.2
Simply put, Hortonworks ties all the open source products together (20)
The real cost of Hadoop
http://www.wintercorp.com/tcod-report/
Use cases using Hadoop and a DW in combination
Bringing islands of Hadoop data together
Archiving data warehouse data to Hadoop (move)
(Hadoop as cold storage)
Exporting relational data to Hadoop (copy)
(Hadoop as backup/DR, analysis, cloud use)
Importing Hadoop data into data warehouse (copy)
(Hadoop as staging area, sandbox, Data Lake)
IoT and real-time data
What is the Internet of Things?
Things, Connectivity, Data, Analytics
IoT = sensor-acquired data
What is the Internet of Things (IoT)?
Internet-connected devices that can perceive the environment in some way, share their data, and communicate with
you. IoT is just a catch-all term for ways of using machine-generated data to create something useful.
- Each has a processor and a sensor to collect information
- Examples: heart monitoring implants, biochip transponders on farm animals, automobiles with built-in sensors,
field operation devices that assist firefighters in search and rescue
- Excludes computers, tablets, and smart phones
- But it’s in the sphere of business intelligence that IoT will really make a difference.
Cool possibilities
- When a milk carton is almost empty it will ping you when you are near a store
- An alarm clock that signals your coffee maker to start brewing when you wake up
- An embedded chip that monitors your vital signs and notifies a medical provider if they exceed a limit
Gartner: 10 billion devices connected to the internet today, 26B by 2020
At some point in the future, nearly every manmade object will contain a device that transmits data!
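As a rough sketch of what acting on sensor data in near-real time can look like, the snippet below watches a stream of readings and raises an alert when a rolling average crosses a limit. The readings, the limit of 100, and the `monitor` function are assumptions made up for illustration, not part of any IoT product.

```python
from collections import deque


def monitor(readings, limit, window=3):
    """Yield (index, rolling_average) whenever the rolling average exceeds `limit`.

    Averaging over the last `window` readings avoids alerting on a single spike.
    """
    recent = deque(maxlen=window)
    for index, value in enumerate(readings):
        recent.append(value)
        average = sum(recent) / len(recent)
        if len(recent) == window and average > limit:
            yield index, average


# Hypothetical heart-rate samples arriving one at a time from a device.
heart_rate = [72, 75, 74, 110, 128, 131, 90, 76]
alerts = list(monitor(heart_rate, limit=100))
print(alerts)
```

In a real deployment the list would be replaced by a live feed, and the alert would trigger a notification rather than a print.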
Modern Data Warehouse
Modern Data Warehouse
Think about future needs:
• Increasing data volumes
• Real-time performance
• New data sources and types
• Cloud-born data
• Multi-platform solution
• Hybrid architecture
Modern Data Warehouse Defined
Modern Data Warehouse
The Dream
The Reality
Federated Querying
Federated Querying
Other names: Data virtualization, logical data warehouse, data
federation, virtual database, and decentralized data warehouse.
A model that allows a single query to retrieve and combine data as it sits
from multiple data sources, so as to not need to use ETL or learn more
than one retrieval technology
Select… Result set
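The single-query idea can be tried on a small scale with SQLite, which lets one connection attach several database files and join across them. This is only a stand-in under stated assumptions: the two files, the table names, and the sample rows are invented, and real federation layers reach across different database engines rather than two SQLite files.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
crm_path = os.path.join(workdir, "crm.db")
sales_path = os.path.join(workdir, "sales.db")

# Two independent "source systems", each with its own database file.
crm_db = sqlite3.connect(crm_path)
crm_db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm_db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Contoso"), (2, "Fabrikam")])
crm_db.commit()
crm_db.close()

sales_db = sqlite3.connect(sales_path)
sales_db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
sales_db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 100.0), (1, 50.0), (2, 75.0)])
sales_db.commit()
sales_db.close()

# The "federation layer": one connection that sees both sources where they sit.
hub = sqlite3.connect(":memory:")
hub.execute("ATTACH DATABASE ? AS crm", (crm_path,))
hub.execute("ATTACH DATABASE ? AS sales", (sales_path,))

# One query combines both sources; no data was copied or transformed first.
result = hub.execute(
    """
    SELECT c.name, SUM(o.amount) AS total
    FROM crm.customers AS c
    JOIN sales.orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(result)
hub.close()
```

The person writing the query needs only one retrieval technology (SQL here) and does not need an ETL job to bring the two sources together.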
Federated Querying
[Diagram: a single query model spanning relational and non-relational data]
Sources shown: DB2, Oracle, MongoDB, SQL Server, EDW, Cloudera CDH (Linux), Hortonworks HDP, Windows Azure HDInsight
DW and the Cloud
Can I use the cloud with my DW?
• Public and private cloud
• Cloud-born data vs on-prem born data
• Transfer cost from/to cloud and on-prem
• Sensitive data on-prem, non-sensitive in cloud
• Look at hybrid solutions
TDWI Best Practices Report (2015)
SMP vs MPP
SMP vs MPP
MPP - Massively Parallel Processing
• Uses many separate CPUs running in parallel to execute a single program
• Shared Nothing: Each CPU has its own memory and disk (scale-out)
• Segments communicate using high-speed network between nodes
SMP - Symmetric Multiprocessing
• Multiple CPUs used to complete individual processes simultaneously
• All CPUs share the same memory, disks, and network controllers (scale-up)
• All SQL Server implementations up until now have been SMP
• Mostly, the solution is housed on a shared SAN
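The shared-nothing idea behind MPP can be sketched as: distribute the rows, let each node aggregate only its own slice, then merge the small partial results. The code below is a simplified illustration that runs the "nodes" one after another in a single process; the node count, column names, and sample rows are all assumptions, and a real MPP system would run each node in parallel on its own CPU, memory, and disk.

```python
from collections import defaultdict

NODE_COUNT = 4  # hypothetical number of compute nodes


def distribute(rows, node_count):
    """Hash-distribute rows so each node owns a disjoint slice of the data."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[row["customer_id"] % node_count].append(row)
    return nodes


def local_aggregate(node_rows):
    """Work done independently on one node, using only its own rows."""
    totals = defaultdict(float)
    for row in node_rows:
        totals[row["region"]] += row["amount"]
    return totals


def combine(partials):
    """Final step: merge the small per-node results into one answer."""
    merged = defaultdict(float)
    for partial in partials:
        for region, amount in partial.items():
            merged[region] += amount
    return dict(merged)


# Made-up sales rows: every third customer is in the South region.
sales = [
    {"customer_id": i, "region": "North" if i % 3 else "South", "amount": 10.0}
    for i in range(12)
]
nodes = distribute(sales, NODE_COUNT)
mpp_result = combine(local_aggregate(n) for n in nodes)
print(mpp_result)
```

Adding nodes shrinks each node's slice (scale-out), whereas an SMP system would have to run the whole aggregation against one shared pool of memory and disk (scale-up).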
DW SCALABILITY SPIDER CHART
[Spider chart with axes: “Query Freedom”, “Query Complexity”, “Data Freshness”, “Query Data Volume”, “Query Concurrency”, “Mixed Workload”, “Schema Sophistication”, “Data Volume”]
MPP – Multidimensional Scalability
SMP – Tunable in one dimension at the cost of other dimensions
The spiderweb depicts important attributes to consider when evaluating Data Warehousing options. Big Data support is the newest dimension.
When do you need an MPP solution?
• We need at least 3x query performance improvement
• We are near disk capacity and see a lot of growth in the upcoming years
• We need to support queries during our maintenance window
• We need to load data outside of our maintenance window
• We will spend a lot of money on FusionIO cards, SSDs, more SAN space, more
memory, faster CPUs, and clustering
Big Data is coming
Summary
• We live in an increasingly data-intensive world
• Much of the data stored online and analyzed today is more varied than the data
stored in recent years
• More of our data arrives in near-real time
This presents a large business opportunity. Are you ready for it?
Resources
 The Modern Data Warehouse: http://bit.ly/1xuX4Py
 Fast Track Data Warehouse Reference Architecture for SQL Server 2014: http://bit.ly/1xuX9m6
 Should you move your data to the cloud? http://bit.ly/1xuXbKU
 Presentation slides for Modern Data Warehousing: http://bit.ly/1xuXcP5
 Presentation slides for Building an Effective Data Warehouse Architecture: http://bit.ly/1xuXeX4
 Hadoop and Data Warehouses: http://bit.ly/1xuXfu9
 What is the Microsoft Analytics Platform System (APS)? http://bit.ly/1xuXipO
 Parallel Data Warehouse (PDW) benefits made simple: http://bit.ly/1xuXlSy
 What is Advanced Analytics? http://bit.ly/1LDklkB
 Azure Data Lake http://bit.ly/1LDkqEN
Q & A ?
James Serra, Big Data Evangelist
Email me at: JamesSerra3@gmail.com
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com (where this slide deck is posted under “Presentations”)

Contenu connexe

Tendances

DAS Slides: Enterprise Architecture vs. Data Architecture
DAS Slides: Enterprise Architecture vs. Data ArchitectureDAS Slides: Enterprise Architecture vs. Data Architecture
DAS Slides: Enterprise Architecture vs. Data ArchitectureDATAVERSITY
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Data Modeling for Big Data
Data Modeling for Big DataData Modeling for Big Data
Data Modeling for Big DataDATAVERSITY
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation Brett VanderPlaats
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
Data Lake Architecture
Data Lake ArchitectureData Lake Architecture
Data Lake ArchitectureDATAVERSITY
 
Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...
Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...
Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...Edureka!
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
Emerging Trends in Data Architecture – What’s the Next Big Thing
Emerging Trends in Data Architecture – What’s the Next Big ThingEmerging Trends in Data Architecture – What’s the Next Big Thing
Emerging Trends in Data Architecture – What’s the Next Big ThingDATAVERSITY
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)James Serra
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookJames Serra
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesAshraf Uddin
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
Snowflake for Data Engineering
Snowflake for Data EngineeringSnowflake for Data Engineering
Snowflake for Data EngineeringHarald Erb
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...DataScienceConferenc1
 
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...Certus Solutions
 
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...Cathrine Wilhelmsen
 
Snowflake Best Practices for Elastic Data Warehousing
Snowflake Best Practices for Elastic Data WarehousingSnowflake Best Practices for Elastic Data Warehousing
Snowflake Best Practices for Elastic Data WarehousingAmazon Web Services
 

Tendances (20)

DAS Slides: Enterprise Architecture vs. Data Architecture
DAS Slides: Enterprise Architecture vs. Data ArchitectureDAS Slides: Enterprise Architecture vs. Data Architecture
DAS Slides: Enterprise Architecture vs. Data Architecture
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Data Modeling for Big Data
Data Modeling for Big DataData Modeling for Big Data
Data Modeling for Big Data
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Data Lake Architecture
Data Lake ArchitectureData Lake Architecture
Data Lake Architecture
 
Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...
Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...
Data Analyst vs Data Engineer vs Data Scientist | Data Analytics Masters Prog...
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Big data
Big dataBig data
Big data
 
Emerging Trends in Data Architecture – What’s the Next Big Thing
Emerging Trends in Data Architecture – What’s the Next Big ThingEmerging Trends in Data Architecture – What’s the Next Big Thing
Emerging Trends in Data Architecture – What’s the Next Big Thing
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture Capabilities
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Snowflake for Data Engineering
Snowflake for Data EngineeringSnowflake for Data Engineering
Snowflake for Data Engineering
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
 
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
 
Big Data ppt
Big Data pptBig Data ppt
Big Data ppt
 
Snowflake Best Practices for Elastic Data Warehousing
Snowflake Best Practices for Elastic Data WarehousingSnowflake Best Practices for Elastic Data Warehousing
Snowflake Best Practices for Elastic Data Warehousing
 

En vedette

How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?James Serra
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategyJames Serra
 
Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data SolutionJames Serra
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudJames Serra
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBaseJames Serra
 
Benefits of the Azure cloud
Benefits of the Azure cloudBenefits of the Azure cloud
Benefits of the Azure cloudJames Serra
 
Azure Stream Analytics
Azure Stream AnalyticsAzure Stream Analytics
Azure Stream AnalyticsJames Serra
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemJames Serra
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databasesJames Serra
 
Should I move my database to the cloud?
Should I move my database to the cloud?Should I move my database to the cloud?
Should I move my database to the cloud?James Serra
 
Cortana Analytics Suite
Cortana Analytics SuiteCortana Analytics Suite
Cortana Analytics SuiteJames Serra
 
What exactly is Business Intelligence?
What exactly is Business Intelligence?What exactly is Business Intelligence?
What exactly is Business Intelligence?James Serra
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB James Serra
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseJames Serra
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)James Serra
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016James Serra
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesJames Serra
 
Overview on Azure Machine Learning
Overview on Azure Machine LearningOverview on Azure Machine Learning
Overview on Azure Machine LearningJames Serra
 
HA/DR options with SQL Server in Azure and hybrid
HA/DR options with SQL Server in Azure and hybridHA/DR options with SQL Server in Azure and hybrid
HA/DR options with SQL Server in Azure and hybridJames Serra
 

En vedette (20)

How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data Solution
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBase
 
Benefits of the Azure cloud
Benefits of the Azure cloudBenefits of the Azure cloud
Benefits of the Azure cloud
 
Azure Stream Analytics
Azure Stream AnalyticsAzure Stream Analytics
Azure Stream Analytics
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform System
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
Should I move my database to the cloud?
Should I move my database to the cloud?Should I move my database to the cloud?
Should I move my database to the cloud?
 
Cortana Analytics Suite
Cortana Analytics SuiteCortana Analytics Suite
Cortana Analytics Suite
 
What exactly is Business Intelligence?
What exactly is Business Intelligence?What exactly is Business Intelligence?
What exactly is Business Intelligence?
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use Cases
 
Overview on Azure Machine Learning
Overview on Azure Machine LearningOverview on Azure Machine Learning
Overview on Azure Machine Learning
 
HA/DR options with SQL Server in Azure and hybrid
HA/DR options with SQL Server in Azure and hybridHA/DR options with SQL Server in Azure and hybrid
HA/DR options with SQL Server in Azure and hybrid
 

Finding business value in Big Data

  • 1. Finding business value in Big Data “What exactly is Big Data and why should I care?” James Serra Big Data Evangelist Microsoft JamesSerra3@gmail.com
  • 2. Other Presentations
      - Building an Effective Data Warehouse Architecture: Reasons for building a DW and the various approaches and DW concepts (Kimball vs Inmon)
      - Building a Big Data Solution (Building an Effective Data Warehouse Architecture with Hadoop, the cloud and MPP): Explains what Big Data is, its benefits including use cases, and how Hadoop, the cloud, and MPP fit in
      - Finding business value in Big Data (What exactly is Big Data and why should I care?): Very similar to “Building a Big Data Solution” but target audience is business users/CxO instead of architects
      - How does Microsoft solve Big Data?: Covers the Microsoft products that can be used to create a Big Data solution
      - Modern Data Warehousing with the Microsoft Analytics Platform System: The next step in data warehouse performance is APS, an MPP appliance
      - Power BI, Azure ML, Azure HDInsight, Azure Data Factory, etc.: Deep dives into the various Microsoft Big Data related products
  • 3. About Me
      - Business Intelligence Consultant, in IT for 28 years
      - Microsoft, Big Data Evangelist
      - Owner of Serra Consulting Services, specializing in end-to-end Business Intelligence and Data Warehouse solutions using the Microsoft BI stack
      - Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW developer
      - Been perm, contractor, consultant, business owner
      - Presenter at PASS Business Analytics Conference and PASS Summit
      - MCSE for SQL Server 2012: Data Platform and BI
      - SME for SQL Server 2012 certs
      - Contributing writer for SQL Server Pro magazine
      - Blog at JamesSerra.com
      - SQL Server MVP
      - Author of book “Reporting with Microsoft SQL Server 2012”
  • 4. I tried understanding Big Data… And ended up passed-out drunk in a Denny’s parking lot Let’s prevent that from happening…
  • 5. Agenda
      - Overview of Big Data and Analytics
      - Use cases
      - Data Lake
      - Hadoop and its role
      - IoT and real-time data
      - Modern data warehouse
      - Federated querying
      - Data warehouse and the cloud
      - Symmetric Multiprocessing (SMP) vs. Massively Parallel Processing (MPP)
  • 6. Overview of Big Data and Analytics
  • 8. What is Big Data, really? Data in all forms & sizes is being generated faster than ever before. Capture & combine it for new insights & better, faster decisions.
  • 9. Harness the growing and changing nature of data. Collect any data: structured, unstructured, streaming. The challenge is combining transactional data stored in relational databases with less structured data. Big Data = All Data. Get the right information to the right people at the right time in the right format.
  • 10. An illustration of the velocity of data created Kalakota, R. (2012, October 22). Sizing “Mobile + Social” Big Data Stats. Retrieved from http://practicalanalytics.wordpress.com/
  • 12. Technology innovation accelerates value (chart of value against innovation). Chart labels: complex implementations, enterprise data warehouse, spreadmarts, siloed data, Hadoop, dashboards, ad hoc analysis, machine learning, OLAP, any data, in-memory, Internet of Things, transactional systems, ETL, operational reporting.
  • 13. Discover and connect Answering new questions Value
  • 14. Put data to work for everyone in your organization: inspire innovation, accelerate decision-making, learn from & share insights.
  • 15. Embrace Big Data across your business
      - Sales: Improve revenue performance
      - HR: Maximize employee engagement
      - Marketing: Build deeper customer relationships
      - Finance: Impact your company’s bottom line
      - [Sample dashboard panels: Units Sold, Discounts, and Profit before Tax; Revenue and Target by Region; Departments Headcount; XT2000 Status List]
  • 16. The Data Divide: 80% of data stored; 70% of data generated by customers; 3% prepared for analysis; 0.5% being analyzed; <0.5% being operationalized. IDC says that right now, about 22% of data is useful. By 2020 that number will climb to 37%.
  • 17. Major Fail
      - Gartner: “Through 2017, 60% of big-data projects will fail to go beyond piloting and experimentation”
      - Paradigm4: 76% of those who have used Hadoop or Apache Spark complained of significant limitations
  • 18. Analytics Solution
      - Capture and integrate data from multiple internal and external sources
      - Derive insight from data with rich, interactive dashboards and reports using the tools you know
      - Put insight into action to increase efficiency and constituent satisfaction
  • 21. The end result of Big Data - Icing on the cake
  • 23. Let’s set off light bulbs in your head
  • 24. Data Analytics is needed everywhere: recommendation engines, smart meter monitoring, equipment monitoring, advertising analysis, life sciences research, fraud detection, healthcare outcomes, weather forecasting for business planning, oil & gas exploration, social network analysis, churn analysis, traffic flow optimization, IT infrastructure & web app optimization, legal discovery and document archiving, intelligence gathering, location-based tracking & services, pricing analysis, personalized insurance.
  • 25. Personalized Insurance: Personalized policies can reduce costs & better meet customer needs. Insurance companies can help (and some have already started helping) their customers with truly personalized insurance plans tailored to their needs and risks. Insurance companies can collect real-time data from in-car sensors and combine it with geolocation and in-house systems. With information such as distance and speed, they can provide personalized insurance offers based on driving amount, risk, and other factors, for a truly personalized plan that may often save drivers money. Statistic: $1,600/yr. is the US national avg. car insurance premium.
  • 26. Recommendation Engines: Significantly improve up-sell and cross-sell opportunities. Retailers can use customer purchase & rating information to serve recommendations to current customers, based on similarities across many dimensions. The vast and ever-growing amount of customer purchase, rating and click data can all be collected and managed with a Hadoop-based solution, to pinpoint preferences based on purchase history and demographics and to serve useful and compelling cross-sell and up-sell recommendations. Statistic: 158 items sold/second by Amazon.com on 11/29/2010 (Cyber Monday).
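One simple way to picture "similarities across many dimensions" is item co-occurrence: items that often appear in the same basket get recommended together. The sketch below is only an illustration under that assumption; the baskets, item names, and the `build_cooccurrence` and `recommend` helpers are hypothetical and are not taken from the slides or from any Hadoop library.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        # De-duplicate and sort so each unordered pair is counted once per basket.
        for a, b in combinations(sorted(set(basket)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, purchased, top_n=3):
    """Rank items the customer has not bought by co-occurrence with what they have."""
    scores = Counter()
    for item in purchased:
        for other, count in co[item].items():
            if other not in purchased:
                scores[other] += count
    # Highest score first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase history, one list per order.
baskets = [
    ["tent", "sleeping bag", "lantern"],
    ["tent", "sleeping bag"],
    ["tent", "stove"],
    ["lantern", "batteries"],
]
co = build_cooccurrence(baskets)
print(recommend(co, {"tent"}))
```

At retail scale the counting step would be distributed across a cluster, and a production recommender would likely normalize for item popularity; the ranking idea stays the same.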
  • 27. Pricing Analysis: Significantly improve sales and customer satisfaction. Retailers can use customers’ past purchase, preference, and demographic information to serve real-time custom pricing and instant discounts when they are near the store. Retailers, whether large, small, online or in-store, can improve margins with more detailed pricing analysis. When a customer is in range of a transaction (either in the store, online or perhaps passing by), offer personalized offers, real-time price quotes, or other frequent-buyer perks to help bring more customers to the store and improve repeat business. Statistic: up to 30% additional price Mac users accepted for travel from Orbitz.
  • 28. Using data from the Weather Channel, Walmart can create targeted ads based on local weather, products in their nearby stores, and seasonal consumer desires. Walmart increased berry and steak sales as much as threefold when weather-targeted ads were run.
  • 29. Using Big Data to determine the best train schedules
  • 31. What is a data lake? A storage repository, usually Hadoop, that holds a vast amount of raw data in its native format until it is needed.
      - A place to store unlimited amounts of data in any format inexpensively
      - Allows collection of data that you may or may not use later: “just in case”
      - A way to describe any large data pool in which the schema and data requirements are not defined until the data is queried: “just in time” or “schema on read”
      - Complements the EDW and can be seen as a data source for the EDW, capturing all data but only passing relevant data to the EDW
      - Frees up expensive EDW resources (storage and processing), especially for data refinement
      - Allows for data exploration to be performed without waiting for the EDW team to model and load the data
      - Some processing is better done on Hadoop than in ETL tools like SSIS
      - Also called bit bucket, staging area, landing zone or enterprise data hub (Cloudera)
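To make “schema on read” concrete, here is a minimal sketch: raw records are stored exactly as they arrive, and each reader applies its own schema only at query time. The sample records, field names, and the `read_with_schema` helper are hypothetical, chosen only for illustration.

```python
import json

# Raw events land in the lake untouched, one JSON document per line.
# Nothing was validated or reshaped when they were written.
RAW_LINES = [
    '{"device": "meter-1", "kwh": "3.2", "ts": "2015-06-01T00:00:00"}',
    '{"device": "meter-2", "kwh": "4.1", "firmware": "v2"}',
    'not valid json',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: pick fields, cast types, skip unreadable rows."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # raw data can be messy; this reader chooses to skip it
        if not isinstance(record, dict):
            continue
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        yield row


# Two consumers read the same raw data with different schemas.
usage_rows = list(read_with_schema(RAW_LINES, {"device": str, "kwh": float}))
firmware_rows = list(read_with_schema(RAW_LINES, {"device": str, "firmware": str}))
total_kwh = sum(row["kwh"] for row in usage_rows)
```

The same files serve both readers without being reloaded or remodeled, which is the “just in time” property the slide describes.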
  • 32. Current state of a data warehouse Traditional Approaches CRMERPOLTP LOB DATA SOURCES ETL DATA WAREHOUSE Star schemas, views other read- optimized structures BI AND ANALYTCIS Emailed, centrally stored Excel reports and dashboards Well manicured, often relational sources Known and expected data volume and formats Little to no change Complex, rigid transformations Required extensive monitoring Transformed historical into read structures Flat, canned or multi-dimensional access to historical data Many reports, multiple versions of the truth 24 to 48h delay MONITORING AND TELEMETRY
  • 33. Current state of a data warehouse Traditional Approaches CRM ERP OLTP LOB DATA SOURCES ETL DATA WAREHOUSE Star schemas, views, other read-optimized structures BI AND ANALYTICS Emailed, centrally stored Excel reports and dashboards Increase in variety of data sources Increase in data volume Increase in types of data Pressure on the ingestion engine Complex, rigid transformations can no longer keep pace Monitoring is abandoned Delay in data, inability to transform volumes, or react to new sources Repair, adjust and redesign ETL Reports become invalid or unusable Delay in preserved reports increases Users begin to “innovate” to relieve starvation MONITORING AND TELEMETRY INCREASING DATA VOLUME NON-RELATIONAL DATA INCREASE IN TIME STALE REPORTING
  • 34. Data Lake Transformation (ELT not ETL) New Approaches All data sources are considered Leverages the power of on-prem technologies and the cloud for storage and capture Native formats, streaming data, big data Extract and load, no/minimal transform Storage of data in near-native format Orchestration becomes possible Streaming data accommodation becomes possible Refineries transform data on read Produce curated data sets to integrate with traditional warehouses Users discover published data sets/services using familiar tools CRM ERP OLTP LOB DATA SOURCES FUTURE DATA SOURCES NON-RELATIONAL DATA EXTRACT AND LOAD DATA LAKE DATA REFINERY PROCESS (TRANSFORM ON READ) Transform relevant data into data sets BI AND ANALYTICS Discover and consume predictive analytics, data sets and other reports OTHER REFINERY PROCESSES DATA WAREHOUSE Star schemas, views, other read-optimized structures
  • 36. What is Hadoop?  Distributed, scalable system on commodity HW  Composed of a few parts:  HDFS – Distributed file system  MapReduce – Programming model  Other tools: Hive, Pig, SQOOP, HCatalog, HBase, Flume, Mahout, YARN, Tez, Spark, Stinger, Oozie, ZooKeeper, Storm  Main players are Hortonworks, Cloudera, MapR  WARNING: Hadoop, while ideal for processing huge volumes of data, is inadequate for analyzing that data in real time (companies do batch analytics instead) Core Services OPERATIONAL SERVICES DATA SERVICES HDFS SQOOP FLUME NFS LOAD & EXTRACT WebHDFS OOZIE AMBARI YARN MAP REDUCE HIVE & HCATALOG PIG HBASE FALCON Hadoop Cluster compute & storage . . . compute & storage Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware
  • 37. Hortonworks Data Platform 2.2 Simply put, Hortonworks ties all the open source products together (20)
  • 38. The real cost of Hadoop http://www.wintercorp.com/tcod-report/
  • 39. Use cases using Hadoop and a DW in combination Bringing islands of Hadoop data together Archiving data warehouse data to Hadoop (move) (Hadoop as cold storage) Exporting relational data to Hadoop (copy) (Hadoop as backup/DR, analysis, cloud use) Importing Hadoop data into data warehouse (copy) (Hadoop as staging area, sandbox, Data Lake)
  • 41. What is the Internet of Things? Connectivity Data Analytics Things IoT = sensor-acquired data
  • 42. What is the Internet of Things (IoT)? Internet-connected devices that can perceive the environment in some way, share their data, and communicate with you. IoT is just a catch-all term for ways of using machine-generated data to create something useful. - Has a processor and sensor to collect information - Examples: heart monitoring implants, biochip transponders on farm animals, automobiles with built-in sensors, field operation devices that assist firefighters in search and rescue - Excludes computers, tablets, and smart phones - But it’s in the sphere of business intelligence that IoT will really make a difference. Cool possibilities - When a milk carton is almost empty it will ping you when you are near a store - An alarm clock that signals your coffee maker to start brewing when you wake up - An embedded chip that monitors your vital signs and notifies a medical provider if they exceed a limit Gartner: 10 billion devices connected to the internet today, 26B by 2020 At some point in the future, nearly every manmade object will contain a device that transmits data!
  • 44. Modern Data Warehouse Think about future needs: • Increasing data volumes • Real-time performance • New data sources and types • Cloud-born data • Multi-platform solution • Hybrid architecture
  • 49. Federated Querying Other names: Data virtualization, logical data warehouse, data federation, virtual database, and decentralized data warehouse. A model that allows a single query to retrieve and combine data as it sits from multiple data sources, so as to not need to use ETL or learn more than one retrieval technology
  • 50. Select… Result set Federated Querying Relational Data DB2 Oracle MongoDB SQL Server Query Model Non-Relational Data Cloudera CDH Linux Hortonworks HDP Windows Azure HDInsight EDW
  • 51. DW and the Cloud
  • 52. Can I use the cloud with my DW? • Public and private cloud • Cloud-born data vs on-prem born data • Transfer cost from/to cloud and on-prem • Sensitive data on-prem, non-sensitive in cloud • Look at hybrid solutions
  • 53. TDWI Best Practices Report (2015)
  • 55. SMP vs MPP MPP - Massively Parallel Processing: • Uses many separate CPUs running in parallel to execute a single program • Shared Nothing: Each CPU has its own memory and disk (scale-out) • Segments communicate using high-speed network between nodes SMP - Symmetric Multiprocessing: • Multiple CPUs used to complete individual processes simultaneously • All CPUs share the same memory, disks, and network controllers (scale-up) • All SQL Server implementations up until now have been SMP • Mostly, the solution is housed on a shared SAN
  • 56. DW scalability spider chart (figure) comparing MPP (multidimensional scalability) with SMP (tunable in one dimension at the cost of other dimensions). Axes: “Query Freedom”, “Query Complexity”, “Data Freshness”, “Query Data Volume”, “Query Concurrency”, “Mixed Workload”, “Schema Sophistication”, “Data Volume”. The spiderweb depicts important attributes to consider when evaluating Data Warehousing options. Big Data support is the newest dimension.
  • 57. When do you need an MPP solution? • We need at least 3x query performance improvement • We are near disk capacity and see a lot of growth in the upcoming years • We need to support queries during our maintenance window • We need to load data outside of our maintenance window • We will spend a lot of money for FusionIO cards, SSDs, more SAN space, more memory, faster CPUs, clustering
  • 58. Big Data is coming
  • 59. Summary • We live in an increasingly data-intensive world • Much of the data stored online and analyzed today is more varied than the data stored in recent years • More of our data arrives in near-real time This presents a large business opportunity. Are you ready for it?
  • 60. Resources  The Modern Data Warehouse: http://bit.ly/1xuX4Py  Fast Track Data Warehouse Reference Architecture for SQL Server 2014: http://bit.ly/1xuX9m6  Should you move your data to the cloud? http://bit.ly/1xuXbKU  Presentation slides for Modern Data Warehousing: http://bit.ly/1xuXcP5  Presentation slides for Building an Effective Data Warehouse Architecture: http://bit.ly/1xuXeX4  Hadoop and Data Warehouses: http://bit.ly/1xuXfu9  What is the Microsoft Analytics Platform System (APS)? http://bit.ly/1xuXipO  Parallel Data Warehouse (PDW) benefits made simple: http://bit.ly/1xuXlSy  What is Advanced Analytics? http://bit.ly/1LDklkB  Azure Data Lake http://bit.ly/1LDkqEN
  • 61. Q & A ? James Serra, Big Data Evangelist Email me at: JamesSerra3@gmail.com Follow me at: @JamesSerra Link to me at: www.linkedin.com/in/JamesSerra Visit my blog at: JamesSerra.com (where this slide deck is posted under “Presentations”)
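
Slide 31 describes a data lake as "schema on read": raw data is stored as it arrives and a structure is applied only when a query runs. The following is a minimal sketch of that idea in Python, not how Hadoop itself is implemented. The record layout, the field names (`device`, `temp_c`, `temp_f`) and the function `read_temperatures` are hypothetical and chosen purely for illustration.

```python
import json

# Raw events land in the "lake" exactly as they arrive; nothing is validated on write.
RAW_EVENTS = [
    '{"device": "thermostat-1", "temp_c": 21.5, "ts": "2015-03-01T10:00:00"}',
    '{"device": "thermostat-2", "temp_f": 70.0}',
    'not even json',
]

def read_temperatures(raw_lines):
    """Apply a schema at query time: yield (device, temp_c) for usable records."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable data stays in the lake; this query just skips it
        if not isinstance(record, dict) or "device" not in record:
            continue
        if "temp_c" in record:
            yield record["device"], float(record["temp_c"])
        elif "temp_f" in record:
            # Convert Fahrenheit to Celsius so the query sees one consistent unit.
            yield record["device"], round((float(record["temp_f"]) - 32) * 5 / 9, 1)

temperatures = list(read_temperatures(RAW_EVENTS))
```

A different question (for example, counting events per device) could define its own reader over the same raw lines, which is the "just in time" flexibility the slide refers to.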

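The shared-nothing MPP model on slide 55 can be illustrated with a small scatter/gather sketch: rows are distributed across nodes by key, each node aggregates only its own slice, and the partial results are combined. This is a toy model under stated assumptions, not how PDW or APS is built. Threads stand in for nodes, the byte-sum hash is a deliberately simple placeholder, and the names `partition`, `local_sum` and `distributed_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, node_count):
    """Distribute rows across nodes by hashing the key; each node owns its slice."""
    nodes = [[] for _ in range(node_count)]
    for key, amount in rows:
        # Toy deterministic hash: the sum of the key's bytes.
        nodes[sum(key.encode()) % node_count].append((key, amount))
    return nodes

def local_sum(node_rows):
    """Runs on one node against only that node's data (nothing is shared)."""
    totals = {}
    for key, amount in node_rows:
        totals[key] = totals.get(key, 0) + amount
    return totals

def distributed_sum(rows, node_count=4):
    """Scatter rows to nodes, aggregate in parallel, then gather the partials."""
    nodes = partition(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_sum, nodes))
    merged = {}
    for partial in partials:
        merged.update(partial)  # a given key lives on exactly one node
    return merged

sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3)]
totals = distributed_sum(sales)
```

Because every key lands on exactly one node, the gather step only has to combine disjoint results, which is why adding nodes (scale-out) can increase capacity without contention on shared memory or disk.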
Editor's notes

  1. Many sources and many data marts (spaghetti code), different update frequencies, different variations of dimensions
  2. One version of truth story: different departments using different financial formulas to help their bonuses. This leads to reasons to use BI. This is used to convince your boss of the need for a DW. Note that you still want to do some reporting off of the source system (i.e. current inventory counts). It’s important to know upfront if the data warehouse needs to be updated in real time or very frequently, as that is a major architectural decision. JD Edwards has table names like T117
  3. Key goal of slide: To convey what every IT person knows: the data warehouse and what it’s for. Then we set up the Gartner quote to say that there is a tipping point. End the slide with a question: Why is it at a tipping point? Slide talk track: What is the “traditional” data warehouse? IT professionals know this well. A data warehouse or an enterprise data warehouse is a database that was designed specifically for data analysis. It is the single source of truth or the central repository for all data in the company. This means disparate data in the company coming from your transactional systems, your ERP, CRM or Line of Business applications would all be extracted, transformed, and cleansed and put into the warehouse. It was built so that the people who are accessing the warehouse using BI tools will be accessing data that has been provisioned by IT and represents accurate data sanctioned by the company. However, this traditional data warehouse is reaching an inflection point. Gartner in their analysis of the state of data warehousing noted that it is reaching the most significant tipping point since its inception. The question is why? What is going on?
  4. Key goal of slide: Communicate what Hadoop is. Slide talk track: Everyone has heard of Hadoop. But what is it? And do I need it? Apache Hadoop is an open-source framework that supports data-intensive distributed applications on large clusters of commodity hardware. Hadoop is composed of a few parts: HDFS – the Hadoop Distributed File System is Hadoop’s file system, which stores large files (from gigabytes to terabytes) across multiple machines. MapReduce – a programming model that processes data in parallel across the cluster: a map step filters and sorts the data and a reduce step summarizes it. Other parts of Hadoop include HBase, R, Pig, Hive, Flume, Mahout, Avro and ZooKeeper, which are all parts of the Hadoop ecosystem that perform other, supplementary functions.
  5. http://www.jamesserra.com/archive/2014/02/introduction-to-hadoop/
  6. http://www.jamesserra.com/archive/2014/05/hadoop-and-data-warehouses/
  7. PolyBase of APS v2 AU1 can already support HDP 2.x with the hotfix KB2973037! (HDP 2.x includes HDP 2.0 and HDP 2.1) Azure HDInsight supports both HDFS and Azure Blob storage for storing data. With this hotfix, you have the following sp_configure values for the option "hadoop connectivity" available: 0 - no HDP support; 1 - Hortonworks for Windows Server (HDP 1.3), HDInsight on Analytics Platform System, HDInsight’s Windows Azure blob storage (WASB[S]); 2 - Hortonworks for Linux (HDP 1.3); 3 - Cloudera CDH 4.3 for Linux (also works with 4.5 and 4.6); 4 - Hortonworks Data Platform for Windows Server (HDP 2.x); 5 - Hortonworks Data Platform (HDP 2.x) for Linux. Key goal of slide: PolyBase is available only within the Microsoft Analytics Platform System. Slide talk track: PolyBase simplifies this by allowing Hadoop data to be queried with the standard Transact-SQL (T-SQL) query language without the need to learn MapReduce and without the need to move the data into the data warehouse. PolyBase unifies relational and non-relational data at the query level. Integrated query: PolyBase accepts a standard T-SQL query that joins tables containing a relational source with tables in a Hadoop cluster referencing a non-relational source. It then seamlessly returns the results to the user. PolyBase can query Hadoop data in other Hadoop distributions such as Hortonworks or Cloudera. No difficult learning curve: Standard T-SQL can be used to query Hadoop data. Users are not required to learn MapReduce to execute the query. Cloud-Hybrid Scenario Options: PolyBase can also query across Windows Azure HDInsight, providing a Hybrid Cloud solution to the data warehouse. The ability to query all of your company’s data in a performant way, independent of where it resides and what format it is stored in, is crucial in today’s data-centric world with massive, increasing data volume. Today, with AU1, one can query various Hadoop distributions + data stored in Azure. 
For example, with one single T-SQL statement a user can query over data stored in multiple HDP 2.0 clusters, combine it with data in PDW and combine it with data stored in Azure. No one in the industry (as far as I’m aware) can do this in this simple fashion. Bringing all Microsoft assets together, on-prem and specifically through our Azure play including various services that will be brought online in future, we can clearly distinguish ourselves through our unique & complete end-to-end data management story. No doubt there are several pieces missing in our ‘Poly’ vision – including supporting other data stores, enabling push-down computation for our cloud story, more user-definable options language-wise, better automation/policies, and many more ideas we’d like to go after in the weeks & months ahead.
  8. HDInsight benefits: cheap, quick to procure. Key goal of slide: Highlight the four main use cases for PolyBase. Slide talk track: There are four key scenarios for using PolyBase with the data lake of data normally locked up in Hadoop. PolyBase leverages the APS MPP architecture along with optimizations like push-down computing to query data using Transact-SQL faster than using other Hadoop technologies like Hive. More importantly, you can use the Transact-SQL join syntax between Hadoop data and PDW data without having to import the data into PDW first. PolyBase is a great tool for archiving older or unused data in APS to less expensive storage on a Hadoop cluster. When you do need to access the data for historical purposes, you can easily join it back up with your PDW data using Transact-SQL. There are times when you need to share your PDW data with Hadoop users, and PolyBase makes it easy to copy data to a Hadoop cluster. Using a simple SELECT INTO statement, PolyBase makes it easy to import valuable Hadoop data into PDW without having to use external ETL processes.
  9. http://blogs.microsoft.com/firehose/2014/07/16/the-internet-of-things-gives-the-worlds-cities-a-major-lift/ http://www.microsoft.com/windowsembedded/en-us/internet-of-things.aspx?WT.mc_id=Search_Bing&WT.srch=1
  10. Key goal of slide: To convey that the traditional data warehouse is going to break in one of four different ways. These ways should also not be a surprise to IT professionals. At the end of the slide, IT should be asking, what can I do to prevent my warehouse from breaking? Slide talk track: There are many reasons why data warehouses are at a tipping point where something needs to change. The first trend that will break the traditional data warehouse is data growth. Data volumes are expected to grow 10X over the next five years and traditional data warehouses cannot keep up with this explosion of data. In addition to growing data, end users expect to get query results back faster, in near real time. End users are no longer willing to wait minutes to hours for their results, which is something traditional data warehouses cannot keep up with. They also want real-time data, not dated data pulled in during a maintenance window each night. The third trend is new types of data captured that are “non-relational.” 85% of data growth is coming from “non-relational” data in the form of things like web logs, sensor data, social sentiment and devices. You’ve probably heard the terms “Big Data” and “Hadoop” quite a bit. This is where these technologies come into play. More on that later. The final trend that is appearing is cloud-born data. This is data that might be coming from some of IT’s infrastructure that they are starting to host in the cloud (i.e. CRM, ERP, etc.) or that is not stored by any type of corporate-owned system. How do you incorporate both on-premise and cloud data as part of your data warehouse? This is the last trend that is breaking the traditional data warehouse.
  11. Key goal of slide: To convey that the modern data warehouse is something that the traditional data warehouse must evolve to. To have IT agree that their warehouses need to take advantage of these new technologies (specifically focusing on the middle and bottom layer). Slide talk track: To encompass these four trends, we need to evolve our traditional data warehouse to ensure that it does not break. It needs to become the “modern data warehouse.” What is the “modern data warehouse?” This is the new warehouse that is able to excel with these new trends and can be your warehouse now and into the future. The modern data warehouse has the ability to: Handle all types of data. Whether it be your structured, relational data sources or your non-relational data sources, the modern data warehouse will incorporate Hadoop. It can handle real-time data by using complex event processing technologies. Provide a way to enrich your data with Extract, Transform, Load (ETL) capabilities as well as Master Data Management (MDM) and data quality. Provide a way for any BI tool or query mechanism to interface with all these different types of data with a single query model that leverages a single query language that users already know (example: SQL). Questions drive BI, Analytics drive questions
  12. Key goal of slide: To convey the major pillars of the Analytics Platform System with key points. To help organizations with a simple, smooth, and seamless transition to this new world of data, Microsoft introduces the Microsoft Analytics Platform System (APS) – the only, no-compromise modern data warehouse solution that brings both Hadoop and RDBMS in a single, pre-built appliance with tier-one performance, the lowest TCO in the industry, and accessibility to all their users through some of the most widely used BI tools in the industry. Enterprise-ready Big Data: Microsoft APS combines Microsoft’s industry leading RDBMS platform, the Parallel Data Warehouse Appliance (PDW), with Microsoft’s Hadoop Distribution, HDInsight, for non-relational data to offer an all-in Big Data Analytics appliance. Tying together and integrating the worlds of relational and Hadoop data is PolyBase, Microsoft’s integrated query tool available only in APS. Your Modern Data Warehouse in One Turnkey Appliance APS integrates PDW and HDInsight to operate seamlessly together in a single appliance Integrated Querying across All Data Types Using T-SQL PolyBase allows Hadoop data to be queried using rich featured T-SQL, while taking advantage of Hadoop processing, without additional Hadoop-based skills or training. Enterprise-Ready Hadoop HDInsight is Microsoft’s Hadoop-based distribution with end-user authentication via Active Directory and managed by IT using System Center Big Data Insights to Any User Native Microsoft BI integration within PolyBase allows everyone access to insights through familiar tools such as SSAS and Excel Next-generation performance at scale: APS was built to scale into multi-petabytes, handling both RDBMS and the data stored in Hadoop, to deliver the performance that meets today’s near real-time and rapid insights requirements. Scale-Out to accommodate your Growing Data APS contains PDW and HDInsight that both have linear scale-out architecture. 
Start small with a few terabytes and dynamically add capacity for seamless, linear scale-out Remove DW bottlenecks with MPP SQL Server Get the dynamic performance and scale that your modern data warehouse requires while retaining your skills and investment in SQL Server. Real-Time Performance with In-Memory Provides up to 100x improvement in query performance and 15x compression via updateable in-memory columnstore Concurrency that Supports High Adoption Scales in simultaneous user accessibility. APS has high concurrency, allowing for multiple workloads. Optimal architecture: More than just a converged system, APS has reshaped the very hardware specifications required through software innovations to deliver optimal value. Through features delivered in Windows Server 2012, customers get exceptional value: APS Provides the Industry’s Lowest DW Price/TB Lower cost while maintaining performance using WS2012 Storage Spaces that replace SAN with economical Windows Storage Spaces Save up to 70% of APS storage with up to 15x compression via updateable in-memory columnstore Value through Single Appliance Solution Reduce hardware footprint by having PDW and HDInsight within a single appliance Remove the need for costly integration efforts Value through Flexible Hardware Options Avoid hardware lock-in through flexible hardware options from HP, Dell, and Quanta
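
Note 4 describes MapReduce as the programming model behind Hadoop. As a rough illustration of that model (not of Hadoop's actual Java API), the classic word-count example can be sketched in plain Python. The function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, and everything runs in a single process here, whereas Hadoop would run the map and reduce steps on many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle and reduce over a collection of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["big data", "Big Data lake"])
```

Because each map call sees only one document and each reduce call sees only one key, both steps can be spread across a cluster, which is what makes the model suited to the batch workloads described on slide 36.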