SlideShare une entreprise Scribd logo
1  sur  51
Database Applications (15-415)
DBMS Internals- Part IX
Lecture 21, April 1, 2018
Mohammad Hammoud
Today…
 Last Session:
 DBMS Internals- Part VIII
 Algorithms for Relational Operations (Cont’d)
 Today’s Session:
 DBMS Internals- Part IX
 Query Optimization
 Announcements:
 PS4 is out. It is due on April 11 by midnight
 Project 3 is due on Sunday, April 15 by midnight
 Quiz II will be held on Sunday, April 8 (all concepts
covered after the midterm are included)
DBMS Layers
Query Optimization
and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
Queries
Transaction
Manager
Lock
Manager
Recovery
Manager
Outline
A Brief Primer on Query Optimization
Evaluating Query Plans
Relational Algebra Equivalences
Estimating Plan Costs
Enumerating Plans

Cost-Based Query Sub-System
Query Parser
Query Optimizer
Plan
Generator
Plan Cost
Estimator
Query Plan Evaluator
Catalog Manager
Usually there is a
heuristics-based
rewriting step before
the cost-based steps.
Schema Statistics
Select *
From Blah B
Where B.blah = blah
Queries
Query Optimization Steps
 Step 1: Queries are parsed into internal forms
(e.g., parse trees)
 Step 2: Internal forms are transformed into ‘canonical forms’
(syntactic query optimization)
 Step 3: A subset of alternative plans are enumerated
 Step 4: Costs for alternative plans are estimated
 Step 5: The query evaluation plan with the least estimated
cost is picked
Required Information to Evaluate Queries
 To estimate the costs of query plans, the query
optimizer examines the system catalog and retrieves:
 Information about the types and lengths of fields
 Statistics about the referenced relations
 Access paths (indexes) available for relations
 In particular, the Schema and Statistics components
in the Catalog Manager are inspected to find a good
enough query evaluation plan
Cost-Based Query Sub-System
Query Parser
Query Optimizer
Plan
Generator
Plan Cost
Estimator
Query Plan Evaluator
Catalog Manager
Usually there is a
heuristics-based
rewriting step before
the cost-based steps.
Schema Statistics
Select *
From Blah B
Where B.blah = blah
Queries
Catalog Manager: The Schema
 What kind of information do we store at the Schema?
 Information about tables (e.g., table names and
integrity constraints) and attributes (e.g., attribute
names and types)
 Information about indices (e.g., index structures)
 Information about users
 Where do we store such information?
 In tables, hence, can be queried like any other tables
 For example: Attribute_Cat (attr_name: string,
rel_name: string; type: string; position: integer)
Catalog Manager: Statistics
 What would you store at the Statistics component?
 NTuples(R): # records for table R
 NPages(R): # pages for R
 NKeys(I): # distinct key values for index I
 INPages(I): # pages for index I
 IHeight(I): # levels for I
 ILow(I), IHigh(I): range of values for I
 ...
 Such statistics are important for estimating plan
costs and result sizes (to be discussed shortly!)
SQL Blocks
 SQL queries are optimized by decomposing them into a
collection of smaller units, called blocks
 A block is an SQL query with:
 No nesting
 Exactly 1 SELECT and 1 FROM clauses
 At most 1 WHERE, 1 GROUP BY and 1 HAVING clauses
 A typical relational query optimizer concentrates on
optimizing a single block at a time
Translating SQL Queries Into Relational
Algebra Trees
select name
from STUDENT, TAKES
where c-id=‘415’ and
STUDENT.ssn=TAKES.ssn
STUDENT TAKES


s
p
 An SQL block can be thought of as an algebra expression containing:
 A cross-product of all relations in the FROM clause
 Selections in the WHERE clause
 Projections in the SELECT clause
 Remaining operators can be carried out on the result of such
SQL block
Translating SQL Queries Into Relational
Algebra Trees (Cont’d)
STUDENT TAKES


s
p
STUDENT TAKES


s
p Canonical form
Still the same result!
How can this be guaranteed?
Translating SQL Queries Into Relational
Algebra Trees (Cont’d)
STUDENT TAKES


s
p
STUDENT TAKES


s
p Canonical form
OBSERVATION: try to perform selections and projections early!
Translating SQL Queries Into Relational
Algebra Trees (Cont’d)
STUDENT TAKES


s
p
Index; seq scan
Hash join;
merge join;
nested loops;
How to evaluate a query plan (as opposed to
evaluating an operator)?
Outline
A Brief Primer on Query Optimization
Evaluating Query Plans
Relational Algebra Equivalences
Estimating Plan Costs
Enumerating Plans

Query Evaluation Plans
 A query evaluation plan (or simply a plan) consists of an
extended relational algebra tree (or simply a tree)
 A plan tree consists of annotations at each node indicating:
 The access methods to use for each relation
 The implementation method to use for each operator
 Consider the following SQL query Q:
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND
R.bid=100 AND S.rating>5
What is the
corresponding
RA of Q?
Query Evaluation Plans (Cont’d)
 Q can be expressed in relational algebra as follows:
)
(Re
5
100
( Sailors
sid
sid
serves
rating
bid
sname 





s
p
Reserves Sailors
sid=sid
bid=100 rating > 5
sname
A RA Tree:
Reserves Sailors
sid=sid
bid=100 rating > 5
sname
(Simple Nested Loops)
(On-the-fly)
(On-the-fly)
An Extended RA Tree:
(File Scan)
(File Scan)
Pipelining vs. Materializing
 When a query is composed of several operators, the
result of one operator can sometimes be pipelined to
another operator
Reserves Sailors
sid=sid
bid=100 rating > 5
sname
(Simple Nested Loops)
(On-the-fly)
(On-the-fly)
(File Scan)
(File Scan)
Pipeline the output of the join into the
selection and projection that follow
Applied on-the-fly
Pipelining vs. Materializing
 When a query is composed of several operators, the
result of one operator can sometimes be pipelined to
another operator
Reserves Sailors
sid=sid
bid=100 rating > 5
sname
(Simple Nested Loops)
(On-the-fly)
(On-the-fly)
(File Scan)
(File Scan)
Pipeline the output of the join into the
selection and projection that follow
Applied on-the-fly
In contrast, a temporary table can be materialized
to hold the intermediate result of the join and read
back by the selection operation!
Pipelining can significantly save I/O cost!
The I/O Cost of the Q Plan
 What is the I/O cost of the following evaluation plan?
Reserves Sailors
sid=sid
bid=100 rating > 5
sname
(Simple Nested Loops)
(On-the-fly)
(On-the-fly)
(File Scan)
(File Scan)
 The cost of the join is 1000 + 1000 * 500 = 501,000 I/Os (assuming page-oriented
Simple NL join)
 The selection and projection are done on-the-fly; hence, do not incur additional I/Os
Pushing Selections
 How can we reduce the cost of a join?
 By reducing the sizes of the input relations!
Reserves Sailors
sid=sid
bid=100 rating > 5
sname
Involves bid in Reserves;
hence, “push” ahead of the join!
Involves rating in Sailors;
hence, “push” ahead of the join!
Pushing Selections
 How can we reduce the cost of a join?
 By reducing the sizes of the input relations!
Reserves Sailors
sid=sid
bid=100
sname
(On-the-fly)
rating > 5
(Scan;
write to
temp T1)
(Scan;
write to
temp T2)
(Sort-Merge Join)
Reserves Sailors
sid=sid
bid=100 rating > 5
sname
(Simple Nested Loops)
(On-the-fly)
(On-the-fly)
(File Scan)
(File Scan)
The I/O Cost of the New Q Plan
 What is the I/O cost of the following evaluation plan?
Reserves Sailors
sid=sid
bid=100
sname
(On-the-fly)
rating > 5
(Scan;
write to
temp T1)
(Scan;
write to
temp T2)
(Sort-Merge Join)
Cost of Scanning Reserves = 1000 I/Os
Cost of Writing T1 = 10* I/Os (later)
Cost of Scanning Sailors = 500 I/Os
Cost of Writing T2 = 250* I/Os (later)
*Assuming 100 boats and uniform distribution of reservations across boats.
*Assuming 10 ratings and uniform distribution over ratings.
The I/O Cost of the New Q Plan
 What is the I/O cost of the following evaluation plan?
Reserves Sailors
sid=sid
bid=100
sname
(On-the-fly)
rating > 5
(Scan;
write to
temp T1)
(Scan;
write to
temp T2)
(Sort-Merge Join)
Cost = 2×4×250 = 2000 I/Os
(assuming B = 5)
Cost = 2×2×10 = 40 I/Os
(assuming B = 5)
Merge Cost = 10 + 250 = 260 I/Os
The I/O Cost of the New Q Plan
 What is the I/O cost of the following evaluation plan?
Reserves Sailors
sid=sid
bid=100
sname
(On-the-fly)
rating > 5
(Scan;
write to
temp T1)
(Scan;
write to
temp T2)
(Sort-Merge Join)
Done on-the-fly, thus, do
not incur additional I/Os
The I/O Cost of the New Q Plan
 What is the I/O cost of the following evaluation plan?
Reserves Sailors
sid=sid
bid=100
sname
(On-the-fly)
rating > 5
(Scan;
write to
temp T1)
(Scan;
write to
temp T2)
(Sort-Merge Join)
Cost of Scanning Reserves = 1000 I/Os
Cost of Writing T1 = 10 I/Os (later)
Cost of Scanning Sailors = 500 I/Os
Cost of Writing T2 = 250 I/Os (later)
Cost = 2×4×250 = 2000 I/Os
(assuming B = 5)
Cost = 2×2×10 = 40 I/Os
(assuming B = 5)
Merge Cost = 10 + 250 = 260 I/Os
Total Cost = 1000 + 10 + 500 + 250 + 40 + 2000 + 260 = 4060 I/Os
Done on-the-fly, thus, do
not incur additional I/Os
The I/O Costs of the Two Q Plans
Total Cost = 501, 000 I/Os
Reserves Sailors
sid=sid
bid=100
sname
(On-the-fly)
rating > 5
(Scan;
write to
temp T1)
(Scan;
write to
temp T2)
(Sort-Merge Join)
Reserves Sailors
sid=sid
bid=100 rating > 5
sname
(Simple Nested Loops)
(On-the-fly)
(On-the-fly)
(File Scan)
(File Scan)
Total Cost = 4060 I/Os
Pushing Projections
 How can we reduce the cost of a join?
 By reducing the sizes of the input relations!
 Consider (again) the following plan:
Reserves Sailors
sid=sid
bid=100
sname
rating > 5
(Scan;
write to
temp T1)
(Scan;
write to
temp T2)
 What are the attributes required
from T1 and T2?
 Sid from T1
 Sid and sname from T2
Hence, as we scan Reserves and
Sailors we can also remove
unwanted columns (i.e., “Push” the
projections ahead of the join)!
Pushing Projections
 How can we reduce the cost of a join?
 By reducing the sizes of the input relations!
 Consider (again) the following plan:
Reserves Sailors
sid=sid
bid=100
sname
rating > 5
(Scan;
write to
temp T1)
(Scan;
write to
temp T2)
The cost after applying
this heuristic can become
2000 I/Os (as opposed to
4060 I/Os with only
pushing the selection)!
“Push” ahead
the join
 What if indexes are available on Reserves and Sailors?
Using Indexes
Reserves
Sailors
sid=sid
bid=100
sname
(On-the-fly)
rating > 5
(Use hash
index; do
not write
result to
temp)
(Index Nested Loops,
with pipelining )
(On-the-fly)
(Hash index on sid)
(Clustered hash index on bid)
 With clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples (assuming 100
boats and uniform distribution of reservations across boats)
 Since the index is clustered, the 1000 tuples appear consecutively within the same
bucket; thus # of pages = 1000/100 = 10 pages
Using Indexes
 What if indexes are available on Reserves and Sailors?
Reserves
Sailors
sid=sid
bid=100
sname
(On-the-fly)
rating > 5
(Use hash
index; do
not write
result to
temp)
(Index Nested Loops,
with pipelining )
(On-the-fly)
(Hash index on sid)
(Clustered hash index on bid)
 For each selected Reserves tuple, we can retrieve matching Sailors tuples using the hash
index on the sid field
 Selected Reserves tuples need not be materialized and the join result can be pipelined!
 For each tuple in the join result, we apply rating > 5 and the projection of sname on-the-fly
Using Indexes
 What if indexes are available on Reserves and Sailors?
Reserves
Sailors
sid=sid
bid=100
sname
(On-the-fly)
rating > 5
(Use hash
index; do
not write
result to
temp)
(Index Nested Loops,
with pipelining )
(On-the-fly)
(Hash index on sid)
(Clustered hash index on bid)
Is it necessary to project out
unwanted columns?
NO, since selection results
are NOT materialized
Using Indexes
 What if indexes are available on Reserves and Sailors?
Reserves
Sailors
sid=sid
bid=100
sname
(On-the-fly)
rating > 5
(Use hash
index; do
not write
result to
temp)
(Index Nested Loops,
with pipelining )
(On-the-fly)
(Hash index on sid)
(Clustered hash index on bid)
Does the hash index on sid
need to be clustered?
NO, since there is at most
1 matching Sailors tuple
per a Reserves tuple! Why?
Using Indexes
 What if indexes are available on Reserves and Sailors?
Reserves
Sailors
sid=sid
bid=100
sname
(On-the-fly)
rating > 5
(Use hash
index; do
not write
result to
temp)
(Index Nested Loops,
with pipelining )
(On-the-fly)
(Hash index on sid)
(Clustered hash index on bid)
Cost = 1.2 I/Os (if
A(1)) or 2.2 (if A(2))
Using Indexes
 What if indexes are available on Reserves and Sailors?
Reserves
Sailors
sid=sid
bid=100
sname
(On-the-fly)
rating > 5
(Use hash
index; do
not write
result to
temp)
(Index Nested Loops,
with pipelining )
(On-the-fly)
(Hash index on sid)
(Clustered hash index on bid)
Why not pushing this selection
ahead of the join?
It would require a scan on Sailors!
 What is the I/O cost of the following evaluation plan?
The I/O Cost of the New Q Plan
Reserves
Sailors
sid=sid
bid=100
sname
(On-the-fly)
rating > 5
(Use hash
index; do
not write
result to
temp)
(Index Nested Loops,
with pipelining )
(On-the-fly)
(Hash index on sid)
(Clustered hash index on bid)
10 I/Os
Cost = 1.2 I/Os for
1000 Reserves
tuples; hence,
1200 I/Os
Total Cost = 10 + 1200 = 1210 I/Os
Comparing I/O Costs: Recap
Total Cost = 501, 000 I/Os
Reserves Sailors
sid=sid
bid=100
sname
(On-the-fly)
rating > 5
(Scan;
write to
temp T1)
(Scan;
write to
temp T2)
(Sort-Merge Join)
Reserves Sailors
sid=sid
bid=100 rating > 5
sname
(Simple Nested
Loops)
(On-the-fly)
(On-the-fly)
(File Scan)
(File Scan)
Total Cost = 4060 I/Os
Reserves
Sailors
sid=sid
bid=100
sname
(On-the-fly)
rating > 5
(Hash
index)
(Index Nested
Loops,
with pipelining )
(On-the-fly)
(Hash
index
on sid)
Total Cost = 1210 I/Os
But, How Can we Ensure Correctness?
Canonical form
Still the same result!
How can this be guaranteed?
Reserves Sailors
sid=sid
bid=100 rating > 5
sname
Reserves
Sailors
sid=sid
bid=100
sname
rating > 5
Outline
A Brief Primer on Query Optimization
Evaluating Query Plans
Relational Algebra Equivalences
Estimating Plan Costs
Enumerating Plans

Relational Algebra Equivalences
 A relational query optimizer uses relational algebra
equivalences to identify many equivalent expressions for a
given query
 Two relational algebra expressions over the same set of
input relations are said to be equivalent if they produce the
same result on all relations’ instances
 Relational algebra equivalences allow us to:
 Push selections and projections ahead of joins
 Combine selections and cross-products into joins
 Choose different join orders
RA Equivalences: Selections
 Two important equivalences involve selections:
1. Cascading of Selections:
2. Commutation of Selections:
   
 
s s s
c cn c cn
R R
1 1
  
... ...
 
   
 
s s s s
c c c c
R R
1 2 2 1

Allows us to combine several selections into one selection
OR: Allows us to replace a selection with several smaller selections
Allows us to test selection conditions in either order
RA Equivalences: Projections
 One important equivalence involves projections:
 Cascading of Projections:
This says that successively eliminating columns from a relation
is equivalent to simply eliminating all but the columns retained
by the final projection!
   
 
 
R
R an
a
a p
p
p ...
1
1 
RA Equivalences: Cross-Products and Joins
 Two important equivalences involve cross-products
and joins:
1. Commutative Operations:
This allows us to choose which relation to be the inner and
which to be the outer!
(R × S) (S × R)

(R S) (S R)
 

RA Equivalences: Cross-Products and Joins
 Two important equivalences involve cross-products
and joins:
2. Associative Operations:
This says that regardless of the order in which the relations are
considered, the final result is the same!
R × (S × T) (R × S) × T


R (S T) (R S) T
 
 


R (S T) (T R) S

   
It follows:
This order-independence is fundamental to how a query optimizer
generates alternative query evaluation plans
RA Equivalences: Selections, Projections,
Cross Products and Joins
 Selections with Projections:
 Selections with Cross-Products:
This says we can commute a selection with a projection if the
selection involves only attributes retained by the projection!
))
(
(
))
(
( R
R a
c
c
a p
s
s
p 
R T 
c

 )
( S
R
c 
s
This says we can combine a selection with a cross-product to
form a join (as per the definition of a join)!
RA Equivalences: Selections, Projections,
Cross Products and Joins
 Selections with Cross-Products and with Joins:
S
R
c
S
R
c 

 )
(
)
( s
s
This says we can commute a selection with a cross-product or a join
if the selection condition involves only attributes of one of the
arguments to the cross-product or join!
S
R
c
S
R
c 


 )
(
)
( s
s 
Caveat: The attributes mentioned in c must appear only in R and
NOT in S
RA Equivalences: Selections, Projections,
Cross Products and Joins
 Selections with Cross-Products and with Joins (Cont’d):
)
(
3
2
1
)
( S
R
c
c
c
S
R
c 



 s
s
This says we can push part of the selection condition c ahead of
the cross-product!
)))
(
3
(
2
(
1
S
R
c
c
c

 s
s
s
))
(
3
)
(
2
(
1
S
c
R
c
c
s
s
s 

This applies to joins as well!
RA Equivalences: Selections, Projections,
Cross Products and Joins
 Projections with Cross-Products and with Joins:
)
(
2
)
(
1
)
( S
a
R
a
S
R
a p
p
p 


Intuitively, we need to retain only those attributes of R and S that
are either mentioned in the join condition c or included in the set
of attributes a retained by the projection
)
(
2
)
(
1
)
( S
a
c
R
a
S
c
R
a p
p
p 


 
))
(
2
)
(
1
(
)
( S
a
c
R
a
a
S
c
R
a p
p
p
p 


 
How to Estimate the Cost of Plans?
 Now that correctness is ensured, how can the DBMS
estimate the costs of various plans?
Canonical form
Reserves Sailors
sid=sid
bid=100 rating > 5
sname
Reserves
Sailors
sid=sid
bid=100
sname
rating > 5
Next Class
Query Optimization
and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
Queries
Transaction
Manager
Lock
Manager
Recovery
Manager
Continue…

Contenu connexe

Similaire à Lecture21-Query-Optimization-1April-2018.pptx

Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteJulian Hyde
 
phoenix-on-calcite-hadoop-summit-2016
phoenix-on-calcite-hadoop-summit-2016phoenix-on-calcite-hadoop-summit-2016
phoenix-on-calcite-hadoop-summit-2016Maryann Xue
 
SQL Server 2000 Research Series - Transact SQL
SQL Server 2000 Research Series - Transact SQLSQL Server 2000 Research Series - Transact SQL
SQL Server 2000 Research Series - Transact SQLJerry Yang
 
Spm software effort estimation
Spm software effort estimationSpm software effort estimation
Spm software effort estimationKanchana Devi
 
Overview of query evaluation
Overview of query evaluationOverview of query evaluation
Overview of query evaluationavniS
 
Database Agnostic Workload Management (CIDR 2019)
Database Agnostic Workload Management (CIDR 2019)Database Agnostic Workload Management (CIDR 2019)
Database Agnostic Workload Management (CIDR 2019)University of Washington
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Julian Hyde
 
Query optimization
Query optimizationQuery optimization
Query optimizationdixitdavey
 
It's Time to Reassess Your FDM Mappings
It's Time to Reassess Your FDM MappingsIt's Time to Reassess Your FDM Mappings
It's Time to Reassess Your FDM MappingsAlithya
 
Presentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12cPresentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12cRonald Francisco Vargas Quesada
 
powerpoint
powerpointpowerpoint
powerpointbutest
 
The life of a query (oracle edition)
The life of a query (oracle edition)The life of a query (oracle edition)
The life of a query (oracle edition)maclean liu
 
SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?Brent Ozar
 
Base sas interview questions
Base sas interview questionsBase sas interview questions
Base sas interview questionsDr P Deepak
 
The openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query LanguageThe openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query LanguageNeo4j
 
Base sas interview questions
Base sas interview questionsBase sas interview questions
Base sas interview questionsSunil0108
 

Similaire à Lecture21-Query-Optimization-1April-2018.pptx (20)

Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
 
Cost-Based query optimization
Cost-Based query optimizationCost-Based query optimization
Cost-Based query optimization
 
Cost-based Query Optimization
Cost-based Query Optimization Cost-based Query Optimization
Cost-based Query Optimization
 
phoenix-on-calcite-hadoop-summit-2016
phoenix-on-calcite-hadoop-summit-2016phoenix-on-calcite-hadoop-summit-2016
phoenix-on-calcite-hadoop-summit-2016
 
Chapter15
Chapter15Chapter15
Chapter15
 
SQL Server 2000 Research Series - Transact SQL
SQL Server 2000 Research Series - Transact SQLSQL Server 2000 Research Series - Transact SQL
SQL Server 2000 Research Series - Transact SQL
 
Spm software effort estimation
Spm software effort estimationSpm software effort estimation
Spm software effort estimation
 
9-Query Processing-05-06-2023.PPT
9-Query Processing-05-06-2023.PPT9-Query Processing-05-06-2023.PPT
9-Query Processing-05-06-2023.PPT
 
Overview of query evaluation
Overview of query evaluationOverview of query evaluation
Overview of query evaluation
 
Database Agnostic Workload Management (CIDR 2019)
Database Agnostic Workload Management (CIDR 2019)Database Agnostic Workload Management (CIDR 2019)
Database Agnostic Workload Management (CIDR 2019)
 
Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!Don't optimize my queries, organize my data!
Don't optimize my queries, organize my data!
 
Query optimization
Query optimizationQuery optimization
Query optimization
 
It's Time to Reassess Your FDM Mappings
It's Time to Reassess Your FDM MappingsIt's Time to Reassess Your FDM Mappings
It's Time to Reassess Your FDM Mappings
 
Presentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12cPresentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12c
 
powerpoint
powerpointpowerpoint
powerpoint
 
The life of a query (oracle edition)
The life of a query (oracle edition)The life of a query (oracle edition)
The life of a query (oracle edition)
 
SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?
 
Base sas interview questions
Base sas interview questionsBase sas interview questions
Base sas interview questions
 
The openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query LanguageThe openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query Language
 
Base sas interview questions
Base sas interview questionsBase sas interview questions
Base sas interview questions
 

Dernier

main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 

Dernier (20)

main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 

Lecture21-Query-Optimization-1April-2018.pptx

  • 1. Database Applications (15-415) DBMS Internals- Part IX Lecture 21, April 1, 2018 Mohammad Hammoud
  • 2. Today…  Last Session:  DBMS Internals- Part VIII  Algorithms for Relational Operations (Cont’d)  Today’s Session:  DBMS Internals- Part IX  Query Optimization  Announcements:  PS4 is out. It is due on April 11 by midnight  Project 3 is due on Sunday, April 15 by midnight  Quiz II will be held on Sunday, April 8 (all concepts covered after the midterm are included)
  • 3. DBMS Layers Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management Disk Space Management DB Queries Transaction Manager Lock Manager Recovery Manager
  • 4. Outline A Brief Primer on Query Optimization Evaluating Query Plans Relational Algebra Equivalences Estimating Plan Costs Enumerating Plans 
  • 5. Cost-Based Query Sub-System Query Parser Query Optimizer Plan Generator Plan Cost Estimator Query Plan Evaluator Catalog Manager Usually there is a heuristics-based rewriting step before the cost-based steps. Schema Statistics Select * From Blah B Where B.blah = blah Queries
  • 6. Query Optimization Steps  Step 1: Queries are parsed into internal forms (e.g., parse trees)  Step 2: Internal forms are transformed into ‘canonical forms’ (syntactic query optimization)  Step 3: A subset of alternative plans are enumerated  Step 4: Costs for alternative plans are estimated  Step 5: The query evaluation plan with the least estimated cost is picked
  • 7. Required Information to Evaluate Queries  To estimate the costs of query plans, the query optimizer examines the system catalog and retrieves:  Information about the types and lengths of fields  Statistics about the referenced relations  Access paths (indexes) available for relations  In particular, the Schema and Statistics components in the Catalog Manager are inspected to find a good enough query evaluation plan
  • 8. Cost-Based Query Sub-System Query Parser Query Optimizer Plan Generator Plan Cost Estimator Query Plan Evaluator Catalog Manager Usually there is a heuristics-based rewriting step before the cost-based steps. Schema Statistics Select * From Blah B Where B.blah = blah Queries
  • 9. Catalog Manager: The Schema  What kind of information do we store at the Schema?  Information about tables (e.g., table names and integrity constraints) and attributes (e.g., attribute names and types)  Information about indices (e.g., index structures)  Information about users  Where do we store such information?  In tables, hence, can be queried like any other tables  For example: Attribute_Cat (attr_name: string, rel_name: string; type: string; position: integer)
  • 10. Catalog Manager: Statistics  What would you store at the Statistics component?  NTuples(R): # records for table R  NPages(R): # pages for R  NKeys(I): # distinct key values for index I  INPages(I): # pages for index I  IHeight(I): # levels for I  ILow(I), IHigh(I): range of values for I  ...  Such statistics are important for estimating plan costs and result sizes (to be discussed shortly!)
  • 11. SQL Blocks  SQL queries are optimized by decomposing them into a collection of smaller units, called blocks  A block is an SQL query with:  No nesting  Exactly 1 SELECT and 1 FROM clauses  At most 1 WHERE, 1 GROUP BY and 1 HAVING clauses  A typical relational query optimizer concentrates on optimizing a single block at a time
  • 12. Translating SQL Queries Into Relational Algebra Trees select name from STUDENT, TAKES where c-id=‘415’ and STUDENT.ssn=TAKES.ssn STUDENT TAKES   s p  An SQL block can be thought of as an algebra expression containing:  A cross-product of all relations in the FROM clause  Selections in the WHERE clause  Projections in the SELECT clause  Remaining operators can be carried out on the result of such SQL block
  • 13. Translating SQL Queries Into Relational Algebra Trees (Cont’d) STUDENT TAKES   s p STUDENT TAKES   s p Canonical form Still the same result! How can this be guaranteed?
  • 14. Translating SQL Queries Into Relational Algebra Trees (Cont’d) STUDENT TAKES   s p STUDENT TAKES   s p Canonical form OBSERVATION: try to perform selections and projections early!
  • 15. Translating SQL Queries Into Relational Algebra Trees (Cont’d) STUDENT TAKES   s p Index; seq scan Hash join; merge join; nested loops; How to evaluate a query plan (as opposed to evaluating an operator)?
  • 16. Outline A Brief Primer on Query Optimization Evaluating Query Plans Relational Algebra Equivalences Estimating Plan Costs Enumerating Plans 
  • 17. Query Evaluation Plans  A query evaluation plan (or simply a plan) consists of an extended relational algebra tree (or simply a tree)  A plan tree consists of annotations at each node indicating:  The access methods to use for each relation  The implementation method to use for each operator  Consider the following SQL query Q: SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5 What is the corresponding RA of Q?
  • 18. Query Evaluation Plans (Cont’d)  Q can be expressed in relational algebra as follows: ) (Re 5 100 ( Sailors sid sid serves rating bid sname       s p Reserves Sailors sid=sid bid=100 rating > 5 sname A RA Tree: Reserves Sailors sid=sid bid=100 rating > 5 sname (Simple Nested Loops) (On-the-fly) (On-the-fly) An Extended RA Tree: (File Scan) (File Scan)
  • 19. Pipelining vs. Materializing  When a query is composed of several operators, the result of one operator can sometimes be pipelined to another operator Reserves Sailors sid=sid bid=100 rating > 5 sname (Simple Nested Loops) (On-the-fly) (On-the-fly) (File Scan) (File Scan) Pipeline the output of the join into the selection and projection that follow Applied on-the-fly
  • 20. Pipelining vs. Materializing  When a query is composed of several operators, the result of one operator can sometimes be pipelined to another operator Reserves Sailors sid=sid bid=100 rating > 5 sname (Simple Nested Loops) (On-the-fly) (On-the-fly) (File Scan) (File Scan) Pipeline the output of the join into the selection and projection that follow Applied on-the-fly In contrast, a temporary table can be materialized to hold the intermediate result of the join and read back by the selection operation! Pipelining can significantly save I/O cost!
  • 21. The I/O Cost of the Q Plan  What is the I/O cost of the following evaluation plan? Reserves Sailors sid=sid bid=100 rating > 5 sname (Simple Nested Loops) (On-the-fly) (On-the-fly) (File Scan) (File Scan)  The cost of the join is 1000 + 1000 * 500 = 501,000 I/Os (assuming page-oriented Simple NL join)  The selection and projection are done on-the-fly; hence, do not incur additional I/Os
  • 22. Pushing Selections  How can we reduce the cost of a join?  By reducing the sizes of the input relations! Reserves Sailors sid=sid bid=100 rating > 5 sname Involves bid in Reserves; hence, “push” ahead of the join! Involves rating in Sailors; hence, “push” ahead of the join!
  • 23. Pushing Selections  How can we reduce the cost of a join?  By reducing the sizes of the input relations! Reserves Sailors sid=sid bid=100 sname (On-the-fly) rating > 5 (Scan; write to temp T1) (Scan; write to temp T2) (Sort-Merge Join) Reserves Sailors sid=sid bid=100 rating > 5 sname (Simple Nested Loops) (On-the-fly) (On-the-fly) (File Scan) (File Scan)
  • 24. The I/O Cost of the New Q Plan  What is the I/O cost of the following evaluation plan? Reserves Sailors sid=sid bid=100 sname (On-the-fly) rating > 5 (Scan; write to temp T1) (Scan; write to temp T2) (Sort-Merge Join) Cost of Scanning Reserves = 1000 I/Os Cost of Writing T1 = 10* I/Os (later) Cost of Scanning Sailors = 500 I/Os Cost of Writing T2 = 250* I/Os (later) *Assuming 100 boats and uniform distribution of reservations across boats. *Assuming 10 ratings and uniform distribution over ratings.
  • 25. The I/O Cost of the New Q Plan  What is the I/O cost of the following evaluation plan? Reserves Sailors sid=sid bid=100 sname (On-the-fly) rating > 5 (Scan; write to temp T1) (Scan; write to temp T2) (Sort-Merge Join) Cost = 2×4×250 = 2000 I/Os (assuming B = 5) Cost = 2×2×10 = 40 I/Os (assuming B = 5) Merge Cost = 10 + 250 = 260 I/Os
  • 26. The I/O Cost of the New Q Plan  What is the I/O cost of the following evaluation plan? Reserves Sailors sid=sid bid=100 sname (On-the-fly) rating > 5 (Scan; write to temp T1) (Scan; write to temp T2) (Sort-Merge Join) Done on-the-fly, thus, do not incur additional I/Os
  • 27. The I/O Cost of the New Q Plan  What is the I/O cost of the following evaluation plan? Reserves Sailors sid=sid bid=100 sname (On-the-fly) rating > 5 (Scan; write to temp T1) (Scan; write to temp T2) (Sort-Merge Join) Cost of Scanning Reserves = 1000 I/Os Cost of Writing T1 = 10 I/Os (later) Cost of Scanning Sailors = 500 I/Os Cost of Writing T2 = 250 I/Os (later) Cost = 2×4×250 = 2000 I/Os (assuming B = 5) Cost = 2×2×10 = 40 I/Os (assuming B = 5) Merge Cost = 10 + 250 = 260 I/Os Total Cost = 1000 + 10 + 500 + 250 + 40 + 2000 + 260 = 4060 I/Os Done on-the-fly, thus, do not incur additional I/Os
  • 28. The I/O Costs of the Two Q Plans Total Cost = 501, 000 I/Os Reserves Sailors sid=sid bid=100 sname (On-the-fly) rating > 5 (Scan; write to temp T1) (Scan; write to temp T2) (Sort-Merge Join) Reserves Sailors sid=sid bid=100 rating > 5 sname (Simple Nested Loops) (On-the-fly) (On-the-fly) (File Scan) (File Scan) Total Cost = 4060 I/Os
  • 29. Pushing Projections  How can we reduce the cost of a join?  By reducing the sizes of the input relations!  Consider (again) the following plan: Reserves Sailors sid=sid bid=100 sname rating > 5 (Scan; write to temp T1) (Scan; write to temp T2)  What are the attributes required from T1 and T2?  Sid from T1  Sid and sname from T2 Hence, as we scan Reserves and Sailors we can also remove unwanted columns (i.e., “Push” the projections ahead of the join)!
  • 30. Pushing Projections  How can we reduce the cost of a join?  By reducing the sizes of the input relations!  Consider (again) the following plan: Reserves Sailors sid=sid bid=100 sname rating > 5 (Scan; write to temp T1) (Scan; write to temp T2) The cost after applying this heuristic can become 2000 I/Os (as opposed to 4060 I/Os with only pushing the selection)! “Push” ahead the join
  • 31.  What if indexes are available on Reserves and Sailors? Using Indexes Reserves Sailors sid=sid bid=100 sname (On-the-fly) rating > 5 (Use hash index; do not write result to temp) (Index Nested Loops, with pipelining ) (On-the-fly) (Hash index on sid) (Clustered hash index on bid)  With clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples (assuming 100 boats and uniform distribution of reservations across boats)  Since the index is clustered, the 1000 tuples appear consecutively within the same bucket; thus # of pages = 1000/100 = 10 pages
  • 32. Using Indexes  What if indexes are available on Reserves and Sailors? Reserves Sailors sid=sid bid=100 sname (On-the-fly) rating > 5 (Use hash index; do not write result to temp) (Index Nested Loops, with pipelining ) (On-the-fly) (Hash index on sid) (Clustered hash index on bid)  For each selected Reserves tuple, we can retrieve matching Sailors tuples using the hash index on the sid field  Selected Reserves tuples need not be materialized and the join result can be pipelined!  For each tuple in the join result, we apply rating > 5 and the projection of sname on-the-fly
  • 33. Using Indexes  What if indexes are available on Reserves and Sailors? Reserves Sailors sid=sid bid=100 sname (On-the-fly) rating > 5 (Use hash index; do not write result to temp) (Index Nested Loops, with pipelining ) (On-the-fly) (Hash index on sid) (Clustered hash index on bid) Is it necessary to project out unwanted columns? NO, since selection results are NOT materialized
  • 34. Using Indexes  What if indexes are available on Reserves and Sailors? Reserves Sailors sid=sid bid=100 sname (On-the-fly) rating > 5 (Use hash index; do not write result to temp) (Index Nested Loops, with pipelining ) (On-the-fly) (Hash index on sid) (Clustered hash index on bid) Does the hash index on sid need to be clustered? NO, since there is at most 1 matching Sailors tuple per a Reserves tuple! Why?
  • 35. Using Indexes  What if indexes are available on Reserves and Sailors? Reserves Sailors sid=sid bid=100 sname (On-the-fly) rating > 5 (Use hash index; do not write result to temp) (Index Nested Loops, with pipelining ) (On-the-fly) (Hash index on sid) (Clustered hash index on bid) Cost = 1.2 I/Os (if A(1)) or 2.2 (if A(2))
  • 36. Using Indexes  What if indexes are available on Reserves and Sailors? Reserves Sailors sid=sid bid=100 sname (On-the-fly) rating > 5 (Use hash index; do not write result to temp) (Index Nested Loops, with pipelining ) (On-the-fly) (Hash index on sid) (Clustered hash index on bid) Why not pushing this selection ahead of the join? It would require a scan on Sailors!
  • 37.  What is the I/O cost of the following evaluation plan? The I/O Cost of the New Q Plan Reserves Sailors sid=sid bid=100 sname (On-the-fly) rating > 5 (Use hash index; do not write result to temp) (Index Nested Loops, with pipelining ) (On-the-fly) (Hash index on sid) (Clustered hash index on bid) 10 I/Os Cost = 1.2 I/Os for 1000 Reserves tuples; hence, 1200 I/Os Total Cost = 10 + 1200 = 1210 I/Os
  • 38. Comparing I/O Costs: Recap Total Cost = 501, 000 I/Os Reserves Sailors sid=sid bid=100 sname (On-the-fly) rating > 5 (Scan; write to temp T1) (Scan; write to temp T2) (Sort-Merge Join) Reserves Sailors sid=sid bid=100 rating > 5 sname (Simple Nested Loops) (On-the-fly) (On-the-fly) (File Scan) (File Scan) Total Cost = 4060 I/Os Reserves Sailors sid=sid bid=100 sname (On-the-fly) rating > 5 (Hash index) (Index Nested Loops, with pipelining ) (On-the-fly) (Hash index on sid) Total Cost = 1210 I/Os
  • 39. But, How Can we Ensure Correctness? Canonical form Still the same result! How can this be guaranteed? Reserves Sailors sid=sid bid=100 rating > 5 sname Reserves Sailors sid=sid bid=100 sname rating > 5
  • 40. Outline A Brief Primer on Query Optimization Evaluating Query Plans Relational Algebra Equivalences Estimating Plan Costs Enumerating Plans 
  • 41. Relational Algebra Equivalences  A relational query optimizer uses relational algebra equivalences to identify many equivalent expressions for a given query  Two relational algebra expressions over the same set of input relations are said to be equivalent if they produce the same result on all relations’ instances  Relational algebra equivalences allow us to:  Push selections and projections ahead of joins  Combine selections and cross-products into joins  Choose different join orders
  • 42. RA Equivalences: Selections  Two important equivalences involve selections: 1. Cascading of Selections: 2. Commutation of Selections:       s s s c cn c cn R R 1 1    ... ...         s s s s c c c c R R 1 2 2 1  Allows us to combine several selections into one selection OR: Allows us to replace a selection with several smaller selections Allows us to test selection conditions in either order
  • 43. RA Equivalences: Projections  One important equivalence involves projections:  Cascading of Projections: This says that successively eliminating columns from a relation is equivalent to simply eliminating all but the columns retained by the final projection!         R R an a a p p p ... 1 1 
  • 44. RA Equivalences: Cross-Products and Joins  Two important equivalences involve cross-products and joins: 1. Commutative Operations: This allows us to choose which relation to be the inner and which to be the outer! (R × S) (S × R)  (R S) (S R)   
  • 45. RA Equivalences: Cross-Products and Joins  Two important equivalences involve cross-products and joins: 2. Associative Operations: This says that regardless of the order in which the relations are considered, the final result is the same! R × (S × T) (R × S) × T   R (S T) (R S) T       R (S T) (T R) S      It follows: This order-independence is fundamental to how a query optimizer generates alternative query evaluation plans
  • 46. RA Equivalences: Selections, Projections, Cross Products and Joins  Selections with Projections:  Selections with Cross-Products: This says we can commute a selection with a projection if the selection involves only attributes retained by the projection! )) ( ( )) ( ( R R a c c a p s s p  R T  c   ) ( S R c  s This says we can combine a selection with a cross-product to form a join (as per the definition of a join)!
  • 47. RA Equivalences: Selections, Projections, Cross Products and Joins  Selections with Cross-Products and with Joins: S R c S R c    ) ( ) ( s s This says we can commute a selection with a cross-product or a join if the selection condition involves only attributes of one of the arguments to the cross-product or join! S R c S R c     ) ( ) ( s s  Caveat: The attributes mentioned in c must appear only in R and NOT in S
  • 48. RA Equivalences: Selections, Projections, Cross Products and Joins  Selections with Cross-Products and with Joins (Cont’d): ) ( 3 2 1 ) ( S R c c c S R c      s s This says we can push part of the selection condition c ahead of the cross-product! ))) ( 3 ( 2 ( 1 S R c c c   s s s )) ( 3 ) ( 2 ( 1 S c R c c s s s   This applies to joins as well!
  • 49. RA Equivalences: Selections, Projections, Cross Products and Joins  Projections with Cross-Products and with Joins: ) ( 2 ) ( 1 ) ( S a R a S R a p p p    Intuitively, we need to retain only those attributes of R and S that are either mentioned in the join condition c or included in the set of attributes a retained by the projection ) ( 2 ) ( 1 ) ( S a c R a S c R a p p p      )) ( 2 ) ( 1 ( ) ( S a c R a a S c R a p p p p     
  • 50. How to Estimate the Cost of Plans?  Now that correctness is ensured, how can the DBMS estimate the costs of various plans? Canonical form Reserves Sailors sid=sid bid=100 rating > 5 sname Reserves Sailors sid=sid bid=100 sname rating > 5
  • 51. Next Class Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management Disk Space Management DB Queries Transaction Manager Lock Manager Recovery Manager Continue…

Notes de l'éditeur

  1. In step 2, the combination of all Query optimization is one of the most important tasks of a relational DBMS is used as a key for sorting.
  2. In step 2, the combination of all Query optimization is one of the most important tasks of a relational DBMS is used as a key for sorting.
  3. In step 2, the combination of all Query optimization is one of the most important tasks of a relational DBMS is used as a key for sorting.
  4. In step 2, the combination of all Query optimization is one of the most important tasks of a relational DBMS is used as a key for sorting.
  5. In step 2, the combination of all Query optimization is one of the most important tasks of a relational DBMS is used as a key for sorting.
  6. In step 2, the combination of all Query optimization is one of the most important tasks of a relational DBMS is used as a key for sorting.
  7. In step 2, the combination of all Query optimization is one of the most important tasks of a relational DBMS is used as a key for sorting.
  8. In step 2, the combination of all Query optimization is one of the most important tasks of a relational DBMS is used as a key for sorting.