1. TREES
Table of Contents:
• Heapsort
• B-Tree
• Huffman's Algorithm
2. Heap
• Suppose H is a complete binary tree with n
elements. Then H is called a heap, or a maxheap,
if each node N of H has the property that the value
at N is greater than or equal to the value at each of
the children of N.
• Analogously, a minheap is a heap such that the
value at N is less than or equal to the value at
each of its children.
3. Example of Max Heap
[Diagram: a maxheap with root 97; second level 88, 95; third level 66, 55, 87, 48; fourth level 35, 24, 62, 77, 25, 38, 26, 48; fifth level 40, 39, 30, 18, 17.]
4. Inserting an Element in a Heap
Suppose H is a heap with N elements, and suppose an ITEM of
information is given. We insert ITEM into the heap H as follows:
• First adjoin the ITEM at the end of H so that H is still a complete
tree but not necessarily a heap.
• Then let the ITEM rise to its appropriate place in H so that H is
finally a heap.
[A heap is more efficiently implemented with an array than with a linked list. In an
array-based heap, the location of the parent of the node at index PTR is ⌊PTR/2⌋.]
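The bracketed note above can be made concrete. A minimal sketch of the index arithmetic for a heap stored 1-based in an array (the function names are illustrative, not from the text):

```python
# Illustrative index arithmetic for a heap stored 1-based in an array:
# the node at index PTR has parent PTR // 2 and children 2*PTR and 2*PTR + 1.
def parent(ptr):
    return ptr // 2

def left(ptr):
    return 2 * ptr

def right(ptr):
    return 2 * ptr + 1

print(parent(7), left(3), right(3))  # → 3 6 7
```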
5. Build a Maxheap
Build a maxheap from the following elements:
44, 30, 50, 22, 60, 55, 77, 55
6. Algorithm: INSHEAP( TREE, N, ITEM)
A heap H with N elements is stored in the array TREE and an ITEM of information is given. This
procedure inserts the ITEM as the new element of H. PTR gives the location of ITEM as it rises in
the tree and PAR denotes the parent of ITEM
1. [Add new node to H and initialize PTR]
   Set N := N + 1 and PTR := N
2. [Find location to insert ITEM]
   Repeat Steps 3 to 6 while PTR > 1:
3. Set PAR := ⌊PTR/2⌋ [Location of parent node]
4. If ITEM ≤ TREE[PAR], then:
       Set TREE[PTR] := ITEM and Return
   [End of If structure]
5. Set TREE[PTR] := TREE[PAR] [Moves node down]
6. Set PTR := PAR [Updates PTR]
   [End of Step 2 loop]
7. Set TREE[1] := ITEM
8. Return
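The INSHEAP procedure translates almost line for line into Python. A sketch, assuming the heap is stored 1-based in a Python list with index 0 unused (the list is grown as needed, which the array pseudocode takes for granted):

```python
def insheap(tree, n, item):
    """Insert `item` into the max-heap `tree` holding n elements.

    The heap is stored 1-based: tree[0] is unused, the parent of
    index ptr is ptr // 2. Returns the new element count (the
    pseudocode's updated N).
    """
    n += 1
    while len(tree) <= n:          # grow the list so index n exists
        tree.append(None)
    ptr = n
    while ptr > 1:
        par = ptr // 2             # location of parent node
        if item <= tree[par]:      # found ITEM's place
            tree[ptr] = item
            return n
        tree[ptr] = tree[par]      # move parent down
        ptr = par
    tree[1] = item                 # ITEM rose all the way to the root
    return n

# Build the maxheap from slide 5: 44, 30, 50, 22, 60, 55, 77, 55
tree, n = [None], 0
for x in [44, 30, 50, 22, 60, 55, 77, 55]:
    n = insheap(tree, n, x)
print(tree[1])  # → 77
```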
7. Deleting the Root node in a heap
Suppose H is a heap with N elements and suppose we want
to delete the root R of H. This is accomplished as follows:
• Assign the root R to some variable ITEM
• Replace the deleted node R by last node L of H so that H is
still a complete tree but not necessarily a heap.
• Let L sink to its appropriate place in H so that H is finally a
heap.
9. Algorithm: DELHEAP( TREE, N , ITEM )
A heap H with N elements is stored in the array TREE. This
algorithm assigns the root TREE[1] of H to the variable ITEM
and then reheaps the remaining elements. The variable LAST
stores the value of the original last node of H. The pointers PTR,
LEFT and RIGHT give the Location of LAST and its left and
right children as LAST sinks into the tree.
10. 1. Set ITEM := TREE[1] [Removes root of H]
2. Set LAST := TREE[N] and N := N - 1 [Removes last node of H]
3. Set PTR := 1, LEFT := 2 and RIGHT := 3
4. Repeat Steps 5 and 6 while RIGHT ≤ N:
5. If LAST ≥ TREE[LEFT] and LAST ≥ TREE[RIGHT], then:
       Set TREE[PTR] := LAST and Return
   [End of If structure]
6. If TREE[RIGHT] ≤ TREE[LEFT], then:
       Set TREE[PTR] := TREE[LEFT] and PTR := LEFT
   Else:
       Set TREE[PTR] := TREE[RIGHT] and PTR := RIGHT
   [End of If structure]
   Set LEFT := 2 * PTR and RIGHT := LEFT + 1
   [End of Step 4 loop]
7. [Special case: node with only a left child]
   If LEFT = N and LAST < TREE[LEFT], then:
       Set TREE[PTR] := TREE[LEFT] and PTR := LEFT
8. Set TREE[PTR] := LAST
9. Return
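Likewise, DELHEAP can be sketched in Python under the same 1-based storage assumption (the function returns the removed item together with the new element count rather than using output parameters):

```python
def delheap(tree, n):
    """Remove and return the root of the 1-based max-heap `tree` with
    n elements, then reheap; returns (item, new element count)."""
    item = tree[1]                     # removes root of H
    last = tree[n]                     # removes last node of H
    n -= 1
    ptr, lft, rgt = 1, 2, 3
    while rgt <= n:
        if last >= tree[lft] and last >= tree[rgt]:
            break                      # LAST has sunk to its place
        if tree[rgt] <= tree[lft]:     # promote the larger child
            tree[ptr] = tree[lft]
            ptr = lft
        else:
            tree[ptr] = tree[rgt]
            ptr = rgt
        lft, rgt = 2 * ptr, 2 * ptr + 1
    if lft == n and last < tree[lft]:  # special case: lone left child
        tree[ptr] = tree[lft]
        ptr = lft
    tree[ptr] = last
    return item, n

tree = [None, 77, 60, 55, 55, 30, 50, 44, 22]
item, n = delheap(tree, 8)
print(item)  # → 77
```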
11. Application of Heap
HeapSort- One of the important applications of heap
is sorting of an array using heapsort method. Suppose
an array A with N elements is to be sorted. The
heapsort algorithm sorts the array in two phases:
• Phase A: Build a heap H out of the elements of A
• Phase B: Repeatedly delete the root element of H
Since the root element of heap contains the largest
element of the heap, phase B deletes the elements in
decreasing order. Similarly, heapsort on a minheap
yields the elements in increasing order, since the root
then represents the smallest element of the heap.
12. Algorithm: HEAPSORT(A,N)
An array A with N elements is given. This algorithm sorts
the elements of the array
• Step 1: [Build a heap H, using the procedure INSHEAP]
Repeat for J=1 to N-1:
Call INSHEAP(A, J, A[J+1])
[End of Loop]
• Step 2: [Sort A repeatedly deleting the root of H]
Repeat while N > 1:
(a) Call DELHEAP( A, N, ITEM)
(b) Set A[N + 1] := ITEM [Store the elements deleted from
the heap]
[End of loop]
• Step 3: Exit
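A self-contained sketch of the two-phase method in Python. Note one deliberate deviation: instead of building the heap by repeated INSHEAP calls as in Step 1, this version uses the common bottom-up sift-down construction, and it uses 0-based indices as Python lists do:

```python
def heapsort(a):
    """In-place heapsort of the 0-based list `a`: Phase A builds a
    max-heap, Phase B repeatedly moves the root (maximum) to the end."""
    n = len(a)

    def sift_down(start, end):
        # Let a[start] sink until the heap property holds on a[start..end].
        ptr = start
        while 2 * ptr + 1 <= end:
            child = 2 * ptr + 1
            if child + 1 <= end and a[child + 1] > a[child]:
                child += 1                 # pick the larger child
            if a[ptr] >= a[child]:
                return
            a[ptr], a[child] = a[child], a[ptr]
            ptr = child

    # Phase A: build a max-heap (bottom-up variant of Step 1)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n - 1)
    # Phase B: repeatedly delete the root and store it past the heap
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(0, end - 1)
    return a

print(heapsort([44, 30, 50, 22, 60, 55, 77, 55]))
# → [22, 30, 44, 50, 55, 55, 60, 77]
```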
13. Complexity of HeapSort
• Phase 1 (build a heap H out of the n elements of A): g(n) ≤ n log₂ n
• Phase 2 (repeatedly delete the root element of H): h(n) ≤ n log₂ n
• Therefore f(n) = O(n log₂ n) in the worst case
• Better than Bubblesort (O(n²)) and Quicksort (average O(n log₂ n), worst O(n²))
14. • Problem: Create a Heap out of the following data:
jan feb mar apr may jun jul aug sept oct nov dec
16. Motivation for B-Trees
• Index structures for large datasets cannot be stored
in main memory
• Storing it on disk requires different approach to
efficiency
• Assuming that a disk spins at 3600 RPM, one
revolution occurs in 1/60 of a second, or 16.7ms
• Crudely speaking, one disk access takes about the
same time as 200,000 instructions
17. Motivation (cont.)
• Assume that we use an AVL tree to store about 20
million records
• We end up with a very deep binary tree with lots of
different disk accesses; log2 20,000,000 is about 24,
so this takes about 0.2 seconds
• We know we can’t improve on the log n lower
bound on search for a binary tree
• But, the solution is to use more branches and thus
reduce the height of the tree!
– As branching increases, depth decreases
18. Definition of a B-tree
• A B-tree of order m is an m-way tree (i.e., a tree
where each node may have up to m children) in
which:
1.the number of keys in each non-leaf node is one less than
the number of its children and these keys partition the
keys in the children in the fashion of a search tree
2.all leaves are on the same level
3.all non-leaf nodes except the root have at least ⌈m/2⌉
children
4.the root is either a leaf node, or it has from two to m
children
5.a leaf node contains no more than m – 1 keys
• The number m should always be odd
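Search is the simplest operation to sketch from this definition: in each node, find where the key would sit among the node's keys, and either report a match or descend into the corresponding child. A minimal sketch (the node layout is an illustrative assumption, not from the slides):

```python
from bisect import bisect_left

class BTreeNode:
    """Illustrative node layout: `keys` is sorted; `children` holds
    len(keys) + 1 subtrees for non-leaves and is empty for leaves."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def btree_search(node, key):
    """Return True if `key` occurs in the B-tree rooted at `node`."""
    while node is not None:
        i = bisect_left(node.keys, key)   # slot where key would sit
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:             # leaf reached without a match
            return False
        node = node.children[i]           # descend into the i-th subtree
    return False

# The small order-5 tree built in the construction slides
root = BTreeNode([8, 17], [BTreeNode([1, 2, 6, 7]),
                           BTreeNode([12, 14, 16]),
                           BTreeNode([25, 28, 48, 52])])
print(btree_search(root, 14), btree_search(root, 10))  # → True False
```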
19. An example B-Tree
A B-tree of order 5 containing 26 items:
[Diagram: the root holds 26; its two children hold 6 12 and 42 51 62; the leaves hold 1 2 4 | 7 8 | 13 15 18 25 under the left child, and 27 29 | 45 46 48 | 53 55 60 | 64 70 90 under the right child.]
Note that all the leaves are at the same level.
20. Constructing a B-tree
• Suppose we start with an empty B-tree and keys
arrive in the following order: 1 12 8 2 25 6 14 28
17 7 52 16 48 68 3 26 29 53 55 45
• We want to construct a B-tree of order 5
• The first four items go into the root:
1 2 8 12
• To put the fifth item in the root would violate
condition 5
• Therefore, when 25 arrives, pick the middle key to
make a new root
21. Constructing a B-tree (contd.)
[Diagram: root 8; leaves 1 2 and 12 25.]
6, 14, 28 get added to the leaf nodes:
[Diagram: root 8; leaves 1 2 6 and 12 14 25 28.]
22. Constructing a B-tree (contd.)
Adding 17 to the right leaf node would over-fill it, so we take the
middle key, promote it (to the root) and split the leaf
[Diagram: root 8 17; leaves 1 2 6 | 12 14 | 25 28.]
7, 52, 16, 48 get added to the leaf nodes
[Diagram: root 8 17; leaves 1 2 6 7 | 12 14 16 | 25 28 48 52.]
23. Constructing a B-tree (contd.)
Adding 68 causes us to split the right most leaf, promoting 48 to the
root, and adding 3 causes us to split the left most leaf, promoting 3
to the root; 26, 29, 53, 55 then go into the leaves
[Diagram: root 3 8 17 48; leaves 1 2 | 6 7 | 12 14 16 | 25 26 28 29 | 52 53 55 68.]
Adding 45 causes a split of 25 26 28 29
and promoting 28 to the root then causes the root to split
25. Inserting into a B-Tree
• Attempt to insert the new key into a leaf
• If this would result in that leaf becoming too big,
split the leaf into two, promoting the middle key to
the leaf’s parent
• If this would result in the parent becoming too big,
split the parent into two, promoting the middle key
• This strategy might have to be repeated all the way
to the top
• If necessary, the root is split in two and the middle
key is promoted to a new root, making the tree one
level higher
26. Exercise: Inserting into a B-Tree
• Insert the following keys into a 5-way B-tree:
3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8, 19, 4, 31, 35,
56
27. Removal from a B-tree
• During insertion, the key always goes into a leaf.
For deletion we wish to remove from a leaf. There
are three possible ways we can do this:
CASE: 1 - If the key is already in a leaf node, and
removing it doesn’t cause that leaf node to have too
few keys, then simply remove the key to be deleted.
CASE: 2 - If the key is not in a leaf then it is
guaranteed (by the nature of a B-tree) that its
predecessor or successor will be in a leaf -- in this
case we can delete the key and promote the
predecessor or successor key to the non-leaf deleted
key’s position.
28. Removal from a B-tree (2)
• If (1) or (2) lead to a leaf node containing less than
the minimum number of keys then we have to look
at the siblings immediately adjacent to the leaf in
question:
CASE: 3- If one of them has more than the min. number of
keys then we can promote one of its keys to the parent and
take the parent key into our lacking leaf
CASE:4 - If neither of them has more than the min. number
of keys then the lacking leaf and one of its neighbours can
be combined with their shared parent (the opposite of
promoting a key) and the new leaf will have the correct
number of keys; if this step leaves the parent with too
few keys then we repeat the process up to the root
itself, if required
29. Type #1: Simple leaf deletion
Assuming a 5-way B-tree, as before:
[Diagram: root 12 29 52; leaves 2 7 9 | 15 22 | 31 43 | 56 69 72.]
Delete 2: since there are enough keys in the node, just delete it.
30. Type #2: Simple non-leaf deletion
Delete 52
[Diagram: root 12 29 52; leaves 7 9 | 15 22 | 31 43 | 56 69 72.]
Borrow the predecessor or (in this case) successor: 56 moves up into the root.
Next: delete 72
31. Type #4: Too few keys in node and its siblings
[Diagram: root 12 29 56; leaves 7 9 | 15 22 | 31 43 | 69 72.]
Delete 72: too few keys! Join back together.
32. Type #4: Too few keys in node and its siblings
[Diagram: root 12 29; leaves 7 9 | 15 22 | 31 43 56 69.]
Next: delete 22
40. Exercise to do
• Given 5-way B-tree created by these data (last
exercise):
3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8, 19, 4, 31, 35,
56
• Then add the following keys:
– 2, 6,12
• Delete the following keys:
– 4, 5, 7, 3, 14
41. Comparing Trees
• Binary trees
– Can become unbalanced and lose their good time complexity
(big O)
– AVL trees are strict binary trees that overcome the balance
problem
– Heaps remain balanced but only prioritise (not order) the keys
• Multi-way trees
– B-Trees can be m-way, they can have any (odd) number of
children
– One B-Tree, the 2-3 (or 3-way) B-Tree, approximates a
permanently balanced binary tree, exchanging the AVL tree’s
balancing operations for insertion and (more complex) deletion
operations
43. Encoding and Compression of Data
• ASCII
• Variations on ASCII
– min number of bits needed
– cost of savings
– patterns
– modifications
44. Purpose of Huffman Coding
• Proposed by Dr. David A. Huffman in 1952
– “A Method for the Construction of Minimum
Redundancy Codes”
• Applicable to many forms of data transmission
– Our example: text files
45. The Basic Algorithm
• Huffman coding is a form of statistical coding
• Not all characters occur with the same frequency!
• Yet all characters are allocated the same amount of
space
– 1 char = 1 byte, be it e or x
46. The Basic Algorithm
• Any savings in tailoring codes to frequency of
character?
• Code word lengths are no longer fixed like ASCII.
• Code word lengths vary and will be shorter for the
more frequently used characters.
47. The (Real) Basic Algorithm
1. Scan text to be compressed and tally occurrence of all
characters.
2. Sort or prioritize characters based on number of
occurrences in text.
3. Build Huffman code tree based on prioritized list.
4. Perform a traversal of tree to determine all code words.
5. Scan text again and create new file using the Huffman
codes.
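Steps 1 and 2 amount to a frequency tally. A sketch in Python, using the example text from the slides that follow:

```python
from collections import Counter

text = "Eerie eyes seen near lake."
freq = Counter(text)                 # step 1: tally occurrences
# step 2: prioritize characters, lowest occurrence first
for ch, n in sorted(freq.items(), key=lambda p: p[1]):
    print(repr(ch), n)
```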
48. Building a Tree
Scan the original text
• Consider the following short text:
Eerie eyes seen near lake.
• Count up the occurrences of all characters in the text
49. Building a Tree
Scan the original text
Eerie eyes seen near lake.
• What characters are present?
E, e, r, i, space, y, s, n, a, l, k, .
50. Building a Tree
Scan the original text
Eerie eyes seen near lake.
• What is the frequency of each character in the text?
51. Building a Tree
Prioritize characters
• Create binary tree nodes with character and
frequency of each character
• Place nodes in a priority queue
– The lower the occurrence, the higher the priority in the
queue
52. Building a Tree
• The queue after inserting all nodes (lowest frequency first):
  E:1  i:1  y:1  l:1  k:1  .:1  r:2  s:2  n:2  a:2  sp:4  e:8
• Null pointers are not shown
53. Building a Tree
• While priority queue contains two or more nodes
– Create new node
– Dequeue node and make it left subtree
– Dequeue next node and make it right subtree
– Frequency of new node equals sum of frequency of left and right
children
– Enqueue new node back into queue
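The loop above maps directly onto a binary min-heap priority queue. A sketch using Python's heapq (the tuple node layout and tie-breaking counter are implementation choices, not from the slides):

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_tree(text):
    """Repeat the loop above until one node remains. A node is a tuple
    (freq, tiebreak, char, left, right); char is None for internal nodes,
    and the tiebreak counter keeps heapq from ever comparing subtrees."""
    tie = count()
    heap = [(f, next(tie), ch, None, None) for ch, f in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        left = heapq.heappop(heap)      # dequeue lowest-frequency node
        right = heapq.heappop(heap)     # dequeue next node
        # frequency of new node equals sum of children's frequencies
        heapq.heappush(heap, (left[0] + right[0], next(tie), None, left, right))
    return heap[0]

root = build_huffman_tree("Eerie eyes seen near lake.")
print(root[0])  # → 26, the number of characters in the text
```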
54. Building a Tree
[Diagram: the queue as before: E:1, i:1, y:1, l:1, k:1, .:1, r:2, s:2, n:2, a:2, sp:4, e:8.]
55. Building a Tree
[Diagram: E and i are dequeued and combined under a new internal node of frequency 2.]
56. Building a Tree
[Diagram: the new node of frequency 2 is enqueued back into the priority queue.]
71.–76. Building a Tree
[Diagrams: the merging continues in the same way. The two lowest-frequency nodes are repeatedly dequeued and combined, producing subtrees of frequency 4, 6, 8, 10 and 16, until the subtrees of frequency 10 and 16 combine into a final root of frequency 26.]
After enqueueing this node there is only one node left in the priority queue.
77. Building a Tree
Dequeue the single node left in the queue.
This tree contains the new code words for each character.
The frequency of the root node should equal the number of characters in the text.
[Diagram: the final Huffman tree, with root frequency 26.]
Eerie eyes seen near lake. (26 characters)
78. Encoding the File
Traverse Tree for Codes
• Perform a traversal of the tree to obtain new code words
• Going left is a 0; going right is a 1
• A code word is only completed when a leaf node is reached
[Diagram: the final Huffman tree.]
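The traversal described above can be sketched as a short recursive function. A minimal sketch, assuming an illustrative (char, left, right) tuple layout for nodes, with char set to None for internal nodes:

```python
def assign_codes(node, prefix=""):
    """Depth-first traversal of a Huffman tree: going left appends '0',
    going right appends '1', and a code word is complete only at a leaf.
    Node layout is an illustrative (char, left, right) tuple, with
    char = None for internal nodes."""
    ch, left, right = node
    if ch is not None:                     # leaf: emit the finished code
        return {ch: prefix or "0"}
    codes = {}
    codes.update(assign_codes(left, prefix + "0"))
    codes.update(assign_codes(right, prefix + "1"))
    return codes

# Tiny example: 'e' alone on the left, 'E' and space under the right child
tiny = (None, ("e", None, None), (None, ("E", None, None), (" ", None, None)))
print(assign_codes(tiny))  # → {'e': '0', 'E': '10', ' ': '11'}
```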
79. Encoding the File
Traverse Tree for Codes
Char  Code
E     0000
i     0001
y     0010
l     0011
k     0100
.     0101
space 011
e     10
r     1100
s     1101
n     1110
a     1111
[Diagram: the final Huffman tree.]
80. Encoding the File
• Rescan text and encode file using new code words
Eerie eyes seen near lake.
Char  Code
E     0000
i     0001
y     0010
l     0011
k     0100
.     0101
space 011
e     10
r     1100
s     1101
n     1110
a     1111
000010110000011001110001010110110
100111110101111110001100111111010
0100101
• Why is there no need for a separator character?
81. Encoding the File
Results
• Have we made things any better?
• 73 bits to encode the text
• ASCII would take 8 × 26 = 208 bits
000010110000011001110001010110110
100111110101111110001100111111010
0100101
• If a modified fixed-length code were used instead, 4 bits per character would be needed, for a total of 4 × 26 = 104 bits. The savings are not as great.
82. Decoding the File
• How does receiver know what the codes are?
• Tree constructed for each text file.
– Considers frequency for each file
– Big hit on compression, especially for smaller files
• Tree predetermined
– based on statistical analysis of text files or file types
• Data transmission is bit based versus byte based
83. Decoding the File
• Once the receiver has the tree, it scans the incoming bit stream
• 0 ⇒ go left
• 1 ⇒ go right
[Diagram: the final Huffman tree.]
10100011011110111101
111110000110101
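Decoding can also be sketched without walking an explicit tree: because Huffman codes are prefix-free, the receiver can accumulate bits until the buffer matches a complete code word (the code table below is the one from the slides):

```python
def decode(bits, codes):
    """Walk the bit string, accumulating bits until the buffer matches a
    complete code word; prefix-freeness guarantees no false matches."""
    inverse = {code: ch for ch, code in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:                 # a full code word has been read
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

# The code table from the slides
codes = {"E": "0000", "i": "0001", "y": "0010", "l": "0011",
         "k": "0100", ".": "0101", " ": "011", "e": "10",
         "r": "1100", "s": "1101", "n": "1110", "a": "1111"}
print(decode("0000101100000110", codes))  # → Eerie
```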
84. Summary
• Huffman coding is a technique used to compress files
for transmission
• Uses statistical coding
– more frequently used symbols have shorter code words
• Works well for text and fax transmissions
• An application that uses several data structures