3. Merge Sort
SORTING: arranging the elements of an array in ascending order.
MERGE SORT: a sorting technique that arranges array elements in
ascending order by repeatedly splitting the array and merging the
sorted pieces back together;
a combination of sorting and merging.
4. Merge Sort (cont...)
CONCEPT:
1. Divide the list in half
2. Merge sort the first half
3. Merge sort the second half
4. Merge both halves back together
5. Merge Sort (cont...)
Merge sort is based on the divide-and-conquer paradigm.
To sort A[p .. r]:
1. Divide Step
If a given array A has zero or one element, simply return; it is already
sorted. Otherwise, split A[p .. r] into two subarrays A[p .. q] and A[q + 1 .. r],
each containing about half of the elements of A[p .. r]. That is, q is the
halfway point of A[p .. r].
2. Conquer Step
Conquer by sorting the two subarrays A[p .. q] and A[q + 1 .. r].
6. Merge Sort (cont...)
3. Combine Step
Combine the elements back in A[p .. r] by merging the two sorted
subarrays A[p .. q] and A[q + 1 .. r] into a sorted sequence. To accomplish
this step, we will define a procedure MERGE (A, p, q, r).
Note that the recursion bottoms out when the subarray has just one
element, so that it is trivially sorted.
7. Merge Sort (cont...)
Merge sort (divide-and-conquer)
Divide array into two halves.
A L G O R I T H M S
divide: A L G O R | I T H M S
8. Merge Sort (cont...)
Merge sort (divide-and-conquer)
Divide array into two halves.
Recursively sort each half.
divide: A L G O R | I T H M S
sort: A G L O R | H I M S T
9. Merge Sort (cont...)
Merge sort (divide-and-conquer)
Divide array into two halves.
Recursively sort each half.
Merge two halves to make sorted whole.
divide: A L G O R | I T H M S
sort: A G L O R | H I M S T
merge: A G H I L M O R S T
10–19. Merging (auxiliary array)
Merge.
Keep track of the smallest remaining element in each sorted half.
Insert the smaller of the two into the auxiliary array.
Repeat until done.
Sorted halves: A G L O R | H I M S T
The auxiliary array fills one element per step: A, G, H, I, L, M, O, R;
the first half (A G L O R) is then exhausted, so the remaining S, T are
copied over, giving A G H I L M O R S T.
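The merging walk-through above can be sketched as a two-pointer merge into an auxiliary array (a minimal sketch; the function name and the use of Python lists are illustrative):

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list via an auxiliary array."""
    aux = []
    i = j = 0
    # Repeatedly take the smaller of the two current smallest elements.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            aux.append(left[i]); i += 1
        else:
            aux.append(right[j]); j += 1
    # One half is exhausted; copy over whatever remains in the other.
    aux.extend(left[i:])
    aux.extend(right[j:])
    return aux

print("".join(merge(list("AGLOR"), list("HIMST"))))  # AGHILMORST
```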
20. Algorithm MergeSort
Input: an array A[l…r], where l and r are the lower and upper indices of A.
Output: the array A[l…r] with all elements arranged in ascending order.
Steps:
1. if (r <= l) then
2.   return
3. else
4.   mid = ⌊(l + r)/2⌋
5.   MergeSort(A[l…mid])
6.   MergeSort(A[mid+1…r])
7.   Merge(A, l, mid, r)
8. endif
9. Stop
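A runnable version of the algorithm above (a sketch; the Merge procedure is written out as a helper, matching the merging slides):

```python
def merge_sort(a, l=0, r=None):
    """Sort a[l..r] in place, following the MergeSort pseudocode."""
    if r is None:
        r = len(a) - 1
    if r <= l:                      # zero or one element: already sorted
        return
    mid = (l + r) // 2
    merge_sort(a, l, mid)           # sort first half
    merge_sort(a, mid + 1, r)       # sort second half
    merge(a, l, mid, r)             # combine

def merge(a, l, mid, r):
    """Merge sorted runs a[l..mid] and a[mid+1..r] via an auxiliary array."""
    aux, i, j = [], l, mid + 1
    while i <= mid and j <= r:
        if a[i] <= a[j]:
            aux.append(a[i]); i += 1
        else:
            aux.append(a[j]); j += 1
    aux.extend(a[i:mid + 1])        # leftovers from the first half
    aux.extend(a[j:r + 1])          # leftovers from the second half
    a[l:r + 1] = aux                # copy back over the input

data = list("ALGORITHMS")
merge_sort(data)
print("".join(data))  # AGHILMORST
```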
21. Pros:
It is a stable sort, and its worst case is no worse than its average case: Θ(n log n) in every case.
The temporary array holds the merged result of the left and right halves;
once both are merged, the temporary array is copied back over the input array.
It works well for data on tape drives, and it is well suited to parallel processing.
Cons:
The memory requirement is doubled (an auxiliary array of the same size is needed).
Merging without the auxiliary array is slow: if the next element comes from the
right half, all the elements ahead of it must be shifted down.
23. Definition of a B-tree
A B-tree of order m is an m-way tree (i.e., a tree where
each node may have up to m children) in which:
1. the number of keys in each non-leaf node is one less
than the number of its children, and these keys partition the
keys in the children in the fashion of a search tree
2. all leaves are on the same level
3. all non-leaf nodes except the root have at least ⌈m / 2⌉
children
4. a leaf node contains no more than m – 1 keys
The number m is normally chosen to be odd
B-Trees 23
24. An example B-tree
(Figure: a B-tree of order 5 containing 26 items, with keys such as
42, 51, 62 in the upper levels and leaves such as [1 2 4], [7 8],
[13 15 18], [25 27 29], [46 48], [53].)
Note that all the leaves are at the same level
25. Constructing a B-tree
Suppose we start with an empty B-tree and keys
arrive in the following order: 1 12 8 2 25 6 14
28 17 7 52 16 48 68 3 26 29 53 55 45
We want to construct a B-tree of order 5
The first four items go into the root:
1 2 8 12
To put the fifth item in the root would exceed the
limit of m – 1 = 4 keys per node
Therefore, when 25 arrives, pick the middle key (8) to
make a new root
27. Constructing a B-tree (cont...)
Root: [8]; leaves: [1 2] [12 25]
6, 14, 28 get added to the leaf nodes:
Root: [8]; leaves: [1 2 6] [12 14 25 28]
28. Constructing a B-tree (cont...)
Adding 17 to the right leaf node would over-fill it, so we take the
middle key (17), promote it to the root, and split the leaf:
Root: [8 17]; leaves: [1 2 6] [12 14] [25 28]
7, 52, 16, 48 get added to the leaf nodes:
Root: [8 17]; leaves: [1 2 6 7] [12 14 16] [25 28 48 52]
29. Constructing a B-tree (cont...)
Adding 68 causes us to split the rightmost leaf, promoting 48 to the
root; adding 3 causes us to split the leftmost leaf, promoting 3
to the root; 26, 29, 53, 55 then go into the leaves:
Root: [3 8 17 48]; leaves: [1 2] [6 7] [12 14 16] [25 26 28 29] [52 53 55 68]
Adding 45 causes a split of [25 26 28 29],
and promoting 28 to the root then causes the root to split
31. Inserting into a B-Tree
Attempt to insert the new key into a leaf
If this would result in that leaf becoming too big, split the
leaf into two, promoting the middle key to the leaf’s
parent
If this would result in the parent becoming too big, split
the parent into two, promoting the middle key
This strategy might have to be repeated all the way to
the top
If necessary, the root is split in two and the middle key is
promoted to a new root, making the tree one level higher
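The insertion strategy above can be sketched in code. This is a minimal split-on-overflow sketch matching the slides (insert into a leaf, then split any node that exceeds m – 1 keys, promoting the middle key upward); the class and method names are illustrative:

```python
import bisect

class BTreeNode:
    def __init__(self, leaf=True):
        self.keys, self.children, self.leaf = [], [], leaf

class BTree:
    """B-tree of order m: a node splits when it overflows past m - 1 keys."""
    def __init__(self, m=5):
        self.m, self.root = m, BTreeNode()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                              # the root itself split:
            mid_key, right = split             # the tree grows one level higher
            new_root = BTreeNode(leaf=False)
            new_root.keys = [mid_key]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        if node.leaf:
            bisect.insort(node.keys, key)      # the key always enters a leaf
        else:
            i = bisect.bisect(node.keys, key)
            split = self._insert(node.children[i], key)
            if split:                          # a child split: absorb its promoted key
                node.keys.insert(i, split[0])
                node.children.insert(i + 1, split[1])
        if len(node.keys) < self.m:            # no overflow here
            return None
        mid = self.m // 2                      # overflow: split around the middle key
        right = BTreeNode(leaf=node.leaf)
        right.keys, right.children = node.keys[mid + 1:], node.children[mid + 1:]
        mid_key = node.keys[mid]
        node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
        return mid_key, right

t = BTree(5)
for k in [1, 12, 8, 2, 25]:
    t.insert(k)
print(t.root.keys, [c.keys for c in t.root.children])  # [8] [[1, 2], [12, 25]]
```

Inserting the slides' first five keys reproduces the split shown on slide 25: 25 overfills [1 2 8 12 25], so the middle key 8 becomes a new root over [1 2] and [12 25].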
36. Removal from a B-tree
• During removal, we first locate the key. There
are three possible cases:
• 1 - If the key is in a leaf node, and removing it
doesn’t cause that leaf node to have too few keys, then
simply remove the key to be deleted.
• 2 - If the key is not in a leaf, then it is guaranteed (by the
nature of a B-tree) that its predecessor or successor will
be in a leaf; in this case we can delete the key and
promote the predecessor or successor key into the
deleted key’s position in the non-leaf node.
37. Removal from a B-tree (2)
3: if one of them has more than the min.
number of keys then we can promote one of
its keys to the parent and take the parent key
into our lacking leaf
if neither of them has more than the min.
number of keys then the lacking leaf and one
of its neighbours can be combined with their
shared parent (the opposite of promoting a
key) and the new leaf will have the correct
number of keys.
B-Trees
43. Hashing
• Hashing is a means of ordering and accessing elements
in a list quickly, by using a function of the key
value to identify an element's location in the list.
• This function of the key value is called a hash
function.
44. Hashing
Idea:
– Use a function h to compute the slot for each key
– Store the element in slot h(k)
• A hash function h transforms a key into an index
in a hash table T[0…m-1]:
h : U → {0, 1, . . . , m - 1}
• We say that k hashes to slot h(k)
46. Advantages of Hashing
• Reduce the range of array indices handled:
m instead of |U|
where m is the hash table size: T[0, …, m-1]
• Storage is reduced.
• Simplicity
47. Properties of Good Hash Functions
• Good hash function properties
(1) Easy to compute
(2) Approximates a random function
i.e., for every input, every output is equally likely.
(3) Minimizes the chance that similar keys hash to the
same slot
i.e., strings such as pt and pts should hash to different slots.
• We will discuss two methods:
– Division method
– Multiplication method
48. The Division Method
• Idea:
– Map a key k into one of the m slots by taking
the remainder of k divided by m
h(k) = k mod m
• Advantage:
– fast, requires only one operation
• Disadvantage:
– Certain values of m cause many collisions, e.g.,
• powers of 2
• non-prime numbers
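A quick sketch of the division method (the function name is illustrative):

```python
def h_div(k, m):
    """Division-method hash: the remainder of k divided by m."""
    return k % m

# A prime m not too close to a power of 2 spreads keys well.
print(h_div(14, 13))  # 1
print(h_div(27, 13))  # 1  (27 collides with 14: both hash to slot 1)
```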
49. The Multiplication Method
Idea:
(1) Multiply key k by a constant A, where 0 < A < 1
(2) Extract the fractional part of kA
(3) Multiply the fractional part by m
(4) Truncate the result
h(k) = ⌊m (kA mod 1)⌋
• Disadvantage: slower than the division method
• Advantage: the value of m is not critical
fractional part of kA = kA - ⌊kA⌋, e.g., for kA = 12.3 it is 12.3 - 12 = 0.3
50. Example – Multiplication Method
Suppose k = 6, A = 0.3, m = 32
(1) k × A = 6 × 0.3 = 1.8
(2) fractional part: 1.8 - ⌊1.8⌋ = 0.8
(3) m × 0.8 = 32 × 0.8 = 25.6
(4) ⌊25.6⌋ = 25, so h(6) = 25
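The worked example can be checked with a small sketch of the multiplication method (the function name is illustrative; note that floating-point rounding can matter for other values of A):

```python
import math

def h_mul(k, m, A=0.3):
    """Multiplication-method hash: h(k) = floor(m * (k*A mod 1))."""
    frac = (k * A) % 1          # fractional part of k*A
    return math.floor(m * frac)

print(h_mul(6, 32))  # 25, matching the worked example
```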
52. Collisions
• For a given set K of keys:
– If |K| ≤ m, collisions may or may not happen,
depending on the hash function!
– If |K| > m, collisions will definitely happen
• i.e., there must be at least two keys that have the same hash
value
• Avoiding collisions completely might not be easy.
54. Chaining
• Idea:
– Put all elements that hash to the same slot into a
linked list
– Slot j contains a pointer to the head of the list of all
elements that hash to j
55. Chaining
• How to choose the size of the hash table m?
– Small enough to avoid wasting space.
– Large enough to avoid many collisions and keep
linked-lists short.
– Typically 1/5 or 1/10 of the total number of elements.
• Should we use sorted or unsorted linked lists?
– Unsorted
• Insert is fast
• Can easily remove the most recently inserted elements
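A minimal chaining table along those lines (the class name and the use of Python lists for the chains are illustrative):

```python
class ChainedHashTable:
    """Hash table resolving collisions by chaining keys in per-slot lists."""
    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]   # one (unsorted) chain per slot

    def insert(self, key):
        # Unsorted chains: insert at the front, so insertion is O(1)
        # and the most recently inserted key is the easiest to remove.
        self.slots[key % self.m].insert(0, key)

    def search(self, key):
        return key in self.slots[key % self.m]

t = ChainedHashTable(13)
for k in (14, 27, 5):
    t.insert(k)
print(t.slots[1])   # [27, 14]  (both 14 and 27 hash to slot 1)
```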
56. Double Hashing
(1) Use one hash function to determine the first slot.
(2) Use a second hash function to determine the
increment for the probe sequence:
h(k,i) = (h1(k) + i·h2(k)) mod m, i = 0, 1, ...
• Initial probe: h1(k)
• Each subsequent probe is offset by a further h2(k) (mod m), and so on
• Advantage: handles clustering better
• Disadvantage: more time consuming
• How many probe sequences can double hashing
generate? m²
57. Double Hashing: Example
h1(k) = k mod 13
h2(k) = 1+ (k mod 11)
h(k,i) = (h1(k) + i h2(k) ) mod 13
• Insert key 14:
i=0: h(14,0) = h1(14) = 14 mod 13 = 1
i=1: h(14,1) = (h1(14) + h2(14)) mod 13
= (1 + 4) mod 13 = 5
i=2: h(14,2) = (h1(14) + 2 h2(14)) mod 13
= (1 + 8) mod 13 = 9
(Figure: the hash table T[0…12], showing the probe sequence 1, 5, 9 for key 14.)
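The probe computation above can be reproduced with a short sketch (function names are illustrative):

```python
def h1(k):
    return k % 13

def h2(k):
    return 1 + (k % 11)

def probes(k, m=13, n=3):
    """First n slots of the double-hashing probe sequence for key k."""
    return [(h1(k) + i * h2(k)) % m for i in range(n)]

print(probes(14))  # [1, 5, 9], matching the example for key 14
```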