2. What is a frequent pattern?
A pattern (a set of items, a sequence, etc.) that occurs frequently in a database.
Example: Market basket analysis
3. Frequent patterns play an essential role in association rule mining
An association rule is an implication of the form [2]:
X → Y, where X, Y ⊂ I, and X ∩ Y = ∅
A transaction t contains X, a set of items in I, if X ⊆ t.
Each rule has two quality measurements:
“X → Y [support s, confidence c]”.
Support: usefulness of discovered rules
Confidence: certainty of the detected association
Rules that satisfy both min_sup and min_conf are called strong.
$$\text{support} = \frac{(X \cup Y).\text{count}}{n}$$
$$\text{confidence} = \frac{(X \cup Y).\text{count}}{X.\text{count}}$$
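As a minimal sketch of these two measures, the following Python computes support and confidence for a rule X → Y, taking (X ∪ Y).count as the number of transactions that contain every item of X ∪ Y and n as the total number of transactions. The function names `count`, `support`, and `confidence` are illustrative assumptions; the data are the five example transactions of this deck.

```python
# The five example transactions, written as Python sets.
transactions = [
    {"f", "a", "c", "d", "g", "i", "m", "p"},
    {"a", "b", "c", "f", "l", "m", "o"},
    {"b", "f", "h", "j", "o"},
    {"b", "c", "k", "s", "p"},
    {"a", "f", "c", "e", "l", "p", "m", "n"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(x, y, transactions):
    """(X ∪ Y).count / n, where n is the number of transactions."""
    return count(x | y, transactions) / len(transactions)


def confidence(x, y, transactions):
    """(X ∪ Y).count / X.count."""
    return count(x | y, transactions) / count(x, transactions)


# Rule {f} → {b}: 2 of the 5 transactions contain both f and b,
# and 4 transactions contain f.
print(support({"f"}, {"b"}, transactions))
print(confidence({"f"}, {"b"}, transactions))
```

A rule would be reported as strong only if both values reach the chosen min_sup and min_conf thresholds.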
4. min_support = 3
TID   Items                       (Ordered) frequent items
100   {f, a, c, d, g, i, m, p}    {f, c, a, m, p}
200   {a, b, c, f, l, m, o}       {f, c, a, b, m}
300   {b, f, h, j, o}             {f, b}
400   {b, c, k, s, p}             {c, b, p}
500   {a, f, c, e, l, p, m, n}    {f, c, a, m, p}
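The right-hand column can be reproduced with a short Python sketch: count each item, keep those that reach min_support = 3, and list the surviving items of every transaction in descending order of frequency. Several items have equal counts (f and c occur 4 times; a, b, m, p occur 3 times), so the order among them is a free choice. The `TIE_BREAK` string below is an assumption that simply copies the order used in the table.

```python
from collections import Counter

MIN_SUPPORT = 3
# Assumption: order among equally frequent items, copied from the table.
TIE_BREAK = "fcabmp"

transactions = {
    100: ["f", "a", "c", "d", "g", "i", "m", "p"],
    200: ["a", "b", "c", "f", "l", "m", "o"],
    300: ["b", "f", "h", "j", "o"],
    400: ["b", "c", "k", "s", "p"],
    500: ["a", "f", "c", "e", "l", "p", "m", "n"],
}

# Count in how many transactions each item occurs.
counts = Counter(item for items in transactions.values() for item in set(items))

# Keep the frequent items, most frequent first (ties follow TIE_BREAK).
frequent = [item for item in counts if counts[item] >= MIN_SUPPORT]
flist = sorted(frequent, key=lambda item: (-counts[item], TIE_BREAK.index(item)))

# Rewrite every transaction as its frequent items in f-list order.
ordered = {
    tid: [item for item in flist if item in items]
    for tid, items in transactions.items()
}

for tid, items in ordered.items():
    print(tid, items)
```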
6. Most algorithms of the Apriori family achieve good performance by reducing the size of the candidate sets. However, when the number of frequent patterns is huge, they need multiple passes over the entire database and must handle a vast number of candidate sets, which is costly.
The FP-tree is a compressed form of the original database: only frequent items are used to construct the tree, mining is performed only over this tree, and all irrelevant items are pruned. Construction requires only two database scans, which decreases the computational cost and reduces the amount of data handled in subsequent steps.
The problem is that the FP-tree is itself a huge hierarchical data structure that may not fit into main memory. It is also not suitable for incremental mining or for use in interactive mining systems.
The execution time of FP-Growth is also high when a large number of transactions has to be processed.
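The two-scan construction described above can be sketched in Python as follows. This is a minimal illustration, not the implementation behind these slides: the class `FPNode`, the function `build_fp_tree`, and the alphabetical tie-break among equally frequent items are all assumptions, and the header table with node links used for mining is omitted.

```python
from collections import Counter


class FPNode:
    """One node of the tree: an item, its count, and its child nodes."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # Scan 1: count in how many transactions each item occurs.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in counts.items() if c >= min_support}

    # Scan 2: insert the frequent items of each transaction,
    # most frequent first (ties broken alphabetically, an assumption).
    root = FPNode(None)
    for t in transactions:
        ordered = sorted(
            (item for item in set(t) if item in frequent),
            key=lambda item: (-counts[item], item),
        )
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1  # shared prefixes only increment a counter
    return root


transactions = [
    ["f", "a", "c", "d", "g", "i", "m", "p"],
    ["a", "b", "c", "f", "l", "m", "o"],
    ["b", "f", "h", "j", "o"],
    ["b", "c", "k", "s", "p"],
    ["a", "f", "c", "e", "l", "p", "m", "n"],
]
root = build_fp_tree(transactions, min_support=3)
for item, child in root.children.items():
    print(item, child.count)
```

Transactions that share a prefix of frequent items share a path in the tree, which is where the compression comes from.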
7. The parallel and partition projection schemes for the FP-tree have the following objectives compared with other procedures:
They construct a highly condensed structure, which is usually significantly smaller than the original database, and thus save costly database scans in the subsequent mining process.
By using projection during tree construction, they avoid costly repeated scans for frequent items, which greatly shortens tree construction time. The approach is also much more scalable than the plain FP-tree method.
They apply a partitioning-based divide-and-conquer technique, which decomposes the mining task and reduces the search space of the projected frequent pattern trees.
8. Projection Methods
There are two methods for database projection:
o Parallel projection
o Partition projection
9. Scan the database to be projected once, where the database can be either a transaction database or an α-projected database. Since each transaction is projected to several projected databases during the same scan, and all the projected databases are stored in the same memory space where they can be retrieved easily, it is called parallel projection.
Parallel projection facilitates parallel processing because all the projected databases are available for mining at the end of the scan and can be mined in parallel. Its drawback is that it takes more memory.
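A minimal Python sketch of parallel projection is given below. It assumes a common formulation in which every transaction is already reduced to its frequent items in f-list order, and in which the projected database of an item receives the prefix that precedes that item. The function name `parallel_projection` is hypothetical.

```python
from collections import defaultdict


def parallel_projection(ordered_transactions):
    """Project every transaction to the database of each of its items
    (except the first) in a single scan."""
    projected = defaultdict(list)
    for items in ordered_transactions:
        for pos in range(1, len(items)):
            # The prefix before items[pos] goes to the items[pos]-projected DB.
            projected[items[pos]].append(items[:pos])
    return dict(projected)


# Ordered frequent items of the five example transactions.
ordered = [
    ["f", "c", "a", "m", "p"],
    ["f", "c", "a", "b", "m"],
    ["f", "b"],
    ["c", "b", "p"],
    ["f", "c", "a", "m", "p"],
]

projected = parallel_projection(ordered)
for item, db in projected.items():
    print(item, db)
```

After the single scan every projected database is complete, so the databases can be mined independently. The price is that a transaction with k items is copied up to k - 1 times, which is why this scheme needs more memory.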
10. Architectural View of FP-Growth Tree with Parallel Projected Database
12. Scan the database (original or α-projected) to be projected. Since each transaction is projected to only one projected database during the scan, the scan logically partitions the entire database into a set of projected segments, and each segment is processed separately with its own local memory. It is therefore called partition projection.
The advantages of partition projection are:
The total size of the projected databases at each level is smaller than the original database.
It usually takes less memory and fewer I/Os to complete the projection.
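A minimal Python sketch of partition projection is given below. It assumes a common formulation in which every transaction is already reduced to its frequent items in f-list order, is first projected only to the database of its last item, and is passed on to the next database once the current one has been mined. The function names `partition_projection` and `forward` are hypothetical.

```python
from collections import defaultdict


def partition_projection(ordered_transactions):
    """Project every transaction to exactly one database:
    that of its last (least frequent) item."""
    projected = defaultdict(list)
    for items in ordered_transactions:
        if len(items) > 1:
            projected[items[-1]].append(items[:-1])
    return dict(projected)


def forward(projected, item):
    """Once the `item`-projected database has been mined, pass each of
    its prefixes on to the database of that prefix's own last item."""
    for prefix in projected.pop(item, []):
        if len(prefix) > 1:
            projected.setdefault(prefix[-1], []).append(prefix[:-1])


# Ordered frequent items of the five example transactions.
ordered = [
    ["f", "c", "a", "m", "p"],
    ["f", "c", "a", "b", "m"],
    ["f", "b"],
    ["c", "b", "p"],
    ["f", "c", "a", "m", "p"],
]

db = partition_projection(ordered)
print(db)        # each transaction appears in exactly one projected database
forward(db, "p")
print(db)        # the p-database is gone; its prefixes moved to m and b
```

Because each transaction is stored once per level, the projected databases together never exceed the size of the database they were built from.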
13. Architectural View of FP-Growth Tree with Partition Projection Database
15. The approach applies a partitioning-based divide-and-conquer method, which dramatically reduces the size of the subsequent conditional pattern bases and conditional PFP-trees.
It constructs a highly compact PFP-tree, which is usually substantially smaller than the original database, and thus saves costly database scans in the subsequent mining process.
By using the projection technique during tree construction, it avoids expensive scans for frequent items, and its performance is much more scalable than that of the FP-tree method.
16. The application does not have its own storage management; it depends on the SQL Server database package.
The application has no window-based GUI.
The application works only with VB.NET 7.0 or a higher version.
The application is based on Boolean association rules.
The application works with at most 30 items.
17. [1] Jiawei Han, “Technologies for Mining Frequent Patterns in Large Databases”, Simon Fraser University, Canada.
[2] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules”, in Proc. VLDB’94, Chile, September 1994.
[3] Akshita Bhandari, Ashutosh Gupta, Debasis Das, “Improvised apriori algorithm using frequent pattern tree for real time applications in data mining”, Elsevier, 2014.
[4] O. Jamsheela, Raju G., “An Adaptive Method for Mining Frequent Itemsets Efficiently: An Improved Header Tree Method”, IEEE, 2015.
[5] Wei-Tee Lin and Chih-Ping Chu, “Using Appropriate Number of Computing Nodes for Parallel Mining of Frequent Patterns”, IEEE, 2014.
[6] Dang Nguyen, Bay Vo, Bac Le, “Efficient strategies for parallel mining class association rules”, Elsevier, 2014.
[7] Sheetal Rathi, Dr. Chandrashekhar A. Dhote, “Using Parallel Approach in Pre-processing to Improve Frequent Pattern Growth Algorithm”, IEEE, 2014.