ODAM: An Optimized Distributed Association Rule Mining Algorithm

(Synopsis)
INTRODUCTION
Data mining, the extraction of hidden predictive information
from large databases, is a powerful new technology with great
potential to help companies focus on the most important information in
their data warehouses. Data mining tools predict future trends and
behaviors, allowing businesses to make proactive, knowledge-driven
decisions. The automated, prospective analyses offered by data mining
move beyond the analyses of past events provided by retrospective
tools typical of decision support systems. Data mining tools can
answer business questions that traditionally were too time consuming
to resolve. They scour databases for hidden patterns, finding
predictive information that experts may miss because it lies outside
their expectations.
Most companies already collect and refine massive quantities of
data. Data mining techniques can be implemented rapidly on existing
software and hardware platforms to enhance the value of existing
information resources, and can be integrated with new products and
systems as they are brought on-line. When implemented on high
performance client/server or parallel processing computers, data
mining tools can analyze massive databases to deliver answers to
questions such as, "Which clients are most likely to respond to my
next promotional mailing, and why?"
Data mining (DM), also called Knowledge-Discovery in
Databases (KDD) or Knowledge-Discovery and Data Mining, is the
process of automatically searching large volumes of data for patterns
using tools such as classification, association rule mining, and
clustering. Data mining is a complex topic that has links with multiple
core fields such as computer science, and it adds value to rich seminal
computational techniques from statistics, information retrieval,
machine learning and pattern recognition.
Data mining techniques are the result of a long process of research
and product development. This evolution began when business data
was first stored on computers, continued with improvements in data
access, and more recently, generated technologies that allow users to
navigate through their data in real time. Data mining takes this
evolutionary process beyond retrospective data access and navigation
to prospective and proactive information delivery. Data mining is ready
for application in the business community because it is supported by
three technologies that are now sufficiently mature:
o Massive data collection
o Powerful multiprocessor computers
o Data mining algorithms
Commercial databases are growing at unprecedented rates. A recent
META Group survey of data warehouse projects found that 19% of
respondents are beyond the 50 gigabyte level, while 59% expect to be
there by the second quarter of 1996. In some industries, such as retail,
these numbers can be much larger. The accompanying need for
improved computational engines can now be met in a cost-effective
manner with parallel multiprocessor computer technology. Data mining
algorithms embody techniques that have existed for at least 10 years,
but have only recently been implemented as mature, reliable,
understandable tools that consistently outperform older statistical
methods.

With the explosive growth of information sources available on
the World Wide Web, it has become increasingly necessary for users to
utilize automated tools to find the desired information resources, and
to track and analyze their usage patterns. These factors give rise to
the necessity of creating server-side and client-side intelligent systems
that can effectively mine for knowledge. Web mining can be broadly
defined as the discovery and analysis of useful information from the
World Wide Web. This describes the automatic search of information
resources available online, i.e. Web content mining, and the
discovery of user access patterns from Web servers, i.e., Web usage
mining.
Web mining is the extraction of interesting and potentially
useful patterns and implicit information from artifacts or activity
related to the World Wide Web. There are roughly three knowledge
discovery domains that pertain to web mining: Web Content Mining,
Web Structure Mining, and Web Usage Mining. Web content mining is
the process of extracting knowledge from the content of documents or
their descriptions. Web document text mining and resource discovery
based on concept indexing or agent-based technology may also fall in
this category. Web structure mining is the process of inferring
knowledge from the World Wide Web organization and links between
references and referents in the Web. Finally, web usage mining, also
known as Web Log Mining, is the process of extracting interesting
patterns in web access logs.
Web Content Mining
Web content mining is an automatic process that goes beyond
keyword extraction. Since the content of a text document
carries no machine-readable semantics, some approaches
restructure the document content into a
representation that machines can exploit. The usual
approach to exploiting known structure in documents is to use
wrappers to map documents to some data model. Techniques
using lexicons for content interpretation are yet to come.
There are two groups of web content mining strategies: those
that directly mine the content of documents and those that
improve on the content search of other tools like search engines.
Web Structure Mining
The World Wide Web can reveal more information than just the
information contained in documents. For example, links pointing
to a document indicate the popularity of the document, while
links coming out of a document indicate the richness or perhaps
the variety of topics covered in the document. This can be
compared to bibliographical citations. When a paper is cited
often, it ought to be important. The PageRank and CLEVER
methods take advantage of this information conveyed by the
links to find pertinent web pages. By means of counters, higher
levels cumulate the number of artifacts subsumed by the
concepts they hold. Counters of hyperlinks, in and out
documents, retrace the structure of the web artifacts
summarized.
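The idea of hyperlink counters can be illustrated with a minimal sketch. It is not the PageRank or CLEVER method; it only counts the links pointing into and out of each document. The page names and the link list are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical link graph: each pair is (source page, target page).
links = [
    ("a.html", "b.html"),
    ("a.html", "c.html"),
    ("b.html", "c.html"),
    ("d.html", "c.html"),
]

def link_counters(edges):
    """Return (in_degree, out_degree) counters for (source, target) links."""
    # Links pointing to a document: a rough popularity signal.
    in_degree = Counter(target for _, target in edges)
    # Links leaving a document: a rough signal of topical variety.
    out_degree = Counter(source for source, _ in edges)
    return in_degree, out_degree

in_degree, out_degree = link_counters(links)
most_cited = in_degree.most_common(1)[0]
```

Here `c.html` receives the most incoming links, which is the citation-style signal described above; a real ranking method would also weigh the importance of the pages that do the linking.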
Web Usage Mining
Web servers record and accumulate data about user interactions
whenever requests for resources are received. Analyzing the web
access logs of different web sites can help understand the user
behaviour and the web structure, thereby improving the design of
this colossal collection of resources.
There are two main tendencies in Web Usage Mining driven by the
applications of the discoveries: General Access Pattern Tracking and
Customized Usage Tracking.
The general access pattern tracking analyzes the web logs to
understand access patterns and trends. These analyses can shed light
on better structure and grouping of resource providers. Many web
analysis tools exist, but they are limited and usually unsatisfactory.
We have designed a web log data mining tool, WebLogMiner, and
proposed techniques for using data mining and OnLine Analytical
Processing (OLAP) on treated and transformed web access files.
Applying data mining techniques on access logs unveils interesting
access patterns that can be used to restructure sites in a more
efficient grouping, pinpoint effective advertising locations, and target
specific users for specific selling ads.
Customized usage tracking analyzes individual trends. Its purpose is to
customize web sites to users. The information displayed, the depth of
the site structure and the format of the resources can all be
dynamically customized for each user over time based on their access
patterns.
While it is encouraging and exciting to see the various potential
applications of web log file analysis, it is important to know that the
success of such applications depends on what and how much valid and
reliable knowledge one can discover from the large raw log data.
Current web servers store limited information about the accesses.
Some scripts custom-tailored for some sites may store additional
information. However, for effective web usage mining, an important
cleaning and data transformation step may be needed before analysis.
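A cleaning step of this kind might look like the sketch below. It assumes the server writes entries in the Common Log Format, which this synopsis does not specify, and the sample lines, the ignored file suffixes and the function name are all hypothetical.

```python
import re

# Assumed layout (Common Log Format):
# host ident user [time] "method path protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical choice: embedded resources say little about navigation.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")

def clean_log(lines):
    """Keep successful page requests as (host, time, path) tuples."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # drop malformed entries
        record = match.groupdict()
        if not record["status"].startswith("2"):
            continue  # drop failed or redirected requests
        if record["path"].lower().endswith(IGNORED_SUFFIXES):
            continue  # drop images, stylesheets and scripts
        records.append((record["host"], record["time"], record["path"]))
    return records

sample_lines = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "GET /logo.gif HTTP/1.0" 200 512',
    '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /missing.html HTTP/1.0" 404 209',
    'not a log line',
]
cleaned = clean_log(sample_lines)
```

Only the first sample line survives the filter. Which entries count as noise depends on the analysis, so the filters here are a starting point rather than a fixed rule.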
Abstract
With the explosive growth of information sources available on
the World Wide Web, it has become increasingly necessary for users to
utilize automated tools to find the desired information resources, and
to track and analyze their usage patterns.
Association rule mining is an active data mining research area.
However, most ARM algorithms cater to a centralized environment. In
contrast to previous ARM algorithms, ODAM is a distributed algorithm
for geographically distributed data sets that reduces communication
costs. Recently, as the need to mine patterns across distributed
databases has grown, Distributed Association Rule Mining (D-ARM)
algorithms have been developed. These algorithms, however, assume
that the databases are either horizontally or vertically distributed. In
the special case of databases populated from information extracted
from textual data, existing D-ARM algorithms cannot discover rules
based on higher-order associations between items in distributed
textual documents that are neither vertically nor horizontally
distributed, but rather a hybrid of the two.
Modern organizations are geographically distributed. Typically,
each site locally stores its ever increasing amount of day-to-day data.
Using centralized data mining to discover useful patterns in such
organizations' data isn't always feasible because merging data sets
from different sites into a centralized site incurs huge network
communication costs. Data from these organizations are not only
distributed over various locations but also vertically fragmented,
making it difficult if not impossible to combine them in a central
location. Distributed data mining has thus emerged as an active
subarea of data mining research.
A significant area of data mining research is association rule
mining. Unfortunately, most ARM algorithms focus on a sequential or
centralized environment where no external communication is required.
Distributed ARM algorithms, on the other hand, aim to generate rules
from different data sets spread over various geographical sites; hence,
they require external communications throughout the entire process.
DARM algorithms must reduce communication costs so that generating
global association rules costs less than combining the participating
sites' data sets into a centralized site. However, most DARM
algorithms don't have an efficient message optimization technique, so
they exchange numerous messages during the mining process. We
have developed a distributed algorithm, called Optimized Distributed
Association Mining, for geographically distributed data sets. ODAM
generates support counts of candidate itemsets quicker than other
DARM algorithms and reduces the size of average transactions, data
sets, and message exchanges.
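The synopsis does not say how ODAM shrinks transactions and data sets. One plausible reading, sketched below as an assumption, is to drop items that are not frequent from every transaction and then merge transactions that have become identical, keeping a counter for each. The function names and the sample data are hypothetical.

```python
from collections import Counter

def reduce_transactions(transactions, frequent_items):
    """Drop infrequent items, then merge transactions that became identical."""
    reduced = Counter()
    for transaction in transactions:
        kept = frozenset(item for item in transaction if item in frequent_items)
        if kept:
            reduced[kept] += 1  # identical reduced transactions share one entry
    return reduced

def support_count(reduced, itemset):
    """Count the original transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(count for kept, count in reduced.items() if itemset <= kept)

# Hypothetical data: four transactions; only bread and milk are frequent.
transactions = [
    {"bread", "milk", "pen"},
    {"bread", "milk"},
    {"bread", "milk", "gum"},
    {"milk", "tea"},
]
reduced = reduce_transactions(transactions, {"bread", "milk"})
pair_support = support_count(reduced, {"bread", "milk"})
```

In this sketch four transactions collapse into two distinct entries, so later passes scan fewer and shorter records while the support counts stay unchanged.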
Description of Problem
Since the advent of computers, enormous amounts of data have become
available, and data mining is the process of deriving knowledge from
such raw collections of data. Likewise, a great many Web documents
reside online. The Web is a repository of a variety of
information on technology, science, history, geography, sports,
politics and other subjects. Anyone who wants to know about a particular
topic uses a search engine to look for it, and the search engine aims to
satisfy that user by returning the related information about the topic.
We can categorize parallel ARM algorithms as data-parallelism
or task-parallelism algorithms. In the former, the
algorithms partition the data sets among different nodes; in the latter,
each site performs the task independently but must access the entire
data set. The Count Distribution (CD) algorithm is a simple
data-parallelism algorithm. It uses the sequential Apriori algorithm in a
parallel environment and assumes data sets are horizontally
partitioned among different sites.
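The data-parallelism idea behind CD can be sketched for a single pass as follows. This is a simplified illustration, not the published algorithm: the sites are simulated as in-memory lists, the message exchange is a plain sum, and the data, names and support threshold are hypothetical.

```python
from collections import Counter

def local_counts(partition, candidates):
    """Count, at one site, how many local transactions contain each candidate."""
    counts = Counter()
    for transaction in partition:
        for candidate in candidates:
            if candidate <= transaction:
                counts[candidate] += 1
    return counts

def globally_frequent(partitions, candidates, min_support):
    """Sum every site's local counts and keep candidates meeting min_support."""
    total = Counter()
    for partition in partitions:
        # In a real deployment this sum would be a message exchange.
        total.update(local_counts(partition, candidates))
    return {c: n for c, n in total.items() if n >= min_support}

# Hypothetical horizontal partitions: each site holds whole transactions.
site_a = [frozenset({"bread", "milk"}), frozenset({"bread", "tea"})]
site_b = [frozenset({"bread", "milk"}), frozenset({"milk", "tea"})]
candidates = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "tea"}),
    frozenset({"milk", "tea"}),
]
frequent = globally_frequent([site_a, site_b], candidates, min_support=2)
```

Each site scans only its own partition, and only the counts travel between sites, which is why the volume of exchanged counts dominates the cost in this style of algorithm.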
DARM discovers rules from various geographically distributed data
sets. However, the network connection between those data sets isn't
as fast as in a parallel environment, so distributed mining usually aims
to minimize communication costs.
Existing Method
Data mining algorithms can be categorized as follows:
o Association Algorithm
o Classification
o Clustering Algorithm

Classification:
The process of dividing a dataset into mutually exclusive groups
such that the members of each group are as "close" as possible to one
another, and different groups are as "far" as possible from one
another, where distance is measured with respect to specific
variable(s) you are trying to predict. For example, a typical
classification problem is to divide a database of companies into groups
that are as homogeneous as possible with respect to a
creditworthiness variable with values "Good" and "Bad."
Clustering:
The process of dividing a dataset into mutually exclusive groups
such that the members of each group are as "close" as possible to one
another, and different groups are as "far" as possible from one
another, where distance is measured with respect to all available
variables.
Given databases of sufficient size and quality, data mining technology
can generate new business opportunities by providing these
capabilities:

• Automated prediction of trends and behaviors. Data mining
automates the process of finding predictive information in large
databases. Questions that traditionally required extensive
hands-on analysis can now be answered directly and quickly from
the data. A typical example of a predictive problem is targeted
marketing. Data mining uses data on past promotional mailings
to identify the targets most likely to maximize return on
investment in future mailings. Other predictive problems include
forecasting bankruptcy and other forms of default, and
identifying segments of a population likely to respond similarly to
given events.

• Automated discovery of previously unknown patterns.
Data mining tools sweep through databases and identify
previously hidden patterns in one step. An example of pattern
discovery is the analysis of retail sales data to identify seemingly
unrelated products that are often purchased together. Other
pattern discovery problems include detecting fraudulent credit
card transactions and identifying anomalous data that could
represent data entry keying errors.

Proposed System
Unlike other algorithms, ODAM offers better performance by
minimizing candidate itemset generation costs. It achieves this by
focusing on two major DARM issues: communication and
synchronization. Reducing communication costs is one of the most
important DARM objectives. DARM algorithms will perform better if we
can reduce communication costs (for example, message exchange size).
Synchronization forces each participating site to wait a certain period
until globally frequent itemset generation completes. Each site will
wait longer if computing support counts takes more time. Hence, we
reduce the computation time of candidate itemsets' support counts.
To reduce communication costs, we highlight several message
optimization techniques. Depending on the ARM algorithm and on the
message exchange method, we can divide the message optimization
techniques into two methods: direct and indirect support counts
exchange. Each
method has different aims, expectations, advantages, and
disadvantages. For example, the first method exchanges each
candidate itemset's support count to generate globally frequent
itemsets of that pass (CD and FDM are examples of this approach). All
sites share a common globally frequent itemset with identical support
counts, so rules that are generated at different participating sites have
identical confidence. This approach focuses on a rule's exactness and
correctness.
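Why shared support counts lead to identical confidence can be seen from the standard definition, confidence(X -> Y) = support(X and Y together) / support(X). The sketch below applies it at two simulated sites; the counts and names are hypothetical.

```python
def confidence(global_counts, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from support counts."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    return global_counts[both] / global_counts[antecedent]

# Hypothetical global support counts agreed on after the exchange.
global_counts = {
    frozenset({"bread"}): 4,
    frozenset({"bread", "milk"}): 3,
}

# Every site holds the same global counts after a direct exchange.
site_a_view = dict(global_counts)
site_b_view = dict(global_counts)

conf_a = confidence(site_a_view, {"bread"}, {"milk"})
conf_b = confidence(site_b_view, {"bread"}, {"milk"})
```

Both sites compute 3 / 4 = 0.75 for the rule bread -> milk, because the rule is derived from the same global counts rather than from each site's local data.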
System Requirement
Hardware specifications:
Processor : Intel Processor IV
RAM : 128 MB
Hard disk : 20 GB
CD drive : 40x Samsung
Floppy drive : 1.44 MB
Monitor : 15" Samtron color
Keyboard : 108 mercury keyboard
Mouse : Logitech mouse

Software specifications:
Operating System : Windows XP/2000
Language used : J2sdk1.4.0, JCreator

 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 

Dernier (20)

Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 

ODAM: An Optimized Distributed Association Rule Mining Algorithm (Synopsis)

  • 1. ODAM An Optimized Distributed Association Rule Mining Algorithm (Synopsis)
  • 2. INTRODUCTION Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. Most companies already collect and refine massive quantities of data. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought on-line. When implemented on high performance client/server or parallel processing computers, data mining tools can analyze massive databases to deliver answers to questions such as, "Which clients are most likely to respond to my next promotional mailing, and why?"
  • 3. Data mining (DM), also called Knowledge-Discovery in Databases (KDD) or Knowledge-Discovery and Data Mining, is the process of automatically searching large volumes of data for patterns using tools such as classification, association rule mining, and clustering. Data mining is a complex topic that has links with multiple core fields such as computer science, and it adds value to rich seminal computational techniques from statistics, information retrieval, machine learning, and pattern recognition. Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature: massive data collection, powerful multiprocessor computers, and data mining algorithms. Commercial databases are growing at unprecedented rates. A recent META Group survey of data warehouse projects found that 19% of
  • 4. respondents are beyond the 50 gigabyte level, while 59% expect to be there by the second quarter of 1996. In some industries, such as retail, these numbers can be much larger. The accompanying need for improved computational engines can now be met in a cost-effective manner with parallel multiprocessor computer technology. Data mining algorithms embody techniques that have existed for at least 10 years, but have only recently been implemented as mature, reliable, understandable tools that consistently outperform older statistical methods. With the explosive growth of information sources available on the World Wide Web, it has become increasingly necessary for users to utilize automated tools to find the desired information resources, and to track and analyze their usage patterns. These factors give rise to the necessity of creating server-side and client-side intelligent systems that can effectively mine for knowledge. Web mining can be broadly defined as the discovery and analysis of useful information from the World Wide Web. This describes the automatic search of information resources available online, i.e., Web content mining, and the discovery of user access patterns from Web servers, i.e., Web usage mining.
  • 5. Web Mining is the extraction of interesting and potentially useful patterns and implicit information from artifacts or activity related to the World Wide Web. There are roughly three knowledge discovery domains that pertain to web mining: Web Content Mining, Web Structure Mining, and Web Usage Mining. Web content mining is the process of extracting knowledge from the content of documents or their descriptions. Web document text mining and resource discovery based on concept indexing or agent-based technology may also fall in this category. Web structure mining is the process of inferring knowledge from the organization of the World Wide Web and the links between references and referents in the Web. Finally, web usage mining, also known as Web Log Mining, is the process of extracting interesting patterns from web access logs. Web Content Mining Web content mining is an automatic process that goes beyond keyword extraction. Since the content of a text document presents no machine-readable semantics, some approaches have suggested restructuring the document content into a representation that could be exploited by machines. The usual approach to exploiting known structure in documents is to use wrappers to map documents to some data model. Techniques using lexicons for content interpretation are yet to come.
  • 6. There are two groups of web content mining strategies: those that directly mine the content of documents and those that improve on the content search of other tools like search engines. Web Structure Mining The World Wide Web can reveal more information than just the information contained in documents. For example, links pointing to a document indicate the popularity of the document, while links coming out of a document indicate the richness or perhaps the variety of topics covered in the document. This can be compared to bibliographical citations: when a paper is cited often, it ought to be important. The PageRank and CLEVER methods take advantage of this information conveyed by the links to find pertinent web pages. By means of counters, higher levels accumulate the number of artifacts subsumed by the concepts they hold. Counters of hyperlinks, into and out of documents, retrace the structure of the web artifacts summarized. Web Usage Mining Web servers record and accumulate data about user interactions whenever requests for resources are received. Analyzing the web access logs of different web sites can help understand the user behaviour and the web structure,
  • 7. thereby improving the design of this colossal collection of resources. There are two main tendencies in Web Usage Mining driven by the applications of the discoveries: General Access Pattern Tracking and Customized Usage Tracking. General access pattern tracking analyzes the web logs to understand access patterns and trends. These analyses can shed light on better structure and grouping of resource providers. Many web analysis tools exist, but they are limited and usually unsatisfactory. We have designed a web log data mining tool, WebLogMiner, and proposed techniques for using data mining and OnLine Analytical Processing (OLAP) on treated and transformed web access files. Applying data mining techniques on access logs unveils interesting access patterns that can be used to restructure sites in a more efficient grouping, pinpoint effective advertising locations, and target specific users for specific selling ads. Customized usage tracking analyzes individual trends. Its purpose is to customize web sites to users. The information displayed, the depth of the site structure and the format of the resources can all be dynamically customized for each user over time based on their access patterns. While it is encouraging and exciting to see the various potential applications of web log file analysis, it is important to know that the
  • 8. success of such applications depends on what and how much valid and reliable knowledge one can discover from the large raw log data. Current web servers store limited information about the accesses. Some scripts custom-tailored for some sites may store additional information. However, for effective web usage mining, a substantial cleaning and data transformation step may be needed before analysis.
  • 9. Abstract With the explosive growth of information sources available on the World Wide Web, it has become increasingly necessary for users to utilize automated tools to find the desired information resources, and to track and analyze their usage patterns. Association rule mining (ARM) is an active data mining research area. However, most ARM algorithms cater to a centralized environment. In contrast to previous ARM algorithms, ODAM is a distributed algorithm for geographically distributed data sets that reduces communication costs. Recently, as the need to mine patterns across distributed databases has grown, Distributed Association Rule Mining (D-ARM) algorithms have been developed. These algorithms, however, assume that the databases are either horizontally or vertically distributed. In the special case of databases populated from information extracted from textual data, existing D-ARM algorithms cannot discover rules based on higher-order associations between items in distributed textual documents that are neither vertically nor horizontally distributed, but rather a hybrid of the two. Modern organizations are geographically distributed. Typically, each site locally stores its ever increasing amount of day-to-day data. Using centralized data mining to discover useful patterns in such organizations' data isn't always feasible because merging data sets
  • 10. from different sites into a centralized site incurs huge network communication costs. Data from these organizations are not only distributed over various locations but also vertically fragmented, making it difficult if not impossible to combine them in a central location. Distributed data mining has thus emerged as an active subarea of data mining research. A significant area of data mining research is association rule mining. Unfortunately, most ARM algorithms focus on a sequential or centralized environment where no external communication is required. Distributed ARM algorithms, on the other hand, aim to generate rules from different data sets spread over various geographical sites; hence, they require external communications throughout the entire process. DARM algorithms must reduce communication costs so that generating global association rules costs less than combining the participating sites' data sets into a centralized site. However, most DARM algorithms don't have an efficient message optimization technique, so they exchange numerous messages during the mining process. We have developed a distributed algorithm, called Optimized Distributed Association Mining, for geographically distributed data sets. ODAM generates support counts of candidate itemsets quicker than other DARM algorithms and reduces the size of average transactions, data sets, and message exchanges.
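The synopsis states that ODAM reduces the size of average transactions and data sets but does not say how. The sketch below shows one plausible way to get that effect, under the assumption that items found to be infrequent after a first counting pass can be dropped and that transactions which become identical can be stored once with a multiplicity. The function name `reduce_transactions` and the sample items are hypothetical and chosen only for illustration.

```python
from collections import Counter


def reduce_transactions(transactions, frequent_items):
    """Drop non-frequent items, then merge identical transactions.

    Assumption for illustration: `frequent_items` is the set of items
    already known to be frequent after a first counting pass.
    """
    reduced = Counter()
    for transaction in transactions:
        kept = frozenset(item for item in transaction if item in frequent_items)
        if kept:  # a transaction left with no frequent items is discarded
            reduced[kept] += 1
    return reduced


transactions = [
    ["bread", "milk", "caviar"],
    ["bread", "milk"],
    ["milk", "saffron"],
]
compact = reduce_transactions(transactions, frequent_items={"bread", "milk"})
for itemset, multiplicity in compact.items():
    print(sorted(itemset), multiplicity)
```

Here the three original transactions collapse to two distinct entries, so later passes scan fewer and shorter records while the support counts of frequent itemsets stay unchanged.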
  • 11. Description of Problem Since the advent of computers, data has become available in enormous quantities, and data mining is the process of discovering knowledge from such raw collections of data. Likewise, a large number of Web documents reside online. The Web is a repository of a wide variety of information on technology, science, history, geography, sports, politics, and other subjects. When users want to learn about a particular topic, they use a search engine to find what they need, and the engine aims to return the information related to that topic. We can categorize parallel ARM algorithms as data-parallelism or task-parallelism algorithms. In the former, the algorithms partition the data sets among different nodes; in the latter, each site performs the task independently but must access the entire data set. The Count Distribution (CD) algorithm is a simple data-parallelism algorithm. It uses the sequential Apriori algorithm in a parallel environment and assumes data sets are horizontally partitioned among different sites. DARM discovers rules from various geographically distributed data sets. However, the network connection between those data sets isn't as fast as in a parallel environment, so distributed mining usually aims to minimize communication costs.
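The counting step of a Count Distribution style pass over horizontally partitioned data can be sketched as follows. This is a simplified illustration, not the algorithm as published: it counts every k-item combination found in the transactions instead of only Apriori-generated candidates, and it simulates the exchange of counts between sites with an in-process sum. The names `local_counts` and `count_distribution_pass` and the sample data are assumptions.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, k):
    """Count every k-itemset occurring in one site's transactions."""
    counts = Counter()
    for transaction in transactions:
        for itemset in combinations(sorted(set(transaction)), k):
            counts[itemset] += 1
    return counts


def count_distribution_pass(sites, k, min_support):
    """Sum the per-site counts and keep the globally frequent k-itemsets."""
    global_counts = Counter()
    for transactions in sites:
        # In a real deployment this sum would be a message exchange between sites.
        global_counts.update(local_counts(transactions, k))
    return {s: c for s, c in global_counts.items() if c >= min_support}


# Two sites, each holding its own horizontal partition of the transactions.
site_a = [["bread", "milk"], ["bread", "butter", "milk"]]
site_b = [["bread", "milk"], ["butter"]]
frequent_pairs = count_distribution_pass([site_a, site_b], k=2, min_support=3)
print(frequent_pairs)
```

Only the counts travel between sites, never the transactions themselves, which is why the number and size of exchanged messages dominate the cost in a distributed setting.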
  • 12. Existing Method Data mining algorithms can be categorized as follows: association algorithms, classification algorithms, and clustering algorithms. Classification: The process of dividing a dataset into mutually exclusive groups such that the members of each group are as "close" as possible to one another, and different groups are as "far" as possible from one another, where distance is measured with respect to the specific variable(s) you are trying to predict. For example, a typical classification problem is to divide a database of companies into groups that are as homogeneous as possible with respect to a creditworthiness variable with values "Good" and "Bad." Clustering: The process of dividing a dataset into mutually exclusive groups such that the members of each group are as "close" as possible to one another, and different groups are as "far" as possible from one another, where distance is measured with respect to all available variables.
  • 13. Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities: • Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered directly and quickly from the data. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events. • Automated discovery of previously unknown patterns. Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit
  • 14. card transactions and identifying anomalous data that could represent data entry keying errors. DARM discovers rules from various geographically distributed data sets. However, the network connection between those data sets isn't as fast as in a parallel environment, so distributed mining usually aims to minimize communication costs. Proposed System Unlike other algorithms, ODAM offers better performance by minimizing candidate itemset generation costs. It achieves this by focusing on two major DARM issues: communication and synchronization. Communication is one of the most important DARM objectives. DARM algorithms will perform better if we can reduce communication costs (for example, message exchange size). Synchronization forces each participating site to wait a certain period until globally frequent itemset generation completes. Each site will wait longer if computing support counts takes more time. Hence, we reduce the computation time of candidate itemsets' support counts. To reduce communication costs, we highlight several message optimization techniques. Depending on the ARM algorithms and on the message
  • 15. exchange method, we can divide the message optimization techniques into two methods: direct and indirect support counts exchange. Each method has different aims, expectations, advantages, and disadvantages. For example, the first method exchanges each candidate itemset's support count to generate globally frequent itemsets of that pass (CD and FDM are examples of this approach). All sites share a common globally frequent itemset with identical support counts, so rules that are generated at different participating sites have identical confidence. This approach focuses on a rule's exactness and correctness.
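To see why identical global support counts lead to identical rule confidence at every site, recall that the confidence of a rule A -> B is support(A ∪ B) / support(A). The minimal sketch below computes it from a shared table of global counts; the `confidence` helper and the counts are made up for illustration.

```python
def confidence(global_support, antecedent, consequent):
    """confidence(A -> B) = support(A ∪ B) / support(A)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return global_support[both] / global_support[frozenset(antecedent)]


# Hypothetical global counts that every site holds after the exchange.
global_support = {
    frozenset(["bread"]): 4,
    frozenset(["milk"]): 3,
    frozenset(["bread", "milk"]): 3,
}
rule_confidence = confidence(global_support, ["bread"], ["milk"])
print(rule_confidence)
```

Because the function depends only on the shared counts, any site evaluating the same rule obtains the same value.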
  • 16. System Requirement
      Hardware specifications:
        Processor : Intel Processor IV
        RAM : 128 MB
        Hard disk : 20 GB
        CD drive : 40x Samsung
        Floppy drive : 1.44 MB
        Monitor : 15" Samtron color
        Keyboard : 108 mercury keyboard
        Mouse : Logitech mouse
      Software specification:
        Operating system : Windows XP/2000
        Language used : J2sdk1.4.0, JCreator