27/Sep/2008




Data Mining   July 16, 2009        1
Evolution of Database Technology

YEAR      PURPOSE
1960s     Network model, batch reports
1970s     Relational data model, executive information systems
1980s     Application-specific DBMSs (spatial data, scientific data, image data, …)
1990s     Terabyte data warehouses, object-oriented databases, middleware and web technology
2000s     Business processes
2010s     Sensor DB systems, DBs on embedded systems, large-scale pub/sub systems
Motivation: Necessity Is the Mother of Invention
• Data explosion problem
    ◦ Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses, and other information repositories
• We are drowning in data, but starving for knowledge!
• Solution: data warehousing and data mining
    ◦ Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases



Why Data Mining?

• Data, data, data everywhere…
• I can’t find the data I need: it is scattered over the network
• I can’t get the data I need
• I can’t understand the data I need
• I can’t use the data I found
An abundance of data:
• Supermarket scanners, POS data
• Credit card transactions
• Call center records
• ATM machines
• Demographic data
• Sensor networks
• Cameras
• Web server logs
• Customer website trails
• Geographic information systems
• National medical records
• Weather images

This data occupies:
• Terabytes: 10^12 bytes
• Petabytes: 10^15 bytes
• Exabytes: 10^18 bytes
• Zettabytes: 10^21 bytes
• Yottabytes: 10^24 bytes
• Walmart: 24 terabytes
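The storage units listed above are SI (decimal) units, each 1000 times the previous one. A minimal Python sketch of that scale, assuming decimal rather than binary (1024-based) prefixes; the function name `human_bytes` is illustrative only:

```python
# SI (decimal) byte units: each unit is 1000 times the previous one.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_bytes(n: int) -> str:
    """Express a byte count in the largest SI unit for which the value is at least 1."""
    exponent = 0
    # Climb the unit ladder while the next unit still fits (integer math keeps powers exact).
    while exponent < len(UNITS) - 1 and n >= 1000 ** (exponent + 1):
        exponent += 1
    value = n / 1000 ** exponent
    return f"{value:g} {UNITS[exponent]}"


print(human_bytes(24 * 10**12))  # the Walmart figure from the slide
print(human_bytes(10**21))
```

Using integer powers of 1000 for the comparisons avoids the rounding drift that repeated floating-point division would introduce at the zetta and yotta scales.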



• The process of sorting through large amounts of data and picking out relevant information
• The process of analyzing data from different perspectives and summarizing it into useful information
• Discovering hidden value in databases
• The non-trivial process of identifying valid, novel, useful, and understandable patterns in data
• Extracting, or "mining", knowledge from large amounts of data


History Notes: Many Names of Data Mining

YEAR    NAME                                 USED BY
1960    Data fishing, data dredging          Statisticians
1989    Knowledge discovery in databases     AI and machine learning community
1990    Data mining                          DB community, business

Other names: data archaeology, information harvesting, information discovery, knowledge extraction


Data warehousing provides the enterprise with a memory.

Data mining provides the enterprise with intelligence.

Why Data Mining? (Cont.)

• A data warehouse is a single, complete, and consistent store of data drawn from a variety of sources and made available to end users
• Such stores grow very large. For example, AT&T handles billions of calls per day, and Europe's Very Long Baseline Interferometer (VLBI) has 16 telescopes, each of which produces 1 gigabit/second of astronomical data over a 25-day observation session
• We need data mining to:
    ◦ Transform data into information that is useful to users
    ◦ Present data in a useful format
    ◦ Provide data access to business analysts and information technology professionals
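The VLBI figure can be turned into a total data volume with a short calculation. This sketch assumes 1 gigabit = 10^9 bits and, as an upper bound, that every telescope records continuously for the full 25 days; the function name is illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def session_volume_bytes(telescopes: int, bits_per_second: int, days: int) -> int:
    """Total bytes recorded if every telescope streams continuously for the whole session."""
    total_bits = telescopes * bits_per_second * days * SECONDS_PER_DAY
    return total_bits // 8  # 8 bits per byte


# Figures from the slide: 16 telescopes, 1 Gbit/s each (taken as 10**9 bit/s), 25 days.
volume = session_volume_bytes(telescopes=16, bits_per_second=10**9, days=25)
print(volume / 10**15, "PB")
```

Under these assumptions the session produces 4.32 × 10^15 bytes, i.e. about 4.32 petabytes.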



Data Mining Process
• Data mining is the technique used to carry out KDD (knowledge discovery in databases).
• Data mining turns data into information, and information into knowledge.

[Figure: Data → Information → Knowledge]
Steps in Data Mining
1. Data cleaning
   Remove noise and inconsistent data.
2. Data integration
   Combine (compile) data from multiple sources.
3. Data selection
   Select the data relevant to the analysis.
4. Data transformation
   Transform and consolidate the data into a form suitable for mining by performing summary, normalization, and aggregation operations (for example, converting it into a two-dimensional table).
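Steps 1 to 4 can be sketched in plain Python on two tiny, hypothetical sources. The record layout, the 0 to 120 age rule, and all function names are illustrative assumptions, not a prescribed method:

```python
# Hypothetical source A: customer records; source B: purchase records.
customers = [
    {"id": 1, "age": 34, "city": "Pune"},
    {"id": 2, "age": None, "city": "Delhi"},  # missing age -> dropped by cleaning
    {"id": 3, "age": 51, "city": "Pune"},
    {"id": 4, "age": -5, "city": "Mumbai"},   # impossible age -> dropped by cleaning
]
purchases = [
    {"id": 1, "amount": 120.0},
    {"id": 1, "amount": 80.0},
    {"id": 3, "amount": 300.0},
]


def clean(records):
    """Step 1: remove records with missing or inconsistent (out-of-range) ages."""
    return [r for r in records if r["age"] is not None and 0 <= r["age"] <= 120]


def integrate(customer_rows, purchase_rows):
    """Step 2: inner-join the two sources on their shared 'id' key."""
    by_id = {c["id"]: c for c in customer_rows}
    return [{**by_id[p["id"]], **p} for p in purchase_rows if p["id"] in by_id]


def select(records, attributes):
    """Step 3: keep only the attributes relevant to the analysis."""
    return [{a: r[a] for a in attributes} for r in records]


def transform(records):
    """Step 4: aggregate spending per customer, then min-max normalize it to [0, 1]."""
    totals, ages = {}, {}
    for r in records:
        totals[r["id"]] = totals.get(r["id"], 0.0) + r["amount"]
        ages[r["id"]] = r["age"]
    lo, hi = min(totals.values()), max(totals.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero when all totals are equal
    return [
        {"id": cid, "age": ages[cid], "spend_norm": (total - lo) / span}
        for cid, total in sorted(totals.items())
    ]


table = transform(select(integrate(clean(customers), purchases), ["id", "age", "amount"]))
print(table)
```

The result is a flat, two-dimensional table (one row per customer) ready for a mining step.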



Steps in Data Mining (Cont.)
5. Data mining
   Apply intelligent methods to the data to discover knowledge or patterns.

6. Pattern evaluation
   Identify the truly interesting patterns by applying interestingness thresholds.

7. Knowledge presentation
   Use visualization and presentation methods to present the mined knowledge to the user.


◦ Data mining: the core of the knowledge discovery process

[Figure: the KDD process, from Databases through Data Cleaning and Data Integration into a Data Warehouse, then Selection of Task-relevant Data, Data Mining, and Pattern Evaluation]
Data Mining Tasks
1. Classification
•   Classification maps data into predefined groups or classes.
•   A classifier can be represented in several forms, such as a decision tree.

Decision tree
• A flowchart-like tree structure
• Each internal node denotes a test on an attribute value
• Each branch represents an outcome of the test
• Leaves represent classes or class distributions
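A minimal Python sketch of the decision-tree structure described above: internal nodes test an attribute, branches carry test outcomes, and leaves hold class labels. The tree is hand-built for illustration (learning it from data, e.g. with ID3 or C4.5, is out of scope), and the attributes `age`, `student`, `credit_rating` and the names `Node`, `Leaf`, `classify` are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # class label stored at a leaf


@dataclass
class Node:
    attribute: str  # attribute tested at this internal node
    branches: Dict[str, "Tree"] = field(default_factory=dict)  # test outcome -> subtree


Tree = Union[Node, Leaf]


def classify(tree: Tree, record: Dict[str, str]) -> str:
    """Follow the branch matching the record's attribute value until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[record[tree.attribute]]
    return tree.label


# Hypothetical tree: "will this customer buy a computer?"
tree = Node("age", {
    "youth": Node("student", {"no": Leaf("no"), "yes": Leaf("yes")}),
    "middle_aged": Leaf("yes"),
    "senior": Node("credit_rating", {"fair": Leaf("yes"), "excellent": Leaf("no")}),
})
print(classify(tree, {"age": "youth", "student": "yes"}))
```

Only the attributes on the path actually taken need to be present in the record, which mirrors how a flowchart is read.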


2. Regression
Regression maps a data item to a real-valued prediction variable.
Example: A manager wants to reach a certain level of savings before his retirement. Periodically, he predicts his retirement savings from their current value and several past values, using a simple linear regression formula to predict future savings.
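The retirement-savings example can be sketched as an ordinary least-squares line fit, one common way to realize a simple linear regression formula. The yearly savings figures and the function name `fit_line` are hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope: average growth per year
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical history: savings (in thousands) at the end of years 1-4.
years = [1, 2, 3, 4]
savings = [10.0, 12.0, 14.0, 16.0]
a, b = fit_line(years, savings)
print(a + b * 10)  # predicted savings at the end of year 10
```

With this perfectly linear history the fit is a = 8, b = 2, so the year-10 prediction is 28 (thousand); real savings data would be noisier and the prediction correspondingly less certain.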


3. Prediction
Many real-world applications can be seen as predicting future data states based on past and current data.
Example: predicting flooding is a difficult prediction problem.


4. Clustering
Clustering is similar to classification, except that the groups are not predefined; they are determined from the data itself.
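One widely used clustering method is k-means. The sketch below is a plain Python version, assuming numeric 2-D points, Euclidean distance (`math.dist`, Python 3.8+), and hand-picked starting centroids; the data and names are hypothetical. Note that k-means still needs the number of clusters up front; only the group memberships are discovered from the data:

```python
import math


def kmeans(points, centroids, max_iterations=10):
    """Plain k-means: alternate nearest-centroid assignment and centroid update."""
    centroids = list(centroids)
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty).
        updated = [
            tuple(sum(coords) / len(c) for coords in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:  # converged: assignments can no longer change
            break
        centroids = updated
    return centroids, clusters


# Hypothetical unlabeled 2-D points; no class labels are supplied.
points = [(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
print(centroids)
print(clusters)
```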
5. Association Rule
Association refers to uncovering relationships among data.
It is used in the retail sales community to identify items (products) that are frequently purchased together.
[Cartoon: "Bread and Jam sell together!"]
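The bread-and-jam idea can be quantified with support (the fraction of transactions containing both items) and confidence (how often the right-hand item appears when the left-hand one does). This is a simplified pairwise sketch, not a full Apriori implementation; the baskets, thresholds, and the name `pair_rules` are hypothetical:

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets.
transactions = [
    {"bread", "jam", "milk"},
    {"bread", "jam"},
    {"bread", "butter"},
    {"jam", "bread", "butter"},
    {"milk", "eggs"},
]


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for item pairs passing both thresholds."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(pair for t in transactions for pair in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is not frequent enough
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


rules = pair_rules(transactions)
print(rules)
```

With these baskets the rules kept are bread → jam (support 0.6, confidence 0.75), butter → bread (0.4, 1.0), and jam → bread (0.6, 1.0).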




6. Summarization
Summarization describes the general characteristics or features of a target class of data.
Data characterization: the results can be presented in various forms, such as pie charts, bar charts, and curves.
Data discrimination: a comparison of the general features of target-class data objects with the general features of objects from one or a set of contrasting classes.
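Characterization and discrimination can be illustrated with simple class-wise averages. A minimal sketch, assuming hypothetical customer records labeled `graduate` or `undergraduate`; real systems summarize many more attributes and feed the results into charts:

```python
from statistics import mean

# Hypothetical customer records: (class label, age, yearly spend).
customers = [
    ("graduate", 30, 1200.0),
    ("graduate", 40, 1800.0),
    ("undergraduate", 20, 400.0),
    ("undergraduate", 22, 600.0),
]


def characterize(records, label):
    """Characterization: summarize the general features of one target class."""
    rows = [r for r in records if r[0] == label]
    return {
        "count": len(rows),
        "mean_age": mean(r[1] for r in rows),
        "mean_spend": mean(r[2] for r in rows),
    }


def discriminate(records, target, contrast):
    """Discrimination: contrast the target class's averages with another class's."""
    t, c = characterize(records, target), characterize(records, contrast)
    return {key: t[key] - c[key] for key in ("mean_age", "mean_spend")}


print(characterize(customers, "graduate"))
print(discriminate(customers, "graduate", "undergraduate"))
```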
7. Outlier Analysis
A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are called outliers.
Many data mining methods discard outliers as noise or exceptions.
In applications such as fraud detection, however, rare events may be more interesting than regularly occurring ones.
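One simple way to flag, rather than discard, outliers is the median-based modified z-score, which is less distorted by the outliers themselves than a mean-based z-score. The sketch assumes a 1-D list of hypothetical transaction amounts and uses the 0.6745 scaling constant and 3.5 cut-off commonly suggested by Iglewicz and Hoaglin; the function name `mad_outliers` is illustrative:

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (based on the median and MAD) exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a spread measure that a few extreme values barely move.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # most values equal the median, so the score is undefined
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


amounts = [52, 48, 50, 51, 49, 500]  # hypothetical transaction amounts
print(mad_outliers(amounts))
```

In a fraud-detection setting, the flagged values would be passed on for review instead of being dropped as noise.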
Data Mining: Types of Data

• Relational data and transactional data
• Text
• Images, video
• Mixtures of data




Data Mining Products

• DataMind -- neurOagent
• Information Discovery -- IDIS
• SAS Institute -- SAS/Neuronets




Data Mining Software
• RapidMiner and Weka: tools for defining the data mining process
• Top 8 data mining software vendors in 2008:
    ◦ Angoss Software
    ◦ Infor CRM Epiphany
    ◦ Portrait Software
    ◦ SAS
    ◦ SPSS
    ◦ ThinkAnalytics
    ◦ Unica
    ◦ Viscovery


Application Areas


       Industry            Application
       Finance             Credit Card Analysis
       Insurance           Fraud Analysis
       Telecommunication   Call record analysis




Applications
• Financial industry, banks, businesses, e-commerce
    ◦ Stock and investment analysis
    ◦ Identifying loyal customers and risky customers
    ◦ Predicting customer spending
• Database analysis and decision support
    ◦ Market analysis and management: target marketing, customer relationship management, market basket analysis
    ◦ Risk analysis and management: forecasting, quality control, competitive analysis
    ◦ Fraud detection and management

Data Mining in Usage

1.   Intelligent Miner
     IBM's data mining product.
     Its distinctive features include the scalability of its mining algorithms and tight integration with IBM's DB2 relational database system.


5.   DBMiner
     Developed by DBMiner Technologies Inc.
     Its distinctive feature is data-cube-based online analytical mining.



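The Telecomm Slice
[Figure: data cube with three dimensions, Product (Household, Telecomm, Video, Audio), Regions (Europe, Far East, India), and Sales Channel (Retail, Direct, Special), showing the slice for the Telecomm product]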
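Slicing a data cube fixes one dimension to a single value. A minimal sketch of the Telecomm slice, assuming the cube is stored as a Python dict keyed by (product, region, channel) with made-up sales figures; `slice_cube` is a hypothetical helper, not the API of DBMiner or any OLAP product:

```python
from itertools import product

PRODUCTS = ["Household", "Telecomm", "Video", "Audio"]
REGIONS = ["Europe", "Far East", "India"]
CHANNELS = ["Retail", "Direct", "Special"]

# Hypothetical sales figures, one cell per (product, region, channel) combination.
cube = {cell: (i + 1) * 10 for i, cell in enumerate(product(PRODUCTS, REGIONS, CHANNELS))}


def slice_cube(cube, product_name):
    """Slice: fix the Product dimension to one value, leaving a Region x Channel table."""
    return {
        (region, channel): sales
        for (prod, region, channel), sales in cube.items()
        if prod == product_name
    }


telecomm = slice_cube(cube, "Telecomm")
for region in REGIONS:
    print(region, [telecomm[(region, channel)] for channel in CHANNELS])
```

The slice is itself a two-dimensional table that can be further summarized, for example by totalling each region's row.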




Conclusion
• Data mining: discovering interesting patterns from large amounts of data
• A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
• Mining can be performed on a variety of information repositories
• Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier analysis, etc.




Thank you!

Contenu connexe

Tendances

Tendances (20)

data mining
data miningdata mining
data mining
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data mining
 
Data Cleaning
Data CleaningData Cleaning
Data Cleaning
 
Exploratory data analysis
Exploratory data analysis Exploratory data analysis
Exploratory data analysis
 
Data mining
Data mining Data mining
Data mining
 
Data Cleaning Techniques
Data Cleaning TechniquesData Cleaning Techniques
Data Cleaning Techniques
 
Data mining
Data mining Data mining
Data mining
 
Text MIning
Text MIningText MIning
Text MIning
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time Systems
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Data analytics & its Trends
Data analytics & its TrendsData analytics & its Trends
Data analytics & its Trends
 
Introduction To Data Mining
Introduction To Data Mining   Introduction To Data Mining
Introduction To Data Mining
 
Decision tree
Decision treeDecision tree
Decision tree
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and prediction
 
Metadata ppt
Metadata pptMetadata ppt
Metadata ppt
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and work
 
Text mining
Text miningText mining
Text mining
 
Random forest
Random forestRandom forest
Random forest
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project Lifecycle
 

En vedette

Text mining presentation in Data mining Area
Text mining presentation in Data mining AreaText mining presentation in Data mining Area
Text mining presentation in Data mining AreaMahamudHasanCSE
 
NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...
NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...
NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...Ryan Rosario
 
Machine Learning and Data Mining: 19 Mining Text And Web Data
Machine Learning and Data Mining: 19 Mining Text And Web DataMachine Learning and Data Mining: 19 Mining Text And Web Data
Machine Learning and Data Mining: 19 Mining Text And Web DataPier Luca Lanzi
 
Knowledge Discovery and Data Mining
Knowledge Discovery and Data MiningKnowledge Discovery and Data Mining
Knowledge Discovery and Data MiningAmritanshu Mehra
 
Data warehousing and data mining
Data warehousing and data miningData warehousing and data mining
Data warehousing and data miningSnehali Chake
 
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kambererror007
 
Data Mining and Data Warehousing
Data Mining and Data WarehousingData Mining and Data Warehousing
Data Mining and Data WarehousingAmdocs
 
introduction to data mining tutorial
introduction to data mining tutorial introduction to data mining tutorial
introduction to data mining tutorial Salah Amean
 
Data Mining and Business Intelligence Tools
Data Mining and Business Intelligence ToolsData Mining and Business Intelligence Tools
Data Mining and Business Intelligence ToolsMotaz Saad
 
Ch 1 Intro to Data Mining
Ch 1 Intro to Data MiningCh 1 Intro to Data Mining
Ch 1 Intro to Data MiningSushil Kulkarni
 
Data mining slides
Data mining slidesData mining slides
Data mining slidessmj
 

En vedette (17)

Big Data v Data Mining
Big Data v Data MiningBig Data v Data Mining
Big Data v Data Mining
 
Data mining and_big_data_web
Data mining and_big_data_webData mining and_big_data_web
Data mining and_big_data_web
 
Lecture 01 Data Mining
Lecture 01 Data MiningLecture 01 Data Mining
Lecture 01 Data Mining
 
Text mining presentation in Data mining Area
Text mining presentation in Data mining AreaText mining presentation in Data mining Area
Text mining presentation in Data mining Area
 
NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...
NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...
NumPy and SciPy for Data Mining and Data Analysis Including iPython, SciKits,...
 
Machine Learning and Data Mining: 19 Mining Text And Web Data
Machine Learning and Data Mining: 19 Mining Text And Web DataMachine Learning and Data Mining: 19 Mining Text And Web Data
Machine Learning and Data Mining: 19 Mining Text And Web Data
 
Analytics and Data Mining Industry Overview
Analytics and Data Mining Industry OverviewAnalytics and Data Mining Industry Overview
Analytics and Data Mining Industry Overview
 
Knowledge Discovery and Data Mining
Knowledge Discovery and Data MiningKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining
 
Data warehousing and data mining
Data warehousing and data miningData warehousing and data mining
Data warehousing and data mining
 
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
 
Data Mining and Data Warehousing
Data Mining and Data WarehousingData Mining and Data Warehousing
Data Mining and Data Warehousing
 
introduction to data mining tutorial
introduction to data mining tutorial introduction to data mining tutorial
introduction to data mining tutorial
 
Data Mining and Business Intelligence Tools
Data Mining and Business Intelligence ToolsData Mining and Business Intelligence Tools
Data Mining and Business Intelligence Tools
 
Ch 1 Intro to Data Mining
Ch 1 Intro to Data MiningCh 1 Intro to Data Mining
Ch 1 Intro to Data Mining
 
DATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MININGDATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MINING
 
Data mining
Data miningData mining
Data mining
 
Data mining slides
Data mining slidesData mining slides
Data mining slides
 

Similaire à Data Mining Overview

Data mining concepts
Data mining conceptsData mining concepts
Data mining conceptsBasit Rafiq
 
Upstate CSCI 525 Data Mining Chapter 1
Upstate CSCI 525 Data Mining Chapter 1Upstate CSCI 525 Data Mining Chapter 1
Upstate CSCI 525 Data Mining Chapter 1DanWooster1
 
01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.ppt01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.pptadmsoyadm4
 
Data Mining @ BSU Malolos 2019
Data Mining @ BSU Malolos 2019Data Mining @ BSU Malolos 2019
Data Mining @ BSU Malolos 2019Edwin S. Garcia
 
Data Mining - Presentation.pptx
Data Mining - Presentation.pptxData Mining - Presentation.pptx
Data Mining - Presentation.pptxfahadusman23
 
Data Mining mod1 ppt.pdf bca sixth semester notes
Data Mining mod1 ppt.pdf bca sixth semester notesData Mining mod1 ppt.pdf bca sixth semester notes
Data Mining mod1 ppt.pdf bca sixth semester notesasnaparveen414
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discoveryYoung Alista
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discoveryHarry Potter
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discoveryJames Wong
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discoveryFraboni Ec
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discoveryLuis Goldster
 

Similaire à Data Mining Overview (20)

Data mining concepts
Data mining conceptsData mining concepts
Data mining concepts
 
Data mining
Data miningData mining
Data mining
 
Upstate CSCI 525 Data Mining Chapter 1
Upstate CSCI 525 Data Mining Chapter 1Upstate CSCI 525 Data Mining Chapter 1
Upstate CSCI 525 Data Mining Chapter 1
 
Data Mining Intro
Data Mining IntroData Mining Intro
Data Mining Intro
 
data mining
data miningdata mining
data mining
 
01Intro.ppt
01Intro.ppt01Intro.ppt
01Intro.ppt
 
01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.ppt01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.ppt
 
01Intro.ppt
01Intro.ppt01Intro.ppt
01Intro.ppt
 
Chapter 1. Introduction.ppt
Chapter 1. Introduction.pptChapter 1. Introduction.ppt
Chapter 1. Introduction.ppt
 
Data Mining @ BSU Malolos 2019
Data Mining @ BSU Malolos 2019Data Mining @ BSU Malolos 2019
Data Mining @ BSU Malolos 2019
 
D
DD
D
 
Data Mining - Presentation.pptx
Data Mining - Presentation.pptxData Mining - Presentation.pptx
Data Mining - Presentation.pptx
 
isd314-01
isd314-01isd314-01
isd314-01
 
Data Mining mod1 ppt.pdf bca sixth semester notes
Data Mining mod1 ppt.pdf bca sixth semester notesData Mining mod1 ppt.pdf bca sixth semester notes
Data Mining mod1 ppt.pdf bca sixth semester notes
 
18231979 Data Mining
18231979 Data Mining18231979 Data Mining
18231979 Data Mining
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
 

Dernier

Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxnelietumpap1
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 

Dernier (20)

Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptx
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 

Data Mining Overview

  • 1. 27/Sep/2008 Data Mining July 16, 2009 1
  • 2. Evolution of Database technology YEAR PURPOSE 1960’s Network Model, Batch Reports 1970’s Relational data model, Executive information Systems 1980’s Application specific DBMS(spatial data, scientific data, image data, …) 1990’s Terabyte Data warehouses, Object Oriented, middleware and web technology 2000’s Business Process 2010’s Sensor DB systems, DBs on embedded systems, large scale pub/ sub systems Data Mining July 16, 2009 2
  • 3. Motivation : Necessity is the mother of invention  Data explosion problem ◦ Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories  We are drowning in data, but starving for knowledge!  Solution: Data warehousing and data mining ◦ Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases Data Mining July 16, 2009 3
  • 4. Why Data Mining?  Data, Data, Data Every where …  I can’t find data I need – data is scattered over network  I can’t get the data I need  I can’t understand the data I need  I can’t use the data I found Data Mining July 16, 2009 4
  • 5. An abundance of data  This data occupies  Super Market Scanners, POS data  Terabytes - 10^12 bytes  Credit cards transactions  Call Center records  Petabytes - 10^15 bytes  ATM Machines  Demographic data  Exabytes - 10^18bytes  Sensor Networks  Cameras  Zettabytes - 10^21bytes  Web server logs  Customer web site trails  Zottabytes-10^24bytes  Geographic Information System  National Medical Records  Walmart - 24 Terabytes  Weather Images Data Mining July 16, 2009 5
  • 6. Process of sorting through large amounts of data and picking out relevant information  Process of analyzing data from different perspectives and summarizing it into useful information  Discovering hidden value in database  It is non-trivial process of identifying valid, novel, useful and understandable patterns in data  Extracting or mining knowledge from large amounts of data Data Mining July 16, 2009 6
  • 7. History Notes: Many Names of Data Mining
    Year   Name                                     Used by
    1960   Data fishing, data dredging              Statisticians
    1989   Knowledge discovery in databases (KDD)   AI and machine learning community
    1990   Data mining                              Database community, business
    Other names: data archaeology, information harvesting, information discovery, knowledge extraction
  • 8. Data warehousing provides the enterprise with a memory.
    Data mining provides the enterprise with intelligence.
  • 9. Why Data Mining? (Cont.)
    - A data warehouse is a single, complete and consistent store of data, drawn from a variety of sources and made available to end users.
    - Such stores can be enormous. For example, AT&T handles billions of calls per day, and Europe's Very Long Baseline Interferometer (VLBI) has 16 telescopes, each of which produces 1 gigabit/second of astronomical data over a 25-day observation session.
    - We need data mining for:
      ◦ transforming data into information that is useful to users;
      ◦ presenting data in a useful format;
      ◦ providing data access to business analysts and information technology professionals.
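To put the VLBI figures in perspective, a back-of-the-envelope calculation helps. This sketch assumes that all 16 telescopes stream at the full 1 gigabit/second for the entire 25 days (the slide does not say whether recording is continuous) and uses decimal units with 8 bits per byte.

```python
# Rough data volume for the VLBI example.
# Assumption: every telescope records 1 Gbit/s continuously for all 25 days.
TELESCOPES = 16
GBIT_PER_SECOND = 1              # per telescope
SECONDS_PER_DAY = 24 * 60 * 60
DAYS = 25

total_gbit = TELESCOPES * GBIT_PER_SECOND * SECONDS_PER_DAY * DAYS
total_bytes = total_gbit * 10**9 // 8    # decimal gigabits -> bytes
petabytes = total_bytes / 10**15         # 1 PB = 10^15 bytes

print(f"{total_gbit:,} Gbit = {petabytes:.2f} PB")
```

Under these assumptions a single observation session produces about 4.32 petabytes, squarely in the petabyte range listed on the "abundance of data" slide.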
  • 10. Data Mining Process
    - Data mining is the technique used to carry out KDD (knowledge discovery in databases).
    - Data mining turns data into information, and information into knowledge.
    [Figure: Data → Information → Knowledge]
  • 11. Steps in Data Mining
    1. Data cleaning: remove noise and inconsistent data.
    2. Data integration: integrate (combine) multiple data sources.
    3. Data selection: select the data relevant to the analysis.
    4. Data transformation: transform and consolidate the data into forms suitable for mining, for example by summary, normalization or aggregation operations (such as converting the data into a two-dimensional form).
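Steps 1 to 4 can be sketched on a few in-memory records. This is a minimal illustration, not a prescribed implementation: the two sources, their field names and the cleaning rules (drop records with missing values, drop exact duplicates) are hypothetical, and min-max scaling is just one common choice of normalization.

```python
# Toy walk-through of KDD steps 1-4. Sources and field names are hypothetical.
sales = [
    {"id": 1, "amount": 120.0},
    {"id": 2, "amount": None},    # missing value
    {"id": 3, "amount": 75.0},
    {"id": 3, "amount": 75.0},    # exact duplicate
    {"id": 4, "amount": 300.0},
]
customers = {
    1: {"region": "Europe", "age": 34},
    3: {"region": "India", "age": 51},
    4: {"region": "Europe", "age": 29},
}


def clean(rows):
    """Step 1: drop records with missing values and exact duplicates."""
    seen, out = set(), []
    for r in rows:
        if any(v is None for v in r.values()):
            continue  # treat incomplete records as noise
        key = tuple(sorted(r.items()))
        if key in seen:
            continue  # redundant copy of a record already kept
        seen.add(key)
        out.append(r)
    return out


def integrate(rows, lookup):
    """Step 2: join sales rows with customer attributes on the shared id."""
    return [{**r, **lookup[r["id"]]} for r in rows if r["id"] in lookup]


def select(rows, attrs):
    """Step 3: keep only the attributes relevant to the analysis."""
    return [{a: r[a] for a in attrs} for r in rows]


def normalize(rows, attr):
    """Step 4: min-max scale one numeric attribute into [0, 1]."""
    values = [r[attr] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a constant column
    return [{**r, attr: (r[attr] - lo) / span} for r in rows]


prepared = normalize(select(integrate(clean(sales), customers), ["region", "amount"]), "amount")
print(prepared)
```

Each function maps one step to one transformation of a list of records, so the steps can be reordered or replaced independently.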
  • 12. Steps in Data Mining (Cont.)
    5. Data mining: apply intelligent methods to the data to discover knowledge or patterns.
    6. Pattern evaluation: identify the truly interesting patterns, for example by thresholding an interestingness measure.
    7. Knowledge presentation: use visualization and presentation methods to present the mined knowledge to the user.
  • 13. Data Mining: The Core of the Knowledge Discovery Process
    [Figure: Databases → Data cleaning and data integration → Data warehouse → Data selection → Task-relevant data → Data mining → Pattern evaluation]
  • 14. Data Mining Tasks
    1. Classification
    - Classification maps data into predefined groups or classes.
    - It may be performed with methods such as decision trees.
    Decision tree:
    - A flowchart-like tree structure
    - Each internal node denotes a test on an attribute value
    - Each branch represents an outcome of the test
    - Leaves represent classes or class distributions
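The decision-tree structure described above can be sketched directly: internal nodes test an attribute, branches are test outcomes, and leaves carry a class. The tree below is hand-built and hypothetical (attributes `age`, `student`, `credit_rating` and the class labels are illustrative); it shows how a tree classifies a record, not how one is learned from data.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # class assigned to records that reach this leaf


@dataclass
class Node:
    attribute: str  # attribute tested at this internal node
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)  # outcome -> subtree


def classify(tree, record):
    """Follow the branch matching each test outcome until a leaf is reached.

    Raises KeyError if the record has an outcome the tree has no branch for.
    """
    while isinstance(tree, Node):
        tree = tree.branches[record[tree.attribute]]
    return tree.label


# Hypothetical hand-built tree: does a customer buy a computer?
tree = Node("age", {
    "youth": Node("student", {"yes": Leaf("buys"), "no": Leaf("does not buy")}),
    "middle_aged": Leaf("buys"),
    "senior": Node("credit_rating", {"fair": Leaf("buys"), "excellent": Leaf("does not buy")}),
})

print(classify(tree, {"age": "youth", "student": "yes"}))
print(classify(tree, {"age": "senior", "credit_rating": "excellent"}))
```

Only the attributes tested along the path are consulted, which is why a decision tree reads like a flowchart.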
  • 15. 2. Regression
    Regression is used to map a data item to a real-valued prediction variable.
    Example: A manager wants to reach a certain level of savings before retirement. Periodically, he predicts his retirement savings from their current value and several past values, using a simple linear regression formula to predict future savings.
    3. Prediction
    Many real-world applications can be viewed as predicting future data states based on past and current data.
    Example: predicting flooding, which is a difficult problem.
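The manager's savings example can be sketched with ordinary least squares, fitting savings = a + b·year to past values and extrapolating. The yearly balances below are invented for illustration, and a straight-line fit is only the "simple linear regression" the slide mentions, not a realistic savings model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical past savings balances (in thousands) for years 1..5.
years = [1, 2, 3, 4, 5]
savings = [10.0, 12.0, 14.5, 16.0, 18.5]

a, b = fit_line(years, savings)
print(f"savings ~ {a:.2f} + {b:.2f} * year")
print(f"predicted for year 10: {a + b * 10:.2f}")
```

Refitting as each new balance arrives matches the "periodically predicts" part of the example.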
  • 16. 4. Clustering
    Clustering is similar to classification, except that the groups are not predefined.
    5. Association Rules
    Association refers to uncovering relationships among data. It is used in the retail sales community to identify items (products) that are frequently purchased together.
    [Figure: cartoon captioned "Bread and jam sell together!"]
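The bread-and-jam idea can be sketched by counting how often item pairs co-occur and keeping rules that pass support and confidence thresholds (the thresholding of the pattern-evaluation step). The baskets and threshold values are hypothetical, and this brute-force pair count is a deliberate simplification; real miners use algorithms such as Apriori or FP-growth to handle larger itemsets efficiently.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "jam", "milk"},
    {"bread", "jam"},
    {"bread", "butter"},
    {"jam", "bread", "butter"},
    {"milk", "butter"},
]

MIN_SUPPORT = 0.4     # fraction of transactions containing both items
MIN_CONFIDENCE = 0.7  # estimated P(rhs | lhs)

n = len(transactions)
item_counts = Counter(item for t in transactions for item in t)
pair_counts = Counter(pair for t in transactions for pair in combinations(sorted(t), 2))

rules = []
for (x, y), count in pair_counts.items():
    support = count / n
    if support < MIN_SUPPORT:
        continue  # pair is not frequent enough to be interesting
    for lhs, rhs in ((x, y), (y, x)):
        confidence = count / item_counts[lhs]
        if confidence >= MIN_CONFIDENCE:
            rules.append((lhs, rhs, support, confidence))

for lhs, rhs, s, c in sorted(rules):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

With these baskets only the bread/jam rules survive both thresholds; bread/butter is frequent enough but fails the confidence test in both directions.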
  • 17. 6. Summarization
    Summarization describes the general characteristics or features of a target class of data.
    - Data characterization: summarizes the target class; the results can be presented in various forms, such as pie charts, bar charts and curves.
    - Data discrimination: compares the general features of target-class data objects with the general features of objects from one or a set of contrasting classes.
    7. Outlier Analysis
    A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are called outliers. Most data mining methods discard outliers as noise or exceptions, but in applications such as fraud detection, rare events may be more interesting than regularly occurring ones.
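The slide does not name an outlier-detection method, so the sketch below assumes one simple statistical rule: Tukey's fences, which flag values more than 1.5 interquartile ranges outside the quartiles. The transaction amounts are hypothetical; in a fraud-detection setting the flagged value is the interesting one rather than noise.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical card-transaction amounts for one customer.
amounts = [52, 48, 50, 49, 51, 47, 53, 50, 48, 400]
print(iqr_outliers(amounts))  # [400]
```

Quartiles are less distorted by the extreme value itself than a mean and standard deviation would be, which is why this rule was chosen for the sketch.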
  • 18. Data Mining: Types of Data
    - Relational and transactional data
    - Text
    - Images, video
    - Mixtures of data
  • 19. Data Mining Products
    - DataMind: neurOagent
    - Information Discovery: IDIS
    - SAS Institute: SAS/Neuronets
  • 20. Data Mining Software
    - RapidMiner and Weka: tools for defining data mining processes
    - Top 8 data mining software packages in 2008:
      ◦ Angoss Software
      ◦ Infor CRM Epiphany
      ◦ Portrait Software
      ◦ SAS
      ◦ SPSS
      ◦ ThinkAnalytics
      ◦ Unica
      ◦ Viscovery
  • 21. Application Areas
    Industry             Application
    Finance              Credit card analysis
    Insurance            Fraud analysis
    Telecommunication    Call record analysis
  • 22. Applications  Financial Industry, Banks, Businesses, E-commerce ◦ Stock and investment analysis ◦ Identify loyal customers and risky customer ◦ Predict customer spending  Database analysis and decision support ◦ Market analysis and management  target marketing, customer relation management, market basket analysis. ◦ Risk analysis and management  Forecasting, quality control, competitive analysis ◦ Fraud detection and management Data Mining July 16, 2009 22
  • 23. Data Mining in Usage
    1. Intelligent Miner
    - An IBM data mining product.
    - Its distinctive features include the scalability of its mining algorithms and tight integration with IBM's DB2 relational database system.
    5. DBMiner
    - Developed by DBMiner Technologies Inc.
    - Its distinctive feature is data-cube-based online analytical mining (OLAM).
  • 24. The Telecomm Slice
    [Figure: a data cube with dimensions Product (Household, Telecomm, Video, Audio), Region (Europe, Far East, India) and Sales Channel (Retail, Direct, Special), showing the slice for Product = Telecomm]
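A slice fixes one dimension of a data cube and keeps the remaining ones. This sketch models a cube with the Product, Region and Sales Channel dimensions from the slide as a Python dictionary keyed by coordinate tuples; the cell values are placeholder numbers, not real sales, and `slice_cube` is a hypothetical helper rather than any OLAP product's API.

```python
from itertools import product

PRODUCTS = ["Household", "Telecomm", "Video", "Audio"]
REGIONS = ["Europe", "Far East", "India"]
CHANNELS = ["Retail", "Direct", "Special"]

# Hypothetical cube: (product, region, channel) -> measure.
# Placeholder values (0, 1, 2, ...) just make each cell distinguishable.
cube = {cell: i for i, cell in enumerate(product(PRODUCTS, REGIONS, CHANNELS))}


def slice_cube(cube, product_name):
    """Fix the Product dimension, leaving a Region x Channel sub-cube."""
    return {(r, c): v for (p, r, c), v in cube.items() if p == product_name}


telecomm = slice_cube(cube, "Telecomm")
for region in REGIONS:
    print(region, [telecomm[(region, ch)] for ch in CHANNELS])
```

The result is a two-dimensional Region by Sales Channel table for Telecomm products, which is what the slide's highlighted slice represents.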
  • 25. Conclusion  Data mining: discovering interesting patterns from large amounts of data  A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation  Mining can be performed in a variety of information repositories  Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier etc Data Mining July 16, 2009 25
  • 26. Thank you!