Ayush Pareek (Sophomore)
The LNM Institute of Information Technology
TOPICS COVERED:
• Pre-processing
• Stemming algorithms
• Generic and Query-based Stemming
• Zipf's Law
• Stop-word removal
• Frequency matrix
• Clustering
• Sentence Weighting
• Pearson Correlation Coefficient
• Cosine Similarity
• Abstraction Extraction based Summary
For coding purposes we sharpened our knowledge of C/C++ file handling, the Standard Template Library, diverse libraries, etc.
• The same words were used in sentences containing redundant information.
• This gives the notion of “Connectivity”.
• But which sentences should we use for the summary?
• From a literature survey of statistics:
a) Pearson Correlation Coefficient
b) Cosine Correlation Coefficient
c) Classical Information Retrieval F-measure
STEP 1 “Preprocessing”
Extracting only those words from the text which are relevant for analysis.
STEP 2 “Stemming”
• “do”, “doing”, “done” -> do
• “agreed”, “agree” -> agree
• “gone”, “go”, “went” -> go
• “plays”, “play”, “playing” -> play
STEP 3 “Sorting and Removing Stop Words”
• Common words like the, and, is, are, for, am, so…
• Symbols, numbers and punctuation.
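The three steps on this slide can be sketched in Python as follows. This is only an illustration: the stop-word set and the stem lookup table are tiny hypothetical stand-ins built from the examples on the slide, not the stemming algorithm used in the project, and the function name `preprocess` is an assumption.

```python
import re

# Hypothetical stop-word list, taken from the examples on the slide.
STOP_WORDS = {"the", "and", "is", "are", "for", "am", "so"}

# Hypothetical lookup table standing in for a real stemming algorithm.
STEMS = {
    "doing": "do", "done": "do",
    "agreed": "agree",
    "gone": "go", "went": "go",
    "plays": "play", "playing": "play",
}

def preprocess(sentence):
    """Return the stemmed, sorted content words of one sentence."""
    # Tokenize into lowercase alphabetic words; this also drops
    # symbols, numbers and punctuation.
    words = re.findall(r"[a-z]+", sentence.lower())
    # Map each word to its stem when the table knows it.
    stemmed = [STEMS.get(w, w) for w in words]
    # Sort and remove stop words.
    return sorted(w for w in stemmed if w not in STOP_WORDS)

words = preprocess("The patient agreed and went for surgery in 2014.")
```

A real system would replace the lookup table with a proper stemming algorithm and a much larger stop-word list.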
             Pakistan  India  Surgery  Medical  Patient
Sentence 1   1         2      0        1        2
Sentence 2   0         0      3        1        1
Sentence 3   2         0      0        1        0
Sentence 4   1         0      0        0        1
Now the vector corresponding to Sentence 1 is:
[1 2 0 1 2]
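As a rough sketch of how such a word versus sentence frequency matrix could be built, assuming each sentence has already been reduced to a list of stemmed words. The function name `frequency_matrix` and the sample word list are hypothetical; the word list is chosen only to reproduce the first row of the table.

```python
def frequency_matrix(sentences, vocabulary):
    """Count how often each vocabulary word occurs in each sentence.

    `sentences` is a list of word lists (already preprocessed);
    the result has one row per sentence and one column per word.
    """
    return [[words.count(term) for term in vocabulary] for words in sentences]

vocabulary = ["pakistan", "india", "surgery", "medical", "patient"]
# Hypothetical word list chosen to reproduce the first row of the table.
sentence_1 = ["pakistan", "india", "india", "medical", "patient", "patient"]
matrix = frequency_matrix([sentence_1], vocabulary)
```

Each row of `matrix` is the sentence vector that the later correlation steps compare.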
Finding Correlation between Sentence Vectors
Text -> Sentences -> Vectors -> PCC -> value of r
-> gives connectivity between vectors
-> connectivity between sentences
COEFFICIENT VALUE
The coefficient value can range between -1.00 and 1.00.
CASE 1: PCC > 0
• As one variable increases, the other also increases.
• > 0.5 => considerable connectivity
• > 0.7 => strong connectivity
CASE 3: PCC < 0
• Negative association between variables
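For reference, the standard definition is r = sum((x_i - mean_x)(y_i - mean_y)) / sqrt(sum((x_i - mean_x)^2) * sum((y_i - mean_y)^2)). A minimal Python sketch of that formula, applied to the vectors of Sentence 1 and Sentence 2 from the frequency matrix, might look like this. The function name `pearson` and the handling of constant vectors are assumptions, not taken from the slides.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # Assumption: a constant vector has no defined correlation; return 0.0.
    if denom == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom

s1 = [1, 2, 0, 1, 2]  # Sentence 1 from the frequency matrix
s2 = [0, 0, 3, 1, 1]  # Sentence 2 from the frequency matrix
r = pearson(s1, s2)
```

For these two rows r comes out at about -0.73, which falls under the negative-association case.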
             Sentence 1  Sentence 2  Sentence 3  Sentence 4  Sentence 5  Sentence 6
Sentence 1   1           0.224862    0.125127    0.40471     0.127615    0.224413
Sentence 2   0.224862    1           0.317351    0.328374    0.0122265   0.116916
Sentence 3   0.125127    0.317351    1           0.297626    -0.0922254  -0.0502292
Sentence 4   0.40471     0.328374    0.297626    1           0.0799604   0.349622
Sentence 5   0.127615    0.0122265   -0.0922254  0.0799604   1           -0.0791082
Sentence 6   0.224413    0.116916    -0.0502292  0.349622    -0.0791082  1
We need to rank these sentences in order of “connectivity”.
We take the average of each sentence vector to compute its order of importance to the entire text.
• E.g., sentence 3 > sentence 5 > sentence 7 > sentence 8 > sentence 9
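One way to read this ranking step, sketched in Python on the six-sentence coefficient matrix shown above. Whether the diagonal entry (the self-correlation of 1) is included in the average is not stated in the slides; this sketch leaves it out, which does not change the order. The function name `rank_by_connectivity` is hypothetical.

```python
def rank_by_connectivity(corr):
    """Return sentence indices ordered from most to least connected."""
    n = len(corr)
    # Assumption: average each row over the other sentences only,
    # leaving out the self-correlation of 1 on the diagonal.
    scores = [
        sum(corr[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)

corr = [
    [1, 0.224862, 0.125127, 0.40471, 0.127615, 0.224413],
    [0.224862, 1, 0.317351, 0.328374, 0.0122265, 0.116916],
    [0.125127, 0.317351, 1, 0.297626, -0.0922254, -0.0502292],
    [0.40471, 0.328374, 0.297626, 1, 0.0799604, 0.349622],
    [0.127615, 0.0122265, -0.0922254, 0.0799604, 1, -0.0791082],
    [0.224413, 0.116916, -0.0502292, 0.349622, -0.0791082, 1],
]
order = rank_by_connectivity(corr)  # 0-based sentence indices
```

With this matrix the ordering puts Sentence 4 first and Sentence 5 last.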
      S1        S2         S3          S4         S5          S6
S1    1         0.225      0.40471     0.125      0.127       0.224
S2    0.225     1          0.317351    0.328374   0.0122265   -0.116916
S3    0.40471   0.317351   1           0.297626   -0.0922254  -0.0502292
S4    0.125127  0.328374   0.297626    1          0.0799604   0.349622
S5    0.127615  0.0122265  -0.0922254  0.0799604  1           -0.0791082
S6    0.224413  -0.116916  -0.0502292  0.349622   -0.0791082  1
             S2         (S1+S3)/2  S4        S5         S6
S2           1.000000   0.317351   0.276618  0.012226   -0.116916
(S1+S3)/2    0.317351   1.000000   0.211376  -0.092225  -0.050229
S4           0.276618   0.211376   1.000000  0.103788   0.287017
S5           0.012226   -0.092225  0.103788  1.000000   -0.079108
S6           -0.116916  -0.050229  0.287017  -0.079108  1.000000
                (S1+S2+S3)/3  S4        S5         S6
(S1+S2+S3)/3    1.000000      0.243997  -0.039999  -0.083573
S4              0.243997      1.000000  0.103788   0.287017
S5              -0.039999     0.103788  1.000000   -0.079108
S6              -0.083573     0.287017  -0.079108  1.000000
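The tables above suggest a clustering loop in which the two most correlated entries are merged into their average, as in (S1+S3)/2 and then (S1+S2+S3)/3, and the coefficient matrix is recomputed. A minimal Python sketch of one such merge step follows. The exact merge rule used in the project is not given in the slides, so this is an assumption, and the names `centroid` and `merge_step` are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return 0.0 if denom == 0 else sum(a * b for a, b in zip(dx, dy)) / denom

def centroid(cluster):
    """Element-wise mean of the member vectors, e.g. (S1+S3)/2."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]

def merge_step(clusters):
    """Merge the two clusters whose centroids are most correlated."""
    n = len(clusters)
    cents = [centroid(c) for c in clusters]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    i, j = max(pairs, key=lambda p: pearson(cents[p[0]], cents[p[1]]))
    merged = clusters[i] + clusters[j]  # keep all member vectors
    rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
    return [merged] + rest

# Each cluster starts as a single sentence vector (toy data, not from the slides).
clusters = [[[1, 2, 3]], [[3, 2, 1]], [[2, 4, 6]]]
clusters = merge_step(clusters)
```

Keeping the member vectors in each cluster means a later merge averages over all members, which matches the (S1+S2+S3)/3 form rather than averaging two averages.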
Basic Steps used in all our algorithms (flowchart):
START
-> Get document and perform preprocessing
-> Make a WORD v/s SENTENCE FREQUENCY MATRIX
-> Four branches (ALGO 1, ALGO 2, ALGO 3, ALGO 4):
• Coefficient matrix using P.C.C. -> Sentence Weighting
• Coefficient matrix using P.C.C. -> Sentence Clustering
• Coefficient matrix using Cosine Similarity -> Sentence Weighting
• Coefficient matrix using Cosine Similarity -> Sentence Clustering
-> TAKE CONSENSUS OF FINAL RANKS FROM ALL 4 METHODS
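The cosine-similarity branch of the flowchart can be sketched in Python as below, using the standard definition cos(x, y) = (x . y) / (|x| |y|). The function names `cosine_similarity` and `coefficient_matrix` and the zero-vector handling are assumptions made for illustration.

```python
from math import sqrt

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    # Assumption: an all-zero sentence vector is treated as unrelated.
    if norm == 0:
        return 0.0
    return dot / norm

def coefficient_matrix(vectors, measure):
    """Square matrix of pairwise coefficients under the given measure."""
    return [[measure(u, v) for v in vectors] for u in vectors]

# Sentence vectors from the word vs. sentence frequency matrix.
vectors = [
    [1, 2, 0, 1, 2],
    [0, 0, 3, 1, 1],
    [2, 0, 0, 1, 0],
    [1, 0, 0, 0, 1],
]
cos_matrix = coefficient_matrix(vectors, cosine_similarity)
```

Because word counts are never negative, every entry of `cos_matrix` lies between 0 and 1, whereas P.C.C. can also be negative.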
METHOD 1 (GENERIC SUMMARY): Giving equal weights to all 4 algorithms
• Shortcomings of one algorithm are compensated by the strengths of another algorithm.
• Thus, we get a reasonably accurate ranking.
Combined: Sentence Weighting, Sentence Clustering, P.C.C., Cosine
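One possible reading of "equal weights" is to add up the position each algorithm gives a sentence and sort by the total, a Borda-style count. The slides do not specify the scheme, so the Python sketch below is an assumption, as are the function name `consensus_ranking` and the sample rankings.

```python
def consensus_ranking(rankings):
    """Combine rankings with equal weights by total rank position."""
    n = len(rankings[0])
    total = [0] * n
    for ranking in rankings:
        for position, sentence in enumerate(ranking):
            total[sentence] += position
    # A lower total means a better consensus rank.
    # Assumption: ties are broken by sentence index.
    return sorted(range(n), key=lambda s: (total[s], s))

# Hypothetical rankings of three sentences from the four algorithms.
rankings = [
    [0, 1, 2],
    [1, 0, 2],
    [0, 2, 1],
    [0, 1, 2],
]
final = consensus_ranking(rankings)
```

A sentence that one algorithm misplaces is pulled back toward its position in the other three rankings.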
METHOD 2 (Identifying Datasets):
What is the genre of the data? Use an algorithm on that basis.
• Algorithm for Math Dataset
• Algorithm for Literature Dataset
• Algorithm for Encyclopedia articles
• Algorithm for News Reports
• Algorithm for Biographies
Algorithm 1, Algorithm 2, Algorithm 3, Algorithm 4, Algorithm 5, Algorithm 6, Algorithm 7, Algorithm 8
-> Take keywords from the user, or use the title of the text, for word matching with all the available summaries
-> Final Summary
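The word-matching step could be sketched in Python as follows, assuming each algorithm has already produced a candidate summary as plain text. Counting distinct matching keywords is an assumed scoring rule, and the function name `pick_summary` and the sample texts are hypothetical.

```python
def pick_summary(summaries, keywords):
    """Return the candidate summary that contains the most keywords."""
    wanted = {k.lower() for k in keywords}

    def score(summary):
        # Number of distinct keywords that appear as words in the summary.
        return len(wanted & set(summary.lower().split()))

    # On a tie, max() keeps the earliest candidate.
    return max(summaries, key=score)

candidates = [
    "the surgery was successful",
    "india and pakistan played cricket",
]
best = pick_summary(candidates, ["India", "Pakistan"])
```

The keywords can come from the user or from the words of the title, as the slide describes.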
[Chart: Accuracy (y-axis, 0 to 1) against number of sentences (x-axis, 0 to 25). MAXIMA = 87.4 %]
• Language Independent summaries
• Sub-Heading and Index Creator
• Content Highlighter
• Browser Add-On
• Subjective Exam sheet checker
• Making Abstract of Research papers and articles
• Plagiarism Detector
• Hypertext context-link based summarizer
• Daily News feed summarizer / RSS
• In search engines to present compressed descriptions of the search results
• In keyword directed subscription of news which are summarized and pushed to the user.
• The software can effectively convert BRUTE FORCE reading effort to DIVIDE-AND-CONQUER
News summary maker
Abridged project ppt_ayush