Tamil Internet Conference 2020
TamilInayaVaani - Integrating TVA Open-Source spellchecker with Python
T. Shrinivasan, Nithya Duraisamy, Ashok Ramachandran, Manickkavasakam,
Arunmozhi, and A. Muthiah
Who are we?
A few open source contributors from
Ezhil Foundation
Kaniyam Foundation
Thamizha
Mozilla Tamilnadu
Indian Linux Users group, Chennai
IndicNLP
Sharing the same dream
Open source Tamil Spellchecker
A dream of many years, now becoming real
Existing Efforts
● Hunspell
● GNU Aspell
● LanguageTool.org
● Open-Tamil Solthiruthi
● Bloom filter based spellchecker
Still a long way to go
How long?
Problems with a Tamil Spellchecker
● Infinite vocabulary
● Rich morphology
● Agglutinative
● Free word order
● Sandhi
● ...
A Few Algorithms
Levenshtein distance search
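Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions that turn one word into another; a spellchecker can offer every dictionary word within a small distance as a suggestion. The sketch below is a generic illustration, not the project's code: `levenshtein`, `suggest` and its `max_distance` parameter are hypothetical names. Note that Python strings are compared per Unicode code point, so for Tamil you may prefer to pass lists of letters (a vowel sign or pulli then stays attached to its consonant).

```python
from typing import Iterable, List, Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum insertions, deletions and substitutions turning a into b
    (classic dynamic programming, keeping only two rows)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter one as the row
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def suggest(word: str, vocabulary: Iterable[str], max_distance: int = 1) -> List[str]:
    """Hypothetical helper: vocabulary words within max_distance, closest first."""
    scored = sorted((levenshtein(word, w), w) for w in vocabulary)
    return [w for d, w in scored if d <= max_distance]
```

For example, `suggest("speling", ["spelling", "spewing", "sapling"])` keeps the two words one edit away. Scanning the whole vocabulary costs one distance computation per word, which is why faster lookup structures show up later in the TODO list.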
Norvig Algorithm
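Peter Norvig's spelling corrector (from his essay "How to Write a Spelling Corrector") generates every string one edit away (deletes, transposes, replaces, inserts), then two edits away, keeps only those found in a word-frequency corpus, and picks the most frequent. Below is a minimal sketch of that idea. One assumption differs from the essay: the edit alphabet is every character seen in the corpus instead of a-z, so for Tamil the edits work on Unicode code points. The class name `NorvigCorrector` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


class NorvigCorrector:
    """Norvig-style corrector over a word-frequency table."""

    def __init__(self, words: Iterable[str]):
        self.counts = Counter(words)
        # Assumption: the alphabet is every character seen in the corpus
        # (Norvig's essay uses a-z); for Tamil these are Unicode code points.
        self.alphabet = sorted(set("".join(self.counts)))

    def known(self, words: Iterable[str]) -> Set[str]:
        return {w for w in words if w in self.counts}

    def edits1(self, word: str) -> Set[str]:
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [left + right[1:] for left, right in splits if right]
        transposes = [left + right[1] + right[0] + right[2:]
                      for left, right in splits if len(right) > 1]
        replaces = [left + c + right[1:]
                    for left, right in splits if right for c in self.alphabet]
        inserts = [left + c + right for left, right in splits for c in self.alphabet]
        return set(deletes + transposes + replaces + inserts)

    def edits2(self, word: str) -> Set[str]:
        return {e2 for e1 in self.edits1(word) for e2 in self.edits1(e1)}

    def candidates(self, word: str) -> Set[str]:
        # Prefer the word itself, then known words 1 edit away, then 2 edits away.
        return (self.known([word]) or self.known(self.edits1(word))
                or self.known(self.edits2(word)) or {word})

    def correction(self, word: str) -> str:
        # Most frequent candidate wins; ties are broken by the string itself.
        return max(self.candidates(word), key=lambda w: (self.counts[w], w))
```

With the toy corpus `"the the the then thaw".split()`, `correction("teh")` returns `"the"` via a transposition. The two-edit set grows roughly with the square of the alphabet size times the word length, which matters for Tamil's larger character set.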
Still not perfect
Research continues...
TamilinayaVaani
An open source spellchecker from
Tamil Virtual Academy
Tamil Nadu Government announcement:
All software is released under GNU GPL v2
All digital content under CC-BY-SA
TamilinayaVaani
● Developed as a desktop application
● Written in C#
● A limited version of Vaani.neechalkaran.com
● Can't be used on Linux
● Can't be used from the command line
● Can't be integrated with other applications
Porting to Python
Why?
Python is easy to develop further
Easy integration
Web applications
APIs
Scalable
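One reason given for the port is easy integration into web applications and APIs. As an illustration only, the sketch below exposes a spellcheck function as a JSON endpoint using the standard-library WSGI interface. `VOCABULARY`, `check_word` and the `word` query parameter are hypothetical placeholders, not the project's API; a real deployment would call the ported checker instead.

```python
import json
from urllib.parse import parse_qs

# Hypothetical stand-in for the real spellchecker's word list.
VOCABULARY = {"தமிழ்", "இணையம்", "வாணி"}


def check_word(word):
    """Hypothetical check: is the word in the vocabulary?"""
    return {"word": word, "correct": word in VOCABULARY}


def app(environ, start_response):
    """Minimal WSGI app: GET /?word=... returns a JSON verdict."""
    params = parse_qs(environ.get("QUERY_STRING", ""))  # decodes %-escaped UTF-8
    words = params.get("word")
    if not words:
        status, body = "400 Bad Request", {"error": "missing 'word' parameter"}
    else:
        status, body = "200 OK", check_word(words[0])
    payload = json.dumps(body, ensure_ascii=False).encode("utf-8")
    start_response(status, [("Content-Type", "application/json; charset=utf-8"),
                            ("Content-Length", str(len(payload)))])
    return [payload]
```

To try it locally, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` serves it; any WSGI server (or framework) can host the same callable.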
Python Port Code
● https://github.com/tshrinivasan/Tamilinaiya-Spellchecker
The beauty of Open Source
More Contributions
Open-Tamil Python Library
● The de facto Python library for Tamil computing
● Processes Tamil text
● Used to build games and Tamil utilities
● http://Tamilpesu.us
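A basic Tamil text-processing step is splitting a string into letters rather than code points: "தமிழ்" is three letters but five code points, because vowel signs and the pulli are separate combining characters. Open-Tamil provides helpers for this; the stdlib-only sketch below just illustrates the idea and is not Open-Tamil's implementation. It attaches every combining mark (Unicode categories Mn and Mc) to the preceding character; the name `tamil_letters` is hypothetical.

```python
import unicodedata
from typing import List


def tamil_letters(text: str) -> List[str]:
    """Group code points into letters: each combining mark (vowel sign,
    pulli/virama) is attached to the base character before it."""
    letters: List[str] = []
    for ch in text:
        if letters and unicodedata.category(ch) in ("Mn", "Mc"):
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters
```

`tamil_letters("தமிழ்")` gives `["த", "மி", "ழ்"]`. Comparing such letter lists instead of raw strings lets an edit distance count one Tamil letter as one edit. Special ligatures such as ஸ்ரீ are not treated as a single unit by this simple rule.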
Integrating the Sandhi Checker
● Open-Tamil has a SandhiChecker
● 40+ rules
● Added this sandhi checker to Tamilinayavaani
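To show what a sandhi rule check looks like, here is a sketch of one well-known rule: after the demonstratives அந்த, இந்த, எந்த, a following word that starts with a hard consonant (வல்லினம்: க, ச, த, ப) doubles that consonant, e.g. அந்த + பையன் becomes அந்தப் பையன். This is a hypothetical, single-rule illustration; it is not Open-Tamil's SandhiChecker or its API, and the names `suggest_sandhi` and `check_sentence` are made up.

```python
from typing import List, Optional

# Hypothetical single rule, for illustration only.
DEMONSTRATIVES = {"அந்த", "இந்த", "எந்த"}
HARD_CONSONANTS = {"க", "ச", "த", "ப"}
PULLI = "\u0bcd"  # Tamil virama (pulli): க + ் = க்


def suggest_sandhi(first: str, second: str) -> Optional[str]:
    """Return the corrected word pair if the rule applies, else None."""
    if first in DEMONSTRATIVES and second[:1] in HARD_CONSONANTS:
        # Double the hard consonant at the end of the first word.
        return f"{first}{second[0]}{PULLI} {second}"
    return None


def check_sentence(sentence: str) -> List[str]:
    """Check every adjacent word pair and collect suggested corrections."""
    words = sentence.split()
    suggestions = []
    for first, second in zip(words, words[1:]):
        fixed = suggest_sandhi(first, second)
        if fixed:
            suggestions.append(fixed)
    return suggestions
```

For example, `check_sentence("இந்த கடை பெரியது")` suggests `"இந்தக் கடை"`. A full checker would keep a table of such rules and their exceptions.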
Python Packaging
● Easy to install on any OS
● pip install tamilinayavaani
Sample Usage
Web Interface with TinyMCE
● Added a web interface
JavaScript
A JavaScript port is on the way
TODO
● Provide an API
● Host as a public website
● Test and add more rules
● Set the edit distance to 2
● Find a method to yield better alternatives
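Setting the edit distance to 2 only requires knowing whether a word is within distance 2 of a candidate, not the exact distance. A common trick, sketched below under that assumption, is to stop the Levenshtein table as soon as every cell in a row exceeds the bound: row minima never decrease, so the final distance must exceed it too. The function name `within_distance` is hypothetical, not the project's code.

```python
from typing import Sequence


def within_distance(a: Sequence, b: Sequence, k: int = 2) -> bool:
    """True if the Levenshtein distance between a and b is at most k,
    stopping early once a whole DP row exceeds k."""
    if abs(len(a) - len(b)) > k:
        return False  # the length gap alone needs more than k edits
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(previous[j] + 1,             # delete
                               current[j - 1] + 1,          # insert
                               previous[j - 1] + (x != y)))  # substitute
        if min(current) > k:
            return False  # row minima never decrease, so the result is > k
        previous = current
    return previous[-1] <= k
```

When scanning a large word list, most candidates are rejected by the length check or after a few rows, which is much cheaper than computing every full distance.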
Word Corpus
● Collected 1,53,548 unique Tamil nouns
● Collected 25,83,000 unique Tamil words
● https://github.com/KaniyamFoundation/all_tamil_words
● https://github.com/KaniyamFoundation/all_tamil_nouns
TODO
● Clean them manually
● Build a golden corpus for quick lookup
● BloomFilter/SymSpell/LSTM and more
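A Bloom filter, one of the options listed for quick lookup, answers "is this word in the corpus?" using a compact bit array. It can return false positives but never false negatives, so a word it reports as absent is definitely not in the corpus. The sketch below uses the standard sizing formulas (m = -n ln p / (ln 2)^2 bits, k = (m/n) ln 2 hash functions) and derives the k bit positions by double hashing from one SHA-256 digest; the class and parameter names are hypothetical, not from the project.

```python
import hashlib
import math


class BloomFilter:
    """Compact membership test: may give false positives, never false negatives."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        n = max(1, expected_items)
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.size = max(1, math.ceil(-n * math.log(false_positive_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.size / n * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: k positions from two 64-bit values of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

For a corpus of about 25 lakh words at a 1% false-positive rate, this sizing needs roughly 2.9 MB of bits, far smaller than storing the words themselves; words that pass the filter can then be confirmed against the golden corpus.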
Please Contribute
● Give Tamil rules
● Give Tamil corpus
● Write code
● Test
● Document
● Provide hosting
● Donate
Thanks
● Muthu Annamalai
● Tamilnadu Government
● Neechalkaran
● Nithya Duraisamy
● Ashok Ramachandran
● Manickkavasakam
● Arunmozhi
● And all contributors to Ezhil Foundation, Kaniyam Foundation, Thamizha, IndicNLP and all other open source teams
Contact
● T Shrinivasan
● tshrinivasan@gmail.com
● Kaniyam.com