A high-school-level exploration of the importance of strings in real-life programming applications, including natural language processing and password checking.
3. What Are Strings
Categories of string algorithms
❏ String Searching
❏ String Manipulation
❏ String Sorting
❏ String Parsing
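As a rough illustration of the four categories above, here is one small Python example per category. The sample text and the `name=Ada;age=17` record are made up for this sketch.

```python
# One small example for each category of string algorithm.
text = "the quick brown fox"

# Searching: find where a pattern first occurs (-1 would mean "not found").
position = text.find("quick")

# Manipulation: build a new string from an existing one.
shouted = text.upper()

# Sorting: put a list of strings into alphabetical order.
words_sorted = sorted(text.split())

# Parsing: break a structured string into its parts.
record = "name=Ada;age=17"  # made-up record format for this sketch
fields = dict(pair.split("=") for pair in record.split(";"))

print(position)
print(shouted)
print(words_sorted)
print(fields)
```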
4. Applications of Strings
User Communication
❏ Displaying messages
❏ Taking input
Text Editors
❏ Writing large amounts of text
❏ Finding, replacing and modifying text
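The find and replace features of a text editor can be sketched in a few lines of Python. The helper name `find_all` and the sample sentence are assumptions made for this example; real editors use more efficient data structures for large files.

```python
def find_all(text, word):
    """Return the starting index of every occurrence of word in text."""
    positions = []
    start = text.find(word)
    while start != -1:
        positions.append(start)
        # Continue searching just after the previous match.
        start = text.find(word, start + 1)
    return positions

document = "cats chase cats"
matches = find_all(document, "cats")        # where the word occurs
updated = document.replace("cats", "dogs")  # replace every occurrence

print(matches)
print(updated)
```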
5. Natural Language Processing
Techniques:
❏ Tokenization
❏ Normalization (Stemming, Lemmatization)
❏ Filtering Stop Words
❏ Using Regular Expressions
❏ Feature Extraction (Bag of Words)
❏ Term Frequency – Inverse Document Frequency (TF-IDF)
Applications:
❏ Chatbots
❏ Translation Tools
❏ Spam Filtering
❏ Text Summarization
❏ Named Entity Recognition
❏ Autocomplete Features
❏ Social Media Mining
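Several of the techniques on this slide can be tried with the Python standard library alone. The sketch below is a simplified illustration, not a production pipeline: the stop-word list is a tiny made-up sample, normalization is limited to lowercasing (no stemming or lemmatization), and TF-IDF uses the basic formula tf × log(N / df).

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "is", "on", "and"}  # tiny sample list (assumption)

def tokenize(text):
    """Lowercase the text, then pull out word tokens with a regular expression."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stop_words(tokens):
    """Drop very common words that carry little meaning."""
    return [t for t in tokens if t not in STOP_WORDS]

def bag_of_words(tokens):
    """Count how often each token appears, ignoring word order."""
    return Counter(tokens)

def tf_idf(documents):
    """Return one {term: score} dictionary per document."""
    bags = [bag_of_words(remove_stop_words(tokenize(d))) for d in documents]
    n_docs = len(bags)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for bag in bags for term in bag)
    scores = []
    for bag in bags:
        total = sum(bag.values())
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return scores

docs = ["The cat sat on the mat", "The dog sat on the log"]
scores = tf_idf(docs)
print(remove_stop_words(tokenize(docs[0])))
print(scores[0])
```

A word such as "sat" that appears in every document gets a score of zero, while a word unique to one document, such as "cat", scores higher.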
Joseph Weizenbaum, creator of ELIZA, one of the first chatbots, and often considered one of the fathers of modern AI
7. Plagiarism and Spelling Checker
Plagiarism Checker:
❏ String matching is one approach to plagiarism detection in programming. This approach is a “character by character” matching method
❏ A string of text is taken from one document and then searched for in the other documents
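The idea above can be sketched as a sliding window: take each fixed-length piece of the suspect text and look for it, character by character, in the source text. The function name `shared_passages` and the default window of 20 characters are assumptions for this example; real plagiarism checkers use faster and more tolerant methods.

```python
def shared_passages(suspect, source, length=20):
    """Return every `length`-character piece of suspect that also appears in source."""
    found = []
    for start in range(len(suspect) - length + 1):
        piece = suspect[start:start + length]
        # The `in` operator performs an exact substring match.
        if piece in source and piece not in found:
            found.append(piece)
    return found

source = "strings are sequences of characters"
suspect = "I think strings are sequences too"
print(shared_passages(suspect, source, length=21))
```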
Spelling Checker:
❏ A trie is built from a predefined set of patterns (for example, the words of a dictionary). This trie is then used for string matching
❏ The text is taken as input, and a pattern is reported as found when the walk through the trie reaches an accepting state
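A minimal trie can be sketched in Python as follows. The class names, the `misspelled` helper, and the four-word dictionary are made up for this example; a real spelling checker would load a full word list and also suggest corrections.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the next node
        self.is_word = False  # True marks an accepting state

class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word):
        """Follow the word letter by letter; succeed only on an accepting state."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

def misspelled(text, trie):
    """Return the words of text that are not stored in the trie."""
    return [w for w in text.lower().split() if not trie.contains(w)]

dictionary = Trie(["strings", "string", "are", "fun"])
print(misspelled("Strings are fnu", dictionary))
```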
8. Password Checker
❏ The String contains() method can be used when checking passwords
❏ In Java, this method accepts a CharSequence as an argument and returns true if the argument is present in the string; otherwise it returns false
❏ First the length of the password is checked, then whether it contains uppercase letters, lowercase letters, digits and special characters
❏ If all of these are present, the password check returns true
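The checks described on this slide can be sketched in Python, where the `in` operator plays the role that contains() plays in Java. The function name `is_strong_password` and the minimum length of 8 are assumptions for this example, and "special characters" is taken to mean the ASCII punctuation in `string.punctuation`.

```python
import string

def is_strong_password(password, min_length=8):
    """Return True only if the password passes every check on the slide."""
    # Check the length first (min_length=8 is an assumed rule).
    if len(password) < min_length:
        return False
    has_upper = any(c.isupper() for c in password)
    has_lower = any(c.islower() for c in password)
    has_digit = any(c.isdigit() for c in password)
    # `c in string.punctuation` is a "contains" test on a string of symbols.
    has_special = any(c in string.punctuation for c in password)
    return has_upper and has_lower and has_digit and has_special

print(is_strong_password("Str1ng!Power"))
print(is_strong_password("password"))
```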
9. Intrusion Detection System
❏ Data packets containing intrusion-related keywords are found by applying string matching algorithms
❏ Known malicious code patterns are stored in a database, and all incoming data is compared with the stored patterns
❏ If a match is found, an alarm is generated
❏ It is based on exact string matching algorithms, where every intrusion packet must be detected
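The comparison step can be sketched as an exact substring check of each packet against a list of stored patterns. The three signatures and the function name `inspect_packet` are hypothetical examples, not taken from any real intrusion detection product, which would use multi-pattern algorithms to scan traffic quickly.

```python
# Hypothetical signature "database" for this sketch.
SIGNATURES = ["/etc/passwd", "<script>", "' OR '1'='1"]

def inspect_packet(payload, signatures=SIGNATURES):
    """Return the stored signatures that appear exactly inside the payload."""
    return [sig for sig in signatures if sig in payload]

packets = [
    "GET /index.html",
    "GET /../../etc/passwd",
]
for packet in packets:
    hits = inspect_packet(packet)
    if hits:
        print("ALARM:", packet, "matched", hits)
```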
11. Google Suggestions
When a user makes a Google search, Google adds that search to a list of similar searches (those with the same or similar keywords). It then recommends the top 10-15 most popular searches related to what the user has already typed. String matching is used in Google Suggestions to identify those keywords, so that Google can show the user relevant recommendations.
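A toy version of this idea counts past searches and suggests the most frequent ones that start with what the user has typed. This is only an illustrative sketch: the function names, the sample searches, and the prefix-matching rule are assumptions, and it says nothing about how Google's actual system works.

```python
from collections import Counter

search_counts = Counter()  # how often each search has been made

def record_search(query):
    """Add one search to the list of past searches."""
    search_counts[query.lower()] += 1

def suggest(prefix, limit=10):
    """Return up to `limit` past searches starting with prefix, most popular first."""
    prefix = prefix.lower()
    matches = [(q, n) for q, n in search_counts.items() if q.startswith(prefix)]
    # Sort by popularity (highest first), then alphabetically to break ties.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]

for query in ["string theory", "string methods", "string methods", "stranger things"]:
    record_search(query)

print(suggest("stri"))
```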
12. Digital Forensics
Digital forensics is the acquisition, analysis, and preservation of data contained in electronic devices whose information can be used as evidence in a court of law. Advanced digital forensic text string search tools use matching and indexing algorithms to search digital evidence for specific text strings. They are designed to achieve 100% query recall (find all instances of the text strings).
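The indexing idea can be sketched with a simple inverted index: every word is recorded once, together with each place it occurs, so a later search returns all instances without rescanning the files. The file names and contents below are invented for this example, and real forensic tools work on raw disk data rather than tidy text.

```python
import re
from collections import defaultdict

def build_index(files):
    """Map each lowercase word to a list of (file name, character offset) pairs."""
    index = defaultdict(list)
    for name, text in files.items():
        for match in re.finditer(r"\w+", text.lower()):
            index[match.group()].append((name, match.start()))
    return index

def search(index, word):
    """Return every recorded location of the word (empty list if none)."""
    return index.get(word.lower(), [])

# Invented sample evidence for this sketch.
files = {
    "notes.txt": "Meet at the dock. Bring the ledger.",
    "mail.txt": "The ledger is hidden.",
}
index = build_index(files)
print(search(index, "ledger"))
```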
13. Prediction of DNA Sequences
Strings can be used for predicting the composition of DNA and its patterns. DNA is built from 4 nucleotide bases:
❏ Cytosine [C]
❏ Guanine [G]
❏ Adenine [A]
❏ Thymine [T]
DNA bases pair up with each other, A with T and C with G, to form units called base pairs. String programs can predict how they pair up based on data from other similar strings containing information about DNA
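The base-pairing rule (A with T, C with G) maps directly onto a string operation. The sketch below treats one DNA strand as a string and builds the strand that pairs with it; the function names and the sample strand are made up for this example.

```python
from collections import Counter

# Each base and the base it pairs with.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Return the strand that pairs with the given one, base by base."""
    return "".join(PAIRS[base] for base in strand.upper())

def base_counts(strand):
    """Count how many times each base appears in the strand."""
    return Counter(strand.upper())

strand = "ATCGGA"
print(complement(strand))
print(base_counts(strand))
```

A letter other than A, T, C or G raises a `KeyError`, which is a simple way to catch invalid input in this sketch.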