Regular Expression Injection occurs when an attacker supplies malicious input that modifies the intended regular expression in a way that breaks the program's specifications. This can impact control flow, leak information, or cause denial-of-service vulnerabilities. The document discusses regular expressions, how to find regular expression injection issues through error-based or blind injection techniques, demonstrates an example exploit, and provides mitigation strategies like input validation before using regular expressions and killing expressions that take too long.
2. Who am I?
Name: Elton Crasto
Designation: Security Analyst
Twitter: @xd1810
3. Objectives
1. What is a Regular Expression?
2. What is Regular Expression Injection?
3. ReDoS?
4. How do we find it?
5. Exploit Demo
6. Mitigation
4. What is a Regular Expression?
Regular expressions (regex) are widely used to match strings of text. For example, the grep utility supports regular expressions for finding patterns in the specified text.
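As a small illustration of the grep comparison, here is a minimal Java sketch that keeps only the lines containing a match. The class name GrepSketch, the sample lines and the pattern are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class GrepSketch {
    // Returns the lines that contain a match for the regex, like a bare-bones grep.
    static List<String> grep(String regex, List<String> lines) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            // find() looks for the pattern anywhere in the line.
            if (pattern.matcher(line).find()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, only for illustration.
        List<String> lines = Arrays.asList(
            "error: disk full", "all good", "warning: disk at 91%");
        System.out.println(grep("disk (full|at \\d+%)", lines));
    }
}
```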
7. Regular Expression vs UI validation?
The main difference between them: UI validation can be skipped entirely, because an attacker can easily send an HTTP request without using a browser (for example, through a proxy like Burp) and deliver a payload that can compromise our application.
Regex validation, on the other hand, is difficult to set up correctly.
8. What is Regular Expression Injection?
An attacker may supply malicious input that modifies the original regular expression in such a way that the regex fails to comply with the program's specification.
This attack is called regex injection or regular expression injection. It might affect control flow, cause information leaks, or result in denial-of-service (DoS) or ReDoS vulnerabilities.
9. ReDoS?
ReDoS stands for Regular Expression Denial of Service.
ReDoS is an algorithmic complexity attack that produces a denial of service by providing a regular expression, or input to one, that takes a very long time to evaluate.
For example:
Regex: ^((ab)*)+$ (this regex matches ab and its repetitions)
Input: abababababab
10. ReDoS?
Now we can complicate things very easily by throwing in abababa as the input. The extra a at the end causes all kinds of trouble: the input no longer matches the pattern, so a backtracking regex engine tries every way of splitting the ab repetitions between the inner and outer quantifiers while looking for a possible match.
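The effect can be observed with a rough Java sketch like the one below, which times the slide's regex against inputs that end in a stray a. The class name RedosDemo and the input sizes are choices made for this example. How quickly the time grows depends on the regex engine and its version, and some engines short-circuit this kind of pattern, so treat the timings as something to observe rather than a guaranteed result.

```java
import java.util.regex.Pattern;

class RedosDemo {
    // The nested-quantifier regex from the slide.
    static final Pattern NESTED = Pattern.compile("^((ab)*)+$");

    // Builds "ab" repeated n times, optionally followed by the stray 'a'.
    static String input(int repeats, boolean trailingA) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < repeats; i++) {
            sb.append("ab");
        }
        if (trailingA) {
            sb.append('a');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sizes are kept small on purpose; raise them carefully.
        for (int n = 4; n <= 20; n += 4) {
            String s = input(n, true);
            long start = System.nanoTime();
            boolean matched = NESTED.matcher(s).matches();
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println(n + " repetitions + 'a': matched=" + matched
                + ", " + micros + " us");
        }
    }
}
```

On an engine that backtracks naively, each extra ab roughly doubles the number of ways to split the input, which is why short strings are enough to stall a server.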
11. How do we find it?
As with most injections, we find it with the help of methods such as:
A. Error-based
B. Blind [fairly new]
Error-based is an in-band injection technique that relies on error messages thrown by the server to obtain information about the structure of the regex.
Blind is an injection technique that relies on the time the server takes to respond to a given input.
12. Exploit Demo
Detection:
The demo application has two types of logs, private and public. Private logs can only be seen by the admin, and public logs can be seen by all registered users.
13. Exploit Demo
When we input any character, the application uses a regex to find matching letters in the public logs. But what if I want to see the private logs too?
So we try inputting all kinds of characters to see which one isn't escaped.
E.g.: !@#$%^&*()_+abcdefg....etc
On inputting * we get an error that shows us the regex code used.
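The probing step can be imitated locally with a sketch like the following. It assumes, hypothetically, that the server builds its pattern as .*<input>.* without escaping and echoes the exception text back to the user. The class and method names are invented for this example, and the demo application's real code is not shown in the slides.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class ErrorProbe {
    // Hypothetical server-side code: wraps the input as .*<input>.* unescaped.
    // Returns the error text a careless server might echo back,
    // or null if the pattern compiled fine.
    static String leakedError(String input) {
        try {
            Pattern.compile(".*" + input + ".*");
            return null;
        } catch (PatternSyntaxException e) {
            // The message carries the description, the index and the full pattern.
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Try each candidate character on its own and report the ones that break the regex.
        String candidates = "!@#$%^&*()_+abcdefg";
        for (char c : candidates.toCharArray()) {
            String error = leakedError(String.valueOf(c));
            if (error != null) {
                System.out.println("'" + c + "' is not escaped: " + error);
            }
        }
    }
}
```

Any character that triggers an error was passed to the regex engine unescaped, and the error text itself gives away how the input is embedded in the pattern.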
14. Exploit Demo
Exploit:
Now that we know what the regex is, all we have to do is bypass it by tampering with the input so that it completes the regex.
For example, the regex revealed by the error uses .*<input>.* and we can easily bypass it with
.*)|(.*
With this input, the application shows the private logs as well.
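To make the bypass concrete, here is a minimal sketch of a vulnerable search. The log lines and the surrounding pattern (.*public.*<input>.*) are assumptions made for this example, since the demo application's actual regex is not reproduced in the slides. The payload closes the first group early and adds a second alternative that matches everything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class LogSearchDemo {
    // Hypothetical log lines; the real application's data is not shown in the slides.
    static final List<String> LOGS = Arrays.asList(
        "public: user alice logged in",
        "public: user bob logged out",
        "private: admin reset the database password");

    // Vulnerable on purpose: user input is concatenated into the pattern unescaped.
    static List<String> search(String input) {
        Pattern pattern = Pattern.compile("(.*public.*" + input + ".*)");
        List<String> hits = new ArrayList<>();
        for (String line : LOGS) {
            if (pattern.matcher(line).matches()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Normal use: only public lines that mention the search term.
        System.out.println(search("alice"));
        // Injection: the pattern becomes (.*public.*.*)|(.*.*), which matches every line.
        System.out.println(search(".*)|(.*"));
    }
}
```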
15. Mitigation
● Input validation/sanitization should be done before the input is sent to the regex.
String sanitized = subject.replaceAll("[+*/]", "");
Pattern regex = Pattern.compile(sanitized);
● If a regular expression takes too long, kill it at once and inform the user that the regular expression was taking too long.
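Both mitigations can be sketched in Java as follows. This is one possible approach, not the presenter's implementation. The first part swaps in Pattern.quote, which treats the whole input as a literal instead of stripping a fixed list of characters. The second part enforces a time budget by wrapping the input in a CharSequence that checks a deadline on every read. The names RegexMitigations, DeadlineCharSequence, RegexTimeoutException and matchesWithin are invented for this example.

```java
import java.util.regex.Pattern;

class RegexMitigations {
    // Mitigation 1: Pattern.quote turns the whole input into a literal,
    // so metacharacters in it lose their special meaning.
    static Pattern literalSearch(String userInput) {
        return Pattern.compile(".*" + Pattern.quote(userInput) + ".*");
    }

    // Thrown when a match exceeds its time budget (hypothetical name).
    static class RegexTimeoutException extends RuntimeException {
        RegexTimeoutException(String message) {
            super(message);
        }
    }

    // Mitigation 2: the matcher reads its input through charAt,
    // so a runaway match hits this deadline check and is aborted.
    static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override
        public char charAt(int index) {
            if (System.nanoTime() - deadlineNanos >= 0) {
                throw new RegexTimeoutException("regular expression took too long");
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    // Runs matches() but gives up once timeoutMillis has elapsed.
    static boolean matchesWithin(Pattern pattern, String input, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        return pattern.matcher(new DeadlineCharSequence(input, deadline)).matches();
    }

    public static void main(String[] args) {
        // The injection payload is now searched for literally.
        System.out.println(literalSearch(".*)|(.*").matcher("private log").matches());
        try {
            System.out.println(matchesWithin(Pattern.compile("^((ab)*)+$"), "abababa", 500));
        } catch (RegexTimeoutException e) {
            System.out.println("Sorry, " + e.getMessage());
        }
    }
}
```

Quoting is suitable when the user is only supposed to search for literal text. If users are meant to supply real patterns, the time budget is the part that still applies.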