The Cyber Grand Challenge (CGC) was announced in 2013--a first-of-its-kind competition in which fully autonomous systems would compete in a Capture The Flag (CTF) tournament. From an initial field of over 100 teams, including some of the top security researchers and hackers in the world, only 7 qualified for the final round. These 7 teams competed against each other to guard their own software with IDS rules and software patches while attacking the other systems. All of this was done without access to program source code and without any human intervention.
This never-before-seen level of autonomy demonstrated the state of the art in areas of computer security including static analysis, automated bug finding, automatic exploit generation, and automatic software patching. Over the course of just 10 hours, these systems competed to analyze over 80 entirely new pieces of software.
In this talk we will discuss the Cyber Grand Challenge, explaining what it entailed, what the results mean, and how these advances will influence software security in the near future. Additionally, we will share lessons learned from the winning CGC team, and take a look at the future of automatic software analysis.
--- Tyler Nighswander
Tyler has been a computer hacker for several years. While an undergraduate student at Carnegie Mellon University, Tyler was one of the initial members of the hacking team known as the Plaid Parliament of Pwning. This team rose from a small group of students to the number one competitive hacking team in the world. After traveling around the world competing in hacking competitions, Tyler settled down and now works on making humans and computers think more like hackers at ForAllSecure. In 2016, the automated system he helped create won the DARPA Cyber Grand Challenge.
3. THE CYBER GRAND CHALLENGE
MOTIVATION
▸ Software security assessment does not scale
▸ Estimated ratio of 50 code writers for every code reviewer
▸ Writers create 200 lines of code/week?
▸ Analysts need to review 10k lines/week, 500k/year
Good luck
▸ Oh, and while you catch up, new code is still being written!
4. THE CYBER GRAND CHALLENGE
MOTIVATION
▸ Bandwidth issue:
▸ More programs written than programs analyzed
▸ Latency issue:
▸ Takes a lot of time to assess a program from scratch
▸ Triaging any security issues is very time consuming
5. THE CYBER GRAND CHALLENGE
MOTIVATION
(diagram: High Bandwidth · Low Bandwidth · High Latency)
6. THE CYBER GRAND CHALLENGE
MOTIVATION
(diagram: High Bandwidth · Medium Bandwidth · Moderate Latency)
UNREALISTIC
7. THE CYBER GRAND CHALLENGE
MOTIVATION
(diagram: Medium Bandwidth · Low Bandwidth · High Latency)
BAD FOR BUSINESS
8. THE CYBER GRAND CHALLENGE
MOTIVATION
(diagram: High Bandwidth · Medium Bandwidth · High Latency)
NOT ECONOMICAL
9. THE CYBER GRAND CHALLENGE
MOTIVATION
(diagram: High Bandwidth · High Bandwidth · Low Latency)
STILL MANY YEARS AWAY
10. THE CYBER GRAND CHALLENGE
MOTIVATION
(diagram: High Bandwidth · High Bandwidth · Low Latency)
12. THE CYBER GRAND CHALLENGE
THE BEGINNINGS
▸ Defense Advanced Research Projects Agency (DARPA)
▸ Helped create Internet, GPS, stealth, onion routing, etc.
▸ Famously hosts “challenges”
(images: CMU Sandstorm · Google Self-Driving Car)
13. THE CYBER GRAND CHALLENGE
THE BEGINNINGS
▸ Idea: host one of these challenges for computer security
▸ Model after Capture the Flag competition
▸ Offer USD$750,000 to each of the top 7 qualifying teams
▸ Offer USD$2,000,000 to the top team in the final contest
▸ Cyber Grand Challenge
14. THE CYBER GRAND CHALLENGE
ABOUT ME
▸ Playing in CTFs for about 7 years with PPP
▸ Top-ranked CTF team in the world (usually!)
▸ Graduated from CMU
▸ Winner/finalist in several DARPA Challenges
▸ Researcher at ForAllSecure
▸ Based on a decade of research in automated security
Perfect fit!
15. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ Attack & Defense CTF Format:
(diagram: referee and players)
16. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ Attack & Defense CTF Format:
17. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ Attack & Defense CTF Format:
(diagram: a successful exploit scores +10 points for red, -10 points for green)
18. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ Attack & Defense CTF Format:
(diagram: referee polls each team: “Is your service running?”)
19. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ Attack & Defense CTF Format:
20. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ Attack & Defense CTF Format:
(diagram: referee polls “Is your service running?”; a downed service scores -20 points)
21. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ Scores based on:
▸ Security:
▸ Lose points if anyone exploits your system
▸ Offense:
▸ Gain points if you exploit other teams
▸ Availability:
▸ Lose points if performance decreased
▸ Lose points if functionality is decreased
22. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ To make scoring fairer and more secure, DARPA runs everything
▸ Give DARPA patched binaries to run
▸ Give DARPA a program that sends exploits
24. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ Availability scoring very strict!
▸ Lose points for any functionality failures
▸ Measured based on pollers written alongside programs
▸ Pollers hidden, but traffic is shown
▸ No specification given!
▸ Lose points if time or memory usage increase by 5%
(example: 15% overhead ⇒ 70% availability score)
25. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ Create a new executable format, modeled after ELF
▸ Seven system calls:
▸ exit
▸ transmit
▸ receive
▸ fdwait
▸ allocate
▸ deallocate
▸ random
▸ Makes running foreign code safer
▸ Makes automated analysis easier vs Linux/Windows/OSX
26. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ No file system or fork or exec… what is an exploit?
▸ Type 1:
▸ Control of register and instruction pointer
▸ Type 2:
▸ Leak of sensitive data
Image credit: Codenomicon/Synopsys
27. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ With 7 system calls, still plenty of complexity!
▸ Programs can have:
▸ Several binaries communicating
▸ User-space threads
▸ Non-determinism, nonces, checksums
▸ C or C++ code compiled to x86 assembly
28. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ Easily able to recreate several famous bugs:
▸ Heartbleed
▸ LNK exploit (Stuxnet infection vector)
▸ Crackaddr bug
▸ SQL Slammer
▸ All inside the simpler CGC framework
29. THE CYBER GRAND CHALLENGE
MAYHEM
▸ System created by ForAllSecure
▸ (spoiler alert: winning system!)
▸ Overall architecture roughly similar to other teams
▸ Several independent components with different tasks
▸ Use centralized database for sharing information
▸ Let’s dive in!
30. THE CYBER GRAND CHALLENGE
MAYHEM: ROBUSTNESS
▸ Must be 100% automated!
▸ Keeping a system running is hard!
▸ Running untrusted code is hard!
▸ Writing software to analyze unknown programs is hard!
▸ Distributed systems are hard!
▸ Now do all of this in an adversarial setting!
31. THE CYBER GRAND CHALLENGE
MAYHEM: ROBUSTNESS
▸ Step 1: Sandbox everything
▸ Run tools with seccomp
▸ … with resource limits
▸ … inside of VMs
▸ Step 2: Redundancy wherever possible
▸ Graceful failover of at least any single node
▸ Step 3: Assume things will fail, and recover when they do
▸ Liveness checks for all components
▸ Make sure components are easy to restart
32. THE CYBER GRAND CHALLENGE
MAYHEM: BUG FINDING
▸ Symbolic execution
▸ State of the art technique in academia
▸ Represent program as symbolic expressions
▸ Use SAT/SMT solvers to generate inputs
33.
int main(int argc, char** argv) {
  if (argc < 3)
    return -1;
  int a = atoi(argv[1]);
  int b = atoi(argv[2]);
  if (a == 42) {
    if (b == 31337) {
      puts("You made it!");
    }
    return 1;
  }
  return 0;
}
TREAT SYMBOLICALLY!
(diagram: symbolic execution tree; branches LEN(ARGV) >= 3, 42 == ATOI(ARGV[1]), 31337 == ATOI(ARGV[2]); outcomes RETURN -1, RETURN 0, RETURN 1, PRINT)
34. THE CYBER GRAND CHALLENGE
MAYHEM: BUG FINDING
▸ Fuzzing
▸ Standard technique in industry
▸ Send concrete values to a program and execute it
▸ Can also “instrument” the program
▸ Reveal internal state about what it did with inputs
▸ Possible using a variety of techniques…
35.
A = 239075, B = 75291
if (a == 42) {
  if (b == 31337) {
    puts("You made it!");
  }
  return 1;
}
return 0;
(diagram: branch hit rates; 42 == A: 0.00000002%, 31337 == B: 0.00000002%, RETURN 0: 99.99999998%)
(generated inputs: A = 239075, B = 31337; A = 42, B = 9852756; A = 42, B = 24752; A = 42, B = 31337)
36. THE CYBER GRAND CHALLENGE
MAYHEM: BUG FINDING
▸ Instrumentation for fuzzing:
▸ Efficient ways to instrument are still an area of research
▸ QEMU, PIN, DynInst, DynamoRIO, etc.
▸ All incur a 2-10x slow-down!
▸ Making a fuzzer too “smart” only makes it slow!
37. THE CYBER GRAND CHALLENGE
MAYHEM: BUG FINDING
Fuzzing:
▸ very easy to set up and run
▸ 10-10k executions/second
▸ doesn’t “know” anything
▸ light on resources
▸ explores shallow paths fast
▸ weakness: stalling
Symbolic Execution:
▸ usually very complex
▸ 1-∞ seconds per “execution”
▸ “understands” the program
▸ very resource intensive
▸ explores deep paths slowly
▸ weakness: path explosion
38. THE CYBER GRAND CHALLENGE
MAYHEM: BUG FINDING
▸ Fuzzing and Symbolic execution
▸ Combine the best of both worlds
▸ Fuzz a program to uncover shallow paths
▸ Use symbolic execution to find deep, complex paths
▸ To work together, simply share inputs
▸ Keep track of which inputs did something “interesting”
▸ Share the interesting inputs
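The sharing criterion above can be sketched in C. This is a hedged illustration in the style of AFL-like coverage bitmaps, not Mayhem's actual code; `MAP_SIZE` and the map layout are assumptions:

```c
#include <stdint.h>

#define MAP_SIZE 4096  /* assumed size of the shared coverage map */

/* Sketch of the "interesting input" test run before sharing an input
 * between the fuzzer and the symbolic executor: an input is shared
 * if its run set any coverage bit the corpus has not seen before. */
int is_interesting(uint8_t seen[MAP_SIZE], const uint8_t hit[MAP_SIZE]) {
    int new_bits = 0;
    for (int i = 0; i < MAP_SIZE; i++) {
        if (hit[i] & ~seen[i])
            new_bits = 1;      /* this input reached a new edge */
        seen[i] |= hit[i];     /* merge into the global map */
    }
    return new_bits;
}
```

The first input to reach an edge gets queued for the other engine; replaying the same input reports nothing new and is discarded.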
39. THE CYBER GRAND CHALLENGE
MAYHEM: BUG EXPLOITING
▸ After you have a bug, you need to turn it into an exploit!
▸ Again, use symbolic execution:
▸ In addition to the path constraints, ask the solver for EIP set to a chosen value
▸ Can even use SMT solver in exploit script!
▸ A hybrid approach here works as well:
▸ Use fuzzing of crashes to get deeper crashes
▸ Use symbolic execution to generate exploits
▸ Many of our exploits are actually pretty complex
41. THE CYBER GRAND CHALLENGE
MAYHEM: BUG EXPLOITING
▸ Given the ability to write a single null byte, generate an exploit
▸ Basic exploit chain:
▸ Overwrite lowest byte of saved EBP with 0
▸ Stack frame above uses EBP-relative address for buffer
▸ Future writes to that buffer will now clobber memory
▸ Send a command to do another write, overflow buffer
▸ Overwrite EIP, and win!
▸ Sound difficult?
▸ Mayhem found and exploited this bug in 105 minutes
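The first step of that chain can be illustrated with arithmetic alone. A hypothetical sketch with illustrative values (not the actual challenge binary): clearing only the low byte of a saved EBP slides the caller's frame, so later EBP-relative writes land in attacker-influenced memory.

```c
#include <stdint.h>

/* The single null-byte primitive: overwrite the lowest byte of the
 * 32-bit saved EBP with 0x00. The restored frame pointer moves down
 * by up to 255 bytes, so the caller's EBP-relative buffer addresses
 * now point into memory the attacker may already control. */
uint32_t clobber_low_byte(uint32_t saved_ebp) {
    return saved_ebp & 0xffffff00u;  /* the one-byte NUL write */
}
```

For example, a saved EBP of 0xffffd0c8 becomes 0xffffd000: every EBP-relative local in the caller's frame shifts by 200 bytes, setting up the later buffer write that finally overwrites EIP.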
42. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ First: how can you modify a compiled program?
▸ “Hot patch”
▸ Static binary rewriting
▸ Recompilation
43. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ Hot patching
▸ Find or create more space in the program
▸ Redirect basic blocks to new location
47. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ Hot patching
▸ Find or create more space in the program
▸ Redirect basic blocks to new location
▸ Benefits:
▸ Easy to write, unlikely to break program
▸ Disadvantages:
▸ Poor speed and memory performance
48. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ Static binary rewriting
▸ “Just change the code in the binary”
▸ …then go back and fix everything you broke
52. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ Static binary rewriting
▸ “Just change the code in the binary”
▸ …then go back and fix everything you broke
▸ Benefits:
▸ Excellent memory performance, good speed
▸ Disadvantages:
▸ Very easy to break functionality of the program
53. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ Recompilation
▸ Create high level representation of program
▸ Apply high level patches, and then compile
55. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ Recompilation
▸ Create high level representation of program
▸ Apply high level patches, and then compile
▸ Benefits:
▸ Performance can rival that of compiled code
▸ Easy to do high level transformations and patches
▸ Disadvantages:
▸ Requires near-perfect decompiler
▸ Needs optimizations or performance will decrease
56. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ Helpful trick: use patching to add instrumentation for fuzzing!
▸ Use patching process to insert very performant code
▸ Add code to log information about code paths
▸ Use this to replace compile-time instrumentation!
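A sketch of the kind of lightweight logging stub a patcher could inject at each basic block. This follows the AFL-style edge-hashing idea and assumes block IDs are assigned during rewriting; it is not Mayhem's actual instrumentation:

```c
#include <stdint.h>

#define MAP_SIZE 4096          /* assumed size of the coverage map */

static uint8_t coverage[MAP_SIZE];
static uint32_t prev_block;

/* Injected at each basic block: hash the (previous, current) block
 * pair into a bitmap so the fuzzer can see which edges an input hit. */
void log_block(uint32_t block_id) {
    coverage[(block_id ^ prev_block) % MAP_SIZE]++;
    prev_block = block_id >> 1;  /* shift so edge A->B differs from B->A */
}
```

Because the stub is a few instructions with no external calls, it avoids the 2-10x slowdown of QEMU/PIN-style dynamic instrumentation.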
57. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ Once a program can be modified:
▸ Where to patch?
▸ What kinds of protections to add?
58. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ Given a binary, where should patches be applied?
▸ Every memory access?
▸ Way too slow…
▸ Every memory access that can’t be proven safe?
▸ Better, but still very slow…
▸ Memory accesses that look scary?
▸ Moderate performance, but not as safe
▸ Difficult to decide where to protect
59. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ Given a binary, where should patches be applied?
▸ Wherever bugs are found!
▸ Excellent choice… if you can find all the bugs
▸ Minimal overhead, protects whatever you find
▸ Whatever can affect control flow
▸ Protect jumps, returns, calls, and so on
▸ If all are protected, no control flow hijacks possible
▸ Still moderate overhead
60. THE CYBER GRAND CHALLENGE
MAYHEM: TRIAGE
▸ If we want to patch bugs… what does that mean?
▸ Given a program and a crashing input, what is “responsible”?
▸ Access violation: the registers used in the access
▸ Execution violation: backtrack to control flow instruction
▸ Add in checks that verify instruction won’t crash
▸ These do not address the root cause…
61. THE CYBER GRAND CHALLENGE
MAYHEM: TRIAGE
▸ To attempt to find the root cause, use “normal” inputs
▸ Run the program, inspect program state near the crash
▸ Use this to infer invariants violated by crashes
▸ To patch, simply add in explicit checks for invariants!
▸ This is not perfect, and is still complex, but it’s a start…
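As a toy example of such a patch (hypothetical names and bound, not from an actual CGC challenge): suppose normal runs always satisfy `idx < 16` at some store, while crashing inputs violate it. The patcher inserts the inferred invariant as an explicit check:

```c
#include <stddef.h>

/* Patched version of a store whose crashes violated the inferred
 * invariant "idx < 16". The check does not fix the (unknown) root
 * cause; it just makes the violation fail safely instead of
 * corrupting memory. */
int patched_store(char table[16], size_t idx, char value) {
    if (idx >= 16)   /* inserted invariant check */
        return -1;   /* refuse the write */
    table[idx] = value;
    return 0;
}
```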
62. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ What if we don’t find 100% of the bugs?
▸ We can also apply generic patches
▸ Standard protections normally done by compilers
▸ Stack canaries
▸ Data Execution Prevention
▸ Control Flow Integrity
▸ Out of bounds read/write protection
▸ Even for compilers, these protections are neither trivial nor free
▸ Unlike compilers, we don’t have source code!
(diagram: stack layout: RETURN ADDRESS / SAVED EBP / LOCAL BUFFER / LOCAL ARGS; with canary: RETURN ADDRESS / CANARY / SAVED EBP / LOCAL BUFFER / LOCAL ARGS)
(diagram, control flow integrity: code fragment “… if (a == 5) foo() return 0; }” with the question “WHAT CAN POSSIBLY COME NEXT?”)
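The canary idea can be sketched as a simulation in C (illustrative layout and values, not the injected machine code): a secret sits between the locals and the saved EBP/return address, and a linear overflow must corrupt it before it can reach EIP.

```c
#include <stdint.h>
#include <string.h>

/* Simulated 32-bit frame, low addresses first, as in the diagram. */
struct frame {
    char buffer[16];
    uint32_t canary;      /* inserted by the patcher */
    uint32_t saved_ebp;
    uint32_t return_addr;
};

/* Check injected before every function exit: 0 means smashed stack. */
int canary_intact(const struct frame *f, uint32_t secret) {
    return f->canary == secret;
}

/* Simulate an unchecked write of n bytes starting at the buffer
 * (the buffer is the first member, so the write starts at the frame). */
void overflow(struct frame *f, size_t n) {
    memset(f, 'A', n);
}
```

An 8-byte write leaves the canary intact; a 20-byte write trashes it, so the check fails before the corrupted return address is ever used. In CGC, the secret could be seeded from the `random` system call.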
63. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ Use static analysis to recover program information
▸ Take shortcuts!
▸ Perfect CFI requires a lot of program information
▸ “Pretty good” CFI is still very powerful!
▸ Get free registers, etc. as a compiler would have
▸ Generate efficient, lightweight patches
▸ Run thousands of benchmarks to ensure performance!
64. THE CYBER GRAND CHALLENGE
MAYHEM: PERFORMANCE CHECKING
▸ Performance is only measurable when you know the inputs
▸ Run valid inputs, measure speed/memory usage
▸ Make sure they output the same as the unpatched version
▸ This is great for offline testing, but how about in a CTF?
▸ Don’t know “junk” traffic from “real” traffic
▸ Don’t know “proper” outputs from program
▸ Use inputs generated by bug finding components
▸ Things that don’t crash were “interesting”!
▸ Interesting means: uses program in non-trivial way
65. THE CYBER GRAND CHALLENGE
MAYHEM: DECISION MAKING
▸ How can an automated system make informed choices?
▸ Given several patches, which one is best?
▸ Is the system being attacked?
▸ Which exploits should be used?
▸ These questions are very important for cyber security!
66. THE CYBER GRAND CHALLENGE
MAYHEM: DECISION MAKING
▸ Several ways to approach this problem:
▸ Fixed strategy
▸ Easy to code, won’t surprise you
▸ Collection of strategies
▸ Easy to code, less naive, still no surprises
▸ Fully dynamic/learned strategy
▸ Difficult to code, lots of surprises!
67. THE CYBER GRAND CHALLENGE
MAYHEM: DECISION MAKING
▸ Surprise isn’t always bad!
▸ Pre-written strategies can only do what you’ve planned for!
▸ What if:
▸ Teams are stronger/weaker than expected
▸ Other teams find bugs which you do not
▸ Challenges have several unexpected bugs
▸ Things break in new, spectacular ways (because they will)
68. THE CYBER GRAND CHALLENGE
MAYHEM: DECISION MAKING
▸ One of many approaches: Bayesian estimation
▸ How likely is it we find a crash through fuzzing?
▸ How likely is it a random crash is exploitable?
▸ How many crashes/exploits have we found?
▸ How correlated are our exploits with other teams?
▸ How likely is it team X patches randomly?
▸ How correlated are team X’s patches and exploits?
▸ Combine: How likely is it we are being attacked?
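The final combination step is one application of Bayes' rule. A toy sketch with made-up numbers (the real system's priors and likelihoods are not public):

```c
/* Posterior P(attacked | crash) from a prior P(attacked) and the
 * likelihood of observing a crash in each world. All inputs are
 * probabilities in [0, 1]; any concrete numbers are illustrative. */
double posterior_attacked(double prior,
                          double p_crash_if_attacked,
                          double p_crash_if_not)
{
    double evidence = p_crash_if_attacked * prior
                    + p_crash_if_not * (1.0 - prior);
    return p_crash_if_attacked * prior / evidence;  /* Bayes' rule */
}
```

With a 10% prior, a 90% chance of crashing while under attack and 5% otherwise, a single observed crash raises the estimate to about 67%; each further observation updates it again.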
69. THE CYBER GRAND CHALLENGE
MAYHEM: DECISION MAKING
▸ Very difficult to tell which decisions are “correct”!
▸ But as a whole, decisions are very intelligent
▸ This is not just applicable for games!
▸ Is this traffic malicious or benign?
▸ Should servers be temporarily shut off?
▸ What are the costs/benefits of patching?
▸ Imagine deciding all of this in real time!
70. THE CYBER GRAND CHALLENGE
MAYHEM: TYING IT TOGETHER…
1. Receive compiled program
2. Generate patched version, and instrumented version
3. Use symbolic execution and fuzzing to uncover new inputs
4. Use non-crashing inputs to test patches
5. Use crashing inputs to generate exploits
6. Win
71. THE CYBER GRAND CHALLENGE
THE FUTURE
▸ What is the future for this technology?
▸ Testing:
▸ Test software security during development cycle
▸ Test software security during deployment
▸ This future is very close to reality!
72. THE CYBER GRAND CHALLENGE
THE FUTURE
▸ Triage + offense research:
▸ Lots of fuzzers produce crashes
▸ Most of these crashes are not security issues
▸ Very time consuming to sort through them!
▸ If an automated system can produce an exploit ⇒ confirmed security issue
73. THE CYBER GRAND CHALLENGE
THE FUTURE
▸ Response:
▸ Detecting 0-day exploits remains incredibly difficult
▸ Humans take very long time to analyze/detect exploits
▸ Automatically detect exploits
▸ Automatically triage exploits to help human response
▸ Automatically use exploits to create patches
74. THE CYBER GRAND CHALLENGE
THE FUTURE
▸ Protection:
▸ Automatic software patching
▸ Harden/patch no-longer-supported software
▸ Harden/patch software by lazy vendors
▸ Automatic testing for patches
75. THE CYBER GRAND CHALLENGE
THE FUTURE
▸ Where are we today?
(images: CMU Sandstorm · Google Self-Driving Car · Mercedes concept car)
76. THE CYBER GRAND CHALLENGE
THE FUTURE
▸ What still needs to be done:
▸ Porting to modern systems (partially done)
▸ Scaling to large scale programs
▸ More advanced techniques for exploitation
77. THE CYBER GRAND CHALLENGE
THE FUTURE
Image credit: Orion Pictures
78. THE CYBER GRAND CHALLENGE
THE FUTURE
Image credit: Japan Daily Press (robotic band Z-Machines with human band Amoyamo)