The Cyber Grand Challenge (CGC) was announced in 2013--a first-of-its-kind competition in which fully autonomous systems would compete in a Capture The Flag (CTF) tournament. From an initial field of over 100 teams, including some of the top security researchers and hackers in the world, only 7 qualified for the final round. These 7 teams competed against each other to guard their own software with IDS rules and software patches while attacking the other systems. All of this was done without access to program source code and without any human intervention.
This never-before-seen level of autonomy demonstrated the state of the art in areas of computer security including static analysis, automated bug finding, automatic exploit generation, and automatic software patching. Over the course of just 10 hours, these systems competed to analyze over 80 entirely new pieces of software.
In this talk we will discuss the Cyber Grand Challenge, explaining what it entailed, what the results mean, and how these advances will influence software security in the near future. Additionally, we will share lessons learned from the winning CGC team, and take a look at the future of automatic software analysis.
--- Tyler Nighswander
Tyler has been a computer hacker for several years. While an undergraduate student at Carnegie Mellon University, Tyler was one of the initial members of the hacking team known as the Plaid Parliament of Pwning. This team rose from a small group of students to the number one competitive hacking team in the world. After traveling around the world competing in hacking competitions, Tyler settled down and now works on making humans and computers think more like hackers at ForAllSecure. In 2016, the automated system he helped create won the DARPA Cyber Grand Challenge.
3. THE CYBER GRAND CHALLENGE
MOTIVATION
▸ Software security assessment does not scale
▸ Estimated ratio of 50 code writers for every code reviewer
▸ Writers create 200 lines of code/week?
▸ Analysts need to review 10k lines/week, 500k/year
Good luck
▸ Oh, and while you catch up, new code is still being written!
4. THE CYBER GRAND CHALLENGE
MOTIVATION
▸ Bandwidth issue:
▸ More programs written than programs analyzed
▸ Latency issue:
▸ Takes a lot of time to assess a program from scratch
▸ Triaging any security issues is very time consuming
5. THE CYBER GRAND CHALLENGE
MOTIVATION
(diagram: High Bandwidth · Low Bandwidth · High Latency)
6. THE CYBER GRAND CHALLENGE
MOTIVATION
(diagram: High Bandwidth · Medium Bandwidth · Moderate Latency)
UNREALISTIC
7. THE CYBER GRAND CHALLENGE
MOTIVATION
(diagram: Medium Bandwidth · Low Bandwidth · High Latency)
BAD FOR BUSINESS
8. THE CYBER GRAND CHALLENGE
MOTIVATION
(diagram: High Bandwidth · Medium Bandwidth · High Latency)
NOT ECONOMICAL
9. THE CYBER GRAND CHALLENGE
MOTIVATION
(diagram: High Bandwidth · High Bandwidth · Low Latency)
STILL MANY YEARS AWAY
10. THE CYBER GRAND CHALLENGE
MOTIVATION
(diagram: High Bandwidth · High Bandwidth · Low Latency)
12. THE CYBER GRAND CHALLENGE
THE BEGINNINGS
▸ Defense Advanced Research Projects Agency (DARPA)
▸ Helped create Internet, GPS, stealth, onion routing, etc.
▸ Famously hosts “challenges”
(images: CMU Sandstorm · Google Self-Driving Car)
13. THE CYBER GRAND CHALLENGE
THE BEGINNINGS
▸ Idea: host one of these challenges for computer security
▸ Model after Capture the Flag competition
▸ Offer USD$750,000 to each of the top 7 qualifying teams
▸ Offer USD$2,000,000 to the top team in the final contest
▸ Cyber Grand Challenge
14. THE CYBER GRAND CHALLENGE
ABOUT ME
▸ Playing in CTFs for about 7 years with PPP
▸ Top-ranked CTF team in the world (usually!)
▸ Graduated from CMU
▸ Winner/finalist in several DARPA Challenges
▸ Researcher at ForAllSecure
▸ Based on a decade of research in automated security
Perfect fit!
15. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ Attack & Defense CTF Format:
(diagram: referee and players)
16. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ Attack & Defense CTF Format:
17. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ Attack & Defense CTF Format:
(diagram: a successful exploit scores +10 points for red, -10 points for green)
18. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ Attack & Defense CTF Format:
(diagram: referee polls each team: “Is your service running?”)
19. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ Attack & Defense CTF Format:
20. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ Attack & Defense CTF Format:
(diagram: referee polls “Is your service running?”; a downed service scores -20 points)
21. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ Scores based on:
▸ Security:
▸ Lose points if anyone exploits your system
▸ Offense:
▸ Gain points if you exploit other teams
▸ Availability:
▸ Lose points if performance decreased
▸ Lose points if functionality is decreased
22. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ To make scoring fairer and more secure, DARPA runs everything
▸ Give DARPA patched binaries to run
▸ Give DARPA a program that sends exploits
24. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ Availability scoring very strict!
▸ Lose points for any functionality failures
▸ Measured based on pollers written alongside programs
▸ Pollers hidden, but traffic is shown
▸ No specification given!
▸ Lose points if time or memory usage increase by 5%
(example: 15% overhead ⇒ 70% availability score)
25. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ Create a new executable format, modeled after ELF
▸ Seven system calls:
▸ exit
▸ transmit
▸ receive
▸ fdwait
▸ allocate
▸ deallocate
▸ random
▸ Makes running foreign code safer
▸ Makes automated analysis easier vs Linux/Windows/OSX
26. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ No file system or fork or exec… what is an exploit?
▸ Type 1:
▸ Control of register and instruction pointer
▸ Type 2:
▸ Leak of sensitive data
Image credit: Codenomicon/Synopsys
27. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ With 7 system calls, still plenty of complexity!
▸ Programs can have:
▸ Several binaries communicating
▸ User-space threads
▸ Non-determinism, nonces, checksums
▸ C or C++ code compiled to x86 assembly
28. THE CYBER GRAND CHALLENGE
ABOUT CGC
▸ Easily able to recreate several famous bugs:
▸ Heartbleed
▸ LNK exploit (Stuxnet infection vector)
▸ Crackaddr bug
▸ SQL Slammer
▸ All inside the simpler CGC framework
29. THE CYBER GRAND CHALLENGE
MAYHEM
▸ System created by ForAllSecure
▸ (spoiler alert: winning system!)
▸ Overall architecture roughly similar to other teams
▸ Several independent components with different tasks
▸ Use centralized database for sharing information
▸ Let’s dive in!
30. THE CYBER GRAND CHALLENGE
MAYHEM: ROBUSTNESS
▸ Must be 100% automated!
▸ Keeping a system running is hard!
▸ Running untrusted code is hard!
▸ Writing software to analyze unknown programs is hard!
▸ Distributed systems are hard!
▸ Now do all of this in an adversarial setting!
31. THE CYBER GRAND CHALLENGE
MAYHEM: ROBUSTNESS
▸ Step 1: Sandbox everything
▸ Run tools with seccomp
▸ … with resource limits
▸ … inside of VMs
▸ Step 2: Redundancy wherever possible
▸ Graceful failover of at least any single node
▸ Step 3: Assume things will fail, and recover when they do
▸ Liveness checks for all components
▸ Make sure components are easy to restart
32. THE CYBER GRAND CHALLENGE
MAYHEM: BUG FINDING
▸ Symbolic execution
▸ State of the art technique in academia
▸ Represent program as symbolic expressions
▸ Use SAT/SMT solvers to generate inputs
33.
int main(int argc, char** argv) {
  if (argc < 3)
    return -1;
  int a = atoi(argv[1]);
  int b = atoi(argv[2]);
  if (a == 42) {
    if (b == 31337) {
      puts("You made it!");
    }
    return 1;
  }
  return 0;
}
TREAT SYMBOLICALLY!
(diagram: symbolic execution tree; branches LEN(ARGV) >= 3, 42 == ATOI(ARGV[1]), 31337 == ATOI(ARGV[2]); outcomes RETURN -1, RETURN 0, RETURN 1, PRINT)
34. THE CYBER GRAND CHALLENGE
MAYHEM: BUG FINDING
▸ Fuzzing
▸ Standard technique in industry
▸ Send concrete values to a program and execute it
▸ Can also “instrument” the program
▸ Reveal internal state about what it did with inputs
▸ Possible using a variety of techniques…
35.
A = 239075, B = 75291
if (a == 42) {
  if (b == 31337) {
    puts("You made it!");
  }
  return 1;
}
return 0;
(diagram: branch hit rates; 42 == A: 0.00000002%, 31337 == B: 0.00000002%, RETURN 0: 99.99999998%)
(generated inputs: A = 239075, B = 31337; A = 42, B = 9852756; A = 42, B = 24752; A = 42, B = 31337)
36. THE CYBER GRAND CHALLENGE
MAYHEM: BUG FINDING
▸ Instrumentation for fuzzing:
▸ Efficient ways to instrument are still an area of research
▸ QEMU, PIN, DynInst, DynamoRIO, etc.
▸ All incur a 2-10x slow-down!
▸ Making a fuzzer too “smart” only makes it slow!
37. THE CYBER GRAND CHALLENGE
MAYHEM: BUG FINDING
Fuzzing:
▸ very easy to set up and run
▸ 10-10k executions/second
▸ doesn’t “know” anything
▸ light on resources
▸ explores shallow paths fast
▸ weakness: stalling
Symbolic Execution:
▸ usually very complex
▸ 1-∞ seconds per “execution”
▸ “understands” the program
▸ very resource intensive
▸ explores deep paths slowly
▸ weakness: path explosion
38. THE CYBER GRAND CHALLENGE
MAYHEM: BUG FINDING
▸ Fuzzing and Symbolic execution
▸ Combine the best of both worlds
▸ Fuzz a program to uncover shallow paths
▸ Use symbolic execution to find deep, complex paths
▸ To work together, simply share inputs
▸ Keep track of which inputs did something “interesting”
▸ Share the interesting inputs
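The sharing criterion above can be sketched in C. This is a hedged illustration in the style of AFL-like coverage bitmaps, not Mayhem's actual code; `MAP_SIZE` and the map layout are assumptions:

```c
#include <stdint.h>

#define MAP_SIZE 4096  /* assumed size of the shared coverage map */

/* Sketch of the "interesting input" test run before sharing an input
 * between the fuzzer and the symbolic executor: an input is shared
 * if its run set any coverage bit the corpus has not seen before. */
int is_interesting(uint8_t seen[MAP_SIZE], const uint8_t hit[MAP_SIZE]) {
    int new_bits = 0;
    for (int i = 0; i < MAP_SIZE; i++) {
        if (hit[i] & ~seen[i])
            new_bits = 1;      /* this input reached a new edge */
        seen[i] |= hit[i];     /* merge into the global map */
    }
    return new_bits;
}
```

The first input to reach an edge gets queued for the other engine; replaying the same input reports nothing new and is discarded.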
39. THE CYBER GRAND CHALLENGE
MAYHEM: BUG EXPLOITING
▸ After you have a bug, you need to turn it into an exploit!
▸ Again, use symbolic execution:
▸ In addition to the path constraints, ask the solver for EIP set to a chosen value
▸ Can even use SMT solver in exploit script!
▸ A hybrid approach here works as well:
▸ Use fuzzing of crashes to get deeper crashes
▸ Use symbolic execution to generate exploits
▸ Many of our exploits are actually pretty complex
41. THE CYBER GRAND CHALLENGE
MAYHEM: BUG EXPLOITING
▸ Given the ability to write a single null byte, generate an exploit
▸ Basic exploit chain:
▸ Overwrite lowest byte of saved EBP with 0
▸ Stack frame above uses EBP-relative address for buffer
▸ Future writes to that buffer will now clobber memory
▸ Send a command to do another write, overflow buffer
▸ Overwrite EIP, and win!
▸ Sound difficult?
▸ Mayhem found and exploited this bug in 105 minutes
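The first step of that chain can be illustrated with arithmetic alone. A hypothetical sketch with illustrative values (not the actual challenge binary): clearing only the low byte of a saved EBP slides the caller's frame, so later EBP-relative writes land in attacker-influenced memory.

```c
#include <stdint.h>

/* The single null-byte primitive: overwrite the lowest byte of the
 * 32-bit saved EBP with 0x00. The restored frame pointer moves down
 * by up to 255 bytes, so the caller's EBP-relative buffer addresses
 * now point into memory the attacker may already control. */
uint32_t clobber_low_byte(uint32_t saved_ebp) {
    return saved_ebp & 0xffffff00u;  /* the one-byte NUL write */
}
```

For example, a saved EBP of 0xffffd0c8 becomes 0xffffd000: every EBP-relative local in the caller's frame shifts by 200 bytes, setting up the later buffer write that finally overwrites EIP.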
42. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ First: how can you modify a compiled program?
▸ “Hot patch”
▸ Static binary rewriting
▸ Recompilation
43. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ Hot patching
▸ Find or create more space in the program
▸ Redirect basic blocks to new location
47. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ Hot patching
▸ Find or create more space in the program
▸ Redirect basic blocks to new location
▸ Benefits:
▸ Easy to write, unlikely to break program
▸ Disadvantages:
▸ Poor speed and memory performance
48. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ Static binary rewriting
▸ “Just change the code in the binary”
▸ …then go back and fix everything you broke
52. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ Static binary rewriting
▸ “Just change the code in the binary”
▸ …then go back and fix everything you broke
▸ Benefits:
▸ Excellent memory performance, good speed
▸ Disadvantages:
▸ Very easy to break functionality of the program
53. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ Recompilation
▸ Create high level representation of program
▸ Apply high level patches, and then compile
55. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ Recompilation
▸ Create high level representation of program
▸ Apply high level patches, and then compile
▸ Benefits:
▸ Performance can rival that of compiled code
▸ Easy to do high level transformations and patches
▸ Disadvantages:
▸ Requires near-perfect decompiler
▸ Needs optimizations or performance will decrease
56. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ Helpful trick: use patching to add instrumentation for fuzzing!
▸ Use patching process to insert very performant code
▸ Add code to log information about code paths
▸ Use this to replace compile-time instrumentation!
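A sketch of the kind of lightweight logging stub a patcher could inject at each basic block. This follows the AFL-style edge-hashing idea and assumes block IDs are assigned during rewriting; it is not Mayhem's actual instrumentation:

```c
#include <stdint.h>

#define MAP_SIZE 4096          /* assumed size of the coverage map */

static uint8_t coverage[MAP_SIZE];
static uint32_t prev_block;

/* Injected at each basic block: hash the (previous, current) block
 * pair into a bitmap so the fuzzer can see which edges an input hit. */
void log_block(uint32_t block_id) {
    coverage[(block_id ^ prev_block) % MAP_SIZE]++;
    prev_block = block_id >> 1;  /* shift so edge A->B differs from B->A */
}
```

Because the stub is a few instructions with no external calls, it avoids the 2-10x slowdown of QEMU/PIN-style dynamic instrumentation.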
57. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ Once a program can be modified:
▸ Where to patch?
▸ What kinds of protections to add?
58. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ Given a binary, where should patches be applied?
▸ Every memory access?
▸ Way too slow…
▸ Every memory access that can’t be proven safe?
▸ Better, but still very slow…
▸ Memory accesses that look scary?
▸ Moderate performance, but not as safe
▸ Difficult to decide where to protect
59. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ Given a binary, where should patches be applied?
▸ Wherever bugs are found!
▸ Excellent choice… if you can find all the bugs
▸ Minimal overhead, protects whatever you find
▸ Whatever can affect control flow
▸ Protect jumps, returns, calls, and so on
▸ If all are protected, no control flow hijacks possible
▸ Still moderate overhead
60. THE CYBER GRAND CHALLENGE
MAYHEM: TRIAGE
▸ If we want to patch bugs… what does that mean?
▸ Given a program and a crashing input, what is “responsible”?
▸ Access violation: the registers used in the access
▸ Execution violation: backtrack to control flow instruction
▸ Add in checks that verify instruction won’t crash
▸ These do not address the root cause…
61. THE CYBER GRAND CHALLENGE
MAYHEM: TRIAGE
▸ To attempt to find the root cause, use “normal” inputs
▸ Run the program, inspect program state near the crash
▸ Use this to infer invariants violated by crashes
▸ To patch, simply add in explicit checks for invariants!
▸ This is not perfect, and is still complex, but it’s a start…
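As a toy example of such a patch (hypothetical names and bound, not from an actual CGC challenge): suppose normal runs always satisfy `idx < 16` at some store, while crashing inputs violate it. The patcher inserts the inferred invariant as an explicit check:

```c
#include <stddef.h>

/* Patched version of a store whose crashes violated the inferred
 * invariant "idx < 16". The check does not fix the (unknown) root
 * cause; it just makes the violation fail safely instead of
 * corrupting memory. */
int patched_store(char table[16], size_t idx, char value) {
    if (idx >= 16)   /* inserted invariant check */
        return -1;   /* refuse the write */
    table[idx] = value;
    return 0;
}
```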
62. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ What if we don’t find 100% of the bugs?
▸ We can also apply generic patches
▸ Standard protections normally done by compilers
▸ Stack canaries
▸ Data Execution Prevention
▸ Control Flow Integrity
▸ Out of bounds read/write protection
▸ Even for compilers, these protections are neither trivial nor free
▸ Unlike compilers, we don’t have source code!
(diagram: stack layout: RETURN ADDRESS / SAVED EBP / LOCAL BUFFER / LOCAL ARGS; with canary: RETURN ADDRESS / CANARY / SAVED EBP / LOCAL BUFFER / LOCAL ARGS)
(diagram, control flow integrity: code fragment “… if (a == 5) foo() return 0; }” with the question “WHAT CAN POSSIBLY COME NEXT?”)
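The canary idea can be sketched as a simulation in C (illustrative layout and values, not the injected machine code): a secret sits between the locals and the saved EBP/return address, and a linear overflow must corrupt it before it can reach EIP.

```c
#include <stdint.h>
#include <string.h>

/* Simulated 32-bit frame, low addresses first, as in the diagram. */
struct frame {
    char buffer[16];
    uint32_t canary;      /* inserted by the patcher */
    uint32_t saved_ebp;
    uint32_t return_addr;
};

/* Check injected before every function exit: 0 means smashed stack. */
int canary_intact(const struct frame *f, uint32_t secret) {
    return f->canary == secret;
}

/* Simulate an unchecked write of n bytes starting at the buffer
 * (the buffer is the first member, so the write starts at the frame). */
void overflow(struct frame *f, size_t n) {
    memset(f, 'A', n);
}
```

An 8-byte write leaves the canary intact; a 20-byte write trashes it, so the check fails before the corrupted return address is ever used. In CGC, the secret could be seeded from the `random` system call.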
63. THE CYBER GRAND CHALLENGE
MAYHEM: PATCHING
▸ Use static analysis to recover program information
▸ Take shortcuts!
▸ Perfect CFI requires a lot of program information
▸ “Pretty good” CFI is still very powerful!
▸ Get free registers, etc. as a compiler would have
▸ Generate efficient, lightweight patches
▸ Run thousands of benchmarks to ensure performance!
64. THE CYBER GRAND CHALLENGE
MAYHEM: PERFORMANCE CHECKING
▸ Performance is only measurable when you know the inputs
▸ Run valid inputs, measure speed/memory usage
▸ Make sure they output the same as the unpatched version
▸ This is great for offline testing, but how about in a CTF?
▸ Don’t know “junk” traffic from “real” traffic
▸ Don’t know “proper” outputs from program
▸ Use inputs generated by bug finding components
▸ Things that don’t crash were “interesting”!
▸ Interesting means: uses program in non-trivial way
65. THE CYBER GRAND CHALLENGE
MAYHEM: DECISION MAKING
▸ How can an automated system make informed choices?
▸ Given several patches, which one is best?
▸ Is the system being attacked?
▸ Which exploits should be used?
▸ These questions are very important for cyber security!
66. THE CYBER GRAND CHALLENGE
MAYHEM: DECISION MAKING
▸ Several ways to approach this problem:
▸ Fixed strategy
▸ Easy to code, won’t surprise you
▸ Collection of strategies
▸ Easy to code, less naive, still no surprises
▸ Fully dynamic/learned strategy
▸ Difficult to code, lots of surprises!
67. THE CYBER GRAND CHALLENGE
MAYHEM: DECISION MAKING
▸ Surprise isn’t always bad!
▸ Pre-written strategies can only do what you’ve planned for!
▸ What if:
▸ Teams are stronger/weaker than expected
▸ Other teams find bugs which you do not
▸ Challenges have several unexpected bugs
▸ Things break in new, spectacular ways (because they will)
68. THE CYBER GRAND CHALLENGE
MAYHEM: DECISION MAKING
▸ One of many approaches: Bayesian estimation
▸ How likely is it we find a crash through fuzzing?
▸ How likely is it a random crash is exploitable?
▸ How many crashes/exploits have we found?
▸ How correlated are our exploits with other teams?
▸ How likely is it team X patches randomly?
▸ How correlated are team X’s patches and exploits?
▸ Combine: How likely is it we are being attacked?
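The final combination step is one application of Bayes' rule. A toy sketch with made-up numbers (the real system's priors and likelihoods are not public):

```c
/* Posterior P(attacked | crash) from a prior P(attacked) and the
 * likelihood of observing a crash in each world. All inputs are
 * probabilities in [0, 1]; any concrete numbers are illustrative. */
double posterior_attacked(double prior,
                          double p_crash_if_attacked,
                          double p_crash_if_not)
{
    double evidence = p_crash_if_attacked * prior
                    + p_crash_if_not * (1.0 - prior);
    return p_crash_if_attacked * prior / evidence;  /* Bayes' rule */
}
```

With a 10% prior, a 90% chance of crashing while under attack and 5% otherwise, a single observed crash raises the estimate to about 67%; each further observation updates it again.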
69. THE CYBER GRAND CHALLENGE
MAYHEM: DECISION MAKING
▸ Very difficult to tell which decisions are “correct”!
▸ But as a whole, decisions are very intelligent
▸ This is not just applicable for games!
▸ Is this traffic malicious or benign?
▸ Should servers be temporarily shut off?
▸ What are the costs/benefits of patching?
▸ Imagine deciding all of this in real time!
70. THE CYBER GRAND CHALLENGE
MAYHEM: TYING IT TOGETHER…
1. Receive compiled program
2. Generate patched version, and instrumented version
3. Use symbolic execution and fuzzing to uncover new inputs
4. Use non-crashing inputs to test patches
5. Use crashing inputs to generate exploits
6. Win
71. THE CYBER GRAND CHALLENGE
THE FUTURE
▸ What is the future for this technology?
▸ Testing:
▸ Test software security during development cycle
▸ Test software security during deployment
▸ This future is very close to reality!
72. THE CYBER GRAND CHALLENGE
THE FUTURE
▸ Triage + offense research:
▸ Lots of fuzzers produce crashes
▸ Most of these crashes are not security issues
▸ Very time consuming to sort through them!
▸ If an automated system can produce an exploit ⇒ confirmed security issue
73. THE CYBER GRAND CHALLENGE
THE FUTURE
▸ Response:
▸ Detecting 0-day exploits remains incredibly difficult
▸ Humans take very long time to analyze/detect exploits
▸ Automatically detect exploits
▸ Automatically triage exploits to help human response
▸ Automatically use exploits to create patches
74. THE CYBER GRAND CHALLENGE
THE FUTURE
▸ Protection:
▸ Automatic software patching
▸ Harden/patch no-longer-supported software
▸ Harden/patch software by lazy vendors
▸ Automatic testing for patches
75. THE CYBER GRAND CHALLENGE
THE FUTURE
▸ Where are we today?
(images: CMU Sandstorm · Google Self-Driving Car · Mercedes concept car)
76. THE CYBER GRAND CHALLENGE
THE FUTURE
▸ What still needs to be done:
▸ Porting to modern systems (partially done)
▸ Scaling to large scale programs
▸ More advanced techniques for exploitation
77. THE CYBER GRAND CHALLENGE
THE FUTURE
Image credit: Orion Pictures
78. THE CYBER GRAND CHALLENGE
THE FUTURE
Image credit: Japan Daily Press (robotic band Z-Machines with human band Amoyamo)