Solving a CTF Challenge with S2E

2017-08-08

Introduction

Symbolic execution tools such as Angr and Manticore have become increasingly popular for analyzing binaries in Capture the Flag (CTF) challenges.

In this blog post I will show that we can do the same with S2E, using it to solve a reverse engineering challenge from the 2016 Google CTF. This post walks through the process of writing an S2E plugin “from first principles” to solve this challenge.

For comparison, solutions for the challenge using Angr and Manticore are also available. If you want to cheat and skip straight to the end, the final plugin code is available here.

Getting started

As usual, I use s2e-env to organize my S2E environment. The challenge binary is a 64-bit ELF, so I will use S2E’s Debian x86_64 image. For those playing along at home, the instructions on how to use s2e-env can be found here.

Initial analysis

The challenge binary is available here. As other writeups have discussed, an initial static analysis reveals that the binary (a 64-bit ELF executable) accepts a single command line argument (the product activation code). If this argument is not provided the program will print a usage message and exit.

Disassembly view

If the product activation code is provided, the argument is copied into the global buffer dest. The strncpy operation (at address 0x4005B8) copies 67 characters, so we can assume that the product activation code is 67 characters long. We are also told that the product activation code starts with “CTF{”.

Further analysis reveals that the function at 0x400850 is called when an incorrect product activation code is given. The correct product activation call will result in the function at 0x400830 being called and the message “Thank you — product activated!” printed. Therefore, when exploring the program we want to focus on states that will lead to 0x400830 and avoid those that lead to 0x400850. With this information, we can develop an S2E plugin to solve the challenge.

Creating the S2E project

We can use s2e-env’s new_project command to create a new analysis project for the challenge. We can also use our knowledge about the product activation code to automatically add the command line argument to the bootstrap script. Create the project using the following command:

s2e new_project --image debian-8.7.1-x86_64 --sym-args="1" unbreakable \
  `python -c "print('CTF{%s' % ('x' * (67 - 4)))"`

The Python code will generate a 67 character string starting with “CTF{”. --sym-args="1" makes the first argument (i.e. the product activation code) symbolic. Note that the actual contents of the string are not important.

The astute observer might also notice that the printed instructions contain a message regarding S2E’s FunctionModels plugin. I will discuss how we can use this plugin later.

Writing the S2E plugin

We will develop a custom plugin to solve the challenge. Note that this is probably the more complex and time-consuming approach — you could also use S2E’s Lua annotations to avoid having to write C++ code. However, I think that developing a plugin from scratch is a better way to understand how S2E works.

Let’s start with the standard S2E plugin template. Create libs2eplugins/src/Plugins/GoogleCTFUnbreakable.h:

#ifndef S2E_PLUGINS_GOOGLE_CTF_UNBREAKABLE_H
#define S2E_PLUGINS_GOOGLE_CTF_UNBREAKABLE_H

#include <s2e/CorePlugin.h>
#include <s2e/Plugin.h>
#include <s2e/S2E.h>

namespace s2e {
namespace plugins {

class GoogleCTFUnbreakable : public Plugin {
  // Declares an S2E plugin
  S2E_PLUGIN

public:
  // Our constructor doesn't need to do anything
  GoogleCTFUnbreakable(S2E *s2e) : Plugin(s2e) { }

  // This will be called by S2E when registering and configuring the different plugins
  void initialize();

private:
  // We will add some more methods later
};

} // namespace plugins
} // namespace s2e

#endif

And in libs2eplugins/src/Plugins/GoogleCTFUnbreakable.cpp:

#include <s2e/S2E.h>

#include "GoogleCTFUnbreakable.h"

namespace s2e {
namespace plugins {

S2E_DEFINE_PLUGIN(
  GoogleCTFUnbreakable, // Plugin class
  "Solve the Google CTF unbreakable product activation code", // Plugin description
  "", // Unused
); // Plugin dependencies (currently there are none)

void GoogleCTFUnbreakable::initialize() {
}

} // namespace plugins
} // namespace s2e

This is a perfectly valid S2E plugin, it just doesn’t do anything useful. We need to tell the plugin what events we are interested in and how to react to them during runtime. CorePlugin.h gives an idea of what events are available. Events can also be generated by other plugins. For example, the OSMonitor plugin generates events for process creation, module loading/unloading, etc.

So what events are we interested in? From our initial analysis we know the function addresses that indicate success or failure. We can therefore use the onTranslateInstructionStart event to notify us when the code at these addresses is translated by QEMU. Declare an event handler in GoogleCTFUnbreakable.h:

private:
  // The method signature corresponds to the onTranslateInstructionStart signal template in CorePlugin.h
  void onTranslateInstruction(ExecutionSignal *signal,
    S2EExecutionState *state, TranslationBlock *tb, uint64_t pc);

We also need to register our interest in this event, which we do in the plugin’s initialize method.

void GoogleCTFUnbreakable::initialize() {
  s2e()->getCorePlugin()->onTranslateInstructionStart.connect(
    sigc::mem_fun(*this, &GoogleCTFUnbreakable::onTranslateInstruction));
}

This event occurs when code is translated by QEMU. We need to register another event listener to notify our plugin when the code is actually executed. We do this using the ExecutionSignal that is generated by the onTranslateInstructionStart event. In GoogleCTFUnbreakable.cpp (don’t forget to add the method declarations to GoogleCTFUnbreakable.h):

// We found these addresses during our initial analysis in IDA Pro.
// Note that we assume non-PIE addresses
static const uint64_t SUCCESS_ADDRESS = 0x400724;
static const uint64_t FAILURE_ADDRESS = 0x400850;

void GoogleCTFUnbreakable::onTranslateInstruction(ExecutionSignal *signal,
    S2EExecutionState *state, TranslationBlock *tb, uint64_t pc) {
  if (pc == SUCCESS_ADDRESS) {
    // Register a handler for when the "success" code is executed
    signal->connect(sigc::mem_fun(*this, &GoogleCTFUnbreakable::onSuccess));
  } else if (pc == FAILURE_ADDRESS) {
    // Register a handler for when the "failure" code is executed
    signal->connect(sigc::mem_fun(*this, &GoogleCTFUnbreakable::onFailure));
  }
}

void GoogleCTFUnbreakable::onSuccess(S2EExecutionState *state, uint64_t pc) {
  // We will return to this later
}

void GoogleCTFUnbreakable::onFailure(S2EExecutionState *state, uint64_t pc) {
  // There is no reason to continue execution any further. So kill the state
  s2e()->getExecutor()->terminateStateEarly(*state, "Invalid path");
}

Once execution reaches either the success or failure code, there is no reason to continue. We therefore kill the state to avoid wasting resources.

Note that we use absolute addresses for our success and failure code. For position independent code (PIE), you will need to register for module load events, record the load address of the module you are interested in (in this case unbreakable) and calculate the success and failure addresses via offsets at run time.

Unfortunately, there is a problem with our plugin. Remember that S2E performs full-system emulation, meaning other processes may be executing at the same time that we are analyzing unbreakable. Virtual addressing also means that there may be other processes that have code at SUCCESS_ADDRESS and FAILURE_ADDRESS. If this code is executed, our plugin’s onSuccess and onFailure methods will execute, potentially interfering with our analysis. We will avoid this problem with the ProcessExecutionDetector plugin.

The ProcessExecutionDetector tracks the execution of processes in the system. It is configurable so that only processes of interest are tracked. To add this plugin as a dependency, in GoogleCTFUnbreakable.cpp:

#include <s2e/Plugins/OSMonitors/Support/ProcessExecutionDetector.h>

S2E_DEFINE_PLUGIN(
  GoogleCTFUnbreakable,
  "Solve the Google CTF unbreakable product activation code",
  "",
  "ProcessExecutionDetector"
); // Plugin dependency

void GoogleCTFUnbreakable::initialize() {
  m_procDetector = s2e()->getPlugin<ProcessExecutionDetector>();

  // ...
}

And add the ProcessExecutionDetector as a private member of the GoogleCTFUnbreakable class:

// In GoogleCTFUnbreakable.h
class GoogleCTFUnbreakable : public Plugin {
// ...

private:
  ProcessExecutionDetector *m_procDetector;

// ...
};

Now we can filter out all other processes except the unbreakable process. The following code should be added to the beginning of the onTranslateInstruction event handler:

void GoogleCTFUnbreakable::onTranslateInstruction(ExecutionSignal *signal,
    S2EExecutionState *state, TranslationBlock *tb, uint64_t pc) {
  // The processes to track are declared in the S2E LUA configuration file
  if (!m_procDetector->isTracked(state)) {
    return;
  }

  // ...
}

This takes care of our success and failure paths. What about the rest of our initial analysis? Remember that we were told that the activation code begins with “CTF{”. How can we encode this knowledge in our plugin?

Because we made the product activation string symbolic, we can use the onSymbolicVariableCreation event to wait for the product activation code to become symbolic, and then constrain this variable with our existing knowledge. In GoogleCTFUnbreakable.cpp:

#include <klee/util/ExprTemplatees.h>

// Register our onSymbolicVariableCreation event handler
void GoogleCTFUnbreakable::initialize() {
  s2e()->getCorePlugin()->onSymbolicVariableCreation.connect(
    sigc::mem_fun(*this, &GoogleCTFUnbreakable::onSymbolicVariableCreation));

  // ...
}

void GoogleCTFUnbreakable::onSymbolicVariableCreation(S2EExecutionState *state,
    const std::string &name, const std::vector<klee::ref<klee::Expr>> &expr,
    const klee::MemoryObject *mo, const klee::Array *array) {
  // This check is not strictly required, because we only have one symbolic
  // variable in the analysis.
  //
  // The first program argument made symbolic with the S2E_SYM_ARGS
  // environment variable will have the name "arg1".
  if (name != "arg1") {
    return;
  }

  // We know that the product activation key starts with "CTF{". We encode
  // this information as KLEE constraints
  state->constraints.addConstraint(E_EQ(expr[0], E_CONST('C', klee::Expr::Int8)));
  state->constraints.addConstraint(E_EQ(expr[1], E_CONST('T', klee::Expr::Int8)));
  state->constraints.addConstraint(E_EQ(expr[2], E_CONST('F', klee::Expr::Int8)));
  state->constraints.addConstraint(E_EQ(expr[3], E_CONST('{', klee::Expr::Int8)));
}

Encoding this information in the form of additional constraints helps to speed up symbolic execution. Now the constraint solver will not waste time generating solutions which we know are not viable (e.g. activation codes beginning with “ABCD”, “1337”, etc.).

We could have also encoded additional constraints for our solution. For example, we can assume that none of the remaining characters is a NULL terminator.

for (unsigned i = 4; i < expr.size(); ++i) {
  state->constraints.addConstraint(E_NEQ(expr[i], E_CONST('\0', klee::Expr::Int8)));
}

An alternate set of constraints could be that the other remaining characters are printable ASCII characters.

for (unsigned i = 4; i < expr.size(); ++i) {
  state->constraints.addConstraint(E_GE(expr[i], E_CONST(' ', klee::Expr::Int8)));
  state->constraints.addConstraint(E_LE(expr[i], E_CONST('~', klee::Expr::Int8)));
}

In practice I found that these additional constraints had a significant impact on performance. I will discuss this later.

The final step is to complete the onSuccess method. This involves solving the constraints accumulated during symbolic execution and displaying the result.

#include <cctype>
#include <sstream>

void GoogleCTFUnbreakable::onSuccess(S2EExecutionState *state, uint64_t pc) {
  // `results` is a vector containing pairs of strings and a vector of bytes.
  // The string corresponds to the symbolic variable's name while the vector
  // of bytes is the actual solution
  std::vector<std::pair<std::string, std::vector<unsigned char>>> results;

  // Invoke the constraint solver
  if (!s2e()->getExecutor()->getSymbolicSolution(*state, results)) {
    getWarningsStream(state) << "Unable to generate a solution for the product activation code\n";
    exit(1);
  }

  // Since we only have a single symbolic variable, we will only have a single
  // result. We then iterate over the bytes in this result to print the solution
  std::stringstream ss;
  for (auto c : results[0].second) {
    if (!std::isprint(c)) {
      break;
    }
    ss << (char) c;
  }

  getInfoStream(state) << "Product activation code = " << ss.str() << "\n";

  // No need to continue running S2E — terminate
  exit(0);
}

With the plugin complete, we need to ensure that it is compiled with the other S2E plugins. To do this, add the following to libs2eplugins/src/CMakeLists.txt:

src/Plugins/GoogleCTFUnbreakable.cpp

S2E will need to be rebuilt to ensure that our plugin is compiled and available for the analysis. Finally, we need to rebuild S2E with our plugin.

s2e build --clean-target libs2e

Capturing the flag

With the plugin complete, we can use it to solve the challenge. First, the plugin must be enabled in our project’s S2E config file. By default, s2e new_project will generate a s2e-config.lua file with a standard set of plugins enabled and configured. However, most of them are not needed for this challenge. At a minimum the following plugins must be enabled:

  • GoogleCTFUnbreakable
  • BaseInstructions
  • HostFiles
  • LinuxMonitor
  • ProcessExecutionDetector

Finally, we can run S2E with the launch-s2e.sh script. After a short while S2E will terminate and the following message will be displayed:

Product activation code = CTF{0The1Quick2Brown3Fox4Jumped5Over6The7Lazy8Fox9}

Performance

Let’s return to the discussion on (1) the FunctionModels plugin and (2) the performance impact caused by enforcing additional constraints on the product activation code.

Using the FunctionModels plugin

The FunctionModels plugin attempts to reduce path explosion. s2e-env analyzes the binary when the new_project command is executed and determines that the binary imports the strncpy function, for which a model exists.

strncpy is typically implemented as a loop over the input string until either a NULL terminator is found or n characters have been copied. By analyzing the binary in a disassembler we can see that this input string is the product activation key, which we made symbolic. Looping over a symbolic string will result in a greater number of states forked, because any of the 63 characters following the “CTF{” prefix could contain a NULL terminator.

By enabling the FunctionModels plugin, the strncpy loop will be replaced by a single symbolic expression, reducing the total number of states forked. However, this comes with some performance trade-offs that I will discuss later.

To enable the FunctionModels plugin, add the following to s2e-config.lua:

add_plugin("FunctionModels")

Measuring performance

With the flag under our belt, we can now explore the various performance trade-offs that must be taken into account when we (1) apply various constraints to the product activation code and (2) use the FunctionModels plugin. Results are given in the tables below — one for execution time and one for the number of states forked.

For the “explore all states” results I removed the call to exit in the onSuccess method, even after capturing the flag (in practice this is unnecessary). For reference I also give the Angr results (as quoted from their source code) and Manticore results (from running it myself).

Execution times
Description Terminate on solution (secs) Explore all states (secs)
No additional constraints on the input string 12 240
No NULL terminator within the input string 12 12
All characters must be printable ASCII 3,180 3,180
Using FunctionModels 13 13
Angr 4.5 N/A
Manticore 60 N/A
No. of states forked
Description Terminate on solution (no. states) Explore all states (no. states)
No additional constraints on the input string 111 1720
No NULL terminator within the input string 49 49
All characters must be printable ASCII 44 44
Using FunctionModels 49 49
Angr N/A N/A
Manticore N/A N/A

The first set of results are our baseline. The only constraints that we impose are that the product activation code begins with “CTF{”; the remaining 63 characters are unconstrained. Of the 111 states forked before the solution is found, 62 of them occurred within strncpy. These 62 states correspond to the NULL terminator being located within one of the remaining 63 characters (e.g. “CTF{\0”, “CTF{x\0”, “CTF{xxxxxxxx\0”, etc.). However, the depth-first search (DFS) strategy means that none of these 62 states is scheduled for execution before the solution is found. The total number of states quickly explodes otherwise.

Now look at the performance after we constrain the remaining 63 characters of the product activation code to exclude the NULL terminator. The total number of states is reduced to 49. While this has negligible impact on execution time (when we terminate after the solution is found — again due to DFS), the total number of states no longer explodes.

Performance degrades significantly when we over-constrain the product activation code to only contain printable ASCII characters. While the total number of states is reduced, the bottleneck is simply transferred from state exploration to the constraint solver.

NOTE: To view the symbolic formulae during execution the --verbose-fork-info can be added to the kleeArgs table in s2e-config.lua.

Finally, we find that the FunctionModels plugin reduces the state space to the same levels as when we excluded NULL terminators from the product activation code. However, like when we over-constrain, the formula that the constraint solver must solve grows increasingly complex after the strncpy model is applied and the constraint solver once again becomes the bottleneck.

I personally find this trade-off between state space and constraint formulae complexity quite interesting. AFAIK there are no automatic ways to determine this trade-off; it can only be found through experimentation.

Conclusion

Compared to the Angr and Manticore solutions, solving this challenge with S2E may seem overly complex. There are a variety of reasons for this: S2E runs a full-system emulator, so the user must handle the full software stack (OS kernel, libraries, drivers, etc.); S2E is written in C++; and S2E is built on top of many different tools and frameworks (KLEE, QEMU, LLVM, etc.), each with their own APIs.

Ultimately it is a matter of selecting the correct tool for the job. For CTF-style challenges, where the program is largely self-contained and has very little interaction with the rest of the system (e.g. the OS, libraries such as libc, etc.), using full-system emulation is probably overkill (however, as I’ve hopefully demonstrated, it can be done!). However for more complex software (e.g. device drivers, software with more complex interaction with the system, etc.), S2E may be the more suitable choice.