Analysing "Trigger-based" Malware with S2E

2018-09-02

Introduction

This blog post is a quick brain-dump of the work that I was doing during my last month in the Dependable Systems Lab at EPFL. At the time I was working on malware analysis with S2E. While not anything earth-shatteringly novel, I’m hopeful that this post will help others who want to use symbolic execution/S2E to analyse malware behaviour.

What makes malware analysis different?

My previous blog posts have looked at solving a CTF challenge and analysing file parsers. These programs had two things in common:

They were Linux ELF executables; and
Program input was specified by the user — either via STDIN or from a file that was read from disk.

In contrast, most malware:

Targets Windows (while some reports suggest that Android malware is on the rise, Windows remains the primary target for malware authors); and
Does not have a well defined input source. Input could come from command-line arguments, but this is uncommon. Input is more likely to come from registry keys, network data, etc.

For these reasons, analysing malware in S2E is not as simple as making command-line arguments symbolic, or feeding the program a symbolic file. This blog post will walk through the S2E-based tools that we developed for malware analysis, followed by two “case studies”. As usual, if you wish to play along at home you can find all of the code on GitHub.

Analysing Windows software in S2E

Up until now, we have only analysed Linux programs. Fortunately, S2E also supports the analysis of Windows programs. So what’s the difference?

When building a Windows guest image, a Windows ISO must be provided to the image_build command. ISOs for all versions of Windows supported by S2E (listed here) can be downloaded from MSDN. It is also possible to add support for other versions if required. For this post we’ll use Windows 7 Professional 32-bit.
There is no equivalent to s2e.so on Windows. Therefore, we’ll need an alternative approach to inject symbolic data into our malware. We could write an S2E plugin to do this, but this is complex. Instead, we’ll use DLL injection in the guest to hook Windows API calls and inject symbolic data through these hooks.

Hooking the Windows API

There are many different techniques for hooking the Windows API. We’ll use an “off the shelf” solution rather than (re)inventing a new one. When I first started this work, I wanted to reuse Cuckoo Sandbox’s Monitor for API hooking (as it was designed for malware analysis). However, we decided to use EasyHook instead, primarily because it required less work to get started with.

Before we dive into some code, here’s an overview of what we are going to build:

malware-inject: A program that will start other programs (e.g. malware) and inject a DLL into the newly-started process’ address space; and
malware-hook: A DLL that is injected into a process’ address space via malware-inject. This DLL will hook key functions from the Windows API, providing us with a mechanism to inject symbolic data.

Now let’s dive into some code!

We’ll start by opening $S2EDIR/source/s2e/guest/windows/s2e.sln in Visual Studio and creating two new projects:

malware-inject: A Win32 console application; and
malware-hook: A Win32 DLL.

Both projects require the EasyHook native package, installable via NuGet. Note that in the GitHub repo the malware-hook project is split into GetLocalTime-hook and wannacry-hook projects (our two case studies).

`malware-inject`

malware-inject is based on EasyHook’s example injector application. However, instead of using RhInjectLibrary (which injects a DLL into an already-running process), we’ll use RhCreateAndInject. This function starts an application in a suspended state, injects a DLL and then resumes the suspended process. malware-inject will also wait for the injected process to complete before returning. This is useful because it prevents S2E killing states when the malware-inject process exits.

Create inject.c and add the following code to it:

#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#include <Shlwapi.h>
#include <Windows.h>

#include <easyhook.h>

// We must add this header file to support writing to S2E's logs. s2e.h resides
// in the libcommon project, so the libcommon project must be added as a
// dependency to the malware-inject project
#define USER_APP
#include <s2e/s2e.h>

#define S2E_MSG_LEN 512
#define MAX_PATH_LEN 256

static INT s2eVersion = 0;

static void Message(LPCSTR fmt, ...) {
  CHAR message[S2E_MSG_LEN];
  va_list args;

  va_start(args, fmt);
  vsnprintf(message, S2E_MSG_LEN, fmt, args);
  va_end(args);

  if (s2eVersion) {
    S2EMessageFmt("[malware-inject] %s", message);
  } else {
    printf("[malware-inject] %s", message);
  }
}

static void GetFullPath(LPCWSTR path, PWCHAR fullPath) {
  if (!path) {
    Message("Path has not been provided\n");
    exit(1);
  }

  if (!PathFileExistsW(path)) {
    Message("Invalid path %S has been provided\n", path);
    exit(1);
  }

  if (!GetFullPathNameW(path, MAX_PATH_LEN, fullPath, NULL)) {
    Message("Unable to get full path of %S\n", path);
    exit(1);
  }
}

int main() {
  INT argc;
  LPWSTR *argv = CommandLineToArgvW(GetCommandLineW(), &argc);

  if (argc < 5) {
    printf("Usage: %S [options..]\n"
           "   --dll <dll>       Path to DLL to inject into the application\n"
           "   --app <target>    Path to application to start\n"
           "   --timeout <time>  Timeout value in milliseconds "
           "(infinite if not provided)\n", argv[0]);
    exit(1);
  }

  // Used by the Message function to decide where to write output to
  s2eVersion = S2EGetVersion();

  LPWSTR dllPath = NULL;
  WCHAR fullDllPath[MAX_PATH_LEN];

  LPWSTR appPath = NULL;
  WCHAR fullAppPath[MAX_PATH_LEN];

  DWORD timeout = INFINITE;

  for (int i = 1; i < argc; ++i) {
    if (wcscmp(argv[i], L"--dll") == 0) {
      dllPath = argv[++i];
      continue;
    }

    if (wcscmp(argv[i], L"--app") == 0) {
      appPath = argv[++i];
      continue;
    }

    if (wcscmp(argv[i], L"--timeout") == 0) {
      timeout = wcstoul(argv[++i], NULL, 10);
      continue;
    }

    Message("Unsupported argument: %S\n", argv[i]);
    exit(1);
  }

  // Check that the given paths are valid
  GetFullPath(dllPath, fullDllPath);
  GetFullPath(appPath, fullAppPath);

  // Start the target application (in a suspended state) and inject the given DLL
  ULONG pid;
  NTSTATUS result = RhCreateAndInject(appPath, L"", CREATE_SUSPENDED,
    EASYHOOK_INJECT_DEFAULT,
#if defined(_M_IX86)
    dllPath, NULL,
#elif defined(_M_X64)
    NULL, dllPath,
#else
    #error "Platform not supported"
#endif
    NULL, 0, &pid);

  if (FAILED(result)) {
    Message("RhCreateAndInject failed: %S\n", RtlGetLastErrorString());
    exit(1);
  }

  Message("Successfully injected %S into %S (PID=0x%x)\n", fullDllPath,
    fullAppPath, pid);

  DWORD exitCode = 1;

  // Get a handle to the newly-created process and wait for it to terminate.
  // Once the process has terminated, get its return code and return that as
  // our return code
  HANDLE hProcess = OpenProcess(SYNCHRONIZE | PROCESS_QUERY_INFORMATION,
    FALSE, pid);
  if (hProcess) {
    WaitForSingleObject(hProcess, timeout);
    GetExitCodeProcess(hProcess, &exitCode);
    CloseHandle(hProcess);
  } else {
    Message("Unable to open process 0x%x: 0x%X\n", pid, GetLastError());
  }

  return exitCode;
}
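
One thing to watch in the argument loop above is that argv[++i] is read without checking that a value actually follows the flag. The following is a minimal, portable sketch of a stricter parser, not the code from the repo. InjectOptions, ParseInjectArgs and ExampleUsage are hypothetical names used only for illustration, and the sketch takes a std::vector of std::wstring instead of the output of CommandLineToArgvW so that it compiles without the Windows headers:

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical container for the three malware-inject options
struct InjectOptions {
  std::wstring dllPath;
  std::wstring appPath;
  unsigned long timeoutMs = 0xFFFFFFFFul; // Same value as Win32's INFINITE
};

// Parses "--dll <dll> --app <target> [--timeout <ms>]". args[0] is assumed to
// be the program name. Returns false if a flag is unknown, a flag has no value
// after it, or --dll/--app is missing
bool ParseInjectArgs(const std::vector<std::wstring> &args, InjectOptions &out) {
  InjectOptions opts;

  for (std::size_t i = 1; i < args.size(); ++i) {
    const std::wstring &flag = args[i];

    // Every supported flag takes exactly one value, so check that it exists
    if (i + 1 >= args.size()) {
      return false;
    }
    const std::wstring &value = args[++i];

    if (flag == L"--dll") {
      opts.dllPath = value;
    } else if (flag == L"--app") {
      opts.appPath = value;
    } else if (flag == L"--timeout") {
      opts.timeoutMs = std::wcstoul(value.c_str(), nullptr, 10);
    } else {
      return false;
    }
  }

  if (opts.dllPath.empty() || opts.appPath.empty()) {
    return false;
  }

  out = opts;
  return true;
}

// Example use: a complete command line parses, a dangling flag does not
void ExampleUsage() {
  InjectOptions opts;
  std::vector<std::wstring> good = {L"malware-inject.exe", L"--dll",
                                    L"malware-hook.dll", L"--app", L"target.exe"};
  std::vector<std::wstring> dangling = {L"malware-inject.exe", L"--dll"};

  assert(ParseInjectArgs(good, opts));
  assert(!ParseInjectArgs(dangling, opts));
}
```

Returning a bool and filling an output structure keeps the error handling in one place: the caller can print the usage message once instead of calling exit from inside the loop.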

Of course, it is entirely possible that the malware will be watching for API hooks (we are dealing with malicious software after all!). Whilst an important issue, we won’t deal with it in this post.

Now that we’ve written the tool to run our malware with an injected DLL, let’s turn our attention to what this DLL actually does.

`malware-hook`

Likewise, we’ll base malware-hook on EasyHook’s example BeepHook DLL. Here is the skeleton of our hook DLL, which we’ll put in malware-hook.cpp:

#include <stdarg.h>
#include <stdio.h>

#include <Windows.h>
#include <strsafe.h>

#include <easyhook.h>

#define USER_APP
extern "C" {
#include <s2e/s2e.h>
}

#define S2E_MSG_LEN 512

static INT s2eVersion = 0;

static void Message(LPCSTR fmt, ...) {
  CHAR message[S2E_MSG_LEN];
  va_list args;

  va_start(args, fmt);
  vsnprintf(message, S2E_MSG_LEN, fmt, args);
  va_end(args);

  if (s2eVersion) {
    S2EMessageFmt("[0x%x|malware-hook] %s", GetCurrentProcessId(), message);
  } else {
    printf("[0x%x|malware-hook] %s", GetCurrentProcessId(), message);
  }
}

// EasyHook will be looking for this export to support DLL injection. If not
// found then DLL injection will fail
extern "C" void __declspec(dllexport) __stdcall NativeInjectionEntryPoint(REMOTE_ENTRY_INFO *);

void __stdcall NativeInjectionEntryPoint(REMOTE_ENTRY_INFO *inRemoteInfo) {
  // Unused
  (void) inRemoteInfo;

  // Used by the Message function to decide where to write output to
  s2eVersion = S2EGetVersion();

  // TODO initialize hooks

  // The process was started in a suspended state. Wake it up...
  RhWakeUpProcess();
}

So, what are we going to hook? Let’s take some inspiration from two great papers on this topic: David Brumley’s “Automatically Identifying Trigger-based Behavior in Malware” and Andreas Moser’s “Exploring Multiple Execution Paths for Malware Analysis”. Both of these papers look at “trigger-based malware”, which is malware whose malicious actions only occur under specific circumstances; i.e. when certain trigger conditions are met. For example, malware may only launch its payload on a specific date (as the MyDoom worm did), or upon receiving specific data from a command & control server. In these two examples, the trigger sources are the current date/time and data read from a network. Other trigger sources include (as listed in Moser’s paper):

Internet connectivity;
Mutex objects;
Existence of files;
Existence of Registry entries; and
Data read from a file.

How can we analyse trigger-based malware? Brumley’s paper proposed Minesweeper, a tool designed to detect the existence of trigger-based behaviours and to find inputs that exercise these behaviours. As far as I can tell, Minesweeper was never publicly released. However, we can build a very similar system in S2E using our malware-hook DLL! So let’s go ahead and create hooks for some of the trigger sources discussed in these two papers.

Case study 1: `GetLocalTime-test`

The first trigger source that Brumley’s paper explores is GetLocalTime. GetLocalTime has the following prototype:

void WINAPI GetLocalTime(
  _Out_ LPSYSTEMTIME lpSystemTime
);

In Minesweeper, the user was required to specify where in memory the trigger inputs will be stored. This was so the symbolic execution engine could properly assign symbolic variables during execution. In the case of GetLocalTime, this would require specifying that GetLocalTime stores its result in a 16-byte structure pointed to by a stack value when GetLocalTime is called. Fortunately, we don’t have to worry about these low-level details. Instead, we can just call S2EMakeSymbolic on the variable we pass to GetLocalTime. Here’s how we do this in malware-hook:

// Function hooks

static void WINAPI GetLocalTimeHook(LPSYSTEMTIME lpSystemTime) {
  Message("Intercepted GetLocalTime\n");

  // Call the original GetLocalTime to get a concrete value
  GetLocalTime(lpSystemTime);

  // Make the value concolic
  S2EMakeSymbolic(lpSystemTime, sizeof(*lpSystemTime), "SystemTime");
}

// The names of the functions to hook (and the library they belong to)
static LPCSTR functionsToHook[][2] = {
  { "kernel32", "GetLocalTime" },
  { NULL, NULL },
};

// The function hooks that we will install
static PVOID hookFunctions[] = {
  GetLocalTimeHook,
};

// The actual hooks
static HOOK_TRACE_INFO hooks[] = {
  { NULL },
};

// This function was defined previously
void __stdcall NativeInjectionEntryPoint(REMOTE_ENTRY_INFO *inRemoteInfo) {
  // ...

  // Replace the previous TODO with the following code to install the GetLocalTime hook
  for (unsigned i = 0; functionsToHook[i][0] != NULL; ++i) {
    LPCSTR moduleName = functionsToHook[i][0];
    LPCSTR functionName = functionsToHook[i][1];

    // Install the hook
    NTSTATUS result = LhInstallHook(
      GetProcAddress(GetModuleHandleA(moduleName), functionName),
      hookFunctions[i],
      NULL,
      &hooks[i]);

    if (FAILED(result)) {
      Message("Failed to hook %s.%s: %S\n", moduleName, functionName,
              RtlGetLastErrorString());
    } else {
      Message("Successfully hooked %s.%s\n", moduleName, functionName);
    }

    // Ensure that all threads _except_ the injector thread will be hooked
    ULONG ACLEntries[1] = { 0 };
    LhSetExclusiveACL(ACLEntries, 1, &hooks[i]);
  }

  // ...
}
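
The three arrays above (functionsToHook, hookFunctions and hooks) are parallel, so they have to be kept in the same order by hand whenever a hook is added. One possible alternative is to keep the module name, function name and replacement function in a single record. The following is a portable sketch of that idea under a few assumptions: HookSpec, kHookSpecs, kNumHooks, FindHook and FakeGetLocalTimeHook are hypothetical names, and the stand-in hook function does not touch the Windows or EasyHook APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical record that ties a hook to the function it replaces
struct HookSpec {
  const char *moduleName;   // e.g. "kernel32"
  const char *functionName; // e.g. "GetLocalTime"
  void *hookFunction;       // Address of the replacement function
};

// Stand-in for GetLocalTimeHook so the sketch compiles without Windows headers
void FakeGetLocalTimeHook(void *) {}

// One row per hook: adding a hook means adding exactly one line here
const HookSpec kHookSpecs[] = {
  { "kernel32", "GetLocalTime", reinterpret_cast<void *>(&FakeGetLocalTimeHook) },
};

// Derived from the table, so no NULL terminator row is needed
const std::size_t kNumHooks = sizeof(kHookSpecs) / sizeof(kHookSpecs[0]);

// Returns the table entry for module.function, or nullptr if it is not hooked
const HookSpec *FindHook(const char *moduleName, const char *functionName) {
  for (std::size_t i = 0; i < kNumHooks; ++i) {
    if (std::strcmp(kHookSpecs[i].moduleName, moduleName) == 0 &&
        std::strcmp(kHookSpecs[i].functionName, functionName) == 0) {
      return &kHookSpecs[i];
    }
  }
  return nullptr;
}

// Example use: GetLocalTime is in the table, InternetOpenUrlA is not (yet)
void ExampleUsage() {
  assert(FindHook("kernel32", "GetLocalTime") != nullptr);
  assert(FindHook("wininet", "InternetOpenUrlA") == nullptr);
}
```

In the DLL itself, the install loop would walk kHookSpecs and pass each hookFunction to LhInstallHook, with a HOOK_TRACE_INFO array of kNumHooks entries kept alongside the table.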

Let’s implement the running example that Brumley et al. used in their paper (Fig. 1.1) to test that everything works as expected.

#include <Windows.h>

void ddos (LPCSTR target) {
  // DDOS code goes here :)
}

int main() {
  SYSTEMTIME systime;
  LPCSTR site = "www.usenix.org";

  GetLocalTime(&systime);

  if (9 == systime.wDay) {
    if (10 == systime.wHour) {
      if (11 == systime.wMonth) {
        if (6 == systime.wMinute) {
          ddos(site);
        }
      }
    }
  }

  return 0;
}

Ensure that you compile everything for the x86 platform (since we’ll be using a 32-bit Windows 7 VM). Once everything is built (including the VM!), we can create a new project:

s2e new_project -i windows-7sp1pro-i386 /path/to/malware-s2e/GetLocalTime-test/Debug/GetLocalTime-test.exe

Note that this will create a bootstrap.sh that executes GetLocalTime-test.exe directly. We must modify bootstrap.sh to have malware-inject.exe execute GetLocalTime-test.exe instead. To do this we’ll need access to our hooking tools from within the VM. We can do this by executing the following commands in our S2E environment to create the necessary symbolic links in our project directory:

cd $S2EDIR/projects/GetLocalTime-test
HOOK_FILES="EasyHook32.dll malware-hook.dll malware-inject.exe"
for FILE in $HOOK_FILES; do
  ln -s $S2EDIR/source/s2e/guest/windows/Debug/$FILE $FILE
done

And then edit bootstrap.sh as follows:

# ...

# The target does not get executed directly - we execute it via malware-inject
function execute_target {
  local TARGET
  TARGET="$1"

  ./malware-inject.exe --dll "./malware-hook.dll" --app "${TARGET}"
}

# ...

# We also need to download the files required for hooking

# Download the target file to analyze
${S2EGET} "GetLocalTime-test.exe"

${S2EGET} "EasyHook32.dll"
${S2EGET} "malware-hook.dll"
${S2EGET} "malware-inject.exe"

# ...

Finally, we can disable the following plugins in s2e-config.lua (they are not required):

WebServiceInterface
KeyValueStore
MultiSearcher
CUPASearcher
StaticFunctionModels

We are now ready to run our analysis!

Results

We should see S2E fork four times during our analysis. If we enable the --verbose-fork-info KLEE argument (in s2e-config.lua) we can see the constraints generated at each of these four fork points. The following image shows a disassembly with these points highlighted.

ReadLSB w16 X SystemTime can be understood as “read 16 bits (i.e. one WORD) at offset X in the symbolic SystemTime variable”. If we look up the SYSTEMTIME struct on MSDN we will see that the WORDs at these offsets (0x6, 0x8, 0x2, 0xA) correspond to the wDay, wHour, wMonth and wMinute fields respectively - just as expected. Finally, we should find a line in debug.txt containing the following test case (I’ve reformatted the line and added the field names to make it easier to read):

TestCaseGenerator:  v0_SystemTime_0 = {0x0, 0x0, /* wYear */
                                       0xb, 0x0, /* wMonth */
                                       0x0, 0x0, /* wDayOfWeek */
                                       0x9, 0x0, /* wDay */
                                       0xa, 0x0, /* wHour */
                                       0x6, 0x0, /* wMinute */
                                       0x0, 0x0, /* wSecond */
                                       0x0, 0x0} /* wMilliseconds */

If we cross-reference this against GetLocalTime-test/test.c we can see that this is the time to launch the DDOS. Success!
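
If you’d rather not check the offsets and the test case by eye, they can be verified with a few lines of portable C++. This is only a sketch: SystemTimeModel is a hypothetical mirror of the Win32 SYSTEMTIME layout (eight 16-bit fields), and ReadLSB16, kTestCase and TriggersDdos are names made up for this example. The bytes in kTestCase are copied from the TestCaseGenerator line above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable mirror of the Win32 SYSTEMTIME layout: eight 16-bit fields
struct SystemTimeModel {
  std::uint16_t wYear;
  std::uint16_t wMonth;
  std::uint16_t wDayOfWeek;
  std::uint16_t wDay;
  std::uint16_t wHour;
  std::uint16_t wMinute;
  std::uint16_t wSecond;
  std::uint16_t wMilliseconds;
};

// The offsets that appear in the fork constraints
static_assert(sizeof(SystemTimeModel) == 16, "SYSTEMTIME is 16 bytes");
static_assert(offsetof(SystemTimeModel, wMonth) == 0x2, "wMonth offset");
static_assert(offsetof(SystemTimeModel, wDay) == 0x6, "wDay offset");
static_assert(offsetof(SystemTimeModel, wHour) == 0x8, "wHour offset");
static_assert(offsetof(SystemTimeModel, wMinute) == 0xA, "wMinute offset");

// Equivalent of "ReadLSB w16 offset": read a little-endian 16-bit value
std::uint16_t ReadLSB16(const std::uint8_t *bytes, std::size_t offset) {
  return static_cast<std::uint16_t>(bytes[offset] | (bytes[offset + 1] << 8));
}

// The 16 bytes reported by TestCaseGenerator
const std::uint8_t kTestCase[16] = {
  0x0, 0x0, 0xb, 0x0, 0x0, 0x0, 0x9, 0x0,
  0xa, 0x0, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0,
};

// Returns true if the bytes satisfy the four nested checks in the test program
bool TriggersDdos(const std::uint8_t *bytes) {
  return ReadLSB16(bytes, offsetof(SystemTimeModel, wDay)) == 9 &&
         ReadLSB16(bytes, offsetof(SystemTimeModel, wHour)) == 10 &&
         ReadLSB16(bytes, offsetof(SystemTimeModel, wMonth)) == 11 &&
         ReadLSB16(bytes, offsetof(SystemTimeModel, wMinute)) == 6;
}

// Example use: the generated test case reaches ddos(), an all-zero time does not
void ExampleUsage() {
  const std::uint8_t zeros[16] = {0};

  assert(TriggersDdos(kTestCase));
  assert(!TriggersDdos(zeros));
}
```

The static_assert lines fail at compile time if the layout assumption is wrong, so the offsets are checked before anything runs.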

Case study 2: WannaCry

That was nice, but malware has moved on since the Minesweeper paper was written in 2007. Let’s look at something a bit more recent - the WannaCry ransomware. WannaCry famously contained a “killswitch” that stopped the ransomware from encrypting the target’s data. This killswitch was a check for whether a gibberish URL led to a live webpage. WannaCry would shut down if this URL could be reached (this check was probably done to fool dynamic analysis tools, which are typically configured to return valid, dummy responses to all network queries). With this in mind, let’s use S2E to explore WannaCry’s behaviour when this trigger condition is and isn’t satisfied. We’ll focus on the sample discussed in Amanda Rousseau’s excellent writeup (MD5 hash db349b97c37d22f5ea1d1841e3c89eb4).

Disassembly

Let’s take a quick look at the WannaCry killswitch in a disassembler.

We can see that the WinINet API is used to open a connection to the killswitch URL (hxxp://www[.]iuqerfsodp9ifjaposdfjhgosurijfaewrwergwea[.]com). The following functions are called to do this:

InternetOpenA: Initializes the WinINet system. Returns an HINTERNET handle on success, or NULL on failure.
InternetOpenUrlA: Using the handle returned by InternetOpenA, open a resource specified by the given URL. Returns an HINTERNET handle on success, or NULL on failure.
InternetCloseHandle: Close the handles opened by InternetOpenA and InternetOpenUrlA.

At a minimum we must hook InternetOpenUrlA and force a fork to explore both paths at 0x4081a5. What about InternetOpenA? We can see in the WannaCry code that the HINTERNET handle returned by InternetOpenA is never checked, so we don’t have to worry about this function. If the returned handle was (properly) checked, we may have needed to hook InternetOpenA and force it to return some dummy, non-NULL value. Similarly, if we were interested in the code executed when InternetOpenA fails, we could also force a fork on some symbolic value. However, for simplicity we’ll just focus on InternetOpenUrlA. Let’s write some more code!

WinINet hooks

First, replace the hooked functions in malware-hook.cpp with the following:

static LPCSTR functionsToHook[][2] = {
  { "wininet", "InternetOpenUrlA" },
  { "wininet", "InternetCloseHandle" },
  { NULL, NULL },
};

static PVOID hookFunctions[] = {
  InternetOpenUrlAHook,
  InternetCloseHandleHook,
};

static HOOK_TRACE_INFO hooks[] = {
  { NULL },
  { NULL },
};

Then write the actual hook functions:

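// These hooks need <WinInet.h> (for HINTERNET), <set> and <stdlib.h> (for
// malloc/free) to be included at the top of malware-hook.cpp. The hook
// functions must also be defined before the hookFunctions array that uses them

/// Keep track of dummy Internet handles that we've created
static std::set<HINTERNET> dummyHandles;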

static HINTERNET WINAPI InternetOpenUrlAHook(
  HINTERNET hInternet,
  LPCSTR lpszUrl,
  LPCSTR lpszHeaders,
  DWORD dwHeadersLength,
  DWORD dwFlags,
  DWORD_PTR dwContext
) {
  Message("Intercepted InternetOpenUrlA(%p, %s, %s, 0x%x, 0x%x, %p)\n",
    hInternet, lpszUrl, lpszHeaders, dwHeadersLength, dwFlags, dwContext);

  // Force a fork via a symbolic variable. Since both branches are feasible,
  // both paths are taken
  UINT8 returnResource = S2ESymbolicChar("hInternet", 1);
  if (returnResource) {
    // Explore the program when InternetOpenUrlA "succeeds" by returning a
    // dummy resource handle. Because we know that the resource handle is never
    // used, we don't have to do anything fancy to create it.
    // However, we will need to keep track of it so we can free it when the
    // handle is closed.
    HINTERNET resourceHandle = (HINTERNET) malloc(sizeof(HINTERNET));

    // Record the dummy handle so we can clean up afterwards
    dummyHandles.insert(resourceHandle);

    return resourceHandle;
  } else {
    // Explore the program when InternetOpenUrlA "fails"
    return NULL;
  }
}

static BOOL WINAPI InternetCloseHandleHook(HINTERNET hInternet) {
  Message("Intercepted InternetCloseHandle(%p)\n", hInternet);

  std::set<HINTERNET>::iterator it = dummyHandles.find(hInternet);

  if (it == dummyHandles.end()) {
    // The handle is not one of our dummy handles, so call the original
    // InternetCloseHandle function
    return InternetCloseHandle(hInternet);
  } else {
    // The handle is a dummy handle. Free it
    free(*it);
    dummyHandles.erase(it);

    return TRUE;
  }
}

Here we follow the approach taken in S2E’s multi-path fault injection tutorial. The returnResource symbolic variable forces a fork, resulting in one state where InternetOpenUrlA succeeds (by returning a dummy resource) and another state where InternetOpenUrlA fails (by returning NULL). We can return a dummy resource handle because the InternetOpenUrlA handle is never actually used: remember, WannaCry only checks if it is NULL. The InternetCloseHandle hook then cleans up the allocated memory. Now let’s hook and run WannaCry in S2E.
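
Outside S2E there is no symbolic variable to force the fork, but the bookkeeping done by the two hooks can still be exercised by picking each outcome by hand. The following is a plain C++ model of that logic, not the hook code itself: FakeHandle, OpenUrlModel and CloseHandleModel are hypothetical stand-ins for HINTERNET and the two hooks, and a bool parameter plays the role of the symbolic returnResource value.

```cpp
#include <cassert>
#include <cstdlib>
#include <set>

// Stand-in for HINTERNET so that the sketch compiles without WinINet headers
typedef void *FakeHandle;

// Dummy handles that the model has handed out and not yet closed
static std::set<FakeHandle> dummyHandles;

// Models InternetOpenUrlAHook. In S2E the choice comes from a symbolic byte and
// both outcomes are explored; here the caller picks one outcome per call
FakeHandle OpenUrlModel(bool returnResource) {
  if (!returnResource) {
    return nullptr; // The "failure" state
  }

  FakeHandle handle = std::malloc(sizeof(FakeHandle));
  if (handle != nullptr) {
    dummyHandles.insert(handle);
  }
  return handle; // The "success" state
}

// Models InternetCloseHandleHook. Returns true if the handle was one of our
// dummy handles (and has now been freed), or false if the real hook would have
// forwarded it to the original InternetCloseHandle
bool CloseHandleModel(FakeHandle handle) {
  std::set<FakeHandle>::iterator it = dummyHandles.find(handle);
  if (it == dummyHandles.end()) {
    return false;
  }

  std::free(*it);
  dummyHandles.erase(it);
  return true;
}

// Example use: walk the "success" and "failure" states one after the other
void ExampleUsage() {
  FakeHandle success = OpenUrlModel(true);
  assert(success != nullptr);
  assert(CloseHandleModel(success));

  FakeHandle failure = OpenUrlModel(false);
  assert(failure == nullptr);
  assert(dummyHandles.empty());
}
```

The difference from the real hook is that S2E explores both branches of a single call in separate states, whereas this model needs one call per outcome.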

Initial results

We can follow the same procedure that we used for GetLocalTime-test to set up an S2E project for WannaCry. Remember to make symbolic links to EasyHook32.dll, malware-hook.dll and malware-inject.exe and s2eget them in the bootstrap script.

Before running S2E, enable the LibraryCallMonitor plugin in s2e-config.lua. This plugin monitors and logs external library function calls, which gives us a better picture of what WannaCry is doing. When you run S2E, you should see a fork in malware-hook’s address space (likely hidden amongst a lot of debug output produced by LibraryCallMonitor). If you follow the library calls made by the WannaCry executable (instead of all the other DLLs loaded in its address space), you should see the following library calls in state 0:

Address	DLL	Function
0x4081bc	wininet	InternetCloseHandle
0x4081bf	wininet	InternetCloseHandle
0x409b4e	msvcrt	exit

While in state 1 you should see:

Address	DLL	Function
0x4081a7	wininet	InternetCloseHandle
0x4081ab	wininet	InternetCloseHandle
0x40809f	kernel32	GetModuleFileNameA
0x4080a5	msvcrt	__p___argc
0x407c56	msvcrt	sprintf
0x407c68	advapi32	OpenSCManagerA
0x407c9b	advapi32	CreateServiceA
0x407cb2	advapi32	StartServiceA
…
0x407d74	kernel32	FindResourceA
0x407d86	kernel32	LoadResource
0x407d95	kernel32	LockResource
0x407da9	kernel32	SizeofResource
…
0x407ee8	kernel32	CreateProcessA
…

This looks good: we have successfully explored WannaCry’s behaviour when the killswitch was and wasn’t triggered. Rousseau’s writeup outlines WannaCry’s execution flow, and if we follow state 1’s library calls we should see that the execution flows match.

Hooking process creation

Let’s write one last hook. What happens if our hooked process spawns a new process? This is pretty common for “dropper” malware, and indeed WannaCry does this by loading an executable (tasksche.exe) from a resource, writing it to disk and then running it (via CreateProcessA). When this happens, we are totally blind to what this new process is doing: both in terms of injecting symbolic data via our hooks and tracking its behaviour with S2E (e.g. via the LibraryCallMonitor plugin).

We can solve the former (losing our ability to inject symbolic data into the new process) by hooking CreateProcessA and using the EasyHook API to inject malware-hook into this new process. The following code achieves this:

// Don't forget to add CreateProcessA to the functionsToHook, hookFunctions and
// hooks arrays

BOOL WINAPI CreateProcessAHook(
  LPCSTR                lpApplicationName,
  LPSTR                 lpCommandLine,
  LPSECURITY_ATTRIBUTES lpProcessAttributes,
  LPSECURITY_ATTRIBUTES lpThreadAttributes,
  BOOL                  bInheritHandles,
  DWORD                 dwCreationFlags,
  LPVOID                lpEnvironment,
  LPCSTR                lpCurrentDirectory,
  LPSTARTUPINFOA        lpStartupInfo,
  LPPROCESS_INFORMATION lpProcessInformation
) {
  Message("Intercepted CreateProcessA(%s, %s, %p, %p, %d, 0x%x, %p, %s, %p, %p)\n",
    lpApplicationName, lpCommandLine, lpProcessAttributes,
    lpThreadAttributes, bInheritHandles, dwCreationFlags, lpEnvironment,
    lpCurrentDirectory, lpStartupInfo, lpProcessInformation);

  // Declared before the first goto so that no jump to default_create_process
  // skips an initialization
  NTSTATUS result;

  // Get this DLL's path
  HMODULE hDll = NULL;
  DWORD hModFlags = GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
    GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT;
  if (!GetModuleHandleEx(hModFlags, (LPCTSTR)&Message, &hDll)) {
    Message("Failed to retrieve DLL handle: 0x%X\n", GetLastError());
    goto default_create_process;
  }

  // MAX_PATH comes from Windows.h (MAX_PATH_LEN was only defined in inject.c)
  WCHAR dllPath[MAX_PATH];
  if (!GetModuleFileNameW(hDll, dllPath, MAX_PATH)) {
    Message("Failed to retrieve DLL path: 0x%X\n", GetLastError());
    goto default_create_process;
  }

  // Create the new process, but force it to be created in a suspended state
  if (!CreateProcessA(lpApplicationName, lpCommandLine, lpProcessAttributes,
      lpThreadAttributes, bInheritHandles, dwCreationFlags | CREATE_SUSPENDED,
      lpEnvironment, lpCurrentDirectory, lpStartupInfo, lpProcessInformation)) {
    Message("Failed to create suspended process: 0x%X\n", GetLastError());
    goto default_create_process;
  }

  // Inject ourselves into the new, suspended process.
  // NativeInjectionEntryPoint will call RhWakeUpProcess, which will kick
  // the new process out of the suspended state
  result = RhInjectLibrary(lpProcessInformation->dwProcessId,
    lpProcessInformation->dwThreadId, EASYHOOK_INJECT_DEFAULT,
#if defined(_M_IX86)
    dllPath, NULL,
#elif defined(_M_X64)
    NULL, dllPath,
#else
#error "Platform not supported"
#endif
    NULL, 0);

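  if (FAILED(result)) {
    Message("RhInjectLibrary failed: %S\n", RtlGetLastErrorString());

    // The process already exists at this point, so creating it again would
    // leave a suspended duplicate behind. Let it run unhooked instead, unless
    // the caller asked for a suspended process
    if (!(dwCreationFlags & CREATE_SUSPENDED)) {
      ResumeThread(lpProcessInformation->hThread);
    }

    return TRUE;
  }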

  Message("Successfully injected %S into %s %s (PID=0x%x)\n", dllPath,
    lpApplicationName, lpCommandLine, lpProcessInformation->dwProcessId);

  return TRUE;

default_create_process:
  return CreateProcessA(lpApplicationName, lpCommandLine, lpProcessAttributes,
    lpThreadAttributes, bInheritHandles, dwCreationFlags, lpEnvironment,
    lpCurrentDirectory, lpStartupInfo, lpProcessInformation);
}

This hook will start the new process in a suspended state and inject itself into the new process. malware-hook’s NativeInjectionEntryPoint function is then responsible for waking the process up.

This solves the problem of injecting symbolic data into a new process started by WannaCry. What about tracking this new process’ behaviour in S2E? Unfortunately, this requires a bit more work. One approach could be to write an S2E plugin that listened for OSMonitor’s onProcessLoad signal. If a new process was found to originate from the WannaCry process, we could add the new child process to ProcessExecutionDetector’s tracked modules. LibraryCallMonitor would then start emitting onLibraryCall events for this new process, allowing us to track its behaviour too. Because I wanted to avoid writing S2E plugins in this post, I’ll leave this “as an exercise for the reader”.

One last problem exists: The original WannaCry process terminates after it starts tasksche.exe. This causes malware-inject to also terminate (remember it calls WaitForSingleObject), leading to bootstrap.sh killing the current (and only active) state. Unfortunately, this means that S2E will terminate before we get to see WannaCry do something interesting (like encrypt our data). The hacky way to fix this: add a sleep command after the call to execute in bootstrap.sh (don’t forget to set an appropriate amount of time to sleep for). This is hacky because it means that we’ll waste time sleeping in state 0 after WannaCry exits (and does nothing interesting). A better approach is to wait for tasksche.exe (and any other child processes) to terminate. Let’s add a function to do this:

// Set a sensible timeout value (in milliseconds). Can also be INFINITE
#define CHILD_PROCESS_TIMEOUT (10 * 1000)

/// Keep track of child processes (such as tasksche.exe).
/// Requires <set> and <vector> to be included
static std::set<DWORD> childPids;

static BOOL WaitForChildProcesses(DWORD timeout) {
  BOOL retCode = TRUE;

  if (childPids.size() > 0) {
    // Convert the set of PIDS to a list of handles with the appropriate permissions
    std::vector<HANDLE> childHandles;
    for (DWORD pid : childPids) {
      Message("Getting handle to process 0x%x\n", pid);
      HANDLE childHandle = OpenProcess(SYNCHRONIZE | PROCESS_QUERY_INFORMATION,
        FALSE, pid);
      if (childHandle) {
        childHandles.push_back(childHandle);
      } else {
        Message("Unable to open child process 0x%x: 0x%X\n", pid, GetLastError());
        return FALSE;
      }
    }

    // Wait for the processes to terminate
    Message("Waiting %lu ms for %lu child processes to terminate...\n",
      timeout, (DWORD) childHandles.size());
    DWORD waitRes = WaitForMultipleObjects((DWORD) childHandles.size(),
      childHandles.data(), TRUE, timeout);
    switch (waitRes) {
      case WAIT_FAILED:
        Message("Failed to wait for child processes: 0x%X\n", GetLastError());
        retCode = FALSE;
        break;
      case WAIT_TIMEOUT:
        Message("Timeout - not all child processes may have terminated\n");
        break;
    }

    // Close all handles
    for (HANDLE handle : childHandles) {
      CloseHandle(handle);
    }
  }

  return retCode;
}

WaitForChildProcesses should be called when the hooked WannaCry process exits. We can do this by adding DllMain and checking for reason code DLL_PROCESS_DETACH:

BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved) {
  switch (fdwReason) {
  // Don't exit until all child processes have terminated (or a timeout is reached)
  case DLL_PROCESS_DETACH:
    return WaitForChildProcesses(CHILD_PROCESS_TIMEOUT);
  }

  return TRUE;
}

Finally, don’t forget to add the following code to CreateProcessAHook to track child processes. The child process should only be saved if it is successfully hooked (i.e. before returning TRUE).

// This function was defined previously
static BOOL WINAPI CreateProcessAHook(
  LPCSTR                lpApplicationName,
  LPSTR                 lpCommandLine,
  LPSECURITY_ATTRIBUTES lpProcessAttributes,
  LPSECURITY_ATTRIBUTES lpThreadAttributes,
  BOOL                  bInheritHandles,
  DWORD                 dwCreationFlags,
  LPVOID                lpEnvironment,
  LPCSTR                lpCurrentDirectory,
  LPSTARTUPINFOA        lpStartupInfo,
  LPPROCESS_INFORMATION lpProcessInformation
) {
  // ...

  // Save the newly-created process' PID
  childPids.insert(lpProcessInformation->dwProcessId);

  return TRUE;

  // ...
}

If you comment out GRAPHICS=-nographic in launch-s2e.sh (to enable the QEMU GUI), you’ll eventually be rewarded with the following (depending on the value chosen for CHILD_PROCESS_TIMEOUT):

Conclusion and next steps

In this post we’ve looked at analysing Windows malware with S2E, essentially recreating David Brumley’s Minesweeper tool in S2E. Unlike programs we’ve looked at in previous posts, we had to come up with some new techniques to inject symbolic data into our Windows programs. We used EasyHook to hook “trigger” functions that are commonly used by malware to hide their behaviour. While this approach worked well for our two case studies (which were admittedly highly contrived), there are many avenues for improvement. These avenues include:

Hooking more of the Windows API. Brumley and Moser describe a number of different trigger sources (e.g. network data, registry keys, etc.) that aren’t covered in this post.
Building more complex hooks. For example, our InternetOpenUrlA hook was overly simplistic - it just returned a dummy handle allocated on the heap. If this handle was later passed to a function like InternetReadFile, we’d have to hook this function as well. This is essentially the “environment modelling” problem inherent in most symbolic execution engines.
Hiding our hooks from the malware being analysed. Some ideas include porting Cuckoo Monitor to S2E or doing everything in an S2E plugin.
A broader study on real malware. Is this type of symbolic execution even helpful for malware analysis? How common is trigger-based malware - can we get away with just doing a dynamic analysis in Cuckoo Sandbox? Are the obfuscation techniques discussed in Banescu’s work on Code Obfuscation Against Symbolic Execution Attacks used by malware authors, and if so how do they affect our analysis?

Hopefully this post gives you the necessary background and tools to go and look at some of these improvements. Maybe one day I’ll even find the time to look at some of them myself!

Edit

21/10/2018: I’ve updated this post with a less-hacky way of waiting for WannaCry’s tasksche.exe to start encrypting data.