Analysing "Trigger-based" Malware with S2E
2018-09-02
Introduction
This blog post is a quick brain-dump of the work that I was doing during my last month in the Dependable Systems Lab at EPFL. At the time I was working on malware analysis with S2E. While not anything earth-shatteringly novel, I’m hopeful that this post will help others who want to use symbolic execution/S2E to analyse malware behaviour.
What makes malware analysis different?
My previous blog posts have looked at solving a CTF challenge and analysing file parsers. These programs had two things in common:
- They were Linux ELF executables; and
- Program input was specified by the user — either via STDIN or from a file that was read from disk.
In contrast, most malware:
- Targets Windows (while some reports suggest that Android malware is on the rise, Windows remains the primary target for malware authors); and
- Does not have a well defined input source. Input could come from command-line arguments, but this is uncommon. Input is more likely to come from registry keys, network data, etc.
For these reasons, analysing malware in S2E is not as simple as making command-line arguments symbolic, or feeding the program a symbolic file. This blog post will walk through the S2E-based tools that we developed for malware analysis, followed by two “case studies”. As usual, if you wish to play along at home you can find all of the code on Github.
Analysing Windows software in S2E
Up until now, we have only analysed Linux programs. Fortunately, S2E also supports the analysis of Windows programs. So what’s the difference?
- When building a Windows guest image, a Windows ISO must be provided to the image_build command. ISOs for all versions of Windows supported by S2E (listed here) can be downloaded from MSDN. It is also possible to add support for other versions if required. For this post we’ll use Windows 7 Professional 32-bit.
- There is no equivalent to s2e.so on Windows. Therefore, we’ll need an alternative approach to inject symbolic data into our malware. We could write an S2E plugin to do this, but this is complex. Instead, we’ll use DLL injection in the guest to hook Windows API calls and inject symbolic data through these hooks.
Hooking the Windows API
There are many different techniques for hooking the Windows API. We’ll use an “off the shelf” solution rather than (re)inventing a new one. When I first started this work, I wanted to reuse Cuckoo Sandbox’s Monitor for API hooking (as it was designed for malware analysis). However, we decided to use EasyHook instead, primarily because it required less work to get started with.
Before we dive into some code, here’s an overview of what we are going to build:
- malware-inject: A program that will start other programs (e.g. malware) and inject a DLL into the newly-started process’ address space; and
- malware-hook: A DLL that is injected into a process’ address space via malware-inject. This DLL will hook key functions from the Windows API, providing us with a mechanism to inject symbolic data.
Now let’s dive into some code!
We’ll start by opening $S2EDIR/source/s2e/guest/windows/s2e.sln in Visual Studio and creating two new projects:
- malware-inject: A Win32 console application; and
- malware-hook: A Win32 DLL.
Both projects require the EasyHook native package, installable via Nuget. Note that in the Github repo the malware-hook project is split into GetLocalTime-hook and wannacry-hook projects (our two case studies).
malware-inject
malware-inject is based on EasyHook’s example injector application. However, instead of using RhInjectLibrary (which injects a DLL into an already-running process), we’ll use RhCreateAndInject. This function starts an application in a suspended state, injects a DLL and then resumes the suspended process. malware-inject will also wait for the injected process to complete before returning. This is useful because it prevents S2E killing states when the malware-inject process exits.
Create inject.c and add the following code to it:
#include <stdio.h>
#include <string.h>
#include <Shlwapi.h>
#include <Windows.h>
#include <easyhook.h>
// We must add this header file to support writing to S2E's logs. s2e.h resides
// in the libcommon project, so the libcommon project must be added as a
// dependency to the malware-inject project
#define USER_APP
#include <s2e/s2e.h>
#define S2E_MSG_LEN 512
#define MAX_PATH_LEN 256
static INT s2eVersion = 0;
static void Message(LPCSTR fmt, ...) {
CHAR message[S2E_MSG_LEN];
va_list args;
va_start(args, fmt);
vsnprintf(message, S2E_MSG_LEN, fmt, args);
va_end(args);
if (s2eVersion) {
S2EMessageFmt("[malware-inject] %s", message);
} else {
printf("[malware-inject] %s", message);
}
}
static void GetFullPath(LPCWSTR path, PWCHAR fullPath) {
if (!path) {
Message("Path has not been provided\n");
exit(1);
}
if (!PathFileExistsW(path)) {
Message("Invalid path %S has been provided\n", path);
exit(1);
}
if (!GetFullPathNameW(path, MAX_PATH_LEN, fullPath, NULL)) {
Message("Unable to get full path of %S\n", path);
exit(1);
}
}
int main() {
INT argc;
LPWSTR *argv = CommandLineToArgvW(GetCommandLineW(), &argc);
if (argc < 5) {
printf("Usage: %S [options..]\n"
" --dll <dll> Path to DLL to inject into the application\n"
" --app <target> Path to application to start\n"
" --timeout <time> Timeout value in milliseconds "
"(infinite if not provided)\n", argv[0]);
exit(1);
}
// Used by the Message function to decide where to write output to
s2eVersion = S2EGetVersion();
LPWSTR dllPath = NULL;
WCHAR fullDllPath[MAX_PATH_LEN];
LPWSTR appPath = NULL;
WCHAR fullAppPath[MAX_PATH_LEN];
DWORD timeout = INFINITE;
for (int i = 1; i < argc; ++i) {
if (wcscmp(argv[i], L"--dll") == 0) {
dllPath = argv[++i];
continue;
}
if (wcscmp(argv[i], L"--app") == 0) {
appPath = argv[++i];
continue;
}
if (wcscmp(argv[i], L"--timeout") == 0) {
timeout = wcstoul(argv[++i], NULL, 10);
continue;
}
Message("Unsupported argument: %S\n", argv[i]);
exit(1);
}
// Check that the given paths are valid
GetFullPath(dllPath, fullDllPath);
GetFullPath(appPath, fullAppPath);
// Start the target application (in a suspended state) and inject the given DLL
ULONG pid;
NTSTATUS result = RhCreateAndInject(appPath, L"", CREATE_SUSPENDED,
EASYHOOK_INJECT_DEFAULT,
#if defined(_M_IX86)
dllPath, NULL,
#elif defined(_M_X64)
NULL, dllPath,
#else
#error "Platform not supported"
#endif
NULL, 0, &pid);
if (FAILED(result)) {
Message("RhCreateAndInject failed: %S\n", RtlGetLastErrorString());
exit(1);
}
Message("Successfully injected %S into %S (PID=0x%x)\n", fullDllPath,
fullAppPath, pid);
DWORD exitCode = 1;
// Get a handle to the newly-created process and wait for it to terminate.
// Once the process has terminated, get its return code and return that as
// our return code
HANDLE hProcess = OpenProcess(SYNCHRONIZE | PROCESS_QUERY_INFORMATION,
FALSE, pid);
if (hProcess) {
WaitForSingleObject(hProcess, timeout);
GetExitCodeProcess(hProcess, &exitCode);
CloseHandle(hProcess);
} else {
Message("Unable to open process 0x%x: 0x%X\n", pid, GetLastError());
}
return exitCode;
}
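One detail worth guarding against in the argument loop above: each flag reads argv[++i] without checking that a value actually follows, so a trailing --timeout with nothing after it would read past the end of the argument list. The following is a minimal, portable C++ sketch of a bounds-checked version of the same parsing logic. InjectOptions and parseInjectArgs are hypothetical names introduced purely for illustration (they are not part of malware-inject), and the default timeout of 0xFFFFFFFF is the value of the Windows INFINITE constant.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical container for the parsed options (not part of the original tool).
struct InjectOptions {
    std::wstring dllPath;
    std::wstring appPath;
    unsigned long timeoutMs = 0xFFFFFFFFul; // same value as the Windows INFINITE constant
    bool ok = false;
    std::wstring error;
};

// Parse "--dll <path> --app <path> [--timeout <ms>]". args[0] is the program
// name, mirroring argv. Flags that are not followed by a value are rejected.
InjectOptions parseInjectArgs(const std::vector<std::wstring> &args) {
    InjectOptions opts;
    for (std::size_t i = 1; i < args.size(); ++i) {
        const std::wstring &flag = args[i];
        const bool known = flag == L"--dll" || flag == L"--app" || flag == L"--timeout";
        if (!known) {
            opts.error = L"Unsupported argument: " + flag;
            return opts;
        }
        // Bounds check: every flag must be followed by a value
        if (i + 1 >= args.size()) {
            opts.error = L"Missing value for " + flag;
            return opts;
        }
        const std::wstring &value = args[++i];
        if (flag == L"--dll") {
            opts.dllPath = value;
        } else if (flag == L"--app") {
            opts.appPath = value;
        } else {
            opts.timeoutMs = std::wcstoul(value.c_str(), nullptr, 10);
        }
    }
    if (opts.dllPath.empty() || opts.appPath.empty()) {
        opts.error = L"Both --dll and --app are required";
        return opts;
    }
    opts.ok = true;
    return opts;
}

// Small usage example: a complete command line parses successfully.
void demoParseInjectArgs() {
    InjectOptions opts = parseInjectArgs(std::vector<std::wstring>{
        L"malware-inject.exe", L"--dll", L"malware-hook.dll", L"--app", L"target.exe"});
    assert(opts.ok);
}
```

The same check could be folded into inject.c directly by testing i + 1 < argc before each argv[++i].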
Of course, it is entirely possible that the malware will be watching for API hooks (we are dealing with malicious software after all!). Whilst an important issue, we won’t deal with it in this post.
Now that we’ve written the tool to run our malware with an injected DLL, let’s turn our attention to what this DLL actually does.
malware-hook
Likewise, we’ll base malware-hook on EasyHook’s example BeepHook DLL.
Here is the skeleton of our hook DLL, which we’ll put in malware-hook.cpp:
#include <stdio.h>
#include <Windows.h>
#include <strsafe.h>
#include <easyhook.h>
#define USER_APP
extern "C" {
#include <s2e/s2e.h>
}
#define S2E_MSG_LEN 512
static INT s2eVersion = 0;
static void Message(LPCSTR fmt, ...) {
CHAR message[S2E_MSG_LEN];
va_list args;
va_start(args, fmt);
vsnprintf(message, S2E_MSG_LEN, fmt, args);
va_end(args);
if (s2eVersion) {
S2EMessageFmt("[0x%x|malware-hook] %s", GetCurrentProcessId(), message);
} else {
printf("[0x%x|malware-hook] %s", GetCurrentProcessId(), message);
}
}
// EasyHook will be looking for this export to support DLL injection. If not
// found then DLL injection will fail
extern "C" void __declspec(dllexport) __stdcall NativeInjectionEntryPoint(REMOTE_ENTRY_INFO *);
void __stdcall NativeInjectionEntryPoint(REMOTE_ENTRY_INFO *inRemoteInfo) {
// Unused
(void) inRemoteInfo;
// Used by the Message function to decide where to write output to
s2eVersion = S2EGetVersion();
// TODO initialize hooks
// The process was started in a suspended state. Wake it up...
RhWakeUpProcess();
}
So, what are we going to hook? Let’s take some inspiration from two great papers on this topic: David Brumley’s “Automatically Identifying Trigger-based Behaviour in Malware” and Andreas Moser’s “Exploring Multiple Execution Paths for Malware Analysis”. Both of these papers look at “trigger-based malware”, which is malware whose malicious actions only occur under specific circumstances; i.e. when certain trigger conditions are met. For example, malware may only launch its payload on a specific date (as the MyDoom worm did), or upon receiving specific data from a command & control server. In these two examples, the trigger sources are the current date/time and data read from a network. Other trigger sources include (as listed in Moser’s paper):
- Internet connectivity;
- Mutex objects;
- Existence of files;
- Existence of Registry entries; and
- Data read from a file.
How can we analyse trigger-based malware? Brumley’s paper proposed Minesweeper, a tool designed to detect the existence of trigger-based behaviours and to find inputs that exercise these behaviours. As far as I can tell, Minesweeper was never publicly released. However, we can build a very similar system in S2E using our malware-hook DLL! So let’s go ahead and create hooks for some of the trigger sources discussed in these two papers.
Case study 1: GetLocalTime-test
The first trigger source that Brumley’s paper explores is GetLocalTime. GetLocalTime has the following prototype:
void WINAPI GetLocalTime(
_Out_ LPSYSTEMTIME lpSystemTime
);
In Minesweeper, the user was required to specify where in memory the trigger inputs will be stored. This was so the symbolic execution engine could properly assign symbolic variables during execution. In the case of GetLocalTime, this would require specifying that GetLocalTime stores its result in a 16-byte structure pointed to by a stack value when GetLocalTime is called. Fortunately, we don’t have to worry about these low-level details. Instead, we can just call S2EMakeSymbolic on the variable we pass to GetLocalTime. Here’s how we do this in malware-hook:
// Function hooks
static void WINAPI GetLocalTimeHook(LPSYSTEMTIME lpSystemTime) {
Message("Intercepted GetLocalTime\n");
// Call the original GetLocalTime to get a concrete value
GetLocalTime(lpSystemTime);
// Make the value concolic
S2EMakeSymbolic(lpSystemTime, sizeof(*lpSystemTime), "SystemTime");
}
// The names of the functions to hook (and the library they belong to)
static LPCSTR functionsToHook[][2] = {
{ "kernel32", "GetLocalTime"} ,
{ NULL, NULL },
};
// The function hooks that we will install
static PVOID hookFunctions[] = {
GetLocalTimeHook,
};
// The actual hooks
static HOOK_TRACE_INFO hooks[] = {
{ NULL },
};
// This function was defined previously
void __stdcall NativeInjectionEntryPoint(REMOTE_ENTRY_INFO *inRemoteInfo) {
// ...
// Replace the previous TODO with the following code to install the GetLocalTime hook
for (unsigned i = 0; functionsToHook[i][0] != NULL; ++i) {
LPCSTR moduleName = functionsToHook[i][0];
LPCSTR functionName = functionsToHook[i][1];
// Install the hook
NTSTATUS result = LhInstallHook(
GetProcAddress(GetModuleHandleA(moduleName), functionName),
hookFunctions[i],
NULL,
&hooks[i]);
if (FAILED(result)) {
Message("Failed to hook %s.%s: %S\n", moduleName, functionName,
RtlGetLastErrorString());
} else {
Message("Successfully hooked %s.%s\n", moduleName, functionName);
}
// Ensure that all threads _except_ the injector thread will be hooked
ULONG ACLEntries[1] = { 0 };
LhSetExclusiveACL(ACLEntries, 1, &hooks[i]);
}
// ...
}
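The three parallel arrays above (functionsToHook, hookFunctions and hooks) have to be kept the same length by hand, which may become error-prone as more hooks are added. One possible alternative, sketched below in portable C++, keeps each hook's module name, function name, handler and trace slot together in a single table entry. HookSpec, TraceSlot, AnyFn, findHook and the stub handler are all hypothetical stand-ins so that the sketch compiles without the Windows or EasyHook headers; in malware-hook.cpp the trace slot would be EasyHook's HOOK_TRACE_INFO and the handler would be the real hook function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for EasyHook's HOOK_TRACE_INFO (assumption: treated as an opaque slot here).
struct TraceSlot {
    void *link;
};

// Generic function-pointer type for storing handlers with differing signatures.
using AnyFn = void (*)();

// One table entry per hooked API, so the pieces cannot drift out of sync.
struct HookSpec {
    const char *moduleName;
    const char *functionName;
    AnyFn handler;
    TraceSlot trace;
};

// Placeholder for GetLocalTimeHook so that the sketch is self-contained.
static void GetLocalTimeHookStub() {}

static HookSpec hookTable[] = {
    {"kernel32", "GetLocalTime", &GetLocalTimeHookStub, {nullptr}},
};

// The number of hooks is derived from the table itself; no sentinel entry is needed.
static const std::size_t hookCount = sizeof(hookTable) / sizeof(hookTable[0]);

// Look up a table entry by name. Returns nullptr if the function is not hooked.
HookSpec *findHook(const char *moduleName, const char *functionName) {
    assert(moduleName != nullptr && functionName != nullptr);
    for (std::size_t i = 0; i < hookCount; ++i) {
        if (std::strcmp(hookTable[i].moduleName, moduleName) == 0 &&
            std::strcmp(hookTable[i].functionName, functionName) == 0) {
            return &hookTable[i];
        }
    }
    return nullptr;
}
```

With this layout the installation loop would iterate from 0 to hookCount and pass each entry's handler and trace slot to LhInstallHook. Real handlers have different signatures, so they would need a reinterpret_cast when stored as AnyFn.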
Let’s implement the running example that Brumley et al. used in their paper (Fig. 1.1) to test that everything works as expected.
#include <Windows.h>
void ddos (LPCSTR target) {
// DDOS code goes here :)
}
int main() {
SYSTEMTIME systime;
LPCSTR site = "www.usenix.org";
GetLocalTime(&systime);
if (9 == systime.wDay) {
if (10 == systime.wHour) {
if (11 == systime.wMonth) {
if (6 == systime.wMinute) {
ddos(site);
}
}
}
}
return 0;
}
Ensure that you compile everything for the x86 platform (since we’ll be using a 32-bit Windows 7 VM). Once everything is built (including the VM!), we can create a new project:
s2e new_project -i windows-7sp1pro-i386 /path/to/malware-s2e/GetLocalTime-test/Debug/GetLocalTime-test.exe
Note that this will create a bootstrap.sh that executes GetLocalTime-test.exe directly. We must modify bootstrap.sh to have malware-inject.exe execute GetLocalTime-test.exe instead. To do this we’ll need access to our hooking tools from within the VM. We can do this by executing the following command in our S2E environment to create the necessary symbolic links in our project directory:
cd $S2EDIR/projects/GetLocalTime-test
HOOK_FILES="EasyHook32.dll malware-hook.dll malware-inject.exe"
for FILE in $HOOK_FILES; do
ln -s $S2EDIR/source/s2e/guest/windows/Debug/$FILE $FILE
done
And then edit bootstrap.sh as follows:
# ...
# The target does not get executed directly - we execute it via malware-inject
function execute_target {
local TARGET
TARGET="$1"
./malware-inject.exe --dll "./malware-hook.dll" --app ${TARGET}
}
# ...
# We also need to download the files required for hooking
# Download the target file to analyze
${S2EGET} "GetLocalTime-test.exe"
${S2EGET} "EasyHook32.dll"
${S2EGET} "malware-hook.dll"
${S2EGET} "malware-inject.exe"
# ...
Finally, we can disable the following plugins in s2e-config.lua (they are not required):
- WebServiceInterface
- KeyValueStore
- MultiSearcher
- CUPASearcher
- StaticFunctionModels
We are now ready to run our analysis!
Results
We should see S2E fork four times during our analysis. If we enable the --verbose-fork-info KLEE argument (in s2e-config.lua) we can see the constraints generated at each of these four fork points. The following image shows a disassembly with these points highlighted.
ReadLSB w16 X SystemTime can be understood as “read 16 bits (i.e. one WORD) at offset X in the symbolic SystemTime variable”. If we look up the SYSTEMTIME struct on MSDN we will see that each WORD at these offsets (0x6, 0x8, 0x2, 0xA) corresponds with the wDay, wHour, wMonth and wMinute fields respectively - just as expected. Finally, we should find a line in debug.txt containing the following test case (I’ve reformatted the line and added the field names to make it easier to read):
TestCaseGenerator: v0_SystemTime_0 = {0x0, 0x0, /* wYear */
0xb, 0x0, /* wMonth */
0x0, 0x0, /* wDayOfWeek */
0x9, 0x0, /* wDay */
0xa, 0x0, /* wHour */
0x6, 0x0, /* wMinute */
0x0, 0x0, /* wSecond */
0x0, 0x0} /* wMilliseconds */
If we cross-reference this against GetLocalTime-test/test.c we can see that this is the time to launch the DDOS. Success!
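To double-check a generated test case without reading the bytes by hand, the 16 bytes can be decoded back into SYSTEMTIME fields. The sketch below is portable C++ and uses a hypothetical SystemTimeFields struct that mirrors the SYSTEMTIME layout (eight consecutive little-endian WORDs) so that it builds without the Windows headers. decodeSystemTime, triggersDdos and checkGeneratedTestCase are likewise names invented for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable mirror of the Windows SYSTEMTIME layout: eight consecutive 16-bit fields.
struct SystemTimeFields {
    std::uint16_t wYear;
    std::uint16_t wMonth;
    std::uint16_t wDayOfWeek;
    std::uint16_t wDay;
    std::uint16_t wHour;
    std::uint16_t wMinute;
    std::uint16_t wSecond;
    std::uint16_t wMilliseconds;
};

// These are the offsets that appear in the fork constraints.
static_assert(sizeof(SystemTimeFields) == 16, "SYSTEMTIME is 16 bytes");
static_assert(offsetof(SystemTimeFields, wMonth) == 0x2, "wMonth offset");
static_assert(offsetof(SystemTimeFields, wDay) == 0x6, "wDay offset");
static_assert(offsetof(SystemTimeFields, wHour) == 0x8, "wHour offset");
static_assert(offsetof(SystemTimeFields, wMinute) == 0xA, "wMinute offset");

// Decode the 16 little-endian bytes of a test case into named fields.
SystemTimeFields decodeSystemTime(const std::array<std::uint8_t, 16> &bytes) {
    auto word = [&bytes](std::size_t offset) {
        return static_cast<std::uint16_t>(bytes[offset] | (bytes[offset + 1] << 8));
    };
    SystemTimeFields st;
    st.wYear = word(0x0);
    st.wMonth = word(0x2);
    st.wDayOfWeek = word(0x4);
    st.wDay = word(0x6);
    st.wHour = word(0x8);
    st.wMinute = word(0xA);
    st.wSecond = word(0xC);
    st.wMilliseconds = word(0xE);
    return st;
}

// True when the decoded time satisfies all four checks in GetLocalTime-test.
bool triggersDdos(const SystemTimeFields &st) {
    return st.wDay == 9 && st.wHour == 10 && st.wMonth == 11 && st.wMinute == 6;
}

// Usage example: the bytes reported by TestCaseGenerator reach the ddos() call.
void checkGeneratedTestCase() {
    const std::array<std::uint8_t, 16> testCase = {{
        0x0, 0x0, 0xb, 0x0, 0x0, 0x0, 0x9, 0x0,
        0xa, 0x0, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0}};
    assert(triggersDdos(decodeSystemTime(testCase)));
}
```

Note that the solver only constrains the four fields that the program checks; the remaining fields are left at zero, which is why the test case is not a valid calendar date.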
Case study 2: WannaCry
That was nice, but malware has moved on since the Minesweeper paper was written in 2007. Let’s look at something a bit more recent - the WannaCry ransomware. WannaCry famously contained a “killswitch” that stopped the ransomware from encrypting the target’s data. This killswitch was a check for whether a gibberish URL led to a live webpage. WannaCry would shut down if this URL could be reached (this check was probably done to fool dynamic analysis tools, which are typically configured to return valid, dummy responses to all network queries). With this in mind, let’s use S2E to explore WannaCry’s behaviour when this trigger condition is and isn’t satisfied. We’ll focus on the sample discussed in Amanda Rousseau’s excellent writeup (MD5 hash db349b97c37d22f5ea1d1841e3c89eb4).
Disassembly
Let’s take a quick look at the WannaCry killswitch in a disassembler.
We can see that the WinINet API is used to open a connection to the killswitch URL (hxxp://www[.]iuqerfsodp9ifjaposdfjhgosurijfaewrwergwea[.]com). The following functions are called to do this:
- InternetOpenA: Initializes the WinINet system. Returns an HINTERNET handle on success, or NULL on failure.
- InternetOpenUrlA: Using the handle returned by InternetOpenA, open a resource specified by the given URL. Returns an HINTERNET handle on success, or NULL on failure.
- InternetCloseHandle: Close the handles opened by InternetOpenA and InternetOpenUrlA.
At a minimum we must hook InternetOpenUrlA and force a fork to explore both paths at 0x4081a5. What about InternetOpenA? We can see in the WannaCry code that the HINTERNET handle returned by InternetOpenA is never checked, so we don’t have to worry about this function. If the returned handle was (properly) checked, we may have needed to hook InternetOpenA and force it to return some dummy, non-NULL value. Similarly, if we were interested in the code executed when InternetOpenA fails, we could also force a fork on some symbolic value. However, for simplicity we’ll just focus on InternetOpenUrlA.
Let’s write some more code!
WinINet hooks
First, replace the hooked functions in malware-hook.cpp with the following:
static LPCSTR functionsToHook[][2] = {
{ "wininet", "InternetOpenUrlA" },
{ "wininet", "InternetCloseHandle" },
{ NULL, NULL },
};
static PVOID hookFunctions[] = {
InternetOpenUrlAHook,
InternetCloseHandleHook,
};
static HOOK_TRACE_INFO hooks[] = {
{ NULL },
{ NULL },
};
Then write the actual hook functions:
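// HINTERNET and the WinINet functions are declared in WinInet.h, and std::set
// requires <set>; both includes belong at the top of malware-hook.cpp
#include <WinInet.h>
#include <set>
/// Keep track of dummy Internet handles that we've created
static std::set<HINTERNET> dummyHandles;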
static HINTERNET WINAPI InternetOpenUrlAHook(
HINTERNET hInternet,
LPCSTR lpszUrl,
LPCSTR lpszHeaders,
DWORD dwHeadersLength,
DWORD dwFlags,
DWORD_PTR dwContext
) {
Message("Intercepted InternetOpenUrlA(%p, %s, %s, 0x%x, 0x%x, %p)\n",
hInternet, lpszUrl, lpszHeaders, dwHeadersLength, dwFlags, dwContext);
// Force a fork via a symbolic variable. Since both branches are feasible,
// both paths are taken
UINT8 returnResource = S2ESymbolicChar("hInternet", 1);
if (returnResource) {
// Explore the program when InternetOpenUrlA "succeeds" by returning a
// dummy resource handle. Because we know that the resource handle is never
// used, we don't have to do anything fancy to create it.
// However, we will need to keep track of it so we can free it when the
// handle is closed.
HINTERNET resourceHandle = (HINTERNET) malloc(sizeof(HINTERNET));
// Record the dummy handle so we can clean up afterwards
dummyHandles.insert(resourceHandle);
return resourceHandle;
} else {
// Explore the program when InternetOpenUrlA "fails"
return NULL;
}
}
static BOOL WINAPI InternetCloseHandleHook(HINTERNET hInternet) {
Message("Intercepted InternetCloseHandle(%p)\n", hInternet);
std::set<HINTERNET>::iterator it = dummyHandles.find(hInternet);
if (it == dummyHandles.end()) {
// The handle is not one of our dummy handles, so call the original
// InternetCloseHandle function
return InternetCloseHandle(hInternet);
} else {
// The handle is a dummy handle. Free it
free(*it);
dummyHandles.erase(it);
return TRUE;
}
}
Here we follow the approach taken in S2E’s multi-path fault injection tutorial. The returnResource symbolic variable forces a fork, resulting in one state where InternetOpenUrlA succeeds (by returning a dummy resource) and another state where InternetOpenUrlA fails (by returning NULL). We can return a dummy resource handle because the InternetOpenUrlA handle is never actually used: remember, WannaCry only checks if it is NULL. The InternetCloseHandle hook then cleans up the allocated memory. Now let’s hook and run WannaCry in S2E.
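The handle bookkeeping can be exercised outside of S2E before running the real analysis. The following portable C++ sketch models the two hooks with a plain boolean standing in for the symbolic returnResource value, so each call follows exactly one of the two paths that S2E would explore. Handle, openUrlModel, closeHandleModel and exploreBothPaths are hypothetical names for this illustration only; nothing here calls WinINet or S2E.

```cpp
#include <cassert>
#include <cstdlib>
#include <set>

// Stand-in for HINTERNET so that the sketch builds without the Windows headers.
using Handle = void *;

// Dummy handles that the model has handed out and not yet closed.
static std::set<Handle> dummyHandles;

// Model of InternetOpenUrlAHook: 'succeed' stands in for the symbolic byte.
Handle openUrlModel(bool succeed) {
    if (!succeed) {
        return nullptr; // the "InternetOpenUrlA failed" path
    }
    // The "succeeded" path: any non-NULL value will do, as it is never dereferenced
    Handle handle = std::malloc(sizeof(Handle));
    if (handle != nullptr) {
        dummyHandles.insert(handle);
    }
    return handle;
}

// Model of InternetCloseHandleHook. Returns true if the handle was one of our
// dummies (and has been freed). Returns false if it is unknown, in which case
// the real hook would forward the call to the original InternetCloseHandle.
bool closeHandleModel(Handle handle) {
    std::set<Handle>::iterator it = dummyHandles.find(handle);
    if (it == dummyHandles.end()) {
        return false;
    }
    std::free(*it);
    dummyHandles.erase(it);
    return true;
}

// Walk both "states" concretely and confirm that no dummy handle is leaked.
void exploreBothPaths() {
    for (int flag = 0; flag <= 1; ++flag) {
        Handle handle = openUrlModel(flag != 0);
        assert((handle != nullptr) == (flag != 0));
        if (handle != nullptr) {
            bool freed = closeHandleModel(handle);
            assert(freed);
            (void) freed;
        }
    }
    assert(dummyHandles.empty());
}
```

In S2E the two iterations of the loop correspond to the two forked states, each with its own copy of dummyHandles.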
Initial results
We can follow the same procedure that we used for GetLocalTime-test to set up an S2E project for WannaCry. Remember to make symbolic links to EasyHook32.dll, malware-hook.dll and malware-inject.exe and s2eget them in the bootstrap script.
Before running S2E, enable the LibraryCallMonitor plugin in s2e-config.lua. This plugin monitors and logs external library function calls, which gives us a better picture of what WannaCry is doing. When you run S2E, you should see a fork in malware-hook’s address space (likely hidden amongst a lot of debug output produced by LibraryCallMonitor). If you follow the library calls made by the WannaCry executable (instead of all the other DLLs loaded in its address space), you should see the following library calls in state 0:
Address | DLL | Function |
---|---|---|
0x4081bc | wininet | InternetCloseHandle |
0x4081bf | wininet | InternetCloseHandle |
0x409b4e | msvcrt | exit |
While in state 1 you should see:
Address | DLL | Function |
---|---|---|
0x4081a7 | wininet | InternetCloseHandle |
0x4081ab | wininet | InternetCloseHandle |
0x40809f | kernel32 | GetModuleFileNameA |
0x4080a5 | msvcrt | __p___argc |
0x407c56 | msvcrt | sprintf |
0x407c68 | advapi32 | OpenSCManagerA |
0x407c9b | advapi32 | CreateServiceA |
0x407cb2 | advapi32 | StartServiceA |
… | ||
0x407d74 | kernel32 | FindResourceA |
0x407d86 | kernel32 | LoadResource |
0x407d95 | kernel32 | LockResource |
0x407da9 | kernel32 | SizeofResource |
… | ||
0x407ee8 | kernel32 | CreateProcessA |
… |
This looks good: we have successfully explored WannaCry’s behaviour when the killswitch was and wasn’t triggered. Rousseau’s writeup outlines WannaCry’s execution flow, and if we follow state 1’s library calls we should see that the execution flows match.
Hooking process creation
Let’s write one last hook. What happens if our hooked process spawns a new process? This is pretty common for “dropper” malware, and indeed WannaCry does this by loading an executable (tasksche.exe) from a resource, writing it to disk and then running it (via CreateProcessA). When this happens, we are totally blind to what this new process is doing: both in terms of injecting symbolic data via our hooks and tracking its behaviour with S2E (e.g. via the LibraryCallMonitor plugin).
We can solve the former (losing our ability to inject symbolic data into the new process) by hooking CreateProcessA and using the EasyHook API to inject malware-hook into this new process. The following code achieves this:
// Don't forget to add CreateProcessA to the functionsToHook, hookFunctions and
// hooks arrays
static BOOL WINAPI CreateProcessAHook(
LPCSTR lpApplicationName,
LPSTR lpCommandLine,
LPSECURITY_ATTRIBUTES lpProcessAttributes,
LPSECURITY_ATTRIBUTES lpThreadAttributes,
BOOL bInheritHandles,
DWORD dwCreationFlags,
LPVOID lpEnvironment,
LPCSTR lpCurrentDirectory,
LPSTARTUPINFOA lpStartupInfo,
LPPROCESS_INFORMATION lpProcessInformation
) {
Message("Intercepted CreateProcessA(%s, %s, %p, %p, %d, %d, %p, %s, %p, %p)\n",
lpApplicationName, lpCommandLine, lpProcessAttributes,
lpThreadAttributes, bInheritHandles, dwCreationFlags, lpEnvironment,
lpCurrentDirectory, lpStartupInfo, lpProcessInformation);
// Get this DLL's path
HMODULE hDll = NULL;
DWORD hModFlags = GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT;
if (!GetModuleHandleEx(hModFlags, (LPCTSTR)&Message, &hDll)) {
Message("Failed to retrieve DLL handle: 0x%X\n", GetLastError());
goto default_create_process;
}
// MAX_PATH_LEN must also be defined in malware-hook.cpp (inject.c uses 256)
WCHAR dllPath[MAX_PATH_LEN];
if (!GetModuleFileNameW(hDll, dllPath, MAX_PATH_LEN)) {
Message("Failed to retrieve DLL path: 0x%X\n", GetLastError());
goto default_create_process;
}
// Create the new process, but force it to be created in a suspended state
if (!CreateProcessA(lpApplicationName, lpCommandLine, lpProcessAttributes,
lpThreadAttributes, bInheritHandles, dwCreationFlags | CREATE_SUSPENDED,
lpEnvironment, lpCurrentDirectory, lpStartupInfo, lpProcessInformation)) {
Message("Failed to create suspended process: 0x%X\n", GetLastError());
goto default_create_process;
}
// Inject ourselves into the new, suspended process.
// NativeInjectionEntryPoint will call RhWakeUpProcess, which will kick
// the new process out of the suspended state
NTSTATUS result = RhInjectLibrary(lpProcessInformation->dwProcessId,
lpProcessInformation->dwThreadId, EASYHOOK_INJECT_DEFAULT,
#if defined(_M_IX86)
dllPath, NULL,
#elif defined(_M_X64)
NULL, dllPath,
#else
#error "Platform not supported"
#endif
NULL, 0);
if (FAILED(result)) {
Message("RhInjectLibrary failed: %S\n", RtlGetLastErrorString());
goto default_create_process;
}
Message("Successfully injected %S into %s %s (PID=0x%x)\n", dllPath,
lpApplicationName, lpCommandLine, lpProcessInformation->dwProcessId);
return TRUE;
default_create_process:
return CreateProcessA(lpApplicationName, lpCommandLine, lpProcessAttributes,
lpThreadAttributes, bInheritHandles, dwCreationFlags, lpEnvironment,
lpCurrentDirectory, lpStartupInfo, lpProcessInformation);
}
This hook will start the new process in a suspended state and inject itself into the new process. malware-hook’s NativeInjectionEntryPoint function is then responsible for waking the process up.
This solves the problem of injecting symbolic data into a new process started by WannaCry. What about tracking this new process’ behaviour in S2E? Unfortunately, this requires a bit more work. One approach could be to write an S2E plugin that listened for OSMonitor’s onProcessLoad signal. If a new process was found to originate from the WannaCry process, we could add the new child process to ProcessExecutionDetector’s tracked modules. LibraryCallMonitor would then start emitting onLibraryCall events for this new process, allowing us to track its behaviour too. Because I wanted to avoid writing S2E plugins in this post, I’ll leave this “as an exercise for the reader”.
One last problem exists: The original WannaCry process terminates after it starts tasksche.exe. This causes malware-inject to also terminate (remember it calls WaitForSingleObject), leading to bootstrap.sh killing the current (and only active) state. Unfortunately, this means that S2E will terminate before we get to see WannaCry do something interesting (like encrypt our data).
The hacky way to fix this: add a sleep command after the call to execute in bootstrap.sh (don’t forget to set an appropriate amount of time to sleep for). This is hacky because it means that we’ll waste time sleeping in state 0 after WannaCry exits (and does nothing interesting). A better approach is to wait for tasksche.exe (and any other child processes) to terminate. Let’s add a function to do this:
// std::set and std::vector are used below; add these includes at the top of
// malware-hook.cpp
#include <set>
#include <vector>
// Set a sensible timeout value (in milliseconds). Can also be INFINITE
#define CHILD_PROCESS_TIMEOUT (10 * 1000)
/// Keep track of child processes (such as tasksche.exe)
static std::set<DWORD> childPids;
static BOOL WaitForChildProcesses(DWORD timeout) {
BOOL retCode = TRUE;
if (childPids.size() > 0) {
// Convert the set of PIDS to a list of handles with the appropriate permissions
std::vector<HANDLE> childHandles;
for (DWORD pid : childPids) {
Message("Getting handle to process 0x%x\n", pid);
HANDLE childHandle = OpenProcess(SYNCHRONIZE | PROCESS_QUERY_INFORMATION,
FALSE, pid);
if (childHandle) {
childHandles.push_back(childHandle);
} else {
Message("Unable to open child process 0x%x: 0x%X\n", pid, GetLastError());
return FALSE;
}
}
// Wait for the processes to terminate
Message("Waiting %d ms for %d children processes to terminate...\n",
timeout, childHandles.size());
DWORD waitRes = WaitForMultipleObjects(childHandles.size(),
childHandles.data(), TRUE, timeout);
switch (waitRes) {
case WAIT_FAILED:
Message("Failed to wait for child processes: 0x%X\n", GetLastError());
retCode = FALSE;
break;
case WAIT_TIMEOUT:
Message("Timeout - not all child processes may have terminated\n");
break;
}
// Close all handles
for (HANDLE handle : childHandles) {
CloseHandle(handle);
}
}
return retCode;
}
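One limitation to keep in mind: WaitForMultipleObjects accepts at most MAXIMUM_WAIT_OBJECTS (64) handles per call, so the call above is expected to fail if the malware spawns more children than that. A possible workaround is to wait on the handles in batches. The portable C++ sketch below shows only the batching step; splitIntoWaitBatches and kMaxWaitObjects are hypothetical names, and the handle type is left generic so that the sketch builds without the Windows headers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Value of MAXIMUM_WAIT_OBJECTS in the Windows SDK.
static const std::size_t kMaxWaitObjects = 64;

// Split a list of handles into consecutive batches of at most maxPerBatch
// entries, preserving order. An empty input yields no batches.
template <typename T>
std::vector<std::vector<T>> splitIntoWaitBatches(const std::vector<T> &handles,
                                                 std::size_t maxPerBatch = kMaxWaitObjects) {
    assert(maxPerBatch > 0);
    std::vector<std::vector<T>> batches;
    for (std::size_t start = 0; start < handles.size(); start += maxPerBatch) {
        const std::size_t end = std::min(handles.size(), start + maxPerBatch);
        batches.emplace_back(handles.begin() + static_cast<std::ptrdiff_t>(start),
                             handles.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return batches;
}
```

Each batch could then be passed to WaitForMultipleObjects in turn, with the time already spent waiting subtracted from the remaining timeout.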
WaitForChildProcesses should be called when the hooked WannaCry process exits. We can do this by adding DllMain and checking for reason code DLL_PROCESS_DETACH:
BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved) {
switch (fdwReason) {
// Don't exit until all child processes have terminated (or a timeout is reached)
case DLL_PROCESS_DETACH:
return WaitForChildProcesses(CHILD_PROCESS_TIMEOUT);
}
return TRUE;
}
Finally, don’t forget to add the following code to CreateProcessAHook to track child processes. The child process should only be saved if it is successfully hooked (i.e. before returning TRUE).
// This function was defined previously
static BOOL WINAPI CreateProcessAHook(
LPCSTR lpApplicationName,
LPSTR lpCommandLine,
LPSECURITY_ATTRIBUTES lpProcessAttributes,
LPSECURITY_ATTRIBUTES lpThreadAttributes,
BOOL bInheritHandles,
DWORD dwCreationFlags,
LPVOID lpEnvironment,
LPCSTR lpCurrentDirectory,
LPSTARTUPINFOA lpStartupInfo,
LPPROCESS_INFORMATION lpProcessInformation
) {
// ...
// Save the newly-created process' PID
childPids.insert(lpProcessInformation->dwProcessId);
return TRUE;
// ...
}
If you comment out GRAPHICS=-nographic in launch-s2e.sh (to enable the QEMU GUI), you’ll eventually be rewarded with the following (depending on the value chosen for CHILD_PROCESS_TIMEOUT):
Conclusion and next steps
In this post we’ve looked at analysing Windows malware with S2E, essentially recreating David Brumley’s Minesweeper tool in S2E. Unlike programs we’ve looked at in previous posts, we had to come up with some new techniques to inject symbolic data into our Windows programs. We used EasyHook to hook “trigger” functions that are commonly used by malware to hide their behaviour. While this approach worked well for our two case studies (which were admittedly highly contrived), there are many avenues for improvement. These avenues include:
- Hooking more of the Windows API. Brumley and Moser describe a number of different trigger sources (e.g. network data, registry keys, etc.) that aren’t covered in this post.
- Building more complex hooks. For example, our InternetOpenUrlA hook was overly simplistic - it just returned a dummy handle allocated on the heap. If this handle was later passed to a function like InternetReadFile, we’d have to hook this function as well. This is essentially the “environment modelling” problem inherent in most symbolic execution engines.
- Hiding our hooks from the malware being analysed. Some ideas include porting Cuckoo Monitor to S2E or doing everything in an S2E plugin.
- A broader study on real malware. Is this type of symbolic execution even helpful for malware analysis? How common is trigger-based malware - can we get away with just doing a dynamic analysis in Cuckoo Sandbox? Are the obfuscation techniques discussed in Banescu’s work on Code Obfuscation Against Symbolic Execution Attacks used by malware authors, and if so how do they affect our analysis?
Hopefully this post gives you the necessary background and tools to go and look at some of these improvements. Maybe one day I’ll even find the time to look at some of them myself!
Edit
21/10/2018: I’ve updated this post with a less-hacky way of waiting for WannaCry’s tasksche.exe to start encrypting data.