LummaC2: Obfuscation Through Indirect Control Flow

Written by: Nino Isakovic, Chuong Dong


Overview

This blog post delves into the analysis of a control flow obfuscation technique employed by recent LummaC2 (LUMMAC.V2) stealer samples. In addition to the traditional control flow flattening technique used in older versions, the malware now leverages customized control flow indirection to manipulate the execution of the malware. This technique thwarts all binary analysis tools including IDA Pro and Ghidra, significantly hindering not only the reverse engineering process, but also automation tooling designed to capture execution artifacts and generate detections.

To provide insights to Google and Mandiant security teams, we developed an automated method for removing this protection layer through symbolic backward slicing. By leveraging the recovered control flow, we are able to rebuild and deobfuscate the samples into a format readily consumable for any static binary analysis platform.

Protection Components

Overview

An obfuscating compiler, which we will also informally refer to as an "obfuscator," is a transformation tool designed to enhance the security of software binaries by making them more resilient to binary analysis. It operates by transforming a given binary into a protected representation, thereby increasing the difficulty for the code to be analyzed or tampered with. These transformations are typically applied on a per-function basis, where the user selects the specific functions to protect.

Obfuscating compilers are distinct from packers, although they may incorporate packing techniques as part of their functionality. They fall under the broader classification of software protections, such as OLLVM, VMProtect, and Code Virtualizer, which provide comprehensive code transformation and protection mechanisms beyond simple packing. Notably, for all protected components, the original code will never be exposed in its original, unprotected form at any point during the runtime of a protected binary. It is also common for obfuscating compilers to mix the original compiler-generated code with obfuscator-introduced code. As a result, an analyst generally needs a comprehensive deobfuscator in order to analyze the binary.

The obfuscator employed by LummaC2 applies a multitude of transformations consistent with standard obfuscating compiler technology. In this post, we focus only on the newly introduced control flow protection scheme that we uncovered.

Our analysis strongly suggests that the authors of the obfuscator have intimate knowledge of the LummaC2 stealer. Certain parts of the protection, as described in the upcoming sections, are specialized to handle specific components of the LummaC2 stealer.

Dispatcher Blocks

The obfuscator transforms the control flow of a protected function into one guided by "dispatcher blocks," each consisting of a subset of the original instructions of the unprotected function together with new instructions introduced by the obfuscator. Each dispatcher block ends with an indirect jump that branches to a dynamically resolved destination stored in a register or memory address. The result turns the original, largely linear control flow into a disjointed series of scattered blocks. Each block is isolated, containing only the runtime logic necessary to transfer execution to its immediate successor block.

https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig1.max-1900x1900.png

Figure 1: Dispatcher blocks overview

We refer to all instructions generated by the obfuscator as "dispatcher instructions" to differentiate them from "original instructions." Dispatcher blocks used by the obfuscator can be categorized into two main types: unconditional and conditional dispatchers.

    • Unconditional dispatcher: This dispatcher type protects the majority of instructions in an obfuscated function. It consists of dispatcher instructions that fetch encoded offsets from a lookup table in the .data section and perform ADD and XOR operations on them to calculate the next destination to transfer execution to.

    • Conditional dispatcher: This dispatcher type protects either individual conditional jump instructions (e.g., jne or ja) or basic blocks that end with a conditional jump. Instead of a single encoded offset to calculate and transfer execution to, the conditional dispatcher fetches one of two possible encoded offsets depending on the result of the condition to test.

https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig2.max-2100x2100.png

Figure 2: Dispatcher block types

Conditional and unconditional dispatcher blocks are further categorized based on the distinct characteristics and layout of dispatcher instructions.

  • Register-based dispatcher: All calculations from dispatcher instructions operate solely on registers and always constitute the remaining instructions of the basic block.
  • Memory-based dispatcher: Dispatcher instructions operate on both registers and stack values for calculating the final jump destination and are also always the remaining instructions within the basic block.
  • Mixed-order dispatcher: A variant of register-based and memory-based dispatchers. The order and positions of dispatcher instructions in this layout are intertwined among original instructions that they are protecting instead of being placed at the end of the block.

https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig3.max-2100x2100.png

Figure 3: Obfuscating compiler dispatcher layouts

Dispatcher blocks can also exist standalone where they do not protect any original code. In such cases, they act as a single step responsible for continuing the control flow. 

Register-based Dispatcher Layout

Using the LummaC2 sample with MD5 hash 205e45e123aea66d444feaba9a846748 from the Google Threat Intelligence collection as a case study, we found that out of 2,009 dispatcher blocks processed, 1,981 are register-based, making this the most common dispatcher layout. This layout is applied to both conditional and unconditional dispatcher types and can occur in any protected function.

    00416630  mov eax, off_457C8C      ; Retrieve CONSTANT1 from .data section
    00416635  mov ecx, 22A7266Eh       ; Populate CONSTANT2
    0041663A  xor ecx, dword_457C94    ; XOR CONSTANT2 with CONSTANT3
                                       ; from the .data section
    00416640  add eax, ecx             ; ADD CONSTANT1 with the result
    00416642  inc eax                  ; Increment the result
    00416643  jmp eax                  ; Jump to the result

Figure 4: Register-based instruction dispatcher

By analyzing dispatcher blocks of this layout, we can derive some key characteristics of the protection. These blocks typically include mov instructions to fetch a value from the malware's .data section or populate a register with a constant. Next, xor and add instructions, followed by an inc instruction, perform arithmetic on the retrieved values, as in Figure 4. Finally, the dispatcher block ends with a jmp instruction that branches to the dynamically calculated value stored in a register.
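The arithmetic in Figure 4 can be replayed outside of a debugger. The following is a minimal sketch of that calculation, assuming 32-bit wraparound. The function name decode_dispatcher_target is ours, and the values in the usage line are made-up placeholders rather than the contents of the sample's .data section.

```python
MASK32 = 0xFFFFFFFF


def decode_dispatcher_target(constant1: int, constant2: int, constant3: int) -> int:
    """Replay the mov/xor/add/inc sequence of a register-based dispatcher.

    constant1: value loaded from the .data lookup table (off_XXXXXX)
    constant2: immediate moved into the scratch register
    constant3: value XORed in from the .data section (dword_XXXXXX)
    """
    scratch = (constant2 ^ constant3) & MASK32   # xor ecx, dword_...
    target = (constant1 + scratch) & MASK32      # add eax, ecx
    return (target + 1) & MASK32                 # inc eax


# Hypothetical inputs, chosen only to exercise the arithmetic.
example = decode_dispatcher_target(0x00416000, 0x22A7266E, 0x22A7206D)
print(hex(example))  # 0x416604
```

In practice the three inputs would be read from the instruction operands and the sample's .data section; that extraction step is not shown here.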

This final indirect jump obfuscates the function's original control flow. It breaks the control flow recovery algorithms of tools like IDA Pro, which are unable to recover the jump destination statically, hindering both disassembly and decompilation.

https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig5.max-1200x1200.png

Figure 5: IDA Pro's disassembly and decompiler views of a protected subroutine

By identifying the common patterns within these dispatcher instructions, it's possible to differentiate them from the function's core instructions, which is crucial for lifting the protection and deobfuscating the function.

Another observation is that the obfuscator produces duplicated original instructions when injecting its dispatcher instructions. Our assumption is that the obfuscator does not want to reallocate original instruction blocks when injecting the dispatcher code. As a result, it resolves this by copying those instructions to a new block at the destination.

    0041665A  push 0FFFFFFF6h          ; Duplicated instruction
    0041665C  call ds:GetStdHandle     ; Duplicated instruction
    00416662  call sub_41A4A0          ; Duplicated instruction
    00416667  push 0FFFFFFF6h          ; Original instruction. Last dispatcher
                                       ; block will jump here
    00416669  call ds:GetStdHandle     ; Original instruction of next block
    0041666F  call sub_41A4A0          ; Original instruction of next block
    00416674  mov ecx, off_457CB0      ; Next dispatcher instructions
    0041667A  mov edx, 9148854h
    0041667F  xor edx, dword_457CB4
    00416685  add ecx, edx
    00416687  inc ecx
    00416688  jmp ecx

Figure 6: Duplicated instructions between two dispatcher blocks

Memory-based Dispatcher Layout

Memory-based dispatcher blocks appear significantly less frequently, as there are only 28 dispatchers of this type in the 2,009 blocks processed. Unlike the register-based layout, this layout relies on both registers and stack values for calculating and jumping to the destination. An example of this layout is shown in Figure 7, where the add dispatcher instruction adds a value stored on the stack to a register.

    0044AA3A  mov edi, [esi+50h]           ; esi = esp in previous instruction
    0044AA3D  cmp edi, [esi+98h]
    0044AA43  setb bl
    0044AA46  mov edi, off_46C030[ebx*4]
    0044AA4D  add edi, [esi+9Ch]           ; Dispatcher instruction. Adding a stack
                                           ; value to edi (jump destination)
    0044AA53  mov ebx, [esi+0A0h]
    0044AA59  jmp edi                      ; Jumping to edi

Figure 7: Dispatcher utilizing stack values to calculate the indirect jump's destination

In a smaller number of cases, we encounter dispatcher blocks of this layout ending with a jmp instruction that does not branch to a register value. Instead, it utilizes a value stored on the stack to determine the jump target.

    0041CCB4  mov eax, [esi+5Ch]
    0041CCB7  mov [eax], edi
    0041CCB9  jmp dword ptr [esi+14h]      ; Dispatcher jump to a stack value

Figure 8: Dispatcher with memory-based indirect jump

Mixed-order Dispatcher Layout

Mixed-order dispatcher layout is a variant of the register-based and memory-based dispatcher layouts. There are 12 memory-based and 28 register-based dispatcher blocks that fall into this mixed-order category.

Most dispatcher instructions are placed at the tail of an original instruction or a sequence of original instructions. However, this can vary, and parts of the dispatcher block can also be split up and intertwined with the original instructions. This unpredictable placement adds another layer of complexity to the deobfuscation process.

    Dispatcher instructions:
    0041E847  mov eax, 0F5A88CDAh          ; Dispatcher instruction
    0041E84C  xor eax, dword_459880        ; Dispatcher instruction
    0041E852  mov ecx, off_459878          ; Dispatcher instruction
    0041E858  add eax, ecx                 ; Dispatcher instruction
    0041E85A  inc eax                      ; Dispatcher instruction

    Original instructions:
    0041E85B  mov ebx, [esi+48h]
    0041E85E  mov ecx, [ebp+10h]
    0041E861  mov [ebx], ecx
    0041E863  mov edi, [esi+2Ch]
    0041E866  mov ecx, [ebp+0Ch]
    0041E869  mov [edi], ecx
    0041E86B  mov edi, [esi+0Ch]
    0041E86E  mov ecx, [esi+20h]
    0041E871  mov dword ptr [edi], 0
    0041E877  mov dword ptr [ecx], 0
    0041E87D  xorps xmm0, xmm0
    0041E880  movups xmmword ptr [edx+4], xmm0
    0041E884  movups xmmword ptr [edx+14h], xmm0
    0041E888  movups xmmword ptr [edx+24h], xmm0
    0041E88C  mov dword ptr [edx+38h], 0
    0041E893  mov dword ptr [edx+34h], 0
    0041E89A  mov dword ptr [edx], 3Ch
    0041E8A0  mov dword ptr [edx+8], 0FFFFFFFFh
    0041E8A7  mov dword ptr [edx+14h], 0FFFFFFFFh
    0041E8AE  mov dword ptr [edx+30h], 0FFFFFFFFh
    ---------------------------------------------------
    0041E8B5  jmp eax                      ; Indirect jump

Figure 9: Mixed-order dispatcher example

Conditional Dispatcher

Conditional dispatchers deserve extra attention as they introduce more logic than unconditional ones. It is also important to note that not all conditional branches are obfuscated. We have identified 379 conditional branches within the case study sample that remain in their original state. These appear in tight loops and heavy string processing routines, and are likely left out of the protection scheme because obfuscating them would severely degrade performance.

The structure of conditional dispatcher blocks exhibits a slight variation from that of unconditional dispatchers. Given that the intent is to protect conditional logic, there will always be two possible outcomes:

  • The taken branch, followed when the condition is satisfied

  • The fallthrough branch, followed when the condition is not satisfied

The obfuscator emits a table of paired entries for each conditional branch. The table is indexed by the result of the condition, which is either false or true (0 or 1), and each index corresponds to one of the two branches that can be taken.
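As a rough illustration of the paired-entry lookup, the sketch below selects one of two encoded entries by the condition result and then applies the same xor/add/inc decoding seen in the unconditional dispatcher. The function name, the table pair, and the key constants are hypothetical; real values come from the sample's .data section.

```python
MASK32 = 0xFFFFFFFF


def resolve_conditional_target(pair, condition, constant2, constant3):
    """Pick the encoded entry by condition result, then decode it.

    pair: (entry_for_index_0, entry_for_index_1), mirroring off_XXXXXX[index*4]
    condition: Python bool standing in for the setcc result
    """
    index = 1 if condition else 0            # setcc writes 0 or 1 into the index register
    encoded = pair[index]                    # mov reg, off_...[index*4]
    key = (constant2 ^ constant3) & MASK32   # mov/xor decoding sequence
    return (encoded + key + 1) & MASK32      # add, inc


# Hypothetical table pair and key material.
pair = (0x00401000, 0x00402000)
fallthrough = resolve_conditional_target(pair, False, 0x10, 0x30)
taken = resolve_conditional_target(pair, True, 0x10, 0x30)
print(hex(fallthrough), hex(taken))
```

Because both entries share one decoding key, recovering both successors of a conditional dispatcher only requires evaluating the same expression twice with the index forced to 0 and to 1.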

Conditional dispatchers fall into three distinct categories.

  1. Standard conditional logic
    • The obfuscator accounts for all common conditional jump conditions
    • The condition code is evaluated using one of the following instructions:
      • test <reg>, <reg>
      • cmp <reg>, <imm>
    • setcc is then used to capture the original conditional jump logic. That is to say, every original conditional jump instruction is reflected as its setcc counterpart (e.g., a jnz becomes a setnz)
  2. Loop logic
    • Non-infinite loops require conditional logic as a means of exiting the loop body. The obfuscator implements this using three distinct dispatcher blocks linked with an arbitrary subset of dispatcher blocks that represent the loop body
      • Initialization block
        • Initializes the default branch target via an "exit condition" flag that is always set to false (so that execution is transferred to the start of the loop body)
      • Update block
        • Updates the exit condition flag based on the processing of either the initialization block or logic stemming from the loop body
      • Exit-check block
        • Checks whether the exit condition flag is either set to exit the loop or transfer execution back to the loop body
  3. Syscall logic
    • This category is specific to a LummaC2 component that invokes Windows syscalls and disguises how the resulting NTSTATUS code is verified. This is effectively a conditional dispatcher that implements the NT_SUCCESS macro.
    • The following instruction sequence determines the success of a syscall by inverting the bits of the returned NTSTATUS and inspecting the resulting sign bit. A value of 1 indicates a successful syscall while 0 indicates a failed syscall.
      • not eax
      • shr eax, 0x1F

Standard Conditional Dispatcher Type

Continuing with using the case study sample from earlier, we find the standard conditional dispatcher type occurring 987 times out of the 1,063 conditional dispatchers.

Figure 10 and Figure 11 illustrate this type where the conditional value is tested against both zero and a non-zero constant. The first figure shows the conditional value being compared to 0 using a test instruction. The second shows the conditional value being evaluated against a non-zero constant 0x5A4D using a cmp instruction.

    0041656E  call sub_41C610                ; subroutine call at 0x41C610
    00416573  mov esi, eax                   ; save the return value (eax) into esi
    00416575  xor eax, eax                   ; clear out the index
    00416577  test esi, esi                  ; evaluate the result
    00416579  setnz al                       ; Set al if conditional value is not zero
    0041657C  mov eax, off_457CF4[eax*4]     ; fetch appropriate encoded branch target
    00416583  mov ecx, 0C09E0A35h            ; start the decoding sequence
    00416588  xor ecx, dword_457CFC
    0041658E  add eax, ecx
    00416590  inc eax
    00416591  jmp eax                        ; transfer execution to the decoded
                                             ; branch value

Figure 10: Conditional dispatcher with the conditional value being compared to 0

    0044DD15  movzx ecx, word ptr [edi]      ; fetch the 16-bit value to evaluate
    0044DD18  xor edx, edx                   ; clear out the index
    0044DD1A  cmp ecx, 5A4Dh                 ; compare to the 0x5A4D constant
    0044DD20  setnz dl                       ; set the index to the result
    0044DD23  mov ecx, off_46F304[edx*4]     ; fetch appropriate encoded branch target
    0044DD2A  mov edx, 9EC9743Dh             ; start the decoding sequence
    0044DD2F  xor edx, dword_46F30C
    0044DD35  add ecx, edx
    0044DD37  inc ecx
    0044DD38  jmp ecx                        ; transfer execution to the decoded
                                             ; branch value

Figure 11: Conditional dispatcher with the conditional value being compared to a non-zero constant

Loop Conditional Dispatcher Type

Figure 12, Figure 13, and Figure 14 illustrate the loop conditional dispatcher type, which occurs 42 times within the sample. It is always a collection of linked dispatcher blocks that includes the loop initialization block, the loop body (an arbitrary collection of dispatcher blocks specific to the loop logic), an update block, and finally an exit-check block.

The initialization block sets the stage for a loop by establishing an "exit condition" flag and initializing it to false, ensuring the loop body executes at least once. The update block then modifies this flag based on the results of the initialization block or the loop body's logic. Finally, the exit-check block examines the flag's state to determine whether to continue iterating or exit the loop.

    0044CD55  mov dword_470A30, ebx
    0044CD5B  mov edi, [ebp-34h]
    0044CD5E  xchg ax, ax
    0044CD60  mov eax, off_46CB3C
    0044CD65  mov ecx, 74F906B5h
    0044CD6A  xor ecx, dword_46CB44
    0044CD70  add eax, ecx
    0044CD72  inc eax
    0044CD73  mov dword ptr [ebp-30h], 0
    0044CD7A  mov dword ptr [ebp-18h], 0     ; conditional flag, initially 0 to
                                             ; reflect transfer to the loop body
                                             ; not the loop exit
    0044CD81  mov dword ptr [ebp-28h], 0
    0044CD88  mov dword ptr [ebp-40h], 0
    0044CD8F  jmp eax

Figure 12: A loop initialization block

    0044C108  mov ecx, [ebp-5Ch]
    0044C10B  mov eax, [ecx+1]
    0044C10E  add eax, ecx
    0044C110  add eax, 5
    0044C113  mov [ebp-18h], eax             ; instructions that update the
                                             ; conditional flag
    0044C116  mov eax, off_46CFE4
    0044C11B  mov ecx, 681DADB7h
    0044C120  xor ecx, dword_46CFEC
    0044C126  add eax, ecx
    0044C128  inc eax
    0044C129  nop dword ptr [eax+00000000h]
    0044C130  mov ecx, [ebp-18h]
    0044C133  mov [ebp-28h], ecx
    0044C136  jmp eax

Figure 13: A loop update block

    0044C2AD  xor eax, eax
    0044C2AF  mov edx, [ebp-18h]             ; evaluate the conditional flag
    0044C2B2  test edx, edx
    0044C2B4  setnz al
    0044C2B7  mov ecx, 27DC8BC9h
    0044C2BC  xor ecx, dword_46D248
    0044C2C2  mov eax, off_46D240[eax*4]     ; fetch the target
    0044C2C9  add eax, ecx
    0044C2CB  inc eax
    0044C2CC  mov [ebp-28h], edx
    0044C2CF  mov ebx, [ebp-20h]
    0044C2D2  jmp eax                        ; Jump back to a loop body block
                                             ; or exit the loop

Figure 14: An exit-check block

Syscall Conditional Dispatcher Type

Dispatchers of this type are used for checking the return values of LummaC2-specific function calls that perform a syscall. They appear only 34 times in the case study sample. In these functions, LummaC2 decrypts the shellcode in Figure 15 and executes it in memory to make a particular syscall.

    mov eax, <syscall ID>
    mov edx, win32u.Wow64SystemServiceCall
    call edx
    ret <imm16>

Figure 15: Shellcode to call Windows system call

In other cases, the malware makes direct calls to Windows Native APIs instead of utilizing the shellcode in Figure 15.

The conditional dispatcher for this type implements the NT_SUCCESS macro by checking whether the returned NTSTATUS code indicates success. It inverts the bits of the NTSTATUS value and shifts the sign bit of the result into the branch target index, which will be either 0 or 1. Because NT_SUCCESS treats any NTSTATUS with a clear sign bit (a non-negative value, including the zero-valued STATUS_SUCCESS) as success, a successful syscall results in the true branch (index 1) being taken, and a failed syscall results in the false branch (index 0) being taken.

    00424D95  call sub_44EDA0                ; wrapper function to perform a syscall
    00424D9A  add esp, 0Ch
    00424D9D  not eax                        ; invert all bits of the NTSTATUS return value
    00424D9F  shr eax, 1Fh                   ; isolate the sign bit to capture the
                                             ; result and in turn, the index to
                                             ; the corresponding branch
    00424DA2  mov eax, off_45DC9C[eax*4]     ; fetch the corresponding branch target
    00424DA9  mov ecx, 31637ACh
    00424DAE  xor ecx, dword_45DCA4
    00424DB4  add eax, ecx
    00424DB6  inc eax
    00424DB7  jmp eax

Figure 16: Conditional dispatcher to check syscall return values
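The not/shr pair in Figure 16 can be checked with a few lines of Python. This is a small sketch of the index calculation only; the function name nt_success_index is ours, and the status constants are standard Windows NTSTATUS values used here as test inputs.

```python
def nt_success_index(ntstatus: int) -> int:
    """Mirror `not eax` / `shr eax, 1Fh` on a 32-bit NTSTATUS value."""
    inverted = ~ntstatus & 0xFFFFFFFF   # not eax, truncated to 32 bits
    return inverted >> 31               # shr eax, 1Fh: 1 when the sign bit was clear


STATUS_SUCCESS = 0x00000000
STATUS_PENDING = 0x00000103
STATUS_ACCESS_DENIED = 0xC0000022

for status in (STATUS_SUCCESS, STATUS_PENDING, STATUS_ACCESS_DENIED):
    print(hex(status), "-> index", nt_success_index(status))
```

Any status with the high bit set (warning and error severities) yields index 0, which matches the behavior of NT_SUCCESS.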

Obfuscated Function Recovery

Original Instruction Recovery 

Recovering the original control flow of a protected function requires us to differentiate between the obfuscator's injected dispatcher instructions and the function's original instructions. To solve this, we decide to use symbolic backward slicing, a program analysis technique that identifies instructions that influence a specific register or memory address at a given point within a simulated execution on an intermediate representation. In this context, we employ backward slicing to do the following:

  • Isolate the dispatcher instructions from the original instructions

  • Determine which explicit instructions calculate the final indirect transfer of control

In our deobfuscator design, we leverage the Triton symbolic execution engine to conduct the core of the recovery. Triton implements backward slicing APIs that we can use directly. When executing the program, Triton maintains a set of symbolic expressions that represent the values of registers and memory addresses. These expressions are stored as Abstract Syntax Trees (ASTs), where each tree node represents an operation with operands that result from the execution flow. Triton refers to this step as "processing": it emulates each instruction, simulates its effects on registers and memory, and records the result as an AST.

This is a powerful abstraction that allows us to reason about the deobfuscation at an AST level and ignore the verbose disassembly produced by the obfuscator. 

To distinguish dispatcher instructions, we'll focus on the destination of the final indirect jump in a dispatcher block. By looking up this destination in the constructed ASTs after all dispatcher instructions are processed, we can extract its corresponding symbolic expressions. 

Figure 17 shows the AST of the destination register eax at an indirect jump. This AST represents all symbolic expressions from the result of the symbolic processing of the corresponding instructions that influence the value of the destination register before the indirect jump is executed.

https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig17.max-2100x2100.png

Figure 17: ASTs of the destination register after the indirect jump instruction is processed

Using Triton's APIs, we can extract a subset (or slice) of the processed expressions that collectively contribute to the final destination address of an indirect jump. For each expression in the slice, we can map it back to the specific dispatcher instruction that generates it. This mapping is possible because Triton maintains the association between instructions and the symbolic expressions they produce during its execution.

A snippet of the code used to perform backward slicing to distinguish dispatcher instructions from the original ones is shown in Figure 18.

    # `context` is an initialized TritonContext and `pc` is the current
    # program counter; both are maintained by the surrounding emulation loop.
    from triton import Instruction, OPCODE, OPERAND

    # Retrieve the bytes of the instruction at the current program counter
    instructionBytes = context.getConcreteMemoryAreaValue(pc, 16)

    # Create a Triton Instruction object from the retrieved bytes
    instruction = Instruction(pc, instructionBytes)

    # Process the instruction using the Triton context
    context.processing(instruction)

    # Scan for dispatcher jump instruction
    if instruction.getType() == OPCODE.X86.JMP:
        # Extract the operand of the JMP instruction
        jmpOperand = instruction.getOperands()[0]

        # Process JMP instructions with register operand only
        if jmpOperand.getType() == OPERAND.REG:
            # Get symbolic expression of destination register
            destRegExpression = context.getSymbolicRegisters()[jmpOperand.getId()]

            # Backward slice on the destination register
            slicing = context.sliceExpressions(destRegExpression)

            # Iterating through the slices
            for _, sliceInstr in sorted(slicing.items()):
                # Print out the disassembled instruction of each slice
                sliceInstrDisassembly = sliceInstr.getDisassembly()
                print('\t[Slice]', sliceInstrDisassembly)

Figure 18: Triton code to perform backward slicing to recover all dispatcher instructions

Here, we continuously execute instructions until a jmp instruction is encountered. If the instruction's operand is a register, we retrieve its symbolic expression and perform a backward slice to recover all instructions that influenced its value. Because Triton keeps the original disassembly alongside each symbolic expression, we can extract the exact dispatcher instructions that produce the slice, and not merely their AST representation.

Once the complete backward slice for the destination has been retrieved, we can confidently distinguish the dispatcher instructions from the original instructions within the function. This distinction holds true regardless of the placement or order of the dispatcher instructions within a protected block, since the backward slice contains only those instructions that influence the final value.

    Backward slicing output:
    ...
    [Processing] 0x416530: lea eax, [esp + 8]
    [Processing] 0x416534: push eax
    [Processing] 0x416535: call dword ptr [0x454a18]
    [Processing] 0x41653b: mov eax, esp
    [Processing] 0x41653d: push eax
    [Processing] 0x41653e: call dword ptr [0x454a14]
    [Processing] 0x416544: mov eax, dword ptr [0x457c1c]
    [Processing] 0x416549: mov ecx, 0xa15bd01f
    [Processing] 0x41654e: xor ecx, dword ptr [0x457c24]
    [Processing] 0x416554: add eax, ecx
    [Processing] 0x416556: inc eax
    [Processing] 0x416557: jmp eax
        [Slice] 0x416544: mov eax, dword ptr [0x457c1c]
        [Slice] 0x416549: mov ecx, 0xa15bd01f
        [Slice] 0x41654e: xor ecx, dword ptr [0x457c24]
        [Slice] 0x416554: add eax, ecx
        [Slice] 0x416556: inc eax
    ...

Figure 19: Output for the code in Figure 18 to distinguish dispatcher instructions
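To make the idea behind the slice concrete without requiring Triton, the following toy sketch performs a backward slice over a hand-annotated copy of the trace in Figure 19. It is not how Triton works internally: the Insn type, the read/write annotations, and the simplified treatment of push and call are our own assumptions for illustration.

```python
from typing import List, NamedTuple, Set, Tuple


class Insn(NamedTuple):
    addr: int
    text: str
    writes: Tuple[str, ...]   # locations the instruction defines
    reads: Tuple[str, ...]    # locations the instruction uses


def backward_slice(trace: List[Insn], target: str) -> List[Insn]:
    """Return the instructions in `trace` that the final value of `target` depends on."""
    live: Set[str] = {target}
    picked: List[Insn] = []
    for insn in reversed(trace):
        if live.intersection(insn.writes):
            picked.append(insn)
            live.difference_update(insn.writes)  # this definition is now accounted for
            live.update(insn.reads)              # its inputs must be traced next
    return list(reversed(picked))


# Manually annotated trace (everything before `jmp eax`); annotations are simplified.
trace = [
    Insn(0x416530, "lea eax, [esp + 8]", ("eax",), ("esp",)),
    Insn(0x416534, "push eax", ("stack",), ("eax",)),
    Insn(0x416535, "call dword ptr [0x454a18]", ("eax",), ("stack",)),
    Insn(0x41653B, "mov eax, esp", ("eax",), ("esp",)),
    Insn(0x41653D, "push eax", ("stack",), ("eax",)),
    Insn(0x41653E, "call dword ptr [0x454a14]", ("eax",), ("stack",)),
    Insn(0x416544, "mov eax, dword ptr [0x457c1c]", ("eax",), ("mem_457c1c",)),
    Insn(0x416549, "mov ecx, 0xa15bd01f", ("ecx",), ()),
    Insn(0x41654E, "xor ecx, dword ptr [0x457c24]", ("ecx",), ("ecx", "mem_457c24")),
    Insn(0x416554, "add eax, ecx", ("eax",), ("eax", "ecx")),
    Insn(0x416556, "inc eax", ("eax",), ("eax",)),
]

for insn in backward_slice(trace, "eax"):
    print(f"[Slice] {insn.addr:#x}: {insn.text}")
```

Walking the trace in reverse while tracking which locations still need an explanation selects the same five dispatcher instructions listed in Figure 19 and leaves the API call setup untouched.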

Control Flow Recovery

In addition to recovering all original instructions of the function, we must also recover the original control flow. While instructions are processed dynamically, Triton allows us to determine the concrete destination value of the final indirect jump in the dispatcher block. With this, we can trace the program's execution flow and reconstruct the order in which dispatcher blocks are executed.

To explore all possible execution paths within the function, we employ a depth-first search (DFS) traversal algorithm. 

We begin by exploring a single path, following the control flow dictated by the obfuscator's indirect jumps. This continues until the path reaches a termination point, such as a ret instruction or a program-ending API call (e.g., ExitProcess).

In our deobfuscator design, we default to viewing all of these protected jumps as jnz instructions by forcing the index register to be 1 in the main execution path being processed. When encountering a protected conditional jump, we assume the condition is met and continue exploring the path that follows the jump. However, we don't discard the alternative path. The alternative path is stored in a queue-like data structure. This allows us to revisit these paths later when we've exhausted all possibilities on the current path.

By systematically exploring all paths using DFS and handling conditional jumps strategically, we can reconstruct the original control flow that has been obfuscated with the compiler's indirect jumps.
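The traversal described above might look like the following sketch. It runs over a small hypothetical block graph instead of emulated code, prefers the index-1 successor of each conditional dispatcher, and defers the index-0 successor. Keeping deferred paths in a Python list used as a stack is our choice for this illustration and is not necessarily the data structure used in the actual tool.

```python
from typing import Dict, List, Tuple

# Hypothetical block graph: block id -> successors.
# A conditional dispatcher has two successors, ordered (index-0 target, index-1 target).
BLOCKS: Dict[str, Tuple[str, ...]] = {
    "entry": ("check",),
    "check": ("body", "exit"),
    "body": ("check",),
    "exit": (),   # termination point, e.g. a block ending in ret
}


def explore(entry: str, blocks: Dict[str, Tuple[str, ...]]) -> List[str]:
    """Visit every reachable block depth-first, following index 1 first."""
    order: List[str] = []
    seen = set()
    pending = [entry]                            # deferred paths, most recent first
    while pending:
        block = pending.pop()
        while block is not None and block not in seen:
            seen.add(block)
            order.append(block)
            successors = blocks[block]
            if len(successors) == 2:
                pending.append(successors[0])    # save the index-0 path for later
                block = successors[1]            # follow the index-1 path now
            elif len(successors) == 1:
                block = successors[0]
            else:
                block = None                     # path terminated
    return order


print(explore("entry", BLOCKS))  # ['entry', 'check', 'exit', 'body']
```

The seen set prevents loops in the recovered control flow from causing the exploration to run forever.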

Deobfuscation: Rebuilding Original Function

With the original instructions and execution paths identified, we can deobfuscate the sample by rebuilding the functions we have processed. Our goal is to ensure the deobfuscated functions are restored to their original state, preserving their original semantics and removing all traces of the obfuscator.

Instruction Rewriting

When rebuilding, we can overwrite the original protected function with the deobfuscated instructions. Since a deobfuscated function always has fewer instructions than its obfuscated counterpart, the rebuilt function is guaranteed to fit in the original space. The remaining space can be padded with a standard compiler padding byte such as 0xCC (int3).
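A minimal sketch of the overwrite-and-pad step, operating on raw byte strings, is shown here. The function name overwrite_function and the sample bytes are hypothetical, and the size check is our addition as a safety net.

```python
INT3 = b"\xcc"   # int3, commonly used as padding between functions


def overwrite_function(original: bytes, rebuilt: bytes) -> bytes:
    """Place rebuilt code at the function start and pad the leftover bytes."""
    if len(rebuilt) > len(original):
        raise ValueError("rebuilt function does not fit in the original region")
    return rebuilt + INT3 * (len(original) - len(rebuilt))


# Hypothetical 8-byte region rebuilt into a single `ret`.
patched = overwrite_function(b"\x90" * 8, b"\xc3")
print(patched.hex())  # c3cccccccccccccc
```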

The rewriting process involves writing instructions back from the function's entry point in the order they are processed and executed during the Triton analysis, excluding all dispatcher instructions. Here, we will address two specific cases involving indirect jumps originally added by the obfuscator.

The first case involves processing an unconditional dispatcher block. For this case, if the jump target has not been written yet, we simply omit the jump and continue writing instructions sequentially, so that the target block directly follows. If the jump target has already been written, we replace the indirect jump with a direct one that branches back to that target.

The second case for handling the jump instruction of a conditional dispatcher block is a bit more convoluted. Before tackling this, we must determine the original conditional jump type (e.g., jz, jnz, jl) based on the preceding setcc dispatcher instruction.

Since the indirect jump can target one of the two destinations given a condition, we must replace it with two instructions. The first instruction is a conditional jump to the first destination using the correct conditional jump type.

The second instruction can be either:

  • A conditional jump with the opposite type as the first, targeting the second destination.

  • A direct jump to the second destination. This is chosen for simplicity of our deobfuscator implementation.

    0041652B  call sub_4455F0     ; original instruction
    00416530  movzx eax, al       ; eax = al = return value
    00416533  test eax, eax       ; set flags
    00416535  jnz loc_416540      ; replacing indirect jmp with jnz for the first path
    0041653B  jmp loc_416554      ; insert a jmp for the second path

Figure 20: Replacing an indirect conditional jump with a jnz-jmp instruction pair
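The replacement in Figure 20 can be sketched as a small text-level rewrite. The helper names are hypothetical, and the sketch relies on the fact that x86 setcc and jcc mnemonics share the same condition-code suffix; it produces assembly text, not encoded bytes.

```python
from typing import List


def jcc_from_setcc(setcc: str) -> str:
    """Derive the conditional jump mnemonic from its setcc counterpart."""
    if not setcc.startswith("set") or len(setcc) <= 3:
        raise ValueError(f"not a setcc mnemonic: {setcc!r}")
    return "j" + setcc[3:]            # e.g. setnz -> jnz, setb -> jb


def rewrite_conditional_dispatcher(setcc: str, true_target: int, false_target: int) -> List[str]:
    """Return the two instructions that replace the dispatcher's indirect jump."""
    return [
        f"{jcc_from_setcc(setcc)} loc_{true_target:X}",   # index-1 destination
        f"jmp loc_{false_target:X}",                      # index-0 destination
    ]


pair = rewrite_conditional_dispatcher("setnz", 0x416540, 0x416554)
print(pair)  # ['jnz loc_416540', 'jmp loc_416554']
```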

Offset Relocation

The final step, relocation, addresses a remnant from our rebuilding process. As we remove dispatcher instructions and duplicated instructions, the rewritten instructions will occupy different locations from where they were in the original function. This displacement throws off the offsets of jump, call, and other memory-referencing instructions that are not position-independent, as they now need to refer to memory locations from their new addresses.

In our current implementation, we address this by parsing all of the memory-referencing instructions and calculating their correct offsets after deobfuscation. This involves tracking both the original and relocated addresses of each instruction. With this information, we can calculate the adjusted offset to reach the target memory reference and craft the correct encoding for each instruction.
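For relative branches, the adjustment boils down to recomputing a 32-bit displacement from the instruction's new address. The sketch below shows this for near jnz and jmp only, using the addresses from Figure 20 as inputs; the helper name is ours, and a full implementation would also need to cover call instructions and other memory-referencing encodings.

```python
import struct

JNZ_NEAR = b"\x0f\x85"   # jnz rel32
JMP_NEAR = b"\xe9"       # jmp rel32


def encode_near_branch(opcode: bytes, new_addr: int, target: int) -> bytes:
    """Encode a rel32 branch located at new_addr that must reach target."""
    length = len(opcode) + 4                   # opcode bytes plus 32-bit displacement
    rel32 = target - (new_addr + length)       # relative to the next instruction
    return opcode + struct.pack("<i", rel32)   # little-endian signed displacement


jnz_bytes = encode_near_branch(JNZ_NEAR, 0x416535, 0x416540)
jmp_bytes = encode_near_branch(JMP_NEAR, 0x41653B, 0x416554)
print(jnz_bytes.hex(), jmp_bytes.hex())  # 0f8505000000 e914000000
```

The displacement is measured from the end of the branch instruction, which is why the instruction length is added to the new address before subtracting.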

Final Result

By employing techniques described in this blog post, we have successfully developed a deobfuscation tool for this version of LummaC2. In the following figures, we see the result of our deobfuscator lifting the protection from two protected functions in the case study sample.

https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig21.max-1700x1700.png

Figure 21: Disassembly view of the subroutine at the binary's entrypoint before deobfuscation

https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig22.max-1800x1800.png

Figure 22: Decompiler view of the subroutine at the binary's entrypoint after deobfuscation

https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig23.max-1700x1700.png

Figure 23: Disassembly view of the subroutine at address 0x41EE50 before deobfuscation

https://storage.googleapis.com/gweb-cloudblog-publish/images/lummac2-obfuscation-fig24.max-1800x1800.png

Figure 24: Decompiler view of the subroutine at address 0x41EE50 after deobfuscation

As shown in these figures, the original instructions are now readily apparent, free from the clutter of dispatcher blocks added by the obfuscator. The control flow, once obscured by indirect jumps, is now clearly visible and can be recovered and decompiled using IDA Pro. After deobfuscating all protected functions, we can now analyze the original program to comprehend its capabilities and behaviors.

Conclusion

In this blog post, we have explored the inner workings of LummaC2's obfuscation technique using indirect jumps to manipulate control flow. By leveraging backward slicing and symbolic execution, we have been able to consistently identify the original instructions and eliminate dispatcher instructions added by the obfuscator. Furthermore, we have discussed strategies for deobfuscation, including rebuilding the original function from the recovered control flow and addressing relocation challenges.

While this blog post focuses on deobfuscating LummaC2 protected subroutines, the power of backward slicing as a binary analysis technique extends well beyond this specific case. We hope our exploration of deobfuscating LummaC2 through the use of backward slicing has provided valuable insights to fellow analysts tackling similar challenges in the ever-evolving realm of reverse engineering and malware analysis.

Indicators of Compromise

A Google Threat Intelligence Collection featuring indicators of compromise (IOCs) related to the activity described in this post is now available.

Host-Based IOCs

    MD5                               Associated Malware Family
    d01e27462252c573f66a14bb03c09dd2  LUMMAC.V2
    5099026603c86efbcf943449cd6df54a  LUMMAC.V2
    205e45e123aea66d444feaba9a846748  LUMMAC.V2
