Hacking Like It’s 1999 - Automating Analysis Like It’s 2021

Lacework Labs·June 14, 2021

Jared Stroud
Cloud Security Researcher – Lacework Labs

The Takeaways

Lacework Labs is releasing a Ghidra script that automates the extraction of the IRC server IPs/domains, channel, and channel credentials used by the Katien IRC bot and its variants.
Lacework Labs Ghidra Scripts

Summary

Lacework Labs has previously reported on the use of IRC bots in conjunction with cryptojacking attacks. These IRC bots give adversaries remote access to victim hosts via the IRC protocol. By releasing this Ghidra script, we hope to help researchers and incident responders automate the extraction of critical configuration information from Katien/Tsunami/Ziggystartux variants.

RFC 1459 – A Chat Protocol in a C2 World

The IRC protocol was originally defined in RFC 1459 in May 1993. While it was designed as a simple, lightweight chat protocol, it quickly became a common command-and-control (MITRE ATT&CK TA0011) protocol for various malware samples over the years. A couple of notable 2021 samples leveraging IRC include “FreakOut,” a worm leveraging VMware vCenter exploits for initial access, and the Windows container-focused malware “Siloscape.” This 28-year-old protocol is alive and well in the malware landscape.
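Part of IRC’s appeal as a C2 channel is how little it takes to speak it. Per RFC 1459, a message is a command followed by space-separated parameters (only the last of which may contain spaces, marked with a leading colon), terminated by CR-LF and capped at 512 characters including the CR-LF. The Python sketch below builds the kind of registration and JOIN lines a bot sends; the nickname, channel, and key are hypothetical placeholders, not values taken from any sample.

```python
MAX_LEN = 512  # RFC 1459 limit per message, including the trailing CR-LF


def irc_message(command: str, *params: str) -> bytes:
    """Build one RFC 1459 message such as b'JOIN #chan key\\r\\n'."""
    parts = [command]
    for i, param in enumerate(params):
        if any(c in param for c in "\r\n\0"):
            raise ValueError("CR, LF and NUL are not allowed in parameters")
        # Empty params, params with spaces, or params starting with ':'
        # must be sent as the final "trailing" parameter, prefixed by ':'.
        needs_colon = param == "" or " " in param or param.startswith(":")
        if needs_colon and i != len(params) - 1:
            raise ValueError(f"only the last parameter may contain spaces: {param!r}")
        parts.append(":" + param if needs_colon else param)
    line = (" ".join(parts) + "\r\n").encode("utf-8")
    if len(line) > MAX_LEN:
        raise ValueError("message exceeds the 512-byte limit")
    return line


# Hypothetical registration and channel join (placeholder values).
for msg in (irc_message("NICK", "bot123"),
            irc_message("USER", "bot123", "localhost", "localhost", "some bot"),
            irc_message("JOIN", "#channel", "channelkey")):
    print(msg)
```

Note that JOIN carries the channel key as its second parameter, so a bot’s channel and key travel together in a single line.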

Katien is a prolific IRC bot that Lacework Labs frequently encounters. Katien and its associated variants (Ziggystartux, Tsunami, etc.) are often paired with an exploitation script that leverages known exploits for initial access, as well as a cryptojacking utility. Notably, the Tsunami variant of Katien was baked into the Linux Mint ISO images in 2016, when the Linux Mint website was compromised.

Lacework Labs has identified Tsunami variants embedded within Docker images as well as deployed as stand-alone agents. Because these samples turn up so frequently, we automated the extraction of key configuration details from the agents via Ghidra. While you could write a script in your favorite programming language to identify and carve out valuable information from a binary, Ghidra gives you access to a powerful reverse engineering ecosystem that provides further insight into the malware variants your organization may come across.

The Configuration Structure of Katien & Variants

The source code of the Katien IRC bot (and its variants) begins with a configuration section of #define macros: the process name the bot runs under (#define FAKENAME), the IRC channel to join (#define CHAN), and, if applicable, the key needed to join that channel (#define KEY). Thankfully, the source code is available to researchers (and adversaries). If we compile the example code ourselves, we’ll have debug symbols that show how these hardcoded values are stored within the resulting binary.
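Because the source is public, one quick way to learn which values to hunt for in a compiled sample is to pull the configuration macros straight out of a C file. The sketch below is a simple regex-based extractor for string or single-token #define lines; the sample values are made up, and real source with multi-line or computed macros would need a proper preprocessor.

```python
import re

# Matches simple macros such as:  #define CHAN "#chan"   or   #define PORT 6667
DEFINE_RE = re.compile(r'^\s*#\s*define\s+(\w+)\s+(".*?"|\S+)', re.MULTILINE)

# Macro names discussed in the text; extend as needed for other variants.
CONFIG_MACROS = ("FAKENAME", "CHAN", "KEY")


def extract_config(source: str) -> dict:
    """Return the configuration macros found in a C source string."""
    config = {}
    for name, value in DEFINE_RE.findall(source):
        if name in CONFIG_MACROS:
            config[name] = value.strip('"')  # drop surrounding quotes
    return config


# Hypothetical configuration block (placeholder values).
sample = '''
#define FAKENAME "sshd"
#define CHAN "#example"
#define KEY "hunter2"
#define OTHER 42
'''
print(extract_config(sample))
```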

Figure 1 – Katien IRC Configuration

In an ELF binary, constant strings such as these hardcoded values are stored in the read-only data section, “.rodata”. Jumping to .rodata once Ghidra (or your preferred disassembler) finishes analyzing the binary gives a great starting point for identifying the configuration information required to connect to an IRC server. Figure 2 shows the start of the .rodata section in Ghidra, revealing three different IRC domains.
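To see what “jumping to .rodata” involves under the hood, the sketch below parses an ELF section header table with only the standard library, locates .rodata, and lists its printable strings with their virtual addresses. It assumes a little-endian 64-bit ELF (ELF32 samples use a different header layout) and intact section headers; it is a standalone illustration, not part of the Lacework Labs script.

```python
import re
import struct


def elf64_sections(data: bytes) -> dict:
    """Map section name -> (sh_addr, sh_offset, sh_size).

    Handles little-endian ELF64 only; ELF32 uses a different header layout.
    """
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)
    # Each header: name, type, flags, addr, offset, size, link, info, align, entsize
    headers = [struct.unpack_from("<IIQQQQIIQQ", data, e_shoff + i * e_shentsize)
               for i in range(e_shnum)]
    # Section names live in the section-header string table (.shstrtab).
    str_off, str_size = headers[e_shstrndx][4], headers[e_shstrndx][5]
    strtab = data[str_off:str_off + str_size]
    sections = {}
    for name_off, _type, _flags, addr, off, size, *_rest in headers:
        end = strtab.find(b"\x00", name_off)
        if end == -1:
            end = len(strtab)
        sections[strtab[name_off:end].decode("ascii", "replace")] = (addr, off, size)
    return sections


def rodata_strings(data: bytes, min_len: int = 4) -> list:
    """Return (virtual address, text) for each printable ASCII run in .rodata."""
    addr, off, size = elf64_sections(data)[".rodata"]
    rodata = data[off:off + size]
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(addr + m.start(), m.group().decode("ascii"))
            for m in pattern.finditer(rodata)]
```

For example, `rodata_strings(open("sample.elf", "rb").read())` (the file name is a placeholder) would list candidate domains, channels, and format strings with their addresses. A sample with stripped or tampered section headers raises a KeyError here, which is one reason a full disassembler’s analysis is more robust.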

Figure 2 – Ghidra IRC rodata

Referring back to the Katien.c source code, the configuration macros are referenced starting at line 904. The configuration section, shown in Figure 3, contains a call to the con() function followed by an IRC command sent over the newly created socket.

Figure 3 – Katien configuration within main function

This hardcoded string can act as an “anchor” for a Ghidra script. Once we locate the string and the code that references it, we can jump back a specific number of bytes and land in the configuration section within the .text section (where the binary’s executable code is stored). Variants such as Tsunami and Ziggystartux also have this hardcoded value just below their configuration section.
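Stripped of Ghidra specifics, the anchor technique is “find a known marker, then step back a fixed distance.” The sketch below shows that idea on a raw byte buffer. The `back` distance is an assumption you would measure from a reference build (for instance, the debug-symbol compile described earlier), and the toy buffer is invented for illustration.

```python
from typing import Optional


def carve_before_anchor(buf: bytes, anchor: bytes, back: int) -> Optional[bytes]:
    """Return the `back` bytes immediately preceding the first `anchor`.

    Returns None when the anchor is missing: the point where a
    fixed-offset approach breaks down.
    """
    idx = buf.find(anchor)
    if idx == -1:
        return None
    return buf[max(0, idx - back):idx]


# Toy buffer: pretend the configuration bytes sit directly before the anchor.
blob = b"\x00\x01CONFIGDATA" + b"NICK %s"
print(carve_before_anchor(blob, b"NICK %s", 10))  # b'CONFIGDATA'
```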

Figure 4 shows the NICK string in the .rodata section (left) and the code in the .text section that references it (right), which lands us exactly where we need to be to identify the configuration information.
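Ghidra resolves these cross-references for you. To make the .rodata-to-.text link concrete, the rough sketch below first converts a string’s file offset into a virtual address, then scans code bytes for that address encoded as a little-endian 32-bit immediate. That scan is a swapped-in heuristic, not Ghidra’s reference analysis: it assumes non-PIE 32-bit x86 code (for example, `push imm32`), misses RIP-relative references in 64-bit code, and can report false positives. All addresses in the example are hypothetical.

```python
import struct


def offset_to_vaddr(file_off: int, sh_addr: int, sh_offset: int, sh_size: int) -> int:
    """Translate a file offset inside a section to its virtual address."""
    if not sh_offset <= file_off < sh_offset + sh_size:
        raise ValueError("offset is outside this section")
    return sh_addr + (file_off - sh_offset)


def find_abs32_refs(text: bytes, text_addr: int, target: int) -> list:
    """Crude xref heuristic: addresses in .text where `target` appears as a
    little-endian 32-bit immediate. Each hit is the address of the operand
    bytes, not of the instruction start."""
    needle = struct.pack("<I", target)
    hits, start = [], 0
    while True:
        idx = text.find(needle, start)
        if idx == -1:
            return hits
        hits.append(text_addr + idx)
        start = idx + 1


# Hypothetical layout: .rodata at file offset 0x1000, loaded at 0x8049000.
nick_addr = offset_to_vaddr(0x1010, 0x8049000, 0x1000, 0x100)
code = b"\x90\x68" + struct.pack("<I", nick_addr)  # nop; push offset nick_string
print([hex(a) for a in find_abs32_refs(code, 0x8048000, nick_addr)])
```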

Figure 4 – NICK string reference

A critical caveat of this methodology is that it produces “brittle” scripts: if the anchor string is not found, for example, the Ghidra script simply exits. Understanding reverse engineering tools and scripts, and their limitations, is critical to identifying where issues may arise, whether that is a tool breaking outright, a tool “lying” to you by not displaying data you expect, or something outright nefarious such as anti-reversing/anti-debugging techniques employed by the malware authors.
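One way to soften that brittleness, sketched below under the assumption that you have a few candidate anchor strings in mind, is to try them in order and report a miss instead of exiting silently. The candidate list is hypothetical; build your own from the variants you actually see.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical fallback anchors, most specific first.
CANDIDATE_ANCHORS = (b"NICK %s", b"NICK ")


def find_first_anchor(buf: bytes, anchors: Iterable[bytes]) -> Optional[Tuple[bytes, int]]:
    """Try each candidate anchor in order; return (anchor, offset) for the first hit."""
    for anchor in anchors:
        idx = buf.find(anchor)
        if idx != -1:
            return anchor, idx
    return None


sample = b"USER x localhost localhost :x\r\n"  # toy data with no NICK string
hit = find_first_anchor(sample, CANDIDATE_ANCHORS)
if hit is None:
    print("no anchor found; flag this sample for manual review")
else:
    anchor, offset = hit
    print(f"anchor {anchor!r} at offset {offset:#x}")
```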

Scripting Analysis with Ghidra – PwnKatien

Now that we have walked through the manual methodology, we can turn to the Ghidra API documentation to script and automate the analysis. The find call in Ghidra’s Flat API (FlatProgramAPI) searches the program’s entire address space for a specific string. Other variants of find (and the related findBytes calls) let you narrow the search, for example to begin at a given address or to stay within a given set of addresses. Using the IRC NICK string, we’ll identify the main function within the .text section of the binary (if it’s not already labeled), as well as the configuration information in the .rodata section. Figure 5 shows that once the string has been found, the references to it are collected into an array.
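Outside Ghidra, the difference between a whole-address-space search and a range-limited one can be modeled in a few lines of Python. The sketch below is not Ghidra’s API: `Block`, `find`, and `find_in_range` are hypothetical stand-ins that treat a program as a list of memory blocks, roughly the way Ghidra presents an ELF’s loaded sections. In an actual Ghidra script you would call find and getReferencesTo directly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    name: str    # e.g. ".text" or ".rodata"
    start: int   # virtual address of the block's first byte
    data: bytes


def find(blocks: List[Block], pattern: bytes) -> Optional[int]:
    """Return the lowest address where `pattern` occurs in any block."""
    hits = []
    for block in blocks:
        idx = block.data.find(pattern)
        if idx != -1:
            hits.append(block.start + idx)
    return min(hits) if hits else None


def find_in_range(blocks: List[Block], start: int, end: int,
                  pattern: bytes) -> Optional[int]:
    """Like find(), but only report matches lying entirely within [start, end)."""
    for block in sorted(blocks, key=lambda b: b.start):
        lo = max(start, block.start)
        hi = min(end, block.start + len(block.data))
        if lo >= hi:
            continue  # no overlap between this block and the range
        idx = block.data.find(pattern, lo - block.start, hi - block.start)
        if idx != -1:
            return block.start + idx
    return None


# Hypothetical memory map.
memory = [Block(".text", 0x1000, b"\x90" * 16),
          Block(".rodata", 0x2000, b"irc\x00NICK %s\x00NICK %s\x00")]
print(hex(find(memory, b"NICK")), hex(find_in_range(memory, 0x2005, 0x3000, b"NICK")))
```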

Figure 5 – Identify References to NICK string

Next, the script iterates through the references returned by getReferencesTo and prints the references to the various configuration elements within the binary. If Ghidra has not identified a “main” function in your binary, the script will attempt to label it.
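The labeling step can be modeled the same way. Ghidra gives functions it cannot name an auto-generated `FUN_<address>` label, so a script can rename the function containing the anchor reference when it still carries such a default. The `Function` type and `label_main` helper below are hypothetical stand-ins for Ghidra’s own function and symbol APIs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Function:
    name: str
    start: int  # entry address
    end: int    # first address past the function body


def containing_function(funcs: List[Function], addr: int) -> Optional[Function]:
    """Return the function whose body contains `addr`, if any."""
    for func in funcs:
        if func.start <= addr < func.end:
            return func
    return None


def label_main(funcs: List[Function], ref_addrs: List[int]) -> Optional[Function]:
    """Walk the anchor's references and label the first containing function
    'main' if it still has an auto-generated FUN_ name. Returns it or None."""
    for addr in ref_addrs:
        func = containing_function(funcs, addr)
        print(f"reference at {addr:#x} in {func.name if func else '<no function>'}")
        if func is None:
            continue
        if func.name.startswith("FUN_"):
            func.name = "main"
        return func
    return None


# Hypothetical function list and reference address.
funcs = [Function("FUN_08048a00", 0x8048A00, 0x8049000)]
print(label_main(funcs, [0x8048B10]))
```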

Finally, the offsets to the configuration information are printed to the Ghidra console. Depending on how the IRC bot is configured, these offsets may occasionally be off by a handful of bytes. However, in internal testing against various bots, the Lacework Labs team has consistently landed in the correct sections to pull data from.
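If a computed offset lands a few bytes into a string rather than at its start, one hypothetical way to absorb the drift is to walk back to the preceding NUL terminator before reading. The sketch below assumes the offset points into NUL-separated string data in .rodata and caps how far it will walk; it is an illustration, not a feature of the Lacework Labs script.

```python
def snap_to_string_start(rodata: bytes, offset: int, max_back: int = 16) -> int:
    """If `offset` lands inside a NUL-terminated string, walk back (at most
    `max_back` bytes) to the byte just after the preceding NUL."""
    if not 0 <= offset < len(rodata):
        raise ValueError("offset outside buffer")
    start = offset
    while start > 0 and offset - start < max_back and rodata[start - 1] != 0:
        start -= 1
    return start


def read_cstring(rodata: bytes, offset: int) -> str:
    """Read a NUL-terminated string starting at `offset`."""
    end = rodata.find(b"\x00", offset)
    if end == -1:
        end = len(rodata)
    return rodata[offset:end].decode("ascii", "replace")


# Hypothetical .rodata slice; offset 20 lands in the middle of "#chan".
rodata = b"irc.example.com\x00#chan\x00"
print(read_cstring(rodata, snap_to_string_start(rodata, 20)))
```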

Figure 6 – Identifying References & Offsets

Running this Ghidra script writes its output to the Ghidra console. Figure 7 shows the script’s execution along with the addresses of the various configuration values. The addresses in the console are clickable, allowing quick navigation to the relevant locations within the .text and .rodata sections of the binary.

Figure 7 – Ghidra Console Output

Conclusion

Once you have manually identified an “anchor” string to base references on, it’s fairly straightforward to automate the analysis and identification of the addresses and values you’re interested in via Ghidra’s scripting API.

An additional “gotcha” to be aware of is modifications that malware authors make to the source code. Lacework Labs has identified samples in which the configured domains and IPs are encrypted or otherwise obfuscated, as well as samples whose configuration lists numerous IRC servers, which throws off this particular Ghidra script’s offset calculation. Always consider the capabilities of your tools and how an adversary may circumvent them.
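For the many-servers case, one alternative to a fixed offset is to collect every IP- or domain-looking string in .rodata and review the list. The sketch below uses deliberately loose regular expressions (an assumption to tune per sample), can flag false positives such as library names like libc.so, and does nothing for encrypted or obfuscated values.

```python
import re

# Loose patterns; these are assumptions to tune, not IOC-grade rules.
IPV4_RE = re.compile(rb"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOMAIN_RE = re.compile(rb"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def candidate_servers(rodata: bytes) -> list:
    """Return every IPv4- or domain-looking string in .rodata, deduplicated,
    in the order found."""
    found = []
    for chunk in rodata.split(b"\x00"):  # .rodata strings are NUL-terminated
        for m in IPV4_RE.finditer(chunk):
            ip = m.group().decode("ascii")
            # Discard dotted quads with an octet above 255.
            if all(int(o) <= 255 for o in ip.split(".")) and ip not in found:
                found.append(ip)
        for m in DOMAIN_RE.finditer(chunk):
            name = m.group().decode("ascii")
            if name not in found:
                found.append(name)
    return found


# Hypothetical .rodata contents with several servers.
print(candidate_servers(b"irc.example.com\x00NICK %s\x00192.0.2.10\x00irc.example.net\x00"))
```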
