RL Blog
|

Data Exfiltrator

A New Tactic for Ransomware Adversaries

Robert Simmons
Blog Author

Robert Simmons, Principal Malware Researcher at ReversingLabs. Read More...

Data Exfiltrator
Summary

Over the past year, a major change in tactics employed by ransomware adversaries is to exfiltrate data from the victim's environment. The data then serves as the material for an extortion threat on top of the ransom for encrypted data. This additional tactic became a trend followed by most major ransomware families early last year, 2020 1.To support this tactic, some ransomware operators have added a specific type of malware to perform this exfiltration to their intrusion set 2. The five samples analyzed here perform this type of data exfiltration. They upload a set of files from the victim's computer to command and control servers hosted on IP addresses 51.81.153[.]212, 51.161.82[.]135, and 51.77.110[.]6. All of these IP addresses are owned by OVH SAS, a French hosting company. The malware follows the exfiltration with a single line PowerShell command that stops the malware's running process and then deletes the malware file that was executed. The malware has a type of anti-analysis behavior called "Relocate API Code" according to the Malware Behavior Catalog's 3 categorization 4. The malware reads a copy of system DLLs into memory and resolves imports from there. This causes a problem for debuggers such as x64dbg 5.

Interestingly, these files share code with an earlier malware sample with completely different capabilities. This earlier file has been observed alongside TrickBot, CobaltStrike, and ransomware 6. This earlier malware additionally uses the same anti-analysis technique, but does not exfiltrate data. It has the capability to download a CobaltStrike beacon and execute it 7. In addition to this overlap in code and behavior, the command and control (C2) infrastructure domains are registered via the same registrar. Also, the C2 IP addresses are owned by the same hosting company, OVH.

Anti-analysis Trick

Relocate API Code

The first behavior one observes when running these samples in a sandbox or in a debugger is that at the point where the imports are resolved an exception is raised. Debugging past this point is not possible without circumvention of an anti-analysis trick. This circumvention starts by examining the first encoded string the samples decode. Encoded strings are found in two general forms in these samples. First is with all the rest of the strings in the file in the .data section. Some of these can be seen in Figure 1 with one example highlighted.

Encoded Strings


Figure 1: Encoded Strings

Notice the "mOrxxxx" characters that trail each of the encoded strings. These trailing characters will be examined below.

The other location where encoded strings are found is split up character-by-character as stack strings. Each byte is moved one-by-one to a location on the stack before the decoding operation occurs. The first string of this type is in the function that resolves imports from kernel32.dll. This encoded string is shown in Figure 2.

First Encoded String


Figure 2: First Encoded String

This first string decodes to "C:\Windows\System32\kernel32.dll". This path is then used to read the DLL from the filesystem into memory. Imports are then resolved against this copy of the DLL rather than the system DLL. The function calls to copy the DLL are shown in Figure 3.

Copy DLL Function


Figure 3: Copy DLL Function

In the debugger's environment, this read fails with an exception which then prevents the imports from being properly resolved 8. A detailed explanation of what's happening here can be found on OALabs YouTube channel 9.To circumvent this trick, one can create a copy of the DLLs on the filesystem and change their names along with the decoded path strings. This way the DLLs can be read correctly and the imports properly resolved. This change can be done on the fly in the debugger after the strings are decoded. An example of changing this on the fly using the filename kernel33.dll is shown in Figure 4.

Change DLL Name


Figure 4: Change DLL Name

Alternatively, the single encoded byte needed for this change can be modified in the sample with a hex editor to make this change permanent. This also makes restarting the analysis in the debugger less annoying. This byte difference is highlighted using HexFiend's 10 comparison function and can be seen in Figure 5.

Original Compared to Patch


Figure 5: Original Compared to Patch

After the DLL path has been decoded, the various imports from that DLL are then decoded from similar stack strings. One example with LoadLibraryW is shown in Figure 6. Again, please note the trailing "mOrxxxx" string immediately after the encoded bytes of the "LoadLibraryW" string.

Encoded LoadLibraryW with Trailing Additional String


Figure 6: Encoded LoadLibraryW with Trailing Additional String

Once the import strings are decoded, a custom implementation of GetProcAddress is used on the copy of kernel32.dll to resolve the imports. The results of this process can be seen in Figure 7.

After Imports Resolved


Figure 7: After Imports Resolved

Examining the exports for these samples shows a DLL name "Input.exe" as well as one exported function "bsearch". These exports are shown in Figure 8.

Exports


Figure 8: Exports

This bsearch function is a version of the binary search algorithm 11. It appears once in the samples as part of the custom GetProcAddress implementation. This function call is highlighted in Figure 9.

Function Call to bsearch in Custom GetProcAddress


Figure 9: Function Call to bsearch in Custom GetProcAddress

After the imports from kernel32.dll have been resolved, the next DLL is ntdll.dll. The process shown above is repeated for this DLL. The alternative name used here was npdll.dll. The path after this change is shown in Figure 10.

DLL Name Change


Figure 10: DLL Name Change

Some of the resolved imports give an idea of what's to come and what the capabilities are for these samples. Examples of this are the imports of "HttpAddRequestHeadersW" and "EnumProcesses" as shown in Figures 11 and 12.

Import HttpAddRequestHeadersW


Figure 11: Import HttpAddRequestHeadersW

Import EnumProcesses


Figure 12: Import EnumProcesses

The rest of the DLLs after ntdll.dll that are loaded are not loaded using this same antianalysis trick. These other DLLs are user32.dll, wininet.dll, and psapi.dll. The steps to decode and resolve imports from each DLL are divided into separate functions. Each of these functions is shown in figure 13.

Resolve Import Functions


Figure 13: Resolve Import Functions

In the code of these samples, another interesting library function calling pattern is to call NtAllocateVirtualMemory using syscall to allocate memory. This pattern of function call is shown in Figure 14.

Syscall Used on NtAllocateVirtualMemory


Figure 14: Syscall Used on NtAllocateVirtualMemory


Collect Environment Information

The first set of capabilities in these samples is to collect information about the victim's environment. The first bit of information collected is the name of the computer. The call to GetComputerNameExA is shown in Figure 15.

Collect Computer Name


Figure 15: Collect Computer Name

The next bit of environmental information collected is the physical and virtual memory status. This is done via a call to GlobalMemoryStatusEx 12 which is shown in Figure 16.

Measure Physical and Virtual Memory Status


Figure 16: Measure Physical and Virtual Memory Status

The memory status is not sent back to the command and control (C2) infrastructure. It is probably used in the file processing algorithm because the primary purpose of these samples is to exfiltrate files from the victim's computer. These files must be copied from the filesystem to memory for processing before being sent to the C2.

The next data point collected is the username that ran the malware file. This data point does not appear to be sent back to the C2 according to the fields in the network traffic. The API call to GetUserNameA is shown in Figure 17.

Collect Username


Figure 17: Collect Username

Next the samples check the free disk space via GetDiskFreeSpaceExA as shown in Figure 18.

Check Disk Free Space


Figure 18: Check Disk Free Space

Next the OS version and product information is collected via calls to GetVersionExA and GetProductInfo. These calls are shown in Figure 19.

Gather OS Version and Product Information


Figure 19: Gather OS Version and Product Information

Malware Configuration

After the environment information has been collected, the malware configuration strings are then decoded. As opposed to the stack strings used in the import resolution process, the configuration strings are normal strings in the .data section as shown above. All of these strings have a trailing "mOrxxxx" string. Interestingly this additional data does not cause problems for the decoding process. The reason for this is the decoding function works on a null terminated string. Examining the encoded strings closely, this null termination can be seen before the additional characters. An example of this null termination in a configuration string is shown in Figure 20.

Null Termination


Figure 20: Null Termination

An example of this null termination in a stack string is shown in Figure 21.

Null Termination


Figure 21: Null Termination in Stack String

The configuration strings for one particular sample 13 are shown in Figure 22.

Decoded Configuration Strings


Figure 22: Decoded Configuration Strings

The second configuration string from the top in Figure 22 above is used in a field called "key" in the C2 traffic along with the exfiltrated data. Each of the samples analyzed here have different key strings. The following table shows each of these strings along with the first five characters of the SHA256 hash of the file the string was collected from.

SHA256 Prefix Key
dcc4a 46rnyegq235etnerhgf43trrthgbfRYdfnhg
68af2 8953n7b8ewurdfb3njnyuridrwdb
934c5 huve3fn298vmfu293jKVFDSfvjjfe893
a7cf0 huve3fn298vmfu293jKVFDSfvjjfe893
8cfd5 3f9n8uv0n43809vn3d092v09290

Exfiltration

The data exfiltration process starts by enumerating the logical drives that are available on the victim's computer. This is determined using a call to GetLogicalDriveStringsW and is shown in Figure 23.

Determine Available Logical Drives


Figure 23: Determine Available Logical Drives

For each of these logical drives, a function is called that walks the file system searching for targeted files and exfiltrating them. This walk function interestingly is recursive. This recursion is shown in Figure 24.

Walk Function Recursion


Figure 24: Walk Function Recursion

The first steps taken in the walk drives function is to add an asterisk to the path that is the input of the function. This is numbered "1" below. Then memory is allocated twice in a row. This is numbered "2" below. The string with the trailing asterisk is then written to one of the two allocated memory locations. This is numbered "3" below. Finally, this string is used to call "FindFirstFileW" with the second allocated memory location as the output location that receives the structure resulting from the API call. These are all shown in Figure 25.

Finding Files


Figure 25: Finding Files

As the malware walks the file system, any files that contain one of the following strings in the filename are exfiltrated.

.doc .xls .pdf
.docx .xlsx  


Interestingly, the algorithm used to find these files is probably not what the adversary expected. Rather than checking for a file extension as a suffix it actually matches any infix of the above strings. Because of this, any file with .xlsx will already match .xls for example. These target file extensions are shown in Figure 26.

Target File Extensions


Figure 26: Target File Extensions

Next the file size is determined using a call to GetFileSize. This information is included in the exfiltrated data. The API call is shown in Figure 27.

Determine File Size


Figure 27: Determine File Size

As the file system walk proceeds, the path strings are emitted as debug strings via a call to OutputDebugStringW. This is followed immediately by bytes for a Windows carriage return line feed. These two are shown in Figure 28.

Emit Debug Strings


Figure 28: Emit Debug Strings

This malware can exfiltrate large files. It does this by dividing the file into chunks according to a hard coded "frame size". This hard coded size is 32535 bytes and is highlighted in Figure 29.

Frame Size


Figure 29: Frame Size

The above also shows all the other fields that are used in the C2 traffic. The frame number starts at "-0" then "1", "2" etc. The "filecrc" field is the cyclic redundancy check (CRC) which is used as a checksum for error detection 14. The last two fields are the filename and the computer name.

Using a constructed test file named "testfile.doc" the TLS encrypted C2 traffic is intercepted and analyzed using Burp 15 and Inetsim 16. The body parameters and the request headers from this test file are shown in Figure 30.

Test Document Shown Being Exfiltrated in C2 Network Traffic


Figure 30: Test Document Shown Being Exfiltrated in C2 Network Traffic

Interestingly, the configuration strings include "GET" in addition to "POST". However, this capability does not appear to be used in these samples. A check is performed which determines which of the two request methods are used. This check is shown at the top of Figure 31. The POST and GET options are shown in the center, and the call to HttpOpenRequestW is shown at the bottom.

Request Method Options


Figure 31: Request Method Options

The algorithm used to determine which files are exfiltrated is flawed in that it will match files that are not Word, Excel, or PDF documents. It will exfiltrate any file that contains the target strings anywhere in the filename. A test of this is shown in Figure 32 which uses a fake file with the extension ".txt" and ".pdf" in the middle of the filename.

Unexpected Exfiltration
Figure 32: Unexpected Exfiltration

After all the filesystem walking has completed, the next function called sends a dummy "end of transmission" file out to the C2. The filename of the file is ".lock" and the content is "locked". This file only exists in memory and network traffic. It is not written to the filesystem. The strings for this dummy file are shown in Figure 33.

Dummy File Strings


Figure 33: Dummy File Strings

This dummy file as seen in the network traffic is shown in Figure 34.

data_exfiltrator_fig34_end_of_transmission
Figure 34: End of Transmission Dummy File

After all of the above is finished, the last action taken by the samples is to run a PowerShell command from a string. This command gets the process ID of the parent process of the command itself. It uses that process ID to kill the malware's process. It then deletes the malware file from the filesystem. This is to clean up after the data exfiltration is complete. The command is executed by a call to CreateProcessA as shown in Figure 35.

PowerShell Cleanup Command


Figure 35: PowerShell Cleanup Command

The full text of the PowerShell command is shown in Figure 36.

data_exfiltrator_fig36_powershell_text
PowerShell Command


Evolving Variants

The earliest observed variant of this malware family was compiled on April 24th, 2021 according to the PE header TimeDateStamp field. It was then first seen in the Titanium Platform on April 25th, 2021. This earliest variant did not include the PowerShell cleanup that was used in the later two variants. The comparison of the older sample 18 and the newer sample 19 analyzed using Relyze 20 is shown in Figure 36.

Addition of PowerShell Cleanup Capability


Figure 36: Addition of PowerShell Cleanup Capability

Another difference between this oldest sample and most of the newer ones is the inclusion of a program database (PDB) path. This path is shown in Figure 37.

PDB Path String


Figure 37: PDB Path String

The two newer samples have very few differences. Code-wise, there is one single function that is ~55% different between the two. The rest of the code is identical or nearly identical. This small difference is shown in Figure 38.

Difference in Newer Samples


Figure 38: Difference in Newer Samples

Another interesting difference between the two newer samples is that the newest sample does not call back to a hostname for C2 communication. It is configured to call back to the C2 URL on a bare IP address.

Related Malware

One function in particular that is found in each of the three samples analyzed above is the string decoding function. Encoding and decoding functions are a good area of code to examine closely and to hunt for in malware repositories. This type of function can be stable across samples and tends to be reused by an adversary in multiple campaigns even if the capabilities of the malware are radically different. Building a YARA rule based on this function reveals an older malware sample 21 which was blogged about by researchers at Walmart 22. The results of a retrohunt in the Titanium Platform using this YARA rule is shown in Figure 39. The result shown in orange is a false positive. This file is a copy of one of the other actual malware files but appears to have been modified by a researcher.

Retrohunt Results


Figure 39: Retrohunt Results

This sample is definitely related to the three samples analyzed here. It uses the same anti-analysis trick as well as the same string obfuscation including the "mOrxxxx" trailing characters. Both the standard string form as well as the stack strings are found in this sample. However, this older sample is a CobaltStrike beacon loader rather than a data exfiltrator. The YARA rule for detecting this code overlap is provided at the end of the blog. As opposed to earlier YARA rules that I have written, I read the advice from Marc Ochsenmeier on Twitter about adding comments with the meaning of opcodes in YARA rules meant for sharing 23. This rule and future rules will include the assembly as well as the bytecode strings.

In addition to this older malware sample that is definitely related, a hunt for other malware that contains variations of the string "mOrxxxx" reveals a multitude of potentially related malware samples. These are nearly uniformly detected as malicious. A future blog will address these additional files. The results of a retrohunt for the full string "mOrxxxx" is shown in Figure 40.

Full String mOrxxxx


Figure 40: Full String mOrxxxx

The results of retrohunts for this string with two Xs and one X are shown in Figure 41 and 42.

Retrohunt for String with Two Xs
Figure 41: Retrohunt for String with Two Xs

Retrohunt for String with One X
Figure 42: Retrohunt for String with One X

Analysis of the results of these retrohunts will be the topic of a future blog post.

IOCs

Sample 1
File

Filename Input.exe
Filename v2c.exe
MD5 1010bec081572dc3bd16e26a1e37d815
SHA1 bfc0219efb60fb270cee0b7b102afc0d4b0a121a
SHA256 dcc4ac1302ac5693875c4a4b193242cbb441b77cd918569c43fe318bcf64fe3d
Import Hash 85ce0801668e488873e72eeb306503da
SSDEEP 768:ycscKP14scGOqEMQcanOPBbEbeFpUGC/YDR5Ws:yV3cGOqEMQcanOJFpUGC/Y9
Timestamp 2021-04-24T17:34:20Z
PDB E:\work\proj\file_sender\x64\file_sender.pdb
Magic PE32+ executable (GUI) x86-64, for MS Windows
File Type Win64 EXE
File Size 35328
First Seen 2021-04-25


User Agent

Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) discord/0.0.309 Chrome/83.0.4103.122 Electron/9.3.5 Safari/537.36


URL

hxxps[://]files[.]pablotech[.]info/uploadFile.php


Hostname

files[.]pablotech[.]info


Domain

Name Registrar IANA ID
pablotech[.]info Hosting Concepts B.V. d/b/a Registrar.eu 1647



Sample 2
File

Filename Input.exe
Filename sender.exe
MD5 e3300ec2f31f5730970c5bb066d2f0ed
SHA1 c768882e102a5dd3d1c17d306698c5cfc3d9d8d5
SHA256 68af250429833d0b15d44052637caec2afbe18169fee084ee0ef4330661cce9c
Import Hash 6473877da5764bbd5a9b16892ef13b69
SSDEEP 768:zp2FXczP/cpWyB/3RtUcGOqEMIcqfz/YghUx:zp2FsTcB/UcGOqEMIcqfz/Yg4
Timestamp 2021-04-28T03:00:08Z
Magic PE32+ executable (GUI) x86-64, for MS Windows
File Type Win64 EXE
File Size 36352
First Seen 2021-05-03


User Agent

Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) discord/0.0.309 Chrome/83.0.4103.122 Electron/9.3.5 Safari/537.36


URL

hxxps[://]figures[.]pablotech[.]info/uploadFile.php


Hostname

figures[.]pablotech[.]info


Domain

Name Registrar IANA ID
pablotech[.]info Hosting Concepts B.V. d/b/a Registrar.eu 1647

 

Sample 3
File

Filename Input.exe
Filename v2c.exe
MD5 4af8b45c9b0f73d47a499d92064b6c2e
SHA1 424f3c281f46e4cf2350c78cfa89a87873e0b994
SHA256 934c557e52bd47fa312ea4098e05781145d0b81c9dc543ef42b266813bdb05d4
Import Hash 6473877da5764bbd5a9b16892ef13b69
SSDEEP 768:9vutX7Qp6CPRp8Yh/ZYWcGOqEMUcgk9/Y7hCeUpU:K7QpJp8YFrcGOqEMUcg0/Y7lk
Timestamp 2021-05-17T20:33:40Z
Magic PE32+ executable (GUI) x86-64, for MS Windows
File Type Win64 EXE
File Size 36352
First Seen 2021-05-18


User Agent

Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0


URL

hxxps[://]51.161.82[.]135/uploadFile.php


IP Address

IP Owner ASN
51.161.82[.]135 OVH SAS 16276

 

Sample 4
File

Filename Input.exe
Filename v2.exe
MD5 7c801e3c256d2e9e1f4462fe84e44c68
SHA1 4cd9cecd1d093f290e6f8f0ad6d5e76dbedbf3d9
SHA256 a7cf0f72bb6f1e0a61fbf39e3a3a36db6540250caeef35b47fb51a8959f40984
Import Hash 9f86f12427bca134faaa21bcc0d849d3
SSDEEP 768:vkcGOqEMccVhPO4TrASVqipOHMd6m/YFh50:ccGOqEMccV7rAZipOHA/YFT
Timestamp 2021-05-24T23:06:16Z
PDB E:\work\proj\file_sender\x64\file_sender.pdb
Magic PE32+ executable (GUI) x86-64, for MS Windows
File Type Win64 EXE
File Size 37376
First Seen 2021-06-01


User Agent

Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0


URL

hxxps[://]51.161.82[.]135/uploadFile.php


IP Address

IP Owner ASN
51.161.82[.]135 OVH SAS 16276

 

Sample 5
File

Filename file_sender.exe
Filename sender.exe
MD5 12a7595d94e142847a04f11659ed183d
SHA1 f80a2f102ca0297d053c75e0676049dc87cb3f35
SHA256 8cfd554a936bd156c4ea29dfd54640d8f870b1ae7738c95ee258408eef0ab9e6
Import Hash 9f86f12427bca134faaa21bcc0d849d3
SSDEEP 768:sPcGOqEMccNNPayYDcfHyIY2QUy2h08wx:2cGOqEMccNEDuhY2FS84
Timestamp 2021-06-15T10:57:36Z
PDB E:\work\proj\file_sender\x64\file_sender.pdb
Magic PE32+ executable (GUI) x86-64, for MS Windows
File Type Win64 EXE
File Size 36352
First Seen 2021-06-16


User Agent

Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0


URL

hxxp[://]51.77.110[.]6/uploadFile.php


IP Address

IP Owner ASN
51.77.110[.]6 OVH SAS 16276

 

YARA Rule

private rule WindowsPE
{
   condition:
      uint16(0) == 0x5A4D and uint32(uint32(0x3C)) == 0x00004550
}

rule DataExfiltrator_Decoder
{
   meta:
      author = "Malware Utkonos"
      date = "2021-05-07"
      description = "String decoding function found in data exfiltration malware."
exemplar = "dcc4ac1302ac5693875c4a4b193242cbb441b77cd918569c43fe318bcf64fe3d"
   strings:
      $a = { 4489442418 // mov dword [rsp+0x18 {arg_18}], r8d
           88542410 // mov byte [rsp+0x10 {arg_10}], dl
           48894c2408 // mov qword [rsp+0x8 {arg_8}], rcx
           4883ec28 // sub rsp, 0x28
           8b442440 // mov eax, dword [rsp+0x40 {arg_18}]
           33d2 // xor edx, edx {0x0}
           b904000000 // mov ecx, 0x4
           48f7f1 // div rcx
           }
   condition:
      WindowsPE and $a
}



References:

1  https://research.checkpoint.com/2020/ransomware-evolved-double-extortion/
2  https://docs.oasis-open.org/cti/stix/v2.1/cs01/stix-v2.1-cs01.html#_5ol9xlbbnrdn
3  https://github.com/MBCProject/mbc-markdown/tree/master/yfaq
4  https://github.com/MBCProject/mbc-markdown/blob/master/anti-behavioral-analysis/evade-debugger.md
5  https://x64dbg.com/
6  https://medium.com/walmartglobaltech/trickbot-crews-new-cobaltstrike-loader-32c72b78e81c
7  Ibid.
8 Thanks to Sandor Nemes for assistance in understanding this behavior.
9  https://www.youtube.com/watch?v=242Tn0IL2jE&t=1053s
10 https://hexfiend.com/
11 https://en.cppreference.com/w/c/algorithm/bsearch
12 https://docs.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-globalmemorystatusex
13 934c557e52bd47fa312ea4098e05781145d0b81c9dc543ef42b266813bdb05d4
14 https://en.wikipedia.org/wiki/Cyclic_redundancy_check
15 https://portswigger.net/burp
16 https://www.inetsim.org/
17 dcc4ac1302ac5693875c4a4b193242cbb441b77cd918569c43fe318bcf64fe3d
18 Ibid.
19 68af250429833d0b15d44052637caec2afbe18169fee084ee0ef4330661cce9c
20 https://www.relyze.com/
21 0234f80c6fd3768f9619d6fcd50d775ec686719fcc665007bfd1606bbe787744
22 https://medium.com/walmartglobaltech/trickbot-crews-new-cobaltstrike-loader-32c72b78e81c
23 https://twitter.com/ochsenmeier/status/1379546812437118980 

Keep learning


Explore RL's Spectra suite: Spectra Assure for software supply chain security, Spectra Detect for scalable file analysis, Spectra Analyze for malware analysis and threat hunting, and Spectra Intelligence for reputation data and intelligence.

More Blog Posts

Do More With Your SOAR

Do More With Your SOAR

Running an SOC is complex — and running without the best tools makes it more difficult. Learn how RL File Enrichment can automate and bolster your SOC.
Read More