My RCE PoC walkthrough for the VMware ESXi OpenSLP heap-overflow vulnerability (CVE-2021-21974)

Introduction

During a recent engagement, I discovered a machine running VMware ESXi 6.7.0. After checking for known vulnerabilities associated with this version of the software, I identified that it may be vulnerable to the ESXi OpenSLP heap overflow (CVE-2021-21974). Through googling, I found a blog post by Lucas Leong (@_wmliang_) of Trend Micro’s Zero Day Initiative, the security researcher who found this bug. Lucas wrote a brief overview of how to exploit the vulnerability but shared no reference to a PoC. Since I couldn’t find any existing PoC on the internet, I thought it would be neat to develop an exploit based on Lucas’ approach. Before proceeding, I highly encourage readers to review Lucas’ blog to get an overview of the bug and the exploitation strategy from the finder’s perspective.

Setup

To set up a test environment, I needed a vulnerable copy of VMware ESXi for testing and debugging. VMware offers a trial version of ESXi for download. Setup is straightforward: deploy the image through VMware Fusion or a similar tool. Once installation was complete, I used the web interface to enable SSH. To debug the ‘slpd’ binary on the server, I used the gdbserver that comes with the image. To talk to the gdbserver, I used SSH local port forwarding:

On the ESXi server, I attached gdbserver to ‘slpd’ as follows:

Lastly, on my local gdb client, I connected to the gdbserver with the following command:

Service Location Protocol

The Service Location Protocol (SLP) is a service discovery protocol that allows connecting devices to identify services available within the local area network by querying a directory server. This is similar to a person walking into a shopping center and looking at the directory listing to see what stores are in the mall. To keep this brief, a device can query a service and its location by making a ‘service request’ and specifying the type of service it wants to look up with a URL.

For example, to look up the VMwareInfrastructure service from the directory server, the device makes a request with ‘service:VMwareInfrastructure’ as the URL. The server responds with something like ‘service:VMwareInfrastructure://localhost.localdomain’.

A device can also collect additional attributes and meta-data about a service by making an ‘attribute request’ that supplies the same URL. Devices that want to be added to the directory can submit a ‘service registration’. This request includes information such as the IP of the device making the announcement, the type of service, and any meta-data it wants to share. SLP can do more than this, but the last message type I am interested in is the ‘directory agent advertisement’ because this is where the vulnerability lies. The ‘directory agent advertisement’ is a broadcast message sent by the server to let devices on the network know whom to contact when they want to query a service and its location. To learn more about SLP, see the SLPv2 specification (RFC 2608).

SLP Packet Structure

While the layout of the SLP structure will be slightly different between different SLP message types, they generally follow a header + body format.
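To make the header + body idea concrete, here is a minimal sketch that builds and parses a ‘service request’ following the SLPv2 field layout in RFC 2608. The helper names, the XID value, the ‘en’ language tag, and the ‘DEFAULT’ scope are illustrative assumptions of mine, not values taken from the PoC.

```python
import struct

SLP_VERSION = 2
FN_SRVRQST = 1  # function ID of a service request in RFC 2608


def slp_string(value: bytes) -> bytes:
    """Encode a field as a 2-byte big-endian length followed by its bytes."""
    return struct.pack(">H", len(value)) + value


def build_header(function_id: int, body_len: int, xid: int, lang: bytes = b"en") -> bytes:
    """Header: version, function ID, 3-byte total length, flags,
    3-byte next-extension offset, XID, then the language tag."""
    total_len = 14 + len(lang) + body_len  # 14 fixed header bytes
    return (
        struct.pack(">BB", SLP_VERSION, function_id)
        + total_len.to_bytes(3, "big")
        + struct.pack(">H", 0)        # flags
        + (0).to_bytes(3, "big")      # next extension offset
        + struct.pack(">H", xid)
        + slp_string(lang)
    )


def build_service_request(service_type: bytes, scopes: bytes = b"DEFAULT", xid: int = 1) -> bytes:
    """Body: previous responders, service type, scopes, predicate, SPI."""
    body = (
        slp_string(b"")               # previous responder list (empty)
        + slp_string(service_type)
        + slp_string(scopes)
        + slp_string(b"")             # predicate (empty)
        + slp_string(b"")             # SLP SPI (empty)
    )
    return build_header(FN_SRVRQST, len(body), xid) + body


def parse_header(packet: bytes) -> dict:
    """Decode the fixed header fields and the language tag."""
    version, function_id = struct.unpack_from(">BB", packet, 0)
    length = int.from_bytes(packet[2:5], "big")
    xid, lang_len = struct.unpack_from(">HH", packet, 10)
    return {
        "version": version,
        "function_id": function_id,
        "length": length,
        "xid": xid,
        "lang": packet[14:14 + lang_len],
    }


packet = build_service_request(b"service:VMwareInfrastructure")
print(parse_header(packet))
```

Every variable-length field in the body uses the same 2-byte length prefix, which is the detail that matters for the bug discussed later.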

A ‘service request’ packet looks like this:

An ‘attribute request’ packet looks like this:

A ‘service registration’ packet looks like this:

Lastly, a ‘directory agent advertisement’ packet looks like this:

The Bug

As noted in Lucas’ blog, the bug is in the ‘SLPParseSrvURL’ function, which gets called when a ‘directory agent advertisement’ message is being processed.

On line 18, 0x1d is added to the length of the URL to form the size passed to ‘calloc’. On line 22, the ‘strstr’ function is called to find the position of the substring “:/” within the URL. On line 28, the content of the URL before the substring “:/” is copied into the memory allocated by ‘calloc’ on line 18.

Another thing to note is that the ‘strstr’ function returns NULL (0) if it reaches a null character before finding the substring “:/”.

I speculate that VMware’s test cases only tried ‘scopes’ with a length below 256. The length is stored as a 2-byte big-endian value, so if we look at the following ‘directory agent advertisement’ layout snippet, we see that sample 1’s ‘scopes’ length includes a null byte. This null byte accidentally acts as the string terminator for ‘URL’ because it sits right after it. If the length of ‘scopes’ is 256 or more and neither of its two bytes is zero (as in sample 2), there is no null byte to stop the scan, and the ‘strstr’ function will read past the ‘URL’ and continue seeking the substring “:/” in ‘scopes’.
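A quick way to see which ‘scopes’ lengths carry an accidental terminator is to print their 2-byte big-endian encoding. This is only a sketch; the function name and the sample lengths are my own choices.

```python
import struct


def scopes_length_has_null(length: int) -> bool:
    """True if the 2-byte big-endian encoding of the length contains 0x00."""
    return 0 in struct.pack(">H", length)


for length in (0x0007, 0x00FF, 0x0100, 0x0101, 0x0231):
    encoded = struct.pack(">H", length).hex()
    print(f"{length:#06x} -> {encoded} null byte: {scopes_length_has_null(length)}")
```

Any length below 256 has a zero high byte, and a length such as 0x0100 still has a zero low byte, so only lengths with two non-zero bytes let the scan continue into ‘scopes’.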

Therefore, the ‘memcpy’ call leads to a heap overflow because the source contains content from ‘URL’ plus part of ‘scopes’, while the destination only has space to fit ‘URL’.
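The size mismatch can be modelled offline with a few lines of Python. This is a simplified sketch under stated assumptions: it only mimics a C-style ‘strstr’ scan over the ‘URL’ and ‘scopes’ fields and compares the copy length with the ‘URL length + 0x1d’ allocation described above. It does not model how the real function lays out data inside that allocation, and the helper names and sample field contents are hypothetical.

```python
import struct
from typing import Optional

ALLOC_PADDING = 0x1D  # constant added to the URL length before 'calloc'


def c_strstr(buf: bytes, start: int, needle: bytes) -> Optional[int]:
    """Search like C strstr: from `start` until the first null byte."""
    end = buf.find(b"\x00", start)
    if end == -1:
        end = len(buf)  # real C code would keep reading past the buffer
    pos = buf.find(needle, start, end)
    return None if pos == -1 else pos


def copy_minus_alloc(url: bytes, scopes: bytes) -> Optional[int]:
    """Copy length minus allocation size, or None if ':/' is never found.
    A positive result means the copy is larger than the allocation."""
    # 'URL' bytes, the 2-byte 'scopes' length, 'scopes', then an empty next field.
    fields = url + struct.pack(">H", len(scopes)) + scopes + b"\x00\x00"
    pos = c_strstr(fields, 0, b":/")
    if pos is None:
        return None
    allocated = len(url) + ALLOC_PADDING
    copied = pos  # everything before ':/' gets copied
    return copied - allocated


url = b"A" * 24                                   # no ':/' inside the URL itself
print(copy_minus_alloc(url, b"DEFAULT"))          # short scopes: scan stops at 0x00
print(copy_minus_alloc(url, b"B" * 0x120 + b":/"))  # long scopes: scan runs on
```

With the short ‘scopes’ the scan stops at the zero high byte of the length and finds nothing. With the long ‘scopes’ the match lands far beyond the end of ‘URL’, so the copy is much larger than the allocation.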

SLP Objects

Here I will go over the relevant SLP components as they serve as the building blocks for exploitation.

_SLPDSocket

Every client that connects to the ‘slpd’ daemon causes a ‘slpd-socket’ object to be created on the heap. This object contains information on the current state of the connection, such as whether it is in a reading state or a writing state. Other important information stored in this object includes the client’s IP address, the socket file descriptor in use for the connection, pointers to the ‘recv-buffer’ and ‘send-buffer’ for this specific connection, and pointers to the ‘slpd-socket’ objects created by prior and future connections. The size of this object is fixed at 0xd0 and cannot be changed.

_SLPDSocket structure from OpenSLP source code
memory layout for a _SLPDSocket object

_SLPBuffer

Every SLP message received by the server creates at least two SLPBuffer objects. One is called ‘recv-buffer’, which stores the data received by the server from the client. Since I control the size of the data I send from the client, I control the size of the ‘recv-buffer’. The other SLPBuffer object is called ‘send-buffer’. This buffer stores the data that will be sent from the server to the client. The ‘send-buffer’ has a fixed size of 0x598, which I cannot control. Furthermore, the SLPBuffer has meta-data properties that point to the starting, current, and ending positions of said data.

_SLPBuffer from OpenSLP source code
memory layout for a _SLPBuffer object
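To illustrate why the position pointers matter, here is a toy Python model of a buffer with starting, current, and ending positions. The class, its method, and the field names ‘start’, ‘curpos’, and ‘end’ are hypothetical stand-ins for illustration; it only mimics the idea that incoming data lands wherever the current position points and stops at the ending position.

```python
from dataclasses import dataclass


@dataclass
class ToyBuffer:
    memory: bytearray  # stands in for process memory
    start: int         # offset where the buffer data begins
    curpos: int        # offset where the next byte will be written
    end: int           # offset one past the last usable byte

    def receive(self, data: bytes) -> int:
        """Copy as much of `data` as fits between curpos and end."""
        room = max(self.end - self.curpos, 0)
        chunk = data[:room]
        self.memory[self.curpos:self.curpos + len(chunk)] = chunk
        self.curpos += len(chunk)
        return len(chunk)


memory = bytearray(32)
buf = ToyBuffer(memory, start=0, curpos=0, end=8)
print(buf.receive(b"ABCDEFGHIJ"))  # only 8 bytes fit

# If the positions change, the same receive call writes somewhere else.
buf.curpos, buf.end = 16, 20
buf.receive(b"WXYZ")
print(memory)
```

The code that fills the buffer trusts these positions, which is why the rest of this write-up cares so much about what sits next to an SLPBuffer on the heap.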

SLP Socket State

The SLP Socket State defines the status of a particular connection. The state value is set in the _SLPDSocket object. A connection will be calling either ‘recv’ or ‘send’ depending on the state of the socket.

Socket states constants defined in OpenSLP source code

It is important to understand the properties of _SLPDSocket, _SLPBuffer, and the socket states because the exploitation process requires modifying those values.

Objectives, Expectations and Limitations

This section goes over the objectives that must be met for exploitation to succeed.

Objective 1

Achieve remote code execution by leveraging the heap overflow to overwrite ‘__free_hook’ so that it points to shellcode or a ROP chain.

Expectation 1

If I can overwrite the ‘position’ pointers in a _SLPBuffer ‘recv-buffer’ object, I can force data arriving at the server to be written to an arbitrary memory location.

Objective 2

In order to know the address of ‘__free_hook’, I have to leak an address referencing the libc library.

Expectation 2

If I can overwrite the ‘position’ pointers in a _SLPBuffer ‘send-buffer’ object, I can force the server’s outgoing data to be read from an arbitrary memory location.

Now that I have defined the goals and objectives, I have to identify any limitations of the heap overflow vector and of memory allocation in general.

Limitations

  1. ‘URL’ data stored in the “Directory Agent Advertisement’s URL” object cannot contain null bytes (due to the ‘strstr’ function). This limitation prevents me from directly overwriting meta-data within an adjacent ‘_SLPDSocket’ or ‘_SLPBuffer’ object because I would have to supply an invalid size value for the objects’ heap header before reaching those properties.
  2. The ‘slpd’ binary allocates ‘_SLPDSocket’ and ‘_SLPBuffer’ objects with ‘calloc’. The ‘calloc’ call zeroes out the allocated memory slot. This limitation removes all past data from a memory slot, which could have contained interesting pointers or stack addresses. This looks like a showstopper because if I were to overwrite a ‘position’ pointer in a _SLPBuffer, I would need to know a valid address value. Since I don’t know such a value, the next best thing I can do is partially overwrite a ‘position’ pointer to at least land in a valid address range that could be meaningful. With ‘calloc’ zeroing everything out, I lose that opportunity.

Fortunately, not all is lost. As shared in Lucas’ blog post, I can still get around the limitations.

Limitations Bypass

  1. Use the heap overflow to partially overwrite the size of the adjacent free memory chunk to extend it. By extending the free chunk, I can position it to overlap with its neighboring ‘_SLPDSocket’ or ‘_SLPBuffer’ object. When I allocate memory that occupies the extended free space, I can overwrite the object’s properties.
  2. The ‘calloc’ call retains the past data of a memory slot if the slot was marked with the ‘IS_MMAPPED’ flag (written ‘IS_MAPPED’ elsewhere in this post) while it was free. The key point is that the ‘calloc’ call must request a chunk size that exactly matches the freed slot with the ‘IS_MMAPPED’ flag set in order to preserve its old data. If an ‘IS_MMAPPED’ freed chunk is split up by a ‘calloc’ request, ‘calloc’ will service a chunk without the ‘IS_MMAPPED’ flag and zero out the slot’s content.
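For readers less familiar with glibc’s allocator, the flag in question lives in the low bits of a chunk’s size field. The sketch below decodes such a field using the standard glibc bit assignments (PREV_INUSE, IS_MMAPPED, NON_MAIN_ARENA); the helper names and the sample values are mine and are only meant to show how the size and the flags share one word.

```python
PREV_INUSE = 0x1      # previous chunk is in use
IS_MMAPPED = 0x2      # chunk is treated as coming from mmap
NON_MAIN_ARENA = 0x4  # chunk belongs to a non-main arena
FLAG_MASK = 0x7       # the low three bits never count toward the size


def decode_size_field(raw: int) -> dict:
    """Split a raw chunk size field into the chunk size and its flag bits."""
    return {
        "size": raw & ~FLAG_MASK,
        "prev_inuse": bool(raw & PREV_INUSE),
        "is_mmapped": bool(raw & IS_MMAPPED),
        "non_main_arena": bool(raw & NON_MAIN_ARENA),
    }


def calloc_would_clear(raw: int) -> bool:
    """calloc skips clearing chunks flagged IS_MMAPPED, because memory
    that really came from mmap is already zero-filled."""
    return not decode_size_field(raw)["is_mmapped"]


for raw in (0x101, 0x121, 0x123):
    print(hex(raw), decode_size_field(raw), "cleared:", calloc_would_clear(raw))
```

Because the flag shares the size field with the chunk size, the same partial overwrite that extends a free chunk can also set this bit.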

There is still one more catch. Even if I can point the _SLPBuffer at an arbitrary position to store or read data, the ‘slpd’ binary will not comply unless the associated socket state is set to the proper status. Therefore, the heap overflow also has to overwrite the associated _SLPDSocket object’s meta-data in order to get the arbitrary read and write primitives to work.

Heap Grooming

This section goes over the heap grooming strategy used to build the arbitrary read and arbitrary write primitives.

The Building Blocks

Before I go over the heap grooming design, I want to say a few words about the purpose of the SLP messages mentioned earlier in fitting into the exploitation process.

service request: primarily used for creating a consecutive heap layout and holes.

directory agent advertisement: used to trigger the heap overflow vector and overwrite into the next neighboring memory block.

service registration: stores user-controlled data in the memory database, to be retrieved later through the ‘attribute request’ message. This message exists solely to set up the ‘attribute request’ and is not used for heap grooming.

attribute request: pulls user-controlled data from the memory database. Its purpose is to create a ‘marker’ that can be used to identify the current position during the information leak stage. Also, the dynamic memory used to store the user-controlled data can be a good stack pivot spot with fully user-controllable content.

Overwrite _SLPBuffer ‘send-buffer’ object (Arbitrary Read Primitive)

(1). Clients A, B, and C create connections to the server. Client A sends a ‘service request’ message. Client D creates a connection and sends a ‘service request’ message. Client B sends a ‘service request’ message.
(2). Close client D’s connection.
(3). Client E creates a connection and sends an ‘attribute request’ message.
(4). Client E’s ‘send-buffer’ goes through reallocation because the data is too large.
(5). Client E’s connection is still intact and not closed; however, the ‘message’ object is now freed.
(6). Clients G and H create connections to the server. Client C now sends a ‘service request’ to fill the hole left by client E’s ‘send-buffer’ reallocation and freed ‘message’.
(7). Close client B’s connection.
(8). Client F creates a connection to the server and sends a ‘directory agent advertisement’ message. This leaves a freed chunk of size 0x100 right after the ‘URL’ object for extension and overlapping.
(9). The overflow from the ‘URL’ object extends the size of its neighboring freed chunk from 0x100 to 0x120. The server then frees the objects allocated for client F. All objects related to client F are freed and consolidated. The ‘URL’ object is freed as well, but because its size fits in a fast bin, it does not get coalesced.
(10). Client G sends a ‘service request’ message. The first-fit algorithm assigns the extended free block to client G’s ‘recv-buffer’ object. This object overlaps with client E’s ‘send-buffer’, so client G’s data can now overwrite the ‘position’ pointers in it.
(11). Client J creates a connection to the server and sends a ‘service request’ message. Its purpose is to fill the hole left by client F’s ‘directory agent advertisement’ message.
(12). Close client A’s connection.
(13). Client I creates a connection to the server and sends a ‘directory agent advertisement’ message.
(14). The overflow from the ‘URL’ object extends the size of its neighboring freed chunk from 0x100 to 0x140. The server then frees the objects allocated for client I. All objects related to client I are freed and consolidated. The ‘URL’ object is freed as well, but because its size fits in a fast bin, it does not get coalesced.
(15). Client H sends a ‘service request’ message. The first-fit algorithm assigns the extended free block to client H’s ‘recv-buffer’ object. This object overlaps with client E’s ‘slpd-socket’, so client H’s data can now overwrite the properties in it.

Overwrite _SLPBuffer ‘recv-buffer’ object (Arbitrary Write Primitive)

(1). Client A creates a connection to the server and sends a ‘service request’ message. Client B creates a connection only. Client C creates a connection and sends a ‘service request’ message. Client B now sends a ‘service request’ message. Clients D and E create connections to the server.
(2). Close client C’s connection.
(3). Client F creates a connection to the server and sends a ‘directory agent advertisement’ message. This leaves a freed chunk of size 0x100 right after the ‘URL’ object for extension and overlapping.
(4). The overflow from the ‘URL’ object extends the size of its neighboring freed chunk from 0x100 to 0x140. The server then frees the objects allocated for client F. All objects related to client F are freed and consolidated. The ‘URL’ object is freed as well, but because its size fits in a fast bin, it does not get coalesced.
(5). Client E sends a ‘service request’ message. The first-fit algorithm assigns the extended free block to client E’s ‘recv-buffer’ object. This object overlaps with client B’s ‘recv-buffer’, so client E’s data can now overwrite the ‘position’ pointers in it.
(6). Client G creates a connection to the server and sends a ‘service request’ message. Its purpose is to fill the hole left by client F’s ‘directory agent advertisement’ message.
(7). Close client A’s connection.
(8). Client H creates a connection to the server and sends a ‘directory agent advertisement’ message. This leaves a freed chunk of size 0x100 right after the ‘URL’ object for extension and overlapping.
(9). The overflow from the ‘URL’ object extends the size of its neighboring freed chunk from 0x100 to 0x140. The server then frees the objects allocated for client H. All objects related to client H are freed and consolidated. The ‘URL’ object is freed as well, but because its size fits in a fast bin, it does not get coalesced.
(10). Client D sends a ‘service request’ message. The first-fit algorithm assigns the extended free block to client D’s ‘recv-buffer’ object. This object overlaps with client B’s ‘slpd-socket’, so client D’s data can now overwrite the properties in it.

The visual heap layouts above were created with villoc.

Exploitation Strategy Walkthrough

It is best to read the exploit code alongside the narration below to understand how the exploit works.

  1. Client 1 sends a ‘directory agent advertisement’ request to prepare for any unexpected memory allocation that may happen for this particular request. I observed that the request triggers additional memory allocations when the ‘slpd’ daemon is started at boot, but not when it is started through /etc/init.d/slpd start. Any unexpected memory allocation would eventually be freed and end up on the freelist. The assumption is that these unique freed slots will be used again by future ‘directory agent advertisement’ messages as long as I do not explicitly allocate memory that would hijack them.
  2. Clients 2–5 each make a ‘service request’ with a receiving buffer of size 0x40. This fills up some initial freed slots that exist on the freelist. If I don’t occupy these freed slots, they would hijack the ‘URL’ memory allocations of future ‘directory agent advertisement’ messages and ruin the heap grooming.
  3. Clients 6–10 set up client 7 to send the ‘service registration’ message to the server. The server only accepts ‘service registration’ messages originating from localhost, so client 7’s ‘slpd-socket’ needs to be overwritten to have its IP address updated. Once the message is sent, client 7’s socket object is updated again to hold the listening file descriptor so that future incoming connections can be handled. If this step is skipped, future clients cannot establish a connection with the server.
  4. Clients 11–21 set up the arbitrary read primitive by overwriting client 15’s ‘send-buffer’ position pointers. Since I have no knowledge of what addresses to leak in the first place, I perform a partial overwrite of the two least significant bytes of the ‘start’ position pointer with null values. This requires setting up the extended free chunk to be marked ‘IS_MAPPED’ to avoid getting zeroed out by the ‘calloc’ call. The ‘send-buffer’ that gets updated belongs to the ‘attribute request’ message. As I have no visibility into how much data will be leaked, I get a ballpark idea of where the leak is by including a marker value in the ‘service registration’ message noted in step 3. If the leaked content contains the marker, I know it is leaking data from the ‘attribute request’ ‘send-buffer’ object. This tells me it is about time to stop reading from the leak. Lastly, I have to update client 15’s ‘slpd-socket’ so that its state is ‘STREAM_WRITE’, which makes the server call ‘send’ to my client.
  5. I was able to collect heap addresses and libc addresses from the leak, from which I can derive everything else. My goal is to overwrite libc’s __free_hook so that control eventually reaches libc’s system. I need a gadget to position my stack at a location that won’t be altered by the application. I found a gadget in libc-2.17.so that lifts the stack address by 0x100.
  6. With the collected libc address, I can calculate the libc environment address, which stores the stack address. I use clients 22–31 to set up the arbitrary read primitive to leak the stack address. I have to update client 25’s file descriptor in the ‘slpd-socket’ to hold the listening file descriptor.
  7. Clients 32–40 set up the arbitrary write primitive. This requires overwriting the position pointers of client 33’s ‘recv-buffer’ object. The exploit first stores shell commands in client 15’s ‘send-buffer’ object, which is a large slab of space under my control. It then writes libc’s system address, a fake return address, and the address of the shell command onto the predicted stack location after stack lifting is performed. Afterwards, it overwrites libc’s __free_hook to hold the stack lifting gadget address. Lastly, each arbitrary write requires updating the corresponding ‘slpd-socket’ object’s state to ‘STREAM_READ’. If this step is skipped, the server will not accept the overwritten values for the position pointers.
  8. The desired shell commands will be executed once all the above steps are completed.

Final Remark

I enjoyed implementing this exploit very much and learned a few things while writing it. One of the biggest things I learned is to never make an assumption and to always test an idea out. When I was trying to get the data-leaking part of the exploit code to work, I was preparing to implement it the way Lucas described in his blog, which seemed slightly complicated. I was curious as to why I couldn’t just flip the socket object’s state to ‘STREAM_WRITE’, which would send the data back to me. After reviewing the OpenSLP code, I understood the problem and saw why Lucas came up with his particular solution. Nevertheless, I still wanted to see what happens if I just flip the state on the socket object, and to my disbelief, the daemon did send me the leaked data immediately without going through the additional hurdles. Another takeaway is that when doing any heap grooming design, it is best to work backward from how I want the heap to look in its finished form and backtrack the layout to the beginning.

The PoC should work out of the box against VMware ESXi 6.7.0 build-14320388, which is the trial version. I was able to get it to work 14 out of 15 tries.