Rust `struct` conversion to and from C `void*`

I thought it would be a good idea to write a PostgreSQL foreign data wrapper (FDW) in Rust with PGX. My top three reasons:

  • Learn how to write an FDW
  • Get deeper into Rust programming
  • Have some fun

Turns out things are taking much longer than I assumed. They always do. Here is one particular problem I stumbled over.

Data-go-around

Data frequently has to move from one C function to another. The producer and the consumer both know the data structure, but the intermediate storage often needs to be type-agnostic. In C, this typically happens through the most generic pointer mankind ever thought of (and one that every aspiring programmer stumbles over): void*.

C makes this conversion easy: just cast that damn thing. Got a char* something? (void*)something does the trick. Well, it also makes it easy to shoot yourself in the foot. In a naïve nutshell, the two main things to consider are memory boundaries and type sizes.

For instance, consider casting unsigned char* to void* and then back to unsigned long*. This is perfectly doable but unlikely to be a good idea: you will very likely get garbled data the moment you dereference the last one.
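
To make that concrete in Rust terms, here is a minimal sketch of the same round trip. The function name bytes_through_void is made up for illustration, and u64 stands in for unsigned long, which assumes a platform where that type is 64 bits wide.

```rust
use std::ffi::c_void;

/// Sends eight bytes through a type-erased pointer and reads them back
/// as a single u64. (Hypothetical helper, not part of any FDW code.)
fn bytes_through_void(bytes: &[u8; 8]) -> u64 {
    // Step 1: "unsigned char*" -> "void*". The address stays the same,
    // only the type information is thrown away.
    let erased: *const c_void = bytes.as_ptr() as *const c_void;

    // Step 2: "void*" -> "unsigned long*". The compiler accepts this
    // without complaint, although nothing guarantees a u64 lives there.
    let wrong: *const u64 = erased as *const u64;

    // SAFETY: the pointer refers to exactly eight readable bytes.
    // read_unaligned is used because a byte buffer is not guaranteed to
    // be aligned for u64; a plain dereference could be undefined behavior.
    unsafe { wrong.read_unaligned() }
}
```

Feeding it b"ABCDEFGH" returns one integer built from all eight characters in the machine's native byte order, which is hardly what anyone who stored text had in mind. With fewer than eight bytes behind the pointer, the read would run past the end of the buffer.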

But I digress; this is about PostgreSQL FDWs in Rust. Let's assume the following Rust struct:

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct PlanState {
    pub filename: String,
}

This is, of course, extensible to whatever you need. At the current point of the implementation, I just need a filename, and I need to store it in baserel->fdw_private, which is of type void*.

Part 1: Convert Something to void*

How do we get this into a contiguous, copyable piece of memory? I am going to pick Serde in conjunction with Rust MessagePack (RMP). It might be a little too much, but I also do not know yet what else will end up in that struct and don't want to bother. The final "contiguous copyable piece of memory" will be a C-string. Hence we get the following code to serialize:

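// Serialize the struct into MessagePack bytes.
let mut encoded = rmp_serde::encode::to_vec(self).unwrap();
// Terminate the buffer so it can be read back as a C-string.
encoded.push(b'\0');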

Note that we need to append a '\0' to properly terminate the C-string. Next, we copy this piece of memory into a PostgreSQL memory context. Luckily, PGX has us covered here:

let raw_pg = CurrentMemoryContext.copy_ptr_into(
    encoded.as_mut_ptr(),
    encoded.len());

We are almost done. However, raw_pg is of type *mut u8, but we need void*. This is not a big deal here, since the memory location is already correct. Thus, we can just cast and store:

(*baserel).fdw_private = raw_pg as *mut std::ffi::c_void;
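
For playing with this pattern outside of PostgreSQL, the following is a rough sketch of the same copy-then-cast idea using only the standard library. It is not how PGX does it: copy_to_void and take_from_void are hypothetical names, and std::alloc merely stands in for the memory context.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ffi::c_void;

/// Hypothetical stand-in for a memory-context copy: allocates
/// `src.len()` bytes, copies `src` into them, and returns the
/// type-erased pointer.
fn copy_to_void(src: &[u8]) -> *mut c_void {
    assert!(!src.is_empty(), "zero-sized allocations are not supported");
    let layout = Layout::array::<u8>(src.len()).expect("layout overflow");
    // SAFETY: the layout has a non-zero size, and the freshly allocated
    // destination cannot overlap the source.
    unsafe {
        let dst = alloc(layout);
        if dst.is_null() {
            handle_alloc_error(layout);
        }
        std::ptr::copy_nonoverlapping(src.as_ptr(), dst, src.len());
        // *mut u8 -> void*: same address, different type.
        dst as *mut c_void
    }
}

/// Copies `len` bytes back out and releases the allocation.
///
/// # Safety
/// `ptr` must come from `copy_to_void` called with a slice of exactly
/// `len` bytes, and must not be used afterwards.
unsafe fn take_from_void(ptr: *mut c_void, len: usize) -> Vec<u8> {
    unsafe {
        let bytes = std::slice::from_raw_parts(ptr as *const u8, len).to_vec();
        let layout = Layout::array::<u8>(len).expect("layout overflow");
        dealloc(ptr as *mut u8, layout);
        bytes
    }
}
```

The sketch has to carry the length around separately and free the memory by hand. In the FDW, PostgreSQL owns the copy and releases it together with its memory context.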

(Tricky) Part 2: void* Back to Something

How do we get back from *mut std::ffi::c_void to Vec<u8> or [u8] so that we can deserialize it? Well, we did store a C-string, and it is highly likely that Rust's FFI support has a C-string type. It does; it even has two: CStr and CString (roughly the C-string counterparts of str and String).

Since our Rust struct uses String, it is tempting to go with CString. Type similarity and such. Even more tempting: CString has a from_raw(ptr: *mut c_char). Let's try:

// raw: *mut std::ffi::c_void
let raw_str = unsafe {
    std::ffi::CString::from_raw(raw as *mut std::os::raw::c_char)
};
let decoded: PlanState = rmp_serde::decode::from_slice(
    raw_str.as_bytes()).unwrap();

If I had to do this again, I would not bet on this working, because it does not. It will crash during deserialization. What is happening?

The LLDB backtrace reveals that it tries to free the CString. Furthermore, it crashes there because of some "invalid pointer."

Let us go back a bit and think about (one of the) golden rules of computer engineering (and life, probably): RTFM. For CString, the docs state on from_raw:

Retakes ownership of a CString that was transferred to C via CString::into_raw.

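Did we ever call into_raw? No. The memory was allocated by PostgreSQL, not by Rust, so as soon as the CString is dropped, Rust tries to free a pointer its own allocator never handed out. What a recipe for disaster.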
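
For contrast, this is roughly what the pairing the docs have in mind looks like. It is a small standard-library sketch; give_to_c and take_back are names I made up.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Hands a Rust-allocated C-string to the C side.
/// After into_raw, Rust no longer frees the memory on its own.
fn give_to_c(text: &str) -> *mut c_char {
    CString::new(text)
        .expect("text must not contain an interior NUL")
        .into_raw()
}

/// Takes the string back so that Rust frees it when it is dropped.
///
/// # Safety
/// `ptr` must come from `give_to_c` (that is, from CString::into_raw)
/// and must not be used again afterwards.
unsafe fn take_back(ptr: *mut c_char) -> String {
    // from_raw is only sound here because into_raw produced the pointer.
    let owned = unsafe { CString::from_raw(ptr) };
    owned.into_string().expect("bytes must be valid UTF-8")
}
```

Every into_raw is matched by exactly one from_raw, and both ends use Rust's allocator. A pointer that came out of a PostgreSQL memory context fits neither condition.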

CStr to the rescue

The solution is to use CStr here. It defines from_ptr<'a>(ptr: *const c_char) and the manual states:

Wraps a raw C string with a safe C string wrapper.

This function will wrap the provided ptr with a CStr wrapper, which allows inspection and interoperation of non-owned C strings. [...]

Just what we need, so we are going to plug this in:

// raw: *mut std::ffi::c_void
let raw_str = unsafe {
    std::ffi::CStr::from_ptr(raw as *const std::os::raw::c_char)
};
let decoded: PlanState = rmp_serde::decode::from_slice(
    raw_str.to_bytes()).unwrap();

Compiles, runs, and - more importantly - works.
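
The borrowing behavior can be tried in isolation, without PostgreSQL or Serde. In the sketch below, borrow_bytes is a hypothetical helper, and in the usage I have in mind a plain Vec<u8> plays the role of the memory that somebody else owns.

```rust
use std::ffi::{c_void, CStr};
use std::os::raw::c_char;

/// Copies the bytes of a NUL-terminated buffer behind `raw` without
/// taking ownership of that buffer.
///
/// # Safety
/// `raw` must point to a NUL-terminated buffer that stays valid for the
/// duration of the call.
unsafe fn borrow_bytes(raw: *const c_void) -> Vec<u8> {
    // CStr only looks at the memory; dropping `view` frees nothing.
    let view: &CStr = unsafe { CStr::from_ptr(raw as *const c_char) };
    view.to_bytes().to_vec()
}
```

Calling it with a pointer into a Vec<u8> that ends in a 0 byte returns everything before the terminator, and the Vec is still intact and usable afterwards. That is exactly the property the CString attempt was missing.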

What about the '\0'?

Ah, yeah, the string terminator. to_bytes takes care of this, as the slice it returns excludes the terminating character:

The returned slice will not contain the trailing nul terminator that this C string has.
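
One caveat worth checking before relying on this: CStr::from_ptr treats the first zero byte it finds as the terminator. The sketch below (cstr_lengths is a made-up helper) compares to_bytes with to_bytes_with_nul and shows what happens when a zero byte sits in the middle of the buffer.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Returns the length CStr reports for `buf`, first without and then
/// with the terminating NUL.
fn cstr_lengths(buf: &[u8]) -> (usize, usize) {
    assert_eq!(buf.last(), Some(&0), "buffer must end with a NUL byte");
    // SAFETY: the buffer ends with NUL, so the scan for the terminator
    // stays inside `buf`.
    let view = unsafe { CStr::from_ptr(buf.as_ptr() as *const c_char) };
    (view.to_bytes().len(), view.to_bytes_with_nul().len())
}
```

For b"abc\0" this gives (3, 4) as expected. For b"ab\0cd\0" it gives (2, 3): everything after the first zero byte is silently cut off. MessagePack is a binary format, and the integer 0, for example, is encoded as a single 0x00 byte. A struct with only a short filename should be fine, but if such fields are added later, storing the length next to the pointer instead of relying on a terminator would be one way around this.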

That's it, we can now go Rust struct -> C void* -> Rust struct.

