Memory Management in C Extensions – Part 1

Most Ruby developers will spend years happily writing Ruby code without ever needing to touch native C code. But someday you might find yourself an exception to this rule. Maybe you need to add a feature to a native library binding, or even implement a new library binding from scratch. Or maybe you need to rewrite some CPU-intensive calculations in native code for performance reasons.

Whatever the reason, once you start writing C code, you’ll need to understand the Ruby C APIs for managing memory. In today’s episode, the first of a two-part series, veteran library maintainer Jeremy Evans joins us to demonstrate how to write C extensions so that they don’t leak memory. Enjoy!

Video transcript & code

One of the nice things about Ruby is that Ruby handles memory for our programs, so we don't have to.

During development, let's say we write a method called doubled, which takes an object and returns the value of multiplying the object by 2.


def doubled(a)
  a * 2
end

When we call this method with the string ab, it returns the string concatenated with itself.


doubled("ab")
# => "abab"

When we call this method with an array containing 1 and 3, it returns the array concatenated with itself.


doubled([1, 3])
# => [1, 3, 1, 3]

The nice part about this is we didn't have to worry about allocating any memory for the returned object, and this method doesn't leak memory, because Ruby will take care of freeing the memory for the returned object after it is no longer referenced. Let's say we call the length method on the object returned by doubled.


doubled([1, 3]).length
# => 4

If we then start garbage collection manually using GC.start, it will free the memory allocated by the doubled method so it can be reused.


doubled([1, 3]).length
# => 4
GC.start
# object allocated by doubled has been freed

Unfortunately, there may come a time when we can't use just Ruby to solve our programming problem. Maybe we need to interact with a library written in C.


Integrate with C Libraries

Or maybe we are trying to optimize our code and are unable to get sufficient performance from a pure Ruby implementation.


Optimize!

In either case, we'll probably need to use a C extension, and integrate with Ruby using Ruby's C-API. In our C extension library, we start by including the ruby.h header file.


#include <ruby.h>

As C-APIs go, Ruby's is fairly nice, but programming in C is a whole different ballgame from programming in Ruby in terms of memory management. Let's write a C doubled_str function that is similar to the Ruby doubled method, but only works for strings. This function will take a pointer to the start of the string, and the length of the string.


char * doubled_str(char *string, long length)

The first thing we need to worry about is having to allocate space for the new string using the malloc function. Note that this C code is only for example purposes, so it skips checking for integer overflow and allocation failure.


char * doubled_str(char *string, long length) {
  char *doubled_string = malloc(length * 2);
}

Then we have to copy the data from the old string into the new string, twice, and finally we return the new string. Like the Ruby doubled method, this function allocates memory, but since Ruby isn't managing the memory, unless we manually free the allocated memory later, this function leaks memory.


char * doubled_str(char *string, long length) {
  char *doubled_string = malloc(length * 2);
  memcpy(doubled_string, string, length);
  memcpy(doubled_string+length, string, length);
  return doubled_string;
}
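The note above says the example skips overflow and allocation-failure checks. As a rough sketch of what those checks could look like, here is a variant under a hypothetical name, doubled_str_checked (not part of the original code), that returns NULL instead of continuing when the doubled length would overflow or malloc fails. The caller still owns the returned memory and has to free it.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical checked variant of doubled_str; returns NULL on failure. */
char *doubled_str_checked(const char *string, long length) {
  char *doubled_string;

  /* Reject negative lengths and lengths whose double would overflow a long. */
  if (length < 0 || length > LONG_MAX / 2) {
    return NULL;
  }

  /* malloc(0) is allowed to return NULL, so always request at least 1 byte. */
  doubled_string = malloc(length > 0 ? (size_t)length * 2 : 1);
  if (doubled_string == NULL) {
    return NULL;
  }

  memcpy(doubled_string, string, (size_t)length);
  memcpy(doubled_string + length, string, (size_t)length);
  return doubled_string; /* the caller must free() this */
}
```

A caller would hold on to the returned pointer and pass it to free once finished with it; forgetting that call is exactly the leak described above.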

We don't want to leak memory, but maybe we also don't want to be responsible for freeing the memory manually. We can use Ruby's C-API to help manage the memory using the rb_str_buf_new function. rb_str_buf_new takes the length of the string in bytes and will create a Ruby string to hold the memory, and Ruby will free the memory during garbage collection after the string is no longer referenced.


VALUE rb_str_buf_new(long);

So we want to modify the doubled_str function to use rb_str_buf_new. We first change the return type from a character pointer to VALUE, which is the C type used for all Ruby objects, including strings.


VALUE doubled_str(char *string, long length) {
  char *doubled_string = malloc(length * 2);
  memcpy(doubled_string, string, length);
  memcpy(doubled_string+length, string, length);
  return doubled_string;
}

We then declare the Ruby string object we'll be returning. This string object will wrap the managed memory. Then we remove the malloc function call.


VALUE doubled_str(char *string, long length) {
  VALUE doubled_string_wrapper;
  char *doubled_string;

  memcpy(doubled_string, string, length);
  memcpy(doubled_string+length, string, length);
  return doubled_string;
}

We then use rb_str_buf_new to create the Ruby string that will be managing this memory as well as allocate space for the C string we will be using. Then we use RSTRING_PTR to get the pointer to the allocated memory. Thankfully, we don't need to change the memcpy function calls.


VALUE doubled_str(char *string, long length) {
  VALUE doubled_string_wrapper;
  char *doubled_string;

  doubled_string_wrapper = rb_str_buf_new(length * 2);
  doubled_string = RSTRING_PTR(doubled_string_wrapper);
  memcpy(doubled_string, string, length);
  memcpy(doubled_string+length, string, length);
  return doubled_string;
}

Finally, we switch to returning the Ruby string instead of the C pointer. When Ruby determines there are no more references to the string during garbage collection, it will free the allocated memory for us.


VALUE doubled_str(char *string, long length) {
  VALUE doubled_string_wrapper;
  char *doubled_string;

  doubled_string_wrapper = rb_str_buf_new(length * 2);
  doubled_string = RSTRING_PTR(doubled_string_wrapper);
  memcpy(doubled_string, string, length);
  memcpy(doubled_string+length, string, length);
  return doubled_string_wrapper;
}

When we create a Ruby string object, Ruby makes the object accessible via ObjectSpace. We don't want to have the wrapper for the allocated memory modified, so if we are paranoid, we can freeze the string via rb_obj_freeze.


VALUE doubled_str(char *string, long length) {
  VALUE doubled_string_wrapper;
  char *doubled_string;

  doubled_string_wrapper = rb_str_buf_new(length * 2);
  rb_obj_freeze(doubled_string_wrapper);
  doubled_string = RSTRING_PTR(doubled_string_wrapper);
  memcpy(doubled_string, string, length);
  memcpy(doubled_string+length, string, length);
  return doubled_string_wrapper;
}

Before freezing the string, we should tell Ruby that we will be using the entire capacity of the memory that we allocated, so we set the length of the string to the allocated capacity using rb_str_set_len.


VALUE doubled_str(char *string, long length) {
  VALUE doubled_string_wrapper;
  char *doubled_string;

  doubled_string_wrapper = rb_str_buf_new(length * 2);
  rb_str_set_len(doubled_string_wrapper, length * 2);
  rb_obj_freeze(doubled_string_wrapper);
  doubled_string = RSTRING_PTR(doubled_string_wrapper);
  memcpy(doubled_string, string, length);
  memcpy(doubled_string+length, string, length);
  return doubled_string_wrapper;
}
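To see why the length has to be set separately, it can help to picture a string as a pointer plus two numbers: how many bytes are reserved and how many are in use. The plain-C toy model below is only an illustration under that assumption. The names toy_string, toy_buf_new, and toy_set_len are hypothetical, and this is not how Ruby lays out strings internally.

```c
#include <stdlib.h>
#include <string.h>

/* Toy model: reserved space (capa) is tracked separately from used space (len). */
struct toy_string {
  char *ptr;
  long len;
  long capa;
};

/* Reserves capa bytes but leaves the string empty; returns 0 on success, -1 on failure. */
int toy_buf_new(struct toy_string *str, long capa) {
  if (capa < 0) {
    return -1;
  }
  str->ptr = malloc(capa > 0 ? (size_t)capa : 1);
  if (str->ptr == NULL) {
    return -1;
  }
  str->len = 0;
  str->capa = capa;
  return 0;
}

/* Marks the first len bytes as in use; refuses lengths beyond the reserved space. */
int toy_set_len(struct toy_string *str, long len) {
  if (len < 0 || len > str->capa) {
    return -1;
  }
  str->len = len;
  return 0;
}
```

In this model, copying bytes into ptr changes nothing about len. Only the explicit toy_set_len call does, which mirrors the role rb_str_set_len plays in the function above.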

In some cases, we may not be able to use rb_str_buf_new to have Ruby manage our memory, because a C function will do the memory allocation. For example, the PostgreSQL C library has a PQunescapeBytea function that allocates memory and returns a pointer to the allocated memory, and makes the caller responsible for freeing the memory using the PQfreemem function after the caller has finished using it.


extern unsigned char *PQunescapeBytea(const unsigned char *strtext,
        size_t *retbuflen);

Let's say we want to call the doubled_str function with the pointer returned by the PQunescapeBytea function, and want to free the pointer after use to avoid leaking memory. We can write a doubled_unescaped_str function for this. We start off by declaring the variables we will be using: the Ruby string wrapper, a pointer to the unescaped string, and the length of the unescaped string.


VALUE doubled_unescaped_str(char *escaped_string) {
  VALUE doubled_string_wrapper;
  char *unescaped_string;
  long unescaped_length;
}

We'll call the PQunescapeBytea function with the escaped string, and have it return the unescaped string and set the unescaped length.


VALUE doubled_unescaped_str(char *escaped_string) {
  VALUE doubled_string_wrapper;
  char *unescaped_string;
  long unescaped_length;

  unescaped_string = PQunescapeBytea(escaped_string, &unescaped_length);
}

Then we just need to double that string using our already written doubled_str function.


VALUE doubled_unescaped_str(char *escaped_string) {
  VALUE doubled_string_wrapper;
  char *unescaped_string;
  long unescaped_length;

  unescaped_string = PQunescapeBytea(escaped_string, &unescaped_length);
  doubled_string_wrapper = doubled_str(unescaped_string, unescaped_length);
}

Finally, we can free the temporary memory using PQfreemem, then return our wrapper.


VALUE doubled_unescaped_str(char *escaped_string) {
  VALUE doubled_string_wrapper;
  char *unescaped_string;
  long unescaped_length;

  unescaped_string = PQunescapeBytea(escaped_string, &unescaped_length);
  doubled_string_wrapper = doubled_str(unescaped_string, unescaped_length);
  PQfreemem(unescaped_string);
  return doubled_string_wrapper;
}

It would be nice if things were that simple. Unfortunately, this has a nasty corner case. rb_str_buf_new, like most Ruby C-API functions, can raise Ruby exceptions. When that happens in C code, the effect is the same as in Ruby: normal control flow stops, and control passes to the nearest rescue or ensure block.


VALUE rb_str_buf_new(long);

Going back to our doubled_unescaped_str example, if an exception is raised inside doubled_str, the unescaped_string is not freed, and the result is a memory leak.


VALUE doubled_unescaped_str(char *escaped_string) {
  VALUE doubled_string_wrapper;
  char *unescaped_string;
  long unescaped_length;

  unescaped_string = PQunescapeBytea(escaped_string, &unescaped_length);

  /* If exception raised here */
  doubled_string_wrapper = doubled_str(unescaped_string, unescaped_length);

  /* Memory is not freed here */
  PQfreemem(unescaped_string);

  return doubled_string_wrapper;
}
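The PQfreemem line is skipped because a raised exception leaves the C function through a non-local jump rather than a normal return. The sketch below imitates that with the standard setjmp and longjmp functions. It does not use Ruby's C-API, all of the names are made up for illustration, and a counter stands in for the real allocation so the leak can be observed.

```c
#include <setjmp.h>

jmp_buf rescue_point;   /* stands in for the nearest rescue block */
int live_buffers = 0;   /* simulated allocations that have not been freed */

/* Stand-in for a C-API call that raises: it jumps away and never returns. */
void raising_call(void) {
  longjmp(rescue_point, 1);
}

/* Same shape as doubled_unescaped_str: allocate, call, free. */
void allocate_call_free(void) {
  live_buffers++;   /* stands in for PQunescapeBytea allocating */
  raising_call();   /* control leaves the function here */
  live_buffers--;   /* stands in for PQfreemem; never reached */
}

/* Returns 1 if the simulated exception was caught, 0 otherwise. */
int run_with_rescue(void) {
  if (setjmp(rescue_point) != 0) {
    return 1;
  }
  allocate_call_free();
  return 0;
}
```

After run_with_rescue returns, live_buffers is still 1: the jump skipped the cleanup line, just as a Ruby exception skips the PQfreemem call above.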

To ensure that the memory is freed, we need to call the PQfreemem function in the equivalent of a Ruby ensure block, so that it is always called even if an exception is raised. Fortunately, there is a Ruby C-API function for that, called rb_ensure.


VALUE rb_ensure(VALUE(*)(ANYARGS),VALUE,VALUE(*)(ANYARGS),VALUE); 

Unfortunately, like many things in C, it's much more cumbersome than similar code in Ruby. The first argument is a function pointer for the function with the code we want to run, and the second argument is a single argument to pass to that function. Think of the first argument as the begin section of a begin/ensure block.

The third and fourth arguments are similar, but the third argument is for the function to always call, even if calling the first argument raises an exception. Think of the third argument as the ensure section of a begin/ensure block.
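To make the begin/ensure analogy concrete, here is a toy version of the same four-argument shape written in plain C with setjmp and longjmp. This is an illustrative sketch, not Ruby's implementation: toy_ensure, toy_raise, toy_rescue, and the helper functions are hypothetical names, and void pointers are used where rb_ensure uses VALUE.

```c
#include <setjmp.h>
#include <stdlib.h>

typedef void *(*toy_func)(void *);

/* Innermost active handler; NULL when nothing can catch a raise. */
static jmp_buf *current_handler = NULL;

/* Stand-in for raising an exception: jump to the innermost handler. */
void toy_raise(void) {
  if (current_handler == NULL) {
    abort();
  }
  longjmp(*current_handler, 1);
}

/* Runs body(body_arg), then always runs ensure(ensure_arg), even if body raised. */
void *toy_ensure(toy_func body, void *body_arg,
                 toy_func ensure, void *ensure_arg) {
  jmp_buf handler;
  jmp_buf *outer = current_handler;
  void *result;

  current_handler = &handler;
  if (setjmp(handler) != 0) {
    /* body raised: run the ensure function, then keep propagating. */
    current_handler = outer;
    ensure(ensure_arg);
    toy_raise();
  }
  result = body(body_arg);
  current_handler = outer;
  ensure(ensure_arg);
  return result;
}

/* Runs body(arg) and reports whether it raised: 1 if so, 0 otherwise. */
int toy_rescue(toy_func body, void *arg) {
  jmp_buf handler;
  jmp_buf *outer = current_handler;

  current_handler = &handler;
  if (setjmp(handler) != 0) {
    current_handler = outer;
    return 1;
  }
  body(arg);
  current_handler = outer;
  return 0;
}

int cleanup_calls = 0;

/* Ensure function: frees the buffer and records that it ran. */
void *free_buffer(void *buffer) {
  free(buffer);
  cleanup_calls++;
  return NULL;
}

/* Body function that always raises. */
void *failing_body(void *unused) {
  (void)unused;
  toy_raise();
  return NULL;
}

/* Allocates a buffer and guards it with toy_ensure. */
void *guarded_work(void *unused) {
  char *buffer = malloc(16);
  (void)unused;
  return toy_ensure(failing_body, NULL, free_buffer, buffer);
}
```

Calling toy_rescue(guarded_work, NULL) reports that the body raised, yet free_buffer has still run once, which is the guarantee the ensure section provides.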

Let's use rb_ensure in our doubled_unescaped_str function. We'd like to be able to use doubled_str as the first argument to rb_ensure, but unfortunately, doubled_str takes two arguments, and rb_ensure only supports functions that accept a single argument.


VALUE doubled_unescaped_str(char *escaped_string) {
  VALUE doubled_string_wrapper;
  char *unescaped_string;
  long unescaped_length;

  unescaped_string = PQunescapeBytea(escaped_string, &unescaped_length);

  /* doubled_str takes 2 arguments! */
  rb_ensure(doubled_str, ???, PQfreemem, unescaped_string);

  return doubled_string_wrapper;
}

To work around that, we need to use a wrapper function and a C struct for the arguments to the doubled_str function. Let's start with the struct. The struct only needs to contain the arguments passed to doubled_str.


struct doubled_str_args {
  char *string;
  long length;
};

Then let's add the wrapper function. I'm not feeling particularly creative in terms of function naming, so I'm calling this doubled_str2. It takes a pointer to the doubled_str_args struct we defined, and just calls the doubled_str function with the two members of the struct.


struct doubled_str_args {
  char *string;
  long length;
};

VALUE doubled_str2(struct doubled_str_args *args) {
  return doubled_str(args->string, args->length);
}
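The struct-plus-wrapper technique is not specific to Ruby; it applies to any C callback interface that passes a single argument. The sketch below shows the same idea without ruby.h so it can be compiled on its own. The names call_with_one_arg, doubled_c_str, doubled_c_str_args, and doubled_c_str_adapter are hypothetical and exist only for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* A callback interface that, like rb_ensure, passes exactly one argument. */
typedef void *(*one_arg_func)(void *);

void *call_with_one_arg(one_arg_func func, void *arg) {
  return func(arg);
}

/* The two-argument function we actually want to call. */
char *doubled_c_str(const char *string, size_t length) {
  char *doubled = malloc(length * 2 + 1);
  if (doubled == NULL) {
    return NULL;
  }
  memcpy(doubled, string, length);
  memcpy(doubled + length, string, length);
  doubled[length * 2] = '\0';
  return doubled;
}

/* Bundles both arguments so they can travel through one pointer. */
struct doubled_c_str_args {
  const char *string;
  size_t length;
};

/* Adapter with the one-argument shape: unpacks the struct and forwards the call. */
void *doubled_c_str_adapter(void *ptr) {
  struct doubled_c_str_args *args = ptr;
  return doubled_c_str(args->string, args->length);
}
```

A caller fills in a doubled_c_str_args struct, passes its address to call_with_one_arg along with doubled_c_str_adapter, and frees the returned string when done.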

We then modify the doubled_unescaped_str function to use the doubled_str_args struct, having PQunescapeBytea update the members of the struct.


struct doubled_str_args {
  char *string;
  long length;
};

VALUE doubled_str2(struct doubled_str_args *args) {
  return doubled_str(args->string, args->length);
}

VALUE doubled_unescaped_str(char *escaped_string) {
  struct doubled_str_args args;

  args.string = PQunescapeBytea(escaped_string, &(args.length));
 
}

We can then use rb_ensure correctly, using the doubled_str2 function and the address of our doubled_str_args struct, and returning the VALUE of rb_ensure, which will be the object returned by doubled_str.


struct doubled_str_args {
  char *string;
  long length;
};

VALUE doubled_str2(struct doubled_str_args *args) {
  return doubled_str(args->string, args->length);
}

VALUE doubled_unescaped_str(char *escaped_string) {
  struct doubled_str_args args;

  args.string = PQunescapeBytea(escaped_string, &(args.length));

  return rb_ensure(doubled_str2, &args, PQfreemem, args.string);
}

One thing to note here is that the rb_ensure arguments are supposed to be functions that both accept and return a single VALUE, or Ruby object, but we're calling the functions with arguments that are not Ruby objects. This works because the arguments are pointers, and VALUE is the same size as a pointer. It will cause compiler warnings, but those can be avoided by manually casting the types to VALUE.


/* definition */
VALUE rb_ensure(VALUE(*)(ANYARGS),VALUE,VALUE(*)(ANYARGS),VALUE);

/* use */
rb_ensure(doubled_str2, &args, PQfreemem, args.string);
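One way to avoid the warnings without scattering casts is to give each callback exactly the expected signature and do the pointer conversion inside it. The sketch below shows that approach in plain C. It assumes a hypothetical toy_value type (uintptr_t) standing in for VALUE and a hypothetical call_then_cleanup function standing in for rb_ensure; in a real extension the same idea would use VALUE and a small wrapper around PQfreemem, which itself returns void.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t toy_value;                   /* hypothetical stand-in for VALUE */
typedef toy_value (*toy_callback)(toy_value);  /* takes and returns one toy_value */

struct buffer_args {
  char *data;
  size_t length;
};

/* Body callback: converts the integer back to a pointer before using it. */
toy_value sum_bytes(toy_value arg) {
  struct buffer_args *args = (struct buffer_args *)arg;
  toy_value total = 0;
  size_t i;

  for (i = 0; i < args->length; i++) {
    total += (unsigned char)args->data[i];
  }
  return total;
}

/* Cleanup callback: wraps free, which returns void, in the expected signature. */
toy_value free_adapter(toy_value arg) {
  free((void *)arg);
  return 0;
}

/* Stand-in for rb_ensure's calling convention, without exception handling. */
toy_value call_then_cleanup(toy_callback body, toy_value body_arg,
                            toy_callback cleanup, toy_value cleanup_arg) {
  toy_value result = body(body_arg);
  cleanup(cleanup_arg);
  return result;
}
```

The call site then casts only its arguments, for example call_then_cleanup(sum_bytes, (toy_value)&args, free_adapter, (toy_value)args.data), and every function pointer already has the right type.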

Today we've gone over a couple of ways to help avoid memory leaks in Ruby C extensions. We've used rb_str_buf_new to have Ruby manage some memory for us, and used rb_ensure to make sure we free memory appropriately even when a Ruby C-API call raises an exception.


This Episode: Avoiding Memory Leaks

Next episode we'll tackle a separate but related memory management issue in C extensions, which is how to ensure that Ruby doesn't free memory while we are still using it.

Next Episode: Avoiding Use-After-Free