Adventures in C++: Disappearing Strings

The first project for my Distributed Systems class must be done in C++. The majority of my coding efforts, however, have been in C, Objective-C, or Java. And while this project could have easily been hacked together using, essentially, only C, I decided to take the three weeks we had for the project to learn to write the best C++ solution I could.

After a few days of fumbling around with C++’s syntax and reading through countless online tutorials, though, I realized that there was lots of conflicting advice about the best ways to write C++ scattered about the web. Mired in lots of syntax, terminology, and design choices I didn’t understand, I ordered Effective C++, by Scott Meyers. After a few hours of reading, lots of things suddenly clicked, and I started to lay down some code.

The Bug

In a day or so, I found myself with a simple, seemingly well-structured prototype. Unfortunately, it was a prototype that would blow up with a segmentation fault whenever I tried to run it. So I fired up gdb. And, strangely, the program was crashing when an instance of a class, Password, was just trying to access value_, one of its member variables:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0xfffffffffffffff8
0x00007fff84b510aa in std::string::_Rep::_M_grab ()

(gdb) info stack
#0  0x00007fff84b510aa in std::string::_Rep::_M_grab ()
#1  0x00007fff84b51171 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string ()
#2  0x00000001000029a0 in esd::Password::get_value (this=0x1001000a0) at password.h:19

This seemed really weird, because get_value() was declared like this:

std::string get_value() const { return value_; }

with value_ itself declared as:

const std::string value_;

and, as all const member variables should be, value_ was set in Password’s only constructor’s initialization list:

Password::Password(const std::string& value) : value_(value) {}

It was easy to see that the string passed in to the constructor was, indeed, quite legitimate:

Breakpoint 1, esd::Password::Password (this=0x1001000a0, value=@0x7fff5fbff490) at password.cc:40
(gdb) p value
$1 = (const string &) @0x7fff5fbff490: {
    static npos = 18446744073709551615, 
    _M_dataplus = {
        <std::allocator<char>> = {
            <__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
        members of std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Alloc_hider: 
        _M_p = 0x100100098 "AA"
    }
}

and that, immediately after the execution of the constructor, the object in value_ was a perfect copy of that passed-in string:

(gdb) p value_
$2 = {
    static npos = 18446744073709551615, 
    _M_dataplus = {
        <std::allocator<char>> = {
            <__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
        members of std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Alloc_hider: 
        _M_p = 0x100100098 "AA"
    }
}

But gdb also revealed that, at the time get_value() was invoked, value_ wasn’t looking so great:

(gdb) frame 2
#2  0x00000001000029a0 in esd::Password::get_value (this=0x1001000a0) at password.h:19
(gdb) p value_
$3 = {
    static npos = 18446744073709551615, 
    _M_dataplus = {
        <std::allocator<char>> = {
            <__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
        members of std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Alloc_hider: 
        _M_p = 0x0
    }
}

Specifically, _M_dataplus’s _M_p had gone from 0x100100098, which contained "AA", to 0x0, which, well, didn’t contain anything.

In other words, based on my knowledge of C++, it appeared as though the sequence of events leading up to the crash was as follows:

  1. The Password::Password(const std::string&) constructor is invoked with a reference to a std::string.

  2. The constructor’s initialization list code passed that reference to std::string(const std::string& str), std::string’s copy constructor, and placed the resulting copy in the const value_ member variable.

  3. At some later point, value_, which, as noted above, was constant, is somehow transformed such that the actual string data it contains goes from the correct, original value ("AA") to an incorrect value (0x0).

  4. After value_ has been mangled, someone invokes Password::get_value(), which is an implicitly inlined accessor method defined to return value_.

  5. Password::get_value() is defined to return a std::string, so the string copy constructor is invoked to make the copy of value_ that will be returned.

  6. The string copy constructor causes a segmentation fault, because value_ is no longer a valid string.

The task, then, was to figure out what happened in Step 3: how did a const string, copied from another string on initialization, end up becoming invalid?

The Frustration

I worked hard on tracking down this issue for two days, and considered a wide range of theories. Did I not understand how initializer lists worked? Was I failing to grasp the meaning of copy constructors? Should I be storing pointers to strings instead of the strings themselves?

I thought that, after reading Effective C++, I had formed a correct basic understanding of how the language worked, but each of these theories seemed to invalidate that understanding. I tried to keep in mind, however, that every good, hard bit of learning I had ever accomplished had felt just as frustrating as this at times. Whenever I tried something new, I would start out knowing that I knew nothing, move to knowing enough to put together a toy example, learn enough that I thought I had a solid foundation, and then, finally, work through a series of really difficult problems in which I found out that my foundation was missing pieces or put together slightly wrong. I was in this last portion, and I knew it. My only way out was to work as hard as I could at figuring out what I wasn’t seeing.

Once I had reread all of the relevant Google results and applicable chapters in Effective C++, though, I still didn’t feel like I was any closer to solving the problem. But then, just as I was about to reach out to Stack Overflow for help, I thought of Valgrind, which I had heard could help track down some of the trickiest of memory issues. So I gave it a try.

The Problem

That was, as they say, a good call. As soon as Valgrind started up, it gave me this:

==25468== Invalid read of size 8
==25468==    at 0x50F88BE: std::string::string(std::string const&) (in /usr/lib64/libstdc++.so.6.0.8)
==25468==    by 0x403133: esd::Password::get_value() const (password.h:19)
==25468==    by 0x4029FF: esd::Password::IncrementPlace(unsigned long) const (password.cc:52)
==25468==    by 0x403373: esd::WorkUnit::get_passwords() const (work_unit.cc:37)
==25468==    by 0x4034DB: esd::WorkUnit::get_last_password() const (work_unit.cc:27)
==25468==    by 0x404BF2: esd::PasswordSpace::get_work_unit_queue() const (password_space.cc:85)
==25468==    by 0x405023: esd::PasswordSpace::CheckOutWorkUnit() (password_space.cc:41)
==25468==    by 0x401C57: main (server.cc:52)
==25468==  Address 0x405C080 is 0 bytes inside a block of size 8 free'd
==25468==    at 0x4C20130: operator delete(void*) (vg_replace_malloc.c:244)
==25468==    by 0x4021A2: std::tr1::_Sp_deleter<esd::Password>::operator()(esd::Password*) const (boost_shared_ptr.h:93)
==25468==    by 0x40229E: std::tr1::_Sp_counted_base_impl<esd::Password*, std::tr1::_Sp_deleter<esd::Password> >::dispose() (boost_shared_ptr.h:215)
==25468==    by 0x4022E0: std::tr1::_Sp_counted_base::release() (boost_shared_ptr.h:153)
==25468==    by 0x402339: std::tr1::shared_count::~shared_count() (boost_shared_ptr.h:277)
==25468==    by 0x4023A8: std::tr1::shared_ptr<esd::Password>::~shared_ptr() (boost_shared_ptr.h:486)
==25468==    by 0x402FAA: esd::Password::DistanceTo(std::tr1::shared_ptr<esd::Password>) const (password.cc:113)
==25468==    by 0x404AB6: esd::PasswordSpace::get_work_unit_queue() const (password_space.cc:79)
==25468==    by 0x405023: esd::PasswordSpace::CheckOutWorkUnit() (password_space.cc:41)
==25468==    by 0x401C57: main (server.cc:52)

That was exactly what I needed to see. std::string::string(std::string const&) was crashing because it was trying to access memory that had been freed. Specifically, that freed memory was the password itself. Even better, Valgrind could tell me who and where that memory was freed. In this case, it was esd::Password::DistanceTo(std::tr1::shared_ptr<esd::Password>), on line 133 of password.cc.

It turns out that line 133 of password.cc is simply where DistanceTo(…) returns the distance it has computed. That’s important, but it doesn’t explain why DistanceTo(…) deletes the object on which it is invoked. Looking at the whole method in conjunction with the Valgrind output, however, flushes the bug out:

size_t Password::DistanceTo(const std::tr1::shared_ptr<Password> other) const {
    if (Equals(other)) {
        return 0;
    }

    int comparison = CompareTo(other);
    std::tr1::shared_ptr<Password> this_shared_ptr(const_cast<Password*>(this));
    std::tr1::shared_ptr<Password> start;
    std::tr1::shared_ptr<Password> end;
    if (comparison < 0) {
        start = this_shared_ptr;
        end = other;
    } else {
        start = other;
        end = this_shared_ptr;
    }

    size_t distance = 0;
    while (!start->Equals(end)) {
        distance++;
        start = start->NextPassword();
    }
    return distance;
}

All I’m trying to do here figure out the absolute distance, in some ordering, from the password on which DistanceTo(…) was invoked to some other password. To accomplish this, I’m figuring out which password comes before the other and then counting how many increments of the earlier password it takes to get to the later password. (There are likely faster ways to do this, but, for an initial prototype in a new language, I just wanted something that I could write clearly and quickly.)

Here’s the problem:

std::tr1::shared_ptr<Password> this_shared_ptr(const_cast<Password*>(this));

This line tried to obtain a smart_ptr<Password> to this, so that I could assign either start or end to it, depending on whether this came before other in the password ordering. But this is const inside of DistanceTo(…), since the method itself is const, and so, in my naïvety, I thought, “Well, I’ll just remove the const from this with a const_cast<Password*>, and that will solve my problem!”

And it did solve the problem of this being const, but it created a much larger problem: I now have at least two shared_ptr objects that point to the password instance: the one that was used to invoke DistanceTo(…), and this_shared_ptr. Unfortunately, creating two shared_ptrs that point to the same object but that don’t know about each other is a recipe for disaster, since they don’t know that they’re sharing ownership. So, when DistanceTo(…) returns, it deletes this_shared_ptr, which, in turn, deletes the object it points to—the password instance—because it doesn’t think that any other shared_ptr is pointing to it.

Ultimately, then, when someone else with a shared_ptr to that password instance uses it to try and invoke get_value(), we get a segmentation fault, because the memory that once contained the member value_ has been freed.

The Solution

So, how can I get a correct shared_ptr to this? One solution is described in Boost Smart Pointer Programming Techniques, which has a section called, appropriately, Obtaining a shared_ptr to this.

After thinking about the problem for a bit longer, though, I chose a different, more straightforward approach: instead of trying to obtain a shared_ptr to this, I simply created a new Password with the same value as this:

std::tr1::shared_ptr<Password> this_copy = std::tr1::shared_ptr<Password>(new Password(get_value()));
std::tr1::shared_ptr<Password> start;
std::tr1::shared_ptr<Password> end;
if (comparison < 0) {
    start = this_copy;
    end = other;
} else {
    start = other;
    end = this_copy;
}

It might not be quite as memory efficient as the technique discussed in the Boost documentation, but, in my case, just using a copy of this is clean and simple. In addition, the use of a copy makes it easier to see that DistanceTo(…) really is const.

The End

This is a long post about what, ultimately, was a simple problem that resulted from my inexperience. I already feel a bit silly when I think back on the bug, since, in retrospect, the issue seems so obvious—why didn’t I think through the consequences of creating a new shared_ptr to this?—and my path to the solution so convoluted—why didn’t I just run Valgrind right away?

But three hours ago, I wasn’t sure what to try next, while, now, I can’t see how I could have ever been so lost. Some time soon, I’ll likely be struggling for hours before writing a post about that next simple problem. But the fight to a solution will have made me a better programmer, just as it has this time. That’s worth the effort.

Sunday, February 7, 2010
I&#8217;m selling my first-generation Amazon Kindle on Internet Garage Sale.

Back to the Land

“If you eat too much of this food, you become sick and also fatafat. And no amount of fatafat pills will help you.”

As seen in the 5th floor men&#8217;s bathroom in the Gates-Hillman Center.

As seen in the 5th floor men’s bathroom in the Gates-Hillman Center.

World Record BASE Jump: VC2 (via hagus and GreatDismal)