Wednesday, November 4, 2009

Objects, Part 2: Inheritance and Name Mangling

As was covered in the previous post, we are dealing with how C++ classes would compile down to equivalent C code. Consider the following classes:
// C++
class Parent
{
public:
    int getParentMember();
    void setParentMember(int);
protected:
    int parentMember;
};

class Child: public Parent
{
public:
    int getChildMember();
    void setChildMember(int);
private:
    int childMember;
};

int Parent::getParentMember()
{
    return this->parentMember;
}

void Parent::setParentMember(int p)
{
    this->parentMember = p;
}

int Child::getChildMember()
{
    return this->childMember;
}

void Child::setChildMember(int c)
{
    this->childMember = c;
}

The Parent class' C equivalent is straightforward. Parent is really no different from any of the classes we've dealt with so far:
// C
struct Parent
{
    int (*getParentMember)(struct Parent*);
    void (*setParentMember)(struct Parent*, int);
    int parentMember;
};

// Declared up front so the constructor can take their addresses.
int getParentMember(struct Parent* this);
void setParentMember(struct Parent* this, int p);

// Parent::Parent()
void constructor(struct Parent* this)
{
    this->getParentMember = getParentMember;
    this->setParentMember = setParentMember;
}

// Parent::~Parent()
void destructor(struct Parent* this)
{
}

// int Parent::getParentMember()
int getParentMember(struct Parent* this)
{
    return this->parentMember;
}

// void Parent::setParentMember(int)
void setParentMember(struct Parent* this, int p)
{
    this->parentMember = p;
}
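
The natural next question is what to do with Child. One common way to model single inheritance in C, offered here as an assumption rather than as the translation this series settles on, is to embed a struct Parent as the first member of struct Child. The class-name prefixes on the function names, the field name base, and the inheritedSum driver are all made up for this sketch, and the function pointers are left out to keep it short.

```c
#include <assert.h>

struct Parent
{
    int parentMember;
};

struct Child
{
    struct Parent base; /* must come first so the casts below are valid */
    int childMember;
};

int Parent_getParentMember(struct Parent* this) { return this->parentMember; }
void Parent_setParentMember(struct Parent* this, int p) { this->parentMember = p; }
int Child_getChildMember(struct Child* this) { return this->childMember; }
void Child_setChildMember(struct Child* this, int c) { this->childMember = c; }

/* Hypothetical driver: calls an inherited method on a Child. */
int inheritedSum(int p, int c)
{
    struct Child child;

    /* A pointer to a struct, suitably converted, points to its first member,
       so a Child can be handed to any function that expects a Parent. */
    Parent_setParentMember((struct Parent*)&child, p);
    Child_setChildMember(&child, c);

    /* Taking the address of the embedded member works without a cast. */
    assert(Parent_getParentMember(&child.base) == p);

    return Parent_getParentMember((struct Parent*)&child)
         + Child_getChildMember(&child);
}
```

Because base sits at offset zero, the Parent functions never need to know that a Child exists.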

Tuesday, November 3, 2009

Objects, Part 1: Members, Methods, and Constructors/Destructors

Many programmers today take object-oriented programming for granted. In fact, at my university, I never once wrote a program in a programming language that didn't support objects (okay, MIPS Assembly, but does that really count as a programming language?). However, there once was a time when objects didn't exist, and people still did most of the same things that people do with object orientation today. In fact, early C++ compilers, which included object orientation, compiled to C. So how did they do it?

The most basic part of an object is its members. Let's take a look.
// C++
class Object
{
    int member;
};

// C
struct Object
{
    int member;
};

Well, that was pretty straightforward. Boring, even. But wait, what about public and private?
// C++
class Object
{
public:
    int publicMember;
private:
    int privateMember;
};

// C
struct Object
{
    int publicMember, privateMember;
};

You may be thinking, "This fellow is an idiot, can't he see that he has treated publicMember and privateMember as if they were exactly the same?" The answer is, "Yes." They are treated exactly the same, at least in C++. Public and private are enforced only by the compiler, at compile time. Once compilation is done, private members are no different from public ones. This is why public and private aren't to be used for security. The distinction is only there to keep the programmer from making mistakes. The protected keyword and friend declarations are likewise irrelevant after compilation.

I will add that some languages such as Java do enforce a distinction between access levels at runtime. However, this is still only for preventing programmer mistakes, and not for security. In fact, one can even bypass the restrictions with reflection.
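
To make the point concrete, here is a minimal C sketch. The function name pokePrivate is made up for illustration. Once the class has become a plain struct, nothing in the language stops outside code from touching the field that used to be private.

```c
#include <assert.h>

/* The C translation of the class: both fields are ordinary struct members. */
struct Object
{
    int publicMember, privateMember;
};

/* Hypothetical outside code, not part of the "class". In C++ the compiler
   would reject the second assignment; in C nothing is left to enforce it. */
int pokePrivate(void)
{
    struct Object o;

    o.publicMember = 1;
    o.privateMember = 2; /* compiles fine: "private" no longer exists */

    assert(o.publicMember == 1);
    return o.privateMember;
}
```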

Next we look at methods. Consider the following class:
// C++
class Object
{
public:
    int getMember();
    void setMember(int);
private:
    int member;
};

int Object::getMember()
{
    return this->member;
}

void Object::setMember(int m)
{
    this->member = m;
}

We can easily change it to a struct and change the methods to function pointers...
// C
struct Object
{
    int (*getMember)();
    void (*setMember)(int);
    int member;
};

int getMember()
{
    return this->member;
}

void setMember(int m)
{
    this->member = m;
}

...but there is a serious issue with this C code. The identifier this isn't a keyword in C, and it's never declared, either in the global scope or in the functions. We can get a clue about where it comes from by looking at Python:
# Python
# I know "self" is the idiomatic name in Python,
# but it's legal to name it whatever you want
# and it makes my example clearer.
class Object:
    def getMember(this):
        return this.foo
    def setMember(this, f):
        this.foo = f

Python is the only object oriented language I know of which requires the programmer to explicitly declare the this (or self) argument to a method. However, under the covers, all object oriented languages work this way. They just abstract it away from the user as a form of syntactic sugar. This allows us to complete our C code:
// C
struct Object
{
    int (*getMember)(struct Object*);
    void (*setMember)(struct Object*, int);
    int member;
};

int getMember(struct Object* this)
{
    return this->member;
}

void setMember(struct Object* this, int m)
{
    this->member = m;
}

There are just a few more pieces of the puzzle that we need to make our class usable: a default constructor and a destructor.
// C
void constructor(struct Object* this)
{
    this->getMember = getMember;
    this->setMember = setMember;
}

void destructor(struct Object* this)
{
}

Note that so far, our destructor doesn't do anything.

Now we can translate some basic usages of our class:
// C++
{
    Object stackAllocated;
    Object* heapAllocated = new Object();

    stackAllocated.setMember(0);
    stackAllocated.getMember();

    heapAllocated->setMember(1);
    heapAllocated->getMember();

    delete heapAllocated;
}

// C
{
    struct Object stackAllocated;
    constructor(&stackAllocated);
    struct Object* heapAllocated;
    heapAllocated = malloc(sizeof(struct Object));
    constructor(heapAllocated);

    setMember(&stackAllocated, 0);
    getMember(&stackAllocated);

    setMember(heapAllocated, 1);
    getMember(heapAllocated);

    destructor(heapAllocated);
    free(heapAllocated);
    destructor(&stackAllocated);
}
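
For anyone who wants to compile this, here is the same translation assembled into one self-contained sketch. The runExample wrapper, the NULL check, and the return value are my additions for illustration. The function pointer types here spell out the this parameter, and the calls go through the pointers stored in the struct instead of naming the functions directly, which is closer to what the method-call syntax suggests.

```c
#include <assert.h>
#include <stdlib.h>

struct Object
{
    int (*getMember)(struct Object*);
    void (*setMember)(struct Object*, int);
    int member;
};

int getMember(struct Object* this) { return this->member; }
void setMember(struct Object* this, int m) { this->member = m; }

void constructor(struct Object* this)
{
    this->getMember = getMember;
    this->setMember = setMember;
}

void destructor(struct Object* this)
{
    (void)this; /* nothing to clean up yet */
}

/* Hypothetical driver; returns the sum of the two members (0 + 1). */
int runExample(void)
{
    struct Object stackAllocated;
    struct Object* heapAllocated;
    int sum;

    constructor(&stackAllocated);
    heapAllocated = malloc(sizeof(struct Object));
    if (heapAllocated == NULL) {
        /* A plain C++ new would throw std::bad_alloc here instead. */
        destructor(&stackAllocated);
        return -1;
    }
    constructor(heapAllocated);

    /* Calling through the pointers stored in the struct. */
    stackAllocated.setMember(&stackAllocated, 0);
    heapAllocated->setMember(heapAllocated, 1);
    sum = stackAllocated.getMember(&stackAllocated)
        + heapAllocated->getMember(heapAllocated);
    assert(sum == 1);

    destructor(heapAllocated);
    free(heapAllocated);
    destructor(&stackAllocated);
    return sum;
}
```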

Notes:
  • You can see now why new and delete are keywords rather than functions like malloc and free; what they do is a bit more complicated than a simple function call. As the C translation shows, new allocates the memory and then runs the constructor on it, while delete runs the destructor and then frees the memory.

  • Note that the constructor initializes the function pointers (methods), but not the member variables. A class's methods never change in C++, so wiring them up can be hidden from the programmer entirely. The member variables hold whatever the programmer chooses to put in them, and therefore aren't initialized (at least, this is the behavior on some older C++ compilers).

  • Constructors and destructors in C++ are named like Object::Object and Object::~Object. Taken separately, the obvious way to translate each of them to C would be to name the function Object (the ~ can't appear in a C identifier), but then both functions end up with the same name, which is a blatant collision. It is more confusing still when you consider that the struct is also named Object. I handled this in my example by naming the functions constructor and destructor, but this is just a temporary solution, since it will cause problems as soon as we define even one more class. We'll see how C++ handles this problem in the next post.
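
The first and last notes can be sketched together. The names below (Object_constructor, Object_destructor, Object_new, Object_delete, roundTrip) are a hypothetical prefixing convention of my own, not the scheme real C++ compilers use, and the function pointers are dropped to keep the example short. The two helpers show roughly what new and delete bundle into one step.

```c
#include <assert.h>
#include <stdlib.h>

struct Object
{
    int member;
};

/* Hypothetical naming scheme: prefix each function with its class name
   so a second class can have its own constructor and destructor. */
void Object_constructor(struct Object* this) { (void)this; }
void Object_destructor(struct Object* this) { (void)this; }

/* Roughly what `new Object()` does: allocate, then construct. */
struct Object* Object_new(void)
{
    struct Object* obj = malloc(sizeof(struct Object));
    if (obj != NULL)
        Object_constructor(obj);
    return obj;
}

/* Roughly what `delete obj` does: destruct, then free.
   Like delete, it does nothing when handed a null pointer. */
void Object_delete(struct Object* obj)
{
    if (obj != NULL) {
        Object_destructor(obj);
        free(obj);
    }
}

/* Hypothetical driver: stores a value in a heap object and reads it back. */
int roundTrip(int value)
{
    struct Object* obj = Object_new();
    int result;

    assert(obj != NULL);
    obj->member = value;
    result = obj->member;
    Object_delete(obj);
    return result;
}
```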

Next Post: Objects, Part 2: Inheritance and Name Mangling