We all know how important pointers are in C++. A pointer is basically an object that contains the address of a variable. We can use this address to access the value stored in that variable, which is located somewhere in memory. It’s like how we use an address to locate someone in real life! An interesting thing to note about pointers is that although they contain addresses, they don’t really care about what they are actually pointing to. It’s like looking at an address in real life and not knowing if it’s a house or a clothing store. Now you might ask, why is such a thing relevant in programming? Can we actually take advantage of this?
Why do we need it?
Let’s say you have a pointer pointing to a float value. Now, you want to extract an int out of it without actually rounding it off. How do we do it? Using pointer casting, a pointer to one type of value can be converted to a pointer to a different type without modifying anything in memory. But the problem here is that the result may be undefined. This happens because different types of variables have different sizes, and they are aligned differently in memory. If you don’t totally get it as of now, it will become clear soon. A pointer to an object can be converted to a pointer to a different kind of object whose type requires less or equally strict storage alignment. If we don’t follow this rule, then converting the pointer back may not give us the original pointer.
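As a minimal sketch of the float-to-int idea above: the helper name `float_bits` is hypothetical, and the code assumes a 32-bit IEEE 754 float. It copies the bytes with `std::memcpy` instead of dereferencing a cast pointer, because the cast version is exactly the undefined case just described.

```cpp
#include <cstdint>
#include <cstring>

// Assumption: float is 32 bits wide (true on most current platforms).
static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");

// Hypothetical helper: returns the raw bit pattern of f, with no rounding.
std::uint32_t float_bits(float f) {
    std::uint32_t bits = 0;
    // Copy the object representation rather than reading through a cast pointer.
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On an IEEE 754 platform, `float_bits(1.0f)` gives `0x3F800000`, which is the bit pattern of the float and not the rounded value 1.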
To give a rudimentary analogy, let’s say you have two bottles with capacities of 8 oz and 11 oz. Let’s say you fill up the 8 oz bottle with water. Now if you transfer all the contents of the 8 oz bottle into the 11 oz bottle, you won’t lose anything. But if you completely fill the 11 oz bottle and start pouring water into the other bottle until it’s empty, you will end up losing 3 oz of water and you won’t get it back.
A pointer to void can be converted to and from a pointer to any object type, without restriction or loss of information. A pointer to void is basically an empty bottle. If the result is converted back to the original type, the original pointer is recovered. If a pointer is converted to another pointer with the same type but having different or additional qualifiers, the new pointer is the same as the old except for restrictions imposed by the new qualifier.
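The void round trip and the qualifier rule can be sketched as follows. Both helper names are made up for illustration.

```cpp
// Hypothetical helper: sends a pointer through void* and back again.
int* round_trip_through_void(int* p) {
    void* erased = p;                  // any object pointer converts to void* implicitly
    return static_cast<int*>(erased);  // C++ needs an explicit cast on the way back
}

// Hypothetical helper: adds a const qualifier. The address is unchanged,
// but the result can no longer be used to modify the int.
const int* add_const(int* p) {
    return p;
}
```

In both cases the returned pointer compares equal to the one passed in; only the type the compiler attaches to it differs.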
What exactly is it?
Now that we have sufficiently talked about water bottles, let’s see what pointer casting actually means in the context of programming. A pointer is an arrow that points to an address in memory, with a label indicating the type of the value. The label is equivalent to specifying if it’s a house or a clothing store. The address indicates where to look and the type indicates what to take. Casting the pointer changes the label on the arrow but not where the arrow points. This means that even if we change the label from a house to a clothing store, it will not actually change the reality of the thing at that address.
Let’s consider the following diagram. Here, ‘p’ is a pointer to ‘c’ which is of type ‘char’. A char is basically one byte of memory. So when ‘p’ is dereferenced, you get the value in that one byte of memory. Dereferencing refers to the act of looking up the address in the pointer and seeing what’s inside.
-|-----|-----|-----|-----|-----|-----|-
 |     |  c  |     |     |     |     |
-|-----|-----|-----|-----|-----|-----|-
        ^^^^^
        p (char)
When you cast p to int*, you’re implying that ‘p’ points to an int value now. On most systems today, an int occupies 4 bytes.
-|-----|-----|-----|-----|-----|-----|-
 |     |  c    x1    x2    x3  |     |
-|-----|-----|-----|-----|-----|-----|-
        ^^^^^^^^^^^^^^^^^^^^^^^
        (int*)p
Now, when we dereference (int*)p, we get a value that is determined from these four bytes of memory. Now you might ask, we don’t even know what is inside those remaining three bytes, so what value am I going to get? Well, we’ll never know! The value you get depends on what is in the cells marked ‘x’, and on how an int is represented in memory. It’s something that’s not under our control.
I don’t get it. If it’s not under our control, then why would we ever do it?
Well, it will become clear soon. Computers usually have two ways to interpret bytes in memory: little-endian and big-endian. If it’s little-endian, it means that the value of an int is calculated this way: val = c * 2^0 + x1 * 2^8 + x2 * 2^16 + x3 * 2^24. If we actually look at this value, we will see that this is usually garbage. If it’s big-endian, then the bytes are arranged in the other direction: val = c * 2^24 + x1 * 2^16 + x2 * 2^8 + x3 * 2^0.
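The two formulas above amount to shifting each byte by 0, 8, 16 and 24 bits. Here is a small sketch with hypothetical helper names, assuming 8-bit bytes, where b[0] plays the role of c and b[1] to b[3] play the roles of x1 to x3.

```cpp
#include <cstdint>

// Hypothetical helper: the first byte is the least significant one.
std::uint32_t from_little_endian(const unsigned char b[4]) {
    return  static_cast<std::uint32_t>(b[0])
         | (static_cast<std::uint32_t>(b[1]) << 8)
         | (static_cast<std::uint32_t>(b[2]) << 16)
         | (static_cast<std::uint32_t>(b[3]) << 24);
}

// Hypothetical helper: the first byte is the most significant one.
std::uint32_t from_big_endian(const unsigned char b[4]) {
    return (static_cast<std::uint32_t>(b[0]) << 24)
         | (static_cast<std::uint32_t>(b[1]) << 16)
         | (static_cast<std::uint32_t>(b[2]) << 8)
         |  static_cast<std::uint32_t>(b[3]);
}
```

For the bytes 0x17, 0x00, 0x00, 0x00 the little-endian reading is 23, while the big-endian reading is 0x17000000. Same memory, very different values.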
Depending on your compiler and the operating system, you may find that the value is different every time you run the program. It may also happen that it’s always the same but changes when you make even minor tweaks to the source code. On some systems, an int value must be stored in an address that’s a multiple of 4 (or 2, or 8). This is called an alignment requirement. Depending on whether the address of ‘c’ happens to be properly aligned or not, the program may crash. As we all know, dealing with crashes is every programmer’s favorite thing!
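To make the alignment requirement concrete, here is a rough sketch of a check you could run before treating an address as an int. The helper name is hypothetical, and the code assumes that converting a pointer to `std::uintptr_t` yields its numeric address, which holds on mainstream platforms but is implementation-defined.

```cpp
#include <cstdint>

// Hypothetical helper: true if p is a multiple of int's alignment requirement.
// Assumption: the pointer-to-integer conversion gives the numeric address.
bool is_aligned_for_int(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(int) == 0;
}
```

The address of a real int always passes this check. The address of an arbitrary char, like ‘c’ in the diagram, may or may not.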
Here’s what happens when you have an int value and take a pointer to it.
int x = 23;
int *p = &x;
-|-----|-----|-----|-----|-----|-----|-
 |     |           x           |     |
-|-----|-----|-----|-----|-----|-----|-
        ^^^^^^^^^^^^^^^^^^^^^^^
        p (int pointer)
The pointer p points to an int value, and there are no surprises when dereferencing it.
Can I always cast pointers?
Casting pointers and then using the result is usually invalid in C. The people who build and maintain the C language thought that they would do just fine without all this drama, so they decided to leave it undefined, and they had good reasons too. Due to alignment considerations, it’s possible that the destination pointer type is not able to represent the value of the source pointer type. For example, if ‘int *’ were inherently 4-byte aligned, casting ‘char *’ to ‘int *’ would lose the lower bits. Also, in general, it’s forbidden to access an object except via an lvalue of the correct type for the object. There are some exceptions, but unless you understand them very well, you don’t want to rely on them.
One thing to note is that aliasing is only a problem if you actually dereference the pointer, i.e. apply the * or -> operators to it, or pass it to a function that will dereference it. Wait a minute, what is aliasing here? Well, aliasing refers to the situation where the same memory location can be accessed by using different names.
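If all you need is the int stored at some byte address, one way to sidestep both the alignment and the aliasing problem is to copy the bytes out instead of casting. This is a swapped-in technique, not pointer casting, and the helper name is hypothetical.

```cpp
#include <cstring>

// Hypothetical helper: reads an int stored at any byte address,
// aligned or not.
int load_int(const unsigned char* src) {
    int value = 0;
    // No int* is ever formed from src, so nothing is dereferenced
    // through a pointer of the wrong type.
    std::memcpy(&value, src, sizeof value);
    return value;
}
```

The caller still has to guarantee that at least sizeof(int) bytes are readable at src.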
When is it actually useful?
Pointer casting is usually used when you want to access the bytes pointed to by one type of pointer and read them as something else. The requirements have to be specific enough, and you have to be really careful when doing this. There are two main cases where it’s okay.
The first is when the destination pointer type points to a character type. Let’s say that you are receiving an input stream containing integers, and you want to do some sort of operation on every byte. In this situation, you can use pointer casting and directly start reading bytes as opposed to dealing with type conversions and overheads. Pointers to character types are guaranteed to be able to represent any pointer to any object type, and successfully round-trip it back to the original type if desired. A pointer to void (void *) is exactly the same as a pointer to a character type except that you’re not allowed to dereference it or do arithmetic on it. In C, it automatically converts to and from other pointer types without needing a cast (C++ requires an explicit cast when converting from void *), so pointers to void are usually preferable over pointers to character types for this purpose.
The second is when the destination pointer type is a pointer to a structure type whose members exactly match the initial members of the originally-pointed-to structure type. This is useful for various object-oriented programming techniques in C++.
Needless to say, you should use this feature with caution, and stay away from it as much as possible.
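Here is a minimal sketch of the byte-by-byte case, assuming the integers arrive as an array of 32-bit values. The helper name and the choice of "add up every byte" as the operation are illustrative only.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: adds up every byte of an array of 32-bit integers.
std::uint64_t sum_of_bytes(const std::uint32_t* values, std::size_t count) {
    // Casting to a character type is the permitted case: any object
    // may be read through an unsigned char pointer.
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(values);
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < count * sizeof(std::uint32_t); ++i) {
        total += bytes[i];
    }
    return total;
}
```

A sum was chosen because it does not depend on byte order. An operation that cares about which byte comes first would have to take endianness into account.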