Ashley Hawkins

Arrays and Pointers in C (and C++)


Introduction to the problem

This intro is quite long, so if you don’t want to read it, just skip to the explanation.

I have seen a lot of people get caught out by the differences between arrays and pointers. Some misconceptions include the idea that arrays are pointers, that pointers are arrays, or that arrays “aren’t pointers but behave the same as pointers”.

While it is true that arrays and pointers are connected, it is important to recognise the underlying reasons why they are connected, as well as why they are absolutely not the same, in order to stop making mistakes that involve incorrect assumptions about pointers being “array-y” or arrays being “pointer-y”.

I believe part of the reason these misconceptions have come about is a terminology issue. In a lot of tutorials I’ve seen that try to teach C, a value of type T* is referred to as an array of T when the memory it points to holds a sequence of values of type T. This is wrong, as a T* is either a pointer to T, or a pointer to a member of an array of T. That is a bit wordy, and I can see why people would prefer to just call it an array, but that just ends up causing more issues in the long run. An int* pointing to a sequence of int values is not in and of itself an array, in the same way that an int* pointing to a single int value is not in and of itself an int.

The other problem that sort of cements this misconception is the subscript operator, and some of the other syntax in C that tries to hide the differences between arrays and pointers in some contexts. If I can index my pointer with the [] operator, and I can index an array with the [] operator, then it must be the same as an array, right?

Well no, that’s not right. If it was right I wouldn’t have been bothered enough to make this blog post. This brings us to the first proper “part” of this post.

Applicability to C versus C++

This post will use C for all of the examples, since I most often see the misconception in people who are just beginning to learn C. Most of what is mentioned here also technically applies to C++ (with exceptions noted in the footnotes), although C++ generally has different idioms from C, which often discourage using these fundamental data types in favour of things like std::array, std::vector, std::span, and std::mdspan. So in the sense that it exists in the C++ language, this stuff does apply to C++, but in the “will this ever be useful to me?” sense, it applies less to C++ than to C, though it’s still definitely useful to know.

Fundamental properties of arrays to keep in mind

First, let’s be clear on what an array is. According to section 6.2.5, paragraph 20 of the ISO C17 standard:

An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type. The element type shall be complete whenever the array type is specified. Array types are characterized by their element type and by the number of elements in the array. An array type is said to be derived from its element type, and if its element type is T, the array type is sometimes called “array of T”. The construction of an array type from an element type is called “array type derivation”.

The first part is the most important part here for our purposes: “An array type describes a contiguously allocated nonempty set of objects with a particular member object type”. The important thing to emphasise here is that the array is the set of objects; it’s not a “pointer to” a set of objects. An array object is in and of itself the entire set of objects it contains. And all of those objects are allocated contiguously, which means there are no gaps in memory between them: they are stored adjacently in memory. These properties of arrays are important to keep in mind for the remainder of the post.
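To make this concrete, here is a small sketch (the function name is just made up for illustration) that checks both properties on an ordinary int array: the array’s size is exactly the combined size of its elements, and each element starts exactly sizeof(int) bytes after the one before it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the byte distance between two
   neighbouring elements of an int array. */
size_t adjacent_element_distance(void)
{
    int arr[8] = { 0 };

    /* The array object *is* its 8 ints; no pointer is stored anywhere. */
    assert(sizeof arr == 8 * sizeof(int));

    /* Elements are contiguous: viewed as bytes, arr[1] starts
       right where arr[0] ends. */
    const char *first = (const char *)&arr[0];
    const char *second = (const char *)&arr[1];
    return (size_t)(second - first);
}
```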

Parts[0]: Subscript Operator and Pointer Arithmetic

To the surprise of many, the subscript operator in C is not an operator that works directly on arrays; instead, it deals with pointer arithmetic.1

Imagine that you have a pointer P that points to the first element of an array, and an index i. You can access the element at position i of that array with the following usage of the subscript operator:

P[i]

Now, what this is actually saying is, “take the pointer P, add i to it, and then dereference the resulting pointer to get the value at that address”. In other words, the above syntax is equivalent to:

*((P) + (i))
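As a quick sanity check, this sketch (with a hypothetical function name) confirms that both spellings refer to the very same object, not just to equal values:

```c
#include <assert.h>

/* Hypothetical demo: P[i] and *((P) + (i)) designate the same object. */
int subscript_equals_pointer_arithmetic(void)
{
    int values[5] = { 10, 20, 30, 40, 50 };
    int *P = values;   /* P points to the first element */
    int i = 3;

    /* &P[i] is exactly the pointer P + i ... */
    assert(&P[i] == (P) + (i));
    /* ... so reading through either spelling gives the same int. */
    assert(P[i] == *((P) + (i)));

    return P[i];
}
```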

Some people new to the language may make the mistake of assuming that adding 1 to a pointer will add 1 to its actual address value (i.e. offset it by a single byte). In reality, adding 1 to a pointer that points to T results in a pointer which points to the next T value along in the array. In other words, and to use a real type to make it easier to understand, adding 1 to a pointer of type int* shifts it along by one int, so that the pointer resulting from the addition will point to the int at an offset of 1 from the int that it originally pointed to (which is directly adjacent to it in memory).
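Here is a minimal sketch of that scaling, assuming nothing beyond standard C: converting both pointers to char * lets us measure the distance in bytes, and it comes out as sizeof(int), not 1. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical demo: how many bytes does an int * move when you add 1? */
ptrdiff_t bytes_moved_by_int_pointer_increment(void)
{
    int arr[2] = { 1, 2 };
    int *p = &arr[0];
    int *q = p + 1;          /* the next int, not the next byte */

    assert(q == &arr[1]);    /* q points at the adjacent int */
    assert(*q == 2);

    /* Measured in bytes, the two pointers are sizeof(int) apart. */
    return (const char *)q - (const char *)p;
}
```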

The equivalence of the two above code blocks is the reason that the subscript syntax of a pointer works the way it does. The pointer is telling you the base address, and the index (along with the pointer’s type) is saying how much to offset that base address by before dereferencing it to get the value at that position.

Bonus: The fact that P[i] is equivalent to *((P) + (i)) and the fact that the + operator is commutative means that the subscript operator is also commutative, meaning that P[i] and i[P] are equivalent. So yes, given that P points to the first element of an array for which i is a valid index, you can get the ith element of that array by doing i[P].2
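A tiny sketch showing that this really compiles and behaves identically (the function name is made up, and please don’t write i[P] in real code):

```c
#include <assert.h>

/* Hypothetical demo: P[i] and i[P] are both *((P) + (i)). */
int reversed_subscript(void)
{
    int arr[4] = { 5, 6, 7, 8 };
    int *P = arr;
    int i = 2;

    /* + doesn't care about operand order, so neither does []. */
    assert(P[i] == i[P]);
    assert(&P[i] == &i[P]);

    return i[P];   /* legal C, just very confusing to read */
}
```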

Parts[1]: So how do arrays fit into this?

So now you should have an understanding of what the subscript operator does under the hood, and the reason why it works to get the ith element of an array by using the subscript operator on a pointer that points to the element at the start of the array. So how do plain arrays fit in? This is where a behaviour called array to pointer decay comes into play.

In most circumstances, an array will decay into a pointer to its first element. That is to say, if I have an array arr, and I pass it to a function or use it as an operand of most operators, such as binary + in arr + 1, the array object will “decay” to a pointer to its first element. When arr decays in an expression like that, it is then a pointer to the start of that array (equivalent to &arr[0]) instead of representing the entire array. (In C++, unary plus, as in +arr, is a common way to force this decay, but in C the operand of unary + must have arithmetic type, so +arr is not valid C.)

This behaviour is what allows the subscript syntax to work on arrays. Since the array decays to a pointer in this expression, as it does in most expressions, the logic discussed in Parts[0] applies, as the value being operated on by the subscript operator is a pointer to the first element of the array.
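The sketch below (function names are hypothetical) shows the decay happening in a few everyday contexts: initialising a pointer, doing arithmetic, subscripting, and passing the array to a function.

```c
#include <assert.h>

/* Hypothetical helper taking a pointer; arrays passed here decay. */
static int first_element(const int *p)
{
    return p[0];
}

int decay_demo(void)
{
    int arr[3] = { 7, 8, 9 };

    int *p = arr;                  /* arr decays to &arr[0] */
    assert(p == &arr[0]);
    assert(arr + 1 == &arr[1]);    /* decay, then pointer arithmetic */
    assert(arr[2] == *(arr + 2));  /* subscripting uses the decayed pointer */

    return first_element(arr);     /* decays when passed to a function */
}
```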

The exceptions to this behaviour of decaying3 in C are as follows:

  • As an operand to the sizeof operator, e.g. sizeof arr
  • As an operand to the & (address of) operator, e.g. &arr
  • When the array is a string literal being used to initialise an array of char, e.g. char myStr[] = "Hello!!";

The first two cases will be discussed further in the next parts as they relate to the most common misunderstandings of decay.
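The third case doesn’t get its own part, so here is a short sketch of it. Initialising a char array from a string literal copies the characters (including the terminating '\0') into a real, modifiable array, whereas initialising a pointer just makes it point at the literal’s first character. The function name is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical demo: returns the size of a char array
   initialised from a string literal. */
size_t string_literal_array_size(void)
{
    /* No decay: the 7 characters plus '\0' are copied into myStr,
       which is a genuine array of 8 chars. */
    char myStr[] = "Hello!!";
    myStr[0] = 'J';                 /* fine: we own a modifiable copy */
    assert(strcmp(myStr, "Jello!!") == 0);

    /* By contrast, here the literal does decay, and ptr is just a pointer
       (and the literal itself must not be modified through it). */
    const char *ptr = "Hello!!";
    assert(sizeof ptr == sizeof(const char *));

    return sizeof myStr;
}
```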

Parts[2]: sizeof operator

The first case where an array doesn’t just decay into a pointer to its first element is when the array is used as the operand to the sizeof operator. The sizeof operator returns the size in bytes of its operand. Since an array is all of its elements, that means the size of the entire array, in bytes, which is the sum of the sizes of each of its constituent elements in bytes. So for an array of 8 ints, on a system where ints are 4 bytes, the size of that array would be 32 bytes. A common usage of sizeof with arrays is the “array length” macro, which you may have seen before and which generally looks something like this:

#define ARRLEN(arr) (sizeof (arr) / sizeof *(arr))

This is great, and it works as expected when you use it on an array.
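For example, with an 8-int array like the one above (the function name is hypothetical), sizeof arr covers the whole array, sizeof *arr covers one element, and the macro divides one by the other:

```c
#include <assert.h>
#include <stddef.h>

#define ARRLEN(arr) (sizeof (arr) / sizeof *(arr))

/* Hypothetical demo: ARRLEN applied to a real array, not a pointer. */
size_t arrlen_on_real_array(void)
{
    int arr[8] = { 0 };

    /* sizeof arr: no decay, so this is the whole array in bytes. */
    assert(sizeof arr == 8 * sizeof(int));
    /* sizeof *arr: the size of a single element. */
    assert(sizeof *arr == sizeof(int));

    return ARRLEN(arr);
}
```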

The place where people start getting into misconceptions is when you try and pass an array as a parameter to a function and then take its length:

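void foo(int arr[])
{
    // Wrong: arr is really an int *, so this computes
    // sizeof(int *) / sizeof(int), not the number of elements.
    size_t arr_length = ARRLEN(arr);
    // ...
}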

This is not going to work as one might expect, because when passed to a function, an array decays to a pointer, and writing int arr[] in a parameter list doesn’t create some sort of “array parameter”: it’s equivalent to int * arr. The pointer’s type has no length information; it’s just a pointer to a single object. Applying sizeof to arr just gives you the size of the pointer itself, not the size of all the elements of the array that the pointed-to object belongs to. In order to pass around the size of an array, it needs to be passed explicitly as a parameter to the function, and the caller bears the responsibility of providing correct length information to the receiving function.

void foo(size_t arr_length, int arr[])
{
    // ...
}
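A minimal sketch of that convention, using a hypothetical sum function in place of foo: the length is computed with ARRLEN at the call site, where values is still a real array, and handed over alongside the pointer.

```c
#include <assert.h>
#include <stddef.h>

#define ARRLEN(arr) (sizeof (arr) / sizeof *(arr))

/* Hypothetical function: arr is really an int *, so the caller
   must say how many elements it points to. */
long sum(size_t arr_length, int arr[])
{
    long total = 0;
    for (size_t i = 0; i < arr_length; ++i)
        total += arr[i];
    return total;
}

long sum_of_local_array(void)
{
    int values[] = { 1, 2, 3, 4, 5 };
    /* ARRLEN is applied here, before values decays in the call. */
    return sum(ARRLEN(values), values);
}
```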

Parts[3]: address of operator and pointers to arrays

When using the & (address of) operator on an array, the value returned is a pointer to that array, which has a type in the form T (*)[N] where T is the element type, and N is the length of the array. It may seem obvious to some that taking the address of an array gives you a pointer to an array, but others make the mistake of thinking that they will actually get a pointer to a pointer, as they either think that the array is a pointer, or they think that it would decay to a pointer before the & operator operates on it.4 This is why it is important to note that the decay does not occur when the array is the operand to the address of operator.

This resulting pointer has the length information encoded into its type: it points to the entire array, and if you added 1 to it, it would shift the pointer along by the size of the entire array, so it would point to the next array (but of course, when you’re taking the address of a single array, there is no “next array”). That is where we get to the concept of an array of arrays, where there can be a “next array” to make this pointer-to-array pointer arithmetic useful.

int my_array[] = {1, 2, 3, 4};
// Points to the whole array, the length of the
// array is encoded as part of the pointer's type.
int (*my_array_ptr)[ARRLEN(my_array)] = &my_array;
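To see the length information at work, here is a sketch built on the snippet above (the function name is hypothetical). Adding 1 to my_array_ptr moves it by the size of the whole array; computing that one-past-the-end pointer is fine as long as it’s never dereferenced.

```c
#include <assert.h>
#include <stddef.h>

#define ARRLEN(arr) (sizeof (arr) / sizeof *(arr))

/* Hypothetical demo of a pointer to a whole array. */
int pointer_to_array_demo(void)
{
    int my_array[] = { 1, 2, 3, 4 };
    int (*my_array_ptr)[ARRLEN(my_array)] = &my_array;

    /* Dereferencing gives back the array itself, all 4 ints of it. */
    assert(sizeof *my_array_ptr == sizeof my_array);

    /* Same starting address as the first element, different type. */
    assert((void *)my_array_ptr == (void *)&my_array[0]);

    /* + 1 skips the entire array (4 ints' worth of bytes). */
    const char *start = (const char *)my_array_ptr;
    const char *next = (const char *)(my_array_ptr + 1);
    assert((size_t)(next - start) == sizeof my_array);

    return (*my_array_ptr)[2];   /* index through the pointer */
}
```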

Parts[4]: Array of arrays (aka multi-dimensional array)

Alternatively called a multi-dimensional array, an array of arrays is an array whose element type is another array type. For example, a two-dimensional array:

int arr[2][4];

What this says is “Declare arr as an array with a length of two, whose element type is an array of 4 ints”. Again, some people make the mistake of thinking that this array would decay into a “pointer to pointer to int”, but this is wrong. The array arr decays into a pointer to its first element, and its first element is an array of 4 ints, so the result is a “pointer to array of 4 ints” (int (*)[4]). That pointer doesn’t decay any further, because only an actual array decays, and a pointer to an array is not an array.

So if you can pass a 1D array to a function by passing a pointer to the element type, how do you do this with a 2D array? Well, that’s simple: by passing a pointer to the element type. But that is not going to be a pointer to a pointer. The element type in this case is an array of 4 ints, so you would pass a pointer to an array of 4 ints.

void foo(size_t arr_length, int (*arr)[4])
{
    // do something with arr
}

This way, you can take as a parameter any array whose element type is array of 4 ints. But you can go a step further to avoid hard-coding the length of the inner array. In C there are types called variably modified types5, one example of which is a pointer to a variable length array (VLA). This means that a runtime value, such as a function parameter, can be part of the type of the pointer, so you can make a function that takes a pointer to a VLA whose length is based on the argument that was passed in6:

#include <stdio.h>

void foo(size_t x, size_t y, int (*arr)[y])
{
    for (size_t i = 0; i < x; ++i)
    {
        for (size_t j = 0; j < y; ++j)
        {
            printf("%2d ", arr[i][j]);
        }
        putc('\n', stdout);
    }
}

// Using the ARRLEN macro shown earlier, call it from inside a function as:
int myarr[8][4] = { 0 };

foo(ARRLEN(myarr), ARRLEN(*myarr), myarr);

View full code on Compiler Explorer

You can also write the pointer-to-array parameter using array syntax, as below, but only in function parameters, just as int arr[] means int *arr only in function parameters:

void foo(size_t x, size_t y, int arr[][y])
{
    // ...
}

So what is actually happening if I do myarr[i][j] on the array shown above, which was declared as int myarr[8][4] = { 0 };? Well, myarr is an array, so it will first decay into a pointer to its first element, and the type of that pointer is int (*)[4] (“pointer to array of 4 ints”). Next, it is used in the [] operator along with i, so we get a pointer that is i “arrays of 4 ints” along from the start, which is then dereferenced, so we now have the inner array itself (which is of type int[4]). This inner array is then used with the second subscript operator, so this array also decays and we are left with a pointer to its first element, which is just a pointer to an int. That pointer is then used in the subscript operator to get the int at offset j from the start of that inner array.
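The same steps can be spelled out with a named intermediate for each one, so that the type at every stage is visible. This is a sketch with a hypothetical function name, and the caller is assumed to pass i < 8 and j < 4. Trying to store myarr in an int ** instead of an int (*)[4] would be diagnosed by the compiler, since the types are incompatible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical demo: myarr[i][j] broken into its individual steps.
   Assumes i < 8 and j < 4. */
int subscript_steps(size_t i, size_t j)
{
    int myarr[8][4] = { 0 };
    myarr[i][j] = 42;

    int (*row_ptr)[4] = myarr;        /* myarr decays to int (*)[4] */
    int (*ith_row)[4] = row_ptr + i;  /* i "arrays of 4 ints" along */
    int *elem_base = *ith_row;        /* inner int[4] decays to int * */
    int *elem = elem_base + j;        /* j ints along the inner array */

    assert(elem == &myarr[i][j]);
    assert(*(*(myarr + i) + j) == myarr[i][j]);
    return *elem;
}
```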

It is important to recognise that just because you can index some object by [i][j] does not mean it is an array of arrays. It could also be an array of pointers, which is a type that is laid out completely differently in memory and has a completely incompatible type (you can’t pass an array of arrays into a function that expects an array of pointers, and vice versa), and just happens to also work with the same indexing syntax. Arrays of pointers are discussed further in the next part.

Bonus: This isn’t really another bonus, because it’s basically the same thing as before, but it’s an expansion of the previous one that looks even more goofy. Because of the commutative nature of the subscript operator, myarr[i][j] is equivalent to j[i[myarr]]. This is basically like rewriting *(*(myarr + i) + j) as *(j + *(i + myarr)).
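And a sketch showing that the goofy version really does compile and agree (hypothetical function name; again, not something to write in real code):

```c
#include <assert.h>

/* Hypothetical demo: myarr[i][j] == j[i[myarr]]. */
int goofy_indexing(void)
{
    int myarr[8][4] = { 0 };
    int i = 6, j = 1;
    myarr[i][j] = 99;

    /* i[myarr] is *(i + myarr), i.e. the inner int[4] myarr[i];
       j[...] then indexes into that inner array. */
    assert(j[i[myarr]] == myarr[i][j]);
    assert(*(j + *(i + myarr)) == myarr[i][j]);
    return j[i[myarr]];
}
```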

Parts[5]: Array of pointers that point to the first element of another array

The other type of “array of arrays” people may sometimes refer to is an array of pointers which point to the first element of other arrays (typically passed around as a T**, with T being the “element type”). This is sometimes known as a jagged array, as the inner pointers may point to arrays of completely different lengths, so that if you displayed the array, it would appear jagged. Of course, if we just use a “pointer to pointer” on its own, we don’t have any size information, so we also need some separate way to store the lengths of the pointed-to arrays if they actually have different lengths. In a lot of use cases, people don’t need different lengths for the pointed-to arrays, so they only need to store a single value for the inner length.
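Here is a minimal sketch of a jagged array with separately stored lengths. The helper sum_jagged and the row contents are made up for illustration; the important part is that rows holds pointers to three independent arrays of different lengths.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums a jagged array, given its row pointers
   and a separate array of row lengths. */
long sum_jagged(size_t rows, int **data, const size_t lengths[])
{
    long total = 0;
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < lengths[r]; ++c)
            total += data[r][c];   /* data[r] is an int *, then [c] */
    return total;
}

long jagged_demo(void)
{
    int row0[] = { 1 };
    int row1[] = { 2, 3, 4 };
    int row2[] = { 5, 6 };

    /* An array of pointers: each element points to the first
       element of a different, separate array. */
    int *rows[] = { row0, row1, row2 };
    size_t lengths[] = { 1, 3, 2 };      /* lengths stored separately */

    return sum_jagged(3, rows, lengths); /* rows decays to int ** */
}
```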

These “jagged arrays” are completely different to “arrays of arrays”, despite the fact that they can be accessed with the same subscript syntax. They are not compatible with one another: they have very different memory layouts and are not interchangeable. In an actual array of arrays, all of the inner arrays are allocated contiguously in memory, and no pointers are stored anywhere. For an example of this incompatibility, if a function expects a 2D jagged array (T**), you can’t pass in a 2D array of arrays (which decays to T (*)[N]).
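If you have an array of arrays but need to call a function that takes a T**, one common workaround is to build a separate array of row pointers and pass that instead. A sketch, where sum_rows stands in for a hypothetical function you can’t change:

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 3
#define COLS 4

/* Hypothetical function that only accepts a "jagged" int **. */
static long sum_rows(size_t rows, size_t cols, int **data)
{
    long total = 0;
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            total += data[r][c];
    return total;
}

long adapt_array_of_arrays(void)
{
    int grid[ROWS][COLS];
    for (size_t r = 0; r < ROWS; ++r)
        for (size_t c = 0; c < COLS; ++c)
            grid[r][c] = (int)(r * COLS + c);   /* 0, 1, ..., 11 */

    /* Passing grid directly would be wrong: it decays to int (*)[COLS],
       not int **, and grid contains no pointers for sum_rows to follow.
       Instead, build an array of pointers to the start of each row. */
    int *row_ptrs[ROWS];
    for (size_t r = 0; r < ROWS; ++r)
        row_ptrs[r] = grid[r];   /* grid[r] decays to &grid[r][0] */

    return sum_rows(ROWS, COLS, row_ptrs);
}
```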


  1. The subscript operator can be overloaded in C++, so this only applies to its usage with both operands being fundamental types.
  2. Again, in C++ this only applies when both operands are fundamental types.
  3. In C++ there are more exceptions to decaying of arrays, such as during template argument deduction, or taking an array by reference. C++ is more complex than C in this regard, so the full exhaustive list is only mentioned for C.
  4. This misconception also shows a misunderstanding of the decay when it does occur, in the sense that the result of the decay is a temporary value (not an lvalue in C, a prvalue in C++), and it’s not possible to take the address of such a value. So for example, &(arr + 0) would not give you a pointer to a pointer, because it would just not be valid at all.
  5. Variably modified types were introduced in C99, made optional in C11, and made mandatory again in C23 (but support for declaring actual VLAs with automatic storage duration is still optional). MSVC notably doesn’t support VM types.
  6. VLAs do not exist in C++, and neither do pointers to VLAs, so this would not work. You would need to either hard-code the length, use a compiler which supports VLAs as an extension, or pass in a flat array and do the maths yourself. std::mdspan, standardised in C++23, is a modern alternative to this, and std::mdarray has been proposed as an owning counterpart.

Summary

So to summarise the key points:

  • Arrays are not pointers, they are a contiguously allocated set of elements of a certain type. The array is all of its elements, it’s not a pointer to them.
  • The subscript operator actually works on pointers, not arrays. It offsets the address by the index (scaled by the size of the pointed-to type) and then dereferences the resulting address to get the element at that index, which works because all of the elements of an array are allocated contiguously.
  • Arrays decay to pointers to their first element in most circumstances, allowing the subscript operator to work on arrays. Some operators acting on arrays, namely sizeof and the address of (&) operator, prevent this decay.
  • Arrays of arrays and arrays of pointers are different, even though the same subscript syntax works with both.

This post is a work in progress that will be updated over time. If you notice a mistake or find that any part of this post is misleading, missing context, or difficult to understand, or think some part could do with an additional code example, or think of any other way to improve it, please email me or write a comment so it can be improved. Pedants are welcome to grill me.
