14

I have come across some weird behaviour in C programming. Consider this code:

#include <stdio.h>

int main(void){
  int array1[6] = {0, 1, 2, 3, 4, 5};
  int array2[6] = {6, 7, 8, 9, 10, 11};

  printf("%d\n", array1[-1]);
  return 0;
}

When I compile and run this, I don't get any errors or warnings. As my lecturer said, the array index -1 accesses another variable. I'm still confused: why on earth does a programming language have this capability? I mean, why allow negative array indices?

Mohammed Fawzan

7 Answers

27

The array indexing operation a[i] gains its meaning from the following features of C:

  1. The syntax a[i] is equivalent to *(a + i). Thus it is valid to write 5[a] to get at the element of a at index 5.

  2. Pointer arithmetic says that, given a pointer p and an integer i, p + i is the pointer p advanced by i * sizeof(*p) bytes.

  3. The name of an array a very quickly decays to a pointer to the 0th element of a.

In effect, array-indexing is a special case of pointer-indexing. Since a pointer can point to any place inside an array, any arbitrary expression that looks like p[-1] is not wrong by examination, and so compilers don't (can't) consider all such expressions as errors.
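The three features above can be checked with a small sketch. The helper names `same_address` and `byte_step` are made up for illustration; both assume the caller passes a pointer and an index that stay inside one array.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if a[i], *(a + i) and i[a] all name the same object.
   Passing an array to this function relies on feature 3: the array
   name decays to a pointer to its 0th element. */
int same_address(const int *a, size_t n, size_t i)
{
    assert(i < n);                /* stay inside the array */
    return &a[i] == a + i         /* feature 1: a[i] is *(a + i) */
        && &i[a] == a + i;        /* + commutes, so i[a] works too */
}

/* Feature 2: distance in bytes between p and p + i.
   The caller must make sure p + i still points into the same array. */
ptrdiff_t byte_step(const int *p, ptrdiff_t i)
{
    return (const char *)(p + i) - (const char *)p;
}
```

For an `int a[6]`, `same_address(a, 6, 5)` holds and `byte_step(a, 3)` equals `3 * sizeof(int)`. With `int *p = &a[3];`, the expression `p[-1]` is perfectly well defined and reads `a[2]`.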

Your example a[-1], where a is the name of an array, is invalid. Merely forming the pointer a - 1 is undefined behaviour when a points to the 0th element of an array, because pointer arithmetic is only defined for results that land inside the array or one past its end. So, a clever compiler could detect this and flag it as an error. Other compilers can still be compliant while allowing you to shoot yourself in the foot by giving you a pointer to a random stack slot.

The computer science answer is:

  • In C, the [] operator is defined on pointers, not arrays. In particular, it's defined in terms of pointer arithmetic and pointer dereference.

  • In C, a pointer is abstractly a tuple (start, length, offset) with the condition that 0 <= offset <= length. Pointer arithmetic is essentially lifted arithmetic on the offset, with the caveat that if the result of the operation violates the pointer condition, it is an undefined value. De-referencing a pointer adds an additional constraint that offset < length.

  • C has a notion of undefined behaviour which allows a compiler to concretely represent that tuple as a single number, and not have to detect any violations of the pointer condition. Any program that satisfies the abstract semantics will be safe with the concrete (lossy) semantics. Anything that violates the abstract semantics can be, without comment, accepted by the compiler and it can do anything it wants to do with it.
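To make the abstract (start, length, offset) view concrete, here is a minimal sketch of a checked pointer. The type `checked_ptr` and the functions `checked_add` and `checked_load` are hypothetical names invented for this illustration; no real compiler represents pointers this way, which is exactly the point of the last bullet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "abstract pointer" carrying the full tuple. */
typedef struct {
    int   *start;   /* first element of the underlying array */
    size_t length;  /* number of elements in the array */
    size_t offset;  /* invariant: 0 <= offset <= length */
} checked_ptr;

/* Pointer arithmetic lifted onto the offset.
   Returns 1 on success, 0 if the invariant would be violated
   (the case that plain C leaves undefined). */
int checked_add(checked_ptr *p, ptrdiff_t delta)
{
    assert(p != NULL);
    ptrdiff_t next = (ptrdiff_t)p->offset + delta;
    if (next < 0 || (size_t)next > p->length)
        return 0;
    p->offset = (size_t)next;
    return 1;
}

/* Dereference needs the stricter condition offset < length. */
int checked_load(const checked_ptr *p, int *out)
{
    assert(p != NULL && out != NULL);
    if (p->offset >= p->length)
        return 0;
    *out = p->start[p->offset];
    return 1;
}
```

With `checked_ptr p = {a, 6, 0};` for an `int a[6]`, `checked_add(&p, -1)` is rejected (the analogue of a - 1), while `checked_add(&p, 6)` succeeds but a following `checked_load` fails, mirroring the "one past the end" rule. Real C drops these checks and keeps only a single address.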

Raphael
Hari
15

Arrays are simply laid out as contiguous chunks of memory. An array access such as a[i] is converted to an access to the memory location addressOf(a) + i * sizeof(a[0]). Thus the code a[-1] is perfectly understandable: it simply refers to the location one element before the start of the array.

This may seem crazy, but there are many reasons why this is allowed:

  • It is expensive to check whether the index i in a[i] is within the bounds of the array.
  • Some programming techniques actually exploit the fact that a[-1] is valid. For instance, if I know that a is not actually the start of the array, but a pointer into the middle of the array, then a[-1] simply gets the element of the array that is to the left of the pointer.
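As a sketch of the second point, the hypothetical helper `word_start` below walks backwards from a pointer into the middle of a string. It only ever evaluates p[-1] while p is strictly past the start of the buffer, so every negative index stays inside the array.

```c
#include <assert.h>

/* Given a pointer p somewhere inside the string that begins at text,
   return a pointer to the first character of the word containing p.
   Words are separated by single spaces in this simplified sketch. */
const char *word_start(const char *text, const char *p)
{
    assert(text != 0 && text <= p);
    /* p[-1] is the character to the left of p; the p > text test
       guarantees that this character is still inside the string. */
    while (p > text && p[-1] != ' ')
        p--;
    return p;
}
```

For `const char *s = "negative index";`, calling `word_start(s, s + 12)` returns `s + 9`, the start of "index". The explicit `p > text` test is the bounds check that C leaves to the programmer.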
Dave Clarke
4

As the other answers explain, this is undefined behaviour in C. Consider that C was defined (and is mostly used) as a "high level assembler". C's users value it for its uncompromising speed, and checking stuff at runtime is (mostly) out of the question for the sake of sheer performance. Some C constructs that look nonsensical to people coming from other languages make perfect sense in C, like this a[-1]. Yes, it doesn't always make sense (

vonbrand
3

One can use such a feature to write memory allocation routines that access memory directly. One such use is to inspect the previous memory block through a negative array index to determine whether the two blocks can be merged. I used this feature when I developed a non-volatile memory manager.
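A minimal sketch of the idea, not the memory manager described above: a bookkeeping header is stored directly in front of each payload, and index -1 on the payload pointer reaches it. The names `block_header`, `block_init` and `block_header_of` are assumptions made for this example, and the raw memory is assumed to be suitably aligned for `block_header`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping record placed just before every payload. */
typedef struct {
    size_t size;    /* payload size in bytes */
    int    in_use;  /* 0 means the block is free and could be merged */
} block_header;

/* Write a header at raw and return the payload pointer right after it.
   Assumes raw is aligned for block_header and large enough. */
void *block_init(void *raw, size_t size)
{
    assert(raw != NULL);
    block_header *h = raw;
    h->size = size;
    h->in_use = 1;
    return h + 1;
}

/* Recover the header from a payload pointer with a negative index.
   This is well defined here because the header really does sit
   immediately before the payload inside the same buffer. */
block_header *block_header_of(void *payload)
{
    assert(payload != NULL);
    return &((block_header *)payload)[-1];
}
```

An allocator built this way could look at `block_header_of(p)->in_use` for a neighbouring block before deciding to merge it.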

2

C is not strongly typed. A standard C compiler doesn't check array bounds. The other thing is that an array in C is nothing but a contiguous block of memory, and indexing starts at 0, so an index of -1 is the location of whatever bit pattern sits before a[0].

Other languages exploit negative indices in a nice way. In Python, a[-1] will return the last element, a[-2] will return the second-to-last element and so on.

mrk
1

In simple words:

All variables (including arrays) in C are stored in memory. Let's say you have 14 bytes of "memory" and you declare the following:

int a=0;
int array1[6] = {0, 1, 2, 3, 4, 5};

Also, consider the size of an int to be 2 bytes. Then, hypothetically, the integer a will be saved in the first 2 bytes of memory. The next 2 bytes hold the integer in the first position of the array (that is, array1[0]).

Then, when you say array1[-1], it is like referring to the integer saved in memory just before array1[0], which in our example is, hypothetically, the integer a. In reality, this is not exactly the way that variables are stored in memory.

Dchris
0
//: Example of negative index:
//: A memory pool with a heap and a stack:

unsigned char memory_pool[64] = {0};

//: The stack grows down from the end of the pool, the heap grows up from the start:
unsigned char* stack = &( memory_pool[ 64 - 1] );
unsigned char* heap  = &( memory_pool[ 0     ] );

int stack_index =    0;
int  heap_index =    0;

int main( void ){
    //: reserve 4 bytes on stack:
    stack_index += 4;

    //: reserve 8 bytes on heap:
    heap_index  += 8;

    //: Read back all reserved memory from stack (indices 0, -1, -2, -3):
    for( int i = 0; i < stack_index; i++ ){
        unsigned char c = stack[ 0 - i ];
        (void)c; //: do something with c
    }
    //: Read back all reserved memory from heap:
    for( int i = 0; i < heap_index; i++ ){
        unsigned char c = heap[ 0 + i ];
        (void)c; //: do something with c
    }
    return 0;
}
KANJICODER