Arrays

Let's talk about arrays. So why would we ever want to use arrays? Well, let's say you have a program that needs to store 5 student IDs. It might seem reasonable to have 5 separate variables. For reasons we'll see in a bit, we'll start counting from 0. The variables we'll have will be int id0, int id1, and so on.

Any logic we want to perform on a student ID will need to be copied and pasted for each of these student IDs. If we want to check which students happen to be in CS50, we'll first need to check if id0 represents a student in the course. Then to do the same for the next student, we'll need to copy and paste the code for id0 and replace all occurrences of id0 with id1, and so on for id2, id3, and id4. As soon as you hear that we need to copy and paste, you should start thinking that there is a better solution.

Now what if you realize you don't need 5 student IDs but rather 7? You need to go back into your source code, add in an id5 and an id6, and copy and paste the logic for checking if the IDs belong to the class for these 2 new IDs. There is nothing connecting all these IDs together, and so there is no way of asking the program to do this for IDs 0 through 6.

Well, now you realize you have 100 student IDs. It's starting to seem less than ideal to need to separately declare each of these IDs, and copy and paste any logic for those new IDs. But maybe we are determined, and we do it for all 100 students. But what if you don't know how many students there actually are? There are just some n students, and your program has to ask the user what that n is. Uh oh. This isn't going to work very well. Your program only works for some constant number of students.

Solving all of these problems is the beauty of arrays. So what is an array? In some programming languages an array type might be able to do a bit more, but here we'll focus on the basic array data structure just as you'll see it in C. An array is just a big block of memory. That's it.
When we say we have an array of 10 integers, that just means we have some block of memory that is large enough to hold 10 separate integers. Assuming that an integer is 4 bytes, this means that an array of 10 integers is a contiguous block of 40 bytes in memory. Even when you use multidimensional arrays, which we won't go into here, it's still just a big block of memory. The multidimensional notation is just a convenience. If you have a 3 by 3 multidimensional array of integers, then your program will really just treat this as a big block of 36 bytes. The total number of integers is 3 times 3, and each integer takes up 4 bytes.

Let's take a look at a basic example. We can see here 2 different ways of declaring arrays. We'll have to comment 1 of them out for the program to compile, since we declare x twice. We'll take a look at some of the differences between these 2 types of declarations in a bit. Both of these lines declare an array of size N, where we have #define N as 10. We could just as easily have asked the user for a positive integer and used that integer as the number of elements in our array. Like our student ID example before, this is kind of like declaring 10 completely separate imaginary variables: x0, x1, x2, and so on up to xN-1.

Ignoring the lines where we declare the array, notice the square brackets inside the for loops. When we write something like x[3], which I'll just read as x bracket 3, you can think of it like asking for the imaginary x3. Notice that with an array of size N, this means that the number inside of the brackets, which we'll call the index, can be anything from 0 to N-1, which is a total of N indices.

To think about how this actually works, remember that the array is a big block of memory. Assuming that an integer is 4 bytes, the entire array x is a 40 byte block of memory. So x[0] refers to the very first 4 bytes of the block, x[1] refers to the next 4 bytes, and so on.
Because every element sits at a fixed offset from the beginning of the block, the start of x is all the program ever needs to keep track of. If you want to use x[400], then the program knows that this is equivalent to just 1,600 bytes after the start of x. Where'd we get 1,600 bytes from? It's just 400 times 4 bytes per integer.

Before moving on, it's very important to realize that in C there is no enforcement of the index that we use in the array. Our big block is only 10 integers long, but nothing will yell at us if we write x[20] or even x[-5]. The index doesn't even have to be a literal number. It can be any arbitrary expression that evaluates to an integer. In the program we use the variable i from the for loop to index into the array. This is a very common pattern: looping from i = 0 up to the length of the array, and then using i as the index for the array. In this way you effectively loop over the entire array, and you can either assign to each spot in the array or use it for some calculation.

In the first for loop, i starts at 0, and so it will assign to the 0 spot in the array the value 0 times 2. Then i increments, and we assign the first spot in the array the value 1 times 2. Then i increments again, and so on up until we assign to position N-1 in the array the value N-1 times 2. So we've created an array with the first 10 even numbers. Maybe evens would have been a bit better name for the variable than x, but that would have given things away. The second for loop then just prints the values that we have already stored inside of the array.

Let's try running the program with both types of array declarations and take a look at the output of the program. As far as we can see, the program behaves the same way for both types of declarations. Let's also take a look at what happens if we change the first loop to not stop at N but rather, say, 10,000. Way beyond the end of the array. Oops. Maybe you've seen this before. A segmentation fault means your program has crashed. You start seeing these when you touch areas of memory you shouldn't be touching.
In the run that crashed, we were touching memory 10,000 places beyond the start of x, which evidently is a place in memory we shouldn't be touching. Most of us probably wouldn't accidentally put 10,000 instead of N, but what if we do something more subtle, like writing less than or equal to N in the for loop condition as opposed to less than N? Remember that an array only has indices from 0 to N-1, which means that index N is beyond the end of the array. The program might not crash in this case, but it's still an error. In fact, this error is so common that it has its own name: an off-by-one error. That's it for the basics.

So what are the major differences between the 2 types of array declarations? One difference is where the big block of memory goes. In the first declaration, which I'll call the bracket-array type, though this is by no means a conventional name, it will go on the stack. Whereas in the second, which I'll call the pointer-array type, it will go on the heap. This means that when the function returns, the bracket array will automatically be deallocated, whereas you must explicitly call free on the pointer array or else you have a memory leak.

Additionally, the bracket array isn't actually a variable. This is important. It's just a symbol. You can think of it as a constant that the compiler chooses for you. This means that we can't do something like x++ with the bracket type, though this is perfectly valid with the pointer type. The pointer type is a variable. For the pointer type, we have 2 separate blocks of memory. The variable x itself is stored on the stack and is just a single pointer, but the big block of memory is stored on the heap. The variable x on the stack just stores the address of the big block of memory on the heap.

One implication of this is with the sizeof operator.
If you ask for the sizeof the bracket array, it will give you the size of the big block of memory, something like 40 bytes, but if you ask for the sizeof the pointer type of array, it will give you the size of the variable x itself, which on the appliance is likely just 4 bytes. Using the pointer-array type, it is impossible to directly ask for the size of the big block of memory. This isn't usually much of a restriction, since we very rarely want the size of the big block of memory, and we can usually calculate it if we need it.

Finally, the bracket array happens to provide us with a shortcut for initializing an array. Let's see how we could write the first 10 even integers using the shortcut initialization. With the pointer array, there isn't a way to do a shortcut like this.

This is just an introduction to what you can do with arrays. They show up in almost every program you write. Hopefully you can now see a better way of doing the student IDs example from the beginning of the video. My name is Rob Bowden, and this is CS50.
