Data structures 1 – An introduction to data structures
When trying desperately to explain data structures to a colleague, I came to the rather shocking realisation that there is very little material out there for beginners to cut their teeth on. So, without further ado, here is my little primer on data structures and how they work. In each post, I will try to discuss the general principles and algorithms behind the data structure, and show as little code as possible in any given programming language. Where code is unavoidable, I will attempt to use a form of pseudo-code that should lend itself easily to implementation in most programming languages. Where code is really unavoidable, I’ll use Java.
Let’s start with the basics. What is a data structure? A data structure is essentially a way to store related types of data in a way that performing certain operations on the data is made more convenient. The operations might be as simple as being able to access the data quickly and efficiently on demand or as complex as sorting or searching through the data. The type of data structure used depends both on the type of data and the operation you want to perform on it.
Let’s make this easier with an analogy. Say you want to store notes from lectures or meetings. If the notes are on loose-leaf paper, you’d store it in a binder. The binder would then be one type of data structure. In this case, you can make some assumptions about the data. For example, you know that the paper is thin, and that you’re most likely to use only one size of paper, say A4. Now, if you only have a few pages of notes, it will be very easy for you to find the page you want, you simply go through the pages one at a time sequentially till you find the one you’re looking for. However, as your collection of notes grows, at some point, going through each page will no longer a feasible way of finding the desired page. At this point, you will either put in separators or start numbering the pages and make an index, so you can find what you were looking for quickly. You might even put the notes in different binders depending on their subject.
Now, imagine that you have a box full of balls with numbers on them. If you want to find a particular ball, you would have to randomly pick balls till you find the right one. What would you if you wanted to find the ball numbered 27 quickly? Why not put the balls on a shelf with the numbers clearly visible? In this case, the shelf is clearly the superior data structure when you want to find the quickly. So, the type of data structure that is best for a particular task (binder, box or shelf) depends both on the type of data (loose-leaf paper or balls) and what you want to do with it (just store it, read it through one page at a time, find the page you want quickly or find a ball with a particular number).
The simplest type of data structure is called an array. An array is a data structure that holds a fixed maximum number of a single type of data. In computer memory, an array is stored contiguously. This means that the array is stored as a single piece of memory. The shelf that we made to store the balls is an array. No matter how big the room is (the computer’s memory), the shelf can only store a set number of balls in a row.
Another feature of an array is random access. This means that you can easily access the ith element of an array, simply by knowing the number of the element. In our ball example, if you wanted to access the fifth ball from the left, and you knew the diameter of each ball, and also that the balls were arranged in a row so that each ball touched the next one, even if you weren’t able to see each ball, you would know exactly how far to the right you had to go to find the fifth ball, i.e. 5 x diameter. This is how a computer knows where to find a particular element of an array. It simply takes the size of the data type, multiplies it by the number of the element you want to access and adds the result to the array’s starting location. You can also access arrays sequentially by reading each element in succession.
An array in computer memory. The array is stored contiguously in memory (all in one go). Notice how the size of each element (the numbered rectangles) is the same. The sizes of each element and of the overall array are fixed. One can access each element of the array by index (the number) and sequentially (by starting at 1 and continuing till .
So, arrays provide the very best in both random and sequential access. So why not use arrays all the time? You don’t really need to know about any other data structure, right? Wrong. While arrays are convenient and fast, the convenience and speed come with some pretty nasty trade-offs. For one, because arrays are stored contiguously in memory, you have to tell the computer in advance what the maximum size of the array is. What’s worse, if in the middle of your program you decide that you want to increase or decrease the size of the array, you can’t. Furthermore, trying to access an array index that’s beyond the maximum size of your array can lead to program crashes in some languages and complete system lockups in others. So, unless you can guess pretty well how much memory your array is going to use, you’re either going to use lots of memory to store very little data, or end up not having enough space to store it. In our ball example, that would be like building a shelf 50 feet long, only to discover that you only had to store 5 pool balls or building a shelf designed to hold 50 balls, only to discover that some bright spark decided to send you 500.
So, in conclusion, arrays are a potentially very useful basic data structure, provided you can make reasonable assumptions about how much data you want to store. What if you don’t know? Well, that’s what linked lists are for. But I’ll cover those some other time. Next time, we’ll take a more detailed look at arrays.
ORIGINALLY POSTED BY CGKANCHI FOR THETAZZONE/TAZFORUM HERE
Do not use, republish, in whole or in part, without the consent of the Author. TheTAZZone policy is that Authors retain the rights to the work they submit and/or post…we do not sell, publish, transmit, or have the right to give permission for such…TheTAZZone merely retains the right to use, retain, and publish submitted work within it’s Network