I wanted to write a short article on the importance of indexing your database tables when creating an application. This may sound like a small thing but the time savings you’ll see as a result are much larger.
More Than Code
If there is enough interest I may create several articles on the subject of database improvements and ways to increase efficiency in your application due to following best practices in your database layer. I mentioned it a bit earlier that it’s sounds like a small thing, but many developers neglect this step when they are creating their applications. There never seems to be a shortage of interest in improving page load speeds, increasing efficiency in the code which runs the various functions and conditionals for retrieving the data. However, it seems increasingly rare to find good developers interested in improving their application from a properly architected database.
I find this diminishing focus disturbing as the database interactions, queries, searches, and general structure account for a good amount of the processing time required when a page is loaded or data is retrieved.
In this article I want to look at the value of indexing your database columns, when you should use indexes, and what value they provide.
Let’s begin with a quick definition of the term. Indexing a database table means you are marking a particular column within a particular database table to be stored for use later. It may be helpful to think of the index as follows: When you insert a row into your table you are essentially adding that information to the end of every row and all the data currently existing within that table (that’s easy enough to understand). Now, consider the situation where you have a table consisting of a couple thousand rows (that’s not even a very large table). When you go to look up a particular piece of information you must perform a ‘table scan’ of every row and every value in every column of that table.
Let’s assume you have a database table to store all your contacts information. You collect 20 pieces of information on each person (first name, last name, company, address, city, state, facebook, twitter and so on). And you’re religious about entering information so you have now acquired 10,000 contacts. Congratulations you’ve done a fantastic job, but now you need to find that one person whose name you can’t quite remember but you know it has a wa in the last name somewhere. You perform a search on your contacts to find those contacts with the last name having a wa in it. Tick Tick Tick…that is an intensive search.
Now let’s look at that same situation only this time let’s imagine you have an index on the last name field. This means every time you insert a new record your index is updated and the new last name value added to the index. When you perform an index, this will typically be an alphabetical index with a list of corresponding row ID’s where that value is found. This is where it gets exciting.
The Math Behind The Scenes
Your index can be searched incredibly quickly by using a binary search. When performing a binary search you essentially can reduce your time by half (or more) with each row search.Rather than an “N” based search where N is the number of rows in your database table you can follow a much more efficient log(n) search instead. Math is so much fun! Here’s a wikipedia definition:
A binary search halves the number of items to check with each iteration, so locating an item (or determining its absence) takes logarithmic time.
Here’s a simple comparison: You can compare this to how you look a name up in a phonebook. If you know you’re looking for a name that starts with Hu, and you flip open the book to Kr and on a second row you find Ed then you know the result you want is roughly halfway between the two. You continue “splitting” the pages like this until you find the page and the number you need. Your database works much the same way. Instead of requiring you to search each and every row completely to the end, you can instead search only the indexed column (which has been sorted) and search much more quickly for the exact row you need.
The Downside of Indexes
Nothing is perfect in life and just as with other software architect choices there is a tradeoff you will need to consider when implementing indexes in your code. The process of adding a new row to an index does increase the time involved when storing a row. This is because it must add the new value to the index and then re-sort it. It’s a minor expense but one you should consider as you implement indexes. Be somewhat selective on what fields (columns) would be most benefited by adding an index.
The More You Know
Database architecting is a vitally important part of the designing phase of any application. If you are a programmer and focus most of your time working on creating cutting edge code you should take the time to also learn how to best create your database structure. There are many other opportunities for improving your development skills and your applications by examining your database schema and your query building. In future articles we’ll explore more of these techniques and way they can increase your skills. The more you know the better you’ll be.
Remember, we’re all in this together!