What is Indexing
Index is one of the database object and indexing can be done in one or more columns of the database table. The purpose of doing indexing is simple i.e, to speed up the retrieval process of rows in the table. Indexing can be applied to tables or views to speed up the retrieval process. Index can be considered as “copy” of the data of table and will be sorted based on some way. For example if we have a table called “Category” as shown below -
Category Table
CategoryID INT - PK CategoryName VARCHAR(50) CategoryAdded DATETIME CategoryDeleted DATETIME
As shown above in the table structure “Category ID” is a primary key and will default have the clustered index. So all primary keys will default have the clustered index associated with it. Now consider another scenario where in we search the records based on “Category Name” and say the table has more than 1 million records in that table it will surely give the performance issues while filtering records based on “Category Name” as it does “full table scan” internally.
So to avoid the performance issues we can set “Category Name” as Non clustered Index. So now it sorts the data based on “Category Name” and stores it in the disk of the machine and hugely increases the performance as well.
Clustered Index
Let’s add some data into the table –“Category” and let’s see how the table search happens through query and let’s see how we can minimize number of comparisons using clustered index –
Category ID | Category Name | Category Added | Category Deleted |
---|---|---|---|
1 | Category1 | NULL | NULL |
2 | Category2 | NULL | NULL |
3 | Category3 | NULL | NULL |
4 | Category4 | NULL | NULL |
5 | Category5 | NULL | NULL |
Below is the sample image of how the table scan is happened without clustered index -
Below is the sample image of how the table data is structured after clustered index -
As shown in the first figure above, table – Category have primary key – “Category ID” which is by default have a clustered index. Primary key as we know does not allow null and does not allow duplicate values.
Below are the list of points to be kept in mind while creating a Clustered Index -
- There could be only one clustered index allowed in a table.
- It is good to add the clustered index to the column which is used mostly in “where” clause the most. Because it greatly affects the performance of the table.
- Whenever index is changed, reordering of data is happened in the table.
- When clustered index is created it forms a binary tree and binary tree info stored in the disc separately. Sample binary tree diagram shown above.
- During Update, Delete, Insert statements reordering will happen in the table.
Now let’s see how the data compared in binary tree in a query. Suppose I have a query to find the category having CategoryID = 5 –
Select * from Category where Category ID = 5
As shown in the above binary tree figure -
- 5 is matched with 3,
- 5 is matched with 4
- 5 is matched with 5 so only 3 comparisons.
So here total comparisons are 3 which is less compared to comparisons without clustered index 5.