Monday, November 30, 2015

Clustered Index vs Non Clustered Index

What is Indexing

Index is one of the database object and indexing can be done in one or more columns of the database table. The purpose of doing indexing is simple i.e, to speed up the retrieval process of rows in the table. Indexing can be applied to tables or views to speed up the retrieval process. Index can be considered as “copy” of the data of table and will be sorted based on some way. For example if we have a table called “Category” as shown below -
Category Table
CategoryID INT - PK
CategoryName VARCHAR(50)
CategoryAdded DATETIME
CategoryDeleted DATETIME
As shown above in the table structure “Category ID” is a primary key and will default have the clustered index. So all primary keys will default have the clustered index associated with it. Now consider another scenario where in we search the records based on “Category Name” and say the table has more than 1 million records in that table it will surely give the performance issues while filtering records based on “Category Name” as it does “full table scan” internally.
So to avoid the performance issues we can set “Category Name” as Non clustered Index. So now it sorts the data based on “Category Name” and stores it in the disk of the machine and hugely increases the performance as well.

Clustered Index

Let’s add some data into the table –“Category” and let’s see how the table search happens through query and let’s see how we can minimize number of comparisons using clustered index –
Category IDCategory NameCategory AddedCategory Deleted
1Category1NULLNULL
2Category2NULLNULL
3Category3NULLNULL
4Category4NULLNULL
5Category5NULLNULL
Below is the sample image of how the table scan is happened without clustered index -
Table Index

Below is the sample image of how the table data is structured after clustered index -
clustered index

As shown in the first figure above, table – Category have primary key – “Category ID” which is by default have a clustered index. Primary key as we know does not allow null and does not allow duplicate values.
Below are the list of points to be kept in mind while creating a Clustered Index -
  • There could be only one clustered index allowed in a table.
  • It is good to add the clustered index to the column which is used mostly in “where” clause the most. Because it greatly affects the performance of the table.
  • Whenever index is changed, reordering of data is happened in the table.
  • When clustered index is created it forms a binary tree and binary tree info stored in the disc separately. Sample binary tree diagram shown above.
  • During Update, Delete, Insert statements reordering will happen in the table.
Now let’s see how the data compared in binary tree in a query. Suppose I have a query to find the category having CategoryID = 5 –
Select * from Category where Category ID = 5
clustered index iteration

As shown in the above binary tree figure -
  • 5 is matched with 3, 
  • 5 is matched with 4
  • 5 is matched with 5 so only 3 comparisons.
So here total comparisons are 3 which is less compared to comparisons without clustered index 5.

Non Clustered Index

Non clustered index will be helpful for the situation where in one column if we have multiple repeated values. For example “Category Added” column can have same date in case category added at the same date and time.
Non clustered index has to be manually created unlike clustered index which will be created on creating the primary key for a table. We can add more than one non clustered index in a table and we can combine more than one column for non-clustered index.
Non clustered index always depend on the clustered index of the table. This can be explained with an example of book. By looking at the index page we can figure out the topic from the page number as it is already sorted. And while searching for the page number (say 200) you may end up being in page 300 then you will go back to find the page number – 200 or if you end up being in page 100 then you will go forward to find the page number – 200.
So this concept is similar to clustered index concept of binary tree. So from this concept we can make out non clustered index depend on clustered index.
Below are the points of Non-Clustered index has to be kept in mind while creating Non-Clustered Index
  • Non clustered index acts like a pointer to the data for the rows in a table.
  • Non clustered index will depend on the number of records which are fetched from our application as well. If it is more that make sense to add a non-clustered index to the column on which we are filtering the data. Meantime adding more non clustered index in a table can affect the performance of the table as well.
  • In case column other than primary key marked as clustered index then primary key will by default be non clustered index.
  • We can create a non-clustered index by expanding the “category” table and right click on the indexes folder. So here Non clustered index will be like pointers to the binary tree created in clustered index and while fetching the data it check the data to be traversed from the pointer.
  • If the table has not indexed it would by default use the heap structure and do a table scan. So once after creating index it will use “seek” which does not compare all the records and it shown in the above comparisons as well.

No comments: