Monday, June 18, 2012

Monitor Current SQL Server Processes Without Activity Monitor

As a database administrator, or one of the main people in a SQL Server admin group, you always want to see what is going on at the server level. For example: which session IDs are active, whether head blockers are driving up the load on the server, or whether someone is running an expensive query that is consuming all the server resources. Among the SQL Server tools, SQL Server Management Studio provides a utility called Activity Monitor that gives a very good picture of what is going on. This post is not about Activity Monitor, though, but about another way to find out what is happening on the server. I am presenting it because when the load on the server increases, there are times when Activity Monitor cannot even come up and you are stuck.


This can happen when the server is low on memory, CPU usage is high, or a single process is consuming all the server resources. I will present a query that helps you monitor current SQL Server processes even when you cannot bring up Activity Monitor, so you can figure out the likely cause of the problem. There have been a number of times when I could not bring up Activity Monitor and used this query to find the session ID (SPID) that was causing problems on the SQL Server.


The following T-SQL code queries the system dynamic management views (the sys.dm_* views) to list the current processes on the SQL Server, along with the net number of pages allocated to each one. Note that sys.dm_db_task_space_usage reports page allocations in tempdb, so the number reflects tempdb usage by each task. Sort the result by total pages allocated (descending) to see the most expensive processes first.




SELECT s.session_id    AS 'SessionId',
       s.login_name    AS 'Login',
       COALESCE(s.host_name, c.client_net_address) AS 'Host',
       s.program_name  AS 'Application',
       t.task_state    AS 'TaskState',
       r.start_time    AS 'TaskStartTime',
       r.[status] AS 'TaskStatus',
       r.wait_type     AS 'TaskWaitType',
       TSQL.[text] AS 'TSQL',
       (
           tsu.user_objects_alloc_page_count - tsu.user_objects_dealloc_page_count
       ) +(
           tsu.internal_objects_alloc_page_count - tsu.internal_objects_dealloc_page_count
       )               AS 'TotalPagesAllocated'
FROM   sys.dm_exec_sessions s
       LEFT  JOIN sys.dm_exec_connections c
            ON  s.session_id = c.session_id
       LEFT JOIN sys.dm_db_task_space_usage tsu
            ON  tsu.session_id = s.session_id
       LEFT JOIN sys.dm_os_tasks t
            ON  t.session_id = tsu.session_id
            AND t.request_id = tsu.request_id
       LEFT JOIN sys.dm_exec_requests r
            ON  r.session_id = tsu.session_id
            AND r.request_id = tsu.request_id
       OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) TSQL
WHERE  (
           tsu.user_objects_alloc_page_count - tsu.user_objects_dealloc_page_count
       ) +(
           tsu.internal_objects_alloc_page_count - tsu.internal_objects_dealloc_page_count
       ) > 0;




I suggest wrapping this query in a view and querying the view with ORDER BY TotalPagesAllocated DESC. You can then KILL the SPID that is causing the issue, or investigate where the problem is coming from. Here is a brief description of the columns returned by this query:
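One possible way to wrap the query in a view is sketched below. The view name dbo.vw_CurrentProcesses and the session id in the commented-out KILL are placeholders chosen for illustration; create the view in whichever utility database suits you.

```sql
-- Hypothetical view name (an assumption, not part of the original query).
CREATE VIEW dbo.vw_CurrentProcesses
AS
SELECT s.session_id    AS SessionId,
       s.login_name    AS [Login],
       COALESCE(s.host_name, c.client_net_address) AS Host,
       s.program_name  AS [Application],
       t.task_state    AS TaskState,
       r.start_time    AS TaskStartTime,
       r.[status]      AS TaskStatus,
       r.wait_type     AS TaskWaitType,
       TSQL.[text]     AS TSQL,
       (tsu.user_objects_alloc_page_count - tsu.user_objects_dealloc_page_count)
     + (tsu.internal_objects_alloc_page_count - tsu.internal_objects_dealloc_page_count)
                       AS TotalPagesAllocated
FROM   sys.dm_exec_sessions s
       LEFT JOIN sys.dm_exec_connections c
            ON  s.session_id = c.session_id
       LEFT JOIN sys.dm_db_task_space_usage tsu
            ON  tsu.session_id = s.session_id
       LEFT JOIN sys.dm_os_tasks t
            ON  t.session_id = tsu.session_id
            AND t.request_id = tsu.request_id
       LEFT JOIN sys.dm_exec_requests r
            ON  r.session_id = tsu.session_id
            AND r.request_id = tsu.request_id
       OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) TSQL
WHERE  (tsu.user_objects_alloc_page_count - tsu.user_objects_dealloc_page_count)
     + (tsu.internal_objects_alloc_page_count - tsu.internal_objects_dealloc_page_count) > 0;
GO

-- Most expensive processes first.
SELECT *
FROM   dbo.vw_CurrentProcesses
ORDER BY TotalPagesAllocated DESC;

-- Once you have identified the offending session, you could end it.
-- 53 is a made-up session id; substitute the SessionId you found.
-- KILL 53;
```

Keeping the ORDER BY outside the view is deliberate: a view cannot carry an ORDER BY unless it also uses TOP, so the sort belongs in the query that reads the view.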


1) SessionId: The unique number assigned to a process connected to SQL Server. This is also called a SPID.


2) Login: The login to which the task is tied.


3) Host: The host from which the task originated.


4) Application: The application that invokes this task.


5) TaskState: The current state of the task.


6) TaskStartTime: The time at which the task began. 


7) TaskStatus: The current status of the task. 


8) TaskWaitType: Indicates the event the task is waiting on.


9) TSQL: Actual SQL that is being run. 


10) TotalPagesAllocated: The net number of pages (allocated minus deallocated, for user and internal objects) currently held by the process. A higher number suggests the process may be consuming a large share of the server's resources.


You can also set up a scheduled job that looks at this view and applies some logic based on process running time or TotalPagesAllocated. That would be a good way to automate monitoring of the server.
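As a rough sketch of what such a job step might run, the batch below logs every session whose net page allocation exceeds a threshold. The table name dbo.ProcessMonitorLog, its columns, and the threshold of 10000 pages are all assumptions made for illustration; it reads the DMVs directly so that it does not depend on any particular view name.

```sql
-- Hypothetical log table; the name and columns are assumptions.
IF OBJECT_ID(N'dbo.ProcessMonitorLog', N'U') IS NULL
    CREATE TABLE dbo.ProcessMonitorLog
    (
        CapturedAt          DATETIME      NOT NULL DEFAULT (GETDATE()),
        SessionId           SMALLINT      NOT NULL,
        LoginName           NVARCHAR(128) NULL,
        TotalPagesAllocated BIGINT        NOT NULL
    );

-- Assumed threshold; tune it for your own server.
DECLARE @PageThreshold BIGINT = 10000;

-- Record one row per session whose net allocation is above the threshold.
INSERT INTO dbo.ProcessMonitorLog (SessionId, LoginName, TotalPagesAllocated)
SELECT s.session_id,
       s.login_name,
       SUM(tsu.user_objects_alloc_page_count - tsu.user_objects_dealloc_page_count
         + tsu.internal_objects_alloc_page_count - tsu.internal_objects_dealloc_page_count)
FROM   sys.dm_exec_sessions s
       JOIN sys.dm_db_task_space_usage tsu
            ON  tsu.session_id = s.session_id
GROUP BY s.session_id, s.login_name
HAVING SUM(tsu.user_objects_alloc_page_count - tsu.user_objects_dealloc_page_count
         + tsu.internal_objects_alloc_page_count - tsu.internal_objects_dealloc_page_count) > @PageThreshold;
```

Logging instead of killing sessions automatically is the more cautious choice here: you can review the captured rows first and decide later whether any automated action is justified.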


I hope you enjoyed reading this article. Please leave your comments. I will keep improving my posts and will try to share good material.



Thursday, June 7, 2012

Differences between Unique Key and Primary Key in a Database that are most important and least frequently discussed


I am going to discuss some of the most important differences between a unique key and a primary key in databases. This is a very common topic, but also one of the most misunderstood. I am not going to list the well-known differences; instead I will discuss the things you should understand before deciding on the primary key and unique keys in a database design.
Fig 1

By definition, a primary key and a unique key do the same job: each is a column or group of columns that uniquely identifies a row. In this respect their role is the same (so technically they are the same). The most common mistake people make during database design is to create the primary key as an ID INT NOT NULL column and treat it as the unique identifier for the row. That is correct as far as it goes, but this primary key (the Id column) is just a number added as an extra attribute to the table, and it is not part of the data itself. Let me give you an example (see Fig 1):
Fig 1 shows a table with an Id column as the PK. The table holds the employee FName and LName. In the example you can see that 1, John, Neville; 2, John, Neville; and so on all describe the same thing. Adding the Id column did not really make the rows unique. So simply creating an Id column does not mean you have found a way to uniquely identify your rows; it does not solve the problem. What we really want is a key that belongs to the data and truly identifies uniqueness.
A better way of designing the same table would be to add attributes that make the row unique and that also conform to the meaning of the data, for example an SSN.
So the takeaway is: always think about the unique key on a table in terms of the data the table holds, not in terms of an added value such as an Id.
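To make the point concrete, here is a small sketch. The table name dbo.EmployeeKeyDemo, the column sizes, and the sample SSN value are made up for illustration; only the Id, FName, LName and SSN columns come from the discussion above.

```sql
-- Hypothetical demo table; names, sizes and sample values are assumptions.
CREATE TABLE dbo.EmployeeKeyDemo
(
    Id    INT IDENTITY(1,1) NOT NULL
          CONSTRAINT PK_EmployeeKeyDemo PRIMARY KEY,
    SSN   CHAR(11)    NOT NULL,
    FName VARCHAR(50) NOT NULL,
    LName VARCHAR(50) NOT NULL,
    -- The data-relevant key: no two employees may share an SSN.
    CONSTRAINT UK_EmployeeKeyDemo_SSN UNIQUE (SSN)
);

-- First insert succeeds.
INSERT INTO dbo.EmployeeKeyDemo (SSN, FName, LName)
VALUES ('000-00-0001', 'John', 'Neville');

-- Second insert gets a new Id, so the primary key alone would accept it,
-- but the unique constraint on SSN rejects it as a duplicate.
INSERT INTO dbo.EmployeeKeyDemo (SSN, FName, LName)
VALUES ('000-00-0001', 'John', 'Neville');
```

Without the unique constraint on SSN, both rows would be stored and the table would hold the same employee twice under two different Id values.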

Creating a unique key that is relevant to the data helps the query statistics stay consistent and helps the query optimizer come up with an optimal execution plan, which results in faster processing.

Another misconception about unique and primary keys is that a primary key always creates a clustered index and a unique key always creates a nonclustered index. This is only the default: when you create keys without saying otherwise (for example through the SQL Server Management Studio user interface), SQL Server creates a clustered index for the primary key, provided the table does not already have one, and a nonclustered index for a unique key. It is not mandatory. I suggest always using scripts to create your keys, so that the index type is stated explicitly.
You can create a clustered index on a unique key and a nonclustered index on a primary key:


ALTER TABLE dbo.Employee ADD  CONSTRAINT [PK_ID] PRIMARY KEY NONCLUSTERED(
[Id] ASC
)
This is a perfectly valid statement in T-SQL, so don't be confused by how keys are indexed.
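The reverse combination can be sketched the same way. In the example below the table name dbo.EmployeeIndexDemo and its columns are assumptions for illustration; it declares the primary key as nonclustered and puts the clustered index on the unique key instead.

```sql
-- Hypothetical demo table; names and column sizes are assumptions.
CREATE TABLE dbo.EmployeeIndexDemo
(
    Id    INT         NOT NULL,
    SSN   CHAR(11)    NOT NULL,
    FName VARCHAR(50) NOT NULL,
    LName VARCHAR(50) NOT NULL,
    CONSTRAINT PK_EmployeeIndexDemo_Id  PRIMARY KEY NONCLUSTERED (Id),
    CONSTRAINT UK_EmployeeIndexDemo_SSN UNIQUE CLUSTERED (SSN)
);

-- Check which index ended up clustered.
SELECT name, type_desc, is_primary_key, is_unique_constraint
FROM   sys.indexes
WHERE  object_id = OBJECT_ID(N'dbo.EmployeeIndexDemo');
```

Stating CLUSTERED or NONCLUSTERED explicitly in the script means the result does not depend on defaults or on whatever the designer tool happens to choose.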

So to sum up, the following are the things to keep in mind when thinking about a UK and a PK:
1) How can I make this row unique without introducing a fake Id column?
2) Technically, a primary key and a unique key are the same thing.
3) You can often save extra joins to lookup tables by identifying the unique values. I will show this in a later blog post.
4) SQL Server does not allow NULLs in a PK, but a UK column can hold a NULL (only one NULL is permitted per single-column unique constraint).
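The NULL behaviour in point 4 can be checked with a short sketch like the one below. The table name dbo.NullKeyDemo and its single column are made up for illustration.

```sql
-- Hypothetical demo table; the name and column are assumptions.
CREATE TABLE dbo.NullKeyDemo
(
    Code INT NULL
         CONSTRAINT UK_NullKeyDemo_Code UNIQUE
);

-- A NULL is accepted in a unique key column.
INSERT INTO dbo.NullKeyDemo (Code) VALUES (NULL);

-- A second NULL is rejected: SQL Server treats it as a duplicate value
-- for the purposes of the unique constraint.
INSERT INTO dbo.NullKeyDemo (Code) VALUES (NULL);
```

If you need a column that is unique among its non-NULL values but allows many NULLs, a unique constraint is not enough on its own, which is worth knowing before choosing a nullable column as a UK.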

I hope you enjoyed reading this blog. Please post in the comments if you have any thoughts to share.

Monday, June 4, 2012

Deleting records from a SQL table efficiently

This blog is dedicated to letting the SQL community know that there is a better way to delete records from a table.
DELETE is a DML (Data Manipulation Language) statement, and whenever a DELETE statement runs, the SQL engine has to acquire locks on the data before it can delete the records; for a large delete these locks can escalate to a lock on the whole table.
If the table has millions of rows, the delete will take a long time as well. Moreover, the delete blocks other DML statements until it completes its job. For example, other inserts and updates will be blocked by this delete while it holds the table lock. This can lead to blocking on the database server and to increased I/O, memory usage and so on. To mitigate the effect, I am going to present a way in which we first store the IDs of the rows to be deleted in a table variable and then use those IDs to delete the records from the original table. With this solution, the table is not locked for deletion while the rows to delete are being identified, and the delete itself uses the ID column, which should have an index on it, making access to the data faster.


The following is the T-SQL code showing how you can do this:


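DECLARE @IdsToBeDeleted TABLE (Id INT PRIMARY KEY);

-- Collect the Ids of the rows that need to be deleted.
-- dbo.YourTable is a placeholder for the table you are deleting from.
INSERT INTO @IdsToBeDeleted
  (
    Id
  )
SELECT Id
FROM   dbo.YourTable
WHERE  1 = 0; -- placeholder: replace with the condition that selects the rows to delete
-- No delete locks are held on the table until here

BEGIN TRANSACTION;

-- Delete only the collected rows, joining on the indexed Id column.
DELETE s
FROM   dbo.YourTable s
       JOIN @IdsToBeDeleted si
            ON  si.Id = s.Id;

COMMIT TRANSACTION;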


This way you can make your deletes run efficiently, with a lower load on the server.
I am thankful that you have gone through the technique I have mentioned here. Please leave some comments and let me know what you think or if you have any questions.
Abhi