What is SQL server Statistics?
SQL Server statistics will be used by optimizer to create the optimized execution plans where estimating the number of rows that can be returned, density of pages and statistics object will also hold a histogram of information about the distinctive number of rows and range of typical rows. All this information will be used by optimizer to estimate optimal execution plan to retrieve the data.
When do these statistics will be created?
SQL server will create statistics object when we create an index on table and statistics object will also be created by SQL server automatically when we use non-indexed column in a where condition of select queries (What this mean is we are missing an index there). Also we could just create the statistics manually.
Let’s see this with an example:
I have created a database and a table called dbo.employee (I copied it over from Adventurewroks2012 )which does not have any index after copied to a new database called statistics as below
When I run a simple select on this table like below pic (2) will result into a table scan in execution plan as it doesn’t have an index and will still create statistics pic (3 )to make use of them when the same query run every time.
Let’s create index pic (4) to get benefited as all we know seek is better than scan when we select for specific rows instead a whole set of rows.
When I created the index, now SQL server will create the statistics specifically for this index and these statistics will tell the optimizer how to use this index and get the data faster and optimal.
How these statistics will help Optimizer?
Let’s take the same above Simple select to see the estimated execution plan. Interesting now, the estimated execution plan is showing that the plan was just created by using the statistics object that was creates along with the index and displays the estimated rows that will returned and other information which estimated and this information was read by Optimizer from the statistics.
I’m not going into detail how the query was executed internally when the SQL hits the SQL engine to process the query but in general the optimizer which will be in relation engine will use this statistics to create the estimated plan and will handed over the plan to storage engine to get the data where the actual execution plan comes into picture.
As long as the estimated and actual plans are same, there will no performance issue as this means that optimizer had the updated statistics. When these statistics got not updated may result in actual plans different from estimated which result in not much accurate plans from optimizer and let the performance down.
When do these statistics get out of date?
Usually, statistics will be out of date or inaccurate when data in a table changes from time to time. By default statics for table will get updated when
- When an empty table gets a row
- A table had less than 500 and increased by 500 rows
- A table had less than 500 and increased by 500 rows +every 20 % of the total rows
- Trace flag 2371 which will change the fixed rate of the 20% threshold for update statistics into a dynamic percentage rate. The higher the number of rows in a table, the lower the threshold will become to trigger an update of the statistics. For example, if the trace flag is activated it means an update statistics will be triggered on a table with 1 billion rows when 1 million changes occur.
More info on this Trace flag can be found here TraceFlag2371 (I haven’t tested it though)
The best way to check the statistics are out of date is to verify the estimated number of rows in an execution plan to actual number of rows, if they both are almost same then we have the accurate stats and if not then time to update stats.
How to automate updating these statistics?
The database options to create and update statistics will let you do it automatically by SQL server or Optimizer. But how smart are they on a highly transactional databases and large TB databases with millions of data loads every minute to minute?
By default these are enabled.
How to perform manual updating these stats?
1. Below stored procedure will update the stats within whole single database
EXEC sp_updatestats; Or EXEC sp_updatestats ‘resample’
- Resample will use the most recent sample number rows to update the stats.
- sp_updatestats updates only the statistics that require updating based on the rowmodctr (row modification counter)information in the sys.sysindexes
2. Below update command will update the all stats for specific table or specific index if specified. Also this command will provided with multiple options. I will explain few of them which are important to note while doing this update on large TB of data.
UPDATE STATISTICS Table Name or View Name , index_name with Options
FULLSCAN: Will scan the entire number of rows in a table to update the stats.
SAMPLE NUMBER: If you are not like to do whole scan of the table rows (Time consuming) and update the stats, we have option to specify the sample number of rows or percent of rows to scan to update or create the stats.
NORECOMPUTE: If this option is specified then query optimizer completes this statistics update and disables future updates (AUTO_UPDATE_Statistics). We must carefull while using this option as it will turn off the auto stats for the specified table.
What happens to these stats when re-indexing or Re-organizing indexes?
When re-indexing the stats will also be re-created and there is nothing changed to stats when re-organizing the indexes
How to check when your stats were last updated?
1. One way to query sys.stats system table, something like below
select object_name(object_id) TableName ,name ,stats_date(object_id, stats_id) Last_updated from sys.stats where objectproperty(object_id, 'IsUserTable') = 1
2. the other way is
DBCC SHOW_STATISTICS (TableName, statsName)