SQL Server Indexes Management Overview

Outlines the SQL Server Indexes types and management. Illustrates the evolution of the dat

PDF Format
Outlines the SQL Server Indexes types and management.

Illustrates the evolution of the databases in forms of the RDBMS concepts using the Binary Tree and the B-Tree algorithms.

Providing the T-SQL statements to shrink and defrag databases.

Author

Introduction

Tree Search Algorithms

SQL Language

Microsoft SQL Server Overview

DBMS

RDBMS

Data Access Protocols

Indexes

Database Shrinking

References

 

 

 

Author

 

Ahmad Adel Gad was born in Egypt-Cairo on June, 1980.

He was graduated from faculty of engineering (Civil Engineering Department) on 2003.

He found himself in Software Developing rather than the civil engineering since the first year in the college as they were teaching them how to write a code since the first year in order to implement applications to support their work.

After he was graduated from college by two years he decided to shift to the Software Engineering.

On the early 2007, he attended a scholarship for Software Engineering in C#.

On the early of 2008, he had certified as Microsoft Solutions Developer (MCSD) for C# 2003.

On the middle of 2008 he had certified on (Microsoft Certified Trainer) MCT where he got an official license from Microsoft to deliver Microsoft official courses.

On 2004 he started to develop Desktop Applications for Windows with his partner the CEO Wael Ibrahim Arafa in his company Arafasoft and his role was the executive manager for the company.

Currently, Ahmad is working as fulltime in the IT field and as freelancer in the following fields as training, consultation and developing with the two the languages VB.Net and C#.Net:

  1. SharePoint Developing/Administration
  2. Net Applications
  3. Windows Applications
  4. SQL Server Developing/Administration
  5. AJAX Implementation (JavaScript/jQuery + XML Web Service)
  6. Cryptography/Digital Signature Consultation

MCTS - Windows Application 3.5
MCTS - ADO.Net 3.5
MCPD - Web Developer 4
MCTS - .NET Framework 4, Windows Applications
MCTS - Programming in HTML5 with JavaScript and CSS3
MCITP - Database Administrator 2008
MCTS - Microsoft SQL Server 2008, Database Development
MCTS - Core Solutions of Microsoft SharePoint Server 2013
MCTS - Advanced Solutions of Microsoft SharePoint Server 2013

ITIL Foundation
MCT
MCAD
MCSD
MCPD Enterprise
MCTS - SQL Server 2005
MCPD – SharePoint 2010
MCITP – SharePoint 2010
MCTS - Web Application 3.5

Currently, Ahmad is officially certified as:

Introduction

This article will handle the SQL database shrinking concept in consideration of the fragmentations after the shrinking and how rebuild the index to reduce those fragmentations.

But before going in the technical details, this article will take the readers in backward to illustrate the evolution of the databases and the RDBMS (Relational Database Management System) using the Binary Tree and B-Tree algorithms which were evaluated to create the modern RDBMS databases with the required indexes.

Before inventing this magnificent algorithm the computers’ users or the developers used to store their data in plain text files (flat files) with different formats, and the most popular one is the Comma Separated Values (CSV).

This method was very poor including the following shortfalls:

  1. Not secured and can be accessible by anyone and many simple tools.
  2. Unicode characters not allow.
  3. Performance as the only way to get some certain data is by reading the whole file sequentially.
  4. Relational Model not supported, and to overcome this scenario, they were creating file for each table and maintaining the consistency was very complicated and kind of impossible.

Therefore, the Binary Tree was a huge revolution of the database concepts while it solved almost most of the plain text’s short falls. And by developing this algorithm as B-Tree and later B+Tree, the RDBMS became the common concept with any new database storage system.

 

 

Tree Search Algorithms

The tree methodologies were invented to store and search the data to ease and speeding of the data query and manipulation (inserting, updating and deleting). It called a tree or a reversed tree, because the root contains the keys while the tree leafs contain the data.

  • Binary Tree

The Binary Tree is a tree data structure into two nodes, which each node has at most two children, which are referred to as the left child and the right child. The following figures are giving an overview how the data is sorted using the Binary Tree.

   

The Binary Tree was a smart invention to sort and query the data but it contains a remarkable disadvantage which is represented by its name (Binary). Each node can be branched only into two sub nodes and each sub node can be branched only into two nodes until the end of the tree. In most cases some nodes can grow more than others and we can get a tree with very short branches and very long branches. In this case the search could take longer to query some data rather than others.

  • B-Tree

In 1972, Rudolf Bayer and Ed McCreight, both working at Boeing at the time, were looking for a way to overcome some of the shortfalls of the binary tree. Their solution was the B-Tree.


The B-Tree is the evolution of the Binary Tree algorithm which overcomes its disadvantage.
In the B-Tree, each node can be branched into many nodes as required to keep the tree nodes balanced and symmetric as possible as could be.

Unlike the binary search trees, the B-Tree is optimized for systems that read and write large blocks of data. It is commonly used in databases and file systems.

 

The “B” character confused many about the name, some of them where referring to Binary Tree wrongly, but the most common naming is “Balanced”.

For a long time it was unclear what the "B" in the name represented. Candidates discussed in public where "Boeing", "Bayer", "Balanced", "Bushy" and others. In 2013, the B-Tree had just turned 40, Ed McCreight revealed in an interview, that they intentionally never published an answer to this question. They were thinking about many of these options themselves at the time and decided to just leave it an open question. The CPU processing of the search for Binary Tree is the same as B-Tree, but for the memory consuming (RAM) B-Tree consumes less because the node would be much shorter than the binary one.

 

  • B+Tree

B-Trees are a lot more efficient than binary trees when it comes to updates, but some operations can still turn out expensive, depending on where the node that will hold the new or updated data lives in the tree. Therefore, another optimization was made to B-Trees to help with this problem. Instead of treating all nodes equal, the new structure has two types of nodes. The lowest level nodes, also called leaf nodes, hold the actual data. All other nodes including the root node only hold the key values and pointers to the next nodes. This type of tree is called a B+Tree as seen the following figure:

 

There are no limitations on the number of key-pointer-pairs or data rows within a node. The only limitation which is a significant advantage in the same time is that, all leaf nodes have the same distance from the root node. That means that the work to seek for a particular data point is always the same, no matter what the key value is. It also keeps updates very localized in the tree. I might have to move a few rows to a new node during an update but then I probably only need to change a single parent node to integrate that new node. It is however possible for a single change to affect every level of the tree, but those changes are rare.

 

  • B+Trees in SQL Server

SQL Server stores its indexes in B+Tree format. There are a few exceptions - for example temporary hash indexes, created during a hash join operation, or column store indexes, which are not really indexes at all. However, all key-based clustered and non-clustered persisted SQL Server indexes are organized and stored as B+Trees.

 

Each node in such a tree is a page in SQL Server terms and you will find two page types in each index. The first type are the data pages (pages of type 1). Each leaf level node in a SQL Server B+Tree index is a single data page. The second type are intermediate index pages (pages of type 2). Each node in an index B+Tree that is not a leaf level node is a single page of type 2. Those pages contain rows just like the data pages. But in addition they contain a pointer for each row that identifies the next child page. That child page can be either of type 1 or of type 2, depending on the location in the B+Tree.

 

  • UB-Tree

The UB-Tree as proposed by Rudolf Bayer and Volker Markl is a balanced tree for storing and efficiently retrieving multidimensional data. It is basically a B+ tree with records stored according to Z-order, also called Morton order. Z-order is simply calculated by bitwise interlacing the keys.

 

Insertion, deletion, and point query are done as with ordinary B+ trees. To perform range searches in multidimensional point data, however, an algorithm must be provided for calculating, from a point encountered in the data base, the next Z-value which is in the multidimensional search range.

SQL Language

  • History

The first version of SQL (Structured Query Language) was developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s. This version, initially called SEQUEL (Structured English Query Language).

SQL was designed to manipulate and retrieve data stored in IBM's original relational database product, System R. The SQL language was later formally standardized by the American National Standards Institute (ANSI) in 1986. Subsequent versions of the SQL standard have been released as International Organization for Standardization (ISO) standards.

 

The SQL language was standardized and developed with many vendors as illustrated in the following table:

Source

Common Name

Full name

ANSI/ISO Standard

SQL/PSM

SQL/Persistent Stored Modules

Interbase / Firebird

PSQL

Procedural SQL

IBM DB2

SQL PL

SQL Procedural Language (implements SQL/PSM)

IBM Informix

SPL

Stored Procedural Language

IBM Netezza

NZPLSQL

(based on Postgres PL/pgSQL)

Microsoft / Sybase

T-SQL

Transact-SQL

Mimer SQL

SQL/PSM

SQL/Persistent Stored Module (implements SQL/PSM)

MySQL

SQL/PSM

SQL/Persistent Stored Module (implements SQL/PSM)

MonetDB

SQL/PSM

SQL/Persistent Stored Module (implements SQL/PSM)

NuoDB

SSP

Starkey Stored Procedures

Oracle

PL/SQL

Procedural Language/SQL (based on Ada)

PostgreSQL

PL/pgSQL

Procedural Language/PostgreSQL (based on Oracle PL/SQL)

PostgreSQL

PL/PSM

Procedural Language/Persistent Stored Modules (implements SQL/PSM)

Sybase

Watcom-SQL

SQL Anywhere Watcom-SQL Dialect

Teradata

SPL

Stored Procedural Language

SAP

SAP HANA

SQL Script

 

 

  • Language Elements

The SQL language is subdivided into several language elements, including:

  • Queries, which retrieve the data based on specific criteria. This is an important element of SQL.
  • Clauses, which are constituent components of statements and queries. (In some cases, these are optional.) (Where clauses)
  • Expressions, which can produce either scalar values, or tables consisting of columns and rows of data. (Select expression From ….;)
  • Predicates, which specify conditions that can be evaluated to SQL three-valued logic (3VL) (true/false/unknown) or Boolean truth values and are used to limit the effects of statements and queries, or to change program flow.
  • Statements, which may have a persistent effect on schemata and data, or may control transactions, program flow, connections, sessions, or diagnostics. (DDL/DCL)

 

  • T-SQL Languages

The following languages are popular in any RDBMS SQL language, but we are going to expand the commands which language in prospective of the Microsoft T-SQL.

  • DQL (Data Query Language) – Select
  • DML (Data Manipulating Language) – Insert, Update and Delete
  • DDL (Data Definition Language) – Create, Alter and Drop
  • DCL (Data Control Language) – Grant, Deny and Revoke
  • TCL (Transaction Control Language) – Begin Tran, Commit Tran and Rollback Tran

Microsoft SQL Server Overview

  • Microsoft & Sybase

Prior to version 7.0 the code base for MS SQL Server was sold by Sybase SQL Server to Microsoft, and was Microsoft's entry to the enterprise-level database market, competing against Oracle, IBM, and, later, Sybase. Microsoft, Sybase and Ashton-Tate originally worked together to create and market the first version named SQL Server 1.0 for OS/2 (about 1989) which was essentially the same as Sybase SQL Server 3.0 on Unix, VMS, etc. Microsoft SQL Server 4.2 was shipped around 1992 (available bundled with IBM OS/2 version 1.3). Later Microsoft SQL Server 4.21 for Windows NT was released at the same time as Windows NT 3.1. Microsoft SQL Server v6.0 was the first version designed for NT, and did not include any direction from Sybase.

About the time Windows NT was released in July 1993, Sybase and Microsoft parted ways and each pursued its own design and marketing schemes. Microsoft negotiated exclusive rights to all versions of SQL Server written for Microsoft operating systems. (In 1996 Sybase changed the name of its product to Adaptive Server Enterprise to avoid confusion with Microsoft SQL Server.) Until 1994, Microsoft's SQL Server carried three Sybase copyright notices as an indication of its origin.

SQL Server 7.0 and SQL Server 2000 included modifications and extensions to the Sybase code base, adding support for the IA-64 architecture. By SQL Server 2005 the legacy Sybase code had been completely rewritten.

Since the release of SQL Server 2000, advances have been made in performance, the client IDE tools, and several complementary systems that are packaged with SQL Server 2005. These include:

  • An extract-transform-load (ETL) tool (SQL Server Integration Services or SSIS)
  • A Reporting Server
  • An OLAP and data mining server (Analysis Services)
  • Several messaging technologies, specifically Service Broker and Notification Services
  • SQL Server Release History

Version

Year

Release Name

Code Name

Internal Version

1.0 (OS/2)

1989

SQL Server 1.0 (16 bit)

Ashton-Tate /

Microsoft SQL Server

-

1.1 (OS/2)

1991

SQL Server 1.1

(16 bit)

-

-

4.21 (WinNT)

1993

SQL Server 4.21

SQLNT

-

6

1995

SQL Server 6.0

SQL95

-

6.5

1996

SQL Server 6.5

Hydra

-

7

1998

SQL Server 7.0

Sphinx

515

-

1999

SQL Server 7.0

(OLAP Tools)

Palato mania

-

8

2000

SQL Server 2000

Shiloh

539

8

2003

SQL Server 2000

Liberty

539

64-bit Edition

9

2005

SQL Server 2005

Yukon

611/612

10

2008

SQL Server 2008

Katmai

661

10.25

2010

Azure SQL DB

Cloud Database

or CloudDB

-

10.5

2010

SQL Server 2008 R2

Kilimanjaro (aka KJ)

665

11

2012

SQL Server 2012

Denali

706

12

2014

SQL Server 2014

SQL14

782

 

 

 

DBMS
Database Management Systems (DBMS) are computer software applications that interact with the user, other applications, and the database itself to capture and analyze data. A general-purpose DBMS is designed to allow the definition, creation, querying, update, and administration of databases. Well-known DBMSs include MySQL, PostgreSQL, Microsoft SQL Server, Oracle, Sybase and IBM DB2. A database is not generally portable across different DBMSs, but different DBMS can interoperate by using standards such as SQL and ODBC or JDBC to allow a single application to work with more than one DBMS. Database management systems are often classified according to the database model that they support; the most popular database systems since the 1980s have all supported the relational model as represented by the SQL language.

RDBMS
A Relational Database Management System (RDBMS) is a database management system (DBMS) that is based on the relational model as invented by E. F. Codd, of IBM's San Jose Research Laboratory.
Many popular databases currently in use are based on the relational database model.

RDBMSs are a common choice for the storage of information in new databases used for financial records, manufacturing and logistical information, personnel data, and other applications since the 1980s. Relational databases have often replaced legacy hierarchical databases and network databases because they are easier to understand and use. However, relational databases have been challenged by object databases, which were introduced in an attempt to address the object-relational impedance mismatch in relational databases, and XML databases.

In 1974, IBM began developing System R, a research project to develop a prototype RDBMS. Its first commercial product was SQL/DS, released in 1981.However, the first commercially available RDBMS was Oracle, released.

In 1979 by Relational Software, now Oracle Corporation.[6] Other examples of an RDBMS include DB2, SAP Sybase ASE, and Informix.

Data Access Protocols

Any DBMS requires a connectivity driver/provider to be installed on the client’s machine.

This is very close to the PC hardware drivers when it is required to install them into the PC in order to be able to connect the driver. Ex, the printer was coming in the past with Parallel Port connection and later with USB port connection, and in order to prepare a PC to connect it, the proper driver get to be installed compatible with the connected port.

The same with the DBMS, the prober driver/provider get to be installed on the client device to connect to the target DBMS.

We will list the common standard known providers to the developers:

  • ODBC

In computing, ODBC (Open Database Connectivity) is a standard programming language middleware API for accessing database management systems (DBMS). The designers of ODBC aimed to make it independent of database systems and operating systems. An application written using ODBC can be ported to other platforms, both on the client and server side, with few changes to the data access code.

ODBC accomplishes DBMS independence by using an ODBC driver as a translation layer between the application and the DBMS. The application uses ODBC functions through an ODBC driver manager with which it is linked, and the driver passes the query to the DBMS. An ODBC driver can be thought of as analogous to a printer or other driver, providing a standard set of functions for the application to use, and implementing DBMS-specific functionality. An application that can use ODBC is referred to as "ODBC-compliant". Any ODBC-compliant application can access any DBMS for which a driver is installed. Drivers exist for all major DBMSs, many other data sources like address book systems and Microsoft Excel, and even for text or CSV files.

ODBC was originally developed by Microsoft during the early 1990s, and became the basis for the Call Level Interface (CLI) standardized by SQL Access Group in the Unix and mainframe world.

  • OLEDB

OLE DB (Object Linking and Embedding Database), sometimes written as OLEDB or OLE-DB.
It is an API designed by Microsoft, allows accessing data from a variety of sources in a uniform manner. The API provides a set of interfaces implemented using the Component Object Model (COM); it is otherwise unrelated to OLE. Microsoft originally intended OLE DB as a higher-level replacement for, and successor to, ODBC, extending its feature set to support a wider variety of non-relational databases, such as object databases and spreadsheets that do not necessarily implement SQL.

 

  • OLEDB vs ODBC

The differences between these two data access protocols are technical, but in the most general terms, OLE DB is newer, more advanced, and compatible with a wider selection of database technologies. In addition, OLE DB is more general, in that it includes the ODBC functionality.

 

Technically speaking, ODBC (Open Database Connectivity) is designed to provide access primarily to SQL data in a multi-platform environment. OLE DB (Object Linking and Embedding Database) is designed to provide access to all types of data in an OLE Component Object Model (COM) environment. OLE DB includes the SQL functionality defined in ODBC but also defines interfaces suitable for gaining access to data other than SQL data.

A query generated via ODBC is functionally similar to a query generated by OLE-DB. However, be aware that if you generate a query (via STATISTICA Query) using an ODBC connection, STATISTICA uses OLE DB to connect to the ODBC connection. Because there is an intermediate connection, ODBC queries may be slower than OLE DB queries.

  • JDBC

JDBC is a Java Database Connectivity technology (Java Standard Edition platform). This technology is an API for the Java programming language that defines how a client may access a database. It provides methods for querying and updating data in a database. JDBC is oriented towards relational databases. A JDBC-to-ODBC bridge enables connections to any ODBC-accessible data source in the JVM host environment.

Sun Microsystems released JDBC as part of JDK 1.1 on February 19, 1997. Since then it has been part of the Java Standard Edition.

The JDBC classes are contained in the Java package java.sql and javax.sql.

  • Net Data Provider

An ADO.NET data provider is a software component that interacts with a data source comes with the Microsoft .Net Framework. ADO.NET data providers are analogous to ODBC drivers, JDBC drivers, and OLE DB providers.

ADO.NET providers can be created to access such simple data stores as a text file and spreadsheet, through to such complex databases as Oracle Database, Microsoft SQL Server, MySQL, PostgreSQL, SQLite, IBM DB2, Sybase ASE, and many others. They can also provide access to hierarchical data stores such as email systems.

However, because different data store technologies can have different capabilities, every ADO.NET provider cannot implement every possible interface available in the ADO.NET standard.

Starting from Microsoft .Net Framework v1.1, the developers don’t need to install any extra API’s on the clients’ machines to connect to both SQL Server and Oracle Database Servers.

Just one important remark, the Microsoft ADO.Net Provider can connect to Microsoft SQL Server v7.0 and higher. If the developers want to connect to Microsoft SQL Server 6.5 or lower, they get to use the ODBC or OLEDB connection.

 

 

Indexes

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure.

Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed.

Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.

An index is a copy of select columns of data from a table that can be searched very efficiently that also includes a low-level disk block address or direct link to the complete row of data it was copied from.

  • Index Architecture & Types

The data generally stored into records and each record cannot exceed the size of 8 KB (except the very large data such as Binary, VarChar(Max), nVarChar(Max), Image, etc which are stored in different pages with different structure). The records in turn are stored in pages and each page cannot exceed the size 8 KB. The pages are grouped in extents while each extent can contain up to 8 pages. That means each extent cannot exceed 64 KB.

The data in a table without any index created, are located in a logical partition calls Heap, while with the index structure, that data is stored in the Leaf Level (LL), and the index definition keys are stored in the Intermediate Level (IL) which is calls also as Non-Leaf Level.

If the table doesn’t contain any index (Clustered or Non-Clustered), the insert of the data would be very fast because all the new records are added sequentially into the heap without taking time to reorder them based on a sorting criteria such as in the case of the indexes. But in the other hand, the search, update or delete would be very slow in comparing with the index because it needs to scan the whole pages in the heap to retrieve the required data, update them or delete them.

In SQL Server databases, each table each table can only contain one Clustered Index and up to 249 or 999 of Non-Clustered Indexes based on the SQL version.

Additional two types of indexes (XML and Spatial) were added initiated by SQL Server 2005 and SQL Server 2008 respectively. The following points are representing the four types of indexes (Clustered, Non-Clustered, XML and Spatial).

 

 

  • Clustered Index

The cluster index alters the data block into a certain distinct order to match the index, resulting in the row data being stored physically in order for a specific column or combined with multiple column which calls composite index. Therefore, only one clustered index can be created on a given database table. Clustered indices can greatly increase overall speed of retrieval. Usually it comes with the primary key index but not a necessary.

If a database table contains a clustered index, there won’t be an existing for heap because all the data would be restructured on the Leaf Level with the extension of the intermediate level (Non-Leaf Level).
The following figure illustrates the process of finding rows in a Clustered Index:

Syntax:
Create Clustered Index Index_Name On Table_Name(Col1, Col2, …);
Create Unique Clustered Index Index_Name On Table_Name(Col1, Col2, …);

 

  • Non-Clustered Index

The data is present in arbitrary order, but the logical ordering is specified by the index. The data rows may be spread throughout the table regardless of the value of the indexed column or expression. The non-clustered index tree contains the index keys in sorted order, with the leaf level of the index containing the pointer to the record (page and the row number in the data page in page-organized engines; row offset in file-organized engines).

The database table can contain one or more Non-Cluster index plus Clustered Index or without.

In case of a Clustered Index plus Non-Clustered Index, the data exists in the leaf level of the Clustered Index while the Non-Clustered Index is creating a tree which is pointing into the data in the Leaf Level of the clustered index as represented the next figure:

In case of a Non-Clustered Index without a Clustered Index, the data will be still lying in the heap while the Non-Clustered Index’s is pointing to the data into the heap.

The next figure illustrates the process of finding rows in a Heap with a Non-Clustered Index:

 

 

Syntax:
Create NonClustered Index Index_Name On Table_Name(Col1, Col2, …);
Create Unique NonClustered Index Index_Name On Table_Name(Col1, Col2, …);

Note: If the index type (Clustered/NonClustered) not mentioned whether it is unique or not in the syntax, the index is created as “NonClustered”.

  • Fill Factor

The Fill Factor is the defined ratio of filling the data into each page in the leaf level with each index created.
If no Fill Factor is defined within an index, the page is filled with 100% of its capacity (8 KB).

Note: If the Fill Factor is defined as 0%, it would be the same respect of 100% which means all the pages is filled with data.

It is very significant in performance to specify a fill factor to keep reserved empty blank space in each page for updating data or deleting data. If the page is already 100% full and new item should be inserted into this page, the page would be split in order to hold the new record based on its order, while if it was having a predefined fill factor, the record would have a place to be stored in. Defining the fill factor ratio should be calculated based on the amount of data manipulation (insert, update or delete).

The disadvantage of the Fill factor, that is preserve extra disk space and it increases with the increase of the Fill factor ratio, therefore, it must be revised wisely.

In SharePoint Content Databases, the fill factor is 80%.

It is healthy to rebuild the index from time to other to organize the data into the pages based on the defined fill factor which could increase the data base size on the disk but in return, it maintains the performance.
Syntax:
Create Clustered /NonClustered Index Index_Name On Table_Name(Col1, Col2, …) With FillFactor = Fill_Ratio_As_Integer;

Create Unique Clustered /NonClustered Index Index_Name On Table_Name(Col1, Col2, …) With FillFactor = Fill_Ratio_As_Integer;

 

  • Pad Index

The Pad Index is optional keyword defined with the index creation statement to preserve the same defined Fill Factor value to the Intermediate Level pages (Non-Leaf Level) as well; otherwise, it will be 100% by default. (The intermediate Level fill factor cannot be different than the leaf level fill factor).
Syntax:
Create Clustered /NonClustered Index Index_Name On Table_Name(Col1, Col2, …) With Pad_Index, FillFactor = Fill_Ratio_As_Integer;

Create Unique Clustered /NonClustered Index Index_Name On Table_Name(Col1, Col2, …) With Pad_Index, FillFactor = Fill_Ratio_As_Integer;

 

 

  • XML Index

New index type was initiated by SQL Server 2005. XML indexes can be created on xml data type columns. They index all tags, values and paths over the XML instances in the column and benefit query performance. Your application may benefit from an XML index in the following situations:

  • Queries on XML columns are common in your workload. XML index maintenance cost during data modification must be considered.
  • Your XML values are relatively large and the retrieved parts are relatively small. Building the index avoids parsing the whole data at run time and benefits index lookups for efficient query processing.

XML indexes fall into the following categories:

  • Primary XML index
  • Secondary XML index

The first index on the xml type column must be the primary XML index. Using the primary XML index, the following types of secondary indexes are supported: PATH, VALUE, and PROPERTY. Depending on the type of queries, these secondary indexes might help improve query performance.

 

  • Spatial Index

Starting from SQL Server 2008 and higher are supporting spatial data and spatial indexes. A spatial index is a type of extended index that allows you to index a spatial column. A spatial column is a table column that contains data of a spatial data type, such as geometry or geography.

The following table is representing the maximum indexes count per table:

Version

Columns/Table

Columns/Index Key

Clustered Indexes

Non-Clustered Indexes

XML Indexes

SQL Server 2005

1024

16

1

249

249

SQL Server 2008

1024*

16

1

999

249

SQL Server 2008 R2

1024*

16

1

999

249

SQL Server 2012

1024*

16

1

999

249

SQL Server 2014

1024*

16

1

999

249

* In wide tables which is initiated by SQL Server 2008, it could take up to 30,000 columns per table and also up to 1,000 indexes but with maximum tow size of 8KB such as the non-wide table.

Database Shrinking

When the database is consuming much space on the hard-disk with much unallocated data, we can think seriously on database shrinking which is trimming the unallocated data on the disk.

The unallocated data is caused by the much manipulation operations such as the data updating and deleting; because if the data is moved from a page to another due to an update or even delete, the reserved space on the disk is not freed up, furthermore, it may increase the index fragmentation.

  • Shrinking Log Files

The log can truncated with the Transaction Log backups. Otherwise, to shrink a log file, the recovery model of database mush be set to “Simple” first and then run the “DBCC SHRINKFILE” command.

Syntax:
USE DatabaseName;
GO
DBCC SHRINKFILE (LogFileName);
or
DBCC SHRINKFILE (LogFileName, NewFileSize_in_MB);

  • Shrinking Data Files

The data files can be shrunk with any recovery model using the “DBCC SHRINKFILE”:
Syntax:
USE DatabaseName;
GO
DBCC SHRINKFILE (DataFileName);
or
DBCC SHRINKFILE (DataFileName, NewFileSize_in_MB);

To avoid the data file size expansion after the defragmentation (index rebuilding) which would be performed after the shrinking, always specify the new file size which is equal or greater than the current allocated size with extra few gigabytes.

  • Check Fragmentations

Use Database_Name;

Go

SELECT (Sch.Name + '.[' + T.Name + ']') As TableName, I.Name AS IndexName, INS.index_type_desc As IndexType, Fill_Factor, Page_Count, INS.avg_fragmentation_in_percent

FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) As INS

INNER JOIN sys.indexes I ON I.object_id = INS.object_id AND I.index_id = INS.index_id

Inner Join sys.Tables As T On T.object_id = INS.object_id

Inner Join sys.Schemas As Sch On T.Schema_ID = Sch.Schema_ID

WHERE I.Name IS Not Null And INS.avg_fragmentation_in_percent > 0

ORDER BY INS.avg_fragmentation_in_percent DESC;

You can always check the database fragmentation using the following T-SQL query:

 

 

 

  • Defragmentation

The defragmentation is done basically by rebuilding the index or reorganizing it using the Alter Index command which would replace the usage of the two commands “DBCC INDEXDEFRAG” and “DBCC DBREINDEX” in a future release of Microsoft SQL.

Defragmentation can be performed by rebuilding (if fragmentation is above 30%) and reorganizing (if fragmentation is equal to or less than 30%) all the indexes in the database using one of the following attached T-SQL scripts:

  1. Defrag_Database.sql” can be executed this script direct using the parameters listed on the underneath table.
  2. JSP_Defrag_Database - Stored Procedure.sql” can be executed once per a time to create the stored procedure in the memory until the current query session is closed. This Stored Procedure can be used to defrag multiple databases in the same opened query session using the following syntax:
Execute ##JSP_Defrag_Database @DataBase = 'AdventureWorks2008R2';

 

Parameter Name

Required/Optional

Data Type

Default Value

Values Range

Description

@DataBase

Optional

nVarChar(1000)

Null

N/A

The target database to be defragged. If not set, it would defrag the current connected database.

@FillFactor

Optional

SmallInt

-1

0 - 100

The fill factor ratio; between 0 to 100 while 0 is the same respects as 100.

@Online

Optional

VarChar(3)

On

On | Off

If set to “On”, the database would be kept in the “Online” mode, if set to “Off” the database would be set to the “Offline” mode.

@XMLOnlineMode

Optional

VarChar(10)

Reorg

Off | Reorg

Both XML and Spatial indexes are not supported in rebuilding with “Online” mode (@Online = ‘On’), therefore, it must be to set wither to “Off” to be rebuilt with “Offline” mode, or “Reorg” to be reorganized instead of the rebuilding.

@SpatialOnlineMode

Optional

VarChar(10)

Reorg

Off | Reorg

 

Important Remarks:

  1. The “Online = On” option will not work with the following cases:
  • Non Enterprise Edition of SQL.
  • The index is an XML
  • The index is a Spatial
  • The index is on a local temp table.
  • The index is clustered and the table contains a LOB database column(s).
  • The index is not clustered and the index itself contains a LOB database column(s).
  1. Do not execute the above script on Microsoft SQL Server 2012 or Microsoft SQL Server 2014 with “Online” option.

If using SQL Server 2012 or SQL Server 2014, mind there is a bug where the online index rebuild can cause index corruption or data loss when it is used together with concurrent queries that modify many rows.

Apply a CU that includes hotfix http://support.microsoft.com/kb/2969896.
For SQL Server 2012 SP1, apply at least CU11.
For SQL Server 2012 SP2, apply at least CU1.
For SQL Server 2014, apply at least CU2.

The following T-SQL script is useful to get all the info about the current installed SQL:

SELECT

       SERVERPROPERTY('ServerName') AS ServerName,

       SERVERPROPERTY('MachineName') AS MachineName,

       SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS CurrentActiveClusterMachine,

       SERVERPROPERTY('InstanceName') AS InstanceName,

       SERVERPROPERTY('ProductVersion') AS ProductVersion,

       SERVERPROPERTY('ProductLevel') AS ProductLevel,

       SERVERPROPERTY('Edition') AS Edition,

       SERVERPROPERTY('EngineEdition') AS EngineEdition,

       SERVERPROPERTY('LicenseType') AS LicenseType,

       SERVERPROPERTY('Collation') AS Collation,

SERVERPROPERTY('BuildClrVersion') AS BuildClrVersion;

References

Add comment

Loading