SQL Server Overview

This article gives the reader an overview of the SQL language in general and of SQL Server in particular.

It also takes the reader back in time to illustrate the evolution of databases and of the RDBMS (Relational Database Management System), starting from the Binary Tree and B-Tree algorithms from which modern RDBMS storage engines evolved.

Furthermore, it covers the history of Microsoft SQL Server editions, with a quick comparison of the different editions in terms of functionality, capacity limits, licensing and pricing.


Table Of Contents

Author

Introduction

Tree Search Algorithms

SQL Language

Microsoft SQL Server Overview

DBMS

RDBMS

Data Access Protocols

Indexes

References

Version History

 

 

 

Author

Ahmad Adel Gad was born in Cairo, Egypt, in June 1980.

He graduated from the Faculty of Engineering (Civil Engineering Department) in 2003.

He found himself drawn to software development rather than civil engineering from his first year in college, where students were taught to write code in order to build applications supporting their engineering work.

Two years after graduating from college, he decided to shift to software engineering.

In early 2007, he attended a scholarship program for software engineering in C#.

In early 2008, he was certified as a Microsoft Certified Solution Developer (MCSD) for C# 2003.

In mid-2008, he became a Microsoft Certified Trainer (MCT), receiving an official license from Microsoft to deliver official Microsoft courses.

In 2004, he started developing Windows desktop applications with his partner, CEO Wael Ibrahim Arafa, at the company Arafasoft, where Ahmad's role was executive manager.

Currently, Ahmad works full time in the IT field and freelances in training, consultation and development with the VB.Net and C#.Net languages, in the following areas:

  1. SharePoint Developing/Administration
  2. .Net Applications
  3. Windows Applications
  4. SQL Server Developing/Administration
  5. AJAX Implementation (JavaScript/jQuery + XML Web Service)
  6. Cryptography/Digital Signature Consultation

Currently, Ahmad is officially certified as:

MCTS - Windows Application 3.5
MCTS - ADO.Net 3.5
MCPD - Web Developer 4
MCTS - .NET Framework 4, Windows Applications
MCTS - Programming in HTML5 with JavaScript and CSS3
MCITP - Database Administrator 2008
MCTS - Microsoft SQL Server 2008, Database Development
MCTS - Core Solutions of Microsoft SharePoint Server 2013
MCTS - Advanced Solutions of Microsoft SharePoint Server 2013

ITIL Foundation
MCT
MCAD
MCSD
MCPD Enterprise
MCTS - SQL Server 2005
MCPD – SharePoint 2010
MCITP – SharePoint 2010
MCTS - Web Application 3.5

 

Introduction

This article gives the reader an overview of the SQL language in general and of SQL Server in particular.

It also takes the reader back in time to illustrate the evolution of databases and of the RDBMS (Relational Database Management System), starting from the Binary Tree and B-Tree algorithms from which modern RDBMS storage engines evolved.

Before these tree algorithms were invented, computer users and developers used to store their data in plain text files (flat files) in various formats, the most common being Comma Separated Values (CSV).



This approach was very limited and suffered from the following shortfalls:

  1. Security: the data is not secured and can be accessed by anyone with many simple tools.
  2. Unicode: only ASCII (American Standard Code for Information Interchange) characters are supported.
  3. Performance: the only way to retrieve specific data is to read the whole file sequentially.
  4. Relational model: not supported; to work around this, developers created a separate file for each table, and maintaining consistency between those files was very complicated to manage even with code, and practically impossible to maintain at scale.

Therefore, the Binary Tree was a revolution in database concepts, as it resolved most of the plain text shortfalls. As the algorithm evolved into the B-Tree and later the B+Tree, the RDBMS became the common foundation of every new database storage system.

 

 

Tree Search Algorithms

Tree methodologies were invented to store and search data in a way that eases and speeds up querying and manipulation (inserting, updating and deleting). The structure is called a tree, or an inverted tree, because the root contains the keys while the leaves contain the data.

  • Binary Tree

The Binary Tree is a tree data structure in which each node has at most two children, referred to as the left child and the right child. The following figures give an overview of how data is sorted using a Binary Tree.

   

The Binary Tree was a smart invention for sorting and querying data, but it has a remarkable disadvantage, represented by its name (Binary). Each node can branch into only two sub-nodes, and each sub-node can again branch into only two nodes, down to the end of the tree. In most cases some branches grow more than others, so the tree ends up with both very short and very long branches. In that case, searching for some data can take considerably longer than for other data.

  • B-Tree

In 1972, Rudolf Bayer and Ed McCreight, both working at Boeing at the time, were looking for a way to overcome some of the shortfalls of the binary tree. Their solution was the B-Tree.


The B-Tree is an evolution of the Binary Tree algorithm that overcomes this disadvantage.
In a B-Tree, each node can branch into as many nodes as required to keep the tree as balanced and symmetric as possible.

Unlike the binary search trees, the B-Tree is optimized for systems that read and write large blocks of data. It is commonly used in databases and file systems.

 

The “B” in the name has confused many people; some wrongly assume it refers to Binary Tree, but the most common interpretation is “Balanced”.

For a long time, it was unclear what the "B" in the name represented. Candidates discussed in public were "Boeing", "Bayer", "Balanced", "Bushy" and others. In 2013, when the B-Tree had just turned 40, Ed McCreight revealed in an interview that they intentionally never published an answer to this question; they had considered many of these options themselves at the time and decided to leave it an open question. The CPU cost of searching a Binary Tree is about the same as for a B-Tree, but in terms of memory (RAM) consumption the B-Tree uses less, because the tree is much shorter (has far fewer levels) than the binary one.

 

  • B+Tree

B-Trees are a lot more efficient than binary trees when it comes to updates, but some operations can still turn out to be expensive, depending on where the node that will hold the new or updated data lives in the tree. Therefore, another optimization was made to B-Trees to help with this problem. Instead of treating all nodes equally, the new structure has two types of nodes. The lowest-level nodes, also called leaf nodes, hold the actual data. All other nodes, including the root node, hold only key values and pointers to the next nodes. This type of tree is called a B+Tree, as seen in the following figure:

 

There are no limitations on the number of key-pointer pairs or data rows within a node. The only restriction, which is at the same time a significant advantage, is that all leaf nodes have the same distance from the root node. That means the work needed to seek a particular data point is always the same, no matter what the key value is. It also keeps updates very localized in the tree: an update might have to move a few rows to a new node, but then probably only a single parent node needs to change to integrate that new node. It is, however, possible for a single change to affect every level of the tree, but such changes are rare.

 

  • B+Trees in SQL Server

SQL Server stores its indexes in B+Tree format. There are a few exceptions - for example temporary hash indexes, created during a hash join operation, or column store indexes, which are not really indexes at all. However, all key-based clustered and non-clustered persisted SQL Server indexes are organized and stored as B+Trees.

 

Each node in such a tree is a page in SQL Server terms, and you will find two page types in each index. The first type is the data page (page type 1); each leaf-level node in a SQL Server B+Tree index is a single data page. The second type is the intermediate index page (page type 2); each node in an index B+Tree that is not a leaf-level node is a single page of type 2. These pages contain rows just like the data pages, but in addition they contain a pointer for each row that identifies the next child page. That child page can be of either type 1 or type 2, depending on its location in the B+Tree.
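To make this B+Tree structure visible, SQL Server exposes the depth and page count of each index level through the sys.dm_db_index_physical_stats dynamic management function. The following query is only a minimal sketch; the table name dbo.SalesOrders is a placeholder for any indexed table in the current database.

-- Show the B+Tree levels of every index on a (hypothetical) table.
-- index_level 0 is the leaf level; higher levels are the intermediate (non-leaf) pages.
SELECT i.name             AS index_name,
       ps.index_type_desc,
       ps.index_level,
       ps.index_depth,    -- total number of levels from root to leaf
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.SalesOrders'),
                                    NULL, NULL, 'DETAILED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id AND i.index_id = ps.index_id
ORDER BY i.name, ps.index_level;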

 

  • UB-Tree

The UB-Tree, as proposed by Rudolf Bayer and Volker Markl, is a balanced tree for storing and efficiently retrieving multidimensional data. It is basically a B+Tree with records stored according to Z-order, also called Morton order. The Z-order is calculated simply by bitwise interleaving the keys.

 

Insertion, deletion, and point queries are done as with ordinary B+Trees. To perform range searches in multidimensional point data, however, an algorithm must be provided for calculating, from a point encountered in the database, the next Z-value that lies within the multidimensional search range.

SQL Language

  • History

The first version of SQL (Structured Query Language) was developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s. This version was initially called SEQUEL (Structured English Query Language).

SQL was designed to manipulate and retrieve data stored in IBM's original relational database product, System R. The SQL language was later formally standardized by the American National Standards Institute (ANSI) in 1986. Subsequent versions of the SQL standard have been released as International Organization for Standardization (ISO) standards.

 

The SQL language has been standardized and also extended by many vendors with their own procedural dialects, as illustrated in the following table:

Source | Common Name | Full Name
ANSI/ISO Standard | SQL/PSM | SQL/Persistent Stored Modules
Interbase / Firebird | PSQL | Procedural SQL
IBM DB2 | SQL PL | SQL Procedural Language (implements SQL/PSM)
IBM Informix | SPL | Stored Procedural Language
IBM Netezza | NZPLSQL | (based on Postgres PL/pgSQL)
Microsoft / Sybase | T-SQL | Transact-SQL
Mimer SQL | SQL/PSM | SQL/Persistent Stored Module (implements SQL/PSM)
MySQL | SQL/PSM | SQL/Persistent Stored Module (implements SQL/PSM)
MonetDB | SQL/PSM | SQL/Persistent Stored Module (implements SQL/PSM)
NuoDB | SSP | Starkey Stored Procedures
Oracle | PL/SQL | Procedural Language/SQL (based on Ada *)
PostgreSQL | PL/pgSQL | Procedural Language/PostgreSQL (based on Oracle PL/SQL)
PostgreSQL | PL/PSM | Procedural Language/Persistent Stored Modules (implements SQL/PSM)
Sybase | Watcom-SQL | SQL Anywhere Watcom-SQL Dialect
Teradata | SPL | Stored Procedural Language
SAP | SAP HANA | SQL Script

(*) Ada is a structured, statically typed, imperative, wide-spectrum, and object-oriented high-level computer programming language, extended from Pascal and other languages.

Ada improves code safety and maintainability by using the compiler to find errors rather than relying on runtime errors. Ada is an international standard; the current version (known as Ada 2012) is defined by ISO/IEC 8652:2012.

Ada was originally designed by a team led by Jean Ichbiah of the French computer company CII Honeywell Bull under contract to the United States Department of Defense (DoD) from 1977 to 1983, to supersede the hundreds of programming languages then in use.

 

 

  • Language Elements

The SQL language is subdivided into several language elements, including the following (a short example tying these elements together appears after the list):

  • Queries, which retrieve data based on specific criteria. This is an important element of SQL.
  • Clauses, which are constituent components of statements and queries and are in some cases optional (e.g. the WHERE clause).
  • Expressions, which can produce either scalar values or tables consisting of columns and rows of data (e.g. SELECT expression FROM …).
  • Predicates, which specify conditions that can be evaluated to SQL three-valued logic (3VL) (true/false/unknown) or Boolean truth values, and which are used to limit the effects of statements and queries or to change program flow.
  • Statements, which may have a persistent effect on schemata and data, or may control transactions, program flow, connections, sessions, or diagnostics (e.g. DDL/DCL).
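To tie these elements together, here is a minimal T-SQL sketch; the Employees table and its columns are only placeholders. The SELECT statement is built from a SELECT clause, a FROM clause and an optional WHERE clause; Salary * 12 is an expression producing a scalar value; and Salary > 5000 is a predicate evaluated with three-valued logic.

-- A statement made of clauses; the WHERE clause is optional.
SELECT FirstName,
       Salary * 12 AS AnnualSalary   -- expression producing a scalar value
FROM   dbo.Employees
WHERE  Salary > 5000                 -- predicate (true/false/unknown)
ORDER BY FirstName;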

 

  • T-SQL Languages

The following sub-languages are common to any RDBMS dialect of SQL, but we will expand on the commands of each category from the perspective of Microsoft T-SQL; a brief sketch with one command per category follows the list.

  • DQL (Data Query Language) – Select
  • DML (Data Manipulation Language) – Insert, Update and Delete
  • DDL (Data Definition Language) – Create, Alter and Drop
  • DCL (Data Control Language) – Grant, Deny and Revoke
  • TCL (Transaction Control Language) – Begin Tran, Commit Tran and Rollback Tran
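The following sketch shows one representative T-SQL command from each category; the dbo.Products table and the SomeUser principal are hypothetical placeholders.

-- DDL: define the structure
CREATE TABLE dbo.Products (ProductID INT PRIMARY KEY, Name NVARCHAR(100), Price MONEY);

-- DML inside TCL: manipulate data within an explicit transaction
BEGIN TRAN;
    INSERT INTO dbo.Products (ProductID, Name, Price) VALUES (1, N'Keyboard', 25);
    UPDATE dbo.Products SET Price = 30 WHERE ProductID = 1;
COMMIT TRAN;

-- DQL: query data
SELECT ProductID, Name, Price FROM dbo.Products;

-- DCL: control permissions (SomeUser is a placeholder database user)
GRANT SELECT ON dbo.Products TO SomeUser;

-- DDL: remove the object again
DROP TABLE dbo.Products;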
Microsoft SQL Server Overview

  • Microsoft SQL Server History

Prior to version 7.0, the code base for MS SQL Server originated in Sybase SQL Server, sold by Sybase to Microsoft, and it was Microsoft's entry into the enterprise-level database market, competing against Oracle, IBM, and, later, Sybase itself. Microsoft, Sybase and Ashton-Tate [*] originally worked together to create and market the first version, named SQL Server 1.0, for OS/2 (about 1989), which was essentially the same as Sybase SQL Server 3.0 on Unix, VMS, etc. Microsoft SQL Server 4.2 was shipped around 1992 (available bundled with IBM OS/2 version 1.3). Later, Microsoft SQL Server 4.21 for Windows NT was released at the same time as Windows NT 3.1. Microsoft SQL Server v6.0 was the first version designed for NT and did not include any direction from Sybase.

About the time Windows NT was released in July 1993, Sybase and Microsoft parted ways and each pursued its own design and marketing schemes. Microsoft negotiated exclusive rights to all versions of SQL Server written for Microsoft operating systems. (In 1996 Sybase changed the name of its product to Adaptive Server Enterprise to avoid confusion with Microsoft SQL Server.) Until 1994, Microsoft's SQL Server carried three Sybase copyright notices as an indication of its origin.

SQL Server 7.0 and SQL Server 2000 included modifications and extensions to the Sybase code base, adding support for the IA-64 architecture. By SQL Server 2005 the legacy Sybase code had been completely rewritten.

Since the release of SQL Server 2000, advances have been made in performance, the client IDE tools, and several complementary systems that are packaged with SQL Server 2005. These include:

  • An extract-transform-load (ETL) tool (SQL Server Integration Services or SSIS)
  • A Reporting Server
  • An OLAP and data mining server (Analysis Services)

  • Several messaging technologies, specifically Service Broker and Notification Services

(*) Ashton-Tate (Ashton-Tate Corporation) was a US-based software company best known for developing the popular dBase database application. Ashton-Tate grew from a small garage-based company into a multinational corporation and was later sold to Borland in September 1991.
  • Microsoft SQL Server Release History

Version | Year | Release Name | Code Name | Internal Version
1.0 (OS/2) | 1989 | SQL Server 1.0 (16-bit) | Ashton-Tate / Microsoft SQL Server | -
1.1 (OS/2) | 1991 | SQL Server 1.1 (16-bit) | - | -
4.21 (WinNT) | 1993 | SQL Server 4.21 | SQLNT | -
6 | 1995 | SQL Server 6.0 | SQL95 | -
6.5 | 1996 | SQL Server 6.5 | Hydra | -
7 | 1998 | SQL Server 7.0 | Sphinx | 515
- | 1999 | SQL Server 7.0 (OLAP Tools) | Palato mania | -
8 | 2000 | SQL Server 2000 | Shiloh | 539
8 | 2003 | SQL Server 2000 64-bit Edition | Liberty | 539
9 | 2005 | SQL Server 2005 | Yukon | 611/612
10 | 2008 | SQL Server 2008 | Katmai | 661
10.25 | 2010 | Azure SQL DB | Cloud Database or CloudDB | -
10.5 | 2010 | SQL Server 2008 R2 | Kilimanjaro (aka KJ) | 665
11 | 2012 | SQL Server 2012 | Denali | 706
12 | 2014 | SQL Server 2014 | SQL14 | 782
13 | 2016 | SQL Server 2016 | Community Technology Preview 2 (CTP2) | -

 

 

Cross-Box Scale Limit

Feature Name | Enterprise | Business Intelligence | Standard | Web | Express with Advanced Services | Express with Tools | Express
Maximum compute capacity used by a single instance (SQL Server Database Engine) | Operating system maximum | Lesser of 4 sockets or 16 cores | Lesser of 4 sockets or 16 cores | Lesser of 4 sockets or 16 cores | Lesser of 1 socket or 4 cores | Lesser of 1 socket or 4 cores | Lesser of 1 socket or 4 cores
Maximum compute capacity used by a single instance (Analysis Services, Reporting Services) | Operating system maximum | Operating system maximum | Lesser of 4 sockets or 16 cores | Lesser of 4 sockets or 16 cores | Lesser of 1 socket or 4 cores | Lesser of 1 socket or 4 cores | Lesser of 1 socket or 4 cores
Maximum memory utilized (per instance of SQL Server Database Engine) | Operating system maximum | 128 GB | 128 GB | 64 GB | 1 GB | 1 GB | 1 GB
Maximum memory utilized (per instance of Analysis Services) | Operating system maximum | Operating system maximum | 64 GB | N/A | N/A | N/A | N/A
Maximum memory utilized (per instance of Reporting Services) | Operating system maximum | Operating system maximum | 64 GB | 64 GB | 4 GB | N/A | N/A
Maximum relational database size | 524 PB | 524 PB | 524 PB | 524 PB | 10 GB | 10 GB | 10 GB

 

High Availability

Feature Name | Enterprise | Business Intelligence | Standard | Web | Express with Advanced Services | Express with Tools | Express
Server Core support | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Log shipping | Yes | Yes | Yes | Yes | | |
Database mirroring | Yes | Yes (Safety Full Only) | Yes (Safety Full Only) | Witness only | Witness only | Witness only | Witness only
Backup compression | Yes | Yes | Yes | | | |
Database snapshot | Yes | | | | | |
AlwaysOn Failover Cluster Instances | Yes (Node support: Operating system maximum) | Yes (Node support: 2) | Yes (Node support: 2) | | | |
AlwaysOn Availability Groups | Yes (up to 8 secondary replicas, including 2 synchronous secondary replicas) | | | | | |
Connection Director | Yes | | | | | |
Online page and file restore | Yes | | | | | |
Online indexing | Yes | | | | | |
Online schema change | Yes | | | | | |
Fast recovery | Yes | | | | | |
Mirrored backups | Yes | | | | | |
Hot Add Memory and CPU | Yes | | | | | |
Database Recovery Advisor | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Encrypted Backup | Yes | Yes | Yes | | | |
Smart Backup | Yes | Yes | Yes | No | | |

(An empty cell means the feature is not available in that edition.)

 

Microsoft SQL Server Developer Edition comes with all the features of the Enterprise edition; it is released as a cost-effective build for developers for testing and development purposes, and it is not licensed to be installed on a production server/machine hosting real production data.

 

For further comparison details, check the following official MSDN link:

https://msdn.microsoft.com/en-us/library/cc645993(v=sql.120).aspx

 

SQL Server 2014 editions | Ideal for… | Licensing model | Channel availability | Open NL Price (US$)
Enterprise | Comprehensive, mission-critical performance for demanding database and business intelligence requirements. The Enterprise edition provides the highest service and performance levels for Tier-1 workloads.* | Per Core** | Volume licensing, hosting | $14,256***
Business Intelligence | A comprehensive platform empowering organizations to build and deploy secure, scalable and manageable BI solutions. | Server + CAL**** | Volume licensing, hosting | $8,908***
Standard | Core data management and business intelligence capabilities for non-critical workloads with minimal IT resources. | Per Core** | Volume licensing, hosting | $3,717***
Standard | (as above) | Server + CAL**** | Volume licensing, hosting, retail FPP | $931***
Developer | Full-featured version of SQL Server software that allows developers to cost-effectively build, test and demonstrate applications based on SQL Server software. | Per user | Volume licensing, retail FPP | $38***
Web | Secure, cost-effective and highly scalable data platform for public web sites. The Web edition is available to third-party software service providers only. | - | Hosting only | See your hosting partner for pricing
Express | Entry-level, free database that is ideal for learning and building desktop and small server data-driven applications of up to 10 GB. | - | Free download | Free

 

Enterprise Edition with Server + Client Access License (CAL) based licensing (not available for new agreements) is limited to a maximum of 20 cores per SQL Server instance. There are no limits under the Core-based Server Licensing model.

 

*Customers who require an MPP data warehouse now have access to Parallel Data Warehouse (PDW) through their Enterprise Edition core licenses with Software Assurance. PDW is part of the Analytics Platform System (APS).

**Editions sold in the Per Core licensing model are sold as 2-core packs.

***Pricing represents Open No Level (NL) ERP. For your specific pricing, contact your Microsoft reseller.

****CALs ($199 per CAL) are required for every user or device accessing a server in the Server + CAL licensing model. See the Products use rights for details.

 

Important note: these prices are not fixed and can vary from one region to another. Different Microsoft Partners may offer competitive pricing plans.

Microsoft partners can be located online based on regions from this link:
https://mspartner.microsoft.com/en/us/Pages/Locale.aspx

 

DBMS
Database Management Systems (DBMS) are computer software applications that interact with the user, other applications, and the database itself to capture and analyze data. A general-purpose DBMS is designed to allow the definition, creation, querying, update, and administration of databases. Well-known DBMSs include MySQL, PostgreSQL, Microsoft SQL Server, Oracle, Sybase and IBM DB2. A database is not generally portable across different DBMSs, but different DBMS can interoperate by using standards such as SQL and ODBC or JDBC to allow a single application to work with more than one DBMS. Database management systems are often classified according to the database model that they support; the most popular database systems since the 1980s have all supported the relational model as represented by the SQL language.

RDBMS
A Relational Database Management System (RDBMS) is a database management system (DBMS) that is based on the relational model as invented by E. F. Codd, of IBM's San Jose Research Laboratory.
Many popular databases currently in use are based on the relational database model.

RDBMSs are a common choice for the storage of information in new databases used for financial records, manufacturing and logistical information, personnel data, and other applications since the 1980s. Relational databases have often replaced legacy hierarchical databases and network databases because they are easier to understand and use. However, relational databases have been challenged by object databases, which were introduced in an attempt to address the object-relational impedance mismatch in relational databases, and XML databases.

In 1974, IBM began developing System R, a research project to develop a prototype RDBMS. Its first commercial product was SQL/DS, released in 1981. However, the first commercially available RDBMS was Oracle, released in 1979 by Relational Software, now Oracle Corporation. Other examples of an RDBMS include DB2, SAP Sybase ASE, and Informix.

Data Access Protocols

Any DBMS requires a connectivity driver/provider to be installed on the client’s machine.

This is very similar to PC hardware drivers, which must be installed on the PC before it can communicate with a device. For example, printers used to come with a parallel port connection and later with a USB connection, and in order to prepare a PC for either, the proper driver compatible with the connected port has to be installed.

The same applies to a DBMS: the proper driver/provider has to be installed on the client device to connect to the target DBMS.

The common standard providers known to developers are listed below:

  • ODBC

In computing, ODBC (Open Database Connectivity) is a standard programming language middleware API for accessing database management systems (DBMS). The designers of ODBC aimed to make it independent of database systems and operating systems. An application written using ODBC can be ported to other platforms, both on the client and server side, with few changes to the data access code.

ODBC accomplishes DBMS independence by using an ODBC driver as a translation layer between the application and the DBMS. The application uses ODBC functions through an ODBC driver manager with which it is linked, and the driver passes the query to the DBMS. An ODBC driver can be thought of as analogous to a printer or other driver, providing a standard set of functions for the application to use, and implementing DBMS-specific functionality. An application that can use ODBC is referred to as "ODBC-compliant". Any ODBC-compliant application can access any DBMS for which a driver is installed. Drivers exist for all major DBMSs, many other data sources like address book systems and Microsoft Excel, and even for text or CSV files.

ODBC was originally developed by Microsoft during the early 1990s, and became the basis for the Call Level Interface (CLI) standardized by SQL Access Group in the Unix and mainframe world.

  • OLEDB

OLE DB (Object Linking and Embedding, Database), sometimes written as OLEDB or OLE-DB, is an API designed by Microsoft that allows accessing data from a variety of sources in a uniform manner. The API provides a set of interfaces implemented using the Component Object Model (COM); it is otherwise unrelated to OLE. Microsoft originally intended OLE DB as a higher-level replacement for, and successor to, ODBC, extending its feature set to support a wider variety of non-relational databases, such as object databases and spreadsheets that do not necessarily implement SQL.

 

  • OLEDB vs ODBC

The differences between these two data access protocols are technical, but in the most general terms, OLE DB is newer, more advanced, and compatible with a wider selection of database technologies. In addition, OLE DB is more general, in that it includes the ODBC functionality.

 

Technically speaking, ODBC (Open Database Connectivity) is designed to provide access primarily to SQL data in a multi-platform environment. OLE DB (Object Linking and Embedding Database) is designed to provide access to all types of data in an OLE Component Object Model (COM) environment. OLE DB includes the SQL functionality defined in ODBC but also defines interfaces suitable for gaining access to data other than SQL data.

A query generated via ODBC is functionally similar to a query generated by OLE-DB. However, be aware that if you generate a query (via STATISTICA Query) using an ODBC connection, STATISTICA uses OLE DB to connect to the ODBC connection. Because there is an intermediate connection, ODBC queries may be slower than OLE DB queries.

  • JDBC

JDBC is a Java Database Connectivity technology (Java Standard Edition platform). This technology is an API for the Java programming language that defines how a client may access a database. It provides methods for querying and updating data in a database. JDBC is oriented towards relational databases. A JDBC-to-ODBC bridge enables connections to any ODBC-accessible data source in the JVM host environment.

Sun Microsystems released JDBC as part of JDK 1.1 on February 19, 1997. Since then it has been part of the Java Standard Edition.

The JDBC classes are contained in the Java packages java.sql and javax.sql.

  • .Net Data Provider

An ADO.NET data provider is a software component that interacts with a data source and ships with the Microsoft .Net Framework. ADO.NET data providers are analogous to ODBC drivers, JDBC drivers, and OLE DB providers.

ADO.NET providers can be created to access such simple data stores as a text file and spreadsheet, through to such complex databases as Oracle Database, Microsoft SQL Server, MySQL, PostgreSQL, SQLite, IBM DB2, Sybase ASE, and many others. They can also provide access to hierarchical data stores such as email systems.

However, because different data store technologies have different capabilities, not every ADO.NET provider can implement every possible interface available in the ADO.NET standard.

Starting from Microsoft .Net Framework v1.1, developers do not need to install any extra APIs on client machines to connect to either SQL Server or Oracle database servers.

One important remark: the Microsoft ADO.Net provider can connect to Microsoft SQL Server v7.0 and higher. To connect to Microsoft SQL Server 6.5 or lower, developers have to use ODBC or OLEDB connections.

 

 

Indexes

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure.

Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed.

Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.

An index is a copy of selected columns of data from a table that can be searched very efficiently, and that also includes a low-level disk block address or a direct link to the complete row of data it was copied from.

  • Index Architecture & Types

Data is generally stored in records, and each record cannot exceed 8 KB in size (except for very large data types such as Binary, VarChar(Max), nVarChar(Max), Image, etc., which are stored in separate pages with a different structure). The records in turn are stored in pages, and each page cannot exceed 8 KB in size. Pages are grouped into extents, and each extent can contain up to 8 pages, which means each extent cannot exceed 64 KB.

The data of a table without any index is located in a logical structure called a heap, while with an index structure the data is stored in the Leaf Level (LL) and the index definition keys are stored in the Intermediate Level (IL), also called the Non-Leaf Level.

If the table does not contain any index (Clustered or Non-Clustered), inserting data is very fast, because all new records are simply appended to the heap without spending time reordering them by any sort criteria, as happens with indexes. On the other hand, searching, updating or deleting is very slow compared to an indexed table, because every page in the heap has to be scanned to retrieve, update or delete the required data.

In SQL Server databases, each table can contain only one Clustered Index and up to 249 or 999 Non-Clustered Indexes, depending on the SQL Server version.

Two additional index types (XML and Spatial) were introduced in SQL Server 2005 and SQL Server 2008 respectively. The following points describe the four index types (Clustered, Non-Clustered, XML and Spatial).
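As a quick way to see which of these structures a database actually uses, the catalog view sys.indexes reports a heap as index_id 0 and every index with its type. The following query is a small sketch over the user tables of the current database:

-- type_desc returns HEAP, CLUSTERED, NONCLUSTERED, XML or SPATIAL
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,   -- NULL for a heap
       i.type_desc
FROM sys.indexes AS i
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY table_name, i.type_desc;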

 

 

  • Clustered Index

The clustered index alters the data blocks into a distinct order matching the index, so the row data is stored physically in order by a specific column, or by a combination of multiple columns, which is called a composite index. Therefore, only one clustered index can be created on a given database table. Clustered indexes can greatly increase the overall speed of retrieval. The clustered index usually accompanies the primary key, but that is not a requirement.

If a database table contains a clustered index, no heap exists, because all the data is restructured into the Leaf Level with the Intermediate Level (Non-Leaf Level) built on top of it.
The following figure illustrates the process of finding rows in a Clustered Index:

Syntax:
Create Clustered Index Index_Name On Table_Name(Col1, Col2, …);
Create Unique Clustered Index Index_Name On Table_Name(Col1, Col2, …);
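As a worked sketch of the syntax above (the dbo.Orders table and its columns are hypothetical), the following creates a table as a heap and then converts it into a clustered structure ordered by OrderDate and OrderID:

Example:
CREATE TABLE dbo.Orders
(
    OrderID    INT   NOT NULL,
    OrderDate  DATE  NOT NULL,
    CustomerID INT   NOT NULL,
    Amount     MONEY NOT NULL
);

-- The heap disappears; the rows are physically re-ordered into the leaf level.
CREATE UNIQUE CLUSTERED INDEX IX_Orders_OrderDate_OrderID
    ON dbo.Orders (OrderDate, OrderID);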

 

  • Non-Clustered Index

The data is present in arbitrary order, but the logical ordering is specified by the index. The data rows may be spread throughout the table regardless of the value of the indexed column or expression. The non-clustered index tree contains the index keys in sorted order, with the leaf level of the index containing the pointer to the record (page and the row number in the data page in page-organized engines; row offset in file-organized engines).

A database table can contain one or more Non-Clustered Indexes, with or without a Clustered Index.

In the case of a Clustered Index plus a Non-Clustered Index, the data exists in the leaf level of the Clustered Index, while the Non-Clustered Index builds a tree that points to the data in the Leaf Level of the Clustered Index, as represented in the next figure:

In the case of a Non-Clustered Index without a Clustered Index, the data still lies in the heap, and the Non-Clustered Index points to the data in the heap.

The next figure illustrates the process of finding rows in a Heap with a Non-Clustered Index:

 

 

Syntax:
Create NonClustered Index Index_Name On Table_Name(Col1, Col2, …);
Create Unique NonClustered Index Index_Name On Table_Name(Col1, Col2, …);

Note: If the index type (Clustered/NonClustered) is not specified in the syntax, whether the index is unique or not, the index is created as "NonClustered".
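Continuing the hypothetical dbo.Orders table from the clustered example above, the following sketch adds a non-clustered index so that lookups by customer do not have to scan the clustered index:

Example:
-- The leaf level of this index stores the key (CustomerID)
-- plus a pointer to the clustered index key (OrderDate, OrderID).
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID);

-- A query that can seek on the new index:
SELECT OrderID, OrderDate, Amount
FROM   dbo.Orders
WHERE  CustomerID = 42;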

  • Fill Factor

The Fill Factor is the ratio to which each leaf-level page is filled with data when an index is created.
If no Fill Factor is defined for an index, each page is filled to 100% of its capacity (8 KB).

Note: If the Fill Factor is defined as 0%, it is treated the same as 100%, which means all the pages are completely filled with data.

Specifying a fill factor is very significant for performance, because it keeps reserved empty space in each page for inserting or updating data. If a page is already 100% full and a new item must be inserted into it, the page has to be split in order to hold the new record in its sorted position, while with a predefined fill factor the record would already have a place to be stored. The fill factor ratio should be chosen based on the amount of data manipulation (insert, update or delete) the table receives.

The disadvantage of the fill factor is that it reserves extra disk space, and the reserved space grows as the fill factor ratio is lowered; therefore, it must be chosen wisely.

In SharePoint Content Databases, the fill factor is 80%.

It is healthy to rebuild the index from time to time to reorganize the data into the pages based on the defined fill factor; this can increase the database size on disk, but in return it maintains performance.
Syntax:
Create Clustered /NonClustered Index Index_Name On Table_Name(Col1, Col2, …) With FillFactor = Fill_Ratio_As_Integer;

Create Unique Clustered /NonClustered Index Index_Name On Table_Name(Col1, Col2, …) With FillFactor = Fill_Ratio_As_Integer;
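For instance, the following sketch (again using the hypothetical dbo.Orders table and the index from the previous example) creates an index that leaves 20% free space on each leaf page, and rebuilds an existing index to re-apply its fill factor once the pages have filled up:

Example:
-- Leave 20% of every leaf page empty for future inserts and updates.
CREATE NONCLUSTERED INDEX IX_Orders_Amount
    ON dbo.Orders (Amount)
    WITH (FILLFACTOR = 80);

-- Periodic maintenance: rebuild an existing index and re-apply the fill factor.
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders
    REBUILD WITH (FILLFACTOR = 80);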

 

  • Pad Index

Pad Index is an optional keyword in the index creation statement that applies the same defined Fill Factor value to the Intermediate Level (Non-Leaf Level) pages as well; otherwise, they are filled to 100% by default. (The Intermediate Level fill factor cannot be different from the leaf-level fill factor.)
Syntax:
Create Clustered /NonClustered Index Index_Name On Table_Name(Col1, Col2, …) With Pad_Index, FillFactor = Fill_Ratio_As_Integer;

Create Unique Clustered /NonClustered Index Index_Name On Table_Name(Col1, Col2, …) With Pad_Index, FillFactor = Fill_Ratio_As_Integer;

 

 

  • XML Index

This index type was introduced in SQL Server 2005. XML indexes can be created on xml data type columns. They index all tags, values and paths over the XML instances in the column, which benefits query performance. Your application may benefit from an XML index in the following situations:

  • Queries on XML columns are common in your workload. XML index maintenance cost during data modification must be considered.
  • Your XML values are relatively large and the retrieved parts are relatively small. Building the index avoids parsing the whole data at run time and benefits index lookups for efficient query processing.

XML indexes fall into the following categories:

  • Primary XML index
  • Secondary XML index

The first index on the xml type column must be the primary XML index. Using the primary XML index, the following types of secondary indexes are supported: PATH, VALUE, and PROPERTY. Depending on the type of queries, these secondary indexes might help improve query performance.
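The following sketch (the dbo.ProductCatalog table and its columns are hypothetical) creates a primary XML index and a secondary PATH index on an xml column; note that the table must already have a clustered primary key before a primary XML index can be created:

Example:
CREATE TABLE dbo.ProductCatalog
(
    ProductID INT NOT NULL PRIMARY KEY,   -- clustered primary key is required
    Specs     XML NOT NULL
);

-- The first XML index on the column must be the primary XML index.
CREATE PRIMARY XML INDEX PXML_ProductCatalog_Specs
    ON dbo.ProductCatalog (Specs);

-- Secondary index optimized for path-based queries.
CREATE XML INDEX SXML_ProductCatalog_Specs_Path
    ON dbo.ProductCatalog (Specs)
    USING XML INDEX PXML_ProductCatalog_Specs FOR PATH;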

 

  • Spatial Index

SQL Server 2008 and higher support spatial data and spatial indexes. A spatial index is a type of extended index that allows you to index a spatial column. A spatial column is a table column that contains data of a spatial data type, such as geometry or geography.
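A brief sketch of the syntax follows (the dbo.Cities table is hypothetical). A spatial index on a geography column can be created with the default grid settings, while a geometry column would additionally require a BOUNDING_BOX specification:

Example:
CREATE TABLE dbo.Cities
(
    CityID   INT NOT NULL PRIMARY KEY,   -- a clustered primary key is required
    Location GEOGRAPHY NOT NULL
);

-- Spatial index using the default geography grid tessellation.
CREATE SPATIAL INDEX SIX_Cities_Location
    ON dbo.Cities (Location);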

The following table represents the maximum number of columns and indexes per table:

Version | Columns/Table | Columns/Select | Columns/Index Key | Clustered Indexes | Non-Clustered Indexes | XML Indexes
SQL Server 2005 | 1024 | 4,096 | 16 | 1 | 249 | 249
SQL Server 2008 | 1024* | 4,096 | 16 | 1 | 999 | 249
SQL Server 2008 R2 | 1024* | 4,096 | 16 | 1 | 999 | 249
SQL Server 2012 | 1024* | 4,096 | 16 | 1 | 999 | 249
SQL Server 2014 | 1024* | 4,096 | 16 | 1 | 999 | 249
SQL Server 2016 | 1024* | 4,096 | 16 | 1 | 999 | 249

* Wide tables, introduced in SQL Server 2008, can have up to 30,000 columns per table and also up to 1,000 indexes, but with the same maximum row size of 8 KB as a non-wide table.

References

 

 

Version History

Version | Date (DD/MM/YYYY) | Comments
1.0 | 21/03/2015 | First Release
1.1 | 21/03/2015 | SQL Server 2016 References; SQL Server Editions Comparison

 
