Thursday, 7 July 2011

Sql Server Table and Index Architecure

Table and Index Architecture

Objects in the SQL Server 2005 database are stored as a collection of 8-KB pages. This section describes the way the pages for tables and indexes are organized, stored, and accessed.

Table and Index Organization
The following illustration shows the organization of a table. A table is contained in one or more partitions and each partition contains data rows in either a heap or a clustered index structure. The pages of the heap or clustered index are managed in one or more allocation units, depending on the column types in the data rows.

Partitions
In SQL Server 2005, table and index pages are contained in one or more partitions. A partition is a user-defined unit of data organization. By default, a table or index has only one partition that contains all the table or index pages. The partition resides in a single filegroup. A table or index with a single partition is equivalent to the organizational structure of tables and indexes in earlier versions of SQL Server.

When a table or index uses multiple partitions, the data is partitioned horizontally so that groups of rows are mapped into individual partitions, based on a specified column. The partitions can be put on one or more filegroups in the database. The table or index is treated as a single logical entity when queries or updates are performed on the data.

To view the partitions used by a table or index, use the sys.partitions (Transact-SQL) catalog view.

Clustered Tables, Heaps, and Indexes
SQL Server 2005 tables use one of two methods to organize their data pages within a partition:

·         Clustered tables are tables that have a clustered index.
The data rows are stored in order based on the clustered index key. The clustered index is implemented as a B-tree index structure that supports fast retrieval of the rows, based on their clustered index key values. The pages in each level of the index, including the data pages in the leaf level, are linked in a doubly-linked list. However, navigation from one level to another is performed by using key values. For more information, see
Clustered Index Structures.
·         Heaps are tables that have no clustered index.
The data rows are not stored in any particular order, and there is no particular order to the sequence of the data pages. The data pages are not linked in a linked list.

Indexed views have the same storage structure as clustered tables.
When a heap or a clustered table has multiple partitions, each partition has a heap or B-tree structure that contains the group of rows for that specific partition. For example, if a clustered table has four partitions, there are four B-trees; one in each partition.

Nonclustered Indexes
Nonclustered indexes have a B-tree index structure similar to the one in clustered indexes. The difference is that nonclustered indexes do not affect the order of the data rows. The leaf level contains index rows. Each index row contains the nonclustered key value, a row locator and any included, or nonkey, columns. The locator points to the data row that has the key value.

XML Indexes
One primary and several secondary XML indexes can be created on each xml column in the table. An XML index is a shredded and persisted representation of the XML binary large objects (BLOBs) in the xml data type column. XML indexes are stored as internal tables. To view information about xml indexes, use the sys.xml_indexes or sys.internal_tables catalog views.

Allocation Units
An allocation unit is a collection of pages within a heap or B-tree used to manage data based on their page type. The following table lists the types of allocation units used to manage data in tables and indexes.

Allocation unit type
Is used to manage
IN_ROW_DATA

Data or index rows that contain all data, except large object (LOB) data.
Pages are of type Data or Index.
LOB_DATA

Large object data stored in one or more of these data types: text, ntext, image, xml, varchar(max), nvarchar(max), varbinary(max), or CLR user-defined types (CLR UDT).
Pages are of type Text/Image.
ROW_OVERFLOW_DATA

Variable length data stored in varchar, nvarchar, varbinary, or sql_variant columns that exceed the 8,060 byte row size limit.
Pages are of type Data.


A heap or B-tree can have only one allocation unit of each type in a specific partition. To view the table or index allocation unit information, use the sys.allocation_units catalog view.

IN_ROW_DATA Allocation Unit
For every partition used by a table (heap or clustered table), index, or indexed view, there is one IN_ROW_DATA allocation unit that is made up of a collection of data pages. This allocation unit also contains additional collections of pages to implement each nonclustered and XML index defined for the table or view. The page collections in each partition of a table, index, or indexed view are anchored by page pointers in the sys.system_internals_allocation_units system view.

The sys.system_internals_allocation_units system view is for internal use only and is subject to change. Compatibility is not guaranteed.

Each table, index, and indexed view partition has a row in sys.system_internals_allocation_units uniquely identified by a container ID (container_id). The container ID has a one-to-one mapping to the partition_id in the sys.partitions catalog view that maintains the relationship between the table, index, or the indexed view data stored in a partition and the allocation units used to manage the data within the partition.

The allocation of pages to a table, index, or an indexed view partition is managed by a chain of IAM pages. The column first_iam_page in sys.system_internals_allocation_units points to the first IAM page in the chain of IAM pages managing the space allocated to the table, index, or the indexed view in the IN_ROW_DATA allocation unit.

sys.partitions returns a row for each partition in a table or index.

·         A heap has a row in sys.partitions with index_id = 0.
The first_iam_page column in sys.system_internals_allocation_units points to the IAM chain for the collection of heap data pages in the specified partition. The server uses the IAM pages to find the pages in the data page collection, because they are not linked.
·         A clustered index on a table or a view has a row in sys.partitions with index_id = 1.
The root_page column in sys.system_internals_allocation_units points to the top of the clustered index B-tree in the specified partition. The server uses the index B-tree to find the data pages in the partition.
·         Each nonclustered index created for a table or a view has a row in sys.partitions with index_id > 1.
The root_page column in sys.system_internals_allocation_units points to the top of the nonclustered index B-tree in the specified partition.
·         Each table that has at least one LOB column also has a row in sys.partitions with index_id > 250.
The first_iam_page column points to the chain of IAM pages that manage the pages in the LOB_DATA allocation unit.

ROW_OVERFLOW_DATA Allocation Unit
For every partition used by a table (heap or clustered table), index, or indexed view, there is one ROW_OVERFLOW_DATA allocation unit. This allocation unit contains zero (0) pages until a data row with variable length columns (varchar, nvarchar, varbinary, or sql_variant) in the IN_ROW_DATA allocation unit exceeds the 8 KB row size limit. When the size limitation is reached, SQL Server moves the column with the largest width from that row to a page in the ROW_OVERFLOW_DATA allocation unit. A 24-byte pointer to this off-row data is maintained on the original page.

Text/Image pages in the ROW_OVERFLOW_DATA allocation unit are managed in the same way pages in the LOB_DATA allocation unit are managed. That is, the Text/Image pages are managed by a chain of IAM pages.

LOB_DATA Allocation Unit
When a table or index has one or more LOB data types, one LOB_DATA allocation unit per partition is allocated to manage the storage of that data. The LOB data types include text, ntext, image, xml, varchar(max), nvarchar(max), varbinary(max), and CLR user-defined types.

Partition and Allocation Unit Example
The following example returns partition and allocation unit data for two tables: DatabaseLog, a heap with LOB data and no nonclustered indexes, and Currency, a clustered table without LOB data and one nonclustered index. Both tables have a single partition.

USE AdventureWorks;
GO
SELECT o.name AS table_name,p.index_id, i.name AS index_name , au.type_desc AS allocation_type, au.data_pages, partition_number
FROM sys.allocation_units AS au
    JOIN sys.partitions AS p ON au.container_id = p.partition_id
    JOIN sys.objects AS o ON p.object_id = o.object_id
    JOIN sys.indexes AS i ON p.index_id = i.index_id AND i.object_id = p.object_id
WHERE o.name = N'DatabaseLog' OR o.name = N'Currency'
ORDER BY o.name, p.index_id;

Here is the result set. Notice that the DatabaseLog table uses all three allocation unit types, because it contains both data and Text/Image page types. The Currency table does not have LOB data, but does have the allocation unit required to manage data pages. If the Currency table is later modified to include a LOB data type column, a LOB_DATA allocation unit is created to manage that data.

table_name  index_id index_name               allocation_type     data_pages  partition_number
----------- -------- -----------------------  ---------------     -----------  ------------
Currency    1        PK_Currency_CurrencyCode IN_ROW_DATA         1           1
Currency    3        AK_Currency_Name         IN_ROW_DATA         1           1
DatabaseLog 0        NULL                     IN_ROW_DATA         160         1
DatabaseLog 0        NULL                     ROW_OVERFLOW_DATA   0           1
DatabaseLog 0        NULL                     LOB_DATA            49          1
(5 row(s) affected)

Heap Structures
A heap is a table without a clustered index. Heaps have one row in sys.partitions, with index_id = 0 for each partition used by the heap. By default, a heap has a single partition. When a heap has multiple partitions, each partition has a heap structure that contains the data for that specific partition. For example, if a heap has four partitions, there are four heap structures; one in each partition.

Depending on the data types in the heap, each heap structure will have one or more allocation units to store and manage the data for a specific partition. At a minimum, each heap will have one IN_ROW_DATA allocation unit per partition. The heap will also have one LOB_DATA allocation unit per partition, if it contains large object (LOB) columns. It will also have one ROW_OVERFLOW_DATA allocation unit per partition, if it contains variable length columns that exceed the 8,060 byte row size limit.

The column first_iam_page in the sys.system_internals_allocation_units system view points to the first IAM page in the chain of IAM pages that manage the space allocated to the heap in a specific partition. SQL Server 2005 uses the IAM pages to move through the heap. The data pages and the rows within them are not in any specific order and are not linked. The only logical connection between data pages is the information recorded in the IAM pages.

The sys.system_internals_allocation_units system view is for internal use only and is subject to change. Compatibility is not guaranteed.

Table scans or serial reads of a heap can be performed by scanning the IAM pages to find the extents that are holding pages for the heap. Because the IAM represents extents in the same order that they exist in the data files, this means that serial heap scans progress sequentially through each file. Using the IAM pages to set the scan sequence also means that rows from the heap are not typically returned in the order in which they were inserted.

The following illustration shows how the SQL Server Database Engine uses IAM pages to retrieve data rows in a single partition heap.


Clustered Index Structures
In SQL Server, indexes are organized as B-trees. Each page in an index B-tree is called an index node. The top node of the B-tree is called the root node. The bottom level of nodes in the index is called the leaf nodes. Any index levels between the root and the leaf nodes are collectively known as intermediate levels. In a clustered index, the leaf nodes contain the data pages of the underlying table. The root and leaf nodes contain index pages holding index rows. Each index row contains a key value and a pointer to either an intermediate level page in the B-tree, or a data row in the leaf level of the index. The pages in each level of the index are linked in a doubly-linked list.
Clustered indexes have one row in sys.partitions, with index_id = 1 for each partition used by the index. By default, a clustered index has a single partition. When a clustered index has multiple partitions, each partition has a B-tree structure that contains the data for that specific partition. For example, if a clustered index has four partitions, there are four B-tree structures; one in each partition.
Depending on the data types in the clustered index, each clustered index structure will have one or more allocation units in which to store and manage the data for a specific partition. At a minimum, each clustered index will have one IN_ROW_DATA allocation unit per partition. The clustered index will also have one LOB_DATA allocation unit per partition if it contains large object (LOB) columns. It will also have one ROW_OVERFLOW_DATA allocation unit per partition if it contains variable length columns that exceed the 8,060 byte row size limit. For more information about allocation units, see Table and Index Organization.

The pages in the data chain and the rows in them are ordered on the value of the clustered index key. All inserts are made at the point where the key value in the inserted row fits in the ordering sequence among existing rows. The page collections for the B-tree are anchored by page pointers in the sys.system_internals_allocation_units system view.

The sys.system_internals_allocation_units system view is for internal use only and is subject to change. Compatibility is not guaranteed.

For a clustered index, the root_page column in sys.system_internals_allocation_units points to the top of the clustered index for a specific partition. SQL Server moves down the index to find the row corresponding to a clustered index key. To find a range of keys, SQL Server moves through the index to find the starting key value in the range and then scans through the data pages using the previous or next pointers. To find the first page in the chain of data pages, SQL Server follows the leftmost pointers from the root node of the index.

This illustration shows the structure of a clustered index in a single partition.


Nonclustered Index Structures
Nonclustered indexes have the same B-tree structure as clustered indexes, except for the following significant differences:
·         The data rows of the underlying table are not sorted and stored in order based on their nonclustered keys.
·         The leaf layer of a nonclustered index is made up of index pages instead of data pages.

Nonclustered indexes can be defined on a table or view with a clustered index or a heap. Each index row in the nonclustered index contains the nonclustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value.

The row locators in nonclustered index rows are either a pointer to a row or are a clustered index key for a row, as described in the following:

·         If the table is a heap, which means it does not have a clustered index, the row locator is a pointer to the row. The pointer is built from the file identifier (ID), page number, and number of the row on the page. The whole pointer is known as a Row ID (RID).
·         If the table has a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server 2005 makes any duplicate keys unique by adding an internally generated value called a uniqueifier. This four-byte value is not visible to users. It is only added when required to make the clustered key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.

Nonclustered indexes have one row in sys.partitions with index_id >1 for each partition used by the index. By default, a nonclustered index has a single partition. When a nonclustered index has multiple partitions, each partition has a B-tree structure that contains the index rows for that specific partition. For example, if a nonclustered index has four partitions, there are four B-tree structures, with one in each partition.

Depending on the data types in the nonclustered index, each nonclustered index structure will have one or more allocation units in which to store and manage the data for a specific partition. At a minimum, each nonclustered index will have one IN_ROW_DATA allocation unit per partition that stores the index B-tree pages. The nonclustered index will also have one LOB_DATA allocation unit per partition if it contains large object (LOB) columns . Additionally, it will have one ROW_OVERFLOW_DATA allocation unit per partition if it contains variable length columns that exceed the 8,060 byte row size limit. For more information about allocation units, see Table and Index Organization. The page collections for the B-tree are anchored by root_page pointers in the sys.system_internals_allocation_units system view.

The sys.system_internals_allocation_units system view is for internal use only and is subject to change. Compatibility is not guaranteed.

Included Column Indexes
In SQL Server 2005, the functionality of nonclustered indexes can be extended by adding included columns, called nonkey columns, to the leaf level of the index. While the key columns are stored at all levels of the nonclustered index, nonkey columns are stored only at the leaf level.

Sql Server Physical Architecture

Introduction

SQL Server 2005 is a set of components that work together to meet the data storage and analysis needs of the largest Web sites and enterprise data processing systems. The following components work together to manage data effectively in SQL server 2005.

Physical Database Architecture
Describes the logical components defined in SQL Server databases and how they are physically implemented in database files.

Relational Database Engine Architecture
Describes the features of the database engine that make it efficient at processing large numbers of concurrent requests for data from many users.

Physical Database Architecture

The physical database architecture contains

·         Pages and Extents
·         Physical Database File and Filegroups
·         Space Allocation and Reuse
·         Table and Index Architecture
·         Transaction Log Architecture

Pages and Extends

The fundamental unit of data storage in SQL Server is the page. The disk space allocated to a data file (.mdf or .ndf) in a database is logically divided into pages numbered contiguously from 0 to n. Disk I/O operations are performed at the page level. That is, SQL Server reads or writes whole data pages.

Extents are a collection of eight physically contiguous pages and are used to efficiently manage the pages. All pages are stored in extents.

Pages
In SQL Server, the page size is 8 KB. This means SQL Server databases have 128 pages per megabyte. Each page begins with a 96-byte header that is used to store system information about the page. This information includes the page number, page type, the amount of free space on the page, and the allocation unit ID of the object that owns the page.

The following table shows the page types used in the data files of a SQL Server database.
Page Type
Contents
Data
Data rows with all data, except text, ntext, image, nvarchar(max), varchar(max), varbinary(max), and xml data, when text in row is set to ON.
Index
Index entries.
Text/Image
Large object data types:
·         text, ntext, image, nvarchar(max), varchar(max), varbinary(max), and xml data
Variable length columns when the data row exceeds 8 KB:
·         varchar, nvarchar, varbinary, and sql_variant
Global Allocation Map, Shared Global Allocation Map
Information about whether extents are allocated.

Page Free Space
Information about page allocation and free space available on pages.

Index Allocation Map
Information about extents used by a table or index per allocation unit.

Bulk Changed Map
Information about extents modified by bulk operations since the last BACKUP LOG statement per allocation unit.

Differential Changed Map
Information about extents that have changed since the last BACKUP DATABASE statement per allocation unit.


Note: Log files do not contain pages; they contain a series of log records.

Data rows are put on the page serially, starting immediately after the header. A row offset table starts at the end of the page, and each row offset table contains one entry for each row on the page. Each entry records how far the first byte of the row is from the start of the page. The entries in the row offset table are in reverse sequence from the sequence of the rows on the page.


Large Row Support
Rows cannot span pages in SQL Server 2005, however portions of the row may be moved off the row's page so that the row can actually be very large. The maximum amount of data and overhead that is contained in a single row on a page is 8,060 bytes (8 KB). However, this does not include the data stored in the Text/Image page type. In SQL Server 2005, this restriction is relaxed for tables that contain varchar, nvarchar, varbinary, or sql_variant columns. When the total row size of all fixed and variable columns in a table exceeds the 8,060 byte limitation, SQL Server dynamically moves one or more variable length columns to pages in the ROW_OVERFLOW_DATA allocation unit, starting with the column with the largest width. This is done whenever an insert or update operation increases the total size of the row beyond the 8060 byte limit. When a column is moved to a page in the ROW_OVERFLOW_DATA allocation unit, a 24-byte pointer on the original page in the IN_ROW_DATA allocation unit is maintained. If a subsequent operation reduces the row size, SQL Server dynamically moves the columns back to the original data page.


Extents
Extents are the basic unit in which space is managed. An extent is eight physically contiguous pages, or 64 KB. This means SQL Server databases have 16 extents per megabyte.
To make its space allocation efficient, SQL Server does not allocate whole extents to tables with small amounts of data. SQL Server has two types of extents:

·         Uniform extents are owned by a single object; all eight pages in the extent can only be used by the owning object.
·         Mixed extents are shared by up to eight objects. Each of the eight pages in the extent can be owned by a different object.

A new table or index is generally allocated pages from mixed extents. When the table or index grows to the point that it has eight pages, it then switches to use uniform extents for subsequent allocations. If you create an index on an existing table that has enough rows to generate eight pages in the index, all allocations to the index are in uniform extents.

Physical Database Files and Filegroups

SQL Server 2005 maps a database over a set of operating-system files. Data and log information are never mixed in the same file, and individual files are used only by one database. Filegroups are named collections of files and are used to help with data placement and administrative tasks such as backup and restore operations.

Database Files
SQL Server 2005 databases have three types of files:
·         Primary data files
The primary data file is the starting point of the database and points to the other files in the database. Every database has one primary data file. The recommended file name extension for primary data files is .mdf.
·         Secondary data files
Secondary data files make up all the data files, other than the primary data file. Some databases may not have any secondary data files, while others have several secondary data files. The recommended file name extension for secondary data files is .ndf.
·         Log files
Log files hold all the log information that is used to recover the database. There must be at least one log file for each database, although there can be more than one. The recommended file name extension for log files is .ldf.

SQL Server 2005 does not enforce the .mdf, .ndf, and .ldf file name extensions, but these extensions help you identify the different kinds of files and their use.

In SQL Server 2005, the locations of all the files in a database are recorded in the primary file of the database and in the master database. The Database Engine uses the file location information from the master database most of the time. However, the database engine uses the file location information from the primary file to initialize the file location entries in the master database in the following situations:

·         When attaching a database using the CREATE DATABASE statement with either the FOR ATTACH or FOR ATTACH_REBUILD_LOG options.
·         When upgrading from SQL Server version 2000 or version 7.0 to SQL Server 2005.
·         When restoring the master database.

Logical and Physical File Names
SQL Server 2005 files have two names:

logical_file_name
The logical_file_name is the name used to refer to the physical file in all Transact-SQL statements. The logical file name must comply with the rules for SQL Server identifiers and must be unique among logical file names in the database.

os_file_name
The os_file_name is the name of the physical file including the directory path. It must follow the rules for the operating system file names.

The following illustration shows examples of the logical file names and the physical file names of a database created on a default instance of SQL Server 2005:


 SQL Server data and log files can be put on either FAT or NTFS file systems. NTFS is recommended for the security aspects of NTFS. Read/write data filegroups and log files cannot be placed on an NTFS compressed file system. Only read-only databases and read-only secondary filegroups can be put on an NTFS compressed file system.

When multiple instances of SQL Server are run on a single computer, each instance receives a different default directory to hold the files for the databases created in the instance.

Data File Pages
Pages in a SQL Server 2005 data file are numbered sequentially, starting with zero (0) for the first page in the file. Each file in a database has a unique file ID number. To uniquely identify a page in a database, both the file ID and the page number are required. The following example shows the page numbers in a database that has a 4-MB primary data file and a 1-MB secondary data file.


The first page in each file is a file header page that contains information about the attributes of the file. Several of the other pages at the start of the file also contain system information, such as allocation maps. One of the system pages stored in both the primary data file and the first log file is a database boot page that contains information about the attributes of the database.

File Size
SQL Server 2005 files can grow automatically from their originally specified size. When you define a file, you can specify a specific growth increment. Every time the file is filled, it increases its size by the growth increment. If there are multiple files in a filegroup, they will not autogrow until all the files are full. Growth then occurs in a round-robin fashion.

Each file can also have a maximum size specified. If a maximum size is not specified, the file can continue to grow until it has used all available space on the disk. This feature is especially useful when SQL Server is used as a database embedded in an application where the user does not have convenient access to a system administrator. The user can let the files autogrow as required to reduce the administrative burden of monitoring free space in the database and manually allocating additional space.

Database Filegroups
Database objects and files can be grouped together in filegroups for allocation and administration purposes. There are two types of filegroups:

Primary
The primary filegroup contains the primary data file and any other files not specifically assigned to another filegroup. All pages for the system tables are allocated in the primary filegroup.

User-defined
User-defined filegroups are any filegroups that are specified by using the FILEGROUP keyword in a CREATE DATABASE or ALTER DATABASE statement.

Log files are never part of a filegroup. Log space is managed separately from data space.
No file can be a member of more than one filegroup. Tables, indexes, and large object data can be associated with a specified filegroup. In this case, all their pages will be allocated in that filegroup, or the tables and indexes can be partitioned. The data of partitioned tables and indexes is divided into units each of which can be placed in a separate filegroup in a database. For more information about partitioned tables and indexes, see Partitioned Tables and Indexes.

One filegroup in each database is designated the default filegroup. When a table or index is created without specifying a filegroup, it is assumed all pages will be allocated from the default filegroup. Only one filegroup at a time can be the default filegroup. Members of the db_owner fixed database role can switch the default filegroup from one filegroup to another. If no default filegroup is specified, the primary filegroup is the default filegroup.

File and Filegroup Example
The following example creates a database on an instance of SQL Server. The database has a primary data file, a user-defined filegroup, and a log file. The primary data file is in the primary filegroup and the user-defined filegroup has two secondary data files. An ALTER DATABASE statement makes the user-defined filegroup the default. A table is then created specifying the user-defined filegroup.

USE master;
GO
-- Create the database with the default data
-- filegroup and a log file. Specify the
-- growth increment and the max size for the
-- primary data file.
CREATE DATABASE MyDB
ON PRIMARY
  ( NAME='MyDB_Primary',
    FILENAME=
       'c:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\data\MyDB_Prm.mdf',
    SIZE=4MB,
    MAXSIZE=10MB,
    FILEGROWTH=1MB),
FILEGROUP MyDB_FG1
  ( NAME = 'MyDB_FG1_Dat1',
    FILENAME =
       'c:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\data\MyDB_FG1_1.ndf',
    SIZE = 1MB,
    MAXSIZE=10MB,
    FILEGROWTH=1MB),
  ( NAME = 'MyDB_FG1_Dat2',
    FILENAME =
       'c:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\data\MyDB_FG1_2.ndf',
    SIZE = 1MB,
    MAXSIZE=10MB,
    FILEGROWTH=1MB)
LOG ON
  ( NAME='MyDB_log',
    FILENAME =
       'c:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\data\MyDB.ldf',
    SIZE=1MB,
    MAXSIZE=10MB,
    FILEGROWTH=1MB);
GO
ALTER DATABASE MyDB
  MODIFY FILEGROUP MyDB_FG1 DEFAULT;
GO

-- Create a table in the user-defined filegroup.
USE MyDB;
CREATE TABLE MyTable
  ( cola int PRIMARY KEY,
    colb char(8) )
ON MyDB_FG1;
GO

The following illustration summarizes the results of the previous example.


Space Allocation and Reuse

SQL Server 2005 is effective at quickly allocating pages to objects and reusing space that is made available by deleted rows. These operations are internal to the system and use data structures that are not visible to users. However, these processes and structures are still occasionally referenced in SQL Server messages.

This section is an overview of the space allocation algorithms and the data structures. It also provides users and administrators with the knowledge they require to understand the references to the terms in the messages generated by SQL Server.

The SQL Server 2005 data structures that manage extent allocations and track free space have a relatively simple structure. This has the following benefits:

·         The free space information is densely packed, so relatively few pages contain this information.
This increases speed by reducing the amount of disk reads that are required to retrieve allocation information. This also increases the chance that the allocation pages will remain in memory and not require more reads.
·         Most of the allocation information is not chained together. This simplifies the maintenance of the allocation information.

Each page allocation or deallocation can be performed quickly. This decreases the contention between concurrent tasks having to allocate or deallocate pages.

SQL Server uses two types of allocation maps to record the allocation of extents:

·         Global Allocation Map (GAM)
GAM pages record what extents have been allocated. Each GAM covers 64,000 extents, or almost 4 GB of data. The GAM has one bit for each extent in the interval it covers. If the bit is 1, the extent is free; if the bit is 0, the extent is allocated.
·         Shared Global Allocation Map (SGAM)
SGAM pages record which extents are currently being used as mixed extents and also have at least one unused page. Each SGAM covers 64,000 extents, or almost 4 GB of data. The SGAM has one bit for each extent in the interval it covers. If the bit is 1, the extent is being used as a mixed extent and has a free page. If the bit is 0, the extent is not used as a mixed extent, or it is a mixed extent and all its pages are being used.

Each extent has the following bit patterns set in the GAM and SGAM, based on its current use.

Current use of Extent
GAM bit setting
SGAM bit setting
Free, not being used
1
0
Uniform extent, or full mixed extent
0
0
Mixed extent with free pages
0
1


This causes simple extent management algorithms. To allocate a uniform extent, the Database Engine searches the GAM for a 1 bit and sets it to 0. To find a mixed extent with free pages, the Database Engine searches the SGAM for a 1 bit. To allocate a mixed extent, the Database Engine searches the GAM for a 1 bit, sets it to 0, and then also sets the corresponding bit in the SGAM to 1. To deallocate an extent, the Database Engine makes sure that the GAM bit is set to 1 and the SGAM bit is set to 0. The algorithms that are actually used internally by the Database Engine are more sophisticated than what is described in this topic, because the Database Engine distributes data evenly in a database. However, even the real algorithms are simplified by not having to manage chains of extent allocation information.

Tracking Free Space
Page Free Space (PFS) pages record the allocation status of each page, whether an individual page has been allocated, and the amount of free space on each page. The PFS has one byte for each page, recording whether the page is allocated, and if so, whether it is empty, 1 to 50 percent full, 51 to 80 percent full, 81 to 95 percent full, or 96 to 100 percent full.

After an extent has been allocated to an object, the Database Engine uses the PFS pages to record which pages in the extent are allocated or free. This information is used when the Database Engine has to allocate a new page. The amount of free space in a page is only maintained for heap and Text/Image pages. It is used when the Database Engine has to find a page with free space available to hold a newly inserted row. Indexes do not require that the page free space be tracked, because the point at which to insert a new row is set by the index key values.

A PFS page is the first page after the file header page in a data file (page number 1). This is followed by a GAM page (page number 2), and then an SGAM page (page 3). There is a PFS page approximately 8,000 pages in size after the first PFS page. There is another GAM page 64,000 extents after the first GAM page on page 2, and another SGAM page 64,000 extents after the first SGAM page on page 3. The following illustration shows the sequence of pages used by the database engine to allocate and manage extents.


An Index Allocation Map (IAM) page maps the extents in a 4-GB part of a database file used by an allocation unit. An allocation unit is one of three types:

·         IN_ROW_DATA
Holds a partition of a heap or index.
·         LOB_DATA
Holds large object (LOB) data types, such as xml, varbinary(max), and varchar(max).
·         ROW_OVERFLOW_DATA
Holds variable length data stored in varchar, nvarchar, varbinary, or sql_variant columns that exceed the 8,060 byte row size limit.

Each partition of a heap or index contains at least an IN_ROW_DATA allocation unit. It may also contain a LOB_DATA or ROW_OVERFLOW_DATA allocation unit, depending on the heap or index schema. For more information about allocation units, see Table and Index Organization.

An IAM page covers a 4-GB range in a file and is the same coverage as a GAM or SGAM page. If the allocation unit contains extents from more than one file, or more than one 4-GB range of a file, there will be multiple IAM pages linked in an IAM chain. Therefore, each allocation unit has at least one IAM page for each file on which it has extents. There may also be more than one IAM page on a file, if the range of the extents on the file allocated to the allocation unit exceeds the range that a single IAM page can record.

IAM pages are allocated as required for each allocation unit and are located randomly in the file. The system view, sys.system_internals_allocation_units, points to the first IAM page for an allocation unit. All the IAM pages for that allocation unit are linked in a chain.

The sys.system_internals_allocation_units system view is for internal use only and is subject to change. Compatibility is not guaranteed.


An IAM page has a header that indicates the starting extent of the range of extents mapped by the IAM page. The IAM page also has a large bitmap in which each bit represents one extent. The first bit in the map represents the first extent in the range, the second bit represents the second extent, and so on. If a bit is 0, the extent it represents is not allocated to the allocation unit owning the IAM. If the bit is 1, the extent it represents is allocated to the allocation unit owning the IAM page.

When the Database Engine has to insert a new row and no space is available in the current page, it uses the IAM and PFS pages to find a page to allocate, or, for a heap or a Text/Image page, a page with sufficient space to hold the row. The Database Engine uses the IAM pages to find the extents allocated to the allocation unit. For each extent, the Database Engine searches the PFS pages to see if there is a page that can be used. Each IAM and PFS page covers lots of data pages, so there are few IAM and PFS pages in a database. This means that the IAM and PFS pages are generally in memory in the SQL Server buffer pool, so they can be searched quickly. For indexes, the insertion point of a new row is set by the index key. In this case, the search process previously described does not occur.

The database engine allocates a new extent to an allocation unit only when it cannot quickly find a page in an existing extent with sufficient space to hold the row being inserted. The Database Engine allocates extents from those available in the filegroup using a proportional allocation algorithm. If a filegroup has two files and one has two times the free space as the other, two pages will be allocated from the file with the available space for every one page allocated from the other file. This means that every file in a filegroup should have a similar percentage of space used.

Tracking Modified Extends
SQL Server 2005 uses two internal data structures to track extents modified by bulk copy operations and extents modified since the last full backup. These data structures greatly speed up differential backups. They also speed up the logging of bulk copy operations when a database is using the bulk-logged recovery model. Like the Global Allocation Map (GAM) and Shared Global Allocation Map (SGAM) pages, these structures are bitmaps in which each bit represents a single extent.

·         Differential Changed Map (DCM):
This tracks the extents that have changed since the last BACKUP DATABASE statement. If the bit for an extent is 1, the extent has been modified since the last BACKUP DATABASE statement. If the bit is 0, the extent has not been modified.

Differential backups read just the DCM pages to determine which extents have been modified. This greatly reduces the number of pages that a differential backup must scan. The length of time that a differential backup runs is proportional to the number of extents modified since the last BACKUP DATABASE statement and not the overall size of the database.

·         Bulk Changed Map (BCM)
This tracks the extents that have been modified by bulk logged operations since the last BACKUP LOG statement. If the bit for an extent is 1, the extent has been modified by a bulk logged operation after the last BACKUP LOG statement. If the bit is 0, the extent has not been modified by bulk logged operations.

Although BCM pages appear in all databases, they are only relevant when the database is using the bulk-logged recovery model. In this recovery model, when a BACKUP LOG is performed, the backup process scans the BCMs for extents that have been modified. It then includes those extents in the log backup.

This lets the bulk logged operations be recovered if the database is restored from a database backup and a sequence of transaction log backups. BCM pages are not relevant in a database that is using the simple recovery model, because no bulk logged operations are logged. They are not relevant in a database that is using the full recovery model, because that recovery model treats bulk logged operations as fully logged operations.

The interval between DCM pages and BCM pages is the same as the interval between GAM and SGAM page, 64,000 extents. The DCM and BCM pages are located behind the GAM and SGAM pages in a physical file: