Thursday 7 July 2011

Sql Server Table and Index Architecure

Table and Index Architecture

Objects in the SQL Server 2005 database are stored as a collection of 8-KB pages. This section describes the way the pages for tables and indexes are organized, stored, and accessed.

Table and Index Organization
The following illustration shows the organization of a table. A table is contained in one or more partitions and each partition contains data rows in either a heap or a clustered index structure. The pages of the heap or clustered index are managed in one or more allocation units, depending on the column types in the data rows.

Partitions
In SQL Server 2005, table and index pages are contained in one or more partitions. A partition is a user-defined unit of data organization. By default, a table or index has only one partition that contains all the table or index pages. The partition resides in a single filegroup. A table or index with a single partition is equivalent to the organizational structure of tables and indexes in earlier versions of SQL Server.

When a table or index uses multiple partitions, the data is partitioned horizontally so that groups of rows are mapped into individual partitions, based on a specified column. The partitions can be put on one or more filegroups in the database. The table or index is treated as a single logical entity when queries or updates are performed on the data.

To view the partitions used by a table or index, use the sys.partitions (Transact-SQL) catalog view.

Clustered Tables, Heaps, and Indexes
SQL Server 2005 tables use one of two methods to organize their data pages within a partition:

·         Clustered tables are tables that have a clustered index.
The data rows are stored in order based on the clustered index key. The clustered index is implemented as a B-tree index structure that supports fast retrieval of the rows, based on their clustered index key values. The pages in each level of the index, including the data pages in the leaf level, are linked in a doubly-linked list. However, navigation from one level to another is performed by using key values. For more information, see
Clustered Index Structures.
·         Heaps are tables that have no clustered index.
The data rows are not stored in any particular order, and there is no particular order to the sequence of the data pages. The data pages are not linked in a linked list.

Indexed views have the same storage structure as clustered tables.
When a heap or a clustered table has multiple partitions, each partition has a heap or B-tree structure that contains the group of rows for that specific partition. For example, if a clustered table has four partitions, there are four B-trees; one in each partition.

Nonclustered Indexes
Nonclustered indexes have a B-tree index structure similar to the one in clustered indexes. The difference is that nonclustered indexes do not affect the order of the data rows. The leaf level contains index rows. Each index row contains the nonclustered key value, a row locator and any included, or nonkey, columns. The locator points to the data row that has the key value.

XML Indexes
One primary and several secondary XML indexes can be created on each xml column in the table. An XML index is a shredded and persisted representation of the XML binary large objects (BLOBs) in the xml data type column. XML indexes are stored as internal tables. To view information about xml indexes, use the sys.xml_indexes or sys.internal_tables catalog views.

Allocation Units
An allocation unit is a collection of pages within a heap or B-tree used to manage data based on their page type. The following table lists the types of allocation units used to manage data in tables and indexes.

Allocation unit type
Is used to manage
IN_ROW_DATA

Data or index rows that contain all data, except large object (LOB) data.
Pages are of type Data or Index.
LOB_DATA

Large object data stored in one or more of these data types: text, ntext, image, xml, varchar(max), nvarchar(max), varbinary(max), or CLR user-defined types (CLR UDT).
Pages are of type Text/Image.
ROW_OVERFLOW_DATA

Variable length data stored in varchar, nvarchar, varbinary, or sql_variant columns that exceed the 8,060 byte row size limit.
Pages are of type Data.


A heap or B-tree can have only one allocation unit of each type in a specific partition. To view the table or index allocation unit information, use the sys.allocation_units catalog view.

IN_ROW_DATA Allocation Unit
For every partition used by a table (heap or clustered table), index, or indexed view, there is one IN_ROW_DATA allocation unit that is made up of a collection of data pages. This allocation unit also contains additional collections of pages to implement each nonclustered and XML index defined for the table or view. The page collections in each partition of a table, index, or indexed view are anchored by page pointers in the sys.system_internals_allocation_units system view.

The sys.system_internals_allocation_units system view is for internal use only and is subject to change. Compatibility is not guaranteed.

Each table, index, and indexed view partition has a row in sys.system_internals_allocation_units uniquely identified by a container ID (container_id). The container ID has a one-to-one mapping to the partition_id in the sys.partitions catalog view that maintains the relationship between the table, index, or the indexed view data stored in a partition and the allocation units used to manage the data within the partition.

The allocation of pages to a table, index, or an indexed view partition is managed by a chain of IAM pages. The column first_iam_page in sys.system_internals_allocation_units points to the first IAM page in the chain of IAM pages managing the space allocated to the table, index, or the indexed view in the IN_ROW_DATA allocation unit.

sys.partitions returns a row for each partition in a table or index.

·         A heap has a row in sys.partitions with index_id = 0.
The first_iam_page column in sys.system_internals_allocation_units points to the IAM chain for the collection of heap data pages in the specified partition. The server uses the IAM pages to find the pages in the data page collection, because they are not linked.
·         A clustered index on a table or a view has a row in sys.partitions with index_id = 1.
The root_page column in sys.system_internals_allocation_units points to the top of the clustered index B-tree in the specified partition. The server uses the index B-tree to find the data pages in the partition.
·         Each nonclustered index created for a table or a view has a row in sys.partitions with index_id > 1.
The root_page column in sys.system_internals_allocation_units points to the top of the nonclustered index B-tree in the specified partition.
·         Each table that has at least one LOB column also has a row in sys.partitions with index_id > 250.
The first_iam_page column points to the chain of IAM pages that manage the pages in the LOB_DATA allocation unit.

ROW_OVERFLOW_DATA Allocation Unit
For every partition used by a table (heap or clustered table), index, or indexed view, there is one ROW_OVERFLOW_DATA allocation unit. This allocation unit contains zero (0) pages until a data row with variable length columns (varchar, nvarchar, varbinary, or sql_variant) in the IN_ROW_DATA allocation unit exceeds the 8 KB row size limit. When the size limitation is reached, SQL Server moves the column with the largest width from that row to a page in the ROW_OVERFLOW_DATA allocation unit. A 24-byte pointer to this off-row data is maintained on the original page.

Text/Image pages in the ROW_OVERFLOW_DATA allocation unit are managed in the same way pages in the LOB_DATA allocation unit are managed. That is, the Text/Image pages are managed by a chain of IAM pages.

LOB_DATA Allocation Unit
When a table or index has one or more LOB data types, one LOB_DATA allocation unit per partition is allocated to manage the storage of that data. The LOB data types include text, ntext, image, xml, varchar(max), nvarchar(max), varbinary(max), and CLR user-defined types.

Partition and Allocation Unit Example
The following example returns partition and allocation unit data for two tables: DatabaseLog, a heap with LOB data and no nonclustered indexes, and Currency, a clustered table without LOB data and one nonclustered index. Both tables have a single partition.

USE AdventureWorks;
GO
SELECT o.name AS table_name,p.index_id, i.name AS index_name , au.type_desc AS allocation_type, au.data_pages, partition_number
FROM sys.allocation_units AS au
    JOIN sys.partitions AS p ON au.container_id = p.partition_id
    JOIN sys.objects AS o ON p.object_id = o.object_id
    JOIN sys.indexes AS i ON p.index_id = i.index_id AND i.object_id = p.object_id
WHERE o.name = N'DatabaseLog' OR o.name = N'Currency'
ORDER BY o.name, p.index_id;

Here is the result set. Notice that the DatabaseLog table uses all three allocation unit types, because it contains both data and Text/Image page types. The Currency table does not have LOB data, but does have the allocation unit required to manage data pages. If the Currency table is later modified to include a LOB data type column, a LOB_DATA allocation unit is created to manage that data.

table_name  index_id index_name               allocation_type     data_pages  partition_number
----------- -------- -----------------------  ---------------     -----------  ------------
Currency    1        PK_Currency_CurrencyCode IN_ROW_DATA         1           1
Currency    3        AK_Currency_Name         IN_ROW_DATA         1           1
DatabaseLog 0        NULL                     IN_ROW_DATA         160         1
DatabaseLog 0        NULL                     ROW_OVERFLOW_DATA   0           1
DatabaseLog 0        NULL                     LOB_DATA            49          1
(5 row(s) affected)

Heap Structures
A heap is a table without a clustered index. Heaps have one row in sys.partitions, with index_id = 0 for each partition used by the heap. By default, a heap has a single partition. When a heap has multiple partitions, each partition has a heap structure that contains the data for that specific partition. For example, if a heap has four partitions, there are four heap structures; one in each partition.

Depending on the data types in the heap, each heap structure will have one or more allocation units to store and manage the data for a specific partition. At a minimum, each heap will have one IN_ROW_DATA allocation unit per partition. The heap will also have one LOB_DATA allocation unit per partition, if it contains large object (LOB) columns. It will also have one ROW_OVERFLOW_DATA allocation unit per partition, if it contains variable length columns that exceed the 8,060 byte row size limit.

The column first_iam_page in the sys.system_internals_allocation_units system view points to the first IAM page in the chain of IAM pages that manage the space allocated to the heap in a specific partition. SQL Server 2005 uses the IAM pages to move through the heap. The data pages and the rows within them are not in any specific order and are not linked. The only logical connection between data pages is the information recorded in the IAM pages.

The sys.system_internals_allocation_units system view is for internal use only and is subject to change. Compatibility is not guaranteed.

Table scans or serial reads of a heap can be performed by scanning the IAM pages to find the extents that are holding pages for the heap. Because the IAM represents extents in the same order that they exist in the data files, this means that serial heap scans progress sequentially through each file. Using the IAM pages to set the scan sequence also means that rows from the heap are not typically returned in the order in which they were inserted.

The following illustration shows how the SQL Server Database Engine uses IAM pages to retrieve data rows in a single partition heap.


Clustered Index Structures
In SQL Server, indexes are organized as B-trees. Each page in an index B-tree is called an index node. The top node of the B-tree is called the root node. The bottom level of nodes in the index is called the leaf nodes. Any index levels between the root and the leaf nodes are collectively known as intermediate levels. In a clustered index, the leaf nodes contain the data pages of the underlying table. The root and leaf nodes contain index pages holding index rows. Each index row contains a key value and a pointer to either an intermediate level page in the B-tree, or a data row in the leaf level of the index. The pages in each level of the index are linked in a doubly-linked list.
Clustered indexes have one row in sys.partitions, with index_id = 1 for each partition used by the index. By default, a clustered index has a single partition. When a clustered index has multiple partitions, each partition has a B-tree structure that contains the data for that specific partition. For example, if a clustered index has four partitions, there are four B-tree structures; one in each partition.
Depending on the data types in the clustered index, each clustered index structure will have one or more allocation units in which to store and manage the data for a specific partition. At a minimum, each clustered index will have one IN_ROW_DATA allocation unit per partition. The clustered index will also have one LOB_DATA allocation unit per partition if it contains large object (LOB) columns. It will also have one ROW_OVERFLOW_DATA allocation unit per partition if it contains variable length columns that exceed the 8,060 byte row size limit. For more information about allocation units, see Table and Index Organization.

The pages in the data chain and the rows in them are ordered on the value of the clustered index key. All inserts are made at the point where the key value in the inserted row fits in the ordering sequence among existing rows. The page collections for the B-tree are anchored by page pointers in the sys.system_internals_allocation_units system view.

The sys.system_internals_allocation_units system view is for internal use only and is subject to change. Compatibility is not guaranteed.

For a clustered index, the root_page column in sys.system_internals_allocation_units points to the top of the clustered index for a specific partition. SQL Server moves down the index to find the row corresponding to a clustered index key. To find a range of keys, SQL Server moves through the index to find the starting key value in the range and then scans through the data pages using the previous or next pointers. To find the first page in the chain of data pages, SQL Server follows the leftmost pointers from the root node of the index.

This illustration shows the structure of a clustered index in a single partition.


Nonclustered Index Structures
Nonclustered indexes have the same B-tree structure as clustered indexes, except for the following significant differences:
·         The data rows of the underlying table are not sorted and stored in order based on their nonclustered keys.
·         The leaf layer of a nonclustered index is made up of index pages instead of data pages.

Nonclustered indexes can be defined on a table or view with a clustered index or a heap. Each index row in the nonclustered index contains the nonclustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value.

The row locators in nonclustered index rows are either a pointer to a row or are a clustered index key for a row, as described in the following:

·         If the table is a heap, which means it does not have a clustered index, the row locator is a pointer to the row. The pointer is built from the file identifier (ID), page number, and number of the row on the page. The whole pointer is known as a Row ID (RID).
·         If the table has a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server 2005 makes any duplicate keys unique by adding an internally generated value called a uniqueifier. This four-byte value is not visible to users. It is only added when required to make the clustered key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.

Nonclustered indexes have one row in sys.partitions with index_id >1 for each partition used by the index. By default, a nonclustered index has a single partition. When a nonclustered index has multiple partitions, each partition has a B-tree structure that contains the index rows for that specific partition. For example, if a nonclustered index has four partitions, there are four B-tree structures, with one in each partition.

Depending on the data types in the nonclustered index, each nonclustered index structure will have one or more allocation units in which to store and manage the data for a specific partition. At a minimum, each nonclustered index will have one IN_ROW_DATA allocation unit per partition that stores the index B-tree pages. The nonclustered index will also have one LOB_DATA allocation unit per partition if it contains large object (LOB) columns . Additionally, it will have one ROW_OVERFLOW_DATA allocation unit per partition if it contains variable length columns that exceed the 8,060 byte row size limit. For more information about allocation units, see Table and Index Organization. The page collections for the B-tree are anchored by root_page pointers in the sys.system_internals_allocation_units system view.

The sys.system_internals_allocation_units system view is for internal use only and is subject to change. Compatibility is not guaranteed.

Included Column Indexes
In SQL Server 2005, the functionality of nonclustered indexes can be extended by adding included columns, called nonkey columns, to the leaf level of the index. While the key columns are stored at all levels of the nonclustered index, nonkey columns are stored only at the leaf level.

No comments:

Post a Comment