Job Search

Sunday, January 31, 2016

Index-Organized Tables

An index-organized table is a table stored in a variation of a B-tree index structure. In a heap-organized table, rows are inserted where they fit. In an index-organized table, rows are stored in an index defined on the primary key for the table. Each index entry in the B-tree also stores the non-key column values. Thus, the index is the data, and the data is the index. Applications manipulate index-organized tables just like heap-organized tables, using SQL statements.

For an analogy of an index-organized table, suppose a human resources manager has a book case of cardboard boxes. Each box is labeled with a number—1, 2, 3, 4, and so on—but the boxes do not sit on the shelves in sequential order. Instead, each box contains a pointer to the shelf location of the next box in the sequence.

Folders containing employee records are stored in each box. The folders are sorted by employee ID. Employee King has ID 100, which is the lowest ID, so his folder is at the bottom of box 1. The folder for employee 101 is on top of 100, 102 is on top of 101, and so on until box 1 is full. The next folder in the sequence is at the bottom of box 2.

In this analogy, ordering folders by employee ID makes it possible to search efficiently for folders without having to maintain a separate index. Suppose a user requests the records for employees 107, 120, and 122. Instead of searching an index in one step and retrieving the folders in a separate step, the manager can search the folders in sequential order and retrieve each folder as found.

Index-organized tables provide faster access to table rows by primary key or a valid prefix of the key. The presence of non-key columns of a row in the leaf block avoids an additional data block I/O. For example, the salary of employee 100 is stored in the index row itself. Also, because rows are stored in primary key order, range access by the primary key or prefix involves minimal block I/Os. Another benefit is the avoidance of the space overhead of a separate primary key index.

Index-organized tables are useful when related pieces of data must be stored together or data must be physically stored in a specific order. This type of table is often used for information retrieval, spatial and OLAP .

Index-organized tables are useful when

Ø  Basically an index-organized table is an index without a table. So if you have a table whose columns consist of the primary key and at most one other column then you have a possible candidate for INDEX ORGANIZED.

Ø  But if you find yourself contemplating the need for additional indexes on the non-primary key columns then you're probably better off with a regular heap table. So, as most tables probably need additional indexes most tables are not suitable for IOTs.

Ø  In practice, index organized tables are most likely to be reference data, code look-up affairs. Application tables are almost always just heap organized.

Ø  An Index-Organized Table--in contrast to an ordinary table--has its own way of structuring, storing, and indexing data.

Ø  Index organized tables (IOT) are indexes which actually hold the data which is being indexed, unlike the indexes which are stored somewhere else and have links to actual data.

Ø  An index-organized table is generally a good choice if you only access data from that table by the key, the whole key, and nothing but the key.

Ø  "small" "lookup" type tables (e.g. queried frequently, updated infrequently, fits in a relatively small number of blocks) are excellent candidates for IOTs.

Ø  Any table that you already are going to have an index that covers all the columns anyway (i.e. may as well save the space used by the table if the index duplicates 100% of the data) are excellent candidates for IOTs.

Index-Organized Table Characteristics

The database system performs all operations on index-organized tables by manipulating the B-tree index structure. Table summarizes the differences between index-organized tables and heap-organized tables.

Table : Comparison of Heap-Organized Tables with Index-Organized Tables
Heap-Organized Table
Index-Organized Table
The rowid uniquely identifies a row. Primary key constraint may optionally be defined.
Primary key uniquely identifies a row. Primary key constraint must be defined.
Physical rowid in ROWID pseudocolumn allows building secondary indexes.
Logical rowid in ROWID pseudocolumn allows building secondary indexes.
Individual rows may be accessed directly by rowid.
Access to individual rows may be achieved indirectly by primary key.
Sequential full table scan returns all rows in some order.
A full index scan or fast full index scan returns all rows in some order.
Can be stored in a table cluster with other tables.
Cannot be stored in a table cluster.
Can contain a column of the LONG data type and columns of LOB data types.
Can contain LOB columns but not LONG columns.
Can contain virtual columns (only relational heap tables are supported).
Cannot contain virtual columns.


Figure 1 illustrates the structure of an index-organized departments table. The leaf blocks contain the rows of the table, ordered sequentially by primary key. For example, the first value in the first leaf block shows a department ID of 20, department name of Marketing, manager ID of 201, and location ID of 1800.


Figure 1 Index-Organized Table



An index-organized table stores all data in the same structure and does not need to store the rowid. As shown in Figure 1, leaf block 1 in an index-organized table might contain entries as follows, ordered by primary key:

20,Marketing,201,1800
30,Purchasing,114,1700

Leaf block 2 in an index-organized table might contain entries as follows:
50,Shipping,121,1500
60,IT,103,1400

A scan of the index-organized table rows in primary key order reads the blocks in the following sequence:
  1. Block 1
  2. Block 2
  3.  
To contrast data access in a heap-organized table to an index-organized table, suppose block 1 of a heap-organized departments table segment contains rows as follows:

50,Shipping,121,1500
20,Marketing,201,1800

Block 2 contains rows for the same table as follows:
30,Purchasing,114,1700
60,IT,103,1400

A B-tree index leaf block for this heap-organized table contains the following entries, where the first value is the primary key and the second is the rowid:
20,AAAPeXAAFAAAAAyAAD
30,AAAPeXAAFAAAAAyAAA
50,AAAPeXAAFAAAAAyAAC
60,AAAPeXAAFAAAAAyAAB

A scan of the table rows in primary key order reads the table segment blocks in the following sequence:
  1. Block 1
  2. Block 2
  3. Block 1
  4. Block 2
Thus, the number of block I/Os in this example is double the number in the index-organized example.

Index-Organized Tables with Row Overflow Area

When creating an index-organized table, you can specify a separate segment as a row overflow area. In index-organized tables, B-tree index entries can be large because they contain an entire row, so a separate segment to contain the entries is useful. In contrast, B-tree entries are usually small because they consist of the key and rowid.
If a row overflow area is specified, then the database can divide a row in an index-organized table into the following parts:
·         The index entry
This part contains column values for all the primary key columns, a physical rowid that points to the overflow part of the row, and optionally a few of the non-key columns. This part is stored in the index segment.
·         The overflow part
This part contains column values for the remaining non-key columns. This part is stored in the overflow storage area segment.


Secondary Indexes on Index-Organized Tables

A secondary index is an index on an index-organized table. In a sense, it is an index on an index. The secondary index is an independent schema object and is stored separately from the index-organized table.

Oracle Database uses row identifiers called logical rowids for index-organized tables. A logical rowid is a base64-encoded representation of the table primary key. The logical rowid length depends on the primary key length.

Rows in index leaf blocks can move within or between blocks because of insertions. Rows in index-organized tables do not migrate as heap-organized rows do . Because rows in index-organized tables do not have permanent physical addresses, the database uses logical rowids based on primary key.

For example, assume that the departments table is index-organized. The location_id column stores the ID of each department. The table stores rows as follows, with the last value as the location ID:
10,Administration,200,1700
20,Marketing,201,1800
30,Purchasing,114,1700
40,Human Resources,203,2400

A secondary index on the location_id column might have index entries as follows, where the value following the comma is the logical rowid:
1700,*BAFAJqoCwR/+ 
1700,*BAFAJqoCwQv+
1800,*BAFAJqoCwRX+
2400,*BAFAJqoCwSn+

Secondary indexes provide fast and efficient access to index-organized tables using columns that are neither the primary key nor a prefix of the primary key. For example, a query of the names of departments whose ID is greater than 1700 could use the secondary index to speed data access.

Why Secondary Indexes have no ROWID

A direct pointer to the table row would be desirable for a secondary index as well. But that is only possible, if the table row stays at fixed storage positions. That is, unfortunately, not possible if the row is part of an index structure, which is kept in order. Keeping the index order needs to move rows occasionally. This is also true for operations that do not affect the row itself. An insert statement, for example, might split a leaf node to gain space for the new entry. That means that some entries are moved to a new data block at a different place.

A heap table, on the other hand, doesn’t keep the rows in any order. The database saves new entries wherever it finds enough space. Once written, data doesn’t move in heap tables.

Accessing a secondary index does not deliver a ROWID but a logical key for searching the clustered index. A single access, however, is not sufficient for searching clustered index—it requires a full tree traversal. That means that accessing a table via a secondary index searches two indexes: the secondary index once (INDEX RANGE SCAN), then the clustered index for each row found in the secondary index (INDEX UNIQUE SCAN).

Figure: Secondary Index on an IOT



Figure makes it clear, that the B-tree of the clustered index stands between the secondary index and the table data.

Important

Accessing an index-organized table via a secondary index is very inefficient.

Accessing an index-organized table via a secondary index is very inefficient, and it can be prevented in the same way one prevents a table access on a heap table: by using an index-only scan—in this case better described as “secondary-index-only scan”. The performance advantage of an index-only scan is even bigger because it not only prevents a single access but an entire INDEX UNIQUE SCAN.

Using this example we can also see that databases exploit all the redundancies they have. Bear in mind that a secondary index stores the clustering key for each index entry. Consequently, we can query the clustering key from a secondary index without accessing the index-organized table:

SELECT sale_id
  FROM sales_iot
 WHERE sale_date = ?
-------------------------------------------------
| Id | Operation        | Name           | Cost |
-------------------------------------------------
|  0 | SELECT STATEMENT |                |    4 |
|* 1 |  INDEX RANGE SCAN| SALES_IOT_DATE |    4 |
-------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("SALE_DATE"=:DT)
 
The table SALES_IOT is an index-organized table that uses SALE_ID as clustering key. Although the index SALE_IOT_DATE is on the SALE_DATE column only, it still has a copy of the clustering key SALE_ID so it can satisfy the query using the secondary index only.
When selecting other columns, the database has to run an INDEX UNIQUE SCAN on the clustered index for each row:

SELECT eur_value
  FROM sales_iot
 WHERE sale_date = ?
---------------------------------------------------
| Id  | Operation         | Name           | Cost |
---------------------------------------------------
|   0 | SELECT STATEMENT  |                |   13 |
|*  1 |  INDEX UNIQUE SCAN| SALES_IOT_PK   |   13 |
|*  2 |   INDEX RANGE SCAN| SALES_IOT_DATE |    4 |
---------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("SALE_DATE"=:DT)
   2 - access("SALE_DATE"=:DT)
 
Index-organized tables and clustered indexes are, after all, not as useful as it seems at first sight. Performance improvements on the clustered index are easily lost on when using a secondary index. The clustering key is usually longer than a ROWID so that the secondary indexes are larger than they would be on a heap table, often eliminating the savings from the omission of the heap table. The strength of index-organized tables and clustered indexes is mostly limited to tables that do not need a second index. Heap tables have the benefit of providing a stationary master copy that can be easily referenced.

Important

Tables with one index only are best implemented as clustered indexes or index-organized tables.

Tables with more indexes can often benefit from heap tables. You can still use index-only scans to avoid the table access. This gives you the select performance of a clustered index without slowing down other indexes.

The Oracle database uses heap tables by default. Index-organized tables can be created using the ORGANIZATION INDEX clause:

CREATE TABLE (
   id    NUMBER NOT NULL PRIMARY KEY,
   [...]
) ORGANIZATION INDEX
The Oracle database always uses the primary key as the clustering key.

Logical Rowids and Physical Guesses

Secondary indexes use the logical rowids to locate table rows. A logical rowid includes a physical guess, which is the physical rowid of the index entry when it was first made. Oracle Database can use physical guesses to probe directly into the leaf block of the index-organized table, bypassing the primary key search. When the physical location of a row changes, the logical rowid remains valid even if it contains a physical guess that is stale.

For a heap-organized table, access by a secondary index involves a scan of the secondary index and an additional I/O to fetch the data block containing the row. For index-organized tables, access by a secondary index varies, depending on the use and accuracy of physical guesses:
·         Without physical guesses, access involves two index scans: a scan of the secondary index followed by a scan of the primary key index.

·         With physical guesses, access depends on their accuracy:
o    With accurate physical guesses, access involves a secondary index scan and an additional I/O to fetch the data block containing the row.
o    With inaccurate physical guesses, access involves a secondary index scan and an I/O to fetch the wrong data block (as indicated by the guess), followed by an index unique scan of the index organized table by primary key value.

Bitmap Indexes on Index-Organized Tables

A secondary index on an index-organized table can be a bitmap index. As explained in "Bitmap Indexes", a bitmap index stores a bitmap for each index key.
When bitmap indexes exist on an index-organized table, all the bitmap indexes use a heap-organized mapping table. The mapping table stores the logical rowids of the index-organized table. Each mapping table row stores one logical rowid for the corresponding index-organized table row.
The database accesses a bitmap index using a search key. If the database finds the key, then the bitmap entry is converted to a physical rowid. With heap-organized tables, the database uses the physical rowid to access the base table. With index-organized tables, the database uses the physical rowid to access the mapping table, which in turn yields a logical rowid that the database uses to access the index-organized table. Figure 2 illustrates index access for a query of the departments_iot table.

Figure 2 Bitmap Index on Index-Organized Table



Note:
Movement of rows in an index-organized table does not leave the bitmap indexes built on that index-organized table unusable.


I hope you all have enjoyed reading this article. Comments are welcome....


No comments:

Post a Comment