An index-organized table is
a table stored in a variation of a B-tree index structure. In a heap-organized table, rows are inserted where they fit. In
an index-organized table, rows are stored in an index defined on the primary
key for the table. Each index entry in the B-tree also stores the non-key
column values. Thus, the index is the data, and the data is the index.
Applications manipulate index-organized tables just like heap-organized tables,
using SQL statements.
For an analogy of an index-organized table, suppose a human
resources manager has a book case of cardboard boxes. Each box is labeled with
a number—1, 2, 3, 4, and so on—but the boxes do not sit on the shelves in
sequential order. Instead, each box contains a pointer to the shelf location of
the next box in the sequence.
Folders containing employee records are stored in each
box. The folders are sorted by employee ID. Employee King has ID 100, which is
the lowest ID, so his folder is at the bottom of box 1. The folder for employee
101 is on top of 100, 102 is on top of 101, and so on until box 1 is full. The
next folder in the sequence is at the bottom of box 2.
In this analogy, ordering folders by employee ID makes it
possible to search efficiently for folders without having to maintain a
separate index. Suppose a user requests the records for employees 107, 120, and
122. Instead of searching an index in one step and retrieving the folders in a
separate step, the manager can search the folders in sequential order and
retrieve each folder as found.
Index-organized
tables provide faster access to table rows by primary key or a valid prefix of
the key. The presence of non-key columns of a row in the leaf block
avoids an additional data block I/O. For example, the salary of employee
100 is stored in the index row itself. Also, because rows are stored in primary
key order, range access by the primary key or prefix involves minimal block
I/Os. Another benefit is the avoidance of the space overhead of a separate
primary key index.
Index-organized
tables are useful when related pieces of data must be stored together or data
must be physically stored in a specific order. This type of table is often used
for information retrieval, spatial and OLAP .
Index-organized tables are useful when
Ø Basically an
index-organized table is an index without a table. So if you have a table whose
columns consist of the primary key and at most one other column then you have a
possible candidate for INDEX ORGANIZED.
Ø But if you find yourself
contemplating the need for additional indexes on the non-primary key columns
then you're probably better off with a regular heap table. So, as most tables
probably need additional indexes most tables are not suitable for IOTs.
Ø In practice, index
organized tables are most likely to be reference data, code look-up affairs.
Application tables are almost always just heap organized.
Ø An Index-Organized
Table--in contrast to an ordinary table--has its own way of structuring, storing,
and indexing data.
Ø Index organized tables
(IOT) are indexes which actually hold the data which is being indexed, unlike
the indexes which are stored somewhere else and have links to actual data.
Ø An index-organized table
is generally a good choice if you only access data from that table by the key,
the whole key, and nothing but the key.
Ø "small"
"lookup" type tables (e.g. queried frequently, updated infrequently,
fits in a relatively small number of blocks) are excellent candidates for IOTs.
Ø Any table that you
already are going to have an index that covers all the columns anyway (i.e. may
as well save the space used by the table if the index duplicates 100% of the
data) are excellent candidates for IOTs.
Index-Organized
Table Characteristics
The database system performs all operations on
index-organized tables by manipulating the B-tree index structure. Table summarizes the differences
between index-organized tables and heap-organized tables.
Heap-Organized Table
|
Index-Organized Table
|
The rowid uniquely identifies a row.
Primary key constraint may optionally be defined.
|
Primary key uniquely identifies
a row. Primary key constraint must be defined.
|
Logical rowid in
ROWID pseudocolumn allows building
secondary indexes. |
|
Individual rows may be accessed
directly by rowid.
|
Access to individual rows may
be achieved indirectly by primary key.
|
Sequential full table scan returns all rows in some order.
|
A full index scan or fast full index scan returns all
rows in some order.
|
Can be stored in a table cluster with other tables.
|
Cannot be stored in a table
cluster.
|
Can contain a column of the
LONG data type and columns of LOB data
types. |
Can contain LOB columns but not
LONG columns. |
Can contain virtual columns (only relational heap tables are
supported).
|
Cannot contain virtual columns.
|
Figure 1 illustrates the structure of an index-organized departments table. The leaf blocks contain
the rows of the table, ordered sequentially by primary key. For example, the
first value in the first leaf block shows a department ID of 20, department name of Marketing, manager ID of 201, and location ID of 1800.
An index-organized table stores all
data in the same structure and does not need to store the rowid. As shown in Figure 1, leaf block 1 in an index-organized
table might contain entries as follows, ordered by primary key:
20,Marketing,201,1800
30,Purchasing,114,1700
Leaf block 2 in an index-organized
table might contain entries as follows:
50,Shipping,121,1500
60,IT,103,1400
A scan of the index-organized table
rows in primary key order reads the blocks in the following sequence:
- Block 1
- Block 2
To contrast data access in a
heap-organized table to an index-organized table, suppose block 1 of a
heap-organized departments table segment contains rows as follows:
50,Shipping,121,1500
20,Marketing,201,1800
Block 2 contains rows for the same
table as follows:
30,Purchasing,114,1700
60,IT,103,1400
A B-tree index leaf block for this
heap-organized table contains the following entries, where the first value is
the primary key and the second is the rowid:
20,AAAPeXAAFAAAAAyAAD
30,AAAPeXAAFAAAAAyAAA
50,AAAPeXAAFAAAAAyAAC
60,AAAPeXAAFAAAAAyAAB
A scan of the table rows in primary
key order reads the table segment blocks in the following sequence:
- Block 1
- Block 2
- Block 1
- Block 2
Thus, the number of block I/Os in
this example is double the number in the index-organized example.
Index-Organized
Tables with Row Overflow Area
When
creating an index-organized table, you can specify a separate segment as a row overflow area.
In index-organized tables, B-tree index entries can be large because they
contain an entire row, so a separate segment to contain the entries is useful.
In contrast, B-tree entries are usually small because they consist of the key
and rowid.
If a row overflow area is specified, then the database
can divide a row in an index-organized table into the following parts:
·
The index entry
This
part contains column values for all the primary key columns, a physical rowid
that points to the overflow part of the row, and optionally a few of the
non-key columns. This part is stored in the index segment.
·
The overflow part
This
part contains column values for the remaining non-key columns. This part is
stored in the overflow storage area segment.
Secondary
Indexes on Index-Organized Tables
A secondary index is an index on an index-organized
table. In a sense, it is an index on an index. The secondary index is an
independent schema object and is stored separately from the index-organized
table.
Oracle Database uses row identifiers called logical rowids for
index-organized tables. A logical rowid is a base64-encoded representation of
the table primary key. The logical rowid length depends on the primary key
length.
Rows in index leaf blocks can move within or between
blocks because of insertions. Rows in index-organized tables do not migrate as
heap-organized rows do . Because rows in index-organized tables do not have
permanent physical addresses, the database uses logical
rowids based on primary key.
For example, assume that the
departments
table is
index-organized. The location_id
column stores the ID of
each department. The table stores rows as follows, with the last value as the
location ID:10,Administration,200,1700
20,Marketing,201,1800
30,Purchasing,114,1700
40,Human Resources,203,2400
A secondary index on the
location_id
column might have index
entries as follows, where the value following the comma is the logical rowid:1700,*BAFAJqoCwR/+
1700,*BAFAJqoCwQv+
1800,*BAFAJqoCwRX+
2400,*BAFAJqoCwSn+
Secondary indexes provide fast and efficient access to
index-organized tables using columns that are neither the primary key nor a
prefix of the primary key. For example, a query of the names of departments
whose ID is greater than 1700 could use the secondary index to speed data
access.
Why Secondary Indexes have no ROWID
A direct pointer to the table row would be desirable
for a secondary index as well. But that is only possible, if the table row
stays at fixed storage positions. That is, unfortunately, not possible if the
row is part of an index structure, which is kept in order. Keeping the index
order needs to move rows occasionally. This is also true for operations that do
not affect the row itself. An insert statement, for example, might split a leaf node to
gain space for the new entry. That means that some entries are moved to a new
data block at a different place.
A heap table, on
the other hand, doesn’t keep the rows in any order. The database saves new
entries wherever it finds enough space. Once written, data doesn’t move in heap
tables.
Accessing a secondary
index does not deliver a
ROWID
but a logical key for searching the
clustered index. A single access, however, is not sufficient for searching
clustered index—it requires a full tree traversal. That means that accessing a
table via a secondary index searches two indexes: the secondary index once (INDEX RANGE SCAN
), then the
clustered index for each row found in the secondary index (INDEX UNIQUE SCAN
).
Figure: Secondary Index on an IOT
Figure makes it clear, that the B-tree of the clustered index
stands between the secondary index and the table data.
Important
Accessing an
index-organized table via a secondary index is very inefficient.
Accessing an
index-organized table via a secondary index is very inefficient, and it can be
prevented in the same way one prevents a table access on a heap table: by using
an index-only scan—in this
case better described as “secondary-index-only scan”. The performance advantage
of an index-only scan is even bigger because it not only prevents a single
access but an entire
INDEX UNIQUE SCAN
.
Using this example
we can also see that databases exploit all the redundancies they have. Bear in
mind that a secondary index stores the clustering key for each index entry.
Consequently, we can query the clustering key from a secondary index without
accessing the index-organized table:
SELECT sale_id
FROM sales_iot
WHERE sale_date = ?
-------------------------------------------------
| Id | Operation | Name | Cost |
-------------------------------------------------
| 0 | SELECT STATEMENT | | 4 |
|* 1 | INDEX RANGE SCAN| SALES_IOT_DATE | 4 |
-------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("SALE_DATE"=:DT)
The table
SALES_IOT
is an
index-organized table that uses SALE_ID
as clustering key.
Although the index SALE_IOT_DATE
is on the SALE_DATE
column only, it
still has a copy of the clustering key SALE_ID
so it can satisfy
the query using the secondary index only.
When selecting
other columns, the database has to run an
INDEX
UNIQUE SCAN
on the clustered
index for each row:SELECT eur_value
FROM sales_iot
WHERE sale_date = ?
---------------------------------------------------
| Id | Operation | Name | Cost |
---------------------------------------------------
| 0 | SELECT STATEMENT | | 13 |
|* 1 | INDEX UNIQUE SCAN| SALES_IOT_PK | 13 |
|* 2 | INDEX RANGE SCAN| SALES_IOT_DATE | 4 |
---------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("SALE_DATE"=:DT)
2 - access("SALE_DATE"=:DT)
Index-organized
tables and clustered indexes are, after all, not as useful as it seems at first
sight. Performance improvements on the clustered index are easily lost on when
using a secondary index. The clustering key is usually longer than a
ROWID
so that the
secondary indexes are larger than they would be on a heap table, often
eliminating the savings from the omission of the heap table. The strength of
index-organized tables and clustered indexes is mostly limited to tables that
do not need a second index. Heap tables have the benefit of providing a
stationary master copy that can be easily referenced.
Important
Tables with one
index only are best implemented as clustered indexes or index-organized tables.
Tables with more
indexes can often benefit from heap tables. You can still use index-only scans
to avoid the table access. This gives you the select performance of a
clustered index without slowing down other indexes.
The Oracle database uses heap tables by default.
Index-organized tables can be created using the
ORGANIZATION INDEX
clause:CREATE TABLE (
id NUMBER NOT NULL PRIMARY KEY,
[...]
) ORGANIZATION INDEX
The Oracle database
always uses the primary key as the clustering key.
Logical Rowids and Physical
Guesses
Secondary indexes use the logical rowids to locate table
rows. A logical rowid includes a physical guess, which is the physical rowid of the
index entry when it was first made. Oracle Database can use physical guesses to
probe directly into the leaf block of the index-organized table, bypassing the
primary key search. When the physical location of a row changes, the logical
rowid remains valid even if it contains a physical guess that is stale.
For a heap-organized table, access by a secondary index
involves a scan of the secondary index and an additional I/O to fetch the data block containing the row. For
index-organized tables, access by a secondary index varies, depending on the
use and accuracy of physical guesses:
·
Without physical guesses, access involves two index
scans: a scan of the secondary index followed by a scan of the primary key
index.
·
With physical guesses, access depends on their accuracy:
o With accurate physical guesses, access involves a secondary
index scan and an additional I/O to fetch the data block containing the row.
o With inaccurate physical guesses, access involves a secondary
index scan and an I/O to fetch the wrong data block (as indicated by the
guess), followed by an index unique scan of the index organized table by
primary key value.
Bitmap Indexes on Index-Organized
Tables
A secondary index on an index-organized table can be a bitmap index. As explained in "Bitmap Indexes", a bitmap
index stores a bitmap for each index key.
When bitmap indexes exist on an index-organized table,
all the bitmap indexes use a heap-organized mapping table. The mapping table stores the logical
rowids of the index-organized table. Each mapping table row stores one logical
rowid for the corresponding index-organized table row.
The database accesses a bitmap index using a search key.
If the database finds the key, then the bitmap entry is converted to a physical
rowid. With heap-organized tables, the database uses the physical rowid to
access the base table. With index-organized tables, the database uses the
physical rowid to access the mapping table, which in turn yields a logical
rowid that the database uses to access the index-organized table. Figure
2 illustrates index access for a query of the
departments_iot
table.
Figure 2 Bitmap Index on
Index-Organized Table
Note:
Movement of rows in an index-organized
table does not leave the bitmap indexes built on that index-organized table
unusable.
I hope you all have enjoyed reading this article. Comments are welcome....
No comments:
Post a Comment