Oracle
Database Advanced Indexing
B-Tree Indexes
- B-Tree Indexes (balanced tree) are the most common type of index.
- B-Tree index stored the ROWID and the index key value in a tree
structure.
- When creating an index, a ROOT block is created, and then BRANCH
blocks are created and finally LEAF blocks.
- Each branch holds the range of data its leaf blocks hold, and each
root holds the range of data its branches hold:
- B-Tree indexes are most useful on columns that appear in the where
clause (SELECT … WHERE EMPNO=1).
- The Oracle server keeps the tree balanced by splitting index
blocks, when new data is inserted to the table.
- Whenever a DML statement is performed on the index’s table, index
activity occurs, making the index to grow (add leaf and branches).
Advantages
- All leaf blocks of the tree are at the same depth.
- B-tree indexes automatically stay balanced.
- All blocks of the B-tree are three-quarters full on the average.
- B-trees provide excellent retrieval performance for a wide range of
queries, including exact match and range searches.
- Inserts, updates, and deletes are efficient, maintaining key order
for fast retrieval.
- B-tree performance is good for both small and large tables, and
does not degrade as the size of a table grows.
CREATE
<UNIQUE|NON UNIQUE> INDEX <index_name>
ON <table_name> (<column_name>,<column_name>…)
TABLESPACE <tablespace_name>;
Example
Create index scott.exp_idx on table
scott.example( name)
Tablespace TOOLS;
What is Reverse-Key Indexes
They are special types of B-Tree indexes and are very useful
when created on columns contain sequential numbers.
When using a regular B-Tree, the index will grow to have many
branches and perhaps several levels, thus causing performance degradation, the
RKI solve the problem by reversing the bytes of each column key and indexing
the new data.
We
can create an index as a reverse key index for performance reasons; they’re
especially suited for Oracle RAC environments. A reverse key index stores the
index entries with their bytes reversed. The ROWIDs are stored in the same format as a
regular index.
When we insert rows in a
column where the database populates one of the columns using an increasing
sequence, each new entry is inserted into the same index block. When each new
key value in an index is greater than the previous value, it is said to be a monotonically
increasing value.
In a RAC database, when
multiple sessions simultaneously insert data into monotonically increasing
index nodes, it leads to contention for the same index block. A reverse key
index prevents this contention because the database inserts the successive
index entries into indexed blocks that are spread out. Although RAC
environments usually use reverse key indexes, any high volume transaction processing
system that is experiencing contention for index blocks could potentially
benefit from this type of an index.
A reverse key index is
actually simple: it simply reverses the index column values before inserting
(storing) into the index.
Instead of storing new
values in the same “hot” index block, the database spreads around the new
entries across a number of blocks, reducing contention for a busy block and
thus, buffer contention (denoted by the buffer busy wait event).
You can see how an index
that’s populated with sequentially increasing values results in all values
going to the rightmost index leaf block. Let’s say that you create a primary
key with the following values:
696900
696901
696902
696903
696904
...
In a regular B-tree index,
these sequential index values are all stored in an identical rightmost index
block, increasing contention for that block. In a reverse key index, Oracle
inserts the same values in the following manner:
009696
109696
209696
309696
409696
...
As we can see, the index values
aren’t in sequential order, although the actual values of the primary key are.
Oracle reverses the bytes in the data before inserting into the index. This
reversing of the values naturally spreads the key values by storing them
non-sequentially all through the index blocks instead of storing them in
sequential order. During an insertion, the reverse key index distributes
insertions across all the leaf keys in the index, thus avoiding hotspots and
enhancing performance.
Using
a reverse key index often speeds up data loads in an Oracle RAC environment
during batch loads. While reverse key indexes are often mentioned in the
context of an Oracle RAC environment, you can consider them even for a single
instance databases if you notice contention for index blocks, shown by the
buffer busy wait event. There are many applications where a primary key column
is populated by an increasing sequence. Often, you’ll notice a significant
buffer busy wait time for the index segments. Consider a reverse key index for these
kinds of situations. The reverse key index has the potential to dramatically
reduce the buffer busy waits and speed up performance.
Disadvantages
of a Reverse Key Index
Ø Can’t
perform range scans on these indexes.
This is because the index entries are
scattered all over instead of being stored sequentially. Reversing the index
key values randomly distributes the index blocks across the index leaf nodes.
Ø There
could be a slight overhead in terms of CPU usage when the database searches on
a reverse key index.
The reason, of course, is that the
database has to reverse the order of bytes in the values of the index so they
represent the actual values stored in the table.
When
to Use a Reverse Key Index
Facing
index block contention, which reveals itself through the well-known “buffer
busy” and “read by other session” wait events. This is usually due to a
situation where you’re dealing with an Oracle RAC OLTP environment and there
are large numbers of concurrent inserts.
If our
application uses monotonically increasing indexes, such as when you use a
primary key generated by an Oracle sequence, you often encounter these types of
contention. In this context, inserts are said to contend for the rightmost
index block because that where the highest values of the sequentially primary
key are stored. In a RAC environment especially, this rightmost index leaf
block contention leads to block transfers among the instances, leading to slow
performance.
In a
RAC environment, Oracle recommends that you use the NOORDER and the CACHE options when creating a sequence in
order to reduce index contention. The rowlock cache statistics from the V$SYSTEM_EVENT view
tells you if monotonically increasing sequences are causing contention in the
database. You can examine the EQ_TYPE column
in the GV$ENQUEUE_STAT view
to determine if sequences are causing any index enqueue waits. If you see a
value of SQ Enqueue for the EQ_TYPE column,
it is usually an indication of contention for a sequence.
If the index key values are derived
from a sequence, you can increase the size of the sequence cache to reduce
index contention. You can easily raise the cache size for a sequence with an alter sequence statement.
Creating
a Reverse Key Index
This method distributes the data evenly in the index. Creating a
RKI is done using the the reverse clause at the end of a regular index
creation statement.
CREATE
INDEX <index_name>
ON <table_name> (<column_name>)
TABLESPACE <tablespace_name>
REVERSE;
Example
CREATE INDEX emp_idx i ON emp_table (firstname,lastname)
REVERSE;
create index dept_idx on emp(department_id) reverse;
We can also convert a normal index into a reverse key index by rebuilding it with the reverse clause, as shown by the following alter index statement:
We can also convert a normal index into a reverse key index by rebuilding it with the reverse clause, as shown by the following alter index statement:
alter index my_idx1 rebuild reverse;
We can change a reverse key index into a regular index by specifying the noreverse clause in an alter index statement
We can change a reverse key index into a regular index by specifying the noreverse clause in an alter index statement
alter inde myidx1 rebuild norverse;
Composite
B-Tree Indexes
Composite Indexes we can
create an index on multiple columns in a table. If we want to create an index
on the EMPLOYEE_ID and DEPARTMENT_ID columns in the employees table, for
example, we can do so, and the result is called a composite or concatenated
index.
Here's an example:
SQL> create index test_idx1 on
employees(employee_id,department_id);
We can create composite
B-tree indexes as well bitmap indexes. The optimizer will take into account a
composite index when the WHERE clause in a query refers to all the columns in
the index or even the leading column. The previous example showed a composite
index with just two columns, but we can have even more columns if we wish.
The following example shows
a composite index created by using three columns (LAST_NAME, JOB_ID, and
SALARY) from the employees table in the HR schema. We do this when you have an
application that frequently refers to these three columns.
SQL> create index employees_idx1 on employees
(last_name,job_id,salary);
Once we create this index,
any query that refers to the LAST_NAME column, or the LAST_NAME and JOB_ID
columns, or all three columns is likely to cause the optimizer to use the
index. A query that doesn't include the leading column in a composite index (LAST_NAME,
in this example) will ignore the index. At least, this was the traditional
behavior. The introduction of index skip scans changes this default behavior.
Understanding Index Skip Scans and Composite Indexes
In earlier releases of the
Oracle Database, a SQL statement used a composite index only if the statement's
constructs used a leading portion of the index. The leading portion of an index
is one or more columns in the index that are specified first in the list of
columns.
For example, say we have the
following composite index:
SQL> create index mycomp_idx on table mytable(a,b,c);
In this index, a, ab, and abc are all considered leading portions of an index. The column or
column combinations b, c, and bc aren't considering leading portions. However,
the introduction of the index skip scan
feature (in the Oracle 9i release) has changed this behavior. An index skip
scan eliminates or skips through a composite index by using logical sub indexes.
Logical sub indexes mean just that: we don't create those indexes. The skip
scanning feature assumes that the composite index is indeed composed of
multiple sub indexes.
When does the database perform a skip scan?
It may do so when our query
predicate doesn't specify the leading column of a composite index. If the leading portion of a composite index
has a small number of distinct
values and the non-leading portion
of the index contains a large number
of distinct values, skip scanning proves
useful.
Let's use a simple example
to demonstrate how index skip scanning works.
The test query is as
follows:
SQL>Select * from customers where
cust_email='Sam@mycompany.com';
The customers table also has
a column named GENDER, which can, of course, take just two values:
M and F.
Here's a sample of the
composite index's entries from an index block:
F,Wolf@company.com,rowid
F,Wolsey@company.com,rowid
F,Wood@company.com,rowid
F,Woodman@company.com,rowid
F,Yang@company.com,rowid
F,Zimmerman@company.com,rowid
F,Wolsey@company.com,rowid
F,Wood@company.com,rowid
F,Woodman@company.com,rowid
F,Yang@company.com,rowid
F,Zimmerman@company.com,rowid
M,Abbassi@company.com,rowid
M,Alapati@company.com,rowid
M,Alapati@company.com,rowid
Say we issue a query that
only specifies the CUST_MAIL column,
and not the leading column GENDER,
in its WHERE clause. Since the leading column gender has only two distinct
values, it really doesn't matter if you don't specify it in the query.
The database divides our
composite index into two logical sub indexes, one with the key M and the other
with the key F. Even though you haven't specified the leading column gender,
the database searches the two logical sub indexes one after the other and gets
you the results. In other words, the database treats the query as this:
select * from sh.customers where cust_gender = 'F'
and cust_email = 'Alapati@company.com'
union all
select * from sh.customers WHERE cust_gender = 'M'
and cust_email = 'Alapati@company.com';
and cust_email = 'Alapati@company.com'
union all
select * from sh.customers WHERE cust_gender = 'M'
and cust_email = 'Alapati@company.com';
Ordering the Columns in a Composite Index
When creating a composite
index, a big question is how to order the columns in the multi-column index.
Oracle recommends that place the most commonly accessed column first in the
index. Traditionally, it was thought that you should avoid using a low cardinality column (a column with few distinct values) as
the leading column in a composite index. However, regardless of the index
order, the database can navigate straight to the leaf block containing the
indexed column values because the index leaf branch entries contain column
entries based on all indexed columns. In fact, a leading column with lower cardinality may have more advantages,
as the optimizer is likely to at least consider using an index skip scan in these cases.
It has
also been suggested to use the clustering factor as a criterion when deciding
which column should be the leading index column in a composite index. The
clustering factor indicates how well ordered the table's rows are in comparison
to the way the index entries are ordered in an index.
For example, an arrangement
that would “guarantee” the order of the table rows to match the order of the
index entries (and therefore be reflected by the resulting clustering factor),
would be if you loaded a table from a sorted set of input data in a single
action. One of the most common reasons we use a composite index is when an
important query refers to two or more columns, none of which have a high degree
of selectivity. By constructing a composite index, we increase the odds of the
optimizer choosing to use that composite index, whereas it would probably have
bypassed any indexes we created separately on the two columns, both of which
have a very low selectivity.
Note Selectivity is a computation based on
column statistics (particularly on the number of distinct values and high/low
values). The optimizer computes selectivity for multiple columns in an “ANDed”
predicate by computing the individual selectivity (1/number of distinct values)
for each column and then multiplying those results together to get the overall
selectivity.
For example, if we have
WHERE GENDER = 'F' AND STATUS = 'ACTIVE' where both gender and status have only
2 distinct values, each column will have a selectivity
of .5. The total predicate selectivity
is then .5 * .5 or .25. If a composite index exists that has both columns,
the optimizer will compute index selectivity using the .25 combined predicate
selectivity along with the index stats (like the clustering factor) to make its
final cost calculation for using the index. A big advantage of using a composite index is that if all the columns
required by a query are in the composite index itself, the database returns the
values of the columns from the index without subsequently having to access the
table.
Thus, we'll see a reduced
I/O when using composite indexes in most cases since we're avoiding the scan of
the table itself using the ROWIDs, as is the case when we use an index on a
single column. A key criterion in deciding how to order keys in a composite
index is to ensure that the leading portion of the index consists of keys used
in WHERE clauses. If some of the keys are often specified in WHERE clauses,
make sure that these keys make up the leading portion of the index. This
ensures that queries that specify only these keys will use the index.
Choosing Keys for Composite Indexes
We can create a composite
index with its columns in any order we want, but our goal should be to create a
composite index only when the different
keys appear frequently together in an application's WHERE clauses and we're
currently using an AND operator to combine the columns.
A composite index would be a
better choice in this case. The one thing we must do is get a rough estimate of
the selectivity of the columns we want to include in a composite index. If the
combined selectivity of the columns is better than the individual selectivity
of the columns, the composite index will be beneficial. We can also consider a
composite index in cases where key queries do a select of the same set of keys
based on multiple key values. Simply create a composite index with all the keys
for better performance.
Let's use a couple of simple
examples to drive home our point as to how a composite index will benefit us u
by reducing I/O when all the columns required to satisfy a query are in the
index itself.
In the first example, we
create a single column index and check the explain plan.
SQL> create table test_tab (a number, b varchar2(10),
c varchar2(10));
Table created.
SQL> create index single_idx1 on test_tab (a);
Index created.
SQL> select b,c,a from test_tab where b='pc-5895' and c='pc-2893' and a=564;
no rows selected
Table created.
SQL> create index single_idx1 on test_tab (a);
Index created.
SQL> select b,c,a from test_tab where b='pc-5895' and c='pc-2893' and a=564;
no rows selected
Execution Plan
--------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3182375932
--------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 20 | 2 (0) |00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID| TEST_TAB | 1 | 20 | 2 (0) |00:00:01 |
|* 2 | INDEX RANGE SCAN | SINGLE_IDX1| 1 | | 1 (0) |00:00:01 |
--------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
--------------------------------------------------------------------------------------------------------------------------
1 - filter("B"='pc-5895' AND “C"='pc-2893')
2 - access("A"=564)
The optimizer uses an index scan, but it also has to scan the table rows since all the required columns are not part of your single column index. This means more I/O and more time to complete the query in most cases. We now drop the index on the single column and create a composite index using all three columns this time.
SQL> drop index single_idx1;
Index dropped.
SQL> create index comp_idx1 on test_tab(a,b,c);
Index created.
SQL> select b,c,a from test_tab where b='pc-5895' and c='pc-2893' and a=564;
no rows selected
Index dropped.
SQL> create index comp_idx1 on test_tab(a,b,c);
Index created.
SQL> select b,c,a from test_tab where b='pc-5895' and c='pc-2893' and a=564;
no rows selected
Execution Plan
-------------------------------------------------------------------------------------------------------------
Plan hash value: 1685463053
-------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes| Cost (%CPU) | Time |
-------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 20 | 1 (0) | 00:00:01 |
|* 1 | INDEX RANGE SCAN | COMP_IDX1 | 1 | 20 | 1 (0) | 00:00:01 |
-------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
-------------------------------------------------------------------------------------------------------------
1 - access("A"=564 AND “B"='pc-5895' AND “C"='pc-2893')
-------------------------------------------------------------------------------------------------------------
Plan hash value: 1685463053
-------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes| Cost (%CPU) | Time |
-------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 20 | 1 (0) | 00:00:01 |
|* 1 | INDEX RANGE SCAN | COMP_IDX1 | 1 | 20 | 1 (0) | 00:00:01 |
-------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
-------------------------------------------------------------------------------------------------------------
1 - access("A"=564 AND “B"='pc-5895' AND “C"='pc-2893')
Since all the requisite data
is found within the new composite index, the database doesn't have to perform
the additional table scan.
I hope you all have enjoyed reading this article. Comments are welcome....
Related Posts: