Job Search

Sunday, April 23, 2017

Oracle Database Indexing - Part 2

Oracle Database Advanced Indexing


B-Tree Indexes


  • B-Tree Indexes (balanced tree) are the most common type of index.
  • B-Tree index stored the ROWID and the index key value in a tree structure.
  • When creating an index, a ROOT block is created, and then BRANCH blocks are created and finally LEAF blocks.
  • Each branch holds the range of data its leaf blocks hold, and each root holds the range of data its branches hold:
  • B-Tree indexes are most useful on columns that appear in the where clause (SELECT … WHERE EMPNO=1).
  • The Oracle server keeps the tree balanced by splitting index blocks, when new data is inserted to the table.
  • Whenever a DML statement is performed on the index’s table, index activity occurs, making the index to grow (add leaf and branches).
Advantages
  • All leaf blocks of the tree are at the same depth.
  • B-tree indexes automatically stay balanced.
  • All blocks of the B-tree are three-quarters full on the average.
  • B-trees provide excellent retrieval performance for a wide range of queries, including exact match and range searches.
  • Inserts, updates, and deletes are efficient, maintaining key order for fast retrieval.
  • B-tree performance is good for both small and large tables, and does not degrade as the size of a table grows.

CREATE <UNIQUE|NON UNIQUE>  INDEX <index_name>
ON <table_name> (<column_name>,<column_name>…)
TABLESPACE <tablespace_name>;

Example
Create index   scott.exp_idx   on table scott.example(  name)
Tablespace TOOLS;


What is Reverse-Key Indexes
They are special types of B-Tree indexes and are very useful when created on columns contain sequential numbers.
When using a regular B-Tree, the index will grow to have many branches and perhaps several levels, thus causing performance degradation, the RKI solve the problem by reversing the bytes of each column key and indexing the new data.

We can create an index as a reverse key index for performance reasons; they’re especially suited for Oracle RAC environments. A reverse key index stores the index entries with their bytes reversed. The ROWIDs are stored in the same format as a regular index.

When we insert rows in a column where the database populates one of the columns using an increasing sequence, each new entry is inserted into the same index block. When each new key value in an index is greater than the previous value, it is said to be a monotonically increasing value.

In a RAC database, when multiple sessions simultaneously insert data into monotonically increasing index nodes, it leads to contention for the same index block. A reverse key index prevents this contention because the database inserts the successive index entries into indexed blocks that are spread out. Although RAC environments usually use reverse key indexes, any high volume transaction processing system that is experiencing contention for index blocks could potentially benefit from this type of an index.

A reverse key index is actually simple: it simply reverses the index column values before inserting (storing) into the index.

Instead of storing new values in the same “hot” index block, the database spreads around the new entries across a number of blocks, reducing contention for a busy block and thus, buffer contention (denoted by the buffer busy wait event).

You can see how an index that’s populated with sequentially increasing values results in all values going to the rightmost index leaf block. Let’s say that you create a primary key with the following values:
696900
696901
696902
696903
696904
...
In a regular B-tree index, these sequential index values are all stored in an identical rightmost index block, increasing contention for that block. In a reverse key index, Oracle inserts the same values in the following manner:
009696
109696
209696
309696
409696
...

As we can see, the index values aren’t in sequential order, although the actual values of the primary key are. Oracle reverses the bytes in the data before inserting into the index. This reversing of the values naturally spreads the key values by storing them non-sequentially all through the index blocks instead of storing them in sequential order. During an insertion, the reverse key index distributes insertions across all the leaf keys in the index, thus avoiding hotspots and enhancing performance.

Using a reverse key index often speeds up data loads in an Oracle RAC environment during batch loads. While reverse key indexes are often mentioned in the context of an Oracle RAC environment, you can consider them even for a single instance databases if you notice contention for index blocks, shown by the buffer busy wait event. There are many applications where a primary key column is populated by an increasing sequence. Often, you’ll notice a significant buffer busy wait time for the index segments. Consider a reverse key index for these kinds of situations. The reverse key index has the potential to dramatically reduce the buffer busy waits and speed up performance.

Disadvantages of a Reverse Key Index


Ø  Can’t perform range scans on these indexes.
This is because the index entries are scattered all over instead of being stored sequentially. Reversing the index key values randomly distributes the index blocks across the index leaf nodes.

Ø  There could be a slight overhead in terms of CPU usage when the database searches on a reverse key index.
The reason, of course, is that the database has to reverse the order of bytes in the values of the index so they represent the actual values stored in the table.

When to Use a Reverse Key Index

Facing index block contention, which reveals itself through the well-known “buffer busy” and “read by other session” wait events. This is usually due to a situation where you’re dealing with an Oracle RAC OLTP environment and there are large numbers of concurrent inserts.

If our application uses monotonically increasing indexes, such as when you use a primary key generated by an Oracle sequence, you often encounter these types of contention. In this context, inserts are said to contend for the rightmost index block because that where the highest values of the sequentially primary key are stored. In a RAC environment especially, this rightmost index leaf block contention leads to block transfers among the instances, leading to slow performance.

In a RAC environment, Oracle recommends that you use the NOORDER and the CACHE options when creating a sequence in order to reduce index contention. The rowlock cache statistics from the V$SYSTEM_EVENT view tells you if monotonically increasing sequences are causing contention in the database. You can examine the EQ_TYPE column in the GV$ENQUEUE_STAT view to determine if sequences are causing any index enqueue waits. If you see a value of SQ Enqueue for the EQ_TYPE column, it is usually an indication of contention for a sequence. 

 If the index key values are derived from a sequence, you can increase the size of the sequence cache to reduce index contention. You can easily raise the cache size for a sequence with an alter sequence statement.

Creating a Reverse Key Index


This method distributes the data evenly in the index. Creating a RKI is done using the the reverse clause at the end of a regular index creation statement.

CREATE INDEX <index_name>
ON <table_name> (<column_name>)
TABLESPACE <tablespace_name>
REVERSE;

Example
CREATE INDEX emp_idx i ON emp_table (firstname,lastname) REVERSE;

create index dept_idx on emp(department_id) reverse;

We can also convert a normal index into a reverse key index by rebuilding it with the reverse clause, as shown by the following alter index statement:

alter index my_idx1 rebuild reverse;

We can change a reverse key index into a regular index by specifying the noreverse clause in an alter index statement
alter inde myidx1 rebuild norverse;

Composite B-Tree Indexes
Composite Indexes we can create an index on multiple columns in a table. If we want to create an index on the EMPLOYEE_ID and DEPARTMENT_ID columns in the employees table, for example, we can do so, and the result is called a composite or concatenated index.

Here's an example:
SQL> create index test_idx1 on employees(employee_id,department_id);

We can create composite B-tree indexes as well bitmap indexes. The optimizer will take into account a composite index when the WHERE clause in a query refers to all the columns in the index or even the leading column. The previous example showed a composite index with just two columns, but we can have even more columns if we wish.
The following example shows a composite index created by using three columns (LAST_NAME, JOB_ID, and SALARY) from the employees table in the HR schema. We do this when you have an application that frequently refers to these three columns.
SQL> create index employees_idx1 on employees (last_name,job_id,salary);

Once we create this index, any query that refers to the LAST_NAME column, or the LAST_NAME and JOB_ID columns, or all three columns is likely to cause the optimizer to use the index. A query that doesn't include the leading column in a composite index (LAST_NAME, in this example) will ignore the index. At least, this was the traditional behavior. The introduction of index skip scans changes this default behavior.

Understanding Index Skip Scans and Composite Indexes
In earlier releases of the Oracle Database, a SQL statement used a composite index only if the statement's constructs used a leading portion of the index. The leading portion of an index is one or more columns in the index that are specified first in the list of columns.
For example, say we have the following composite index:
SQL> create index mycomp_idx on table mytable(a,b,c);

In this index, a, ab, and abc are all considered leading portions of an index. The column or column combinations b, c, and bc aren't considering leading portions. However, the introduction of the index skip scan feature (in the Oracle 9i release) has changed this behavior. An index skip scan eliminates or skips through a composite index by using logical sub indexes. Logical sub indexes mean just that: we don't create those indexes. The skip scanning feature assumes that the composite index is indeed composed of multiple sub indexes.

When does the database perform a skip scan?
It may do so when our query predicate doesn't specify the leading column of a composite index. If the leading portion of a composite index has a small number of distinct values and the non-leading portion of the index contains a large number of distinct values, skip scanning proves useful.

Let's use a simple example to demonstrate how index skip scanning works.
The test query is as follows:
SQL>Select * from customers where cust_email='Sam@mycompany.com';

The customers table also has a column named GENDER, which can, of course, take just two values:
M and F.
Here's a sample of the composite index's entries from an index block:
F,Wolf@company.com,rowid
F,Wolsey@company.com,rowid
F,Wood@company.com,rowid
F,Woodman@company.com,rowid
F,Yang@company.com,rowid
F,Zimmerman@company.com,rowid

M,Abbassi@company.com,rowid
M,Alapati@company.com,rowid

Say we issue a query that only specifies the CUST_MAIL column, and not the leading column GENDER, in its WHERE clause. Since the leading column gender has only two distinct values, it really doesn't matter if you don't specify it in the query.

The database divides our composite index into two logical sub indexes, one with the key M and the other with the key F. Even though you haven't specified the leading column gender, the database searches the two logical sub indexes one after the other and gets you the results. In other words, the database treats the query as this:
select * from sh.customers where cust_gender = 'F'
  and cust_email = 'Alapati@company.com'
union all
select * from sh.customers WHERE cust_gender = 'M'
  and cust_email = 'Alapati@company.com'; 

Ordering the Columns in a Composite Index
When creating a composite index, a big question is how to order the columns in the multi-column index. Oracle recommends that place the most commonly accessed column first in the index. Traditionally, it was thought that you should avoid using a low cardinality column (a column with few distinct values) as the leading column in a composite index. However, regardless of the index order, the database can navigate straight to the leaf block containing the indexed column values because the index leaf branch entries contain column entries based on all indexed columns. In fact, a leading column with lower cardinality may have more advantages, as the optimizer is likely to at least consider using an index skip scan in these cases.

 It has also been suggested to use the clustering factor as a criterion when deciding which column should be the leading index column in a composite index. The clustering factor indicates how well ordered the table's rows are in comparison to the way the index entries are ordered in an index.

For example, an arrangement that would “guarantee” the order of the table rows to match the order of the index entries (and therefore be reflected by the resulting clustering factor), would be if you loaded a table from a sorted set of input data in a single action. One of the most common reasons we use a composite index is when an important query refers to two or more columns, none of which have a high degree of selectivity. By constructing a composite index, we increase the odds of the optimizer choosing to use that composite index, whereas it would probably have bypassed any indexes we created separately on the two columns, both of which have a very low selectivity. 

Note Selectivity is a computation based on column statistics (particularly on the number of distinct values and high/low values). The optimizer computes selectivity for multiple columns in an “ANDed” predicate by computing the individual selectivity (1/number of distinct values) for each column and then multiplying those results together to get the overall selectivity.

For example, if we have WHERE GENDER = 'F' AND STATUS = 'ACTIVE' where both gender and status have only 2 distinct values, each column will have a selectivity of .5. The total predicate selectivity is then .5 * .5 or .25. If a composite index exists that has both columns, the optimizer will compute index selectivity using the .25 combined predicate selectivity along with the index stats (like the clustering factor) to make its final cost calculation for using the index. A big advantage of using a composite index is that if all the columns required by a query are in the composite index itself, the database returns the values of the columns from the index without subsequently having to access the table.
Thus, we'll see a reduced I/O when using composite indexes in most cases since we're avoiding the scan of the table itself using the ROWIDs, as is the case when we use an index on a single column. A key criterion in deciding how to order keys in a composite index is to ensure that the leading portion of the index consists of keys used in WHERE clauses. If some of the keys are often specified in WHERE clauses, make sure that these keys make up the leading portion of the index. This ensures that queries that specify only these keys will use the index.

Choosing Keys for Composite Indexes
We can create a composite index with its columns in any order we want, but our goal should be to create a composite index only when the different keys appear frequently together in an application's WHERE clauses and we're currently using an AND operator to combine the columns.

A composite index would be a better choice in this case. The one thing we must do is get a rough estimate of the selectivity of the columns we want to include in a composite index. If the combined selectivity of the columns is better than the individual selectivity of the columns, the composite index will be beneficial. We can also consider a composite index in cases where key queries do a select of the same set of keys based on multiple key values. Simply create a composite index with all the keys for better performance.
Let's use a couple of simple examples to drive home our point as to how a composite index will benefit us u by reducing I/O when all the columns required to satisfy a query are in the index itself.

In the first example, we create a single column index and check the explain plan.
SQL> create table test_tab (a number, b varchar2(10), c varchar2(10));
Table created.

SQL> create index single_idx1 on test_tab (a);
Index created.

SQL> select b,c,a from test_tab where b='pc-5895' and c='pc-2893' and a=564;
no rows selected

Execution Plan
--------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3182375932
--------------------------------------------------------------------------------------------------------------------------
| Id   | Operation                                                | Name              | Rows | Bytes| Cost (%CPU)|  Time     |
--------------------------------------------------------------------------------------------------------------------------
|   0  | SELECT STATEMENT                         |                          |   1      |    20  |     2   (0)        |00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| TEST_TAB     |   1      |    20  |     2   (0)        |00:00:01 |
|*  2 |   INDEX RANGE SCAN                       | SINGLE_IDX1|   1     |          |     1   (0)        |00:00:01 |
--------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
--------------------------------------------------------------------------------------------------------------------------
   1 - filter("B"='pc-5895' AND “C"='pc-2893')
   2 - access("A"=564)

The optimizer uses an index scan, but it also has to scan the table rows since all the required columns are not part of your single column index. This means more I/O and more time to complete the query in most cases. We now drop the index on the single column and create a composite index using all three columns this time.
SQL> drop index single_idx1;
Index dropped.

SQL>  create index comp_idx1 on test_tab(a,b,c);
Index created.

SQL> select b,c,a from test_tab where b='pc-5895' and c='pc-2893' and a=564;
no rows selected

Execution Plan
-------------------------------------------------------------------------------------------------------------
Plan hash value: 1685463053
-------------------------------------------------------------------------------------------------------------
| Id   | Operation                          | Name              | Rows  | Bytes| Cost (%CPU) |  Time       |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                         |     1     |    20   |     1   (0)         | 00:00:01  |
|*  1 |  INDEX RANGE SCAN  | COMP_IDX1 |     1     |    20   |     1   (0)         | 00:00:01  |
-------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
-------------------------------------------------------------------------------------------------------------
   1 - access("A"=564 AND “B"='pc-5895' AND “C"='pc-2893')

Since all the requisite data is found within the new composite index, the database doesn't have to perform the additional table scan.


I hope you all have enjoyed reading this article. Comments are welcome....


Related Posts: