
Thursday, October 11, 2007

Insert Data Into 2 Tables in 1 Shot

Hi All,

Have you ever wondered how to insert data from one table into 2 tables in one shot?

Oracle provides a feature for this called multitable insert (INSERT ALL).

I have used this feature as follows.

First, I insert the good data into the fact table and, in the same statement, the rowids of those staging rows into the ROWID_TAB table.

Then I insert into the reject table the staging rows whose rowids are not present in ROWID_TAB.

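Example: to insert data into the fact table and the rowid table:

INSERT /*+ APPEND */ ALL
  INTO TABLE_FACT (col1, col2, col3)
    VALUES (col1, col2, col3)     -- values come from the SELECT list below
  INTO ROWID_TAB (ROWID_COL)      -- ROWID_COL should be of type ROWID
    VALUES (stg_rowid)
SELECT col1,
       col2,
       col3,
       rowid AS stg_rowid         -- alias the pseudocolumn so VALUES can use it
FROM   TABLE_STG
WHERE  Cond1 AND Cond2 ...;       -- placeholder: your data-quality conditions

-- Commit before ROWID_TAB is read again: after a direct-path (APPEND) load,
-- the same transaction cannot query the table (ORA-12838).
COMMIT;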

To insert the remaining rows into the reject table:

INSERT /*+ APPEND */ INTO TABLE_FACT_REJ (col1, col2, col3)
SELECT col1,
       col2,
       col3
FROM   TABLE_STG
WHERE  rowid NOT IN (SELECT ROWID_COL
                     FROM   ROWID_TAB);
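The two-pass idea can be tried out without an Oracle instance. Below is a minimal sketch using Python's built-in `sqlite3`, under several assumptions:

- SQLite has no multitable INSERT ALL, so the "good rows" step is split into two INSERTs inside one transaction.
- `col2 IS NOT NULL` is a hypothetical stand-in for `Cond1 AND Cond2 ...`.
- A SQLite TEMP table plays the role of ROWID_TAB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE table_stg      (col1 INTEGER, col2 TEXT, col3 REAL);
    CREATE TABLE table_fact     (col1 INTEGER, col2 TEXT, col3 REAL);
    CREATE TABLE table_fact_rej (col1 INTEGER, col2 TEXT, col3 REAL);
    CREATE TEMP TABLE rowid_tab (rowid_col INTEGER);
""")
# Hypothetical staging data: rows 2 and 4 fail the (assumed) check on col2.
cur.executemany(
    "INSERT INTO table_stg VALUES (?, ?, ?)",
    [(1, "a", 1.5), (2, None, 2.5), (3, "c", 3.5), (4, None, 4.5)],
)

with conn:  # one transaction; commits on success
    # Pass 1a: load the good rows into the fact table.
    cur.execute("""
        INSERT INTO table_fact (col1, col2, col3)
        SELECT col1, col2, col3 FROM table_stg WHERE col2 IS NOT NULL
    """)
    # Pass 1b: remember which staging rowids were loaded.
    cur.execute("""
        INSERT INTO rowid_tab (rowid_col)
        SELECT rowid FROM table_stg WHERE col2 IS NOT NULL
    """)
    # Pass 2: every staging row whose rowid was not recorded is a reject.
    cur.execute("""
        INSERT INTO table_fact_rej (col1, col2, col3)
        SELECT col1, col2, col3 FROM table_stg
        WHERE rowid NOT IN (SELECT rowid_col FROM rowid_tab)
    """)

fact = cur.execute("SELECT col1 FROM table_fact ORDER BY col1").fetchall()
rej = cur.execute("SELECT col1 FROM table_fact_rej ORDER BY col1").fetchall()
print(fact, rej)  # [(1,), (3,)] [(2,), (4,)]
```

A useful property of driving the reject step off the recorded rowids is that any row the filter did not accept ends up in the reject table. That includes rows where a condition evaluates to NULL, so no second, inverted filter has to be kept in sync.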

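Another improvement is to make ROWID_TAB a global temporary table. If you commit between the two inserts (which a direct-path /*+ APPEND */ load requires before the table can be read again), create it with ON COMMIT PRESERVE ROWS. The default, ON COMMIT DELETE ROWS, would empty it at that commit.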

For more information on this, please refer to the links below:
1. http://certcities.com/editorial/columns/story.asp?EditorialsID=51
2. http://www.dba-oracle.com/t_global_temporary_tables.htm

Hope this helps....

Wednesday, October 10, 2007

Selectivity factor when deciding on Indexes

Ever wondered how to arrive at the indexing strategy for your warehouse? What parameters should be considered when deciding which fields to index? One parameter I feel is important is what I call the 'Selectivity Factor'. Here is my definition of it (this is my own term, so you won't find it corroborated elsewhere):

Selectivity factor for a field: the percentage of rows selected from the table after applying a filter on that field.

Keep in mind that this factor will be different for different values of the same field. But you will have an estimate of the number of rows in the table for each of these values, so you can take an average of the selectivity factors. The average selectivity factor depends only on the data, so it is the same irrespective of the SQL in which the field is used.
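Here is a small Python sketch of the two definitions above: per-value selectivity and its average. It assumes a hypothetical `region` column held as a plain list, and it uses the simple (unweighted) average of the per-value factors.

```python
from collections import Counter

def selectivity_factors(values):
    """Percentage of rows selected by `field = value`, for each distinct value."""
    counts = Counter(values)
    total = len(values)
    return {value: 100.0 * n / total for value, n in counts.items()}

def average_selectivity(values):
    """Unweighted mean of the per-value selectivity factors."""
    factors = selectivity_factors(values)
    return sum(factors.values()) / len(factors)

# Hypothetical 'region' column with 100 rows.
region = ["N"] * 40 + ["S"] * 30 + ["E"] * 20 + ["W"] * 10
factors = selectivity_factors(region)
avg = average_selectivity(region)
print(factors)  # {'N': 40.0, 'S': 30.0, 'E': 20.0, 'W': 10.0}
print(avg)      # 25.0
```

The per-value factors always add up to 100%. As a result, this unweighted average works out to 100 divided by the number of distinct values in the field.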

Now if you have many SQLs with the same kind of filter (on a date field, for example), then it makes sense to have an index on that field, right? Not always! What if the filter qualifies 90% of the records most of the time? Then there is no point in having an index on that field. In fact it will be an overhead, as the database first scans the index and then fetches each row by its rowid. If you use a B*Tree index, you might end up doing a very high number of logical I/Os, which means reading the same set of blocks multiple times. If you were reading only 10% of the table, the number of logical I/Os would come down drastically. For example, suppose you have a query like this:


Select * from list_of_politicians where sex = 'M'

In this case the field "sex" has a cardinality of 2, 'M' and 'F' (assuming there is no ambiguity about this data!). Assume that the table has 100,000 records, of which 90,000 are 'M' and 10,000 are 'F'. Suppose the block size is 8 KB and the row size is 80 bytes (roughly 100 rows per block); then approximately 1,000 blocks are used for the table. Here the query returns 90% of the table. If the B*Tree index is used, a table-block logical I/O is done for each of the 90,000 matching rows, which means each block gets read about 90 times on average! This is a huge performance hit. A full table scan reads each block only once, so it would have been much faster.
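The arithmetic in the example can be checked with a few lines of Python. This is a rough worst-case model under stated assumptions:

- Every matching row costs one table-block visit.
- Index block reads are ignored.
- Block overhead is ignored, so 8 KB / 80 bytes is rounded to 100 rows per block, as in the post.

```python
# Figures from the list_of_politicians example.
total_rows = 100_000
matching_rows = 90_000      # rows with sex = 'M'
rows_per_block = 100        # ~8 KB block / 80-byte rows, ignoring block overhead

table_blocks = total_rows // rows_per_block

# Index access path: about one table-block visit per matching row.
index_path_block_visits = matching_rows
visits_per_block = index_path_block_visits / table_blocks

# Full table scan: every block is read once.
full_scan_block_reads = table_blocks

print(table_blocks, visits_per_block, full_scan_block_reads)  # 1000 90.0 1000
```

The model makes the comparison plain: roughly 90,000 block visits through the index versus 1,000 for a full table scan.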


That is where I feel the "Selectivity factor" makes much more sense as a quantifiable parameter for taking that decision. In the above scenario, the selectivity factor is 90% for 'M' and 10% for 'F', which means the average selectivity factor is 50%. This is a high number and thus not suited for B*Tree indexes. On the other hand, a Bitmap index will be very useful in this scenario. In a bitmap index, each index entry stores references to many rows. The bitmap structure is something like this:

Row  1 2 3 4 5 6 7 8
M    0 1 1 1 1 1 1 1
F    1 0 0 0 0 0 0 0

As you can see, there are only two entries in the index. The first entry shows that the value 'M' appears in rows 2, 3, 4, 5, 6, 7 and 8, and the second shows that 'F' appears in row 1. So if I run the query

Select * from list_of_politicians where sex = 'M'

the database can get the rowids of all the rows with 'M' for sex by reading only a few index entries (in practice there will be more than one entry per value, depending on the number of rows and the storage size). This will perform much faster than a B*Tree index.
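To make the bitmap diagram concrete, here is a toy Python model of it. Each distinct value maps to one integer whose bit i is set when row i+1 holds that value. This only sketches the idea; Oracle's real bitmap indexes store compressed bitmap pieces keyed by rowid ranges, not Python integers.

```python
def build_bitmaps(column):
    """Map each distinct value to an int whose bit i is set if row i+1 has it."""
    bitmaps = {}
    for i, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << i)
    return bitmaps

def rows_for(bitmap):
    """Decode a bitmap into 1-based row numbers."""
    return [i + 1 for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# The eight rows from the diagram: row 1 is 'F', rows 2-8 are 'M'.
sex = ["F", "M", "M", "M", "M", "M", "M", "M"]
bitmaps = build_bitmaps(sex)
print(rows_for(bitmaps["M"]))  # [2, 3, 4, 5, 6, 7, 8]
print(rows_for(bitmaps["F"]))  # [1]
```

One entry per distinct value is enough to locate every matching row. That is why a low-cardinality field like this one suits a bitmap index.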

So to sum it up,


When the selectivity factor (or the average selectivity factor) for the field is low (say, less than 10%) and a small portion of the table is selected, it is helpful to have a B*Tree index on the field.

When the selectivity factor (or the average selectivity factor) for the field is high (say, greater than 25%) and you are selecting a small to medium portion of the table, it is useful to have a Bitmap index on the field.

On the other hand, if the selectivity factor is high and you are reading a large portion of the table, it is better to leave the field alone.

All of this applies to the rule-based optimizer. If you are using the cost-based optimizer, gathering statistics on the table (with ANALYZE or DBMS_STATS.GATHER_TABLE_STATS) before running the query lets the optimizer decide whether to use the index. So in the above case, where you are selecting 90% of the records, if the table has statistics the CBO should choose a full table scan instead of the B*Tree index.

So next time you are planning your indexes, make sure to study the data, know the cardinality of the fields and also the Average Selectivity factor of the fields.

Tuesday, October 9, 2007

Concatenation on group by

Hi all,
As an extension to Chandan's previous post, I would like to share this sample code, which shows the usage of the SYS_CONNECT_BY_PATH function in Oracle.

Using the SYS_CONNECT_BY_PATH function

Source table “temp”:

Name     Deptno
-------  ------
jagan    1
guru     2
varun    2
bharath  1
manju    1
giri     3
chandan  3

SELECT deptno,
       SUBSTR(SYS_CONNECT_BY_PATH(name, ','), 2) name_list  -- drop the leading ','
FROM  (SELECT name,
              deptno,
              COUNT(*) OVER (PARTITION BY deptno) cnt,                   -- group size
              ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY name) seq  -- position in group
       FROM   temp
       WHERE  deptno IS NOT NULL)
WHERE  seq = cnt            -- keep only the complete path for each deptno
START WITH seq = 1
CONNECT BY PRIOR seq + 1 = seq
       AND PRIOR deptno = deptno
ORDER BY deptno;

Result:
deptno  name_list
------  -------------------
1       bharath,jagan,manju
2       guru,varun
3       chandan,giri
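For databases without CONNECT BY, the same idea can be written as a standard recursive CTE. Here is a sketch using Python's `sqlite3`, which needs SQLite 3.25 or later for the window functions. The table is called `temp_names` here (an assumption) to avoid any clash with SQLite's built-in `temp` schema. The anchor member plays the role of START WITH, and the recursive member plays the role of CONNECT BY PRIOR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_names (name TEXT, deptno INTEGER)")
conn.executemany(
    "INSERT INTO temp_names VALUES (?, ?)",
    [("jagan", 1), ("guru", 2), ("varun", 2), ("bharath", 1),
     ("manju", 1), ("giri", 3), ("chandan", 3)],
)

query = """
WITH RECURSIVE
  numbered AS (           -- same inner query as the Oracle version
    SELECT name, deptno,
           COUNT(*)     OVER (PARTITION BY deptno)               AS cnt,
           ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY name) AS seq
    FROM temp_names
    WHERE deptno IS NOT NULL
  ),
  path AS (
    SELECT deptno, seq, cnt, name AS name_list   -- START WITH seq = 1
    FROM numbered
    WHERE seq = 1
    UNION ALL                                    -- CONNECT BY PRIOR seq + 1 = seq
    SELECT n.deptno, n.seq, n.cnt, p.name_list || ',' || n.name
    FROM path p
    JOIN numbered n ON n.deptno = p.deptno AND n.seq = p.seq + 1
  )
SELECT deptno, name_list
FROM path
WHERE seq = cnt          -- keep only the complete list per deptno
ORDER BY deptno
"""
rows = conn.execute(query).fetchall()
for deptno, name_list in rows:
    print(deptno, name_list)
```

Because the path is built with `||` instead of a path function, no leading separator has to be stripped off.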

Thursday, October 4, 2007

Concatenation on group by

Hi,
I had a requirement in my project to group by one column of a table and, for each group, produce a concatenation of the row values of a second column. For example, consider a table with two columns:

Department  Employee
1           Chandan
1           Jagannath
2           Rahul
2           Pradeep
2           Manjunath

I want the result like this:

Department  Name_List
1           Chandan,Jagannath
2           Rahul,Pradeep,Manjunath

Since this was in SQL Server 2000, there were no recursive queries, and I had to write it in plain SQL. I finally did it by transposing the rows and then concatenating the result, something like:

Department  Name1    Name2      Name3
1           Chandan  Jagannath  -
2           Rahul    Pradeep    Manjunath

As you can see, there is a fundamental limitation here: how many columns should you transpose into, when you don't know how many rows each department has? The only way forward is to find the maximum count of rows for any department and transpose into that many (or more) columns. I did just that, and it worked.
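Below is a sketch of the transpose-then-concatenate approach in plain SQL, run through Python's `sqlite3`, with these assumptions:

- The table is a hypothetical `dept_emp`.
- Employee names are unique within a department.
- The position inside each group comes from a correlated COUNT(*) instead of ROW_NUMBER, which SQL Server 2000 lacked.
- String concatenation uses SQLite's `||` (SQL Server uses `+`).

Three name columns are hard-coded because 3 is the largest group here, which is exactly the limitation described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept_emp (department INTEGER, employee TEXT)")
conn.executemany(
    "INSERT INTO dept_emp VALUES (?, ?)",
    [(1, "Chandan"), (1, "Jagannath"),
     (2, "Rahul"), (2, "Pradeep"), (2, "Manjunath")],
)

query = """
SELECT department,
       -- concatenate the transposed columns; ',' || NULL is NULL, so gaps vanish
       name1 || COALESCE(',' || name2, '') || COALESCE(',' || name3, '') AS name_list
FROM (
    -- transpose: one column per position (max group size = 3)
    SELECT department,
           MAX(CASE WHEN pos = 1 THEN employee END) AS name1,
           MAX(CASE WHEN pos = 2 THEN employee END) AS name2,
           MAX(CASE WHEN pos = 3 THEN employee END) AS name3
    FROM (
        -- position of each employee within its department, alphabetically
        SELECT a.department, a.employee,
               (SELECT COUNT(*) FROM dept_emp b
                 WHERE b.department = a.department
                   AND b.employee <= a.employee) AS pos
        FROM dept_emp a
    ) AS ranked
    GROUP BY department
) AS pivoted
ORDER BY department
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'Chandan,Jagannath'), (2, 'Manjunath,Pradeep,Rahul')]
```

Names come out in alphabetical order because the position is computed from the names themselves. A department with a fourth employee would need a `name4` column added by hand.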

In Oracle there is SYS_CONNECT_BY_PATH, which can be used with a hierarchical (CONNECT BY) query to achieve this. But, wonder of wonders, MySQL has a function called GROUP_CONCAT that does exactly this: a concatenation of rows for each group. Now I am changing my perception of MySQL. With features like this, it is easily the coolest database for developers. Maybe it is not ideal for data warehouses, but who cares about that!!
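SQLite also ships a `group_concat(X, separator)` aggregate, so the one-liner can be tried from Python's `sqlite3` without a MySQL server. This is a sketch on a hypothetical `dept_emp` table. Older SQLite versions do not guarantee the order of the concatenated values, so the test below checks the members rather than their order. In MySQL the equivalent is `GROUP_CONCAT(employee ORDER BY employee SEPARATOR ',')`. Note that MySQL truncates the result at `group_concat_max_len`, which defaults to 1024 bytes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept_emp (department INTEGER, employee TEXT)")
conn.executemany(
    "INSERT INTO dept_emp VALUES (?, ?)",
    [(1, "Chandan"), (1, "Jagannath"),
     (2, "Rahul"), (2, "Pradeep"), (2, "Manjunath")],
)

# One aggregate call replaces the whole transpose-and-concatenate query.
# The ordered subquery usually fixes the output order in practice, but
# SQLite does not promise it.
rows = conn.execute("""
    SELECT department, group_concat(employee, ',') AS name_list
    FROM (SELECT department, employee FROM dept_emp ORDER BY department, employee)
    GROUP BY department
    ORDER BY department
""").fetchall()
for department, name_list in rows:
    print(department, name_list)
```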

So if you guys ever come across this kind of requirement in Oracle or MySQL, make sure to use these features :)

See these links for more:
Oracle: http://www.oracle.com/technology/oramag/code/tips2006/101606.html
MySQL: http://www.oreillynet.com/databases/blog/2007/05/debunking_group_by_myths.html

Enjoy!!