
Monday, December 31, 2007

Difference between a Primary Key and a Unique Index

The differences between a Primary Key and a Unique Index are:
1) Column(s) that make up the Primary Key of a table cannot be NULL, since by definition the Primary Key must uniquely identify each record in the table. The column(s) that make up a unique index, on the other hand, can be nullable. A point worth mentioning here is that different RDBMS treat this differently: while SQL Server and DB2 do not allow more than one NULL value in a unique index column, Oracle allows multiple NULL values. That is one of the things to look out for when designing, developing, or porting applications across RDBMS.
2) There can be only one Primary Key defined on a table, whereas you can have many unique indexes defined on the table (if needed).
3) Also, in the case of SQL Server, if you go with the default options, a Primary Key is created as a clustered index while a unique index (constraint) is created as a non-clustered index. This is just the default behavior, though, and can be changed at creation time if needed, as sketched below.
So, if a unique index is defined on NOT NULL column(s), it is essentially the same as the Primary Key and can be treated as an alternate key, meaning it can also serve the purpose of uniquely identifying a record in the table.
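
For example, a minimal T-SQL sketch of point 3 (the table and column names are hypothetical): by default the PRIMARY KEY would become a clustered index and the UNIQUE constraint a non-clustered one, and both defaults can be overridden at creation time.

CREATE TABLE dbo.customer (
    customer_id   INT          NOT NULL,
    email_address VARCHAR(100) NULL,
    -- override the clustered default for the Primary Key
    CONSTRAINT pk_customer PRIMARY KEY NONCLUSTERED (customer_id),
    -- override the non-clustered default for the unique constraint
    CONSTRAINT uq_customer_email UNIQUE CLUSTERED (email_address)
);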

The above information is taken directly from http://decipherinfosys.wordpress.com/2007/07/04/back-to-the-basics-difference-between-primary-key-and-unique-index/.

Sunday, November 25, 2007

Selectivity Factor - Block Selectivity

Hello everyone.

I was just reading through an article on indexes, and as I read it I realized that I had made a mistake in my earlier post, where I claimed that row selectivity is an important factor to consider while deciding on indexes. As you know, indexes are only helpful if they reduce the number of IOs (physical or logical).

Let's consider the case of Oracle. In Oracle, data in a table is stored in blocks. Assume each block holds 100 rows of a table, so a table with 1,000 rows is stored in 10 different blocks. Now suppose a filter has a low row selectivity, say 10%, and the 100 qualifying rows are spread evenly, 10 per block. Going through an index means one logical IO per qualifying row, i.e. 10 IOs per block, or about 100 IOs in total. If Oracle had decided to do a full table scan instead, it would have been just 10 IOs as a whole. My point here is that instead of taking row selectivity as a decision-making parameter, indexing decisions have to be based on something known as block selectivity. A simple definition of block selectivity is:

Block selectivity = P / Ptotal

where P = number of blocks that have at least one row qualifying the filter condition,
and Ptotal = total number of blocks below the high water mark of the table.

The lower the block selectivity for a field, the better the performance will be if the field is indexed. One way to ensure good block selectivity is to physically cluster the data along this field. A good clustering factor ensures that rows with the same value for the field are physically close together, in a smaller number of blocks. On the other hand, if those rows were spread across almost the entire set of blocks, an index may prove to be an overhead, since nearly every block would have to be visited anyway, possibly more than once. A full table scan, in contrast, ensures that each block is read once and only once. So people, next time you are planning to index a field, make sure to physically cluster the data in the table and find out the block selectivity for that field.
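If you want to measure this yourself, here is a minimal Oracle sketch, assuming a hypothetical table orders with a filter on a status column. DBMS_ROWID extracts the block number from each qualifying rowid, and USER_TABLES.BLOCKS (populated once statistics have been gathered) gives the total below the high water mark:

-- P: number of blocks containing at least one qualifying row
-- (table name "orders" and filter "status = 'OPEN'" are hypothetical)
SELECT COUNT(DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid)) AS p
FROM   orders
WHERE  status = 'OPEN';

-- Ptotal: blocks below the high water mark (requires up-to-date statistics)
SELECT blocks AS ptotal
FROM   user_tables
WHERE  table_name = 'ORDERS';

For more information, refer to this link: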

http://www.hotsos.com/e-library/abstract.php?id=5

P.S.: You will have to register to read this article. Registration is very simple and will take only a minute of your time.

Friday, November 2, 2007

Moving To New Tablespace.

Hi Guys,

This is a small thing, but it may be useful.
When we move tables and indexes to another tablespace, here is what needs to be done (the queries below are for Oracle).
1) Move the table.
alter table tablename move tablespace tablespacename;
2) The indexes on the moved table are now unusable, so you have to rebuild them in the new tablespace.
alter index indexname rebuild tablespace tablespacename;
3) Finally, the dependent packages need to be recompiled. There is no need for explicit compilation, since they will be recompiled automatically during the next execution, but you can do it yourself:
alter package packagename compile package;
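
As a quick sanity check (a hedged sketch; replace TABLENAME with your table's name in upper case), you can list the indexes that the move marked unusable before rebuilding them:

-- indexes invalidated by the ALTER TABLE ... MOVE above
SELECT index_name, status, tablespace_name
FROM   user_indexes
WHERE  table_name = 'TABLENAME'
AND    status = 'UNUSABLE';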

That's it!!

Thursday, October 11, 2007

Insert Data Into 2 tables at 1 shot

Hi All,

Have you ever wondered how to insert data into 2 tables from one table in one shot?

Oracle provides a feature called INSERT ALL for this.

I have used the concept like this:

I insert the good data into the fact table, along with its rowids into a ROWID_TAB table, in one shot.

Then I insert into the reject table the rows whose rowids are not present in ROWID_TAB.

Ex: to insert data into the fact table and the rowid table:

INSERT /*+ APPEND */ ALL
  INTO TABLE_FACT (col1,
                   col2,
                   col3)
  VALUES (col1,
          col2,
          col3)
  INTO ROWID_TAB (rowid_col)
  VALUES (rid)
SELECT col1,
       col2,
       col3,
       rowid AS rid
FROM   TABLE_STG
WHERE  Cond1 AND Cond2...;

To insert data into the reject table:

INSERT /*+ APPEND */ ALL
  INTO TABLE_FACT_REJ (col1,
                       col2,
                       col3)
  VALUES (col1,
          col2,
          col3)
SELECT col1,
       col2,
       col3
FROM   TABLE_STG
WHERE  rowid NOT IN (SELECT rowid_col
                     FROM   ROWID_TAB);

Another improvement can be to make ROWID_TAB a global temporary table, as sketched below.
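
A minimal sketch of that variation (the ON COMMIT choice is an assumption and depends on when you need the rowids to disappear):

CREATE GLOBAL TEMPORARY TABLE rowid_tab (
  rowid_col ROWID
) ON COMMIT PRESERVE ROWS;  -- rows stay for the whole session, not just the transaction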

For more information on this, please refer to the links below:
1.http://certcities.com/editorial/columns/story.asp?EditorialsID=51
2.http://www.dba-oracle.com/t_global_temporary_tables.htm

Hope this helps....

Wednesday, October 10, 2007

Selectivity factor when deciding on Indexes

Ever wondered how you should arrive at the indexing strategy to be used in your warehouse? What are the parameters to be considered while deciding on the fields to be indexed? Well, one parameter which I feel is important to consider is what I call the 'Selectivity Factor'. The following is my definition of the Selectivity Factor (it is just my way of defining it, so there is no way of corroborating this):

Selectivity factor for a field: the percentage of rows selected from the table after applying the filter on the field.

You have to keep in mind that this factor will be different for different values of the same field. But again, you will have an estimate of the number of rows in the table for each of these values, so you can take an average of the selectivity factor. The average selectivity factor for a field is always the same, irrespective of the SQL in which it is used.

Now if you have many SQLs having the same kind of filter (a date field, for example), then it makes sense to have an index on that field, right? Not always! What if the filter qualifies 90% of the records most of the time? Then there is no point in indexing that field; in fact the index will be an overhead, as the database will first scan the index and then hit the actual row ids. If you use a B*Tree index, you might end up doing a very high number of logical I/Os, which translates to reading the same set of blocks multiple times. But if you were reading only 10% of the table, the number of logical I/Os would come down drastically. For example, suppose you have a query like this:


Select * from list_of_politicians where sex = 'M'

In this case the field "sex" has a cardinality of 2: 'M' and 'F' (assuming there is no ambiguity about this data!). Assume that the table has 100,000 records, of which 90,000 are 'M' and 10,000 are 'F'. Suppose the block size is 8 KB and the row size is 80 bytes (which means 100 rows/block); then approximately 1,000 blocks are used for the table. In this case, the query returns 90% of the table. If the B*Tree index is used, a logical I/O is done for each of the 90,000 records, which means on average each block gets read 90 times!!! This would be a huge performance hit, whereas a full table scan would have executed much faster.


That is where I feel the "Selectivity Factor" makes much more sense as a quantifiable parameter for taking that decision. In the above scenario, the selectivity factor is 90% in the case of 'M' and 10% in the case of 'F', which means the average selectivity factor is 50%. This is a high number and thus not suited to B*Tree indexes. On the other hand, a Bitmap index will be very useful in this scenario. In a bitmap index, each index entry stores references to many rows. The bitmap structure looks something like this:

Row  1  2  3  4  5  6  7  8

M    0  1  1  1  1  1  1  1
F    1  0  0  0  0  0  0  0

As you can see, there are only two entries in the index. The first entry shows that the value 'M' appears in rows 2 through 8, and the value 'F' in row 1. So if I run the query,

Select * from list_of_politicians where sex = 'M'

the DB can easily get the rowids of all the rows which have the value 'M' for sex by reading only a few index entries (there can be more than one entry per value of a field, depending on the number of rows and the storage size). This will perform much faster than a B*Tree index would in this case.
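
Creating one is a one-liner; a minimal sketch on the example table above:

-- bitmap index on the low-cardinality "sex" column
CREATE BITMAP INDEX ix_politicians_sex
  ON list_of_politicians (sex);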

So to sum it up,


When the selectivity factor (or the average selectivity factor) for the field is low (say, less than 10%) and a small portion of the table is selected, it is helpful to have a B*Tree index on the field.

When the selectivity factor (or the average selectivity factor) for the field is high (say, greater than 25%) and you are selecting a small to medium portion of the table, it is useful to have a Bitmap index on the field.

On the other hand, if the selectivity factor is high and you are reading a large portion of the table, it is better to leave the field alone.

Now, all this applies to a rule-based optimizer. If you are using a cost-based optimizer, then just analyzing the table before running the query will tell the optimizer whether to use the index or not. So in the above case, where you are selecting 90% of the records, if the table is analyzed before running the query, the CBO will decide to do a Full Table Scan instead of using the B*Tree index. The statistics can be gathered as sketched below.
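
A minimal sketch of that analysis step, assuming the example table above (DBMS_STATS is the supported way to gather statistics for the CBO):

-- gather table and index statistics so the CBO can cost both plans
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'LIST_OF_POLITICIANS',
    cascade => TRUE  -- also gather statistics on the table's indexes
  );
END;
/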

So next time you are planning your indexes, make sure to study the data, know the cardinality of the fields, and know the average selectivity factor of the fields as well.

Tuesday, October 9, 2007

Concatenation on group by

Hi all,
As an extension to Chandan's previous post, I would like to post this sample code, which shows the usage of the SYS_CONNECT_BY_PATH function in Oracle.

Using the SYS_CONNECT_BY_PATH function

Source table "temp":

Name     Deptno
-------  ------
jagan    1
guru     2
varu     2
bharath  1
manju    1
giri     3
chandan  3

SELECT deptno,
       SUBSTR(SYS_CONNECT_BY_PATH(name, ','), 2) AS name_list
FROM  (SELECT name,
              deptno,
              COUNT(*)     OVER (PARTITION BY deptno)               AS cnt,
              ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY name) AS seq
       FROM   temp
       WHERE  deptno IS NOT NULL)
WHERE seq = cnt
START WITH seq = 1
CONNECT BY PRIOR seq + 1 = seq
       AND PRIOR deptno = deptno;

Result:

Deptno  Name_list
------  -------------------
1       bharath,jagan,manju
2       guru,varu
3       chandan,giri

Thursday, October 4, 2007

Concatenation on group by

Hi,
I had a requirement in my project to group by one column of a table and produce a concatenation of the row values of a second column for each group. For example, consider a table with two columns:

Department Employee
1 Chandan
1 Jagannath
2 Rahul
2 Pradeep
2 Manjunath

I want the result like this:

Department Name_List
1 Chandan ,Jagannath
2 Rahul,Pradeep,Manjunath

And since this was on SQL Server 2000, there were no recursive queries, and I had to write it using plain SQL. I finally wrote it by transposing the rows and then concatenating the result. Something like:

Department Name1 Name2 Name3
1 Chandan Jagannath -
2 Rahul Pradeep Manjunath

As you can see, there is a fundamental limit here: how many columns do you allow for the transposition, when you don't know how many rows there are for each department? The only way to proceed is to find the maximum count of rows for any department and transpose to at least that many columns. I did just that and it worked; a sketch of the idea follows.
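
Here is a minimal sketch of the transposition step, under the assumption of a hypothetical dept_emp(department, employee) table with unique names per department. SQL Server 2000 has no ROW_NUMBER(), so the per-department sequence comes from a correlated subquery, and the number of nameN columns (three here) must cover the largest group:

SELECT d.department,
       MAX(CASE WHEN d.seq = 1 THEN d.employee END) AS name1,
       MAX(CASE WHEN d.seq = 2 THEN d.employee END) AS name2,
       MAX(CASE WHEN d.seq = 3 THEN d.employee END) AS name3
FROM (
    -- seq numbers the employees alphabetically within each department
    SELECT e1.department,
           e1.employee,
           (SELECT COUNT(*)
            FROM   dept_emp e2
            WHERE  e2.department = e1.department
            AND    e2.employee  <= e1.employee) AS seq
    FROM dept_emp e1
) d
GROUP BY d.department;

Concatenating name1, name2, name3 (skipping the NULLs) then gives the name list.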

Now, in Oracle there is something called SYS_CONNECT_BY_PATH, which can be used with a hierarchical query to achieve this. But wonder of wonders, MySQL has a function called GROUP_CONCAT which does exactly this: a concatenation of rows for each group. Now I am changing my perception of MySQL. With features like this, it is easily the coolest database for developers. Maybe it is not ideal for data warehouses, but who cares about that!!
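
For the same hypothetical dept_emp table, the MySQL version is a one-liner:

-- GROUP_CONCAT builds the comma-separated list for each group
SELECT department,
       GROUP_CONCAT(employee ORDER BY employee SEPARATOR ',') AS name_list
FROM   dept_emp
GROUP  BY department;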

So if you guys ever come across this kind of requirement in Oracle or MySQL, make sure to use these features :)

See these links for more:
Oracle
http://www.oracle.com/technology/oramag/code/tips2006/101606.html
MySQL http://www.oreillynet.com/databases/blog/2007/05/debunking_group_by_myths.html

Enjoy!!

Friday, September 7, 2007

Crystal Reports- +ve's and -ve's

Crystal Reports is a very popular reporting tool. Below is an attempt to measure its effectiveness in reporting.

+ve's

  1. The simplicity of the tool. It asks for a connection, and once you provide one (either to a DB or to a cube), the report can be filled up through drag and drop of the fields.
  2. The parameter feature can be used for Yes/No questions and the like, since the tool has no radio-button kind of feature.
  3. The suppress feature can be used extensively to get various functionalities; it depends on how and where you use it.
  4. The formula fields can be used in tandem with the suppress feature and parameters; this can make many complex things simpler.
  5. There are features like on-demand subreports (hyperlinks to other reports) as well as normal subreports.

-ve's

  1. Deleting unused parameters is sometimes a pain, as they don't get deleted and cause problems later.
  2. Only one level of subreport nesting is available (you can't have a subreport within a subreport).
  3. We can have only one query per report (we can't have separate queries for different measures, as in BO).
  4. Aliases can be added, but they act really weirdly (so avoid them).
  5. Formatting the report takes double the report development time.
  6. Tables can't be added to the report (lines have to be drawn..!!!).
  7. The export feature is good only for the PDF format. For other formats you can't even compare the report with the exported copy.

Wednesday, September 5, 2007

T-SQL : Sweet and Sour

I have been working on SQL Server 2005 (T-SQL) lately. This post is a comparison of the procedural SQL features available in T-SQL, DB2 v8.2, and Oracle 9i PL/SQL.
It will be updated whenever anything new turns up.
So here goes...


#   Feature                                  SQL Server 2005                                 DB2      Oracle
1   Exception handling                       Yes                                             Yes      Yes
2   Transactions                             Yes                                             Yes      Yes
3   ELSIF construct                          No                                              Yes      Yes
4   IF EXISTS ()                             Yes                                             Yes      No
5   Max column size                          32000                                           32000    4000
6   Implicit cursor                          No                                              Yes      Yes
7   Recursive query                          Yes (similar to DB2, but only 100 iterations)   Yes      Yes (using CONNECT BY PRIOR)
8   PRINT command                            Yes                                             No       Yes
9   Dynamic SQL                              Yes                                             Yes      Yes
10  Dynamic SQL string datatype              nvarchar and nchar                              varchar  varchar
11  Variable and cursor with the same name   Yes                                             Yes      No
12  Default value in variable declaration    No                                              Yes      Yes
13  Sequences                                No                                              Yes      Yes
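
As an illustration of row 3 (a small hedged sketch; the variable and the values are made up): T-SQL has no ELSIF keyword, so you chain ELSE IF where PL/SQL would use ELSIF.

DECLARE @grade INT;
SET @grade = 75;

IF @grade >= 90
    PRINT 'A';
ELSE IF @grade >= 75   -- ELSE IF in T-SQL, ELSIF in PL/SQL
    PRINT 'B';
ELSE
    PRINT 'C';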

Monday, September 3, 2007

Extracting parts from a Comma separated string

One of the requirements in our project was to extract individual column names from a string containing comma separated values, i.e. if the string is, say:
v_column_string := ',country_code,country_name,country_desc,'
I wanted to get each column name from this string. Here's how I wrote the query for that.


[CODE]
SELECT *
FROM  (SELECT SUBSTR(v_column_string,
                     INSTR(v_column_string, ',', 1, a.lvl) + 1,
                     INSTR(v_column_string, ',', 1, a.lvl + 1)
                       - INSTR(v_column_string, ',', 1, a.lvl) - 1) AS columns
       FROM  (SELECT ROWNUM AS lvl
              FROM   all_objects
              WHERE  ROWNUM <= no_of_words) a) b
WHERE b.columns IS NOT NULL;


OUTPUT:

country_code
country_name
country_desc

Here, no_of_words is the number of words in the string (in this case it is 3). In case you don't know how many words there might be, you can go ahead and give the length of the string here (although this will be an overhead, as many NULLs will be returned, which is why I have a filter for NOT NULL).

Notice that I have given the string with a leading and a trailing comma, i.e. ',country_code,country_name,country_desc,'. This is basically to extract the first word and the last. If you don't want to supply these two commas, you can tweak the query to handle them, as in the sketch below.
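
One hedged way to do that tweak (same placeholders as above): pad the commas inside an inline view rather than in the input string.

[CODE]
SELECT b.columns
FROM  (SELECT SUBSTR(s.padded,
                     INSTR(s.padded, ',', 1, a.lvl) + 1,
                     INSTR(s.padded, ',', 1, a.lvl + 1)
                       - INSTR(s.padded, ',', 1, a.lvl) - 1) AS columns
       FROM  (SELECT ',' || v_column_string || ',' AS padded
              FROM   dual) s,
             (SELECT ROWNUM AS lvl
              FROM   all_objects
              WHERE  ROWNUM <= no_of_words) a) b
WHERE b.columns IS NOT NULL;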

Hope you will find this useful someday :)

CLOB Datatype

Ah!! The first Tech Post...
Some time back, I tried out some things to explore the features of the CLOB datatype in DB2.
I will share the details, so that you guys need not spend more time on it if at some point you need to find out the features of CLOB.
Some of the features are w.r.t. storing a procedure's text in a CLOB variable.


Features Of CLOB Variables

1. Size: up to 2 GB - 1 byte.

2. Logging: yes.

3. Reading: directly from disk; the buffer pool is not used.

4. EXECUTE IMMEDIATE of DMLs specified in a CLOB: yes.
[Code]
CREATE PROCEDURE GTT_CORE.CLOB_TEST
LANGUAGE SQL
BEGIN
  DECLARE V_CLOB CLOB;

  SET V_CLOB = 'INSERT INTO GTT_CORE.DUMMY VALUES (1)';
  EXECUTE IMMEDIATE V_CLOB;
END

5. Compilation of a procedure as a dynamic statement: yes.
[Code]
BEGIN
  DECLARE V_CLOB CLOB;

  SET V_CLOB =
    'CREATE PROCEDURE GTT_CORE.PROC
     LANGUAGE SQL
     BEGIN
       DECLARE V_COUNT INTEGER;
       SELECT COUNT(1) INTO V_COUNT FROM GTT_CORE.DUMMY;
       INSERT INTO GTT_CORE.DUMMY VALUES (V_COUNT);
     END';
  EXECUTE IMMEDIATE V_CLOB;
END



6. Indentation of the procedure text: yes.

7. Global temporary table with a CLOB column: no. DB2 SQL error: SQLCODE -350, SQLSTATE 42962, SQLERRMC: V.
Message: LOB, DATALINK, or structured type column "V" cannot be used in an index, a key, a unique constraint, a functional dependency, a generated column, or a declared temporary table.
[Code]
DROP PROCEDURE GTT_CORE.CLOB_TEST
GO
CREATE PROCEDURE GTT_CORE.CLOB_TEST
LANGUAGE SQL
BEGIN
  DECLARE V_CLOB CLOB;
  DECLARE GLOBAL TEMPORARY TABLE DUMMY_CLOB (V CLOB)
    ON COMMIT PRESERVE ROWS NOT LOGGED WITH REPLACE;

  SET V_CLOB =
    'CREATE PROCEDURE GTT_CORE.PROC
     LANGUAGE SQL
     BEGIN
       DECLARE V_COUNT INTEGER;';
  SET V_CLOB = V_CLOB || '
       SELECT COUNT(1) INTO V_COUNT FROM GTT_CORE.DUMMY;
       INSERT INTO GTT_CORE.DUMMY VALUES (V_COUNT);
     END';

  INSERT INTO SESSION.DUMMY_CLOB VALUES (V_CLOB);
END



8. Storage of a CLOB in a table, with indentation preserved: yes.
[Code]
DROP PROCEDURE GTT_CORE.CLOB_TEST
GO
CREATE PROCEDURE GTT_CORE.CLOB_TEST
LANGUAGE SQL
BEGIN
  DECLARE V_CLOB CLOB;

  SET V_CLOB =
    'CREATE PROCEDURE GTT_CORE.PROC
     LANGUAGE SQL
     BEGIN
       DECLARE V_COUNT INTEGER;';
  SET V_CLOB = V_CLOB || '
       SELECT COUNT(1) INTO V_COUNT FROM GTT_CORE.DUMMY;
       INSERT INTO GTT_CORE.DUMMY VALUES (V_COUNT);
     END';

  INSERT INTO GTT_CORE.DUMMY1 VALUES (V_CLOB);
END



Hope it will be useful someday

-- Rahul Nair

Saturday, September 1, 2007

Posting Guidelines

I have identified some posting guidelines to help us have better posts.
Authors

  1. Please contribute through new posts; don't use the comments section.
  2. Reply to questions through the Comments (responses) link.
  3. Please give appropriate labels.
Some labels can be: Oracle, DB2, SQL Server, DW, Java, SQL, PL/SQL.
Labels can help you find postings fast if there are too many :)

Non Authors

Please post questions through the Responses link.

We can add to the above when the need arises. But let us first kick this off to a good start.

Knowledge Pool

Welcome to Knowledge Pool, guys!!!
This is an initiative for having a knowledge repository accessible wherever you are.
Learn, Share and Question.
So let's take it forward.....