Nick's Coding Blog: 2008

Tuesday, October 7, 2008

Oracle Lag Analytic Function

It is rare that you may use the lag function, however, if you do need it, it can be a life saver. I've only had to use it once.... ever. The reason for it was so convoluted and asinine that I'll spare you those painful details. The lag function allows you to access data from previous rows. Let's take a look at a simple example:

select product, order_date,
lag(order_date,1) over (order by order_date) as prev_order_date
from orders;

And the result:


PRODUCT                   ORDER_DATE                PREV_ORDER_DATE           
------------------------- ------------------------- ------------------------- 
Basketball                07-OCT-07                                           
Baseball                  27-OCT-07                 07-OCT-07                 
Football                  16-DEC-07                 27-OCT-07                 
Basketball                11-MAR-08                 16-DEC-07                 
Football                  04-AUG-08                 11-MAR-08                 
Basketball                07-OCT-08                 04-AUG-08                 

6 rows selected

So basically allowed me to query my orders table and get the previous order date, without some complicated inline query to try to figure that out. Well that isn't very interesting. So I got the previous order date on the current line, big deal. The cool part of this is what it now allows me to do. I can now answer questions like, "how many days is it between orders?". Let's take a look.

select product, order_date,
nvl(order_date - lag(order_date,1) over (order by order_date),0) as days_since_last_order
from orders;

And the resutls:


PRODUCT                   ORDER_DATE                DAYS_SINCE_LAST_ORDER  
------------------------- ------------------------- ---------------------- 
Basketball                07-OCT-07                 0                      
Baseball                  27-OCT-07                 20                     
Football                  16-DEC-07                 50                     
Basketball                11-MAR-08                 86                     
Football                  04-AUG-08                 146                    
Basketball                07-OCT-08                 64                     

6 rows selected

Its look as if Nick's Basketball, Baseball and Football Emporium will be going out of business. Especially if there are 146 days between orders. Okay so all this looks cool on a report, but I really need to know the average time between orders. Ask and ye shall receive. Sorry to go all biblely on you.

select avg(days_since_last_order)
from
(select
order_date - lag(order_date,1) over (order by order_date) as days_since_last_order
from orders);

And the answer:



AVG(DAYS_SINCE_LAST_ORDER) 
-------------------------- 
73.2                       

1 rows selected

(Note: Oracle doesn't include nulls in the average, which is precisely what we want. If we wanted the first record to show zero, I would have left the nvl function in there.)
So the average days between orders is 73.2. So apparently selling basketballs, baseballs and footballs is not my calling. One more thing, the number that appears inside the parenthesis tells Oracle how far back to go. So if you say 1 it looks at the previous row, 2 the row before the previous row and so on. Let's take a look.

select product, order_date,
lag(order_date,2) over (order by order_date) as prev_prev_order_date
from orders

And the results:


PRODUCT                   ORDER_DATE                PREV_PREV_ORDER_DATE      
------------------------- ------------------------- ------------------------- 
Basketball                07-OCT-07                                           
Baseball                  27-OCT-07                                           
Football                  16-DEC-07                 07-OCT-07                 
Basketball                11-MAR-08                 27-OCT-07                 
Football                  04-AUG-08                 16-DEC-07                 
Basketball                07-OCT-08                 11-MAR-08                 

6 rows selected

Notice now it gives me the order date for the record two rows back. Pretty cool stuff. Again, you may never use it, but its nice to have in your arsenal.

Tuesday, September 23, 2008

Group By and Grouping Id - Determining Levels of Aggregation

In Oracle, the ability to do aggregates are a very powerful feature. However, at times it can be confusing when you have several layers of aggregation or nulls in the columns that are being grouped by. For instance, take the following query:

select department_id, sum(salary)
from employees
group by rollup (department_id);

Which yields the following results:

DEPARTMENT_ID SUM(SALARY)
---------------------- ----------------------
10 4400
20 19000
30 24900
40 6500
50 156400
60 28800
70 10000
80 304500
90 58000
100 51600
110 20300
7000
691400

13 rows selected

Notice I want to rollup the results to get a sum total, however I have 2 rows with null, which one is the rollup value? While its very easy to spot for a human, when attempting to present the data to a user in a meaningful fashion, knowing which row that is the total is very handy. To help with this problem, Oracle has something called a grouping_id, which identifies the level of aggregation. Take the following query:

select department_id, sum(salary), grouping_id(department_id)
from employees
group by rollup (department_id);

Which yields the following results...

DEPARTMENT_ID SUM(SALARY) GROUPING_ID(DEPARTMENT_ID)
---------------------- ---------------------- --------------------------
10 4400 0
20 19000 0
30 24900 0
40 6500 0
50 156400 0
60 28800 0
70 10000 0
80 304500 0
90 58000 0
100 51600 0
110 20300 0
7000 0
691400 1

13 rows selected

Notice in the grouping_id column, the total is indicated by the number 1, which identifies the level of aggregation. Let's look at a more complex example that gives more explicitly indicates what is taking place.

select department_id, job_id, sum(salary),
grouping_id(department_id) dept_g, grouping_id(job_id) job_g,
case when grouping_id(department_id)||grouping_id(job_id) = '00'
then 'Detail Both'
when grouping_id(department_id)||grouping_id(job_id) = '10'
then 'Agg over department'
when grouping_id(department_id)||grouping_id(job_id) = '01'
then 'Agg over job'
when grouping_id(department_id)||grouping_id(job_id) = '11'
then 'Agg over both'
end
description
from employees
group by rollup (department_id, job_id);

Which yields the following:


DEPT                   JOB_ID     SUM(SALARY)            DEPT_G                 JOB_G                  DESCRIPTION         
---------------------- ---------- ---------------------- ---------------------- ---------------------- ------------------- 
                       SA_REP     7000                   0                      0                      Detail Both         
                                  7000                   0                      1                      Agg over job        
10                     AD_ASST    4400                   0                      0                      Detail Both         
10                                4400                   0                      1                      Agg over job        
20                     MK_MAN     13000                  0                      0                      Detail Both         
20                     MK_REP     6000                   0                      0                      Detail Both         
20                                19000                  0                      1                      Agg over job        
30                     PU_MAN     11000                  0                      0                      Detail Both         
30                     PU_CLERK   13900                  0                      0                      Detail Both         
30                                24900                  0                      1                      Agg over job        
40                     HR_REP     6500                   0                      0                      Detail Both         
40                                6500                   0                      1                      Agg over job        
50                     ST_MAN     36400                  0                      0                      Detail Both         
50                     SH_CLERK   64300                  0                      0                      Detail Both         
50                     ST_CLERK   55700                  0                      0                      Detail Both         
50                                156400                 0                      1                      Agg over job        
60                     IT_PROG    28800                  0                      0                      Detail Both         
60                                28800                  0                      1                      Agg over job        
70                     PR_REP     10000                  0                      0                      Detail Both         
70                                10000                  0                      1                      Agg over job        
80                     SA_MAN     61000                  0                      0                      Detail Both         
80                     SA_REP     243500                 0                      0                      Detail Both         
80                                304500                 0                      1                      Agg over job        
90                     AD_VP      34000                  0                      0                      Detail Both         
90                     AD_PRES    24000                  0                      0                      Detail Both         
90                                58000                  0                      1                      Agg over job        
100                    FI_MGR     12000                  0                      0                      Detail Both         
100                    FI_ACCOUNT 39600                  0                      0                      Detail Both         
100                               51600                  0                      1                      Agg over job        
110                    AC_MGR     12000                  0                      0                      Detail Both         
110                    AC_ACCOUNT 8300                   0                      0                      Detail Both         
110                               20300                  0                      1                      Agg over job        
                                  691400                 1                      1                      Agg over both       

33 rows selected

With the rollup clause, it clearly identifies which rows are the aggregate rows for job_id and which row is the grand total row. Just for completeness, lets look at cubing our querying, which provides aggregate totals for each column.

select department_id dept, job_id, sum(salary),
grouping_id(department_id) dept_g, grouping_id(job_id) job_g,
case when grouping_id(department_id)||grouping_id(job_id) = '00'
then 'Detail Both'
when grouping_id(department_id)||grouping_id(job_id) = '10'
then 'Agg over department'
when grouping_id(department_id)||grouping_id(job_id) = '01'
then 'Agg over job'
when grouping_id(department_id)||grouping_id(job_id) = '11'
then 'Agg over both'
end
description
from employees
group by cube (department_id, job_id);

Which yields the following:


DEPT                   JOB_ID     SUM(SALARY)            DEPT_G                 JOB_G                  DESCRIPTION         
---------------------- ---------- ---------------------- ---------------------- ---------------------- ------------------- 
                                  7000                   0                      1                      Agg over job        
                                  691400                 1                      1                      Agg over both       
                       AD_VP      34000                  1                      0                      Agg over department 
                       AC_MGR     12000                  1                      0                      Agg over department 
                       FI_MGR     12000                  1                      0                      Agg over department 
                       HR_REP     6500                   1                      0                      Agg over department 
                       MK_MAN     13000                  1                      0                      Agg over department 
                       MK_REP     6000                   1                      0                      Agg over department 
                       PR_REP     10000                  1                      0                      Agg over department 
                       PU_MAN     11000                  1                      0                      Agg over department 
                       SA_MAN     61000                  1                      0                      Agg over department 
                       SA_REP     7000                   0                      0                      Detail Both         
                       SA_REP     250500                 1                      0                      Agg over department 
                       ST_MAN     36400                  1                      0                      Agg over department 
                       AD_ASST    4400                   1                      0                      Agg over department 
                       AD_PRES    24000                  1                      0                      Agg over department 
                       IT_PROG    28800                  1                      0                      Agg over department 
                       PU_CLERK   13900                  1                      0                      Agg over department 
                       SH_CLERK   64300                  1                      0                      Agg over department 
                       ST_CLERK   55700                  1                      0                      Agg over department 
                       AC_ACCOUNT 8300                   1                      0                      Agg over department 
                       FI_ACCOUNT 39600                  1                      0                      Agg over department 
10                                4400                   0                      1                      Agg over job        
10                     AD_ASST    4400                   0                      0                      Detail Both         
20                                19000                  0                      1                      Agg over job        
20                     MK_MAN     13000                  0                      0                      Detail Both         
20                     MK_REP     6000                   0                      0                      Detail Both         
30                                24900                  0                      1                      Agg over job        
30                     PU_MAN     11000                  0                      0                      Detail Both         
30                     PU_CLERK   13900                  0                      0                      Detail Both         
40                                6500                   0                      1                      Agg over job        
40                     HR_REP     6500                   0                      0                      Detail Both         
50                                156400                 0                      1                      Agg over job        
50                     ST_MAN     36400                  0                      0                      Detail Both         
50                     SH_CLERK   64300                  0                      0                      Detail Both         
50                     ST_CLERK   55700                  0                      0                      Detail Both         
60                                28800                  0                      1                      Agg over job        
60                     IT_PROG    28800                  0                      0                      Detail Both         
70                                10000                  0                      1                      Agg over job        
70                     PR_REP     10000                  0                      0                      Detail Both         
80                                304500                 0                      1                      Agg over job        
80                     SA_MAN     61000                  0                      0                      Detail Both         
80                     SA_REP     243500                 0                      0                      Detail Both         
90                                58000                  0                      1                      Agg over job        
90                     AD_VP      34000                  0                      0                      Detail Both         
90                     AD_PRES    24000                  0                      0                      Detail Both         
100                               51600                  0                      1                      Agg over job        
100                    FI_MGR     12000                  0                      0                      Detail Both         
100                    FI_ACCOUNT 39600                  0                      0                      Detail Both         
110                               20300                  0                      1                      Agg over job        
110                    AC_MGR     12000                  0                      0                      Detail Both         
110                    AC_ACCOUNT 8300                   0                      0                      Detail Both         

52 rows selected

Notice now I can't just assume the aggregates come in the order I expect them, i.e., the grand total is at the bottom. If you were like me, rollups and cubes held more mystery that help in most cases, however with grouping_ids it becomes more apparent the meaning of each row and what exactly Oracle is aggregating for you. (Note: All queries were run against the HR schema that comes with Oracle 10g Express Edition (XE)).

DEPARTMENT_ID	JOB_ID	SUM(SALARY)	DEPT_G	JOB_G	DESCRIPTION
	SA_REP	7000	0	0	Detail Both
		7000	0	1	Agg over manager
10	AD_ASST	4400	0	0	Detail Both
10		4400	0	1	Agg over manager
20	MK_MAN	13000	0	0	Detail Both
20	MK_REP	6000	0	0	Detail Both
20		19000	0	1	Agg over manager
30	PU_MAN	11000	0	0	Detail Both
30	PU_CLERK	13900	0	0	Detail Both
30		24900	0	1	Agg over manager
40	HR_REP	6500	0	0	Detail Both
40		6500	0	1	Agg over manager
50	ST_MAN	36400	0	0	Detail Both
50	SH_CLERK	64300	0	0	Detail Both
50	ST_CLERK	55700	0	0	Detail Both
50		156400	0	1	Agg over manager
60	IT_PROG	28800	0	0	Detail Both
60		28800	0	1	Agg over manager
70	PR_REP	10000	0	0	Detail Both
70		10000	0	1	Agg over manager
80	SA_MAN	61000	0	0	Detail Both
80	SA_REP	243500	0	0	Detail Both
80		304500	0	1	Agg over manager
90	AD_VP	34000	0	0	Detail Both
90	AD_PRES	24000	0	0	Detail Both
90		58000	0	1	Agg over manager
100	FI_MGR	12000	0	0	Detail Both
100	FI_ACCOUNT	39600	0	0	Detail Both
100		51600	0	1	Agg over manager
110	AC_MGR	12000	0	0	Detail Both
110	AC_ACCOUNT	8300	0	0	Detail Both
110		20300	0	1	Agg over manager
		691400	1	1	Agg over both

Sunday, September 14, 2008

Oracle Extract function

Here is a function I ran across in Oracle that I consider very helpful. I used to use the to_char function to get the piece or pieces of a date I needed to display or for calculations or what have you. Now I simply use the EXTRACT function to, well, extract the piece of the date I need.

If I needed the year, I would simply issue the following:

select extract(YEAR from sysdate) from dual;

Or if I needed the month:

select extract(MONTH from sysdate) from dual;

This function works with dates, timestamps and even the interval datatype.

Tuesday, July 1, 2008

Oracle XML Generation

Even though Oracle is a relational database, it handles XML quite well. Specifically, its actually very efficient at generating XML from data in relational tables. There are a couple of different options when it comes to XML generation, using the dbms_xmlgen package or the SQL set of XML functions.

The dbms_xmlgen package provides a very easy to way to generate xml. Basically you give it a query and it gives you xml. Let's take a look:

declare
ctx dbms_xmlgen.ctxHandle;
xml xmlType;
begin
ctx := dbms_xmlgen.newContext('select * from employees where employee_id = :emp_id');
dbms_xmlgen.setbindvalue(ctx,'emp_id',100);
xml := dbms_xmlgen.getXMLType(ctx);
dbms_output.put_line(substr(xml.getClobVal(),0,4000));
end;

And here is the result....

<ROWSET>
<ROW>
<EMPLOYEE_ID>100</EMPLOYEE_ID>
<FIRST_NAME>Steven</FIRST_NAME>
<LAST_NAME>King</LAST_NAME>
<EMAIL>SKING</EMAIL>
<PHONE_NUMBER>515.123.4567</PHONE_NUMBER>
<HIRE_DATE>17-JUN-87</HIRE_DATE>
<JOB_ID>AD_PRES</JOB_ID>
<SALARY>24000</SALARY>
<DEPARTMENT_ID>90</DEPARTMENT_ID>
</ROW>
</ROWSET>

All right so that is pretty cool. Query in, XML out. This package has major pros and cons. Pros: For hierarchical data, specifically data created using a connect by statement, it really cannot be beat. (The method is called, newContextFromHierarchy, and you'll have to use the "level" psuedo-column in conjunction with the connect by to use that feature.) Cons: You really can't nest calls in there and expect it to be performant. Well you may ask, why would I want to nest those calls? In my experience, you'll need to pull data in from other tables to include as children xml elements on the main record. When you do that too much then you can have some problems with performance AND you may even experience actual memory errors from the underlying C code. Moral of the story: Just be careful if you are generating large amounts of XML. Another con, most of the cool features are 10g only, so you 9i'ers out there, will have to use the SQL XML functions.

Speaking of the SQL XML functions, that would be the second method of XML generation to discuss. If you look at these tags in the documentation, they are scary. No, really, like see your grandmother naked kind of scary. However, after you actually use the tags, they are very powerful and very performant even when generating lots of XML. Basically the tags you'll use most frequently are the xmlelement, xmlagg and xmlforest functions. Let's see them in action...

select xmlelement("ROWSET",
xmlagg(
xmlelement("ROW",
xmlforest(employee_id, first_name, last_name, email,
phone_number, hire_date, job_id, salary,
department_id)))) from employees
where employee_id = 100;

And the result is....

<ROWSET>
<ROW>
<EMPLOYEE_ID>100</EMPLOYEE_ID>
<FIRST_NAME>Steven</FIRST_NAME>
<LAST_NAME>King</LAST_NAME>
<EMAIL>SKING</EMAIL>
<PHONE_NUMBER>515.123.4567</PHONE_NUMBER>
<HIRE_DATE>1987-06-17</HIRE_DATE>
<JOB_ID>AD_PRES</JOB_ID>
<SALARY>24000</SALARY>
<DEPARTMENT_ID>90</DEPARTMENT_ID>
</ROW>
</ROWSET>

Okay so basically the same XML we got from the dbms_xmlgen package. A little more work, sure, but your DBA's will love you for it. Let's analyze what each function does in a little more detail.
Xmlelement, basically takes a varchar as the element name ("name" is different from "NAME") and the data you want to xml-ize.
Xmlagg will basically take your row xml and concatenate each together in one big xml document.
Xmlforest takes the list of columns and xml-izes them all, using the column name or the alias that you give it as the tag name.

Using those 3 tags will generate most of the xml you will need, however there are others like xmlattribute and xmlcomment that may come in handy.

Good luck XMLing.

Monday, June 30, 2008

Oracle and JDBC, multiple objects one query

In a world of JPA and various ORM's, sometimes the lowly JDBC is completely forgotten. However, JDBC has a powerful advantage over ORM and that is that it gives the developer complete control over what is executed and how the data is mapped relationally. More often than not, we retrieve data from multiple tables that we then map to multiple objects. There are several ways to accomplish this, we can make a "super-query" with all the columns for all the tables we need on one row and we do some fancy "if" statement fun to model the data to our objects. Or we can make a proc on the Oracle side and declare cursors and the whole deal to get what we want. I'm actually not a big fan of other one, because there is a little heavy lifting involved either way. My preferred method of getting records back is to use Oracle's "cursor expressions". Cursor expressions are simply queries embedded in the SQL that come back as ref cursors. So let's see them in action.

select d.*,
cursor(select e.* from employees e where e.department_id = d.department_id) as employees
from departments d;

Using Oracle 10g Express Edition's hr schema, I wanted all departments, with all the employees in each department in one call. Using the cursor keyword, I now have a ref cursor pointing to those employees per department record.

Enough already! Show me how to get this back in JDBC. Alright, since you were so nice about it. Its actually dirt simple.

//... normal jdbc type execution code here
//my result set object is called rs
ResultSet employees = (ResultSet)rs.getObject("employees");

You simply use the getObject call using the ResultSet object returned by JDBC when executing a select statement, and then cast it to a ResultSet object, which in my example will be a ResultSet containing employee information.

So why should you use this? Um, because its fast and its easy. The way to get the most out of your database call is to get the most amount of data back at a time as you can. Using cursor expressions allows you to do that, without creating lots of procedures to return various ref cursors.

But Nick, what about all those open cursors won't I exceed max number of cursors on Oracle? The answer is no, you won't. You just have to remember to close those ResultSet object you are creating. So if you do get that message, check to make sure all your ResultSet's are being closed properly.

Have fun with your fast JDBC code, your customers will love you for it.

Friday, February 29, 2008

Decode Function

I will admit that this isn't the best name Oracle could have given to this function. This is a very powerful SQL function that can really help developers. Its main purpose is to provide a pseudo if statement inside an sql statement. The decode statement takes the following parameters:

decode(text_to_compare,value_to_compare,result_if_true,result_if_false);

Let's see decode in action to firm up everyone's understanding.

select decode('test','test','They are equal!','Um.. not so much.') from dual;
DECODE('TEST','TEST','THEY ARE EQUAL!','UM..NOT SO MUCH.')
------------------------------------------------------
They are equal!

1 rows selected

As you can see, when 'test' equals 'test' then we get 'They are equal!'. Okay, so that's pretty boring on a good day. Let's dig in a more complex example. I have a table called decode_test for obvious reasons that contains information about products that have been sold by yours truly.

select name, product_id, decode(product_id,
1,'Stereo System',
2,'Blu-Ray DVD Player',
3,'Brick of Gold',
'Other stuff laying around store.') from decode_test;
NAME PRODUCT_ID PRODUCT_NAME
-------------------- ---------------------- --------------------------------
Nicholas 1 Stereo System
Nicholas 3 Brick of Gold
Nicholas 2 Blu-Ray DVD Player
Nicholas 1 Stereo System
4 rows selected

Okay so what happened here? Basically in my test, each product_id represented some text I wanted to display in there query. You'll notice decode can take multiple value pairs to test against the input text.

So I'll admit the previous example won't win any "Example of the Year" awards. So here is a more practical use of decode that involves dates and conditional summation.

select name,
sum(decode(to_char(date_sold,'MMYYYY'),current_month,amount_sold,null)) as sales_month_to_date,
sum(decode(to_char(date_sold,'YYYY'),current_year,amount_sold,null)) as sales_year_to_date
from decode_test,
(select to_char(sysdate,'MMYYYY') as current_month,
to_char(sysdate,'YYYY') as current_year from dual) d
group by name;
NAME SALES_MONTH_TO_DATE SALES_YEAR_TO_DATE
-------------------- ---------------------- ----------------------
Nicholas 67.5 132.5
1 rows selected

Well, well, that was halfway slick. In the previous example, I needed to sum only the sales for the current month and for year to date. Decode helped me to sum the amount_sold, when it was appropriate.

Wednesday, January 9, 2008

SQL - Row Numbers and Limits

A common occurrence that most programmers face are numbering rows in Oracle. There are several misconceptions about how to do this and what are the costs of such methods. I personally use the row_number() analytic function, available since version 9 to accomplish this.

select my_table.*, row_number() over (order by id desc)
from my_table;

Note in the above code the syntax of the call, you call the analytic function and then tell Oracle how you want the data to be ordered. If you used "rownum" to accomplish this you would get some interesting result when changing the order by or introducing a where clause.

So obviously row numbering is pretty easily accomplished, however there are times when I only need a certain amount of rows. For instance, I need 10 rows at a time or 20. Having written my share of web applications, I can say that feature is needed a lot. So how can we get say the first 5 results returned, its accomplished by the following:

select *
from
(
select my_table.*, row_number() over (order by id desc) rn
from my_table
)
where rn between 1 and 5;
;

Okay, now we have limited the number of results returned by our query. In addition, we can now accomplish things like pagination through a result set of data. As with most things, there is a cost associated. Notice my inner query has no where clause, thereby inducing a full table scan. You may say, 'Now wait, didn't we just limit the amount of results returned?'. And yes, we did, however Oracle has to run through the full set of results of the inner query to correctly row_number() the rows and then it can parse out the first five or the last five or whatever combination we need.

All of that being said, this is the most efficient way to row number and limit are result set of data based purely on the number of results returned in Oracle.

Nick's Coding Blog