Tuesday, October 7, 2008

Oracle Lag Analytic Function

It is rare that you may use the lag function, however, if you do need it, it can be a life saver. I've only had to use it once.... ever. The reason for it was so convoluted and asinine that I'll spare you those painful details. The lag function allows you to access data from previous rows. Let's take a look at a simple example:

  1. select product, order_date,   
  2. lag(order_date,1) over (order by order_date) as prev_order_date  
  3. from orders;  


And the result:


PRODUCT ORDER_DATE PREV_ORDER_DATE
------------------------- ------------------------- -------------------------
Basketball 07-OCT-07
Baseball 27-OCT-07 07-OCT-07
Football 16-DEC-07 27-OCT-07
Basketball 11-MAR-08 16-DEC-07
Football 04-AUG-08 11-MAR-08
Basketball 07-OCT-08 04-AUG-08

6 rows selected



So basically allowed me to query my orders table and get the previous order date, without some complicated inline query to try to figure that out. Well that isn't very interesting. So I got the previous order date on the current line, big deal. The cool part of this is what it now allows me to do. I can now answer questions like, "how many days is it between orders?". Let's take a look.

  1. select product, order_date,   
  2. nvl(order_date - lag(order_date,1) over (order by order_date),0) as days_since_last_order  
  3. from orders;  



And the resutls:


PRODUCT ORDER_DATE DAYS_SINCE_LAST_ORDER
------------------------- ------------------------- ----------------------
Basketball 07-OCT-07 0
Baseball 27-OCT-07 20
Football 16-DEC-07 50
Basketball 11-MAR-08 86
Football 04-AUG-08 146
Basketball 07-OCT-08 64

6 rows selected



Its look as if Nick's Basketball, Baseball and Football Emporium will be going out of business. Especially if there are 146 days between orders. Okay so all this looks cool on a report, but I really need to know the average time between orders. Ask and ye shall receive. Sorry to go all biblely on you.

  1. select avg(days_since_last_order)  
  2. from  
  3. (select   
  4. order_date - lag(order_date,1) over (order by order_date) as days_since_last_order  
  5. from orders);  


And the answer:



AVG(DAYS_SINCE_LAST_ORDER)
--------------------------
73.2

1 rows selected




(Note: Oracle doesn't include nulls in the average, which is precisely what we want. If we wanted the first record to show zero, I would have left the nvl function in there.)
So the average days between orders is 73.2. So apparently selling basketballs, baseballs and footballs is not my calling. One more thing, the number that appears inside the parenthesis tells Oracle how far back to go. So if you say 1 it looks at the previous row, 2 the row before the previous row and so on. Let's take a look.

  1. select product, order_date,  
  2. lag(order_date,2) over (order by order_date) as prev_prev_order_date  
  3. from orders  


And the results:


PRODUCT ORDER_DATE PREV_PREV_ORDER_DATE
------------------------- ------------------------- -------------------------
Basketball 07-OCT-07
Baseball 27-OCT-07
Football 16-DEC-07 07-OCT-07
Basketball 11-MAR-08 27-OCT-07
Football 04-AUG-08 16-DEC-07
Basketball 07-OCT-08 11-MAR-08

6 rows selected


Notice now it gives me the order date for the record two rows back. Pretty cool stuff. Again, you may never use it, but its nice to have in your arsenal.

Tuesday, September 23, 2008

Group By and Grouping Id - Determining Levels of Aggregation

In Oracle, the ability to do aggregates are a very powerful feature. However, at times it can be confusing when you have several layers of aggregation or nulls in the columns that are being grouped by. For instance, take the following query:



  1. select department_id, sum(salary)  
  2. from employees  
  3. group by rollup (department_id);  


Which yields the following results:


DEPARTMENT_ID SUM(SALARY)
---------------------- ----------------------
10 4400
20 19000
30 24900
40 6500
50 156400
60 28800
70 10000
80 304500
90 58000
100 51600
110 20300
7000
691400

13 rows selected

Notice I want to rollup the results to get a sum total, however I have 2 rows with null, which one is the rollup value? While its very easy to spot for a human, when attempting to present the data to a user in a meaningful fashion, knowing which row that is the total is very handy. To help with this problem, Oracle has something called a grouping_id, which identifies the level of aggregation. Take the following query:

  1. select department_id, sum(salary), grouping_id(department_id)
  2. from employees
  3. group by rollup (department_id);


Which yields the following results...

DEPARTMENT_ID SUM(SALARY) GROUPING_ID(DEPARTMENT_ID)
---------------------- ---------------------- --------------------------
10 4400 0
20 19000 0
30 24900 0
40 6500 0
50 156400 0
60 28800 0
70 10000 0
80 304500 0
90 58000 0
100 51600 0
110 20300 0
7000 0
691400 1

13 rows selected



Notice in the grouping_id column, the total is indicated by the number 1, which identifies the level of aggregation. Let's look at a more complex example that gives more explicitly indicates what is taking place.

  1. select department_id, job_id, sum(salary),
  2. grouping_id(department_id) dept_g, grouping_id(job_id) job_g,
  3. case when grouping_id(department_id)||grouping_id(job_id) = '00'
  4. then 'Detail Both'
  5. when grouping_id(department_id)||grouping_id(job_id) = '10'
  6. then 'Agg over department'
  7. when grouping_id(department_id)||grouping_id(job_id) = '01'
  8. then 'Agg over job'
  9. when grouping_id(department_id)||grouping_id(job_id) = '11'
  10. then 'Agg over both'
  11. end
  12. description
  13. from employees
  14. group by rollup (department_id, job_id);


Which yields the following:


DEPT JOB_ID SUM(SALARY) DEPT_G JOB_G DESCRIPTION
---------------------- ---------- ---------------------- ---------------------- ---------------------- -------------------
SA_REP 7000 0 0 Detail Both
7000 0 1 Agg over job
10 AD_ASST 4400 0 0 Detail Both
10 4400 0 1 Agg over job
20 MK_MAN 13000 0 0 Detail Both
20 MK_REP 6000 0 0 Detail Both
20 19000 0 1 Agg over job
30 PU_MAN 11000 0 0 Detail Both
30 PU_CLERK 13900 0 0 Detail Both
30 24900 0 1 Agg over job
40 HR_REP 6500 0 0 Detail Both
40 6500 0 1 Agg over job
50 ST_MAN 36400 0 0 Detail Both
50 SH_CLERK 64300 0 0 Detail Both
50 ST_CLERK 55700 0 0 Detail Both
50 156400 0 1 Agg over job
60 IT_PROG 28800 0 0 Detail Both
60 28800 0 1 Agg over job
70 PR_REP 10000 0 0 Detail Both
70 10000 0 1 Agg over job
80 SA_MAN 61000 0 0 Detail Both
80 SA_REP 243500 0 0 Detail Both
80 304500 0 1 Agg over job
90 AD_VP 34000 0 0 Detail Both
90 AD_PRES 24000 0 0 Detail Both
90 58000 0 1 Agg over job
100 FI_MGR 12000 0 0 Detail Both
100 FI_ACCOUNT 39600 0 0 Detail Both
100 51600 0 1 Agg over job
110 AC_MGR 12000 0 0 Detail Both
110 AC_ACCOUNT 8300 0 0 Detail Both
110 20300 0 1 Agg over job
691400 1 1 Agg over both

33 rows selected



With the rollup clause, it clearly identifies which rows are the aggregate rows for job_id and which row is the grand total row. Just for completeness, lets look at cubing our querying, which provides aggregate totals for each column.

  1. select department_id dept, job_id, sum(salary),
  2. grouping_id(department_id) dept_g, grouping_id(job_id) job_g,
  3. case when grouping_id(department_id)||grouping_id(job_id) = '00'
  4. then 'Detail Both'
  5. when grouping_id(department_id)||grouping_id(job_id) = '10'
  6. then 'Agg over department'
  7. when grouping_id(department_id)||grouping_id(job_id) = '01'
  8. then 'Agg over job'
  9. when grouping_id(department_id)||grouping_id(job_id) = '11'
  10. then 'Agg over both'
  11. end
  12. description
  13. from employees
  14. group by cube (department_id, job_id);



Which yields the following:


DEPT JOB_ID SUM(SALARY) DEPT_G JOB_G DESCRIPTION
---------------------- ---------- ---------------------- ---------------------- ---------------------- -------------------
7000 0 1 Agg over job
691400 1 1 Agg over both
AD_VP 34000 1 0 Agg over department
AC_MGR 12000 1 0 Agg over department
FI_MGR 12000 1 0 Agg over department
HR_REP 6500 1 0 Agg over department
MK_MAN 13000 1 0 Agg over department
MK_REP 6000 1 0 Agg over department
PR_REP 10000 1 0 Agg over department
PU_MAN 11000 1 0 Agg over department
SA_MAN 61000 1 0 Agg over department
SA_REP 7000 0 0 Detail Both
SA_REP 250500 1 0 Agg over department
ST_MAN 36400 1 0 Agg over department
AD_ASST 4400 1 0 Agg over department
AD_PRES 24000 1 0 Agg over department
IT_PROG 28800 1 0 Agg over department
PU_CLERK 13900 1 0 Agg over department
SH_CLERK 64300 1 0 Agg over department
ST_CLERK 55700 1 0 Agg over department
AC_ACCOUNT 8300 1 0 Agg over department
FI_ACCOUNT 39600 1 0 Agg over department
10 4400 0 1 Agg over job
10 AD_ASST 4400 0 0 Detail Both
20 19000 0 1 Agg over job
20 MK_MAN 13000 0 0 Detail Both
20 MK_REP 6000 0 0 Detail Both
30 24900 0 1 Agg over job
30 PU_MAN 11000 0 0 Detail Both
30 PU_CLERK 13900 0 0 Detail Both
40 6500 0 1 Agg over job
40 HR_REP 6500 0 0 Detail Both
50 156400 0 1 Agg over job
50 ST_MAN 36400 0 0 Detail Both
50 SH_CLERK 64300 0 0 Detail Both
50 ST_CLERK 55700 0 0 Detail Both
60 28800 0 1 Agg over job
60 IT_PROG 28800 0 0 Detail Both
70 10000 0 1 Agg over job
70 PR_REP 10000 0 0 Detail Both
80 304500 0 1 Agg over job
80 SA_MAN 61000 0 0 Detail Both
80 SA_REP 243500 0 0 Detail Both
90 58000 0 1 Agg over job
90 AD_VP 34000 0 0 Detail Both
90 AD_PRES 24000 0 0 Detail Both
100 51600 0 1 Agg over job
100 FI_MGR 12000 0 0 Detail Both
100 FI_ACCOUNT 39600 0 0 Detail Both
110 20300 0 1 Agg over job
110 AC_MGR 12000 0 0 Detail Both
110 AC_ACCOUNT 8300 0 0 Detail Both

52 rows selected



Notice now I can't just assume the aggregates come in the order I expect them, i.e., the grand total is at the bottom. If you were like me, rollups and cubes held more mystery that help in most cases, however with grouping_ids it becomes more apparent the meaning of each row and what exactly Oracle is aggregating for you. (Note: All queries were run against the HR schema that comes with Oracle 10g Express Edition (XE)).












































































































































































































































































































































































































































































































DEPARTMENT_ID JOB_ID SUM(SALARY) DEPT_G JOB_G DESCRIPTION
SA_REP700000Detail Both
700001Agg over manager
10AD_ASST440000Detail Both
10440001Agg over manager
20MK_MAN1300000Detail Both
20MK_REP600000Detail Both
201900001Agg over manager
30PU_MAN1100000Detail Both
30PU_CLERK1390000Detail Both
302490001Agg over manager
40HR_REP650000Detail Both
40650001Agg over manager
50ST_MAN3640000Detail Both
50SH_CLERK6430000Detail Both
50ST_CLERK5570000Detail Both
5015640001Agg over manager
60IT_PROG2880000Detail Both
602880001Agg over manager
70PR_REP1000000Detail Both
701000001Agg over manager
80SA_MAN6100000Detail Both
80SA_REP24350000Detail Both
8030450001Agg over manager
90AD_VP3400000Detail Both
90AD_PRES2400000Detail Both
905800001Agg over manager
100FI_MGR1200000Detail Both
100FI_ACCOUNT3960000Detail Both
1005160001Agg over manager
110AC_MGR1200000Detail Both
110AC_ACCOUNT830000Detail Both
1102030001Agg over manager
69140011Agg over both


Sunday, September 14, 2008

Oracle Extract function

Here is a function I ran across in Oracle that I consider very helpful. I used to use the to_char function to get the piece or pieces of a date I needed to display or for calculations or what have you. Now I simply use the EXTRACT function to, well, extract the piece of the date I need.

If I needed the year, I would simply issue the following:

  1. select extract(YEAR from sysdate) from dual;  


Or if I needed the month:

  1. select extract(MONTH from sysdate) from dual;  


This function works with dates, timestamps and even the interval datatype.

Tuesday, July 1, 2008

Oracle XML Generation

Even though Oracle is a relational database, it handles XML quite well. Specifically, its actually very efficient at generating XML from data in relational tables. There are a couple of different options when it comes to XML generation, using the dbms_xmlgen package or the SQL set of XML functions.

The dbms_xmlgen package provides a very easy to way to generate xml. Basically you give it a query and it gives you xml. Let's take a look:

  1. declare  
  2. ctx dbms_xmlgen.ctxHandle;  
  3. xml xmlType;  
  4. begin  
  5.  ctx := dbms_xmlgen.newContext('select * from employees where employee_id = :emp_id');  
  6.  dbms_xmlgen.setbindvalue(ctx,'emp_id',100);  
  7.  xml := dbms_xmlgen.getXMLType(ctx);  
  8.  dbms_output.put_line(substr(xml.getClobVal(),0,4000));  
  9. end;  


And here is the result....

  1. <ROWSET>  
  2.  <ROW>  
  3.   <EMPLOYEE_ID>100</EMPLOYEE_ID>  
  4.   <FIRST_NAME>Steven</FIRST_NAME>  
  5.   <LAST_NAME>King</LAST_NAME>  
  6.   <EMAIL>SKING</EMAIL>  
  7.   <PHONE_NUMBER>515.123.4567</PHONE_NUMBER>  
  8.   <HIRE_DATE>17-JUN-87</HIRE_DATE>  
  9.   <JOB_ID>AD_PRES</JOB_ID>  
  10.   <SALARY>24000</SALARY>  
  11.   <DEPARTMENT_ID>90</DEPARTMENT_ID>  
  12.  </ROW>  
  13. </ROWSET>  


All right so that is pretty cool. Query in, XML out. This package has major pros and cons. Pros: For hierarchical data, specifically data created using a connect by statement, it really cannot be beat. (The method is called, newContextFromHierarchy, and you'll have to use the "level" psuedo-column in conjunction with the connect by to use that feature.) Cons: You really can't nest calls in there and expect it to be performant. Well you may ask, why would I want to nest those calls? In my experience, you'll need to pull data in from other tables to include as children xml elements on the main record. When you do that too much then you can have some problems with performance AND you may even experience actual memory errors from the underlying C code. Moral of the story: Just be careful if you are generating large amounts of XML. Another con, most of the cool features are 10g only, so you 9i'ers out there, will have to use the SQL XML functions.

Speaking of the SQL XML functions, that would be the second method of XML generation to discuss. If you look at these tags in the documentation, they are scary. No, really, like see your grandmother naked kind of scary. However, after you actually use the tags, they are very powerful and very performant even when generating lots of XML. Basically the tags you'll use most frequently are the xmlelement, xmlagg and xmlforest functions. Let's see them in action...

  1. select xmlelement("ROWSET",  
  2.           xmlagg(  
  3.             xmlelement("ROW",  
  4.               xmlforest(employee_id, first_name, last_name, email,  
  5.                         phone_number, hire_date, job_id, salary,  
  6.                         department_id)))) from employees   
  7.                         where employee_id = 100;  


And the result is....
  1. <ROWSET>  
  2.   <ROW>  
  3. <EMPLOYEE_ID>100</EMPLOYEE_ID>  
  4. <FIRST_NAME>Steven</FIRST_NAME>  
  5. <LAST_NAME>King</LAST_NAME>  
  6. <EMAIL>SKING</EMAIL>  
  7. <PHONE_NUMBER>515.123.4567</PHONE_NUMBER>  
  8. <HIRE_DATE>1987-06-17</HIRE_DATE>  
  9. <JOB_ID>AD_PRES</JOB_ID>  
  10. <SALARY>24000</SALARY>  
  11. <DEPARTMENT_ID>90</DEPARTMENT_ID>  
  12. </ROW>  
  13. </ROWSET>  

Okay so basically the same XML we got from the dbms_xmlgen package. A little more work, sure, but your DBA's will love you for it. Let's analyze what each function does in a little more detail.
Xmlelement, basically takes a varchar as the element name ("name" is different from "NAME") and the data you want to xml-ize.
Xmlagg will basically take your row xml and concatenate each together in one big xml document.
Xmlforest takes the list of columns and xml-izes them all, using the column name or the alias that you give it as the tag name.

Using those 3 tags will generate most of the xml you will need, however there are others like xmlattribute and xmlcomment that may come in handy.

Good luck XMLing.

Monday, June 30, 2008

Oracle and JDBC, multiple objects one query

In a world of JPA and various ORM's, sometimes the lowly JDBC is completely forgotten. However, JDBC has a powerful advantage over ORM and that is that it gives the developer complete control over what is executed and how the data is mapped relationally. More often than not, we retrieve data from multiple tables that we then map to multiple objects. There are several ways to accomplish this, we can make a "super-query" with all the columns for all the tables we need on one row and we do some fancy "if" statement fun to model the data to our objects. Or we can make a proc on the Oracle side and declare cursors and the whole deal to get what we want. I'm actually not a big fan of other one, because there is a little heavy lifting involved either way. My preferred method of getting records back is to use Oracle's "cursor expressions". Cursor expressions are simply queries embedded in the SQL that come back as ref cursors. So let's see them in action.

  1. select d.*,  
  2. cursor(select e.* from employees e where e.department_id = d.department_id) as employees  
  3. from departments d;  


Using Oracle 10g Express Edition's hr schema, I wanted all departments, with all the employees in each department in one call. Using the cursor keyword, I now have a ref cursor pointing to those employees per department record.

Enough already! Show me how to get this back in JDBC. Alright, since you were so nice about it. Its actually dirt simple.

  1. //... normal jdbc type execution code here  
  2. //my result set object is called rs  
  3. ResultSet employees = (ResultSet)rs.getObject("employees");  


You simply use the getObject call using the ResultSet object returned by JDBC when executing a select statement, and then cast it to a ResultSet object, which in my example will be a ResultSet containing employee information.

So why should you use this? Um, because its fast and its easy. The way to get the most out of your database call is to get the most amount of data back at a time as you can. Using cursor expressions allows you to do that, without creating lots of procedures to return various ref cursors.

But Nick, what about all those open cursors won't I exceed max number of cursors on Oracle? The answer is no, you won't. You just have to remember to close those ResultSet object you are creating. So if you do get that message, check to make sure all your ResultSet's are being closed properly.

Have fun with your fast JDBC code, your customers will love you for it.

Friday, February 29, 2008

Decode Function

I will admit that this isn't the best name Oracle could have given to this function. This is a very powerful SQL function that can really help developers. Its main purpose is to provide a pseudo if statement inside an sql statement. The decode statement takes the following parameters:

decode(text_to_compare,value_to_compare,result_if_true,result_if_false);

Let's see decode in action to firm up everyone's understanding.


  1. select decode('test','test','They are equal!','Um.. not so much.'from dual;  
  2.   
  3. DECODE('TEST','TEST','THEY ARE EQUAL!','UM..NOT SO MUCH.')   
  4. ------------------------------------------------------   
  5. They are equal!   


1 rows selected


As you can see, when 'test' equals 'test' then we get 'They are equal!'. Okay, so that's pretty boring on a good day. Let's dig in a more complex example. I have a table called decode_test for obvious reasons that contains information about products that have been sold by yours truly.
  1. select name, product_id, decode(product_id,  
  2.               1,'Stereo System',  
  3.               2,'Blu-Ray DVD Player',  
  4.               3,'Brick of Gold',  
  5.               'Other stuff laying around store.'from decode_test;  
  6.   
  7.   
  8. NAME                 PRODUCT_ID             PRODUCT_NAME                       
  9. -------------------- ---------------------- --------------------------------   
  10. Nicholas             1                      Stereo System                      
  11. Nicholas             3                      Brick of Gold                      
  12. Nicholas             2                      Blu-Ray DVD Player                 
  13. Nicholas             1                      Stereo System                      
  14.   
  15. rows selected  


Okay so what happened here? Basically in my test, each product_id represented some text I wanted to display in there query. You'll notice decode can take multiple value pairs to test against the input text.

So I'll admit the previous example won't win any "Example of the Year" awards. So here is a more practical use of decode that involves dates and conditional summation.

  1. select name,   
  2. sum(decode(to_char(date_sold,'MMYYYY'),current_month,amount_sold,null)) as sales_month_to_date,   
  3. sum(decode(to_char(date_sold,'YYYY'),current_year,amount_sold,null)) as sales_year_to_date  
  4. from decode_test,  
  5. (select to_char(sysdate,'MMYYYY'as current_month,   
  6.         to_char(sysdate,'YYYY'as current_year from dual) d  
  7. group by name;  
  8.   
  9. NAME                 SALES_MONTH_TO_DATE    SALES_YEAR_TO_DATE       
  10. -------------------- ---------------------- ----------------------   
  11. Nicholas             67.5                   132.5                    
  12.   
  13. rows selected  


Well, well, that was halfway slick. In the previous example, I needed to sum only the sales for the current month and for year to date. Decode helped me to sum the amount_sold, when it was appropriate.

Wednesday, January 9, 2008

SQL - Row Numbers and Limits

A common occurrence that most programmers face are numbering rows in Oracle. There are several misconceptions about how to do this and what are the costs of such methods. I personally use the row_number() analytic function, available since version 9 to accomplish this.

  1. select my_table.*, row_number() over (order by id desc)  
  2. from my_table;  


Note in the above code the syntax of the call, you call the analytic function and then tell Oracle how you want the data to be ordered. If you used "rownum" to accomplish this you would get some interesting result when changing the order by or introducing a where clause.

So obviously row numbering is pretty easily accomplished, however there are times when I only need a certain amount of rows. For instance, I need 10 rows at a time or 20. Having written my share of web applications, I can say that feature is needed a lot. So how can we get say the first 5 results returned, its accomplished by the following:

  1. select *  
  2. from  
  3. (  
  4. select my_table.*, row_number() over (order by id desc) rn  
  5. from my_table  
  6. )  
  7. where rn between 1 and 5;  
  8. ;  


Okay, now we have limited the number of results returned by our query. In addition, we can now accomplish things like pagination through a result set of data. As with most things, there is a cost associated. Notice my inner query has no where clause, thereby inducing a full table scan. You may say, 'Now wait, didn't we just limit the amount of results returned?'. And yes, we did, however Oracle has to run through the full set of results of the inner query to correctly row_number() the rows and then it can parse out the first five or the last five or whatever combination we need.

All of that being said, this is the most efficient way to row number and limit are result set of data based purely on the number of results returned in Oracle.