The more you think about stored functions in SQL, the more you come up with ways to put them to use in every single one of your applications. To prod your creativity and get you started, here are a number of examples of the ways stored functions in SQL can change the way you build Oracle-based systems.
In just about any and every application, you will need to perform the same calculations over and over again. Whether it is a computation of net present value, mortgage balance, the distance between two points on a Cartesian plane, or a statistical variance, with native SQL you have to recode those computations in each of the SQL statements in which they are needed.
You can pay a big price for this kind of redundancy. The code that implements your business rules is repeated throughout the application. Even if the business rule doesn't change, the way you should implement the rule is almost sure to require modification. Worse than that, the business rule itself might evolve, which could necessitate fairly significant alterations.
To solve this problem, you can hide or encapsulate all of your formulas and calculations into stored functions. These functions can then be called from within both SQL statements and also PL/SQL programs.
One fine example of the value of encapsulated calculations arose when an insurance company needed to perform date-based analyses on its accounts. The last day of the month is, of course, a very important date for most financial institutions. To manipulate dates, the company's IS department planned to make use of the built-in LAST_DAY function to obtain the last day of the month, and ADD_MONTHS to move from one month to the next. It soon uncovered a very interesting nuance to the way ADD_MONTHS worked: if you pass a day to ADD_MONTHS which is the last day in the month, SQL always returns the last day in the resulting month, regardless of the number of actual days in each of the months.
In other words:
ADD_MONTHS ('28-FEB-1994', 2) ==> 30-APR-1993
This approach might make sense for some applications and queries. The requirement at the insurance company, however, was that when you move a month, you must always land on the same day of the month (or the last day, if the original month's day was past the last day of the target month). Without stored functions, the SQL required to perform this calculation is as follows:
SELECT DECODE (payment_date, LAST_DAY (payment_date), LEAST (ADD_MONTHS (payment_date, 1), TO_DATE (TO_CHAR (ADD_MONTHS (payment_date, 1), 'MMYYYY') || TO_CHAR (payment, 'DD'), 'MMYYYYDD')), ADD_MONTHS (payment_date, 1)) FROM premium_payments;
which may be read as, "If the last payment date falls on the last day of the month, then return as the next payment date the earliest of either the result of adding one month to payment date (using ADD_MONTHS) or the same day in the new month as the day in the month of the last payment date. If the last payment was not made on the last day of the month, simply use ADD_MONTHS to get the next payment date."
Not only is that difficult to understand, but it required three different calls to the ADD_MONTHS built-in. And remember that this complex SQL would have to be repeated in every SELECT list where ADD_MONTHS was used to increment or decrement dates. You can well imagine how happy the programmers in this company became when they installed Oracle Server Version 7.1 and were able to use the following function inside their SQL statements (for a full explanation of the function's logic, see Chapter 12, Date Functions ):
FUNCTION new_add_months (date_in IN DATE, months_shift IN NUMBER) RETURN DATE IS return_value DATE; BEGIN return_value := ADD_MONTHS (date_in, months_shift); IF date_in = LAST_DAY (date_in) THEN return_value := LEAST (return_value, TO_DATE (TO_CHAR (return_value, 'MMYYYY') || TO_CHAR (date_in, 'DD') , 'MMYYYYDD')); END IF; RETURN return_value; END new_add_months;
With the stored function, the SELECT statement to obtain the next payment date is simply:
SELECT new_add_months (payment_date,1) FROM premium_payments;
The more you look through your SQL statements, the more opportunities you find for stored functions that can hide calculations and therefore improve the long-term stability of your code. While it is unlikely that you will have the time and resources to go back and rewrite wholesale the SQL underpinnings of your application, you can at least build the functions and roll them into any new product development.
This simple question is hard to answer in SQL: "Show me the name and salary of the employee who has the highest salary in each department, as well as the total salary for that person's department."
Broken into two separate queries, this question poses no problem. Here is the first part:
SELECT department_id, last_name, salary FROM employee E1 WHERE salary = (SELECT MAX (salary) FROM employee E2 WHERE E.department_id = E2.department_id) GROUP BY department_id;
and here is the second part:
SELECT department_id, SUM (salary) FROM employee ORDER BY department_id;
However, I cannot very easily combine them since that would require listing and obtaining both scalar (single row) and aggregate (across multiple rows) values from the same table. The following SELECT list contains the information I want to display. How could I construct my FROM, WHERE, and GROUP BY clauses to show both an individual's salary and the departmental total?
SELECT department_id, last_name, salary, SUM (salary) FROM ...? WHERE ...? GROUP BY ...?
The most straightforward solution prior to Release 2.1 of PL/SQL was to create a view that "presummarized" the salary for each department:
CREATE VIEW dept_salary AS SELECT department_id, SUM (salary) total_salary FROM employee GROUP BY department_id;
Now, with this view, I can get at my answer with a single SQL statement as follows:
SELECT E.department_id, last_name, salary, total_salary FROM employee E, dept_salary DS WHERE E.department_id = DS.department_id AND salary = (SELECT MAX (salary) FROM employee E2 WHERE E.department_id = E2.department_id);
This doesn't seem like such a bad solution, except that you have to create a customized view each time you want to perform this kind of calculation. In addition, this SQL is far less than straightforward for many programmers.
A better solution is to make use of a stored function in SQL. Instead of creating a view, create a function that performs exactly the same calculation, but this time only for the specified department:
FUNCTION total_salary (dept_id_in IN department.department_id%TYPE) RETURN NUMBER IS CURSOR grp_cur IS SELECT SUM (salary) FROM employee WHERE department_id = dept_id_in; return_value NUMBER; BEGIN OPEN grp_cur; FETCH grp_cur INTO return_value; CLOSE grp_cur; RETURN return_value; END;
In this version I outer-join the department and employee tables. This way if a department does not exist (bad department ID number), I return NULL. If a department has no employees, I return 0; otherwise, I return the total salary in that department.
Now my query does not need to join to a view containing a GROUP BY. Instead, it calls the total_salary function, passing the employee's department ID number as a parameter:
SELECT E.department_id, last_name, salary, total_salary (E.department_id) FROM employee E WHERE salary = (SELECT MAX (salary) FROM employee E2 WHERE E.department_id = E2.department_id);
The resulting SQL statement is not only easier to read; it also executes faster, especially for larger tables. I could simplify the SQL statement further by creating a function that returns the maximum salary in a particular department. The above SELECT would then become simply:
SELECT department_id, last_name, salary, total_salary (department_id) FROM employee E WHERE salary = max_sal_in_dept (department_id);
You can also use stored functions in a SQL statement to replace correlated subqueries. A correlated subquery is a SELECT statement inside the WHERE clause of a SQL statement (SELECT, INSERT, or DELETE) which is correlated (or makes reference) to one or more columns in the enclosing SQL statement. In the preceding section I used a correlated subquery to determine the employee who receives the highest salary in each department:
SELECT E.department_id, last_name, salary, total_salary (E.department_id) FROM employee E WHERE salary = (SELECT MAX (salary) FROM employee E2 WHERE E.department_id = E2.department_id);
The last three lines in the query contain a SELECT statement matching the department ID number for the "inner" employee (E2) to the department ID number for the "outer" employee table (E1). The inner query is executed once for every row retrieved in the outer query.
The correlated subquery is a very powerful feature in SQL, since it offers the equivalent of a procedural language's nested loop capability, as in:
LOOP LOOP END LOOP; END LOOP;
Two drawbacks with a correlated subquery are:
The logic can become fairly complicated
The resulting SQL statement can be difficult to understand and follow
You can use a stored function in place of a correlated subquery to address these drawbacks; in the above example, I would want a function that calculates the highest salary in a given department:
FUNCTION max_salary (dept_id_in IN department.department_id%TYPE) RETURN NUMBER IS CURSOR grp_cur IS SELECT MAX (salary) FROM employee WHERE department_id = dept_id_in; return_value NUMBER; BEGIN OPEN grp_cur; FETCH grp_cur INTO return_value; CLOSE grp_cur; RETURN return_value; END;
I can now use both total_salary and max_salary in my SELECT statement that says, "Show me the name and salary of the employee who has the highest salary in each department, as well as the total salary for that person's department."
SELECT E.department_id, last_name, salary, total_salary (department_id) FROM employee WHERE salary = max_salary (department_id);
Compare that simple, self-documenting piece of SQL to the version requiring a view and correlated subquery:
CREATE VIEW dept_salary AS SELECT department_id, SUM (salary) total_salary FROM employee GROUP BY department_id; SELECT E.department_id, last_name, salary, total_salary FROM employee E, dept_salary DS WHERE E.department_id = DS.department_id AND salary = (SELECT MAX (salary) FROM employee E2 WHERE E.department_id = E2.department_id);
and I am sure you will agree that stored functions in SQL can make your life much easier.
You may have noticed that the total_salary function from the previous section and the max_salary from this section look very similar. The only difference between the two is that the cursor in total_salary uses the SUM group function and the cursor in max_salary uses the MAX group function. If you are as fanatical about consolidating your code into the smallest possible number of distinct "moving parts," you might consider a single function that returns a different group-level statistic for a department based on a second parameter, as follows:
FUNCTION salary_stat (dept_id_in IN department.department_id%TYPE, stat_type_in IN VARCHAR2) RETURN NUMBER IS v_stat_type VARCHAR2(20) := UPPER (stat_type_in); CURSOR grp_cur IS SELECT SUM (salary) sumsal, MAX (salary) maxsal, MIN (salary) minsal, AVG (salary) avgsal, COUNT (DISTINCT salary) countsal, FROM employee WHERE department_id = dept_id_in; grp_rec grp_cur%ROWTYPE; retval NUMBER; BEGIN OPEN grp_cur; FETCH grp_cur INTO grp_rec; CLOSE grp_cur; IF v_stat_type = 'SUM' THEN retval := grp_rec.sumsal; ELSIF v_stat_type = 'MAX' THEN retval := grp_rec.maxsal; ELSIF v_stat_type = 'MIN' THEN retval := grp_rec.minsal; ELSIF v_stat_type = 'COUNT' THEN retval := grp_rec.countsal; ELSIF v_stat_type = 'AVG' THEN retval := grp_rec.avgsal; END IF; RETURN retval; END;
The overhead of adding these additional expressions in the SELECT list -- and the processing of the IF statement -- is negligible. With this new, generic utility, my request for salary analysis shown above now becomes:
SELECT E.department_id, last_name, salary, salary_stat (department_id, 'sum') FROM employee WHERE salary = salary_stat (department_id, 'max');
If I ever have to change the SQL required to obtain departmental-level statistics for salary, I have to upgrade only this single function.
The DECODE function offers IF-like capabilities in the nonprocedural SQL environment provided by the Oracle Server. You can use the DECODE syntax to create matrix reports with a fixed number of columns and also perform complex IF-THEN-ELSE logic within a query. The downside to DECODE is that it can be difficult to write and very difficult to maintain. Consider the following example of DECODE to determine whether a date is within the prescribed range and, if it is, add to the count of rows that fulfill this requirement:
SELECT FC.year_number, SUM (DECODE (GREATEST (ship_date, FC.q1_sdate), ship_date, DECODE (LEAST (ship_date, FC.q1_edate), ship_date, 1, 0), 0)) Q1_results, SUM (DECODE (GREATEST (ship_date, FC.q2_sdate), ship_date, DECODE (LEAST (ship_date, FC.q2_edate), ship_date, 1, 0), 0)) Q2_results, SUM (DECODE (GREATEST (ship_date, FC.q3_sdate), ship_date, DECODE (LEAST (ship_date, FC.q3_edate), ship_date, 1, 0), 0)) Q3_results, SUM (DECODE (GREATEST (ship_date, FC.q4_sdate), ship_date, DECODE (LEAST (ship_date, FC.q4_edate), ship_date, 1, 0), 0)) Q4_results FROM orders O, fiscal_calendar FC GROUP BY year_number;
The result set for this query might look like this:
YEAR NUMBER Q1 RESULTS Q2 RESULTS Q3 RESULTS Q4 RESULTS ------------ ---------- ---------- ---------- ---------- 1993 12000 14005 22000 40000 1994 10000 15000 21000 55004
While it is very handy to use DECODE to produce such a report, the SQL required to accomplish the task is more than a little frightening. Here is how you might try to interpret the Q1 RESULTS nested DECODE: "If the ship date is greater than or equal to the first quarter start date and the shipdate is less than or equal to the first quarter end date, then add one to the sum of the total number of orders shipped in that quarter. Otherwise, add zero."
Unfortunately, unless you are experienced in interpreting DECODE statements, you will find it difficult to glean this understanding from that convoluted SQL statement. The repetition in that single SELECT also cries out for modularization, which we can supply with the following stored function (incr_in_range means "increment if in the range"):
FUNCTION incr_in_range (ship_date_in IN DATE, sdate_in IN DATE, edate_in IN DATE) RETURN INTEGER IS BEGIN IF ship_date_in BETWEEN sdate_in AND edate_in THEN RETURN 1; ELSE RETURN 0; END IF; END;
Yep, that's all there is to it! With the incr_in_range function, that long and winding SELECT statement simply becomes:
SELECT FC.year_number, SUM (incr_in_range (ship_date, q1_sdate, q1_edate)) Q1_results, SUM (incr_in_range (ship_date, q2_sdate, q2_edate)) Q2_results, SUM (incr_in_range (ship_date, q3_sdate, q3_edate)) Q3_results, SUM (incr_in_range (ship_date, q4_sdate, q4_edate)) Q4_results FROM orders O, fiscal_calendar FC GROUP BY year_number;
This stored function gets rid of the code redundancy and makes the SELECT statement much more readable. In addition, this function could be used in other SQL statements to perform the same logic.
The GROUP BY clause in a SELECT statement allows you to collect data across multiple records and group them by one or more columns. The following SELECT statement, for example, calculates the total amount of salary paid to all workers with the same job description (or "title"):
SELECT job_title_desc, SUM (salary) FROM employee E, job_title JT WHERE E.job_title_id = JT.job_title_id GROUP BY job_title_desc;
A portion of the result set from this query might look like this:
ACCOUNT MANAGER 32000 OFFICE CLERK 4000 PROJECT MANAGER 10006 SHIPPING CLERK 4200
What if you wanted to see a total amount of salary for each type or category of job title? How much do all the various kinds of clerks in the company earn? What about all the different types of vice presidents and managers?
With native SQL, this is very difficult to write. In effect, you want GROUP BY to assess all the different categories, but those categories do not exist in a separate table. They are denormalized into the job title descriptions, and the GROUP BY clause can be applied only to whole column values. Fortunately, stored functions in SQL provide a straightforward solution to the problem. The following function, job_category, takes as its single parameter the primary key to the title table and returns the job category. It does so by performing a LIKE operation on the text for that job title -- something you cannot do directly in the GROUP BY clause.
FUNCTION job_category (title_id_in IN job_title.job_title_id%TYPE) RETURN VARCHAR2 IS CURSOR title_cur IS SELECT job_title_desc FROM job_title WHERE job_title_id = title_id_in; title_rec title_cur%ROWTYPE; BEGIN OPEN title_cur; FETCH title_cur INTO title_rec; IF title_cur%NOTFOUND THEN CLOSE title_cur; RETURN NULL; ELSE CLOSE title_cur; IF title_rec.job_title_desc LIKE '%CLERK%' THEN RETURN 'CLERK'; ELSIF title_rec.job_title_desc LIKE '%VICE PRESIDENT%' THEN RETURN 'VICE PRESIDENT'; ELSIF title_rec.job_title_desc LIKE '%MANAGER%' THEN RETURN 'MANAGER'; END IF; END IF; END;
I can now rewrite my query using the job_category function and cut through all the confusion of my denormalized job title descriptions:
SELECT job_category (job_title_id) as title, SUM (salary) FROM employee GROUP BY title;
I use the function in both the SELECT list and the GROUP BY. This query now produces a result set that looks like the following:
CLERK 8200 MANAGER 42006 VICE PRESIDENT 75000
It is very easy for me to find out if a particular word or set of characters appears in a string with the INSTR function. It is also straightforward in SQL to determine which rows have a column that contains a certain word. It is much more difficult within native SQL to determine the number of times a particular word or set of characters appears in a string in a single row. The built-in set-at-a-time processing of SQL does not provide iterative or looping behavior within a row -- just across rows.
If I want to perform iterative analysis or computation on a particular value in a single row, I need to apply a PL/SQL function. Let's see how this would come into play.
Here is the SQL that shows me all lines of text containing the word "the":
SELECT text FROM notes WHERE INSTR (text, 'the') > 0;
I can even use INSTR to find all the lines of text containing the word "the" at least three times:
SELECT text FROM notes WHERE INSTR (text, 'the', 1, 3) > 0;
I could, if pressed, also display all lines of text that containing the word "the" precisely three times:
SELECT text FROM notes WHERE INSTR (text, 'the', 1, 3) != 0 AND INSTR (text, 'the', 1, 4) = 0;
In other words, if INSTR returns a nonzero number or location for a third occurrence of "the", but also returns a zero for a fourth occurrence of "the", that means that the string occurs precisely three times in the string.
If, however, I need to see a report that displays the number of occurrences of "the" in each of my notes, or if I want to GROUP BY the number of occurrences of "the", SQL comes up short. The best I can do is pseudocode to express my wish:
SELECT number-of-thes-in-text, text FROM notes GROUP BY number-of-thes-in-text;
As a nonprocedural language that deals with a set (multiple rows) of data at a time, SQL doesn't give me any way to programmatically analyze the contents of a particular column. PL/SQL, on the other hand, is perfectly equipped to solve these kinds of problems. The ps.parse package included on the disk demonstrates how to build a function that will return the number of atomics (words and/or delimiters) in a string. I will not repeat the implementation here, but will show how the function would be used in SQL:
Show the number of occurrences of "the" in each line of text:
SELECT text, ps_parse.number_of_atomics (text, 'the') FROM notes;
Show the distribution of the use of the word "urgent" across the days of the week for all text in which the word urgent appears at least once:
SELECT TO_CHAR (note_date, 'DAY'), ps_parse.number_of_atomics (text, 'urgent') urgent_count FROM notes WHERE urgent_count > 0;
The SQL language does not support recursion, yet this powerful programming method is at times crucial to solving a problem. PL/SQL does allow for recursive execution of function calls, however, so you can put it to use inside a SQL statement where recursion is needed.
Suppose your application needs to print checks. A check contains both the numeric version of the amount of the check (say, $99.70) and the written version of the amount of the check (ninety-nine dollars and seventy cents). You will not be able to use SQL to convert the numeric check amount (stored in the database) into the written version. You can, however, put PL/SQL to use, along with some of its more interesting features like recursion, global data, and PL/SQL tables, to provide a very elegant solution.
The package below offers a checks package to provide this conversion capability.[ 2 ]
[2] This implementation of "number-to-words" was first presented by Mike Burnside of Oracle Australia at the International Oracle User's Week in San Francisco in September 1994. I have made some minor modifications.
/* Filename on companion disk: checks.spp */ PACKAGE checks IS /* Convert number to words */ FUNCTION num2words (number_in IN NUMBER) RETURN VARCHAR2; /* Since a package, must assert my purity level. */ PRAGMA RESTRICT_REFERENCES (num2words, WNDS); END checks; PACKAGE BODY checks IS /* Table structure to hold names of numeric components. */ TYPE numwords_tabtype IS TABLE OF VARCHAR2(20) INDEX BY BINARY_INTEGER; words_table numwords_tabtype; /* Used in initialization section. */ v_date DATE; FUNCTION num2words (number_in IN NUMBER) RETURN VARCHAR2 IS my_number NUMBER; BEGIN /* Sorry, I don't do cents in this program! */ my_number := FLOOR (number_in); /* $1000+ */ IF my_number >= 1000 THEN /* Break up into two recursive calls to num2words. */ RETURN num2words (my_number/1000) || ' Thousand ' || num2words (MOD (my_number, 1000)); END IF; /* $100-999 */ IF my_number >= 100 THEN /* Break up into two recursive calls to num2words. */ RETURN num2words (my_number/100) || ' Hundred ' || num2words (MOD (my_number, 100)); END IF; /* $20-$99 */ IF my_number >= 20 THEN /* Break up into tens word and then final dollar amount. */ RETURN words_table (FLOOR (my_number/10)) || ' ' || num2words (MOD (my_number, 10)); END IF; /* Down to 19 or less. Get word from "upper register" of table. */ RETURN words_table (my_number + 10); END num2words; BEGIN /* Initialization section of package. Run just once per session. */ /* Manually construct the tens names. */ words_table (1) := 'Ten'; words_table (2) := 'Twenty'; words_table (3) := 'Thirty'; words_table (4) := 'Forty'; words_table (5) := 'Fifty'; words_table (6) := 'Sixty'; words_table (7) := 'Seventy'; words_table (8) := 'Eighty'; words_table (9) := 'Ninety'; /* Return NULL for zero. */ words_table (10) := NULL; /* Construct number names for one through nineteen. */ FOR day_index IN 1 .. 19 LOOP v_date := TO_DATE (to_char(day_index) || '-JAN-94'); words_table (day_index+10) := INITCAP (TO_CHAR (v_date, 'DDSP')); END LOOP; END checks;
Here are some examples of conversion from whole numbers to words:
checks.num2words (99) ==> Ninety Nine checks.num2words (12345) ==> Twelve Thousand Three Hundred Forty Five checks.num2words (5) ==> Five
I can also put this packaged function to use inside my SQL statement to query up all the information I need to print a check:
SELECT TO_CHAR (SYSDATE, 'Month DD, YYYY'), payee, amount, checks.num2words (amount), comment FROM bill WHERE bill_status = 'UNPAID'
Copyright (c) 2000 O'Reilly & Associates. All rights reserved.