Tuesday, April 3, 2012

Oracle Escape Characters

Oracle allows the assignment of special escape characters to tell Oracle that the character is interpreted literally.  Certain characters such as the underscore “_” are not interpreted literally because they have special meaning within Oracle.
In the example below, we want to find all Oracle parameter that relate to I/O, so we are tempted to use the filter LIKE  “%_io_%’.  Below we will select from the x$ksppi fixed table, filtering with the LIKE clause:

select ksppinm
from x$ksppi
where ksppinm like '%_io_%';
 
KSPPINM                                                                        
--------------------------------           
sessions                                                                        
license_max_sessions                                                           
license_sessions_warning                                                       
_session_idle_bit_latches                                                       
_enable_NUMA_optimization                                                      
java_soft_sessionspace_limit                                                   
java_max_sessionspace_size                                                     
_trace_options                                                                 
_io_slaves_disabled                                                            
dbwr_io_slaves                                                                 
_lgwr_io_slaves



As you can see above, we did not get the answer we expected.  The SQL displayed all values that contained “io”, and not just those with an underscore.  To remedy this problem, Oracle SQL supports an ESCAPE clause to tell Oracle that the character is to be interpreted literally:

select ksppinm
from x$ksppi
where ksppinm like '%\_io\_%' ESCAPE '\';
 
KSPPINM                                                                        
--------------------------------------           
_io_slaves_disabled                                                            
dbwr_io_slaves                                                                 
_lgwr_io_slaves                                                                
_arch_io_slaves                                                                
_backup_disk_io_slaves                                                         
backup_tape_io_slaves                                                          
_backup_io_pool_size                                                            
_db_file_direct_io_count                                                       
_log_io_size                                                                   
fast_start_io_target                                                            
_hash_multiblock_io_count                                                      
_smm_auto_min_io_size                                                          
_smm_auto_max_io_size                                                           
_ldr_io_size 

rowconcate Function

Start by creating this function:

SQL> CREATE OR REPLACE FUNCTION rowconcat(q in VARCHAR2) RETURN VARCHAR2 IS
  2    ret  VARCHAR2(4000);
  3    hold VARCHAR2(4000);
  4    cur  sys_refcursor;
  5  BEGIN
  6    OPEN cur FOR q;
  7    LOOP
  8      FETCH cur INTO hold;
  9      EXIT WHEN cur%NOTFOUND;
 10      IF ret IS NULL THEN
 11        ret := hold;
 12      ELSE
 13        ret := ret || ',' || hold;
 14      END IF;
 15    END LOOP;
 16    RETURN ret;
 17  END;
 18  /
Function created.
This function returns a string result with the concatenated non-NULL values from a SQL statement. It returns NULL if there are no non-NULL values.
Here is an example of how to map several rows to a single concatenated column:
SQL> SELECT rowconcat('SELECT dname FROM dept') AS departments
  2    FROM dual;
DEPARTMENTS
--------------------------------------------------------------------------------
ACCOUNTING,RESEARCH,SALES,OPERATIONS
This example is more interresting, it concatenates a column across several rows based on an aggregation:
SQL> col employees format a50
SQL> SELECT deptno,
  2         rowconcat('SELECT ename FROM emp a WHERE deptno='||deptno) AS Employees
  3    FROM emp
  4   GROUP BY deptno
  5  /
    DEPTNO EMPLOYEES
---------- --------------------------------------------------
        30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES
        20 SMITH,JONES,SCOTT,ADAMS,FORD
        10 CLARK,KING,MILLER

DATA BASE LINKS

Purpose
Use the CREATE DATABASE LINK statement to create a database link. A database link is a schema object in one database that enables you to access objects on another database. The other database need not be an Oracle Database system. However, to access non-Oracle systems you must use Oracle Heterogeneous Services.
Once you have created a database link, you can use it to refer to tables and views on the other database. In SQL statements, you can refer to a table or view on the other database by appending @dblink to the table or view name. You can query a table or view on the other database with the SELECT statement. You can also access remote tables and views using any INSERT, UPDATE, DELETE, or LOCK TABLE statement.

dblink
Specify the complete or partial name of the database link. If you specify only the database name, then Oracle Database implicitly appends the database domain of the local database.
If the value of the GLOBAL_NAMES initialization parameter is TRUE, then the database link must have the same name as the database to which it connects. If the value of GLOBAL_NAMES is FALSE, and if you have changed the global name of the database, then you can specify the global name.
The maximum number of database links that can be open in one session or one instance of a Real Application Clusters configuration depends on the value of the OPEN_LINKS and OPEN_LINKS_PER_INSTANCE initialization parameters.

Restriction on Creating Database Links
You cannot create a database link in another user's schema, and you cannot qualify dblink with the name of a schema. Periods are permitted in names of database links, so Oracle Database interprets the entire name, such as ralph.linktosales, as the name of a database link in your schema rather than as a database link named linktosales in the schema ralph.)

CONNECT TO Clause
The CONNECT TO clause lets you enable a connection to the remote database.

CURRENT_USER Clause
Specify CURRENT_USER to create a current user database link. The current user must be a global user with a valid account on the remote database.
If the database link is used directly, that is, not from within a stored object, then the current user is the same as the connected user.
When executing a stored object (such as a procedure, view, or trigger) that initiates a database link, CURRENT_USER is the username that owns the stored object, and not the username that called the object. For example, if the database link appears inside procedure scott.p (created by scott), and user jane calls procedure scott.p, the current user is scott.
However, if the stored object is an invoker-rights function, procedure, or package, the invoker's authorization ID is used to connect as a remote user. For example, if the privileged database link appears inside procedure scott.p (an invoker-rights procedure created by scott), and user Jane calls procedure scott.p, then CURRENT_USER is jane and the procedure executes with Jane's privileges.

user IDENTIFIED BY password
Specify the username and password used to connect to the remote database using a fixed user database link. If you omit this clause, the database link uses the username and password of each user who is connected to the database. This is called a connected user database link.

dblink_authentication
Specify the username and password on the target instance. This clause authenticates the user to the remote server and is required for security. The specified username and password must be a valid username and password on the remote instance. The username and password are used only for authentication. No other operations are performed on behalf of this user.
You must specify this clause when using the SHARED clause.

USING 'connect string'
Specify the service name of a remote database. If you specify only the database name, then Oracle Database implicitly appends the database domain to the connect string to create a complete service name. Therefore, if the database domain of the remote database is different from that of the current database, then you must specify the complete service name.

Examples
The examples that follow assume two databases, one with the database name local and the other with the database name remote. The examples use the Oracle Database domain. Your database domain will be different.

Defining a Public Database Link: Example
The following statement defines a shared public database link named remote that refers to the database specified by the service name remote:
CREATE PUBLIC DATABASE LINK remote 
   USING 'remote'; 

This database link allows user hr on the local database to update a table on the remote database (assuming hr has appropriate privileges):
UPDATE employees@remote
   SET salary=salary*1.1
   WHERE last_name = 'Baer';

Defining a Fixed-User Database Link: Example
In the following statement, user hr on the remote database defines a fixed-user database link named local to the hr schema on the local database:
CREATE DATABASE LINK local 
   CONNECT TO hr IDENTIFIED BY hr
   USING 'local';

Once this database link is created, hr can query tables in the schema hr on the local database in this manner:
SELECT * FROM employees@local;

User hr can also use DML statements to modify data on the local database:
INSERT INTO employees@local
   (employee_id, last_name, email, hire_date, job_id)
   VALUES (999, 'Claus', 'sclaus@oracle.com', SYSDATE, 'SH_CLERK');

UPDATE jobs@local SET min_salary = 3000
   WHERE job_id = 'SH_CLERK';

DELETE FROM employees@local 
   WHERE employee_id = 999;

Using this fixed database link, user hr on the remote database can also access tables owned by other users on the same database. This statement assumes that user hr has SELECT privileges on the oe.customers table. The statement connects to the user hr on the local database and then queries the oe.customers table:
SELECT * FROM oe.customers@local;

Defining a CURRENT_USER Database Link: Example
The following statement defines a current-user database link to the remote database, using the entire service name as the link name:
CREATE DATABASE LINK remote.us.oracle.com
   CONNECT TO CURRENT_USER
   USING 'remote';

The user who issues this statement must be a global user registered with the LDAP directory service.
You can create a synonym to hide the fact that a particular table is on the remote database. The following statement causes all future references to emp_table to access the employees table owned by hr on the remote database:
CREATE SYNONYM emp_table 
   FOR oe.employees@remote.us.oracle.com;

Can one generate HTML reports from SQL*Plus?

One can generate static HTML pages from SQL*Plus (8.1.6 and above) by setting the MARKUP option to HTML ON. This can be done by specifying -MARKUP "HTML ON" from command line, or with the "SET MARKUP HTML ON" command. Look at this example SQL Script:

specifying -MARKUP "HTML ON" from command line, or with the "SET MARKUP HTML ON" command. Look at this example SQL Script:
set markup HTML on
spool index.html
select * from tab;
spool off
set markup HTML off



Note: You can deploy this file on your web site or edit it in an HTML editor (like FrontPage or Dreamweaver). Another good idea is to develop a CSS to present the data more elegantly. One can also embed HTML tags in the select statement to create hyperlinks and add more HTML features.

How does one copy data from one database to another in SQL*Plus?

The SQL*Plus COPY command is one of the fastest ways of copying data between databases and schemas. This is also one of the few methods that will handle LONG columns correctly. Look at this example:

SQL> COPY FROM scott/tiger@db1 TO scott/tiger@db2 INSERT mytable USING select * from mytable;


COPY
----

Copies data from a query to a table in the same or another
database. COPY supports CHAR, DATE, LONG, NUMBER and VARCHAR2.

COPY {FROM database | TO database | FROM database TO database}
           {APPEND|CREATE|INSERT|REPLACE} destination_table
           [(column, column, column, ...)] USING query

where database has the following syntax:
    username[/password]@connect_identifier

Note that this command is no more enhanced since Oracle 8.0 and may be declared as obsolete in a future version (but is still there in 11gR2) but there is no replacement to easily copy tables with LONG column (LONG datatype is obsolete).

Tuesday, March 27, 2012

Analytical Function with examples

This article provides a clear, thorough concept of analytic functions and its various options by a series of simple yet concept building examples. The article is intended for SQL coders, who for might be not be using analytic functions due to unfamiliarity with its cryptic syntax or uncertainty about its logic of operation. Often I see that people tend to reinvent the feature provided by analytic functions by native join and sub-query SQL. This article assumes familiarity with basic Oracle SQL, sub-query, join and group function from the reader. Based on that familiarity, it builds the concept of analytic functions through a series of examples.
It is true that whatever an analytic function does can be done by native SQL, with join and sub-queries. But the same routine done by analytic function is always faster, or at least as fast, when compared to native SQL. Moreover, I am not considering here the amount of time that is spent in coding the native SQLs, testing, debugging and tuning them.
The general syntax of analytic function is:
Function(arg1,..., argn) OVER ( [PARTITION BY <...>] [ORDER BY <....>] [<window_clause>] )
<window_clause> is like "ROW <?>" or "RANK <?>"
All the keywords will be dealt in details as we walk through the examples. The script for creating the schema (SCOTT) on which the example queries of this article are run can be obtained in ORACLE_HOME/sqlplus/demo/demobld.sql of any standard Oracle installation.

How are analytic functions different from group or aggregate functions?



SELECT deptno,
COUNT(*) DEPT_COUNT
FROM emp
WHERE deptno IN (20, 30)
GROUP BY deptno;

DEPTNO                 DEPT_COUNT             
---------------------- ---------------------- 
20                     5                      
30                     6                      

2 rows selected
Query-1

Consider the Query-1 and its result. Query-1 returns departments and their employee count. Most importantly it groups the records into departments in accordance with the GROUP BY clause. As such any non-"group by" column is not allowed in the select clause.

SELECT empno, deptno, 
COUNT(*) OVER (PARTITION BY 
deptno) DEPT_COUNT
FROM emp
WHERE deptno IN (20, 30);

     EMPNO     DEPTNO DEPT_COUNT
---------- ---------- ----------
      7369         20          5
      7566         20          5
      7788         20          5
      7902         20          5
      7876         20          5
      7499         30          6
      7900         30          6
      7844         30          6
      7698         30          6
      7654         30          6
      7521         30          6

11 rows selected.
Query-2

Now consider the analytic function query (Query-2) and its result. Note the repeating values of DEPT_COUNT column.
This brings out the main difference between aggregate and analytic functions. Though analytic functions give aggregate result they do not group the result set. They return the group value multiple times with each record. As such any other non-"group by" column or expression can be present in the select clause, for example, the column EMPNO in Query-2.
Analytic functions are computed after all joins, WHERE clause, GROUP BY and HAVING are computed on the query. The main ORDER BY clause of the query operates after the analytic functions. So analytic functions can only appear in the select list and in the main ORDER BY clause of the query.
In absence of any PARTITION or <window_clause> inside the OVER( ) portion, the function acts on entire record set returned by the where clause. Note the results of Query-3 and compare it with the result of aggregate function query Query-4.

SELECT empno, deptno, 
COUNT(*) OVER ( ) CNT
FROM emp
WHERE deptno IN (10, 20)
ORDER BY 2, 1;

     EMPNO     DEPTNO        CNT
---------- ---------- ----------
      7782         10          8
      7839         10          8
      7934         10          8
      7369         20          8
      7566         20          8
      7788         20          8
      7876         20          8
      7902         20          8
Query-3


SELECT COUNT(*) FROM emp
WHERE deptno IN (10, 20);

  COUNT(*)
----------
         8
Query-4


How to break the result set in groups or partitions?


It might be obvious from the previous example that the clause PARTITION BY is used to break the result set into groups. PARTITION BY can take any non-analytic SQL expression.
Some functions support the <window_clause> inside the partition to further limit the records they act on. In the absence of any <window_clause> analytic functions are computed on all the records of the partition clause.
The functions SUM, COUNT, AVG, MIN, MAX are the common analytic functions the result of which does not depend on the order of the records.
Functions like LEAD, LAG, RANK, DENSE_RANK, ROW_NUMBER, FIRST, FIRST VALUE, LAST, LAST VALUE depends on order of records. In the next example we will see how to specify that.

How to specify the order of the records in the partition?


The answer is simple, by the "ORDER BY" clause inside the OVER( ) clause. This is different from the ORDER BY clause of the main query which comes after WHERE. In this section we go ahead and introduce each of the very useful functions LEAD, LAG, RANK, DENSE_RANK, ROW_NUMBER, FIRST, FIRST VALUE, LAST, LAST VALUE and show how each depend on the order of the record.
The general syntax of specifying the ORDER BY clause in analytic function is:
ORDER BY <sql_expr> [ASC or DESC] NULLS [FIRST or LAST]
The syntax is self-explanatory.

ROW_NUMBER, RANK and DENSE_RANK


All the above three functions assign integer values to the rows depending on their order. That is the reason of clubbing them together.
ROW_NUMBER( ) gives a running serial number to a partition of records. It is very useful in reporting, especially in places where different partitions have their own serial numbers. In Query-5, the function ROW_NUMBER( ) is used to give separate sets of running serial to employees of departments 10 and 20 based on their HIREDATE.

SELECT empno, deptno, hiredate,
ROW_NUMBER( ) OVER (PARTITION BY
deptno ORDER BY hiredate
NULLS LAST) SRLNO
FROM emp
WHERE deptno IN (10, 20)
ORDER BY deptno, SRLNO;

EMPNO  DEPTNO HIREDATE       SRLNO
------ ------- --------- ----------
  7782      10 09-JUN-81          1
  7839      10 17-NOV-81          2
  7934      10 23-JAN-82          3
  7369      20 17-DEC-80          1
  7566      20 02-APR-81          2
  7902      20 03-DEC-81          3
  7788      20 09-DEC-82          4
  7876      20 12-JAN-83          5

8 rows selected.
Query-5 (ROW_NUMBER example)

RANK and DENSE_RANK both provide rank to the records based on some column value or expression. In case of a tie of 2 records at position N, RANK declares 2 positions N and skips position N+1 and gives position N+2 to the next record. While DENSE_RANK declares 2 positions N but does not skip position N+1.
Query-6 shows the usage of both RANK and DENSE_RANK. For DEPTNO 20 there are two contenders for the first position (EMPNO 7788 and 7902). Both RANK and DENSE_RANK declares them as joint toppers. RANK skips the next value that is 2 and next employee EMPNO 7566 is given the position 3. For DENSE_RANK there are no such gaps.

SELECT empno, deptno, sal,
RANK() OVER (PARTITION BY deptno
ORDER BY sal DESC NULLS LAST) RANK,
DENSE_RANK() OVER (PARTITION BY
deptno ORDER BY sal DESC NULLS
LAST) DENSE_RANK
FROM emp
WHERE deptno IN (10, 20)
ORDER BY 2, RANK;

EMPNO  DEPTNO   SAL  RANK DENSE_RANK
------ ------- ----- ----- ----------
  7839      10  5000     1          1
  7782      10  2450     2          2
  7934      10  1300     3          3
  7788      20  3000     1          1
  7902      20  3000     1          1
  7566      20  2975     3          2
  7876      20  1100     4          3
  7369      20   800     5          4

8 rows selected.
Query-6 (RANK and DENSE_RANK example)


LEAD and LAG


LEAD has the ability to compute an expression on the next rows (rows which are going to come after the current row) and return the value to the current row. The general syntax of LEAD is shown below:
LEAD (<sql_expr>, <offset>, <default>) OVER (<analytic_clause>)
<sql_expr> is the expression to compute from the leading row.
<offset> is the index of the leading row relative to the current row.
<offset> is a positive integer with default 1.
<default> is the value to return if the <offset> points to a row outside the partition range.
The syntax of LAG is similar except that the offset for LAG goes into the previous rows.
Query-7 and its result show simple usage of LAG and LEAD function.

SELECT deptno, empno, sal,
LEAD(sal, 1, 0) OVER (PARTITION BY dept ORDER BY sal DESC NULLS LAST) NEXT_LOWER_SAL,
LAG(sal, 1, 0) OVER (PARTITION BY dept ORDER BY sal DESC NULLS LAST) PREV_HIGHER_SAL
FROM emp
WHERE deptno IN (10, 20)
ORDER BY deptno, sal DESC;

 DEPTNO  EMPNO   SAL NEXT_LOWER_SAL PREV_HIGHER_SAL
------- ------ ----- -------------- ---------------
     10   7839  5000           2450               0
     10   7782  2450           1300            5000
     10   7934  1300              0            2450
     20   7788  3000           3000               0
     20   7902  3000           2975            3000
     20   7566  2975           1100            3000
     20   7876  1100            800            2975
     20   7369   800              0            1100

8 rows selected.
Query-7 (LEAD and LAG)


FIRST VALUE and LAST VALUE function


The general syntax is:
FIRST_VALUE(<sql_expr>) OVER (<analytic_clause>)
The FIRST_VALUE analytic function picks the first record from the partition after doing the ORDER BY. The <sql_expr> is computed on the columns of this first record and results are returned. The LAST_VALUE function is used in similar context except that it acts on the last record of the partition.

-- How many days after the first hire of each department were the next
-- employees hired?

SELECT empno, deptno, hiredate ? FIRST_VALUE(hiredate)
OVER (PARTITION BY deptno ORDER BY hiredate) DAY_GAP
FROM emp
WHERE deptno IN (20, 30)
ORDER BY deptno, DAY_GAP;

     EMPNO     DEPTNO    DAY_GAP
---------- ---------- ----------
      7369         20          0
      7566         20        106
      7902         20        351
      7788         20        722
      7876         20        756
      7499         30          0
      7521         30          2
      7698         30         70
      7844         30        200
      7654         30        220
      7900         30        286

11 rows selected.
Query-8 (FIRST_VALUE)


FIRST and LAST function


The FIRST function (or more properly KEEP FIRST function) is used in a very special situation. Suppose we rank a group of record and found several records in the first rank. Now we want to apply an aggregate function on the records of the first rank. KEEP FIRST enables that.
The general syntax is:
Function( ) KEEP (DENSE_RANK FIRST ORDER BY <expr>) OVER (<partitioning_clause>)
Please note that FIRST and LAST are the only functions that deviate from the general syntax of analytic functions. They do not have the ORDER BY inside the OVER clause. Neither do they support any <window> clause. The ranking done in FIRST and LAST is always DENSE_RANK. The query below shows the usage of FIRST function. The LAST function is used in similar context to perform computations on last ranked records.

-- How each employee's salary compare with the average salary of the first
-- year hires of their department?

SELECT empno, deptno, TO_CHAR(hiredate,'YYYY') HIRE_YR, sal,
TRUNC(
AVG(sal) KEEP (DENSE_RANK FIRST
ORDER BY TO_CHAR(hiredate,'YYYY') )
OVER (PARTITION BY deptno)
     ) AVG_SAL_YR1_HIRE
FROM emp
WHERE deptno IN (20, 10)
ORDER BY deptno, empno, HIRE_YR;

     EMPNO     DEPTNO HIRE        SAL AVG_SAL_YR1_HIRE
---------- ---------- ---- ---------- ----------------
      7782         10 1981       2450             3725
      7839         10 1981       5000             3725
      7934         10 1982       1300             3725
      7369         20 1980        800              800
      7566         20 1981       2975              800
      7788         20 1982       3000              800
      7876         20 1983       1100              800
      7902         20 1981       3000              800

8 rows selected.
Query-9 (KEEP FIRST)


How to specify the Window clause (ROW type or RANGE type windows)?


Some analytic functions (AVG, COUNT, FIRST_VALUE, LAST_VALUE, MAX, MIN and SUM among the ones we discussed) can take a window clause to further sub-partition the result and apply the analytic function. An important feature of the windowing clause is that it is dynamic in nature.
The general syntax of the <window_clause> is

[ROW or RANGE] BETWEEN <start_expr> AND <end_expr>
<start_expr> can be any one of the following
  1. UNBOUNDED PECEDING
  2. CURRENT ROW
  3. <sql_expr> PRECEDING or FOLLOWING.

    <end_expr> can be any one of the following
  1. UNBOUNDED FOLLOWING or
  2. CURRENT ROW or
  3. <sql_expr> PRECEDING or FOLLOWING.
For ROW type windows the definition is in terms of row numbers before or after the current row. So for ROW type windows <sql_expr> must evaluate to a positive integer.
For RANGE type windows the definition is in terms of values before or after the current ORDER. We will take this up in details latter.
The ROW or RANGE window cannot appear together in one OVER clause. The window clause is defined in terms of the current row. But may or may not include the current row. The start point of the window and the end point of the window can finish before the current row or after the current row. Only start point cannot come after the end point of the window. In case any point of the window is undefined the default is UNBOUNDED PRECEDING for <start_expr> and UNBOUNDED FOLLOWING for <end_expr>.
If the end point is the current row, syntax only in terms of the start point can be can be
[ROW or RANGE] [<start_expr> PRECEDING or UNBOUNDED PRECEDING ]
[ROW or RANGE] CURRENT ROW is also allowed but this is redundant. In this case the function behaves as a single-row function and acts only on the current row.

ROW Type Windows


For analytic functions with ROW type windows, the general syntax is:
Function( ) OVER (PARTITIN BY <expr1> ORDER BY <expr2,..> ROWS BETWEEN <start_expr> AND <end_expr>)
or
Function( ) OVER (PARTITON BY <expr1> ORDER BY <expr2,..> ROWS [<start_expr> PRECEDING or UNBOUNDED PRECEDING]
For ROW type windows the windowing clause is in terms of record numbers.
The query Query-10 has no apparent real life description (except column FROM_PU_C) but the various windowing clause are illustrated by a COUNT(*) function. The count simply shows the number of rows inside the window definition. Note the build up of the count for each column for the YEAR 1981.
The column FROM_P3_TO_F1 shows an example where start point of the window is before the current row and end point of the window is after current row. This is a 5 row window; it shows values less than 5 during the beginning and end.

-- The query below has no apparent real life description (except 
-- column FROM_PU_C) but is remarkable in illustrating the various windowing
-- clause by a COUNT(*) function.
 
SELECT empno, deptno, TO_CHAR(hiredate, 'YYYY') YEAR,
COUNT(*) OVER (PARTITION BY TO_CHAR(hiredate, 'YYYY')
ORDER BY hiredate ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) FROM_P3_TO_F1,
COUNT(*) OVER (PARTITION BY TO_CHAR(hiredate, 'YYYY')
ORDER BY hiredate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM_PU_TO_C,
COUNT(*) OVER (PARTITION BY TO_CHAR(hiredate, 'YYYY')
ORDER BY hiredate ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) FROM_P2_TO_P1,
COUNT(*) OVER (PARTITION BY TO_CHAR(hiredate, 'YYYY')
ORDER BY hiredate ROWS BETWEEN 1 FOLLOWING AND 3 FOLLOWING) FROM_F1_TO_F3
FROM emp
ORDEDR BY hiredate

 EMPNO  DEPTNO YEAR FROM_P3_TO_F1 FROM_PU_TO_C FROM_P2_TO_P1 FROM_F1_TO_F3
------ ------- ---- ------------- ------------ ------------- -------------
  7369      20 1980             1            1             0             0
  <font bgcolor=yellow>7499      30 1981             2            1             0             3
  7521      30 1981             3            2             1             3
  7566      20 1981             4            3             2             3
  7698      30 1981             5            4             3             3
  7782      10 1981             5            5             3             3
  7844      30 1981             5            6             3             3
  7654      30 1981             5            7             3             3
  7839      10 1981             5            8             3             2
  7900      30 1981             5            9             3             1
  7902      20 1981             4           10             3             0</font>
  7934      10 1982             2            1             0             1
  7788      20 1982             2            2             1             0
  7876      20 1983             1            1             0             0

14 rows selected.
Query-10 (ROW type windowing example)

The column FROM_PU_TO_CURR shows an example where start point of the window is before the current row and end point of the window is the current row. This column only has some real world significance. It can be thought of as the yearly employee build-up of the organization as each employee is getting hired.
The column FROM_P2_TO_P1 shows an example where start point of the window is before the current row and end point of the window is before the current row. This is a 3 row window and the count remains constant after it has got 3 previous rows.
The column FROM_F1_TO_F3 shows an example where start point of the window is after the current row and end point of the window is after the current row. This is a reverse of the previous column. Note how the count declines during the end.

RANGE Windows


For RANGE windows the general syntax is same as that of ROW:
Function( ) OVER (PARTITION BY <expr1> ORDER BY <expr2> RANGE BETWEEN <start_expr> AND <end_expr>)
or
Function( ) OVER (PARTITION BY <expr1> ORDER BY <expr2> RANGE [<start_expr> PRECEDING or UNBOUNDED PRECEDING]

For <start_expr> or <end_expr> we can use UNBOUNDED PECEDING, CURRENT ROW or <sql_expr> PRECEDING or FOLLOWING. However for RANGE type windows <sql_expr> must evaluate to value compatible with ORDER BY expression <expr1>.
<sql_expr> is a logical offset. It must be a constant or expression that evaluates to a positive numeric value or an interval literal. Only one ORDER BY expression is allowed.
If <sql_expr> evaluates to a numeric value, then the ORDER BY expr must be a NUMBER or DATE datatype. If <sql_expr> evaluates to an interval value, then the ORDER BY expr must be a DATE datatype.
Note the example (Query-11) below which uses RANGE windowing. The important thing here is that the size of the window in terms of the number of records can vary.

-- For each employee give the count of employees getting half more that their 
-- salary and also the count of employees in the departments 20 and 30 getting half 
-- less than their salary.
 
SELECT deptno, empno, sal,
Count(*) OVER (PARTITION BY deptno ORDER BY sal RANGE
BETWEEN UNBOUNDED PRECEDING AND (sal/2) PRECEDING) CNT_LT_HALF,
COUNT(*) OVER (PARTITION BY deptno ORDER BY sal RANGE
BETWEEN (sal/2) FOLLOWING AND UNBOUNDED FOLLOWING) CNT_MT_HALF
FROM emp
WHERE deptno IN (20, 30)
ORDER BY deptno, sal

 DEPTNO  EMPNO   SAL CNT_LT_HALF CNT_MT_HALF
------- ------ ----- ----------- -----------
     20   7369   800           0           3
     20   7876  1100           0           3
     20   7566  2975           2           0
     20   7788  3000           2           0
     20   7902  3000           2           0
     30   7900   950           0           3
     30   7521  1250           0           1
     30   7654  1250           0           1
     30   7844  1500           0           1
     30   7499  1600           0           1
     30   7698  2850           3           0

11 rows selected.
Query-11 (RANGE type windowing example)


Order of computation and performance tips


Defining the PARTITOIN BY and ORDER BY clauses on indexed columns (ordered in accordance with the PARTITION CLAUSE and then the ORDER BY clause in analytic function) will provide optimum performance. For Query-5, for example, a composite index on (deptno, hiredate) columns will prove effective.
It is advisable to always use CBO for queries using analytic functions. The tables and indexes should be analyzed and optimizer mode should be CHOOSE.
Even in absence of indexes analytic functions provide acceptable performance but need to do sorting for computing partition and order by clause. If the query contains multiple analytic functions, sorting and partitioning on two different columns should be avoided if they are both not indexed.

Conclusion


The aim of this article is not to make the reader try analytic functions forcibly in every other complex SQL. It is meant for a SQL coder, who has been avoiding analytic functions till now, even in complex analytic queries and reinventing the same feature much painstakingly by native SQL and join query. Its job is done if such a person finds analytic functions clear, understandable and usable after going through the article, and starts using them.

Thursday, March 15, 2012

Oracle WMSYS Functions

conn wmsys/wmsys


WM_CONCAT 
Cross-tabulates a comma delimited list 


WM_CONCAT(p1 IN VARCHAR2) RETURN VARCHAR2 


CREATE TABLE t (
col1 VARCHAR2(5),
col2 VARCHAR2(20));

INSERT INTO t VALUES (111, 'This');
INSERT INTO t VALUES (111, 'is');
INSERT INTO t VALUES (111, 'a');
INSERT INTO t VALUES (111, 'test');
INSERT INTO t VALUES (222, 'This is not');

SELECT * FROM t;

col concat format a40

SELECT col1, wmsys.wm_concat(col2) CONCAT
FROM t
GROUP BY col1;

SELECT col1, TRANSLATE(wmsys.wm_concat(col2), 'A,', 'A ') CONCAT
FROM t
GROUP BY col1; 







WM_CONTAINS 


Checks if the first period contains the second period 


WM_CONTAINS(p1 IN WM_PERIOD, p2 IN WM_PERIOD) RETURN INTEGER; 


SELECT number_of_binds
FROM user_operators
WHERE owner = 'WMSYS'
AND operator_name = 'WM_CONTAINS';

set long 1000000

SELECT dbms_metadata.get_ddl('OPERATOR', 'WM_CONTAINS') FROM dual;

set serveroutput on

DECLARE
 retval INTEGER;
 p1b    DATE := TO_DATE('01-JAN-2010');
 p1e    DATE := TO_DATE('05-JAN-2010');
 p2b    DATE := TO_DATE('01-JAN-2010');
 p2e    DATE := TO_DATE('03-JAN-2010');
BEGIN
  SELECT WM_CONTAINS(WM_PERIOD(p1b, p1e), WM_PERIOD(p2b, p2e))
  INTO retval
  FROM dual;

  dbms_output.put_line(retval);

  p1b := TO_DATE('02-JAN-2010');
  p1e := TO_DATE('06-JAN-2010');

  SELECT WM_CONTAINS(WM_PERIOD(p1b, p1e), WM_PERIOD(p2b, p2e))
  INTO retval
  FROM dual;

  dbms_output.put_line(retval);  
END;
/ 





WM_EQUALS 


Checks if two periods are equal (that is, their start and end times are the same) 


WM_EQUALS(p1 IN WM_PERIOD, p2 IN WM_PERIOD) RETURN INTEGER; 


SELECT number_of_binds
FROM user_operators
WHERE owner = 'WMSYS'
AND operator_name = 'WM_EQUALS';

set long 1000000

SELECT dbms_metadata.get_ddl('OPERATOR', 'WM_EQUALS') FROM dual;

set serveroutput on

DECLARE
 retval INTEGER;
 p1b    DATE := TO_DATE('01-JAN-2010');
 p1e    DATE := TO_DATE('03-JAN-2010');
 p2b    DATE := TO_DATE('01-JAN-2010');
 p2e    DATE := TO_DATE('03-JAN-2010');
BEGIN
  SELECT WM_EQUALS(WM_PERIOD(p1b, p1e), WM_PERIOD(p2b, p2e))
  INTO retval
  FROM dual;

  dbms_output.put_line(retval);

  p1b := TO_DATE('01-JAN-2010');
  p1e := TO_DATE('04-JAN-2010');

  SELECT WM_EQUALS(WM_PERIOD(p1b, p1e), WM_PERIOD(p2b, p2e))
  INTO retval
  FROM dual;

  dbms_output.put_line(retval);  
END; 

/


WM_GREATERTHAN 


Checks if the start of the first period is greater than (that is, later than) the end of the second period 


WM_GREATERTHAN(p1 IN WM_PERIOD, p2 IN WM_PERIOD) RETURN INTEGER; 


SELECT number_of_binds
FROM user_operators
WHERE owner = 'WMSYS'
AND operator_name = 'WM_GREATERTHAN';

set long 1000000

SELECT dbms_metadata.get_ddl('OPERATOR', 'WM_GREATERTHAN') FROM dual;

set serveroutput on

DECLARE
 retval INTEGER;
 p1b    DATE := TO_DATE('01-FEB-2010');
 p1e    DATE := TO_DATE('03-FEB-2010');
 p2b    DATE := TO_DATE('01-JAN-2010');
 p2e    DATE := TO_DATE('31-JAN-2010');
BEGIN
  SELECT WM_GREATERTHAN(WM_PERIOD(p1b, p1e), WM_PERIOD(p2b, p2e))
  INTO retval
  FROM dual;

  dbms_output.put_line(retval);

  p2b := TO_DATE('02-FEB-2010');
  p2e := TO_DATE('31-DEC-2010');

  SELECT WM_GREATERTHAN(WM_PERIOD(p1b, p1e), WM_PERIOD(p2b, p2e))
  INTO retval
  FROM dual;

  dbms_output.put_line(retval);  
END;
/ 



WM_INTERSECTION 


Returns the intersection of the two periods, that is, the time range common to both periods 


WM_INTERSECTION(p1 IN WM_PERIOD, p2 IN WM_PERIOD) RETURN WM_PERIOD; 


SELECT number_of_binds
FROM user_operators
WHERE owner = 'WMSYS'
AND operator_name = 'WM_INTERSECTION';

set long 1000000

SELECT dbms_metadata.get_ddl('OPERATOR', 'WM_INTERSECTION') FROM dual;

set serveroutput on

DECLARE
 retval WM_PERIOD;
 p1b    DATE := TO_DATE('01-JAN-2010');
 p1e    DATE := TO_DATE('10-JAN-2010');
 p2b    DATE := TO_DATE('08-JAN-2010');
 p2e    DATE := TO_DATE('12-JAN-2010');
BEGIN
  SELECT WM_INTERSECTION(WM_PERIOD(p1b, p1e), WM_PERIOD(p2b, p2e))
  INTO retval
  FROM dual;

  dbms_output.put_line(retval.validfrom);
  dbms_output.put_line(retval.validtill);
END;
/ 



WM_LDIFF 


Returns the difference between the two periods on the left (that is, earlier in time) 


WM_LDIFF(p1 IN WM_PERIOD, p2 IN WM_PERIOD) RETURN WM_PERIOD; 


SELECT number_of_binds
FROM user_operators
WHERE owner = 'WMSYS'
AND operator_name = 'WM_LDIFF';

set long 1000000

SELECT dbms_metadata.get_ddl('OPERATOR', 'WM_LDIFF') FROM dual;

set serveroutput on

DECLARE
 retval WM_PERIOD;
 p1b    DATE := TO_DATE('01-NOV-2010');
 p1e    DATE := TO_DATE('31-DEC-2010');
 p2b    DATE := TO_DATE('29-NOV-2010');
 p2e    DATE := TO_DATE('01-DEC-2010');
BEGIN
  SELECT WM_LDIFF(WM_PERIOD(p1b, p1e), WM_PERIOD(p2b, p2e))
  INTO retval
  FROM dual;

  dbms_output.put_line(retval.validfrom);
  dbms_output.put_line(retval.validtill);
END;
/ 



WM_RDIFF 


Returns the difference between the two periods on the right (that is, later in time 


WM_RDIFF(p1 IN WM_PERIOD, p2 IN WM_PERIOD) RETURN WM_PERIOD; 


SELECT number_of_binds
FROM user_operators
WHERE owner = 'WMSYS'
AND operator_name = 'WM_RDIFF';

set long 1000000

SELECT dbms_metadata.get_ddl('OPERATOR', 'WM_RDIFF') FROM dual;

set serveroutput on

DECLARE
 retval WM_PERIOD;
 p1b    DATE := TO_DATE('01-JAN-2010');
 p1e    DATE := TO_DATE('31-DEC-2010');
 p2b    DATE := TO_DATE('12-JAN-2009');
 p2e    DATE := TO_DATE('16-JAN-2010');
BEGIN
  SELECT WM_RDIFF(WM_PERIOD(p1b, p1e), WM_PERIOD(p2b, p2e))
  INTO retval
  FROM dual;

  dbms_output.put_line(retval.validfrom);
  dbms_output.put_line(retval.validtill);
END;
/ 



WM_LESSTHAN 


Checks if the end of the first period is less than (that is, earlier than) the start of the second period 


WM_LESSTHAN(p1 IN WM_PERIOD, p2 IN WM_PERIOD) RETURN INTEGER; 


SELECT number_of_binds
FROM user_operators
WHERE owner = 'WMSYS'
AND operator_name = 'WM_LESSTHAN';

set long 1000000

SELECT dbms_metadata.get_ddl('OPERATOR', 'WM_LESSTHAN') FROM dual;

set serveroutput on

DECLARE
 retval INTEGER;
 p1b    DATE := TO_DATE('01-JAN-2010');
 p1e    DATE := TO_DATE('02-JAN-2010');
 p2b    DATE := TO_DATE('03-JAN-2010');
 p2e    DATE := TO_DATE('16-JAN-2010');
BEGIN
  SELECT WM_LESSTHAN(WM_PERIOD(p1b, p1e), WM_PERIOD(p2b, p2e))
  INTO retval
  FROM dual;

  dbms_output.put_line(retval);

  p1b := TO_DATE('04-JAN-2010');
  p1e := TO_DATE('10-JAN-2010');

  SELECT WM_LESSTHAN(WM_PERIOD(p1b, p1e), WM_PERIOD(p2b, p2e))
  INTO retval
  FROM dual;

  dbms_output.put_line(retval);  
END;
/ 





WM_MEETS 


Checks if the end of the first period is the start of the second period 


WM_MEETS(p1 IN WM_PERIOD, p2 IN WM_PERIOD) RETURN INTEGER; 


SELECT number_of_binds
FROM user_operators
WHERE owner = 'WMSYS'
AND operator_name = 'WM_MEETS';

set long 1000000

SELECT dbms_metadata.get_ddl('OPERATOR', 'WM_OVERLAPS') FROM dual;

set serveroutput on

DECLARE
 retval INTEGER;
 p1b    DATE := TO_DATE('01-JAN-2010');
 p1e    DATE := TO_DATE('03-JAN-2010');
 p2b    DATE := TO_DATE('03-JAN-2010');
 p2e    DATE := TO_DATE('06-JAN-2010');
BEGIN
  SELECT WM_MEETS(WM_PERIOD(p1b, p1e), WM_PERIOD(p2b, p2e))
  INTO retval
  FROM dual;

  dbms_output.put_line(retval);

  p1b := TO_DATE('01-JAN-2010');
  p1e := TO_DATE('04-JAN-2010');

  SELECT WM_MEETS(WM_PERIOD(p1b, p1e), WM_PERIOD(p2b, p2e))
  INTO retval
  FROM dual;

  dbms_output.put_line(retval);  
END;
/ 



WM_OVERLAPS 


Checks if two periods overlap 


WM_OVERLAPS(p1 IN WM_PERIOD, p2 IN WM_PERIOD) RETURN NUMBER; 


SELECT number_of_binds
FROM user_operators
WHERE owner = 'WMSYS'
AND operator_name = 'WM_OVERLAPS';

set long 1000000

SELECT dbms_metadata.get_ddl('OPERATOR', 'WM_OVERLAPS') FROM dual;

set serveroutput on

DECLARE
 retval INTEGER;
 p1b    DATE := TO_DATE('01-JAN-2010');
 p1e    DATE := TO_DATE('31-JAN-2010');
 p2b    DATE := TO_DATE('31-DEC-2009');
 p2e    DATE := TO_DATE('02-JAN-2010');
BEGIN
  SELECT WM_OVERLAPS(WM_PERIOD(p1b, p1e), WM_PERIOD(p2b, p2e))
  INTO retval
  FROM dual;

  dbms_output.put_line(retval);

  p1b := TO_DATE('03-JAN-2010');
  p1e := TO_DATE('31-JAN-2010');

  SELECT WM_OVERLAPS(WM_PERIOD(p1b, p1e), WM_PERIOD(p2b, p2e))
  INTO retval
  FROM dual;

  dbms_output.put_line(retval);  
END;
/ 






Oracle Escape Characters

Oracle allows the assignment of special escape characters to tell Oracle that the character is interpreted literally.  Certain characters ...