Tuesday, September 20, 2011

Rows to Columns

Rows to Columns


This section presents techniques for transforming rows into columns. Three different types of transformations are covered - aggregating, transposing, and distributing.

Aggregating

Aggregating refers to transformations that take values like these
GROUP_KEY  PIVOT_KEY         VAL
---------- ---------- ----------
G0         P0                 10
G0         P1                  5
G0         P1                  3
G0         P2                 20
G0         P2                  6
G1         P1                  9
G1         P1         (null)
G1         P2         (null)
G1         P2                  5
G2         (null)              4
G2         (null)              7
G2         (null)             10
(null)     P0                 15
(null)     P2         (null)
(null)     (null)     (null)
 
and aggregate them into a crosstab matrix like this.
GROUP_KEY      P0_SUM     P1_SUM     P2_SUM UNASSIGNED_SUM
---------- ---------- ---------- ---------- --------------
G0                 10          8         26
G1                             9          5
G2                                                      21
(null)             15
 
This transformation is commonly referred to as "pivoting" or creating a "crosstab" report.

Transposing

Transposing refers to transformations that take values like these
GROUP_KEY  PIVOT_KEY  VAL
---------- ---------- ----------
G0         P0         a
G0         P1         A
G0         P2         1
G0         (null)     xyz
G1         P0         b
G1         P1         B
G1         P2         (null)
(null)     P0         c
(null)     P2         3
(null)     (null)     (null)
 
and rearrange them into a matrix like this.
GROUP_KEY  P0         P1         P2         UNASSIGNED
---------- ---------- ---------- ---------- ----------
G0         a          A          1          xyz
G1         b          B
(null)     c                     3
 
This transformation is also commonly referred to as "pivoting" or creating a "crosstab" report.

Distributing

Distributing refers to transformations that take values like these
GROUP_KEY  VAL
---------- ----------
G8         a
G8         b
G8         c
G8         d
G8         e
G8         f
G8         g
G8         h
 
and rearrange them over an arbitrary number of columns like as in result, which spreads values across three columns ordered from left to right
GROUP_KEY  COL_0      COL_1      COL_2
---------- ---------- ---------- ----------
G8         a          b          c
G8         d          e          f
G8         g          h
 
or this result, which spreads values across two columns ordered from top to bottom (aka "newspaper column style").
GROUP_KEY  COL_0      COL_1
---------- ---------- ----------
G8         a          e
G8         b          f
G8         c          g
G8         d          h                
 

Aggregating Values (11g)
This topic demonstrates how to take values from multiple rows under a single column, like those in VAL below
GROUP_KEY  PIVOT_KEY         VAL
---------- ---------- ----------
G0         P0                 10
G0         P1                  5
G0         P1                  3
G0         P2                 20
G0         P2                  6

G1         P1                  9
G1         P1         (null)
G1         P2         (null)
G1         P2                  5

G2         (null)              4
G2         (null)              7
G2         (null)             10

(null)     P0                 15
(null)     P2         (null)
(null)     (null)     (null)

and display aggregate results in a matrix, like this one which displays VAL sums in each cell.
GROUP_KEY      P0_SUM     P1_SUM     P2_SUM UNASSIGNED_SUM
---------- ---------- ---------- ---------- --------------
G0                 10          8         26
G1                             9          5
G2                                                      21
(null)             15

Non-null Pivot Keys
For the non-null pivot keys (P0, P1, and P2) the result is easily obtained using the SELECT command's PIVOT clause, which was first introduced in Oracle 11g for exactly these types of queries.
select
  group_key ,
  p0_sum ,
  p1_sum ,
  p2_sum
from
  t2 pivot
    ( sum(val)
      for pivot_key in
        (
          'P0' as p0_sum ,
          'P1' as p1_sum ,
          'P2' as p2_sum
        )
    )
order by
  group_key
;

GROUP_KEY      P0_SUM     P1_SUM     P2_SUM
---------- ---------- ---------- ----------
G0                 10          8         26
G1                             9          5
G2
(null)             15

Note how the query does not use a GROUP BY clause even though it includes the aggregate function SUM(). This is because with pivot queries Oracle implicitly groups by any column not in the PIVOT clause, e.g. GROUP_KEY, plus each value specified in the IN clause, e.g. 'P0', 'P1', 'P2'.

NULL Pivot Keys
To generate the UNASSIGNED_SUM column for values with null pivot keys our first inclination would be to use syntax such as this
select
  group_key ,
  unassigned_sum
from
  t2 pivot
    ( sum(val)
      for pivot_key in
        ( NULL AS UNASSIGNED_SUM )
    )
order by
  group_key
;

GROUP_KEY  UNASSIGNED_SUM
---------- --------------
G0
G1
G2
(null)

but, as typically happens with nulls, the solution is not so easy. As we can see the null VAL values in our table are not considered a match to the null expression in the PIVOT IN clause.
Our next attempt proves equally fruitless.
select
  group_key ,
  unassigned_sum
from
  t2 pivot
    ( sum(val)
      for NVL(PIVOT_KEY,'%') in
        ( '%' as unassigned_sum )
    )
order by
  group_key
;
      for NVL(PIVOT_KEY,'%') in
             *
ERROR at line 7:
ORA-01738: missing IN keyword



The error is caused by the fact that the FOR clause only accepts columns, not expressions.
To generate meaningful results we need to resort to a little inline view ugliness.
select
  group_key ,
  p0_sum ,
  p1_sum ,
  p2_sum ,
  unassigned_sum
from
  ( SELECT GROUP_KEY, NVL(PIVOT_KEY,'%') AS PIVOT_KEY, VAL FROM T2 ) pivot
    ( sum(val)
      for pivot_key in
        (
          'P0' as p0_sum ,
          'P1' as p1_sum ,
          'P2' as p2_sum ,
          '%'  as unassigned_sum
        )
    )
order by
  group_key
;

GROUP_KEY      P0_SUM     P1_SUM     P2_SUM UNASSIGNED_SUM
---------- ---------- ---------- ---------- --------------
G0                 10          8         26
G1                             9          5
G2                                                      21
(null)             15




Multiple Aggregates
If needed we can compute multiple aggregates, e.g. SUM and MAX, easily using PIVOT.
select
  group_key ,
  p1_sum ,
  p1_max ,
  p2_sum ,
  p2_max
from
  t2 pivot
    ( SUM(VAL) AS SUM ,
      MAX(VAL) AS MAX
      for pivot_key in
        (
          'P1' as p1 ,
          'P2' as p2
        )
    )
order by
  group_key
;

GROUP_KEY      P1_SUM     P1_MAX     P2_SUM     P2_MAX
---------- ---------- ---------- ---------- ----------
G0                  8          5         26         20
G1                  9          9          5          5
G2
(null)

The query above also demonstrates that when both a pivot key alias (e.g. "as p1") and an aggregate alias (e.g. "AS SUM") are specified the resulting column names take the following form.
(pivot key alias)_(aggregate function alias).
                                                                                          

Aggregating Values

This topic demonstrates how to take values from multiple rows under a single column, like those in VAL below
GROUP_KEY  PIVOT_KEY         VAL
---------- ---------- ----------
G0         P0                 10
G0         P1                  5
G0         P1                  3
G0         P2                 20
G0         P2                  6
G1         P1                  9
G1         P1         (null)
G1         P2         (null)
G1         P2                  5
G2         (null)              4
G2         (null)              7
G2         (null)             10
(null)     P0                 15
(null)     P2         (null)
(null)     (null)     (null)
 
and display aggregate results in a matrix, like this one which displays VAL sums in each cell.
GROUP_KEY      P0_SUM     P1_SUM     P2_SUM UNASSIGNED_SUM
---------- ---------- ---------- ---------- --------------
G0                 10          8         26
G1                             9          5
G2                                                      21
(null)             15
 
In Oracle 10g and earlier versions the PIVOT command used in the previous topic is not available. Fortunately we can easily achieve the desired results in any version of Oracle using the DECODE function instead.
column group_key null '(null)'
select
  group_key ,
  sum( decode( pivot_key, 'P0', val, null ) ) as p0_sum ,
  sum( decode( pivot_key, 'P1', val, null ) ) as p1_sum ,
  sum( decode( pivot_key, 'P2', val, null ) ) as p2_sum ,
  sum( decode( pivot_key, null, val, null ) ) as unassigned_sum
from t2
group by group_key
order by group_key
;
 
GROUP_KEY      P0_SUM     P1_SUM     P2_SUM UNASSIGNED_SUM
---------- ---------- ---------- ---------- --------------
G0                 10          8         26
G1                             9          5
G2                                                      21
(null)             15
 
Aggregate functions other than SUM can be used to provide a different analysis. The following examples display the average and maximum VAL's for each week/day combination.
select
  group_key ,
  AVG( decode( pivot_key, 'P0', val, null ) ) as AVG_P0 ,
  AVG( decode( pivot_key, 'P1', val, null ) ) as AVG_P1 ,
  AVG( decode( pivot_key, 'P2', val, null ) ) as AVG_P2 ,
  AVG( decode( pivot_key, null, val, null ) ) as AVG_UNASSIGNED
from t2
group by group_key
order by group_key
;
 
GROUP_KEY      AVG_P0     AVG_P1     AVG_P2 AVG_UNASSIGNED
---------- ---------- ---------- ---------- --------------
G0                 10          4         13
G1                             9          5
G2                                                       7
(null)             15
 
select
  group_key ,
  MAX( decode( pivot_key, 'P0', val, null ) ) as MAX_P0 ,
  MAX( decode( pivot_key, 'P1', val, null ) ) as MAX_P1 ,
  MAX( decode( pivot_key, 'P2', val, null ) ) as MAX_P2 ,
  MAX( decode( pivot_key, null, val, null ) ) as MAX_UNASSIGNED
from t2
group by group_key
order by group_key
;
 
GROUP_KEY      MAX_P0     MAX_P1     MAX_P2 MAX_UNASSIGNED
---------- ---------- ---------- ---------- --------------
G0                 10          5         20
G1                             9          5
G2                                                      10
(null)             15
 

How it Works

This solution forms the basis for the remaining topics in this section. It is therefore important to get a solid grasp on how it works before continuing to the next topic. To do this we will consider the following queries which illustrate conceptual steps in building a MAX value solution.
First let's look at a simple query of base table columns.
break on group_key skip 1 duplicates
select group_key, pivot_key, val
from   t2
order by group_key, pivot_key ;
 
GROUP_KEY  PIVOT_KEY         VAL
---------- ---------- ----------
G0         P0                 10
G0         P1                  5
G0         P1                  3
G0         P2                 20
G0         P2                  6
G1         P1                  9
G1         P1
G1         P2
G1         P2                  5
G2                             4
G2                             7
G2                            10
(null)     P0                 15
(null)     P2
(null)
 
Next we distribute VAL values across separate columns using the DECODE function.
select
  group_key ,
  pivot_key ,
  DECODE( pivot_key, 'P0', val, null ) as VAL_P0 ,
  DECODE( pivot_key, 'P1', val, null ) as VAL_P1 ,
  DECODE( pivot_key, 'P2', val, null ) as VAL_P2 ,
  DECODE( pivot_key, null, val, null ) as VAL_UNASSIGNED
from t2
order by group_key, pivot_key ;
 
GROUP_KEY  PIVOT_KEY      VAL_P0     VAL_P1     VAL_P2 VAL_UNASSIGNED
---------- ---------- ---------- ---------- ---------- --------------
G0         P0                 10
G0         P1                             5
G0         P1                             3
G0         P2                                       20
G0         P2                                        6
G1         P1                             9
G1         P1
G1         P2
G1         P2                                        5
G2                                                                  4
G2                                                                  7
G2                                                                 10
(null)     P0                 15
(null)     P2
(null)
 
Finally we can collapse the values under the VAL_P0, VAL_P1, VAL_P2, and VAL_UNASSIGNED columns using a GROUP BY clause plus an aggregate function like MAX.
clear breaks
select
  group_key ,
  MAX( decode( pivot_key, 'P0', val, null ) ) as MAX_P0 ,
  MAX( decode( pivot_key, 'P1', val, null ) ) as MAX_P1 ,
  MAX( decode( pivot_key, 'P2', val, null ) ) as MAX_P2 ,
  MAX( decode( pivot_key, null, val, null ) ) as MAX_UNASSIGNED
from t2
GROUP BY GROUP_KEY
order by group_key ;
 
GROUP_KEY      MAX_P0     MAX_P1     MAX_P2 MAX_UNASSIGNED
---------- ---------- ---------- ---------- --------------
G0                 10          5         20
G1                             9          5
G2                                                      10
(null)             15
 



No comments:

Post a Comment

Oracle Escape Characters

Oracle allows the assignment of special escape characters to tell Oracle that the character is interpreted literally.  Certain characters ...