Tuesday, September 20, 2011

Rows to Strings in Different Methods

In this section show how to collect values from multiple rows into a single, delimited string. This process is commonly known as "string aggregation". In practice it takes a table that looks like this
GROUP_KEY  VAL
---------- ----------
Group 1    a
Group 2    a
Group 2    b
Group 3    a
Group 3    b
Group 3    c
Group 4    a
Group 4    a
Group 4    b                   
Group 4    b
Group 5    a
Group 5    b
Group 5    d
Group 5    e
Group 5    (null)
Group 6    (null)
Group 7    (null)
Group 7    (null)
Group 8    a
Group 8    a
Group 8    b
Group 8    b
Group 8    (null)
Group 8    (null)
(null)     (null)
 
and yields a set of strings that look like this.
GROUP_KEY  STRING
---------- ---------------
Group 1    a
Group 2    a,b
Group 3    a,b,c
Group 4    a,a,b,b
Group 5    a,b,d,e,
Group 6    (null)
Group 7    ,
Group 8    a,a,b,b,
(null)     (null)
 
Over the years many techniques for performing string aggregation have appeared. They all have their strengths and weaknesses so a few different methods will be examined here. Of the methods I've found on the web to-date we will explore four popular approaches.

Hierarchical Method
This  shows how to collect values from multiple rows into a single, comma delimited string. This is also known as "string aggregation". It takes values like these
GROUP_KEY  VAL
---------- ----------
Group 3    a
Group 3    b
Group 3    c

and yields a string like this.
GROUP_KEY  STRING
---------- ---------------
Group 3    a,b,c

Oracle's hierarchical features will be used to achieve this result. Other tutorials in this section present alternate approaches. For tutorials on understanding Oracle's hierarchical features see SQL Features Tutorials: Hierarchical Data.
First, let's take a look at the test data.
desc t
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 GROUP_KEY                              VARCHAR2(10)
 VAL                                    VARCHAR2(10)

set null "(null)"

break on group_key skip 1 duplicates

select * from t order by group_key, val ;

GROUP_KEY  VAL
---------- ----------
Group 1    a

Group 2    a
Group 2    b

Group 3    a
Group 3    b
Group 3    c

Group 4    a
Group 4    a
Group 4    b
Group 4    b

Group 5    a
Group 5    b
Group 5    d
Group 5    e
Group 5    (null)

Group 6    (null)

Group 7    (null)
Group 7    (null)

Group 8    a
Group 8    a
Group 8    b
Group 8    b
Group 8    (null)
Group 8    (null)

(null)     (null)

Now here is the solution.
clear breaks

select
  tb.group_key ,
  substr
  ( sys_connect_by_path( tb.val, ',' ) ,
    2
  ) as string
from
  ( select
      group_key ,
      val ,
      row_number() over
        ( partition by group_key
          order by val
        )
        as val_index
    from
      t
  ) tb
where
  connect_by_isleaf = 1
connect by
  val_index = prior val_index + 1 and
  group_key = prior group_key
start with
  val_index = 1
;

GROUP_KEY  STRING
---------- ---------------
Group 1    a
Group 2    a,b
Group 3    a,b,c
Group 4    a,a,b,b
Group 5    a,b,d,e,
Group 6    (null)
Group 7    ,
Group 8    a,a,b,b,,
(null)     (null)

To control the order of the delimited values, change the "ORDER BY" clause in the ROW_NUMBER() function.
select
  tb.group_key ,
  substr
  ( sys_connect_by_path( tb.val, ',' ) ,
    2
  ) as string
from
  ( select
      group_key ,
      val ,
      row_number() over
        ( partition by group_key
          ORDER BY VAL DESC       -- sort in descending order this time
        )
        as val_index
    from
      t
  ) tb
where
  connect_by_isleaf = 1
connect by
  val_index = prior val_index + 1 and
  group_key = prior group_key
start with
  val_index = 1
;

GROUP_KEY  STRING
---------- ---------------
Group 1    a
Group 2    b,a
Group 3    c,b,a
Group 4    b,b,a,a
Group 5    ,e,d,b,a
Group 6    (null)
Group 7    ,
Group 8    ,,b,b,a,a
(null)     (null)

To filter out duplicate VAL values, like those in Group 4, use a subquery as indicated below.
select
  tb.group_key ,
  substr
  ( sys_connect_by_path( tb.val, ',' ) ,
    2
  ) as string
from
  ( select
      group_key ,
      val ,
      row_number() over
        ( partition by group_key
          order by val
        )
        as val_index
    from
      ( SELECT DISTINCT GROUP_KEY, VAL FROM T ) tc  -- provides distinct values
  ) tb
where
  connect_by_isleaf = 1
connect by
  val_index = prior val_index + 1 and
  group_key = prior group_key
start with
  val_index = 1
;

GROUP_KEY  STRING
---------- ---------------
Group 1    a
Group 2    a,b
Group 3    a,b,c
Group 4    a,b
Group 5    a,b,d,e,
Group 6    (null)
Group 7    (null)
Group 8    a,b,
(null)     (null)

To filter null values out of your results, like the ones in Groups 5 and 6, include a WHERE clause in the FROM clause subquery.
select
  tb.group_key ,
  substr
  ( sys_connect_by_path( tb.val, ',' ) ,
    2
  ) as string
from
  ( select
      group_key ,
      val ,
      row_number() over
        ( partition by group_key
          order by val
        )
        as val_index
    from
      t
    WHERE
      VAL IS NOT NULL  -- filters out null VAL values
  ) tb
where
  connect_by_isleaf = 1
connect by
  val_index = prior val_index + 1 and
  group_key = prior group_key
start with
  val_index = 1
;

GROUP_KEY  STRING
---------- ---------------
Group 1    a
Group 2    a,b
Group 3    a,b,c
Group 4    a,a,b,b
Group 5    a,b,d,e
Group 8    a,a,b,b

Object Method

This tutorial shows how to collect values from multiple rows into a single, comma delimited string. This is also known as "string aggregation". It takes values like these
GROUP_KEY  VAL
---------- ----------
Group 3    a
Group 3    b
Group 3    c
 
and yields a string like this.
GROUP_KEY  STRING
---------- ---------------
Group 3    a,b,c
 
Oracle's object relational features will be used to achieve this result. Other tutorials in this section present alternate approaches.

Prerequisites

This method uses the object relational SQL COLLECT function which was introduced in Oracle 10g. It therefore will not work in earlier versions. It also requires a VARCHAR2 table type and a custom function to work. The VARCHAR2 table type is created in the Setup topic for this section.
desc varchar2_table_type
 varchar2_table_type TABLE OF VARCHAR2(10)
 
The function is created as follows.
create function nested_table_to_csv
( p_table     in varchar2_table_type ,
  p_delimiter in varchar2 default ','
)
return varchar2
is
  ----------------------------------------------------------------------
  -- Usage:
  --
  -- p_table
  --   - contains the table you wish to convert to a string
  --
  -- p_delimiter
  --   - contains the varchar2 string you want to see between each
  --     value in the returned string
  --   - default value is ',' (comma)
  --   - set it to null if you don't want any delimiters
  --   - e.g. p_delimiter => '/' returns
  --
  --       a/b/c/d
  ----------------------------------------------------------------------
  i        binary_integer  := p_table.first ;
  v_return varchar2(32767) := null ;
begin
  while i is not null
  loop
    if i = p_table.first then
      v_return := p_table(i);
    else
      v_return := v_return || p_delimiter || p_table(i);
    end if;
    i := p_table.next(i) ;
  end loop;
  return( v_return );
end nested_table_to_csv;
/
show errors
No errors.
 

Test Data

desc t
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 GROUP_KEY                              VARCHAR2(10)
 VAL                                    VARCHAR2(10)
set null "(null)"
break on group_key skip 1 duplicates
select * from t order by group_key, val ;
 
GROUP_KEY  VAL
---------- ----------
Group 1    a
Group 2    a
Group 2    b
Group 3    a
Group 3    b
Group 3    c
Group 4    a
Group 4    a
Group 4    b
Group 4    b
Group 5    a
Group 5    b
Group 5    d
Group 5    e
Group 5    (null)
Group 6    (null)
Group 7    (null)
Group 7    (null)
Group 8    a
Group 8    a
Group 8    b
Group 8    b
Group 8    (null)
Group 8    (null)
(null)     (null)
 

The Solution

We can now run the solution, which uses the type and function we created above in conjunction with COLLECT .
clear breaks
select
  group_key ,
  NESTED_TABLE_TO_CSV
    ( cast( COLLECT(val) as VARCHAR2_TABLE_TYPE )
    ) as string
from
  t
group by
  group_key
;
 
GROUP_KEY  STRING
---------- ---------------
Group 1    a
Group 2    a,b
Group 3    a,b,c
Group 4    a,a,b,b
Group 5    a,b,d,e
Group 6    (null)
Group 7    (null)
Group 8    a,a,b,b
(null)     (null)
 
There are two important things to note about this solution. First, unlike other solutions we have seen so far, these results do not include empty values in the strings for Groups 5 and 8. As well, the string for "Group 7" only contains one null, not two. This is because COLLECT is an aggregate function and all aggregate functions, except COUNT(*) and GROUPING, ignore null values.
Second, since COLLECT returns a nested table which is, by definition, an unordered set of data we cannot guarantee the values in our delimited string will appear in any given order. If you need ordered values you can add sorting logic to the NESTED_TABLE_TO_CSV function.
To filter out duplicate values, like those in Groups 4 and 8, we simply add the DISTINCT argument to the COLLECT function.
select
  group_key ,
  nested_table_to_csv
    ( cast( collect( DISTINCT val ) as varchar2_table_type )
    ) as string
from
  t
group by
  group_key
;
 
GROUP_KEY  STRING
---------- ---------------
Group 1    a
Group 2    a,b
Group 3    a,b,c
Group 4    a,b
Group 5    a,b,d,e
Group 6    (null)
Group 7    (null)
Group 8    a,b
(null)     (null)
 

Tom Kyte's STRAGG

This tutorial shows how to collect values from multiple rows into a single, comma delimited string. This is also known as "string aggregation". It takes values like these
GROUP_KEY  VAL
---------- ----------
Group 3    a
Group 3    b
Group 3    c
 
and yields a string like this.
GROUP_KEY  STRING
---------- ---------------
Group 3    a,b,c
 
A technique originally published by Tom Kyte at Ask Tom "Function to concatenate output" employing a user-defined function called STRAGG will be demonstrated. Other tutorials in this section discuss alternate approaches.
While Tom Kyte himself says he no longer uses STRAGG (see The Tom Kyte Blog:Stringing them up...) some people still prefer it over other solutions. STRAGG is also a good case study for anyone interested in using the Extensibility Framework to write their own aggregate functions.

Prerequisites

create or replace type stragg_type as object
(
  string varchar2(4000),
  static function ODCIAggregateInitialize
    ( sctx in out stragg_type )
    return number ,
  member function ODCIAggregateIterate
    ( self  in out stragg_type ,
      value in     varchar2
    ) return number ,
  member function ODCIAggregateTerminate
    ( self        in  stragg_type,
      returnvalue out varchar2,
      flags in number
    ) return number ,
  member function ODCIAggregateMerge
    ( self in out stragg_type,
      ctx2 in     stragg_type
    ) return number
);
/
create or replace type body stragg_type
is
  static function ODCIAggregateInitialize
  ( sctx in out stragg_type )
  return number
  is
  begin
    sctx := stragg_type( null ) ;
    return ODCIConst.Success ;
  end;
  member function ODCIAggregateIterate
  ( self  in out stragg_type ,
    value in     varchar2
  ) return number
  is
  begin
    self.string := self.string || ',' || value ;
    return ODCIConst.Success;
  end;
  member function ODCIAggregateTerminate
  ( self        in  stragg_type ,
    returnvalue out varchar2 ,
    flags       in  number
  ) return number
  is
  begin
    returnValue := ltrim( self.string, ',' );
    return ODCIConst.Success;
  end;
  member function ODCIAggregateMerge
  ( self in out stragg_type ,
    ctx2 in     stragg_type
  ) return number
  is
  begin
    self.string := self.string || ctx2.string;
    return ODCIConst.Success;
  end;
end;
/
create or replace function stragg
  ( input varchar2 )
  return varchar2
  deterministic
  parallel_enable
  aggregate using stragg_type
;
/
 

Test Data

desc t
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 GROUP_KEY                              VARCHAR2(10)
 VAL                                    VARCHAR2(10)
set null "(null)"
break on group_key skip 1 duplicates
select * from t order by group_key, val ;
 
GROUP_KEY  VAL
---------- ----------
Group 1    a
Group 2    a
Group 2    b
Group 3    a
Group 3    b
Group 3    c
Group 4    a
Group 4    a
Group 4    b
Group 4    b
Group 5    a
Group 5    b
Group 5    d
Group 5    e
Group 5    (null)
Group 6    (null)
Group 7    (null)
Group 7    (null)
Group 8    a
Group 8    a
Group 8    b
Group 8    b
Group 8    (null)
Group 8    (null)
(null)     (null)
 

Aggregate Solution

select
  group_key ,
  STRAGG( VAL ) as string
from
  t
group by
  group_key
;
 
GROUP_KEY  STRING
---------- ---------------
Group 1    a
Group 2    a,b
Group 3    a,b,c
Group 4    a,b,b,a
Group 5    a,b,d,e
Group 6    (null)
Group 7    (null)
Group 8    a,a,b,b
(null)     (null)
 
It is important to note that, unlike some other solutions we have seen, these results do not include empty values in the strings for Groups 5 and 8. As well, the string for "Group 7" only contains one null, not two. This is because STRAGG, and all other aggregate functions except COUNT(*) and GROUPING, ignore null values.
To only see distinct values in our result strings we can add the DISTINCT argument to the STRAGG function call. Note how the results for Groups 4 and 8 below differ from those above.
select
  group_key ,
  stragg( DISTINCT val ) as string
from
  t
group by
  group_key
;
 
GROUP_KEY  STRING
---------- ---------------
Group 1    a
Group 2    a,b
Group 3    a,b,c
Group 4    a,b
Group 5    a,b,d,e
Group 6    (null)
Group 7    (null)
Group 8    a,b
(null)     (null)
 

Analytic Version

STRAGG can also be used as an analytical function without changing any of the underlying code.
break on group_key skip 1 duplicates
select
  group_key ,
  val ,
  stragg( val ) OVER
    (
      PARTITION BY GROUP_KEY
      ORDER BY VAL
      RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    )
    as string
from
  t
;
 
GROUP_KEY  VAL        STRING
---------- ---------- ---------------
Group 1    a          a
Group 2    a          a,b
Group 2    b          a,b
Group 3    a          a,b,c
Group 3    b          a,b,c
Group 3    c          a,b,c
Group 4    a          a,a,b,b
Group 4    a          a,a,b,b
Group 4    b          a,a,b,b
Group 4    b          a,a,b,b
Group 5    a          a,b,d,e
Group 5    b          a,b,d,e
Group 5    d          a,b,d,e
Group 5    e          a,b,d,e
Group 5    (null)     a,b,d,e
Group 6    (null)     (null)
Group 7    (null)     (null)
Group 7    (null)     (null)
Group 8    a          a,a,b,b
Group 8    a          a,a,b,b
Group 8    b          a,a,b,b
Group 8    b          a,a,b,b
Group 8    (null)     a,a,b,b
Group 8    (null)     a,a,b,b
(null)     (null)     (null)
 
To change the sort order of values in the string simply change the ORDER BY VAL clause.
break on group_key skip 1 duplicates
select
  group_key ,
  val ,
  stragg( val ) OVER
    (
      partition by group_key
      order by val DESC
      range between unbounded preceding and unbounded following
    )
    as string
from
  t
;
 
GROUP_KEY  VAL        STRING
---------- ---------- ---------------
Group 1    a          a
Group 2    b          b,a
Group 2    a          b,a
Group 3    c          c,b,a
Group 3    b          c,b,a
Group 3    a          c,b,a
Group 4    b          b,b,a,a
Group 4    b          b,b,a,a
Group 4    a          b,b,a,a
Group 4    a          b,b,a,a
Group 5    (null)     e,d,b,a
Group 5    e          e,d,b,a
Group 5    d          e,d,b,a
Group 5    b          e,d,b,a
Group 5    a          e,d,b,a
Group 6    (null)     (null)
Group 7    (null)     (null)
Group 7    (null)     (null)
Group 8    (null)     b,b,a,a
Group 8    (null)     b,b,a,a
Group 8    b          b,b,a,a
Group 8    b          b,b,a,a
Group 8    a          b,b,a,a
Group 8    a          b,b,a,a
(null)     (null)     (null)

XML - CSV Method

This tutorial shows how to collect values from multiple rows into a single, CSV delimited string. This is also known as "string aggregation". It takes values like these
GROUP_KEY  VAL
---------- ----------
Group 3    a
Group 3    b
Group 3    c
 
and yields a string like this one.
GROUP_KEY  STRING
---------- ---------------
Group 3    a,b,c
 
Oracle's XML features will be used to achieve this result.
In the XML Method tutorial we saw how to use the XMLAGG and XMLELEMENT functions to create an XML delimited string like this.
column string format a62
select
  group_key ,
  XMLAGG
    ( XMLELEMENT( "V", val )
    ).GETSTRINGVAL()
    as string
from
  t
group by
  group_key
;
 
GROUP_KEY  STRING
---------- --------------------------------------------------------------
Group 1    <V>a</V>
Group 2    <V>a</V><V>b</V>
Group 3    <V>a</V><V>b</V><V>c</V>
Group 4    <V>a</V><V>b</V><V>b</V><V>a</V>
Group 5    <V>a</V><V>b</V><V></V><V>e</V><V>d</V>
Group 6    <V></V>
Group 7    <V></V><V></V>
Group 8    <V>a</V><V>b</V><V>b</V><V></V><V></V><V>a</V>
(null)     <V></V>
 
By adding calls to EXTRACT and LTRIM we can convert this solution into one that gives us comma separated values.
select
  group_key ,
  LTRIM
  ( EXTRACT
    ( xmlagg( xmlelement( "V", ',' || val ) ),
      '/V/text()'
    ) ,
    ','
  ) as string
from
  t
group by
  group_key
;
 
GROUP_KEY  STRING
---------- --------------------------------------------------------------
Group 1    a
Group 2    a,b
Group 3    a,b,c
Group 4    a,b,b,a
Group 5    a,b,,e,d
Group 6    (null)
Group 7    (null)
Group 8    a,b,b,,,a
(null)     (null)
 
To control the sort order of values within the string we can use XMLAGG's ORDER BY clause.
select
  group_key ,
  ltrim
  ( extract
    ( xmlagg
      ( xmlelement( "V", ',' || val )
        ORDER BY VAL DESC
      ),
      '/V/text()'
    ) ,
    ','
  ) as string
from
  t
group by
  group_key
;
 
GROUP_KEY  STRING
---------- --------------------------------------------------------------
Group 1    a
Group 2    b,a
Group 3    c,b,a
Group 4    b,b,a,a
Group 5    e,d,b,a
Group 6    (null)
Group 7    (null)
Group 8    b,b,a,a
(null)     (null)
 
As with the original solution, you can use a FROM clause subquery to produce distinct values and filter out null values if you need to.
select
  group_key ,
  ltrim
  ( extract
    ( xmlagg
      ( xmlelement( "V", ',' || val )
        order by val
      ),
      '/V/text()'
    ) ,
    ','
  ) as string
from
  ( SELECT DISTINCT GROUP_KEY, VAL -- produces distinct values
    FROM   T
    WHERE  VAL IS NOT NULL         -- filters our null values
  ) tb
group by
  group_key
;
 
GROUP_KEY  STRING
---------- --------------------------------------------------------------
Group 1    a
Group 2    a,b
Group 3    a,b,c
Group 4    a,b
Group 5    a,b,d,e
Group 8    a,b
 

MAX DECODE Method

This tutorial shows how to collect values from multiple rows into a single, comma delimited string. This is also known as "string aggregation". It takes values like these
GROUP_KEY  VAL
---------- ----------
Group 3    a
Group 3    b
Group 3    c
 
and yields a string like this.
GROUP_KEY  STRING
---------- ---------------
Group 3    a,b,c
 
The solution to follow is a good choice when the maximum number of values is known and small. See other solutions in this section when this is not the case.

Test Data

desc t
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 GROUP_KEY                              VARCHAR2(10)
 VAL                                    VARCHAR2(10)
set null "(null)"
break on group_key skip 1 duplicates
select * from t order by group_key, val ;
 
GROUP_KEY  VAL
---------- ----------
Group 1    a
Group 2    a
Group 2    b
Group 3    a
Group 3    b
Group 3    c
Group 4    a
Group 4    a
Group 4    b
Group 4    b
Group 5    a
Group 5    b
Group 5    d
Group 5    e
Group 5    (null)
Group 6    (null)
Group 7    (null)
Group 7    (null)
Group 8    a
Group 8    a
Group 8    b
Group 8    b
Group 8    (null)
Group 8    (null)
(null)     (null)
 

Solution

In this solution, which uses the DECODE and MAX functions, we assume the total number of values ever collected will not exceed 6.
clear breaks
select
  group_key ,
  max( decode( val_number, 1 , val, null ) ) ||
  max( decode( val_number, 2 , ',' || val, null ) ) ||
  max( decode( val_number, 3 , ',' || val, null ) ) ||
  max( decode( val_number, 4 , ',' || val, null ) ) ||
  max( decode( val_number, 5 , ',' || val, null ) ) ||
  max( decode( val_number, 6 , ',' || val, null ) ) as string
from
  ( select
      group_key,
      row_number() over ( partition by group_key order by val ) as val_number ,
      val
    from t
  ) ta
group by group_key
order by group_key ;
 
GROUP_KEY  STRING
---------- ---------------
Group 1    a
Group 2    a,b
Group 3    a,b,c
Group 4    a,a,b,b
Group 5    a,b,d,e,
Group 6    (null)
Group 7    ,
Group 8    a,a,b,b,,
(null)     (null)
 
To control the order of the delimited values, change the ORDER BY clause in the ROW_NUMBER() argument.
select
  group_key ,
  max( decode( val_number, 1 , val, null ) ) ||
  max( decode( val_number, 2 , ',' || val, null ) ) ||
  max( decode( val_number, 3 , ',' || val, null ) ) ||
  max( decode( val_number, 4 , ',' || val, null ) ) ||
  max( decode( val_number, 5 , ',' || val, null ) ) ||
  max( decode( val_number, 6 , ',' || val, null ) ) as string
from
  ( select
      group_key,
      row_number() over ( partition by group_key ORDER BY VAL DESC ) as val_number ,
      val
    from t
  ) t_a
group by group_key
order by group_key ;
 
GROUP_KEY  STRING
---------- ---------------
Group 1    a
Group 2    b,a
Group 3    c,b,a
Group 4    b,b,a,a
Group 5    ,e,d,b,a
Group 6    (null)
Group 7    ,
Group 8    ,,b,b,a,a
(null)     (null)
 
To filter duplicate VAL values, like those in Groups 4, 7, and 8 from the end result we use an inline view in the FROM clause.
clear breaks
select
  group_key ,
  max( decode( val_number, 1 , val, null ) ) ||
  max( decode( val_number, 2 , ',' || val, null ) ) ||
  max( decode( val_number, 3 , ',' || val, null ) ) ||
  max( decode( val_number, 4 , ',' || val, null ) ) ||
  max( decode( val_number, 5 , ',' || val, null ) ) ||
  max( decode( val_number, 6 , ',' || val, null ) ) as string
from
  ( select
      group_key,
      row_number() over ( partition by group_key order by val ) as val_number ,
      val
    from ( SELECT DISTINCT GROUP_KEY, VAL FROM T ) T_A
  ) t_b
group by group_key
order by group_key ;
 
GROUP_KEY  STRING
---------- ---------------
Group 1    a
Group 2    a,b
Group 3    a,b,c
Group 4    a,b
Group 5    a,b,d,e,
Group 6    (null)
Group 7    (null)
Group 8    a,b,
(null)     (null)
 
To filter null VAL values, like the ones in Groups 5 and higher from the end result we can add a WHERE clause to the query.
select
  group_key ,
  max( decode( val_number, 1 , val, null ) ) ||
  max( decode( val_number, 2 , ',' || val, null ) ) ||
  max( decode( val_number, 3 , ',' || val, null ) ) ||
  max( decode( val_number, 4 , ',' || val, null ) ) ||
  max( decode( val_number, 5 , ',' || val, null ) ) ||
  max( decode( val_number, 6 , ',' || val, null ) ) as string
from
  ( select
      group_key,
      row_number() over ( partition by group_key order by val ) as val_number ,
      val
    from t
    WHERE VAL IS NOT NULL
  ) ta
group by group_key
order by group_key ;
 
GROUP_KEY  STRING
---------- ---------------
Group 1    a
Group 2    a,b
Group 3    a,b,c
Group 4    a,a,b,b
Group 5    a,b,d,e
Group 8    a,a,b,b

LEAD Method

This tutorial shows how to collect values from multiple rows into a single, comma delimited string. This is also known as "string aggregation". It takes values like these
GROUP_KEY  VAL
---------- ----------
Group 3    a
Group 3    b
Group 3    c
 
and yields a string like this.
GROUP_KEY  STRING
---------- ---------------
Group 3    a,b,c
 
The solution to follow is a good choice when the maximum number of values is known and small. See other solutions in this section when this is not the case.

Test Data

desc t
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 GROUP_KEY                              VARCHAR2(10)
 VAL                                    VARCHAR2(10)
set null "(null)"
break on group_key skip 1 duplicates
select * from t order by group_key, val ;
 
GROUP_KEY  VAL
---------- ----------
Group 1    a
Group 2    a
Group 2    b
Group 3    a
Group 3    b
Group 3    c
Group 4    a
Group 4    a
Group 4    b
Group 4    b
Group 5    a
Group 5    b
Group 5    d
Group 5    e
Group 5    (null)
Group 6    (null)
Group 7    (null)
Group 7    (null)
Group 8    a
Group 8    a
Group 8    b
Group 8    b
Group 8    (null)
Group 8    (null)
(null)     (null)
 

Solution

In this solution, which uses the LEAD analytic function, we assume the total number of values ever collected will not exceed 6.
clear breaks
select group_key, string
from
  (
    select
      group_key ,
      row_number() over ( partition by group_key order by val ) as position ,
      val ||
      lead( ','||val, 1 ) over ( partition by group_key order by val ) ||
      lead( ','||val, 2 ) over ( partition by group_key order by val ) ||
      lead( ','||val, 3 ) over ( partition by group_key order by val ) ||
      lead( ','||val, 4 ) over ( partition by group_key order by val ) ||
      lead( ','||val, 5 ) over ( partition by group_key order by val )
      as string
    from t
  )
where position = 1
order by group_key ;
 
GROUP_KEY  STRING
---------- ---------------
Group 1    a
Group 2    a,b
Group 3    a,b,c
Group 4    a,a,b,b
Group 5    a,b,d,e,
Group 6    (null)
Group 7    ,
Group 8    a,a,b,b,,
(null)     (null)
 
To control the order of the delimited values, change the ORDER BY clauses.
select group_key, string
from
  (
    select
      group_key ,
      row_number() over ( partition by group_key ORDER BY VAL DESC) as position ,
      val ||
      lead( ','||val, 1 ) over ( partition by group_key ORDER BY VAL DESC) ||
      lead( ','||val, 2 ) over ( partition by group_key ORDER BY VAL DESC) ||
      lead( ','||val, 3 ) over ( partition by group_key ORDER BY VAL DESC) ||
      lead( ','||val, 4 ) over ( partition by group_key ORDER BY VAL DESC) ||
      lead( ','||val, 5 ) over ( partition by group_key ORDER BY VAL DESC)
      as string
    from t
  )
where position = 1
order by group_key ;
 
GROUP_KEY  STRING
---------- ---------------
Group 1    a
Group 2    b,a
Group 3    c,b,a
Group 4    b,b,a,a
Group 5    ,e,d,b,a
Group 6    (null)
Group 7    ,
Group 8    ,,b,b,a,a
(null)     (null)
 
To filter duplicate VAL values, like those in Groups 4, 7, and 8 from the end result we use an inline view in the FROM clause.
clear breaks
select group_key, string
from
  (
    select
      group_key ,
      row_number() over ( partition by group_key order by val ) as position ,
      val ||
      lead( ','||val, 1 ) over ( partition by group_key order by val ) ||
      lead( ','||val, 2 ) over ( partition by group_key order by val ) ||
      lead( ','||val, 3 ) over ( partition by group_key order by val ) ||
      lead( ','||val, 4 ) over ( partition by group_key order by val ) ||
      lead( ','||val, 5 ) over ( partition by group_key order by val )
      as string
    from ( SELECT DISTINCT GROUP_KEY, VAL FROM T ) T_A
  )
where position = 1
order by group_key ;
 
GROUP_KEY  STRING
---------- ---------------
Group 1    a
Group 2    a,b
Group 3    a,b,c
Group 4    a,b
Group 5    a,b,d,e,
Group 6    (null)
Group 7    (null)
Group 8    a,b,
(null)     (null)
 
To filter null VAL values, like the ones in Groups 5 and higher from the end result we can add a WHERE clause to the query.
select group_key, string
from
  (
    select
      group_key ,
      row_number() over ( partition by group_key order by val ) as position ,
      val ||
      lead( ','||val, 1 ) over ( partition by group_key order by val ) ||
      lead( ','||val, 2 ) over ( partition by group_key order by val ) ||
      lead( ','||val, 3 ) over ( partition by group_key order by val ) ||
      lead( ','||val, 4 ) over ( partition by group_key order by val ) ||
      lead( ','||val, 5 ) over ( partition by group_key order by val )
      as string
    from t
    WHERE VAL IS NOT NULL
  )
where position = 1
order by group_key ;
 
GROUP_KEY  STRING
---------- ---------------
Group 1    a
Group 2    a,b
Group 3    a,b,c
Group 4    a,a,b,b
Group 5    a,b,d,e
Group 8    a,a,b,b

MODEL Method 1

This tutorial shows how to collect values from multiple rows into a single, comma delimited string. This is also known as "string aggregation". It takes values like these
GROUP_KEY  VAL
---------- ----------
Group 3    a
Group 3    b
Group 3    c
 
and yields a string like this.
GROUP_KEY  STRING
---------- ---------------
Group 3    a,b,c
 
While the solution to follow, which we will refer to as MODEL Method 1, is the simplest of the three MODEL solution presented in this section it unfortunately scales the worst when a large number of values is selected or very wide STRING values are produced (see MODEL Clause: MODEL Performance Tuning). MODEL Method 2 and MODEL Method 3 provide alternative MODEL techniques that scale better than MODEL Method 1, though they are more complex.

Test Data

desc t
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 GROUP_KEY                              VARCHAR2(10)
 VAL                                    VARCHAR2(10)
set null "(null)"
break on group_key skip 1 duplicates
select * from t order by group_key, val ;
 
GROUP_KEY  VAL
---------- ----------
Group 1    a
Group 2    a
Group 2    b
Group 3    a
Group 3    b
Group 3    c
Group 4    a
Group 4    a
Group 4    b
Group 4    b
Group 5    a
Group 5    b
Group 5    d
Group 5    e
Group 5    (null)
Group 6    (null)
Group 7    (null)
Group 7    (null)
Group 8    a
Group 8    a
Group 8    b
Group 8    b
Group 8    (null)
Group 8    (null)
(null)     (null)
 

Solution

This solution uses the SQL MODEL clause which was introduced in Oracle 10g. If you are unfamiliar with MODEL you may wish to review the SQL Features Tutorials: MODEL Clause tutorial before proceeding.
clear breaks
select group_key, substr( string, 2 ) as string
from   t
model
  return updated rows
  partition by ( group_key )
  dimension by ( row_number() over (partition by group_key order by val) as position )
  measures     ( cast( val as varchar2(65) ) as string  ) -- Note 1
  rules
    upsert
    iterate( 6 ) -- Note 2
      until ( presentv(string[iteration_number+2],1,0) = 0 )
    ( string[0] = string[0] || ',' || string[iteration_number+1] )
order by group_key ;
 
GROUP_KEY  STRING
---------- ---------------
Group 1    a
Group 2    a,b
Group 3    a,b,c
Group 4    a,a,b,b
Group 5    a,b,d,e,
Group 6    (null)
Group 7    ,
Group 8    a,a,b,b,,
(null)     (null)
 
Notes:
1.      65 = v * n + (n-1) where
o    v = maximum length of VAL values (this is 10 in our test table)
o    n = maximum number of rows with the same GROUP_KEY (6 in our test table)
2.      Use any number here greater than or equal to n.
To control the order of the delimited values, change the ORDER BY clause in the arguments for ROW_NUMBER().
select group_key, substr( string, 2 ) as string
from   t
model
  return updated rows
  partition by ( group_key )
  dimension by ( row_number() over (partition by group_key ORDER BY VAL DESC) as position )
  measures     ( cast( val as varchar2(65) ) as string  )
  rules
    upsert
    iterate( 6 )
      until ( presentv(string[iteration_number+2],1,0) = 0 )
    ( string[0] = string[0] || ',' || string[iteration_number+1] )
order by group_key ;
 
GROUP_KEY  STRING
---------- ---------------
Group 1    a
Group 2    b,a
Group 3    c,b,a
Group 4    b,b,a,a
Group 5    ,e,d,b,a
Group 6    (null)
Group 7    ,
Group 8    ,,b,b,a,a
(null)     (null)
 
To filter duplicate VAL values, like those in Groups 4, 7, and 8 from the end result we use an inline view in the FROM clause.
select group_key, substr( string, 2 ) as string
from   ( SELECT DISTINCT GROUP_KEY, VAL FROM T ) T
model
  return updated rows
  partition by ( group_key )
  dimension by ( row_number() over (partition by group_key order by val) as position )
  measures     ( cast( val as varchar2(65) ) as string  )
  rules
    upsert
    iterate( 6 )
      until ( presentv(string[iteration_number+2],1,0) = 0 )
    ( string[0] = string[0] || ',' || string[iteration_number+1] )
order by group_key ;
 
GROUP_KEY  STRING
---------- ---------------
Group 1    a
Group 2    a,b
Group 3    a,b,c
Group 4    a,b
Group 5    a,b,d,e,
Group 6    (null)
Group 7    (null)
Group 8    a,b,
(null)     (null)
 
To filter null VAL values, like the ones in Groups 5 and higher from the end result we can add a WHERE clause to the query.
select group_key, substr( string, 2 ) as string
from   t
WHERE  VAL IS NOT NULL
model
  return updated rows
  partition by ( group_key )
  dimension by ( row_number() over (partition by group_key order by val) as position )
  measures     ( cast( val as varchar2(65) ) as string  ) -- Note 1
  rules
    upsert
    iterate( 6 ) -- Note 2
      until ( presentv(string[iteration_number+2],1,0) = 0 )
    ( string[0] = string[0] || ',' || string[iteration_number+1] )
order by group_key ;
 
GROUP_KEY  STRING
---------- ---------------
Group 1    a
Group 2    a,b
Group 3    a,b,c
Group 4    a,a,b,b
Group 5    a,b,d,e
Group 8    a,a,b,b
 
In the next tutorial, MODEL Method 1 - Explained, we look at how this technique works.

Gotchas

Distinct
If we attempt to produce distinct values in our strings using SELECT DISTINCT instead of an inline view we will be rewarded with incorrect results. This is because SELECT DISTINCT is evaluated after the MODEL clause is applied, not before.
select DISTINCT group_key, substr( string, 2 ) as string
from   t
model
  return updated rows
  partition by ( group_key )
  dimension by ( row_number() over (partition by group_key order by val) as position )
  measures     ( cast( val as varchar2(65) ) as string  )
  rules
    upsert
    iterate( 6 )
      until ( presentv(string[iteration_number+2],1,0) = 0 )
    ( string[0] = string[0] || ',' || string[iteration_number+1] )
order by group_key ;
 
GROUP_KEY  STRING
---------- ---------------
Group 1    a
Group 2    a,b
Group 3    a,b,c
Group 4    a,a,b,b
Group 5    a,b,d,e,
Group 6    (null)
Group 7    ,
Group 8    a,a,b,b,,
(null)     (null)
 
IS PRESENT
When I originally wrote this solution I tried to use the following UNTIL clause
iterate( 6 )
  until ( NOT( STRING[ITERATION_NUMBER+2] IS PRESENT ) = 0 )
 
Unfortunately it produced some rather nasty errors.
ERROR at line 1:
ORA-03113: end-of-file on communication channel
ERROR:
ORA-03114: not connected to ORACLE
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump
  [ACCESS_VIOLATION] [_qcss_process_expr+226] [PC:0x1E57EC6] [ADDR:0x18] [UNABLE_TO_READ] []
Current SQL statement for this session:
select group_key, string
from   t
...
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
_qcss_process_expr+           00000000
226
_qcss_process_cond+  CALLrel  _qcss_process_expr+  8CA6754 8171468 841BB64
76                            0                    8CA6698
_qcss_process_sprea  CALLrel  _qcss_process_cond+  8CA6754 8171468 841BB58
dsheet+102                    0                    8CA6698
_qcssSemanticCbks+4  CALLrel  _qcss_process_sprea  8CA6754 8171468 841A5E0
5                             dsheet+0


No comments:

Post a Comment

Oracle Escape Characters

Oracle allows the assignment of special escape characters to tell Oracle that the character is interpreted literally.  Certain characters ...