Tuesday, September 20, 2011

Strings to Rows 1

Union All Method      

Now, here is a solution that uses parsing logic from String to Columns: Separated Values - 10g+.
set null "(null)"
set pagesize 45
variable d varchar2(1)
execute :d := ','
break on key skip 1 duplicates
select
  key,
  1 as position,
  rtrim( regexp_substr( c || :d, '.*?' || :d, 1, 1 ), :d ) as val
from   t
UNION ALL
select
  key,
  2 as position,
  rtrim( regexp_substr( c || :d, '.*?' || :d, 1, 2 ), :d ) as val
from   t
where  regexp_substr( c || :d, '.*?' || :d, 1, 2 ) is not null
UNION ALL
select
  key,
  3 as position,
  rtrim( regexp_substr( c || :d, '.*?' || :d, 1, 3 ), :d ) as val
from   t
where  regexp_substr( c || :d, '.*?' || :d, 1, 3 ) is not null
order by
  key, position, val ;
 
KEY   POSITION VAL
--- ---------- ----------
r01          1 v1
r02          1 v1
r02          2 (null)
r03          1 v1
r03          2 (null)
r03          3 (null)
r04          1 (null)
r04          2 v2
r04          3 (null)
r05          1 (null)
r05          2 (null)
r05          3 v3
r06          1 v1
r06          2 v2
r07          1 v1
r07          2 v2
r07          3 (null)
r08          1 v1
r08          2 v2
r08          3 v3
r09          1 (null)
r10          1 (null)
r10          2 (null)
r11          1 (null)
r11          2 (null)
r11          3 (null)
 

When strings contain a large number of values we need a more flexible and efficient technique than the method presented in UNION ALL Method. One strategy involves using a parsing technique like one of those described in SQL Techniques Tutorials: String to Columns with an integer series generator like the ones in SQL Techniques Tutorials: Integer Series Generators. While many different combinations of the two types of techniques are possible, we will only examine a couple of these combinations here.
Before we see the first solution though, let's have a look at the test data we will use.
select * from t order by key ;
 
KEY C          NOTE
--- ---------- -------------------------
r01 v1         One value
r02 v1,        Two values, one empty
r03 v1,,       Three values, two empty
r04 ,v2,       Three values, two empty
r05 ,,v3       Three values, two empty
r06 v1,v2      Two values
r07 v1,v2,     Three values, one empty
r08 v1,v2,v3   Three values, none empty
r09 (null)     One empty value
r10 ,          Two values, two empty
r11 ,,         Three values, all empty
 
Now let's examine one possible solution.
set null "(null)"
set pagesize 45
break on key skip 1 duplicates
variable d varchar2(1)
execute :d := ','
select
  ta.key ,
  i.column_value as position ,
  substr
  ( ta.c2,
    instr( ta.c2, :d, 1, i.column_value  ) + 1,
    instr( ta.c2, :d, 1, i.column_value + 1 )
      - ( instr( ta.c2, :d, 1, i.column_value ) + 1 )
  )
  as val
from
  ( select
      key ,
      :d || c || :d as c2 ,
      length( c || :d ) - nvl( length( replace( c, :d, null ) ), 0 ) as value_count
    from t
  ) ta
      inner join table( integer_varray_type( 1,2,3,4,5 ) ) i
      on ( i.column_value <= ta.value_count )
order by
  key, position
;
 
KEY   POSITION VAL
--- ---------- ----------
r01          1 v1
r02          1 v1
r02          2 (null)
r03          1 v1
r03          2 (null)
r03          3 (null)
r04          1 (null)
r04          2 v2
r04          3 (null)
r05          1 (null)
r05          2 (null)
r05          3 v3
r06          1 v1
r06          2 v2
r07          1 v1
r07          2 v2
r07          3 (null)
r08          1 v1
r08          2 v2
r08          3 v3
r09          1 (null)
r10          1 (null)
r10          2 (null)
r11          1 (null)
r11          2 (null)
r11          3 (null)
 

How It Works

To understand how the query works consider its individual components.
Component 1
First we look at values derived from T.
clear breaks
select
  key ,
  c ,
  LENGTH( C || :D ) - NVL( LENGTH( REPLACE( C, :D, NULL ) ), 0 ) AS VALUE_COUNT
from  t
order by 1
;
 
KEY C          VALUE_COUNT
--- ---------- -----------
r01 v1                   1
r02 v1,                  2
r03 v1,,                 3
r04 ,v2,                 3
r05 ,,v3                 3
r06 v1,v2                2
r07 v1,v2,               3
r08 v1,v2,v3             3
r09 (null)               1
r10 ,                    2
r11 ,,                   3
 
In this query VALUE_COUNT is simply the total number of delimited values in the string, determined by deriving the total number of delimiter characters in the string plus one. If your delimiter is more than one character long you will need to adjust this logic to return a correct count.
Component 2
The next component is a simple integer series generator.
clear breaks
select * from table( integer_varray_type( 1,2,3,4,5 ) ) ;
 
COLUMN_VALUE
------------
           1
           2
           3
           4
           5
 
It contains a number of rows equal to the maximum number we expect to find in VALUE_COUNT. In our example we expect no more than 5 delimited values per string.
This technique is described in more detail at Integer Series Generators: Type Constructor Expression Method. It requires a custom collection type (created in the Setup code for this section) defined like this.
desc integer_varray_type
 integer_varray_type VARRAY(10) OF NUMBER(38)
 
If we did not have access to such a type or could not create our own there are other integer series generation techniques to choose from. See the feature graph at the end of SQL Techniques Tutorials: Integer Series Generators for a list of alternatives.
Components 1 + 2
Next we combine components 1 and 2 to form a single query that returns one row for each delimited value in each STRING.
break on key skip 1 duplicates
column c2 format a12
select
  ta.key, ta.c2, ta.value_count, i.column_value
from
  ( select
      key ,
      :d || c || :d as c2 ,
      LENGTH( C || :D ) - NVL( LENGTH( REPLACE( C, :D, NULL ) ), 0 ) AS VALUE_COUNT
    from t
  ) ta
    inner join TABLE( INTEGER_VARRAY_TYPE( 1,2,3,4,5 ) ) I
      on ( i.column_value <= ta.value_count )
order by
  1, 2, 3, 4
;
 
KEY C2           VALUE_COUNT COLUMN_VALUE
--- ------------ ----------- ------------
r01 ,v1,                   1            1
r02 ,v1,,                  2            1
r02 ,v1,,                  2            2
r03 ,v1,,,                 3            1
r03 ,v1,,,                 3            2
r03 ,v1,,,                 3            3
r04 ,,v2,,                 3            1
r04 ,,v2,,                 3            2
r04 ,,v2,,                 3            3
r05 ,,,v3,                 3            1
r05 ,,,v3,                 3            2
r05 ,,,v3,                 3            3
r06 ,v1,v2,                2            1
r06 ,v1,v2,                2            2
r07 ,v1,v2,,               3            1
r07 ,v1,v2,,               3            2
r07 ,v1,v2,,               3            3
r08 ,v1,v2,v3,             3            1
r08 ,v1,v2,v3,             3            2
r08 ,v1,v2,v3,             3            3
r09 ,,                     1            1
r10 ,,,                    2            1
r10 ,,,                    2            2
r11 ,,,,                   3            1
r11 ,,,,                   3            2
r11 ,,,,                   3            3
 
Component 3
To extract individual values from each string we use the parsing logic described in the String to Columns: Separated Values tutorial. Here is an example of how the logic extracts the first value from a delimited string.
column first_val format a9
select
  c2,
  instr( c2, :d, 1, 1 ) position_of_first_delim ,
  instr( c2, :d, 1, 2 ) position_of_second_delim ,
  substr
  ( c2,
    instr( c2, :d, 1, 1 ) + 1,        -- == position of 1st character in value
    instr( c2, :d, 1, 2 )
      - ( instr( c2, :d, 1, 1 ) + 1 ) -- == length of 1st value
  )
  as first_val
from
  ( select :d || c || :d as c2
    from   t
    where  key = 'r08'
  )
;
 
C2           POSITION_OF_FIRST_DELIM POSITION_OF_SECOND_DELIM FIRST_VAL
------------ ----------------------- ------------------------ ---------
,v1,v2,v3,                         1                        4 v1
 
Final Solution
Adding the parsing logic from Component 3 to the query from "Components 1 + 2" yields the final solution.
select
  ta.key ,
  i.column_value as position ,
  substr
  ( ta.c2,
    instr( ta.c2, :d, 1, i.column_value  ) + 1,
    instr( ta.c2, :d, 1, i.column_value + 1 )
      - ( instr( ta.c2, :d, 1, i.column_value ) + 1 )
  )
  as val
from
  ( select
      key ,
      :d || c || :d as c2 ,
      length( c || :d ) - nvl( length( replace( c, :d, null ) ), 0 ) as value_count
    from t
  ) ta
    inner join table( integer_varray_type( 1,2,3,4,5 ) ) i
      on ( i.column_value <= ta.value_count )
order by
  key, position
;
 
KEY   POSITION VAL
--- ---------- ----------
r01          1 v1
r02          1 v1
r02          2 (null)
r03          1 v1
r03          2 (null)
r03          3 (null)
r04          1 (null)
r04          2 v2
r04          3 (null)
r05          1 (null)
r05          2 (null)
r05          3 v3
r06          1 v1
r06          2 v2
r07          1 v1
r07          2 v2
r07          3 (null)
r08          1 v1
r08          2 v2
r08          3 v3
r09          1 (null)
r10          1 (null)
r10          2 (null)
r11          1 (null)
r11          2 (null)
r11          3 (null)      

No comments:

Post a Comment

Oracle Escape Characters

Oracle allows the assignment of special escape characters to tell Oracle that the character is interpreted literally.  Certain characters ...