Home > HowTo, XQuery > How To : Sort delimited values in a string using XQuery

How To : Sort delimited values in a string using XQuery

November 28, 2011 Leave a comment Go to comments

This follows a thread on OTN about a month ago. The question was : how to sort a string in ascending order?
I suggested an XQuery approach that turned out to be a lot more efficient than the traditional TABLE(SUBQUERY-FETCH) pattern.

Set up…

Let’s create a 10,000-row table holding strings of 100 comma-separated integer values :

SQL> create table csv_table as
  2  with tmp as (
  3    select mod(level,10000) as id
  4         , to_char(trunc(dbms_random.value(1,1000))) as val
  5    from dual
  6    connect by level <= 1000000
  7  )
  8  select id
  9       , cast(listagg(val, ',') within group(order by null) as varchar2(4000)) as str
 10  from tmp
 11  group by id
 12  ;
  
Table created
 
SQL> select * from csv_table where rownum <= 10;
 
        ID STR
---------- --------------------------------------------------------------------------------
         0 124,132,14,142,150,151,171,176,196,214,220,223,236,242,272,279,283,296,321,326,3
         1 104,104,111,117,128,130,173,178,182,185,204,214,23,264,279,295,30,304,32,32,344,
         2 115,118,12,123,124,142,148,148,159,160,18,196,198,199,208,22,229,234,236,242,247
         3 100,121,126,13,15,160,165,171,182,2,220,224,231,24,243,245,25,257,27,276,278,28,
         4 128,143,15,150,162,168,18,19,190,191,201,216,220,221,232,242,244,244,259,260,268
         5 111,124,148,149,152,156,158,159,160,163,170,180,182,185,188,199,203,203,206,210,
         6 123,127,141,164,164,167,169,179,189,192,192,194,195,206,220,222,235,24,244,267,2
         7 10,10,111,115,118,12,126,152,154,156,178,18,18,194,20,20,203,208,228,23,230,236,
         8 105,107,111,127,130,134,156,157,162,17,172,178,196,201,220,228,242,250,253,261,2
         9 145,145,15,151,160,164,166,179,181,197,204,210,222,239,255,256,257,265,270,285,2
 
10 rows selected
 

Query 1, traditional approach :
SQL> set autotrace traceonly statistics
SQL> set timing on
SQL> SELECT t.id
  2       , listagg(to_char(x.column_value), ',') within group (order by x.column_value) as sorted_str
  3  FROM csv_table t
  4     , TABLE(
  5         CAST(
  6           MULTISET(
  7             SELECT to_number(regexp_substr(str, '[^,]+', 1, level))
  8             FROM dual
  9             CONNECT BY level <= regexp_count(str, '[^,]+')
 10           )
 11           as sys.odcinumberlist
 12         )
 13       ) x
 14  WHERE id < 1000
 15  GROUP BY t.id
 16  ;

1000 rows selected.

Elapsed: 00:01:38.59

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        559  consistent gets
          0  physical reads
          0  redo size
     406277  bytes sent via SQL*Net to client
       1145  bytes received via SQL*Net from client
         68  SQL*Net roundtrips to/from client
       1001  sorts (memory)
          0  sorts (disk)
       1000  rows processed

Query 2, XQuery :
SQL> SELECT t.id
  2       , x.sorted_str
  3  FROM csv_table t
  4     , XMLTable('string-join(for $i in ora:tokenize($str, ",") order by xs:integer($i) return $i, ",")'
  5         passing t.str as "str"
  6         columns sorted_str varchar2(4000) path '.'
  7       ) x
  8  WHERE t.id < 1000
  9  ;

1000 rows selected.

Elapsed: 00:00:00.98

Statistics
----------------------------------------------------------
         12  recursive calls
          0  db block gets
       4204  consistent gets
          0  physical reads
          0  redo size
     406277  bytes sent via SQL*Net to client
       1145  bytes received via SQL*Net from client
         68  SQL*Net roundtrips to/from client
       1000  sorts (memory)
          0  sorts (disk)
       1000  rows processed

Actually, more or less the same method is used for both queries : tokenizing and aggregating back in desired order, but internal XQuery functions outperform our custom tokenization technique.

Advertisements
Categories: HowTo, XQuery Tags: , ,
  1. Stew Ashton
    March 11, 2012 at 12:04

    Hello Marc,

    I found this blog after reading some of your replies on the otn forum. Thanks for sharing your knowledge both here and there.

    I just wanted to mention that the “custom” method you mention uses REGEXP* functions, which consume lots of CPU. There is a more traditional method that is a bit slower than XQUERY but more scalable, since it does lots fewer consistent gets.

    SELECT t.id
    , listagg(to_char(x.column_value), ',') within group (order by x.column_value) as sorted_str
    FROM csv_table t
    , TABLE(
      CAST(
        MULTISET(
          SELECT to_number(trim(SUBSTR(','||str||','
            , instr(','||str||',', ',', 1, level) + 1
            , INSTR(','||STR||',', ',', 1, LEVEL + 1) - INSTR(','||STR||',', ',', 1, LEVEL) - 1
          ) )) AS TOKEN 
          FROM dual
          CONNECT BY level <= LENGTH(str) - LENGTH(REPLACE(str, ',', '')) + 1
        )
        as sys.odcinumberlist
      )
    ) x
    GROUP BY T.ID
  1. August 1, 2016 at 07:10

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: