Here’s a new post in the category “How to deal with that unfriendly design?”
Again, since the title is not very explicit, an example will surely explain it all :
<root> <ids> <id>7369</id> <id>7499</id> <id>7521</id> </ids> <names> <name>SMITH</name> <name>ALLEN</name> <name>WARD</name> </names> <jobs> <job>CLERK</job> <job>SALESMAN</job> <job>SALESMAN</job> </jobs> </root>
This is the result we want out of that data :
ID NAME JOB ----- ------- ---------- 7369 SMITH CLERK 7499 ALLEN SALESMAN 7521 WARD SALESMAN
0. Set up
This script creates an XMLType table (binary XML storage) and a sample XML document :
create table tmp_xml of xmltype; insert into tmp_xml select xmlelement("root" , xmlelement("ids", xmlagg(xmlelement("id", level))) , xmlelement("names", xmlagg(xmlelement("name", 'NAME'||to_char(level,'fm09999')))) , xmlelement("dates", xmlagg(xmlelement("date", cast(date '2015-01-01' + level/24 as timestamp(0))))) ) from dual connect by level <= 10000;
The resulting XML document has the following structure :
<root> <ids> <id>1</id> <id>2</id> <id>3</id> <!-- up to 10000 --> </ids> <names> <name>NAME00001</name> <name>NAME00002</name> <name>NAME00003</name> <!-- up to NAME10000 --> </names> <dates> <date>2015-01-01T01:00:00</date> <date>2015-01-01T02:00:00</date> <date>2015-01-01T03:00:00</date> <!-- up to 2016-02-21T16:00:00 --> </dates> </root>
Expected output :
ID NAME DT ------ ------------ -------------------- 1 NAME00001 01/01/2015 01:00:00 2 NAME00002 01/01/2015 02:00:00 3 NAME00003 01/01/2015 03:00:00 4 NAME00004 01/01/2015 04:00:00 ...
Following are two methods to achieve that.
1. XMLTABLE with a single XQuery expression
select /*+ no_xml_query_rewrite */ x.* from tmp_xml t , xmltable( 'for $i at $p in $d/root/ids/id return element r { $i , $d/root/names/name[$p] , $d/root/dates/date[$p] }' passing t.object_value as "d" columns id number(5) path 'id' , name varchar2(10) path 'name' , dt timestamp path 'date' ) x ;
Here, the FLWOR expression iterates the sequence of ids, gathering the index of each item into the positional variable $p. The return clause then builds a “row” element consisting in the current id and both the name and date elements at the same position in their corresponding groups.
Also note the usage of the NO_XML_QUERY_REWRITE hint which forces evaluation of the XQuery expression by the XVM.
Without the hint, Oracle resolves the XQuery using internal SQL functions but it’s very inefficient when positional iteration is used.
2. Multiple XMLTABLEs + PIVOT operation
with unpivoted_data (cn, rn, val) as ( select 1, x.* from tmp_xml t, xmltable('/root/ids/id' passing t.object_value columns rn for ordinality, val varchar2(4000) path '.') x union all select 2, x.* from tmp_xml t, xmltable('/root/names/name' passing t.object_value columns rn for ordinality, val varchar2(4000) path '.') x union all select 3, x.* from tmp_xml t, xmltable('/root/dates/date' passing t.object_value columns rn for ordinality, val varchar2(4000) path '.') x ) select to_number(id) as id , name , to_date(dt, 'YYYY-MM-DD"T"HH24:MI:SS') as dt from unpivoted_data pivot ( min(val) for cn in (1 as "ID", 2 as "NAME", 3 as "DT") ) ;
The approach conducted here is more SQL-ish and leverages the power of set-based operations.
Each group (ids, names and dates) is broken into relational rows and assigned a common column number (CN). The whole dataset is then pivoted into three target columns, producing the required result.
3. Performance considerations and conclusion
First method, no hint :
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options SQL> insert into tmp_xml 2 select xmlelement("root" 3 , xmlelement("ids", xmlagg(xmlelement("id", level))) 4 , xmlelement("names", xmlagg(xmlelement("name", 'NAME'||to_char(level,'fm09999')))) 5 , xmlelement("dates", xmlagg(xmlelement("date", cast(date '2015-01-01' + level/24 as timestamp(0))))) 6 ) 7 from dual 8 connect by level <= 10000; 1 row created. SQL> commit; Commit complete. SQL> SQL> SQL> set autotrace traceonly SQL> set lines 200 SQL> set timing on SQL> SQL> select x.* 2 from tmp_xml t 3 , xmltable( 4 'for $i at $p in $d/root/ids/id 5 return element r { 6 $i 7 , $d/root/names/name[$p] 8 , $d/root/dates/date[$p] 9 }' 10 passing t.object_value as "d" 11 columns id number(5) path 'id' 12 , name varchar2(10) path 'name' 13 , dt timestamp path 'date' 14 ) x 15 ; ^C
I had to cancel the query after 2 hours of execution, waiting on ‘direct path read’ I/O event.
Here’s the corresponding explain plan anyway :
Execution Plan ---------------------------------------------------------- Plan hash value: 3262674909 --------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 8168 | 725K| 32 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 4 | | | |* 2 | COLLECTION ITERATOR PICKLER FETCH | XMLSEQUENCEFROMXMLTYPE | 82 | 328 | 29 (0)| 00:00:01 | | 3 | SORT AGGREGATE | | 1 | 4 | | | |* 4 | COLLECTION ITERATOR PICKLER FETCH| XMLSEQUENCEFROMXMLTYPE | 82 | 328 | 29 (0)| 00:00:01 | | 5 | NESTED LOOPS | | 8168 | 725K| 32 (0)| 00:00:01 | | 6 | TABLE ACCESS FULL | TMP_XML | 1 | 87 | 3 (0)| 00:00:01 | | 7 | COLLECTION ITERATOR PICKLER FETCH | XMLSEQUENCEFROMXMLTYPE | 8168 | 32672 | 29 (0)| 00:00:01 | --------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(SYS_XQFNPREDTRUTH(:B1,VALUE(KOKBF$) /* KOKBI$ */ )=1) 4 - filter(SYS_XQFNPREDTRUTH(:B1,VALUE(KOKBF$) /* KOKBI$ */ )=1) Note ----- - Unoptimized XML construct detected (enable XMLOptimizationCheck for more information)
First method, with NO_XML_QUERY_REWRITE hint :
SQL> select /*+ no_xml_query_rewrite */ x.* 2 from tmp_xml t 3 , xmltable( 4 'for $i at $p in $d/root/ids/id 5 return element r { 6 $i 7 , $d/root/names/name[$p] 8 , $d/root/dates/date[$p] 9 }' 10 passing t.object_value as "d" 11 columns id number(5) path 'id' 12 , name varchar2(10) path 'name' 13 , dt timestamp path 'date' 14 ) x 15 ; 10000 rows selected. Elapsed: 00:00:34.68 Execution Plan ---------------------------------------------------------- Plan hash value: 1966229631 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 8168 | 741K| 32 (0)| 00:00:01 | | 1 | NESTED LOOPS | | 8168 | 741K| 32 (0)| 00:00:01 | | 2 | TABLE ACCESS STORAGE FULL| TMP_XML | 1 | 87 | 3 (0)| 00:00:01 | | 3 | XMLTABLE EVALUATION | | | | | | -------------------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 8 consistent gets 44 physical reads 0 redo size 288929 bytes sent via SQL*Net to client 7798 bytes received via SQL*Net from client 668 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10000 rows processed
Second method :
SQL> with unpivoted_data (cn, rn, val) as ( 2 select 1, x.* from tmp_xml t, xmltable('/root/ids/id' passing t.object_value columns rn for ordinality, val varchar2(4000) path '.') x 3 union all 4 select 2, x.* from tmp_xml t, xmltable('/root/names/name' passing t.object_value columns rn for ordinality, val varchar2(4000) path '.') x 5 union all 6 select 3, x.* from tmp_xml t, xmltable('/root/dates/date' passing t.object_value columns rn for ordinality, val varchar2(4000) path '.') x 7 ) 8 select to_number(id) as id 9 , name 10 , to_date(dt, 'YYYY-MM-DD"T"HH24:MI:SS') as dt 11 from unpivoted_data 12 pivot ( 13 min(val) for cn in (1 as "ID", 2 as "NAME", 3 as "DT") 14 ) 15 ; 10000 rows selected. Elapsed: 00:00:01.79 Execution Plan ---------------------------------------------------------- Plan hash value: 4047723032 ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 24504 | 47M| 98 (3)| 00:00:02 | | 1 | SORT GROUP BY PIVOT | | 24504 | 47M| 98 (3)| 00:00:02 | | 2 | VIEW | | 24504 | 47M| 96 (0)| 00:00:02 | | 3 | UNION-ALL | | | | | | | 4 | NESTED LOOPS | | 8168 | 16M| 32 (0)| 00:00:01 | | 5 | TABLE ACCESS STORAGE FULL| TMP_XML | 1 | 87 | 3 (0)| 00:00:01 | | 6 | VIEW | | 8168 | 15M| 29 (0)| 00:00:01 | | 7 | COUNT | | | | | | | 8 | XPATH EVALUATION | | | | | | | 9 | NESTED LOOPS | | 8168 | 16M| 32 (0)| 00:00:01 | | 10 | TABLE ACCESS STORAGE FULL| TMP_XML | 1 | 87 | 3 (0)| 00:00:01 | | 11 | VIEW | | 8168 | 15M| 29 (0)| 00:00:01 | | 12 | COUNT | | | | | | | 13 | XPATH EVALUATION | | | | | | | 14 | NESTED LOOPS | | 8168 | 16M| 32 (0)| 00:00:01 | | 15 | TABLE ACCESS STORAGE FULL| TMP_XML | 1 | 87 | 3 (0)| 00:00:01 | | 16 | VIEW | | 8168 | 15M| 29 (0)| 00:00:01 | | 17 | COUNT | | | | | | | 18 | XPATH EVALUATION | | | | | | ----------------------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 42 consistent gets 707 physical reads 0 redo size 288929 bytes sent via SQL*Net to client 7798 bytes received via SQL*Net from client 668 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 10000 rows processed
Conclusion :
Method | Rows fetched | Overall time (second) |
---|---|---|
Single XQuery | 10000 | +∞ |
Single XQuery + NO_XML_QUERY_REWRITE hint | 10000 | 35 |
XMLTABLEs + PIVOT | 10000 | 1.8 |