Home > HowTo, XML DB, XQuery > How To : using OUTER JOIN with XMLTABLE or XQuery

How To : using OUTER JOIN with XMLTABLE or XQuery

Here is a short reminder post about how to use or implement outer join in XMLTABLE or XQuery.

The typical scenario for which we would need to outer join is the flattening of a master-detail hierarchical structure with some logical records missing in the detail part.
For instance, given this input XML document :

<root>
  <person id="1">
    <name>John</name>
    <details>
      <phone type="mobile">111-11-2222-22</phone>
      <phone type="home">555-11-2222-22</phone>
    </details>
  </person>
  <person id="2">
    <name>Jane</name>
    <details>
      <phone type="mobile">111-11-2222-33</phone>
    </details>
  </person>
  <person id="3">
    <name>Sam</name>
    <details/>
  </person>
</root>

We’d like to get the following result set :

 PERSON_ID PERSON_NAME PHONE_TYPE PHONE_NUMBER
---------- ----------- ---------- ---------------
         1 John        mobile     111-11-2222-22
         1 John        home       555-11-2222-22
         2 Jane        mobile     111-11-2222-33
         3 Sam                    

 

1. Using chained XMLTABLEs

Flattening the above structure is very straightforward with chained XMLTABLEs.
Each XMLTABLE block is dedicated to the extraction of one level of records, and passes the collection of nested children to the next one to be extracted in their turn.
However, if one link of the chain is missing then the whole branch is discarded, just like a regular SQL inner join, e.g.

SQL> select x1.person_id, x1.person_name, x2.phone_type, x2.phone_number
  2  from tmp_xml t
  3     , xmltable(
  4         '/root/person'
  5         passing t.object_value
  6         columns person_id   number       path '@id'
  7               , person_name varchar2(10) path 'name'
  8               , phones      xmltype      path 'details/phone'
  9       ) x1
 10     , xmltable(
 11         '/phone'
 12         passing x1.phones
 13         columns phone_type   varchar2(10) path '@type'
 14               , phone_number varchar2(15) path 'text()'
 15       ) x2
 16  ;
 
 PERSON_ID PERSON_NAME PHONE_TYPE PHONE_NUMBER
---------- ----------- ---------- ---------------
         1 John        mobile     111-11-2222-22
         1 John        home       555-11-2222-22
         2 Jane        mobile     111-11-2222-33
 

Sam doesn’t have any phone number, therefore the corresponding record is not present in the output.

The solution is to outer join X2 to X1, and that can be done really simply by adding the (+) operator on the second XMLTABLE :

SQL> select x1.person_id, x1.person_name, x2.phone_type, x2.phone_number
  2  from tmp_xml t
  3     , xmltable(
  4         '/root/person'
  5         passing t.object_value
  6         columns person_id   number       path '@id'
  7               , person_name varchar2(10) path 'name'
  8               , phones      xmltype      path 'details/phone'
  9       ) x1
 10     , xmltable(
 11         '/phone'
 12         passing x1.phones
 13         columns phone_type   varchar2(10) path '@type'
 14               , phone_number varchar2(15) path 'text()'
 15       ) (+) x2
 16  ;
 
 PERSON_ID PERSON_NAME PHONE_TYPE PHONE_NUMBER
---------- ----------- ---------- ---------------
         1 John        mobile     111-11-2222-22
         1 John        home       555-11-2222-22
         2 Jane        mobile     111-11-2222-33
         3 Sam                                   
 

Now Sam gets is row back, with NULLs for the phone-related columns.

It’s also possible to define the outer join using the ANSI syntax :

select ...
from tmp_xml t
     join xmltable( ... ) x1 on 1 = 1
     left outer join xmltable( ... ) x2 on 1 = 1
;

The actual join predicate is handled by the PASSING clause, however a dummy “1 = 1” predicate is required here in order to comply with the syntax.
On 12c, we may also use an OUTER APPLY join, which provides a nice alternate syntax :

select ...
from tmp_xml t
     cross apply xmltable( ... ) x1
     outer apply xmltable( ... ) x2
;

 

2. Using an XQuery FLWOR expression

Flattening the master-detail structure can also be achieved with a single XQuery expression :

SQL> select x.person_id, x.person_name, x.phone_type, x.phone_number
  2  from tmp_xml t
  3     , xmltable(
  4         'for $i in /root/person
  5            , $j in $i/details/phone
  6          return element r {
  7            $i/@id
  8          , $i/name
  9          , element phone_type { data($j/@type) }
 10          , element phone_num  { $j/text() }
 11          }'
 12         passing t.object_value
 13         columns person_id    number       path '@id'
 14               , person_name  varchar2(10) path 'name'
 15               , phone_type   varchar2(10) path 'phone_type'
 16               , phone_number varchar2(15) path 'phone_num'
 17       ) x
 18  ;
 
 PERSON_ID PERSON_NAME PHONE_TYPE PHONE_NUMBER
---------- ----------- ---------- ---------------
         1 John        mobile     111-11-2222-22
         1 John        home       555-11-2222-22
         2 Jane        mobile     111-11-2222-33
 

Like the first chaining example above, Sam’s record is missing too.
XQuery supports correlated (nested) sequence iterators but only in an “inner-join” fashion.
So, implementing outer join like required here is left to us, end-users.

I’ll present below two approaches.

a-
Here the query first gets the sequence of phone elements into a separate variable then, if the sequence is not empty, iterates through it and outputs items made of both person and phone data; else just returns person data :

SQL> select x.person_id, x.person_name, x.phone_type, x.phone_number
  2  from tmp_xml t
  3     , xmltable(
  4         'for $i in /root/person
  5          let $phones := $i/details/phone
  6          return
  7            if ($phones) then
  8              for $j in $phones
  9              return element r {
 10                $i/@id
 11              , $i/name
 12              , element phone_type { data($j/@type) }
 13              , element phone_num  { $j/text() }
 14              }
 15            else
 16              element r {
 17                $i/@id
 18              , $i/name
 19              }'
 20         passing t.object_value
 21         columns person_id    number       path '@id'
 22               , person_name  varchar2(10) path 'name'
 23               , phone_type   varchar2(10) path 'phone_type'
 24               , phone_number varchar2(15) path 'phone_num'
 25       ) x
 26  ;
 
 PERSON_ID PERSON_NAME PHONE_TYPE PHONE_NUMBER
---------- ----------- ---------- ---------------
         1 John        mobile     111-11-2222-22
         1 John        home       555-11-2222-22
         2 Jane        mobile     111-11-2222-33
         3 Sam                    
 

b-
Similar approach but instead of building different items depending on the children cardinality, a dummy empty element is generated in case a person has no phone data :

SQL> select x.person_id, x.person_name, x.phone_type, x.phone_number
  2  from tmp_xml t
  3     , xmltable(
  4         'for $i in /root/person
  5          let $phones := if ($i/details/phone) then $i/details/phone else <empty/>
  6          return
  7            for $j in $phones
  8            return element r {
  9              $i/@id
 10            , $i/name
 11            , element phone_type { data($j/@type) }
 12            , element phone_num  { $j/text() }
 13            }'
 14         passing t.object_value
 15         columns person_id    number       path '@id'
 16               , person_name  varchar2(10) path 'name'
 17               , phone_type   varchar2(10) path 'phone_type'
 18               , phone_number varchar2(15) path 'phone_num'
 19       ) x
 20  ;
 
 PERSON_ID PERSON_NAME PHONE_TYPE PHONE_NUMBER
---------- ----------- ---------- ---------------
         1 John        mobile     111-11-2222-22
         1 John        home       555-11-2222-22
         2 Jane        mobile     111-11-2222-33
         3 Sam                    
 

 

3. Performance considerations

Let’s now compare how those methods perform on a larger XML document.

truncate table tmp_xml;

insert into tmp_xml
select xmlelement("root"
       , xmlagg(
           xmlelement("item"
           , xmlattributes(level as "id")
           , xmlelement("name", 'item'||to_char(level, 'fm09999'))
           , xmlelement("subitems"
             , case when mod(level,5) != 0 then
               (            
                 select xmlagg(
                          xmlelement("subitem", xmlattributes(level as "id"), 'subitem'||to_char(level, 'fm099'))
                        )
                 from dual
                 connect by level <= 100
               )
               end
             )
           )
         )
       )
from dual
connect by level <= 10000 ;

The generated document possesses 10,000 logical records (item element), each having a nested collection of 100 children (subitem element) except for every five item where the collection is left empty :

<root>
  <item id="1">
    <name>item00001</name>
    <subitems>
      <subitem id="1">subitem001</subitem>
      <subitem id="2">subitem002</subitem>
      <subitem id="3">subitem003</subitem>
      <subitem id="4">subitem004</subitem>
      <subitem id="5">subitem005</subitem>
      <subitem id="6">subitem006</subitem>
      
      ...
      
    </subitems>
  </item>
  <item id="10000">
    ...
  </item>
</root>

The test consists in a simple execution of the query as a CTAS. The generated table is dropped between each run :

SQL> set timing on
SQL>
SQL> create table tmp1 as
  2  select x1.item_id, x1.item_name, x2.subitem_id, x2.subitem_name
  3  from tmp_xml t
  4     , xmltable(
  5         '/root/item'
  6         passing t.object_value
  7         columns item_id   number       path '@id'
  8               , item_name varchar2(30) path 'name'
  9               , subitems  xmltype      path 'subitems/subitem'
 10       ) x1
 11     , xmltable(
 12         '/subitem'
 13         passing x1.subitems
 14         columns subitem_id   number       path '@id'
 15               , subitem_name varchar2(30) path '.'
 16       ) (+) x2
 17  ;

Table created.

Elapsed: 00:00:06.81
SQL> select count(*) from tmp1;

  COUNT(*)
----------
    802000

SQL> create table tmp1 as
  2  select x.item_id, x.item_name, x.subitem_id, x.subitem_name
  3  from tmp_xml t
  4     , xmltable(
  5         'for $i in /root/item
  6          let $subitems := $i/subitems/subitem
  7          return
  8            if ($subitems) then
  9              for $j in $subitems
 10              return element r {
 11                $i/@id
 12              , $i/name
 13              , element sub_id {data($j/@id)}
 14              , element sub_name {$j/text()}
 15              }
 16            else element r {
 17                   $i/@id
 18                 , $i/name
 19                 }'
 20         passing t.object_value
 21         columns item_id      number       path '@id'
 22               , item_name    varchar2(30) path 'name'
 23               , subitem_id   number       path 'sub_id'
 24               , subitem_name varchar2(30) path 'sub_name'
 25       ) x
 26  ;

Table created.

Elapsed: 00:42:20.77

SQL> create table tmp1 as
  2  select x.item_id, x.item_name, x.subitem_id, x.subitem_name
  3  from tmp_xml t
  4     , xmltable(
  5         'for $i in /root/item
  6          let $subitems := if ($i/subitems/subitem) then $i/subitems/subitem else <empty/>
  7          return
  8            for $j in $subitems
  9            return element r {
 10              $i/@id
 11            , $i/name
 12            , element sub_id {data($j/@id)}
 13            , element sub_name {$j/text()}
 14            }'
 15         passing t.object_value
 16         columns item_id      number       path '@id'
 17               , item_name    varchar2(30) path 'name'
 18               , subitem_id   number       path 'sub_id'
 19               , subitem_name varchar2(30) path 'sub_name'
 20       ) x
 21  ;

Table created.

Elapsed: 01:08:54.62

Last two again with NO_XML_QUERY_REWRITE hint :

SQL> create table tmp1 as
  2  select /*+ no_xml_query_rewrite */
  3         x.item_id, x.item_name, x.subitem_id, x.subitem_name
  4  from tmp_xml t
  5     , xmltable(
  6         'for $i in /root/item
  7          let $subitems := $i/subitems/subitem
  8          return
  9            if ($subitems) then
 10              for $j in $subitems
 11              return element r {
 12                $i/@id
 13              , $i/name
 14              , element sub_id {data($j/@id)}
 15              , element sub_name {$j/text()}
 16              }
 17            else element r {
 18                   $i/@id
 19                 , $i/name
 20                 }'
 21         passing t.object_value
 22         columns item_id      number       path '@id'
 23               , item_name    varchar2(30) path 'name'
 24               , subitem_id   number       path 'sub_id'
 25               , subitem_name varchar2(30) path 'sub_name'
 26       ) x
 27  ;
create table tmp1 as
*
ERROR at line 1:
ORA-00039: error during periodic action
ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT


Elapsed: 00:02:16.86
SQL> create table tmp1 as
  2  select /*+ no_xml_query_rewrite */
  3         x.item_id, x.item_name, x.subitem_id, x.subitem_name
  4  from tmp_xml t
  5     , xmltable(
  6         'for $i in /root/item
  7          let $subitems := if ($i/subitems/subitem) then $i/subitems/subitem else <empty/>
  8          return
  9            for $j in $subitems
 10            return element r {
 11              $i/@id
 12            , $i/name
 13            , element sub_id {data($j/@id)}
 14            , element sub_name {$j/text()}
 15            }'
 16         passing t.object_value
 17         columns item_id      number       path '@id'
 18               , item_name    varchar2(30) path 'name'
 19               , subitem_id   number       path 'sub_id'
 20               , subitem_name varchar2(30) path 'sub_name'
 21       ) x
 22  ;

Table created.

Elapsed: 00:03:09.05

Summary :

Method Overall time (seconds) Comment
Chained XMLTABLEs 6.8
XQuery a 2540
XQuery a (+ NO_XML_QUERY_REWRITE) n/a Out of PGA on my system (2GB)
XQuery b 4134
XQuery b (+ NO_XML_QUERY_REWRITE) 189

(on my Win7 x64 laptop)

 

Advertisements
Categories: HowTo, XML DB, XQuery
  1. paulzip
    June 7, 2016 at 18:46

    I came across a bug in 11g regarding ANSI and XMLTable joins a while ago, when helping someone solve a problem. It seemed if the join chains involved too many XMLTables I was getting nulls back – even with no rewrite, Very weird and I couldn’t put my finger on the precise point it happened. SQL 89 joins didn’t have the same problem.

    Paulzip

  2. paulzip
    November 18, 2016 at 02:24

    I came across an interesting question the other day which effectively was the reverse of the flattening of master / detail / sub detail data into nested XML, with grouping done where possible. I wondered what would be the best way to achieve it? I’m sure you have some ideas.

    For example :

    Turning :

    with xml_example(region, service, affinity, client, rating, subscribers) as (
    select ‘East’, ‘TV’, ‘Primary’, ‘Legacy’, 4, 7 from dual union all
    select ‘West’, ‘Phone’, ‘Primary’, ‘New’, 4, 7 from dual union all
    select ‘West’, ‘Phone’, ‘Secondary’, ‘New’, 4, 7 from dual union all
    select ‘West’, ‘TV’, ‘Primary’, ‘New’, 1, 5 from dual
    )
    select * from xml_example

    Into :

    East
    TV
    Primary
    Legacy
    4
    7

    West
    TV
    Primary
    New
    1
    5

    Phone
    Primary
    New
    4
    7

    Secondary
    New
    4
    7

    You can see grouping has occurred at region, service, but ideally it could extend to the other levels if need be.

    My approach was to create nested XMLAgg using select distinct and joins, but is that the best approach? I’m not sure it is.

    I also thought of maybe generating ungrouped XML and grouping using XQuery.

    I also considered XSLT Muenchian Method.

    What do you think?

    • November 18, 2016 at 13:14

      I can’t access the pastebin content at the moment (company restriction) but I see what you mean.
      Another approach is to use nested GROUP-BYs.
      Here are some threads I’ve just digged up from my memory and OTN archives :
      https://community.oracle.com/thread/1558304
      https://community.oracle.com/thread/2420901

      Solutions involving XQuery 1.0 and XSLT 1.0 would be just for fun IMO, but interesting exercises nevertheless.
      Grouping is very tedious in those versions (not sure whether XSLT 2.0 finally made its way to (PL)SQL in Oracle 12.2).
      Good idea for a future blog post ;)

      • paulzip
        November 18, 2016 at 14:12

        I knew you’d have a more elegant approach! Yes, I think nested GROUP-BYs.on a single table access is definitely the best approach.

        I think later versions of XQuery (v3.0?) have group by for FLWOR, which seem to make things a lot easier, it’s a shame Oracle doesn’t support it yet.

        Also XSLT v2.0+ has for-each-group which makes it a doddle.

        Yes, a great idea for a future blog post. XML grouping crops up quite a lot in online questions, yet articles are quite sparse.

        Cheers Marc!

        Paul

  3. paulzip
    November 18, 2016 at 02:28

    My content of the prior post got mangled. I posted the with and XML here : http://pastebin.com/1PbE4pxS

  4. January 29, 2017 at 18:03

    Thanks a lot, you are a life saver :) … I was going through too many webpages to handle multi node xml file.

    Regards
    Jaganna

  5. Anonymous
    February 15, 2017 at 17:10

    Really useful!
    Thank you very much!

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: