Home > HowTo, XML DB, XQuery, XSLT > How To : Flatten out a Recursive XML Document

How To : Flatten out a Recursive XML Document

January 17, 2017 Leave a comment Go to comments

I blogged a few times about XML flattening in the past :

Those posts were about converting any XML structure and data into a schema-less model called “Edge”.
What I’ll be presenting here applies to XML documents having a recursive structure,
e.g.

<emp id="1">
  <emp id="2"/>
  <emp id="3">
    <emp id="4"/>
    <emp id="5"/>
  </emp>
</emp>

which could be converted to an adjency list model :

ID  PARENT_ID
--- ---------
  1
  2         1
  3         1
  4         3
  5         3

or, denormalized into level-based columns, like this :

ID_1 ID_2 ID_3
---- ---- ----
   1    2
   1    3    4
   1    3    5

 

Preliminary setup

As usual, I’ll store my sample XML document in an XMLType table (TMP_XML).
For this example, it is generated on-the-fly using XMLNest aggregate function :

insert into tmp_xml
select xmlelement("Employees"
       , xmlnest(
           xmlnestitem(
             level
           , xmlelement("Employee"
             , xmlattributes(e.empno as "id")
             , xmlelement("Name", e.ename)
             )
           , 'Employees'
           )
         )
       )
from scott.emp e
start with e.mgr is null
connect by prior e.empno = e.mgr;

select xmlserialize(document object_value indent) from tmp_xml;

<Employees>
  <Employee id="7839">
    <Name>KING</Name>
    <Employees>
      <Employee id="7566">
        <Name>JONES</Name>
        <Employees>
          <Employee id="7788">
            <Name>SCOTT</Name>
            <Employees>
              <Employee id="7876">
                <Name>ADAMS</Name>
              </Employee>
            </Employees>
          </Employee>
          <Employee id="7902">
            <Name>FORD</Name>
            <Employees>
              <Employee id="7369">
                <Name>SMITH</Name>
              </Employee>
            </Employees>
          </Employee>
        </Employees>
      </Employee>
      <Employee id="7698">
        <Name>BLAKE</Name>
        <Employees>
          <Employee id="7499">
            <Name>ALLEN</Name>
          </Employee>
          <Employee id="7521">
            <Name>WARD</Name>
          </Employee>
          <Employee id="7654">
            <Name>MARTIN</Name>
          </Employee>
          <Employee id="7844">
            <Name>TURNER</Name>
          </Employee>
          <Employee id="7900">
            <Name>JAMES</Name>
          </Employee>
        </Employees>
      </Employee>
      <Employee id="7782">
        <Name>CLARK</Name>
        <Employees>
          <Employee id="7934">
            <Name>MILLER</Name>
          </Employee>
        </Employees>
      </Employee>
    </Employees>
  </Employee>
</Employees>

 

1. Converting to Adjency List

Using a recursive XQuery function :

SQL> select x.*
  2  from tmp_xml t
  3     , xmltable(
  4         'declare function local:getRows (
  5            $pid as xs:decimal?
  6          , $children as element(Employees)?
  7          )
  8          as element(r)*
  9          {
 10            for $e in $children/Employee
 11            let $id := xs:decimal($e/@id)
 12            return element r {
 13              element id {$id}
 14            , element ename {data($e/Name)}
 15            , element pid {$pid}
 16            } | local:getRows($id, $e/Employees)
 17          }; (: :)
 18          local:getRows((), /Employees)'
 19         passing t.object_value
 20         columns empno  number       path 'id'
 21               , ename  varchar2(10) path 'ename'
 22               , mgr    number       path 'pid'
 23       ) x
 24  ;
 
     EMPNO ENAME             MGR
---------- ---------- ----------
      7934 MILLER           7782
      7900 JAMES            7698
      7782 CLARK            7839
      7654 MARTIN           7698
      7844 TURNER           7698
      7499 ALLEN            7698
      7521 WARD             7698
      7369 SMITH            7902
      7698 BLAKE            7839
      7876 ADAMS            7788
      7902 FORD             7566
      7839 KING       
      7566 JONES            7839
      7788 SCOTT            7566

 

Quite similarly, we can also use XSLT to handle the recursive processing part.
Here’s the implementation :

SQL> select x.*
  2  from tmp_xml t
  3     , xmltable('/r'
  4         passing xmltransform(t.object_value,
  5          '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  6            <xsl:template match="Employee">
  7              <xsl:param name="pid"/>
  8              <r>
  9                <id><xsl:value-of select="@id"/></id>
 10                <ename><xsl:value-of select="Name"/></ename>
 11                <pid><xsl:value-of select="$pid"/></pid>
 12              </r>
 13              <xsl:apply-templates select="Employees/Employee">
 14                <xsl:with-param name="pid" select="@id"/>
 15              </xsl:apply-templates>
 16            </xsl:template>
 17          </xsl:stylesheet>')
 18         columns empno  number       path 'id'
 19               , ename  varchar2(10) path 'ename'
 20               , mgr    number       path 'pid'
 21       ) x
 22  ;
 
     EMPNO ENAME             MGR
---------- ---------- ----------
      7839 KING       
      7566 JONES            7839
      7788 SCOTT            7566
      7876 ADAMS            7788
      7902 FORD             7566
      7369 SMITH            7902
      7698 BLAKE            7839
      7499 ALLEN            7698
      7521 WARD             7698
      7654 MARTIN           7698
      7844 TURNER           7698
      7900 JAMES            7698
      7782 CLARK            7839
      7934 MILLER           7782
 

We’ll see how the XSLT approach compares to XQuery on a larger document in the last section.

 

2. Pivoting into level-based columns

In this approach, the output only consists of distinct paths (branches) of the hierarchy, with a group of columns for each level we want to show.
The recursive part can handle any number of levels, but as always the number of levels to consider in the output must be known at parse time :

SQL> select x.*
  2  from tmp_xml t
  3     , xmltable(
  4         'declare function local:getRows (
  5            $level as xs:integer
  6          , $parent as element()*
  7          , $children as element(Employees)
  8          )
  9          as element(r)*
 10          {
 11            for $e in $children/Employee
 12            let $child := element {concat("L",$level)} { $e/@id, $e/Name }
 13            let $nt := $e/Employees
 14            return
 15              if ($nt) then local:getRows($level + 1, $parent | $child, $nt)
 16                       else element r { $parent | $child }
 17          }; (: :)
 18          local:getRows(1, (), /Employees)'
 19         passing t.object_value
 20         columns empno_1 number       path 'L1/@id'
 21               , ename_1 varchar2(10) path 'L1/Name'
 22               , empno_2 number       path 'L2/@id'
 23               , ename_2 varchar2(10) path 'L2/Name'
 24               , empno_3 number       path 'L3/@id'
 25               , ename_3 varchar2(10) path 'L3/Name'
 26               , empno_4 number       path 'L4/@id'
 27               , ename_4 varchar2(10) path 'L4/Name'
 28       ) x
 29  ;
 
   EMPNO_1 ENAME_1       EMPNO_2 ENAME_2       EMPNO_3 ENAME_3       EMPNO_4 ENAME_4
---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
      7839 KING             7566 JONES            7788 SCOTT            7876 ADAMS
      7839 KING             7566 JONES            7902 FORD             7369 SMITH
      7839 KING             7698 BLAKE            7499 ALLEN                 
      7839 KING             7698 BLAKE            7521 WARD                  
      7839 KING             7698 BLAKE            7654 MARTIN                
      7839 KING             7698 BLAKE            7844 TURNER                
      7839 KING             7698 BLAKE            7900 JAMES                 
      7839 KING             7782 CLARK            7934 MILLER                
 

 

3. XQuery vs. XSLT

Another example involving a larger input XML file (16MB), containing my C: drive structure.
I generated the document using the ALL_PATHS table from this post.
Here’s an excerpt of it :

<fs>
  <f id="1" name="c:">
    <f id="2" name=".rnd"/>
    <f id="3" name="0edb31f63f0be03c5ebc02a918933d">
      <f id="81" name="mpasbase.vdm._p"/>
      <f id="82" name="mpavbase.vdm._p"/>
      <f id="83" name="mpengine.dll._p"/>
      <f id="84" name="MPSigStub.exe"/>
    </f>
    <f id="4" name="2aa3728daf04642614b1ca"/>
    <f id="5" name="DAAA">
      ...
    </f>
  </f>
</fs>

 
a – Flattening using XSLT :

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> set autotrace traceonly statistics
SQL> set pages 100
SQL> set timing on
SQL>
SQL> select x.*
  2  from tmp_xml t
  3     , xmltable('/r'
  4         passing xmltransform(t.object_value,
  5          '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  6            <xsl:template match="f">
  7              <xsl:param name="pid"/>
  8              <r>
  9                <id><xsl:value-of select="@id"/></id>
 10                <name><xsl:value-of select="@name"/></name>
 11                <pid><xsl:value-of select="$pid"/></pid>
 12              </r>
 13              <xsl:apply-templates select="f">
 14                <xsl:with-param name="pid" select="@id"/>
 15              </xsl:apply-templates>
 16            </xsl:template>
 17          </xsl:stylesheet>')
 18         columns file_id    number        path 'id'
 19               , file_name  varchar2(256) path 'name'
 20               , parent_id  number        path 'pid'
 21       ) x
 22  ;

284895 rows selected.

Elapsed: 00:00:32.92

Statistics
----------------------------------------------------------
         65  recursive calls
     503400  db block gets
      96742  consistent gets
          0  physical reads
        140  redo size
   16561382  bytes sent via SQL*Net to client
     209464  bytes received via SQL*Net from client
      18994  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     284895  rows processed

 
b – Using recursive XQuery

SQL> select x.*
  2  from tmp_xml t
  3     , xmltable(
  4         'declare function local:getRows (
  5            $pid as xs:decimal?
  6          , $children as element(f)*
  7          )
  8          as element(r)*
  9          {
 10            for $i in $children
 11            let $id := xs:decimal($i/@id)
 12            return element r {
 13              element id {$id}
 14            , element name {data($i/@name)}
 15            , element pid {$pid}
 16            } | local:getRows($id, $i/f)
 17          }; (: :)
 18          local:getRows((), /fs/f)'
 19         passing t.object_value
 20         columns file_id    number        path 'id'
 21               , file_name  varchar2(256) path 'name'
 22               , parent_id  number        path 'pid'
 23       ) x
 24  ;

284895 rows selected.

Elapsed: 00:00:05.80

Statistics
----------------------------------------------------------
          3  recursive calls
          0  db block gets
       5408  consistent gets
          0  physical reads
          0  redo size
   16605731  bytes sent via SQL*Net to client
     209464  bytes received via SQL*Net from client
      18994  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     284895  rows processed

The stats speak for themselves. The recursive XQuery is much more efficient.

 

Advertisements
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: