Home > HowTo, XQuery, XSLT > How To : Flatten out an XML Hierarchical Structure

How To : Flatten out an XML Hierarchical Structure

December 13, 2011 Leave a comment Go to comments

Besides the usual requirement to present XML data in a relational way, sometimes we may also need to flatten the whole XML structure and build parent-child relationships from it.
For instance, converting this :

<root>
 <item>ABC<item>
 <item/>
 <item>
   <subitem>XYZ</subitem>
 </item>
<root>

to

    TAG_ID TAG_NAME        TAG_VALUE          PARENT_TAG_ID
---------- --------------- ---------------- ---------------
         1 root
         2 item            ABC                            1
         3 item                                           1
         4 item                                           1
         5 subitem         XYZ                            4

I’m presenting here three techniques to achieve that, each involving a recursive approach with XQuery, XSLT or SQL.
I’ll be using the following XML document (stored in a binary XMLType table) throughout the demo :

<?xml version="1.0" encoding="UTF-8"?>
<Product Type="Laptop">
 <Notebook Brand="HP" Model="Pavilion dv6-3132TX Notebook">
  <Harddisk>640 GB</Harddisk>
  <Processor>Intel Core i7</Processor>
  <RAM>4 GB</RAM>
 </Notebook>
 <Notebook Brand="HP" Model="HP Pavilion dv6-3032TX Notebook">
  <Harddisk>640 GB</Harddisk>
  <Processor>Intel Core i7</Processor>
  <RAM>6 GB</RAM>
 </Notebook>
 <Notebook Brand="Toshiba" Model="Satellite A660/07R 3D Notebook">
  <Harddisk>640 GB</Harddisk>
  <Processor>Intel Core i7</Processor>
  <RAM>4 GB</RAM>
 </Notebook>
 <Notebook Brand="Toshiba" Model="Satellite A660/15J Notebook">
  <Harddisk>640 GB</Harddisk>
  <Processor>Intel Core i5</Processor>
  <RAM>6 GB</RAM>
 </Notebook>
</Product>

Although it’s not technically correct, in the following examples, attributes will be considered as children of the element they belong to.

 

1. The XQuery way – part 1 : using a recursive function

SQL> SELECT x.*
  2  FROM xml_test t
  3     , XMLTable(
  4       'declare function local:getChildren($e as node(), $pID as xs:string?) as element()*
  5        {
  6          for $i at $p in $e/(child::*|attribute::*)
  7          let $ID := if ($pID) then concat($pID,".",$p) else "1"
  8          return element r
  9          {
 10            element node_id {$ID}
 11          , element parent_node_id {$pID}
 12          , element node_name {name($i)}
 13          , if ($i instance of attribute())
 14              then ( element node_value {data($i)}, element node_kind {"attribute"} )
 15              else ( element node_value {$i/text()}, element node_kind {"element"} )
 16          }
 17          | local:getChildren($i,$ID)
 18        }; (: :)
 19        local:getChildren($d,())'
 20        passing t.object_value as "d"
 21        columns node_id         varchar2(100)   path 'node_id'
 22              , node_name       varchar2(30)    path 'node_name'
 23              , node_value      varchar2(2000)  path 'node_value'
 24              , parent_node_id  varchar2(100)   path 'parent_node_id'
 25              , node_kind       varchar2(30)    path 'node_kind'
 26       ) x
 27  ;
 
NODE_ID    NODE_NAME       NODE_VALUE                               PARENT_NODE_ID  NODE_KIND
---------- --------------- ---------------------------------------- --------------- ----------
1          Product                                                                  element
1.1        Type            Laptop                                   1               attribute
1.2        Notebook                                                 1               element
1.2.1      Brand           HP                                       1.2             attribute
1.2.2      Model           Pavilion dv6-3132TX Notebook             1.2             attribute
1.2.3      Harddisk        640 GB                                   1.2             element
1.2.4      Processor       Intel Core i7                            1.2             element
1.2.5      RAM             4 GB                                     1.2             element
1.3        Notebook                                                 1               element
1.3.1      Brand           HP                                       1.3             attribute
1.3.2      Model           HP Pavilion dv6-3032TX Notebook          1.3             attribute
1.3.3      Harddisk        640 GB                                   1.3             element
1.3.4      Processor       Intel Core i7                            1.3             element
1.3.5      RAM             6 GB                                     1.3             element
1.4        Notebook                                                 1               element
1.4.1      Brand           Toshiba                                  1.4             attribute
1.4.2      Model           Satellite A660/07R 3D Notebook           1.4             attribute
1.4.3      Harddisk        640 GB                                   1.4             element
1.4.4      Processor       Intel Core i7                            1.4             element
1.4.5      RAM             4 GB                                     1.4             element
1.5        Notebook                                                 1               element
1.5.1      Brand           Toshiba                                  1.5             attribute
1.5.2      Model           Satellite A660/15J Notebook              1.5             attribute
1.5.3      Harddisk        640 GB                                   1.5             element
1.5.4      Processor       Intel Core i5                            1.5             element
1.5.5      RAM             6 GB                                     1.5             element
 
26 rows selected
 

 

2. The XQuery way – part 2 : using SQL recursive subquery factoring

Here, while the “extract” part is still handled by XQuery, the recursive part has been outsourced to a recursive subquery factoring clause (11.2 feature).

SQL> WITH xml_hierarchy (node_id, node_name, node_value, parent_node_id, node_kind, xml_data) as (
  2    SELECT '1', x.node_name, x.node_value, null, 'element', xt.object_value
  3    FROM xml_test xt,
  4         XMLTable('/*' passing xt.object_value
  5           columns node_name  varchar2(30)   path 'name()'
  6                 , node_value varchar2(2000) path 'text()'
  7                 , root       xmltype        path '.'
  8         ) x
  9    UNION ALL
 10    SELECT t.node_id || '.' || x.node_id
 11         , x.node_name
 12         , x.node_value
 13         , t.node_id
 14         , x.node_kind
 15         , x.xml_data
 16    FROM xml_hierarchy t
 17       , XMLTable(
 18         'for $i in /*/(attribute::*|child::*)
 19          return element r {
 20            typeswitch ($i)
 21              case attribute() return ( attribute kind {"attribute"}, element {name($i)}{data($i)} )
 22              case element()   return $i
 23              default          return ()
 24          }'
 25          passing t.xml_data
 26            columns node_id    for ordinality
 27                  , node_name  varchar2(30)   path 'name(*)'
 28                  , node_value varchar2(2000) path '*/text()'
 29                  , node_kind  varchar2(30)   path '@kind' default 'element'
 30                  , xml_data   xmltype        path '*'
 31         ) x
 32  )
 33  SEARCH DEPTH FIRST BY node_id SET node_order
 34  CYCLE node_id SET is_cycle TO '1' DEFAULT '0'
 35  SELECT node_id
 36       , node_name
 37       , node_value
 38       , parent_node_id
 39       , node_kind
 40  FROM xml_hierarchy
 41  ;
 
NODE_ID    NODE_NAME       NODE_VALUE                               PARENT_NODE_ID  NODE_KIND
---------- --------------- ---------------------------------------- --------------- ----------
1          Product                                                                  element
1.1        Type            Laptop                                   1               attribute
1.2        Notebook                                                 1               element
1.2.1      Brand           HP                                       1.2             attribute
1.2.2      Model           Pavilion dv6-3132TX Notebook             1.2             attribute
1.2.3      Harddisk        640 GB                                   1.2             element
1.2.4      Processor       Intel Core i7                            1.2             element
1.2.5      RAM             4 GB                                     1.2             element
1.3        Notebook                                                 1               element
1.3.1      Brand           HP                                       1.3             attribute
1.3.2      Model           HP Pavilion dv6-3032TX Notebook          1.3             attribute
1.3.3      Harddisk        640 GB                                   1.3             element
1.3.4      Processor       Intel Core i7                            1.3             element
1.3.5      RAM             6 GB                                     1.3             element
1.4        Notebook                                                 1               element
1.4.1      Brand           Toshiba                                  1.4             attribute
1.4.2      Model           Satellite A660/07R 3D Notebook           1.4             attribute
1.4.3      Harddisk        640 GB                                   1.4             element
1.4.4      Processor       Intel Core i7                            1.4             element
1.4.5      RAM             4 GB                                     1.4             element
1.5        Notebook                                                 1               element
1.5.1      Brand           Toshiba                                  1.5             attribute
1.5.2      Model           Satellite A660/15J Notebook              1.5             attribute
1.5.3      Harddisk        640 GB                                   1.5             element
1.5.4      Processor       Intel Core i5                            1.5             element
1.5.5      RAM             6 GB                                     1.5             element
 
26 rows selected
 

 

3. The XSLT way… with a bit of XQuery

In this example, the recursive logic is implemented by the XSL stylesheet, which is in charge of transforming the input document into a simple ROWSET/ROW structure to be passed to the XMLTable function.
The result-set is almost the same as previous methods, the difference lies in the way we compute the NODE_ID. Here, we can use the XSLT function fn:generate-id() to generate a unique identifier for each node in the document. The generated string pattern is implementation-specific, for instance, Oracle seems to be using ‘K’ + ‘node position’ in document order, but other implementors may use a different approach.

SQL> variable xsldoc varchar2(4000)
SQL> begin
  2    :xsldoc :=
  3  '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  4     <xsl:output method="xml"/>
  5     <xsl:template match="/">
  6       <ROWSET>
  7         <xsl:apply-templates select="*"/>
  8       </ROWSET>
  9     </xsl:template>
 10     <xsl:template match="*">
 11       <xsl:param name="pid"/>
 12       <xsl:variable name="id" select="generate-id(.)"/>
 13       <ROW id="{$id}" pid="{$pid}" name="{name(.)}" kind="element">
 14         <xsl:value-of select="text()"/>
 15       </ROW>
 16       <xsl:apply-templates select="*|@*">
 17         <xsl:with-param name="pid">
 18           <xsl:value-of select="$id"/>
 19         </xsl:with-param>
 20       </xsl:apply-templates>
 21     </xsl:template>
 22     <xsl:template match="@*">
 23       <xsl:param name="pid"/>
 24       <ROW id="{generate-id(.)}" pid="{$pid}" name="{name(.)}" kind="attribute">
 25         <xsl:value-of select="."/>
 26       </ROW>
 27     </xsl:template>
 28   </xsl:stylesheet>';
 29  end;
 30  /
 
PL/SQL procedure successfully completed
 
SQL> SELECT x.*
  2  FROM xml_test t
  3     , XMLTable('/ROWSET/ROW'
  4        passing xmltransform(t.object_value, xmltype(:xsldoc))
  5        columns node_id         varchar2(100)   path '@id'
  6              , node_name       varchar2(30)    path '@name'
  7              , node_value      varchar2(2000)  path 'text()'
  8              , parent_node_id  varchar2(100)   path '@pid'
  9              , node_kind       varchar2(30)    path '@kind'
 10       ) x
 11  ;
 
NODE_ID    NODE_NAME       NODE_VALUE                               PARENT_NODE_ID  NODE_KIND
---------- --------------- ---------------------------------------- --------------- ----------
K1         Product                                                                  element
K2         Type            Laptop                                   K1              attribute
K3         Notebook                                                 K1              element
K4         Brand           HP                                       K3              attribute
K5         Model           Pavilion dv6-3132TX Notebook             K3              attribute
K6         Harddisk        640 GB                                   K3              element
K8         Processor       Intel Core i7                            K3              element
K10        RAM             4 GB                                     K3              element
K12        Notebook                                                 K1              element
K13        Brand           HP                                       K12             attribute
K14        Model           HP Pavilion dv6-3032TX Notebook          K12             attribute
K15        Harddisk        640 GB                                   K12             element
K17        Processor       Intel Core i7                            K12             element
K19        RAM             6 GB                                     K12             element
K21        Notebook                                                 K1              element
K22        Brand           Toshiba                                  K21             attribute
K23        Model           Satellite A660/07R 3D Notebook           K21             attribute
K24        Harddisk        640 GB                                   K21             element
K26        Processor       Intel Core i7                            K21             element
K28        RAM             4 GB                                     K21             element
K30        Notebook                                                 K1              element
K31        Brand           Toshiba                                  K30             attribute
K32        Model           Satellite A660/15J Notebook              K30             attribute
K33        Harddisk        640 GB                                   K30             element
K35        Processor       Intel Core i5                            K30             element
K37        RAM             6 GB                                     K30             element
 
26 rows selected
 

 

Update 2012-11-18 – please see a 4th method in this article :
XML Flattening revisited : Java-based pipelined function

 

Advertisements
  1. David J
    March 8, 2012 at 23:36

    This article is extremely useful for a project I am working on. It is such an elegant solution (3 solutions, actually) that worked just as written. Thank you so much.

  2. Ganesh
    June 21, 2012 at 02:39

    Hi

    Thank you, Exactly what I was looking for. Perfectly worked for few columns.

    Oracle Version: 10204
    RAC Instance: 4 Node

    In my case we have to fetch more 300 columns from the xmltype. Which runs very slow.

    In Some case it takes more than 150 Minutes to fetch the data.

    Would you please suggest a faster way. thanks.

    • June 22, 2012 at 12:32

      Hi Ganesh,

      Which method(s) did you try?
      And what do you mean by “300 columns”? The query (whichever the method) returns a fixed number of columns.
      Did you mean 300 rows?

      Is it possible you send me a sample XML and what the expected output should look like? I’ll see what I can do in your version :
      mb[dot]perso[at]wanadoo[dot]fr or post the necessary details in the OTN XML DB forum if you want.
      Thanks.

  3. Graham Turner
    January 29, 2013 at 17:04

    Hi

    Wonder if u can shed light on an ora error i’m getting in method 1

    ORA-19228 XPST0008 – undeclared identifier prefix “i” local-name
    (Mine is as your code local:getChildren)

    I think this to do with the local: namespace prefix but omiting this gives

    ORA-19193 XQS0045 : It is a static error if the function name in a function declaration is in one of the following namespaces http://www …….

    Any ideas

  4. Stefan
    April 29, 2014 at 15:55

    Thanks for this great article. Your first solution really helped me out to get a quick start at this topic.

    Best regards
    Stefan

  5. Shrikant Bhadane
    July 11, 2014 at 12:02

    Hi thanks for sharing the above solutions. I have used last way to extract the XML nodes. In my XML in root node I have xsi:schemaLocation=”urn:fsa-gov-uk:MER:Complaints-AllActivities:1 http://www.abc.com/MER/DRG/Complaints-AllActivities/v1/Complaints-AllActivities-Schema.xsd” So there is a space in this tag at AllActivities:1 http which causing an error as your XML db is invalid. How can I solve this error. If I remove the space between 1 http it is working fine, but this is wrong.
    Let me know your responce.

    • July 13, 2014 at 17:00

      Which method exactly? XSLT-based or the Java function?
      I don’t think it should make a difference to have a space or not. What error did you get? What’s the db version?

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s