Home > SQL, XML DB > Hierarchical XML Aggregation : the XMLNest function

Hierarchical XML Aggregation : the XMLNest function

I’ve been away a few months but here’s a new post :)

I’ll be presenting here another approach for generating a hierarchical XML structure, besides the three ones already explained in a previous post, which were, as a reminder :

  • DBMS_XMLGEN.newContextFromHierarchy()
  • XSLT
  • Recursive XQuery function

This new approach is based on a user-defined aggregate function – called XMLNest – and therefore requires a few additional objects to make it work.

 

1. How does it work?

The XMLNest function expects a hierarchically ordered dataset, typically one that a CONNECT-BY query can produce, with two mandatory columns :

  • A level indicator
  • The XML content to aggregate at that level

For instance :

select level
     , xmlelement("emp", xmlattributes(empno as "id"))
from scott.emp 
connect by prior empno = mgr
start with mgr is null ;

The implementation of the function relies on DOM manipulation (via DBMS_XMLDOM API) to assemble the final document by nesting input XML fragments at their respective depths.

In the aggregation context, we cannot directly pass around DBMS_XMLDOM handles (such as DOMDocument or DOMNode) because they are defined as a PL/SQL record structure :

TYPE DOMDocument IS RECORD (id RAW(13));

However, we can easily use its only attribute of RAW(13) datatype, and rebuild the record as needed by DBMS_XMLDOM routines.

 

2. Implementation

A user-defined aggregate function can only take one input value, but since we need two (level + XML fragment), we will therefore create an object type with those attributes, and feed the function with it :

create or replace type XMLNestItem as object (
  node_level   integer
, xml_content  xmltype
, wrapper      varchar2(4000)
);

Also note the wrapper attribute. When not NULL, a wrapper (container) element of that name will be generated to enclose children of the parent node.

In order to keep track of the ancestors of the current tree branch during aggregation, we’ll also defined a node stack, in the form of a collection of RAW(13) :

create or replace type XMLNestItemStack is table of raw(13);

 

The XMLNest() function is supported by a user-defined object that implements the following mandatory methods :

create or replace type XMLNestImpl as object (
  current_level    integer
, current_node_id  raw(13)
, parent_node_id   raw(13)
, doc_id           raw(13)
, st               XMLNestItemStack
, member function makeNode (self in out XMLNestImpl, fragment in xmltype) return raw
, static function ODCIAggregateInitialize (sctx in out XMLNestImpl) return number
, member function ODCIAggregateIterate (self in out XMLNestImpl, item in XMLNestItem) return number
, member function ODCIAggregateTerminate (self in XMLNestImpl, returnValue out XMLType, flags in number) return number
, member function ODCIAggregateMerge (self in out XMLNestImpl, sctx2 in XMLNestImpl) return number
);

And here’s the actual body :

create or replace type body XMLNestImpl is

  member function makeNode (
    self     in out XMLNestImpl
  , fragment in     XMLType
  ) 
  return raw 
  is
    tmp_doc    dbms_xmldom.DOMDocument := dbms_xmldom.newDOMDocument(fragment);
    tmp_node   dbms_xmldom.DOMNode := dbms_xmldom.makeNode(dbms_xmldom.getDocumentElement(tmp_doc));
    target_doc dbms_xmldom.DOMDocument;
  begin
    target_doc.id := self.doc_id;
    tmp_node := dbms_xmldom.adoptnode(target_doc, tmp_node);
    dbms_xmldom.freeDocument(tmp_doc);
    return tmp_node.id;
  end;

  static function ODCIAggregateInitialize(
    sctx in out XMLNestImpl
  )
  return number 
  is
  begin
    sctx := XMLNestImpl(null, null, null, null, XMLNestItemStack());
    return ODCIConst.Success;
  end;

  member function ODCIAggregateIterate(
    self in out XMLNestImpl
  , item in     XMLNestItem
  ) 
  return number 
  is
    previous_level  integer;
    tmp_doc         dbms_xmldom.DOMDocument;
    parent_node     dbms_xmldom.DOMNode;
    tmp_node        dbms_xmldom.DOMNode;
    tmp_elem        dbms_xmldom.DOMElement;
  begin

    previous_level := self.current_level;
    self.current_level := item.node_level;

    if self.current_level = 1 then

      tmp_doc := dbms_xmldom.newDOMDocument(item.xml_content);
      self.current_node_id := dbms_xmldom.makeNode(dbms_xmldom.getDocumentElement(tmp_doc)).id;
      self.doc_id := tmp_doc.id;

    else

      if self.current_level > previous_level then
        
        if item.wrapper is not null then
          parent_node.id := self.current_node_id;
          tmp_doc.id := self.doc_id;
          tmp_elem := dbms_xmldom.createElement(tmp_doc, item.wrapper);
          self.parent_node_id := dbms_xmldom.appendChild(parent_node, dbms_xmldom.makeNode(tmp_elem)).id;
          dbms_xmldom.freeElement(tmp_elem);
        else
          self.parent_node_id := self.current_node_id;
        end if;
        
        self.st.extend();
        st(st.last) := self.parent_node_id;
        
      elsif self.current_level < previous_level then
        st.trim(previous_level - self.current_level);
        self.parent_node_id := st(st.last);
      end if;

      parent_node.id := self.parent_node_id;
      tmp_node.id := makeNode(item.xml_content);
      self.current_node_id := dbms_xmldom.appendChild(parent_node, tmp_node).id;

    end if;

    return ODCIConst.Success;
    
  end;

  member function ODCIAggregateTerminate (
    self        in  XMLNestImpl
  , returnValue out XMLType
  , flags       in  number
  )
  return number
  is
    tmp_doc  dbms_xmldom.DOMDocument;
  begin
    tmp_doc.id := self.doc_id;
    returnValue := dbms_xmldom.getxmltype(tmp_doc);
    dbms_xmldom.freeDocument(tmp_doc);
    return ODCIConst.Success;
  end;

  member function ODCIAggregateMerge(
    self  in out XMLNestImpl
  , sctx2 in     XMLNestImpl
  ) 
  return number 
  is
  begin
    return ODCIConst.Error;
  end;
  
end;

 

The makeNode() method is a helper that converts an input XMLType instance into a DOMNode of the current DOMDocument being built.

The ODCIAggregateInitialize() method initializes the aggregation context by building a new instance of our supporting object XMLNestImpl.

The ODCIAggregateIterate() method is where all the work is done. It takes as input an instance of a XMLNestItem object containing both XML content and target level and, depending on the latter, inserts the resulting node at the correct place :

  • If current_level = 1 (i.e. root), it initializes a new DOMDocument and stores its handle in the doc_id attribute, so that it can be shared across fetches.
  • If current_level > previous_level, that means we're going down one level deeper in the hierarchy. The current node (or the associated wrapper) becomes the new parent node and is pushed onto the ancestors stack.
  • If current_level < previous_level, that means we're going back at least one level up in the hierarchy. It pops that much node handles from the ancestors stack and restore the parent node so that we can start a new branch from it.

The ODCIAggregateTerminate() method builds the final XMLType return value and frees the memory associated with the DOMDocument.

The ODCIAggregateMerge() method, which is normally only called during parallel execution, does nothing but returning an error status because I've not yet found an easy way to parallelize this kind of aggregation.

 

Last step is creating the aggregate function itself :

create or replace function xmlnest (
  input in XMLNestItem
) 
return XMLType
aggregate using XMLNestImpl ;

 

3. Examples

No wrapper :

SQL> select xmlnest(
  2           xmlnestitem(
  3             level
  4           , xmlelement("emp", xmlattributes(empno as "id"))
  5           , null
  6           )
  7         ) as result
  8  from scott.emp
  9  connect by prior empno = mgr
 10  start with mgr is null ;
 
RESULT
------------------------------------------------------------
<emp id="7839">
  <emp id="7566">
    <emp id="7788">
      <emp id="7876"/>
    </emp>
    <emp id="7902">
      <emp id="7369"/>
    </emp>
  </emp>
  <emp id="7698">
    <emp id="7499"/>
    <emp id="7521"/>
    <emp id="7654"/>
    <emp id="7844"/>
    <emp id="7900"/>
  </emp>
  <emp id="7782">
    <emp id="7934"/>
  </emp>
</emp>
 

With wrappers :

SQL> select xmlnest(
  2           xmlnestitem(
  3             level
  4           , xmlelement("emp"
  5             , xmlattributes(empno as "id")
  6             , xmlelement("name", ename)
  7             )
  8           , 'subordinates'
  9           )
 10         ) as result
 11  from scott.emp
 12  connect by prior empno = mgr
 13  start with mgr is null ;
 
RESULT
------------------------------------------------------------
<emp id="7839">
  <name>KING</name>
  <subordinates>
    <emp id="7566">
      <name>JONES</name>
      <subordinates>
        <emp id="7788">
          <name>SCOTT</name>
          <subordinates>
            <emp id="7876">
              <name>ADAMS</name>
            </emp>
          </subordinates>
        </emp>
        <emp id="7902">
          <name>FORD</name>
          <subordinates>
            <emp id="7369">
              <name>SMITH</name>
            </emp>
          </subordinates>
        </emp>
      </subordinates>
    </emp>
    <emp id="7698">
      <name>BLAKE</name>
      <subordinates>
        <emp id="7499">
          <name>ALLEN</name>
        </emp>
        <emp id="7521">
          <name>WARD</name>
        </emp>
        <emp id="7654">
          <name>MARTIN</name>
        </emp>
        <emp id="7844">
          <name>TURNER</name>
        </emp>
        <emp id="7900">
          <name>JAMES</name>
        </emp>
      </subordinates>
    </emp>
    <emp id="7782">
      <name>CLARK</name>
      <subordinates>
        <emp id="7934">
          <name>MILLER</name>
        </emp>
      </subordinates>
    </emp>
  </subordinates>
</emp>
 

Performance considerations :

Although it runs well for small datasets, do not expect good performance out of this for large aggregation context.

Below is a "run time vs. rows" plot that nicely sums up the tests I've conducted on my machine (laptop PC, win7, Oracle 12.1.0.2).
We can observe an apparent quadratic growth :

xmlnest_plot01

 

Advertisements
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: