Archive

Archive for the ‘XML DB’ Category

Parsing sibling repeating XML groups : Part 2

June 3, 2018 Leave a comment

This post is an addendum to a previously published How-To presenting various techniques to break sibling repeating groups into relational data :

<root>
  <id>1</id>
  <name>SMITH</name>
  <id>2</id>
  <name>WILSON</name>
  <id>3</id>
  <name>GRANT</name>
</root>

to :

ID    NAME
----- ---------
    1 SMITH
    2 WILSON
    3 GRANT

The topic was raised again recently on the Oracle Community forum, but with a particularity I didn’t cover originally : what about optional nodes?
For instance, if id 2 has no name :

<root>
  <id>1</id>
  <name>SMITH</name>
  <id>2</id>
  <id>3</id>
  <name>GRANT</name>
</root>

The “following-sibling” trick will not return the required output :

SQL> select x.*
  2  from xmltable(
  3         'for $i in /root/id
  4          return element r {
  5            $i/following-sibling::name[1]
  6          , $i
  7          }'
  8         passing xmlparse(document
  9         '<root>
 10            <id>1</id><name>SMITH</name>
 11            <id>2</id>
 12            <id>3</id><name>GRANT</name>
 13          </root>')
 14         columns id    number       path 'id'
 15               , name  varchar2(15) path 'name'
 16       ) x
 17  ;
 
        ID NAME
---------- ---------------
         1 SMITH
         2 GRANT          
         3 GRANT

One way to workaround this is to add another predicate to ensure that the node is actually the one coming directly after the current key element, and not one belonging to another logical sibling group further down in the document.

In the following example, I’ll use the node identity comparison operator ‘is’ :

SQL> select x.*
  2  from xmltable(
  3         'for $i in /root/id
  4          return element r {
  5            $i/following-sibling::name[1][preceding-sibling::id[1] is $i]
  6          , $i
  7          }'
  8         passing xmlparse(document
  9         '<root>
 10            <id>1</id><name>SMITH</name>
 11            <id>2</id>
 12            <id>3</id><name>GRANT</name>
 13          </root>')
 14         columns id    number       path 'id'
 15               , name  varchar2(15) path 'name'
 16       ) x
 17  ;
 
        ID NAME
---------- ---------------
         1 SMITH
         2                
         3 GRANT
 

 

Additional Approaches and Performance Considerations

The full XQuery approach (w/ or w/o RETURNING SEQUENCE BY REF clause) is rather compact and maintainable, but it is also very slow when applied on large XML documents.
Mixed approaches using XQuery and SQL post-processing provide far better performance.
I’ll present below three other ways using PIVOT, MODEL and MATCH_RECOGNIZE SQL features.

Read more…

Advertisements
Categories: SQL, XQuery Tags: , ,

Reading Large XML Node Values using the DOM API

March 24, 2018 Leave a comment

As of Oracle release 11.1, DBMS_XMLDOM API has been extended with a set of stream-based functions and procedures to read and write large node data (i.e. data exceeding VARCHAR2 size limit) :
Link to documentation

Sounds great on the paper and well documented, but looking closer and actually trying those functionalities on real examples turns out to be a bit disappointing.
That’s why I could have subtitled this post “The Good, the Bad and the Ugly”.

 

1- Reading a large node value composed of single-byte characters : the “Good”

In this example, I’ll use a simple XML document containing a single 128K text node :

<test>XXXX ... XXXX</test>

and try to read it into a CLOB instance via the DOM streaming API.

declare

  xmldoc   xmltype;
  domdoc   dbms_xmldom.DOMDocument;
  node     dbms_xmldom.DOMNode;
  istream  utl_characterinputstream;
  tmp      varchar2(32767);
  nread    integer := 32767;
  output   clob;
  
begin
  
  select xmlelement("test", rpad(to_clob('X'),131072,'X'))
  into xmldoc
  from dual;
  
  domdoc := dbms_xmldom.newDOMDocument(xmldoc);
  node := dbms_xslprocessor.selectSingleNode(dbms_xmldom.makenode(domdoc), '/test/text()');
  
  dbms_lob.createtemporary(output, true);
  
  istream := dbms_xmldom.getNodeValueAsCharacterStream(node);
  
  loop
    istream.read(tmp, nread);
    exit when nread = 0;
    dbms_lob.writeappend(output, nread, tmp);
  end loop;
  
  istream.close();
  
  dbms_output.put_line('output size = '||dbms_lob.getlength(output));
  
  dbms_lob.freetemporary(output);
  dbms_xmldom.freeDocument(domdoc);
  
end;  
/

output size = 131072
 
PL/SQL procedure successfully completed.

Works great.
We retrieve an input stream for the large node by calling getNodeValueAsCharacterStream() function.
Node data is then read from that stream in 32K chunks, and appended to a temp CLOB.

This was tested on 12.1.0.2, with database character set AL32UTF8.
Note that the large string we want to extract in this example consists only in single-byte characters.

Let’s see what happens when the data contains multi-byte characters…

 

Read more…

How To : Generate Empty XML Attributes

February 17, 2018 1 comment

It is well-known that XML attributes may be generated using SQL/XML publishing function XMLAttributes().
It is also well-known that XMLAttributes() does not generate anything when passed an expression that evaluates to NULL :

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/adxdb/generation-of-XML-data-from-relational-data.html

If an attribute value expression evaluates to NULL, then no corresponding attribute is created.

So, how can we generate an empty attribute using SQL?

<root attr=""/>

Following are a compilation of “tricks” (more than official methods) to generate empty XML attributes, and empty namespace declarations in a SQL context.
I’ll divide them into two categories :

  • Document post-processing : target attributes we want to be empty are transformed after the complete document has been generated
  • In-place generation : empty attributes are directly generated in the same “flow”

 

Read more…

Categories: HowTo, SQL, XQuery Tags: , ,

ExcelTable 1.6 : support for cell comments

January 3, 2018 Leave a comment

Here’s the new version of ExcelTable, which can now extract cell comments as regular columns.
Not much change in the user interface, except an extended column syntax specification to declare a request for cell metadata instead of its value :

For example, using sample file ooxdata3.xlsx :

SQL> select t.*
  2  from table(
  3         ExcelTable.getRows(
  4           ExcelTable.getFile('TMP_DIR','ooxdata3.xlsx')
  5         , 'DataSource'
  6         , q'{
  7             "RN"             for ordinality
  8           , "SPARE2"         varchar2(30)   column 'F'
  9           , "SPARE2_COMMENT" varchar2(2000) column 'F' for metadata (comment)
 10           }'
 11         , '2:11'
 12         )
 13       ) t
 14  ;

 RN SPARE2 SPARE2_COMMENT
--- ------ ------------------------------
  1
  2
  3 OK     bleronm:
           This is a comment.

  4 OK
  5
  6 OK
  7
  8
  9        This is
           another comment
           on three lines

 10

10 rows selected.

 

Source code available on GitHub :

/mbleron/ExcelTable

 

A few words about the internals are following…

 

Read more…

Categories: Miscellaneous, PL/SQL, SQL, XQuery Tags:

ExcelTable 1.3 : support for password-encrypted files

June 5, 2017 3 comments

Here’s the new version of ExcelTable, which can now read password-encrypted files.
It supports Standard and Agile encryption methods, as specified in [MS-OFFCRYPTO].

By default, Office 2007 will encrypt using the Standard method whereas Office 2010 and onwards use Agile encryption.
AES (128 or 256) is usually the default algorithm on standard Office installations.
Because latest Office versions (2013+) make use of SHA-2 hashing algorithms, Oracle 12c is required to read Excel documents encrypted in those versions.

Basically, the only change from ExcelTable 1.2 is the addition of an optional argument p_password in getRows() function :

function getRows (
  p_file     in blob
, p_sheet    in varchar2
, p_cols     in varchar2
, p_range    in varchar2 default null
, p_method   in binary_integer default DOM_READ
, p_password in varchar2 default null
) 
return anydataset pipelined
using ExcelTableImpl;

The following dependencies are also required :
XUTL_CDF
XUTL_OFFCRYPTO

 

Source code available on GitHub :

/mbleron/ExcelTable

 

A few words about the internals are following…

 

Read more…

TreeBuilder – a PL/SQL graphical tree generator

February 26, 2017 Leave a comment

TreeBuilder computes the set of node coordinates necessary to represent a single-rooted tree in a graphical environment.
Node positioning is implemented using the improved version of Walker’s algorithm, published by Buchheim, J√ľnger and Leipert :
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.8757

Tree data is exposed as a pipelined function.
The API also provides a constructor to visualize the tree as an SVG object.

Source code available on GitHub :

/mbleron/oracle/TreeBuilder

 

Read more…

How To : Flatten out a Recursive XML Document

January 17, 2017 Leave a comment

I blogged a few times about XML flattening in the past :

Those posts were about converting any XML structure and data into a schema-less model called “Edge”.
What I’ll be presenting here applies to XML documents having a recursive structure,
e.g.

<emp id="1">
  <emp id="2"/>
  <emp id="3">
    <emp id="4"/>
    <emp id="5"/>
  </emp>
</emp>

which could be converted to an adjency list model :

ID  PARENT_ID
--- ---------
  1
  2         1
  3         1
  4         3
  5         3

or, denormalized into level-based columns, like this :

ID_1 ID_2 ID_3
---- ---- ----
   1    2
   1    3    4
   1    3    5

 

Read more…