Home > XML DB, XSLT > XMLTABLE vs. External XSLT preprocessor

XMLTABLE vs. External XSLT preprocessor

September 16, 2013 Leave a comment Go to comments

We all know – I hope ;) – about XMLTABLE function to shred an XML document into relational rows and columns. In this article, I will describe an alternative approach to achieve the same result with an external table and the preprocessor feature.
And in case you wonder, no, it’s not one of this attempt to implement an XML parser using SQL*Loader syntax, something we can see from time to time as an answer to “How can I load my XML file using SQL*Loader?”.
This method relies on the external table preprocessor to transform the input XML file into CSV format and pipe the result to the loader.

 

1. Set up

First, in order to use the PREPROCESSOR feature correctly, we’ll need two directory objects : one for the executable and related files (BIN_DIR), and one for the data (XML_DIR).
The following script creates both :

-- As SYS user : 
create or replace directory bin_dir as 'C:\oracle\test\BIN';
grant execute on directory bin_dir to dev;

create or replace directory xml_dir as 'C:\oracle\test';
grant read, write on directory xml_dir to dev;

The XSL transformation is achieved using the command-line version of the Oracle XSLT Virtual Machine (XVM).
The CLI for the XVM is available in the $ORACLE_HOME/BIN directory of the database software installation.
For example, on a Windows OS :

c:\oracle\product\12.1.0\dbhome\BIN>xvm
Usage:
  xvm  switches <xslfile> <xmlfile>
  xvm  switches <xpath> <xmlfile>

Switches:
    -c        Compile <xslfile>. The bytecode is in '<xmlfile>.xvm'.
    -ct       Compile <xslfile> and transform <xmlfile>.
    -t        Transform <xmlfile> using bytecode from <xslfile>.
    -xc       Compile <xpath>. The bytecode is in 'code.xvm'.
    -xct      Compile and evaluate <xpath> with <xmlfile>.
    -xt       Evaluate XPath bytecode from <xpath> with <xmlfile>.

Examples:
  xvm  -ct  db.xsl db.xml
  xvm  -t   db.xvm db.xml
  xvm  -xct "doc/emloyee[15]/family"  db.xml

The input XML document is a 3-column, 50000-row Excel worksheet saved as spreadSheetML format : smldata.xml (12.5 MB), and this is the XSL stylesheet (xml2csv.xsl) :

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:template match="/"> 
    <xsl:apply-templates select="ss:Workbook/ss:Worksheet/ss:Table/ss:Row"/>
  </xsl:template>
  <xsl:template match="ss:Row">
    <xsl:apply-templates select="ss:Cell"/>
    <xsl:text>
</xsl:text>
  </xsl:template>
  <xsl:template match="ss:Cell">
    <xsl:if test="position()>1">
      <xsl:text>;</xsl:text>
    </xsl:if>
    <xsl:value-of select="ss:Data"/>
  </xsl:template>
</xsl:stylesheet>

The stylesheet will not be used as such but compiled into bytecode to be executed by the XVM utility.
The following calls the XVM compiler, creates file “xml2csv.xvm” and tests the transformation on a 3-row file (smldata2.xml).

C:\Users\Marc>set oracle_home=c:\oracle\product\12.1.0\dbhome

C:\Users\Marc>cd c:\oracle\test

c:\oracle\test>%ORACLE_HOME%\BIN\xvm -c xml2csv.xsl

c:\oracle\test>move xml2csv.xvm .\bin
        1 fichier(s) déplacé(s).

c:\oracle\test>%ORACLE_HOME%\BIN\xvm -t bin\xml2csv.xvm smldata2.xml
1;LINE-00001;C2DBQCOZSL5KQ1FKDTNLLNREQKT6I4
2;LINE-00002;3IXC8IRXGRIH5V7PD7LYOK1QF2N6I8
3;LINE-00003;GSIPYE3HYVCVBTGTC4LBJHRJUILKJQ

Putting it all together, here’s the final batch file that will be called by the preprocessor (transform.bat) :

@echo off
cd c:\oracle\test
c:\oracle\product\12.1.0\dbhome\BIN\xvm -t bin\xml2csv.xvm %1

Both files “xml2csv.xvm” and “transform.bat” will go in the BIN_DIR directory.

Finally, the external table definition :

CREATE TABLE smldata_ext
(
  rec_id      NUMBER
, description VARCHAR2(80)
, rec_value   VARCHAR2(30)
)
ORGANIZATION EXTERNAL
(
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY xml_dir
  ACCESS PARAMETERS 
  (
    RECORDS DELIMITED BY NEWLINE 
    CHARACTERSET AL32UTF8
    PREPROCESSOR bin_dir:'transform.bat'
    FIELDS TERMINATED BY ';'
  )
  LOCATION ('smldata.xml')
)
REJECT LIMIT UNLIMITED ;

 

2. Testing

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> set timing on
SQL> set lines 200
SQL> set pages 100
SQL> set autotrace traceonly
SQL>
SQL>
SQL> select * from smldata_ext;

50000 rows selected.

Elapsed: 00:00:03.85

Execution Plan
----------------------------------------------------------
Plan hash value: 791840889

------------------------------------------------------------------------------------------
| Id  | Operation                  | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |             |  8168 |   574K|    29   (0)| 00:00:01 |
|   1 |  EXTERNAL TABLE ACCESS FULL| SMLDATA_EXT |  8168 |   574K|    29   (0)| 00:00:01 |
------------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
         51  recursive calls
          0  db block gets
        399  consistent gets
          0  physical reads
          0  redo size
    3036785  bytes sent via SQL*Net to client
      37206  bytes received via SQL*Net from client
       3335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      50000  rows processed

In comparison, here’s the execution of a single XMLTABLE query directly on the XML file, and giving the same result :

SQL> select *
  2  from xmltable(
  3         xmlnamespaces(default 'urn:schemas-microsoft-com:office:spreadsheet')
  4       , '/Workbook/Worksheet/Table/Row'
  5         passing xmltype(bfilename('XML_DIR', 'smldata.xml'), nls_charset_id('AL32UTF8'))
  6         columns rec_id      number       path 'Cell[1]/Data'
  7               , description varchar2(80) path 'Cell[2]/Data'
  8               , rec_value   varchar2(30) path 'Cell[3]/Data'
  9       ) ;

50000 rows selected.

Elapsed: 00:00:07.46

Execution Plan
----------------------------------------------------------
Plan hash value: 3781821901

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |  8168 | 49008 |    29   (0)| 00:00:01 |
|   1 |  XMLTABLE EVALUATION |      |       |       |            |          |
-----------------------------------------------------------------------------


Statistics
----------------------------------------------------------
        371  recursive calls
      44046  db block gets
      10715  consistent gets
          2  physical reads
          0  redo size
    3036785  bytes sent via SQL*Net to client
      37206  bytes received via SQL*Net from client
       3335  SQL*Net roundtrips to/from client
          9  sorts (memory)
          0  sorts (disk)
      50000  rows processed

Although a little bit slower than the preprocessor approach, we can see again the huge improvement that has been done on the XMLTABLE function. Tested on the same machine running 11.2.0.3, the query takes hours to return the result set, but less than 8 seconds in 12.1 !

Finally, same test with the document stored in a binary XMLType table :

SQL> insert into tmp_xml values (
  2    xmltype(bfilename('XML_DIR', 'smldata.xml'), nls_charset_id('AL32UTF8'))
  3  );

1 row created.

SQL> set timing on
SQL> set autotrace traceonly
SQL>
SQL> select /*+ no_xml_query_rewrite */ x.*
  2  from tmp_xml t
  3     , xmltable(
  4         xmlnamespaces(default 'urn:schemas-microsoft-com:office:spreadsheet')
  5       , '/Workbook/Worksheet/Table/Row'
  6         passing t.object_value
  7         columns rec_id      number       path 'Cell[1]/Data'
  8               , description varchar2(80) path 'Cell[2]/Data'
  9               , rec_value   varchar2(30) path 'Cell[3]/Data'
 10       ) x ;

50000 rows selected.

Elapsed: 00:00:04.66

Execution Plan
----------------------------------------------------------
Plan hash value: 1416101436

---------------------------------------------------------------------------------
| Id  | Operation             | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |     1 |  2008 |    31   (0)| 00:00:01 |
|   1 |  NESTED LOOPS         |         |     1 |  2008 |    31   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL   | TMP_XML |     1 |  2002 |     2   (0)| 00:00:01 |
|   3 |   XMLTABLE EVALUATION |         |       |       |            |          |
---------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          2  recursive calls
          0  db block gets
        196  consistent gets
       1147  physical reads
          0  redo size
    3036785  bytes sent via SQL*Net to client
      37206  bytes received via SQL*Net from client
       3335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      50000  rows processed

 

6. Conclusion

Method Rows fetched Overall time (second)
External XSLT preprocessor 50000 3.9
XMLTABLE over Binary XML 50000 4.7
XMLTABLE over BFILE 50000 7.5
About these ads
  1. Anonymous
    July 10, 2014 at 02:38

    Hi Sir,
    I kindly need your help. I have a requirement to load a huge xml file into oracle db 11g using a stored procedure. We’ll be getting data weekly and the requirement is to sort of automate the process so that one only needs to place the file in the directory and run the procedure. I haven’t worked with XML before, any pointers? Thanks.

    • July 13, 2014 at 16:56

      Hi,

      Search for XMLTABLE examples here and in the XML DB forum.

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: