Home > XML DB, XQuery > XML DB Events : Reading an Open Office XML document (.xlsx)

XML DB Events : Reading an Open Office XML document (.xlsx)

January 28, 2012 Leave a comment Go to comments

This post is actually an offshoot of a recent article by Marco Gralike on his blog. Marco describes how to automatically read an Open Office XML (OOX) document, specifically docx format, and store the data in a database table for subsequent queries.
One of the key features is the definition of XML DB Events handlers that will “unzip” the archive for us whenever we drop it in the repository.

The funny thing is that, in the meantime, I had already begun to work on the xlsx version, including my own unzip utility. Marco’s example is using Anton Scheffer’s great zip/unzip package which, although based on the same extraction technique, is much more complete than mine.

Installation script and package for my demo are available here : oox_sml.zip
I’ll explain some of the steps below in detail.

 

1) XMLType tables

In order to retrieve structured data from a spreadsheetML (SML) document, we have to look – at least – at the following archive parts :

  • [Content_Types].xml “dictionary” file
  • workbook file
  • worksheet file(s)
  • “shared strings” file

Here, only the dictionary file “[Content_Types].xml” has a fixed name, it’s the entry point that describes the structure of the OOX document, and the one file we’re going to read first to determine which part goes to which table :

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Override PartName="/xl/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/>
  <Override PartName="/xl/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml"/>
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>
  <Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/>
  <Override PartName="/xl/worksheets/sheet2.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>
  <Override PartName="/xl/worksheets/sheet3.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>
  <Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>
  <Override PartName="/xl/sharedStrings.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml"/>
  <Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/>
</Types>

 

The four types mentioned above will be stored in separate XMLType tables, plus two relational tables holding relationships between workbook and worksheets, and shared strings :

create table oox_sml_sharedstrings of xmltype;

create table oox_sml_workbook of xmltype;

create table oox_sml_worksheet of xmltype;

create table oox_sml_content_types of xmltype;

create table oox_sml_workbook_rels (
  filename   varchar2(2000)
, sheetname  varchar2(31)
, target     varchar2(2000) not null
, ws_xmlref  ref xmltype
, constraint oox_swr_pk primary key (filename, sheetname)
, constraint oox_sws_xmlref_fk foreign key (ws_xmlref) references oox_sml_worksheet on delete cascade
);

create table oox_sml_sst_table (
  filename varchar2(2000)
, idx number
, val varchar2(4000)
, constraint oox_sml_sst_pk primary key (filename, idx)
);

On 11.2.0.2, there’s a bug when we define a foreign key on a REF XMLType column. The workaround in this case just consists in removing the FK and doing the delete “manually” in the delete handler.

We’ll also create a structured XML index (and its associated secondary index) for fast relational access to the Content_Types table :

create index oox_sml_content_types_sxi1 on oox_sml_content_types (object_value) 
indextype is XDB.XMLIndex
parameters (
q'#
XMLTable oox_sml_content_types_xtb
  XMLNamespaces(default 'http://schemas.openxmlformats.org/package/2006/content-types')
, '/Types/Override'
  passing object_value
  columns partname    varchar2(260) path '@PartName'
        , contenttype varchar2(260) path '@ContentType'
#'
);

create index oox_sml_ct_xtb_ix1 on oox_sml_content_types_xtb (oid, contenttype);

 

2) Repository Configuration

This part describes the necessary steps to configure the XML DB repository and events for automatic processing of an incoming file.
First, we’ll create some folders :

declare
  res boolean;
begin
  res := DBMS_XDB.createFolder('/office');
  res := DBMS_XDB.createFolder('/office/excel');
  res := DBMS_XDB.createFolder('/office/excel/conf');
  res := DBMS_XDB.createFolder('/office/excel/temp');
  res := DBMS_XDB.createFolder('/office/excel/docs');
end;
/

commit;

  • conf will hold the resource configuration file for events handling
  • temp will hold uncompressed archives
  • docs will hold original xlsx files

In order to “recognize” an incoming xlsx file as an SML instance document, we’ll also declare a specific MIME mapping (taken from here). That way, any file with extension “.xlsx” dropped in the repository will be assigned the correct MIME type, instead of the default application/octet-stream :

call dbms_xdb.ADDMIMEMAPPING('xlsx', 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet');

Next step : installing the handler “OOX_SML_STORE”

SQL> @oox_sml_store.pck

Package created.


Package body created.

… and creating the resource configuration file :

declare
  res boolean;
begin
  if dbms_xdb.ExistsResource('/office/excel/conf/sml_rescfg.xml') then
    dbms_xdb.DeleteResource('/office/excel/conf/sml_rescfg.xml');
  end if;
  res := DBMS_XDB.createResource( 
           abspath => '/office/excel/conf/sml_rescfg.xml'
         , data => 
'<ResConfig xmlns="http://xmlns.oracle.com/xdb/XDBResConfig.xsd"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://xmlns.oracle.com/xdb/XDBResConfig.xsd
                                http://xmlns.oracle.com/xdb/XDBResConfig.xsd">
  <event-listeners>
    <listener>
      <description>Handling of Office Open XML spreadsheets</description>
      <schema>OOX</schema>
      <source>OOX_SML_STORE</source>
      <language>PL/SQL</language>
      <events>
        <Pre-Create/>
        <Pre-Delete/>
      </events>
      <pre-condition>
        <existsNode>
          <XPath>/Resource[ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"]</XPath>
        </existsNode>
      </pre-condition>
    </listener>
  </event-listeners>
  <defaultChildConfig>
    <configuration>
      <pre-condition>
        <existsNode>
          <XPath>/Resource[ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"]</XPath>
        </existsNode>
      </pre-condition>
      <path>/office/excel/conf/sml_rescfg.xml</path>
    </configuration>
  </defaultChildConfig>
</ResConfig>'
         , schemaurl => 'http://xmlns.oracle.com/xdb/XDBResConfig.xsd'
         , elem => 'ResConfig'
         );
end;
/

This resource defines the following :

  • event handling program : PL/SQL package OOX_SML_STORE
  • types of event handled : Pre-Create and Pre-Delete
  • triggering condition : the resource created must be a SML document (when applied to a folder)
  • child configuration : when a resource (of type SML) is created, it is assigned the same configuration file

And finally, we assign the configuration to the docs folder :

begin
  DBMS_RESCONFIG.appendResConfig('/office/excel/docs', '/office/excel/conf/sml_rescfg.xml', DBMS_RESCONFIG.APPEND_RESOURCE);
end;
/

 

3) The Event Handler

This part will focus on some key steps in the event handling process.

a. unzip_doc procedure

Once we’ve read the archive structure to retrieve part names (first half of unzip_doc procedure), the following recreates the hierarchy in the XML DB repository and calls unzip_part procedure to extract each part and create the corresponding resource :

    -- This recreates the archive structure in the repository : 
    IF dbms_xdb.ExistsResource(ARCHIVE_BASE_URI || g_filename) THEN
      dbms_xdb.DeleteResource(ARCHIVE_BASE_URI || g_filename, dbms_xdb.DELETE_RECURSIVE);
    END IF;

    FOR r IN  (
      SELECT distinct
             '/' || substr(t.column_value, 1, instr(t.column_value, '/', 1, x.column_value) - 1) dir
      FROM TABLE(g_partlist) t,
           TABLE(
             CAST(
               MULTISET(
                 SELECT level
                 FROM dual
                 CONNECT BY level <= length(regexp_replace(t.column_value, '[^/]'))
               )
               AS sys.odcinumberlist
             )
           ) x
      ORDER BY dir
    )
    LOOP
      res := dbms_xdb.CreateFolder(ARCHIVE_BASE_URI || g_filename || r.dir);
    END LOOP;

    -- [Content_Types].xml is unzipped first
    unzip_part(p_zip, content_types_idx, true);

    -- then the other parts
    FOR i IN g_headlist.first..g_headlist.last LOOP
      IF i != content_types_idx THEN
        unzip_part(p_zip, i);
      END IF;
    END LOOP;

    -- sets REFs to worksheet object rows in the relation table
    UPDATE oox_sml_workbook_rels
    SET ws_xmlref = dbms_xdb.getContentXMLRef(target)
    WHERE filename = g_filename
    ;

b. unzip_part procedure

This procedure unzips a single part and store it as a resource. When an XMLType table is associated with the part type, then the XML instance is stored in that table and a REF pointer is used to create the associated resource in the repository :

    IF is_content_types THEN

      insert into oox_sml_content_types t values (xmltype(unzip, XML_CSID))
      returning ref(t) into xmlref;
      res := dbms_xdb.CreateResource(respath, xmlref);

      FOR r IN (
        SELECT x.partname
             , x.contenttype
        FROM oox_sml_content_types t
           , XMLTable(
               XMLNamespaces(default 'http://schemas.openxmlformats.org/package/2006/content-types')
             , '/Types/Override'
               passing t.object_value
               columns partname    varchar2(260) path '@PartName'
                     , contenttype varchar2(260) path '@ContentType'
             ) x
        WHERE ref(t) = xmlref
      )
      LOOP
        g_content_types(r.partname) := r.contenttype;
      END LOOP;

    ELSIF g_content_types.exists(partname) THEN

      case g_content_types(partname)

        when CT_SHAREDSTRINGS then

          insert into oox_sml_sharedstrings t values (xmltype(unzip, XML_CSID))
          returning ref(t) into xmlref;

          res := dbms_xdb.CreateResource(respath, xmlref);
          
          insert into oox_sml_sst_table (filename, idx, val)
          select g_filename, idx-1, val
          from oox_sml_sharedstrings t
             , XMLTable(
                 XMLNamespaces(default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main')
               , '/sst/si'
                 passing t.object_value
                 columns idx for ordinality
                       , val varchar2(4000) path 't'
               ) x
          where ref(t) = xmlref
          ;

        when CT_WORKBOOK then

          insert into oox_sml_workbook t values (xmltype(unzip, XML_CSID))
          returning ref(t) into xmlref;
          
          res := dbms_xdb.CreateResource(respath, xmlref);
          
          insert into oox_sml_workbook_rels (filename, sheetname, target)
          select g_filename, x.sheetname, x.target
          from XMLTable(
                 XMLNamespaces( 'http://schemas.openxmlformats.org/officeDocument/2006/relationships' as "rel"
                              , 'http://schemas.openxmlformats.org/package/2006/relationships' as "r"
                              , default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main' )
               , 'for $i in (#ora:defaultTable OOX_SML_WORKBOOK #){doc($d)/workbook/sheets/sheet}
                    , $j in doc(ora:replace($d, "(.*)/(.*$)", "\1/_rels/\2.rels"))/r:Relationships/r:Relationship
                  where $j/@Id = $i/@rel:id
                  return element r {
                    element sheetname {data($i/@name)}
                  , element target    {resolve-uri($j/@Target, $d)}
                  }'
                 passing respath as "d"
                 columns sheetname  varchar2(31)   path 'sheetname'
                       , target     varchar2(2000) path 'target'
               ) x
          ;

        when CT_WORKSHEET then

          insert into oox_sml_worksheet t values (xmltype(unzip, XML_CSID))
          returning ref(t) into xmlref;

          res := dbms_xdb.CreateResource(respath, xmlref);

        else

          res := dbms_xdb.CreateResource(respath, unzip, XML_CSID);

      end case;

    ELSE
      res := dbms_xdb.CreateResource(respath, unzip, XML_CSID);
    END IF;

Let’s focus a little more on the query used to load the OOX_SML_WORKBOOK_RELS table :

select g_filename, x.sheetname, x.target
from XMLTable(
       XMLNamespaces( 'http://schemas.openxmlformats.org/officeDocument/2006/relationships' as "rel"
                    , 'http://schemas.openxmlformats.org/package/2006/relationships' as "r"
                    , default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main' )
     , 'for $i in (#ora:defaultTable OOX_SML_WORKBOOK #){doc($d)/workbook/sheets/sheet}
          , $j in doc(ora:replace($d, "(.*)/(.*$)", "\1/_rels/\2.rels"))/r:Relationships/r:Relationship
        where $j/@Id = $i/@rel:id
        return element r {
          element sheetname {data($i/@name)}
        , element target    {resolve-uri($j/@Target, $d)}
        }'
       passing respath as "d"
       columns sheetname  varchar2(31)   path 'sheetname'
             , target     varchar2(2000) path 'target'
     ) x
;

Basically, it joins a workbook instance to its relationships document (“workbook.xml.rels”) so that we can associate a sheet name (stored in workbook.xml) to its physical sheet instance, e.g. worksheets/sheet1.xml (see diagram below in §4).

As far as I can tell, Excel implementation of the SML format always stores sheet documents in a worksheets folder located in the same folder as workbook.xml.
But actually, a sheet can be located anywhere in the archive, provided the Target attribute of the relationship specifies a valid location relative to the workbook part, for example :

<Relationship Id="rId1" Type="..." Target="../myworksheets/sheet1.xml" />

In that case, sheet1.xml would be located in /myworksheets instead of /xl/worksheets folder.

Here, as the query shows, a simple solution is to use function fn:resolve-uri() which provides an easy way to resolve any relative path (including parent axis).

 

4) Queries

This is where the fun begins!
Below is a simplified diagram representing the structure of an OOX SML document and the relationships between the different parts we’re interested in :

OOX SpreadsheetML diagram

In order to access a specific sheet’s data, we must do the following :

  1. Read [Content_Types].xml to determine the path to the workbook part in the archive, and therefore in the repository once the archive is uncompressed
  2. Read workbook.xml to get the relationship Id of the sheet of interest, for example the one named “Feuil1″
  3. Read workbook.xml.rels to extract the path to the worksheet part corresponding to the relationship Id (“rId1″), in this case “worksheets/sheet1.xml”. Note that this path is relative to the parent folder of the workbook part i.e. “xl”
  4. Read /xl/worksheets/sheet1.xml and retrieve data in row/c/v elements
  5. For each cell element, if the t attribute is “s”, then it means the cell contains a string value whose real location (in the sharedStrings.xml part) is given by the v element. It’s a zero-based index representing the position of the sst/si/t element in the sharedStrings part

Steps 1 to 4 are easy to perform, however, the nature of the sharedStrings implementation is problematic for a relational database, and we’ll ultimately have to choose between different approaches :

  • Joining to a different instance of sharedStrings for each cell
  • Having each cell in separate rows, join once to sharedStrings, and pivot each row’s cells back into columns
  • Using a lookup function

 

Let’s proceed step by step :

a. Storing a document in the repository

In this step, we just drop the test.xlsx file in the /office/excel/docs folder. The archive will be automatically uncompressed into the /office/excel/temp/test.xlsx folder.
The workbook contains three worksheets, first two having two columns and 50000 rows (+ headers), and the third one is empty.

b. Accessing [Content_Types].xml

SQL> variable base_uri varchar2(2000)
SQL> exec :base_uri := '/office/excel/temp/test.xlsx'
 
PL/SQL procedure successfully completed
 
SQL> SELECT ctx.*
  2  FROM oox_sml_content_types ct
  3     , XMLTable(
  4         XMLNamespaces(default 'http://schemas.openxmlformats.org/package/2006/content-types')
  5       , '/Types/Override'
  6         passing ct.object_value
  7         columns partname    varchar2(260) path '@PartName'
  8               , contenttype varchar2(260) path '@ContentType'
  9       ) ctx
 10  WHERE REF(ct) = dbms_xdb.getContentXMLRef(:base_uri || '/[Content_Types].xml')
 11  ;
 
PARTNAME                            CONTENTTYPE
----------------------------------- --------------------------------------------------------------------------------
/xl/theme/theme1.xml                application/vnd.openxmlformats-officedocument.theme+xml
/xl/styles.xml                      application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml
/xl/workbook.xml                    application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml
/docProps/app.xml                   application/vnd.openxmlformats-officedocument.extended-properties+xml
/xl/worksheets/sheet2.xml           application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml
/xl/worksheets/sheet3.xml           application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml
/xl/worksheets/sheet1.xml           application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml
/xl/sharedStrings.xml               application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml
/docProps/core.xml                  application/vnd.openxmlformats-package.core-properties+xml
 
9 rows selected
 

c. Accessing workbook.xml

The above query (well, similar query) is encapsulated in the oox_sml_util.getContentTypeRef function.
The function accesses the Content_Types table to get the part name corresponding to the p_contenttype parameter (here /xl/workbook.xml) and returns a REF to the instance in the workbook table.

SQL> SELECT wb.*
  2  FROM oox_sml_workbook t
  3     , XMLTable(
  4         XMLNamespaces( 'http://schemas.openxmlformats.org/officeDocument/2006/relationships' as "r",
  5                        default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main' ),
  6         '/workbook/sheets/sheet'
  7         passing t.object_value
  8         columns sheetname  varchar2(31) path '@name'
  9               , rid        varchar2(30) path '@r:id'
 10       ) wb
 11  WHERE REF(t) = oox_sml_util.getContentTypeRef('test.xlsx'
 12                                              , 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml')
 13  ;
 
SHEETNAME                       RID
------------------------------- ------------------------------
Feuil1                          rId1
Feuil2                          rId2
Feuil3                          rId3
 

Steps b and c are actually not necessary as we’ve already combined them earlier (when creating the workbook resource) and have the result set stored in OOX_SML_WORKBOOK_RELS for further optimal access (Cf. §3b), but since they’re part of the methodology I’m describing them anyway.

d. Accessing worksheets

From now on, I’ll be querying directly from the relations table.
The following retrieves “raw” cell values from the worksheet “Feuil1″ :

SQL> SELECT wsx.*
  2  FROM oox_sml_workbook_rels wr
  3     , oox_sml_worksheet ws
  4     , XMLTable(
  5         XMLNamespaces( default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main' ),
  6         '/worksheet/sheetData/row'
  7         passing ws.object_value
  8         columns col1 varchar2(30) path 'c[1]/v'
  9               , col2 number       path 'c[2]/v'
 10       ) wsx
 11  WHERE wr.filename = 'test.xlsx'
 12  AND wr.sheetname = 'Feuil1'
 13  AND ref(ws) = wr.ws_xmlref
 14  ;
 
COL1                                 COL2
------------------------------ ----------
2568                                 2569
2570                                    1
2571                                    2
2572                                    3
2573                                    4
2574                                    5
2575                                    6
2576                                    7
2577                                    8
2578                                    9

...
 

e. Joining to shared strings

  • Two outer joins, one for each cell :
    SQL> SELECT v.rn
      2       , nvl(sst1.val, v.cell1) as cell1
      3       , nvl(sst2.val, v.cell2) as cell2
      4  FROM (
      5    SELECT wr.filename, wsx.*
      6    FROM oox_sml_workbook_rels wr
      7       , oox_sml_worksheet ws
      8       , XMLTable(
      9           XMLNamespaces(default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main')
     10         , '/worksheet/sheetData/row'
     11           passing ws.object_value
     12           columns rn     number  path '@r'
     13                 , cell1  number  path 'c[1]', type1 varchar2(10) path 'c[1]/@t'
     14                 , cell2  number  path 'c[2]', type2 varchar2(10) path 'c[2]/@t'
     15         ) wsx
     16    WHERE wr.filename = 'test.xlsx'
     17    AND wr.sheetname = 'Feuil1'
     18    AND ref(ws) = wr.ws_xmlref
     19  ) v
     20    LEFT OUTER JOIN oox_sml_sst_table sst1 ON sst1.filename = v.filename AND sst1.idx = v.cell1 AND type1 = 's'
     21    LEFT OUTER JOIN oox_sml_sst_table sst2 ON sst2.filename = v.filename AND sst2.idx = v.cell2 AND type2 = 's'
     22  ;
     
            RN CELL1           CELL2
    ---------- --------------- ---------------
             1 COLUMN1         COLUMN2
             2 TEST00001       1
             3 TEST00002       2
             4 TEST00003       3
             5 TEST00004       4
             6 TEST00005       5
             7 TEST00006       6
             8 TEST00007       7
             9 TEST00008       8
            10 TEST00009       9
     ...
    
    

     

  • Cells extracted into separate rows + a single outer join + pivot :
    SELECT rn
         , min(case when cn = 'A' then nvl(sst.val, v.val) end) as cell_a
         , min(case when cn = 'B' then nvl(sst.val, v.val) end) as cell_b
    FROM (
      SELECT wr.filename, wsx.rn, substr(wsc.cn,1,1) as cn, wsc.val, wsc.type
      FROM oox_sml_workbook_rels wr
         , oox_sml_worksheet ws
         , XMLTable(
             XMLNamespaces(default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main')
           , '/worksheet/sheetData/row'
             passing ws.object_value
             columns rn    number  path '@r'
                   , cells xmltype path 'c'
           ) wsx
         , XMLTable(
             XMLNamespaces(default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main')
           , '/c'
             passing wsx.cells
             columns cn    varchar2(30)   path '@r'
                   , val   number         path 'v'
                   , type  varchar2(10)   path '@t'
           ) wsc
      WHERE wr.filename = 'test.xlsx'
      AND wr.sheetname = 'Feuil1'
      AND ref(ws) = wr.ws_xmlref
    ) v
      LEFT OUTER JOIN oox_sml_sst_table sst 
                   ON sst.filename = v.filename
                   AND v.type = 's'
                   AND sst.idx = v.val
    GROUP BY v.rn
    ;
    

     

  • Using a lookup function in the SELECT :
    SELECT wsx.rn
         , case when type1 = 's' then oox_sml_util.getValue2(wr.filename, wsx.cell1) else cell1 end as cell1
         , case when type2 = 's' then oox_sml_util.getValue2(wr.filename, wsx.cell2) else cell2 end as cell2
    FROM oox_sml_workbook_rels wr
       , oox_sml_worksheet ws
       , XMLTable(
           XMLNamespaces(default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main')
         , '/worksheet/sheetData/row'
           passing ws.object_value
           columns rn     number  path '@r'
                 , cell1  varchar2(4000) path 'c[1]', type1 varchar2(10) path 'c[1]/@t'
                 , cell2  varchar2(4000) path 'c[2]', type2 varchar2(10) path 'c[2]/@t'
         ) wsx
    WHERE wr.filename = 'test.xlsx'
    AND wr.sheetname = 'Feuil1'
    AND ref(ws) = wr.ws_xmlref
    ;
    

 

5) Performance consideration

Unfortunately, neither of the queries above are satisfying in terms of performance.
The “pivot” solution has the longest response time but fetches the whole result set (50,001 rows) a lot faster (less IOs).
The other two solutions fetches the first rows almost immediately, but show a lot more IOs and a process time much higher.

Below are the respective SQL*Plus traces of the three queries :

“n-JOIN” query

Elapsed: 00:05:43.77

Execution Plan
----------------------------------------------------------
Plan hash value: 133199948

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name                  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                       |  8168 |  8814K|       |  1180   (1)| 00:00:15 |
|*  1 |  HASH JOIN RIGHT OUTER           |                       |  8168 |  8814K|  1808K|  1180   (1)| 00:00:15 |
|   2 |   TABLE ACCESS FULL              | OOX_SML_SST_TABLE     | 50004 |  1220K|       |    69   (2)| 00:00:01 |
|*  3 |   HASH JOIN RIGHT OUTER          |                       |  8168 |  8614K|  1808K|   601   (1)| 00:00:08 |
|   4 |    TABLE ACCESS FULL             | OOX_SML_SST_TABLE     | 50004 |  1220K|       |    69   (2)| 00:00:01 |
|   5 |    VIEW                          |                       |  8168 |  8415K|       |    31   (0)| 00:00:01 |
|   6 |     NESTED LOOPS                 |                       |  8168 |  1746K|       |    31   (0)| 00:00:01 |
|   7 |      NESTED LOOPS                |                       |     1 |   213 |       |     2   (0)| 00:00:01 |
|   8 |       TABLE ACCESS BY INDEX ROWID| OOX_SML_WORKBOOK_RELS |     1 |    44 |       |     1   (0)| 00:00:01 |
|*  9 |        INDEX UNIQUE SCAN         | OOX_SWR_PK            |     1 |       |       |     0   (0)| 00:00:01 |
|  10 |       TABLE ACCESS BY INDEX ROWID| OOX_SML_WORKSHEET     |     3 |   507 |       |     1   (0)| 00:00:01 |
|* 11 |        INDEX UNIQUE SCAN         | SYS_C007137           |     1 |       |       |     0   (0)| 00:00:01 |
|  12 |      XPATH EVALUATION            |                       |       |       |       |            |          |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("SST2"."IDX"(+)="V"."CELL2" AND "SST2"."FILENAME"(+)="V"."FILENAME" AND "V"."TYPE2"=CASE
              WHEN ("SST2"."FILENAME"(+) IS NOT NULL) THEN 's' ELSE 's' END )
   3 - access("SST1"."IDX"(+)="V"."CELL1" AND "SST1"."FILENAME"(+)="V"."FILENAME" AND "TYPE1"=CASE  WHEN
              ("SST1"."FILENAME"(+) IS NOT NULL) THEN 's' ELSE 's' END )
   9 - access("WR"."FILENAME"='test.xlsx' AND "WR"."SHEETNAME"='Feuil1')
  11 - access("WR"."WS_XMLREF"="WS"."SYS_NC_OID$")

Note
-----
   - Unoptimized XML construct detected (enable XMLOptimizationCheck for more information)


Statistics
----------------------------------------------------------
         13  recursive calls
          0  db block gets
     284078  consistent gets
     678534  physical reads
          0  redo size
    1512223  bytes sent via SQL*Net to client
      37082  bytes received via SQL*Net from client
       3335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      50001  rows processed

 

“PIVOT” query

Elapsed: 00:01:11.85

Execution Plan
----------------------------------------------------------
Plan hash value: 2144088588

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name                  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                       |    66M|    66G|       |  3541K  (1)| 11:48:18 |
|   1 |  HASH GROUP BY                    |                       |    66M|    66G|       |  3541K  (1)| 11:48:18 |
|*  2 |   HASH JOIN RIGHT OUTER           |                       |    66M|    66G|  1808K|  3537K  (1)| 11:47:35 |
|   3 |    TABLE ACCESS FULL              | OOX_SML_SST_TABLE     | 50004 |  1220K|       |    69   (2)| 00:00:01 |
|   4 |    VIEW                           |                       |    66M|    64G|       |   222K  (1)| 00:44:26 |
|   5 |     NESTED LOOPS                  |                       |    66M|    13G|       |   222K  (1)| 00:44:26 |
|   6 |      NESTED LOOPS                 |                       |  8168 |  1730K|       |    31   (0)| 00:00:01 |
|   7 |       NESTED LOOPS                |                       |     1 |   213 |       |     2   (0)| 00:00:01 |
|   8 |        TABLE ACCESS BY INDEX ROWID| OOX_SML_WORKBOOK_RELS |     1 |    44 |       |     1   (0)| 00:00:01 |
|*  9 |         INDEX UNIQUE SCAN         | OOX_SWR_PK            |     1 |       |       |     0   (0)| 00:00:01 |
|  10 |        TABLE ACCESS BY INDEX ROWID| OOX_SML_WORKSHEET     |     3 |   507 |       |     1   (0)| 00:00:01 |
|* 11 |         INDEX UNIQUE SCAN         | SYS_C007137           |     1 |       |       |     0   (0)| 00:00:01 |
|  12 |       XPATH EVALUATION            |                       |       |       |       |            |          |
|  13 |      XPATH EVALUATION             |                       |       |       |       |            |          |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("SST"."IDX"(+)="V"."VAL" AND "SST"."FILENAME"(+)="V"."FILENAME" AND "V"."TYPE"=CASE  WHEN
              ("SST"."FILENAME"(+) IS NOT NULL) THEN 's' ELSE 's' END )
   9 - access("WR"."FILENAME"='test.xlsx' AND "WR"."SHEETNAME"='Feuil1')
  11 - access("WR"."WS_XMLREF"="WS"."SYS_NC_OID$")


Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
      50264  consistent gets
     151864  physical reads
          0  redo size
    1512225  bytes sent via SQL*Net to client
      37082  bytes received via SQL*Net from client
       3335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      50001  rows processed

 

“Lookup-function” query

Elapsed: 00:09:22.50

Execution Plan
----------------------------------------------------------
Plan hash value: 3026739006

-------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                       |  8168 |  1746K|    31   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |                       |  8168 |  1746K|    31   (0)| 00:00:01 |
|   2 |   NESTED LOOPS                |                       |     1 |   213 |     2   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| OOX_SML_WORKBOOK_RELS |     1 |    44 |     1   (0)| 00:00:01 |
|*  4 |     INDEX UNIQUE SCAN         | OOX_SWR_PK            |     1 |       |     0   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID| OOX_SML_WORKSHEET     |     3 |   507 |     1   (0)| 00:00:01 |
|*  6 |     INDEX UNIQUE SCAN         | SYS_C007137           |     1 |       |     0   (0)| 00:00:01 |
|   7 |   XPATH EVALUATION            |                       |       |       |            |          |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("WR"."FILENAME"='test.xlsx' AND "WR"."SHEETNAME"='Feuil1')
   6 - access("WR"."WS_XMLREF"="WS"."SYS_NC_OID$")

Note
-----
   - Unoptimized XML construct detected (enable XMLOptimizationCheck for more information)


Statistics
----------------------------------------------------------
      50270  recursive calls
          0  db block gets
     400435  consistent gets
     598893  physical reads
          0  redo size
    1512223  bytes sent via SQL*Net to client
      37082  bytes received via SQL*Net from client
       3335  SQL*Net roundtrips to/from client
         16  sorts (memory)
          0  sorts (disk)
      50001  rows processed

 

That concludes this article.
Of course, all comments are welcomed, especially regarding potential improvements of the sharedStrings access.

About these ads
  1. January 30, 2012 at 19:09 | #1

    Didn’t read all your source code but, where possible, fetch data directly from the xdb$resource table and not via the XDB repository. Try cheating a bit more via xdb$resource and “xmldata”. Mark D won’t like it but that probably the fastest access to the “xdb repository”

  2. January 25, 2013 at 16:13 | #2

    Hello Marc,

    “we just drop the test.xlsx file in the /office/excel/docs folder.”
    That’s where I’m stuck. I have to admit that I usually only use a db and I have little knowledge about administration. I simply installed a default db on my laptop and access it via SQL*PLUS and SQL Developer. So when I searched how to “just drop” a test file I came to http://docs.oracle.com/cd/E11882_01/appdev.112/e23094/xdb03usg.htm#ADXDB5993
    I have no clue how to access my db using FTP, WebDAV etc. Therefore I tried “Accessing Oracle XML DB Repository Programmatically” somewhere down at the linked page. It says I can use DBMS_XDB.createResource. But when I try to transfer a file I only get “ORA-19233: XQST0013 – invalid pragma” in Handler OOX.OOX_SML_STORE.handlePreCreate.

    My code:
    DECLARE
    res BOOLEAN;
    BEGIN
    res := DBMS_XDB.createResource(‘/office/excel/docs/test.xlsx’,
    bfilename(‘DOC’, ‘test.xlsx’),
    nls_charset_id(‘AL32UTF8′));
    END;
    /
    The file/access itself should not be the problem, because when I try the solution from Anton Scheffer It works http://technology.amis.nl/2013/01/19/read-a-excel-xlsx-with-plsql/

    I have absolutely no clue how to do it. Can you please guide me?

    Regards
    Marcus

    • February 19, 2013 at 20:52 | #3

      Hello Marc,

      thanks, the hint about the compatible parameter was the solution. Setting it to 11.2.0.3.0 solved the problem

      Best Regards
      Marcus

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: