XML DB Events : Reading an Open Office XML document (.xlsx)
This post is actually an offshoot of a recent article by Marco Gralike on his blog. Marco describes how to automatically read an Open Office XML (OOX) document, specifically docx format, and store the data in a database table for subsequent queries.
One of the key features is the definition of XML DB Events handlers that will “unzip” the archive for us whenever we drop it in the repository.
The funny thing is that, in the meantime, I had already begun to work on the xlsx version, including my own unzip utility. Marco’s example is using Anton Scheffer’s great zip/unzip package which, although based on the same extraction technique, is much more complete than mine.
Installation script and package for my demo are available here : oox_sml.zip
I’ll explain some of the steps below in detail.
1) XMLType tables
In order to retrieve structured data from a spreadsheetML (SML) document, we have to look – at least – at the following archive parts :
- [Content_Types].xml “dictionary” file
- workbook file
- worksheet file(s)
- “shared strings” file
Here, only the dictionary file “[Content_Types].xml” has a fixed name, it’s the entry point that describes the structure of the OOX document, and the one file we’re going to read first to determine which part goes to which table :
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"> <Override PartName="/xl/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/> <Override PartName="/xl/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml"/> <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/> <Default Extension="xml" ContentType="application/xml"/> <Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/> <Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/> <Override PartName="/xl/worksheets/sheet2.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/> <Override PartName="/xl/worksheets/sheet3.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/> <Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/> <Override PartName="/xl/sharedStrings.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml"/> <Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/> </Types>
The four types mentioned above will be stored in separate XMLType tables, plus two relational tables holding relationships between workbook and worksheets, and shared strings :
create table oox_sml_sharedstrings of xmltype; create table oox_sml_workbook of xmltype; create table oox_sml_worksheet of xmltype; create table oox_sml_content_types of xmltype; create table oox_sml_workbook_rels ( filename varchar2(2000) , sheetname varchar2(31) , target varchar2(2000) not null , ws_xmlref ref xmltype , constraint oox_swr_pk primary key (filename, sheetname) , constraint oox_sws_xmlref_fk foreign key (ws_xmlref) references oox_sml_worksheet on delete cascade ); create table oox_sml_sst_table ( filename varchar2(2000) , idx number , val varchar2(4000) , constraint oox_sml_sst_pk primary key (filename, idx) );
On 11.2.0.2, there’s a bug when we define a foreign key on a REF XMLType column. The workaround in this case just consists in removing the FK and doing the delete “manually” in the delete handler.
We’ll also create a structured XML index (and its associated secondary index) for fast relational access to the Content_Types table :
create index oox_sml_content_types_sxi1 on oox_sml_content_types (object_value)
indextype is XDB.XMLIndex
parameters (
q'#
XMLTable oox_sml_content_types_xtb
XMLNamespaces(default 'http://schemas.openxmlformats.org/package/2006/content-types')
, '/Types/Override'
passing object_value
columns partname varchar2(260) path '@PartName'
, contenttype varchar2(260) path '@ContentType'
#'
);
create index oox_sml_ct_xtb_ix1 on oox_sml_content_types_xtb (oid, contenttype);
2) Repository Configuration
This part describes the necessary steps to configure the XML DB repository and events for automatic processing of an incoming file.
First, we’ll create some folders :
declare
res boolean;
begin
res := DBMS_XDB.createFolder('/office');
res := DBMS_XDB.createFolder('/office/excel');
res := DBMS_XDB.createFolder('/office/excel/conf');
res := DBMS_XDB.createFolder('/office/excel/temp');
res := DBMS_XDB.createFolder('/office/excel/docs');
end;
/
commit;
- conf will hold the resource configuration file for events handling
- temp will hold uncompressed archives
- docs will hold original xlsx files
In order to “recognize” an incoming xlsx file as an SML instance document, we’ll also declare a specific MIME mapping (taken from here). That way, any file with extension “.xlsx” dropped in the repository will be assigned the correct MIME type, instead of the default application/octet-stream :
call dbms_xdb.ADDMIMEMAPPING('xlsx', 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet');
Next step : installing the handler “OOX_SML_STORE”
SQL> @oox_sml_store.pck Package created. Package body created.
… and creating the resource configuration file :
declare
res boolean;
begin
if dbms_xdb.ExistsResource('/office/excel/conf/sml_rescfg.xml') then
dbms_xdb.DeleteResource('/office/excel/conf/sml_rescfg.xml');
end if;
res := DBMS_XDB.createResource(
abspath => '/office/excel/conf/sml_rescfg.xml'
, data =>
'<ResConfig xmlns="http://xmlns.oracle.com/xdb/XDBResConfig.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://xmlns.oracle.com/xdb/XDBResConfig.xsd
http://xmlns.oracle.com/xdb/XDBResConfig.xsd">
<event-listeners>
<listener>
<description>Handling of Office Open XML spreadsheets</description>
<schema>OOX</schema>
<source>OOX_SML_STORE</source>
<language>PL/SQL</language>
<events>
<Pre-Create/>
<Pre-Delete/>
</events>
<pre-condition>
<existsNode>
<XPath>/Resource[ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"]</XPath>
</existsNode>
</pre-condition>
</listener>
</event-listeners>
<defaultChildConfig>
<configuration>
<pre-condition>
<existsNode>
<XPath>/Resource[ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"]</XPath>
</existsNode>
</pre-condition>
<path>/office/excel/conf/sml_rescfg.xml</path>
</configuration>
</defaultChildConfig>
</ResConfig>'
, schemaurl => 'http://xmlns.oracle.com/xdb/XDBResConfig.xsd'
, elem => 'ResConfig'
);
end;
/
This resource defines the following :
- event handling program : PL/SQL package OOX_SML_STORE
- types of event handled : Pre-Create and Pre-Delete
- triggering condition : the resource created must be a SML document (when applied to a folder)
- child configuration : when a resource (of type SML) is created, it is assigned the same configuration file
And finally, we assign the configuration to the docs folder :
begin
DBMS_RESCONFIG.appendResConfig('/office/excel/docs', '/office/excel/conf/sml_rescfg.xml', DBMS_RESCONFIG.APPEND_RESOURCE);
end;
/
3) The Event Handler
This part will focus on some key steps in the event handling process.
a. unzip_doc procedure
Once we’ve read the archive structure to retrieve part names (first half of unzip_doc procedure), the following recreates the hierarchy in the XML DB repository and calls unzip_part procedure to extract each part and create the corresponding resource :
-- This recreates the archive structure in the repository :
IF dbms_xdb.ExistsResource(ARCHIVE_BASE_URI || g_filename) THEN
dbms_xdb.DeleteResource(ARCHIVE_BASE_URI || g_filename, dbms_xdb.DELETE_RECURSIVE);
END IF;
FOR r IN (
SELECT distinct
'/' || substr(t.column_value, 1, instr(t.column_value, '/', 1, x.column_value) - 1) dir
FROM TABLE(g_partlist) t,
TABLE(
CAST(
MULTISET(
SELECT level
FROM dual
CONNECT BY level <= length(regexp_replace(t.column_value, '[^/]'))
)
AS sys.odcinumberlist
)
) x
ORDER BY dir
)
LOOP
res := dbms_xdb.CreateFolder(ARCHIVE_BASE_URI || g_filename || r.dir);
END LOOP;
-- [Content_Types].xml is unzipped first
unzip_part(p_zip, content_types_idx, true);
-- then the other parts
FOR i IN g_headlist.first..g_headlist.last LOOP
IF i != content_types_idx THEN
unzip_part(p_zip, i);
END IF;
END LOOP;
-- sets REFs to worksheet object rows in the relation table
UPDATE oox_sml_workbook_rels
SET ws_xmlref = dbms_xdb.getContentXMLRef(target)
WHERE filename = g_filename
;
b. unzip_part procedure
This procedure unzips a single part and store it as a resource. When an XMLType table is associated with the part type, then the XML instance is stored in that table and a REF pointer is used to create the associated resource in the repository :
IF is_content_types THEN
insert into oox_sml_content_types t values (xmltype(unzip, XML_CSID))
returning ref(t) into xmlref;
res := dbms_xdb.CreateResource(respath, xmlref);
FOR r IN (
SELECT x.partname
, x.contenttype
FROM oox_sml_content_types t
, XMLTable(
XMLNamespaces(default 'http://schemas.openxmlformats.org/package/2006/content-types')
, '/Types/Override'
passing t.object_value
columns partname varchar2(260) path '@PartName'
, contenttype varchar2(260) path '@ContentType'
) x
WHERE ref(t) = xmlref
)
LOOP
g_content_types(r.partname) := r.contenttype;
END LOOP;
ELSIF g_content_types.exists(partname) THEN
case g_content_types(partname)
when CT_SHAREDSTRINGS then
insert into oox_sml_sharedstrings t values (xmltype(unzip, XML_CSID))
returning ref(t) into xmlref;
res := dbms_xdb.CreateResource(respath, xmlref);
insert into oox_sml_sst_table (filename, idx, val)
select g_filename, idx-1, val
from oox_sml_sharedstrings t
, XMLTable(
XMLNamespaces(default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main')
, '/sst/si'
passing t.object_value
columns idx for ordinality
, val varchar2(4000) path 't'
) x
where ref(t) = xmlref
;
when CT_WORKBOOK then
insert into oox_sml_workbook t values (xmltype(unzip, XML_CSID))
returning ref(t) into xmlref;
res := dbms_xdb.CreateResource(respath, xmlref);
insert into oox_sml_workbook_rels (filename, sheetname, target)
select g_filename, x.sheetname, x.target
from XMLTable(
XMLNamespaces( 'http://schemas.openxmlformats.org/officeDocument/2006/relationships' as "rel"
, 'http://schemas.openxmlformats.org/package/2006/relationships' as "r"
, default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main' )
, 'for $i in (#ora:defaultTable OOX_SML_WORKBOOK #){doc($d)/workbook/sheets/sheet}
, $j in doc(ora:replace($d, "(.*)/(.*$)", "\1/_rels/\2.rels"))/r:Relationships/r:Relationship
where $j/@Id = $i/@rel:id
return element r {
element sheetname {data($i/@name)}
, element target {resolve-uri($j/@Target, $d)}
}'
passing respath as "d"
columns sheetname varchar2(31) path 'sheetname'
, target varchar2(2000) path 'target'
) x
;
when CT_WORKSHEET then
insert into oox_sml_worksheet t values (xmltype(unzip, XML_CSID))
returning ref(t) into xmlref;
res := dbms_xdb.CreateResource(respath, xmlref);
else
res := dbms_xdb.CreateResource(respath, unzip, XML_CSID);
end case;
ELSE
res := dbms_xdb.CreateResource(respath, unzip, XML_CSID);
END IF;
Let’s focus a little more on the query used to load the OOX_SML_WORKBOOK_RELS table :
select g_filename, x.sheetname, x.target
from XMLTable(
XMLNamespaces( 'http://schemas.openxmlformats.org/officeDocument/2006/relationships' as "rel"
, 'http://schemas.openxmlformats.org/package/2006/relationships' as "r"
, default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main' )
, 'for $i in (#ora:defaultTable OOX_SML_WORKBOOK #){doc($d)/workbook/sheets/sheet}
, $j in doc(ora:replace($d, "(.*)/(.*$)", "\1/_rels/\2.rels"))/r:Relationships/r:Relationship
where $j/@Id = $i/@rel:id
return element r {
element sheetname {data($i/@name)}
, element target {resolve-uri($j/@Target, $d)}
}'
passing respath as "d"
columns sheetname varchar2(31) path 'sheetname'
, target varchar2(2000) path 'target'
) x
;
Basically, it joins a workbook instance to its relationships document (“workbook.xml.rels”) so that we can associate a sheet name (stored in workbook.xml) to its physical sheet instance, e.g. worksheets/sheet1.xml (see diagram below in §4).
As far as I can tell, Excel implementation of the SML format always stores sheet documents in a worksheets folder located in the same folder as workbook.xml.
But actually, a sheet can be located anywhere in the archive, provided the Target attribute of the relationship specifies a valid location relative to the workbook part, for example :
<Relationship Id="rId1" Type="..." Target="../myworksheets/sheet1.xml" />
In that case, sheet1.xml would be located in /myworksheets instead of /xl/worksheets folder.
Here, as the query shows, a simple solution is to use function fn:resolve-uri() which provides an easy way to resolve any relative path (including parent axis).
4) Queries
This is where the fun begins!
Below is a simplified diagram representing the structure of an OOX SML document and the relationships between the different parts we’re interested in :
In order to access a specific sheet’s data, we must do the following :
- Read [Content_Types].xml to determine the path to the workbook part in the archive, and therefore in the repository once the archive is uncompressed
- Read workbook.xml to get the relationship Id of the sheet of interest, for example the one named “Feuil1″
- Read workbook.xml.rels to extract the path to the worksheet part corresponding to the relationship Id (“rId1″), in this case “worksheets/sheet1.xml”. Note that this path is relative to the parent folder of the workbook part i.e. “xl”
- Read /xl/worksheets/sheet1.xml and retrieve data in row/c/v elements
- For each cell element, if the t attribute is “s”, then it means the cell contains a string value whose real location (in the sharedStrings.xml part) is given by the v element. It’s a zero-based index representing the position of the sst/si/t element in the sharedStrings part
Steps 1 to 4 are easy to perform, however, the nature of the sharedStrings implementation is problematic for a relational database, and we’ll ultimately have to choose between different approaches :
- Joining to a different instance of sharedStrings for each cell
- Having each cell in separate rows, join once to sharedStrings, and pivot each row’s cells back into columns
- Using a lookup function
Let’s proceed step by step :
a. Storing a document in the repository
In this step, we just drop the test.xlsx file in the /office/excel/docs folder. The archive will be automatically uncompressed into the /office/excel/temp/test.xlsx folder.
The workbook contains three worksheets, first two having two columns and 50000 rows (+ headers), and the third one is empty.
b. Accessing [Content_Types].xml
SQL> variable base_uri varchar2(2000) SQL> exec :base_uri := '/office/excel/temp/test.xlsx' PL/SQL procedure successfully completed SQL> SELECT ctx.* 2 FROM oox_sml_content_types ct 3 , XMLTable( 4 XMLNamespaces(default 'http://schemas.openxmlformats.org/package/2006/content-types') 5 , '/Types/Override' 6 passing ct.object_value 7 columns partname varchar2(260) path '@PartName' 8 , contenttype varchar2(260) path '@ContentType' 9 ) ctx 10 WHERE REF(ct) = dbms_xdb.getContentXMLRef(:base_uri || '/[Content_Types].xml') 11 ; PARTNAME CONTENTTYPE ----------------------------------- -------------------------------------------------------------------------------- /xl/theme/theme1.xml application/vnd.openxmlformats-officedocument.theme+xml /xl/styles.xml application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml /xl/workbook.xml application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml /docProps/app.xml application/vnd.openxmlformats-officedocument.extended-properties+xml /xl/worksheets/sheet2.xml application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml /xl/worksheets/sheet3.xml application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml /xl/worksheets/sheet1.xml application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml /xl/sharedStrings.xml application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml /docProps/core.xml application/vnd.openxmlformats-package.core-properties+xml 9 rows selected
c. Accessing workbook.xml
The above query (well, similar query) is encapsulated in the oox_sml_util.getContentTypeRef function.
The function accesses the Content_Types table to get the part name corresponding to the p_contenttype parameter (here /xl/workbook.xml) and returns a REF to the instance in the workbook table.
SQL> SELECT wb.*
2 FROM oox_sml_workbook t
3 , XMLTable(
4 XMLNamespaces( 'http://schemas.openxmlformats.org/officeDocument/2006/relationships' as "r",
5 default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main' ),
6 '/workbook/sheets/sheet'
7 passing t.object_value
8 columns sheetname varchar2(31) path '@name'
9 , rid varchar2(30) path '@r:id'
10 ) wb
11 WHERE REF(t) = oox_sml_util.getContentTypeRef('test.xlsx'
12 , 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml')
13 ;
SHEETNAME RID
------------------------------- ------------------------------
Feuil1 rId1
Feuil2 rId2
Feuil3 rId3
Steps b and c are actually not necessary as we’ve already combined them earlier (when creating the workbook resource) and have the result set stored in OOX_SML_WORKBOOK_RELS for further optimal access (Cf. §3b), but since they’re part of the methodology I’m describing them anyway.
d. Accessing worksheets
From now on, I’ll be querying directly from the relations table.
The following retrieves “raw” cell values from the worksheet “Feuil1″ :
SQL> SELECT wsx.* 2 FROM oox_sml_workbook_rels wr 3 , oox_sml_worksheet ws 4 , XMLTable( 5 XMLNamespaces( default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main' ), 6 '/worksheet/sheetData/row' 7 passing ws.object_value 8 columns col1 varchar2(30) path 'c[1]/v' 9 , col2 number path 'c[2]/v' 10 ) wsx 11 WHERE wr.filename = 'test.xlsx' 12 AND wr.sheetname = 'Feuil1' 13 AND ref(ws) = wr.ws_xmlref 14 ; COL1 COL2 ------------------------------ ---------- 2568 2569 2570 1 2571 2 2572 3 2573 4 2574 5 2575 6 2576 7 2577 8 2578 9 ...
e. Joining to shared strings
- Two outer joins, one for each cell :
SQL> SELECT v.rn 2 , nvl(sst1.val, v.cell1) as cell1 3 , nvl(sst2.val, v.cell2) as cell2 4 FROM ( 5 SELECT wr.filename, wsx.* 6 FROM oox_sml_workbook_rels wr 7 , oox_sml_worksheet ws 8 , XMLTable( 9 XMLNamespaces(default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main') 10 , '/worksheet/sheetData/row' 11 passing ws.object_value 12 columns rn number path '@r' 13 , cell1 number path 'c[1]', type1 varchar2(10) path 'c[1]/@t' 14 , cell2 number path 'c[2]', type2 varchar2(10) path 'c[2]/@t' 15 ) wsx 16 WHERE wr.filename = 'test.xlsx' 17 AND wr.sheetname = 'Feuil1' 18 AND ref(ws) = wr.ws_xmlref 19 ) v 20 LEFT OUTER JOIN oox_sml_sst_table sst1 ON sst1.filename = v.filename AND sst1.idx = v.cell1 AND type1 = 's' 21 LEFT OUTER JOIN oox_sml_sst_table sst2 ON sst2.filename = v.filename AND sst2.idx = v.cell2 AND type2 = 's' 22 ; RN CELL1 CELL2 ---------- --------------- --------------- 1 COLUMN1 COLUMN2 2 TEST00001 1 3 TEST00002 2 4 TEST00003 3 5 TEST00004 4 6 TEST00005 5 7 TEST00006 6 8 TEST00007 7 9 TEST00008 8 10 TEST00009 9 ... - Cells extracted into separate rows + a single outer join + pivot :
SELECT rn , min(case when cn = 'A' then nvl(sst.val, v.val) end) as cell_a , min(case when cn = 'B' then nvl(sst.val, v.val) end) as cell_b FROM ( SELECT wr.filename, wsx.rn, substr(wsc.cn,1,1) as cn, wsc.val, wsc.type FROM oox_sml_workbook_rels wr , oox_sml_worksheet ws , XMLTable( XMLNamespaces(default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main') , '/worksheet/sheetData/row' passing ws.object_value columns rn number path '@r' , cells xmltype path 'c' ) wsx , XMLTable( XMLNamespaces(default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main') , '/c' passing wsx.cells columns cn varchar2(30) path '@r' , val number path 'v' , type varchar2(10) path '@t' ) wsc WHERE wr.filename = 'test.xlsx' AND wr.sheetname = 'Feuil1' AND ref(ws) = wr.ws_xmlref ) v LEFT OUTER JOIN oox_sml_sst_table sst ON sst.filename = v.filename AND v.type = 's' AND sst.idx = v.val GROUP BY v.rn ; - Using a lookup function in the SELECT :
SELECT wsx.rn , case when type1 = 's' then oox_sml_util.getValue2(wr.filename, wsx.cell1) else cell1 end as cell1 , case when type2 = 's' then oox_sml_util.getValue2(wr.filename, wsx.cell2) else cell2 end as cell2 FROM oox_sml_workbook_rels wr , oox_sml_worksheet ws , XMLTable( XMLNamespaces(default 'http://schemas.openxmlformats.org/spreadsheetml/2006/main') , '/worksheet/sheetData/row' passing ws.object_value columns rn number path '@r' , cell1 varchar2(4000) path 'c[1]', type1 varchar2(10) path 'c[1]/@t' , cell2 varchar2(4000) path 'c[2]', type2 varchar2(10) path 'c[2]/@t' ) wsx WHERE wr.filename = 'test.xlsx' AND wr.sheetname = 'Feuil1' AND ref(ws) = wr.ws_xmlref ;
5) Performance consideration
Unfortunately, neither of the queries above are satisfying in terms of performance.
The “pivot” solution has the longest response time but fetches the whole result set (50,001 rows) a lot faster (less IOs).
The other two solutions fetches the first rows almost immediately, but show a lot more IOs and a process time much higher.
Below are the respective SQL*Plus traces of the three queries :
“n-JOIN” query
Elapsed: 00:05:43.77
Execution Plan
----------------------------------------------------------
Plan hash value: 133199948
------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 8814K| | 1180 (1)| 00:00:15 |
|* 1 | HASH JOIN RIGHT OUTER | | 8168 | 8814K| 1808K| 1180 (1)| 00:00:15 |
| 2 | TABLE ACCESS FULL | OOX_SML_SST_TABLE | 50004 | 1220K| | 69 (2)| 00:00:01 |
|* 3 | HASH JOIN RIGHT OUTER | | 8168 | 8614K| 1808K| 601 (1)| 00:00:08 |
| 4 | TABLE ACCESS FULL | OOX_SML_SST_TABLE | 50004 | 1220K| | 69 (2)| 00:00:01 |
| 5 | VIEW | | 8168 | 8415K| | 31 (0)| 00:00:01 |
| 6 | NESTED LOOPS | | 8168 | 1746K| | 31 (0)| 00:00:01 |
| 7 | NESTED LOOPS | | 1 | 213 | | 2 (0)| 00:00:01 |
| 8 | TABLE ACCESS BY INDEX ROWID| OOX_SML_WORKBOOK_RELS | 1 | 44 | | 1 (0)| 00:00:01 |
|* 9 | INDEX UNIQUE SCAN | OOX_SWR_PK | 1 | | | 0 (0)| 00:00:01 |
| 10 | TABLE ACCESS BY INDEX ROWID| OOX_SML_WORKSHEET | 3 | 507 | | 1 (0)| 00:00:01 |
|* 11 | INDEX UNIQUE SCAN | SYS_C007137 | 1 | | | 0 (0)| 00:00:01 |
| 12 | XPATH EVALUATION | | | | | | |
------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("SST2"."IDX"(+)="V"."CELL2" AND "SST2"."FILENAME"(+)="V"."FILENAME" AND "V"."TYPE2"=CASE
WHEN ("SST2"."FILENAME"(+) IS NOT NULL) THEN 's' ELSE 's' END )
3 - access("SST1"."IDX"(+)="V"."CELL1" AND "SST1"."FILENAME"(+)="V"."FILENAME" AND "TYPE1"=CASE WHEN
("SST1"."FILENAME"(+) IS NOT NULL) THEN 's' ELSE 's' END )
9 - access("WR"."FILENAME"='test.xlsx' AND "WR"."SHEETNAME"='Feuil1')
11 - access("WR"."WS_XMLREF"="WS"."SYS_NC_OID$")
Note
-----
- Unoptimized XML construct detected (enable XMLOptimizationCheck for more information)
Statistics
----------------------------------------------------------
13 recursive calls
0 db block gets
284078 consistent gets
678534 physical reads
0 redo size
1512223 bytes sent via SQL*Net to client
37082 bytes received via SQL*Net from client
3335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
50001 rows processed
“PIVOT” query
Elapsed: 00:01:11.85
Execution Plan
----------------------------------------------------------
Plan hash value: 2144088588
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 66M| 66G| | 3541K (1)| 11:48:18 |
| 1 | HASH GROUP BY | | 66M| 66G| | 3541K (1)| 11:48:18 |
|* 2 | HASH JOIN RIGHT OUTER | | 66M| 66G| 1808K| 3537K (1)| 11:47:35 |
| 3 | TABLE ACCESS FULL | OOX_SML_SST_TABLE | 50004 | 1220K| | 69 (2)| 00:00:01 |
| 4 | VIEW | | 66M| 64G| | 222K (1)| 00:44:26 |
| 5 | NESTED LOOPS | | 66M| 13G| | 222K (1)| 00:44:26 |
| 6 | NESTED LOOPS | | 8168 | 1730K| | 31 (0)| 00:00:01 |
| 7 | NESTED LOOPS | | 1 | 213 | | 2 (0)| 00:00:01 |
| 8 | TABLE ACCESS BY INDEX ROWID| OOX_SML_WORKBOOK_RELS | 1 | 44 | | 1 (0)| 00:00:01 |
|* 9 | INDEX UNIQUE SCAN | OOX_SWR_PK | 1 | | | 0 (0)| 00:00:01 |
| 10 | TABLE ACCESS BY INDEX ROWID| OOX_SML_WORKSHEET | 3 | 507 | | 1 (0)| 00:00:01 |
|* 11 | INDEX UNIQUE SCAN | SYS_C007137 | 1 | | | 0 (0)| 00:00:01 |
| 12 | XPATH EVALUATION | | | | | | |
| 13 | XPATH EVALUATION | | | | | | |
-------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("SST"."IDX"(+)="V"."VAL" AND "SST"."FILENAME"(+)="V"."FILENAME" AND "V"."TYPE"=CASE WHEN
("SST"."FILENAME"(+) IS NOT NULL) THEN 's' ELSE 's' END )
9 - access("WR"."FILENAME"='test.xlsx' AND "WR"."SHEETNAME"='Feuil1')
11 - access("WR"."WS_XMLREF"="WS"."SYS_NC_OID$")
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
50264 consistent gets
151864 physical reads
0 redo size
1512225 bytes sent via SQL*Net to client
37082 bytes received via SQL*Net from client
3335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
50001 rows processed
“Lookup-function” query
Elapsed: 00:09:22.50
Execution Plan
----------------------------------------------------------
Plan hash value: 3026739006
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 1746K| 31 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 8168 | 1746K| 31 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 1 | 213 | 2 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| OOX_SML_WORKBOOK_RELS | 1 | 44 | 1 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | OOX_SWR_PK | 1 | | 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| OOX_SML_WORKSHEET | 3 | 507 | 1 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | SYS_C007137 | 1 | | 0 (0)| 00:00:01 |
| 7 | XPATH EVALUATION | | | | | |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("WR"."FILENAME"='test.xlsx' AND "WR"."SHEETNAME"='Feuil1')
6 - access("WR"."WS_XMLREF"="WS"."SYS_NC_OID$")
Note
-----
- Unoptimized XML construct detected (enable XMLOptimizationCheck for more information)
Statistics
----------------------------------------------------------
50270 recursive calls
0 db block gets
400435 consistent gets
598893 physical reads
0 redo size
1512223 bytes sent via SQL*Net to client
37082 bytes received via SQL*Net from client
3335 SQL*Net roundtrips to/from client
16 sorts (memory)
0 sorts (disk)
50001 rows processed
That concludes this article.
Of course, all comments are welcomed, especially regarding potential improvements of the sharedStrings access.

Didn’t read all your source code but, where possible, fetch data directly from the xdb$resource table and not via the XDB repository. Try cheating a bit more via xdb$resource and “xmldata”. Mark D won’t like it but that probably the fastest access to the “xdb repository”
Hello Marc,
“we just drop the test.xlsx file in the /office/excel/docs folder.”
That’s where I’m stuck. I have to admit that I usually only use a db and I have little knowledge about administration. I simply installed a default db on my laptop and access it via SQL*PLUS and SQL Developer. So when I searched how to “just drop” a test file I came to http://docs.oracle.com/cd/E11882_01/appdev.112/e23094/xdb03usg.htm#ADXDB5993
I have no clue how to access my db using FTP, WebDAV etc. Therefore I tried “Accessing Oracle XML DB Repository Programmatically” somewhere down at the linked page. It says I can use DBMS_XDB.createResource. But when I try to transfer a file I only get “ORA-19233: XQST0013 – invalid pragma” in Handler OOX.OOX_SML_STORE.handlePreCreate.
My code:
DECLARE
res BOOLEAN;
BEGIN
res := DBMS_XDB.createResource(‘/office/excel/docs/test.xlsx’,
bfilename(‘DOC’, ‘test.xlsx’),
nls_charset_id(‘AL32UTF8′));
END;
/
The file/access itself should not be the problem, because when I try the solution from Anton Scheffer It works http://technology.amis.nl/2013/01/19/read-a-excel-xlsx-with-plsql/
I have absolutely no clue how to do it. Can you please guide me?
Regards
Marcus
Hello Marc,
thanks, the hint about the compatible parameter was the solution. Setting it to 11.2.0.3.0 solved the problem
Best Regards
Marcus