How To : Read and Write Office 2003 Excel XML Files
Office Open XML (OOX) has become the default format with the release of Office 2007, but back in the 2003′s days, Microsoft had already developed a format to store Excel workbooks as XML.
A comprehensive overview is available here :
Dive into SpreadsheetML (Part 1 of 2)
Dive into SpreadsheetML (Part 2 of 2)
Contrary to OOX where data and metadata are stored in a multipart archive, an Excel workbook file in SpreadsheetML 2003 format consists in a single XML instance, and therefore easily managed using built-in Oracle XML functions and XML DB features.
In this article, I’ll focus on how to create and read such files with the help of SQL/XML functions, XSLT and XQuery.
1. Writing a file
The minimum valid structure for an instance looks like this :
<?xml version="1.0" encoding="UTF-8"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<Worksheet ss:Name="Sheet1">
<Table>
<Row>
<Cell>
<Data ss:Type="String">Employee No</Data>
</Cell>
<Cell>
<Data ss:Type="String">Employee Name</Data>
</Cell>
<Cell>
<Data ss:Type="String">Job</Data>
</Cell>
</Row>
<Row>
<Cell>
<Data ss:Type="Number">7839</Data>
</Cell>
<Cell>
<Data ss:Type="String">KING</Data>
</Cell>
<Cell>
<Data ss:Type="String">PRESIDENT</Data>
</Cell>
</Row>
<!-- More rows -->
</Table>
</Worksheet>
</Workbook>
It can be generated this way, with SQL/XML functions :
SELECT XMLConcat(
XMLPi("mso-application", 'progid="Excel.Sheet"')
, XMLElement("Workbook",
XMLAttributes(
'urn:schemas-microsoft-com:office:spreadsheet' as "xmlns"
, 'urn:schemas-microsoft-com:office:spreadsheet' as "xmlns:ss"
)
, XMLElement("Worksheet",
XMLAttributes('Sheet1' as "ss:Name")
, XMLElement("Table",
XMLElement("Row",
XMLForest(
XMLElement("Data", XMLAttributes('String' as "ss:Type"), 'Employee No') as "Cell"
, XMLElement("Data", XMLAttributes('String' as "ss:Type"), 'Employee Name') as "Cell"
, XMLElement("Data", XMLAttributes('String' as "ss:Type"), 'Job') as "Cell"
)
)
, XMLAgg(
XMLElement("Row",
XMLForest(
XMLElement("Data", XMLAttributes('Number' as "ss:Type"), e.empno) as "Cell"
, XMLElement("Data", XMLAttributes('String' as "ss:Type"), e.ename) as "Cell"
, XMLElement("Data", XMLAttributes('String' as "ss:Type"), e.job) as "Cell"
)
)
order by e.empno
)
)
)
)
)
FROM scott.emp e
;
Although this query is relatively simple and efficient, we can imagine how cumbersome it could get to write queries for more complex requirements.
So this is where XSLT comes into play. By creating a stylesheet working on a canonical XML input, we can hide the transformation logic and separate the data layer from the presentation layer.
Following is an example generating a multisheet workbook and some additional Excel-specific formattings (frozen headers and tab color set to red for total salaries higher than 10,000).
Here, I first stored the XSLT stylesheet in the XML DB repository. That’s not mandatory, we can also declare the stylesheet inline in the PL/SQL block, but it’s a good practice to keep the stylesheets in the database (repository or XMLType column in a relational table) if we intend to use them with the internal XSLT processor.
The stylesheet (test.xsl) :
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<xsl:output method="xml" encoding="UTF-8"/>
<xsl:template match="/">
<xsl:processing-instruction name="mso-application">progid="Excel.Sheet"</xsl:processing-instruction>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<Styles>
<Style ss:ID="h">
<Interior ss:Color="#C0C0C0" ss:Pattern="Solid"/>
</Style>
</Styles>
<xsl:apply-templates/>
</Workbook>
</xsl:template>
<xsl:template match="ROWSET">
<Worksheet ss:Name="{@name}">
<Table>
<Row>
<xsl:for-each select="ROW[1]/*">
<Cell ss:StyleID="h">
<Data ss:Type="String">
<xsl:value-of select="translate(local-name(), '_', ' ')"/>
</Data>
</Cell>
</xsl:for-each>
</Row>
<xsl:apply-templates/>
</Table>
<x:WorksheetOptions>
<x:FrozenNoSplit/>
<x:SplitHorizontal>1</x:SplitHorizontal>
<x:TopRowBottomPane>1</x:TopRowBottomPane>
<x:ActivePane>2</x:ActivePane>
<xsl:if test="@color">
<x:TabColorIndex><xsl:value-of select="@color"/></x:TabColorIndex>
</xsl:if>
</x:WorksheetOptions>
</Worksheet>
</xsl:template>
<xsl:template match="ROW">
<Row>
<xsl:apply-templates/>
</Row>
</xsl:template>
<xsl:template match="ROW/*">
<Cell>
<Data ss:Type="String">
<xsl:value-of select="."/>
</Data>
</Cell>
</xsl:template>
</xsl:stylesheet>
The transformation code :
DECLARE
xmldoc CLOB;
BEGIN
select xmlserialize(document
xmltransform(
xmlelement("ROOT",
xmlagg(
xmlelement("ROWSET",
xmlattributes(
d.dname as "name"
, case when sum(e.sal) > 10000 then '2' end as "color"
)
, xmlagg(
xmlelement("ROW",
xmlforest(
e.empno as "Employee_No"
, e.ename as "Employee_Name"
, e.job as "Job"
, e.sal as "Salary"
)
) order by e.empno
)
) order by d.deptno
)
)
, xdburitype('/office/excel/stylesheets/out/test.xsl').getXML()
)
as clob
)
into xmldoc
from scott.dept d
join scott.emp e on e.deptno = d.deptno
group by d.deptno
, d.dname
;
dbms_xslprocessor.clob2file(xmldoc, 'TEST_DIR', 'test.xml');
END;
/
The output file :
One of the most used Excel features is the Pivot Table generator. Creating such content is also possible directly from the database, using XSLT.
For instance, here’s some “raw” data :
SQL> select employee_id
2 , first_name
3 , last_name
4 , extract(year from hire_date) as hire_year
5 , job_id
6 from hr.employees
7 ;
EMPLOYEE_ID FIRST_NAME LAST_NAME HIRE_YEAR JOB_ID
----------- -------------------- ------------------------- ---------- ----------
198 Donald OConnell 2007 SH_CLERK
199 Douglas Grant 2008 SH_CLERK
200 Jennifer Whalen 2003 AD_ASST
201 Michael Hartstein 2004 MK_MAN
202 Pat Fay 2005 MK_REP
203 Susan Mavris 2002 HR_REP
204 Hermann Baer 2002 PR_REP
205 Shelley Higgins 2002 AC_MGR
206 William Gietz 2002 AC_ACCOUNT
100 Steven King 2003 AD_PRES
101 Neena Kochhar 2005 AD_VP
102 Lex De Haan 2001 AD_VP
103 Alexander Hunold 2006 IT_PROG
...
195 Vance Jones 2007 SH_CLERK
196 Alana Walsh 2006 SH_CLERK
197 Kevin Feeney 2006 SH_CLERK
107 rows selected
and we want to display, for a given job, the number of employees hired per year.
In SQL, that’s called a dynamic pivot but it’s not possible – with conventional methods – to produce such a result set out of a single SELECT statement (because the number of columns has to be known at parse time).
The PIVOT XML operator (11g) provides a partial answer to the problem by generating an XMLType containing aggregated “columns” (actually XML “elements”). The same functionality can be simulated in 10g too with XMLAgg and a partitioned outer join.
But with that method, we still have to build the pivot in SQL, in the database.
What I describe below let Excel do the job for us, through its standard pivot table functionality. We just have to generate a tab containing the raw data (hereafter named “DataSource”), and a tab (“PivotTable”) containing the minimum pivot table definition, i.e. no data and no cache.
The stylesheet (pivot.xsl) :
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<xsl:output method="xml" encoding="UTF-8"/>
<xsl:param name="filename"/>
<xsl:template match="/">
<xsl:processing-instruction name="mso-application">progid="Excel.Sheet"</xsl:processing-instruction>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<xsl:apply-templates/>
</Workbook>
</xsl:template>
<xsl:template match="ROWSET">
<Worksheet ss:Name="DataSource">
<Table>
<Row>
<Cell><Data ss:Type="String">EMPLOYEE_ID</Data></Cell>
<Cell><Data ss:Type="String">FIRST_NAME</Data></Cell>
<Cell><Data ss:Type="String">LAST_NAME</Data></Cell>
<Cell><Data ss:Type="String">HIRE_YEAR</Data></Cell>
<Cell><Data ss:Type="String">JOB_ID</Data></Cell>
</Row>
<xsl:apply-templates/>
</Table>
</Worksheet>
<Worksheet ss:Name="PivotTable">
<Table/>
<x:PivotTable>
<x:Name>My Pivot Table</x:Name>
<x:ImmediateItemsOnDrop/>
<x:ShowPageMultipleItemLabel/>
<x:GrandTotalString>Total</x:GrandTotalString>
<x:Location>R3C1:R4C2</x:Location>
<x:PivotField>
<x:Name>EMPLOYEE_ID</x:Name>
<x:DataType>Integer</x:DataType>
</x:PivotField>
<x:PivotField>
<x:Name>FIRST_NAME</x:Name>
</x:PivotField>
<x:PivotField>
<x:Name>LAST_NAME</x:Name>
</x:PivotField>
<x:PivotField>
<x:Name>HIRE_YEAR</x:Name>
<x:Orientation>Column</x:Orientation>
<x:AutoSortOrder>Ascending</x:AutoSortOrder>
<x:Position>1</x:Position>
<x:DataType>Integer</x:DataType>
<x:PivotItem>
<x:Name/>
</x:PivotItem>
</x:PivotField>
<x:PivotField>
<x:Name>JOB_ID</x:Name>
<x:Orientation>Row</x:Orientation>
<x:AutoSortOrder>Ascending</x:AutoSortOrder>
<x:Position>1</x:Position>
<x:PivotItem>
<x:Name/>
</x:PivotItem>
</x:PivotField>
<x:PivotField>
<x:DataField/>
<x:Name>Data</x:Name>
<x:Orientation>Row</x:Orientation>
<x:Position>-1</x:Position>
</x:PivotField>
<x:PivotField>
<x:Name>Number of Employees</x:Name>
<x:ParentField>EMPLOYEE_ID</x:ParentField>
<x:Orientation>Data</x:Orientation>
<x:Function>Count</x:Function>
<x:Position>1</x:Position>
</x:PivotField>
<x:PTLineItems>
<x:PTLineItem>
<x:Item>0</x:Item>
</x:PTLineItem>
</x:PTLineItems>
<x:PTLineItems>
<x:Orientation>Column</x:Orientation>
<x:PTLineItem>
<x:Item>0</x:Item>
</x:PTLineItem>
</x:PTLineItems>
<x:PTSource>
<x:RefreshOnFileOpen/>
<x:ConsolidationReference>
<x:FileName>[<xsl:value-of select="$filename"/>]DataSource</x:FileName>
<x:Reference>R1C1:R<xsl:value-of select="count(ROW)+1"/>C5</x:Reference>
</x:ConsolidationReference>
</x:PTSource>
</x:PivotTable>
</Worksheet>
</xsl:template>
<xsl:template match="ROW">
<Row>
<Cell><Data ss:Type="Number"><xsl:value-of select="EMPLOYEE_ID"/></Data></Cell>
<Cell><Data ss:Type="String"><xsl:value-of select="FIRST_NAME"/></Data></Cell>
<Cell><Data ss:Type="String"><xsl:value-of select="LAST_NAME"/></Data></Cell>
<Cell><Data ss:Type="Number"><xsl:value-of select="HIRE_YEAR"/></Data></Cell>
<Cell><Data ss:Type="String"><xsl:value-of select="JOB_ID"/></Data></Cell>
</Row>
</xsl:template>
<xsl:template name="PivotTable">
</xsl:template>
</xsl:stylesheet>
The transformation code :
DECLARE
res clob;
v_filename varchar2(260) := 'test_pivot.xml';
BEGIN
select xmlserialize(document
xmltransform(
xmlelement("ROWSET",
xmlagg(
xmlelement("ROW",
xmlforest(
employee_id
, first_name
, last_name
, extract(year from hire_date) as hire_year
, job_id
)
)
)
)
, xdburitype('/office/excel/stylesheets/out/pivot.xsl').getXML()
, 'filename="'''||v_filename||'''"'
)
)
into res
from hr.employees
;
dbms_xslprocessor.clob2file(res, 'TEST_DIR', v_filename);
END;
/
The output file :
2. Reading a file
I’ll divide this section in two parts : querying and optimizing.
a) “One-shot” queries
Let’s say we want to read this document (saved as XML 2003 format) as if it were a relational table :
As usual, we’ll use an XMLType table to store the original file and then query from it.
Examples in the present article were tested on 11g XE (11.2.0.2) so storage is Binary XML by default :
create table tmp_xml of xmltype;
insert into tmp_xml values(
xmltype(
bfilename('XML_DIR','test.xml')
, nls_charset_id('AL32UTF8')
)
);
The query involves two XMLTable() functions, the first one to break the document into separate worksheets, and the second to extract each row from them :
SQL> select x1.sheetname 2 , x2.id 3 , x2.comments 4 , x2.dt 5 from tmp_xml t 6 , xmltable( 7 xmlnamespaces( default 'urn:schemas-microsoft-com:office:spreadsheet' 8 , 'urn:schemas-microsoft-com:office:spreadsheet' as "ss" ) 9 , '/Workbook/Worksheet' 10 passing t.object_value 11 columns sheetname varchar2(31) path '@ss:Name' 12 , rowset xmltype path 'Table/Row' 13 ) x1 14 , xmltable( 15 xmlnamespaces(default 'urn:schemas-microsoft-com:office:spreadsheet') 16 , '/Row[position()>1]' 17 passing x1.rowset 18 columns id number path 'Cell[1]/Data' 19 , comments varchar2(2000) path 'Cell[2]/Data' 20 , dt timestamp path 'substring-before(Cell[3]/Data,".")' 21 ) x2 22 where x1.sheetname = 'MyData-1' 23 ; SHEETNAME ID COMMENTS DT --------------- ---------- -------------------------------------------------- ------------------------- MyData-1 1 This is a comment for line #1 09/02/12 12:09:37,000000 MyData-1 2 This is a comment for line #2 10/02/12 12:09:36,000000 MyData-1 3 This is a comment for line #3 11/02/12 12:09:36,000000 MyData-1 4 This is a comment for line #4 12/02/12 12:09:36,000000 MyData-1 5 This is a comment for line #5 13/02/12 12:09:36,000000 MyData-1 6 This is a comment for line #6 14/02/12 12:09:36,000000 MyData-1 7 This is a comment for line #7 15/02/12 12:09:36,000000 MyData-1 8 This is a comment for line #8 16/02/12 12:09:36,000000 MyData-1 9 This is a comment for line #9 17/02/12 12:09:36,000000 MyData-1 10 This is a comment for line #10 18/02/12 12:09:36,000000 MyData-1 11 This is a comment for line #11 19/02/12 12:09:36,000000 MyData-1 12 This is a comment for line #12 20/02/12 12:09:36,000000 MyData-1 13 This is a comment for line #13 21/02/12 12:09:36,000000 MyData-1 14 This is a comment for line #14 22/02/12 12:09:36,000000 MyData-1 15 This is a comment for line #15 23/02/12 12:09:36,000000 MyData-1 16 This is a comment for line #16 24/02/12 12:09:36,000000 16 rows selected
b) Optimized access of the document
If loading these documents in the database is a recurring task then, provided the structure doesn’t change, queries on the data can be optimized by creating a structured XML index on the XMLType table.
With such an index in place, and depending on the size of the document, there could be a significant overhead at insert time, but it’s a trade-off : subsequent queries will be considerably faster.
Here’s a small test case based on the following document (a 50,000-row worksheet, no header) :
Document properties (I’ll define a virtual column to hold the title property) :

Set up and query plan :
-- Table creation :
create table ext_smldata of xmltype
xmltype store as binary xml
virtual columns (
title as (
XMLCast(
XMLQuery(
'declare default element namespace "urn:schemas-microsoft-com:office:spreadsheet"; (::)
declare namespace o = "urn:schemas-microsoft-com:office:office"; (::)
/Workbook/o:DocumentProperties/o:Title'
passing object_value returning content
)
as varchar2(200)
)
)
);
-- Index on the "TITLE" virtual column :
create index ext_smldata_title_idx on ext_smldata (title);
-- Structured XML index on the table :
create index ext_smldata_sxi on ext_smldata (object_value)
indextype is xdb.xmlindex
parameters (q'#
XMLTable ext_smldata_xtb
XMLNamespaces (default 'urn:schemas-microsoft-com:office:spreadsheet')
, '/Workbook/Worksheet/Table/Row'
COLUMNS rec_id NUMBER PATH 'Cell[1]/Data/text()'
, description VARCHAR2(80) PATH 'Cell[2]/Data/text()'
, rec_value VARCHAR2(30) PATH 'Cell[3]/Data/text()'
#');
-- Insert :
insert into ext_smldata values(
xmltype(
bfilename('XML_DIR','smldata.xml')
, nls_charset_id('AL32UTF8')
)
);
SQL> set timing on
SQL> set autotrace traceonly
SQL> SELECT x.*
2 FROM ext_smldata t
3 , XMLTable(
4 XMLNamespaces (default 'urn:schemas-microsoft-com:office:spreadsheet')
5 , '/Workbook/Worksheet/Table/Row'
6 PASSING t.object_value
7 COLUMNS rec_id NUMBER PATH 'Cell[1]/Data/text()'
8 , description VARCHAR2(80) PATH 'Cell[2]/Data/text()'
9 , rec_value VARCHAR2(30) PATH 'Cell[3]/Data/text()'
10 ) x
11 WHERE t.title = 'SampleData1'
12 ;
50000 rows selected.
Elapsed: 00:00:01.69
Execution Plan
----------------------------------------------------------
Plan hash value: 3987672269
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 59520 | 5405K| 174 (2)| 00:00:03 |
|* 1 | HASH JOIN | | 59520 | 5405K| 174 (2)| 00:00:03 |
| 2 | TABLE ACCESS BY INDEX ROWID| EXT_SMLDATA | 1 | 29 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | EXT_SMLDATA_TITLE_IDX | 1 | | 1 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL | EXT_SMLDATA_XTB | 59520 | 3720K| 171 (1)| 00:00:03 |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T"."SYS_NC_OID$"="SYS_SXI_0"."OID")
3 - access("T"."TITLE"='SampleData1')
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3962 consistent gets
77 physical reads
4796 redo size
2823321 bytes sent via SQL*Net to client
37083 bytes received via SQL*Net from client
3335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
50000 rows processed
The explain plan shows that the underlying relational table supporting the XML index is used to retrieve the data.



Hi odie
I have an xml file which is stored in a table as xmltype from which i need an output like :-
R_NO REQ STATUS_CODE
123 525 JOB
123 560 CNCLD
”
123
525
JOB
123
560
CNCLD
”
My problem here the column names i cannot hardcode for example PASSING XML_VALUE
columns R_NO VARCHAR2(20) PATH ‘R_NO’
is not the solution i need to get my code has to traverse the xml and each node has to transform into a column , i cannot hardcode column name R_NO as above. runtime it has to identify, any idea?
Hi,
Sorry I’m not sure I understand.
If you have an unknown number of resulting columns then you can’t do it with a single SELECT.
I’d be glad to help though, but need more details. Could you post your issue in the XML DB forum?
Make sure you provide :
Thanks.
Hi Odie,
How do you specifiy the index parameter when this index spec is longer than 1000 chars ?
That would be the index ext_smldata_sxi in your example.
In my case the spec is longer than 1000 chars and thus can not be created with parameters construct of your example.
Hope you can help.
Mike
Hi Mike,
You’ll have to do it in two steps : registering the parameter (we can use CLOB there), then creating the index by referencing the parameter.
begin dbms_xmlindex.registerparameter( 'ext_smldata_xtb_param' , 'XMLTable ext_smldata_xtb ... big text content here ... ' ); end; / create index ext_smldata_sxi on ext_smldata (object_value) indextype is xdb.xmlindex parameters ('PARAM ext_smldata_xtb_param') ;Odie,
This artcile was soooo incredibly helpful to me. You have provided me with a fantastic solution to a major issue that has been hanging over my head for months. A million thanks!
One question: Given than the spreadhsheet column-names are in the top row of the sheet (in the X1 table?), would it somehow be possible to read in the column-names rather than needing to list them (hard-coded) under the COLUMNS clause under X2?
–Jon
Hi Jon,
No, it’s not possible with a static query, the SQL projection must be known at parse time.
It can be done with dynamic SQL though, in two steps :
1- Read the top row to find column names
2- Build an SQL string using these names and access the data via a REF CURSOR (or DBMS_SQL package)
How to display multiple row within a single cell.
For eg: In Cell “A1″ i need to display
1-2
3-4
(One below the other).
I had tried with ,, CDATA, 
,&#A,&#a. That is not working for me.
Can anyone please help me in this issue ???
This one is tricky.
You have to replace LF characters with the corresponding entity, e.g.
XMLElement("ITEM", replace(my_col, chr(10), ' '))Then in the stylesheet, use disable-output-escaping attribute so that is preserved in the output :
<xsl:value-of select="ITEM" disable-output-escaping="yes"/>