June 21, 2016March 27, 2018 Marc

Oracle SQL – Reading an Excel File (xlsx) as an External Table

I’ve been thinking about it for quite a long time and never really had time to implement it, but it’s finally there : a pipelined table interface to read an Excel file (.xlsx) as if it were an external table.

It’s entirely implemented in PL/SQL using an object type (for the ODCI routines) and a package supporting the core functionalities.
Available for download on GitHub :

/mbleron/ExcelTable

Usage

ExcelTable.getRows table function :

This is the actual SQL interface, to be used in conjunction with the TABLE operator.
It returns an ANYDATASET instance whose structure is defined by the p_cols parameter.

  function getRows (
    p_file   in  blob
  , p_sheet  in  varchar2
  , p_cols   in  varchar2
  , p_range  in  varchar2 default null
  ) 
  return anydataset pipelined
  using ExcelTableImpl;

Arg	Data type	Desc	Mandatory
`p_file`	BLOB	Input Excel file in Office Open XML format (.xlsx or .xlsm). A helper function (`ExcelTable.getFile`) is available to directly reference the file from a directory.	Yes
`p_sheet`	VARCHAR2	Worksheet name	Yes
`p_cols`	VARCHAR2	Column list (see specs below)	Yes
`p_range`	VARCHAR2	Excel-like range expression that defines the table boundaries in the worksheet (see specs below)	No

Range syntax specification

 range_expr ::= ( cell_ref [ ":" cell_ref ] | col_ref ":" col_ref | row_ref ":" row_ref )
 cell_ref   ::= col_ref row_ref
 col_ref    ::= { "A".."Z" }
 row_ref    ::= integer

If the range is empty, the table implicitly starts at cell A1.

Otherwise, there are four ways to specify the table range :

Range of rows : '1:100'
In this case the range of columns implicitly starts at A.

Range of columns : 'B:E'
In this case the range of rows implicitly starts at 1.

Range of cells (top-left to bottom-right) : 'B2:F150'
Single cell anchor (top-left cell) : 'C3'

Columns syntax specification

The syntax is similar to the column list in a CREATE TABLE statement, with a couple of specifics :

 column_list    ::= column_expr { "," column_expr }
 column_expr    ::= ( identifier datatype [ "column" string_literal ] | identifier for_ordinality )
 datatype       ::= ( number_expr | varchar2_expr | date_expr | clob_expr )
 number_expr    ::= "number" [ "(" ( integer | "*" ) [ "," [ "-" ] integer ] ")" ]
 varchar2_expr  ::= "varchar2" "(" integer [ "char" | "byte" ] ")"
 date_expr      ::= "date" [ "format" string_literal ]
 clob_expr      ::= "clob"
 for_ordinality ::= "for" "ordinality"
 identifier     ::= "\"" { char } "\""
 string_literal ::= "'" { char } "'"

Column names must be declared using a quoted identifier.

Supported data types are :

NUMBER – with optional precision and scale specs

VARCHAR2 – including CHAR/BYTE semantics
Values larger than the maximum length declared are silently truncated and no error is reported.

DATE – with optional format mask
The format mask is used if the value is stored as text in the spreadsheet, otherwise the date value is assumed to be stored as date in Excel’s internal serial format.

CLOB

A special “FOR ORDINALITY” clause (like XMLTABLE or JSON_TABLE’s one) is also available to autogenerate a sequence number.

Each column definition (except for the one qualified with FOR ORDINALITY) may be complemented with an optional “COLUMN” clause to explicitly target a named column in the spreadsheet, instead of relying on the order of the declarations (relative to the range).
Positional and named column definitions cannot be mixed.

For instance :

  "RN"    for ordinality
, "COL1"  number
, "COL2"  varchar2(10)
, "COL3"  varchar2(4000)
, "COL4"  date           format 'YYYY-MM-DD'
, "COL5"  number(10,2)
, "COL6"  varchar2(5)

  "COL1"  number        column 'A'
, "COL2"  varchar2(10)  column 'C'
, "COL3"  clob          column 'D'

Examples

Using this sample file :

1- Loading all six columns, starting at cell A2, in order to skip the header :

SQL> select t.*
  2  from table(
  3         ExcelTable.getRows(
  4           ExcelTable.getFile('TMP_DIR','ooxdata3.xlsx')
  5         , 'DataSource'
  6         , ' "SRNO"    number
  7           , "NAME"    varchar2(10)
  8           , "VAL"     number
  9           , "DT"      date
 10           , "SPARE1"  varchar2(6)
 11           , "SPARE2"  varchar2(6)'
 12         , 'A2'
 13         )
 14       ) t
 15  ;

      SRNO NAME              VAL DT                  SPARE1 SPARE2
---------- ---------- ---------- ------------------- ------ ------
         1 LINE-00001 66916.2986 13/10/1923 11:45:52
         2 LINE-00002 96701.3427 05/09/1906 10:12:35
         3 LINE-00003 68778.8698 23/01/1911 09:26:22        OK
         4 LINE-00004  95110.028 03/05/1907 13:52:30        OK
         5 LINE-00005 62561.5708 04/04/1927 18:10:39
         6 LINE-00006 28677.1166 11/07/1923 15:10:59        OK
         7 LINE-00007 16141.0202 20/11/1902 02:02:24
         8 LINE-00008 80362.6256 19/09/1910 14:06:42
         9 LINE-00009 10384.1973 16/07/1902 04:54:12
        10 LINE-00010  5266.9097 08/08/1921 11:51:34
        11 LINE-00011 12513.0679 01/07/1908 21:53:55
        12 LINE-00012 66596.9707 22/03/1913 05:20:10

...

        95 LINE-00095 96274.2193 08/04/1914 22:48:31
        96 LINE-00096  29783.146 06/04/1915 23:49:23
        97 LINE-00097 19857.7661 16/02/1909 21:21:52
        98 LINE-00098 19504.3969 05/12/1917 01:56:05
        99 LINE-00099 98675.8673 05/06/1906 17:41:10
       100 LINE-00100 24288.2885 22/07/1920 13:25:59

100 rows selected.

2- Loading columns B and F only, from rows 2 to 10, with a generated sequence :

SQL> select t.*
  2  from table(
  3         ExcelTable.getRows(
  4           ExcelTable.getFile('TMP_DIR','ooxdata3.xlsx')
  5         , 'DataSource'
  6         , q'{
  7             "R_NUM"   for ordinality
  8           , "NAME"    varchar2(10) column 'B'
  9           , "SPARE2"  varchar2(6)  column 'F'
 10           }'
 11         , '2:10'
 12         )
 13       ) t
 14  ;

     R_NUM NAME       SPARE2
---------- ---------- ------
         1 LINE-00001
         2 LINE-00002
         3 LINE-00003 OK
         4 LINE-00004 OK
         5 LINE-00005
         6 LINE-00006 OK
         7 LINE-00007
         8 LINE-00008
         9 LINE-00009

9 rows selected.

Posted in Miscellaneous, XML DBTagged Excel, ExcelTable, ExcelTableImpl, ODCI, Office Open XML, xlsx30 Comments

30 thoughts on “Oracle SQL – Reading an Excel File (xlsx) as an External Table”

paulzip says:

July 1, 2016 at 19:12

This is an excellent piece of work on so many levels. The grammar for mapping types is excellent.

Reply
Anonymous says:

October 24, 2016 at 20:48

Odie,
I am trying to use your, Code for a POC, but I am stuck at the point in the oox_install.sql Script where there ina Anonymous Block,
declare
res boolean;
begin
res := DBMS_XDB.createFolder(‘/office’);
res := DBMS_XDB.createFolder(‘/office/excel’);
res := DBMS_XDB.createFolder(‘/office/excel/conf’);
res := DBMS_XDB.createFolder(‘/office/excel/temp’);
res := DBMS_XDB.createFolder(‘/office/excel/docs’);
end;
/

I am getting the Following Error
[Error] Execution (77: 1): ORA-31050: Access denied
ORA-06512: at “XDB.DBMS_XDB”, line 340
ORA-06512: at line 4

I requested my DBA to Provide XDBADMIN to my schema, still the Same.

What could be wrong.?

Reply
1. Marc says:
  
  October 25, 2016 at 14:45
  
  Seems like you want to use the demo from this post : https://odieweblog.wordpress.com/2012/01/28/xml-db-events-reading-an-open-office-xml-document-xlsx/
  yet you’re posting your comment in another post.
  What do you want to do?
  Try EXCELTABLE package instead.
  
  Reply
Anand says:

October 25, 2016 at 16:53

Odie,
Thanks for your Reply, I apologize for posting this Comment in the Wrong window, I had Opened Almost All your XML + Reading Excel file related article pages in my browser.

Reply
Anonymous says:

March 8, 2017 at 18:09

Hi,

The solution sounds promising, but I followed your installation steps on GitHub, and I got following compilation errrors for “ExcelTable.pkb”:

Error(979,3): PL/SQL: Item ignored
Error(989,3): PLS-00311: the declaration of “db.office.spreadsheet.ReadContext.initialize(java.sql.Blob, java.sql.Blob, java.lang.String, int, int, int) return int” is incomplete or malformed
Error(992,3): PL/SQL: Item ignored
Error(995,3): PLS-00311: the declaration of “db.office.spreadsheet.ReadContext.iterate(int, int) return java.sql.Array” is incomplete or malformed
Error(998,3): PL/SQL: Item ignored
Error(1000,3): PLS-00311: the declaration of “db.office.spreadsheet.ReadContext.terminate(int)” is incomplete or malformed

Any ideas/suggestions?

Thanks

Reply
1. Marc says:
  
  March 11, 2017 at 12:12
  
  What’s your database version? (select * from v$version)
  
  Reply
Robert Chalton says:

May 30, 2017 at 18:07

Very useful package, thanks for sharing. I was trying to use as part of an ETL process written in PL/SQL. I have the call to your package within a cursor. When I run the SQL bit as SQL it works fine. But when I try to compile the exact same SQL but in a PL/SQL cursor (cursor xyz is select … from table(exceltable.getrows….) I get ORA-22905, cannot access rows from a non-nested table item. Any clues? Thanks again.

Reply
1. Marc says:
  
  May 31, 2017 at 11:14
  
  Hi Robert, thanks for your feedback.
  It is a known error (unfortunately) when trying to use table functions returning AnyDataset within PL/SQL.
  Apparently, Oracle is not able to call the ODCITableDescribe routine at compile time and therefore cannot retrieve the projection from the SELECT statement.
  The workaround is to use dynamic SQL :
  OPEN my_refcursor FOR 'SELECT ...' USING <bind variables>
  
  Reply
  1. Robert Chalton says:
    
    June 7, 2017 at 16:52
    
    Thanks for feedback – that’s exactly how I got around the issue. Good to know I’m not totally stupid :)
  2. Marc says:
    
    June 12, 2017 at 20:52
    
    FYI, in ExcelTable 1.4, I’ve added a new function getCursor() with same parameters as getRows() and returning a REF cursor.
Anonymous says:

August 17, 2017 at 17:36

The solution seems to have everything we need for our problem. I just have one question. Our Excel files are stored in a table as BLOB. Can you please post an example how getRows() can be used with BLOB?

Reply
1. Marc says:
  
  August 18, 2017 at 23:19
  That’s easy. Just pass your BLOB column to the p_file parameter :
```
SELECT ...
FROM my_table t
   , TABLE(
       ExcelTable.getRows(
         p_file   => t.my_blob_col
       , p_sheet  => 'sheetName'
       , p_cols   => ' ... COLUMNS ... '
       , p_range  => 'A2'
       , p_method => 0
       )
     ) x
WHERE ...
;
```
  Reply
Ryan Docena says:

November 13, 2017 at 08:37

Hi, this is excellent work. Could you provide an example using getCursor() fetched into a record variable? Thanks.

Reply
Anonymous says:

November 20, 2017 at 15:42

Hello. I’m trying to use your code with example 1 to test but I can’t run it. Surely a mistake from my side, but i don’t find the clue …
Directory created and granted sur my user REF
file ooxdata3.xlsx exists and test of function getfile is OK

But this code :

select t.*
from table(
ExcelTable.getRows(
ExcelTable.getFile(‘TMP_DIR’,’ooxdata3.xlsx’)
, ‘DataSource’
, ‘ “SRNO” number
, “NAME” varchar2(10)
, “VAL” number
, “DT” date
, “SPARE1” varchar2(6)
, “SPARE2” varchar2(6)’
, ‘A2’
)
) t
;

return a PLS-00225 : line 4 col 6 subprogram of cursor ‘REF’ is out of scope.

Any idea ?

Reply
1. Anonymous says:
  
  November 22, 2017 at 13:50
  
  I’m on oracle 11.2.0.3 x64 and windows 2008 R2 … All is well compiled, I don’t know where to search.
  
  Reply
2. Anonymous says:
  
  November 23, 2017 at 09:49
  
  I’ve done another test on oracle 12c and it works.
  
  Reply
Demis Isella says:

December 15, 2017 at 18:00

Hello. I’m trying to use your code with example 1 to test but I can’t run it.

select t.*
from table(
ExcelTable.getRows(
ExcelTable.getFile(‘TMP_DIR’,’ooxdata3.xlsx’)
, ‘DataSource’
, ‘ “SRNO” number
, “NAME” varchar2(10)
, “VAL” number
, “DT” date
, “SPARE1” varchar2(6)
, “SPARE2” varchar2(6)’
, ‘A2’
)
) t
;

return ORA-00955: name is already used by an existing object al line 1121 in EXCELTABLE object.
At line 1121 your code tries to create a global temporary table.
WHat can i do?
Thanks

Reply
Ahmad Arafeh says:

January 10, 2018 at 10:20

How can we use This query inside SP or PL/SQL block? I get error always cannot access rows from non-nested table

Reply
1. Marc says:
  
  January 10, 2018 at 19:41
  
  It’s explained in one of the comments above : use dynamic SQL, or ExcelTable.getCursor() if you want to use the query in a cursor (see the README).
  
  Reply
hari Chattopadhyay says:

July 10, 2018 at 01:01

Great work. It helped me a lot. Thank you for sharing.
It’s working nicely for small file for me but
while inserting a large excel file 200000 rows into oracle table, I getting
insert /+APPEND/ into REASSIGNMENT_tmp
*
ERROR at line 1:
ORA-31011: XML parsing failed
ORA-19202: Error occurred in XML processing
LPX-00664: VM Node-Stack overflow.
ORA-06512: at “XDB.DBMS_XSLPROCESSOR”, line 743
ORA-06512: at “XDB.DBMS_XSLPROCESSOR”, line 771
ORA-06512: at “HARI.EXCELTABLE”, line 1760
ORA-06512: at “HARI.EXCELTABLE”, line 1862
ORA-06512: at “HARI.EXCELTABLE”, line 2295
ORA-06512: at “HARI.EXCELTABLEIMPL”, line 60
ORA-06512: at line 1

Please advice;
2) I need to commit every 10000 record, what method you suggest ?

Reply
1. Marc says:
  
  July 10, 2018 at 19:41
  
  For large files, you’ll have to use the streaming read method (requires Java).
  See the README for instructions, and this post.
  
  About your other question, I’ve never understood the need for any “COMMIT every N record” requirement.
  Anyway, it’s not possible with a single INSERT/SELECT statement, let alone with the APPEND hint.
  I suggest you use the cursor-based approach. See procedure ExcelTable.getCursor() and insert in a loop, then you’ll be able to implement your commit-every stuff.
  
  Regards,
  Marc.
  
  Reply
Anu says:

August 16, 2018 at 11:29

How we have to specify the Boolean data type

Reply
1. Marc says:
  
  August 17, 2018 at 21:38
  
  What do you mean? Boolean data type from Excel?
  
  Reply
Bryan says:

August 16, 2018 at 17:19
I am trying to create a materialized view from the select of the table. The direct select works fine, but when I place into the materialized view the it only takes the first column (for each column …

Has Anyone come across this issue? is there a way to resolve?
Is there something in my code I could use to fix this?

In the following example, the Materialized view give the value of the row_number in each column…

SELECT xl.*
FROM TABLE (IDATA.PAK_ExcelTable.F_getRows (
IDATA.PAK_ExcelTable.F_getFile (‘IDATA_DATA’, ‘EMIDB.XLSX’),
‘Sheet1’,
‘”ROW_NUMBER” for ordinality
, “OPEN_INCIDENT_FLAG” varchar2(5) column ”A”
, “BAND_NUMBER” number(10, 0) column ”B”
, “EMAP_ELIGIBLE_FLAG” varchar2(1) column ”C”
, “MAIN_CATEGORY” varchar2(5) column ”D”’,
‘A2’)) xl;

ROW_NUMBER OPEN_INCIDENT_FLAG BAND_NUMBER EMAP_ELIGIBLE_FLAG MAIN_CATEGORY
```
     1 N                          290 Y                  E            
     2 N                          261 Y                  E            
     3 N                          264 Y                  E            
     4 N                          269 Y                  E            
     5 N                          274 Y                  E            
```
CREATE MATERIALIZED VIEW BB_TEST (ROW_NUMBER,OPEN_INCIDENT_FLAG,BAND_NUMBER,EMAP_ELIGIBLE_FLAG,MAIN_CATEGORY)
BUILD IMMEDIATE
REFRESH FORCE ON DEMAND
WITH PRIMARY KEY
AS
/* Formatted on 8/15/2018 11:52:16 AM (QP5 v5.269.14213.34746) /
SELECT xl.
FROM TABLE (IDATA.PAK_ExcelTable.F_getRows (
IDATA.PAK_ExcelTable.F_getFile (‘IDATA_DATA’, ‘EMIDB.XLSX’),
‘Sheet1’,
‘”ROW_NUMBER” for ordinality
, “OPEN_INCIDENT_FLAG” varchar2(5) column ”A”
, “BAND_NUMBER” number(10, 0) column ”B”
, “EMAP_ELIGIBLE_FLAG” varchar2(1) column ”C”
, “MAIN_CATEGORY” varchar2(5) column ”D”’,
‘A2’)) xl;

select * from bb_test
;
ROW_NUMBER OPEN_INCIDENT_FLAG BAND_NUMBER EMAP_ELIGIBLE_FLAG MAIN_CATEGORY
```
     1                  1           1                  1             1
     2                  2           2                  2             2
     3                  3           3                  3             3
     4                  4           4                  4             4
     5                  5           5                  5             5
```
Reply
1. Marc says:
  
  August 17, 2018 at 21:50
  
  I can’t reproduce the problem. What’s your exact database version please?
  
  Reply
Anonymous says:

October 31, 2018 at 14:07

Hello, please help with the error:
ORA-20722: Error at position 1 : unexpected symbol ” instead of ”
ORA-06512: на “DWH.EXCELTABLE”, line 1041
ORA-06512: на “DWH.EXCELTABLE”, line 1414
ORA-06512: на “DWH.EXCELTABLE”, line 1427
ORA-06512: на “DWH.EXCELTABLE”, line 1574
ORA-06512: на “DWH.EXCELTABLE”, line 1585
ORA-06512: на “DWH.EXCELTABLE”, line 1597
ORA-06512: на “DWH.EXCELTABLE”, line 2688
ORA-06512: на “DWH.EXCELTABLEIMPL”, line 16
ORA-06512: на line 4

select t.*
from table(
ExcelTable.getRows(
ExcelTable.getFile(‘TMP_DIR’,’ooxdata3.xlsx’)
, ‘DataSource’
, ‘ “SRNO” number
, “NAME” varchar2(10)
, “VAL” number
, “DT” date
, “SPARE1” varchar2(6)
, “SPARE2” varchar2(6)’
, ‘A2’
)
) t
;

Reply
Lan Nguyen says:

December 9, 2019 at 20:56

Hello,
Just found this post and it’s exactly what I for my requirement. However, it seems the Excel file must be on the DB server for the store function to work, correct? How can we load the Excel file that reside on Window client since I don’t have access to the DB server? Thank you in advance,

Reply
Marc says:

December 28, 2019 at 09:23

Hi, sorry for the late reply.
You are correct, the file must reside in a location the DB server has access to, not necessarily the DB server itself though, could be a NFS (Network File System) for example.
To address your situation, a classical approach is to mount a shared Windows directory on the DB server, so that end users may deposit files to be read by the DB.
If such a setup is not possible, the only other option I see is to send the file to the server as a separate step (e.g. SFTP/FTP protocol), or directly load the file in a BLOB in a temp table using SQL*Loader, you’ll then be able to use ExcelTable over the BLOB.

Reply
Alex K says:

October 30, 2020 at 01:19

This is really cool Thank you for sharing.

DATE type columns gave me some trouble initially, but turns out that was related to the header row not being a valid DATE format.

Is there a good way to have this ignore the row of headers, specifically when the column needs to be DATE type?

Reply
Alex K says:

October 30, 2020 at 01:27

I just answered my own question. Adding ,’A2′ at the end of the selection, as you show in an example, will resolve the header situation.

Reply

Odie's Oracle Blog

XML & JSON functionalities in the database, and related stuff…

Oracle SQL – Reading an Excel File (xlsx) as an External Table

Usage

Range syntax specification

Columns syntax specification

Examples

30 thoughts on “Oracle SQL – Reading an Excel File (xlsx) as an External Table”

Leave a reply to Marc Cancel reply

Usage

Range syntax specification

Columns syntax specification

Examples

Share this:

Related

30 thoughts on “Oracle SQL – Reading an Excel File (xlsx) as an External Table”

Leave a reply to Marc Cancel reply