I’ve been thinking about it for quite a long time and never really had time to implement it, but it’s finally there : a pipelined table interface to read an Excel file (.xlsx) as if it were an external table.
It’s entirely implemented in PL/SQL using an object type (for the ODCI routines) and a package supporting the core functionalities.
Available for download on GitHub :
Usage
ExcelTable.getRows
table function :
This is the actual SQL interface, to be used in conjunction with the TABLE operator.
It returns an ANYDATASET instance whose structure is defined by the p_cols parameter.
function getRows ( p_file in blob , p_sheet in varchar2 , p_cols in varchar2 , p_range in varchar2 default null ) return anydataset pipelined using ExcelTableImpl;
Arg | Data type | Desc | Mandatory |
---|---|---|---|
p_file |
BLOB | Input Excel file in Office Open XML format (.xlsx or .xlsm). A helper function ( ExcelTable.getFile ) is available to directly reference the file from a directory. |
Yes |
p_sheet |
VARCHAR2 | Worksheet name | Yes |
p_cols |
VARCHAR2 | Column list (see specs below) | Yes |
p_range |
VARCHAR2 | Excel-like range expression that defines the table boundaries in the worksheet (see specs below) | No |
Range syntax specification
range_expr ::= ( cell_ref [ ":" cell_ref ] | col_ref ":" col_ref | row_ref ":" row_ref ) cell_ref ::= col_ref row_ref col_ref ::= { "A".."Z" } row_ref ::= integer
If the range is empty, the table implicitly starts at cell A1.
Otherwise, there are four ways to specify the table range :
- Range of rows :
'1:100'
In this case the range of columns implicitly starts at A. - Range of columns :
'B:E'
In this case the range of rows implicitly starts at 1. - Range of cells (top-left to bottom-right) :
'B2:F150'
- Single cell anchor (top-left cell) :
'C3'
Columns syntax specification
The syntax is similar to the column list in a CREATE TABLE statement, with a couple of specifics :
column_list ::= column_expr { "," column_expr } column_expr ::= ( identifier datatype [ "column" string_literal ] | identifier for_ordinality ) datatype ::= ( number_expr | varchar2_expr | date_expr | clob_expr ) number_expr ::= "number" [ "(" ( integer | "*" ) [ "," [ "-" ] integer ] ")" ] varchar2_expr ::= "varchar2" "(" integer [ "char" | "byte" ] ")" date_expr ::= "date" [ "format" string_literal ] clob_expr ::= "clob" for_ordinality ::= "for" "ordinality" identifier ::= "\"" { char } "\"" string_literal ::= "'" { char } "'"
Column names must be declared using a quoted identifier.
Supported data types are :
- NUMBER – with optional precision and scale specs
- VARCHAR2 – including CHAR/BYTE semantics
Values larger than the maximum length declared are silently truncated and no error is reported. - DATE – with optional format mask
The format mask is used if the value is stored as text in the spreadsheet, otherwise the date value is assumed to be stored as date in Excel’s internal serial format. - CLOB
A special “FOR ORDINALITY” clause (like XMLTABLE or JSON_TABLE’s one) is also available to autogenerate a sequence number.
Each column definition (except for the one qualified with FOR ORDINALITY) may be complemented with an optional “COLUMN” clause to explicitly target a named column in the spreadsheet, instead of relying on the order of the declarations (relative to the range).
Positional and named column definitions cannot be mixed.
For instance :
"RN" for ordinality , "COL1" number , "COL2" varchar2(10) , "COL3" varchar2(4000) , "COL4" date format 'YYYY-MM-DD' , "COL5" number(10,2) , "COL6" varchar2(5)
"COL1" number column 'A' , "COL2" varchar2(10) column 'C' , "COL3" clob column 'D'
Examples
Using this sample file :
1- Loading all six columns, starting at cell A2, in order to skip the header :
SQL> select t.* 2 from table( 3 ExcelTable.getRows( 4 ExcelTable.getFile('TMP_DIR','ooxdata3.xlsx') 5 , 'DataSource' 6 , ' "SRNO" number 7 , "NAME" varchar2(10) 8 , "VAL" number 9 , "DT" date 10 , "SPARE1" varchar2(6) 11 , "SPARE2" varchar2(6)' 12 , 'A2' 13 ) 14 ) t 15 ; SRNO NAME VAL DT SPARE1 SPARE2 ---------- ---------- ---------- ------------------- ------ ------ 1 LINE-00001 66916.2986 13/10/1923 11:45:52 2 LINE-00002 96701.3427 05/09/1906 10:12:35 3 LINE-00003 68778.8698 23/01/1911 09:26:22 OK 4 LINE-00004 95110.028 03/05/1907 13:52:30 OK 5 LINE-00005 62561.5708 04/04/1927 18:10:39 6 LINE-00006 28677.1166 11/07/1923 15:10:59 OK 7 LINE-00007 16141.0202 20/11/1902 02:02:24 8 LINE-00008 80362.6256 19/09/1910 14:06:42 9 LINE-00009 10384.1973 16/07/1902 04:54:12 10 LINE-00010 5266.9097 08/08/1921 11:51:34 11 LINE-00011 12513.0679 01/07/1908 21:53:55 12 LINE-00012 66596.9707 22/03/1913 05:20:10 ... 95 LINE-00095 96274.2193 08/04/1914 22:48:31 96 LINE-00096 29783.146 06/04/1915 23:49:23 97 LINE-00097 19857.7661 16/02/1909 21:21:52 98 LINE-00098 19504.3969 05/12/1917 01:56:05 99 LINE-00099 98675.8673 05/06/1906 17:41:10 100 LINE-00100 24288.2885 22/07/1920 13:25:59 100 rows selected.
2- Loading columns B and F only, from rows 2 to 10, with a generated sequence :
SQL> select t.* 2 from table( 3 ExcelTable.getRows( 4 ExcelTable.getFile('TMP_DIR','ooxdata3.xlsx') 5 , 'DataSource' 6 , q'{ 7 "R_NUM" for ordinality 8 , "NAME" varchar2(10) column 'B' 9 , "SPARE2" varchar2(6) column 'F' 10 }' 11 , '2:10' 12 ) 13 ) t 14 ; R_NUM NAME SPARE2 ---------- ---------- ------ 1 LINE-00001 2 LINE-00002 3 LINE-00003 OK 4 LINE-00004 OK 5 LINE-00005 6 LINE-00006 OK 7 LINE-00007 8 LINE-00008 9 LINE-00009 9 rows selected.
This is an excellent piece of work on so many levels. The grammar for mapping types is excellent.
Odie,
I am trying to use your, Code for a POC, but I am stuck at the point in the oox_install.sql Script where there ina Anonymous Block,
declare
res boolean;
begin
res := DBMS_XDB.createFolder(‘/office’);
res := DBMS_XDB.createFolder(‘/office/excel’);
res := DBMS_XDB.createFolder(‘/office/excel/conf’);
res := DBMS_XDB.createFolder(‘/office/excel/temp’);
res := DBMS_XDB.createFolder(‘/office/excel/docs’);
end;
/
I am getting the Following Error
[Error] Execution (77: 1): ORA-31050: Access denied
ORA-06512: at “XDB.DBMS_XDB”, line 340
ORA-06512: at line 4
I requested my DBA to Provide XDBADMIN to my schema, still the Same.
What could be wrong.?
Seems like you want to use the demo from this post : https://odieweblog.wordpress.com/2012/01/28/xml-db-events-reading-an-open-office-xml-document-xlsx/
yet you’re posting your comment in another post.
What do you want to do?
Try EXCELTABLE package instead.
Odie,
Thanks for your Reply, I apologize for posting this Comment in the Wrong window, I had Opened Almost All your XML + Reading Excel file related article pages in my browser.
Hi,
The solution sounds promising, but I followed your installation steps on GitHub, and I got following compilation errrors for “ExcelTable.pkb”:
Error(979,3): PL/SQL: Item ignored
Error(989,3): PLS-00311: the declaration of “db.office.spreadsheet.ReadContext.initialize(java.sql.Blob, java.sql.Blob, java.lang.String, int, int, int) return int” is incomplete or malformed
Error(992,3): PL/SQL: Item ignored
Error(995,3): PLS-00311: the declaration of “db.office.spreadsheet.ReadContext.iterate(int, int) return java.sql.Array” is incomplete or malformed
Error(998,3): PL/SQL: Item ignored
Error(1000,3): PLS-00311: the declaration of “db.office.spreadsheet.ReadContext.terminate(int)” is incomplete or malformed
Any ideas/suggestions?
Thanks
What’s your database version? (select * from v$version)
Very useful package, thanks for sharing. I was trying to use as part of an ETL process written in PL/SQL. I have the call to your package within a cursor. When I run the SQL bit as SQL it works fine. But when I try to compile the exact same SQL but in a PL/SQL cursor (cursor xyz is select … from table(exceltable.getrows….) I get ORA-22905, cannot access rows from a non-nested table item. Any clues? Thanks again.
Hi Robert, thanks for your feedback.
It is a known error (unfortunately) when trying to use table functions returning AnyDataset within PL/SQL.
Apparently, Oracle is not able to call the ODCITableDescribe routine at compile time and therefore cannot retrieve the projection from the SELECT statement.
The workaround is to use dynamic SQL :
OPEN my_refcursor FOR 'SELECT ...' USING <bind variables>
Thanks for feedback – that’s exactly how I got around the issue. Good to know I’m not totally stupid :)
FYI, in ExcelTable 1.4, I’ve added a new function getCursor() with same parameters as getRows() and returning a REF cursor.
The solution seems to have everything we need for our problem. I just have one question. Our Excel files are stored in a table as BLOB. Can you please post an example how getRows() can be used with BLOB?
That’s easy. Just pass your BLOB column to the
p_file
parameter :Hi, this is excellent work. Could you provide an example using getCursor() fetched into a record variable? Thanks.
Hello. I’m trying to use your code with example 1 to test but I can’t run it. Surely a mistake from my side, but i don’t find the clue …
Directory created and granted sur my user REF
file ooxdata3.xlsx exists and test of function getfile is OK
But this code :
select t.*
from table(
ExcelTable.getRows(
ExcelTable.getFile(‘TMP_DIR’,’ooxdata3.xlsx’)
, ‘DataSource’
, ‘ “SRNO” number
, “NAME” varchar2(10)
, “VAL” number
, “DT” date
, “SPARE1” varchar2(6)
, “SPARE2” varchar2(6)’
, ‘A2’
)
) t
;
return a PLS-00225 : line 4 col 6 subprogram of cursor ‘REF’ is out of scope.
Any idea ?
I’m on oracle 11.2.0.3 x64 and windows 2008 R2 … All is well compiled, I don’t know where to search.
I’ve done another test on oracle 12c and it works.
Hello. I’m trying to use your code with example 1 to test but I can’t run it.
select t.*
from table(
ExcelTable.getRows(
ExcelTable.getFile(‘TMP_DIR’,’ooxdata3.xlsx’)
, ‘DataSource’
, ‘ “SRNO” number
, “NAME” varchar2(10)
, “VAL” number
, “DT” date
, “SPARE1” varchar2(6)
, “SPARE2” varchar2(6)’
, ‘A2’
)
) t
;
return ORA-00955: name is already used by an existing object al line 1121 in EXCELTABLE object.
At line 1121 your code tries to create a global temporary table.
WHat can i do?
Thanks
How can we use This query inside SP or PL/SQL block? I get error always cannot access rows from non-nested table
It’s explained in one of the comments above : use dynamic SQL, or ExcelTable.getCursor() if you want to use the query in a cursor (see the README).
Great work. It helped me a lot. Thank you for sharing.
It’s working nicely for small file for me but
while inserting a large excel file 200000 rows into oracle table, I getting
insert /+APPEND/ into REASSIGNMENT_tmp
*
ERROR at line 1:
ORA-31011: XML parsing failed
ORA-19202: Error occurred in XML processing
LPX-00664: VM Node-Stack overflow.
ORA-06512: at “XDB.DBMS_XSLPROCESSOR”, line 743
ORA-06512: at “XDB.DBMS_XSLPROCESSOR”, line 771
ORA-06512: at “HARI.EXCELTABLE”, line 1760
ORA-06512: at “HARI.EXCELTABLE”, line 1862
ORA-06512: at “HARI.EXCELTABLE”, line 2295
ORA-06512: at “HARI.EXCELTABLEIMPL”, line 60
ORA-06512: at line 1
Please advice;
2) I need to commit every 10000 record, what method you suggest ?
For large files, you’ll have to use the streaming read method (requires Java).
See the README for instructions, and this post.
About your other question, I’ve never understood the need for any “COMMIT every N record” requirement.
Anyway, it’s not possible with a single INSERT/SELECT statement, let alone with the APPEND hint.
I suggest you use the cursor-based approach. See procedure ExcelTable.getCursor() and insert in a loop, then you’ll be able to implement your commit-every stuff.
Regards,
Marc.
How we have to specify the Boolean data type
What do you mean? Boolean data type from Excel?
I am trying to create a materialized view from the select of the table. The direct select works fine, but when I place into the materialized view the it only takes the first column (for each column …
Has Anyone come across this issue? is there a way to resolve?
Is there something in my code I could use to fix this?
In the following example, the Materialized view give the value of the row_number in each column…
SELECT xl.*
FROM TABLE (IDATA.PAK_ExcelTable.F_getRows (
IDATA.PAK_ExcelTable.F_getFile (‘IDATA_DATA’, ‘EMIDB.XLSX’),
‘Sheet1’,
‘”ROW_NUMBER” for ordinality
, “OPEN_INCIDENT_FLAG” varchar2(5) column ”A”
, “BAND_NUMBER” number(10, 0) column ”B”
, “EMAP_ELIGIBLE_FLAG” varchar2(1) column ”C”
, “MAIN_CATEGORY” varchar2(5) column ”D”’,
‘A2’)) xl;
ROW_NUMBER OPEN_INCIDENT_FLAG BAND_NUMBER EMAP_ELIGIBLE_FLAG MAIN_CATEGORY
CREATE MATERIALIZED VIEW BB_TEST (ROW_NUMBER,OPEN_INCIDENT_FLAG,BAND_NUMBER,EMAP_ELIGIBLE_FLAG,MAIN_CATEGORY)
BUILD IMMEDIATE
REFRESH FORCE ON DEMAND
WITH PRIMARY KEY
AS
/* Formatted on 8/15/2018 11:52:16 AM (QP5 v5.269.14213.34746) /
SELECT xl.
FROM TABLE (IDATA.PAK_ExcelTable.F_getRows (
IDATA.PAK_ExcelTable.F_getFile (‘IDATA_DATA’, ‘EMIDB.XLSX’),
‘Sheet1’,
‘”ROW_NUMBER” for ordinality
, “OPEN_INCIDENT_FLAG” varchar2(5) column ”A”
, “BAND_NUMBER” number(10, 0) column ”B”
, “EMAP_ELIGIBLE_FLAG” varchar2(1) column ”C”
, “MAIN_CATEGORY” varchar2(5) column ”D”’,
‘A2’)) xl;
select * from bb_test
;
ROW_NUMBER OPEN_INCIDENT_FLAG BAND_NUMBER EMAP_ELIGIBLE_FLAG MAIN_CATEGORY
I can’t reproduce the problem. What’s your exact database version please?
Hello, please help with the error:
ORA-20722: Error at position 1 : unexpected symbol ” instead of ”
ORA-06512: на “DWH.EXCELTABLE”, line 1041
ORA-06512: на “DWH.EXCELTABLE”, line 1414
ORA-06512: на “DWH.EXCELTABLE”, line 1427
ORA-06512: на “DWH.EXCELTABLE”, line 1574
ORA-06512: на “DWH.EXCELTABLE”, line 1585
ORA-06512: на “DWH.EXCELTABLE”, line 1597
ORA-06512: на “DWH.EXCELTABLE”, line 2688
ORA-06512: на “DWH.EXCELTABLEIMPL”, line 16
ORA-06512: на line 4
select t.*
from table(
ExcelTable.getRows(
ExcelTable.getFile(‘TMP_DIR’,’ooxdata3.xlsx’)
, ‘DataSource’
, ‘ “SRNO” number
, “NAME” varchar2(10)
, “VAL” number
, “DT” date
, “SPARE1” varchar2(6)
, “SPARE2” varchar2(6)’
, ‘A2’
)
) t
;
Hello,
Just found this post and it’s exactly what I for my requirement. However, it seems the Excel file must be on the DB server for the store function to work, correct? How can we load the Excel file that reside on Window client since I don’t have access to the DB server? Thank you in advance,
Hi, sorry for the late reply.
You are correct, the file must reside in a location the DB server has access to, not necessarily the DB server itself though, could be a NFS (Network File System) for example.
To address your situation, a classical approach is to mount a shared Windows directory on the DB server, so that end users may deposit files to be read by the DB.
If such a setup is not possible, the only other option I see is to send the file to the server as a separate step (e.g. SFTP/FTP protocol), or directly load the file in a BLOB in a temp table using SQL*Loader, you’ll then be able to use ExcelTable over the BLOB.
This is really cool Thank you for sharing.
DATE type columns gave me some trouble initially, but turns out that was related to the header row not being a valid DATE format.
Is there a good way to have this ignore the row of headers, specifically when the column needs to be DATE type?
I just answered my own question. Adding ,’A2′ at the end of the selection, as you show in an example, will resolve the header situation.