Oracle SQL – Reading an Excel File (xlsx) as an External Table

I’ve been thinking about it for quite a long time and never really had time to implement it, but it’s finally there : a pipelined table interface to read an Excel file (.xlsx) as if it were an external table.

It’s entirely implemented in PL/SQL using an object type (for the ODCI routines) and a package supporting the core functionalities.
Available for download on GitHub :

/mbleron/ExcelTable

 

Usage

ExcelTable.getRows table function :

This is the actual SQL interface, to be used in conjunction with the TABLE operator.
It returns an ANYDATASET instance whose structure is defined by the p_cols parameter.

  function getRows (
    p_file   in  blob
  , p_sheet  in  varchar2
  , p_cols   in  varchar2
  , p_range  in  varchar2 default null
  ) 
  return anydataset pipelined
  using ExcelTableImpl;
Arg Data type Desc Mandatory
p_file BLOB Input Excel file in Office Open XML format (.xlsx or .xlsm).
A helper function (ExcelTable.getFile) is available to directly reference the file from a directory.
Yes
p_sheet VARCHAR2 Worksheet name Yes
p_cols VARCHAR2 Column list (see specs below) Yes
p_range VARCHAR2 Excel-like range expression that defines the table boundaries in the worksheet (see specs below) No

 

Range syntax specification

 range_expr ::= ( cell_ref [ ":" cell_ref ] | col_ref ":" col_ref | row_ref ":" row_ref )
 cell_ref   ::= col_ref row_ref
 col_ref    ::= { "A".."Z" }
 row_ref    ::= integer

If the range is empty, the table implicitly starts at cell A1.

Otherwise, there are four ways to specify the table range :

  • Range of rows : '1:100'
    In this case the range of columns implicitly starts at A.
  • Range of columns : 'B:E'
    In this case the range of rows implicitly starts at 1.
  • Range of cells (top-left to bottom-right) : 'B2:F150'
  • Single cell anchor (top-left cell) : 'C3'

 

Columns syntax specification

The syntax is similar to the column list in a CREATE TABLE statement, with a couple of specifics :

 column_list    ::= column_expr { "," column_expr }
 column_expr    ::= ( identifier datatype [ "column" string_literal ] | identifier for_ordinality )
 datatype       ::= ( number_expr | varchar2_expr | date_expr | clob_expr )
 number_expr    ::= "number" [ "(" ( integer | "*" ) [ "," [ "-" ] integer ] ")" ]
 varchar2_expr  ::= "varchar2" "(" integer [ "char" | "byte" ] ")"
 date_expr      ::= "date" [ "format" string_literal ]
 clob_expr      ::= "clob"
 for_ordinality ::= "for" "ordinality"
 identifier     ::= "\"" { char } "\""
 string_literal ::= "'" { char } "'"

Column names must be declared using a quoted identifier.

Supported data types are :

  • NUMBER – with optional precision and scale specs
  • VARCHAR2 – including CHAR/BYTE semantics
    Values larger than the maximum length declared are silently truncated and no error is reported.
  • DATE – with optional format mask
    The format mask is used if the value is stored as text in the spreadsheet, otherwise the date value is assumed to be stored as date in Excel’s internal serial format.
  • CLOB

A special “FOR ORDINALITY” clause (like XMLTABLE or JSON_TABLE’s one) is also available to autogenerate a sequence number.

Each column definition (except for the one qualified with FOR ORDINALITY) may be complemented with an optional “COLUMN” clause to explicitly target a named column in the spreadsheet, instead of relying on the order of the declarations (relative to the range).
Positional and named column definitions cannot be mixed.

For instance :

  "RN"    for ordinality
, "COL1"  number
, "COL2"  varchar2(10)
, "COL3"  varchar2(4000)
, "COL4"  date           format 'YYYY-MM-DD'
, "COL5"  number(10,2)
, "COL6"  varchar2(5)
  "COL1"  number        column 'A'
, "COL2"  varchar2(10)  column 'C'
, "COL3"  clob          column 'D'

 

Examples

Using this sample file :

1- Loading all six columns, starting at cell A2, in order to skip the header :

SQL> select t.*
  2  from table(
  3         ExcelTable.getRows(
  4           ExcelTable.getFile('TMP_DIR','ooxdata3.xlsx')
  5         , 'DataSource'
  6         , ' "SRNO"    number
  7           , "NAME"    varchar2(10)
  8           , "VAL"     number
  9           , "DT"      date
 10           , "SPARE1"  varchar2(6)
 11           , "SPARE2"  varchar2(6)'
 12         , 'A2'
 13         )
 14       ) t
 15  ;

      SRNO NAME              VAL DT                  SPARE1 SPARE2
---------- ---------- ---------- ------------------- ------ ------
         1 LINE-00001 66916.2986 13/10/1923 11:45:52
         2 LINE-00002 96701.3427 05/09/1906 10:12:35
         3 LINE-00003 68778.8698 23/01/1911 09:26:22        OK
         4 LINE-00004  95110.028 03/05/1907 13:52:30        OK
         5 LINE-00005 62561.5708 04/04/1927 18:10:39
         6 LINE-00006 28677.1166 11/07/1923 15:10:59        OK
         7 LINE-00007 16141.0202 20/11/1902 02:02:24
         8 LINE-00008 80362.6256 19/09/1910 14:06:42
         9 LINE-00009 10384.1973 16/07/1902 04:54:12
        10 LINE-00010  5266.9097 08/08/1921 11:51:34
        11 LINE-00011 12513.0679 01/07/1908 21:53:55
        12 LINE-00012 66596.9707 22/03/1913 05:20:10

...

        95 LINE-00095 96274.2193 08/04/1914 22:48:31
        96 LINE-00096  29783.146 06/04/1915 23:49:23
        97 LINE-00097 19857.7661 16/02/1909 21:21:52
        98 LINE-00098 19504.3969 05/12/1917 01:56:05
        99 LINE-00099 98675.8673 05/06/1906 17:41:10
       100 LINE-00100 24288.2885 22/07/1920 13:25:59

100 rows selected.

2- Loading columns B and F only, from rows 2 to 10, with a generated sequence :

SQL> select t.*
  2  from table(
  3         ExcelTable.getRows(
  4           ExcelTable.getFile('TMP_DIR','ooxdata3.xlsx')
  5         , 'DataSource'
  6         , q'{
  7             "R_NUM"   for ordinality
  8           , "NAME"    varchar2(10) column 'B'
  9           , "SPARE2"  varchar2(6)  column 'F'
 10           }'
 11         , '2:10'
 12         )
 13       ) t
 14  ;

     R_NUM NAME       SPARE2
---------- ---------- ------
         1 LINE-00001
         2 LINE-00002
         3 LINE-00003 OK
         4 LINE-00004 OK
         5 LINE-00005
         6 LINE-00006 OK
         7 LINE-00007
         8 LINE-00008
         9 LINE-00009

9 rows selected.

 

30 thoughts on “Oracle SQL – Reading an Excel File (xlsx) as an External Table

  1. This is an excellent piece of work on so many levels. The grammar for mapping types is excellent.

  2. Odie,
    I am trying to use your, Code for a POC, but I am stuck at the point in the oox_install.sql Script where there ina Anonymous Block,
    declare
    res boolean;
    begin
    res := DBMS_XDB.createFolder(‘/office’);
    res := DBMS_XDB.createFolder(‘/office/excel’);
    res := DBMS_XDB.createFolder(‘/office/excel/conf’);
    res := DBMS_XDB.createFolder(‘/office/excel/temp’);
    res := DBMS_XDB.createFolder(‘/office/excel/docs’);
    end;
    /

    I am getting the Following Error
    [Error] Execution (77: 1): ORA-31050: Access denied
    ORA-06512: at “XDB.DBMS_XDB”, line 340
    ORA-06512: at line 4

    I requested my DBA to Provide XDBADMIN to my schema, still the Same.

    What could be wrong.?

  3. Odie,
    Thanks for your Reply, I apologize for posting this Comment in the Wrong window, I had Opened Almost All your XML + Reading Excel file related article pages in my browser.

  4. Hi,

    The solution sounds promising, but I followed your installation steps on GitHub, and I got following compilation errrors for “ExcelTable.pkb”:

    Error(979,3): PL/SQL: Item ignored
    Error(989,3): PLS-00311: the declaration of “db.office.spreadsheet.ReadContext.initialize(java.sql.Blob, java.sql.Blob, java.lang.String, int, int, int) return int” is incomplete or malformed
    Error(992,3): PL/SQL: Item ignored
    Error(995,3): PLS-00311: the declaration of “db.office.spreadsheet.ReadContext.iterate(int, int) return java.sql.Array” is incomplete or malformed
    Error(998,3): PL/SQL: Item ignored
    Error(1000,3): PLS-00311: the declaration of “db.office.spreadsheet.ReadContext.terminate(int)” is incomplete or malformed

    Any ideas/suggestions?

    Thanks

  5. Very useful package, thanks for sharing. I was trying to use as part of an ETL process written in PL/SQL. I have the call to your package within a cursor. When I run the SQL bit as SQL it works fine. But when I try to compile the exact same SQL but in a PL/SQL cursor (cursor xyz is select … from table(exceltable.getrows….) I get ORA-22905, cannot access rows from a non-nested table item. Any clues? Thanks again.

    1. Hi Robert, thanks for your feedback.
      It is a known error (unfortunately) when trying to use table functions returning AnyDataset within PL/SQL.
      Apparently, Oracle is not able to call the ODCITableDescribe routine at compile time and therefore cannot retrieve the projection from the SELECT statement.
      The workaround is to use dynamic SQL :
      OPEN my_refcursor FOR 'SELECT ...' USING <bind variables>

      1. Thanks for feedback – that’s exactly how I got around the issue. Good to know I’m not totally stupid :)

      2. FYI, in ExcelTable 1.4, I’ve added a new function getCursor() with same parameters as getRows() and returning a REF cursor.

  6. The solution seems to have everything we need for our problem. I just have one question. Our Excel files are stored in a table as BLOB. Can you please post an example how getRows() can be used with BLOB?

    1. That’s easy. Just pass your BLOB column to the p_file parameter :

      SELECT ...
      FROM my_table t
         , TABLE(
             ExcelTable.getRows(
               p_file   => t.my_blob_col
             , p_sheet  => 'sheetName'
             , p_cols   => ' ... COLUMNS ... '
             , p_range  => 'A2'
             , p_method => 0
             )
           ) x
      WHERE ...
      ;
  7. Hi, this is excellent work. Could you provide an example using getCursor() fetched into a record variable? Thanks.

  8. Hello. I’m trying to use your code with example 1 to test but I can’t run it. Surely a mistake from my side, but i don’t find the clue …
    Directory created and granted sur my user REF
    file ooxdata3.xlsx exists and test of function getfile is OK

    But this code :

    select t.*
    from table(
    ExcelTable.getRows(
    ExcelTable.getFile(‘TMP_DIR’,’ooxdata3.xlsx’)
    , ‘DataSource’
    , ‘ “SRNO” number
    , “NAME” varchar2(10)
    , “VAL” number
    , “DT” date
    , “SPARE1” varchar2(6)
    , “SPARE2” varchar2(6)’
    , ‘A2’
    )
    ) t
    ;

    return a PLS-00225 : line 4 col 6 subprogram of cursor ‘REF’ is out of scope.

    Any idea ?

    1. I’m on oracle 11.2.0.3 x64 and windows 2008 R2 … All is well compiled, I don’t know where to search.

  9. Hello. I’m trying to use your code with example 1 to test but I can’t run it.

    select t.*
    from table(
    ExcelTable.getRows(
    ExcelTable.getFile(‘TMP_DIR’,’ooxdata3.xlsx’)
    , ‘DataSource’
    , ‘ “SRNO” number
    , “NAME” varchar2(10)
    , “VAL” number
    , “DT” date
    , “SPARE1” varchar2(6)
    , “SPARE2” varchar2(6)’
    , ‘A2’
    )
    ) t
    ;

    return ORA-00955: name is already used by an existing object al line 1121 in EXCELTABLE object.
    At line 1121 your code tries to create a global temporary table.
    WHat can i do?
    Thanks

  10. Great work. It helped me a lot. Thank you for sharing.
    It’s working nicely for small file for me but
    while inserting a large excel file 200000 rows into oracle table, I getting
    insert /+APPEND/ into REASSIGNMENT_tmp
    *
    ERROR at line 1:
    ORA-31011: XML parsing failed
    ORA-19202: Error occurred in XML processing
    LPX-00664: VM Node-Stack overflow.
    ORA-06512: at “XDB.DBMS_XSLPROCESSOR”, line 743
    ORA-06512: at “XDB.DBMS_XSLPROCESSOR”, line 771
    ORA-06512: at “HARI.EXCELTABLE”, line 1760
    ORA-06512: at “HARI.EXCELTABLE”, line 1862
    ORA-06512: at “HARI.EXCELTABLE”, line 2295
    ORA-06512: at “HARI.EXCELTABLEIMPL”, line 60
    ORA-06512: at line 1

    Please advice;
    2) I need to commit every 10000 record, what method you suggest ?

    1. For large files, you’ll have to use the streaming read method (requires Java).
      See the README for instructions, and this post.

      About your other question, I’ve never understood the need for any “COMMIT every N record” requirement.
      Anyway, it’s not possible with a single INSERT/SELECT statement, let alone with the APPEND hint.
      I suggest you use the cursor-based approach. See procedure ExcelTable.getCursor() and insert in a loop, then you’ll be able to implement your commit-every stuff.

      Regards,
      Marc.

  11. I am trying to create a materialized view from the select of the table. The direct select works fine, but when I place into the materialized view the it only takes the first column (for each column …

    Has Anyone come across this issue? is there a way to resolve?
    Is there something in my code I could use to fix this?

    In the following example, the Materialized view give the value of the row_number in each column…

    SELECT xl.*
    FROM TABLE (IDATA.PAK_ExcelTable.F_getRows (
    IDATA.PAK_ExcelTable.F_getFile (‘IDATA_DATA’, ‘EMIDB.XLSX’),
    ‘Sheet1’,
    ‘”ROW_NUMBER” for ordinality
    , “OPEN_INCIDENT_FLAG” varchar2(5) column ”A”
    , “BAND_NUMBER” number(10, 0) column ”B”
    , “EMAP_ELIGIBLE_FLAG” varchar2(1) column ”C”
    , “MAIN_CATEGORY” varchar2(5) column ”D”’,
    ‘A2’)) xl;

    ROW_NUMBER OPEN_INCIDENT_FLAG BAND_NUMBER EMAP_ELIGIBLE_FLAG MAIN_CATEGORY

         1 N                          290 Y                  E            
         2 N                          261 Y                  E            
         3 N                          264 Y                  E            
         4 N                          269 Y                  E            
         5 N                          274 Y                  E            
    

    CREATE MATERIALIZED VIEW BB_TEST (ROW_NUMBER,OPEN_INCIDENT_FLAG,BAND_NUMBER,EMAP_ELIGIBLE_FLAG,MAIN_CATEGORY)
    BUILD IMMEDIATE
    REFRESH FORCE ON DEMAND
    WITH PRIMARY KEY
    AS
    /* Formatted on 8/15/2018 11:52:16 AM (QP5 v5.269.14213.34746) /
    SELECT xl.

    FROM TABLE (IDATA.PAK_ExcelTable.F_getRows (
    IDATA.PAK_ExcelTable.F_getFile (‘IDATA_DATA’, ‘EMIDB.XLSX’),
    ‘Sheet1’,
    ‘”ROW_NUMBER” for ordinality
    , “OPEN_INCIDENT_FLAG” varchar2(5) column ”A”
    , “BAND_NUMBER” number(10, 0) column ”B”
    , “EMAP_ELIGIBLE_FLAG” varchar2(1) column ”C”
    , “MAIN_CATEGORY” varchar2(5) column ”D”’,
    ‘A2’)) xl;

    select * from bb_test
    ;
    ROW_NUMBER OPEN_INCIDENT_FLAG BAND_NUMBER EMAP_ELIGIBLE_FLAG MAIN_CATEGORY

         1                  1           1                  1             1
         2                  2           2                  2             2
         3                  3           3                  3             3
         4                  4           4                  4             4
         5                  5           5                  5             5
    
  12. Hello, please help with the error:
    ORA-20722: Error at position 1 : unexpected symbol ” instead of ”
    ORA-06512: на “DWH.EXCELTABLE”, line 1041
    ORA-06512: на “DWH.EXCELTABLE”, line 1414
    ORA-06512: на “DWH.EXCELTABLE”, line 1427
    ORA-06512: на “DWH.EXCELTABLE”, line 1574
    ORA-06512: на “DWH.EXCELTABLE”, line 1585
    ORA-06512: на “DWH.EXCELTABLE”, line 1597
    ORA-06512: на “DWH.EXCELTABLE”, line 2688
    ORA-06512: на “DWH.EXCELTABLEIMPL”, line 16
    ORA-06512: на line 4

    select t.*
    from table(
    ExcelTable.getRows(
    ExcelTable.getFile(‘TMP_DIR’,’ooxdata3.xlsx’)
    , ‘DataSource’
    , ‘ “SRNO” number
    , “NAME” varchar2(10)
    , “VAL” number
    , “DT” date
    , “SPARE1” varchar2(6)
    , “SPARE2” varchar2(6)’
    , ‘A2’
    )
    ) t
    ;

  13. Hello,
    Just found this post and it’s exactly what I for my requirement. However, it seems the Excel file must be on the DB server for the store function to work, correct? How can we load the Excel file that reside on Window client since I don’t have access to the DB server? Thank you in advance,

  14. Hi, sorry for the late reply.
    You are correct, the file must reside in a location the DB server has access to, not necessarily the DB server itself though, could be a NFS (Network File System) for example.
    To address your situation, a classical approach is to mount a shared Windows directory on the DB server, so that end users may deposit files to be read by the DB.
    If such a setup is not possible, the only other option I see is to send the file to the server as a separate step (e.g. SFTP/FTP protocol), or directly load the file in a BLOB in a temp table using SQL*Loader, you’ll then be able to use ExcelTable over the BLOB.

  15. This is really cool Thank you for sharing.

    DATE type columns gave me some trouble initially, but turns out that was related to the header row not being a valid DATE format.

    Is there a good way to have this ignore the row of headers, specifically when the column needs to be DATE type?

  16. I just answered my own question. Adding ,’A2′ at the end of the selection, as you show in an example, will resolve the header situation.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.