Home > Miscellaneous, XML DB > Oracle SQL – Reading an Excel File (xlsx) as an External Table

Oracle SQL – Reading an Excel File (xlsx) as an External Table

I’ve been thinking about it for quite a long time and never really had time to implement it, but it’s finally there : a pipelined table interface to read an Excel file (.xlsx) as if it were an external table.

It’s entirely implemented in PL/SQL using an object type (for the ODCI routines) and a package supporting the core functionalities.
Available for download on GitHub :

/mbleron/oracle/ExcelTable

 

Usage

ExcelTable.getRows table function :

This is the actual SQL interface, to be used in conjunction with the TABLE operator.
It returns an ANYDATASET instance whose structure is defined by the p_cols parameter.

  function getRows (
    p_file   in  blob
  , p_sheet  in  varchar2
  , p_cols   in  varchar2
  , p_range  in  varchar2 default null
  ) 
  return anydataset pipelined
  using ExcelTableImpl;
Arg Data type Desc Mandatory
p_file BLOB Input Excel file in Office Open XML format (.xlsx or .xlsm).
A helper function (ExcelTable.getFile) is available to directly reference the file from a directory.
Yes
p_sheet VARCHAR2 Worksheet name Yes
p_cols VARCHAR2 Column list (see specs below) Yes
p_range VARCHAR2 Excel-like range expression that defines the table boundaries in the worksheet (see specs below) No

 

Range syntax specification

 range_expr ::= ( cell_ref [ ":" cell_ref ] | col_ref ":" col_ref | row_ref ":" row_ref )
 cell_ref   ::= col_ref row_ref
 col_ref    ::= { "A".."Z" }
 row_ref    ::= integer

If the range is empty, the table implicitly starts at cell A1.

Otherwise, there are four ways to specify the table range :

  • Range of rows : '1:100'
    In this case the range of columns implicitly starts at A.
  • Range of columns : 'B:E'
    In this case the range of rows implicitly starts at 1.
  • Range of cells (top-left to bottom-right) : 'B2:F150'
  • Single cell anchor (top-left cell) : 'C3'

 

Columns syntax specification

The syntax is similar to the column list in a CREATE TABLE statement, with a couple of specifics :

 column_list    ::= column_expr { "," column_expr }
 column_expr    ::= ( identifier datatype [ "column" string_literal ] | identifier for_ordinality )
 datatype       ::= ( number_expr | varchar2_expr | date_expr | clob_expr )
 number_expr    ::= "number" [ "(" ( integer | "*" ) [ "," [ "-" ] integer ] ")" ]
 varchar2_expr  ::= "varchar2" "(" integer [ "char" | "byte" ] ")"
 date_expr      ::= "date" [ "format" string_literal ]
 clob_expr      ::= "clob"
 for_ordinality ::= "for" "ordinality"
 identifier     ::= "\"" { char } "\""
 string_literal ::= "'" { char } "'"

Column names must be declared using a quoted identifier.

Supported data types are :

  • NUMBER – with optional precision and scale specs
  • VARCHAR2 – including CHAR/BYTE semantics
    Values larger than the maximum length declared are silently truncated and no error is reported.
  • DATE – with optional format mask
    The format mask is used if the value is stored as text in the spreadsheet, otherwise the date value is assumed to be stored as date in Excel’s internal serial format.
  • CLOB

A special “FOR ORDINALITY” clause (like XMLTABLE or JSON_TABLE’s one) is also available to autogenerate a sequence number.

Each column definition (except for the one qualified with FOR ORDINALITY) may be complemented with an optional “COLUMN” clause to explicitly target a named column in the spreadsheet, instead of relying on the order of the declarations (relative to the range).
Positional and named column definitions cannot be mixed.

For instance :

  "RN"    for ordinality
, "COL1"  number
, "COL2"  varchar2(10)
, "COL3"  varchar2(4000)
, "COL4"  date           format 'YYYY-MM-DD'
, "COL5"  number(10,2)
, "COL6"  varchar2(5)
  "COL1"  number        column 'A'
, "COL2"  varchar2(10)  column 'C'
, "COL3"  clob          column 'D'

 

Examples

Using this sample file :

1- Loading all six columns, starting at cell A2, in order to skip the header :

SQL> select t.*
  2  from table(
  3         ExcelTable.getRows(
  4           ExcelTable.getFile('TMP_DIR','ooxdata3.xlsx')
  5         , 'DataSource'
  6         , ' "SRNO"    number
  7           , "NAME"    varchar2(10)
  8           , "VAL"     number
  9           , "DT"      date
 10           , "SPARE1"  varchar2(6)
 11           , "SPARE2"  varchar2(6)'
 12         , 'A2'
 13         )
 14       ) t
 15  ;

      SRNO NAME              VAL DT                  SPARE1 SPARE2
---------- ---------- ---------- ------------------- ------ ------
         1 LINE-00001 66916.2986 13/10/1923 11:45:52
         2 LINE-00002 96701.3427 05/09/1906 10:12:35
         3 LINE-00003 68778.8698 23/01/1911 09:26:22        OK
         4 LINE-00004  95110.028 03/05/1907 13:52:30        OK
         5 LINE-00005 62561.5708 04/04/1927 18:10:39
         6 LINE-00006 28677.1166 11/07/1923 15:10:59        OK
         7 LINE-00007 16141.0202 20/11/1902 02:02:24
         8 LINE-00008 80362.6256 19/09/1910 14:06:42
         9 LINE-00009 10384.1973 16/07/1902 04:54:12
        10 LINE-00010  5266.9097 08/08/1921 11:51:34
        11 LINE-00011 12513.0679 01/07/1908 21:53:55
        12 LINE-00012 66596.9707 22/03/1913 05:20:10

...

        95 LINE-00095 96274.2193 08/04/1914 22:48:31
        96 LINE-00096  29783.146 06/04/1915 23:49:23
        97 LINE-00097 19857.7661 16/02/1909 21:21:52
        98 LINE-00098 19504.3969 05/12/1917 01:56:05
        99 LINE-00099 98675.8673 05/06/1906 17:41:10
       100 LINE-00100 24288.2885 22/07/1920 13:25:59

100 rows selected.

2- Loading columns B and F only, from rows 2 to 10, with a generated sequence :

SQL> select t.*
  2  from table(
  3         ExcelTable.getRows(
  4           ExcelTable.getFile('TMP_DIR','ooxdata3.xlsx')
  5         , 'DataSource'
  6         , q'{
  7             "R_NUM"   for ordinality
  8           , "NAME"    varchar2(10) column 'B'
  9           , "SPARE2"  varchar2(6)  column 'F'
 10           }'
 11         , '2:10'
 12         )
 13       ) t
 14  ;

     R_NUM NAME       SPARE2
---------- ---------- ------
         1 LINE-00001
         2 LINE-00002
         3 LINE-00003 OK
         4 LINE-00004 OK
         5 LINE-00005
         6 LINE-00006 OK
         7 LINE-00007
         8 LINE-00008
         9 LINE-00009

9 rows selected.

 

Advertisements
  1. paulzip
    July 1, 2016 at 19:12

    This is an excellent piece of work on so many levels. The grammar for mapping types is excellent.

  2. Anonymous
    October 24, 2016 at 20:48

    Odie,
    I am trying to use your, Code for a POC, but I am stuck at the point in the oox_install.sql Script where there ina Anonymous Block,
    declare
    res boolean;
    begin
    res := DBMS_XDB.createFolder(‘/office’);
    res := DBMS_XDB.createFolder(‘/office/excel’);
    res := DBMS_XDB.createFolder(‘/office/excel/conf’);
    res := DBMS_XDB.createFolder(‘/office/excel/temp’);
    res := DBMS_XDB.createFolder(‘/office/excel/docs’);
    end;
    /

    I am getting the Following Error
    [Error] Execution (77: 1): ORA-31050: Access denied
    ORA-06512: at “XDB.DBMS_XDB”, line 340
    ORA-06512: at line 4

    I requested my DBA to Provide XDBADMIN to my schema, still the Same.

    What could be wrong.?

  3. Anand
    October 25, 2016 at 16:53

    Odie,
    Thanks for your Reply, I apologize for posting this Comment in the Wrong window, I had Opened Almost All your XML + Reading Excel file related article pages in my browser.

  4. Anonymous
    March 8, 2017 at 18:09

    Hi,

    The solution sounds promising, but I followed your installation steps on GitHub, and I got following compilation errrors for “ExcelTable.pkb”:

    Error(979,3): PL/SQL: Item ignored
    Error(989,3): PLS-00311: the declaration of “db.office.spreadsheet.ReadContext.initialize(java.sql.Blob, java.sql.Blob, java.lang.String, int, int, int) return int” is incomplete or malformed
    Error(992,3): PL/SQL: Item ignored
    Error(995,3): PLS-00311: the declaration of “db.office.spreadsheet.ReadContext.iterate(int, int) return java.sql.Array” is incomplete or malformed
    Error(998,3): PL/SQL: Item ignored
    Error(1000,3): PLS-00311: the declaration of “db.office.spreadsheet.ReadContext.terminate(int)” is incomplete or malformed

    Any ideas/suggestions?

    Thanks

    • March 11, 2017 at 12:12

      What’s your database version? (select * from v$version)

  5. Robert Chalton
    May 30, 2017 at 18:07

    Very useful package, thanks for sharing. I was trying to use as part of an ETL process written in PL/SQL. I have the call to your package within a cursor. When I run the SQL bit as SQL it works fine. But when I try to compile the exact same SQL but in a PL/SQL cursor (cursor xyz is select … from table(exceltable.getrows….) I get ORA-22905, cannot access rows from a non-nested table item. Any clues? Thanks again.

    • May 31, 2017 at 11:14

      Hi Robert, thanks for your feedback.
      It is a known error (unfortunately) when trying to use table functions returning AnyDataset within PL/SQL.
      Apparently, Oracle is not able to call the ODCITableDescribe routine at compile time and therefore cannot retrieve the projection from the SELECT statement.
      The workaround is to use dynamic SQL :
      OPEN my_refcursor FOR 'SELECT ...' USING <bind variables>

      • Robert Chalton
        June 7, 2017 at 16:52

        Thanks for feedback – that’s exactly how I got around the issue. Good to know I’m not totally stupid :)

      • June 12, 2017 at 20:52

        FYI, in ExcelTable 1.4, I’ve added a new function getCursor() with same parameters as getRows() and returning a REF cursor.

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s