ExcelTable 1.2 : introducing streaming support for large files

November 15, 2016 1 comment

ExcelTable is my attempt at building a SQL query interface to read MS Excel files in xlsx (or xlsm) format.
Version 1.2 is now available with the following new features :

  • Streaming read support for large Excel files
  • setFetchSize() routine to limit the number of rows processed per request

 
/mbleron/oracle/ExcelTable

 

Streaming read method

The getRows() function has been extended with an additional optional argument p_method.
Allowed values are as follows :

-- Read methods  
DOM_READ     constant binary_integer := 0;
STREAM_READ  constant binary_integer := 1;

The default is 0 (DOM_READ).

The streaming method requires Java (StAX API) and is much more scalable than the DOM_READ method when accessing large files.
Please see the README for more details about what to install depending on the target database version.

Example on a 500,000-row file (bigfile.xlsx) :

select * 
from table(
       ExcelTable.getRows(
         ExcelTable.getFile('TMP_DIR','bigfile.xlsx')
       , 'data'      
       , q'{ 
            "ID"           number
          , "FIRST_NAME"   varchar2(15)
          , "LAST_NAME"    varchar2(20)
          , "EMAIL"        varchar2(30)
          , "GENDER"       varchar2(10)
          , "IP_ADDRESS"   varchar2(16)
          , "COMMENT"      varchar2(4000)
          , "IMAGE"        varchar2(4000)
          , "DT"           date  format 'DD/MM/YYYY'

         }'
       , 'A2'
       , 1
       )
     ) t ;

 

Read more…

How To : Convert Path Enumeration to Adjency List

October 4, 2016 Leave a comment

An Adjency List is a way to describe a finite graph structure by associating each vertex of the graph to each of its neighbours.
For a simple oriented tree, a possible implemention of the Adjency List model is to store a node along with its parent node reference.
In a RDBMS, this is indeed a very common way of storing hierarchical data, such as the Employee-Manager relationship :

SQL> select empno, ename, mgr
  2  from scott.emp;
 
EMPNO ENAME        MGR
----- ---------- -----
 7369 SMITH       7902
 7499 ALLEN       7698
 7521 WARD        7698
 7566 JONES       7839
 7654 MARTIN      7698
 7698 BLAKE       7839
 7782 CLARK       7839
 7788 SCOTT       7566
 7839 KING       
 7844 TURNER      7698
 7876 ADAMS       7788
 7900 JAMES       7698
 7902 FORD        7566
 7934 MILLER      7782

Such a structure can be easily traversed using a CONNECT-BY query, or Recursive Subquery Factoring :

SQL> select ename, ltrim(sys_connect_by_path(ename, '/'), '/') as path
  2  from scott.emp
  3  connect by prior empno = mgr
  4  start with mgr is null
  5  order siblings by empno;
 
ENAME      PATH
---------- -----------------------------------------------------------
KING       KING
JONES      KING/JONES
SCOTT      KING/JONES/SCOTT
ADAMS      KING/JONES/SCOTT/ADAMS
FORD       KING/JONES/FORD
SMITH      KING/JONES/FORD/SMITH
BLAKE      KING/BLAKE
ALLEN      KING/BLAKE/ALLEN
WARD       KING/BLAKE/WARD
MARTIN     KING/BLAKE/MARTIN
TURNER     KING/BLAKE/TURNER
JAMES      KING/BLAKE/JAMES
CLARK      KING/CLARK
MILLER     KING/CLARK/MILLER
 
SQL> with tree_walker (empno, ename, path) as (
  2    select empno, ename, ename
  3    from scott.emp
  4    where mgr is null
  5    union all
  6    select e.empno, e.ename, w.path || '/' || e.ename
  7    from tree_walker w
  8         join scott.emp e on e.mgr = w.empno
  9  )
 10  search depth first by empno set emp_order
 11  select ename, path
 12  from tree_walker;
 
ENAME      PATH
---------- ---------------------------------------------
KING       KING
JONES      KING/JONES
SCOTT      KING/JONES/SCOTT
ADAMS      KING/JONES/SCOTT/ADAMS
FORD       KING/JONES/FORD
SMITH      KING/JONES/FORD/SMITH
BLAKE      KING/BLAKE
ALLEN      KING/BLAKE/ALLEN
WARD       KING/BLAKE/WARD
MARTIN     KING/BLAKE/MARTIN
TURNER     KING/BLAKE/TURNER
JAMES      KING/BLAKE/JAMES
CLARK      KING/CLARK
MILLER     KING/CLARK/MILLER
 

Now, what if we want to – given only a list of paths – generate back an adjency data set?

For example, from this list of leaf paths :

KING/BLAKE/ALLEN
KING/BLAKE/JAMES
KING/BLAKE/MARTIN
KING/BLAKE/TURNER
KING/BLAKE/WARD
KING/CLARK/MILLER
KING/JONES/FORD/SMITH
KING/JONES/SCOTT/ADAMS

to :

  ID NAME     PARENT_ID
---- -------- ---------
   1 KING     
   2 JONES            1
   3 SCOTT            2
   4 ADAMS            3
   5 FORD             2
   6 SMITH            5
   7 BLAKE            1
   8 ALLEN            7
   9 WARD             7
  10 MARTIN           7
  11 TURNER           7
  12 JAMES            7
  13 CLARK            1
  14 MILLER          13

This post presents a way to achieve that result, using a pipelined table function.

 
Read more…

Inside Binary XML

August 27, 2016 Leave a comment

As you might know, Oracle 11g introduced a new format to store and transfer XML content : Binary XML, also known as CSX (Compact Schema-aware XML), which provides optimization in both storage and processing areas.
Binary XML became the default XMLType storage method in version 11.2.0.2.

The purpose of this post is to present a short overview of the format, with a concrete example, as well as some Oracle specifics regarding Binary XML persistent storage in the database.

Although Binary XML has been designed with schema-awareness in mind, I’ll only focus on the non schema-based format.

 

Read more…

PLCalc – a PL/SQL RPN calculator

August 7, 2016 1 comment

PLCalc is a revamped version of my previous work : RPN_UTIL.
The existing code has been refactored and modularized, and now includes the following features :

  1. Extended expression syntax
  2. Support for variable-args functions
  3. Compilation to binary format
  4. Serialization to Presentation MathML format

Read more in the User Guide.

 

Available for download on GitHub :

/mbleron/oracle/PLCalc

 

Categories: PL/SQL, SQL Tags: , , ,

How To : extract LONG column as CLOB using DBUriType

July 20, 2016 Leave a comment

Here’s a short post presenting an easy way to extract a LONG column as a CLOB.
Although the LONG data type was deprecated years ago in favour of CLOB, many Oracle dictionary views still use it.
Adrian Billington’s great article – working with long columns – shows different ways to access LONG columns in SQL, in particular when the size exceeds 32k. One of the solutions uses DBMS_SQL to fetch the LONG value piecewise into a CLOB.

What I’ll describe here is technically based on the same approach but only involves a single built-in function : SYS_DBURIGEN.

 

Read more…

Oracle SQL – Reading an Excel File (xlsx) as an External Table

June 21, 2016 4 comments

I’ve been thinking about it for quite a long time and never really had time to implement it, but it’s finally there : a pipelined table interface to read an Excel file (.xlsx) as if it were an external table.

It’s entirely implemented in PL/SQL using an object type (for the ODCI routines) and a package supporting the core functionalities.
Available for download on GitHub :

/mbleron/oracle/ExcelTable

Read more…

XML Namespaces 101

June 7, 2016 1 comment

Back to basics with a focus on XML namespaces.
A lot of people still struggle to use and reference namespaces correctly in XML-related functions, and most often try random combinations until it works correctly.
Hopefully, this post will clear a few things up :)

 

1. What is a namespace?

A namespace is not some exotic object but just one out of the two parts that form a node name.
In the XML Object Model, the node name of an element or attribute is composed of :

  • a namespace URI, i.e. the namespace name
  • a local name

If the namespace uri is absent (null), the node is said to be in no namespace.

 

2. Default namespaces and prefixes

In an XML document or fragment, a namespace can be defined in two ways :

  • namespace binding (prefix) declaration : xmlns:prefix="my-namespace-1"

    The scope is the element where it appears and all its descendant elements and attributes, unless it is redefined using another declaration (e.g. xmlns:prefix="my-namespace-2").
    A binding declaration applies to all qualified (i.e. prefixed) in-scope elements and attributes.

  • default namespace declaration : xmlns="my-default-ns"

    The scope is the element where it appears and all its descendants, unless it is redefined using another declaration (e.g. xmlns="new-default-ns") or undefined using an empty declaration (xmlns="").
    A default namespace declaration applies to all unqualified in-scope elements, but it does not apply to attributes.

Let’s consider a simple example :
Read more…