Archive for the ‘Miscellaneous’ Category

ExcelTable 1.3 : support for password-encrypted files

June 5, 2017 3 comments

Here’s the new version of ExcelTable, which can now read password-encrypted files.
It supports Standard and Agile encryption methods, as specified in [MS-OFFCRYPTO].

By default, Office 2007 encrypts using the Standard method, whereas Office 2010 and later use Agile encryption.
AES (128 or 256) is usually the default algorithm on standard Office installations.
Because recent Office versions (2013 and later) use SHA-2 hashing algorithms, Oracle 12c is required to read Excel documents encrypted with those versions.

Basically, the only change from ExcelTable 1.2 is the addition of an optional p_password argument to the getRows() function :

function getRows (
  p_file     in blob
, p_sheet    in varchar2
, p_cols     in varchar2
, p_range    in varchar2 default null
, p_method   in binary_integer default DOM_READ
, p_password in varchar2 default null
) 
return anydataset pipelined
using ExcelTableImpl;
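
As a minimal sketch of how the new argument might be used, the query below passes a password as the sixth argument of getRows(). The directory (TMP_DIR), file name, sheet name, column list and password are all hypothetical placeholders, not taken from the package documentation; only the argument order comes from the signature above.

```sql
-- Hypothetical example: the file name, sheet name, columns and password
-- are placeholders chosen for illustration only.
select t.*
from table(
       ExcelTable.getRows(
         ExcelTable.getFile('TMP_DIR','encrypted.xlsx')  -- p_file
       , 'Sheet1'                                        -- p_sheet
       , q'{
            "ID"    number
          , "NAME"  varchar2(30)
         }'                                              -- p_cols
       , 'A2'                                            -- p_range
       , 0                                               -- p_method : DOM_READ
       , 'MyPassword'                                    -- p_password
       )
     ) t ;
```

Since p_password defaults to null, existing calls written for version 1.2 should not need any change.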

The following dependencies are also required :
  • XUTL_CDF
  • XUTL_OFFCRYPTO

 

Source code available on GitHub :

/mbleron/oracle/ExcelTable

 

A few words about the internals follow…

 

Read more…

TreeBuilder – a PL/SQL graphical tree generator

February 26, 2017

TreeBuilder computes the set of node coordinates necessary to represent a single-rooted tree in a graphical environment.
Node positioning is implemented using the improved version of Walker’s algorithm, published by Buchheim, Jünger and Leipert :
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.8757

Tree data is exposed as a pipelined function.
The API also provides a constructor to visualize the tree as an SVG object.

Source code available on GitHub :

/mbleron/oracle/TreeBuilder

 

Read more…

ExcelTable 1.2 : introducing streaming support for large files

November 15, 2016 1 comment

ExcelTable is my attempt at building a SQL query interface to read MS Excel files in xlsx (or xlsm) format.
Version 1.2 is now available with the following new features :

  • Streaming read support for large Excel files
  • setFetchSize() routine to limit the number of rows processed per request

 
/mbleron/oracle/ExcelTable

 

Streaming read method

The getRows() function has been extended with an additional optional argument p_method.
Allowed values are as follows :

-- Read methods  
DOM_READ     constant binary_integer := 0;
STREAM_READ  constant binary_integer := 1;

The default is 0 (DOM_READ).

The streaming method requires Java (StAX API) and is much more scalable than the DOM_READ method when accessing large files.
Please see the README for more details about what to install depending on the target database version.

Example on a 500,000-row file (bigfile.xlsx) :

select * 
from table(
       ExcelTable.getRows(
         ExcelTable.getFile('TMP_DIR','bigfile.xlsx')
       , 'data'      
       , q'{ 
            "ID"           number
          , "FIRST_NAME"   varchar2(15)
          , "LAST_NAME"    varchar2(20)
          , "EMAIL"        varchar2(30)
          , "GENDER"       varchar2(10)
          , "IP_ADDRESS"   varchar2(16)
          , "COMMENT"      varchar2(4000)
          , "IMAGE"        varchar2(4000)
          , "DT"           date  format 'DD/MM/YYYY'

         }'
       , 'A2'   -- p_range
       , 1      -- p_method : STREAM_READ
       )
     ) t ;

 

Read more…

Oracle SQL – Reading an Excel File (xlsx) as an External Table

June 21, 2016 12 comments

I’ve been thinking about it for quite a long time without ever really having the time to implement it, but it’s finally here : a pipelined table interface to read an Excel file (.xlsx) as if it were an external table.

It’s entirely implemented in PL/SQL using an object type (for the ODCI routines) and a package supporting the core functionality.
Available for download on GitHub :

/mbleron/oracle/ExcelTable

Read more…

XML Namespaces 101

June 7, 2016 1 comment

Back to basics with a focus on XML namespaces.
A lot of people still struggle to use and reference namespaces correctly in XML-related functions, and often end up trying random combinations until something works.
Hopefully, this post will clear a few things up :)

 

1. What is a namespace?

A namespace is not some exotic object, just one of the two parts that form a node name.
In the XML Object Model, the node name of an element or attribute is composed of :

  • a namespace URI, i.e. the namespace name
  • a local name

If the namespace URI is absent (null), the node is said to be in no namespace.

 

2. Default namespaces and prefixes

In an XML document or fragment, a namespace can be defined in two ways :

  • namespace binding (prefix) declaration : xmlns:prefix="my-namespace-1"

    The scope is the element where it appears and all its descendant elements and attributes, unless it is redefined using another declaration (e.g. xmlns:prefix="my-namespace-2").
    A binding declaration applies to all qualified (i.e. prefixed) in-scope elements and attributes.

  • default namespace declaration : xmlns="my-default-ns"

    The scope is the element where it appears and all its descendants, unless it is redefined using another declaration (e.g. xmlns="new-default-ns") or undefined using an empty declaration (xmlns="").
    A default namespace declaration applies to all unqualified in-scope elements, but it does not apply to attributes.
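
The two kinds of declaration can be seen side by side in a query. The following is only a sketch: the document, the element and attribute names and the prefix p are made up for illustration. It relies on the XMLNAMESPACES clause of XMLTable to declare, on the query side, the same namespace URIs as the document.

```sql
-- Illustrative sketch: the sample document and all names in it are hypothetical.
select x.*
from xmltable(
       xmlnamespaces(
         default 'my-default-ns'      -- applies to unprefixed element names in the XQuery paths
       , 'my-namespace-1' as "p"      -- the prefix may differ from the document's; only the URI matters
       )
     , '/root/item'
       passing xmltype(
         '<root xmlns="my-default-ns" xmlns:ns1="my-namespace-1">
            <item id="1" ns1:code="A">first</item>
            <item id="2" ns1:code="B">second</item>
          </root>'
       )
       columns
         id    number       path '@id'      -- unprefixed attribute : in no namespace
       , code  varchar2(10) path '@p:code'  -- prefixed attribute : in my-namespace-1
       , val   varchar2(30) path '.'
     ) x ;
```

Note that the id attribute is addressed without any prefix even though a default namespace is in scope, which is the rule stated above : a default namespace declaration does not apply to attributes.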

Let’s consider a simple example :
Read more…

Yet Another XML Flattening Technique

March 27, 2016 4 comments

As a follow-up to my previous post introducing the XMLNest function, here is its “inverse” (to borrow from maths terminology) : XMLFlattenDOM, a PL/SQL DOM-based pipelined function.

We’ll see in the last part how this approach compares to the others described earlier :

 

Read more…

XML Flattening revisited : Java-based pipelined function

November 18, 2012 1 comment

As a follow-up to How To : Flatten out an XML Hierarchical Structure, here’s a fourth approach using a pipelined function built over the Java InfosetReader interface.
The function only works on an XMLType column/table stored as binary XML. Since we directly decode the binary stream and pipe the rows to the SQL engine, this method is faster, much more scalable and less memory-intensive than the previous approaches.

 

1. Set up

The ODCI setup is based on : Pipelined Table Functions Example: Java Implementation from the Data Cartridge Developer’s Guide.

We start by creating an object type and its collection. These are the structures that will be filled by the Java program at runtime and returned to the SQL engine via the pipelined function :

CREATE TYPE XMLEdgeTableRow AS OBJECT (
  node_id        integer
, node_name      varchar2(2000)
, node_type      varchar2(30)
, parent_node_id integer
, node_value     varchar2(4000)
, namespace_uri  varchar2(2000)
);

CREATE TYPE XMLEdgeTable AS TABLE OF XMLEdgeTableRow;

 

then the implementation type :
Read more…