Archive

Archive for the ‘SQL’ Category

ExcelTable 2.2 : support for OpenDocument (ODF) files

July 7, 2018 Leave a comment

Here’s the latest addition to ExcelTable : support for OpenDocument spreadsheet files (ODF).

The OpenDocument standard has been widely adopted by a great number of organizations.
In France, for instance, ODF is now the recommended standard for office documents in public administrations, and some sectors have already completed their migration from Microsoft Office products (using Open Office XML Standard) to LibreOffice suite.

ODF, like OOXML (.xlsx, .xlsb, .docx etc.), is a container-based format (ZIP archive) composed of different parts, mainly XML documents, describing the document structure and data.
The comparison stops there though as ODF is much more simple to understand and manipulate from a programmatic point of view.
Here’s an overview of the container structure :

Everything we need to read the document resides in the Manifest part (META-INF/manifest.xml), which is used to determine if the document is encrypted (I’ll get to that later), and the content.xml part, which contains all document structure and data.
The mimetype part may also be read to check which type of document we are dealing with (spreadsheet, text, presentation, etc.).
For instance, a spreadsheet file (.ods) has this MIME type :

application/vnd.oasis.opendocument.spreadsheet

The apparent simplicity also comes with some disavantages :

  • The content part is globally more verbose than OOXML, with longer tag names and namespace prefixes.
  • String values are not shared across the spreadsheet.
  • Rows and cells are not indexed by their respective row numbers and column references, which makes range-based access of the content more difficult.

As an example, here’s the content part for a three-row spreadsheet table :

<?xml version="1.0" encoding="UTF-8"?>
<office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" xmlns:presentation="urn:oasis:names:tc:opendocument:xmlns:presentation:1.0" xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" xmlns:math="http://www.w3.org/1998/Math/MathML" xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0" xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0" xmlns:ooo="http://openoffice.org/2004/office" xmlns:ooow="http://openoffice.org/2004/writer" xmlns:oooc="http://openoffice.org/2004/calc" xmlns:dom="http://www.w3.org/2001/xml-events" xmlns:xforms="http://www.w3.org/2002/xforms" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:rpt="http://openoffice.org/2005/report" xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:grddl="http://www.w3.org/2003/g/data-view#" xmlns:tableooo="http://openoffice.org/2009/table" xmlns:drawooo="http://openoffice.org/2010/draw" xmlns:calcext="urn:org:documentfoundation:names:experimental:calc:xmlns:calcext:1.0" xmlns:loext="urn:org:documentfoundation:names:experimental:office:xmlns:loext:1.0" xmlns:field="urn:openoffice:names:experimental:ooo-ms-interop:xmlns:field:1.0" xmlns:formx="urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:form:1.0" xmlns:css3t="http://www.w3.org/TR/css3-text/" office:version="1.2">
  <office:scripts/>
  <office:font-face-decls>
    <style:font-face style:name="Arial1" svg:font-family="Arial"/>
    <style:font-face style:name="Arial" svg:font-family="Arial" style:font-family-generic="swiss"/>
    <style:font-face style:name="Calibri" svg:font-family="Calibri" style:font-family-generic="swiss"/>
    <style:font-face style:name="Cambria" svg:font-family="Cambria" style:font-family-generic="swiss"/>
    <style:font-face style:name="Microsoft Sans Serif" svg:font-family="'Microsoft Sans Serif'" style:font-family-generic="swiss"/>
    <style:font-face style:name="Liberation Sans" svg:font-family="'Liberation Sans'" style:font-family-generic="swiss" style:font-pitch="variable"/>
    <style:font-face style:name="Microsoft YaHei" svg:font-family="'Microsoft YaHei'" style:font-family-generic="system" style:font-pitch="variable"/>
    <style:font-face style:name="Segoe UI" svg:font-family="'Segoe UI'" style:font-family-generic="system" style:font-pitch="variable"/>
    <style:font-face style:name="Tahoma" svg:font-family="Tahoma" style:font-family-generic="system" style:font-pitch="variable"/>
  </office:font-face-decls>
  <office:automatic-styles>
    <style:style style:name="co1" style:family="table-column">
      <style:table-column-properties fo:break-before="auto" style:column-width="16.55mm"/>
    </style:style>
    <style:style style:name="co2" style:family="table-column">
      <style:table-column-properties fo:break-before="auto" style:column-width="56.18mm"/>
    </style:style>
    <style:style style:name="co3" style:family="table-column">
      <style:table-column-properties fo:break-before="auto" style:column-width="21.77mm"/>
    </style:style>
    <style:style style:name="ro1" style:family="table-row">
      <style:table-row-properties style:row-height="4.52mm" fo:break-before="auto" style:use-optimal-row-height="true"/>
    </style:style>
    <style:style style:name="ro2" style:family="table-row">
      <style:table-row-properties style:row-height="8.04mm" fo:break-before="auto" style:use-optimal-row-height="false"/>
    </style:style>
    <style:style style:name="ta1" style:family="table" style:master-page-name="PageStyle_5f_DataSource">
      <style:table-properties table:display="true" style:writing-mode="lr-tb"/>
    </style:style>
    <number:text-style style:name="N100">
      <number:text-content/>
    </number:text-style>
    <style:style style:name="ce1" style:family="table-cell" style:parent-style-name="Default" style:data-style-name="N0">
      <style:table-cell-properties style:rotation-align="none"/>
      <style:text-properties fo:color="#000000" style:text-outline="false" style:text-line-through-style="none" style:text-line-through-type="none" style:font-name="Microsoft Sans Serif" fo:font-size="10pt" fo:font-style="normal" fo:text-shadow="none" style:text-underline-style="none" fo:font-weight="normal" style:font-size-asian="10pt" style:font-style-asian="normal" style:font-weight-asian="normal" style:font-name-complex="Microsoft Sans Serif" style:font-size-complex="10pt" style:font-style-complex="normal" style:font-weight-complex="normal"/>
    </style:style>
    <style:style style:name="ce2" style:family="table-cell" style:parent-style-name="Default" style:data-style-name="N0">
      <style:table-cell-properties style:rotation-align="none"/>
    </style:style>
    <style:style style:name="ce3" style:family="table-cell" style:parent-style-name="Default" style:data-style-name="N100">
      <style:table-cell-properties style:rotation-align="none"/>
      <style:text-properties fo:color="#000000" style:text-outline="false" style:text-line-through-style="none" style:text-line-through-type="none" style:font-name="Microsoft Sans Serif" fo:font-size="10pt" fo:font-style="normal" fo:text-shadow="none" style:text-underline-style="none" fo:font-weight="normal" style:font-size-asian="10pt" style:font-style-asian="normal" style:font-weight-asian="normal" style:font-name-complex="Microsoft Sans Serif" style:font-size-complex="10pt" style:font-style-complex="normal" style:font-weight-complex="normal"/>
    </style:style>
  </office:automatic-styles>
  <office:body>
    <office:spreadsheet>
      <table:calculation-settings table:case-sensitive="false" table:automatic-find-labels="false" table:use-regular-expressions="false" table:use-wildcards="true">
        <table:iteration table:maximum-difference="0.0001"/>
      </table:calculation-settings>
      <table:table table:name="DataSource" table:style-name="ta1">
        <office:forms form:automatic-focus="false" form:apply-design-mode="false"/>
        <table:table-column table:style-name="co1" table:default-cell-style-name="ce2"/>
        <table:table-column table:style-name="co2" table:default-cell-style-name="ce2"/>
        <table:table-row table:style-name="ro1">
          <table:table-cell table:style-name="ce1" office:value-type="float" office:value="1" calcext:value-type="float">
            <text:p>1</text:p>
          </table:table-cell>
          <table:table-cell table:style-name="ce3" office:value-type="string" calcext:value-type="string">
            <text:p>LINE-00001</text:p>
          </table:table-cell>
        </table:table-row>
        <table:table-row table:style-name="ro2">
          <table:table-cell table:style-name="ce1" office:value-type="float" office:value="2" calcext:value-type="float">
            <text:p>2</text:p>
          </table:table-cell>
          <table:table-cell table:style-name="ce3" office:value-type="string" calcext:value-type="string">
            <text:p>LINE-00002</text:p>
          </table:table-cell>
        </table:table-row>
        <table:table-row table:style-name="ro1">
          <table:table-cell table:style-name="ce1" office:value-type="float" office:value="3" calcext:value-type="float">
            <text:p>3</text:p>
          </table:table-cell>
          <table:table-cell table:style-name="ce3" office:value-type="string" calcext:value-type="string">
            <text:p>LINE-00003</text:p>
          </table:table-cell>
        </table:table-row>
      </table:table>
      <table:named-expressions/>
    </office:spreadsheet>
  </office:body>
</office:document-content>

 

For now, the ODF reader in ExcelTable is implemented using DOM interface only. So, the p_method argument of the getRows() function is ignored.
All other existing features are available, for instance :

  • Range-based access of data
  • Cell comments extraction
  • CLOB projection for large cell content
  • Support for password-protected documents

Here’s a sample query based on document test01.ods :

SQL> select t.*
  2  from table(
  3         ExcelTable.getRows(
  4           ExcelTable.getFile('TMP_DIR','test01.ods')
  5         , 'Feuille1'
  6         , q'{
  7             "SRNO"  number         column 'A'
  8           , "VAL"   number         column 'B'
  9           , "DT"    timestamp      column 'C'
 10           , "TXT1"  clob           column 'D'
 11           , "TXT2"  varchar2(1000) column 'E'
 12           , "RN"    for ordinality
 13           }'
 14         , '7:8'
 15         )
 16       ) t
 17  ;
 
      SRNO        VAL DT                             TXT1                            TXT2               RN
---------- ---------- ------------------------------ ------------------------------- ---------- ----------
         4 9999999999 10-JUN-18 03.20.37.000000 PM   €€€€€€€€€€€€€€€€€€€€€€€€€€ABC                       1
         5       -123 11-JUN-18 03.20.37.120000 PM                                   SPARE1              2
 

 

As usual, the source code, documentation and samples are available on GitHub :

/mbleron/ExcelTable
/mbleron/MSUtilities/OfficeCrypto
 

In the remaining of the post, I’ll focus on encryption methods used in password-protected ODF files.

Read more…

Advertisements

How To : Load JSON Data into Multiple Tables

June 20, 2018 Leave a comment

Lately, I’ve been revisiting old posts about miscellaneous XML techniques to see if they could be adapted to similar situations involving JSON data.

One of them is an approach to load a multi-level document into multiple tables :
How To : Load XML data into multiple tables

Here’s the same approach applied to JSON data.
 

Set up

For this test case, I’ll build a JSON document containing the famous HR hierarchy, starting from locations down to employees :

insert into json_documents (id, doc)
select 2
     , json_arrayagg(
         json_object(
           'id' value l.location_id
         , 'address' value l.street_address
         , 'city' value l.city
         , 'country' value l.country_id
         , 'departments' value (
             select json_arrayagg(
                      json_object(
                        'id' value d.department_id
                      , 'name' value d.department_name
                      , 'manager' value d.manager_id
                      , 'employees' value (
                          select json_arrayagg(
                                   json_object(
                                     'id' value e.employee_id
                                   , 'lastname' value e.last_name
                                   , 'hiredate' value e.hire_date
                                   , 'salary' value e.salary
                                   )
                                   order by e.last_name
                                 )
                          from hr.employees e
                          where e.department_id = d.department_id
                        )
                        absent on null
                      )
                      order by d.department_id
                    )
             from hr.departments d
             where d.location_id = l.location_id
           )
           absent on null         
         )
         returning clob
       ) 
       as result
from hr.locations l
;

Below is an excerpt of the generated content (manually formatted for clarity) :

[
  {"id":1000,"address":"1297 Via Cola di Rie","city":"Roma","country":"IT"},
  {"id":1100,"address":"93091 Calle della Testa","city":"Venice","country":"IT"},
  {"id":1200,"address":"2017 Shinjuku-ku","city":"Tokyo","country":"JP"},
  {"id":1300,"address":"9450 Kamiya-cho","city":"Hiroshima","country":"JP"},
  {
    "id":1400,
    "address":"2014 Jabberwocky Rd",
    "city":"Southlake",
    "country":"US",
    "departments":[
      {
        "id":60,
        "name":"IT",
        "manager":103,
        "employees":[
          {"id":105,"lastname":"Austin","hiredate":"1997-06-25T00:00:00","salary":4800},
          {"id":104,"lastname":"Ernst","hiredate":"1991-05-21T00:00:00","salary":6000},
          {"id":103,"lastname":"Hunold","hiredate":"1990-01-03T00:00:00","salary":9000},
          {"id":107,"lastname":"Lorentz","hiredate":"1999-02-07T00:00:00","salary":4200},
          {"id":106,"lastname":"Pataballa","hiredate":"1998-02-05T00:00:00","salary":4800}
        ]
      }
    ]
  },
  {
    "id":1500,
    "address":"2011 Interiors Blvd",
    "city":"South San Francisco",
    "country":"US",
    "departments":[
      ...
    ]
  },
  ...
]

The JSON document possesses three logical levels of entities organized in a master-details structure : locations > departments > employees.

The goal is now to load each entity back into its own table :

create table my_locations as 
select location_id, street_address, city, country_id from hr.locations where 1 = 0;

create table my_departments as
select department_id, department_name, manager_id from hr.departments where 1 = 0;

create table my_employees as 
select employee_id, last_name, hire_date, salary from hr.employees where 1 = 0;

 

Example

Just like the technique referenced in introduction using chained XMLTABLEs, we can follow the same logic here using JSON_TABLE and NESTED PATH clauses to flatten out all levels :

select x.location_id
     , x.street_address
     , x.city
     , x.country_id
     , nvl(x.dept_rnum, 1) as dept_rnum
     , x.department_id
     , x.department_name
     , x.manager_id
     , nvl(x.emp_rnum, 1) as emp_rnum
     , x.employee_id
     , x.last_name
     , x.hire_date
     , x.salary
from json_documents t
   , json_table(t.doc,
       '$[*]'
       columns location_id     number(4)    path '$.id'
             , street_address  varchar2(40) path '$.address'
             , city            varchar2(30) path '$.city'
             , country_id      varchar2(2)  path '$.country'
             , nested path '$.departments[*]'
               columns (
                 dept_rnum        for ordinality
               , department_id    number(4)    path '$.id'
               , department_name  varchar2(30) path '$.name'
               , manager_id       number(6)    path '$.manager'
               , nested path '$.employees[*]'
                 columns (
                   emp_rnum     for ordinality
                 , employee_id  number(6)    path '$.id'
                 , last_name    varchar2(25) path '$.lastname'
                 , hire_date    date         path '$.hiredate'
                 , salary       number(8,2)  path '$.salary'
                 )
               )
     ) x
where t.id = 2
;
 
LOCATION_ID STREET_ADDRESS           CITY                  COUNTRY_ID  DEPT_RNUM DEPARTMENT_ID DEPARTMENT_NAME  MANAGER_ID   EMP_RNUM EMPLOYEE_ID LAST_NAME   HIRE_DATE    SALARY
----------- ------------------------ --------------------- ---------- ---------- ------------- ---------------- ---------- ---------- ----------- ----------- ----------- -------
       1000 1297 Via Cola di Rie     Roma                  IT                  1                                                    1                                     
       1100 93091 Calle della Testa  Venice                IT                  1                                                    1                                     
       1200 2017 Shinjuku-ku         Tokyo                 JP                  1                                                    1                                     
       1300 9450 Kamiya-cho          Hiroshima             JP                  1                                                    1                                     
       1400 2014 Jabberwocky Rd      Southlake             US                  1            60 IT                      103          1         105 Austin      25/06/1997     4800
       1400 2014 Jabberwocky Rd      Southlake             US                  1            60 IT                      103          2         104 Ernst       21/05/1991     6000
       1400 2014 Jabberwocky Rd      Southlake             US                  1            60 IT                      103          3         103 Hunold      03/01/1990     9000
       1400 2014 Jabberwocky Rd      Southlake             US                  1            60 IT                      103          4         107 Lorentz     07/02/1999     4200
       1400 2014 Jabberwocky Rd      Southlake             US                  1            60 IT                      103          5         106 Pataballa   05/02/1998     4800
       1500 2011 Interiors Blvd      South San Francisco   US                  1            50 Shipping                121          1         130 Atkinson    30/10/1997     2800
       1500 2011 Interiors Blvd      South San Francisco   US                  1            50 Shipping                121          2         192 Bell        04/02/1996     4000
       1500 2011 Interiors Blvd      South San Francisco   US                  1            50 Shipping                121          3         129 Bissot      20/08/1997     3300
 ...
 

Now we just have to insert this data set into MY_LOCATIONS, MY_DEPARTMENTS and MY_EMPLOYEES using a multitable INSERT statement.
In order to load each location only once, we’ll only take those rows having DEPT_RNUM = 1 and EMP_RNUM = 1.
Likewise for departments with EMP_RNUM = 1, and excluding rows with actually no departement data (DEPARTMENT_ID IS NOT NULL).
Finally, employees data will be loaded from rows where EMPLOYEE_ID is not NULL :

insert all 
when dept_rnum = 1 and emp_rnum = 1 
     then into my_locations (location_id, street_address, city, country_id)
                     values (location_id, street_address, city, country_id)
when department_id is not null and emp_rnum = 1
     then into my_departments (department_id, department_name, manager_id)
                       values (department_id, department_name, manager_id)
when employee_id is not null
     then into my_employees (employee_id, last_name, hire_date, salary)
                     values (employee_id, last_name, hire_date, salary)
select x.location_id
     , x.street_address
     , x.city
     , x.country_id
     , nvl(x.dept_rnum, 1) as dept_rnum
     , x.department_id
     , x.department_name
     , x.manager_id
     , nvl(x.emp_rnum, 1) as emp_rnum
     , x.employee_id
     , x.last_name
     , x.hire_date
     , x.salary
from json_documents t
   , json_table(t.doc,
       '$[*]'
       columns location_id     number(4)    path '$.id'
             , street_address  varchar2(40) path '$.address'
             , city            varchar2(30) path '$.city'
             , country_id      varchar2(2)  path '$.country'
             , nested path '$.departments[*]'
               columns (
                 dept_rnum        for ordinality
               , department_id    number(4)    path '$.id'
               , department_name  varchar2(30) path '$.name'
               , manager_id       number(6)    path '$.manager'
               , nested path '$.employees[*]'
                 columns (
                   emp_rnum     for ordinality
                 , employee_id  number(6)    path '$.id'
                 , last_name    varchar2(25) path '$.lastname'
                 , hire_date    date         path '$.hiredate'
                 , salary       number(8,2)  path '$.salary'
                 )
               )         
     ) x
where t.id = 2
;

156 rows inserted.
 

SQL> select * from my_locations;
 
LOCATION_ID STREET_ADDRESS                           CITY                           COUNTRY_ID
----------- ---------------------------------------- ------------------------------ ----------
       1000 1297 Via Cola di Rie                     Roma                           IT
       1100 93091 Calle della Testa                  Venice                         IT
       1200 2017 Shinjuku-ku                         Tokyo                          JP
       1300 9450 Kamiya-cho                          Hiroshima                      JP
       1400 2014 Jabberwocky Rd                      Southlake                      US
       1500 2011 Interiors Blvd                      South San Francisco            US
...
       2800 Rua Frei Caneca 1360                     Sao Paulo                      BR
       2900 20 Rue des Corps-Saints                  Geneva                         CH
       3000 Murtenstrasse 921                        Bern                           CH
       3100 Pieter Breughelstraat 837                Utrecht                        NL
       3200 Mariano Escobedo 9991                    Mexico City                    MX
 
23 rows selected.
 
SQL> select * from my_departments;
 
DEPARTMENT_ID DEPARTMENT_NAME                MANAGER_ID
------------- ------------------------------ ----------
           60 IT                                    103
           50 Shipping                              121
           10 Administration                        200
           30 Purchasing                            114
           90 Executive                             100
          100 Finance                               108
          110 Accounting                            205
          120 Treasury                       
...                  
          270 Payroll                        
           20 Marketing                             201
           40 Human Resources                       203
           80 Sales                                 145
           70 Public Relations                      204
 
27 rows selected.
 
SQL> select * from my_employees;
 
EMPLOYEE_ID LAST_NAME                 HIRE_DATE       SALARY
----------- ------------------------- ----------- ----------
        105 Austin                    25/06/1997     4800.00
        104 Ernst                     21/05/1991     6000.00
        103 Hunold                    03/01/1990     9000.00
        107 Lorentz                   07/02/1999     4200.00
        106 Pataballa                 05/02/1998     4800.00
        130 Atkinson                  30/10/1997     2800.00
        192 Bell                      04/02/1996     4000.00
...
        159 Smith                     10/03/1997     8000.00
        171 Smith                     23/02/1999     7400.00
        157 Sully                     04/03/1996     9500.00
        176 Taylor                    24/03/1998     8600.00
        150 Tucker                    30/01/1997    10000.00
        155 Tuvault                   23/11/1999     7000.00
        162 Vishney                   11/11/1997    10500.00
        149 Zlotkey                   29/01/2000    10500.00
        204 Baer                      07/06/1994    10000.00
 
106 rows selected.
 

23 + 27 + 106 = 156
All rows accounted for :)

 

Hierarchical JSON Aggregation : the JSONNest function

June 17, 2018 Leave a comment

This post echoes one of my previous work on the topic, this time applied to JSON data.

The idea is to build a user-defined aggregate function to automatically nest a hierarchically-ordered set of JSON objects, and return a single JSON document containing the whole hierarchy.

For example, from this adjacency list :

        ID  PARENT_ID NAME
---------- ---------- ------
         1            Item_1
         2          1 Item_2
         3          1 Item_3
         4          3 Item_4
         5          4 Item_5

to :

{
  "id": 1,
  "name": "Item_1",
  "children": [
    {
      "id": 2,
      "name": "Item_2"
    },
    {
      "id": 3,
      "name": "Item_3",
      "children": [
        {
          "id": 4,
          "name": "Item_4",
          "children": [
            {
              "id": 5,
              "name": "Item_5"
            }
          ]
        }
      ]
    }
  ]
}

 

The implementation I’ll present here requires the following :

  • An object type to hold the definition of each node (JSONNestItem)
  • A PL/SQL package for the core routines and context maintenance (JSONNestUtil)
  • A object type for ODCI routines (JSONNestImpl)
  • The aggregate function (JSONNest)

Source code available on GitHub :

/mbleron/oracle/JSONNest

Installation :

@JSONNestItem.tps
@JSONNestUtil.pks
@JSONNestUtil.pkb
@JSONNestImpl.tps
@JSONNestImpl.tpb
@JSONNest.fnc

 

Implementation

JSONNest requires at least Oracle 12.2.
It uses PL/SQL JSON object types introduced in that release, allowing fine-grained manipulation of JSON data.
The implementation is a little more complicated than expected because of a bug (I reported) in 12.2, related to accessing JSON objects by reference :
ORA-7445 Or Corrupted Result When Nesting PL/SQL JSON Objects (Doc ID 2361514.1)

The bug is announced to be fixed in 18c (18.1), but I don’t have the opportunity to check yet.
The workaround requires maintaining a stack of JSON objects. When I get to work on a 18c instance, I’ll rewrite the code to use object references instead.

The function :

function JSONNest (input in JSONNestItem) 
return clob
aggregate using JSONNestImpl;

where JSONNestItem is an object type describing a node of the hierarchy :

create or replace type JSONNestItem as object (
  item_level    integer
, json_content  varchar2(32767)
, wrapper       varchar2(4000)
)
  • item_level : node level in the hierarchy, typically derived from LEVEL pseudocolumn in a CONNECT-BY query, or computed in a recursive CTE.
  • json_content : actual JSON content of this node. Must be a JSON object.
  • wrapper : name of the field containing the array of child nodes.

 

Example

I’ll use the usual HR.EMPLOYEES table to generate the hierarchy of employees :

select JSONNest( 
         JSONNestItem(
           level
         , json_object(
             'id' value employee_id
           , 'name' value first_name || ' ' ||last_name
           )
        , 'employees'
        )
      ) as result
from hr.employees
connect by prior employee_id = manager_id
start with manager_id is null
order siblings by last_name
;

Output (formatted) :

{
  "id": 100,
  "name": "Steven King",
  "employees": [
    {
      "id": 101,
      "name": "Neena Kochhar",
      "employees": [
        {
          "id": 108,
          "name": "Nancy Greenberg",
          "employees": [
            {
              "id": 109,
              "name": "Daniel Faviet"
            },
            {
              "id": 110,
              "name": "John Chen"
            },
            {
              "id": 111,
              "name": "Ismael Sciarra"
            },
            {
              "id": 112,
              "name": "Jose Manuel Urman"
            },
            {
              "id": 113,
              "name": "Luis Popp"
            }
          ]
        },
        {
          "id": 200,
          "name": "Jennifer Whalen"
        },
        {
          "id": 203,
          "name": "Susan Mavris"
        },
        {
          "id": 204,
          "name": "Hermann Baer"
        },
        {
          "id": 205,
          "name": "Shelley Higgins",
          "employees": [
            {
              "id": 206,
              "name": "William Gietz"
            }
          ]
        }
      ]
    },
    {
      "id": 102,
      "name": "Lex De Haan",
      "employees": [
        {
          "id": 103,
          "name": "Alexander Hunold",
          "employees": [
            {
              "id": 104,
              "name": "Bruce Ernst"
            },
            {
              "id": 105,
              "name": "David Austin"
            },
            {
              "id": 106,
              "name": "Valli Pataballa"
            },
            {
              "id": 107,
              "name": "Diana Lorentz"
            }
          ]
        }
      ]
    },
    {
      "id": 114,
      "name": "Den Raphaely",
      "employees": [
        {
          "id": 115,
          "name": "Alexander Khoo"
        },
        {
          "id": 116,
          "name": "Shelli Baida"
        },
        {
          "id": 117,
          "name": "Sigal Tobias"
        },
        {
          "id": 118,
          "name": "Guy Himuro"
        },
        {
          "id": 119,
          "name": "Karen Colmenares"
        }
      ]
    },
    {
      "id": 120,
      "name": "Matthew Weiss",
      "employees": [
        {
          "id": 125,
          "name": "Julia Nayer"
        },
        {
          "id": 126,
          "name": "Irene Mikkilineni"
        },
        {
          "id": 127,
          "name": "James Landry"
        },
        {
          "id": 128,
          "name": "Steven Markle"
        },
        {
          "id": 180,
          "name": "Winston Taylor"
        },
        {
          "id": 181,
          "name": "Jean Fleaur"
        },
        {
          "id": 182,
          "name": "Martha Sullivan"
        },
        {
          "id": 183,
          "name": "Girard Geoni"
        }
      ]
    },
    {
      "id": 121,
      "name": "Adam Fripp",
      "employees": [
        {
          "id": 129,
          "name": "Laura Bissot"
        },
        {
          "id": 130,
          "name": "Mozhe Atkinson"
        },
        {
          "id": 131,
          "name": "James Marlow"
        },
        {
          "id": 132,
          "name": "TJ Olson"
        },
        {
          "id": 184,
          "name": "Nandita Sarchand"
        },
        {
          "id": 185,
          "name": "Alexis Bull"
        },
        {
          "id": 186,
          "name": "Julia Dellinger"
        },
        {
          "id": 187,
          "name": "Anthony Cabrio"
        }
      ]
    },
    {
      "id": 122,
      "name": "Payam Kaufling",
      "employees": [
        {
          "id": 133,
          "name": "Jason Mallin"
        },
        {
          "id": 134,
          "name": "Michael Rogers"
        },
        {
          "id": 135,
          "name": "Ki Gee"
        },
        {
          "id": 136,
          "name": "Hazel Philtanker"
        },
        {
          "id": 188,
          "name": "Kelly Chung"
        },
        {
          "id": 189,
          "name": "Jennifer Dilly"
        },
        {
          "id": 190,
          "name": "Timothy Gates"
        },
        {
          "id": 191,
          "name": "Randall Perkins"
        }
      ]
    },
    {
      "id": 123,
      "name": "Shanta Vollman",
      "employees": [
        {
          "id": 137,
          "name": "Renske Ladwig"
        },
        {
          "id": 138,
          "name": "Stephen Stiles"
        },
        {
          "id": 139,
          "name": "John Seo"
        },
        {
          "id": 140,
          "name": "Joshua Patel"
        },
        {
          "id": 192,
          "name": "Sarah Bell"
        },
        {
          "id": 193,
          "name": "Britney Everett"
        },
        {
          "id": 194,
          "name": "Samuel McCain"
        },
        {
          "id": 195,
          "name": "Vance Jones"
        }
      ]
    },
    {
      "id": 124,
      "name": "Kevin Mourgos",
      "employees": [
        {
          "id": 141,
          "name": "Trenna Rajs"
        },
        {
          "id": 142,
          "name": "Curtis Davies"
        },
        {
          "id": 143,
          "name": "Randall Matos"
        },
        {
          "id": 144,
          "name": "Peter Vargas"
        },
        {
          "id": 196,
          "name": "Alana Walsh"
        },
        {
          "id": 197,
          "name": "Kevin Feeney"
        },
        {
          "id": 198,
          "name": "Donald OConnell"
        },
        {
          "id": 199,
          "name": "Douglas Grant"
        }
      ]
    },
    {
      "id": 145,
      "name": "John Russell",
      "employees": [
        {
          "id": 150,
          "name": "Peter Tucker"
        },
        {
          "id": 151,
          "name": "David Bernstein"
        },
        {
          "id": 152,
          "name": "Peter Hall"
        },
        {
          "id": 153,
          "name": "Christopher Olsen"
        },
        {
          "id": 154,
          "name": "Nanette Cambrault"
        },
        {
          "id": 155,
          "name": "Oliver Tuvault"
        }
      ]
    },
    {
      "id": 146,
      "name": "Karen Partners",
      "employees": [
        {
          "id": 156,
          "name": "Janette King"
        },
        {
          "id": 157,
          "name": "Patrick Sully"
        },
        {
          "id": 158,
          "name": "Allan McEwen"
        },
        {
          "id": 159,
          "name": "Lindsey Smith"
        },
        {
          "id": 160,
          "name": "Louise Doran"
        },
        {
          "id": 161,
          "name": "Sarath Sewall"
        }
      ]
    },
    {
      "id": 147,
      "name": "Alberto Errazuriz",
      "employees": [
        {
          "id": 162,
          "name": "Clara Vishney"
        },
        {
          "id": 163,
          "name": "Danielle Greene"
        },
        {
          "id": 164,
          "name": "Mattea Marvins"
        },
        {
          "id": 165,
          "name": "David Lee"
        },
        {
          "id": 166,
          "name": "Sundar Ande"
        },
        {
          "id": 167,
          "name": "Amit Banda"
        }
      ]
    },
    {
      "id": 148,
      "name": "Gerald Cambrault",
      "employees": [
        {
          "id": 168,
          "name": "Lisa Ozer"
        },
        {
          "id": 169,
          "name": "Harrison Bloom"
        },
        {
          "id": 170,
          "name": "Tayler Fox"
        },
        {
          "id": 171,
          "name": "William Smith"
        },
        {
          "id": 172,
          "name": "Elizabeth Bates"
        },
        {
          "id": 173,
          "name": "Sundita Kumar"
        }
      ]
    },
    {
      "id": 149,
      "name": "Eleni Zlotkey",
      "employees": [
        {
          "id": 174,
          "name": "Ellen Abel"
        },
        {
          "id": 175,
          "name": "Alyssa Hutton"
        },
        {
          "id": 176,
          "name": "Jonathon Taylor"
        },
        {
          "id": 177,
          "name": "Jack Livingston"
        },
        {
          "id": 178,
          "name": "Kimberely Grant"
        },
        {
          "id": 179,
          "name": "Charles Johnson"
        }
      ]
    },
    {
      "id": 201,
      "name": "Michael Hartstein",
      "employees": [
        {
          "id": 202,
          "name": "Pat Fay"
        }
      ]
    }
  ]
}

In addition, I’ve also published a function taking a weakly-typed ref cursor and returning the same JSON output.
The cursor must be based on a query returning exactly three columns matching the attributes of JSONNestItem object type :

DECLARE

  rc      sys_refcursor;
  output  clob;
  
BEGIN
 
  open rc for
  select level
       , json_object(
           'id' value employee_id
         , 'name' value first_name || ' ' ||last_name
         )
       , 'employees'
  from hr.employees
  connect by prior employee_id = manager_id
  start with manager_id is null
  order siblings by last_name;
  
  output := JSONNestUtil.getDocument(rc);
   
END;
/

 

Parsing sibling repeating XML groups : Part 2

June 3, 2018 Leave a comment

This post is an addendum to a previously published How-To presenting various techniques to break sibling repeating groups into relational data :

<root>
  <id>1</id>
  <name>SMITH</name>
  <id>2</id>
  <name>WILSON</name>
  <id>3</id>
  <name>GRANT</name>
</root>

to :

ID    NAME
----- ---------
    1 SMITH
    2 WILSON
    3 GRANT

The topic was raised again recently on the Oracle Community forum, but with a particularity I didn’t cover originally : what about optional nodes?
For instance, if id 2 has no name :

<root>
  <id>1</id>
  <name>SMITH</name>
  <id>2</id>
  <id>3</id>
  <name>GRANT</name>
</root>

The “following-sibling” trick will not return the required output :

SQL> select x.*
  2  from xmltable(
  3         'for $i in /root/id
  4          return element r {
  5            $i/following-sibling::name[1]
  6          , $i
  7          }'
  8         passing xmlparse(document
  9         '<root>
 10            <id>1</id><name>SMITH</name>
 11            <id>2</id>
 12            <id>3</id><name>GRANT</name>
 13          </root>')
 14         columns id    number       path 'id'
 15               , name  varchar2(15) path 'name'
 16       ) x
 17  ;
 
        ID NAME
---------- ---------------
         1 SMITH
         2 GRANT          
         3 GRANT

One way to workaround this is to add another predicate to ensure that the node is actually the one coming directly after the current key element, and not one belonging to another logical sibling group further down in the document.

In the following example, I’ll use the node identity comparison operator ‘is’ :

SQL> select x.*
  2  from xmltable(
  3         'for $i in /root/id
  4          return element r {
  5            $i/following-sibling::name[1][preceding-sibling::id[1] is $i]
  6          , $i
  7          }'
  8         passing xmlparse(document
  9         '<root>
 10            <id>1</id><name>SMITH</name>
 11            <id>2</id>
 12            <id>3</id><name>GRANT</name>
 13          </root>')
 14         columns id    number       path 'id'
 15               , name  varchar2(15) path 'name'
 16       ) x
 17  ;
 
        ID NAME
---------- ---------------
         1 SMITH
         2                
         3 GRANT
 

 

Additional Approaches and Performance Considerations

The full XQuery approach (w/ or w/o RETURNING SEQUENCE BY REF clause) is rather compact and maintainable, but it is also very slow when applied on large XML documents.
Mixed approaches using XQuery and SQL post-processing provide far better performance.
I’ll present below three other ways using PIVOT, MODEL and MATCH_RECOGNIZE SQL features.

Read more…

Categories: SQL, XQuery Tags: , ,

How To : Generate Empty XML Attributes

February 17, 2018 1 comment

It is well-known that XML attributes may be generated using SQL/XML publishing function XMLAttributes().
It is also well-known that XMLAttributes() does not generate anything when passed an expression that evaluates to NULL :

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/adxdb/generation-of-XML-data-from-relational-data.html

If an attribute value expression evaluates to NULL, then no corresponding attribute is created.

So, how can we generate an empty attribute using SQL?

<root attr=""/>

Following are a compilation of “tricks” (more than official methods) to generate empty XML attributes, and empty namespace declarations in a SQL context.
I’ll divide them into two categories :

  • Document post-processing : target attributes we want to be empty are transformed after the complete document has been generated
  • In-place generation : empty attributes are directly generated in the same “flow”

 

Read more…

Categories: HowTo, SQL, XQuery Tags: , ,

ExcelTable 1.6 : support for cell comments

January 3, 2018 Leave a comment

Here’s the new version of ExcelTable, which can now extract cell comments as regular columns.
Not much change in the user interface, except an extended column syntax specification to declare a request for cell metadata instead of its value :

For example, using sample file ooxdata3.xlsx :

SQL> select t.*
  2  from table(
  3         ExcelTable.getRows(
  4           ExcelTable.getFile('TMP_DIR','ooxdata3.xlsx')
  5         , 'DataSource'
  6         , q'{
  7             "RN"             for ordinality
  8           , "SPARE2"         varchar2(30)   column 'F'
  9           , "SPARE2_COMMENT" varchar2(2000) column 'F' for metadata (comment)
 10           }'
 11         , '2:11'
 12         )
 13       ) t
 14  ;

 RN SPARE2 SPARE2_COMMENT
--- ------ ------------------------------
  1
  2
  3 OK     bleronm:
           This is a comment.

  4 OK
  5
  6 OK
  7
  8
  9        This is
           another comment
           on three lines

 10

10 rows selected.

 

Source code available on GitHub :

/mbleron/ExcelTable

 

A few words about the internals are following…

 

Read more…

Categories: Miscellaneous, PL/SQL, SQL, XQuery Tags:

How To : Nest Multiple JSON Path Expression Filters

July 9, 2017 Leave a comment

A short post about JSON Path Expressions.
I will focus on filter expressions (now officially documented as of Oracle 12.2) we can use in conjunction with JSON_EXISTS function, and more specifically how to nest multiple relative filter steps.

Here’s an example of what I mean :

{
  "_id":"53e3c6ed-9bfc-2730-e053-0100007f6afb",
  "content":{
    "name":"obj1", 
    "type":1, 
    "values":[
      {"name":"x", "v":1},
      {"name":"y", "v":2},
      {"name":"z", "v":1.5}
    ]
  }
}

and say we want to find documents (like the one above) where content.type = 1 and content.values contains an x item having v > 0.

 

Read more…