PL/SQL CSV Parsing : To Buffer or Not Buffer

May 20, 2019

I’m currently working on adding support for delimited and positional flat files to ExcelTable.

The file, residing as a CLOB inside the database, will be parsed using PL/SQL code.
In its simplest form, the base algorithm consists of locating field separators (and line terminators) in the input text and extracting the data in between.
The DBMS_LOB.INSTR API may be used iteratively to scan the entire CLOB.
However, it is generally faster to chunk-read the file into a VARCHAR2 buffer and process it from there using string functions, but to what extent?

Below are the results of parsing a set of files of different sizes, using different buffer sizes, and compared to a straightforward LOB scan using DBMS_LOB.INSTR.
This is the test procedure I used (native compilation + optimizer level 3) :

create or replace procedure test_csv_base (
  p_filename in varchar2
, p_buffer_sz in pls_integer
) 
is

  field_separator  varchar2(1) := ',';
  fs_sz            pls_integer := length(field_separator);

  type buffer_t is record (
    content  varchar2(32767)
  , sz       pls_integer
  , offset   pls_integer
  );

  type stream_t is record (
    content  clob
  , sz       integer
  , offset   integer
  );
  
  stream  stream_t;
  buf     buffer_t;
  
  amount     pls_integer;
  fs_offset  pls_integer;
  
  start_time pls_integer;
  end_time   pls_integer;
  
  procedure bufferize is
  begin
    amount := p_buffer_sz;
    dbms_lob.read(stream.content, amount, stream.offset, buf.content);
    stream.offset := stream.offset + amount;
    buf.sz := amount;
    buf.offset := 1;
  end;
  
  procedure p (message in varchar2) is
  begin
    dbms_output.put_line(message);
  end;
  
  procedure loadfile is
    f           bfile := bfilename('XL_DATA_DIR',p_filename);
    dest_offset integer := 1;
    src_offset  integer := 1;
    lang_ctx    integer := dbms_lob.default_lang_ctx;
    warn        integer;
  begin
    p('Loading file...');
    dbms_lob.createtemporary(stream.content, true);
    dbms_lob.fileopen(f, dbms_lob.lob_readonly);
    dbms_lob.loadclobfromfile(
      dest_lob     => stream.content
    , src_bfile    => f
    , amount       => dbms_lob.getlength(f)
    , dest_offset  => dest_offset
    , src_offset   => src_offset
    , bfile_csid   => 873
    , lang_context => lang_ctx
    , warning      => warn
    );
    dbms_lob.fileclose(f);
    stream.sz := dbms_lob.getlength(stream.content);
    stream.offset := 1;
  end;
  
begin
  
  loadfile;
  start_time := dbms_utility.get_time;
  bufferize;
  
  loop 
    
    -- locate the next field separator in the current buffer
    fs_offset := instr(buf.content, field_separator, buf.offset);       
    
    if fs_offset != 0 then
      -- separator found : move past it
      buf.offset := fs_offset + fs_sz;
    elsif stream.offset <= stream.sz then
      -- end of buffer reached : read the next chunk from the CLOB
      bufferize;
    else
      -- end of stream
      exit;
    end if;
  
  end loop;
  
  end_time := dbms_utility.get_time;
  p(to_char(p_buffer_sz)||':'||to_char((end_time - start_time)/100));
  
  dbms_lob.freetemporary(stream.content);

end;
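
For reference, a run can be driven with a simple anonymous block like the one below. This is only a sketch : the file name and the buffer-size range are placeholders for whatever test set you want to cover.

begin
  -- try buffer sizes from 64 bytes up to 16k, doubling each time
  for i in 6 .. 14 loop
    test_csv_base('sample_2500k.csv', power(2, i));
  end loop;
end;
/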

The first plot below shows parse time (in seconds) against buffer size (in bytes).
File size (2.5M, 1.25M, etc.) is expressed as a number of fields, the average size of a field being 100 bytes :



That tells us two things :

  • There is an optimal buffer size
  • There is a (linear) asymptotic behaviour for large buffer sizes

The second plot focuses on the optimum region :


which is actually best viewed using a base-2 logarithmic abscissa :


Using a polynomial regression fit on the 2.5M series, I determined (on my system) an optimal buffer size of approx. 850 bytes.
That size provides the best trade-off between the number of LOB read operations necessary to buffer the entire file content and the number of string manipulations necessary to parse the buffer.
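To put rough numbers on that trade-off (using the ~100-byte average field size mentioned above) : the 2.5M-field file weighs in at roughly 250 MB, so an 850-byte buffer translates into about 300,000 DBMS_LOB.READ calls, versus only about 8,000 calls with a 32k buffer; as the plots show, the reads saved by the larger buffer are more than offset by the cost of manipulating much larger VARCHAR2 buffers.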
We will now see below how that compares to a direct LOB scan using the DBMS_LOB API.

I have used a similar procedure with the following loop to test the direct LOB scan approach :

loop

  fs_offset := dbms_lob.instr(stream.content, field_separator, stream.offset);
  exit when fs_offset = 0;
  stream.offset := fs_offset + fs_sz;

end loop;

Here are the results, added to the data previously gathered for the buffering approach and presented this time as a stacked histogram.
The red horizontal line marks the cumulative time of the LOB scan method :



The direct LOB scan performs almost the same as the 8k buffer method, probably because of the LOB chunk size, which happens to be 8132 for temporary LOBs on my system.
It is also better than using large buffer sizes (> 16k), but choosing an optimal buffer size is definitely the way to go in this situation.
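
For reference, the chunk size of a temporary LOB on your own system can be checked with a quick sketch like this :

declare
  c  clob;
begin
  dbms_lob.createtemporary(c, true);
  -- chunk size used internally for this temporary CLOB
  dbms_output.put_line('Chunk size: ' || dbms_lob.getchunksize(c));
  dbms_lob.freetemporary(c);
end;
/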

 


Oracle 18c : PL/SQL support for pipelined table functions returning ANYDATASET

May 14, 2019

I don’t know – or remember – if this was ever qualified as a bug or an unimplemented feature, but starting with Oracle 18c, we are now able to make static references to pipelined table functions returning an ANYDATASET instance in PL/SQL.
This enhancement comes along with the release of Polymorphic Table Functions (PTF), also introduced in 18c, which is probably not a coincidence since ODCI and PTF share common concepts.

Before that, an attempt to embed a SQL query using such a function (e.g. ExcelTable.getRows) in PL/SQL code resulted in the following compile-time error :

ORA-22905: cannot access rows from a non-nested table item

The only workaround back then was dynamic SQL.
For instance, in ExcelTable, function getCursor was designed exactly for that purpose.

Now we can directly do this :

declare

  cursor c (p_file in bfile) is
  select * 
  from ExcelTable.getRows(
         p_file => to_blob(p_file)
       , p_sheet => '.*'
       , p_cols => q'{
           "C1"         number
         , "SHEET_NAME" varchar2(31 char) for metadata (sheet_name)
         }'
       );

begin
  
  ExcelTable.useSheetPattern(true);

  for r in c (bfilename('XL_DATA_DIR','multisheet.xlsx'))
  loop
    dbms_output.put_line(utl_lms.format_message('SHEET_NAME=%s C1=%s', r.sheet_name, to_char(r.c1)));
  end loop;
  
end;
/

SHEET_NAME=Sheet2 C1=1
SHEET_NAME=Sheet2 C1=2
SHEET_NAME=Sheet2 C1=3
SHEET_NAME=Sheet1 C1=4
SHEET_NAME=Sheet1 C1=5
SHEET_NAME=Sheet1 C1=6
SHEET_NAME=Sheet3 C1=7
SHEET_NAME=Sheet3 C1=8
SHEET_NAME=Sheet3 C1=9
  

Behind the scenes, this means Oracle now calls ODCITableDescribe at compile time, so the projection of the SQL query is known and readily usable by the rest of the code.
A system-generated, non-persistable object type and its collection type are created in the process and associated with the PL/SQL code unit.

On a side note, we can also see that the TABLE operator has become optional for pipelined table functions.

As far as ExcelTable is concerned, this new “feature” only works when passing a literal value for the p_sheet (or p_sheets) parameter, as in the above example.
If p_sheet is a bind variable, for instance, we get this error :

PLS-00307: too many declarations of 'ODCITABLEDESCRIBE' match this call
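
For illustration, here is a sketch of such a failing variant (the local variable is mine; the point is simply that p_sheet is no longer a literal) :

declare

  sheet  varchar2(30) := '.*';

  cursor c (p_file in bfile) is
  select * 
  from ExcelTable.getRows(
         p_file => to_blob(p_file)
       , p_sheet => sheet  -- non-literal value => PLS-00307 at compile time
       , p_cols => '"C1" number'
       );

begin
  null;
end;
/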

That is actually expected behaviour, because all non-literal values are converted to NULL when the ODCITableDescribe routine is called.
We can see that in the following excerpt of a SQL trace. Since v3.0 and multi-sheet support, this call matches two possible overloads of ODCITableDescribe, and obviously Oracle cannot tell which one we intend to use :

PARSING IN CURSOR #140364229256600 len=234 dep=1 uid=104 oct=47 lid=104 tim=202406983730 hv=450544808 ad='7d4a5b30' sqlid='gjq1rgsddpj58'
 declare 
     rc sys_refcursor; 
   begin 
     :1 := "DEV"."EXCELTABLEIMPL"."ODCITABLEDESCRIBE"(:2 ,NULL,NULL,'
           "C1"         number
         , "SHEET_NAME" varchar2(31 char) for metadata (sheet_name)
         '); 
   end;
END OF STMT

 


ExcelTable 3.1 : Default Value Feature in DML API

May 1, 2019

Here’s the latest addition to ExcelTable (v3.1).
We are now able to specify default values when mapping columns using the DML API.
This new feature has been implemented based on Jordan Cortes’ request last October.

ExcelTable will apply the default value either in the usual way (i.e. when the input value is NULL), or as a constant when the spreadsheet column reference is omitted.
 

Implementation

The mapColumn procedure has been extended with a p_default parameter of data type “ANYDATA”, allowing the user to bind a strongly-typed value to the target column.

procedure mapColumn (
  p_ctx       in DMLContext
, p_col_name  in varchar2
, p_col_ref   in varchar2     default null
, p_format    in varchar2     default null
, p_meta      in pls_integer  default null
, p_key       in boolean      default false
, p_default   in anydata      default null
);

 

For convenience, a new mapColumnWithDefault procedure has been added as well, to directly deal with the three common data types VARCHAR2, NUMBER and DATE :

procedure mapColumnWithDefault (
  p_ctx      in DMLContext
, p_col_name in varchar2
, p_col_ref  in varchar2 default null
, p_format   in varchar2 default null
, p_meta     in pls_integer default null
, p_key      in boolean default false
, p_default  in varchar2
);

procedure mapColumnWithDefault (
  p_ctx      in DMLContext
, p_col_name in varchar2
, p_col_ref  in varchar2 default null
, p_format   in varchar2 default null
, p_meta     in pls_integer default null
, p_key      in boolean default false
, p_default  in number
);

procedure mapColumnWithDefault (
  p_ctx      in DMLContext
, p_col_name in varchar2
, p_col_ref  in varchar2 default null
, p_format   in varchar2 default null
, p_meta     in pls_integer default null
, p_key      in boolean default false
, p_default  in date
);

 

Examples

create table tmp_sample2 (
  id       number       primary key
, name     varchar2(10)
, val      varchar2(30)
, load_dt  date
);

Using mapColumn with an ANYDATA value :

declare

  ctx    ExcelTable.DMLContext;
  nrows  integer;
  
begin
  
  ctx := ExcelTable.createDMLContext('TMP_SAMPLE2');
  
  ExcelTable.mapColumn(ctx, p_col_name => 'ID',   p_col_ref => 'A');
  ExcelTable.mapColumn(ctx, p_col_name => 'NAME', p_col_ref => 'B');
  ExcelTable.mapColumn(ctx, p_col_name => 'VAL',  p_col_ref => 'C');
  ExcelTable.mapColumn(ctx, p_col_name => 'LOAD_DT', p_default => anydata.ConvertDate(sysdate));
  
  nrows := 
  ExcelTable.loadData(
    p_ctx      => ctx
  , p_file     => ExcelTable.getFile('XL_DATA_DIR','sample_2.xlsx')
  , p_sheet    => 'DataSource'
  , p_method   => ExcelTable.STREAM_READ
  , p_dml_type => ExcelTable.DML_INSERT
  );
  
  dbms_output.put_line(nrows || ' rows inserted.');
  
end;
/

 

Using mapColumnWithDefault :

declare

  ctx    ExcelTable.DMLContext;
  nrows  integer;
  
begin
  
  ctx := ExcelTable.createDMLContext('TMP_SAMPLE2');
  
  ExcelTable.mapColumn(ctx, p_col_name => 'ID',   p_col_ref => 'A');
  ExcelTable.mapColumn(ctx, p_col_name => 'NAME', p_col_ref => 'B');
  ExcelTable.mapColumn(ctx, p_col_name => 'VAL',  p_col_ref => 'C');
  ExcelTable.mapColumnWithDefault(ctx, p_col_name => 'LOAD_DT', p_default => sysdate);
  
  nrows := 
  ExcelTable.loadData(
    p_ctx      => ctx
  , p_file     => ExcelTable.getFile('XL_DATA_DIR','sample_2.xlsx')
  , p_sheet    => 'DataSource'
  , p_method   => ExcelTable.STREAM_READ
  , p_dml_type => ExcelTable.DML_INSERT
  );
  
  dbms_output.put_line(nrows || ' rows inserted.');
  
end;
/

 

Download

/mbleron/ExcelTable
 


ExcelTable 3.0 : Multi-sheet support

April 9, 2019

ExcelTable (v3.0) finally supports multi-sheet queries.
I’ve implemented this new feature following Mike Kutz’s suggestion last year.

Function getRows and the related routines getCursor and loadData are now overloaded to accept either a list of sheet names or a sheet name pattern (regular expression).

In order to know which data come from which sheet, the FOR METADATA clause has been extended with :

  • SHEET_INDEX : 1-based index of the sheet in the workbook
  • SHEET_NAME : sheet name (obviously)

For backward compatibility, the p_sheet parameter is not interpreted as a regex pattern by default, but this may be enabled session-wise via the useSheetPattern procedure, or made the default by modifying the initial value of the sheet_pattern_enabled variable in the ExcelTable package body.
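
For example, enabling pattern interpretation for the current session (the same call as used in the 18c post above) :

begin
  -- interpret p_sheet as a regex pattern for this session
  ExcelTable.useSheetPattern(true);
end;
/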
 

Examples

Using a sheet list :
select x.* 
from table(
       ExcelTable.getRows(
         ExcelTable.getFile('XL_DATA_DIR','multisheet.xlsx')
       , ExcelTableSheetList('Sheet2','Sheet3')
       , q'{
            "C1"         number column 'A'
          , "SHEET_IDX"  number for metadata (sheet_index)
          , "SHEET_NAME" varchar2(31 char) for metadata (sheet_name)
          , "comment"    varchar2(4000) column 'A' for metadata (comment)
          , "R_NUM"      for ordinality
          }'
       )
     ) x
;

  C1  SHEET_IDX SHEET_NAME    comment                    R_NUM
---- ---------- ------------- ------------------------ -------
   1          1 Sheet2        Comment on first sheet         1
   2          1 Sheet2                                       2
   3          1 Sheet2                                       3
   7          3 Sheet3                                       4
   8          3 Sheet3                                       5
   9          3 Sheet3        bleronm:                       6
                              Comment on last sheet    
 

 

Using a sheet name pattern :
select x.* 
from table(
       ExcelTable.getRows(
         ExcelTable.getFile('XL_DATA_DIR','multisheet.xlsx')
       , '^Sheet[12]'
       , ' "C1" number
         , "SHEET_IDX"  number            for metadata (sheet_index)
         , "SHEET_NAME" varchar2(31 char) for metadata (sheet_name)'
       )
     ) x
;

  C1  SHEET_IDX SHEET_NAME
---- ---------- ------------
   1          1 Sheet2
   2          1 Sheet2
   3          1 Sheet2
   4          2 Sheet1
   5          2 Sheet1
   6          2 Sheet1
 

 

Download

/mbleron/ExcelTable
 


JSON Patch, Merge Patch, and Redact features in Oracle 18c

August 27, 2018

These days, I am looking into the latest additions to SODA in Oracle Database 18c, in particular the REST implementation available via Oracle REST Data Services (ORDS) : SODA for REST.

I wrote once about SODA in the past (see here), to give a preview of JSON filter expressions before their official documentation in release 12.2.

Before looking more closely at the features mentioned in the title, I’ll set up a little SODA collection.

A collection of documents created and managed via SODA is backed by a database table containing at least an “id” column and a “content” column to store JSON documents.
Though Oracle 18c introduced SODA for PL/SQL as well, I’ll use the HTTP protocol to set up the following example.
 

Creating a new SODA collection

Without custom metadata, documents of a SODA collection are stored in a BLOB column by default.
For subsequent ease-of-use, I’ll pass the following metadata to create the table with the minimum structure : a numeric “ID” column based on a sequence, and a “DOC” column of CLOB data type.

collMetadata.json

{
  "tableName": "JSON_DOCUMENTS",
  "keyColumn" : {
    "name": "ID",
    "sqlType": "NUMBER",
    "assignmentMethod": "SEQUENCE",
    "sequenceName": "JSON_DOC_SEQ"
  },
  "contentColumn": {
    "name": "DOC",
    "sqlType": "CLOB",
    "cache": true
  }
}
Creating the sequence :
create sequence json_doc_seq;

 

Creating the collection, using a curl command :
curl -i -X PUT --data-binary @collMetadata.json -H "Content-Type: application/json" http://localhost:8083/ords/dev/soda/latest/MyCollection
SQL> desc json_documents

 Name    Null?    Type
 ------- -------- -------------
 ID      NOT NULL NUMBER
 DOC              CLOB

 

Inserting documents :

I’ll load the ready-to-use sample file ‘POList.json’, available in the ./examples/soda/getting-started directory of the ORDS installation.
This file holds a JSON array containing 70 PO documents.

curl -X POST --data-binary @POList.json -H "Content-Type: application/json" http://localhost:8083/ords/dev/soda/latest/custom-actions/insert/MyCollection

Here’s the content of the table after successful execution :

SQL> select * from json_documents order by id;

        ID DOC
---------- ----------------------------------------------------------------------------
         1 {"PONumber":1,"Reference":"MSULLIVA-20141102","Requestor":"Martha Sullivan",
         2 {"PONumber":2,"Reference":"MSULLIVA(-20141113","Requestor":"Martha Sullivan"
         3 {"PONumber":3,"Reference":"TRAJS-20140518","Requestor":"Trenna Rajs","User":
         4 {"PONumber":4,"Reference":"TRAJS-20140520","Requestor":"Trenna Rajs","User":
         5 {"PONumber":5,"Reference":"MSULLIVA-20141121","Requestor":"Martha Sullivan",
         6 {"PONumber":6,"Reference":"TRAJS-20140530","Requestor":"Trenna Rajs","User":
         7 {"PONumber":7,"Reference":"VJONES-20140503","Requestor":"Vance Jones","User"
...
        66 {"PONumber":66,"Reference":"SMCCAIN-20141007","Requestor":"Samuel McCain","U
        67 {"PONumber":67,"Reference":"SMCCAIN-20141010","Requestor":"Samuel McCain","U
        68 {"PONumber":68,"Reference":"SHIGGINS-20141028","Requestor":"Shelley Higgins"
        69 {"PONumber":69,"Reference":"KPARTNER-20140905","Requestor":"Karen Partners",
        70 {"PONumber":70,"Reference":"SSTILES-20141011","Requestor":"Stephen Stiles","

70 rows selected.

 
The setup is complete.
Let’s now try out the new PATCH operation.
 

JSON PATCH operation

PATCH is a standard request method of the HTTP protocol, whose purpose is to make piecewise changes to a resource (a JSON document in this case).
The SODA HTTP PATCH operation implements the JSON Patch standard as per RFC 6902.
A JSON Patch specification is itself a JSON document representing an array of Patch steps. For example :

poPatchSpec.json

[
  { "op"    : "test",
    "path"  : "/ShippingInstructions/Address/street",
    "value" : "200 Sporting Green" },
  { "op"    : "replace",
    "path"  : "/ShippingInstructions/Address/street",
    "value" : "Winchester House, Heatley Rd" },
  { "op"    : "copy",
    "from"  : "/ShippingInstructions/Phone/0",
    "path"  : "/ShippingInstructions/Phone/1" },
  { "op"    : "replace",
    "path"  : "/ShippingInstructions/Phone/1/number",
    "value" : "861-555-8765" }
]

– The first step is a “test” operation, which acts as a filter predicate.
It simply checks whether the value at location '/ShippingInstructions/Address/street' is '200 Sporting Green'. If not, processing stops; otherwise it continues with the next steps.

– The second step is a “replace” operation.
It replaces the value at '/ShippingInstructions/Address/street' with 'Winchester House, Heatley Rd'.

– The third step is a “copy” operation.
It copies the value at location '/ShippingInstructions/Phone/0' (an array location) to the target location '/ShippingInstructions/Phone/1'.

– The fourth step is another “replace” operation, this time targeting the 'number' field of the new Phone item added in the previous step.

Path expressions used in JSON Patch are defined in RFC 6901 (JSON Pointer).

Here’s the relevant data for doc ID = 1 before applying the patch spec :

SQL> select po.*
  2  from json_documents t
  3     , json_table(t.doc, '$.ShippingInstructions'
  4         columns street varchar2(30)  path '$.Address.street'
  5               , phone  varchar2(256) format json with array wrapper path '$.Phone[*].number'
  6       ) po
  7  where t.id = 1;

STREET                         PHONE
------------------------------ --------------------
200 Sporting Green             ["979-555-6598"]
 

Applying patch :

curl -i -X PATCH --data-binary @poPatchSpec.json -H "Content-Type: application/json-patch+json" http://localhost:8083/ords/dev/soda/latest/MyCollection/1

Data after patch operation :

SQL> select po.*
  2  from json_documents t
  3     , json_table(t.doc, '$.ShippingInstructions'
  4         columns street varchar2(30)  path '$.Address.street'
  5               , phone  varchar2(256) format json with array wrapper path '$.Phone[*].number'
  6       ) po
  7  where t.id = 1;

STREET                         PHONE
------------------------------ -----------------------------------
Winchester House, Heatley Rd   ["979-555-6598","861-555-8765"]
 

 
So what happened behind the scenes?
We can check using SQL trace, or by simply querying V$SQL (before the cursors age out of the shared pool).
Here are my findings :

select DBMS_SODA_DOM.JSON_PATCH_C("DOC",:1 ),to_char("ID") from "DEV"."JSON_DOCUMENTS" where ("ID" = to_number(:2 ) )

update "DEV"."JSON_DOCUMENTS" set "DOC" = :1  where ("ID" = to_number(:2 ) )

The first statement selects the document and applies the patch using the DBMS_SODA_DOM.JSON_PATCH_C function.
The second statement saves the document back into the collection table.

DBMS_SODA_DOM is not documented. It contains various functions supporting – among other interesting things – the PATCH method against different data types (VARCHAR2, NVARCHAR2, RAW, CLOB, NCLOB, BLOB).

Pushing the analysis a bit further, we can discover that those PATCH routines are actually based on the PL/SQL JSON_ELEMENT_T.patch() method :

MEMBER PROCEDURE patch(self IN OUT NOCOPY JSON_ELEMENT_T, spec VARCHAR2)

 
Let’s now experiment with this method directly from PL/SQL.
 

Disclaimer :
The patch method is not yet documented, and therefore unsupported for direct usage.
Everything that follows is presented for informational purposes only.
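
As a teaser, a minimal sketch of such an experiment might look like this (the document and patch spec are made up; the expected output assumes RFC 6902 semantics) :

declare
  doc   json_element_t := json_element_t.parse('{"a":1,"b":"x"}');
  spec  varchar2(200)  := '[{"op":"replace","path":"/b","value":"y"}]';
begin
  -- apply the JSON Patch spec in place
  doc.patch(spec);
  dbms_output.put_line(doc.to_string);  -- expected : {"a":1,"b":"y"}
end;
/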

 
Read more…


ExcelTable 2.3 : new API for DML operations

August 23, 2018

We have seen in previous posts that PL/SQL does not support static references to pipelined table functions returning an AnyDataset instance.
Somehow, Oracle is not able to call the ODCITableDescribe routine at compile time in order to create the necessary object and collection types backing the dataset :

SQL> begin
  2
  3    insert into tmp_ooxdata (id, name)
  4    select t.id, t.name
  5    from table(
  6           ExcelTable.getRows(
  7             p_file  => ExcelTable.getFile('TMP_DIR','ooxdata.xlsx')
  8           , p_sheet => 'DataSource'
  9           , p_cols  => '"ID" number, "NAME" varchar2(10)'
 10           )
 11         ) t
 12    ;
 13
 14  end;
 15  /
  from table(
       *
ERROR at line 5:
ORA-06550: line 5, column 8:
PL/SQL: ORA-22905: cannot access rows from a non-nested table item
 

 
The first (obvious) workaround is to use dynamic SQL :

SQL> declare
  2
  3    stmt  varchar2(2000) :=
  4    q'{
  5    insert into tmp_ooxdata (id, name)
  6    select t.id, t.name
  7    from table(
  8           ExcelTable.getRows(
  9             p_file  => :1
 10           , p_sheet => :2
 11           , p_cols  => '"ID" number, "NAME" varchar2(10)'
 12           )
 13         ) t
 14    }';
 15
 16  begin
 17
 18    execute immediate stmt
 19    using ExcelTable.getFile('TMP_DIR','ooxdata.xlsx')
 20        , 'DataSource'
 21    ;
 22
 23    dbms_output.put_line(sql%rowcount || ' rows inserted.');
 24
 25  end;
 26  /

50000 rows inserted.

PL/SQL procedure successfully completed.
 

 

When the purpose is to use the getRows() function in a cursor, I’ve added a convenience function, getCursor(), taking the same set of parameters as getRows() and returning a REF cursor (SYS_REFCURSOR) that the user may consume from PL/SQL or another client language :

SQL> declare
  2
  3    type rec_t is record (id number, name varchar2(10));
  4
  5    my_rec  rec_t;
  6    rc      sys_refcursor;
  7
  8  begin
  9
 10    rc := ExcelTable.getCursor(
 11            p_file  => ExcelTable.getFile('TMP_DIR','ooxdata.xlsx')
 12          , p_sheet => 'DataSource'
 13          , p_cols  => '"ID" number, "NAME" varchar2(10)'
 14          , p_range => '1:10'
 15          );
 16
 17    loop
 18      fetch rc into my_rec;
 19      exit when rc%notfound;
 20      dbms_output.put_line(to_char(my_rec.id,'fm0999')||';'||my_rec.name);
 21    end loop;
 22
 23    close rc;
 24
 25  end;
 26  /

0001;LINE-00001
0002;LINE-00002
0003;LINE-00003
0004;LINE-00004
0005;LINE-00005
0006;LINE-00006
0007;LINE-00007
0008;LINE-00008
0009;LINE-00009
0010;LINE-00010

PL/SQL procedure successfully completed.
 

 

ExcelTable 2.3 introduces another way to load data from a spreadsheet file, directly into a given target table (or view), using one of these DML operations :

  • INSERT
  • UPDATE
  • MERGE

A DELETE mode is also available to delete rows in a table based on data (keys) from the external spreadsheet file.

The DML API comprises three routines :
 

1) createDMLContext

function createDMLContext (
  p_table_name in varchar2    
)
return DMLContext;

This creates a new DML context for the specified target table or view.
The table name may be passed as a simple or qualified SQL name. For instance, these are valid examples of table specification :

MY_TABLE
MY_SCHEMA.MY_TABLE
"myTable"
MY_SCHEMA."myTable"

Database link spec is not supported yet.
 

2) mapColumn

procedure mapColumn (
  p_ctx       in DMLContext
, p_col_name  in varchar2
, p_col_ref   in varchar2
, p_format    in varchar2     default null
, p_meta      in pls_integer  default null
, p_key       in boolean      default false
);

For a given DML context, maps a spreadsheet column reference (A, B, C …) to a column of the target table.
Optional arguments p_format and p_meta are the equivalent of FORMAT and FOR METADATA clauses respectively, from the columns specs of getRows() function.
Optional argument p_key, when set to true, indicates that the column is (part of) the key of the input dataset. At least one key column must be specified for an UPDATE, MERGE or DELETE context; otherwise an exception is raised.
 

3) loadData

function loadData (
  p_ctx          in DMLContext
, p_file         in blob
, p_sheet        in varchar2
, p_range        in varchar2       default null
, p_method       in binary_integer default DOM_READ
, p_password     in varchar2       default null
, p_dml_type     in pls_integer    default DML_INSERT
, p_err_log      in varchar2       default null
)
return integer;

Function loadData() performs the data loading operation according to the DML mode specified via the p_dml_type argument.
Argument p_err_log may be used to pass a DML error logging clause.
Note that the syntax of this clause is not checked before runtime.

The rest of the arguments are identical to those of getRows() function.
The function returns the number of rows affected by the operation.
 

Download

/mbleron/ExcelTable
 

Examples

Inserting the first 1,000 rows from file sample_2.xlsx into table TMP_SAMPLE2 :

SQL> create table tmp_sample2 (
  2    id   number       primary key
  3  , name varchar2(10)
  4  , val  varchar2(30)
  5  );

Table created.

SQL> declare
  2
  3    ctx    ExcelTable.DMLContext;
  4    nrows  integer;
  5
  6  begin
  7
  8    ctx := ExcelTable.createDMLContext('TMP_SAMPLE2');
  9
 10    ExcelTable.mapColumn(ctx, p_col_name => 'ID',   p_col_ref => 'A');
 11    ExcelTable.mapColumn(ctx, p_col_name => 'NAME', p_col_ref => 'B');
 12    ExcelTable.mapColumn(ctx, p_col_name => 'VAL',  p_col_ref => 'C');
 13
 14    nrows :=
 15    ExcelTable.loadData(
 16      p_ctx      => ctx
 17    , p_file     => ExcelTable.getFile('XL_DATA_DIR','sample_2.xlsx')
 18    , p_sheet    => 'DataSource'
 19    , p_range    => '1:1000'
 20    , p_method   => ExcelTable.STREAM_READ
 21    , p_dml_type => ExcelTable.DML_INSERT
 22    );
 23
 24    dbms_output.put_line(nrows || ' rows inserted.');
 25
 26  end;
 27  /

1000 rows inserted.

PL/SQL procedure successfully completed.

SQL> select count(*) from tmp_sample2;

  COUNT(*)
----------
      1000

SQL> commit;

Commit complete.
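
An UPDATE or MERGE context works the same way, except that at least one key column must be mapped via p_key. Here is a sketch of a MERGE load into the same table (it assumes a DML_MERGE constant alongside DML_INSERT; check the package specification for the exact constant names) :

declare

  ctx    ExcelTable.DMLContext;
  nrows  integer;

begin

  ctx := ExcelTable.createDMLContext('TMP_SAMPLE2');

  -- ID is declared as the key of the input dataset
  ExcelTable.mapColumn(ctx, p_col_name => 'ID',   p_col_ref => 'A', p_key => true);
  ExcelTable.mapColumn(ctx, p_col_name => 'NAME', p_col_ref => 'B');
  ExcelTable.mapColumn(ctx, p_col_name => 'VAL',  p_col_ref => 'C');

  nrows :=
  ExcelTable.loadData(
    p_ctx      => ctx
  , p_file     => ExcelTable.getFile('XL_DATA_DIR','sample_2.xlsx')
  , p_sheet    => 'DataSource'
  , p_method   => ExcelTable.STREAM_READ
  , p_dml_type => ExcelTable.DML_MERGE  -- assumed constant name
  );

  dbms_output.put_line(nrows || ' rows merged.');

end;
/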
 

 
Read more…

Interconversion between XML and JSON : Part 1

August 12, 2018

This publication inaugurates a new series focusing on XML/JSON interconversion.

In this first post, I’ll present a PL/SQL implementation of the two XPath 3.1 functions performing lossless conversion between XML and JSON : fn:json-to-xml and fn:xml-to-json.

XPath 3.1 bridges the gap between JSON data and the XML Data Model (XDM) by providing facilities to handle JSON via map and array types (introduced as part of XDM 3.1 as well).

The conversion is based on a standardized XML representation of JSON.
Here’s a small sample JSON document and its XML equivalent according to that mapping :

JSON:

{
  "item":[
     {"id":1, "v":"ABC"}
   , {"id":2, "v":"XYZ"}
  ]
}

XML:

<map xmlns="http://www.w3.org/2005/xpath-functions">
  <array key="item">
    <map>
      <number key="id">1</number>
      <string key="v">ABC</string>
    </map>
    <map>
      <number key="id">2</number>
      <string key="v">XYZ</string>
    </map>
  </array>
</map>

A larger yet concise example is available in the specifications.

The transformation from one format to the other is very straightforward : a JSON object translates to an XML “map” element, a JSON array to an “array” element, and so on for the scalar JSON types (“string”, “number”, “boolean”, “null”), each complemented with a “key” attribute when the element represents an object member.

Because my implementation makes use of the PL/SQL JSON object types, it is only available in Oracle release 12.2 onwards.
NB : the second (optional) argument of fn:json-to-xml and fn:xml-to-json is not supported (for now).

Source code available on GitHub :

/mbleron/JSONUtilities

 

Examples

SQL> select doc from json_documents where id = 1;
 
DOC
--------------------------------------------------
{
  "_id":"53e3c6ed-9bfc-2730-e053-0100007f6afb",
  "content":{
    "name":"obj1",
    "type":1,
    "isNew":true,
    "clientId":null,
    "values":[
      {"name":"x", "v":1},
      {"name":"y", "v":2}
    ]
  }
}
 

converted to XML :

SQL> select xutl_json.json_to_xml(doc) as xml_doc
  2  from json_documents
  3  where id = 1;
 
XML_DOC
--------------------------------------------------------------------
<map xmlns="http://www.w3.org/2005/xpath-functions">
  <string key="_id">53e3c6ed-9bfc-2730-e053-0100007f6afb</string>
  <map key="content">
    <string key="name">obj1</string>
    <number key="type">1</number>
    <boolean key="isNew">true</boolean>
    <null key="clientId"/>
    <array key="values">
      <map>
        <string key="name">x</string>
        <number key="v">1</number>
      </map>
      <map>
        <string key="name">y</string>
        <number key="v">2</number>
      </map>
    </array>
  </map>
</map>
 

 
Another example, using hr.json file :

SQL> insert into json_documents
  2  values (2, to_clob(bfilename('DATA_DIR','hr.json'), nls_charset_id('AL32UTF8')));

1 row created.

SQL> commit;

Commit complete.

SQL> select xutl_json.json_to_xml(doc) as xml_doc
  2  from json_documents
  3  where id = 2;
 
XML_DOC
----------------------------------------------------------------------
<array xmlns="http://www.w3.org/2005/xpath-functions">
  <map>
    <number key="id">1000</number>
    <string key="address">1297 Via Cola di Rie</string>
    <string key="city">Roma</string>
    <string key="country">IT</string>
  </map>

...

  <map>
    <number key="id">2700</number>
    <string key="address">Schwanthalerstr. 7031</string>
    <string key="city">Munich</string>
    <string key="country">DE</string>
    <array key="departments">
      <map>
        <number key="id">70</number>
        <string key="name">Public Relations</string>
        <number key="manager">204</number>
        <array key="employees">
          <map>
            <number key="id">204</number>
            <string key="lastname">Baer</string>
            <string key="hiredate">1994-06-07T00:00:00</string>
            <number key="salary">10000</number>
          </map>
        </array>
      </map>
    </array>
  </map>
  <map>
    <number key="id">2800</number>
    <string key="address">Rua Frei Caneca 1360 </string>
    <string key="city">Sao Paulo</string>
    <string key="country">BR</string>
  </map>

...

</array>
 

Full output available here : hr.xml

 

What’s next?

We can apply XQuery or XSLT transformations to JSON documents by using an intermediate XML representation of JSON.
Given the hr.json file, the following will output a JSON array containing, for each city having employees, the number of employees and the minimum and maximum salaries. The array will be sorted by country and city :

select xutl_json.xml_to_json(
         xmlquery(
          'declare default element namespace "http://www.w3.org/2005/xpath-functions"; (::)
           element array {
             for $loc in /array/map
             let $emps := $loc//array[@key="employees"]/map
             where exists($emps)
             order by $loc/string[@key="country"], $loc/string[@key="city"]
             return element map {
               $loc/string[@key="city"]
             , element number { attribute key {"count"}, count($emps) }
             , element number { attribute key {"min-salary"}, min($emps/number[@key="salary"]) }
             , element number { attribute key {"max-salary"}, max($emps/number[@key="salary"]) }
             }
           }'
           passing xutl_json.json_to_xml(t.doc)
           returning content
         )
       )
from json_documents t
where t.id = 2;
 

Output (pretty-printed) :

[
  {"city":"Toronto", "count":2, "min-salary":6000, "max-salary":13000},
  {"city":"Munich", "count":1, "min-salary":10000, "max-salary":10000},
  {"city":"London", "count":1, "min-salary":6500, "max-salary":6500},
  {"city":"Oxford", "count":34, "min-salary":6100, "max-salary":14000},
  {"city":"Seattle", "count":18,"min-salary":2500, "max-salary":24000},
  {"city":"South San Francisco", "count":45, "min-salary":2100, "max-salary":8200},
  {"city":"Southlake", "count":5, "min-salary":4200, "max-salary":9000}
]

A similar transformation may be performed using XSLT.
XSLT 1.0, the only version implemented natively inside the database, does not support the fn:min and fn:max functions (part of XPath 2.0). Though they could be implemented using a named template, for simplicity the example below just sticks to “count” :

select xutl_json.xml_to_json(
         xmltransform(
           xutl_json.json_to_xml(t.doc)
         , q'~<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.w3.org/2005/xpath-functions" xmlns:fn="http://www.w3.org/2005/xpath-functions"
                exclude-result-prefixes="fn">
                <xsl:output method="xml"/>
                <xsl:template match="/"> 
                  <array>
                    <xsl:apply-templates select="fn:array/fn:map">
                      <xsl:sort select="fn:string[@key='country']"/>
                      <xsl:sort select="fn:string[@key='city']"/>
                    </xsl:apply-templates>
                  </array>
                </xsl:template>
                <xsl:template match="fn:map">
                  <xsl:variable name="emps" select=".//fn:array[@key='employees']/fn:map"/>
                  <xsl:if test="$emps">
                    <map>
                      <xsl:copy-of select="fn:string[@key='city']"/>
                      <number key="count"><xsl:value-of select="count($emps)"/></number>
                    </map>
                  </xsl:if>
                </xsl:template>
              </xsl:stylesheet>~'
         )
       )
from json_documents t
where t.id = 2;
 

Output (pretty-printed) :

[
  {"city":"Toronto", "count":2},
  {"city":"Munich", "count":1},
  {"city":"London", "count":1},
  {"city":"Oxford", "count":34},
  {"city":"Seattle", "count":18},
  {"city":"South San Francisco", "count":45},
  {"city":"Southlake", "count":5}
]