Rounding Timestamp Values with Fractional Seconds

October 7, 2019 Leave a comment

Lately, I had to deal with a timestamp value with an overly long sequence of decimal digits, coming as a string in ISO 8601 format :

1970-01-04T15:20:34.49899986153468675

The goal was to build an Oracle TIMESTAMP instance from this lexical representation, but rounded to 3 decimal places :

1970-01-04 15:20:34.499

In SQL, when we already have a timestamp data type, we can use CAST operator to scale it down with round-up behaviour, e.g.

select cast(timestamp '1970-01-04 15:20:34.4995' as timestamp(3)) as ts
from dual;
 
TS
-------------------------
04/01/1970 15:20:34.500
 

In a PL/SQL context however, CAST does not support precision nor scale :

declare
  ts  timestamp := timestamp '1970-01-04 15:20:34.4995';
begin
  ts := cast(ts as timestamp(3));
end;
/
 
ORA-06550: line 5, column 29:
PLS-00103: Encountered the symbol "(" when expecting one of the following:

   . ) @ % with
 

In this case, the solution is to extract the “seconds” part of the timestamp, round it, then subtract from it the original seconds.
The resulting interval value is then added to the timestamp :

declare
  ts  timestamp := timestamp '1970-01-04 15:20:34.4995';
begin
  ts := ts + numtodsinterval(round(extract(second from ts),3) - extract(second from ts), 'second');
  dbms_output.put_line(ts);
end;
/
 
04/01/1970 15:20:34.500000
 

The timestamp I had to handle possesses 17 decimal digits, which cannot be parsed directly without first truncating it and risking an eventual bad rounded value, because the built-in TO_TIMESTAMP function can only parse up to 9 decimal digits using a format mask like :

YYYY-MM-DD"T"HH24:MI:SS.FF9

Trying to pass the string '1970-01-04T15:20:34.49899986153468675' to TO_TIMESTAMP results in (expected) error :
ORA-01830: date format picture ends before converting entire input string

The solution I implemented extracts the seconds part (using string manipulations), round it and add it back to the original timestamp truncated to the second.
Note the use of subtype TIMESTAMP_UNCONSTRAINED, to preserve all decimal places in the resulting value :

create or replace function iso_ts_round (
  p_value in varchar2
, p_scale in number
)
return timestamp_unconstrained
deterministic
is
  DATETIME_FMT  constant varchar2(23) := 'YYYY-MM-DD"T"HH24:MI:SS';
  sepIndex  pls_integer;
begin
  sepIndex := instr(p_value,'.',-1);
  if sepIndex = 0 then
    return to_timestamp(p_value, DATETIME_FMT);
  else
    return to_timestamp(substr(p_value,1,sepIndex-1), DATETIME_FMT)
            + numtodsinterval(round(to_number(substr(p_value, sepIndex), '.999999999999999999'), p_scale), 'second');
  end if;
end;

For example :

select level - 1 as scale
     , iso_ts_round('1970-01-04T15:20:34.49899986153468675', level - 1) as ts
from dual
connect by level <= 10
;
 
     SCALE TS
---------- -------------------------------
         0 04/01/1970 15:20:34.000000000
         1 04/01/1970 15:20:34.500000000
         2 04/01/1970 15:20:34.500000000
         3 04/01/1970 15:20:34.499000000
         4 04/01/1970 15:20:34.499000000
         5 04/01/1970 15:20:34.499000000
         6 04/01/1970 15:20:34.499000000
         7 04/01/1970 15:20:34.498999900
         8 04/01/1970 15:20:34.498999860
         9 04/01/1970 15:20:34.498999862
 

Categories: PL/SQL, SQL Tags: ,

ExcelTable 4.0 : Flat File Support

September 25, 2019 Leave a comment

ExcelTable now supports delimited or positional text data sources.
Though this addition provides a small subset of the features available with Oracle’s buit-in external tables, it is not meant to compete with them.
External tables can only read data from external files.
Through ExcelTable interface, we are now able to read text content already residing in the database, typically in a CLOB column or variable.
 

Implementation

The text parser is implemented in a separate package (XUTL_FLATFILE) and interfaced with ExcelTable via the usual Open-Iterate-Close API.
A couple of standalone pipelined functions is also published to directly read data field by field :

function get_fields_delimited (
  p_content   in clob
, p_cols      in varchar2
, p_skip      in pls_integer default 0
, p_line_term in varchar2 default DEFAULT_LINE_TERM
, p_field_sep in varchar2 default DEFAULT_FIELD_SEP
, p_text_qual in varchar2 default DEFAULT_TEXT_QUAL
)
return ExcelTableCellList 
pipelined;

function get_fields_positional (
  p_content   in clob
, p_cols      in varchar2
, p_skip      in pls_integer default 0
, p_line_term in varchar2 default DEFAULT_LINE_TERM
)
return ExcelTableCellList 
pipelined;
Parameter Description Mandatory
p_content Text data as a CLOB Yes
p_cols Field list, one of :

  • List of column references, with range option e.g. ‘A,B,C,D,E’, or ‘A-C,F-G’
  • List of position references, e.g. ‘1:3,4:6,12:13’
Yes
p_skip Number of logical rows to skip from the beginning No
p_line_term Line terminator string (at most two characters are allowed) No
p_field_sep Field separator character No
p_text_qual Text qualifier character No

 

Examples

Parsing delimited text file test_8k.csv:

select t.cellRow
     , t.cellCol
     , t.cellData.accessVarchar2() as cellData
from table(
       xutl_flatfile.get_fields_delimited(
         p_content  => ExcelTable.getTextFile('XL_DATA_DIR','test_8k.csv')
       , p_cols     => 'A-J'
       , p_line_term => chr(13)||chr(10)
       , p_field_sep => ';'
       , p_text_qual => '"'
       )
     ) t
;

   CELLROW CELLCOL CELLDATA
---------- ------- --------------------------------------------------------------------------------
         1 A       1
         1 B       5;35545956033232307223893610587965339454
         1 C       258417,601531795111780807441530803644449
         1 D       784576847
         1 E       11/04/1918 17:58:03
         1 F       25/08/1952 00:41:46
         1 G       15-FEB-2018 04.09.21.858660227 AM
         1 H       IPTVKFUEUG
         1 I       DZCB50VK0J
         1 J       jizDfdCrVgsCnROenwHxHWIbdvHwbZpYlJZwHojqqgLdfidFMOyBzJDNdGzGCrQHpSzuGyUdcSamEmgp
         2 A       2
         2 B       8,86894653690723101369920288651691923762
         2 C       654498,207697434379585034477725531830969
         2 D       4439566
         2 E       11/07/1916 00:47:41
         2 F       02/06/2075 14:27:02
         2 G       27-MAY-1969 09.34.54.282047076 PM
         2 H       WQUZECAWIH
...
 

Parsing positional text file test_pos_8k.dat:

select t.cellRow
     , t.cellCol
     , t.cellData.accessVarchar2() as cellData
from table(
       xutl_flatfile.get_fields_positional(
         p_content  => ExcelTable.getTextFile('XL_DATA_DIR','test_pos_8k.dat')
       , p_cols     => '1:5,6:35,36:163,164:186,187:203,204:303,304:314'
       )
     ) t
;

   CELLROW CELLCOL CELLDATA
---------- ------- --------------------------------------------------------------------------------
         1 A       20
         1 B       SYS
         1 C       ICOL$
         1 D       TABLE
         1 E       20140911084051000
         1 F       60BQ37Z6QU1FMRFETJRTXVG97MX58MTFSZLVSDDVTN5R217OU3KXRVZDM8WCUTLITMAE4SAM7XK0WNX1
         1 G       821943.0591
         2 A       4
         2 B       SYS
         2 C       TAB$
         2 D       TABLE
         2 E       20140911084051000
         2 F       HA104XOIXFCPZDCMEX13J0AQO9GEQ6Q483QM1P0R9RXGLYSDCOCZ6GK6KHEV4EN6J3C02CDOPZJ4Z341
         2 G       307403.4572
         3 A       47
         3 B       SYS
         3 C       I_USER2
...
 

 

ExcelTable New Features

getRows and getCursor functions, and loadData procedure have been overloaded to accept a new set of parameters dedicated to flat files.
See the README for more information and detailed syntax.
 

Examples

Reading delimited text file test_8k.csv:

select * 
from table(
       ExcelTable.getRows(
         p_file => ExcelTable.getTextFile('XL_DATA_DIR','test_8k.csv')
       , p_cols => q'{
                     "C1"  number
                   , "C2"  varchar2(50)
                   , "C3"  varchar2(50)
                   , "C4"  number
                   , "C5"  date format 'DD/MM/YYYY HH24:MI:SS'
                   , "C6"  date format 'DD/MM/YYYY HH24:MI:SS'
                   , "C7"  timestamp(6) format 'DD-MON-YYYY HH.MI.SS.FF9 AM'
                   , "C8"  varchar2(10)
                   , "C9"  varchar2(10)
                   , "C10" varchar2(1000) 
                   }'
       , p_skip => 0
       , p_line_term => chr(13)||chr(10)
       , p_field_sep => ';'
       , p_text_qual => '"'
       )
     )
;

   C1 C2                                         C3                                                 C4 C5          C6          C7                             C8         C9         C10
----- ------------------------------------------ ------------------------------------------ ---------- ----------- ----------- ------------------------------ ---------- ---------- --------------------------------------------------------------------------------
    1 5;35545956033232307223893610587965339454   258417,601531795111780807441530803644449    784576847 11/04/1918  25/08/1952  15-FEB-18 04.09.21.858660 AM   IPTVKFUEUG DZCB50VK0J jizDfdCrVgsCnROenwHxHWIbdvHwbZpYlJZwHojqqgLdfidFMOyBzJDNdGzGCrQHpSzuGyUdcSamEmgp
    2 8,86894653690723101369920288651691923762   654498,207697434379585034477725531830969      4439566 11/07/1916  02/06/2075  27-MAY-69 09.34.54.282047 PM   WQUZECAWIH UTWJHFYE6P SZWQGZbKgnRkedvhhiyYztpfLJrjRDSScDezFPJqsSdvroYcKXvxNzPdjOzXctcJrvkktEXaBbTrhOHW
    3 5,84188355121591767332111870186821300058   668829,631029308923012105389089920817679    605727050 15/09/1907  02/02/1908  29-OCT-15 06.42.08.055719 PM   NPKDWHVQSP MXDSRJNYH7 omeMbKQMJAgfXJmAmZoWAgFuGmtsBAsUnfiCplTDOWNInSxFNcyjddEvYYEdLJgxgOhMuGEAvBfozCzU
    4 8,60853564798215481725713327266708686167   115666,597815216858940137456353481776261    178562788 11/02/2047  28/09/2053  25-DEC-07 07.13.04.905200 PM   MUASFENKNI BN5SEI0QS1 MaNsILurCDFDfnkbhlxtGpvdSISVbgHVCqMgzvdOMgYGEBJpbSUHakffjfNksCNFtSjfiNpuRVAKqFSt
    5 1,60766805613683079256098275638130882313   218424,091310228450292777295111384805049    472266773 16/04/2061  06/08/1915  22-MAR-67 02.29.28.618655 AM   SDMLKNPUWT MXQA8SCZ39 AnNapnEVlZINyLgNHHibJvQMPQBGWIZYRQFZtdsGAoCryMJtXuQcuVaPDOgLvduCpOnspYTHvlSIHaGm
    6 9,22424704062639462704579643308937530611   700974,680244563872645264003610311558495    571693999 13/01/2144  30/05/2097  05-MAR-82 03.54.40.107135 AM   ZJABDOZCFC K74PL6FWM6 ytDqgqLBhrTHdXZzNSttTzDVodWywbtbopGfKLEpUTMndDKTCZDpujemXaocSFdVgZmnjdpimGhwCNor
    7 6,73713696328170952808602834939463998285   553097,23440123836530871377392488143884     440988038 17/01/1910  06/04/2131  23-MAY-21 11.50.49.978164 PM   FPMVSTSIAT IYLZ3AUMDT xMYfzKSUwgMhrQZvMxxSsjXxhqOrLBTUOQCKprKhTwysEzsqxBhjProeCnXKLloKxOshIkcfuvLwjPoe
    8 9,65588791809711692944398934037846807668   82381,3636028640383755886480837170493393    567769585 29/06/1931  18/07/2126  21-MAY-03 05.18.17.722478 PM   JEOCGQJDHR Q0ZJVOSGKS QqSRoEIxcYzIAlaeGEhQzkmVBpGuFbuuwJlHMmDpERPQVMeGhazoHPLSfZMufCrZilPMYdTLWwgUxQbi
    9 2,7099727051634262143722455538781371474    844368,653648904380414757019490787563176    700338192 12/06/2166  01/07/1939  04-MAR-80 06.04.19.110021 PM   OSNNTMPRWO OL8OIJWRMV XyXTsatoJYYAeMssFbBkziFrLClHZpkHxfiVKUPxxDiXpcXRKcMtWmkiBIaSQevHDHVogHpCekIoKdIV
   10 9,55762825009060539054244757121586587235   420789,697838230690543667994806550268362     98651573 24/01/2131  11/07/2139  22-JAN-80 05.53.11.499863 AM   PXIKQNSODL 60LHWRFM3J SHquytrgukYXWPPEVqUcTPPpMhQJTDxYykcdnJCbLKAdlYEBPQPpwVRfoHDSwpEtUAAJZRghAUFFnUqc
   11 5,05354760238646271668458059271835115284   524715,41244040510486425207325017019653     705035282 21/05/2036  26/08/1952  15-MAY-53 10.51.30.874214 PM   YMIQZPGNUE N12QQMDCG0 GvfZGsueEZbPagkvTYTMmjqxIEATHvcqVpCDRMKDDnqPpSyXKGsFCOyOzgsOnWVZPdSstvzXMKbbCoUS
   12 1,42333244852136451707584751493946782348   70660,3289396937753046996822103369957927    397217524 21/07/1975  10/09/2168  17-JUL-93 06.40.50.814480 PM   QAQPBYFGYY T08W89W17J ParbCWOCByMrLYqJbTkugMcgvNnlyNfkoqMjYviEwWEIWhJnOxJbOnHLofNAktnqpYajUYNXppDDtmTK
   13 1,47681640019451140147143414878031794259   254709,209345463208737321141471792605361    909099683 31/12/2027  25/02/2026  23-NOV-37 01.13.31.630587 PM   KAEWHDYZHC LLB88H4B0Q HIuqPnbvsYkUxWayvjburNdKjeXVEDzChSxkpjpGehTLbFdMvnTyddwrnRMTqbyZqhEmFQFaQvGSSGWe
...
 

Reading positional text file test_pos_8k.dat:

select * 
from table(
       exceltable.getRows(
         p_file => exceltable.getTextFile('XL_DATA_DIR','test_pos_8k.dat')
       , p_cols => q'{
                     "OBJECT_ID"    number(5)     position(1:5)
                   , "OWNER"        varchar2(30)  position(6:35)
                   , "OBJECT_NAME"  varchar2(128) position(36:163)
                   , "OBJECT_TYPE"  varchar2(23)  position(164:186)
                   , "CREATED"      timestamp(3) format 'YYYYMMDDHH24MISSFF3' position(187:203)
                   , "STR1"         varchar2(100) position(204:303)
                   , "NUM1"         number        position(304:314)
                   }'
       , p_skip => 0
       , p_line_term => chr(13)||chr(10)
       )
     )
;

OBJECT_ID OWNER  OBJECT_NAME   OBJECT_TYPE  CREATED                     STR1                                                                                   NUM1
--------- ------ ------------- ------------ --------------------------- -------------------------------------------------------------------------------- ----------
       20 SYS    ICOL$         TABLE        11-SEP-14 08.40.51.000 AM   60BQ37Z6QU1FMRFETJRTXVG97MX58MTFSZLVSDDVTN5R217OU3KXRVZDM8WCUTLITMAE4SAM7XK0WNX1 821943.059
        4 SYS    TAB$          TABLE        11-SEP-14 08.40.51.000 AM   HA104XOIXFCPZDCMEX13J0AQO9GEQ6Q483QM1P0R9RXGLYSDCOCZ6GK6KHEV4EN6J3C02CDOPZJ4Z341 307403.457
       47 SYS    I_USER2       INDEX        11-SEP-14 08.40.51.000 AM   611NW45KO7HASSSPYZ48T650BGJ81RA962LOI27ZIAVY9YTLXW33OTBIHLWXWGTVA2ZBU0OGA0735YM9 953217.755
       58 SYS    I_CCOL2       INDEX        11-SEP-14 08.40.51.000 AM   I28WA7TSXSU89NH1MJYKMPD1JAOVG73GOH9XJJDJKIXQAVKNDRYHZIZ1CWSRFW8M946S7UK61VBWLYPX 723613.700
       36 SYS    I_OBJ1        INDEX        11-SEP-14 08.40.51.000 AM   5TFN9EKGPYEI5750RON6WRNGQ0PZVDB4JCPWR07SHGW7W3GDTIU0B56WU4N5RVL8B4EW7CS84KESV4TK 449681.837
        5 SYS    CLU$          TABLE        11-SEP-14 08.40.51.000 AM   9LY28S69ZGDONBADMLSHGF24T1PWIWBEFILSSCSHHQ51DU8EML2SJ6L3NOZROHNFYYGGO3U3CYJU9BJG 929358.648
       10 SYS    C_USER#       CLUSTER      11-SEP-14 08.40.51.000 AM   3IZKLMTI4J9GNTRE4H5JZYIJRJ1K1W7F2F7EAVK11CM8C98KEK8LGLGV115NI0X7ZHUYPBF6TSRB7R45 591723.467
       23 SYS    PROXY_DATA$   TABLE        11-SEP-14 08.40.51.000 AM   OR8C0XOCL15BB4ZO8AGUWRLUO6N9XDTFVDHA46T9C9SO6SLQJGNEXNFCXVMXBHSGFNMGATXKK8I2K7K7 622158.037
       55 SYS    I_CDEF3       INDEX        11-SEP-14 08.40.51.000 AM   JB3P6GHU0KK6ASCQK9ORUVYGNCGNNJ6QWBCH4CW8CEFF0SD7ZFN90F27NKCT0OLWVXNF49ZXN46VNCNN 30522.1929
       38 SYS    I_OBJ3        INDEX        11-SEP-14 08.40.51.000 AM   C4NKT5NRQUFPAHZXJGRMYL5R45Z711OOSACY2BQZRZLZS5X1MYY2GBSH9MRV6CGI4RODTQWBGCMRQ3SR 871783.890
       19 SYS    IND$          TABLE        11-SEP-14 08.40.51.000 AM   TC3BLU6EK8DK2CITMK1WHMGXD88MZDLW5UFJ4NQ1EGH66RGT2YNRZ2S7LBXL7SBQ3T4609U11J2ODXRF 22919.6165
       59 SYS    BOOTSTRAP$    TABLE        11-SEP-14 08.40.51.000 AM   BV3IQFC0KNR8BKO6FH968JGANJB6FUE9MJPMTBHYRIEOO1MG9RGVWMBYFEO59RA76GV4V556ZOVXK2RW 189652.915
       50 SYS    I_COL3        INDEX        11-SEP-14 08.40.51.000 AM   SN9GCSGP3VQ6PHQHJ95UN7YXB4YYPHRVF5R4TS5U3S439TKAE3TX0J43S8E1A7KJ0IAJ9VBM19XW0J8Q 694147.645
       21 SYS    COL$          TABLE        11-SEP-14 08.40.51.000 AM   WXGDQU12CT00ASB3JQEOGM9LORCPJFPF56LJQR37S7EHYE393IZW8RALPT2AARII0LJ1JEBBXS5F1CUS 809026.934
       53 SYS    I_CDEF1       INDEX        11-SEP-14 08.40.51.000 AM   EK1B4BA45D05O2HRJD08D22Z14QQOXUQFTLGX2B6EV5BWIA44T0BLKTFOR1T84UL7LML5TXY95KALD45 342997.197
       22 SYS    USER$         TABLE        11-SEP-14 08.40.51.000 AM   83DZKL6YYLMJ32QS0PT9WRCZPKQXGBXL8JWCNFGLQQ4QCW06HFQNOECYN4W1X3J64JVRI7ERGY09OC56 123822.035
...
 

Reading an inline delimited string:

select * 
from table(
       ExcelTable.getRows(
         p_file => '1;val1;2019-01-01|2;val2;2019-01-02|3;val3;2019-01-03|4;val4;2019-01-04|5;val5;2019-01-05'
       , p_cols => q'{"C1" number, "C2" varchar2(10), "C3" date format 'YYYY-MM-DD'}'
       , p_skip => 0
       , p_line_term => '|'
       , p_field_sep => ';'
       )
     )
;

        C1 C2         C3
---------- ---------- -----------
         1 val1       01/01/2019
         2 val2       02/01/2019
         3 val3       03/01/2019
         4 val4       04/01/2019
         5 val5       05/01/2019
 

 

Download

/mbleron/ExcelTable
 

PL/SQL CSV Parsing : To Buffer or Not Buffer

May 20, 2019 Leave a comment

I’m currently working on adding support for delimited and positional flat files to ExcelTable.

The file, residing as a CLOB inside the database, will be parsed using PL/SQL code.
In its simplest, the base algorithm consists in locating field separators (and line terminators) in the input text and extracting data in between.
DBMS_LOB.INSTR API may be used iteratively to scan the entire CLOB.
However it is generally faster to chunk-read the file into a VARCHAR2 buffer and process it from there using string functions, but to what extent?

Below are the results of parsing a set of files of different sizes, using different buffer sizes, and compared to a straightforward LOB scan using DBMS_LOB.INSTR.
This is the test procedure I used (native compilation + optimizer level 3) :

create or replace procedure test_csv_base (
  p_filename in varchar2
, p_buffer_sz in pls_integer
) 
is

  field_separator  varchar2(1) := ',';
  fs_sz            pls_integer := length(field_separator);

  type buffer_t is record (
    content  varchar2(32767)
  , sz       pls_integer
  , offset   pls_integer
  );

  type stream_t is record (
    content  clob
  , sz       integer
  , offset   integer
  );
  
  stream  stream_t;
  buf     buffer_t;
  
  amount     pls_integer;
  fs_offset  pls_integer;
  
  start_time pls_integer;
  end_time   pls_integer;
  
  procedure bufferize is
  begin
    amount := p_buffer_sz;
    dbms_lob.read(stream.content, amount, stream.offset, buf.content);
    stream.offset := stream.offset + amount;
    buf.sz := amount;
    buf.offset := 1;
  end;
  
  procedure p (message in varchar2) is
  begin
    dbms_output.put_line(message);
  end;
  
  procedure loadfile is
    f           bfile := bfilename('XL_DATA_DIR',p_filename);
    dest_offset integer := 1;
    src_offset  integer := 1;
    lang_ctx    integer := dbms_lob.default_lang_ctx;
    warn        integer;
  begin
    p('Loading file...');
    dbms_lob.createtemporary(stream.content, true);
    dbms_lob.fileopen(f, dbms_lob.lob_readonly);
    dbms_lob.loadclobfromfile(
      dest_lob     => stream.content
    , src_bfile    => f
    , amount       => dbms_lob.getlength(f)
    , dest_offset  => dest_offset
    , src_offset   => src_offset
    , bfile_csid   => 873
    , lang_context => lang_ctx
    , warning      => warn
    );
    dbms_lob.fileclose(f);
    stream.sz := dbms_lob.getlength(stream.content);
    stream.offset := 1;
  end;
  
begin
  
  loadfile;
  start_time := dbms_utility.get_time;
  bufferize;
  
  loop 
    
    fs_offset := instr(buf.content, field_separator, buf.offset);       
    
    if fs_offset != 0 then
      buf.offset := fs_offset + fs_sz;
    elsif stream.offset <= stream.sz then
      bufferize;
    else
      exit;
    end if;
  
  end loop;
  
  end_time := dbms_utility.get_time;
  p(to_char(p_buffer_sz)||':'||to_char((end_time - start_time)/100));
  
  dbms_lob.freetemporary(stream.content);

end;

First plot below shows parse time (in second) against buffer size (in bytes).
File size (2.5M, 1.25M etc.) is expressed in number of fields, the average size of a field being 100 bytes :

(click to enlarge)


That tells us two things :

  • There is an optimal buffer size
  • There is a (linear) asymptotic behaviour for large buffer sizes

Second plot focuses on the optimum region :

(click to enlarge)

which is actually best viewed using a base-2 logarithmic abscissa :

(Click to enlarge)

Using a polynomial regression fit on the 2.5M series, I determined (on my system) an optimal buffer size of approx. 850 bytes.
That size provides the best trade-off between the number of LOB read operations necessary to buffer the entire file content and the number of string manipulations necessary to parse the buffer.
We will now see below how that compares to a direct LOB scan using DBMS_LOB API.

I have used a similar procedure with the following loop to test the direct LOB scan approach :

loop

  fs_offset := dbms_lob.instr(stream.content, field_separator, stream.offset);
  exit when fs_offset = 0;
  stream.offset := fs_offset + fs_sz;

end loop;

Here are the results added to previous data gathered for the buffering approach, and this time presented as a stacked histogram.
The red horizontal line materializes the cumulated time for the LOB scan method :

(Click to enlarge)


Direct LOB scan performs almost the same as the 8k buffer method, probably because of the LOB chunk size, which happens to be 8132 for temporary LOBs on my system.
It is also better than using large buffer sizes (> 16k), but choosing an optimal buffer size is definitely the way to go in this situation.

 

Categories: PL/SQL Tags: , ,

Oracle 18c : PL/SQL support for pipelined table functions returning ANYDATASET

May 14, 2019 2 comments

I don’t know – or remember – if this was ever qualified as a bug or an unimplemented feature, but starting with Oracle 18c, we are now able to make static references to pipelined table functions returning an ANYDATASET instance in PL/SQL.
This enhancement comes together with the release of Polymorphic Table Functions (PTF) in 18c too, which is probably not a coincidence since ODCI and PTF share common concepts.

Before that, an attempt to embed a SQL query using such a function (e.g. ExcelTable.getRows) in PL/SQL code resulted in the following compile-time error :

ORA-22905: cannot access rows from a non-nested table item

The only workaround back then was dynamic SQL.
For instance, in ExcelTable, function getCursor was designed exactly for that purpose.

Now we can directly do this :

declare

  cursor c (p_file in bfile) is
  select * 
  from ExcelTable.getRows(
         p_file => to_blob(p_file)
       , p_sheet => '.*'
       , p_cols => q'{
           "C1"         number
         , "SHEET_NAME" varchar2(31 char) for metadata (sheet_name)
         }'
       );

begin
  
  ExcelTable.useSheetPattern(true);

  for r in c (bfilename('XL_DATA_DIR','multisheet.xlsx'))
  loop
    dbms_output.put_line(utl_lms.format_message('SHEET_NAME=%s C1=%s', r.sheet_name, to_char(r.c1)));
  end loop;
  
end;
/

SHEET_NAME=Sheet2 C1=1
SHEET_NAME=Sheet2 C1=2
SHEET_NAME=Sheet2 C1=3
SHEET_NAME=Sheet1 C1=4
SHEET_NAME=Sheet1 C1=5
SHEET_NAME=Sheet1 C1=6
SHEET_NAME=Sheet3 C1=7
SHEET_NAME=Sheet3 C1=8
SHEET_NAME=Sheet3 C1=9
  

Behind the scenes, that means Oracle is now calling ODCITableDescribe at compile-time so the projection of the SQL query is known and readily usable by the rest of the code.
A system-generated, not-persistable object type and its collection is created in the process and associated with the PL/SQL code unit.

On a side note, we can also see that the TABLE operator has become optional for pipelined table functions.

As far as ExcelTable is concerned, this new “feature” only works when passing a literal value for p_sheet (or p_sheets) parameter, as in the above example.
For instance, if p_sheet is a bind variable, we get this error :

PLS-00307: too many declarations of 'ODCITABLEDESCRIBE' match this call

That’s actually an expected behaviour because all non-literal values are converted to NULL when calling ODCITableDescribe routine.
We can see that in the following excerpt of a SQL trace. Since v3.0 and multi-sheet support, this call matches two possible overloads of ODCITableDescribe, and obviously Oracle cannot tell which one we intend to use :

PARSING IN CURSOR #140364229256600 len=234 dep=1 uid=104 oct=47 lid=104 tim=202406983730 hv=450544808 ad='7d4a5b30' sqlid='gjq1rgsddpj58'
 declare 
     rc sys_refcursor; 
   begin 
     :1 := "DEV"."EXCELTABLEIMPL"."ODCITABLEDESCRIBE"(:2 ,NULL,NULL,'
           "C1"         number
         , "SHEET_NAME" varchar2(31 char) for metadata (sheet_name)
         '); 
   end;
END OF STMT

 

Categories: PL/SQL, SQL Tags: , , ,

ExcelTable 3.1 : Default Value Feature in DML API

May 1, 2019 2 comments

Here’s the latest addition to ExcelTable (v3.1).
We are now able to specify default values when mapping columns using the DML API.
This new feature has been implemented based on Jordan Cortes’ request last October.

ExcelTable will apply the default value either in the usual way (i.e. when the input value is NULL), or as a constant when omitting the spreadsheet column reference.
 

Implementation

The mapColumn procedure has been extended with a p_default parameter of data type “ANYDATA”, allowing the user to bind a strongly-typed value to the target column.

procedure mapColumn (
  p_ctx       in DMLContext
, p_col_name  in varchar2
, p_col_ref   in varchar2     default null
, p_format    in varchar2     default null
, p_meta      in pls_integer  default null
, p_key       in boolean      default false
, p_default   in anydata      default null
);

 

For convenience, a new mapColumnWithDefault procedure has been added as well, to directly deal with the three common data types VARCHAR2, NUMBER and DATE :

procedure mapColumnWithDefault (
  p_ctx      in DMLContext
, p_col_name in varchar2
, p_col_ref  in varchar2 default null
, p_format   in varchar2 default null
, p_meta     in pls_integer default null
, p_key      in boolean default false
, p_default  in varchar2
);

procedure mapColumnWithDefault (
  p_ctx      in DMLContext
, p_col_name in varchar2
, p_col_ref  in varchar2 default null
, p_format   in varchar2 default null
, p_meta     in pls_integer default null
, p_key      in boolean default false
, p_default  in number
);

procedure mapColumnWithDefault (
  p_ctx      in DMLContext
, p_col_name in varchar2
, p_col_ref  in varchar2 default null
, p_format   in varchar2 default null
, p_meta     in pls_integer default null
, p_key      in boolean default false
, p_default  in date
);

 

Examples

create table tmp_sample2 (
  id       number       primary key
, name     varchar2(10)
, val      varchar2(30)
, load_dt  date
);

Using mapColumn with an ANYDATA value :

declare

  ctx    ExcelTable.DMLContext;
  nrows  integer;
  
begin
  
  ctx := ExcelTable.createDMLContext('TMP_SAMPLE2');
  
  ExcelTable.mapColumn(ctx, p_col_name => 'ID',   p_col_ref => 'A');
  ExcelTable.mapColumn(ctx, p_col_name => 'NAME', p_col_ref => 'B');
  ExcelTable.mapColumn(ctx, p_col_name => 'VAL',  p_col_ref => 'C');
  ExcelTable.mapColumn(ctx, p_col_name => 'LOAD_DT', p_default => anydata.ConvertDate(sysdate));
  
  nrows := 
  ExcelTable.loadData(
    p_ctx      => ctx
  , p_file     => ExcelTable.getFile('XL_DATA_DIR','sample_2.xlsx')
  , p_sheet    => 'DataSource'
  , p_method   => ExcelTable.STREAM_READ
  , p_dml_type => ExcelTable.DML_INSERT
  );
  
  dbms_output.put_line(nrows || ' rows inserted.');
  
end;
/

 

Using mapColumnWithDefault :

declare

  ctx    ExcelTable.DMLContext;
  nrows  integer;
  
begin
  
  ctx := ExcelTable.createDMLContext('TMP_SAMPLE2');
  
  ExcelTable.mapColumn(ctx, p_col_name => 'ID',   p_col_ref => 'A');
  ExcelTable.mapColumn(ctx, p_col_name => 'NAME', p_col_ref => 'B');
  ExcelTable.mapColumn(ctx, p_col_name => 'VAL',  p_col_ref => 'C');
  ExcelTable.mapColumnWithDefault(ctx, p_col_name => 'LOAD_DT', p_default => sysdate);
  
  nrows := 
  ExcelTable.loadData(
    p_ctx      => ctx
  , p_file     => ExcelTable.getFile('XL_DATA_DIR','sample_2.xlsx')
  , p_sheet    => 'DataSource'
  , p_method   => ExcelTable.STREAM_READ
  , p_dml_type => ExcelTable.DML_INSERT
  );
  
  dbms_output.put_line(nrows || ' rows inserted.');
  
end;
/

 

Download

/mbleron/ExcelTable
 

Categories: ExcelTable Tags:

ExcelTable 3.0 : Multi-sheet support

April 9, 2019 Leave a comment

ExcelTable (v3.0) finally supports multi-sheet queries.
I’ve implemented this new feature following Mike Kutz’s suggestion last year.

Function getRows, as well as related routines getCursor and loadData are now overloaded to accept both a list of sheet names or a sheet name pattern (regular expression).

In order to know which data come from which sheet, the FOR METADATA clause has been extended with :

  • SHEET_INDEX : 1-based index of the sheet in the workbook
  • SHEET_NAME : sheet name (obviously)

For backward compatibility, the p_sheet parameter is not interpreted as a regex pattern by default but this may be enabled session-wise via useSheetPattern procedure, or made it the default by modifying the initial value of sheet_pattern_enabled variable in ExcelTable package body.
 

Examples

Using a sheet list :
select x.* 
from table(
       ExcelTable.getRows(
         ExcelTable.getFile('XL_DATA_DIR','multisheet.xlsx')
       , ExcelTableSheetList('Sheet2','Sheet3')
       , q'{
            "C1"         number column 'A'
          , "SHEET_IDX"  number for metadata (sheet_index)
          , "SHEET_NAME" varchar2(31 char) for metadata (sheet_name)
          , "comment"    varchar2(4000) column 'A' for metadata (comment)
          , "R_NUM"      for ordinality
          }'
       )
     ) x
;

  C1  SHEET_IDX SHEET_NAME    comment                    R_NUM
---- ---------- ------------- ------------------------ -------
   1          1 Sheet2        Comment on first sheet         1
   2          1 Sheet2                                       2
   3          1 Sheet2                                       3
   7          3 Sheet3                                       4
   8          3 Sheet3                                       5
   9          3 Sheet3        bleronm:                       6
                              Comment on last sheet    
 

 

Using a sheet name pattern :
select x.* 
from table(
       ExcelTable.getRows(
         ExcelTable.getFile('XL_DATA_DIR','multisheet.xlsx')
       , '^Sheet[12]'
       , ' "C1" number
         , "SHEET_IDX"  number            for metadata (sheet_index)
         , "SHEET_NAME" varchar2(31 char) for metadata (sheet_name)'
       )
     ) x
;

  C1  SHEET_IDX SHEET_NAME
---- ---------- ------------
   1          1 Sheet2
   2          1 Sheet2
   3          1 Sheet2
   4          2 Sheet1
   5          2 Sheet1
   6          2 Sheet1
 

 

Download

/mbleron/ExcelTable
 

Categories: ExcelTable Tags:

JSON Patch, Merge Patch, and Redact features in Oracle 18c

August 27, 2018 Leave a comment

These days, I am looking into the latest additions to SODA in Oracle Database 18c, in particular the REST implementation available via Oracle REST Data Services (ORDS) : SODA for REST.

I wrote once about SODA in the past (see here), to give a preview of JSON filter expressions before their official documentation in release 12.2.

Before looking more closely at the features mentioned in the title, I’ll set up a little SODA collection.

A collection of documents created and managed via SODA is backed up by a database table containing at least an “id” column and a “content” column to store JSON documents.
Though Oracle 18c introduced SODA for PL/SQL as well, I’ll use the HTTP protocol to set up the following example.
 

Creating a new SODA collection

Without custom metadata, documents of a SODA collection are stored in a BLOB column by default.
For subsequent ease-of-use, I’ll pass the following metadata to create the table with the minimum structure : a numeric “ID” column based on a sequence, and a “DOC” column of CLOB data type.

collMetadata.json

{
  "tableName": "JSON_DOCUMENTS",
  "keyColumn" : {
    "name": "ID",
    "sqlType": "NUMBER",
    "assignmentMethod": "SEQUENCE",
    "sequenceName": "JSON_DOC_SEQ"
  },
  "contentColumn": {
    "name": "DOC",
    "sqlType": "CLOB",
    "cache": true
  }
}
Creating the sequence :
create sequence json_doc_seq;

 

Creating the collection, using a curl command :
curl -i -X PUT --data-binary @collMetadata.json -H "Content-Type: application/json" 
http://localhost:8083/ords/dev/soda/latest/MyCollection
SQL> desc json_documents

 Name    Null?    Type
 ------- -------- -------------
 ID      NOT NULL NUMBER
 DOC              CLOB

 

Inserting documents :

I’ll load the ready-to-use sample file ‘POList.json’ available in the ./examples/soda/getting-started directory of ORDS installation.
This file holds a JSON array containing 70 PO documents.

curl -X POST --data-binary @POList.json -H "Content-Type: application/json" 
http://localhost:8083/ords/dev/soda/latest/custom-actions/insert/MyCollection

Here’s the content of the table after successful execution :

SQL> select * from json_documents order by id;

        ID DOC
---------- ----------------------------------------------------------------------------
         1 {"PONumber":1,"Reference":"MSULLIVA-20141102","Requestor":"Martha Sullivan",
         2 {"PONumber":2,"Reference":"MSULLIVA(-20141113","Requestor":"Martha Sullivan"
         3 {"PONumber":3,"Reference":"TRAJS-20140518","Requestor":"Trenna Rajs","User":
         4 {"PONumber":4,"Reference":"TRAJS-20140520","Requestor":"Trenna Rajs","User":
         5 {"PONumber":5,"Reference":"MSULLIVA-20141121","Requestor":"Martha Sullivan",
         6 {"PONumber":6,"Reference":"TRAJS-20140530","Requestor":"Trenna Rajs","User":
         7 {"PONumber":7,"Reference":"VJONES-20140503","Requestor":"Vance Jones","User"
...
        66 {"PONumber":66,"Reference":"SMCCAIN-20141007","Requestor":"Samuel McCain","U
        67 {"PONumber":67,"Reference":"SMCCAIN-20141010","Requestor":"Samuel McCain","U
        68 {"PONumber":68,"Reference":"SHIGGINS-20141028","Requestor":"Shelley Higgins"
        69 {"PONumber":69,"Reference":"KPARTNER-20140905","Requestor":"Karen Partners",
        70 {"PONumber":70,"Reference":"SSTILES-20141011","Requestor":"Stephen Stiles","

70 rows selected.

 
The setup is complete.
Let’s now try out the new PATCH operation.
 

JSON PATCH operation

PATCH is a standard request method of the HTTP protocol, whose purpose is to make piecewise changes to a resource (a JSON document in this case).
The SODA HTTP PATCH operation implements the JSON Patch standard as per RFC 6902.
A JSON Patch specification is itself a JSON document representing an array of Patch steps. For example :

poPatchSpec.json

[
  { "op"    : "test",
    "path"  : "/ShippingInstructions/Address/street",
    "value" : "200 Sporting Green" },
  { "op"    : "replace",
    "path"  : "/ShippingInstructions/Address/street",
    "value" : "Winchester House, Heatley Rd" },
  { "op"    : "copy",
    "from"  : "/ShippingInstructions/Phone/0",
    "path"  : "/ShippingInstructions/Phone/1" },
  { "op"    : "replace",
    "path"  : "/ShippingInstructions/Phone/1/number",
    "value" : "861-555-8765" }
]

– First step is a “test” operation, which acts as a filter predicate.
It simply checks whether the value at location '/ShippingInstructions/Address/street' is '200 Sporting Green'. If not, the processing stops, else continues with the next steps.

– Second step is a “replace” operation.
It replaces the value at '/ShippingInstructions/Address/street' with value 'Winchester House, Heatley Rd'.

– Third step is a “copy”” operation.
It copies the value at location '/ShippingInstructions/Phone/0' (an array location) to the target location '/ShippingInstructions/Phone/1'.

– Fourth step is another “replace” operation, this time targetting the ‘number’ field of the new Phone item added on the previous step.

Path expressions used in JSON Patch are defined in RFC 6901 (JSON Pointer).

Here’s the relevant data for doc ID = 1 before applying the patch spec :

SQL> select po.*
  2  from json_documents t
  3     , json_table(t.doc, '$.ShippingInstructions'
  4         columns street varchar2(30)  path '$.Address.street'
  5               , phone  varchar2(256) format json with array wrapper path '$.Phone[*].number'
  6       ) po
  7  where t.id = 1;

STREET                         PHONE
------------------------------ --------------------
200 Sporting Green             ["979-555-6598"]
 

Applying patch :

curl -i -X PATCH --data-binary @poPatchSpec.json -H "Content-Type: application/json-patch+json" 
http://localhost:8083/ords/dev/soda/latest/MyCollection/1

Data after patch operation :

SQL> select po.*
  2  from json_documents t
  3     , json_table(t.doc, '$.ShippingInstructions'
  4         columns street varchar2(30)  path '$.Address.street'
  5               , phone  varchar2(256) format json with array wrapper path '$.Phone[*].number'
  6       ) po
  7  where t.id = 1;

STREET                         PHONE
------------------------------ -----------------------------------
Winchester House, Heatley Rd   ["979-555-6598","861-555-8765"]
 

 
So what happened behind the scenes?
We can check that using SQL_Trace, or by just querying V$SQL (before cursors age out of the shared pool).
Here are my findings :

select DBMS_SODA_DOM.JSON_PATCH_C("DOC",:1 ),to_char("ID") from "DEV"."JSON_DOCUMENTS" where ("ID" = to_number(:2 ) )

update "DEV"."JSON_DOCUMENTS" set "DOC" = :1  where ("ID" = to_number(:2 ) )

First statement selects the document and apply the patch using DBMS_SODA_DOM.JSON_PATCH_C function.
Second statement saves back the document in the collection table.

DBMS_SODA_DOM is not documented. It contains various functions supporting – among other interesting things – the PATCH method against different data types (VARCHAR2, NVARCHAR2, RAW, CLOB, NCLOB, BLOB).

Pushing the analysis a bit further, we can discover that those PATCH routines are actually based on PL/SQL JSON_ELEMENT_T.patch() method :

MEMBER PROCEDURE patch(self IN OUT NOCOPY JSON_ELEMENT_T, spec VARCHAR2)

 
Let’s now experiment with this method directly from PL/SQL.
 

Disclaimer :
The patch method is not yet documented, so “unsupported” for direct usage.
All the following is then presented for informational purpose.

 
Read more…

Categories: JSON, PL/SQL, SQL Tags: , , , ,