Archive

Archive for the ‘HowTo’ Category

How To : Uncompress Raw DEFLATE Data without CRC32

May 27, 2018 Leave a comment

Here’s a short post about uncompressing data compressed using DEFLATE.

As of Oracle 10g, one can use UTL_COMPRESS utility to compress/uncompress data with the Lempel-Ziv algorithm (which DEFLATE is based on).
However, UTL_COMPRESS only handles input/output in the GZIP format :

The output of the UTL_COMPRESS compressed data is compatible with gzip(with -n option)/gunzip on a single file.

For a single file, the GZIP format is only composed of a header, followed by blocks of compressed data and a trailer.
In its simplest form, the header is a sequence of 10 bytes :

Field Size Description
ID1-ID2 2 GZIP signature (0x1F8B)
CM 1 compression method (0x08 = DEFLATE)
FLG 1 bit vector for extra fields
MTIME 4 modification time (UNIX format)
XFL 1 extra flags
OS 1 source Operating System (0xFF = Unknown)

The trailer consists in 8 bytes :

CRC32 4 CRC-32 checksum of the uncompressed data
ISIZE 4 Size of the uncompressed data

So, if we only have a raw DEFLATE input, we must first wrap it in a GZIP container.
While it’s easy to create the header, for instance using following constant : 0x1F8B08000000000000FF, the CRC-32 and uncompressed size is generally unknown, unless shipped together with the compressed data (as in the ZIP file format).

To illustrate the problem and the solution, I’ll use a 200,000-byte data sample composed of a sequence of ‘X’ characters, and terminated by a ‘A’ character.
This sample data will be compressed using UTL_COMPRESS.LZ_COMPRESS procedure and the raw DEFLATE block extracted from the resulting GZIP file :

SQL> declare
  2    input    blob;
  3    output   blob;
  4    deflate  raw(1024);
  5  begin
  6
  7    dbms_lob.createtemporary(input, true);
  8    for i in 1 .. 9999 loop
  9      dbms_lob.writeappend(input, 20, utl_raw.cast_to_raw('XXXXXXXXXXXXXXXXXXXX'));
 10    end loop;
 11    dbms_lob.writeappend(input, 20, utl_raw.cast_to_raw('XXXXXXXXXXXXXXXXXXXA'));
 12    dbms_output.put_line('Input size = '||dbms_lob.getlength(input));
 13
 14    output := utl_compress.lz_compress(input);
 15
 16    deflate := dbms_lob.substr(output, amount => dbms_lob.getlength(output)-18, offset => 11);
 17    dbms_output.put_line('Compressed size = '||utl_raw.length(deflate));
 18    dbms_output.put_line('DEFLATE data = ');
 19    dbms_output.put_line(deflate);
 20
 21  end;
 22  /
Input size = 200000
Compressed size = 213
DEFLATE data =
EDC1B1000000000230B62CF22789A17F5B0B00000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000009C32

PL/SQL procedure successfully completed.

 

Now, if we try to uncompress it without the trailer, it produces an incomplete output :

SQL> declare
  2
  3    input   blob := hextoraw(
  4            'EDC1B1000000000230B62CF22789A17F5B0B00000000000000000000000000000000000000000000'||
  5            '00000000000000000000000000000000000000000000000000000000000000000000000000000000'||
  6            '00000000000000000000000000000000000000000000000000000000000000000000000000000000'||
  7            '00000000000000000000000000000000000000000000000000000000000000000000000000000000'||
  8            '00000000000000000000000000000000000000000000000000000000000000000000000000000000'||
  9            '00000000000000000000009C32');
 10    tmp_gz  blob;
 11    output  blob;
 12
 13  begin
 14
 15    tmp_gz := hextoraw('1F8B08000000000000FF');
 16    dbms_lob.copy(tmp_gz, input, dbms_lob.getlength(input), 11, 1);
 17    dbms_lob.createtemporary(output, true);
 18    utl_compress.lz_uncompress(tmp_gz, output);
 19
 20    dbms_output.put_line('Uncompressed size = '||dbms_lob.getlength(output));
 21
 22  end;
 23  /
Uncompressed size = 195168

PL/SQL procedure successfully completed.

 

The solution is to use UTL_COMPRESS piecewise operations :

SQL> declare
  2
  3    input   blob := hextoraw(
  4                    'EDC1B1000000000230B62CF22789A17F5B0B00000000000000000000000000000000000000000000'||
  5                    '00000000000000000000000000000000000000000000000000000000000000000000000000000000'||
  6                    '00000000000000000000000000000000000000000000000000000000000000000000000000000000'||
  7                    '00000000000000000000000000000000000000000000000000000000000000000000000000000000'||
  8                    '00000000000000000000000000000000000000000000000000000000000000000000000000000000'||
  9                    '00000000000000000000009C32');
 10    tmp_gz  blob;
 11    output  blob;
 12
 13    ctx     binary_integer;
 14    buf     raw(32767);
 15
 16  begin
 17
 18    tmp_gz := hextoraw('1F8B08000000000000FF');
 19    dbms_lob.copy(tmp_gz, input, dbms_lob.getlength(input), 11, 1);
 20    dbms_lob.createtemporary(output, true);
 21
 22    -- initialize piecewise uncompress context
 23    ctx := utl_compress.lz_uncompress_open(tmp_gz);
 24    -- uncompress data in chunks of 32KiB, until NO_DATA_FOUND is raised
 25    loop
 26      begin
 27        utl_compress.lz_uncompress_extract(ctx, buf);
 28      exception
 29        when no_data_found then
 30          exit;
 31      end;
 32      dbms_lob.writeappend(output, utl_raw.length(buf), buf);
 33    end loop;
 34    -- close context
 35    utl_compress.lz_uncompress_close(ctx);
 36
 37    dbms_output.put_line('Uncompressed size = '||dbms_lob.getlength(output));
 38    dbms_output.put_line('Tail = ');
 39    dbms_output.put_line(utl_raw.cast_to_varchar2(dbms_lob.substr(output, offset => dbms_lob.getlength(output)-20)));
 40
 41    dbms_lob.freetemporary(output);
 42
 43  end;
 44  /
Uncompressed size = 200000
Tail =
XXXXXXXXXXXXXXXXXXXXA

PL/SQL procedure successfully completed.

Done!

 

Advertisements
Categories: HowTo, PL/SQL Tags: , , ,

How To : Generate Empty XML Attributes

February 17, 2018 1 comment

It is well-known that XML attributes may be generated using SQL/XML publishing function XMLAttributes().
It is also well-known that XMLAttributes() does not generate anything when passed an expression that evaluates to NULL :

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/adxdb/generation-of-XML-data-from-relational-data.html

If an attribute value expression evaluates to NULL, then no corresponding attribute is created.

So, how can we generate an empty attribute using SQL?

<root attr=""/>

Following are a compilation of “tricks” (more than official methods) to generate empty XML attributes, and empty namespace declarations in a SQL context.
I’ll divide them into two categories :

  • Document post-processing : target attributes we want to be empty are transformed after the complete document has been generated
  • In-place generation : empty attributes are directly generated in the same “flow”

 

Read more…

Categories: HowTo, SQL, XQuery Tags: , ,

How To : Nest Multiple JSON Path Expression Filters

July 9, 2017 Leave a comment

A short post about JSON Path Expressions.
I will focus on filter expressions (now officially documented as of Oracle 12.2) we can use in conjunction with JSON_EXISTS function, and more specifically how to nest multiple relative filter steps.

Here’s an example of what I mean :

{
  "_id":"53e3c6ed-9bfc-2730-e053-0100007f6afb",
  "content":{
    "name":"obj1", 
    "type":1, 
    "values":[
      {"name":"x", "v":1},
      {"name":"y", "v":2},
      {"name":"z", "v":1.5}
    ]
  }
}

and say we want to find documents (like the one above) where content.type = 1 and content.values contains an x item having v > 0.

 

Read more…

(PL/SQL) CFBF File Reader

May 21, 2017 5 comments

In this post, I’ll be presenting a PL/SQL utility to read Compound File Binary Format (CFBF).
CFBF is originally a Microsoft proprietary format, which has now been published through the Open Specifications program.

In a nutshell, a CFBF file, or Compound Document File (CDF), is a container assembled like a FAT filesystem, which can be explored as a regular directory structure.

The purpose of the post is not to discuss about the format itself so I’ll let the reader browse the specs to get more familiar with it :

CFBF is the basis for various file formats used in Windows systems.
To cite a few, .MSI (Microsoft Windows Installer) and the well-known .DOC, .XLS, .PPT etc. from the Office 97-2003 Suite.

So, why am I interested in developing a CFBF PL/SQL reader?

  1. Because it’s fun
  2. Because password-encrypted Open Office XML files (.xlsx, .docx …) are stored in a CFBF container, and I thought it would be a good enhancement to my ExcelTable interface.

Source code available on GitHub :

/mbleron/MSUtilities/CDFReader

 

Read more…

JSON Flattening Part 1 – The JSONFlatten function

April 30, 2017 Leave a comment

This article presents an approach to flatten out a JSON structure into an “Edge” table.
That’s something I’ve been exploring multiple times in the past for XML data, see for example : Yet Another XML Flattening Technique.

I’ll apply the same technique used for XMLFlattenDOM function to JSON data, leveraging the new JSON PL/SQL types introduced in the 12.2 release : JSON_ELEMENT_T, JSON_OBJECT_T, JSON_ARRAY_T.
As those object types implement a DOM-like API (based on the internal JDOM_T object), it is quite easy to adapt my previous approach to JSON.

Given the following (very) simple JSON object :

{
  "key":"ABC",
  "arr":[1,2,3,null]
}

This is the output we’re looking for :

 ID TYPE    PARENT_ID NAME     POS      VALUE
--- ------ ---------- ------- ---- -----------
  1 object          0
  2 string          1 key               ABC
  3 array           1 arr
  4 number          3            1      1
  5 number          3            2      2
  6 number          3            3      3
  7 null            3            4

 

Read more…

How To : Flatten out a Recursive XML Document

January 17, 2017 Leave a comment

I blogged a few times about XML flattening in the past :

Those posts were about converting any XML structure and data into a schema-less model called “Edge”.
What I’ll be presenting here applies to XML documents having a recursive structure,
e.g.

<emp id="1">
  <emp id="2"/>
  <emp id="3">
    <emp id="4"/>
    <emp id="5"/>
  </emp>
</emp>

which could be converted to an adjency list model :

ID  PARENT_ID
--- ---------
  1
  2         1
  3         1
  4         3
  5         3

or, denormalized into level-based columns, like this :

ID_1 ID_2 ID_3
---- ---- ----
   1    2
   1    3    4
   1    3    5

 

Read more…

How To : Access Oracle Sequence from XQuery

December 11, 2016 1 comment

In this new post, we’ll see how to access an Oracle sequence from an XQuery expression.
I already wrote about a similar topic a few years ago, in How To : Access database tables from XSLT context.
Pretty much the same concepts presented back then for XSLT also apply here with XQuery.

The solution relies on the ability to query relational or object view data via the oradb URI scheme.
By wrapping the sequence call in a function and a view, we can fetch the sequence value using fn:collection() function.

In the following developments, I’ll distinguish two cases :

  • fetching sequence values from a statically declared sequence
  • fetching values from different sequences dynamically

Preliminary setup

Let’s first create a sequence :

create sequence my_seq;

And the function.
It takes the sequence name as a parameter and use native dynamic SQL to retrieve the value :

create or replace function get_seq_value (
  p_sequence_name in varchar2
) 
return number
is
  l_result  number;
begin
  execute immediate 'SELECT '||p_sequence_name||'.NEXTVAL FROM DUAL' into l_result;
  return l_result;
end;
/

 

Read more…