Archive

Posts Tagged ‘AES encryption’

ExcelTable 2.2 : support for OpenDocument (ODF) files

July 7, 2018 Leave a comment

Here’s the latest addition to ExcelTable : support for OpenDocument spreadsheet files (ODF).

The OpenDocument standard has been widely adopted by a great number of organizations.
In France, for instance, ODF is now the recommended standard for office documents in public administrations, and some sectors have already completed their migration from Microsoft Office products (using Open Office XML Standard) to LibreOffice suite.

ODF, like OOXML (.xlsx, .xlsb, .docx etc.), is a container-based format (ZIP archive) composed of different parts, mainly XML documents, describing the document structure and data.
The comparison stops there though as ODF is much more simple to understand and manipulate from a programmatic point of view.
Here’s an overview of the container structure :

Everything we need to read the document resides in the Manifest part (META-INF/manifest.xml), which is used to determine if the document is encrypted (I’ll get to that later), and the content.xml part, which contains all document structure and data.
The mimetype part may also be read to check which type of document we are dealing with (spreadsheet, text, presentation, etc.).
For instance, a spreadsheet file (.ods) has this MIME type :

application/vnd.oasis.opendocument.spreadsheet

The apparent simplicity also comes with some disavantages :

  • The content part is globally more verbose than OOXML, with longer tag names and namespace prefixes.
  • String values are not shared across the spreadsheet.
  • Rows and cells are not indexed by their respective row numbers and column references, which makes range-based access of the content more difficult.

As an example, here’s the content part for a three-row spreadsheet table :

<?xml version="1.0" encoding="UTF-8"?>
<office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" xmlns:presentation="urn:oasis:names:tc:opendocument:xmlns:presentation:1.0" xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" xmlns:math="http://www.w3.org/1998/Math/MathML" xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0" xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0" xmlns:ooo="http://openoffice.org/2004/office" xmlns:ooow="http://openoffice.org/2004/writer" xmlns:oooc="http://openoffice.org/2004/calc" xmlns:dom="http://www.w3.org/2001/xml-events" xmlns:xforms="http://www.w3.org/2002/xforms" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:rpt="http://openoffice.org/2005/report" xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:grddl="http://www.w3.org/2003/g/data-view#" xmlns:tableooo="http://openoffice.org/2009/table" xmlns:drawooo="http://openoffice.org/2010/draw" xmlns:calcext="urn:org:documentfoundation:names:experimental:calc:xmlns:calcext:1.0" xmlns:loext="urn:org:documentfoundation:names:experimental:office:xmlns:loext:1.0" xmlns:field="urn:openoffice:names:experimental:ooo-ms-interop:xmlns:field:1.0" xmlns:formx="urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:form:1.0" xmlns:css3t="http://www.w3.org/TR/css3-text/" office:version="1.2">
  <office:scripts/>
  <office:font-face-decls>
    <style:font-face style:name="Arial1" svg:font-family="Arial"/>
    <style:font-face style:name="Arial" svg:font-family="Arial" style:font-family-generic="swiss"/>
    <style:font-face style:name="Calibri" svg:font-family="Calibri" style:font-family-generic="swiss"/>
    <style:font-face style:name="Cambria" svg:font-family="Cambria" style:font-family-generic="swiss"/>
    <style:font-face style:name="Microsoft Sans Serif" svg:font-family="'Microsoft Sans Serif'" style:font-family-generic="swiss"/>
    <style:font-face style:name="Liberation Sans" svg:font-family="'Liberation Sans'" style:font-family-generic="swiss" style:font-pitch="variable"/>
    <style:font-face style:name="Microsoft YaHei" svg:font-family="'Microsoft YaHei'" style:font-family-generic="system" style:font-pitch="variable"/>
    <style:font-face style:name="Segoe UI" svg:font-family="'Segoe UI'" style:font-family-generic="system" style:font-pitch="variable"/>
    <style:font-face style:name="Tahoma" svg:font-family="Tahoma" style:font-family-generic="system" style:font-pitch="variable"/>
  </office:font-face-decls>
  <office:automatic-styles>
    <style:style style:name="co1" style:family="table-column">
      <style:table-column-properties fo:break-before="auto" style:column-width="16.55mm"/>
    </style:style>
    <style:style style:name="co2" style:family="table-column">
      <style:table-column-properties fo:break-before="auto" style:column-width="56.18mm"/>
    </style:style>
    <style:style style:name="co3" style:family="table-column">
      <style:table-column-properties fo:break-before="auto" style:column-width="21.77mm"/>
    </style:style>
    <style:style style:name="ro1" style:family="table-row">
      <style:table-row-properties style:row-height="4.52mm" fo:break-before="auto" style:use-optimal-row-height="true"/>
    </style:style>
    <style:style style:name="ro2" style:family="table-row">
      <style:table-row-properties style:row-height="8.04mm" fo:break-before="auto" style:use-optimal-row-height="false"/>
    </style:style>
    <style:style style:name="ta1" style:family="table" style:master-page-name="PageStyle_5f_DataSource">
      <style:table-properties table:display="true" style:writing-mode="lr-tb"/>
    </style:style>
    <number:text-style style:name="N100">
      <number:text-content/>
    </number:text-style>
    <style:style style:name="ce1" style:family="table-cell" style:parent-style-name="Default" style:data-style-name="N0">
      <style:table-cell-properties style:rotation-align="none"/>
      <style:text-properties fo:color="#000000" style:text-outline="false" style:text-line-through-style="none" style:text-line-through-type="none" style:font-name="Microsoft Sans Serif" fo:font-size="10pt" fo:font-style="normal" fo:text-shadow="none" style:text-underline-style="none" fo:font-weight="normal" style:font-size-asian="10pt" style:font-style-asian="normal" style:font-weight-asian="normal" style:font-name-complex="Microsoft Sans Serif" style:font-size-complex="10pt" style:font-style-complex="normal" style:font-weight-complex="normal"/>
    </style:style>
    <style:style style:name="ce2" style:family="table-cell" style:parent-style-name="Default" style:data-style-name="N0">
      <style:table-cell-properties style:rotation-align="none"/>
    </style:style>
    <style:style style:name="ce3" style:family="table-cell" style:parent-style-name="Default" style:data-style-name="N100">
      <style:table-cell-properties style:rotation-align="none"/>
      <style:text-properties fo:color="#000000" style:text-outline="false" style:text-line-through-style="none" style:text-line-through-type="none" style:font-name="Microsoft Sans Serif" fo:font-size="10pt" fo:font-style="normal" fo:text-shadow="none" style:text-underline-style="none" fo:font-weight="normal" style:font-size-asian="10pt" style:font-style-asian="normal" style:font-weight-asian="normal" style:font-name-complex="Microsoft Sans Serif" style:font-size-complex="10pt" style:font-style-complex="normal" style:font-weight-complex="normal"/>
    </style:style>
  </office:automatic-styles>
  <office:body>
    <office:spreadsheet>
      <table:calculation-settings table:case-sensitive="false" table:automatic-find-labels="false" table:use-regular-expressions="false" table:use-wildcards="true">
        <table:iteration table:maximum-difference="0.0001"/>
      </table:calculation-settings>
      <table:table table:name="DataSource" table:style-name="ta1">
        <office:forms form:automatic-focus="false" form:apply-design-mode="false"/>
        <table:table-column table:style-name="co1" table:default-cell-style-name="ce2"/>
        <table:table-column table:style-name="co2" table:default-cell-style-name="ce2"/>
        <table:table-row table:style-name="ro1">
          <table:table-cell table:style-name="ce1" office:value-type="float" office:value="1" calcext:value-type="float">
            <text:p>1</text:p>
          </table:table-cell>
          <table:table-cell table:style-name="ce3" office:value-type="string" calcext:value-type="string">
            <text:p>LINE-00001</text:p>
          </table:table-cell>
        </table:table-row>
        <table:table-row table:style-name="ro2">
          <table:table-cell table:style-name="ce1" office:value-type="float" office:value="2" calcext:value-type="float">
            <text:p>2</text:p>
          </table:table-cell>
          <table:table-cell table:style-name="ce3" office:value-type="string" calcext:value-type="string">
            <text:p>LINE-00002</text:p>
          </table:table-cell>
        </table:table-row>
        <table:table-row table:style-name="ro1">
          <table:table-cell table:style-name="ce1" office:value-type="float" office:value="3" calcext:value-type="float">
            <text:p>3</text:p>
          </table:table-cell>
          <table:table-cell table:style-name="ce3" office:value-type="string" calcext:value-type="string">
            <text:p>LINE-00003</text:p>
          </table:table-cell>
        </table:table-row>
      </table:table>
      <table:named-expressions/>
    </office:spreadsheet>
  </office:body>
</office:document-content>

 

For now, the ODF reader in ExcelTable is implemented using DOM interface only. So, the p_method argument of the getRows() function is ignored.
All other existing features are available, for instance :

  • Range-based access of data
  • Cell comments extraction
  • CLOB projection for large cell content
  • Support for password-protected documents

Here’s a sample query based on document test01.ods :

SQL> select t.*
  2  from table(
  3         ExcelTable.getRows(
  4           ExcelTable.getFile('TMP_DIR','test01.ods')
  5         , 'Feuille1'
  6         , q'{
  7             "SRNO"  number         column 'A'
  8           , "VAL"   number         column 'B'
  9           , "DT"    timestamp      column 'C'
 10           , "TXT1"  clob           column 'D'
 11           , "TXT2"  varchar2(1000) column 'E'
 12           , "RN"    for ordinality
 13           }'
 14         , '7:8'
 15         )
 16       ) t
 17  ;
 
      SRNO        VAL DT                             TXT1                            TXT2               RN
---------- ---------- ------------------------------ ------------------------------- ---------- ----------
         4 9999999999 10-JUN-18 03.20.37.000000 PM   €€€€€€€€€€€€€€€€€€€€€€€€€€ABC                       1
         5       -123 11-JUN-18 03.20.37.120000 PM                                   SPARE1              2
 

 

As usual, the source code, documentation and samples are available on GitHub :

/mbleron/ExcelTable
/mbleron/MSUtilities/OfficeCrypto
 

In the remaining of the post, I’ll focus on encryption methods used in password-protected ODF files.

Read more…

Advertisements

ExcelTable 1.3 : support for password-encrypted files

June 5, 2017 3 comments

Here’s the new version of ExcelTable, which can now read password-encrypted files.
It supports Standard and Agile encryption methods, as specified in [MS-OFFCRYPTO].

By default, Office 2007 will encrypt using the Standard method whereas Office 2010 and onwards use Agile encryption.
AES (128 or 256) is usually the default algorithm on standard Office installations.
Because latest Office versions (2013+) make use of SHA-2 hashing algorithms, Oracle 12c is required to read Excel documents encrypted in those versions.

Basically, the only change from ExcelTable 1.2 is the addition of an optional argument p_password in getRows() function :

function getRows (
  p_file     in blob
, p_sheet    in varchar2
, p_cols     in varchar2
, p_range    in varchar2 default null
, p_method   in binary_integer default DOM_READ
, p_password in varchar2 default null
) 
return anydataset pipelined
using ExcelTableImpl;

The following dependencies are also required :
XUTL_CDF
XUTL_OFFCRYPTO

 

Source code available on GitHub :

/mbleron/ExcelTable

 

A few words about the internals are following…

 

Read more…