Knowledge of document encoding management

[Note]
  • [INFOCENTER_V9] Background information on XML internal encoding

Background information on XML internal encoding

XML data in a binary application data type has internal encoding. With internal encoding, the content of the data determines the encoding. The DB2 database system derives the internal encoding from the document content according to the XML standard.

Internal encoding is derived from three components:

  1. Unicode Byte Order Mark (BOM)

    A byte sequence that consists of a Unicode character code at the beginning of XML data. The BOM indicates the byte order of the following text. The DB2 database manager recognizes a BOM only for XML data. For XML data that is stored in a non-XML column, the database manager treats a BOM value like any other character or binary value.

  2. XML declaration

    A processing instruction at the beginning of an XML document. The declaration provides specific details about the remainder of the XML.

  3. Encoding declaration

    An optional part of the XML declaration that specifies the encoding for the characters in the document.
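To make the three components concrete, the following minimal sketch splits a hand-built byte string into its BOM, XML declaration, and encoding declaration. The sample document and the variable names are illustrative assumptions, not something taken from DB2; the regular expression is a simplified reading of the encoding attribute syntax.

```python
import codecs
import re

# Hypothetical sample: a UTF-8 BOM followed by an XML declaration
# that carries an encoding declaration.
sample = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?><doc/>'

# 1. Unicode BOM: the three leading bytes X'EFBBBF'.
bom = sample[:3]

# 2. XML declaration: everything from "<?xml" up to and including "?>".
body = sample[3:]
xml_decl = body[: body.index(b"?>") + 2]

# 3. Encoding declaration: the optional encoding="..." part of the declaration.
match = re.search(rb'encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']', xml_decl)
encoding_decl = match.group(1).decode("ascii") if match else None

print(bom.hex().upper())         # EFBBBF
print(xml_decl.decode("ascii"))  # <?xml version="1.0" encoding="UTF-8"?>
print(encoding_decl)             # UTF-8
```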

The DB2 database manager uses the following procedure to determine the encoding:

  1. If the data contains a Unicode BOM, the BOM determines the encoding. The following table lists the BOM types and the resultant data encoding:

    Table 3.1.  Byte order marks and resultant document encoding

    BOM type               BOM value      Encoding
    ---------------------  -------------  --------
    UTF-8                  X'EFBBBF'      UTF-8
    UTF-16 Big Endian      X'FEFF'        UTF-16
    UTF-16 Little Endian   X'FFFE'        UTF-16
    UTF-32 Big Endian      X'0000FEFF'    UTF-32
    UTF-32 Little Endian   X'FFFE0000'    UTF-32

  2. If the data contains an XML declaration, the encoding depends on whether there is an encoding declaration:

    • If there is an encoding declaration, the encoding is the value of the encoding attribute. For example, the encoding is EUC-JP for XML data with the following XML declaration:

      <?xml version="1.0" encoding="EUC-JP"?>

    • If there is an encoding declaration and a BOM, the encoding declaration must match the encoding from the BOM. Otherwise, an error occurs.

    • If there is no encoding declaration and no BOM, the database manager determines the encoding from the encoding of the XML declaration:

      • If the XML declaration is in single-byte ASCII characters, the encoding of the document is UTF-8.

      • If the XML declaration is in double-byte ASCII characters (that is, each ASCII character occupies two bytes, as in UTF-16), the encoding of the document is UTF-16.

  3. If there is no XML declaration and no BOM, the encoding of the document is UTF-8.
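The procedure above can be sketched in Python as follows. This is an illustration of the documented rules under stated assumptions, not DB2's actual implementation: the function name `detect_internal_encoding`, the 1024-byte look-ahead, the simplified attribute regular expression, and the case-insensitive name comparison used for the "must match" rule are all hypothetical choices.

```python
import re

# (BOM bytes, encoding name from Table 3.1, Python codec used to read the declaration)
# UTF-32 entries come first because X'FFFE0000' also begins with X'FFFE'.
BOMS = [
    (b"\x00\x00\xfe\xff", "UTF-32", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "UTF-32", "utf-32-le"),
    (b"\xef\xbb\xbf", "UTF-8", "utf-8"),
    (b"\xfe\xff", "UTF-16", "utf-16-be"),
    (b"\xff\xfe", "UTF-16", "utf-16-le"),
]

# Start of an XML declaration in single-byte and double-byte form (no BOM present).
DECL_STARTS = [
    ("<?xml".encode("ascii"), "UTF-8", "latin-1"),
    ("<?xml".encode("utf-16-be"), "UTF-16", "utf-16-be"),
    ("<?xml".encode("utf-16-le"), "UTF-16", "utf-16-le"),
]

ENCODING_ATTR = re.compile(r'encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def detect_internal_encoding(data: bytes) -> str:
    """Return the internal encoding name, following the three documented steps."""
    bom_encoding = None
    default = "UTF-8"   # step 3: no BOM and no XML declaration
    codec = "latin-1"   # byte-transparent codec for ASCII-compatible input

    for bom, name, py_codec in BOMS:            # step 1: look for a BOM
        if data.startswith(bom):
            bom_encoding, default, codec = name, name, py_codec
            data = data[len(bom):]
            break
    else:                                       # no BOM: inspect the declaration bytes
        for start, name, py_codec in DECL_STARTS:
            if data.startswith(start):
                default, codec = name, py_codec
                break

    # Step 2: read the encoding declaration, if there is one.
    head = data[:1024].decode(codec, errors="replace")
    declared = None
    if re.match(r"<\?xml\s", head):
        end = head.find("?>")
        match = ENCODING_ATTR.search(head[:end]) if end != -1 else None
        declared = match.group(1) if match else None

    if declared is None:
        return default
    if bom_encoding is not None and declared.upper() != bom_encoding:
        raise ValueError(f"encoding declaration {declared!r} does not match BOM ({bom_encoding})")
    return declared
```

For example, `detect_internal_encoding(b'<?xml version="1.0" encoding="EUC-JP"?><a/>')` returns `"EUC-JP"`, while the same declaration preceded by a UTF-8 BOM raises `ValueError`.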
