DataFlex

Efficiently Process a Huge XML File

Demonstrates a technique for processing an XML file of any size, even many gigabytes, without loading the whole file into memory.

Note: This example requires Chilkat v9.5.0.80 or greater.

Use ChilkatAx-win32.pkg

Procedure Test
    Boolean iSuccess
    Handle hoFac
    Handle hoXml
    Variant vSb
    Handle hoSb
    Boolean iFirstIteration
    Integer iRetval
    Integer iNumTransactions
    String sBeginMarker
    String sEndMarker
    String sTemp1

    Move False To iSuccess

    // This example shows a way to efficiently process a gigantic XML file -- one that may be too large
    // to fit in memory.  
    // 
    // Two types of XML parsers exist: DOM parsers and SAX parsers.

    // A DOM (Document Object Model) parser loads the entire XML document into memory,
    // which lets the application navigate the XML in a convenient, random-access way.
    // The Chilkat Xml class is a DOM parser.  Because the entire document must fit in memory,
    // huge XML files (on the order of gigabytes) usually cannot be loaded.

    // A SAX parser reads the XML file as an input stream and never builds a DOM.
    // SAX parsers are generally less convenient than DOM parsers, because the application
    // must keep track of its own position and state while the stream is parsed.
    // 
    // The technique described here is a hybrid.  It streams the XML file as unstructured text
    // to extract fragments that are individually treated as separate XML documents loaded into
    // the Chilkat Xml parser.
    // 
    // For example, imagine your XML file is several GBs in size, but has a relatively simple structure, such as:
    // 
    // <Transactions>
    //     <Transaction id="1">
    //          ...
    //     </Transaction>
    //     <Transaction id="2">
    //          ...
    //     </Transaction>
    //     <Transaction id="3">
    //          ...
    //     </Transaction>
    // ...
    // </Transactions>

    // In the following code, each <Transaction ...> ... </Transaction> element
    // is extracted and loaded separately into an Xml object, where it can be manipulated
    // independently.  The XML file as a whole is never loaded into memory.

    Get Create (RefClass(cComCkFileAccess)) To hoFac
    If (Not(IsComObjectCreated(hoFac))) Begin
        Send CreateComObject of hoFac
    End

    Get ComOpenForRead Of hoFac "qa_data/xml/transactions.xml" To iSuccess
    If (iSuccess = False) Begin
        Get ComLastErrorText Of hoFac To sTemp1
        Showln sTemp1
        Procedure_Return
    End

    Get Create (RefClass(cComChilkatXml)) To hoXml
    If (Not(IsComObjectCreated(hoXml))) Begin
        Send CreateComObject of hoXml
    End
    Get Create (RefClass(cComChilkatStringBuilder)) To hoSb
    If (Not(IsComObjectCreated(hoSb))) Begin
        Send CreateComObject of hoSb
    End
    Move True To iFirstIteration
    Move 1 To iRetval
    Move 0 To iNumTransactions

    // The begin marker is "XML tag aware".  If the begin marker begins with "<"
    // and ends with ">", it is assumed to be an XML tag, and it also matches
    // text where a whitespace char appears in place of the ">".  This is why
    // "<Transaction>" matches opening tags with attributes, such as <Transaction id="1">.
    Move "<Transaction>" To sBeginMarker
    Move "</Transaction>" To sEndMarker

    While (iRetval = 1)
        Send ComClear To hoSb
        // The return value is one of the following:
        // 0: No more fragments exist.
        // 1: Captured the next fragment.  The text from beginMarker to endMarker, including the markers, is returned in sb.
        // -1: Error.
        Get pvComObject of hoSb to vSb
        Get ComReadNextFragment Of hoFac iFirstIteration sBeginMarker sEndMarker "utf-8" vSb To iRetval
        Move False To iFirstIteration

        If (iRetval = 1) Begin
            Move (iNumTransactions + 1) To iNumTransactions
            Get pvComObject of hoSb to vSb
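            Get ComLoadSb Of hoXml vSb True To iSuccess
            If (iSuccess = False) Begin
                // This fragment could not be parsed as XML.  Report the error;
                // the loop then continues with the next fragment.
                Get ComLastErrorText Of hoXml To sTemp1
                Showln sTemp1
            End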
            // Your application may now do what it needs with this particular XML fragment...
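            // For illustration only: a minimal sketch of per-fragment processing.
            // It assumes each <Transaction> element carries an "id" attribute, as in the
            // sample structure shown above, and simply prints that attribute's value.
            If (iSuccess = True) Begin
                Get ComGetAttrValue Of hoXml "id" To sTemp1
                Showln "Transaction id: " sTemp1
            End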
        End

    Loop

    If (iRetval < 0) Begin
        Get ComLastErrorText Of hoFac To sTemp1
        Showln sTemp1
    End

    Showln "numTransactions: " iNumTransactions

    // Close the file now that all fragments have been read.
    Send ComFileClose To hoFac

End_Procedure