DataFlex
Efficiently Process a Huge XML File
Demonstrates a technique for processing a huge XML file of any size, even many gigabytes.
Note: This example requires Chilkat v9.5.0.80 or greater.
Use ChilkatAx-win32.pkg
Procedure Test
Boolean iSuccess
Handle hoFac
Handle hoXml
Variant vSb
Handle hoSb
Boolean iFirstIteration
Integer iRetval
Integer iNumTransactions
String sBeginMarker
String sEndMarker
String sTemp1
Move False To iSuccess
// This example shows a way to efficiently process a gigantic XML file -- one that may be too large
// to fit in memory.
//
// Two types of XML parsers exist: DOM parsers and SAX parsers.
// A DOM parser is a Document Object Model parser, where the entire XML is loaded into memory
// and the application has the luxury of interacting with the XML in a convenient, random-access
// way. The Chilkat Xml class is a DOM parser. Because the entire XML is loaded into memory,
// huge XML files (on the order of gigabytes) usually cannot be loaded because of memory constraints.
// A SAX parser processes the XML file as an input stream. No DOM is built.
// Using a SAX parser is generally less convenient than using a DOM parser, because the
// application must track the parsing state itself as the stream is read.
//
// The technique described here is a hybrid. It streams the XML file as unstructured text
// to extract fragments that are individually treated as separate XML documents loaded into
// the Chilkat Xml parser.
//
// For example, imagine your XML file is several GBs in size, but has a relatively simple structure, such as:
//
// <Transactions>
//     <Transaction id="1">
//         ...
//     </Transaction>
//     <Transaction id="2">
//         ...
//     </Transaction>
//     <Transaction id="3">
//         ...
//     </Transaction>
//     ...
// </Transactions>
//
// In the following code, each <Transaction ...> ... </Transaction>
// is extracted and loaded separately into an Xml object, where it can be manipulated
// independently. The XML file is never loaded into memory in its entirety.
Get Create (RefClass(cComCkFileAccess)) To hoFac
If (Not(IsComObjectCreated(hoFac))) Begin
Send CreateComObject of hoFac
End
Get ComOpenForRead Of hoFac "qa_data/xml/transactions.xml" To iSuccess
If (iSuccess = False) Begin
Get ComLastErrorText Of hoFac To sTemp1
Showln sTemp1
Procedure_Return
End
Get Create (RefClass(cComChilkatXml)) To hoXml
If (Not(IsComObjectCreated(hoXml))) Begin
Send CreateComObject of hoXml
End
Get Create (RefClass(cComChilkatStringBuilder)) To hoSb
If (Not(IsComObjectCreated(hoSb))) Begin
Send CreateComObject of hoSb
End
Move True To iFirstIteration
Move 1 To iRetval
Move 0 To iNumTransactions
// The begin marker is "XML tag aware". If the begin marker starts with "<"
// and ends with ">", it is assumed to be an XML tag, and it also matches
// text where a whitespace char appears in place of the ">"
// (for example, a tag with attributes such as <Transaction id="1">).
Move "<Transaction>" To sBeginMarker
Move "</Transaction>" To sEndMarker
While (iRetval = 1)
Send ComClear To hoSb
// The retval can have the following values:
// 0: No more fragments exist.
// 1: Captured the next fragment. The text from beginMarker to endMarker, including the markers, is returned in sb.
// -1: Error.
Get pvComObject of hoSb to vSb
Get ComReadNextFragment Of hoFac iFirstIteration sBeginMarker sEndMarker "utf-8" vSb To iRetval
Move False To iFirstIteration
If (iRetval = 1) Begin
Move (iNumTransactions + 1) To iNumTransactions
Get pvComObject of hoSb to vSb
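Get ComLoadSb Of hoXml vSb True To iSuccess
// Stop if the fragment could not be parsed as XML.
If (iSuccess = False) Begin
Get ComLastErrorText Of hoXml To sTemp1
Showln sTemp1
Procedure_Return
End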
// Your application may now do what it needs with this particular XML fragment...
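// A minimal sketch of per-fragment processing, not a definitive implementation.
// It assumes each fragment's root element carries an "id" attribute, as in the
// sample XML above (an assumption about your data), and reuses the sTemp1 variable.
Get ComGetAttrValue Of hoXml "id" To sTemp1
Showln "Transaction id: " sTemp1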
End
Loop
If (iRetval < 0) Begin
Get ComLastErrorText Of hoFac To sTemp1
Showln sTemp1
End
Showln "numTransactions: " iNumTransactions
End_Procedure