C
Efficiently Process a Huge XML File
Demonstrates a technique for processing a huge XML file of any size, even many gigabytes. Note: This example requires Chilkat v9.5.0.80 or greater.
#include <stdio.h>
#include <C_CkFileAccess.h>
#include <C_CkXml.h>
#include <C_CkStringBuilder.h>
void ChilkatSample(void)
{
BOOL success;
HCkFileAccess fac;
HCkXml xml;
HCkStringBuilder sb;
BOOL firstIteration;
int retval;
int numTransactions;
const char *beginMarker;
const char *endMarker;
success = FALSE;
// This example shows a way to efficiently process a gigantic XML file -- one that may be too large
// to fit in memory.
//
// Two types of XML parsers exist: DOM parsers and SAX parsers.
// A DOM parser is a Document Object Model parser, where the entire XML is loaded into memory
// and the application has the luxury of interacting with the XML in a convenient, random-access
// way. The Chilkat Xml class is a DOM parser. Because the entire XML is loaded into memory,
// huge XML files (on the order of gigabytes) usually cannot be loaded because of memory constraints.
// A SAX parser instead parses the XML file as an input stream and builds no DOM.
// Using a SAX parser is generally less convenient than using a DOM parser, because the
// application must track the document state itself as parse events arrive.
//
// The technique described here is a hybrid. It streams the XML file as unstructured text
// to extract fragments that are individually treated as separate XML documents loaded into
// the Chilkat Xml parser.
//
// For example, imagine your XML file is several GBs in size, but has a relatively simple structure, such as:
//
// <Transactions>
// <Transaction id="1">
// ...
// </Transaction>
// <Transaction id="2">
// ...
// </Transaction>
// <Transaction id="3">
// ...
// </Transaction>
// ...
// </Transactions>
// In the following code, each <Transaction ...> ... </Transaction>
// is extracted and loaded separately into an Xml object, where it can be manipulated
// independently. The XML file as a whole is never loaded into memory.
fac = CkFileAccess_Create();
success = CkFileAccess_OpenForRead(fac,"qa_data/xml/transactions.xml");
if (success == FALSE) {
printf("%s\n",CkFileAccess_lastErrorText(fac));
CkFileAccess_Dispose(fac);
return;
}
xml = CkXml_Create();
sb = CkStringBuilder_Create();
firstIteration = TRUE;
retval = 1;
numTransactions = 0;
// The begin marker is "XML tag aware". If the begin marker begins with "<"
// and ends with ">", it is assumed to be an XML tag, and it also matches
// text where a whitespace char appears in place of the ">". Thus "<Transaction>"
// also matches a start tag that has attributes, such as <Transaction id="1">.
beginMarker = "<Transaction>";
endMarker = "</Transaction>";
while (retval == 1) {
CkStringBuilder_Clear(sb);
// The retval can have the following values:
// 0: No more fragments exist.
// 1: Captured the next fragment. The text from beginMarker to endMarker, including the markers, is returned in sb.
// -1: Error.
retval = CkFileAccess_ReadNextFragment(fac,firstIteration,beginMarker,endMarker,"utf-8",sb);
firstIteration = FALSE;
if (retval == 1) {
numTransactions = numTransactions + 1;
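success = CkXml_LoadSb(xml,sb,TRUE);
if (success == FALSE) {
// The fragment could not be parsed as XML. Report the error and move on to the next fragment.
printf("%s\n",CkXml_lastErrorText(xml));
}
else {
// Your application may now do what it needs with this particular XML fragment...
}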
}
}
if (retval < 0) {
printf("%s\n",CkFileAccess_lastErrorText(fac));
}
printf("numTransactions: %d\n",numTransactions);
CkFileAccess_Dispose(fac);
CkXml_Dispose(xml);
CkStringBuilder_Dispose(sb);
}
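
The streaming step itself can be illustrated without Chilkat. The following is a minimal sketch in standard C of how a tag-aware fragment reader might work. It is not Chilkat's implementation: the function name read_next_fragment, the helper push_char, and the signature are hypothetical, and only the return convention (1, 0, -1) is borrowed from the comments in the example above. It assumes that "<" appears only as the first character of each marker and that elements with the marker's name are not nested inside each other.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Append one byte to a growable, NUL-terminated buffer.
   Returns 0 on allocation failure, 1 otherwise. */
static int push_char(char **buf, size_t *len, size_t *cap, char c)
{
    if (*len + 2 > *cap) {                  /* keep room for the trailing NUL */
        size_t ncap = *cap ? *cap * 2 : 256;
        char *p = realloc(*buf, ncap);
        if (!p) return 0;
        *buf = p;
        *cap = ncap;
    }
    (*buf)[(*len)++] = c;
    (*buf)[*len] = '\0';
    return 1;
}

/* Hypothetical helper (not a Chilkat function). Reads fp forward to the next
   fragment that starts at beginTag and ends at endTag, markers included.
   beginTag must look like "<Name>"; it also matches "<Name" followed by
   whitespace, so start tags with attributes are found.
   Returns 1 and stores a malloc'd string in *out (caller frees),
   0 if no further fragment exists,
   -1 on allocation failure or if the file ends inside a fragment. */
int read_next_fragment(FILE *fp, const char *beginTag, const char *endTag, char **out)
{
    size_t beginLen = strlen(beginTag);
    size_t endLen = strlen(endTag);
    size_t stemLen, matched = 0, len = 0, cap = 0, i;
    char *buf = NULL;
    int c;

    assert(beginLen >= 2 && beginTag[0] == '<' && beginTag[beginLen - 1] == '>');
    stemLen = beginLen - 1;                 /* beginTag without its final '>' */
    *out = NULL;

    /* Phase 1: scan for the tag stem followed by '>' or whitespace. */
    while ((c = fgetc(fp)) != EOF) {
        if (matched == stemLen) {
            if (c == '>' || isspace(c)) break;
            matched = 0;                    /* e.g. "<Transactions" is a different tag */
        }
        if (c == beginTag[matched]) matched++;
        else matched = (c == '<') ? 1 : 0;  /* a new match can only start at '<' */
    }
    if (c == EOF) return 0;

    /* Copy the stem plus the delimiter that ended it. */
    for (i = 0; i < stemLen; i++)
        if (!push_char(&buf, &len, &cap, beginTag[i])) goto fail;
    if (!push_char(&buf, &len, &cap, (char)c)) goto fail;

    /* Phase 2: append bytes until the buffer ends with endTag. */
    while ((c = fgetc(fp)) != EOF) {
        if (!push_char(&buf, &len, &cap, (char)c)) goto fail;
        if (c == '>' && len >= endLen &&
            memcmp(buf + len - endLen, endTag, endLen) == 0) {
            *out = buf;
            return 1;
        }
    }
fail:
    free(buf);
    return -1;
}
```

Called in a loop with "<Transaction>" and "</Transaction>" on an open FILE pointer, this sketch yields one fragment per call until it returns 0, and memory use is bounded by the largest single fragment rather than by the file size. It does not handle self-closing tags, or marker text that appears inside comments or CDATA sections.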