(Tcl) Efficiently Process a Huge XML File

Demonstrates a technique for processing a huge XML file (can be any size, even many gigabytes). Note: This example requires Chilkat v9.5.0.80 or greater.
load ./chilkat.dll

# This example shows a way to efficiently process a gigantic XML file -- one that may be too large
# to fit in memory.
#
# Two types of XML parsers exist: DOM parsers and SAX parsers.
# A DOM parser is a Document Object Model parser, where the entire XML is loaded into memory
# and the application has the luxury of interacting with the XML in a convenient, random-access
# way. The Chilkat Xml class is a DOM parser. Because the entire XML is loaded into memory,
# huge XML files (on the order of gigabytes) usually cannot be loaded because of memory constraints.
# A SAX parser parses the XML file as an input stream. No DOM exists.
# Using a SAX parser is generally less palatable than using a DOM parser, for many reasons.
#
# The technique described here is a hybrid. It streams the XML file as unstructured text
# to extract fragments that are individually treated as separate XML documents loaded into
# the Chilkat Xml parser.
#
# For example, imagine your XML file is several GBs in size, but has a relatively simple structure, such as:
#
# <Transactions>
#     <Transaction id="1">
#         ...
#     </Transaction>
#     <Transaction id="2">
#         ...
#     </Transaction>
#     <Transaction id="3">
#         ...
#     </Transaction>
#     ...
# </Transactions>

# In the following code, each <Transaction ...> ... </Transaction>
# is extracted and loaded separately into an Xml object, where it can be manipulated
# independently. The XML file is never loaded into memory in its entirety.

set fac [new_CkFileAccess]

set success [CkFileAccess_OpenForRead $fac "qa_data/xml/transactions.xml"]
if {$success == 0} then {
    puts [CkFileAccess_lastErrorText $fac]
    delete_CkFileAccess $fac
    exit
}

set xml [new_CkXml]

set sb [new_CkStringBuilder]

set firstIteration 1
set retval 1
set numTransactions 0

# The begin marker is "XML tag aware". If the begin marker begins with "<"
# and ends with ">", then it is assumed to be an XML tag, and it will also match
# text where a whitespace char appears in place of the ">".
# This is how "<Transaction>" matches a tag with attributes, such as <Transaction id="1">.
set beginMarker "<Transaction>"
set endMarker "</Transaction>"

while {$retval == 1} {

    CkStringBuilder_Clear $sb

    # The retval can have the following values:
    # 0: No more fragments exist.
    # 1: Captured the next fragment. The text from beginMarker to endMarker, including the markers, is returned in sb.
    # -1: Error.
    set retval [CkFileAccess_ReadNextFragment $fac $firstIteration $beginMarker $endMarker "utf-8" $sb]
    set firstIteration 0

    if {$retval == 1} then {
        incr numTransactions
        set success [CkXml_LoadSb $xml $sb 1]
        if {$success == 0} then {
            # The fragment could not be parsed as XML. Report it and continue with the next one.
            puts [CkXml_lastErrorText $xml]
        }

        # Your application may now do what it needs with this particular XML fragment...
    }

}

if {$retval < 0} then {
    puts [CkFileAccess_lastErrorText $fac]
}

puts "numTransactions: $numTransactions"

delete_CkFileAccess $fac
delete_CkXml $xml
delete_CkStringBuilder $sb
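
To make the marker-based streaming idea concrete, the following is a minimal pure-Tcl sketch of the same general technique. It does not use Chilkat and is not a description of how ReadNextFragment is implemented. The names forEachFragment and collect, the chunk size, and the demo file name are all hypothetical, chosen only for illustration. The sketch assumes Tcl 8.5 or later, that fragments with the given tag name are not nested inside each other, and that none are self-closing.

```tcl
# Hypothetical helper (not part of Chilkat): reads a channel in chunks and
# invokes a callback once per <tagName ...> ... </tagName> fragment.
# Only the unprocessed tail of the input is held in memory.
proc forEachFragment {chan tagName chunkSize callback} {
    set endMarker "</$tagName>"
    # "Tag aware" begin marker: "<tagName" followed by ">" or whitespace.
    set beginRe "<${tagName}\[>\[:space:\]\]"
    # Tail length to retain in case a begin marker straddles two chunks.
    set keep [expr {[string length $tagName] + 1}]
    set buf ""
    set count 0
    while {1} {
        # Extract every complete fragment currently in the buffer.
        while {1} {
            if {![regexp -indices -- $beginRe $buf m]} {
                # No begin marker yet: discard everything except a short tail.
                set buf [string range $buf end-$keep end]
                break
            }
            set start [lindex $m 0]
            set endIdx [string first $endMarker $buf $start]
            if {$endIdx < 0} {
                # Fragment is incomplete: drop the text before it and read more.
                set buf [string range $buf $start end]
                break
            }
            set stop [expr {$endIdx + [string length $endMarker] - 1}]
            {*}$callback [string range $buf $start $stop]
            incr count
            set buf [string range $buf [expr {$stop + 1}] end]
        }
        if {[eof $chan]} { break }
        append buf [read $chan $chunkSize]
    }
    return $count
}

# Demo: write a small sample file, then stream it with a tiny chunk size
# so that fragments straddle chunk boundaries.
set path [file join [pwd] fragment_demo.xml]
set out [open $path w]
puts $out {<Transactions><Transaction id="1"><amt>5</amt></Transaction>
<Transaction id="2"><amt>7</amt></Transaction></Transactions>}
close $out

set fragments {}
proc collect {fragment} { lappend ::fragments $fragment }

set in [open $path r]
fconfigure $in -encoding utf-8
set n [forEachFragment $in Transaction 16 collect]
close $in
file delete $path

puts "fragments: $n"
```

In a real application the callback would hand each fragment to an XML parser, as the Chilkat example does with CkXml_LoadSb. The buffer grows only as large as the biggest single fragment plus one chunk, which is what keeps memory use independent of the total file size.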
© 2000-2025 Chilkat Software, Inc. All Rights Reserved.