<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: processing large xml data files</title>
	<atom:link href="http://surgeworks.com/blog/rails/processing-large-xml-data-files/feed" rel="self" type="application/rss+xml" />
	<link>http://surgeworks.com/blog/processing-large-xml-data-files</link>
	<description>iPhone and iPad App Design and Development</description>
	<lastBuildDate>Tue, 17 Apr 2012 21:07:40 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: Lolke Dijkstra</title>
		<link>http://surgeworks.com/blog/processing-large-xml-data-files#comment-43</link>
		<dc:creator>Lolke Dijkstra</dc:creator>
		<pubDate>Tue, 12 Oct 2010 17:29:57 +0000</pubDate>
		<guid isPermaLink="false">http://surgeworks.com/blog/?p=681#comment-43</guid>
		<description>Hi Brad,

I came across your site when searching for XML and large datasets.

I agree with you that SAX makes a lot of sense; however, StAX may also be considered.

I have worked on many projects as a software engineer (mostly in the 90s) and as an architect (mostly in the last 10 years). One of those projects involved processing extremely large XML datasets (containing 1,000,000+ financial transactions); I joined a team where development had been outsourced. The guys had come up with a &#039;solution&#039; using JAXB. Of course that could not work, at least not if you load all the data into memory and work from there (which was exactly what they had been doing). I estimated the maximum dataset size to be around 120,000 transactions, depending on available memory. In practice it was much worse, of course (only about 30,000 max). So I recommended a different solution based on SAX or StAX (actually, I had done that already). As a spin-off of that project, I decided to develop that solution further into an MDE-based approach. Based on the specification (XSD), the code generator would generate the XML parser, which could send (logical) events to the listening processor. This is what I did, and it worked very well. You can find the information here: http://dijkstra-ict.nl/documents/XMLParserTechnologyForProcessingHugeXMLfiles.pdf.

If this is of interest to anyone here, drop me a note.
: lolke.dijkstra@dijkstra-ict.com

Kind regards,
Lolke Dijkstra</description>
		<content:encoded><![CDATA[<p>Hi Brad,</p>
<p>I came across your site when searching for XML and large datasets.</p>
<p>I agree with you that SAX makes a lot of sense; however, StAX may also be considered.</p>
<p>I have worked on many projects as a software engineer (mostly in the 90s) and as an architect (mostly in the last 10 years). One of those projects involved processing extremely large XML datasets (containing 1,000,000+ financial transactions); I joined a team where development had been outsourced. The guys had come up with a &#8216;solution&#8217; using JAXB. Of course that could not work, at least not if you load all the data into memory and work from there (which was exactly what they had been doing). I estimated the maximum dataset size to be around 120,000 transactions, depending on available memory. In practice it was much worse, of course (only about 30,000 max). So I recommended a different solution based on SAX or StAX (actually, I had done that already). As a spin-off of that project, I decided to develop that solution further into an MDE-based approach. Based on the specification (XSD), the code generator would generate the XML parser, which could send (logical) events to the listening processor. This is what I did, and it worked very well. You can find the information here: <a href="http://dijkstra-ict.nl/documents/XMLParserTechnologyForProcessingHugeXMLfiles.pdf" rel="nofollow">http://dijkstra-ict.nl/documents/XMLParserTechnologyForProcessingHugeXMLfiles.pdf</a>.</p>
<p>If this is of interest to anyone here, drop me a note.<br />
: <a href="mailto:lolke.dijkstra@dijkstra-ict.com">lolke.dijkstra@dijkstra-ict.com</a></p>
<p>Kind regards,<br />
Lolke Dijkstra</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ruby/Rails: Parse large XMLs (SAX parsers, Pull parsers) + example of Pull parser &#171; Rails, Web 2.0, Data Modeling</title>
		<link>http://surgeworks.com/blog/processing-large-xml-data-files#comment-42</link>
		<dc:creator>Ruby/Rails: Parse large XMLs (SAX parsers, Pull parsers) + example of Pull parser &#171; Rails, Web 2.0, Data Modeling</dc:creator>
		<pubDate>Fri, 21 May 2010 20:27:32 +0000</pubDate>
		<guid isPermaLink="false">http://surgeworks.com/blog/?p=681#comment-42</guid>
		<description>[...] Processing large XML files (SAX example) [...] </description>
		<content:encoded><![CDATA[<p>[...] Processing large XML files (SAX example) [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ruby/Rails: Parse large XMLs (SAX parsers, Pull parsers) &#8211; example of Pull parser &#171; Rails, Web 2.0, Data Modeling</title>
		<link>http://surgeworks.com/blog/processing-large-xml-data-files#comment-41</link>
		<dc:creator>Ruby/Rails: Parse large XMLs (SAX parsers, Pull parsers) &#8211; example of Pull parser &#171; Rails, Web 2.0, Data Modeling</dc:creator>
		<pubDate>Fri, 21 May 2010 20:24:39 +0000</pubDate>
		<guid isPermaLink="false">http://surgeworks.com/blog/?p=681#comment-41</guid>
		<description>[...] Processing large XML files (SAX example) [...] </description>
		<content:encoded><![CDATA[<p>[...] Processing large XML files (SAX example) [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bill Conniff</title>
		<link>http://surgeworks.com/blog/processing-large-xml-data-files#comment-40</link>
		<dc:creator>Bill Conniff</dc:creator>
		<pubDate>Sun, 14 Feb 2010 02:25:02 +0000</pubDate>
		<guid isPermaLink="false">http://surgeworks.com/blog/?p=681#comment-40</guid>
		<description>The problem I have with such parsers is that they read forward-only and do not cache. Xponent developed a caching parser, allowing one to do almost anything in one pass. It is currently in public open beta testing.</description>
		<content:encoded><![CDATA[<p>The problem I have with such parsers is that they read forward-only and do not cache. Xponent developed a caching parser, allowing one to do almost anything in one pass. It is currently in public open beta testing.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: anon_anon</title>
		<link>http://surgeworks.com/blog/processing-large-xml-data-files#comment-39</link>
		<dc:creator>anon_anon</dc:creator>
		<pubDate>Mon, 17 Aug 2009 19:32:51 +0000</pubDate>
		<guid isPermaLink="false">http://surgeworks.com/blog/?p=681#comment-39</guid>
		<description>You might want to try VTD-XML (http://vtd-xml.sf.net) for processing large or huge XML documents... It has two APIs: the standard version processes XML up to 2 GB in size, and the extended version allows documents up to 256 GB in size... XPath 1.0 is built in... and it has tons of other features.</description>
		<content:encoded><![CDATA[<p>You might want to try VTD-XML (<a href="http://vtd-xml.sf.net" rel="nofollow">http://vtd-xml.sf.net</a>) for processing large or huge XML documents&#8230; It has two APIs: the standard version processes XML up to 2 GB in size, and the extended version allows documents up to 256 GB in size&#8230; XPath 1.0 is built in&#8230; and it has tons of other features.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

