best xml parser to use
Hi,
I have a particular problem to solve:
I have an XML batch file that contains individual XML invoices. I need to extract these invoices one at a time and
place them on a message queue, i.e. I just need to get all the data between the invoice start and end tags, put it
in a string and place it on a message queue (validation of the invoice itself occurs on the receiver side).
In terms of performance, what is likely to be my best approach: DOM (unlikely, I guess), SAX, StAX, or simply writing a Java program using indexOf?
TIA Peter
[582 byte] By [petera] at [2007-11-20 1:07:36]

# 1 Re: best xml parser to use
Hi,
To fetch values from XML tags, JAXB is a good choice. It offers speed close to that of SAX
together with an in-memory representation like DOM's, and it is easy to use, so give it a try.
You first have to write a schema file. The JAXB binding compiler (xjc) reads the schema
and generates classes that correspond to the tags; you then fetch the values through those
classes with ordinary Java code.
Thanks
-James.A.Johney
# 2 Re: best xml parser to use
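Java's XML parsers can help you here; a few of them are DOM, JDOM and SAX.
If you use DOM or JDOM, parse the whole document first and then select the elements by tag name. With DOM the method is getElementsByTagName(String tagname); JDOM's closest equivalent is Element.getChildren(String name), which returns direct children only. This is pretty easy to do. You obtain the parser through a factory (DocumentBuilderFactory for DOM).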
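As a rough illustration of the DOM route described above, here is a minimal sketch using only the JDK's built-in javax.xml APIs. The class name DomInvoiceExtractor and the method extractInvoices are made up for this example, and it assumes the elements are called "invoice" and that the whole batch fits in memory as a string. Each element is serialized back to text with an identity Transformer, so the output includes the <invoice> tags themselves.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class DomInvoiceExtractor {

    // Parses the whole batch into a DOM tree and returns every
    // <invoice> element serialized back to a string.
    static List<String> extractInvoices(String xmlData) {
        try {
            // The parser is obtained through a factory.
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xmlData)));

            // Identity transformer, used only to turn a node back into text.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            // Assumption: the element name is "invoice" and invoices are not nested.
            NodeList nodes = doc.getElementsByTagName("invoice");
            List<String> invoices = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                StringWriter out = new StringWriter();
                serializer.transform(new DOMSource(nodes.item(i)), new StreamResult(out));
                invoices.add(out.toString());
            }
            return invoices;
        } catch (Exception e) {
            // Malformed XML or a parser configuration problem.
            throw new IllegalArgumentException("Could not parse invoice batch", e);
        }
    }
}
```

Calling DomInvoiceExtractor.extractInvoices(xmlData) returns one string per invoice, ready to be put on the queue. Note that this builds the full tree in memory and re-serializes every invoice, so it is likely the slowest of the options discussed in this thread for a large batch file.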
# 3 Re: best xml parser to use
The very fastest way to extract the data is to assume that you only get valid input and extract substrings using string functions.
<invoices>
    <invoice>
        <title>Abc</title>
        <total>$12.95</total>
    </invoice>
    <invoice>
        <title>Abc</title>
        <total>$12.95</total>
    </invoice>
</invoices>
With the above XML data, a very fast extraction routine would be the following:
int i = -1;
while ((i = xmlData.indexOf("<invoice>", i + 1)) > -1) {
    String invoice = xmlData.substring(i + 9, xmlData.indexOf("</invoice>", i + 9));
    // Do whatever you like with invoice here. You may want to trim()
    // it because it will probably have whitespace in both ends.
}
Note that the above code will behave incorrectly if <invoice> tags are nested inside other <invoice> tags.
You can further optimize by making it skip text it knows does not contain <invoice> and reducing calculations:
int i = 0;
while ((i = xmlData.indexOf("<invoice>", i) + 9) > 8) {
    String invoice = xmlData.substring(i, xmlData.indexOf("</invoice>", i));
    // Skip past this invoice's content and its closing tag (10 characters).
    i += invoice.length() + 10;
    // Do whatever you like with invoice here. You may want to trim()
    // it because it will probably have whitespace in both ends.
}
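For anyone who wants to drop this into a program, here is the same indexOf idea wrapped in a small self-contained class. It is only a sketch: the names InvoiceSplitter and split are invented for this example, it assumes the opening tag is written exactly as <invoice> with no attributes, and it assumes invoices are never nested. Unlike the loops above, it stops quietly when a closing tag is missing instead of throwing from substring().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
class InvoiceSplitter {
    private static final String OPEN = "<invoice>";
    private static final String CLOSE = "</invoice>";

    // Returns the text between each <invoice> ... </invoice> pair, trimmed.
    static List<String> split(String xmlData) {
        List<String> invoices = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = xmlData.indexOf(OPEN, pos);
            if (start < 0) {
                break; // no more invoices
            }
            int contentStart = start + OPEN.length();
            int end = xmlData.indexOf(CLOSE, contentStart);
            if (end < 0) {
                break; // unterminated invoice: stop rather than throw
            }
            invoices.add(xmlData.substring(contentStart, end).trim());
            // Resume searching right after the closing tag.
            pos = end + CLOSE.length();
        }
        return invoices;
    }
}
```

Each string returned by InvoiceSplitter.split(xmlData) holds the content of one invoice without the surrounding tags. If the receiver expects the <invoice> element itself, take the substring from start to end + CLOSE.length() instead.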