Processing XML

    I have a C++ program on Unix that needs to get names and attributes from an XML document. I'm currently using Xerces. As I traverse the XML, I populate a std::map, which ends up with entries like:
    myMap["Name"] = "Ami"
    myMap["Age"] = "..."

    With a very long XML document, e.g. 1000 elements, it takes approximately 2 seconds to process it and populate the map.

    Does anyone have any ideas on how to process this more quickly? Is Xerces too old? When I tried searching for the values by hand, looking for '<', '/>', etc., it took just as long as Xerces, or was sometimes quicker.

    How come I hear people say that they process 10,000 massive XML files in 1 second?

    Can someone please help? I'm working on Unix and I don't have a schema or DTD.


    I really dislike XML; it's extremely bloated at the best of times. Portability is nice, but taking up 10x the space for the data is not so nice.

    You could try to multi-thread the search: divide and conquer the file if you have multiple CPUs on the system.

    Other than that, how exactly are you searching it, and what sort of data structure is it in? If you read the file into a plain array of bytes and search it byte by byte in a loop, that's about as fast as you can get, since the data is not sorted or organized in any way that you can exploit. If you read the whole thing into a string, vector, or other more complicated container, that can slow things down or speed them up depending on the implementation; it varies a lot from package to package. The search algorithms in the STL may or may not be faster than what you can write yourself. I suspect they are slower than a basic loop.
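
    Here is a minimal sketch of that kind of raw-buffer scan, filling a std::map as in the original question. The names scan_attributes and is_name_char are made up for this example, and it is not a real parser: it assumes every attribute is written as name="value" with double quotes and no space around '=', and it ignores entities, comments, CDATA, and any text content that happens to contain ="..." itself.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>

// Hypothetical helper: characters accepted as part of an attribute name.
static bool is_name_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) ||
           c == '_' || c == '-' || c == ':' || c == '.';
}

// Scan [data, data + len) for name="value" pairs and collect them in a map.
// A repeated name overwrites the earlier value.
std::map<std::string, std::string> scan_attributes(const char* data,
                                                   std::size_t len) {
    std::map<std::string, std::string> out;
    const char* p = data;
    const char* const end = data + len;

    while (p < end) {
        // Jump straight to the next '=' instead of testing every byte by hand.
        const char* eq = static_cast<const char*>(
            std::memchr(p, '=', static_cast<std::size_t>(end - p)));
        if (!eq) break;

        // Only treat it as an attribute if an opening quote follows directly.
        if (eq + 1 >= end || eq[1] != '"') { p = eq + 1; continue; }

        // Walk backwards over the name, never past the previous match.
        const char* name = eq;
        while (name > p && is_name_char(name[-1])) --name;

        // Find the closing quote of the value.
        const char* val = eq + 2;
        const char* close = static_cast<const char*>(
            std::memchr(val, '"', static_cast<std::size_t>(end - val)));
        if (!close) break;

        if (name != eq) out[std::string(name, eq)] = std::string(val, close);
        p = close + 1;
    }
    return out;
}
```

    Run over a buffer holding <person Name="Ami"/>, this leaves the map with the single entry Name mapped to Ami. Whether memchr beats a hand-written loop depends on the C library, so treat the choice as something to measure rather than a given.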

    Anyway, that's my take on it: drop it in a POD container, loop fast, and thread. If you can beat that with some other method, please share it with me! It is unlikely that a five-line assembly search doing the same thing would be any better for so basic an approach.
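
    To compare one method against another on your own data, a small timing helper is enough. This is only a sketch: time_ms is a name I made up, and taking the best of several runs is just one reasonable way to reduce noise.

```cpp
#include <chrono>

// Hypothetical helper: run `work` `repeats` times and return the fastest run
// in milliseconds. Returns -1.0 if `repeats` is zero or negative.
template <typename F>
double time_ms(F&& work, int repeats = 10) {
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    double best = -1.0;
    for (int i = 0; i < repeats; ++i) {
        const auto start = clock::now();
        work();
        const std::chrono::duration<double, std::milli> elapsed =
            clock::now() - start;
        if (best < 0.0 || elapsed.count() < best) best = elapsed.count();
    }
    return best;
}
```

    Pass it a lambda that runs the Xerces traversal, then one that runs the hand-written scan, and compare the two numbers. Make sure each lambda actually uses its result (for example by storing it in a variable declared outside the lambda), otherwise an optimizing compiler may remove the work being timed.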
