Working with large XML files in PHP

There are many ways to handle big XML files in PHP. When I say large, I mean really large: 10, 20, 30 megabytes. If we open a file like this on a common server, we will most likely hit a time-out or memory-limit error. This is because we usually use the SimpleXML functions. This extension is, in fact, a “tree-based parser”, just like the DOM parser. Tree-based parsers work great on small to medium XML files (~1 MB): they load the whole content into memory and then parse it. But when we face big content, the practical option is a “stream-based parser”. These are more memory-efficient and usually faster, because they read the file on demand and don't crush your server's memory.

Among the stream-based parsers, we have SAX and XMLReader. I'll show you how to read a big XML file with XMLReader, because it's easier and faster compared to SAX, as you can see here.

XMLReader is an extension enabled by default in PHP 5.1 and later. Before that, it was only available through PECL. It is based on the libxml2 library, whose reader interface was modeled on the XmlTextReader API from C#. XMLReader supports namespaces and validation, including DTD and Relax NG (REgular LAnguage for XML Next Generation).
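To illustrate the validation support mentioned above, here is a minimal sketch that streams a small document with an inline DTD and checks validity while reading. The sample document and its element names are assumptions made up for illustration; only the XMLReader calls themselves come from the extension.

```php
<?php
// Hypothetical document with an internal DTD; element names are illustrative.
$xml = <<<XML
<?xml version="1.0"?>
<!DOCTYPE users [
  <!ELEMENT users (user*)>
  <!ELEMENT user (#PCDATA)>
]>
<users><user>Ana</user><user>Bruno</user></users>
XML;

// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$reader = new XMLReader();
$reader->XML($xml);
// Validation has to be switched on after loading the source
// and before the first call to read().
$reader->setParserProperty(XMLReader::VALIDATE, true);

$valid = true;
while ($reader->read()) {
    // isValid() reports on the document as parsed so far.
    if (!$reader->isValid()) {
        $valid = false;
        break;
    }
}
$reader->close();

echo $valid ? "valid\n" : "invalid\n";
```

For Relax NG, XMLReader::setRelaxNGSchema() takes the path of a schema file instead of relying on a DTD inside the document.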

Well, let's get to the code. Here's my XML example:

PHP code:

See, at a certain point I transform the current block of data (the users tag) into a SimpleXML object, which makes the manipulation extremely easy. This way, you can work with big XML files with no harm to the server's memory and no loss of speed. My next move: try to optimize this script and maybe create a CodeIgniter library. That would certainly be a nice add-on for developers, I think. Of course, any help is very welcome!
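A minimal sketch of the approach described above: stream the file with XMLReader and turn each users block into a SimpleXML object. The sample data, the child element names (name, email) and the temporary file are assumptions for illustration, not the original listing.

```php
<?php
// Hypothetical sample data; child element names are assumptions.
$xmlString = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <users><name>Ana</name><email>ana@example.com</email></users>
  <users><name>Bruno</name><email>bruno@example.com</email></users>
</root>
XML;

// Write the sample to a temporary file so it can be streamed from disk.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, $xmlString);

$reader = new XMLReader();
if (!$reader->open($path)) {
    throw new RuntimeException("Could not open $path");
}

$names = [];
while ($reader->read()) {
    // Only react to opening <users> tags; everything else is skipped.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'users') {
        // Copy just this block into a small DOM document...
        $doc = new DOMDocument('1.0', 'UTF-8');
        $node = $doc->importNode($reader->expand(), true);
        $doc->appendChild($node);
        // ...and hand it to SimpleXML for convenient access.
        $user = simplexml_import_dom($node);
        $names[] = (string) $user->name;
    }
}
$reader->close();
unlink($path);

print_r($names);
```

Only one users block is held as a tree at any time, so memory use depends on the size of a single record rather than on the size of the whole file.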

9 Comments

  • zeigen |

    Can you help me with this script?

    $doc = new DOMDocument('1.0', 'utf-8');
    $root = $doc->appendChild($doc->createElement('root'));
    $users = new XMLReader();
    $users->open('discogs_artist.xml');
    while ($users->read()) {
        switch ($users->nodeType) {
            case (XMLReader::ELEMENT):
                if ($users->localName == "artist") {
                    $node = $users->expand();
                    $dom = new DOMDocument();
                    $n = $dom->importNode($node, true);
                    $root->appendChild($n);
                }
        }
    }
    $response = $doc->saveXML();

    It says Error:
    parser error : Extra content at the end of the document
    and a few other errors.

  • James |

    Just to say thank you so much for this snippet, I’ve spent hours trying to figure out reading a complex XML with XMLReader and within 10 minutes I now have it working 🙂

  • James |

    You are truly a star!!! 🙂
    I’ve been pulling my hair out trying to figure this out the whole day.. thank you so much!!..

    Btw, have you been able to create the CI library? I use CI too on most projects.. 🙂

  • sergiu |

    Hey, this script could solve the huge problem I have. The thing is that I get an error for the “open” command: “Call to undefined function open() in…”. Any thoughts?

  • sergiu |

    This is the one that has been working for me perfectly!
    I just had to open the file as an XMLReader object.

    I tried “fopen”, but it did not do the job on my side… anyway, the rest of the code works nicely! Thanks for posting it.

    *just replace xml_file_path with the path and name of the actual xml file
    *replace xml_record_name – in our case “users” – this does not come from the file name, but from the record tag inside it.

    $xmlFile = "xml_file_path";
    $xml = new XMLReader();

    $xml->open($xmlFile);

    while ($xml->read()) {
        switch ($xml->nodeType) {
            case (XMLReader::ELEMENT):
                if ($xml->localName == "xml_record_name") {
                    $node = $xml->expand();
                    $dom = new DOMDocument();
                    $n = $dom->importNode($node, true);
                    $dom->appendChild($n);
                    $job = simplexml_import_dom($n);
                    // work with $job here
                }
                break;
        }
    }

  • louis |

    I found that, for some reason, doing the variable assignment step by step, as you have done, was successful, whereas a ‘compound’ statement (commented out below) did not return any results. Could this perhaps be due to the order of execution, or am I missing something? Thanks.

    code:
    while ($reader->read())
    {
        if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'Users')
        {
            $node = $reader->expand();
            $doc = new DOMDocument('1.0', 'UTF-8');
            $n = $doc->importNode($node, true);
            $doc->appendChild($n);
            $xml_bio_report = simplexml_import_dom($n);
            //$xml_bio_report = simplexml_import_dom($doc->importNode($reader->expand(), true));
        }
    }
