Working with large XML files in PHP

by Vicente Russo Neto on May 23, 2009

There are many ways to handle big XML files in PHP. When I say large, I mean really large: 10, 20, 30 megabytes. If we open a file like this on a typical server, we will very likely hit a time-out error. This is because we usually use the SimpleXML functions. That extension is, in fact, a “tree-based parser”, just like the DOM parser. Tree-based parsers work great on small to medium XML files (~1 MB): they load the whole content into memory and then parse it. But when we face a big file, the practical option is a “stream-based parser”. These are more efficient and faster, because they read the file on demand and don't crush your server's memory.
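To make the difference concrete, here is a minimal sketch that counts the same elements both ways. The inline sample document and the <user> element name are made up for illustration. With a real multi-megabyte file, the SimpleXML call is the one that would hold the entire tree in memory.

```php
<?php
// Hypothetical sample data standing in for a large file.
$xml = '<users>'
     . '<user><name>John</name></user>'
     . '<user><name>Mary</name></user>'
     . '</users>';

// Tree-based: SimpleXML builds the whole document in memory before we touch it.
$tree = simplexml_load_string($xml);
$treeCount = count($tree->user);

// Stream-based: XMLReader moves through the document one node at a time.
$reader = new XMLReader();
$reader->XML($xml);
$streamCount = 0;
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'user') {
        $streamCount++;
    }
}
$reader->close();

echo "tree: $treeCount, stream: $streamCount\n"; // prints "tree: 2, stream: 2"
```

Both approaches find the same elements. The difference is only in how much of the document has to be in memory at once.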

Among the stream-based parsers, we have SAX and XMLReader. I'll show you how to read a big XML file with XMLReader, because it's easier and faster compared to SAX, as you can see here.

XMLReader is an extension enabled by default in PHP 5.1 and later. It is modeled on the XmlTextReader API from C# and is based on the libxml2 library. Before that, the XMLReader extension was only available through PECL. XMLReader supports namespaces and validation, including DTD and Relax NG (REgular LAnguage for XML Next Generation).
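As a rough sketch of the validation support, the snippet below checks a document against a Relax NG schema while streaming it. The schema, the sample documents, and the isValidAgainst() helper are assumptions made for this example and are not part of the script discussed in this post.

```php
<?php
// Hypothetical Relax NG schema: a <users> element containing exactly one <name>.
$schema = <<<'RNG'
<element name="users" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="name"><text/></element>
</element>
RNG;

// Hypothetical helper: streams $xml through XMLReader and reports validity.
function isValidAgainst(string $xml, string $schema): bool
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $reader = new XMLReader();
    $reader->XML($xml);
    // The schema has to be attached before the first read().
    $ok = $reader->setRelaxNGSchemaSource($schema);
    if ($ok) {
        while ($reader->read()) {
            // Nothing to do here: reading is what drives the validation.
        }
        $ok = $reader->isValid();
    }
    $reader->close();

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

var_dump(isValidAgainst('<users><name>John</name></users>', $schema));
var_dump(isValidAgainst('<users><name>John</name><extra/></users>', $schema));
```

The first document matches the schema. The second carries an unexpected <extra/> element, so it should be reported as invalid.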

Well, let's get to the code. Here's my XML example:

<users>
     <name>John</name>
     <address>My Address</address>
     <zipcode>12345</zipcode>
     <city>My City</city>
     <phone>555 1234-4321</phone>
</users>

PHP code:

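// Open the file as a stream; nothing is loaded into memory yet.
$users = new XMLReader();
$users->open('users.xml');
while ($users->read()) {
     switch ($users->nodeType) {
          case XMLReader::ELEMENT:
               if ($users->localName == "users") {
                    // Expand only the current <users> block into a DOM node.
                    $node = $users->expand();
                    $dom = new DOMDocument();
                    $n = $dom->importNode($node, true);
                    $dom->appendChild($n);
                    // Hand the small DOM fragment over to SimpleXML.
                    $simple_xml = simplexml_import_dom($n);
                    // Attribute access; the sample above has no id attribute, so this is null there.
                    $id = $simple_xml['id'];
                    $name = $simple_xml->name;
                    $address = $simple_xml->address;
                    // Custom code insert, update, whatever...
               }
               break;
     }
}
$users->close();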

See, at a certain point I turn the current block of data (the users tag) into a SimpleXML object, which makes the manipulation extremely easy. This way, you can work with big XML files without harming the server's memory or losing speed. My next move: try to optimize this script and maybe create a CodeIgniter library. That would certainly be a nice add-on for developers, I think. Of course, any help is very welcome!
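One possible way to package this pattern for reuse is a generator that yields one SimpleXML object per record. This is only a sketch: the streamRecords() function, its parameters, and the <user> records with an id attribute in the sample file are hypothetical and do not come from the script above.

```php
<?php
/**
 * Hypothetical helper: yields every <$recordName> element found in the
 * file at $path as a SimpleXMLElement, one record at a time.
 */
function streamRecords(string $path, string $recordName): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while ($reader->read()) {
            if ($reader->nodeType !== XMLReader::ELEMENT
                || $reader->localName !== $recordName) {
                continue;
            }
            // Copy only the current record into a small DOM document,
            // then hand that fragment to SimpleXML.
            $dom = new DOMDocument();
            $node = $dom->importNode($reader->expand(), true);
            $dom->appendChild($node);
            yield simplexml_import_dom($node);
        }
    } finally {
        $reader->close();
    }
}

// Usage with a small temporary file standing in for a large one.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents(
    $path,
    '<users><user id="1"><name>John</name></user>'
    . '<user id="2"><name>Mary</name></user></users>'
);

$names = [];
foreach (streamRecords($path, 'user') as $user) {
    $names[] = (string) $user['id'] . ':' . (string) $user->name;
}
unlink($path);

echo implode(', ', $names), "\n"; // prints "1:John, 2:Mary"
```

Because the generator yields records lazily, the calling code only ever holds one record in memory, which is the same property the loop above relies on.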


7 comments

zeigen October 9, 2009 at 2:04 am

Can you help me with this script?

$doc = new Domdocument('1.0','utf-8');
$root = $doc->appendchild($doc->CreateElement('root'));
$users = new XMLReader();
$users->open('discogs_artist.xml');
while ($users->read()) {
     switch ($users->nodeType) {
          case (XMLReader::ELEMENT):
               if ($users->localName == "artist") {
                    $node = $users->expand();
                    $dom = new DomDocument();
                    $n = $dom->importNode($node,true);
                    $root->appendChild($n);
               }
     }
}
$response = $doc->saveXML();

It gives this error:
parser error : Extra content at the end of the document
and a few other errors.

James October 26, 2009 at 7:47 pm

Just to say thank you so much for this snippet, I’ve spent hours trying to figure out reading a complex XML with XMLReader and within 10 minutes I now have it working :-)

anon_anon November 25, 2009 at 12:12 am

Have you looked at vtd-xml?

James November 30, 2009 at 4:02 pm

You are truly a star!!! :)
I’ve been pulling my hair out trying to figure this out the whole day.. thank you so much!!..

Btw, have you been able to create the CI library? I use CI too on most projects.. :)

sergiu September 3, 2010 at 1:59 pm

Hey, this script could solve the huge problem I have. The thing is that I get an error for the “open” command: “Call to undefined function open() in…”. Any thoughts?

Vicente Russo Neto September 3, 2010 at 5:36 pm

Hi Sergiu. I think it’s a typo, please change to fopen and let me know.

sergiu September 5, 2010 at 7:33 pm

This is the one that has been working for me perfectly!
I just had to open the file as an XMLReader object.

I tried “fopen”, but it did not do the job on my side… anyway, the rest of the code works nicely! Thanks for posting it.

*just replace xml_file_path with the path and name of the actual xml file
*replace xml_record_name – in our case “users” – this does not come from the file name, but from the record tag inside it.

$xmlFile = "xml_file_path";
$xml = new XMLReader();

$xml->open($xmlFile);

while ($xml->read()) {
     switch ($xml->nodeType) {
          case (XMLReader::ELEMENT):
               if ($xml->localName == "xml_record_name") {
                    $node = $xml->expand();
                    $dom = new DomDocument();
                    $n = $dom->importNode($node,true);
                    $dom->appendChild($n);
                    $job = simplexml_import_dom($n);
               }
     }
}
