27 August 2011 Last updated on 16 January 2012

In response to the feature request made on Twitter by Fotis Evangelou of K2 fame.

Today, Fotis Evangelou (of K2 fame) made a feature request for Admin Tools on Twitter. He asked if it's possible to have a one-click database export/import in Admin Tools. I always tell people that feature requests should be made over e-mail or through the pre-sales forum. I do that for a good reason: explaining why a feature is or isn't possible requires a lengthy analysis. Twitter, with its 140-character limit, makes that impossible, so I have to resort to writing a blog post. Oh well, here we go.

The backup part

Backing up a database cannot be done in a single page load, unless your database is really tiny. Let's analyse the problems. If you have a rather big database and/or a slow host, backing up the database may require much more time than PHP's max_execution_time allows, meaning that the process will time out and result in a blank page. That's why you have to split the backup into multiple steps.

Backing up a database could, in theory, be performed by running one query per database row, fetching its data and adding it to the dump file. While this sounds like a good idea, it won't work. Most hosts have an hourly SQL query limit, meaning that if you try to perform too many SQL queries in a limited amount of time (which is exactly what such a backup would do), you will get an error.

As a result, you have to load a batch of rows (as many as possible) with a single query, thereby limiting the number of queries you need to iterate through the entire database. But you have different kinds of tables with different kinds of data in them. You can't just do a SELECT * FROM table, because with large tables it would exhaust PHP's memory, which is capped by php.ini directives. You also can't use an inflexible, fixed batch size (e.g. 1000 rows). Why? Tough experience has shown me that some developers use BLOB fields to store huge amounts of data, making a big select like that fail miserably. Therefore, you need to let the user decide how many rows should be backed up per batch.
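
To give you an idea, here's a rough sketch of what a single batched dump step could look like. The table name, batch size and the writeDump() helper are made up for the sake of illustration, and real code would also have to escape values properly depending on each column's type:

    <?php
    // Hypothetical sketch of one batched dump step; $batchSize must be user-configurable.
    $db        = new mysqli('localhost', 'user', 'pass', 'example_db');
    $table     = 'jos_content';   // example table name
    $batchSize = 1000;            // too big for tables full of huge BLOBs, hence configurable
    $offset    = 0;               // a real implementation persists this between steps

    do {
        $result = $db->query(sprintf(
            'SELECT * FROM `%s` LIMIT %d OFFSET %d', $table, $batchSize, $offset
        ));

        while ($row = $result->fetch_assoc()) {
            writeDump($table, $row);   // hypothetical: append an escaped INSERT to the dump file
        }

        $numRows = $result->num_rows;
        $result->free();
        $offset += $numRows;
    } while ($numRows === $batchSize);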

Just when you thought all your problems were solved, another roadblock appears. Many hosts limit the maximum size of a file created by PHP. I have seen limits of 128Kb, 256Kb, 512Kb, 1Mb, 2Mb, 5Mb, 10Mb and 20Mb. Which one applies to your host? The only way to know is to use heuristics, i.e. keep trying to create a larger file until you get an error; then you know the correct limit. That's what Akeeba Backup's "Configuration Wizard" does automatically in its final step. Moreover, since you now have multiple database dump files, you need to put them in an archive file. Let me just say that using tar, tar.gz or zip is guaranteed to cause issues for reasons that would take another ten paragraphs to analyse. So we'd still have to use the JPA format for those archives.
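
A crude sketch of the part-file logic, assuming a 512Kb cap (in reality the limit has to be detected heuristically, as described above, and the file naming is made up):

    <?php
    // Hypothetical part-file rollover: start a new dump part before hitting the host's size cap.
    function writeChunk(&$partNumber, $data, $maxPartSize = 524288)
    {
        $partFile = sprintf('site.s%02d.sql', $partNumber);

        // Roll over to the next part file if this write would exceed the size limit.
        if (file_exists($partFile) && (filesize($partFile) + strlen($data) > $maxPartSize)) {
            $partNumber++;
            $partFile = sprintf('site.s%02d.sql', $partNumber);
        }

        file_put_contents($partFile, $data, FILE_APPEND);
        clearstatcache();   // so the next filesize() call isn't served from PHP's stat cache
    }

    $part = 1;
    writeChunk($part, "INSERT INTO `jos_content` VALUES (1, 'Example');\n");   // example usage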

You also have another problem. Since the restoration will be plagued by the same hourly SQL query limits, you can't create one INSERT statement per database row. You need to "group" them using what is known as "extended INSERTs", i.e. one INSERT statement for multiple rows. However, you can't create an arbitrarily long query either. MySQL has a maximum packet size, i.e. it limits the maximum size of each SQL query, so you need to know what that is. One generally safe value is 512Kb (half the default 1Mb maximum packet size), but it needs to be configurable by the user to guarantee smooth operation.
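
Roughly, the extended INSERT grouping boils down to something like this sketch; the 512Kb cap, the table name and the fetchBatch() / quoteRow() / writeDump() helpers are illustrative only:

    <?php
    // Group rows into one extended INSERT, flushing before the statement exceeds $maxPacket bytes.
    $maxPacket = 512 * 1024;       // user-configurable; must stay below MySQL's maximum packet size
    $rows      = fetchBatch();     // hypothetical: the batch of rows loaded by the SELECT above
    $query     = '';

    foreach ($rows as $row) {
        $values = quoteRow($row);  // hypothetical: returns e.g. "(1, 'Some title', '...')"

        if ($query === '') {
            $query = 'INSERT INTO `jos_content` VALUES ' . $values;
        } elseif ((strlen($query) + strlen($values) + 2) < $maxPacket) {
            $query .= ', ' . $values;
        } else {
            writeDump($query . ";\n");   // flush the statement we have built so far
            $query = 'INSERT INTO `jos_content` VALUES ' . $values;
        }
    }

    if ($query !== '') {
        writeDump($query . ";\n");
    }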

Before you can call it a day, you bump into yet another couple of problems. Most hosts don't like you calling the same URL many times over a short period of time; however, splitting the backup process into multiple steps does exactly that. So you need a way to enforce a minimum execution time per backup step. Then again, other hosts enforce CPU usage limits. Since the backup is CPU intensive, you need to "throttle down" the backup speed, e.g. backing up for 1 second and doing nothing for another 2-3 seconds to bring the average CPU usage below the threshold levels.
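
In code, the timing logic could look like this sketch; the one second of work, two seconds of rest and five-second minimum step time are example figures, and backupSomeRows() is a hypothetical unit of work:

    <?php
    // Enforce a minimum wall-clock time per step and throttle CPU usage between bursts of work.
    $stepStart   = microtime(true);
    $runTime     = 1.0;   // back up for roughly one second...
    $restTime    = 2.0;   // ...then sit idle to bring the average CPU usage down
    $minStepTime = 5.0;   // and never hit the same URL more often than once every few seconds

    while ((microtime(true) - $stepStart) < $runTime) {
        backupSomeRows();   // hypothetical: one small, bounded unit of backup work
    }

    usleep((int) ($restTime * 1000000));   // the CPU throttling pause

    $elapsed = microtime(true) - $stepStart;

    if ($elapsed < $minStepTime) {
        usleep((int) (($minStepTime - $elapsed) * 1000000));   // pad to the minimum step time
    }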

Finally, some tables must not be backed up at all, for example the sessions table. And it's not just that one table; it all depends on the software you have installed. Sometimes you might not want to back up, let's say, the banner click tracking table because it makes no sense on your site, or you might want to back up only VirtueMart's tables because that is what makes sense on your site. In order to do that, I'd have to give you a way to exclude database tables from the backup, i.e. exactly what Akeeba Backup already does!
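
The filter itself is conceptually trivial, something along the lines of this sketch (the table names are examples, and listTables() / backupTable() are made-up helpers); the hard part is the configuration UI and sensible defaults:

    <?php
    // Skip excluded tables; the exclusion list would come from the user's configuration.
    $excluded  = array('jos_session', 'jos_banner_tracks');   // example exclusions
    $allTables = listTables();   // hypothetical: all table names in the site's database

    foreach ($allTables as $table) {
        if (in_array($table, $excluded, true)) {
            continue;   // deliberately not backed up
        }

        backupTable($table);   // hypothetical per-table dump routine
    }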

All of the above have led to Akeeba Backup having a lot of options for the MySQL database engine and even more in its Fine Tuning section. Replicating the same thing in Admin Tools is just plain stupid and guarantees the introduction of regressions and new bugs. Not something I would enjoy. So, for the backup part you need to use Akeeba Backup and its "All configured databases (archive file)" backup mode, which tackles all of the above issues.

Downloading the backup

What's simpler than downloading a binary file from the web? Sounds simple, right? Oh, wait, it's not that simple... 

The obvious choice would be to put the backup archive in a web-accessible place and give you its URL. Your web server would happily send it to your browser, which would save it to disk. Your browser, as well as anybody else's browser for that matter, including that evil cracker's browser! How would he know the URL? It's not that hard to guess, trust me on that ;) So, placing your backup archive in a web-accessible directory is a Very Bad Idea™. You have to hide it in a protected directory which doesn't allow direct access.

What about just password-protecting the directory? Thank you very much, what a great idea... only it's not universal. The only way to do that is through .htaccess files, which only work on the Apache web server. Everything else (IIS, Lighttpd, Nginx, you name it) doesn't understand .htaccess files. That "everything else" is about 25% of the Internet, too big a percentage to ignore. Even for the 75% of the Internet which does understand .htaccess files, we have some serious issues. Linux-based Apache installations understand encrypted passwords and everything is secure. Windows-based Apache installations do not, so we'd have to save the password in plain text. That's insecure and can easily be compromised. So, this is not a solution.

What about using PHP to "pump" the backup archive from the server to the browser? Thank you, I thought of that five years ago and have already implemented it in Akeeba Backup. However, there is a very high chance of downloading a corrupt backup archive. For starters, you can't just read the entire file to the standard output; that would overflow either PHP's memory limit or Apache's internal cache. So you have to read a couple of Mb at a time, push them to the browser, wait a little and repeat until finished. Due to PHP's time limit, if you have a large backup archive and a slow network connection, this process times out somewhere in the middle WITHOUT A WARNING and you get a corrupt backup archive. More problems: some browsers do not honour MIME headers and will download the binary file in text mode, corrupting the heck out of it. Some will save it under a different name (minor glitch, who cares). Some Joomla! plugins screw up when they are confronted with format=raw, which is required to get raw data out of Joomla!, causing PHP notices/warnings/errors to be printed to the output, which corrupt the backup archive. Sometimes they will just kill the whole download process, resulting in the download of a 0-byte archive.
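
For what it's worth, the "pump" itself looks deceptively simple; here's a minimal sketch which handles none of the failure modes above (the path, the file name and the 1Mb chunk size are arbitrary):

    <?php
    // Stream the backup archive to the browser in chunks instead of loading it all into memory.
    $file = '/path/outside/webroot/backup.jpa';   // example location of the stored archive

    header('Content-Type: application/octet-stream');
    header('Content-Disposition: attachment; filename="backup.jpa"');
    header('Content-Length: ' . filesize($file));

    $fp = fopen($file, 'rb');

    while (!feof($fp)) {
        echo fread($fp, 1048576);   // push roughly 1Mb at a time
        flush();                    // hand the data over to the web server and the browser
    }

    fclose($fp);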

The only reliable solution is using FTP, SFTP or another similar method (WebDAV?) to download the backup archive manually. Since Fotis was talking about PHPFog, which doesn't have FTP access, all bets are off. If the download through the browser works, all is fine and dandy. If not, you're screwed.

Another, much more reliable approach would be to upload the whole archive to Amazon S3. From there, it's easy to download it without any problems from anywhere on Earth, as long as you have an Internet connection. Guess what? Akeeba Backup does that too. Replicating it in Admin Tools is just plain stupid.

Uploading the backup

Oh, PHP uploads. What's easier than uploading a file through a PHP script? You wouldn't know. Sending a man to the moon is probably easier than creating a reliable PHP upload script.

First, we have the two limits imposed by PHP: the maximum upload file size and the maximum POST size. The archive you upload cannot be larger than the smaller of the two. If it is, your browser tells you that the file was uploaded correctly, but PHP sees a 0-byte file or gets an upload failure. Bummer. You could use a browser-side chunked upload, but every such solution I have tested relied on Flash, Silverlight, HTML 5 or some other technology not available in many corporate environments. Not to mention that they are unstable as hell and usually cause corrupt uploads. And they require PHP-writable directories (more on that later).
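
For the record, the effective limit is the smaller of upload_max_filesize and post_max_size, which you can work out along these lines (the shorthand parsing is simplified):

    <?php
    // The effective maximum upload size is the smaller of upload_max_filesize and post_max_size.
    function iniBytes($value)
    {
        // Simplified parser for php.ini shorthand values such as "8M" or "512K".
        $unit  = strtoupper(substr(trim($value), -1));
        $bytes = (float) $value;

        switch ($unit) {
            case 'G': $bytes *= 1024;   // fall through
            case 'M': $bytes *= 1024;   // fall through
            case 'K': $bytes *= 1024;
        }

        return (int) $bytes;
    }

    $maxUpload = min(
        iniBytes(ini_get('upload_max_filesize')),
        iniBytes(ini_get('post_max_size'))
    );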

Then again, in order to upload a file you need a writable upload temporary directory. You would be surprised at the sheer number of hosts whose default PHP upload temporary directory either has the wrong permissions or lies outside the open_basedir restriction paths. Argh! It's also worth mentioning that these problems cannot be detected from within PHP code (to the best of my knowledge), making troubleshooting the error a very nice exercise in psychic abilities.

Finally, you need a directory which is directly writable by PHP, in order to move the uploaded file there before you can process it. Did I mention that most shared hosts do not allow that by default unless you use unsafe (0777) permissions? This takes us back to the problem of creating a PHP-writable directory which is not web-accessible, to mitigate the security hole we knowingly opened by using 0777 permissions. We can do that with .htaccess (Apache) and web.config (IIS) files, but about 10% of the Internet doesn't support either method. So, we have a 10% chance of creating an insecure setup.
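
Putting it together, handling the upload would look roughly like this sketch; the "archive" form field, the directory name and the .htaccess trick (which, as noted, only helps on Apache) are all assumptions:

    <?php
    // Move the uploaded archive into a PHP-writable directory that is not directly web-accessible.
    $target = __DIR__ . '/backups';   // example location; ideally it would live outside the web root

    if (!is_dir($target)) {
        @mkdir($target, 0777, true);   // the unsafe permissions discussed above
        // Apache-only mitigation; IIS needs a web.config, other servers have no equivalent.
        @file_put_contents($target . '/.htaccess', "Deny from all\n");
    }

    if (isset($_FILES['archive']) && ($_FILES['archive']['error'] === UPLOAD_ERR_OK)) {
        move_uploaded_file($_FILES['archive']['tmp_name'], $target . '/' . basename($_FILES['archive']['name']));
    } else {
        die('Upload failed');   // exceeded limits, unwritable temporary directory, you name it
    }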

The restoration

Suppose you now have a database backup, you've downloaded it and uploaded it, and nothing blew up along the way. How do you restore it?

Extracting the backup archive can't happen in a single page load without risking a PHP timeout, so we have to split the process into multiple steps. Everything I said in the backup paragraphs regarding file sizes, timeout issues, memory issues and CPU consumption still holds true. Everything I said about permissions in the upload section still holds true. The solution? I have implemented some very sophisticated algorithms in Kickstart to work around most of that and give you the option to use FTP mode to extract what cannot otherwise be written to disk, without risking timeouts or hitting CPU limits. However, there is a small portion of web hosts where even this doesn't work and you have to do the extraction manually and upload everything by FTP.
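
All of these stepped processes share the same skeleton: do a bounded amount of work, persist where you stopped, return to the browser and let it request the next step. A bare-bones sketch, with the actual JPA extraction waved away into a hypothetical extractNextFile() and the state file name made up:

    <?php
    // Skeleton of a resumable, multi-step process: work for a few seconds, save state, repeat.
    $state = json_decode(@file_get_contents('extract_state.json'), true);

    if (!is_array($state)) {
        $state = array('offset' => 0, 'done' => false);
    }

    $start = microtime(true);

    while (!$state['done'] && ((microtime(true) - $start) < 3.0)) {
        // Hypothetical: reads the archive from $state['offset'], writes one file to disk
        // (or via FTP) and returns the updated state, flagging 'done' when finished.
        $state = extractNextFile($state);
    }

    file_put_contents('extract_state.json', json_encode($state));

    echo json_encode(array('done' => $state['done']));   // the browser decides whether to call again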

Now that everything is extracted, let's restore it.

Restoring the entire database dump in a single page load will not work for dumps over 300-500Kb, as experience has taught me (the very hard way). Again, you have to split the process into multiple steps, and all the timeout, memory and CPU consumption issues still hold true.
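
One restoration step, resuming from a saved byte offset in the dump file, could be sketched like this; the file names, the three-second budget and the naive end-of-statement detection are all assumptions:

    <?php
    // Execute statements from the dump until the time budget runs out, then save the offset.
    $db     = new mysqli('localhost', 'user', 'pass', 'example_db');
    $offset = (int) @file_get_contents('restore_offset.txt');
    $fp     = fopen('site.sql', 'rb');
    fseek($fp, $offset);

    $start = microtime(true);
    $query = '';

    while (!feof($fp) && ((microtime(true) - $start) < 3.0)) {
        $query .= fgets($fp);

        if (substr(rtrim($query), -1) === ';') {   // naive: assumes statements end at a line break
            $db->query($query);
            $query  = '';
            $offset = ftell($fp);   // only advance past completed statements
        }
    }

    fclose($fp);
    file_put_contents('restore_offset.txt', (string) $offset);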

Moreover, you have a few more serious issues. First and foremost, if the extended INSERTs' packet size was not configured properly at backup time, your restoration will fail because MySQL will reject an overly long query. In all honesty, it's possible to have a single Joomla! article which is simply too big for your target host's MySQL setup and you are, quite simply put, stuck. You won't be able to restore your database, it will be left in a half-broken state, and therefore the site you were restoring the database to is now FUBAR. You lost the site doing a database restoration, yay! I hope you have backups.

Did I mention that the process has to be split into multiple steps? What does that mean? It means that in order to restore the database, we have to make multiple calls to the same address, which go through Joomla!. But Joomla! requires its database to be in good shape in order to load a page, and our process is modifying the Joomla! database! This means it is very likely that at some point, between two successive steps, the Joomla! database will be in a half-broken state (the restoration is still in progress!) and Joomla! won't be able to load the next page which would continue the database restoration. Bummer! To work around that, you need to create an external script, i.e. something which runs outside of Joomla! and is able to overwrite Joomla!'s database. Does this sound like something a hacker would enjoy? Damn right! All a hacker would need is to upload an SQL file somewhere on your site and call our database restoration script to hack you. Therefore, implementing a database restoration solution would, by definition, render your site insecure. And that's why, even though JoomlaPack had this feature, Akeeba Backup removed it.

The bottom line

It was of course impossible to analyse all of the above over Twitter. That's why I kindly request that you all make your feature requests through the forum or e-mail.

The bottom line is that creating that feature would mean making a compromise that isn't viable. It would only run on very small sites or quality hosts; everyone else would not be supported. But how small is "very small"? Impossible to say. It depends on the content of your site, the speed of your server and your server configuration. Since it's impossible to create an accurate definition of "very small", people would use the feature, it would fail and I'd have to support them even though I'd know it couldn't work on their servers. It's easy to say "if your host has a query limit, I don't support you" and "if your host imposes maximum file size limits, I don't support you". And you go on and on until you are only left supporting unrestricted local and dedicated server setups, i.e. exactly the environments that don't need that feature. Stupid idea, isn't it?

No matter how you look at it, it makes no sense to create a feature which can't possibly work reliably on the majority of live hosts and then have to provide support for it. It would not only make me look like a retard (why publish software which knowingly doesn't run on most hosts if I am not a retard?), it would also cause support stress which is doubly bad because it is inevitable and doesn't even result in a solution for the user (which certainly makes me look like a retard). So, thanks but no thanks; it's not that I don't like you, Fotis, but implementing that in generally available software is a truly bad idea. And next time you have a feature request, please go through the forum so that I don't have to write a blog post the size of War & Peace to explain what can't be explained over Twitter. Thanks :)