#31530 – Extract only part of archive

Posted in ‘Kickstart / Backup Restoration’
Thursday, 11 July 2019 20:44 CDT
Hi,

We have a large site of 6GB and the backup was split into three files.

One of those files has become corrupted, but I'd like to extract the data in the other 2 files to see if the one small file I need is in one of those parts.

It's in JPA format and it won't extract as there's a part missing.


Q: Is there a way to extract the two parts that aren't corrupted?


If not, would using .zip have allowed us to extract the files?

If not, then this is a flaw in the process: each part file needs to be extractable on its own, and JPA is our preferred format.

Could you please look into a system that allows any part to be extracted without the other parts being present?


Thanks,

Mark
Custom Fields
Joomla! version (in x.y.z format) 5.2.2
PHP version (in x.y.z format) 7.2.0
Akeeba Backup version (x.y.z format) 3.5.1
Kickstart version (x.y.z format) 5.3
its@hcwd.com.au
Friday, 12 July 2019 05:18 CDT
It depends. I would like to note that no, it is not a flaw in the process, and no, ZIP files would not be better; they would be worse. In fact, the process for splitting archives is not even mine. It was devised by Phil Katz for PKZIP several decades ago and it's been copied by literally every archive format supporting split archives ever since. In short, it's the standard (and only) way to do it.

Let's start with ZIP files, which have a normative format specification. A split archive with three parts is not three separate ZIP files. It's a single big file which was chopped into pieces of predetermined length. The first part is the one with the .z01 extension and holds the file's header, telling the extraction engine that this is a split archive. The last part is the one with the .zip extension and ends with the Central Directory and the End of Central Directory record. If the Central Directory is busted or missing, you cannot extract any file from any part of the ZIP archive. In short: if either the first or the last part of the archive is busted, you cannot extract a split ZIP file at all. If you have these two parts and there's corruption in the file data, you may be able to extract some files, as long as the corruption did not change the length of the data by even a single byte.
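To make the "one big file chopped into pieces" point concrete, here is a minimal sketch using Python's standard `zipfile` module. It builds a small ZIP in memory, slices it into three equal parts, and shows that losing only the last part (the one holding the End of Central Directory record) makes the whole archive unreadable. This is a simulation under stated assumptions: a real .z01/.zip set also carries a split signature and disk numbers, which this sketch does not reproduce, and `zipfile` itself does not handle multi-disk ZIP files. The helper names (`build_zip`, `split_bytes`, `can_open`) are hypothetical.

```python
import io
import zipfile


def build_zip() -> bytes:
    """Create a small in-memory ZIP archive with two stored (uncompressed) files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("images/logo.png", b"\x89PNG fake image data")
        zf.writestr("index.php", b"<?php echo 'hello';")
    return buf.getvalue()


def split_bytes(data: bytes, part_count: int) -> list[bytes]:
    """Chop one byte stream into consecutive parts of (nearly) equal size."""
    part_size = -(-len(data) // part_count)  # ceiling division
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def can_open(data: bytes) -> bool:
    """Return True if zipfile can locate and read the Central Directory."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            zf.namelist()
        return True
    except zipfile.BadZipFile:
        return False


archive = build_zip()
parts = split_bytes(archive, part_count=3)

# Rejoining every part in order gives back the original archive byte for byte.
rejoined = b"".join(parts)
print(can_open(rejoined))  # True

# Dropping the last part removes the End of Central Directory record, so no
# file can be listed or extracted even though the earlier bytes are intact.
without_last = b"".join(parts[:-1])
print(can_open(without_last))  # False
```

The local file headers for both files are still sitting in the first two parts; it is only the missing directory at the end that stops a standard ZIP reader.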

That necessitated coming up with an archive format that's slightly less finicky than ZIP and better suited for site backups. The JPA format is basically "ZIP lite". It does away with all the ZIP features we can't use or don't need, including the Central Directory. It still follows the same archive splitting logic as all other archive formats: you don't have three separate archives, you have a single big file that was chopped into pieces of predetermined length. The first part is the one with the .j01 extension and has a header that tells us it's a split JPA file. The other part files have nothing special about them. Unlike with ZIP files, we don't need the last part to extract a multipart JPA file. If you have at least the first part (.j01), you can extract the archive with Kickstart up to the point where a part file is missing or corrupt.
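The reason a format without a Central Directory can be read without its last part is that each file carries its own header, so a reader can walk the byte stream from the start. The sketch below illustrates that idea with a deliberately simplified, hypothetical entry layout (`ENT1` magic, name length, data length, name, data). It is NOT the real JPA layout and not Kickstart's code; it only shows the sequential-reading principle. It also shows why the first part matters: a later part on its own starts in the middle of some file.

```python
import struct

# Hypothetical entry layout, for illustration only (NOT the real JPA layout):
#   4-byte magic b"ENT1" | 2-byte name length | 4-byte data length | name | data
MAGIC = b"ENT1"
HEADER = struct.Struct("<4sHI")  # little-endian: magic, name length, data length


def pack_entry(name: str, data: bytes) -> bytes:
    raw_name = name.encode("utf-8")
    return HEADER.pack(MAGIC, len(raw_name), len(data)) + raw_name + data


def read_entries(stream: bytes):
    """Yield (name, data) for each complete entry, stopping at the first gap."""
    pos = 0
    while pos + HEADER.size <= len(stream):
        magic, name_len, data_len = HEADER.unpack_from(stream, pos)
        if magic != MAGIC:
            break  # not at an entry boundary, so stop reading
        end = pos + HEADER.size + name_len + data_len
        if end > len(stream):
            break  # this entry continues into a part we don't have
        name_start = pos + HEADER.size
        name = stream[name_start:name_start + name_len].decode("utf-8")
        yield name, stream[name_start + name_len:end]
        pos = end


# Seven 59-byte entries form one 413-byte stream, chopped into three parts.
archive = b"".join(pack_entry(f"file{i}.txt", bytes([65 + i]) * 40) for i in range(7))
part_size = -(-len(archive) // 3)  # ceiling division
parts = [archive[i:i + part_size] for i in range(0, len(archive), part_size)]

# Only the first two parts survived; the third is missing.
recovered = [name for name, _ in read_entries(parts[0] + parts[1])]
print(recovered)  # ['file0.txt', 'file1.txt', 'file2.txt', 'file3.txt']

# The second part alone starts mid-entry, so nothing can be read from it.
print(list(read_entries(parts[1])))  # []
```

The fifth entry straddles the boundary between the second and third parts, so it is dropped; everything before it is recovered intact.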

With JPA files you have the option to tell the archive extraction script, Kickstart, to ignore most errors. It will happily skip over corrupt files as long as the header data preceding each file is intact. If that's corrupt, ugh, we have very limited options, which boil down to using a custom script that tries to "guesstimate" where the next file begins and hoping for the best. If you're missing entire parts it's even worse, because there is now a big "hole" in the archive, making heuristics even harder.
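The difference between "corrupt file data" and "corrupt header" can be sketched with the same kind of hypothetical entry layout (again, NOT the real JPA layout, and not how Kickstart or any actual recovery script is implemented). When a header is intact, its length fields let the reader jump over damaged data. When a header is destroyed, the only option is the "guesstimate" approach: scan forward for the next thing that looks like an entry marker and hope it is real. The function name `scan_entries` is hypothetical.

```python
import struct

# Hypothetical entry layout, for illustration only (NOT the real JPA layout):
#   4-byte magic b"ENT1" | 2-byte name length | 4-byte data length | name | data
MAGIC = b"ENT1"
HEADER = struct.Struct("<4sHI")


def pack_entry(name: str, data: bytes) -> bytes:
    raw_name = name.encode("utf-8")
    return HEADER.pack(MAGIC, len(raw_name), len(data)) + raw_name + data


def scan_entries(stream: bytes) -> list[str]:
    """List entry names, resyncing on the next magic marker after damage."""
    names = []
    pos = 0
    while True:
        pos = stream.find(MAGIC, pos)  # heuristic: guess where the next entry begins
        if pos < 0 or pos + HEADER.size > len(stream):
            return names
        _, name_len, data_len = HEADER.unpack_from(stream, pos)
        end = pos + HEADER.size + name_len + data_len
        if end > len(stream):
            pos += 1  # lengths run past the end: likely a false match, keep scanning
            continue
        name_start = pos + HEADER.size
        names.append(stream[name_start:name_start + name_len].decode("utf-8", errors="replace"))
        pos = end  # trust the header's lengths to jump over the file data


entries = [pack_entry(f"file{i}.txt", bytes([65 + i]) * 40) for i in range(4)]

# Case 1: file1's data is damaged but its header is intact, so every entry
# is still found (file1's contents are wrong, but the reader skips past them).
damaged = bytearray(b"".join(entries))
data_start = len(entries[0]) + HEADER.size + len("file1.txt")
damaged[data_start:data_start + 10] = b"\x00" * 10
found_after_data_damage = scan_entries(bytes(damaged))
print(found_after_data_damage)  # ['file0.txt', 'file1.txt', 'file2.txt', 'file3.txt']

# Case 2: file2's header marker is overwritten, so file2 is lost and the
# scan resyncs on file3's marker.
damaged = bytearray(b"".join(entries))
header_start = len(entries[0]) + len(entries[1])
damaged[header_start:header_start + 4] = b"XXXX"
found_after_header_damage = scan_entries(bytes(damaged))
print(found_after_header_damage)  # ['file0.txt', 'file1.txt', 'file3.txt']
```

In real backups the marker bytes can also occur by chance inside file data, which is why this kind of resync is only a heuristic, and a whole missing part removes far more than a single header.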

Before I can tell you whether there's anything we can do, I need to know whether you have all the archive parts and the nature of the corruption (are some bytes screwed up, or are the parts truncated?).


Nicholas K. Dionysopoulos

Lead Developer and Director



Greek: native

English: excellent

French: basic



Please keep in mind my timezone and cultural differences when reading my replies. Thank you!



nicholas
Tuesday, 23 July 2019 02:00 CDT
Thanks, Nicholas. At the moment the client hasn't complained, so I'm inclined not to waste your time on something they may never notice. We'll leave it for now, and if it crops up, I'll be in touch.
its@hcwd.com.au
This ticket is closed.