New Upload Format, *_images.zip, for Scribe-style Uploads
I’d like to tell those of you who have been uploading already-digitized materials to the Archive about a new file format. (Please share this message with anyone I didn’t include and should have.)
You may be familiar with (and may be using) our existing _jp2.zip and _jp2.tar files. Making these from your own existing images is inconvenient and error-prone, due to the rigid expectations for individual image filenames and directory structure.
The new format is much more flexible. If you provide a file whose name ends in _images.zip, we’ll make a _jp2.zip from it: the _images.zip will be unpacked, its contents sorted alphabetically (and any subdirectories flattened), and the set of images found within converted into a standard _jp2.zip, which we’ll then process as usual.
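If your scans already sit in a folder tree, building an _images.zip can be as simple as zipping that tree as-is. The following is one possible sketch using Python’s standard zipfile module. The function name make_images_zip and its identifier parameter are hypothetical. Storing entries without compression (ZIP_STORED) is an assumption, on the grounds that already-compressed images gain little from deflate.

```python
import os
import zipfile


def make_images_zip(source_dir, identifier, out_dir="."):
    """Hypothetical helper: pack every file under source_dir into
    <identifier>_images.zip, keeping the folder layout unchanged.

    Per the post, names inside the zip only affect page order, so the tree
    is stored exactly as found (sorted, for a reproducible archive).
    """
    out_path = os.path.join(out_dir, f"{identifier}_images.zip")
    out_abs = os.path.abspath(out_path)
    # ZIP_STORED (no compression) is an assumption: images are usually
    # already compressed, so deflating them again saves little.
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for root, dirs, files in os.walk(source_dir):
            dirs.sort()  # sorting in place makes os.walk visit dirs in order
            for name in sorted(files):
                full = os.path.join(root, name)
                if os.path.abspath(full) == out_abs:
                    continue  # never add the archive being written to itself
                # Store paths relative to source_dir, with "/" separators.
                arcname = os.path.relpath(full, source_dir).replace(os.sep, "/")
                zf.write(full, arcname)
    return out_path
```

Called as make_images_zip("scans", "hr100106"), this would write hr100106_images.zip in the current directory. Non-image files in the tree are harmless, since they are ignored during conversion.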
In a bit more detail, the _images.zip will be scanned, at every directory level, for files whose names end in .jp2, .jpg, .tif, or .png, matched case-insensitively; any other files (.xml, .txt, etc.) will be ignored. You can mix and match image formats. All image files found will be sorted alphabetically by full path (directory names included, so that files originally in the same directory stay together in the new sequence), converted to jp2 if they aren’t already, renamed the way our code expects, and packed into a new _jp2.zip, leaving your _images.zip in place, unchanged.
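The selection and ordering rules above can be sketched in a few lines of Python. This illustrates the described behavior and is not the Archive’s code. It assumes a plain code-point sort on each member’s full path, and the names select_images and images_in_zip are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath

# Extensions the post lists as recognized, compared case-insensitively.
IMAGE_EXTENSIONS = {".jp2", ".jpg", ".tif", ".png"}


def select_images(names):
    """Return image member names in the order they would be sequenced.

    Assumption: a plain code-point sort on the full path, which keeps files
    from the same directory adjacent. The real sort key may differ.
    """
    images = [
        name
        for name in names
        if not name.endswith("/")  # skip directory entries
        and PurePosixPath(name).suffix.lower() in IMAGE_EXTENSIONS
    ]
    return sorted(images)


def images_in_zip(path):
    """List the images in an _images.zip, in sequence order."""
    with zipfile.ZipFile(path) as zf:
        return select_images(zf.namelist())
```

Because the sort key is the whole path, 767010/76701057/00000001.jpg comes before 767010/76701061/00000001.jpg even though both files have the same name.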
For an example of how messy an _images.zip we can deal with, here is part of the listing of hr100106_images.zip:
    767010/                       01-06-10 13:18       0
    767010/76701057/              01-06-10 06:59       0
    767010/76701057/00000001.jpg  01-06-10 06:59  268802
    767010/76701061/              01-06-10 07:00       0
    767010/76701061/00000001.jpg  01-06-10 07:00  292476
    767010/76701067/              01-06-10 07:01       0
    767010/76701067/00000001.jpg  01-06-10 07:01  230612
    767010/76701068/              01-06-10 07:02       0
    767010/76701068/00000001.jpg  01-06-10 07:02  235011
    767010/76701069/              01-06-10 07:05       0
    767010/76701069/00000001.jpg  01-06-10 07:05  281997
    ...
The 589 image files found there were converted into hr100106_jp2.zip, which lists as:
    hr100106_jp2/                         02-22-11 05:31       0
    hr100106_jp2/hr100106_0000.jp2 (JPG)  02-22-11 05:30  143845
    hr100106_jp2/hr100106_0001.jp2 (JPG)  02-22-11 05:30  191348
    hr100106_jp2/hr100106_0002.jp2 (JPG)  02-22-11 05:30   93923
    hr100106_jp2/hr100106_0003.jp2 (JPG)  02-22-11 05:30  100340
    hr100106_jp2/hr100106_0004.jp2 (JPG)  02-22-11 05:30  164196
    hr100106_jp2/hr100106_0005.jp2 (JPG)  02-22-11 05:30  169330
    ...
Note that the new _jp2.zip, and the files it contains, are named after the original _images.zip file (“hr100106”), regardless of how the directories and files inside the _images.zip are named. Those files and directories can be named any way you like; their names matter only in that they determine the sequence of the images in the new _jp2.zip.
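The naming in the hr100106 example follows a simple pattern, which the helper below reproduces under stated assumptions. The four-digit, zero-based counter is inferred from that listing alone (589 images would be numbered 0000 through 0588), so the real converter may pad differently, for instance for very large items. The name jp2_member_names is hypothetical.

```python
def jp2_member_names(images_zip_name, count):
    """Hypothetical helper: the _jp2.zip name and its member names.

    The <id>_jp2/<id>_NNNN.jp2 pattern with a four-digit, zero-based
    counter is inferred from the hr100106 example, not from the actual
    conversion code.
    """
    suffix = "_images.zip"
    if not images_zip_name.endswith(suffix):
        raise ValueError(f"expected a name ending in {suffix}")
    identifier = images_zip_name[: -len(suffix)]
    zip_name = f"{identifier}_jp2.zip"
    # Only the position in the sorted sequence survives; original
    # directory and file names are discarded.
    members = [f"{identifier}_jp2/{identifier}_{i:04d}.jp2" for i in range(count)]
    return zip_name, members
```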
Again, please share this info with anyone you think will be interested.