Tagged: docs

  • raj 11:40 pm on February 24, 2011
    Tags: docs, tts

    BookReader TTS 

    The Internet Archive BookReader now includes a real-time Text-To-Speech feature to assist our print-disabled users.

    I added some documentation on how the “Read-it-to-me” feature is implemented here:

  • raj 7:55 pm on February 24, 2011
    Tags: data, docs, formats

    New Upload Format, *_images.zip, for Scribe-style Uploads 

    Hank says:

    I’d like to provide some information about a new file format to some of you who have been involved with uploading already-digitized materials to the Archive. (Please share this message with anyone I didn’t include and should have.)

    You may be familiar with (and may be using) our existing _jp2.zip and _jp2.tar files. Making these from your own existing images is inconvenient and error-prone, due to the rigid expectations for individual image filenames and directory structure.

    The new format is much more flexible. If you provide a file whose name ends in _images.zip, we’ll make a _jp2.zip from it: the _images.zip will be unpacked, its contents sorted alphabetically (and any subdirectories flattened), and the set of images found within converted into a standard _jp2.zip, which we’ll then process as usual.

    In a bit more detail, the _images.zip will be scanned for files it contains, at any directory level, whose names end with .jp2, .jpg, .tif, or .png, matched case-insensitively; any other files (.xml, .txt, etc.) will be ignored. You can mix and match different image formats. All image files found will be sorted alphabetically (including any directory names, so that files originally in the same directory stay together in the new sequence), converted to jp2 if they’re not already, renamed the way our code expects, and packed into a new _jp2.zip, leaving your _images.zip in place as it was.
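
    The scan-and-sort step described above can be sketched in a few lines of Python. This is only an illustration, not the Archive's actual code: the function name ordered_image_members is made up for this example, and plain string sorting of the full member path is an assumption, since the exact collation used is not spelled out here.

```python
import zipfile
from pathlib import PurePosixPath

# Extensions named in the post; matched case-insensitively.
IMAGE_EXTENSIONS = {".jp2", ".jpg", ".tif", ".png"}


def ordered_image_members(zip_path):
    """Return the image member paths of a zip, sorted by full path.

    Hypothetical helper: non-image files (.xml, .txt, ...) and directory
    entries are skipped, and images at any directory depth are included.
    """
    with zipfile.ZipFile(zip_path) as archive:
        names = [
            info.filename
            for info in archive.infolist()
            if not info.is_dir()
            and PurePosixPath(info.filename).suffix.lower() in IMAGE_EXTENSIONS
        ]
    # Assumption: ordinary string ordering of the whole path, so the
    # directory names take part in the sort along with the file names.
    return sorted(names)
```

    Running something like this against your own _images.zip before uploading is a cheap way to preview the page order you are likely to get.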

    For an example of how messy an _images.zip we can deal with, see:


    listing from hr100106_images.zip
    	767010/	01-06-10 13:18	0
    	767010/76701057/	01-06-10 06:59	0
    	767010/76701057/00000001.jpg	01-06-10 06:59	268802
    	767010/76701061/	01-06-10 07:00	0
    	767010/76701061/00000001.jpg	01-06-10 07:00	292476
    	767010/76701067/	01-06-10 07:01	0
    	767010/76701067/00000001.jpg	01-06-10 07:01	230612
    	767010/76701068/	01-06-10 07:02	0
    	767010/76701068/00000001.jpg	01-06-10 07:02	235011
    	767010/76701069/	01-06-10 07:05	0
    	767010/76701069/00000001.jpg	01-06-10 07:05	281997

    The 589 image files found there were converted into:


    listing from hr100106_jp2.zip
    	hr100106_jp2/	02-22-11 05:31	0
    	hr100106_jp2/hr100106_0000.jp2	(JPG)	02-22-11 05:30	143845
    	hr100106_jp2/hr100106_0001.jp2	(JPG)	02-22-11 05:30	191348
    	hr100106_jp2/hr100106_0002.jp2	(JPG)	02-22-11 05:30	93923
    	hr100106_jp2/hr100106_0003.jp2	(JPG)	02-22-11 05:30	100340
    	hr100106_jp2/hr100106_0004.jp2	(JPG)	02-22-11 05:30	164196
    	hr100106_jp2/hr100106_0005.jp2	(JPG)	02-22-11 05:30	169330

    Note that the new _jp2.zip, and the files it contains, are named according to the name of the original _images.zip file (“hr100106”), regardless of how directories and files are named inside the _images.zip. Those files and directories can be named any way you like; the names matter only in that they determine the sequence of the images in the new _jp2.zip.
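
    To make the naming rule concrete, here is a rough Python sketch that maps an already-sorted list of source images to target names. The pattern (a zero-based, four-digit counter inside a NAME_jp2/ directory) is inferred from the hr100106 listing above rather than from any published spec, and plan_jp2_names is a made-up helper, so treat it as a guess at the convention.

```python
def plan_jp2_names(images_zip_name, ordered_members):
    """Pair each source image path with its presumed name in the _jp2.zip.

    Hypothetical helper; the output pattern is an assumption based on
    the example listing (hr100106_jp2/hr100106_0000.jp2, ...).
    """
    suffix = "_images.zip"
    if not images_zip_name.endswith(suffix):
        raise ValueError("expected a file name ending in _images.zip")
    # Everything before "_images.zip" names the new zip and its contents.
    identifier = images_zip_name[: -len(suffix)]
    return [
        (member, f"{identifier}_jp2/{identifier}_{index:04d}.jp2")
        for index, member in enumerate(ordered_members)
    ]


plan = plan_jp2_names(
    "hr100106_images.zip",
    ["767010/76701057/00000001.jpg", "767010/76701061/00000001.jpg"],
)
for source, target in plan:
    print(source, "->", target)
```

    The source names appear only on the left-hand side: they decide the order, and nothing else about them survives into the new zip.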

    Again, please share this info with anyone you think will be interested.

    Thanks, Hank!
