Updates from March, 2011

  • raj 4:46 am on March 9, 2011 Permalink | Reply
    Tags: eff, public domain   

    Supreme Court to Hear Challenge to Law That Removes Works from the Public Domain 

    From the EFF:

    Today the Supreme Court agreed to hear an important case about whether Congress has the power to “restore” copyright protection to works that already exist in the public domain. To be clear, for more than 200 years the law has been settled – once a work was in the public domain, there it remained, and downstream users could feel free to use, store, or share it any way they saw fit. Now Congress, in enacting Section 514 of the Uruguay Round Agreements Act, is changing the game by granting copyright protection to works by foreign authors that, for a variety of reasons, were no longer protected by copyright (for example, if an author had failed to renew her copyright). This means that many works already in the public domain – Peter and the Wolf, literature by Maxim Gorky, pieces by Picasso, and music by Stravinsky, for example – that have been used and performed countless times would now be subject to copyright protection. Those who have used the works could now be required to pay hefty license fees, and – even worse – if they can’t afford those fees, cease use of the works.

    Also, the EFF has filed an Amicus Brief on behalf of the Internet Archive (pdf).

  • raj 11:56 pm on March 4, 2011 Permalink | Reply
    Tags: wine

    Easy To Make Wine 

    Easy To Make Wine, with additional recipes for cocktails, cider, beer, fruit syrups, and herb teas, is a must-have for every kitchen:

    This book is intended for the ordinary housewife or
    perhaps her husband.

    I hope it will be helpful to those who wish to make a
    few bottles for home consumption or for giving to friends.

    Available to borrow from Open Library’s Lending Library program!

  • raj 12:56 am on March 3, 2011 Permalink | Reply
    Tags: opds

    Open Library OPDS 

    I’ve started adding OPDS/BookServer support to Open Library, starting with OL Edition records. OPDS is an Atom-based specification for distribution of ebook metadata. You can read more at http://opds-spec.org.

    For each edition record in OL, you can add “.opds” to the end of the edition key to retrieve the OPDS version. For example, the OPDS entry for this edition of Peter Rabbit can be retrieved from


    OPDS provides a good way of programmatically extracting and harvesting OL edition data. Currently, the OPDS record contains more metadata than the JSON version of an edition record, which makes it easy to grab author and subject data without multiple requests.
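    As a rough illustration of that kind of harvesting, here is a minimal Python sketch that builds an OPDS URL from an edition key and reads the title and author names out of an Atom entry. It assumes the suffix is “.opds” and that the entry uses the standard Atom title and author/name elements; the edition key “/books/OL1M” and the sample XML are made-up placeholders, not real OL output, and the real record may be shaped differently.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace, written in ElementTree's {uri}tag form.
ATOM = "{http://www.w3.org/2005/Atom}"


def opds_url(edition_key, base="http://openlibrary.org"):
    """Build the OPDS URL for an edition key such as '/books/OL1M'.

    Assumption: the OPDS version lives at <edition key> + '.opds'.
    """
    return base + edition_key + ".opds"


def parse_opds_entry(xml_text):
    """Pull the title and author names out of an Atom <entry> document."""
    entry = ET.fromstring(xml_text)
    title = entry.findtext(ATOM + "title")
    authors = [a.findtext(ATOM + "name") for a in entry.findall(ATOM + "author")]
    return {"title": title, "authors": authors}


# Hypothetical sample entry, hand-written for illustration only.
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>The Tale of Peter Rabbit</title>
  <author><name>Beatrix Potter</name></author>
</entry>"""

if __name__ == "__main__":
    print(opds_url("/books/OL1M"))
    print(parse_opds_entry(SAMPLE))
```

    To run it against live data, fetch the text at opds_url(key) with any HTTP client and pass it to parse_opds_entry.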

    For example, to get an author name for an OL edition using JSON, you would first have to get the edition’s JSON record, find the work key, then request the work’s JSON record, then find the author key, then request the author’s JSON record, and then you would be able to find the author name. The OPDS entry for an edition will contain Work-level data, so you can avoid jumping through hoops.
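    The chain of JSON requests described above could look roughly like the Python sketch below. The field names (“works”, “authors”, “author”, “key”, “name”) and the “.json” suffix are assumptions about the record layout, and fetch_json and author_names_via_json are hypothetical helper names chosen for this example.

```python
import json
from urllib.request import urlopen


def fetch_json(key, base="http://openlibrary.org"):
    """Fetch the JSON record for an OL key (assumed to live at key + '.json')."""
    with urlopen(base + key + ".json") as resp:
        return json.load(resp)


def author_names_via_json(edition_key, fetch=fetch_json):
    """Follow edition -> work -> author records to collect author names.

    Each hop is a separate request: one for the edition, one per work,
    and one per author. The field names are assumptions.
    """
    edition = fetch(edition_key)
    names = []
    for work_ref in edition.get("works", []):
        work = fetch(work_ref["key"])
        for role in work.get("authors", []):
            author = fetch(role["author"]["key"])
            names.append(author["name"])
    return names
```

    The fetch parameter is there so the lookup logic can be exercised with canned records instead of live requests. Even in the simplest case (one work, one author) this takes three round trips.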

    Also, for edition records added before Works were introduced in OL, the JSON edition record sometimes shows author and subject data which is old and sometimes incorrect!

    We are still discussing how to support external identifiers, indirect acquisition, and DAISY links with the OPDS community, so the format of the OPDS record might change slightly in the future.

    The template for the OPDS XML is at the end of this very long url:


    This is the first step in closing this bug that Matt filed two years ago. Poor Matt!

  • raj 12:43 am on February 26, 2011 Permalink | Reply  

    Daniel, showing off DIY bookscanning tronix 

  • raj 10:28 pm on February 25, 2011 Permalink | Reply  

    Daniel, designer of this DIY bookscanner, meets Tom, designer of the Scribe scanner #pda11 

  • raj 11:40 pm on February 24, 2011 Permalink | Reply
    Tags: tts

    BookReader TTS 

    The Internet Archive BookReader now contains a real-time Text-To-Speech feature to assist our print-disabled users.

    I added some documentation on how the “Read-it-to-me” feature is implemented here:

  • raj 7:55 pm on February 24, 2011 Permalink
    Tags: data, formats

    New Upload Format, *_images.zip, for Scribe-style Uploads 

    Hank says:

    I’d like to provide some information about a new file format to some of you who have been involved with uploading already-digitized materials to the Archive. (Please share this message with anyone I didn’t include and should have.)

    You may be familiar with (and may be using) our existing _jp2.zip and _jp2.tar files. Making these from your own existing images is inconvenient and error-prone, due to the rigid expectations for individual image filenames and directory structure.

    The new format is much more flexible. If you provide a file whose name ends in _images.zip, we’ll make a _jp2.zip from it: the _images.zip will be unpacked, its contents sorted alphabetically (and any subdirectories flattened), and the set of images found within converted into a standard _jp2.zip, which we’ll then process as usual.

    In a bit more detail, the _images.zip will be scanned for files it contains, at any directory level, whose names end with .jp2, .jpg, .tif, or .png, matched case-insensitively; any other files (.xml, .txt, etc.) will be ignored. You can mix and match different image formats. All image files found will be sorted alphabetically (including any directory names, so that files originally in the same directory stay together in the new sequence), converted to jp2 if they’re not already, renamed the way our code expects, and packed into a new _jp2.zip, leaving your _images.zip in place as it was.

    For an example of how messy an _images.zip we can deal with, see:


    listing from hr100106_images.zip
    	767010/	01-06-10 13:18	0
    	767010/76701057/	01-06-10 06:59	0
    	767010/76701057/00000001.jpg	01-06-10 06:59	268802
    	767010/76701061/	01-06-10 07:00	0
    	767010/76701061/00000001.jpg	01-06-10 07:00	292476
    	767010/76701067/	01-06-10 07:01	0
    	767010/76701067/00000001.jpg	01-06-10 07:01	230612
    	767010/76701068/	01-06-10 07:02	0
    	767010/76701068/00000001.jpg	01-06-10 07:02	235011
    	767010/76701069/	01-06-10 07:05	0
    	767010/76701069/00000001.jpg	01-06-10 07:05	281997

    The 589 image files found there were converted into:


    listing from hr100106_jp2.zip
    	hr100106_jp2/	02-22-11 05:31	0
    	hr100106_jp2/hr100106_0000.jp2	(JPG)	02-22-11 05:30	143845
    	hr100106_jp2/hr100106_0001.jp2	(JPG)	02-22-11 05:30	191348
    	hr100106_jp2/hr100106_0002.jp2	(JPG)	02-22-11 05:30	93923
    	hr100106_jp2/hr100106_0003.jp2	(JPG)	02-22-11 05:30	100340
    	hr100106_jp2/hr100106_0004.jp2	(JPG)	02-22-11 05:30	164196
    	hr100106_jp2/hr100106_0005.jp2	(JPG)	02-22-11 05:30	169330

    Note that the new _jp2.zip, and the files it contains, are named according to the name of the original _images.zip file (“hr100106”), regardless of how directories and files are named inside the _images.zip. Those files and directories can be named any way you like; the names matter only in that they determine the sequence of the images in the new _jp2.zip.

    Again, please share this info with anyone you think will be interested.

    Thanks, Hank!
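    The selection, ordering, and renaming rules Hank describes can be sketched in Python as below. This is only an illustration, not the Archive's actual code: identifier_from and plan_jp2_names are hypothetical names, the sort is a plain string sort on the full path (the real collation may differ), and the zero-based four-digit numbering is inferred from the hr100106 listing above. The conversion to jp2 itself is left out.

```python
IMAGE_EXTENSIONS = (".jp2", ".jpg", ".tif", ".png")


def identifier_from(zip_name):
    """Return the part of the upload's file name before '_images.zip'."""
    suffix = "_images.zip"
    if not zip_name.endswith(suffix):
        raise ValueError("expected a name ending in " + suffix)
    return zip_name[: -len(suffix)]


def plan_jp2_names(member_names, identifier):
    """Map each image inside the zip to its name in the new _jp2.zip.

    Keeps files at any directory level whose extension matches
    case-insensitively, ignores everything else, and sorts by full
    path so files from the same directory stay together.
    """
    images = [
        name for name in member_names
        if not name.endswith("/") and name.lower().endswith(IMAGE_EXTENSIONS)
    ]
    images.sort()  # assumption: plain string order on the full path
    return [
        (src, "%s_jp2/%s_%04d.jp2" % (identifier, identifier, index))
        for index, src in enumerate(images)
    ]
```

    For a real upload, member_names could come from zipfile.ZipFile(path).namelist() in the standard library.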

  • raj 7:26 pm on February 24, 2011 Permalink | Reply  

    Very little making out going on at #PDA11 

  • raj 7:14 pm on February 23, 2011 Permalink | Reply  

    Personal Digital Archiving, tomorrow at archive.org HQ 

    See http://www.personalarchiving.com/ for more info.

  • raj 9:16 pm on February 22, 2011 Permalink | Reply
    Tags: Archive Team, yahoo

    Archive Team is rescuing Yahoo Videos, and needs help! 

    From Archive Team (no relation):

    So as usual Yahoo! is deleting terabytes of user-generated content and as usual they are doing it in a clunky, fucked-up manner and as usual the timeframe is arbitrary and out of nowhere and as usual Archive Team is here to clean up the fucking mess.

    So we’ve been downloading it. We’ve been downloading it for a month. Seriously.

    It’s been a crack team of people, all donating time, bandwidth and disk space to download every single video out of Yahoo! Video. We’re at full clip, but we need more volunteers, or we’re not going to make it.
