How to serve book images using IA-style /download/ URLs

archive.org serves book images from stable /download/ URLs. Here is what I had to do to add similar functionality to cluster.biodiversitylibrary.org. These instructions assume that you use IA-style storage for book images but not the IA petabox code, and that you have already followed the instructions for getting the BookReader to work.

First, I created a small script called download.php:

<?php
 
require_once('BookReader.inc');
 
// BookReader::getURLbhl will return URL to redir to or null
 
// This script is called to handle URLs such as:
//    http://cluster.biodiversitylibrary.org/download/journalofnatural11lond/page/n10_w1150.jpg
 
// The bookreader will return an image URL to redirect to, such as:
//    http://cluster.biodiversitylibrary.org/BookReader/BookReaderImages.php?id=journalofnatural11lond&itemPath=%2Fmnt%2Fglusterfs%2Fwww%2Fj%2Fjournalofnatural11lond&server=cluster.biodiversitylibrary.org&page=n10_w1150.jpg
 
//strip off leading /download/
$path = preg_replace('#^/download/#', '', $_SERVER['REQUEST_URI']);
 
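$id = strtok($path, '/'); //the leading part of the path is the item id
 
$redirUrl = null;
if ($id !== false) { //strtok returns false when the path is empty
    $first = $id[0]; //items are stored under the first character of their id
    $mainDir = "/mnt/glusterfs/www/$first/$id";
    $redirUrl = BookReader::getURLbhl($path, 'cluster.biodiversitylibrary.org', $mainDir);
}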
 
if ($redirUrl) {
    header("Location: $redirUrl");
} else {
    header('HTTP/1.1 404 Not Found'); //nothing to redirect to
}
exit;
 
?>
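
The path handling in download.php can be pulled out into a small function, which makes it easier to check by hand. The following is a minimal sketch and not part of the BookReader code: the function name itemDirFromRequest is hypothetical, and the character check on the identifier is an assumption about what IA-style identifiers look like. Unlike the script above, it also drops any query string before splitting the path.

```php
<?php
// Hypothetical helper (not part of BookReader): maps a /download/ request URI
// to array(item id, item directory), or returns null if the URI is unusable.
function itemDirFromRequest($requestUri, $root = '/mnt/glusterfs/www') {
    // Keep only the path component, then strip the leading /download/
    $uriPath = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($uriPath) || strpos($uriPath, '/download/') !== 0) {
        return null;
    }
    $path = substr($uriPath, strlen('/download/'));

    // The leading part of the path is the item id
    $id = strtok($path, '/');

    // Assumption: ids start with a letter or digit and otherwise contain
    // only letters, digits, '.', '_' and '-'. This also rejects '..'.
    if (!is_string($id) || preg_match('/^[A-Za-z0-9][A-Za-z0-9._-]*$/', $id) !== 1) {
        return null;
    }

    // Items are stored under a directory named after the first character of the id
    return array($id, $root . '/' . $id[0] . '/' . $id);
}

list($id, $mainDir) = itemDirFromRequest('/download/journalofnatural11lond/page/n10_w1150.jpg');
echo $mainDir, "\n"; // /mnt/glusterfs/www/j/journalofnatural11lond
```

The resulting directory is the same value that appears, URL-encoded, as itemPath in the example redirect URL above.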

Then, I modified BookReaderIA/BookReader.inc to add a BHL-specific version of getURL(), called getURLbhl(). The only change was to pass in $itemServer and $mainDir as strings. In the IA version, we pass in a petabox Item object, which contains these two strings.

  public static function getURLbhl($path, $itemServer, $mainDir) {
    // $path should look like {itemId}/{operator}/{filename}
    // Other operators may be added
 
    $urlParts = BookReader::parsePath($path);
 
    // Check for non-handled cases
    $required = array('identifier', 'operator', 'operand');
    foreach ($required as $key) {
        if (!array_key_exists($key, $urlParts)) {
            return null;
        }
    }
 
    $identifier = $urlParts['identifier'];
    $operator = $urlParts['operator'];
    $filename = $urlParts['operand'];
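    // subPrefix is not in the required list above, so it may be absent
    $subPrefix = isset($urlParts['subPrefix']) ? $urlParts['subPrefix'] : null;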
 
    $serverBaseURL = BookReader::serverBaseURL($itemServer);
 
    // Baseline query params
    $query = array(
        'id' => $identifier,
        'itemPath' => $mainDir,
        'server' => $serverBaseURL
    );
    if ($subPrefix) {
        $query['subPrefix'] = $subPrefix;
    }
 
    switch ($operator) {
        case 'page':
 
            // Look for old-style preview request - e.g. {identifier}_cover.jpg
            if (preg_match('/^(.*)_((cover|title|preview).*)/', $filename, $matches) === 1) {
                // Serve preview image
                $page = $matches[2];
                $query['page'] = $page;
                return 'http://' . $serverBaseURL . '/BookReader/BookReaderPreview.php?' . http_build_query($query, '', '&');
            }
 
            // New-style preview request - e.g. cover_thumb.jpg
            if (preg_match('/^(cover|title|preview)/', $filename, $matches) === 1) {
                $query['page'] = $filename;
                return 'http://' . $serverBaseURL . '/BookReader/BookReaderPreview.php?' . http_build_query($query, '', '&');
            }
 
            // Asking for a non-preview page
            $query['page'] = $filename;
            return 'http://' . $serverBaseURL . '/BookReader/BookReaderImages.php?' . http_build_query($query, '', '&');
 
        default:
            // Unknown operator
            return null;
    }
 
    return null; // was not handled
  }
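
To see which branch of the 'page' case a given filename takes, and what query string results, the two patterns and the http_build_query() call can be exercised on their own. This sketch reuses the regular expressions from getURLbhl() above. The function name classifyPageFilename is hypothetical, and the two preview filenames are made up for illustration.

```php
<?php
// Hypothetical helper: reports which script the 'page' case would choose and
// the 'page' parameter it would send, using the same patterns as getURLbhl().
function classifyPageFilename($filename) {
    // Old-style preview request, e.g. {identifier}_cover.jpg
    if (preg_match('/^(.*)_((cover|title|preview).*)/', $filename, $matches) === 1) {
        return array('BookReaderPreview.php', $matches[2]);
    }
    // New-style preview request, e.g. cover_thumb.jpg
    if (preg_match('/^(cover|title|preview)/', $filename) === 1) {
        return array('BookReaderPreview.php', $filename);
    }
    // Anything else is an ordinary page image
    return array('BookReaderImages.php', $filename);
}

foreach (array('journalofnatural11lond_cover.jpg', 'cover_thumb.jpg', 'n10_w1150.jpg') as $f) {
    list($script, $page) = classifyPageFilename($f);
    echo "$f -> $script (page=$page)\n";
}

// The query string for the example request from download.php
$query = array(
    'id' => 'journalofnatural11lond',
    'itemPath' => '/mnt/glusterfs/www/j/journalofnatural11lond',
    'server' => 'cluster.biodiversitylibrary.org',
    'page' => 'n10_w1150.jpg',
);
echo http_build_query($query, '', '&'), "\n";
```

The last line prints the same query string as the example BookReaderImages.php URL quoted in download.php, with the slashes in itemPath encoded as %2F.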

Finally, Phil modified /etc/nginx/sites-enabled/default to contain this rewrite rule:

rewrite ^/download/(.*)             /download.php?$1;