How to serve book images using IA-style /download/ URLs
archive.org uses stable URLs of the form /download/{identifier}/page/{filename} for downloading book images. Here is what I had to do to add similar functionality to cluster.biodiversitylibrary.org. These instructions assume that you store book images in the IA style but do not use the IA petabox code, and that you have already followed the separate instructions for getting the BookReader to work.
First, I created a small script called download.php:
<?php
require_once('BookReader.inc');

// This script is called to handle URLs such as:
//   http://cluster.biodiversitylibrary.org/download/journalofnatural11lond/page/n10_w1150.jpg
// BookReader::getURLbhl() returns the image URL to redirect to (or null), such as:
//   http://cluster.biodiversitylibrary.org/BookReader/BookReaderImages.php?id=journalofnatural11lond&itemPath=%2Fmnt%2Fglusterfs%2Fwww%2Fj%2Fjournalofnatural11lond&server=cluster.biodiversitylibrary.org&page=n10_w1150.jpg

// Strip off the leading /download/
$path = preg_replace('#^/download/#', '', $_SERVER['REQUEST_URI']);

// The leading part of the path is the item id
$id = strtok($path, '/');
if ($id === false) {
    // No item id in the URL
    http_response_code(404);
    exit;
}

// Items live in a directory named after the first character of the id
$first = $id[0];
$mainDir = "/mnt/glusterfs/www/$first/$id";

$redirUrl = BookReader::getURLbhl($path, 'cluster.biodiversitylibrary.org', $mainDir);
if ($redirUrl) {
    header("Location: $redirUrl");
} else {
    // Request was not handled: report 404 rather than an empty 200 response
    http_response_code(404);
}
exit;
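The path handling in download.php can be pulled out into a pure function and checked without a web server. This is only an illustrative sketch: bhlItemLocation() is a hypothetical name, the /mnt/glusterfs/www/{first character}/{id} layout is the one hard-coded above, and unlike download.php it also drops any query string (via parse_url) before splitting the path.

```php
<?php
// Hypothetical helper (not part of BookReader.inc): reproduces the path
// handling in download.php so it can be tested without a web server.
// Assumes the /mnt/glusterfs/www/{first character}/{id} layout used above.
function bhlItemLocation(string $requestUri, string $root = '/mnt/glusterfs/www'): ?array
{
    // Drop any query string, then strip the leading /download/ prefix
    $uriPath = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($uriPath)) {
        return null;
    }
    $path = preg_replace('#^/download/#', '', $uriPath);

    // The first path segment is the item identifier
    $id = strtok($path, '/');
    if ($id === false) {
        return null;
    }

    return [
        'path'    => $path,
        'id'      => $id,
        'mainDir' => "$root/{$id[0]}/$id",
    ];
}

print_r(bhlItemLocation('/download/journalofnatural11lond/page/n10_w1150.jpg'));
```

For the sample URL from the comments, this yields the id journalofnatural11lond and the item directory /mnt/glusterfs/www/j/journalofnatural11lond, which matches the itemPath in the example redirect.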
Then, I modified BookReaderIA/BookReader.inc to add a BHL-specific version of getURL(), called getURLbhl(). The only change is that $itemServer and $mainDir are passed in as strings; the IA version instead takes a petabox Item object, which contains these two strings.
public static function getURLbhl($path, $itemServer, $mainDir) {
    // $path should look like {itemId}/{operator}/{filename}
    // Other operators may be added
    $urlParts = BookReader::parsePath($path);

    // Check for non-handled cases
    $required = array('identifier', 'operator', 'operand');
    foreach ($required as $key) {
        if (!array_key_exists($key, $urlParts)) {
            return null;
        }
    }

    $identifier = $urlParts['identifier'];
    $operator = $urlParts['operator'];
    $filename = $urlParts['operand'];
    // subPrefix is optional; avoid an undefined-index notice if it is absent
    $subPrefix = isset($urlParts['subPrefix']) ? $urlParts['subPrefix'] : null;

    $serverBaseURL = BookReader::serverBaseURL($itemServer);

    // Baseline query params
    $query = array(
        'id' => $identifier,
        'itemPath' => $mainDir,
        'server' => $serverBaseURL
    );
    if ($subPrefix) {
        $query['subPrefix'] = $subPrefix;
    }

    switch ($operator) {
        case 'page':
            // Look for old-style preview request - e.g. {identifier}_cover.jpg
            if (preg_match('/^(.*)_((cover|title|preview).*)/', $filename, $matches) === 1) {
                // Serve preview image
                $query['page'] = $matches[2];
                return 'http://' . $serverBaseURL . '/BookReader/BookReaderPreview.php?' . http_build_query($query, '', '&');
            }

            // New-style preview request - e.g. cover_thumb.jpg
            if (preg_match('/^(cover|title|preview)/', $filename) === 1) {
                $query['page'] = $filename;
                return 'http://' . $serverBaseURL . '/BookReader/BookReaderPreview.php?' . http_build_query($query, '', '&');
            }

            // Asking for a non-preview page
            $query['page'] = $filename;
            return 'http://' . $serverBaseURL . '/BookReader/BookReaderImages.php?' . http_build_query($query, '', '&');

        default:
            // Unknown operator
            return null;
    }

    return null; // was not handled
}
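For the page operator, getURLbhl() chooses between two scripts based only on the filename. A minimal sketch of just that decision, using a hypothetical classifyPageRequest() that returns the script name and the value of the page query parameter instead of a full URL:

```php
<?php
// Hypothetical helper (not part of BookReader.inc): mirrors the 'page'
// branch of getURLbhl() and reports which script would serve a filename,
// together with the value that would be sent as the 'page' parameter.
function classifyPageRequest(string $filename): array
{
    // Old-style preview request, e.g. {identifier}_cover.jpg -> page=cover.jpg
    if (preg_match('/^(.*)_((cover|title|preview).*)/', $filename, $matches) === 1) {
        return ['BookReaderPreview.php', $matches[2]];
    }
    // New-style preview request, e.g. cover_thumb.jpg (passed through unchanged)
    if (preg_match('/^(cover|title|preview)/', $filename) === 1) {
        return ['BookReaderPreview.php', $filename];
    }
    // Anything else is an ordinary page image
    return ['BookReaderImages.php', $filename];
}

foreach (['journalofnatural11lond_cover.jpg', 'cover_thumb.jpg', 'n10_w1150.jpg'] as $f) {
    [$script, $page] = classifyPageRequest($f);
    echo "$f -> $script (page=$page)\n";
}
```

With these sample names, journalofnatural11lond_cover.jpg is served by BookReaderPreview.php as page=cover.jpg, cover_thumb.jpg goes to BookReaderPreview.php unchanged, and n10_w1150.jpg goes to BookReaderImages.php.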
Finally, Phil modified /etc/nginx/sites-enabled/default to contain this rewrite rule:
rewrite ^/download/(.*) /download.php?$1;
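When the regex matches, nginx's rewrite replaces the whole URI with the replacement string, so the capture group decides what ends up after download.php?. The sketch below mimics that step in PHP; emulateDownloadRewrite() is hypothetical and ignores other nginx behavior, such as appending the original query arguments. It shows why (.*) is wanted: a single-character (.) capture would turn the sample URL into /download.php?j. Assuming the stock fastcgi_params, download.php reads REQUEST_URI from $request_uri (the original request URI), so the script itself should still see the full /download/ path either way.

```php
<?php
// Hypothetical helper: applies the regex from the nginx rule
// "rewrite ^/download/(.*) /download.php?$1;" to a URI path.
// Like nginx, a match replaces the whole URI, not just the matched part.
// Query-string handling and other nginx details are deliberately ignored.
function emulateDownloadRewrite(string $uriPath, string $pattern = '#^/download/(.*)#'): string
{
    if (preg_match($pattern, $uriPath, $m) !== 1) {
        // Non-matching URIs pass through untouched
        return $uriPath;
    }
    return '/download.php?' . $m[1];
}

echo emulateDownloadRewrite('/download/journalofnatural11lond/page/n10_w1150.jpg'), "\n";
echo emulateDownloadRewrite('/download/journalofnatural11lond/page/n10_w1150.jpg', '#^/download/(.)#'), "\n";
echo emulateDownloadRewrite('/BookReader/BookReaderImages.php'), "\n";
```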