Tagged: bookreader

  • raj 11:58 pm on February 7, 2014 Permalink | Reply
    Tags: bookreader   

    BookReader usage in 2013 

    We measured whenever someone opened the IA BookReader. Usage more than doubled in 2013, to more than 5 people opening up the BookReader every second!

    bookreader_graphite.us.archive.org

    For 2014, we won’t be able to produce the same data, since we now embed the BookReader on archive.org details pages, and a pageview now registers as a “bookreader open” event, even if the user doesn’t actually read the book.

     
  • raj 7:16 pm on October 31, 2013 Permalink | Reply
    Tags: bookreader, wordpress   

    BookReader embeds on wordpress.com 

    Blogs hosted at wordpress.com can now use BookReader embeds!

    http://en.support.wordpress.com/embedding-ebooks/

     
  • raj 9:38 pm on May 2, 2011 Permalink | Reply
    Tags: bookreader

    How to serve book images using IA-style /download/ URLs 

    archive.org uses stable URLs for downloading book images. Here is what I had to do to add similar functionality to cluster.biodiversitylibrary.org. These instructions assume you use IA-style storage for book images, but don’t use the IA petabox code, and you have already followed these instructions to get the BookReader to work.

    First, I created a small script called download.php:

    <?php
     
    require_once('BookReader.inc');
     
    // BookReader::getURLbhl will return URL to redir to or null
     
    // This script is called to handle URLs such as:
    //    http://cluster.biodiversitylibrary.org/download/journalofnatural11lond/page/n10_w1150.jpg
     
    // The bookreader will return an image URL to redirect to, such as:
    //    http://cluster.biodiversitylibrary.org/BookReader/BookReaderImages.php?id=journalofnatural11lond&itemPath=%2Fmnt%2Fglusterfs%2Fwww%2Fj%2Fjournalofnatural11lond&server=cluster.biodiversitylibrary.org&page=n10_w1150.jpg
     
    //strip off leading /download/
    $path = preg_replace('#^/download/#', '', $_SERVER['REQUEST_URI']);
     
    $id = strtok($path, '/'); //the leading part of the path is the item id
    $first = $id[0];
     
    $mainDir = "/mnt/glusterfs/www/$first/$id";
     
    $redirUrl = BookReader::getURLbhl($path, 'cluster.biodiversitylibrary.org', $mainDir);
     
    if ($redirUrl) {
        header("Location: $redirUrl");
    } else {
        // Nothing to redirect to: report the path as not found
        header('HTTP/1.0 404 Not Found');
    }
    exit;
     
    ?>

    Then, I modified BookReaderIA/BookReader.inc to have a BHL-specific getURL() function. The only change here was to pass in $itemServer and $mainDir as strings. In the IA version, we pass in a petabox Item object, which contains these two strings.

      public static function getURLbhl($path, $itemServer, $mainDir) {
        // $path should look like {itemId}/{operator}/{filename}
        // Other operators may be added
     
        $urlParts = BookReader::parsePath($path);
     
        // Check for non-handled cases
        $required = array('identifier', 'operator', 'operand');
        foreach ($required as $key) {
            if (!array_key_exists($key, $urlParts)) {
                return null;
            }
        }
     
        $identifier = $urlParts['identifier'];
        $operator = $urlParts['operator'];
        $filename = $urlParts['operand'];
        $subPrefix = $urlParts['subPrefix'];
     
        $serverBaseURL = BookReader::serverBaseURL($itemServer);
     
        // Baseline query params
        $query = array(
            'id' => $identifier,
            'itemPath' => $mainDir,
            'server' => $serverBaseURL
        );
        if ($subPrefix) {
            $query['subPrefix'] = $subPrefix;
        }
     
        switch ($operator) {
            case 'page':
     
                // Look for old-style preview request - e.g. {identifier}_cover.jpg
                if (preg_match('/^(.*)_((cover|title|preview).*)/', $filename, $matches) === 1) {
                    // Serve preview image
                    $page = $matches[2];
                    $query['page'] = $page;
                    return 'http://' . $serverBaseURL . '/BookReader/BookReaderPreview.php?' . http_build_query($query, '', '&');
                }
     
                // New-style preview request - e.g. cover_thumb.jpg
                if (preg_match('/^(cover|title|preview)/', $filename, $matches) === 1) {
                    $query['page'] = $filename;
                    return 'http://' . $serverBaseURL . '/BookReader/BookReaderPreview.php?' . http_build_query($query, '', '&');
                }
     
                // Asking for a non-preview page
                $query['page'] = $filename;
                return 'http://' . $serverBaseURL . '/BookReader/BookReaderImages.php?' . http_build_query($query, '', '&');
     
            default:
                // Unknown operator
                return null;            
        }
     
        return null; // was not handled
      }
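
    To make the mapping concrete, here is a minimal sketch of the same routing logic in Python (not the PHP used on the cluster). It assumes that BookReader::parsePath() simply splits {itemId}/{operator}/{filename} with no subPrefix, and that BookReader::serverBaseURL() returns the host name unchanged; neither function is shown in this post, so treat both as assumptions. The name download_redirect is made up for illustration.

```python
import re
from urllib.parse import urlencode

# Same two patterns as the PHP switch statement above.
PREVIEW_OLD = re.compile(r'^(.*)_((cover|title|preview).*)')
PREVIEW_NEW = re.compile(r'^(cover|title|preview)')


def download_redirect(path, item_server, main_dir):
    """Map '{itemId}/{operator}/{filename}' to a redirect URL, or None."""
    parts = path.split('/')
    if len(parts) != 3 or not all(parts):
        return None  # assumption: this sketch ignores subPrefix paths
    identifier, operator, filename = parts
    if operator != 'page':
        return None  # unknown operator

    # Assumption: serverBaseURL() returns the host name unchanged.
    query = {'id': identifier, 'itemPath': main_dir, 'server': item_server}

    script = 'BookReaderImages.php'
    old_style = PREVIEW_OLD.match(filename)
    if old_style:
        script = 'BookReaderPreview.php'
        filename = old_style.group(2)  # e.g. 'cover.jpg'
    elif PREVIEW_NEW.match(filename):
        script = 'BookReaderPreview.php'
    query['page'] = filename

    return 'http://%s/BookReader/%s?%s' % (item_server, script, urlencode(query))


item_id = 'journalofnatural11lond'
url = download_redirect(item_id + '/page/n10_w1150.jpg',
                        'cluster.biodiversitylibrary.org',
                        '/mnt/glusterfs/www/%s/%s' % (item_id[0], item_id))
print(url)
```

    For the example path, the printed URL should match the BookReaderImages.php redirect shown in the comments of download.php.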

    Finally, Phil modified /etc/nginx/sites-enabled/default to contain this rewrite rule:

    rewrite ^/download/(.)             /download.php?$1;
     
  • raj 5:19 am on March 17, 2011 Permalink | Reply
    Tags: bookreader

    How to serve IA-style books from your own cluster 

    The Internet Archive BookReader is designed so that you can run it on your own server. Once you download the BookReader source code to your webserver, you can load the BookReaderDemo, which will run the bookreader code with static images. You can change the location of the images to anywhere on your webserver, and you should be up and running!

    Others have modified the IA BookReader to read image files from an image server, such as the Djatoka JPEG 2000 Image Server, instead of using static files on disk. This is also pretty easy to do.

    These two scenarios should cover most use cases. Most likely, your book images are either static images in a directory, or they are served by an image server. However, what if your images are stored in a zip file, similar to how archive.org stores book images? We’ll walk you through how to set up your webserver (or cluster) to serve images using IA-style book data.

    Internet Archive Storage for Book Data

    The Internet Archive stores book images in JPEG 2000 format, and the individual images are sequentially-numbered and stored in a ZIP file. There are various other files that describe a book, and these files are grouped together in an Internet Archive item. An item has an identifier that is unique within the IA cluster.

    Here is a breakdown of how the files in an item would look for an item with the identifier bookid, which is located in the directory /1/items/bookid:

    Files used by the bookreader:

    • bookid_abbyy.gz – contains OCR data in XML format, used by full-text search
    • bookid_jp2.zip – contains processed JPEG2000 images, these are scaled and displayed by the bookreader
    • bookid_meta.xml – contains bibliographic metadata about the book
    • bookid_scandata.xml – contains image size and page number information
    • scandata.xml – older variant of bookid_scandata.xml
    • scandata.zip – older variant of bookid_scandata.xml

    The structure of bookid_jp2.zip looks like this:

    > unzip -l bookid_jp2.zip |head
    Archive:  bookid_jp2.zip
      Length     Date   Time    Name
     --------    ----   ----    ----
            0  09-04-07 17:25   bookid_jp2/
       677279  09-04-07 17:21   bookid_jp2/bookid_0001.jp2
       418643  09-04-07 17:21   bookid_jp2/bookid_0002.jp2
       400545  09-04-07 17:21   bookid_jp2/bookid_0003.jp2
       367304  09-04-07 17:21   bookid_jp2/bookid_0004.jp2
       447760  09-04-07 17:21   bookid_jp2/bookid_0005.jp2
       383252  09-04-07 17:21   bookid_jp2/bookid_0006.jp2
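
    If you build these archives yourself, a quick structural check can save debugging time later. The following is a rough Python sketch, not part of the BookReader code: it assumes every page is named {bookid}_jp2/{bookid}_NNNN.jp2 as in the listing above, and the function name check_jp2_zip is made up for illustration.

```python
import io
import re
import zipfile


def check_jp2_zip(zip_file, bookid):
    """Return a list of problems found in an IA-style {bookid}_jp2.zip."""
    prefix = '%s_jp2/' % bookid
    # Assumption: pages are named {bookid}_<digits>.jp2 inside the directory.
    page = re.compile(re.escape(prefix + bookid) + r'_\d+\.jp2$')
    problems = []
    with zipfile.ZipFile(zip_file) as archive:
        # Skip directory entries such as 'bookid_jp2/'.
        names = [n for n in archive.namelist() if not n.endswith('/')]
    if not names:
        problems.append('archive contains no files')
    for name in names:
        if not name.startswith(prefix):
            problems.append('not inside %s: %s' % (prefix, name))
        elif not page.match(name):
            problems.append('unexpected file name: %s' % name)
    return problems


# Build a tiny archive in memory so the sketch runs without real book data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('bookid_jp2/bookid_0001.jp2', b'')
    archive.writestr('bookid_jp2/bookid_0002.jp2', b'')
buffer.seek(0)
print(check_jp2_zip(buffer, 'bookid'))
```

    An empty list means the archive follows the layout above. To check a real file, pass its path instead of the in-memory buffer.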

    Structure of the Bookreader codebase

    The BookReader code is designed to be split across two different kinds of cluster nodes: web nodes and data nodes. However, it is easy to run the BookReader on a single machine that serves both roles.

    The static files, such as BookReader.js and BookReader.css, are served from a web node. They are located in the top-level BookReader directory in the git repository.

    The IA-specific backend PHP and python files that parse the meta.xml and extract the JPEG 2000 image are normally served from a data node. They live in the BookReaderIA directory in the repository. These files are not necessary for a simple BookReader deployment using static images or an image server.

    Setting up the Datanode and the BookReader Image Server

    BookReaderImages.php turns your data node into a very simple image server. It extracts, decompresses, scales, rotates, and recompresses images that are stored in the Internet Archive storage format.

    Supported input image formats:

    • JPEG 2000
    • JPEG
    • TIFF
    • PNG

    Supported output image formats (these are formats that all browsers can display):

    • JPEG
    • PNG

    Supported archive formats:

    • ZIP
    • Tar

    Supported image operations:

    • Scaling by powers of two
    • Scaling by an arbitrary factor (increases server load)
    • Rotation by 90 degrees

    In addition to serving images, the data node also executes scripts that interface directly with the files in an item. Most importantly, BookReaderJSIA.php reads meta.xml and scandata in order to instantiate the BookReader with the correct parameters.

    1. Configure PHP

    First, you will need to configure a web server on the datanode to serve php scripts. You can use a standard web server such as Apache or Nginx with fastcgi-php enabled.

    In this example, we will set the docroot of the webserver to /var/www, and it will be able to serve php scripts from /var/www/BookReader (note the capitalization).

    Phil at the Biodiversity Heritage Library has written detailed instructions on how to configure Nginx and fastcgi-php for hosting the BookReader.

    2. Install BookReader PHP code

    Now that your data node’s webserver has been configured to serve files from /var/www, create a directory called /var/www/BookReader (note the capitalization).

    In the /var/www/BookReader directory, install the following scripts from the BookReaderIA/datanode repo directory:

    • BookReaderImages.php
    • BookReaderImages.inc.php
    • BookReaderMeta.inc.php
    • BookReaderJSIA.php

    3. Test to see if the data node is properly serving PHP scripts

    Let’s see if the webserver on the data node is properly configured. Try loading BookReaderImages.php script. An example URL would look like:

    http://cluster.biodiversitylibrary.org/BookReader/BookReaderImages.php

    Without any script arguments, your script should return a 404 HTTP status. If you aren’t using a custom 404 handler, you should see something like:

    Error serving request:
      Image error: Image stack does not exist at 
    
    Debugging information:
    #0 /var/www/BookReader/BookReaderImages.inc.php(245): BookReaderImages->BRfatal('Image stack doe...')
    #1 /var/www/BookReader/BookReaderImages.php(38): BookReaderImages->serveRequest(Array)
    #2 {main}

    If you have php-cli installed, you can also run php BookReaderImages.php from the command line and you should see a similar error message.

    This will verify that the webserver is configured to serve php scripts, and that the scripts are in the correct location.

    4. Install binaries required for serving images

    The following tools are needed to extract and decompress images. They must be installed in the executable $PATH of the webserver process owner (www-data):

    • unzip
    • 7z (for efficiently extracting images from tar archives, since tar does not seek() to the requested file)
    • netpbm tools (we call bmptopnm, jpegtopnm, tifftopnm, pngtopnm, pnmtopng, and pnmtojpeg)
    • exiftool (this can be installed at any path)
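
    One way to confirm that the $PATH tools are visible is a small check like the Python sketch below. It is only an illustration: the tool list is copied from the bullets above, exiftool is left out because its path is configured separately, and the name missing_tools is made up here.

```python
import shutil

# Tools from the list above that must be found on $PATH.
REQUIRED_TOOLS = ['unzip', '7z', 'bmptopnm', 'jpegtopnm', 'tifftopnm',
                  'pngtopnm', 'pnmtopng', 'pnmtojpeg']


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current $PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


print('missing:', missing_tools())
```

    The result depends on the $PATH of whoever runs it, so run it as the webserver user (for example via sudo -u www-data) rather than as yourself.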

    In addition to these binaries, you need to install the Kakadu JPEG 2000 Software. We use Kakadu because it is fast, but you could modify the BookReaderImages.inc.php files to use Jasper or OpenJpeg instead.

    Although we have a license to the Kakadu SDK, it is possible to use the freely distributed pre-compiled Kakadu binaries with BookReaderImages.php. Download the 32-bit Linux Kakadu binaries from here. If you are on 64-bit Linux, you will also need to install ia32-libs.

    5. Edit paths in BookReaderImages.inc.php

    Paths to the exiftool and kdu_expand command-line binaries are hard-coded in BookReaderImages.inc.php. These will need to be edited in your copy to your install path for these binaries:

       // Paths to command-line tools
        var $exiftool = '/petabox/sw/books/exiftool/exiftool';
        var $kduExpand = '/petabox/sw/bin/kdu_expand';

    Also, the path to the Kakadu shared library (libkdu_vXXX.so) is hardcoded and will need to be changed:

            putenv('LD_LIBRARY_PATH=/petabox/sw/lib/kakadu');

    6. Edit path in BookReaderJSIA.php

    archive.org stores items in a directory structure that looks like /XX/items/bookid, where XX is a 1 or 2 digit integer. If your directory structure is different, you will need to remove or edit this path check in BookReaderJSIA.php:

    if (!preg_match("|^/\d+/items/{$id}$|", $itemPath)) {
        BRFatal("Bad id!");
    }
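
    To see which item paths this check accepts before you edit it, here is an equivalent test sketched in Python. One difference is an addition of mine: the id is escaped with re.escape, whereas the PHP snippet interpolates $id directly. The function name item_path_ok is hypothetical.

```python
import re


def item_path_ok(item_id, item_path):
    """Mirror of the PHP check: item_path must be /<digits>/items/<id>."""
    pattern = r'^/\d+/items/' + re.escape(item_id) + r'$'
    return re.match(pattern, item_path) is not None


print(item_path_ok('bookid', '/1/items/bookid'))   # archive.org layout
print(item_path_ok('bookid', '/data/b/bookid'))    # custom layout
```

    The first call prints True and the second prints False, which is why a layout such as /data/b/bookid needs the check removed or rewritten.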

    7. Test the data node PHP scripts

    You should be finished setting up the data node at this point.

    You can test BookReaderImages.php by passing in four cgi parameters:

    • zip – path to image zip file
    • file – image inside zip file to decompress
    • scale – radix-2 reduction parameter
    • rotate – rotation angle in 90-degree increments

    An example URL will look like:

    http://ia600307.us.archive.org/BookReader/BookReaderImages.php?zip=/35/items/flatlandromanceo00abbouoft/flatlandromanceo00abbouoft_jp2.zip&file=flatlandromanceo00abbouoft_jp2/flatlandromanceo00abbouoft_0007.jp2&scale=4&rotate=0
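
    If you want to generate test URLs like this for your own items, a small helper along these lines may be useful. It is a Python sketch under the assumption that your files follow the {bookid}_jp2.zip naming described earlier and that leaf numbers are zero-padded to four digits; images_test_url is a made-up name, not part of BookReader.

```python
from urllib.parse import urlencode


def images_test_url(server, item_dir, bookid, leaf, scale=4, rotate=0):
    """Build a BookReaderImages.php test URL for one page of an item."""
    query = {
        'zip': '%s/%s_jp2.zip' % (item_dir, bookid),
        'file': '%s_jp2/%s_%04d.jp2' % (bookid, bookid, leaf),
        'scale': scale,
        'rotate': rotate,
    }
    # safe='/' keeps the slashes readable, as in the example URL above.
    return 'http://%s/BookReader/BookReaderImages.php?%s' % (
        server, urlencode(query, safe='/'))


print(images_test_url('ia600307.us.archive.org',
                      '/35/items/flatlandromanceo00abbouoft',
                      'flatlandromanceo00abbouoft', 7))
```

    With these arguments the helper reproduces the example URL above.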

    You can test BookReaderJSIA.php by passing four cgi parameters:

    • id – the bookid
    • itemPath – the path on disk to the item (not the web-accessible path)
    • server – the domain name of the datanode
    • subPrefix – this is usually the same as the id, if you follow archive.org naming conventions

    An example URL will look like:

    http://ia600307.us.archive.org/BookReader/BookReaderJSIA.php?id=flatlandromanceo00abbouoft&itemPath=/35/items/flatlandromanceo00abbouoft&server=ia600307.us.archive.org&subPrefix=flatlandromanceo00abbouoft

    Read Aloud and full-text search have not yet been installed but you can now use the datanode to serve book images!

    If you have trouble with BookReaderImages.php, try running kdu_expand on the command line. The php script will set LD_LIBRARY_PATH, create a symlink called /tmp/stdout.bmp that points to /dev/stdout, and execute a command like:

    unzip -p '/data/b/bookid/bookid_jp2.zip' 'bookid_jp2/bookid_0001.jp2' | /petabox/sw/bin/kakadu/kdu_expand -no_seek -quiet -reduce 2 -rotate 0 -i /dev/stdin -o /tmp/stdout.bmp | (bmptopnm 2>/dev/null) | pnmtojpeg -quality 75
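
    When debugging several items, it can help to generate this pipeline instead of editing it by hand. The Python sketch below only builds the command string; it does not run anything, and it does not set LD_LIBRARY_PATH or create the /tmp/stdout.bmp symlink, which you would still need to do yourself. The function name and defaults are illustrative, and the kdu_expand path is copied from the command above, so change it to match your install.

```python
import shlex


def build_pipeline(zip_path, member, reduce=2, rotate=0, quality=75,
                   kdu_expand='/petabox/sw/bin/kakadu/kdu_expand'):
    """Return the shell pipeline that decodes one page to JPEG on stdout."""
    stages = [
        # Stream one image out of the zip without unpacking the archive.
        'unzip -p %s %s' % (shlex.quote(zip_path), shlex.quote(member)),
        # Decode the JPEG 2000 stream, discarding `reduce` resolution levels.
        '%s -no_seek -quiet -reduce %d -rotate %d -i /dev/stdin -o /tmp/stdout.bmp'
        % (shlex.quote(kdu_expand), reduce, rotate),
        '(bmptopnm 2>/dev/null)',
        'pnmtojpeg -quality %d' % quality,
    ]
    return ' | '.join(stages)


print(build_pipeline('/data/b/bookid/bookid_jp2.zip',
                     'bookid_jp2/bookid_0001.jp2'))
```

    Paste the printed command into a shell and redirect its output to a file to inspect the resulting JPEG. shlex.quote only adds quotes when a path contains characters the shell would interpret, so simple paths are printed unquoted.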

    Set up the Webnode

    1. Install a web server

    If you are using a single server for both the webnode and the datanode, this step is already done.

    2. Install the static BookReader webnode scripts

    If your docroot is /var/www, create a directory called /var/www/bookreader (note the capitalization).

    If you are using a single server for both webnode and datanode, you will now have two directories in /var/www called “BookReader” and “bookreader”. You will need a case-sensitive file system for this to work (HFS+ on OS X won’t work with this naming scheme).

    In /var/www/bookreader, install the javascript and css files from the main BookReader git directory.

    In addition, you will have to install the following (links provided):

    3. Create a launch script

    The BookReader.inc draw() method writes the necessary HTML to render the bookreader. You can create a simple file called book.php that calls draw() and takes the bookid as a parameter:

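    <?php
    require_once('BookReader.inc');
     
    //assuming your book path is /data/b/bookid
     
    // Reject ids that could escape the /data directory
    $id = isset($_GET['id']) ? $_GET['id'] : '';
    if (!preg_match('/^[A-Za-z0-9][A-Za-z0-9._-]*$/', $id)) {
        header('HTTP/1.0 400 Bad Request');
        exit;
    }
    $first_letter = $id[0];
     
    BookReader::draw('cluster.biodiversitylibrary.org',
        '/data/'.$first_letter.'/'.$id,
        $id,
        '',
        'test title');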

    Be sure you have placed both BookReader.inc and book.php in the webserver’s docroot. You should now be able to launch the bookreader with a URL such as:

    http://cluster.biodiversitylibrary.org/book.php?id=journalofnatural11lond

    Thanks for using the Internet Archive BookReader! Happy Reading!

     
    • Hank Szeto 5:07 am on April 27, 2011 Permalink | Reply

      Hi Raj,

      I’m trying to get the full text search working with the FileViewer Drupal module (which uses BookReader). In BookReader.js, around line 2708, there is an ajax call to /fulltext/inside.php to perform the search. I wonder whether you could please point me to more documentation or the source code for inside.php? What is the XML format for the OCR XML data in bookid_abbyy.gz? And how should the search results from inside.php be formatted for BookReader.js?

      Any info would be appreciated.

      Kind regards.

      • raj 5:11 pm on May 2, 2011 Permalink | Reply

        Hi,

        I’ll work on moving inside.php into the git repository, but it might not be helpful to you, unless you run your own fulltext search engine. Here is a bit about how the archive.org bookreader uses the Open Library fulltext search engine:

        The bookreader requires a server to be running a Solr search engine to provide full-text search. Setting this up is somewhat difficult.

        Open Library runs a full-text search engine at http://openlibrary.org/search/inside, so we don’t have to run a separate instance of Solr for the bookreader.

        Since the bookreader needs coordinate information which is not stored in solr to display the highlighted phrases, the inside.php script takes the Solr result and matches it with the position information in the ocr xml file.

        Edward wrote a bit about the Open Library full text search engine here:
        http://blog.openlibrary.org/2011/02/02/search_inside_solr/

        It might be easier to create another search backend, since you probably don’t need the scale that Solr provides.

    • Hank Szeto 3:25 am on May 3, 2011 Permalink | Reply

      Hi Raj,

      Thanks for your reply. We will be using Solr as well. So any info or code for your setup would be of great benefit. Thank you for the excellent work on BookReader.

      Kind regards.

    • Mutugi 2:28 pm on September 15, 2011 Permalink | Reply

      Hi Raj,

      We are also trying to implement full text search as Hank Szeto wanted. Did you get a chance to add the /fulltext/inside.php file? I’m having a hard time trying to format the results of the search using a call to flipbook_search_br.php

      Thanks in advance.

    • Mang 12:57 am on September 16, 2011 Permalink | Reply

      Hi Mutugi,

      We have made those files available as part of the BookReader github project. You can find inside.php and inside.py here: https://github.com/openlibrary/bookreader/tree/master/BookReaderIA/fulltext

      Best,
      – mang

    • Mutugi 1:09 pm on September 16, 2011 Permalink | Reply

      Thanks Mang!!
      We are currently converting PDFs with OCR text from ABBYY to DjVu XML, so that we can extract the text for full text search. Do you think that is a necessary step? I noticed you have an ABBYY converter script.

    • Rajendra 1:31 pm on October 27, 2011 Permalink | Reply

      I am trying to locate the step-by-step guide to installing and configuring bookreader for my own site and image repository. I am a beginner and not good with coding. I haven’t been able to find the instructions. Please guide!!

      regards,

    • Khaled 3:58 pm on November 3, 2011 Permalink | Reply

      In BookReaderJSLocate.php there is a call, require_once '/petabox/setup.inc';, and a reference to the Locator class, but setup.inc does not exist in the repo.

    • Ben 2:30 pm on December 2, 2011 Permalink | Reply

      Does anyone have an openjpeg version of BookReaderImages.inc.php? Either that or could someone share an example of inputs and outputs to kdu_expand so i can figure out the exact command I need. What is the default output format for kdu_expand? that would be a big help. the only documentation i can find for kdu_expand is example commands, no man page

      • raj 9:11 pm on December 13, 2011 Permalink | Reply

        kdu_expand output format depends on the file extension of the -o parameter…

    • Arayik 9:33 am on March 7, 2012 Permalink | Reply

      Dear raj,
      we have a question about the installation process.
      Although we have installed all necessary tools on our server (Ubuntu 11.4, apache server), we are still unable to deploy the software.
      We have created a 10/items/ddd directory in the 93.187.162.218/var/www directory and located a ddd.zip folder there containing ddd.jpg file. So, we construct our link according to these inputs, namely,
      —————
      http://93.187.162.218/BookReader/BookReaderImages.php?zip=/var/www/10/items/ddd/ddd.zip&file=ddd/ddd.jpg&scale=4&rotate=0
      ———
      Yet, the software responds with an error.
      —————-
      Error serving request:
      Image error: Image stack does not exist at var/www/10/items/ddd/ddd.zip

      Debugging information:
      #0 /var/www/BookReader/BookReaderImages.inc.php(249): BookReaderImages->BRfatal(‘Image stack doe…’)
      #1 /var/www/BookReader/BookReaderImages.php(38): BookReaderImages->serveRequest(Array)
      #2 {main}
      ——————
      Please help us with this issue.

      • raj 11:10 pm on March 7, 2012 Permalink | Reply

        It appears that you didn’t create your `ddd.zip` file the same way archive.org does. The zip file should contain a directory named `ddd/`, which contains numbered images. See the top part of this post where we run `unzip -l bookid_jp2.zip |head` to verify the structure of the zip file.

    • rack 6:20 am on March 19, 2012 Permalink | Reply

      Hi Raj
      I am new to implement this kind of stuff. Please elaborate how to change url (or not sure whether to modify function?) in order to refer to image collection named images/1.jpg, 2.jpg … etc on my server? I have modified like this but not working :

      var leafStr = '000';
      var imgStr = (index+1).toString();
      var re = new RegExp("0{"+imgStr.length+"}$");
      var url = 'http://www.myserver.com/images'+leafStr.replace(re, imgStr) + '.jpg';
      return url;

      Please help what else to do or how to change url in similar sequence as stated above. Thanks

      • raj 7:17 pm on June 6, 2012 Permalink | Reply

        You can change the line that starts with `var url = ` to this:

        var url = 'http://www.myserver.com/images/' + imgStr + '.jpg';
        • heleneveragten 7:13 pm on May 1, 2013 Permalink | Reply

          Hi Raj,

          I’ve got the same problem! This is my input. When I go to my bookreader, it doesn’t show my images (just the icon you get when the images can’t be opened).

          var leafStr = '000';
          var imgStr = (index+1).toString();
          var re = new RegExp("0{"+imgStr.length+"}$");
          var url = 'https://www.dropbox.com/home/images' + imgStr + '.jpg';
          return url;

          Can you help me? Thanks!!

          • raj 7:26 pm on May 1, 2013 Permalink | Reply

            The url you are using seems like it would only work if you are signed into dropbox. If you are serving the bookreader from your local machine or a different domain, perhaps you are running into issues with third-party cookies being blocked. I don’t think you will be able to get the bookreader to work well in this way.

            Also, the code above is missing a trailing slash after “images”. It should be something like 'https://www.dropbox.com/home/images/' + imgStr + '.jpg'. You can always right-click the broken image icon, copy the image URL, and check whether it works when you paste it into a different browser window.

    • Allison 5:23 pm on June 6, 2012 Permalink | Reply

      Hi Raj,

      Thank you so much for this tutorial. I am looking for info on the most basic implementation of BookReader. Where you say, “Once you download the BookReader source code to your webserver, you can load the BookReaderDemo, which will run the bookreader code with static images. You can change the location of the images to anywhere on your webserver, and you should be up and running!” — are there instructions somewhere on how to do that? I have the demo up and running, but I don’t see how to change the location of the images.
      Thank you!

    • Dominik 8:12 pm on July 20, 2012 Permalink | Reply

      Is it possible to add svg support? Would be so nice in times of retina displays and so on to have scalable svg-files, not talking about the advantages having your “data” in xml format instead of a picture.

      • Dominik 8:13 pm on July 20, 2012 Permalink | Reply

        Sorry, can’t edit: I’m talking about “pure” bookreader, I would have my own image server who’s capable of generating svgs. I just don’t get the trick on how to change the bookreader.js to show svg instead of png.

      • Dominik 9:07 pm on July 20, 2012 Permalink | Reply

        Hi raj, and thanks for your fast respone!
        I just tried another approach here: http://nie-wieder.net/br/BookReaderDemo/noten.html#page/1/mode/1up – conclusion: works fine (except a known Safari SVG bug which I would take care of when I change “officially”… okay, and I haven’t tested in IE yet. But let’s say it works like a charm for my preferred browsers 😉 ) What I did: I had to change the drawLeafsOnePage function in bookreader.js, especially this line:
        var img = document.createElement("embed");
        and… it worked. Sometimes it’s just that easy. I hope not to need PNGs anymore, so I’m totally fine with not having the img tag.
        Greetings!

        • raj 9:24 pm on July 20, 2012 Permalink | Reply

          Good work, and thanks for the update!

    • LZ 2:56 am on September 12, 2012 Permalink | Reply

      Hi, how do you get books in the Internet Archive format in the first place? I have some PDF documents that I want to serve on my own webserver in a user-friendly non-PDF format. Is there a way to convert them to images with XML OCR data?

      • raj 3:06 am on September 12, 2012 Permalink | Reply

        If you upload the PDF to archive.org, IA will convert the PDF to the format that the bookreader uses, which you can then download to your own server.

        • LZ 12:32 am on April 1, 2014 Permalink | Reply

          Hi Raj, thanks for your reply. But is there software for me to convert the files offline without having to upload them to archive.org? I have a few hundred files and converting them all would be very cumbersome. Thanks!

    • Anthony 4:03 pm on October 4, 2012 Permalink | Reply

      Hi Raj,

      First, I’d like to thank you for your very comprehensive article. We’ve been able to successfully test the IA book reader on our own book collection very easily by just following your instructions.

      We still have a problem though for the TTS and full-text search functionalities. We are using ABBYY FineReader Engine 9 to perform OCR but this version does not provide Djvu xml export. It only provides exports to their proprietary ABBYY xml format.

      I was wondering if the script “petabox/sw/books/bin/AbbyyToDjvuXml.pl” used by Internet Archive would be available somewhere to make this conversion or if there would be any alternative ?

      Kind regards !

      • raj 4:30 pm on October 4, 2012 Permalink | Reply

        I’ll check about the availability of the script..

        • Anthony 1:35 pm on October 30, 2012 Permalink | Reply

          Hello Raj,

          Sorry to bother you again but I was wondering if you had the chance to check about the availability of the script AbbyyToDjvuXml.pl… It would indeed save me a lot of time if I didn’t have to create my own script. 🙂

          Thanks a lot,
          Anthony

    • Anthony 8:45 am on December 10, 2012 Permalink | Reply

      Hello Raj,

      Just a small message to tell you that I found a small bug in the BookReader which prevents iPad users from performing a search inside the book. The search input form cannot get focus.

      The issue is known and comes from jQuery. It can be solved by simply modifying jquery.ui.ipad.js :

      in function iPadTouchHandler(event),
      replace the line 344 : 'if ($(event.changedTouches[0].target).is("select")) {'
      by: 'if ($(event.changedTouches[0].target).is("select") || $(event.changedTouches[0].target).is("input")) {'

      Hope this helps.

      Best regards,
      Anthony.

    • Sandeep 11:24 pm on January 14, 2013 Permalink | Reply

      Dear Anthony, would you be able to share some information how you implemented the search into bookreader. Thanks.

    • Marco 3:36 pm on January 31, 2013 Permalink | Reply

      Hello, Raj,
      I’m trying to set up my own openlibrary server to publish IA-style books.
      I’m asking where I can find (if it exists at all, I can’t find anything like that on the whole internet…) a specification for “Internet Archive Storage for Book Data”, i.e. a full description of the data required to add a scanned book to my openlibrary server. In the second section of this post you describe the list of files used by the bookreader, but do not fully specify their internal format…

      • raj 7:28 pm on February 1, 2013 Permalink | Reply

        Hi Marco,

        The book data isn’t stored in an openlibrary server. Open Library is only about metadata about books. The actual book images are stored on the archive.org storage cluster, and the format of an archive.org book item is described here: http://archive.org/about/faqs.php#140

    • William 5:46 pm on September 19, 2013 Permalink | Reply

      Hi Raj,

      We’re running the following command:

      unzip -p '/mnt/glusterfs/www/0/01A23374-4D72-4A06-9B88-EF74D0ACEE5D/01A23374-4D72-4A06-9B88-EF74D0ACEE5D_jp2.zip' '01A23374-4D72-4A06-9B88-EF74D0ACEE5D_jp2/01A23374-4D72-4A06-9B88-EF74D0ACEE5D_0003.jp2' | /mnt/glusterfs/www/includes/kakadu/kdu_expand -no_seek -quiet -reduce 0 -rotate 0 -i /dev/stdin -o /tmp/stdout.bmp | (bmptopnm 2>/dev/null) | pnmtojpeg -quality 7

      and here are the results:

      [root@clustr-03 /root]# unzip -p '/mnt/glusterfs/www/0/01A23374-4D72-4A06-9B88-EF74D0ACEE5D/01A23374-4D72-4A06-9B88-EF74D0ACEE5D_jp2.zip' '01A23374-4D72-4A06-9B88-EF74D0ACEE5D_jp2/01A23374-4D72-4A06-9B88-EF74D0ACEE5D_0003.jp2' | /mnt/glusterfs/www/includes/kakadu/kdu_expand -no_seek -quiet -reduce 0 -rotate 0 -i /dev/stdin -o /tmp/stdout.bmp | (bmptopnm 2>/dev/null) | pnmtojpeg -quality 7
      Error in Kakadu File Format Support:
      Non-seekable JP2 sources must be read sequentially. You are probably trying to
      read from multiple boxes simultaneously.
      pnmtojpeg: EOF / read error reading magic number

      It seems like kakadu doesn’t like the binary data that’s being returned from unzip, however looking at the jp2s on their own after unzipping, they appear fine. We’re running Kakadu 7.2.2. We’ve tested multiple zipped jp2 files, both from our cluster and from IA, all with the same results.

      Any suggestions you spot there? Thanks in advance!

      • raj 11:36 pm on October 10, 2013 Permalink | Reply

        maybe your unzip command is causing a problem. You can try `unzip -p images.zip file.jp2 > tmp.jp2`, inspect the tmp file, and then try using kakadu on that file..

    • Sandeep Sahota 4:05 am on October 10, 2013 Permalink | Reply

      Hi Raj,

      We are trying to setup openlibrary on our VPS and we are having some issues, would you or someone on your team be able to guide us in the right direction? Your help in this matter is greatly appreciated. Thank you.

    • D. Shivashankar 5:27 am on February 26, 2014 Permalink | Reply

      Hi Raj,

      I am trying to configure BookReader for our archive. We don’t have jp2 files but we have tiff files. Is it possible to configure it for tiff files?

      Now I am trying to configure BookReader in my local machine. But I am getting following error when I execute following command

      ==========================================================================
      unzip -p '34/items/book1/book1_jpg.zip' 'book1_jpg/723.jpg' | /opt/Kakadu/kdu_expand -no_seek -quiet -reduce 2 -rotate 0 -i /dev/stdin -o /tmp/stdout.bmp | (bmptopnm 2>/dev/null) | pnmtojpeg -quality 75

      Kakadu Core Error:
      Code-stream must start with an SOC marker!
      pnmtojpeg: EOF / read error reading magic number
      ===========================================================================
      Please let me know how to fix this problem?

      • raj 5:15 pm on July 31, 2014 Permalink | Reply

        Kakadu is only for uncompressing jp2 files. Also, the line you posted above seems to indicate you have jpg files already.

    • Manikanda Subbu 10:39 am on July 31, 2014 Permalink | Reply

      Hi, I am planning to setup the IA book reader for my library. We already have a windows server. Is it possible to install the book reader and setup the image server in our windows server and serve from them ?

      I am new to this and please guide me..

      • raj 5:11 pm on July 31, 2014 Permalink | Reply

        No, the image server instructions above will only work for linux or other unix-like systems.

  • raj 11:40 pm on February 24, 2011 Permalink | Reply
    Tags: bookreader, tts

    BookReader TTS 

    The Internet Archive BookReader now contains a real-time Text-To-Speech feature to help assist our print-disabled users.

    I added some documentation on how the “Read-it-to-me” feature is implemented here:
    https://github.com/openlibrary/bookreader/wiki/Read-It-To-Me

     