11 December 2014

Google Play Books: Last Android App Version Slowness [Solved]

Recently, I reported slowness in the Android Google Play Books app when reading a fixed layout ePub. The problem is under investigation by Google.

Google Books Partner support reference: 0-8666000005666.

Since the latest version of the Android Google Play Books app (3.2.21, 12 November 2014), going directly from one page to another in a fixed layout ePub is very slow when there are many pages in between. The cause is the new “page preview” feature (called “skim mode” by Google). For example, it can take 5 minutes to go from page 1 to page 200, instead of only a few seconds with the previous app version (3.1.49), which did not have the “page preview” feature. The new app has to “build” the previews / caches of all the pages between page 1 and page 200, which takes a lot of time. Once that is done, moving between pages 1 and 200 is fast, but the cache is lost again as soon as you leave the book.

Remarks:
  • Producing a fixed layout ePub with a lower image resolution (e.g. 72 dpi instead of 150 dpi) does not influence the speed. The slowness is caused only by the text rendering: fixed layout ePub files usually have “very heavy” HTML/CSS/font code to render the text at its exact position with its exact formatting. Rendering such a page takes from 0.1 second to more than 2 seconds, depending on the page complexity (and the device power).
  • This problem does not happen with PDF files (image only).
  • The iOS app does not have the problem, as the “page preview” feature is not (yet) available. 
Example: Jacques Pépin New Complete Techniques, by Jacques Pépin. It takes 1 min 20 sec to go directly from page 1 to page 75! (1 sec/page) The book has 736 pages.
https://play.google.com/store/books/details?id=h1MH2KmhxdQC
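
As a rough check of these figures, the waiting time can be estimated by multiplying the number of pages to cross by the rendering time per page. This is only a sketch: it assumes that the app renders every page in between sequentially (which is what I observe, but it is not confirmed by Google), and the function name is mine.

```python
def estimated_skim_seconds(start_page, target_page, seconds_per_page):
    """Estimate the wait if every page in between must be rendered first."""
    pages_to_render = abs(target_page - start_page)
    return pages_to_render * seconds_per_page


# About 1 second per page, as measured on the book above.
print(estimated_skim_seconds(1, 75, 1.0))   # 74.0 seconds, close to the measured 1 min 20 sec
# About 1.5 seconds per page gives roughly the 5 minutes quoted for page 1 to page 200.
print(estimated_skim_seconds(1, 200, 1.5))  # 298.5 seconds
```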


Going from page 1 to page 41: the previews are not yet displayed.


Going from page 1 to page 41: the previews are displayed after about 45 seconds.


Page 41 (44-45) can be selected and displayed.

Solution: One possible solution would be that once you select the “page preview” of the page you want to read (even if the rendering of the previews is not yet finished), the app would start to render the selected page immediately (without continuing to build the previews of all the preceding pages).
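
The idea can be sketched as a change in the rendering order. This is not the code of the Google Play Books app, only an illustration with a hypothetical function: without a selection the pages are rendered sequentially, and with a selection the selected page jumps to the front while the other previews are deferred.

```python
def render_order(first_page, last_page, selected_page=None):
    """Return the order in which the pages would be rendered.

    Without a selection, previews are built sequentially.
    With a selection, the selected page is rendered first and
    the remaining previews are deferred until afterwards.
    """
    pages = list(range(first_page, last_page + 1))
    if selected_page is None or selected_page not in pages:
        return pages
    return [selected_page] + [p for p in pages if p != selected_page]


print(render_order(1, 5))                   # [1, 2, 3, 4, 5]
print(render_order(1, 5, selected_page=4))  # [4, 1, 2, 3, 5]
```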

UPDATE (2 MAY 2015): The problem is solved in version 3.3.41 (released on 16 April 2015), or maybe in a previous version already.

Google Play Books: About a Bug with SVG Vector Images [Solved]

Recently, I reported a bug regarding SVG vector images in Google Play Books. The bug has since been corrected by Google.

Google Books Partner support reference: 5-2966000005478.

When you upload an ePub file to the Google Books Partner Center, it takes a few hours to be processed. The following steps have to be checked once all the processing is finished:

GPB: Google Play Books (Google Play store)
GB: Google Books

1. GPB app reader (Android): is the eBook displayed fine?
2. GPB app reader (iOS): is the eBook displayed fine?
3. GPB web interface: is the downloaded ePub file displayed fine in an eBook reader? 
4. GPB web interface: is the downloaded PDF file displayed fine in an eBook reader?
5. GPB web interface (web reader): is the eBook displayed fine?
6. GB web interface (web reader): is the preview of the eBook displayed fine?

Vector images are composed of paths. A path can have a stroke and/or a fill. The bug I reported concerned steps 4, 5, and 6, for ePub files with SVG code (vector images) embedded in the XHTML code. The vector images were displayed fine in the Google Play Books app, but not in the PDF file or in the web reader: the strokes were not displayed! Now that Google has solved the problem, the web reader displays the images correctly (but it looks like they are using the fallback from vector/SVG to bitmap/PNG).
Left: the path with a blue stroke and a red fill.
Right: the path with the stroke not displayed.
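
For reference, a path with both a stroke and a fill looks like the markup produced below. This is a minimal sketch, not the file I sent to Google: the helper name, the path data, and the stroke width are assumptions chosen for illustration.

```python
def svg_path(d, stroke="blue", fill="red", stroke_width=4):
    """Return a small standalone SVG with one path that has a stroke and a fill."""
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
        f'<path d="{d}" stroke="{stroke}" stroke-width="{stroke_width}" fill="{fill}"/>'
        '</svg>'
    )


# A closed curve: blue outline (stroke), red interior (fill).
markup = svg_path("M 10 90 Q 50 10 90 90 Z")
print(markup)
```

A renderer with the bug described above would draw only the red interior of such a path.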

I reported a similar problem with the open source software Scribus (desktop publishing):

The fill of a curve with only two nodes is not exported in the PDF (Solved in Scribus 1.5)

13 November 2014

Fixed Layout ePub: A Practical Guide to Publish eBooks from PDF Files

I have just updated my book about fixed layout ePub.

Old title: A Practical Guide to Convert a PDF File to an ePub Version 3 Fixed Layout File (May 2014 - No ISBN).

New title: Fixed Layout ePub: A Practical Guide to Publish eBooks from PDF Files (November 2014 - ISBN: 978-1502809506).

Buy paper version on Amazon.
Buy electronic version on Google Play Books.

Preview the book on Google Books.

23 May 2014

A Practical Guide to Convert a PDF File to an ePub Version 3 Fixed Layout File

I have just published a very small ebook, which can be bought on the Google Play Books store and previewed on Google Books.

 Google Play Books

This is the beginning of the book (the rest is mainly technical material on the conversion from PDF to HTML, then from HTML to ePub):

Fixed Layout

Different file formats exist for fixed layout ebooks. Below is a list of the main ones:

- PDF (Portable Document Format) [.pdf]
- DjVu (Déjà Vu) [.djvu]
- ePub (electronic Publication) [.epub]
- Apple iBooks (similar to ePub) [.ibooks]
- Amazon Kindle (similar to ePub) [.kf8]

In this book, we will focus mainly on the conversion of a PDF file to a fixed layout ePub file. This has been possible since version 3 of the ePub format, which now includes a fixed layout mode in addition to the traditional flowing text mode.

This type of conversion can be very useful as the page layout programs (e.g. Scribus) are always exporting the final result as a PDF (optimized for paper or online publication).

The "ePub 3.0 Fixed Layout (FXL) Format Specifications" published by the International Digital Publishing Forum (IDPF) can be found here:

http://www.idpf.org/epub/fxl

A "Field Guide to Fixed Layout for E-Books" published by the Book Industry Study Group (BISG) is available for free here:

http://www.bisg.org/publications/field-guide-fixed-layout-e-books

The ePub version 3 format uses all the modern Web technologies like HTML5, CSS3, JS, SVG, XML, XHTML, WOFF, etc.

Important remarks:

1) This book is only about fixed layout ePub. Fixed layout can be used if the book has a sophisticated layout with lots of images. Such fixed layout books are made with desktop publishing (DTP) programs like Scribus, Adobe InDesign, QuarkXPress, or Microsoft Publisher. For books with only text or with few images, a flowing text ePub is more suitable and easier to produce.

2) Most PDF to ePub converters do not work for sophisticated layouts, because they convert a fixed layout PDF into a flowing text ePub, which most of the time gives an ugly and unusable result unless the file is heavily adapted. They just extract the text and the images from the PDF, and put them sequentially into a flowing text ePub with all the layout gone.

3) Most ePub viewers do not (yet) support fixed layout. If you try to display a fixed layout ePub with such a viewer, the result will be ugly and unusable. Two good ePub viewers supporting fixed layout are Google Play Books (for tablets running Google Android or Apple iOS (iPad)) and Readium (a Google Chrome browser extension, for laptops or desktops running Microsoft Windows, Apple OS X (Mac), or GNU Linux). Most of the time, small screens are not suitable for fixed layout books. Such books should be read on tablets, not on smartphones.

Conversion Methods

There are three main methods to convert a PDF file to an ePub fixed layout file:

1) Method 1: Bitmap image only + Hidden text

Each ePub page is a bitmap image (PNG8, possibly PNG24 or JPEG) of an exact replica of the PDF page. This bitmap image is the result of the rendering of the text (using vector fonts), bitmap images, and vector images. To maintain accessibility (select text, copy/paste text, search text, text to speech, etc.), an invisible text layer is added on top of the image. This is also the way used to convert a PDF file to a DjVu file. Some PDF files are also made like that, mainly when they are the results of scanning paper books (the text layer is made by OCR).

2) Method 2: Image + Text

Probably the best method, though more sophisticated than the first one, is to add to each ePub page a bitmap image (JPEG, possibly PNG) made of all the bitmap and vector images of the PDF page, or a bitmap and vector image (SVG). The text is not converted into a bitmap image or inserted in the SVG file, but added to the ePub page using XHTML5 and CSS3. The CSS uses: a) absolute positioning to put the text at exactly the same place as in the PDF page; b) styles and fonts so that the text looks exactly the same as in the PDF page. These last two steps are challenging, because HTML5 cannot always do what the PDF format can; lots of free and commercial tools exist, but most of the time they cannot do this correctly when it comes to fixed layout.

3) Method 3: SVG only

The bitmap images, the vector images, and the text are embedded in SVG files (one SVG per page). The text should be rendered as true text (with fonts), not just outlines of the glyphs (vector images). Also called: SVG in the spine (no XHTML).

In the rest of this book, I will focus only on the second method (image + text).
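
To make the second method more concrete, here is a minimal sketch of what one page could look like: a full-page background image, with each run of text placed on top by absolute positioning. It is not the output of any particular tool; the function name, the inline styles, and the sample values are assumptions for illustration. The viewport meta element gives the page dimensions, as fixed layout ePub pages require.

```python
from html import escape


def fxl_page(width, height, image_href, text_runs):
    """Build one fixed layout XHTML page: a background image plus positioned text.

    text_runs is a list of (left_px, top_px, font_size_px, text) tuples.
    """
    spans = "\n".join(
        f'  <span style="position:absolute; left:{left}px; top:{top}px; '
        f'font-size:{size}px; white-space:pre;">{escape(text)}</span>'
        for left, top, size, text in text_runs
    )
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Page</title>
  <meta name="viewport" content="width={width}, height={height}"/>
</head>
<body style="margin:0; position:relative; width:{width}px; height:{height}px;">
  <img src="{image_href}" alt=""
       style="position:absolute; left:0; top:0; width:{width}px; height:{height}px;"/>
{spans}
</body>
</html>
"""


page = fxl_page(600, 800, "bg1.jpg", [(50, 40, 24, "Chapter 1"), (50, 90, 12, "Salt & pepper")])
print(page)
```

A real page needs much more than this (embedded fonts, letter spacing, one style per text run), which is why the HTML/CSS code of such pages becomes so heavy.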

Conversion Tools

There are free open source and commercial tools to convert PDF to ePub3-fxl, but some have drawbacks.

The tool and the method I will describe below are free, and give a very good result for both the visual aspect and the text accessibility. The tool I will use is pdf2htmlEX, developed by Lu Wang (pseudonym: coolwanglu), a Chinese PhD student at the Department of Computer Science and Engineering of the Hong Kong University of Science and Technology. You can find it here:

http://coolwanglu.github.io/pdf2htmlEX

This tool, as its name tells us, converts the PDF pages to HTML pages and does not produce an ePub file. To get an ePub3-fxl file, I will show how to use the result produced by pdf2htmlEX to create it. This mainly means: a) removing the HTML viewer that pdf2htmlEX produces and integrates in the result; b) creating all the files required by the ePub format and wrapping the result into one single file.
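
Step b) can be sketched as follows. This is only an outline under several assumptions, not the exact procedure of the book: the file names (OEBPS/package.opf, nav.xhtml), the function names, and the sample values are mine, the pages are assumed to be already cleaned XHTML files, and the images, fonts, and CSS produced by pdf2htmlEX (which would also need manifest entries) are left out. The result has not been validated with a checker such as EPUBCheck.

```python
import io
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/package.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_opf(title, book_id, page_names, modified):
    """Package document declaring every page as pre-paginated (fixed layout)."""
    manifest = "\n".join(
        f'    <item id="p{i}" href="{name}" media-type="application/xhtml+xml"/>'
        for i, name in enumerate(page_names, 1))
    spine = "\n".join(
        f'    <itemref idref="p{i}"/>' for i in range(1, len(page_names) + 1))
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">{escape(book_id)}</dc:identifier>
    <dc:title>{escape(title)}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
    <meta property="rendition:layout">pre-paginated</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
{manifest}
  </manifest>
  <spine>
{spine}
  </spine>
</package>
"""


def build_nav(page_names):
    """Minimal navigation document with one entry per page."""
    items = "\n".join(
        f'      <li><a href="{name}">Page {i}</a></li>'
        for i, name in enumerate(page_names, 1))
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>Contents</title></head>
<body>
  <nav epub:type="toc">
    <ol>
{items}
    </ol>
  </nav>
</body>
</html>
"""


def write_epub(target, opf, nav_xhtml, pages):
    """Wrap everything into one file; pages maps file name to XHTML content."""
    with zipfile.ZipFile(target, "w") as zf:
        # The mimetype entry must be the first one and must not be compressed.
        zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        deflated = zipfile.ZIP_DEFLATED
        zf.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=deflated)
        zf.writestr("OEBPS/package.opf", opf, compress_type=deflated)
        zf.writestr("OEBPS/nav.xhtml", nav_xhtml, compress_type=deflated)
        for name, content in pages.items():
            zf.writestr("OEBPS/" + name, content, compress_type=deflated)


# Usage with placeholder values, written to memory instead of a real file.
names = ["page1.xhtml"]
pages = {"page1.xhtml": '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>1</title></head><body/></html>'}
buffer = io.BytesIO()
write_epub(buffer, build_opf("My Book", "urn:uuid:placeholder", names, "2014-05-23T00:00:00Z"),
           build_nav(names), pages)
print(zipfile.ZipFile(buffer).namelist())
```

The two details that matter most here are the mimetype entry, which has to come first and stay uncompressed, and the rendition:layout property, which tells the viewer that the pages are fixed layout.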