Blog to Ebook Conversion Program v1

Since I wrote Converting WordPress Blog to Kindle Ebook, I’ve built a PHP program that can take any WordPress blog and output a zip file of the blog’s posts. With this zip file, Calibre can create an ebook PDF or MOBI file.

The program consumes a great deal of memory and takes a few minutes to complete. For that reason, I’ve limited the script to a maximum of 20 posts; beyond that, it simply cannot run in a shared hosting environment.

If you don’t already have it, you’ll need the ebook program Calibre to compile the HTML files and images in the zip file into a PDF, EPUB, or MOBI ebook file.

The actual program has four steps:

First, you enter the blog domain URL in the form and hit Submit. On the backend, the program will generate a text file with all the blog post URLs sorted by oldest first.
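As a rough illustration of this first step, here is a minimal Python sketch (not the actual PHP code). It assumes the post URLs come from the blog's RSS feed, which is only one possible way to discover them. The function names `post_urls_oldest_first` and `write_url_list` are made up for this example, and the `limit` of 20 mirrors the cap mentioned above.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def post_urls_oldest_first(rss_xml, limit=20):
    """Return up to `limit` post URLs from an RSS 2.0 feed, oldest first."""
    root = ET.fromstring(rss_xml)
    posts = []
    for item in root.iter("item"):
        link = item.findtext("link")
        pub_date = item.findtext("pubDate")
        if link and pub_date:
            # RSS dates look like "Mon, 02 Jan 2012 10:00:00 +0000".
            posts.append((parsedate_to_datetime(pub_date), link.strip()))
    posts.sort(key=lambda pair: pair[0])  # oldest publication date first
    return [link for _, link in posts[:limit]]


def write_url_list(urls, path):
    """Save one URL per line to a plain text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(urls) + "\n")
```

Fetching the feed itself is left out on purpose. Any HTTP client could supply the `rss_xml` string.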

Then, the second form will take 3 of those URLs at random and try to figure out each page’s HTML tags for the post title and content. The biggest challenge is that each WordPress blog has a different structure, so figuring out how to extract only the title and post content can be difficult. At the bottom of the page, I provide several generic HTML tags that usually match the title and the post’s body. You can also input a custom HTML tag and attribute. Most blogs seem to use a <div id="post"></div>. Once you’ve selected the right HTML tags, the program will save all the blog posts as an HTML file, save all the embedded images, and make everything available for download as a zip file.
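To make the tag-and-attribute idea concrete, here is a minimal Python sketch that pulls out the inner HTML of the first element matching a given tag and attribute, such as a div with id "post". It is an illustration only, not the program's PHP code. The names `ElementExtractor` and `extract_inner_html` are hypothetical, and the sketch assumes the attribute value matches exactly (so a class attribute with several class names would need extra handling).

```python
from html.parser import HTMLParser


class ElementExtractor(HTMLParser):
    """Collect the inner HTML of the first element matching tag + attribute."""

    def __init__(self, tag, attr_name, attr_value):
        # Keep entities such as &amp; untouched so the output stays valid HTML.
        super().__init__(convert_charrefs=False)
        self.tag, self.attr_name, self.attr_value = tag, attr_name, attr_value
        self.depth = 0        # nesting depth of same-named tags inside the match
        self.matched = False  # True once the target element has been seen
        self.done = False     # True once the target element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            if tag == self.tag and dict(attrs).get(self.attr_name) == self.attr_value:
                self.depth, self.matched = 1, True
            return
        self.parts.append(self.get_starttag_text())
        if tag == self.tag:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> are copied through without changing depth.
        if self.depth > 0 and not self.done:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or self.depth == 0:
            return
        if tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth > 0 and not self.done:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth > 0 and not self.done:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth > 0 and not self.done:
            self.parts.append(f"&#{name};")


def extract_inner_html(page_html, tag, attr_name, attr_value):
    """Return the matched element's inner HTML, or None if nothing matched."""
    parser = ElementExtractor(tag, attr_name, attr_value)
    parser.feed(page_html)
    parser.close()
    return "".join(parser.parts) if parser.matched else None
```

For example, `extract_inner_html(page, "div", "id", "post")` would return the post body, and the same function with a heading tag and its attribute would return the title. Counting only same-named tags keeps nested divs intact without being thrown off by unclosed tags like `<p>`.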

After downloading the provided zip file, extract its contents and drag and drop the Calibre.html file into the Calibre program.

Calibre will generate its own zip file, shown where it says “Formats: ZIP”. If you want the images included, you’ll need to manually move the images folder into Calibre’s zip file.
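If you would rather script that last move than drag the folder in by hand, here is a minimal Python sketch that appends a folder to an existing zip file. The function name `add_folder_to_zip` is hypothetical, and I am assuming the images should sit in the archive under the same folder name the HTML refers to. Where Calibre keeps its zip on disk depends on your library setup, so that path is left to you.

```python
import zipfile
from pathlib import Path


def add_folder_to_zip(zip_path, folder):
    """Append every file under `folder` to an existing zip, keeping the folder name.

    Returns the list of archive names that were added.
    """
    folder = Path(folder)
    added = []
    # Mode "a" appends to the archive instead of overwriting it.
    with zipfile.ZipFile(zip_path, "a", compression=zipfile.ZIP_DEFLATED) as archive:
        existing = set(archive.namelist())
        for file_path in sorted(folder.rglob("*")):
            if not file_path.is_file():
                continue
            # Store e.g. /tmp/book/images/a.png as "images/a.png".
            arcname = file_path.relative_to(folder.parent).as_posix()
            if arcname not in existing:  # skip files already in the archive
                archive.write(file_path, arcname)
                added.append(arcname)
    return added
```

Running it a second time adds nothing, since files already present in the archive are skipped.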

There are PHP libraries out there that can convert HTML files too, but they’re rather resource intensive. Maybe I’ll add one in the next version.

Some future possible upgrades:
– Have the PHP program convert the blog automatically to MOBI/PDF rather than rely on Calibre (the biggest issues are security and resource intensity)
– AJAX interface: have everything handled on one page rather than multiple
– The frontend is rather ugly and not the most intuitive. I was focused primarily on getting it working before making it user friendly.

See it in action here:

http://www.peterxpark.com/blogebook/blogform.php


3 thoughts on “Blog to Ebook Conversion Program v1”

  1. This is very cool. I’ve always wanted an easy way to get blog posts onto my kindle. Is this a product that you’re hoping to launch? I’m certain there’d be a demand for a professional version.

  2. Hello. For converting PDF to MOBI or EPUB, I suggest you try an online converter, like http://kitpdf.com. I am using it and so far I like it very much because it works pretty fast and I haven’t found any mistakes after conversion. Hope you like it! Let me know what you think about it. 🙂

  3. Thanks Siar, I’ll definitely look into it.

    I was actually on version 3 of this program but stopped developing it after a while. I had converted it to a Python program that can build an ebook without any third-party programs and ran 10x faster. I just don’t have the time right now to work on it. I might throw it up on GitHub at some point.
