Blog to Ebook Conversion Program v1

Since I wrote Converting WordPress Blog to Kindle Ebook, I’ve built a PHP program that can take any WordPress blog and output a zip file of the blog’s posts. With this zip file, Calibre can create an ebook PDF or MOBI file.

The program consumes a great deal of memory and takes a few minutes to complete. For that reason, I’ve limited the script to a maximum of 20 posts; without that cap, it simply cannot run in a shared hosting environment.

If you don’t already have it, you’ll need the ebook program Calibre to compile the HTML files and images in the zip file into a PDF, EPUB, or MOBI ebook file.

The actual program has four steps:

First, you enter the blog domain URL in the form and hit Submit. On the backend, the program will generate a text file with all the blog post URLs sorted by oldest first.
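To make the first step concrete, here is a minimal sketch of the "sort oldest first and save to a text file" part. It is written in Python rather than the PHP the program actually uses, and it assumes the post URLs and their publish dates have already been collected somehow. The function name write_post_list, the (url, date) pair format, and the output file name are all made up for illustration.

```python
from datetime import datetime
from pathlib import Path

MAX_POSTS = 20  # mirrors the 20-post cap mentioned above


def write_post_list(posts, out_path, limit=MAX_POSTS):
    """Write post URLs to a text file, oldest first, one per line.

    `posts` is assumed to be a list of (url, iso_date) pairs, for example
    ("http://example.com/hello", "2012-03-01"). How the real program
    discovers these pairs is not covered here.
    """
    ordered = sorted(posts, key=lambda post: datetime.fromisoformat(post[1]))
    urls = [url for url, _ in ordered[:limit]]
    Path(out_path).write_text("\n".join(urls) + "\n", encoding="utf-8")
    return urls
```

Calling write_post_list(posts, "urls.txt") would leave a plain text file that the next step can read line by line.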

Then, the second form randomly takes 3 of those URLs and tries to figure out the page’s HTML tags for the post title and content. The biggest challenge is that each WordPress blog has a different structure, so extracting only the title and post content can be difficult. At the bottom of the page, I provide several generic HTML tags that usually match the title and the post’s body. You can also input a custom HTML tag and attribute. Most blogs seem to use a <div id="post"></div>. Once you’ve selected the right HTML tags, the program saves all the blog posts as an HTML file, saves all the embedded images, and makes everything available for download as a zip file.
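The tag-matching idea can be sketched roughly like this. Again, this is Python using only the standard library, not the PHP the program is written in, and the names sample_urls, ElementText, and extract are hypothetical. It picks up to 3 URLs at random and pulls the text out of the first element that matches a given tag and attribute, such as div with id="post". It only collects text; the real program also keeps the HTML and downloads images.

```python
import random
from html.parser import HTMLParser


def sample_urls(urls, count=3):
    """Pick up to `count` URLs at random to test the tag guesses against."""
    return random.sample(urls, min(count, len(urls)))


class ElementText(HTMLParser):
    """Collect the text inside the first element matching tag + attribute."""

    def __init__(self, tag, attr, value):
        super().__init__()
        self.tag, self.attr, self.value = tag, attr, value
        self.depth = 0     # nesting depth of `tag` once a match has started
        self.done = False  # True after the first matching element closes
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            # Already inside the match: track nested tags of the same name.
            if tag == self.tag:
                self.depth += 1
        elif tag == self.tag:
            # Split on whitespace so class="entry-title post" still matches.
            values = (dict(attrs).get(self.attr) or "").split()
            if self.value in values:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract(html, tag, attr, value):
    """Return the whitespace-normalised text of the first matching element."""
    parser = ElementText(tag, attr, value)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

Running extract(page, "div", "id", "post") on each of the sampled pages and eyeballing the result is essentially what the second form asks you to do by hand.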

After downloading the provided file, extract its contents and drag and drop the Calibre.html file into the Calibre program.

Calibre will generate its own zip file where it says, “Formats: ZIP”. If you want the images included, you’ll need to manually move the images folder into Calibre’s zip file.
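If you would rather script that last manual step, something like the following Python sketch could append the images folder to an existing zip file. The function name add_images_to_zip is made up, and I'm assuming the folder should sit at the top level of the archive under its own name (for example images/); check where your HTML file expects the images before relying on it.

```python
import zipfile
from pathlib import Path


def add_images_to_zip(zip_path, images_dir):
    """Append every file under `images_dir` to an existing zip archive.

    Files are stored under the folder's own name (e.g. images/pic.png).
    That layout is an assumption, not something Calibre documents here.
    """
    images_dir = Path(images_dir)
    added = []
    # Mode "a" appends to the archive instead of overwriting it.
    with zipfile.ZipFile(zip_path, "a", zipfile.ZIP_DEFLATED) as archive:
        existing = set(archive.namelist())
        for file in sorted(images_dir.rglob("*")):
            if not file.is_file():
                continue
            arcname = (Path(images_dir.name) / file.relative_to(images_dir)).as_posix()
            if arcname not in existing:  # skip files already in the archive
                archive.write(file, arcname)
                added.append(arcname)
    return added
```

The function returns the list of archive paths it added, so running it a second time on the same zip adds nothing and returns an empty list.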

There are PHP libraries out there that can convert HTML files too, but they’re rather resource intensive. Maybe I’ll add one in the next version.

Some possible future upgrades:
– Have the PHP program convert the blog automatically to MOBI/PDF rather than rely on Calibre (the biggest issues are security and resource intensity)
– AJAX interface, so everything is handled on one page rather than several
– The frontend is rather ugly and not the most intuitive. I was focused primarily on getting it working before making it user friendly.

See it in action here: