WordPress export to ebook script

So, last weekend I wrote a script that takes the export from a WordPress site and converts it into a Microsoft Word document (for submission to Smashwords or the Amazon Kindle store). It's pretty bare-bones right now: it won't handle font changes, and the only formatting it preserves is italics (not bold, for example), because that's all I use in my books. It will, however, insert a copyright page, a linked table of contents, and chapters based on post headings. (If you've worked with Smashwords before, the output is the equivalent of the 'nuclear' formatting option, which minimizes the font styles used in the final document.) Anyway, I wanted to see if I could make it more robust, and whether it works as well for other people's stories as it does for my own.
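For anyone curious what "italics only" means in practice, here is a minimal Python sketch (not the script itself). It assumes post bodies come out of the export as HTML, keeps `<em>`/`<i>` as italic runs, and drops every other tag while keeping its text. The class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser

ITALIC_TAGS = {"em", "i"}
# Tags treated as paragraph boundaries; every other tag is ignored.
BLOCK_TAGS = {"p", "div", "blockquote", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class ItalicsOnlyParser(HTMLParser):
    """Collect paragraphs as lists of (text, is_italic) runs.

    Paragraphs end at block tags or at blank lines, because classic-editor
    posts are often stored as plain text separated by blank lines.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; etc. into text
        self.paragraphs = []
        self._runs = []
        self._italic_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ITALIC_TAGS:
            self._italic_depth += 1
        elif tag in BLOCK_TAGS:
            self._end_paragraph()

    def handle_endtag(self, tag):
        if tag in ITALIC_TAGS:
            self._italic_depth = max(0, self._italic_depth - 1)
        elif tag in BLOCK_TAGS:
            self._end_paragraph()

    def handle_data(self, data):
        for i, piece in enumerate(re.split(r"\n[ \t]*\n", data)):
            if i > 0:
                self._end_paragraph()  # blank line = paragraph break
            self._add_run(piece)

    def close(self):
        super().close()
        self._end_paragraph()  # flush the last paragraph

    def _add_run(self, text):
        if not text:
            return
        italic = self._italic_depth > 0
        if self._runs and self._runs[-1][1] == italic:
            # Merge with the previous run when the style hasn't changed.
            self._runs[-1] = (self._runs[-1][0] + text, italic)
        else:
            self._runs.append((text, italic))

    def _end_paragraph(self):
        runs = [(re.sub(r"\s+", " ", text), italic) for text, italic in self._runs]
        self._runs = []
        # Drop whitespace-only runs at the edges, then trim what's left.
        while runs and not runs[0][0].strip():
            runs.pop(0)
        while runs and not runs[-1][0].strip():
            runs.pop()
        if runs:
            runs[0] = (runs[0][0].lstrip(), runs[0][1])
            runs[-1] = (runs[-1][0].rstrip(), runs[-1][1])
            self.paragraphs.append(runs)


def parse_post(html):
    parser = ItalicsOnlyParser()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

Each `(text, is_italic)` run could then be written into the Word document with italics switched on or off, and nothing else carried over, which is roughly what 'nuclear' formatting amounts to.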

If anyone is interested in providing an export, please contact me at [email protected]

I'll return any Word docs the script produces, and after getting feedback to update/improve the script, I'll delete my copies of the docs and the provided exports. :)


I envy anyone who can export from WordPress without needing revisions. I need at least three rounds of edits before I'm halfway satisfied with the ebook version, and maybe not even then.

If you can solve the formatting issue, this would be a neat tool for those of us who write professionally by default.

(really, I envy you guys. Grrr...)

Haha. Originally, I used a script that would cut apart a Word doc and schedule it as chapter posts in WordPress. Since then I've started doing my edits and revisions on the posted chapters, and exporting them afterward. It lets me leverage all of the typo reports from my readers while keeping my best work all in one spot. Since the export/convert process takes only a few minutes, I just redo it whenever I've made significant online edits.

What formatting would be most useful to you? Bold? Multiple fonts? Right now it handles headings and paragraph text, and it preserves italics.

I'm not the target audience! But, if you want my opinion... :)

Ebooks should look like 'real books', meaning that new paragraphs are indented and there are no spaces between paragraphs. Here's a random sample from Google: http://jamigold.com/wp-content/uploads/2015/10/Ironclad-Devotion-ebook-formatting.jpg

So if your script could indent and remove those spaces between paragraphs, that would be pretty neat. The double spacing is unique to web fiction.

I'm not sure if multiple fonts are supported by most ereaders. They might not display properly on every model. I'm getting my books formatted by Polgarus (who do Hugh Howey's books, among others) and they advise against fancy fonts.

Oh, yeah: that's pretty easy. It can be adjusted in the paragraph style inside the Microsoft Word document. My current default is no indentation, 8 pt spacing between paragraphs, and 1.08 line spacing within paragraphs, using 11 pt Calibri for the text.

All of that can be changed to whatever default formatting is desired from inside Microsoft Word, and I can probably set it up so that there's a toggle between 'indented, no space between paragraphs' and 'not indented, extra space between paragraphs', which seem to be the two preferred styles for ebooks.
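A rough Python sketch of that toggle, assuming the same win32com setup. The two presets are hypothetical: "block" mirrors the current default described above, and the 18 pt (0.25 in) first-line indent in "indented" is only a guess at a typical ebook indent. They map onto Word object-model properties (`ParagraphFormat.FirstLineIndent`, `SpaceBefore`, `SpaceAfter`, `LineSpacingRule`, `LineSpacing`, `Font.Name`, `Font.Size`), all in points.

```python
from dataclasses import dataclass

WD_LINE_SPACE_MULTIPLE = 5  # Word's wdLineSpaceMultiple constant
POINTS_PER_LINE = 12        # Word's LinesToPoints(1) is 12 pt


@dataclass(frozen=True)
class ParagraphPreset:
    first_line_indent_pt: float
    space_after_pt: float
    line_spacing_lines: float = 1.08
    font_name: str = "Calibri"
    font_size_pt: float = 11


PRESETS = {
    # Current default: no indent, 8 pt between paragraphs.
    "block": ParagraphPreset(first_line_indent_pt=0, space_after_pt=8),
    # 'Real book' look: indented, no space between paragraphs.
    # The 18 pt indent is an assumed value, not a standard.
    "indented": ParagraphPreset(first_line_indent_pt=18, space_after_pt=0),
}


def word_settings(preset):
    """Translate a preset into Word object-model property values (points)."""
    return {
        "ParagraphFormat": {
            "FirstLineIndent": preset.first_line_indent_pt,
            "SpaceBefore": 0,
            "SpaceAfter": preset.space_after_pt,
            # Set the rule first so LineSpacing is read as a multiple.
            "LineSpacingRule": WD_LINE_SPACE_MULTIPLE,
            "LineSpacing": preset.line_spacing_lines * POINTS_PER_LINE,
        },
        "Font": {"Name": preset.font_name, "Size": preset.font_size_pt},
    }


def apply_preset(style, preset):
    """Set the values on a Word Style object obtained through win32com."""
    for part, values in word_settings(preset).items():
        target = getattr(style, part)  # style.ParagraphFormat or style.Font
        for name, value in values.items():
            setattr(target, name, value)
```

With Word installed, something like `doc = win32com.client.Dispatch("Word.Application").Documents.Add()` followed by `apply_preset(doc.Styles("Normal"), PRESETS["indented"])` should restyle every paragraph that uses the Normal style; treat that as untested against a live Word install. Keeping the settings in a plain dict means the toggle can be checked without Word at all.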

Even though you aren't the target audience, if you'd like to see what the results look like on an export of your site, let me know. Personally, I think it would be pretty cool to compare the results of my script with those from a professional service.

Awesomesauce! Sounds great. And I wouldn't mind sending you the epub / mobi from Polgarus if you're curious about it. I'm going to offer a free copy to anyone who ever reviewed Anathema. :)

I should have it in... 3 weeks or so.

@Eren: I'm currently looking for a new job, and while I'm doing it, I'm also doing some programming. Specifically, I'm renewing work on this plugin.

At some point, we may want to talk about connecting the two projects.

@Jim Zoetewey: That's pretty interesting. I wrote the script I'm currently using in Python; unfortunately, my experience with JavaScript is pretty bare-bones (most of my web experience was with .asp). It also relies on having a copy of MS Word installed: it uses win32com to automate Word and transfer the content from the WordPress export file into a new Word document, which can then be saved in any format Word supports. (In my case, that's a .doc that I can submit to Smashwords and convert into an ePub for Amazon.)

It uses categories to determine which posts belong to which book, and post titles as chapter titles. It can also rename and renumber those titles; I added that because I wanted to do some formatting on my chapter titles, like stripping out the book number that I include in post titles for the blog.
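Here's a hedged sketch of the category and chapter-title logic in Python, using only the standard library and leaving out everything Word-related. It assumes the usual WXR layout of a WordPress export (`<item>` elements with `<title>`, `<category domain="category">`, `<content:encoded>`, and `wp:post_type`/`wp:status`/`wp:post_date`). The "Book 1, Chapter 2: Title" naming pattern and all function names are made up for illustration; the regexes would need to match whatever convention a given blog actually uses.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"
WP_NS_PREFIX = "{http://wordpress.org/export/"  # version suffix varies

# Hypothetical blog title format: "Book 1, Chapter 2: The Road".
BOOK_PREFIX = re.compile(r"^Book\s+\d+\s*[,:-]\s*", re.IGNORECASE)
CHAPTER_PREFIX = re.compile(r"^Chapter\s+\d+\s*[:.-]?\s*", re.IGNORECASE)


def _wp(item, name):
    """Text of a wp:<name> child element, whatever the export version."""
    for child in item:
        if child.tag.startswith(WP_NS_PREFIX) and child.tag.endswith("}" + name):
            return (child.text or "").strip()
    return ""


def posts_by_book(wxr_xml):
    """Group published posts into books by category, oldest post first."""
    root = ET.fromstring(wxr_xml)
    books = defaultdict(list)
    for item in root.iter("item"):
        if _wp(item, "post_type") != "post" or _wp(item, "status") != "publish":
            continue  # skip pages, attachments, drafts, etc.
        post = {
            "title": (item.findtext("title") or "").strip(),
            "date": _wp(item, "post_date"),
            "html": item.findtext(CONTENT_NS + "encoded") or "",
        }
        for cat in item.findall("category"):
            if cat.get("domain") == "category":  # ignore tags (post_tag)
                books[(cat.text or "").strip()].append(post)
    for posts in books.values():
        # "YYYY-MM-DD HH:MM:SS" strings sort chronologically as plain text.
        posts.sort(key=lambda p: p["date"])
    return dict(books)


def chapter_titles(posts):
    """Strip the blog-only book number and renumber chapters from 1."""
    titles = []
    for n, post in enumerate(posts, start=1):
        name = CHAPTER_PREFIX.sub("", BOOK_PREFIX.sub("", post["title"]))
        titles.append(f"Chapter {n}: {name}" if name else f"Chapter {n}")
    return titles


# A tiny made-up export, trimmed to the fields used above.
SAMPLE_WXR = """<rss version="2.0"
 xmlns:content="http://purl.org/rss/1.0/modules/content/"
 xmlns:wp="http://wordpress.org/export/1.2/"><channel>
<item><title>Book 1, Chapter 2: The Road</title>
 <category domain="category" nicename="book-one"><![CDATA[Book One]]></category>
 <category domain="post_tag" nicename="travel"><![CDATA[travel]]></category>
 <content:encoded><![CDATA[<p>Second.</p>]]></content:encoded>
 <wp:post_date>2015-01-08 10:00:00</wp:post_date>
 <wp:status>publish</wp:status><wp:post_type>post</wp:post_type></item>
<item><title>Book 1, Chapter 1: Home</title>
 <category domain="category" nicename="book-one"><![CDATA[Book One]]></category>
 <content:encoded><![CDATA[<p>First.</p>]]></content:encoded>
 <wp:post_date>2015-01-01 10:00:00</wp:post_date>
 <wp:status>publish</wp:status><wp:post_type>post</wp:post_type></item>
<item><title>Book 1, Chapter 3: Unfinished</title>
 <category domain="category" nicename="book-one"><![CDATA[Book One]]></category>
 <wp:status>draft</wp:status><wp:post_type>post</wp:post_type></item>
</channel></rss>"""
```

On the sample, `chapter_titles(posts_by_book(SAMPLE_WXR)["Book One"])` gives the two published chapters in date order with the book number stripped; the draft and the tag are ignored.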

I was debating with myself a while back over whether it would be worth making a web-fiction-specific blog implementation in Django that would provide a better reading experience while having more of the common web-fiction formatting pre-integrated.

The alternatives I came up with didn't include building a WordPress plugin; that was quite a bit out of my grasp, heh. Instead, I thought it might be feasible to make a WordPress theme with the desired formatting for web fiction pre-included: an obvious previous/next link at the top of a post, at the bottom of a post, and at the bottom of the comments section; a table of contents section based on the blog's categories, so that new chapters are added automatically (for an example, check the table of contents for one of my books); and things like that.
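Those theme features boil down to two small computations: group chapters by category for the table of contents, and find each chapter's neighbors for the previous/next links. A real WordPress theme would do this in PHP (the `previous_post_link()`/`next_post_link()` template tags can already restrict navigation to the same category), so the Python below only illustrates the logic. The `title`/`url` fields and the function names are hypothetical.

```python
from html import escape


def chapter_nav(chapters):
    """Map each chapter URL to (previous URL, next URL); None at the ends."""
    nav = {}
    for i, ch in enumerate(chapters):
        prev_url = chapters[i - 1]["url"] if i > 0 else None
        next_url = chapters[i + 1]["url"] if i + 1 < len(chapters) else None
        nav[ch["url"]] = (prev_url, next_url)
    return nav


def toc_html(books):
    """Nested table of contents: one entry per category (book), in order."""
    lines = ['<ul class="toc">']
    for book, chapters in books.items():
        lines.append(f"  <li>{escape(book)}<ul>")
        for ch in chapters:
            lines.append(f'    <li><a href="{escape(ch["url"])}">{escape(ch["title"])}</a></li>')
        lines.append("  </ul></li>")
    lines.append("</ul>")
    return "\n".join(lines)
```

Because both are derived from the category listing, a newly published chapter shows up in the table of contents and in its neighbors' links without any manual editing.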

Or, as a final alternative, developing a standalone RSS feed reader that behaved more like a Kindle/Nook than most RSS readers I've used, with its own bookmarking features and such built in.

Anyway, with all of that said, I'd be delighted to collaborate if there's any way I can help.