My ebook creator

Hi! I recently wrote some software for scraping episodic story sites and turning the stories into ebooks. For example, a command like python2 stories/void_domain.json will download all of Void Domain and create a single well-formatted EPUB ebook for it. You can extend it to work on whatever stories you want: it should work on nearly any website. These extensions are also fairly easy to write; for example, the set of instructions telling the software how to download Void Domain is only 26 lines long. Writing a custom script to do the same thing would be far more difficult.
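To give a rough idea of what one of those extensions looks like, here's a sketch of a story file (the field names and URL below are purely illustrative, not the tool's actual schema):

```json
{
  "title": "Void Domain",
  "author": "Author Name",
  "start_url": "http://example.com/void-domain/chapter-1",
  "content_selector": "div.entry-content",
  "next_link_selector": "a.next-chapter"
}
```

The idea is that a handful of declarative fields like these replace a whole custom scraping script.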

I was inspired to write this software by several stories on Web Fiction Guide, which is why I'm announcing it here.

I probably won't write extensions for the stories you want, but let me know if you need help writing them yourself or using the software.

Here it is: **link removed**

Sounds great, but... I hope you aren't using the software to create free ebooks for stories that have actual ebooks for sale. Or, if you aren't doing that yourself, I hope no one else does.

Thanks for making this available, aarachne! Will be super convenient.

Is there a way I can block this? I really don't want unauthorized ebook copies of my work floating around. That said, this is very neat from a coding perspective.

@leoduhvinci: You can't block all web scrapers, but you can block most by including a piece of JS code that has to be executed for the story to load. It doesn't need to be long (perhaps just something that calculates 5*5), but it has to run before the story appears. Since most web scrapers don't execute the JS on a website, that'll catch most of them.
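To illustrate why that works, here's a small Python sketch from the scraper's side. The page below is invented for the demo: the chapter text is shipped Base64-encoded and only decoded by in-browser JS, so a scraper that merely parses the raw HTML (as the naive parser here does) never sees readable text.

```python
# Sketch: a JS gate defeats scrapers that don't execute JavaScript.
# The page source here is illustrative, not from any real site.
import base64
from html.parser import HTMLParser

PAGE = """
<div id="chapter"></div>
<script>
  // runs only in a real browser:
  document.getElementById("chapter").textContent = atob("SGVsbG8sIHJlYWRlciE=");
</script>
"""

class TextGrabber(HTMLParser):
    """Collects all visible text, skipping script bodies."""
    def __init__(self):
        super().__init__()
        self.in_script = False
        self.text = []
    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
    def handle_data(self, data):
        if not self.in_script:
            self.text.append(data.strip())

grabber = TextGrabber()
grabber.feed(PAGE)
print("scraped text:", repr("".join(grabber.text)))   # empty: the scraper got nothing
print("browser would show:", base64.b64decode("SGVsbG8sIHJlYWRlciE=").decode())
```

A headless browser would defeat this, of course, but as noted above, most casual scrapers don't go that far.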

Also, aarachne, I did the whole "release a public web-scraping tool" thing as well, and everybody jumped on me. Apparently it's a bit taboo. Perhaps that's changed, but I don't think so.

Sorry, not trying to "jump"! I do think it's a neat idea and neat bit of code.

From the author's perspective, however, it can have a pretty big impact. There are times when we might have to take down our work, times when others pass it off as their own, and sites that retain pirated copies.

Does it work because Void Domain uses the json file format? I.e., is it reading json metadata to know which pages to scrape? How would this work on sites that serve up database-driven pages?

It would be interesting to play around with on my site to see if it can tell the difference between all the different types of content on it...

Oh, I see. It's not Void Domain that's using json, it's the void domain "module" you're running. That makes more sense...

Kids these days. Don't know how good they have it. Back in my day, we had to cut and paste each entry onto a document, uphill both ways, through snow!

@leoduhvinci If you block HTTP User-Agent headers starting with Python-urllib, that'll stop the version I've published, but it would be trivial to modify the software to use a more plausible user agent. Blocking it at that point is about as hard as stopping Adblock-Plus-style ad blockers from blocking your ads: possible, but difficult.
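For what it's worth, here's roughly what that trivial modification looks like with Python's standard urllib (Python 3 shown; the URL and user-agent string are just placeholders):

```python
# Sketch: a scraper overriding its default User-Agent, which is why
# blocking "Python-urllib" only stops the out-of-the-box configuration.
# The URL and UA string below are placeholders for illustration.
import urllib.request

req = urllib.request.Request(
    "http://example.com/story/chapter-1",
    headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"},
)

# The request now advertises a browser UA instead of Python-urllib/3.x:
print(req.get_header("User-agent"))
```

One extra keyword argument is all it takes, which is why user-agent filtering is best treated as a speed bump rather than a wall.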

My intention in creating this was that people would use the resulting ebooks for personal use and then delete them. Passing around a snapshot ebook of an ongoing story would be silly IMO, and that very practice was one of my inspirations for creating this tool (e.g. there's a very outdated ebook of Dungeon Keeper Ami floating around). With it, you no longer have to rely on some random person getting around to fixing his fragile regex-based script and publishing an updated ebook. If anyone does distribute ebooks created with it, I'd hope that they would at least share their json story file so that other people can update it.

On monetization: I don't know about other people, but there's no way that I'm going to pay for an ebook if the story's available for free online, doubly so if I haven't even read any of the story yet. I do support several serial authors on Patreon, which I think is a better way of handling it.

As this is essentially software for copyright violation, I've removed the link and am closing the thread. Sorry.