Domenic Denicola
41a6bfddb9
Fix some backward apostrophes
2015-05-20 23:43:33 -04:00
Domenic Denicola
5d86661c55
Fix end-of-thought punctuation being left out of the <em>
2015-05-20 23:43:20 -04:00
Domenic Denicola
baa98156b1
More <em> work
2015-05-20 23:43:07 -04:00
Domenic Denicola
33aaecb2ce
Tweak he asked/she asked replacement
2015-05-20 23:42:32 -04:00
Domenic Denicola
a4e722ddea
Fix <em>[single non-letter character]</em>
...
Move <em>-fixing before most others so that subsequent quote-fixing can apply.
2015-05-20 23:42:26 -04:00
Domenic Denicola
601d200c6b
Try to fix the rest of the hyphen-minuses that should be dashes
2015-05-17 16:38:56 -04:00
Domenic Denicola
eab0448455
A couple more one-off fixes
2015-05-17 16:36:16 -04:00
Domenic Denicola
ebd3776390
Fix all double-periods
...
Needs to be done on a case-by-case basis, as sometimes they become ellipses, and sometimes single periods, and sometimes there's a stray <em> or </em> in the mix.
2015-05-17 16:30:44 -04:00
Domenic Denicola
4019f5d1e6
Tweaks and bug-fixes for the cleanups
...
Several notable fixes:
- Fixed a bad bug with the <span> remover: since moving a child node to a document fragment changes the indices of the childNodes collection, this would leave several nodes in limbo, with the net effect of removing their text from the document.
- Fixed the empty-<em> remover to replace the empty <em> with a space, instead of removing it entirely; this leads to a lot fewer words stuck together, which were starting to accumulate erroneously in substitutions.json.
- Warn instead of error on bad substitutions: this makes it easier to actually find the bad substitution afterward, since then the output still happens.
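
A minimal sketch of the index-shifting problem from the <span> remover fix above. It assumes a plain array as a stand-in for the live childNodes collection, so it runs without a DOM; makeParent, moveChild, unwrapBuggy, and unwrapSafe are hypothetical names for illustration, not the project's actual code.

```javascript
// Hypothetical stand-in for a DOM node whose childNodes list is "live":
// removing a child immediately shifts the indices of the remaining children.
function makeParent(names) {
  return { childNodes: names.slice() };
}

// Moving a child to another parent removes it from its current parent,
// which is also what appendChild does in a real DOM.
function moveChild(from, to, index) {
  const [child] = from.childNodes.splice(index, 1);
  to.childNodes.push(child);
}

// Buggy: advancing i while the list shrinks skips every other child,
// so some children are never moved out of the span.
function unwrapBuggy(span, fragment) {
  for (let i = 0; i < span.childNodes.length; i++) {
    moveChild(span, fragment, i);
  }
}

// Safer: always take the first child until none remain.
function unwrapSafe(span, fragment) {
  while (span.childNodes.length > 0) {
    moveChild(span, fragment, 0);
  }
}

const buggySpan = makeParent(["a", "b", "c", "d"]);
const buggyFragment = makeParent([]);
unwrapBuggy(buggySpan, buggyFragment);

const safeSpan = makeParent(["a", "b", "c", "d"]);
const safeFragment = makeParent([]);
unwrapSafe(safeSpan, safeFragment);

console.log("buggy moved:", buggyFragment.childNodes, "left behind:", buggySpan.childNodes);
console.log("safe moved:", safeFragment.childNodes, "left behind:", safeSpan.childNodes);
```

In the buggy loop, any children still inside the span when it is discarded are lost along with it, which matches the "text removed from the document" symptom described above.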
2015-05-17 16:19:23 -04:00
Domenic Denicola
801e28d602
A lot more cleanups
2015-05-15 01:29:34 -04:00
Domenic Denicola
247f713e13
Better cleanup in the convert step
2015-05-11 22:52:14 -04:00
Domenic Denicola
ace6a55be8
Update extras code to use cache manifest
...
Finally we have reasonable chapter titles in the TOC.
2015-05-11 21:24:16 -04:00
Domenic Denicola
cf6b5c9ab9
Update conversion to work with cache manifest
2015-05-11 21:24:16 -04:00
Domenic Denicola
b68b88e17e
While downloading, save a cache manifest alongside
...
This allows us to keep track of the chapter title after the fact.
2015-05-11 21:24:15 -04:00
Domenic Denicola
ba1e7b956f
More cleanup in conversion
2015-05-09 01:28:43 +02:00
Domenic Denicola
a26f622fbb
Work on assembling the extras
2015-05-09 01:28:29 +02:00
Domenic Denicola
1f241b85ac
Serialize body as XHTML, not HTML
2015-05-09 00:40:54 +02:00
Domenic Denicola
f932be159c
More clean-ups; do these at a textual level.
2015-05-09 00:21:05 +02:00
Domenic Denicola
64de4a27e5
Clean up better
2015-05-08 00:19:06 +02:00
Domenic Denicola
8bb41473d2
Throttle conversions; move to a separate file
2015-05-07 23:56:53 +02:00
Domenic Denicola
a4e5a12fe4
Use sortable filenames, and fix output filenames
2015-05-07 23:27:08 +02:00
Domenic Denicola
a820bc517b
Don't fetch or execute JavaScript
2015-05-07 22:57:35 +02:00
Domenic Denicola
2236bb2a86
Try to parallelize the conversion more
...
Stalls before it gets to any writing though, it seems.
2015-05-06 23:08:59 +02:00
Domenic Denicola
127b6612c9
Re-architect
...
Separate out download code from the rest. Re-do the download code to store URLs alongside HTML, and add support for resuming from the last-seen URL.
2015-05-06 22:17:26 +02:00
Domenic Denicola
972fcaa294
Take care of another broken Next Chapter pathology
2015-05-06 01:30:55 +02:00
Domenic Denicola
1e346e7f9c
Handle URLs that io.js can't handle
2015-05-06 01:22:04 +02:00
Domenic Denicola
45b5b9d6fd
Restructure to be a bit more efficient
2015-05-06 01:16:27 +02:00
Domenic Denicola
925dcc6861
Detect "Next Chapter" links better
2015-05-06 00:58:52 +02:00
Domenic Denicola
ae1937df41
Lint better
2015-05-06 00:49:36 +02:00
Domenic Denicola
039c0fb2eb
Got chapter downloading and conversion working
2015-05-06 00:42:55 +02:00
Domenic Denicola
3c2f67000e
Initial commit
2015-05-05 22:47:45 +02:00