How to remove furigana when importing .epub files
Lately, with the books I've imported using this method, the conversion removes not only the furigana but also some of the kanji (though never the first kanji of a word). Has anyone else run into this, or does anyone know a fix?
Are you putting \1 in the Replace field? If you don't, you will lose some kanji as well.
Yes, I am. The only thing that has changed is that I started reading light novels. As far as I can tell, it only happens with light novels (though I haven't tested extensively), and it happens with every light novel I've tried (about 5). I don't know whether that has something to do with it.
I've just found a workaround, which I'll post here in case anyone else has the same issue:
1. Convert the file to EPUB if it isn't one already (leave the furigana in the file).
2. Open the EPUB in the online e-reader: https://ttu-ebook.web.app/
3. Switch off furigana in the e-reader.
4. Select all the text and copy and paste it into a Word document.
I also add the following steps to get a good layout:
5. Add extra line breaks by pasting the text here: https://www.textfixer.com/tools/add-line-breaks.php
6. Copy the new text and paste it into Word again.
7. Change the font and size if needed.
8. Save the document and import it with the LingQ 5.0 importer (better layout than LingQ 4.0).
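Steps 5 and 6 can also be done offline. Here's a minimal Python sketch, assuming what you want is an empty line between paragraphs (the textfixer tool's own options may work differently); `add_blank_lines` is a hypothetical helper name, not part of any tool mentioned here.

```python
def add_blank_lines(text: str) -> str:
    # Keep every non-empty line (including any full-width indentation) and
    # join them with an empty line in between, so paragraphs don't run together.
    paragraphs = [line for line in text.splitlines() if line.strip()]
    return "\n\n".join(paragraphs)


copied = "そういいたいのに、言葉がのどにつかえたまま\n次の段落\n\n"
print(add_blank_lines(copied))
```

Paste the copied text into `copied` (or read it from a file) and paste the printed result back into Word.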
I had a similar issue with some ruby text in a book where a single word had multiple furigana. Put the formula above into a regex tester and you'll see it. Here is a clip of text to test it on (it uses a different style, with rb and rt tags, but you can imagine how the formula would adapt):
So the ruby element opens, two kanji are defined, and the ruby tag closes. This regex grabs the first kanji and discards the other(s), hence the data loss. I couldn't solve this with a single regex (I'm just not that good at regex), but I got it working by stripping the <ruby> tags entirely and then processing the other sections independently. The formulas are applied one after the other and look like this:
replace with: \1
replace with: \2
Worked like a charm!
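The clip and the search patterns didn't survive in this thread, so here is a minimal Python sketch using a hypothetical rb/rt sample and hypothetical patterns (not the poster's exact ones). It shows why a one-shot pattern that captures only the first base text drops the second kanji, and how applying simpler rules one after another keeps it.

```python
import re

# Hypothetical sample: one <ruby> element holding two kanji, each with its own reading.
SAMPLE = "<ruby><rb>言</rb><rt>こと</rt><rb>葉</rb><rt>ば</rt></ruby>がのどにつかえたまま"


def single_regex(html: str) -> str:
    # Hypothetical one-shot pattern: captures only the first <rb> and
    # swallows everything else up to </ruby>, so later kanji are lost.
    return re.sub(r"<ruby><rb>(.*?)</rb>.*?</ruby>", r"\1", html)


def stacked_rules(html: str) -> str:
    # 1. Strip the <ruby> wrapper tags entirely.
    html = re.sub(r"</?ruby>", "", html)
    # 2. Drop every reading together with its <rt> tags.
    html = re.sub(r"<rt>.*?</rt>", "", html)
    # 3. Unwrap each <rb> element, keeping its kanji.
    return re.sub(r"<rb>(.*?)</rb>", r"\1", html)


print(single_regex(SAMPLE))   # 言がのどにつかえたまま
print(stacked_rules(SAMPLE))  # 言葉がのどにつかえたまま
```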
This didn't really work for me. I ran the two regexes you mention through Calibre, and it did seem to remove the furigana, but then displayed it as regular text.
Using your example, this text: そういいたいのに、言葉がのどにつかえたまま
then becomes this: そういいたいのに、言こと葉ばがのどにつかえたまま
This gets rid of the furigana formatting, but puts the readings into the actual text.
Did I do something wrong, or is this intended?
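A likely explanation for 言こと葉ばが: if the furigana rule doesn't match, the conversion just drops the tags and keeps the reading as plain text. A minimal Python sketch, assuming hypothetical markup in the <ruby>base<rt>reading</rt></ruby> style (no rb tags) and using a blanket tag strip as a rough stand-in for what the conversion does:

```python
import re

# Hypothetical markup in the <ruby>base<rt>reading</rt></ruby> style.
HTML = "<ruby>言<rt>こと</rt></ruby><ruby>葉<rt>ば</rt></ruby>がのどにつかえたまま"


def strip_all_tags(html: str) -> str:
    # Rough stand-in for a plain-text conversion: drop every tag, keep all text.
    return re.sub(r"<[^>]+>", "", html)


def drop_readings_then_tags(html: str) -> str:
    # Remove each <rt> element *with* its content first, then strip what's left.
    return strip_all_tags(re.sub(r"<rt>.*?</rt>", "", html))


print(strip_all_tags(HTML))           # 言こと葉ばがのどにつかえたまま
print(drop_readings_then_tags(HTML))  # 言葉がのどにつかえたまま
```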
No, it was definitely not my intention to put it into the actual text. It seems the furigana is coded differently in different books. Can you see the original text in code form? Under Search & Replace, my version of Calibre has a wizard that lets you see the book's underlying code, which might show what makes each book a bit different.
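If Calibre's wizard isn't handy, a minimal Python sketch like this can print a few raw <ruby> elements from an EPUB so you can compare how two books code their furigana. It assumes a DRM-free EPUB whose furigana uses <ruby> elements at all; `sample_ruby_markup` and the file name are hypothetical.

```python
import re
import zipfile

# Any <ruby ...> element up to its closing tag, even across line breaks.
RUBY = re.compile(r"<ruby\b.*?</ruby>", re.DOTALL)


def sample_ruby_markup(epub_path: str, limit: int = 5) -> list[str]:
    # An EPUB is a zip archive; the book text lives in (X)HTML files inside it.
    samples = []
    with zipfile.ZipFile(epub_path) as book:
        for name in book.namelist():
            if not name.lower().endswith((".xhtml", ".html", ".htm")):
                continue
            text = book.read(name).decode("utf-8", errors="replace")
            for match in RUBY.finditer(text):
                samples.append(match.group(0))
                if len(samples) >= limit:
                    return samples
    return samples
```

For example, `for snippet in sample_ruby_markup("book.epub"): print(repr(snippet))` shows any line breaks inside the tags, which matters for the regex rules in this thread.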
I wonder if anyone has made a tool to strip this out of ebooks (before or after they're converted); it would save us all a lot of trouble.
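No specific tool is named in this thread, but here is a minimal Python sketch of one, under stated assumptions: the book is a DRM-free EPUB, its content files are UTF-8 (X)HTML, and furigana uses standard ruby, rb, rt and rp tags. It is not a tested converter, and `strip_epub` and the file names are hypothetical.

```python
import re
import zipfile

# Readings (<rt>) and fallback parentheses (<rp>) are removed with their content;
# DOTALL lets a match span line breaks inside the markup.
DROP = re.compile(r"<(rt|rp)\b[^>]*>.*?</\1>", re.DOTALL)
# Wrapper tags (<ruby>, <rb>) are removed, but the base text inside them is kept.
UNWRAP = re.compile(r"</?(ruby|rb)\b[^>]*>")


def strip_furigana(markup: str) -> str:
    return UNWRAP.sub("", DROP.sub("", markup))


def strip_epub(src: str, dst: str) -> None:
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        # Copy entries in their original order, so "mimetype" stays first.
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename.lower().endswith((".xhtml", ".html", ".htm")):
                data = strip_furigana(data.decode("utf-8")).encode("utf-8")
            # The EPUB "mimetype" entry must be stored uncompressed.
            compress = zipfile.ZIP_STORED if info.filename == "mimetype" else zipfile.ZIP_DEFLATED
            zout.writestr(info.filename, data, compress_type=compress)
```

Usage would be something like `strip_epub("book.epub", "book-no-furigana.epub")`, then importing the new file as usual. Matching tags with regex is fragile on unusual markup, so check a few pages of the result.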
I had the same problem where the furigana just showed up in the converted text.
The original ebook file contained newlines between each tag, so the regex did not match.
What solved this for me was to split the regex into multiple rules and add them independently:
- SEARCH: <rt>((.|\n)*?)</rt> REPLACE: (leave empty)
- SEARCH: <ruby> REPLACE: (leave empty)
- SEARCH: </ruby> REPLACE: (leave empty)
You could probably combine the last two rules, but this worked for me.
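The same three rules as a minimal Python sketch, run on hypothetical markup with line breaks inside an rt element, next to a plain .*? rule that fails to match for the reason described above:

```python
import re

# Hypothetical markup: the first <rt> element has line breaks inside it.
HTML = "<ruby>言<rt>\nこと\n</rt></ruby><ruby>葉<rt>ば</rt></ruby>がのど"


def naive_rule(html: str) -> str:
    # "." does not match "\n" by default, so the first <rt> survives untouched.
    return re.sub(r"<rt>(.*?)</rt>", "", html)


def three_rules(html: str) -> str:
    # Rule 1: (.|\n) lets the lazy group run across line breaks.
    html = re.sub(r"<rt>((.|\n)*?)</rt>", "", html)
    # Rules 2 and 3: remove the opening and closing <ruby> tags.
    html = html.replace("<ruby>", "")
    return html.replace("</ruby>", "")


print(three_rules(HTML))  # 言葉がのど
```

Combining the last two rules would be `</?ruby>` as a single regex, and in Python the `re.DOTALL` flag is another way to make `.` match newlines.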
Added an easier method above in the original post.
Thanks a lot, I'll try it out!
Years ago I got some help on the forums with doing this using Calibre's Search & Replace feature, for Kindle books after removing the DRM. It works really well; I have it saved somewhere, and if anyone's interested I can find it.
Brilliant, thank you!
Thanks for these links! They might help me, although I'm still confused about hiragana and kanji. Thanks!