How to remove furigana when importing .epub files

meraleigh · October 19, 2020, 11:30pm

Issue:
When importing an .epub file into LingQ, the furigana is inserted as plain text after the kanji, which disrupts reading and translation.

NEW (Easier) Solution:

1.) Using Calibre, go to CONVERT from the existing file format (e.g. epub, azw3, pdf, etc.)
2.) In the CONVERT menu there is a Search and Replace option; go there
3.) Add the following regex to the SEARCH bar:
(.?).?
4.) Add the following regex in the REPLACE bar:
\1
5.) Click the ADD button
6.) Continue with the file conversion to epub
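
The search pattern in this post is displayed without its tags. As a rough sketch, assuming it was meant to be the `<ruby>(.*?)<rt>.*?</rt></ruby>` form quoted later in this thread, this is what the search and replace does to simple ruby markup. Calibre's regular expressions follow Python syntax as far as I know, so Python's `re` module is used here; the sample HTML is made up for illustration.

```python
import re

# Assumed reconstruction of the SEARCH pattern; the tags are not visible in the post.
PATTERN = r"<ruby>(.*?)<rt>.*?</rt></ruby>"

# Made-up sample: each ruby element holds one base text and one reading.
html = "<p><ruby>猫<rt>ねこ</rt></ruby>が<ruby>好<rt>す</rt></ruby>き</p>"

# REPLACE with \1 keeps the base text (group 1) and drops the <rt> reading.
result = re.sub(PATTERN, r"\1", html)
print(result)  # <p>猫が好き</p>
```

Leaving the REPLACE bar empty would delete the whole match, kanji included, which is why the `\1` matters.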

OLD Solution:

Using an archive editor such as 7zip (https://www.7-zip.org/), open the epub file as if it were a .zip. (NOTE: with a different archive editor you may need to rename the epub file extension to .zip first; no rename is necessary with 7zip.)
Each epub can be different, but you should see various folders. Look around until you find a style sheet file, usually named ‘book-style.css’, and open it in a text editor.
You should see some code at the top similar to this:
@charset "UTF-8";
@import "style-reset.css";
@import "style-standard.css";
@import "style-advance.css";
Add this line of code after those @import commands:
ruby rt { visibility: hidden; }
Save the file and let 7zip update the archive with the change (you will be prompted to do this when you try to close the archive).
Using free software such as Calibre (https://calibre-ebook.com/), convert the epub file to a .docx file.
This docx file will not have the furigana and can be imported into LingQ.
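
For anyone who would rather script the stylesheet edit than do it by hand, here is a minimal Python sketch using only the standard `zipfile` module. The function name `hide_furigana` and the default file name are my own choices for illustration; your epub may name its stylesheet differently. It appends the rule at the end of the stylesheet, which is still after the @import lines.

```python
import zipfile

RULE = "\nruby rt { visibility: hidden; }\n"

def hide_furigana(src_path, dst_path, css_name="book-style.css"):
    """Copy an EPUB, appending RULE to every stylesheet named css_name.

    Returns the number of stylesheets that were patched.
    """
    patched = 0
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        for info in src.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            data = src.read(info.filename)
            if info.filename.split("/")[-1] == css_name:
                data += RULE.encode("utf-8")
                patched += 1
            # The EPUB 'mimetype' entry must stay uncompressed; compress the rest.
            method = (zipfile.ZIP_STORED if info.filename == "mimetype"
                      else zipfile.ZIP_DEFLATED)
            dst.writestr(info.filename, data, compress_type=method)
    return patched
```

Calling `hide_furigana("book.epub", "book-nofurigana.epub")` (hypothetical file names) writes a patched copy and leaves the original untouched. If it returns 0, the stylesheet has a different name and you need to pass it as `css_name`.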

Simonsays · October 21, 2020, 1:41pm

Thanks for these links! They might help me, although I am still confused about the hiragana attached to kanji. Thanks!

azarya · March 30, 2021, 11:17am

Brilliant, thank you!

kraemder · March 31, 2021, 1:31am

I got some help from the forums years ago to do this using Calibre’s search and replace feature for Kindle books after I removed the DRM. It works really well, and I have it saved somewhere; if anyone’s interested, I can find it.

azarya · March 31, 2021, 8:59am

I’m interested!

meraleigh · April 10, 2021, 8:54pm

Added an easier method above in the original post.

azarya · April 11, 2021, 8:15am

Thanks a lot, I’ll try it out!

azarya · November 21, 2021, 1:55pm

Lately, with books I import using this method, the conversion removes not only the furigana but also some of the kanji (though never the first kanji of a word). Has anyone encountered this, or does anyone know a fix?

meraleigh · December 4, 2021, 6:13pm

Are you adding \1 in the replace section? If you don’t, you will lose some kanji as well.

azarya · December 5, 2021, 5:51pm

Yes, I am. The only thing that changed is that I started reading light novels. As far as I can see, it only happens with light novels (though I haven’t tested extensively), and it happens with all of the light novels I have tried (about 5). I don’t know if that has something to do with it.

I’ve just found a workaround for it, which I’ll post here in case anyone else has the same issue:

Make an EPUB of the file if it’s not already an EPUB (leave the furigana in the file).
Drop the EPUB into the online ereader: https://ttu-ebook.web.app/
Switch off furigana in the ereader.
Select all the text and copy and paste it into a Word document.

That’s it.

I still add the following steps to get a good layout:

Add extra line breaks by dropping the text here: Add Line Breaks
Copy the new text and paste it into Word again.
Change the font and size if needed.
Save the document and import it into the LingQ 5.0 importer (better layout than LingQ 4.0).

storercd · April 18, 2022, 8:01pm

I had a similar issue with a bit of ruby text in a book where there were multiple furigana in a single word. Throw the above formula into a regex tester and you’ll see it. Here is a clip of text to test it on (the below text uses a different style with rb and rt, but use your imagination here to see how it would adapt):

そういいたいのに、言こと葉ばがのどにつかえたまま

So the ruby bracket opens, two kanji are defined, ruby tag closes. This regex will grab that first kanji and discard the other(s), thus, data loss. I couldn’t solve this with a single regex (because I’m just not that good at regex). But I did limp it along by stripping the tags entirely, then processing the other sections independently. The formulas stack one after the other and look like this:

regex: <ruby>(.*?)</ruby>
replace with: \1

regex: (<rb>(.*?)</rb><rt>.*?</rt>)
replace with: \2

Worked like a charm!
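
To make the data loss and the two-pass fix concrete, here is a small Python sketch. The markup for the clip is my guess at the rb/rt style described above, and the patterns are written with the opening tags I assume were intended, so treat it as an illustration rather than the exact rules.

```python
import re

# Hypothetical markup for the clip: one <ruby> element containing two kanji,
# each with its own <rb> base and <rt> reading.
clip = ("そういいたいのに、<ruby><rb>言</rb><rt>こと</rt>"
        "<rb>葉</rb><rt>ば</rt></ruby>がのどにつかえたまま")

# Single pass (assumed pattern): group 1 stops at the first <rt>, and the rest
# of the element, including the second kanji, is swallowed by the second .*?
one_pass = re.sub(r"<ruby>(.*?)<rt>.*?</rt></ruby>", r"\1", clip)

# Two passes: strip the <ruby> wrapper first, then handle each rb/rt pair.
step1 = re.sub(r"<ruby>(.*?)</ruby>", r"\1", clip)
step2 = re.sub(r"(<rb>(.*?)</rb><rt>.*?</rt>)", r"\2", step1)

print(one_pass)  # そういいたいのに、<rb>言</rb>がのどにつかえたまま
print(step2)     # そういいたいのに、言葉がのどにつかえたまま
```

The second pass only works if the book actually uses rb tags. A book that writes the base text directly inside the ruby element would need a different second rule.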

azarya · April 24, 2022, 7:47am

@storercd
This didn’t really work for me. I ran the two regexes you mention through Calibre, and it did seem to remove the furigana markup, but the furigana was then displayed as regular text.

The result of this as per your example: そういいたいのに、言葉がのどにつかえたまま

Then becomes this: そういいたいのに、言こと葉ばがのどにつかえたまま

This removes the furigana, but puts it in the actual text.

Either I did something wrong, or this is your intention?

storercd · April 24, 2022, 10:55pm

No, it was definitely not my intention to put it into the actual text. It seems like the furigana is coded differently in different books. Are you able to see the original text in code form? Under the Search and Replace text, my version of Calibre has a wizard that lets you see the text code of the book; it might show what makes each book a bit different.

I wonder if someone made a tool to strip this out of ebooks (before or after they are converted), it would save us all a lot of trouble.

deltaflops · November 25, 2022, 8:39am

I had the same problem where the furigana just showed up in the converted text.
The original ebook file contained newlines between each tag, so the regex did not match.

What solved this for me was to split the regex into multiple rules and add them independently:

SEARCH: ((.|\n)*?) REPLACE: (leave empty)
SEARCH: REPLACE: (leave empty)
SEARCH: REPLACE: (leave empty)
You could probably combine the last two rules, but this worked for me.
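
A quick Python sketch of the newline problem, for anyone who wants to test it outside Calibre. The sample markup and the tag-bearing patterns are assumptions on my part, since the rules above are displayed without their tags. The point is that `.` does not match a newline by default, while `(.|\n)` does.

```python
import re

# Made-up sample where every tag sits on its own line.
sample = "<ruby>\n言\n<rt>こと</rt>\n</ruby>"

# Assumed single-rule pattern: '.' cannot cross the newlines, so nothing matches
# and the text comes back unchanged.
single = re.sub(r"<ruby>(.*?)<rt>.*?</rt></ruby>", r"\1", sample)

# Split rules: remove each <rt>...</rt> (newlines allowed inside), then drop
# the <ruby> and </ruby> tags themselves.
step1 = re.sub(r"<rt>((.|\n)*?)</rt>", "", sample)
step2 = re.sub(r"</?ruby>", "", step1)

print(single == sample)  # True
print(repr(step2))
```

Here the last two rules are combined into one with `</?ruby>`. Note that the newlines that surrounded the tags stay in the output.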

nfzvyvbj7s · March 16, 2024, 6:25pm

Hi, it seems the Regex statements are not displayed complete here (e.g. deltaflops’ post has the 2nd and 3rd search statements empty and meraleigh’s new solution just asks to search for (.?).?).
Would appreciate if someone could please point me to the right regex statements to use with Calibre?

nfzvyvbj7s · March 23, 2024, 3:53pm

So I have been experimenting myself a bit and this is the best I have come up with so far:

Search “<#ruby>(.*?)<#rt>.*?</#rt></#ruby>” and replace with “\1”
Search “<#rt>((.|\n)*?)</#rt>” and replace with “”
(Input into Calibre without quotes “” and without hashtags # … only put them in there for the tags to show up in the forum software).

This seems to remove the furigana but also introduces some occasional weird spacing issues. Maybe there is still a chance to revive this thread and get some expert advice (@deltaflops)?