June 03 at 20:41

I imported the subtitles of a few episodes from Netflix. It took a few attempts because the LingQ extension complained that it found no subtitles, but after that it was smooth sailing, which is good.

However, at least one HTML code was not stripped, which is a bit annoying. For example, ‎ is not stripped, which results in sentences like this:


This happens a lot in programmes for children, where the pronunciation of kanji is put in parentheses, but not only there. I understand that stripping out the parentheses, or even interpreting them in some way, would be a step too far and would cause more problems than it solves, but would it be possible to strip out those codes?
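
To illustrate the kind of clean-up I mean, here is a minimal Python sketch. It assumes the leftover codes are HTML entities for invisible directional marks such as `&lrm;`, which is my guess and not something I have confirmed. The function name `clean_subtitle_line` and the sample text are hypothetical and have nothing to do with how LingQ actually processes subtitles.

```python
import html
import re

# Invisible directional marks to drop (assumed targets):
# U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK.
_INVISIBLE_MARKS = re.compile("[\u200e\u200f]")


def clean_subtitle_line(text: str) -> str:
    """Decode HTML entities, then remove invisible directional marks.

    Parentheses and the readings inside them are deliberately left alone.
    """
    # html.unescape turns entities such as "&lrm;" or "&amp;" into characters.
    decoded = html.unescape(text)
    # Remove the now-decoded invisible marks; everything else is kept as-is.
    return _INVISIBLE_MARKS.sub("", decoded)


# Hypothetical subtitle line with a leftover entity at the start.
print(clean_subtitle_line("&lrm;(かんじ)漢字"))
```

Decoding first and stripping second means it would not matter whether the mark arrives as a named entity, a numeric entity, or the raw character, and ordinary entities such as `&amp;` would still come out readable.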