&lrm; – tags not stripped

bartvb · June 3, 2022, 8:41pm

I imported the subtitles of a few episodes from Netflix. It took a few attempts because the LingQ extension complained it found no subtitles, but after trying a few times, it was smooth sailing, which is good.
However, at least one HTML code was not stripped, which is a bit annoying. For example, &lrm; is not stripped, which results in sentences like this:
お気にいりの&lrm;歯(は)&lrm;なのに
This happens a lot in programmes for children, where the pronunciation of kanji is put in parentheses, but not only there. I understand that stripping out these parentheses or even interpreting them in some way would be a step too far and would cause more problems than it solves, but would it be possible to strip out those codes?

zoran · June 5, 2022, 2:29am

I’ll check with our developers, I think we should be able to do that. We’ll look into it.

bartvb · June 5, 2022, 3:36am

That would be a significant improvement, Zoran. Thank you!

joca20 · June 25, 2022, 10:57pm

I would appreciate this as well! I’ve recently tried getting Netflix subtitles, but the sentences brought in from Japanese shows are too hard to read with the &lrm; codes in them.

zoran · June 26, 2022, 1:31am

@joca20 we are looking into this. Can you please send that Netflix show URL to support(at)lingq.com? Thanks!

azarya · June 26, 2022, 8:13am

I don’t think I’ve ever imported a show where this does not happen.

zoran · June 27, 2022, 3:37pm

Thanks, our developers are working on it and I hope we will have it solved soon.

rafarafa · June 28, 2022, 11:41pm

Pretty much. This bug has been there since before the 5.0 launch. There’s even a bug report in the ‘LingQ Librarians’ forum room dating from 7 months back by user spidersylar, and I remember that I had already experienced the bug before reading it there.
I guess adding one line of regex rightfully deserves 7 months, and I hope you can tell how excited I am to have to wait another 7 months for you to fix the boldface bug that I reported 1 month ago (which, incidentally, is nothing but a particular case of a formatting bug that I reported in the 5.0 beta… so technically we are already at the 7-month mark).

zoran · June 29, 2022, 3:04am

We will push a fix for this within the next 2-3 days. Thanks for your patience.

pezcharles · February 15, 2024, 4:51pm

Hi! I’ve imported a Netflix show with Japanese subtitles (Alice in Borderland, S01E01) and all the lines in LingQ start with the tag “&lrm;”. This is considerably frustrating, especially when text-to-speech is added. Is there any fix? Thanks!

zoran · February 15, 2024, 10:25pm

That seems to be an issue with the subtitle format for some videos. Unfortunately, there is nothing we can do about it at the moment.

roosterburton · February 16, 2024, 12:35am

&lrm; is the HTML entity for the left-to-right mark (LRM, U+200E), an invisible formatting character. It just tells the renderer to treat the adjacent text as running left to right; the right-to-left counterpart is &rlm; (RLM, U+200F).

My guess is that LingQ checks for and removes RLM but not LRM, because… most websites don’t bother with this marker. If it affects Netflix, that is another story, and it would be worth reconsidering your stripping function.

    // Remove the HTML entity for the Left-to-Right Mark (LRM)
    cleanedText = cleanedText.replace(/&lrm;/g, "");
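
The one-line replace above only catches the literal `&lrm;` entity. A slightly broader sketch might also cover `&rlm;`, the numeric entity forms, and the raw invisible characters themselves, in case a subtitle file ships those instead. This is only an illustration: `stripBidiMarks` and the constant names are hypothetical and are not LingQ's actual code, and I'm assuming the subtitle text arrives as a plain string.

```javascript
// Hypothetical helper for illustration; not taken from LingQ's codebase.

// Named HTML entities for the left-to-right and right-to-left marks.
const BIDI_NAMED_ENTITIES = /&(?:lrm|rlm);/g;

// Numeric forms of the same two marks: &#8206; / &#8207; (decimal)
// and &#x200E; / &#x200F; (hex, matched case-insensitively).
const BIDI_NUMERIC_ENTITIES = /&#(?:8206|8207|x200[ef]);/gi;

// The raw characters: LRM, RLM, the embedding/override controls
// (U+202A to U+202E) and the isolate controls (U+2066 to U+2069).
const BIDI_CHARS = /[\u200E\u200F\u202A-\u202E\u2066-\u2069]/g;

function stripBidiMarks(text) {
  return text
    .replace(BIDI_NAMED_ENTITIES, "")
    .replace(BIDI_NUMERIC_ENTITIES, "")
    .replace(BIDI_CHARS, "");
}

// The sentence from the first post, with the entities written out:
console.log(stripBidiMarks("お気にいりの&lrm;歯(は)&lrm;なのに"));
// prints "お気にいりの歯(は)なのに"
```

Dropping these marks should be harmless for Japanese text, but for lessons that mix right-to-left and left-to-right scripts it could change how a line is displayed, so it might make sense to apply it per language.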

From another post, I saw that there is a Find/Replace feature at the bottom of the Lesson Editor screen, which you could use to remove those tags in bulk.