5.8.0, 5.9.0 Beta 2
QUrl.toDisplayString() with the default QUrl::PrettyDecoded seems to do nothing about IDN homograph attacks.
There are various interesting approaches for those:
- Mixing Cyrillic lookalike characters with Latin ones, such as http://xn--pple-43d.com/ with a Cyrillic a-lookalike and ASCII "pple" (looks like apple.com)
- Using Cyrillic lookalikes only, like http://www.xn--e1awd7f.com/ (looks like epic.com)
- Using lookalikes for slashes or dots to do other funny stuff.
Browsers typically deal with the first variant by showing the Punycode form instead when a label mixes scripts which aren't commonly mixed.
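
To illustrate the mixed-script fallback, here is a minimal Python sketch (standard library only, not Qt code and not what any browser actually ships). The helper names `decode_label`, `scripts_in` and `display_host` are made up for this example, and the script of a character is only approximated by the first word of its Unicode name. Unlike real browsers, it flags any mix of scripts, including combinations which are commonly mixed.

```python
import unicodedata

ACE_PREFIX = "xn--"


def decode_label(label: str) -> str:
    """Decode one Punycode (ACE) label to Unicode; other labels pass through."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def scripts_in(text: str) -> set:
    """Approximate each letter's script by the first word of its Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }


def display_host(host: str) -> str:
    """Return the decoded host, keeping Punycode for labels that mix scripts."""
    shown = []
    for label in host.split("."):
        decoded = decode_label(label)
        mixed = len(scripts_in(decoded)) > 1
        shown.append(label if mixed else decoded)
    return ".".join(shown)
```

With this, `display_host("xn--pple-43d.com")` keeps the Punycode label because it mixes Cyrillic and Latin letters, while `display_host("www.xn--e1awd7f.com")` is still decoded, since that label is Cyrillic only.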
Doing something against the second one is notably more difficult, as the label might be a valid Cyrillic word. Chromium currently simply blacklists domains which consist only of lookalike chars when they are used with a non-internationalized TLD. Firefox is still debating what to do.
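
A rough Python sketch of that rule as described here (not Chromium's actual implementation) could look like the following. The `LATIN_LOOKALIKES` set is a deliberately tiny, hypothetical sample of Cyrillic letters, and the function names are made up for this example; a real implementation would need proper confusables data, for example from ICU.

```python
# Hypothetical, incomplete sample of Cyrillic letters which resemble Latin ones
# (a, e, i, j, o, p, c, y, x, s).
LATIN_LOOKALIKES = set(
    "\u0430\u0435\u0456\u0458\u043e\u0440\u0441\u0443\u0445\u0455"
)


def decode_ace_label(label: str) -> str:
    """Decode one Punycode (ACE) label to Unicode; other labels pass through."""
    if label.lower().startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


def is_whole_label_lookalike(decoded: str) -> bool:
    """True if every character of the label is in the lookalike sample."""
    return bool(decoded) and all(ch in LATIN_LOOKALIKES for ch in decoded)


def display_host_lookalike_check(host: str) -> str:
    """Keep Punycode for lookalike-only labels under a non-internationalized TLD."""
    labels = host.split(".")
    tld_is_ascii = not labels[-1].lower().startswith("xn--")
    shown = []
    for label in labels:
        decoded = decode_ace_label(label)
        suspicious = tld_is_ascii and is_whole_label_lookalike(decoded)
        shown.append(label if suspicious else decoded)
    return ".".join(shown)
```

Here `www.xn--e1awd7f.com` stays in Punycode because the label consists only of lookalikes and `.com` is not internationalized, while the same label under an internationalized TLD such as `xn--p1ai` is decoded.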
I think at the very least, QUrl should fall back to the Punycode form for the mixed-script case (maybe using ICU's uspoof API and doing what Chromium does?). It would be nice to fix the more difficult case too, though.
I think this should be the default behavior with PrettyDecoded (which is documented as "The exact behavior of PrettyDecoded varies from component to component and may also change from Qt release to Qt release.") and there should be a flag to turn it off (AlwaysDecodePunycode or so).
Also see https://github.com/qutebrowser/qutebrowser/issues/2547 for some more thoughts and resources.