Description
Qt tries to filter out directory entries that might not be representable in UCS-2 ( http://code.qt.io/cgit/qt/qtbase.git/commit/?h=5.11.2&id=094869d4a88c3d0187d2f9c03294ce32f3503533 ).
This is unfortunately too aggressive: it does not take into account that there are several valid encodings for a string, and the check only tests for one encoding form.
E.g. umlauts (ÄÖÜöäü) can be represented in multiple ways (see https://unicode.org/reports/tr15/#Norm_Forms ), and a file can be opened using either encoding on macOS.
macOS treats both encodings as identical, e.g. you can't create two files whose names differ only in their encoding.
E.g. the file name "öäüÖÄÜß_umlauts.png" can be represented like this:
00000000  6f cc 88 61 cc 88 75 cc  88 4f cc 88 41 cc 88 55  |o..a..u..O..A..U|
00000010  cc 88 c3 9f 5f 75 6d 6c  61 75 74 73 2e 70 6e 67  |...._umlauts.png|
00000020  0a                                                |.|
00000021
and
00000000  c3 b6 c3 a4 c3 bc c3 96  c3 84 c3 9c c3 9f 5f 75  |.............._u|
00000010  6d 6c 61 75 74 73 2e 70  6e 67 0a                 |mlauts.png.|
The first encoding still appears in the dir entry list; the second one is filtered. Both files can be opened from Qt.
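The two dumps above look like the decomposed (NFD) and composed (NFC) normalization forms of the same name. A minimal Python sketch to check this, assuming the file name from this report and using only the standard `unicodedata` module (the variable names are illustrative, not taken from Qt):

```python
import unicodedata

# The file name from the report, written with escapes so the source file's
# own normalization cannot change the result.
name = "\u00f6\u00e4\u00fc\u00d6\u00c4\u00dc\u00df_umlauts.png"

nfd = unicodedata.normalize("NFD", name)  # base letter + combining diaeresis
nfc = unicodedata.normalize("NFC", name)  # precomposed letters

nfd_bytes = nfd.encode("utf-8")
nfc_bytes = nfc.encode("utf-8")

# Print the UTF-8 bytes of each form for comparison with the hexdumps
# (the dumps additionally end with the newline, 0a, printed by ls).
print("NFD:", nfd_bytes.hex(" "))
print("NFC:", nfc_bytes.hex(" "))

# Different byte sequences, but equal once brought to a common form.
print(nfd_bytes != nfc_bytes)
print(unicodedata.normalize("NFC", nfd) == nfc)
```

The NFD form is the longer one because each umlaut becomes a base letter followed by U+0308 (bytes cc 88); ß has no decomposition and is c3 9f in both forms.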
APFS does not enforce normalization, according to this blog post (https://mjtsai.com/blog/2017/03/24/apfss-bag-of-bytes-filenames/ ):
- The Apple engineer’s reply is not very helpful because it’s not clear what the “correct Normalization routines” are. If APFS is not normalized, then there really is no canonical form that you can expect to find on disk. Your code has to pick one and use it consistently. Cocoa has four different methods for normalizing strings.
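As an illustration of "pick one and use it consistently", the following Python sketch looks up a name in a directory listing after bringing both sides to the same normalization form. The helper `find_entry` and its choice of NFC as the default are assumptions made for this example; this is not how Qt or macOS implement the lookup.

```python
import unicodedata

def find_entry(entries, wanted, form="NFC"):
    """Return the on-disk spelling of `wanted`, or None if it is not listed.

    Hypothetical helper: both the wanted name and every entry are normalized
    to the same form before comparing, so NFC and NFD spellings match.
    """
    key = unicodedata.normalize(form, wanted)
    for entry in entries:
        if unicodedata.normalize(form, entry) == key:
            return entry
    return None

# A listing that contains the decomposed spelling of "ö.png".
listing = ["o\u0308.png", "readme.txt"]

# Looking it up with the composed spelling still finds the entry.
print(find_entry(listing, "\u00f6.png") == "o\u0308.png")
```

Returning the entry as stored, instead of the normalized key, keeps the original bytes available for the actual open call.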
Reproduce:
If you create or rename a file in Finder, you get the longer encoding; if you create a file from the command line, you usually get the shorter one.
Create a file from the command line:
echo "test" > äöü.txt
ls äöü.txt | hexdump -C
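To see which form each file in a directory actually uses, a small diagnostic like the one below can complement the hexdump. It is a sketch under assumptions: the function names `normalization_form` and `report` are made up for this example, and it requires Python 3.8 or later for `bytes.hex` with a separator.

```python
import os
import unicodedata

def normalization_form(name):
    """Classify a file name as "NFC", "NFD", "NFC+NFD" or "mixed"."""
    is_nfc = unicodedata.normalize("NFC", name) == name
    is_nfd = unicodedata.normalize("NFD", name) == name
    if is_nfc and is_nfd:
        return "NFC+NFD"  # e.g. plain ASCII, identical in both forms
    if is_nfc:
        return "NFC"
    if is_nfd:
        return "NFD"
    return "mixed"  # composed and decomposed characters in one name

def report(directory="."):
    """Print the form, raw bytes and name of every entry in `directory`."""
    for entry in sorted(os.listdir(directory)):
        raw = os.fsencode(entry)  # bytes as the OS reported them
        print(f"{normalization_form(entry):8} {raw.hex(' ')}  {entry}")

if __name__ == "__main__":
    report(".")
```

Running it in the directory used for the reproduction should show whether the Finder-created and command-line-created names differ in form on a given system.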