Qt / QTBUG-70732

QDir::entryList hides file names with unexpected normalisation on macOS

Details

    • Bug
    • Resolution: Done
    • P2: Important
    • 5.12.5, 5.13.1
    • 5.11.2
    • Core: I/O
    • None
    • macOS
    • d01693733f6c1ebe6b3709f9c1284239ce3b5354
    • Bug Fixing Week Q2/2020

    Description

      Qt tries to filter out directory entries that might not be representable in UCS2 (see http://code.qt.io/cgit/qt/qtbase.git/commit/?h=5.11.2&id=094869d4a88c3d0187d2f9c03294ce32f3503533 ).

      Unfortunately, this is too aggressive: it does not take into account that there are several valid encodings for the same string, and the filter only tests for one encoding form.

      E.g. umlauts (ÄÖÜöäü) can be represented in multiple ways (see https://unicode.org/reports/tr15/#Norm_Forms ), and a file can be opened using either encoding on macOS.

      macOS treats both encodings as identical. E.g. you can't create two files whose names differ only in their encoding.

      E.g. the file name "öäüÖÄÜß_umlauts.png" can be represented like this: 

      00000000  6f cc 88 61 cc 88 75 cc  88 4f cc 88 41 cc 88 55  |o..a..u..O..A..U|
      00000010  cc 88 c3 9f 5f 75 6d 6c  61 75 74 73 2e 70 6e 67  |...._umlauts.png|
      00000020  0a                                                |.|
      00000021
      

      and

      00000000  c3 b6 c3 a4 c3 bc c3 96  c3 84 c3 9c c3 9f 5f 75  |.............._u|
      00000010  6d 6c 61 75 74 73 2e 70  6e 67 0a                 |mlauts.png.|
      

      The first (decomposed) encoding still appears in the directory entry list; the second (precomposed) one is filtered out. Both files can be opened from Qt.
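The two hex dumps above can be reproduced with Unicode normalization. The following is a minimal sketch in Python (not Qt code), assuming the first dump is the NFD form and the second the NFC form of the same name; the variable names are illustrative only.

```python
import unicodedata

# "öäüÖÄÜß_umlauts.png", written with escapes so that the editor
# cannot silently change the normalization form of the literal.
name = "\u00f6\u00e4\u00fc\u00d6\u00c4\u00dc\u00df_umlauts.png"

nfd = unicodedata.normalize("NFD", name)  # base letter + combining diaeresis
nfc = unicodedata.normalize("NFC", name)  # precomposed letters

# UTF-8 bytes of each form, in the same layout as the hexdump output.
print(nfd.encode("utf-8").hex(" "))
print(nfc.encode("utf-8").hex(" "))

# The strings differ code point by code point, but are canonically equivalent.
print(nfd == nfc)
print(unicodedata.normalize("NFC", nfd) == nfc)
```

The longer byte sequence uses `o`, `a`, `u` and so on followed by the combining diaeresis (`cc 88`), while the shorter one uses precomposed characters such as `c3 b6`. The `ß` (`c3 9f`) is the same in both forms.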

       

      APFS does not enforce normalization, according to this blog post (https://mjtsai.com/blog/2017/03/24/apfss-bag-of-bytes-filenames/ ):

      "The Apple engineer’s reply is not very helpful because it’s not clear what the “correct Normalization routines” are. If APFS is not normalized, then there really is no canonical form that you can expect to find on disk. Your code has to pick one and use it consistently. Cocoa has four different methods for normalizing strings."
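One way to "pick one and use it consistently" is to normalize both sides before comparing file names. The sketch below is an illustration in Python, not the fix that went into Qt; `same_file_name` and `find_entry` are hypothetical helper names, and NFC is an arbitrary choice of form.

```python
import unicodedata


def same_file_name(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two names after bringing both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def find_entry(entries, wanted, form="NFC"):
    """Return the on-disk spelling of `wanted` from a directory listing, or None."""
    key = unicodedata.normalize(form, wanted)
    for entry in entries:
        if unicodedata.normalize(form, entry) == key:
            return entry
    return None


# A listing that contains the decomposed spelling still matches a
# precomposed query, and the on-disk spelling is returned unchanged.
listing = ["o\u0308.txt", "readme.txt"]
match = find_entry(listing, "\u00f6.txt")
print(match is not None)
```

Returning the entry exactly as listed, instead of the normalized key, keeps the name usable on file systems that do distinguish the two forms.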

       

      Reproduce:

      If you create or rename a file in Finder you get the longer encoding; if you create a file from the command line you usually get the shorter one.

      Create a file from the command line:

      echo "test" > äöü.txt
      ls äöü.txt | hexdump -C
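Since the shell result depends on how the terminal encodes the typed name, a script that writes both forms explicitly may be easier to reason about. This is a minimal sketch in Python under that assumption; `create_and_list` is a hypothetical helper, and the number of resulting entries depends on the file system.

```python
import os
import tempfile
import unicodedata

BASE = "\u00e4\u00f6\u00fc.txt"  # "äöü.txt"


def create_and_list(directory):
    """Create the file under its NFC and NFD names, then list the directory."""
    for form in ("NFC", "NFD"):
        name = unicodedata.normalize(form, BASE)
        with open(os.path.join(directory, name), "w", encoding="utf-8") as f:
            f.write("test\n")
    return sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as tmp:
    entries = create_and_list(tmp)
    # Print the raw bytes of every entry, similar to `ls | hexdump -C`.
    for entry in entries:
        print(os.fsencode(entry).hex(" "))
```

On a file system that treats both forms as the same name, the second `open` reuses the existing file and only one entry is listed. On one that compares names byte by byte, two entries appear.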

       

       


          People

            thiago Thiago Macieira
            andreas.loew andreas.loew