  1. Qt
  2. QTBUG-59619

investigate memory deduplication (KSM on Linux)

Details

    • Type: Suggestion
    • Resolution: Unresolved
    • Priority: P4: Low
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: GUI: Painting
    • Labels: None
    • Platform/s: Linux

    Description

      I've been thinking for a long time that, because developers tend not to rely on a single shared instance of a library across multiple applications, memory could still be saved by deduplicating the parts that happen to be identical. App-bundle packaging technologies such as .app bundles on macOS, APKs that include the Qt libraries on Android, and snaps (and many others) on Linux encourage wasting disk space by duplicating shared libraries, which effectively ensures that they are not used as shared libraries. Other developers choose to make static builds. In the limit, if every app were static (or used its own copies of the shared libraries) and you ran 10 Qt-based applications at once, you would waste memory by loading 10 copies of significant portions of Qt.

      If I try to do better by using my Linux distro-provided Qt for all the normal apps, but also install an SDK downloaded from our servers, then Creator runs against a different copy of Qt, which may be only subtly different. In general, multiple apps may use multiple versions of Qt, and yet much of the code may be similar between versions. Whether this similarity could be discovered by blind bitwise comparison depends on the memory layout and alignment; I don't know, so this is a suggestion to investigate further.
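One way to start that investigation might be to measure how many page-sized blocks of one library image also appear, byte for byte, in another. The following is only a minimal sketch of such a "blind bitwise comparison": the function name `count_duplicate_pages` is hypothetical, the caller is assumed to have already read both images into memory, and the quadratic scan is chosen for clarity rather than speed. It also compares raw bytes only, so it says nothing about what relocation does to pages after loading.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count how many whole pages of buffer `a` have a
 * byte-identical page somewhere in buffer `b`. Only blocks that start on a
 * multiple of `page_size` are compared, which mirrors how a page-based
 * deduplicator sees memory. Trailing partial pages are ignored. */
size_t count_duplicate_pages(const unsigned char *a, size_t a_len,
                             const unsigned char *b, size_t b_len,
                             size_t page_size)
{
    size_t matches = 0;

    assert(a != NULL && b != NULL);
    if (page_size == 0)
        return 0;

    for (size_t i = 0; i + page_size <= a_len; i += page_size) {
        for (size_t j = 0; j + page_size <= b_len; j += page_size) {
            if (memcmp(a + i, b + j, page_size) == 0) {
                ++matches;   /* page i of `a` could share storage with page j of `b` */
                break;       /* count each page of `a` at most once */
            }
        }
    }
    return matches;
}
```

With a page size of 4096 this could be run over two builds of the same Qt library. Note that if identical code is shifted by even one byte between the two builds, no page-aligned block matches any more, which is the alignment concern raised above.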

      Public APIs don't change as much, so if the public code ends up separate in memory from the private class implementations, it would help. Putting string constants into string tables with deterministic layout might help.

      There's been recent work on explicit sharing of images between multiple processes. But that is only the tip of the iceberg of what could be shared, which matters most on embedded systems where memory is tight. We should be finding ways to work smart and share more automatically, rather than having to write code to explicitly share just one type of data.

      It turns out that virtualization has prompted the invention of KSM: Kernel Samepage Merging (originally known as Kernel Shared Memory).

      https://www.ibm.com/developerworks/library/l-kernel-shared-memory/

      says "The KSM application programming interface (API) is implemented through the madvise system call (see Listing 1) and a new advice parameter called MADV_MERGEABLE (which indicates that the defined region is mergeable)." So maybe we would need to do that? Or is it supposed to be up to the compiler to mark "text" pages and static const data as mergeable?
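As a rough illustration of the madvise route, the sketch below maps a private anonymous region and marks it with MADV_MERGEABLE. The helper name `alloc_mergeable` is hypothetical and not part of Qt or of any library. Some caveats that bear on the question above: KSM only scans regions that a process has opted in this way, and it merges anonymous private pages only, not pages that are still backed by a file such as unmodified library text mapped from a .so. The call fails with EINVAL on kernels built without CONFIG_KSM, so the sketch treats the advice as optional.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS and madvise() with glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `len` bytes of private anonymous memory and ask
 * the kernel to consider it for same-page merging.
 * Returns the mapping, or NULL if mmap() failed.
 * *advised is set to 1 if the kernel accepted MADV_MERGEABLE, else 0
 * (for example when the kernel was built without CONFIG_KSM). */
void *alloc_mergeable(size_t len, int *advised)
{
    assert(len > 0 && advised != NULL);

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

#ifdef MADV_MERGEABLE
    /* The advice is a hint only; the memory stays usable if it is refused. */
    *advised = (madvise(p, len, MADV_MERGEABLE) == 0);
#else
    *advised = 0;   /* headers too old to know about KSM */
#endif
    return p;
}
```

Even when the advice is accepted, nothing is merged until the ksmd kernel thread is running (enabled by writing 1 to /sys/kernel/mm/ksm/run); /sys/kernel/mm/ksm/pages_sharing then reports how much sharing is actually taking place. Release the region with munmap() as usual.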

      I think the next step would be for the kernel (or the runtime linker, or both in cooperation) to deduplicate functions that have identical implementations, rather than blocks of memory. But I haven't seen any evidence that someone has started such a project, so of course it's not up to us yet.

          People

            Assignee: sletta Gunnar Sletta
            Reporter: srutledg Shawn Rutledge
            Votes: 0
            Watchers: 4
