Details
- Task
- Resolution: Unresolved
- P3: Somewhat important
- None
- 6.9
- None
Description
The problem
Our current API workflow for screen/window capturing is very strict:
- Assume window/screen capturing is always possible on the current device
- Enumerate windows/screens that are available
- Select a specific window/screen, and capture that
However, this workflow does not work on the following platforms: macOS*, Linux+Wayland, Web, Android. These platforms do not allow us to enumerate windows/screens at all. Instead, we can ask the operating system to show the end user a dialog and let the user decide what portion of the screen/window to capture. What is actually captured is then opaque to our application logic. This dialog is often called a "content-picker".
The current API does not support this workflow well, and does not give our customers enough information to build UIs that accommodate it. Currently, we sometimes expose screens/windows as available even though they are never used. Instead, we open the content-picker whenever "setActive(true)" is called. In many cases, we report ScreenCapture.active as true even while the user is still selecting content, or after the user aborts the capture request. This behavior is currently buggy most of the time.
Developers of Qt applications have no cross-platform API to determine ahead of time whether to expect a list of windows, a system content-picker, or no capture support at all. Common applications such as Discord or Microsoft Teams tend to rely on the content-picker, and if the system has no content-picker, they create their own UI that allows the user to select the content. With the current Qt public APIs, this is not possible.
We should extend our current APIs to support these workflows. This should include exposing whether the system prefers a content-picker, and letting the Qt developer choose between relying on the content-picker and selecting a window themselves ahead of time. The result should be a clear workflow for how Qt developers build their apps to support screen capturing.
Some additional notes:
- We currently have no support for capturing system audio, although several platforms support it (macOS, Web, Android). The APIs for this are commonly part of the content-sharing APIs we use for video recording.
- iOS only supports recording the current application, through the ReplayKit API. This means we would only be able to capture the Qt application that is currently running.
*macOS supports both workflows, but other apps seem to prefer using the system picker.
Use cases that should be considered:
- macOS supports enumerating windows, screens, and applications ahead of time through ScreenCaptureKit. Enumerating them requires explicit permission, granted by the user through a system dialog (similar to microphone/camera permissions). It also supports a "content-sharing-picker", demonstrated in the attached file 'macos_screencapture.mp4'. When the app asks the OS to start a capture, the end user is shown a large purple overlay where they can choose to capture the currently visible window, the entire screen, or an application, or cancel the capture request.
  - While the overlay is present, the user cannot interact with the application.
  - The ScreenCaptureKit documentation seems to imply that invoking the content-picker does not block any threads: developer.apple.com/documentation/screencapturekit/sccontentsharingpickerobserver?language=objc
- Linux X11 supports enumerating windows, but Linux Wayland does not. When testing on Linux GNOME Wayland, the system uses a content-picker. This is demonstrated in the attached file 'Linux_Wayland_Gnome_Contentpicker.mp4'.
  - While the content-picker dialog is present, we cannot interact with the application.
  - Whether content-picking blocks the UI thread needs investigation.
- Web: The Screen Capture API can be demoed using https://progressier.com/pwa-capabilities/screen-capture-desktop. I tested this on Firefox. On all platforms, the browser opens a very small dialog. Depending on the platform, you can then select from a list of windows/screens, or choose "Use operating system settings". The latter seems to rely on a system content-picker.
  - On macOS, this opened the purple ScreenCaptureKit overlay, the same one mentioned above.
  - On Linux Wayland GNOME, this opened the GNOME content-picker dialog, the same one mentioned above.
  - On Windows, the browser provided a list of windows and displays to choose from. No content-picker.
  - The Web Screen Capture API is mostly promise-based, so I don't think it blocks the UI thread. https://developer.mozilla.org/en-US/docs/Web/API/Screen_Capture_API/Using_Screen_Capture
- Android: When we ask the OS to start a screen-capturing session, the OS presents a dialog to the end user, asking them to select a display or an app to capture. Alternatively, the user can cancel. An attached video shows this behavior. Note that a few seconds are missing at the start, but the dialog was triggered by calling QScreenCapture.start(). When the user confirms the selection, the capture session begins immediately and we receive video frames.
  - While the dialog is present, the user cannot interact with the application. Pressing outside the dialog aborts the capture request.
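To make the expected "active" semantics concrete, here is a minimal sketch in plain C++ of how a capture request could move through states when a content-picker is involved. Every name in it (CaptureState, CaptureEvent, nextState, isActive) is hypothetical and not part of any Qt API. It only illustrates the idea that a request is not "active" while the picker is open or after the user aborts.

```cpp
#include <cassert>

// Hypothetical states of a capture request; none of these names exist in Qt.
enum class CaptureState {
    Inactive,          // nothing requested, or the last request was aborted
    AwaitingSelection, // the system content-picker is open
    Active             // the user confirmed a selection, frames are delivered
};

// Hypothetical events, as a platform backend might report them.
enum class CaptureEvent {
    StartRequested,   // the application asked to start capturing
    SelectionMade,    // the user confirmed content in the picker
    SelectionAborted, // the user cancelled or dismissed the picker
    StopRequested     // the application asked to stop capturing
};

// Pure transition function. Events that make no sense in the current
// state leave the state unchanged.
CaptureState nextState(CaptureState state, CaptureEvent event)
{
    switch (state) {
    case CaptureState::Inactive:
        if (event == CaptureEvent::StartRequested)
            return CaptureState::AwaitingSelection;
        break;
    case CaptureState::AwaitingSelection:
        if (event == CaptureEvent::SelectionMade)
            return CaptureState::Active;
        if (event == CaptureEvent::SelectionAborted
            || event == CaptureEvent::StopRequested)
            return CaptureState::Inactive;
        break;
    case CaptureState::Active:
        if (event == CaptureEvent::StopRequested)
            return CaptureState::Inactive;
        break;
    }
    return state;
}

// What an "active" property would report under this model.
bool isActive(CaptureState state)
{
    return state == CaptureState::Active;
}

// Walks through an aborted request: "active" never becomes true.
void abortedRequestExample()
{
    CaptureState s = CaptureState::Inactive;
    s = nextState(s, CaptureEvent::StartRequested);
    assert(!isActive(s)); // the picker is open, nothing is captured yet
    s = nextState(s, CaptureEvent::SelectionAborted);
    assert(!isActive(s)); // the request was cancelled by the user
}
```

Because the pickers described above do not appear to block the calling thread, a model like this would let a backend report the selection or abort later as an event, and let application UI show a "waiting for selection" state in between.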
Key areas an improved API should address:
- Support other workflows, such as the system content-picker.
  - This should include telling the application developer which workflow to expect, so they can adapt their UI accordingly.
  - On some platforms, we can support multiple workflows. The API should allow the application developer to check which are possible, and then select one.
- Tell the application developer whether capturing is possible at all.
  - Currently there is no way for the application developer to tell if the device does not support window/screen capturing. It is always assumed to be possible, so they cannot write a UI that adapts automatically to the platform.
- Support audio capturing.
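As a rough illustration of the points above, the following plain C++ sketch models a capability query and a workflow selection. All names (CaptureWorkflow, CaptureCapabilities, chooseWorkflow, macOSLike, x11Like, unsupported) are hypothetical and not proposed Qt API. The example values are assumptions taken from the use cases in this ticket, not queried from any system.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical workflow kinds; these names are not part of any Qt API.
enum class CaptureWorkflow {
    None,             // capturing is not possible, or no preference given
    EnumerateSources, // the application lists windows/screens and picks one
    SystemPicker      // the OS shows a content-picker and the user picks
};

// Hypothetical description of what the current platform offers.
struct CaptureCapabilities {
    std::vector<CaptureWorkflow> supported; // empty: capturing is unavailable
    CaptureWorkflow preferred;              // what the platform itself favors
    bool systemAudio;                       // whether system audio can be captured

    bool canCapture() const { return !supported.empty(); }
    bool supports(CaptureWorkflow w) const
    {
        return std::find(supported.begin(), supported.end(), w) != supported.end();
    }
};

// Picks a workflow: the application's request if supported, else the
// platform preference, else the first supported workflow, else None.
CaptureWorkflow chooseWorkflow(const CaptureCapabilities &caps,
                               CaptureWorkflow requested)
{
    // None is a "no workflow" marker and must never be listed as supported.
    assert(!caps.supports(CaptureWorkflow::None));
    if (!caps.canCapture())
        return CaptureWorkflow::None;
    if (caps.supports(requested))
        return requested;
    if (caps.supports(caps.preferred))
        return caps.preferred;
    return caps.supported.front();
}

// Illustrative values only, based on the use cases in this ticket.
CaptureCapabilities macOSLike()
{
    return CaptureCapabilities{
        {CaptureWorkflow::EnumerateSources, CaptureWorkflow::SystemPicker},
        CaptureWorkflow::SystemPicker,
        true};
}

CaptureCapabilities x11Like()
{
    // Audio support on X11 is not discussed in this ticket; false is a placeholder.
    return CaptureCapabilities{{CaptureWorkflow::EnumerateSources},
                               CaptureWorkflow::EnumerateSources,
                               false};
}

CaptureCapabilities unsupported()
{
    return CaptureCapabilities{{}, CaptureWorkflow::None, false};
}
```

With something along these lines, an application could hide its capture button when canCapture() is false, show its own window list when EnumerateSources is chosen, and show only a "Share screen" button when SystemPicker is chosen.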
Attachments
Issue Links
- relates to
  - QTBUG-137545 QScreenCapture captures window instead of screen (Reported)
  - QTBUG-136997 Support window capturing on Linux+Wayland (Reported)
  - QTBUG-136989 Move macOS screen capturing backend to ScreenCaptureKit (Open)
  - QTBUG-136992 Screen/window capturing on Web (Open)