*A collection of research, brainstorming, and smaller todo items in no particular order*
- Better compressed texture story: Think about integrating https://github.com/BinomialLLC/basis_universal at the proper level. The latter is important: this cannot be thrown in at random places in the Qt and tooling stack, hence listing it here. Perhaps it should be integrated with the QRhi API, so that it is a common, Qt-wide solution usable by Qt Quick, 3D, and others as well. There's an example at https://github.com/alpqr/qrhibasis
- rhi: WebGPU backend (for Qt wasm). Once Emscripten gains WebGPU support, it should be possible to bring up a QRhi backend that renders via WebGPU. This would assume a WebGPU implementation on the browser side with SPIR-V support (e.g. Chrome); the new shading language developed alongside WebGPU (WHLSL) is out of scope for the foreseeable future. Track https://github.com/emscripten-core/emscripten/issues/9575 for Emscripten progress.
- Think about a future compute-based particle engine covering both 2D and 3D. Not as a replacement for the existing ParticleSystem/ImageParticle in Quick, but as a modern alternative, perhaps with a different feature set.
- Think about multiple command buffers. This is really only relevant when worker threads are involved to offload the heavy CPU work needed to generate a CB (but what is that heavy CPU work in a Qt 2D/3D context?). Queues and resource hazard handling will not be exposed in the QRhi API, but building and submitting multiple command buffers could be (i.e. build one or more additional CBs on separate threads, then submit them to the same queue, with appropriate sync primitives on the main CB to wait where needed). TBD.
- rhi: d3d backends: use dxc and new compiler APIs? (no point for d3d11? how relevant for d3d12? (esp. if we only care about vertex/fragment/compute))
- Conclude if we can live without push constants. (we only support uniform buffers with QRhi, even though some backends (Vulkan, OpenGL?) could offer another (limited) form of sending data to the shader - no option for D3D11 and Metal, however)
- Conclude if never using uniform buffers in the OpenGL backend is suitable for all users, including Quick 3D. (The GL backend decomposes uniform buffer data into individual uniforms. We never generate true uniform blocks in the SPIRV-Cross generated GLSL shaders, since for Qt Quick & co. this is not worth it. But think of other clients, e.g. Quick 3D with a larger amount of light data)
- Vulkan backend: staging for immut/static buffers should be persistently mapped too?
- Metal backend: should pipeline objects and such be deferred-released too?
- storage buffer bind-range alignment? (check; perhaps it just needs documenting that it is the same as for uniform buffers?)
- make some warnings/errors nicer (e.g. print the backend name when creation fails)
- improve QRhi internal docs about barriers: add a note about barriers to beginPass and beginComputePass at least. Check the general section too.
- do we need a waitIdle (or similar) in the renderloop before cleaning up nodes? (but watch out for deviceLost flag)
- OpenGL backend: sRGB textures, what's the story there, what's missing? May not map well to what we've been implementing for the other backends. Note that HDR and color spaces in general are a separate task; this is just about feature parity of the OpenGL backend with the other three.
- storage buffers with dynamic offset (add it to api, and all backends vk, mtl, d3d, gl)
- replace some Q_UNREACHABLE with Q_ASSERT_X? - check if we are following the right pattern here
- utilize recently added bgr888 QImage format in some form? Perhaps not very relevant for now.
- rhi: investigation: consider some form of limited threading, for copy ops in particular, to help glTexImage on bad/slow systems? (But cross-API this is difficult and may make no sense for non-GL anyway.) Probably not.
- Investigate whether the number of allocations performed while filling up a QRhiResourceUpdateBatch could be reduced: https://codereview.qt-project.org/c/qt/qtbase/+/275432 shows an approach that removes the per-item QByteArray data members from the individual DynamicBufferUpdate items inside a resource update batch. Instead, a single, batch-level QByteArray is used there, with the individual update descriptions only storing an offset into this "pool". This still allocates/resizes, of course, but there are fewer allocations overall. Or figure out something else. The improvements may (also) come from changes on the client side (i.e. the Qt Quick renderer) instead of, or in addition to, the QRhi API.
- d3d11: Make slot reset logic smarter? resetShaderResources could be optimized by not resetting slots that will have something bound to them in the pass anyway.
- QRhi OpenGL/Vulkan backend: consider whether dynamic buffers should follow the MAP_DISCARD style approach like d3d11. Dynamic QRhiBuffer updates map to a glBufferSubData with gl, while with Vulkan they are each a memcpy to a (persistently mapped) host visible, maybe host coherent, but maybe uncached, memory area. Consider whether we should switch to what the d3d11 backend does for QRhiBuffer::Dynamic: keep a copy of the full buffer data on the CPU side, then map with MAP_DISCARD and always copy the full contents regardless of the changed region. This is in fact closer to what the scenegraph does on the direct OpenGL code path; there it is at the mercy of what glBufferData does internally, but the important part is that it also avoids the partial glBufferSubData (or map/memcpy/unmap). The same would then be true for the gl backend of QRhi (it would do a full glBufferData instead of glBufferSubData). But then it should be considered whether the gl backend should employ double buffering (i.e. keep two native buffers per QRhiBuffer etc.), since we do not have a true MAP_DISCARD style solution in ES 2.0. May not be worth it, though. Note that any change to the Vulkan approach must be retested on a mobile/embedded device due to the different characteristics; for instance, the fact that we benefit greatly from persistent mapping was only found out when running on the Shield TV (Tegra X1, Android). Regressions must be avoided. Dropped the priority since this is not the cause of the huge difference between d3d11 and vulkan/gl. An experiment for the gl backend is at https://codereview.qt-project.org/c/qt/qtbase/+/275687, with no real improvements.
- Long term physical device / adapter selection story. Physical device (Vulkan terminology) or adapter (D3D/DXGI terminology) selection where applicable: what do we want to support? Currently we have environment variables like QT_D3D_ADAPTER_INDEX and QT_VK_PHYSICAL_DEVICE_INDEX. Should there be something less backend-specific and more general? We have to be careful with introducing full-blown adapter enumeration and management APIs, however. Think about what a simple, compact API would look like. Metal: do we care about choosing between MTLDevices, or is just using the "system default" enough? (No hardware with multiple GPUs is available, so this cannot be investigated.) Do we need to monitor the device list on Metal? This is only relevant on macOS with external GPUs (on disconnect at runtime, apps relying on that GPU are quit/restarted by default by the OS, I think; perhaps that is good enough?) Cannot test, no hardware available.
- May need an API on QQuickWindow or elsewhere to indicate that external native rendering will be used (overlay, underlay, rendernode, ...). ExternalContentsInPass is a beginFrame() flag in QRhi, but we cannot guess this in Qt Quick, and therefore we either set it always or never. For now we always assume yes. Strictly speaking this is not ideal, since it triggers secondary command buffer usage for all Qt Quick apps when running on Vulkan, including those that do not integrate their own custom rendering; that is harmless, but still extra work at run time. The case itself is not necessarily that interesting, but how we handle it should define a pattern for possible future cases that are similar to this one. With some of the underlying graphics APIs leaning heavily towards declaring everything (how a resource is used etc.) up front, we cannot guarantee that similar flags will not be introduced later on.
- Investigate how to use some of the data we can collect in combination with the QML profiler
- Should the QRhiProfiler integration (opt-in via env var) be disabled completely in Qt Quick?
- buffer memtype to qrhiprofiler
- rhiprofiler disconnect when device destroyed?
- what if no event loop because on render thread?
- rhiprof what if multiple windows so multiple QRhis? -> broken atm
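To make the multiple command buffers item above more concrete, here is a minimal, backend-agnostic sketch: worker threads each record into their own `CommandList`, and the main thread submits the lists to a single queue in a fixed order after waiting for the workers. `CommandList`, `recordSecondary`, and `submitInOrder` are hypothetical names for illustration, not QRhi API; a real backend would record native command buffers and rely on fences/semaphores rather than `std::thread::join` for synchronization.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a native command buffer: just a list of recorded commands.
struct CommandList {
    std::vector<std::string> commands;
};

// Hypothetical "heavy CPU work": record drawCount draw commands for one chunk of the scene.
void recordSecondary(CommandList &cl, int chunk, int drawCount)
{
    for (int i = 0; i < drawCount; ++i)
        cl.commands.push_back("draw chunk " + std::to_string(chunk) + " item " + std::to_string(i));
}

// Records one CommandList per worker thread, then "submits" them to one queue in a
// deterministic order. join() plays the role of the sync primitive the main command
// buffer would wait on before executing the additional ones.
std::vector<std::string> submitInOrder(int workerCount, int drawsPerWorker)
{
    assert(workerCount > 0);
    std::vector<CommandList> lists(static_cast<std::size_t>(workerCount));
    std::vector<std::thread> workers;
    for (int w = 0; w < workerCount; ++w)
        workers.emplace_back(recordSecondary, std::ref(lists[static_cast<std::size_t>(w)]), w, drawsPerWorker);
    for (std::thread &t : workers)
        t.join(); // wait until every worker has finished recording

    std::vector<std::string> queue;
    for (const CommandList &cl : lists) // submission order is fixed, not thread completion order
        queue.insert(queue.end(), cl.commands.begin(), cl.commands.end());
    return queue;
}
```

The fixed submission order is the important design point: threads only parallelize recording, while the queue still sees a deterministic command stream.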
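The uniform buffer decomposition mentioned for the OpenGL backend can be pictured with a small sketch: given a hypothetical table of block members (name, byte offset, type), the CPU-side block contents are walked and each member turns into one individual uniform upload. `Member` stands in for shader reflection data and `UniformCall` for a `glUniform*` call; this illustrates the idea only and is not the actual backend code. It also shows why larger blocks (e.g. lots of light data) get more expensive: the number of calls grows with the number of members.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

enum class UniformType { Float, Vec4 };

// Hypothetical reflection data for one member of a uniform block.
struct Member {
    std::string name;
    std::size_t offset; // byte offset inside the block
    UniformType type;
};

// Stand-in for a glUniform1fv/glUniform4fv call: records what would be uploaded.
struct UniformCall {
    std::string name;
    std::vector<float> values;
};

// Decomposes the raw block contents into one "uniform call" per member.
std::vector<UniformCall> decompose(const std::vector<Member> &members,
                                   const std::vector<unsigned char> &blockData)
{
    std::vector<UniformCall> calls;
    for (const Member &m : members) {
        const std::size_t count = (m.type == UniformType::Vec4) ? 4 : 1;
        assert(m.offset + count * sizeof(float) <= blockData.size());
        std::vector<float> values(count);
        std::memcpy(values.data(), blockData.data() + m.offset, count * sizeof(float));
        calls.push_back({m.name, values});
    }
    return calls;
}
```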
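For the storage buffer bind-range alignment question, the arithmetic itself is the usual round-up to a power-of-two boundary. The sketch assumes storage buffers would use the same alignment value as uniform buffers (which is exactly what needs checking); 256 in the usage below is only an example value, and `alignUp`/`isValidBindOffset` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Rounds offset up to the next multiple of alignment (alignment must be a power of two).
constexpr std::uint32_t alignUp(std::uint32_t offset, std::uint32_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}

// A bind offset is valid when it already sits on an alignment boundary.
bool isValidBindOffset(std::uint32_t offset, std::uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return alignUp(offset, alignment) == offset;
}
```

For example, with an alignment of 256, `alignUp(300, 256)` yields 512, and an offset of 100 would be rejected as a bind offset.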
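The pooling approach referenced in the QRhiResourceUpdateBatch allocation item can be sketched in plain C++: instead of every dynamic buffer update owning its own byte array, the batch keeps one shared pool and each update description only records an offset and size into it. `UpdateBatch`, `DynamicBufferUpdate`, and `bufferId` are simplified, hypothetical stand-ins, not the code from the linked change, and `std::vector` replaces QByteArray to keep the example free of Qt dependencies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical update description: no per-update data member, only a slice of the pool.
struct DynamicBufferUpdate {
    int bufferId;           // stand-in for the target QRhiBuffer
    std::size_t offset;     // destination offset inside the target buffer
    std::size_t poolOffset; // where this update's bytes start in the batch-level pool
    std::size_t size;
};

class UpdateBatch {
public:
    void updateDynamicBuffer(int bufferId, std::size_t offset, const void *data, std::size_t size)
    {
        assert(data != nullptr || size == 0);
        const std::size_t poolOffset = m_pool.size();
        m_pool.resize(poolOffset + size); // may reallocate, but not once per update
        if (size)
            std::memcpy(m_pool.data() + poolOffset, data, size);
        m_updates.push_back({bufferId, offset, poolOffset, size});
    }

    const unsigned char *dataFor(const DynamicBufferUpdate &u) const { return m_pool.data() + u.poolOffset; }
    const std::vector<DynamicBufferUpdate> &updates() const { return m_updates; }

    // Keeps the allocated capacity so a reused batch can avoid reallocating next frame.
    void clear() { m_pool.clear(); m_updates.clear(); }
    std::size_t poolCapacity() const { return m_pool.capacity(); }

private:
    std::vector<unsigned char> m_pool;
    std::vector<DynamicBufferUpdate> m_updates;
};
```

Since `clear()` keeps the pool's capacity, a batch that is reused every frame typically stops allocating once it has grown to its working size.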
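The smarter slot reset idea for the d3d11 backend boils down to a set difference: only slots that currently hold something and will not be rebound by the upcoming pass need an explicit null binding. A minimal sketch with bitmasks, assuming an illustrative fixed slot count of 128 per stage; `slotsToReset` and `toSlotList` are hypothetical helpers, not the backend's resetShaderResources().

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxSlots = 128; // illustrative per-stage slot count

using SlotMask = std::bitset<kMaxSlots>;

// Slots currently holding something that the next pass will NOT overwrite.
// Only these need to be reset (e.g. by binding null views).
SlotMask slotsToReset(const SlotMask &boundNow, const SlotMask &usedByNextPass)
{
    return boundNow & ~usedByNextPass;
}

// Converts a mask into a sorted list of slot indices, e.g. to batch contiguous ranges.
std::vector<std::size_t> toSlotList(const SlotMask &mask)
{
    std::vector<std::size_t> slots;
    for (std::size_t i = 0; i < mask.size(); ++i)
        if (mask.test(i))
            slots.push_back(i);
    assert(slots.size() == mask.count());
    return slots;
}
```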
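The d3d11-style strategy described above for QRhiBuffer::Dynamic (keep a full CPU-side copy, apply partial updates to it, then upload the entire contents once) can be illustrated without any graphics API. `ShadowedDynamicBuffer` is hypothetical; the copy in `flush()` stands in for what a real backend would do with MAP_DISCARD + memcpy (d3d11) or a full glBufferData (gl), and the optional ES 2.0 double buffering is modeled by alternating between two native buffer slots.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

class ShadowedDynamicBuffer {
public:
    explicit ShadowedDynamicBuffer(std::size_t size) : m_shadow(size, 0) {}

    // Partial updates only touch the CPU-side copy and mark it dirty.
    void update(std::size_t offset, const void *data, std::size_t size)
    {
        assert(offset + size <= m_shadow.size());
        std::memcpy(m_shadow.data() + offset, data, size);
        m_dirty = true;
    }

    // Called once before the buffer is used: uploads the full contents regardless of
    // which region changed. Alternates between two native buffers (double buffering)
    // so that the GPU may still be reading the previous one.
    void flush()
    {
        if (!m_dirty)
            return;
        m_current = (m_current + 1) % m_native.size();
        m_native[m_current] = m_shadow; // stand-in for MAP_DISCARD + memcpy or glBufferData
        ++m_fullUploads;
        m_dirty = false;
    }

    const std::vector<unsigned char> &nativeContents() const { return m_native[m_current]; }
    std::size_t fullUploads() const { return m_fullUploads; }
    std::size_t currentSlot() const { return m_current; }

private:
    std::vector<unsigned char> m_shadow;
    std::array<std::vector<unsigned char>, 2> m_native;
    std::size_t m_current = 0;
    std::size_t m_fullUploads = 0;
    bool m_dirty = false;
};
```

Two scattered partial updates before a `flush()` still result in a single full upload, which is the trade-off the item is weighing against partial glBufferSubData or per-update memcpy.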
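For the adapter/physical device selection item, the existing environment variable approach amounts to parsing an index and validating it against the enumerated devices. A sketch of what a backend-neutral variant might do, falling back to a default index when the value is missing, malformed, or out of range; `pickAdapterIndex` is a hypothetical helper, not an existing Qt function.

```cpp
#include <cassert>
#include <cstdlib>

// Returns the adapter/physical device index to use. envValue is the raw value of the
// selection variable (for instance what std::getenv returns), or nullptr if unset.
int pickAdapterIndex(const char *envValue, int deviceCount, int defaultIndex = 0)
{
    assert(deviceCount > 0 && defaultIndex >= 0 && defaultIndex < deviceCount);
    if (!envValue || !*envValue)
        return defaultIndex;
    char *end = nullptr;
    const long requested = std::strtol(envValue, &end, 10);
    if (*end != '\0' || requested < 0 || requested >= deviceCount)
        return defaultIndex; // not a number or out of range: ignore the request
    return static_cast<int>(requested);
}
```

A Vulkan backend could, for example, call `pickAdapterIndex(std::getenv("QT_VK_PHYSICAL_DEVICE_INDEX"), physicalDeviceCount)`, where `physicalDeviceCount` is a hypothetical variable holding the enumerated count.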
QRhi renderpass load/store configurability brain dump
Some potential use cases have been brought up both from Qt Quick 3D and Qt 3D side that are not necessarily well-supported with the current model of having pre-defined load/store settings, with the only configurability being via QRhiTextureRenderTarget's PreserveColor and similar flags.
Think of a depth pre-pass in a separate renderpass (depth needs to be written out, so it cannot be DONT_CARE), or any special case where someone absolutely has to do an additional pass targeting the swapchain without clearing the content from the previous pass in the same frame.
We do not (and probably never will) support Vulkan subpasses (esp. since it has little applicability to other APIs), but rather some sort of configurability could be introduced via flags passed to beginPass() for instance. (but this would come with complications inside some backends, in particular, Vulkan)
There are some issues with this, however, due to the graphics pipeline being dependent on the renderpass descriptor (in Vulkan at least). So just having plain flags for beginPass() is not sufficient. (That's why QRhiRenderPassDescriptor exists in the first place, but it is always created from a QRhiRenderTarget atm, not just out of nowhere.)
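One detail that may matter for the pipeline/renderpass dependency: in Vulkan, the render pass compatibility that a graphics pipeline requires ignores the attachments' load/store operations and initial/final layouts, so two render passes differing only in those are compatible. A simplified model of that rule follows; it ignores subpasses, resolve/input attachments, and other details of the specification, and `AttachmentDesc`/`isCompatible` are hypothetical names. Under this model, the main cost of per-beginPass() load/store flags could be creating and caching additional native render pass objects rather than rebuilding pipelines.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class LoadOp { Load, Clear, DontCare };
enum class StoreOp { Store, DontCare };

// Simplified attachment description (format and sample count as plain integers).
struct AttachmentDesc {
    int format;
    int samples;
    LoadOp load;
    StoreOp store;
};

// Simplified Vulkan-style compatibility check: attachments must match in format and
// sample count; load/store ops are deliberately not compared.
bool isCompatible(const std::vector<AttachmentDesc> &a, const std::vector<AttachmentDesc> &b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        assert(a[i].samples > 0 && b[i].samples > 0);
        if (a[i].format != b[i].format || a[i].samples != b[i].samples)
            return false;
    }
    return true;
}
```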
For what it's worth, WebGPU (the high-level API that is closest to QRhi in spirit when it comes to vision and goals) seems to also follow a configure-on-renderpass-start approach (https://gpuweb.github.io/gpuweb/#ref-for-dictdef-gpurenderpassdescriptor ) (on the other hand I cannot seem to find where a GPURenderPipeline is associated with a GPURenderPassDescriptor - presumably it is left to the implementations to juggle with the native resources under the hood?)
It is not clear we really want this. As long as one can record everything in one single pass (e.g. a simple depth pre-pass does not need to be a separate renderpass), the additional complexity that would be added here is basically not worth it.
For getting the depth buffer out of some renderpass - something that may be needed for Quick3D custom materials and effects - one can use a depthTexture in the QRhiTextureRenderTargetDescription instead of depthStencilBuffer, which will implicitly enable writing depth out.
Also, the way Qt Quick 3D integrates with Qt Quick in overlay/underlay mode is by definition incompatible with additional renderpasses on the main render target (the swapchain for the window) in the 'render' phase. (One could do anything in the 'prepare' phase, of course, but that is all cleared away when the Qt Quick scenegraph issues the main beginPass().) Therefore relying on multiple renderpasses onto the main target is out of the question, so it is not clear whether there is a use case for non-default attachment load/store settings.
Needs some more thinking.