I'm generally trying to get qmlcachegen into a shape where it doesn't crash or generate incorrect C++ for any of the QML code we have in qtdeclarative. It should either generate correct C++ or refuse to compile the specific binding or function in question. This is the minimal requirement for a somewhat believable tech preview. In order to do this, I'm currently compiling our tests with qmlcachegenplus and fixing the issues found that way one by one.
The following steps can be taken to accomplish this:
- Dig up Simon's prototype and make it compile against Qt6
- Remove dead code, clean up, rename to qmlcachegenplus
- Drop the "Type" class in favor of QQmlJSScopeTree
- Move code duplicated in qmlcachegen and qmlcachegenplus into a QtQmlCompiler library
- Explore the inner workings of the compiler by adding excessive categorized logging
- Implement access to properties of the scope object
- Discern between new-style QProperty and old-style Q_PROPERTY, and capture the latter on access
- Implement basic signal handlers
- Implement access to objects in the context by ID
- Implement access to singletons
- Implement basic read access to sequences
- Implement access to enums
- Implement partial support for calling methods on objects
- Add rudimentary support for function arguments and return types
- Use the primitive type transformations to improve arithmetic operations and handle overflows correctly
- Implement access to properties of builtin types like string.length
- Extend qmlcachegenplus tests to make sure we don't regress by rejecting code we previously compiled (and falling back to interpretation).
- Run qmlcachegenplus tests also with QV4_FORCE_INTERPRETER to make sure we don't test for incorrect behavior.
- qmlcachegenplus now generates drop-in compatible but slow code when invoked without any special arguments. You can pass --direct-calls to make it assume it can access any C++ classes for which there is a header noted in the qmltypes files. This requires adding dependencies to many projects and is impossible for others. You can also pass --qmljs-runtime to use the external Qml/JS runtime. If you don't pass this flag, the runtime is pasted into each generated file.
- Support exception checks but not catch blocks.
- A particularly nasty problem is creating object literals. We could just declare that unsupported, but there is a lot of code out there that does create object literals, and I do want to use it for test coverage. So, a slow implementation would be OK, but none would be sad. Now, for each object literal an InternalClass is created in the compilation unit, and in order to know the keys we need to access it. Of course that is private API. We can wrap an InternalClass into QJSManagedValue and add some instantiate() method to that, but then we still need to access the internal classes by ID.
- Compile the qtdeclarative examples with qmlcachegenplus and fix any failures (in progress). It turns out that
- We still have inaccessible types in qtdeclarative. In particular the ones in src/imports/layouts. We need to move those into a library we can link against.
The next step is refactoring the type information (done). We have several generic storage types that may hold a variety of internal types:
Each of these fills a specific role and cannot easily be dropped. Contained in these, there can be a number of "things". In particular:
- Regular types as QQmlJSScope
- enums as a pair of QQmlJSScope and QQmlJSMetaEnum
- methods as QQmlJSMetaMethod
- lists as a QQmlJSScope for the value type and a flag telling us it's a list
Before, all of this was done in an ad-hoc way on top of a single QQmlJSScope with lots of implicit rules on what can be combined and how. In particular, the contained type was often lost this way and we ended up with plain storage types which we couldn't further reason about. As a result, we often threw in the towel on code we could actually compile to C++.
With this in place, we can get rid of all spaghetti in the code generator. The type resolver dictates the stored type and the stored type dictates the code to access it. No need for any environment checking. The type propagator conducts all the analysis on the meta level, determining input and output types for each instruction. This can force the code generator into the respective conversions, again without checking any environment.
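To illustrate the idea of a storage type plus an explicitly tracked contained type, here is a minimal sketch. All names in it (Scope, MetaEnum, MetaMethod, RegisterContent, describe) are hypothetical stand-ins and not the actual QQmlJSScope, QQmlJSMetaEnum or QQmlJSMetaMethod APIs. It only shows how a code generator could dispatch on the kind of content without inspecting any environment.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical stand-ins for QQmlJSScope, QQmlJSMetaEnum and QQmlJSMetaMethod.
struct Scope { std::string internalName; };
struct MetaEnum { std::string name; };
struct MetaMethod { std::string name; };

// The four kinds of "things" a storage type can contain.
struct RegularType { Scope scope; };
struct EnumType { Scope scope; MetaEnum metaEnum; };
struct MethodType { MetaMethod method; };
struct ListType { Scope valueType; };  // the "list" flag is expressed by the alternative

using ContainedType = std::variant<RegularType, EnumType, MethodType, ListType>;

// Hypothetical register content: the generic storage type plus what it contains.
struct RegisterContent {
    std::string storedType;  // e.g. "QVariant" or "QJSPrimitiveValue"
    ContainedType contained;
};

// Helper to build a visitor from a set of lambdas.
template<class... Ts> struct Overloaded : Ts... { using Ts::operator()...; };
template<class... Ts> Overloaded(Ts...) -> Overloaded<Ts...>;

// Dispatch purely on the contained kind; no surrounding context is consulted.
std::string describe(const RegisterContent &content)
{
    assert(!content.storedType.empty());
    const std::string what = std::visit(Overloaded{
        [](const RegularType &t) { return "type " + t.scope.internalName; },
        [](const EnumType &t) { return "enum " + t.scope.internalName + "::" + t.metaEnum.name; },
        [](const MethodType &t) { return "method " + t.method.name; },
        [](const ListType &t) { return "list of " + t.valueType.internalName; },
    }, content.contained);
    return what + " stored as " + content.storedType;
}
```

With a representation like this, the contained type cannot silently get lost: every register content carries exactly one of the four alternatives next to its storage type.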
- Get the metaobject for a namespace by string name of the namespace. We cannot rely on the type to be accessible after all. This is more complicated than it looks. The problem occurs for enums in namespaces defined in C++. Something has to give here:
- We don't have the enum values in .qmltypes because moc doesn't implement enough C++ to calculate them (with QML-defined enums we do have the values available).
- We cannot get a QMetaType (and then QMetaObject) by name because QMetaType::fromName() only works for types.
- We cannot retrieve them via QJSValue because (a) we don't have an object and (b) enums aren't actual properties.
- In indirect mode we cannot just retrieve them by including the header and referencing them as symbols because the header might not be available.
- We could just reject them. That would make all code that uses the Qt namespace uncompilable in indirect mode. Or we can reject them and add an exception for the Qt namespace. We can always include qnamespace.h after all.
- Building on that ... we could look up all enum values of which we know they are the same between host and target at compile time in the host Qt. Sounds rather hacky, though.
- The obvious solution is to just build metatypes for any metaobjects we encounter during type registration. The overhead should be minimal as most of them will already have metatypes. Then we can look up the namespaces with QMetaType::fromName() and avoid all the mess in the compiler.
- Set line numbers on the current frame, so that any backtraces generated from errors make sense.
- Restrict access to the global object so that only known properties of the global object can be used. Others may be context properties and we need to reject such code.
- Add even more transformations and operator variations. Denying operator!= on two bools is really silly, for example. For any arithmetic operation we should just forcibly convert to QJSPrimitiveValue, and for any equality operation to either QJSPrimitiveValue or QJSManagedValue, depending on whether the original values are primitives.
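The arithmetic part of the last item can be sketched as follows. This is not QJSPrimitiveValue itself. Primitive, toDouble and add are hypothetical names, and the sketch assumes that int is narrower than 64 bits. It only shows the general technique of converting both operands to one primitive representation and falling back to double when an integer addition would overflow, which is what JavaScript semantics require.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <variant>

// Hypothetical, heavily simplified primitive value holding either int or double.
using Primitive = std::variant<int, double>;

double toDouble(const Primitive &v)
{
    return std::holds_alternative<int>(v) ? double(std::get<int>(v)) : std::get<double>(v);
}

// Adds two primitives. Integer addition is done in a wider type (assuming int
// is narrower than 64 bits) so that an overflow can be detected and the result
// promoted to double instead of wrapping around.
Primitive add(const Primitive &lhs, const Primitive &rhs)
{
    if (std::holds_alternative<int>(lhs) && std::holds_alternative<int>(rhs)) {
        const std::int64_t wide = std::int64_t(std::get<int>(lhs)) + std::get<int>(rhs);
        if (wide >= std::numeric_limits<int>::min() && wide <= std::numeric_limits<int>::max())
            return int(wide);
        return double(wide);
    }
    return toDouble(lhs) + toDouble(rhs);
}
```

Funneling every arithmetic operation through one such conversion point means the code generator does not need a separate code path per operand type combination.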
Extended and attached types: Extended value types work, and enum access on extended object or namespace types also works. Actual extended objects fail horribly, though, as we have no way of retrieving the actual object, yet. The same holds for retrieving the attached object of an attached type.
Current problems: (in progress) Enums on types with attached types: The enums are always looked up in the attached type. Getting/Setting properties on nullptr objects. This just crashes in direct mode and does nothing in indirect mode. It should throw a TypeError.
We often have code that needlessly copies register variables around. This is due to limits of the byte code. Most instructions can only operate on the accumulator. Therefore, the accumulator has to be copied to a proper register periodically, in order to free it up for further operations. So we end up with things like this:
If the dtor of r1 or r2 cannot be proven to be free of side effects, the C++ compiler cannot do anything clever about this. Notably QString, all Qt containers, and by extension, QJSPrimitiveValue don't have side-effect-free ctors and dtors because of implicit sharing. We can detect this, though, and fold any expressions for registers that are used exactly once into the place where they are used. Furthermore, we can determine the best type for a register variable and pre-convert its contents on assignment. And, if a register variable is used more than once, we can std::move it on the last usage.
For this to work, we need to properly discern between expressions, register assignments and actual statements. Only actual statements can have side effects in QML. Everything else can be elided if it's not needed. An expression has a "natural" type, and uses conversions of 0 or more registers. A register at first has the natural type of the expression it's assigned. A statement acts like an expression, but its natural type can be empty, in which case it cannot be assigned to anything.
We already determine if a register is used at all and remove its assignment if it's not. We might instead count how often it is used. Instead of just directly generating the C++ code, we might generate a list of format strings for expressions, assignments, and statements, with $1, $2 etc for the register variables. In a second step, we either arg() the register names, or the raw expression a register represents, or a std::move of a register, depending on whether a register is used more than once and whether the current usage is the last one.
On top of this, we can also generate the conversions as needed when arg()'ing the registers in. We do know the natural type of an expression as well as the expected conversion at its usage. We can therefore skip the type of the register if it's used exactly once. Often, this will reduce the number of conversions. For registers that are used multiple times, we can analyze the conversions and determine the best type for the register by its usage rather than by the expression's natural type. We can then pre-convert the expression on assignment to the register. For example, an expression might return a string, but if we use it as QJSManagedValue several times, we should store it as QJSManagedValue rather than converting it again and again.
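The second step described above could look roughly like this. It is only a sketch under assumptions: RegisterInfo and substitute are hypothetical names, the use counts are assumed to have been determined by an earlier pass, and conversions are left out entirely. Placeholders of the form $1, $2 refer to registers by 1-based index.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical bookkeeping for one register variable.
struct RegisterInfo {
    std::string name;        // C++ variable name, e.g. "r1"
    std::string expression;  // raw expression assigned to the register
    int totalUses = 0;       // how often the register is read overall
    int usesSoFar = 0;       // how many reads have been emitted already
};

// Replaces each $N in the format string by
//  - the raw expression, if the register is used exactly once,
//  - std::move(name), if this is the last of several usages,
//  - the plain register name otherwise.
std::string substitute(const std::string &format, std::vector<RegisterInfo> &registers)
{
    std::string result;
    for (std::size_t i = 0; i < format.size(); ) {
        if (format[i] != '$' || i + 1 >= format.size()
                || !std::isdigit(static_cast<unsigned char>(format[i + 1]))) {
            result += format[i++];
            continue;
        }
        std::size_t index = 0;
        for (++i; i < format.size() && std::isdigit(static_cast<unsigned char>(format[i])); ++i)
            index = index * 10 + std::size_t(format[i] - '0');
        assert(index >= 1 && index <= registers.size());
        RegisterInfo &reg = registers[index - 1];
        ++reg.usesSoFar;
        if (reg.totalUses == 1)
            result += "(" + reg.expression + ")";
        else if (reg.usesSoFar == reg.totalUses)
            result += "std::move(" + reg.name + ")";
        else
            result += reg.name;
    }
    return result;
}
```

Statements have to be substituted in program order for the "last usage" detection to be correct. A register that is used exactly once never needs a variable at all, so its assignment can be dropped.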
- We might run the whole thing through a fuzzer to catch further crashes.
- If we declare the functions in the same file as actual functions in C++, rather than anonymous lambdas, we can call them directly. That would speed things up. Right now, even if we're calling a function on the same scopeObject, we still have to go through the QML engine. The implementation of that function is right there in the same C++ file, though.
- Now that we have --direct-calls, we can place further restrictions on the properties or methods supported by it. We might, for example, state that calling private invocables is unsupported and that property accessors must follow the common naming scheme. This would allow us to skip further metatype operations. However, QQuickItem::parent doesn't follow the common naming scheme, so that's that ...