Initially, NumPy support suffered heavily from:
- wrong documentation
- implementation of arbitrary APIs
- mismatch between C++/XML and the Python signatures.
Since then, several improvements have been made:
- error messages are created in Python, using the signature module
- object protocols use PySequence more consistently
- signatures support the <array> attribute for primitive type pointers
- signatures understand generic types like List[t], Sequence[t] and Iterator[t]
Still, there are oversights and omissions. For instance:
This works fine with Python 2.
With Python 3, it crashes. Why?
- range(16) is a list object in Python 2
- range(16) is a lazy sequence object in Python 3, much like the xrange object in Python 2.
A full discussion of Python 3's range compared with Python 2's xrange can be found at https://treyhunner.com/2018/02/python-3-s-range-better-than-python-2-s-xrange/.
There is a common misunderstanding concerning range: it behaves lazily, but it is not an iterator! I learned that distinction myself after reading https://treyhunner.com/2018/02/python-range-is-not-an-iterator/.
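The distinction can be checked directly in Python 3. The following is a minimal sketch using only the standard library; the variable names are chosen for illustration.

```python
from collections.abc import Iterator, Sequence

r = range(16)

# range behaves like a sequence: it has a length and supports indexing.
is_sequence = isinstance(r, Sequence)
is_iterator = isinstance(r, Iterator)
length, last = len(r), r[-1]

# A range can be iterated any number of times.
first_pass = list(r)
second_pass = list(r)

# An iterator obtained from the range is consumed by the first pass.
it = iter(r)
consumed = list(it)
leftover = list(it)  # empty, because the iterator is exhausted
```

So range is lazy like an iterator, but it can be measured, indexed and re-read like a sequence.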
The fix is quite simple, btw. What is difficult is handling the many cases where a change is necessary. This needs an extra section, see below.
All array-like interfaces that use typing.List[t] or typing.Sequence[t] will be changed to support the most general typing.Iterable[t] API. What does that mean?
The iterator protocol is complicated in one respect and easy in another.
- Complicated: Create objects that support the iterator protocol.
- Easier: Interface to objects that support typing.Iterable.
We restrict ourselves to the easy part: If an object supports it, then we use the iterator protocol.
Example: We will be able to use iter(range(16)) for the data of a QMatrix4x4 object.
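To illustrate the "easy part" in pure Python: a consumer that relies only on iteration accepts lists, ranges, iterators and generators alike. The helper `collect_values` below is hypothetical and is not part of PySide; it only sketches what a wrapper that fills 16 matrix values might do.

```python
def collect_values(data, expected):
    """Hypothetical helper: gather exactly `expected` floats from any iterable."""
    values = [float(item) for item in data]  # relies on iteration only
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    return values

# All of these inputs produce the same 16 values.
from_list = collect_values(list(range(16)), 16)
from_range = collect_values(range(16), 16)
from_iterator = collect_values(iter(range(16)), 16)
from_generator = collect_values((i for i in range(16)), 16)
```

The length check happens after consumption, because a general iterable does not have to support len().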
The distinction between iterables and sequences is not tied to any explicit type.
Instead, Python uses duck typing: https://en.wikipedia.org/wiki/Duck_typing.
It is important to understand the difference between Iterable and Iterator, which is a bit more subtle.
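A small sketch of the Iterable/Iterator difference, using the abstract base classes from collections.abc. The class `Countdown` is made up for this illustration: it defines nothing but `__iter__`, which is enough to make it an iterable by duck typing.

```python
from collections.abc import Iterable, Iterator

class Countdown:
    """Hypothetical iterable: it only defines __iter__."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Each call returns a fresh iterator (here, a generator).
        return (n for n in range(self.start, 0, -1))

c = Countdown(3)
it = iter(c)

checks = {
    "iterable is Iterable": isinstance(c, Iterable),
    "iterable is Iterator": isinstance(c, Iterator),
    "iterator is Iterable": isinstance(it, Iterable),
    "iterator is Iterator": isinstance(it, Iterator),
}

# The iterable can be traversed repeatedly; each traversal gets a new iterator.
first = list(c)
second = list(c)
```

An iterable hands out iterators; an iterator is the one-shot object that actually walks through the values.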
The transition from PySequence to Iterable is unfortunately not simple. The current
implementation is a conglomerate of snippets in the XML files that are used
to generate all classes. These XML files in turn contain quite a number of references to code
snippets that live either in the new snippet files (PySide2/glue/qtcore.cpp) or in template files that have existed for a long time (PySide2/templates/core_common.xml).
After several attempts to change this in a general way, the following concept was adopted:
- Preserve the status quo
In order to make sure that no functionality is lost in any transition, new tests are needed:
- test that all kinds of PySequence are supported where it is claimed
- Move algorithmic code into a single file with a few, central functions
- Then extend the implementation and the tests to support Iterables everywhere
This is currently happening.
It turned out to be very complicated to understand the indirection that happens in the code generator. The original plan to refactor everything was postponed after we made little progress.
Instead of changing everything, the locations where PySequence is used were identified and changed to support the iterable protocol. This has the advantage that little needed to change, and whatever was accepted as a sequence before now works as an iterable. Basic tests were added to ensure the correct implementation.
After the transition to the Iterable protocol has succeeded, the code base should
have changed so much that we can focus on the real goal:
- Supporting Iterables for multidimensional NumPy arrays or generators that produce them.
In fact, thanks to the iterable support, this little test function already works:
This means that there are possibilities to improve and simplify the collaboration between PySide and NumPy structures, but the basic functionality is already there via the iterable protocol. This protocol is "loose" enough
to make completely different structures fit together, as long as the basic data types are compatible.
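Why the iterable protocol is "loose" enough can be sketched without NumPy. Iterating a two-dimensional NumPy array yields its rows, and each row is iterable again; the nested lists below stand in for such an array. The helper `flatten` is hypothetical and is not PySide code.

```python
from collections.abc import Iterable

def flatten(data):
    """Hypothetical helper: yield scalars from arbitrarily nested iterables."""
    for item in data:
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten(item)  # descend into rows, sub-rows, ...
        else:
            yield item

# Nested lists as a stand-in for a 2x2 array.
rows = [[1.0, 0.0], [0.0, 1.0]]
flat_from_rows = list(flatten(rows))

# A generator that produces the rows one at a time works the same way.
row_generator = ([float(r == c) for c in range(2)] for r in range(2))
flat_from_generator = list(flatten(row_generator))
```

Both inputs end up as the same flat stream of basic values, which is all a consumer of the iterable protocol needs.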
This is the previous analysis
PySide needs to be more user-friendly when interfacing to other packages.
The most prominent example is NumPy, but it applies to other packages like SciPy, pandas and Matplotlib as well.
After looking into this issue, it appears to be a two-fold problem:
a) Many types support only List instead of allowing any sequence,
b) The error messages are extremely misleading and wrong.
There are many locations where PySide accepts a list only, although a sequence would be ok as well. That would allow for many new container types from NumPy, for instance.
This example says it uses a set, but it does not work:
The function fromVector, of course, does not accept an arbitrary sequence either,
but it gives a very unintuitive error message.
I propose to either
- allow arbitrary sequences here, or
- add a new method fromSequence.
That would then work with any extension that uses the sequence protocol.
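As a rough illustration of the proposal, here is a pure-Python stand-in. Both `from_sequence` and the container `Samples` are hypothetical names. The function uses only len() and indexing, which corresponds to what PySequence_Size and PySequence_GetItem provide at the C level.

```python
class Samples:
    """Hypothetical container that implements only the sequence protocol."""
    def __init__(self, *values):
        self._values = values

    def __len__(self):
        return len(self._values)

    def __getitem__(self, index):
        return self._values[index]

def from_sequence(seq):
    """Hypothetical stand-in for the proposed fromSequence method."""
    # Only len() and indexing are used, so any sequence-like object works.
    return [seq[i] for i in range(len(seq))]

from_custom = from_sequence(Samples(1, 2, 3))
from_tuple = from_sequence((4, 5))
from_range = from_sequence(range(3))
```

A set would still be rejected by this sketch, because sets have a length but do not support indexing.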
When trying to find negative examples for a), I surprisingly stumbled upon examples for b). PySide already supports interfacing to other packages by using abstract protocols.
See the following example:
Here is the example that gives the impression that NumPy is not supported:
This problem seems to apply to the matrices only. The error messages are plainly wrong: the general sequence protocol is supported, but the message says list. When you look at the file
instead, you see that it says sequence. But this is still not completely correct, since the signature module needs more work on correct types, too.
My impression is that before adding a new sequence support that serves many extensions, we should try to
- improve error messages (re-write them using the signature module)
- write a tool that checks every function signature for functionality.
When that is working well enough that we can trust error messages, then we should think of new functionality.
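One possible shape for such a checking tool, sketched with the standard inspect module on plain Python functions. Whether this carries over unchanged to PySide's generated functions is an assumption; `check_call` and `scale` are hypothetical names.

```python
import inspect

def check_call(func, *args, **kwargs):
    """Hypothetical checker: compare the declared signature with a real call."""
    try:
        inspect.signature(func).bind(*args, **kwargs)
        declared_ok = True
    except TypeError:
        declared_ok = False
    try:
        func(*args, **kwargs)
        actual_ok = True
    except TypeError:
        actual_ok = False
    return declared_ok, actual_ok

def scale(values, factor=1.0):
    """Example function under test: multiply every value by factor."""
    return [v * factor for v in values]

agree_ok = check_call(scale, [1, 2], 2.0)  # signature and call both accept
agree_bad = check_call(scale, 1, 2, 3)     # both reject: too many arguments
mismatch = check_call(scale, 5)            # signature accepts, the call fails
```

A result where the two flags differ marks a function whose signature claims more (or less) than the implementation delivers, which is exactly the kind of case that produces misleading error messages.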
Maybe the topic of this task needs to be changed?