  1. Qt for Python
  2. PYSIDE-795

Improve Integration of Numeric Packages


Details

    • User Story
    • Resolution: Done
    • P2: Important
    • 5.12.6
    • 5.11.0
    • PySide, Shiboken
    • None
    • f89113e21 (dev), 8b302d296 (6.7), b0be45b23 (6.6), 195ad4311 (tqtc/lts-6.5)

    Description

      The Current Status

      Initially, NumPy support suffered heavily from

      • wrong documentation
      • implementation of arbitrary APIs
      • mismatch between C++/XML and the Python signatures.

      Meanwhile, several actions have been performed:

      • error messages are created in Python, using the signature module
      • object protocols use PySequence more consistently
      • signatures support the <array> attribute for primitive type pointers
      • signatures understand generic types like List[t], Sequence[t] and Iterator[t]

      Still, there are oversights and omissions. For instance:

      >>> from PySide2 import *
      >>> QtGui.QMatrix4x4(list(range(16)))
      PySide2.QtGui.QMatrix4x4((0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15))
      >>> QtGui.QMatrix4x4(range(16))
      ### guess what?
      

      This works fine with Python 2.
      With Python 3, this crashes. Why?

      • range(16) returns a list in Python 2
      • range(16) is a lazy sequence object in Python 3. This is almost like the xrange object in Python 2.

      A full discussion of Python 3's range compared with Python 2's xrange function can be found at https://treyhunner.com/2018/02/python-3-s-range-better-than-python-2-s-xrange/.

      There is a common misunderstanding concerning range: it has lazy behavior, but it is not an iterator! I learned that distinction myself after reading https://treyhunner.com/2018/02/python-range-is-not-an-iterator/.
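
      The distinction can be checked with the abstract base classes from the standard library. This is only an illustrative sketch in plain Python; it does not touch PySide at all.

```python
from collections.abc import Iterable, Iterator, Sequence

r = range(16)

# range is a lazy sequence: it has a length and supports indexing ...
is_sequence = isinstance(r, Sequence)   # True
is_iterable = isinstance(r, Iterable)   # True
# ... but it is not an iterator: calling next(r) raises TypeError.
is_iterator = isinstance(r, Iterator)   # False

# An iterator is obtained explicitly with iter() and is consumed while reading.
it = iter(r)
first_pass = list(it)    # all 16 values
second_pass = list(it)   # empty, because the iterator is exhausted

# The range object itself can be traversed again and again.
again = list(r)
```

      Code that only handles real lists therefore misses range in Python 3, even though range offers both the sequence and the iterable interface.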

      The fix is quite simple, btw. What is difficult is handling the many cases where a change is necessary. This needs an extra section, see below.

      Phase 2 of Improved NumPy support:

      All array-like interfaces that use typing.List or typing.Sequence[t] will be changed to support the most general typing.Iterable[t] API. What does that mean?

      Supporting the Iterator Protocol

      The iterator protocol is partially quite complicated and partially quite easy.

      1. Complicated: Create objects that support the iterator protocol.
      2. Easier: Interface to objects that support typing.Iterable.

      We restrict ourselves to the easy part: If an object supports it, then we use the iterator protocol.
      Example: We will be able to use iter(range(16)) for the data of a QMatrix4x4 object.
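
      To illustrate the "easy part", here is a minimal pure-Python sketch of what consuming an argument through the iterator protocol amounts to. The helper name collect_floats is made up for this illustration; the real conversion happens in generated C++ code and may look quite different.

```python
from itertools import islice


def collect_floats(data, count):
    """Hypothetical helper: read exactly `count` numbers from any iterable."""
    it = iter(data)  # raises TypeError if `data` is not iterable
    values = [float(x) for x in islice(it, count)]
    if len(values) != count:
        raise ValueError(f"expected {count} values, got {len(values)}")
    # If one more item can be fetched, the input was too long.
    sentinel = object()
    if next(it, sentinel) is not sentinel:
        raise ValueError(f"expected exactly {count} values, got more")
    return values


# The same call works for a list, a lazy sequence, an iterator and a generator.
from_list = collect_floats(list(range(16)), 16)
from_range = collect_floats(range(16), 16)
from_iterator = collect_floats(iter(range(16)), 16)
from_generator = collect_floats((i for i in range(16)), 16)
```

      Because an iterator has no length, the element count can only be validated while reading, which is why the sketch checks for both too few and too many items.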

      Comparison Of Methods: Iterable vs. Sequence

      The distinction between iterables and sequences is not tied to some explicit type.
      Python uses duck typing instead: https://en.wikipedia.org/wiki/Duck_typing.
      It is important to understand the difference between Iterable and Iterator, which is a bit more subtle.
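
      A small sketch may make the three notions concrete. The class Countdown is invented for this example; the checks rely only on collections.abc from the standard library.

```python
from collections.abc import Iterable, Iterator, Sequence


class Countdown:
    """Hypothetical example class: iterable, but neither sequence nor iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Returning a fresh generator on every call makes the object re-iterable.
        return (n for n in range(self.start, 0, -1))


c = Countdown(3)
checks = {
    "iterable": isinstance(c, Iterable),  # duck typing: __iter__ is enough
    "iterator": isinstance(c, Iterator),  # would additionally need __next__
    "sequence": isinstance(c, Sequence),  # needs explicit inheritance or registration
}

# iter() turns the iterable into an iterator, which is itself iterable.
it = iter(c)
iterator_checks = (isinstance(it, Iterable), isinstance(it, Iterator))

first = list(c)
second = list(c)  # works again, unlike a consumed iterator
```

      An iterable can hand out any number of iterators, while an iterator is a one-shot cursor. An interface that accepts typing.Iterable covers both.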

      Strategy For Moving Forward to Iterables

      The transition from PySequence to Iterable is unfortunately not simple. The current
      implementation is a conglomerate of XML snippets in the XML files that are used
      to generate all classes. These XML files in turn contain quite a number of links to code
      snippets that either live in the new snippet files (PySide2/glue/qtcore.cpp) or use template files that have existed for a long time (PySide2/templates/core_common.xml).

      After quite some attempts to change this in a general way, the following concept was
      developed:

      • Preserve the status quo
        In order to make sure that no functionality is lost in any transition, new tests are
        written that check that all kinds of PySequence are supported where it is claimed
      • Move algorithmic code into a single file with a few, central functions
      • Then extend the implementation and the tests to support Iterables everywhere
        as well.
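
      The test strategy could be sketched roughly as follows. The function to_tuple is only a stand-in for "the converter under test" and the class name is invented; the real tests would call PySide constructors instead.

```python
import unittest
from array import array


def to_tuple(data):
    """Stand-in for the converter under test (hypothetical, not PySide code)."""
    return tuple(iter(data))


class ArgumentProtocolTest(unittest.TestCase):
    expected = (1, 2, 3, 4)

    def test_sequences(self):
        # First step: pin down what already works for sequence types.
        sequences = ([1, 2, 3, 4], (1, 2, 3, 4), range(1, 5), array("i", [1, 2, 3, 4]))
        for seq in sequences:
            with self.subTest(kind=type(seq).__name__):
                self.assertEqual(to_tuple(seq), self.expected)

    def test_iterables(self):
        # Last step: the same expectations for iterables that are not sequences.
        iterables = (iter([1, 2, 3, 4]), (n for n in range(1, 5)), dict.fromkeys([1, 2, 3, 4]))
        for it in iterables:
            with self.subTest(kind=type(it).__name__):
                self.assertEqual(to_tuple(it), self.expected)


# Run the tests programmatically so the sketch does not exit the interpreter.
result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(ArgumentProtocolTest).run(result)
```

      Keeping the sequence tests separate from the iterable tests makes it visible whether a regression affects previously working behavior or only the new functionality.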

      This is currently happening.

      Update 2019-10-25:
      It turned out to be very complicated to understand the indirection that is happening in the code generator. The original plan to refactor everything was postponed after we made little progress.
      Instead of changing everything, the locations where PySequence is used were identified and changed to support the iterable protocol. This has the advantage that not much needed to change, and whatever worked as a sequence before now works as an iterable as well. Basic tests were added to ensure the correct implementation.

      Phase 3 of Improved NumPy support:

      After the transition to the Iterable protocol has succeeded, the code base should
      have changed so much that we can focus on the real goal:

      • Supporting Iterables for multidimensional NumPy arrays or generators that produce them.

      Actually, thanks to the existing iterable support, this little test function already works:

          def test_iterable_numpy(self):
              # Demo for numpy: We create a unit matrix.
              num_mat = np.eye(4)
              num_mat.shape = 16
              unit = QtGui.QMatrix4x4(num_mat)
              self.assertEqual(unit, QtGui.QMatrix4x4())
      

      This means that there are possibilities to improve and simplify the collaboration between PySide and NumPy structures, but the basic functionality is already there via the iterable protocol. This protocol is "loose" enough
      to make completely different structures fit together, as long as the basic data types are compatible.
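
      For the multidimensional case, one conceivable approach is to flatten nested iterables recursively. The sketch below uses only the standard library; the helper flatten is hypothetical and not part of PySide. It relies on the fact that iterating over a two-dimensional structure, such as a nested list or a two-dimensional NumPy array, yields its rows.

```python
from collections.abc import Iterable


def flatten(data):
    """Hypothetical helper: yield the scalar leaves of arbitrarily nested iterables."""
    for item in data:
        # Strings are iterable too, but they are treated as leaves here.
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten(item)
        else:
            yield item


# A nested list standing in for a 2x2 matrix.
rows = [[1, 0], [0, 1]]
flat_rows = list(flatten(rows))

# A generator that produces one lazily computed row of a 4x4 unit matrix at a time.
lazy_rows = ((float(i == j) for j in range(4)) for i in range(4))
flat_lazy = list(flatten(lazy_rows))
```

      With such a helper, the explicit reshaping step in the test above would not be needed, because the rows would be unpacked on the fly.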




      This is the previous analysis

      PySide needs to be more user-friendly when interfacing to other packages.

      The most prominent example is NumPy, but it applies to other packages like SciPy, Pandas and Matplotlib as well.

      After looking into this issue, it appears to be a two-fold problem:

      a) Many types support only List instead of allowing any sequence,

      b) The error messages are extremely misleading and wrong.

      Examples for a):

      There are many locations where PySide accepts a list only, although a sequence would be ok as well. That would allow for many new container types from NumPy, for instance.

      This example says it uses a set, but it does not work:

      >>> QtCore.QItemSelection.fromSet({2, 3, 5})
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: 'PySide2.QtCore.QItemSelection.fromSet' called with wrong argument types:
        PySide2.QtCore.QItemSelection.fromSet(set)
      Supported signatures:
        PySide2.QtCore.QItemSelection.fromSet(QSet)
      >>> 
      

      The same applies to the function fromVector, which of course does not allow an arbitrary sequence

      >>> QtCore.QItemSelection.fromVector([1, 2, 3])
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: 'PySide2.QtCore.QItemSelection.fromVector' called with wrong argument types:
        PySide2.QtCore.QItemSelection.fromVector(list)
      Supported signatures:
        PySide2.QtCore.QItemSelection.fromVector(list)
      >>> 
      

      but gives a very unintuitive error message: the rejected argument type, list, is exactly the type that is reported as supported.

      I propose to either

      • allow arbitrary sequences here, or
      • add a new method fromSequence.

      That would then work with any extension that uses the sequence protocol.
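
      The difference between the two acceptance rules can be sketched in plain Python. The class Squares and both checker functions are invented for this illustration and are not PySide APIs.

```python
from collections.abc import Sequence


class Squares(Sequence):
    """Hypothetical container: a sequence that is not a list."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, index):
        if not 0 <= index < self._n:
            raise IndexError(index)
        return index * index


def accepts_list_only(arg):
    # Current behavior as described above: only a real list passes.
    return isinstance(arg, list)


def accepts_sequence(arg):
    # Proposed behavior: anything that implements the sequence protocol passes.
    # Strings are sequences too; a real binding would probably exclude them.
    return isinstance(arg, Sequence) and not isinstance(arg, (str, bytes))


s = Squares(4)
list_only_result = accepts_list_only(s)   # rejected although it behaves like a list
sequence_result = accepts_sequence(s)     # accepted
contents = list(s)
```

      The second rule still accepts every list, so switching to it would not break existing callers.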

      Examples for b):

      When trying to find negative examples for a), I unexpectedly stumbled upon examples for b). PySide already supports interfacing to other packages by using abstract protocols.
      See the following example:

      >>> from PySide2 import *
      >>> import numpy as np
      >>> a = np.array((1,2,3,4))
      >>> QtGui.QMatrix2x2(a)
      PySide2.QtGui.QMatrix2x2((1, 2, 3, 4))
      >>> 
      

      Here is the example that gives the impression that NumPy is not supported:

      >>> a3 = np.array((1,2,3))
      >>> QtGui.QMatrix2x2(a3)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: 'PySide2.QtGui.QMatrix2x2' called with wrong argument types:
        PySide2.QtGui.QMatrix2x2(numpy.ndarray)
      Supported signatures:
        PySide2.QtGui.QMatrix2x2(list)
        PySide2.QtGui.QMatrix2x2(PySide2.QtGui.QMatrix2x2)
      >>> 
      

      This problem seems to apply only to the matrices. The error messages are plainly wrong, because the general sequence protocol is supported, but the message says list. When you look at the file

      exists_{platf}_{version}_ci.py
      

      you see that it says sequence instead. But this is still not completely correct either, since the signature module needs more work on correct types, too.

      First Attempt

      My impression is that before adding new sequence support that serves many extensions, we should try to

      • improve error messages (re-write them using the signature module)
      • write a tool that checks every function signature for functionality.

      When that is working well enough that we can trust error messages, then we should think of new functionality.
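
      As a rough idea of what the first stage of such a checking tool could look like, the sketch below walks over a namespace and records which public callables expose an introspectable signature. It uses the standard inspect module and the json module as a stand-in target; it is not based on PySide's own signature module, and actually calling each function with generated arguments is left out.

```python
import inspect
import json


def check_signatures(namespace):
    """Hypothetical checker: split public callables by signature availability."""
    ok, missing = {}, []
    for name, obj in sorted(vars(namespace).items()):
        if name.startswith("_") or not callable(obj):
            continue
        try:
            ok[name] = str(inspect.signature(obj))
        except (ValueError, TypeError):
            # Raised for callables that expose no introspectable signature.
            missing.append(name)
    return ok, missing


# The json module only serves as a stand-in for a PySide module here.
ok, missing = check_signatures(json)
for name, signature in ok.items():
    print(f"{name}{signature}")
```

      A second stage would compare each reported signature with what the function really accepts, which is the part that makes error messages trustworthy.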

      Maybe the topic of this task needs to be changed?

            People

              ctismer Christian Tismer
              ctismer Christian Tismer
              Alex Blasche Alex Blasche