Details
Bug
Resolution: Done
P3: Somewhat important
3.x, 4.4.0, 4.5.3
None
Description
The program at the bottom (let's call it "retest") simply matches a regexp against a string and displays the captures. Here is an erroneous run:
$ ./retest '(==|=)(\S+)(\1)' '===='
Offset=0
Capture 0 at 0: '==='
Capture 1 at 0: '='
Capture 2 at 1: '=' <---------- the \S+ was not greedy
It should have matched the whole string because the subexpression \S+ is supposed to be greedy:
Offset=0
Capture 0 at 0: '===='
Capture 1 at 0: '='
Capture 2 at 1: '=='
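As a cross-check that does not involve QRegExp, the same pattern can be fed to a different backtracking engine. The sketch below uses std::regex with its default ECMAScript grammar; the helper name captures() is hypothetical and not part of retest. It only shows what another engine reports for the same input and says nothing about QRegExp internals.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of retest): returns captures 0..N of the
// first match of `pattern` in `subject`, or an empty vector if nothing matches.
std::vector<std::string> captures(const std::string &pattern,
                                  const std::string &subject)
{
    const std::regex re(pattern);   // default grammar: ECMAScript
    std::smatch m;
    std::vector<std::string> out;
    if (std::regex_search(subject, m, re)) {
        for (std::size_t i = 0; i < m.size(); ++i)
            out.push_back(m[i].str());
    }
    return out;
}
```

Under ECMAScript backtracking rules, captures(R"((==|=)(\S+)(\1))", "====") should yield '====', '=', '==' and '=', which agrees with the expected output above.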
Appending a redundant \S? (effectively a no-op after \S+) makes it work:
$ ./retest '(==|=)(\S+\S?)(\1)' '===='
Offset=0
Capture 0 at 0: '===='
Capture 1 at 0: '='
Capture 2 at 1: '=='
But the equivalent form \S*\S does not work:
$ ./retest '(==|=)(\S*\S)(\1)' '===='
Offset=0
Capture 0 at 0: '==='
Capture 1 at 0: '='
Capture 2 at 1: '='
There is probably an optimization in the regexp compiler that stops expanding a quantifier (* or +) based on the set of first characters that the rest of the expression can match, when that set is easy to determine. Inserting \S? blocks the algorithm that determines this set, while \S does not.
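The hypothesis above can be illustrated with a toy model. The sketch below is not QRegExp code; greedyRun() and shortcutRun() are hypothetical names, and the model is reduced to \S+ followed by a single literal character (the text that \1 would have to match once group 1 has captured '='). It contrasts correct greedy matching with a shortcut that stops extending the run as soon as the next character could start the rest of the pattern.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True if c is a non-whitespace character, i.e. what \S matches.
static bool isNonSpace(char c)
{
    return !std::isspace(static_cast<unsigned char>(c));
}

// Correct greedy \S+ followed by the literal `next`: take the longest run of
// non-space characters, then give characters back until `next` follows.
// Returns the length matched by \S+, or 0 if there is no match.
std::size_t greedyRun(const std::string &s, std::size_t start, char next)
{
    std::size_t end = start;
    while (end < s.size() && isNonSpace(s[end]))
        ++end;
    for (std::size_t len = end - start; len >= 1; --len) {
        if (start + len < s.size() && s[start + len] == next)
            return len;
    }
    return 0;
}

// Hypothetical shortcut (assumed, for illustration only): stop extending the
// run as soon as the next character is one the rest of the pattern can start
// with. This returns the shortest workable run instead of the longest.
std::size_t shortcutRun(const std::string &s, std::size_t start, char next)
{
    std::size_t len = 0;
    while (start + len < s.size() && isNonSpace(s[start + len])) {
        ++len;
        if (start + len < s.size() && s[start + len] == next)
            return len;
    }
    return 0;
}
```

With s = "====" and start = 1 (group 1 having matched the first '='), greedyRun returns 2 and shortcutRun returns 1. These correspond to the expected '==' and the observed '=' for capture 2.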
Code for the test program:
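#include <QtCore>

int main( int argc, char * argv[] )
{
    // Expect exactly two arguments: the regexp and the subject string.
    if ( argc != 3 ) {
        qWarning( "Usage: %s <regexp> <string>", argv[0] );
        return 1;
    }

    QString sre = QString::fromLocal8Bit( argv[1] );
    QString s = QString::fromLocal8Bit( argv[2] );
    QRegExp re( sre );

    int offset = re.indexIn( s );   // position of the first match, or -1
    int ncap = re.numCaptures();    // number of capturing groups in the pattern
    qDebug( "Offset=%d", offset );

    // Prints captures 0 .. ncap-1 (capture 0 is the whole match), so the
    // last group is not shown, as in the runs above.
    for ( int i = 0; i < ncap; i++ )
        qDebug( "Capture %d at %d: '%s'", i, re.pos( i ),
                re.cap( i ).toLocal8Bit().constData() );
    return 0;
}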