Type: Suggestion
Resolution: Unresolved
Priority: Not Evaluated
Fix Version/s: None
Affects Version/s: None
Component/s: PDF
Labels: None

I have a system that would benefit greatly from being able to extract the text of multiple PDF documents in parallel. This would make building full-text search indexes for a library of PDFs much more efficient.

The idea here would be to allow different instances of QPdfDocument to be processed in parallel by different threads. The current implementation serializes all access to QPdfDocument methods by using a QPdfMutexLocker. For example, the following code would be fully parallelized, whereas today t1 and t2 have to wait on one another:

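#include <QPdfDocument>
#include <thread>

// Each thread owns its own QPdfDocument; nothing is shared between them.
std::thread t1([]()
{
    QPdfDocument doc;
    doc.load("Book1.pdf");
    doc.getAllText(0); // extract the text of page 0
});

std::thread t2([]()
{
    QPdfDocument doc;
    doc.load("Book2.pdf");
    doc.getAllText(0);
});

t1.join();
t2.join();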
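
For the indexing use case, the same pattern would generalize to one task per file. Below is a minimal sketch of what I have in mind, not tested against Qt PDF itself. The helper name `extractAll` and the `extractText` callback are hypothetical. The callback stands in for "construct a QPdfDocument in the worker, load the file, and collect the text of its pages".

```cpp
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical helper (not part of Qt PDF): runs one extraction task per file
// and returns the results in the same order as the input paths.
// `extractText` is an assumed callback that does the per-document work
// entirely inside the worker thread.
std::vector<std::string> extractAll(
    const std::vector<std::string> &paths,
    const std::function<std::string(const std::string &)> &extractText)
{
    std::vector<std::future<std::string>> tasks;
    tasks.reserve(paths.size());
    for (const std::string &path : paths) {
        // std::launch::async asks for each task to run on its own thread.
        tasks.push_back(std::async(std::launch::async, extractText, path));
    }

    std::vector<std::string> texts;
    texts.reserve(tasks.size());
    for (auto &task : tasks)
        texts.push_back(task.get()); // blocks until that file is done
    return texts;
}
```

With the current serialization, tasks like these would still take turns inside the QPdfDocument calls, so the fan-out would gain little until the locking changes.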

From a high-level viewpoint, I can't see why parsing and rendering separate PDFs should cause contention between threads at the method-call level.
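
To illustrate the difference I am asking about, here is a toy model in plain C++. It is not Qt code: `FakeDocument`, `Stats`, and `peakConcurrency` are made-up names, and the assumption that the current locker guards one process-wide mutex is my reading of the behavior, not something I have verified in detail. The sketch counts how many threads are inside `extract()` at the same time when two instances share one mutex versus when each instance has its own.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Counters shared by the demo threads.
struct Stats {
    std::atomic<int> inFlight{0}; // threads currently inside extract()
    std::atomic<int> peak{0};     // highest value inFlight ever reached
};

// Illustrative stand-in for a document class; none of this is Qt API.
class FakeDocument {
public:
    // lockPerInstance == false: lock the mutex shared by all instances.
    // lockPerInstance == true:  lock only this instance's own mutex.
    FakeDocument(std::mutex &sharedMutex, bool lockPerInstance)
        : m_lock(lockPerInstance ? m_own : sharedMutex) {}

    void extract(Stats &stats) {
        std::lock_guard<std::mutex> guard(m_lock);
        const int now = ++stats.inFlight;
        // Record a new peak if this thread raised it.
        int seen = stats.peak.load();
        while (seen < now && !stats.peak.compare_exchange_weak(seen, now)) {}
        // Simulated work: hold the lock until another thread overlaps
        // with this one or a 500 ms deadline passes.
        const auto deadline = std::chrono::steady_clock::now()
                              + std::chrono::milliseconds(500);
        while (stats.peak.load() < 2
               && std::chrono::steady_clock::now() < deadline) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        --stats.inFlight;
    }

private:
    std::mutex m_own;
    std::mutex &m_lock;
};

// Runs extract() on two separate instances from two threads and reports
// how many threads were inside extract() at once.
int peakConcurrency(bool lockPerInstance) {
    std::mutex shared;
    Stats stats;
    FakeDocument a(shared, lockPerInstance);
    FakeDocument b(shared, lockPerInstance);
    std::thread t1([&] { a.extract(stats); });
    std::thread t2([&] { b.extract(stats); });
    t1.join();
    t2.join();
    return stats.peak.load();
}
```

In this model the shared mutex keeps the two threads from ever overlapping, while per-instance mutexes let them run side by side. The second behavior is what this suggestion asks for on separate QPdfDocument instances.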


Assignee: Shawn Rutledge
Reporter: Felipe Goron Farinon
Votes: 0
Watchers: 1
Created: 19 Apr '25 15:09
Updated: 19 Apr '25 15:14

There are no open Gerrit changes
