Details
- Bug
- Resolution: Unresolved
- P3: Somewhat important
- None
- 6.3.2
- None
Description
QPdfDocument::getAllText does not return all characters; some are missing. Please check the out.pdf I attached. out.txt was generated with the PyMuPDF (fitz) command-line module:
python3 -m fitz gettext -pages 1 out.pdf
PyMuPDF extracts the text correctly, but the result from QPdfDocument::getAllText is missing some characters. I put that result in getAllText.txt. Here is the diff:
getAllText: 是一个共享 ,供 个 系 统 (如在计算 机之
PyMuPDF: 接口是一个共享框架,供 两 个 系 统 (如在计算机和打印机之间
As the diff shows, many characters are missing. I think PDFium returned the wrong result, but Chrome handles this PDF correctly (copying text works fine, as it does in other PDF viewers). Could this be related to the Chromium version that Qt uses?
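To make the comparison reproducible, the missing characters can be listed mechanically instead of by eye. The following is a minimal sketch using only the Python standard library. The helper name `missing_characters` is hypothetical and is not part of Qt or PyMuPDF, and the two strings are the excerpts quoted in the diff above, not the full attachments. Whitespace is ignored on both sides, because the two extractors insert spaces differently.

```python
import difflib


def missing_characters(reference: str, candidate: str) -> str:
    """Return the characters of `reference` that have no counterpart in
    `candidate`, comparing both strings with all whitespace removed."""
    ref = "".join(reference.split())
    cand = "".join(candidate.split())
    # autojunk=False keeps frequent characters from being treated as noise.
    matcher = difflib.SequenceMatcher(None, ref, cand, autojunk=False)
    missing = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" and "replace" both mean ref[i1:i2] is absent from candidate.
        if tag in ("delete", "replace"):
            missing.append(ref[i1:i2])
    return "".join(missing)


# Excerpts copied from the diff in this report.
get_all_text = "是一个共享 ,供 个 系 统 (如在计算 机之"
pymupdf_text = "接口是一个共享框架,供 两 个 系 统 (如在计算机和打印机之间"

if __name__ == "__main__":
    print(missing_characters(pymupdf_text, get_all_text))
```

Running the same helper on the full contents of out.txt and getAllText.txt would show whether the dropped characters follow a pattern, for example a particular font or encoding in out.pdf.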
Attachments
Issue Links
- relates to QTBUG-122766 QPdfSearchModelPrivate::doSearch warning "not found in context" happens sometimes
- Reported