Details
Bug
Resolution: Unresolved
P2: Important
None
5.15, 6.7
Description
Problem
The attached minimal example showcases the issue: given an input XML file containing newlines and other special characters, these characters are stripped from the output when parsing with QXmlStreamReader. This differs from the previous behavior of QXmlSimpleReader (which retains these and other special characters).
This causes problems for customers who rely on these characters being available after ingestion. They currently have to manually substitute other hex values for newlines, values that QXmlStreamReader does not automatically remove.
The example includes several attempts to retain and extract these characters with the newer XML parser, none of which succeed.
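For illustration, the manual substitution customers currently rely on could look roughly like the sketch below. It assumes the substitution is applied to the attribute value before the XML is written, and that numeric character references such as `&#10;` are the "hex values" that survive parsing. `escapeAttributeValue` is a hypothetical helper name, not a Qt function.

```cpp
#include <string>

// Hypothetical helper (not part of Qt): rewrites whitespace characters in an
// attribute value as numeric character references, so that a parser applying
// attribute-value normalization does not turn them into plain spaces.
// Markup-significant characters are escaped as well to keep the XML valid.
std::string escapeAttributeValue(const std::string &raw)
{
    std::string out;
    out.reserve(raw.size());
    for (char c : raw) {
        switch (c) {
        case '\n': out += "&#10;";  break; // line feed
        case '\r': out += "&#13;";  break; // carriage return
        case '\t': out += "&#9;";   break; // tab
        case '&':  out += "&amp;";  break;
        case '<':  out += "&lt;";   break;
        case '"':  out += "&quot;"; break;
        default:   out += c;        break;
        }
    }
    return out;
}
```

The drawback is that every producer of the XML has to apply this step, which is why a reader-side option is requested here.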
Example output from the attached example
Parsing preprocessed XML with QXmlStreamReader:
"INVALID_ELEMENT" TEXT attribute: "some_function();\n\nfunction some_function(){\n\n // I am a comment\n let variableA = some_code();\n\n // I am another comment\n let variableB = 1;\n\n for (let i = 0; i < 5; i++)\n
\n}\n"
Error parsing XML with QXmlStreamReader: "Extra content at end of document."
Parsing with QXmlStreamReader (newline replacement only):
"INVALID_ELEMENT" TEXT attribute (processed): "some_function(); function some_function(){ // I am a comment let variableA = some_code(); // I am another comment let variableB = 1; for (let i = 0; i < 5; i++)
Error parsing XML with QXmlStreamReader: "Extra content at end of document."
Parsing with QXmlStreamReader:
"INVALID_ELEMENT" TEXT attribute: "some_function(); function some_function(){ // I am a comment let variableA = some_code(); // I am another comment let variableB = 1; for (let i = 0; i < 5; i++) { variableB = variableB + 1; }
} "
Error parsing XML with QXmlStreamReader: "Extra content at end of document."
Parsing with QXmlSimpleReader:
"INVALID_ELEMENT" TEXT attribute: "some_function();\r\n\r\nfunction some_function(){\r\n\r\n\t// I am a comment\r\n\tlet variableA = some_code();\r\n\r\n\t// I am another comment\r\n\tlet variableB = 1;\r\n\r\n\tfor (let i = 0; i < 5; i++)\r\n\t
\r\n}\r\n"
Suggested Solution
Because removing these characters can be seen as either a feature or a bug depending on the use case, I suggest adding a property/flag/bool on QXmlStreamReader that either keeps the current behavior (removing special characters) or retains the special characters, as this customer requires.
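To make the request concrete, here is a simplified standalone model of what such a switch would toggle. It is only a sketch: `AttributeWhitespace` and `processAttributeValue` are invented names, no such option exists in QXmlStreamReader today, and the model ignores line-ending normalization and entity handling.

```cpp
#include <string>

// Hypothetical option, named for illustration only; not an existing Qt API.
enum class AttributeWhitespace {
    Normalize, // current behavior: tab, CR and LF become a space
    Preserve   // requested behavior: characters are passed through unchanged
};

// Simplified model of attribute-value processing under the proposed flag.
std::string processAttributeValue(const std::string &literal,
                                  AttributeWhitespace mode)
{
    std::string out;
    out.reserve(literal.size());
    for (char c : literal) {
        const bool isWhitespace = (c == '\n' || c == '\r' || c == '\t');
        if (isWhitespace && mode == AttributeWhitespace::Normalize)
            out += ' ';
        else
            out += c;
    }
    return out;
}
```

Defaulting to `Normalize` would keep existing applications unaffected, while callers that need the original characters could opt in to `Preserve`.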