quote value ending with multi-character delimiter prefix #620

Open
rootvector2 wants to merge 1 commit into apache:master from rootvector2:quote-multichar-delimiter-prefix
@rootvector2

Contributor

In MINIMAL quote mode, `printWithQuotes` does not encapsulate a value whose tail is a straddling prefix of a multi-character delimiter, so the printed record does not round-trip. With delimiter `||`, a field ending in `|`, such as `a|`, is written unquoted. The resulting record `a|||b` is then parsed by `CSVParser` as `[a, |b]`, because the greedy lexer matches the delimiter one character early at the value/delimiter boundary. The existing scan quotes a value only when the full delimiter occurs inside it, so a trailing partial delimiter slips through.

This change adds `endsWithDelimiterPrefix`, which quotes the value when appending the delimiter after it would create a delimiter match that straddles the value/delimiter boundary. Single-character delimiters are unaffected, as are values that contain the start of the delimiter without a straddling suffix (for example delimiter `[|]`, value `a[`).
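For readers who want to see the boundary condition spelled out, here is a minimal, self-contained sketch. It is not the code from this pull request: the class `DelimiterPrefixSketch`, the helper `matchesAcrossBoundary`, and the `naiveSplit` stand-in for the reader are hypothetical names used only for illustration, and the method signature is assumed. `naiveSplit` ignores quoting and escaping and only mimics leftmost delimiter matching.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the implementation from this pull request.
class DelimiterPrefixSketch {

    /**
     * Returns true when writing delimiter directly after value produces a
     * delimiter match that starts inside the value and ends inside the
     * appended delimiter.
     */
    static boolean endsWithDelimiterPrefix(final CharSequence value, final String delimiter) {
        final int valueLen = value.length();
        final int delimLen = delimiter.length();
        // k = number of trailing value characters that would take part in the match.
        for (int k = 1; k < delimLen && k <= valueLen; k++) {
            if (matchesAcrossBoundary(value, delimiter, k)) {
                return true;
            }
        }
        return false;
    }

    // Compares the delimiter against the last k value characters followed by
    // the first (length - k) characters of the appended delimiter.
    private static boolean matchesAcrossBoundary(final CharSequence value, final String delimiter, final int k) {
        final int start = value.length() - k;
        for (int i = 0; i < delimiter.length(); i++) {
            final char c = i < k ? value.charAt(start + i) : delimiter.charAt(i - k);
            if (c != delimiter.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    // Stand-in for a greedy reader: splits at the leftmost delimiter match,
    // ignoring quotes and escapes.
    static List<String> naiveSplit(final String record, final String delimiter) {
        final List<String> fields = new ArrayList<>();
        int from = 0;
        int idx;
        while ((idx = record.indexOf(delimiter, from)) >= 0) {
            fields.add(record.substring(from, idx));
            from = idx + delimiter.length();
        }
        fields.add(record.substring(from));
        return fields;
    }

    public static void main(final String[] args) {
        System.out.println(naiveSplit("a|" + "||" + "b", "||"));   // [a, |b]
        System.out.println(endsWithDelimiterPrefix("a|", "||"));   // true
        System.out.println(endsWithDelimiterPrefix("a[", "[|]"));  // false
        System.out.println(endsWithDelimiterPrefix("a|", "|"));    // false
    }
}
```

Note that the check looks for a match across the boundary rather than a plain "value ends with a delimiter prefix" test. That is why `a[` with delimiter `[|]` stays unquoted: the trailing `[` is a prefix of the delimiter, but `[` followed by `[|]` does not form a delimiter starting at that `[`.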

Found while auditing multi-character delimiter round-trips through the printer.

  • Read the contribution guidelines for this project.
  • Read the ASF Generative Tooling Guidance if you use Artificial Intelligence (AI).
  • I used AI to create any part of, or all of, this pull request. Which AI tool was used to create this pull request, and to what extent did it contribute?
  • Run a successful build using the default Maven goal with mvn; that's mvn on the command line by itself.
  • Write unit tests that match behavioral changes, where the tests fail if the changes to the runtime are not applied. This may not always be possible, but it is a best practice.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Each commit in the pull request should have a meaningful subject line and body.
