CSV-288 Fix for multi-char delimiter not working as expected #218

Merged: garydgregory merged 1 commit into apache:master from angusdev:CVS-288 on Feb 19, 2022

@angusdev

Jira Issue: https://issues.apache.org/jira/browse/CSV-288

When checking whether the last token is a delimiter in the line of code below:

// did we reach eof during the last iteration already ? EOF
if (isEndOfFile(lastChar) || !isDelimiter(lastChar) && isEndOfFile(c)) { 

isDelimiter(lastChar) unintentionally advances the buffer pointer and consumes the first "|" when parsing reaches the "b" in "a||b||c" (lastChar is "|" and the next character is also "|", which together look like "||"). The subsequent read sees "|c" instead of "||c", so the second token becomes "b|c".

In addition, isDelimiter(lastChar) cannot handle multi-char delimiters.

To fix this, introduce a new indicator, isLastTokenDelimiter, and use it in place of isDelimiter(lastChar). The indicator is set and reset inside isDelimiter().
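
To illustrate the side effect described above, here is a minimal toy lexer. It is a sketch, not the actual Commons CSV `Lexer`: the class `MultiCharDelimiterSketch`, its `tokens` and `nextToken` methods, and the `useFlag` switch are hypothetical names made up for this example. Only `isDelimiter` and `isLastTokenDelimiter` mirror names from this PR, and their bodies here are simplified assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy lexer (NOT the Commons CSV Lexer); it only models the look-ahead side effect. */
class MultiCharDelimiterSketch {
    private static final int EOF = -1;

    private final String input;
    private final String delimiter;
    private int pos;                       // index of the next unread character
    private int lastChar = EOF;            // last character consumed from the input
    private boolean isLastTokenDelimiter;  // set/reset only inside isDelimiter()

    MultiCharDelimiterSketch(String input, String delimiter) {
        this.input = input;
        this.delimiter = delimiter;
    }

    private int read() {
        lastChar = pos < input.length() ? input.charAt(pos++) : EOF;
        return lastChar;
    }

    /** True if ch begins a delimiter; on a match the remaining delimiter chars are consumed. */
    private boolean isDelimiter(int ch) {
        isLastTokenDelimiter = false;
        if (ch != delimiter.charAt(0)) {
            return false;
        }
        String rest = delimiter.substring(1);
        if (!input.startsWith(rest, pos)) {
            return false;
        }
        pos += rest.length();              // side effect: the buffer pointer advances
        if (!rest.isEmpty()) {
            lastChar = rest.charAt(rest.length() - 1);
        }
        isLastTokenDelimiter = true;
        return true;
    }

    private String nextToken(boolean useFlag) {
        int previous = lastChar;
        int c = read();
        // Stand-in for the "was the previous token a delimiter?" check. A real lexer
        // would use the result for EOF handling; here only the side effect matters.
        boolean afterDelimiter = useFlag ? isLastTokenDelimiter : isDelimiter(previous);
        StringBuilder token = new StringBuilder();
        while (c != EOF && !isDelimiter(c)) {
            token.append((char) c);
            c = read();
        }
        return token.toString();
    }

    /** useFlag=false re-runs isDelimiter(lastChar); useFlag=true reads the indicator. */
    List<String> tokens(boolean useFlag) {
        List<String> out = new ArrayList<>();
        while (pos < input.length()) {
            out.add(nextToken(useFlag));
        }
        return out;
    }
}
```

In this sketch, `new MultiCharDelimiterSketch("a||b||c", "||").tokens(false)` yields `[a, b|c]`, because the extra `isDelimiter` call swallows one "|" after "b". With `tokens(true)` the flag is read without touching the buffer and the result is `[a, b, c]`.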

@garydgregory garydgregory merged commit c15a06e into apache:master Feb 19, 2022
@angusdev angusdev deleted the CVS-288 branch October 15, 2022 17:35