Skip to content

added ignoreQuoteInToken support to ignore quotes in strings#46

Closed
ranjithrp wants to merge 2 commits into
apache:masterfrom
ranjithrp:master
Closed

added ignoreQuoteInToken support to ignore quotes in strings#46
ranjithrp wants to merge 2 commits into
apache:masterfrom
ranjithrp:master

Conversation

@ranjithrp

Copy link
Copy Markdown

added ignoreQuoteInToken support to ignore quotes in strings even when there are few encapsulatedTokens with comma within. This will help in parsing csv values like
abc,"xyz" 123 bar,3,11961034,"First author, Second Author"

…n there are few encapsulatedTokens with comma within
@coveralls

Copy link
Copy Markdown

Coverage Status

Coverage decreased (-0.2%) to 92.618% when pulling ad01ee1 on ranjithrp:master into 7754cd4 on apache:master.

@coveralls

coveralls commented Jun 21, 2019

Copy link
Copy Markdown

Coverage Status

Coverage increased (+0.08%) to 92.913% when pulling a381d53 on ranjithrp:master into 7754cd4 on apache:master.

@garydgregory

Copy link
Copy Markdown
Member

@ranjithrp ,
You'll need to provide unit tests so we can clearly assess what it is you are trying to achieve.
Thank you,
Gary

@ranjithrp

Copy link
Copy Markdown
Author

@ranjithrp ,
You'll need to provide unit tests so we can clearly assess what it is you are trying to achieve.
Thank you,
Gary

i have added the junits.

@garydgregory

Copy link
Copy Markdown
Member

Would you mind showing an example with actual expected rows please? It is not clear to me yet if this is a good thing.

@ranjithrp

ranjithrp commented Jun 22, 2019

Copy link
Copy Markdown
Author

Would you mind showing an example with actual expected rows please? It is not clear to me yet if this is a good thing.

In the actual expected row, we have one column which has value with quotes in it, and other column which has a comma in it.
Ex row -
abc,"xyz" 123 bar,3,11961034,"First author, Second Author"

Here the second column has value "xyz" 123 bar. This has quotes in the token
The 5th column has value First author, Second Author. This has comma in the token

if i use withQuote(null), it ignores the quote for the fifth column and then splits that value to 2.
If i dont use withQuote(null), for the second column, the lexer tries to parseEncapsulatedToken and when it sees character other than delimiter or newline, it throws exception.

This was the problem we were facing, and to handle this, i had made the above change.

@ranjithrp

Copy link
Copy Markdown
Author

@garydgregory Hope the above explanation clarifies your question. Please let me know if you need any additional details. Thanks

@garydgregory

Copy link
Copy Markdown
Member

@ranjithrp
I would prefer to see a unit test that that compares your example input with an actual parsed row where each column value is asserted for its correctness.
Thank you,
Gary

@ranjithrp

Copy link
Copy Markdown
Author

@ranjithrp
I would prefer to see a unit test that that compares your example input with an actual parsed row where each column value is asserted for its correctness.
Thank you,
Gary

@garydgregory i have tried to do the same in src/test/java/org/apache/commons/csv/LexerTest.java. In one of the test case, i have set the boolean to true and shown how it is able to retrieve each token and have asserted the values also.
In another test case, i have set the boolean to false and shown how the parser throws an exception while parsing a token with quotes within.
Please let me know if you are looking for anything specific apart from this?

@garydgregory

garydgregory commented Jul 5, 2021

Copy link
Copy Markdown
Member

@ranjithrp

abc,"xyz" 123 bar,3,11961034,"First author, Second Author"

Please document in the PR description the expected column values. It's not clear to me how malformed the input is and what you are exactly trying to work around. Is this really about dealing with malformed input. If the input was properly escaped or formatted, could the input be processed properly?
TY!

@garydgregory

Copy link
Copy Markdown
Member

Closing: No more feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants