[CSV-147] Better error message during faulty CSV record read #347
|
I'm OK with adding the position but I am guessing someone will create a security issue for data exfiltration. |
|
@garydgregory : |
|
@elharo: Thank you for the feedback. Changes are updated in the PR now. |
|
@garydgregory: |
garydgregory
left a comment
@gbidsilva
Thank you for your updates. Please see my comments.
|
@garydgregory @elharo : |
|
@garydgregory : |
|
@garydgregory: Let us know if there are any more changes to be made in this PR. |
|
@garydgregory @elharo |
  // error invalid char between token and next delimiter
- throw new IOException("(line " + getCurrentLineNumber() +
-         ") invalid char between encapsulated token and delimiter");
+ throw new IOException("Invalid char between encapsulated token and delimiter at line: " + getCurrentLineNumber() + ", position: " + getCharacterPosition());
This probably shouldn't be an IOException but that issue is not new with this PR
        .build();

CSVParser csvParser = csvFormat.parse(stringReader);
Exception exception = assertThrows(UncheckedIOException.class, () -> {
UncheckedIOException is not right either, but again not new in this PR
|
@garydgregory : Checkstyle issue has been fixed. |
|
@garydgregory: Is anything pending from the development side for this to be merged? |
Completed. |
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## master #347 +/- ##
=========================================
Coverage 97.87% 97.87%
Complexity 549 549
=========================================
Files 11 11
Lines 1178 1179 +1
Branches 204 204
=========================================
+ Hits 1153 1154 +1
Misses 13 13
Partials 12 12
☔ View full report in Codecov by Sentry.
|
This fix is related to https://issues.apache.org/jira/browse/CSV-147.
If the CSV contains faulty data, the current error message looks similar to this:
java.io.IOException: (line 2) invalid char between encapsulated token and delimiter
With this fix, the message looks similar to this:
java.io.IOException: An error occurred while tying to parse the CSV content. Error in line: 2, position: 94, last parsed content: ...rec4,rec5,rec6,rec7,rec8
Update
It has been decided to add only the record position to the exception message and to treat the
getLastParsedContent method as a new feature. Therefore, this PR contains only the position-related changes.
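
For readers who want to see how a line number and a position end up in the message, here is a minimal, self-contained sketch. It is not the Commons CSV lexer: `QuotedFieldChecker` and `check` are hypothetical names, the delimiter is assumed to be a comma, and "position" here is simply the zero-based character offset into the input, which may not match what `getCharacterPosition()` reports in the library. Only the message wording is taken from the diff in this PR.

```java
import java.io.IOException;

// Hypothetical illustration only; not part of Commons CSV.
class QuotedFieldChecker {

    // Scans the input and fails when a closing quote is followed by anything
    // other than a comma, a line break, or the end of the input.
    static void check(String csv) throws IOException {
        long line = 1;            // 1-based line number, like the message in this PR
        boolean inQuotes = false; // true while inside an encapsulated token
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < csv.length() && csv.charAt(i + 1) == '"') {
                        i++; // doubled quote: an escaped quote inside the token
                    } else {
                        inQuotes = false; // closing quote of the token
                        if (i + 1 < csv.length()) {
                            char next = csv.charAt(i + 1);
                            if (next != ',' && next != '\n' && next != '\r') {
                                // Assumption: position is the offset of the offending char.
                                throw new IOException(
                                    "Invalid char between encapsulated token and delimiter at line: "
                                        + line + ", position: " + (i + 1));
                            }
                        }
                    }
                } else if (c == '\n') {
                    line++; // line break inside a quoted field still counts as a line
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == '\n') {
                line++;
            }
        }
    }
}
```

Calling `QuotedFieldChecker.check("a,b\n\"x\"y,c")` fails on the `y` that follows the closing quote on the second line. The sketch treats any quote outside a token as an opening quote, which is a simplification compared with a real CSV lexer.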