Skip to content

[CSV-147] Better error message during faulty CSV record read#347

Merged
garydgregory merged 18 commits into
apache:masterfrom
gbidsilva:CSV-147
Sep 13, 2023
Merged

[CSV-147] Better error message during faulty CSV record read#347
garydgregory merged 18 commits into
apache:masterfrom
gbidsilva:CSV-147

Conversation

@gbidsilva

@gbidsilva gbidsilva commented Aug 30, 2023

Copy link
Copy Markdown
Contributor

This fix is related to : https://issues.apache.org/jira/browse/CSV-147.

If we have some faulty data in the CSV then the current error message which we are getting is something similar to below.
java.io.IOException: (line 2) invalid char between encapsulated token and delimiter

With this fix, what we will be getting is something similar to below,
java.io.IOException: An error occurred while tying to parse the CSV content. Error in line: 2, position: 94, last parsed content: ...rec4,rec5,rec6,rec7,rec8

Update
It has been decided to only to add the record position into the exception message and treat getLastParsedContent method as a new feature. Therefore this PR only contains the position related changes.

@garydgregory

Copy link
Copy Markdown
Member

I'm OK with adding the position but I am guessing someone will create a security issue for data exfiltration.

@gbidsilva

Copy link
Copy Markdown
Contributor Author

@garydgregory :
That is a good point. But IMO, whenever someone log the exception, at that point they should consider the security risks.
Isn't usually exception provide such details? as I remember I have came across such scenarios.
What would be the best practice in this case ? - good to know :-)

Comment thread src/main/java/org/apache/commons/csv/CSVParser.java Outdated
Comment thread src/main/java/org/apache/commons/csv/CSVParser.java Outdated
Comment thread src/main/java/org/apache/commons/csv/CSVParser.java Outdated
@gbidsilva

Copy link
Copy Markdown
Contributor Author

@elharo: Thank you for the feedback. Changes are updated in the PR now.

@gbidsilva

Copy link
Copy Markdown
Contributor Author

@garydgregory:
I have looked around what would be the best practice in adding data records in Exceptions. It seems that arguments are there for both the sides. However, as you suggested, we should avoid adding such details in Exception and keep it consistent with how commons were creating exceptions so far.
To rectify this, I have removed the content part from the exception message and made 'getLastParsedContent()' method public. Therefore, if there is any need then anyone can call that method and can get the last parsed content.
Test cases are added to the PR :-)
Appreciate your feedback on this.

Comment thread src/test/java/org/apache/commons/csv/CSVParserTest.java Outdated
Comment thread src/test/java/org/apache/commons/csv/CSVParserTest.java Outdated
Comment thread src/test/java/org/apache/commons/csv/CSVParserTest.java Outdated
Comment thread src/test/java/org/apache/commons/csv/CSVParserTest.java Outdated
Comment thread src/test/java/org/apache/commons/csv/CSVParserTest.java Outdated

@garydgregory garydgregory left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gbidsilva
Thank you for your updates. Please see my comments.

Comment thread src/main/java/org/apache/commons/csv/CSVParser.java Outdated
Comment thread src/main/java/org/apache/commons/csv/CSVParser.java Outdated
Comment thread src/main/java/org/apache/commons/csv/CSVParser.java Outdated
Comment thread src/test/java/org/apache/commons/csv/CSVParserTest.java Outdated
Comment thread src/test/java/org/apache/commons/csv/CSVParserTest.java Outdated
@gbidsilva

Copy link
Copy Markdown
Contributor Author

@garydgregory @elharo :
Requested changes are done.

@gbidsilva gbidsilva requested a review from garydgregory August 31, 2023 14:20
Comment thread src/main/java/org/apache/commons/csv/CSVParser.java Outdated
@gbidsilva

Copy link
Copy Markdown
Contributor Author

@garydgregory :
I understand your point. In this case, we will treat getLastParsedContent method as a new feature and decide whether it is actually needed or not in the product.
To make this PR work, I have removed the getLastParsedContent method content and only kept the positioning related changes.
Appreciate your feedback :-)

@gbidsilva

Copy link
Copy Markdown
Contributor Author

@garydgregory : let us know if there is anymore change to be done in this PR.

Comment thread src/main/java/org/apache/commons/csv/CSVParser.java Outdated
@gbidsilva

Copy link
Copy Markdown
Contributor Author

@garydgregory @elharo
Exception handling moved to Lexer class.

// error invalid char between token and next delimiter
throw new IOException("(line " + getCurrentLineNumber() +
") invalid char between encapsulated token and delimiter");
throw new IOException("Invalid char between encapsulated token and delimiter at line: " + getCurrentLineNumber() + ", position: " + getCharacterPosition());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably shouldn't be an IOException but that issue is not new with this PR

.build();

CSVParser csvParser = csvFormat.parse(stringReader);
Exception exception = assertThrows(UncheckedIOException.class, () -> {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UncheckedIOException is not right either, but again not new in this PR

@gbidsilva

Copy link
Copy Markdown
Contributor Author

@garydgregory : Checkstyle issue has been fixed.

@gbidsilva

Copy link
Copy Markdown
Contributor Author

@garydgregory : Anything pending from development side for this to be merged ?

@gbidsilva

Copy link
Copy Markdown
Contributor Author

@gbidsilva Thank you for your updates. Please see my comments.

Completed.

@codecov-commenter

codecov-commenter commented Sep 13, 2023

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.87%. Comparing base (7104598) to head (c0759cd).
⚠️ Report is 844 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##             master     #347   +/-   ##
=========================================
  Coverage     97.87%   97.87%           
  Complexity      549      549           
=========================================
  Files            11       11           
  Lines          1178     1179    +1     
  Branches        204      204           
=========================================
+ Hits           1153     1154    +1     
  Misses           13       13           
  Partials         12       12           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@garydgregory garydgregory left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@garydgregory garydgregory merged commit a33b24c into apache:master Sep 13, 2023
@garydgregory garydgregory changed the title [CSV-147] Error message optimization during faulty CSV record read [CSV-147] Better error message during faulty CSV record read Sep 13, 2023
asfgit pushed a commit that referenced this pull request Sep 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants