StringIndexOutOfBoundsException in patternCSSExtract #57

Closed
sebastian-nagel wants to merge 1 commit into iipc:master from sebastian-nagel:CssExtractStrOutOfBound
@sebastian-nagel

Collaborator

The method patternCSSExtract may fail with a StringIndexOutOfBoundsException when processing some HTML documents (e.g., http://www.sensordynamics.com.au/). See also commoncrawl#1.

 - correct check for min. required URL length when stripping 4 characters (2 at each end)
 - simplified code, use non-capturing groups in regular expression
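
The two points above can be illustrated with a minimal sketch. This is not the code from this pull request or from #63: the class name `CssUrlSketch`, the helpers `stripQuotes` and `extractUrls`, and the regular expression are all hypothetical and only show the idea. When an escaped quote (`\"`) is stripped from both ends, 4 characters are removed, so the string needs a length of at least 4 before `substring(2, length - 2)` is safe; a weaker check would let a 3-character string through and `substring(2, 1)` would throw a `StringIndexOutOfBoundsException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CssUrlSketch {
    // Hypothetical pattern (an assumption, not the project's regex).
    // The optional "@import" prefix sits in a non-capturing group (?:...),
    // so group(1) is always the raw url(...) argument.
    private static final Pattern CSS_URL = Pattern.compile(
            "(?:@import\\s+)?url\\s*\\(\\s*([^)]*?)\\s*\\)",
            Pattern.CASE_INSENSITIVE);

    // Strips surrounding quotes from a raw url(...) argument.
    // Returns null when the argument is too short to be stripped safely.
    static String stripQuotes(String url) {
        if (url.isEmpty()) {
            return url;
        }
        char first = url.charAt(0);
        if (first == '\\') {
            // Escaped quote such as \"...\": 2 characters at each end,
            // so at least 4 characters are required.
            if (url.length() < 4) {
                return null;
            }
            return url.substring(2, url.length() - 2);
        }
        if (first == '"' || first == '\'') {
            // Plain quote: 1 character at each end, at least 2 required.
            if (url.length() < 2) {
                return null;
            }
            return url.substring(1, url.length() - 1);
        }
        return url; // unquoted argument
    }

    // Collects all non-empty URLs found in a CSS fragment.
    static List<String> extractUrls(String css) {
        List<String> urls = new ArrayList<>();
        Matcher m = CSS_URL.matcher(css);
        while (m.find()) {
            String url = stripQuotes(m.group(1));
            if (url != null && !url.isEmpty()) {
                urls.add(url);
            }
        }
        return urls;
    }
}
```

In this sketch a malformed argument such as `url(\"x)` is skipped rather than raising an exception, while well-formed arguments are returned without their quotes.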
@ldko

ldko commented Jan 27, 2017

Member

Closing this as no longer relevant with the inclusion of #63.

@ldko ldko closed this Jan 27, 2017
@sebastian-nagel

Collaborator Author

Correct, that's now included in #63. Thanks!

@sebastian-nagel sebastian-nagel deleted the CssExtractStrOutOfBound branch October 18, 2024 14:09