Escape redirect URLs in RealCDXExtractorOutput #36

Merged
anjackson merged 2 commits into iipc:master from gerhardgossen:master on Dec 17, 2014

Conversation

@gerhardgossen

Contributor

The class does not escape the URLs it gets from the HTTP headers or the HTML meta tags. This makes the resulting CDX files invalid if the redirect URL contains spaces (see e.g. internetarchive/ia-hadoop-tools#4). This commit fixes that by passing the resolved URL through java.net.URI's multi-argument constructor, which escapes the individual parts appropriately.
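
For illustration only, here is a minimal sketch of the technique described above. It is not the code from this commit: the class name `RedirectUrlEscaper`, the method `escape`, and the sample URL are hypothetical, and the sketch assumes the redirect target has already been resolved to an absolute URL. It splits the URL into components with `java.net.URL`, which tolerates spaces, and rebuilds it with the seven-argument `java.net.URI` constructor, which quotes illegal characters in each part.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper, not the class changed in this pull request.
final class RedirectUrlEscaper {

    // Splits an absolute URL into its parts and rebuilds it through the
    // multi-argument URI constructor, which quotes illegal characters
    // (such as spaces) separately in each component.
    static String escape(String resolvedUrl) {
        try {
            // java.net.URL does not reject spaces, so it can be used to split the string.
            URL url = new URL(resolvedUrl);
            URI uri = new URI(
                    url.getProtocol(),
                    url.getUserInfo(),
                    url.getHost(),
                    url.getPort(),   // -1 means "no explicit port", which URI accepts
                    url.getPath(),
                    url.getQuery(),
                    url.getRef());
            // toASCIIString() additionally percent-encodes non-ASCII characters.
            return uri.toASCIIString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot escape URL: " + resolvedUrl, e);
        }
    }

    public static void main(String[] args) {
        // prints http://example.com/some%20page.html?q=a%20b
        System.out.println(escape("http://example.com/some page.html?q=a b"));
    }
}
```

One caveat with this approach: the multi-argument `URI` constructors always quote the percent character, so a redirect URL that is already escaped (for example one containing `%20`) would be escaped a second time (`%2520`). Whether that matters depends on how the redirect URLs arrive from the headers and meta tags.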

@anjackson

Member

This looks good. Can you also add a note to the CHANGES.md file that summarises the change?

@gerhardgossen

Contributor Author

Updated CHANGES.md

anjackson added a commit that referenced this pull request Dec 17, 2014
Escape redirect URLs in RealCDXExtractorOutput
anjackson merged commit 598c524 into iipc:master Dec 17, 2014
@anjackson

Member

Thanks, looks great.

2 participants