
urls with spaces unescaped #58

@ghost

Description

With a badly configured redirect, it's possible to end up with a URL that contains unescaped spaces, e.g.:

uk,nhs,wales)/sites3/docopen.cfm?637545f2-1143-e756-5c8403609089cb40&id=18400&orgid=268 20090729144455 http://www.wales.nhs.uk/sites3/docopen.cfm?orgid=268&ID=18400&637545F2-1143-E756-5C8403609089CB40 text/html 302 3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ C:\Documents and Settings\All Users\Documents\My Pictures\Sample Pictures\CLHG building.JPG - 349 21033214 EA-TNA0709.www.nhs.uk-20090729135204-01475.arc.gz

Clearly this has gone completely wrong and the underlying record is unusable. However, the fault in this one record also prevents the whole CDX file from being parsed: CDX fields are space-delimited, so each space in the redirect splits it into extra fields and shifts every column after it. In this situation, the CDX generation code would be better off checking for and escaping spaces in the redirect URL, while emitting a warning that the record is broken.
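A minimal sketch of that suggestion in Python, assuming the common 11-field CDX layout (`N b a m s k r M S V g`) with the redirect in the seventh field, as in the record above. The names `escape_redirect` and `build_cdx_line` are hypothetical and not taken from any existing CDX indexer. Whitespace in the redirect is percent-encoded with `urllib.parse.quote`, and a warning is logged through the standard `logging` module.

```python
from __future__ import annotations

import logging
import re
from typing import Sequence
from urllib.parse import quote

logger = logging.getLogger("cdx")

# Assumed 11-field layout "N b a m s k r M S V g"; 'r' (redirect) is index 6.
CDX_FIELD_COUNT = 11
REDIRECT_INDEX = 6

_WHITESPACE = re.compile(r"\s")


def escape_redirect(redirect: str, record_id: str = "-") -> str:
    """Hypothetical helper: percent-encode any whitespace in a redirect URL.

    Logs a warning when escaping was needed, since such a redirect almost
    certainly belongs to a broken record.
    """
    if not _WHITESPACE.search(redirect):
        return redirect
    logger.warning(
        "record %s: redirect contains unescaped whitespace, escaping: %r",
        record_id,
        redirect,
    )
    # quote() UTF-8 encodes each matched character, e.g. " " -> "%20".
    return _WHITESPACE.sub(lambda m: quote(m.group(0)), redirect)


def build_cdx_line(fields: Sequence[str], record_id: str = "-") -> str:
    """Hypothetical helper: join CDX fields into one space-delimited line."""
    if len(fields) != CDX_FIELD_COUNT:
        raise ValueError(f"expected {CDX_FIELD_COUNT} fields, got {len(fields)}")
    out = list(fields)
    out[REDIRECT_INDEX] = escape_redirect(out[REDIRECT_INDEX], record_id)
    # Whitespace in any other field would still shift the columns; refuse to emit it.
    for i, value in enumerate(out):
        if _WHITESPACE.search(value):
            raise ValueError(
                f"record {record_id}: field {i} contains whitespace: {value!r}"
            )
    return " ".join(out)
```

With the record above, the redirect becomes `C:\Documents%20and%20Settings\...\CLHG%20building.JPG`, so the line keeps exactly 11 fields and a reader that splits on spaces can still parse the rest of the file. The redirect itself is still garbage, which is why the warning is logged rather than silently swallowed. Whitespace in any other field is treated as an error instead of being escaped, since the issue only covers the redirect case.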
