forked from apache/nutch
Pull requests: commoncrawl/nutch
Integrate Apache Nutch upstream improvements
#55
by sebastian-nagel
was merged May 18, 2026
SitemapInjector: extract and inject localized links
#51
by sebastian-nagel
was merged Apr 7, 2026
Merge cherry-picked commits from upstream 1.22
#46
by sebastian-nagel
was merged Feb 27, 2026
UrlSamplerHost: make stripping of leading www. from host name configurable
#38
by sebastian-nagel
was merged Feb 8, 2026
AdaptiveScoringFilter: Delay revisits of non-canonical pages
#37
by sebastian-nagel
was merged Jan 10, 2026
docs: Add comment about disabled plugins. Fix broken Apache Nutch link.
#35
by handecelikkanat
was merged Jul 30, 2025
WarcCdxWriter: normalize URL of redirect target location
#34
by sebastian-nagel
was merged Mar 15, 2025
Add Github workflow to build the branch 'cc'
#31
by sebastian-nagel
was merged Nov 21, 2024
WARC writer (CDX writer): new optional CDX JSON fields "redirect" and "truncated"
#15
by sebastian-nagel
was merged Nov 12, 2019