Improve host-level PageRanks #52

@sylvinus

As explained in our blog post, our host-level PageRank is highly experimental and still very vulnerable to spam.

Here is a list of our current ideas to improve it, feel free to contribute yours!

  • Don't follow rel=nofollow links
  • Better weights on the edges (treat links between subdomains differently? give less weight to links in the boilerplate and/or at the end of the page? give more weight depending on the number of distinct pages linking to the domain?)
  • Try to group domains belonging to the same owner (By IP address/DNS info? See Import DNS metadata #15)
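
To make the first two ideas concrete, here is a minimal sketch of a weighted host-level PageRank in plain Python. It is not the project's actual implementation: the input tuple format `(src_host, src_page, dst_host, nofollow)`, the function names `build_host_graph` and `weighted_pagerank`, and the choice to weight an edge by the number of distinct linking pages are all assumptions made for illustration. Links between subdomains are not treated specially here; only links within the exact same host are dropped.

```python
from collections import defaultdict


def build_host_graph(links):
    """Build weighted host-to-host edges from page-level links.

    `links` is an iterable of (src_host, src_page, dst_host, nofollow)
    tuples. This format is hypothetical, chosen for the example.
    """
    linking_pages = defaultdict(set)
    for src_host, src_page, dst_host, nofollow in links:
        # Idea 1: don't follow rel=nofollow links. Self-links are also skipped.
        if nofollow or src_host == dst_host:
            continue
        linking_pages[(src_host, dst_host)].add(src_page)

    # Idea 2 (one possible weighting): edge weight is the number of
    # distinct pages on the source host that link to the target host.
    weights = defaultdict(dict)
    for (src_host, dst_host), pages in linking_pages.items():
        weights[src_host][dst_host] = float(len(pages))
    return weights


def weighted_pagerank(weights, damping=0.85, iterations=50):
    """Power iteration where each host splits its rank by edge weight."""
    hosts = set(weights)
    for targets in weights.values():
        hosts.update(targets)
    n = len(hosts)
    if n == 0:
        return {}

    rank = {host: 1.0 / n for host in hosts}
    for _ in range(iterations):
        # Hosts with no outgoing edges spread their rank evenly.
        dangling = sum(rank[h] for h in hosts if not weights.get(h))
        new_rank = {h: (1.0 - damping) / n + damping * dangling / n for h in hosts}
        for src, targets in weights.items():
            total = sum(targets.values())
            for dst, weight in targets.items():
                new_rank[dst] += damping * rank[src] * weight / total
        rank = new_rank
    return rank


if __name__ == "__main__":
    example_links = [
        ("a.com", "/1", "b.com", False),
        ("a.com", "/2", "b.com", False),
        ("a.com", "/1", "c.com", False),
        ("a.com", "/3", "spam.com", True),  # nofollow: ignored
        ("b.com", "/", "a.com", False),
    ]
    ranks = weighted_pagerank(build_host_graph(example_links))
    for host, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
        print(host, round(score, 4))
```

Grouping domains by owner (the third idea) could be layered on top by mapping each host to a group key before building the graph, so that links inside a group count as self-links and are dropped.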

Going to URL-level PageRanks would obviously help a lot, but it is out of scope for this issue.
