Commit 5ca3851

Merge pull request creativecommons#145 from creativecommons/distributed_indexer
Hotlink CC Search
2 parents a2272fc + 5193e63 commit 5ca3851

1 file changed: +1 -1 lines changed

content/blog/entries/building-distributed-indexer/contents.lr

@@ -12,7 +12,7 @@ pub_date: 2019-12-11
 ---
 body:
 
-With CC Search, we want to make it possible to search all of the estimated 1.6 billion Creative Commons works on the internet. In order to make it possible for thousands of people to search billions of records in a reasonable period of time, we have to build a big inverted index (a data structure similar to the index in the back of a textbook), which allows very fast lookups of documents related to the user’s search query. To populate this index, we have to build a large database of Creative Commons works and then replicate it to our search index, which is powered by Elasticsearch.
+With [CC Search](https://search.creativecommons.org), we want to make it possible to search all of the estimated 1.6 billion Creative Commons works on the internet. In order to make it possible for thousands of people to search billions of records in a reasonable period of time, we have to build a big inverted index (a data structure similar to the index in the back of a textbook), which allows very fast lookups of documents related to the user’s search query. To populate this index, we have to build a large database of Creative Commons works and then replicate it to our search index, which is powered by Elasticsearch.
 
 It turns out that, once your search index contains more than just a few million documents, maintaining the index is a non-trivial problem. Some of the concerns we had for our implementation:
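
For readers unfamiliar with the inverted index mentioned in the changed paragraph, here is a minimal Python sketch of the idea. It is not how CC Search or Elasticsearch implements indexing; the function names `build_inverted_index` and `search`, the whitespace tokenizer, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive whitespace tokenization; real search engines do far more.
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every token in the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    # Copy the first posting set so intersections do not mutate the index.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample records, not real CC Search data.
docs = {
    1: "red fox photo",
    2: "red panda illustration",
    3: "arctic fox photo",
}
index = build_inverted_index(docs)
```

Like the index in the back of a textbook, a lookup goes from a term to the places it appears, so `search(index, "fox photo")` only intersects two small sets instead of scanning every record.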
