Commit 5193e63

Hotlink CC Search
1 parent e93fffd commit 5193e63

1 file changed, +1 -1 lines changed
  • content/blog/entries/building-distributed-indexer

content/blog/entries/building-distributed-indexer/contents.lr

@@ -12,7 +12,7 @@ pub_date: 2019-12-11
 ---
 body:
 
-With CC Search, we want to make it possible to search all of the estimated 1.6 billion Creative Commons works on the internet. In order to make it possible for thousands of people to search billions of records in a reasonable period of time, we have to build a big inverted index (a data structure similar to the index in the back of a textbook), which allows very fast lookups of documents related to the user’s search query. To populate this index, we have to build a large database of Creative Commons works and then replicate it to our search index, which is powered by Elasticsearch.
+With [CC Search](https://search.creativecommons.org), we want to make it possible to search all of the estimated 1.6 billion Creative Commons works on the internet. In order to make it possible for thousands of people to search billions of records in a reasonable period of time, we have to build a big inverted index (a data structure similar to the index in the back of a textbook), which allows very fast lookups of documents related to the user’s search query. To populate this index, we have to build a large database of Creative Commons works and then replicate it to our search index, which is powered by Elasticsearch.
 
 It turns out that, once your search index contains more than just a few million documents, maintaining the index is a non-trivial problem. Some of the concerns we had for our implementation:
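
The changed paragraph describes an inverted index as a structure similar to the index in the back of a textbook. A minimal Python sketch of that idea follows. It is only an illustration and not how CC Search or Elasticsearch implement it: the function names `build_inverted_index` and `search`, the whitespace tokenizer, and the sample records are all hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive whitespace tokenization; a real search engine does far more.
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token in the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    # Start from the first token's postings, then intersect with the rest.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample records, not real CC Search data.
docs = {
    1: "sunset over the ocean",
    2: "ocean waves at night",
    3: "city skyline at sunset",
}
index = build_inverted_index(docs)
```

A lookup such as `search(index, "sunset ocean")` reads one entry per query token instead of scanning every document, which is what makes the lookup fast.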
