<!doctype html>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link rel="shortcut icon" type="image/x-icon" href="/static/favicon.ico">
<link href="https://stackpath.bootstrapcdn.com/bootstrap/4.2.1/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-GJzZqFGwb1QTTN6wy59ffF1BuGJpLSa9DkKMp0DgiMDm4iYMj70gZWKYbI706tWS" crossorigin="anonymous">
<link href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:400,700" rel="stylesheet">
<link rel="stylesheet" href="/static/gen/style.css">
<link rel="stylesheet" href="/static/pygments.css">
<meta property="og:site_name" content="Creative Commons" />
<meta property="og:title" content="CC Search" />
<meta property="og:url" content="/cc-search/" />
<meta property="og:type" content="article" />
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:title" content="CC Search">
<meta name="twitter:description" content="">
<meta name="twitter:site" content="@creativecommons">
<meta name="twitter:creator" content="@creativecommons">
<script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.6/umd/popper.min.js" integrity="sha384-wHAiFfRlMFy6i5SRaxvfOCifBUQy1xHdJ/yoi7FRNXMRBu5WHdZYu1hA6ZOblgut" crossorigin="anonymous"></script>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.2.1/js/bootstrap.min.js" integrity="sha384-B0UglyR+jN6CkvvICOB2joaf5I4l3gm9GU6Hc1og6Ls7i6U/mkkaduKaBhlAXv9k" crossorigin="anonymous"></script>
<script type="text/javascript" src="/static/gen/script.js"></script>
<title>CC Search — Creative Commons on GitHub</title>
<body>
<div class="ga-script">
<div id="ga-script"></div>
<script type="text/javascript">
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-2010376-37', 'auto');
ga('send', 'pageview');
</script>
</div>
<header class="main-header">
<div class="container-fluid">
<div class="row justify-content-md-center">
<div class="col-lg-9 col-md-9 col-sm-12">
<nav class="navbar navbar-expand-xl navbar-dark" name="top">
<a class="navbar-brand" href="/">
<img src="/cclogo.svg">
<span class="legend">Creative Commons Open Source</span>
</a>
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarSupportedContent"
aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="navbarSupportedContent">
<ul class="navbar-nav">
<li class="nav-item">
<a class="nav-link" href="/blog/">Blog</a>
</li>
<li class="nav-item dropdown ">
<a class="nav-link" href="#" id="navbarContributingCodeDropdown" role="button">Contribute</a>
<div class="dropdown-menu" aria-labelledby="navbarContributingCodeDropdown">
<a class="dropdown-item" href="/contributing-code/">Contribution Guidelines</a>
<a class="dropdown-item" href="/contributing-code/projects/">Project List</a>
<a class="dropdown-item" href="/contributing-code/pr-guidelines/">Pull Request Guidelines</a>
<a class="dropdown-item" href="/contributing-code/github-repo-guidelines/">GitHub Repo Guidelines</a>
<a class="dropdown-item" href="/contributing-code/cc-search/">CC Search</a>
<a class="dropdown-item" href="/contributing-code/usability/">Usability</a>
</div>
</li>
<li class="nav-item dropdown ">
<a class="nav-link" href="#" id="navbarCommunityDropdown" role="button">Community</a>
<div class="dropdown-menu" aria-labelledby="navbarCommunityDropdown">
<a class="dropdown-item" href="/community/">Join the Community</a>
<a class="dropdown-item" href="/community/community-team/">Community Team</a>
<a class="dropdown-item" href="/community/community-team/members/">Community Team Members</a>
<a class="dropdown-item" href="/community/community-team/project-roles/">Project Roles</a>
<a class="dropdown-item" href="/community/community-team/community-building-roles/">Community Building Roles</a>
<a class="dropdown-item" href="/community/write-a-blog-post/">Write a Blog Post</a>
<a class="dropdown-item" href="/community/code-of-conduct/">Code of Conduct</a>
<a class="dropdown-item" href="/community/code-of-conduct/enforcement/">Code of Conduct Enforcement</a>
</div>
</li>
<li class="nav-item dropdown ">
<a class="nav-link" href="#" id="navbarInternshipsDropdown" role="button">Internships</a>
<div class="dropdown-menu" aria-labelledby="navbarInternshipsDropdown">
<a class="dropdown-item" href="/internships/">Overview</a>
<a class="dropdown-item" href="/internships/project-ideas/">Project Ideas</a>
<a class="dropdown-item" href="/internships/applicant-guide/">Applicant Guide</a>
<a class="dropdown-item" href="/internships/intern-guide/">Intern Guide</a>
<a class="dropdown-item" href="/internships/mentor-guide/">Mentor Guide</a>
<a class="dropdown-item" href="/internships/history/">History</a>
</div>
</li>
<li class="nav-item active">
<a class="nav-link" href="/cc-search/">CC Search</a>
</li>
<li class="nav-item dropdown ">
<a class="nav-link" href="#" id="navbarArchivesDropdown" role="button">Archives</a>
<div class="dropdown-menu" aria-labelledby="navbarArchivesDropdown">
<a class="dropdown-item" href="/archives/old-tech-blog">CC Tech Blog (2007-2014)</a>
<a class="dropdown-item" href="https://lists.ibiblio.org/pipermail/cc-devel/">cc-devel mailing list (2005-2015)</a>
</div>
</li>
<li class="nav-item">
<a class="nav-link" href="https://creativecommons.org" target="_blank">creativecommons.org</a>
</li>
</ul>
</nav>
</div>
<a href="https://github.com/creativecommons/creativecommons.github.io-source" target="_blank" class="github-corner" aria-label="View source on GitHub"><svg width="80" height="80" viewBox="0 0 250 250" style="fill:#151513; color:#fff; position: absolute; top: 0; border: 0; left: 0; transform: scale(-1, 1);" aria-hidden="true"><path d="M0,0 L115,115 L130,115 L142,142 L250,250 L250,0 Z"></path><path d="M128.3,109.0 C113.8,99.7 119.0,89.6 119.0,89.6 C122.0,82.7 120.5,78.6 120.5,78.6 C119.2,72.0 123.4,76.3 123.4,76.3 C127.3,80.9 125.5,87.3 125.5,87.3 C122.9,97.6 130.6,101.9 134.4,103.2" fill="currentColor" style="transform-origin: 130px 106px;" class="octo-arm"></path><path d="M115.0,115.0 C114.9,115.1 118.7,116.5 119.8,115.4 L133.7,101.6 C136.9,99.2 139.9,98.4 142.2,98.6 C133.8,88.0 127.5,74.4 143.8,58.0 C148.5,53.4 154.0,51.2 159.7,51.0 C160.3,49.4 163.2,43.6 171.4,40.1 C171.4,40.1 176.1,42.5 178.8,56.2 C183.1,58.6 187.2,61.8 190.9,65.4 C194.5,69.0 197.7,73.2 200.1,77.6 C213.8,80.2 216.3,84.9 216.3,84.9 C212.7,93.1 206.9,96.0 205.4,96.6 C205.1,102.4 203.0,107.8 198.3,112.5 C181.9,128.9 168.3,122.5 157.7,114.1 C157.9,116.9 156.7,120.9 152.7,124.9 L141.0,136.5 C139.8,137.7 141.6,141.9 141.8,141.8 Z" fill="currentColor" class="octo-body"></path></svg>
</a>
</div>
</div>
</header>
<div class="container-fluid page-content">
<div class="row justify-content-md-center">
<div class="col-lg-9 col-md-9 col-sm-12 content-wrap">
<div class="page py-3">
<h1 class="page-title pb-3 mb-4">CC Search</h1>
<p>The largest open source product in CC’s portfolio is CC Search: a search engine for CC-licensed and public domain creative works. The product consists of the following components:</p>
<ul>
<li><b>CC Catalog</b>, which ingests CC-licensed and public domain works. [<a href="https://github.com/creativecommons/cccatalog">GitHub Repo</a>]</li>
<li><b>CC Catalog API</b>, our open API, which CC Search uses to make the Catalog discoverable. [<a href="https://github.com/creativecommons/cccatalog-api">GitHub Repo</a>][<a href="https://api.creativecommons.engineering/v1/">Developer Documentation</a>]</li>
<li><b>CC Search</b>, our user-facing search engine. [<a href="https://github.com/creativecommons/cccatalog-frontend">GitHub Repo</a>][<a href="https://search.creativecommons.org">Website</a>]</li>
</ul>
<p>CC Catalog ingests and processes CC-licensed and public domain works, then makes the resulting data available to CC Catalog API. The API is publicly accessible and serves the Catalog’s data to CC Search.</p>
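<p>Because CC Catalog API is public, any client can query the Catalog the same way CC Search does. The Python sketch below builds and sends a search request using only the standard library. It assumes an image search endpoint at <code>/v1/images</code> that takes a <code>q</code> query parameter and returns JSON; that path, the <code>license</code> filter name in the usage line, and the helper names <code>build_search_url</code> and <code>search_images</code> are assumptions for illustration, so check the Developer Documentation before relying on them.</p>

```python
"""Hypothetical client sketch for CC Catalog API.

Assumptions (not confirmed by this page): an image search endpoint exists at
/v1/images, accepts a `q` query parameter plus optional filters, and returns JSON.
"""
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the "Developer Documentation" link on this page.
API_BASE = "https://api.creativecommons.engineering/v1"


def build_search_url(query: str, **filters: str) -> str:
    """Build a search URL for the assumed /v1/images endpoint.

    Extra keyword arguments are passed through as query parameters;
    their names are assumptions, not a documented contract.
    """
    params = {"q": query, **filters}
    return f"{API_BASE}/images?{urlencode(params)}"


def search_images(query: str, **filters: str) -> dict:
    """Fetch one page of results and decode the JSON body (needs network access)."""
    with urlopen(build_search_url(query, **filters), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only builds the URL; no request is sent here.
    print(build_search_url("mountain lake", license="by"))
```

<p>Keeping URL construction separate from the network call makes the request shape easy to inspect without contacting the API.</p>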
<h3>What Are We Up To?</h3>
<p>Here are the highlights of our roadmap for the current and upcoming quarters:</p>
<div>
<h4>Q3 2020</h4>
<table class="table table-striped">
<thead class="thead-dark">
<tr>
<th scope="col">Task Name</th>
<th scope="col">Task Description</th>
</tr>
</thead>
<tbody>
<tr>
<th scope="row">Improve Search Algorithm with Popularity Data Integration</th>
<td scope="row">Change the search algorithm to incorporate image popularity data gathered from sources that provide it.</td>
</tr>
<tr>
<th scope="row">Move data cleaning pipeline from API to Catalog</th>
<td scope="row">Move our data cleaning code from the API’s ingestion step to the Catalog’s initial data processing step, so the same data is no longer cleaned repeatedly.</td>
</tr>
<tr>
<th scope="row">Implement schema architecture for new metadata [AWS Grant]</th>
<td scope="row">Update Catalog schema to include new metadata generated through AWS Rekognition.</td>
</tr>
<tr>
<th scope="row">Plan search algorithm changes for new metadata [AWS Grant]</th>
<td scope="row">Plan out search algorithm changes to incorporate image metadata generated via AWS Rekognition.</td>
</tr>
<tr>
<th scope="row">License Explanation/Compliance Improvements</th>
<td scope="row">Improve how and where we explain licenses, and consider ways to make it easier for reusers to understand and comply with license requirements.</td>
</tr>
<tr>
<th scope="row">Take old CC Search offline</th>
<td scope="row">Take Old Search (oldsearch.creativecommons.org) offline and redirect its traffic to CC Search. Before doing so, add messaging about the change to Old Search and support similar functionality on CC Search. See "Meta Search Integration" for related work.</td>
</tr>
<tr>
<th scope="row">Web Monetization: Phase 1</th>
<td scope="row">Research and test potential integrations for Web Monetization into CC Search and other CC web properties.</td>
</tr>
<tr>
<th scope="row">Improved Support Pages</th>
<td scope="row">Improve the support pages on CC Search, including the Collections page: add explanatory text for collections and improve the page flow.</td>
</tr>
<tr>
<th scope="row">Accessibility Improvements</th>
<td scope="row">Make accessibility improvements to the UI.</td>
</tr>
<tr>
<th scope="row">Internationalization Infrastructure</th>
<td scope="row">Build the infrastructure needed for internationalization so that CC Search can be offered in other languages.</td>
</tr>
<tr>
<th scope="row">Improve Common Crawl Infrastructure</th>
<td scope="row">Update our Common Crawl provider infrastructure to:
<ol>
<li>use Apache Airflow instead of AWS tools such as Data Pipeline and Glue for processing data, and</li>
<li>unify provider processing so it uses the same base classes as API providers.</li>
</ol></td>
</tr>
<tr>
<th scope="row">Design Sprint: Audio UI for CC Search</th>
<td scope="row">Design and prototype an upcoming user interface for searching for audio on CC Search.</td>
</tr>
<tr>
<th scope="row">Audio Support and Integration</th>
<td scope="row">Design audio UIs and test them with users. Ingest a pilot collection of audio into the Catalog and build audio support in the API. Integrate the design into the frontend so users can search for CC-licensed audio.</td>
</tr>
<tr>
<th scope="row">Scraping & Resizing Work for Rekognition [AWS Grant]</th>
<td scope="row">Store a private copy of all the images in the CC Catalog to analyze via machine learning.</td>
</tr>
<tr>
<th scope="row">Run Rekognition on 100M images [AWS Grant]</th>
<td scope="row">Generate metadata via machine learning (using AWS Rekognition) on a set of ~100 million high quality images from the CC Catalog.</td>
</tr>
<tr>
<th scope="row">Switch from Common Crawl to API</th>
<td scope="row">Wherever a provider offers an API, use it to ingest data into the CC Catalog instead of scraping websites found through Common Crawl data.</td>
</tr>
</tbody>
</table>
<h4>Q4 2020</h4>
<table class="table table-striped">
<thead class="thead-dark">
<tr>
<th scope="col">Task Name</th>
<th scope="col">Task Description</th>
</tr>
</thead>
<tbody>
<tr>
<th scope="row">Search Relevance Improvements: Language Analysis, Quality Metrics, Minimums</th>
<td scope="row">Improve search relevance using language analysis, quality metrics, and minimums.</td>
</tr>
<tr>
<th scope="row">Plan UI Updates in Response to Metadata [AWS Grant]</th>
<td scope="row">Design updates to the CC Search UI that make use of the new metadata generated by applying machine learning to selected images in the Catalog. At a minimum, we expect to offer new filters. The designs will be integrated into the UI in a later task.</td>
</tr>
<tr>
<th scope="row">Provider Review Automation</th>
<td scope="row">Automate the process of finding new providers of CC-licensed content to index into the CC Catalog.</td>
</tr>
<tr>
<th scope="row">Usage/Reuse Metrics Dashboard</th>
<td scope="row">Build an analytics UI that is fed by Google Analytics and our internal analytics database.</td>
</tr>
<tr>
<th scope="row">Scrape all images and set up a feed for new ones</th>
<td scope="row">Once the Rekognition crawl finishes, crawl the rest of the Catalog without sending those images to Rekognition. This will give us useful metadata such as image dimensions and quality.</td>
</tr>
<tr>
<th scope="row">Improve Documentation for Community Contributors</th>
<td scope="row">Create better documentation for community contributors by consolidating internal and public documentation and making it available for everyone.</td>
</tr>
<tr>
<th scope="row">Improve Catalog Deployment and Provisioning</th>
<td scope="row">Manage Catalog deployment and provisioning entirely through infrastructure as code.</td>
</tr>
<tr>
<th scope="row">API documentation improvements</th>
<td scope="row">Make the CC Catalog API documentation more accessible to CC Search users and improve its user experience.</td>
</tr>
<tr>
<th scope="row">CC Search HTML Embed</th>
<td scope="row">Design and build an embeddable version of CC Search that can be placed on any website as a starting point for discovering works in CC Search. The embed must use components from the Design Library and aim for simplicity.</td>
</tr>
<tr>
<th scope="row">Plan use of ccREL for easily adding content to CC Catalog</th>
<td scope="row">Plan how to scrape ccREL metadata from the web in order to index new content into the CC Catalog.</td>
</tr>
<tr>
<th scope="row">User Persona Redevelopment</th>
<td scope="row">Update CC Search user personas based on user research during 2020.</td>
</tr>
<tr>
<th scope="row">Support multiple languages in CC Search</th>
<td scope="row">Design and implement seamless support for multiple languages in CC Search as content in those languages becomes available. This work follows the Internationalization Infrastructure task.</td>
</tr>
<tr>
<th scope="row">Implement UI Updates for new Metadata [AWS Grant]</th>
<td scope="row">Implement design updates to the CC Search UI. The designs respond to new metadata produced by applying machine learning to selected images in the Catalog. At a minimum, we expect to roll out new filters.</td>
</tr>
<tr>
<th scope="row">Implement Search Algorithm Changes [AWS Grant]</th>
<td scope="row">Update our search algorithm to use metadata gathered using machine learning analysis (using AWS Rekognition).</td>
</tr>
<tr>
<th scope="row">Ensure Infrastructure Code is Open Source [AWS Grant]</th>
<td scope="row">Publicly release the infrastructure code that powers the CC Catalog, CC Catalog API, and CC Search projects.</td>
</tr>
<tr>
<th scope="row">Enrich CC Catalog data with data from Common Crawl</th>
<td scope="row">Enrich CC Catalog data with data found in the wild using Common Crawl, for example to track where CC-licensed content is reused.</td>
</tr>
</tbody>
</table>
</div>
<p>To see what else has been suggested for CC Search, review our <a href="https://docs.google.com/document/d/1qAZu1_ZltfdVylH6WkWcp5_mBAPa5ilflHWUc7zbIgk/edit#">Pipeline of Future Ideas</a>.</p>
<h3>How Can I Help?</h3>
<h4>Contribute Code</h4>
<p>To contribute code, take the following steps:</p>
<ol>
<li>
Review the following:
<ul>
<li>Our Contribution Guidelines: <a href="/contributing-code/">https://opensource.creativecommons.org/contributing-code/</a></li>
<li>CC Search's Contribution Guidelines: <a href="/contributing-code/cc-search/">https://opensource.creativecommons.org/contributing-code/cc-search/</a></li>
</ul>
</li>
<li>Choose the project that works best for you.</li>
<li>Read through <a href="/contributing-code/#finding-an-issue">how to find open issues</a>.</li>
<li>Start contributing!</li>
</ol>
<b>We track our work in three GitHub projects:</b>
<dl>
<dt><a href="https://github.com/orgs/creativecommons/projects/7">CC Search Active Sprint</a></dt>
<dd>• The best place to start! If an issue here isn’t in progress yet and is marked for community contribution, it’s one of our highest priorities.</dd>
<dt><a href="https://github.com/orgs/creativecommons/projects/10">CC Search Backlog</a></dt>
<dd>• The “Next Sprint” column contains our second-highest-priority items.</dd>
<dd>• The column for the current quarter (Q1, Q2, Q3, or Q4) shows what we plan to work on up to three months out.</dd>
<dd>• Check out the “Any Time/Community” column for fun tickets that aren’t a high priority for CC staff but would be great to have built.</dd>
<dt><a href="https://github.com/orgs/creativecommons/projects/12">CC Catalog Pipeline</a></dt>
<dd>• There are two columns with “Ready for Work” tickets. If they’re not blocked or marked as CC Staff only, we welcome your contribution.</dd>
</dl>
<h4>Contribute Design</h4>
<p>There are two ways you can show your interest in contributing to the design of CC Search:</p>
<ol>
<li>Follow <a href="https://opensource.creativecommons.org/contributing-code/cc-search/">the steps for suggesting a new feature for CC Search</a>.</li>
<li>Join the #cc-search channel in the <a href="https://slack-signup.creativecommons.org/">Creative Commons Slack</a> and start a conversation about your design ideas.</li>
</ol>
<p>Before you start work on any design project, get familiar with <a href="https://www.figma.com/file/l4Mt3dn3Ndtrvrb4aLcwXI/Design-Library?node-id=1433%3A0">our design library in Figma</a>. All CC Search designs use this design library.</p>
<h4>Participate in Usability Tests and User Interviews</h4>
<p>When we roll out a specific feature, we run usability tests on the proposed experience.</p>
<p>We also hold ongoing user interviews to learn how people feel about the product as it stands and to explore areas we’re considering expanding into.</p>
<p>If you’re interested, we invite you to sign up for a time <a href="https://calendly.com/cc-product-design/usability-test-30">via this Calendly link</a>.</p>
<p>Visit our <a href="/contributing-code/usability/">Usability page</a> for more details on how to participate.</p>
<a id="back-to-top" href="#top" class="btn btn-dark btn-sm" role="button">Back to top</a>
</div>
</div>
</div>
</div>
<footer class="main-footer bg-dark">
<div class="container-fluid">
<div class="row justify-content-md-center">
<div class="col-lg-9 col-md-9 col-sm-12 footer text-light py-4 px-3">
<small>
<p><a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License"
style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a></p>
<p class="text-muted">All the content on this website is licensed under a <strong><a rel="license"
href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International
License</a></strong> unless otherwise specified.</p>
</small>
</div>
</div>
</div>
</footer>
</body>