cc-search/index.html
+42 -62
@@ -180,43 +180,33 @@ <h4>Q3 2020</h4>
180
180
</tr>
181
181
182
182
<tr>
183
-
<th scope="row">Improve Catalog Deployment and Provisioning</th>
184
-
<td scope="row">Manage Catalog deployment and provisioning entirely through infrastructure as code.</td>
185
-
</tr>
186
-
187
-
<tr>
188
-
<th scope="row">Improve Documentation for Community Contributors</th>
189
-
<td scope="row">Create better documentation for community contributors by consolidating internal and public documentation and making it available for everyone.</td>
183
+
<th scope="row">Implement architecture for schema for new metadata [AWS Grant]</th>
184
+
<td scope="row">Update Catalog schema to include new metadata generated through AWS Rekognition.</td>
190
185
</tr>
191
186
192
187
<tr>
193
188
<th scope="row">Plan search algorithm changes for new metadata [AWS Grant]</th>
194
189
<td scope="row">Plan out search algorithm changes to incorporate image metadata generated via AWS Rekognition.</td>
195
190
</tr>
196
191
197
-
<tr>
198
-
<th scope="row">Implement architecture for schema for new metadata [AWS Grant]</th>
199
-
<td scope="row">Update Catalog schema to include new metadata generated through AWS Rekognition.</td>
<td scope="row">Improve how and where we explain licenses, and consider ways to make it easier for reusers to understand and comply with license requirements.</td>
205
195
</tr>
206
196
207
197
<tr>
208
-
<th scope="row">Improved Support Pages</th>
209
-
<td scope="row">Improve the support pages on CC Search, which includes the Collections page, for a better experience. Add explanation text for collections, improve flow.</td>
198
+
<th scope="row">Offline old CC Search</th>
199
+
<td scope="row">Offline Old Search (oldsearch.creativecommons.org) and redirect traffic to CC Search. Prior to this, build in messaging on Old Search, and support similar functionality on CC Search. See "Meta Search Integration" for related work.</td>
210
200
</tr>
211
201
212
202
<tr>
213
-
<th scope="row">Design Sprint: Meta Search Integration</th>
214
-
<td scope="row">Integrating meta search functionality into CC Search for sources that are not currently indexed, and content types we do not currently support.</td>
203
+
<th scope="row">Web Monetization: Phase 1</th>
204
+
<td scope="row">Research and test potential integrations for Web Monetization into CC Search and other CC web properties.</td>
215
205
</tr>
216
206
217
207
<tr>
218
-
<th scope="row">Offline old CC Search</th>
219
-
<td scope="row">Offline Old Search (oldsearch.creativecommons.org) and redirect traffic to CC Search. Prior to this, build in messaging on Old Search, and support similar functionality on CC Search. See "Meta Search Integration" for related work.</td>
208
+
<th scope="row">Improved Support Pages</th>
209
+
<td scope="row">Improve the support pages on CC Search, which includes the Collections page, for a better experience. Add explanation text for collections, improve flow.</td>
220
210
</tr>
221
211
222
212
<tr>
@@ -229,16 +219,6 @@ <h4>Q3 2020</h4>
229
219
<td scope="row">Build infrastructure necessary for internationalization, to allow CC Search to be accessible in other languages.</td>
230
220
</tr>
231
221
232
-
<tr>
233
-
<th scope="row">Design Sprint: Audio UI for CC Search</th>
234
-
<td scope="row">Designing and prototyping an upcoming user interface for searching for audio on CC Search.</td>
235
-
</tr>
236
-
237
-
<tr>
238
-
<th scope="row">Audio Support and Integration</th>
239
-
<td scope="row">Design and user test UIs for audio. Ingest a pilot collection of audio to the Catalog, build support in the API. Integrate design to frontend to allow users to search for CC licensed audio.</td>
240
-
</tr>
241
-
242
222
<tr>
243
223
<th scope="row">Improve Common Crawl Infrastructure</th>
244
224
<td scope="row">Update our Common Crawl provider infrastructure to:
@@ -247,43 +227,51 @@ <h4>Q3 2020</h4>
247
227
</tr>
248
228
249
229
<tr>
250
-
<th scope="row">Use Data Dumps for Wikimedia Ingestion</th>
251
-
<td scope="row">Switch our Catalog data ingestion for Wikimedia Commons to use the data dumps provided by Wikimedia instead of the MediaWiki API.</td>
230
+
<th scope="row">Design Sprint: Audio UI for CC Search</th>
231
+
<td scope="row">Designing and prototyping an upcoming user interface for searching for audio on CC Search.</td>
252
232
</tr>
253
233
254
234
<tr>
255
-
<th scope="row">Web Monetization: Phase 1</th>
256
-
<td scope="row">Research and test potential integrations for Web Monetization into CC Search and other CC web properties.</td>
235
+
<th scope="row">Audio Support and Integration</th>
236
+
<td scope="row">Design and user test UIs for audio. Ingest a pilot collection of audio to the Catalog, build support in the API. Integrate design to frontend to allow users to search for CC licensed audio.</td>
257
237
</tr>
258
238
259
239
<tr>
260
-
<th scope="row">Scraping & Resizing Work [AWS Grant]</th>
240
+
<th scope="row">Scraping & Resizing Work for Rekognition [AWS Grant]</th>
261
241
<td scope="row">Store a private copy of all the images in the CC Catalog to analyze via machine learning.</td>
262
242
</tr>
263
243
264
244
<tr>
265
-
<th scope="row">Wikidata integration with Catalog & Search Algorithm</th>
266
-
<td scope="row">Collect and use structured data from Wikidata to enhance our search algorithm with semantic search.</td>
267
-
</tr>
268
-
269
-
<tr>
270
-
<th scope="row">Usage/Reuse Metrics Dashboard</th>
271
-
<td scope="row">Build an analytics UI that is fed by Google Analytics and our internal analytics database.</td>
245
+
<th scope="row">Run Rekognition on 100m images [AWS Grant]</th>
246
+
<td scope="row">Generate metadata via machine learning (using AWS Rekognition) on a set of ~100 million high quality images from the CC Catalog.</td>
272
247
</tr>
273
248
274
249
<tr>
275
250
<th scope="row">Switch from Common Crawl to API</th>
276
251
<td scope="row">For all possible providers, use their APIs to ingest data into the CC Catalog instead of scraping websites via Common Crawl data.</td>
277
252
</tr>
278
253
254
+
</tbody>
255
+
</table>
256
+
257
+
<h4>Q4 2020</h4>
258
+
<table class="table table-striped">
259
+
<thead class="thead-dark">
260
+
<tr>
261
+
<th scope="col">Task Name</th>
262
+
<th scope="col">Task Description</th>
263
+
</tr>
264
+
</thead>
265
+
<tbody>
266
+
279
267
<tr>
280
-
<th scope="row">Run Rekognition on 100m images [AWS Grant]</th>
281
-
<td scope="row">Generate metadata via machine learning (using AWS Rekognition) on a set of ~100 million high quality images from the CC Catalog.</td>
268
+
<th scope="row">Search Relevance Improvements: Language Analysis, Quality Metrics, Minimums</th>
269
+
<td scope="row">None</td>
282
270
</tr>
283
271
284
272
<tr>
285
-
<th scope="row">Upgrade Catalog: Data Lake</th>
286
-
<td scope="row">Upgrade the CC Catalog database to use a schema-less database instead of the relational database (Postgres) that we currently use.</td>
273
+
<th scope="row">Plan UI Updates in Response to Metadata [AWS Grant]</th>
274
+
<td scope="row">Design updates to the CC Search UI in response to new metadata available as a result of applying machine learning to selected images in the Catalog. At a minimum, we expect new filters will be an option. Integration of design will take place subsequently.</td>
287
275
</tr>
288
276
289
277
<tr>
@@ -292,32 +280,24 @@ <h4>Q3 2020</h4>
292
280
</tr>
293
281
294
282
<tr>
295
-
<th scope="row">Implement Use of Thumbnails in Search & Catalog [AWS Grant]</th>
296
-
<td scope="row">Implement changes to CC Search (frontend) and Catalog to make use of thumbnails, as they become available.</td>
283
+
<th scope="row">Usage/Reuse Metrics Dashboard</th>
284
+
<td scope="row">Build an analytics UI that is fed by Google Analytics and our internal analytics database.</td>
297
285
</tr>
298
286
299
287
<tr>
300
-
<th scope="row">Partnership guidelines for all integration types</th>
301
-
<td scope="row">Prepare partnership guidelines for CC Search. Create a page on CC Search publishing these guidelines.</td>
288
+
<th scope="row">Scrape all images and set up feed for new ones</th>
289
+
<td scope="row">Once the Rekognition crawl finishes, we want to crawl the rest of the catalog (but not feed them to rekognition). This will give us useful metadata like dimensions and quality.</td>
302
290
</tr>
303
291
304
292
<tr>
305
-
<th scope="row">Plan UI Updates in Response to Metadata [AWS Grant]</th>
306
-
<td scope="row">Design updates to the CC Search UI in response to new metadata available as a result of applying machine learning to selected images in the Catalog. At a minimum, we expect new filters will be an option. Integration of design will take place subsequently.</td>
293
+
<th scope="row">Improve Documentation for Community Contributors</th>
294
+
<td scope="row">Create better documentation for community contributors by consolidating internal and public documentation and making it available for everyone.</td>
307
295
</tr>
308
296
309
-
</tbody>
310
-
</table>
311
-
312
-
<h4>Q4 2020</h4>
313
-
<table class="table table-striped">
314
-
<thead class="thead-dark">
315
-
<tr>
316
-
<th scope="col">Task Name</th>
317
-
<th scope="col">Task Description</th>
318
-
</tr>
319
-
</thead>
320
-
<tbody>
297
+
<tr>
298
+
<th scope="row">Improve Catalog Deployment and Provisioning</th>
299
+
<td scope="row">Manage Catalog deployment and provisioning entirely through infrastructure as code.</td>