Commit 51fdea6

Merge branch 'ayan-patch-1' of github.com:AyanChoudhary/creativecommons.github.io-source into ayan-patch-1
2 parents f58829a + abaf95b commit 51fdea6

4 files changed

+86
-18
lines changed

Original file line numberDiff line numberDiff line change
@@ -0,0 +1,66 @@
1+
title: Smithsonian Unit Code Update
2+
---
3+
categories:
4+
5+
cc-catalog
6+
gsoc
7+
gsoc-2020
8+
---
9+
author: charini
10+
---
11+
series: gsoc-2020-cccatalog
12+
---
13+
pub_date: 2020-08-03
14+
---
15+
body:
16+
## Introduction
17+
The Creative Commons (CC) Catalog project collects and stores CC-licensed images scattered across the internet, such
18+
that they can be made accessible to the general public via the [CC Search][cc_search] and [CC Catalog API][cc_api]
19+
tools. A range of metadata associated with each image, which helps in the image search and categorisation process, is
20+
stored via CC Catalog in the CC database.
21+
22+
In my [previous blog post][flickr_blog_post] of this series entitled 'Flickr Sub-provider Retrieval', I discussed how
23+
the images from a certain provider (such as Flickr) can be categorised based on the sub-provider values (which reflect
24+
the underlying organisation or entity that published the images through the provider). We have similarly implemented
25+
the sub-provider retrieval logic for the Europeana and Smithsonian providers. Unlike in Flickr and Europeana, every single
26+
image from Smithsonian is categorised under some sub-provider value where the sub-providers are identified based on a
27+
unit code value as contained in the API response (for more information please refer to the pull request [#455][pr_455]).
28+
The unit code values and the corresponding sub-provider values are maintained in the dictionary
29+
*SMITHSONIAN_SUB_PROVIDERS*. However, there is the possibility of the *unit code* values being updated at the
30+
Smithsonian API level, and it is important that we have a mechanism of reflecting those updates in the
31+
*SMITHSONIAN_SUB_PROVIDERS* dictionary as well. In this blog post, we discuss how we detect potential
32+
changes to the *unit code* values and keep the *SMITHSONIAN_SUB_PROVIDERS* dictionary up-to-date.
33+
34+
[cc_search]: https://ccsearch.creativecommons.org/
35+
[cc_api]: https://api.creativecommons.engineering/v1/
36+
[flickr_blog_post]: ../flickr-sub-provider-retrieval/
37+
[pr_455]: https://github.com/creativecommons/cccatalog/pull/455
38+
39+
## Implementation
40+
### Retrieving the latest unit codes
41+
To achieve this, we first need to obtain the latest *unit codes* supported by the Smithsonian API. Furthermore,
42+
since we are only interested in image data, the *unit codes* which are associated with images alone need to be
43+
retrieved. The latest Smithsonian *unit codes* corresponding to images can be retrieved by calling the endpoint
44+
https://api.si.edu/openaccess/api/v1.0/terms/unit_code?q=online_media_type:Images&api_key=REDACTED
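
As a rough illustration of the call described above, the following sketch builds the request URL and pulls the unit codes out of the JSON reply. The function names are hypothetical, and the response layout (`{"response": {"terms": [...]}}`) is an assumption about the Smithsonian API rather than something stated in this post.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

UNIT_CODE_ENDPOINT = "https://api.si.edu/openaccess/api/v1.0/terms/unit_code"


def build_unit_code_url(api_key):
    """Build the URL that asks only for unit codes associated with images."""
    params = {"q": "online_media_type:Images", "api_key": api_key}
    # safe=":" keeps the colon in the query readable instead of encoding it as %3A.
    return f"{UNIT_CODE_ENDPOINT}?{urlencode(params, safe=':')}"


def extract_unit_codes(response_json):
    """Return the unit codes as a set.

    Assumption: the codes are listed under response -> terms.
    A missing key yields an empty set instead of raising.
    """
    return set(response_json.get("response", {}).get("terms", []))


def fetch_unit_codes(api_key, timeout=30):
    """Call the endpoint and return the unit codes (performs a network request)."""
    with urlopen(build_unit_code_url(api_key), timeout=timeout) as resp:
        return extract_unit_codes(json.load(resp))
```

Keeping the URL building and the parsing separate from the network call means both can be checked without an API key.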
45+
46+
### Check for unit code updates
47+
In order to identify whether changes have occurred to the collection of *unit codes* supported by the Smithsonian API
48+
(in the form of additions and/or deletions), we compare the values retrieved by calling the previously mentioned
49+
endpoint, with the values contained in the *SMITHSONIAN_SUB_PROVIDERS* dictionary. All changes are reflected in a table
50+
named *smithsonian_new_unit_codes* which contains the two fields, 'new_unit_code' and 'action'. If a new *unit code* is
51+
introduced at the API level, we store that *unit code* value with the corresponding action value 'add' in the table.
52+
This reflects that the given *unit code* value needs to be added to the *SMITHSONIAN_SUB_PROVIDERS* dictionary. If a
53+
*unit code* that appears in the *SMITHSONIAN_SUB_PROVIDERS* dictionary does not appear at the API level, we store
54+
the *unit code* value with the corresponding action value 'delete' in the table, reflecting that it needs to be deleted
55+
from the dictionary.
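
The comparison described in this section amounts to two set differences. The sketch below is a minimal illustration, not the project's actual code: the function name is hypothetical, the sample unit codes are made up, and it assumes the dictionary maps each sub-provider name to a set of unit codes.

```python
def find_unit_code_changes(api_unit_codes, sub_providers):
    """Return (unit_code, action) rows describing how the dictionary is out of date.

    api_unit_codes: set of unit codes returned by the API.
    sub_providers: dict assumed to map sub-provider name -> set of unit codes.
    """
    known_codes = set()
    for codes in sub_providers.values():
        known_codes |= set(codes)

    # Present at the API level but missing from the dictionary: needs adding.
    to_add = sorted(api_unit_codes - known_codes)
    # Present in the dictionary but gone from the API: needs deleting.
    to_delete = sorted(known_codes - api_unit_codes)

    return [(code, "add") for code in to_add] + [
        (code, "delete") for code in to_delete
    ]


# Illustrative data only; these are not real Smithsonian unit codes.
example_sub_providers = {
    "example_museum_a": {"UNIT_A1", "UNIT_A2"},
    "example_museum_b": {"UNIT_B1"},
}
example_api_codes = {"UNIT_A1", "UNIT_B1", "UNIT_C1"}
changes = find_unit_code_changes(example_api_codes, example_sub_providers)
```

Here `changes` holds one `add` row for `UNIT_C1` and one `delete` row for `UNIT_A2`, matching the two fields of the *smithsonian_new_unit_codes* table.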
56+
57+
### Triggering the unit code update workflow
58+
A separate workflow named *check_new_smithsonian_unit_codes_workflow* allows executing the logic we discussed via the
59+
Airflow UI. For each execution, the table *smithsonian_new_unit_codes* is completely cleared of previous data, and the
60+
latest updates to be reflected in the *SMITHSONIAN_SUB_PROVIDERS* dictionary are stored. Note that the actual updates to
61+
the dictionary (as reflected in the table) need to be carried out by a person, since editing the dictionary is not
62+
automated. Furthermore, this workflow is expected to be executed at least once a week, preferably prior to running
63+
the Smithsonian image retrieval script such that the Smithsonian sub-provider retrieval task can be run with no issue.
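
The clear-then-store behaviour of each execution can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for the real database, the function name and column types are hypothetical, and only the table and field names come from this post.

```python
import sqlite3


def refresh_unit_code_table(conn, changes):
    """Replace the table contents with the latest (unit_code, action) rows."""
    # "with conn" wraps the statements in one transaction, so a failed run
    # does not leave the table half-cleared.
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS smithsonian_new_unit_codes ("
            "new_unit_code TEXT PRIMARY KEY, action TEXT NOT NULL)"
        )
        # Each execution starts from an empty table.
        conn.execute("DELETE FROM smithsonian_new_unit_codes")
        conn.executemany(
            "INSERT INTO smithsonian_new_unit_codes (new_unit_code, action) "
            "VALUES (?, ?)",
            changes,
        )


conn = sqlite3.connect(":memory:")
refresh_unit_code_table(conn, [("UNIT_C1", "add")])
refresh_unit_code_table(conn, [("UNIT_A2", "delete")])
rows = conn.execute(
    "SELECT new_unit_code, action FROM smithsonian_new_unit_codes"
).fetchall()
```

After the second call, `rows` contains only the rows from that call, so whoever updates the dictionary always sees the latest pending changes.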
64+
65+
## Acknowledgement
66+
I express my gratitude to my GSoC supervisor Brent Moran for assisting me with this task.

content/community/community-team/contents.lr

+2
Original file line numberDiff line numberDiff line change
@@ -34,6 +34,8 @@ Please read more about the roles here:
3434

3535
Please note that we do not expect you to do work just because you have a role on the Community Team. Any role you are granted is based on appreciation for the work you’ve _already_ done.
3636

37+
Another noteworthy point is that although the roles grant increasing levels of access, they are not intended to be a hierarchy that you _must_ climb. Instead, they are aimed at reflecting the level of involvement you intend to have with the project. Provided you meet the requirements for a particular role, you may switch between them as desired.
38+
3739
## How to Apply
3840

3941
Please apply via [this Google Form](https://forms.gle/zDmp4yu2Yw2ktcsR8).

content/community/community-team/project-roles/contents.lr

+10-18
Original file line numberDiff line numberDiff line change
@@ -25,28 +25,13 @@ If you’ve been accepted as a Project Contributor, you are encouraged to:
2525
* participate in discussions in Slack or via email.
2626
* review pull requests opened by other contributors.
2727

28-
## Project Member
29-
**Who should apply:** If you’ve made multiple contributions that took you a couple of hours each to complete, you should apply for this role.
28+
## Project Collaborator
29+
**Who should apply:** If you’ve made a few significant contributions to the project (added new features, for example) and know the project’s overall codebase pretty well, you should apply.
3030

3131
**What does this role give you?**
3232
* Everything a Project Contributor gets.
3333
* You’ll be added to the `creativecommons` GitHub organization and given [Triage](https://help.github.com/en/github/setting-up-and-managing-organizations-and-teams/repository-permission-levels-for-an-organization#repository-access-for-each-permission-level) permissions to the project repository.
34-
* You’ll receive previews of upcoming changes to the Community Team program.
3534
* You’ll be able to assign people and change labels on GitHub issues associated with the project.
36-
37-
**Guidelines for Project Members**
38-
If you’ve been accepted as a Project Member, you are encouraged to:
39-
* do everything a Project Contributor does
40-
* review and triage new issues
41-
* ask the issue author for more details if appropriate
42-
* check with the project maintainers if the issue makes sense
43-
* update the labels on the issue appropriately once you have all the information you need (e.g. remove “awaiting triage” label)
44-
45-
## Project Collaborator
46-
**Who should apply:** If you’ve made a few significant contributions to the project (added new features, for example) and know the project’s overall codebase pretty well, you should apply.
47-
48-
**What does this role give you?**
49-
* Everything a Project Member gets.
5035
* You’ll be added to the `CODEOWNERS` file for the project.
5136
* This will allow your PR reviews to block merge.
5237
* This will auto assign you PRs to review.
@@ -56,11 +41,18 @@ If you’ve been accepted as a Project Member, you are encouraged to:
5641

5742
**Guidelines for Project Collaborators**
5843
If you’ve been accepted as a Project Collaborator, you are encouraged to:
59-
* do everything a Project Member does
44+
* do everything a Project Contributor does
45+
* review and triage new issues
46+
* ask the issue author for more details if appropriate
47+
* check with the project maintainers if the issue makes sense
48+
* update the labels on the issue appropriately once you have all the information you need (e.g. remove “awaiting triage” label)
6049
* review assigned pull requests to unblock merges.
6150
* participate in discussions in the new meetings and channels you’ve been added to.
6251
* identify promising contributors to the project and invite them to join the Community Team.
6352

53+
**Note**
54+
The role of Project Member was deprecated in July 2020 and all members were redesignated as collaborators.
55+
6456
## Project Core Committer
6557
**Who should apply:** If you’ve made many significant contributions to the project, know the codebase really well, and are interested in active maintenance of the project, you should apply.
6658

databags/community_team_members.json

+8
Original file line numberDiff line numberDiff line change
@@ -45,6 +45,10 @@
4545
{
4646
"name": "Onyenanu Princewill",
4747
"role": "Project Contributor"
48+
},
49+
{
50+
"name": "Dhruv Bhanushali",
51+
"role": "Project Collaborator"
4852
}
4953
],
5054
"name": "CC Catalog API",
@@ -67,6 +71,10 @@
6771
{
6872
"name": "Abhishek Naidu",
6973
"role": "Project Collaborator"
74+
},
75+
{
76+
"name": "Dhruv Bhanushali",
77+
"role": "Project Collaborator"
7078
}
7179
],
7280
"name": "CC Search",
