Commit 0f94d59

committed
Merge master
2 parents e04717a + ac78fc6 commit 0f94d59

File tree

42 files changed

+490
-104
lines changed

.github/CODEOWNERS

+1-1
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
# These owners will be the default owners for everything in
22
# the repo. Unless a later match takes precedence, they will
33
# be requested for review when someone opens a pull request.
4-
* @creativecommons/engineering @creativecommons/ct-cc-open-source-core-committers
4+
* @creativecommons/engineering @creativecommons/ct-cc-open-source-core-committers @creativecommons/ct-cc-open-source-collaborators
55

66
# These users own any files in the specified directory and
77
# any of its subdirectories.

content/archives/contents.lr

+1-1
Original file line numberDiff line numberDiff line change
@@ -6,4 +6,4 @@ body:
66

77
This section contains archives related to older CC projects.
88

9-
* [CC Tech Blog (2007-2014)](/archives/old-tech-blog/entries)
9+
* [CC Tech Blog (2007-2014)](/archives/old-tech-blog/entries/)
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
username: AyanChoudhary
2+
---
3+
name: Ayan Choudhary
4+
---
5+
md5_hashed_email: ba5f8ac4afb162644051544e25b5cfe8
6+
---
7+
about:
8+
Ayan Choudhary is an Electrical Engineering undergraduate student from India and will be interning with Creative Commons during the summer. He has been involved with coding quite heavily for the past couple of years which is one of his numerous hobbies. Some of the sectors which really fascinate him include network security, blockchain, and data science. Apart from this he loves reading and painting and is quite interested in PC gaming and binge-watching online shows.
9+
He is working on [ccsearch](https://github.com/creativecommons/cccatalog-frontend) as a part of GSoC20.
10+
He is `@ayan` on slack.

content/blog/authors/ahmadbilaldev/contents.lr

+1-1
Original file line numberDiff line numberDiff line change
@@ -7,4 +7,4 @@ md5_hashed_email: 870502bc55d77d77522ad3f27876b511
77
about:
88
Ahmad Bilal is a Computer Science undergrad from UET Lahore, who likes computers, problems and using the former to solve the latter. He is always excited about Open Source, and is currently focused on Node.js, Serverless, GraphQL, Cloud, Gatsby.js with React.js and WordPress. He likes organizing meetups, conferences and meeting new people. Cats are his weakness, and he is a sucker for well-engineered cars.
99

10-
Ahmad is working on [the CC WordPress plugin](https://github.com/creativecommons/creativecommons-wordpress-plugin) as part of [Google Summer of Code 2019](/gsoc-2019).
10+
Ahmad is working on [the CC WordPress plugin](https://github.com/creativecommons/creativecommons-wordpress-plugin) as part of [Google Summer of Code 2019](/gsoc-2019/).

content/blog/authors/conye/contents.lr

+1-1
Original file line numberDiff line numberDiff line change
@@ -7,4 +7,4 @@ md5_hashed_email: 9088efad6d512ef79556a3b6adcf048f
77
about:
88
Chidiebere Onyegbuchulem is a Frontend developer based in Lagos, Nigeria.
99

10-
Chidi is currently working on [CC Vocabulary](https://github.com/creativecommons/cc-vocabulary) as part of 2019-2020 [Outreachy Internship](/programs/outreachy/2019-12-start).
10+
Chidi is currently working on [CC Vocabulary](https://github.com/creativecommons/cc-vocabulary) as part of 2019-2020 [Outreachy Internship](/programs/outreachy/2019-12-start/).

content/blog/authors/dhruvkb/contents.lr

+1-1
Original file line numberDiff line numberDiff line change
@@ -7,4 +7,4 @@ md5_hashed_email: 0eab64adad056cff2492e7c407a9aa21
77
about:
88
Dhruv Bhanushali is a Mumbai-based software developer and an Engineering-Physics graduate from IIT Roorkee. He started programming as a hobby in high-school and having found his calling, is now pursuing a career in the field. He is a huge fan of alternative and post-rock music and keeps his curated collection with him at all times. He also loves to binge watch TV shows and movies, especially indie art films.
99

10-
Dhruv developed [CC Vocabulary](https://opensource.creativecommons.org/cc-vocabulary/) as part of [Google Summer of Code 2019](/gsoc-2019) and now is a maintainer for the project. He is consistently [`@dhruvkb`](https://dhruvkb.github.io/) everywhere.
10+
Dhruv developed [CC Vocabulary](https://opensource.creativecommons.org/cc-vocabulary/) as part of [Google Summer of Code 2019](/gsoc-2019/) and now is a maintainer for the project. He is consistently [`@dhruvkb`](https://dhruvkb.github.io/) everywhere.

content/blog/authors/obulat/contents.lr

+1-1
Original file line numberDiff line numberDiff line change
@@ -7,4 +7,4 @@ md5_hashed_email: acd34b5434369aeaf31de8ea94368bf0
77
about:
88
[Olga](https://creativecommons.org/author/obulat/) is a developer based in Istanbul, Turkey. She loves programming in Python and Javascript. Her main areas of interest are web development, Natural Language Processing, languages, geography and education. Apart from that, she is busy raising her (soon to be) three kids.
99

10-
Olga is currently working on improving [the CC License Chooser](https://github.com/creativecommons/cc-chooser) as part of 2019-2020 [Outreachy Internship](/programs/outreachy).
10+
Olga is currently working on improving [the CC License Chooser](https://github.com/creativecommons/cc-chooser) as part of 2019-2020 [Outreachy Internship](/programs/outreachy/).

content/blog/authors/soccerdroid/contents.lr

+2-2
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,6 @@ name: María Belén Guaranda Cabezas
55
md5_hashed_email: a177edcce952c2c82ac8716a4586a28f
66
---
77
about:
8-
Maria is an undergraduate Computer Science student from ESPOL, in Ecuador. She has worked for the past 2 years as a research assistant. She has worked in projects including computer vision, the estimation of socio-economic indexes through CDRs analysis, and a machine learning model with sensors data. During her spare time, she likes to watch animes and read. She loves sports, especially soccer. She is also committed to environmental causes, and she is a huge fan of cats and dogs (she has 4 and 1 respectively).
8+
Maria is a Bachelor of Computer Science from Ecuador. As a research assistant, she worked in projects including computer vision, the estimation of socio-economic indexes through CDRs analysis, and a machine learning model with sensors data. During her spare time, she likes to watch animes and read. She loves sports, especially soccer. She is also committed to environmental causes, and she is a huge fan of cats and dogs (she has 4 and 1 respectively).
99

10-
Maria is working on [data visualizations of the CC Catalog](https://github.com/creativecommons/cccatalog-dataviz) as part of [Google Summer of Code 2019](/gsoc-2019).
10+
Maria worked in the [data visualizations of the CC Catalog](https://github.com/creativecommons/cccatalog-dataviz) as part of [Google Summer of Code 2019](/gsoc-2019/), and is currently a mentor in this year's edition of the program.
+8
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
username: subhamX
2+
---
3+
name: Subham Sahu
4+
---
5+
md5_hashed_email: 1ca2562f3046509e3273fe5afd3fdab2
6+
---
7+
about:
8+
Subham Sahu is an undergraduate student from Indian Institute Of Technology, Ropar. He is currently working on the [Linked Commons](https://github.com/creativecommons/cccatalog-dataviz) as part of [Google Summer of Code 2020](/gsoc-2020/).
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
name: cc-dataviz
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,35 @@
1+
title: CC Search, Proposal Drafting and Community Bonding
2+
---
3+
categories:
4+
cc-search
5+
community
6+
gsoc
7+
open-source
8+
9+
---
10+
author: AyanChoudhary
11+
---
12+
series: gsoc-2020-ccsearch-accessibility
13+
---
14+
pub_date: 2020-05-22
15+
---
16+
body:
17+
18+
### Proposal Drafting
19+
20+
The majority of my time in March was spent on drafting the proposal for my project **Improve CC Search Accessibility**.
21+
While drafting my proposal I had two broad topics to focus on: Accessibility and Internationalization.
22+
23+
So the first thing I did was go through the various resources available to me, such as the W3C guidelines for accessibility, dequeuniversity accessibility insights and the MDN notes on accessibility.
24+
After I had acquainted myself with all of these, the next challenge was to sort out which of the metrics were relevant and important enough to be detailed in the proposal, along with some of the other metrics which made notable appearances.
25+
Finally, by including all of these, I had the accessibility part of my proposal complete. Next, I had to work out the part for internationalization. Since it had already been decided that we would be using vue-i18n, I did some research into how we could leverage it to get the best possible result.
26+
27+
One of the important parts of internationalization happens to be deciding upon the JSON structure which was a highlighted section in my proposal.
28+
The other notable sections included strategies for modification of templates while translating and also how the translations would be carried out without hindering any further development of the platform.
29+
30+
### Community Bonding
31+
32+
Community Bonding involved getting to know the mentors and the people I will be working with during this internship. We also decided to run the audit tests for the cc-search website during this time, as it would help identify the key issues we would be facing and provide a suitable foundation to start working from.
33+
The audits were done using Lighthouse, Accessibility Insights and pa11y and they provided useful insights on which parts of the website we should be focusing on such as the contrast issues and the aria-label fixes.
34+
35+
Coming up next will be the progress on the first 2 weeks of the project.
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,71 @@
1+
title: CC Search, Setting up vue-i18n and internationalizing homepage
2+
---
3+
categories:
4+
cc-search
5+
community
6+
gsoc
7+
open-source
8+
9+
---
10+
author: AyanChoudhary
11+
---
12+
series: gsoc-2020-ccsearch-accessibility
13+
---
14+
pub_date: 2020-06-10
15+
---
16+
body:
17+
18+
These are the first two weeks of my internship with CC. I am working on improving the accessibility of cc-search and internationalizing it as well.
19+
We started by compiling the accessibility reports from Accessibility Insights, Lighthouse and pa11y into a single document and then opening appropriate issues on the repo to address them.
20+
21+
The accessibility issues are listed here:
22+
1. [Accessibility - Improve labels](https://github.com/creativecommons/cccatalog-frontend/issues/996)
23+
2. [Evaluate keyboard navigation effectiveness](https://github.com/creativecommons/cccatalog-frontend/issues/997)
24+
3. [Fix color contrast problems](https://github.com/creativecommons/cccatalog-frontend/issues/998)
25+
4. [Improve elements markup](https://github.com/creativecommons/cccatalog-frontend/issues/999)
26+
5. [Evaluate any accessibility linter tools](https://github.com/creativecommons/cccatalog-frontend/issues/1000)
27+
28+
The decision was made to audit the tab indices along with internationalizing the page.
29+
The accessibility changes will be done after the completion of internationalization as the aria-labels will have to be internationalized as well.
30+
31+
The first two weeks involved setting up vue-i18n, auditing the tab index for homepage and internationalizing it.
32+
The tab index audit for the homepage is displayed below:
33+
34+
![audit.png](audit.png)
35+
36+
The internationalization part was pretty straightforward: we just had to export all the strings to the JSON files and load translations through the i18n module.
37+
For complex elements of the type ```string <tag>string</tag> string``` I went for the templating method.
38+
Here we use the v-slot attribute of the i18n functional component to convert the element into a template where the tag occupies a slot in the syntax.
39+
40+
```
41+
<i18n path="footer.caption.label" tag="p" class="caption">
42+
<template v-slot:noted>
43+
<a href="https://creativecommons.org/policies#license" target="_blank" rel="noopener">{{$t('footer.caption.noted')}}</a>
44+
</template>
45+
<template v-slot:attribution>
46+
<a href="https://creativecommons.org/licenses/by/4.0/" target="_blank" rel="noopener">
47+
{{$t('footer.caption.attribution')}}
48+
</a>
49+
</template>
50+
<template v-slot:icons>
51+
<a href="https://fontawesome.com/" target="_blank" rel="noopener" class="has-text-white">
52+
{{$t('footer.caption.icons')}}
53+
</a>
54+
</template>
55+
</i18n>
56+
```
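
To make the slot idea concrete, here is a rough sketch in plain Python (not the project's Vue code) of how a message with named placeholders can be filled with markup per slot. The locale strings, the `lookup` and `render` helpers, and the placeholder syntax used here are assumptions for illustration only; the real messages live in the project's JSON files.

```python
import json

# Hypothetical locale content; the wording is invented for this sketch.
EN_JSON = """
{
  "footer": {
    "caption": {
      "label": "Except where otherwise {noted}, content is licensed under {attribution}. Icons by {icons}.",
      "noted": "noted",
      "attribution": "a CC BY 4.0 license",
      "icons": "Font Awesome"
    }
  }
}
"""


def lookup(messages, path):
    """Resolve a dotted key such as 'footer.caption.label' in nested dicts."""
    node = messages
    for part in path.split("."):
        node = node[part]
    return node


def render(messages, path, slots):
    """Fill each named placeholder with the markup supplied for that slot."""
    return lookup(messages, path).format(**slots)


messages = json.loads(EN_JSON)
link = '<a href="https://fontawesome.com/">{}</a>'
html = render(messages, "footer.caption.label", {
    "noted": lookup(messages, "footer.caption.noted"),
    "attribution": lookup(messages, "footer.caption.attribution"),
    "icons": link.format(lookup(messages, "footer.caption.icons")),
})
print(html)
```

The point of the design is that translators only see the full sentence with placeholders, so they can reorder the linked parts freely without touching any markup.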
57+
58+
The final outcome looks pretty good:
59+
60+
![final.png](final.png)
61+
62+
And voila we are done with the first two weeks. I also internationalized the header and the footer along with the homepage.
63+
You can track the work done for these weeks through these PRs:
64+
65+
1. [setup internationalization plugin](https://github.com/creativecommons/cccatalog-frontend/pull/1007)
66+
2. [Internationalize homepage, header and footer](https://github.com/creativecommons/cccatalog-frontend/pull/1013)
67+
68+
The progress of the project can be tracked on [cc-search](https://github.com/creativecommons/cccatalog-frontend)
69+
70+
CC Search Accessibility is my GSoC 2020 project under the guidance of [Ari Madian](https://opensource.creativecommons.org/blog/authors/akmadian/), who is the primary mentor for this project. [Anna Tumadóttir](https://creativecommons.org/author/annacreativecommons-org/), who has helped all along, and engineering director [Kriti
71+
Godey](https://creativecommons.org/author/kriticreativecommons-org/) have been very supportive.

content/blog/entries/cc-vocabulary-the-main-course/contents.lr

+1-1
Original file line numberDiff line numberDiff line change
@@ -85,7 +85,7 @@ brainchild comes of age.
8585
There are a number of components under construction right now such as cards and social media buttons. They will be
8686
published on the styleguide very soon. After these, the final month, phase III of the project's GSoC term will be
8787
spent continuously polishing the project to suit the needs of all CC apps as discovered during the integration with
88-
CC Search as mentioned by [Breno Ferreira](/blog/authors/brenoferreira) in the 'Next steps' in his post on
88+
CC Search as mentioned by [Breno Ferreira](/blog/authors/brenoferreira/) in the 'Next steps' in his post on
8989
[CC Search Redesign](/blog/entries/cc-search-redesign/).
9090

9191
In keeping with the culinary theme of this post, think of it as sweet sweet dessert.

content/blog/entries/cc-vocabulary-week9-13/contents.lr

+1-1
Original file line numberDiff line numberDiff line change
@@ -54,4 +54,4 @@ Before this internship, I had just switched careers from Network engineering to
5454

5555
I will continue to contribute to CC open source projects especially to the Vocabulary project that I have become a part of. I would love to see the application of Vocabulary to the development of other CC platforms and applications. I also want to apply the skills that I have acquired to get a full-time software developer position.
5656

57-
My special appreciation to Outreachy for this opportunity, the entire CC team especially those I worked with, My mentors [Hugo Solar](/blog/authors/hugosolar) and [Dhruv Bhanushali](/blog/authors/dhruvkb) for their guidance, direction, and help whenever I got stuck, also to the Director of Engineering [Kriti Godey](/blog/authors/kgodey) for always checking up on me ensuring I had a wonderful internship experience.
57+
My special appreciation to Outreachy for this opportunity, the entire CC team especially those I worked with, My mentors [Hugo Solar](/blog/authors/hugosolar/) and [Dhruv Bhanushali](/blog/authors/dhruvkb/) for their guidance, direction, and help whenever I got stuck, also to the Director of Engineering [Kriti Godey](/blog/authors/kgodey/) for always checking up on me ensuring I had a wonderful internship experience.
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,105 @@
1+
title: Data flow: from API to DB
2+
---
3+
categories:
4+
5+
cc-catalog
6+
airflow
7+
gsoc
8+
gsoc-2020
9+
---
10+
author: srinidhi
11+
---
12+
series: gsoc-2020-cccatalog
13+
---
14+
pub_date: 2020-07-22
15+
---
16+
body:
17+
18+
## Introduction
19+
The CC Catalog project handles the flow of image metadata from the source or
20+
provider and loads it to the database, which is then surfaced to the [CC
21+
search][CC_search] tool. The workflows are set up for each provider to gather
22+
metadata about CC licensed images. These workflows are handled with the help of
23+
Apache Airflow. Airflow is an open source tool that helps us to schedule and
24+
monitor workflows.
25+
[CC_search]: https://ccsearch.creativecommons.org/about
26+
27+
## Airflow intro
28+
Apache Airflow is an open source tool that helps us to schedule tasks and
29+
monitor workflows. It provides an easy to use UI that makes managing tasks
30+
easy. In Airflow, the tasks we want to schedule are organised in DAGs
31+
(Directed Acyclic Graphs). DAGs consist of a collection of tasks, and a
32+
relationship defined among these tasks, so that they run in an organised
33+
manner. DAGs files are standard python files that are loaded from the defined
34+
`DAG_FOLDER` on a host. Airflow selects all the python files in the
35+
`DAG_FOLDER` that have a DAG instance defined globally, and executes them to
36+
create the DAG objects.
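
As a minimal illustration of what "directed acyclic graph" means for task ordering, the sketch below uses Python's standard `graphlib` module rather than Airflow itself. The task names are made up; this is not how Airflow is invoked, only a picture of the idea.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Each key is a task; the set lists the tasks it depends on.
# Task names are invented for illustration.
dependencies = {
    "pull_metadata": set(),
    "write_tsv": {"pull_metadata"},
    "load_to_db": {"write_tsv"},
}

# A valid order runs every task only after the tasks it depends on.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)

# "Acyclic" matters: if two tasks depend on each other, no valid order exists.
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)
```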
37+
38+
## CC Catalog Workflow
39+
In the CC catalog, Airflow is set up inside a docker container along with other
40+
services. The loader and provider workflows are inside the `dags` directory in
41+
the repo [dag folder][dags]. Provider workflows are set up to pull metadata
42+
about CC licensed images from the respective providers. The data pulled is
43+
structured into a standardised format and written into a TSV (Tab Separated
44+
Values) file locally. These TSV files are then loaded into S3 and then finally
45+
to PostgreSQL DB by the loader workflow.
46+
[dags]: https://github.com/creativecommons/cccatalog/tree/dacb48d24c6ae9b532ff108589b9326bde0d37a3/src/cc_catalog_airflow/dags
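
A small sketch of the "standardised format written to TSV" step, using only the standard library. The column names and the `to_tsv` helper are hypothetical; the actual schema, and whether the real files carry a header row, are defined in the cccatalog repo and not stated here.

```python
import csv
import io

# Hypothetical column names, chosen only for this sketch.
COLUMNS = ["foreign_identifier", "image_url", "license", "license_version"]


def to_tsv(records):
    """Serialise provider records into tab-separated rows with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=COLUMNS,
        delimiter="\t",
        extrasaction="ignore",  # drop fields that are not part of the schema
        lineterminator="\n",
    )
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


sample = [{
    "foreign_identifier": "abc123",
    "image_url": "https://example.org/a.jpg",
    "license": "by",
    "license_version": "4.0",
    "unused_field": "dropped",
}]
tsv_text = to_tsv(sample)
print(tsv_text)
```

Fixing the column order up front means every provider produces rows the loader can treat identically, regardless of how each provider's API names its fields.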
47+
48+
## Provider API workflow
49+
The provider workflows are usually scheduled in one of two time frequencies,
50+
daily or monthly.
51+
52+
Providers such as Flickr or Wikimedia Commons that are filtered using the date
53+
parameter are usually scheduled for daily jobs. These providers have a large
54+
volume of continuously changing data, and so daily updates are required to keep
55+
the data in sync.
56+
57+
Providers that are scheduled for monthly ingestion are ones with a relatively
58+
low volume of data, or for which filtering by date is not possible. This means
59+
we need to ingest the entire collection at once. Examples are museum providers
60+
like the [Science museum UK][science_museum] or [Statens Museum for
61+
Kunst][smk]. We don’t expect museum providers to change data on a daily basis.
62+
63+
[science_museum]: https://collection.sciencemuseumgroup.org.uk/
64+
[smk]: https://www.smk.dk/
65+
66+
The scheduling of the DAGs by the scheduler daemons depends on a few
67+
parameters.
68+
69+
- ```start_date``` - it denotes the starting date from which the
70+
task should begin running.
71+
- ```schedule_interval``` - it denotes the interval between subsequent runs. It
72+
can be specified with airflow keyword strings like “@daily”, “@weekly”,
73+
“@monthly”, “@yearly”. Other than these, we can also schedule the interval using
74+
a cron expression.
75+
76+
77+
Example: Cleveland museum is currently scheduled for a monthly crawl with a
78+
starting date as ```2020-01-15```. [cleveland_museum_workflow][clm_workflow]
79+
80+
[clm_workflow]: https://github.com/creativecommons/cccatalog/blob/dacb48d24c6ae9b532ff108589b9326bde0d37a3/src/cc_catalog_airflow/dags/cleveland_museum_workflow.py
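
The keyword strings are shorthand for cron expressions. The sketch below shows that mapping in plain Python; the cron equivalents follow the presets listed in the Airflow documentation, while the `to_cron` helper and the `cleveland` dictionary are hypothetical and only mirror the example above (the linked workflow file is the authority on how the schedule is actually written).

```python
from datetime import date

# Cron equivalents of the preset strings, as listed in the Airflow documentation.
PRESETS = {
    "@daily": "0 0 * * *",    # every day at midnight
    "@weekly": "0 0 * * 0",   # every Sunday at midnight
    "@monthly": "0 0 1 * *",  # first day of each month at midnight
    "@yearly": "0 0 1 1 *",   # every January 1st at midnight
}


def to_cron(schedule_interval):
    """Return the cron expression for a preset, or the input if it is already cron."""
    return PRESETS.get(schedule_interval, schedule_interval)


# Assumed settings for a monthly crawl starting on 2020-01-15.
cleveland = {"start_date": date(2020, 1, 15), "schedule_interval": "@monthly"}
cron = to_cron(cleveland["schedule_interval"])
print(cron)
```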
81+
82+
## Loader workflow
83+
The data from the provider scripts are not directly loaded into S3. Instead,
84+
they are stored in a TSV file on the local disk, and the tsv_postgres workflow
85+
handles loading of data to S3, and eventually PostgreSQL. The DAG starts by
86+
calling the task to stage the oldest tsv file from the output directory of the
87+
provider scripts to the staging directory. Next, two tasks run in parallel, one
88+
loads the tsv file in the staging directory to S3, while the other creates the
89+
loading table in the PostgreSQL database. Once the data is loaded to S3 and the
90+
loading table has been created, the data from S3 is loaded to the intermediate
91+
loading table and then finally inserted into the image table. If loading from
92+
S3 fails, the data is loaded to PostgreSQL from the locally stored tsv file.
93+
When the data has been successfully transferred to the image table, the
94+
intermediate loading table is dropped and the tsv files in the staging
95+
directory are deleted. If copying the tsv files to S3 fails, those
96+
files are moved to the failure directory for future inspection.
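
The ordering described above can be sketched with the standard library. The task names and both helpers are hypothetical, and the fallback is written as an ordinary try/except for readability; the actual DAG in the repo may express it differently.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task names; each maps to the tasks that must finish first.
loader_tasks = {
    "stage_oldest_tsv": set(),
    "copy_to_s3": {"stage_oldest_tsv"},
    "create_loading_table": {"stage_oldest_tsv"},
    "load_into_loading_table": {"copy_to_s3", "create_loading_table"},
    "insert_into_image_table": {"load_into_loading_table"},
    "drop_loading_table": {"insert_into_image_table"},
    "delete_staged_tsv": {"insert_into_image_table"},
}


def stages(graph):
    """Group tasks into waves; tasks in the same wave could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def load_with_fallback(load_from_s3, load_from_local):
    """Try loading from S3 first; on failure, load from the local TSV instead."""
    try:
        return load_from_s3()
    except Exception:
        return load_from_local()


waves = stages(loader_tasks)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

The second wave contains both the S3 copy and the table creation, which matches the two tasks that run in parallel after staging.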
97+
98+
<div style="text-align:center;">
99+
<img src="loader_workflow.png" width="1000px"/>
100+
<p> Loader workflow </p>
101+
</div>
102+
103+
## Acknowledgement
104+
105+
I would like to thank Brent Moran for helping me write this blog post.
