
Commit 2a05b16

authored
Merge branch 'master' into my-second-blog-post
2 parents 873c4d4 + 3e147c0 commit 2a05b16

24 files changed (+515 -61 lines changed)

Lines changed: 10 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,10 @@
1+
username: conye
2+
---
3+
name: Chidiebere Onyegbuchulem
4+
---
5+
md5_hashed_email: 9088efad6d512ef79556a3b6adcf048f
6+
---
7+
about:
8+
Chidiebere Onyegbuchulem is a Frontend developer based in Lagos, Nigeria.
9+
10+
Chidi is currently working on [CC Vocabulary](https://github.com/creativecommons/cc-vocabulary) as part of the 2019-2020 [Outreachy Internship](/programs/outreachy/2019-12-start).
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
username: mathemancer
2+
---
3+
name: Brent Moran
4+
---
5+
md5_hashed_email: d06fcd3796829bac4f8fd1fb28fc47e4
6+
---
7+
about:
8+
[Brent](https://creativecommons.org/author/brent-moran/) is the Senior Data Engineer at Creative Commons. He's `mathemancer` on Freenode (IRC) and `@Brent Moran` on the CC Slack.
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
name: airflow
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
name: testing
Lines changed: 201 additions & 0 deletions
@@ -0,0 +1,201 @@
1+
title: Apache Airflow testing with Pytest
2+
---
3+
categories:
4+
airflow
5+
cc-catalog
6+
cc-search
7+
open-source
8+
product
9+
testing
10+
---
11+
author: mathemancer
12+
---
13+
pub_date: 2020-01-23
14+
---
15+
body:
16+
17+
CC Catalog is a project that gathers information about images from around the
18+
internet, and stores the information so that these images can eventually be
19+
indexed in [CC Search][cc_search]. A portion of the process is directed by
20+
[Apache Airflow][airflow], which is a tool commonly used to organize workflows
21+
and data pipelines.
22+
23+
The nature of Airflow leads to some particular challenges when it comes to
24+
testing, and special care must be taken to make tests independent from the
25+
global state of the system where they are run. This blog post will describe a
26+
few of the challenges we faced when writing tests for Airflow jobs, and some
27+
tricks we used to solve those challenges.
28+
29+
[cc_search]: https://ccsearch.creativecommons.org/
30+
[airflow]: https://airflow.apache.org/
31+
32+
## Brief description of Apache Airflow
33+
34+
Apache Airflow is an open source piece of software that loads Directed Acyclic
35+
Graphs (DAGs) defined in Python files. The DAG is what defines a given
36+
workflow. The nodes are the tasks that need to be accomplished, and the
37+
directed edges of the graph define dependencies between the various pieces. By
38+
default, the Airflow daemon only looks for DAGs to load from a global location
39+
in the user's home folder: `~/airflow/dags/`. When a DAG is 'run', i.e., the
40+
tasks defined by the nodes of the DAG are each performed in the order defined by
41+
the directed edges of the DAG, the Airflow daemon stores information about the
42+
DAG run in `~/airflow/`. The daemon also stores general information about what
43+
DAGs exist on the system, and all of their current statuses in that directory.
44+
For more details, please see [the documentation][airflow_docs_top].
45+
46+
[airflow_docs_top]: https://airflow.apache.org/docs/stable/
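To make the graph vocabulary concrete, here is a toy model in plain Python. It does not use Airflow, and the `Task` class is a hypothetical stand-in for an operator. The `>>` operator records a directed edge in the spirit of Airflow's bitshift composition, and `graphlib.TopologicalSorter` from the standard library (Python 3.9+) yields an order in which every task runs after all of its upstream tasks. It only illustrates the ordering constraint, not how the Airflow scheduler actually works.

```python
from graphlib import TopologicalSorter


class Task:
    """Hypothetical stand-in for an Airflow operator, tracking dependency ids."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream_task_ids = set()
        self.downstream_task_ids = set()

    def __rshift__(self, other):
        # `a >> b` means "a must finish before b starts": a directed edge a -> b.
        self.downstream_task_ids.add(other.task_id)
        other.upstream_task_ids.add(self.task_id)
        return other  # returning `other` allows chaining: a >> b >> c


def execution_order(tasks):
    """Return one order in which every task runs after all of its upstream tasks."""
    graph = {t.task_id: t.upstream_task_ids for t in tasks}  # node -> predecessors
    return list(TopologicalSorter(graph).static_order())


start = Task("start")
fetch = Task("get_images")
store = Task("store_images")
start >> fetch >> store
print(execution_order([start, fetch, store]))  # prints ['start', 'get_images', 'store_images']
```

If the edges formed a cycle, `static_order()` would raise `graphlib.CycleError`, which is one way to see why the graph has to be acyclic.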
47+
48+
## Challenge: Localize Airflow to the project directory
49+
50+
Even when installed using `pip` within a [`virtualenv`][virtualenv] environment,
51+
all `airflow` commands will be run against the default locations in the user's
52+
home directory. In particular, if you want to test a DAG from your project
53+
directory, the method given in the [Airflow documentation][airflow_docs] is to
54+
copy the DAG into the default location `~/airflow/dags/`, and use the
55+
command-line `airflow` tool to run the tasks defined by the nodes. The
56+
information about success and failure of the tests will be stored by the Airflow
57+
daemon in the `~/airflow/` directory. We'd rather keep all input and output
58+
from our tests confined to the project directory instead. This helps avoid any side
59+
effects that might arise from running tests for different projects, and also
60+
ensures that tests can't affect anything in the default directory, which may be
61+
used for production in many cases.
62+
63+
The solution is to choose a directory in your project, and set the environment
64+
variable `$AIRFLOW_HOME` whenever you run the tests, or use the `airflow`
65+
command on the project DAGs. I recommend you add the command
66+
```bash
67+
export AIRFLOW_HOME=/your/desired/full/path/
68+
```
69+
to a script (ours is called `env.sh`) that will be run in any shell dealing with
70+
the 'localized' Airflow instance, because forgetting to set the variable for
71+
even one `airflow` command will corrupt the DAG states stored in the global
72+
area. Note that setting this variable is necessary even when running in a
73+
`virtualenv` environment.
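If your tests run under `pytest`, one way to avoid forgetting the variable is to set it from a `conftest.py` at the project root. As far as we know, Airflow resolves `AIRFLOW_HOME` when `airflow` is first imported, and pytest calls the `pytest_configure` hook before it imports any test modules, so the value is in place in time. The sketch below assumes a hypothetical `airflow_home` subdirectory of the project and a hypothetical helper name; it uses `setdefault`, so a value already exported by a script like `env.sh` still wins.

```python
# conftest.py at the project root (hypothetical layout)
import os
from pathlib import Path


def localize_airflow_home(project_dir, subdir="airflow_home"):
    """Point AIRFLOW_HOME at <project_dir>/<subdir> unless it is already set."""
    airflow_home = Path(project_dir).resolve() / subdir
    # setdefault: a value exported by env.sh (or a CI config) takes precedence.
    os.environ.setdefault("AIRFLOW_HOME", str(airflow_home))
    return os.environ["AIRFLOW_HOME"]


def pytest_configure(config):
    # pytest runs this hook before collecting (and importing) test modules.
    # Do NOT import airflow at the top of this file, or it will read the
    # environment before AIRFLOW_HOME has been set.
    # `config.rootpath` requires pytest >= 6.1.
    localize_airflow_home(config.rootpath)
```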
74+
75+
Now that you have `$AIRFLOW_HOME` set, you'll likely want to load some DAGs that
76+
you've written. This is made easier if you put the files defining them into a
77+
`dags` directory inside the directory denoted by `$AIRFLOW_HOME`. In other words, it's wise
78+
to structure the project sub-directory dealing with Airflow and Airflow DAGs
79+
similarly to the default location, but in your project directory. At this
80+
point, you should have some `$AIRFLOW_HOME` directory as a subdirectory of your
81+
project directory, and then some `$AIRFLOW_HOME/dags` directory, where you keep
82+
any Python files defining Airflow DAGs, and their dependencies. Another
83+
advantage of this structure is that it's likely the directory structure you'll use in
84+
production, and replicating it simplifies deployment.
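With this layout in place, tests can discover DAG files from it instead of hard-coding paths. The helper below is a sketch under our own assumptions: `dag_files` is a hypothetical name, and skipping files whose names start with an underscore (such as `__init__.py`) is our convention, not an Airflow rule. It reads `$AIRFLOW_HOME` unless a directory is passed in explicitly.

```python
import os
import tempfile
from pathlib import Path


def dag_files(airflow_home=None):
    """Return the DAG-defining .py files in <AIRFLOW_HOME>/dags, sorted by path.

    Falls back to the AIRFLOW_HOME environment variable (KeyError if unset).
    Skipping names that start with "_" (e.g. __init__.py) is our own
    convention for helper modules, not something Airflow requires.
    """
    home = Path(airflow_home or os.environ["AIRFLOW_HOME"])
    dags_dir = home / "dags"
    if not dags_dir.is_dir():
        raise FileNotFoundError("expected a DAG folder at {}".format(dags_dir))
    return sorted(p for p in dags_dir.glob("*.py") if not p.name.startswith("_"))


if __name__ == "__main__":
    # Throwaway project that mirrors the ~/airflow layout: <home>/dags/*.py
    with tempfile.TemporaryDirectory() as project:
        home = Path(project) / "airflow_home"
        (home / "dags").mkdir(parents=True)
        (home / "dags" / "common_api_workflows.py").write_text("# DAGs here\n")
        print([p.name for p in dag_files(home)])
```

Sorting keeps the list in the same order on every machine, which makes any tests generated from it easier to compare across runs.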
85+
86+
Finally, Airflow will leave a number of files in the `$AIRFLOW_HOME` directory
87+
which you are not likely to want to track in source control (e.g., `git`).
88+
These files are:
89+
90+
* `$AIRFLOW_HOME/airflow.cfg`
91+
* `$AIRFLOW_HOME/airflow.db`
92+
* `$AIRFLOW_HOME/logs/`
93+
* `$AIRFLOW_HOME/unittests.cfg`
94+
95+
Add these files to `.gitignore` or the equivalent.
96+
97+
[virtualenv]: https://github.com/pypa/virtualenv
98+
[airflow_docs]: https://airflow.apache.org/docs/stable/tutorial.html#testing
99+
100+
## Smoketesting: Can the Airflow daemon load the DAGs?
101+
102+
Note that we're using `pytest` for our unit testing, and so most examples assume
103+
this.
104+
105+
The most basic test you'll want is to determine whether your DAGs can load
106+
without errors. To do this, you can use the following function:
107+
108+
```python
109+
from airflow.models import DagBag
110+
111+
def test_dags_load_with_no_errors():
112+
    dag_bag = DagBag(include_examples=False)  # skip Airflow's bundled example DAGs
113+
    dag_bag.process_file('common_api_workflows.py')
114+
    assert len(dag_bag.import_errors) == 0  # import_errors maps file path -> error
115+
```
116+
117+
We initialize a `DagBag` (this loads DAG files). With the `process_file` method,
118+
we instruct the `DagBag` to attempt to load any DAGs defined in the
119+
`common_api_workflows.py` file. We then check to make sure loading the DAGs
120+
didn't produce any errors.
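To see what kind of failure this check guards against without installing Airflow, here is a rough standard-library analog. It is not Airflow's implementation, and `collect_import_errors` is a hypothetical name: it executes each file as a throwaway module and records any exception keyed by file path, similar in spirit to `DagBag.import_errors`, which maps file paths to error messages. Unlike `DagBag`, it performs no checks on the DAG objects themselves (cycle detection, for example).

```python
import importlib.util
import tempfile
import traceback
from pathlib import Path


def collect_import_errors(paths):
    """Execute each file as a throwaway module and map path -> error text.

    Only exceptions raised while the file runs are reported; Airflow-specific
    validation of the resulting DAG objects is out of scope here.
    """
    errors = {}
    for path in map(Path, paths):
        spec = importlib.util.spec_from_file_location("_smoke_" + path.stem, str(path))
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)
        except Exception:  # SyntaxError, ImportError, NameError, ... all count
            errors[str(path)] = traceback.format_exc(limit=1)
    return errors


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp, "good_dag.py")
        bad = Path(tmp, "bad_dag.py")
        good.write_text("x = 1\n")
        bad.write_text("import module_that_does_not_exist\n")
        print(list(collect_import_errors([good, bad])))  # only the bad_dag.py path
```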
121+
122+
## Hint: Use functions to create DAGs
123+
124+
This will increase testability. You can test the function, bypassing the need to
125+
load the DAG into the `DagBag` (except when you're actually testing that it
126+
*can* be loaded). This may seem obvious, but none of the Airflow documentation
127+
uses this pattern. Here is an example of a function that creates a simple DAG,
128+
and a test of the function:
129+
130+
```python
131+
from airflow import DAG
132+
from airflow.operators.bash_operator import BashOperator
133+
134+
def create_dag(
135+
        source,
136+
        script_location,
137+
        dag_id,
138+
        crontab_str=None,
139+
        default_args=DAG_DEFAULT_ARGS):
140+
141+
    dag = DAG(
142+
        dag_id=dag_id,
143+
        default_args=default_args,
144+
        schedule_interval=crontab_str,
145+
        catchup=False
146+
    )
147+
148+
    with dag:
149+
        start_task = BashOperator(
150+
            task_id='{}_starting'.format(source),
151+
            bash_command='echo Starting {} workflow'.format(source),
152+
            dag=dag
153+
        )
154+
155+
        run_task = BashOperator(
156+
            task_id='get_{}_images'.format(source),
157+
            bash_command='python {} --mode default'.format(script_location),
158+
            dag=dag
159+
        )
160+
161+
        start_task >> run_task
162+
163+
    return dag
164+
165+
def test_create_dag_creates_correct_dependencies():
166+
    dag = create_dag(
167+
        'test_source',
168+
        'test_script_location',
169+
        'test_dag_id'
170+
    )
171+
    start_id = 'test_source_starting'
172+
    run_id = 'get_test_source_images'
173+
    start_task = dag.get_task(start_id)
174+
    assert start_task.upstream_task_ids == set()
175+
    assert start_task.downstream_task_ids == set([run_id])
176+
    run_task = dag.get_task(run_id)
177+
    assert run_task.upstream_task_ids == set([start_id])
178+
    assert run_task.downstream_task_ids == set([])
179+
```
180+
181+
Here, we assume that `DAG_DEFAULT_ARGS` is defined earlier in the file. See the
182+
Airflow documentation for details about default DAG arguments. Now, this
183+
function is testable (great!) but it doesn't actually make the DAG it creates
184+
known to the Airflow daemon. To do that, we have to place the created DAG into
185+
the global scope of the module defined by the file, which can be done with the
186+
following snippet:
187+
```python
188+
globals()[dag_id] = create_dag(
189+
    source,
190+
    script_location,
191+
    dag_id
192+
)
193+
```
194+
Here, it's assumed that `source`, `script_location`, and `dag_id` are defined
195+
earlier in the Python file.
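The same `globals()` pattern extends to generating several DAGs from a list of settings, which is where a factory function pays off. The sketch below is plain Python so it runs without Airflow: `create_dag` here is a hypothetical stand-in that returns a dictionary instead of a `DAG`, and the `SOURCES` entries and paths are made up. The point it illustrates is that the `DagBag` finds DAGs by scanning the module's top-level names, so each generated DAG needs its own global name.

```python
# Hypothetical settings: one DAG per image source.
SOURCES = [
    {"source": "flickr", "script_location": "/scripts/flickr.py", "crontab_str": "0 0 * * *"},
    {"source": "met", "script_location": "/scripts/met.py", "crontab_str": None},
]


def create_dag(source, script_location, dag_id, crontab_str=None):
    """Stand-in for the real factory; returns a plain dict instead of a DAG."""
    return {
        "dag_id": dag_id,
        "schedule_interval": crontab_str,
        "tasks": ["{}_starting".format(source), "get_{}_images".format(source)],
        "script": script_location,
    }


for settings in SOURCES:
    dag_id = "{}_workflow".format(settings["source"])
    # Bind each result to its own module-level name; reusing a single name
    # would leave only the last DAG visible to anything scanning the globals.
    globals()[dag_id] = create_dag(
        settings["source"],
        settings["script_location"],
        dag_id,
        crontab_str=settings["crontab_str"],
    )
```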
196+
197+
We hope that these hints are helpful to the reader. For more, and for the
198+
context around the snippets shown here, please take a look at
199+
[the repo][frozen_cccatalog].
200+
201+
[frozen_cccatalog]: https://github.com/creativecommons/cccatalog/tree/c4b80600eb5695cc294e1791ba90bdc3a408b7b9/src/cc_catalog_airflow
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
title: CC Platform Toolkit Revamp - 3
2+
---
3+
categories:
4+
community
5+
platform-toolkit
6+
outreachy
7+
---
8+
author: apdsrocha
9+
---
10+
series: outreachy-dec-2019-platform-toolkit
11+
---
12+
pub_date: 2020-01-22
13+
---
14+
body:
15+
Last time I checked in, I was working on revisiting the current [Platform Toolkit](https://creativecommons.org/platform/toolkit/) and making a first draft suggesting changes in both content and structure.
16+
17+
It was a lot of work, but I'm finally happy with the wireframe that came out of all the research and experimentation. The original material went through quite a few modifications, with text rewrites and changes in the order and organization of the content. But now comes the important part: making sure that these changes make sense to the users. My perspective is already a little skewed, since I've been immersed in this project for the past 7 weeks. From now on, the process of validating this material needs fresh eyes. That way, improvements can be made based on user feedback, and the design can reflect the best possible version when the time comes to implement it.
18+
19+
For the next two weeks, my schedule focuses on two different activities. First, I'll be running a round of user interviews in which I intend to show my wireframe and present a few tasks. The idea is to see how both content and usability perform in this new format. In parallel, I've also begun taking these wireframed components and sketching them out in a more refined UI format by experimenting with color, type, and so on. I really wanted to get an early start on this task (even if it's subject to change as the research gives me further insights), because I feel that making the visual part come together will be the hardest part for me.
20+
21+
Thankfully, I have very supportive mentors and help all-around from the CC staff and community. I'm really happy with how this project is coming together and I hope in two weeks I can come back here and report on great progress!
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
1+
title: CC Vocabulary - My First Four Weeks
2+
---
3+
categories:
4+
5+
cc-vocabulary
6+
product
7+
outreachy
8+
---
9+
author: conye
10+
---
11+
series: outreachy-dec-2019-vocabulary
12+
---
13+
pub_date: 2020-01-14
14+
---
15+
body:
16+
17+
[Vocabulary](https://cc-vocabulary.netlify.com/) is Creative Commons's web design system, an extension of
18+
[CC Style Guide](https://creativecommons.org/wp-content/uploads/2019/10/Creative-Commons-Style-Guide-2019.pdf)
19+
for the web. This project was originally started by my mentors [Dhruv Bhanushali](https://opensource.creativecommons.org/blog/authors/dhruvkb)
20+
and [Hugo Solar](https://opensource.creativecommons.org/blog/authors/hugosolar) to unify all CC websites and applications.
21+
Vocabulary has been undergoing a lot of changes lately. As part of my Outreachy internship,
22+
I will be contributing to extending its scope and usage.
23+
24+
## My Progress so far...
25+
Before my first contribution, Vocabulary consisted of reusable UI components built with Vue.js and a live styleguide built with Styleguidist.
26+
My first task was to create an interactive playground experience with Storybook, which would eventually replace the one built with Styleguidist.
27+
Storybook was chosen for two main reasons:
28+
- It provides a workbench environment where you can develop components in isolation, and play around with, customize, and test them as you go.
29+
- It includes Storybook Docs, which generates design system documentation that you can customize and use to share best practices with your team.
30+
By contrast, Styleguidist mainly produces a UI documentation site, something Storybook Docs does better.
31+
32+
My first three weeks were mostly about struggling and learning. Coming from a React background, I had to reconfigure my brain to understand Vue
33+
and Storybook for Vue. I did a lot of reading of documentation and articles, testing locally, and asking for help when stuck. I eventually completed
34+
the task and created a pull request. A live version of the interactive playground can be seen
35+
[here](https://cc-vue-vocabulary.netlify.com/storybook/?path=/story/vocabulary-welcome--welcome).
36+
Please feel free to play around with it and give feedback.
37+
38+
The following week, I worked on updating some Vue Vocabulary components with the CC Vocabulary design library.
39+
40+
## What's Next...
41+
This week, I will be working on extending the usage of Vocabulary to other CC websites that are not built with Vue.
42+
What I worked on previously was [Vue-Vocabulary](https://cc-vue-vocabulary.netlify.com/) for CC websites and
43+
platforms that support Vue. The goal is to use the already developed stylesheets in Vocabulary to build the
44+
same functional components with vanilla JavaScript.
45+
46+
All in all, the past weeks have been great. I will share more in two weeks.
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
1+
title: Improving CC License Chooser: Coding
2+
---
3+
author: obulat
4+
---
5+
categories:
6+
7+
outreachy
8+
cc-chooser
9+
---
10+
series: outreachy-dec-2019-chooser
11+
---
12+
pub_date: 2020-01-24
13+
---
14+
body:
15+
During the last several weeks, I have been busy coding the redesigned version of the License Chooser.
16+
17+
When I first started coding the License Chooser, it wouldn't compile due to some dependency problems. This became a great opportunity to migrate the project from a Webpack-based Vue.js template to a Vue CLI project, which makes managing dependencies much simpler. I also updated the project to a newer version of Creative Commons Vue Vocabulary for styling.
18+
19+
For the visual styles, we use the Buefy component library (based on Bulma and Vue.js), namely its stepper and tabs components. It has been an interesting journey customizing them for our specific use cases.
20+
21+
While coding the site, I also tried to extract all the text to a separate file so that it would be easier to integrate it into the translation workflow later during my internship.
22+
23+
After several weeks of work and two large merged PRs, we are finally ready to conduct usability tests. They will help us better understand how users interact with the License Chooser, and what changes we still need to make so that both beginners and advanced users of Creative Commons licenses can easily choose the best license for their needs and get help using the chosen license.
24+
25+
A new [beta version of the License Chooser](https://chooser-beta.creativecommons.org/) is deployed!
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
name: Outreachy Dec 2019 round: CC Vocabulary

content/contents.lr

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ So if you're looking to integrate CC licenses or CC licensed works into your app
42 42
<div class="card-body" align="center">
43 43
<p class="card-text text-center">Use CC-licensed works in your application or (coming soon!) use our WordPress plugin to license your content.</p>
44 44
<a href="https://api.creativecommons.engineering/" class="btn btn-sm btn-outline-primary">Catalog API</a>
45-
<a href="/projects" class="btn btn-sm btn-outline-primary">All our projects</a>
45+
<a href="/contributing-code/projects" class="btn btn-sm btn-outline-primary">All our projects</a>
46 46
</div>
47 47
</div>
48 48

content/contributing-code/contents.lr

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ _model: page
2 2
---
3 3
_template: page-with-toc.html
4 4
---
5-
title: Contributing Code
5+
title: Contribution Guidelines
6 6
---
7 7
body:
8 8

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
_model: page
2+
---
3+
_template: project_list.html
4+
---
5+
title: Open Source Projects

content/gsoc-2019/application-instructions/contents.lr

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ body:
8 8

9 9
If you are a student interested in submitting a proposal to CC, start by checking out our [Project Ideas](/gsoc-2019/project-ideas/all) page to find an idea that you would like to write a proposal to work on during GSoC.
10 10

11-
[Join the `#cc-gsoc` channel on the CC Slack or the CC Developers mailing list](https://creativecommons.github.io/community/) as early as possible to introduce yourself and get feedback on your ideas. All our mentors will be on Slack and respond to emails on the mailing list and it is better to post there rather than contact them individually. Feel free to ask questions!
11+
[Join the `#cc-gsoc` channel on the CC Slack or the CC Developers mailing list](/community/) as early as possible to introduce yourself and get feedback on your ideas. All our mentors will be on Slack and respond to emails on the mailing list and it is better to post there rather than contact them individually. Feel free to ask questions!
12 12

13 13
Take a look at the Creative Commons website to learn more about what we do. Also look at our GitHub organization and our developer community website to get a sense of the code and projects we work on. Making a successful contribution to one of our projects will help us get a sense of your work and is highly recommended.
14 14
