Commit d7ac2aa

Merge master to add_vocabulary branch
2 parents 47ed199 + 46b8793 commit d7ac2aa

20 files changed

+665
-303
lines changed

content/archives/old-tech-blog/entries/the-easiest-way-yet-to-integrate-cc-licensing/contents.lr

+1-1
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@ body:
99

1010
I've been working for the past week or so on a JavaScript licensing widget that has been [suggested on our wiki](http://wiki.creativecommons.org/JsWidget). It's a new way to integrate CC licensing into your web application. It's really as easy as pie: Just add the following tag somewhere in the body:
1111

12-
> <script src="http://api.creativecommons.org/jswidget/tags/0.1/complete.js" />
12+
> <script src="https://api.creativecommons.org/jswidget/tags/0.1/complete.js" />
1313
1414
and a CC licensing widget will appear. Your web application can then use
1515
regular DOM queries to determine the user's choice.
+10
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
username: kss682
2+
---
3+
name: K S Srinidhi Krishna
4+
---
5+
md5_hashed_email: 7E71C293A442A2CF434BFC244BD5F184
6+
---
7+
about:
8+
Srinidhi Krishna is a computer science undergraduate student from India and will be interning with Creative Commons during the summer.
9+
He is working on [cccatalog](https://github.com/creativecommons/cccatalog) as a part of GSoC20.
10+
He is `@K S Srinidhi Krishna` on slack.
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
name: Design

content/blog/entries/2019-09-11-google-docs-plugin/contents.lr

+1-1
Original file line numberDiff line numberDiff line change
@@ -51,7 +51,7 @@ I hope in the future to use an API call to support different languages, and perh
5151
[A video tutorial is available here](https://youtu.be/sQZFlNXEVZ4) or by clicking on the image below.
5252

5353
<a href="http://www.youtube.com/watch?feature=player_embedded&v=sQZFlNXEVZ4
54-
" target="_blank"><img src="http://img.youtube.com/vi/sQZFlNXEVZ4/0.jpg"
54+
" target="_blank"><img src="https://img.youtube.com/vi/sQZFlNXEVZ4/0.jpg"
5555
alt="Video tutorial" border="10" /></a>
5656

5757
---

content/blog/entries/cc-vocabulary-the-main-course/contents.lr

+27-99
Original file line numberDiff line numberDiff line change
@@ -50,105 +50,33 @@ brainchild comes of age.
5050

5151
**Presenting to you our special menu of Vue components.**
5252

53-
<table>
54-
<tbody>
55-
<tr>
56-
<th colspan="2"><h3>Tokens</h3></th>
57-
</tr>
58-
<tr>
59-
<td>Colors</td>
60-
<td><img src="colors.png"></td>
61-
</tr>
62-
<tr>
63-
<td>Fonts</td>
64-
<td><img src="fonts.png"></td>
65-
</tr>
66-
<tr>
67-
<td>Spaces</td>
68-
<td><img src="spaces.png"></td>
69-
</tr>
70-
<tr>
71-
<th colspan="2"><h3>Elements</h3></th>
72-
</tr>
73-
<tr>
74-
<td>Button</td>
75-
<td><img src="button.png"></td>
76-
</tr>
77-
<tr>
78-
<td>InputField</td>
79-
<td><img src="inputfield.png"></td>
80-
</tr>
81-
<tr>
82-
<td>SelectField</td>
83-
<td><img src="selectfield.png"></td>
84-
</tr>
85-
<tr>
86-
<td>Heading</td>
87-
<td><img src="heading.png"></td>
88-
</tr>
89-
<tr>
90-
<td>Paragraph</td>
91-
<td><img src="paragraph.png"></td>
92-
</tr>
93-
<tr>
94-
<td>LicenseBadge</td>
95-
<td><img src="licensebadge.png"></td>
96-
</tr>
97-
<tr>
98-
<td>LicenseIconography</td>
99-
<td><img src="licenseiconography.png"></td>
100-
</tr>
101-
<tr>
102-
<td>ProgressBar</td>
103-
<td><img src="progressbar.png"></td>
104-
</tr>
105-
<tr>
106-
<td>Shield</td>
107-
<td><img src="shield.png"></td>
108-
</tr>
109-
<tr>
110-
<th colspan="2"><h3>Layouts</h3></th>
111-
</tr>
112-
<tr>
113-
<td>Container</td>
114-
<td><img src="container.png"></td>
115-
</tr>
116-
<tr>
117-
<td>Grid</td>
118-
<td><img src="grid.png"></td>
119-
</tr>
120-
<tr>
121-
<td>Table</td>
122-
<td><img src="table.png"></td>
123-
</tr>
124-
<tr>
125-
<th colspan="2"><h3>Patterns</h3></th>
126-
</tr>
127-
<tr>
128-
<td>Header</td>
129-
<td><img src="header.png"></td>
130-
</tr>
131-
<tr>
132-
<td>Footer</td>
133-
<td><img src="footer.png"></td>
134-
</tr>
135-
<tr>
136-
<td>Locale</td>
137-
<td><img src="locale.png"></td>
138-
</tr>
139-
<tr>
140-
<td>Hello</td>
141-
<td><img src="hello.png"></td>
142-
</tr>
143-
<tr>
144-
<th colspan="2"><h3>Templates</h3></th>
145-
</tr>
146-
<tr>
147-
<td>Index</td>
148-
<td><img src="orange.png" onclick="this.src = 'green.png'"></td>
149-
</tr>
150-
</tbody>
151-
</table>
53+
| **Component** | **Image** |
54+
| ------------------ | :------------------------------------------------------ |
55+
| **Tokens** | |
56+
| Colors | ![colors_image](colors.png) |
57+
| Fonts | ![fonts_image](fonts.png) |
58+
| Spaces | ![spaces_image](spaces.png) |
59+
| **Elements** | |
60+
| Button | ![button_image](button.png) |
61+
| InputField | ![inputfield_image](inputfield.png) |
62+
| SelectField | ![selectfield_image](selectfield.png) |
63+
| Heading | ![heading_image](heading.png) |
64+
| Paragraph | ![paragraph_image](paragraph.png) |
65+
| LicenseBadge | ![licensebadge_image](licensebadge.png) |
66+
| LicenseIconography | ![licenseiconography_image](licenseiconography.png) |
67+
| ProgressBar | ![progressbar_image](progressbar.png) |
68+
| Shield | ![shield_image](shield.png) |
69+
| **Layouts** | |
70+
| Container | ![container_image](container.png) |
71+
| Grid | ![grid_image](grid.png) |
72+
| Table | ![table_image](table.png) |
73+
| **Patterns** | |
74+
| Header | ![header_image](header.png) |
75+
| Footer | ![footer_image](footer.png) |
76+
| Locale | ![locale_image](locale.png) |
77+
| Hello | ![hello_image](hello.png) |
78+
| **Templates** | |
79+
| Index | <img src="orange.png" onclick="this.src = 'green.png'"> |
15280

15381
&nbsp;
15482

Original file line numberDiff line numberDiff line change
@@ -0,0 +1,35 @@
1+
title: CC Legal Database: Design
2+
---
3+
categories:
4+
cc-legal-database
5+
product
6+
outreachy
7+
---
8+
author: krysal
9+
---
10+
series: outreachy-may-2020-legal-database
11+
---
12+
pub_date: 2020-06-09
13+
---
14+
body:
15+
I am finishing the third week since the project started (for context see this [first post](/blog/entries/legal-database-a-new-beginning/)), so the design phase is almost over and a new site look is fresh out of the oven. The focus of these weeks was to draw the mockups for the user-facing parts of the site, integrating styles from CC Vocabulary, and to define the data model for the database.
16+
17+
### Visual Styles
18+
The intention was to keep the content that is already present but improve its distribution and access by users. For this, the main menu was changed to provide direct links to listing of Cases and Scholarships. The old "Countries" page was removed and replaced by a more granular division by legal resource, so this data will be shown separately.
19+
20+
The final look for the home site is as follows.
21+
22+
<div style="text-align: center;">
23+
<figure>
24+
<img src="cc-caselaw-home.png" alt="New CC Caselaw Home Mockup" style="border: 1px solid black">
25+
<figcaption>New Home page design with Vocabulary.</figcaption>
26+
</figure>
27+
</div>
28+
29+
I made use of as many Vocabulary components as possible, like header, footer and table. This way it is easier to keep consistency between CC products and to develop the frontend part of the site, because those components are already built and tested. Some will require certain modifications (e.g. card link with a search input), and others have to be created from scratch, like a pagination component that is now required for two sites.
30+
31+
### Data Model
32+
33+
The second main task I worked on was coding the models on Django, which is in charge of creating the database schema through migrations. For this, I had to review the sources of information (CSV files, sheets, forms) and how they are used. The key point here is to keep constant communication with staff who are more involved in the *business case*.
34+
35+
Several iterations were required for each task as well as some researching, and while the engineering and design work never seems to end, this makes good foundations to continue and advance.
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,35 @@
1+
title: Resource Gathering
2+
---
3+
categories:
4+
outreachy
5+
design
6+
open-source
7+
---
8+
author: dhruvi16
9+
---
10+
series: ux-design-cycle
11+
---
12+
pub_date: 2020-06-09
13+
---
14+
body:
15+
16+
17+
As an Outreachy intern, I am handling the integration of the [Creative Commons](https://creativecommons.org/) design library, [Vocabulary](https://cc-vocabulary.netlify.app/), with one of our web products, [CC OS](https://opensource.creativecommons.org/). I have been working with the design library for 3–4 months now, I have enjoyed the experience it offers, and I am trying to achieve the same experience on the Open Source website. To understand UX in depth, I have been reading different resources and documenting what I learn in this series of blog posts. This knowledge will help me achieve the desired experience through the library.
18+
19+
Following the Coursera course [Introduction to user experience](https://www.coursera.org/learn/user-experience-design/), I will describe the UX design cycle in a series of articles. This article covers the first step of the design cycle: Resource Gathering.
20+
21+
**Basic definitions**
22+
23+
User experience design involves designing interfaces through which a user accomplishes a task. The aim is to design better interfaces that help the user perform tasks easily.
24+
25+
The interface consists of an input and output through which the user interacts with the system. For instance, clicking a photo requires the user to press the button (input) and an image is the desired output. Creating an affordable and usable interface is the main goal of this process. Design is a data-driven process and resource gathering is all about gathering this data.
26+
27+
The resource gathering process is about figuring out how the task is currently accomplished by the user. There are 4 ways to gather data and below I will describe them all in detail. There are two types of data — Quantitative (numeric) and Qualitative (thematic) and designers prefer to use both types of data as per requirement.
28+
29+
1. **Naturalistic observation** - This includes observing the user accomplishing the task in the field. This involves the least interaction with the user and the designer watches the user performing the task from distance. The designer notes down qualitative and quantitative information about this activity. This removes the effect of social desirability of the user on the information collected but also the designer’s perception can be reflected in the collected data.
30+
31+
2. **Surveys** - A survey can be interchangeably used with a questionnaire. In a survey, the user answers a set of questions about how he/she performs the tasks currently. The questions can be closed-ended which can provide quantitative data and also open-ended which gives us the qualitative data. This involves some amount of interaction with the user. Surveys can be held in the field or lab.
32+
33+
3. **Focus Groups** - Focus groups are about engaging with a group of 6–10 people and talking about how they currently perform a task. This involves a lot of interaction with the users. This can be performed in a safe environment (lab) where users can open up without hesitation. The design team includes a moderator who asks relevant questions, a note-taker who notes down the ongoing conversation and a media person (optional) who records video or takes photos of the session.
34+
35+
4. **Interview** - The interview involves asking the user questions one-to-one about how they currently perform the task. This involves the highest amount of interaction with the user. Interviews are held in labs. The designer talks to the user about the task and collects both quantitative and qualitative data. This is the most time-consuming way of collecting data, but it gives the most useful data among all the methods.
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,134 @@
1+
title: Science Museum provider implementation
2+
---
3+
categories:
4+
5+
cc-catalog
6+
gsoc
7+
gsoc-2020
8+
---
9+
author: srinidhi
10+
---
11+
series: gsoc-2020-cccatalog
12+
---
13+
pub_date: 2020-06-10
14+
---
15+
body:
16+
## Introduction
17+
The CC Catalog project is responsible for collecting CC-licensed images available on the web. These images are hosted by different
18+
sources, and the sources that provide the images and their metadata are called providers. Currently, images are collected from providers using two methods:
19+
Common Crawl and API-based crawl. Common Crawl is an open repository of web-crawled data, and we use that data to get the necessary image metadata
20+
for a provider ([more information](https://commoncrawl.org/the-data/get-started/)). An API crawl is implemented using the API endpoint maintained
21+
by the provider. The main problem with Common Crawl is that we don't have control over the data they crawl, and this sometimes results in poor
22+
data quality, whereas with an API-based crawl we have access to the information available. An API-based crawl is also better when we need to update image
23+
metadata at regular intervals.
24+
25+
As a part of the internship, I will be working on moving providers from Common Crawl to API-based crawl, as well as integrating new providers
26+
into the API crawl. I will be starting with the Science Museum provider.
27+
28+
## Science Museum
29+
Science Museum is a provider with around 80,000 CC-licensed images. Currently, Science Museum data is ingested from Common Crawl.
30+
It is one provider where our data is of poor quality and there is a need to improve it. This is done by moving
31+
Science Museum to an API-based crawl.
32+
33+
## API research
34+
We want to index metadata using their open API [endpoint](https://collection.sciencemuseumgroup.org.uk/search/has_image/image_license).
35+
However, before the implementation we have to ensure that the API provides necessary content and there is a systematic way to get it.
36+
The first step is to take an object from their collection and check certain criteria.
37+
38+
[sample object](https://collection.sciencemuseumgroup.org.uk/api/objects/co8005638)
39+
40+
The criteria are:
41+
- parameters available for the API
42+
- Object landing url (frontend link of the object the image is associated with)
43+
- Image url (the url link of the image)
44+
- CC license associated with the image
45+
- creator, title and other metadata info
46+
47+
Once the above checks have been made, we need to find a way to get all the objects. This could be by paging through the records
48+
or by partitioning with the parameters, etc. Since their API has a ```page[number]``` parameter, paging would be an appropriate choice. With a max page size
49+
of 100 it would require around 800 pages to get all the objects, but they don't allow paging through a large number of results:
50+
the max number of pages for Science Museum is 50. This would mean we would get only 5000 objects and around 17000 images.
51+
52+
[API page-50](https://collection.sciencemuseumgroup.org.uk/search/image_license?page[size]=100&page[number]=50)
53+
54+
[API page-51](https://collection.sciencemuseumgroup.org.uk/search/image_license?page[size]=100&page[number]=51)
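
The arithmetic behind this limit can be checked with a short sketch. The figures (a collection of roughly 80,000, a page size of 100, a cap of 50 pages) are the ones quoted in this post; the constant and function names are only illustrative.

```python
import math

TOTAL_RECORDS = 80_000  # approximate collection size quoted in this post
PAGE_SIZE = 100         # maximum value of page[size]
MAX_PAGES = 50          # last page number the API serves


def pages_needed(total, page_size):
    """Number of pages required to walk through `total` records."""
    return math.ceil(total / page_size)


def reachable_records(page_size, max_pages):
    """Upper bound on records reachable through plain paging."""
    return page_size * max_pages


print(pages_needed(TOTAL_RECORDS, PAGE_SIZE))      # 800
print(reachable_records(PAGE_SIZE, MAX_PAGES))     # 5000
```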
55+
56+
So we need to find a way to divide the collection into subsets such that each subset has less than or equal to 5000 objects.
57+
Luckily, the API had another set of parameters, ```date[from]``` and ```date[to]```, which represent the time period of the object.
58+
Querying the API over different time periods, while ensuring that the records in each time period don't exceed 5000, solves the problem. Starting
59+
from year 0 to year 2020, suitable year ranges were chosen by trial and error.
60+
61+
```
YEAR_RANGE = [
    (0, 1500),
    (1500, 1750),
    (1750, 1825),
    (1825, 1850),
    (1850, 1875),
    (1875, 1900),
    (1900, 1915),
    (1915, 1940),
    (1940, 1965),
    (1965, 1990),
    (1990, 2020)
]
```
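
As a rough sketch of how the year ranges and paging could be combined, the snippet below only builds the query URLs and makes no network calls. The base URL and parameter names are taken from the links above. The helper names, the shortened range list, and the assumption that page numbering starts at 0 are illustrative and not taken from the actual script.

```python
from urllib.parse import urlencode

BASE_URL = "https://collection.sciencemuseumgroup.org.uk/search/image_license"
PAGE_SIZE = 100
MAX_PAGES = 50


def build_query(from_year, to_year, page_number, page_size=PAGE_SIZE):
    """Return the search URL for one page of one year range."""
    params = {
        "date[from]": from_year,
        "date[to]": to_year,
        "page[size]": page_size,
        "page[number]": page_number,
    }
    # urlencode percent-encodes the square brackets in the parameter names.
    return f"{BASE_URL}?{urlencode(params)}"


def iter_queries(year_ranges, max_pages=MAX_PAGES, first_page=0):
    """Yield every (year range, page) URL; first_page=0 is an assumption."""
    for from_year, to_year in year_ranges:
        for page_number in range(first_page, first_page + max_pages):
            yield build_query(from_year, to_year, page_number)


# Shortened range list, just for the example.
for url in iter_queries([(0, 1500), (1500, 1750)], max_pages=2):
    print(url)
```

A real crawler would probably stop paging a range as soon as a page comes back empty instead of always requesting every page.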
76+
77+
78+
With this we have a method to ingest the desired records, but before writing the script we need to know the different licenses
79+
provided by the API. We need to figure out a consistent way to identify which license and version are attached to each object.
80+
To do this, we ran a test script to get counts of objects under different licenses.
81+
82+
The results are:
83+
84+
```
+-----------------+----------+
| license_version | count(1) |
+-----------------+----------+
| CC-BY-NC-ND 2.0 |      210 |
| CC-BY-NC-ND 4.0 |     2376 |
| CC-BY-NC-SA 2.0 |        1 |
| CC-BY-NC-SA 4.0 |    61694 |
+-----------------+----------+
```
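
One possible way to turn such labels into a license and a version is sketched below. It assumes the label always has the form shown in the table (`CC-<license> <version>`); the function name and the lower-case output format are illustrative choices, and the real API field may look different.

```python
def parse_license(label):
    """Split a label such as 'CC-BY-NC-SA 4.0' into ('by-nc-sa', '4.0')."""
    name, version = label.strip().rsplit(" ", 1)
    if not name.upper().startswith("CC-"):
        raise ValueError(f"not a CC license label: {label!r}")
    return name[3:].lower(), version


# Counts copied from the table above.
COUNTS = {
    "CC-BY-NC-ND 2.0": 210,
    "CC-BY-NC-ND 4.0": 2376,
    "CC-BY-NC-SA 2.0": 1,
    "CC-BY-NC-SA 4.0": 61694,
}

for label, count in COUNTS.items():
    license_, version = parse_license(label)
    print(license_, version, count)
```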
94+
95+
Since the licenses and their versions are confirmed, we can start the implementation.
96+
97+
## Implementation
98+
The implementation is quite simple in nature: we loop through the ```YEAR_RANGE```, get all the records for each period, and
99+
pass them on to an object data handler method that extracts the necessary details from each record and stores them in the ```ImageStore```
100+
instance. ImageStore is a class that stores image information from the provider; it keeps the information in a buffer and writes it to a TSV file
101+
when the buffer reaches a threshold limit. Due to overlapping date ranges, the metadata for some objects is collected multiple times.
102+
So, we keep track of the record/object's id in a global variable ```RECORD_IDS = []```.
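
The buffer-then-write idea can be illustrated with a minimal sketch. This is not the real ```ImageStore``` class: the class name, the column layout and the tiny buffer limit are all hypothetical, and the output goes to an in-memory string instead of a file.

```python
import csv
import io


class BufferedTsvWriter:
    """Hypothetical stand-in showing buffered writes to a TSV stream."""

    def __init__(self, out, buffer_limit=3):
        self._writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        self._buffer = []
        self._limit = buffer_limit

    def add(self, row):
        """Queue a row and flush once the buffer reaches its limit."""
        self._buffer.append(row)
        if len(self._buffer) >= self._limit:
            self.flush()

    def flush(self):
        """Write all queued rows and empty the buffer."""
        self._writer.writerows(self._buffer)
        self._buffer.clear()


out = io.StringIO()
store = BufferedTsvWriter(out, buffer_limit=2)
store.add(["img-1", "https://example.org/1.jpg", "by-nc-sa", "4.0"])
store.add(["img-2", "https://example.org/2.jpg", "by-nc-nd", "4.0"])
store.flush()  # write any rows still waiting in the buffer
print(out.getvalue())
```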
103+
104+
Within the object data handler method before collecting details we check if the ```id``` already exists in ```RECORD_IDS```.
105+
If it exists we move on to the next record.
106+
107+
```
for obj_ in batch_data:
    id_ = obj_.get("id")
    # Skip objects already collected in an overlapping year range.
    if id_ in RECORD_IDS:
        continue
    RECORD_IDS.append(id_)
```
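
A variation on this check, sketched under the assumption that only membership matters, keeps the ids in a set so that each lookup stays fast as the number of records grows. The function name and the sample batches are made up for the example.

```python
def unique_records(batches):
    """Yield each record once, even if its id appears in several batches."""
    seen_ids = set()
    for batch in batches:
        for obj in batch:
            id_ = obj.get("id")
            if id_ in seen_ids:
                continue
            seen_ids.add(id_)
            yield obj


# Two hypothetical batches from overlapping year ranges.
batches = [
    [{"id": "co1"}, {"id": "co2"}],
    [{"id": "co2"}, {"id": "co3"}],
]
print([obj["id"] for obj in unique_records(batches)])
```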
114+
115+
```id_``` is the object id, and we cannot use this value as the foreign identifier. The reason is that an object could
116+
have multiple images, and the object id does not identify an image uniquely, so we must use an image id that is unique
117+
for each image. Currently the image id is taken from ```multimedia```, a field in the JSON response that lists multiple
118+
images and their metadata. For each image entry in multimedia, the foreign id is in ```admin.uid```.
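
A small sketch of that lookup follows. It assumes ```multimedia``` sits directly on the record dictionary handed to the function and that each entry nests the id as ```admin.uid```; the sample record and the function name are invented for illustration, and the real response may nest these fields more deeply.

```python
def image_ids(record):
    """Collect the admin.uid of every image listed under multimedia."""
    ids = []
    for image in record.get("multimedia", []):
        uid = image.get("admin", {}).get("uid")
        if uid is not None:
            ids.append(uid)
    return ids


# Invented record: one object with two usable images and one without a uid.
sample = {
    "id": "co0000001",
    "multimedia": [
        {"admin": {"uid": "i1"}},
        {"admin": {"uid": "i2"}},
        {"admin": {}},
    ],
}
print(image_ids(sample))
```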
119+
120+
The implementation can be found [here](https://github.com/creativecommons/cccatalog/blob/master/src/cc_catalog_airflow/dags/provider_api_scripts/science_museum.py).
121+
122+
### Results:
123+
Running the scripts we get:
124+
- Number of records received : ```35584```
125+
- Number of images collected : ```62497```
126+
127+
The problem with the current implementation is that records with no date would be missed.
128+
129+
Science Museum is the first provider I worked on as a part of the internship, and I thank my mentor Brent Moran for the help.
130+
131+
### Additional Details :
132+
- [research work](https://github.com/creativecommons/cccatalog/issues/302)
133+
- [implementation](https://github.com/creativecommons/cccatalog/pull/400)
134+
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
name: GSoC 2020: CC catalog
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
name: UX Design Cycle
