Discussion: Project to Quantify the Commons #17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@TimidRobot I've created this discussion ticket on the basis of the proposal you prepared and submitted to UMSI. I'm still thinking about how to refine the specific questions in terms of findings. In doing so, I've also been recalling past conversations on this challenge and topic with @mathemancer and @kgodey. If they are able to read through and share any feedback, we would certainly welcome their input.
Really cool to see you all are getting back into this!

First question (how many works)

One quick refinement of the first question above might be to choose one of
Answering the first question requires more caution in data gathering, and probably leaning more on the APIs of the major known data sources, or simply looking it up by hand. A basic approach would be to ask the appropriate person "How many Wikipedia pages are there?", "How about broken down by language?", "How many files of various licenses are hosted by Wikimedia Commons?", etc. The problem is that this likely won't result in anything but a bare count with very little metadata, and the structure described in the prompt requires more. So, you'd probably need to use the WMC API (continuing the Wikipedia example) to gather the data. This can be slow, and requires a fair amount of domain knowledge about which APIs to hit and their structure. For example, the CC Catalog needed about 6 months to completely refresh the image metadata from Wikimedia Commons due to rate limits on their API.

That would only cover media files. You'd also need to figure out a way to gather similar metadata about Wikipedia pages (across all languages). As you're aware, many of the software pieces needed for this gathering are already developed and available, but the infrastructure to actually run that software is non-trivial.

Given all that, I suspect the second and/or third question(s) above will be more appropriate for the project. It's also a more interesting problem to work on. For that, it's probably better to lean on Common Crawl, with some fun analytics to try to estimate how much is missed by a given Common Crawl run (or set of runs). This gives you more flexibility, and less time spent simply waiting for the machinery to gather data. It's also the only "easy" way to get the network data (e.g., links between pages containing CC licenses).

One could tackle all three questions, but for that I'd suggest two completely separate sub-teams, with one working on the first question and the other on the second and third.
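To make the Common Crawl route concrete, here is a rough sketch of how one might spot and tally CC license links in a page's HTML; the regex and function names are mine, not from any existing CC codebase, and a real pipeline would run this over extracted WARC records:

```python
import re
from collections import Counter

# Matches canonical CC license deed URLs, e.g.
# https://creativecommons.org/licenses/by-sa/4.0/
CC_LICENSE_RE = re.compile(
    r"creativecommons\.org/licenses/"
    r"(?P<unit>[a-z-]+)/(?P<version>\d+\.\d+)"
)

def extract_cc_licenses(html: str) -> Counter:
    """Count CC license references (e.g. 'by-sa/3.0') found in raw HTML."""
    return Counter(
        f"{m.group('unit')}/{m.group('version')}"
        for m in CC_LICENSE_RE.finditer(html)
    )

sample = (
    '<a rel="license" href="https://creativecommons.org/licenses/by/4.0/">'
    'CC BY 4.0</a> and '
    '<a href="http://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA</a>'
)
print(extract_cc_licenses(sample))  # Counter({'by/4.0': 1, 'by-sa/3.0': 1})
```

Aggregating these per-page counters across a crawl would yield the per-license totals the first question asks about, though deciding what counts as one "work" (a page? a linked file?) would still need care.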
Second question (rate of change)

This could be tackled with either approach described above, since most API sources track when a work was added to the collection, and Common Crawl runs monthly. In fact, it's probably easier to get a satisfying answer (or set of answers) to this question than to the first, since you don't need to worry as much about missing data.

Third question

This depends on the metadata gathered (clearly). I'd suggest tailoring the details to that.

Fourth question

This is something of a combination of the previous questions. That said, if CC really wants to keep a handle on this over time, they should reduce the scope of the project to what can be robustly implemented in such a way that it's easy to maintain and keep using into the future. The first instinct of most data science students (or indeed most data scientists) given the above questions will be to put together something ad hoc that answers the questions once. That won't suffice in this case, and that will be quite a difficult aspect of the project. Even with all the work and development on CC Catalog, we still had parts of the data-gathering process that required regular manual intervention by an expert.
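As a small illustration of how simple the rate-of-change analysis becomes once monthly totals exist (from either API timestamps or successive crawls), here is a sketch; the function and the totals are hypothetical:

```python
def monthly_growth(counts: list[int]) -> list[float]:
    """Month-over-month growth rates from a series of monthly totals.

    counts[i] is the total number of works observed in month i;
    returns the fractional change between consecutive months.
    """
    return [
        (curr - prev) / prev
        for prev, curr in zip(counts, counts[1:])
    ]

# Hypothetical monthly totals (e.g. CC BY works seen in successive crawls)
totals = [1_000_000, 1_020_000, 1_050_600]
print(monthly_growth(totals))  # [0.02, 0.03]
```

The hard part is not this arithmetic but making the monthly totals comparable, since crawl coverage varies from run to run.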
@mathemancer Thank you! This is very helpful! I especially appreciate your insight about reducing scope and splitting the work between sub-teams.
The project repository for Quantifying the Commons is: creativecommons/quantifying |
Background:
Creative Commons has submitted a project to UMSI, which has determined that it is a potential fit for the course SI 485: Information Analysis Capstone and Final Project. In this course, advanced undergraduate students deliver data-oriented solutions through the development and analysis of data sets, building tools to extract useful information for clients through manipulation, analysis, and visualization. This ticket is intended for discussion of the project, with the goal of refining the potential questions we'd like answered and getting input from those who have considered this challenge in the past.
Project General Information
Project Idea:
Full Description
Project Outcome
What do students need for this project to be successful?
Examples: skills needed, social impact orientation, interest or experience in a specific field/domain/industry.
Data Proposal Information
Data Set
Size of Data Set
How big is the data set? Approximately how many rows and columns does it have?
Findings from Data Set
What do you want to learn from your data set? Please share 3-5 specific questions that the data can help solve:
Data Availability, Type, Format