quantifying

Quantifying the Commons

Overview

This project seeks to quantify the size and diversity of the commons--the collection of works that are openly licensed or in the public domain.

Code of Conduct

CODE_OF_CONDUCT.md:

The Creative Commons team is committed to fostering a welcoming community. This project and all other Creative Commons open source projects are governed by our Code of Conduct. Please report unacceptable behavior to conduct@creativecommons.org per our reporting guidelines.

Contributing

See CONTRIBUTING.md.

Development

Prerequisites

This repository uses pipenv to manage the required Python modules:

Linux: Installing Pipenv
macOS:
1. Install Homebrew
2. Install pipenv:
```
brew install pipenv
```

Tooling

Python Guidelines — Creative Commons Open Source
Black: the uncompromising Python code formatter
flake8: a python tool that glues together pep8, pyflakes, mccabe, and third-party plugins to check the style and quality of some python code.
isort: A Python utility / library to sort imports.

Data Sources

CC Legal Tools

legal-tool-paths.txt
- A text file provided by Timid Robot containing all legal tool paths. Because of data collection quota constraints, the data from Google Custom Search covers only the 50+ most general and significant categories of CC licenses. Note also that the licenses in the first column of the collected data are sorted in order of precedence, a result of intermediate data analysis.
  - add list of all current CC legal tool paths by TimidRobot · Pull Request #7 · creativecommons/quantifying

Flickr

The Flickr API exposes identifiers for users, photos, photosets and other uniquely identifiable objects.
The Flickr API consists of a set of callable methods and some API endpoints.
For a more detailed description, see: API documentation - Flickr Services.
The hs.csv file is a sample CSV of pulled data. Ideally, the script will generate the final data CSVs.
Each license has its own CSV in which its data is saved.
Because of size constraints, the license CSVs are not pushed to GitHub.
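
As an illustration of how a per-license request might be assembled, the following is a minimal sketch rather than the repository's actual script. It assumes the `flickr.photos.search` method with its `license` argument and the public REST endpoint; the helper names and the CSV naming scheme are hypothetical.

```python
from urllib.parse import urlencode

# Public Flickr REST endpoint.
FLICKR_REST_URL = "https://www.flickr.com/services/rest/"


def build_photo_search_url(api_key, license_id, page=1):
    """Build a flickr.photos.search URL restricted to one license id."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "license": license_id,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,  # ask for plain JSON instead of JSONP
    }
    return f"{FLICKR_REST_URL}?{urlencode(params)}"


def csv_name_for_license(license_id):
    """Hypothetical naming scheme: one CSV file per license id."""
    return f"license{license_id}.csv"


url = build_photo_search_url("YOUR_API_KEY", 4)
```

The URL can then be fetched with any HTTP client, and the rows for each license written to the file returned by `csv_name_for_license`.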

Google Custom Search JSON API

The Custom Search JSON API allows detailed, user-defined queries and access to the related query data through a programmable search engine.
- Custom Search JSON API Reference | Programmable Search Engine | Google Developers
- Method: cse.list | Custom Search JSON API | Google Developers
google_countries.tsv
- Created by copying and pasting the cr parameter list from the following link into a .tsv file, as no reliable programmatic way of retrieving this data has been found so far. The script itself takes care of the formatting and the country-selection process.
  - Country Collection Values | JSON API reference | Programmable Search Engine | Google Developers
google_lang.txt
- Created by copying and pasting the lr parameter list from the following link into a .txt file, as no reliable programmatic way of retrieving this data has been found so far. The script itself takes care of the data formatting and the language-selection process.
  - Parameter: lr | Method: cse.list | Custom Search JSON API | Google Developers
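
To show where the cr and lr values end up, here is a minimal sketch of building a `cse.list` request URL. It is an illustration under assumptions, not the repository's script: the function name is hypothetical, and the API key and search engine ID (`cx`) are placeholders you would supply yourself.

```python
from urllib.parse import urlencode

# Endpoint of the Custom Search JSON API (cse.list).
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_url(api_key, cx, query, country=None, language=None):
    """Build a cse.list URL, optionally restricted by country and language.

    country  -- a cr value such as "countryCA"
    language -- an lr value such as "lang_en"
    """
    params = {"key": api_key, "cx": cx, "q": query}
    if country:
        params["cr"] = country
    if language:
        params["lr"] = language
    return f"{CSE_ENDPOINT}?{urlencode(params)}"


restricted = build_cse_url(
    "YOUR_API_KEY", "YOUR_CX", "creative commons",
    country="countryCA", language="lang_en",
)
unrestricted = build_cse_url("YOUR_API_KEY", "YOUR_CX", "creative commons")
```

Leaving `country` and `language` unset omits the cr and lr parameters entirely, so the same helper covers both the global query and the per-country or per-language ones.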

Internet Archive Python Interface

A Python interface to archive.org for making API requests to the Internet Archive.

internetarchive.Search - Internetarchive: A Python Interface to archive.org

The Metropolitan Museum of Art Collection API

An API endpoint for retrieving the Metropolitan Museum of Art Collection's CC-licensed works.

Latest Updates | The Metropolitan Museum of Art Collection API:

The Metropolitan Museum of Art provides select datasets of information on more than 470,000 artworks in its Collection for unrestricted commercial and noncommercial use. To the extent possible under law, The Metropolitan Museum of Art has waived all copyright and related or neighboring rights to this dataset using the Creative Commons Zero license.

Vimeo API

The Vimeo API allows users to perform filtered, advanced searches on Vimeo videos.

Getting Started with the Vimeo API
- Search for videos - Vimeo API Reference: Videos

MediaWiki API

The MediaWiki Action API is a web service that allows access to some wiki features like authentication, page operations, and search. It can provide meta information about the wiki and the logged-in user.
- Example query: https://commons.wikimedia.org/w/api.php?action=query&cmtitle=Category:CC-BY&list=categorymembers
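
The example query above can also be built programmatically. The following is a minimal sketch; the function name is hypothetical, and the `cmlimit` and `format` parameters are additions that do not appear in the example query.

```python
from urllib.parse import urlencode

# MediaWiki Action API endpoint for Wikimedia Commons.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_category_members_url(category, limit=50):
    """Build a query URL listing the members of a Commons category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": limit,   # assumption: page size, not in the example query
        "format": "json",   # assumption: request a JSON response
    }
    return f"{COMMONS_API}?{urlencode(params)}"


url = build_category_members_url("CC-BY")
```

`urlencode` percent-encodes the colon in `Category:CC-BY`, which the API decodes back to the same title.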
language-codes_csv.csv
- A list of language codes in ISO 639-1 format, used to access the statistics of each Wikipedia main page across different languages. The script refers to this file as language-codes_csv so that it runs without manual changes when given the same language code file. To use a different ISO code list, rename its file and its header to match the current file on GitHub.
- The file this script uses can be downloaded from: https://datahub.io/core/language-codes
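
As a rough sketch of how such a list can drive per-language requests, the code below reads the language codes from CSV text and builds one site statistics URL per Wikipedia. The function name and the `alpha2` column name are assumptions for illustration; adjust the column name to match the header of the file you actually use.

```python
import csv
import io


def wikipedia_statistics_urls(csv_text, code_column="alpha2"):
    """Map each language code in the CSV text to a site statistics URL."""
    reader = csv.DictReader(io.StringIO(csv_text))
    urls = {}
    for row in reader:
        code = row[code_column].strip()
        if code:
            urls[code] = (
                f"https://{code}.wikipedia.org/w/api.php"
                "?action=query&meta=siteinfo&siprop=statistics&format=json"
            )
    return urls


# Two inline rows stand in for the real file.
sample = "alpha2,English\nen,English\nfr,French\n"
urls = wikipedia_statistics_urls(sample)
```

For the real file, pass its contents, for example `open("language-codes_csv.csv", encoding="utf-8").read()`, in place of `sample`.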

YouTube Data API

An API from YouTube for platform users to upload videos, adjust video parameters, and obtain search results.

Search: list | YouTube Data API | Google Developers

History

For information on past efforts, see history.md.

Copying & License

Code

LICENSE: the code within this repository is licensed under the Expat/MIT license.

Data

The data within this repository is dedicated to the public domain under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication.

Documentation

The documentation within the project is licensed under a Creative Commons Attribution 4.0 International License.
