Quantifying the Commons
This project seeks to quantify the size and diversity of the commons: the collection of works that are openly licensed or in the public domain.
The Creative Commons team is committed to fostering a welcoming community. This project and all other Creative Commons open source projects are governed by our Code of Conduct. Please report unacceptable behavior to conduct@creativecommons.org per our reporting guidelines.
See CONTRIBUTING.md.
This repository uses pipenv to manage the required Python modules:
- Linux: Installing Pipenv
- macOS:
  - Install Homebrew
  - Install pipenv: brew install pipenv
- Python Guidelines — Creative Commons Open Source
- Black: the uncompromising Python code formatter
- flake8: a Python tool that glues together pep8, pyflakes, mccabe, and third-party plugins to check the style and quality of Python code.
- isort: A Python utility / library to sort imports.
legal-tool-paths.txt
- A .txt file provided by Timid Robot containing all legal tool paths. Because of data-collection quota constraints, the data from Google Custom Search covers only the 50+ most significant general categories of CC licenses. Note also that the licenses in the first column of the collected data are sorted by order of precedence, as a result of intermediate data analysis.
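The following is a minimal sketch of how legal-tool-paths.txt could be loaded and its paths grouped into general license categories. It assumes one path per line in a form such as licenses/by-sa/2.0/fr, and that the first two segments name the tool while later segments carry version and jurisdiction; both that layout and the helper names (load_legal_tool_paths, general_category, count_categories) are hypothetical rather than taken from the project's scripts.

```python
from collections import Counter


def load_legal_tool_paths(text):
    """Return the non-empty, whitespace-stripped lines of the file's text."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def general_category(path):
    """Collapse a path such as 'licenses/by-sa/2.0/fr' to 'licenses/by-sa'.

    Hypothetical helper: assumes the first two path segments name the tool,
    and any later segments are version and jurisdiction details.
    """
    parts = [part for part in path.strip("/").split("/") if part]
    return "/".join(parts[:2])


def count_categories(paths):
    """Count how many legal tool paths fall under each general category."""
    return Counter(general_category(path) for path in paths)
```

Reading the real file would then be a matter of passing its contents (for example, open("legal-tool-paths.txt").read()) to load_legal_tool_paths.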
- The Flickr API exposes identifiers for users, photos, photosets and other uniquely identifiable objects.
- The Flickr API consists of a set of callable methods, and some API endpoints.
- For a more detailed description, see the Flickr Services API documentation.
- The hs.csv file is a sample CSV of pulled data. Ideally, the script will generate the final data CSVs.
- Each license will have its own CSV to store the data.
- Due to memory limits, the license CSVs are not pushed to GitHub.
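As a sketch of how one of the Flickr API's callable methods is invoked, the snippet below builds a flickr.photos.search request against Flickr's REST endpoint, filtered by a single license ID. It only constructs the URL and sends nothing; the function name is hypothetical, and the mapping of numeric license IDs to CC licenses (available from flickr.photos.licenses.getInfo) is deliberately not hard-coded.

```python
from urllib.parse import urlencode

# Flickr's REST endpoint; the API method is chosen with the "method" parameter.
FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def flickr_search_url(api_key, license_id, page=1, per_page=100):
    """Build a flickr.photos.search request URL filtered by one license ID.

    license_id is a Flickr-defined number; which ID corresponds to which
    CC license is not assumed here.
    """
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "license": license_id,
        "page": page,
        "per_page": per_page,
        # Ask for plain JSON instead of the default XML or JSONP wrapper.
        "format": "json",
        "nojsoncallback": 1,
    }
    return f"{FLICKR_REST_ENDPOINT}?{urlencode(params)}"
```

Paging through results one license at a time keeps each license's data separate, which fits the one-CSV-per-license layout described above.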
- The Custom Search JSON API lets users run detailed, user-defined queries through a Programmable Search Engine and retrieve the matching results.
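A minimal sketch of a Custom Search JSON API request, assuming an API key and a Programmable Search Engine ID (cx) are already available. The helper names are hypothetical; cr and lr are the API's country and language restriction parameters, whose value lists are kept in google_countries.tsv and google_lang.txt.

```python
from urllib.parse import urlencode

CUSTOM_SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def custom_search_url(api_key, engine_id, query, country=None, language=None):
    """Build a Custom Search JSON API request URL.

    country and language, if given, are passed through unchanged as the
    cr and lr restrictions (for example "countryUS" and "lang_en").
    """
    params = {"key": api_key, "cx": engine_id, "q": query}
    if country:
        params["cr"] = country
    if language:
        params["lr"] = language
    return f"{CUSTOM_SEARCH_ENDPOINT}?{urlencode(params)}"


def total_results(response_json):
    """Read the estimated hit count from a parsed JSON response."""
    # The API reports totalResults as a string inside searchInformation.
    return int(response_json["searchInformation"]["totalResults"])
```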
google_countries.tsv
- Created by directly copying and pasting the cr parameter list from the following link into a .tsv file, as no reliable algorithmic way of retrieving this data has been found so far. The script itself takes care of the formatting and country-selection process.
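One way the country-selection step could read this file, sketched under the assumption that each pasted row holds a country name followed by its cr value (for example, France and countryFR). The function name is hypothetical, and rows whose second column is not a cr value, such as a pasted header, are skipped.

```python
import csv
import io
import re

# cr collection values look like "countryUS" or "countryFR".
COUNTRY_VALUE = re.compile(r"^country[A-Z]{2}$")


def parse_countries(tsv_text):
    """Map country name -> cr value from pasted tab-separated text.

    Assumes (hypothetically) a name column followed by a cr column.
    """
    countries = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        cells = [cell.strip() for cell in row]
        # Blank lines yield empty rows; headers fail the pattern check.
        if len(cells) >= 2 and COUNTRY_VALUE.match(cells[1]):
            countries[cells[0]] = cells[1]
    return countries
```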
google_lang.txt
- Created by directly copying and pasting the lr parameter list from the following link into a .txt file, as no reliable algorithmic way of retrieving this data has been found so far. The script itself takes care of the data formatting and language-selection process.
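A similar sketch for google_lang.txt. Because the exact pasted layout is not documented here, this hypothetical helper simply extracts every lr value (for example lang_en or lang_zh-TW) from the text in order of first appearance, rather than relying on fixed columns.

```python
import re

# lr values look like "lang_en" or, for some languages, "lang_zh-TW".
LANG_VALUE = re.compile(r"\blang_[a-z]{2}(?:-[A-Z]{2})?\b")


def parse_languages(text):
    """Return the distinct lr values found in pasted text, in order of appearance."""
    seen = []
    for value in LANG_VALUE.findall(text):
        if value not in seen:
            seen.append(value)
    return seen
```

Matching the values themselves, instead of splitting on a fixed delimiter, is a design choice made here so that small differences in how the list was pasted matter less.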
A Python interface to archive.org for making API requests to the Internet Archive.
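A hedged sketch of how such requests might look with the internetarchive package, whose search_items function runs an archive.org advanced search. The licenseurl field query, the slash escaping (which assumes Lucene-style query syntax), and the helper names are illustrative assumptions. count_items also assumes that the installed version's Search object exposes num_found; because it performs a network request, the package is imported only when it is called.

```python
def license_query(license_url_fragment):
    """Hypothetical helper: query items whose licenseurl contains a fragment.

    Slashes are backslash-escaped on the assumption that archive.org parses
    queries with Lucene-style syntax, where an unescaped / starts a regex.
    """
    escaped = license_url_fragment.replace("/", "\\/")
    return f"licenseurl:*{escaped}*"


def count_items(query):
    """Return how many items match query (performs a network request).

    Assumes the internetarchive package is installed and that its Search
    object provides num_found.
    """
    from internetarchive import search_items  # optional dependency, imported lazily

    return search_items(query).num_found
```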
An API endpoint for retrieving the Metropolitan Museum of Art Collection's CC-licensed works.
From The Metropolitan Museum of Art Collection API:
The Metropolitan Museum of Art provides select datasets of information on more than 470,000 artworks in its Collection for unrestricted commercial and noncommercial use. To the extent possible under law, The Metropolitan Museum of Art has waived all copyright and related or neighboring rights to this dataset using the Creative Commons Zero license.
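A minimal sketch against the public Met Collection API, whose /objects endpoint lists object IDs and whose per-object records carry an isPublicDomain flag. The helper names are hypothetical, and fetching is left out so the example stays offline.

```python
MET_API = "https://collectionapi.metmuseum.org/public/collection/v1"


def object_url(object_id):
    """URL of a single object record in the Met Collection API."""
    return f"{MET_API}/objects/{object_id}"


def count_public_domain(records):
    """Count parsed object records that the Met flags as public domain (CC0)."""
    # Records missing the flag are treated as not public domain.
    return sum(1 for record in records if record.get("isPublicDomain") is True)
```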
The Vimeo API allows users to perform filtered, advanced searches of Vimeo videos.
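A sketch of a license-filtered search request for the Vimeo API's GET /videos endpoint, built with the standard library but not sent. It assumes a personal access token and that the filter parameter accepts license values such as CC-BY or CC0, as described in Vimeo's search documentation; the function name and the pinned API version header are illustrative choices.

```python
from urllib.parse import urlencode
from urllib.request import Request

VIMEO_API = "https://api.vimeo.com"


def vimeo_search_request(access_token, query, license_filter="CC-BY", per_page=25):
    """Build (but do not send) a GET /videos search filtered by a CC license."""
    params = urlencode({"query": query, "filter": license_filter, "per_page": per_page})
    return Request(
        f"{VIMEO_API}/videos?{params}",
        headers={
            "Authorization": f"bearer {access_token}",
            # Pin an API version; 3.4 is only an example value.
            "Accept": "application/vnd.vimeo.*+json;version=3.4",
        },
    )
```

Sending it would be a matter of passing the returned Request to urllib.request.urlopen and parsing the JSON body.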
- The MediaWiki Action API is a web service that allows access to some wiki features like authentication, page operations, and search. It can provide meta information about the wiki and the logged-in user.
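For example, a Wikipedia edition's site statistics can be requested through the Action API with action=query, meta=siteinfo and siprop=statistics. The sketch below builds that URL for a given language code and reads the article count from a parsed response; the helper names are hypothetical.

```python
from urllib.parse import urlencode


def siteinfo_statistics_url(language_code):
    """URL asking one Wikipedia edition for its site statistics via the Action API."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return f"https://{language_code}.wikipedia.org/w/api.php?{urlencode(params)}"


def article_count(response_json):
    """Pull the article count out of a parsed siteinfo response."""
    return response_json["query"]["statistics"]["articles"]
```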
language-codes_csv.csv
- A list of language codes in ISO 639-1 format, used to access statistics for the main page of each Wikipedia language edition. In the script, this file is referred to as language-codes_csv, which minimizes the manual work needed to run the script with the same language-encoding file. To use a different list, rename the header and file name of your .csv ISO code list to match the current file on GitHub.
- The file that this script uses can be downloaded from: https://datahub.io/core/language-codes
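A minimal sketch of loading the codes from this file, assuming the datahub.io CSV uses an alpha2 header for its ISO 639-1 column. The function name is hypothetical, and its code_column argument is an illustrative alternative to renaming the header of a different list.

```python
import csv
import io


def load_language_codes(csv_text, code_column="alpha2"):
    """Return the ISO 639-1 codes from the language-codes CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    codes = []
    for row in reader:
        # Skip rows where the code cell is missing or blank.
        code = (row.get(code_column) or "").strip()
        if code:
            codes.append(code)
    return codes
```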
An API from YouTube for platform users to upload videos, adjust video parameters, and obtain search results.
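A sketch of a YouTube Data API v3 search.list request restricted to Creative Commons videos via videoLicense=creativeCommon, which the API only honors together with type=video. It assumes an API key and only builds the URL; the helper names are hypothetical, and pageInfo.totalResults in the response is an approximation rather than an exact count.

```python
from urllib.parse import urlencode

YOUTUBE_SEARCH = "https://www.googleapis.com/youtube/v3/search"


def youtube_cc_search_url(api_key, query=None, max_results=50):
    """Build a search.list request URL restricted to Creative Commons videos."""
    params = {
        "part": "snippet",
        "type": "video",  # required for videoLicense to take effect
        "videoLicense": "creativeCommon",
        "maxResults": max_results,
        "key": api_key,
    }
    if query:
        params["q"] = query
    return f"{YOUTUBE_SEARCH}?{urlencode(params)}"


def youtube_total_results(response_json):
    """Read the approximate total from pageInfo in a parsed response."""
    return response_json["pageInfo"]["totalResults"]
```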
For information on past efforts, see history.md.
LICENSE: the code within this repository is licensed under the Expat/MIT license.
The data within this repository is dedicated to the public domain under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication.
The documentation within the project is licensed under a Creative Commons Attribution 4.0 International License.