Commit 53b477c

Merge pull request #38 from samadpls/main
fixes the issue #37 Documentation of Data Sources and APIs used in the project
2 parents 2429a75 + aa7b34f commit 53b477c

2 files changed: +101 -118 lines changed

README.md

+1 -118
@@ -64,124 +64,7 @@ modules:
## Data Sources

Removed (old lines 67-184):

### CC Legal Tools

- [`legal-tool-paths.txt`](google_custom_search/legal-tool-paths.txt)
  - A `.txt` provided by Timid Robot containing all legal tool paths. The data
    from Google Custom Search will only cover 50+ general, most significant
    categories of CC License for data collection quota constraint. As an
    additional note, the order of precedence of license the collected data's
    first column is sorted due to intermediate data analysis progress.
  - [add list of all current CC legal tool paths by TimidRobot · Pull Request
    #7 · creativecommons/quantifying][pr7]

[pr7]: https://github.com/creativecommons/quantifying/pull/7

### Flickr

- The Flickr API exposes identifiers for users, photos, photosets and other
  uniquely identifiable objects.
- The Flickr API consists of a set of callable methods, and some API endpoints.
- For more detailed description, visit: [API documentation - Flickr
  Services](https://www.flickr.com/services/api/).
- The `hs.csv` file is a sample CSV of pulled data. Ideally the script will
  generate final data CSVs.
- Each license will have a CSV to save the data.
- Due to memory limit, the license CSVs are not pushed into github.

### Google Custom Search JSON API

- The Custom Search JSON API allows user-defined detailed query and access
  towards related query data using a programmable search engine.
- [Custom Search JSON API Reference | Programmable Search Engine | Google
  Developers][googlejsonapi]
- [Method: cse.list | Custom Search JSON API | Google Developers][cselist]
- [`google_countries.tsv`](google_custom_search/google_countries.txt)
  - Created by directly copy and pasting the `cr` parameter list from the
    following link into a `.tsv` file as there were no reliable algorithmic way
    for retrieving such data found in the process so far. The script itself
    will take care of the formatting and country-selection process.
  - [Country Collection Values | JSON API reference | Programmable Search
    Engine | Google Developers][googlecountry]
- [`google_lang.txt`](google_custom_search/google_lang.txt)
  - Created by directly copy and pasting the `lr` parameter list from the
    following link into a `.txt` file as there were no reliable algorithmic way
    for retrieving such data found in the process so far. The script itself
    will take care of the data formatting and language-selection process.
  - [Parameter: lr | Method: cse.list | Custom Search JSON API | Google
    Developers][googlelang]

[googlejsonapi]: https://developers.google.com/custom-search/v1
[cselist]: https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list
[googlecountry]: https://developers.google.com/custom-search/docs/json_api_reference#countryCollections
[googlelang]: https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list#body.QUERY_PARAMETERS.lr

### Internet Archive Python Interface

A python interface to archive.org to achieve API requests towards internet
archive.
- [`internetarchive.Search` - Internetarchive: A Python Interface to
  archive.org][iasearch]

[iasearch]: https://internetarchive.readthedocs.io/en/stable/internetarchive.html#internetarchive.Search

### The Metropolitan Museum of Art Collection API

An API endpoint for receiving Metropolitan Muesum of Art Collection's
CC-Licensed works.

[Latest Updates | The Metropolitan Museum of Art Collection API][metapi]:
> The Metropolitan Museum of Art provides select datasets of information on
> more than 470,000 artworks in its Collection for unrestricted commercial and
> noncommercial use. To the extent possible under law, The Metropolitan Museum
> of Art has waived all copyright and related or neighboring rights to this
> dataset using the [Creative Commons Zero][cc-zero] license.

[metapi]: https://metmuseum.github.io/
[cc-zero]: https://creativecommons.org/publicdomain/zero/1.0/

### Vimeo API

The Vimeo API allows users to perform filtered, advanced search on Vimeo
videos.
- [Getting Started with the Vimeo API][vimeostart]
- [Search for videos - Vimeo API Reference: Videos][vimeoapisearch]

[vimeostart]: https://developer.vimeo.com/api/guides/start
[vimeoapisearch]: https://developer.vimeo.com/api/reference/videos#search_videos

### MediaWiki API

- The MediaWiki Action API is a web service that allows access to some wiki
  features like authentication, page operations, and search. It can provide
  meta information about the wiki and the logged-in user.
- Example query: https://commons.wikimedia.org/w/api.php?action=query&cmtitle=Category:CC-BY&list=categorymembers
- [`language-codes_csv.csv`](wikipedia/language-codes_csv.csv)
  - A list of language codes in ISO 639-1 Format to access statistics of each
    wikipedia main page across different languages. In the script, this file is
    named as `language-codes_csv` to minimize the amount of manual work
    required for running the script provided the same language encoding file.
    The user would have to rename the header and file name of their `.csv` ISO
    code list according to the concurrent file on Github if they would like to
    use some list other than the concurrent one.
  - This file that this script uses can be downloaded from:
    https://datahub.io/core/language-codes

### Youtube Data API

An API from YouTube for platform users to upload videos, adjust video
parameters, and obtain search results.
- [Search: list | YouTube Data API | Google Developers][youtubeapi]

[youtubeapi]: https://developers.google.com/youtube/v3/docs/search/list

Added (new line 67):

See [sources.md](sources.md) for the data sources and APIs used in this project.

## History

sources.md

+100
@@ -0,0 +1,100 @@
# Data Sources

This project uses data from various sources that are openly licensed or in the public domain. Each source is listed below with a description, a documentation link, and its API details.

## CC Legal Tools

**Description:** _A `.txt` file provided by Timid Robot that lists all CC legal tool paths. Because of data collection quota constraints, the data from Google Custom Search covers only the 50+ most general and significant categories of CC licenses. Note also that the licenses in the first column of the collected data are sorted in an order of precedence that reflects intermediate data analysis._

**API documentation link:**
- [List of all current CC legal tool paths by TimidRobot](https://github.com/creativecommons/quantifying/blob/main/google_custom_search/legal-tool-paths.txt)

**API information:**
- No API key required
- No query limits

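
The description above mentions reducing the full list of legal tool paths to a smaller set of general license categories. The Python sketch below shows one way to do that. It assumes the file has one path per line in the form `licenses/by-sa/3.0/de` (type, unit, version, optional jurisdiction). The helper names `tool_category` and `count_categories` are hypothetical and are not taken from the project's scripts.

```python
from collections import Counter


def tool_category(path: str) -> str:
    """Collapse a path such as 'licenses/by-sa/3.0/de' to 'licenses/by-sa'.

    Assumes the layout '<type>/<unit>/<version>[/<jurisdiction>]'
    (an assumption about the file, not a published spec).
    """
    parts = path.strip().strip("/").split("/")
    return "/".join(parts[:2])


def count_categories(lines) -> Counter:
    """Count how many legal tool paths fall under each category, skipping blank lines."""
    return Counter(tool_category(line) for line in lines if line.strip())


if __name__ == "__main__":
    sample = [
        "licenses/by/4.0",
        "licenses/by/3.0/de",
        "licenses/by-sa/4.0",
        "publicdomain/zero/1.0",
        "",
    ]
    print(count_categories(sample))
```

Iterating over a file object yields its lines. You can therefore pass an open handle to `google_custom_search/legal-tool-paths.txt` to get per-category counts for the actual list.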

## Flickr

**Description:** _With over 5 billion photos (many with valuable metadata such as tags, geolocation, and Exif data), the Flickr community creates wonderfully rich data. The Flickr API is how you can access that data. In fact, almost all the functionality that runs flickr.com is available through the API._ ([Flickr: The Flickr Developer Guide](https://www.flickr.com/services/developer/))

**API documentation link:**
- [API documentation - Flickr Services](https://www.flickr.com/services/api/)

**API information:**
- API key required
- Query limit: 3600 requests per hour
- Data returned in XML or JSON format; the project's script saves it as CSV

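
The sketch below illustrates a call to the Flickr REST API. It builds a `flickr.photos.search` request filtered by a numeric license ID and derives the minimum spacing between calls implied by the 3600-requests-per-hour limit. The parameter set is an assumption for illustration only. The project's script may use different or additional filters. Look up the numeric license IDs with `flickr.photos.licenses.getInfo` rather than hard-coding them.

```python
from urllib.parse import urlencode

FLICKR_REST_URL = "https://www.flickr.com/services/rest/"

# 3600 requests per hour per key works out to at most one request per second.
HOURLY_LIMIT = 3600
MIN_INTERVAL = 3600 / HOURLY_LIMIT  # seconds between requests


def photos_search_url(api_key: str, license_id: int, page: int = 1) -> str:
    """Build a flickr.photos.search URL filtered by one license ID.

    format=json together with nojsoncallback=1 asks Flickr for plain JSON
    instead of a JSONP-wrapped response.
    """
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "license": license_id,
        "per_page": 100,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,
    }
    return f"{FLICKR_REST_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # "YOUR_API_KEY" and license ID 4 are placeholders.
    print(photos_search_url("YOUR_API_KEY", 4))
```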

## Google Custom Search JSON API

**Description:** _The Custom Search JSON API lets users run detailed, user-defined queries through a Programmable Search Engine and retrieve the results._

**API documentation links:**
- [Custom Search JSON API Reference | Programmable Search Engine | Google Developers](https://developers.google.com/custom-search/v1/reference/rest)
- [Method: cse.list | Custom Search JSON API | Google Developers](https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list)

**API information:**
- API key required
- Query limit: 100 queries per day in the free tier
- Data available in JSON format

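
Here is a minimal sketch of a `cse.list` call. It assumes the `cr` and `lr` values come from lists like the project's country and language files (for example `countryFR` and `lang_fr`). The engine ID and query string are placeholders. Only two things are shown: building the URL and reading the response field `searchInformation.totalResults`. No request is sent.

```python
from urllib.parse import urlencode

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def cse_list_url(api_key, engine_id, query, country=None, language=None):
    """Build a cse.list request URL.

    `country` is a `cr` value such as 'countryFR'; `language` is an `lr`
    value such as 'lang_fr'. Both are optional.
    """
    params = {"key": api_key, "cx": engine_id, "q": query}
    if country:
        params["cr"] = country
    if language:
        params["lr"] = language
    return f"{CSE_ENDPOINT}?{urlencode(params)}"


def total_results(response: dict) -> int:
    """Read the estimated hit count, which the API reports as a string."""
    return int(response["searchInformation"]["totalResults"])
```

Each combination of query, country, and language costs one request. With 100 free queries per day, the number of combinations collected per day has to fit within that budget.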

## Internet Archive Python Interface

**Description:** _A Python interface to archive.org for making API requests to the Internet Archive._

**API documentation link:**
- [internetarchive.Search - Internetarchive: A Python Interface to archive.org](https://internetarchive.readthedocs.io/en/stable/internetarchive.html#internetarchive.Search)

**API information:**
- No API key required
- No query limits

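
With the `internetarchive` package, you express a search as an archive.org advanced-search query string. The sketch below makes three assumptions:

- the license is stored in the item metadata field `licenseurl`;
- the installed library version exposes `search_items`;
- it also exposes `Search.num_found`.

Verify all three against the documentation linked above. Stored URLs may differ in scheme or trailing slash, so an exact-match query can undercount.

```python
def license_query(license_url: str) -> str:
    """Build an archive.org advanced-search query matching items whose
    `licenseurl` metadata field equals license_url (assumed field name)."""
    return f'licenseurl:"{license_url}"'


def count_items(license_url: str) -> int:
    """Count matching items.

    Requires `pip install internetarchive` and network access, so the
    import happens only when this function is called.
    """
    import internetarchive

    search = internetarchive.search_items(license_query(license_url))
    return search.num_found
```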

## The Metropolitan Museum of Art Collection API

**Description:** _An API endpoint for retrieving CC-licensed works from The Metropolitan Museum of Art's collection._

**API documentation link:**
- [Latest Updates | The Metropolitan Museum of Art Collection API](https://metmuseum.github.io/)

**API information:**
- No API key required
- Query limit: 80 requests per second

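
To stay under the 80-requests-per-second limit, space requests at least 1/80 = 0.0125 seconds apart. The sketch below pairs a simple fixed-interval pacer with the public object endpoint. The pacer is a generic technique, not something the Met API prescribes. The injectable `clock` and `sleep` arguments exist only to make it testable.

```python
import time

# Public object endpoint of the Met Collection API (no key required).
MET_OBJECT_URL = "https://collectionapi.metmuseum.org/public/collection/v1/objects/{}"


class RateLimiter:
    """Fixed-interval pacer: consecutive wait() calls return at least
    1 / max_per_second seconds apart."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")  # the first call never waits

    def wait(self):
        """Sleep until the next slot is free, then reserve it."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval


limiter = RateLimiter(80)  # 1 / 80 = 0.0125 s between requests
```

Call `limiter.wait()` before each `urllib.request.urlopen(MET_OBJECT_URL.format(object_id))` to keep a single-threaded loop within the limit. This class is not thread-safe, so concurrent workers would need a shared, lock-protected limiter.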

## Vimeo API

**Description:** _The Vimeo API lets users perform filtered, advanced searches on Vimeo videos._

**API documentation link:**
- [Getting Started with the Vimeo API](https://developer.vimeo.com/api/start)

**API information:**
- API key required
- Query limit: 5000 authenticated requests per day
- Data available in JSON format

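
The sketch below prepares a Vimeo video search with `urllib`. It assumes the access token is sent as a bearer token. It also assumes the `filter` parameter accepts license values such as `CC-BY`; check the Vimeo "Search for videos" reference for the current list of values. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

VIMEO_VIDEOS_URL = "https://api.vimeo.com/videos"


def search_videos_request(token, query, license_filter, page=1):
    """Prepare, without sending, a GET /videos search limited by a license
    filter such as "CC-BY" (assumed filter value; see the API reference)."""
    params = {"query": query, "filter": license_filter, "page": page, "per_page": 100}
    return Request(
        f"{VIMEO_VIDEOS_URL}?{urlencode(params)}",
        headers={
            # Vimeo authenticates API calls with a bearer access token.
            "Authorization": f"bearer {token}",
            # Pins the response format to API version 3.4 (assumed current).
            "Accept": "application/vnd.vimeo.*+json;version=3.4",
        },
    )
```

Sending the request with `urllib.request.urlopen(req)` and decoding the body with `json.load` should yield a JSON object whose `data` list holds the matching videos.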

## MediaWiki Action API

**Description:** _The MediaWiki Action API is a web service that provides access to wiki features such as authentication, page operations, and search. It can also return meta information about the wiki and the logged-in user._

**API documentation link:**
- [MediaWiki Action API](https://www.mediawiki.org/wiki/API:Main_page)

**API information:**
- No API key required
- Query limit: depends on user status and request type
- Data available in XML or JSON format

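
The Action API returns results in batches. When more data remains, the response includes a `continue` object, whose values you merge into the next request. The sketch below pages through `list=categorymembers` for a category such as `Category:CC-BY` on Wikimedia Commons. The HTTP call is left to an injected `fetch_json` function (a hypothetical helper), so the paging logic stands alone.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def category_members(fetch_json, category: str):
    """Yield every member of `category`, following `continue` tokens.

    `fetch_json(url)` must perform the GET request and return the decoded
    JSON response as a dict.
    """
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "max",
        "format": "json",
    }
    while True:
        data = fetch_json(f"{COMMONS_API}?{urlencode(params)}")
        yield from data["query"]["categorymembers"]
        if "continue" not in data:
            break
        # Merge the continuation values (e.g. cmcontinue) into the next request.
        params.update(data["continue"])
```

A real `fetch_json` could wrap `urllib.request.urlopen` and `json.load`. Wikimedia asks clients to send a descriptive `User-Agent` header, so that request should set one.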

## YouTube Data API

**Description:** _An API from YouTube that lets platform users upload videos, adjust video parameters, and obtain search results._

**API documentation link:**
- [Search: list | YouTube Data API | Google Developers](https://developers.google.com/youtube/v3/docs/search/list)

**API information:**
- API key required
- Query limit: depends on the type and number of requests
- Data available in JSON format

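
The sketch below builds a `search.list` request restricted to Creative Commons videos. `videoLicense=creativeCommon` must be paired with `type=video`. The query limit is counted in quota units, so the helper `searches_per_day` estimates how many searches fit in a day. Its defaults (10,000 units per day, 100 units per `search.list` call) are assumptions; confirm them against Google's current quota documentation.

```python
from urllib.parse import urlencode

YOUTUBE_SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"


def cc_video_search_url(api_key: str, query: str) -> str:
    """Build a search.list URL limited to Creative Commons videos.

    videoLicense only applies when type is set to video.
    """
    params = {
        "part": "snippet",
        "q": query,
        "type": "video",
        "videoLicense": "creativeCommon",
        "key": api_key,
    }
    return f"{YOUTUBE_SEARCH_URL}?{urlencode(params)}"


def searches_per_day(daily_quota_units: int = 10_000, cost_per_search: int = 100) -> int:
    """How many search.list calls fit in a daily quota (assumed defaults)."""
    return daily_quota_units // cost_per_search


def total_results(response: dict) -> int:
    """Read the approximate result count from a search.list response."""
    return response["pageInfo"]["totalResults"]
```

Responses carry an approximate hit count in `pageInfo.totalResults`, which `total_results` reads.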
