Merge pull request #38 from samadpls/main

TimidRobot · web-flow · commit 53b477c1fbdb · 2023-03-03T12:00:37.000-08:00
fixes the issue #37 Documentation of Data Sources and APIs used in the project
diff --git a/README.md b/README.md
@@ -64,124 +64,7 @@ modules:
 
 ## Data Sources
 
-
-### CC Legal Tools
-
-- [`legal-tool-paths.txt`](google_custom_search/legal-tool-paths.txt)
-  - A `.txt` provided by Timid Robot containing all legal tool paths. The data
-    from Google Custom Search will only cover 50+ general, most significant
-    categories of CC License for data collection quota constraint. As an
-    additional note, the order of precedence of license the collected data's
-    first column is sorted due to intermediate data analysis progress.
-    - [add list of all current CC legal tool paths by TimidRobot · Pull Request
-      #7 · creativecommons/quantifying][pr7]
-
-[pr7]: https://github.com/creativecommons/quantifying/pull/7
-
-
-### Flickr
-
-- The Flickr API exposes identifiers for users, photos, photosets and other
-  uniquely identifiable objects.
-- The Flickr API consists of a set of callable methods, and some API endpoints.
-- For more detailed description, visit: [API documentation - Flickr
-  Services](https://www.flickr.com/services/api/).
-- The `hs.csv` file is a sample CSV of pulled data. Ideally the script will
-  generate final data CSVs.
-- Each license will have a CSV to save the data.
-- Due to memory limit, the license CSVs are not pushed into github.
-
-
-### Google Custom Search JSON API
-
-- The Custom Search JSON API allows user-defined detailed query and access
-  towards related query data using a programmable search engine.
-  - [Custom Search JSON API Reference | Programmable Search Engine | Google
-    Developers][googlejsonapi]
-  - [Method: cse.list | Custom Search JSON API | Google Developers][cselist]
-- [`google_countries.tsv`](google_custom_search/google_countries.txt)
-  - Created by directly copy and pasting the `cr` parameter list from the
-    following link into a `.tsv` file as there were no reliable algorithmic way
-    for retrieving such data found in the process so far. The script itself
-    will take care of the formatting and country-selection process.
-    - [Country Collection Values | JSON API reference | Programmable Search
-      Engine | Google Developers][googlecountry]
-- [`google_lang.txt`](google_custom_search/google_lang.txt)
-  - Created by directly copy and pasting the `lr` parameter list from the
-    following link into a `.txt` file as there were no reliable algorithmic way
-    for retrieving such data found in the process so far. The script itself
-    will take care of the data formatting and language-selection process.
-    - [Parameter: lr | Method: cse.list | Custom Search JSON API | Google
-      Developers][googlelang]
-
-[googlejsonapi]: https://developers.google.com/custom-search/v1
-[cselist]: https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list
-[googlecountry]: https://developers.google.com/custom-search/docs/json_api_reference#countryCollections
-[googlelang]: https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list#body.QUERY_PARAMETERS.lr
-
-
-### Internet Archive Python Interface
-
-A python interface to archive.org to achieve API requests towards internet
-archive.
-- [`internetarchive.Search` - Internetarchive: A Python Interface to
-  archive.org][iasearch]
-
-[iasearch]: https://internetarchive.readthedocs.io/en/stable/internetarchive.html#internetarchive.Search
-
-
-### The Metropolitan Museum of Art Collection API
-
-An API endpoint for receiving Metropolitan Muesum of Art Collection's
-CC-Licensed works.
-
-[Latest Updates | The Metropolitan Museum of Art Collection API][metapi]:
-> The Metropolitan Museum of Art provides select datasets of information on
-> more than 470,000 artworks in its Collection for unrestricted commercial and
-> noncommercial use. To the extent possible under law, The Metropolitan Museum
-> of Art has waived all copyright and related or neighboring rights to this
-> dataset using the [Creative Commons Zero][cc-zero] license.
-
-[metapi]: https://metmuseum.github.io/
-[cc-zero]: https://creativecommons.org/publicdomain/zero/1.0/
-
-
-### Vimeo API
-
-The Vimeo API allows users to perform filtered, advanced search on Vimeo
-videos.
-- [Getting Started with the Vimeo API][vimeostart]
-  - [Search for videos - Vimeo API Reference: Videos][vimeoapisearch]
-
-[vimeostart]: https://developer.vimeo.com/api/guides/start
-[vimeoapisearch]: https://developer.vimeo.com/api/reference/videos#search_videos
-
-
-### MediaWiki API
-
-- The MediaWiki Action API is a web service that allows access to some wiki
-  features like authentication, page operations, and search. It can provide
-  meta information about the wiki and the logged-in user.
-  - Example query: https://commons.wikimedia.org/w/api.php?action=query&cmtitle=Category:CC-BY&list=categorymembers
-- [`language-codes_csv.csv`](wikipedia/language-codes_csv.csv)
-  - A list of language codes in ISO 639-1 Format to access statistics of each
-    wikipedia main page across different languages. In the script, this file is
-    named as `language-codes_csv` to minimize the amount of manual work
-    required for running the script provided the same language encoding file.
-    The user would have to rename the header and file name of their `.csv` ISO
-    code list according to the concurrent file on Github if they would like to
-    use some list other than the concurrent one.
-  - This file that this script uses can be downloaded from:
-    https://datahub.io/core/language-codes
-
-
-### Youtube Data API
-
-An API from YouTube for platform users to upload videos, adjust video
-parameters, and obtain search results.
-- [Search: list | YouTube Data API | Google Developers][youtubeapi]
-
-[youtubeapi]: https://developers.google.com/youtube/v3/docs/search/list
+See [sources.md](sources.md) for the data sources and APIs used in this project.
 
 
 ## History
diff --git a/sources.md b/sources.md
@@ -0,0 +1,100 @@
+# Data Sources
+
+This project uses data from various sources that are openly licensed or in the public domain. Below are the sources and their respective information:
+
+## CC Legal Tools
+
+**Description:** _A `.txt` file provided by Timid Robot that lists all legal tool paths. Because of data collection quota constraints, the data from Google Custom Search covers only the 50+ most significant general categories of CC licenses. Note that the licenses in the first column of the collected data are sorted by order of precedence as a result of intermediate data analysis._
+
+**API documentation link:**
+- [List of all current CC legal tool paths by TimidRobot](https://github.com/creativecommons/quantifying/blob/main/google_custom_search/legal-tool-paths.txt)
+
+**API information:**
+- No API key required
+- No query limits
+
+## Flickr
+
+**Description:** _With over 5 billion photos (many with valuable metadata such as tags, geolocation,
+and Exif data), the Flickr community creates wonderfully rich data. The Flickr API is how you can
+access that data. In fact, almost all the functionality that runs flickr.com is available through
+the API._ ([Flickr: The Flickr Developer Guide](https://www.flickr.com/services/developer/))
+
+**API documentation link:** 
+- [API documentation - Flickr Services](https://www.flickr.com/services/api/)
+
+**API information:**
+- API key required
+- Query limit: 3600 requests per hour
+- Collected data is saved in CSV format (one CSV per license)
+
+## Google Custom Search JSON API
+
+**Description:** _The Custom Search JSON API lets you run detailed, user-defined queries against a programmable search engine and retrieve the matching results._
+
+**API documentation links:**
+- [Custom Search JSON API Reference | Programmable Search Engine | Google Developers](https://developers.google.com/custom-search/v1/reference/rest)
+- [Method: cse.list | Custom Search JSON API | Google Developers](https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list)
+
+**API information:**
+- API key required
+- Query limit: 100 queries per day on the free tier
+- Data available through JSON format
+
+## Internet Archive Python Interface
+
+**Description:** _A Python interface to archive.org for making API requests to the Internet Archive._
+
+**API documentation link:** 
+- [internetarchive.Search - Internetarchive: A Python Interface to archive.org](https://internetarchive.readthedocs.io/en/stable/internetarchive.html#internetarchive.Search)
+
+**API information:**
+- No API key required
+- No query limits
+
+## The Metropolitan Museum of Art Collection API
+
+**Description:** _An API endpoint for retrieving the Metropolitan Museum of Art Collection's CC-licensed works._
+
+**API documentation link:** 
+- [Latest Updates | The Metropolitan Museum of Art Collection API](https://metmuseum.github.io/)
+
+**API information:**
+- No API key required
+- Query limit: 80 requests per second
+
+## Vimeo API
+
+**Description:** _The Vimeo API allows users to perform filtered, advanced searches on Vimeo videos._
+
+**API documentation link:** 
+- [Getting Started with the Vimeo API](https://developer.vimeo.com/api/start)
+
+**API information:**
+- API key required
+- Query limit: 5000 authenticated requests per day
+- Data available through JSON format
+
+## MediaWiki Action API
+
+**Description:** _The MediaWiki Action API is a web service that allows access to some wiki features like authentication, page operations, and search. It can provide meta information about the wiki and the logged-in user._
+
+**API documentation link:** 
+- [MediaWiki Action API](https://www.mediawiki.org/wiki/API:Main_page)
+
+**API information:**
+- No API key required
+- Query limit: depends on user status and request type
+- Data available through XML or JSON format
+
+## YouTube Data API
+
+**Description:** _An API from YouTube for platform users to upload videos, adjust video parameters, and obtain search results._
+
+**API documentation link:** 
+- [Search: list | YouTube Data API | Google Developers](https://developers.google.com/youtube/v3/docs/search/list)
+
+**API information:**
+- API key required
+- Query limit: depends on the type and number of requests
+- Data available through JSON format