Commit 651db33

Author: Greg Lindahl

docs: placeholders
1 parent 8b0eb5b commit 651db33

1 file changed

Lines changed: 21 additions & 0 deletions

README.md

@@ -37,3 +37,24 @@ The [Makefile](./Makefile) contains targets to apply a consistent formatting to
## Citations from Google Scholar Alerts

As an initial step, and to get higher coverage, citations are extracted from Google Scholar Alert e-mails received from April 2016 to date. See [gscholar_alerts](./gscholar_alerts/).

## Updating the awesome graph that everyone loves

## Uploading the raw data to Hugging Face

### Google Scholar

This data is split by year to make it easier to explore.

- pull the updated repo
- `make gscholar-bib`
- look in `tmp/` for the per-year files (`2024.jsonl` etc.)
- upload at https://huggingface.co/datasets/commoncrawl/citations/tree/main
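Before uploading, it can help to confirm that each per-year file parses and to see how many records it holds. The following is a minimal sketch, not part of the repo: `summarize_year_files` is a hypothetical helper, and it assumes the files in `tmp/` are named `YYYY.jsonl` with one JSON record per line.

```python
import json
import re
from pathlib import Path

# Assumption: per-year files are named like 2024.jsonl, as the steps above suggest.
YEAR_FILE = re.compile(r"^(\d{4})\.jsonl$")


def summarize_year_files(tmp_dir="tmp"):
    """Return {year: record_count} for every YYYY.jsonl file in tmp_dir."""
    counts = {}
    root = Path(tmp_dir)
    if not root.is_dir():
        return counts
    for path in sorted(root.iterdir()):
        match = YEAR_FILE.match(path.name)
        if not match:
            continue  # skip anything that is not a per-year file
        count = 0
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    json.loads(line)  # raises ValueError on a malformed record
                    count += 1
        counts[int(match.group(1))] = count
    return counts


if __name__ == "__main__":
    for year, count in summarize_year_files().items():
        print(f"{year}.jsonl: {count} records")
```

Run from the repo root after `make gscholar-bib`, this lists each year file with its record count; a `ValueError` points at a file worth inspecting before it goes to the dataset page.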
### Annotated Citations

This much smaller dataset has the extra fields mentioned above.

- pull the updated repo
- `make tmp/commoncrawl_annotated.csv`
- TODO
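Until the remaining step is written down, a quick look at the generated CSV can confirm that the make target produced something sensible. This is a minimal sketch under the assumption that the file has a header row; `describe_csv` is a hypothetical helper, not part of the repo, and it makes no claim about which columns the file contains.

```python
import csv
from pathlib import Path


def describe_csv(csv_path="tmp/commoncrawl_annotated.csv"):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        row_count = sum(1 for _ in reader)  # data rows, header excluded
        columns = list(reader.fieldnames or [])
    return columns, row_count


if __name__ == "__main__":
    default_path = Path("tmp/commoncrawl_annotated.csv")
    if default_path.is_file():
        columns, row_count = describe_csv(default_path)
        print(f"{row_count} rows, columns: {', '.join(columns)}")
```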
