Jia Zhao, Jo Kent, Wander Demuynck, Paddy Smith, James Thomas
27-28 November 2025
Code produced during the Web Archives for Social Sciences Datathon, 27-28 November 2025.
Our group used a cache of web data covering all commercial websites (.co.uk domains, landing pages only) archived by Common Crawl throughout 2021 and 2024 whose page text includes at least one Manchester or Birmingham postcode. The 2021 websites had already been classified by economic activity, while the 2024 websites had not.
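As an illustration of the postcode criterion, the following is a minimal sketch and not the procedure used to build the cache. It assumes that "Manchester" and "Birmingham" can be approximated by the M and B postcode areas, which extend beyond the city boundaries. The names `POSTCODE_RE`, `CITY_BY_AREA` and `cities_mentioned` are hypothetical.

```python
import re

# Assumption: the two cities are approximated by the M and B postcode areas.
# The datathon cache may have used a different (e.g. district-level) definition.
# The pattern is uppercase-only to avoid matching strings such as "B12 3pm".
POSTCODE_RE = re.compile(r"\b([MB])\d{1,2}[A-Z]?\s?\d[A-Z]{2}\b")
CITY_BY_AREA = {"M": "Manchester", "B": "Birmingham"}


def cities_mentioned(text):
    """Return the set of cities whose postcode area appears in the text."""
    return {CITY_BY_AREA[m.group(1)] for m in POSTCODE_RE.finditer(text)}
```

The leading word boundary keeps two-letter areas such as BB (Blackburn) or SM (Sutton) from matching, since the M or B must start the token.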
We analysed the industrial structure and spatial patterns in the data, and attempted to use the 2021 data to classify the 2024 data for both cities.
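The idea of using labelled 2021 pages to classify unlabelled 2024 pages can be sketched as follows. This is only an illustrative baseline, a multinomial naive Bayes classifier over page words with add-one smoothing, and not necessarily the method our group used. The function names and the `(text, label)` input format are assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labelled_pages):
    """Fit word counts per label from (text, label) pairs, e.g. 2021 pages."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labelled_pages:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the most probable label for an unlabelled page, e.g. a 2024 page."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in tokenize(text):
            if word in vocab:  # words unseen in training are ignored
                count = word_counts[label][word] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

In use, `train` would be called once on the 2021 pages with their economic activity labels, and `predict` on each 2024 page. Any such classifier assumes that the vocabulary associated with each activity is reasonably stable between the two years.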