Commit 183a50d: Add link to python docs
Parent: 94fb8f5

File tree: 1 file changed (+2 -2 lines)

content/blog/entries/crawling-500-million/contents.lr

Lines changed: 2 additions & 2 deletions

```diff
@@ -60,9 +60,9 @@ In the next section, we'll examine some of the key components that make up the c
 #### Detailed breakdown
 
 ##### Concurrency with `asyncio`
-Crawling is a massively IO bound task. The workers need to maintain lots of simultaneous open connections with internal systems like Kafka and Redis as well as 3rd party websites holding the target images. Once we have the image in memory, performing our actual analysis task is easy and cheap. For these reasons, an asynchronous approach seems more attractive than using multiple threads of execution. Even if our image processing task grows in complexity and becomes CPU bound, we can get the best of both worlds by offloading heavyweight tasks to a process pool.
+Crawling is a massively IO bound task. The workers need to maintain lots of simultaneous open connections with internal systems like Kafka and Redis as well as 3rd party websites holding the target images. Once we have the image in memory, performing our actual analysis task is easy and cheap. For these reasons, an asynchronous approach seems more attractive than using multiple threads of execution. Even if our image processing task grows in complexity and becomes CPU bound, we can get the best of both worlds by offloading heavyweight tasks to a process pool. See "[Running Blocking Code](https://docs.python.org/3/library/asyncio-dev.html#running-blocking-code)" in the `asyncio` docs for more details.
 
-Another reason that an asynchronous approach may be desirable is that we have several interlocking components which need to react to events in real-time: our crawl monitoring process needs to simultaneously control the rate limiting process and also interrupt crawling if errors are detected; our worker processes need to consume crawl events, process images, upload thumbnails, and produce metadata events. Coordinating all of these components through inter-process communication could be difficult, but breaking up tasks into small pieces and yielding to the event loop is comparatively easy.
+Another reason that an asynchronous approach may be desirable is that we have several interlocking components which need to react to events in real-time: our crawl monitoring process needs to simultaneously control the rate limiting process and also interrupt crawling if errors are detected, while our worker processes need to consume crawl events, process images, upload thumbnails, and produce events documenting the metadata of each image. Coordinating all of these components through inter-process communication could be difficult, but breaking up tasks into small pieces and yielding to the event loop is comparatively easy.
 
 ##### The resize task
 This is the most vital part of our crawling system: the part that actually does the work of fetching and processing an image. As established previously, we need to execute this task concurrently, so everything needs to be defined with `async`/`await` syntax to allow the event loop to multitask. The actual task itself is otherwise straightforward.
```
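The "process pool" idea from the first changed paragraph can be sketched with `asyncio`'s `loop.run_in_executor` and a `concurrent.futures.ProcessPoolExecutor`. This is a minimal illustration, not the post's actual worker code; `cpu_heavy_analysis` is a hypothetical stand-in for a CPU-bound step such as perceptual hashing.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def cpu_heavy_analysis(image_bytes: bytes) -> int:
    # Hypothetical stand-in for a CPU-bound image analysis step.
    return sum(image_bytes) % 256


async def analyze(pool: ProcessPoolExecutor, image_bytes: bytes) -> int:
    loop = asyncio.get_running_loop()
    # Offload the heavyweight work to another process; the event loop
    # keeps servicing other connections while we await the result.
    return await loop.run_in_executor(pool, cpu_heavy_analysis, image_bytes)


async def main() -> list[int]:
    with ProcessPoolExecutor() as pool:
        return await asyncio.gather(*(analyze(pool, b"img") for _ in range(4)))


if __name__ == "__main__":
    print(asyncio.run(main()))  # prints [61, 61, 61, 61]
```

The `if __name__ == "__main__":` guard matters here: process pools on spawn-based platforms re-import the main module, and the guard prevents the script from re-executing itself in each worker.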
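The second changed paragraph, about interlocking components reacting to events in real time, can be illustrated with two coroutines sharing one event loop and an `asyncio.Event`. The names `crawl_worker` and `crawl_monitor` are illustrative only, and the monitor's "error detection" is simulated with a sleep.

```python
import asyncio


async def crawl_worker(stop: asyncio.Event, processed: list[int]) -> None:
    # Stand-in for: consume a crawl event, process the image,
    # upload the thumbnail, produce a metadata event.
    count = 0
    while not stop.is_set():
        processed.append(count)
        count += 1
        await asyncio.sleep(0)  # yield so sibling tasks can react in real time


async def crawl_monitor(stop: asyncio.Event) -> None:
    # Stand-in for watching error rates: after a short delay we
    # "detect" a problem and interrupt crawling.
    await asyncio.sleep(0.01)
    stop.set()


async def main() -> int:
    stop = asyncio.Event()
    processed: list[int] = []
    await asyncio.gather(crawl_worker(stop, processed), crawl_monitor(stop))
    return len(processed)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because both coroutines run on the same loop, the monitor can interrupt the worker without any inter-process communication: the worker simply observes the shared `Event` the next time it yields.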
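The resize task described in the last paragraph can be sketched as a single `async def` in which every slow step is awaited. This is an assumption-laden outline, not the project's real implementation: `fetch_image` fakes an HTTP download with a sleep (a real worker would use an asynchronous HTTP client), and the "resize" is a placeholder byte slice.

```python
import asyncio


async def fetch_image(url: str) -> bytes:
    # Hypothetical stand-in for an HTTP GET; awaiting here frees the
    # event loop to service other downloads in the meantime.
    await asyncio.sleep(0.01)
    return b"fake image bytes"


async def resize_task(url: str) -> tuple[str, int]:
    image = await fetch_image(url)        # yields while the download runs
    thumbnail = image[: len(image) // 2]  # placeholder for real resizing
    await asyncio.sleep(0)                # e.g. await an async thumbnail upload
    return url, len(thumbnail)


async def main() -> list[tuple[str, int]]:
    urls = [f"https://example.com/{i}.jpg" for i in range(3)]
    # All three downloads overlap on a single thread of execution.
    return await asyncio.gather(*(resize_task(u) for u in urls))


if __name__ == "__main__":
    print(asyncio.run(main()))
```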
