<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>CC technical blog</title><link href="http://opensource.creativecommons.org/" rel="alternate"></link><link href="http://opensource.creativecommons.org/blog/feed.xml" rel="self"></link><id>urn:uuid:cc6bc3c1-d0ad-365f-b7a6-1fcd73488c56</id><updated>2025-01-15T00:00:00Z</updated><author><name></name></author><entry><title>Skipping Google Summer of Code (GSoC) 2025</title><link href="http://opensource.creativecommons.org/blog/entries/2025-01-15-skipping-gsoc-2025/" rel="alternate"></link><updated>2025-01-15T00:00:00Z</updated><author><name>['TimidRobot']</name></author><id>urn:uuid:ab4ea0f1-0e3e-3ff1-acf7-17962b9a0b9a</id><content type="html">&lt;p&gt;The Creative Commons (CC) technology team regrets to announce &lt;strong&gt;we will not be
participating in Google Summer of Code (GSoC) 2025&lt;/strong&gt;. While the program remains
excellent, we do not have the resources to participate this year and meet our
core responsibilities.&lt;/p&gt;
&lt;p&gt;We are grateful to Google for the program and have found incredible value in
participating in past years. We look forward to participating in future years.
We are thankful for the work and time of contributors. This is not an exciting
announcement, but we will be better equipped to engage with work programs in
the future.&lt;/p&gt;
&lt;h2 id="preparing-to-re-engage"&gt;Preparing to re-engage&lt;/h2&gt;&lt;p&gt;In addition to revamping our CC Open Source website during the first quarter of
this year, we will also be refreshing our structured community involvement, and
improving our project lead resources.&lt;/p&gt;
&lt;p&gt;Our CC Open Source website is in the process of being updated to be less
technologically complex and to leverage the current Vocabulary design system
(&lt;a href="http://github.com/creativecommons/vocabulary"&gt;creativecommons/vocabulary&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Our structured community involvement has languished since the technology team
was downsized due to the COVID pandemic (2020-12-07 &lt;a href="https://opensource.creativecommons.org/blog/entries/2020-12-07-upcoming-changes-to-community/"&gt;Upcoming Changes to the CC
Open Source Community — Creative Commons Open Source&lt;/a&gt;). We will be
simplifying community involvement so that we can be more responsive with
increased visibility.&lt;/p&gt;
&lt;p&gt;The most resource-intensive period of a work program is the application phase.
During this time there is a deluge of activity that has often exceeded our
capacity. Developing our project lead resources will allow us to better set
expectations, ease communications, and point applicants toward productive
trajectories.&lt;/p&gt;
&lt;h2 id="past-participation"&gt;Past participation&lt;/h2&gt;&lt;p&gt;For information on the excellent work completed during past participations,
please see: &lt;a href="https://opensource.creativecommons.org/programs/history/"&gt;Open Source Work Programs: History — Creative Commons Open
Source&lt;/a&gt;.&lt;/p&gt;
</content></entry><entry><title>My Outreachy Internship With Creative Commons</title><link href="http://opensource.creativecommons.org/blog/entries/my-outreachy-internship-with-creative-commons/" rel="alternate"></link><updated>2024-12-10T00:00:00Z</updated><author><name>['Queen']</name></author><id>urn:uuid:0e2dfb7f-de05-34dd-9a2e-323c02bc8def</id><content type="html">&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;&lt;p&gt;Hi, everyone! My name is Queen, and I’m a fresh pharmacy graduate with a passion for tech. My journey into coding started four years ago when I wrote my first HTML code and thought, "Yes, I’m FAANG-ready!" Spoiler: I wasn’t — but that didn’t stop me from dreaming big. Balancing pharmacy school and learning to code often felt like I was biting off more than I could chew, but I’m proud that I never gave up on my dream of becoming a front-end developer.&lt;/p&gt;
&lt;h2 id="my-core-values"&gt;My Core Values&lt;/h2&gt;&lt;p&gt;When I think about my core values, three words come to mind: &lt;strong&gt;Growth&lt;/strong&gt;, &lt;strong&gt;Curiosity&lt;/strong&gt;, and &lt;strong&gt;Knowledge&lt;/strong&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Growth&lt;/strong&gt;: I strive to improve in every aspect of life—mentally, physically, intellectually, and even spiritually. Every setback is just a stepping stone for me.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Curiosity&lt;/strong&gt;: This one’s a work in progress, but I’m learning to ask questions and embrace not knowing. I love understanding &lt;em&gt;why&lt;/em&gt; things work the way they do.  &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Knowledge&lt;/strong&gt;: I read a lot because I genuinely enjoy learning new things. For me, knowledge is the key to confidence and growth.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="my-journey-to-outreachy"&gt;My Journey to Outreachy&lt;/h2&gt;&lt;p&gt;Outreachy is a three-month paid open-source internship program for underrepresented people in tech. I first heard about it last year—just two days before the application deadline. I didn’t make it past the initial application stage that time.&lt;/p&gt;
&lt;p&gt;When the December 2024 cohort initial application opened, it was the perfect timing for me. By then, I had finished my pharmacy degree and was ready to gain professional experience in front-end development and most especially, in open source. This time, I was determined to get it right.&lt;/p&gt;
&lt;p&gt;I applied on the same day the application opened, as I had already saved answers to the essay questions in my notes app. While waiting for the results, I brushed up on my skills and read articles from past interns to prepare for the contribution period.&lt;/p&gt;
&lt;p&gt;When I received the email saying my initial application had been approved, I felt a rush of excitement. To move forward and be able to make a final application, I needed to make at least one contribution to a project. I narrowed my choices to two based on the skills required, but Creative Commons stood out to me. Their mission and the project description piqued my interest more.&lt;/p&gt;
&lt;h2 id="the-contribution-period"&gt;The Contribution Period&lt;/h2&gt;&lt;p&gt;The contribution period was competitive—and intimidating. Seeing the amazing work other applicants were doing made me doubt myself. But I loved the project and found the community so welcoming that I couldn’t give up.&lt;/p&gt;
&lt;p&gt;The mentors were incredibly supportive, giving feedback that helped me improve with each contribution. When it was time to draft my final application and proposal, I worked with my mentor, sharing my plans and getting her input, which helped me create my project timeline.&lt;/p&gt;
&lt;p&gt;Even before I knew if I’d be selected, I felt fulfilled. Contributing to Creative Commons was a rewarding experience, and I knew I wanted to continue contributing to the community, intern or not.&lt;/p&gt;
&lt;h2 id="my-internship-project"&gt;My Internship Project&lt;/h2&gt;&lt;h3 id="consolidating-and-implementing-the-vocabulary-design-system"&gt;&lt;strong&gt;Consolidating and Implementing the Vocabulary Design System&lt;/strong&gt;&lt;/h3&gt;&lt;p&gt;During my internship, I’ll be working on consolidating and implementing the Vocabulary design system across Creative Commons' ancillary websites.&lt;/p&gt;
&lt;p&gt;Vocabulary is a design system that ensures a consistent user interface (UI) and user experience (UX) across all Creative Commons websites. However, its implementation has been inconsistent, with variations in features and versions across different sites. My role is to identify these inconsistencies and work on a unified implementation that also focuses on accessibility. I'll also be implementing features that might be a good addition to the design system.&lt;/p&gt;
&lt;p&gt;I’m excited about this project because it aligns with my passion for front-end development and allows me to contribute meaningfully to a global community. I also get to improve my skills and gain new ones.&lt;/p&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;&lt;p&gt;This internship is more than just a milestone for me—it’s a testament to perseverance and growth. I’m thrilled to embark on this journey with Creative Commons, and I can’t wait to see where it leads.&lt;/p&gt;
&lt;p&gt;If you’re considering applying for Outreachy, my advice is simple: believe in yourself, stay curious, and never stop learning. Your journey might just surprise you.&lt;/p&gt;
&lt;p&gt;Thank you for reading!&lt;/p&gt;
</content></entry><entry><title>Local Environment Creation using Ansible and Docker: Part 2</title><link href="http://opensource.creativecommons.org/blog/entries/2024-08-23-create-local-ansible-dev-env/" rel="alternate"></link><updated>2024-08-23T00:00:00Z</updated><author><name>['amandayclee']</name></author><id>urn:uuid:e799d536-27a9-35ae-98d5-a21021bd8b52</id><content type="html">&lt;div style="text-align: center;"&gt;
    &lt;img src="gsoc-banner.png" alt="GSoC 2024" style="max-height: 200px;"&gt;
&lt;/div&gt;&lt;h1 id="midterm-recap"&gt;Midterm Recap&lt;/h1&gt;&lt;p&gt;Over the past 6 weeks, I successfully created customized Dockerfiles and a docker-compose.yml for &lt;code&gt;web&lt;/code&gt;, &lt;code&gt;database&lt;/code&gt;, and &lt;code&gt;ansible&lt;/code&gt;. However, to better replicate our production environment, which uses an AWS RDS instance, we decided to remove the customized Dockerfile for database, as SSH access is not required for the database host in this setup.&lt;/p&gt;
&lt;h3 id="week-by-week-progress"&gt;Week-by-Week Progress&lt;/h3&gt;&lt;p&gt;Following our initial architecture design, I began working on building a bastion server. One of the key lessons I learned during this process was the value of simplicity. For instance, I had to assess the trade-offs between creating a custom Dockerfile and using a prebuilt image maintained by the community. In the world of DevOps, some terms are often loosely defined. For example, during my research on bastion servers, I encountered various use cases such as integrating MFA, logging, and other security features. However, these were beyond the scope of our current project.&lt;/p&gt;
&lt;p&gt;For this project, we are building a bastion server primarily to serve as a secure gateway for managing access to internal servers. This specific requirement dictated a more straightforward implementation. In this context, I also came across the concept of "YAGNI" (You Aren’t Gonna Need It), which reminds us to avoid adding unnecessary features until they are actually required. Along the way, while working with Creative Commons (CC), I learned an important lesson: &lt;strong&gt;with so many tools, software, and technologies available, it’s crucial to focus on implementing configurations and solutions that are tailored specifically to our environment and requirements.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Before setting up the bastion server, it's also very important to understand the different SSH configuration options and choose the one that best meets our security and convenience needs. For instance, passwordless SSH enhances security and convenience by enabling SSH key-based authentication, but it requires public key configuration on each server, which can be cumbersome in larger environments. SSH Agent, on the other hand, improves the security and management of private keys by keeping them in memory across multiple connections. However, it requires running the SSH Agent locally and loading keys, adding some complexity to the setup. We ultimately decided to use ProxyJump because it offers centralized control and simplifies multi-hop connections through a bastion server, which provides strong security and convenience. While ProxyJump requires moderate configuration of both the bastion and target servers, it excels in supporting multi-hop connections and ensuring secure access to internal servers.&lt;/p&gt;
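&lt;p&gt;As a minimal sketch of the ProxyJump approach, the Python snippet below builds the OpenSSH command line for reaching an internal host through a bastion using the &lt;code&gt;-J&lt;/code&gt; option (the command-line form of &lt;code&gt;ProxyJump&lt;/code&gt;). The user name and host names are hypothetical placeholders, not the values used in the project.&lt;/p&gt;

```python
import shlex


def proxyjump_command(target, bastion, user="ansible"):
    """Build an ssh command that reaches `target` through `bastion`.

    All names are illustrative assumptions, not project values.
    """
    # -J is the command-line equivalent of the ProxyJump config option.
    return ["ssh", "-J", f"{user}@{bastion}", f"{user}@{target}"]


if __name__ == "__main__":
    # Print the command rather than running it, since the hosts are placeholders.
    print(shlex.join(proxyjump_command("web", "bastion")))
```

&lt;p&gt;Returning the command as a list of arguments keeps it safe to pass to &lt;code&gt;subprocess.run&lt;/code&gt; without shell quoting concerns.&lt;/p&gt;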
&lt;p&gt;We finalized these details in the &lt;a href="https://github.com/creativecommons/ansible-dev/pull/14"&gt;Bastion Container Creation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The next step was to explore the best approach for integrating Ansible with Docker to closely mirror a production environment. We maintained our manual provisioning approach and focused on three key integration strategies.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Option 1 involves having Ansible manage containers directly through the Docker network, where all services (&lt;code&gt;bastion&lt;/code&gt;, &lt;code&gt;ansible&lt;/code&gt;, &lt;code&gt;web&lt;/code&gt;, &lt;code&gt;db&lt;/code&gt;) operate within the same network. Ansible handles the management of the web and db containers using their container names or IPs, with the bastion server acting as a jump host only when necessary. This approach treats each container as an independent host, with Ansible responsible for installing and configuring the necessary software.&lt;/li&gt;
&lt;li&gt;Option 2 leverages the &lt;code&gt;community.docker.docker_container_exec&lt;/code&gt; module to execute commands within Docker containers via Ansible playbooks. This method allows for application installation and configuration tasks to be performed directly inside the containers.&lt;/li&gt;
&lt;li&gt;Option 3 involves running only the bastion and Ansible services in Docker, while using Ansible to provision the web and db services. Ansible connects to these containers through the bastion server, allowing it to manage and configure the web and db as external resources.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;When comparing these options, Option 1 manages applications within containers at the application layer, offering fine-grained control and simplifying setup in a unified environment. Option 2 operates at the Docker layer, providing greater flexibility and portability, ideal for quick deployments. Option 3 provides the best isolation between services, closely simulating a production environment with enhanced security, but it requires a more complex setup.&lt;/p&gt;
&lt;p&gt;After careful consideration, we decided to proceed with Option 1, which shifts most configuration tasks from the Dockerfile to Ansible playbooks. As I write this post, I am in the process of implementing these playbooks to configure the containers. You can follow the ongoing development in this &lt;a href="https://github.com/creativecommons/ansible-dev/"&gt;repository&lt;/a&gt;.&lt;/p&gt;
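&lt;p&gt;To illustrate Option 1, here is a rough sketch that renders an INI-style Ansible inventory in which each container is an independent host reached through the bastion. The group name &lt;code&gt;managed&lt;/code&gt;, the user name, and the host names are assumptions made for this example; &lt;code&gt;ansible_ssh_common_args&lt;/code&gt; is the Ansible variable for passing extra SSH options. The project's actual inventory may look different.&lt;/p&gt;

```python
def render_inventory(hosts, bastion, user="ansible"):
    """Render an INI-style Ansible inventory that routes SSH through a bastion.

    The group name and default user are hypothetical, chosen for illustration.
    """
    lines = ["[managed]"]
    lines.extend(hosts)
    lines.append("")
    lines.append("[managed:vars]")
    lines.append(f"ansible_user={user}")
    # ansible_ssh_common_args appends extra options to Ansible's ssh calls.
    lines.append(f"ansible_ssh_common_args='-o ProxyJump={user}@{bastion}'")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_inventory(["web", "db"], "bastion"), end="")
```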
&lt;h1 id="acknowledgments"&gt;Acknowledgments&lt;/h1&gt;&lt;p&gt;This experience has provided me with practical skills in implementing real-world DevOps projects. I truly enjoy learning all this knowledge outside of my daily job and dedicating my personal time to something meaningful, which is often not covered in school. If this project succeeds as a proof of concept, I can gather more feedback from users, specifically open-source developers, to enhance this setup. I mentioned this in my previous blog post, but I can’t emphasize enough how grateful I am to &lt;a href="https://opensource.creativecommons.org/blog/authors/shafiya/"&gt;Shafiya&lt;/a&gt;, &lt;a href="https://opensource.creativecommons.org/blog/authors/TimidRobot/"&gt;Timid Robot&lt;/a&gt;, and &lt;a href="https://opensource.creativecommons.org/blog/authors/sara/"&gt;Sara&lt;/a&gt; for their guidance, and to Google Summer of Code for giving me the opportunity to contribute to open source. As a content creator who both produces and enjoys various open content online, I am incredibly excited and honored to contribute my technical expertise to CC.
Thanks to CC’s impact on society, I am committed to continually advancing my technical skills and supporting this organization in the long term. I look forward to continuing my involvement in the open-source community!&lt;/p&gt;
</content></entry><entry><title>Automating Quantifying the Commons: Part 2</title><link href="http://opensource.creativecommons.org/blog/entries/2024-08-22-automating-quantifying/" rel="alternate"></link><updated>2024-08-22T00:00:00Z</updated><author><name>['NaishaSinha']</name></author><id>urn:uuid:8e7c9006-3567-3028-841c-c6c91d114f07</id><content type="html">&lt;p&gt;&lt;img src="/blog/entries/2024-08-22-automating-quantifying/Automating - GSoC Logo.png" alt="GSoC 2024"&gt;&lt;/p&gt;
&lt;h2 id="introduction-midterm-recap"&gt;Introduction: Midterm Recap&lt;/h2&gt;&lt;hr&gt;
&lt;p&gt;This post serves as a technical journal for the development process of the
concluding stretch of Automating Quantifying the Commons, a project initiative
for the 2024 Google Summer of Code program. Please visit &lt;strong&gt;&lt;a href="https://opensource.creativecommons.org/blog/entries/2024-07-10-automating-quantifying/"&gt;Part 1&lt;/a&gt;&lt;/strong&gt; for more context
if you haven't already done so.&lt;/p&gt;
&lt;p&gt;At the point of the midterm evaluation, I successfully completed Phases 1, 2, and 3
(&lt;code&gt;fetch&lt;/code&gt;, &lt;code&gt;process&lt;/code&gt;, and &lt;code&gt;report&lt;/code&gt;) of the Google Custom Search (GCS) data source, with a working report &lt;code&gt;README&lt;/code&gt; generation
for each quarter. My documented goal for the second half of the period was to complete a baseline automation
software for these processes across all data sources.&lt;/p&gt;
&lt;h2 id="development-process"&gt;Development Process&lt;/h2&gt;&lt;hr&gt;
&lt;h3 id="i-midpoint-reassessment"&gt;I. Midpoint Reassessment&lt;/h3&gt;&lt;p&gt;If you read my previous post, you might have seen that my next steps involved completing the phases for the remaining data sources. 
However, I soon realized that the GCS phases, along with the base analysis and visualization code from the Data Discovery Program, 
already serve as a standard reference for these tasks. Given that the primary goal of this project is to develop automation software
for these phases, my mentor suggested shifting the focus of the final time period towards programming the Git functions for automation. 
This approach, which will require more time and effort, will ensure that anyone working on the remaining data sources can easily integrate 
them using the existing code as a reference.&lt;/p&gt;
&lt;h3 id="ii-github-actions-development"&gt;II. GitHub Actions Development&lt;/h3&gt;&lt;p&gt;We defined GitHub Actions to host our CI/CD workflows, and since I had never used YAML before, 
I needed to learn and familiarize myself with this new technology. Learning YAML presented challenges, 
particularly in developing the Git automation. My mentor emphasized focusing on the Git programming due to these challenges. 
For example, I encountered errors during workflow runs without clear ways to debug them.&lt;/p&gt;
&lt;p&gt;In my previous post, I shared three strategies that helped me familiarize myself with new technology during the first half of the summer. Here, I’m sharing two additional strategies that were particularly useful for GitHub Actions programming:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;GitHub Actions Extension for Visual Studio Code:&lt;/strong&gt; As I was using VSCode for development, I initially struggled to debug issues during workflow runs. Discovering the GitHub Actions Extension for VSCode was a game-changer. This extension highlights issues in the workflow, making it much easier to diagnose and fix problems. I highly recommend searching extensions for any development task, as having relevant tools can make programming much easier.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Creating Mini-Tasks for Experimentation:&lt;/strong&gt; I set up my own GitHub repository with minimal, functional code to experiment with GitHub Actions in a low-risk environment. This approach facilitated easier debugging and comparison, helping me understand why certain things weren’t working. Although I gained more repository privileges after being accepted for GSoC, I still didn’t have the same access level as my mentor. By using a separate repository, I gained a better understanding of GitHub Actions and was able to interpret error logs more effectively. For instance, I realized that the automation wasn’t working initially due to outdated repository secrets, which I discovered without access to the secrets.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;After successfully compiling the initial steps, I focused on refining the scripts for optimal performance. I moved the commit functions into a shared module, which reduced the risk of crashes by allowing functions to be called within individual scripts rather than directly in the YAML workflow. Once the workflows ran successfully, I implemented Cron functions to schedule them quarterly.&lt;/p&gt;
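&lt;p&gt;As a small sketch of the quarterly scheduling idea: a scheduled run needs to know which quarter it belongs to so that data and reports land in the right place. The helper below derives a label such as &lt;code&gt;2024Q3&lt;/code&gt; from a date. The function name and the cron expression are assumptions for illustration, not necessarily what the Quantifying workflows use.&lt;/p&gt;

```python
from datetime import date

# Assumed cron expression for a quarterly schedule: 00:00 UTC on the first
# day of January, April, July, and October.
QUARTERLY_CRON = "0 0 1 1,4,7,10 *"


def quarter_label(day):
    """Return a label such as '2024Q3' for the quarter containing `day`."""
    quarter = (day.month - 1) // 3 + 1
    return f"{day.year}Q{quarter}"


if __name__ == "__main__":
    print(quarter_label(date.today()))
```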
&lt;h3 id="iii-engineering-a-custom-error-handling-and-exception-system"&gt;III. Engineering a Custom Error Handling and Exception System&lt;/h3&gt;&lt;p&gt;A key innovation in this project was the creation of a custom &lt;code&gt;QuantifyingException&lt;/code&gt; class tailored specifically for the unique needs of the data pipeline. 
Unlike generic exceptions, this specialized exception class was designed to capture and handle errors that are particular to the Quantifying process, such as data inconsistencies, 
API rate limits, and file handling errors. By centralizing these exceptions within QuantifyingException, I ensured that all three phases could consistently manage errors in a coherent and structured manner.
While testing this system across all phases, I made sure to purposely include "edge-case" errors upon commits to guarantee that the system could handle all these errors.&lt;/p&gt;
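&lt;p&gt;The idea of a centralized exception can be sketched roughly as follows. This is an illustrative outline only: the &lt;code&gt;exit_code&lt;/code&gt; attribute, the &lt;code&gt;load_rows&lt;/code&gt; helper, and the specific exit code values are assumptions for the example, and the class in the Quantifying repository may differ.&lt;/p&gt;

```python
import sys


class QuantifyingException(Exception):
    """Pipeline-specific error carrying a message and a process exit code."""

    def __init__(self, message, exit_code=1):
        super().__init__(message)
        self.message = message
        self.exit_code = exit_code


def load_rows(rows):
    """Hypothetical phase step that rejects an empty data set."""
    if not rows:
        raise QuantifyingException("No data rows to process", exit_code=2)
    return rows


def main():
    """Run the hypothetical step and translate failures into an exit code."""
    try:
        load_rows([])
    except QuantifyingException as error:
        # A single handler gives every phase the same failure behaviour.
        print(f"ERROR: {error.message}", file=sys.stderr)
        return error.exit_code
    return 0
```

&lt;p&gt;A script would typically finish with &lt;code&gt;sys.exit(main())&lt;/code&gt;, so that a workflow run is marked as failed whenever a phase raises this exception.&lt;/p&gt;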
&lt;p&gt;Upon completion of a robust error and exception handling system, I completed all phase outlines of the remainder of the data sources. For fetching data from these sources, I have developed
codebases combining the GCS fetch system and the original Data Discovery API fetching for a complete fetching system. However, it should be noted that I have not actually fetched data from these 
APIs using the new codebase, as Timid Robot will undertake an initiative to add GitHub bots for the API keys after the GSoC period — this is due to best practice purposes, as it is fundamental
to create dedicated accounts for clear API usage and automated git commits. Therefore, these fetch files may need to be slightly tweaked after that, which will be discussed in &lt;strong&gt;Next Steps&lt;/strong&gt;. 
However, I have made sure to utilize fake data to ensure that the third phase successfully generates reports within the respective README file for ALL data sources.&lt;/p&gt;
&lt;h3 id="iv-finalized-flow-of-system-data"&gt;IV. Finalized Flow of System + Data&lt;/h3&gt;&lt;p&gt;In Part 1, I had shared the initial data flow diagram (DFD) for arranging the codebase. By the end of the program, however, the DFD and the overall system had solidified into something different. 
Below is the final diagram for data flow, which establishes an official framework for future endeavors.&lt;/p&gt;
&lt;p&gt;&lt;img src="/blog/entries/2024-08-22-automating-quantifying/Final DFD.png" alt="DFD"&gt;&lt;/p&gt;
&lt;h2 id="final-conclusions"&gt;Final Conclusions&lt;/h2&gt;&lt;hr&gt;
&lt;h3 id="i-all-deliverables-completed-over-the-course-of-the-program"&gt;I. All Deliverables Completed Over the Course of the Program&lt;/h3&gt;&lt;p&gt;Although this 12-week period allowed significant expansion of the Quantifying codebase, there were still time and resource constraints that we had to consider; primarily, the lack 
of data we could collect using the given APIs over this time period. However, as mentioned earlier, given strategic implementations, I was able to still complete the summer goal of developing a baseline
automation software for data gathering, flow, and report generation, ensuring the scripts run on a quarterly basis. The &lt;strong&gt;Next Steps&lt;/strong&gt; section will elaborate on how this software will be solidified over
the upcoming quarters and years.&lt;/p&gt;
&lt;p&gt;130+ commits, 7,615+ net code additions, and 360+ hours of work later, I present ten pivotal deliverables that I have completed over the summer period:&lt;/p&gt;
&lt;table class="table table-striped"&gt;
&lt;thead class="thead-dark"&gt;&lt;tr&gt;
&lt;th&gt;Deliverable&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Phase 1: Fetch Data&lt;/td&gt;
&lt;td&gt;Building on previous efforts in the Quantifying initiative, this phase efficiently fetches raw data from various data sources using APIs. The retrieved data is then stored in a structured CSV format, preparing it for processing and analysis.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phase 2: Process Data (Outline)&lt;/td&gt;
&lt;td&gt;This phase focuses on analyzing the fetched data between quarters. Since only &lt;code&gt;2024Q3&lt;/code&gt; data (07/01/2024 - 09/30/2024) could comprehensively be generated during the summer period, a pseudocode outline of analysis was developed. Although this phase will be further solidified as more quarters and years pass by, a base error system was tested and implemented during the GSoC period to ensure thoroughness for this phase.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phase 3: Generate Reports&lt;/td&gt;
&lt;td&gt;The final phase successfully creates visualizations and reports based on the generated datasets. These reports are designed to present key findings and trends in a clear, concise manner, and have been designed to automatically be integrated into a quarterly README file to provide a comprehensive overview of license data across data sources.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shared Module&lt;/td&gt;
&lt;td&gt;Created a singular, shared module to organize and streamline the codebase, allowing different directories, paths, and components to be imported through that module across different files.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Directory Sequence (OS)&lt;/td&gt;
&lt;td&gt;Using Operating System (OS) Modules, the codebase effectively facilitates the interaction between all three phases, ensuring smooth communication of 10 different data sources with their respective data storages.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automation using GitHub Actions CI/CD&lt;/td&gt;
&lt;td&gt;All three phases of the project — data fetching, processing, and reporting — have been automated using YAML scripts in GitHub Actions. This CI/CD pipeline ensures that every update to the codebase triggers the entire workflow, from data retrieval to the generation of final reports, maintaining consistency and reliability across the process. Cron functions are used to ensure that these scripts are run every quarter in a timely manner.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom Error &amp;amp; Exception Handling System&lt;/td&gt;
&lt;td&gt;Implemented a custom exception system that centralizes the error-handling logic, keeping the codebases more specific, maintainable, and consistent overall. This system has been thoroughly tested and verified across all three phases.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Project Directory Tree&lt;/td&gt;
&lt;td&gt;Added a structured layout of the project (hierarchical representation of directories and files with descriptive comments), which provides developers with a clear understanding of the project's organization and helps them navigate through different components easily.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Flow + System Design&lt;/td&gt;
&lt;td&gt;Finalized an overall data flow and system design diagram to establish an official framework for the codebase.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Comprehensive Documentation&lt;/td&gt;
&lt;td&gt;This document was developed to serve as a reference guide for any contributors having questions or needing detailed clarification on specific topics within the Quantifying codebase — each section has its own page with expanded information. It also includes external references and documentation regarding the languages and tools used for this project.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="ii-acknowledgements-impact-next-steps"&gt;II. Acknowledgements, Impact, Next Steps&lt;/h3&gt;&lt;p&gt;This project would not have been possible without the constant guidance and insights of my mentors: &lt;strong&gt;&lt;a href="https://opensource.creativecommons.org/blog/authors/TimidRobot/"&gt;Timid Robot Zehta&lt;/a&gt;&lt;/strong&gt; (lead), &lt;strong&gt;&lt;a href="https://opensource.creativecommons.org/blog/authors/shafiya/"&gt;Shafiya Heena&lt;/a&gt;&lt;/strong&gt; (supporting), and &lt;strong&gt;&lt;a href="https://opensource.creativecommons.org/blog/authors/sara/"&gt;Sara Lovell&lt;/a&gt;&lt;/strong&gt; (supporting).
I appreciate how they created a safe space for working since the very beginning. I've never felt hesitant to ask questions
and have never felt out-of-place working in the organization, despite my introductory-level skillset at the start. In fact, this allowed
me to feel open to ask questions and be able to undertake side-projects that facilitated my growth. I truly believe that being able to work in an environment like this 
has played a large role in my ability to perform well, and this was the sole reason for the overall fast progress and depth of my deliverables.&lt;/p&gt;
&lt;p&gt;As for overall impact, it is very evident that Creative Commons is integral to facilitating the sharing and utilization of creative works worldwide. With over 2.5 billion
licenses globally, Creative Commons and its open-source initiatives hold heavy impact, promising to empower researchers, policymakers, and stakeholders with up-to-date insights into the global
usage patterns of open domain and CC-licensed content. Therefore, I'm looking forward to witnessing the direct influence this project holds in paving the way for future advancements
in leveraging open content licenses globally. I am extremely grateful and honored to be able to play such a major role in contributing to this organization, and am excited to see
future contributions I facilitate alongside other CC open-source developers.&lt;/p&gt;
&lt;p&gt;As for next steps, I am opening several post-GSoC issues in the Quantifying repository that can be worked on by any open-source contributor. 
These issues cover some of the necessary adjustments that need to be made once we cross certain time periods and codebase additions. 
If you're interested in getting involved, please visit the &lt;strong&gt;&lt;a href="https://github.com/creativecommons/quantifying/issues"&gt;Issues&lt;/a&gt;&lt;/strong&gt; page linked for your convenience. 
Your contributions will be invaluable as we continue to enhance and expand this project, 
and I’m eager to see the innovative solutions and improvements that will unfold these upcoming years!&lt;/p&gt;
&lt;h2 id="additional-readings"&gt;Additional Readings&lt;/h2&gt;&lt;hr&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://opensource.creativecommons.org/blog/entries/2024-07-10-automating-quantifying/"&gt;Automating Quantifying the Commons: Part 1&lt;/a&gt; | Author: Naisha Sinha | Jul. 2024&lt;/li&gt;
&lt;li&gt;&lt;a href="https://opensource.creativecommons.org/blog/entries/2022-12-07-berkeley-quantifying/"&gt;Data Science Discovery: Quantifying the Commons&lt;/a&gt; | Author: Dun-Ming Huang (Brandon Huang) | Dec. 2022&lt;/li&gt;
&lt;/ul&gt;
</content></entry><entry><title>Continuing Open Collaboration: GSoC 2024 With Creative Commons</title><link href="http://opensource.creativecommons.org/blog/entries/continuing-open-collaboration-gsoc-2024-with-creative-commons/" rel="alternate"></link><updated>2024-08-22T00:00:00Z</updated><author><name>['Murdock9803']</name></author><id>urn:uuid:8e16286a-0387-37a6-bef4-9fe5162ea135</id><content type="html">&lt;p&gt;As I reach the final phase of my work on the Creative Commons Resource Archive under this program, I’ve been thinking about how far we’ve come since the beginning. We started with the idea of modernizing the &lt;a href="https://resources.creativecommons.org/"&gt;resource archive website&lt;/a&gt;, and we have built features that make it safer and more accessible. This journey has had its challenges, but it’s also been very rewarding. 
In the first post, we discussed the early steps that set up this project. Now, as we are in the final weeks, I’m excited to share the progress we’ve made to turn the Resource Archive into a valuable tool for the community.&lt;/p&gt;
&lt;p&gt;&lt;img src="/blog/entries/continuing-open-collaboration-gsoc-2024-with-creative-commons/GSoC+CC-banner.png" alt="GSoC and CC banner"&gt;&lt;/p&gt;
&lt;p&gt;Join me as I walk you through the new features, the hurdles we’ve overcome, and the final product, which I hope will continue to grow and serve the open knowledge community.&lt;/p&gt;
&lt;h2 id="transitioning-from-midterm"&gt;Transitioning From Midterm&lt;/h2&gt;&lt;p&gt;After completing the midterm milestone, I wanted to use what I had learned over the previous weeks to finish the remaining timeline tasks as quickly as possible. My mentor suggested that I not rush, but instead work at a comfortable, somewhat faster pace. This way, we would have some room to implement the project’s stretch goals. The midterm review provided valuable feedback, which helped guide the next steps. With a solid foundation in place, it was time to tackle the more complex challenges and polish the user experience. From enhancing search and filtering to improving accessibility, the post-midterm work aimed to make the resource archive a great tool for the community.&lt;/p&gt;
&lt;h2 id="completing-the-timeline-tasks-weeks-7-8-and-9"&gt;Completing The Timeline Tasks - Weeks 7, 8 and 9&lt;/h2&gt;&lt;p&gt;These three weeks were planned for completing the timeline tasks, so that we could then focus on the project’s further goals. The &lt;code&gt;UI-related&lt;/code&gt; tasks included the submission page, a guide for newcomers on submitting resources, and the filters.&lt;/p&gt;
&lt;h3 id="submission-page-ui-changes"&gt;Submission Page UI Changes&lt;/h3&gt;&lt;p&gt;The &lt;code&gt;submission.html&lt;/code&gt; page is where contributors submit their resources about Creative Commons, or about open sharing of knowledge in general, to the resource archive. 
&lt;a href="https://github.com/creativecommons/cc-resource-archive/pull/315"&gt;PR#315&lt;/a&gt; performs the following tasks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;adds a new context to the element, named &lt;code&gt;submit-page&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;refactors the whole html code for &lt;code&gt;submission.html&lt;/code&gt; with semantic code.&lt;/li&gt;
&lt;li&gt;makes the page responsive, by adding &lt;code&gt;media queries&lt;/code&gt; wherever needed.&lt;/li&gt;
&lt;li&gt;adds a step-by-step written guide, including images, for new GitHub users submitting a resource.&lt;/li&gt;
&lt;li&gt;ensures the page meets current CC aesthetics.&lt;/li&gt;
&lt;li&gt;adds documentation for better understanding and maintainability.&lt;/li&gt;
&lt;li&gt;formats the files with &lt;code&gt;prettier code formatter&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="submission-guide-for-newcomers"&gt;Submission Guide For Newcomers&lt;/h3&gt;&lt;p&gt;For people not familiar with GitHub or with opening &lt;code&gt;Pull Requests&lt;/code&gt; on GitHub, a comprehensive guide was added that walks through submitting a resource to the resource archive step by step. The guide includes instructions to fork and clone the repository, commit to it, and open the PR. It is accompanied by well-labeled images to make it easier to follow. This work was done in &lt;a href="https://github.com/creativecommons/cc-resource-archive/pull/315"&gt;PR#315&lt;/a&gt;. It was the final addition to the &lt;code&gt;submission.html&lt;/code&gt; page and completed all the tasks related to this page.&lt;/p&gt;
&lt;h3 id="filters-placement-and-functioning"&gt;Filters Placement and Functioning&lt;/h3&gt;&lt;p&gt;The resource archive uses filters to select similar resources. These filters are grouped into three categories: &lt;code&gt;TOPIC&lt;/code&gt;, &lt;code&gt;MEDIUM&lt;/code&gt; and &lt;code&gt;LANGUAGE&lt;/code&gt;. Each category has several filter options to choose from. In the previous iteration of the resource archive site, the category filters were placed in the middle of the page, and there was no icon to indicate which filter was selected. 
&lt;a href="https://github.com/creativecommons/cc-resource-archive/pull/316"&gt;PR#316&lt;/a&gt; makes the filters responsive and easier to access by performing the following tasks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;adds semantic html to &lt;code&gt;index.html&lt;/code&gt; and &lt;code&gt;all.html&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;removes the filters from &lt;code&gt;index.html&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;removes inline styles from html pages.&lt;/li&gt;
&lt;li&gt;adds a new context to &lt;code&gt;listing.html&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;works on the &lt;code&gt;see-all-resources&lt;/code&gt; link.&lt;/li&gt;
&lt;li&gt;adds checkbox in place of &lt;code&gt;[x]&lt;/code&gt; in category filters.&lt;/li&gt;
&lt;li&gt;moves the filters into a sidebar.&lt;/li&gt;
&lt;li&gt;makes the category filters fully responsive.&lt;/li&gt;
&lt;li&gt;reworks the &lt;code&gt;media-query&lt;/code&gt; breakpoints.&lt;/li&gt;
&lt;li&gt;formats the code with &lt;code&gt;prettier code formatter&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;adds documentation for the &lt;code&gt;index.html&lt;/code&gt;, &lt;code&gt;all.html&lt;/code&gt; and &lt;code&gt;style.css&lt;/code&gt; files.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="stretch-goals-and-ideation-weeks-10-11-and-12"&gt;Stretch Goals And Ideation - Weeks 10, 11 and 12&lt;/h2&gt;&lt;p&gt;After completing the tasks assigned in the timeline, we shifted the focus to the stretch goals suggested in my proposal and by my mentor (&lt;a href="https://github.com/possumbilities"&gt;Sara&lt;/a&gt;). Three major goals were considered: implementing search functionality through &lt;a href="https://lunrjs.com/docs/index.html"&gt;Lunr.js&lt;/a&gt;, improving accessibility by adding &lt;a href="https://www.w3.org/WAI/standards-guidelines/aria/"&gt;ARIA&lt;/a&gt; attributes to elements of the website, and using the &lt;a href="https://docs.github.com/en/rest?apiVersion=2022-11-28"&gt;GitHub API&lt;/a&gt; to automate the submission of resources to the site. Of these three, we chose to implement &lt;code&gt;ARIA&lt;/code&gt; accessibility and &lt;code&gt;LUNR.js&lt;/code&gt; search functionality. This was decided keeping in mind the scope of the project and the level of complexity we wanted the site to have at this point.&lt;/p&gt;
&lt;h3 id="aria-accessibility-and-search-functionsality-lunr.js"&gt;ARIA Accessibility and Search Functionality - LUNR.js&lt;/h3&gt;&lt;p&gt;My mentor suggested that I read about &lt;code&gt;WAI-ARIA&lt;/code&gt; accessibility to gain insight into the process and to better decide which attributes and features we need to implement. I studied the topic thoroughly and also watched some videos on it. I realized how important these features are for our site and how greatly they affect accessibility for different users. 
Apart from this, we wanted to implement a search feature for the resources with the help of a lightweight search library like &lt;code&gt;LUNR.js&lt;/code&gt;. I started reading about the library and how it works from various sources, mainly the &lt;a href="https://lunrjs.com/docs/index.html"&gt;LUNR Documentation&lt;/a&gt; on its site. After reading it and a bit of planning, I started coding this feature. The goal was to keep the UI similar to the CC Search feature present in the header of many Creative Commons sites.&lt;/p&gt;
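&lt;p&gt;As a rough illustration of the idea behind client-side search, here is a minimal sketch in plain JavaScript. It is not Lunr.js and not the code written for this project: it only shows a simple inverted index, which maps each word to the resources that contain it. The &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;title&lt;/code&gt; and &lt;code&gt;blurb&lt;/code&gt; fields and all function names are assumptions made for the example.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-javascript"&gt;// Split text into lowercase word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Map each token to the set of resource ids that contain it.
// The id, title and blurb fields are hypothetical.
function buildIndex(resources) {
  const index = new Map();
  for (const resource of resources) {
    for (const token of tokenize(resource.title + " " + resource.blurb)) {
      if (!index.has(token)) {
        index.set(token, new Set());
      }
      index.get(token).add(resource.id);
    }
  }
  return index;
}

// Return the ids of resources that contain every token of the query.
function search(index, query) {
  const tokens = tokenize(query);
  if (tokens.length === 0) {
    return [];
  }
  const [first, ...rest] = tokens.map((token) => index.get(token) || new Set());
  return [...first].filter((id) => rest.every((ids) => ids.has(id)));
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A dedicated library such as Lunr.js offers more than this sketch, for example stemming and relevance scoring.&lt;/p&gt;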
&lt;h2 id="issues-solved-from-midterm-to-end"&gt;Issues Solved From Midterm To End&lt;/h2&gt;&lt;p&gt;The issues relevant to the project which have been solved in the period between midterm and final week are listed below:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/52"&gt;#52&lt;/a&gt; - use checkbox for &lt;code&gt;resourcenavtopicknown&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/61"&gt;#61&lt;/a&gt; - change UI of the homepage&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/72"&gt;#72&lt;/a&gt; - UI refinement of the website&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/119"&gt;#119&lt;/a&gt; - adding icon on filter text&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/274"&gt;#274&lt;/a&gt; - Improve the UI of submission page with &lt;code&gt;vocabulary&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/306"&gt;#306&lt;/a&gt; - Language List Columns Collapsing on Website &lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/310"&gt;#310&lt;/a&gt; - Add a step by step guide for submitting resources&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/311"&gt;#311&lt;/a&gt; - html markup contains inline styles&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/313"&gt;#313&lt;/a&gt; - improve mobile view layout for filter columns&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/314"&gt;#314&lt;/a&gt; - remove extra white space between main content and footer&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/318"&gt;#318&lt;/a&gt; - Links in submit page open on same page and spelling mistake-comapre&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="future-plans-and-execution"&gt;Future Plans And Execution&lt;/h2&gt;&lt;p&gt;The future plans include carrying out the stretch goals we discussed in the later weeks of the project. First, I will work on the &lt;code&gt;LUNR.js&lt;/code&gt; search functionality and try to complete it in the 13th week. Contributions to the &lt;code&gt;ARIA&lt;/code&gt; accessibility work are also welcome on GitHub. Apart from these fixed goals, I aim to remain engaged with the community in the years ahead. I plan to keep contributing to the organization, &lt;a href="https://github.com/creativecommons"&gt;Creative Commons&lt;/a&gt;, and especially to the &lt;a href="https://github.com/creativecommons/cc-resource-archive"&gt;CC-Resource-Archive&lt;/a&gt; repository, for as long as I can. I will contribute through issues, pull requests and code reviews. This is the first organization I got to connect with professionally, and I aim to continue to be a part of this mission of sharing open knowledge.&lt;/p&gt;
&lt;h2 id="personal-growth-and-thoughts-on-completion"&gt;Personal Growth And Thoughts On Completion&lt;/h2&gt;&lt;p&gt;As I mentioned in my previous blog post, being a part of Google Summer of Code was a very big deal for me. Doing it with an organization like Creative Commons could not have been better. I not only knew about the organization before all this GSoC 2024 preparation, but also resonated with the idea and mission behind it, and was grateful for the learning opportunities it creates for people across the world. Being in the final week of this program makes me emotional, as I had a really good time with the mentors and the project. In our weekly meetings, the motivation I got every single time from talking with my mentor was unmatched. I was a complete newbie in terms of professional work experience, and this is the way everything should have been. I am a better individual at this point, and far more experienced. I think I will be able to guide my juniors better from now on.
This program also instilled in me a newfound confidence that is needed to take on tough tasks. I believe the program is less about coding skills and more about the self-improvement journey one has with their mentors and fellow contributors. One more thing that caught my interest is reviewing the pull requests of other contributors. My mentor suggested that I try reviewing pull requests from new contributors for experience. “This is a great learning opportunity for you,” as they put it. Just as I was told, it was indeed a great and enjoyable opportunity.&lt;/p&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;&lt;h3 id="gratitude-and-acknowledgements"&gt;Gratitude And Acknowledgements&lt;/h3&gt;&lt;p&gt;&lt;a href="https://github.com/possumbilities"&gt;Sara Lovell&lt;/a&gt;, &lt;a href="https://github.com/TimidRobot"&gt;Timid Robot Zehta&lt;/a&gt; and &lt;a href="https://github.com/Shafiya-Heena"&gt;Shafiya Heena&lt;/a&gt; were the best mentors I could have asked for in this program. My primary mentor, Sara, always encouraged me to get back on track whenever I lagged behind in my work. Considering this was a straightforward project, I did not expect so many learning opportunities. I will always be grateful for these 12 weeks of support, learning and growth. Thank you!&lt;/p&gt;
&lt;h3 id="for-future-contributors"&gt;For Future Contributors&lt;/h3&gt;&lt;p&gt;To all the contributors hoping to contribute to this repository, or to this organization: I welcome you with all my heart. If you are aiming to be selected for an Open Source mentorship program, that is a great idea. But do not contribute to Open Source just for the sake of resume building and a stipend. I agree they are very good benefits, but Open Source is much more than this. Once you contribute to Open Source, you will fall in love with the support you get from this community, just like I did. You can find me on &lt;a href="https://github.com/"&gt;GitHub&lt;/a&gt;, mainly in the &lt;a href="https://github.com/creativecommons/cc-resource-archive"&gt;CC-Resource-Archive&lt;/a&gt; repository. Let’s have the conversation there!&lt;/p&gt;
</content></entry><entry><title>Local Environment Creation using Ansible and Docker: Part 1</title><link href="http://opensource.creativecommons.org/blog/entries/2024-07-19-create-local-ansible-dev-env/" rel="alternate"></link><updated>2024-07-18T00:00:00Z</updated><author><name>['amandayclee']</name></author><id>urn:uuid:d8f7642b-d733-3011-981a-9aadff749fef</id><content type="html">&lt;p&gt;This project explores how Creative Commons
(CC) uses Ansible, an automated system administration tool, to build a local development environment. It is part of Google Summer of Code (GSoC) 2024.&lt;/p&gt;
&lt;div style="text-align: center;"&gt;
    &lt;img src="gsoc-banner.png" alt="GSoC 2024" style="max-height: 200px;"&gt;
&lt;/div&gt;&lt;h1 id="project-objective"&gt;Project Objective&lt;/h1&gt;&lt;h2 id="project-background"&gt;Project Background&lt;/h2&gt;&lt;p&gt;&lt;a href="https://github.com/creativecommons/ansible-dev"&gt;This project&lt;/a&gt; aims to establish a local development environment that closely mirrors our production setup at CC. Currently, CC uses Salt Stack for configuration management. However, the team is evaluating other tools for various reasons. In this project, we explored Ansible, renowned for its simplicity and robust automation capabilities. We combined Ansible with Docker containers to streamline and secure development processes, creating lightweight, isolated environments for running applications.&lt;/p&gt;
&lt;h2 id="challenges-and-learning-opportunities"&gt;Challenges and Learning Opportunities&lt;/h2&gt;&lt;p&gt;Before this project, I didn't have exposure to professional DevOps practices, so this project has been a significant learning experience for me. It focuses on the deployment phase of the DevOps lifecycle, particularly provisioning (setting up servers) and configuration management (managing software and settings). During our early stage exploration, we performed manual provisioning and concentrated on utilizing Ansible for configuration management. Our primary goal is to containerize existing applications, packaging them with their dependencies into Docker containers. Ansible itself operates within a container and manages other containers via SSH.&lt;/p&gt;
&lt;p&gt;&lt;img src="/blog/entries/2024-07-19-create-local-ansible-dev-env/server-structure.png" alt="Server Structure"&gt;
&lt;em&gt;This architecture diagram is designed by my mentor and project lead &lt;a href="https://opensource.creativecommons.org/blog/authors/shafiya/"&gt;Shafiya&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;h3 id="week-by-week-progress"&gt;Week-by-Week Progress&lt;/h3&gt;&lt;p&gt;I began by following the &lt;a href="https://docs.docker.com/guides/getting-started/"&gt;Docker&lt;/a&gt; and &lt;a href="https://docs.ansible.com/ansible/latest/getting_started/index.html"&gt;Ansible&lt;/a&gt; setup guides from the official documentation to successfully deploy an initial &lt;code&gt;ansible&lt;/code&gt; container in &lt;a href="https://github.com/creativecommons/ansible-dev/pull/9"&gt;Creating Initial Structure for Ansible&lt;/a&gt;. This step was crucial for gaining a foundational understanding of Ansible's basic functionality and setup within a containerized environment.&lt;/p&gt;
&lt;p&gt;In the second week, I separated the existing &lt;a href="https://github.com/creativecommons/index-dev-env"&gt;&lt;code&gt;index-dev&lt;/code&gt;&lt;/a&gt; repository, which is the local development environment for current CreativeCommons.org, into individual containers for the &lt;code&gt;web&lt;/code&gt; server and &lt;code&gt;database&lt;/code&gt; server in &lt;a href="https://github.com/creativecommons/ansible-dev/pull/11"&gt;Setting Up Ansible Environment and Hosts&lt;/a&gt;. At the same time, I started investigating the setup of a &lt;a href="https://ovh.github.io/the-bastion/index.html"&gt;Bastion server&lt;/a&gt; and its integration into our system, aiming to enforce a security-focused approach for controlling access to a private network.&lt;/p&gt;
&lt;p&gt;In the third week, I established SSH access between the local machine and &lt;code&gt;web&lt;/code&gt;, &lt;code&gt;database&lt;/code&gt;, and &lt;code&gt;ansible&lt;/code&gt; servers with my mentor Shafiya's guidance in &lt;a href="https://github.com/creativecommons/ansible-dev/pull/12"&gt;Setting Up SSH For &lt;code&gt;web&lt;/code&gt; and &lt;code&gt;database&lt;/code&gt; and Integrate with &lt;code&gt;ansible&lt;/code&gt;&lt;/a&gt;. This step was crucial for enabling secure, automated management of the containers from the Ansible container. One important lesson I learned from Shafiya is to build things from scratch, making frequent commits that document your thought process, rather than trying to put everything together at once and complicating matters.&lt;/p&gt;
&lt;p&gt;In the fourth week, I started writing Ansible playbooks and moved several configurations originally located in the &lt;code&gt;web&lt;/code&gt; Dockerfile to the playbook. Combining Dockerfiles and Ansible playbooks is a common best practice: &lt;strong&gt;Dockerfiles are responsible for building the base image, including the OS and basic tools, while Ansible playbooks handle the application and service configurations.&lt;/strong&gt; However, this part took longer than expected, so we had to extend the work by one more week. Looking back, this was likely because I had no previous experience developing with the LAMP (Linux, Apache, MySQL, PHP) stack and didn't know how to properly configure each component, which prevented me from successfully launching the services. As a result, I had to review the &lt;code&gt;index-dev&lt;/code&gt; repo and what Shafiya and I did in the previous week, and finally got the service to start up correctly in &lt;a href="https://github.com/creativecommons/ansible-dev/pull/13"&gt;Creating A Playbook to Configure Wordpress Over Apache2&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="communication-and-collaboration-in-open-source"&gt;Communication and Collaboration in Open Source&lt;/h3&gt;&lt;p&gt;The CC team, including mentor Shafiya and team members &lt;a href="https://opensource.creativecommons.org/blog/authors/TimidRobot/"&gt;Timid Robot&lt;/a&gt; and &lt;a href="https://opensource.creativecommons.org/blog/authors/sara/"&gt;Sara&lt;/a&gt;, provided valuable insights into system design and broader architectural considerations. Weekly sync meetings and the flexibility to schedule 1:1 sessions facilitated smooth progress. The team provided clear documentation and actively engaged in public Slack channels, making it easy for any contributor to get involved and stay informed.&lt;/p&gt;
&lt;h2 id="conclusion-and-next-steps"&gt;Conclusion and Next Steps&lt;/h2&gt;&lt;p&gt;Moving forward, the focus will be on refining the Ansible playbooks, addressing any bugs or issues, and working on security and scalability concerns. The goal is to deliver a robust and efficient local development environment that closely mirrors the production setup. I'll continue contributing to the community and providing detailed documentation to support future developers in this project.&lt;/p&gt;
</content></entry><entry><title>Empowering Open Knowledge: GSoC 2024 With Creative Commons</title><link href="http://opensource.creativecommons.org/blog/entries/empowering-open-knowledge-gsoc-2024-with-creative-commons/" rel="alternate"></link><updated>2024-07-10T00:00:00Z</updated><author><name>['Murdock9803']</name></author><id>urn:uuid:ff375fe1-1d83-3c9f-b782-2b6166675041</id><content type="html">&lt;p&gt;Hello everyone! My name is Ayush Sahu, and I am thrilled to announce that I have joined Creative Commons this summer through the &lt;a href="https://summerofcode.withgoogle.com/"&gt;Google Summer of Code (2024)&lt;/a&gt; program. As a passionate advocate for open knowledge and a firm believer in the power of collaborative innovation, I am grateful to contribute to an organization that has been championing the free exchange of information and creativity for years.&lt;/p&gt;
&lt;p&gt;&lt;img src="/blog/entries/empowering-open-knowledge-gsoc-2024-with-creative-commons/GSoC+CC-banner.png" alt="GSoC and CC banner"&gt;&lt;/p&gt;
&lt;p&gt;I was inspired to collaborate with Creative Commons because of the profound impact it had on me personally. As a child creating videos for my YouTube channel, I was immensely grateful for the resources provided by Creative Commons. Their promotion of free and open knowledge enabled me to access high-quality content without the constraints of traditional licensing, fostering my creativity and passion for sharing information. This experience instilled in me a deep appreciation for the organization and its mission.&lt;/p&gt;
&lt;h2 id="the-project-i-am-working-on"&gt;The Project I Am Working On&lt;/h2&gt;&lt;p&gt;The project, &lt;strong&gt;Modernize CC Resource Archive&lt;/strong&gt;, focuses on implementing a comprehensive visual overhaul of the &lt;a href="https://resources.creativecommons.org/"&gt;Resource Archive&lt;/a&gt; to align with current Creative Commons aesthetics and functionality standards. Utilizing the &lt;a href="https://github.com/creativecommons/vocabulary"&gt;Internal Design System (Vocabulary)&lt;/a&gt;, the project aims to upgrade the visual design, implement semantic, accessible, and standards-compliant &lt;code&gt;HTML&lt;/code&gt;, &lt;code&gt;CSS&lt;/code&gt;, and &lt;code&gt;JavaScript&lt;/code&gt;, and improve user experience (UX) for resource submission while ensuring site stability on &lt;code&gt;GitHub Pages&lt;/code&gt;. Through these efforts and solid documentation, the revamped Resource Archive will meet modern standards, enhance usability, and facilitate maintainability for both users and developers.&lt;/p&gt;
&lt;h2 id="community-bonding-period"&gt;Community Bonding Period&lt;/h2&gt;&lt;p&gt;The community bonding period has been an incredibly enriching experience. During this time, I had the opportunity to meet my mentors, familiarize myself with the project, and engage with the vibrant community behind Creative Commons. I participated in meetings and discussions, which have deepened my understanding of the organization's values as well as the codebase. The warm welcome and the wealth of knowledge shared by the community have been truly inspiring.&lt;/p&gt;
&lt;h2 id="environment-code-ideation-weeks-1-2-3"&gt;Environment, Code &amp;amp; Ideation - Weeks 1, 2 &amp;amp; 3&lt;/h2&gt;&lt;p&gt;These initial weeks of the project went into testing the development environment, planning upcoming UI changes and getting acquainted with the working process. First, my project mentor &lt;a href="https://opensource.creativecommons.org/blog/authors/sara/"&gt;Sara&lt;/a&gt; guided me through my first contribution in the coding period. I was granted member status at the Creative Commons Organization on GitHub, which was both new and exciting for me.&lt;/p&gt;
&lt;p&gt;The key achievements were:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/pull/266"&gt;PR#266&lt;/a&gt; - Updated the &lt;code&gt;docker-compose.yml&lt;/code&gt; file to the current spec. With the help of my mentor, I opened this first pull request in the coding period. The file was out of date with the specification, in which the &lt;code&gt;version&lt;/code&gt; element at the top of a &lt;code&gt;docker-compose.yml&lt;/code&gt; file is only informative.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/pull/279"&gt;Testing Docker Configurations&lt;/a&gt;: With immense help from mentors Sara and Timid Robot, I got the Docker environment ready for development.&lt;/li&gt;
&lt;li&gt;Learned more about &lt;a href="https://jekyllrb.com/docs/"&gt;Jekyll&lt;/a&gt; and read the &lt;a href="https://github.com/creativecommons/vocabulary"&gt;Vocabulary&lt;/a&gt; code. Got familiar with classes in vocabulary.css and the custom CSS variables in library-vars.css. &lt;/li&gt;
&lt;li&gt;Accessibility Improvements: Learned about keyboard navigability and optimizing the website for better accessibility using semantic HTML and appropriate CSS properties.&lt;/li&gt;
&lt;li&gt;Issue Listing: Identified and listed relevant issues related to semantic code and UI changes. Also added some issues as a to-do list, as suggested by my mentor.&lt;/li&gt;
&lt;li&gt;Reviewed the present structure of the files, and worked on ideas to improve the structure for better understandability and grouping of similar files.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At the end of these initial weeks, I realized that I had spent enough time understanding the code and planning. As my mentor suggested, things would get smoother once we started working on them in practice. So I decided to increase the pace of work in the upcoming weeks to get the planned tasks executed.&lt;/p&gt;
&lt;h2 id="execution-updation-refactor-weeks-4-5-6"&gt;Execution, Updation &amp;amp; Refactor - Weeks 4, 5 &amp;amp; 6&lt;/h2&gt;&lt;p&gt;As the midterm evaluation approached, we held weekly review meetings to plan changes and contributions. The pace picked up in weeks 5 and 6, resulting in several presentable pull requests dedicated to UI changes and code refactoring. Notable tasks executed include:&lt;/p&gt;
&lt;h3 id="improving-file-structure"&gt;Improving File Structure&lt;/h3&gt;&lt;p&gt;After discussing it with my mentor, I improved the file structure of the codebase for better understandability and maintainability. After updating the structure, I updated the paths to all the files that had moved. This was done through a group of pull requests addressing the issue:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/pull/280"&gt;PR#280&lt;/a&gt; - Adds &lt;code&gt;footer.html&lt;/code&gt; to the &lt;code&gt;_includes&lt;/code&gt; directory.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/pull/281"&gt;PR#281&lt;/a&gt; - Includes &lt;code&gt;footer.html&lt;/code&gt; to all the pages of the site.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/pull/282"&gt;PR#282&lt;/a&gt; - Shifted &lt;code&gt;an-explanation-of-creative-commons&lt;/code&gt; to &lt;code&gt;_resources&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/pull/289"&gt;PR#289&lt;/a&gt; - Improves the file structure in the codebase.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/pull/292"&gt;PR#292&lt;/a&gt; - Updates the paths to downloadable resources and PDFs.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/pull/296"&gt;PR#296&lt;/a&gt; - Updated the &lt;code&gt;resource-template&lt;/code&gt; with new paths for images.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="listing-page-ui-changes"&gt;Listing Page UI Changes&lt;/h3&gt;&lt;p&gt;The &lt;code&gt;listing.html&lt;/code&gt; page is responsible for the display of resource cards on the &lt;code&gt;index.html&lt;/code&gt; and &lt;code&gt;all.html&lt;/code&gt; pages. The resource cards had an outdated visual setup and needed to be aligned with the Internal Design System of Creative Commons known as Vocabulary. 
Through &lt;a href="https://github.com/creativecommons/cc-resource-archive/pull/298"&gt;PR#298&lt;/a&gt;, I performed the following tasks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Refactored the html structure of the resource card to &lt;code&gt;IMAGE - TITLE - BLURB&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Utilizing vocabulary, enhanced the style for &lt;code&gt;thumbnail list&lt;/code&gt; in &lt;code&gt;listing.html&lt;/code&gt;. Worked on the grid structure in &lt;code&gt;style.css&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Likewise, worked on enhancing style for the &lt;code&gt;thumbnail box&lt;/code&gt;, &lt;code&gt;thumbnail title&lt;/code&gt;, &lt;code&gt;thumbnail image&lt;/code&gt; and &lt;code&gt;thumbnail blurb&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Worked on Fonts, colors, background colors, etc. according to &lt;code&gt;vocabulary&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Assigned properties like &lt;code&gt;--underline-background-color&lt;/code&gt; from vocabulary into style.css.&lt;/li&gt;
&lt;li&gt;Formatted the &lt;code&gt;style.css&lt;/code&gt; and &lt;code&gt;listing.html&lt;/code&gt; files with &lt;code&gt;prettier&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Fixed the responsiveness of the resource cards.&lt;/li&gt;
&lt;li&gt;Added Documentation in &lt;code&gt;style.css&lt;/code&gt; for understandability.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Through &lt;a href="https://github.com/creativecommons/cc-resource-archive/pull/302"&gt;PR#302&lt;/a&gt;, the following tasks were completed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Added a heading to the page.&lt;/li&gt;
&lt;li&gt;Added clear documentation about various sections in the file.&lt;/li&gt;
&lt;li&gt;Formatted the code with &lt;code&gt;prettier&lt;/code&gt; code formatter.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All these changes gave the website a new look, aligning more closely with the standard Creative Commons design schemes.&lt;/p&gt;
&lt;h3 id="listing-page-all-javascript-changes"&gt;Listing Page (All) Javascript Changes&lt;/h3&gt;&lt;p&gt;The JavaScript code for the listing.html file resided in the &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt; section of the page. The code was outdated and did not use ES6 JavaScript features. For example, it used the &lt;code&gt;var&lt;/code&gt; keyword and the &lt;code&gt;document.write()&lt;/code&gt; method. The code was responsible for a number of tasks related to the display of resources. It extracted the user-selected categories from the URL and returned them as variables. It was also responsible for displaying the resources that matched the selected categories. 
Through &lt;a href="https://github.com/creativecommons/cc-resource-archive/pull/300"&gt;PR#300&lt;/a&gt;, the following tasks were completed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Updated the functions and code to follow ES6 JavaScript conventions.&lt;/li&gt;
&lt;li&gt;Replaced the &lt;code&gt;document.write()&lt;/code&gt; method with DOM (Document Object Model) manipulation. The &lt;code&gt;document.write()&lt;/code&gt; method is a legacy API and its use is discouraged.&lt;/li&gt;
&lt;li&gt;Used the JavaScript &lt;code&gt;DOM&lt;/code&gt; for all the tasks related to filtering of resources in &lt;code&gt;listing.js&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Added checks on the categories selected by the user (which are extracted from the URL), sanitizing the input to reduce the risk of injection attacks on the website.&lt;/li&gt;
&lt;li&gt;Added documentation for all the functions and sections of code to make them easier to understand.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="resource-page-ui-changes"&gt;Resource Page UI Changes&lt;/h3&gt;&lt;p&gt;The resource page is a &lt;code&gt;layout&lt;/code&gt; used to show the various resources that are submitted to the resource archive. This layout page accepts values from the &lt;code&gt;front matter&lt;/code&gt; of each resource. Overall, the page was brought into closer alignment with Creative Commons’ Design System.
The &lt;a href="https://github.com/creativecommons/cc-resource-archive/pull/304"&gt;PR#304&lt;/a&gt; performs the following tasks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Refactors the code for the &lt;code&gt;resource.html&lt;/code&gt; page by implementing semantic HTML.&lt;/li&gt;
&lt;li&gt;Improves the styling of the page in &lt;code&gt;style.css&lt;/code&gt; by utilizing vocabulary.&lt;/li&gt;
&lt;li&gt;Makes the page responsive.&lt;/li&gt;
&lt;li&gt;Draws inspiration from the &lt;a href="https://vocabulary-docs.netlify.app/specimen/contexts/blog-post.html"&gt;Vocabulary Blog Post Page&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These three weeks saw the most significant work being merged, resulting in a refreshed interface. Despite a slow start, consistent effort and mentor support helped me catch up by the midterm evaluation, making these weeks a great learning experience.&lt;/p&gt;
&lt;h2 id="issues-solved-till-now"&gt;Issues Solved Till Now&lt;/h2&gt;&lt;p&gt;The issues relevant to the project which have been solved until now are listed below:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/17"&gt;#17&lt;/a&gt; - upgrade JS code in listings.html&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/176"&gt;#176&lt;/a&gt; - make thumbnails responsive&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/242"&gt;#242&lt;/a&gt; - add footer in submission and resource pages&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/265"&gt;#265&lt;/a&gt; - The docker-compose.yml file currently out of spec&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/267"&gt;#267&lt;/a&gt; - Improve documentation for Dockerfiles&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/269"&gt;#269&lt;/a&gt; - relocate footer code to separate file ‘footer.html’ for reuse&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/273"&gt;#273&lt;/a&gt; - Resource file in the wrong location&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/276"&gt;#276&lt;/a&gt; - Inconsistent docker behavior - Parsing Gemfile&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/283"&gt;#283&lt;/a&gt; - Unnecessary google analytics function in listing.html&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/285"&gt;#285&lt;/a&gt; - Unnecessary google analytics function in resource and submission.html&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/287"&gt;#287&lt;/a&gt; - The file structure in the codebase can be improved. (re-structuring)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/288"&gt;#288&lt;/a&gt; - The style.css file is not properly organized and lacks documentation&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/290"&gt;#290&lt;/a&gt; - The download [pdf] file links at bottom of resources aren't working&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/293"&gt;#293&lt;/a&gt; - The resourcetemplate.md needs to be updated.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/297"&gt;#297&lt;/a&gt; - Change the design of resource cards on homepage using vocabulary&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/165"&gt;#165&lt;/a&gt; - section heading for resource cards&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/301"&gt;#301&lt;/a&gt; - Add proper documentation to listing.html, refactor for structure&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/101"&gt;#101&lt;/a&gt; - Organize code with proper indentation&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/41"&gt;#41&lt;/a&gt; - UI/UX for resource page&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-resource-archive/issues/272"&gt;#272&lt;/a&gt; - unwanted underline in the resource page&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="my-experience-from-getting-selected-to-midterm"&gt;My Experience - From Getting Selected To Midterm&lt;/h2&gt;&lt;p&gt;Getting selected for Google Summer of Code was honestly a very big deal for me. When I joined my university for my undergraduate degree, our seniors introduced us to two guys who had been selected for GSoC that year. They were given immense importance and respect by our seniors and also by my batchmates. From that moment, I knew that being a GSoC contributor is a very prestigious thing for someone hoping to start their career. Having zero background knowledge in programming, I thought it was not something that I should aim for, and that I should leave it to the guys who were already pros at coding. But the dream of being a GSoC contributor stuck somewhere in my head.
Fast forward to my third year at the university, I became good with frontend development as it was something that naturally excited me. After working hard for some months, I was finally selected for the GSoC 2024 program with Creative Commons.&lt;/p&gt;
&lt;p&gt;After being selected for GSoC, I was very happy and satisfied, as it was a dream come true. As a result, I did not do the amount of work that I should have in the initial weeks of the program. I had weekly review meetings with the org mentors, who constantly supported and encouraged me to catch up to the planned timeline of tasks. Thanks to their motivation and some extra effort, I was able to finish the tasks that needed to be done by the midterm evaluation. At this point I feel really good that we have succeeded in completing those tasks. The best thing about this is that it was always &lt;strong&gt;a combined effort&lt;/strong&gt;. I am planning to complete more tasks in the second half of the period than we have so far. I’ll be faster, as I have become really comfortable working with my mentors in these six weeks.&lt;/p&gt;
&lt;h2 id="gratitude-and-acknowledgements"&gt;Gratitude and Acknowledgements&lt;/h2&gt;&lt;p&gt;I would like to express my heartfelt gratitude to my mentors and the entire Creative Commons community for giving me this incredible opportunity. Your support and guidance have been invaluable. Special thanks to my mentor &lt;a href="https://opensource.creativecommons.org/blog/authors/sara/"&gt;Sara Lovell (Possumbilities)&lt;/a&gt; for your constant support. I am really grateful not only for the technical help, but also for the motivation, support and encouragement I got from you. &lt;a href="https://opensource.creativecommons.org/blog/authors/TimidRobot/"&gt;Timid Robot&lt;/a&gt; and &lt;a href="https://opensource.creativecommons.org/blog/authors/shafiya/"&gt;Shafiya Heena&lt;/a&gt; were also always there whenever I needed them. Be it the weekly review meetings or my confusion about the development environment, I never felt I was alone in this. I am excited to work further under your guidance and contribute to the shared vision of Creative Commons.&lt;/p&gt;
&lt;h2 id="join-the-discussion"&gt;Join The Discussion&lt;/h2&gt;&lt;p&gt;There are numerous ways you can join the discussion and contribute to the project. Whether it’s by providing feedback, contributing to the codebase or simply spreading the word about open knowledge, your participation is highly encouraged. You can check out our GitHub repository &lt;a href="https://github.com/creativecommons/cc-resource-archive"&gt;here&lt;/a&gt; to find the codebase and join the discussion.&lt;/p&gt;
</content></entry><entry><title>Automating Quantifying the Commons: Part 1</title><link href="http://opensource.creativecommons.org/blog/entries/2024-07-10-automating-quantifying/" rel="alternate"></link><updated>2024-07-10T00:00:00Z</updated><author><name>['NaishaSinha']</name></author><id>urn:uuid:6696bca3-7190-3d40-a3b6-75fd01ebe8c5</id><content type="html">&lt;p&gt;&lt;img src="/blog/entries/2024-07-10-automating-quantifying/Automating - GSoC Logo.png" alt="GSoC 2024"&gt;&lt;/p&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;&lt;hr&gt;
&lt;p&gt;Quantifying the Commons, an initiative emerging from the UC Berkeley Data Science Discovery Program, 
aims to quantify the frequency of public domain and CC license usage for future accessibility and analysis purposes 
(Refer to the initial CC article for Quantifying &lt;strong&gt;&lt;a href="https://opensource.creativecommons.org/blog/entries/2022-12-07-berkeley-quantifying/"&gt;here!&lt;/a&gt;&lt;/strong&gt;). 
To date, the scope of the previous project advancements has not included automation or combined reporting, 
which is necessary to minimize the potential for human error and allow for more timely updates, 
especially for a system that engages with substantial streams of data. &lt;br&gt;&lt;/p&gt;
&lt;p&gt;As a selected developer for Google Summer of Code 2024, 
my goal this summer is to develop automation software for data gathering, flow, and report generation, 
ensuring that reports are never more than 3 months out-of-date. This blog post serves as a technical journal
of my work up to the midterm evaluation. &lt;strong&gt;&lt;a href="https://opensource.creativecommons.org/blog/entries/2024-08-22-automating-quantifying/"&gt;Automating Quantifying the Commons: Part 2&lt;/a&gt;&lt;/strong&gt; will be posted after successful completion of the
entire summer program.&lt;/p&gt;
&lt;h2 id="pre-program-knowledge-and-associated-challenges"&gt;Pre-Program Knowledge and Associated Challenges&lt;/h2&gt;&lt;hr&gt;
&lt;p&gt;As an undergraduate CS student, I had not yet had any experience working with codebases
as intricate as this one; the most complex software I had worked on prior to this undertaking
was most probably a medium-complexity full-stack application. In my pre-GSoC contributions to Quantifying, I did successfully 
implement logging across all the Python files (&lt;strong&gt;&lt;a href="https://github.com/creativecommons/quantifying/pull/97"&gt;PR #97&lt;/a&gt;&lt;/strong&gt;), but admittedly, I was not familiar with a lot of the other modules that 
were being used in these files. As a result, this caused minor inconveniences to my development process from the very beginning. 
For example, not being experienced with operating system (OS) modules had me confused as to how I was supposed to 
join new directories. In addition, I had never worked with such large streams of data before, so it was initially a
challenge to map out pseudocode for handling big data effectively. The next section elaborates on my development process and how I resolved these setbacks.&lt;/p&gt;
&lt;h2 id="development-process-midterm"&gt;Development Process (Midterm)&lt;/h2&gt;&lt;hr&gt;
&lt;h3 id="i-data-flow-diagram-construction"&gt;I. Data Flow Diagram Construction&lt;/h3&gt;&lt;p&gt;Before starting the code implementation, I decided to develop a &lt;strong&gt;Data Flow Diagram (DFD)&lt;/strong&gt;, which provides a visual
representation of how data flows through a software system. While researching effective DFDs for inspiration, I came across
a &lt;strong&gt;&lt;a href="https://docs.aws.amazon.com/whitepapers/latest/microservices-on-aws/distributed-data-management.html"&gt;technical whitepaper by Amazon Web Services (AWS)&lt;/a&gt;&lt;/strong&gt; on Distributed Data Management, and I found it very helpful in drafting
my own DFD. As I was still relatively new to the codebase, it helped me simplify
the current system into manageable components and better understand how to implement the rest of the project.&lt;/p&gt;
&lt;p&gt;&lt;img src="/blog/entries/2024-07-10-automating-quantifying/DFD.png" alt="DFD"&gt;
This was the initial layout for the data directory flow; however, the more I delved into the development process,
the more the steps changed. I will present the final directory flow in Part 2 at the end of the program.&lt;/p&gt;
&lt;h3 id="ii-identifying-the-first-data-source-to-target"&gt;II. Identifying the First Data Source to Target&lt;/h3&gt;&lt;p&gt;The main approach for implementing this project was to target one specific data source and complete its data extraction, analysis,
and report generation process before adding more data sources to the codebase. There were two possible strategies to consider:
(1) work on the easier data sources first, or (2) begin with the highest complexity data source and then add the easier
ones later. Both approaches have notable pros and cons; however, I decided to adopt the second strategy of 
starting with the most complex data source first. Although this would take slightly longer to implement, it would simplify the process
later on. As a result, I began implementing the software for the &lt;strong&gt;Google Custom Search&lt;/strong&gt;
data source, which has the greatest data retrieval potential of all the sources.&lt;/p&gt;
&lt;h3 id="iii-directory-setup-code-implementation"&gt;III. Directory Setup + Code Implementation&lt;/h3&gt;&lt;p&gt;Based on the DFD, &lt;strong&gt;&lt;a href="https://opensource.creativecommons.org/blog/authors/TimidRobot/"&gt;Timid Robot&lt;/a&gt;&lt;/strong&gt; (my mentor) and I settled on the following directory structure: within our &lt;code&gt;scripts&lt;/code&gt; directory, we would have
separate sub-directories to reflect the phases of data flow: &lt;code&gt;1-fetch&lt;/code&gt;, &lt;code&gt;2-process&lt;/code&gt;, and &lt;code&gt;3-report&lt;/code&gt;. The code would then be
set up so that the phases run in sequential order, each building on the output of the previous one. Additionally, a shared directory was implemented to hold common functions and paths. &lt;br&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;1-fetch&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As I mentioned in the previous sections, starting to code the initial file was a challenge, as I had to learn how to use
new technologies and libraries on-the-go. As a matter of fact, my struggles began when I couldn't even import the 
shared module correctly. However, slowly but surely, I found that consistent research of available documentation as well
as constant insights from Timid Robot made it so that I finally understood everything that I was working with. There were
a few specific things that helped me especially, and I would like to share them here in case it helps any software
developer reading this post:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reading Technical Whitepapers:&lt;/strong&gt; As I mentioned earlier, I studied a technical whitepaper by AWS to help me design my DFD.
From this, I realized that consulting relevant whitepapers by industry giants to see how they approach similar tasks
helped me a lot in understanding best practices for implementing the system. Here is another resource by Meta that I referenced, 
called &lt;strong&gt;&lt;a href="https://engineering.fb.com/2024/05/22/data-infrastructure/composable-data-management-at-meta/"&gt;Composable Data Management at Meta&lt;/a&gt;&lt;/strong&gt; (I mainly used the &lt;em&gt;Building on Similarities&lt;/em&gt; section
to study the logical components of data systems).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Referencing the Most Recent Quantifying Codebase:&lt;/strong&gt; The pre-automation code that was already implemented by previous developers
for &lt;em&gt;Quantifying the Commons&lt;/em&gt;
was the closest thing to my own project that I could reference. Although not all of the code was relevant to the Automating project,
there were many aspects of the codebase I found very helpful to take inspiration from, especially when online research led to a
dead end.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Writing Documentation for the Code:&lt;/strong&gt; As a part of this project, I assigned myself the task of developing documentation for
the Automating Quantifying the Commons project (&lt;strong&gt;&lt;a href="https://unmarred-gym-686.notion.site/Automating-Quantifying-the-Commons-Documentation-441056ae02364d8a9a51d5e820401db5?pvs=4"&gt;can be accessed here!&lt;/a&gt;&lt;/strong&gt;). Heavily inspired by the "rubber duck debugging"
method, where explaining the code or problem step-by-step to someone or something will make the solution present itself, I decided to create documentation
for future developers to reference, in which I break down the code step-by-step to explain each module or function. I found that in
doing this, I was able to understand my own code better.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;As for the license data retrieval process using the Google Custom Search API Key, 
I did have a little hesitation running everything for the first time. 
Since I had never worked with confidential information or such large data inputs before, 
I was scared of messing something up. Sure enough, the first time I ran everything with the language and country parameters, 
it did cause a crash, since the API query-per-day limit was crossed with one script run. As I continued to update 
the script, I learned a very useful trick when it comes to handling big data: 
to avoid hitting the query limit while testing, you can replace the actual API calls 
with logging statements to show the parameters being used. This helps you 
understand the outputs without actually consuming API quota, and it can help you identify bugs more easily. &lt;br&gt;&lt;/p&gt;
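&lt;p&gt;The dry-run trick described above can be sketched as follows. This is a minimal illustration and not the project's actual code: the function names &lt;code&gt;fetch_count&lt;/code&gt; and &lt;code&gt;build_queries&lt;/code&gt;, the &lt;code&gt;dry_run&lt;/code&gt; flag, the parameter keys, and the injected &lt;code&gt;search&lt;/code&gt; callable are all hypothetical stand-ins.&lt;/p&gt;

```python
import logging

LOGGER = logging.getLogger("fetch_sketch")


def build_queries(licenses, countries):
    """Combine every license with every country into query parameters.

    The dictionary keys are illustrative, not real API parameter names.
    """
    return [
        {"license": license_path, "country": country}
        for license_path in licenses
        for country in countries
    ]


def fetch_count(search, params, dry_run=True):
    """Return a result count, or None when only logging the request.

    `search` is any callable that performs the real API request. It is
    a hypothetical stand-in for the actual API client.
    """
    if dry_run:
        # Log the parameters instead of consuming API quota
        LOGGER.info("DRY RUN, would query with: %s", params)
        return None
    return search(params)
```

&lt;p&gt;Counting the entries returned by &lt;code&gt;build_queries&lt;/code&gt; before a real run also shows how many requests a single script run would make, which can then be compared against the daily quota.&lt;/p&gt;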
&lt;p&gt;A notable aspect of this software is the directory organization. I designed it so that the datasets are automatically stored within their
respective quarter's directories rather than all together. This keeps the data organized and easy for users to find in the future, 
especially as the number of datasets grows.&lt;/p&gt;
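&lt;p&gt;A minimal sketch of per-quarter storage is shown below. The quarter label format follows the &lt;code&gt;2024Q3&lt;/code&gt; style used in this post, but the helper names &lt;code&gt;quarter_label&lt;/code&gt; and &lt;code&gt;quarter_data_dir&lt;/code&gt; and the exact directory layout are assumptions made for illustration, not the repository's actual structure.&lt;/p&gt;

```python
import datetime
import os


def quarter_label(day):
    """Return a label such as 2024Q3 for the given date."""
    quarter = (day.month - 1) // 3 + 1
    return f"{day.year}Q{quarter}"


def quarter_data_dir(base_dir, day, phase="1-fetch"):
    """Build (and create if needed) the directory for one quarter's data.

    The base_dir/quarter/phase layout is an assumed example layout.
    """
    path = os.path.join(base_dir, quarter_label(day), phase)
    # exist_ok avoids an error when the script is re-run in the same quarter
    os.makedirs(path, exist_ok=True)
    return path
```

&lt;p&gt;Because the label is derived from the run date, a script run in a new quarter writes to a new directory without any manual setup.&lt;/p&gt;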
&lt;p&gt;Upon successful completion of basic data retrieval and state management in Phase 1, 
I felt much more confident about the trajectory of this project, and implementing 
future steps and fixing new bugs became progressively easier.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;2-process&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The long-term goal of the Quantifying project is to have comprehensive datasets for each quarter, encompassing
license data that scales up to millions and even billions. For the &lt;code&gt;2-process&lt;/code&gt; phase specifically, the aim is 
to analyze and compare data between quarters so that the results can be displayed in the reports. However, given our Google Custom Search
API constraints as well as the time frame of the GSoC period (most of which falls within
2024Q3), it is not possible to have a fully completed Phase 2. Nevertheless, in order to deploy automation software that is as complete as possible,
I have set up basic pseudocode that can be implemented
and built upon by future development efforts as more data is collected in the upcoming quarters/years.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;3-report&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As mentioned earlier, the Google Custom Search API constraints made it difficult to create a comprehensive and detailed dataset, so I plan to 
begin developing a more fleshed-out Google Custom Search post-GSoC, when more data can be accumulated (discussed further in the next section).
As of now, there are three main completed report visualization schemes: &lt;strong&gt;(1)&lt;/strong&gt; Reports by Country, &lt;strong&gt;(2)&lt;/strong&gt; Reports by License Type,
and &lt;strong&gt;(3)&lt;/strong&gt; Reports by Language. Although the visualizations are basic in design, I made sure to incorporate accessibility into the 
visualizations for a better user experience. This included adding elements like labels on top of the bars with specific number counts for better
readability and understanding of the reports. In addition, I included three key features in the reports codebase to cater to various possible 
needs of the report users.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Key Feature #1:&lt;/strong&gt; I implemented command line arguments in which users can choose any quarter to visualize, as I believe this would be useful
for anyone in need of individual reports from previous quarters, not just reports from this quarter.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Key Feature #2:&lt;/strong&gt; The program stores reports in the data reports directory specific to each quarter for optimal organization (similar to the
dataset organization in Phase 1). In this way, 
reports from one quarter will not be mixed up with reports from another quarter, making them easier for users to navigate and use.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Key Feature #3:&lt;/strong&gt; The program automatically generates and/or updates an individual &lt;code&gt;README&lt;/code&gt; file for each quarter's reports. This &lt;code&gt;README&lt;/code&gt; organizes
all generated report images within that quarter into one page, alongside basic report descriptions.&lt;/li&gt;
&lt;/ol&gt;
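&lt;p&gt;The first and third features can be sketched roughly as below. This is an illustration under stated assumptions rather than the project's code: the &lt;code&gt;--quarter&lt;/code&gt; option name, its default value, the &lt;code&gt;README.md&lt;/code&gt; file name, the PNG image format, and the helper names are all hypothetical.&lt;/p&gt;

```python
import argparse
import os
import re


def parse_quarter(value):
    """Validate a quarter label such as 2024Q3 for argparse."""
    if not re.fullmatch(r"\d{4}Q[1-4]", value):
        raise argparse.ArgumentTypeError(f"invalid quarter: {value}")
    return value


def parse_arguments(argv=None):
    """Parse the (hypothetical) --quarter command line option."""
    parser = argparse.ArgumentParser(description="Generate reports")
    parser.add_argument("--quarter", type=parse_quarter, default="2024Q3")
    return parser.parse_args(argv)


def update_readme(report_dir, quarter):
    """Create or overwrite a README listing every PNG in report_dir."""
    images = sorted(
        name for name in os.listdir(report_dir) if name.endswith(".png")
    )
    lines = [f"# {quarter} reports", ""]
    for name in images:
        # Markdown image syntax, one report image per line
        lines.append(f"![{name}]({name})")
    readme_path = os.path.join(report_dir, "README.md")
    with open(readme_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return readme_path
```

&lt;p&gt;Regenerating the whole file on every run keeps the README in step with the images that are actually present in the quarter's directory.&lt;/p&gt;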
&lt;h2 id="mid-program-conclusions-and-upcoming-tasks"&gt;Mid-Program Conclusions and Upcoming Tasks&lt;/h2&gt;&lt;hr&gt;
&lt;p&gt;Overall, my understanding and skillset for this project increased ten-fold after completing all the phases for Google Custom Search. 
Going into the second half of the Google Summer of Code program, I expect that I will complete the future data sources at a more efficient and faster rate,
given the license data sizes and my heightened expertise. In fact, as of now (the midterm evaluation point), I have completed
a relatively detailed Phase 1 for Flickr, which only involves 10 licenses. My biggest takeaway from the first half of the coding period is that rather than developing
a basic querying process and adding on later, it's easier to start off with a complex and detailed version before moving on to Phases 2 and 3. Additionally, using
the &lt;code&gt;shared&lt;/code&gt; module within the scripts can be very beneficial to simplify the coding process.&lt;/p&gt;
&lt;p&gt;In the second half of the GSoC program, I plan to keep both of these takeaways in mind when developing scripts for the rest of the data sources. On a formal level,
the final goal for the end of GSoC 2024 is to have a working codebase for Phases 1, 2, and 3 of all data sources, 
including a completed automation setup for these scripts. Due to the effectiveness of the current directory organization and report generation features, 
I will be standardizing them across all data sources.&lt;/p&gt;
&lt;p&gt;Finally, after the software is complete to the extent that is possible during the GSoC period, 
I plan to open issues in the repository for each of the next steps that
could be taken post-GSoC by open-source developers for a more comprehensive software system.&lt;/p&gt;
&lt;p&gt;So far, my journey at Creative Commons has significantly enhanced my skillset as a software developer, and I have never felt more motivated to take on more challenging tasks. 
I'm looking forward to more levels of growth and accomplishments in the second half of the program. 
I'll be back with Part 2 at the end of the summer with an updated, completed project!&lt;/p&gt;
&lt;h2 id="additional-readings"&gt;Additional Readings&lt;/h2&gt;&lt;hr&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://opensource.creativecommons.org/blog/entries/2024-08-22-automating-quantifying/"&gt;Automating Quantifying the Commons: Part 2&lt;/a&gt; | Author: Naisha Sinha | Aug. 2024&lt;/li&gt;
&lt;li&gt;&lt;a href="https://opensource.creativecommons.org/blog/entries/2022-12-07-berkeley-quantifying/"&gt;Data Science Discovery: Quantifying the Commons&lt;/a&gt; | Author: Dun-Ming Huang (Brandon Huang) | Dec. 2022&lt;/li&gt;
&lt;/ul&gt;
</content></entry><entry><title>New CreativeCommons.org launched 2023 September</title><link href="http://opensource.creativecommons.org/blog/entries/2024-05-28-creativecommons-org/" rel="alternate"></link><updated>2024-05-28T00:00:00Z</updated><author><name>['sara', 'shafiya', 'TimidRobot']</name></author><id>urn:uuid:7ea5e14e-e9cb-35f3-8be8-e344b66f7a7e</id><content type="html">&lt;p&gt;Creative Commons (CC) launched a new
&lt;a href="https://creativecommons.org/"&gt;CreativeCommons.org&lt;/a&gt; website on 2023 September
27th. This relaunch included not just the website, but the entire technology
stack (platform, server, and website components).&lt;/p&gt;
&lt;h2 id="improved-platform"&gt;Improved platform&lt;/h2&gt;&lt;p&gt;The new website is hosted on AWS. This allowed us to design a more secure
network architecture between services and deploy/manage the services using
infrastructure as code.&lt;/p&gt;
&lt;h2 id="improved-services"&gt;Improved services&lt;/h2&gt;&lt;p&gt;The services running the website were simplified and updated. The number of
distinct servers was reduced from six down to two. Previously, loading the
homepage required five services (HAProxy, Varnish, Apache2, PHP+FPM, and
MariaDB). The complexity of the old services made troubleshooting more
difficult. They were designed before Cloudflare began supporting us through
&lt;a href="https://www.cloudflare.com/galileo/"&gt;Project Galileo&lt;/a&gt;. The new website
requires only two services (Apache2 and MariaDB).&lt;/p&gt;
&lt;h2 id="improved-website-components"&gt;Improved website components&lt;/h2&gt;&lt;h3 id="vocabulary"&gt;Vocabulary&lt;/h3&gt;&lt;p&gt;The website consists of a variety of components that use the Vocabulary design
system (&lt;a href="https://github.com/creativecommons/vocabulary"&gt;creativecommons/vocabulary&lt;/a&gt;) to present a unified user
experience. This relaunch was the first implementation of the new Vocabulary.
It has returned to core web principles, favoring semantic HTML and appropriately
scoped CSS styling. It keeps the style layer responsibilities firmly within the
CSS, rather than utilizing a framework like Bootstrap to add a myriad of
style-based classes to the HTML layer. Furthermore, JavaScript use has been
kept incredibly minimal, limited to behavior that can’t already be
accomplished via HTML and/or CSS, letting HTML and CSS do what they do best.
This simplicity improves performance and also lowers barriers for community
contributions.&lt;/p&gt;
&lt;p&gt;Accessibility was a priority. Making the code more semantic already helps, but
we went further in ensuring that all the affordances you get from HTML aren’t
blocked or altered via opinionated (and often non-standard) frameworks. The
site performs better generally, and is much kinder to slower connection speeds.&lt;/p&gt;
&lt;p&gt;The new implementation of Vocabulary includes a new Information Architecture
and more stable UX approach for better visitor experiences. CC licensed media
is one of our strengths and as such it was important to allow proper
attribution to be baked into every instance of media rendering within the
design. This means that while the image or video may be important to the flow
of content, its attribution also gets an appropriate level of importance,
highlighting ways in which others might handle attribution and following
through on our own mission in the pursuit of better sharing at large.&lt;/p&gt;
&lt;h3 id="wordpress"&gt;WordPress&lt;/h3&gt;&lt;p&gt;The project utilizes a custom WordPress theme
(&lt;a href="https://github.com/creativecommons/vocabulary-theme"&gt;creativecommons/vocabulary-theme&lt;/a&gt;) that implements the new
Vocabulary design system.&lt;/p&gt;
&lt;p&gt;The theme utilizes the WordPress Classic Editor because of its long-term
stability and more stable UX. Gutenberg still does not adhere to adequate
Accessibility approaches, nor does it have a sense of stable
feature-completeness. This creates an unreliable landscape to build upon.
Gutenberg also requires one to build Block composition through React.js to
accomplish tasks that are far easier and more approachable with the standard
PHP templates that the Classic Editor is compatible with. This dramatically
improves the ability for a new contributor to help, and speeds up the
development process.&lt;/p&gt;
&lt;p&gt;To allow a degree of more varied page composition, Advanced Custom Fields was
utilized to more easily add, update, and version control custom fields across
pages and page templates. This strikes a balance: it allows more complex page
composition, but within a more controllable set of circumstances.&lt;/p&gt;
&lt;p&gt;Plugins in general were cut dramatically. The legacy site contained 20 active
plugins, while this project relies on fewer than half, at 9, with hopeful
pathways to eventually cut that number even further.&lt;/p&gt;
&lt;p&gt;The site utilizes several custom content types and better taxonomies to split
up the UX flow of varied kinds of content creation, allowing for smoother
multi-author attribution, site-wide notices for fundraising and event
announcements, and better blog post organization and way-finding overall.&lt;/p&gt;
&lt;h3 id="cc-legal-tools"&gt;CC Legal Tools&lt;/h3&gt;&lt;p&gt;With the deployment of our new website, we also replaced the legacy ccEngine
with the new CC Legal Tools. The current legal tool landscape is refreshingly
simple with only seven tools (CC BY 4.0, CC BY-NC 4.0, CC BY-NC-ND 4.0, CC
BY-NC-SA 4.0, CC BY-ND 4.0, CC BY-SA 4.0, CC0 1.0). However, since previous
versions of the licenses were adapted to specific jurisdictions (ported) and we
collaborate with the community to support many translations, the new CC Legal
Tools app manages over 30,000 documents!&lt;/p&gt;
&lt;p&gt;The project to rewrite the CC Legal Tools and replace the legacy ccEngine began
in 2020 with a request for proposals (&lt;a href="https://docs.google.com/document/d/1mlgmjDorTEwgIRRrvILK3v0pTJbGx8fB5SE1yplrz3Y/edit"&gt;RFP: License Infrastructure - Google
Docs&lt;/a&gt;). The &lt;a href="https://www.caktusgroup.com/"&gt;Caktus Group&lt;/a&gt; began the new CC
Legal Tools using the Django Python web framework. The work was continued by
Timid Robot. Saurabh helped with RDF/XML generation (&lt;a href="/blog/entries/2023-08-25-machine-layer/"&gt;CC Legal Tools:
Machine-Readable Layer — Creative Commons Open
Source&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The new CC Legal Tools consist of two repositories:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-legal-tools-app"&gt;creativecommons/cc-legal-tools-app&lt;/a&gt;: &lt;em&gt;Static site
generator using Django&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-legal-tools-data"&gt;creativecommons/cc-legal-tools-data&lt;/a&gt;: &lt;em&gt;Inputs and
outputs of the application&lt;/em&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The legacy ccEngine consists of around 15,960 lines of Python 2. It was
developed and extended organically over time, resulting in a less coherent
codebase. The new CC Legal Tools has the benefit of hindsight and was
architected as a single application to meet all of CC's current requirements.
It consists of around 17,400 lines of Python 3 (including around 4,000 lines of
tests). Benefits of the new CC Legal Tools include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Currently supported software (Python 3, Django 4.2, etc.)&lt;/li&gt;
&lt;li&gt;Simplified data model&lt;/li&gt;
&lt;li&gt;Improved translation handling&lt;/li&gt;
&lt;li&gt;Improved RDF/XML generation/management&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In particular, the fact that the new CC Legal Tools generate static assets is
noteworthy. Static assets can be hosted performantly with a very simple service
setup.&lt;/p&gt;
&lt;h3 id="chooser"&gt;Chooser&lt;/h3&gt;&lt;p&gt;The new chooser beta (&lt;a href="https://github.com/creativecommons/chooser"&gt;creativecommons/chooser&lt;/a&gt;) was promoted to
production with the new header and footer from the Vocabulary design system for
a more uniform user experience.&lt;/p&gt;
&lt;h3 id="faq-platform-toolkit"&gt;FAQ &amp;amp; Platform Toolkit&lt;/h3&gt;&lt;p&gt;The FAQ (&lt;a href="https://github.com/creativecommons/faq"&gt;creativecommons/faq&lt;/a&gt;) and
Platform Toolkit (&lt;a href="https://github.com/creativecommons/mp"&gt;creativecommons/mp&lt;/a&gt;)
were updated to use the new header and footer from the Vocabulary design system
for a more uniform user experience.&lt;/p&gt;
&lt;h2 id="improved-development"&gt;Improved development&lt;/h2&gt;&lt;p&gt;Utilizing infrastructure as code, we now have a much more robust staging
environment. This allows us to preview larger changes so that they can be
deployed to production with minimum risk. We also improved our local
development environment and content synchronization tooling
(&lt;a href="https://github.com/creativecommons/index-dev-env"&gt;creativecommons/index-dev-env&lt;/a&gt;). This means that not only did
we fix many old bugs, but when new bugs are identified, we can fix them more
rapidly!&lt;/p&gt;
&lt;h2 id="thank-you"&gt;Thank you&lt;/h2&gt;&lt;p&gt;Thank you to the people who directly contributed to the success of the new
website!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Nate, former Director of Communications &amp;amp; Community&lt;/li&gt;
&lt;li&gt;Sara, Full Stack Engineer&lt;/li&gt;
&lt;li&gt;Shafiya, Systems Engineer&lt;/li&gt;
&lt;li&gt;Timid Robot, Director of Technology&lt;/li&gt;
&lt;li&gt;&lt;em&gt;as well as many other previous staff, community contributors, and other
&lt;a href="/community/supporters/"&gt;supporters&lt;/a&gt;!&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
</content></entry><entry><title>CC Legal Tools: Machine-Readable Layer</title><link href="http://opensource.creativecommons.org/blog/entries/2023-08-25-machine-layer/" rel="alternate"></link><updated>2023-08-28T00:00:00Z</updated><author><name>['saurabh']</name></author><id>urn:uuid:ca25cf56-a3e2-3703-bd66-7619befdf05c</id><content type="html">&lt;p&gt;Greetings, readers!🌟 I'm excited to share that as part of Google Summer of
Code (GSoC) 2023, I had the incredible opportunity to contribute to the
exciting project "CC Legal Tools: Machine-Readable Layer." This journey has
been a remarkable blend of learning, coding, and collaboration, and I'm
thrilled to share the highlights of this journey with you all.&lt;/p&gt;
&lt;p&gt;&lt;img src="/blog/entries/2023-08-25-machine-layer/gsoc2023cc.png" alt="GSoC 2023 and CC"&gt;&lt;/p&gt;
&lt;h2 id="project-overview"&gt;Project Overview&lt;/h2&gt;&lt;p&gt;The project's core focus was to enhance the Creative Commons (CC) &lt;a href="https://github.com/creativecommons/cc-legal-tools-app"&gt;Legal Tools
app&lt;/a&gt; by introducing a robust machine-readable layer. The machine-readable
layer enables computers to understand the intricacies of CC licenses, making it
easier for legal professionals, developers, and enthusiasts to work with CC
licenses programmatically.&lt;/p&gt;
&lt;h2 id="getting-started"&gt;Getting Started&lt;/h2&gt;&lt;p&gt;My journey began with delving into the existing codebase and understanding the
project's requirements. Understanding the app's architecture, its
components, and how it handled CC licenses was crucial for what lay
ahead.&lt;/p&gt;
&lt;p&gt;RDF, or Resource Description Framework, emerged as a crucial player in the
project. Grasping the intricacies of RDF and its role in representing licenses
was a necessary step in the journey.&lt;/p&gt;
&lt;h2 id="challenges-and-learning-opportunities"&gt;Challenges and Learning Opportunities&lt;/h2&gt;&lt;p&gt;One of my early challenges was unraveling the complexities of the legacy
RDF/XML files. How did they differ from the new RDF/XML files we aimed to
generate? This exploration led me to discover improvements in structure,
updated license information, and additional metadata.&lt;/p&gt;
&lt;p&gt;Generating RDF files for various licenses and versions became a puzzle to
solve. Crafting RDF triples, understanding licensing nuances, and weaving this
logic into the app's views became both a learning opportunity and a rewarding
challenge.&lt;/p&gt;
&lt;h2 id="contributions-and-the-work"&gt;Contributions and The Work&lt;/h2&gt;&lt;p&gt;As the project evolved, I worked to dynamically generate RDF/XML files,
allowing the app to generate machine-readable licenses on-the-fly.&lt;/p&gt;
&lt;p&gt;To keep the output organized, the generated RDF files are sorted; all the
credit for this goes to &lt;a href="/blog/authors/TimidRobot/"&gt;Timid Robot&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The newly generated RDF/XML aims to enhance the clarity, accuracy,
compatibility, and standardization of Creative Commons license representation
in RDF format. These improvements boost machine-readability and semantic
understanding, fostering seamless integration and interpretation in digital
systems.&lt;/p&gt;
&lt;h2 id="overview-of-changes"&gt;Overview of changes:&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Improved Structure and Consistency&lt;/strong&gt;:&lt;ul&gt;
&lt;li&gt;The new RDF/XML boasts a more organized, standardized structure, aligning
with RDF standards. This enhances machine comprehension and accurate data
processing.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Updated License Information&lt;/strong&gt;:&lt;ul&gt;
&lt;li&gt;License information has been updated to reflect the latest permissions and
restrictions. This ensures users and systems are informed accurately.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Alignment with RDF Best Practices&lt;/strong&gt;:&lt;ul&gt;
&lt;li&gt;Changes align the representation with RDF best practices. This boosts
interoperability and compatibility, thanks to standardized namespaces,
consistent naming, and proper relationship definitions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Throughout the journey, I had the privilege of working closely with my mentor,
engaging in collaborative discussions and receiving insightful code reviews.&lt;/p&gt;
&lt;p&gt;As my GSoC journey draws to a close, I'm excited about the foundation we've
laid for the CC Legal Tools app. The machine-readable layer opens doors to a
future of smarter, automated legal processes.&lt;/p&gt;
&lt;p&gt;The improvements made during GSoC will continue to ripple through the CC Legal
Tools app, benefiting users and the broader open-source community.&lt;/p&gt;
&lt;h2 id="mentor-and-support"&gt;Mentor and Support&lt;/h2&gt;&lt;p&gt;A heartfelt thanks to my mentor, &lt;a href="/blog/authors/TimidRobot/"&gt;Timid Robot&lt;/a&gt;, for guiding me
through this incredible journey. Your unwavering support, wisdom, feedback, and
willingness to share knowledge have truly been invaluable. I'm deeply grateful
for the opportunity to learn and grow under your mentorship. Thank you for
making this journey unforgettable.&lt;/p&gt;
&lt;h2 id="takeaways-and-conclusion"&gt;Takeaways and Conclusion&lt;/h2&gt;&lt;p&gt;GSoC became a platform for me to acquire new skills, dive into complex
concepts, and broaden my horizons. The learning experience was immersive and
transformative.&lt;/p&gt;
&lt;p&gt;Being part of the open-source community was a revelation. Interacting with
like-minded individuals, contributing to a shared goal, and experiencing the
true essence of collaboration was a highlight.&lt;/p&gt;
&lt;p&gt;My GSoC journey has been a remarkable adventure of exploration, discovery, and
growth. The project's mission to create a machine-readable layer for CC Legal
Tools has left an indelible mark on my journey as a developer.&lt;/p&gt;
&lt;p&gt;Thank you for joining me on this expedition. Here's to the future of
open-source contributions and the endless possibilities they hold.&lt;/p&gt;
&lt;p&gt;Cheers,&lt;/p&gt;
&lt;p&gt;Saurabh Kumar&lt;/p&gt;
</content></entry><entry><title>New Chapter of My Professional Life</title><link href="http://opensource.creativecommons.org/blog/entries/2023-06-16-new-chapter-of-my-professional-life/" rel="alternate"></link><updated>2023-06-20T00:00:00Z</updated><author><name>['shafiya']</name></author><id>urn:uuid:65e12b2b-0d17-3dea-82a1-96d24beaa8a4</id><content type="html">&lt;p&gt;Greetings, readers! I’m Shafiya Heena, from Hyderabad, India, who now finds herself immersed in the vibrant city of Toronto, Canada. After spending six fruitful years as a DevOps Engineer, I recently embarked on a new professional journey with Creative Commons, a nonprofit organization. Today, I want to share my experiences and thoughts on the stark differences in culture between these two organizations and shed light on my recent encounter with an event called InTown Week (ITW).&lt;/p&gt;
&lt;p&gt;Before joining Creative Commons, I had limited exposure to open source initiatives and nonprofit organizations. However, upon stepping foot into Creative Commons, I found myself captivated by its unique culture. The emphasis on collaboration, transparency, and fostering a sense of responsibility among staff members left me awestruck. The organization's commitment to open source and its associated ethos ignited a newfound passion within me.&lt;/p&gt;
&lt;p&gt;Prior to ITW, I had heard positive whispers of this event, but had little knowledge about its significance. Little did I know that this week-long gathering would prove to be an enlightening and transformative experience. Over the course of five days, I delved deep into self-discovery, learning more about my team members, and gaining profound insights into Creative Commons as an organization.&lt;/p&gt;
&lt;p&gt;One aspect that particularly struck me during ITW was the sense of equality among the knowledgeable staff members. Everyone was encouraged to share their opinions, even if they contradicted the prevailing decisions. This environment fostered a culture of inclusivity and collective growth. Additionally, I had the opportunity to take an insightful discovery test that provided valuable insights into my personality traits and how I could enhance my contributions within the team.&lt;/p&gt;
&lt;p&gt;During all the conversations and activities, I found myself occasionally getting lost in the whirlwind of information. Thankfully, I was fortunate to have a dedicated mentor who skillfully guided me back on track, ensuring that I comprehended the nuances of the happenings around me. Through this mentorship, I discovered how to boost my energy levels and become an even better fit within the team.&lt;/p&gt;
&lt;p&gt;While the experience of ITW was enriching, it was not without its challenges. Due to visa issues, I was not able to attend in person. Engaging in extended virtual calls throughout the week was demanding, but the active participation from every individual in all activities made it all worthwhile. One particularly heartwarming moment was bidding farewell, as everyone stood in a row to say goodbye to me personally on the screen. This gesture made me feel truly present, transcending the virtual realm, and I express my heartfelt gratitude to my mentor for facilitating such connections.&lt;/p&gt;
&lt;p&gt;A notable contrast I observed during my time at Creative Commons was the vivaciousness and approachability of the CEO. In stark contrast to my previous organization, where CEO communication was primarily limited to formal emails announcing changes or decisions, I was pleasantly surprised by the CEO's warmth and genuine interest in engaging with employees in a personable manner. This refreshing leadership style evoked a sense of enthusiasm and bolstered my commitment to the organization's goals.&lt;/p&gt;
&lt;p&gt;To encapsulate my thoughts on the open source culture at Creative Commons in a single word, I would choose "likable." My mentor, in particular, played a crucial role in establishing transparency by designing a comprehensive three-month onboarding document, which laid out expectations and goals. Here, although the infrastructure may be smaller, the workflow is streamlined, and there is an absence of restrictive IT teams that hinder access to websites or prioritize hardware security over staff responsibility. Instead, Creative Commons embraces a culture where employees take ownership of their hardware, fostering an environment of trust and empowerment.&lt;/p&gt;
</content></entry><entry><title>Many Mona Lisas? Artistic Data Quantification and Assessment</title><link href="http://opensource.creativecommons.org/blog/entries/2023-04-26-umsi-how-many-mona-lisas/" rel="alternate"></link><updated>2023-04-26T00:00:00Z</updated><author><name>['grace_coleman', 'anthony_ho', 'tyler_phillips', 'claire_wan']</name></author><id>urn:uuid:c9c225be-0216-3f8b-98d2-db7836cd912b</id><content type="html">&lt;p&gt;Quantifying the Commons&lt;/p&gt;
&lt;p&gt;University of Michigan, School of Information&lt;/p&gt;
&lt;h2 id="project-objective-and-problem-statement"&gt;Project Objective and Problem Statement&lt;/h2&gt;&lt;p&gt;Creative Commons (CC) has over one billion licensed works. However, there is no
central data or organization of CC’s licensed works, making it difficult to
quantify the number of works and to analyze which licenses are useful or should
be retired. The goal of this project is to help CC staff identify redundant
licenses and use quantitative data in marketing its impact. It focuses on Open Educational Resources (OER).&lt;/p&gt;
&lt;h2 id="data-collection"&gt;Data Collection&lt;/h2&gt;&lt;p&gt;Data was collected from &lt;a href="https://www.oercommons.org/"&gt;OER Commons&lt;/a&gt;, which is one of CC’s
platforms and a library containing digital education resources. The first step
in data collection was identifying which licenses this data source uses and how
many works are under each license within OER Commons. OER Commons uses the
licenses CC-BY, CC-BY-SA, CC-BY-ND, CC-BY-NC, CC-BY-NC-SA, and CC-BY-NC-ND
which contribute to both ‘fair use’ and ‘commercial use’ assets, respectively.
The next step in data collection was querying the Application Programming
Interface (API) by license. In order to retrieve all works for a license,
queries are batched, with a maximum of 50 works retrieved at once. This process
is repeated until all works for a license are retrieved. These steps are run
for every license. For every API call, the response is in XML, which is parsed
for features including education level, subject area, material type, media
format, languages, primary user, and educational use. The results are written
to a tab-separated file.&lt;/p&gt;
&lt;h2 id="exploratory-data-analysis-eda"&gt;Exploratory Data Analysis (EDA)&lt;/h2&gt;&lt;p&gt;After collecting all of our data, we began exploring the different columns in
our dataframe. In particular, we looked at the distribution of different
languages, the distribution of items by license type, and when items were added
to the OER Commons API. Through this exploration, we were able to further
specify our analysis and dig deeper into the different relationships of the
data.&lt;/p&gt;
&lt;h3 id="diagram-1"&gt;Diagram #1:&lt;/h3&gt;&lt;p&gt;&lt;img src="/blog/entries/2023-04-26-umsi-how-many-mona-lisas/diagram_01.png" alt="Diagram #1: Percentage of Items per License Type"&gt;&lt;/p&gt;
&lt;p&gt;Diagram #1 shows the distribution of items taken from OER Commons by license
type. It is clear that the CC-BY license type is the most popular, with 43% of
the items having that license type. The CC-BY-SA license is also fairly
popular, accounting for 27% of the items collected.&lt;/p&gt;
&lt;h3 id="diagram-2"&gt;Diagram #2:&lt;/h3&gt;&lt;p&gt;&lt;img src="/blog/entries/2023-04-26-umsi-how-many-mona-lisas/diagram_02.png" alt="Diagram #2: Number of Items by Month since Dec 2015"&gt;&lt;/p&gt;
&lt;p&gt;Diagram #2 shows when items have been added to the OER Commons API. There is
little activity from December 2015 up to the beginning of 2023. However, close
to 30,000 items were added to the API in early 2023.&lt;/p&gt;
&lt;h3 id="diagram-3"&gt;Diagram #3:&lt;/h3&gt;&lt;p&gt;&lt;img src="/blog/entries/2023-04-26-umsi-how-many-mona-lisas/diagram_03.png" alt="Diagram #3: Percentage of Items by Language"&gt;&lt;/p&gt;
&lt;p&gt;Diagram #3 shows the percentage of items by language. English is the most used
language, with about 86% of the items being in English. The other languages
each account for a small share of the items.&lt;/p&gt;
&lt;h3 id="diagram-4"&gt;Diagram #4:&lt;/h3&gt;&lt;p&gt;&lt;img src="/blog/entries/2023-04-26-umsi-how-many-mona-lisas/diagram_04.png" alt="Diagram #4: Percentage of Items in English per License Type"&gt;&lt;/p&gt;
&lt;p&gt;Since English is clearly the most popular language, we decided to see the
license distribution for items that are in English. Diagram #4 shows a similar
distribution to the pie chart depicting the overall license distribution; this
is to be expected since items in English account for 86% of all items.&lt;/p&gt;
&lt;h3 id="diagram-5"&gt;Diagram #5:&lt;/h3&gt;&lt;p&gt;&lt;img src="/blog/entries/2023-04-26-umsi-how-many-mona-lisas/diagram_05.png" alt="Diagram #5: Percentage of Items in French per License Type"&gt;&lt;/p&gt;
&lt;p&gt;We continued to look at the distribution of licenses by each language.
Diagram #5 shows that for the items in French, the CC-BY license is the most
popular at 49%, with CC-BY-SA being right behind it at 32%.&lt;/p&gt;
&lt;h2 id="visualizations"&gt;Visualizations&lt;/h2&gt;&lt;h3 id="diagram-6"&gt;Diagram #6:&lt;/h3&gt;&lt;p&gt;&lt;img src="/blog/entries/2023-04-26-umsi-how-many-mona-lisas/diagram_06.png" alt="Diagram #6: License Type Breakdown by Primary User"&gt;&lt;/p&gt;
&lt;p&gt;Diagram #6 shows the distribution of items on OER Commons by primary user,
broken down by license type. The platform predominantly contains items designed
for teachers and students, with the rest for parents, administrators, and
librarians, among others. The breakdown of licenses for each primary user is
relatively consistent with the overall breakdown of the platform, as seen from
the charts below (Diagram #7 and Diagram #8).&lt;/p&gt;
&lt;h3 id="diagram-7"&gt;Diagram #7:&lt;/h3&gt;&lt;p&gt;&lt;img src="/blog/entries/2023-04-26-umsi-how-many-mona-lisas/diagram_07.png" alt="Diagram #7: Percentage of Items Used by Teachers per License Type"&gt;&lt;/p&gt;
&lt;h3 id="diagram-8"&gt;Diagram #8:&lt;/h3&gt;&lt;p&gt;&lt;img src="/blog/entries/2023-04-26-umsi-how-many-mona-lisas/diagram_08.png" alt="Diagram #8: Percentage of Items Used by Students per License Type"&gt;&lt;/p&gt;
&lt;h3 id="diagram-9"&gt;Diagram #9:&lt;/h3&gt;&lt;p&gt;&lt;img src="/blog/entries/2023-04-26-umsi-how-many-mona-lisas/diagram_09.png" alt="Diagram #9: Subject Area by License"&gt;&lt;/p&gt;
&lt;p&gt;We also analyzed the subject areas and the licenses that
they hold, as shown in Diagram #9. Some preliminary data cleaning had to be
conducted, as there were too many subjects on the platform and some subjects
had very low counts. The team grouped similar subjects into nine
categories. For example, social science, anthropology, sociology,
communication, world cultures, psychology, women’s studies, and social work
were grouped into social sciences.&lt;/p&gt;
&lt;p&gt;It can be seen from Diagram #9 that the most popular subject areas on the
platform are health sciences, language/arts and other sciences. Diving deeper
into these subject areas, health sciences and language/arts have a higher
proportion of items with the CC-BY-NC-SA license.&lt;/p&gt;
&lt;h3 id="diagram-10"&gt;Diagram #10:&lt;/h3&gt;&lt;p&gt;&lt;img src="/blog/entries/2023-04-26-umsi-how-many-mona-lisas/diagram_10.png" alt="Diagram #10: Material Type Breakdown by Education Level"&gt;&lt;/p&gt;
&lt;p&gt;Finally, the team analyzed the material types of the items and sorted them by
the education level that the items were created for. Again, some data cleaning was
required as there were too many material types to analyze and some also had
very small data counts. The seven material types shown in Diagram #10 were the
most popular, and represented roughly 2/3 of the total.&lt;/p&gt;
&lt;p&gt;After sorting the education levels in chronological order, an interesting trend
that emerged is that the number of items increases with education level from
preschool, hits a peak at the community college level, and then decreases
afterwards. A shift in the material types can also be seen in the graph:
lesson plans represent a large proportion of items from preschool to high
school, but become insignificant from the college level onwards, where they
are replaced by a higher proportion of readings. Textbooks also make up a
higher proportion of items at the college level.&lt;/p&gt;
&lt;h2 id="key-value"&gt;Key Value&lt;/h2&gt;&lt;p&gt;The insights created through the analysis of this project will be helpful for
CC’s marketing efforts. Understanding the distribution of license
types in different contexts, such as education level, will help CC
target its marketing toward key demographics, such as preschool
education materials. Another takeaway in terms of key value concerns
CC’s initiative for long-term preservation. CC’s need to centralize its
collaborators' content into a data warehouse system has been an identified
direction since the start of this project. Our prototype database of OER
Commons has contributed to these efforts, both as a small-scale implementation
and in meeting the scope of our database system modeling. As other CC cohort
chapters contribute their own databases of licensed works, we hope that these
databases will eventually be merged with those of other CC chapters.&lt;/p&gt;
&lt;h2 id="next-steps"&gt;Next Steps&lt;/h2&gt;&lt;p&gt;As CC expands its contributing members into the open-source initiative of
bringing licensed works to the world, internal systems for data
preservation and maintenance become a point of serious interest, because the
databases will need to be integrated in the future. Running our
prototype case study of the OER Commons database has given us insights into the
direction of CC’s current database system and how it could
evolve into a data warehouse hub as a long-term solution. Throughout
our data mining and data analysis, Python 3 has been a
staple of both our group’s efforts and CC’s existing workflows with Git.
Complementing this framework with other Python libraries that allow for
easier database querying will be a step in the right direction for the next
cohort of CC contributors. An example of this
library integration would be pandasql, which combines the familiar pandas library
methods with SQL command logic, making database maintenance easy
and manageable. Besides updating the data storage, future work can continue to
collect data from other sources of CC-licensed work, including GLAM institutions
and the Internet Archive.&lt;/p&gt;
&lt;h2 id="acknowledgements"&gt;Acknowledgements&lt;/h2&gt;&lt;p&gt;We would like to express our gratitude towards Timid Robot Zehta, our client,
for working on behalf of CC, as well as &lt;a href="https://www.oercommons.org/"&gt;OER Commons&lt;/a&gt; for their
valuable contributions towards the development of digital licensing and open
source databasing initiatives. Without them, this project would not have been
possible. Their efforts have been instrumental in giving us the tools and
resources to help progress in the open-source initiative by allowing us to
promote the free exchange of ideas, knowledge, and resources within the art,
health, and education sectors of non-profit endeavors. Open source projects are
important because they allow the public to use and work on projects without
restrictions or keys. Since this initiative is open source, our efforts can be
added to and built upon, allowing the project to continue through the addition
of new contributors with fresh perspectives. Their shared commitment to
promoting accessible and inclusive content has enabled individuals and
organizations to create and distribute digital assets without facing any legal
restrictions around the world. It has been an absolute pleasure to work with
these organizations and be a part of their mission to democratize access to
information.&lt;/p&gt;
</content></entry><entry><title>Considering Community Contributions at Creative Commons</title><link href="http://opensource.creativecommons.org/blog/entries/2023-03-24-community-contributions/" rel="alternate"></link><updated>2023-03-24T00:00:00Z</updated><author><name>['sara']</name></author><id>urn:uuid:40a26755-2701-374d-a187-7bb3b00b503c</id><content type="html">&lt;p&gt;Different open source communities work differently and so everyone may arrive
at Creative Commons' projects with their own set of individual expectations.
Someone might expect to directly submit a Pull Request to a project without an
Issue. Or they may submit an Issue and then immediately an associated Pull
Request. At Creative Commons we have a process we hope to follow so there's a
chance for consideration, community participation, and discussion along the
way. This makes collaborative, well-documented, and informed work more
possible!&lt;/p&gt;
&lt;p&gt;Things usually begin with an idea for new functionality, new/revised
documentation, or an encountered error of sorts. That idea or error is then
captured as a GitHub Issue used to describe its details. Think of this as the
Abstract that comes before the Implementation.&lt;/p&gt;
&lt;p&gt;It's important to first look through all the existing Issues, including ones
that have been Closed, to determine if someone else has already made an
overlapping Issue. If they have, it's best to add any new information you've
discovered or thought of as a comment (or series of comments) to that Issue,
rather than create a new one.&lt;/p&gt;
&lt;p&gt;Errors (often referred to as Bugs) should be verified, and reproducible if
possible. Things like screenshots, steps to reproduce, a video, and environment
details are all incredibly helpful for others when they want to review the
error. All that information is gathered and placed in a succinct, but detailed
Issue on the associated repository. It's worth noting that the documented Issue
alone is a valued contribution. It will provide guidance and documentation for
whoever works on resolving or implementing it, so it's just as important as
the eventual code that will be written. That means it should be done well,
because the better an Issue describes an error and provides a clear way to
reproduce it, the easier it will be for anyone to address it.&lt;/p&gt;
&lt;p&gt;Functionality and Feature proposals are often a little more involved. Errors
are some aberration in the existing expectations or functionality of the
codebase's state, but new/changed functionality or features introduce larger
planning considerations. They have to take into account the current state of
things and the proposed future state they're introducing as an Issue. This is
an exercise in communication and description first and foremost, and that means
that having a detailed writeup, wireframes, mockups, and evidence to support
the proposal is vital to its success. Where Errors might be able to consider a
more isolated set of consequences to fixing something, introducing new
features/functionality may have unintended side effects, and it may require
multiple parts of the codebase to be changed or altered. All of these larger
picture considerations should be taken into account and addressed within the
Issue. One should expect that a Feature Issue may on average take longer to
introduce, and longer to adequately document in a clear and concise way to get
the point across to the rest of the community.&lt;/p&gt;
&lt;p&gt;Documentation can always use improvements whether within code comments, a
project's README.md, or associated documentation. These would largely be
considered a "Feature Issue" technically but it's worth pointing them out
separately because they're as important, if not more so, than fixing errors or
adding codebase level functionality. Good documentation makes the project
strong and the community more informed. Improvements here should document where
there's a gap or where revisions are needed, and how they should be corrected.&lt;/p&gt;
&lt;p&gt;Whether an Error or Feature/functionality Issue, once it's been submitted, in
accordance with the &lt;a href="/contributing-code/"&gt;Contribution
Guidelines&lt;/a&gt;, it will
move to a status of "awaiting triage". This means that it is waiting to be
reviewed by one of the core codebase contributors. While it's in this state no
implementation work should be done (no PRs, no code work to add or correct the
behavior). An Issue submitted is largely the start of a process, and a
conversation. Core contributors will review the Issue and see if it adequately
describes its appropriate details, and if its objectives fit within the larger
pattern and goals of the codebase itself. It's entirely possible that a well
thought through Feature Issue that adds some new menu functionality is in
isolation a good idea, but that it doesn't fit within the goals of the project
in question and won't move forward. And that's OK, even if an Issue doesn't
move forward it can now stand as documentation for the community on what won't
be worked on at this time, which is just as important as what will. It's a
contribution whether it moves forward or not, so long as it describes itself
well enough.&lt;/p&gt;
&lt;p&gt;If this happens, the Issue will be moved to a status of "discarded", and will
be closed with a comment explaining why. The other reason an Issue might be
moved to "discarded" is that it duplicates the work in another Issue, which is
why it's important to first check all the existing Issues prior to submitting a
new one.&lt;/p&gt;
&lt;p&gt;Sometimes an Issue might describe something much broader than can be easily
contained within itself and may be converted to a status of "discussion". This
means that the Issue should spark a larger conversation within the community to
consider all the angles of the abstract, and possibly split the idea up into more
manageable pieces across multiple Issues. Other outcomes might be a discussion
that realizes that while the idea is sound, it's not implementable at this time
and won't move forward.&lt;/p&gt;
&lt;p&gt;Some Issues are solid ideas, but they are not something that can move forward
until work on other Issues is completed first. As such they tend to move to a
status of "blocked". They'll sit in that state until they're unblocked and the
work can happen.&lt;/p&gt;
&lt;p&gt;If an Issue seems like it doesn't have enough information to determine what to
do with it, then it will likely move to a status of "ticket work required" and
a comment will usually be left describing what needs to be worked on.&lt;/p&gt;
&lt;p&gt;Remember, an Issue is a form of documentation, and in a way it's a
conversation, and that means that until it moves forward it's very much a work
in progress.&lt;/p&gt;
&lt;p&gt;If an Issue passes through this period as implementable, then it'll move to a
status of "ready for work". This is the point at which it can be implemented,
and a contributor can submit a Pull Request addressing it. (See the &lt;a href="/contributing-code/repo-labels/#status"&gt;Repository
Labels Status section&lt;/a&gt; for more
information.)&lt;/p&gt;
&lt;p&gt;During this process it is worth noting that there will be multiple types of
contribution. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The Issue itself is a contribution.&lt;/li&gt;
&lt;li&gt;Comments on the Issue from the community refining it are each contributions.&lt;/li&gt;
&lt;li&gt;Someone's comment on the Issue helping another person sort out why the Error
is occurring is a contribution.&lt;/li&gt;
&lt;li&gt;Someone finding another related Issue and linking it as relevant to that
Issue is a contribution.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All of these contributions occurred before a Pull Request was ever initiated.
Once an Issue enters a status of "ready for work" someone who has indicated
interest on that Issue will be assigned to it and can then fork the repository,
make a branch to work within, and once settled submit a Pull Request. That
process alone may involve several contributions as well, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The code work encounters a problem, someone asks for assistance within their
draft PR, and several members offer help as comments.&lt;/li&gt;
&lt;li&gt;Someone reviews the final PR and leaves a detailed review on what might need
addressing.&lt;/li&gt;
&lt;li&gt;A discussion breaks out on the best way to resolve an encountered problem
with the PR; each of these comments is a contribution.&lt;/li&gt;
&lt;li&gt;And, of course, the PR itself is a contribution.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If the PR passes Review then it'll be marked as Approved and merged into the
codebase. That will trigger the associated Issue to close as complete, and the
Error Fix or Functionality in question will now be fully implemented in the
project.&lt;/p&gt;
&lt;p&gt;To get here it took multiple contributions from different community members.
That's the power of open source!&lt;/p&gt;
</content></entry><entry><title>Outreachy Internship Mid-point Progress Update</title><link href="http://opensource.creativecommons.org/blog/entries/2023-02-01-outreachy-mid-point/" rel="alternate"></link><updated>2023-02-01T00:00:00Z</updated><author><name>['precious']</name></author><id>urn:uuid:b485694d-3627-3a6f-b124-489f4786b53b</id><content type="html">&lt;p&gt;&lt;img src="https://res.cloudinary.com/dexcmkxjl/image/upload/v1675262087/1157214599-I-may-not-be-there-yet_s3cjxm.webp" alt="quote image"&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Outreachy Internship- Refactor CC Meta Search- Mid-point Progress Update&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As an intern at Creative Commons, my original project goal was to refactor the old CC search website to use semantic and modern HTML and CSS. This project was intended to improve the user experience and make the website more accessible to a broader audience. The old CC Meta Search website is built on PHP and JavaScript. My major goal for this internship is to convert the PHP to semantic HTML and modern CSS while ensuring that all necessary functionalities are intact.&lt;/p&gt;
&lt;p&gt;I have met several of my goals in the first half of my internship. So far, I have successfully refactored the website's HTML to use semantic elements, which improves the website's accessibility and makes it easier for users to understand the content. I achieved this by creating a new index.html file and rebuilding the site with semantic HTML. Additionally, I have also implemented modern CSS techniques to improve the website's visual design and make it more responsive on various devices.  All of this is currently being reviewed by my mentors for feedback on additional changes that might be needed.&lt;/p&gt;
&lt;p&gt;However, some project goals took longer than expected to complete. One of the main reasons was that the website's codebase was not well-organized, which made it difficult for me to navigate and understand. Additionally, the site had a pre-determined CSS file that I was supposed to follow or incorporate while building the new site, but this file was cumbersome and most of its styles did not give the desired result. I spent a lot of time trying to understand and navigate it. Eventually I spoke to my mentor about it and suggested writing my own CSS styles, which she agreed to. As a result, the original goal of incorporating the CC Vocabulary CSS file was modified.&lt;/p&gt;
&lt;p&gt;Additionally, I had to prioritize certain tasks over others and make adjustments to my plan as necessary.&lt;/p&gt;
&lt;p&gt;The new CSS I have written so far already makes the website's layout responsive. I have also created a new script.js file and started working on the necessary functionalities of the website. I plan to implement all feedback from my mentors and debug any remaining issues. Additionally, I will be working on improving the website's overall performance by implementing optimization techniques as necessary.&lt;/p&gt;
&lt;p&gt;Overall, my aim is to ensure that the website is fully functional and user-friendly for all users.&lt;/p&gt;
</content></entry><entry><title>How I Landed My First Internship With Outreachy</title><link href="http://opensource.creativecommons.org/blog/entries/2023-01-04-how-i-landed-my-first-internship/" rel="alternate"></link><updated>2023-01-04T00:00:00Z</updated><author><name>['precious']</name></author><id>urn:uuid:54076fa3-da5e-30ee-afe8-9b9d8baadb6d</id><content type="html">&lt;p&gt;&lt;img src="https://res.cloudinary.com/dexcmkxjl/image/upload/v1671657493/blog_image_kjvep8.jpg" alt="quote image"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Take a moment to resonate with the words above. This is a principle I always follow in life. "Doing it scared"&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Get To Know Me&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;My name is Precious Oritsedere. I am a Nigerian. I am a software engineer who is known for her dedication to her work and her strong core values. I believe in the power of love, contribution, and empathy, hence I use these values as my guide in both my personal and professional life.&lt;/p&gt;
&lt;p&gt;As a young girl, I have always been fascinated by technology, gadgets, and even games. However, I didn't study any technology-related courses at university. I graduated with a degree in International Studies and Diplomacy but this didn't stop my curiosity for Technology. I really wanted to pursue my passion for technology but I was unsure of the means to go about it because I had no guide.&lt;/p&gt;
&lt;p&gt;Fast-forward to January 2022, I spoke to a friend about how I really wanted to know and learn more about Software Engineering and he introduced me to AltSchool Africa. This was the starting point of my Tech Career. Subsequently, I began to learn all about Frontend Engineering, and I also got introduced to open source.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why I applied to Outreachy&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As a software engineer, I am committed to using my skills to make a positive impact on the world. I believe in the power of collaboration and teamwork and I am always willing to lend a helping hand to my colleagues. This was why the concept of &lt;a href="https://opensource.com/resources/what-open-source"&gt;open source&lt;/a&gt; really appealed to my personality.&lt;/p&gt;
&lt;p&gt;One of the things that motivates me is my desire to give back to my community. I am passionate about making a difference, and this is why I applied for the &lt;a href="https://www.outreachy.org/"&gt;Outreachy&lt;/a&gt; internship. Outreachy is a program that provides paid internships in open source and open science. Outreachy provides internships to people subject to systemic bias and impacted by underrepresentation in the technology industry where they are living. Interns are paid a stipend of $7,000 for a period of 3 months.&lt;/p&gt;
&lt;p&gt;The Outreachy application process consists of three stages: the initial application, the contribution period, and the final application. I was determined to succeed in each stage and eventually secure an internship, which I did! Here's how:&lt;/p&gt;
&lt;p&gt;The first stage of the application process was the &lt;strong&gt;initial application&lt;/strong&gt;. I carefully reviewed the requirements and made sure I met all of them. We were asked to write 3 essays centered around how we have been underrepresented in the tech industry. I took quite some time to carefully think this through before writing my essays. After which, I submitted my application on time and waited for a response.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://res.cloudinary.com/dexcmkxjl/image/upload/v1671658423/initial_applic_bnriee.png" alt="initial application mail"&gt;&lt;/p&gt;
&lt;p&gt;A few weeks later, I received an email from the Outreachy organizers stating that I had been selected to move on to the next stage: the contribution period. This stage involved making a contribution to an open-source project. I was excited to have the opportunity to make a real impact and I spent hours researching the different projects. I eventually chose to contribute to the &lt;a href="https://creativecommons.org/"&gt;Creative Commons&lt;/a&gt; project "Refactor CC Meta Search".&lt;/p&gt;
&lt;p&gt;&lt;a href="https://creativecommons.org/"&gt;Creative Commons&lt;/a&gt; is an American non-profit organization and international network devoted to educational access and expanding the range of creative works available for others to build upon legally and to share. They also help overcome legal obstacles to the sharing of knowledge and creativity to address the world’s most pressing challenges.&lt;/p&gt;
&lt;p&gt;I was so determined to make a high-quality contribution to this project. I learnt a lot during this stage. I got introduced to new tools like Docker, Linux, and even PHP. I spent countless hours learning the codebase and working on my contribution. In the spirit of love and collaboration, I spent a lot of hours helping my fellow Outreachy applicants who were beginners or confused about what to do. I also reached out to the project mentors for guidance when I got stuck and for feedback on the contributions I was making.&lt;/p&gt;
&lt;p&gt;My hard work paid off and I successfully submitted my contribution on time. For the final stage, I made a final application to be considered for an internship position with Creative Commons.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://res.cloudinary.com/dexcmkxjl/image/upload/v1671658873/accepted_chmuad.png" alt="acceptance mail"&gt;&lt;/p&gt;
&lt;p&gt;I was so thrilled when I received the congratulatory mail some weeks later that I had been selected for the internship. My success in the Outreachy application process was a result of my dedication and hard work. I put in the time and effort to ensure I met all the requirements and made a valuable contribution to the project. And I am happy to announce that my determination and perseverance paid off and I was rewarded with a valuable internship opportunity.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://res.cloudinary.com/dexcmkxjl/image/upload/v1671659054/interns_tedoag.png" alt="interns page"&gt;&lt;/p&gt;
&lt;p&gt;In conclusion, Precious Oritsedere is a talented software engineer who is dedicated to her work and driven by her core values of love, contribution, and empathy. She is motivated by her desire to make a difference, and this is evident in her commitment to her work and her participation in the Outreachy internship.&lt;/p&gt;
</content></entry><entry><title>Thinking More Openly About Working in The Open</title><link href="http://opensource.creativecommons.org/blog/entries/2022-12-16-new-to-working-in-open/" rel="alternate"></link><updated>2022-12-16T00:00:00Z</updated><author><name>['sara']</name></author><id>urn:uuid:8f6dcc9a-f3b5-3f3f-9e29-56ef6d657688</id><content type="html">&lt;p&gt;I began working at Creative Commons (CC) as the Full Stack Engineer this year
and it’s been amazing to get to work in the open at CC. But as someone who
has been working in closed, internal source environments for a very long
time it’s definitely been a learning experience and a perspective shift.&lt;/p&gt;
&lt;p&gt;For years I benefited from, observed, and offered up personal work into the
world of open source, but I was never deeply involved in other projects in
a big way, nor was I able to contribute anything I did at my professional
day job back into the open source world (despite the benefit open source
afforded the work I did every day). It had been a hope of mine, something
I had advocated for, but had ultimately not worked out. Now at CC I
finally get to participate in projects that operate in the open, and a
larger community of contributors around the world.&lt;/p&gt;
&lt;p&gt;It's been refreshing and rewarding, but it's also been enlightening. There's
so much that's different now. Working in the open doesn't just shift the
terms under which your code is licensed or how many people can contribute, it
requires a significant shift in both approach and process.&lt;/p&gt;
&lt;p&gt;For example, working in the open means that while there may be community members
eager to contribute they may lack contextual understanding that someone more
intimately familiar with a project might develop over time and rely upon. To
support contributions well you need to have a heavily documentation-first
strategy that affords new contributors key information in understandable and
clear instructions.&lt;/p&gt;
&lt;p&gt;That also means that documenting &lt;em&gt;issues&lt;/em&gt; isn't just an item on a todo list
you'll get to later. There's extreme value in writing out detailed information
both for your future self, but also for any would-be community contributors to
understand the problem and address it. Setup instructions, contextual
documentation about the codebase, as well as detailed known issues, roadmaps,
etc. All of it needs to be documented and written out, which not only
benefits the community contributors, but also benefits the project as a
whole. It means key information has to live in the open alongside the code
it informs. It's truly a win-win all around.&lt;/p&gt;
&lt;p&gt;The process also has to shift, you can't just make a list of things you want to
tackle and get to work, you have to consider how each item can be smoothly
adopted as granular and iterative Pull Requests that might all be worked on
by entirely different individuals. The level of care in how the work is
divided and scoped matters even more in this situation than it would have
with an internal team. Working in the open doesn't just mean coding in the
open, it also means planning in the open, and that means having a clearer
view on the overall roadmap and goals the project hopes to meet.&lt;/p&gt;
&lt;p&gt;If you are the steward of a codebase any task list you create or &lt;em&gt;issues&lt;/em&gt; you
identify are ultimately not just for you alone. Putting an item on your list
when you're working alone isn't enough, you've also got to find time to work
on that item, and work your way through completing it.&lt;/p&gt;
&lt;p&gt;In the open source context, working with a community of contributors, creating
an &lt;em&gt;issue&lt;/em&gt; is just as important and meaningful as writing code, in many cases
it might actually be MORE important. Because &lt;em&gt;issues&lt;/em&gt; are often the way in
which contributors first offer up help and insight, they're the first contact
they have with your project. Furthermore, any &lt;em&gt;issue&lt;/em&gt; you create may end up
getting completed by one or more people that are not you, which means it
doesn't just sit on a list till you do it. It's a small, but significant
shift in how you think about planning and breaking down work on a codebase
in the open.&lt;/p&gt;
&lt;p&gt;It’s certainly new, but incredibly rewarding. Even on days where I might not get
to submit a Pull Request myself, or squash a bug in a meaningful way, I can still
feel I offered up meaningful contributions to the community and the codebase
through better documentation, answering someone’s question, reworking a
process, or reviewing someone else’s generous contribution. Open Source means
opening up your definition of what contribution means, and it’s a lot broader
and more meaningful than I thought.&lt;/p&gt;
</content></entry><entry><title>Data Science Discovery: Quantifying the Commons</title><link href="http://opensource.creativecommons.org/blog/entries/2022-12-07-berkeley-quantifying/" rel="alternate"></link><updated>2022-12-07T00:00:00Z</updated><author><name>['Dun-MingHuang', 'ShuranYang']</name></author><id>urn:uuid:b5ca9376-727d-3a82-97f0-d150f1827d77</id><content type="html">&lt;p&gt;University of California, Berkeley, Data Science Discovery Program Fall 2022&lt;/p&gt;
&lt;h2 id="project-objective"&gt;Project Objective&lt;/h2&gt;&lt;h3 id="problem-statement"&gt;Problem Statement&lt;/h3&gt;&lt;p&gt;From 2014 to 2017, Creative Commons (CC) released public reports detailing
the growth, size, and usage of Creative Commons, demonstrating its significance
and influence. However, the effort to quantify Creative Commons ceased the
following year. Those reports are the predecessor of our current open-source
project:
&lt;a href="https://github.com/creativecommons/quantifying"&gt;Quantifying the Commons&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;An example visualization from the previous report in 2017:
&lt;img src="/blog/entries/2022-12-07-berkeley-quantifying/2017_state_of_the_commons_data.png" alt="2017 State of the Commons data graph"&gt;&lt;/p&gt;
&lt;p&gt;The reason is that prior efforts to generate usage reports suffered from
unreliable data retrieval methods. These data extraction methods were prone to
malfunction whenever data sources updated their website architecture, were not
particularly rigorous, and were significantly slower than current methods (on
the scale of 5 business days versus an hour).&lt;/p&gt;
&lt;p&gt;To advance and continue the work of quantifying the state of CC products, the
student researchers were tasked with designing and implementing reliable
retrieval processes for the CC data employed in previous reports, in order to
replicate the past efforts of this project's predecessor and quantify the size
and diversity of CC product usage on the Internet.&lt;/p&gt;
&lt;h2 id="data-retrieval"&gt;Data Retrieval&lt;/h2&gt;&lt;h3 id="how-to-detect-county-of-cc-licensed-documents"&gt;How to detect the count of CC-Licensed Documents?&lt;/h3&gt;&lt;p&gt;If an online document uses a CC tool to protect it, then it will either be
labeled as licensed under that tool or contain a hyperlink to a
creativecommons.org webpage that explains the license's rules (the deed).&lt;/p&gt;
&lt;p&gt;Therefore, we may use the following approach to identify and count CC-licensed
documents:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Select a list of CC tools to inspect (provided by CC).&lt;/li&gt;
&lt;li&gt;Use the APIs of different online platforms to detect and count documents
that are labeled as licensed by the platform and/or contain a hyperlink to CC
license webpages.&lt;/li&gt;
&lt;li&gt;Store these data in tabular form, with the count of documents protected
under each CC tool.&lt;/li&gt;
&lt;/ol&gt;
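&lt;p&gt;The counting approach above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the &lt;code&gt;CC_LINK&lt;/code&gt; pattern, the function names, and the idea of passing document text in directly are all assumptions. A real run would fetch documents or counts through each platform's API.&lt;/p&gt;

```python
import csv
import io
import re
from collections import Counter

# Hypothetical pattern: matches links to CC license or public domain deeds,
# e.g. https://creativecommons.org/licenses/by-sa/4.0/
CC_LINK = re.compile(
    r"creativecommons\.org/(licenses|publicdomain)/([a-z\-]+)/(\d\.\d)"
)


def detect_tools(text):
    """Return the set of CC tool paths linked from a document's text."""
    return {
        f"{kind}/{code}/{version}"
        for kind, code, version in CC_LINK.findall(text)
    }


def count_tools(documents):
    """Count documents per CC tool; a document counts once per tool."""
    counts = Counter()
    for text in documents:
        counts.update(detect_tools(text))
    return counts


def to_csv(counts):
    """Serialize the counts as a two-column table, sorted by tool path."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["tool", "document_count"])
    for tool in sorted(counts):
        writer.writerow([tool, counts[tool]])
    return buffer.getvalue()
```

&lt;p&gt;Calling &lt;code&gt;to_csv(count_tools(documents))&lt;/code&gt; on a list of document strings yields one row per CC tool, which mirrors the tabular storage described in step 3.&lt;/p&gt;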
&lt;h3 id="what-platforms-to-collect-counts-from"&gt;What platforms to collect counts from?&lt;/h3&gt;&lt;p&gt;Here is a list of the online platforms that we sampled document counts from,
along with the researcher responsible for each platform's data collection,
visualization, and modeling in this project:&lt;/p&gt;
&lt;table class="table table-striped"&gt;
&lt;thead class="thead-dark"&gt;&lt;tr&gt;
&lt;th&gt;Platforms Containing Webpages&lt;/th&gt;
&lt;th&gt;Platforms Containing Photos&lt;/th&gt;
&lt;th&gt;Platforms Containing Videos&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Google (Dun-Ming Huang)&lt;/td&gt;
&lt;td&gt;DeviantArt (Dun-Ming Huang)&lt;/td&gt;
&lt;td&gt;Vimeo (Dun-Ming Huang)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Internet Archive (Dun-Ming Huang)&lt;/td&gt;
&lt;td&gt;Flickr (Shuran Yang)&lt;/td&gt;
&lt;td&gt;YouTube (Dun-Ming Huang)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;MetMuseum (Dun-Ming Huang)&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;WikiCommons (Dun-Ming Huang)&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="exploratory-data-analysis-eda"&gt;Exploratory Data Analysis (EDA)&lt;/h3&gt;&lt;p&gt;Here are some significant defects found in datasets across sampled platforms
during EDA:&lt;/p&gt;
&lt;h4 id="flickr"&gt;Flickr&lt;/h4&gt;&lt;ul&gt;
&lt;li&gt;The sampled document count from this dataset deviates by 35,000% to 100,000%
from official statistics per CC product (license) investigated.&lt;/li&gt;
&lt;li&gt;Sampling frame locked at 4,000 available searched photos from each license.&lt;/li&gt;
&lt;li&gt;Significant duplication issue (resolved).&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="google-custom-search-api"&gt;Google Custom Search API&lt;/h4&gt;&lt;ul&gt;
&lt;li&gt;Programmable Search Engine (PSE) only reaches a subset of Google's indexed
websites. The impact is not significant (and was further reduced via sampling
frame adjustments in PSE).&lt;/li&gt;
&lt;li&gt;Accidentally used deprecated operators and parameters, causing faithfulness
problems (resolved).&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="youtube-data-api"&gt;YouTube Data API&lt;/h4&gt;&lt;ul&gt;
&lt;li&gt;The API caps the total count of YouTube videos it reports in a response,
causing severe underestimates.&lt;ul&gt;
&lt;li&gt;Resolved by querying at a custom granularity to obtain accurate
responses, conserve development cost, and introduce imputations in
visualization.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="expanding-the-dataset"&gt;Expanding the Dataset&lt;/h3&gt;&lt;p&gt;Here are reasons and efforts of dataset expansion on platforms that received
more data:&lt;/p&gt;
&lt;h4 id="google-custom-search-api"&gt;Google Custom Search API&lt;/h4&gt;&lt;ul&gt;
&lt;li&gt;Revised Data Sampling process to solve EDA-discovered inaccuracies.&lt;/li&gt;
&lt;li&gt;To expand CC product usage analyses beyond past boundaries, where
visualization was only conducted to compare cross-product performance,
I incorporated further CC product usage data across time and
geography.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="youtube-data-api"&gt;YouTube Data API&lt;/h4&gt;&lt;ul&gt;
&lt;li&gt;Revised Data Sampling process to solve EDA-discovered inaccuracies.&lt;/li&gt;
&lt;li&gt;To perform new analyses of how CC options on popular platforms develop over
time for a specific medium, I collected YouTube's CC-licensed video count
across two-month periods.&lt;/li&gt;
&lt;li&gt;Introduced imputation to alleviate unresolvable capped responses from YouTube
and mitigate development cost in response to the YouTube API's capping
behaviour.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="visualization"&gt;Visualization&lt;/h2&gt;&lt;h3 id="philosophies-and-principles"&gt;Philosophies and Principles&lt;/h3&gt;&lt;p&gt;The visualizations of Quantifying the Commons are meant to be communicative
and exhibitory.&lt;/p&gt;
&lt;p&gt;Some new aesthetics and principles we adopted (to improve on prior efforts)
are to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Present length in place of area for comprehensibility&lt;/li&gt;
&lt;li&gt;Analyze product development beyond license-wise comparisons&lt;/li&gt;
&lt;li&gt;Use color to present trends in the data, with visualizations built using
pandas, Seaborn, NumPy, GeoPandas, and spaCy&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="exhibiting-a-selection-of-visualizations"&gt;Exhibiting a Selection of Visualizations&lt;/h3&gt;&lt;h4 id="diagram-1c"&gt;Diagram 1C&lt;/h4&gt;&lt;p&gt;Trend Chart of Creative Commons Usage on Google
&lt;img src="/blog/entries/2022-12-07-berkeley-quantifying/diagram_1c.png" alt="Trend Chart of Creative Commons Usage on Google"&gt;&lt;/p&gt;
&lt;p&gt;There are now &lt;strong&gt;more than 2.7 Billion webpages protected by Creative Commons&lt;/strong&gt;
indexed by Google!&lt;/p&gt;
&lt;h4 id="diagram-2"&gt;Diagram 2&lt;/h4&gt;&lt;p&gt;Heatmap on density of CC-licensed Google indexed webpages over country
&lt;img src="/blog/entries/2022-12-07-berkeley-quantifying/diagram_2.png" alt="Heatmap on density of CC-licensed Google indexed webpages over country"&gt;&lt;/p&gt;
&lt;p&gt;In particular, &lt;strong&gt;Western Europe and the Americas enjoy much more robust use&lt;/strong&gt; of
Creative Commons documents in terms of quantity. Development in Asia and
Africa should be encouraged.&lt;/p&gt;
&lt;h4 id="diagram-3c"&gt;Diagram 3C&lt;/h4&gt;&lt;p&gt;Barplot for number of webpages protected by six primary CC licenses
&lt;img src="/blog/entries/2022-12-07-berkeley-quantifying/diagram_3c.png" alt="Barplot for number of webpages protected by six primary CC licenses"&gt;&lt;/p&gt;
&lt;p&gt;We can see that &lt;strong&gt;Attribution&lt;/strong&gt; (BY) and &lt;strong&gt;Attribution-NoDerivs (BY-ND)
are popular licenses&lt;/strong&gt; among the 3 billion documents sampled across the
dataset.&lt;/p&gt;
&lt;h4 id="diagram-6"&gt;Diagram 6&lt;/h4&gt;&lt;p&gt;Barplot of CC-licensed documents across Free Culture and Non Free Culture
licenses
&lt;img src="/blog/entries/2022-12-07-berkeley-quantifying/diagram_6.png" alt="Barplot of CC-licensed documents across Free Culture and Non Free Culture licenses"&gt;&lt;/p&gt;
&lt;p&gt;Roughly &lt;strong&gt;45.3% of the documents under CC protection are covered by Free
Culture&lt;/strong&gt; legal tools.&lt;/p&gt;
&lt;h4 id="flickr-diagrams"&gt;Flickr Diagrams&lt;/h4&gt;&lt;p&gt;Usage of CC licenses on Flickr is concentrated in Australia, Brazil, and the
United States of America, while it is fairly low in Asian countries.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The sampling frame of these visualizations is locked at the first
4,000 search results for photos under each general license type.&lt;/p&gt;
&lt;h5 id="diagram-7a"&gt;Diagram 7A&lt;/h5&gt;&lt;p&gt;Analysis of Creative Commons Usage on Flickr&lt;/p&gt;
&lt;p&gt;&lt;img src="/blog/entries/2022-12-07-berkeley-quantifying/diagram_7a.png" alt="CC BY-SA 2.0 license usage in Flickr pictures taken during 1962-2022"&gt;&lt;/p&gt;
&lt;h5 id="diagram-7b"&gt;Diagram 7B&lt;/h5&gt;&lt;p&gt;&lt;img src="/blog/entries/2022-12-07-berkeley-quantifying/diagram_7b.png" alt="Flickr maximum views of pictures under all licenses"&gt;&lt;/p&gt;
&lt;p&gt;Photos on Flickr under the Attribution-NonCommercial-NoDerivs (BY-NC-ND)
license have gained the highest maximum views, while usage of the Public Domain
Mark shows the strongest increasing trend in recent years.&lt;/p&gt;
&lt;h5 id="diagram-7c"&gt;Diagram 7C&lt;/h5&gt;&lt;p&gt;&lt;img src="/blog/entries/2022-12-07-berkeley-quantifying/diagram_7c.png" alt="Flickr yearly trend of all licenses 2018-2022"&gt;&lt;/p&gt;
&lt;h5 id="diagram-7d"&gt;Diagram 7D&lt;/h5&gt;&lt;p&gt;&lt;img src="/blog/entries/2022-12-07-berkeley-quantifying/diagram_7d.png" alt="Flickr Photos under CC-BY-NC-SA 2.0 and CC BY-NC 2.0: Categories Keywords"&gt;&lt;/p&gt;
&lt;h4 id="diagram-8"&gt;Diagram 8&lt;/h4&gt;&lt;p&gt;Number of works under Creative Commons Tools across Platforms
&lt;img src="/blog/entries/2022-12-07-berkeley-quantifying/diagram_8.png" alt="Number of works under Creative Commons Tools across Platforms"&gt;&lt;/p&gt;
&lt;p&gt;DeviantArt presents the largest number of works under Creative Commons
licenses and tools, followed by Wikipedia and WikiCommons. The video count
on YouTube is underestimated, as demonstrated in Diagram 11B.&lt;/p&gt;
&lt;h4 id="diagram-9b"&gt;Diagram 9B&lt;/h4&gt;&lt;p&gt;Barplot of Creative Commons Protected Documents across Countries
&lt;img src="/blog/entries/2022-12-07-berkeley-quantifying/diagram_9b.png" alt="Barplot of Creative Commons Protected Documents across Countries"&gt;&lt;/p&gt;
&lt;h4 id="diagram-10"&gt;Diagram 10&lt;/h4&gt;&lt;p&gt;Barplot of Creative Commons Protected Documents across languages
&lt;img src="/blog/entries/2022-12-07-berkeley-quantifying/diagram_10.png" alt="Barplot of Creative Commons Protected Documents across languages"&gt;&lt;/p&gt;
&lt;h4 id="diagram-11b"&gt;Diagram 11B&lt;/h4&gt;&lt;p&gt;Trend Chart of Cumulative Count of CC-Licensed YouTube Videos across Each Two-Months
&lt;img src="/blog/entries/2022-12-07-berkeley-quantifying/diagram_11b.png" alt="Trend Chart of Cumulative Count of CC-Licensed YouTube Videos across Each Two-Months"&gt;&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;orange line stands for the imputed value of new CC-Licensed YouTube video
counts based on linear regression,&lt;/strong&gt; which was the chosen method of imputation
because the CC-licensed document counts of most media also grow
linearly.&lt;/p&gt;
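&lt;p&gt;As a rough illustration of imputing capped values with linear regression, the sketch below fits an ordinary least-squares line to the periods whose counts stayed under the cap and uses it to fill in the capped periods. The function names, the &lt;code&gt;cap&lt;/code&gt; parameter, and the use of the period index as the regressor are assumptions made for this example, not the project's actual implementation.&lt;/p&gt;

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute_capped(counts, cap):
    """Replace counts that reached the cap with values from a fitted line.

    `counts` holds one value per period; a value at or above `cap` is
    treated as a capped (unreliable) API response.
    """
    observed = [(i, c) for i, c in enumerate(counts) if cap > c]
    slope, intercept = fit_line(
        [i for i, _ in observed], [c for _, c in observed]
    )
    return [
        slope * i + intercept if c >= cap else c
        for i, c in enumerate(counts)
    ]
```

&lt;p&gt;The fit needs at least two uncapped periods; otherwise the slope is undefined and the function raises &lt;code&gt;ZeroDivisionError&lt;/code&gt;.&lt;/p&gt;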
&lt;h2 id="modeling"&gt;Modeling&lt;/h2&gt;&lt;p&gt;(A side track)&lt;/p&gt;
&lt;h3 id="objectives-of-modeling"&gt;Objectives of Modeling&lt;/h3&gt;&lt;p&gt;The models of this project aim to answer: "What is the license type of a
webpage/web document given its content?"&lt;/p&gt;
&lt;p&gt;Each researcher attempted their own solution using different resources and
metrics, under different modeling contexts:&lt;/p&gt;
&lt;h4 id="model-of-google-webpages-dun-ming-huang"&gt;Model of Google Webpages (Dun-Ming Huang)&lt;/h4&gt;&lt;ul&gt;
&lt;li&gt;Modeling Context: Multiclass Classifier (7 classes).&lt;/li&gt;
&lt;li&gt;Modeling Training set: Text webpage contents acquired from Google API
collected webpages (Common Crawl, the original choice, was marked
unavailable due to source code corruption).&lt;/li&gt;
&lt;li&gt;Main Model Metric: Top-k accuracy, as this model is considered the backend
of a license recommendation system that receives webpage content and
recommends 2 to 3 licenses to the user.&lt;/li&gt;
&lt;/ul&gt;
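&lt;p&gt;Top-k accuracy counts a prediction as correct when the true class is among the k classes the model scores highest, which suits a system that recommends 2 to 3 licenses. A minimal sketch of the metric follows; the function name and the list-of-lists input format are assumptions for illustration.&lt;/p&gt;

```python
def top_k_accuracy(probabilities, labels, k):
    """Share of samples whose true label is among the k highest scores.

    `probabilities` holds one list of per-class scores per sample and
    `labels` holds the index of the true class for each sample.
    """
    hits = 0
    for scores, label in zip(probabilities, labels):
        ranked = sorted(
            range(len(scores)), key=lambda i: scores[i], reverse=True
        )
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)
```

&lt;p&gt;With &lt;code&gt;k=1&lt;/code&gt; this reduces to plain accuracy, and with k equal to the number of classes it is always 1.0.&lt;/p&gt;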
&lt;h4 id="model-for-flickr-photos-shuran-yang"&gt;Model for Flickr Photos (Shuran Yang)&lt;/h4&gt;&lt;ul&gt;
&lt;li&gt;Modeling Context: Binary Classifier (BY vs. BY-SA)&lt;/li&gt;
&lt;li&gt;Modeling Training set: Text photo descriptions acquired from Flickr API (with
sampling frame of visualizations)&lt;/li&gt;
&lt;li&gt;Main Model Metric: Accuracy&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="training-process-summary-google-model"&gt;Training Process Summary: Google Model&lt;/h3&gt;&lt;h4 id="preprocessing-pipeline"&gt;Preprocessing Pipeline&lt;/h4&gt;&lt;ol&gt;
&lt;li&gt;Deduplication&lt;/li&gt;
&lt;li&gt;Remove Non-English Characters&lt;/li&gt;
&lt;li&gt;URL, &lt;code&gt;[^\w\s]&lt;/code&gt;, Stopword Removal&lt;/li&gt;
&lt;li&gt;Remove Non-English Words&lt;/li&gt;
&lt;li&gt;Remove Short Words, Short Contents&lt;/li&gt;
&lt;li&gt;TF-IDF + SVD&lt;/li&gt;
&lt;li&gt;SMOTE&lt;/li&gt;
&lt;/ol&gt;
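&lt;p&gt;The text-cleaning steps of this pipeline (steps 1 to 5) might look roughly like the sketch below. It is an approximation under stated assumptions: the stopword list is a tiny hypothetical one, "non-English characters" are approximated as non-ASCII characters, the length thresholds are made up, and the dictionary-based removal of non-English words is left out. The TF-IDF, SVD, and SMOTE steps are not shown.&lt;/p&gt;

```python
import re

# Hypothetical stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "is", "of", "the", "to"}
URL = re.compile(r"https?://\S+")
NON_ASCII = re.compile(r"[^\x00-\x7f]")
NON_WORD = re.compile(r"[^\w\s]")


def clean(text, min_word_length=3):
    """Strip URLs, non-ASCII characters, punctuation, stopwords, short words."""
    text = URL.sub(" ", text.lower())
    text = NON_ASCII.sub(" ", text)
    text = NON_WORD.sub(" ", text)
    words = [
        word
        for word in text.split()
        if word not in STOPWORDS and len(word) >= min_word_length
    ]
    return " ".join(words)


def preprocess(documents, min_words=2):
    """Deduplicate, clean, and drop documents that end up too short."""
    cleaned = []
    for text in dict.fromkeys(documents):  # order-preserving deduplication
        result = clean(text)
        if len(result.split()) >= min_words:
            cleaned.append(result)
    return cleaned
```

&lt;p&gt;The output of &lt;code&gt;preprocess&lt;/code&gt; is a list of cleaned strings that a vectorizer could then turn into TF-IDF features.&lt;/p&gt;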
&lt;h4 id="model-selection"&gt;Model Selection&lt;/h4&gt;&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;LogisticRegression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;penalty&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;l2&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;solver&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;liblinear&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;class_weight&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;balanced&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;SVC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;probability&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;kernel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;poly&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;degree&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;class_weight&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;balanced&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;RandomForestClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;class_weight&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;balanced_subsample&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;n_estimators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;GradientBoostingClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;n_estimators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;MultinomialNB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;fit_prior&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;text : InputLayer&lt;/li&gt;
&lt;li&gt;preprocessing : KerasLayer&lt;/li&gt;
&lt;li&gt;BERT_encoder : KerasLayer&lt;/li&gt;
&lt;li&gt;dropout : Dropout&lt;/li&gt;
&lt;li&gt;classifier : Dense&lt;/li&gt;
&lt;/ol&gt;
&lt;h4 id="training-results"&gt;Training Results&lt;/h4&gt;&lt;p&gt;&lt;img src="/blog/entries/2022-12-07-berkeley-quantifying/training_performance.png" alt="Testing Performances across Models by Top-k Accuracy"&gt;&lt;/p&gt;
&lt;h3 id="training-process-summary-flickr-model"&gt;Training Process Summary: Flickr Model&lt;/h3&gt;&lt;h4 id="preprocessing-pipeline"&gt;Preprocessing Pipeline&lt;/h4&gt;&lt;ol&gt;
&lt;li&gt;Deduplication&lt;/li&gt;
&lt;li&gt;Translation&lt;/li&gt;
&lt;li&gt;Stopword Removal, Lemmatization&lt;/li&gt;
&lt;li&gt;TF-IDF&lt;/li&gt;
&lt;/ol&gt;
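&lt;p&gt;As a rough illustration of the TF-IDF step, the sketch below computes the textbook weighting (term frequency times the logarithm of inverse document frequency) in plain Python. The function name and sample descriptions are invented. The post does not say which implementation or formula variant was used; library implementations such as scikit-learn's &lt;code&gt;TfidfVectorizer&lt;/code&gt; apply smoothing and normalization by default, so their numbers will differ.&lt;/p&gt;

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per whitespace-tokenized document."""
    tokenized = [document.split() for document in documents]
    # Document frequency: in how many documents each term appears.
    document_frequency = Counter(
        term for tokens in tokenized for term in set(tokens)
    )
    total = len(tokenized)
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append(
            {
                term: (count / len(tokens))
                * math.log(total / document_frequency[term])
                for term, count in counts.items()
            }
        )
    return weights


# Hypothetical photo descriptions (illustration only).
descriptions = ["sunset beach photo", "city night photo"]
weights = tf_idf(descriptions)
print(weights[0])
```

&lt;p&gt;A term that appears in every description ("photo" here) gets a weight of zero, so it cannot help separate one class from another. That is the point of the inverse document frequency factor.&lt;/p&gt;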
&lt;h4 id="model-selection"&gt;Model Selection&lt;/h4&gt;&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;SVC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;kernel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;linear&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;gamma&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;auto&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;h4 id="training-results"&gt;Training Results&lt;/h4&gt;&lt;p&gt;An accuracy of 66.87% was reached.&lt;/p&gt;
&lt;h2 id="next-steps"&gt;Next Steps&lt;/h2&gt;&lt;h3 id="from-preincarnation-to-present"&gt;From Preincarnation to Present&lt;/h3&gt;&lt;p&gt;Via the efforts described above, we have transformed a data retrieval process
that was unstable, unexplored, and unavailable into an algorithmic,
deterministic process that is reliable, documented, and interpretable. The
visualizations have also become more expressive: they concentrate on insights
that took more effort to extract, and they look at Creative Commons in greater
depth and breadth.&lt;/p&gt;
&lt;p&gt;With the significant re-implementation of, and new design policies for, the
data retrieval process of Quantifying the Commons, visualizations can be
produced immediately on command. With the conceptual changes to how
visualizations are produced, Creative Commons will gain new insights into
product development and eventual policies along the axes on which the data was
extracted. Furthermore, we expect the models to serve not only as a Machine
Learning product, but also as a way to draw inferences about product
usage.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Such efforts are a short jump start to the long-term reincarnation of
Quantifying the Commons.&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="from-reincarnation-onto-baton-touches"&gt;From Reincarnation onto Baton Touches&lt;/h3&gt;&lt;p&gt;The current team encourages the future team to improve the availability
and user experience of our open source data extraction method via automation
and batched data extraction, for which Dun-Ming has written a design
policy. For modeling, the team also encourages building inference pipelines
that use ELI5 for the Logistic Regression models, as well as experimenting more
with the loss function options of the Gradient Boosting Classifier. For Flickr,
the writer of this poster suggests trying a data extraction method that does
not rely on the Flickr API but still has access to Flickr media, such as the
Google Custom Search API.&lt;/p&gt;
&lt;h2 id="additional-reading"&gt;Additional Reading&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Dun-Ming Huang blogs:&lt;ul&gt;
&lt;li&gt;&lt;a href="https://medium.com/@bransthre/dsd-fall-2022-quantifying-the-commons-0-10-d1844092fc7a"&gt;DSD Fall 2022: Quantifying the Commons (0/10) | by Bransthre | Nov, 2022 | Medium&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@bransthre/dsd-fall-2022-quantifying-the-commons-1-10-970dc24626b"&gt;DSD Fall 2022: Quantifying the Commons (1/10) | by Bransthre | Nov, 2022 | Medium&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@bransthre/dsd-fall-2022-quantifying-the-commons-2-10-537a5b204d7b"&gt;DSD Fall 2022: Quantifying the Commons (2/10) | by Bransthre | Nov, 2022 | Medium&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@bransthre/dsd-fall-2022-quantifying-the-commons-3-10-79bbfeb90daa"&gt;DSD Fall 2022: Quantifying the Commons (3/10) | by Bransthre | Nov, 2022 | Medium&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@bransthre/dsd-fall-2022-quantifying-the-commons-4-10-9bc90ec98262"&gt;DSD Fall 2022: Quantifying the Commons (4/10) | by Bransthre | Nov, 2022 | Medium&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@bransthre/dsd-fall-2022-quantifying-the-commons-5-10-475334a8895"&gt;DSD Fall 2022: Quantifying the Commons (5/10) | by Bransthre | Nov, 2022 | Medium&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@bransthre/dsd-fall-2022-quantifying-the-commons-6-10-961de95ef3aa"&gt;DSD Fall 2022: Quantifying the Commons (6/10) | by Bransthre | Nov, 2022 | Medium&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@bransthre/dsd-fall-2022-quantifying-the-commons-7a-10-ea011b9e05ee"&gt;DSD Fall 2022: Quantifying the Commons (7A/10) | by Bransthre | Nov, 2022 | Medium&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@bransthre/dsd-fall-2022-quantifying-the-commons-7b-10-e8bd8ba1c18a"&gt;DSD Fall 2022: Quantifying the Commons (7B/10) | by Bransthre | Nov, 2022 | Medium&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@bransthre/dsd-fall-2022-quantifying-the-commons-8a-10-6f5336c00d11"&gt;DSD Fall 2022: Quantifying the Commons (8A/10) | by Bransthre | Nov, 2022 | Medium&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@bransthre/dsd-fall-2022-quantifying-the-commons-8b-10-aa1ec8e2ae63"&gt;DSD Fall 2022: Quantifying the Commons (8B/10) | by Bransthre | Nov, 2022 | Medium&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@bransthre/dsd-fall-2022-quantifying-the-commons-9-10-536617bdcbb0"&gt;DSD Fall 2022: Quantifying the Commons (9/10) | by Bransthre | Nov, 2022 | Medium&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@bransthre/dsd-fall-2022-quantifying-the-commons-10-10-47cbcb9bc8c2"&gt;DSD Fall 2022: Quantifying the Commons (10/10) | by Bransthre | Nov, 2022 | Medium&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Shuran Yang blog:&lt;ul&gt;
&lt;li&gt;&lt;a href="https://medium.com/@shuran1030/quantifying-the-commons-data-science-discovery-program-fall-2022-8e8c15b1ace3"&gt;Quantifying the Commons — Data Science Discovery Program Fall 2022 | by Shuran Yang | Nov, 2022 | Medium&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
</content></entry><entry><title>CalVer to SemVer</title><link href="http://opensource.creativecommons.org/blog/entries/2022-11-11-calver-to-semver/" rel="alternate"></link><updated>2022-11-11T00:00:00Z</updated><author><name>['TimidRobot']</name></author><id>urn:uuid:70d61d34-1664-30a4-81f1-8cf222f5b31f</id><content type="html">&lt;p&gt;Creative Commons (CC) tried to use CalVer (calendar versioning), but
encountered too many issues and decided on SemVer (semantic versioning)
instead.&lt;/p&gt;
&lt;h2 id="why-we-chose-calver"&gt;Why we chose CalVer&lt;/h2&gt;&lt;p&gt;Years ago, the CC technology team standardized on using &lt;a href="https://calver.org/"&gt;CalVer&lt;/a&gt; as our
versioning scheme. Specifically, we selected &lt;code&gt;YYYY.0M.MICRO&lt;/code&gt;. &lt;a href="https://calver.org/"&gt;CalVer&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;YYYY&lt;/code&gt;&lt;/strong&gt; - Full year - 2006, 2016, 2106&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;0M&lt;/code&gt;&lt;/strong&gt; - Zero-padded month - 01, 02 ... 11, 12&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;Micro&lt;/code&gt;&lt;/strong&gt; - The third and usually final number in the version. Sometimes
referred to as the "patch" segment.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;
&lt;p&gt;The use of CalVer was inspired by Ubuntu, pip, SaltStack, and others. It was
thought that CalVer not only matched &lt;a href="https://semver.org/"&gt;SemVer&lt;/a&gt; in communicating
potential risks to users, but also gave additional temporal context. Also, many
argue that the promises of SemVer’s &lt;code&gt;MAJOR.MINOR.PATCH&lt;/code&gt; go unfulfilled often
enough that they lose meaning and that the differences between MINOR/PATCH are
too poorly defined (more on these later).&lt;/p&gt;
&lt;h2 id="issues-encountered-with-calver"&gt;Issues Encountered with CalVer&lt;/h2&gt;&lt;h3 id="time/duration-is-not-primarily-relevant"&gt;Time/Duration Is Not Primarily Relevant&lt;/h3&gt;&lt;p&gt;CalVer is often favored by projects for which time/duration is of primary
relevance (ex. Ubuntu releases which have a limited support window). However,
none of CC’s projects have time/duration as a primary relevance.&lt;/p&gt;
&lt;h3 id="major-expectations-and-slow-iteration"&gt;&lt;code&gt;MAJOR&lt;/code&gt; Expectations and Slow Iteration&lt;/h3&gt;&lt;p&gt;SemVer is a formalization of longstanding convention. Many users,
especially developers, expect the first number of a versioning scheme to
indicate change severity. With &lt;code&gt;YYYY&lt;/code&gt; indicating current release year, the
&lt;code&gt;YYYY.0M.MICRO&lt;/code&gt; versioning scheme might set an expectation of significant
changes or improvements (ex. &lt;code&gt;2021.09.1&lt;/code&gt; to &lt;code&gt;2022.02.1&lt;/code&gt;) even when the content
of the changes is trivial. With &lt;code&gt;YYYY&lt;/code&gt; indicating original release year, a
slow moving but stable and functional release might appear abandoned or
insecure (ex. &lt;code&gt;2019.03.2&lt;/code&gt; in 2022).&lt;/p&gt;
&lt;h3 id="poor-support-for-calver"&gt;Poor Support for CalVer&lt;/h3&gt;&lt;p&gt;We also encountered poor support for CalVer in software and systems. For
example, NPM currently strips leading zeros which breaks CDN integration
(&lt;a href="https://github.com/cc-archive/vocabulary-legacy/issues/588"&gt;CalVer and CDN compatibility · Issue #588 ·
creativecommons/vocabulary&lt;/a&gt;).&lt;/p&gt;
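&lt;p&gt;The leading-zero problem can be illustrated with a few lines of Python. SemVer does not allow leading zeros in numeric segments, so a tool that normalizes versions will turn a zero-padded CalVer month into a different string. The sketch below only mimics that effect; it is not NPM's implementation, and the CDN URL template is hypothetical.&lt;/p&gt;

```python
def strip_leading_zeros(version):
    """Rewrite each dot-separated numeric segment without leading zeros."""
    return ".".join(str(int(segment)) for segment in version.split("."))


# Hypothetical CDN URL template, used only to show the mismatch.
CDN_TEMPLATE = "https://cdn.example.com/package@{version}/"

tagged = "2021.09.1"  # CalVer string as tagged by the project
published = strip_leading_zeros(tagged)  # what a normalizing registry would store

print(CDN_TEMPLATE.format(version=tagged))
print(CDN_TEMPLATE.format(version=published))
```

&lt;p&gt;The two printed URLs differ, so anything that builds a URL from the tagged version will not find the published package.&lt;/p&gt;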
&lt;h2 id="using-semver"&gt;Using SemVer&lt;/h2&gt;&lt;p&gt;Our experiment with CalVer is a win for the scientific method. We can be more
confident, today, that SemVer will treat both the developers and users of CC
software better than CalVer.&lt;/p&gt;
&lt;h3 id="semvers-promises-commitments"&gt;SemVer’s &lt;del&gt;Promises&lt;/del&gt; Commitments&lt;/h3&gt;&lt;p&gt;The CC Technology team sees SemVer as a set of commitments we are making to the
users and developers of CC open source software. We may not achieve perfection
in fulfilling those commitments, but they outline expectations and we hope
you’ll open an issue if we make a mistake.&lt;/p&gt;
&lt;h3 id="cc-semver-specifics"&gt;CC SemVer Specifics&lt;/h3&gt;&lt;p&gt;We will be using &lt;a href="https://semver.org/"&gt;SemVer&lt;/a&gt; (semantic versioning) going forward. To add
additional clarity, we will avoid mixing functionality changes and bug fixes in
releases:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;MAJOR&lt;/code&gt; version when you make incompatible API changes&lt;/li&gt;
&lt;li&gt;&lt;code&gt;MINOR&lt;/code&gt; version when you add functionality in a backwards compatible manner&lt;ul&gt;
&lt;li&gt;Releases that increment the &lt;code&gt;MINOR&lt;/code&gt; version &lt;strong&gt;must not&lt;/strong&gt; include bug
fixes&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PATCH&lt;/code&gt; version when you make backwards compatible bug fixes&lt;ul&gt;
&lt;li&gt;Releases that increment the &lt;code&gt;PATCH&lt;/code&gt; version &lt;strong&gt;must not&lt;/strong&gt; include
functionality additions&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;When a bug fix &lt;em&gt;technically&lt;/em&gt; changes functionality, we will release a bug fix
(incrementing only the &lt;code&gt;PATCH&lt;/code&gt; version) as the change preserves the &lt;em&gt;intended
functionality&lt;/em&gt;.&lt;/p&gt;
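&lt;p&gt;The rules above can be sketched as a small Python helper. This is an illustration only, not tooling that CC uses: the function names and the change labels (&lt;code&gt;"breaking"&lt;/code&gt;, &lt;code&gt;"feature"&lt;/code&gt;, &lt;code&gt;"fix"&lt;/code&gt;) are assumptions made for the example.&lt;/p&gt;

```python
def bump(version, change):
    """Return the next MAJOR.MINOR.PATCH version for one kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":
        return f"{major + 1}.0.0"  # incompatible API change
    if change == "feature":
        return f"{major}.{minor + 1}.0"  # backwards compatible functionality
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"  # backwards compatible bug fix
    raise ValueError(f"unknown change type: {change!r}")


def release_version(version, changes):
    """Pick the next version for a release, refusing to mix features and fixes."""
    kinds = set(changes)
    if "breaking" in kinds:
        return bump(version, "breaking")
    if {"feature", "fix"}.issubset(kinds):
        raise ValueError("a release should not mix new functionality and bug fixes")
    if "feature" in kinds:
        return bump(version, "feature")
    return bump(version, "fix")


print(release_version("1.4.2", ["fix", "fix"]))
print(release_version("1.4.2", ["feature"]))
```

&lt;p&gt;In this sketch, a release containing both a feature and a fix raises an error instead of silently picking &lt;code&gt;MINOR&lt;/code&gt;, which mirrors the commitment to keep the two kinds of change in separate releases.&lt;/p&gt;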
</content></entry><entry><title>Building the CC Global Components Library</title><link href="http://opensource.creativecommons.org/blog/entries/building-the-cc-global-components-library/" rel="alternate"></link><updated>2022-03-17T00:00:00Z</updated><author><name>['MuluhGodson']</name></author><id>urn:uuid:1140a314-fe2d-30a3-aeb4-19a4ba942e21</id><content type="html">&lt;h3 id="introduction"&gt;Introduction&lt;/h3&gt;&lt;p&gt;During the course of my Outreachy internship with the Creative Commons, I got to work on some cool projects, one of which is the CC Global Components
library supervised by my mentor &lt;a href="/blog/authors/brylie/"&gt;Brylie Christopher Oxley&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Having a unified design theme/look or experience across the different CC websites has always been an important factor while developing these
websites.
With this in mind, there are several components which are part of most CC web properties. Three components in particular are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt; The Global navigation menu &lt;/strong&gt; : displayed on sub-paths of the main creativecommons.org website, such as /licenses&lt;/li&gt;
&lt;li&gt;&lt;strong&gt; The Global footer &lt;/strong&gt; : displayed on most Creative Commons properties&lt;/li&gt;
&lt;li&gt;&lt;strong&gt; The Explore CC component &lt;/strong&gt; : displayed on all CC web properties, such as Global Summit etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Instead of having each project implement these components, leading to code duplication across projects and maintenance issues, we decided it was
preferable
to have a separate library of these components, which finally led to the CC Global Components project.&lt;/p&gt;
&lt;h3 id="choosing-a-technology"&gt;Choosing a technology&lt;/h3&gt;&lt;p&gt;The goal of the Global components library was to build a custom web component that can be served via CDN. While planning, we needed to decide on
the technology to use. Admittedly, most web frameworks like React and Vue can be used to develop this, but we wanted
a simple implementation with fewer dependencies. We were looking for a technology that meets the following
criteria:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Web Standards oriented&lt;/li&gt;
&lt;li&gt;Clean separation of HTML, CSS, and JavaScript (structure, aesthetics, and functionality)&lt;/li&gt;
&lt;li&gt;Lightweight / small bundle size&lt;/li&gt;
&lt;li&gt;Loosely coupled (no tight or unrelated dependencies)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The two primary technologies we were considering were &lt;a href="https://v3.vuejs.org"&gt;Vue JS&lt;/a&gt; and &lt;a href="https://lwc.dev"&gt;Lightning Web Components&lt;/a&gt; but finally
decided to use Vue JS
since we already had other projects developed in Vue (such as the Chooser project).&lt;/p&gt;
&lt;h3 id="building-the-components"&gt;Building the components&lt;/h3&gt;&lt;p&gt;To scaffold the project, we used &lt;a href="https://www.npmjs.com/package/vue-sfc-rollup"&gt;Vue SFC rollup&lt;/a&gt;, which is a CLI templating utility that scaffolds
a minimal setup for compiling a library of multiple Vue SFCs (Single File Components) - into a form ready to share via npm. With this,
we could just focus on building the templates. We used &lt;a href="https://cc-vocabulary.netlify.app/"&gt;Vocabulary CSS&lt;/a&gt;, our own CC design package to style
the components.&lt;/p&gt;
&lt;h4 id="1-cc-global-footer"&gt;1) CC Global Footer&lt;/h4&gt;&lt;p&gt;The CC Global Footer component was the easiest given that it's mostly static HTML. This component takes two attributes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;logo-url&lt;/code&gt;: which should point to the logo of the website it is used on.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;donation-url&lt;/code&gt;: which is used for the donation button.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After importing the CDN script for the CC Global components, we can then use the CC Global footer in any page as such:&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;cc-global-footer&lt;/span&gt;
  &lt;span class="na"&gt;donation-url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;http://example.com&amp;quot;&lt;/span&gt;
  &lt;span class="na"&gt;logo-url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;/example/logo-white.png&amp;quot;&lt;/span&gt;
&lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;and this renders as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src="/blog/entries/building-the-cc-global-components-library/cc_global_footer.png" alt="CC Global Footer"&gt;&lt;/p&gt;
&lt;h4 id="2-cc-explore"&gt;2) CC Explore&lt;/h4&gt;&lt;p&gt;The CC Explore component is an expandable banner which contains links to all the CC Web properties. This component uses a click listener which just
toggles the expandable banner to show or hide when it is clicked. As with the CC Global Footer component, the CC Explore component takes two attributes.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;cc-explore&lt;/span&gt;
  &lt;span class="na"&gt;donation-url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;http://example.com&amp;quot;&lt;/span&gt;
  &lt;span class="na"&gt;logo-url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;/example/logo-white.png&amp;quot;&lt;/span&gt;
&lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;and this renders as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src="/blog/entries/building-the-cc-global-components-library/cc_explore.gif" alt="CC Explore"&gt;&lt;/p&gt;
&lt;h4 id="3-cc-global-header"&gt;3) CC Global Header&lt;/h4&gt;&lt;p&gt;The CC Global Header was an important component given that we had to make API calls to be able to render the Menu items for downstream projects
such as the &lt;a href="https://github.com/creativecommons/cc-legal-tools-app"&gt;Licenses and Tools&lt;/a&gt;. We used the Axios library for the API calls to the WordPress
backend of the parent project &lt;a href="https://github.com/creativecommons/project_creativecommons.org"&gt;project_creativecommons.org&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The CC Global Header has three required attributes, &lt;code&gt;base-url&lt;/code&gt;, &lt;code&gt;donation-url&lt;/code&gt; and &lt;code&gt;logo-url&lt;/code&gt;, which are the URLs used for the API call,
Donation button and Logo respectively. There is one additional attribute, &lt;code&gt;use-menu-placeholders&lt;/code&gt;, which you can set to render placeholder Menu Items
if you are in a development environment. However, for a staging/production setup we do not use this attribute.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;cc-global-header&lt;/span&gt;
  &lt;span class="na"&gt;base-url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;http://127.0.0.1:8000&amp;quot;&lt;/span&gt;
  &lt;span class="na"&gt;donation-url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;http://example.com&amp;quot;&lt;/span&gt;
  &lt;span class="na"&gt;use-menu-placeholders&lt;/span&gt;
  &lt;span class="na"&gt;logo-url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;/example/logo-black.png&amp;quot;&lt;/span&gt;
&lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;and this renders as shown:&lt;/p&gt;
&lt;p&gt;&lt;img src="/blog/entries/building-the-cc-global-components-library/cc_global_header.png" alt="CC Global Header"&gt;&lt;/p&gt;
&lt;h3 id="conclusion"&gt;Conclusion&lt;/h3&gt;&lt;p&gt;The first version of this library (0.1.1) was released and published to NPM on Dec 10, 2021. To date (at the time of this writing), we have made several
changes and optimizations to the code and are currently on version &lt;code&gt;0.5.0&lt;/code&gt;. This was a really enriching experience for me as it was my first time
working with Vue JS. We've also had additional code review and optimizations from &lt;a href="/blog/authors/TimidRobot/"&gt;Timid Robot&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The CC Global Components with all 3 components used renders as:&lt;/p&gt;
&lt;p&gt;&lt;img src="/blog/entries/building-the-cc-global-components-library/cc_global_components.gif" alt="CC global components"&gt;&lt;/p&gt;
&lt;p&gt;You can find the CC Global Components project at:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/cc-archive/cc-global-components"&gt;CC Global Components&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;NPM: &lt;a href="https://www.npmjs.com/package/@creativecommons/cc-global-components"&gt;cc-global-components&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content></entry><entry><title>CC Messaging Update 2022Q1 (Dropping IRC)</title><link href="http://opensource.creativecommons.org/blog/entries/2022-01-06-cc-messaging/" rel="alternate"></link><updated>2022-01-06T00:00:00Z</updated><author><name>['TimidRobot']</name></author><id>urn:uuid:043a7604-a0bf-3890-b20d-de0f99b67f8c</id><content type="html">&lt;h2 id="past-moved-to-slack"&gt;Past: Moved to Slack&lt;/h2&gt;&lt;p&gt;In 2016, Creative Commons (CC) moved to Slack as our primary messaging platform
(&lt;a href="https://creativecommons.org/2016/10/18/slack-announcement/"&gt;We're on Slack! Join us! - Creative Commons&lt;/a&gt;). We are very thankful
for the generous support that Slack has provided. The Slack messaging platform
is far more accessible than IRC. We saw an immediate and sustained increase in
our messaging community (&lt;a href="https://creativecommons.org/2016/12/09/a-month-of-slack/"&gt;A month of Slack: Growing global communities every
day - Creative Commons,  Lessons learned from a year of Slack, 1000 members,
and immeasurable community growth - Creative Commons&lt;/a&gt;). We
currently have 10,293 members in our Slack workspace. Of those, we see daily
activity from an average of 250 of them spread across almost 70 public
channels. The Slack platform is not without valid criticisms, but those will be
addressed in the Future: Open Source section, below.&lt;/p&gt;
&lt;h2 id="present-dropping-irc"&gt;Present: Dropping IRC&lt;/h2&gt;&lt;p&gt;When CC moved to Slack, we also set up a bridge with our three IRC channels on
Freenode. However, those channels only see ones of active users and tens of
messages per year. With the hostile takeover of Freenode in 2021, the
Free/Libre and Open Source (FOSS) community has largely moved to
&lt;a href="https://libera.chat/"&gt;libera.chat&lt;/a&gt;. However, we will not be moving our Slack/IRC bridge
there. &lt;strong&gt;Effective 2022-01-24 we are dropping IRC as an officially supported
messaging platform.&lt;/strong&gt; In addition to there having been very few active users on
IRC, many of the active IRC users also have active Slack accounts. Dropping IRC
will allow us to better allocate our technical resources to better serve the
community as a whole.&lt;/p&gt;
&lt;h2 id="future-open-source"&gt;Future: Open Source&lt;/h2&gt;&lt;p&gt;Over the years, Slack has had performance and UX issues. It is also designed
around assumptions that do not fit a large open community. Those issues have
not prevented it from being a strong and capable messaging platform that has
served our community well. However, an Open Source messaging platform would
better align with the Creative Commons community and the values we champion.
The Open Source and Open Content communities have long enjoyed a significant
overlap and collaboration. With regards to messaging, we hope to increase that
overlap in the next year or two.&lt;/p&gt;
</content></entry><entry><title>Upcoming Changes to the CC Open Source Community</title><link href="http://opensource.creativecommons.org/blog/entries/2020-12-07-upcoming-changes-to-community/" rel="alternate"></link><updated>2020-12-07T00:00:00Z</updated><author><name>['kgodey']</name></author><id>urn:uuid:f2584ce6-4f24-3ccb-97d8-9f68e62bc65a</id><content type="html">&lt;p&gt;Creative Commons (CC) is adopting a brand new organizational strategy in 2021, just in time for our 20th anniversary. As part of the organization's evolution in alignment with the new strategy, &lt;a href="/blog/authors/aldenpage/"&gt;Alden Page&lt;/a&gt;, &lt;a href="/blog/authors/mathemancer/"&gt;Brent Moran&lt;/a&gt;, &lt;a href="/blog/authors/hugosolar/"&gt;Hugo Solar&lt;/a&gt;, and I (&lt;a href="/blog/authors/kgodey/"&gt;Kriti Godey&lt;/a&gt;) will have departed Creative Commons by the end of December. Moving forward, the CC staff engineering team of &lt;a href="/blog/authors/TimidRobot/"&gt;Timid Robot Zehta&lt;/a&gt; and &lt;a href="/blog/authors/zackkrida/"&gt;Zack Krida&lt;/a&gt; will focus on supporting a smaller set of core projects.&lt;/p&gt;
&lt;p&gt;We are extremely proud of the work we have done together to build CC's vibrant open source community over the past two years. And of course, we're thankful for all the amazing contributions that all our community members have made. We've made significant improvements to existing tools, and launched entirely new projects with your help. &lt;a href="/blog/categories/cc-vocabulary/"&gt;We created Vocabulary,&lt;/a&gt; a design system for Creative Commons and launched half a dozen sites using it.  We added &lt;a href="/blog/categories/cc-catalog/"&gt;dozens of new sources to CC Search&lt;/a&gt; and improved &lt;a href="/blog/authors/AyanChoudhary/"&gt;its accessibility&lt;/a&gt;. We released tools such as the &lt;a href="/blog/authors/ahmadbilaldev/"&gt;CC WordPress plugin&lt;/a&gt; and &lt;a href="/blog/authors/makkoncept/"&gt;CC Search browser extension&lt;/a&gt; that integrated CC licensing with widely used software. And, there's so much more.&lt;/p&gt;
&lt;h3 id="community-changes"&gt;Community Changes&lt;/h3&gt;&lt;p&gt;The CC Open Source community remains central to our engineering work, and we will continue to support you in every way we can. However, based on the new staff capacity, we will be making a few changes to our community processes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Community Team members will no longer have access to CC's Asana. Most tasks are tracked on GitHub, and managing Asana adds unnecessary complexity to the community team.&lt;/li&gt;
&lt;li&gt;We will invite all Community Team members to meetings and documents open to the community, regardless of role.&lt;/li&gt;
&lt;li&gt;We will deprecate the "community-team-core" mailing list in favor of a single "community-team" mailing list.&lt;/li&gt;
&lt;li&gt;We will have a new monthly Open Source Community meeting and cancel the existing biweekly Engineering Meeting.&lt;/li&gt;
&lt;li&gt;We will no longer have a paid Open Source Community Coordinator, &lt;a href="/community/community-team/community-building-roles/"&gt;relying instead on volunteers&lt;/a&gt; to help assist new community members, maintain our Twitter account, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We welcome new Community Team members and we will continue to participate in internship programs such as Google Summer of Code.&lt;/p&gt;
&lt;h3 id="project-changes"&gt;Project Changes&lt;/h3&gt;&lt;p&gt;With a smaller engineering team, we will need to support fewer projects. Please see below for the current status of all projects with at least one Community Team member.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Active Development&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We will continue to actively develop the following projects:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/ccsearch-browser-extension"&gt;CC Search Browser Extension&lt;/a&gt; (maintainer: Mayank Nader)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/creativecommons.github.io-source"&gt;CC Open Source website&lt;/a&gt; (maintainers: Zack Krida &amp;amp; Timid Robot Zehta)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/creativecommons-base"&gt;CC WordPress base&lt;/a&gt; &amp;amp; child themes (new maintainer: Zack Krida)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/legaldb"&gt;CC Legal Database&lt;/a&gt; (maintainer: Timid Robot Zehta)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/chooser"&gt;CC Chooser&lt;/a&gt; (maintainer: Zack Krida)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/cc-link-checker/"&gt;CC Link Checker&lt;/a&gt; (maintainer: Timid Robot Zehta)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/licensebuttons/"&gt;License Buttons&lt;/a&gt; (maintainer: Timid Robot Zehta)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/mp/"&gt;Platform Toolkit&lt;/a&gt; (maintainer: Timid Robot Zehta)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/vocabulary"&gt;Vocabulary&lt;/a&gt; (maintainers: Zack Krida &amp;amp; Dhruv Bhanushali)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/creativecommons/wp-plugin-creativecommons"&gt;WordPress Plugin&lt;/a&gt; (new maintainer: Zack Krida)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Maintenance Mode&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The following projects are entering maintenance mode. The services will remain online, but we will not accept any new pull requests or deploy new code after Dec 15, 2020.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/cc-archive/cccatalog"&gt;CC Catalog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/cc-archive/cccatalog-api"&gt;CC Catalog API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/cc-archive/cccatalog-frontend/"&gt;CC Search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/cc-archive/cccatalog-dataviz/"&gt;Linked Commons&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Catalog, API, and Linked Commons contributors are encouraged to contribute to our other Python projects such as the &lt;a href="https://github.com/creativecommons/legaldb"&gt;CC Legal Database&lt;/a&gt; or the upcoming &lt;a href="https://github.com/creativecommons/cc-licenses"&gt;CC Licenses&lt;/a&gt; project. If you are a CC Search contributor, we recommend checking out frontend projects such as the &lt;a href="https://github.com/creativecommons/chooser"&gt;CC Chooser&lt;/a&gt; or &lt;a href="https://github.com/creativecommons/vocabulary"&gt;Vocabulary&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="thank-you"&gt;Thank You!&lt;/h3&gt;&lt;p&gt;We cannot express our gratitude for our community enough. You are all an absolute pleasure to work with, and we're looking forward to continuing to collaborate with you for years to come.&lt;/p&gt;
</content></entry><entry><title>Vocabulary Landing Page &amp; Usage Guide Final Report</title><link href="http://opensource.creativecommons.org/blog/entries/cc-vocabulary-docs-updates-closing/" rel="alternate"></link><updated>2020-12-03T00:00:00Z</updated><author><name>['nimishbongale']</name></author><id>urn:uuid:9c4ce7a3-30fa-397d-bbcd-c2b9ba3658c3</id><content type="html">&lt;p&gt;We have reached the end of this wonderful journey. Let's comprehensively recap all my contributions during the GSoD internship period!&lt;/p&gt;
&lt;h2 id="vocabulary-site-updates-edition-4/4"&gt;Vocabulary Site Updates (Edition 4/4)&lt;/h2&gt;&lt;p&gt;After securing acceptance, I received the necessary GitHub invites. I was given write access to the &lt;a href="https://github.com/creativecommons/vocabulary"&gt;Vocabulary GitHub repository&lt;/a&gt; as a &lt;strong&gt;CC Vocabulary Core Committer&lt;/strong&gt;.&lt;/p&gt;
&lt;h3 id="proposed-initial-plan"&gt;Proposed Initial Plan&lt;/h3&gt;&lt;h4 id="project-synopsis"&gt;Project Synopsis&lt;/h4&gt;&lt;p&gt;Vocabulary has immense potential to be used as a primary UI component library for website building. What it needs is a robust yet layman-friendly how-to guide. Important developer information such as component guides, usage specifications and configuration tweaks forms an essential part of any documentation. Such a guide will not only give existing users a feel for how Vocabulary continues to grow and reach new milestones, but also promote the use of Vocabulary in newer projects. The desired outcomes of my stint as an intern were a no-nonsense guide to using the existing components, and the design and development of a home page (leading to integrated documentation for each) for Vocabulary, Vue-Vocabulary and Fonts.&lt;/p&gt;
&lt;h3 id="proposed-improvised-timelines-deliverables"&gt;Proposed &amp;amp; Improvised Timelines &amp;amp; Deliverables&lt;/h3&gt;&lt;p&gt;Here's a list of all the weekly goals that I met:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pre-Internship&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Understood Creative Commons as an organisation, its work and related ethics.&lt;/li&gt;
&lt;li&gt;Looked through CC’s GitHub repositories and understood the code structure.&lt;/li&gt;
&lt;li&gt;Opened issues and PRs to get acquainted with the repository workflows.&lt;/li&gt;
&lt;li&gt;Interacted with my mentor and established the basic ideas regarding the project in question.&lt;/li&gt;
&lt;li&gt;Further researched the needs of the project and pondered its potential impact after implementation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Week 1&lt;/strong&gt;
(09/14 - 09/21)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Understood Vocabulary, Vue-Vocabulary and Fonts in greater depth, and their existing components.&lt;/li&gt;
&lt;li&gt;Designed a first-look unified landing page for Vocabulary, Vue-Vocabulary and Fonts based on Vocabulary components.&lt;/li&gt;
&lt;li&gt;Interacted with my mentor and other team members and established a rapport.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Week 2&lt;/strong&gt;
(09/22 - 09/28)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tackled queries regarding the choice of design, page structure etc., and sought approval from CC’s UX Designer.&lt;/li&gt;
&lt;li&gt;Began writing the content for the main landing page.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Week 3&lt;/strong&gt;
(09/29 - 10/06)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Finalized the headings, sub-headings and other sections needed in the landing site &amp;amp; documentation.&lt;/li&gt;
&lt;li&gt;Prepared the code to accept documentation content. Configured GitHub Pages/Netlify/Surge for continuous integration and deployment.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Week 4&lt;/strong&gt;
(10/07 - 10/14)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Began to write under the “Introduction”, “Getting Started” and “Grid Components” sub-headings of the documentation.&lt;/li&gt;
&lt;li&gt;Started developing the main landing page using Vocabulary components.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Week 5&lt;/strong&gt;
(10/15 - 10/22)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Got complete approval for the main page contents.&lt;/li&gt;
&lt;li&gt;Worked on coding the “Dark Theme”.&lt;/li&gt;
&lt;li&gt;Facilitated Hacktoberfest contributors and spoke at a CCOS event.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Week 6&lt;/strong&gt;
(10/23 - 10/30)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Wrote a mid-internship blog post describing work done and how the experience has been so far with CC.&lt;/li&gt;
&lt;li&gt;Started compiling the document guides for all the components in Vocabulary. Made revamps where necessary.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Week 7&lt;/strong&gt;
(10/31 - 11/07)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Integrated the main page contents into the main landing page and had it up and running.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Week 8&lt;/strong&gt;
(11/08 - 11/15)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Finished writing the Vocabulary usage guide and sought initial approval.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Week 9&lt;/strong&gt;
(11/16 - 11/23)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Finalized the guides and the main page contents.&lt;/li&gt;
&lt;li&gt;Carried out the necessary landing page to doc integration.&lt;/li&gt;
&lt;li&gt;Published a sample build using surge for viewing and surveying purposes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Week 10&lt;/strong&gt;
(11/24 - 11/30)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Surveyed development builds for Accessibility using WAVE and Accessibility Insights for Web.&lt;/li&gt;
&lt;li&gt;Surveyed the site for responsiveness using Chrome Dev Tools.&lt;/li&gt;
&lt;li&gt;Generated Lighthouse reports.&lt;/li&gt;
&lt;li&gt;Optimised for Search Engines using meta tags and external links.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Week 11&lt;/strong&gt;
(11/30 - 12/05)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Worked towards improving the report statistics until they reached a respectable target.&lt;/li&gt;
&lt;li&gt;Wrote a blog post summarizing everything, including my performance and involvement in CC.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Week 12&lt;/strong&gt;
(12/06 - 12/12)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Sought daily approvals until everything was finalised.&lt;/li&gt;
&lt;li&gt;Went through my writing and code umpteen times for any minuscule errors.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Week 13&lt;/strong&gt;
(12/13 - 12/19)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cleaned up the code and made sure everything was properly linted and ready before the final closing commits.&lt;/li&gt;
&lt;li&gt;Published the “Concluding Internship” blog post, rounding up my wholesome journey.&lt;/li&gt;
&lt;li&gt;Sought final closing approval.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Post-Internship&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Promote the use of CC attributed works.&lt;/li&gt;
&lt;li&gt;Interact with the community, answer queries or doubts regarding CC.&lt;/li&gt;
&lt;li&gt;Carry out community work on the repositories I’ve contributed to.&lt;/li&gt;
&lt;li&gt;Leverage experience gained during this internship for future endeavours.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="the-vocabulary-site"&gt;The Vocabulary Site&lt;/h3&gt;&lt;p&gt;Here's the link to &lt;a href="https://cc-vocab-draft.web.app"&gt;the landing site&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Went through &lt;strong&gt;3&lt;/strong&gt; Design Iterations.&lt;/li&gt;
&lt;li&gt;Designed the mockups in &lt;a href="https://figma.com"&gt;Figma&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Wrote the content filling up the landing page.&lt;/li&gt;
&lt;li&gt;After approval from the UX Designer, waited for approval from the Frontend Engineer.&lt;/li&gt;
&lt;li&gt;Sought continuous approval from my mentor &lt;a href="/blog/authors/dhruvkb/"&gt;dhruvkb&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Used &lt;a href="https://vuejs.org"&gt;Vue.js&lt;/a&gt; + &lt;a href="https://www.npmjs.com/package/@creativecommons/vocabulary"&gt;CC Vocabulary&lt;/a&gt; to build a highly modularised site.&lt;/li&gt;
&lt;li&gt;Went through a couple of iterations of the website itself.&lt;/li&gt;
&lt;li&gt;Made about &lt;strong&gt;112&lt;/strong&gt; commits (&lt;strong&gt;15,000&lt;/strong&gt; lines of code) in my &lt;em&gt;gsod-nimish&lt;/em&gt; branch.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;
&lt;center&gt;
&lt;img alt="Contributions to CC" src="github.png"/&gt;&lt;br&gt;
&lt;small class="muted"&gt;All my contributions to Creative Commons!&lt;/small&gt;
&lt;/center&gt;
&lt;/pre&gt;&lt;ul&gt;
&lt;li&gt;Used the GitHub API to display repository statistics.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;
&lt;center&gt;
&lt;img alt="Fetch stats from GitHub API" src="stats.png"/&gt;&lt;br&gt;
&lt;small class="muted"&gt;Fetching dynamic stats from the GitHub API&lt;/small&gt;
&lt;/center&gt;
&lt;/pre&gt;&lt;ul&gt;
&lt;li&gt;PR was reviewed and merged on the &lt;strong&gt;25th of November&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here's how the site looks right now:&lt;/p&gt;
&lt;pre&gt;
&lt;center&gt;
&lt;img alt="The final website!" src="website.png"/&gt;&lt;br&gt;
&lt;small class="muted"&gt;Snapshot of the final website!&lt;/small&gt;
&lt;/center&gt;
&lt;/pre&gt;&lt;ul&gt;
&lt;li&gt;Used &lt;a href="https://surge.sh"&gt;Surge&lt;/a&gt; &amp;amp; &lt;a href="https://web.app"&gt;Firebase&lt;/a&gt; for draft deploys.&lt;/li&gt;
&lt;li&gt;Carried out &lt;a href="https://developers.google.com/web/tools/lighthouse"&gt;Lighthouse&lt;/a&gt; testing.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;
&lt;center&gt;
&lt;img alt="Lighthouse reports" src="light.png"/&gt;&lt;br&gt;
&lt;small class="muted"&gt;Lighthouse reports for our live site&lt;/small&gt;
&lt;/center&gt;
&lt;/pre&gt;&lt;ul&gt;
&lt;li&gt;Prompted changes to improve accessibility, SEO and PWA characteristics.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="core-documentation"&gt;Core Documentation&lt;/h3&gt;&lt;p&gt;Here's the link to the &lt;a href="https://cc-vocabulary.netlify.app"&gt;documentation site&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Used &lt;a href="https://storybook.js.org/"&gt;StorybookJS&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Modified the existing overview page.&lt;/li&gt;
&lt;li&gt;Removed highly verbose sections from the docs.&lt;/li&gt;
&lt;li&gt;Documented Vocabulary sprint planning workflow.&lt;/li&gt;
&lt;li&gt;Documented how to use a markdown component with CC Vocabulary.&lt;/li&gt;
&lt;li&gt;Embedded hyperlinks to other open source projects to improve SEO.&lt;/li&gt;
&lt;li&gt;Increased uniformity across documentation present in the storybooks.&lt;/li&gt;
&lt;li&gt;Added alt descriptions &amp;amp; aria labels for certain images to improve accessibility.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="my-learnings-and-challenges"&gt;My Learnings And Challenges&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Design is more than just picking colors and placing components on a grey screen.&lt;/li&gt;
&lt;li&gt;It's important to read your own writing from an unbiased perspective to actually understand how well it will be perceived.&lt;/li&gt;
&lt;li&gt;Publishing to npmjs is not difficult!&lt;/li&gt;
&lt;li&gt;Knowing the existing code in your project is essential. It's important to understand the code styles, structure &amp;amp; activity of the code that you are dealing with.&lt;/li&gt;
&lt;li&gt;Be patient! It's fine to delay something if it makes sense to have it accomplished only after certain other tasks are done &amp;amp; dusted.&lt;/li&gt;
&lt;li&gt;How essential it is to write neat code is something that's not spoken about often enough. (I wonder why...)&lt;/li&gt;
&lt;li&gt;I always thought Vue.js sets up SPAs by default. I'm surprised you need to configure it additionally to do just that!&lt;/li&gt;
&lt;li&gt;Storybook is just a really nifty OSS with great community support!&lt;/li&gt;
&lt;li&gt;Vue.js is fantastic. Maybe I'm a Vue.js fan now. Should I remain loyal to React? I don't know.&lt;/li&gt;
&lt;li&gt;Making a site responsive isn't the easiest of tasks, but it's certainly doable after a lot of stretching &amp;amp; compressing; let's say that.&lt;/li&gt;
&lt;li&gt;"Code formatting is essential" would be an understatement to make.&lt;/li&gt;
&lt;li&gt;Monorepos have their own pros and cons. But in our case the cons were negligible, thankfully!&lt;/li&gt;
&lt;li&gt;GSoD isn't just about documentation; there's some serious amount of coding too!&lt;/li&gt;
&lt;li&gt;You don't have to sit and write code for hours together. Take breaks, come back, and the fix will strike you sooner than ever.&lt;/li&gt;
&lt;li&gt;Timelines change; improvisation is an essential aspect of any project!&lt;/li&gt;
&lt;li&gt;MDX is a neat little format to code in! Documenting code is just so much easier.&lt;/li&gt;
&lt;li&gt;Things become obsolete. Versions become outdated. Maintaining code is, therefore, easier said than done!&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="issues-pr-s-raised-during-gsod-period"&gt;Issues &amp;amp; PRs raised during the GSoD period&lt;/h3&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Repository&lt;/th&gt;
&lt;th&gt;Contribution&lt;/th&gt;
&lt;th&gt;Relevant links&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td rowspan=14&gt;&lt;a href="https://github.com/creativecommons/vocabulary"&gt;@creativecommons/vocabulary&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Developed the CC Vocabulary Landing Page&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cc-archive/vocabulary-legacy/pull/747"&gt;https://github.com/cc-archive/vocabulary-legacy/pull/747&lt;/a&gt;&lt;br&gt;&lt;a href="https://cc-vocab-draft.web.app"&gt;https://cc-vocab-draft.web.app&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Implemented dark mode for our storybooks&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cc-archive/vocabulary-legacy/pull/806"&gt;https://github.com/cc-archive/vocabulary-legacy/pull/806&lt;/a&gt;&lt;br&gt;&lt;a href="https://cc-vocabulary.netlify.app"&gt;https://cc-vocabulary.netlify.app&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Carried out a monorepo wide documentation revamp&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cc-archive/vocabulary-legacy/pull/813"&gt;https://github.com/cc-archive/vocabulary-legacy/pull/813&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wrote the Monorepo Documentation Story&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cc-archive/vocabulary-legacy/pull/785"&gt;https://github.com/cc-archive/vocabulary-legacy/pull/785&lt;/a&gt;&lt;br&gt;&lt;a href="https://cc-vocabulary.netlify.app/?path=/docs/vocabulary-structure--page#why-is-vocabulary-a-monorepo"&gt;https://cc-vocabulary.netlify.app/?path=/docs/vocabulary-structure--page#why-is-vocabulary-a-monorepo&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wrote the Grid Documentation Story&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cc-archive/vocabulary-legacy/pull/802"&gt;https://github.com/cc-archive/vocabulary-legacy/pull/802&lt;/a&gt;&lt;br&gt;&lt;a href="https://cc-vocabulary.netlify.app/?path=/docs/layouts-grid--fullhd#grid-system"&gt;https://cc-vocabulary.netlify.app/?path=/docs/layouts-grid--fullhd#grid-system&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wrote the "Getting Started" Usage Guide&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cc-archive/vocabulary-legacy/pull/774"&gt;https://github.com/cc-archive/vocabulary-legacy/pull/774&lt;/a&gt;&lt;br&gt;&lt;a href="https://cc-vocabulary.netlify.app/?path=/story/vocabulary-getting-started--page#getting-started"&gt;https://cc-vocabulary.netlify.app/?path=/story/vocabulary-getting-started--page#getting-started&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Added a CHANGELOG.md to adhere to OSS conventions&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cc-archive/vocabulary-legacy/pull/671"&gt;https://github.com/cc-archive/vocabulary-legacy/pull/671&lt;/a&gt;&lt;br&gt;&lt;a href="https://github.com/cc-archive/vocabulary-legacy/blob/main/CHANGELOG.md"&gt;https://github.com/cc-archive/vocabulary-legacy/blob/main/CHANGELOG.md&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unified README.md and updated monorepo build process&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cc-archive/vocabulary-legacy/pull/649"&gt;https://github.com/cc-archive/vocabulary-legacy/pull/649&lt;/a&gt;&lt;br&gt;&lt;a href="https://www.npmjs.com/package/@creativecommons/vocabulary"&gt;https://www.npmjs.com/package/@creativecommons/vocabulary&lt;/a&gt;&lt;br&gt;&lt;a href="https://www.npmjs.com/package/@creativecommons/fonts"&gt;https://www.npmjs.com/package/@creativecommons/fonts&lt;/a&gt;&lt;br&gt;&lt;a href="https://www.npmjs.com/package/@creativecommons/vue-vocabulary"&gt;https://www.npmjs.com/package/@creativecommons/vue-vocabulary&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Configured GitHub native dependabot&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cc-archive/vocabulary-legacy/pull/452"&gt;https://github.com/cc-archive/vocabulary-legacy/pull/452&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Added phone screen backgrounds&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cc-archive/vocabulary-legacy/pull/445"&gt;https://github.com/cc-archive/vocabulary-legacy/pull/445&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Introduce Snapshot Testing to Vocabulary using Chromatic&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cc-archive/vocabulary-legacy/issues/735"&gt;https://github.com/cc-archive/vocabulary-legacy/issues/735&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Add a maintained with Lerna badge&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cc-archive/vocabulary-legacy/issues/807"&gt;https://github.com/cc-archive/vocabulary-legacy/issues/807&lt;/a&gt;&lt;br&gt;&lt;a href="https://github.com/cc-archive/vocabulary-legacy/blob/main/README.md"&gt;https://github.com/cc-archive/vocabulary-legacy/blob/main/README.md&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Add new install size badges for our packages&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cc-archive/vocabulary-legacy/issues/776"&gt;https://github.com/cc-archive/vocabulary-legacy/issues/776&lt;/a&gt;&lt;br&gt;&lt;a href="https://github.com/cc-archive/vocabulary-legacy/blob/main/README.md"&gt;https://github.com/cc-archive/vocabulary-legacy/blob/main/README.md&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customise individual README's for our packages&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cc-archive/vocabulary-legacy/issues/736"&gt;https://github.com/cc-archive/vocabulary-legacy/issues/736&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td rowspan=5&gt;&lt;a href="https://github.com/creativecommons/creativecommons.github.io-source"&gt;@creativecommons/creativecommons.github.io-source&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Introductory First Blog Post&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/creativecommons/creativecommons.github.io-source/pull/530"&gt;https://github.com/creativecommons/creativecommons.github.io-source/pull/530&lt;/a&gt;&lt;br&gt;&lt;a href="/blog/entries/cc-vocabulary-docs-intro/"&gt;/blog/entries/cc-vocabulary-docs-intro/&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vocabulary Site Update v1&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/creativecommons/creativecommons.github.io-source/pull/549"&gt;https://github.com/creativecommons/creativecommons.github.io-source/pull/549&lt;/a&gt;&lt;br&gt;&lt;a href="/blog/entries/cc-vocabulary-docs-updates-1/"&gt;/blog/entries/cc-vocabulary-docs-updates-1/&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vocabulary Mid Internship Update v2&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/creativecommons/creativecommons.github.io-source/pull/555"&gt;https://github.com/creativecommons/creativecommons.github.io-source/pull/555&lt;/a&gt;&lt;br&gt;&lt;a href="/blog/entries/cc-vocabulary-docs-updates-2/"&gt;/blog/entries/cc-vocabulary-docs-updates-2/&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vocabulary Site Update v3&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/creativecommons/creativecommons.github.io-source/pull/561"&gt;https://github.com/creativecommons/creativecommons.github.io-source/pull/561&lt;/a&gt;&lt;br&gt;&lt;a href="/blog/entries/cc-vocabulary-docs-updates-3/"&gt;/blog/entries/cc-vocabulary-docs-updates-3/&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vocabulary Site Final Update&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/creativecommons/creativecommons.github.io-source/pull/564"&gt;https://github.com/creativecommons/creativecommons.github.io-source/pull/564&lt;/a&gt;&lt;br&gt;&lt;a href="/blog/entries/cc-vocabulary-docs-updates-closing/"&gt;/blog/entries/cc-vocabulary-docs-updates-closing/&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/cc-archive/cccatalog-api"&gt;@cc-archive/cccatalog-api&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Configured GitHub native dependabot&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cc-archive/cccatalog-api/pull/53"&gt;https://github.com/cc-archive/cccatalog-api/pull/53&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/creativecommons/ccos-scripts"&gt;@creativecommons/ccos-scripts&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Fix file extension in README.md docs&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/creativecommons/ccos-scripts/pull/100"&gt;https://github.com/creativecommons/ccos-scripts/pull/100&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;p&gt;Follow my complete GSoD journey through &lt;a href="/blog/series/gsod-2020-vocabulary-usage-guide/"&gt;this series of posts&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="memorable-milestones-screenshots"&gt;Memorable Milestones Screenshots&lt;/h3&gt;&lt;pre&gt;
&lt;center&gt;
&lt;img alt="Merged!" src="merged747.png"/&gt;&lt;br&gt;
&lt;small class="muted"&gt;GSoD PR merged!&lt;/small&gt;
&lt;/center&gt;
&lt;/pre&gt;
&lt;br&gt;
&lt;pre&gt;
&lt;center&gt;
&lt;img alt="Dark Mode" src="darkmode.png"/&gt;&lt;br&gt;
&lt;small class="muted"&gt;Behold the dark theme!&lt;/small&gt;
&lt;/center&gt;
&lt;/pre&gt;
&lt;br&gt;
&lt;pre&gt;
&lt;center&gt;
&lt;img alt="Grid Docs" src="grid.png"/&gt;&lt;br&gt;
&lt;small class="muted"&gt;Grid Documentation Story&lt;/small&gt;
&lt;/center&gt;
&lt;/pre&gt;
&lt;br&gt;
&lt;pre&gt;
&lt;center&gt;
&lt;img alt="Monorepo Document Story" src="structure.png"/&gt;&lt;br&gt;
&lt;small class="muted"&gt;Monorepo Structure Story&lt;/small&gt;
&lt;/center&gt;
&lt;/pre&gt;&lt;h3 id="conclusion"&gt;Conclusion&lt;/h3&gt;&lt;p&gt;My GSoD internship has been, by far, a very successful and fruitful one. I thank the GSoD team for all their efforts in organising it this year. I would also like to thank the entire Creative Commons team for all their motivation and support. The onboarding &amp;amp; see-off were very smooth indeed!&lt;/p&gt;
&lt;p align="center"&gt;
    &lt;strong&gt;Thank you for all your time! This was the final blog post under the Vocabulary docs series. I'll be around for times to come, but until then, sayonara!&lt;/strong&gt;
&lt;/p&gt;</content></entry><entry><title>Summary: My GSoD 2020 Journey</title><link href="http://opensource.creativecommons.org/blog/entries/summary-my-gsod-2020-journey/" rel="alternate"></link><updated>2020-12-02T00:00:00Z</updated><author><name>['ariessa']</name></author><id>urn:uuid:8ef2254d-edd9-37d9-a144-5b539249b19f</id><content type="html">&lt;p&gt;Thank you for the wonderful experience, Creative Commons!&lt;/p&gt;
&lt;p&gt;This blog post serves as a project report for ‘Improve CC Catalog API Usage Guide’.
It describes the work that I’ve done during Google Season of Docs (GSoD) 2020.
My mentors for this project are Alden Page and Kriti Godey from Creative Commons.&lt;/p&gt;
&lt;p&gt;In total, there are 12 weeks in the Doc Development Phase.
Every 2 weeks, I would publish a blog post to update my mentors and organization on my progress.&lt;/p&gt;
&lt;h3 id="week-1"&gt;Week 1&lt;/h3&gt;&lt;p&gt;So, the first two weeks of Google Season of Docs have passed.
For the first week, I added examples of performing queries using the curl command.
I ran into a Forbidden error. It turned out my access key had expired.
My problem was solved after obtaining a new access key.&lt;/p&gt;
&lt;h3 id="week-2"&gt;Week 2&lt;/h3&gt;&lt;p&gt;For the second week, I started to write response samples.
It was tough as I had a hard time understanding drf-yasg, which is an automatic Swagger generator.
It can produce Swagger / OpenAPI 2.0 specifications from a Django Rest Framework API.
I tried to find as many examples as I could to increase my understanding.
Funny, but it took me a while to realise that drf-yasg is not made up of random letters.
The DRF part stands for Django Rest Framework while YASG stands for Yet Another Swagger Generator.&lt;/p&gt;
&lt;h3 id="week-3"&gt;Week 3&lt;/h3&gt;&lt;p&gt;Week 3 was quite hectic.
I moved back to my hometown during week 3.
Took 3 days off to settle my stuff and set up a workspace.
I worked on my GSoD project for only 2 days, Monday and Tuesday.
I managed to create response samples for most API endpoints.
Had a monthly video call with Kriti this week.&lt;/p&gt;
&lt;h3 id="week-4"&gt;Week 4&lt;/h3&gt;&lt;p&gt;I reviewed what I had and hadn’t done to estimate a new completion time.
Thank god, I have a buffer week in my GSoD timeline and deliverables.
So yeah, all is good in terms of completion time.
I started to write descriptions for API endpoints.
I submitted my first PR and published a blog entry.&lt;/p&gt;
&lt;h3 id="week-5"&gt;Week 5&lt;/h3&gt;&lt;p&gt;I managed to add a lot of stuff into the documentation.
I figured out how to add help texts to classes and how to create serializers.
I also managed to move all code examples under response samples.
In order to do this, I created a new class called CustomAutoSchema to add x-code-samples.
Other things I did include creating new sections such as “Register and Authenticate” and “Glossary”.
The hardest part of this week was probably figuring out how to add request body examples and move code examples.&lt;/p&gt;
&lt;h3 id="week-6"&gt;Week 6&lt;/h3&gt;&lt;p&gt;I added another section called Contribute that provides a to-do list for getting started with contributing on GitHub.
I also wrote and published this blog post.&lt;/p&gt;
&lt;h3 id="week-7"&gt;Week 7&lt;/h3&gt;&lt;p&gt;I restructured the README file in the CC Catalog API repository.
I added a step-by-step guide on how to run the server locally.
I hope new users will be less intimidated to contribute to this project with the updated guide on how to run the server locally.&lt;/p&gt;
&lt;h3 id="week-8"&gt;Week 8&lt;/h3&gt;&lt;p&gt;I created Documentation Guidelines, which provide steps on how to contribute to the CC Catalog API documentation, documentation styles, and a cheat sheet for drf-yasg.
I also wrote and published this blog post.&lt;/p&gt;
&lt;h3 id="week-9"&gt;Week 9&lt;/h3&gt;&lt;p&gt;I had completed all GSoD tasks by week 9.
So, I took a couple of days off and fixed last week's PR.
Kriti assigned me a new task: porting the CC Catalog documentation from the internal wiki into its GitHub repository.
Brent, the CC Catalog maintainer, explained to me what needed to be done.&lt;/p&gt;
&lt;h3 id="week-10"&gt;Week 10&lt;/h3&gt;&lt;p&gt;I started exploring CC Catalog and its documentation.
It reminded me a lot of the first and second weeks of GSoD:
trying to understand new stuff and having an "aha" moment when the dots finally connect.
I started to move the documentation from the internal wiki to CC Catalog’s GitHub repository.
I also wrote and published this blog post.&lt;/p&gt;
&lt;h3 id="week-11"&gt;Week 11&lt;/h3&gt;&lt;p&gt;I finished porting the CC Catalog documentation from the internal wiki to CC Catalog’s GitHub repository.
Kriti told me that there would be a meeting in which I had to present what I'd done for GSoD.
Since the meeting would take place at 1 AM my local time, Kriti told me that I should send a video presentation instead.&lt;/p&gt;
&lt;h3 id="week-12"&gt;Week 12&lt;/h3&gt;&lt;p&gt;I submitted a video presentation to Kriti.
I finished writing the project report and evaluation for GSoD.
I published 2 blog posts this week.
One for updates on Week 11 and Week 12.
Another one is this blog post.&lt;/p&gt;
&lt;p&gt;&lt;br/&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;You can view the latest CC Catalog API documentation &lt;a href="https://api.creativecommons.engineering/v1/"&gt;here&lt;/a&gt;.&lt;/p&gt;
</content></entry><entry><title>Finish Video Presentation, Project Report and Evaluation Form</title><link href="http://opensource.creativecommons.org/blog/entries/finish-video-presentation-project-report-and-evaluation-form/" rel="alternate"></link><updated>2020-12-01T00:00:00Z</updated><author><name>['ariessa']</name></author><id>urn:uuid:3c10027c-ae79-392a-961f-ef9a2362be2a</id><content type="html">&lt;p&gt;For weeks 11 and 12, I finished porting the CC Catalog documentation, submitted a video presentation, and wrapped up my GSoD 2020 journey.&lt;/p&gt;
&lt;h3 id="week-11"&gt;Week 11&lt;/h3&gt;&lt;p&gt;For Week 11, I finished porting the CC Catalog documentation from the internal wiki to CC Catalog’s GitHub repository.
Kriti told me that there would be a meeting in which I had to present what I'd done for GSoD.
Since the meeting would take place at 1 AM my local time, Kriti told me that I should send a video presentation instead.&lt;/p&gt;
&lt;h3 id="week-12"&gt;Week 12&lt;/h3&gt;&lt;p&gt;For this week, I submitted a video presentation to Kriti.
I finished writing the project report and evaluation for GSoD.
I published two blog posts this week.
One covers the updates for Week 11 and Week 12.
The other is a summary of my GSoD 2020 journey, which also serves as a project report.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Signing off.&lt;/p&gt;
</content></entry><entry><title>Presenting CC Base docs - A WordPress Base Theme Usage Guide for the CC Base Theme</title><link href="http://opensource.creativecommons.org/blog/entries/cc-wp-base-theme-docs-launch/" rel="alternate"></link><updated>2020-11-27T00:00:00Z</updated><author><name>['JackieBinya']</name></author><id>urn:uuid:c27f3a10-a8a1-3cd9-8c66-89cb70d26f58</id><content type="html">&lt;p&gt;We are live 🎉&lt;/p&gt;
&lt;p&gt;The CC Base documentation is live, and it's available at this &lt;a href="https://cc-wp-theme-base.netlify.app/"&gt;link&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The docs were successfully migrated from Google Docs to the site! One of the most notable changes in the theme and consequently reflected in the documentation is the product name change. The CC WP Theme Base has been renamed to CC Base.&lt;/p&gt;
&lt;p&gt;As the old adage says, good documentation is never complete. We hope to engage the Creative Commons community and perform usability tests. Any feedback gathered from the usability tests will then be used to further improve the CC Base docs.&lt;/p&gt;
&lt;p&gt;In future iterations of the docs development we hope to include the following features:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Increase the quantity of illustrative media to make the docs more intuitive. This will include adding video tutorials on how to use certain features of the CC Base theme and adding illustrative tree diagrams to explain the hierarchy of key directories and files in the CC Base project structure.&lt;/li&gt;
&lt;li&gt;Integration of &lt;a href="https://www.algolia.com/"&gt;Algolia&lt;/a&gt;, a software tool used to power search functionality in statically generated sites.&lt;/li&gt;
&lt;li&gt;We also hope to improve SEO for the site.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All of the above-mentioned improvements are geared toward improving the overall user experience of the docs and ensuring faster onboarding for community members getting started with the CC Base theme.&lt;/p&gt;
&lt;p&gt;In conclusion, I would like to thank all members of the Creative Commons engineering team, with special mention to Hugo Solar and Kriti Godey. Thank you for your guidance and faith in my abilities as a technical writer and software developer.&lt;/p&gt;
</content></entry><entry><title>Vocabulary Site Updates (Part 3/n)</title><link href="http://opensource.creativecommons.org/blog/entries/cc-vocabulary-docs-updates-3/" rel="alternate"></link><updated>2020-11-25T00:00:00Z</updated><author><name>['nimishbongale']</name></author><id>urn:uuid:9b0d8e22-8041-3b9a-a0dc-66d44cdb924f</id><content type="html">&lt;p&gt;Excited to know more about this week's vocabulary site updates? Read on to find out!&lt;/p&gt;
&lt;h2 id="vocabulary-site-updates-edition-3/many-more-to-come"&gt;Vocabulary Site Updates (Edition 3/many more to come)&lt;/h2&gt;&lt;h3 id="what-i-ve-been-up-to"&gt;What I've been up to&lt;/h3&gt;&lt;center&gt;
&lt;img alt="Halfway There" src="merged.png"/&gt;&lt;br&gt;
&lt;small class="muted"&gt;The surreal feeling...&lt;/small&gt;
&lt;/center&gt;&lt;p&gt;Merged? Yes. &lt;strong&gt;Merged&lt;/strong&gt;. Here's my story!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;After getting a thumbs up from the UX Designer, I put up my &lt;a href="https://github.com/cc-archive/vocabulary-legacy/pull/747"&gt;GSoD Website PR&lt;/a&gt; for review.&lt;/li&gt;
&lt;li&gt;I was confident there would be changes, and I let them roll in. It's important to note here that what seems perfect to you may not be so to others, and only experience teaches you right from wrong.&lt;/li&gt;
&lt;li&gt;There were a few of them, mainly dealing with spacing, textual content and colors. I resolved them as soon as I could.&lt;/li&gt;
&lt;li&gt;&lt;a href="/blog/authors/zackkrida/"&gt;zackkrida&lt;/a&gt; was kind enough to point out and enumerate all of them for me!&lt;/li&gt;
&lt;li&gt;After receiving a final approval from the engineering team, my PR was finally merged!&lt;/li&gt;
&lt;li&gt;The final draft of the vocabulary site is live! It will soon be deployed (on &lt;a href="https://netlify.com"&gt;Netlify&lt;/a&gt;) and made available for public viewing.&lt;/li&gt;
&lt;li&gt;For my readers, here's an &lt;a href="https://cc-vocab-draft.web.app"&gt;exclusive preview&lt;/a&gt; of the final draft.&lt;/li&gt;
&lt;li&gt;I've tried making it as optimised as possible, but if you have any input whatsoever, feel free to raise issues over on our &lt;a href="https://github.com/creativecommons/vocabulary"&gt;GitHub repository&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The famed &lt;a href="https://developers.google.com/web/tools/lighthouse"&gt;Lighthouse report&lt;/a&gt; suggests that it's a pretty good start! I've also taken care of the &lt;a href="https://www.w3.org/standards/webdesign/accessibility"&gt;accessibility aspect&lt;/a&gt; wherever applicable.&lt;/li&gt;
&lt;/ul&gt;
&lt;center&gt;
&lt;img alt="Halfway There" src="light.png"/&gt;&lt;br&gt;
&lt;small class="muted"&gt;Aiming high!&lt;/small&gt;
&lt;/center&gt;&lt;h3 id="what-i-ve-learnt"&gt;What I've learnt&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;GSoD isn't just about documentation; there's some serious amount of coding too!&lt;/li&gt;
&lt;li&gt;You don't have to sit and write code for hours together. Take breaks, come back, and the fix will strike you sooner than ever.&lt;/li&gt;
&lt;li&gt;Timelines change; improvisation is an essential aspect of any project!&lt;/li&gt;
&lt;li&gt;&lt;a href="https://mdxjs.com/"&gt;MDX&lt;/a&gt; is a neat little format to code in! Documenting code is just so much easier.&lt;/li&gt;
&lt;li&gt;Things become obsolete. Versions become outdated. Maintaining code is, therefore, easier said than done!&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="other-community-work-tidbits"&gt;Other community work tidbits&lt;/h3&gt;&lt;p&gt;Being a part of an open source organisation also means that I must try to bring in contributions from existing &amp;amp; first time contributors. Here's a peek into my efforts for the same:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;a href="https://github.com/cc-archive/vocabulary-legacy/pull/806"&gt;dark mode PR&lt;/a&gt; started off as a hacktoberfest contribution, and it is now complete!&lt;/li&gt;
&lt;li&gt;Created a &lt;code&gt;/shared&lt;/code&gt; package to house common files between packages (such as the dark &amp;amp; light themes) after referring to the &lt;a href="https://reactjs.org/"&gt;React&lt;/a&gt; documentation.&lt;/li&gt;
&lt;li&gt;The automated npm &lt;a href="https://github.com/cc-archive/vocabulary-legacy/pull/746"&gt;README.md customisation&lt;/a&gt; is now up and running. (really had a blast solving that issue!)&lt;/li&gt;
&lt;li&gt;If the snapshot testing is approved, we'll have it running on Chromatic!&lt;/li&gt;
&lt;li&gt;Raised issues to add multiple badges to the root README.md file, namely &lt;code&gt;maintained with Lerna&lt;/code&gt; &amp;amp; custom badges for package sizes from &lt;a href="https://packagephobia.com/"&gt;packagephobia&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p align="center"&gt;
    &lt;strong&gt;Thank you for your time! Stay tuned for the season finale!&lt;/strong&gt;
&lt;/p&gt;</content></entry><entry><title>Finish GSoD Tasks and Explore CC Catalog Documentation</title><link href="http://opensource.creativecommons.org/blog/entries/finish-gsod-tasks-and-explore-cc-catalog-documentation/" rel="alternate"></link><updated>2020-11-20T00:00:00Z</updated><author><name>['ariessa']</name></author><id>urn:uuid:d5ea49f0-63f2-3466-93e0-7c513b4dc2d6</id><content type="html">&lt;p&gt;Today marks my fifth blog entry on Creative Commons.
For weeks 9 and 10, I explored the CC Catalog documentation and began improving it by removing keys and generalizing instructions.&lt;/p&gt;
&lt;h3 id="week-9"&gt;Week 9&lt;/h3&gt;&lt;p&gt;I had completed all GSoD tasks by week 9.
So, I took a couple of days off and fixed last week's PR.
Kriti assigned me a new task: porting the CC Catalog documentation from the internal wiki into the GitHub repository.
Brent, the CC Catalog maintainer, explained to me what needed to be done.&lt;/p&gt;
&lt;h3 id="week-10"&gt;Week 10&lt;/h3&gt;&lt;p&gt;For week 10, I started exploring CC Catalog and its documentation.
It reminded me a lot of the first and second weeks of GSoD:
trying to understand new stuff and having an "aha" moment when the dots finally connect.
I started to move the documentation from the internal wiki to CC Catalog’s GitHub repository.
I also wrote and published this blog post.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;End of blog entry.&lt;/p&gt;
</content></entry><entry><title>Content Creation Phase: WordPress Base Theme Usage Guide</title><link href="http://opensource.creativecommons.org/blog/entries/cc-wp-base-theme-docs-content-creation/" rel="alternate"></link><updated>2020-11-10T00:00:00Z</updated><author><name>['JackieBinya']</name></author><id>urn:uuid:bb3168cf-7243-320e-a8e2-0586030dc3a7</id><content type="html">&lt;p&gt;For the past couple of weeks we have been actively creating content for the Creative Commons WordPress Base Theme Usage Guide. Currently the draft content is under final review before it is migrated to the main docs site.&lt;/p&gt;
&lt;h2 id="our-strategy"&gt;Our Strategy&lt;/h2&gt;&lt;p&gt;Our main goal in creating the docs is to create rich, intuitive, engaging, and beautifully presented community facing documentation for the Creative Commons WordPress Base Theme.&lt;/p&gt;
&lt;p&gt;In alignment with the defined goal, our core focus is to create the docs collaboratively.&lt;/p&gt;
&lt;p&gt;The CC WordPress team consists of me (Jacqueline Binya), Hugo Solar and Timid Robot Zehta. Although our team is small, it is quite diverse. It includes a mix of technical skill levels: I am a junior developer, whereas Hugo and Timid are far more senior. We also have non-native and native English speakers.&lt;/p&gt;
&lt;p&gt;Diversity is important as we hope to create a high quality product that caters for everyone.&lt;/p&gt;
&lt;p&gt;My role as the tech writer/frontend developer is to create the content: write the documentation, build the docs site and also to create all illustrative media.&lt;/p&gt;
&lt;p&gt;During the content creation phase, the first step involved creating the skeleton of the actual docs site. We created a Git branch called &lt;em&gt;docs&lt;/em&gt; within the &lt;a href="https://github.com/creativecommons/wp-theme-base"&gt;creativecommons/wp-theme-base&lt;/a&gt; repository. All content related to the documentation is persisted in that branch. So, please feel free to contribute. We then used &lt;a href="https://gridsome.org/starters/jamdocs/"&gt;JamDocs&lt;/a&gt;, a &lt;a href="https://gridsome.org/"&gt;Gridsome&lt;/a&gt; theme, to quickly scaffold the site. We had to adapt the theme to meet our own specific needs; this involved overhauling the styles and changing the functionality of some of the features in the theme. After that was completed, we created a &lt;a href="https://docs.google.com/document/d/1yfAQGG70T8BUhZYWglAlQ_lTo4_tYpyjhPN5FsZnSvI/edit?usp=sharing"&gt;Google Doc&lt;/a&gt; that we use for collaboratively writing the draft content for the docs site.&lt;/p&gt;
&lt;h2 id="tech-stack"&gt;Tech Stack&lt;/h2&gt;&lt;p&gt;As mentioned, we used &lt;a href="https://gridsome.org/"&gt;Gridsome&lt;/a&gt;, a static site generator for &lt;a href="https://vuejs.org/"&gt;Vuejs&lt;/a&gt;. We chose Gridsome because:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;We wanted to lower the barrier of entry to contributing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The Gridsome/Vuejs community is very active; help is but a click away.&lt;/li&gt;
&lt;li&gt;The Gridsome official documentation is very resourceful and well maintained.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Gridsome is highly flexible: The content for the actual documentation is written in &lt;a href="https://www.markdownguide.org/getting-started/"&gt;Markdown&lt;/a&gt;, but using &lt;a href="https://gridsome.org/plugins/@gridsome/vue-remark"&gt;@gridsome/vue-remark&lt;/a&gt;, which is a Gridsome plugin, we are able to use JavaScript in Markdown. We intend to include a copy-to-clipboard Vuejs component in the site.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Time Constraint: This is a short-running project which has to be completed in a 3-month period. Through the use of JamDocs, a Gridsome templating theme, as well as various plugins, it was easy and fast to get started, and we were able to add more functionality to the theme with minimal effort.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ease of integrating &lt;a href="https://cc-vocabulary.netlify.app/"&gt;CC Vocabulary&lt;/a&gt; with Gridsome: it is a requirement that the general aesthetics of all front-facing Creative Commons applications be derived from the CC Vocabulary Design System. A major pro of using a design system is that it ensures uniformity in design across all front-facing CC products.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tools-used"&gt;Tools Used&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.figma.com/"&gt;Figma&lt;/a&gt;:&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src="/blog/entries/cc-wp-base-theme-docs-content-creation/image.png" alt="An example of illustrative media"&gt;&lt;/p&gt;
&lt;p&gt;Figma was used to make assets (banners, logos and illustrations) in the theme. The illustrative media was created with accessibility in mind and all the typography used in the illustrative assets was derived from the CC Vocabulary.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://linuxecke.volkoh.de/vokoscreen/vokoscreen.html"&gt;VokoScreenNG&lt;/a&gt;: an open source screencast recording tool used to record all the screencast demos available in the docs site.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://shotcut.org/"&gt;Shotcut&lt;/a&gt;: an open source video editing tool.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="what-comes-next"&gt;What comes next?&lt;/h2&gt;&lt;p&gt;After the final review is completed and all feedback is implemented, we will migrate all the content to the main docs site.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Stay tuned for an update about the launch of the CC WP Base Theme Docs site.&lt;/em&gt;&lt;/p&gt;
</content></entry><entry><title>Vocabulary Site Mid-Internship Update (v2)</title><link href="http://opensource.creativecommons.org/blog/entries/cc-vocabulary-docs-updates-2/" rel="alternate"></link><updated>2020-11-09T00:00:00Z</updated><author><name>['nimishbongale']</name></author><id>urn:uuid:9d7836e4-5dca-3b69-b77e-ba196ec923cc</id><content type="html">&lt;p&gt;This is a mid-internship blog post. Wait. what!? Already? Let's glance over my progress, shall we?&lt;/p&gt;
&lt;h2 id="vocabulary-site-updates-edition-2/many-more-to-come"&gt;Vocabulary Site Updates (Edition 2/many more to come)&lt;/h2&gt;&lt;p&gt;Oh boy! 1.5 months have passed since I've been investing time in building a landing site &amp;amp; usage guide for CC Vocabulary. A lot has changed since the time of posting my last blog post. &lt;strong&gt;A lot&lt;/strong&gt;.&lt;/p&gt;
&lt;center&gt;
&lt;img alt="Halfway There" src="speed.gif"/&gt;&lt;br&gt;
&lt;small class="muted"&gt;Hitting "the point of no return" has never been this exciting! Time to step on the throttle! Source: &lt;a href="https://cliply.co"&gt;Cliply&lt;/a&gt;&lt;/small&gt;
&lt;/center&gt;&lt;h3 id="what-i-ve-been-up-to"&gt;What I've been up to&lt;/h3&gt;&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;Designing&lt;/strong&gt; ⇨ &lt;strong&gt;Drafting&lt;/strong&gt; ⇨ &lt;strong&gt;Developing&lt;/strong&gt; ⇨ &lt;strong&gt;Debugging&lt;/strong&gt; ⇨ &lt;strong&gt;Deploying&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And the cycle continues. I guess it sums it all up very nicely. &lt;em&gt;Can somebody appreciate the alliteration though?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Here's a gist of what I've achieved so far:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I've gone through &lt;strong&gt;2&lt;/strong&gt; iterations of the design. I'm happy with how the new site looks (and I genuinely hope the design team does too!).&lt;/li&gt;
&lt;li&gt;I've drafted around &lt;strong&gt;5+&lt;/strong&gt; writeups dealing with Monorepo Migration, Getting Started guide, Vocabulary Overview and of course these blog posts.&lt;/li&gt;
&lt;li&gt;My branch on the vocabulary repository now has over &lt;strong&gt;50&lt;/strong&gt; commits &amp;amp; over &lt;strong&gt;13,000&lt;/strong&gt; lines of code (not that I've written all of them, but you know, just for the stats).&lt;/li&gt;
&lt;li&gt;The first draft of the vocabulary site is now live! I'm expecting a whole bunch of changes still, but here it is if you want to have a sneak peek: &lt;a href="https://cc-vocab-draft.surge.sh"&gt;https://cc-vocab-draft.surge.sh&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;I've consumed the &lt;a href="https://docs.github.com/en/free-pro-team@latest/rest"&gt;GitHub API&lt;/a&gt; to get the live release history and the fork and stargazer counts. I think it adds a really nice touch to the site in general.&lt;/li&gt;
&lt;li&gt;I've used &lt;a href="https://surge.sh"&gt;surge.sh&lt;/a&gt; to deploy the draft site. I believe it's a really simple tool to have your site deployed within seconds!&lt;/li&gt;
&lt;/ul&gt;
&lt;center&gt;
&lt;img alt="GitHub commit gif" src="github.png"/&gt;&lt;br&gt;
&lt;small class="muted"&gt;My GitHub contribution chart is filling up!&lt;/small&gt;
&lt;/center&gt;&lt;h3 id="what-i-ve-learnt"&gt;What I've learnt&lt;/h3&gt;&lt;p&gt;Some say it's hard to learn through virtual internships. Well, let me prove you wrong. Here are my learnings from the past few weeks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It's surprising how subjective (&amp;amp; yet objective) designing really is.&lt;/li&gt;
&lt;li&gt;Vue.js is &lt;em&gt;fantastic&lt;/em&gt;. Maybe I'm a Vue.js fan now. Should I remain loyal to React? I don't know.&lt;/li&gt;
&lt;li&gt;Making a site responsive isn't the &lt;em&gt;easiest&lt;/em&gt; of tasks, but it's certainly doable after a lot of stretching &amp;amp; compressing; let's say that.&lt;/li&gt;
&lt;li&gt;"Code formatting is essential" would be an &lt;em&gt;understatement&lt;/em&gt; to make.&lt;/li&gt;
&lt;li&gt;Monorepos have their own pros and cons. But in our case the cons were negligible, thankfully!&lt;/li&gt;
&lt;li&gt;I'll be following up with some performance &amp;amp; accessibility testing this coming week, so let's see how that plays out!&lt;/li&gt;
&lt;li&gt;A mentor plays a vital role in any project. My mentor &lt;code&gt;@dhruvkb&lt;/code&gt; has been very supportive and has made sure I stick to my timeline!&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="other-community-work-tidbits"&gt;Other community work tidbits&lt;/h3&gt;&lt;p&gt;I believe apart from the internship work that I'm engaged in, I should also help around with some community PR work. I've been told I'm always welcome to, which is great!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I got the opportunity to speak at a CCOS event along with fellow speakers &lt;a href="/blog/authors/dhruvkb/"&gt;dhruvkb&lt;/a&gt; &amp;amp; &lt;a href="/blog/authors/dhruvi16/"&gt;dhruvi16&lt;/a&gt;. I had a blast talking to budding students from DSC-IIT Surat &amp;amp; DSC-RIT.&lt;/li&gt;
&lt;li&gt;The dark mode (as promised) should be out before my next blog post.&lt;/li&gt;
&lt;li&gt;Deployed the vocabulary storybook on &lt;a href="https://chromatic.com"&gt;Chromatic&lt;/a&gt; and compared &amp;amp; contrasted the pros &amp;amp; cons. Snapshot testing in the near future maybe?&lt;/li&gt;
&lt;li&gt;Completed the Hacktoberfest challenge.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="bonus-content"&gt;Bonus content&lt;/h3&gt;&lt;p&gt;Not many of you may know this, but this site uses the &lt;a href="https://getlektor.com"&gt;Lektor&lt;/a&gt; CMS. I needed to have it installed on my system (Windows 10) to run the code in our site repository.
Lektor suggests running the following code in PowerShell as an installation step:&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;new-object&lt;/span&gt; &lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;webclient&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;DownloadString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;https://www.getlektor.com/installer.py&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;python&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I just didn't think this was a very elegant way. Being an ardent &lt;a href="https://chocolatey.org"&gt;chocolatey.org&lt;/a&gt; fan, I just had to have it up on there! Now the installation step for Lektor is simply:&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;choco&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;lektor&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;in Windows PowerShell!&lt;/p&gt;
&lt;p&gt;Have a look at the package here:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://chocolatey.org/packages/lektor"&gt;https://chocolatey.org/packages/lektor&lt;/a&gt;&lt;/p&gt;
&lt;p align="center"&gt;
    &lt;strong&gt;Thank you for your time! Stay tuned for the next Vocabulary site update!&lt;/strong&gt;
&lt;/p&gt;</content></entry><entry><title>Restructure README and Add Documentation Guidelines</title><link href="http://opensource.creativecommons.org/blog/entries/restructure-readme-and-add-documentation-guidelines/" rel="alternate"></link><updated>2020-11-05T00:00:00Z</updated><author><name>['ariessa']</name></author><id>urn:uuid:98db0051-b46f-3e4b-ae18-a56418565d50</id><content type="html">&lt;p&gt;This is my fourth blog entry on Creative Commons.
For weeks 7 and 8, I restructured the README file to be more digestible for new users and created Documentation Guidelines for the CC Catalog API documentation.&lt;/p&gt;
&lt;h3 id="week-7"&gt;Week 7&lt;/h3&gt;&lt;p&gt;For this week, I restructured the README file in the CC Catalog API repository.
I added a step-by-step guide on how to run the server locally.
I hope new users will be less intimidated to contribute to this project with the updated guide on how to run the server locally.&lt;/p&gt;
&lt;h3 id="week-8"&gt;Week 8&lt;/h3&gt;&lt;p&gt;For week 8, I created Documentation Guidelines, which provide steps on how to contribute to the CC Catalog API documentation, documentation styles, and a cheat sheet for drf-yasg.
I also wrote and published this blog post.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Finis.&lt;/p&gt;
</content></entry><entry><title>Vocabulary Site Updates (v1)</title><link href="http://opensource.creativecommons.org/blog/entries/cc-vocabulary-docs-updates-1/" rel="alternate"></link><updated>2020-10-26T00:00:00Z</updated><author><name>['nimishbongale']</name></author><id>urn:uuid:51a44307-9725-3d59-bfcd-9badc7cce229</id><content type="html">&lt;p&gt;Hello there! Well well well. It has been an eventful first few weeks, to say the least! Let's gauge my progress, shall we?&lt;/p&gt;
&lt;h2 id="vocabulary-site-updates-edition-1/many-more-to-come"&gt;Vocabulary Site Updates (Edition 1/many more to come)&lt;/h2&gt;&lt;h3 id="what-i-ve-been-upto"&gt;What I've been up to&lt;/h3&gt;&lt;p&gt;I've mainly invested my time in surveying the existing documentation that Vocabulary currently has and finding places where it could be made better. After clearing those issues out, I began building the main landing site for &lt;code&gt;Vocabulary&lt;/code&gt;, &lt;code&gt;Vue-vocabulary&lt;/code&gt; and &lt;code&gt;Fonts&lt;/code&gt;. It wasn't particularly difficult to establish the necessary workflows as I had done something similar before. While designing the basic structure of the site, I came across a few instances where I felt we needed new/improved components &amp;amp; I discussed them with my team on the sprint calls. The design of the site is nearly done. I'm also building the site in parallel &amp;amp; seeking approval from the CC Design Team. I've gotten involved in multiple other community contributions to CC as well, across several of our repositories.&lt;/p&gt;
&lt;h3 id="what-i-ve-learnt"&gt;What I've learnt&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Knowing the existing code in your project is essential. It's important to understand the code styles, structure &amp;amp; activity of the code that you are dealing with.&lt;/li&gt;
&lt;li&gt;Be patient! It's fine to delay something if it makes sense to have it logically accomplished only after certain other tasks are done &amp;amp; dusted.&lt;/li&gt;
&lt;li&gt;How essential it is to write &lt;em&gt;neat code&lt;/em&gt; is something that's not spoken about often enough. (I wonder why...)&lt;/li&gt;
&lt;li&gt;I always thought VueJS sets up SPAs by default. I'm surprised you need to configure it additionally to do just that!&lt;/li&gt;
&lt;li&gt;Storybook is just a really nifty OSS with great community support!&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="other-community-work-tidbits"&gt;Other community work tidbits&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;I've been working on the Dark Mode (a much awaited feature, at least for me!) for our storybooks with some support from our community. It should be up and running shortly!&lt;/li&gt;
&lt;li&gt;Fixed some formatting bugs in the &lt;code&gt;README.md&lt;/code&gt; &amp;amp; suggested changes with respect to &lt;code&gt;npm v7&lt;/code&gt; considerations.&lt;/li&gt;
&lt;li&gt;Fixed storybook components docs for 2 features.&lt;/li&gt;
&lt;li&gt;Raised a ticket for a component to render markdown text within vocabulary itself.&lt;/li&gt;
&lt;li&gt;Raised a few other issues for potential Hacktoberfest contributions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p align="center"&gt;
    &lt;strong&gt;Thank you for your time! To be continued...&lt;/strong&gt;
&lt;/p&gt;</content></entry><entry><title>Add New Sections, Descriptions, Help Texts, Code Examples, Schemas, and Serializers</title><link href="http://opensource.creativecommons.org/blog/entries/add-new-sections-descriptions-help-texts-code-examples-schemas-and-serializers/" rel="alternate"></link><updated>2020-10-21T00:00:00Z</updated><author><name>['ariessa']</name></author><id>urn:uuid:b832956d-0c04-37e5-bde6-eb53722a4ba3</id><content type="html">&lt;p&gt;Welcome to my third blog entry! For week 5 and 6, I added new sections, descriptions, help texts, code examples, schemas, and serializers. I was so productive these past two weeks.&lt;/p&gt;
&lt;h3 id="week-5"&gt;Week 5&lt;/h3&gt;&lt;p&gt;For this week, I managed to add a lot of stuff into the documentation.
I figured out how to add help texts to classes and how to create serializers.
I also managed to move all code examples under response samples.
In order to do this, I created a new class called CustomAutoSchema to add &lt;a href="https://github.com/Redocly/redoc/blob/master/docs/redoc-vendor-extensions.md#x-codesamples"&gt;x-code-samples&lt;/a&gt;.
Other things I did include creating new sections such as “Register and Authenticate” and “Glossary”.
The hardest part of this week was probably trying to figure out how to add request body examples and move code examples.&lt;/p&gt;
&lt;h3 id="week-6"&gt;Week 6&lt;/h3&gt;&lt;p&gt;For week 6, I added another section called Contribute that provides a to-do list for getting started with contributing on GitHub.
I also wrote and published this blog post.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;All caught up!&lt;/p&gt;
</content></entry><entry><title>Add Response Samples and Descriptions for API Endpoints</title><link href="http://opensource.creativecommons.org/blog/entries/add-response-samples-and-descriptions-for-api-endpoints/" rel="alternate"></link><updated>2020-10-09T00:00:00Z</updated><author><name>['ariessa']</name></author><id>urn:uuid:23d6eb6e-e157-34fb-af99-ddfcad7c5044</id><content type="html">&lt;p&gt;Well, hello again 👋! For week 3 and week 4, I added response samples and descriptions for API endpoints. Writing documentation feels a bit like coding at this point because I need to read a lot about drf-yasg and dig through issues and questions on GitHub / Stack Overflow to ensure that I don’t ask redundant (or even stupid) questions.&lt;/p&gt;
&lt;h3 id="week-3"&gt;Week 3&lt;/h3&gt;&lt;p&gt;Week 3 was quite hectic. I moved back to my hometown during week 3.
I took 3 days off to settle my stuff and set up a workspace.
I worked on my GSoD project for only 2 days, Monday and Tuesday.
I managed to create response samples for most API endpoints.
I had a monthly video call with Kriti this week.&lt;/p&gt;
&lt;h3 id="week-4"&gt;Week 4&lt;/h3&gt;&lt;p&gt;For this week, I reviewed what I’ve done and what I haven’t to estimate a new completion time.
Thank god, I have a buffer week in my GSoD timeline and deliverables.
So yeah, all is good in terms of completion time.
I started to write descriptions for API endpoints.
I submitted my first PR and published a blog entry.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Over and out.&lt;/p&gt;
</content></entry><entry><title>Vocabulary Site &amp; Usage Guide Introduction (GSoD'20)</title><link href="http://opensource.creativecommons.org/blog/entries/cc-vocabulary-docs-intro/" rel="alternate"></link><updated>2020-10-02T00:00:00Z</updated><author><name>['nimishbongale']</name></author><id>urn:uuid:4974b78b-a4f9-3bbc-8875-d595042b954c</id><content type="html">&lt;p&gt;Hey there! I'm Nimish Bongale, a Technical Writer &amp;amp; Software Developer based out of Bangalore, India. My other hobbies include playing chess and the guitar. I look forward to building the CC Vocabulary site and usage guides as a part of GSoD'20.&lt;/p&gt;
&lt;h2 id="but-what-is-gsod"&gt;But what is GSoD?&lt;/h2&gt;&lt;p&gt;GSoD, or Google Season of Docs, is a program that stresses the importance of the documentation aspect of Open Source projects. It invites technical writers from across the world to submit proposals based on projects floated by the participating Open Source Organisations. The selected technical writers then work with their respective organisations and look to complete their work by the end of their internship period. More information about the same can be found &lt;a href="https://developers.google.com/season-of-docs"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Let's talk a bit about my project, shall we?&lt;/p&gt;
&lt;h2 id="vocabulary-site-usage-guide"&gt;Vocabulary Site &amp;amp; Usage Guide&lt;/h2&gt;&lt;h3 id="introduction"&gt;Introduction&lt;/h3&gt;&lt;p&gt;&lt;a href="https://github.com/creativecommons/vocabulary"&gt;CC Vocabulary&lt;/a&gt; is a cohesive design system &amp;amp; Vue component library to unify the web-facing Creative Commons. It's currently comprised of 3 packages, namely Vocabulary, Vue-Vocabulary &amp;amp; Fonts. My contribution to this project will mainly involve building the landing site for CC Vocabulary and refactoring the documentation wherever necessary.&lt;/p&gt;
&lt;h3 id="what-drives-me"&gt;What drives me&lt;/h3&gt;&lt;p&gt;Documentation is one of the primary factors that determine how successful a certain open source library will be. The major questions that developers think of while choosing a suitable tech stack to build their applications are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Is the library &lt;em&gt;well documented&lt;/em&gt;?&lt;/li&gt;
&lt;li&gt;Is it &lt;em&gt;well maintained&lt;/em&gt;?&lt;/li&gt;
&lt;li&gt;Does it have some &lt;em&gt;considerable usage and error support&lt;/em&gt;?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are exactly the questions I should be asking myself while going about this project idea.&lt;/p&gt;
&lt;p&gt;There is a pressing need for concise and consolidated documentation. A lack of documentation hurts the future prospects of open source applications; documentation is an essential, non-negligible component. Linking to the documentation should be an appealing home page that captures people's interest in an instant. The documentation should be well organised, thereby enabling a seamless flow through it.&lt;/p&gt;
&lt;h3 id="tech-stack-of-the-project"&gt;Tech stack of the project&lt;/h3&gt;&lt;p&gt;We have decided to move forward with &lt;a href="https://vuejs.org/"&gt;Vuejs&lt;/a&gt; for building the site, and continue work on the existing &lt;a href="https://storybook.js.org/"&gt;storybooks&lt;/a&gt; of Vocabulary, Vue-Vocabulary and Fonts. Storybookjs has had some great improvements in recent times, and the new addons that are offered will greatly support my work. Besides these, I will also be using &lt;a href="https://stackedit.io/"&gt;StackEdit&lt;/a&gt; to write and share Markdown files of my writings.&lt;/p&gt;
&lt;h3 id="progress-baby-steps"&gt;Progress - Baby Steps&lt;/h3&gt;&lt;p&gt;I have contributed to CC in the past. It would now be my first time contributing to a specific project within CC, while being a member of CC Open Source. Some tasks that I've been able to initiate/accomplish so far:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Look at Open Source documentation conventions, and see if we violate any.&lt;/li&gt;
&lt;li&gt;Understand the level of existing documentation currently present in our storybooks.&lt;/li&gt;
&lt;li&gt;Discuss the Monorepo migration and help out with the implementation.&lt;/li&gt;
&lt;li&gt;Migrate &lt;code&gt;storybookjs&lt;/code&gt; to the latest version.&lt;/li&gt;
&lt;li&gt;Implement &lt;code&gt;addon-controls&lt;/code&gt; for vocabulary.&lt;/li&gt;
&lt;li&gt;Design the vocabulary site.&lt;/li&gt;
&lt;li&gt;Promote the involvement of CC Open Source in &lt;a href="https://hacktoberfest.digitalocean.com/"&gt;Hacktoberfest&lt;/a&gt; 2020.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="what-did-i-learn"&gt;What did I learn?&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Design is more than just picking colors and placing components on a grey screen.&lt;/li&gt;
&lt;li&gt;It's important to read your own writing from an unbiased perspective to understand how well it will be received.&lt;/li&gt;
&lt;li&gt;Interacting with your mentor on a regular basis is essential.&lt;/li&gt;
&lt;li&gt;Publishing to &lt;a href="https://www.npmjs.com/"&gt;npmjs&lt;/a&gt; is not difficult!&lt;/li&gt;
&lt;/ul&gt;
&lt;p align="center"&gt;
    &lt;strong&gt;Thank you for your time!&lt;/strong&gt;
&lt;/p&gt;</content></entry><entry><title>Creative Commons WordPress plugin: attribution for images</title><link href="http://opensource.creativecommons.org/blog/entries/cc-wp-plugin-attribution-for-images/" rel="alternate"></link><updated>2020-10-01T00:00:00Z</updated><author><name>['rczajka']</name></author><id>urn:uuid:818016d3-7344-3937-9fd6-7e5ffad98071</id><content type="html">&lt;p&gt;As a part of &lt;a href="https://centrumcyfrowe.pl"&gt;Centrum Cyfrowe&lt;/a&gt;'s &lt;a href="https://otwartakultura.org/noworries/"&gt;#NoWorries project&lt;/a&gt; funded by EUIPO,
I have had the pleasure of enhancing the Creative Commons WordPress plugin.
The new version of CC's WordPress plugin has a feature called
“attribution information for images”. It works like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;You upload an image to the WordPress Media Library and fill out the
correct attribution information there.&lt;/li&gt;
&lt;li&gt;You then insert the image into a page using the Image Gutenberg block.&lt;/li&gt;
&lt;li&gt;When the image is then displayed on the site, the plugin will show the
attribution information – the name of the author, the image's title
and a link to the source, and the CC license used – right there, in a nice
semi-transparent overlay over the image.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="how-does-it-work"&gt;How does it work?&lt;/h2&gt;&lt;p&gt;To find the relevant information from the Media Library, the plugin
reuses the information already provided by Gutenberg Image Blocks.
Each time an image is inserted using such a block, WordPress adds a
special CSS class to it, in the form of &lt;code&gt;wp-image-{id}&lt;/code&gt;, containing
the image's identifier in the Media Library. It can be used to add
individual styles to a specific image – we're using it to find the
relevant entry in the Media Library and add individual attribution
information. With this approach, we avoid the need for any custom
markup – while also only hitting the database with a query when an
actual image from the Media Library is found on the page.&lt;/p&gt;
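&lt;p&gt;As a rough illustration of the idea (the plugin itself is written in PHP, and this is not its code), the Media Library identifier can be recovered from the value of an image's &lt;code&gt;class&lt;/code&gt; attribute. The function name &lt;code&gt;attachment_ids&lt;/code&gt; and the sample class strings below are assumptions made only for this sketch.&lt;/p&gt;

```python
import re

# Matches a single class token such as "wp-image-42" and captures the numeric ID.
WP_IMAGE_CLASS = re.compile(r"^wp-image-(\d+)$")


def attachment_ids(class_attribute):
    """Return the Media Library IDs found in one class attribute value."""
    ids = []
    for token in class_attribute.split():
        match = WP_IMAGE_CLASS.match(token)
        if match:
            ids.append(int(match.group(1)))
    return ids


# An image inserted through the Image block carries the class.
print(attachment_ids("alignleft wp-image-42 size-large"))
# An image without the class yields nothing, so no lookup is needed.
print(attachment_ids("aligncenter size-full"))
```

&lt;p&gt;An empty result corresponds to the case described above: no image from the Media Library was found, so the database is never queried.&lt;/p&gt;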
&lt;p&gt;All you need to do is make sure the licensing information is there in
the Media Library, and that the images are inserted using the Image
block.&lt;/p&gt;
&lt;p&gt;This wasn't the first attempt at adding a similar function to the CC
WordPress plugin. The previous attempt used a &lt;code&gt;[license]&lt;/code&gt; shortcode
wrapping the image – which is unwieldy with the current WordPress
Gutenberg editor. It also used multiple calls to
&lt;code&gt;attachment_url_to_postid&lt;/code&gt; to locate the image in the Media Library, which
meant executing more database queries for each image. With the new
approach, the user doesn't have to change their posts at all – all
they need to do is install the plugin and add attribution information
in the Media Library, and it will automatically start working for
their normally inserted images.&lt;/p&gt;
&lt;p&gt;See here how to install the plugin:&lt;/p&gt;
&lt;video src="install.mp4" controls&gt;&lt;/video&gt;&lt;p&gt;See here how to use the image attribution function:&lt;/p&gt;
&lt;video src="use.mp4" controls&gt;&lt;/video&gt;</content></entry><entry><title>WordPress Base Theme Usage Guide (GSOD-2020): Hello World!</title><link href="http://opensource.creativecommons.org/blog/entries/cc-wp-base-theme-docs-intro/" rel="alternate"></link><updated>2020-09-30T00:00:00Z</updated><author><name>['JackieBinya']</name></author><id>urn:uuid:8ca33f24-33d1-3bdb-a55d-5fb6381316f5</id><content type="html">&lt;p&gt;My name is Jacqueline Binya. I am a software developer and technical writer from Zimbabwe. I am going to write a series of blog posts documenting my experience and lessons as I contribute to the &lt;a href="https://github.com/creativecommons/wp-theme-base"&gt;Creative Commons WordPress Base Theme (CC WP Base Theme)&lt;/a&gt; during the &lt;a href="https://developers.google.com/season-of-docs"&gt;Google Season of Docs (GSOD-2020)&lt;/a&gt; as a technical writer.&lt;/p&gt;
&lt;h2 id="what-is-google-season-of-docs"&gt;What is Google Season of Docs?&lt;/h2&gt;&lt;p&gt;Google Season of Docs was born out of a need to improve the quality of open-source documentation as well as to advocate for open source, for documentation, and for technical writing. Annually during GSOD, technical writers are invited to contribute to open-source projects through a highly intensive process geared at ensuring that the technical writers and the projects they contribute to are a good fit. Once that has been determined, the GSOD work proceeds.&lt;/p&gt;
&lt;h2 id="building-the-docs"&gt;Building the docs&lt;/h2&gt;&lt;p&gt;The CC WP Base theme is a WordPress theme used to create front-facing Creative Commons (CC) websites. My task is to collaborate with the engineering team to create community-facing docs for the theme.&lt;/p&gt;
&lt;h3 id="guiding-principles"&gt;Guiding principles&lt;/h3&gt;&lt;p&gt;The docs should be inclusive, meaning that they should be written in an easy-to-understand manner that avoids excessive technical jargon, they should be accessible, and they should support internationalization. We hope to provide our users with a smooth and memorable experience whilst using the docs, hence the docs site should be fast and easy to navigate.&lt;/p&gt;
&lt;h3 id="technical-stack-of-the-project"&gt;Technical stack of the project&lt;/h3&gt;&lt;p&gt;We decided to build the docs using &lt;a href="https://jamstack.org/"&gt;Jamstack&lt;/a&gt;. Specifically, we are using &lt;a href="https://gridsome.org/"&gt;Gridsome&lt;/a&gt;, a static site generator for &lt;a href="https://vuejs.org/"&gt;Vuejs&lt;/a&gt;. We are using Gridsome as it is highly performant, and it also integrates smoothly with the &lt;a href="https://cc-vocabulary.netlify.app/"&gt;CC Vocabulary&lt;/a&gt;. Gridsome also has out-of-the-box support for important features like Google Analytics and &lt;a href="https://www.algolia.com/"&gt;Algolia&lt;/a&gt;; these features will be useful in future iterations of the docs. To quickly scaffold the docs we used a Gridsome theme called &lt;a href="https://gridsome.org/starters/jamdocs/"&gt;JamDocs&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="progress"&gt;Progress&lt;/h3&gt;&lt;p&gt;Currently, the project is on track. As stated, we are creating the docs collaboratively. The very first step in our workflow is to create draft content using Google Docs. That task is assigned to me; it involves doing lots of research, reading, and testing out the theme. Afterwards, my mentors Hugo Solar and Timid Robot Zehta give me feedback on the draft. Then I implement the feedback and continuously work on improvements. The final step is migrating the approved draft content to the docs project in Markdown format.&lt;/p&gt;
&lt;h3 id="my-lessons-so-far"&gt;My lessons so far:&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Always ask questions: frankly, the only way you can create good content is when you have a solid understanding of the subject matter.&lt;/li&gt;
&lt;li&gt;It's better to over-communicate than under-communicate, especially when working remotely. This is even more important if you encounter blockers whilst executing your work.&lt;/li&gt;
&lt;li&gt;Push that code, open a PR quickly, and then go ahead and ask for a review. Don't procrastinate: this ensures a fast turnaround, so you get feedback quickly and can work on improvements.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;Thank you for reading. Watch out for the next update, which will be posted soon.&lt;/em&gt;&lt;/p&gt;
</content></entry><entry><title>Add Query Using curl Command and Provide Response Samples</title><link href="http://opensource.creativecommons.org/blog/entries/add-query-using-curl-command-and-provide-response-samples/" rel="alternate"></link><updated>2020-09-25T00:00:00Z</updated><author><name>['ariessa']</name></author><id>urn:uuid:2841326a-a1a6-3593-8236-10fc75b59aa0</id><content type="html">&lt;p&gt;First of all, I’m very thankful to have been selected as a Google Season of Docs participant under Creative Commons. My project is called Improve CC Catalog API Usage Guide. The project aims to revamp the existing CC Catalog API documentation to include more narrative elements and increase user friendliness. As the focal point of this project will potentially be delivered before the end of the GSOD period, this project will also improve the CC Catalog API repo documentation for potential contributors. This project will also produce guidelines for contributing to documentation. For this project, my mentor is Alden Page.&lt;/p&gt;
&lt;h3 id="week-1"&gt;Week 1&lt;/h3&gt;&lt;p&gt;So, the first two weeks of Google Season of Docs have passed. In the first week, I added examples of performing queries using the curl command. I ran into a Forbidden error; it turned out that my access key had expired. The problem was solved after I obtained a new access key.&lt;/p&gt;
&lt;h3 id="week-2"&gt;Week 2&lt;/h3&gt;&lt;p&gt;In the second week, I started to write response samples. It was tough, as I had a hard time understanding &lt;a href="https://github.com/axnsan12/drf-yasg"&gt;drf-yasg&lt;/a&gt;, which is an automatic Swagger generator. It can produce Swagger / OpenAPI 2.0 specifications from a Django Rest Framework API. I tried to find as many examples as I could to increase my understanding. Funnily enough, it took me a while to realise that drf-yasg is not made up of random letters. The DRF part stands for Django Rest Framework while YASG stands for Yet Another Swagger Generator.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;That’s all!&lt;/p&gt;
</content></entry><entry><title>The specifics - Revamping CCOS</title><link href="http://opensource.creativecommons.org/blog/entries/the-specifics-revamping-CCOS/" rel="alternate"></link><updated>2020-09-02T00:00:00Z</updated><author><name>['dhruvi16']</name></author><id>urn:uuid:013d291c-a7ee-3a05-8aef-87f357322155</id><content type="html">&lt;p&gt;In this blog post, I will be talking about how I managed to use Vocabulary (Creative Commons's design library) efficiently in our Open Source website.&lt;/p&gt;
&lt;h3 id="what-is-vocabulary"&gt;What is Vocabulary?&lt;/h3&gt;&lt;p&gt;&lt;a href="https://cc-vocabulary.netlify.app/?path=/story/vocabulary-introduction--page"&gt;Vocabulary&lt;/a&gt; is a cohesive design system to unite the web-facing Creative Commons. In essence, Vocabulary is a component library that uses and extends the Bulma CSS library. Vocabulary makes it easier to develop Creative Commons apps while ensuring a consistently familiar experience. This project is still under development.&lt;/p&gt;
&lt;h3 id="why-vocabulary"&gt;Why Vocabulary?&lt;/h3&gt;&lt;p&gt;Vocabulary is used to describe the overall visual design of our digital products. At first glance, it appears to be an amalgamation of component designs with a consistent visual aesthetic and brand, typically accompanied by usage guidelines in the form of online documentation. But there is a lot more to it.
A large software community with a huge range of products faces certain problems. One of them is maintaining harmony across all the products of the network. This creates a need for a unified visual language that heightens the level of harmony in a digital ecosystem. In our case, Vocabulary solves this problem.
This design system is well built and helps us bring the following aspects to the table:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Recognizability&lt;/li&gt;
&lt;li&gt;Consistency&lt;/li&gt;
&lt;li&gt;Authenticity&lt;/li&gt;
&lt;li&gt;Efficiency&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;And many more.&lt;/p&gt;
&lt;h3 id="how-did-i-use-it-examples"&gt;How did I use it? — Examples&lt;/h3&gt;&lt;p&gt;As I stated before, I have added Vocabulary by updating all the Templates in the CCOS &lt;a href="https://www.getlektor.com/"&gt;Lektor&lt;/a&gt; project.&lt;/p&gt;
&lt;p&gt;As far as components are concerned, I just had to paste the code snippets given on Vocabulary’s website with the required changes:&lt;/p&gt;
&lt;h4 id="integration-of-breadcrumb"&gt;Integration of Breadcrumb -&lt;/h4&gt;&lt;figure style="text-align: center;"&gt;
    &lt;img src="breadcrumb.png" alt="Breadcrumb"&gt;
    &lt;figcaption&gt;Screenshot — &lt;a href="https://cc-vocabulary.netlify.app/?path=/docs/navigation-breadcrumb--default-story"&gt;Breadcrumb&lt;/a&gt; (Vocabulary)&lt;/figcaption&gt;
&lt;/figure&gt;&lt;p&gt;The code for integration —&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;!-- Breadcrumb --&amp;gt;
  {% if this._path !=  '/'%}
    &amp;lt;div class="breadcrumb-container"&amp;gt;
      &amp;lt;nav class="container breadcrumb caption bold" aria-label="breadcrumbs"&amp;gt;
        &amp;lt;ul&amp;gt;
          {% set crumbs = [] %}
          {% set current = {'crumb': this} %}
          &amp;lt;!-- Extracting the slugs of URL --&amp;gt;
          {% for i in this._path.split("/") %}
            {% if current.crumb is not none %}
              {% if crumbs.insert(0, current.crumb._slug) %}{% endif %}
              {% if current.update({"crumb": current.crumb.parent}) %}{% endif %}
            {% endif %}
          {% endfor %}
          {% for crumb in crumbs %}
            &amp;lt;!-- Active link --&amp;gt;
            {% if this._slug == crumb %}
              &amp;lt;li class="is-active"&amp;gt;&amp;lt;a aria-current="page"&amp;gt;{{ crumb | title | replace('-', ' ') }}&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;
            {% else %}
              &amp;lt;!-- Forming the URL using extracted slugs --&amp;gt;
              {% set i = loop.index %}
              {% set ns = namespace (link = '') %}
              {% for j in range(i) %}
                {% set ns.link = ns.link + crumbs[j] + '/' %}
              {% endfor %}
              &amp;lt;li&amp;gt;&amp;lt;a class="link" href="{{ ns.link|url }}"&amp;gt;
              {% if crumb != '' %}
                {{ crumb | title | replace('-', ' ') }}
              {% else %}
                Home
              {% endif %}
              &amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;
            {% endif %}
          {% endfor %}
        &amp;lt;/ul&amp;gt;
      &amp;lt;/nav&amp;gt;
    &amp;lt;/div&amp;gt;
  {% endif %}
&lt;/code&gt;&lt;/pre&gt;
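&lt;p&gt;To make the link-building loop in the template easier to follow, here is a minimal plain-Python sketch of the same logic. It is not part of the CCOS code: the function name &lt;code&gt;breadcrumb_links&lt;/code&gt; and the sample slugs are assumptions, and Python's &lt;code&gt;str.title()&lt;/code&gt; only approximates the Jinja &lt;code&gt;title&lt;/code&gt; filter for simple slugs. Unlike the template, it computes a link for every crumb, including the active one.&lt;/p&gt;

```python
def breadcrumb_links(crumbs):
    """Pair each crumb label with the URL formed from all slugs up to it."""
    links = []
    for position, crumb in enumerate(crumbs, start=1):
        # Same idea as the inner range(i) loop: join the first `position` slugs.
        link = "".join(slug + "/" for slug in crumbs[:position])
        # The root page has an empty slug and is shown as "Home".
        label = crumb.replace("-", " ").title() if crumb else "Home"
        links.append((label, link))
    return links


# Hypothetical slugs, ordered from the root page down to the current page.
for label, link in breadcrumb_links(["", "blog", "the-specifics"]):
    print(label, link)
```

&lt;p&gt;Because the root slug is empty, the first link is simply &lt;code&gt;/&lt;/code&gt; and every later link starts with a slash.&lt;/p&gt;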
&lt;p&gt;Other than the components, there are other visual elements like typography, colors, spacing, and others that are extensively used in CCOS.&lt;/p&gt;
&lt;p&gt;This is code for the Hero section of the home page.&lt;/p&gt;
&lt;h5 id="the-block-template"&gt;The block template -&lt;/h5&gt;&lt;pre&gt;&lt;code&gt;&amp;lt;section class="hero"&amp;gt;
  &amp;lt;div class="container"&amp;gt;
    &amp;lt;div class="hero-title column is-12 is-paddingless"&amp;gt;
      &amp;lt;h1&amp;gt;
        {{ this.title }}
      &amp;lt;/h1&amp;gt;
    &amp;lt;/div&amp;gt;
    &amp;lt;div class="columns"&amp;gt;
      &amp;lt;div class="column is-5"&amp;gt;
        &amp;lt;p class="hero-description"&amp;gt;
          {{ this.description }}
        &amp;lt;/p&amp;gt;
        {{ this.links }}
      &amp;lt;/div&amp;gt;
    &amp;lt;/div&amp;gt;
  &amp;lt;/div&amp;gt;
  &amp;lt;div class="level-right hero-image"&amp;gt;
    &amp;lt;img class="image" src="./github.svg" /&amp;gt;
  &amp;lt;/div&amp;gt;
&amp;lt;/section&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;h5 id="the-block-styling"&gt;The block styling -&lt;/h5&gt;&lt;pre&gt;&lt;code&gt;// Hero section - Home page
.hero {
  @extend .margin-top-large;

  .hero-title {
    @extend .padding-horizontal-big;
  }

  .hero-description {
    @extend .body-bigger;
    @extend .padding-top-big;
    @extend .padding-horizontal-big;
  }

  .hero-links {
    @extend .margin-vertical-normal;
    @extend .padding-horizontal-big;

    .button {
      @extend .margin-top-normal;
      text-decoration: none;

      .icon {
        @extend .margin-right-small;
        @extend .padding-vertical-smaller;
      }
    }
  }

  .hero-image {
    @include from($fullhd) {
      margin-top: -20rem;
      .image {
        width: 50%;
      }
    }
    @include until($fullhd) {
      .image {
        width: 100%;
      }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;figure&gt;
    &lt;img src="output.png" alt="Output"&gt;
    &lt;figcaption&gt;Output&lt;/figcaption&gt;
&lt;/figure&gt;&lt;h3 id="improvements-in-the-lektor-project"&gt;Improvements in the Lektor project -&lt;/h3&gt;&lt;p&gt;I tried to write code that is cleaner and more readable. I will demonstrate that effort using the home page code, where I used &lt;a href="https://www.getlektor.com/docs/models/flow/"&gt;Lektor Flowblocks&lt;/a&gt;. The new homepage design has four sections, each of which communicates something different. I realized that they were all independent, and that building the whole page through one single template would become messy and hard to handle. So I did some research and looked for a way to build sub-templates and combine them into a single page, and Lektor’s flowblocks allowed me to do so. Here is one of the flowblocks; if you want to check out how the whole thing works, you can go to the &lt;a href="https://github.com/creativecommons/creativecommons.github.io-source"&gt;CCOS Repository&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="recent-blog-post-block"&gt;Recent Blog Post block -&lt;/h4&gt;&lt;h5 id="the-block-template"&gt;The block Template -&lt;/h5&gt;&lt;pre&gt;&lt;code&gt;{% from "macros/author_name.html" import render_author_name %}

&amp;lt;section class="recent-posts"&amp;gt;
  &amp;lt;div class="container"&amp;gt;
    &amp;lt;div class="level"&amp;gt;
      &amp;lt;h2 class="is-paddingless level-left"&amp;gt;
        {{ this.title }}
      &amp;lt;/h2&amp;gt;
      &amp;lt;span class="level-right"&amp;gt;
        &amp;lt;a class="posts-link" href="/blog"&amp;gt;See all posts &amp;lt;i class="icon angle-right"&amp;gt;&amp;lt;/i&amp;gt;&amp;lt;/a&amp;gt;
      &amp;lt;/span&amp;gt;
    &amp;lt;/div&amp;gt;
    &amp;lt;div class="columns"&amp;gt;
      {% for post in site.query('/blog/entries') %}
        {% if loop.index &amp;lt;= 3 %}
          {% set author = post.parent.parent.children.get('authors').children.get(post.author) %}
          &amp;lt;div class="column is-one-third is-paddingless padding-horizontal-big padding-top-bigger"&amp;gt;
            &amp;lt;article class="card entry-post horizontal no-border blog-entry"&amp;gt;
              &amp;lt;header&amp;gt;
                &amp;lt;figure class="image blog-image"&amp;gt;
                {% if author.about %}
                  {% if author.md5_hashed_email %}
                    &amp;lt;img class="profile" src="https://secure.gravatar.com/avatar/{{ author.md5_hashed_email }}?size=200"
                    alt="gravatar" /&amp;gt;
                  {% endif %}
                {% endif %}
                &amp;lt;/figure&amp;gt;
              &amp;lt;/header&amp;gt;
              &amp;lt;div class="blog-content"&amp;gt;
                &amp;lt;h4 class="b-header"&amp;gt;&amp;lt;a class="blog-title" href="{{ post|url }}"&amp;gt;{{ post.title }}&amp;lt;/a&amp;gt;&amp;lt;/h4&amp;gt;
                &amp;lt;span class="blog-author"&amp;gt;by &amp;lt;a class="author-name" href="{{ author|url }}"&amp;gt;{{ render_author_name(author) }}&amp;lt;/a&amp;gt;
                on {{ post.pub_date|dateformat("YYYY-MM-dd") }}&amp;lt;/span&amp;gt;
                &amp;lt;div class="excerpt"&amp;gt;
                  {{ post.body | excerpt | string | striptags() | truncate(100) }}
                &amp;lt;/div&amp;gt;
              &amp;lt;/div&amp;gt;
            &amp;lt;/article&amp;gt;
          &amp;lt;/div&amp;gt;
        {% endif %}
      {% endfor %}
    &amp;lt;/div&amp;gt;
  &amp;lt;/div&amp;gt;
&amp;lt;/section&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;h5 id="the-block-model"&gt;The block Model -&lt;/h5&gt;&lt;pre&gt;&lt;code&gt;[block]
name = Recent Posts

[fields.title]
label = Title
type = string
&lt;/code&gt;&lt;/pre&gt;
&lt;h5 id="the-block-styling"&gt;The block styling -&lt;/h5&gt;&lt;pre&gt;&lt;code&gt;// Recent-posts section - Home page
.recent-posts {
  background-color: rgba(4, 166, 53, 0.1);

  .container {
    @extend .padding-vertical-xl;
    @extend .padding-horizontal-big;

    .columns {
      @extend .padding-top-bigger;
      @extend .padding-bottom-xl;
    }
  }

  .blog-title {
    @extend .has-color-dark-slate-gray;
  }

  .posts-link {
    @extend .has-color-forest-green;
    @extend .body-normal;

    font-weight: bold;
    line-height: 1.5;
    text-decoration: none;

    .icon {
      @extend .has-color-forest-green;
      @extend .padding-left-small;
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;figure&gt;
    &lt;img src="output2.png" alt="Output"&gt;
    &lt;figcaption&gt;Output — Recent blog posts.&lt;/figcaption&gt;
&lt;/figure&gt;&lt;p&gt;I would also like to point out the amazing Query functionality provided by Lektor where you can access the child pages of the root. Here I am accessing blog posts from our Blog page and limiting the count of posts to three.&lt;/p&gt;
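&lt;p&gt;The template limits the output by checking &lt;code&gt;loop.index&lt;/code&gt; inside the loop. As a small plain-Python analogy (this is not Lektor's API, and the function name &lt;code&gt;first_posts&lt;/code&gt; and the sample data are assumptions), the same "first three only" idea can be sketched as follows.&lt;/p&gt;

```python
from itertools import islice


def first_posts(posts, count=3):
    """Return at most `count` items without consuming the rest of `posts`."""
    return list(islice(posts, count))


# Hypothetical stand-in for a long, lazily produced list of blog posts.
titles = (f"Post {number}" for number in range(1, 1001))
print(first_posts(titles))
```

&lt;p&gt;Taking a slice stops after the third item, whereas a condition inside the loop still visits every post.&lt;/p&gt;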
&lt;h3 id="difference-in-experience"&gt;Difference in Experience&lt;/h3&gt;&lt;p&gt;The level of user experience has been significantly elevated due to the use of Vocabulary. I would like to point out one of the major changes in experience here. A major part of the website is guidelines: we have guidelines for contributing, guidelines for how to join a community, guidelines for how to write a blog post, and many more. The new website has cleaner and more readable guidelines with a proper hierarchy, and every piece of information is made accessible using secondary navigation.&lt;/p&gt;
&lt;h5 id="below-are-the-images-of-some-guidelines-pages-from-new-website"&gt;Below are the images of some guidelines pages from the new website.&lt;/h5&gt;&lt;figure&gt;
    &lt;img width="300" height="300" src="new1.png" alt="Screenshot"&gt;
    &lt;img width="300" height="300" src="new2.png" alt="Screenshot"&gt;
    &lt;img width="300" height="300" src="new3.png" alt="Screenshot"&gt;
    &lt;figcaption&gt;Screenshots from new website&lt;/figcaption&gt;
&lt;/figure&gt;&lt;h5 id="below-are-the-images-of-some-guidelines-pages-from-old-website-you-can-see-the-difference-of-experience-in-both-cases"&gt;Below are the images of some guidelines pages from the old website. You can see the difference in experience between the two.&lt;/h5&gt;&lt;figure&gt;
    &lt;img width="400" src="old1.png" alt="Screenshot"&gt;
    &lt;img width="400" src="old2.png" alt="Screenshot"&gt;
    &lt;figcaption&gt;Screenshots from old website&lt;/figcaption&gt;
&lt;/figure&gt;&lt;h3 id="how-you-can-use-vocabulary-and-also-contribute-to-it"&gt;How can you use Vocabulary and contribute to it?&lt;/h3&gt;&lt;p&gt;Vocabulary is very easy to use. It is intuitive, consistent and highly reusable. Vocabulary uses Storybook to present each visual element, which makes it very convenient for a user to integrate Vocabulary into their project. The code snippets attached to every element can be copied and used as is. The code snippets above indicate how the library can be used and how easily you can achieve the desired web pages. For more details, you can visit the &lt;a href="https://cc-vocabulary.netlify.app/?path=/docs/vocabulary-usage--page"&gt;usage guidelines&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Vocabulary is still under development; feedback and bug reports are welcome, fixes and patches even more so. Here is the link to the &lt;a href="https://cc-vocabulary.netlify.app/?path=/docs/vocabulary-contribution--page"&gt;contribution guidelines&lt;/a&gt;.&lt;/p&gt;
</content></entry><entry><title>Accessibility and Internationalization: WrapUp GSoC 2020</title><link href="http://opensource.creativecommons.org/blog/entries/cc-search-accessibility-wrapup/" rel="alternate"></link><updated>2020-08-31T00:00:00Z</updated><author><name>['AyanChoudhary']</name></author><id>urn:uuid:8dd9da39-50da-34b2-91ac-bbe3eee35aee</id><content type="html">&lt;p&gt;This is the final blog post of my internship with CC. I have been working on improving the accessibility of cc-search and internationalizing it as well.
This post is the conclusion of my work. These past 10 weeks with CC have taught me a lot and I am really grateful to have got this opportunity.
The experience was just amazing and the people are so helpful. I really enjoyed working with them and am looking forward to continuing to work with the CC team.&lt;/p&gt;
&lt;p&gt;You can glance through my work through these blog posts:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href="/blog/entries/cc-search-accessibility-and-internationalization/"&gt;CC Search, Proposal Drafting and Community Bonding&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="/blog/entries/cc-search-accessibility-week1-2/"&gt;CC Search, Setting up vue-i18n and internationalizing homepage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="/blog/entries/cc-search-accessibility-week3-4/"&gt;Internationalization Continued: Handling strings in the store&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="/blog/entries/cc-search-accessibility-week5-6/"&gt;Internationalization continued: Modifying tests&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="/blog/entries/cc-search-accessibility-week7-8/"&gt;CC Search, Initial Accessibility Improvements&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="/blog/entries/cc-search-accessibility-week9-10/"&gt;Accessibility Improvements: Final Changes and Modal Accessibility&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The progress of the project can be tracked on &lt;a href="https://github.com/cc-archive/cccatalog-frontend"&gt;cc-search&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;CC Search Accessibility is my GSoC 2020 project under the guidance of &lt;a href="https://creativecommons.org/author/zackcreativecommons-org/"&gt;Zack Krida&lt;/a&gt; and &lt;a href="/blog/authors/akmadian/"&gt;Ari Madian&lt;/a&gt;, who is the primary mentor for this project. &lt;a href="https://creativecommons.org/author/annacreativecommons-org/"&gt;Anna Tumadóttir&lt;/a&gt;, who helped all along, and engineering director &lt;a href="https://creativecommons.org/author/kriticreativecommons-org/"&gt;Kriti
Godey&lt;/a&gt; have been very supportive.&lt;/p&gt;
</content></entry><entry><title>Linked Commons: GSoC'20 Wrap Up</title><link href="http://opensource.creativecommons.org/blog/entries/linked-commons-gsoc-wrap-up/" rel="alternate"></link><updated>2020-08-28T00:00:00Z</updated><author><name>['subhamX']</name></author><id>urn:uuid:511904ae-a464-335a-bcdf-76f80c5309d1</id><content type="html">&lt;p&gt;Time flies faster when you are having fun! I didn't believe it back then, but now I do after experiencing it. It couldn't be more accurate: here I am, writing the concluding post of the &lt;strong&gt;GSoC 2020: The Linked Commons series&lt;/strong&gt;, just when I had started enjoying things.&lt;/p&gt;
&lt;p&gt;In this post, I will give a brief overview of the linked commons and my GSoC contributions. It was an exciting journey, and I loved working on this project.&lt;/p&gt;
&lt;p&gt;Before going any further, and just for ritual's sake, let me share a one-liner on &lt;strong&gt;what The Linked Commons is&lt;/strong&gt;, although I highly recommend reading the other posts in this series. Who knows, you might join our team.😉&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The CC catalog data visualization or linked commons is a web application which finds and explores relationships between Creative Commons licensed content on the web.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;My primary contributions to the project during the GSoC timeline were threefold.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Firstly&lt;/strong&gt;, revamp the design and migrate the project to React.js for fast and scalable rendering performance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Secondly&lt;/strong&gt;, add graph filtering methods and scale the data to enable users to visualize massive data more efficiently.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Lastly&lt;/strong&gt;, make developer onboarding easy by dockerizing the project, which also brings more application portability.&lt;/p&gt;
&lt;h2 id="gsoc-work-product"&gt;GSoC Work Product&lt;/h2&gt;&lt;p&gt;The live version of the linked commons can be found &lt;a href="http://dataviz.creativecommons.engineering/"&gt;here&lt;/a&gt;. You can interact with it and &lt;strong&gt;"explore the creative commons in graphs"&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;If you wish to access the raw or filtered data, here is some brief documentation of our new API.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nt"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;/api/graph-data&lt;/span&gt;
&lt;span class="nt"&gt;Method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;GET&lt;/span&gt;
&lt;span class="nt"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Returns a randomized graph having around 500 nodes and links.&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nt"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;/api/graph-data/?name={node_name}&lt;/span&gt;
&lt;span class="nt"&gt;Method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;GET&lt;/span&gt;
&lt;span class="nt"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Returns the filtered graph with a set of nodes which are either immediate neighbours to {node_name} in the original graph or the transpose graph.&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nt"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;/api/suggestions/?q={query}&lt;/span&gt;
&lt;span class="nt"&gt;Method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;GET&lt;/span&gt;
&lt;span class="nt"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Returns a set of nodes which contains the {query} pattern in their nodeid&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
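&lt;p&gt;As a rough illustration of how a client might address these endpoints, here is a minimal Python sketch that only builds the request URLs. The base URL is assumed to be the live site linked above, and the helper names &lt;code&gt;graph_data_url&lt;/code&gt; and &lt;code&gt;suggestions_url&lt;/code&gt; are hypothetical. The response schema is not covered here.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;
```python
from urllib.parse import urlencode, urljoin

# Assumption: the API is served from the same host as the live demo.
BASE_URL = "http://dataviz.creativecommons.engineering/"


def graph_data_url(node_name=None):
    """Build the graph-data URL, optionally filtered by a node name."""
    if node_name is None:
        # Unfiltered call: the server returns a randomized graph.
        return urljoin(BASE_URL, "api/graph-data")
    query = urlencode({"name": node_name})
    return urljoin(BASE_URL, "api/graph-data/") + "?" + query


def suggestions_url(query):
    """Build the suggestions URL for a partial node id."""
    return urljoin(BASE_URL, "api/suggestions/") + "?" + urlencode({"q": query})


if __name__ == "__main__":
    print(graph_data_url())
    print(graph_data_url("flickr.com"))
    print(suggestions_url("wiki"))
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The resulting strings can be handed to any HTTP client. &lt;code&gt;urlencode&lt;/code&gt; takes care of escaping node names that contain spaces or other reserved characters.&lt;/p&gt;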
&lt;h3 id="demo"&gt;Demo&lt;/h3&gt;&lt;div style="text-align: center; width: 90%; margin-left: 5%;"&gt;
    &lt;figure&gt;
        &lt;img src="graph-filtering.gif" alt="demo" style="border: 1px solid black"&gt;
        &lt;figcaption&gt;Linked Commons: Filtering the Graph 🔥&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/div&gt;&lt;h3 id="my-code-contributions"&gt;My Code Contributions&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/cc-archive/cccatalog-dataviz/"&gt;https://github.com/cc-archive/cccatalog-dataviz/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Commits:&lt;/strong&gt; &lt;a href="https://github.com/cc-archive/cccatalog-dataviz/commits/master"&gt;https://github.com/cc-archive/cccatalog-dataviz/commits/master&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Contributors:&lt;/strong&gt; &lt;a href="https://github.com/cc-archive/cccatalog-dataviz/graphs/contributors"&gt;https://github.com/cc-archive/cccatalog-dataviz/graphs/contributors&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/cc-archive/cccatalog-dataviz/pull/28"&gt;&lt;strong&gt;Migrate frontend to React #28&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Migrated the frontend to a web application using React.js for smooth rendering performance.&lt;/li&gt;
&lt;li&gt;Added a client-side graph filtering method to enable users to interact with the loaded graph.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://github.com/cc-archive/cccatalog-dataviz/pull/29"&gt;&lt;strong&gt;Add server-side filtering #29&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We realized that the client-side graph filtering method is not very scalable. This PR adds the basic structure for the backend server and adds server-side graph filtering logic.&lt;/li&gt;
&lt;li&gt;Added a parser to convert the input JSON file from &lt;code&gt;{nodes:[], links:[]}&lt;/code&gt; schema to the distance list format.&lt;/li&gt;
&lt;/ul&gt;
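&lt;p&gt;The two ideas in this PR can be sketched in a few lines of Python. This is not the project's actual parser: it assumes each link carries &lt;code&gt;source&lt;/code&gt; and &lt;code&gt;target&lt;/code&gt; keys and each node an &lt;code&gt;id&lt;/code&gt; key, and it uses plain neighbour sets as a stand-in for the format the real code stores. The function names are hypothetical.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;
```python
from collections import defaultdict


def build_adjacency(graph):
    """Return (outgoing, incoming) neighbour maps for a {nodes, links} graph."""
    outgoing = defaultdict(set)  # original graph: source to its targets
    incoming = defaultdict(set)  # transpose graph: target to its sources
    for link in graph["links"]:
        outgoing[link["source"]].add(link["target"])
        incoming[link["target"]].add(link["source"])
    return outgoing, incoming


def filter_graph(graph, node_name):
    """Keep node_name plus its immediate neighbours in either direction."""
    outgoing, incoming = build_adjacency(graph)
    keep = {node_name} | outgoing[node_name] | incoming[node_name]
    nodes = [node for node in graph["nodes"] if node["id"] in keep]
    # Only links that touch node_name are kept in this sketch.
    links = [
        link for link in graph["links"]
        if node_name in (link["source"], link["target"])
    ]
    return {"nodes": nodes, "links": links}


if __name__ == "__main__":
    sample = {
        "nodes": [{"id": "a"}, {"id": "b"}, {"id": "c"}, {"id": "d"}],
        "links": [
            {"source": "a", "target": "b"},
            {"source": "c", "target": "a"},
            {"source": "c", "target": "d"},
        ],
    }
    print(filter_graph(sample, "a"))
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Precomputing both maps once means a filter request only needs two lookups, which is the main reason to move this work off the client.&lt;/p&gt;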
&lt;div style="text-align: center; width: 90%; margin-left: 5%;"&gt;
    &lt;figure&gt;
        &lt;img src="api-call.png" alt="API call" style="border: 1px solid black"&gt;
        &lt;figcaption&gt;API call to filter graph data&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/div&gt;&lt;p&gt;&lt;a href="https://github.com/cc-archive/cccatalog-dataviz/pull/33"&gt;&lt;strong&gt;Design upgrade #33&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Revamped the design of the frontend.&lt;/li&gt;
&lt;li&gt;Added both a primary light theme and a secondary dark theme.&lt;/li&gt;
&lt;/ul&gt;
&lt;div style="text-align: center; font-style: normal; width: 80%; margin-left: 10%;"&gt;
    &lt;figure&gt;
        &lt;img src="design-dark.png" alt="Dark Theme" style="border: 1px solid black"&gt;
        &lt;figcaption&gt;&lt;em&gt;Linked Commons: Dark Theme&lt;/em&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/div&gt;&lt;p&gt;&lt;a href="https://github.com/cc-archive/cccatalog-dataviz/pull/35"&gt;&lt;strong&gt;Add node suggestions feature #35&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Added a query autocomplete feature to enable users to explore all the nodes in the database.&lt;/li&gt;
&lt;li&gt;This functionality aims to minimize the number of misspelt filtering attempts from the client.&lt;/li&gt;
&lt;li&gt;Refer to &lt;a href="/blog/entries/linked-commons-autocomplete-feature/"&gt;this blog&lt;/a&gt; for the motivation and a detailed report on why we added the autocomplete (aka node suggestions) feature.&lt;/li&gt;
&lt;/ul&gt;
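&lt;p&gt;To make the suggestion behaviour concrete, here is a small Python sketch of substring matching over node ids. It is only an approximation under stated assumptions: the case-insensitive comparison, the shortest-first ranking, the &lt;code&gt;limit&lt;/code&gt; parameter, and the function name &lt;code&gt;suggest&lt;/code&gt; are all illustrative, and the real feature queries the database rather than a Python list.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;
```python
def suggest(node_ids, query, limit=8):
    """Return up to `limit` node ids that contain `query`, ignoring case."""
    needle = query.strip().lower()
    if not needle:
        return []
    matches = [node_id for node_id in node_ids if needle in node_id.lower()]
    # Shorter ids first, so near-exact matches rank above long ones.
    matches.sort(key=lambda node_id: (len(node_id), node_id))
    return matches[:limit]


if __name__ == "__main__":
    ids = ["wikipedia.org", "flickr.com", "wikimedia.org", "example.com"]
    print(suggest(ids, "wiki"))
```
&lt;/pre&gt;&lt;/div&gt;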
&lt;p&gt;&lt;a href="https://github.com/cc-archive/cccatalog-dataviz/pull/38"&gt;&lt;strong&gt;Fix filtering module #38&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Optimizes the build-dB-script to run efficiently on the larger and newer dataset of the cc-catalog.&lt;/li&gt;
&lt;li&gt;Added the basic form of the randomized graph filtering method.&lt;/li&gt;
&lt;li&gt;Refer to &lt;a href="/blog/entries/linked-commons-data-update/"&gt;this blog&lt;/a&gt; for detailed information on the data update.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://github.com/cc-archive/cccatalog-dataviz/pull/39"&gt;&lt;strong&gt;Database upgrade and core enhancements #39&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It upgrades the primary database from shelve to MongoDB for higher performance.&lt;/li&gt;
&lt;li&gt;Dockerizes the frontend and backend for both dev and prod environments for higher application portability.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://github.com/cc-archive/cccatalog-dataviz/pull/40"&gt;&lt;strong&gt;Frontend enhancements #40&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Fixes common UI bugs, updates the frontend design and enhances the mobile and smaller devices experience with the linked commons&lt;/li&gt;
&lt;li&gt;Modularizes and updates the code documentation&lt;/li&gt;
&lt;/ul&gt;
&lt;div style="text-align: center; width: 80%; margin-left: 10%;"&gt;
    &lt;figure&gt;
        &lt;img src="design-light.png" alt="Theme Light" style="border: 1px solid black"&gt;
        &lt;figcaption&gt;Linked Commons: Light Theme&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/div&gt;&lt;div style="text-align: center; width: 90%; margin-left: 5%;"&gt;
    &lt;figure&gt;
        &lt;img src="lighthouse-audit.png" alt="Lighthouse Audit" style="border: 1px solid black"&gt;
        &lt;figcaption&gt;Lighthouse Stats of the latest version of the Linked Commons&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/div&gt;&lt;h3 id="whats-next"&gt;What’s Next?&lt;/h3&gt;&lt;p&gt;Throughout this internship period, we, the Linked Commons team, aimed to make this version the best among all. But there is still scope for improvement.&lt;/p&gt;
&lt;p&gt;To give you some insight: currently, the complete graph contains 235k nodes and 4.14 million links. During preprocessing, we dropped a lot of nodes. Additionally, we removed more than 3 million nodes which didn't have cc_licenses information. So, in general, the current version shows only those nodes which are soundly linked with other domains and whose license information is available. To give a complete picture of the massive "cc-catalog", the Linked Commons needs to "gird up its loins".&lt;/p&gt;
&lt;p&gt;After seeing the tremendous potential it has, I will undoubtedly continue working on it and help the linked commons in this quest. ⚡&lt;/p&gt;
&lt;h3 id="ending-note"&gt;Ending Note&lt;/h3&gt;&lt;p&gt;In the end, I would like to thank my mentors Maria and Brent, for their unconditional guidance throughout this internship period. The insights I got from them will truly help me in the days to come.&lt;/p&gt;
&lt;p&gt;Special thanks to Francisco, Anna and Kriti for the awesome ideas brainstormed in the UX meet, which helped us build an incrementally superior version of the Linked Commons.&lt;/p&gt;
&lt;p&gt;It is not the end, rather a new beginning. Cheers! 🚀🚀🚀&lt;/p&gt;
</content></entry><entry><title>CC Search Extension: Wrapping up GSoC 2020</title><link href="http://opensource.creativecommons.org/blog/entries/cc-search-extension-wrapping-up-gsoc-2020/" rel="alternate"></link><updated>2020-08-27T00:00:00Z</updated><author><name>['makkoncept']</name></author><id>urn:uuid:a836222c-bec0-3f28-a8cd-9699e8ba778f</id><content type="html">&lt;p&gt;In this post, I'll give an overview of the improvements and features that were added to the CC Search browser extension. I am delighted to state that the goals that were set for Google Summer of Code 2020 have been successfully completed.&lt;/p&gt;
&lt;h2 id="widen-the-integration-with-cc-catalog-api"&gt;Widen the integration with CC Catalog API&lt;/h2&gt;&lt;p&gt;Both &lt;a href="https://search.creativecommons.org"&gt;CC Search&lt;/a&gt; and CC Search Extension are powered by the &lt;a href="https://api.creativecommons.engineering/v1/"&gt;CC Catalog REST API&lt;/a&gt;. The API allows programmatic access to search for CC-licensed and public domain digital media. Better integration with the API was one of the major targets during this internship because it significantly improves existing searching workflows in the extension and adds new ones.&lt;/p&gt;
&lt;p&gt;This can be sub-divided into &lt;em&gt;New Filters&lt;/em&gt;, &lt;em&gt;Browse By Sources&lt;/em&gt;, &lt;em&gt;search by tags&lt;/em&gt;, and &lt;em&gt;related images&lt;/em&gt;.&lt;/p&gt;
&lt;h3 id="new-filters"&gt;New Filters&lt;/h3&gt;&lt;p&gt;The &lt;code&gt;/image&lt;/code&gt; endpoint of the API is used for searching. We can also provide several query parameters that can filter the result. Previously, the extension only supported filtering the content using &lt;code&gt;license&lt;/code&gt;, &lt;code&gt;sources&lt;/code&gt;, and &lt;code&gt;use case&lt;/code&gt;. Now, besides these filters, the extension also supports filtering by &lt;code&gt;image type&lt;/code&gt;, &lt;code&gt;file type&lt;/code&gt;, &lt;code&gt;aspect ratio&lt;/code&gt;, and &lt;code&gt;image size&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;figure &gt;
    &lt;img src="old-extension-filters.gif" style="width: 70%"&gt;
    &lt;figcaption&gt;
        &lt;em&gt;Filters in the old version&lt;/em&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;&lt;/p&gt;
&lt;figure&gt;
    &lt;img src="new-extension-filters.gif" style="width: 70%"&gt;
    &lt;figcaption&gt;
        &lt;em&gt;Filters in the new version&lt;/em&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;&lt;p&gt;&lt;em&gt;Rationale&lt;/em&gt;: This will allow users to be more precise in their queries when searching.&lt;/p&gt;
&lt;h3 id="browsing-by-source"&gt;Browsing by source&lt;/h3&gt;&lt;p&gt;The extension now has a dynamically updated "sources" section. Clicking a source link triggers a request to the &lt;code&gt;/image&lt;/code&gt; endpoint to get the images associated with it.&lt;/p&gt;
&lt;p&gt;&lt;figure &gt;
    &lt;img src="source-section-light.gif" style="width: 70%"&gt;
    &lt;figcaption&gt;
        &lt;em&gt;Source section&lt;/em&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;&lt;/p&gt;
&lt;figure&gt;
    &lt;img src="source-section-dark.gif" style="width: 70%"&gt;
    &lt;figcaption&gt;
        &lt;em&gt;Source section in dark mode&lt;/em&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;&lt;p&gt;&lt;em&gt;Rationale&lt;/em&gt;: This opens an avenue for exploration of all the different sources which are available in the catalog. This is advantageous for the users who are not familiar with the type of content a particular source provides. They might run into a source that has a huge catalog of high-quality images that they are looking for.&lt;/p&gt;
&lt;h3 id="search-by-tags"&gt;Search by tags&lt;/h3&gt;&lt;p&gt;Most of the images have some tags associated with them, which are also sent along with the image data by the API. This, and the flexibility of the &lt;code&gt;/image&lt;/code&gt; endpoint, paved the way for the addition of searching for images using image-tags.&lt;/p&gt;
&lt;p&gt;&lt;figure &gt;
    &lt;img src="search-by-image-tag.gif" style="width: 70%"&gt;
    &lt;figcaption&gt;
        &lt;em&gt;Search by image tag&lt;/em&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Rationale&lt;/em&gt; - Image tags will allow users to incrementally make their queries better and more specific.&lt;/p&gt;
&lt;h3 id="related-images"&gt;Related images&lt;/h3&gt;&lt;p&gt;In the image detail section of any particular image, you can now see several recommendations. This has been made possible by adding support for the &lt;a href="https://api.creativecommons.engineering/v1/#tag/recommendations"&gt;&lt;code&gt;/recommendations/images/{identifier}&lt;/code&gt;&lt;/a&gt; endpoint of the API.&lt;/p&gt;
&lt;p&gt;&lt;figure &gt;
    &lt;img src="related-images.gif" style="width: 70%"&gt;
    &lt;figcaption&gt;
        &lt;em&gt;Image recommendations&lt;/em&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Rationale&lt;/em&gt; -  This will help users find a variety of images that fit their requirements and also explore the images that would not usually show up on the initial pages of the search result.&lt;/p&gt;
&lt;h2 id="improvements-to-bookmarks-section"&gt;Improvements to bookmarks section&lt;/h2&gt;&lt;p&gt;The bookmark section has great prominence in CC Search Extension because the export/import workflow is tied to it and, unlike the search result data, the bookmarks data is preserved across user sessions (closing the extension does not wipe out the bookmarks). It has undergone some crucial improvements such as caching, voluntary loading, and an increase in the number of bookmarks that it can hold (the limit is now 300, up from ~50).&lt;/p&gt;
&lt;p&gt;The bookmarks section is significantly faster now as caching has eliminated the need to make many simultaneous network requests to the API when bookmarks are loaded. Voluntary loading also helps reduce perceived lag by reducing the number of bookmarks that load at once.&lt;/p&gt;
&lt;p&gt;Though the improvement in performance is better recognized when you are using the extension, I tried to demonstrate that by comparing the rendering of the bookmarked images.&lt;/p&gt;
&lt;p&gt;&lt;figure &gt;
    &lt;img src="bookmarks-in-old-version.gif" style="width: 70%"&gt;
    &lt;figcaption&gt;
        &lt;em&gt;Bookmark section in the old version&lt;/em&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure &gt;
    &lt;img src="bookmarks-in-new-version.gif" style="width: 70%"&gt;
    &lt;figcaption&gt;
        &lt;em&gt;Bookmarks section in the new version&lt;/em&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;&lt;/p&gt;
&lt;h2 id="a-better-use-of-sync-storage"&gt;A better use of sync storage&lt;/h2&gt;&lt;p&gt;The bookmarks and the user settings are synced between user systems. There are very tight write limits and byte quotas associated with this storage (&lt;a href="https://developer.chrome.com/apps/storage#properties"&gt;documentation link&lt;/a&gt;). Because of this, the way the extension used this storage, and the assumptions it made about the storage schema, were improved multiple times. Since the extension was already in production and had around 5,000 weekly users, code for migrating each user's sync storage was shipped along with these updates. Support was also added for legacy bookmark files that some users might still be using.&lt;/p&gt;
&lt;h2 id="integration-with-vocabulary"&gt;Integration with Vocabulary&lt;/h2&gt;&lt;p&gt;The extension now supports the latest version of &lt;a href="https://github.com/creativecommons/vocabulary"&gt;CC Vocabulary&lt;/a&gt;. The challenging part of this was to rethink, mold, and update every workflow of the extension according to the new design.&lt;/p&gt;
&lt;p&gt;&lt;figure &gt;
    &lt;img src="image-detail-old-version.gif" style="width: 70%"&gt;
    &lt;figcaption&gt;
        &lt;em&gt;Old version — Image detail&lt;/em&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure &gt;
    &lt;img src="image-detail-new-version.gif" style="width: 70%"&gt;
    &lt;figcaption&gt;
        &lt;em&gt;New version — Image detail&lt;/em&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure &gt;
    &lt;img src="deletion-in-old-version.gif" style="width: 70%"&gt;
    &lt;figcaption&gt;
        &lt;em&gt;Old version — Deleting bookmarks&lt;/em&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure &gt;
    &lt;img src="deletion-in-new-version.gif" style="width: 70%"&gt;
    &lt;figcaption&gt;
        &lt;em&gt;New version — Deleting bookmarks&lt;/em&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure &gt;
    &lt;img src="dark-mode-old-version.gif" style="width: 70%"&gt;
    &lt;figcaption&gt;
        &lt;em&gt;Old version — Dark mode&lt;/em&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure &gt;
    &lt;img src="dark-mode-new-version.gif" style="width: 70%"&gt;
    &lt;figcaption&gt;
        &lt;em&gt;New version — Dark mode&lt;/em&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;&lt;/p&gt;
&lt;h2 id="release-on-microsoft-edge"&gt;Release on Microsoft Edge&lt;/h2&gt;&lt;p&gt;I am also testing the extension on Microsoft Edge, and it is already &lt;a href="https://microsoftedge.microsoft.com/addons/detail/cc-search/djolilnbndifmlfmcdnifdfjfbglipgc"&gt;listed&lt;/a&gt; on the Edge store. You can expect the latest version of CC Search Extension to be available for install there soon.&lt;/p&gt;
&lt;h2 id="code"&gt;Code&lt;/h2&gt;&lt;p&gt;The project repository is hosted on &lt;a href="https://github.com/creativecommons/ccsearch-browser-extension"&gt;GitHub&lt;/a&gt;. During this period, I have made &lt;a href="https://github.com/creativecommons/ccsearch-browser-extension/compare/v1.3.0...master"&gt;more than 320&lt;/a&gt; commits.&lt;/p&gt;
&lt;p&gt;The major pull requests: &lt;a href="https://github.com/creativecommons/ccsearch-browser-extension/pull/249"&gt;#249&lt;/a&gt;, &lt;a href="https://github.com/creativecommons/ccsearch-browser-extension/pull/255"&gt;#255&lt;/a&gt;, &lt;a href="https://github.com/creativecommons/ccsearch-browser-extension/pull/268"&gt;#268&lt;/a&gt;, &lt;a href="https://github.com/creativecommons/ccsearch-browser-extension/pull/270"&gt;#270&lt;/a&gt;, &lt;a href="https://github.com/creativecommons/ccsearch-browser-extension/pull/271"&gt;#271&lt;/a&gt;, &lt;a href="https://github.com/creativecommons/ccsearch-browser-extension/pull/272"&gt;#272&lt;/a&gt;, &lt;a href="https://github.com/creativecommons/ccsearch-browser-extension/pull/275"&gt;#275&lt;/a&gt;, &lt;a href="https://github.com/creativecommons/ccsearch-browser-extension/pull/276"&gt;#276&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Also, during this period, 5 updates of the extension were pushed to the extension stores. You can check out the &lt;a href="https://github.com/creativecommons/ccsearch-browser-extension/releases"&gt;releases page&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="acknowledgements"&gt;Acknowledgements&lt;/h2&gt;&lt;p&gt;I would like to thank &lt;a href="https://creativecommons.org/author/aldencreativecommons-org/"&gt;Alden&lt;/a&gt; and &lt;a href="https://creativecommons.org/author/kriticreativecommons-org/"&gt;Kriti&lt;/a&gt; for their valuable guidance during this journey. Special thanks to &lt;a href="https://github.com/panchovm"&gt;Francisco&lt;/a&gt;, for designing the mockups of the extension, and to the wonderful contributors of CC Vocabulary.&lt;/p&gt;
</content></entry><entry><title>Automate GitHub for more than CI/CD</title><link href="http://opensource.creativecommons.org/blog/entries/automate-github-for-more-than-CI%20CD/" rel="alternate"></link><updated>2020-08-26T00:00:00Z</updated><author><name>['zackkrida']</name></author><id>urn:uuid:1f2b4dad-de07-33ab-b1c7-394778548e55</id><content type="html">&lt;blockquote&gt;&lt;p&gt;&lt;em&gt;Get started using GitHub bots and actions for community management and repository health.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In late 2018, in the midst of being acquired by Microsoft, GitHub &lt;a href="https://github.blog/2018-10-16-future-of-software/"&gt;launched GitHub Actions&lt;/a&gt; into public beta, allowing users to run code on the popular development platform for the first time. With a straightforward &lt;code&gt;YAML&lt;/code&gt; configuration syntax and the power of Microsoft's Azure cloud, GitHub Actions quickly rose to compete with existing Continuous Integration (CI) and Continuous Deployment (CD) platforms like &lt;strong&gt;Circle CI&lt;/strong&gt; and &lt;strong&gt;Travis CI&lt;/strong&gt;. GitHub Actions made it easier than ever for developers to test and deploy software in the cloud, but from the beginning GitHub had bigger plans for the service.&lt;/p&gt;
&lt;p&gt;In a &lt;a href="https://techcrunch.com/2018/10/16/github-launches-actions-its-workflow-automation-tool/"&gt;2018 TechCrunch interview&lt;/a&gt;, GitHub's then head of platform, Sam Lambert, acknowledged the usefulness of actions for more than CI/CD. “I see CI/CD as one narrow use case of actions. It’s so, so much more,” Lambert stressed. “And I think it’s going to revolutionize DevOps because people are now going to build best in breed deployment workflows for specific applications and frameworks, and those become the de facto standard shared on GitHub. […] It’s going to do everything we did for open source again for the DevOps space and for all those different parts of that workflow ecosystem.”&lt;/p&gt;
&lt;p&gt;At Creative Commons, we use GitHub Actions and Bots on many of &lt;a href="https://github.com/creativecommons?type=source"&gt;our open-source projects&lt;/a&gt; for more than CI/CD: to manage our &lt;a href="/community/community-team/"&gt;community team&lt;/a&gt;, to automate repository health, and to automate tedious but frequent tasks. The following examples are just a small snapshot of our existing and in-progress automations.&lt;/p&gt;
&lt;h2 id="example-automations"&gt;Example automations&lt;/h2&gt;&lt;p&gt;&lt;!-- no toc --&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="/blog/entries/automate-github-for-more-than-CI CD/#release-note-generation"&gt;Release note generation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="/blog/entries/automate-github-for-more-than-CI CD/#repository-normalization"&gt;Repository normalization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="/blog/entries/automate-github-for-more-than-CI CD/#automatic-dependency-updates"&gt;Dependency updates&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="release-note-generation"&gt;Release note generation&lt;/h3&gt;&lt;p&gt;Our frontend Vue.js application for CC Search gets released weekly, and is subject to constant pull requests from myself, one-time volunteers making their first open source contribution, and long-term, dedicated community members who frequently contribute. It's important for us to highlight &lt;em&gt;all&lt;/em&gt; of these contributions in our release notes, regardless of size or scope. Additionally, we find it useful to group changes into categories, so our users have a clear sense of what kinds of updates we've made.&lt;/p&gt;
&lt;div style="text-align: center;"&gt;
  &lt;figure class="margin-bottom-large"&gt;
    &lt;img src="release-notes-screenshot.png" alt="GitHub screenshot of release notes for CC Search" /&gt;
    &lt;figcaption&gt;
      &lt;em&gt;
        An example of CC Search release notes generated by the &lt;a href="https://github.com/marketplace/actions/release-drafter"&gt;Release Drafter&lt;/a&gt; GitHub Action.
      &lt;/em&gt;
    &lt;/figcaption&gt;
  &lt;/figure&gt;
&lt;/div&gt;&lt;p&gt;The level of detail in these release notes made them quite tedious to generate manually. With the &lt;a href="https://github.com/marketplace/actions/release-drafter"&gt;release drafter action&lt;/a&gt;, we're able to automatically update a draft release note on every pull request to CC Search. The action lets us configure the line added for each pull request with some basic templating, which includes variables for the PR number, title, and author (among others):&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nt"&gt;change-template&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#39;-&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;$TITLE:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;#$NUMBER&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;by&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;@$AUTHOR&amp;#39;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;This means each pull request gets a line like this in our release notes:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Enable web monetization on single result pages: &lt;strong&gt;#1191&lt;/strong&gt; by &lt;strong&gt;@zackkrida&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Perfect! We can also map GitHub labels on our pull requests to the sections of our generated release notes, like so:&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nt"&gt;categories&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#39;New&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Features&amp;#39;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;label&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#39;feature&amp;#39;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#39;Bug&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Fixes&amp;#39;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;label&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#39;bug&amp;#39;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#39;critical&amp;#39;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
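&lt;p&gt;To show how the two settings above fit together, here is a rough Python imitation of the templating and grouping. It is not how Release Drafter is implemented: &lt;code&gt;string.Template&lt;/code&gt; merely happens to use the same &lt;code&gt;$NAME&lt;/code&gt; placeholder style, the pull request dictionaries are made-up input, and the function names are hypothetical. Pull requests without a matching label are simply left out of this sketch.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;
```python
from string import Template

# Mirrors the change-template and categories settings shown above.
CHANGE_TEMPLATE = Template("- $TITLE: #$NUMBER by @$AUTHOR")
CATEGORIES = [
    {"title": "New Features", "labels": ["feature"]},
    {"title": "Bug Fixes", "labels": ["bug", "critical"]},
]


def render_line(pr):
    """Fill the change template for one pull request."""
    return CHANGE_TEMPLATE.substitute(
        TITLE=pr["title"], NUMBER=pr["number"], AUTHOR=pr["author"]
    )


def draft_sections(pull_requests):
    """Group rendered lines under the first category whose labels match."""
    sections = {category["title"]: [] for category in CATEGORIES}
    for pr in pull_requests:
        for category in CATEGORIES:
            if set(pr["labels"]).intersection(category["labels"]):
                sections[category["title"]].append(render_line(pr))
                break  # first matching category wins in this sketch
    return sections


if __name__ == "__main__":
    example = [
        {
            "title": "Enable web monetization on single result pages",
            "number": 1191,
            "author": "zackkrida",
            "labels": ["feature"],
        },
    ]
    for title, lines in draft_sections(example).items():
        print(title, lines)
```
&lt;/pre&gt;&lt;/div&gt;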
&lt;p&gt;The resulting release notes require no manual editing at release time, which has saved us hours over time and allows our developers to focus on DevOps work instead of copywriting on release days. We also never miss a contribution or expression of gratitude to one of our contributors. You can read the &lt;a href="https://github.com/cc-archive/cccatalog-frontend/releases/latest"&gt;latest CC Search release notes&lt;/a&gt; or &lt;a href="https://github.com/cc-archive/cccatalog-frontend/blob/develop/.github/release-drafter.yml"&gt;see our full release-drafter.yml file here&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="repository-normalization"&gt;Repository Normalization&lt;/h3&gt;&lt;p&gt;Within a private repository of internal helper scripts, the CC technical team has a number of GitHub Actions which trigger Python scripts to keep configuration standardized across our repositories. We casually call this process "repository normalization". One such script ensures that we use a standard set of GitHub labels across all of our projects. This consistency helps us do things like direct users to &lt;a href="https://github.com/search?q=org%3Acreativecommons+label%3A%22help+wanted%22+state%3Aopen&amp;amp;type=Issues"&gt;open issues in need of assistance&lt;/a&gt; across the organization, or issues &lt;a href="https://github.com/search?q=org%3Acreativecommons+label%3A%22good+first+issue%22+state%3Aopen&amp;amp;type=Issues"&gt;good for first-time open source contributors&lt;/a&gt;. With GitHub Actions, it's easy to set up scheduled tasks with only a few lines of human-readable configuration. Here's the gist of running a Python script daily, for example:&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Example scheduled python action&lt;/span&gt;
&lt;span class="nt"&gt;on&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;schedule&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;cron&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#39;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&amp;#39;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;push&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;branches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;master&lt;/span&gt;
&lt;span class="nt"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;build&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;runs-on&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;ubuntu-latest&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;steps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;uses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;actions/checkout@v2&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Set up Python 3.7&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;uses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;actions/setup-python@v1&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;with&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;python-version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;3.7&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Install dependencies&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;run&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;|&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="no"&gt;python -m pip install --upgrade pip&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="no"&gt;python -m pip install pipenv&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="no"&gt;pipenv install&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Export token to env and run our script&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;run&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;|&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="no"&gt;pipenv run python our-script.py&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;ADMIN_GITHUB_TOKEN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;${{ secrets.ADMIN_GITHUB_TOKEN }}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Internally and publicly, we use &lt;a href="https://github.com/orgs/creativecommons/projects"&gt;GitHub Projects&lt;/a&gt; to manage our bi-weekly sprints and backlogs. The &lt;a href="https://github.com/subhamX/github-project-bot"&gt;GitHub Project Bot&lt;/a&gt; action was built by &lt;a href="https://github.com/subhamX"&gt;one of our community contributors&lt;/a&gt; and allows us to add pull requests to our project columns. Here's an example step in such a job:&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;Handle cccatalog-frontend Repo&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;uses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;subhamX/github-project-bot@v1.0.0&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;with&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;ACCESS_TOKEN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;${{ secrets.ADMIN_GITHUB_TOKEN }}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;COLUMN_NAME&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;In&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Progress&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;(Community)&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;PROJECT_URL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;https://github.com/orgs/creativecommons/projects/7&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;REPO_URL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;https://github.com/cc-archive/cccatalog-frontend&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We have additional scripts that sync our community team members across our open source website and GitHub, and several others that do even more of this cross-platform synchronization work. All of these scripts relieve a significant burden from our engineering manager and open source community coordinator.&lt;/p&gt;
&lt;h3 id="dependency-updates"&gt;Dependency Updates&lt;/h3&gt;&lt;p&gt;Modern JavaScript projects are built atop piles of third-party dependencies. This frees developers to focus on product code instead of writing the same utility code over and over again, but exposes projects to issues of security and dependency management. To help alleviate these issues, GitHub &lt;a href="https://github.blog/2019-05-23-introducing-new-ways-to-keep-your-code-secure/#automated-security-fixes-with-dependabot"&gt;acquired a startup called Dependabot&lt;/a&gt;, which initially focused on automatic security updates for repositories. Dependabot creates pull requests that update third-party code with known security vulnerabilities to the latest safe and stable versions.&lt;/p&gt;
&lt;p&gt;This summer (June 2020), GitHub &lt;a href="https://github.blog/2020-06-01-keep-all-your-packages-up-to-date-with-dependabot/"&gt;expanded Dependabot's scope&lt;/a&gt; to keep &lt;em&gt;all&lt;/em&gt; third-party code up to date, regardless of security. By adding a &lt;code&gt;.github/dependabot.yml&lt;/code&gt; file to a repo, developers no longer need to keep track of dependency updates on their own.&lt;/p&gt;
&lt;div style="text-align: center;"&gt;
  &lt;figure class="margin-bottom-large"&gt;
    &lt;img src="dependabot-example.png" alt="GitHub screenshot of a Dependabot PR message" /&gt;
    &lt;figcaption&gt;
      &lt;em&gt;
        Dependabot writes pull requests to bump JavaScript dependencies and will automatically resolve merge conflicts and keep the PR up to date.
      &lt;/em&gt;
    &lt;/figcaption&gt;
  &lt;/figure&gt;
&lt;/div&gt;&lt;p&gt;If your project has strong test coverage and a solid quality control process for release management, Dependabot pull requests can be made even more powerful with the &lt;a href="https://github.com/ridedott/merge-me-action"&gt;Merge Me Action&lt;/a&gt;. Merge Me can be added to the end of any series of GitHub Actions to automatically merge pull requests that were authored by a particular user (the action assumes &lt;code&gt;dependabot&lt;/code&gt; by default) and that pass all CI tests. This means your repository can have highly configurable, fully automated dependency updates in just a few lines of &lt;code&gt;YAML&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id="here-s-a-few-more"&gt;Here are a few more&lt;/h2&gt;&lt;p&gt;Here are some smaller and simpler automations that can make a huge difference in your workflows.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/probot/stale"&gt;Automatically close old PRs after a period of inactivity&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.blog/2020-08-24-automate-releases-and-more-with-the-new-sentry-release-github-action/"&gt;Automate security releases on Sentry&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/probot/reminders"&gt;Add reminders to issues and pull requests&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These examples are a small sample of the non-CI/CD capabilities of GitHub Actions. You can peek in the &lt;code&gt;.github/&lt;/code&gt; directory of any of our open source repositories to see the actions we're using, and feel free to make an issue on any project if you have an idea for an automation of your own. As we increase the number and quality of integrations in our open source repositories, we may update this article or create follow-up posts with more examples.&lt;/p&gt;
&lt;p&gt;If you're interested in learning more about GitHub Actions, GitHub has a wonderful &lt;a href="https://github.com/marketplace?type=actions"&gt;marketplace&lt;/a&gt; of available actions you can explore, and the &lt;a href="https://docs.github.com/actions"&gt;documentation for actions&lt;/a&gt; is available in several languages.&lt;/p&gt;
</content></entry><entry><title>Overview of the GSoC 2020 Project</title><link href="http://opensource.creativecommons.org/blog/entries/overview-of-the-gsoc-2020-project/" rel="alternate"></link><updated>2020-08-26T00:00:00Z</updated><author><name>['charini']</name></author><id>urn:uuid:4cdfc111-c714-33e1-b521-390d487d46da</id><content type="html">&lt;p&gt;This is my final blog post under the &lt;a href="/blog/entries/overview-of-the-gsoc-2020-project/#series"&gt;GSoC 2020: CC catalog&lt;/a&gt; series, where I will highlight and
summarize my contributions to Creative Commons (CC) as part of my GSoC project. The CC Catalog project collects and
stores CC licensed images scattered across the internet, such that they can be made accessible to the general public via
the &lt;a href="https://ccsearch.creativecommons.org/"&gt;CC Search&lt;/a&gt; and &lt;a href="https://api.creativecommons.engineering/v1/"&gt;CC Catalog API&lt;/a&gt; tools. I got the opportunity to work on different aspects of the
CC Catalog repository which ultimately enhances the user experience of the CC Search and CC Catalog API tools. My
primary contributions during GSoC, and the related pull requests (PRs), are as follows.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sub-provider retrieval&lt;/strong&gt;: The first task I completed as part of my GSoC project was the retrieval of sub-providers
(also known as &lt;em&gt;source&lt;/em&gt;) such that images could be categorised under these sources, ensuring an enhanced search
experience for the users. I completed the implementation of sub-provider retrieval for three providers: Flickr,
Europeana, and Smithsonian. If you are interested in learning how the retrieval logic works, please check my
&lt;a href="/blog/entries/flickr-sub-provider-retrieval/"&gt;initial blog post&lt;/a&gt; of this series. The PRs related to this task are as follows.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PR #&lt;a href="https://github.com/cc-archive/cccatalog/pull/420"&gt;420&lt;/a&gt;: Retrieve sub-providers within Flickr&lt;/li&gt;
&lt;li&gt;PR #&lt;a href="https://github.com/cc-archive/cccatalog/pull/442"&gt;442&lt;/a&gt;: Retrieve sub-providers within Europeana&lt;/li&gt;
&lt;li&gt;PR #&lt;a href="https://github.com/cc-archive/cccatalog/pull/455"&gt;455&lt;/a&gt;: Retrieve sub-providers within Smithsonian&lt;/li&gt;
&lt;li&gt;PR #&lt;a href="https://github.com/cc-archive/cccatalog/pull/461"&gt;461&lt;/a&gt;: Add new source as a sub-provider of Flickr&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Alert updates to Smithsonian unit codes&lt;/strong&gt;: For the Smithsonian provider, we rely on the field known as &lt;em&gt;unit code&lt;/em&gt;
to determine the sub-provider (for Smithsonian it is often a museum) each image belongs to. However, it is possible for
the &lt;em&gt;unit code&lt;/em&gt; values to change over time at the upstream, and if CC is unaware of these changes, it could hinder the
successful categorisation of Smithsonian images under unique sub-provider values. I have therefore introduced a
mechanism of alerting the CC code maintainers of potential changes to &lt;em&gt;unit code&lt;/em&gt; values at the upstream. More
information is provided in my &lt;a href="/blog/entries/smithsonian-unit-code-update/"&gt;second blog post&lt;/a&gt; of this series. The PR related to this task
is #&lt;a href="https://github.com/cc-archive/cccatalog/pull/465"&gt;465&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Improvements to the Smithsonian provider API script&lt;/strong&gt;: Smithsonian is an important provider which aggregates images
from 19 museums. However, due to the fact that the different museums have different data models and the resultant
incompatibility of the JSON responses returned from requests to the Smithsonian API, it is difficult to know which
fields to rely on to obtain the information necessary for CC. This results in CC missing out on certain important
information. As part of my GSoC project, I improved the completeness of &lt;em&gt;creator&lt;/em&gt; and &lt;em&gt;description&lt;/em&gt; information, by
identifying previously unknown fields from which these details could be retrieved. Even though my improvements did not
result in the identification of a comprehensive list of fields, the completeness of data was considerably improved for
some Smithsonian museums compared to how it was before. For more context about this issue please refer to the ticket
#&lt;a href="https://github.com/cc-archive/cccatalog/issues/397"&gt;397&lt;/a&gt;. Apart from improving information of Smithsonian data, I was also able to identify issues with certain
Smithsonian API responses which did not contain mandatory information for some of the museums. We have informed the
Smithsonian technical team of these issues and they are highlighted in ticket #&lt;a href="https://github.com/cc-archive/cccatalog/issues/397"&gt;397&lt;/a&gt; as well. The PRs related
to this task are as follows.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PR #&lt;a href="https://github.com/cc-archive/cccatalog/pull/474"&gt;474&lt;/a&gt;: Improve the creator and description information of the Smithsonian source &lt;em&gt;National Museum of
Natural History&lt;/em&gt; (NMNH). This is the largest museum (source) under the Smithsonian provider.&lt;/li&gt;
&lt;li&gt;PR #&lt;a href="https://github.com/cc-archive/cccatalog/pull/476"&gt;476&lt;/a&gt;: Improve the &lt;em&gt;creator&lt;/em&gt; and &lt;em&gt;description&lt;/em&gt; information of other sources coming under the Smithsonian
provider.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Expiration of outdated images&lt;/strong&gt;: The final task I completed as part of my GSoC project was implementing a strategy
for expiring outdated images in the CC database. CC has a mechanism for keeping the images they have retrieved from
providers up-to-date, based on how old an image is. This is called the &lt;a href="/blog/entries/date-partitioned-data-reingestion/"&gt;re-ingestion strategy&lt;/a&gt;,
where newer images are updated more frequently compared to older images. However, this re-ingestion strategy does not
detect images which have been deleted at the upstream. Thus, it is possible that some of the images stored in the CC
database are obsolete, which could result in broken links being presented via the &lt;a href="https://ccsearch.creativecommons.org/"&gt;CC Search&lt;/a&gt; tool. As a
solution, I have implemented a mechanism of identifying whether images in the CC database are obsolete by looking at the
&lt;em&gt;updated_on&lt;/em&gt; column value of the CC image table. Depending on the re-ingestion strategy per provider, we can know
the oldest &lt;em&gt;updated_on&lt;/em&gt; value an image can have. If the &lt;em&gt;updated_on&lt;/em&gt; value is older than the oldest valid value, we
flag the corresponding image record as obsolete. The PR related to this task is #&lt;a href="https://github.com/cc-archive/cccatalog/pull/483"&gt;483&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
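&lt;p&gt;The expiration check in the last item can be sketched roughly as follows. This is only an illustration in plain Python, not the code from the PR: the provider names, the gap values in &lt;code&gt;MAX_REFRESH_GAP_DAYS&lt;/code&gt;, the record layout, and the function name &lt;code&gt;find_obsolete&lt;/code&gt; are all assumptions made for the example.&lt;/p&gt;

```python
from datetime import date, timedelta

# Hypothetical values: the longest gap (in days) between two refreshes of the
# same image under each provider's re-ingestion strategy.
MAX_REFRESH_GAP_DAYS = {"flickr": 180, "europeana": 90}


def find_obsolete(records, today):
    """Return identifiers of records not refreshed within the allowed gap."""
    obsolete = []
    for record in records:
        gap = timedelta(days=MAX_REFRESH_GAP_DAYS[record["provider"]])
        oldest_valid = today - gap
        # An updated_on value older than the oldest valid value means the
        # image was not seen during re-ingestion, so it is flagged.
        if oldest_valid > record["updated_on"]:
            obsolete.append(record["identifier"])
    return obsolete


# Made-up sample rows, not real CC Catalog data.
sample = [
    {"identifier": "img-1", "provider": "flickr", "updated_on": date(2020, 8, 1)},
    {"identifier": "img-2", "provider": "flickr", "updated_on": date(2019, 1, 1)},
    {"identifier": "img-3", "provider": "europeana", "updated_on": date(2020, 3, 1)},
]
print(find_obsolete(sample, today=date(2020, 8, 26)))
```

&lt;p&gt;With these made-up rows, only the images whose &lt;em&gt;updated_on&lt;/em&gt; date falls before the provider's cut-off are reported.&lt;/p&gt;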
&lt;p&gt;I will continue to take responsibility for maintaining my code in the CC Catalog repository, and I hope to continue
contributing to the CC codebase. It has been a wonderful GSoC journey for me, and special thanks go to my supervisor
Brent for his guidance.&lt;/p&gt;
</content></entry><entry><title>Linked Commons: Data Update</title><link href="http://opensource.creativecommons.org/blog/entries/linked-commons-data-update/" rel="alternate"></link><updated>2020-08-25T00:00:00Z</updated><author><name>['subhamX']</name></author><id>urn:uuid:bf2032b7-d1c2-320b-9cf1-92ad64320a02</id><content type="html">&lt;p&gt;In this blog, I will be explaining the task we were working on for the last 3-4 weeks. It will take you on a journey of optimizations, from millions of graph traversals in building the database to just a few traversals in the end. We will also cover the new architecture for the upcoming version of the Linked Commons and the reason behind the change.&lt;/p&gt;
&lt;h2 id="where-does-it-fit"&gt;Where does it fit?&lt;/h2&gt;&lt;p&gt;So far, the Linked Commons has used only a tiny subset of the data available in the CC Catalog. One of the primary targets of our team was to update the data. If you look closely, all our tasks so far, from adding "Graph Filtering Methods" to the "Autocomplete Feature", were bringing us closer to this one: the much-awaited &lt;strong&gt;"Scale the Data of Linked Commons"&lt;/strong&gt;. We aim to grow the Linked Commons project from around &lt;strong&gt;400 nodes and 500 links&lt;/strong&gt; in the current version to around &lt;strong&gt;235k nodes and 4.14 million links&lt;/strong&gt;. An addition of new data on this scale is one of a kind for the project, which makes this task very challenging and exciting.&lt;/p&gt;
&lt;h2 id="pilot"&gt;Pilot&lt;/h2&gt;&lt;p&gt;The raw CC Catalog data cannot be used directly in the Linked Commons. Our first task involves processing it, which includes removing isolated nodes, etc. You can read more about it in the data processing series &lt;a href="/blog/entries/cc-datacatalog-data-processing/"&gt;blog&lt;/a&gt; written by my mentor Maria. After this, we need to build a database which stores the &lt;strong&gt;"distance list"&lt;/strong&gt; of all the nodes.&lt;/p&gt;
&lt;h3 id="what-is-distance-list"&gt;What is "distance list"?&lt;/h3&gt;&lt;div style="text-align: center; width: 90%; margin-left: 5%;"&gt;
    &lt;figure&gt;
        &lt;img src="distance-list.png" alt="Distance List" style="border: 1px solid black"&gt;
        &lt;figcaption&gt;Distance list representation* of the node 'icij' part of a hypothetical graph&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/div&gt;&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Distance List&lt;/strong&gt; is a method of graph representation. It is similar to &lt;a href="https://en.wikipedia.org/wiki/Adjacency_list"&gt;Adjacency List&lt;/a&gt; representation of graphs but instead of storing data of just immediate neighbouring nodes, "distance list" groups all vertices based on their distance from the root node and stores this grouped data for every vertex in the graph. In short, "distance list" is a more general form of the Adjacency List representation.&lt;/p&gt;
&lt;p&gt;To build this "distance list", we created a script, let’s call it &lt;strong&gt;build-dB-script.py&lt;/strong&gt;, which runs the &lt;a href="https://en.wikipedia.org/wiki/Breadth-first_search"&gt;Breadth-First Search (BFS)&lt;/a&gt; algorithm from every node to traverse the graph and gradually build the distance list. The node filtering feature of our web page connects to the server, which uses the aforementioned database and serves a smaller chunk of data.&lt;/p&gt;
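&lt;p&gt;To make the idea concrete, here is a minimal sketch of how a distance list for a single root node could be built with BFS. It is not the actual build script: the function name &lt;code&gt;build_distance_list&lt;/code&gt;, the plain-dictionary graph format, and the toy node names are assumptions made for illustration only.&lt;/p&gt;

```python
from collections import deque


def build_distance_list(graph, root, max_distance):
    """Group the nodes reachable from `root` by their BFS distance.

    `graph` is a plain adjacency list: {node: [neighbour, ...]}.
    Returns {distance: [nodes at that distance]} up to `max_distance`.
    """
    distances = {root: 0}
    queue = deque([root])
    grouped = {}
    while queue:
        node = queue.popleft()
        depth = distances[node]
        if depth == max_distance:
            continue  # do not expand beyond the requested distance
        for neighbour in graph.get(node, []):
            if neighbour not in distances:
                distances[neighbour] = depth + 1
                grouped.setdefault(depth + 1, []).append(neighbour)
                queue.append(neighbour)
    return grouped


# Hypothetical toy graph, not real CC Catalog data.
toy_graph = {
    "icij": ["example.org", "news.example"],
    "example.org": ["blog.example"],
    "news.example": ["blog.example", "icij"],
    "blog.example": [],
}
distance_list = build_distance_list(toy_graph, "icij", max_distance=10)
print(distance_list)
```

&lt;p&gt;Running one such traversal per node is what makes the brute force approach expensive on a large graph.&lt;/p&gt;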
&lt;h2 id="problem"&gt;Problem&lt;/h2&gt;&lt;p&gt;Now that we know where the &lt;em&gt;build-dB-script&lt;/em&gt; is used, let’s discuss the problems with it. The new graph data we are going to use is enormous, with its size running into the millions. Fully traversing a graph with a million nodes, a million times over, is very slow. Just to give some helpful numbers, the script was taking around 10 minutes to process a hundred nodes. Assuming the growth is linear (in the best case), it would take more than &lt;strong&gt;15 days&lt;/strong&gt; to complete the computations. &lt;strong&gt;It is scary, and thus, optimizations in the &lt;em&gt;build-dB-script&lt;/em&gt; are the need of the hour!!&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="optimizations"&gt;Optimizations&lt;/h2&gt;&lt;p&gt;In this section, we will talk about the different versions of the build database script, starting from the brute force BFS method.&lt;/p&gt;
&lt;p&gt;The brute force BFS was the simplest and technically correct solution, but as the name suggests, it was slow. In the next iteration, I stored the details of the last n nodes (10, to be precise) and performed the same old BFS. It was faster, but it had a logic error. Say there is a link from a node to an already visited/traversed node: the script did not record all the nodes which could have been explored through this path. After a few more leaps from Depth-first Search to Breadth-first Search and other methods, we eventually built a new approach with the help of my mentors - &lt;strong&gt;"Sequential dB Build"&lt;/strong&gt;.&lt;/p&gt;
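&lt;p&gt;As a rough check of that estimate, the arithmetic can be written out as below. It assumes the target of around 235k nodes mentioned earlier in this post and strictly linear scaling, both of which are simplifications.&lt;/p&gt;

```python
# Figures quoted in the post; linear scaling is the best-case assumption.
MINUTES_PER_BATCH = 10
NODES_PER_BATCH = 100
TOTAL_NODES = 235_000  # approximate target size of the new graph

total_minutes = TOTAL_NODES / NODES_PER_BATCH * MINUTES_PER_BATCH
total_days = total_minutes / (60 * 24)
print(round(total_days, 1))
```

&lt;p&gt;This works out to a little over 16 days, in line with the "more than 15 days" figure.&lt;/p&gt;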
&lt;p&gt;To keep this blog short, I won’t be going too much into implementation details, but here are some of the critical points.&lt;/p&gt;
&lt;h3 id="key-points-of-the-sequential-db-build"&gt;Key points of the Sequential dB Build:&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;It was faster than all of its predecessors and reduced the script timing significantly.&lt;/li&gt;
&lt;li&gt;In this approach, we aimed to build all the distance lists for distances [1, 2, 3, ..., k-1] before building the kth distance list.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Unfortunately, it was still not enough for our current requirements. Just to give you some insights, the distance two list computation was taking around &lt;strong&gt;4 hours&lt;/strong&gt;, and the &lt;strong&gt;distance three list&lt;/strong&gt; computation was taking &lt;strong&gt;20+ hours&lt;/strong&gt;. This showed that all these optimizations were not enough to handle this big dataset.&lt;/p&gt;
&lt;h2 id="new-architecture"&gt;New Architecture&lt;/h2&gt;&lt;p&gt;As the optimizations in "build-dB-scripts" weren’t enough, we started looking to simplify the current architecture. In the end, we want to have a viable product which scales to this massive data. We are not dropping multi-distance filtering, though: we will continue our research on it and hopefully will have it in &lt;strong&gt;Linked Commons 3.0&lt;/strong&gt;. 😎&lt;/p&gt;
&lt;p&gt;For any node, a person is most likely to want to know its immediate neighbours, including the nodes that link to it. Nodes at a distance greater than one reveal much less about the reach and connectivity of the root node. Because of this, we decided to change our current logic of keeping the distance list up to 10; instead, we reduced it to 1 and also stored the list of immediate incoming nodes (nodes which are at distance 1 in the &lt;a href="https://en.wikipedia.org/wiki/Transpose_graph"&gt;transpose graph&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;This small change in the design simplified a lot of things, and the new graph build now takes around 2 minutes. At the time of writing this blog, we have upgraded our database from &lt;strong&gt;shelve to MongoDB&lt;/strong&gt;, which has reduced the build time further. 🔥🔥&lt;/p&gt;
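&lt;p&gt;A minimal sketch of this simplified structure, assuming the graph is available as a list of (source, target) link pairs, could look like the following. The function name &lt;code&gt;build_neighbour_lists&lt;/code&gt; and the toy edges are hypothetical, and the real project stores a more complex schema.&lt;/p&gt;

```python
def build_neighbour_lists(edges):
    """Return (outgoing, incoming) distance-1 neighbour lists.

    `edges` is an iterable of (source, target) pairs. The incoming list of a
    node is its outgoing list in the transpose graph.
    """
    outgoing = {}
    incoming = {}
    for source, target in edges:
        outgoing.setdefault(source, []).append(target)
        incoming.setdefault(target, []).append(source)
        # Make sure every node has an entry in both mappings.
        outgoing.setdefault(target, [])
        incoming.setdefault(source, [])
    return outgoing, incoming


# Hypothetical toy links, not real Linked Commons data.
toy_edges = [("a", "b"), ("a", "c"), ("c", "b")]
outgoing, incoming = build_neighbour_lists(toy_edges)
print(outgoing["a"], incoming["b"])
```

&lt;p&gt;Both mappings are filled in a single pass over the links, with no graph traversal at all, which is consistent with the much shorter build time.&lt;/p&gt;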
&lt;div style="text-align: center; width: 90%; margin-left: 5%;"&gt;
    &lt;figure&gt;
        &lt;img src="graph.png" alt="Light Theme" style="border: 1px solid black"&gt;
        &lt;figcaption&gt;Graph showing neighbouring nodes. Incoming links are coloured turquoise and outgoing links are coloured red.&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/div&gt;&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;&lt;p&gt;This task was really challenging and I learnt a lot. It was really mesmerizing to see the &lt;strong&gt;Linked Commons grow and evolve&lt;/strong&gt;. I hope you enjoyed reading this blog. You can follow the project development &lt;a href="https://github.com/cc-archive/cccatalog-dataviz/"&gt;here&lt;/a&gt;, and access the stable version of linked commons &lt;a href="http://dataviz.creativecommons.engineering/"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Feel free to report bugs and suggest features. It will help us improve this project. If you wish to join our team, consider joining our &lt;a href="https://creativecommons.slack.com/channels/cc-dev-cc-catalog-viz"&gt;slack&lt;/a&gt; channel. Read more about our community teams &lt;a href="/community/"&gt;here&lt;/a&gt;. See you in my next blog! 🚀&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;*&lt;em&gt;Linked Commons uses a more complex schema. The picture is just for illustration.&lt;/em&gt;&lt;/p&gt;
</content></entry><entry><title>CC Catalog: wrapping up GSoC20</title><link href="http://opensource.creativecommons.org/blog/entries/cc-catalog-wrapping-gsoc20/" rel="alternate"></link><updated>2020-08-25T00:00:00Z</updated><author><name>['srinidhi']</name></author><id>urn:uuid:ba947438-0d00-32ec-8ba7-acf6f5f15eb5</id><content type="html">&lt;p&gt;With the summer of code coming to an end, this blog post summarises the work done during the last three months. The project I have been working on is to add more provider API scripts to the CC Catalog. The CC Catalog project is responsible for collecting CC licensed images hosted across the web.&lt;/p&gt;
&lt;p&gt;The internship journey has been great, and I was glad to get the opportunity to understand more about how the data pipeline works. My work during the internship mainly involved researching new API providers and checking whether they met the necessary conditions; we then decided on a strategy to crawl each API. The strategy varies by API: some can be partitioned by date, while others have to be paginated. A script is then written for the API according to the strategy.
During the later phase of the internship, I worked on the reingestion strategy for Europeana and on a script to merge Common Crawl tags and metadata into the corresponding image in the image table.&lt;/p&gt;
&lt;p&gt;Provider APIs implemented:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Science Museum: The Science Museum collection has around 60,000 images; it was initially crawled through Common Crawl and then shifted to an API-based crawl.&lt;ul&gt;
&lt;li&gt;Issue: &lt;a href="https://github.com/cc-archive/cccatalog/issues/302"&gt;Science Museum ticket&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Related PRs: &lt;a href="https://github.com/cc-archive/cccatalog/pull/400"&gt;Science Museum script&lt;/a&gt;, &lt;a href="https://github.com/cc-archive/cccatalog/pull/411"&gt;Science Museum workflow&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li&gt;Statens Museum: Statens Museum for Kunst is Denmark’s leading museum for artwork. This is a new integration, and 39115 images have been collected.&lt;ul&gt;
&lt;li&gt;Issue: &lt;a href="https://github.com/cc-archive/cccatalog/issues/393"&gt;Statens Museum ticket&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Related PRs: &lt;a href="https://github.com/cc-archive/cccatalog/pull/428"&gt;Statens Museum implementation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li&gt;Museums Victoria: It was initially ingested from Common Crawl and later shifted to an API-based crawl. It has around 140,000 images.&lt;ul&gt;
&lt;li&gt;Issue: &lt;a href="https://github.com/cc-archive/cccatalog/issues/291"&gt;Museums Victoria ticket&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Related PRs: &lt;a href="https://github.com/cc-archive/cccatalog/pull/447"&gt;Museums Victoria implementation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li&gt;NYPL: New York Public Library is a new integration; as of now, it has around 1296 images.&lt;ul&gt;
&lt;li&gt;Issue: &lt;a href="https://github.com/cc-archive/cccatalog/issues/147"&gt;NYPL ticket&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Related PRs: &lt;a href="https://github.com/cc-archive/cccatalog/pull/462"&gt;NYPL implementation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li&gt;Brooklyn Museum: This was an existing integration; changes were made to follow the new &lt;code&gt;ImageStore&lt;/code&gt; and &lt;code&gt;DelayedRequestor&lt;/code&gt; classes. It has 61503 images.&lt;ul&gt;
&lt;li&gt;Issue: &lt;a href="https://github.com/cc-archive/cccatalog/issues/348"&gt;Brooklyn Museum ticket&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Related PRs: &lt;a href="https://github.com/cc-archive/cccatalog/pull/355"&gt;Brooklyn Museum implementation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Iconfinder is a provider of icons that could not be integrated as the current strategy of ingestion is very slow and we need a better strategy.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Issue: &lt;a href="https://github.com/cc-archive/cccatalog/issues/396"&gt;Iconfinder ticket&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="europeana-reingestion-strategy"&gt;Europeana reingestion strategy&lt;/h2&gt;&lt;p&gt;Data from Europeana was collected on a daily basis, and there was a need to refresh it. The idea is that new data should be refreshed more frequently, and as the data gets older, refreshing should become less frequent. While developing the strategy, the API key limit and the maximum expected collection size had to be kept in mind. Considering these factors, a workflow was set up such that each day it crawls 59 days of data.
The 59 days were split up into layers. The DAG crawls data up to one week old daily, data between one week and one year old monthly, and anything older than a year every three months.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Issue: &lt;a href="https://github.com/cc-archive/cccatalog/issues/412"&gt;Europeana reingestion ticket&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Related PR: &lt;a href="https://github.com/cc-archive/cccatalog/pull/473"&gt;Europeana reingestion strategy&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;More details regarding the math of reingestion: &lt;a href="/blog/entries/date-partitioned-data-reingestion/"&gt;Data reingestion&lt;/a&gt;&lt;/p&gt;
&lt;div style="text-align:center;"&gt;
    &lt;img src="dag_image_1.png" width="1000px"/&gt;
    &lt;img src="dag_image_2.png" width="1000px"/&gt;
    &lt;img src="dag_image_3.png" width="1000px"/&gt;
    &lt;p&gt;Europeana reingestion workflow&lt;/p&gt;
&lt;/div&gt;&lt;h2 id="merging-common-crawl-tags"&gt;Merging Common Crawl tags&lt;/h2&gt;&lt;p&gt;When a provider is shifted from Common Crawl to an API-based crawl, the new data from the API doesn’t have the tags and metadata that were generated using Clarifai, so there is a need to associate the new data with the tags for that image from the Common Crawl data. A direct URL match is not possible because the Common Crawl URLs and the API image URLs are different, so we try to match on the number or identifier that is associated with the URL.&lt;/p&gt;
&lt;p&gt;Currently the merging logic is applied to Science Museum, Museums Victoria and Met Museum.&lt;/p&gt;
&lt;p&gt;In Science Museum, the API URL in the image table looks like &lt;a href="https://coimages.sciencemuseumgroup.org.uk/images/240/862/large_BAB_S_1_02_0017.jpg"&gt;https://coimages.sciencemuseumgroup.org.uk/images/240/862/large_BAB_S_1_02_0017.jpg&lt;/a&gt; and the CC URL looks like &lt;a href="https://s3-eu-west-1.amazonaws.com/smgco-images/images/369/541/medium_SMG00096855.jpg"&gt;https://s3-eu-west-1.amazonaws.com/smgco-images/images/369/541/medium_SMG00096855.jpg&lt;/a&gt;. The idea is to reduce each URL to its last identifier-like component (the file name without its size prefix, reversed), so after modification by the &lt;code&gt;modify_urls&lt;/code&gt; function it looks like &lt;code&gt;gpj.7100_20_1_S_BAB_&lt;/code&gt; (API URL) and &lt;code&gt;gpj.55869000GMS_&lt;/code&gt; (CC URL).
Similar logic has been applied to Met Museum and Museums Victoria.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Issue: &lt;a href="https://github.com/cc-archive/cccatalog/issues/468"&gt;https://github.com/cc-archive/cccatalog/issues/468&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Related PR: &lt;a href="https://github.com/cc-archive/cccatalog/pull/478"&gt;https://github.com/cc-archive/cccatalog/pull/478&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
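&lt;p&gt;The URL reduction described above can be illustrated with a small sketch. This is not the actual &lt;code&gt;modify_urls&lt;/code&gt; function: the name &lt;code&gt;url_match_key&lt;/code&gt; is hypothetical, and the rule used here (take the file name, drop the size prefix before the first underscore, reverse the rest) is inferred from the CC URL example.&lt;/p&gt;

```python
def url_match_key(url):
    """Reduce an image URL to a reversed file-name key without its size prefix."""
    file_name = url.rsplit("/", 1)[-1]  # e.g. medium_SMG00096855.jpg
    size_prefix, _, rest = file_name.partition("_")  # split off "medium"
    identifier = "_" + rest  # _SMG00096855.jpg
    return identifier[::-1]  # reversed, so it starts with the extension


cc_url = "https://s3-eu-west-1.amazonaws.com/smgco-images/images/369/541/medium_SMG00096855.jpg"
print(url_match_key(cc_url))
```

&lt;p&gt;Because the size prefix is dropped, a &lt;code&gt;large_&lt;/code&gt; and a &lt;code&gt;medium_&lt;/code&gt; rendition of the same image reduce to the same key, which is what makes the match possible.&lt;/p&gt;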
&lt;h2 id="acknowledgement"&gt;Acknowledgement&lt;/h2&gt;&lt;p&gt;I would like to thank my mentors Brent and Anna for their guidance throughout the internship.&lt;/p&gt;
</content></entry><entry><title>X5GON Using CC Catalog API for Image Results</title><link href="http://opensource.creativecommons.org/blog/entries/2020-08-x5gon-cc-catalog-api/" rel="alternate"></link><updated>2020-08-24T00:00:00Z</updated><author><name>['annatuma']</name></author><id>urn:uuid:ffcc37e0-31ad-3231-b583-73749555ba0b</id><content type="html">&lt;p&gt;A few months ago, the Open Education team at Creative Commons made an introduction between the folks working on X5GON and CC Search.&lt;/p&gt;
&lt;p&gt;Over a few conversations, we quickly discovered that there are many parallels in how we're approaching our work, and some important differences that would allow each of us to benefit from cooperation.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.x5gon.org/"&gt;X5GON&lt;/a&gt; is building an AI-driven platform, focused on delivery of open education resources (OER). At its core, it is building a catalog of OER, upon which other &lt;a href="https://www.x5gon.org/platforms/services/"&gt;services&lt;/a&gt; are based, such as analytics for personalized recommendations, and a discovery engine. By aggregating relevant content, curating it with the use of artificial intelligence and machine learning, and personalizing the experience to each learner, they're making OER more accessible and relevant.&lt;/p&gt;
&lt;p&gt;CC Search is not yet ready to ingest content types beyond images, but when we are able to do so, we plan to integrate via API with X5GON in order to serve OER that is made available in formats we will support in the future, starting with audio.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://discovery.x5gon.org/"&gt;X5GON Discovery search engine&lt;/a&gt; allows users to find OER in video, audio, and text formats - and now, with the integration of results powered by the CC Catalog API, which also powers CC Search, users can also find openly licensed images for relevant educational queries. This is a great resource for educators and learners from all over the world.&lt;/p&gt;
&lt;p&gt;Try it for yourself, or look at these results for making &lt;a href="https://discovery.x5gon.org/search?q=geometry&amp;amp;type=Image"&gt;geometry&lt;/a&gt; visual and fun!&lt;/p&gt;
</content></entry><entry><title>How to politely crawl and analyze 500 million images</title><link href="http://opensource.creativecommons.org/blog/entries/crawling-500-million/" rel="alternate"></link><updated>2020-08-17T00:00:00Z</updated><author><name>['aldenpage']</name></author><id>urn:uuid:c4bde8a8-0a5d-324a-b450-571f43e3af02</id><content type="html">&lt;h4 id="background"&gt;Background&lt;/h4&gt;&lt;p&gt;The goal of &lt;a href="https://search.creativecommons.org"&gt;CC Search&lt;/a&gt; is to index all of the Creative Commons works on the internet, starting with images. We have indexed over 500 million images, which we believe is roughly 36% of all CC licensed content on the internet by &lt;a href="https://creativecommons.org/2018/05/08/state-of-the-commons-2017/"&gt;our last count&lt;/a&gt;. To further enhance the usefulness of our search tool, we recently started crawling and analyzing images for improved search results. This article will discuss the process of taking a paper design for a large scale crawler, implementing it, and putting it in production, with a few idealized code snippets and diagrams along the way. The full source code can be viewed on &lt;a href="https://github.com/creativecommons/image-crawler"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Originally, when we discovered an image and inserted it into CC Search, we didn't even bother downloading it; we stuck the URL in our database and embedded the image in our search results. This approach has a lot of problems:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;We don't know the dimensions or compression quality of images, which is useful both for relevance purposes (de-ranking low-quality images) and for filtering. For example, some users are only interested in high-resolution images and would like to exclude content below a certain size.&lt;/li&gt;
&lt;li&gt;We can't run any type of computer vision analysis on any of the images, which could be useful for enriching search metadata through object recognition.&lt;/li&gt;
&lt;li&gt;Embedding third party content is fraught with problems. What if the other party's server goes down, the images disappear due to link rot, or their TLS certificates expire? Each of these situations results in broken images appearing in the search results or browser alerts about degraded security.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We solved (3) by setting up a &lt;a href="https://github.com/willnorris/imageproxy"&gt;caching thumbnail proxy&lt;/a&gt; between images in the search results and their 3rd party origin, as well as some last-minute liveness checks to make sure that the image hasn't 404'd.&lt;/p&gt;
&lt;p&gt;(1) and (2), however, are not possible to solve without actually downloading the image and performing some analysis on the contents of the file. For us to reproduce the features that users take for granted in image search, we're going to need a fairly powerful crawling system.&lt;/p&gt;
&lt;p&gt;On the scale of several thousand images, it would be easy to cobble together a few scripts to spit out this information, but with half a billion images, there are a lot of hurdles to overcome.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We want to crawl &lt;a href="https://en.wikipedia.org/wiki/Web_crawler#Politeness_policy"&gt;politely&lt;/a&gt;; however, the concentration and quantity of images means that we have to hit some sources with a high crawl rate in order to have any hope of finishing the crawl in a reasonable period of time. Our data sources range from non-profit museums with a single staff IT person to tech companies with their own data centers and thousands of employees; the crawl rate has to be tailored to download quickly from the big players but not overwhelm small sources. At the same time, we need to be sure that we are not overestimating any source's capacity and watch for signs that our crawler is straining the server.&lt;/li&gt;
&lt;li&gt;We need to keep the time to process each image as low as possible to make it feasible to finish the crawling and analysis task in a reasonable period of time. This means that the crawling and analysis tasks need to be distributed to multiple machines in parallel.&lt;/li&gt;
&lt;li&gt;A lot of metadata will be produced by this crawler. The step of integrating it with our internal systems needs to not block resizing tasks. That suggests that a message bus will be necessary to buffer messages before they are written into our data layer, where writes can be expensive.&lt;/li&gt;
&lt;li&gt;We want to have a basic idea of how the crawl is progressing in the form of summaries of error counts, status codes, and crawl rates for each source.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In summary, the challenge here isn't so much making a really fast crawler as it is tailoring the crawl speed to each source. At a minimum, we'll need to deal with concurrency and parallelism, provisioning and managing the life cycle of crawler infrastructure, pipelines for capturing output data, a way to monitor the progress of the crawl, a suite of tests to make sure the system behaves as expected, and a reliable way to enforce a "politeness" policy. That's not a trivial project, particularly for our tiny three-person tech team (of which only one person is available to do all of the crawling work). Can't we just use an off-the-shelf open source crawler?&lt;/p&gt;
&lt;h4 id="what-about-existing-open-source-crawlers"&gt;What about existing open source crawlers?&lt;/h4&gt;&lt;p&gt;Any decent software engineer will consider existing options before diving into a project and reinventing the wheel. My assessment was that although there are a lot of open source crawling frameworks available, few of them focus on images, some are not actively maintained, and all would require extensive customization to meet the requirements of our crawl strategy. Further, many solutions are more complex than our use case demands and would significantly expand our use of cloud infrastructure, resulting in higher expenses and more operational headaches. I experimented with Apache Nutch, Scrapy Cluster, and Frontera; none of the existing options looked quite right for our use case.&lt;/p&gt;
&lt;p&gt;As a reminder, we want to eventually crawl every single Creative Commons work on the internet. Effective crawling is central to the capabilities that our search engine is able to provide. In addition to being central to achieving high quality image search, crawling could also be useful for discovering new Creative Commons content of any type on any website. In my view, that's a strong argument for spending some time designing a custom crawling solution where we have complete end-to-end control of the process, as long as the feature set is limited in scope. In the next section, we'll assess the effort required to build a crawler from the ground up.&lt;/p&gt;
&lt;h4 id="designing-the-crawler"&gt;Designing the crawler&lt;/h4&gt;&lt;p&gt;We know we're not going to be able to crawl 500 million images with one virtual machine and a single IP address, so it is obvious from the start that we are going to need a way to distribute the crawling and analysis tasks over multiple machines. A basic queue-worker architecture will do the job here; when we want to crawl an image, we can dispatch the URL to an inbound images queue, and a worker eventually pops that task out and processes it. Kafka will handle all of the hard work of partitioning and distributing the tasks between workers.&lt;/p&gt;
&lt;p&gt;The worker processes do the actual analysis of the images, which essentially entails downloading the image, extracting interesting properties, and sticking the resulting metadata back into a Kafka topic for later downstream processing. The worker will also have to include some instrumentation for conforming to rate limits and error reporting.&lt;/p&gt;
&lt;p&gt;We also know that we will need to share some information about crawl progress between worker processes, such as whether we've exceeded our prescribed rate limit for a website, the number of times we've seen a status code in the last minute, how many images we've processed so far, and so on. Since we're only interested in sharing application state and aggregate statistics, a lightweight key/value store like Redis seems like a good fit.&lt;/p&gt;
&lt;p&gt;Finally, we need a supervising process that centrally controls the crawl. This key governing process will be responsible for making sure our crawler workers are behaving properly by moderating crawl rates for each source, taking action in the face of errors, and reporting statistics to the operators of the crawler. We'll call this process the crawl monitor.&lt;/p&gt;
&lt;p&gt;Here's a rough sketch of how things will work:&lt;/p&gt;
&lt;p&gt;&lt;img src="/blog/entries/crawling-500-million/image_crawler_simplified.png" alt="Diagram"&gt;&lt;/p&gt;
&lt;p&gt;At a high level, the problem of building a fast crawler seems solvable for our team, even on the scale of several hundred million images. If we can sustain a crawl and analysis rate of 200 images per second, we could crawl all 500 million images in about a month.&lt;/p&gt;
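&lt;p&gt;As a quick sanity check on that estimate, the arithmetic can be written out as a few lines of Python. The function and constant names are only illustrative; the figures are the ones quoted above.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;SECONDS_PER_DAY = 24 * 60 * 60


def crawl_days(total_images, images_per_second):
    # Days needed to process every image at a constant sustained rate
    return total_images / images_per_second / SECONDS_PER_DAY


# 500 million images at 200 images per second
print(f"{crawl_days(500_000_000, 200):.1f} days")
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This assumes the rate is sustained around the clock with no pauses, so it is a lower bound rather than a schedule.&lt;/p&gt;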
&lt;p&gt;In the next section, we'll examine some of the key components that make up the crawler.&lt;/p&gt;
&lt;h4 id="detailed-breakdown"&gt;Detailed breakdown&lt;/h4&gt;&lt;h5 id="concurrency-with-asyncio"&gt;Concurrency with &lt;code&gt;asyncio&lt;/code&gt;&lt;/h5&gt;&lt;p&gt;Crawling is a massively IO bound task. The workers need to maintain lots of simultaneous open connections with internal systems like Kafka and Redis as well as 3rd party websites holding the target images. Once we have the image in memory, performing our actual analysis task is easy and cheap. For these reasons, an asynchronous approach seems more attractive than using multiple threads of execution. Even if our image processing task grows in complexity and becomes CPU bound, we can get the best of both worlds by offloading heavyweight tasks to a process pool. See "&lt;a href="https://docs.python.org/3/library/asyncio-dev.html#running-blocking-code"&gt;Running Blocking Code&lt;/a&gt;" in the &lt;code&gt;asyncio&lt;/code&gt; docs for more details.&lt;/p&gt;
&lt;p&gt;Another reason that an asynchronous approach may be desirable is that we have several interlocking components which need to react to events in real-time: our crawl monitoring process needs to simultaneously control the rate limiting process and also interrupt crawling if errors are detected, while our worker processes need to consume crawl events, process images, upload thumbnails, and produce events documenting the metadata of each image. Coordinating all of these components through inter-process communication could be difficult, but breaking up tasks into small pieces and yielding to the event loop is comparatively easy.&lt;/p&gt;
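&lt;p&gt;The idea of offloading heavyweight work from the event loop to a pool can be sketched roughly as follows. This is a minimal illustration rather than the crawler's actual code: &lt;code&gt;estimate_quality&lt;/code&gt; is a hypothetical stand-in for a CPU-heavy analysis step, and the executor is passed in by the caller.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;import asyncio


def estimate_quality(image_bytes):
    # Hypothetical placeholder for an expensive, blocking analysis step
    return sum(image_bytes) % 101


async def analyze(image_bytes, executor):
    loop = asyncio.get_running_loop()
    # run_in_executor hands the blocking call to the pool and returns an
    # awaitable, so the event loop stays free to service other tasks
    return await loop.run_in_executor(executor, estimate_quality, image_bytes)


async def analyze_many(images, executor):
    # Schedule every analysis concurrently; results keep the input order
    return await asyncio.gather(*(analyze(image, executor) for image in images))
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Any &lt;code&gt;concurrent.futures&lt;/code&gt; executor can be passed in. For genuinely CPU bound work that would be a &lt;code&gt;ProcessPoolExecutor&lt;/code&gt;, in which case the offloaded function and its arguments have to be picklable.&lt;/p&gt;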
&lt;h5 id="the-resize-task"&gt;The resize task&lt;/h5&gt;&lt;p&gt;This is the most vital part of our crawling system: the part that actually does the work of fetching and processing an image. As established previously, we need to execute this task concurrently, so everything needs to be defined with &lt;code&gt;async&lt;/code&gt;/&lt;code&gt;await&lt;/code&gt; syntax to allow the event loop to multitask. The actual task itself is otherwise straightforward.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Download the remote image and load it into memory.&lt;/li&gt;
&lt;li&gt;Extract the resolution and compression quality.&lt;/li&gt;
&lt;li&gt;Thumbnail the image for later computer vision analysis and upload it to S3.&lt;/li&gt;
&lt;li&gt;Write the information we've discovered to a Kafka topic.&lt;/li&gt;
&lt;li&gt;Report success/errors to Redis in aggregate.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;See &lt;a href="https://github.com/creativecommons/image-crawler/blob/master/worker/image.py"&gt;image.py&lt;/a&gt; for the nitty-gritty details.&lt;/p&gt;
&lt;h4 id="rate-limiting-with-token-buckets-and-error-circuit-breakers"&gt;Rate limiting with token buckets and error circuit breakers&lt;/h4&gt;&lt;h5 id="how-do-we-determine-the-rate-limit"&gt;How do we determine the rate limit?&lt;/h5&gt;&lt;p&gt;Often, when designing highly concurrent software, the goal is to maximize the throughput and push servers to their absolute limit. The opposite is true with a web crawler, particularly when you are operating a non-profit organization completely reliant on the goodwill of others to exist. We want to be as certain as reasonably possible that we aren't going to knock a resource off the internet with an accidental &lt;a href="https://en.wikipedia.org/wiki/Denial-of-service_attack"&gt;DDoS&lt;/a&gt;. At the same time, we need to crawl as quickly as possible against sources with adequate resources to withstand a heavy crawl, or else we'll never finish. How can we match our crawl rate to a site's capabilities?&lt;/p&gt;
&lt;p&gt;Originally, my plan was to determine this through an adaptive rate limiting strategy, where we would start with a low rate limit and use a hill climbing algorithm to determine the optimal rate. We could track metrics like &lt;a href="https://en.wikipedia.org/wiki/Time_to_first_byte"&gt;time to first byte&lt;/a&gt; (TTFB) and bandwidth speed to determine the exact moment that we have started to strain upstream servers. However, there are a lot of drawbacks here:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It may not be correct to assume that performance will steadily degrade instead of failing all at once.&lt;/li&gt;
&lt;li&gt;We can't detect whether we are the cause of a performance issue or if the host is simply experiencing server trouble due to configuration errors or high traffic. We could get stuck at a suboptimal rate limit due to normal fluctuations in traffic.&lt;/li&gt;
&lt;li&gt;Recording TTFB in Python is difficult because it requires low level access to connection data. We might have to write an extension to &lt;code&gt;aiohttp&lt;/code&gt; to get it.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Eventually I decided that this is too much hassle. Can we get the job done with a simpler strategy?&lt;/p&gt;
&lt;p&gt;It turns out that the size of a website is typically correlated with infrastructure capabilities. The reasoning behind this is that if you are capable of hosting 450MM images, you are probably able to handle at least a couple hundred requests per second for serving traffic. In our case, we already know how many images a source has, so it's easy for us to peg our rate limit between a low minimum for small websites and a reasonable maximum for large websites, and then interpolate the rate limit for everything in between.&lt;/p&gt;
&lt;p&gt;Of course, it's important to note that this is only a rough heuristic that we use to make a reasonable guess about what a website can handle. We have to allow the possibility that we set our rate limit too aggressively in spite of our precautions.&lt;/p&gt;
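&lt;p&gt;A minimal sketch of that heuristic is shown below. The thresholds and rates are hypothetical placeholders, and straight-line interpolation is an assumption made for illustration; the same shape works with any other curve between the two bounds.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;# All four constants are hypothetical values chosen for illustration
MIN_RATE = 1.0               # requests per second for the smallest sources
MAX_RATE = 200.0             # requests per second for the largest sources
SMALL_SOURCE = 10_000        # image count at or below which MIN_RATE applies
LARGE_SOURCE = 500_000_000   # image count at or above which MAX_RATE applies


def rate_limit_for(image_count):
    # Position of this source between the small and large thresholds
    fraction = (image_count - SMALL_SOURCE) / (LARGE_SOURCE - SMALL_SOURCE)
    # Clamp so tiny and huge sources are pinned to the minimum and maximum
    fraction = min(1.0, max(0.0, fraction))
    return MIN_RATE + fraction * (MAX_RATE - MIN_RATE)
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With these placeholder values, a source holding a few hundred images is crawled at the minimum rate, and anything at or beyond the large threshold is capped at the maximum.&lt;/p&gt;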
&lt;h5 id="backing-off-with-circuit-breakers"&gt;Backing off with circuit breakers&lt;/h5&gt;&lt;p&gt;If our heuristic fails to correctly approximate the bandwidth capabilities of a site, we are going to start encountering problems. For one, we might exceed the server-side rate limit, which means we will see &lt;code&gt;429 Too Many Requests&lt;/code&gt; and &lt;code&gt;403 Forbidden&lt;/code&gt; errors instead of the images we're trying to crawl. Worse yet, the upstream source might continue to happily serve requests while we suck up all of their traffic capacity, resulting in other users being unable to view the images. Clearly, in either scenario, we need to either reduce our crawl rate or even give up crawling the source entirely if it appears that we are impacting their uptime.&lt;/p&gt;
&lt;p&gt;To handle these situations, we have two tools in our toolbox: a sliding window recording the status code of every request we've made to each domain in the last 60 seconds, and a list of the last 50 statuses for each website. If the share of errors in our one-minute window exceeds 10%, something is wrong; we should wait a minute before trying again. If we have encountered many errors in a row, however, that suggests that we're having trouble with a particular site, so we ought to give up crawling the source and raise an alert.&lt;/p&gt;
&lt;p&gt;Workers can keep track of this information in sorted sets in Redis. For the sliding error window, we'll sort each request by its timestamp, which will make it easy and cheap for us to expire status codes beyond the sliding window interval. Maintaining a list of the last N response codes is even easier; we just stick the status code in a list associated with the source.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;StatsManager&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;redis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;known_sources&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_record_window_samples&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot; Insert a status into all sliding windows. &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
        &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;monotonic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="c1"&gt;# Time-based sliding windows&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;stat_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;WINDOW_PAIRS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;stat_key&lt;/span&gt;&lt;span class="si"&gt;}{&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zadd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;monotonic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="c1"&gt;# Delete events from outside the window&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zremrangebyscore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;-inf&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# &amp;quot;Last n requests&amp;quot; window&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rpush&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;LAST_50_REQUESTS&lt;/span&gt;&lt;span class="si"&gt;}{&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ltrim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;LAST_50_REQUESTS&lt;/span&gt;&lt;span class="si"&gt;}{&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;&lt;center&gt;Collecting status codes in aggregate&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Meanwhile, the crawl monitor process can keep tabs on the contents of each error threshold.&lt;/p&gt;
&lt;p&gt;When more than 10% of the requests made to a source in the last minute are errors, we'll set a halt condition in Redis and stop replenishing rate limit tokens (more on that below).&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;monotonic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;one_minute_window&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;zrangebyscore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;one_minute_window_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;successful&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;one_minute_window&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;EXPECTED_STATUSES&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;successful&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;tolerance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ERROR_TOLERANCE_PERCENT&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;successful&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;successful&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;tolerance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sadd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TEMP_HALTED_SET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;&lt;center&gt;Detecting elevated crawl errors for a source&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;For detecting "serious" errors, where we've seen 50 failed requests in a row, we'll set a permanent halt condition. Someone will have to manually troubleshoot the situation and switch the crawler back on for that source.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;last_50_statuses_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;statuslast50req:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;
&lt;span class="n"&gt;last_50_statuses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_50_statuses_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_50_statuses&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;_every_request_failed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_50_statuses&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sadd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HALTED_SET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;&lt;center&gt;Detecting persistent crawl errors&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In practice, keeping a sliding window for tracking error thresholds and setting reasonable minimum and maximum crawl rates has worked well enough that the circuit breaker has never been tripped.&lt;/p&gt;
&lt;h5 id="enforcing-rate-limits-with-token-buckets"&gt;Enforcing rate limits with token buckets&lt;/h5&gt;&lt;p&gt;It's one thing to set a policy for crawling; it's another thing entirely to actually enforce it. How can we coordinate our multiple crawling processes to prevent them from overstepping our rate limit?&lt;/p&gt;
&lt;p&gt;The answer is to implement a distributed token bucket system. The idea behind this is that each crawler has to obtain a token from Redis before making a request. Every second, the crawl monitor sets a variable containing the number of requests that can be made against a source. Each crawler process decrements the counter before making a request. If the decremented result is above zero, the worker is cleared to crawl. Otherwise, the rate limit has been reached and we should wait until a token has been obtained.&lt;/p&gt;
&lt;p&gt;The beauty of token buckets is their simplicity, performance, and resilience against failure. If our crawl monitor process dies, crawling halts completely; making a request is not possible without first acquiring a token. This is a much better alternative to the guard rails completely disappearing with the crawl monitor and allowing unbounded crawling. Further, since decrementing a counter and retrieving the result is an atomic operation in Redis, there's no risk of race conditions and therefore no need for locking. This is a huge boon for performance, as the overhead of coordinating and blocking on every single request would rapidly bog down our crawling system.&lt;/p&gt;
&lt;p&gt;To ensure that all crawling is performed at the correct speed, I wrapped &lt;code&gt;aiohttp.ClientSession&lt;/code&gt; with a rate limited version of the class.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;RateLimitedClientSession&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;aioclient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aioclient&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;redis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_get_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;token_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;CURRTOKEN_PREFIX&lt;/span&gt;&lt;span class="si"&gt;}{&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;
        &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token_key&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;token_acquired&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Out of tokens&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;token_acquired&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;False&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;token_acquired&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;token_acquired&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;False&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;token_acquired&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;token_acquired&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_get_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Meanwhile, the crawl monitor process is filling up each bucket every second.&lt;/p&gt;
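&lt;p&gt;The monitor's side of the bucket might look roughly like the sketch below. It is not the project's implementation: the value of &lt;code&gt;CURRTOKEN_PREFIX&lt;/code&gt;, the &lt;code&gt;rate_limits&lt;/code&gt; dictionary, and the &lt;code&gt;InMemoryRedis&lt;/code&gt; stand-in (used so the example runs without a Redis server) are assumptions.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;
```python
import asyncio

CURRTOKEN_PREFIX = 'currtokens:'  # assumed value; the post does not show it


class InMemoryRedis:
    """Hypothetical stand-in for an async Redis client."""

    def __init__(self):
        self.store = {}

    async def set(self, key, value):
        self.store[key] = value

    async def decr(self, key):
        self.store[key] = int(self.store.get(key, 0)) - 1
        return self.store[key]


async def refill_buckets(redis, rate_limits):
    """Reset every source's bucket to its per-second request allowance."""
    for source, requests_per_second in rate_limits.items():
        # Overwrite rather than increment so unused tokens never pile up.
        await redis.set(f'{CURRTOKEN_PREFIX}{source}', requests_per_second)


async def monitor_loop(redis, rate_limits, ticks):
    """Refill once per second for a fixed number of ticks."""
    for _ in range(ticks):
        await refill_buckets(redis, rate_limits)
        await asyncio.sleep(1)
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With an allowance of 2, the first two decrements return 1 and 0, so those workers proceed; the third returns -1 and that worker sleeps until the next refill.&lt;/p&gt;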
&lt;h5 id="scheduling-tasks-somewhat-intelligently"&gt;Scheduling tasks (somewhat) intelligently&lt;/h5&gt;&lt;p&gt;The final gotcha in the design of our crawler is that we want to crawl every single website at the same time at its prescribed rate limit. That sounds almost tautological, like something we should be able to take for granted after implementing all of this logic to keep our crawler from working too quickly. It turns out, however, that our crawler's processing capacity is itself a limited and contended resource. We can only schedule so many tasks simultaneously on each worker, and we need to ensure that tasks from a single website aren't starving other sources of crawl capacity.&lt;/p&gt;
&lt;p&gt;For instance, imagine that each worker is able to handle 5000 simultaneous crawling tasks, and every one of those tasks is tied to a tiny website with a very low rate limit. That means that our entire worker, which is capable of handling hundreds of crawl and analysis jobs per second, is stuck making one request per second until some faster tasks appear in the queue.&lt;/p&gt;
&lt;p&gt;In other words, we need to make sure that each worker process isn't jamming itself up with a single source. We have a &lt;a href="https://en.wikipedia.org/wiki/Scheduling_(computing%29"&gt;scheduling problem&lt;/a&gt;. We've naively implemented first-come, first-served scheduling and need to switch to a different strategy.&lt;/p&gt;
&lt;p&gt;There are innumerable ways to address scheduling problems. Since there are only a few dozen sources in our system, we can get away with using a stupid scheduling algorithm: give each source equal capacity in every worker. In other words, if there are 5000 tasks to distribute and 30 sources, we can allocate 166 simultaneous tasks to each source per worker. That's plenty for our purposes. The obvious drawback of this approach is that eventually there will be so many sources that we start starving high rate limit sources of work. We'll cross that bridge when we come to it; it's better to use the simplest possible approach we can get away with instead of spending all of our time on solving hypothetical future problems.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;_schedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task_schedule&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;raw_sources&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;smembers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;inbound_sources&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sources&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;utf-8&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;raw_sources&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;num_sources&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# A source never gets more than 1/4th of the worker&amp;#39;s capacity. This&lt;/span&gt;
        &lt;span class="c1"&gt;# helps prevent starvation of lower rate limit requests and ensures&lt;/span&gt;
        &lt;span class="c1"&gt;# that the first few sources to be discovered don&amp;#39;t get all of the&lt;/span&gt;
        &lt;span class="c1"&gt;# initial task slots.&lt;/span&gt;
        &lt;span class="n"&gt;max_share&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MAX_TASKS&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
        &lt;span class="n"&gt;share&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;floor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MAX_TASKS&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;num_sources&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;max_share&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;to_schedule&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;num_unfinished&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_get_unfinished_tasks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_schedule&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;num_to_schedule&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;share&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;num_unfinished&lt;/span&gt;
            &lt;span class="n"&gt;consumer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_get_consumer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;source_msgs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_consume_n&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_to_schedule&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;to_schedule&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_msgs&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;to_schedule&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;&lt;center&gt;Scheduling tasks for every source&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;
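&lt;p&gt;To make the arithmetic concrete, here is the share calculation pulled out into a standalone function. The function names are made up for this example, and the guards against zero sources and negative slot counts are additions that the scheduler above may handle elsewhere.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;
```python
import math


def per_source_share(max_tasks, num_sources):
    """Task slots that each source may hold on a single worker."""
    if num_sources == 0:
        return 0
    # No source may take more than a quarter of the worker's capacity.
    max_share = max_tasks // 4
    return min(math.floor(max_tasks / num_sources), max_share)


def slots_to_fill(share, num_unfinished):
    """New tasks to pull for a source, never a negative number."""
    return max(share - num_unfinished, 0)


print(per_source_share(5000, 30))  # 166: an equal split among 30 sources
print(per_source_share(5000, 2))   # 1250: capped at a quarter of capacity
print(slots_to_fill(166, 100))     # 66: top the source back up to its share
```
&lt;/pre&gt;&lt;/div&gt;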
&lt;p&gt;The one implementation detail to deal with here is that our workers can't draw from a single inbound images queue anymore; we need to partition each source into its own queue so we can pull tasks from each source when we need it. This partitioning process can be handled transparently by the crawl monitor.&lt;/p&gt;
&lt;p&gt;&lt;img src="/blog/entries/crawling-500-million/image_crawler.png" alt="A more complete diagram"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;center&gt;A more complete diagram showing the system with a queue for each source&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;
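&lt;p&gt;The partitioning step can be illustrated with plain in-memory queues. The real system does this with message queues, so treat the following as a sketch of the idea only; the message shape (a dictionary with &lt;code&gt;source&lt;/code&gt; and &lt;code&gt;url&lt;/code&gt; keys) and both function names are assumptions.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;
```python
from collections import defaultdict, deque


def partition_by_source(inbound):
    """Split one inbound stream of messages into a queue per source."""
    queues = defaultdict(deque)
    for msg in inbound:
        queues[msg['source']].append(msg)
    return queues


def consume_n(queue, n):
    """Take up to n messages from a single source's queue."""
    taken = []
    while queue and n > len(taken):
        taken.append(queue.popleft())
    return taken


inbound = [
    {'source': 'example', 'url': 'https://example.com/1.jpg'},
    {'source': 'another', 'url': 'https://another.example/1.jpg'},
    {'source': 'example', 'url': 'https://example.com/2.jpg'},
]
queues = partition_by_source(inbound)
# A worker can now pull from each source independently of the others.
batch = consume_n(queues['example'], 1)
```
&lt;/pre&gt;&lt;/div&gt;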
&lt;h5 id="designing-for-testability"&gt;Designing for testability&lt;/h5&gt;&lt;p&gt;It's quite difficult to test IO-heavy systems because they need to interact with lots of external dependencies. Often it is necessary to write complex integration tests or run manual tests to be certain that key functionality works as expected. Relying on these alone is no good, because integration tests are much more expensive to maintain and take far longer to execute than unit tests. We certainly wouldn't go to production without running a smoke test to verify correctness in real-world conditions, but it's still critical to have unit tests in place for catching bugs quickly during the development process.&lt;/p&gt;
&lt;p&gt;The solution to this problem is to use dependency injection, which is a fancy way of saying that we never do IO directly from within our application. Instead, we delegate IO to external objects that can be passed in at run-time. This makes it easy to pass in fake objects that approximate real world behavior without real world consequences.&lt;/p&gt;
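&lt;p&gt;As a minimal sketch of the pattern, the function below receives its HTTP session as an argument instead of creating one, so a test can hand it a fake. &lt;code&gt;FakeResponse&lt;/code&gt;, &lt;code&gt;FakeSession&lt;/code&gt;, &lt;code&gt;fetch_image_counts&lt;/code&gt;, and the URL are hypothetical names for this example and are not the project's own test helpers.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;
```python
import asyncio


class FakeResponse:
    """Canned response exposing only what the code under test uses."""

    def __init__(self, status, body):
        self.status = status
        self._body = body

    async def json(self):
        return self._body


class FakeSession:
    """Returns a canned response instead of touching the network."""

    def __init__(self, response):
        self.response = response
        self.requested = []

    async def get(self, url):
        self.requested.append(url)
        return self.response


async def fetch_image_counts(session, sources_url):
    """Works with any session object that has an awaitable get(url)."""
    response = await session.get(sources_url)
    if response.status != 200:
        return {}
    body = await response.json()
    return {s['source_name']: s['image_count'] for s in body}


body = [{'source_name': 'example', 'image_count': 5000000}]
session = FakeSession(FakeResponse(200, body))
url = 'https://api.example.org/v1/sources'  # placeholder URL
counts = asyncio.run(fetch_image_counts(session, url))
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In production the same function would be given a real client session; nothing inside it needs to change.&lt;/p&gt;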
&lt;p&gt;For example, the crawl monitor usually has to talk to our CC Search API (for assessing source size), Redis, and Kafka to do its job of regulating the crawl; instead of setting up a brittle and complicated integration test with all of those dependencies, we just instantiate some mock objects and pass them in. Now we can easily test individual components such as the error circuit breaker.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;center&gt;Testing our crawl monitor's circuit breaking functionality with mock dependencies&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nd"&gt;@pytest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fixture&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;source_fixture&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot; Mocks the /v1/sources endpoint response. &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s2"&gt;&amp;quot;source_name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;example&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s2"&gt;&amp;quot;image_count&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5000000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s2"&gt;&amp;quot;display_name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Example&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s2"&gt;&amp;quot;source_url&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;example.com&amp;quot;&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s2"&gt;&amp;quot;source_name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;another&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s2"&gt;&amp;quot;image_count&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s2"&gt;&amp;quot;display_name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Another&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s2"&gt;&amp;quot;source_url&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;whatever&amp;quot;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;create_mock_monitor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FakeAioResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FakeAioSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;redis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FakeRedis&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;regulator_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rate_limit_regulator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;regulator_task&lt;/span&gt;


&lt;span class="nd"&gt;@pytest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mark&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;test_error_circuit_breaker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_fixture&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;sources&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_fixture&lt;/span&gt;
    &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;monitor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;create_mock_monitor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;statuslast50req:example&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;500&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;
    &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;statuslast50req:another&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;200&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;run_monitor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;example&amp;#39;&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;halted&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;another&amp;#39;&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;halted&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The main drawback of dependency injection is that initializing your objects will take some more ceremony. See the &lt;a href="https://github.com/creativecommons/image-crawler/blob/00b59aba9a15faccf203a53d73a98e8c06cb69e8/worker/scheduler.py#L162"&gt;initialization of the crawl scheduler&lt;/a&gt; for an example of wiring up an object with a lot of dependencies. You might also find that constructors and other functions with a lot of dependencies will have a lot of arguments if care isn't taken to bundle external dependencies together. In my opinion, the price of a few extra lines of initialization code is well worth the benefits gained from testability and modularity.&lt;/p&gt;
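&lt;p&gt;One way to keep argument lists short is to bundle the external dependencies into a single object. The sketch below shows the idea; &lt;code&gt;CrawlDependencies&lt;/code&gt; and this simplified &lt;code&gt;CrawlScheduler&lt;/code&gt; constructor are illustrative assumptions and do not mirror the linked code.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;
```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class CrawlDependencies:
    """Everything that performs IO, grouped so it is passed around as one value."""
    session: Any  # HTTP client, real or fake
    redis: Any    # Redis client, real or fake
    kafka: Any    # Kafka client, real or fake


class CrawlScheduler:
    def __init__(self, deps, max_tasks):
        # Two arguments instead of one per external service.
        self.deps = deps
        self.max_tasks = max_tasks


# In a test, every field can be a fake; object() is a bare placeholder here.
deps = CrawlDependencies(session=object(), redis=object(), kafka=object())
scheduler = CrawlScheduler(deps, max_tasks=5000)
```
&lt;/pre&gt;&lt;/div&gt;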
&lt;h4 id="smoke-testing"&gt;Smoke testing&lt;/h4&gt;&lt;p&gt;Even with our unit test coverage, we still need to do some basic small-scale manual tests to make sure our assumptions hold up in the real world. We'll need to write &lt;a href="https://www.terraform.io/"&gt;Terraform&lt;/a&gt; modules that provision a working version of the real system. Sadly, our Terraform infrastructure repository is private for now, but here's a taste of what the infra code looks like.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kr"&gt;module&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;&amp;quot;image-crawler&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;../../modules/services/image-crawler&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;prod&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="na"&gt;docker_tag&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;0.25.0&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="na"&gt;aws_access_key_id&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;${var.aws_access_key_id}&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="na"&gt;aws_secret_access_key&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;${var.aws_secret_access_key}&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="na"&gt;zookeeper_endpoint&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;${module.kafka.zookeeper_brokers}&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="na"&gt;kafka_brokers&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;${module.kafka.kafka_brokers}&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="na"&gt;worker_instance_type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;m5.large&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="na"&gt;worker_count&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;5&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;&lt;center&gt;Initialization of crawler Terraform module in our production environment&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kr"&gt;resource&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;&amp;quot;aws_instance&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;&amp;quot;crawler-workers&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="na"&gt;ami&lt;/span&gt;&lt;span class="w"&gt;                     &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;${var.ami}&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="na"&gt;instance_type&lt;/span&gt;&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;${var.worker_instance_type}&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="na"&gt;user_data&lt;/span&gt;&lt;span class="w"&gt;               &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;${data.template_file.worker_init.rendered}&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="na"&gt;subnet_id&lt;/span&gt;&lt;span class="w"&gt;               &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;${element(data.aws_subnet_ids.subnets.ids, 0)}&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="na"&gt;vpc_security_group_ids&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;${aws_security_group.image-crawler-sg.id}&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="w"&gt;                   &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;${var.worker_count}&amp;quot;&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nb"&gt;tags&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="na"&gt;Name&lt;/span&gt;&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;image-crawler-worker-${var.environment}&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;${var.environment}&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;cc:environment&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;${var.environment == &amp;quot;dev&amp;quot; ? &amp;quot;staging&amp;quot; : &amp;quot;production&amp;quot;}&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;cc:product&amp;quot;&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;cccatalog-api&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;cc:purpose&amp;quot;&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Image crawler worker&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;cc:team&amp;quot;&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;cc-search&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kr"&gt;resource&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;&amp;quot;aws_instance&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;&amp;quot;crawler-monitor&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="na"&gt;ami&lt;/span&gt;&lt;span class="w"&gt;                     &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;${var.ami}&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="na"&gt;instance_type&lt;/span&gt;&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;c5.large&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="na"&gt;user_data&lt;/span&gt;&lt;span class="w"&gt;               &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;${data.template_file.monitor_init.rendered}&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="na"&gt;subnet_id&lt;/span&gt;&lt;span class="w"&gt;               &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;${element(data.aws_subnet_ids.subnets.ids, 0)}&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="na"&gt;vpc_security_group_ids&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;${aws_security_group.image-crawler-sg.id}&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nb"&gt;tags&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="na"&gt;Name&lt;/span&gt;&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;image-crawler-monitor-${var.environment}&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;${var.environment}&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;cc:environment&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;${var.environment == &amp;quot;dev&amp;quot; ? &amp;quot;staging&amp;quot; : &amp;quot;production&amp;quot;}&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;cc:product&amp;quot;&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;cccatalog-api&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;cc:purpose&amp;quot;&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Image crawler monitor&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;cc:team&amp;quot;&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;cc-search&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;&lt;center&gt;An excerpt of the crawler module definition&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;One &lt;code&gt;terraform plan&lt;/code&gt; and &lt;code&gt;terraform apply&lt;/code&gt; cycle later, we're ready to feed a few million test URLs to the inbound image queue and see what happens. By my recollection, this uncovered many glaring issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Basic network security configuration problems preventing communication between key components&lt;/li&gt;
&lt;li&gt;The need for our scheduling algorithm to be overhauled (already discussed)&lt;/li&gt;
&lt;li&gt;Workers exceeding the Redis maximum connection limit&lt;/li&gt;
&lt;li&gt;Workers crashing after hitting the open file limit because of the huge number of concurrent connections&lt;/li&gt;
&lt;li&gt;Probably a half dozen other problems&lt;/li&gt;
&lt;/ul&gt;
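&lt;p&gt;The open file limit problem in particular can be checked before a crawl starts. The following is a minimal sketch rather than the crawler's actual code: it uses Python's standard &lt;code&gt;resource&lt;/code&gt; module (Unix only) to compare a planned number of concurrent connections against the process's soft limit on open file descriptors. The function name &lt;code&gt;spare_file_descriptors&lt;/code&gt; and the &lt;code&gt;headroom&lt;/code&gt; default are assumptions made for illustration.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;
```python
import resource


def spare_file_descriptors(planned_connections, headroom=64):
    """Return how many descriptors remain under the soft open-file limit.

    A negative result means the planned connection count would exceed the
    limit. Returns None when the limit is unlimited. The headroom default
    is a hypothetical allowance for logs, pipes, and other descriptors.
    """
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return None
    return soft_limit - headroom - planned_connections


if __name__ == "__main__":
    # Hypothetical figure: 5000 concurrent connections per worker.
    print(spare_file_descriptors(5000))
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A negative result suggests either lowering the worker's concurrency or raising the limit (for example with &lt;code&gt;ulimit -n&lt;/code&gt;) before starting the worker.&lt;/p&gt;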
&lt;p&gt;After fixing all of those issues and performing a larger smoke test, we're ready to start crawling on a large scale.&lt;/p&gt;
&lt;h5 id="monitoring-the-crawl"&gt;Monitoring the crawl&lt;/h5&gt;&lt;p&gt;Unfortunately, we can't just kick back and relax while the crawler does its thing for a few weeks. We need some transparency about what the crawler is doing so we can be alerted when something breaks.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How fast are we crawling each website? What's our target rate limit?&lt;/li&gt;
&lt;li&gt;How many errors have occurred? How many images have we successfully processed?&lt;/li&gt;
&lt;li&gt;Are we crawling right now, or are we finished?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It would be nice to build a reporting dashboard for this, but in the interest of time, we'll dump a giant JSON blob to &lt;code&gt;STDOUT&lt;/code&gt; every 5 seconds and call it a day. When we want to check on crawl progress, we &lt;code&gt;ssh&lt;/code&gt; into the crawl monitoring virtual machine and &lt;code&gt;tail&lt;/code&gt; the logs (we could also use our Graylog instance if we're feeling lazy). Fortunately, JSON is both trivially human and machine readable, so we can build a more sophisticated monitoring system later by parsing the logs.&lt;/p&gt;
&lt;p&gt;Here's an example log line from one of our smoke tests, indicating that we've crawled 13,224 images successfully and nothing else is happening.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;event&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;monitoring_update&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;time&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;2020-04-17T20:22:56.837232&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;general&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;global_max_rps&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;193.418869804698&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;error_rps&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;processing_rate&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;success_rps&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;circuit_breaker_tripped&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;num_resized&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;13224&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;resize_errors&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;split_rate&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;specific&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;flickr&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;successful&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;13188&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;last_50_statuses&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;200&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;
&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;rate_limit&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;178.375147633876&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;error&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;animaldiversity&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;last_50_statuses&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;200&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;
&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;successful&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;error&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;rate_limit&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.206215440554406&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;phylopic&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;rate_limit&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;error&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;successful&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;last_50_statuses&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;200&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;
&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
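&lt;p&gt;Because each update is a single JSON document, pulling out the numbers we care about takes very little code. The following is a minimal sketch, assuming each log line holds exactly one JSON object shaped like the example above; the &lt;code&gt;summarize_update&lt;/code&gt; helper is hypothetical and not part of the crawler.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;
```python
import json


def summarize_update(log_line):
    """Summarize one monitoring_update log line; return None for other lines."""
    try:
        record = json.loads(log_line)
    except json.JSONDecodeError:
        # Not JSON at all, for example a stack trace or startup banner.
        return None
    if not isinstance(record, dict) or record.get("event") != "monitoring_update":
        return None
    general = record.get("general", {})
    sources = record.get("specific", {})
    return {
        "num_resized": general.get("num_resized", 0),
        "tripped": general.get("circuit_breaker_tripped", []),
        # Per-source error counts, keyed by source name (flickr, phylopic, ...).
        "errors_by_source": {
            name: stats.get("error", 0) for name, stats in sources.items()
        },
    }


if __name__ == "__main__":
    line = (
        '{"event": "monitoring_update", '
        '"general": {"num_resized": 13224, "circuit_breaker_tripped": []}, '
        '"specific": {"flickr": {"successful": 13188, "error": 0}}}'
    )
    print(summarize_update(line))
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A summary like this could feed an alert whenever &lt;code&gt;tripped&lt;/code&gt; is non-empty or a source's error count starts climbing.&lt;/p&gt;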
&lt;p&gt;Now that we can see what the crawler is up to, we can schedule the larger crawl and start collecting production quality data.&lt;/p&gt;
&lt;h4 id="takeaways"&gt;Takeaways&lt;/h4&gt;&lt;p&gt;The result here is that we have a lightweight, modular, highly concurrent, and polite distributed image crawler with only a handful of lines of code.&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;alden:~/code/image_crawler$&lt;span class="w"&gt; &lt;/span&gt;cloc&lt;span class="w"&gt; &lt;/span&gt;.
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="m"&gt;48&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;text&lt;span class="w"&gt; &lt;/span&gt;files.
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="m"&gt;43&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;unique&lt;span class="w"&gt; &lt;/span&gt;files.
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="m"&gt;25&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;files&lt;span class="w"&gt; &lt;/span&gt;ignored.

github.com/AlDanial/cloc&lt;span class="w"&gt; &lt;/span&gt;v&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;.81&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nv"&gt;T&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;.02&lt;span class="w"&gt; &lt;/span&gt;s&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1667&lt;/span&gt;.4&lt;span class="w"&gt; &lt;/span&gt;files/s,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;130887&lt;/span&gt;.8&lt;span class="w"&gt; &lt;/span&gt;lines/s&lt;span class="o"&gt;)&lt;/span&gt;
------------------------------------------------------------------------------
Language&lt;span class="w"&gt;                     &lt;/span&gt;files&lt;span class="w"&gt;          &lt;/span&gt;blank&lt;span class="w"&gt;        &lt;/span&gt;comment&lt;span class="w"&gt;           &lt;/span&gt;code
------------------------------------------------------------------------------
Python&lt;span class="w"&gt;                          &lt;/span&gt;&lt;span class="m"&gt;16&lt;/span&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="m"&gt;244&lt;/span&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="m"&gt;242&lt;/span&gt;&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="m"&gt;1324&lt;/span&gt;
Markdown&lt;span class="w"&gt;                         &lt;/span&gt;&lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="m"&gt;79&lt;/span&gt;&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="m"&gt;219&lt;/span&gt;
YAML&lt;span class="w"&gt;                             &lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="m"&gt;4&lt;/span&gt;&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="m"&gt;61&lt;/span&gt;
XML&lt;span class="w"&gt;                              &lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;
Bourne&lt;span class="w"&gt; &lt;/span&gt;Shell&lt;span class="w"&gt;                     &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="m"&gt;4&lt;/span&gt;
------------------------------------------------------------------------------
SUM:&lt;span class="w"&gt;                            &lt;/span&gt;&lt;span class="m"&gt;28&lt;/span&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="m"&gt;325&lt;/span&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="m"&gt;247&lt;/span&gt;&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="m"&gt;1626&lt;/span&gt;
------------------------------------------------------------------------------

alden:~/code/image_crawler$&lt;span class="w"&gt; &lt;/span&gt;tree&lt;span class="w"&gt; &lt;/span&gt;.
.
├──&lt;span class="w"&gt; &lt;/span&gt;architecture.png
├──&lt;span class="w"&gt; &lt;/span&gt;CODE_OF_CONDUCT.md
├──&lt;span class="w"&gt; &lt;/span&gt;CONTRIBUTING.md
├──&lt;span class="w"&gt; &lt;/span&gt;crawl_monitor
│&lt;span class="w"&gt;   &lt;/span&gt;├──&lt;span class="w"&gt; &lt;/span&gt;__init__.py
│&lt;span class="w"&gt;   &lt;/span&gt;├──&lt;span class="w"&gt; &lt;/span&gt;monitor.py
│&lt;span class="w"&gt;   &lt;/span&gt;├──&lt;span class="w"&gt; &lt;/span&gt;rate_limit.py
│&lt;span class="w"&gt;   &lt;/span&gt;├──&lt;span class="w"&gt; &lt;/span&gt;README.md
│&lt;span class="w"&gt;   &lt;/span&gt;├──&lt;span class="w"&gt; &lt;/span&gt;settings.py
│&lt;span class="w"&gt;   &lt;/span&gt;├──&lt;span class="w"&gt; &lt;/span&gt;source_splitter.py
│&lt;span class="w"&gt;   &lt;/span&gt;├──&lt;span class="w"&gt; &lt;/span&gt;structured_logging.py
│&lt;span class="w"&gt;   &lt;/span&gt;└──&lt;span class="w"&gt; &lt;/span&gt;tsv_producer.py
├──&lt;span class="w"&gt; &lt;/span&gt;docker-compose.yml
├──&lt;span class="w"&gt; &lt;/span&gt;Dockerfile-monitor
├──&lt;span class="w"&gt; &lt;/span&gt;Dockerfile-worker
├──&lt;span class="w"&gt; &lt;/span&gt;__init__.py
├──&lt;span class="w"&gt; &lt;/span&gt;LICENSE
├──&lt;span class="w"&gt; &lt;/span&gt;Pipfile
├──&lt;span class="w"&gt; &lt;/span&gt;Pipfile.lock
├──&lt;span class="w"&gt; &lt;/span&gt;publish_release.sh
├──&lt;span class="w"&gt; &lt;/span&gt;README.md
├──&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;test&lt;/span&gt;
│&lt;span class="w"&gt;   &lt;/span&gt;├──&lt;span class="w"&gt; &lt;/span&gt;corrupt.jpg
│&lt;span class="w"&gt;   &lt;/span&gt;├──&lt;span class="w"&gt; &lt;/span&gt;__init__.py
│&lt;span class="w"&gt;   &lt;/span&gt;├──&lt;span class="w"&gt; &lt;/span&gt;mocks.py
│&lt;span class="w"&gt;   &lt;/span&gt;├──&lt;span class="w"&gt; &lt;/span&gt;test_image.jpg
│&lt;span class="w"&gt;   &lt;/span&gt;├──&lt;span class="w"&gt; &lt;/span&gt;test_monitor.py
│&lt;span class="w"&gt;   &lt;/span&gt;└──&lt;span class="w"&gt; &lt;/span&gt;test_worker.py
└──&lt;span class="w"&gt; &lt;/span&gt;worker
&lt;span class="w"&gt;    &lt;/span&gt;├──&lt;span class="w"&gt; &lt;/span&gt;image.py
&lt;span class="w"&gt;    &lt;/span&gt;├──&lt;span class="w"&gt; &lt;/span&gt;__init__.py
&lt;span class="w"&gt;    &lt;/span&gt;├──&lt;span class="w"&gt; &lt;/span&gt;message.py
&lt;span class="w"&gt;    &lt;/span&gt;├──&lt;span class="w"&gt; &lt;/span&gt;rate_limit.py
&lt;span class="w"&gt;    &lt;/span&gt;├──&lt;span class="w"&gt; &lt;/span&gt;scheduler.py
&lt;span class="w"&gt;    &lt;/span&gt;├──&lt;span class="w"&gt; &lt;/span&gt;settings.py
&lt;span class="w"&gt;    &lt;/span&gt;├──&lt;span class="w"&gt; &lt;/span&gt;stats_reporting.py
&lt;span class="w"&gt;    &lt;/span&gt;└──&lt;span class="w"&gt; &lt;/span&gt;util.py

&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;directories,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;34&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;files
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We now have a lot of useful information about images that we were lacking before. The next step is to take this metadata and integrate it into our search engine, as well as perform deeper analysis of images using computer vision.&lt;/p&gt;
</content></entry><entry><title>Say Hello To Our Community Team</title><link href="http://opensource.creativecommons.org/blog/entries/say_hello_to_ct/" rel="alternate"></link><updated>2020-08-14T00:00:00Z</updated><author><name>['dhruvkb']</name></author><id>urn:uuid:ddeb63da-7771-357d-8299-ad51defeaf4a</id><content type="html">&lt;p&gt;Creative Commons is committed to open-source software. We have over two dozen
projects, spanning three times as many repositories on GitHub, each with its
small, but extremely enthusiastic, subcommunity. With only a few full-time
employees working on these projects, it is vital that we enable members from the
community to take increased responsibility in developing and maintaining them,
and growing the community of which they are a part.&lt;/p&gt;
&lt;p&gt;With that goal in mind, we've launched our Community Team initiative.&lt;/p&gt;
&lt;h3 id="what-is-the-community-team"&gt;What is the Community Team?&lt;/h3&gt;&lt;p&gt;Communities that grow organically around open source projects tend to be a bit
disorganised and the frequency of contributions and degree of involvement tends
to vary from member to member. Our goal is to identify contributors who are
actively involved within their communities and give them increased permissions
over the codebase and access to more information channels and tools in an effort
to empower them to participate more fully in the project.&lt;/p&gt;
&lt;p&gt;This is not restricted to code though. We're also looking for people who work
with the community on other aspects of the projects, such as design,
documentation, evangelism, and onboarding to name a few.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The Community Team establishes a framework for formalising the level of
involvement, which is a spectrum, into discrete levels or 'roles'.&lt;/li&gt;
&lt;li&gt;Each role is mapped to a set of responsibilities that a member holding the
role is encouraged to take up.&lt;/li&gt;
&lt;li&gt;Each role also entitles the members holding it to certain privileges, access,
and permissions, to help them execute these responsibilities.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Roles also progressively include members in our roadmaps and planning meetings
to ensure that the community is aligned with our long-term goals.&lt;/p&gt;
&lt;h3 id="what-s-in-it-for-me"&gt;What's in it for me?&lt;/h3&gt;&lt;p&gt;The Community Team is not just a one-sided deal. Your membership in the
Community Team is just as beneficial for you as it is for us. While there is
a &lt;a href="/community/community-team/#benefits-of-joining-the-community-team"&gt;laundry list of benefits&lt;/a&gt; that you're entitled to, I'll just
mention some notable ones here.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You gain real-world practical experience of working on open-source projects.&lt;/li&gt;
&lt;li&gt;You gain both soft skills and technical skills by interacting with other
developers from the community as well as CC staff.&lt;/li&gt;
&lt;li&gt;Since we've already seen the quality of your work and involvement with the
community, you get priority in internship applications*.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Oh and, lest I forget, you'll receive CC swag!&lt;/p&gt;
&lt;p&gt;&lt;blockquote class="twitter-tweet" data-align="center"&gt;
  &lt;p lang="en" dir="ltr"&gt;
    Thanks for the goodies!!
    &lt;a href="https://twitter.com/creativecommons?ref_src=twsrc%5Etfw"&gt;@creativecommons&lt;/a&gt;
    😀
    &lt;a href="https://twitter.com/hashtag/OpenSource?src=hash&amp;amp;ref_src=twsrc%5Etfw"&gt;#OpenSource&lt;/a&gt;
    &lt;a href="https://twitter.com/hashtag/creativecommons?src=hash&amp;amp;ref_src=twsrc%5Etfw"&gt;#creativecommons&lt;/a&gt;
    &lt;a href="https://twitter.com/hashtag/GSoC?src=hash&amp;amp;ref_src=twsrc%5Etfw"&gt;#GSoC&lt;/a&gt;
    &lt;a href="https://t.co/DFvpXCs8uu"&gt;pic.twitter.com/DFvpXCs8uu&lt;/a&gt;
  &lt;/p&gt;
  &amp;mdash;
  Mayank Nader (@MayankNader)
  &lt;a href="https://twitter.com/MayankNader/status/1137995920866390016?ref_src=twsrc%5Etfw"&gt;June 10, 2019&lt;/a&gt;
&lt;/blockquote&gt;&lt;/p&gt;
&lt;script async src="https://platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;&lt;h3 id="what-are-these-roles"&gt;What are these 'roles'?&lt;/h3&gt;&lt;p&gt;If you've reached this point, I assume you see the potential of the Community
Team. Let's see where you'd fit in.&lt;/p&gt;
&lt;p&gt;We have two kinds of roles, code-oriented &lt;a href="/community/community-team/project-roles/"&gt;Project roles&lt;/a&gt;, that
give you responsibilities and permissions related to one CC project, and
non-code-oriented &lt;a href="/community/community-team/community-building-roles/"&gt;Community Building roles&lt;/a&gt;, that give
you responsibilities and permissions related to improving the community of all
CC projects as a whole.&lt;/p&gt;
&lt;p&gt;Each type has a few levels, but I'll just link them for you to read on your
own. While your eligibility for any role depends on how involved you have been
in the past, the role you choose reflects how involved you would like to be in
the future.&lt;/p&gt;
&lt;p&gt;Start by asking yourself a simple question, "Do I code?"&lt;/p&gt;
&lt;h4 id="sure-i-can-code..""&gt;"Sure, I can code..."&lt;/h4&gt;&lt;p&gt;&lt;em&gt;That's awesome!&lt;/em&gt; We have projects in a diverse array of languages, using myriad
tools and frameworks. Depending on the skills you have, or are planning to
acquire, you can pick a project and start contributing to it. Based on your
contributions and your familiarity with the codebase, you can then apply for the
role that matches your desired level of involvement.&lt;/p&gt;
&lt;p&gt;So if you want to be lightly involved with code-reviews and would like to know
about our plans in advance, you can start off as a Project Contributor. This is
a fantastic role to get started with and ensures that you get excellent
mentorship as you start your FOSS journey.&lt;/p&gt;
&lt;p&gt;As your familiarity with the codebase increases, you might want to triage
incoming issues or block certain PRs that you've reviewed. You could escalate
your role to Project Collaborator. Want to be more involved? You can apply to be
a Project Core Committer, or even a Project Maintainer.&lt;/p&gt;
&lt;h4 id="no-i-can-t-code..""&gt;"No, I can't code..."&lt;/h4&gt;&lt;p&gt;&lt;em&gt;That's cool too!&lt;/em&gt; We realise that open source communities are never just about
the code. If you're passionate about growing the CC community by enabling new
contributors to get started or by spreading the word, you can apply for one of
the Community Building roles. Like the Project roles, there are a couple of
levels to choose from.&lt;/p&gt;
&lt;p&gt;Community builders have a whole different set of responsibilities and privileges
specifically catered to the unique task of cultivating a healthy community
around our many open source projects.&lt;/p&gt;
&lt;p&gt;So if you want to be lightly involved with onboarding new contributors to the
repositories and the workflows, you could start off as a Community Contributor.
This is a fantastic role to help new contributors get a headstart in their
journey with FOSS.&lt;/p&gt;
&lt;p&gt;As your familiarity with the community increases, you might want to suggest
tweets for our Twitter account, or participate in long-term community building
tasks from Asana. You could escalate your role to Community Collaborator. Want
to be more involved? You can even apply to be a Community Maintainer.&lt;/p&gt;
&lt;h3 id="what-s-next"&gt;What's next?&lt;/h3&gt;&lt;p&gt;The Community Team is a fairly novel idea for us and we're still tweaking things
along the way. For example, we recently merged two Project roles, namely
Project Member and Project Collaborator, when we realised they weren't so
different. As we internalise these roles more and more, we'll find more scope
for improvement and we'll continue to refine these roles over time.&lt;/p&gt;
&lt;p&gt;We're excited about the Community Team. If you're interested in joining us on
this ride, it's really easy to &lt;a href="/community/community-team/"&gt;get started&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;*We do not guarantee that you will be accepted if you apply for an
internship!&lt;/small&gt;&lt;/p&gt;
</content></entry><entry><title>Accessibility Improvements: Final Changes and Modal Accessibility</title><link href="http://opensource.creativecommons.org/blog/entries/cc-search-accessibility-week9-10/" rel="alternate"></link><updated>2020-08-12T00:00:00Z</updated><author><name>['AyanChoudhary']</name></author><id>urn:uuid:4b846809-9764-37d3-9551-f5c639064470</id><content type="html">&lt;p&gt;These are the last two weeks of my internship with CC. I am working on improving the accessibility of cc-search and internationalizing it as well.
This post details the accessibility improvements I made to the search result page and the image detail page, and also covers some more advanced accessibility improvements.&lt;/p&gt;
&lt;p&gt;The topics included in this post cover:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Tooltip accessibility and keyboard interactions&lt;/li&gt;
&lt;li&gt;Improve modal accessibility and implement trap focus&lt;/li&gt;
&lt;li&gt;Fix &lt;code&gt;&amp;lt;label&amp;gt;&lt;/code&gt; for form elements&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The first stage involved fixing the license explanation tooltips. These tooltips worked fine on click but did not respond to keypress events.
The solution was to add an event listener on the element that runs the same handler as a click, &lt;code&gt;toggleLicenseExplanationVisibility&lt;/code&gt;, when a key is pressed.
Luckily, &lt;code&gt;VueJS&lt;/code&gt; provides this built in via the &lt;code&gt;v-on:keyup&lt;/code&gt; directive. After the change, the code looks as follows:&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;img&lt;/span&gt;
    &lt;span class="na"&gt;:aria-label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;$t(&amp;#39;browse-page.aria.license-explanation&amp;#39;)&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;tabindex&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;0&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;v-if&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;filterType == &amp;#39;licenses&amp;#39;&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;@/assets/help_icon.svg&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;alt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;help&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;license-help is-pulled-right padding-top-smallest padding-right-smaller&amp;quot;&lt;/span&gt;
    &lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="na"&gt;click&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="na"&gt;stop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;toggleLicenseExplanationVisibility(item.code)&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;v-on:keyup&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="na"&gt;enter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;toggleLicenseExplanationVisibility(item.code)&amp;quot;&lt;/span&gt;
&lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A similar change was made to all the tooltips. The underlying cause was the use of non-semantic elements
(i.e. &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;span&amp;gt;&lt;/code&gt; or &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; instead of a &lt;code&gt;&amp;lt;button&amp;gt;&lt;/code&gt;): browsers do not attach any default keyboard activation to these tags, so they don't respond to keypresses unless a listener is added explicitly.&lt;/p&gt;
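&lt;p&gt;For comparison, a semantic &lt;code&gt;&amp;lt;button&amp;gt;&lt;/code&gt; fires its click handler on both Enter and Space natively, so no key listener is needed. The following is only a sketch of that alternative, assuming the help icon can be wrapped in a button. It is not the markup used in cc-search, the class list is shortened, and the empty &lt;code&gt;alt&lt;/code&gt; on the now-decorative image is my assumption:&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c"&gt;&amp;lt;!-- Sketch only: a native button is keyboard-operable without v-on:keyup --&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;button&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;button&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;license-help is-pulled-right&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;:aria-label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;$t(&amp;#39;browse-page.aria.license-explanation&amp;#39;)&amp;quot;&lt;/span&gt;
    &lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="na"&gt;click&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="na"&gt;stop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;toggleLicenseExplanationVisibility(item.code)&amp;quot;&lt;/span&gt;
&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;img&lt;/span&gt; &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;@/assets/help_icon.svg&amp;quot;&lt;/span&gt; &lt;span class="na"&gt;alt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&amp;quot;&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;button&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;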
&lt;p&gt;The second change is related to modals. Modals have some stringent accessibility requirements that have to be carefully handled.
The criteria are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;On opening the modal the remaining elements should get disabled.&lt;/li&gt;
&lt;li&gt;The modal should trap focus (the user should not be able to leave the modal when navigating with tab).&lt;/li&gt;
&lt;li&gt;The modal should close on pressing &lt;strong&gt;esc&lt;/strong&gt; or on clicking the overlay.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;To meet the criteria we developed a new &lt;a href="https://github.com/cc-archive/cccatalog-frontend/blob/develop/src/components/AppModal.vue"&gt;modal component&lt;/a&gt;.
This modal has an overlay and closes when we press the &lt;strong&gt;esc&lt;/strong&gt; key or click on the overlay. The modal also disables other elements when it is opened.&lt;/p&gt;
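&lt;p&gt;The close behaviour described above can be sketched with Vue's event modifiers. This is a minimal sketch rather than the contents of &lt;code&gt;AppModal.vue&lt;/code&gt;: the &lt;code&gt;overlay&lt;/code&gt; and &lt;code&gt;modal&lt;/code&gt; class names are hypothetical. The &lt;code&gt;.self&lt;/code&gt; modifier limits the handler to clicks on the overlay itself (not on the modal inside it), and the &lt;code&gt;.esc&lt;/code&gt; key alias reacts to the escape key while focus is somewhere inside the overlay:&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c"&gt;&amp;lt;!-- Sketch only: class names are hypothetical, not taken from AppModal.vue --&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;
    &lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;overlay&amp;quot;&lt;/span&gt;
    &lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="na"&gt;click&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="na"&gt;self&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;$emit(&amp;#39;close&amp;#39;)&amp;quot;&lt;/span&gt;
    &lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="na"&gt;keyup&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="na"&gt;esc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;$emit(&amp;#39;close&amp;#39;)&amp;quot;&lt;/span&gt;
&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;modal&amp;quot;&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;dialog&amp;quot;&lt;/span&gt; &lt;span class="na"&gt;aria-modal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;true&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;slot&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;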
&lt;p&gt;The final task for the modal was the implementation of trap focus. For this we used the &lt;a href="https://github.com/posva/focus-trap-vue"&gt;focus-trap-vue library&lt;/a&gt;.
The library exposes a &lt;code&gt;&amp;lt;focus-trap&amp;gt;&lt;/code&gt; component which acts as a wrapper to enable the focus trap. The implementation we used was:&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;focus-trap&lt;/span&gt; &lt;span class="na"&gt;:active&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;true&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;modal relative&amp;quot;&lt;/span&gt; &lt;span class="na"&gt;aria-modal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;true&amp;quot;&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;dialog&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;header&lt;/span&gt;
        &lt;span class="na"&gt;v-if&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;title&amp;quot;&lt;/span&gt;
        &lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;modal-header padding-top-bigger padding-left-bigger padding-right-normal padding-bottom-small&amp;quot;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;slot&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;header&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;h3&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;{{ title }}&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;h3&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;slot&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;button&lt;/span&gt;
            &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;button&amp;quot;&lt;/span&gt;
            &lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;close-button has-color-gray is-size-6 is-size-4-touch&amp;quot;&lt;/span&gt;
            &lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="na"&gt;click&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;$emit(&amp;#39;close&amp;#39;)&amp;quot;&lt;/span&gt;
            &lt;span class="na"&gt;:aria-label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;$t(&amp;#39;browse-page.aria.close&amp;#39;)&amp;quot;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;i&lt;/span&gt; &lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;icon cross&amp;quot;&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;button&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;header&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;slot&lt;/span&gt; &lt;span class="na"&gt;default&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;focus-trap&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Apart from these, the modal also has the &lt;code&gt;aria-modal&lt;/code&gt; attribute and the &lt;code&gt;role="dialog"&lt;/code&gt; attribute.
These attributes tell screen readers to recognise this component as a modal and announce it whenever the modal opens.&lt;/p&gt;
&lt;p&gt;The last improvement involves using appropriate label tags for the form elements. Many elements did not have proper labels or were nested incorrectly.
After fixing the nesting, the elements had proper labels which screen readers were able to identify.
An example of a proper input element with correct label nesting is:&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;label&lt;/span&gt; &lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;checkbox&amp;quot;&lt;/span&gt; &lt;span class="na"&gt;:for&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;item.code&amp;quot;&lt;/span&gt; &lt;span class="na"&gt;:disabled&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;block(item)&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;input&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;checkbox&amp;quot;&lt;/span&gt;
        &lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;filter-checkbox margin-right-small&amp;quot;&lt;/span&gt;
        &lt;span class="na"&gt;:id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;item.code&amp;quot;&lt;/span&gt;
        &lt;span class="na"&gt;:key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;index&amp;quot;&lt;/span&gt;
        &lt;span class="na"&gt;:checked&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;item.checked&amp;quot;&lt;/span&gt;
        &lt;span class="na"&gt;:disabled&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;block(item)&amp;quot;&lt;/span&gt;
        &lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="na"&gt;change&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;onValueChange&amp;quot;&lt;/span&gt;
    &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;license-icons&lt;/span&gt; &lt;span class="na"&gt;v-if&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;filterType == &amp;#39;licenses&amp;#39;&amp;quot;&lt;/span&gt; &lt;span class="na"&gt;:license&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;item.code&amp;quot;&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
    {{ $t(item.name) }}
&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;label&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Notice how the input is a child of the &lt;code&gt;&amp;lt;label&amp;gt;&lt;/code&gt; tag, which also has the &lt;code&gt;for&lt;/code&gt; attribute to point to the element it labels.&lt;/p&gt;
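&lt;p&gt;Where nesting is not practical, the same association can be made with &lt;code&gt;for&lt;/code&gt; alone by placing the label next to the input. This is a sketch of that sibling layout for comparison, not the markup used in cc-search. It reuses the names from the example above with the classes omitted:&lt;/p&gt;
&lt;div class="hll"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c"&gt;&amp;lt;!-- Sketch only: the label&amp;#39;s for value must match the input&amp;#39;s id --&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;input&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;checkbox&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;:id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;item.code&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;:checked&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;item.checked&amp;quot;&lt;/span&gt;
    &lt;span class="err"&gt;@&lt;/span&gt;&lt;span class="na"&gt;change&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;onValueChange&amp;quot;&lt;/span&gt;
&lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;label&lt;/span&gt; &lt;span class="na"&gt;:for&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;item.code&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;{{ $t(item.name) }}&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;label&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;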
&lt;p&gt;Apart from these changes, the eslint configuration of the project was also changed to include a11y linting for the elements.
We used the &lt;a href="https://github.com/maranran/eslint-plugin-vue-a11y"&gt;eslint-plugin-vue-a11y&lt;/a&gt; to enforce accessibility guidelines for our components via lint checks.
Furthermore, all the aria-labels were internationalized to follow the i18n standard that we had set up in our repo earlier this summer.&lt;/p&gt;
&lt;p&gt;After all these changes we had the following improvements in the accessibility scores (computed with Lighthouse):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Browse Page: 76 -&amp;gt; 98 | +22&lt;/li&gt;
&lt;li&gt;Collections Browse Page: 86 -&amp;gt; 96 | +10&lt;/li&gt;
&lt;li&gt;Photo Detail Page: 75 -&amp;gt; 95 | +20&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;With that, we are officially done with our work for the summer internship. The next blog post will conclude this series.&lt;/p&gt;
&lt;p&gt;You can track the work done for these weeks through these PRs:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href="https://github.com/cc-archive/cccatalog-frontend/pull/1072"&gt;Accessibility Improvements&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/cc-archive/cccatalog-frontend/pull/1121"&gt;setup vue-a11y for eslint&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/cc-archive/cccatalog-frontend/pull/1123"&gt;Aria labels and internationalization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/cc-archive/cccatalog-frontend/pull/1120"&gt;internationalize aria-labels for about page and feedback page&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/cc-archive/cccatalog-frontend/pull/1153"&gt;add trap focus to modals&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The progress of the project can be tracked on &lt;a href="https://github.com/cc-archive/cccatalog-frontend"&gt;cc-search&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;CC Search Accessibility is my GSoC 2020 project under the guidance of &lt;a href="https://creativecommons.org/author/zackcreativecommons-org/"&gt;Zack Krida&lt;/a&gt; and &lt;a href="/blog/authors/akmadian/"&gt;Ari Madian&lt;/a&gt;, who is the primary mentor for this project. &lt;a href="https://creativecommons.org/author/annacreativecommons-org/"&gt;Anna Tumadóttir&lt;/a&gt;, who has helped all along, and engineering director &lt;a href="https://creativecommons.org/author/kriticreativecommons-org/"&gt;Kriti Godey&lt;/a&gt; have been very supportive.&lt;/p&gt;
</content></entry><entry><title>CC Legal Database: Developing features</title><link href="http://opensource.creativecommons.org/blog/entries/legal-database-features/" rel="alternate"></link><updated>2020-08-07T00:00:00Z</updated><author><name>['krysal']</name></author><id>urn:uuid:22d82fa8-ac33-3574-89ba-98937b7365f7</id><content type="html">&lt;p&gt;In this post, I want to give an update on the progress of reimplementing the CC Legal Database site, my Outreachy project. Several features have been added over the last month.&lt;/p&gt;
&lt;h3 id="submission-forms"&gt;Submission forms&lt;/h3&gt;&lt;p&gt;The first thing I wanted to implement was the submission forms, so that anyone can submit a case or article to the database. These forms were slightly modified in the redesign (discussed in the previous articles), so they now have fewer mandatory fields to lower the bar and make it easier for users to contribute.&lt;/p&gt;
&lt;figure style="text-align: center;"&gt;
    &lt;img src="scholarship-form.png" alt="Form to submit an article related to CC licenses" style="border: 1px solid black; width: 60%;"&gt;
    &lt;figcaption&gt;Scholarship form to submit an article.&lt;/figcaption&gt;
&lt;/figure&gt;&lt;p&gt;For the Scholarship form, for example, you only need to share your name, email and a link to propose an article related to any of the CC licenses, although the more information you can provide the better. In any case, each contribution is reviewed by the staff before publishing.&lt;/p&gt;
&lt;h3 id="search"&gt;Search&lt;/h3&gt;&lt;p&gt;The second important task was to allow searching in each of the listings, a basic function for making use of the exposed information. In the &lt;a href="https://labs.creativecommons.org/caselaw/"&gt;current site&lt;/a&gt;, this function is delegated to an external service, a certain famous search engine. Filtering is now performed in the backend based on the keywords entered by the user, which returns the reduced list. Later this will be combined with filtering by the tags or topics that are associated with each entry (case or scholarship).&lt;/p&gt;
&lt;h3 id="automated-tests"&gt;Automated tests&lt;/h3&gt;&lt;p&gt;While developing these features I was also in charge of adding automated unit tests, to ensure that future changes to the code base do not break parts of the site that already work. In addition to giving more confidence to future contributors, the tests provide value immediately: writing them forces you to think about possible edge cases, which allowed me to notice a missing validation in a couple of routes and correct it.&lt;/p&gt;
&lt;figure style="text-align: center;"&gt;
    &lt;img src="404-page.png" alt="404 page" style="border:1px solid black; width:70%;"&gt;
    &lt;figcaption&gt;Example of page obtained when requesting a case detail that is not published or doesn't exist.&lt;/figcaption&gt;
&lt;/figure&gt;&lt;p&gt;In this process of adding automated tests I wanted them to run on every pull request created, so I learned how to write a GitHub Action with a PostgreSQL service, the DBMS used in this case. Previously, I had already created a job for linting, so I needed to add another one to run in parallel to save time. This service provided by GitHub is pretty cool and useful. It opens up a world of possibilities, from running third-party services like the &lt;a href="https://github.com/GoogleChrome/lighthouse-ci"&gt;Lighthouse test&lt;/a&gt; to even &lt;a href="https://github.com/gr2m/twitter-together"&gt;sending tweets&lt;/a&gt;! If you want to see the GitHub Action file configured for this project, check it out: &lt;a href="https://github.com/creativecommons/legaldb/blob/31c3002a7860d78f3fdb464150c5c1b2f8bb86fc/.github/workflows/main.yml"&gt;&lt;code&gt;.github/workflows/main.yml&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="accessibility"&gt;Accessibility&lt;/h3&gt;&lt;p&gt;To check whether the site had shortcomings I ran the Lighthouse test on the homepage and discovered that there were indeed some issues to tackle. The initial results were these:&lt;/p&gt;
&lt;figure style="text-align: center;"&gt;
    &lt;img src="lighthouse-before.png" alt="" style="border:1px solid black; width:70%;"&gt;
    &lt;figcaption&gt;Initial Lighthouse test measurements.&lt;/figcaption&gt;
&lt;/figure&gt;&lt;p&gt;The good thing about this test is that it suggests how to fix the issues it finds, so after adding certain missing attributes and labels, the following results were achieved.&lt;/p&gt;
&lt;figure style="text-align: center;"&gt;
    &lt;img src="lighthouse-after.png" alt="" style="border:1px solid black; width:70%;"&gt;
    &lt;figcaption&gt;Lighthouse test measurements after corrections.&lt;/figcaption&gt;
&lt;/figure&gt;&lt;p&gt;There is still room for improvement but at least we are within a quite acceptable green range.&lt;/p&gt;
&lt;h3 id="other-features-and-tweaks"&gt;Other features and tweaks&lt;/h3&gt;&lt;p&gt;Some other features were implemented but only relevant to our registered users, that is, the Legal Staff. They consist of Django admin customization, such as filtering records by status, and a particular thing requested, the answers of frequently asked questions need to be displayed formatted, so they are now saved as Markdown text and transformed to HTML with style on the public site, showing lists, bold text, links, etc. The admin can also see a preview while editing.&lt;/p&gt;
&lt;h3 id="conclusion"&gt;Conclusion&lt;/h3&gt;&lt;p&gt;After reviewing all done this last month I see significant progress has been made, I have learned many things along the way: more of what Django and its ecosystem offers, about accessibility, continuous integration in Heroku and GitHub, and more. One of the things that makes me most happy is being able to be contributing and being part of an Open Source organization, knowing how it moves and works inside, something I never imagine before.&lt;/p&gt;
&lt;p&gt;Time flies and there are less than two weeks left to finish, so if you want to follow the project here is the repository to suggest improvements or report bugs, or if you prefer something less technical you can join us on the &lt;a href="https://creativecommons.slack.com/channels/cc-dev-legal-database"&gt;slack channel&lt;/a&gt;.&lt;/p&gt;
</content></entry></feed>