[WIP] Fix for Extraneous Whitespace in HTML Pages #536
|
For now, outside of just not calling |
|
@adqm Thank you for the work you put into this. The links to examples in the description are very helpful. However, the HTML output during publishing must be normalized/tidy and consistent. There must be a formatting step. If |
|
Thanks for taking a look! That makes sense. I'll look into a few alternatives and see if I can find a reasonable solution. |
|
@TimidRobot I've been trying a few things over the past few days. Nothing is perfect, but I wanted to give a little bit of an update. Apologies for the long message 😅. I've pushed what I think is the least-objectionable method I've tried so far, but it still definitely has some downsides:
That said, I'm quite happy with the output (both the rendered output and the source code look pretty good to me). Since everything is ultimately served statically, my opinion is that the improvement in the output is worth the slowdown and the extra complexity at build time. (Maybe also worth mentioning that I didn't yet try to update the README's instructions for manual setup, which will need adjustment if this approach is ultimately used).
Method
Based on prior experience, my first thought was to use Prettier to do the HTML formatting; it seems to be the popular tool for this kind of thing nowadays, and it doesn't have the same issues with adding extra whitespace that BeautifulSoup's
My initial attempt of calling out to a new subprocess for every page, though, naturally slowed things down a lot. I knew ahead of time that this would be the case, but I didn't expect it to be as slow as it ended up being (40 minutes for
I explored Biome as an alternative as well, but while it did seem to be substantially faster, its HTML support is still a work in progress and did not work great on our input (I was getting parse errors on well-formed files). Maybe something to consider for the future, but I don't think it'll work right now.
So what I've just pushed still uses
I didn't find any pure-Python HTML formatters that seemed to be on
Malformed HTML
For right now, the approach in the code is that malformed HTML simply isn't formatted; it falls back to the unformatted version while printing out an error message. The remaining instances of malformed HTML are all in translation files. I'm not sure how to fix these, though (nor do I speak most of these languages, so in a few cases it's not obvious to me whether the apparent obvious fix will affect the meaning). But regardless of what comes of this PR, those might be worth fixing. Should I open a separate issue for the issues I've found there? If so, should it be in this repo, the data repo, or someplace else? |
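The fallback behaviour described above (format when possible, otherwise keep the unformatted page and print an error) could look roughly like the following. This is only a sketch: `format_or_fallback`, `toy_formatter`, and the error message are hypothetical names invented for illustration, and the toy formatter merely stands in for whatever external formatter the PR actually calls.

```python
import sys
from typing import Callable


def format_or_fallback(
    html: str, formatter: Callable[[str], str], name: str = "<page>"
) -> str:
    """Return formatter(html); if the formatter raises, report it and
    return the original, unformatted HTML instead."""
    try:
        return formatter(html)
    except Exception as err:  # assumption: real code may catch a narrower error
        print(f"ERROR: could not format {name}: {err}", file=sys.stderr)
        return html


def toy_formatter(html: str) -> str:
    """Hypothetical stand-in for a strict formatter: it rejects input whose
    <p> tags are unbalanced, and otherwise just trims the page."""
    if html.count("<p>") != html.count("</p>"):
        raise ValueError("unbalanced <p> tags")
    return html.strip() + "\n"


well_formed = format_or_fallback("  <p>ok</p>  ", toy_formatter, "good.html")
malformed = format_or_fallback("<p>broken", toy_formatter, "bad.html")
```

With this shape, a malformed translation file still gets published (unformatted) and the error output doubles as a list of files that need fixing.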
|
@TimidRobot, sorry to pester, but just wondering if you might be able to take a look and share your thoughts on the current approach here? |
|
@adqm Thank you for the reminder (not a pester at all). This is a really interesting approach! I would like to do some testing with it. I suggest we proceed as follows:
|
|
Thanks! I can certainly split things up. I should be able to get to that later today, maybe tomorrow. |
These will go into a separate PR.
|
Sorry for the delay here, @TimidRobot! I went ahead and made the changes you requested (new PR is #538), and I also opened a separate issue creativecommons/cc-legal-tools-data#260 for some related issues in some of the translation files. Feedback is welcome; happy to adjust the approach here if need be. |
Fixes
Description
This PR fixes the whitespace issues described in #485, where translated strings and inline HTML elements were often surrounded by extra whitespace.
The most noticeable effects are the removal of whitespace around some punctuation, and the removal of whitespace from the ends of link text.
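As a rough illustration of why source whitespace matters here, the sketch below compares a compact snippet with one where every inline element sits on its own indented line. `rendered_text` is a hypothetical helper that only crudely approximates browser behaviour (drop tags, collapse whitespace runs to one space); the sample markup is invented for the example.

```python
import re


def rendered_text(html: str) -> str:
    """Crude approximation of inline rendering: remove tags, then collapse
    each run of whitespace into a single space."""
    without_tags = re.sub(r"<[^>]+>", "", html)
    return re.sub(r"\s+", " ", without_tags).strip()


# Compact source: the link text and the full stop touch.
compact = '<p>See the <a href="#">license</a>.</p>'

# Same content with each node on its own line, as a pretty-printer
# might emit it.
indented = '<p>\n See the\n <a href="#">\n  license\n </a>\n .\n</p>'

compact_text = rendered_text(compact)
indented_text = rendered_text(indented)
```

Under this approximation the compact version reads "See the license." while the indented one reads "See the license ." with a stray space before the punctuation.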
Technical details
The main change is avoiding the use of BeautifulSoup's `prettify` function, which can add whitespace that affects the rendered HTML. To my eyes, the HTML is plenty 'pretty' without calling that function 🙂
Additional changes to specific templates catch several other related issues (manually adjusting whitespace, replacing `{% trans ... %}` with `{% blocktrans trimmed %}...{% endblocktrans %}`).
Screenshots
Some sample screenshots:
- licenses/by-nc/4.0/legalcode.en: sample from old render, sample from updated render, full-page diff
- licenses/by-nc/4.0/legalcode.de: sample from old render, sample from updated render, full-page diff
- licenses/by-nc/4.0/legalcode.ja: sample from old render, sample from updated render, full-page diff
Checklist
Developer Certificate of Origin
For the purposes of this DCO, "license" is equivalent to "license or public domain dedication," and "open source license" is equivalent to "open content license or public domain dedication."