
Full syntax support in vimtex#1834

Merged
lervag merged 99 commits into master from feat/syntax on Nov 24, 2020

@lervag
Owner

lervag commented Oct 18, 2020

This PR implements full syntax support as part of vimtex. It is essentially what #1799 refers to as Phase 2. It contains breaking changes relative to the current master and implies a major change to vimtex; once it is merged, the vimtex version should be bumped to v2.0.

To test the WIP branch, you can do one of the following:

  1. Manually check out the git branch:

    # Use your actual path to vimtex repo here!
    cd ~/.vim/bundle/vimtex
    git pull --all
    git checkout feat/syntax
  2. Check out the branch with a plugin manager. For example, with vim-plug:

    Plug 'lervag/vimtex', {'branch': 'feat/syntax'}

    Then run :PlugUpdate vimtex to update and check out the branch.

Finished tasks:

  • Update docs so they are consistent.
  • Add better package-specific options (e.g. allow disabling syntax packages or enabling them automatically); this will deprecate g:vimtex_syntax_autoload_packages.
  • Fix \addplot [x] {y} commands for TikZ pictures
  • Add support for aligned and similar envs
  • Add support for minipage env
  • Improve p/minted.vim
  • Add generic texCmd with texOpt and texArg
  • Avoid implicit clustering e.g. for texClusterCmd (it fails when people define groups before the syntax script is loaded)
  • Improve p/listings.vim
  • Add matches for spacing commands like \,
  • Fix issue with math delims
  • Allow short version of conceal option
  • Decide (and possibly act) on texMatcher discussion: We need texMatcher!
  • Add texCmdFootnote with argument group texArgFootnote
  • Decide (and possibly act) on group name convention
  • Move current vimtex syntax scripts from being after/syntax/... to being loaded properly on startup.
  • Ensure syntax script is loaded after filetype scripts (cf. Problem with g:vimtex_syntax_alpha #1809).
  • Update configuration -> vimtex configuration schemes. E.g. let g:vimtex_syntax_config = {'conceal': ...}.
  • Make the syntax groups more consistent:
    • Change texStatement to texCmd (shorter and more in line with TeX syntax specifications).
    • The section family of commands (i.e. \chapter, \section, etc.) should get a special group, texCmdSection, and the title argument should be texSectionTitle. Possibly also match the shorttitle optional argument?
    • Paragraph commands should be texCmdParagraph and the argument should be texParagraphTitle.
    • Commands such as \appendix, \(front|main|back)matter should be matched as texCmdPart.
    • Environments should be consistent: In general, the \begin and \end should be texCmdEnv, the following {...} should be texEnvName. This should be the same in all cases. This will fix current inconsistency between e.g. \begin{equation} and \begin{lemma}.
    • Math environments should be similar, except the name should be matched with texEnvMathName.
    • $, $$, \[, and \( type of delimiters should be texMathMatcher.
    • Delimiter symbols in math regions should be named e.g. texMathDelimiter or similar. This includes (, [, \{, \|, \left(, and so on.
    • Other "symbols" in math regions can be called texMathSymbol, similar to the current state. Ensure this is consistent.
    • Commands like \usepackage and \RequirePackage should be matched as a normal texCmd (no reason to differentiate?). The optional argument should be a general texOptions, and the real argument should be texFilename.
    • Commands like \bibliography, \bibliographystyle, \addbibresource should be matched with texCmd but take a "general" file argument like texFilename.
      • Commands that take a texInputFile should be changed to be consistent with the parent task.
    • \verb commands should be texCmd with a following texRegionVerb
  • Change default highlight groups (use TeX primitive highlight groups to allow easier user customization):
    • texUrl from p/hyperref.vim
    • texTikzEqual from p/tikz.vim
    • texTikzSemicolon from p/tikz.vim
    • texTabularCol from p/tabularx.vim
    • texTabularAtSep from p/tabularx.vim
    • texTabularVertline from p/tabularx.vim
    • texTabularPostPre from p/tabularx.vim
    • texMathDelimSingle from p/tabularx.vim
    • texBeamerOpt from p/beamer.vim
    • texBeamerDelimiter from p/beamer.vim
  • Fix some errors/missing pieces
    • nested commands seem to work now, but not for \author{A. Author\thanks{An Institute}} -- the last brace is labeled texError
    • starred sectioning commands aren't working yet -- \section*{Acknowledgments} has the Acknowledgments labeled as texMatcher
    • texRegionRef is not yet linked to a highlight group? (this is from \cref, where the argument is labeled differently from \ref)
    • should \title and \author be labeled special (maybe as texCmdParts, too)?
    • does it make sense to label \item special?
  • Add LaTeX3 syntax support (see Integrating LaTeX3 syntax #1798).
  • More clean up and improvements to code.
  • Optimize things and try to improve efficiency as much as possible.

@clason
Contributor

clason commented Oct 18, 2020

As requested, here are all the syntax groups in autoload/syntax I found that are linked directly to generic highlight groups instead of tex-specific ones (i.e., of the form texXXX):

p/hyperref.vim
29:  highlight link texUrl          Function

p/tikz.vim
38:  highlight def link texTikzEqual Operator
39:  highlight def link texTikzSemicolon Delimiter

p/tabularx.vim
66:  highlight def link texTabularCol        Directory
67:  highlight def link texTabularAtSep      Type
68:  highlight def link texTabularVertline   Type
69:  highlight def link texTabularPostPre    Type
70:  highlight def link texMathDelimSingle   Delimiter

p/beamer.vim
24:  highlight link texBeamerOpt Identifier
25:  highlight link texBeamerDelimiter Delimiter

core.vim
448:  highlight def link texError                 Error

(The last is obviously OK; just maybe in the wrong section of the file.)

@clason
Contributor

clason commented Oct 18, 2020

And here are some gaps and inconsistencies I noticed going through some TeX files and using the following code (which everyone else probably knows already) to check which highlight groups are actually used:

function! SynGroup()
    let l:s = synID(line('.'), col('.'), 1)
    echo synIDattr(l:s, 'name') . ' -> ' . synIDattr(synIDtrans(l:s), 'name')
endfun
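SynGroup() reports only the innermost group. A variant built on Vim's synstack() lists every syntax group that contains the cursor position, which helps when a region such as an argument group nests inside another one. The function name SynStack and the <leader>sg mapping are arbitrary choices for this sketch.

```vim
" Print every syntax group active under the cursor, outermost first,
" together with the highlight group each one finally resolves to.
function! SynStack() abort
  let l:stack = synstack(line('.'), col('.'))
  if empty(l:stack)
    echo 'No syntax group under cursor'
    return
  endif
  for l:id in l:stack
    echo synIDattr(l:id, 'name') . ' -> ' . synIDattr(synIDtrans(l:id), 'name')
  endfor
endfunction

" Hypothetical mapping; pick any free key sequence.
nnoremap <silent> <leader>sg :call SynStack()<CR>
```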
  • The begin in \begin{equation} is a texStatement, while the begin in \begin{lemma} is a texBeginEnd (both are linked to Statement, so one wouldn't see the difference at first, but it shows up of course when you want to relink them). Is this intentional or necessary due to the way these are set up?

  • $ and \[ are a Delimiter rather than a texMathMatcher (as is equation in \begin{equation*}), but that's probably not possible to match otherwise? Actually, it'd be nice to have a separate texMathEnvName highlight group for this (so that it can be highlighted separately from the math contents).

  • Is there a way to highlight "text in math" (like \text{foo}) independently (as normal text)? That is already done (as texMathText), it just failed in this specific case where I have \foo{bar \text{baz}}.

  • ( and [ are texMathZoneCS and X in inline and display math, respectively, while \{ and \| are texSpecialChar, even though they're all (math) delimiters (unlike {, which is a Delimiter). On the other hand, \left( is a texMathSymbol.

  • There is currently no specific syntax group for the arguments of a section command (i.e., the foo in \section{foo}). (In the new syntax, these commands have not yet been added.)

  • Should \appendix be a texSection as well? (And similarly \frontmatter, \mainmatter, \backmatter?)

  • No syntax groups for \bibliographystyle or \addbibresource. The syntax group for foo in \bibliography{foo} is texRefZone, but it'd be better as texInputFile.

  • In \usepackage[foo]{bar}, bar is declared as texBegEndName, foo as texDocTypeArgs. (For the equivalent \RequirePackage in style/class files, no specific syntax groups seem to be set up, but that's arguably a different filetype.)

@lervag
Owner Author

lervag commented Oct 25, 2020

As requested, here are all the syntax groups in autoload/syntax I found that are linked directly to generic highlight groups instead of tex-specific ones (i.e., of the form texXXX): ...

Ok, and your point is that we should make this consistent so that every "auxiliary" TeX group by default links to one of the "primitive" tex groups, right?

I very much agree, but it does raise the question of what the primitive tex groups should be and to which primitive Vim highlight group they should link. Do you have a good suggestion for this?

And here are some gaps and inconsistencies I noticed going through some TeX files ...

Thanks! I've added a TODO list for consistency based on your remarks and my own thoughts. What do you think of it?

@lervag
Owner Author

lervag commented Oct 25, 2020

I've updated the original post TODO, and I'd appreciate feedback on the suggested changes.

@clason
Contributor

clason commented Oct 25, 2020

Ok, and your point is that we should make this consistent so that every "auxiliary" TeX group by default links to one of the "primitive" tex groups, right?

Yes, exactly. The point is to have a (somewhat) stable and contained external "color scheme API" that abstracts from specific syntax groups.

I very much agree, but it does raise the question of what the primitive tex groups should be and to which primitive Vim highlight group they should link. Do you have a good suggestion for this?

That is a very good question, and at the heart of what #1800 was about (beyond improving consistency in the syntax/highlight groups). Since it's always possible in your own config to re-link specific highlight groups to other default highlight groups or colors, it would probably be enough to have a handful of such primitive TeX groups -- your edited todo sounds very reasonable for this. As for the primitive Vim groups, I wouldn't worry too much about this -- any choice will be wrong for as many color schemes as for which it works well, and ideally color schemes would ship with their own links.

On this point, it should be noted that renaming the primitive TeX groups will be "breaking" for color schemes with TeX support, since they are based on Dr Chip's syntax file. Personally, I'd say the improvements are definitely worth it, but something to keep in mind and document maybe.

As a single data point, here are the highlight groups I use (overriding https://github.com/cocopon/iceberg.vim/):

" section and environments
hi! link texSection Title
hi! link texBeginEndName Title

" braces should be colored
hi! link Delimiter Constant
hi! link texBeginEndModifier Special

" math in blue
hi! link texMath Identifier
hi! link texSubscript Identifier
hi! link texSuperscript Identifier
hi! link texSpecialChar Identifier
hi! link texMathSymbol Identifier

" labels and reference
hi! link texRefZone Constant

Thanks! I've added a TODO list for consistency based on your remarks and my own thoughts. What do you think of it?

Perfect, although I wouldn't differentiate between, say subsubsection and paragraph and just lump all sectioning commands into one highlight group -- \mainmatter included. It's also a matter for debate whether "math environments" include the outermost, i.e., equation or align. Maybe the most consistent way would be to use texMathMatcher for those as well?

@lervag
Owner Author

lervag commented Oct 25, 2020

That is a very good question, and at the heart of what #1800 was about (beyond improving consistency in the syntax/highlight groups). Since it's always possible in your own config to re-link specific highlight groups to other default highlight groups or colors, it would probably be enough to have a handful of such primitive TeX groups -- your edited todo sounds very reasonable for this. As for the primitive Vim groups, I wouldn't worry too much about this -- any choice will be wrong for as many color schemes as for which it works well, and ideally color schemes would ship with their own links.

Sounds good. I'll just make a first version for you (and everyone) to comment on.

On this point, it should be noted that renaming the primitive TeX groups will be "breaking" for color schemes with TeX support, since they are based on Dr Chip's syntax file. Personally, I'd say the improvements are definitely worth it, but something to keep in mind and document maybe.

Good point. But I guess one has to break some eggs to make an omelet. I think I'll take my chances :)

Perfect, although I wouldn't differentiate between, say subsubsection and paragraph and just lump all sectioning commands into one highlight group -- \mainmatter included.

Do you mean to use e.g. texCmdParts for all of these?

It's also a matter for debate whether "math environments" include the outermost, i.e., equation or align. Maybe the most consistent way would be to use texMathMatcher for those as well?

My idea was this: Math environments need to define a math syntax region. But the \begin and \end should still look like "regular" TeX, because it is. However, I think a lot of people are used to the names looking different than other environments, so texEnvMathName makes sense, and it can have the same default highlight group as the texMathMatcher to make it consistent with e.g. $ and \[. And this is what I proposed in my action list.

By the way: does it make sense to differentiate between different math environments? I.e., are these differences used for anything? Do we need texMathZone[XYZ] and texMathZoneArray?

@clason
Contributor

clason commented Oct 25, 2020

Good point. But I guess one has to break some eggs to make an omelet. I think I'll take my chances :)

:)

Perfect, although I wouldn't differentiate between, say subsubsection and paragraph and just lump all sectioning commands into one highlight group -- \mainmatter included.

Do you mean to use e.g. texCmdParts for all of these?

Exactly, or texCmdSectioning (which may be too long?) I think your point about sticking as much as possible to standard TeX terminology is a good one.

(It may still make sense to define them separately as you indicated, but then group them in a single primitive TeX highlight group for convenience.)

My idea was this: Math environments need to define a math syntax region. But the \begin and \end should still look like "regular" TeX, because it is. However, I think a lot of people are used to the names looking different than other environments, so texEnvMathName makes sense, and it can have the same default highlight group as the texMathMatcher to make it consistent with e.g. $ and \[. And this is what I proposed in my action list.

Then that is perfect, exactly what I had in mind.

By the way: does it make sense to differentiate between different math environments? I.e., are these differences used for anything? Do we need texMathZone[XYZ] and texMathZoneArray?

To be honest, I don't know what they could be useful for. Presumably, you could match improper nesting (e.g., aligned outside equation) as an error, but I'm not sure that'd be worth the effort. And I don't see myself differentiating them by color.

@lervag
Owner Author

lervag commented Oct 25, 2020

Exactly, or texCmdSectioning (which may be too long?) I think your point about sticking as much as possible to standard TeX terminology is a good one.

I've currently implemented the "minimal" version (with texParts).

(It may still make sense to define them separately as you indicated, but then group them in a single primitive TeX highlight group for convenience.)

Yes, I think it could be of interest to have more fine grained control.

My idea was this: Math environments need to define a math syntax region. But the \begin and \end should still look like "regular" TeX, because it is. However, I think a lot of people are used to the names looking different than other environments, so texEnvMathName makes sense, and it can have the same default highlight group as the texMathMatcher to make it consistent with e.g. $ and \[. And this is what I proposed in my action list.

Then that is perfect, exactly what I had in mind.

This is now implemented and should work well now, I think.

To be honest, I don't know what they could be useful for. Presumably, you could match improper nesting (e.g., aligned outside equation) as an error, but I'm not sure that'd be worth the effort. And I don't see myself differentiating them by color.

Agreed; this also allows more simplifications, which I like. I've made some progress now and things are slowly moving towards something that is IMHO better than before. Still a lot more work to do, but I believe I should finish before Christmas.

@lervag
Owner Author

lervag commented Oct 26, 2020

I would not mind if someone started to test run this branch. It is not ready yet, but I think a lot of stuff is getting quite good. Good enough for the brave to look into it, at least :)

@clason
Copy link
Copy Markdown
Contributor

clason commented Oct 28, 2020

I'd be happy to test -- do you have a list of the "basic TeX" highlight groups (so far)?

One thing I noticed out of the box is that there is a problem with nested commands outside of math mode:

\foo{ \bar{baz} }

has problems with the argument of foo -- \bar is not recognized as a Cmd, and the closing brace for bar is matched to the opening of foo.

@lervag
Owner Author

lervag commented Oct 28, 2020

I'd be happy to test -- do you have a list of the "basic TeX" highlight groups (so far)?

I'm not ready to post my initial "hypothesis" yet, but you will probably get the gist if you look at the syntax groups. A very large set of groups is related to texCmd.

One thing I noticed out of the box is that there is a problem with nested commands outside of math mode:

\foo{ \bar{baz} }

has problems with the argument of foo -- \bar is not recognized as a Cmd, and the closing brace for bar is matched to the opening of foo.

Sorry, I made a silly mistake yesterday! It is fixed now.

@lervag
Owner Author

lervag commented Oct 29, 2020

I think this is getting pretty good now. The current list of main "primitive" groups are represented by these highlight commands:

highlight def link texArg              Include
highlight def link texArgAuthor        Identifier
highlight def link texArgEnvMathName   Delimiter
highlight def link texArgEnvName       PreCondit
highlight def link texArgRef           Special
highlight def link texArgTitle         Underlined
highlight def link texCmd              Statement
highlight def link texCmdSpaceCodeChar Special
highlight def link texCmdTodo          Todo
highlight def link texComment          Comment
highlight def link texCommentTodo      Todo
highlight def link texDelim            Delimiter
highlight def link texDelimMath        Statement
highlight def link texError            Error
highlight def link texLength           Number
highlight def link texMath             Special
highlight def link texMathOper         Operator
highlight def link texOpt              Identifier
highlight def link texOptSep           NormalNC
highlight def link texParm             Special
highlight def link texRegion           PreCondit
highlight def link texSpecialChar      SpecialChar
highlight def link texSymbol           SpecialChar
highlight def link texSymbolString     String
highlight def link texTitle            String
highlight def link texType             Type

As you can see, it is now possible to reduce this to:

texArg
texCmd
texComment
texDelim
texError
texLength
texMath
texMathOper
texOpt
texParm
texRegion
texSpecialChar
texSymbol
texSymbolString
texTitle
texType
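One way to realise such a reduction, sketched here as an assumption rather than the branch's actual code, is to let each specialised group link to its stem group with highlight def link. Vim resolves links transitively, so restyling a stem (say texArg) also restyles every variant pointing to it, while a user can still override any single variant. Which stem each variant should point to is a guess, and these targets change the look of some groups compared with the list above (texArgAuthor would follow texArg instead of Identifier).

```vim
" Primitive groups link to generic Vim groups (values from the list above).
highlight def link texArg   Include
highlight def link texCmd   Statement
highlight def link texTitle String

" Specialised groups link to their stem instead of directly to a Vim group.
highlight def link texArgAuthor        texArg
highlight def link texArgEnvName       texArg
highlight def link texArgRef           texArg
highlight def link texArgTitle         texTitle
highlight def link texCmdSpaceCodeChar texCmd
```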

Still not perfect, and I assume that there will be a lot of comments and suggestions for changing things, but at least I think this is moving very much in the right direction.

The main thing left, I think, is to go through the large lists of symbols and delimiters that are defined for conceal replacement in math mode. I'll try to make this more consistently separated between symbol and delimiter (texSymbolMath and texDelimMath). I'll also try to simplify the code even further.

I think it should be possible to use this branch as a "daily driver" for those who want to test and give feedback.
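For testers, restyling then comes down to a few :highlight link commands in the vimrc. The sketch below uses group names from the snapshot above, which may still change on this WIP branch. Wrapping the commands in a ColorScheme autocommand re-applies them whenever a color scheme resets highlighting; the augroup has to be defined before the :colorscheme line runs for it to fire at startup.

```vim
augroup my_vimtex_highlights
  autocmd!
  " highlight! forces the link even if the colorscheme already styled the group.
  autocmd ColorScheme * highlight! link texArgEnvName Title
  autocmd ColorScheme * highlight! link texArgRef     Constant
  autocmd ColorScheme * highlight! link texMath       Identifier
augroup END
```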

@clason
Contributor

clason commented Oct 31, 2020

I haven't yet switched to it as a daily driver (will do that soon), but here are some things I noticed playing around today:

  • the labeling as texError may be too aggressive -- for example, the _ in \newcommand{\umin}{u_{\min}} in the preamble (which was texMathOper before)
  • is there a difference between texArgFile and texArgFiles (and similar for Opt)?
  • nested commands seem to work now, but not for \author{A. Author\thanks{An Institute}} -- the last brace is labeled texError
  • starred sectioning commands aren't working yet -- \section*{Acknowledgments} has the Acknowledgments labeled as texMatcher
  • texRegionRef is not yet linked to a highlight group? (this is from \cref, where the argument is labeled differently from \ref)
  • should \title and \author be labeled special (maybe as texCmdParts, too)?
  • does it make sense to label \item special?

As a general comment: it's a matter of taste and probably too much effort to change now, but I think I'd prefer a "hierarchical" naming scheme tex->Parts->Cmd / Name rather than texCmdParts (and texArgPartTitle, which doesn't match that scheme). What is the guiding principle behind your naming scheme? Knowing that might help finding and remembering the right highlight groups.

@lervag
Owner Author

lervag commented Nov 1, 2020

the labeling as texError may be too aggressive -- for example, the _ in \newcommand{\umin}{u_{\min}} in the preamble (which was texMathOper before)

Yes, perhaps. But in this case you might want to use \newcommand{\umin}{\ensuremath{u_\min}}?

is there a difference between texArgFile and texArgFiles (and similar for Opt)?

Yes; the latter accepts multiple arguments separated by comma.

  • nested commands seem to work now, but not for \author{A. Author\thanks{An Institute}} -- the last brace is labeled texError
  • starred sectioning commands aren't working yet -- \section*{Acknowledgments} has the Acknowledgments labeled as texMatcher
  • texRegionRef is not yet linked to a highlight group? (this is from \cref, where the argument is labeled differently from \ref)

Thanks, I'll look into these and fix them!

should \title and \author be labeled special (maybe as texCmdParts, too)?

You mean like texCmdTitle and texCmdAuthor? I guess that's unproblematic and would allow people to highlight these differently, if wanted.

does it make sense to label \item special?

I guess that may make sense. It is easy to add support for it. Do you have a concrete suggestion?

As a general comment: it's a matter of taste and probably too much effort to change now, but I think I'd prefer a "hierarchical" naming scheme tex->Parts->Cmd / Name rather than texCmdParts (and texArgPartTitle, which doesn't match that scheme).

I'm not quite sure what you mean here. In general, it is not so hard to change things now (global replace statements).

What is the guiding principle behind your naming scheme? Knowing that might help finding and remembering the right highlight groups.

I will write about this in more detail later. The gist is that most things are centered around the primitive being a texCmd; almost everything starts there. Interesting commands have arguments texArg* or option groups texOpt*. Some things are expected in various places, e.g. texParm (i.e. #1 as a parameter in a \newcommand), and so on. I'm not sure if this is enough to understand where I'm headed; I'll write about it in more detail, and more clearly, once I have time.

@clason
Copy link
Copy Markdown
Contributor

clason commented Nov 1, 2020

the labeling as texError may be too aggressive -- for example, the _ in \newcommand{\umin}{u_{\min}} in the preamble (which was texMathOper before)

Yes, perhaps. But in this case you might want to use \newcommand{\umin}{\ensuremath{u_\min}}?

I like living dangerously :] (In seriousness, you are probably right, but I'm just pointing out a regression/visible change from the old highlighting. If it's a necessary change due to the better implementation, that's fine.)

is there a difference between texArgFile and texArgFiles (and similar for Opt)?

Yes; the latter accepts multiple arguments separated by comma.

Right, makes sense. And I assume that texArgFiles is linked to texArgFile (or vice versa) so I don't have to remember linking both variants all the time? (Please? ;))

Thanks, I'll look into these and fix them!

Thanks! (How did you fix my checkbox list, by the way? I tried and couldn't get it to work...)

should \title and \author be labeled special (maybe as texCmdParts, too)?

You mean like texCmdTitle and texCmdAuthor? I guess that's unproblematic and would allow people to highlight these differently, if wanted.

Yes, that would probably be useful.

does it make sense to label \item special?

I guess that may make sense. It is easy to add support for it. Do you have a concrete suggestion?

Not really, no... I guess the same group as \begin would make most sense?

I will write about this in more detail later. The gist is that most things are centered around the primitive being a texCmd; almost everything starts there. Interesting commands have arguments texArg* or option groups texOpt*. Some things are expected in various places, e.g. texParm (i.e. #1 as a parameter in a \newcommand), and so on. I'm not sure if this is enough to understand where I'm headed; I'll write about it in more detail, and more clearly, once I have time.

Well, I think it'd be easier (for me) if things started from texParts (and then specializing to texPartsCmd and texPartsName); this makes for a more balanced tree -- otherwise you basically have huge top-level lists texCmd* and texArg* (and texOpt*).

(I'm thinking from setting up the color scheme here, where I'd be going down the list "commands, environments (command, name, option), references,..." rather than "all commands, and then all arguments, and then all options, ...". Not a big deal either way, just curious.)

@lervag
Owner Author

lervag commented Nov 1, 2020

Yes, perhaps. But in this case you might want to use \newcommand{\umin}{\ensuremath{u_\min}}?

I like living dangerously :] (In seriousness, you are probably right, but I'm just pointing out a regression/visible change from the old highlighting. If it's a necessary change due to the better implementation, that's fine.)

Not necessary; I think in the old highlighting the error was not highlighted at all. It's all a question of what we want.

is there a difference between texArgFile and texArgFiles (and similar for Opt)?

Yes; the latter accepts multiple arguments separated by comma.

Right, makes sense. And I assume that texArgFiles is linked to texArgFile (or vice versa) so I don't have to remember linking both variants all the time? (Please? ;))

Yes, that makes sense. I'm updating that now.
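The chaining agreed on here could look like the following sketch. texArgFile and texArgFiles are the names used in this thread; texOptFile and texOptFiles are assumed names for the corresponding option groups. The last line is a quick way to check which highlight group a name finally resolves to.

```vim
" The comma-separated variants inherit from the single-file groups,
" so restyling texArgFile also restyles texArgFiles.
highlight def link texArgFiles texArgFile
highlight def link texOptFiles texOptFile

" Print the highlight group that texArgFiles finally resolves to.
echo synIDattr(synIDtrans(hlID('texArgFiles')), 'name')
```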

Thanks! (How did you fix my checkbox list, by the way? I tried and couldn't get it to work...)

For some reason, the - list marker did not work as reliably as the * list marker. Not sure why. I removed the checkboxes, btw, because they show up in the total "task list" for the PR. Instead, I moved the ones I would act on to the original post.

does it make sense to label \item special?

I guess that may make sense. It is easy to add support for it. Do you have a concrete suggestion?

Not really, no... I guess the same group as \begin would make most sense?

No, because that is texCmdEnv. I would propose texCmdItem, and perhaps default highlight target be texCmd (i.e. no difference). Open to suggestions, though.
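As a sketch of this proposal (texCmdItem is the name suggested here, not necessarily what ends up in vimtex), the default could simply fall back to texCmd, and anyone who prefers \item to look like \begin could relink it in their own config:

```vim
" In the syntax script (proposed default): \item looks like any other command.
highlight def link texCmdItem texCmd

" In a user's vimrc (optional override): style \item like \begin and \end.
highlight! link texCmdItem texCmdEnv
```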

Well, I think it'd be easier (for me) if things started from texParts (and then specializing to texPartsCmd and texPartsName); this makes for a more balanced tree -- otherwise you basically have huge top-level lists texCmd* and texArg* (and texOpt*).

I don't agree, but I might not understand properly what you mean. Yes, there will be quite large trees, but I don't find the texParts to be a proper general "thing" (i.e. not all documents have part/section commands). If you can give some specific examples of what you mean, preferably by comparing to the current highlight groups, it might be easier to understand why you think the current "scheme" is not optimal.

@clason
Contributor

clason commented Nov 1, 2020

Yes, perhaps. But in this case you might want to use \newcommand{\umin}{\ensuremath{u_\min}}?
Not necessary; I think in the old highlighting the error was not highlighted at all. It's all a question of what we want.

Indeed; especially whether you want the syntax group highlighting to serve as a sort of linter. I personally don't, because I don't think you can make it robust enough, but your view may be different (and that is fine).

For some reason, the - list marker did not work as reliably as the * list marker. Not sure why. I removed the checkboxes, btw, because they show up in the total "task list" for the PR. Instead, I moved the ones I would act on to the original post.

Curious; I used - because * worked even less well... Anyway, moving it is fine, and I'll refrain from adding more ;)

No, because that is texCmdEnv. I would propose texCmdItem, and perhaps default highlight target be texCmd (i.e. no difference). Open to suggestions, though.

That makes sense as well, and is certainly least surprising. My thought was coming from the similarity to \left...\middle...\right and the similar movement behavior for begin...item...end, so these seemed to be birds of a feather. (And, of course, it'd get its own group, which I'd link to texCmdEnv by default.)

I don't agree, but I might not understand properly what you mean. Yes, there will be quite large trees, but I don't find the texParts to be a proper general "thing" (i.e. not all documents have part/section commands). If you can give some specific examples of what you mean, preferably by comparing to the current highlight groups, it might be easier to understand why you think the current "scheme" is not optimal.

Not sure I can explain it any better, I just find texPartsCmd easier to parse than texCmdParts.

@lervag
Owner Author

lervag commented Nov 1, 2020

Indeed; especially whether you want the syntax group highlighting to serve as a sort of linter. I personally don't, because I don't think you can make it robust enough, but your view may be different (and that is fine).

I think we mostly agree on this. I'd like to think about it some more, though.

Curious; I used - because * worked even less well... Anyway, moving it is fine, and I'll refrain from adding more ;)

No problem, I'm happy to get some feedback and further suggestions! :)

No, because that is texCmdEnv. I would propose texCmdItem, and perhaps default highlight target be texCmd (i.e. no difference). Open to suggestions, though.

That makes sense as well, and is certainly least surprising. My thought was coming from the similarity to \left...\middle...\right and the similar movement behavior for begin...item...end, so these seemed to be birds of a feather. (And, of course, it'd get its own group, which I'd link to texCmdEnv by default.)

I've pushed this now where \item gets the same default highlighting as the texEnvName. It might not be what you were suggesting. I think it could be interesting to have it highlighted differently. What do you think of this default behaviour?

Not sure I can explain it any better, I just find texPartsCmd easier to parse than texCmdParts.

Ah, I think I get the "gist" now. You mean the group names should be e.g.:

  • texCmd for generic commands
  • texArg for generic arguments
  • texParm for generic parameters
  • texEnvCmd for e.g. \begin and \env
  • texEnvArgName for the {name} in \begin{name}
  • texAuthorArg for the {names} in \author{names}
  • texNewenvParm for parameter in \newenvironment arguments

And so on. This is currently named like this:

  • texCmd for generic commands
  • texArg for generic arguments
  • texParm for generic parameters
  • texCmdEnv for e.g. \begin and \env
  • texArgEnvName for the {name} in \begin{name}
  • texArgAuthor for the {names} in \author{names}
  • texParmNewenv for parameter in \newenvironment arguments

If I understand correctly, your point is to collect the various parts of a family of syntax items with the same stem. It's like BFS vs DFS, right?

It is not hard to make this change. The question is what would be better. I've been thinking in the opposite manner where I've "stemmed" the type of item first and the "family" later. It felt right when changing things and has been easy to work with, but it might not be the optimal "UI" for users.

@clason
Contributor

clason commented Nov 2, 2020

I've pushed this now where \item gets the same default highlighting as the texEnvName. It might not be what you were suggesting. I think it could be interesting to have it highlighted differently. What do you think of this default behaviour?

Hmm, I think I prefer texCmdEnv, since \item is a structural command on the same level (and with the same form) as \begin; it looks a bit strange to have it colored differently (and, currently, the same as a \label argument):

[Screenshot from 2020-11-02 not reproduced]

(I don't think the separate highlight group will be needed in general, since most people will leave texCmdEnv linked to texCmd; it's only if they link it to something else that they probably want to change \item to match.)
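A minimal Python model of how Vim follows `:highlight link` chains shows why linking texCmdItem to texCmdEnv, rather than straight to texCmd, lets \item follow a user's override. The link table and the final target `Statement` are hypothetical defaults chosen for illustration, not vimtex's actual ones.

```python
# Hypothetical default links (group -> target group). The names follow the
# discussion; the final target "Statement" is an assumption.
DEFAULT_LINKS = {
    "texCmdItem": "texCmdEnv",
    "texCmdEnv": "texCmd",
    "texCmd": "Statement",
}


def resolve(group, links):
    """Follow highlight links until reaching a group that is not linked further."""
    seen = set()
    while group in links:
        if group in seen:
            raise ValueError(f"link cycle at {group}")
        seen.add(group)
        group = links[group]
    return group


def with_override(links, group, target):
    """Return a copy of links with a single user override applied."""
    updated = dict(links)
    updated[group] = target
    return updated


# Defaults: \item resolves to the same highlight as any other command.
print(resolve("texCmdItem", DEFAULT_LINKS))  # Statement

# Re-linking texCmdEnv changes \item too, but leaves generic commands alone.
user_links = with_override(DEFAULT_LINKS, "texCmdEnv", "Special")
print(resolve("texCmdItem", user_links))  # Special
print(resolve("texCmd", user_links))  # Statement
```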

If I understand correctly, your point is to collect the various parts of a family of syntax items with the same stem. It's like BFS vs DFS, right?

Yes, exactly!

It is not hard to make this change. The question is what would be better. I've been thinking in the opposite manner where I've "stemmed" the type of item first and the "family" later. It felt right when changing things and has been easy to work with, but it might not be the optimal "UI" for users.

You are probably right that this comes down to what's more natural to "developers" vs. "users". Either is a valid choice; I just wanted to bring it up as an alternative. If it's down to us two only, your vote carries more weight, of course ;)

@clason
Contributor

clason commented Nov 2, 2020

Here's another possible syntax group missing from the builtin syntax file: \footnote(text) (both command and arg)

Not sure what the best default highlight group for this is, though -- it feels most related to sectioning, but not very strongly? Kinda on the same level as \author/\title, maybe? (But you often see a footnote in an author text, so that'd be weird...)

@lervag
Owner Author

lervag commented Nov 3, 2020

"Minor" things

Hmm, I think I prefer texCmdEnv, since \item is a structural command on the same level (and with the same form) as \begin; it looks a bit strange to have it colored differently (and, currently, the same as a \label argument):

Sounds reasonable; changing this now.

Here's another possible syntax group missing from the builtin syntax file: \footnote(text) (both command and arg)

Not sure what the best default highlight group for this is, though -- it feels most related to sectioning, but not very strongly? Kinda on the same level as \author/\title, maybe? (But you often see a footnote in an author text, so that'd be weird...)

Do you think \footnote{...} should be highlighted differently, though? I think that's the main/first question. If so, then I would propose to call it texCmdFootnote. The argument accepts "any" TeX code, I guess, but we could e.g. highlight it similar to a comment, for instance? I.e., let the texArgFootnote be linked to texComment, but also let it contain stuff like texCmd so that it can contain other TeX commands.

Structure of syntax group names

You are probably right that this comes down to what's more natural to "developers" vs. "users". Either is a valid choice; I just wanted to bring it up as an alternative. If it's down to us two only, your vote carries more weight, of course ;)

Hah, yes, I guess that's the benefit of being the developer. Still, although I think I am often able to find good patterns, I don't believe I always make the right calls. So, I'm more than willing to discuss this.

So. We have sort of "two" proposals right now:

  1. My current implementation: tex{type}{family}{more}. Examples of this
    scheme (sorted):

    • texArgAuthor
    • texArgEnvName
    • texCmdAuthor
    • texCmdEnv
    • texCmdItem
    • texRegionMath
    • texRegionVerb
  2. @clason's "opposite" idea: tex{family}{more}{type} (or tex{family}{type}{more}). For instance:

    • texAuthorArg
    • texAuthorCmd
    • texEnvArgName
    • texEnvCmd
    • texItemCmd
    • texMathRegion
    • texVerbRegion

I think both schemes should generally work well, and I have no very strong opinions. But if I hear a strong argument for why proposal 2 would be better than 1, then I will accept it and update the PR accordingly.
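The two proposals are permutations of the same parts, so one can be derived from the other mechanically. This Python sketch rewrites proposal-1 names into the tex{family}{type}{more} variant of proposal 2. It assumes the type is one of Arg, Cmd, or Region and that the family is a single CamelCase word; the function names are made up for the example.

```python
import re

# Type prefixes used in the proposal-1 examples.
TYPES = {"Arg", "Cmd", "Region"}


def split_camel(name):
    """Split a CamelCase string such as 'ArgEnvName' into ['Arg', 'Env', 'Name']."""
    return re.findall(r"[A-Z][a-z]*", name)


def to_family_first(group):
    """Rewrite tex{type}{family}{more} as tex{family}{type}{more}.

    Names without a known type prefix (e.g. plain texCmd) are returned unchanged.
    """
    if not group.startswith("tex"):
        return group
    parts = split_camel(group[len("tex"):])
    if len(parts) < 2 or parts[0] not in TYPES:
        return group
    kind, family, more = parts[0], parts[1], parts[2:]
    return "tex" + family + kind + "".join(more)


proposal_1 = ["texArgAuthor", "texArgEnvName", "texCmdAuthor", "texCmdEnv",
              "texCmdItem", "texRegionMath", "texRegionVerb"]
for name in proposal_1:
    print(f"{name:15} -> {to_family_first(name)}")
```

Since the conversion is lossless for these examples, the choice seems to come down to which ordering users find easier to scan in a list of highlight groups.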

I'll take the liberty to tag some more people here: @Rmano, @clason, @Konfekt

RFC: Remove the general purpose texMatcher

The texMatcher group is used to highlight matching pairs of braces, { ... }. It also allows us to match errors like mismatched }s. However, it does mean we need to create nested syntax regions. This seems to me to be an unnecessary complication. I also don't think syntax highlighting should be considered a type of linting, except perhaps for some minor cases. I would therefore like to remove this group and do either of the following:

  1. Simply match any top level { and } as a texDelim.

  2. Let the "general" top level texCmd allow arbitrary optional ([ ... ]) and normal ({ ... }) arguments.

I think 1 is simplest and would still "look" right, so that's what I'm currently leaning towards.

@clason
Contributor

clason commented Nov 4, 2020

Do you think \footnote{...} should be highlighted differently, though? I think that's the main/first question. If so, then I would propose to call it texCmdFootnote. The argument accepts "any" TeX code, I guess, but we could e.g. highlight it similar to a comment, for instance? I.e., let the texArgFootnote be linked to texComment, but also let it contain stuff like texCmd so that it can contain other TeX commands.

I think the syntax/highlight groups are perfect as you propose (modulo the discussion on naming ;)). Regarding default linking, Comment was indeed the motivating example I saw on Gitter that prompted my comment (hehe!), but I feel that would be too intrusive as a default. I'd link it directly to Normal (I didn't see a texText group in the new code) and leave it to colorschemes or users to add special highlights -- I'll definitely try texStyleItal for it.

(Basically, my view is that it's good to add as many specific groups as feasible, but not necessarily color them all as differently as possible; people can make their own rainbows ;))

I'll take the liberty to tag some more people here: @Rmano, @clason, @Konfekt

Since I already put in my vote, I'll just wait what others say. Either choice is fine with me.

RFC: Remove the general purpose texMatcher

That is a very good question indeed. I guess strictly speaking, this would be a regression, so there should be a clear benefit in another area to make up for it -- ideally for users as well, such as performance. (I assume it'll make your life much easier, which is already a benefit.)

One point that is often prominently mentioned in favor of tree-sitter is that it "localizes" parsing failures, so that, e.g., a missing brace somewhere doesn't break highlighting in the rest of the file. If this simplification had a similar effect, that'd be a big plus in my book.

@clason
Contributor

clason commented Nov 5, 2020

First, let me say that this is shaping up to look very nice! A few things I'm still missing are

  • A high-level texMathRegion highlight group that I can use to set all math (inline, display, starred display, nested, subscripts, operators, whatever -- except possibly commands) to a specific color or builtin highlight group. (it's texMath)
  • The arguments of commands in math mode (like \norm{...}) are matched as texMatcherMath -- that doesn't seem intended?
  • You've ticked off (math) delimiter matching, but I don't see that -- these are either generic texRegionMathEnv ((, [) or texSpecialChar (\{). On the other hand, \left(, \left[, and \left\{ are texDelimMath; \left) and \left\} are texDelimMathMod, while \left] is texDelimMathSet.
  • There seem to be some "specific" symbols that I'd like to be linked to a more generic highlight group -- beside texSpecialChar, there is at least -- that is just texSymbolDash. Are there others?
  • Should \\ be matched specifically? It seems that if & is, a line break should be as well.
  • Spacing commands like \, don't seem to be matched yet?
  • It's a good question what best to link \textsc{...} to -- texStyleItal feels a bit weird, but is texStyleBold any better?

EDIT Hmmm, looking at the actual code, this should work. For some reason my code snippet is not picking up that top level group... If I link texMath to my favorite highlight group, things look as expected.

  • Another thing: I'm having trouble getting tikzpicture to match -- except texTikzSemicolon, I don't get any TikZ specific highlight groups?

@lervag
Owner Author

lervag commented Nov 6, 2020

RFC: Remove the general purpose texMatcher

That is a very good question indeed. I guess strictly speaking, this would be a regression, so there should be a clear benefit in another area to make up for it -- ideally for users as well, such as performance. (I assume it'll make your life much easier, which is already a benefit.)

One point that is often prominently mentioned in favor of tree-sitter is that it "localizes" parsing failures, so that, e.g., a missing brace somewhere doesn't break highlighting in the rest of the file. If this simplification had a similar effect, that'd be a big plus in my book.

I think my idea was good, but it seems to not work as intended. The problem is that, if I remove the texMatcher group, the other texArg... syntax items will end on the first }. That is, we need the texMatcher to handle the nested sets of braces. If there was a way to make the texArg... groups match regions that handled the nested sets of braces without actually defining "general" sub regions, then that would allow exactly the simplification you mention - this was what I was hoping for.

At the moment, I think the conclusion to this is that we really need the texMatcher. Because of this, we also need the texMatcherMath, and we need similar "specializations" that are relevant inside different regions.
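The brace problem can be reproduced outside Vim with a Python analogy (this is not how Vim's syntax engine is implemented). A flat pattern that stops at the first `}` truncates an argument containing a nested group, while counting nesting depth, which is the job the nested texMatcher regions do, finds the real end. The helper names are hypothetical.

```python
import re

# Flat pattern: the argument ends at the first closing brace.
FLAT_ARG = re.compile(r"\{([^}]*)\}")


def flat_arg(text, start):
    """Argument beginning at text[start], ending at the first '}'."""
    match = FLAT_ARG.match(text, start)
    return match.group(1) if match else None


def nested_arg(text, start):
    """Argument beginning at text[start], honouring nested and escaped braces."""
    if text[start] != "{":
        raise ValueError("argument must start with '{'")
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \{ or \}
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start + 1:i]
        i += 1
    return None  # unbalanced: no matching closing brace


line = r"\footnote{see \textbf{this} page}"
start = line.index("{")
print(flat_arg(line, start))    # see \textbf{this
print(nested_arg(line, start))  # see \textbf{this} page
```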

Better naming would be good in this case. I'm changing to texGroup now, but feel free to propose a better name! To me, texMatcher does not feel right. Perhaps texArgGeneric is better?

First, let me say that this is shaping up to look very nice! A few things I'm still missing are

Thanks!

  • A high-level texMathRegion highlight group that I can use to set all math (inline, display, starred display, nested, subscripts, operators, whatever -- except possibly commands) to a specific color or builtin highlight group. (it's texMath)

Yes; I've just changed this to texRegionMath because it feels unnecessary to keep the "extra" texMath.

  • The arguments of commands in math mode (like \norm{...}) are matched as texMatcherMath -- that doesn't seem intended?

It is intended; it is necessary because, unless we specify otherwise, we need to assume that the content of arguments of commands in math mode is still math mode. If we instead used the texMatcher (now texGroup), then the content would no longer be highlighted as texRegionMath.
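A toy Python scanner illustrates why math-mode arguments need a math-specific group. Each character is labeled `m` (math) or `t` (text); a brace group either inherits the enclosing context, as texMatcherMath does, or resets to text, as a generic group would. It is a hypothetical sketch that only understands inline `$...$`.

```python
def label_contexts(text, inherit=True):
    """Mark each non-delimiter character as 'm' (math) or 't' (text).

    '$' toggles math mode for the innermost context; '{' opens a group that
    either inherits the enclosing context or resets it to text.
    """
    stack = ["text"]
    out = []
    for ch in text:
        if ch == "$":
            stack[-1] = "text" if stack[-1] == "math" else "math"
            out.append(ch)
        elif ch == "{":
            stack.append(stack[-1] if inherit else "text")
            out.append(ch)
        elif ch == "}":
            if len(stack) > 1:
                stack.pop()
            out.append(ch)
        else:
            out.append("m" if stack[-1] == "math" else "t")
    return "".join(out)


sample = r"a $\norm{x}$"
print(label_contexts(sample))                 # tt$mmmmm{m}$
print(label_contexts(sample, inherit=False))  # tt$mmmmm{t}$
```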

  • You've ticked off (math) delimiter matching, but I don't see that -- these are either generic texRegionMathEnv ((, [) or texSpecialChar (\{). On the other hand, \left(, \left[, and \left\{ are texDelimMath; \left) and \left\} are texDelimMathMod, while \left] is texDelimMathSet.

Can you give some more specific examples? In my (simple) tests, it looked like I was close to consistent here.

  • There seem to be some "specific" symbols that I'd like to be linked to a more generic highlight group -- beside texSpecialChar, there is at least -- that is just texSymbolDash. Are there others?

Good question; I'll mirror it back to you. Feel free to suggest explicit symbols and target groups.

  • Should \\ be matched specifically? It seems that if & is, a line break should be as well.

You mean something like texSymbolNewline? That's OK by me.

  • Spacing commands like \, don't seem to be matched yet?

You're right. I could add that, but it would help if you/someone could provide me the full list of commands (both short and long names).

  • It's a good question what best to link \textsc{...} to -- texStyleItal feels a bit weird, but is texStyleBold any better?

Agreed. Another possibility is to just keep it unstyled?

  • Another thing: I'm having trouble getting tikzpicture to match -- except texTikzSemicolon, I don't get any TikZ specific highlight groups?

Can you be specific? For me in my examples, it seems to work OK.

@clason
Contributor

clason commented Nov 6, 2020

I think my idea was good, but it seems to not work as intended. The problem is that, if I remove the texMatcher group, the other texArg... syntax items will end on the first }. That is, we need the texMatcher to handle the nested sets of braces. If there was a way to make the texArg... groups match regions that handled the nested sets of braces without actually defining "general" sub regions, then that would allow exactly the simplification you mention - this was what I was hoping for.

At the moment, I think the conclusion to this is that we really need the texMatcher. Because of this, we also need the texMatcherMath, and we need similar "specializations" that are relevant inside different regions.

Better naming would be good in this case. I'm changing to texGroup now, but feel free to propose a better name! To me, texMatcher does not feel right. Perhaps texArgGeneric is better?

I agree Matcher is a bit unclear. I don't have a strong opinion (or good idea), but texGroup and texGroupMath would indeed be better. I feel texArgGeneric implies that it is the argument of something -- i.e., contained in {...}. If that is always the case, that's fine, of course.

Yes; I've just changed this to texRegionMath because it feels unnecessary to keep the "extra" texMath.

But not yet pushed?

  • You've ticked off (math) delimiter matching, but I don't see that -- these are either generic texRegionMathEnv ((, [) or texSpecialChar (\{). On the other hand, \left(, \left[, and \left\{ are texDelimMath; \left) and \left\} are texDelimMathMod, while \left] is texDelimMathSet.

Can you give some more specific examples? In my (simple) tests, it looked like I was close to consistent here.

I thought these were specific? (It's not enough to look at the color, because it might happen that the specific color scheme you or I are using happens to assign the same color to different highlight groups.)

So is it intended that ( and \{ and \left] are different syntax groups?

  • There seem to be some "specific" symbols that I'd like to be linked to a more generic highlight group -- beside texSpecialChar, there is at least -- that is just texSymbolDash. Are there others?

Good question; I'll mirror it back to you. Feel free to suggest explicit symbols and target groups.

texSymbol would be a natural "high-level" target group. As for symbols, I've only noticed --/---, &, and \\ so far. To be honest, there are so many symbols (\&, \%, \S, ...) that cataloguing them would be difficult...

(And going back to the initial three, --/--- is quite different from & and \\. So maybe one group for "literal symbols" -- which I'd probably color the same as texCmd -- and one group for "syntax symbols" & and \\ -- which I'd color differently -- would be good?)
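One way to read the proposed split: "literal" symbols stand for a printed glyph (--, ---, \&, \%), while "syntax" symbols structure an alignment (& and \\). The Python sketch below classifies them with a single regex, trying longer symbols first; the group names texSymbolLiteral and texSymbolSyntax are hypothetical.

```python
import re

# Hypothetical group names for the two kinds of symbol.
SYMBOLS = {
    "---": "texSymbolLiteral",  # em dash
    "--": "texSymbolLiteral",   # en dash
    r"\&": "texSymbolLiteral",  # a printed ampersand
    r"\%": "texSymbolLiteral",  # a printed percent sign
    "&": "texSymbolSyntax",     # column separator in tabular/align
    r"\\": "texSymbolSyntax",   # row separator / line break
}

# Try longer symbols first so that '---' is not read as '--' plus '-'.
PATTERN = re.compile(
    "|".join(re.escape(s) for s in sorted(SYMBOLS, key=len, reverse=True))
)


def classify(line):
    """Return (symbol, group) pairs in the order they occur in line."""
    return [(m.group(), SYMBOLS[m.group()]) for m in PATTERN.finditer(line)]


print(classify(r"1--2 & 3---4 \\"))
```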

  • Spacing commands like \, don't seem to be matched yet?

You're right. I could add that, but it would help if you/someone could provide me the full list of commands (both short and long names).

What do you mean by "long name"? The commands themselves are

  • \; - a thick space
  • \: - a medium space
  • \, - a thin space
  • \! - a negative thin space
  • as well as \quad and \qquad
  • (and \bigskip, \medskip, \smallskip, if you want to add vertical spacing)
  • It's a good question what best to link \textsc{...} to -- texStyleItal feels a bit weird, but is texStyleBold any better?

Agreed. Another possibility is to just keep it unstyled?

Also good! (In fact, probably simplest.)

  • Another thing: I'm having trouble getting tikzpicture to match -- except texTikzSemicolon, I don't get any TikZ specific highlight groups?

Can you be specific? For me in my examples, it seems to work OK.

Hmm, with the latest pushed commits, the results are different for me as well.

But something like this:

\documentclass{article}
\usepackage[svgnames]{xcolor} % provides the DarkBlue color used below
\usepackage{tikz}
\usepackage{pgfplots}
\begin{document}
\begin{figure}
    \centering
    \begin{tikzpicture}
        \begin{axis}[%
            width=0.75\linewidth,
            height=6cm,
            xmin=-0.25,
            xmax=2.5,
            ymin=-0.25,
            ymax=2.5,
            ytick=\empty,
            extra y tick style={grid=major},
            extra y ticks={0,1,2},
            extra y tick labels={$u_1$,$u_2$,$u_3$},
            axis y line=left,
            axis x line=bottom,
            ]
            \pgfmathsetmacro{\ua}{0}
            \pgfmathsetmacro{\ub}{1}
            \pgfmathsetmacro{\uc}{2}
            \pgfmathsetmacro{\at}{0.2}
            \addplot[%
            domain=-0.25:2.5,samples=200,
            color=DarkBlue,solid,line width=1.5pt,
            ]
            {
                (x<=\ua)*\ua + %
                and(x>(1+\at/2)*\ua+\at/2*\ub,x<(1+\at/2)*\ub+\at/2*\ua)*(x-\at/2*(\ua+\ub)) + %
                and(x>=(1+\at/2)*\ub+\at/2*\ua,x<=(1+\at/2)*\ub+\at/2*\uc)*\ub + %
                and(x>(1+\at/2)*\ub+\at/2*\uc,x<(1+\at/2)*\uc+\at/2*\ub)*(x-\at/2*(\ub+\uc)) + %
                (x>=(1+\at/2)*\uc+\at/2*\ub)*\uc
            };
        \end{axis}
    \end{tikzpicture}
\end{figure}
\end{document}

(Interestingly, in the full file, I get texOptTikzpic for the options, but generic texArgEnvname for the actual picture.)

@lervag
Owner Author

lervag commented Nov 6, 2020

But not yet pushed?

Sorry, pushed now.

So is it intended that ( and \{ and \left] are different syntax groups?

The \left (and \right, \bigl, etc.) is matched as texMathDelimMod, while the (, \{, etc., should be texMathDelim or texMathDelimSet (depending on whether conceal is used). So, the "modifier" is given a different highlighting.

I'll look into this again and make some more specific tests for it.
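The split described here treats \left( as two tokens: the sizing modifier and the delimiter. A Python regex sketch of that split follows; the group names come from this comment, and the modifier list is a partial, illustrative assumption.

```python
import re

# A partial, illustrative list of sizing modifiers (the real set is larger).
MODIFIERS = ["left", "right", "middle", "bigl", "bigr", "Bigl", "Bigr"]
# Delimiters that may follow a modifier; '.' is the "empty" delimiter.
DELIMS = [r"\{", r"\}", r"\|", "(", ")", "[", "]", "|", "."]

mods = "|".join(MODIFIERS)
delims = "|".join(re.escape(d) for d in DELIMS)
# (?![A-Za-z]) stops \leftarrow from being read as \left followed by 'a'.
PATTERN = re.compile(rf"\\(?P<mod>{mods})(?![A-Za-z])\s*(?P<delim>{delims})")


def split_delims(text):
    """Return (modifier, delimiter) pairs: texMathDelimMod, then texMathDelim."""
    return [("\\" + m["mod"], m["delim"]) for m in PATTERN.finditer(text)]


print(split_delims(r"\left( x \middle| y \right]"))
```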

texSymbol would be a natural "high-level" target group.

Agreed!

As for symbols, I've only noticed --/---, &, and \\ so far. To be honest, there are so many symbols (\&, \%, \S, ...) that cataloguing them would be difficult...

Right now, I think \S and such are matched as texSpecialCharacter. Perhaps we should just match them to texSymbol?

(And going back to the initial three, --/--- is quite different from & and \\. So maybe one group for "literal symbols" -- which I'd probably color the same as texCmd -- and one group for "syntax symbols" & and \\ -- which I'd color differently -- would be good?)

Yes, that could work. But I'm not fully sure I understand the distinction between "literal" and "syntax" here.

  • Spacing commands like \, don't seem to be matched yet?

You're right. I could add that, but it would help if you/someone could provide me the full list of commands (both short and long names).

What do you mean by "long name"?

I think all or most of these have long name equivalents, such as \medmuskip, \thinspace, and similar. But I never remember all these names... :p
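A Python sketch of a matcher for the spacing commands listed earlier in the thread, pairing each one-character form with the long name that amsmath defines for it (an assumption worth checking against the amsmath documentation). The pattern refuses to match a command name that continues with more letters.

```python
import re

# Short spacing commands and the long names assumed to be equivalent
# (as defined by amsmath; verify before relying on this mapping).
SHORT_TO_LONG = {
    r"\,": r"\thinspace",
    r"\:": r"\medspace",
    r"\;": r"\thickspace",
    r"\!": r"\negthinspace",
}
# Spacing commands that only exist as words.
WORD_ONLY = [r"\quad", r"\qquad", r"\bigskip", r"\medskip", r"\smallskip"]

words = "|".join(name[1:] for name in [*SHORT_TO_LONG.values(), *WORD_ONLY])
# One-character forms, or a word form not followed by another letter.
SPACING = re.compile(rf"\\[,:;!]|\\(?:{words})(?![A-Za-z])")


def spacing_commands(text):
    """Return every spacing command in text, in order of appearance."""
    return SPACING.findall(text)


print(spacing_commands(r"a\,b \quad c\qquad d \thinspace e"))
```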

I'll also look into the tikz thing later.


I just pushed a major update where I've changed the naming scheme to something much more similar to what you've asked for. I'm not sure if it is fully what you asked for, so feel free to comment on it. The main idea is this: Almost everything starts from a command, e.g. \author -> texCmdAuthor, but from there, I use the "object" as the grouping factor, so texAuthorArg. So, almost everything is now grouped by the actual "object"/"item", except I still prefix texCmd for the actual command.

I also use texCluster as a general prefix, and I'll try to generalize the clusters a little bit more to make it easier to maintain everything.

@Rmano

Rmano commented Nov 24, 2020

Thanks, @Rmano, please share it by email. I'll look into it!

You've got an email :-)

lervag merged commit d623e56 into master on Nov 24, 2020
lervag deleted the feat/syntax branch on November 24, 2020 08:48
@lervag
Owner Author

lervag commented Nov 24, 2020

See: https://github.com/lervag/vimtex/releases/tag/v2.0

(Feel free to let me know if you think I should make some minor updates to the release text!)

@clason
Contributor

clason commented Nov 24, 2020

Great news, thank you so much for doing this! TeX files have never looked so good :) (Seriously.)

If you think it's helpful, you could put my colorscheme snippet in the Wiki and refer to that from the release notes? (The colors and links are personal preference, of course, but it's always easier to tinker with something existing than to start from scratch.)

@lervag
Owner Author

lervag commented Nov 24, 2020

Great news, thank you so much for doing this! TeX files have never looked so good :) (Seriously.)

I agree! There are still some things that need improving (e.g. TikZ support), but I think it will be much easier to do it now.

If you think it's helpful, you could put my colorscheme snippet in the Wiki and refer to that from the release notes? (The colors and links are personal preference, of course, but it's always easier to tinker with something existing than to start from scratch.)

Yes, that's not a bad idea. I've not really spent much time on the wiki myself, as I prefer to keep the README and docs updated as the main references. But I agree that for things like this, the wiki makes sense.

Would you mind writing a first version of this that I could review and then link to in the release notes?

@clason
Contributor

clason commented Nov 24, 2020

Done! Feel free to edit any way you deem fit (including outright deletion)!

@lervag
Owner Author

lervag commented Nov 24, 2020

Thanks! I think that looks good. I'll add a link both from the docs and from the release page.

@g6ai

g6ai commented Dec 21, 2020

Thanks @lervag for the amazing work! I see g:tex_comment_nospell is disabled; how can I mimic this option and exclude comments from spell checking?

I have tried textidote and vlty; they can skip comments when spell checking, but I only managed to show their results in the quickfix window rather than with Vim's native in-line display. If I could get in-line results similar to their HTML output, they would be a lot easier to use :)

@lervag
Copy link
Copy Markdown
Owner Author

lervag commented Dec 22, 2020

Happy to hear you like it, @g6ai! You ask a good question. We did discuss spell checking comments at some time, but I don't remember where (and when). The conclusion was to enable spell checking and remove the option. This might have been a bad conclusion, but it is easy to change. Can you open a new issue in which we can have a new discussion?


4 participants