Skip to content

[WIP] add folding for bib files#2066

Closed
yongrenjie wants to merge 11 commits intolervag:masterfrom
yongrenjie:2065-bib-foldtext
Closed

[WIP] add folding for bib files#2066
yongrenjie wants to merge 11 commits intolervag:masterfrom
yongrenjie:2065-bib-foldtext

Conversation

@yongrenjie
Copy link
Copy Markdown
Contributor

@yongrenjie yongrenjie commented Jun 9, 2021

Since opening the issue (#2065) I've tweaked the output a bit, here's a sample (first one unfolded for comparison):

Screenshot 2021-06-09 at 9 05 06 PM

@lervag
Copy link
Copy Markdown
Owner

lervag commented Jun 10, 2021

Great, thanks! I'll review as soon as I get the chance. One thing I notice immediately: Should we shorten the first part of the foldtext if it becomes too long, e.g. in the lines SchulzeSunninghausen2014* in your screenshot?

@yongrenjie
Copy link
Copy Markdown
Contributor Author

That's something I was not entirely sure about, too. On one hand it would looks nicer if it's trimmed. On the other hand I think the cite key is the most important bit of info. In this case, if we trim the length to 32 or fewer chars, then the user can't differentiate the keys ...2014JACS and ...2017JMR, and I think that's why I implemented it the way I did. But I don't think I'm hugely opinionated either way.

I experimented a bit with calculating the length of the longest key when opening the buffer, and setting it to a b:vimtex_fold_bib_maxwidth variable. Basically, it uses the same code as in the foldtext function to parse the bib file for cite keys. For an 837-line bib file this takes about 22 ms (on my computer, at least), and it scales linearly with the length of the file.

@yongrenjie
Copy link
Copy Markdown
Contributor Author

I pushed a second commit which does this, it's just a draft to see what you think about the behaviour, which I can clean up later!

Here's a short video I recorded. The buffer variable is recalculated when opening and also when saving, hence the 'jumps' when I save:

Untitled.mp4

@lervag
Copy link
Copy Markdown
Owner

lervag commented Jun 11, 2021

That's something I was not entirely sure about, too. On one hand it would looks nicer if it's trimmed. On the other hand I think the cite key is the most important bit of info.

Yes, agreed. Good point.

I experimented a bit with calculating the length of the longest key when opening the buffer, and setting it to a b:vimtex_fold_bib_maxwidth variable. Basically, it uses the same code as in the foldtext function to parse the bib file for cite keys.

Yes, that sounds good. I think having an option as well, e.g. g:vimtex_fold_bib_max_key_width, may be useful. To allow to restrict it, in case someone tends to have a few very long keys.

For an 837-line bib file this takes about 22 ms (on my computer, at least), and it scales linearly with the length of the file.

So, in the insane case you have ~10000 lines, it would still only be about ~250 ms, which should be acceptable for most people. Let's say they have a slow computer, then it could be as bad as 1 second. Still acceptable if this is only when the file is initially opened.

Copy link
Copy Markdown
Owner

@lervag lervag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a review of the code as well, and I am very glad to see that this is high quality content! Well done!

One "major" discussion point: should we somehow merge some of the bulk code here with the code under autoload/vimtex/parser/bib.vim? That could help make it useful for other related features in the future. Note, I won't block merging because of this, I can also simply consider to do that myself at some time. But hearing other peoples opinions are always interesting.

@yongrenjie
Copy link
Copy Markdown
Contributor Author

I must admit I've not looked at the bib parsing code much, and I'm going to be a little busy these few days, but will come back to it (and make the changes above) when I can!

As for moving the code inside autoload/vimtex/parser/bib.vim... I'm not sure about the extent to which it can be fully "integrated".

If we modify function! s:parse_with_vim(), then but the function signature would have to change in a couple of ways:

  1. to either perform full parsing (the existing function), or cheap parsing (which does only entry type and key). Chances are we have to keep it separate for performance reasons - we don't want to do full parsing for the folds.
  2. to accept either filename or a list of lines

That would introduce some kind of asymmetry between the bibtex/bibparse and vim parsers, which seems to be a bit unsatisfying to me.

The alternative would be to keep the parsing function separate, lets say write a s:cheap_parse_with_vim(), and to make use of the underlying parser 'primitives' e.g. s:get_value(). I don't think I can look at this properly in the next few days. But my impression was that they are all interlinked (since they pass dictionaries to each other), and trying to use just one of them and not the rest might be a bit tricky.

All in all I'm more inclined to keep it separate, mainly because I don't see the best way to merge the code, but I would defer to you on that 🙂

One benefit of utilising the existing parser (if it's feasible, and not as hard as I've claimed) might be to deal with edge cases. I tried to deal with a reasonable number of cases, but I know the foldtext function will break with pathological input like title = "something" # {another thing}, (the regex will only capture something).

On the topic of edge cases: is it worth adding some tests for the above as well?

@lervag
Copy link
Copy Markdown
Owner

lervag commented Jun 13, 2021

I must admit I've not looked at the bib parsing code much, and I'm going to be a little busy these few days, but will come back to it (and make the changes above) when I can!

No problem, and no rush!

As for moving the code inside autoload/vimtex/parser/bib.vim... I'm not sure about the extent to which it can be fully "integrated".
‥‥

Let's first assume we don't need to care about the performance. In this case, I think the only real change would be to allow a list of lines as an alternative input. It would bring an asymmetry, unless we also update the other alternatives to work with this kind of input (e.g. we could save the lines to a file first).

The alternative would be to keep the parsing function separate, lets say write a s:cheap_parse_with_vim(), and to make use of the underlying parser 'primitives' e.g. s:get_value(). I don't think I can look at this properly in the next few days. But my impression was that they are all interlinked (since they pass dictionaries to each other), and trying to use just one of them and not the rest might be a bit tricky.

If performance is important (and it may very well be!), then I think the idea would be to implement the parser as a full API function with signature vimtex#parser#bib#cheap_parse(lines). I think this might be a good idea as a fallback solution, because it would make it more likely that the code could be reused and improved in future work.

I do agree that adding such an extra function does seem slightly "off", but I think this is one of those cases where some comments may help our future selves (and others) understand the reasons we decided this (or that) way.

On the topic of edge cases: is it worth adding some tests for the above as well?

Yes, definitely! I think it could be quite simple, actually:

  1. Create a new test folder test-folding-bib.

  2. Either rely on the existing common/huge.bib or similar, or create a new test.bib inside the test folder.

  3. Create a test.vim file that opens a bib file. Ensure all folds are closed. Then use some assert_equals based on results of foldtextresult(lnum).

I think the test-utils may be a good example of how I write the Makefiles and test scripts, if you want to give it a shot.

@lervag
Copy link
Copy Markdown
Owner

lervag commented Jun 13, 2021

By the way; I realize there's a choice here that is undecided and sort of hard to decide. My suggestion would be to go for the cheap parser, but to move the main parsing code to e.g. vimtex#parser#bib#cheap_parse(lines). I think it would be good, and not too hard, to allow it to work well both on any number of lines that contain one or more entries. It should return a list of parsed results. This would allow to use the same function both in the foldtext function and in the get_max function.

@yongrenjie
Copy link
Copy Markdown
Contributor Author

Ok, finally free from real life work for a bit 😅

I agree with your assessment re. implementing vimtex#parser#bib#cheap_parse; I think for me performance would be the main reason. I'll work on that main code first, then can write some tests in the coming days, since I probably need a little bit of time to figure out how the tests are laid out.

@yongrenjie
Copy link
Copy Markdown
Contributor Author

I think the actual code is done. Here is how it looks with the default g:vimtex_fold_bib_max_key_width=0

Screenshot 2021-06-21 at 8 10 02 PM

and with g:vimtex_fold_bib_max_key_width=20

Screenshot 2021-06-21 at 8 10 10 PM

I added some docs as well, and will look into tests. But one more thing regarding edge cases: I'm pretty certain that the most brittle part of the current setup is actually foldmethod=syntax. For example, the following snippet

@string{ foo = "Mrs. Foo" }

@article{Kupce2017ACIE,
    doi = {10.1002/anie.201705506},
    author = {Kup{\v{c}}e, {\=E}riks and Claridge, Tim D.\ W.},
    journal = {Angew.\ Chem.\ Int.\ Ed.},
    title = {{NOAH}: {NMR} Supersequences for Small Molecule Analysis and Structure Elucidation},
    year = {2017},
    volume = {56},
    issue = {39},
    pages = {11779--11783},
}

produces a single fold for all 12 lines, instead of (probably) the more natural result of @string line being left on its own and the @article being folded. So it seems like it might be sensible to write a foldexpr as well, and I'd wager that some of the existing code could be used for that as well... I'll see what I can come up with.

@yongrenjie yongrenjie changed the title add bib foldtext function [WIP] add folding for bib files Jun 21, 2021
@lervag
Copy link
Copy Markdown
Owner

lervag commented Jun 21, 2021

I think the actual code is done. Here is how it looks ...

Great, this looks good! One possibly good idea could be to parse the titlestring and remove the brackets, as that would allow to show more of the title text. But that's clearly not very important, of course.

I added some docs as well, and will look into tests.

Great!

But one more thing regarding edge cases: I'm pretty certain that the most brittle part of the current setup is actually foldmethod=syntax. For example, ...

Ah, yes. I've generally found fdm=syntax to be quite brittle, and I agree it could be simple to write a foldexpr for bib files. So that might be a good improvement here! (However, for a robust fold expression, you would probably need to match/count the braces. People who are "messy" and careless may put their braces at unexpected places. :p --- I do this in the TeX fold code, and I can probably help if you want.)

Copy link
Copy Markdown
Owner

@lervag lervag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand you still want to make some more changes. Still, here are some minor comments to the current version. In general, I'm very happy to see that you are suggesting very high quality work (in my opinion!). Thanks!

@yongrenjie
Copy link
Copy Markdown
Contributor Author

@lervag Sorry about all the force pushes/rebasing! I think I'm done for now, though.

I wrote a foldexpr in vimtex#fold#bib#level(lnum) which seems to be fairly robust towards badly positioned curly braces (the test contains a pretty good demonstration IMO). It's slow, no surprises I guess!

File Number of lines foldtext (ms) foldexpr (ms)
hsqc1.bib 845 10 86
hsqc2.bib 1690 20 161
hsqc4.bib 3380 38 300
hsqc8.bib 6760 70 592
hsqc16.bib 13520 122 1165

Here foldtext refers to the fime needed to evaluate vimtex#fold#bib#get_max_key_width(), and foldexpr refers to the time needed to evaluate vimtex#fold#bib#level(lnum) on every line of the file. (hsqc1.bib is this file here; the rest are just hsqc1.bib concatenated with itself as many times as necessary.)

With the current code in the PR, hsqc8.bib takes 855 ms to startup (this includes various other plugins which I didn't bother disabling), whereas if I revert to foldmethod=syntax (but keeping the foldtext functions) this drops to 446 ms. So I wonder whether one option to 'fall back' to foldmethod=syntax, but keep the foldtext functionality, might be useful. Or maybe I'm just worrying too much about people who keep crazy 10k-line bib files. :-)

There are also some tests now, and I changed whatever was necessary in the docs (which turned out to be very little).

Let me know what you think / if you want me to change anything, whenever you are free, of course!

@lervag
Copy link
Copy Markdown
Owner

lervag commented Jun 29, 2021

Thanks! And sorry for the delay, I'll try and get time to review and possibly merge tomorrow or the day after!

Copy link
Copy Markdown
Owner

@lervag lervag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lervag Sorry about all the force pushes/rebasing! I think I'm done for now, though.

No need to apologize; I really do prefer the force pushing (although I don't enforce it). :)

I wrote a foldexpr in vimtex#fold#bib#level(lnum) which seems to be fairly robust towards badly positioned curly braces (the test contains a pretty good demonstration IMO). It's slow, no surprises I guess!

Yes, it's a known "issue" with foldexpr and the reason why plugins like Fastfold exists. Thanks for looking into the timings. I could make the same tests on my machine, but your bib file seems to be unavailable (perhaps private?).

With the current code in the PR, hsqc8.bib takes 855 ms to startup (this includes various other plugins which I didn't bother disabling), whereas if I revert to foldmethod=syntax (but keeping the foldtext functions) this drops to 446 ms. So I wonder whether one option to 'fall back' to foldmethod=syntax, but keep the foldtext functionality, might be useful. Or maybe I'm just worrying too much about people who keep crazy 10k-line bib files. :-)

I won't mind bothering users a little bit with pushing this and then waiting for the reactions. I'll have a deeper look into the foldexpr function after merging to see if I can spot some optimizations (I did not see anything obvious, except perhaps a minor improvement with the foldlevel() thing.

There are also some tests now, and I changed whatever was necessary in the docs (which turned out to be very little).

Great!

Let me know what you think / if you want me to change anything, whenever you are free, of course!

I had a few minor comments/suggestions. After that, I think I'll do a merge. Again, I appreciate the high quality of your contribution!

@yongrenjie
Copy link
Copy Markdown
Contributor Author

I could make the same tests on my machine, but your bib file seems to be unavailable (perhaps private?).

Whoops! Yes it was private. Fixed now. https://github.com/yongrenjie/hsqc-paper/blob/master/main_final/hsqc.bib

I'll refactor the rest tomorrow probably. Had another look at the foldexpr and managed to squeeze out a 5–10% speedup (so now hsqc8.bib is around 540–550 ms, from the previous 592 ms), but I don't think I've any more ideas.

[I'm sure there is some way to improve it, though; the alternative is claiming that I got it right on the first go. :)]

yongrenjie added 11 commits July 2, 2021 16:06
... The actual parsing is now done in vimtex#parser#bib#cheap_parse(),
and the folding functions merely make use of this.
Deals more gracefully with @comment, @Preamble, etc. and other possible
non-standard @___{} constructs that may not be fully parsed by
vimtex#parser#bib#parse_cheap().
This makes foldtext look sensible in the very specific case where one
opens a blank bib file, adds an entry to it, and closes the fold before
saving.
@yongrenjie
Copy link
Copy Markdown
Contributor Author

@lervag OK, I'm finally done! I couldn't get around recursively calling vimtex#fold#bib#level(), unfortunately: when I tried to do so I introduced bugs which my tests didn't catch... But the last commit cuts around 20% of the time calculating foldexprs, which I'm pretty happy with.

Basically the last two commits contain the only 'new' changes since your last round of comments. I also added more tests, which got squashed into the earlier tests commit.

@lervag
Copy link
Copy Markdown
Owner

lervag commented Jul 4, 2021

Thanks! It looks good!

I've looked at the foldlevel code: You need the recursion because you want to handle the case of single line entries. I'm curious if it would not be a better idea to handle single line entries by not giving them the result >1 in the first place. I.e., perhaps we could count the braces when we find @... and return 0 if the element is self contained on a single line? If this was done, then I believe you could avoid the recursion. Not sure if it is worth the effort, though; I'll still merge this now, but feel free to investigate if you think my idea may have merit.

@lervag lervag closed this Jul 4, 2021
@lervag
Copy link
Copy Markdown
Owner

lervag commented Jul 4, 2021

I've merged now. I also made some minor changes on top. Nothing important. To test my idea, I believe the only necessary change is to add to the if l:firstline == a:lnum conditional - perform the count here as well (save count results) and return 0 if the braces are balanced, e.g.:

  " Check if we're at fold start
  let l:text = join(getline(l:firstline, a:lnum))
  let l:braces_balanced = s:count(l:text, '{') == s:count(l:text, '}')
  if l:firstline == a:lnum
    return l:braces_balanced ? 0 : '>1'
  endif

  " Beginning of entry wasn't found
  if l:firstline == 0 | return 0 | endif

  " Check if braces are closed by the current line
  return l:braces_balanced ? '<1' : 1

Ref:

" Check if we're at fold start
if l:firstline == a:lnum | return '>1' | endif
" Beginning of entry wasn't found
if l:firstline == 0 | return 0 | endif
" Check if braces are closed by the current line
let l:text = join(getline(l:firstline, a:lnum))
return s:count(l:text, '{') == s:count(l:text, '}') ? '<1' : 1

@yongrenjie
Copy link
Copy Markdown
Contributor Author

Thanks! I have a vague recollection that I tried this, and it failed the tests. IIRC the problematic bib entry is

@article
{
Kupce2021NRMP3
,
doi = {10.1038/s43586-021-00024-3},
author = {Kup{\v{c}}e, {\=E}riks and Frydman, Lucio and Webb, Andrew G. and Yong, Jonathan R.\ J. and Claridge, Tim D.\ W.},
title = {Parallel nuclear magnetic resonance spectroscopy},
journaltitle = {Nat.\ Rev.\ Methods Primers},
year = {2021},
volume = {1},
number = {1},
pages = {No. 27}}

Admittedly, it is probably a pretty rare edge case, so if there is a performance boost it might be worthwhile on balance to change it.

@yongrenjie yongrenjie deleted the 2065-bib-foldtext branch July 4, 2021 22:40
yongrenjie added a commit to yongrenjie/abbotsbury.vim that referenced this pull request Jul 4, 2021
(It's now in vimtex itself(!!) see lervag/vimtex#2066.)
lervag added a commit that referenced this pull request Jul 5, 2021
@lervag
Copy link
Copy Markdown
Owner

lervag commented Jul 5, 2021

I believe I've managed to simplify and improve a little bit. I've pushed an update now; could you test the timings again on your end and see if it has improved?

Could you share the process you use for doing the timings?

@lervag
Copy link
Copy Markdown
Owner

lervag commented Jul 5, 2021

I've looked at the numbers on my side, and I don't really see any significant speedup on my latest changes. But at least the code is somewhat shorter.

For more detailed profiling, one can use this:

" test.vim
set nocompatible
let &rtp = '../../..,' . &rtp
filetype plugin on

let g:vimtex_fold_bib_enabled = 1

function! TestProfile() abort
  call vimtex#profile#start()
  normal! zM
  call vimtex#profile#stop()
endfunction

autocmd BufEnter * call TestProfile()

The vimtex#profile#stop is useful because it interpolates the script local names to make them readable. To use it, run nvim -u test.vim hsqc16.bib; then afterwards inspect prof.log.

It seems the most expensive lines are these:

13520              0.113667   let l:firstline = search('\v^\s*\@', 'bcnW')
13520              0.031338   call setpos('.', l:curpos)

                              " Check if we're at fold start
13520              0.072083   let l:text = join(getline(l:firstline, a:lnum))
13520   0.206131   0.085421   let l:count_open = s:count(l:text, '{')
13520   0.198364   0.086337   let l:braces_balanced = l:count_open == s:count(l:text, '}')

(From the current version of the code, so not quite what you proposed, but it is generally the same code.)

@yongrenjie
Copy link
Copy Markdown
Contributor Author

yongrenjie commented Jul 5, 2021

Oh, my 'profiling' is very crude, it's just a foldlevel() calculation for the entire file:

https://github.com/yongrenjie/dotfiles/blob/bdc71c43374edc8f171a58e6e0354f811ccf51d2/.vim/ftplugin/bib.vim#L4-L11

function! TimeBibFold() abort
    let start_time = reltime()
    call vimtex#fold#bib#init()
    echomsg 'initialised in time' . reltimestr(reltime(start_time))
    let start_time = reltime()
    call map(range(1, line('$')), 'foldlevel(v:val)')
    echomsg 'foldexprs calculated in' . reltimestr(reltime(start_time))
endfunction

By appropriately inserting return 0 I could also crudely profile how long each block of code took. I think I reached a similar conclusion as above, that the pattern searching is the most time-consuming part, but that's probably not a huge surprise. What I didn't expect was that the whole getpos()->search()->setpos() scheme to find l:firstline was actually a lot faster than using a while loop to look for it.

The timings are pretty much the same as before, which is a good thing, because I think you made it easier to read. When I wrote this I think I tried to shortcircuit as much as possible, but chances are in 'typical' cases there's not all that much difference.

@lervag
Copy link
Copy Markdown
Owner

lervag commented Jul 5, 2021

Oh, my 'profiling' is very crude, it's just a foldlevel() calculation for the entire file:

Thanks! Simple solutions are good, especially when they're sufficient :)

And I agree, the idea of using search() was neat! I was not so surprised that it was faster, but I am not sure if I would have realized it myself.

Let's consider this matter resolved for now, then. Again, thanks for your contribution!

@paniash
Copy link
Copy Markdown
Contributor

paniash commented Jul 6, 2021

Hi! I believe the recent commit (related to this PR), now folds bib files by default. Is there any way to avoid this? (I personally prefer no folds in bib files).

@clason
Copy link
Copy Markdown
Contributor

clason commented Jul 6, 2021

Oh, yeah, folding should remain opt-in ;)

@yongrenjie
Copy link
Copy Markdown
Contributor Author

Actually we discussed the default at #2065; but personally I'm not hugely fussed whichever way you guys want to do it.

@paniash let g:vimtex_fold_bib_enabled = 0 will turn it off.

@clason
Copy link
Copy Markdown
Contributor

clason commented Jul 6, 2021

I assumed that this was conditional on folding enabled -- if I had realized that this was a new, independent, folding option to be turned on by default, I would have spoken up, loudly ;)

@paniash
Copy link
Copy Markdown
Contributor

paniash commented Jul 6, 2021

I assumed that this was conditional on folding enabled -- if I had realized that this was a new, independent, folding option to be turned on by default, I would have spoken up, loudly ;)

I agree with @clason. Such options should be opt-in.

@yongrenjie
Copy link
Copy Markdown
Contributor Author

yongrenjie commented Jul 6, 2021

OK, you can use the option first, and I'll put something together, is that OK?

@clason
Copy link
Copy Markdown
Contributor

clason commented Jul 6, 2021

To be clear, I am completely fine with it being enabled by default (or even automatically; I don't care :P) if general folding is enabled. But if folding is disabled, it must be disabled too.

@lervag
Copy link
Copy Markdown
Owner

lervag commented Jul 6, 2021

Thanks, @paniash; I agree that it was wrong to make the new bib folds enabled by default. It is changed now through the PR by @yongrenjie in #2089.

Thanks for pitching in, @clason. I'm curious, what do you mean with "general folding is enabled"? Do you mean e.g. if g:vimtex_fold_enabled is 1? Or do you mean something more general?

@clason
Copy link
Copy Markdown
Contributor

clason commented Jul 6, 2021

Yes, g:vimtex_fold_enabled is exactly what I meant -- I don't think there's a general "folding" option in vim? (Again, I wouldn't know :P)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants