Engraving Challenges: Regressions and Managing LilyPond versions

This post is part of a series analyzing LilyPond’s performance during the preparation of a new edition of Oskar Fried’s songs.

When we began working on the Fried project, LilyPond’s current development version was 2.15.37. Since then a new stable version, 2.16, has been released, followed by a steady flow of 2.17.x development releases. How did the changes in the program affect our workflow?

Trying to keep up

Initially we tried to keep up with the updates and always use the latest development version. This was partly because some new releases contained fixes for bugs that affected us – for example, we reported a problem with the formatting of hairpins (issue 2532), which was fixed in the 2.15.39 release. However, this proved to be too much work: for each new version of LilyPond, Janek had to compile all beautified scores and make sure there were no regressions (i.e. check that the new version hadn’t introduced any unwanted changes). Doing this meant visually comparing dozens of pages, system by system. And it wasn’t enough to check whether the new output was identical to the old: most of the time there were subtle, harmless changes in spacing, so every difference had to be judged on its own. Note that changes are not necessarily problematic – in general you can say that new LilyPond versions make the layout of your existing scores even better – but you still have to check carefully whether individual changes in the new version are for the worse.
Our worst enemy here was the butterfly effect that Janek already mentioned in his post about slurs: some tiny difference in the positioning of a dynamic may influence LilyPond’s decisions regarding line breaks, and when the line breaking changes, everything becomes different. It’s almost impossible to swiftly compare two versions of a piece that have different line breaking, because you cannot use “Alt-Tab comparison” (flipping back and forth between two windows that show the same page).
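
One way to cut down the amount of visual checking is to let a script point out which pages changed at all, so that only those need a human eye. The following is a minimal sketch, not the workflow we actually used. It assumes that each score has been rendered to one PNG file per page, once with the old and once with the new LilyPond version, and that identical layouts yield byte-identical files. The function names and the directory layout are hypothetical.

```python
import hashlib
from pathlib import Path


def page_digests(directory):
    """Map each PNG file name in `directory` to the SHA-256 of its bytes."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(Path(directory).glob("*.png"))
    }


def changed_pages(old_dir, new_dir):
    """Return the sorted names of pages that differ or exist on one side only.

    Assumption: both directories hold page images of the same score,
    rendered by two different LilyPond versions.
    """
    old = page_digests(old_dir)
    new = page_digests(new_dir)
    return sorted(
        name for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    )
```

A call such as `changed_pages("out-old", "out-new")` would return the page files that still need a visual comparison. A script like this cannot tell a harmless spacing change from a regression, and once the line breaking has shifted it will probably flag every following page.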

Turning to a stable approach

At one point we therefore decided not to change the LilyPond version unless really necessary. Nevertheless, the situation wasn’t so simple: for quite a long time Mike’s skyline patch, which we used, was in active development and produced uneven results – some scores looked better with it, some worse. Because of that we used different LilyPond versions for different songs, which was very inconvenient. So we made one last change when Mike’s skyline patch was finished and merged into the official LilyPond codebase – since it was stable and consistent enough to be used in all songs, we upgraded all source files to 2.17.3, which once more involved some extra work to fix regressions.

We kept to this for a whole year – with one exception: for the beautification of Heiterkeit, güldene, komm! op. 7, No. 1, Janek wanted to use a special function he had written and didn’t want to do without in this extremely complex song. Since the function required LilyPond 2.17.25, he decided to upgrade just this song to the latest development version available at that moment (2.17.27). Everything went smoothly, but we wouldn’t want to go back to the situation described at the beginning of this post, particularly since, due to some internal changes, it wasn’t possible to compile this song with LilyPond 2.17.29. (Of course that’s no criticism of the LilyPond developers: it’s the normal course of software development that anything in a development version can change at any time, so that the next stable release is in the best possible shape.)
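
When different songs are tied to different LilyPond versions, it helps to check mechanically which version a source file declares. The sketch below is only an illustration under assumptions: it reads the `\version "x.y.z"` statement from the text of a .ly file and compares it with a required minimum. The helper names are hypothetical, and it assumes the statement uses plain double quotes.

```python
import re

# Matches a statement such as: \version "2.17.25"
VERSION_RE = re.compile(r'\\version\s+"(\d+(?:\.\d+)*)"')


def declared_version(source):
    """Return the declared version as a tuple of ints, or None if absent."""
    match = VERSION_RE.search(source)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(source, minimum):
    """True if the source declares a version at least as new as `minimum`."""
    version = declared_version(source)
    return version is not None and version >= minimum
```

Comparing tuples of integers rather than strings matters here: as strings, "2.17.3" would sort after "2.17.25", although 2.17.3 is the older release.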

As mentioned at the outset of this article, checking for regressions was a significant time sink. Unfortunately this problem didn’t appear only when the LilyPond version changed – it appeared all the time: since any modification (to the music or e.g. to a function in the library) could change the layout of a score, we had to check for regressions after every big change to be absolutely sure that nothing broke. There was really not much that we could do about it, apart from being cautious. Fortunately, using version control meant that if anything broke, it wasn’t difficult to get back to the last known good state, even if the offending change wasn’t the most recent one. This allowed us to check the scores less often – in the end it seemed enough to check every score once, after finishing beautification.
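
Finding the offending change when it isn’t the most recent one amounts to a binary search through the history. The sketch below illustrates the idea and is not a record of what we did. It assumes a list of commit identifiers ordered from oldest to newest, with the first known good and the last known bad, and a hypothetical `is_good` check (for example “compile the score and look at the page in question”).

```python
def first_bad_commit(commits, is_good):
    """Return the earliest commit for which is_good() is false.

    Assumptions: `commits` is ordered oldest to newest, the first entry
    is good, the last entry is bad, and every commit after the first bad
    one is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid   # the regression was introduced after mid
        else:
            hi = mid   # the regression was introduced at or before mid
    return commits[hi]
```

With a history of a hundred commits this needs about seven checks instead of a hundred. Git’s `git bisect` command automates the same search on a real repository.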

Returning to custom and volatile builds

The situation took one last turn right before the end. On seeing all those scores in their “beautified glory”, Urs expressed a few more wishes because some design elements didn’t fit perfectly into the “style” of the scores. A characteristic of the style sheet that we had developed is that most of the lines are somewhat thicker than default – which makes it very enjoyable to play from the scores. Unfortunately there are elements that can’t easily be made thicker and consequently now looked somewhat inconsistent – a fact that became really obvious only in the context of the beautified scores. In particular, Urs wished to have heavier portato dashes and chord brackets but couldn’t find any settings to achieve that.

Janek quickly found out that the thickness of the chord brackets (actually they’re bracketed arpeggios) is for some unknown reason hardcoded in LilyPond’s C++ source files, and that it would be a breeze for him to change that. As a LilyPond developer he could easily modify the source code and compile a custom build, with which he could then compile the affected scores (i.e. those containing chord brackets). Luckily he didn’t simply change the hardcoded thickness but added a settable property for it, so we can hope to see this improvement in default LilyPond soon. Unfortunately, since Urs didn’t have access to custom LilyPond builds, he couldn’t compile the scores containing chord brackets properly 🙁.

Now, portato dashes are part of LilyPond’s notation font Emmentaler and therefore naturally don’t have any properties that could be overridden in LilyPond input files. But as the source code of the fonts is also part of LilyPond’s source repository, and the fonts are compiled directly from it when building LilyPond, Janek could simply modify Emmentaler’s source files to make the portato dashes somewhat thicker and compile another custom build containing this improvement too. Again, without access to custom LilyPond builds, Urs depended on Janek to produce new PDFs.

Conclusion

Our final judgement on the situation? Well, it’s really ambivalent.

  • It is really nice that LilyPond generally improves, and that these improvements
    automatically affect existing scores. It is also nice that you can live
    “at the cutting edge of technology” by using the latest development versions –
    something you don’t have with big commercial products.
    But it’s way less nice to experience the trouble that is tied to changing LilyPond versions.

  • It is also exciting that LilyPond is accessible to actual improvements from within
    an edition project – by either extending it with Scheme code in the input files
    or by actually making modifications to LilyPond’s source code. This is really a unique
    feature of Open Source Software.

  • Of course, not everyone has an actual LilyPond developer in the team, so some of these
    opportunities may not be available to you. But if you are able to compile LilyPond
    from source code (and Janek’s work on a special building script should make doing this
    easier), you may ask for assistance on the LilyPond mailing list and try using some
    custom improvements.

  • We are now in a situation very similar to that of the beginning: the scores in our
    edition have to be compiled with specific, custom versions of LilyPond, ones
    that aren’t easily available but have to be built from LilyPond’s
    Git repository.

And our conclusions, tips and suggestions?

  • Make a version freeze at the latest possible moment and keep any work following that moment as short as possible.
  • Try to determine all possible engraving issues before making the version freeze and before starting beautification.
  • If custom modifications to LilyPond have to be made, make them once, at this point.

We already mentioned version control in this article, but in his next post Urs will write much more about this and how it saved our day more than once 😉 .

6 thoughts on “Engraving Challenges: Regressions and Managing LilyPond versions”

  1. vvillenave

    Keeping up certainly is a challenge (although someone like Nicolas Sceaux has managed to do so for an impressively long time). The way I do it is that I always try and prepare for the *next* stable release: my codebase was labeled \version “2.10.0” years ago when only 2.9.3 was available, then it got bumped to “2.12” and “2.14” in advance, and nowadays it has already been tagged “2.18” for more than a year. This has brought some occasional challenges, but keeping track of LilyPond’s changelog and convert-rules is easy enough (although not everything gets documented there).
    There is, however, a catch: unlike you guys, my codebase is mostly devoid of fine-grained tweaks and overrides. There’s a fair amount of Scheme code (much like Nicolas’ or Reinhold’s way) and a few layout/paper stylesheets, but when it comes to actual music notes my .ly files are quite simple and rely almost exclusively on LilyPond’s own formatting choices. Not surprisingly, keeping content and formatting strictly separate goes a _long_ way in making your codebase easily updateable and manageable; what you may lose in terms of output quality (and since this is Lily we’re talking about, it’s already much, much better than most automatically-engraved sheet music — not to mention that it gets even better over time), you definitely gain in sustainability.

  2. vvillenave

    Well, that’s the thing: I’ve been working for six or seven years without having to use convert-ly _once_. As I said, I go to great lengths to avoid having any non-trivial code in my actual score files: ideally, all Scheme functions, \layout and \paper indications, markup formatting, and even the occasional \overrides, are “factorized” in separate files as a universal, score-agnostic framework.

    You can probably guess how many modifications are needed in a six-year old file that only contains notes: virtually none. No matter what users may complain about every now and then, LilyPond’s syntax _is_ quite stable and reliable when it comes to basic things. (And even when it changes, my git log is a more accurate way of reminding me how old my codebase is than a \version string I’d probably forget to update anyway.)

    Off the top of my head, the three most invasive changes I can think of in the past two years were:
    – new Scheme tokens parsing by David: in some of my framework I had to work out where # or $ were needed. (But that didn’t require me to open my score files.)
    – finally deprecating the \relative {} syntax without a reference pitch. That *was* slightly annoying because I used to _never_ input a reference pitch; then again the fix was easy and a fairly simple regex did the trick throughout my codebase.
    – new \tuplet syntax (although the old syntax is still supported for now). Well, I actually never use either \times or \tuplet, I have my own shortcuts for that and they’re defined in one single file. Even if the command had been renamed to \supercalifragilisticexpialidociousTuplet, I’d have barely noticed it!

    That being said, one could deem my process as somewhat reckless, since it requires me to keep in touch with new syntax developments (at least, say, once a year). And if I were to get run over by a bus, nobody else could probably manage my codebase the way I do. Don’t try this at home!

    1. Janek Warchoł Post author

      Ah, I see. Separating tweaks from the musical content definitely *is* a good thing to do, but I expect that it would be unwieldy in our case – we have too many adjustments.

      What special function are you using for tuplets?

      1. Urs Liska

        Actually I had thought about this several times. For example this would have made it possible to catch *any* tweaks in the draft mode. But I agree this may be somewhat absurd when you start having to define included commands for single-use tweaks.

    2. Janek Warchoł Post author

      By the way, Valentin, did you get my recent emails about LilyPond Report? I would like to help with putting Report archives someplace available to the audience.
