“Musicians prefer LilyPond scores, study finds”

With these words, distinguished community member Francisco Vila announced the completion of his PhD thesis, for which we would like to congratulate him heartily. It is an interesting topic with – from our perspective – gratifying results.

The thesis, written in Spanish, is available from http://paconet.org/tesis/tesis.pdf, and it contains an English abstract, which we quote here for your convenience:

Based on the observations that the use of sheet music is paramount in the field of music education, and that free software has characteristics that make it very interesting for all kinds of learning activities, our first study describes from multiple points of view a free software project dedicated to music typography. Data about the development of the project over a span of 18 years are analyzed from the entire recorded history of its source code. Several visualization and modeling techniques suggest that such a system offers a fairly high level of robustness against deep changes in the composition of the developer team.

In our second study, a set of scores made with this program is subjected to a custom-made comparative test using data collected from 106 participating musicians, whose patterns of musical-score consumption are analyzed at the same time. The survey questionnaire and the score models used are validated by a team of experts. A pilot trial is run, and temporal stability is checked by means of the test/retest method. Results show that musicians seem to prefer, in a statistically significant way, music scores made with a program that happens to be freely available for download. We conclude that such a solution is a valid option for educational and professional use.

Both studies are preceded by a complete bibliographic compendium and a theoretical and historical framework on free software and music typesetting. Further and deeper studies are called for, as the tool has only begun to show its full potential.

This looks really nice, but I had one open question left: “Musicians prefer LilyPond over what?” Francisco’s answer was pretty strict: “To the same music typeset by other means”.

Obviously they went to great lengths to realize a double-blind study. They prepared ten different (types of) scores, taking an existing (usually professional) engraving and a new LilyPond score, and ensuring they were identical in as many respects as possible: paper color, page layout, staff size, background noise etc. This similarity was scrupulously evaluated by experienced musicians before the actual tests were run, and the participants were then presented with pairs of randomly arranged scores. The music types were: a cello part, a piano sonata, a lied, an SATB choral piece, a drumset exercise, a lead sheet, a guitar piece, a string quartet, a chamber piece for piano trio, and a full orchestra page; in two ‘control’ cases the same score was presented twice.

Not all comparisons clearly showed LilyPond as the winner, and in one case – the guitar piece – it was actually the loser. Nevertheless, the average result was clearly favorable to LilyPond scores in a blind experiment where musicians did not know which one was LilyPond-made. I think this provides a solid body of evidence for a ‘truth’ I have felt for a long time (and discussed just recently): LilyPond’s pursuit of traditional aesthetics and craftsmanship simply works.

If you like, you can read through the detailed description Francisco gave me or have a look at the photo album with its nice impressions.

16 thoughts on “Musicians prefer LilyPond scores, study finds”

  1. David Kastrup

    LilyPond’s pursuit of traditional aesthetics and craftsmanship simply works

    I wish it were that simple. If you take a look at the kinds of problems on the LilyPond bug/user lists, you’ll find that a fair share falls into the category of making LilyPond do something that’s not yet implemented (and unlikely to be available out of the box in any other typesetting system either). Here LilyPond compares rather well: large parts of its machinery are exposed and workable using just Scheme; because it is Free Software, it is possible to figure out how to use Scheme to make the C++ layers do things they were not planned or documented for; and finally, if all else fails, one can change or extend its internals so that future versions fare better.

    I’d like the architecture to become much more, and much more easily, extensible in the long run: it makes a lot of sense. But even the comparatively hobbled state of 10 years ago was ahead of most of its proprietary competition in potential and much of the execution: their approaches to extensibility are more sandboxed and limited to what the programmers were thinking of in advance, if at all.

    The second large share of problems on the lists concerns usability. LilyPond makes progress in that regard, but a lot of the more complex problems of wrapping musical matter into LilyPond’s terms may end up handled somewhat ham-fistedly.

    At any rate, Paco’s comparison focused on finished scores. And while the manpower that has been working on improving the core tenets of typesetting in recent years has been rather limited, the fact that LilyPond still outscored the competition means that while our progress may be slow, its direction seems to be appreciated.

    I am glad for all the people who work on making LilyPond better, and I try to make their lives easier. It’s great that people prefer the results achieved with LilyPond. But in addition to “It’s great playing music printed with LilyPond”, I’d like to hear more of “It’s great entering music with LilyPond” and “It’s great implementing new features in LilyPond”. When LilyPond is the preferred choice not just of players but also of editors and programmers, its future will look bright.

    1. Urs Liska

      What you say is very true. In light of that, I’d modify my statement: “LilyPond’s pursuit of traditional aesthetics and craftsmanship is the right path, and it produces results that are more pleasing and more readable in general.”

    2. Paco

      True. I explicitly focused on finished results; usability would have been a subject for another thesis. I praise LilyPond’s text-based documents for their extreme flexibility and power, but I think it is very difficult to compare its usability with other software, because a visual, interactive, mouse/menu-driven approach is not comparable to our text-centric system. It is like comparing Word to LaTeX. Unless you make extensive use of named styles, I find Word a nightmare to use in comparison, especially for large documents – and by ‘large’ I mean ‘more than a single page long’.

      In my opinion, and setting the thesis aside for a moment, LilyPond is much better and faster for very small scores too, provided you know the simplest basics. It is not intuitive or interactive, but if you are a musician you have already struggled with similar or harder difficulties while learning music! Coding is like a game, and it is fun.
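      As a hypothetical illustration of those ‘simplest basics’ (this example is not taken from the thesis), a complete, compilable LilyPond input file for a tiny score can be as short as this:

      ```lilypond
      \version "2.19.9"  % the version Paco reports using for the study

      % A complete score: a four-bar melody in C major, default layout,
      % default (Feta) font -- nothing tweaked or overridden.
      \relative c'' {
        \key c \major
        \time 4/4
        c4 d e f | g2 g | a4 a g2 | c,1 \bar "|."
      }
      ```

      Running `lilypond` on such a file produces a PDF with LilyPond’s stock layout – the same ‘plain Feta font, default layout’ setup Paco describes using in the comparison scores below.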

    3. Simon Albrecht

      I think the approach of LilyPond is great, and I’m very much convinced of it. Only some parts are still somewhat immature. So I feel like it’s in a way still a ‘young’ project with a great future, and its heyday is yet to come.

  2. Alex

    > This looks really nice, but I had one open question left: “Musicians prefer LilyPond over what?” Francisco’s answer was pretty strict: “To the same music typeset by other means”.

    This must be some sort of obscure joke. Why is it so difficult to just say what software (name/version) was used in tests?

    1. Urs Liska

      a) because they didn’t use software in their test but rather preexisting sheet music and b) because I rather meant “music typeset by arbitrary other means”.

      1. Abraham Lee

        They may not have engraved the comparison scores themselves, but many of them were originally created using some kind of music notation software (see my other comment below).

        1. Paco

          This was not a test trying to directly compare one piece of software to another. The criterion for selecting the music material was that it should fall into one of two main groups: 1) non-professional, not-for-profit, everyday scores used as teaching material in education, plus amateur orchestra/choir parts and scores; 2) professionally engraved, commercially sold, (mostly) book-scanned scores used in education. Which software was used to prepare those scores was up to their producers.

          One was of unknown origin – maybe an iPad app you operate with your fingertip; that one was actually used in an orchestra rehearsal. One was Sibelius (version unknown), found in notes from a university. Three were Finale, mixed versions: two identified from PDF metadata and one book-scanned. One was metal-plate engraved. One was probably made with SCORE or Amadeus; and so on.

          All in all, I am willing to clarify any further aspect you want.

  3. Abraham Lee

    Well done, Paco! I’d love to see the score comparisons in a higher-resolution format. You can kind of see some in the photo album, but no details. Obviously, doing a similar survey/poll here might be a little biased, since we’d know right off which score was done in LilyPond, but it still might be interesting to see what others say.

    1. Abraham Lee

      Response to my own comment: the exact images are found in the appendices of the thesis (in case anyone else wants to see them). What I see (in response to Alex’s and Urs’ comments) is that some of the scores are definitely computer-engraved (Nos. 2, 3, 4, 6 and 7, and I suspect that Nos. 1 and 5 are as well), while others (Nos. 8 and 9) are hand-engraved. Nos. 5 and 10 appear to be the control (i.e., identical) scores. Take that for what it’s worth, but I agree that LilyPond didn’t do as well on No. 7.

      BTW, which LilyPond version was used for the score comparisons? I couldn’t tell in the thesis (as I don’t speak Spanish) or in the longer explanation message.

      1. Dave

        In one of the appendices, p. 321, they include some LilyPond code with the version number 2.19.9. Not sure if that applies to all of them, but it seems like a good guess.

  4. Paco

    Yes, I used 2.19.9 for all. That is a known limitation of the study, because the ‘competitors’ did not have the chance to use the latest versions of their software. To balance this, I always used the plain Feta font and the default layout where possible, with an absolute minimum of tweaking/overriding. All reasonable manual interventions were comprehensively documented, and both models were photocopied again each time. A complex experiment like this is not perfect – none is – but it is a bona fide attempt and a starting point for further development.

    Thanks all for your comments, have a happy new year!

