Documentation for L2SDK

Documentation for L2SDK, the software development kit for version 2 of the Liszt music-OCR engine.


Overview

L2SDK is essentially a command line program which inputs a scanned image file and outputs a music notation file. The available output formats are NIFF, MusicXML (1.0 and 1.1), SharpEye's own .mro format, and MIDI. The SDK is available for Windows, Mac OS 9 and Mac OS X. The SDK consists of the recognition engine, associated data files, and a very simple "front end" which illustrates how the engine can be called from another program.

A warning

You should be aware that importing the uncorrected output from the recognition engine straight into a typical music notation editor can cause problems. Here are some common issues.

Rhythmic inconsistencies. Due to recognition errors, the notes and rests in each measure do not always fit the time signature. When you call the engine, you can specify how rhythmic inconsistencies are handled (e.g. you can enforce consistency at the expense of omitting notes that don't fit). A music notation editor which can cope with rhythmic inconsistencies has an advantage here.

Clefs. A missed or misread clef can cause trouble, because music notation editors usually preserve the pitch of subsequent notes when editing clefs, whereas to correct a recognition error, you want to preserve the graphical position of subsequent notes.

Key signatures. Similar to clefs. To correct a recognition error you want to change the key signature without changing the accidentals on subsequent notes.

Irregular systems. If there are different numbers of staves in different systems, or if different instruments occupy different staves, this causes a problem. The engine can add empty staves to make all systems have the same number (exactly what that means depends on the output format) but it may require user help to "join up" the systems so that staves in different systems match.

How to run Liszt

Starting Liszt

Under Win32, Liszt can be run from the command line as follows:

liszt.exe infilepath outfilepath configfilepath

(Each component should be enclosed in double quotes if there are any spaces.) From another application, CreateProcess() can be used.
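As a rough illustration of the quoting rule, the following Python sketch builds the command line for a caller on Windows. The helper name build_liszt_command and the example paths are hypothetical. subprocess.list2cmdline is the routine the Python standard library uses internally to quote arguments before they reach CreateProcess().

```python
import subprocess


def build_liszt_command(exe, infile, outfile, cfgfile):
    """Return (argv, command_line) for launching the engine.

    The argument order follows the usage line above:
    infilepath, outfilepath, configfilepath.
    """
    argv = [exe, infile, outfile, cfgfile]
    # list2cmdline wraps any argument containing a space in double quotes.
    return argv, subprocess.list2cmdline(argv)


# Hypothetical paths, chosen so that one of them contains a space.
argv, line = build_liszt_command(
    "liszt.exe",
    r"C:\my scans\page1.bmp",
    r"C:\out\page1.nif",
    r"C:\out\cfg.ini",
)
print(line)
# On Windows the engine could then be started with subprocess.Popen(argv).
```

Passing the list form to subprocess.Popen lets Python do the quoting, so the caller does not have to assemble the string by hand.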

Under Mac OS, it can be launched from another application using LaunchApplication with an Apple Event containing the arguments as FSSpecs.

Input Image

infilepath is a standard Windows Version 4 bitmap file.

It must be one bit per pixel and uncompressed. (As far as I can see there is no such thing as a compressed 1bpp BMP format.) In any case, the biBitCount field must be 1, and the biCompression field must be BI_RGB = 0.

The music should be black on a white background. The palette can be 0 = white, 1 = black, or vice versa. SharpEye writes a white-is-zero BMP file for the engine, but either should work.

This format is used under Mac OS as well. It is a simple format which is easy to write on any platform.
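To show how little is needed to write such a file, here is a minimal Python sketch that produces a 1 bit per pixel, uncompressed, white-is-zero BMP in memory. It assumes the 40-byte BITMAPINFOHEADER layout suggested by the field names biBitCount and biCompression; whether the engine insists on a different header size is not confirmed here. The function name make_1bpp_bmp and the dpi parameter are hypothetical.

```python
import struct


def make_1bpp_bmp(rows, dpi=300):
    """Build BMP bytes from rows of 0/1 pixels (1 = black), top row first."""
    height = len(rows)
    width = len(rows[0])
    stride = ((width + 31) // 32) * 4  # each row is padded to 4 bytes

    pixels = bytearray()
    for row in reversed(rows):  # BMP stores the bottom row first
        line = bytearray(stride)
        for x, bit in enumerate(row):
            if bit:
                line[x // 8] |= 0x80 >> (x % 8)  # leftmost pixel is the top bit
        pixels += line

    # Palette entries are blue, green, red, reserved.
    # Index 0 = white, index 1 = black (white-is-zero).
    palette = bytes([255, 255, 255, 0, 0, 0, 0, 0])
    offset = 14 + 40 + len(palette)
    ppm = round(dpi / 0.0254)  # pixels per metre; assumed, not documented

    info = struct.pack(
        "<IiiHHIIiiII",
        40, width, height,
        1,            # biPlanes
        1,            # biBitCount
        0,            # biCompression = BI_RGB
        len(pixels), ppm, ppm,
        2, 0,         # biClrUsed, biClrImportant
    )
    header = struct.pack("<2sIHHI", b"BM", offset + len(pixels), 0, 0, offset)
    return header + info + palette + bytes(pixels)


# A tiny 3 x 2 test image; write the result to disk with open(path, "wb").
data = make_1bpp_bmp([[1, 0, 0], [0, 0, 1]])
```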

Output

outfilepath is the output file, i.e. the file path or FSSpec for the NIFF, MusicXML, MIDI or .mro file.

Config

configfilepath is a file supplying all the options for running Liszt. It is based on the format of a Windows .ini file. Under Win32 it can be created using calls to WritePrivateProfileString(). The Config file is described in more detail below.

While Liszt is running

On Windows Liszt is a Win32 program, that is, it has a WinMain() and creates but never shows a single window. Throughout the recognition a "multitask()" function is called frequently to process Windows messages. The calling application can abort the recognition by sending a WM_DESTROY, and can receive progress messages (percentage complete and errors) using a message ID obtained via RegisterWindowMessage().

wParam == 0: lParam is percentage complete, in range 0 through 100.

wParam == 1: lParam == 0. Successful completion.

wParam == 2: lParam is error number.
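The three cases above can be sketched as a small decoder. This Python fragment only interprets a (wParam, lParam) pair that the calling application has already received; it does not show the window procedure itself, and the function name describe_progress is hypothetical.

```python
def describe_progress(wparam, lparam):
    """Turn one Liszt progress message into a human-readable string."""
    if wparam == 0:
        return f"{lparam}% complete"
    if wparam == 1:
        return "finished successfully"
    if wparam == 2:
        return f"failed with error {lparam}"
    # Anything else is outside the three documented cases.
    return f"unrecognised message (wParam={wparam})"


for message in [(0, 40), (0, 100), (1, 0)]:
    print(describe_progress(*message))
```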

On Mac OS Liszt is a Faceless Background Application. Throughout the recognition a "multitask()" function is called frequently to process Apple Events. The calling application can abort the recognition by sending a kAEQuitApplication, and can receive progress messages, a completion message, and error messages via Apple Events.

Error numbers

The error numbers are integers in the range 1000 through 1999. It is also possible that some Mac OS errors may be passed on. The meaning of these numbers is described below.

Description of files and directories in the SDK

WinRunLisztSource contains source code for a simple Win32 application RunLiszt which can run the Liszt recognition engine. This is supplied as a Visual C++ project RunLiszt.dsw. The output directory is set to WinRunLisztOutput.

WinRunLisztOutput contains the Windows version of the Liszt recognition engine in L2SDK, an executable file Liszt.exe and auxiliary data files in directories Langs, Learned, User, DataTabs.

WinRunLisztOutput also contains a small image "Air.BMP", and once Runliszt has been built it will contain Runliszt.exe as well. Runliszt allows the user to choose a BMP file (e.g. "Air.BMP") and will produce a configuration file for Liszt (cfg.ini) when the convert button is pressed. Liszt will then read the BMP file and cfg.ini and produce an output file "out.nif".

MacRunLisztSource contains source code for a simple Mac OS Application "C Runner" which runs the Liszt recognition engine. It is supplied as a CodeWarrior project "C Runner:C Runner.mcp" with a library subproject "LisztSupportLib:LisztSupportLib.mcp". The output directory is set to MacRunLisztOutput. Classic and Carbon versions of "C Runner" are produced here.

MacRunLisztOutput contains Classic and Carbon versions of the Liszt recognition engine in L2SDK, called ClassicLiszt and CarbonLiszt, plus auxiliary data files in directories Langs, Learned, User, DataTabs.

MacRunLisztOutput also contains a small image "Air.BMP" and a config file "cfg.ini". Once C Runner has been built it will also contain Carbon and Classic versions of C Runner. When Liszt is run from C Runner, Liszt will read "Air.BMP" and "cfg.ini" and produce an output file out.nif. C Runner runs the Carbon version under Mac OS X, and the Classic version under OS 9 or earlier.

NB There is a bug in some versions of CarbonLib, which causes Faceless Background Applications like Liszt to hang if they are run twice. It apparently applies only to OS 9, not OS 8, or OS X. It was assigned Bug ID # 2831409 by Apple. It appears to be fixed in CarbonLib_1.5f6_SDK.img, available from the "Download Software" section of the ADC section of Apple's developer web site.

Config File details

Config File example

[General]
origimage=C:\tmp\scan.tif
errorlog=
reporterrors=0
senderrors=1
outputformat=MIDI

[Recognition]
oldesymbols=0
ellipseheads=0
gracenotes=0
cuenotes=0
thinbeams=0
lyriclanguage=English
readlyrics=1
lookforlyrics=1
lookforchords=0
lyricsassignedhigh=0
readstext=1
charsused20=-###--####-#####################
charsused40=-###########################-#-#
charsused60=-##########################-----
charsused80=-------##-------##--##--------##
charsusedA0=---------#----------------------
charsusedC0=--------------------------------
charsusedE0=-------------------------#------

[RhythmAnalysis]
rhythm_lookfortriplets=1
rhythm_partialvoices=1
rhythm_optimistic=1
rhythm_allowoverspill=1
rhythm_maxvoices=4

[OutputOptions]
mro_rawoutput=0
niff_guitarchordsastext=0
mxml_systembreaks=1
mxml_pagebreaks=1
midi_tempo=180
midi_velocity=100
midi_repeats=0
midi_lyrics=1
midi_upstemislowchannel=0

[Display]
windowhandle=3964
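On platforms where WritePrivateProfileString() is not available, a file of this shape can be generated with any .ini writer. The following Python sketch uses the standard configparser module. The function name make_liszt_config is hypothetical, and the sketch assumes (without confirmation) that Liszt expects key=value with no spaces around the equals sign and treats key names as case-sensitive. It does not check whether Liszt requires every key to be present.

```python
import configparser
import io


def make_liszt_config(sections):
    """Render a dict of {section: {key: value}} as .ini text."""
    parser = configparser.ConfigParser(interpolation=None)
    # Keep key names such as charsusedA0 exactly as written;
    # configparser would otherwise lower-case them.
    parser.optionxform = str
    parser.read_dict(sections)
    buffer = io.StringIO()
    # Write key=value rather than the default "key = value".
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


text = make_liszt_config({
    "General": {
        "senderrors": 1,
        "errorlog": "",
        "reporterrors": 0,
        "outputformat": "MXML11",
    },
    "RhythmAnalysis": {"rhythm_maxvoices": 4},
})
print(text)
```

The returned text can be written to the path that is passed to Liszt as configfilepath.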

Example with comments

[General]
origimage=C:\tmp\scan.tif
senderrors=1
errorlog=
reporterrors=0
outputformat=NIFF

origimage is optional. You can use this to pass the original file path of the image to Liszt, which will be put into the .mro file (or NIFF file from 2.60) which Liszt generates. This is not necessarily the same file path as the one passed to Liszt on the command line. This is intended for programs which want to call Liszt and have a link back to the original scan in the output file.

If senderrors is 1, Liszt will send Windows messages or Apple Events to the parent. If 0 it won't. This is the recommended method of handling Liszt errors. The other two (errorlog and reporterrors) may be useful when debugging.

errorlog is a file path where Liszt will append an error message if there is an error while processing an image. If no path is supplied, nothing will be written. SharpEye uses this in batch mode. It might have unpredictable results on Mac OS due to the difficulty of representing file paths unambiguously.

reporterrors is for Win32 only. If 1, Liszt will display errors in a message box. If 0 it won't.

outputformat can be "NIFF" or "MRO" or "MXML" or "MXML11" or "MIDI". MXML produces MusicXML 1.0. MXML11 produces MusicXML 1.1.

oldesymbols=0
ellipseheads=0
gracenotes=0
cuenotes=0
thinbeams=0

These control the recognition of music. If oldesymbols is 1, old style symbols such as straight flags and 'spiral' bass clefs will be recognised. If ellipseheads is 1, half notes with symmetrical elliptical heads are recognised. If gracenotes is 1, small note heads are recognised. If cuenotes is 0 (the default) small note heads are interpreted as grace notes. If cuenotes is 1, small note heads will be interpreted as normal note heads. (The recognition engine does not yet deal with cue notes properly, but recognising them as normal notes may be useful.) If thinbeams is 1, beams which are thin lines are recognised.

lyriclanguage=English
readlyrics=1
lookforlyricswhere=0
lookforchordswhere=0
lyricsassignedhigh=0
readstext=1

charsused20=-###--####-#####################
charsused40=-###########################-#-#
charsused60=-##########################-----
charsused80=-------##-------##--##--------##
charsusedA0=---------#----------------------
charsusedC0=--------------------------------
charsusedE0=-------------------------#------

lyriclanguage can be set to Danish, Dutch, English, French, German, Italian, Latin or Spanish. It is used for reading lyrics, not other text. It determines which trigram frequencies to use to help disambiguate syllables. For example, ijk is common in Dutch but not in English.

If readlyrics is 1, lyrics are read, if it is 0 they are not.

The lookforlyricswhere and lookforchordswhere values determine where lyrics and (textual) chords are looked for, in relation to a stave. The value 0 is the default, and means the recognition engine decides itself (which generally means looking for lyrics below staves and chords above or below). A value of 1 means the recognition engine looks above the stave, and a value of -1 means look below. For example, set lookforlyricswhere=1 to read lyrics above staves, or set lookforchordswhere=1 to ensure chords are never found below a stave.

If lyricsassignedhigh is 1, lyric syllables are preferentially assigned to up stem notes when both up and down stem notes line up with a syllable. Otherwise down stem notes are preferred. (May be useful in MusicXML export, possibly in NIFF or MIDI.)

If readstext is 1, other text is read, if it is 0 it is not.

The charsused values are a map of the characters recognised in both lyrics and other text. They are ISO Latin1 codes except 0x80 to 0x9f. Leave the ones in this range as above.
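The layout of the map is not spelled out, but the example values are consistent with one flag per character code: charsusedXX holds 32 flags for codes 0xXX through 0xXX+31, with # meaning "used" and - meaning "not used". Under that assumption (inferred from the example, not confirmed), a Python sketch for inspecting a row might look like this. The function name decode_charsused is hypothetical.

```python
def decode_charsused(base, flags):
    """Return the set of characters flagged '#' in one charsused row.

    base is the code of the first flag, e.g. 0x40 for charsused40.
    Codes map directly to ISO Latin1 characters; rows 80 (0x80 to 0x9f)
    are not Latin1 and should not be interpreted this way.
    """
    return {chr(base + i) for i, flag in enumerate(flags) if flag == "#"}


# The charsused40 row from the example above.
letters = decode_charsused(0x40, "-###########################-#-#")
print("".join(sorted(letters)))
```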

rhythm_lookfortriplets=1
rhythm_partialvoices=1
rhythm_optimistic=1
rhythm_allowoverspill=1
rhythm_maxvoices=4

These control the rhythm analysis (assignment of voices and start times to notes).

If rhythm_lookfortriplets is 1, Liszt will use rhythm analysis (not graphical recognition) to find notes which are probably triplets and assign start times and durations accordingly.

If rhythm_partialvoices is 0, notes in incomplete voices will only be assigned times when they line up vertically with a note in a complete voice.

If rhythm_optimistic is 1, Liszt will assign voices and times to notes even if this means 'guessing' quite a lot.

If rhythm_allowoverspill is 1, Liszt will assign voices and times to notes even if they overspill the measure.

Sensible settings for rhythm_partialvoices, rhythm_optimistic, rhythm_allowoverspill are (0,0,0), (1,0,0), (1,1,0), (1,1,1), in order from most strict to most relaxed.
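The four combinations above can be kept as a small lookup, so that a caller exposes a single "strictness" setting rather than three separate switches. This Python sketch is only one way to do that; the names RHYTHM_PRESETS and rhythm_settings, and the numbering of levels from 0 to 3, are hypothetical.

```python
# (rhythm_partialvoices, rhythm_optimistic, rhythm_allowoverspill),
# ordered from most strict (level 0) to most relaxed (level 3).
RHYTHM_PRESETS = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]


def rhythm_settings(level):
    """Return the three rhythm keys for a strictness level 0..3."""
    partial, optimistic, overspill = RHYTHM_PRESETS[level]
    return {
        "rhythm_partialvoices": partial,
        "rhythm_optimistic": optimistic,
        "rhythm_allowoverspill": overspill,
    }


print(rhythm_settings(2))
```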

You can use rhythm_maxvoices to limit the maximum number of voices per stave.

mro_rawoutput=0
niff_guitarchordsastext=0
mxml_systembreaks=1
mxml_pagebreaks=1
midi_tempo=180
midi_velocity=100
midi_repeats=0
midi_lyrics=1
midi_upstemislowchannel=0

If mro_rawoutput is 1, none of the simplifications and interpretations done by SharpEye's front end are performed. This means the mro file may contain systems with different numbers of staves; chords and beamed groups that spread across two or more staves; time signatures that are different (or missing) for the same measure in different staves; different systems may have different pairs of staves which are joined (by braces); and unmarked triplets are not detected even if rhythm_lookfortriplets is 1.

By default, guitar chords are exported as NIFF ChordSymbols. If niff_guitarchordsastext is 1, they will be treated as plain text.

mxml_systembreaks and mxml_pagebreaks determine whether system breaks and page breaks are put into the MusicXML file.

In MIDI export, midi_tempo is an integer giving the number of quarter notes per minute. midi_velocity is an integer giving the velocity (volume) of all notes. If midi_repeats is 1, repeat signs are honoured when generating MIDI. If midi_lyrics is 1, lyrics are included. midi_upstemislowchannel currently does nothing. (It would require an option to split staves into two MIDI channels based on stem direction before it became useful.)

[Display]
windowhandle=792

windowhandle is for Win32 only. If not 0, Liszt will send progress messages to this window. The message ID is registered using RegisterWindowMessage("70BC82E0-9033-11d2-9C32-EB852EFC6F01-Liszt2ProgressMessage");

The messages sent have wParam=0 and lParam=percentage complete, with a final one with wParam=1, lParam=0 for successful completion, or wParam=2, lParam=errornumber. If windowhandle=0, Liszt will display its own progress window (may be useful for debugging versions on Win32).

Error number details

Serious errors that might occur at any stage

1101 Internal inconsistency detected

1102 OS routine failed with no error code

1103 Not enough memory (heap)

1104 Not enough memory (shifting heap)

1105 Not enough memory (malloc)

Miscellaneous

1201 Wrong number of arguments passed to engine

Initialisation errors: failure to load data file

1301 Failed to initialise file paths module

1302 No memory for 'magang' table

1303 No memory for 'vdist' table

1304 Can't read pattern recognition data file

1305 Can't read data file

1311 Can't read text-OCR data file

1312 Failed to initialise for text-OCR

1321 Can't read music-OCR data file

1322 Failed to initialise for music-OCR

1331 Failed to read language data

1332 Failed to initialise language data

1334 No memory for dictionary

1335 Failed to initialise user dictionary

1341 Failed to initialise OMR engine

1342 not used (RISC OS)

1343 not used (RISC OS)

1344 not used (RISC OS)

Failure to read image file

1401 Failed to open image file for reading

1402 Failed to read image file

1403 No memory for image file

1404 Image is too big

The group below is for the front end, in case the engine reads a wider variety of BMPs later. Not used as of March 2003.

1410 Failed to load image file

1411 No memory for image file

1412 Not a 1 bit per pixel image

1413 Failed to open image file for reading

1414 Not a version 4 image file

1415 Not a RGB image

1416 More than one plane in image

Failure to read config file

1501 Can't open file or low level read error

1502 Bad syntax or bad value

Failure to make sense of image

1601 Empty image

1602 Image too noisy or complex

1603 Image too noisy or complex

1604 Failed to find any staves

1605 Staves too big (>~ 50 pixels between lines)

Failure to make output

Failure to 'rationalise' music score

1701 Failed to make score rectangular

1702 Failed to deal with multistave objects

1703 Failed to unify time signatures

Failure to output NIFF.

1711 No memory to make NIFF

1712 Failed to open NIFF file

1713 Failed to write to NIFF file

Failure to output SharpEye format (.mro)

1721 Failed to open .mro file

1722 Failed to write to .mro file

Failure to output MusicXML.

1731 Failed to open MusicXML file

1732 Failed to write to MusicXML file

Failure to output MIDI.

1741 Failed to open MIDI file

1742 Failed to write to MIDI file

1743 No memory to make MIDI file

1744 Failed to write byte to MIDI file

Unknown error

1999 Unknown error
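In the list above the hundreds digit appears to identify the group an error belongs to. Assuming that pattern holds (it is inferred from the list, not stated), a caller could map an error number to a broad category for display, as in this Python sketch. The names ERROR_GROUPS and error_group are hypothetical.

```python
# Group headings from the list above, keyed by the first number of each block.
ERROR_GROUPS = {
    1100: "Serious error",
    1200: "Miscellaneous",
    1300: "Initialisation error",
    1400: "Failure to read image file",
    1500: "Failure to read config file",
    1600: "Failure to make sense of image",
    1700: "Failure to make output",
}


def error_group(number):
    """Return a broad category for an error number reported by Liszt."""
    if number == 1999:
        return "Unknown error"
    if not 1000 <= number <= 1999:
        # Outside the documented range, e.g. a Mac OS error passed on.
        return "Not a Liszt error number"
    base = number // 100 * 100
    return ERROR_GROUPS.get(base, "Unlisted Liszt error")


print(error_group(1604))
```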