Documentation for L2SDK, the software development kit for version 2 of the Liszt music-OCR engine.
L2SDK is essentially a command line program which inputs a scanned image file and outputs a music notation file. The available output formats are NIFF, MusicXML (1.0 and 1.1), SharpEye's own .mro format, and MIDI. The SDK is available for Windows, Mac OS 9 and Mac OS X. The SDK consists of the recognition engine, associated data files, and a very simple "front end" which illustrates how the engine can be called from another program.
You should be aware that taking the uncorrected output from the recognition engine and importing it straight into a typical music notation editor can cause problems. Here are some common issues.
Rhythmic inconsistencies. Due to recognition errors, the notes and rests in each measure do not always fit the time signature. When you call the engine, you can specify how rhythmic inconsistencies are handled (e.g. you can enforce consistency at the expense of omitting notes that don't fit). A music notation editor which can cope with rhythmic inconsistencies has an advantage here.
Clefs. A missed or misread clef can cause trouble, because music notation editors usually preserve the pitch of subsequent notes when editing clefs, whereas to correct a recognition error, you want to preserve the graphical position of subsequent notes.
Key Signatures. Similar to clefs. To correct a recognition error you want to change the key signature without changing the accidentals on subsequent notes.
Irregular Systems. If there are different numbers of staves in different systems, or if different instruments occupy different staves, this causes a problem. The engine can add empty staves to make all systems have the same number (exactly what that means depends on the output format), but it may require user help to "join up" the systems so that staves in different systems match.
Under Win32, Liszt can be run from the command line like this:
liszt.exe infilepath outfilepath configfilepath
(Each component should be enclosed in double quotes if it contains any spaces.) From another application, CreateProcess() can be used.
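As a rough illustration of the quoting rule above, the following Python sketch builds the command line and, optionally, launches the engine. The helper names (quote_if_needed, build_liszt_command, run_liszt) and the example paths are hypothetical; only the argument order and the double-quote rule come from the text above. The meaning of the process exit code is not documented here, so the sketch simply returns it.

```python
import subprocess

def quote_if_needed(component):
    """Wrap a path in double quotes when it contains a space."""
    return '"' + component + '"' if " " in component else component

def build_liszt_command(exe, infile, outfile, configfile):
    """Return the command line: exe infilepath outfilepath configfilepath."""
    parts = [exe, infile, outfile, configfile]
    return " ".join(quote_if_needed(p) for p in parts)

def run_liszt(exe, infile, outfile, configfile):
    """Launch the engine and wait for it to exit; returns the exit code.

    Passing a list lets subprocess apply the quoting itself on Windows.
    """
    return subprocess.call([exe, infile, outfile, configfile])

# Hypothetical paths, for illustration only.
print(build_liszt_command("liszt.exe", r"C:\My Scans\air.bmp",
                          r"C:\tmp\out.nif", r"C:\tmp\cfg.ini"))
```

Only the first path contains a space, so only that component is quoted in the printed command line.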
Under Mac OS, it can be launched from another application using LaunchApplication with an Apple Event containing the arguments as FSSpecs.
infilepath is a standard Windows Version 4 bitmap file. It must be one bit per pixel and not compressed. As far as I can see there is no such thing as a compressed 1bpp BMP format - anyway, the biBitCount field must be 1, and the biCompression field must be BI_RGB = 0.
The music should be black on a white background. The palette can be 0 = white, 1 = black, or vice versa. SharpEye writes a white-is-zero BMP file for the engine, but either should work.
This format is used under Mac OS as well. It is a simple format which is easy to write on any platform.
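To show how little is needed to write such a file, here is a minimal Python sketch that encodes a small black-and-white image as an uncompressed 1bpp BMP. It assumes the common 40-byte BITMAPINFOHEADER layout, which is what the bi-prefixed field names above suggest; whether the engine expects exactly this header variant is an assumption, and the function name make_1bpp_bmp is hypothetical.

```python
import struct

def make_1bpp_bmp(rows, white_is_zero=True):
    """Encode rows of 0/1 ink flags (1 = black) as an uncompressed 1bpp BMP."""
    height, width = len(rows), len(rows[0])
    stride = (width + 31) // 32 * 4          # each row is padded to 4 bytes
    white, black = b"\xff\xff\xff\x00", b"\x00\x00\x00\x00"  # B, G, R, reserved
    palette = white + black if white_is_zero else black + white
    pixels = bytearray()
    for row in reversed(rows):               # BMP rows are stored bottom-up
        line = bytearray(stride)
        for x, ink in enumerate(row):
            index = ink if white_is_zero else 1 - ink
            if index:
                line[x // 8] |= 0x80 >> (x % 8)   # leftmost pixel is the top bit
        pixels += line
    offset = 14 + 40 + len(palette)          # file header + info header + palette
    file_header = struct.pack("<2sIHHI", b"BM", offset + len(pixels), 0, 0, offset)
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40, width, height,
        1,                # biPlanes
        1,                # biBitCount: one bit per pixel
        0,                # biCompression: BI_RGB
        len(pixels), 0, 0, 2, 0,
    )
    return file_header + info_header + palette + bytes(pixels)

# A 3x2 test image: top row has one black pixel, bottom row is all black.
data = make_1bpp_bmp([[0, 1, 0], [1, 1, 1]])
```

Passing white_is_zero=False swaps the palette and inverts the stored bits, which corresponds to the "vice versa" palette mentioned above.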
outfilepath is the output file, i.e. the file path or FSSpec for the NIFF, MusicXML, MIDI or .mro file.
configfilepath is a file supplying all the options for running Liszt. It is based on the format of a Windows .ini file. Under Win32 it can be created using calls to WritePrivateProfileString(). The config file is described in more detail below.
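Outside Win32, the same kind of file can be produced with any .ini writer. The sketch below uses Python's standard configparser module to emit a [General] section; the function name build_config is hypothetical, and it is an assumption that the engine accepts the exact layout configparser produces (for example the trailing blank line after the section).

```python
import configparser
import io

def build_config(output_format="NIFF", error_log=""):
    """Return the text of a minimal config file with a [General] section."""
    parser = configparser.RawConfigParser()
    parser.optionxform = str      # keep key case (e.g. charsusedA0) unchanged
    parser["General"] = {
        "senderrors": "1",
        "errorlog": error_log,
        "reporterrors": "0",
        "outputformat": output_format,
    }
    buffer = io.StringIO()
    # Write key=value with no spaces around the delimiter.
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()

print(build_config("MIDI"))
```

The other sections can be added in the same way, one dictionary per section.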
On Windows, Liszt is a Win32 program, that is, it has a WinMain() and creates but never shows a single window. Throughout the recognition a "multitask()" function is called frequently to process Windows messages. The calling application can abort the recognition by sending a WM_DESTROY, and can receive progress messages (percentage complete and errors) using a message ID obtained via RegisterWindowMessage().
wParam == 0: lParam is percentage complete, in range 0 through 100.
wParam == 1: lParam == 0. Successful completion.
wParam == 2: lParam is error number.
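The three message shapes above can be decoded with a small helper. This Python sketch only interprets the (wParam, lParam) pair; receiving the message from the window system is left out, and the function name and the returned labels are hypothetical.

```python
def decode_liszt_message(wparam, lparam):
    """Translate a (wParam, lParam) pair into a (kind, value) tuple."""
    if wparam == 0:
        return ("progress", lparam)     # percentage complete, 0 through 100
    if wparam == 1:
        return ("done", 0)              # successful completion; lParam is 0
    if wparam == 2:
        return ("error", lparam)        # lParam carries the error number
    return ("unknown", lparam)          # not described in this document

print(decode_liszt_message(0, 42))      # ('progress', 42)
print(decode_liszt_message(2, 1604))    # ('error', 1604)
```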
On Mac OS, Liszt is a Faceless Background Application. Throughout the recognition a "multitask()" function is called frequently to process Apple Events. The calling application can abort the recognition by sending a kAEQuitApplication, and can receive progress messages, a completion message, and error messages via Apple Events.
The error numbers are integers in the range 1000 through 1999. It is also possible that some Mac OS errors may be passed on. The meaning of these numbers is described below.
WinRunLisztSource contains source code for a simple Win32 application, RunLiszt, which can run the Liszt recognition engine. This is supplied as a Visual C++ project, RunLiszt.dsw. The output directory is set to WinRunLisztOutput.
WinRunLisztOutput contains the Windows version of the Liszt recognition engine in L2SDK, an executable file Liszt.exe and auxiliary data files in directories Langs, Learned, User, DataTabs.
WinRunLisztOutput also contains a small image "Air.BMP", and once RunLiszt has been built it will contain RunLiszt.exe as well. RunLiszt allows the user to choose a BMP file (e.g. "Air.BMP") and will produce a configuration file for Liszt (cfg.ini) when the convert button is pressed. Liszt will then read the BMP file and cfg.ini and produce an output file "out.nif".
MacRunLisztSource contains source code for a simple Mac OS application "C Runner" which runs the Liszt recognition engine. It is supplied as a CodeWarrior project "C Runner:C Runner.mcp" with a library subproject "LisztSupportLib:LisztSupportLib.mcp". The output directory is set to MacRunLisztOutput. Classic and Carbon versions of "C Runner" are produced here.
MacRunLisztOutput contains Classic and Carbon versions of the Liszt recognition engine in L2SDK, called ClassicLiszt and CarbonLiszt, plus auxiliary data files in directories Langs, Learned, User, DataTabs.
MacRunLisztOutput also contains a small image "Air.BMP" and a config file "cfg.ini". Once C Runner has been built it will also contain Carbon and Classic versions of C Runner. When Liszt is run from C Runner, Liszt will read "Air.BMP" and "cfg.ini" and produce an output file out.nif. C Runner runs the Carbon version under Mac OS X, and the Classic version under OS 9 or earlier.
NB There is a bug in some versions of CarbonLib, which causes Faceless Background Applications like Liszt to hang if they are run twice. It apparently applies only to OS 9, not OS 8, or OS X. It was assigned Bug ID # 2831409 by Apple. It appears to be fixed in CarbonLib_1.5f6_SDK.img, available from the "Download Software" section of the ADC section of Apple's developer web site.
[General]
origimage=C:\tmp\scan.tif
errorlog=
reporterrors=0
senderrors=1
outputformat=MIDI
[Recognition]
oldesymbols=0
ellipseheads=0
gracenotes=0
cuenotes=0
thinbeams=0
lyriclanguage=English
readlyrics=1
lookforlyrics=1
lookforchords=0
lyricsassignedhigh=0
readstext=1
charsused20=-###--####-#####################
charsused40=-###########################-#-#
charsused60=-##########################-----
charsused80=-------##-------##--##--------##
charsusedA0=---------#----------------------
charsusedC0=--------------------------------
charsusedE0=-------------------------#------
[RhythmAnalysis]
rhythm_lookfortriplets=1
rhythm_partialvoices=1
rhythm_optimistic=1
rhythm_allowoverspill=1
rhythm_maxvoices=4
[OutputOptions]
mro_rawoutput=0
niff_guitarchordsastext=0
mxml_systembreaks=1
mxml_pagebreaks=1
midi_tempo=180
midi_velocity=100
midi_repeats=0
midi_lyrics=1
midi_upstemislowchannel=0
[Display]
windowhandle=3964
[General]
origimage=C:\tmp\scan.tif
senderrors=1
errorlog=
reporterrors=0
outputformat=NIFF
origimage is optional. You can use this to pass the original file path of the image to Liszt, which will be put into the .mro file (or NIFF file from 2.60) which Liszt generates. This is not necessarily the same file path as the one passed to Liszt on the command line. This is intended for programs which want to call Liszt and have a link back to the original scan in the output file.
If senderrors is 1, Liszt will send Windows messages or Apple Events to the parent. If 0 it won't. This is the recommended method of handling Liszt errors. The other two (errorlog and reporterrors) may be useful when debugging.
errorlog is a file path where Liszt will append an error message if there is an error while processing an image. If no path is supplied, nothing will be written. SharpEye uses this in batch mode. It might have unpredictable results on Mac OS due to the difficulty of representing file paths unambiguously.
reporterrors is for Win32 only. If 1, Liszt will display errors in a message box. If 0 it won't.
outputformat can be "NIFF" or "MRO" or "MXML" or "MXML11" or "MIDI". MXML produces MusicXML 1.0. MXML11 produces MusicXML 1.1.
oldesymbols=0
ellipseheads=0
gracenotes=0
cuenotes=0
thinbeams=0
These control the recognition of music. If oldesymbols is 1, old-style symbols such as straight flags and 'spiral' bass clefs will be recognised. If ellipseheads is 1, half notes with symmetrical elliptical heads are recognised. If gracenotes is 1, small note heads are recognised. If cuenotes is 0 (the default), small note heads are interpreted as grace notes. If cuenotes is 1, small note heads will be interpreted as normal note heads. (The recognition engine does not yet deal with cue notes properly, but recognising them as normal notes may be useful.) If thinbeams is 1, beams which are thin lines are recognised.
lyriclanguage=English
readlyrics=1
lookforlyricswhere=0
lookforchordswhere=0
lyricsassignedhigh=0
readstext=1
charsused20=-###--####-#####################
charsused40=-###########################-#-#
charsused60=-##########################-----
charsused80=-------##-------##--##--------##
charsusedA0=---------#----------------------
charsusedC0=--------------------------------
charsusedE0=-------------------------#------
lyriclanguage can be set to Danish, Dutch, English, French, German, Italian, Latin or Spanish. It is used for reading lyrics, not other text. It determines which trigram frequencies to use to help disambiguate syllables. E.g. "ijk" is common in Dutch, not in English.
If readlyrics is 1, lyrics are read; if it is 0 they are not.
The lookforlyricswhere and lookforchordswhere values determine where lyrics and (textual) chords are looked for, in relation to a stave. The value 0 is the default, and means the recognition engine decides itself (which generally means looking for lyrics below staves and chords above or below). A value of 1 means the recognition engine looks above the stave, and a value of -1 means it looks below. For example, set lookforlyricswhere=1 to read lyrics above staves, or set lookforchordswhere=1 to ensure chords are never found below a stave.
If lyricsassignedhigh is 1, lyric syllables are preferentially assigned to up stem notes when both up and down stem notes line up with a syllable. Otherwise down stem notes are preferred. (May be useful in MusicXML export, possibly in NIFF or MIDI.)
If readstext is 1, other text is read; if it is 0 it is not.
The charsused values are a map of the characters recognised in both lyrics and other text. They are ISO Latin1 codes except 0x80 to 0x9f. Leave the ones in this range as above.
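A charsused row can be turned back into the characters it enables. The Python sketch below assumes, from the key names and the 32-character row length, that the hex suffix of the key is the code of the first position and that "#" marks a recognised character; this layout is inferred rather than stated in the text, and the function name decode_charsused is hypothetical. Rows in the 0x80 to 0x9f range are not ISO Latin1 and should not be decoded this way.

```python
def decode_charsused(key, flags):
    """Return the characters enabled by one charsusedXX row."""
    base = int(key[len("charsused"):], 16)      # e.g. "charsused40" -> 0x40
    codes = [base + i for i, flag in enumerate(flags) if flag == "#"]
    return bytes(codes).decode("latin-1")

# A row enabling codes 0x61 through 0x7A, i.e. the lower-case letters.
row = "-" + "#" * 26 + "-" * 5
print(decode_charsused("charsused60", row))     # abcdefghijklmnopqrstuvwxyz
```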
rhythm_lookfortriplets=1
rhythm_partialvoices=1
rhythm_optimistic=1
rhythm_allowoverspill=1
rhythm_maxvoices=4
These control the rhythm analysis (assignment of voices and start times to notes).
If rhythm_lookfortriplets is 1, Liszt will use rhythm analysis (not graphical recognition) to find notes which are probably triplets and assign start times and durations accordingly.
If rhythm_partialvoices is 0, notes in incomplete voices will only be assigned times when they line up vertically with a note in a complete voice.
If rhythm_optimistic is 1, Liszt will assign voices and times to notes even if this means 'guessing' quite a lot.
If rhythm_allowoverspill is 1, Liszt will assign voices and times to notes even if they overspill the measure.
Sensible settings for rhythm_partialvoices, rhythm_optimistic, rhythm_allowoverspill are (0,0,0), (1,0,0), (1,1,0), (1,1,1), in order from most strict to most relaxed.
You can use rhythm_maxvoices to limit the maximum number of voices per stave.
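The four sensible combinations listed above can be wrapped as a single strictness level. This Python sketch is one possible way for a calling program to expose them; the level numbering (0 to 3) and the function name rhythm_settings are hypothetical, while the key names and value triples come from the text.

```python
# (rhythm_partialvoices, rhythm_optimistic, rhythm_allowoverspill)
RHYTHM_PRESETS = [
    (0, 0, 0),   # level 0: most strict
    (1, 0, 0),
    (1, 1, 0),
    (1, 1, 1),   # level 3: most relaxed
]

def rhythm_settings(level, max_voices=4):
    """Return [RhythmAnalysis] key/value pairs for strictness level 0..3."""
    partial, optimistic, overspill = RHYTHM_PRESETS[level]
    return {
        "rhythm_partialvoices": partial,
        "rhythm_optimistic": optimistic,
        "rhythm_allowoverspill": overspill,
        "rhythm_maxvoices": max_voices,
    }

print(rhythm_settings(2))
```

rhythm_lookfortriplets is independent of these presets and is left for the caller to set.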
mro_rawoutput=0
niff_guitarchordsastext=0
mxml_systembreaks=1
mxml_pagebreaks=1
midi_tempo=180
midi_velocity=100
midi_repeats=0
midi_lyrics=1
midi_upstemislowchannel=0
If mro_rawoutput is 1, none of the simplifications and interpretations done by SharpEye's front end are performed. This means the mro file may contain systems with different numbers of staves; chords and beamed groups that spread across two or more staves; time signatures that are different (or missing) for the same measure in different staves; different systems may have different pairs of staves which are joined (by braces); and unmarked triplets are not detected even if rhythm_lookfortriplets is 1.
By default, guitar chords are exported as NIFF ChordSymbols. If niff_guitarchordsastext is 1, they will be treated as plain text.
mxml_systembreaks and mxml_pagebreaks determine whether system breaks and page breaks are put into the MusicXML file.
In MIDI export, midi_tempo is an integer giving the number of quarter notes per minute. midi_velocity is an integer giving the velocity (volume) of all notes. If midi_repeats is 1, repeat signs are honoured when generating MIDI. If midi_lyrics is 1, lyrics are included. midi_upstemislowchannel currently does nothing. (It would require an option to split staves into two MIDI channels based on stem direction before it became useful.)
[Display]
windowhandle=792
windowhandle is for Win32 only. If not 0, Liszt will send progress messages to this window. The message ID is registered using
RegisterWindowMessage("70BC82E0-9033-11d2-9C32-EB852EFC6F01-Liszt2ProgressMessage");
The messages sent have wParam=0 and lParam=percentage complete, with a final one with wParam=1, lParam=0 for successful completion, or wParam=2, lParam=errornumber.
If windowhandle=0, Liszt will display its own progress window. (May be useful for debugging versions on Win32.)
1101 Internal inconsistency detected
1102 OS routine failed with no error code
1103 Not enough memory (heap)
1104 Not enough memory (shifting heap)
1105 Not enough memory (malloc)
1201 Wrong number of arguments passed to engine
1301 Failed to initialise file paths module
1302 No memory for 'magang' table
1303 No memory for 'vdist' table
1304 Can't read pattern recognition data file
1305 Can't read data file
1311 Can't read text-OCR data file
1312 Failed to initialise for text-OCR
1321 Can't read music-OCR data file
1322 Failed to initialise for music-OCR
1331 Failed to read language data
1332 Failed to initialise language data
1334 No memory for dictionary
1335 Failed to initialise user dictionary
1341 Failed to initialise OMR engine
1342 not used (RISC OS)
1343 not used (RISC OS)
1344 not used (RISC OS)
1401 Failed to open image file for reading
1402 Failed to read image file
1403 No memory for image file
1404 Image is too big
1410 Failed to load image file
1411 No memory for image file
1412 Not a 1 bit per pixel image
1413 Failed to open image file for reading
1414 Not a version 4 image file
1415 Not a RGB image
1416 More than one plane in image
1501 Can't open file or low level read error
1502 Bad syntax or bad value
1601 Empty image
1602 Image too noisy or complex
1603 Image too noisy or complex
1604 Failed to find any staves
1605 Staves too big (>~ 50 pixels between lines)
1701 Failed to make score rectangular
1702 Failed to deal with multistave objects
1703 Failed to unify time signatures
1711 No memory to make NIFF
1712 Failed to open NIFF file
1713 Failed to write to NIFF file
1721 Failed to open .mro file
1722 Failed to write to .mro file
1731 Failed to open MusicXML file
1732 Failed to write to MusicXML file
1741 Failed to open MIDI file
1742 Failed to write to MIDI file
1743 No memory to make MIDI file
1744 Failed to write byte to MIDI file
1999 Unknown error
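A calling program will typically turn these numbers into messages for the user. The Python sketch below shows one way to do that with a lookup table; the table is only an excerpt of the list above, and the function name describe_error and the fallback wording are hypothetical. Numbers outside 1000 through 1999 are treated as possibly passed on from the operating system, as noted earlier for Mac OS.

```python
# Excerpt only: a real table would carry every entry from the list above.
ERROR_EXCERPT = {
    1201: "Wrong number of arguments passed to engine",
    1412: "Not a 1 bit per pixel image",
    1604: "Failed to find any staves",
    1999: "Unknown error",
}

def describe_error(code):
    """Return a readable message for an error number sent by the engine."""
    if code in ERROR_EXCERPT:
        return ERROR_EXCERPT[code]
    if 1000 <= code <= 1999:
        return "Liszt error %d (not in this excerpt)" % code
    return "Error %d outside the Liszt range, possibly passed on by the OS" % code

print(describe_error(1604))
print(describe_error(-43))
```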