How did I implement this 250+ page specification in such a short amount of time? Simple - I took the specification and made it executable.
The specification is full of tables describing the on-disk format that look like this:
At first, I began by manually coding up my library from these tables. Unfortunately, each table corresponds to no less than three bits of Haskell:
- A data type definition for the record (called
FOCALGRADIENT
in this case) - A function for reading that record from the file (of type
SwfGet FOCALGRADIENT
:SwfGet
is a state monad encapsulating aByteString
) - A function for writing that thing back (of type
FOCALGRADIENT -> SwfPut ()
, whereSwfPut
is another monad)
It quickly became obvious that writing this code (and keeping the three versions in sync!) was totally untenable.
My solution was to turn my parser into a Literate Haskell file which can contain sections like the following:
p18: RGB color record
\begin{record}
RGB
Field Type Comment
Red UI8 Red color value
Green UI8 Green color value
Blue UI8 Blue color value
\end{record}
I then have a preprocessor which replaces these wholesale with code implementing the data type, reader and writer. For our running example, we get this:
data RGB = RGB{rGB_red :: UI8, rGB_green :: UI8, rGB_blue :: UI8}
deriving (Eq, Show, Typeable, Data)
getRGB
= do rGB_red <- getUI8
rGB_green <- getUI8
rGB_blue <- getUI8
return (RGB{..})
putRGB RGB{..}
= do putUI8 rGB_red
putUI8 rGB_green
putUI8 rGB_blue
return ()
Choice and Repetition
Frequently, some part of the record is repeated a number of times that depends on an earlier part of the record. In other cases, the exact shape a field takes can depend on an earlier one. For example:
p96: ActionConstantPool
\begin{record}
ActionConstantPool
Field Type Comment
ActionConstantPool ACTIONRECORDHEADER ActionCode = 0x88
Count UI16 Number of constants to follow
ConstantPool STRING[Count] String constants
\end{record}
This record contains a field-controlled repetition count. My generator is smart enough to:
- Generate a Haskell expression from the repetition count within the square brackets. In this case it is just
Count
, but in general this can contain arithmetic and references to several fields. - Omit the
Count
field from the generated data type, and infer it from the length of theConstantPool
when we come to put this record back
There is similar support for choice. For example, look at the
CodeTable
field in DefineFontInfo
: p177: DefineFontInfo
\begin{record}
DefineFontInfo
Field Type Comment
Header RECORDHEADER Tag type = 13
FontID UI16 Font ID this information is for.
FontNameLen UI8 Length of font name.
FontName UI8[FontNameLen] Name of the font (see following).
FontFlagsReserved UB[2] Reserved bit fields.
FontFlagsSmallText UB[1] SWF 7 file format or later: Font is small. Character glyphs are aligned on pixel boundaries for dynamic and input text.
FontFlagsShiftJIS UB[1] ShiftJIS character codes.
FontFlagsANSI UB[1] ANSI character codes.
FontFlagsItalic UB[1] Font is italic.
FontFlagsBold UB[1] Font is bold.
FontFlagsWideCodes UB[1] If 1, CodeTable is UI16 array; otherwise, CodeTable is UI8 array.
CodeTable If FontFlagsWideCodes, UI16[] Otherwise, UI8[] Glyph to code table, sorted in ascending order.
\end{record}
The field changes representation based on
FontFlagWideCodes
- this is represented in the corresponding Haskell record as an Either
type, and the FontFlagsWideCodes
flag is not put in the record at all, as it can be unambiguously inferred from whether we are Left
or Right
.Bit Fields
The fields we saw in
DefineFontInfo
with the name UB
were actually bit fields. UB[n]
is actually an unsigned number with n
bits in it. Here is another example: \begin{record}
BLURFILTER
Field Type Comment
BlurX FIXED Horizontal blur amount
BlurY FIXED Vertical blur amount
Passes UB[5] Number of blur passes
Reserved UB[3] Must be 0
\end{record}
Bit fields pose a problem - how should they be represented in Haskell? If I know the number of bits statically, there might be a suitable type I can choose. For example, my generator will use
Bool
for UB[1]
fields. Unfortunately, in general I'll have to choose a Haskell type which is large enough to hold the maximum possible number of bits I might encounter.What I have chosen to do is represent any
UB[n]
field (where n
is not 1) with a Word32
(32 bits seems to be the upper limit on any variable-length bitfields I've encountered in practice, though this isn't part of the specification). What this means, however, is that my generator must produce runtime assertions in the writing code to ensure that we won't lose any information by truncating the Word32
to the number of bits we actually have available.Abstraction
Sometimes, the structure of a record varies depending on the context in which it is used. For example, the
GLYPHENTRY
record can only be read and written if we know the number of bits used for its two fields. This is encoded as follows: p192: Glyph entry
\begin{record}
GLYPHENTRY(GlyphBits, AdvanceBits)
Field Type Comment
GlyphIndex UB[GlyphBits] Glyph index into current font.
GlyphAdvance SB[AdvanceBits] x advance value for glyph.
\end{record}
The "parameter list" after
GLYPHENTRY
in the header must be filled out with concrete arguments by any user of GLYPHENTRY
, such as in the GlyphEntries
field of the following: \begin{record}
TEXTRECORD(TextVer, GlyphBits, AdvanceBits)
Field Type Comment
TextRecordType UB[1] Always 1.
StyleFlagsReserved UB[3] Always 0.
StyleFlagsHasFont UB[1] 1 if text font specified.
StyleFlagsHasColor UB[1] 1 if text color specified.
StyleFlagsHasYOffset UB[1] 1 if y offset specified.
StyleFlagsHasXOffset UB[1] 1 if x offset specified.
FontID If StyleFlagsHasFont, UI16 Font ID for following text.
TextColor If StyleFlagsHasColor, If TextVer = 2, RGBA Otherwise RGB Font color for following text.
XOffset If StyleFlagsHasXOffset, SI16 x offset for following text.
YOffset If StyleFlagsHasYOffset, SI16 y offset for following text.
TextHeight If StyleFlagsHasFont, UI16 Font height for following text.
GlyphCount UI8 Number of glyphs in record.
GlyphEntries GLYPHENTRY(GlyphBits, AdvanceBits)[GlyphCount] Glyph entry (see following).
Padding PADDING8 Padding to byte boundary
\end{record}
Records abstracted over parameters generate Haskell reading and writing code which is lambda-abstracted over those same paramaters, just as you might expect. Likewise, occurrences of argument lists correspond to applications of arguments to a nested reading or writing function.
Conclusion
Having a declarative description of the contents of SWF files helped enormously. It had the following benefits:
- Very rapid development. I essentially just pasted in the contents of the specification for each record. This also led to a very low rate of bugs in the code (though I found some specification bugs, I can hardly be blamed for those :-)
- Easy post-hoc changes when adding new features.
- I only added assertions to check bit fields lengths quite late on in development, and it was a painless generator change. If I had handwritten everything I would have had to do a lot of tedious and error-prone programming to add a small feature like this.
- I could make use of the declarative information to generate reasonable error messages for showing to the user when they try to write back an invalid SWF. Again, this is something I decided to do after transcribing the specification.
- Very easy to read and spot bugs. Legibility was a goal when designing the DSL.
- It's relatively easy to change to using custom Haskell code for dealing with those records that can't be expressed in the DSL - you can copy and paste the unsatisfactory generated Haskell and then mold it to fit your needs.
Using custom code (as per the last bullet point) is sometimes indispensable for making the Haskell records look nicer. For example, you might need to write it so you can stop generating this sort of record:
data RECT = RECT { rECT_nbits :: UI8, rECT_xmin :: SB, rECT_xmax :: SB, rECT_ymin :: SB, rECT_ymax :: SB }
And, by observing that you can compute
nbits
by the maximum of the base-2 logarithms of the 4 SB
fields, generate this one instead: data RECT = RECT { rECT_xmin :: SB, rECT_xmax :: SB, rECT_ymin :: SB, rECT_ymax :: SB }
The fact that you have to use copy and paste to get this done is something which I'm not totally happy with. I've been looking at approaches to turn my DSL into
much more of a combinator library (i.e. taking the focus away from using an external code generator).
A combinator based approach would make it easy to add extra primitive combinators that know about special cases like one - but more on that another time!
Interested souls can get the code for my SWF library on GitHub. It can successfully roundtrip all of the example files that I could gather from the Gnash and Flash Gordon projects.
This is where macros [1] come in handy. It has been said that macros are not necessary because Haskell has lazy evaluation, but macros also allow code to be algorithmically created. (See Chapter 24 in Practical Common Lisp by Peter Seibel: http://www.gigamonkeys.com/book/practical-parsing-binary-files.html. It also allows declarative specification of binary formats leading to code to read/write the formats. Only he was able to do it *within* the language in one pass through the compiler rather than having to manually create a specialized "macro" system by hand.
ReplyDelete[1] When I say macros, I do not mean brain-dead cpp or m4 monstrosities. Think Lisp macros.
You've got 21 upvotes on HN at the moment btw. Good job.
ReplyDelete@Anonymous 1: this doesn't really require macros. In Python the "construct" library does this very well without any macros nor code generation.
ReplyDeleteI've heard that converting the specification to HTML and using an HTML parser has led to some success in generating parsers/generators/fuzzers. Good luck!
ReplyDeleteWhy not use quasiquoting instead of the preprocessor?
ReplyDeleteWow... I'm not sure I belong here, but I've been following a lot of the free classes available online and I love it. Such a wonder that so much knowledge is available to anyone willing to learn.
ReplyDeleteThank you for the clear explanation. I'm only a beginner in flash programming, but I was able to follow and implement it in my homework. Thanks a bunch! :)
ReplyDeleteThis is absolutely brilliant.
ReplyDelete@anonymous: i would challenge you to "construct" (pun intended) a parser for SWF using "construct" - it's one format it cannot deal with out of the box and requires a lot of tweaking even in the core code base!
ReplyDeletecoin haber - koin haber - kripto para haberleri - coin haber - instagram video indir - instagram takipçi satın al - instagram takipçi satın al - tiktok takipçi satın al - instagram takipçi satın al - instagram takipçi satın al - instagram takipçi satın al - instagram takipçi satın al - instagram takipçi satın al - binance güvenilir mi - binance güvenilir mi - binance güvenilir mi - binance güvenilir mi - instagram beğeni satın al - instagram beğeni satın al - google haritalara yer ekleme - btcturk güvenilir mi - binance hesap açma - kuşadası kiralık villa - tiktok izlenme satın al - instagram takipçi satın al - sms onay - paribu sahibi - binance sahibi - btcturk sahibi - paribu ne zaman kuruldu - binance ne zaman kuruldu - btcturk ne zaman kuruldu - youtube izlenme satın al - torrent oyun - google haritalara yer ekleme - altyapısız internet - bedava internet - no deposit bonus forex - erkek spor ayakkabı - tiktok jeton hilesi - tiktok beğeni satın al - microsoft word indir - misli indir - instagram takipçi satın al
ReplyDeleteen son çıkan perde modelleri
ReplyDeletesms onay
Mobil Ödeme Bozdurma
Nft Nasıl Alinir
ankara evden eve nakliyat
Trafik Sigortası
dedektör
websitesi kurma
Aşk romanları
Smm Panel
ReplyDeleteSMM PANEL
iş ilanları
İNSTAGRAM TAKİPÇİ SATIN AL
HIRDAVAT
beyazesyateknikservisi.com.tr
servis
Tiktok Jeton Hile
Thanks for information
ReplyDeleteCongratulations on your article, it was very helpful and successful. c621fa6ad71dd7afc455c451d0d9c7a6
ReplyDeletenumara onay
website kurma
website kurma
Thank you for your explanation, very good content. 4f6e6c3dfa4731e5bc3289f8efc419d2
ReplyDeletealtın dedektörü
instagram takipçi satın al
ReplyDeletecasino siteleri
H2KG6İ
Success Write content success. Thanks.
ReplyDeletebetpark
kıbrıs bahis siteleri
canlı poker siteleri
deneme bonusu
kralbet
canlı slot siteleri
betturkey
elf bar
ReplyDeletebinance hesap açma
sms onay
DRYQ
bilecik
ReplyDeletegebze
ısparta
şırnak
alsancak
1NB
Kanarya Adaları yurtdışı kargo
ReplyDeleteKanada yurtdışı kargo
Kamerun yurtdışı kargo
Kamboçya yurtdışı kargo
Jersey yurtdışı kargo
0P68U
شركة عزل اسطح zO5OWczeQE
ReplyDelete