Friday, June 17, 2016

Dependently typed red-black tree in Haskell, part 4

A rather large jump from page 3 to 4. The tl;dr, page 4 implements insert. The slightly longer version: there are several bugs in page 3, and insert requires implementing many cases.

Unfortunately, I did not keep notes through the process, and really, no blog post could capture the frustration and awe of implementing insert. Having the compiler say "No, don't do that; you're violating the black count," is very cool.

First, a few big bugs in page 3. Can you spot them?
data RedZip n a where
        RedZip ::
          a -> Black n a -> Black n a -> BlackCrumb (S n) a -> RedZip n a
        TempZip ::
          a -> Black n a -> Black n a -> RedCrumb n a -> RedZip n a

data BlackZip n a where
        LeafZip :: RBCrumb Z a -> BlackZip Z a
        Black2Zip ::
          a -> Black n a -> Black n a -> RBCrumb (S n) a -> BlackZip (S n) a
        Black3Zip ::
          a -> Red n a -> Black n a -> RBCrumb (S n) a -> BlackZip (S n) a

There are three:
  1. TempZip was missing. TempZip captures the idea of consecutive red nodes, which may occur during insertion. Note that since ultimately, the zip must be unwound into nodes, this has no effect on the integrity of the red-black constraint.
  2. Black4Zip is unnecessary. Turns out I should have read all of the Sedgewick paper [1] before I started, but Black4 can be converted to Red.
  3. LeafZip was wrong. This one is quite subtle. In the constructor, we need (RBCrumb Z a), rather than (RBCrumb n a) -- the difference is we know a something extra about the leaf zipper. In particular, we know exactly the black depth of the parent, and the type system needs that information to handle all the cases.
Additionally, there were several problems with the "crumbs," most important being I was missing the right/left information. For details, see Page4/Types.hs

Its hard to say much about the insert implementation. Overall, it's a hundred lines of very dense ADT destructruing and reconstruction. Through the process, I was able to significantly simplify the code by eliminating several "crumb" and "zipper" constructors. Mostly, it was fun and frustrating having the type system yell at me for trying to violate red-black and black-count constraints.

A closing remark: QuickCheck is amazing. The basic property we are looking for from a red-black tree is that a traversal yields the elements in sorted order. This is trivial to assert with QuickCheck (see Lib.hs). Importantly, this allows us to verify the final constraint, that we have a binary search tree. The QuickCheck one line-er caught several bugs in the afore mentioned dense ADT code.

[1] Sedgewick, R.: Left-leaning red-black trees, http://www.cs.princeton.edu/rs/talks/LLRB/LLRB.pdf