Tag Archives: haskell

Writing things right

OO has contributed many big and important innovations to programming. Among these, the foremost is that you write functions after rather than before their argument.

No, really.

It’s not just OO languages of course. Concatenative languages do the same thing. There’s a long history of mathematicians doing it as well (though we don’t like to talk about them. The cool mathematicians all write their functions on the left).

It’s funny how attached people get to this fact though.

Consider the following piece of Scala code:

object StringUtils{
  /** 
   * Trims whitespace from the end of s.
   */
  def rtrim(s : String) = ...
}

We can invoke this as StringUtils.rtrim(myString). Or if we import StringUtils, just rtrim(myString);

People get very upset if you ask them to do so though, and they go to all sorts of lengths to avoid it.
Consider the following three examples from different languages:

Scala:

object StringUtils{
   implicit def string2RTrim(s : String) = new { def rtrim = ...; }   
}

Ruby:

class String
  def rtrim
  ...
  end
end

C#:

class StringUtils{
   public static String rtrim(this String s) {
     ...
   }
}

What do these achieve over the previous version? Simple: You can write myString.rtrim instead of rtrim(myString). That’s it. (Actually the Ruby and Scala versions both *can* allow you to do different things than that. It’s just that here and in 90% of the use cases they aren’t used for anything else. The C# version literally doesn’t do anything else).

The thing is, while I’m making fun of this to a certain degree, it’s actually a perfectly reasonable thing to want to do. Designing things in noun-verb order is a good principle of UI design, and it works for programming as well. Things chain better – when you want to add new functions to a pipeline you add them at the point your cursor is naturally at and it matches well with thinking of it as a pipeline of “take this thing, do this to it, do that to it, do this other thing to it, get this value out”. Also you write far fewer brackets. :-) (compare Haskell’s foo . bar . baz $ thing idiom for a similar bracket avoidance tool).

Of these, I’d say that the Ruby solution is the most obvious (it just uses the fact that classes are open to add a new method to String), but it comes with the possibility of amusingly non-obvious runtime errors when someone else defines a conflicting method. The C# solution seems the best to me – it’s relatively little overhead over writing the utility method as you would otherwise and comes with the option to invoke it either as myString.rtrim or StringUtils.rtrim(myString), so when namespacing conflicts inevitably occur you have an easy fallback. But of course it uses a language feature specifically added to do this, while the other two are functions of more general language features. The Scala solution is, to my mind, decidedly the worst of the three.It’s syntactically noisy and comes with a significant additional runtime overhead.

But honestly I’m not particularly happy with any of these solutions. The Scala and Ruby solutions come with disproportionate costs to the benefit they give and the C# solution requires an additional language feature. Moreoever, each of these solutions requires effort at each definition site in order to make something available that you always want at the use site. Wouldn’t it be better if for every utility function you automatically had the option to write it on the right?

Let’s take a digression. What language is the following (rather pointless) code written in?

[1, 2, 3].sort.length

Ruby, right?

Actually, no. It’s Haskell.

Wait, what?

Well, it’s Haskell if you do something slightly evil and redefine the (.) operator (which normally means composition):

Prelude Data.List> let (.) x f = f x
Prelude Data.List> [1, 2, 3].sort.length
3

I saw this trick a while ago (the author was amusingly apologetic for it). It’s evil Haskell code because of the way it redefines an operator that normally means something else (this is totally typesafe of course – existing code will continue to use the old operator definition). But it’s a perfectly valid operator definition, and a rather nice one.

It works well with additional arguments to functions too:

Prelude Data.List> [1, 2, 3].sortBy(compare).length
3

The reason this works is that sortBy takes the list argument curried as its last argument, so sortBy(compare) gives something of type [Int] -> [Int] which we can then apply as above (Haskell’s precedence rules make this work).

So this is a nice trick, but how is it useful to you? Well, it’s probably not. I can’t think of any low noise way of making it work in any of the other languages mentioned so far (the best I can come up with is an evil evil hack in Ruby that would make god go on a kitten killing spree and a mildly nasty hack with operators and implicit conversions in Scala that’s much too noisy to really use), and using it in Haskell will make other Haskell programmers very unhappy with you. But it’s an interesting trick, and I’ll be sure to bear it in mind if I ever get around to creating DRMacIverLang.

This entry was posted in programming and tagged , , , , on by .

QDBM Bindings

“So”, I thought to myself, “I feel like learning how to use the Haskell Foreign Function Interface. I think I’ll write a binding to a C library”

As one does.

Seeing as both my Haskell and my C are less than stellar this proved to be an interesting challenge, but I rose to it and put together a basic set of bindings for QDBM‘s Depot library. It’s only lightly tested and very incomplete, but seems to work and if you want something along these lines it will probably do the job. You can get it from my misc HG repository. The resulting code is extremely imperative to use, but otherwise not too bad.

This post is a literate Haskell file with a demo of how to use it.

We’re going to write a bulk file importer. It will get its data from stdin and will read a bunch of records formated as “key = value”, ignoring blank lines and lines starting with #. It will import them into a qdbm Depot database file, printing out a message for every value it overwrites.

> module Main where
> import qualified Data.ByteString as ByteString
> import Data.ByteString (ByteString)

The library is based around strict ByteStrings, both because of the way it interoperates with C and for performance reasons.

> import qualified Data.ByteString.Char8 as Char8
> import Data.ByteString.Char8 (pack, unpack)

In order to convert between ByteStrings and normal Haskell Strings

> import System.Environment
> import System.IO
> import Text.Printf

Just some random useful imports

> import QDBM.Depot

And of course we need to import the QDBM module itself.

> main = do args <- getArgs
>           case args of 
>             [dbname, importfile] -> importInto (pack dbname) importfile
>             x -> putStrLn "Expected usage: bulkimport dbname importfile"

Boring, and woefully inadequate, argument processing.

>   where importInto dbname importfile = 
>           do db <- openDepot dbname Write

Open a handle on the database in write mode.

>              eachLine importfile $ processLine db

I’m pretty sure the Haskell Cabal will want my blood for writing a foreach in Haskell. But anyway, this should be fairly self explanatory.

>              closeDepot db

Finally, close the database file

>         parseRecord line = let (start, end) = Char8.break (=='=') line in (start, ByteString.drop 1 end)

Split the line around the first = in order to get the key and value. We should probably do better error handling here. Oh well.

>         processLine db line = if (not (ByteString.null line) && Char8.head line /= '#')
>                                then do currentValue <- get db key 
>                                        case currentValue of 
>                                           Nothing -> put db key value 
>                                           Just currentValue | currentValue == value -> return ()
>                                           Just currentValue -> printf "Replacing value for key %s. Value %s -> %s\n" (show key) (show currentValue) (show value)
>                                else return ()
>           where (key, value) = parseRecord line                  

For each line, we first check if it’s empty or starts with #. If it is, we ignore it. Else we look up the relevant key in the database. If it’s there already and with a different value with print a warning message. Else we put the new value into the database.

> eachLine :: FilePath -> (ByteString -> IO ()) -> IO ()
> eachLine file f = do handle <- openFile file ReadMode
>                      catch (eachLineInHandle handle) (const $ hClose handle)
>   where eachLineInHandle handle = ByteString.hGetLine handle >>= f >> eachLineInHandle handle
> 

Simple utility function for operating over the lines of a file.

And that’s about it. It’s fairly grotty code, but hopefully conveys the idea of how to use the basics (not that that would really have been hard to figure out). There are a few other functions in the module, but they should be fairly self explanatory.

This entry was posted in programming and tagged on by .

Monadic card shuffling

Because it’s what all the cool kids do, this is a post in literate Haskell. Assuming wordpress doesn’t screw things up too horribly, you should just be able to cut and paste it into your text editor and compile it.

How do you shuffle a pack of cards?

Easy. Throw it up in the air and then pick them up again. Done.

Ok, you don’t do that in practice, because it makes a mess. But in principle it would give you a fair shuffling of the cards. Conceptually it’s equivalent to doing “pick a card, any card” until you run out of cards, and using the resulting order you picked them in. But while tidy, that’s far too boring.

Nevertheless, it’s a pretty good way of shuffling. It’s more or less equivalent to one of the standard ways of shuffling a list/array/favourite sequential data structure, the Fisher-Yates shuffle. This has a very easy to follow imperative implementation, but the purely functional ones… not so much. Oleg has an implementation, although he doesn’t call it by this name. However, I found this implementation a little scary and (more importantly) not that easy to use.

Here’s one which is structured according to a custom monad (sorry) which emulates the “pick a card, any card” structure of shuffling the list. It seems likely that the monad has other uses, but I can’t think of any at the moment. Mostly I’m just posting this as a cute way to solve the problem.

>{-# LANGUAGE GeneralizedNewtypeDeriving#-} 

We’ll need this to derive the monad instance for our sample.

> module Sample (
>   Sample,

We’ll define a type Sample a b. This should be interpreted as an action which can add items to and random draw items from a bag of elements of type a and results in a b.

>   takeSample, 

Given a Sample we run it by providing it with a source of randomness.

We define a sample with the following primitives:

>   draw,

We can draw an item from it at random. This returns Nothing if the bag is empty, else Just someItem

>   place,

We can put an item into the bag.

>   placeAll,
>   drawAll,

And we provide some useful functions for bulk add and remove. placeAll puts a list of items into the bag. drawAll draws all the remaining items from the bag in a random order.

> shuffle

And using the combination of placeAll and drawAll we’ll define a shuffle function.

> ) where
>
> import Control.Monad.State
> import System.Random
> import qualified Data.Sequence as Seq
> import Data.Sequence (Seq, (<|), (|>), (><))
> newtype Sample a b = Sample (State (StdGen, Seq a) b) deriving Monad

A Sample consists of two things. A random generator with which to make choices and a collection of elements (we assume it’s a StdGen rather than an arbitrary generator, mainly because I’m being lazy) and a bag of elements to draw from. We allow repetitions, and in order to allow us to draw from any point we model it as a Data.Sequence rather than a list (which has O(log(k)) indexing).

We want to chain actions with respect to this sampling together, so we model it as a state monad.

> takeSample :: StdGen -> Sample a b -> b
> takeSample g (Sample st) = evalState st (g, Seq.empty)

Given a Sample, we set it running with a source of randomness and an empty bag.

> draw :: Sample a (Maybe a)
> draw = Sample $ do (gen, sample) <- get
>                    if (Seq.null sample) 
>                      then return Nothing
>                      else do let (i, gen') = randomR (0, Seq.length sample - 1) gen
>                              let (x, sample') = remove i sample
>                              put (gen', sample')
>                              return $ Just x

Draw takes an element from the sequence, returns the result of that and chains through the new generator and the remaining elements.

>  where
>    remove :: Int -> Seq a -> (a, Seq a)
>    remove 0 xs = (x, u) where (x Seq.:< u) = Seq.viewl xs
>    remove i xs | i == Seq.length xs = (x, u) where (u Seq.:> x) = Seq.viewr xs
>    remove i xs = (x, u >< v)
>      where (u', v) = Seq.splitAt i xs
>            (u Seq.:> x)  = Seq.viewr u' 

This is just a helpful method for removing an element from inside a sequence.

> place :: a -> Sample a ()
> place x = Sample $ do (gen, sample) <- get
>                       put (gen, x <| sample)

To place an element we just append it to the beginning of the sequence.

> placeAll :: [a] -> Sample a ()
> placeAll xs = Sample $ do (gen, sample) <- get
>                           put (gen, Seq.fromList xs >< sample)

Similarly for placing multiple elements, although we use sequence concatenation rather than appending them one by one.

> drawAll :: Sample a [a]
> drawAll = do el <- draw
>              case el of 
>                   Nothing -> return []
>                   Just(x) -> do xs <- drawAll
>                                 return $ x : xs

drawAll simply draws from the bag until it finds nothing left. Pretty self explanatory.

> shuffle :: StdGen -> [a] -> [a]
> shuffle gen xs = takeSample gen $ placeAll xs >> drawAll 

And finally, we can implement shuffle. And it’s a one liner. In order to shuffle a bunch of elements we simply put them all in the bag, then take them all out again in a random order. Ta da!

This wasn’t really very hard to do directly, but I found that creating the right abstraction to build it out of helped clarify the logic a lot.

This entry was posted in programming and tagged on by .