Fake: Generating Realistic Test Data in Haskell

On a number of occasions over the years I've found myself wanting to generate realistic looking values for Haskell data structures.  Perhaps I'm writing a UI and want to fill it in with example data during development so I can see how the UI behaves with large lists.  In this situation you don't want to generate a bunch of completely random unicode characters.  You want things that look plausible so you can see how it will likely look to the user with realistic word wrapping, etc.  Later, when you build the backend you actually want to populate the database with this data.  Passing around DB dumps to other members of the team so they can test is a pain, so you want this stuff to be auto-generated.  This saves time for your QA people because if you didn't have it, they'd have to manually create it.  Even later you get to performance testing and you find yourself wanting to generate several orders of magnitude more data so you can load test the database, but you stil…

Efficiently Improving Test Coverage with Algebraic Data Types

Think of a time you've written tests for (de)serialization code of some kind, say for a data structure called Foo.  If you were using the lowest level of sophistication you probably defined a few values by hand, serialized them, deserialized that, and verified that you ended up with the same value you started with.  In Haskell nomenclature we'd say that you manually verified that parse . render == id.  If you were a little more sophisticated, you might have used the QuickCheck library (or any of the numerous similar packages it inspired in other languages) to verify the parse . render == id property for a bunch of randomly generated values.  The first level of sophistication is often referred to as unit testing.  The second frequently goes by the term property testing or sometimes fuzz testing.

Both unit testing and property testing have some drawbacks.  With unit testing you have to write fairly tedious boilerplate of listing by hand all the values you want to test with.  Wit…

Armor Your Data Structures Against Backwards-Incompatible Serializations

As almost everyone with significant experience managing production software systems should know, backwards compatibility is incredibly important for any data that is persisted by an application. If you make a change to a data structure that is not backwards compatible with the existing serialized formats, your app will break as soon as it encounters the existing format. Even if you have 100% test coverage, your tests still might not catch this problem. It’s not a problem with your app at any single point in time, but a problem with how your app evolves over time.

One might think that wire formats which are only used for communication between components and not persisted in any way would not be susceptible to this problem. But these too can cause issues if a message is generated and a new version of the app is deployed before the the message is consumed. The longer the message remains in a queue, redis cache, etc the higher the chances of this occurring.

More subtly, if you deploy a ba…

Talk: Real World Reflex

I recently gave a talk at BayHac about some of the things I've learned in building production Reflex applications. If you're interested, you can find it here: videoslidesgithub

On Haskell Documentation

The following started out as a response to a Hacker News comment, but got long enough to merit a standalone blog post.

I think the root of the Haskell documentation debate lies in a pretty fundamental difference in how you go about finding, reading, and understanding documentation in Haskell compared to mainstream languages.  Just last week I ran into a situation that really highlighted this difference.

I was working on creating a Haskell wrapper around the ACE editor.  I initially wrote the wrapper some time ago and got it integrated into a small app.  Last week I needed ACE integration in another app I'm working on and came back to the code.  But I ran into a problem...ACE automatically makes AJAX requests for JS files needed for pluggable syntax highlighters and themes.  But it was making the AJAX requests in the wrong place and I needed to tell it to request them from somewhere else.  Depending on how interested you are in this, you might try looking through the ACE documentat…

How to Get a Haskell Job

Over and over again I have seen people ask how to get a full time job programming in Haskell. So I thought I would write a blog post with tips that have worked for me as well as others I know who write Haskell professionally. For the impatient, here's the tl;dr in order from easiest to hardest: IRCLocal meetupsRegional gatherings/hackathonsOpen source contributionsWork where Haskell people work First, you need to at least start learning Haskell on your own time. You had already started learning how to program before you got your first programming job. The same is true of Haskell programming. You have to show some initiative. I understand that for people with families this can be hard. But you at least need to start. After that, far and away the most important thing is to interact with other Haskell developers so you can learn from them. That point is so important it bears repeating: interacting with experienced Haskell programmers is by far the most important thing to do.

Measuring Software Fragility

While writing this comment on reddit I came up with an interesting question that I think might be a useful way of thinking about programming languages. What percentage of single non-whitespace characters in your source code could be changed to a different character such that the change would pass your CI build system but would result in a runtime bug? Let's call this the software fragility number because I think that metric gives a potentially useful measure of how bug prone your software is. At the end of the day software is a mountain of bytes and you're trying to get them into a particular configuration. Whether you're writing a new app from scratch, fixing bugs, or adding new features, the number of bytes of source code you have (similar to LOC, SLOC, or maybe the compressed number of bytes) is rough indication of the complexity of your project. If we model programmer actions as random byte mutations over all of a project's source and we're trying to predic…