2023-03-09

Haskell Basics - Lists & Functions 03

In the last post we explored the definition of functions by defining a function that returns Yep (our own boolean definition, Booly) if we pass it an empty list (using our own list definition: Lst). Here's the code with which we were working, and that we'll expand upon in this post. A friend of mine let me know about the Haskell Playground; I've added this as a snippet on the playground, I'll try to add all my samples over there and see how it works out.

picking up from last time

{-# LANGUAGE NoImplicitPrelude #-}
module List230217 where

-- | The Prelude defines the most common data types and functions for
-- | us, but the LANGUAGE construct at the top of the page excludes the
-- | normally automatically imported Prelude. Here we add back in what
-- | we might need in this module.
import Prelude (Show)

-- | our own version of a list datatype
data Lst a 
  = NIL
  | Cons a (Lst a)
  deriving (Show)

-- | our own version of a boolean datatype
data Booly
  = Yep
  | Nope
  deriving (Show)

-- | Returns true if an instance of @Lst@ is empty, false otherwise
isEmpty :: Lst a -> Booly
isEmpty NIL = Yep
isEmpty _   = Nope

List Length

Next function we'll tackle is finding the length of a list. Most (dare I say all) languages have a function to find the length of a list (or array, or vector, etc.), Haskell included. We, however, have been intentionally excluded Prelude, Haskell's standard library, and we've been re-inventing the wheel by defining our own list and boolean types for the purpose of understanding how these fundamental types work.

Recursive Functions and Inductive Data Types

You'll recall that our definition for lists is an inductive data type with two constructors, the NIL constructor that represents an empty list, and the Cons a (Lst a) constructor, which represents one element of the list and the rest of the list. When a sum type has one constructor that refers to it's own type (Cons has a reference to Lst), you should recognize it as an inductive type definition.

Inductive types and recursive functions are two sides of the same coin. The natural way to enumerate an inductive data type such as this is with a recursive function. Just like an inductive definition starts with a base case, which for our list definition is NIL, the same is true of the recursive function we use to manipulate it. This is a function for calculating the length of a list.

Length of a List

{-| An example list of integers 1 - 4 -}
ex01 :: Lst Int
ex01 = Cons 1 (Cons 2 (Cons 3 (Cons 4 NIL)))

{-| Compute the length of a list as an integer -}
lstLen :: Lst a -> Int
lstLen NIL = 0
lstLen (Cons x xs) = 1 + lstLen xs

(snippet in the playground)

You remember that the first line of the function is the type declaration lstLen :: Lst a -> Int. This tells us that that function has one parameter of type Lst a, where the a is a type variable meaning this will work with lists of any type.

The base case for the recursive function lstLen handles the case when the list (which is the only parameter) is NIL. When the list is NIL, the length is zero, because it's the empty list. The recursive case matches a list that is (Cons x xs), this will match any non-empty list. You'll notice our matching case binds variables x and xs. Get used to this syntax, it's super common, and Haskell has robust pattern matching and binding.

The x in the Cons case is the first item in the list, and the xs is the rest of the list. Note that the rest of the list is another list. We can say verbally that the Cons is an item in the list and another list. The "another list" may be another Cons , which would be the head of that list and another list. And so on, until we get to a Nil, which is the empty list.

To calculate the length of the list, when we get to a Cons, we'll say that the length of the list is one, plus whatever the length of the rest of the list is. In effect, our lstLen function replaces all the Cons that are chained together with a 1 + and it replaces the NIL with zero:

From List to Expression

1 2	(Cons 1 (Cons 2 (Cons 3 (Cons 4 NIL)))) -- ex01 (1 + (1 + (1 + (1 + 0 )))) -- lstLen ex01

The relationship of the inductive type to the recursive function is obvious! You'll see this pattern over and over again, as inductive types are fundamental to many functional data structures, and many functional languages (Haskell included) don't even have constructs like for loops. In Haskell, when you have to loop, you have to use recursion. If you're familiar with functions like map and fold you might say "hey wait - I don't have to use recursion, I can use functions like map and fold " ... well that's true but map and fold are recursive functions, so you may not have to write recursive functions, but you will definitely use recursive functions ;)

Double Every List Item

Now rather than counting every list item, we're going to modify each item in a list of integers by doubling it. But Haskell is immutable, so we can't change existing values. Whenever you want to "change" the value of a variable you have to instead think of creating a new variable with the new value.

Double Every Item

{-| Double every item in the list -}
doubler :: Lst Int -> Lst Int
doubler NIL = NIL
doubler (Cons x xs) = Cons (x * 2) (doubler xs)

(snippet in playground)

The doubler function follows the same pattern as lstLen. We handle the base case of the inductive type Lst by pattern matching NIL. Doubling every value in the empty list results in the empty list, no surprise there. The inductive case matches against Cons x xs, with the head of the list bound to x and the rest of the list is xs. The result of this match is a new Cons, with the head of the list being doubled, and calling doubler recursively on xs to get the new rest of the list.

Much like lstLen, the doubler function replaces each Cons as it walks its way down the list.

From List to Modified List

(Cons 1       (Cons 2       (Cons 3       (Cons 4       NIL))))  -- ex01
(1 +          (1 +          (1 +          (1 +          0  ))))  -- lstLen ex01

(Cons (1 * 2) (Cons (2 * 2) (Cons (3 * 2) (Cons (4 * 2) NIL))))  -- doubler ex01

Generalizing with Map and Fold

You may recognize the doubler as doing a map operation, and our lstLen is doing a fold. Let's start with a map operation.

Map

When we want to apply a function, for example one that double's an integer, to every item in a list (or other collection as we'll see), we can write the function directly as we did with doubler, or we can use a higher order function called map. A higher-order function is one that either takes a function as an argument, returns a function, or both. It's a function that operates on functions.

doubler is a function that doubles every element of a list. If we just look at the doubling part of doubler, we have

double an integer

1
2
3

{-| Doubles the input parameter -}
dub :: Int -> Int
dub n = n * 2

We can rewrite doubler using this function:

doubler using dub

{-| Double every item in the list using dub -}
doubler :: Lst Int -> Lst Int
doubler NIL = NIL
doubler (Cons x xs) = Cons (dub x) (doubler xs)

Now we can see how we might be able to generalize this function, if we pass the dub function in as a parameter. We'll change the name of the function as now it could do other things to every value in the list. Note how we specify the type of the modifying function; it's the same type signature of our dub function.

Generalized list modifier

{-| Apply a function to every element in the list to generate
    a new element -}
lstModder :: (Int -> Int) -> Lst Int -> Lst Int
lstModder _ NIL = NIL
lstModder modFn (Cons x xs) = Cons (modFn x) (lstModder modFn xs)

{-| Output list and doubled list -}
main :: IO ()
main = do
  putStrLn $ "ex1         : " ++ (show ex01) 
  putStrLn $ "doubled ex1 : " ++ (show (lstModder dub ex01))

Snippet

The first parameter is a function from Int to Int. We pass our integer doubling function dub to the lstModder function to do the work of modifying each element. Note how in the base case, when the list is NIL (empty), we use an underscore in the place of the first parameter. This tells Haskell to ignore the first parameter. In an empty list, we won't be using the modifier function. We could still name it, but Haskell would then warn us we have an unused variable.

The parentheses in the type signature are necessary because the arrow -> which defines a function, binds to the right. That means without parentheses the order of precedence would look like the following, and the function would not type check. It would be expecting three parameters first and second parameters to be a Ints and the third an Lst Int.

wrong order of precedence

{-| does not type check -}
lstModder :: (Int -> (Int -> (Lst Int -> Lst Int)))
lstModder _ NIL = NIL
lstModder modFn (Cons x xs) = Cons (modFn x) (lstModder modFn xs)

Now we have a general purpose modifier for lists of integers, but we could generalize further. What if we didn't specify Int as the input and output of our modifier function. And even better, what if the input and output of the modifier function didn't have to both be the same type?

general purpose lst modder

{-| apply a modifier function to a list of any type -}
lstModder :: (a -> b) -> Lst a -> Lst b
lstModder _ NIL = NIL
lstModder modFn (Cons x xs) = Cons (modFn x) (lstModder modFn xs)

We've already seen a type variable used in the definition of Lst to enable the Lst inductive type to represent a list of any type. The same approach is used in the lstModder function, the lower case a and b can both represent any type, without restriction. The implication is that the modifier function can not only modify one of the elements of the list, it can return a completely different type! The modifier turns a single a into a b, and the lstModder turns a list of a into a list of b. It could be that a and b are the same type as they are in the dub function, but they can also be different.

It happens now that our list modifier function is exactly a map function for our Lst data type.

list modder is map

{-| Apply a function to eeach element in a list 
    to produce a new list -}
lstMap :: (a -> b) -> Lst a -> Lst b
lstMap _ NIL         = NIL
lstMap f (Cons x xs) = Cons (f x) (lstMap f xs)

Snippet

We can use our new lstMap with a different modifier function to create a sort of a histogram:

from a list of ints to a histogram

{-| Makes a string of repeated strings -}
makeStringOf :: String -> Int -> String
makeStringOf _ 0 = ""
makeStringOf s n = s ++ (makeStringOf s (n - 1))

{-| Makes a string of asterisks -}
makeStringOfStars :: Int -> String
makeStringOfStars n = makeStringOf "*" n

{-| Turn a list of integers to a list of strings of asterisks where
    each string of asterisks represents the integer from the 
    original list -}
histogramOf :: Lst Int -> Lst String
histogramOf nums = lstMap makeStringOfStars nums

Snippet

Let's introduce a couple of more concepts that you'd find in real Haskell code before moving on. Here's another way to produce our histogram.

another way to histogram

1	putStrLn $ show $ lstMap (makeStringOf "*") ex01

Snippet

There are a few different things going on here. First, the putStrLn $ show $ (look at the snippet for the full context) - this is just a way to print our result to the output, we'll look at the details later. More interestingly, we've gotten rid of mkeStringOfStars. The type of makeStringOf is String -> Int -> String. In Haskell, if we pass this function that expects 2 parameters just one, what would it return?

curried function


makeStringOf :: String -> Int -> String
                 🡡         \__________/
                 |               |
                 |               |
makeStringOf    "*"              |
                                 |
                                 🡣
                          /-----------\
                          Int -> String

What we get back is another function, this time Int -> String. The asterisk string we pass to makeStringOf is captured in this new function that is returned. The new function will turn any Int into a string of that many asterisks. We then pass this in our lstMap function as the modifier. This behavior is known as partial application, we're only partially applying the parameters the makeStringOf function is expecting. This is perfectly legal and canonical Haskell.

Fold

While a map function is definitely a powerful tool, it is limited in that the structure being mapped over, a list in our case, is preserved in the output. Map doesn't allow us to change the output type. It will not only be a list, but be a list of the same size. What we need is a similar function that walks the data structure and allows more flexibility. Let's use this to refactor our lstLen function for finding the length of a list.

fold for our list type

{-| Fold for Lst -}
lstFold :: (b -> a -> b) -> b -> Lst a -> b
lstFold _ acc NIL = acc
lstFold f acc (Cons x xs) = lstFold f (f acc x) xs

...

{-| Get the length of a list by folding over the list with
    an accumulator function -}
lstLen :: Lst a -> Int
lstLen xs = lstFold (\acc _ -> 1 + acc) 0 xs

Snippet

The type of lstFold is interesting. The first parenthesized parameter has type (b -> a -> b). We saw earlier that the parentheses indicate that the first parameter is a function, in this case a function that takes two parameters a and b, and returns a value of the same type b as the first parameter. The first parameter, b, is usually referred to as the accumulator. This is the new value that we're building with our fold. Each application of lstFold will get the accumulator and the next value in the list. It should use these two parameters to calculate and return a new value for the accumulator.

The next parameter after the accumulator function is also of type b, and this should be the value of the accumulator before we start processing the list. When counting the elements in a list for example, we should start at zero as the accumulator. We need this because even at the first element, the accumulator function needs a value for the accumulator.

The last parameter to lstFold is the list over which we are folding.

The new lstLen function now uses lstFold and another new concept, an anonymous function, to calculate the length of the list. The anonymous function saves us from having to write a named function just to add one to a number. In Haskell an anonymous function begins with the backslash followed by the arguments of the function separated by spaces, and the body of the function follows the right arrow ->.

anonymous number

-- instead of
add1ToX :: Int -> Int -> Int
add1ToX acc _ = acc + 1

-- we can use an anonymous function
\acc _ -> acc + 1

Note that in the case of counting the elements in the list, we don't actually care what the element is, so in our accumulator function we ignore the second parameter that lstFold is looking for, the element in the list. But if we wanted to say sum up the values in an integer list, we could do that with this call to lstFold. The accumulator is carrying the running sum through the list.

sum the values in an integer list

1	putStrLn $ show $ lstFold (\acc x -> acc + x) 0 ex01

Snippet

η-conversion

There's another interesting optimization we can use, called η-conversion, (eta conversion). This one is optional, and some people really don't care to see it in code, but it is a fundamental concept with which it is good to be comfortable. As a matter of fact, with the Visual Studio Code plugin, you'll get linter hints where there's an opportunity to use η-conversion.

beta reduction for lstLen


{-| before η-conversion -}
lstLen1 :: Lst a -> Int
lstLen1 xs = lstFold (\acc _ -> 1 + acc) 0 xs

{-| with  η-conversion -}
lstLen :: Lst a -> Int
lstLen = lstFold (\acc _ -> 1 + acc) 0

Note that the type signature for lstLen hasn't changed, it's still looking for a Lst a and returning an Int. But in the definition, we've left off the parameter. We've also left the last parameter from body of the function, the call to lstFold. Look at this simple example:

η-conversion

-- these two definitions are the same (assume an abs function exists):

myAbs1 :: Int -> Int
myAbs1 x = abs x

myAbs2 :: Int -> Int
myAbs2 = abs

Partial Application of Infix Operators

Another common technique is using partial application of infix operators. Remembering that addition and multiplication are actually functions with an infix syntax, which means the arguments of the function are placed on either side of the function:

infix functions

add :: Int -> Int
add a b = a + b

-- these are equiv
-- note that you can't redefine ans over and over, this is just an example

ans :: Int
ans = 2 + 3

ans = (+) 2 3

ans = add 2 3

ans = 2 `add` 3

When we want to treat an infix operator, like addition, as a normal prefix function, we wrap it in parentheses. And when we want to treat a standard prefix function like add above as an infix function, we can wrap it in back ticks.

Knowing this, we can leverage partial application:

partial application of infix operators


{-| Add 2 to every integer in a list -}
ans1 :: Lst Int
ans1 = lstMap (2 +) ex01

{-| sum up every element in a list -}
ans2 :: Int
ans2 = lstFold (+) 0 ex01

Wrapping up

This is already a long post ... let's wrap it up by switching from the custom types we've been using for bools and lists to the standard ones from the Prelude, and using anonymous functions and η-conversion where we'd typically find them. Although there are standard functions for list length and sum, we'll still use our own.

On difference to notice when looking at this converted code - since Haskell uses lists so extensively, there's some syntactic sugar for added convenience. We can use square brackets in the type signature [Int] rather than List Int, we can use empty brackets [] rather than NIL, and we can use an infix operator : rather than Cons:

list syntactic sugar

----- Lst ------------------------------------ 
emptyLst :: Lst Int
emptyLst = NIL

ex01 :: Lst Int
ex01 = (Cons 1 (Cons 2 (Cons 3 (Cons 4 NIL))))

mapLst :: (a -> b) -> Lst a -> Lst b
mapLst _ NIL = NIL
mapLst f (Cons x xs) = Cons (f x) (map f xs)

----- List -----------------------------------
emptyList :: [Int]
emptyList = []

ex01 = (1 :: (2 :: (3 :: (4 :: []))))
-- or --
ex01 = [1,2,3,4]

mapList :: (a -> b) -> [a] -> [b]
mapList _ [] = []
mapList f (x:xs) = (f x) : (map f xs)

And finally, the final code listing, using the List type and other functions available in the Prelude.

Exercises with built in types and functions

import Prelude 

{- Prelude has 
     null :: [a] -> Bool    that retunrs True for empty lists,
     length :: [a] -> Int    to get the length of litss, and
     sum :: [Int] -> Int to get the sum of elements in the list
     
  so these implementations could be simpler still -}

{-| An example list of integers 1 - 4 -}
ex01 :: [Int]
ex01 = [1,2,3,4]

{-| Makes a string of repeated strings -}
makeStringOf :: String -> Int -> String
makeStringOf _ 0 = ""
makeStringOf s n = s ++ (makeStringOf s (n - 1))

main :: IO ()
main = do
  putStrLn $ "ex1         : " ++ (show ex01) 
  putStrLn $ "ex1 length  : " ++ (show (foldl (\acc _ -> acc + 1) 0 ex01))
  putStrLn $ "doubled ex1 : " ++ (show (map (2 *) ex01))
  putStrLn $ "sum ex1     : " ++ (show (foldl (+) 0 ex01))
  putStrLn $ "histogram   : " ++ (show (map (makeStringOf "*") ex01))
  
  -- bonus - mapping using foldr for a right fold...
  putStrLn $ "doubled ex1 : " ++
    (show (foldr (\x acc -> (2 * x):acc) [] ex01))

Snippet

Even with as much as we've looked at, we're still exploring the very tippy tip of the iceberg of the Haskell language. We've only looked at lists, but I've hinted that functions like map and fold can work over other data structures as well, like trees for example. Exploring how a function like map or fold might work over different data structures will require exploring type classes - a topic we'll have to tackle in another (possibly series of) posts!

But don't be discouraged - with the information we've covered we can create most data structures, and most any function to manipulate these data structures, only missing some conveniences and abstractions that would make our code more concise and reusable, but no more correct. You've got the basic building blocks right now!

2023-02-11

haskell

Haskell Basics - Lists & Functions 02

Previously:

In recent posts, we've discussed a list data type and the basics of function types. Here's the list that we came up with last time (I've updated the module name for this post, but otherwise it's the same as last time).

Lst Example

{-# LANGUAGE NoImplicitPrelude #-}
module List230210 where

import Prelude (Show)

data Lst a
  = NIL
  | Cons a (Lst a)
  deriving (Show)

Functions on our List Type

Now we'll combine topics of those two posts and write some of the common functions found that operate on lists, and in doing so, we'll come across several other important concepts in Haskell and in functional programming in general.

Is the List Empty?

Determining whether or not a list is empty is perhaps the simplest possible list function. When programming in Haskell one approach is to write the signature of the function. Believe it or not, in many simple cases, there's only one way to write a function for a given signature.

Signature of isEmpty

1	isEmpty :: Lst a -> Bool

We're stating that the variable isEmpty has the type: "a function from of Lst a to Bool". In Haskell, functions are pure. This means that functions behave more like mathematical functions than the functions we're used to in other programming languages. The function isEmpty can only use/refer to values passed to it as parameters and in its context, which means any variables or modules that are in scope. In this case, there's nothing in our module but the isEmpty function, so it has no other information other than the parameter of type Lst a. No other information is available to the function. Let's look at an implementation.

isEmpty implementation

{-# LANGUAGE NoImplicitPrelude #-}
module List230210 where

import Prelude (Show)

-- | A datatype for lists of values
data Lst a
  = NIL
  | Cons a (Lst a)
  deriving (Show)

-- | A custom implementation of a boolean like type
data Booly
  = Yep
  | Nope
  deriving (Show)

-- | Return @Yep@ if a @Lst@ is empty, and @Nope@ otherwise
isEmpty :: Lst a -> Booly
isEmpty NIL = Yep
isEmpty _   = Nope

There are several new elements here to look at. Normally everything in Prelude is imported automatically, but if you recall we've explicitly excluded Prelude with our LANGUAGE directive at the top of the module, so we can redefine types and functions related to lists in this exercise. So we need to implement our own boolean type. The "official" boolean data type in Haskell is Bool, and our Booly data type is defined the same way. We've named ours differently to make it obvious we've defined our own two valued type, but that's all a boolean type is, a data type with two values, and we intuitively assign semantics of truth-hood and false-hood to the two data constructors Yep or True, and Nope or False.

Pattern Matching

Finally we get to the definition of the isEmpty, and we see one of the key concepts in Haskell that looks very strange to those familiar with mainstream languages: pattern matching. There are two definitions of isEmpty. In the first, in the position of the parameter, we have the data constructor NIL from the definition of Lst, not a variable like we saw in the first function post, mathy. This causes Haskell to pattern match against the incoming value. If the incoming parameter has the value NIL, the expression evaluates to Yep.

In the second definition, there's an underscore in the parameter position. This indicates that Haskell should ignore the value in that position, any value will match the underscore. This means that if the first pattern doesn't match a NIL, the second pattern will match any value. For the isEmpty function, this is what we need. If we match NIL, then Yep the list is empty. If we match anything else, then Nope the list is not empty. That's the complete definition of isEmpty!

The Type (almost) Defines the Function

Earlier it was stated that often, especially for simple functions, there's only one way to write a function given its type. isEmpty is such a case. What else can we possible do with a function of type Lst a -> Booly ? There are only two possible outputs - Yep and Nope. Since the signature specifies a Lst of a values, but it doesn't say what a should be, it could be anything. This is a polymorphic function with no constraints on the type variable a. Since we don't know what a is, we can't know any function that works on a or any of the data constructors that might be part of the definition of a. So in effect, the a tells us nothing. All we know about the parameter comes from Lst, and it says the value might be NIL, or it might be Cons a (Lst a).

There are only four ways to define this function:

Four possible ways to define iEmpty

-- | the right way
isEmpty :: Lst a -> Booly
isEmpty NIL = Yep
isEmpty _   = Nope

-- | not the right way
isEmpty2 :: Lst a -> Booly
isEmpty NIL = Nope
isEmpty _   = Yep

-- | cmon now
isEmpty3 :: Lst a -> Booly
isEmpty3 _  = Yep 

-- | ok we've stopped trying
isEmpty4 :: Lst a -> Booly
isEmpty4 _  = Nope

The first has the semantics we're looking for - it's consistent with the human language we're using. The second is syntactically correct, but has the wrong semantics; it has what we sometimes call a logic error. The last two are constant functions. They ignore their argument and always return the same value, clearly not what we're looking for. So there's only really one way to write this function!

🤔 ... if you're thinking there's something to the fact that there are only 4 ways to write this function, congratulations, you're paying attention. This should remind you of the discussion of algebraic types from the post on lists, where we said that for product types, the number of possible values for a type is the product of the number of values in each of the parameterized types. Recall that the type Quiz Answer Answer has 16 different possible values, since Answer had four possible values.

Our Booly type has two possible values. But our Lst type is recursive, and you can see how it might have an infinite number of values. So 2 * Infinity == Infinity, not four. But, since we conceptually only care about the two different data constructors for Lst, NIL and Cons..., conceptually there are only two values to Lst, giving 2 * 2 == 4, but where's the product algebraic type?

Recall again, this time from the post on types, that the arrow -> we've been using for function definitions is a polymorphic type constructor over two types. This is easier to see when we alias the arrow and use it in a prefix way (rather than infix). That gives us this:

alias the arrow type

-- @type@ keyword creates an alias for a data type
type Func = (->)

-- | Return @Yep@ if a @Lst@ is empty, and @Nope@ otherwise, now
--   with a type signature using the @Func@ alias for @(->)@
isEmpty :: Func (Lst a) Booly 
isEmpty NIL = Yep
isEmpty _   = Nope

Ah - now we see that the arrow is a type, and the two type parameters that make it a product type are the input type and the return type.If we borrow the notation |T| from set theory to mean the cardinality (number of values) that inhabit a type, then we see that:

1	\|T a b\| == \|a\| * \|b\|

Where T is a polymorphic type over a and b which are type variables, and in the case of isEmpty we have

1	\|Func (Lst a) Booly\| == \|Lst a\| * \|Booly\| == 2 * 2 == 4

If we allow that we're only considering two semantically relevant values populating Lst.

So what?

This is not going to help you configure your Kubernetes cluster or get all the text boxes in your div to line up correctly. Understanding things at this level is the starting point, the tip of the iceberg, to understanding the incredible relationship between computer science, logic, and abstract mathematics. Haskell is a great tool for this exploration, an exploration on which I myself have taken only a few steps. My writing these posts helps me continue to look closely at these topics, continue to have new insights, and solidify concepts.

Have you ever heard someone say that functional programs are "easier to reason about"? Most of the people who I've heard say this love functional programming, and know that there's something to it that is intuitive, that stimulates their need for things to be ordered and logical. But then there are those who have really studied and gained an understanding of abstract computer science, type theory, programming language theory, set theory, category theory, etc. or some combination of those. They think of reasoning differently.

reasoning (noun): the process of thinking about things in a logical way; opinions and ideas that are based on logical thinking

Oxford English Dictionary

...based on logical thinking. That's the key. That's what's out there to be learned.

Happy Hacking!

2023-02-07

haskell

Haskell Basics - Lists 01

Lists are one of the fundamental data structures in Haskell and many other programming languages. LISP for example is a portmanteau of "LISt" and "Processing". The implementation of lists in Haskell is also a great way to look at a few different key features of the language that we'll discuss.

No Implicit Prelude

Let's do some "first principals" type experimentation. We'll create a module and tell Haskell not to import Prelude, which is the standard library. This will allow us to create our own list types and functions without conflicting with the ones from the standard library.

If you want to follow along, I suggest following these installation instructions. Then you can create a file with the *.hs extension.

1
2
3

{-# LANGAUGE NoImplicitPrelude #-}

module List230207 where

The first line is a compiler directive that excludes Prelude from being automatically imported. The other line defines a module. You can name your module any legal module name; alpha-numeric starting with a capital letter.

A List Type

We can define a list in Haskell like this (normally we'd use Prelude's list, this is illustrative only). In the following snippet, the data keyword is defining a new algebraic data type, Lst a, which has two data constructors, NIL and Cons.

The Lst Type

{-# LANGUAGE NoImplicitPrelude #-}

module List230207 where

import Prelude (Show)

data Lst a           -- names the new algebraic data type
  = NIL              -- First data constructor, represents empty list
  | Cons a (Lst a)   -- Second data constrctor, a value appened to an existing list
  deriving (Show)    -- Allows conversion of Lst values to strings for display

After excluding Prelude with the language extension, we add back in something called Show, which will make it easier to display values we're working with in the REPL; don't worry about the details for now.

Algebraic Data Types

Let's break down the definition of the Lst type by first examining the data keyword, which indicates the beginning of a definition of a new algebraic data type. There are two kinds of algebraic data types - sum types and product types. What we've described above is a sum type. It's called sum because the total number of possible values is the sum of the number of values that each data constructor can produce. Let's look at a simple example:

Example Algebraic data type

data Answer 
  = Yes 
  | No 
  | LeaningTowards Bool
  deriving (Show)

In this strange example, we have a data type Answer that has three data constructors. Two of them, Yes and No, take no parameters. The last one, LeaningTowards has a Bool parameter. This means the values, or terms, that inhabit the Answer data type are: Yes, No, LeaningTowards True, and LeaningTowards False

The total number of values inhabiting Answer is four, which is the sum of the number of terms that each data constructor can produce. Yes and No can only produce one value each since they have no parameters. Intuitively they behave like values and you'll soon see how in code they're used in places where values are used, but LeaningTowards has a parameter of type Bool, which defines two data constructors (True and False), bringing to total possible values of type Answer to four. Algebra! Sum!

The other type of algebraic data type is the product type. An example is:

1	data Quiz = Quiz Answer Answer

The data keyword is the same, indicating we're introducing a new (algebraic) type. The data type is Quiz, and it has one data constructor, also Quiz. Don't be confused by the fact that the type and the one data constructor are both defined as Quiz, they're used in different contexts, so once the idea of types vs type constructors clicks for you, this doesn't present a problem. These are 16 possible values for the type Quiz:

`Quiz Yes Yes`	`Quiz Yes No`	`Quiz Yes (LeaningTowards True)`	`Quiz Yes (LeaningTowards False)`
`Quiz No Yes`	`Quiz No No`	`Quiz No (LeaningTowards True)`	`Quiz No (LeaningTowards False)`
`Quiz (LeaningTowards True) Yes`	`Quiz (LeaningTowards True) No`	`Quiz (LeaningTowards True) (LeaningTowards True)`	`Quiz (LeaningTowards True) (LeaningTowards False)`
`Quiz (LeaningTowards False) Yes`	`Quiz (LeaningTowards False) No`	`Quiz (LeaningTowards False) (LeaningTowards True)`	`Quiz (LeaningTowards False) (LeaningTowards False)`

The one and only data constructor for Quiz has two parameters of type Answer, and we've seen that Answer has four possible values, therefore the product type Quiz has 4 * 4 or sixteen possible values. Algebra! Product!

Kinds and Polymorphic (aka Generic) Types

The Quiz and Answer are monomorphic types because they contain no type parameters. We say the kind of Quiz and Answer is Type. This is best illustrated with an example of a simple polymorphic data type:

the unbiquitous Maybe type

1
2
3

data Maybe a
  = Nothing
  | Just a

The Maybe type is omnipresent in functional programming languages, and some form of it is increasingly found in imperative languages as well. It's typically used to represent a value that may or may not exist. The type itself is parameterized - the a in the example is a type parameter. If the kind of Quiz is Type, what is the kind of Maybe?

it's Type -> Type.

Knowing what we know about Haskell functions, this seems like a function that takes a type and returns a type. That's pretty much what it is. We cannot specify terms to have a type Maybe, because Maybe isn't a Type! it's a Type -> Type, or a higher kinded type. To get a Type from Maybe, you must pass it a Type. For example:

student quiz results

s1 :: Maybe Quiz
s1 = Just (Quiz Yes No)

s2 :: Maybe Quiz
s2 = Nothing       -- didn't take the quiz

s3 :: Maybe Quiz
s3 = Just (Quiz No (LeaningTowards Yes))

The kind of Maybe Quiz is Type which means it can be used to specify the type of the terms s1, s2, and s3 above.

We see Maybe data constructors Nothing and Just being used here. Either a student took the quiz, represented by Just ... or they didn't, represented by Nothing.

Back to Our List

data Lst a
  = NIL
  | Cons a (Lst a)
deriving (Show)

Now that we understand algebraic data types and higher kinded types, we can see that Lst is kind Type -> Type, and NIL and Cons Lst are data constructors. So far so good. Lst adds one more concept - a recursively defined, or inductive construct. The first data constructor is NIL, which represents an empty list. The other data constructor, Cons a (Lst a) is recursive; it has two parameters, the first being a which represents the type of values the list can contain, and it the second parameter is a value of type Lst a, another list. Lets look at an example of using our Lst type.

Lst Example

module List230207 where

import Prelude (Show, Int)

data Lst a
  = NIL
  | Cons a (Lst a)
  deriving (Show)

nums :: Lst Int
nums = Cons 1 (Cons 2 (Cons 3 NIL))

In this example nums is a Lst Int , or a list of integers. I've included Int in the list of identifiers being imported from Prelude so we could specify the type parameter for Lst to be specifically Int. Noticed how we declare constants in a similar way that we declare functions, the type specification comes before the assignment (see this post on functions for more info). This is effectively how Haskell's standard library implements the list type, only since lists are so common and heavily used, there's some syntactic sugar in Haskell syntax that makes working with lists much more bearable and intuitive. We'll see these later, but our definition is definitely a valid way to represent lists.

So now that we've got a list defined ... we need to do something with it. Next time we'll work with our list type in creating some functions you typically use with lists - things like getting the length, appending one list to another, and mapping values in a list using a function to create a new list.

Happy Hacking!

2023-02-07

haskell

Install Haskell with GHCup

A quick and easy way to get Haskell set up and keep it up to date is to use GHCup. Follow the link for directions. GHCup will install a few different components. Haskell has a history of what is, by modern standards, poor tooling. It's not a new language, Haskell was developed in the early 90s, and for the first couple of decades of its existence was mostly an academic programming language, and it shows.

Things are far better now, but not quite as slick as Golang or Rust's ecosystems which have the benefit of being recently developed with contemporary thinking about tooling and package management. Here's some initial guidance.

GHC - Glasgow Haskell Compiler

The open source compiler for Haskell. Technically this is just one of the Haskell compilers available, as the intent of Haskell was to be a specification and not an implementation, thereby not locking anyone into a single compiler.

The reality is that most people are using GHC, and that most experimentation and expansion of the Haskell specification is happening in GHC. It's considered the most advanced Haskell compiler and to make your life simpler, just go with GHC and forget that there are other choices, until you find a compelling reason to do so.

Cabal

The standard package system for Haskell software, and for the longest time was the way to build Haskell programs, find and download packages, and build and publish packages. Many people still use Cabal, and it's not too bad. I personally don't I use the next tool in the list.

Stack

aka The Haskell Tool Stack, it's a program for working with Haskell projects, and is a bit more modern in its approach. This is the approach I use, and while still not perfect, it's at least somewhat reasonable, and once you establish a practice it's trivial to use Stack for package and product management. Stack actually leverage Cabal internally to do the actual work of package management, so the're reasonably compatible and interchangeable.

HLS

The Haskell Language Server, an implementation of the Language Server Protocol for Haskell. The Language Server Protocol, or LSP, defines a protocol use between an editor or IDE and a language server that provides language features like auto complete, go to definition, find all references, etc.. For a long time (and this is where the poor tooling rep was spot on), you would have a hard time working with Haskell outside of Emacs. With the HLS, we get a decent language aware IDE from VSCode, and many other editors that are now LSP aware.

Using the REPL

Once you've got the Haskell tooling installed and before you get involved with setting up a full project with either Stack or Cabal, to get your feet wet you can use the REPL combined with a single Haskell file like this:

1
2
3

$ # create your source file
$ touch scratch.hs  # or whatever you want to call it
$ stack repl scratch.hs

This starts the REPL. You'll see a prompt Prelude>, indicating you've got the REPL with Prelude (the standard library), and your file is loaded (even if it's empty). From here, you can use a few of these handy REPL commands:

Command	Purpose	Example
`:?`	Get help on all the commands
`:quit`	Quit the repl
`:type`	Output the type of the term or expression	`Prelude> :t [1,2,3]`
`:kind`	Get the kind of a `Type`	`Prelude> :k Maybe`
`:edit`	Opens the currently loaded file (scratch.hs in our example) in the default editor. Saving and quitting the editor will reload the file in the REPL
`:reload`	Reloads the currently loaded file. You can have an editor open on the file in the REPL, and iterate on the file, saving it each time. Then use the `:r` command in the REPL to load the latest save

Editing Your File

The REPL is handy of course for trying things out. But since it's only one line at a time (there's a kind of multi-line ability but it's not great), it's handy to have a file in your REPL session. You can either keep the file open in your favorite editor and iterate over it making changes, and periodically reload into the REPL with the :r command, or you can use the :e command to fire up your default editor (VIM in my case) make edits, and when you save/quit, the REPL will automatically reload your source file.

That's the basics... Happy Hacking!

2023-02-06

haskell

Haskell Basics - Functions 01

The syntax for function definition in Haskell is different from most other programming languages, especially mainstream programming languages. Let's take a look at a simple function.

1
2
3

-- | Perform math operation on two integers
mathy :: Int -> Int -> Int
mathy m n = (m * 10) + n

If you're new to Haskell this will look very strange. The first thing to notice is that the first line (after the comment) is the type declaration for the function. A variable mathy is going to be bound to the definition of this function. In Haskell once a variable is bound it cannot be changed. In Haskell values are immutable and once a variable is bound to a variable, including functions, they cannot be altered.

The double colon :: is how we define type signatures, so the first line says that mathy has the type Int -> Int -> Int, which we can think of as the type signature of a function. in this case there are 3 parts separated by right arrows ->. Using conventional terminology, you might say that each part is a parameter of the function except the last, which is the return value of the function. Therefore we can say that mathy takes two Int parameters and returns an Int value.

The next line is the definition. The variable bound to the function is separated by its parameters by whitespace. m and n are the parameters of the function. The "body" of the function is the expression after the equal sign, (m * 10) + n.

Calling Functions

Once the function is defined it can be called with parameters to return a value.

> mathy 21 5
215

> mathy 8 8
88

Arrow - The Type of Functions

Aside from the different syntax we've just seen, the first real big significant difference between Haskell and most other languages is that the arguments are simply listed after the function name; no parentheses or commas.

Partial Application

More significantly, we can call a function without passing all the arguments to the function. Let's take a look at the function, it's type, and the types of values that come from calling the function in different ways. Note the command :type at the REPL returns the type (it can be abbreviated :t).

> -- the type of the function bound to mathy
> :t mathy
mathy :: Int -> Int -> Int

> -- callng with one parameter
> :t mathy 8
mathy 8 :: Int -> Int

> -- calling the all parameters
> :t mathy 8 4
mathy 8 4 :: Int

You should be seeing the pattern here. Arrow -> is actually a polymorphic type constructor, in some other languages called a generic type. Specifically it's a generic type in two type variables, the input (on the left of the arrow) and the output (on the right of the arrow). In C#, the type of anonymous functions is Func<T,TReturn>, a polymorphic type in two type variables, the first T is the type of input, and the second TReturn is the type of output. Java has a similar generic type for anonymous functions.

Infix

Haskell has the notion of infix operators, and that's what -> is, an infix polymorphic type constructor. Functions and type constructors are typically post fix; the arguments go after the function name. Infix works like the plus sign, which represents the addition function. The plus sits between the two numbers being added. Haskell lets us define functions, and type constructors, as infix.

We could define our mathy function that's infix:

1
2
3

-- | An infix version of mathy
(++++) :: Int -> Int -> Int
(++++) m n = (m * 10) + n

Then it could be called infix style, like you would addition:

1 2	> 8 ++++ 5 85

You can also call non symbolic functions in an infix style by surrounding it with back tick characters.

1
2
3

-- infix calling of mathy
> 8 `mathy` 5
85

We can see that -> is a type constructor by assigning an alias that's not symbolic. We'll use the same name as the C# type for anonymous functions.

-- | create a type that's an alias for arrow
type Func = (->)

-- | using Func to define a function. It has the same type
--   as mathy, so we can assign it directly
mathy2 :: Func Int (Func Int Int)
mathy2 = mathy

It's now even more clear that mathy2, like mathy is actually a function that takes an integer and produces a function that takes an integer and produces an integer. We can do this:

> mathyOfEight = mathy 8
mathyOfEight :: Int -> Int

> mathyOfEight 5
85

> mathy 8 5 == mathyOfEight 5
True

We've seen that in Haskell a function is defined using a type signature, and a function body. We've seen the Haskell arrow -> type, which is the type of functions, and we've seen how we can partially apply functions to create new functions with fewer arguments. A future post will get into a bit more of the syntax of functions in Haskell.

2023-02-05

misc

A Fresh Start

I'm restarting this blog after several years of inattention. I've rebooted a few times over the years, but failed in the past to keep it up. Now I feel the need to capture my experiences, interests, and path of further education, for my own sake. I look to this as a purpose where before I can't say I had one. So with this in mind, off I go.

Parting with the Past

I have a few posts from the past, most of which are more than 12 years old. In previous reboots I incorporated those post in new versions of this blog. Some of them are still interesting, possibly even relevant. But I've changed significantly in the last dozen years. Connecting my thoughts of 12+ years ago with my current thinking seems feels forced. I'm no longer intrigued by the possibility or writing multi-threaded algorithms for the browser in Silverlight!

So with that I'm putting all those articles into an archive. This with be the new oldest post. The new, Fresh Start.

Areas of Interest

There are many. Too many. It's a wonderful problem to have. I sit in my den on a Sunday morning, put a playlist from the Total Baroque channel on the entertainment system, and have to choose what to spend the next few hours studying, practicing, writing. Today I'm getting this blog back on its feet; consider that area of interest #1.

I'm using Hexo to generate the blog because I wanted a simple static site generator that allows the authoring of posts using straight markdown. My posts won't be locked into a too-proprietary system. I use Obsidian heavily for notes, and I can set up the source for my blog to also be an Obsidian "vault", allowing me to work on blog posts with the same tools that I use all the time for other purposes. In the past I've used blog systems that were also tied to my current interest... but as interests change, so did my desire to keep up with whatever esoteric blogging approach I was using. Hexo is a node based system, simple to set up and use and, importantly, simple to come back to and be able to continue using without having to remember a series of bespoke incantations. Let's see how long this lasts.

Functional Programming

What a broad topic. I've got to take this back to about 2006. Around this time (I'm not going to look up specific dates) I was working as an independent contractor / consultant sub-contracting under Microsoft Consulting, building a system for the Pinellas county tax collector in Clearwater Florida. It was Microsoft stack, C# specifically, a distributed system, "thick" client (what we called native apps at the time) using DCOM, a now obsolete distributed networking stack.

C# gets Linq

Around this time Microsoft added Linq to C# and the .NET ecosystem. Linq is marketing speak for a combination of runtime, language, and library extensions that "adds native data querying capabilities to .NET languages". This amounted to adding anonymous functions to C#, extension methods, and an abstraction called IEnumerable that make it reasonably intuitive to write and compose functions we'd now recognize as building on maps and folds.

This was a bit shocking and revolutionary at a time when "functional" concepts had not yet started to become ubiquitous in the most popular OO ecosystems, Java and .NET. I remember specifically working with Jerry Maresca, one of the senior engineers from Microsoft (with whom I greatly enjoyed working), hungrily consuming these new capabilities and libraries, and experimenting with the new approaches to both low level algorithm implementations, but also what this meant for the structure and organization of entire programs. Object Oriented programming had been the norm for "line of business" programming since the late 80s / early 90s.

Linq and the associated technologies and libraries squeezed functional concepts (which interestingly enough is not a term that has a single, well accepted defintion), into what remains to this day a very object oriented ecosystem. As we began to see the elegance and utility of thinking of problems in terms of functions, functional composition, and data structures uncoupled from the functions by which they are manipulated (ie methods). This is directly at odds with object oriented thinking, and we did some strange experiments combining these paradigms using these early tools.

But the real benefit for me was to open my eyes to the possibility of significantly different ways to think about computing at all levels. In short, I thought I had fairly complete knowledge of the landscape of production software engineering and computer programming, and to realize how wrong I was about that was captivating and exciting!

A Pivotal Discussion over Lunch

A couple of years later, working again with Microsoft Consulting with Stephen Cohen (Chief Architect at Microsoft), this time for the Department of State in DC, I was at lunch with Stephen and Randy Miller. The topic of Lisp came up, at the time i was aware of lisp, but had failed to have any interest in a language that was outside of the ecosystem in which I was immersed. Randy made a compelling argument supporting his admiration and appreciation for lisps which piqued my curiosity.

Over the next weeks during frequent visits to the local Borders where several of us on that gig would hang out in evenings (we were "on the road" in DC from all over the country for this gig), I began seeking out and reading about Lisps and other languages. I wandered the field and eventually came across OCaml, and after playing with it a bit and gaining interest, fell upon a research project being worked on by Don Syme of Microsoft Research, a discovery that would radically alter the course of my journey.

F#

Don Syme was developing F#, a descendant of OCaml for the .NET framework at around the time I was having that lunch with Stephen and Randy. F# was still a couple of years away from production ready release, but you could get your hands on the compiler from the Microsoft Research site, and with some fiddling you could actually use it in an ecosystem (.NET) which was rich enough to make the endeavor worth while.

This was a major tipping point for me. I was now obsessed with this new way of thinking about writing software. I'd eventually come to understand that functional programming wasn't new at all, and importantly, function programming opens the door to a universe of cross disciplinary thought. I spent several years of using F# on my own time, and also building an appreciation for Clojure (a JVM lisp), but never having the opportunity to use either professionally. That would change when I was cold contacted by Aimee Gerzofsky a recruiter at Jet.com.

Jet.com was an e-commerce startup, pre-launch in 2015. The interesting part - they were an F# shop! This was 90% of the reason I would eventually join Jet and move to the NY area. I was working full time in a functional programming language. This was then and remains now far and away the best job I've ever had. I was being paid to spend thousands of hours over the next few years writing code that I would have been writing on my own time. F# remains a favorite of mine, though my current interests have moved on to find and probe the more exotic regions of the computer science landscape.

Haskell

A few years later I found myself with a deep interest in Haskell. I had taken a few shots at learning Haskell (which happens to many I've since learned), giving up and returning to the familiar, comfortable lands of F#, Clojure, and OCaml, before it finally started to click for me. This corresponded to a push toward management in my career.

This "push" was not driven by my interests, but rather that of the companies for whom I worked. This is common for software engineers as they hit the 10, 15, 20 year marks in their career. Companies will place a hand firmly in the small of your back and shove you into management, where your skills will immediately start to fade as thoughts of budgets, timelines, recruiting, career planning, and most disruptively, the authoring and delivery of endless power point presentations take the time you'd rather spend writing beautiful code. Sigh.

Most of my spare time was spent diligently building my intuition for the pure functional world of Haskell. This is an ongoing endeavor to this day. Haskell is my favorite general purpose programming language, and once again I find myself fixated on technologies that I will very likely never have the opportunity to work on professionally; this serves only to motivate me further.

Gödel, Escher, Bach

In the last several years, starting in approximately 2018, I crossed paths with several topics that were not obviously connected with my interest in functional programming. I received a copy of Gödel, Escher, Back (Hofstadter, 1979) (G.E.B.) from my family as a birthday or Christmas gift (I forget which), it had been on my wish list for some time. I knew nothing about it other than it appeared on the must read lists of several people I admired or followed.

Around the same time, I had come across Phil Wadler's Propositions as Types talk from Strange Loop 2015, which broke my brain. Wadler was talking about something that was obviously profound and fascinating, and I didn't understand 25% of it. But the fuse was once again lit for me. There was a connection between logic (which I knew nothing about), and computer science, and it was fascinating... more to come.

Also around the same time, my daughter was at Kenyon College studying philosophy and the classics. She had taken a course on Propositional Logic, and in chatting about it I came to realize that the topic had much overlap with the boolean logic that's a deep and integral aspect of computer programming. Add propositional logic to the list of topics in which I suddenly found myself deeply interested.

All Things Connected

Between the beginning of my studying G.E.B., discovering the idea of Proposition as Types, opening a line of studying of propositional logic, and all of the studying of functional programming I've done over several years, I came to understand the underpinnings of all these topics are related. Abstract Mathematics. Another shocking discovery for me - mathematics, a topic for which I would regularly declare an unabashed disdain for my entire life until this point, had stealthily worked it's way into my academic life to suddenly spring forth and become the center of my universe. I had to, and still have to suppress the feeling that had I come to this realization 30 years sooner, I would be in a very different place indeed. Such, as they say, is life.

The Academics

Which brings me, finally, to the remainder of the list of my interests. They include, but are not limited to:

Type Theory
- Dependent type theory
- Linear types
Category Theory
Set Theory
Formal Verification
- Calculus of Constructors
- Logic
- Coq, Agda, Idris, and most recently Lean4
- TLA+
Compiler Design
- The various flavors of Lambda Calculus & Combinators
Programming Language Theory

This challenge I now face regularly; spending enough time with each of these topics to retain what I've learned, while pushing forward the boundaries of my own knowledge one area at a time. Again, this is a really nice problem to have.

My journey continues.