Friday, August 01, 2008

BlogPostAnswer.hs

{-

Intro


-}
module BlogPostAnswer where

import System.Environment

{-

This is haskell source code, which relates
to a nice blog post of someone comparing
java with python.

Original post



I picked some of the python-java comparisons
from that post, which I would like to
extend in order to contrast java and python
with haskell.

You can actually use the complete post as a
haskell source code file. All prose is
embedded in "{-" and "-}", wich are the block
comment tokens in haskell syntax.

I am a beginner haskell-hacker, and I dont have
a clue about python, so enlighten me, if you spot
a mistake.

Like python, haskell allows you to use indenting
as part of the syntax, unlike python however, you
also have the choice of using "{", "}" and ";"
instead of, or combined with, indentation.

Also, like in python, and unlike java you don't need
to put type declarations into you source code.

The step from java to haskell is
not the step from a statically typed language
to a dynamically typed language. Neither is java
really statically typed, nor is haskell
dynamic.
The opposite is true, haskell has a far stricter static
typesystem than java has, but it uses a variation
of the Hindley-Milner typesystem wich includes
type inference - This effectivly frees one from
writing type specs in most cases.

Haskell has a rich typesystem with all kinds of
good stuff. For someone who knows
C++ template programming, learning haskell might be
easier than learning java 5 generics.

I will try to give code examples in haskell, that
roughly correspond to the java and python code examples
from the blog post.

Plase aquire ghc and load this blogpost as haskell file.
Every piece of code shown here, is complete and can be
executed in the interactice haskell shell called ghci.

Assuming ghci is started in the director where textfile
with this source is located, the following commands
should be typed at the ghci prompt in order to compile and
load the code:

> :l (name of file)

(this loads the code)

Now you can call every function in this source code from the prompt
by writing the name and the arguments of that function seperated by
spaces.
i.e.

> test_matches_2

to run the "main" function, you must compile the file outside the interactive shell and
start the programm from the commandline like this:

bash $ ghc --make (nameofsourcefile)
bash $ ./(nameofsourcefile) "hallo welt"

Again, let me remind the reader, that I am not
a haskell expert, and that there might be far
more elegant ways to do it.

ok, let's get started:

Observation 1: Haskell is a unique mixture of readability and brevity



Java:

public class Test
{
public static void main(String[] args)
{
for (String s : args[0].split("\\s+"))
System.out.println(s);

System.exit(0);
}
}

Haskell:

-}

main = getArgs >>= mapM putStrLn . words . head

{-

This examples pipes a list of command line parameters into the
functional combination of three functions that do the following in the
order they are listed:

  1. takes the first element and ignores the rest: "head"

  2. seperates this element into a list of words of wich the first element consists

  3. write each word seperated by a newline to the stdout


As you may have noticed:

  1. ">>=" pipes data

  2. "." combines two functions to a new function

  3. "mapM f" call the function "mapM" with the parameter "f"


Haskell function application syntax is very similar to the
typical unix shell function invocation syntax.
The name of the function and the arguments are put in a line
next to each other seperated by spaces instead of brackets.


Observation 2: No need for set/get methods in Haskell



One purpose of the 2nd code example from
the above mentioned blogpost is to show off
a nice python feature needed in a language
with mutable state and oo features like instance
variables.

Haskell has neither mutable variables nor objects with variables.
Haskell is a pure funcitonal programming language,
meaning that like erlang, haskell has no notion of
variables that may be changed after being bound.
The typical OO-Class system, having classes with
i.e. setters for properties is not possible in
haskell without rolling your own little object
system (wich isn't hard at all, but also not
necessary).

Haskell provides other means for grouping data.
One way is to use algebraic data types:

Observation 3: Haskell has useful constructs Java lacks


algebraic datatypes


Algeraic data types are those constructs that start with the reserved keyword "data".
They create symbols associated with a place to create new structural type info associated
with that symbol; with the hint that "|" means "or" and that all words starting with capitals
are symbols wich we associate witch structure and meaning, and that everything starting with a
lowercase character is a (type-) parameter, have a look at the definiton of the datatype definitons in the follwoing;


-}

data Point2D = Point2D Double Double deriving (Show)

{-

this defines a data type wich
contains to floats: a 2-D point...
before we go on, let's define another data type,
but this time one that is
parameterised with another type.
Comparable with C++ templates:

-}

data Triangle p = Triangle p p p deriving (Show)

{-

now we create a simple alias
for 2D triangles (typedef in C++):

-}

type Triangle2D = Triangle Point2D

{-

Confused now? good, we can move on...

Our Point2Ds can be created and read.
Their contents (the two floats) are
readonly. This of course doesn't bother
us at all, it turns out in the long run
to be a really nice feature
concerning readability of code.

Assume the following function that generates
a "circle" made of n points with a radius of r:

-}

generate_circle r n = [Point2D (calc sin i) (calc cos i) | i <- [0 .. (n - 1)]]
where calc op i = r * op ((i/n) * 2 * pi)

{-

which looks like this in Java:

class Point2d {
private final double x;
private final double y;

public Point2d(double x, double y) {
this.x = x;
this.y = y;
}
/* ... some getters... */
}

public final class GenerateCircle {

final Point2d[] points;
final double r;
final int n;

public GenerateCircle(double r, int n) {
points=new Point2d[n];
this.r = r;
this.n = n;
for ( int i = 0; i < n; ++i) {
points[i] = new Point2d(calc_sin(i), calc_cos(i));
}
}
private float calc_sin(int i) {
return r * Math.sin((double)i/n * 2.0 * Math.PI);
}
private float calc_cos(int i) {
return r * Math.cos((double)i/n * 2.0 * Math.PI);
}
/* ... some getters ... */
}


pattern matching and guards



Pattern matching is syntactic sugar that frees one
from writing complicated if and switch constructs.
Combined with algebraic data types, a feature that java lacks for sure -
it is, as we have seen above, a way to simulate generic methods.

Another useful construct is the function guard;
let's look at a practical example:
the programmer of the display function
may use a simple constraints(a guard) to protect
the precious display function:
(the guard is the boolean expression
between the "|" and the "=")
Now if in that example x or y is less than 0 a runtime exception
is thrown, with detailed information where and why the error was caused.(Try it already!)

-}

display_list primitives = [display p | p <- primitives]
where display (Point2D x y)
| x >= 0 && y >= 0 = show (Point2D x y)

{-

All we do now is add a super type for primitives
and pimp our "display_list function a little to make
use of pattern matching in conjunction with
algebraic data types(see below):

-}

data Primitive = Point Point2D | Tri Triangle2D

display_primitives :: [Primitive] -> [String]
display_primitives primitives = [display p | p <- primitives]
where
display (Point (Point2D x y))
| x >= 0 && y >= 0 = show (Point2D x y)
display (Tri (Triangle p1 p2 p3))
= show (Triangle p1 p2 p3)

{-

higher order functions, lambdas, currying and sections



When it comes to functions, lambdas and closures haskell really shines.
In haskell everything is done with functions. Let's have
a look at a small example.
Note that this example also makes heavy use of algebraic datatypes and pattern
matching.

-}

data MyTree t = Leaf t | Node [MyTree t]

{-


In Java this would look like this:

public interface MyTree<T> {
public class Leaf<S> implements MyTree<S> {
public S data;
}
public class Node<S> implements MyTree<S> {
public MyTree<S> childs;
}
}


This results in a tree with data in the leafs. Let's define a matches function,
that returns a list of "ts" that match a user defined predicate(or matcher function):

-}

matches matcher (Leaf t) = if matcher t then [t] else []
matches matcher (Node childs) = [e | child <- childs, e <- matches matcher child]

{-

well that's it! The first clause of the function "matches" is applied only to Leafs
and the second clause uses list comprehension to call "matches" recursivly
for every child node.

In java is considerably harder to write (and to read).

Now let's leave this planet, and let's head off into a galaxy far far away...
Lets write a matcher that checks that the first letter is a capital "A", but let's
do this in two steps, lets first create a "startsWith" function that you should be
familiar with from other languages, and after the a function using the first one,
that checks for the capital "A".

-}
startsWith str = (( == str) . (take (length str)))

{-

this can actually be translated to english:
"combine two functions: (this is the "." operator)


  1. A function that compares something for equality with str, this is "(== str)".
    I used a fancy haskell feature here... (==str) is actually function call
    to the function "==" which needs to arguments and returns True or False.
    We gave the function only one argument "str". Of course the "==" function
    cannot give us a Boolean value for that! But instead it gives us a new function,
    a closure, that takes only one argument and returns a Bool. Isn't that cool?!
    I mean once you get used to that, it's really readable and you will never want
    to go back!


  2. A function that returns a sublist of a list containing only the the first
    "(length str)" elements: "(take (length str))"
    BTW, a string in haskell is a list of characters, and "length" returns the number
    of elements in a list.


Observe that startsWith takes only one argument not two! You may wonder how startsWith
can work, because it needs the string to search for and a string to search in.

Recall the weired (== str) construct. What happend here happens again: we created a new
anonymous function by calling a function named "==" with not enough arguments, btw.
this is called currying(- invented by the logician "Haskell Brooks Curry")
We also called "take" wich actually needs two arguments with only one argument
(the string length). The result is therefore a function the takes a string and returns
the desired sub-string with the same length that "str" has.
Now the combine operator (".") combines the two functions that each take one argument
and creates a new one such that the new function takes the input string for (take ...)
and returns the result of "(==str)" called with the result of "(take ...)";

Let's forget about (==str) and (take...) for a while and let's look at the definition of
".":

(f . g) x = f(g(x))

this might look familiar...
Now it should become clearer what the result of calling the startsWith function with
ONE parameter is: a new function that takes a string and returns a Bool indicating
wether or not the string given to that anonymous function starts with the string!

Once a programmer has learned very few core concepts about the function handling,
haskell source code is very readable, although it is often less terse than java code.

Also note that our startsWith function is perfectly generic, it can be applied to
ANY type of lists not just strings wich happen to be a synonym for lists of "Char"!

Now I do not want to write down equivilant code for that in java, I think it is
possible. Also note that I do not want to judge the qualities of languages,
I just want to study them.

Ok, back to topic, let's create a test function that roughly corresponds to the one in
the original blogpost.

Remeber that we wanted to define a second function in terms of the first function,
that checks if a string starts with a capital "A"?
Here we use currying again, and simlpy and clearly write:

-}
startsWithA = startsWith "A"

keywordTree = Node [ Node [ Leaf "Hallo"], Leaf "lieber", Leaf "Allerwertester", Node [Node [], Node [Leaf "Wie"], Leaf "gehts"], Leaf "dir"]

test_matches_1 = matches startsWithA keywordTree

{-

oh, let's not forget to introduce a lambda!
the greek symbol for a small lambda actually looks alot like the "\" (backslash) charakter,
so in haskell a lambda is defined like this:

\x -> x + 2

in python:

lambda x: x + 2

another way in haskell:

(2 +)

A lambda expression with two parameters can be written in haskell like this:

\x y -> x + y

or simply:

(+)

another way to write (==str) using a lambda:

\x -> x == str


Ok, back to the tree example,
let's use a lambda expression as a parameter for "matches"
to filter all words shorter than 5 characters.
The then the complete expression looks like this:
  
-}

test_matches_2 = matches (\x -> length x <= 5) keywordTree

{-

of course we can assign the lamda expression to a name, and use that instead.

-}

shorter_than_5_chars = \x -> length x <=5
test_matches_3 = matches shorter_than_5_chars keywordTree

{-

This can of course be written with haskells usual function definition syntax:

-}

shorter_than_4_chars x = length x <= 4
test_matches_4 = matches shorter_than_4_chars keywordTree

{-

And even more compact using functional compisition and sections:

-}

shorter_than_6_chars = (<=6) . length
test_matches_5 = matches shorter_than_6_chars keywordTree

{-

shorter_that_6_chars is written in point-free style, that means
that in contrast to the previous definitons we didn't explicitly name
the argument wupped around in the function, instead we just combined
the the functions.
This is very smilar to the pipe syntax in the bash:

$ cat /proc/cpuinfo | grep bogo | awk '{print $3}' | sort -n

Here too no variable is mentioned, that actually contains the current line
that is passed around.
A functions stdin is its parameter and a functions stdout is its return value.
That's also why it is so important in haskell to have currying automatically
everywhere, because it allows me to use a function with more than one
parameters as a function with one parameter wich returns a new function,
wich takes one parameter and returns a new functions wich takes one parameter ... etc
until final we get the actual result of the computation.

Factories?


Writing factories in haskell might by now seem not very interesting; still
let's write some RPC class definitons in haskell:

-}

data RPC = XMLRPC String | JSONRPC String

data RPCResult = Void | Str String | Strs [String] | Numbers [Int] -- define some results here...
deriving (Show)

callRemote :: RPC -> String -> [String] -> RPCResult
callRemote (XMLRPC conn) function args =
Str ("nyi, xmlrpc to "++ conn ++" func: " ++ function ++ " args: " ++ (show args))
callRemote (JSONRPC conn) function args =
Str ("nyi, jsonrpc to "++ conn ++" func: " ++ function ++ " args: " ++ (show args))


{-

Here is a little testfunction so you know how to use this;
it will call the jsonrpc service "helloworld" at "http://..." with the arguments 1,2

-}

test_json_rpc = callRemote (JSONRPC "http://test.org/service") "helloworld" ["1", "2"]

--
now to a the factory:


rpc_factory :: String -> String -> RPC
rpc_factory impl arg = case impl of
"JSON" -> JSONRPC arg
"XML" -> XMLRPC arg

test_rpc_factory name = callRemote conn "trigger" []
where conn = rpc_factory name "http://alarm.org/sirene"
{-

Call the test_rpc_factory function with either "XML" or "JSON" from the
ghci shell.

A factory also makes sense when one wants to extend the programm at runtime.
This can be done in haskell. Code can be dynamically loaded and recompiled
in running haskell programms (see "hs-plugin"), with type checking and dependency
resolution.

It is possible to create a complete OO-Class infrastructure in haskell,
similar to what exists in python, java and CLOS.

Lesser need for DI and friends



A nice application of factories is dependency injection.
This extends the idea of factories, it allows programmers to wire all class
instance dependencies and configuration values in one function in a declarative way.
In powerfull functional languages this might be pointless in situation where this approach makes
perfect sense in programs written in python or java, as the means of functional abstraction
and combination, especially when used in combination with a powerfull type system, and pattern
matching, render dependency injection frameworks simply useless - if the whole point
of the dependency injection was merely the declarative wireing of class dependencies and
configuration and not the ability to change all that at runtime.
On the other hand, in systems like the erlang runtime complicated mechanisms exist to actually
exchange code and the the dependencies of the code at runtime,
while maintaining the serveral versions of the same code when it is still required,
with automatically restarting services in the correct order when necessary, etc.

Well I hope I have shown, that haskell can actually be fun to work with, and that
diffrent approaches in the design of programming languages are always fun to look at.

That's all folks


The original blog post author compared python with java to show the things he likes in python and
he misses in java.
This comparison has shown how similar phyton and java actually are when compared with haskell ;)
My relation to haskell and java is similar to the blogpost authors relation to python and java,
I think that the author should definitely have a look at haskell, for the
relation to haskell and python might just develop to be of a similar nature,
and the learning experience might also be pretty enlightening.

-}

No comments: