Under A Boddhi Tree: August 2011

Saturday, August 27, 2011

Python and Clojure comparisons

Since I've slowly been trying to move to a functional style of programming (despite most companies in the industry still wanting all kinds of esoteric knowledge about OOP, design patterns and other OO madness like UML), I thought it might be interesting to contrast some python code with clojure code. The python code I'll show here is definitely not the norm, but still valid.

Since I just covered clojure destructuring, it might be helpful to see the equivalent usage in python. Take for example a common use in python for passing in arguments to functions like these:

def get_positional_args(*args):
  for arg in args:
    print arg

def get_positional_arg1(farg, *args):
  print "first argument is", farg
  # extra positionals start at position 2, after farg
  for i, arg in enumerate(args):
    print "argument ", i + 2, "=", arg

def get_keyword_args(**kwargs):
  for name in kwargs:
    print name, "=", kwargs[name]

def get_pos_and_kw_args(farg, sarg, *args, **kwargs):
  print "First argument is", farg
  print "second argument is", sarg
  # extra positionals start at position 3, after farg and sarg
  for i, arg in enumerate(args):
    print "Argument #", i + 3, "=", arg
  for name in kwargs:
    print name, "=", kwargs[name]

And these are the equivalent functions in clojure:

(defn get-positional-args [& more]
  (println more))

(defn get-positional-arg1 [x & more]
  (println x)
  (println more))

(defn get-keyword-args [m]
  (doseq [kv m]
    (println "key =" (kv 0) "value =" (kv 1))))

(defn get-pos-and-kw-args [f s m & more]
  (println "first arg is " f)
  (println "second arg is " s)
  (doseq [arg (map vector (iterate inc 3) more)]
    (println "Argument" (arg 0) " is " (arg 1)))
  (doseq [kv m]
    (println "key = " (kv 0) " value = " (kv 1))))
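To see how Python binds a call to these kinds of parameters, here is a minimal sketch. It uses Python 3 syntax (print as a function), and describe_call is a hypothetical helper, not one of the functions above: it has the same signature shape as get_pos_and_kw_args but returns its lines so they are easy to inspect.

```python
# Hypothetical helper (not from the post): same signature shape as
# get_pos_and_kw_args, but it returns the lines instead of printing them.
def describe_call(farg, sarg, *args, **kwargs):
    lines = ["first argument is %s" % farg,
             "second argument is %s" % sarg]
    # Extra positionals start at position 3, after farg and sarg.
    for i, arg in enumerate(args, start=3):
        lines.append("argument #%d = %s" % (i, arg))
    # Sort keyword names so the output order is predictable.
    for name in sorted(kwargs):
        lines.append("%s = %s" % (name, kwargs[name]))
    return lines


extra = ["c", "d"]
options = {"verbose": True, "depth": 2}
# * spreads a sequence into positionals, ** spreads a dict into keywords.
result = describe_call("a", "b", *extra, **options)
for line in result:
    print(line)
```

The * and ** at the call site are the mirror image of *args and **kwargs in the definition: one spreads a collection out, the other gathers the leftovers back up.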

Now, I don't know about you, but I think that the argument that 'lisp' has too many parentheses isn't really all that true for clojure. Sure, it has way more than python, and python is perhaps the cleanest looking language I've ever seen, but the other lisps/schemes I've seen aren't in the same league as clojure when it comes to reader friendliness.

So what are some other things that python and clojure have in common? Believe it or not, python has a lazy language feature as well. Clojure prefers using lazy sequences whenever possible (though it's not a lazy language by default like Haskell). Still, through the use of Clojure macros or functions like delay and force, clojure can (explicitly) be made very lazy. Python isn't nearly as lazy, but it does have one nice lazy feature....generators.

Generators and generator expressions are a way to explicitly step through a (possibly infinite) sequence. The key to generators is the 'yield' statement, which "freezes" the function until the generator object's next() method is called. For example, we can create an infinite series of even numbers:

def gen_even():
  i = 0
  while True:
    yield i
    i += 2


def iterate_range(gen, i):
  # advance the generator i times and return the last value it yielded
  res = None
  for x in range(i):
    res = gen.next()
  return res

However, the code above does not build a sequence in the way most people would expect. A generator is really just an object with a next() method; each call to next() runs the function until it 'yields' a value, and the following call resumes from there to produce the next one. By itself, a generator does not produce a list. To build one, let's do this:

def create_map(gen, i):
  return [ gen.next() for x in xrange(i) ]

What would happen if we call this? (Note the use of xrange rather than range: in Python 2, range builds the whole list up front, which consumes more memory.)

g = gen_even()
create_map(g, 10)
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

This is very similar to lazy sequences in clojure. For example, the equivalent in clojure to the above would be this:

(def v (iterate #(+ 2 %) 0))

I could perform the following actions on the lazy sequence v

(take 10 v)
(nth v 9)

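The first expression yields the same values as python's create_map() above. The second returns the 10th item in the sequence (nth is zero-indexed). The advantage of both python's generators and clojure's lazy sequences is that values are computed only as they are needed, so the entire sequence never has to exist in memory at once. One caveat: clojure caches the elements it has already realized for as long as something, such as the var v here, holds on to the head of the sequence.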
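For comparison, Python's standard itertools module can express the same two operations without writing a generator by hand. This is only a sketch in Python 3 syntax: take and nth are hypothetical helper names chosen here to mirror the clojure functions, while count and islice are the real itertools functions.

```python
from itertools import count, islice


def take(n, iterable):
    """Return the first n items as a list, like clojure's (take n coll)."""
    return list(islice(iterable, n))


def nth(iterable, n):
    """Return the item at zero-based index n, like clojure's (nth coll n)."""
    return next(islice(iterable, n, None))


# count(0, 2) lazily yields 0, 2, 4, ... much like (iterate #(+ 2 %) 0).
first_ten = take(10, count(0, 2))
tenth = nth(count(0, 2), 9)
print(first_ten)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(tenth)      # 18
```

One difference worth keeping in mind: a Python iterator is used up as it is read, which is why each call above starts from a fresh count(0, 2), whereas a clojure lazy sequence can be traversed again from its head.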

So why would you use generators or lazy sequences? One possibility is to eliminate recursive calls. Recursion in both python and clojure consumes stack space, and thus might blow out when the recursion depth is very large. Another possibility is to model infinite sequences such as those found in math. For example, the infinite series 1/2 + 1/4 + 1/8 + ... (the sum of 1/2^i for i from 1 to infinity) converges to 1.

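def half_gen():
  # Halve a float each time. Dividing 1.0 by an ever-doubling integer
  # instead would raise OverflowError once the integer outgrows a float.
  term = 0.5
  while True:
    yield term
    term = term / 2

h = half_gen()
sum([ h.next() for x in xrange(100000) ])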
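Floating point hides how the sum approaches 1, so here is a small sketch (Python 3 syntax) that checks the partial sums exactly with the standard fractions module. partial_sum is a hypothetical helper name; the closed form 1 - 1/2^n is the usual result for a geometric series with ratio 1/2.

```python
from fractions import Fraction


def partial_sum(n):
    """Exact value of 1/2 + 1/4 + ... + 1/2**n."""
    return sum(Fraction(1, 2 ** i) for i in range(1, n + 1))


for n in (1, 2, 10, 50):
    s = partial_sum(n)
    # Closed form for a geometric series with ratio 1/2.
    assert s == 1 - Fraction(1, 2 ** n)
    print(n, float(s))
```

Each extra term closes half of the remaining gap to 1, which is why a float sum reaches exactly 1.0 after a few dozen terms and adding more changes nothing.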

Wednesday, August 24, 2011

Multiprocess TaskServer is working

I was just about to write a blog post describing how I took one step forward and one step back. I spent a good chunk of Saturday and today working on getting teleproc to be able to run multiple processes. The old TaskServer class is now the Task class, and the TaskServer class is now a container of Tasks (yes, I know, I am breaking the API, but since this isn't even an alpha, and thus the API is NOT stable, I don't have a problem with that).

Basically the TaskServer receives commands from a client and directs those commands to a Task object, so the client now has to specify which Task object it is interested in. To avoid intermingling the stdout of all the processes, I am prepending each line of output with a local ID. I will rewrite the GUI client to filter each line based on the ID and strip it out. It will then append the given line of text (from a given process) to an output area (probably a TabbedPane or something in swing or SWT).
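The tagging and filtering idea can be sketched in a few lines. This is not teleproc's code (which is Java): it is a Python 3 illustration, and the "|" separator, tag_line and demultiplex are all assumptions made up for the example, since the post does not say what the prefix looks like.

```python
from collections import defaultdict

SEPARATOR = "|"  # assumed delimiter; the post does not say which one is used


def tag_line(task_id, line):
    """Server side: prepend the task's local ID to a line of output."""
    return "%s%s%s" % (task_id, SEPARATOR, line)


def demultiplex(tagged_lines):
    """Client side: group lines by task ID and strip the prefix."""
    panes = defaultdict(list)
    for tagged in tagged_lines:
        # Split on the first separator only, so output containing "|" survives.
        task_id, _, text = tagged.partition(SEPARATOR)
        panes[task_id].append(text)
    return dict(panes)


stream = [
    tag_line(1, "compiling..."),
    tag_line(2, "tests: 3 passed | 0 failed"),
    tag_line(1, "done"),
]
panes = demultiplex(stream)
print(panes)
```

Each key of the resulting dictionary would correspond to one output area (one tab) in the GUI client.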

The trick to reconnecting was that when the GUI client disconnected, the Channel object disconnected with it. When the GUI client reconnected, Netty created a new Channel object. I had to save this new Channel and point the readOut and readErr ThreadReader objects of every Task at it, so I added some logic to my ThreadReader class to do this.
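The shape of that fix, a reader that can be re-pointed at a new channel while it keeps running, might look roughly like the sketch below. It is Python 3, not the actual Java ThreadReader, and it does not use Netty: ReaderSketch, set_channel and forward are hypothetical names, and a plain list stands in for the channel.

```python
import threading


class ReaderSketch:
    """Forwards lines to whichever channel is currently attached."""

    def __init__(self):
        self._lock = threading.Lock()
        self._channel = None  # nothing attached until a client connects

    def set_channel(self, channel):
        # Called on (re)connect; the lock keeps the swap safe across threads.
        with self._lock:
            self._channel = channel

    def forward(self, line):
        # Grab the current channel under the lock, then write outside it.
        with self._lock:
            channel = self._channel
        if channel is None:
            return False  # no client attached, the line is dropped
        channel.append(line)
        return True


reader = ReaderSketch()
old_channel, new_channel = [], []   # lists stand in for real channels
reader.set_channel(old_channel)
reader.forward("before disconnect")
reader.set_channel(new_channel)     # client reconnected
reader.forward("after reconnect")
print(old_channel, new_channel)
```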

I still haven't implemented the broadcast feature yet though. I want all clients who are connected to be able to get the same output. I also am stoked to convert this into a clojure project. Now that I am a little more familiar with how Netty works, I think I can use this as the basis for the networking engine for my game. I also believe that there are a couple of areas that clojure will shine in. For example, my parseCommand function is pretty ugly in java, and I think there's a more elegant solution I can use in clojure.

I also need to make the actual Eclipse Public License documents since that is the Open Source license I am going to use. But the directions are really spotty. I did find one webpage with a checklist of documents, so I will do that soon.

Monday, August 22, 2011

Something new to learn...emacs

I've mostly just been using jEdit as my REPL of choice for clojure, with Eclipse's counterclockwise plugin to actually edit clojure files. However, I don't like how ccw doesn't let me fire up a standalone REPL, and I didn't like how jedit's clojure repl didn't have a paredit feature. I also don't like the enclojure plugin for Netbeans which forces you to use Maven style projects for clojure (why do that when there's the excellent leiningen?). Since Clojurebox is no longer maintained, I figured I may as well FINALLY learn emacs. I used to joke that you can't really call yourself a linux hacker unless you know either vim or emacs really well.

There are several advantages to using emacs. Firstly, it seems like most clojurians tend to use it because of the lisp heritage of emacs. Using swank-clojure and clojure-mode, I can edit clojure files and have a repl too straight from leiningen. Also, emacs with swank-cdt appears to be one of the only ways of debugging clojure code (allowing you to set breakpoints and such).

On the downside, emacs is....painful. It is more than an editor, it is basically its own little ecosystem. I tried learning vim before, but the whole 'command-mode' vs. 'edit-mode' screwed me up. I am hoping that emacs 'buffers' don't do the same to me. Then there's also the complexities of SLIME itself.

As it turns out, getting emacs up and running for Windows was a teensy bit trickier than I had hoped for. Although the description for getting emacs up and running at the official clojure dev site was helpful, it was also geared towards *nix users. For example, it tells you to create, if necessary, an ~/.emacs.d folder. So I assumed that in Windows, the ~ (the linux HOME directory) would be the Windows equivalent of C:\Users\<userdir>. That's not the case. Unless you have set a HOME environment variable, it's actually C:\Users\<userdir>\AppData\Roaming.

But other than that little trick, things turned out fairly well. So I am now on my way to understanding the lovely world of working with and in emacs :)

UPDATE:
So far, emacs doesn't seem nearly as bad as it was when I tried to learn VIM. I do need to get Marmalade set up, though that requires an .emacs file which might interfere with the Starter Kit settings. Nevertheless, emacs is pretty cool. I like the regex search feature and being able to forward word by word instead of just by beginning or end of line (like with the home or end keys). The clojure mode syntax highlighting is nice, though I still need to figure out how to use swank-clojure inside of emacs.

Also, it appears that along with swank-cdt, there is also a project called ritz (forked from swank-clojure) which allows for setting breakpoints in clojure code.

Saturday, August 20, 2011

Clojure learning assignment: destructuring

I've decided to make a foray, at least once a week, into topics in clojure to help both myself and others learn this fascinating language. I do not really have a lisp background (I do not count the few hundred lines of code I wrote in an AI class a long time ago), and thus clojure has been a bit of a mind-warping stretch for me. Here are some of the topics I will eventually cover:

1. Destructuring
2. Coverage of the "cheat sheet" functions with examples
3. Lazy sequences vs. recursion
4. Macros (once I've figured them out myself)
5. A multithreaded merge sort using refs
6. A multithreaded tree traversal
7. Examples of gen-class and proxy
8. Examples of using defrecords

And anything else I can think of. Mind you, I'm still learning this language in many ways as I go. What I hope I can provide that the clojurian masters may not be able to is the perspective of a newbie to the lisp and functional programming world. I personally learn through real-world examples, and while the books The Joy of Clojure, Programming Clojure, and Clojure in Action have all been very helpful, sometimes I wish there had been a little bit more attention paid to the small things. Just try and do a (doc ->>) and you will know what I mean. But anyhow....off to my first lesson, destructuring.

I often hear that Clojure (and lisps) have very little syntax, and they tout this as a defining feature of the language. While to a degree that's true compared to many other languages, there ARE more syntax rules than may appear at first glance. Take for example code like this

(defn get-parts
  [ [x y z & others ] ]
    (do
      (println "First three are: " x y z)
      (println "Rest is: " others))
    others)

Normally, the first thing you'll see after the function name is the argument list (or possibly a docstring or metadata). But that's a kind of strange looking argument list. What am I supposed to pass in there? It kind of looks like I pass in an array of symbols...but what's that ampersand doing there?

We can run it like this:

user=> (get-parts [ 1 2 3 4 ] )
First three are: 1 2 3
Rest is: (4)
(4)

This is one of clojure's destructuring forms which is loosely akin to pattern matching found in other languages. The above code takes in a sequence of some form, splits out the first three values into x, y, and z respectively, and then stuffs the remainder of the sequence into others. It would be code equivalent to this:

(defn get-parts-no-dest
  [ s ]
  ;; nth is zero-indexed, so the first three items are at 0, 1 and 2
  (let [ x (nth s 0)
         y (nth s 1)
         z (nth s 2)
         others (drop 3 s) ]
    (do
      (println "First three are: " x y z)
      (println "Rest is: " others))
    others))

As you can see, the destructuring above did cut down on some lines of code...if at the price of some readability in my opinion. Unfortunately, using destructuring seems to be the preferred idiomatic clojure style.
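Python has a loosely similar feature in sequence unpacking, so a rough analogue of the positional destructuring in get-parts can be sketched as follows. This uses Python 3 syntax (the starred target does not exist in Python 2), and get_parts here is just an illustrative name.

```python
def get_parts(seq):
    # Star-unpacking binds the first three items and collects the rest,
    # roughly like [x y z & others] in clojure.
    x, y, z, *others = seq
    return x, y, z, others


print(get_parts([1, 2, 3, 4]))   # (1, 2, 3, [4])
print(get_parts("abcde"))        # ('a', 'b', 'c', ['d', 'e'])
```

The two are not identical: if the sequence has fewer than three items, clojure binds the missing names to nil, whereas Python raises a ValueError.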

The get-parts destructuring works for a vector as well as a list or any other sequence. It will not, however, work on a map of any sort. If we try, we will get this:

user=> (get-parts { :1 1 :2 2 :3 3 :4 4} )
java.lang.UnsupportedOperationException: nth not supported on this type: PersistentArrayMap (NO_SOURCE_FILE:0)

So are there destructuring forms for maps? Of course. Here's an example where we take a map containing the keys fname, address and city, print them and return a vector of the values of those keys:

(defn get-parts-map
  "Takes a map with keys fname, address and city and prints them"
  [ {:keys  [fname address city]  } ]
  (do
    (println "Name: " fname)
    (println "Address: " address)
    (println "City: " city))
  [ fname address city ])

If we called it with a map like { :fname "John Doe" :address "1234 Cherry Lane" :city "Timbuktu" }, we would see this:

user=> (def john_doe { :fname "John Doe" :address "1234 Cherry Lane" :city "Timbuktu" } )
#'user/john_doe
user=> (get-parts-map john_doe)
Name: John Doe
Address: 1234 Cherry Lane
City: Timbuktu
["John Doe" "1234 Cherry Lane" "Timbuktu"]

Notice how we used the :keys keyword and followed it with a vector of symbols, and not keywords. Keep that in mind when destructuring using maps. Also, you can use these destructuring features in let forms as well. For example, I could have written the code above like this:

(defn get-parts-map-w-let
  "Takes a map with keys fname, address and city and prints them"
  [ m ]
  (let [ {:keys  [fname address city]  }  m ]
    (do
      (println "Name: " fname)
      (println "Address: " address)
      (println "City: " city))
    [ fname address city ]))

And the output would be exactly the same as above. As well as the :keys directive, you may use :syms, if the keys are symbols (instead of keywords) or :strs (if the keys are strings).
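Python has no direct counterpart to :keys, but the standard operator.itemgetter function combined with tuple unpacking gets fairly close. The sketch below is Python 3 and uses string keys, since Python has no keyword type; the dictionary simply mirrors the john_doe map above.

```python
from operator import itemgetter

john_doe = {"fname": "John Doe",
            "address": "1234 Cherry Lane",
            "city": "Timbuktu"}

# itemgetter with several keys returns a tuple, which unpacks into names.
fname, address, city = itemgetter("fname", "address", "city")(john_doe)
print(fname, address, city)
```

As with sequences, the failure mode differs: a missing key raises KeyError in Python, while clojure's map destructuring binds the name to nil.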

The other useful destructuring form is to associate a map with the elements of a sequence. For example, you could do something like this:

(let [ { dog 0 cat 1} [ "husky" "persian" "pug" "siamese" ] ]
  (println "Dog is a " dog " and cat is a " cat))

This would print "Dog is a  husky  and cat is a  persian" (println puts a space between its arguments, hence the doubled spaces).

Monday, August 15, 2011

Training others

I've come into the responsibility of training some new, and not so new people how to program. Right now, I am teaching them the basics of the language we are (mostly) using, but I am also trying to teach them some of the finer points of software engineering that I had to learn from experience. Some of the people I am training don't have Computer Science or Software Engineering degrees, but do have Electrical or Computer Engineering degrees. So I'm trying to impart just some general guidelines on writing decent code.

Working as a team-
Many of the other points I will cover below have this as a root element to consider. When I went to school, I had a grand total of two group projects, only one of which actually had any code to it. That's totally unrealistic in the real world. The fact is, your code and your work does not live in isolation. Your code should be readable by others, they should know where to obtain your code, you should not duplicate an entirely new library that someone else has built (though you can make enhancements or improvements to it), and you should document your code so that others know how to install and use what you created.

Revision Control-
I have found it amusing that at 2 different workplaces, the Electrical Engineers were somewhat up in arms over having to learn a revision control system, and yet the CS people were more fascinated by it. Unfortunately, when I went to school, they weren't teaching anything about revision control systems, much less why you would need one. And sometimes you do have to explain to someone why you would need one. But without revision control, how do you experiment with your code? How do you tag your code so that you can replicate an issue a customer is seeing? How do you distribute your code so that others can see it and possibly make enhancements to it? Many engineers are frightened when they first attempt to use a revision control system, because they are afraid they will jack up someone's code base. Also, some revision control systems are easier to learn than others (I personally am finding git far harder to learn than mercurial). But these are small drawbacks compared to what a revision control system provides.

Code Reviews-
Many engineers are scared of code reviews when they first start. Throughout school, it is ingrained in you not to share your code with others, and as a consequence you don't have to worry about what your code looks like. But once a new engineer accepts the fact that many eyes will be seeing his code, this alone changes how he writes (or at least will change once several comments come in). Code reviews are necessary because they are the next line of defense for finding bugs after your unit tests. I also remind reviewers that they should actually try to understand the code, rather than look for just superficial things like coding standards. This takes more time, but I believe that it improves your own coding as well as helping the one being reviewed.

Unit testing-
Usually in the madhouse rush to get something working, testing is thrown to the wayside. I am NOT a proponent of TDD, where you write your tests before you actually write your feature code, but one should eventually write tests for their code. When using a dynamic language, it is often necessary to check that the type of arguments passed in is correct. Make sure you write lots of negative tests too, because it's rather embarrassing to discover that invalid inputs make your function return a supposedly valid result.
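As a small illustration of negative tests and type checks, here is a sketch using Python 3's standard unittest module. parse_port is a hypothetical function invented for the example; the point is that three of the four tests feed it bad input and insist that it fails loudly.

```python
import unittest


def parse_port(value):
    """Hypothetical function under test: convert text to a TCP port number."""
    if not isinstance(value, str):
        raise TypeError("port must be given as a string")
    port = int(value)  # raises ValueError for non-numeric text
    if not 1 <= port <= 65535:
        raise ValueError("port out of range: %d" % port)
    return port


class ParsePortTests(unittest.TestCase):
    def test_valid_port(self):
        self.assertEqual(parse_port("8080"), 8080)

    # Negative tests: invalid input must raise, not return a "valid" result.
    def test_rejects_wrong_type(self):
        with self.assertRaises(TypeError):
            parse_port(8080)

    def test_rejects_out_of_range(self):
        for bad in ("0", "65536", "-1"):
            with self.assertRaises(ValueError):
                parse_port(bad)

    def test_rejects_non_numeric(self):
        with self.assertRaises(ValueError):
            parse_port("http")


# Run the suite explicitly so the example works when pasted into any script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParsePortTests)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```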

Reusability-
I usually give an anecdote for this. Imagine that you write a script that performs some functionality for a test you have. Later, you are tasked with a very similar problem, and so you write a 2nd, albeit slightly different script. And then you do so later for a 3rd and 4th script. But then some functionality in a library your scripts use changes. Perhaps a new product is made which requires a different parameter to be passed in. Now, you have 4 scripts, and you have to go in and change all 4 of them. Always try to isolate code that could possibly vary and keep it in a library, class or module of some sort. I try to stress writing functions over writing scripts so that I have only one or two scripts, whose behavior changes depending on the arguments that get passed in.

DON'T copy and paste-
Also known as DRY (don't repeat yourself), copying and pasting code is BAD. Why is it bad? Because when you copy and paste code, you copy and paste bugs. And when you need to make an enhancement to your code, every place you copied and pasted now has to be fixed as well. As obvious as this one sounds, I am amazed in code reviews how many people simply copy and paste functions or worse, parts of functions into other functions.

Keep it simple stupid-
General Patton once said, "Don't give great orders. Give orders that can be understood". As mentioned earlier, code is read more than it is written. If your code tries to get too fancy, you might want to make it easier for others to understand. Of course this has its limits. If the most efficient code is complex, don't be afraid to do that, just comment the heck out of what your code is doing.

Avoid functions that return void-
This sort of goes along with unit testing, or perhaps testing in general. Functions that return void are usually either mutating the state of some object (either of itself, if this is a method, or of some argument that is passed in), or they are impure and only exist for some side effect (for example, updating a database, or printing to a log file). The trouble is, how do you test this? If the function is a method of an object, and it mutates some field in the object, now you need a second function that has to be called to make sure the field in that object is correct. But what if this is a multi-threaded program? It is entirely possible that another function can change the state of the object before your test function gets a chance to run. Now you have to write some locks to make sure this is correct. All of this can be avoided if you simply return some values and check those instead (the data might be stale by then, but it was valid at the time of the original function call).
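The difference is easy to see side by side. In this Python 3 sketch, Counter and add are hypothetical names made up for the illustration: the first style can only be checked by reading a field back afterwards, the second by looking at the return value.

```python
class Counter:
    def __init__(self):
        self.total = 0

    def add_in_place(self, amount):
        # "void" style: the only result is a changed field, so a test has
        # to read self.total afterwards and hope nothing else touched it.
        self.total += amount


def add(total, amount):
    # Value-returning style: the result is the return value, so a test
    # can check it directly, with no shared state involved.
    return total + amount


counter = Counter()
counter.add_in_place(3)
print(counter.total)   # 3, but only if no other thread changed it meanwhile
print(add(2, 3))       # 5, always
```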

Document, document, document-
One of the reasons that python has such elegant syntax is that Guido van Rossum had the insight that code is read far more often than it is written. An engineer should make it even easier for people to understand his code by writing copious documentation. Now, one shouldn't comment the obvious, but if anything might be even remotely unclear, it's a good idea to comment for others (and yourself!!) on what your code is trying to accomplish. Also, learn the markup tool of choice for your language (doxygen, sphinx, doxia, javadoc, etc), as being able to publish a pdf or to have the documentation in html format is really really nice.

Use the debugger as a last resort-
This is where a lot of people may disagree with me, but to me, debuggers are the big guns of troubleshooting. Prefer loggers to debuggers when possible. For example, in C or C++, debugging macros or templates is very difficult. Loggers on the other hand can expand the macro for you, and you can also print out any genericized object. An exception to this is when you are learning someone else's code, and you want to figure out what is going on. Very complex code almost requires this.
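For readers who have not used a logger before, here is a minimal sketch with Python 3's standard logging module. The average function and the logger name are hypothetical; the idea is just that the debug lines record what a debugger session would otherwise have to rediscover each time.

```python
import logging

log = logging.getLogger("sketch.average")  # hypothetical logger name


def average(values):
    log.debug("average() called with %d values: %r", len(values), values)
    if not values:
        log.warning("empty input, returning 0.0")
        return 0.0
    result = sum(values) / len(values)
    log.debug("average() -> %r", result)
    return result


# Show everything from DEBUG up; raise the level to quiet the output later.
logging.basicConfig(level=logging.DEBUG,
                    format="%(levelname)s %(name)s: %(message)s")
print(average([1, 2, 3]))
print(average([]))
```

Because the level is set in one place, the same code can stay chatty during troubleshooting and silent in production without touching the function itself.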

Optimize AFTER your code works-
Unless you know a good algorithm right from the beginning, make something work first then make it faster. However, do keep in mind the following:

1. Nested for loops are almost always a bad sign (n^k runtime efficiency where k is # of nested loops)
2. Sorting a data structure once (O(n log n)) and then using binary search (O(log n) per lookup) is usually more efficient than repeatedly scanning it for items.
3. When using recursion, watch out for potentially huge values being passed in (which will blow your call stack)
4. When using recursion, watch out for a function calling itself more than once per invocation (e.g., naive fibonacci, whose running time grows exponentially in n).
5. Don't be afraid of recursion. Yes, it pushes a new function on the call stack and thus is slower, but often, recursive solutions are easier to understand than an equivalent for or while loop.
6. Be wary of cyclic data structures or potential ones (eg, a linked list where one node points to a previous node). Your code might work on a non-cyclic data structure, but a cyclic one might make you spin forever or blow your call stack away.
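Two of these warnings, the doubly recursive function and the cyclic linked list, can be made concrete with a short Python 3 sketch. All names here are illustrative; lru_cache is the standard functools decorator, and the cycle check is Floyd's well-known tortoise-and-hare technique, swapped in as one common way to guard against cycles.

```python
from functools import lru_cache


def fib_naive(n, counter):
    """Naive recursion: each call spawns two more, so work grows exponentially."""
    counter[0] += 1
    if n < 2:
        return n
    return fib_naive(n - 1, counter) + fib_naive(n - 2, counter)


@lru_cache(maxsize=None)
def fib_memo(n):
    """Memoized recursion: each n is computed only once."""
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)


class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


def has_cycle(head):
    """Floyd's tortoise-and-hare check; needs no extra memory."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True
    return False


calls = [0]
print(fib_naive(20, calls), "took", calls[0], "calls")
print(fib_memo(20))

a, b, c = Node(1), Node(2), Node(3)
a.next, b.next = b, c
print(has_cycle(a))   # False
c.next = a            # c now points back to the first node
print(has_cycle(a))   # True
```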

Saturday, August 6, 2011

Letting go of views

Well, the name of the blog IS Under a Boddhi Tree, so here goes my first non-software engineering post.

The recent political theatre and downgrading of America's credit rating has made me become upset at what I see as ignorance, greed, and political showmanship over the welfare of the nation and its people. I couldn't fathom the levels of ignorance that I also saw regarding facts on the situation. For example, a Yahoo Finance News web report consistently said that S&P downgraded America's credit due to not enough spending cuts. However, if you actually read the S&P report, you will see that it said that there should have been a balance of spending cuts and raising revenues.

Unfortunately, people see only what they want to see, and hear only what they want to hear. Conservatives only want to hear statements supporting their beliefs (no new taxes, ending social welfare programs, government doesn't create jobs, illegal immigrants are taking our jobs, gay marriage is destroying our culture, blah blah), and progressives or liberals only want to hear what they believe in (universal medical care, less need for a strong military, businesses need strong regulations, fair taxation, equal rights regardless of race or sexual orientation, unions only protect workers and never create unrealistic salaries, freedom of or from beliefs, etc etc). This habit of seeing only what we want to see is a huge reason for all suffering. In fact, I might say that the whole purpose of Buddhism is just to see things as they are, with no filters, and no expectations.

I have always considered myself pretty open-minded. When I was younger, I was a Republican, now I am a Democrat. Also when I was younger, I was an empirical material objectivist (if you couldn't sense it with your 5 senses, it didn't exist), but now I am an idealist. But I am beginning to see something now....these very labels are wrong. I really should have known this a long time ago due to the fact that I am of mixed ethnicity (Irish, German, and Polish from my dad, and Filipino, Malaysian, Chinese, Spanish, Persian from my mom). I do not consider myself white or asian...I am just a human being.

And that's the point. I am not a progressive. I am just someone who is trying to see the world as it truly is, not as I want or hope it should be. It is very hard. I feel like the world SHOULD be a certain way. But the fact is, I don't even truly know who I am. And if I don't know who I am, how am I supposed to be a judge for the way the world should be? Does that sound strange that I don't know who or what I am? People who have not practiced eastern religions are often confounded by that statement. They don't even really understand the question I think.

But asking, discovering, and knowing who and what you truly are is, I think, the true path to awakening. I am definitely not a label. I am not a progressive, or a democrat, or a computer scientist or engineer. These are facets perhaps, but they must be dropped. Letting go of these views of my 'self' helps me understand who I truly am. Otherwise, these views separate me from knowing reality as it is. I have tried to tell others before that it is not your belief in Christianity, Judaism, Islam, Hinduism, Conservatism, Liberalism, Socialism, Capitalism, etc that is wrong. It is belief in and of itself that is wrong. If you can not know it and understand it for yourself, then it is an illusion, a desire that ignores reality for fantasy instead.

So who am I? I do not yet know, but I have traveled a little bit across the stream. I have understood that I am not my thoughts or feelings. I am still groping in the dark, but at least I know that I am in the dark. And I am beginning to see that getting mad at all this political theatre is pointless. Trying to get people to understand me is pointless. Trying to convince others about anything is also pointless. I know all about Maslow's hierarchy, but chasing after all these "should be's" and "I want's" is in itself suffering.

All that matters is being aware, seeing how things are, and discovering what we truly are.