Under A Boddhi Tree

Saturday, August 20, 2011

Clojure learning assignment: destructuring

I've decided to make a foray, at least once a week, into topics in Clojure, both to help myself and to help others learn this fascinating language. I do not really have a Lisp background (I do not count the few hundred lines of code I wrote in an AI class a long time ago), and thus Clojure has been a bit of a mind-warping stretch for me. Here are some of the topics I will eventually cover:

1. Destructuring
2. Coverage of the "cheat sheet" functions with examples
3. Lazy sequences vs. recursion
4. Macros (once I've figured them out myself)
5. A multithreaded merge sort using refs
6. A multithreaded tree traversal
7. Examples of gen-class and proxy
8. Examples of using defrecords

And anything else I can think of. Mind you, I'm still learning this language in many ways as I go. What I hope I can provide that the Clojurian masters may not be able to is the perspective of a newbie to the Lisp and functional programming world. I personally learn through real-world examples, and while the books The Joy of Clojure, Programming Clojure, and Clojure in Action have all been very helpful, sometimes I wish there had been a little bit more attention paid to the small things. Just try (doc ->>) and you will know what I mean. But anyhow... off to my first lesson, destructuring.

I often hear that Clojure (and Lisps in general) have very little syntax, and this is touted as a defining feature of the language. While to a degree that's true compared to many other languages, there ARE more syntax rules than may appear at first glance. Take for example code like this:

(defn get-parts
  [[x y z & others]]
  (println "First three are:" x y z)
  (println "Rest is:" others)
  others)

Normally, the first thing you'll see after the function name is the argument list (or possibly a docstring or metadata). But that's a kind of strange-looking argument list. What am I supposed to pass in there? It kind of looks like I pass in a vector of symbols... but what's that ampersand doing there?

We can run it like this:

user=> (get-parts [ 1 2 3 4 ] )
First three are: 1 2 3
Rest is: (4)
(4)

This is one of Clojure's destructuring forms, which is loosely akin to the pattern matching found in other languages. The above code takes in a sequence of some form, binds the first three values to x, y, and z respectively, and then binds the remainder of the sequence to others. It is roughly equivalent to this code:

(defn get-parts-no-dest
  [s]
  (let [x (nth s 0)   ; nth is zero-indexed
        y (nth s 1)
        z (nth s 2)
        others (drop 3 s)]
    (println "First three are:" x y z)
    (println "Rest is:" others)
    others))

As you can see, the destructuring above did cut down on some lines of code...if at the price of some readability in my opinion. Unfortunately, using destructuring seems to be the preferred idiomatic clojure style.

So the above example works well for a vector as well as a list or sequence. It will not however work on a map of any sort. If we try, we will get this:

user=> (get-parts { :1 1 :2 2 :3 3 :4 4} )
java.lang.UnsupportedOperationException: nth not supported on this type: PersistentArrayMap (NO_SOURCE_FILE:0)

So are there destructuring forms for maps? Of course. Here's an example where we take a map containing the keys fname, address and city, print them and return a vector of the values of those keys:

(defn get-parts-map
  "Takes a map with keys fname, address and city, prints them, and returns them in a vector"
  [{:keys [fname address city]}]
  (println "Name:" fname)
  (println "Address:" address)
  (println "City:" city)
  [fname address city])

If we called it with a map like { :fname "John Doe" :address "1234 Cherry Lane" :city "Timbuktu" }, we would see this:

user=> (def john_doe { :fname "John Doe" :address "1234 Cherry Lane" :city "Timbuktu" } )
#'user/john_doe
user=> (get-parts-map john_doe)
Name: John Doe
Address: 1234 Cherry Lane
City: Timbuktu
["John Doe" "1234 Cherry Lane" "Timbuktu"]

Notice how we used the :keys keyword and followed it with a vector of symbols, and not keywords. Keep that in mind when destructuring using maps. Also, you can use these destructuring features in let forms as well. For example, I could have written the code above like this:

(defn get-parts-map-w-let
  "Takes a map with keys fname, address and city, prints them, and returns them in a vector"
  [m]
  (let [{:keys [fname address city]} m]
    (println "Name:" fname)
    (println "Address:" address)
    (println "City:" city)
    [fname address city]))

And the output would be exactly the same as above. As well as the :keys directive, you may use :syms (if the keys are symbols instead of keywords) or :strs (if the keys are strings).

The other useful destructuring form is to use a map pattern on a vector, binding names to the elements at given indices. For example, you could do something like this:

(let [{dog 0 cat 1} ["husky" "persian" "pug" "siamese"]]
  (println "Dog is a" dog "and cat is a" cat))

This would print out "Dog is a husky and cat is a persian".

Monday, August 15, 2011

Training others

I've taken on the responsibility of teaching some new, and not so new, people how to program. Right now, I am teaching them the basics of the language we are (mostly) using, but I am also trying to teach them some of the finer points of software engineering that I had to learn from experience. Some of the people I am training don't have Computer Science or Software Engineering degrees, but do have Electrical or Computer Engineering degrees. So I'm trying to impart some general guidelines on writing decent code.

Working as a team-
Many of the other points I will cover below have this as a root element to consider. When I went to school, I had a grand total of two group projects, only one of which actually had any code to it. That's totally unrealistic in the real world. The fact is, your code and your work do not live in isolation. Your code should be readable by others, they should know where to obtain your code, you should not write an entirely new library that duplicates one someone else has already built (though you can make enhancements or improvements to theirs), and you should document your code so that others know how to install and use what you created.

Revision Control-
I have found it amusing that at two different workplaces, the Electrical Engineers were somewhat up in arms over having to learn a revision control system, and yet the CS people were more fascinated by it. Unfortunately, when I went to school, they weren't teaching anything about revision control systems, much less why you would need one. And sometimes you do have to explain to someone why you would need one. But without revision control, how do you experiment with your code? How do you tag your code so that you can replicate an issue a customer is seeing? How do you distribute your code so that others can see it and possibly make enhancements to it? Many engineers are frightened when they first attempt to use a revision control system, because they are afraid they will jack up someone's code base. Also, some revision control systems are easier to learn than others (I personally am finding git far harder to learn than mercurial). But these are small drawbacks compared to what a revision control system provides.

Code Reviews-
Many engineers are scared of code reviews when they first start. Throughout school, it is ingrained in you not to share your code with others, and as a consequence you don't have to worry about what your code looks like. But once a new engineer accepts the fact that many eyes will be seeing his code, this alone changes how he writes (or at least it will once several comments come in). Code reviews are necessary because they are the next line of defense for finding bugs after your unit tests. I also remind reviewers that they should actually try to understand the code, rather than look only for superficial things like coding standards. This takes more time, but I believe that it improves your own coding as well as helping the one being reviewed.

Unit testing-
Usually in the madhouse rush to get something working, testing is thrown to the wayside. I am NOT a proponent of TDD, where you write your tests before you actually write your feature code, but one should eventually write tests for their code. When using a dynamic language, it is often necessary to check that the type of arguments passed in is correct. Make sure you write lots of negative tests too, because it's rather embarrassing to discover that invalid inputs make your function return a supposedly valid result.
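As a minimal sketch of what a negative test can look like in Python (the function percent_to_fraction and its rules are hypothetical, invented only for this illustration):

```python
import unittest


def percent_to_fraction(value):
    """Convert a percentage (0-100) to a fraction (0.0-1.0).

    Hypothetical helper, used only to illustrate negative testing.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("value must be a number, got %r" % (value,))
    if not 0 <= value <= 100:
        raise ValueError("value must be between 0 and 100, got %r" % (value,))
    return value / 100.0


class PercentToFractionTest(unittest.TestCase):
    def test_valid_input(self):
        self.assertEqual(percent_to_fraction(50), 0.5)

    def test_rejects_wrong_type(self):
        # Negative test: a string must not silently produce a result.
        with self.assertRaises(TypeError):
            percent_to_fraction("50")

    def test_rejects_out_of_range(self):
        # Negative test: an out-of-range number is an error, not an answer.
        with self.assertRaises(ValueError):
            percent_to_fraction(150)
```

The two negative tests are the ones that catch the embarrassing case: without the explicit checks, 150 would quietly come back as 1.5.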

Reusability-
I usually give an anecdote for this. Imagine that you write a script that performs some functionality for a test you have. Later, you are tasked with a very similar problem, and so you write a 2nd, albeit slightly different script. And then you do so later for a 3rd and 4th script. But then, some functionality in a library your scripts use changes. Perhaps a new product is made which requires a different parameter to be passed in. Now, you have 4 scripts, and you have to go in and change all 4 of them. Always try to isolate code that could possibly vary and keep it in a library, class or module of some sort. I try to stress writing functions over writing scripts so that I have only one or two scripts, whose behavior changes depending on the arguments that get passed in.

DON'T copy and paste-
This is the flip side of DRY (don't repeat yourself): copying and pasting code is BAD. Why is it bad? Because when you copy and paste code, you copy and paste bugs. And when you need to make an enhancement to your code, every place you copied and pasted now has to be fixed as well. As obvious as this one sounds, I am amazed in code reviews how many people simply copy and paste functions or, worse, parts of functions into other functions.

Keep it simple stupid-
General Patton once said, "Don't give great orders. Give orders that can be understood." Code is read more than it is written, so if your code gets too fancy, consider simplifying it so that others can understand it. Of course this has its limits. If the most efficient code is complex, don't be afraid to write it; just comment the heck out of what your code is doing.

Avoid functions that return void-
This sort of goes along with unit testing, or perhaps testing in general. Functions that return void are usually either mutating the state of some object (either of itself, if this is a method, or of some argument that is passed in), or they are impure, and only have validity for some side effect (for example, updating a database, or printing to a log file). The trouble is, how do you test this? If the function is a method of an object, and it mutates some field in the object, now you need a second function that has to be called to make sure the field in that object is correct. But what if this is a multi-threaded program? It is entirely possible that another function can change the state of the object before your test function gets a chance to run. Now you have to write some locks to make sure this is correct. All of this can be avoided if you simply return some values that you can then check (the data might be stale by then, but it was valid at the time of the original function call).
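A small Python sketch of the difference (the Account class and its method names are hypothetical, made up for this illustration):

```python
class Account(object):
    """Hypothetical class contrasting a void method with one that returns."""

    def __init__(self, balance):
        self.balance = balance

    def deposit_void(self, amount):
        """Mutates the account and returns nothing.

        A test has to read self.balance afterwards, and by then another
        thread may already have changed it.
        """
        self.balance += amount

    def deposit(self, amount):
        """Mutates the account and also returns the new balance.

        A test can check the returned value directly; it may go stale
        later, but it was correct when this call finished.
        """
        self.balance += amount
        return self.balance
```

With the second form a test is a single line, for example checking that Account(100).deposit(25) gives back 125, with no follow-up read of the object's state.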

Document, document, document-
One of the reasons that Python has such elegant syntax is that Guido van Rossum had the insight that code is read far more often than it is written. An engineer should make it even easier for people to understand his or her code by writing copious documentation. Now, one shouldn't comment the obvious, but if anything might be even remotely unclear, it's a good idea to comment for others (and yourself!!) on what your code is trying to accomplish. Also, learn the documentation tool of choice for your language (doxygen, sphinx, doxia, javadoc, etc), as being able to publish a PDF or to have the documentation in HTML format is really, really nice.

Use the debugger as a last resort-
This is where a lot of people may disagree with me, but to me, debuggers are the big guns of troubleshooting. Prefer loggers to debuggers when possible. For example, in C or C++, debugging macros or templates is very difficult. Loggers on the other hand can expand the macro for you, and you can also print out any genericized object. An exception to this is when you are learning someone else's code, and you want to figure out what is going on. Very complex code almost requires this.

Optimize AFTER your code works-
Unless you know a good algorithm right from the beginning, make something work first then make it faster. However, do keep in mind the following:

1. Nested for loops are almost always a bad sign (O(n^k) running time, where k is the number of nested loops)
2. Sorting a data structure once (O(n log n)) and then searching it is usually more efficient than repeatedly scanning it unsorted
3. When using recursion, watch out for potentially huge values being passed in (which will blow your call stack)
4. When using recursion, watch out for a function calling itself more than once per call (e.g., naive Fibonacci, whose running time grows exponentially with n).
5. Don't be afraid of recursion. Yes, it pushes a new function on the call stack and thus is slower, but often, recursive solutions are easier to understand than an equivalent for or while loop.
6. Be wary of cyclic data structures or potential ones (eg, a linked list where one node points to a previous node). Your code might work on a non-cyclic data structure, but a cyclic one might make you spin forever or blow your call stack away.
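To make the point about a function calling itself more than once concrete, here is a minimal Python sketch (the function names are mine, chosen for illustration). The naive version recomputes the same values over and over; the second version remembers each result, so every value is computed only once:

```python
def fib_naive(n):
    """Calls itself twice per step, so the work grows exponentially with n."""
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)


def fib_memo(n, cache=None):
    """Same recursion, but each value is computed once and remembered."""
    if cache is None:
        cache = {}
    if n < 2:
        return n
    if n not in cache:
        cache[n] = fib_memo(n - 1, cache) + fib_memo(n - 2, cache)
    return cache[n]
```

Note that the memoized version still recurses about n levels deep, so the earlier warning about huge values blowing the call stack still applies to it.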

Saturday, August 6, 2011

Letting go of views

Well, the name of the blog IS Under a Boddhi Tree, so here goes my first non-software engineering post.

The recent political theatre and downgrading of America's credit rating has made me become upset at what I see as ignorance, greed, and political showmanship over the welfare of the nation and its people. I couldn't fathom the levels of ignorance that I also saw regarding facts on the situation. For example, a Yahoo Finance News web report consistently said that S&P downgraded America's credit due to not enough spending cuts. However, if you actually read the S&P report, you will see that it said that there should have been a balance of spending cuts and raising revenues.

Unfortunately, people see only what they want to see, and people hear only what they want to hear. If they are conservatives, they only want to hear statements supporting their beliefs (no new taxes, ending social welfare programs, government doesn't create jobs, illegal immigrants are taking our jobs, gay marriage is destroying our culture blah blah), and if they are progressives or liberals, they only want to hear what they believe in (universal medical care, less need for a strong military, businesses need strong regulations, fair taxation, equal rights for all races and sexual orientations, unions only protect workers and never create unrealistic salaries, freedom of or from beliefs, etc etc). This habit of seeing only what we want is a huge reason for all suffering. In fact, I might say that the whole purpose of Buddhism is just to see things as they are, with no filters, and no expectations.

I have always considered myself pretty open-minded. When I was younger, I was a Republican, now I am a Democrat. Also when I was younger, I was an empirical material objectivist (if you couldn't sense it with your 5 senses, it didn't exist), but now I am an idealist. But I am beginning to see something now....these very labels are wrong. I really should have known this a long time ago due to the fact that I am of mixed ethnicity (Irish, German, and Polish from my dad, and Filipino, Malaysian, Chinese, Spanish, Persian from my mom). I do not consider myself white or asian...I am just a human being.

And that's the point. I am not a progressive. I am just someone who is trying to see the world as it truly is, not as I want or hope it should be. It is very hard. I feel like the world SHOULD be a certain way. But the fact is, I don't even truly know who I am. And if I don't know who I am, how am I supposed to be a judge for the way the world should be? Does that sound strange that I don't know who or what I am? People who have not practiced eastern religions are often confounded by that statement. They don't even really understand the question I think.

But asking, discovering, and knowing who and what you truly are is, I think, the true path to awakening. I am definitely not a label. I am not a progressive, or a democrat, or a computer scientist or engineer. These are facets perhaps, but they must be dropped. Letting go of these views of my 'self' helps me understand who I truly am. Otherwise, these views separate me from knowing reality as it is. I have tried to tell others before that it is not your belief in Christianity, Judaism, Islam, Hinduism, Conservatism, Liberalism, Socialism, Capitalism, etc that is wrong. It is belief in and of itself that is wrong. If you cannot know it and understand it for yourself, then it is an illusion, a desire that ignores reality for fantasy instead.

So who am I? I do not yet know, but I have traveled a little bit across the stream. I have understood that I am not my thoughts or feelings. I am still groping in the dark, but at least I know that I am in the dark. And I am beginning to see that getting mad at all this political theatre is pointless. Trying to get people to understand me is pointless. Trying to convince others about anything is also pointless. I know all about Maslow's hierarchy, but chasing after all these "should be's" and "I want's" is in itself suffering.

All that matters is being aware, seeing how things are, and discovering what we truly are.

Saturday, July 23, 2011

Got my remote process server working.

So I made a few changes, and now my client can pick up the stdout of a remote process, disconnect from it, and reattach if need be. The first problem I ran into was properly passing in the Channel object that gets generated when a connection between the client and server is made. Fixing this also had the side effect that the process on the remote machine does not actually start until the connection is made.

The thing that really threw me though was why once I made the above change, the client wasn't receiving anything over the channel. At first I thought maybe I had to make a netty ChannelFuture object, due to the asynchronous nature of netty itself.

It is however working now, including the ability to send a few commands (that go over the channel from the client to the server) to the remote process's standard input. I can now edit the command string, show the command, kill the process, or restart it. I still need to work on changing the environment variables or working directory however. The other big limitation, and one that I don't think I can or will work around, is the inability to log in as a different user.

This is where SSH shines. With SSH you can log in as a specific user and get all the permissions that user has. My server will run with the same user ID and permissions as whoever launched the initial TaskServer process. This has its cons of course, but still, I think this is ok.

The next thing I'll do is add the ability to run multiple processes per TaskServer. The trick here will be encoding the output from the different processes in such a way that the client can piece together what output came from what process. The most obvious thought that comes to mind is to simply append some key to the line of output (since I am using netty's line delimiter encoder/decoder). This would require the ability to filter the incoming line to the client, and I think that could be a speed issue. A second option could be to not use line delimiter, and roll my own encoder/decoder, but that will be quite a bit more work.
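The first option can be sketched in a few lines of Python (used here instead of Java/Netty only for brevity). Everything in it is an assumption for illustration: the function names, the "|" separator, placing the key at the front of the line, and the requirement that process keys never contain the separator:

```python
def tag_line(process_key, line):
    """Server side: mark one line of output with the process it came from."""
    return "%s|%s" % (process_key, line)


def demux(tagged_lines):
    """Client side: group tagged lines back into per-process output."""
    outputs = {}
    for tagged in tagged_lines:
        # Split on the first separator only, so the payload may contain "|".
        key, _, line = tagged.partition("|")
        outputs.setdefault(key, []).append(line)
    return outputs
```

The filtering cost on the client is one split per line, so whether it becomes a speed issue depends mostly on how much output the processes produce.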

Still, I think the little server is kinda cool. I still need to officially give it a license, and I've been leaning towards the Eclipse Public License. I'm also considering switching to using Git, and having it hosted on github or some other service. I'm starting to learn git for a couple of reasons:

1. It appears to be more popular than mercurial
2. Using jgit, it is easy to provide as a client
3. I want to use gerrit for code reviews

Many projects appear to be moving towards git. Eclipse now uses egit/jgit as the default Team provider (instead of SVN). And although Sun seemed fond of mercurial, many Java projects are using git. Clojure and Android are two popular ones of interest to me that now use git. Git does seem more complicated than mercurial, not that I'm an advanced user of mercurial by any means. But having this project use git will help me learn git as well.

Wednesday, June 29, 2011

Piping stdout and stdin to/from a socket

I'm kind of surprised no one has tried doing this before... or at least I haven't found anyone attempting to do this. Perhaps it's too closely related to SSH for people to bother. Basically, what I am trying to do is create a class (in Java) that launches a subprocess, and in which the stdout, stderr and stdin of the process are all linked to a network socket. The key advantage of this over SSH is that if you close your terminal, you don't also kill your process(es) (though I'm not sure if there's already a way to do this in SSH). And of course, this will also work on the Windows platform.

I'm using the Netty project currently as my low level NIO network framework. It's not too bad, but I do seem to be having some trouble getting the asynchronous event driven nature of Netty working with the thread reading mechanic of the stdout and stdin reader of the subprocess. Basically, in order to read the stdout of the subprocess, I have a separate thread running which is constantly checking to see if there's anything in the process's InputStream (yeah, confusing...go look at the java.lang.Process class, but stdout is actually an InputStream).

So what I have to do is somehow, in the Runnable's run() method, pass Netty's Channel object to the Thread object and write the output to the channel. I can currently see when a client connects to the server, and the server can send a basic message, but I can't seem to pass the (shared) channel object (which I assign when the channel is connected) and have the stdout reader thread be able to use it.
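The general idea can be sketched in Python rather than Java. None of this is Netty's API; forward_output, start_forwarder, and the send callback are hypothetical names. The point of the sketch is that the writer is handed to the reader thread when the thread is created, instead of being looked up from a shared field later:

```python
import threading


def forward_output(stream, send):
    """Read lines from `stream` until EOF and hand each one to `send`."""
    # readline() returns b"" at end of file, which stops the iteration.
    for line in iter(stream.readline, b""):
        send(line)
    stream.close()


def start_forwarder(stream, send):
    """Run forward_output on its own thread so the caller is not blocked."""
    worker = threading.Thread(target=forward_output, args=(stream, send))
    worker.daemon = True
    worker.start()
    return worker
```

With a real subprocess, `stream` would be the stdout pipe of a `subprocess.Popen(..., stdout=subprocess.PIPE)` object, and `send` could be a connected socket's `sendall` method.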

Still, I feel like I'm on the right track. And since I think this could be a useful project, I'll open source this. Eventually, if I ever get it working, I'll port it to clojure. I'd eventually like to add a security layer (Netty has SSL support). But first things first.

Thursday, June 16, 2011

Functional python

Sadly, I don't have an opportunity to write clojure at work, but I am able to write in python, so I've been tackling some common problems in python in a more functional style. I've discovered that list comprehension, lambdas, map, and reduce are your friends. Also, writing in a functional style often means writing less iterative code, and more recursive code.

So first things first, how can these tools help? Imagine you have a list of characters, and you want to combine all of them into a string. Of course, the pythonic way to do this would be:

example = [ 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']
"".join(example)

This is of course perfectly valid. A functional approach to this would be this:

from functools import reduce  # built in on Python 2; this import is required on Python 3
reduce(lambda x,y: x + y, example)

In this case, the lambda is adding (concatenating) two strings together. The reduce function applies the lambda to the first two elements, then applies it to that result and the next element, and so on until the list is exhausted.
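To see the folding happen, here is a small sketch that records every pair reduce hands to the function (traced_concat and the steps list are names I made up for illustration):

```python
from functools import reduce

steps = []


def traced_concat(acc, item):
    """Concatenate like the lambda does, recording each intermediate call."""
    steps.append((acc, item))
    return acc + item


word = reduce(traced_concat, ["h", "e", "l", "l", "o"])
# steps is now [("h", "e"), ("he", "l"), ("hel", "l"), ("hell", "o")]
# and word is "hello"
```

Each call receives the accumulated result so far as its first argument and the next element as its second.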

Using the map() function is also handy, and can sometimes be easier to read than list comprehensions. For example, the two below are equivalent:

[ x**2 for x in (1,2,3,4) ]

list(map(lambda x: x**2, (1,2,3,4)))  # list() is needed on Python 3, where map() is lazy

However, what if you wanted to multiply some_collection[i] * another_collection[i], position by position? If you try to do this with two for clauses in a list comprehension, you won't get what you think, because the nested clauses pair every x with every y:

[ x * y for x in (1,2,3,4) for y in (10,20,30,40) ] ## try it

Instead, you can use a map here:

list(map(lambda x,y: x*y, [1,2,3,4], [10,20,30,40]))
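For comparison, here is a short sketch that puts the three forms side by side. The zip() version is an alternative not mentioned above: it lets a list comprehension walk both sequences in step.

```python
xs = [1, 2, 3, 4]
ys = [10, 20, 30, 40]

# Nested "for" clauses pair every x with every y: 4 * 4 = 16 products.
all_pairs = [x * y for x in xs for y in ys]

# zip() walks both lists in step, giving one product per position.
pairwise = [x * y for x, y in zip(xs, ys)]

# Same result as the two-sequence map() call.
mapped = list(map(lambda x, y: x * y, xs, ys))
```

Here pairwise and mapped are both [10, 40, 90, 160], while all_pairs has 16 elements.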

While the examples above have been relatively trivial, here's a more complicated scenario. Imagine that you are given an amount of change, and you are to calculate all the possible combinations of quarters, dimes, nickels and pennies you can get. For example, if you are given 27 cents, you could have:

1 quarter, 2 pennies
2 dimes, 1 nickel, 2 pennies
1 dime, 2 nickels, 2 pennies
etc etc.

So how would you go about doing this? When I first thought about this problem, I tackled it in the usual manner by trying to come up with an iterative, imperative solution. But later, once the problem was no longer fresh in my mind, I decided to come up with a recursive solution. However, it's not just one recursive function: it's a chain of recursive functions, each of which hands its remainder to the function for the next smaller coin.

So think about the problem like this.
1. I have a total. Given a number of quarters Q, I know the remainder of change (total - (25 * Q) )
2. I have a remainder. Given a number of dimes D, I know the remainder of change ( remainder - (10 * D))
3. I have a remainder. Given a number of nickels N, I know the remainder of change ( remainder - (5 * N))
4. Any remainder left must be pennies

Do you see how all the problems are similar? There are some gotchas, however. But below is the code representing my solution to this tricky problem. I used a list here as the return value so I could add lists together when one function call popped off the stack.

import pprint

def remainderNickels(total, nickels):
    # Keep adding nickels until they would cover the total
    if (nickels * 5) < total:
        newn = nickels + 1
        return remainderNickels(total, newn)
    else:
        return [ {'pennies' : total - (5 * (nickels - 1)), 'nickels' : nickels - 1 }]

def remainderDimes(total, dimes):
    remainder = total - (dimes*10)
    if (dimes * 10) < total:
        newd = dimes + 1
        if remainder >= 5:
            # Try one more dime, and break this remainder into nickels
            return remainderDimes(total, newd) + [ { 'remainder' : remainderNickels(remainder, 1), 'dimes' : dimes }]
        else:
            return [ {'pennies' : total - (10 * dimes), 'dimes' : dimes }]
    else:
        return [[]]

def remainderQuarters(total, quarters):
    remainder = total - (quarters*25)
    if (quarters * 25) < total:
        newq = quarters + 1
        if remainder >= 10:
            # Try one more quarter, and break this remainder into dimes
            return remainderQuarters(total, newq) + [{ 'remainder' : remainderDimes(remainder, 1), 'quarters' : quarters} ]
        elif remainder >= 5:
            return remainderQuarters(total, newq) + [{ 'remainder' : remainderNickels(remainder, 1), 'quarters' : quarters} ]
        else:
            # This form of print works on both Python 2 and Python 3
            print("remainder = %d" % remainder)
            return [ {'pennies' : remainder, 'quarters' : quarters }]
    else:
        return []

q = remainderQuarters(88, 1)
pp = pprint.PrettyPrinter()
pp.pprint(q)

Monday, June 13, 2011

Things I need to get better at

Being a jack-of-all-trades suits me. I enjoy being able to dabble in many diverse areas of technology. Although my knowledge is shallow in many areas, it is broad. I am a decent programmer in many languages, and in my short career I have worked with embedded device drivers and firmware, written proprietary messaging protocols over TCP/IP sockets, stored and queried information in both MySQL and H2 databases, written XMLRPC servers and RMI servers, written JNI wrappers around C shared libraries, and done many things in between. In other words, I have a pretty good view of the entire technology stack.

But there are still many areas I need to improve in. Now that I am writing more Java again, and I wish to be better at Clojure, I need to get better at the Java ecosystem. For example:

1. Get better at maven. I have built a few non-trivial maven projects, including one multi-module, but a lot of maven's finer points still elude me.
2. Javadoc commenting. I just let Eclipse auto-fill in the params and return values, but I really should know all the markup for it.
3. Annotations. I understand them in theory, but I've never written one (same goes for python decorators)
4. Unit testing. Yeah yeah, being in Test, I should know JUnit or TestNG like the back of my hand. While I realize their importance, sadly, time constraints often win. I have tried writing some TestNG unit tests, but they are not being called from maven, and I haven't had time to figure out why
5. OSGi. While I have written two eclipse plugins, I still don't truly understand a lot of OSGi. I understand what it's for (modularity to decrease coupling, and provide metadata to end jar hell), but it's such a huge beast that I need to know more

There's also a lot in general that I want to get better at or relearn:

1. C/C++. I haven't seriously written any C or C++ in a little over a year, since I was doing some JNI wrapping. The new features in C++0x look interesting, and eventually, I hope to get back to more JNI programming.
2. Advanced python. By this, I mean stuff like decorators, generators, continuations, and metaprogramming. I actually think I finally understand generators (functions that yield an iterator-like object), and how they can be used for continuation style programming. I once showed a coworker how to implement "private" methods and fields in python through implementing some of the magic methods, but that's the most metaprogramming I've done.
3. Algorithm design. While implementing a homegrown software dependency installer program, I came up with, on my own, a depth first search algorithm that could do post-order traversal. I didn't know it was called that until after I read the chapter on traversal algorithms in the book Python Algorithms: Mastering Basic Algorithms in the Python Language.
4. Concurrent programming: Most of my experience with multi-threading has simply been to spawn a new thread to prevent blocking during a long running task. Only twice have I had to share data across threads, and honestly, I'm not sure if I synchronized things right. One reason I wanted to learn clojure was for its approach to concurrent programming (even OpenGL is massively parallel in nature).
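As an illustration of the post-order traversal mentioned in the algorithm design item, here is a minimal Python sketch of ordering packages for installation. It is not the installer described above; install_order and the shape of the dependencies dict are assumptions made up for this example:

```python
def install_order(package, dependencies, seen=None):
    """Return packages in an order where dependencies come first.

    `dependencies` maps a package name to the list of packages it needs.
    Post-order: a package is emitted only after all of its dependencies.
    """
    if seen is None:
        seen = set()
    if package in seen:
        return []  # already handled (this also stops infinite loops on cycles)
    seen.add(package)
    order = []
    for dep in dependencies.get(package, []):
        order.extend(install_order(dep, dependencies, seen))
    order.append(package)
    return order
```

Given {"app": ["lib", "db"], "lib": ["core"], "db": ["core"], "core": []}, asking for "app" returns ["core", "lib", "db", "app"]. Marking a package as seen on entry keeps the function from recursing forever on a cyclic graph, although a cycle also means no fully valid install order exists.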

And while all of the above will help me in the "real" world, for my own personal desire, I still want to learn OpenGL, and get better at clojure. I also want to get better at JBoss's Netty NIO framework. Sadly, much of my spare time is spent working for work.