Friday, February 27, 2015

How to modify a clojure map from a given sequence

Okay, this turned out to be rather challenging to me.  As I've been telling some people the syntax of clojure is relatively simple, the hard part is learning how to deal with immutability.  For a little OpenStack project I was working on, I wanted to convert the service catalog that got returned as a JSON string into something a little more search-worthy.  But the problem I was having was that I wanted to modify a map based on items in a sequence.  Here's a contrived example.

Suppose I have this list:

(def people [{:first "John" :last "Doe" :age 42 :company "LinkedIn"}
                     {:first "Harry" :last "Smith" :age 31 :company "NASA"}])

and this already existing map:

(def companies {1 "Google", 2 "RedHat"})

So, how do I create a new map of companies by iterating through people, and adding the company the person belongs to?


Unlike a mutable language, where I could just directly change the map upon each iteration, that wont work in clojure.  Let me show you an equivalent example in python and show why something similar in clojure wont work.

people = [{"first" : "John", "last": "Doe", "age": 42, "company": "LinkedIn"},
                 {"first": "Harry",  "last": "Smith", "age": 31, "company": "NASA"}]
companies = {1: "Google", 2: "RedHat"}

for id, p in enumerate(people, 3):
    c_name = p.get("company")
    companies[id] = c_name     # mutates the dictionary


Do you see why a similar approach in clojure won't work?  So, my first temptation to do something similar in clojure was like this:

(for [{:keys [company]}  people
         i (range 3 5)]
    (assoc companies i company))

Yields this:
 ({1 "Google", 3 "LinkedIn", 2 "LinkedIn"} {1 "Google", 4 "LinkedIn", 2 "LinkedIn"} {1 "Google", 3 "NASA", 2 "LinkedIn"} {1 "Google", 4 "NASA", 2 "LinkedIn"})


Okay, so that's not what I want.  There are 2 problems.  The first is that I don't want the nested loop (that's why I get 4 entries).  The second and perhaps more serious is that when the for sequence calls assoc, it's only associating the key-value pair once on the map, and then "forgetting" the change on the next item in the sequence.  Remember, we're not mutating the map as we iterate through it.   So I thought, ok, let me use recur.

(defn addme[coll, m, id, keyname]
  (let [p (first coll)
          val (p keyname)]
    (if (nil? val)
      m
      (recur (rest coll) (assoc m id val) (inc id) (keyname)))))

(addme people companies 3 :company)

While this works, I hope you see that it's not the prettiest thing to look at.  It's also rather verbose.  So I scratched my head a little bit and realized I could use reduce.

Frankly, reduce had always been a little obscure to me.  I had seen it used for things like +, but + can already take multiple args.  So I was never quite clear where reduce would come in handy.  Then it dawned on me....reduce is really recursion with a function that takes two args and it "accumulates" results.

For the moment, forget about the mapping of index to company, and let's look at perhaps a simpler problem.  Here, we can map 1-27 to the letters a-z.  There's always clojure's zipmap function to do this.  If all you need to do is map one collection (as keys) to another collection (as values), it's pretty simple:

(let [alpha-int (range 1 27)
        alpha-char (for [i alpha-int] (char (+ i 96)))]
  (zipmap alpha-int alpha-char))

Another way is to use reduce.  Generally, reduce transforms a sequence into a scalar value.  But if you look at reduce, all it does is take a function that takes 2 arguments, and returns some value.  What if the thing that is returned is a sequence?  Remember, reduce initially pulls the first 2 items from your sequence, operates on those 2 values, and returns something. On the next iteration, that return value is then used as the first argument, and _one_ more item is pulled from the collection.  This continues until the sequence is empty

Clojure's reduce has a handy form of reduce that instead of pulling the first 2 items from the sequence on the first iteration (really it's recursion), you supply an optional first argument.  In that case, on the first iteration, only one item is pulled from the sequence

(defn map-seq [m val]
  (let [offset (+ val 96)  ; \a = 97  so if we have val starting at 1, it will map to \a
          c (char offset)]      ; convert 97 to \a  
      (assoc m val c))))   ; transform m by associating the new val to c

(reduce map-seq {} (range 1 27))  

Think about what's happening on the first 3 calls

  1. (map-seq {} 1) => {1 \a}  ; (assoc {} 1 \a) => {1 \a}
  2. (map-seq {1 \a} 2) => {1 \a 2 \b}  ; (assoc {1 \a} 2 b) => {1 \a 2 \b}
  3. (map-seq {1 \a 2 \b} 3) =>             ; (assoc {1 \a 2 \b} 3 c) =>; {1 \a 2 \b 3 \c}


So, getting back to our other problem, how would we use reduce?  Like the above demonstrated, we need to create a function that we pass to reduce, that takes 2 arguments.  The first argument is the map, and in this case, the second value is a 2 element vector.

; A simple function that takes a map, and a collection that is a key-value pair
(defn add-to-map [m coll]
  (let [[k v] coll]
    (assoc m k v)))

; Takes a sequence of maps, looking for a value in the map, and returns a
; mapped sequence
(defn make-indexed [keyname coll & start]
  (let [[s] (if start
                   start
                   [0])
         vals (for [m coll] (m keyname))]
    (map #(vector (+ s %) %2)  (range)  vals)))
  
(reduce add-to-map companies (make-indexed :company people 3))

Okay, some of you may be thinking that's more verbose than the recursive function addme.  However, there's an advantage to breaking this up into sub functions and using reduce.  Those 2 subfunctions, add-to-map and make-indexed can be used in other scenarios.  In fact, add-to-map can be used like zipmap

(let [pairs (map #(into [] [%1 %2]) (range 1 27) (for [c (range 1 27] (+ 96 c)))
  (reduce add-to-map {} pairs))