18 January 2018

Relational Helpers In Clojure

For us Clojure programmers, working with collections of maps is a cherished pastime and our daily sustenance. The clojure.set namespace encourages us to treat such collections as Coddian relations.

rhickey on core.set

What follows are a few additional functions that can improve readability and clarity of intent in complex data transformations. They do so by bringing strong, namespace-qualified keywords front and center, and by maintaining provenance and association throughout the transform.

Implementations of these functions are available on GitHub. They don’t merit a separate library; simply copy and paste them, or write your own versions to better suit your needs.

A better zipmap

At the beginning of a transform, we usually have to gather information from various sources and associate the pieces with one another.

(map vector ids names)

=> ([1 "Mabel"] [2 "Dipper"] ...)

The above is a common Clojure idiom for this process, also known as zipping. To anyone who has encountered it before, the intent is clear enough, but we can imagine doing better.

(relate :user/id ids :user/name names)

=> #{{:user/id 1 :user/name "Mabel"}
     {:user/id 2 :user/name "Dipper"}}

Here the intent is expressed through a dedicated verb, and is thus crystal clear, even to the uninitiated. Moreover, by forcing ourselves to provide keys, we have removed any possible ambiguity from the resulting tuples and increased the visibility (and grepability!) of the paths our data takes through our system.
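The actual implementation lives on GitHub and is not reproduced here. To give a rough idea of what relate might do, here is a minimal sketch, assuming it takes alternating key/collection pairs and, like map, stops at the shortest collection:

```clojure
(defn relate
  "Hypothetical sketch: zips alternating key/collection pairs into a set
  of maps, one map per position. Stops at the shortest collection."
  [& pairs]
  (let [ks    (take-nth 2 pairs)          ; :user/id, :user/name, ...
        colls (take-nth 2 (rest pairs))]  ; ids, names, ...
    ;; map with several collections walks them in lockstep;
    ;; zipmap then attaches the keys to the values at each position
    (set (apply map (fn [& vs] (zipmap ks vs)) colls))))

(relate :user/id [1 2] :user/name ["Mabel" "Dipper"])
```

Returning a set rather than a sequence matches the `#{...}` output above and the relational view, but it also means identical tuples collapse into one.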

Maintaining provenance and avoiding implicit ordering

Once we have obtained a collection of associative structures, we must apply one or more functions to them.

(def users
  #{{:user/id 1 :user/name "Mabel"}
    {:user/id 2 :user/name "Dipper"}})

Many parts of Clojure are designed around accretion-only processes, i.e. processes that never destroy information. Yet one of our most fundamental tools, map, which applies a function to each element of a collection, thoroughly does away with the relation between its inputs and its outputs.

(defn get-user-balance [user]
  ;; stand-in for a real lookup; may be negative when a user is overdrawn
  (- (rand-int 2000) 1000))

(map get-user-balance users)

=> (313 508 ...)

We might even be dealing with functions that are not aware of the tuple structure at all.

(require '[clojure.string :as str])

(map (comp str/upper-case :user/name) users)

=> ("MABEL" "DIPPER" ...)

But such derived data must often be tied back to the entity whence it was derived. A common way to do so falls back on zipping:

(for [[user balance] (map vector users user-balances)]
  ;; do something with each user's balance, e.g. pair it with the name
  [(:user/name user) balance])

In addition to the drawbacks of zipping discussed earlier, zipping derived data introduces an avoidable, implicit dependency: the original users collection and the derived user-balances sequence must be in identical order. This assumption can easily be violated by a well-placed filter, grouping, or partitioning operation.
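To make the ordering hazard concrete, here is a small example with hypothetical data (a third user and a negative balance that are not part of the original example). Filtering the derived balances on their own silently pairs users with the wrong numbers, whereas filtering tuples that already carry their balance keeps each value attached to its user:

```clojure
;; Hypothetical data: a vector of users and their balances, in matching order
(def users-v [{:user/id 1 :user/name "Mabel"}
              {:user/id 2 :user/name "Dipper"}
              {:user/id 3 :user/name "Soos"}])
(def balances [313 -20 508])

;; Dropping non-positive balances from the derived sequence alone
;; shifts every later element one position to the left:
;; Dipper is now paired with Soos's balance of 508
(def misaligned (map vector users-v (filter pos? balances)))

;; Attaching the balance to each tuple first makes filtering safe
(def intact
  (filter #(pos? (:user/balance %))
          (map #(assoc %1 :user/balance %2) users-v balances)))
```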

Alternatively, we could resort to the slightly awkward:

(->> users
     (map #(assoc % :user/balance (get-user-balance %))))

=> ({:user/id 1 :user/name "Mabel" :user/balance 313}
    {:user/id 2 :user/name "Dipper" :user/balance 508})

The relational approach allows us to non-destructively derive new information from tuples, using both tuple-aware and tuple-unaware functions, and without any ordering assumptions.

(->> users
     (derive :user/balance get-user-balance) ; tuple-aware
     (derive-k :user/name' str/upper-case [:user/name])) ; tuple-unaware

=> #{{:user/id 1 :user/name "Mabel" :user/balance 313 :user/name' "MABEL"}
     {:user/id 2 :user/name "Dipper" :user/balance 508 :user/name' "DIPPER"}}

derive-k in particular allows us to decouple computations from the relational structure: (derive-k attr f [k]) corresponds to SQL's SELECT *, f(k) AS attr.
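Below is a minimal sketch of how derive and derive-k might be written. It assumes, as the composition example at the end of this post suggests, that they return a transducer when called without a relation. It also assumes that derive-k applies f to the values stored under the given keys, in order. The namespace name is hypothetical. Excluding clojure.core/derive is needed because the helper reuses that name.

```clojure
(ns relational-helpers.derive-sketch
  ;; `derive` would otherwise clash with clojure.core/derive
  (:refer-clojure :exclude [derive])
  ;; only needed for the str/upper-case usage below
  (:require [clojure.string :as str]))

(defn derive
  "Hypothetical sketch: adds `attr` to every tuple, computed by calling
  the tuple-aware function `f` on the whole tuple. Without a relation it
  returns a transducer; with one, a new set."
  ([attr f] (map #(assoc % attr (f %))))
  ([attr f rel] (into #{} (derive attr f) rel)))

(defn derive-k
  "Hypothetical sketch: like `derive`, but the tuple-unaware function `f`
  receives only the values stored under the keys in `ks`, in order."
  ([attr f ks] (map #(assoc % attr (apply f (map % ks)))))
  ([attr f ks rel] (into #{} (derive-k attr f ks) rel)))

(def users #{{:user/id 1 :user/name "Mabel"}
             {:user/id 2 :user/name "Dipper"}})

(derive-k :user/name' str/upper-case [:user/name] users)
```

Because the relation is the last argument, both helpers slot into a `->>` pipeline like the one above. Neither needs to know anything about the other attributes in a tuple.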

Again, we are encouraged to give strong, global names to derived attributes, making it easy to, e.g., find all the places where a user’s balance is processed.

Powerful, declarative filters

With clojure.spec, Clojure recently gained a standardized, powerful way to describe the shape of data structures in arbitrary detail. While specs are designed for long-term use in documentation, generative testing, and many other areas, we can also use short-lived specs to filter our relations more declaratively.

(require '[clojure.spec.alpha :as s]
         '[clojure.string :as str])

(where {:user/id #(= % 2)})
(where {:user/balance #(< % 100)})
(where {:user/name (s/and #(str/starts-with? % "Ma")
                          #(str/ends-with? % "bel"))})
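Here is one way where could be built on clojure.spec. The sketch assumes that each value in the map is a spec or plain predicate, checked with s/valid? against the corresponding attribute of each tuple. Treating tuples that lack the attribute as non-matching is our own choice; it keeps predicates like #(< % 100) from being called on nil. The namespace name is hypothetical.

```clojure
(ns relational-helpers.where-sketch
  (:require [clojure.spec.alpha :as s]
            [clojure.string :as str]))

(defn- matches?
  "True when `tuple` has every attribute named in `spec-map` and each
  value is valid against its spec or predicate."
  [spec-map tuple]
  (every? (fn [[k spec]]
            (and (contains? tuple k)
                 (s/valid? spec (get tuple k))))
          spec-map))

(defn where
  "Hypothetical sketch: keeps the tuples that satisfy `spec-map`.
  Without a relation it returns a transducer; with one, a new set."
  ([spec-map] (filter #(matches? spec-map %)))
  ([spec-map rel] (into #{} (where spec-map) rel)))

(def users #{{:user/id 1 :user/name "Mabel"}
             {:user/id 2 :user/name "Dipper"}})

(where {:user/name (s/and #(str/starts-with? % "Ma")
                          #(str/ends-with? % "bel"))}
       users)
```

Since s/valid? accepts both registered specs and bare predicates, short-lived filters and long-lived, named specs can be mixed in the same map.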


Finally, our new helpers should support composition without realizing any intermediate relations.

(def xf-find-overdrawn
  (comp
    (derive :user/balance get-user-balance)
    (where {:user/balance neg?})))

(into #{} xf-find-overdrawn users)
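Assuming derive and where return transducers when given no relation, xf-find-overdrawn behaves roughly like the following composition of core map and filter transducers. The balance-by-id lookup and the xf-find-overdrawn* name are hypothetical stand-ins that make the result reproducible:

```clojure
;; Hypothetical, deterministic balance lookup keyed by user id
(def balance-by-id {1 313, 2 -42})

(def users #{{:user/id 1 :user/name "Mabel"}
             {:user/id 2 :user/name "Dipper"}})

;; Roughly what (derive ...) and (where ...) could expand to:
;; a `map` step that adds an attribute and a `filter` step that tests it
(def xf-find-overdrawn*
  (comp (map #(assoc % :user/balance (balance-by-id (:user/id %))))
        (filter #(neg? (:user/balance %)))))

;; `into` runs both steps in a single pass; no intermediate set is built
(def overdrawn (into #{} xf-find-overdrawn* users))
```

Only the final `into` realizes a collection, which is what lets the relational helpers be chained freely without paying for a new set at every step.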