Showing posts with label xebia. Show all posts

Monday, March 28, 2011

JIT Adjusted Map

One of the things I am currently working on is a Scala Kyoto Cabinet API. Kyoto Cabinet is a C++ library for accessing a very fast persistent key value store. It has a Java library, and as a consequence you can use it from Scala without a problem, but the standard Java API isn't really all that Scala-esque.

Kyoto Cabinet DB as a Map

A key value store is not all that different from a mutable Map in Scala. You pass in a key, and a value comes out. That means you can actually wrap a Kyoto Cabinet DB object in an implementation of Scala's mutable Map interface.

However, the Kyoto Cabinet API only supports storing two types of values: Strings and byte arrays. In both cases, the key and the value need to be of the same type. That means that - without doing any transformations - you can only wrap its DB object inside a Map[String,String] or a Map[Array[Byte], Array[Byte]]. That clearly leaves a lot to be desired.


Adapters


So, whatever I am going to do, it should at least (1) allow me to wrap a DB object in a Map and (2) allow me to transform keys and values to the appropriate type. I want to be able to access a Kyoto Cabinet DB as a Map[Int,Date], if I feel like it.

Instead of addressing both concerns inside a single class, I eventually opted for factoring it out into separate classes. It seemed having a mutable Map abstraction that on the fly transforms its keys and/or values to an alternative type would be useful in other circumstances as well.

The result works like this:

import scala.collection.mutable.Map
import nl.flotsam.collectionables.mutable.AdaptedMap._
val original = Map("1" -> "a", "2" -> "b", "3" -> "e")
val adapted = original.mapKey(_.toInt)(_.toString)
adapted += (1 -> "foobar")

How is this different from just mapping it?


Note that this is definitely not the same as this:

val adapted = original.map{ case (x,y) => (x.toInt, y) }
adapted += (1 -> "foobar")

In the second case, the entire original map is replaced by a new map. After the transformation, all keys have been transformed to an Int. In the former case, operations on 'adapted' taking keys of type Int will transform the key on the fly to the same operation taking a key of type String on the underlying Map. In some cases, transforming the entire Map in a single go might be the better option. But if you have a Kyoto Cabinet database with millions of records, then this is the last thing you want to do.
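To make the difference concrete, here is a minimal sketch. KeyView is a hypothetical, stripped-down stand-in for the real AdaptedMap: it adapts keys only and does not implement the full mutable Map interface. It is only meant to show that a write through the view lands in the underlying map, while a mapped copy lives on its own.

```scala
import scala.collection.mutable

// Hypothetical stand-in for AdaptedMap: converts keys on the fly, copies nothing.
class KeyView[A, AA, B](underlying: mutable.Map[AA, B], a2aa: A => AA, aa2a: AA => A) {
  def get(key: A): Option[B] = underlying.get(a2aa(key))
  def put(key: A, value: B): Unit = underlying.update(a2aa(key), value)
  def keys: Iterator[A] = underlying.keysIterator.map(aa2a)
}

object KeyViewDemo {
  val original = mutable.Map("1" -> "a", "2" -> "b", "3" -> "e")

  // View: every call converts the Int key to a String and hits `original`.
  val view = new KeyView[Int, String, String](original, _.toString, _.toInt)
  view.put(1, "foobar")

  // Copy: a brand-new map that no longer shares state with `original`.
  val copy = original.map { case (k, v) => (k.toInt, v) }
  copy += (2 -> "changed")
}
```

After this runs, original contains "foobar" under the key "1", because the write went through the view. The change made to copy is not visible in original.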

Show me the code

This is the latest version of the code. It is still in flux, but you get the picture:

class AdaptedMap[A, B, AA, BB](decorated: Map[AA, BB],
                               a2aa: (A) => AA,
                               b2bb: (B) => BB,
                               aa2a: (AA) => A,
                               bb2b: (BB) => B) extends Map[A, B] {

  def iterator = new AdaptedMapIterator[A, B, AA, BB](decorated.iterator, aa2a, bb2b)

  def get(key: A) = decorated.get(a2aa(key)) map (bb2b(_))

  def -=(key: A) = {
    decorated -= a2aa(key)
    this
  }

  def +=(kv: (A, B)) = {
    val (key, value) = kv
    val adapted = (a2aa(key), b2bb(value))
    decorated += adapted
    this
  }

  def mapKey[C](a2c: (A) => C)(implicit c2a: (C) => A) = {
    def c2aa(c: C) = a2aa(c2a(c))
    def aa2c(aa: AA) = a2c(aa2a(aa))
    new AdaptedMap[C, B, AA, BB](decorated, c2aa, b2bb, aa2c, bb2b)
  }

  def mapValue[C](b2c: (B) => C)(implicit c2b: (C) => B) = {
    def c2bb(c: C) = b2bb(c2b(c))
    def bb2c(bb: BB) = b2c(bb2b(bb))
    new AdaptedMap[A, C, AA, BB](decorated, a2aa, c2bb, aa2a, bb2c)
  }

}

/**
 * Provides the implicit conversion that turns an existing mutable Map into an AdaptedMap, allowing
 * you to invoke mapKey and mapValue on it.
 */
object AdaptedMap {

  implicit def map2adaptable[A, B](map: Map[A, B]) =
    new AdaptedMap[A, B, A, B](map, identity, identity, identity, identity)

}

Sunday, March 27, 2011

Groovy Int operations in Scala

Today, I briefly opened a book on Groovy, looked at the first line, and it said something like this:

10.times { print it }

... and I realized Scala doesn't have it. Now obviously, you can do this:

(1 to 10) foreach(println(_))

... but that seems slightly more complicated than what Groovy has to offer. No worries. Let's fix that:

class SmartInt(i: Int) {
  def times(block: Int => Unit): Unit = (1 to i) foreach { j => block(j) }
  def times(block: => Unit): Unit = (1 to i) foreach { _ => block }
}
implicit def int2SmartInt(i: Int) = new SmartInt(i)

Now, I can call times on an Int, passing in either a parameterless block, or a function accepting an Int.

3 times println("foo")
3 times { println("foo") }
3 times { println(_) }
3 times { i => println(i) }

Note that SmartInt defines two operations called times(...); one with and one without a parameter. I figured that - in case of times(...) - it would be pretty normal to have it accept a function that ignores the value. If we only had the first operation, you would always have to capture the parameter and then ignore it. With the second version of times(...), you can pass in an arbitrary expression ignoring the parameter carrying the current element.

Thursday, March 24, 2011

Scalatra, SBT and MongoDB

Last week I did a presentation on NoSQL at bol.com. In order to make it a little bit more compelling, I figured I would throw in a demo on how to use MongoDB for real - but I obviously didn't feel like doing it using Java.

So, behold, here is the entire catalog.

import javax.servlet.ServletConfig
import com.mongodb.casbah.Imports._
import scala.xml._
import org.scalatra._
import scala.util.control.Exception._

class WebApp extends ScalatraServlet {

  val missing = "http://cdn2.iconfinder.com/data/icons/august/PNG/Help.png"
  val mongo = MongoConnection()
  val coll = mongo("amazon")("products")

  get("/products") {
    val numberFormat = catching(classOf[NumberFormatException])
    val limit = numberFormat opt request.getParameter("limit").toInt
    val offset = numberFormat opt request.getParameter("offset").toInt
    <html>
    <head>
      <style type="text/css">
        body {{ font-family: Calibri; }}
      </style>
      <title>Products</title>
    </head>
    <body>
    <ul>
    {
      val items = coll.find().limit(limit getOrElse 10).skip(offset getOrElse 0)
      for (item <- items) yield {
        val set = item.as[DBObject]("ItemAttributes")
        val authors = set.getAs[BasicDBList]("Author") map(_.mkString(", ")) getOrElse("No authors")
        val title = set.as[String]("Title")
        val publisher = set.getAs[String]("Publisher") getOrElse("No publisher")
        val img: String = item.getAs[DBObject]("SmallImage") flatMap(_.getAs[String]("URL")) getOrElse(missing)
        <li>
          <img src={img}/>
          <b>{title}</b>
          <span> ({publisher})</span>
          <em> {authors}</em>
        </li>
      }
    }
    </ul>
    </body>
    </html>
  }

}

Okay, it's just a single page, but the first lesson learned is that the combination of Scalatra, SBT and MongoDB gives you a lot of bang for the buck.

Now, I could easily imagine that it is quite hard to digest everything in a single go, so I am going to explain a couple of things.

Lesson learned 2: Dealing with exceptions

One way of dealing with exceptions in Scala is to use a try-catch block. I am not even going to discuss that, because it's pretty much the same as in Java, apart from the fact that in Scala it's less code.

In my particular case however, I had to see if some parameters would be present in the request. I could have created a complicated conditional block containing a try-catch block to capture NumberFormatExceptions, but that would be a lot of code.

Instead I did this:


    val numberFormat = catching(classOf[NumberFormatException])
    val limit = numberFormat opt request.getParameter("limit").toInt
    val offset = numberFormat opt request.getParameter("offset").toInt

First I defined an object called numberFormat by calling the catching factory method on the scala.util.control.Exception object, passing in the type of exception I want to have handled. The object returned offers several ways of handling blocks of code that may throw these exceptions. The method I am using here is opt.

The 'opt' method takes a by-name parameter that will be evaluated by the operation itself. Once it is evaluated, it will wrap the result in a Some, and return that Option. That is, unless a NumberFormatException occurred. In that case it will return None. Later on, I am calling getOrElse(...) on that Option, to supply a default value in case it is None.

So in terms of Java, I am doing this:

int limit = 0;
try {
  limit = Integer.parseInt(request.getParameter("limit"));
} catch (NumberFormatException nfe) {
  limit = 10;
}

The whole construct in Scala is getting reduced to:

val limit = 
  (numberFormat opt request.getParameter("limit").toInt) getOrElse(10)

To me, that looks a lot more sensible. The entire policy for dealing with the exception has now been encoded in a library class.
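If you want to try this outside of a servlet, here is a small self-contained sketch of the same idea. The ParamParsing object and the intParam helper are made up for illustration; only catching and opt come from the Scala standard library.

```scala
import scala.util.control.Exception.catching

object ParamParsing {
  // Hypothetical helper: turn a raw request parameter into an Int,
  // falling back to `default` when it is missing (null) or not a number.
  def intParam(raw: String, default: Int): Int =
    catching(classOf[NumberFormatException]).opt(raw.toInt).getOrElse(default)
}
```

Calling ParamParsing.intParam("25", 10) yields 25, while a non-numeric value such as "abc" falls back to 10. Only NumberFormatException is caught; any other exception still propagates.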

Lesson learned 3: Accessing MongoDB from Scala is Easy


Accessing MongoDB from Scala is pretty easy, thanks to the Casbah library. One of the things that I found a little hard to grasp at first is what to expect from the object model returned from MongoDB. If you don't have a clue what the MongoDB Java drivers would have normally returned, then figuring out what to expect from Casbah can be a little challenging. I think I'm getting the hang of it now though.

These expressions might seem a little bewildering at first:

val publisher = set.getAs[String]("Publisher") getOrElse("No publisher")
val img = item.getAs[DBObject]("SmallImage") flatMap(_.getAs[String]("URL")) getOrElse(missing)

but actually Scala is helping a lot in these cases. In my database schema, a lot of fields are optional. In Java, you would have no other option than getting the value, storing it in a variable, checking if it is null, and then continuing based on the outcome. If your data is tucked away deep inside your document, then you would have pages of code in no time.

In Scala, with Scala's support for Options, it is actually quite easy. No need to capture results in variables before being able to move on. The Option allows you to keep on chaining operations to the result of previous operations. (By the way, the flatMap operation on the second line is going to make sure that instead of getting an Option[Option[String]], I end up getting an Option[String]. On that result,  I can invoke getOrElse and pass a default value.)
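The difference between map and flatMap is easier to see without MongoDB in the way. The sketch below uses two made-up case classes (Item and Image) as a stand-in for a document with optional fields. It is not Casbah code, just plain Options.

```scala
object OptionChaining {
  // Hypothetical model of a document in which both the image and its URL are optional.
  case class Image(url: Option[String])
  case class Item(title: String, smallImage: Option[Image])

  val missing = "missing.png"

  // map would nest the Options: Option[Option[String]]
  def nestedUrl(item: Item): Option[Option[String]] = item.smallImage.map(_.url)

  // flatMap flattens them, so a single getOrElse supplies the default
  def imageUrl(item: Item): String = item.smallImage.flatMap(_.url).getOrElse(missing)
}
```

An Item without an image, and an Item whose image has no URL, both end up with the default value, without a single null check.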

Lesson learned 4: Scalatra is simple

In all honesty, I have only scratched its surface, and it's questionable if you would ever create a huge web application this way, but it really is a 'hit-the-ground-running' experience.

Lesson learned 5: SBT makes it even sweeter


This is the way it works. You start sbt, and then type:

jetty-run
~prepare-webapp

From that point on, SBT will examine changes in your sources, and for every change immediately recompile your code and replace the existing web app. Way faster than you would imagine.


Wednesday, January 5, 2011

Clojure versus Scala (part 2)

In my previous post, I went over all of the basics introduced by the authors of "Clojure: functioneel programmeren". In the second part of their first article, they build a Last.fm client, based on the programming concepts introduced before. Let me do the same thing for Scala.

Build environment


Clojure has Leiningen, but I bet Maven is supported as well. Same goes for Scala: there are people using Rake or Gradle, and of course there's SBT (discussed before). However, for people coming from a Java world, Maven works just as well.

So, to start a Scala Maven project, just type this on the command line:

mvn archetype:generate -DarchetypeCatalog=http://nexus.scala-tools.org/content/groups/public

... and choose the simple Scala project. Fill out the basic details, and you will have something working. (Now, this is a command that you're going to use more often. This might be a good time to turn it into a key macro.)

In order to make sure you can download the proper libraries, you obviously need to add the repo and a dependency:


        
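    <repositories>
        <repository>
            <id>xebia-maven</id>
            <url>http://os.xebia.com/maven2</url>
        </repository>
    </repositories>
...
    <dependencies>
        <dependency>
            <groupId>net.roarsoftware</groupId>
            <artifactId>last.fm-bindings</artifactId>
            <version>1.0</version>
        </dependency>
    </dependencies>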

Namespace


The next thing the authors do is talk about namespaces for a while. They mention that in Clojure, namespaces are first-class citizens. I guess the same applies to Scala as well. However, you cannot add new symbols to a package, as Clojure allows you to do.

Listing the top tracks


This is the Clojure version:

(defn top-tracks
  [user-name api-key]
  (User/getTopTracks user-name api-key))

This is the Scala version:

def topTracks(user: String, apiKey: String) =
  getTopTracks(user, apiKey).toSeq

Now, the above only works since I imported all of the User's functions somewhere else (so getTopTracks has been pulled into scope). And I can only invoke toSeq on the results of getTopTracks (normally a java.util.Collection) because of an import of some implicit conversions:

import net.roarsoftware.lastfm.User._
import net.roarsoftware.lastfm.Track
import scala.collection.JavaConversions._

Converting Track to a String


This is the Clojure version:

(defn track-to-str
  [track]
  (let [track-name (.getName track)
        artist-name (.getArtist track)]
    (str track-name " by " artist-name)))

This is the Scala version:

def trackToString(track: Track) =
    track.getName + " by " + track.getArtist

Numbering a list of items


This is the way the authors do it in Clojure:

(defn number-a-sequence
  [seeq]
  (map-indexed #(str (+ 1 %1) " " %2) seeq))

This is the Scala version. Basically, what it's doing is first creating a sequence of tuples, each consisting of the element itself followed by its index. And then it goes on to map each individual tuple to a String.

def numberASequence(seq: Seq[Any]) =
  seq.zipWithIndex.map({
    case (elem, index) => (index + 1) + " " + elem
  })

Building HTML


Again, Clojure:

(defn to-html
  [str-seeq]
  (let [ header "<html><body>"
         footer "</body></html>"]
    (str header (reduce str (map #(str % "<br />") str-seeq)) footer )))

And this is Scala:

def toHtml(list: Traversable[Any]) =
  <html>
    <body>{list.map(item => <p>{item}</p>)}</body>
  </html>

In this case, it might be worth noting that the Scala version is actually building XML, whereas the Clojure version is generating a String. Building XML is a little safer: if the text included in your XML contains special characters, then Scala's XML support will guarantee that those special characters are getting escaped properly. (Who knows, perhaps there is an artist called "".)

Conclusion


This basically constitutes everything discussed in the Clojure article. They conclude that a Clojure program like this required less than 25 lines of code. I think it's fair to say that both Scala and Clojure are in good shape in that regard. I have counted the LoC of the Scala version, and it adds up to 22.

So, which one is the winner? I think it's inconclusive. I like the fact that Scala is statically typed, without a significant penalty. The number of lines of code is roughly the same. What do you think?

(Full source code is here.)

Tuesday, January 4, 2011

Clojure versus Scala (part 1)

This is just a brain dump, after having read an excellent article on Clojure by Maurits and Sander in the Dutch Java magazine. Admittedly, without having access to the original article, it probably isn't of any use, but I just wanted to jot it down here, for future reference.

Hello World


Let's start with their simple Hello world example. This is the Clojure version:

(def hello (fn [target] (println (str "Hello " target))))
(hello "Clojure")

or the short version:

(defn hello [target] (println (str "Hello " target)))
(hello "Clojure")

... and this is the Scala version - even shorter:

def hello(target: Any) = println("Hello " + target) 
hello("Scala")

I'm happy to say that the Scala version is shorter in number of characters, even though the type of parameter 'target' had to be specified explicitly.

Doc String


Next they explain the purpose of the doc string. Now this is something that I truly miss in Scala. It would be totally awesome to have the ability to pull up documentation on a function from the REPL, but it doesn't exist. Scala does have scaladoc, but that's all thrown away at compilation time. It should be possible to store some of this in an AttributeInfo in the class file, but Scala doesn't do that.

This is clearly an area in which Clojure's ancestry of LISP and Scala's ancestry of Java shows.

First class functions


Before going any further, we first need to have a square function, defined in the article like this:

(defn square [x] (* x x))

Now, I would love to say that defining square in Scala is just as easy. If you would only define square for integers, it would be:

def square(x: Int) = x * x

... but we obviously want it to work for doubles as well, as I suspect the Clojure version does.

Now, this is the Scala version that supports any numeric type:

def square[T](x: T)(implicit numeric: Numeric[T]): T = 
  numeric.times(x, x)

In Scala 2.8.1 there is a shorthand notation for the same thing:

def square[T: Numeric](x: T) = 
  implicitly[Numeric[T]].times(x, x)

Not quite as intuitive as you would have expected, but it works quite well:

scala> square(4.0)
res1: Double = 16.0

scala> square(5)
res2: Int = 25

Once the square function has been defined, the article explains that functions like square could be passed to a function called twice, with twice being defined like this:

(defn twice [f a] (f (f a))) 
(twice square 2)

This is the Scala version:

def twice[T](a: T)(f: (T) => T) = f(f(a)) 
twice(4)(square)

It may look a little awkward at first, but this is the only way you get it to work without having to pass in additional type information. (Check this for more information.)

Here's an alternative that does not work:

scala> def twice[T](a: T, f: (T) => T) = f(f(a)) 
twice: [T](a: T,f: (T) => T)T

scala> twice(4, square)         
:8: error: could not find implicit value for evidence parameter of type Numeric[T]
       twice(4, square) 
                ^

If you insist on defining twice like this - perhaps because of its resemblance to the Clojure version - then the only option you have is to call twice like this:

scala> twice(2, square[Int])
res70: Int = 16

Data structures


Onward to lists. Clojure example to produce a list:

(list 1 2 3)

... and then a couple of alternatives for doing a Scala List:

List(1, 2, 3)
1 :: 2 :: 3 :: Nil

Adding an item to a list in Clojure:

(conj (list 1 2 3) 4)

... and the same in Scala:

4 :: List(1, 2, 3)

In Scala, Lists are typed. You can mix elements of different types, but then you end up with a List[Any]. So, this is fine:

scala> 1 :: "a" :: 2 :: "b" :: Nil
res74: List[Any] = List(1, a, 2, b)

scala> List(1, "a", 2, "b")
res75: List[Any] = List(1, a, 2, b)

While Clojure does not allow you to directly access one of the elements unless you use a vector, Scala does allow you to get the nth element of a list:

scala> val list = List(1, "a", 2, "b")
list: List[Any] = List(1, a, 2, b)

scala> list(0)
res80: Any = 1

scala> list(2)
res81: Any = 2

scala> list(1)
res82: Any = a

Now, even though Scala does allow you to do it - that doesn't mean you should feel encouraged to code like that; it also defines a Vector, which is - just like the one in Clojure - way more optimized for random access.
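As a small illustration (the object and value names are made up), indexing looks exactly the same on both collections. What differs is the cost: on a List, Scala has to walk the links from the head, whereas a Vector gets to the element in effectively constant time.

```scala
object RandomAccess {
  val list = List(1, 2, 3, 4, 5)
  val vector = Vector(1, 2, 3, 4, 5)

  // Same syntax on both; the List walks three links to get here, the Vector does not.
  val fromList = list(3)
  val fromVector = vector(3)
}
```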

Maps


This is the Clojure version of defining a map:

(def mymap '{:aap "monkey" :ezel "donkey" :walvis "whale" :onbekend "platypus"})

(:ezel mymap)

... and the Scala version:

val mymap = Map("aap"->"monkey", 
  "ezel"->"donkey", 
  "walvis"->"whale", 
  "onbekend"->"platypus")
mymap("ezel")

Updating the map is a little different from what you are used to, if you come from the Java space. In fact, in that sense, it's not unlike Clojure. In Clojure, you update a map like this:

(assoc mymap :onbekend "unknown")

In Scala it's done slightly differently, but the net effect is the same: a new map, containing all of the previously defined entries, with the value for "onbekend" replaced.

scala> mymap + ("onbekend" -> "unknown")
res90: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map((aap,monkey), (ezel,donkey), (walvis,whale), (onbekend,unknown))
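
To double check that the original map really is left alone, here is a minimal sketch using a shortened version of the map above (the ImmutableUpdate object is just a wrapper for illustration):

```scala
object ImmutableUpdate {
  val mymap = Map("aap" -> "monkey", "onbekend" -> "platypus")

  // + returns a new immutable Map; mymap itself is not modified
  val updated = mymap + ("onbekend" -> "unknown")
}
```

Here mymap("onbekend") is still "platypus", while updated("onbekend") is "unknown". Both maps have two entries, since the existing key got replaced rather than added.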