Wednesday, April 6, 2011

Scala IO versus Guava: The Basics

A friend of mine once said that everything in life was about search and sort. Thinking about it for a while, it seems he's right. Almost. The rest is about IO.

IO in Java

The question is how you do IO. Long, long ago, probably before Java 1.2, Java's IO classes were sketchy, to say the least. Later versions solved some of that (introducing Readers and Writers), and eventually, with Java 1.4, we got Java NIO. If all goes well, we will soon have the new NIO as well.

IO in external libraries

Nevertheless, in many cases, people still rely on external libraries to make their lives a little easier. Commons IO has been a popular choice for some time, and at some point, Guava also added some IO abstractions to its libraries.

IO in Scala

It makes you wonder about Scala's IO classes. At first, it doesn't look too good. The 'scala.io' package has a Source class that eases reading files, doing some automatic resource management. That's good. But then it turns out the abstraction returned is an Iterable. And you don't want an Iterator traversing the contents of your file: if it bails out halfway, you are left with an open file handle for the rest of the existence of your VM instance. In fact, if you search StackOverflow, you will quickly find many complaints about scala.io being broken, or about scala.io being still broken.
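For completeness: you can use scala.io safely, as long as you consume the Iterator and close the Source yourself. A minimal sketch (the file name handling is just an example):

```scala
import scala.io.Source

// Count the lines of a file, making sure the handle is released.
def countLines(path: String): Int = {
  val source = Source.fromFile(path)
  try {
    source.getLines().size // consume the iterator while the file is still open
  } finally {
    source.close() // without this, bailing out mid-iteration leaks the file handle
  }
}
```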

Scala's New IO

But there might be hope out there. There is a Scala library that seems to address some of the concerns normally addressed by the libraries I mentioned, including decent support for automatic resource management. The name of the library: scala-io. I know. It might be a good idea to change the name.

What does it give you?

Scala IO is first of all built on top of scala-arm, the library providing the foundation for automatic resource management. On top of that, it gives you quite a bit of goodness for reading and writing bytes and text. In this post, I will go over some of its features, comparing it to how the same thing is done with Guava:

Copying an InputStream into a Byte Array

This is how it's done in Guava:

InputStream in = ...;
byte[] buffer = ByteStreams.toByteArray(in);

And this is the same thing, done in Scala IO:

val in: InputStream = ...
Resource.fromInputStream(in).byteArray

Similar, but there is a big difference: in the first case, the stream is not closed; in the second case, it is.

InputSuppliers

Guava has an abstraction that allows you to pass around an object providing access to an InputStream. The InputStream itself is not opened yet, but it will get opened once you ask the object to give you the input. The good thing about it is that the code that opens the stream can also be responsible for closing it, without having to know how the stream got opened:

public interface InputSupplier<T> {
    T getInput() throws IOException;
}

In a way, a Scala IO Resource is an InputSupplier and/or an OutputSupplier. However, there is no need to implement an interface to defer the construction of the actual underlying object providing or accepting bytes. Instead, you just pass in a block of code that will get evaluated right before you are about to read or write your bytes, leveraging Scala's by-name parameters.
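The essence of that trick is easy to sketch. The class below is not scala-io's actual implementation, just a minimal illustration of the by-name pattern it relies on:

```scala
import java.io.Closeable

// `open` is a by-name parameter: the underlying stream is only
// constructed when acquireFor is invoked, not when the wrapper is built.
class DeferredResource[A <: Closeable](open: => A) {
  def acquireFor[B](f: A => B): B = {
    val resource = open // evaluated here
    try f(resource) finally resource.close()
  }
}
```

Because `open` is re-evaluated on every call, the same wrapper can be used repeatedly, each time opening and closing a fresh stream.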

So you could do something like this:

Resource.fromInputStream(new FileInputStream(...))

...without the file already getting opened. As a consequence you can access the Resource multiple times without running into trouble. The FileInputStream will be closed after you have acted on it, but you can still 'reopen' it afterwards.

Filling a byte array

In some cases, all you want to do is fill an existing byte array. In Guava, this is how you would do it:

InputStream in = null;
try {
  in = ...
  byte[] buffer = new byte[100];
  ByteStreams.readFully(in, buffer);
} finally {
  Closeables.closeQuietly(in);
}

In Scala IO, it's quite a bit easier:

val in: InputStream = ...
val buffer = new Array[Byte](100)
Resource.fromInputStream(in).bytes.copyToArray(buffer)

Note the absence of a try/finally block. First a Resource is created, then we obtain a bytes view on that object, and then we use Traversable's copyToArray method to copy the data into the array.

Copy InputStream to OutputStream

This is how you do it in Java using Guava:

InputStream in = ...;
OutputStream out = ...;
try {
  ByteStreams.copy(in, out);
} finally {
  Closeables.closeQuietly(in);
  Closeables.closeQuietly(out);
}

This is the same thing done in Scala IO:

val in: InputStream = ...
val out: OutputStream = ...
Resource.fromInputStream(in).copyData(Resource.fromOutputStream(out))

Seems rather verbose. And as a matter of fact, it doesn't need to be this way. If you import a number of implicits, then the above can be expressed like this as well:

in.asInput.copyData(out.asOutput)

There are implicits turning the InputStream into an Input object with the copyData operation, and similar implicit conversions turning the OutputStream into an Output object that copyData can write to.
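The mechanism behind asInput is Scala's implicit conversion (the 'pimp my library' pattern). A hand-rolled sketch of the idea; the names RichInput and enrichInput are made up here, not scala-io's real API:

```scala
import java.io.{InputStream, OutputStream}

// Hypothetical enrichment that gives any InputStream a copyData method.
class RichInput(in: InputStream) {
  def copyData(out: OutputStream): Unit = {
    val buffer = new Array[Byte](4096)
    var read = in.read(buffer)
    while (read != -1) {
      out.write(buffer, 0, read)
      read = in.read(buffer)
    }
  }
}

implicit def enrichInput(in: InputStream): RichInput = new RichInput(in)
```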

Reading a String

This is how it's done in Guava:

InputStream in = ...;
String content = null;
try {
  content = CharStreams.toString(new InputStreamReader(in, "UTF-8"));
} finally {
  Closeables.closeQuietly(in);
}

... and this is the same thing, done in Scala IO:

val in: InputStream = ...
val content = Resource.fromInputStream(in).slurpString(Codec.UTF8)

or, alternatively:

val in: InputStream = ...
val content = in.asInput.slurpString(Codec.UTF8)

Monday, March 28, 2011

JIT Adjusted Map

One of the things I am currently working on is a Scala Kyoto Cabinet API. Kyoto Cabinet is a C++ library for accessing a very fast persistent key value store. It has a Java library, and as a consequence you can use it from Scala without a problem, but the standard Java API isn't really all that Scala-esque.

Kyoto Cabinet DB as a Map

A key value store is not all that different from a mutable Map in Scala. You pass in a key, and a value comes out. That means you can actually wrap a Kyoto Cabinet DB object in an implementation of Scala's mutable Map interface.

However, the Kyoto Cabinet API only supports storing two types of values: Strings and byte arrays. In both cases, the key and the value need to be of the same type. That means that - without doing any transformations - you can only wrap its DB object inside a Map[String, String] or a Map[Array[Byte], Array[Byte]]. That clearly leaves a lot to be desired.
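To make the first concern concrete, here is a sketch of wrapping a raw String-to-String store in Scala's (2.9-era) mutable Map interface. The StringStore class below is an in-memory stand-in for Kyoto Cabinet's DB, not its real API:

```scala
import scala.collection.mutable

// In-memory stand-in for a store that only speaks String, like Kyoto Cabinet's DB.
class StringStore {
  private val data = mutable.Map.empty[String, String]
  def set(key: String, value: String): Unit = data(key) = value
  def get(key: String): Option[String] = data.get(key)
  def remove(key: String): Unit = data -= key
  def entries: Iterator[(String, String)] = data.iterator
}

// Expose the store through Scala's mutable Map interface.
class StoreMap(store: StringStore) extends mutable.Map[String, String] {
  def get(key: String) = store.get(key)
  def iterator = store.entries
  def +=(kv: (String, String)) = { store.set(kv._1, kv._2); this }
  def -=(key: String) = { store.remove(key); this }
}
```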


Adapters


So, whatever I am going to do, it should at least (1) allow me to wrap a DB object in a Map and (2) allow me to transform keys and values to the appropriate type. I want to be able to access a Kyoto Cabinet DB as a Map[Int, Date], if I feel like it.

Instead of addressing both concerns inside a single class, I eventually opted for factoring it out into separate classes. It seemed having a mutable Map abstraction that on the fly transforms its keys and/or values to an alternative type would be useful in other circumstances as well.

The result works like this:

import scala.collection.mutable.Map
import nl.flotsam.collectionables.mutable.AdaptableMap._
val original = Map("1" -> "a", "2" -> "b", "3" -> "e")
val adapted = original.mapKey(_.toInt)(_.toString)
adapted += (1 -> "foobar")

How is this different from just mapping it?


Note that this is definitely not the same as this:

val adapted = original.map { case (x, y) => (x.toInt, y) }
adapted += (1 -> "foobar")

In the second case, the entire original map is replaced by a new map. After the transformation, all keys have been transformed to an Int. In the former case, operations on 'adapted' taking keys of type Int will transform the key on the fly to the same operation taking a key of type String on the underlying Map. In some cases, transforming the entire Map in a single go might be the better option. But if you have a Kyoto Cabinet database with millions of records, then this is the last thing you want to do.

Show me the code

This is the latest version of the code. It is still in flux, but you get the picture:

class AdaptedMap[A, B, AA, BB](decorated: Map[AA, BB],
                               a2aa: (A) => AA,
                               b2bb: (B) => BB,
                               aa2a: (AA) => A,
                               bb2b: (BB) => B) extends Map[A, B] {

  def iterator = new AdaptedMapIterator[A, B, AA, BB](decorated.iterator, aa2a, bb2b)

  def get(key: A) = decorated.get(a2aa(key)) map (bb2b(_))

  def -=(key: A) = {
    decorated -= a2aa(key)
    this
  }

  def +=(kv: (A, B)) = {
    val (key, value) = kv
    val adapted = (a2aa(key), b2bb(value))
    decorated += adapted
    this
  }

  def mapKey[C](a2c: (A) => C)(implicit c2a: (C) => A) = {
    def c2aa(c: C) = a2aa(c2a(c))
    def aa2c(aa: AA) = a2c(aa2a(aa))
    new AdaptedMap[C, B, AA, BB](decorated, c2aa, b2bb, aa2c, bb2b)
  }

  def mapValue[C](b2c: (B) => C)(implicit c2b: (C) => B) = {
    def c2bb(c: C) = b2bb(c2b(c))
    def bb2c(bb: BB) = b2c(bb2b(bb))
    new AdaptedMap[A, C, AA, BB](decorated, a2aa, c2bb, aa2a, bb2c)
  }

}

/**
 * Provides the implicit conversion that allows you to turn an existing mutable Map into an AdaptedMap, so that
 * you can invoke mapKey and mapValue on it.
 */
object AdaptedMap {

  implicit def map2adaptable[A, B](map: Map[A, B]) =
    new AdaptedMap[A, B, A, B](map, identity, identity, identity, identity)

}

Sunday, March 27, 2011

Groovy Int operations in Scala

Today, I briefly opened a book on Groovy, looked at the first line, and it said something like this:

10.times { print it }

... and I realized Scala doesn't have it. Now obviously, you can do this:

(1 to 10) foreach(println(_))

... but that seems slightly more complicated than what Groovy has to offer. No worries. Let's fix that:

class SmartInt(i: Int) {
  def times(block: Int => Unit): Unit = (1 to i) foreach { j => block(j) }
  def times(block: => Unit): Unit = (1 to i) foreach { _ => block }
}
implicit def int2SmartInt(i: Int) = new SmartInt(i)

Now, I can call times on an Int, passing in either a parameterless block, or a function accepting an Int.

3 times println("foo")
3 times { println("foo") }
3 times { println(_) }
3 times { i => println(i) }

Note that SmartInt defines two operations called times(...): one with and one without a parameter. I figured that - in the case of times(...) - it would be pretty normal to have it accept a block that ignores the value. If we only had the first operation, you would always have to capture the parameter and then ignore it. With the second version of times(...), you can pass in an arbitrary expression that ignores the parameter carrying the current element.

Saturday, March 26, 2011

Scala Roles

Not sure how I ever could have missed it, but for some reason the 2008 paper on Scala Roles (think DCI done in Scala) is starting to pop up all over the Internet. I just read it, and it actually looks pretty sensible. There are just a couple of things that I haven't been able to figure out yet, so I'm keeping them here for future reference.

Disconnected Roles

Without going into too much detail, the general idea is that collaborations are instantiated with their roles. So if I instantiate a ThesesSupervision instance, I get an instance of each of the associated roles for free. These roles are stateful. The Student for instance has motivation and wisdom. The SuperVisor has the capability to advise and grade the Student.

Suppose we have two persons: Peter and Paul. If Paul is Peter's supervisor, then on advising Peter, he basically steps into the role of SuperVisor. In this role, he is able to advise Peter.

The corresponding Scala code:

(peter as phd.supervisor).grade


What I found surprising is that the roles continue to be unaware of the objects playing them. Therefore, an action affecting the Student will never affect its 'Personality'. In reality, it obviously will. If I were a student and got a bad grade, it would affect my personal life as well. My happiness would drop, for instance. The solution outlined in the paper seems unable to address that concern.

Links

An updated version of the library is here on GitHub. The original paper can be downloaded here.

Thursday, March 24, 2011

Scalatra, SBT and MongoDB

Last week I did a presentation on NoSQL at bol.com. In order to make it a little bit more compelling, I figured I would throw in a demo on how to use MongoDB for real - but I obviously didn't feel like doing it using Java.

So, behold, here is the entire catalog.

import javax.servlet.ServletConfig
import com.mongodb.casbah.Imports._
import scala.xml._
import org.scalatra._
import scala.util.control.Exception._

class WebApp extends ScalatraServlet {

  val missing = "http://cdn2.iconfinder.com/data/icons/august/PNG/Help.png"
  val mongo = MongoConnection()
  val coll = mongo("amazon")("products")

  get("/products") {
    val numberFormat = catching(classOf[NumberFormatException])
    val limit = numberFormat opt request.getParameter("limit").toInt
    val offset = numberFormat opt request.getParameter("offset").toInt
    <html>
    <head>
      <style type="text/css">
        body {{ font-family: Calibri; }}
      </style>
      <title>Products</title>
    </head>
    <body>
    <ul>
    {
      val items = coll.find().limit(limit getOrElse 10).skip(offset getOrElse 0)
      for (item <- items) yield {
        val set = item.as[DBObject]("ItemAttributes")
        val authors = set.getAs[BasicDBList]("Author") map(_.mkString(", ")) getOrElse("No authors")
        val title = set.as[String]("Title")
        val publisher = set.getAs[String]("Publisher") getOrElse("No publisher")
        val img: String = item.getAs[DBObject]("SmallImage") flatMap(_.getAs[String]("URL")) getOrElse(missing)
        <li>
          <img src={img}/>
          <b>{title}</b>
          <span> ({publisher})</span>
          <em> {authors}</em>
        </li>
      }
    }
    </ul>
    </body>
    </html>
  }

}

Okay, it's just a single page, but the first lesson learned is that the combination of Scalatra, SBT and MongoDB gives you a lot of bang for the buck.

Now, I could easily imagine that it is quite hard to digest everything in a single go, so I am going to explain a couple of things.

Lesson learned 2: Dealing with exceptions

One way of dealing with exceptions in Scala is to use a try-catch block. I am not even going to discuss that, because it's pretty much the same as in Java, apart from the fact that in Scala it's less code.

In my particular case however, I had to see if some parameters would be present in the request. I could have created a complicated conditional block containing a try-catch block to capture NumberFormatExceptions, but that would be a lot of code.

Instead I did this:


    val numberFormat = catching(classOf[NumberFormatException])
    val limit = numberFormat opt request.getParameter("limit").toInt
    val offset = numberFormat opt request.getParameter("offset").toInt

First I define a value called numberFormat by calling a factory method on the Exception object, passing in the type of exception I want to have handled. The object returned gives me several options for handling blocks of code that may throw that exception. The method I am using here is opt.

The 'opt' method takes a by-name parameter that is evaluated by the operation itself. Once it is evaluated, the result is wrapped in a Some, and that Option is returned. That is, unless a NumberFormatException occurred; in that case it returns None. Later on, I call getOrElse(...) on that Option to supply a default value in case it is None.
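The behaviour is easy to check:

```scala
import scala.util.control.Exception.catching

// Build a handler for NumberFormatException only.
val numberFormat = catching(classOf[NumberFormatException])

val ok  = numberFormat opt "42".toInt        // Some(42)
val bad = numberFormat opt "forty-two".toInt // None
```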

So in terms of Java, I am doing this:

int limit;
try {
  limit = Integer.parseInt(request.getParameter("limit"));
} catch (NumberFormatException nfe) {
  limit = 10;
}

The whole construct in Scala is getting reduced to:

val limit = 
  (numberFormat opt request.getParameter("limit").toInt) getOrElse(10)

To me, that looks a lot more sensible. The entire policy for dealing with the exception has now been encoded in a library class.

Lesson learned 3: Accessing MongoDB from Scala is Easy


Accessing MongoDB from Scala is pretty easy: the Casbah library makes it so. One of the things I found a little hard to grasp at first is what to expect from the object model returned by MongoDB. If you don't have a clue what the MongoDB Java driver would normally return, then figuring out what to expect from Casbah can be a little challenging. I think I'm getting the hang of it now though.

These expressions might seem a little bewildering at first:

val publisher = set.getAs[String]("Publisher") getOrElse("No publisher")
val img = item.getAs[DBObject]("SmallImage") flatMap(_.getAs[String]("URL")) getOrElse(missing)

but actually Scala is helping a lot in these cases. In my database schema, a lot of fields are optional. In Java, you would have no other option than getting the value, storing it in a variable, checking if it is null, and then continuing based on the outcome. If your data is tucked away deeply inside your document, then you would have pages of code in no time.

In Scala, with its support for Options, it is actually quite easy. There is no need to capture results in variables before being able to move on: the Option allows you to keep chaining operations onto the result of previous operations. (By the way, the flatMap operation on the second line makes sure that instead of getting an Option[Option[String]], I end up with an Option[String]. On that result, I can invoke getOrElse and pass a default value.)
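The same chaining works on plain Scala Maps, which makes it easy to try in isolation. The nested map below is a made-up stand-in for a MongoDB document:

```scala
// A "document" whose SmallImage field is optional.
val item: Map[String, Map[String, String]] =
  Map("SmallImage" -> Map("URL" -> "http://example.com/cover.png"))

val missing = "http://example.com/missing.png"

// flatMap collapses the Option[Option[String]] into an Option[String].
val img = item.get("SmallImage").flatMap(_.get("URL")).getOrElse(missing)

// A document without the field falls back to the default.
val none = Map.empty[String, Map[String, String]]
  .get("SmallImage").flatMap(_.get("URL")).getOrElse(missing)
```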

Lesson learned 4: Scalatra is simple

In all honesty, I have only scratched its surface, and it's questionable if you would ever create a huge web application this way, but it really is a 'hit-the-ground-running' experience.

Lesson learned 5: SBT makes it even sweeter


This is the way it works. You start sbt, and then type:

jetty-run
~prepare-webapp

From that point on, SBT will examine changes in your sources, and for every change immediately recompile your code and replace the existing web app. Way faster than you would imagine.


Saturday, February 12, 2011

Properties of relations, done in Scala

I am currently reading "Elements of Distributed Computing". In order to eventually explain different models of distributed computing, the book starts by introducing the notions of total order and partial order. A relation defines a reflexive partial order if it's reflexive, antisymmetric and transitive.

If X is a set of things, then a relation R over X is a subset of X * X. For example, let

scala> val X = Set('a, 'b, 'c)
X: scala.collection.immutable.Set[Symbol] = Set('a, 'b, 'c)

Then, one possible relation is:

scala> val R = Set(('a, 'c), ('a, 'a), ('b, 'c), ('c, 'a))
R: scala.collection.immutable.Set[(Symbol, Symbol)] = Set(('a,'c), ('a,'a), ('b,'c), ('c,'a))

A relation is reflexive if for each x belonging to X, (x, x) belongs to R. How would you check whether a relation is reflexive in Scala? The trouble is, with the "natural" way of defining a relation as a Set of tuples in Scala, there is no way of telling which values are part of X, just like that. However, if the relation is completely defined by all values in R, then we can determine the values in X simply by combining the values in the domain (the first value of each tuple) and the range (the second value of each tuple).

If X is not provided, then we need a way to derive X from R. In order to get there, I am going to write two functions:

scala> def toSet[T](xs: (T, T)): Set[T] = Set(xs._1, xs._2)  
toSet: [T](xs: (T, T))Set[T]

scala> def domainAndRangeOf[T](xs: Set[(T, T)]): Set[T] = xs flatMap(toSet) 
domainAndRangeOf: [T](xs: Set[(T, T)])Set[T]

So, the set on which the binary relation R is defined is:

scala> val X = domainAndRangeOf(R)                                         
X: Set[Symbol] = Set('a, 'c, 'b)

A relation is reflexive if x R x for all x in X.

scala> def isReflexive[T](xs: Set[(T, T)]): Boolean = 
  domainAndRangeOf(xs) forall(x => xs contains (x, x))
isReflexive: [T](xs: Set[(T, T)])Boolean

scala> isReflexive(Set((1, 1), (1, 2), (2,2)))        
res17: Boolean = true

A relation is antisymmetric if for all distinct x and y for which x R y, there is no y R x.

scala> def flip[T](t: (T,T)) = (t._2, t._1)                                                             
flip: [T](t: (T, T))(T, T)

scala> def isAntiSymmetric[T](xs: Set[(T,T)]): Boolean = 
  xs forall (x => x == flip(x) || !(xs contains flip(x)))
isAntiSymmetric: [T](xs: Set[(T, T)])Boolean

scala> isAntiSymmetric(Set((1, 2), (1, 3), (1, 4)))
res19: Boolean = true

scala> isAntiSymmetric(Set((1, 2), (1, 3), (2, 1)))
res20: Boolean = false

A relation is transitive if for all x, y, z for which x R y and y R z, x R z also holds.

scala> def isTransitive[T](xs: Set[(T,T)]) = {
     |   xs forall { x =>
     |     xs filter(y => y._1 == x._2) forall (y => xs contains ((x._1, y._2)))
     |   }
     | }
isTransitive: [T](xs: Set[(T, T)])Boolean

scala> isTransitive(Set((1, 2), (2, 3)))
res21: Boolean = false

scala> isTransitive(Set((1, 2), (2, 3), (1, 3)))
res22: Boolean = true

A relation R defines a reflexive partial order if it is reflexive, transitive and antisymmetric:

scala> def isReflexivePartialOrder[T](xs: Set[(T,T)]): Boolean = 
     | isReflexive(xs) && isTransitive(xs) && isAntiSymmetric(xs)

scala> isReflexivePartialOrder(Set((1, 2), (1, 4), (2, 4), (1, 1), (2, 2), (4, 4)))
res31: Boolean = true

Wednesday, January 5, 2011

Clojure versus Scala (part 2)

In my previous post, I went over all of the basics introduced by the authors of "Clojure: functioneel programmeren". In the second part of their first article, they build a Last.fm client, based on the programming concepts introduced before. Let me do the same thing for Scala.

Build environment


Clojure has Leiningen, but I bet Maven is supported as well. Same goes for Scala: there are people using Rake or Gradle, and of course there's SBT (discussed before). However, for people coming from a Java world, Maven works just as well.

So, to start a Scala Maven project, just type this on the commandline:

mvn archetype:generate -DarchetypeCatalog=http://nexus.scala-tools.org/content/groups/public

... and choose the simple Scala project. Fill out the basic details, and you will have something working. (Now, this is a command that you're going to use more often. This might be a good time to turn it into a key macro.)

In order to make sure you can download the proper libraries, you obviously need to add the repo and a dependency to your pom.xml:

    <repositories>
        <repository>
            <id>xebia-maven</id>
            <url>http://os.xebia.com/maven2</url>
        </repository>
    </repositories>
...
    <dependencies>
        <dependency>
            <groupId>net.roarsoftware</groupId>
            <artifactId>last.fm-bindings</artifactId>
            <version>1.0</version>
        </dependency>
    </dependencies>

Namespace


The next thing the authors do is talk about namespaces for a while. They mention that in Clojure, namespaces are first-class citizens. I guess the same applies to Scala as well. However, you cannot add new symbols to a package, as Clojure allows you to do.

Listing the top tracks


This is the Clojure version:

(defn top-tracks
  [user-name api-key]
  (User/getTopTracks user-name api-key))

This is the Scala version:

def topTracks(user: String, apiKey: String) =
  getTopTracks(user, apiKey).toSeq

Now, the above only works because I imported all of the User object's functions somewhere else (so getTopTracks has been pulled into scope). And I can only invoke toSeq on the result of getTopTracks (normally a java.util.Collection) because of an import of some implicit conversions:

import net.roarsoftware.lastfm.User._
import net.roarsoftware.lastfm.Track
import scala.collection.JavaConversions._

Converting Track to a String


This is the Clojure version:

(defn track-to-str
  [track]
  (let [track-name (.getName track)
        artist-name (.getArtist track)]
    (str track-name " by " artist-name)))

This is the Scala version:

def trackToString(track: Track) =
    track.getName + " by " + track.getArtist

Numbering a list of items


This is the way the authors do it in Clojure:

(defn number-a-sequence
  [seeq]
  (map-indexed #(str (+ 1 %1) " " %2) seeq))

This is the Scala version. Basically, what it does is first create a sequence of tuples, each consisting of the element itself followed by its index, and then map each individual item to a String.

def numberASequence(seq: Seq[Any]) =
  seq.zipWithIndex.map({
    case (elem, index) => (index + 1) + " " + elem
  })

Building HTML


Again, Clojure:

(defn to-html
  [str-seeq]
  (let [ header "<html><body>"
         footer "</body></html>"]
    (str header (reduce str (map #(str % "<br />") str-seeq)) footer)))

And this is Scala:

def toHtml(list: Traversable[Any]) =
  <html>
    <body>{list.map(item => <p>{item}</p>)}</body>
  </html>

In this case, it might be worth noting that the Scala version is actually building XML, whereas the Clojure version is generating a String. Building XML is a little safer: if the text included in your XML contains special characters, then Scala's XML support will guarantee that those special characters are getting escaped properly. (Who knows, perhaps there is an artist called "".)
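The escaping is easy to verify; the artist name below is of course invented:

```scala
// Requires scala.xml (bundled with the Scala distribution at the time of writing).
val artist = "<blink>Marquee</blink>" // hypothetical artist name with special characters
val node = <p>{artist}</p>

// The angle brackets in the text node come out escaped,
// something like: <p>&lt;blink&gt;Marquee&lt;/blink&gt;</p>
println(node.toString)
```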

Conclusion


This basically constitutes everything discussed in the Clojure article. They conclude that a Clojure program like this requires fewer than 25 lines of code. I think it's fair to say that both Scala and Clojure are in good shape in that regard. I counted the LoC of the Scala version, and it adds up to 22.

So, which one is the winner? I think it's inconclusive. I like the fact that Scala is statically typed, without a significant penalty. The number of lines of code is roughly the same. What do you think?

(Full source code is here.)