Wednesday, April 6, 2011

Scala IO versus Guava: The Basics

A friend of mine once said that everything in life was about search and sort. Thinking about it for a while, it seems he's right. Almost. The rest is about IO.

IO in Java

The question is how you do IO. Long, long ago, in Java 1.0, Java's IO classes were sketchy, to say the least. Java 1.1 solved some of that by introducing Readers and Writers, and eventually, with Java 1.4, we got Java NIO. If all goes well, we will soon have the new NIO (NIO.2, coming with Java 7).

IO in external libraries

Nevertheless, in many cases people still rely on external libraries to make their lives a little easier. Apache Commons IO has been a popular choice for some time, and at some point Guava also added some IO abstractions of its own.

IO in Scala

It makes you wonder about Scala's IO classes. At first sight, they don't look too good. The 'scala.io' package has a Source class that eases reading files, doing some automatic resource management. That's good. But then it turns out the abstraction returned is an Iterator, and you don't really want an Iterator traversing the contents of your file: if you stop iterating before reaching the end, you are left with an open file handle, and your file may stay open for the rest of the lifetime of your VM instance. In fact, if you search StackOverflow, you will quickly find many complaints about scala.io being broken, or about scala.io still being broken.
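To make that failure mode concrete, here is a Java analog in plain JDK code (Java 8 or later). The ClosingLineIterator class is hypothetical, not part of scala.io or any library: it reads lines lazily and only closes its reader once the last line has been consumed, so the cleanup depends entirely on the caller iterating to the end.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical class, not from scala.io or any library: a lazy line
// iterator that releases its reader only when iteration reaches the end.
final class ClosingLineIterator implements Iterator<String> {
    private final BufferedReader reader;
    private String next;      // line to hand out next, or null at the end
    private boolean closed;

    ClosingLineIterator(Reader in) {
        this.reader = new BufferedReader(in);
        advance();
    }

    private void advance() {
        try {
            next = reader.readLine();
            if (next == null) {
                reader.close(); // the ONLY place the reader gets closed
                closed = true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String line = next;
        advance();
        return line;
    }
}
```

Read one line and walk away (because you found what you were looking for, or because an exception was thrown) and close() is never called; for a file-backed reader, the descriptor lingers until the garbage collector happens to clean up the underlying stream, if it ever does.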

Scala's New IO

But there might be hope out there. There is a Scala library that seems to tackle the concerns normally handled by the libraries I mentioned, including decent support for automatic resource management. The name of the library: scala-io. I know. Given how close that is to the standard scala.io package, it might be a good idea to change the name.

What does it give you?

First of all, Scala IO is built on top of scala-arm, the library providing the foundation for automatic resource management. On top of that, it gives you quite a bit of goodness for reading and writing bytes and text. In this post, I will go over some of its features, comparing them to how it's done in Guava:

Copying an InputStream into a Byte Array

This is how it's done in Guava:

InputStream in = ...;
byte[] buffer = ByteStreams.toByteArray(in);

And this is the same thing, done in Scala IO:

val in: InputStream = ...
Resource.fromInputStream(in).byteArray

Similar, but there is a big difference: in the Guava case, the stream is not closed; in the Scala IO case, it is.
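To get the Scala IO behaviour in plain Java, you have to close the stream yourself. A minimal sketch using only the JDK (Java 7 or later, for try-with-resources); toByteArrayAndClose is a hypothetical name, and this is not Guava's implementation, just the read-then-close pattern.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadAll {
    // Reads every remaining byte and closes the stream afterwards, even if
    // reading fails. Guava's ByteStreams.toByteArray(in) does the reading
    // part only and leaves closing to the caller.
    static byte[] toByteArrayAndClose(InputStream in) throws IOException {
        try (InputStream source = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = source.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```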

InputSuppliers

Guava has an abstraction that lets you pass around an object providing access to an InputStream. The InputStream itself is not opened yet; it only gets opened once you ask the object to give you the input. The good thing about this is that the code that opens the stream can also be responsible for closing it, without having to know how the stream is created:

public interface InputSupplier<T> {
    T getInput() throws IOException;
}

In a way, a Scala IO Resource is an InputSupplier and/or an OutputSupplier. However, there is no need to implement an interface to defer the construction of the underlying object providing or accepting bytes. Instead, you just pass in a block of code that gets evaluated right before you read or write your bytes, leveraging Scala's by-name parameters.

So you could do something like this:

Resource.fromInputStream(new FileInputStream(...))

...without the file getting opened yet. As a consequence, you can access the Resource multiple times without running into trouble: the FileInputStream is closed after you have acted on it, and since the block is evaluated again on the next access, you can still 'reopen' it afterwards.
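The supplier idea boils down to a loan pattern: one helper opens the stream, lends it to your code, and always closes it. Below is a plain-Java sketch of that pattern. The InputSupplier interface is a local copy of the one shown earlier (so no Guava is needed), while StreamAction and DeferredInput.withInput are hypothetical names made up for this illustration.

```java
import java.io.IOException;
import java.io.InputStream;

// Local copy of the InputSupplier interface shown earlier, so this
// sketch compiles without Guava on the classpath.
interface InputSupplier<T> {
    T getInput() throws IOException;
}

// Hypothetical callback type: what to do with the stream once it is open.
interface StreamAction<R> {
    R apply(InputStream in) throws IOException;
}

final class DeferredInput {
    // Asks the supplier for a fresh stream on every call, hands it to the
    // action and always closes it afterwards. The caller never sees how the
    // stream is created, only what to do with it.
    static <R> R withInput(InputSupplier<? extends InputStream> supplier,
                           StreamAction<R> action) throws IOException {
        try (InputStream in = supplier.getInput()) {
            return action.apply(in);
        }
    }
}
```

Because nothing is opened until withInput calls getInput(), the same supplier can be used any number of times, each time with a new stream that is closed again afterwards. In Scala IO, the by-name parameter plays the role that getInput() plays here.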

Filling a byte array

In some cases, all you want to do is fill an existing byte array. In Guava, this is how you would do it:

byte[] buffer = new byte[100];
InputStream in = null;
try {
  in = ...;
  ByteStreams.readFully(in, buffer);
} finally {
  Closeables.closeQuietly(in);
}

In Scala IO, it's quite a bit easier:

val in: InputStream = ...
val buffer = new Array[Byte](100)
Resource.fromInputStream(in).bytes.copyToArray(buffer)

Note the absence of a try/finally block. First a Resource is created, then we obtain a bytes view on it, and then we use Traversable's copyToArray method to copy the data into the array.
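The two snippets are not quite equivalent, though. Guava documents readFully as throwing an EOFException when the stream ends before the buffer is full, whereas copyToArray simply stops copying when it runs out of elements and leaves the rest of the array untouched. The plain-Java sketch below shows the readFully-style semantics; FillBuffer is a hypothetical name, not Guava's code.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class FillBuffer {
    // Keeps calling read() until the buffer is full, because a single
    // read() may legitimately return fewer bytes than requested.
    // Running out of input before the buffer is full is an error.
    static void readFully(InputStream in, byte[] buffer) throws IOException {
        int off = 0;
        while (off < buffer.length) {
            int n = in.read(buffer, off, buffer.length - off);
            if (n == -1) {
                throw new EOFException("stream ended after " + off + " bytes");
            }
            off += n;
        }
    }
}
```

Which behaviour you want depends on whether a short input is a bug or a normal case; with copyToArray, you would have to check the amount of data yourself.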

Copying an InputStream to an OutputStream

This is how you do it in Java using Guava:

InputStream in = ...;
OutputStream out = ...;
try {
  ByteStreams.copy(in, out);
} finally {
  Closeables.closeQuietly(in);
  Closeables.closeQuietly(out);
}

This is the same thing done in Scala IO:

val in: InputStream = ...
val out: OutputStream = ...
Resource.fromInputStream(in).copyData(Resource.fromOutputStream(out))

That seems rather verbose. As a matter of fact, it doesn't need to be this way. If you import a number of implicits, the above can be expressed like this as well:

in.asInput.copyData(out.asOutput)

These implicit conversions add an asInput method to InputStream, giving you an Input object with the copyData operation, and similarly an asOutput method to OutputStream.
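For reference, this is roughly the work the Guava version does, written as a plain-Java sketch (Java 7 or later). StreamCopy.copyAndClose is a hypothetical helper, not Guava's ByteStreams.copy; it copies in chunks, closes both streams and returns the number of bytes copied.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamCopy {
    // Copies everything from in to out, then closes both streams (out first,
    // then in, since try-with-resources closes in reverse order).
    // Returns the number of bytes copied.
    static long copyAndClose(InputStream in, OutputStream out) throws IOException {
        try (InputStream source = in; OutputStream sink = out) {
            byte[] chunk = new byte[8192];
            long total = 0;
            int n;
            while ((n = source.read(chunk)) != -1) {
                sink.write(chunk, 0, n);
                total += n;
            }
            return total;
        }
    }
}
```

One deliberate difference from the Guava snippet: closeQuietly swallows any exception thrown by close(), and for a buffered OutputStream that can hide a failed final flush. Try-with-resources reports such a failure instead.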

Reading a String

This is how it's done in Guava:

InputStream in = ...;
String content = null;
try {
  content = CharStreams.toString(new InputStreamReader(in, "UTF-8"));
} finally {
  Closeables.closeQuietly(in);
}

... and this is the same thing, done in Scala IO:

val in: InputStream = ...
val content = Resource.fromInputStream(in).slurpString(Codec.UTF8)

or, alternatively:

val in: InputStream = ...
val content = in.asInput.slurpString(Codec.UTF8)
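Both versions decode with an explicit charset and close the stream. As a plain-JDK point of comparison (Java 7 or later), here is a sketch with a hypothetical ReadString.readAndClose helper; closing the Reader also closes the InputStream it wraps.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

final class ReadString {
    // Decodes the whole stream with the given charset and closes it.
    // Closing the InputStreamReader closes the wrapped InputStream too.
    static String readAndClose(InputStream in, Charset charset) throws IOException {
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringBuilder sb = new StringBuilder();
            char[] chunk = new char[4096];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                sb.append(chunk, 0, n);
            }
            return sb.toString();
        }
    }
}
```

Passing a Charset object rather than the string "UTF-8" also sidesteps the checked UnsupportedEncodingException that the String-based InputStreamReader constructor declares.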
