Friday, September 5, 2008

GridGain - simple, effective, and made of pure happy juice

Let's talk about real, tangible, hype-less, grid computing. I swear, it's real.

I've heard endless sales pitches about moving your apps to the grid. It's almost quaint how easy they make it sound. Let's pretend, for a moment, you have an application that benefits from parallelization, you have no issues with data availability, and you control your source code (because you're not going to make every application you use magically run in parallel regardless of the sales pitch).

To this end, I've been using GridGain recently. In a few weeks I might even be able to talk about why, but that's far less important. For those living under a rock, GridGain is a Java implementation of a Google-ish map / reduce system. The effect is that, with very little code and even less configuration, one can make suitable code run in parallel on a grid of JVM nodes. I won't discuss the map / reduce concept in detail, other than to say the idea is to:

  1. take a job, break it up into smaller units of work
  2. assign these units to processing nodes
  3. collect the results from the processing nodes
  4. aggregate the results, as appropriate

Items #1 and #2 refer to the map part of the process, while #3 and #4 refer to the reduce component.
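
To make those four steps a little less abstract, here's a minimal sketch of the same shape using nothing but the JDK. To be clear, this is not GridGain: the "nodes" are just threads in a local pool, and the class and method names (MapReduceSketch, totalLength) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MapReduceSketch {

  // Hypothetical example job: total up the lengths of a list of words.
  public static int totalLength(List<String> words, int workers) {
    // The thread pool stands in for the grid's processing nodes.
    ExecutorService pool = Executors.newFixedThreadPool(workers);

    try {
      List<Future<Integer>> pending = new ArrayList<>();

      // Steps 1 and 2 (map): one unit of work per word, handed to the pool.
      for (final String word : words) {
        Callable<Integer> unit = () -> word.length();
        pending.add(pool.submit(unit));
      }

      // Steps 3 and 4 (reduce): collect each partial result and sum them.
      int total = 0;
      for (Future<Integer> result : pending) {
        total += result.get();
      }

      return total;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException("A unit of work failed", e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    // 4 + 4 + 3 + 6, so this prints 17.
    System.out.println(totalLength(Arrays.asList("grid", "gain", "map", "reduce"), 4));
  }
}
```

What a real grid adds on top of this is everything the sketch conveniently ignores: shipping the work to other JVMs, picking nodes, and coping with failures.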

There are two ways to use GridGain. The first method is to annotate methods (either static or otherwise) that should be run on the grid. This is fast to write, and coarse-grained in that the unit of work is usually (but need not be) the entire method. The second method of running tasks on the grid is by directly accessing the GridGain APIs. This, of course, tends to mean a bit more code, but to some it may be easier to understand and debug. Which method you choose will probably depend mostly on preference, but could also be influenced by dependencies. I opted for direct invocation via the APIs because I prefer not to futz around with AspectJ and friends (which is how the annotations actually work) unless I'm also using them for other things. GridGain is happy to work with AspectJ, Spring AOP, or JBoss AOP, according to the docs, although they seem to recommend against Spring AOP due to a lack of full functionality or some such. I didn't get too much into the details.

The APIs are surprisingly direct and simple to understand for something that is so seemingly nebulous. This, coupled with some of the best documentation I've seen in some time, makes for a pleasant experience. The only thing that makes my vision a little blurry is the generic-soup that begins to occur, but for the type safety, it's probably worth it; anything that needs to run on a grid is arguably important enough that you can groan through the angle brackets.

Running a GridGain node requires zero configuration in many cases. That said, GridGain is itself a Spring application, and a well-written one with a number of service provider interfaces, so one could easily sculpt what amounts to a custom product out of it. What is nice is that one can easily run GridGain within an IDE like Eclipse during testing and development, which significantly speeds up the process of rolling out your apps. While on the topic of time-saving features, GridGain uses peer-based class loading, meaning that for simple, self-contained tasks, one need not do any kind of special deployment to execute them. The task classes are transmitted and loaded by GridGain itself, without any need for restarts or copying. It's like a magic fairy sprinkled happy-time sparkles in my brain.

Of course, GridGain provides all manner of fancy-pants things that I haven't yet explored, such as node affinity for processing close to your data, integration with caching products from Oracle and JBoss, different scheduling and communication methods, and similar goodies. I also haven't fully explored implementing custom service providers, mostly because I couldn't find one I might need that wasn't already implemented.

Here's a bottom of the barrel, super-contrived, example to whet your noggin knobs.

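import org.gridgain.grid.Grid;
import org.gridgain.grid.GridException;
import org.gridgain.grid.GridFactory;
import org.gridgain.grid.GridTaskFuture;

public class TestApp {

  public static void main(String[] args) throws GridException {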
    try {
      Grid grid;
      GridTaskFuture future;

      grid = GridFactory.start();

      future = grid.execute(MyTask.class, "Print me...");

      // This causes GridGain to block until all jobs are complete. It
      // also returns the result, if there is one.
      future.get();

    } finally {
      GridFactory.stop(false);
    }
  }
}

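import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.gridgain.grid.GridException;
import org.gridgain.grid.GridJob;
import org.gridgain.grid.GridJobAdapter;
import org.gridgain.grid.GridJobResult;
import org.gridgain.grid.GridTaskSplitAdapter;

/*
 * The task class is going to be the unit of work that gets split
 * into smaller jobs (GridJob) that get distributed and executed.
 * I'm using adapters for both the task as well as the jobs to avoid
 * extra work, so yes, I'm cheating. The split adapter is the one
 * that exposes split(); it assigns the jobs to nodes for us.
 */
public class MyTask extends GridTaskSplitAdapter<String, String> {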

  @Override
  protected Collection<? extends GridJob> split(int gridSize, String arg)
    throws GridException {

    List<GridJobAdapter<String>> jobs;

    jobs = new ArrayList<GridJobAdapter<String>>();

    // Build a collection of jobs to run on the grid.

    for (int i = 0; i < 10; i++) {
      jobs.add(new GridJobAdapter<String>(arg) {
        @Override
        public Serializable execute() throws GridException {
          System.out.println("arg:" + getArgument());

          // Nothing useful to hand back, but the signature wants a value.
          return null;
        }
      });
    }

    return jobs;
  }


  @Override
  public String reduce(List<GridJobResult> results) throws GridException {
    System.out.println("Not much to reduce when we're printing stuff...");

    return null;
  }
}

I know it's probably considered verbose, still, to the scripting community at large, but when you consider what's happening here, it's pretty bad ass.

It's worth noting that there are other frameworks similar to GridGain, in this space, that deserve a look as well. Hadoop comes to mind. I have no direct experience with them, so I don't want to say either way, but I have heard nice things.

This is all very contrived and still thin on details. My goal wasn't to reproduce a howto, but more so to, well, to glow about GridGain, to some small degree. I encourage everyone to check out GridGain for themselves and find the things relevant to them.

(Brief) Thoughts on Rails

In working with Ruby, and specifically Ruby on Rails, I've developed a few opinions about the underlying design and architecture, but more so about the development principles and philosophies. Note that I wouldn't classify myself as a Ruby or a Rails expert, although I do have extensive experience with other languages and frameworks.

Both Ruby and Rails are philosophy heavy. This isn't inherently a bad thing, in my opinion. These communities are rife with smart people writing smart software; you wouldn't want the opposite, or any other combination thereof. The gripes that I do have are in the places where those smarts are complimented to the point where they metamorphose into arrogance. Please don't get caught up on the word arrogance. Keep going. I just couldn't find a less provocative synonym.

Let Rails be decomposed into three pieces: the model (ActiveRecord), view (ActionView and ERb, in my case), and controller (ActionController) components. When building any Rails application, one is sure to work with all three. Therein lies the first issue: Rails owns all.

Configuration has use

Rails (and Ruby) favor convention over configuration. I think this is absolutely a good general idea. Having worked with things like Spring, and Java in general, I would welcome a bit less in the way of configuration in some cases. That said, in the case of Rails, there are certain things I wish I could configure via (something like) XML rather than code. For instance, the route configuration in Rails (aka: mapping URLs to controllers) is done in code. The code is kind of DSL-ish, which makes it a bit better, I suppose, but developing tools to parse and generate said configuration would be painful. One could eval the code and attempt to work with it that way, but that's not necessarily safe, nor would it be free from a huge number of dependencies (like Rails itself). Another option is to replace routes.rb (the standard file name) with something skeletal that loads configuration from XML, or even YAML, but shouldn't that come out of the box? You know, in the interest of the entire Rails community not repeating themselves and all. Please, Rails, let me use configuration where it makes sense. This is but one simple example.

Ruby on Rails. It's not new and there's no shortage of both hype and panning from all sides. Nothing is ever a panacea. There is no shortcut to building complete, real-world applications. You can make it easier, but you can't take away the requirements that a client or business places on an application. Specifically, a framework, language, or library should never dictate what is possible unless there is a technical limitation or purpose. Philosophy in software is excellent; obstinance is unforgivable.