Friday, September 5, 2008

GridGain - simple, effective, and made of pure happy juice

Let's talk about real, tangible, hype-less, grid computing. I swear, it's real.

I've heard endless sales pitches about moving your apps to the grid. It's almost quaint how easy they make it sound. Let's pretend, for a moment, you have an application that benefits from parallelization, you have no issues with data availability, and you control your source code (because you're not going to make every application you use magically run in parallel regardless of the sales pitch).

To this end, I've been using GridGain recently. In a few weeks I might even be able to talk about why, but that's far less important. For those living under a rock, GridGain is a Java implementation of a Google-ish map / reduce system. The effect is that, with very little code and even less configuration, one can cause suitable code to run in parallel on a grid of JVM nodes. I won't discuss the map / reduce concept in detail, other than to say the idea is to:

  1. take a job, break it up into smaller units of work
  2. assign these units to processing nodes
  3. collect the results from the processing nodes
  4. aggregate the results, as appropriate

Items #1 and #2 refer to the map part of the process, while #3 and #4 refer to the reduce component.
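
To make the four steps concrete without involving a grid at all, here is a minimal sketch in plain Java. It is not GridGain code: a local thread pool stands in for the processing nodes, and the class and method names (MapReduceSketch, parallelSum) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MapReduceSketch {

  // Sums the integers 1..n by splitting the range into roughly `chunks` pieces.
  public static long parallelSum(int n, int chunks) {
    // Assumption: four local threads play the role of grid nodes.
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      // 1. Break the job into smaller units of work.
      List<Callable<Long>> units = new ArrayList<>();
      int size = Math.max(1, (n + chunks - 1) / chunks);
      for (int start = 1; start <= n; start += size) {
        final int lo = start;
        final int hi = Math.min(n, start + size - 1);
        units.add(() -> {
          long sum = 0;
          for (int i = lo; i <= hi; i++) {
            sum += i;
          }
          return sum;
        });
      }

      // 2. Assign the units to the workers and wait for them to finish.
      List<Future<Long>> futures = pool.invokeAll(units);

      // 3. Collect each partial result and 4. aggregate them into one answer.
      long total = 0;
      for (Future<Long> future : futures) {
        total += future.get();
      }
      return total;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException("a unit of work failed", e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(parallelSum(100, 10)); // prints 5050
  }
}
```

The difference with a real grid is mostly in step 2: the units travel to other JVMs instead of other threads, which is why the work and its results need to be serializable.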

There are two ways to use GridGain. The first method is to annotate methods (either static or otherwise) that should be run on the grid. This is fast to write, and coarse-grained in that the unit of work is usually (but need not be) the entire method. The second method of running tasks on the grid is by directly accessing the GridGain APIs. This, of course, tends to mean a bit more code, but to some, may be easier to understand and debug. Which method you choose will probably depend mostly on preference, but could also be influenced by dependencies. I opted for direct invocation via the APIs because I prefer not to futz around with AspectJ and friends - which is how the annotations actually work - unless I'm also using them for other things. GridGain is happy to work with AspectJ, Spring AOP, or JBoss AOP, according to the docs, although they seem to recommend against Spring AOP due to a lack of full functionality or some such. I didn't get too much into the details.

The APIs are surprisingly direct and simple to understand for something that is so seemingly nebulous. This, coupled with some of the best documentation I've seen in some time, makes for a pleasant experience. The only thing that makes my vision a little blurry is the generic-soup that begins to occur, but for the type safety, it's probably worth it; anything that needs to run on a grid is arguably important enough that you can groan through the angle brackets.

Running a GridGain node requires zero configuration in many cases. That said, because GridGain is itself a Spring application, and a well-written one with a number of service provider interfaces, one could easily sculpt what amounts to a custom product out of it. What is nice is that one can easily run GridGain within an IDE like Eclipse during testing and development, which significantly speeds the process of rolling out your apps. While on the topic of time-saving features, GridGain uses peer-based class loading, meaning that for simple, self-contained tasks, one need not do any kind of special deployment to execute them. The task classes are transmitted and loaded by GridGain itself, without any need for restarts or copying. It's like a magic fairy sprinkled happy-time sparkles in my brain.

Of course, GridGain provides all manner of fancy-pants things that I haven't yet explored, such as node affinity for processing close to your data, integration with caching products from Oracle and JBoss, different scheduling and communication methods, and similar goodies. I also haven't fully explored implementing custom service providers, but that's mostly because I couldn't find one I might need that wasn't already implemented.

Here's a bottom of the barrel, super-contrived, example to whet your noggin knobs.

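// Imports assume the GridGain 2.x package layout (org.gridgain.grid).
import org.gridgain.grid.Grid;
import org.gridgain.grid.GridException;
import org.gridgain.grid.GridFactory;
import org.gridgain.grid.GridTaskFuture;

public class TestApp {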

  static public void main(String[] args) throws GridException {
    try {
      Grid grid;
      GridTaskFuture future;

      grid = GridFactory.start();

      future = grid.execute(MyTask.class, "Print me...");

      // This causes GridGain to block until all jobs are complete. It
      // also returns the result, if there is one.
      future.get();

    } finally {
      GridFactory.stop(false);
    }
  }
}

/*
 * The task class is going to be the unit of work that gets split
 * into smaller jobs (GridJob) that get distributed and executed.
 * I'm using adapters for both the task as well as the jobs to avoid
 * extra work, so yes, I'm cheating.
 */
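// Imports assume the GridGain 2.x package layout (org.gridgain.grid).
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.gridgain.grid.GridException;
import org.gridgain.grid.GridJob;
import org.gridgain.grid.GridJobAdapter;
import org.gridgain.grid.GridJobResult;
import org.gridgain.grid.GridTaskSplitAdapter;

// GridTaskSplitAdapter is the adapter that exposes split(); the plain
// GridTaskAdapter expects a map() implementation instead.
public class MyTask extends GridTaskSplitAdapter<String, String> {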

  @Override
  protected Collection<? extends GridJob> split(int gridSize, String arg)
    throws GridException {

    List<GridJobAdapter<String>> jobs;

    jobs = new ArrayList<GridJobAdapter<String>>();

    // Build a collection of jobs to run on the grid.

    for (int i = 0; i < 10; i++) {
      jobs.add(new GridJobAdapter<String>(arg) {
        @Override
        public Serializable execute() throws GridException {
          System.out.println("arg:" + getArgument());

          // Nothing useful to hand back to reduce(), so return null.
          return null;
        }
      });
    }

    return jobs;
  }


  @Override
  public String reduce(List<GridJobResult> results) throws GridException {
    System.out.println("Not much to reduce when we're printing stuff...");

    return null;
  }
}

I know it's probably considered verbose, still, to the scripting community at large, but when you consider what's happening here, it's pretty bad ass.
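
Since printing leaves reduce with nothing to do, here is a rough sketch of the same split / reduce shape where the jobs actually return something. To keep it runnable without a grid, it uses a hypothetical stand-in interface (Task) and runs the jobs one after another in the same JVM; none of these types are GridGain's, and the names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class SplitReduceSketch {

  // Hypothetical stand-in for a grid task; NOT a GridGain type.
  public interface Task<T, R> {
    List<Supplier<R>> split(int gridSize, T arg);
    R reduce(List<R> results);
  }

  // Counts the letters in a phrase by giving each word its own job.
  public static class CharCountTask implements Task<String, Integer> {
    @Override
    public List<Supplier<Integer>> split(int gridSize, String arg) {
      List<Supplier<Integer>> jobs = new ArrayList<>();
      for (String word : arg.split(" ")) {
        jobs.add(() -> word.length());
      }
      return jobs;
    }

    @Override
    public Integer reduce(List<Integer> results) {
      int total = 0;
      for (int length : results) {
        total += length;
      }
      return total;
    }
  }

  // Runs every job sequentially in this JVM; a grid would ship them to nodes.
  public static <T, R> R execute(Task<T, R> task, T arg) {
    List<R> results = new ArrayList<>();
    for (Supplier<R> job : task.split(1, arg)) {
      results.add(job.get());
    }
    return task.reduce(results);
  }

  public static void main(String[] args) {
    System.out.println(execute(new CharCountTask(), "Print me please")); // prints 13
  }
}
```

The point of the shape is that the caller only ever sees one argument going in and one result coming out; how many jobs exist in between is the task's business.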

It's worth noting that there are other frameworks similar to GridGain, in this space, that deserve a look as well. Hadoop comes to mind. I have no direct experience with them, so I don't want to say either way, but I have heard nice things.

This is all very contrived and still thin on details. My goal wasn't to reproduce a howto, but more so to, well, to glow about GridGain, to some small degree. I encourage everyone to check out GridGain for themselves and find the things relevant to them.
