Let's talk about real, tangible, hype-less grid computing. I swear, it's real.
I've heard endless sales pitches about moving your apps to the grid. It's
almost quaint how easy they make it sound. Let's pretend, for a moment, you have
an application that benefits from parallelization, you have no issues with data
availability, and you control your source code (because you're not going to
make every application you use magically run in parallel regardless of the sales
pitch).
To this end, I've been using GridGain recently. In a few weeks I might even be
able to talk about why, but that's far less important. For those living under
a rock, GridGain is a Java implementation of a Google-ish map / reduce system.
The effect is that, with very little code and even less configuration, one can
cause suitable code to run in parallel, on a grid of JVM nodes. I won't discuss
the map / reduce concept in detail, other than to say the idea is to:
- take a job, break it up into smaller units of work
- assign these units to processing nodes
- collect the results from the processing nodes
- aggregate the results, as appropriate
The first two items make up the map part of the process, while the last two
make up the reduce component.
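If those four steps still feel abstract, here's a minimal sketch of the same
shape using nothing but the JDK. Threads in a local pool stand in for grid
nodes, and the names (LocalMapReduce, countLetters) are made up for
illustration; none of this is GridGain code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Plain-JDK illustration only: pool threads play the role of grid nodes.
class LocalMapReduce {

    // Counts the non-whitespace characters in a phrase, one word per "job".
    static int countLetters(String phrase, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Map: break the job into units of work (words) and hand each
            // unit to a worker.
            List<CompletableFuture<Integer>> pending = new ArrayList<>();
            for (String word : phrase.trim().split("\\s+")) {
                pending.add(CompletableFuture.supplyAsync(word::length, pool));
            }

            // Reduce: collect each partial result and aggregate by summing.
            int total = 0;
            for (CompletableFuture<Integer> result : pending) {
                total += result.join();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 5 + 4 + 5 letters
        System.out.println(countLetters("hello grid world", 3)); // prints 14
    }
}
```

The interesting part of a real grid is everything this leaves out: the workers
live in other JVMs, so the units of work and their results have to travel over
the network.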
There are two ways to use GridGain. The first is to annotate the methods
(static or otherwise) that should run on the grid. This is fast to write, and
coarse-grained in that the unit of work is usually (though it need not be)
the entire method. The second is to call the GridGain APIs directly. This, of
course, tends to mean a bit more code, but to some it may be easier to
understand and debug. Which method you choose will probably depend mostly on
preference, but could also be influenced by dependencies. I opted for direct
invocation via the APIs because I prefer not to futz around with AspectJ and
friends (which is how the annotations actually work) unless I'm also using
them for other things. GridGain is happy to work with AspectJ, Spring AOP, or
JBoss AOP, according to the docs, although they seem to recommend against
Spring AOP due to a lack of full functionality or some such. I didn't get too
far into the details.
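To give a rough feel for how an annotation can reroute a method call, here's a
small JDK-only sketch. It uses a dynamic proxy rather than AspectJ weaving, the
@RunElsewhere annotation and every other name in it are hypothetical, and a
single background thread plays the part of the grid. This is not how GridGain
implements its annotations, just the general idea of intercepting a marked
method.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AnnotationRouting {

    // Hypothetical marker annotation, invented for this sketch.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface RunElsewhere {}

    interface Worker {
        @RunElsewhere
        String heavy();   // should be shipped off to the pretend grid

        String light();   // should stay on the calling thread
    }

    // Both methods simply report which thread executed them.
    static class ThreadReporter implements Worker {
        public String heavy() { return Thread.currentThread().getName(); }
        public String light() { return Thread.currentThread().getName(); }
    }

    // One daemon thread pretending to be a remote node.
    static final ExecutorService FAKE_NODE = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "fake-grid-node");
        t.setDaemon(true);
        return t;
    });

    // Wraps target so that calls to annotated methods run on FAKE_NODE,
    // while everything else runs directly on the caller's thread.
    static <T> T intercept(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (!method.isAnnotationPresent(RunElsewhere.class)) {
                return method.invoke(target, args);
            }
            return CompletableFuture.supplyAsync(() -> {
                try {
                    return method.invoke(target, args);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }, FAKE_NODE).join();
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }

    public static void main(String[] args) {
        Worker w = intercept(new ThreadReporter(), Worker.class);
        System.out.println("heavy ran on: " + w.heavy());
        System.out.println("light ran on: " + w.light());
    }
}
```

The caller never sees the difference, which is both the appeal and the reason
some of us would rather make the call explicitly through an API.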
The APIs are surprisingly direct and simple to understand for something that is
so seemingly nebulous. This, coupled with some of the best documentation I've
seen in some time, makes for a pleasant experience. The only thing that makes
my vision a little blurry is the generic-soup that begins to occur, but for
the type safety, it's probably worth it; anything that needs to run on a grid
is arguably important enough that you can groan through the angle brackets.
Running a GridGain node requires zero configuration in many cases. That said,
since GridGain is itself a Spring application, and a well-written one with a
number of service provider interfaces, one could easily sculpt what amounts to
a custom product out of it. What is nice is that one can easily run GridGain
within an IDE like Eclipse during testing and development, which significantly
speeds the process of rolling out your apps. While on the topic of time-saving
features, GridGain uses peer-based class loading, meaning that for simple,
self-contained tasks, one need not do any kind of special deployment to execute
them. The task classes are transmitted and loaded by GridGain itself, without
any need for restarts or copying. It's like a magic
fairy sprinkled happy-time sparkles in my brain.
Of course, GridGain provides all manner of fancy-pants things that I haven't yet
explored, such as node affinity for processing close to your data, integration
with caching products from Oracle and JBoss, different scheduling and communication
methods, and similar goodies. I also haven't fully explored implementing custom service
providers, but it's mostly because I couldn't find one that I might need that wasn't
already implemented.
Here's a bottom-of-the-barrel, super-contrived example to whet your noggin knobs.
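    // TestApp.java
    // Imports assume the org.gridgain.grid package; adjust to your version.
    import org.gridgain.grid.Grid;
    import org.gridgain.grid.GridException;
    import org.gridgain.grid.GridFactory;
    import org.gridgain.grid.GridTaskFuture;

    public class TestApp {
        public static void main(String[] args) throws GridException {
            try {
                Grid grid;
                GridTaskFuture future;

                grid = GridFactory.start();
                future = grid.execute(MyTask.class, "Print me...");

                // This causes GridGain to block until all jobs are complete. It
                // also returns the result, if there is one.
                future.get();
            } finally {
                GridFactory.stop(false);
            }
        }
    }

    // MyTask.java
    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.List;

    import org.gridgain.grid.GridException;
    import org.gridgain.grid.GridJob;
    import org.gridgain.grid.GridJobAdapter;
    import org.gridgain.grid.GridJobResult;
    import org.gridgain.grid.GridTaskSplitAdapter;

    /*
     * The task class is going to be the unit of work that gets split
     * into smaller jobs (GridJob) that get distributed and executed.
     * I'm using adapters for both the task as well as the jobs to avoid
     * extra work, so yes, I'm cheating. Note that split() comes from
     * GridTaskSplitAdapter; the plain GridTaskAdapter expects map() instead.
     */
    public class MyTask extends GridTaskSplitAdapter<String, String> {
        @Override
        protected Collection<? extends GridJob> split(int gridSize, String arg)
                throws GridException {
            List<GridJobAdapter<String>> jobs;

            jobs = new ArrayList<GridJobAdapter<String>>();

            // Build a collection of jobs to run on the grid.
            for (int i = 0; i < 10; i++) {
                jobs.add(new GridJobAdapter<String>(arg) {
                    @Override
                    public Serializable execute() throws GridException {
                        System.out.println("arg:" + getArgument());

                        // Nothing to hand back to reduce().
                        return null;
                    }
                });
            }

            return jobs;
        }

        @Override
        public String reduce(List<GridJobResult> results) throws GridException {
            System.out.println("Not much to reduce when we're printing stuff...");

            return null;
        }
    }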
I know it's probably considered verbose, still, to the scripting community
at large, but when you consider what's happening here, it's pretty bad ass.
It's worth noting that there are other frameworks similar to GridGain, in
this space, that deserve a look as well. Hadoop comes to mind. I have no
direct experience with them, so I don't want to say either way, but I have
heard nice things.
This is all very contrived and still thin on details. My goal wasn't to
reproduce a howto, but more so to, well, to glow about GridGain, to some
small degree. I encourage everyone to check out GridGain for themselves and
find the things relevant to them.