Wednesday, January 21, 2009

Another Night with GridGain

Earlier this evening, I had a chance to attend a presentation on GridGain at this month's NYC JavaSIG at the Google Engineering building here in New York City, NY, US. I've written about GridGain before, but if you haven't read my thoughts on it, I'll sum it up: I'm a fan.

I got a chance to talk to Nikita Ivanov, if only briefly. Nice enough guy. What I like most about his presentation is the lack of - and there's probably no other way to really say it - bullshit. Sure, he uses words like grid and cloud, which is always suspect, but in this case he provides an actual, single-slide definition of what it means to him and GridGain.

Grid Computing = Compute Grid + Data Grid

Makes sense if the terms compute grid and data grid mean something to you. Nikita seems to stick to (what I think is) the standard definition of a data grid - a network of data storage machines containing partitioned or distributed storage. I'm paraphrasing a bit here, but mostly because I don't recall his exact wording. The compute grid portion of that should be obvious. I'm not providing any hints on that one.
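To make the compute grid half of that equation concrete, here's a toy, single-machine sketch of the split/aggregate shape that frameworks like GridGain generalize across nodes. To be clear, this is not GridGain's API - it's plain java.util.concurrent with a made-up word-counting task - but the split-into-jobs, combine-the-results pattern is the part that matters.

 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;
 import java.util.concurrent.Callable;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
 import java.util.concurrent.Future;

 public class SplitAggregateSketch {

   public static void main(String[] args) throws Exception {
     /* The "data" here is just a few hard-coded chunks of text. In a
      * real grid it would live in (or be fed from) the data grid.
      */
     List<String> chunks =
       Arrays.asList("the quick brown fox", "jumps over", "the lazy dog");

     ExecutorService pool = Executors.newFixedThreadPool(chunks.size());
     List<Future<Integer>> results = new ArrayList<Future<Integer>>();

     /* Split: one independent job per chunk. A compute grid would ship
      * these jobs to other nodes instead of a local thread pool.
      */
     for (final String chunk : chunks) {
       results.add(pool.submit(new Callable<Integer>() {
         public Integer call() {
           return chunk.split("\\s+").length;
         }
       }));
     }

     /* Aggregate: combine the partial results into a final answer. */
     int total = 0;

     for (Future<Integer> result : results) {
       total += result.get();
     }

     System.out.println("Total words: " + total);
     pool.shutdown();
   }
 }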

Cloud computing is defined by Ivanov as follows.

Cloud Computing = Grid Computing + Data Center Automation

This is also simple and concise. So we get grid computing, at least in the context of GridGain. Data center automation, in this case (and in Ivanov's opinion), covers not just the normal stuff, but specifically the creation and shutdown of machine instances. This is generally put behind an API, such as Amazon's EC2 and related services, with the goal of giving you a greater degree of flexibility. While I'm not as in love with EC2 as some others may be (not referring to Ivanov specifically, just the sometimes expressed idea that EC2 is the solution to all woes), it is a readily available cloud environment that one can play with. I'm glad it exists.

A panacea? Of course not. Honestly, what could be, short of a super code-monkey falling from the sky to do your evil bidding? Ok, maybe an intern.

The point, I think, is that this kind of functionality - the ability to perform massive, distributed, parallelized computing without six-plus figures worth of hardware and software - is both simple and significant.

As usual, no grand epiphany here on my part... just some commentary on one of the areas where we can push performance in real world applications. That, of course, being something we should always be looking to do. Thanks to GridGain, Ivanov, NY JavaSIG, Google, the JavaSIG sponsors, and my employers for not getting annoyed that I suckered my team into cutting out early to go to the event.

Friday, January 9, 2009

Of Maven Dependencies and Repositories

I think of Maven the same way I tend to think of Git: excellent features, but just a little more complicated and obtuse than is really reasonable for the task. I know that's insanely unpopular to say about Git, but luckily this isn't about Git.

Recently, I was converting a project at work to Maven (from Ant) as an experiment. This is a relatively standard, mid-sized Java project that makes heavy use of a number of what I would consider common Java libraries. In our case, we use Spring very heavily, along with other staples like Hibernate. One of Maven's killer features is the ability to resolve dependencies and pull the correct versions from the central Maven repository, but we already know that.

I found the selection of dependencies from the central repository to be one of the worst things I've had to do in recent days. I was spending more time setting up different repositories and wading through the duplicate packages than I was enjoying the benefits of such features. It's almost more of a headache than doing it all by hand.

OSGi bundles, which have at least some similarities with Maven in how dependencies are specified, allow one to express dependencies in a few ways. The obvious unit of dependency is another bundle. This is, effectively, the same as Maven. OSGi bundles, though, may also opt to specify only what Java packages the bundle imports and let the runtime figure out which bundle to take those packages from. This is similar to how many Linux package managers operate when more than one package can fulfill a dependency, and it's traditionally referred to as a virtual dependency.

Maybe what we need from Maven is the notion of the virtual dependency. A Maven POM could specify virtual packages as dependencies that could be filled by any one of a number of providers. Java lends itself to this very well because the majority of standards define APIs with the service providers distributed separately. Think of things like JPA (provided by Hibernate EntityManager), JAXP (Xerces and friends), and so on. I suppose it's a little different because Java developers want to pick an implementation for a specific reason, but having virtual dependencies would eliminate many of the overly specific dependency graphs created when dealing with complex packages such as Spring.
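For what it's worth, Java source already works this way at the API level. The snippet below - a trivial JAXP example, with the class name made up for illustration - compiles against the javax.xml.parsers API only; which parser actually does the work (Xerces, the JDK's bundled copy, whatever) is resolved at runtime by the factory lookup. A virtual dependency would just let the POM say what the code already says: I need a JAXP provider, not this particular one.

 import javax.xml.parsers.DocumentBuilder;
 import javax.xml.parsers.DocumentBuilderFactory;

 public class JaxpProviderDemo {

   public static void main(String[] args) throws Exception {
     /* The code depends only on the JAXP API; the factory lookup picks
      * a concrete parser implementation at runtime.
      */
     DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
     DocumentBuilder        builder = factory.newDocumentBuilder();

     System.out.println("Parser provided by: " + builder.getClass().getName());
   }
 }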

It's worth noting that the most significant issue I have with Maven is the quality of the metadata. It is just plain awful. Some of the things I ran into were:

  • Packages that weren't updated with bug fixes or recent versions
  • Many copies of the same package with different names and odd discrepancies in versions
  • Missing (or unavailable) dependencies
  • When using Spring's Maven repositories, duplicates of dependencies are pulled in because Spring depends on versions that aren't in the central repository
  • Because Spring came from Spring's repository, packages like GridGain that depend on Spring grab the version from the central repository, while Spring Integration, which is only available from Spring's repository, depends on the version of Spring from Spring's repository... AARRRRRRRRRRRRGGGGGGGGGGHHHHH!

I get that this is hard and it requires a lot of coordination. I get that I could repackage things in my local repository or a corporate shared repository. Should I have to? A lot of the advantage of Maven is lost when one has to manually follow dependencies to figure out why there are two (full) versions of Spring Core in the project. It's annoying, wasteful, and prone to error.

Maven, I want to like you. Really I do. But like a real, live, flesh and blood human, you make it so difficult sometimes, just like your sister (Git).

Wednesday, January 7, 2009

Declarative Concurrency in Java

Good, solid, safe, effective concurrent programming is hard. Modern languages and paradigms make it easier, but for most, it's still a challenge to get right, right away. Many people have predicted the end of the great GHz race. They're probably right. I don't have any great insight into the CPU design community. Honestly, it just doesn't hold my attention. Multi-core systems are all the rage these days, though, and that's pretty damn cool. None of this is new; plenty of people smarter than me have pointed it out.

One of the purported benefits of functional programming is how it lends itself to concurrent programming. Luckily, I work with a smart guy who's both patient and polite enough to talk to me about FP without serving kool-aid (thanks Adam). Many of those conversations entail discussion about state, immutability, and side effects in software implementation. This, of course, leads me to think about how some of these things apply to one of our weapons of choice where we work - Java.

Java accomplishes concurrency via thread objects. Big deal; nothing new here. Most of the confusion comes into play not when deciding what should run concurrently - that's usually obvious - but when figuring out how to protect shared state. Again, in Java-land, we do this with different types of locks, either implicitly with synchronized blocks or explicitly with the grab bag of fun from the java.util.concurrent.locks package. Many of the Sun docs talk about how we use locks to establish happens-before relationships between points in code. What's interesting is that this language seems so natural and simple. So why is lock management such a pain?
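For the record, here's the minimal version of both styles - a sketch, nothing more - protecting the same kind of shared counter. Releasing a lock happens-before a later acquisition of that same lock, which is exactly the guarantee that makes each increment visible to the next thread in line.

 import java.util.concurrent.locks.ReentrantLock;

 public class Counters {

   private int implicitCount = 0;
   private int explicitCount = 0;

   private final ReentrantLock lock = new ReentrantLock();

   /* Implicit locking: the monitor on "this" is acquired and released
    * for us by the synchronized keyword.
    */
   public synchronized void incrementImplicit() {
     this.implicitCount++;
   }

   /* Explicit locking: we acquire and release the lock ourselves, and
    * must remember the try/finally or an exception leaves it held.
    */
   public void incrementExplicit() {
     lock.lock();

     try {
       this.explicitCount++;
     } finally {
       lock.unlock();
     }
   }
 }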

Maybe imperative locking isn't the right approach. Maybe there's a more natural way to establish a happens-before relationship. It sounds like dependency declaration. I'm wondering if we can't find a way to declare those dependencies within source code - with something like annotations - so that instrumentation can infer what we're looking for. This, of course, is sugar for what we have now, but I don't think sugar is always bad.

 public class MyClass {

   private int counter = 0;

   @Concurrent( stateful = true )
   public void execute() {
     /* Do something that might touch shared state. */
     this.counter++;
   }

   @Concurrent( unitName = "otherExecute", stateful = false )
   public void otherExecute(String someArg) {
     /* Do something that promises not to alter ourselves. */
   }

   @Concurrent(
     unitName      = "somethingElse",
     stateful      = true,
     happensBefore = "otherExecute"
   )
   public void somethingElse() {
     /* This can be run concurrently, could touch state, but
      * must happen before "otherExecute" is called.
      */
   }

   public static void main(String[] args) {
     ConcurrentController controller;
     ConcurrentUnit       unit1;
     ConcurrentUnit       unit2;

     /* Build a controller from the annotated class and grab a handle
      * to each named unit of work, sized however we like.
      */
     controller = ConcurrentController.forClass(MyClass.class);

     unit1 = controller.getUnit("somethingElse").setThreadPoolSize(10);
     unit2 = controller.getUnit("otherExecute").setThreadPoolSize(5);

     unit1.start();
     unit2.start();
   }
 }

The @Concurrent annotations would instruct an instrumentation library to perform an operation in parallel. The stateful and happensBefore hints could be used to perform additional automatic member-variable monitor acquisition or something equally snazzy. The unitName values could be used to grab a handle, of sorts, to a concurrent unit of work and to establish relationships or report on concurrency plans (which could be similar to an RDBMS query execution plan). Who knows... I'm tossing ideas around.
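To make that a little less hand-wavy, here's one guess at the kind of glue code instrumentation could generate for the happensBefore = "otherExecute" hint above: a latch that the dependent unit waits on and the prerequisite unit releases. Everything here (the class name, the wrapper methods) is hypothetical - it's just the shape of the coordination, not anything an existing library produces.

 import java.util.concurrent.CountDownLatch;

 public class GeneratedOrdering {

   private final CountDownLatch somethingElseDone = new CountDownLatch(1);

   /* Instrumentation would route calls to somethingElse() through here,
    * releasing the latch once the unit has finished.
    */
   void runSomethingElse(MyClass target) {
     try {
       target.somethingElse();
     } finally {
       somethingElseDone.countDown();
     }
   }

   /* Calls to otherExecute() would be routed through here, blocking
    * until the prerequisite unit has completed.
    */
   void runOtherExecute(MyClass target, String someArg) throws InterruptedException {
     somethingElseDone.await();
     target.otherExecute(someArg);
   }
 }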

I don't think it covers every situation. In fact, I'm sure it doesn't cover everything. It's beyond flawed and probably not possible. I'm just trying to get some wheels turning. The goal is to have simpler, coarse-grained, declarative concurrency definition that can be externalized.

I'm intrigued by the idea of simple concurrency models that don't remove the fine-grained control given to us by the language and APIs. If concurrency isn't going away, it has to get easier for the majority of people to do it correctly.

I'm especially interested in feedback on this.

Thursday, January 1, 2009

Agile Languages and Developer Experience

I just finished reading Jamis Buck's Legos, Play-Doh, and Programming article where he discusses some significant differences in methodologies between languages like Ruby and Java. It got me thinking.

Jamis makes a number of points about how the Ruby way is generally more dynamic and less prone to specialized components. This has a lot to do with the points of extension in Ruby and the malleability of the language itself. For instance, Ruby's ability to inject methods into existing classes, or the convention of duck typing in the standard libraries, allows developers to pose as different types of objects and get away with a lot more than they could in a language like Java. There are tons of arguments in both communities over which approach is better. I call shenanigans.

It's my hypothesis that both are great. I know that sounds like a cheap way to duck the flying artillery between the camps, but it's the truth. Hey, I wrote Perl for years, so don't pretend to have invented the notion of language flexibility with me. (The previous sentence just started a million replies; let's pretend I've already read them all because I know what you're going to say.) I've used Ruby, Python, Perl, Java, C, C++, and others on medium- to large-sized, real projects, so I think I can be objective on this one.

It's my experience that the problem isn't really with either approach. I've met ninja-good developers on all sides. The wall I have run into, on the other hand, is that the amount of rope given to a less experienced or less disciplined developer is almost directly proportional to the insanity they can manufacture. I don't think one can make a blanket statement on language suitability one way or the other.

In the past, I've used dynamic languages (heavily) for my own projects. No matter how right or wrong my own code is, it's mine and I get it, most of the time. I have a background in C and tend to do a lot of mental bookkeeping when I code. It's a holdover from days when I didn't have tools to sanity-check my own work. There was a lot of rope, but I was the only one in the room, so I made sure not to throw it over the rafters, wrap it around my neck, climb on a chair, and then try to figure out what I had done wrong. If I did, I had only myself to blame.

When I'm working with a team of fifty developers of varying levels of experience, discipline, knowledge of good design and testability, and interest in their craft, the game changes. You can't bank on each member of the team having the same skill set or even interest in what you're trying to get done. Or maybe you can, simply by applying very strict guidelines. The fact of the matter is that it's not as cut and dried as either side wants you to believe. You're going to have code-cowboys who are clever - and beware that word, for it means smart, but with a hint of trickiness and subversiveness - who are going to do things like override the built-in functions to give them cool new uses never previously imagined! You're going to have recent university grads who need mentoring, even if only half of them know it. You're going to have your average, mediocre developer for whom this is a nine-to-five gig funding an ever more expensive pot habit. And, if you're lucky, you'll have one or two super-star, ninja coders who turn out reliable, efficient, readable, documented, testable, well-designed code. (As an aside, if you're one of those people, have one, or know one, I'm looking for resumes.)

My point: do not optimize for a group you do not have.

Lying to yourself will only get you knee-deep in your own rationalizations about why your language is, in fact, the best one on the planet, and how the other guys just don't get it. The worst part is you'll still be right every single time.