Friday, June 12, 2009

Why OSGi needs to come of age

The team I'm lucky enough to be part of is currently working on a project with GridGain, which I've mentioned before, once or twice. In our case, we are invoking map/reduce jobs via RESTful web services. It works well. The problems start cropping up with the way classes are loaded (or not) inside Tomcat.

In our case, GridGain is effectively deployed in a war file. This is because we want to communicate with a layer around GridGain via HTTP (specifically, Spring MVC). In the interest of time and simplicity, we chose to start up the grid from within Tomcat. A cleaner design would be to queue tasks from within Tomcat, run our GridGain app as a daemon, and dequeue tasks to be executed there (maybe using JMS or something similar). That's a lot of jumping through hoops, really, because at that point you may as well speak JMS directly to the GridGain-enabled daemon. Too much bloat.

Most of the complexity is introduced simply by the environment required to receive HTTP requests. When you think about it, just to support HTTP invocation we've had to change how we build, package, deploy, start and stop services, and deal with third-party dependencies. Worst of all, Tomcat has to follow the servlet specification, including its rules for class loading. As it turns out, GridGain plus Tomcat equals yuck.

Now, I won't really go into specifics (in this post) about my thoughts on class loading in Java, peer class loading with GridGain, and the myriad of issues we've seen and run into. What I do want to touch on is the more general theme that, even after all of this time, this stuff (i.e. static and dynamic class loading in containers) is still way too thick, intrusive, delicate, under-documented, and buggy. We need evolution, if not revolution.

I dropped in a war file containing our code and more than 100 jars' worth of dependencies to make Spring, Hibernate, AspectJ, and GridGain happy. One hundred, plus. Smells like disaster already. GridGain starts up, Spring MVC does its thing, life looks good. I make a request to a controller that invokes a task via GridGain and things get interesting.

The task we want to start is looked up in a Spring config and instantiated using reflection. That makes it easy to add new types of tasks without mucking around with the core grid assembly and packaging, but it creates an interesting case: the jar(s) containing the grid tasks now need to live in the war file. The alternative seemed to be storing them in a directory accessible to the Tomcat shared class loader, but during testing, we found that Tomcat couldn't handle this (for some reason that is still admittedly a little unclear to me). The class simply wasn't found.

Tomcat modifies the usual class loader delegation because of the servlet spec, so the lookup order should be bootstrap, system (JDK), the war classes and internal libs, then the common and shared class loaders (if I remember correctly). This means that when we move our grid tasks to the shared class loader, they come after the web app class loader. Normally this shouldn't be a problem, but I think the issue we ran into was that the grid task extended a class that was present in the war's class loader, not in the shared class loader. A class defined by the shared loader has its superclass resolved through the shared loader as well, and the shared loader, being a parent of the web app loader, can't see anything inside the war. When we move the grid task jar into WEB-INF/lib, it magically starts working. Sigh.
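The suspected failure can be reproduced outside Tomcat. What follows is only a sketch under some assumptions: BaseTask, MyTask, IsolatedLoader and LoaderVisibilityDemo are made-up names, and the loaders use plain parent-first delegation rather than Tomcat's web-app-first order. The point should hold either way: a class defined by a parent loader never gets to look down into a child loader for its superclass.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Set;

class LoaderVisibilityDemo {
    static final String BASE = BaseTask.class.getName();
    static final String TASK = MyTask.class.getName();

    static String tryLoad(ClassLoader loader, String name) {
        try {
            loader.loadClass(name);
            return "loaded";
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            return "failed: " + e.getClass().getSimpleName();
        }
    }

    // Task jar in the "shared" loader, its superclass only in the "webapp" loader.
    static String taskInSharedLoader() {
        ClassLoader shared = new IsolatedLoader(null, Set.of(TASK));
        ClassLoader webapp = new IsolatedLoader(shared, Set.of(BASE));
        return tryLoad(webapp, TASK);
    }

    // Task jar moved next to its superclass (the WEB-INF/lib case).
    static String taskInWebappLoader() {
        ClassLoader shared = new IsolatedLoader(null, Set.of());
        ClassLoader webapp = new IsolatedLoader(shared, Set.of(BASE, TASK));
        return tryLoad(webapp, TASK);
    }

    public static void main(String[] args) {
        System.out.println("task in shared loader: " + taskInSharedLoader());
        System.out.println("task in webapp loader: " + taskInWebappLoader());
    }
}

// Hypothetical stand-ins: BaseTask plays a class that lives only in the war,
// MyTask plays a grid task that extends it.
class BaseTask {}

class MyTask extends BaseTask {}

// Defines only the classes named in `owned`; everything else goes to the parent.
class IsolatedLoader extends ClassLoader {
    private final Set<String> owned;

    IsolatedLoader(ClassLoader parent, Set<String> owned) {
        super(parent); // a null parent means "bootstrap loader only"
        this.owned = owned;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        if (!owned.contains(name)) {
            throw new ClassNotFoundException(name);
        }
        String path = name.replace('.', '/') + ".class";
        // Borrow the compiled bytes from the regular classpath.
        try (InputStream in = IsolatedLoader.class.getClassLoader().getResourceAsStream(path)) {
            if (in == null) {
                throw new ClassNotFoundException(name);
            }
            byte[] bytes = in.readAllBytes();
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}
```

With the task owned by the parent loader, I'd expect the first case to fail while resolving the superclass, and the second case (task and superclass in the same loader) to load cleanly, which matches what we saw when the jar moved into WEB-INF/lib.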

I'm a fan of OSGi. It (mostly) makes sense to me. It's explicit (painfully so, in many cases) and direct. My feeling is that, in a case like ours, it would at least be obvious that class loaders need to be wired together to make this work. The main grid application could have fragments containing the grid tasks, for instance, to push additional task plugins into the core application, making the classes available. Alternatively, a more direct way would be to simply list the task packages as dependencies in the manifest of the core application. This would obviously couple the core app to the tasks, which isn't very nice.
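To make the two options a bit more concrete, here is a rough sketch of the manifest headers involved, built with the JDK's java.util.jar.Manifest so it can be run and inspected. Fragment-Host and Import-Package are standard OSGi headers; the bundle names (com.example.grid.core, com.example.grid.tasks) and the BundleManifestSketch class are invented for illustration and are not anything from our project or from GridGain.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

class BundleManifestSketch {
    // Option 1: a fragment that attaches its task classes to the core bundle,
    // so they end up loaded by the core bundle's class loader.
    static Manifest taskFragment() {
        Manifest m = new Manifest();
        Attributes a = m.getMainAttributes();
        a.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        a.putValue("Bundle-ManifestVersion", "2");
        a.putValue("Bundle-SymbolicName", "com.example.grid.tasks"); // hypothetical name
        a.putValue("Bundle-Version", "1.0.0");
        a.putValue("Fragment-Host", "com.example.grid.core");        // hypothetical host
        return m;
    }

    // Option 2: the core bundle imports the task package directly,
    // which couples the core app to the tasks.
    static Manifest coreWithImport() {
        Manifest m = new Manifest();
        Attributes a = m.getMainAttributes();
        a.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        a.putValue("Bundle-ManifestVersion", "2");
        a.putValue("Bundle-SymbolicName", "com.example.grid.core");  // hypothetical name
        a.putValue("Bundle-Version", "1.0.0");
        a.putValue("Import-Package", "com.example.grid.tasks");      // hypothetical package
        return m;
    }

    // Renders the manifest as it would appear in META-INF/MANIFEST.MF.
    static String render(Manifest m) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            m.write(out);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(taskFragment()));
        System.out.println(render(coreWithImport()));
    }
}
```

Either way, the wiring is written down where you can see it, which is the part I find appealing compared to guessing at Tomcat's loader order.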

The GridGain team, unfortunately, hasn't embraced OSGi. In fact, Ivanov seems to summarily disregard it in the comments of this post. I was a bit let down to see this kind of hard-line stance on a subject that more and more people seem to be interested in, especially with the SpringSource folks driving at it. While Nikita makes an excellent point about distributed class loading and OSGi being a less than nice match, I do think there's room for GridGain with peer class loading disabled in an OSGi container. For those of us who don't mind the hassle of deploying grid tasks (and GridJob implementations, specifically) on all nodes, at least having the option of GridGain in an OSGi container like Spring dm would be nice. It's possible that this can be done, but when I tried, things didn't go well. There's no doubt that Ivanov and the GridGain team have far more experience with the class loading details than I do, but the stance taken still lets me down, at least without a better explanation and documentation.

OSGi isn't perfect. It's far from it, obviously. What it does begin to chip away at, though, is the big fat bundle-it-all-in-one-giant-file issue that plagues Java-land. Honestly, Java is thick and we have to change that. OSGi is one way to cut it down to a reasonable footprint while still allowing for decoupling and service-like component design. Sure, "permgen" is a dirty word and the class loading mechanics may be too simplistic, but it scares me to think we'd be stuck in war-land for the foreseeable future.

2 comments:

Nikita Ivanov said...

Hi There,
Just wanted to clarify my point on OSGi. The idea I was talking about is that OSGi is predominately a local in-JVM technology (similar to our SPI-based architecture).

Yes - you can have distributed OSGi (so to speak) a-la Newton, etc. - but let me ask you a question: what specifically are you trying to achieve by having OSGi internally in GridGain? What feature will it enable? Will it make anything significantly simpler, faster, more productive to use?

Nikita.

E. Sammer said...

Nikita:

I should clarify what I'd like to do. In this case, I would not like to have OSGi "internally" in GridGain. I'd like to deploy a GridGain app in an OSGi environment (Spring dm).

The reason is that I'd like my grid jobs to be able to make use of OSGi services, and also that I'd like the class isolation OSGi provides, which allows for things like concurrent versions of the same class.

I don't really have an opinion on the GridGain internals; I think GridGain should do internally whatever makes sense. I only want to be able to run GridGain within an OSGi environment (which didn't seem to go well).

I really appreciate your feedback on this. I think I misunderstood what you meant about OSGi. I'm just talking about using GridGain in such an environment.

Thanks!