Monday, October 29, 2007

Time Keeping and Scheduling Systems

The following is probably old hat. For me, today, it's worth rehashing.

Recently, I've had to work on a rather feature-rich scheduling system. This is one of those tasks that always seems trivial the first time you need to do it, but really isn't. In many, many, many cases, the existing frameworks can't bail you out of the details. Sure, plenty of frameworks and libraries do things like timezone conversion and the like; that's helpful (if not required), but it's not the complete package.

I was working on recurrent schedule generation with a coworker, GR. Whilst working out the scary details of operating in multiple client timezones, being locale sensitive, and guaranteeing non-duplicate job execution, he turned to me with a stunning revelation (as he's known to do, often) and said

You know, this is one of those things that has been done a thousand times, correctly, but none of those implementations were ever right.

Obviously, what he meant is that you can't get this right because it's so heavily dependent on the context within which one is operating. It depends on desirable locale effects, granularity of event intervals, and similar concerns. In one case, it may not matter if an event fires twice, once a year, during the daylight saving time adjustment; task idempotency may be guaranteed, whereas in other cases, this isn't true (like for me). Maybe you have the luxury of operating within a single timezone (lucky you). All of these things affect the feature set and can really change the game.
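To make the daylight saving wrinkle concrete, here's a small sketch (not from our system; just an illustration using java.util.Calendar in the America/New_York zone) of how a naive "add 24 hours of milliseconds" scheduler drifts an hour across the 2007 US fall-back, while a calendar-aware bump keeps the wall-clock time:

```java
import java.util.Calendar;
import java.util.TimeZone;

public class DstDrift {
  public static void main(String[] args) {
    TimeZone tz = TimeZone.getTimeZone("America/New_York");

    // A daily 9:00 AM job, set for the day before the US fall-back
    // (Nov 4, 2007, when local clocks went from EDT back to EST).
    Calendar cal = Calendar.getInstance(tz);
    cal.set(2007, Calendar.NOVEMBER, 3, 9, 0, 0);
    cal.set(Calendar.MILLISECOND, 0);

    // Naive approach: "tomorrow" is now() + 24 hours of millis.
    Calendar naive = Calendar.getInstance(tz);
    naive.setTimeInMillis(cal.getTimeInMillis() + 24L * 60 * 60 * 1000);
    System.out.println(naive.get(Calendar.HOUR_OF_DAY)); // prints 8, not 9!

    // Calendar-aware approach: add one calendar day; the wall-clock
    // hour is preserved across the DST transition.
    cal.add(Calendar.DAY_OF_MONTH, 1);
    System.out.println(cal.get(Calendar.HOUR_OF_DAY)); // prints 9
  }
}
```

Neither behavior is universally "right" - which is rather the point; whether your job should fire 24 elapsed hours later or at 9:00 AM local time is a business decision, not a library default.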

All of the solutions I have (personally) encountered for task scheduling miss the mark, in my current case, in one way or another. It's frustrating to have to build something that seems as if it should already exist. Of course, there are internal business requirements that prevent me from going into significant detail or from open sourcing the result, but trust me when I say this isn't as easy as it sounds. (I say this in hope of staving off the slew of well, why not just use XYZ comments I'm bound to get.)

If all of this means nothing to you, take away only this: sometimes, being forced to reinvent the wheel gives one an amazing appreciation for the work that went into the damn thing.

Many thanks to my ninja-smart coworker, GR, as well as my apologies for my awful paraphrasing of his insightful, comedic wit, and view of the task at hand.

Monday, October 22, 2007

A Quick Note on BlogRush

When I first created this blog (ugh... I hate that word) I played around with different things in a feeble attempt to get some traffic. One of those things was BlogRush. I won't provide a link or more information other than they claim to drive traffic based on... oh who cares. It didn't work, anyway.

What's pretty damn awesome is that, in addition to not working, their site was non-functional for some time. Oops. But - and this is the icing on the cake - today, I received this email from them (edited for brevity).

We regret to inform you that your BlogRush Account has been made INACTIVE because your blog did not pass our Quality Review criteria. You will find instructions below for making your account active again.

Wait. It gets better.

We determined that your blog did not meet our strict quality guidelines. Please do not take this personally but realize that we must abide by a very strict set of quality guidelines. (They are listed below.)

Note that the content syndicated via BlogRush was so unbelievably off topic, it wasn't funny. Here's the great part, though: their standards.

- The blog contains unique, quality content that provides opinions, insights, and/or recommended resources that provide value to readers of the blog. Articles, videos, public domain works, press releases, and content written by others are okay to be used on the blog, but the ratio of unique content should far outweigh content from other sources.

Ok, so I have opinions and I'm not a link farm. Check. Admittedly, I can't quantify my thoughts or opinions as quality content, but...

- The blog should be updated on a regular basis (at least several times a month) and should not just go a few months between posts.

Check.

- The blog should already contain at least 10-12 quality posts. New blogs with very little content will not be accepted.

New? Yes, but I meet their 10 - 12 posts. Check.

- The blog's primary contain must be in English. BlogRush is currently not available for non-English blogs.

Again, admittedly, my writing leaves a lot to be desired, but it's (mostly considered) English. Check.

- The blog should not contain an excessive amount of advertising and links and very little actual content. The focus of the blog should be quality content.

This is not without a certain degree of irony, but OK. Check.

- The primary content of the blog should not be "scraped" content from other sources and/or script-generated pages for the sole purpose of search engine rank manipulation. The focus of the blog should be quality content.

My writing is so terrible I would never presume to pin it on someone else. It's mine. The focus is certainly quality content (minus this post, oddly enough) - whether or not I achieve said goal is an exercise best left to the reader. Well, OK... Check.

- The blog's content (or advertising) should not contain any of the following types of content: hate, anti-racial, terrorism, drug-related, hacking, phishing, fraud, pornographic, nudity, warez, gambling, copyright infringement, obscene or disgusting material of any kind, or anything considered illegal.

Uh... I guess it depends on how one defines hacking, but otherwise, I think we're safe on this one as well. As an aside, someone will have to explain what anti-racial means; I do not endorse any kind of racism, but do I no longer even have a race? That's... weird. Just the phrase anti-racial hurts my head. Dear BlogRush lawyers; racial != racism and you probably do want to be anti-racism. You confuse me.

So there you have it. BlogRush has deemed this content to be of low quality or anti-racial and I have removed the BlogRush widget. I'm actually kind of feeling better already; it had From the Blogosphere on it and that just makes me... cringe.

I urge others to drop BlogRush as well.

Sunday, October 21, 2007

An OSGi Experience

Last Wednesday, I had a chance to attend the local Java Special Interest Group meeting at the Google Engineering building, here in New York City. This month's topic was the OSGi standard and the technology behind it. The presenters were Dr. Richard Nicholson and David Savage from Paremus.

Unfortunately, while the slides were good and the presenters were clearly very smart, their presentation style left quite a bit to be desired. (Hint: Know your slides, know your audience, don't read from the slides, etc. Credit to my very smart coworker, JF, for those helpful suggestions.)

If you're not familiar with OSGi (which I wasn't, really, until recently), it is basically a framework for structuring applications as loadable bundles that can be dynamically managed. Of course, that's a simplification; I suggest reading the OSGi group's explanation for a better understanding. One person at the meeting, who sits on the OSGi board for a major web app server company and with whom I had a chance to talk prior to the presentation, summed it up as class loaders on steroids.
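For the unfamiliar, a bundle is just a jar whose manifest declares what it exposes to, and needs from, other bundles; the framework resolves those dependencies with per-bundle class loaders. A minimal, purely illustrative MANIFEST.MF might look like this (the header names come from the OSGi R4 spec; the package names are made up):

```
Bundle-ManifestVersion: 2
Bundle-SymbolicName: com.example.renderer
Bundle-Version: 1.0.0
Export-Package: com.example.renderer.api
Import-Package: org.osgi.framework
```

Everything not exported stays private to the bundle, which is the isolation trick the class loaders on steroids quip refers to.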

During the presentation, Dr. Nicholson discussed some of the failings of service oriented architectures and how they hadn't really lived up to the hype. Probably true, but what does? It made me wonder how one can judge something like that, but I suppose (and according to my presentation-savvy cohort, JF) making a controversial statement during a presentation keeps people paying attention. It worked on me, to be sure.

OSGi, in theory, allows one to build application bundles as independent components and wire them together at runtime. I'm still hazy on some of the details around how that wiring takes place, but it's certainly comparable to the idea of software services. As I understood it, OSGi is local, within a JVM. If you want to extend the concept to encompass distributed applications, you need another layer on top of this. During the presentation, Dr. Nicholson demonstrated a product from Paremus called InfiniFlow, which he described as an Enterprise Service Fabric. Names aside (because that's a little, shall we say, marketing-inspired?), he used 12 Apple Mac Minis running the InfiniFlow agent and his MacBook Pro to dynamically build a fractal rendering farm. Not too shabby, Doc.

With this, he loaded a number of OSGi bundles into (presumably) some InfiniFlow management application, deployed it to all 12 machines (getting instant feedback the entire time), and showed us that the network (or fabric) was set to render a Mandelbrot set. Kicking off the rendering showed that each Mac Mini was performing calculations and sending the results back to the laptop where pieces of the image were being drawn. That was neat.

What was more impressive is that he next loaded a new OSGi bundle and, almost immediately, the client had new options for the Julia set. Starting the rendering process showed that the 12 nodes now supported and were executing a new function with no down time, no restarts, and no administrative work (other than the initial loading of the bundles).

The idea behind OSGi is very cool. Dr. Nicholson and Mr. Savage told us that some projects such as Eclipse already use OSGi functionality and that Apache is releasing an OSGi R4 implementation as open source in the form of Felix. Additionally, they said that they have plans to try and push OSGi for other languages, specifically mentioning PHP and others (sorry, don't remember all the details). This could be nice to see, especially in environments where multiple languages are supported in one runtime (such as language-independent VMs like Parrot), so that Java OSGi bundles could be loaded in Perl or Ruby. This is where I'd mention the CLR if I were a Windows guy.

Who knows if OSGi will gain the kind of traction they're hoping for, but it does have some major players involved including Eclipse, Apache, BEA, IBM, Interface21 (Spring guys), SAP AG, and many others. Either way, the ideas behind this technology are really exciting, from an architectural point of view for obvious reasons.

As an aside, it's worth considering the rather yucky learning curve to understanding how these bundles are wired together. With systems such as OSGi (or anything like it), it can easily become a debugging nightmare and a maze of abstraction. I'd be remiss if I didn't at least make mention of this aspect. You need smart tools, smart people, and a very clear understanding of this kind of technology before deciding to go down this road. Here's to hoping these guys do it right (and keep it as open source)!

Tuesday, October 9, 2007

Strategy and Decorator - A comparison (Part 2)

In my last entry, I talked a bit about the Decorator design pattern and how one can use it to extend the functionality of a class. In part two of my look at the Decorator and Strategy design patterns, I'll take a similar look at the Strategy pattern. (Go figure.)

The Strategy pattern, like Decorator, lets one vary part of a class's functionality dynamically, but in a different way (and for a different purpose). The goal with Strategy is to define an interface for a family of algorithms and allow them to be dynamically substituted. At some level, it does this in a way very similar to Decorator, in that all strategy implementations must conform to an interface that is known to the caller. The important difference is that rather than one object wrapping another up, from a structural point of view, the caller (sometimes called the Context, as it is in the Wikipedia article) delegates a single method (usually) to the concrete strategy implementation. That's probably a bad explanation. Let's look at an example.

public interface SortStrategy {
  public String[] sort(String[] items);
}

public class QuickSortStrategy implements SortStrategy {
  public String[] sort(String[] items) {
    // ... quicksort implementation elided ...
    return items;
  }
}

public class MyApplication {
  private SortStrategy sortStrategy;

  public void setSortStrategy(SortStrategy sortStrategy) {
    this.sortStrategy = sortStrategy;
  }

  /* Delegate sorting to whichever strategy is currently set. */
  public String[] sort(String[] items) {
    return sortStrategy.sort(items);
  }

  public static void main(String[] args) {
    MyApplication app = new MyApplication();

    app.setSortStrategy(new QuickSortStrategy());

    System.out.println("Sorted args: " + java.util.Arrays.toString(app.sort(args)));
  }

  // Other getters, setters, helpers, etc...
}

Of course, it's not very interesting when we have just one concrete strategy implementation like in our example, but you can easily see how you can pick a class based on external information. I use sorting as an example because it's dead simple to understand why one would want to vary the algorithm based on the data.

What's really interesting about this is when we start combining a pattern like Strategy with, for instance, generics. Now, you have a situation where you can vary algorithms and types; that's reusability.
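As a sketch of that combination (the names here are illustrative, not from any framework), the strategy interface can be parameterized over the element type, so one family of implementations sorts anything comparable:

```java
/* A generic strategy interface: implementations vary the
 * algorithm, the type parameter varies the data. */
interface SortStrategy<T extends Comparable<T>> {
  T[] sort(T[] items);
}

/* One concrete strategy: a simple in-place insertion sort. */
class InsertionSortStrategy<T extends Comparable<T>> implements SortStrategy<T> {
  public T[] sort(T[] items) {
    for (int i = 1; i < items.length; i++) {
      T key = items[i];
      int j = i - 1;
      // Shift larger elements right to make room for key.
      while (j >= 0 && items[j].compareTo(key) > 0) {
        items[j + 1] = items[j];
        j--;
      }
      items[j + 1] = key;
    }
    return items;
  }
}
```

The same InsertionSortStrategy now works for Integer, String, or any other Comparable, and the caller can still swap it for a different algorithm at runtime.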

When people talk about Strategy, they tend to talk about varying an algorithm, but it's also interesting to consider things that aren't necessarily algorithms. For instance, Strategy is an option when talking about varying serialization methods, network transports, and so forth. If there is a case where you want to vary part of an object's behavior at runtime, but not necessarily all of it (or can't because of other inheritance requirements), Strategy can really bail you out. In short, you're delegating some behavior to a dynamically selected class that is guaranteed to conform to a known interface for implementing said behavior; that's Strategy.

While this hasn't been a direct A / B comparison of Decorator and Strategy, hopefully a better understanding of these two patterns will help you as it has me.

As always, comments are very welcome!

Friday, October 5, 2007

Strategy and Decorator - A comparison (Part 1)

Just the other day, while working on a project, I had the opportunity to see first hand how the Strategy and Decorator patterns handle similar cases, but with very different results and methods. To summarize (and my apologies to the GoF and all other authorities on such subjects), the Decorator wraps an object (creating a has-a relationship) and extends functionality in one of a few ways. The Strategy pattern, on the other hand, allows one to swap out the guts of an object, usually the implementation of a specific method or algorithm, for another, dynamically.

In this entry, I'll take a look at the Decorator pattern, an example, and what makes it cool.

The Decorator is neat because you can simply nest decorators infinitely, delegating methods that aren't interesting to extend to the inner object. All decorators must implement the interface of the object they decorate so that users of decorated objects don't notice a difference and find all the methods they expect. If this looks a lot like inheritance, that's because it is (albeit very loosely coupled inheritance). Clearly there are upsides and downsides to the Decorator pattern - classes become loosely coupled (we like that), but sometimes you can wind up with a system with lots of little objects that are hard to debug (we don't like that). Different languages can accomplish this pattern in different ways, but an obvious method is to do something like the following.

/*
 * The interface our objects must implement.
 */
public interface GenericValidator {
  public void validate(Object value) throws InvalidValueException;
}

/*
 * A string validator
 */
public class StringValidator implements GenericValidator {
  public void validate(Object value) throws InvalidValueException {
    // ...
  }
}

/*
 * A length validator. This assumes that the value to validate
 * also conforms to some kind of interface. In this case, we guess
 * that it supports the method getLength(). This, of course, won't
 * actually work because the Object class doesn't support getLength()
 * and we don't do any casting (nor do we use generics which would be
 * even better).
 */
public class LengthValidator implements GenericValidator {
  private GenericValidator validator;
  private int maxLength;

  public LengthValidator(GenericValidator validator) {
    this.validator = validator;
    maxLength = 50;
  }

  public void validate(Object value) throws InvalidValueException {
    if (value.getLength() > maxLength) {
      throw new InvalidValueException();
    }

    /* Invoke another validator if one exists */
    if (validator != null)
      validator.validate(value);
  }
}

In our example above, we have an interface that all validators must support - GenericValidator. This effectively gives us the same guarantee that inheritance does; that all objects will provide these methods. The difference here is that the StringValidator and LengthValidator are not related, at all, hierarchically. In fact, the LengthValidator could wrap up ArrayValidator to confirm that it fits within some given size constraints. This is where decorators get interesting.

The other point of note in this example is that of the value argument to the validate() method. This has nothing to do with the Decorator pattern, but it illustrates how one can write a generic utility to handle arbitrary types. Our example is fundamentally flawed (read: it doesn't really work) but that's due to my laziness and desire to not clutter the example with Java's generics which some non-Java / non-C++ folks may not be able to read. Let's pretend it does work. You get the point. In languages that favor the idea of so called duck typing, like Ruby, this kind of code not only works, but is desirable.
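For the curious, here's roughly what a working, generics-based variant could look like. It's a sketch under the assumption that we're validating Strings (so length() actually exists); the Validator interface and InvalidValueException are illustrative names, redeclared here so the example stands alone:

```java
/* Illustrative exception, standing in for the one used above. */
class InvalidValueException extends Exception {}

/* The type-parameterized validator interface. */
interface Validator<T> {
  void validate(T value) throws InvalidValueException;
}

/* A decorator: enforces a maximum length, then delegates to the
 * wrapped validator, if any. Because String (not Object) is the
 * type parameter, length() is available and the compiler checks it. */
class LengthValidator implements Validator<String> {
  private final Validator<String> inner;
  private final int maxLength;

  public LengthValidator(Validator<String> inner, int maxLength) {
    this.inner = inner;
    this.maxLength = maxLength;
  }

  public void validate(String value) throws InvalidValueException {
    if (value.length() > maxLength) {
      throw new InvalidValueException();
    }

    /* Invoke the decorated validator if one exists. */
    if (inner != null) {
      inner.validate(value);
    }
  }
}
```

The nesting works the same way as before; the only difference is that the compiler, rather than our optimism, guarantees the value supports the operations we perform on it.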

In practice, it is usually obvious when it makes sense to use Decorator rather than inheritance, for instance. Usually, there are a number of factors such as whether the relationships are dynamic or static, whether all or most of the combinations make sense, and if the number of subclass combinations would create an explosion of classes. It also depends on the features and behaviors of your language and what developers in that camp are prone to doing (i.e. best practices for you and them).

Next time, we'll take a similar look at the Strategy pattern. As always, I'm interested in comments.

Wednesday, October 3, 2007

Complexity, FUD, and Evolution

Code complexity is one of those topics that kind of makes me cringe when I think about it. Not necessarily because I fear complex code or that I like it - that's too easy to talk about - but more so because it's an easy way to write something off if you don't understand it.

I've come across a lot of people who, when confronted with something they haven't seen or haven't fully explored, slap the complicated sticker on it and keep moving. Think about it; it's not that uncommon. When you first look at something that is, in some way, different than what you're used to and you have very little transferable knowledge, it's easy to get discouraged and want to fall back to known territory. I do that all the time. Sometimes, I catch myself, but other times, I fall victim to my own lack of understanding.

For example, the first time I worked with Oracle (I mean really worked with it as a DBA), I was completely baffled by the way certain things worked. Everything seemed so convoluted. After time, most of it made sense, but I had to invest the time and energy in learning about it before it seemed, at all, tractable.[1]

Commonly, I see this kind of FUD (because that's exactly what it is - fear, uncertainty, and doubt) when I discuss things like design patterns and system architecture with people that don't have a lot of experience or who haven't invested a lot of time expanding their view of these subjects. One case that comes to mind (and that is short enough to explain here) is when I explained what the Singleton pattern was to someone and why it made sense in the particular context. The developer looked at me like I hadn't written a lick of code in my life.

Why do I need to write a method to provide access to the static variable when I can just make it public? This single-whatever-thing just bloats the code and slows things down - accessing a public static variable is faster than calling a method.

Honestly, I didn't know what to say. I explained things like encapsulation and centralized construction. I talked about all the things I've never had to actually argue in favor of (this was some time ago; I've since had this exact and many similar discussions) but it ultimately fell on... well, I'm not sure; just normal ears, I guess.
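For reference, the pattern under debate is tiny. A minimal, lazily initialized sketch (Configuration is an illustrative name) shows what the accessor buys you: construction is centralized, so the policy behind it can change later without touching a single caller.

```java
class Configuration {
  private static Configuration instance;

  /* Private constructor: no one else can construct an instance. */
  private Configuration() {
    // load settings, open connections, etc.
  }

  /* The method the developer objected to. It encapsulates
   * construction: lazy loading, synchronization, or even returning
   * a subclass can all change here without breaking callers. */
  public static synchronized Configuration getInstance() {
    if (instance == null) {
      instance = new Configuration();
    }
    return instance;
  }
}
```

Compare that with a public static field: every caller is now welded to the decision that the instance is eagerly built, never replaced, and never guarded. That's the few bytes of "bloat" being argued over.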

The person (and other people, since) who made this argument wasn't dumb, nor inexperienced. They were simply so conditioned to think about things in a lopsided way that they had closed themselves off to learning about anything deviating from the conventional theory they had developed. And the entire development team would lose out, dealing with code that was slightly, but not catastrophically, harder to maintain.

This mentality is contradictory to what we do. What we do is not static. Programming practices evolve and, hopefully, we keep the good stuff while incorporating new ideas and technologies. I'll eat my own dog food on this one, too; if we find things better than agile development, software design patterns, and the like, I, too, will integrate those ideas. It will happen. The above example (i.e. the infamous singleton story) is indicative of dogma, more so than a science (with my apologies to the mathematicians).

[1] - Note that, while I understand why Oracle does a lot of the things it does, I don't believe it necessarily does those things in the best (easiest to understand, most useful, etc.) ways. I say that having worked with Oracle in a development, DBA, and system administration capacity from versions 7.x through 10g, inclusive. Your mileage may vary.

Monday, October 1, 2007

Violating My Space - Client Integration

If you work in an environment where you have to integrate with external systems, such as clients or vendors, there's a significant chance you've felt the subtle encroachment. In terms of system architecture, this is a kind of violation (albeit a necessary one) of one's personal space. If you're anything like me, you're somewhat protective of the systems you work on, and you feel every compensation made for an external entity.

Even as I write this, I can't help but think it's a coupling problem; they're too close. The considerations of external systems belong in a highly quarantined detention facility, far from the core business logic of one's code. Let's call this detention facility the client integration end point.

Working with any number of external entities (to which you are beholden as a service provider) means that you have to make compromises. In fact, it seems as if most of the compromises are always on our end. I mean, after all, the client is footing the bill and sometimes, no matter how hard you try, they're set on doing things their way. Let's take an example.

You have a system that deals with data. Your clients are going to send you data and you need to import it (whatever that means). You store data in UTF-8, internally. One client likes ISO-8859-1, another likes ISO-2022-JP, and so on. Moreover, one client prefers SFTP file delivery, another is interested in making API calls and controlling things from their end. You, on the other hand, just put in your two weeks' notice.

This isn't entirely out of the question, actually (surely most of you know that). You need to accommodate all of these different entities without loss of functionality in any of the given channels. You also know you could have N clients, where N is a really big number (because clearly, you're good at what you do). You need to work out efficiency issues and handle data loads of arbitrary size. It's no picnic.

I suppose the point of the example is that, even in such a contrived example, you need to handle:

  • Varying transport mechanisms
  • Configurable data recoding
  • Different granularities of data updates (record level, data set level, etc.)

We didn't even talk about the format of files or APIs.
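The recoding bullet, at least, has a well-trodden shape. As a sketch (Recoder and toUtf8 are illustrative names, not from any real library), normalizing an inbound payload to the internal encoding might look like:

```java
import java.nio.charset.Charset;

class Recoder {
  /* Decode the raw bytes using the charset configured for this
   * client, then re-encode as UTF-8 for internal storage. */
  static byte[] toUtf8(byte[] raw, String clientCharset) {
    String decoded = new String(raw, Charset.forName(clientCharset));
    return decoded.getBytes(Charset.forName("UTF-8"));
  }
}
```

The code is trivial; the client-specific part - knowing (and verifying) which charset each client actually sends, as opposed to which one they claim to send - is where the real work hides.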

Sure, there are lots of standards around each one of these things, but I can tell you from experience everyone interprets them differently. Clients all use disparate platforms, libraries, and technologies that are out of their control. Sometimes, they, themselves, don't even know what their platform of choice is doing! Trust me. You can't tell them sorry, we just don't support that because there's money - a lot of it - at stake. They've already been sold on you and your technology and now it's time to put up (because shut up isn't an option anymore).

The worst part is that even if you're savvy - you have your SOA-enabled enterprise thing-a-joobers all whizzing and buzzing around, like they do - it might not make a damn bit of difference; you've still got the connector portion to define, which winds up being client-specific. No matter what, that's a lot of effort to be spent just so you can start doing business together.

There's only so much you can abstract away. Ultimately, if you receive format A and you want format B, you've got some work to do. The point is that it can be unpleasant. Find a good book, let it sink in, know what your clients want to do, and find what works for you.

I'm really interested in hearing from anyone who has significant client integration experience and learning more about the techniques other people use. Drop me a note.