Monday, October 29, 2007

Time Keeping and Scheduling Systems

The following is probably old hat. For me, today, it's worth rehashing.

Recently, I've had to work on a rather feature-rich scheduling system. This is one of those tasks that always seems trivial the first time you need to do it, but really isn't. In many, many, many cases, the existing frameworks can't bail you out of the details. Sure, plenty of frameworks and libraries do things like timezone conversion and the like; that's helpful (if not required), but it's not the complete package.

I was working on recurrent schedule generation with a coworker, GR. Whilst working out the scary details of operating in multiple client timezones, being locale sensitive, and guaranteeing non-duplicate job execution, he turned to me with a stunning revelation (as he's known to do, often), and said:

You know, this is one of those things that has been done a thousand times, correctly, but none of those implementations were ever right.

Obviously, what he meant is that you can't get this right because it's so heavily dependent on the context within which one is operating. It depends on desirable locale effects, granularity of event intervals, and similar concerns. In one case, it may not matter if an event fires twice, once a year, during the daylight saving time adjustment; task idempotency may be guaranteed, whereas in other cases, this isn't true (like for me). Maybe you have the luxury of operating within a single timezone (lucky you). All of these things affect the feature set and can really change the game.
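
To make the daylight saving wrinkle concrete, here's a two-minute illustration using nothing but java.util (the class name and the choice of the 2007 US fall-back date are mine):

import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

public class DstAmbiguity {
  public static void main(String[] args) {
    // 2007-11-04 01:30 wall time occurs twice in America/New_York: once in
    // EDT (UTC-4) and again, one hour later, in EST (UTC-5).
    Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("America/New_York"));
    cal.clear();
    cal.set(2007, Calendar.NOVEMBER, 4, 1, 30, 0);

    // Calendar silently resolves the ambiguity to one of the two instants.
    Date resolved = cal.getTime();
    System.out.println(resolved);

    // The other valid reading of "01:30" is exactly one hour away from the
    // printed instant. A scheduler keyed on wall-clock time has to decide
    // which one it means, or guarantee the task is idempotent.
  }
}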

All of the solutions I have (personally) encountered for task scheduling miss the mark, in my current case, in one way or another. It's frustrating to have to build something that seems as if it should already exist. Of course, there are internal business requirements that prevent me from going into significant detail or from open sourcing the result, but trust me when I say this isn't as easy as it sounds. (I say this in hope of staving off the slew of well, why not just use XYZ comments I'm bound to get.)

If all of this means nothing to you, take away only this: sometimes, being forced to reinvent the wheel gives one an amazing appreciation for the work that went into the damn thing.

Many thanks to my ninja-smart coworker, GR, as well as my apologies for my awful paraphrasing of his insightful, comedic wit, and view of the task at hand.

Monday, October 22, 2007

A Quick Note on BlogRush

When I first created this blog (ugh... I hate that word), I played around with different things in a feeble attempt to get some traffic. One of those things was BlogRush. I won't provide a link or more information other than that they claim to drive traffic based on... oh, who cares. It didn't work, anyway.

What's pretty damn awesome is that, in addition to not working, their site was non-functional for some time. Oops. But - and this is the icing on the cake - today, I received this email from them (edited for brevity).

We regret to inform you that your BlogRush Account has been made INACTIVE because your blog did not pass our Quality Review criteria. You will find instructions below for making your account active again.

Wait. It gets better.

We determined that your blog did not meet our strict quality guidelines. Please do not take this personally but realize that we must abide by a very strict set of quality guidelines. (They are listed below.)

Note that the content syndicated via BlogRush was so unbelievably off-topic, it wasn't funny. Here's the great part, though: their standards.

- The blog contains unique, quality content that provides opinions, insights, and/or recommended resources that provide value to readers of the blog. Articles, videos, public domain works, press releases, and content written by others are okay to be used on the blog, but the ratio of unique content should far outweigh content from other sources.

Ok, so I have opinions and I'm not a link farm. Check. Admittedly, I can't quantify my thoughts or opinions as quality content, but...

- The blog should be updated on a regular basis (at least several times a month) and should not just go a few months between posts.

Check.

- The blog should already contain at least 10-12 quality posts. New blogs with very little content will not be accepted.

New? Yes, but I meet their 10-12 post minimum. Check.

- The blog's primary content must be in English. BlogRush is currently not available for non-English blogs.

Again, admittedly, my writing leaves a lot to be desired, but it's (mostly considered) English. Check.

- The blog should not contain an excessive amount of advertising and links and very little actual content. The focus of the blog should be quality content.

This is not without a certain degree of irony, but OK. Check.

- The primary content of the blog should not be "scraped" content from other sources and/or script-generated pages for the sole purpose of search engine rank manipulation. The focus of the blog should be quality content.

My writing is so terrible I would never presume to pin it on someone else. It's mine. The focus is certainly quality content (minus this post, oddly enough) - whether or not I achieve said goal is an exercise best left to the reader. Well, OK... Check.

- The blog's content (or advertising) should not contain any of the following types of content: hate, anti-racial, terrorism, drug-related, hacking, phishing, fraud, pornographic, nudity, warez, gambling, copyright infringement, obscene or disgusting material of any kind, or anything considered illegal.

Uh... I guess it depends on how one defines hacking, but otherwise, I think we're safe on this one as well. As an aside, someone will have to explain what anti-racial means; I do not endorse any kind of racism, but do I no longer even have a race? That's... weird. Just the phrase anti-racial hurts my head. Dear BlogRush lawyers: racial != racism, and you probably do want to be anti-racism. You confuse me.

So there you have it. BlogRush has deemed this content to be of low quality or anti-racial and I have removed the BlogRush widget. I'm actually kind of feeling better already; it had From the Blogosphere on it and that just makes me... cringe.

I urge others to drop BlogRush, as well.

Sunday, October 21, 2007

An OSGi Experience

Last Wednesday, I had a chance to attend the local Java Special Interest Group meeting at the Google Engineering building, here in New York City. This month's topic was the OSGi standard and the technology behind it. The presenters were Dr. Richard Nicholson and David Savage from Paremus.

Unfortunately, while the slides were good and the presenters were clearly very smart, their presentation style left quite a bit to be desired. (Hint: Know your slides, know your audience, don't read from the slides, etc. Credit to my very smart coworker, JF, for those helpful suggestions.)

If you're not familiar with OSGi (which I wasn't, really, until recently), it is basically a framework for structuring applications as loadable bundles which can be dynamically managed. Of course, that's a simplification; I suggest reading the OSGi group's explanation for a better understanding. One attendee, who sits on the OSGi board for a major web app server company and with whom I had a chance to talk prior to the presentation, summed it up as class loaders on steroids.
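
For a taste of what a bundle looks like to a developer, here's a minimal sketch of the lifecycle hooks (class and service names here are hypothetical; the framework locates the activator via the Bundle-Activator header in the bundle's manifest):

import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;

// The framework calls these hooks when the bundle is started and stopped,
// which is what makes hot deployment of new functionality possible.
public class GreeterActivator implements BundleActivator {
  public void start(BundleContext context) {
    // A real bundle would typically publish a service here for other
    // bundles to look up, e.g.:
    // context.registerService(Greeter.class.getName(), new GreeterImpl(), null);
    System.out.println("Greeter bundle started");
  }

  public void stop(BundleContext context) {
    System.out.println("Greeter bundle stopped");
  }
}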

During the presentation, Dr. Nicholson discussed some of the failings of service oriented architectures and how they hadn't really lived up to the hype. Probably true, but what does? It made me wonder how one can judge something like that, but I suppose (and according to my presentation-savvy cohort, JF) making a controversial statement during a presentation keeps people paying attention. It worked on me, to be sure.

OSGi, in theory, allows one to build application bundles as independent components and wire them together at runtime. I'm still hazy on some of the details around how that wiring takes place, but it's certainly comparable to the idea of software services. As I understood it, OSGi is local, within a JVM. If you want to extend the concept to encompass distributed applications, you need another layer on top of this. During the presentation, Dr. Nicholson demonstrated a product from Paremus called InfiniFlow, which he described as an Enterprise Service Fabric. Names aside (because that's a little, shall we say, marketing inspired?), he used 12 Apple Mac Minis running the InfiniFlow agent and his MacBook Pro to dynamically build a fractal rendering farm. Not too shabby, Doc.

With this, he loaded a number of OSGi bundles into (presumably) some InfiniFlow management application, deployed it to all 12 machines (getting instant feedback the entire time), and showed us that the network (or fabric) was set to render a Mandelbrot set. Kicking off the rendering showed that each Mac Mini was performing calculations and sending the results back to the laptop where pieces of the image were being drawn. That was neat.

What was more impressive is that he next loaded a new OSGi bundle and, almost immediately, the client had new options for the Julia set. Starting the rendering process showed that the 12 nodes now supported and were executing a new function with no down time, no restarts, and no administrative work (other than the initial loading of the bundles).

The idea behind OSGi is very cool. Dr. Nicholson and Mr. Savage told us that some projects, such as Eclipse, already use OSGi functionality and that Apache is releasing an open source OSGi R4 implementation in the form of Felix. Additionally, they said that they have plans to try to push OSGi for other languages, specifically mentioning PHP and others (sorry, I don't remember all the details). This could be nice to see, especially in environments where multiple languages are supported in one runtime (as with language-independent VMs like Parrot) such that Java OSGi bundles could be loaded in Perl or Ruby. This is where I'd mention the CLR if I were a Windows guy.

Who knows if OSGi will gain the kind of traction they're hoping for, but it does have some major players involved, including Eclipse, Apache, BEA, IBM, Interface21 (the Spring guys), SAP AG, and many others. Either way, the ideas behind this technology are really exciting from an architectural point of view, for obvious reasons.

As an aside, it's worth considering the rather steep learning curve in understanding how these bundles are wired together. Systems such as OSGi (or anything like it) can easily become a debugging nightmare and a maze of abstraction. I'd be remiss if I didn't at least make mention of this aspect. You need smart tools, smart people, and a very clear understanding of this kind of technology before deciding to go down this road. Here's to hoping these guys do it right (and keep it open source)!

Tuesday, October 9, 2007

Strategy and Decorator - A comparison (Part 2)

In my last entry, I talked a bit about the Decorator design pattern and how one can use it to extend the functionality of a class. In part two of my look at the Decorator and Strategy design patterns, I'll take a similar look at the Strategy pattern. (Go figure.)

The Strategy pattern, like Decorator, lets one vary part of a class's functionality dynamically, but in a different way (and for a different purpose). The goal with Strategy is to define an interface for a family of algorithms and allow them to be dynamically substituted. At some level, it does this in a way very similar to Decorator, in that all strategy implementations must conform to an interface that is known to the caller. The important difference is that rather than one object wrapping another up, from a structural point of view, the caller (sometimes called the Context, as it is in the Wikipedia article) delegates a single method (usually) to the concrete strategy implementation. That's probably a bad explanation. Let's look at an example.

public interface SortStrategy {
  public String[] sort(String[] items);
}

public class QuickSortStrategy implements SortStrategy {
  public String[] sort(String[] items) {
    // ...
  }
}

public class MyApplication {
  private SortStrategy sortStrategy;

  public String[] sort(String[] items) {
    // Delegate to whichever strategy is currently configured.
    return sortStrategy.sort(items);
  }

  public static void main(String[] args) {
    MyApplication app = new MyApplication();

    app.setSortStrategy(new QuickSortStrategy());

    app.arrayPrinter("Sorted args: ", app.sort(args));
  }

  // Getters, setters, arrayPrinter(), etc...
}

Of course, it's not very interesting when we have just one concrete strategy implementation like in our example, but you can easily see how you can pick a class based on external information. I use sorting as an example because it's dead simple to understand why one would want to vary the algorithm based on the data.

What's really interesting about this is when we start combining a pattern like Strategy with, for instance, generics. Now, you have a situation where you can vary algorithms and types; that's reusability.
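
As a quick sketch (the type parameter and the insertion sort are my additions, not part of the example above), the strategy interface itself can be made generic:

// Vary both the algorithm and the element type in one shot.
public interface GenericSortStrategy<T extends Comparable<T>> {
  T[] sort(T[] items);
}

public class InsertionSortStrategy<T extends Comparable<T>>
    implements GenericSortStrategy<T> {
  public T[] sort(T[] items) {
    // In-place insertion sort; a reasonable pick for small inputs.
    for (int i = 1; i < items.length; i++) {
      T key = items[i];
      int j = i - 1;
      while (j >= 0 && items[j].compareTo(key) > 0) {
        items[j + 1] = items[j];
        j--;
      }
      items[j + 1] = key;
    }
    return items;
  }
}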

When people talk about Strategy, they tend to talk about varying an algorithm, but it's also interesting to consider things that aren't necessarily algorithms. For instance, Strategy is an option when talking about varying serialization methods, network transports, and so forth. If there is a case where you want to vary part of an object's behavior at runtime, but not necessarily all of it (or can't because of other inheritance requirements), Strategy can really bail you out. In short, you're delegating some behavior to a dynamically selected class that is guaranteed to conform to a known interface for implementing said behavior; that's Strategy.

While this hasn't been a direct A / B comparison of Decorator and Strategy, hopefully a better understanding of these two patterns will help you as it has me.

As always, comments are very welcome!

Friday, October 5, 2007

Strategy and Decorator - A comparison (Part 1)

Just the other day, while working on a project, I had the opportunity to see first hand how the Strategy and Decorator patterns handle similar cases, but with very different results and methods. To summarize (and my apologies to the GoF and all other authorities on such subjects), the Decorator wraps an object (creating a has-a relationship) and extends functionality in one of a few ways. The Strategy pattern, on the other hand, allows one to swap out the guts of an object, usually the implementation of a specific method or algorithm, for another, dynamically.

In this entry, I'll take a look at the Decorator pattern, an example, and what makes it cool.

The Decorator is neat because you can simply nest decorators infinitely, delegating methods that aren't interesting to extend to the inner object. All decorators must implement the interface of the object they decorate so that users of decorated objects don't notice a difference and find all the methods they expect. If this looks a lot like inheritance, that's because it is (albeit very loosely coupled inheritance). Clearly there are upsides and downsides to the Decorator pattern - classes become loosely coupled (we like that), but sometimes you can wind up with a system with lots of little objects that are hard to debug (we don't like that). Different languages can accomplish this pattern in different ways, but an obvious method is to do something like the following.

/*
 * The interface our objects must implement.
 */
public interface GenericValidator {
  public void validate(Object value) throws InvalidValueException;
}

/*
 * A string validator
 */
public class StringValidator implements GenericValidator {
  public void validate(Object value) throws InvalidValueException {
    // ...
  }
}

/*
 * A length validator. This assumes that the value to validate
 * also conforms to some kind of interface. In this case, we guess
 * that it supports the method getLength(). This, of course, won't
 * actually work because the Object class doesn't support getLength()
 * and we don't do any casting (nor do we use generics which would be
 * even better).
 */
public class LengthValidator implements GenericValidator {
  private GenericValidator validator;
  private int maxLength;

  public LengthValidator(GenericValidator validator) {
    this.validator = validator;
    maxLength = 50;
  }

  public void validate(Object value) throws InvalidValueException {
    if (value.getLength() > maxLength) {
      throw new InvalidValueException();
    }

    /* Invoke another validator if one exists */
    if (validator != null)
      validator.validate(value);
  }
}

In our example above, we have an interface that all validators must support - GenericValidator. This effectively gives us the same guarantee that inheritance does; that all objects will provide these methods. The difference here is that the StringValidator and LengthValidator are not related, at all, hierarchically. In fact, the LengthValidator could wrap up ArrayValidator to confirm that it fits within some given size constraints. This is where decorators get interesting.
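
Wiring the decorators up is nothing more than nesting constructor calls; something like this (someValue standing in for whatever your application hands you):

// Length check runs first, then delegates inward to the string check.
GenericValidator validator = new LengthValidator(new StringValidator());
validator.validate(someValue);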

The other point of note in this example is the value argument to the validate() method. This has nothing to do with the Decorator pattern, but it illustrates how one can write a generic utility to handle arbitrary types. Our example is fundamentally flawed (read: it doesn't really work), but that's due to my laziness and desire not to clutter the example with Java's generics, which some non-Java / non-C++ folks may not be able to read. Let's pretend it does work. You get the point. In languages that favor the idea of so-called duck typing, like Ruby, this kind of code not only works, but is desirable.
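
And since I teased it, here's roughly what the generics flavor would look like (only LengthValidator is shown); the compiler, rather than a cast or a guess, enforces that values support what the validator needs:

public interface GenericValidator<T> {
  void validate(T value) throws InvalidValueException;
}

public class LengthValidator implements GenericValidator<String> {
  private final GenericValidator<String> inner;
  private final int maxLength = 50;

  public LengthValidator(GenericValidator<String> inner) {
    this.inner = inner;
  }

  public void validate(String value) throws InvalidValueException {
    // String gives us length() for free; no getLength() guessing games.
    if (value.length() > maxLength) {
      throw new InvalidValueException();
    }

    // Invoke another validator if one exists.
    if (inner != null) {
      inner.validate(value);
    }
  }
}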

In practice, it is usually obvious when it makes sense to use Decorator rather than inheritance, for instance. Usually, there are a number of factors such as whether the relationships are dynamic or static, whether all or most of the combinations make sense, and if the number of subclass combinations would create an explosion of classes. It also depends on the features and behaviors of your language and what developers in that camp are prone to doing (i.e. best practices for you and them).

Next time, we'll take a similar look at the Strategy pattern. As always, I'm interested in comments.

Wednesday, October 3, 2007

Complexity, FUD, and Evolution

Code complexity is one of those topics that kind of makes me cringe when I think about it. Not necessarily because I fear complex code or that I like it - that's too easy to talk about - but more so because it's an easy way to write something off if you don't understand it.

I've come across a lot of people who, when confronted with something they haven't seen or fully explored, slap the complicated sticker on it and keep moving. Think about it; it's not that uncommon. When you first look at something that is, in some way, different from what you're used to, and you have very little transferable knowledge, it's easy to get discouraged and want to fall back to known territory. I do that all the time. Sometimes I catch myself, but other times I fall victim to my own lack of understanding.

For example, the first time I worked with Oracle (I mean really worked with it as a DBA), I was completely baffled by the way certain things worked. Everything seemed so convoluted. After time, most of it made sense, but I had to invest the time and energy in learning about it before it seemed, at all, tractable.[1]

Commonly, I see this kind of FUD (because that's exactly what it is - fear, uncertainty, and doubt) when I discuss things like design patterns and system architecture with people that don't have a lot of experience or who haven't invested a lot of time expanding their view of these subjects. One case that comes to mind (and that is short enough to explain here) is when I explained what the Singleton pattern was to someone and why it made sense in the particular context. The developer looked at me like I hadn't written a lick of code in my life.

Why do I need to write a method to provide access to the static variable when I can just make it public? This single-whatever-thing just bloats the code and slows things down - accessing a public static variable is faster than calling a method.

Honestly, I didn't know what to say. I explained things like encapsulation and centralized construction. I talked about all the things I've never had to actually argue in favor of (this was some time ago; I've since had this exact and many similar discussions) but it ultimately fell on... well, I'm not sure; just normal ears, I guess.
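
For reference, the shape of the thing under debate (class name made up); a lazily constructed singleton behind an accessor, versus a bare public static field:

public class Registry {
  private static Registry instance;

  // Private constructor: there is exactly one place construction can happen.
  private Registry() { }

  // The accessor buys lazy initialization, one choke point for adding
  // locking or caching later, and the freedom to swap the implementation
  // without touching callers. A public static field offers none of that.
  public static synchronized Registry getInstance() {
    if (instance == null) {
      instance = new Registry();
    }
    return instance;
  }
}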

The person (and other people, since) who made this argument wasn't dumb, nor inexperienced. What they were was overly conditioned to think about things in such a lopsided way that they had closed themselves off to learning about anything deviating from the conventional theory they had developed. And the entire development team would lose out, dealing with code that was slightly, but not catastrophically, harder to maintain.

This mentality runs contrary to what we do. What we do is not static. Programming practices evolve and, hopefully, we keep the good stuff while incorporating new ideas and technologies. I'll eat my own dog food on this one, too; if we find things better than agile development, software design patterns, and the like, I, too, will integrate those ideas. It will happen. The above example (i.e. the infamous singleton story) is indicative of dogma more so than science (with my apologies to the mathematicians).

[1] - Note that while I understand why Oracle does a lot of the things it does, I don't believe it necessarily does those things in the best (easiest to understand, most useful, etc.) ways. I say that having worked with Oracle in a development, DBA, and system administration capacity from versions 7.x through 10g, inclusive. Your mileage may vary.

Monday, October 1, 2007

Violating My Space - Client Integration

If you work in an environment where you have to integrate with external systems, such as clients or vendors, there's a significant chance you've felt the subtle encroachment. In terms of system architecture, this is a kind of violation (albeit a necessary one) of one's personal space. If you're anything like me, you're somewhat protective of the systems you work on and wary of the accommodations external entities demand.

Even as I write this, I can't help but think it's a coupling problem; they're too close. The considerations of external systems belong in a highly quarantined detention facility, far from the core business logic of one's code. Let's call this detention facility the client integration end point.

Working with any number of external entities (to which you are beholden as a service provider) means that you have to make compromises. In fact, it seems as if most of the compromises are always on our end. I mean, after all, the client is footing the bill and sometimes, no matter how hard you try, they're set on doing things their way. Let's take an example.

You have a system that deals with data. Your clients are going to send you data and you need to import it (whatever that means). You store data in UTF-8, internally. One client likes ISO-8859-1, another likes ISO-2022-JP, and so on. Moreover, one client prefers SFTP file delivery, while another is interested in making API calls and controlling things from their end. You, on the other hand, just put in your two weeks notice.

This isn't entirely out of the question, actually (surely most of you know that). You need to accommodate all of these different entities without loss of functionality in any of the given channels. You also know you could have N clients, where N is a really big number (because clearly, you're good at what you do). You need to work out efficiency issues and handle data loads of arbitrary size. It's no picnic.

I suppose the point of the example is that, even in such a contrived example, you need to handle:

  • Varying transport mechanisms
  • Configurable data recoding (sketched below)
  • Different granularities of data updates (record level, data set level, etc.)

We didn't even talk about the format of files or APIs.
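
To pick on the recoding bullet: at its core, it's a few lines (the class name is invented, charsets are from the example above; error handling, per-client configuration, and streaming for large files are the real work):

import java.nio.charset.Charset;

public class ClientRecoder {
  // Decode the client's bytes using their declared charset, then
  // re-encode into our internal UTF-8 representation.
  public static byte[] toUtf8(byte[] clientBytes, String clientCharset) {
    String text = new String(clientBytes, Charset.forName(clientCharset));
    return text.getBytes(Charset.forName("UTF-8"));
  }
}

For the clients above, that's toUtf8(payload, "ISO-8859-1") and toUtf8(payload, "ISO-2022-JP"); the transport and granularity bullets each hide similar subsystems.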

Sure, there are lots of standards around each one of these things, but I can tell you from experience everyone interprets them differently. Clients all use disparate platforms, libraries, and technologies that are out of their control. Sometimes, they, themselves, don't even know what their platform of choice is doing! Trust me. You can't tell them sorry, we just don't support that because there's money - a lot of it - at stake. They've already been sold on you and your technology and now it's time to put up (because shut up isn't an option anymore).

The worst part is that even if you're savvy - you have your SOA-enabled enterprise thing-a-joobers all whizzing and buzzing around, like they do - it might not make a damn bit of difference; you've still got the connector portion to define, which winds up being client-specific. No matter what, that's a lot of effort to be spent just so you can start doing business together.

There's only so much you can abstract away. Ultimately, if you receive format A and you want format B, you've got some work to do. The point is that it can be unpleasant. Find a good book, let it sink in, know what your clients want to do, and find what works for you.

I'm really interested in hearing from anyone who has significant client integration experience and learning more about the techniques other people use. Drop me a note.

Thursday, September 27, 2007

Tired of the Fighting

Lately, I've been reading a lot of blogs. There's such an amazing wealth of really smart people out there with interesting opinions. But, then there's everyone else. I'm getting really tired of the mudslinging about languages, methodologies, and frameworks.

Really.

I can't count the amount of Ruby vs. Java[1] stuff I've read in the last 48 hours. Granted, it's an interesting discussion, but definitely not the end all, be all debate our industry is seeing. More to the point, the amount of misinformation and TL;DR is astounding. Is it true that, for some people, sounding smart is more important than being smart? That makes me sad.

I have a hard time imagining how one can get so entrenched in something as to think there's nothing beyond the walls that box them in. I can only liken it to what it must be like to have some kind of profound spiritual experience[2]: it was so life altering that you forget where you came from, and so quickly that the experience doesn't mean anything anymore - it's so drastic, you are no longer what you were before.

Let's face it; there will be countless languages, frameworks, platforms, and methodologies that will come and go during our careers. We should be so lucky! Why on Earth would you shut yourself off from something to such a degree?

Ideally, we all take the requisite time to evaluate everything prior to passing judgement. I'm sure we aren't all so pure of heart, mind, and principle. There are so many things we don't like about the things we see. They're easy to point out, too. That said, I have never met anyone who had nothing left to learn (even if they wouldn't agree). There is something good about 99.9999% of the things under heated debate. The things that aren't debated usually escape debate because everyone already agrees they're either great or terrible.

I'll settle it (to my own satisfaction) for you. The following is for comedic value.

  • Ruby needs more IDE support
  • Java documentation is rather opaque at times
  • Ruby's syntax is inconsistent, at times
  • Java doesn't let programmers express themselves like the true artists they are
  • Ruby doesn't have as much commercial backing and support
  • Java has been molested by commercial backing and support
  • Ruby-ists can, at times, be arrogant, claiming to have invented some ideas that have been around for a long time
  • Java-ists can, at times, be arrogant, claiming every other language to be toy
  • Ruby is slow
  • Java is a memory hog

Are you getting the point? Hint: If you caught yourself saying but wait... Eclipse has Ruby support now or but wait... Java has an excellent community you need to start from the beginning. Go ahead. I'll wait...

Good. Feel better?

If you write Java for a living, go (yes right now) to this site and find something interesting. If you write Ruby for a living, go read up on all the neat stuff you can do in Hibernate.

So there you have it. Now don't make me reach back there. You don't want me to stop this car.

...and if I hear one more thing about how you're an idiot if you do / don't use git / subversion / svk, I will find you and force you to go back to using RCS[3]. Try me.

[1] - At my day job, I write neither Java nor Ruby. I like both for different reasons. I have no real stake in this debate.

[2] - I'm not trying to disparage those that are spiritual or those that have had any kind of spiritual awakening. If you're upset by this, you're reading too deeply.

[3] - If you actually like using RCS for source code control, I'm more than happy to have offended you by this.

Wednesday, September 26, 2007

Version Control and Release Management With SVK

SVK is self-described as a decentralized version control system built with the robust Subversion filesystem. Possibly more important (although, who doesn't like words like decentralized and robust) are some of the features it supports:

  • Repository mirroring
  • Disconnected (i.e. network-less) operation
  • History-sensitive merging

There are other features it supports, such as patch generation, but those above are the biggies. At this point, those in the open source development community have heard about SVK and I wouldn't claim that it's news, per se. What is interesting is that it works.

For the last year or so, I have been primarily responsible for managing the software release process at my place of employment. This, for me, entails coordination of release cycles (weekly, for us), merging of bug fixes from other developers, and branch management. We used to be a CVS shop, but at some point, I pushed hard to get everyone onto Subversion for reasons that should be obvious (if they're not, rest assured they were valid). Our release process was not terribly complicated and was mostly informal, but SVK made it much easier to maintain. Here it is...

  1. Work in the dev branch - //rep/branches/dev
  2. Developers merge stable code (by their definition) to //rep/trunk
  3. Every 7 days, a Release Manager creates a stable branch from trunk - svk cp //rep/trunk //rep/branches/stable_yyyymmdd
  4. The QA team tests stable_yyyymmdd while devs continue to work in //rep/branches/dev and integrate changes into //rep/trunk.
  5. The QA team gives the stable branch its blessing
  6. The Release Manager blesses the stable branch into a release by making a copy - svk cp //rep/branches/stable_yyyymmdd //rep/branches/release_yyyymmdd (where the dates are the same)

Points of interest:

  • //rep/trunk is where devs can do integration testing
  • //rep/branches/stable_yyyymmdd is where QA and executives can preview what is coming
  • //rep/branches/release_yyyymmdd is safe to deploy to new machines at any given time
  • //rep/branches/release_yyyymmdd is copied from the stable branch so if there are bug fixes, during that testing, the release branch is unchanged and still safe for re-deployment
  • devs can cut personal or per-project branches off of //rep/branches/dev should they need to do so, and manage them themselves

What SVK brings to the table, and what many see as one of its killer features, is the smart merging. This means that SVK remembers the source of a branch and the last merge point. As a result, if I want to bring all outstanding changes from the dev branch into trunk, I can do:

svk smerge          \
  -m "Merging everything from dev into trunk with original log messages" \
  -l                \
  --to //rep/trunk

The -l option tells SVK to include the original log messages (in addition to the message specified by -m) and the --to <rep> specifies the repository location to merge changes to.

If bug fixes need to be applied, developers merge them from dev to trunk, then request that those changes be merged to whatever the appropriate stable branch is for testing by QA, and eventually the release. As time goes on, we kill off old branches to keep things tidy, but we can always go back and resurrect them should we need to do so.

SVK has significant advantages when traveling or working in an area where the Subversion repository is unavailable, as well. This is probably the other major killer feature. Because SVK mirrors the Subversion repository and keeps it in a local repository, you can continue to commit changes when you don't have access to the core Subversion repository. When you get back to a trusted location, you can instruct SVK to synchronize all changes.
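
The mirror / sync workflow looks roughly like this (depot paths and the repository URL are invented for illustration):

# Mirror the remote Subversion repository into the local depot, then sync
svk mirror http://svn.example.com/repos/project //mirror/project
svk sync //mirror/project

# Branch locally off the mirror; commits here work while disconnected
svk cp -m "Local branch" //mirror/project //local/project

# Back on a trusted network, merge local changes up to the mirror
svk smerge //local/project //mirror/project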

While still clunky in a few places, SVK brings some fantastic features to the table with its smart merging and distributed operation. Its patch generation features as well as the ability to synchronize to an external repository make it the perfect tool for customizing or working on open source projects without commit access. At some point, it would be great to see these kinds of features make it back into Subversion, but until then, SVK does what was otherwise a real pain and it does it well. Check it out.

SVK Wiki

Many thanks to Chia-liang Kao and Best Practical for all of their work and support of a great project.

Tuesday, September 25, 2007

What a Senior SysAdmin Should Know

Prior to taking on the System Architect role where I am, my title was Manager of System Administration. I've worked in system administration enough to know about all the things I don't know. Sadly, I've also learned a lot, while interviewing people, about what does not a sysadmin make.

In the Linux world, there are a growing number of people who call themselves system administrators. Technically, everyone who sits behind a computer administers it to some degree. The question, for those who want to make it their livelihood, is: to what extent do you know your field? Granted, these aren't the days of punch cards, nor do you have to be an electrical engineer to touch a computer (put down that soldering iron, if you please), but you do have to know your stuff, to be sure. But what stuff, and to what degree? Fair enough.

Networking Concepts and Protocols

This is a hard and fast requirement; you must understand basic networking concepts and common protocols. It's that simple. You must be able to explain how TCP/IP works (if SYN, ACK, SYN/ACK doesn't mean anything to you, you're in trouble here). You must know the difference between stateful and non-stateful firewalls. You must know what an MTU size is and what changing it means. You must know what happens if a DNS response packet is larger than 512 bytes. You must understand, conceptually, how routing protocols work. This stuff isn't brain surgery (recently, I heard that brain surgeons joke that brain surgery isn't brain surgery), but it's a bit dry. You absolutely can not be a (structural) architect if you can't do basic math.

Linux as an Operating System

Should I need to say that you have to know about Linux-ish things to be a Linux sysadmin?

You must know as much as you can stand about the Linux kernel; building one, updating (more on tools like package managers later) kernels, where modules live, when they're loaded, why, how, how to manually load and unload them, and other exciting stuff. You must understand the Linux boot process and init. You must know about the boot managers and how they differ (lilo, grub, etc.). You must be able to write, modify, and maintain shell scripts (and understand concerns like quoting errors, for $deity's sake). You must understand basic system libraries such as (g)libc, and things like PAM (trust me, you'll thank me later). You must know how to tunnel everything under the sun over ssh. You must understand the interesting file systems like ext{2,3}fs (and you must understand what that {2,3} syntax means), udev / devfs, tmpfs, and especially proc. You must understand how to modify file system options (hint: tune2fs).

Security

You cannot survive as a system administrator without a better than average understanding of security. No excuses.

You must be able to read and interpret bug reports and know about things like the CVE list. You need to understand system hardening. You need to know iptables / netfilter. You need to know the security implications around certain bits that can be twiddled in /proc. You must understand common attacks and know how to mitigate them. You must understand what IDSs and NDSs are. You need to understand, at least conceptually, cryptography and how it applies to security (hint: lolzEncryptEverything!!!111 doesn't count as a valid answer). You must understand PKI. You must know how to respond to security incidents.

A Bit About Tools

Who doesn't like things that make our lives easier? If I were reading this - and admittedly, even as I type it - I would think that this sounds like a lot of stuff that is made easier by tools provided by vendors, open source projects, or even distros themselves. That is entirely true. There are RPMs or DEBs of kernels, so why should you care where modules live or how the system boots? Because it's your job, and knowing what tools do and how they work will not only help you, but will bail you out of trouble. I think projects like Webmin, Firewall Builder, and the like are great, but you still must know what they do and the concepts behind them before you can really understand how to use them properly. I'm guessing that the authors of those projects would agree.

This isn't about being elitist, but about doing what we do, and doing it well. Take everything as an opportunity to learn and grow and you'll quickly find that it's much easier to succeed when you know rather than when you guess correctly.

Monday, September 24, 2007

Apache ActiveMQ and Perl

I write an awful lot of Perl code as part of my day job. In fact, I've been writing Perl for quite a while now (about ten years). Perl is one of those languages that does a lot of what it says it does. It's Perl; no more, no less. I tend to use Perl as if it were a very strict, strongly typed, object oriented language, and don't get very Perl-ly with it. I like encapsulation, accessor methods, design patterns, and other things of that sort. I suppose, in the end, I'm not really a Perl guy.

Recently, I've been working on a project that aims to decouple systems via messaging. Not wanting to necessarily build a messaging infrastructure from the ground up, I went in search of something to do the work for me. I'm a fan of the JMS feature set and wanted to find a way to bring such a thing to Perl. The Apache project has developed a JMS compliant message broker called ActiveMQ. ActiveMQ, as you might expect of a JMS broker, is implemented in Java and supports a few different wire protocols. One of these protocols is called the Streaming Text Oriented Messaging Protocol, or more succinctly, Stomp. The idea behind Stomp seems to be to provide an interoperable wire format so that any of the available Stomp Clients can communicate with any Stomp Message Broker to provide easy and widespread messaging interop among languages, platforms and brokers. That's truth in advertising.

As with most things, someone else thought of talking to JMS brokers from Perl long before I did, and manifested this as the Net::Stomp Perl module on CPAN. I, for one, am grateful.

For Perl monkeys, it's about as straightforward as it gets (connection details below are placeholders).

  use strict;
  use Net::Stomp;

  # Placeholder connection details; 61613 is ActiveMQ's default Stomp port.
  my $host     = 'localhost';
  my $port     = 61613;
  my $username = 'user';
  my $password = 'secret';

  my $stomp = Net::Stomp->new({
    hostname     => $host,
    port         => $port,
  });

  $stomp->connect({
    login        => $username,
    passcode     => $password,
  });

  # Sending a message to a topic
  $stomp->send({
    destination  => '/topic/MagicDoSomethingBucket',
    body         => 'Hello from Perl land.',
  });

  $stomp->disconnect();

A rather contrived example, but you get the idea. In my case, I opted to use a binary message body (mostly due to time constraints) and used Storable's nfreeze() function to serialize a data structure for transport. If you're interested in doing content-based routing or filtering, XML is a better way to go. Either way, I found this to be fast (about 20k messages per second when talking to a local message broker and posting to a publish / subscribe topic with one consumer), flexible, simple, and reliable. In its final home, I will be using ActiveMQ's store and forward functionality with a series of brokers configured in a grid. It really is neat stuff.
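
For what it's worth, the other side of that conversation can live in Java; a bare-bones JMS consumer for the same topic might look like this (the broker URL is ActiveMQ's stock default, and ActiveMQ maps the Stomp destination /topic/MagicDoSomethingBucket to the JMS topic of the same name):

import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class MagicBucketConsumer {
  public static void main(String[] args) throws JMSException {
    ConnectionFactory factory =
        new ActiveMQConnectionFactory("tcp://localhost:61616");
    Connection connection = factory.createConnection();
    connection.start();

    Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
    MessageConsumer consumer =
        session.createConsumer(session.createTopic("MagicDoSomethingBucket"));

    // Topics are fire-and-forget: subscribe before the producer sends.
    Message message = consumer.receive();  // blocks until something arrives
    if (message instanceof TextMessage) {
      System.out.println(((TextMessage) message).getText());
    }

    connection.close();
  }
}

Swap createTopic() for createQueue() and you have point-to-point delivery instead of publish / subscribe.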

As always, I'm interested in any experience people have with ActiveMQ and other messaging solutions, especially as a method of communication between different languages. If you're not familiar with messaging solutions, take a few moments to read up.

You'll be glad you did.

Sunday, September 23, 2007

Caving in to the iPhone (why it matters)

I was in need of a new phone. I usually sport a rather modest BlackBerry, which I use mostly for email and data. Well, I traded up.

The newer BlackBerry I was looking at was nearing the $300 mark with a two year renewal. I quickly realized that for just a bit more, I, too, could own an iPhone (potentially against my better judgement). I say against my better judgement more out of embarrassment at buying such a hip product than out of any question of technical quality. The conventional thinking should always be to question the conventional thinking, especially for those in any kind of scientific field (which we, arguably, are). I think we - because hopefully it's not just me - sometimes get so caught up in the questioning that we don't realize the popular answer is the correct one. Clearly, the word correct is extremely subjective and will depend on the context, but for me, this is one of those cases. It really is just a nice phone.

So what's the point? I fear I've done this in technical cases in the past. I have been harsh toward Java because of this. In the last few years, I think Java has grown and progressed in places where I used to fault it, like performance. I've also had a similar epiphany with regard to development processes (although admittedly not until I learned about agile methods a number of years ago). I now feel that it was a lot of the unnecessary ceremony and inflexibility that I actually didn't care for.

I think all I'm trying to say is learn before you burn. Make sure you understand what you're panning before you completely write it off. Consider the intended audience and the context surrounding something. Be it functional or object oriented programming, Java, Ruby, Linux, Windows, or even the iPhone; the goal is to make an educated, intelligent decision, not to hold up some misplaced notion of purity or loyalty which is, in fact, just thinly veiled ignorance and bigotry.

On the other hand, sometimes you run smack into something that turns out to be exactly what you thought it was. When that happens, well, you know what to do...

Saturday, September 22, 2007

The development process - a look at OpenUP

I've encountered a number of companies that eschew any kind of formal process. The thought seems to be that a development process, any process, impedes flexibility and bogs down the process (lowercase p) of development. Personally, I've always been skeptical of processes for similar reasons. As a developer, you feel constrained by how your role is defined in any specific development methodology.

The longer I do what I do, the more I find that process isn't bad; it's the processes that we have (or had) that were bad. It's the type of process that needlessly constrains and restricts, more so than having a process at all. Because of this belief, I was relieved when I first discovered agile methods. I was late to the game, to be sure, but I was equally impressed, nonetheless.

I read, on some random blog (sorry, I don't recall which one), about OpenUP - described as a lean Unified Process that applies iterative and incremental approaches within a structured lifecycle. In my reading of the wiki, I found that it's very easy to understand in about fifteen minutes, even for mere mortals (read: those of us who aren't PMPs or trained project managers), provided you have a rudimentary understanding of software development. It has some interesting advantages:

  • Low-ceremony
  • Tool agnostic
  • Project type independent
  • Attention to small time windows (talks about days and months, not years)

Of course, it's also chock full of the normal agile development goodies like acceptance of change, iterative thinking, and so forth. It's interesting to see people defining simple, easy to understand and implement, processes that, themselves, are open to refinement. It just makes sense.

Give it a look. It's neat stuff.

OpenUP Wiki

Friday, September 21, 2007

Collaborative development environments (a follow up)

After my entry the other day, Distributed Teams in Development, I did some extra poking around for papers written on the subject. Luckily, the software development world is chock full of smart people who are willing to make their thoughts and work available to us all out of the goodness of their hearts. Grady Booch's Papers section has a wealth of great information on tough topics. Better still, it's accessible to us mere mortals in that it's not self referential in ideas or terminology. I'm looking at you, J2EE.

One paper of interest, titled Papers on collaborative development environments (CDEs) (doc, pdf), relates to what I discussed the other day and breaks it down much better than I ever could. This is worth a read if you work in an environment that is, as he describes, distributed by time and distance. There are references to products, both open source and commercial, that are interesting as well. In some cases, he offers a unique view on how to use these products together.

Either way, it's a good bit of insight into a difficult problem.

Thursday, September 20, 2007

Alphabet soup

Lately, I've been doing a fair amount of work in Java with Eclipse as a development environment. I've been working on both Linux (my main platform for development) and Mac OS X (which I keep around for digital audio / music applications and a few games). The experience is about the same on both platforms; it's nice to see some of the advantages of desktop Java applications shining through.

What gets me about the development world is the sometimes almost comical barrier to entry in terms of nomenclature. Specifically, the endless swarm of acronyms, marketing speak, and double talk that surrounds certain technologies is almost damaging. Learning Java, the programming language, is trivial compared to learning Java, the marketing object. I understand differentiating one's self within a marketplace; fear not, Sun - you're different.

One of the worst offenders of terminology warfare is J2EE. I've spent the last few weeks sorting through all of the Blueprints, Tech Articles, White Papers, FAQs, API Specifications, and Glossaries that Sun has to offer at the main java site and I'm not sure it had to be as soup-ish as it was. I'm positive a lot of the ire Java has earned from certain communities is due, in part, to the thickness of said barrier to entry. Some people don't have an enterprise to worry about and are alienated by some of this kind of opacity.

And, more to the point, it's really a shame. When you dig into the ideas behind the Java Message Service, for example, it all makes perfect sense and really is a pleasure to use (most of the time). I understand that with abstraction comes a certain degree of, well, abstraction, but this is kind of silly.

The JMS API enhances the Java EE platform by simplifying enterprise development, allowing loosely coupled, reliable, asynchronous interactions among Java EE components and legacy systems capable of messaging.

I know what it means, but it really raises more questions than it answers for those that don't have a clear understanding of what (exactly) a Java EE component is. Admittedly, the Java 5 EE Tutorial is one of the better guides I've seen for jumping into a framework, but finding it when you don't know it exists is a daunting task, in and of itself.

Eclipse, as a platform, also suffers from a bit of the same. Again, here's a case of a great tool buried under a strange sheen of marketing cruft. The Eclipse project page is daunting to someone who's looking to figure out just exactly what it can do. Clearly, it's extensible, but does it have to optimize solely for the beginner and the expert? As a real world example, I wanted to find an integrated UML editor for Eclipse. I won't tell you how that experiment ended, but it included a tour of EMF, GMF (which isn't at all like the GEF), GMT, M2T, Model Driven Development integration (affectionately dubbed MDDI), and MDT. In the end, I was rewarded with a tree view that allowed me to add children like Class or Interface Realization. I know what that means, but I swear I saw screenshots of a graphical UML editor, didn't I? Surely those G* projects do something other than draw trees.

I hope that, one day, a bit of transparency can be brought to these kinds of tools (or platforms; whatever). Until then, I suppose it's just the price of entry into a world that does, in fact, have a lot to offer to a wider audience.

Wednesday, September 19, 2007

Distributed teams in development

I suppose, at some point, an organization becomes large enough that teams of developers are geographically disparate. In my experience, this is never fun. Each team seems to develop a good rapport locally, for obvious reasons, but that doesn't necessarily translate to a strong connection to peers in other cities (or even countries). It's always difficult to spread common knowledge when the level of interaction between teams is low.

There's been some effort to bring people closer together, but these applications or tools are either too "low bandwidth" or don't allow for the kind of communication you really need for things like pair programming and code reviews. I'm referring to tools like MS Live Meeting, which is something I've been personally subjected to. Yes, subjected is the word I mean to use.

I've recently read a blog entry by Grady Booch where he briefly mentions using Skype and Second Life, which sounds interesting. This also appeals because it's available to us Linux folks (at our office, we have a projector connected to a box running Windows, a wireless mic with a polycom, a bridge phone conference thingie, all of which we use solely for team presentations). All bias aside (I have a distinct dislike of Live Meeting), Live Meeting is too slow and is a good representation of low interactivity, low bandwidth distributed tools. Skype (or other VoIP-ish tools) are a bit more natural for realtime conversations, but lack integration with any kind of tools. I'm not entirely sure what Second Life brings to the table; my exposure to it is rather limited.

It should be noted that many, if not all, open source projects are developed by distributed teams. I've been directly involved in a few popular projects such as Gentoo Linux and contributed patches and bug reports to others. In most cases, collaboration tools are critical to these kinds of projects; IRC networks, forums, mailing lists and numerous other bits of infrastructure exist to support this development model. The major thing I've found lacking is integration into the common toolsets of choice, be it vim, emacs, or a full blown workbench like Eclipse.

So why is it that a bunch of developers can't get on the same page about good, distributed, high bandwidth collaborative tools? Better yet, why can't our normal tools support collaboration as a standard part of development? Things like version control systems cover history management, tracking, auditing, and other functions, but they don't help with real time code review and pair programming. Distributed source code editors are interesting, but I'm not sure I've seen one that fits with the tools a person already uses, as opposed to simply making them use other tools. In an age of modular language workbenches - I'm looking at you, Eclipse - this should be something to pursue.

The role of the System Architect

My official title at my day job is System Architect. Simple, straightforward, theoretically well defined. The Wikipedia definition of a system architect makes it seem like the world is made of butterflies and honey and everyone is in love. It couldn't be less true.

Admittedly, at my place of business - a tech-savvy company with a development team of about forty very smart people - the idea of what an architect is and what he or she does is still forming. The position didn't exist until recently. It's a little less like the Wikipedia definition cited above, and quite honestly, thank God. My position is definitely more technical than it is business oriented (i.e. I still write code, et al). I tend to focus on (and possibly obsess over) design, system structure, subsystem integration and communication, and other seemingly opaque and lofty topics. Luckily, I still have to eat my own dog food and practice what I preach, so it can't all be about purity and pie in the sky conceptual cruft. If there isn't any meat on the bone, I could easily get eaten alive by my peers. (My apologies for the list of inappropriately used cliches above.)

Sometimes, the responsibilities can get hazy, though. Given that I am in a consultative rather than authoritative position, the onus is on me to prove that the solution offered is the best there is. There's no guru hat here; it's put up or shut up.

Not often, but with near certainty, there are situations where time constraints around a project force a sense of impending doom and severely limit the time people are willing to put into discussion of a system. It happens. It can be even worse than that, though. The time it takes to have these discussions, not to mention the emotional energy when everyone in the room actually cares, is immense. People can get their egos bruised, or worse. I'm certainly not above such things, at times.

So what is a System Architect, really? Specifically, how do we prove the value of things like design patterns, integration patterns, and the like to other developers?