Wednesday, December 31, 2008

Twitter (Yea, I know)

Against my better judgement, which is how I do most things in life, I've begun playing with twitter again. I invite you to follow me there, taking every opportunity to taunt me for my usage of such a self indulgent, narcissistic, hey look at me, kind of (dare we call it) technology.

I blame some of those I work with for my renewed experimentation. You know who you are (ahem-Kiril-cough).

Email or reply here and let me know all the reasons why I shouldn't succumb to the silliness, or let me know what your nick there is so we can be silly together, exchanging absurdly pithy comments about happenings in our lives.

Friday, September 5, 2008

GridGain - simple, effective, and made of pure happy juice

Let's talk about real, tangible, hype-less, grid computing. I swear, it's real.

I've heard endless sales pitches about moving your apps to the grid. It's almost quaint how easy they make it sound. Let's pretend, for a moment, you have an application that benefits from parallelization, you have no issues with data availability, and you control your source code (because you're not going to make every application you use magically run in parallel regardless of the sales pitch).

To this end, I've been using GridGain, recently. In a few weeks I might even be able to talk about why, but that's far less important. For those living under a rock, GridGain is a Java implementation of a Google-ish map / reduce system. The effect is that, with very little code and even less configuration, one can cause suitable code to run in parallel, on a grid of JVM nodes. I won't discuss the map / reduce concept in detail, other than to say the idea is to:

  1. take a job, break it up into smaller units of work
  2. assign these units to processing nodes
  3. collect the results from the processing nodes
  4. aggregate the results, as appropriate

Items #1 and #2 refer to the map part of the process, while #3 and #4 refer to the reduce component.
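
To make those four steps concrete without any grid at all, here's a minimal sketch in plain Java that uses a local thread pool in place of grid nodes. This is not GridGain code; the class name MapReduceSketch and the word-counting job are made up purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: threads stand in for grid nodes.
public class MapReduceSketch {

  // Counts the words across all lines, one unit of work per line.
  public static int countWords(List<String> lines, int workers) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(workers);
    try {
      // 1. Break the job up into smaller units of work.
      List<Callable<Integer>> units = new ArrayList<>();
      for (String line : lines) {
        units.add(() -> {
          String trimmed = line.trim();
          return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        });
      }

      // 2. Assign the units to workers.
      // 3. Collect the results; invokeAll blocks until every unit is done.
      List<Future<Integer>> results = pool.invokeAll(units);

      // 4. Aggregate the results, as appropriate (here, a simple sum).
      int total = 0;
      for (Future<Integer> result : results) {
        total += result.get();
      }
      return total;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> lines = Arrays.asList("take a job", "break it up", "collect the results");
    System.out.println("words: " + countWords(lines, 4));
  }
}
```

A real grid adds the hard parts this sketch skips: shipping code and data to other machines, handling node failures, and balancing load.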

There are two ways to use GridGain. The first method is to annotate methods (either static or otherwise) that should be run on the grid. This is fast to write, and coarse-grained in that the unit of work is usually (but need not be) the entire method. The second method of running tasks on the grid is by directly accessing the GridGain APIs. This, of course, tends to mean a bit more code but, to some, may be easier to understand and debug. Which method you choose will probably depend mostly on preference, but could also be influenced by dependencies. I opted for direct invocation via the APIs because I prefer not to futz around with AspectJ and friends - which is how the annotations actually work - unless I'm also using them for other things. GridGain is happy to work with AspectJ, Spring AOP, or JBoss AOP, according to the docs, although they seem to recommend against Spring AOP due to a lack of full functionality or some such. I didn't get too much into the details.

The APIs are surprisingly direct and simple to understand for something that is so seemingly nebulous. This, coupled with some of the best documentation I've seen in some time, makes for a pleasant experience. The only thing that makes my vision a little blurry is the generic-soup that begins to occur, but for the type safety, it's probably worth it; anything that needs to run on a grid is arguably important enough that you can groan through the angle brackets.

Running a GridGain node requires zero configuration in many cases. That said, GridGain is itself a Spring application, and a well written one with a number of service provider interfaces, so one could easily sculpt what amounts to a custom product out of it. What is nice is that one can easily run GridGain within an IDE like Eclipse during testing and development, which significantly speeds the process of rolling out your apps. While on the topic of time-saving features, GridGain uses peer-based class loading, meaning that, for simple, self-contained tasks, one need not do any kind of special deployment to execute them. The task classes are transmitted and loaded by GridGain itself, without any need for restarts or copying. It's like a magic fairy sprinkled happy-time sparkles in my brain.

Of course, GridGain provides all manner of fancy pants things that I haven't yet explored, such as node affinity for processing close to your data, integration with caching products from Oracle and JBoss, different scheduling and communication methods, and similar goodies. I also haven't fully explored implementing custom service providers, mostly because I couldn't think of one I might need that wasn't already implemented.

Here's a bottom of the barrel, super-contrived, example to whet your noggin knobs.

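// Package names assume the GridGain 2.x layout (org.gridgain.grid).
import org.gridgain.grid.Grid;
import org.gridgain.grid.GridException;
import org.gridgain.grid.GridFactory;
import org.gridgain.grid.GridTaskFuture;

public class TestApp {

  public static void main(String[] args) throws GridException {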
    try {
      Grid grid;
      GridTaskFuture future;

      grid = GridFactory.start();

      future = grid.execute(MyTask.class, "Print me...");

      // This causes GridGain to block until all jobs are complete. It
      // also returns the result, if there is one.
      future.get();

    } finally {
      GridFactory.stop(false);
    }
  }
}

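// Package names assume the GridGain 2.x layout (org.gridgain.grid).
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.gridgain.grid.GridException;
import org.gridgain.grid.GridJob;
import org.gridgain.grid.GridJobAdapter;
import org.gridgain.grid.GridJobResult;
import org.gridgain.grid.GridTaskSplitAdapter;

/*
 * The task class is going to be the unit of work that gets split
 * into smaller jobs (GridJob) that get distributed and executed.
 * I'm using adapters for both the task as well as the jobs to avoid
 * extra work, so yes, I'm cheating.
 *
 * Note: split() is declared by GridTaskSplitAdapter; the plain
 * GridTaskAdapter expects map() instead. That's per the GridGain 2.x
 * API as best I recall, so double-check against your version.
 */
public class MyTask extends GridTaskSplitAdapter<String, String> {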

  @Override
  protected Collection<? extends GridJob> split(int gridSize, String arg)
    throws GridException {

    List<GridJobAdapter<String>> jobs;

    jobs = new ArrayList<GridJobAdapter<String>>();

    // Build a collection of jobs to run on the grid.

    for (int i = 0; i < 10; i++) {
      jobs.add(new GridJobAdapter<String>(arg) {
        @Override
        public Serializable execute() throws GridException {
          System.out.println("arg:" + getArgument());

          // Nothing useful to hand back, but execute() must return something.
          return null;
        }
      });
    }

    return jobs;
  }


  @Override
  public String reduce(List<GridJobResult> results) throws GridException {
    System.out.println("Not much to reduce when we're printing stuff...");

    return null;
  }
}

I know it's probably considered verbose, still, to the scripting community at large, but when you consider what's happening here, it's pretty bad ass.

It's worth noting that there are other frameworks similar to GridGain, in this space, that deserve a look as well. Hadoop comes to mind. I have no direct experience with them, so I don't want to say either way, but I have heard nice things.

This is all very contrived and still thin on details. My goal wasn't to reproduce a howto, but more so to, well, to glow about GridGain, to some small degree. I encourage everyone to check out GridGain for themselves and find the things relevant to them.

(Brief) Thoughts on Rails

In working with Ruby, and specifically Ruby on Rails, I've developed a few opinions about the underlying design and architecture, but more so about the development principles and philosophies. Note that I wouldn't classify myself as a Ruby or a Rails expert, although I do have extensive experience with other languages and frameworks.

Both Ruby and Rails are philosophy heavy. This isn't inherently a bad thing, in my opinion. These communities are rife with smart people writing smart software; you wouldn't want the opposite, or any other combination thereof. The gripes that I do have are in the places where those smarts are complimented to the point where they metamorphose into arrogance. Please don't get caught up on the word arrogance. Keep going. I just couldn't find a less provocative synonym.

Let Rails be decomposed into three pieces - the model (ActiveRecord), view (ActionView and ERb, in my case), and controller (ActionController) components. When building any Rails application, one is sure to work with all three. Therein lies the first issue: Rails owns all.

Configuration has use

Rails (and Ruby) favor convention over configuration. I think this is absolutely a good, general, idea. Having worked with things like Spring and Java in general, I would welcome a bit less in the way of configuration, in some cases. That said, in the case of Rails, there are certain things I wish I could configure via (something like) XML rather than code. For instance, the route configuration in Rails (aka: mapping URLs to controllers) is done in code. The code is kind of DSL-ish which makes it a bit better, I suppose, but developing tools to parse and generate said configuration would be painful. One could eval the code and attempt to work with it that way, but that's not necessarily safe, nor would it be free from a huge number of dependencies (like Rails, itself). Another option is to replace routes.rb (the standard file name) with something skeletal that loads configuration from XML or YAML even, but shouldn't that come out of the box? You know, in the interest of the entire Rails community not repeating themselves and all. Please Rails, let me use configuration where it makes sense. This is but one simple example.
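
To show why routes-as-data are friendlier to tooling than routes-as-code, here's a small sketch. It's in Java only to match the other code on this blog, and it has nothing to do with how Rails actually loads routes. The one-route-per-line file format and the name RouteTableSketch are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: a route table kept as plain data.
public class RouteTableSketch {

  // Parses lines of the form "<VERB> <path> <controller>#<action>".
  // Blank lines and lines starting with '#' are ignored.
  public static Map<String, String> parse(String config) {
    Map<String, String> routes = new LinkedHashMap<>();
    for (String line : config.split("\n")) {
      String trimmed = line.trim();
      if (trimmed.isEmpty() || trimmed.startsWith("#")) {
        continue;
      }
      String[] parts = trimmed.split("\\s+");
      if (parts.length != 3 || !parts[2].contains("#")) {
        throw new IllegalArgumentException("Malformed route: " + trimmed);
      }
      // Key on "VERB path" so the same path can map to different actions.
      routes.put(parts[0].toUpperCase() + " " + parts[1], parts[2]);
    }
    return routes;
  }

  public static void main(String[] args) {
    String config = "# verb path target\n"
        + "GET /posts posts#index\n"
        + "POST /posts posts#create\n";
    System.out.println(parse(config));
  }
}
```

Because the table is just text, any tool in any language can read, validate, or generate it without evaluating application code or dragging in the framework.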

Ruby on Rails. It's not new and there's no shortage of both hype and panning from all sides. Nothing is ever the panacea. There is no short cut to building complete, real world, applications. You can make it easier, but you can't take away the requirements that a client or business places on an application. Specifically, a framework, language, or library may never dictate what is possible unless there is a technical limitation or purpose. Philosophy in software is excellent; obstinance is unforgivable.

Sunday, July 20, 2008

Content Management Systems

Recently, I've encountered a situation where we needed certain (static) elements of our site to be under the control of a content management system. This seems easy enough. I did a fair amount of research into both Drupal and Joomla, two of the more popular projects that are out in the wild these days. The requirements are straightforward:

  1. Non-technical people must be able to edit content and publish it to the site.
  2. The content needs to be published as static, well-formed HTML.
  3. Multiple templates must be supported such that one can create and manage content for each type of page.

There are always two questions we ask when we see requirements like this: "are these things technically feasible and realistic?" and, more importantly, "why do we have these requirements?" To better address the former, the latter must be defined.

One of the tenets of a system such as a CMS is that it must integrate with existing infrastructure. In our case, we have an existing site, in production, that can't be easily disrupted. Nor can we take the time to perform any kind of wholesale migration of code or deployment strategy. Again, the goal is to manage just a very small subsection of the site that is entirely static with one of these systems. Easier said than done.

Both Drupal and Joomla have an amazing feature set, to be sure. Both have large, active, communities, plenty of modules, and significant deployments. Upon further inspection of both, what I found both surprised and scared me; neither does what I need it to do. Let me break down the requirements in a more specific way; I would like to create multiple, complete, HTML templates with small sections that can be edited by content authors.

This seems like the easiest use case possible for a content management system, yet neither system seems to handle this easily. Neither Drupal nor Joomla operates "off line," instead opting to render pages dynamically, at request time. Drupal's caching system, which initially sounded promising, simply builds the static page and inserts it back into the database. Worse, Drupal insists that a site has a "theme" which, while similar to a template, implies that there is only one per site. Joomla at least seems to acknowledge that a site can have multiple templates, but there's no provision for creating them (although one can edit an existing template). Of course, Joomla also seems to lack support for generating static HTML output in a location of my choice.

What Drupal and Joomla both seem to do extremely well is allow one to create a site from scratch with all manner of bells and whistles. Both have extensive support for users, comments, neat little calendar thingies, forums, RSS feed generators... wowzers. These are both great projects for building a site with specific needs, but, and I know this is going to result in hate mail, without writing custom code, neither performs the functions I would consider the baseline requirements of a content management system.

On a side note, I did find the Apache Lenya project which seems to be much more what I would expect from a CMS. The online demo for Lenya doesn't make me feel very confident about it, but I'm hoping it's going to improve over time. In true Apache project style, they seem to have people doing evaluation of what areas they're lacking in and what they need to be successful. No wonder the Apache umbrella turns out such good work.

There are also a number of other projects out there, some of which I've used before, some of which I'm intimately familiar with. That's why I'm avoiding them and not mentioning them at all. Suffice it to say there's a lot of work to do in the CMS arena.

I am interested in Drupal and Joomla in that I think there's something there, but I am surprised at the wide-and-shallow approach to content management. I hope to get a better understanding of the goals of the projects in the coming weeks to see what they're really trying to address. Unfortunately, the do-all projects tend to never do one thing well.

Friday, May 23, 2008

Operations on the cheap

My current position puts me back into the world of tech ops, for lack of a better description. I'm primarily responsible for data center operations and infrastructure. It's not a bad gig, but as one might expect, it shifts the focus.

I have the omnipresent mandate of running the show on the cheap. This, of course, should be an anti-surprise for anyone in either a startup or a smart company. Waste is always wasteful even if you can afford it. Hint: you can never afford it.

Given this most recent brush with corporate poverty (which I fully support and feel is one of the things that keeps us honest about what is really necessary) I feel like I need to write something up on what has worked, in my experience. Maybe it's time for motivation.

Sunday, March 16, 2008

Open Source and the IP Clause

This article reflects my lay-person's understanding of employment and intellectual property. It is not legal advice of any kind. I am not a lawyer. Don't assume that any of this is correct. This is based on drips of information I've picked up over the years. Corrections are very welcome.

As I've mentioned recently, I've switched jobs in the past month. One of the things I always worry about is the dreaded intellectual property clauses in most modern employment contracts. You may be surprised to find out that most companies in the know are willing to modify these traditionally draconian clauses if you phrase the request correctly.

The Problem

Commonly, these sections claim ownership of all IP, in all forms, for the company, with very little restriction. In more than one case, I've seen contracts that claim ownership of all IP developed on and off the clock, regardless of whether or not the work is related to what the business does or may do. This, of course, means that anyone interested in doing any kind of open source development, or even research of any kind, will not necessarily own such work.

I like to get involved in open source projects so having the freedom to do so is worth something to me. I found that simply being up front and asking the employer for modifications to the contract, if necessary, is the best way to go.

The Changes

In the most recent contract I was presented with, the company claimed all rights over everything developed while I was employed. Having one or two projects I'm interested in working on that are entirely unrelated to the company's stated business, I asked for changes to the contract.

I did not provide specific language for the changes I requested. I asked the company's management and general counsel for the following privileges.

  • Ownership and the right to develop source code, written work, and IP in any other form provided it did not compete with or interfere in the company's stated goals and business.
  • The right to release any IP, in any form, under a license of my choosing, citing the GPL, LGPL, BSD, APL, MPL, Creative Commons, and other licenses as examples of the type of license I may select.
  • The right to participate in standards groups and bodies, user groups, and conferences, presenting any IP that I would own, under these changes.

Some of that is redundant. I asked for these things this way because I wanted to make my intentions very clear: to effectively disclose to the public any IP I developed during my employment that was unrelated to the business.

As concessions, I offered that the company would be allowed to dictate the following.

  • Whether or not I disclose for whom I work. They have the option of requiring me to state for whom I work (e.g. if they feel it helps them in some way).
  • If any IP was in a shady grey area about whether or not it was in conflict with the business's goals or primary line of business, it would remain unreleased by me.
  • I must select a license that allows the company to use said IP. Restrictions on their use of said IP were acceptable (e.g. I license something under the GPL, which limits their usage in some cases).
  • They may opt to review work for conflict with their business.
  • I am still (obviously) subject to, and aware of, the non-compete clauses in my contract.

Reality

It's not utopia. I don't even know if it's all enforceable; like I said, I'm really not a lawyer (and they are). I probably won't develop anything worth their time, but I want the option to try. More importantly, we, as potential contributors to open source work and standards, have the obligation of protecting projects and bodies to which we contribute. I don't know how effective the changes are that I requested, but certainly in its original form, that contract could be problematic to an open source project if I had contributed something and my employer wasn't happy about it.

I hope more companies, especially those that thrive on open source software and standards (even if only indirectly; I'm looking at you, Oracle, Java, and Microsoft shops), become more tolerant of what is required to generate such IP. Better, of course, are those that can directly contribute company owned IP, but by simply not attempting to exert ownership of the minds of their employees, a company is taking a huge step in the right direction. At least in my (non-lawyer) opinion.

Wednesday, March 12, 2008

Mixed Language Architectures

I've recently switched jobs. I won't get too much into that. I hate interviewing and recruiters even more. More importantly, my new home is full of smart people and a snazzy set of problems to solve.

The CTO at this company was dealt a bad hand. Before either of us was hired, a functional but severely damaged PHP application was built by contractors for the main web site. Put away your flame throwers; that's a comment about the situation and has zero to do with the language. Because we're not far into the life of this company or the application, there's an opportunity to fix what doesn't work, and the underlying (lack of) architecture doesn't work.

In an effort not to drastically upset timelines and ongoing projects, our CTO had to come up with an approach that would increase capacity and scale, and inject design principles and best practices into what already existed. The current application did not allow for the features the business folks wanted down the road. It needed love and attention.

The plan was to leave PHP for the presentation logic, templating, and rendering, but to gut major pieces of business functionality and implement them as services that exist in a middle tier. These services are written in Java and served up as web services for easy consumption by the PHP front end layer as well as by third party applications for integration. The PHP layer retains some ability to cache results from the service layer, where necessary, but ultimately, the real heavy lifting in business logic takes place within the services, themselves.
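
As a rough illustration of that service boundary, here's a minimal sketch of a Java service exposed over plain HTTP using the JDK's built-in com.sun.net.httpserver package. It is not the application's actual code. The /greeting endpoint, the JSON shape, and the class name GreetingServiceSketch are all assumptions for the example, and the real services may well have used SOAP or a full framework instead.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of a middle-tier service.
public class GreetingServiceSketch {

  // The "business logic" lives here, behind the service boundary.
  public static String greetingJson(String name) {
    // Strip anything that could break the hand-built JSON string.
    String safe = name.replaceAll("[^A-Za-z0-9 ]", "");
    return "{\"greeting\":\"Hello, " + safe + "\"}";
  }

  // Starts the service on the given port (0 picks any free port).
  public static HttpServer start(int port) throws IOException {
    HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
    server.createContext("/greeting", exchange -> {
      String query = exchange.getRequestURI().getQuery(); // e.g. "name=Ada"
      String name = (query != null && query.startsWith("name="))
          ? query.substring(5)
          : "world";
      byte[] body = greetingJson(name).getBytes(StandardCharsets.UTF_8);
      exchange.getResponseHeaders().set("Content-Type", "application/json");
      exchange.sendResponseHeaders(200, body.length);
      try (OutputStream out = exchange.getResponseBody()) {
        out.write(body);
      }
    });
    server.start();
    return server;
  }

  public static void main(String[] args) throws IOException {
    HttpServer server = start(8080);
    System.out.println("Listening on port " + server.getAddress().getPort());
  }
}
```

The front end, whatever it's written in, only needs to make an HTTP request and decode the response, which is what keeps the PHP layer thin and lets third parties integrate against the same endpoint.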

The advantages to this approach are:

  • Existing functionality can be tackled and replaced piecemeal; a much less taxing proposal.
  • A very clear separation of responsibility must exist.
  • Rapid prototyping can still occur in PHP which is generally better at fast turn around and testing.
  • The majority of very complex configuration and scaffolding that needs to happen with Java is in the web tier and its associated frameworks. Using Java as a back end service layer lets you get strongly typed, well defined, and proven technology in a place where it's easy to work with. Frameworks and components like Spring, Quartz, Hibernate, JMS, Lucene, and solid transaction management can be exploited without having to incur the long development times that are expected at the web layer.

Surely this isn't magic, nor is it rocket science. I do find it interesting that the downside - a strained architecture - can be turned into something positive by clearly defining lines and using standards as a means of self-integration.

If you work in an environment similar to this, I'm interested in hearing your experiences, both good and bad.

Friday, February 22, 2008

What Makes the Job

As much as I have a burning hatred for interviewing for jobs, I've had to do so lately. I'm a creature of habit and comfort. I like working with and getting to know a core group of people. People are important like that. People are, absolutely, what make the job.

More so than anything, I've realized over the last week of interviews (I limit it to exactly one week of interviews) that a little extra money, the promise of equity in a company, even the business the company is in, are all a far distant second to the people. Smart, talented, interested people.

Almost unbelievably, I found two companies that seem to be full of people that are exactly what I would look for (well, at least the ones I met). For the record, I'm not sucking up; I've already received the offers and I don't think they know about or care about this blog. There shouldn't be anything remarkable about this, but there is. No pinball machine, free bagels, or company sponsored potato sack race can possibly come close to what I get out of picking the brains of smart coworkers.

That's all. Nothing notable, per se, just a realization about what's really important to me.