Complexity and Productivity
Coupling and Cohesion, Part 1

If you work in any kind of medium or large company that isn't a software company it's probably pretty obvious that the software developed for the business in-house is vastly inferior to the software you use in your daily life, mostly on your phone. In-house development ("Enterprise" development) takes forever to deliver crap and software companies "get it" whatever "it" is.

There is nothing new about this perception. Brooks starts off The Mythical Man Month examining exactly this phenomenon and he wrote in the early 1970's about his experiences at IBM in the mid-60s.

Most often this gap is attributed to two things: start-ups attract a better, smarter class of developer and successful start-ups have enormous resources to pour into things like graphic design and testing. While both these things are somewhat true, one of the main reasons for the productivity gap is that software gets more complicated more quickly than people realize. Naive developers and managers don't fully appreciate how easy it is to create a big mess and how intractable those messes can be for an organization.

In software, complexity comes in three forms:

  1. Algorithmic Complexity is the most intuitive meaning of complexity - using sophisticated techniques to do hard things. For instance, analyzing a genetic sequence to determine mutations uses sophisticated mathematical algorithms that have been continuously refined over 3 decades.
  2. Scale/Performance Complexity is another type of complexity that is intuitive and easily grasped. It's not much a surprise that doing something 10 times a day is simpler than doing the same thing 1,000 times per second. High throughput environments require much more complex operating environments.
  3. Combinatorial Complexity is less obvious. In one sense, it's clear that a system that does 3 simple, unrelated things is much simpler than doing 300 simple, somewhat related things, but it's often overlooked how quickly the complexity compounds in having large-ish number of simple things interacting with each other.

Twitter and Instagram and the like have a great deal of the first two types of complexity but, relatively speaking, not much of the third. They have relatively small feature sets. The apps a lot of people use at work to do things like handle insurance claims, settle securities trades, or report genetic sequencing results to patients have large feature sets with lots of provisions to handle special cases that mean they have large amounts of the third type of complexity (and depending on the business, sometimes a good deal of the other two, too).

n * (n -1 ) / 2

If there's anything equivalent to Newton's Laws in software development it's that n things have a possible universe of interactions between themselves of n * (n-1) / 2. This means, for instance, that on a team of 4 there are 4*3/2 = 6 different communication pathways between team members.

But on a team of 8 there are 8*7/2 = 28 pathways - more than four times as much.

This scaling dynamic applies not just to team members but to everything: code, teams, environments and processes. The number of services in an environment, number of environments in the deployment pipeline, number of database tables, test data sets, fields on a screen, cron jobs, git repositories, sign-offs... from little things big things grow. More accurately, from simple little things big complicated things grow.

The combinatorial growth in complexity has an impact on productivity. It looks like this:

Productivity is initially low as the development team begins work on features. The team is doing some feature development but also laying the groundwork to automate repetitive work, whether it be writing classes to handle domain objects, selecting and leanring libraries to do things like handle pop-ups and monitor usage, building tools to run automated test suites, deploy builds to test or production environments. This makes sense; one must learn to walk before one can run.

It's worth noting that before any of this begins, system architects and development managers make some high-level assumptions about the types of problems they are being asked to solve and make decisions about languages, frameworks, and methodologies that are best suited to solve these problems. Are noSQL databases like Redis the way to go or is SQL the right thing? RESTful services? Are the UIs complicated enough that a desktop app is needed? If not, which of the many javascript/web frameworks? So many decisions... and they must be made fast, almost immediately.

The build phase reaps the benefit of the prep phase. No longer walking, teams are running, even sprinting. The architectural bets made early-on tend to be good ones at this stage - features are straighforward to implement. Developers are focused on new features without worrying about how those features are going to break older features. The everyday design tradeoffs are clear and un-complicated, everyone agrees. The support burden is light or non-existent; everyone's working on new features. Every thing's pretty easy, it's obvious a lot of stuff is getting done. Everyone goes home satisfied and wakes up stoked to go to work.

Eventually, things start to slow. New features combine with existing features in un-anticipated ways. Other new features invalidate some of architectural assumptions made up-front. Teams grow and split. The number of environments and supported versions grow. The teams now spend more time on support issues and more time re-factoring code. Poorly understood aspects of that awesome tool the arhitects picked now require lots of time to troubleshoot arcane production issues. Design tradeoffs are more complicated, meetings are required to negotiate solutions no one is happy with. Some of the key early guys have left for greener pastures...

Roll forward another couple of years and every little thing is hard. Stagnation has set in. There are more meetings and more lifecycle processes. Rather than "agile" now the common refrain echoing the hallways is "technical debt." The big issues are intractable and there is no executive support to address them, so the developers take refuge in small, incremental things so they can go home with some measure of satisfaction. Processes have accumulated into such an unwieldy mess that sidestepping and short-cutting them is now routine. Deviance has been normalized.

Another thing worth pointing out is that not all teams are created equal. How far an organization can ride the build phase is dependent on the skill of the early architects. Good groups tend to start a little slower but stagnate much later. Less experienced groups start a little quicker by unknowingly cutting corners and it coms back to bite them sooner because they literally have no plan to handle combinatorial complexity.

Stagnation is Success

In many organizations, one of the interesting and difficult factors in dealing with stagnation is that it represents success. The organization achieved what it set out to achieve and is now focused on things no one thought they'd be doing at the outset. It's hard for successful organizations to understand that success now requires new thinking, a new approach.

One common dysfunction is that the people most responsible for that success, the early managers, architects and developers who rode the build phase, have earned a lot of political credibility and clout in the organization.

Generally, these people are exactly the wrong people to tackle the problem of stagnation. The problems inherent in stagnation are almost by definition outside of the scope of their original intents, their original assumptions, and often outside of their skillset. It takes a rare personality to say "Yeah, I did a good job all things considered but we're in uncharted waters now and I'm way out of my depth. I need to convince my boss to recruit to some really expensive people from much bigger companies and cede control to them."

Buying Time

The most common fate for stagnant organizations is that they stay stagnant. But companies often attempt to cheat by starting an "Advanced Project" or R&D group to bypass the morass they've created and fast-track new, important projects. They leverage the high productivity of small teams unemcumbered by the need to integrate their efforts in a coherent whole. When the project is done, it gets tossed on to the stagnant, Enterprise heap and left to the already slow team to do the difficult, slow work of integrating the new project into the set of existing projects.

As you might've guessed, these R&D groups are usually run by exactly the same people whose architectural model led to stagnation in the first place - by people who don't understand coupling and cohesion dynamics at scale.

This two-track development provides an illusion of success for a while. Pretty soon the integration burden becomes too great. The Enterprise team all but locks up, morale plummets, the good people leave, business dissatisfication at seeming incompetence grows.

How to Proceed

In a later post, I'll write about some ways to move forward.