Saturday, January 3, 2015

Code rot & OpenBSD


I've often been asked by friends why I'm diving into OpenBSD. This post is meant as a single place I can send people to in order to explain when, how & why all of this happened.

At my previous job I worked with a 2 million LoC code base for a core banking system. This was a huge project developed by a large team (~60 devs), with constant change as new features were ordered by clients and a steady flow of issues to fix. Both required delivery 'by yesterday', leaving little time for clean-ups.

One of my friends nicely described the process during an ugly hot-fix as 'powdering the corpse'.



There was literally no time to slow down, look at the big picture and clean up the accumulated cruft. This did change for the better during my last year there, but for 7 years rapid change was the norm.

Imagine working with a code base that's layer upon layer of quick fixes. Imagine being woken up at 3 am to diagnose & resolve an issue with it. It was workable, but it also felt like being trapped in a pressure cooker. I gave my employer three months' notice at the end of March 2014.

Things changed radically on that day. My regular tasks were:
  • team management (I was a team leader)
  • fixing bugs in the code
  • implementing features
After giving my notice I was given a free hand with one general direction:
  • Pass down knowledge
So I did my best.

With a system that large, no single person knew the whole thing inside and out. Like everyone else I had my 'area' of expertise within the code base. What I started was a slow code review of each piece of functionality I had helped implement over the past 7 years, documenting:
  • what we knew while implementing it
  • what we found out after shipping it
  • the actual state of the code
The results were more than horrifying:
  • I found functionality broken by subsequent fixes
  • fixes that were no longer relevant (dead code)
  • duplicated code based on old functionality that didn't have recent fixes applied to it
  • actual issues caused by refactoring in not yet deployed code, that would be catastrophic on deployment
  • potential changes that would result in 15-25% performance increases in critical tasks
After those revelations I did three things:
  • obtained a green light for code clean-ups/modifications from upper management
  • raised alerts on code that would break on deployments
  • passed down knowledge to my teammates on the areas I had any insight into
The last 3 months at that job were the best ones I had in seven years. I removed code daily, to the point that I started to believe that half the code could be removed from the system while still keeping the same functionality. I remember joking with friends about removing a 300-500 LoC function and mentioning the '50% of the code is not needed' line - wondering if I would ever find a 'module' that could be completely dropped. To which one of my friends replied that he knew of at least one. So I took a look. That was the day I removed 60k lines of dead code in a single go. Fun fact? That code received regular bug fixes - just in case.

During that time, Heartbleed happened.

Heartbleed

The company wasn't really affected by Heartbleed. For me it was more personal. I have been a regular *nix user since the 90s and followed the news closely. Article after article demonized the OpenSSL code base, counting how many resources would be needed to whip it into shape, and how the original maintainers - let's not go there. Let's say they 'didn't do a great job'.

Talk is cheap. The news quickly started to get boring. Then LibreSSL happened. I don't remember how I first found the CVS commit for the start of the project. Though you can find my G+ post on it which perfectly sums up how I felt back then:

14 Apr 2014
So instead of bike shedding like the rest of the internet is doing on the state of OpenSSL.
The OpenBSD team silently started putting the beast into shape - http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libssl/src/ssl/

I really felt connected with that. They were tearing through thousands of lines of legacy code and ripping them out - the same thing I did daily during my last 3 months.

Around the time I was leaving, the company started gathering metrics about the codebase with Sonar. After I left, one of my ex-co-workers said that there was a moment when management wondered why the codebase had shrunk by 200k LoC. It was summed up as 'Adam probably removed them'.

Koparo

I have more leeway at Koparo. Even though the codebase is much smaller, it still accumulates cruft that needs to be pruned regularly to allow a more flexible path forward. There are of course periods when we move forward at such a fast pace that some solutions aren't as good as they should be - the difference is that we go back and kill off all the fallout.

Go back and look at the kind of issues that were revealed by a simple code review, done only to prepare a presentation. That stumps me even to this day. We do code reviews at Koparo for each change - they are just too valuable to skip, even during high-churn times.

OpenBSD

The background is set and you know why I took an interest. Now it's time to tell what solidified the decision.

Each change made to the OpenBSD codebase undergoes a code review before it's committed to the tree. If you managed to get this far in this article, you know perfectly well why I deem that of utmost importance.

No matter how many features your software has, if your documentation is sub-par then your software is most likely useless. The quality of OpenBSD documentation is on a level I didn't expect even after reading about it in so many places. Not only will you get information on how something works, what's possible and how to use it, it will also tell you the best practices for using the tool and warn you about the common pitfalls.

Release cycles. Remember those bugs I found that would break the next deployment? At my previous job, the time to ship was counted in months before code hit our clients' machines. In OpenBSD you are encouraged to run current. The whole team tries its best to make it as stable as it can be. You know why? They eat their own dog food. That's so simple yet so amazing that it blows my mind. Developers actually run OpenBSD on their machines daily. Development isn't done on virtual machines on a MacBook Pro. If current fails, you can be sure that a solid userbase of people able to fix it will also have a strong interest in doing so. They also ship like clockwork - this speaks volumes about the quality of this project's output. You can count on an OpenBSD release arriving every 6 months. Go ahead, sync your clock against it.

Which nicely brings us to the next point. Stuff is compiled on actual hardware. i386? amd64? VAX? You can be sure to hit obscure problems on real hardware - not so much on virtualized systems. Personally I am glad that the software running on my MSI Wind i386 machine was actually compiled on an i386 machine and not on some virtualized host.

Do it right or don't do it at all. This shows up in OpenBSD development a lot. Yes, it's not nice that the Realtek 8187SE wifi chip in this laptop is not supported by OpenBSD. Though it's also not funny that the driver is of such low quality that even the Linux kernel will only take it into the staging area. Distributions that somewhat worked with this card now plainly refuse to do so without manual intervention. Whose fault is it? Mine. I bought this laptop for my wife because it came with SUSE SLED preinstalled, so I assumed it would have no hardware issues - boy was I wrong. If Opera hadn't taken down my blog I would gladly link you to the whole story (I have a backup, hit me up by email or comment below if you're curious). Nowadays I shop for hardware with a bootable OpenBSD thumbstick. There's no better insurance for hardware support than booting up the installer, dropping to the shell and checking dmesg for devices reported as not configured. If OpenBSD states that something is configured then it works, and it will keep working flawlessly or only get better over time.
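For the curious, here is a minimal sketch of that dmesg check in Python. On the installer shell itself you would simply read the dmesg output by eye or search it for the phrase "not configured"; this only illustrates the filtering idea. The function name and the sample text are my own inventions for illustration, not real output from any machine.

```python
def unconfigured_devices(dmesg_text):
    """Return the lines of a dmesg dump that report a device as not configured."""
    return [
        line.strip()
        for line in dmesg_text.splitlines()
        if "not configured" in line
    ]


# Invented sample in the general shape of a dmesg dump (NOT real output).
SAMPLE = """\
cpu0 at mainbus0: apid 0 (boot processor)
"Hypothetical Wifi" rev 0x22 at pci2 dev 0 function 0 not configured
example0 at pci3 dev 0 function 0 "Hypothetical Ethernet" rev 0x02
"""

for line in unconfigured_devices(SAMPLE):
    print(line)
```

Any line this prints is a device I would look up before buying the machine.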

There is so much more to it that I don't even know if I should keep going. Exploit mitigation, security research & modern reimplementations of core tools, to name a few of the things that OpenBSD is doing on a regular basis. Funny that the only thing that hasn't really changed since my early *nix days is the usage of OpenSSH, regardless of platform. If that doesn't speak to you then I don't know what else to name.

Getting Started

I described this once in an email to BSDNow. Here is a small excerpt:
The approach I took is as follows:
  1.  I subscribed to OpenBSD mailing lists (cvs, bugs, misc, tech) - I'm learning tons from just following the discussions
  2.  I started running OpenBSD in a qemu instance on my current box - I intend to follow current on it
  3.  I'm learning the system by usage in qemu and following changes from recent commits on the tree
Those were my baby steps. Since then I have also installed OpenBSD on bare metal, thanks to my lovely wife who donated her MSI Wind for this purpose. I also ordered a CD set. I did not need one: the MSI Wind doesn't even have a CD-ROM drive. The order was purely to support the project that has benefited me for such a long time.

Here are some things I found since starting my journey:
  1. OpenBSD sources on the hard drive
  2. following the lists
  3. qt5 port
  4. gdb

 

Lesson 1 sources on the hdd

This might sound simple, but I haven't felt this connected to my OS in a long time. Having the sources for every piece of software I use at hand makes things really different. How? I'm actually looking at them. I used Linux for a really long time but rarely took the time to dive into what happens when a specific thing occurs in my OS, desktop, software etc. Not the case with OpenBSD - I'll gladly drop into the debugger and see what is really going on.

 

Lesson 2 following the lists

Mailing lists are verbose, especially if you decide to subscribe to as many as I did. Honestly though I don't regret it. There's so much gold hidden there that even going through 5-10 emails per day often leads me into areas that I would not otherwise hit. It expands my mind and I'm really happy about it.

 

Lesson 3 qt5 port

I managed to contribute. Not in a big way. I really want to port otter-browser to OpenBSD, so I started to work on it. While doing that I hit some bugs in the Qt5 port in OpenBSD and reported them upstream, which resulted in at least one patch.

Pause here for a moment. OpenBSD has a reputation for harsh mailing lists. Show me one more system where you can reach core developers, maintainers etc. - get them to answer your questions and even act on your reports. I can take a lot of beating for that privilege.

I learned more in a couple of months on the OpenBSD mailing lists than in the past year of passive self-study.

 

Lesson 4 gdb

Funny to plug in gdb here. I never had a need to learn it really well, but now I want to. Having a large established code base to actually debug and experienced people to ask really helps. During the last three weeks I learned more about gdb than in the last decade of Linux usage.

 

Finally

The OpenBSD Foundation is finishing up the fund raiser for 2014. Should you donate? Did you use OpenSSH today? You should.

In summary, I'm learning more than ever - computing is fun again.