Wednesday, April 8, 2015

Contributing to OpenBSD

I got a new laptop on Friday and gave it a shot with OpenBSD. This post summarizes that attempt and the way I was able to contribute to the project over the weekend. If you want ideas on how to get involved and give something back to the project - read on.




The laptop I got is a Lenovo G50-70, which comes with quite nice guts for a fairly low price. The first thing I did was slap in my 5.6 CD set in order to obtain a 5.6 release dmesg from the laptop. The installation had only one bump: the re0 ethernet card was unable to obtain a lease from my router. Initially I skipped the issue and just continued the installation from the sets provided on the CD.

Before installation I saw a mail on misc@ stating that X doesn't work on this specific machine. I enabled xdm and the same issue manifested itself - a black screen. I spent two days hunting through Xenocara & graphics card driver code. I finally found a good workaround: get a wife. I plugged in an external monitor as a last attempt to get X working and behold - it just worked. My wife was looking over my shoulder and noticed an outline of my xterm on the laptop's primary display. Yes folks, it defaults to a really low brightness in X. A couple of hits on the F12 key (that's the brightness up function key on this laptop) and everything worked as expected. You can see the whole misc@ thread here. I got in touch with the original poster. He was already trying out PC-BSD and was planning to try NetBSD next in order to go through the driver differences between the systems. He thanked me for saving him time - he had the exact same issue :) I'm really happy that I could solve at least this small problem and that it resulted in someone else not being discouraged from OpenBSD.

I did mention non-working DHCP. The other person with the same laptop stated that his re0 card worked perfectly fine. This gave me some hope for getting my card working. After sending the 5.6 dmesg I obtained an amd64 snapshot USB image and wrote it onto a USB stick. Unfortunately, yet again the installer didn't receive a proper DHCP lease from my router. Two small tips for anyone in that situation. First, you can tell the installer to use the sets from the USB stick. Just enter disk when asked where to obtain sets (the installer hints only about cd / http). Second, you can obtain firmware sets for fw_update from the project site as documented in fw_update(1). After booting up you can use the -p flag to point fw_update at your downloaded firmware and have it install the required ones without a working internet connection.
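The second tip can be sketched as a small script. This is only an illustration under assumptions: FW_DIR is a hypothetical directory holding the firmware packages you downloaded on another machine (for example a mounted USB stick), and the RUN switch is something I made up so the script prints the command by default instead of running it. The only part taken from fw_update(1) is the -p flag.

```shell
#!/bin/sh
# Sketch: install firmware from a local directory instead of the network.
# FW_DIR is a hypothetical path; adjust it to wherever the downloaded
# firmware packages actually live (e.g. a mounted USB stick).
FW_DIR="${FW_DIR:-/mnt/firmware}"

# Print the command line that would be used.
# -p points fw_update at a local path, see fw_update(1).
fw_update_cmd() {
    echo "fw_update -p $1"
}

# By default only show the command; set RUN=1 (as root) to execute it.
if [ "${RUN:-0}" = "1" ]; then
    fw_update -p "$FW_DIR"
else
    fw_update_cmd "$FW_DIR"
fi
```

Running it as is only prints the command. Once the path looks right, run it again as root with RUN=1 set in the environment.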

All of that said, I still didn't want to give up. I tried to obtain the lease again by executing sh /etc/netstart and immediately got dumped into ddb with a kernel panic. I reported the problem to the mailing lists. It's quite a long thread. The crash occurred on the MP kernel, so the first thing I tried was reproducing it on the SP kernel. Each time I retyped all the debugging information by hand (you can avoid manual typing if you set up a serial console). The problem was narrowed down to running DHCP twice, which results in an immediate panic on this machine. If you followed the thread, you will also notice that I reported the system freezing when re0 is configured either for DHCP or manually with a static IP address. Those freezes happen shortly after a watchdog timeout from the re0 driver. Leaving the card unconfigured results in a stable system but of course no network connection. I did manage to get the card working once with a static IP address but was never able to reproduce that (even using backed up copies of the working configuration). Don't get me wrong, I got a ton of support from developers. The thread isn't complete, as some people replied off list. Currently it's suspected that something else is corrupting kernel memory, but it's only noticed when things are moving through the network (i.e. by the pool debug checks when creating/freeing mbuf clusters). Pool debug writes a known 'poison' value to freed kernel pool memory and checks it on next use to see if something is scribbling where it shouldn't be. This can detect memory use after free & sometimes things a machine memory map misses (memory the machine uses for itself: ACPI, SMM, or something else). Don't think that I came up with that information on my own :) I paraphrased this from an off list email to give you some scope on the problem.
My bad luck was that all of this happened during a porting hackathon - so people that would know more about that specific issue weren't around or were busy with something else.

The weekend was almost over, and without a working network connection I had to slap Linux on the box to do Koparo work during the working week. I'm not giving up on OpenBSD on this box. I intend to try snapshots regularly on it to see if the installer picks up the DHCP offer. I might also try to reproduce that bug on the bsd.rd (installer) kernel. In case a developer picks up my report for serious debugging - I will just kill Linux on this box and dedicate it to troubleshooting while doing Koparo work on my other machine.

That covers all the work on the laptop, though that's far from everything I did over the weekend :) If you are a frequent reader of my blog you should know that I also have an i386 machine dedicated to OpenBSD. I was playing around with it in parallel to setting up the new laptop.

The i386 machine is rolling with snapshots, so the first thing I did was obtain a new one and upgrade the box. I had switched my primary snapshot mirror away from the one I used during installation, but never bothered to change it in the system configuration. Because of that, during each upgrade the installer defaults to the mirror I no longer use. Fortunately the installer comes with a list of mirrors that you can view by hitting '?' when it asks for the HTTP server. This loads the file into less, and you can then pick a mirror by entering its number as displayed in the listing. To my surprise, less failed. I reported the problem on the mailing list. This resulted in an immediate reply from a developer with a thank you for spotting the issue. Shortly after, a fix was committed, so the issue will be gone the next time I upgrade my box.

CVSROOT: /cvs
Module name: src
Changes by: rpe@cvs.openbsd.org 2015/04/05 06:37:14

Modified files:
 distrib/miniroot: install.sub 

Log message:
Cope with the removal of less from install media.
Noted by Adam Wolk, thanks.

OK krw@ deraadt@

I'm happy that I was able to detect an issue. That's one of the main reasons I run snapshots. With the issue solved it was time to take a look at the ports I maintain. I started off with a small reminder for a port I submitted a couple of weeks ago. The port was picked up and committed so now you will be able to install the go cover tool with pkg_add gocover.

I'm also the maintainer of otter-browser, and before that weekend Otter released beta 5. The port in OpenBSD was still at beta 4, so it was time to provide an update. I talked with the Otter upstream and again was told that we need to incorporate some hotfixes in the port. This time it was a significant amount of patches. I decided to start with a quick work in progress port update followed by a question to the mailing list on how to handle the additional patches. While waiting for the reply & talking some more with Otter upstream, I discovered a problem with the previous port. The GH_COMMIT variable in ports didn't work the way I expected it to, so the package didn't contain the upstream fixes we were supposed to incorporate. Since this resulted in an incorrect package, I decided to send a documentation patch to tech@ which makes it clearer that GH_TAGNAME takes precedence over GH_COMMIT when both are set. This submission resulted in Stuart taking a closer look at how both of them work and in quite a big overhaul (well, from my perspective) of the ports ecosystem. It was followed by the commits shown below, including my updated documentation patch. There was also a slew of commits to existing ports that used both variables.

CVSROOT: /cvs
Module name: ports
Changes by: sthen@cvs.openbsd.org 2015/04/05 07:32:16

Modified files:
 infrastructure/mk: bsd.port.mk 

Log message:
Make it fatal to specify both GH_TAGNAME and GH_COMMIT, when this happens the
GH_COMMIT is quietly ignored. Problem noted by Adam Wolk, discussed with a few
in the room

CVSROOT: /cvs
Module name: src
Changes by: sthen@cvs.openbsd.org 2015/04/05 07:33:06

Modified files:
 share/man/man5 : bsd.port.mk.5 

Log message:
Don't use GH_COMMIT and GH_TAGNAME together. From Adam Wolk.
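
The rule from these two commits can be approximated outside the ports tree with a small shell check. This is a hypothetical helper of my own, not part of the ports infrastructure, and it only looks for plain assignments at the start of a line. The real check lives in bsd.port.mk. The tag and commit values in the demo are made up.

```shell
#!/bin/sh
# Hypothetical lint helper: complain when a port Makefile assigns both
# GH_TAGNAME and GH_COMMIT. Returns 0 when fine, 1 when both are set.
check_gh_vars() {
    makefile=$1
    if grep -Eq '^GH_TAGNAME[[:space:]]*[?+:]?=' "$makefile" &&
       grep -Eq '^GH_COMMIT[[:space:]]*[?+:]?=' "$makefile"; then
        echo "$makefile: GH_TAGNAME and GH_COMMIT are both set" >&2
        return 1
    fi
    return 0
}

# Demo on a throwaway file with made-up values.
demo=$(mktemp)
printf 'GH_TAGNAME =\tv1.0\nGH_COMMIT =\t0123abc\n' > "$demo"
check_gh_vars "$demo" || echo "pick one of the two variables"
rm -f "$demo"
```

Pointing check_gh_vars at a real port Makefile gives a quick answer before the ports infrastructure refuses to build it.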

With the issue solved, thanks to significant help from Stuart, I submitted an updated port for Otter beta 5, which was preceded by a fix of the previous port from Stuart himself. All in all, you can now pkg_add otter-browser to get beta 5 and give it a spin :)

In case you haven't noticed by now, I'm trying to encourage all of you to give OpenBSD a spin. It's not hard to contribute to the project - even in a small way. I do hope that my contributions to the project matter and will result in at least one less problem for someone else in the future.