Upgrade to Dapper
I recently upgraded Ubuntu Breezy to Ubuntu Dapper via an apt-get dist-upgrade. It was my first time using this method; all my other upgrades have been clean installs from freshly burned ISOs. The dry run looked like it would probably work, so I just fired it up. It ended up being a lot faster than downloading the ISOs and installing from them. But there were problems.
Many of the problems I had could have been prevented by being more conservative about the new configurations I was presented with. When there were conflicting changes to config files, I could review the diffs and decide whether to keep my current custom version or the developers' new version. I only picked my current custom version a few times, when I could see changes that I remembered making.
The local machines couldn't see through the firewall anymore
- After puzzling over which machines could see which, and whether DNS or DHCP was failing, I got an inspiration. I checked
cat /proc/sys/net/ipv4/ip_forward
and found that it was set to 0. I echoed a 1 in there and that fixed it with no more hassle. But wait, there's more: of course this needs to go into a script somewhere so that it happens on each boot. /etc/rc.local, is it? I hope so.
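A minimal sketch of the check, wrapped in a function so it can be pointed at any file. The function name forwarding_state is made up for illustration. Putting the echo in /etc/rc.local should work, but the more usual home for this setting is a net.ipv4.ip_forward=1 line in /etc/sysctl.conf, noted in the comments.

```shell
# Report whether IPv4 forwarding is on, given the path to the kernel flag.
# forwarding_state is a hypothetical helper name, not a standard command.
forwarding_state() {
    flag_file="${1:-/proc/sys/net/ipv4/ip_forward}"
    if [ "$(cat "$flag_file" 2>/dev/null)" = "1" ]; then
        echo enabled
    else
        echo disabled
    fi
}

forwarding_state

# To turn it on for the running kernel (as root):
#   sysctl -w net.ipv4.ip_forward=1
# To keep it across reboots, add this line to /etc/sysctl.conf:
#   net.ipv4.ip_forward=1
```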
Mediawiki started acting odd
- It seemed to be torn between the packaged version, 1.4.14, and the version it had been running before the "upgrade", which was 1.5.3. I didn't find an immediate fix, so I upgraded to 1.6.6. That still didn't work. It was slow and gave an SQL error after I made a change to a page. I dug enough to find that the problem was a crashed table (mw_searchindex), and no amount of mysqlcheck or myisamchk repairing could make it better. Well, they said they fixed it, but a check of the table immediately after the fix showed it was crashed again. There was a simple fix in the mediawiki/maintenance directory: I just had to run rebuildtextindex.php. That dropped the old table and rebuilt a good version. It is still dog slow, taking 4 seconds to render a small page (see memory exhaustion below), but it appears to be free of corruption.
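The sequence above, written out as a sketch that only prints the commands rather than running them. The database name wikidb and the install path /var/www/mediawiki are placeholders, not taken from my setup notes, and the MySQL user and password options are left out.

```shell
# Placeholder names: adjust to the real database and install path.
WIKI_DB="${WIKI_DB:-wikidb}"
WIKI_DIR="${WIKI_DIR:-/var/www/mediawiki}"

# Print the repair steps in order; nothing is executed here.
searchindex_repair_steps() {
    echo "mysqlcheck --check $WIKI_DB mw_searchindex"
    echo "mysqlcheck --repair $WIKI_DB mw_searchindex"
    # The step that actually worked: drop and rebuild the search index.
    echo "cd $WIKI_DIR/maintenance && php rebuildtextindex.php"
}

searchindex_repair_steps
```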
Mail stopped flowing
- This was a saslauthd problem. I didn't notice when the upgrade slipped in a change to the location of saslauthd. I like it to be in /var/run instead of in the postfix chroot jail. I had to put my edits back in:
/etc/default/saslauthd
/etc/init.d/saslauthd
and then restart the various pieces.
Mail still wasn't flowing
- Did I forget to start the virus checker? No, but I didn't look too carefully when it said it refused to start because an old config file was in the way. In this case, I had wisely decided that I had important customizations that I didn't want to lose, and the amavis daemon said, OK, you can keep your precious custom file, but I refuse to start until you delete it. It wasn't enough to rename it with .old on the end. It could still see it. So I put .old on the front too. Ha! Stupid amavis, can't see it now, haha.
After about 8 hours of running, memory was exhausted, including swap, and it started to thrash to death
- I haven't figured this out yet, but I'm somewhat relieved that I don't have to tune mysql, which seemed to be the culprit for a moment, until I thought to look at a graph of memory usage. Clearly some pathological processes are at work when all the lines spike upward at once, even that green line (swap), which should never go above zero. I'm now suspecting a bug in logwatch, which seems to be spawning a dozen perl processes that consume all the CPU immediately and all the memory over time. I happened to glance at top at just the right moment to see all the processes turn into perl right after logwatch popped in and out. We'll see if there really is a correlation. The perl processes are finishing up and releasing some memory, but not before eating up some swap that is not getting released. Load average is coming down from 5, but some script is using cat to consume 15% of CPU now...
Perl hit 100% CPU usage and I decided I had to get serious. pstree -ap told me that the naughty process was started by logwatch, which was started by cron, and that it was trying to grok my mail logs, I guess. But I don't need to have my computer read through my logs that badly. I chmodded the logwatch entry in cron.daily to 000 and tried to kill the currently running mess of perl jobs. I made a mess of it and ended up rebooting, but I don't feel too bad, because I wanted that swap back anyway.
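The chmod trick works because cron.daily is run through run-parts, which skips anything without an execute bit. Here is a small sketch of that idea, tried on a scratch file instead of the real /etc/cron.daily/logwatch. The two function names are made up for illustration.

```shell
# Clearing the execute bits (or chmod 000, as I did) parks a cron.daily job,
# because run-parts only runs executable files.
disable_cron_job() {
    chmod a-x "$1"
}

# Succeeds only if the file is still executable.
cron_job_enabled() {
    [ -x "$1" ]
}

# Scratch file standing in for /etc/cron.daily/logwatch.
job="$(mktemp)"
chmod 755 "$job"
disable_cron_job "$job"
if cron_job_enabled "$job"; then
    echo "still enabled"
else
    echo "disabled"
fi
rm -f "$job"

# For the already-running mess: pstree -ap shows the parent PID, and
# pkill -P <that PID> signals only the children of that parent.
```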
X failed to start
- I managed to make it work by switching to the nv driver instead of the nvidia driver. Probably need to recompile that driver for the current kernel. Sigh. I think I remember that dance. Maybe I'll stick with the slower and less fragile open source driver since there is no compelling reason to do otherwise.
saslauthd still not happy
- At some point postfix forgot how to talk to saslauthd. I can verify that the daemon is running, and I can test it with "testsaslauthd -u user -p password", but when I watch "tail -f /var/log/mail.info" I see that postfix is clearly complaining about not being able to connect to saslauthd: no such file or directory. Doesn't it know it is looking for a daemon? How do you tell postfix HOW to contact saslauthd? I thought I always just told it something like "sasl = yes" and that was that.
I had to rebuild sasldb. The new sasldblistusers2 command said it couldn't read the old sasldb that I had been using. Now I'm not sure what mechanism name to use to indicate saslauthd. I have two places to specify this: /etc/postfix/sasl/smtpd.conf and /etc/default/saslauthd. Both files need a list of available mechanisms. So far I've tried keeping them in sync... heaven help me if they need different values.
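A likely reading of "no such file or directory" is that postfix is looking for the saslauthd socket (a file named mux) in a directory where it doesn't exist. If the postfix smtpd is running chrooted, which I believe is the Debian/Ubuntu default, it would look under /var/spool/postfix rather than in the real /var/run. This sketch just reports which of the two places has a socket. The function name and the optional root prefix argument are made up for illustration.

```shell
# Report where (if anywhere) the saslauthd socket "mux" can be found.
# $1 is an optional root prefix, only there so the check can be tried
# against a scratch tree; find_saslauthd_socket is a hypothetical name.
find_saslauthd_socket() {
    root="${1:-}"
    for dir in /var/run/saslauthd /var/spool/postfix/var/run/saslauthd; do
        if [ -S "$root$dir/mux" ]; then
            echo "$dir/mux"
            return 0
        fi
    done
    echo "no saslauthd socket found"
    return 1
}

find_saslauthd_socket || true
```

If the socket only shows up under /var/run/saslauthd while smtpd is chrooted, that would explain why testsaslauthd succeeds and postfix does not.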
1) sasldb: allows saslauthd to start, postfix can talk to it, but says unknown password verifier
2) saslauthd: prevents saslauthd from starting
3) getpwent: allows saslauthd to start, cannot connect to saslauthd
OK, maybe I need different values in those two files.
/etc/default/saslauthd | /etc/postfix/sasl/smtpd.conf | result
sasldb | sasldb | unknown password verifier
sasldb | pam | unknown password verifier
pam | pam | unknown password verifier
I finally found my Linux Cookbook and it says to use "pam" as the mechanism. But don't I then need some pam plugin somewhere? I've only used pam enough to fear it. Maybe I ignored that part when I was using the Linux Cookbook last time...
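One combination that isn't in my list of attempts: as far as I understand Cyrus SASL, the two files answer different questions. /etc/default/saslauthd picks the backend that saslauthd checks passwords against (pam, sasldb, getpwent and so on), while smtpd.conf tells the SASL library inside postfix which verifier to use, and "saslauthd" is the value that points it at the daemon. "unknown password verifier" would then mean smtpd.conf was given a value the library doesn't recognize. The sketch below writes candidate versions of both files into a scratch directory for review. It is untested on this machine, the OUT_DIR variable and output file names are made up, and the mech_list line is an assumption. With the pam backend, I believe PAM would consult /etc/pam.d/smtp, falling back to /etc/pam.d/other.

```shell
# Scratch directory for the candidate files; OUT_DIR is a made-up override.
out="${OUT_DIR:-$(mktemp -d)}"

# Candidate /etc/default/saslauthd: the backend saslauthd checks passwords against.
cat > "$out/saslauthd.default" <<'EOF'
START=yes
MECHANISMS="pam"
EOF

# Candidate /etc/postfix/sasl/smtpd.conf: tells the SASL library inside
# postfix to hand passwords to the saslauthd daemon.
cat > "$out/smtpd.conf" <<'EOF'
pwcheck_method: saslauthd
mech_list: PLAIN LOGIN
EOF

echo "candidate files written to $out"
```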
cups no longer working
This was originally a problem caused by a package that was corrupted during the download. It took a few clearings of the cache and a few de-install/re-install cycles before the package error was eliminated. But now I have to get the printer to work like it used to. The cups web admin page got it to work locally, but there is still a glitch from the point of view of the samba clients. I either get the error "Printing failed when completing the page" or a silent failure.
Fix: The trick was to delete and re-install the printers on each client.