Upgrade to Dapper

Revision as of 06:16, 3 June 2006

I recently upgraded Ubuntu Breezy to Ubuntu Dapper via an apt-get dist-upgrade. It was my first time using this method--all my other upgrades have been clean installs from freshly-burned ISOs. The dry run looked like it would probably work, so I just fired it up. It ended up being a lot faster than downloading the ISOs and installing from them. But there were problems.
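For reference, the whole thing boils down to a handful of commands. This is only a rough sketch: editing sources.list with sed is just one way to switch the repositories (an assumption here, not necessarily what I did), and -s is apt-get's simulate flag for the dry run.

 # point the package sources at dapper instead of breezy, then refresh
 sudo sed -i 's/breezy/dapper/g' /etc/apt/sources.list
 sudo apt-get update
 
 # dry run first to see what would happen, then the real thing
 sudo apt-get -s dist-upgrade
 sudo apt-get dist-upgrade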

Many of the problems I had could have been prevented by being more conservative about the new configurations I was presented with. When there were conflicting changes to config files, I could review the diffs and decide whether to keep my current custom version or take the new developers' version. I only kept my custom version a few times, when I could see changes that I remembered making.

==The local machines couldn't see through the firewall anymore==

After puzzling over which machines could see which, and whether DNS or DHCP was failing, I had an inspiration. I checked

 cat /proc/sys/net/ipv4/ip_forward

and found that it was set to 0. I echoed a 1 in there and that fixed it with no more hassle. But wait, there's more... of course this needs to go into a script somewhere that runs at each boot. /etc/rc.local, is it? I hope so.
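A minimal sketch of the fix and the persistence step, assuming either /etc/rc.local or /etc/sysctl.conf is an acceptable place for the boot-time setting:

 # check the current setting (0 = forwarding off, 1 = on)
 cat /proc/sys/net/ipv4/ip_forward
 
 # turn forwarding back on right now (as root)
 echo 1 > /proc/sys/net/ipv4/ip_forward
 
 # to make it stick across reboots, either add the echo line to /etc/rc.local
 # (above its closing "exit 0") or put this line in /etc/sysctl.conf:
 #   net.ipv4.ip_forward=1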

==MediaWiki started acting odd==

It seemed to be torn between the packaged version, 1.4.14, and the version it had been running before the "upgrade", which was 1.5.3. I didn't find an immediate fix, so I upgraded to 1.6.6. That still didn't work: it was slow and gave an SQL error after any change to a page. I dug enough to find that the problem was a crashed table (mw_searchindex), and no amount of mysqlcheck or myisamchk repairing could make it better. Well, they said they fixed it, but an immediate check of the table showed it was crashed again. There was a simple fix in the mediawiki/maintenance directory: I just had to run rebuildtextindex.php. That dropped the old table and rebuilt a good version. It is still dog slow, but appears to be free of corruption.
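Something like the following; the database name (wikidb), the MySQL data directory, and the wiki's install path are assumptions that will differ on other machines:

 # repair attempts that claimed success but didn't stick
 mysqlcheck --repair -u root -p wikidb mw_searchindex
 myisamchk -r /var/lib/mysql/wikidb/mw_searchindex.MYI   # run with mysql stopped
 
 # the fix that held: rebuild the search index from the page text
 cd /var/www/mediawiki/maintenance
 php rebuildtextindex.php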

==Mail stopped flowing==

This was a saslauthd problem. I didn't notice when the upgrade slipped in a change to the location of saslauthd. I like it to be in /var/run instead of in the postfix chroot jail. I had to put my edits back in:
 /etc/default/saslauthd
 /etc/init.d/saslauthd

and then restart the various pieces.
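A sketch of the recovery, assuming the stock Debian-style init scripts; the actual edits are the ones described above (saslauthd back in /var/run instead of the postfix chroot):

 # put the local customizations back
 sudo vi /etc/default/saslauthd /etc/init.d/saslauthd
 
 # then bounce the pieces in order
 sudo /etc/init.d/saslauthd restart
 sudo /etc/init.d/postfix restart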

==Mail still wasn't flowing==

Did I forget to start the virus checker? No, but I didn't look too carefully when it said it refused to start because an old config file was in the way. In this case, I had wisely decided that I had important customizations I didn't want to lose, and the amavis daemon said, in effect: fine, keep your precious custom file, but I refuse to start until you delete it. It wasn't enough to rename it with .old on the end; it could still see it. So I put .old on the front too. Ha! Stupid amavis, can't see it now, haha.
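The workaround amounts to a couple of renames. The path and file name below are hypothetical (whatever old config amavis was complaining about), and the init script name may differ between packages:

 cd /etc/amavis
 # a trailing suffix was not enough; the daemon still found the file
 sudo mv amavisd.conf amavisd.conf.old           # file name is hypothetical
 # a leading prefix as well finally hid it
 sudo mv amavisd.conf.old old.amavisd.conf.old
 sudo /etc/init.d/amavis restart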

==After about 8 hours of running, memory was exhausted, including swap, and it started to thrash to death==

I haven't figured this out yet, but I'm somewhat relieved that I don't have to tune mysql, which seemed to be the culprit for a moment, until I thought to look at a graph of memory usage. [[image:yucky-memory.png|right]] I'm now suspecting a bug in logwatch, which seems to be spawning a dozen perl processes that consume all CPU immediately and all memory over time. I happened to look up at top at just the right moment to see all the processes turn into perl right after logwatch popped in and out. We'll see if there really is a correlation. The perl processes are finishing up and releasing some memory, but not before eating up some swap that is not getting released. Load average is coming down from 5, but some script is using cat to consume 15% of CPU now...

Perl hit 100% CPU usage and I decided I had to get serious. pstree -ap told me that the naughty process was started by logwatch, which was started by cron, and that it was trying to grok my mail logs, I guess. But I don't need my computer to read through my logs that badly. I chmodded the logwatch cron entry in cron.daily to 000 and tried to kill the currently running mess of perl jobs. I made a mess of it and ended up rebooting, but I don't feel too bad, because I wanted that swap back anyway.
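Roughly the sequence, for anyone hitting the same thing; the name of the cron.daily entry (shown here as 00logwatch) is an assumption and may vary:

 # trace the runaway perl processes back to their parent
 pstree -ap | grep -B2 perl
 
 # disable the daily logwatch run (crude, but it stops the bleeding)
 sudo chmod 000 /etc/cron.daily/00logwatch
 
 # try to clean up whatever is still running (heavy-handed; a reboot also works)
 sudo pkill -f logwatch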

==X failed to start==

I managed to make it work by switching to the nv driver instead of the nvidia driver. I probably need to recompile the nvidia driver for the current kernel. Sigh. I think I remember that dance. Maybe I'll stick with the slower and less fragile open source driver, since there is no compelling reason to do otherwise.
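The switch itself is one word in the X config; something like this, assuming the stock Dapper paths and gdm as the display manager:

 # back up the config, change the Driver line from "nvidia" to "nv",
 # and restart the display manager
 sudo cp /etc/X11/xorg.conf /etc/X11/xorg.conf.pre-nv
 sudo sed -i 's/"nvidia"/"nv"/' /etc/X11/xorg.conf
 sudo /etc/init.d/gdm restart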