Sunday, February 12, 2006

Poof! Well almost ...

About 3 months ago I went through a server shuffle because of power problems. Well, last night the power problems reared their ugly head again! This time it wasn't so bad. I learned many important lessons from the last debacle. Unfortunately I was still in the middle of getting the stuff to keep the problems at bay. What I had done was to take advantage of the dnsmasq that is on my Linksys. In case of emergencies it's my secondary DNS (I have the local machines statically defined in a file).

So last night after we took a power hit (lights on/off/on/off/kinda on/kinda off/on/off) my server's grub stage2 was corrupted (that's twice now). I decided to leave it until this morning, when I pulled out the rescue disk to try to repair the damage. Well, the rescue disk has nothing to repair grub! So I went to my machine 2 and whipped up a grub boot CD (I had to load a bunch of stuff including k3b). I tried to create a grub boot floppy but the floppy drives are messed up (ARGH!). I guess I really don't use them much anymore.

If you're starting to see a pattern here, so am I. I'm not really prepared for a disaster. One thing after another fails. Each time a disaster occurs I get better at it but I'm still not there.

The last part I'm waiting on is the new batteries for my UPS. I have a very old UPS that takes two 33 Ah batteries. It'll probably cost me $100 (US) for the batteries and about $50 (US) for delivery (I'm still trying to find them locally). They're on next month's budget (really I've been working on that since January).

I've also planned to add a micro-controller board to monitor the health of the batteries and the AC. It will be either the ZX-24, the BX-24 or one of the small controller boards I have (and I have quite a few to choose from). This will allow me to figure out a bunch of stuff, including the quality of the AC we have. I have a suspicion that the power company isn't meeting its requirements. I'll be measuring AC voltage (monitor 100 - 130V), DC voltage (10 - 15V?), AC frequency, AC present, inside/outside temperatures and fan speed. So I'll need a clock chip, some analog ports and some digital ports. I have a bunch of boards with serial ports and a couple with Ethernet ports (this may be a job for the Rabbit or the EZ-80, hmm).

Notes: Here are some of my notes from this morning on recovering from grub boot problems:

Grub boot floppy

Under Linux:

  • cat stage1 stage2 > boot
  • cat boot >/dev/fd0

Under DOS:

  • copy /b stage1 + stage2 boot
  • rawrite boot a:

Grub boot CD

Under Linux:

  • mkdir -p iso/boot/grub
  • cp /usr/share/grub/i386-redhat/stage2_eltorito iso/boot/grub
  • mkisofs -R -b boot/grub/stage2_eltorito -no-emul-boot -boot-load-size 4 -boot-info-table -o grub.iso iso
  • cdrecord -v dev=1,0,0 grub.iso

The locations of your stage files may differ but you get the gist of the commands. I wasn't able to get cdrecord to actually work so I used k3b (very nice burner!). Sorry, I don't know what software to use to burn the image under Windows.
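Because the stage files move around between distributions, a small sketch like this can hunt for stage2_eltorito and lay out the ISO tree before mkisofs is run. The helper names (find_stage2_eltorito, stage_grub_iso_tree) are made up for this example, and the directories you search are your own guess; /usr/share/grub is simply where mine lives.

```shell
#!/bin/sh
# Sketch only: locate stage2_eltorito and stage the iso/boot/grub tree.
# Both function names are hypothetical helpers, not part of grub.

find_stage2_eltorito() {
    # Print the first stage2_eltorito found under the given directories.
    find "$@" -type f -name stage2_eltorito 2>/dev/null | head -n 1
}

stage_grub_iso_tree() {
    tree=$1          # e.g. iso
    shift            # the remaining arguments are directories to search

    src=$(find_stage2_eltorito "$@")
    if [ -z "$src" ]; then
        echo "stage2_eltorito not found under: $*" >&2
        return 1
    fi

    # Same steps as the mkdir -p and cp commands in the list above.
    mkdir -p "$tree/boot/grub" || return 1
    cp "$src" "$tree/boot/grub/stage2_eltorito"
}

# Example use, followed by the same mkisofs call as in the notes:
#   stage_grub_iso_tree iso /usr/share/grub &&
#   mkisofs -R -b boot/grub/stage2_eltorito -no-emul-boot \
#       -boot-load-size 4 -boot-info-table -o grub.iso iso
```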

Boot from the grub prompt

I really like grub because it has saved my butt a number of times when something has gone wrong in the boot process, such as hosed up grub.conf parameters, messed up boot sectors, etc.

At the Grub prompt:

  • root (hd0,0)
  • kernel /vmlinuz-2.6.14-prep ro root=LABEL=/
  • initrd /initrd-2.6.14-prep.img

The names of the kernel and initrd may differ, and the hd0,0 represents my /dev/hda1 (that's where I put the boot directory). My partitions are /boot (hda1), / (hda2) and swap (hda3). Yours may differ.
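The part that trips me up at the grub prompt is that grub counts from zero while Linux counts partitions from one. Here is a minimal sketch of the translation for old-style IDE names. It assumes the BIOS sees the drives in the same order as Linux (hda first, hdb second), which is not guaranteed, and grub_device is a name I made up for the example.

```shell
#!/bin/sh
# Sketch only: translate /dev/hdXN into GRUB legacy (hdD,P) notation.
# Assumes BIOS drive order matches Linux order (hda = hd0, hdb = hd1, ...).
# grub_device is a hypothetical helper name.

grub_device() {
    dev=${1#/dev/}                 # /dev/hda1 -> hda1
    case $dev in
        hd[a-z][0-9]*) ;;
        *) echo "unsupported device: $1" >&2; return 1 ;;
    esac

    rest=${dev#hd}                 # hda1 -> a1
    part=${rest#?}                 # a1   -> 1
    letter=${rest%"$part"}         # a1   -> a

    # The partition must be a plain positive number with no leading zero.
    case $part in
        ''|0*|*[!0-9]*) echo "bad partition in: $1" >&2; return 1 ;;
    esac

    # Drive index = position of the letter in the alphabet (a -> 0, b -> 1).
    letters=abcdefghijklmnopqrstuvwxyz
    before=${letters%%"$letter"*}

    # GRUB partitions start at 0, Linux partitions start at 1.
    echo "(hd${#before},$((part - 1)))"
}

# Example use:
#   grub_device /dev/hda1    # prints (hd0,0)
#   grub_device /dev/hdb3    # prints (hd1,2)
```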

And the ever helpful GRUB Tutorial

In case you're wondering why I posted this, it's because I just wanted the information kept somewhere other than on my machine or in my home. Yes I wrote this down but I like to use the internet as a backup. Just in case.

So my next project is to take one of my other computers and move Mr. House onto it. I think I'll use a flash card and no swap. I'll have to figure out what to do with the data it creates, since I don't want it to write to flash. Meanwhile the batteries are on their way. Hopefully I'll figure out how to lower the total energy costs for my home in the process, as I hear NJ is about to see a huge jump in energy costs. The energy companies are paying 50% more for their fuel and that snowball is coming this way. Yikes!


At 2/17/2006 9:03 AM, Blogger steve said...

I like Recovery Is Possible as a rescue image - it comes as a 30 meg ISO, but there's an option to dump it into a USB key, too. SuperRescue is good for rescuing really screwed up systems (especially ones with software raid) but it's getting a bit long in the tooth now.

At 2/17/2006 10:19 AM, Blogger Neil Cherry said...

Thanks, I'll check that out. I like the grub iso that RIP has, it gives me the first line of repair, being able to boot to the original drive to see how damaged it is.

At 2/20/2006 11:42 AM, Blogger steve said...

Coincidentally, I got to use RIP this morning to copy a remote server to a local server. I used "sfdisk -d /dev/hda > /floppy/hda.out" on the remote server to get a dump of the partition table, then did "sfdisk /dev/hda < /floppy/hda.out" on the client. Once I had the partitions created (and formatted), I mounted them on /mnt (/mnt/usr, /mnt/var, etc.) and used the following commands:

( ssh -C root@remoteserver 'cd / && tar -cf - `ls ./ | grep -v proc | grep -v sys`' ) | ( cd /mnt && tar -xvf - ) ; mkdir /mnt/proc /mnt/sys ; grub-install /dev/hda

45 minutes later (for various reasons, this was using a Cisco switch locked at 10 meg half duplex) I had the new server ready to go, and a very relieved client :-)

The only real trickery is using the backticks for ls in the remote tar - /proc and /sys aren't real filesystems so must be excluded. Compression probably wouldn't help on anything faster than 100 meg, too.
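For what it's worth, the same exclusion can probably be done with tar's own --exclude option instead of the backticks. This is a sketch that copies between two local directories so it can be tried safely. copy_root_tree is a made-up name, and the --exclude=./proc form assumes a tar (such as GNU tar) that matches it against member names starting with "./".

```shell
#!/bin/sh
# Sketch only: copy a root tree through a tar pipe, skipping /proc and /sys.
# copy_root_tree is a hypothetical helper name.
# Assumes a tar that supports --exclude (GNU tar does).

copy_root_tree() {
    src=$1
    dst=$2

    # Stream the tree through a pipe, leaving out the top-level proc and sys.
    ( cd "$src" && tar -cf - --exclude=./proc --exclude=./sys . ) |
        ( cd "$dst" && tar -xf - ) || return 1

    # Recreate the empty mount points, as the mkdir in the one-liner does.
    mkdir -p "$dst/proc" "$dst/sys"
}

# Example use over ssh, with the sending side swapped for the remote tar:
#   ssh -C root@remoteserver \
#       'cd / && tar -cf - --exclude=./proc --exclude=./sys .' |
#       ( cd /mnt && tar -xvf - )
```

Unlike grep -v, this only skips the top-level proc and sys, so a top-level name that merely contains "proc" or "sys" still gets copied.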

At 3/24/2006 11:17 AM, Blogger Neil Cherry said...

Just an update on this subject. It looks like the troublemaker may have been 3 bad Western Digital drives (all 40G). They all behave the same way: they run for a while and then the system locks up. I know this because I've upgraded from FC2/3 (Fedora Core 2/3, a quasi-mess) to FC4 (where the lockups began) to FC5. When I put in the 2G drive with Gentoo, the system runs stable. Also, the problem follows the drives. Right now I'm backing everything up and moving the data over to FC5. So far I'm not liking FC5 as much as I did FC4. I am really shocked at how little space Gentoo took up (it ran on a 2G disk with a gig to spare, running jabber, asterisk and Misterhouse). FC5 loaded at 15G and I'm still fooling with installing packages (I think everything is loaded but I still have to install with a software installer, really annoying!).

