[Testbed-admins] Generic Boot Image Weirdness



Summary:

I'm unable to boot generic FreeBSD 6.2 on most (but not all!) of my testbed machines. The /usr partition is reported as corrupt at boot time:

/dev/ad0s1f: /dev/ad0s1f: BAD SUPERBLOCK VALUE: VALUES IN SUPERBLOCK DISAGREE WITH THOSE IN FIRST ALTERNATE

More Details:

I customized the generic FreeBSD 6.2 and Fedora Core 6 images as described in the install instructions, producing the default combined image file FBSD62+FC6-GENERIC.ndz.

When I nfree my nodes out of hwdown and boot them, they come up in frisbee, download and install the combined generic image onto their hard drives, reboot, and go to the prompt where they wait for further boot instructions. So far so good.

The image itself is apparently good, because on ONE of my machines I can enter Ctrl-C to go into interactive mode and boot "part:1" (the FreeBSD 6.2 image) successfully.

Unfortunately, on the other 8 machines I have tried so far, this same procedure leads to the following:
-----
/dev/ad0s1a: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ad0s1a: clean, 25786 free (1138 frags, 3081 blocks, 1.8% fragmentation)
/dev/ad0s1f: /dev/ad0s1f: BAD SUPERBLOCK VALUE: VALUES IN SUPERBLOCK DISAGREE WITH THOSE IN FIRST ALTERNATE
/dev/ad0s1f: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
/dev/ad0s1f: CANNOT WRITE BLK: 12000
/dev/ad0s1f: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
/dev/ad0s1e: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ad0s1e: clean, 120395 free (251 frags, 15018 blocks, 0.2% fragmentation)
THE FOLLOWING FILE SYSTEM HAD AN UNEXPECTED INCONSISTENCY:
	ufs: /dev/ad0s1f (/usr)
Automatic file system check failed; help!
Jun 24 16:26:18 init: /bin/sh on /etc/rc terminated abnormally, going to single user mode
Enter full pathname of shell or RETURN for /bin/sh:
-----
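
Since I have to compare this output across nine machines, here is a rough sketch of how the console text could be boiled down to a per-partition status. It is only an illustration: the function name summarize_fsck and the matching rules are my own assumptions based on the messages shown above, not part of any testbed or FreeBSD tool.

```python
import re

# Matches lines such as "/dev/ad0s1f: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY."
LINE_RE = re.compile(r"^(/dev/\w+): (.*)$")

# Message fragments treated as failures (assumption: taken from the log above).
FAILURE_MARKERS = ("BAD SUPERBLOCK", "UNEXPECTED INCONSISTENCY", "CANNOT WRITE")

SAMPLE = """\
/dev/ad0s1a: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ad0s1f: /dev/ad0s1f: BAD SUPERBLOCK VALUE: VALUES IN SUPERBLOCK DISAGREE WITH THOSE IN FIRST ALTERNATE
/dev/ad0s1f: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
/dev/ad0s1e: FILE SYSTEM CLEAN; SKIPPING CHECKS
"""

def summarize_fsck(log_text):
    """Return {device: "clean" or "failed"} for each device named in the log."""
    status = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a per-device fsck line
        device, message = match.groups()
        if any(marker in message for marker in FAILURE_MARKERS):
            status[device] = "failed"
        else:
            # Never downgrade a device that already reported a failure.
            status.setdefault(device, "clean")
    return status

print(summarize_fsck(SAMPLE))
```

Running the same function over the captured console output of each node would make it easy to see whether the failure is always on ad0s1f or varies from drive to drive.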

So apparently the image is somehow getting corrupted on almost all my machines, either during the frisbee download or during the later "part:1" boot?

The machines are all identical (I think a few may have a slightly different BIOS revision). I disassembled two machines (the one that works and one that doesn't) and compared the hard drives. Identical part numbers and firmware revs. I swapped the drives between the machines and the problem followed the drive, not the chassis.

Strangest of all, I used nalloc/nfree to force a re-install of the image by frisbee on these two machines, and the problem STILL followed the hard drive!

WTF? Is it possible that there is some residue on the drives that's causing me trouble? Do I need to wipe the drives before using them?

Any suggestions? I'm at a loss...again.