Monday, July 23, 2007

Still... haven't started my engine..

For the past 2 weeks, I've done nothing but train myself to get familiar with the new surroundings, learn the new tasks that will land me in deep shit later, and find the right food to eat when it's time to eat.

I would say I enjoyed every bit of that time, and I'm saving it up for the future.

Well, today I started to get bored. So I picked up some problems and put them in my basket.

I'm not much of a writer, so if you find that my writing sucks, put your socks in your mouth.

One interesting problem that I found is on a system running UnitedLinux on an IBM BladeCenter. Apparently, some weird errors were being generated, and they were annoying us engineers.

It all started when one lucky engineer stumbled upon syslog messages that look like this:
Jul 13 14:26:34 abc123 kernel: sdb : READ CAPACITY failed.
Jul 13 14:26:34 abc123 kernel: sdb : status = 1, message = 00, host = 0, driver = 08
Jul 13 14:26:34 abc123 kernel: Current sd00:00: sense key Not Ready
Jul 13 14:26:34 abc123 kernel: Additional sense indicates Medium not present
Jul 13 14:26:34 abc123 kernel: sdb : block size assumed to be 512 bytes, disk size 1GB.
Jul 13 14:26:34 abc123 kernel: sdb: Write Protect is on
Jul 13 14:26:34 abc123 kernel: sdb: I/O error: dev 08:10, sector 0
Jul 13 14:26:34 abc123 kernel: I/O error: dev 08:10, sector 0
Jul 13 14:26:34 abc123 kernel: I/O error: dev 08:10, sector 2097144
Jul 13 14:26:34 abc123 kernel: I/O error: dev 08:10, sector 2097144
Jul 13 14:26:34 abc123 kernel: I/O error: dev 08:10, sector 0
Jul 13 14:26:34 abc123 kernel: I/O error: dev 08:10, sector 0
Jul 13 14:26:34 abc123 kernel: unable to read partition table
Jul 13 14:26:35 abc123 kernel: I/O error: dev 08:10, sector 0
Jul 13 14:26:35 abc123 kernel: sdb : READ CAPACITY failed.
Jul 13 14:26:35 abc123 kernel: sdb : status = 1, message = 00, host = 0, driver = 08
Jul 13 14:26:35 abc123 kernel: Current sd00:00: sense key Not Ready
Jul 13 14:26:35 abc123 kernel: Additional sense indicates Medium not present
Jul 13 14:26:35 abc123 kernel: sdb : block size assumed to be 512 bytes, disk size 1GB.
Jul 13 14:26:35 abc123 kernel: sdb: Write Protect is on


At first I thought the hard disk had already gone down the toilet, but is there any hard disk that can enable Write Protect? When I checked the hardware database, /dev/sdb was *NOT* a hard disk. So I dug deeper and found out that /dev/sdb is a device from TEAC with model no. FD-05PUB. What the f..? *FD*? FD == Floppy Disk? The right place to ask is Google, so I Googled it and found one interesting PDF document.
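
By the way, the "dev 08:10" in those I/O errors already points at sdb if you know how to read it. Here is a minimal Python sketch of the decoding, assuming the kernel prints the major and minor numbers in hex and that major 8 is the SCSI disk driver with 16 minors per disk (which matches both logs in this post: 08:10 for sdb here, 08:20 for sdc in the PDF). The function name decode_dev is my own invention, not something from any tool.

```python
import re

def decode_dev(dev: str) -> str:
    """Turn a 'MM:mm' hex pair from a kernel I/O error into a SCSI disk name.

    Assumption: major 8 is the sd driver, and each disk owns 16 minors
    (minor 0 is the whole disk, 1-15 are its partitions).
    """
    major_hex, minor_hex = dev.split(":")
    major, minor = int(major_hex, 16), int(minor_hex, 16)
    if major != 8:
        raise ValueError(f"major {major} is not handled by this sketch")
    disk, part = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name if part == 0 else f"{name}{part}"

line = "Jul 13 14:26:34 abc123 kernel: I/O error: dev 08:10, sector 2097144"
match = re.search(r"dev ([0-9a-f]{2}:[0-9a-f]{2}), sector (\d+)", line)
dev, sector = match.group(1), int(match.group(2))

print(decode_dev(dev))   # sdb
print(sector * 512)      # byte offset of the failed read, using 512-byte sectors
```

The byte offset comes out just under 1 GiB, which fits the "disk size 1GB" that the kernel assumed after READ CAPACITY failed: it was trying to read near the end of a disk that isn't there.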

In that document, section 4.5, "Issues related to Red Hat AS 2.1/3.0 and Blades", states that:
[quote]
When a blade boots and the media tray is assigned to during boot up, the floppy drive in the media tray is assigned as (in this example) /dev/sdc. However, since there is no diskette in the drive, we see the following errors (beginning at READ CAPACITY FAILED):
Nov 5 09:44:30 spencerooni kernel: hub.c: new USB device 00:0f.2-1.1, assigned address 3
Nov 5 09:44:30 spencerooni kernel: usb.c: USB device 3 (vend/prod 0x644/0x0) is not claimed by any active driver.
Nov 5 09:44:30 spencerooni kernel: hub.c: new USB device 00:0f.2-1.3, assigned address 4
Nov 5 09:44:30 spencerooni kernel: usb.c: USB device 4 (vend/prod 0x5ab/0x30) is not claimed by any active driver.
Nov 5 09:44:30 spencerooni kernel: Initializing USB Mass Storage driver...
Nov 5 09:44:30 spencerooni kernel: usb.c: registered new driver usb-storage
Nov 5 09:44:30 spencerooni kernel: scsi1 : SCSI emulation for USB Mass Storage devices
Nov 5 09:44:30 spencerooni kernel: Vendor: TEAC Model: FD-05PUB Rev: 2000
Nov 5 09:44:30 spencerooni kernel: Type: Direct-Access ANSI SCSI revision: 02
Nov 5 09:44:30 spencerooni kernel: Attached scsi removable disk sdc at scsi1, channel 0, id 0, lun 0
Nov 5 09:44:30 spencerooni kernel: sdc : READ CAPACITY failed.
Nov 5 09:44:30 spencerooni kernel: sdc : status = 1, message = 00, host = 0, driver = 08
Nov 5 09:44:30 spencerooni kernel: Current sd00:00: sense key Not Ready
Nov 5 09:44:30 spencerooni kernel: Additional sense indicates Medium not present
Nov 5 09:44:30 spencerooni kernel: sdc : block size assumed to be 512 bytes, disk size 1GB.
Nov 5 09:44:30 spencerooni kernel: sdc: I/O error: dev 08:20, sector 0
Nov 5 09:44:30 spencerooni kernel: I/O error: dev 08:20, sector 0
Nov 5 09:44:30 spencerooni kernel: unable to read partition table


Now, that wouldn't be the end of the world, necessarily, However, ANYTIME the KVM is switched on the Bladecenter to/from that blade, the USB "stuff" (KVM and media tray) get re-hotplugged, for lack of a better term, and the sdc errors are generated. They occur even after switch the media tray owner to a different blade. The result is too many spurious errors which is making the internal support team cranky.

Modification 1: Do not boot blade with media tray assigned in order to avoid errors when someone takes the media tray away AND when the floppy drive is empty. NOTE: "alias floppy off" should be added to modules.conf to avoid floppy drive errors when media tray not assigned on boot.

One final note on this item: currently the media tray is required for installing Linux on the blades, but should only be needed in rare circumstances after that. Additionally, since there appear to be several combinations of events that will cause the EFI shell boot option to be restored in the boot sequence on a blade (see item 4), it is recommended that boot sequence be confirmed when an HS40 is rebooted until the firmware fix is released. This can be done by simply confirming that the HS40 boots from the HBA during a restart. So these items should be included as required steps for building a Linux blade.

Modification 2: Leave read-only floppy diskette in the drive. This prevents the "Medium not present" errors from occurring. This is not a complete solution but is a good start. See #3 for more.

Modification 3: When media tray is assigned to a blade, it will automatically hotplug itself and the cdrom and floppy drive will be available for use. When use [is finished, the] media tray should be unassigned from the blade and/or the usb-storage module should be removed. The command for this is "rmmod usb-storage".

This step should also be performed when the IBM Remote Disk is used. The IBM Remote Disk is the feature where an image, or even the local floppy or CDROM drive can be mounted on the blade from the local machine.

I have not seen anything to indicate that a Red Hat kernel fix can be completed to prevent these error messages from being generated.
[/quote]


Pfft.

-- Updates on 27 July 2007
Modification 1 didn't work. I didn't have the nerve to do "rmmod usb-storage", but it will be my last resort. Currently opting to disable the drive in the BIOS. Heh.

-- Updates on 30 July 2007
Gave advice to just ignore the error messages since they didn't affect system performance. Kapisch?
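
If "ignore it" has to be done by a script instead of by eyeball, something like the Python sketch below could strip the floppy noise before anyone reads the log. This is only an illustration under my own assumptions: the device is sdb with dev number 08:10 as in the log above, the patterns are copied from those exact messages, and the names NOISE and drop_floppy_noise are made up. Note that "unable to read partition table" and the sense key lines are generic, so this filter could also hide a real problem on another disk.

```python
import re

# Patterns copied from the messages in the log above (assumes the floppy is sdb).
NOISE = re.compile(
    r"kernel: (?:"
    r"sdb ?: "                                   # "sdb : READ CAPACITY failed." etc.
    r"|(?:sdb: )?I/O error: dev 08:10,"          # I/O errors on the empty drive
    r"|Current sd00:00: sense key Not Ready"
    r"|Additional sense indicates Medium not present"
    r"|unable to read partition table"
    r")"
)

def drop_floppy_noise(lines):
    """Return only the log lines that do not match the floppy noise patterns."""
    return [line for line in lines if not NOISE.search(line)]

sample = [
    "Jul 13 14:26:34 abc123 kernel: sdb : READ CAPACITY failed.",
    "Jul 13 14:26:34 abc123 kernel: sdb: Write Protect is on",
    "Jul 13 14:26:34 abc123 kernel: I/O error: dev 08:10, sector 0",
    "Jul 13 14:26:34 abc123 kernel: eth0: link up",
]

for line in drop_floppy_noise(sample):
    print(line)
```

The eth0 line in the sample is invented just to show that something unrelated survives the filter.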

2 comments:

yoe said...

Over there, you get to pick and choose your problems? Hmm.. sounds like fun.

kapla.hodot said...

I haven't started real operations yet.. so for now I just get to pick and choose, hehe..