Solaris 08/07 – fmd broken on T2000

I recently installed Solaris 08/07 on two T2000 machines and was extremely surprised to find a serious bug with the fmd (Fault Management Daemon) service.

The service would, seemingly at random, fail to start on boot. It didn’t actually fail, though; it just never finished starting. This had numerous side effects: prtdiag, fmdump, and other fault/diagnostic utilities would not work properly, and it also seemed to cause problems moving between init levels.

You may have been bitten by this bug if you see some of the following:

bash-3.00# fmadm faulty
fmadm: failed to connect to fmd: RPC: Program not registered
bash-3.00# prtdiag -v
picl_initialize failed: Daemon not responding
bash-3.00# svcs -xv
svc:/system/fmd:default (Solaris Fault Manager)
State: offline since Mon Oct 08 15:35:25 2007
Reason: Start method is running.
See: http://sun.com/msg/SMF-8000-C4
See: man -M /usr/share/man -s 1M fmd
See: /var/svc/log/system-fmd:default.log
Impact: This service is not running.

This last output from svcs -xv can be normal, as long as it doesn’t stay the same indefinitely. The “Start method is running” state should finish and the service should go online. If it stays in this state forever, you have hit the bug.
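One way to tell a slow start from a stuck one is to look at the svcs -xv report more than once. Here is a minimal sketch, assuming the report looks like the output above. The name fmd_start_pending is made up for this example (it is not a Solaris tool), and on Solaris 10 you may need nawk or /usr/xpg4/bin/awk in place of awk.

```shell
# Hypothetical helper: reads `svcs -xv` text on stdin and succeeds (exit 0)
# only if the fmd entry is offline with its start method still running.
fmd_start_pending() {
    awk '
        # Each service entry begins with its FMRI; track whether we are in fmd.
        /^svc:/ { in_fmd = ($1 == "svc:/system/fmd:default") }
        in_fmd && /State: offline/ { offline = 1 }
        in_fmd && /Reason: Start method is running/ { starting = 1 }
        END { if (offline && starting) exit 0; exit 1 }
    '
}

# Example (on the affected host):
#   svcs -xv | fmd_start_pending && echo "fmd start method still running"
```

If the check still succeeds several minutes after boot, the start method is probably hung rather than just slow. How long to wait is a judgment call; I don't have a hard number.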

The next message may or may not be connected. I noticed it several times on boot in conjunction with the fmd failure to start. On the other hand, because the fmd failure caused problems with init levels, I had to sync the system from the ok prompt in order to power off the machine, so this message might instead have been connected to the kernel panic from the previous shutdown.

ds: [ID 406019 kern.notice] NOTICE: ds@1: invalid message length, received 4128 bytes, expected 37536

In the end this issue escalated its way back to Sun (after re-installing, re-installing from different media, switching disks, removing additional network cards, disabling HW RAID, re-installing again, running explorer, and realizing explorer didn’t say anything because prtdiag, etc. didn’t work).

Solution: They fixed it with an upgraded OBP firmware, which was released in October.
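Since prtdiag is unusable while fmd is stuck, prtconf -V is another way to read the OBP version from a running system. The sketch below pulls the version out of a line such as "OBP 4.26.1 2007/04/02 16:25" and compares it against a threshold you supply. The helper names obp_version and version_older are made up for this example, the expected line format is an assumption, and the 4.27.4 used in the example is simply the OBP version reported in the comments after patching, not a confirmed minimum fixed version. It needs bash or ksh for the $(...) syntax.

```shell
# Hypothetical helper: print the first OBP version found on stdin,
# e.g. "OBP 4.26.1 2007/04/02 16:25" -> "4.26.1".
obp_version() {
    sed -n 's/.*OBP \([0-9][0-9.]*\).*/\1/p' | sed -n 1p
}

# Hypothetical helper: succeed if dotted version $1 is strictly older than $2.
# Compares up to three numeric fields.
version_older() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" |
         sort -t. -k1,1n -k2,2n -k3,3n | sed -n 1p)" = "$1" ]
}

# Example (on the affected host):
#   cur=$(prtconf -V | obp_version)
#   version_older "$cur" 4.27.4 && echo "OBP $cur predates the patched firmware"
```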


2 comments for “Solaris 08/07 – fmd broken on T2000”

  1. Anonymous
    November 13, 2007 at 11:21 am

    FYI I see this exact same error on T1000s running 08/07 with this firmware:
    sc> showhost
    Sun-Fire-T1000 System Firmware 6.4.6 2007/06/24 18:43

    Host flash versions:
    Hypervisor 1.4.1 2007/04/02 16:37
    OBP 4.26.1 2007/04/02 16:25
    POST 4.26.0 2007/03/26 16:46

  2. me
    November 16, 2007 at 12:15 am

    You should definitely update your firmware. Look for patch number 127576-01.

    It is not a standard patch so read the instructions before trying to apply it.

    This will bring you up to:
    System Firmware 6.5.3 Sun Fire[TM] T2000 2007/10/03 05:56
    ---------------------------------------------------
    ALOM-CMT v1.5.2 Sep 25 2007 08:45:51
    VBSC 1.5.4 Sep 25 2007 08:42:34
    Hypervisor 1.5.2 2007/09/25 08:39
    OBP 4.27.4 2007/10/02 18:35
    POST 4.27.4 2007/10/02 19:03
