Sun’s Predicament

I’ve been working with Unix for a fairly long time now, about 13 years.

I’ll admit that I started with Linux and thought it was light years ahead of SunOS 4.x running on those old SPARC machines. I mean, who had heard of SPARC processors? I remember my boss trying to explain to me that even an older SPARC processor was more powerful than a newer Intel Pentium processor. I didn’t really believe him. In time, I convinced the company to get rid of most of its SPARC/Solaris machines in favor of the hip, free, and cheap Intel/Linux combination.

Now I see that I couldn’t have been more wrong. I realize that SunOS 4.x probably still has features which I don’t know how to use properly. When I look at Solaris 10, ZFS, Zones, LDoms, DTrace, etc., I’m not really sure you could pay me to work with Linux (that would be so depressing). And that isn’t even mentioning the SPARC hardware it runs on. Can any Intel server compare to a T5140?

That’s why the current situation with Sun absolutely SUCKS (pardon my French)! I’m sure there are a lot of admins out there who feel the same way. If this Oracle deal doesn’t go through and Sun disappears because of it, it will be our loss. We’ll be stuck with mediocre operating systems and commodity hardware, and I really hope that doesn’t happen.

That said, I’d like to say thanks to all the people at Sun who are still turning out crazy cool technologies despite the problems.

SUNOS-8000-1L Errors caused by nxge driver for X4447A-z

I recently installed Solaris 10 8/07 on a T2000 with a Sun Quad GbE x8 PCIe Low Profile Adapter (X4447A-z) inside. The machine gave me lots of problems.

One of the issues was the following message, which the machine logged hundreds if not thousands of times:

Oct 23 22:18:27 hostname fmd: [ID 441519 daemon.error] SUNW-MSG-ID: SUNOS-8000-1L,
TYPE: Defect, VER: 1, SEVERITY: Minor
Oct 23 22:18:27 hostname EVENT-TIME: Tue Oct 23 22:18:27 BST 2007
Oct 23 22:18:27 hostname PLATFORM: SUNW,Sun-Fire-T200, CSN: -, HOSTNAME: hostname
Oct 23 22:18:27 hostname SOURCE: eft, REV: 1.16
Oct 23 22:18:27 hostname EVENT-ID: 86cc16cc-a356-6a94-a11b-bbc8cd5e456f
Oct 23 22:18:27 hostname DESC: The EFT Diagnosis Engine encountered telemetry
for which it is unable to produce a diagnosis. Refer to
http://sun.com/msg/SUNOS-8000-1L for more information.
Oct 23 22:18:27 hostname AUTO-RESPONSE: Error reports from the component will be
logged for examination by Sun.
Oct 23 22:18:27 hostname IMPACT: Automated diagnosis and response for these
events will not occur.
Oct 23 22:18:27 hostname REC-ACTION: Run pkgchk -n SUNWfmd to ensure that
fault management software is installed properly. Contact Sun for support.

I originally assumed that these very descriptive messages were part of the same problem with the fmd service which I mentioned in a previous post, but Sun found another source for the problem. Apparently it is the nxge driver.

As I write this entry, Sun is working on a new driver. They tried a test version on my server and it did not solve the problem, but it does seem to reduce the number of errors and add some information to the logs. Specifically, the entries above are sometimes preceded by a line similar to this:

nxge: [ID 752849 kern.warning] WARNING: nxge2 : nxge_ipp_err_evnts: pkt_dis_max
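
If you want to get a feel for how often this happens on your own box, here is a rough Python sketch that counts the SUNOS-8000-1L entries in a log and how many of them directly follow an nxge warning. The function name summarize and the counter keys are my own inventions, and the two marker strings are simply copied from the excerpts above, so treat it as a starting point and not as anything Sun provides.

```python
from collections import Counter

# Marker strings copied from the log excerpts in this post.
FMD_MARKER = "SUNW-MSG-ID: SUNOS-8000-1L"
NXGE_MARKER = "nxge_ipp_err_evnts"


def summarize(lines):
    """Count fmd SUNOS-8000-1L events and how many directly follow an nxge warning.

    `lines` is any iterable of log lines (a list or an open file).
    Only the first line of each fmd entry carries the message ID, so each
    multi-line entry is counted once.
    """
    counts = Counter()
    previous_was_nxge = False
    for line in lines:
        if FMD_MARKER in line:
            counts["fmd_events"] += 1
            if previous_was_nxge:
                counts["preceded_by_nxge"] += 1
        is_nxge = NXGE_MARKER in line
        if is_nxge:
            counts["nxge_warnings"] += 1
        previous_was_nxge = is_nxge
    return counts


# Example use (the path is an assumption; adjust it to wherever syslog writes):
# with open("/var/adm/messages", errors="replace") as log:
#     print(summarize(log))
```

It only looks at the line immediately before each fmd entry, so if other messages get logged in between, the "preceded_by_nxge" number will come out too low.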

In the meantime, it seems that I will be ditching the quad cards until Sun can get their act together. I’m having them replaced with two dual gigabit cards which use the e1000g driver.