Unix

Linux and Solaris are Converging but Not the Way You Imagined


In case you haven’t been paying attention, Linux is in a mad dash to copy everything that made Solaris 10 amazing when it launched in 2005. Everyone has recognized the power of Zones, ZFS, and DTrace, but licensing issues and the sheer effort required to implement the technologies have made it a long process.

ZFS

ZFS is probably the most advanced file system in the world. Its creators realized, before anyone else, that file systems weren’t built to handle the amounts of data the future would bring.

Work to port ZFS to Linux began in 2008, and a stable port of ZFS from Illumos was announced in 2013. That said, even two years later, the latest release still hasn’t reached feature parity with ZFS on Illumos. With developers preferring to develop OpenZFS on Illumos, and with licensing issues preventing OpenZFS from being distributed as part of the Linux kernel, it seems like ZFS on Linux (ZOL) may be doomed to playing second fiddle.

DTrace

DTrace is the most advanced tool in the world for debugging and monitoring live systems. Originally designed to help troubleshoot performance and other bugs in a live Solaris kernel, it quickly became extremely useful for debugging userland programs and runtimes.
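
To make that concrete, here is a minimal sketch of the kind of one-liner DTrace is known for; it counts system calls by process name on a live Solaris/Illumos system:

    # Count system calls by process name; Ctrl-C prints the aggregated results
    dtrace -n 'syscall:::entry { @[execname] = count(); }'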

Oracle has been porting DTrace to Linux since at least 2011, and while they own the original and have prioritized the most widely used features, they still haven’t caught up to it.

Zones

Solaris Zones are operating-system-level virtual machines. They are completely isolated from each other but all run on the same kernel, so there is only one operating system in memory. Zones integrate tightly with ZFS, DTrace, and all the standard system monitoring tools, which makes it very easy to support and manage servers with hundreds of Zones running on them. Zones also natively support a mechanism called branding, which allows the kernel to provide different interfaces to the guest zone. In Oracle Solaris, this is used to support running zones from older versions of Solaris on a machine running a newer OS.
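
For a sense of how simple this is to drive, here is a hedged sketch of creating and booting a zone; the zone name and zonepath are made up for the example:

    # Configure a minimal zone (name and path are illustrative)
    zonecfg -z web01 'create; set zonepath=/zones/web01'
    # Install the zone's files, boot it, and log in
    zoneadm -z web01 install
    zoneadm -z web01 boot
    zlogin web01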

Linux containers of one type or another have been around for a while, but they haven’t gotten nearly as mature as Zones. Recently, the continued failure of traditional hypervisors to provide bare metal performance in the cloud, coupled with the uptake of Docker, has finally gotten the world to realize the tremendous benefits of container-based virtualization like Zones.

The current state of containers in Linux is extremely fractured, with at least five competing projects that I know of. LXC, initially released in 2008, seems to be the favorite; it has historically had serious privilege separation issues, though it has gotten a little better if you can meet all the system requirements.

Joyent has been waiting at the finish line.

While Linux users wait and wait for mature container solutions, full OS and application visibility, and a reliable, high performance file system, Joyent has been preparing to make things a whole lot easier.

About a year ago, David Mackay showed some interest in Linux branded zones, work which had been abandoned in Illumos. In the spring of 2014, Joyent started work on resurrecting lx-zones, and in September they presented their work. They already have working support for 32-bit and some 64-bit Linux binaries in Linux-branded SmartOS Zones. As part of the process, they are porting some of the main Linux libraries and facilities to native SmartOS, which will make porting Linux code to SmartOS much easier.

The upshot is that you can already get ZFS, DTrace, and Linux apps inside a fully isolated, high performance SmartOS zone. With only nine months or so of work behind it, there are still some missing pieces in the Linux support, but considering how long Linux has been waiting, I’m pretty sure SmartOS will reach feature parity with Linux a lot faster than Linux will reach feature parity with SmartOS.

EMC Fully Automated Storage Tiering

Storage Tiering is nothing new. We use fast 15K RPM disks for high performance applications, slower 10K RPM disks for less demanding applications, and 7.2K RPM SATA disks for archive storage. Recently, solid state disks (SSDs) have also become more common for really high performance needs. The trick is managing it all.

Two or three years ago, if you wanted to implement automatic storage tiering, I would have pointed you in the direction of Sun’s Storage and Archive Manager (SAM) and QFS, Sun’s tightly integrated shared file system. SAM-QFS automatically moves files from one storage tier to another based on the SAM policy and transparently retrieves the files when requested. With tape still the least expensive storage available, this remains a great solution for archiving petabytes of documents and files.

Unfortunately, SAM works at the file level, so it will not help our databases run faster. What will help us is ZFS. ZFS is still making some fairly big waves in the storage community with its Hybrid Storage Pool feature. In a standard configuration, ZFS uses RAM for a level 1 read cache, the ARC. In advanced configurations, the zpool can be configured with a level 2 cache, the L2ARC, on faster disks (SSD vs. SAS vs. SATA, etc.). The zpool can also be configured to use separate, possibly faster disks for the ZFS Intent Log (ZIL), which is basically a write cache (without getting into why it is more than a write cache). Even without faster disks, the ability to store the read/write cache on a separate device can increase performance just by dedicating more IOPS to the cause.
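
As a rough sketch of how a Hybrid Storage Pool is assembled (all device names here are illustrative), the cache and log devices are simply added on top of an ordinary pool:

    # Data on mirrored spinning disks (device names are examples)
    zpool create tank mirror c0t0d0 c0t1d0
    # L2ARC read cache on an SSD
    zpool add tank cache c2t0d0
    # Separate mirrored log device for the ZIL
    zpool add tank log mirror c2t1d0 c2t2d0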

Oracle/Sun’s 7000 series storage builds on the success of the ZFS Hybrid Storage Pool, using Logzilla devices for the ZIL and Readzilla devices for the L2ARC. With the powerful flash acceleration in the storage pool, even 7.2K RPM disks can give performance equal to that of higher speed 15K RPM disks.

Although ZFS does great things for performance by utilizing multiple tiers of storage devices, all the data is still physically stored on the same tier of storage in addition to having the hot data stored again in the caches. This is arguably a waste of capacity but can also lead to performance issues in some cases. For example, a cold L2ARC cache after reboot could give slower performance until fully warmed up. Oracle will probably fix this at some point by allowing the L2ARC to persist if stored on a non-volatile device (bug_id=6662467).

In the meantime, EMC recently announced an interesting new feature called FAST, short for Fully Automated Storage Tiering. FAST is available from FLARE version 04.30.000.5.004. FAST allows you to define a pool in the array composed of multiple RAID groups, and then define a LUN on the pool as opposed to defining a LUN on the RAID groups themselves. Once the LUN begins filling with data, the EMC will transparently begin migrating data between the tiers of the pool in 1GB chunks, storing the hottest data on the fastest tier and the coldest data on the slowest tier.

FAST sounds like a dream come true. No more complicated storage configurations for the database. No more packages and processes to move historical data to slower disk groups. On the other hand, I am skeptical as to whether this technology is really mature. Do all EMC products treat FAST LUNs the same as traditional LUNs (SnapView, Replication Manager, etc.)? Also, are the ramifications of disk failures the same for a FAST LUN, or does failure of a Tier 1 disk in a FAST pool mean a lot more high performance eggs in one basket? Time will tell.

Portsnap, Apache Configurations, and CGI – Questions Answered

  1. Explain the importance of installing and running portsnap after installing a current version of FreeBSD.

    Portsnap is a system for securely distributing the FreeBSD ports tree. Approximately once an hour, a “snapshot” of the ports tree is generated, repackaged, and cryptographically signed. The resulting files are then distributed via HTTP.

    The first time portsnap is run, it will need to download a compressed snapshot of the entire ports tree (portsnap fetch) and then a “live” copy of the ports tree can be extracted into /usr/ports/ (portsnap extract). This is necessary even if a ports tree has already been created in that directory (e.g., by using CVSup), since it establishes a baseline from which portsnap can determine which parts of the ports tree need to be updated later.

    Initializing portsnap as soon as possible ensures the most secure and up-to-date software installations on your machine and avoids a long download of the initial compressed tree when you need it later.

    After initializing portsnap, it is recommended to put ‘portsnap cron’ in a cron job to fetch updates regularly, and then to run ‘portsnap update’ before using the ports system. Putting ‘portsnap update’ itself in cron is not recommended, since it can cause problems if it runs while you are using the ports tree.
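
    Putting it all together, the typical lifecycle looks something like this (a sketch of the standard commands described above):

        portsnap fetch          # download the latest compressed snapshot
        portsnap extract        # first run only: populate /usr/ports/
        portsnap update         # later runs: apply fetched changes to /usr/ports/
        # /etc/crontab entry to fetch updates regularly (the time is an example):
        0 3 * * *   root   /usr/sbin/portsnap cron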

  2. Explain the role of configuration files in Unix applications. In Apache version 2.2, the configuration files have been modularized. What are the advantages and disadvantages of using a modular approach to configuration files?

    Configuration files allow you to control the settings and parameters of a service or program by editing, in most cases, a simple text file. Well known configuration files include /etc/hosts (local hostname to IP resolution), /etc/nsswitch.conf (name service configuration), and /etc/resolv.conf (DNS server configuration). Samba uses a configuration file which looks more like a Windows .ini file, while Apache uses its own XML-ish format.

    Apache’s use of modular configuration files is not new in version 2.2 as can be seen here: http://httpd.apache.org/docs/1.3/mod/core.html#include

    More likely, you are used to a binary distribution of Apache which has split the configuration file into several files/directories and included them from the base configuration. The reason for doing this is convenience. It is easy to manage multiple Apache servers with similar settings by creating a basic shared configuration for all servers and only modifying a subset of the configuration sitting in an included file.

    It is also popular to use a set of prepared configuration files (module configurations/vhost configurations) in a directory marked “_available” and symlink them into a directory called “_enabled” which is included in the Apache configuration. This provides a quick on/off mechanism for certain configurations.
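
    A sketch of how that on/off mechanism looks on disk (the Debian-style paths and file names here are illustrative):

        # The base configuration includes whatever is enabled, e.g.:
        #   Include /etc/apache2/sites-enabled/*.conf
        # Enabling a prepared vhost is just a symlink:
        ln -s /etc/apache2/sites-available/example.conf /etc/apache2/sites-enabled/example.conf
        # Disabling it again:
        rm /etc/apache2/sites-enabled/example.conf
        apachectl graceful      # reload the configuration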

  3. Explain the use of “directives” in configuration files. Provide an example of two directives found in an Apache configuration file and detail what each accomplishes.

    “Directive” is a term fairly specific to Apache; each directive controls some part of the configuration, and Apache has roughly 410 of them. Each one is characterized by the syntax of the arguments it accepts, its default value if there is one, the context in which it can be used (server, virtual host, directory, etc.), what overrides must be in place for it to be used in a .htaccess file, and its status, module, and compatibility.

    Examples:

    ServerName is a directive which sets the request scheme, hostname and port that the server uses to identify itself. http://httpd.apache.org/docs/2.2/mod/core.html#servername

    The ServerAdmin directive sets the contact address that the server includes in any error messages it returns to the client. http://httpd.apache.org/docs/2.2/mod/core.html#serveradmin
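
    In context, those two directives might appear in httpd.conf like this (the values are examples):

        ServerName www.example.com:80
        ServerAdmin webmaster@example.com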

  4. What is meant by “overrides”? Provide an example of an override found in an Apache configuration file and detail what is accomplished by the override.

    An override allows a directive to be overridden by directives in a .htaccess file located in one of the web content directories.

    An example of an override is AuthConfig. This override allows the .htaccess file in a directory to change the Apache configuration of that directory in terms of authentication (either allowing or denying access, specifying users, etc.). http://httpd.apache.org/docs/2.2/mod/core.html#allowoverride
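
    A short sketch of the two halves (paths and names are examples): httpd.conf grants the override, and the .htaccess file uses it.

        # httpd.conf: allow .htaccess under this directory to control authentication
        <Directory "/var/www/private">
            AllowOverride AuthConfig
        </Directory>

        # /var/www/private/.htaccess: require a valid user
        AuthType Basic
        AuthName "Private area"
        AuthUserFile /usr/local/apache/passwd/passwords
        Require valid-user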

  5. Define what is meant by a Common Gateway Interface, how it is used in websites, and the methods for providing one. Discuss the advantages and disadvantages of providing this functionality on a web site.

    CGI is an older standard for allowing a web server like Apache to send request parameters to an external program and use the program’s output as a response. Before scripting languages like PHP or Perl were built straight into web server modules, this was the only way to serve dynamically generated content.

    CGI is generally not a great solution today, although it is still used. Its performance is poor due to the need to start a completely new process on each request, so CGIs generally take more memory and require more open processes on the server. Perl has a CGI module which makes writing a CGI script fairly easy.
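
    The interface itself is simple enough that a complete CGI “program” can be a few lines of shell (purely illustrative): the server passes request data in environment variables such as QUERY_STRING, and everything written to stdout after the headers becomes the response.

        #!/bin/sh
        # Minimal CGI script: headers, a blank line, then the body
        echo "Content-Type: text/plain"
        echo ""
        echo "Hello from CGI"
        echo "QUERY_STRING was: $QUERY_STRING"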

    More common today is FastCGI, which requires a different interface from the external program. FastCGI keeps a number of external programs running to improve performance. Many recommend running PHP as a FastCGI program in order to take advantage of Apache’s newer multi-threaded MPMs.

32 or 64 bit MySQL

Recently, I wanted to confirm that I was running the 64-bit version of the MySQL server as opposed to the 32-bit version.
The Sun Webstack installation comes with both versions, and if you use the built-in SMF service, the choice between the 32-bit and 64-bit versions is controlled by a flag in the service properties.
I was not using the built-in service, but rather using Sun Cluster to start the server. In order to convince Sun Cluster to start the 64-bit version (I’m sure there is a better way to do this), one of my admins had made a symlink from the mysql/bin directory to the 64-bit binary directory. On the command line, you could no longer tell if the mysqld command was run from the 64-bit directory, and there doesn’t seem to be a built-in MySQL command which shows which version is currently running (show status, \s, show variables, etc.).
In the end, I ran pldd on the process ID of the MySQL server. I am reasonably sure that I am running the 64-bit version because all the shared libraries being used came from the /lib/sparcv9/ and /usr/lib/sparcv9/ directories, which hold the 64-bit libraries. I’m not sure if this method works on other OSes, but I thought it might be helpful to someone.
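For anyone who wants to run the same check, this is roughly what it looks like (the process name and library paths are from my Solaris/SPARC setup):

    # List the shared objects mapped by the running mysqld
    pldd $(pgrep -x mysqld)
    # 64-bit SPARC processes map libraries from .../sparcv9/, e.g.:
    #   /lib/sparcv9/libc.so.1
    # A 32-bit mysqld would map /lib/libc.so.1 instead
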
Good Luck!

Oracle’s New Line of Sun x86 Servers

Oracle just announced a new line of Nehalem-based x86 servers, and they are beasts.

Interesting to note: the x4450, my previous favorite due to being the only 2RU 4-socket Xeon server I’m aware of, has a new big brother: the Sun x4470. It seems they either couldn’t fit four 7500 series Nehalems in 2RU, or they realized that no one could compete with them in 3RU, so why bother.
http://www.oracle.com/us/products/servers-storage/servers/x86/sun-fire-x4470-server-077286.html

Also interesting is the 5RU Sun x4800, which supports 8 Nehalem 7500 series processors (up to 64 cores), hot-swappable IO cards, and 1TB of RAM, and has built-in NEMs (borrowing from blade tech, it seems). A fully loaded server runs about 3KW at 100% utilization according to the power calculator, which is more than a similarly configured (and much more reliable) M5000, but in half the rack space:
http://www.oracle.com/us/products/servers-storage/servers/x86/sun-fire-x4800-server-077287.html

Also interesting is widespread support for all types of SSD and Flash memory modules in this line, and the additional RAS capabilities coming to the x86 line with the new Nehalem processors.

The long and short of it for me is that these machines are beasts, but power hungry beasts. None of the lower power Nehalems really seem to be on the table. Until we see prices, it will be hard to tell if these are worth the PDFs they’re printed on.