Tag: File system

EMC Fully Automated Storage Tiering

Storage Tiering is nothing new. We use fast 15K RPM disks for high performance applications, slower 10K RPM disks for less demanding applications, and 7.2K RPM SATA disks for archive storage. Recently, solid state disks (SSDs) have also become more common for really high performance needs. The trick is managing it all.

Two or three years ago, if you wanted to implement automatic storage tiering, I would have pointed you in the direction of Sun’s Storage Archive Manager (SAM) and QFS, Sun’s tightly integrated shared file system. SAM-QFS automatically moves files from one storage tier to another based on the SAM policy and transparently retrieves the files when requested. With tape still the least expensive storage available, this remains a great solution for archiving petabytes of documents and files.

Unfortunately, SAM works at the file level, so it will not help our databases run faster. What will help us is ZFS. ZFS is still making some fairly big waves in the storage community with its Hybrid Storage Pool feature. In a standard configuration, ZFS uses RAM for a Level 1 read cache (ARC). In advanced configurations, the zpool can be configured to use a Level 2 cache (L2ARC) on faster disks, e.g. SSDs in front of SAS, or SAS in front of SATA. The zpool can also be configured to use separate, possibly faster disks for the ZFS Intent Log (ZIL), which is basically a write cache (without getting into why it is more than a write cache). Even without faster disks, the ability to store the read/write cache on a separate device can increase performance just by dedicating more IOPS to the cause.
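
As a rough sketch of what such a configuration looks like, the commands below add an L2ARC cache device and a separate log device to an existing pool. The pool name tank and both device names are placeholders I made up, so substitute your own. The script only prints the commands unless RUN=1 is set, so it is safe to try on a machine without ZFS.

```shell
#!/bin/sh
# Sketch: attach a cache (L2ARC) device and a separate log (ZIL) device
# to an existing pool. Pool and device names are hypothetical placeholders.
POOL=${POOL:-tank}
CACHE_DEV=${CACHE_DEV:-c2t0d0}   # assumed SSD intended for the L2ARC
LOG_DEV=${LOG_DEV:-c2t1d0}       # assumed SSD intended for the ZIL

# Print each command by default; execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

run zpool add "$POOL" cache "$CACHE_DEV"   # read cache on the faster device
run zpool add "$POOL" log "$LOG_DEV"       # intent log on its own device
run zpool status "$POOL"                   # cache and log vdevs are listed separately
```

Keeping the cache and the log on different devices means reads and synchronous writes do not compete for the same IOPS.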

Oracle/Sun’s 7000 series storage builds on the success of the ZFS Hybrid Storage Pool, using Logzilla devices for the ZIL and Readzilla devices for the L2ARC. With the powerful flash acceleration in the storage pool, even 7.2K RPM disks can give performance equal to that of higher speed 15K RPM disks.

Although ZFS does great things for performance by utilizing multiple tiers of storage devices, all the data is still physically stored on the same tier of storage in addition to having the hot data stored again in the caches. This is arguably a waste of capacity but can also lead to performance issues in some cases. For example, a cold L2ARC cache after reboot could give slower performance until fully warmed up. Oracle will probably fix this at some point by allowing the L2ARC to persist if stored on a non-volatile device (bug_id=6662467).

In the meantime, EMC recently announced an interesting new feature called FAST, short for Fully Automated Storage Tiering. FAST is available from FLARE version 04.30.000.5.004. FAST allows you to define a pool in the array composed of multiple RAID Groups, and then define a LUN on the pool as opposed to defining a LUN on the RAID Groups themselves. Once the LUN begins filling with data, the array will transparently begin migrating data between the tiers of the pool in 1GB chunks, storing the hottest data on the fastest tier and the coldest data on the slowest tier.

FAST sounds like a dream come true. No more complicated storage configurations for the database. No more packages and processes to move historical data to slower disk groups. On the other hand, I am skeptical as to whether this technology is really mature. Do all EMC products (SnapView, Replication Manager, etc.) treat FAST LUNs the same as traditional LUNs? Also, are the ramifications of disk failures the same for a FAST LUN, or does the failure of a Tier 1 disk in a FAST pool mean a lot more high performance eggs in one basket? Time will tell.

No ZFS Support for EMC Replication Manager

As I originally blogged, I was hoping to use EMC snapshots to perform server-less/network-less backups. EMC provides two main tools for managing snapshots in this type of situation:

  • EMC Replication Manager
  • EMC PowerSnap Networker Module

The PowerSnap Module supposedly automates taking snapshots for the purpose of backups, while Replication Manager supposedly provides a much more robust package.

With Replication Manager you might create a policy to take a snapshot every five minutes, keep the last 10, and use those for backups whenever necessary.

To make a long story short, Replication Manager is useless for LUNs with ZFS. According to EMC, this won’t change in the near future. PowerSnap also has no support for taking snapshots of LUNs with ZFS on them so basically EMC has no server-less backup offerings for Solaris with ZFS.

Speaking as an IT guy in general, ZFS is the best thing that has happened to file systems in the last 10 years and it is only getting better. ZFS is already standard in FreeBSD and NetBSD. Linux supports ZFS over FUSE due to license issues but I’m confident those will be solved. The file system is platform independent, meaning you can move the data transparently between Intel and SPARC architectures. Deduplication has just been added to the feature set and disk encryption is on its way.

As a Solaris admin, I really can’t figure out why EMC would decide to cut off their own foot like this. It is clear that UFS will remain for legacy and backwards compatibility but ZFS is the future. Not planning to support ZFS is like not planning to support Solaris.

The only possibility that I can see is that EMC sees Sun, Solaris, and ZFS as enough of a threat that they are strategically trying to limit options. For operations local to a server, ZFS has largely replaced the need for heavy hardware like EMC on the SAN. Some would argue that ZFS RAID + JBOD is better than ZFS + RAID on EMC. You can do the snapshots without the EMC. On a simple level, you can send snapshots asynchronously to another system, similar to MirrorView, without the EMC. You can do deduplication without the EMC. Now with Sun’s Flash Cache technology, which integrates with ZFS, you can get the performance without the EMC. Along the same lines, you see Sun changing the rules of the storage/database game with solutions like Exadata V2. The integration of Zones with ZFS may be challenging VMware on the virtualization front, especially with the serious advantage Sun’s CoolThreads servers have in terms of consolidation.

That said, I still prefer to offload this work to dedicated storage hardware for the time being and probably in the future. If EMC chooses not to support ZFS, they will only force us not to buy EMC arrays. We will stop buying disks, stop buying tools, etc.

Instead, they should be providing better support for ZFS, integrating with ZFS to get better performance, providing tools which make EMC the preferred disk array behind a ZFS filesystem.

EMC Replication Manager in Solaris

UPDATE: No ZFS Support for Replication Manager in the near future

Storage level snapshots can be used to run backups without directly requiring resources from the original host.

EMC Replication Manager coordinates the creation of application consistent snapshots across all the hosts in your network. It handles scheduling the creation and expiration of snapshots, mounting and unmounting them on backup servers, etc. from a single console.

Although it is not tightly integrated into EMC Networker like the similar Networker PowerSnap module, it can be used to start a backup process after taking a new snapshot and it has the capability to manage snapshots unrelated to backups from a GUI.

While the data sheet claims support for Solaris, there are several caveats which I have run into.

  1. There is no mention of ZFS support in the data sheet and apparently, there is no support in the software either. One would expect this to be a non-question since ZFS has been part of Solaris since 2006.
  2. The data sheet is missing the word “SPARC” next to the word Solaris. There is no support for x86.

Honestly, this has put a dent in my plans since my backup server is an x86 box. I’m hoping the lack of ZFS support will work out as long as we can script any FS specific magic we need. I don’t have the option of running something like Linux on it (just to get the software working) because I wouldn’t even be able to mount the ZFS filesystems, let alone back them up.

In the meantime, I’ll have to move my backups to a SPARC server and considering the lack of low end SPARC machines, I’ll have to allocate something way too expensive to be a backup server.

Listing ZFS Clones using the origin property

Recently I created my first ZFS clones but quickly realized that there was no simple way to tell the clones from the regular filesystems. My first instinct was to run ‘zfs list -t clone’ similar to ‘zfs list -t snapshot’ but this didn’t work. Maybe it works in newer versions of ZFS.

After some poking around I found the ‘origin’ property, which sets the clones apart, so running something like:

zfs list -o origin,name,used,avail,refer,mountpoint | \
grep -v '^-' | awk '{print $2"\t"$3"\t"$4"\t"$5"\t"$6}'

will get you what you are looking for.
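
A slightly more script-friendly variant, offered as a sketch rather than the one true way: ‘zfs list -H’ drops the header line and separates columns with tabs, so the filter can compare the origin column directly instead of relying on grep. The function name clones_only is my own invention.

```shell
#!/bin/sh
# Keep only datasets whose origin property is set, i.e. clones.
# Expects "name<TAB>origin" lines on stdin, as produced by:
#   zfs list -H -o name,origin
clones_only() {
    awk -F '\t' '$2 != "-" { print $1 "\t" $2 }'
}

# Usage on a system with ZFS:
#   zfs list -H -o name,origin | clones_only
```

Regular filesystems report an origin of "-", so only clones and the snapshots they were created from show up in the output.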

If you haven’t played with ZFS clones yet, basically they are writable snapshots of a file system.

They are great if you want to copy a lot of data to the side, modify it, and possibly replace the original data, without taking a lot of time or disk space. The ZFS clones take seconds to create, since they don’t actually copy any data, and they will only store the blocks which have changed since their creation. If you want to replace the original data, you can then transparently promote the clone to be the master filesystem and turn the master into a clone.
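
The workflow described above can be sketched as follows. The dataset, snapshot, and clone names are hypothetical placeholders, and the script only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch of the snapshot -> clone -> promote workflow.
# Dataset, snapshot and clone names are hypothetical placeholders.
FS=${FS:-tank/data}
SNAP=${SNAP:-before_change}
CLONE=${CLONE:-tank/data_work}

# Print each command by default; execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

run zfs snapshot "$FS@$SNAP"         # read-only, point-in-time snapshot
run zfs clone "$FS@$SNAP" "$CLONE"   # writable clone sharing unchanged blocks
# ... modify the data under the clone's mountpoint here ...
run zfs promote "$CLONE"             # reverse the dependency: the original
                                     # filesystem now depends on the clone
```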

The downside of clones is that they are always dependent on the snapshot from which they were created. You cannot destroy a snapshot on which a clone is based without destroying the clone.

For the sake of simplicity and since I don’t usually have disk space issues, I usually prefer to make full copies using ZFS send/receive, but I have definite plans to make more use of ZFS clones in the future.

Persistent static routes in Solaris 10 11/06, 08/07

Static routes are a very common necessity once your networks become even a little complex. Whether you need to route specific traffic over a VPN or set up specific test addresses for IPMP failover, static routes are indispensable.

For many years the “correct” way of configuring static routes in Solaris has been to create an init.d script which ran the ‘route add’ commands.

As of Solaris 10 11/06, a more reasonable approach has been implemented. The ‘route’ command has a new option ‘-p’.

Make changes to the network route tables persistent across system restarts. The operation is applied to the network routing tables first and, if successful, is then applied to the list of saved routes used at system startup. In determining whether an operation was successful, a failure to add a route that already exists or to delete a route that is not in the routing table is ignored. Particular care should be taken when using host or network names in persistent routes, as network-based name resolution services are not available at the time routes are added at startup.

Now you may be asking “Where is my configuration file?” The route command currently stores your static routes in the file /etc/inet/static_routes but this has been declared volatile. Sun is not promising to keep these configurations in that file or in the same format from release to release.
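
For illustration, adding and then listing a persistent route might look like the sketch below. The network and gateway addresses are made-up example values, and the exact modifier spelling (-net, -gateway) is how I remember the Solaris syntax, so check route(1M) on your release. The script only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch: add a persistent static route on Solaris 10 11/06 or later.
# The network and gateway below are made-up example values.
NET=${NET:-10.20.0.0/16}
GW=${GW:-192.168.1.1}

# Print each command by default; execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

run route -p add -net "$NET" -gateway "$GW"   # applied now and saved for boot
run route -p show                             # list the saved routes
```

Using a numeric gateway address rather than a host name sidesteps the name resolution caveat quoted above.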

I personally am not happy with Sun’s general move to administrative utilities for configuration as opposed to configuration files. I agree that utilities are useful: they ensure correct syntax, etc. But I want the ability to configure a system on the file system level as well. Otherwise I lose the ability to keep a system’s configuration files in version control. I lose the ability to deploy a system by transferring the appropriate files (à la scp, cfengine, puppet, home grown script, etc.). I prefer something along the lines of crontab, where the syntax is checked but the configuration itself is a file in userspace.

Still, a standard method for configuring static routes is welcome in place of creating init scripts, especially with SMF services phasing out init scripts altogether.