Couchbase is Simply Awesome

Here are five things that make Couchbase a go-to service in any architecture.

Couchbase is simple to set up.

Keep It Simple. It’s one of the axioms of system administration. Couchbase, though complicated under the hood, makes it very simple to set up even complicated clusters spanning multiple data centers.

Every node comes with a very user-friendly web interface, which can also monitor performance across all the other nodes in that node’s cluster.

Adding nodes to a cluster is as simple as plugging in the address of the new node, after which all the data in the cluster is automatically rebalanced between the nodes. The same is true when removing nodes.

Couchbase is built to never require downtime, which makes it a pleasure to work with.

If you are into automation à la Chef, etc., Couchbase supports configuration via a REST API. There are cookbooks available. I’m not sure about other configuration management tools, but they probably have the relevant code bits as well.
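
To give a feel for what configuration over REST looks like, here is a rough Python sketch that only builds (and does not send) a request asking an existing cluster to add a node. The endpoint path /controller/addNode and the form fields hostname, user, and password are written from my memory of the Couchbase REST documentation, so treat them as assumptions and check them against the docs for your version. Port 8091 is the default admin port. The function name and the choice to reuse one set of credentials for both nodes are mine.

```python
import base64
import urllib.parse
import urllib.request


def build_add_node_request(cluster_host, new_node_host, user, password):
    """Build (but do not send) a request asking a cluster to add a node.

    Assumption: the endpoint and field names below match your Couchbase
    version, and the same admin credentials work on both nodes.
    """
    url = f"http://{cluster_host}:8091/controller/addNode"
    body = urllib.parse.urlencode(
        {"hostname": new_node_host, "user": user, "password": password}
    ).encode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    # The admin REST interface uses HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request
```

Passing the returned object to urllib.request.urlopen would actually send it. A rebalance still has to be triggered afterwards, either from the web interface or through the API.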

Couchbase replaces Memcached

Even if you have no need for a more advanced NoSQL solution, there is a good chance you are using Memcached. Couchbase is the original Memcached on steroids.

Unlike traditional Memcached, Couchbase supports clustering, replication, and persistence of data. Using the Moxi Memcached proxy that comes with Couchbase, your apps can talk the Memcached protocol to a cluster of Couchbase servers and get the benefits of automatic sharding and failover. If you want, Couchbase can also persist the Memcached data to disk, turning your Memcached into a persistent, highly available key-value store.
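
For anyone who has never looked at what “talking the Memcached protocol” means on the wire, here is a small Python sketch that builds the raw bytes of a set and a get in the Memcached text protocol. It does not open a connection. The function names are mine, and which host and port your proxy listens on depends on your setup.

```python
def build_set_command(key, value, flags=0, exptime=0):
    """Return the bytes of a text-protocol 'set' for the given key and value."""
    # Header line: set <key> <flags> <exptime> <byte count>, then the data block.
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get_command(key):
    """Return the bytes of a text-protocol 'get' for the given key."""
    return f"get {key}\r\n".encode("ascii")
```

An app that already sends these bytes to a Memcached server can send the same bytes to the proxy, which is why no client changes are needed.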

Couchbase is also a schema-less NoSQL DB

Aside from support for simple Memcached key/value storage, Couchbase is a highly available, easy-to-scale, JSON-based DB with auto-sharding and built-in MapReduce.

Traditionally, Couchbase uses a system called views to perform complicated queries on the JSON data, but the Couchbase team is also working on a new query language called N1QL, which adds powerful ad hoc query capabilities.
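
The idea behind a view is easier to see in code. Real Couchbase views are JavaScript map functions that call emit(key, value), optionally followed by a reduce. The Python below only imitates that shape in memory to show the concept. The document fields (type, country, total) and all function names are made up for the illustration.

```python
from collections import defaultdict


def map_doc(doc):
    """Map step: emit (key, value) pairs for one JSON document."""
    if doc.get("type") == "order":
        yield doc["country"], doc["total"]


def reduce_sum(values):
    """Reduce step: collapse all values emitted for one key."""
    return sum(values)


def run_view(docs):
    """Apply the map step to every document, then reduce per key."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_sum(values) for key, values in sorted(grouped.items())}
```

Documents that emit nothing in the map step, such as non-order documents here, simply do not appear in the result.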

Couchbase also supports connectivity to Elasticsearch, Hadoop, and Talend.

Couchbase is all about global scale out

Adding and removing nodes is simple and every node in a Couchbase cluster is read and write capable all the time. If you need more performance, you just add more nodes.

When one data center isn’t enough, Couchbase has a feature called cross data center replication (XDCR), which lets you easily set up unidirectional or bidirectional replication between multiple Couchbase clusters over WAN. You can even set up full mesh replication, though it isn’t clearly described in their documentation.

Unlike MongoDB, which can only have one master, Couchbase with XDCR lets apps in any data center write to their local Couchbase cluster, and that data is then replicated to all the other data centers.

I recently set up a system using five Couchbase clusters across the US and Europe, all connected in a full mesh with each other. In my experience, data written in any of the data centers was updated across the globe within 1-2 seconds at most.
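
A full mesh adds up quickly, because bidirectional replication between two clusters is really two unidirectional replications. The short Python sketch below enumerates every directed link for a list of clusters. The cluster names are placeholders, not the ones I used.

```python
from itertools import permutations


def full_mesh_links(clusters):
    """Return every (source, target) pair needed for a full mesh."""
    # Each ordered pair is one unidirectional replication.
    return list(permutations(clusters, 2))


CLUSTERS = ["us-east", "us-west", "us-central", "eu-west", "eu-central"]
LINKS = full_mesh_links(CLUSTERS)
```

For n clusters this gives n * (n - 1) replications, so five clusters means 20 of them to configure and monitor.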

Couchbase is only getting better

Having used Couchbase built from source (read: community support only) since version 2.1 (Couchbase is now at 3.0.2), I can say that it is only getting better. They have made amazing progress on XDCR, security functionality, and the N1QL language.

The Couchbase community is great. Check out the IRC channel if you need help.

Portsnap, Apache Configurations, and CGI – Questions Answered

  1. Explain the importance of installing and running portsnap after installing a current version of FreeBSD.

    Portsnap is a system for securely distributing the FreeBSD ports tree. Approximately once an hour, a “snapshot” of the ports tree is generated, repackaged, and cryptographically signed. The resulting files are then distributed via HTTP.

    The first time portsnap is run, it will need to download a compressed snapshot of the entire ports tree (portsnap fetch) and then a “live” copy of the ports tree can be extracted into /usr/ports/ (portsnap extract). This is necessary even if a ports tree has already been created in that directory (e.g., by using CVSup), since it establishes a baseline from which portsnap can determine which parts of the ports tree need to be updated later.

    Initializing portsnap as soon as possible will ensure the most secure and up-to-date software installations on your machine and will save you a long download of the initial compressed tree when you need it later.

    After portsnap is initialized, it is recommended to put ‘portsnap cron’ in a cron job to fetch updates regularly. Then you should run ‘portsnap update’ before using the ports system. Putting ‘portsnap update’ in cron is not recommended since it can cause problems if it runs while the ports tree is in use.

  2. Explain the role of configuration files in Unix applications. In Apache version 2.2, the configuration files have been modularized. What are the advantages and disadvantages of using a modular approach to configuration files?

    Configuration files allow you to control the settings and parameters of a service or program by editing, in most cases, a simple text file. Well-known configuration files include /etc/hosts (local hostname to IP resolution), /etc/nsswitch.conf (name service configuration), and /etc/resolv.conf (DNS server configuration). Samba uses a configuration file which looks more like a Windows .ini file. Apache uses its own XML-ish format.

    Apache’s use of modular configuration files is not new in version 2.2 as can be seen here: http://httpd.apache.org/docs/1.3/mod/core.html#include

    More likely you are used to a binary distribution of Apache which has split the configuration file into several files/directories and includes them from the base configuration. The reason for doing this is convenience. It is easy to manage multiple Apache servers with similar settings by creating a basic shared configuration for all servers and only modifying a subset of the configuration sitting in an included file.

    It is also popular to use a set of prepared configuration files (module configurations/vhost configurations) in a directory marked “_available” and symlink them into a directory called “_enabled” which is included in the Apache configuration. This provides a quick on/off mechanism for certain configurations.

  3. Explain the use of “directives” in configuration files. Provide an example of two directives found in an Apache configuration file and detail what each accomplishes.

    “Directive” is a term which is pretty specific to Apache. Each directive controls some part of the configuration. Apache has ~410 of them. Each one is characterized by the syntax of the arguments it accepts, the default value if there is one, the context in which it can be used (server, virtual host, directory, etc.), what overrides must be in place for the directive to be used in a .htaccess file, status, module, and compatibility.

    Examples:

    ServerName is a directive which sets the request scheme, hostname and port that the server uses to identify itself. http://httpd.apache.org/docs/2.2/mod/core.html#servername

    The ServerAdmin directive sets the contact address that the server includes in any error messages it returns to the client. http://httpd.apache.org/docs/2.2/mod/core.html#serveradmin

  4. What is meant by “overrides”? Provide an example of an override found in an Apache configuration file and detail what is accomplished by the override.

    An override allows a directive to be overridden by directives in a .htaccess file located in one of the web content directories.

    An example of an override is AuthConfig. This override allows the .htaccess file in a directory to change the Apache configuration of that directory in terms of authentication (either allow or deny access, specify users, etc.). http://httpd.apache.org/docs/2.2/mod/core.html#allowoverride

  5. Define what is meant by a Common Gateway Interface, how it is used in websites, and the methods for providing one. Discuss the advantages and disadvantages of providing this functionality on a web site.

    CGI is an older standard for allowing a web server like Apache to send request parameters to an external program and use the program’s output as a response. Before scripting languages like PHP or Perl were built straight into web server modules, this was the only way to serve dynamically generated content.

    CGI is generally not a great solution today, although it is still used. Its performance is poor due to the need to start a completely new process on each request. CGIs generally take more memory and require more open processes on the server. Perl has a CGI module which makes writing a CGI script fairly easy.

    More common today is the use of FastCGI which requires a different interface from the external program. FastCGI keeps a number of external programs running to improve performance. Many recommend running PHP as a FastCGI program in order to take advantage of Apache’s newer multi-threaded MPM.
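
To make the CGI answer above concrete, here is a minimal sketch of a CGI program in Python. The web server passes the request’s query string in the QUERY_STRING environment variable, and whatever the program writes to standard output (header lines, a blank line, then the body) becomes the response. The function name, the name parameter, and the greeting are made up for the example.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def build_response(environ):
    """Build a complete CGI response from the request environment."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]
    # Escape user input before placing it in HTML.
    body = f"<html><body><p>Hello, {escape(name)}!</p></body></html>"
    # CGI output is header lines, a blank line, then the body.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    sys.stdout.write(build_response(os.environ))
```

Every request to this script starts a fresh Python interpreter, which is exactly the per-request process cost described above.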

32 or 64 bit MySQL

Recently, I wanted to confirm that I was running the 64 bit version of the MySQL server as opposed to the 32 bit version.
The Sun Webstack installation comes with both versions, and if you use the built-in SMF service, the choice between the 32 bit and 64 bit version is controlled by a flag in the service properties.
I was not using the built-in service, but rather using Sun Cluster to start the server. In order to convince Sun Cluster to start the 64 bit version (I’m sure there is a better way to do this), one of my admins had made a symlink from the mysql/bin directory to the 64 bit binary directory. On the command line, you could no longer tell if the mysqld command was run from the 64 bit directory, and there doesn’t seem to be a built-in MySQL command which shows which version is currently running (show status, \s, show variables, etc.).
In the end I ran pldd on the process ID of the MySQL server. I am reasonably sure that I am running the 64 bit version of the server because all the shared libraries being used came from the /lib/sparcv9/ and /usr/lib/sparcv9/ directories, which hold the 64 bit libraries. I’m not sure if this method works on other OSes, but I thought it might be helpful to someone.
Good Luck!
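
A different check from the pldd approach above, and one that should work on any system with ELF binaries, is to look at the binary file itself. The fifth byte of an ELF header is the class: 1 for 32 bit, 2 for 64 bit. This Python sketch reads it. Opening a path follows symlinks, so it inspects the real binary, but it tells you about the file on disk, not about an already running process. The function names are mine.

```python
ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def elf_class_from_header(header):
    """Return '32-bit' or '64-bit' from the first bytes of an ELF file."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    # Byte 4 (EI_CLASS) encodes the word size of the binary.
    return ELF_CLASSES.get(header[4], "unknown")


def elf_class(path):
    """Open a binary (following symlinks) and report its ELF class."""
    with open(path, "rb") as binary:
        return elf_class_from_header(binary.read(5))
```

Calling elf_class with the path to your mysqld binary would report which build the symlink actually points at.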

Real Time Reporting Databases

Reporting projects are the kind of projects which never seem to end. After a couple of iterations I’ve come to the following conclusions:

  1. Absolutely no reports should run on a production database.
  2. Moving/aggregating data from a production database to a reporting database using ETL tools is prone to synchronization issues and pretty unreliable.
  3. The best option is to set up real time replication of the data and build additional views on that.

Unfortunately, if you need to get data from heterogeneous databases, i.e., Oracle, MySQL, SQL Server, etc., into a single reporting database, replication is not a simple solution. If you are running expensive database software in production, it may not be cost effective to run the same database for reporting.

Of course, there are cross-database replication solutions like GoldenGate or SharePlex, but they are very expensive. I had already given up on getting data from Oracle into MySQL for reports when I stumbled across Tungsten Replicator.

According to the website, Tungsten Replicator provides open source, database-neutral master/slave replication. Master/slave replication is a highly flexible technology that can solve a wide variety of problems, including cross-DBMS integration, i.e., replication from Oracle to MySQL.

I’m looking forward to testing this product in the near future and I’d be happy to get anyone’s input if they’ve used it.