
Automate Deployment of Hadoop Clusters with Dell Crowbar and Cloudera Manager


by Mike Pittaro

Deploying, managing, and operating Apache Hadoop clusters can be complex at every level of the stack, from the hardware on up. To hide this complexity and reduce deployment time, Dell has been using Dell Crowbar in conjunction with Cloudera Manager since 2011 to deploy the Dell | Cloudera solution.

Cloudera Manager does a great job of deploying and managing the Hadoop layers of a cluster, but it depends on an operating system being in place first. Dell Crowbar complements those capabilities: it is a complete automated operations platform, designed to deploy every layer of infrastructure, from bare-metal servers all the way up the stack.

In the Dell | Cloudera solution, we use Crowbar to provision the hardware, configure it, and install Red Hat Enterprise Linux and Cloudera Manager; Cloudera Manager then takes over to guide the user to a functioning cluster. Furthermore, we use the Cloudera Manager API (http://cloudera.github.io/cm_api/) to automate the cluster setup process entirely.

In this post, I’ll provide more details about how we have successfully integrated the Cloudera Manager API with Dell Crowbar.

Crowbar Overview

Dell Crowbar, inspired by DevOps principles, is based on the concept of defining hardware and software configuration in code, just like applications. Crowbar uses a modular approach, where each component of the stack is deployed as an independent unit. The definition of each component is a Crowbar module called a “Barclamp”.

Figure 1 shows the Barclamps in a typical Hadoop installation, from hardware configuration and supporting functions like DNS and NTP all the way up to the Cloudera components.

Figure 1: Crowbar Barclamps for Hadoop

During the deployment process, the Crowbar interface is used to create a “proposal” based on a Barclamp. Within the proposal editor, the hardware nodes are assigned their intended roles, and then the proposal is saved and applied. A single node can have several roles assigned depending on its function, and Barclamps are aware of dependencies between roles. For example, to deploy CDH, only the single Cloudera Manager proposal needs to be applied; Crowbar then takes care of all the other requirements.
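Conceptually, a proposal boils down to a mapping from role names to the nodes that should fill them. Here is a minimal sketch of such an assignment, written as a Ruby hash; the actual proposal format is Crowbar-internal JSON, and the node names are invented for illustration:

    # Hypothetical role-to-node assignments captured by a proposal.
    role_assignments = {
      'clouderamanager-server'   => ['admin1'],
      'clouderamanager-namenode' => ['name1', 'name2'],          # HA pair
      'clouderamanager-datanode' => ['data1', 'data2', 'data3'],
      'clouderamanager-edgenode' => ['edge1'],
    }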

Cloudera Manager Barclamp

Figure 2 shows the Cloudera Manager proposal within Crowbar. The Barclamp defines seven roles available for nodes within the cluster:

  • Clouderamanager-cb-adminnode – the Crowbar admin node itself
  • Clouderamanager-server – the node running the Cloudera Manager server
  • Clouderamanager-namenode – the nodes running the NameNode, whether active/passive or quorum-based HA is used
  • Clouderamanager-datanode – the cluster data nodes
  • Clouderamanager-edgenode – an edge or gateway node for client tools
  • Clouderamanager-ha-journalingnode – a quorum-based journaling node for quorum HA
  • Clouderamanager-ha-filernode – an NFS filer node, for active/passive HA using a shared NFS mount

Figure 2: Cloudera Manager Barclamp proposal showing nodes and roles

In the Crowbar interface, available hardware nodes are dragged to the appropriate roles in the proposal, and then the proposal is applied. At that point, Crowbar analyzes dependencies and makes any required changes to the nodes. If the hardware nodes are new, these changes might involve hardware configuration and a complete OS install; if the nodes already have an OS installed, they might simply involve installing some additional packages.

In the proposal editor, the roles defined within Crowbar closely correspond to the typical roles of nodes in a Hadoop cluster and can be mapped almost directly to their corresponding Hadoop services within Cloudera Manager. So, we decided to use this information to integrate with the Cloudera Manager API. 
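As a hedged illustration of that mapping, the hash below pairs Crowbar roles with the Cloudera Manager role types they would translate to. The CM role type names (NAMENODE, DATANODE, TASKTRACKER, GATEWAY) come from the published API; the exact grouping shown here is an assumption, not the actual cm-server.rb logic, and details such as where the JobTracker lands are omitted:

    # Illustrative Crowbar-role to CM-role-type mapping.
    CROWBAR_TO_CM = {
      'clouderamanager-namenode' => ['NAMENODE'],                # HDFS master
      'clouderamanager-datanode' => ['DATANODE', 'TASKTRACKER'], # HDFS + MapReduce workers
      'clouderamanager-edgenode' => ['GATEWAY'],                 # client configuration only
    }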

The integration is enabled by setting the deployment mode in the proposal to “auto mode”, as shown in Figure 3. In auto mode, applying the proposal will configure the hardware and OS as necessary, install Cloudera Manager on the edge node, install the agents on the remaining nodes, and then use the Cloudera Manager API to create the cluster based on the Crowbar roles. The install also sets up a local Yum repository for all the CDH packages, so deployment can proceed without full Internet access.

Figure 3: Auto deployment mode in the Cloudera Manager Barclamp
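As a sketch of how that local repository might be laid down, the Chef resource below writes a Yum repo definition pointing at a mirror on the Crowbar admin node. Barclamps are built on Chef, so this is ordinary Chef Ruby; the admin-node address, repo id, and path are hypothetical:

    # A minimal sketch: point Yum at a CDH mirror on the admin node.
    admin_node = 'admin.crowbar.local' # hypothetical admin-node address

    file '/etc/yum.repos.d/cloudera-cdh-local.repo' do
      content <<~REPO
        [cloudera-cdh-local]
        name=Local CDH mirror on the Crowbar admin node
        baseurl=http://#{admin_node}/repos/cloudera/cdh
        enabled=1
        gpgcheck=0
      REPO
    end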

Behind the Scenes with the API

Crowbar is implemented primarily in Ruby, and the Barclamps are wrappers around Opscode Chef, which also uses Ruby for its recipes and cookbooks. This means that the automatic deployment implementation in Crowbar needed to be in Ruby as well.

The Cloudera Manager API provides client libraries for Java and Python, but not Ruby, so the first step in implementing the integration was to create a Ruby client library for the API. The API is standards-based, using REST with JSON as the data format, so this was a relatively straightforward process. The Ruby library parallels the Python implementation. (The library is currently Apache-licensed as part of Crowbar, but could be split out in the future if there’s community interest.)
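To give a feel for what such a library involves, here is a minimal sketch of a Ruby REST client in the same spirit, built on nothing but the standard library. It is not the actual Crowbar code; the port is Cloudera Manager's default, and credentials and error handling are simplified:

    require 'net/http'
    require 'json'

    # A bare-bones client for the Cloudera Manager REST API.
    class CmApiClient
      def initialize(host, user, password, version = 'v1')
        @base = "/api/#{version}"
        @http = Net::HTTP.new(host, 7180) # CM's default web/API port
        @user, @password = user, password
      end

      # GET a resource and parse the JSON body, e.g. get('/hosts').
      def get(path)
        request(Net::HTTP::Get.new(@base + path))
      end

      # POST a JSON payload, e.g. post('/clusters', 'items' => [...]).
      def post(path, payload)
        req = Net::HTTP::Post.new(@base + path, 'Content-Type' => 'application/json')
        req.body = JSON.generate(payload)
        request(req)
      end

      private

      def request(req)
        req.basic_auth(@user, @password)
        JSON.parse(@http.request(req).body)
      end
    end

With a client like this, asking Cloudera Manager for the hosts it knows about is a one-liner such as client.get('/hosts').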

The actual cluster deployment logic is in the file cm-server.rb. It turns out to be relatively straightforward, and follows the flow described in “How-to: Automate Your Hadoop Cluster from Java”. The information about the nodes, their roles, and their configuration is already available in Crowbar, so the code primarily iterates through the Crowbar data structures and makes the calls to create the cluster, the HDFS service, and the MapReduce service. Since Cloudera Manager also uses a role-based approach, this mapping turns out to be very clean.
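The sketch below shows roughly what that flow looks like, using the hypothetical CmApiClient from the earlier example. The “items” wrapper and the hostRef structure follow the published v1 API conventions; the cluster name, service names, and host list are invented, and in the real code the hosts come from Crowbar's proposal data:

    api = CmApiClient.new('cm-server.example.com', 'admin', 'admin')

    # Create the cluster itself.
    api.post('/clusters',
             'items' => [{ 'name' => 'cluster1', 'version' => 'CDH4' }])

    # Create the HDFS and MapReduce services within it.
    api.post('/clusters/cluster1/services',
             'items' => [{ 'name' => 'hdfs1',      'type' => 'HDFS' },
                         { 'name' => 'mapreduce1', 'type' => 'MAPREDUCE' }])

    # Turn the role assignments into per-host CM roles; in Crowbar this
    # list comes from the proposal rather than being hard-coded.
    datanode_hosts = ['data1', 'data2', 'data3']
    datanode_hosts.each_with_index do |host, i|
      api.post('/clusters/cluster1/services/hdfs1/roles',
               'items' => [{ 'name'    => "hdfs-datanode-#{i}",
                             'type'    => 'DATANODE',
                             'hostRef' => { 'hostId' => host } }])
    end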

There’s even some logic there to handle licensing. The Cloudera Manager API corresponds directly to Cloudera’s packaging, which includes Standard (free) and Enterprise (paid support, with a 60-day trial option) versions. If a Cloudera license is entered in the Crowbar proposal, it is used; otherwise, the Enterprise trial license is activated. Our current integration uses the free APIs for HDFS and MapReduce configuration.
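The decision itself amounts to a simple branch along these lines. Uploading to /cm/license is part of the published API; the trial endpoint and the proposal lookup shown here are assumptions for illustration:

    # Stand-in for the license field in the Crowbar proposal data.
    proposal = { 'attributes' => { 'license_key' => nil } } # hypothetical lookup
    license_text = proposal['attributes']['license_key']

    if license_text && !license_text.empty?
      # Upload the customer's license (the real call sends the license
      # file contents; the exact encoding is glossed over here).
      api.post('/cm/license', 'license' => license_text)
    else
      # No license supplied: fall back to the 60-day Enterprise trial
      # (endpoint name is an assumption and varies by CM API version).
      api.post('/cm/trial/begin', {})
    end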

So far, the automatic deployment capability has significantly reduced the time to deploy a cluster, especially if there are a large number of nodes. More important, it eliminates the error-prone manual process of re-entering the node and role information into the Cloudera Manager wizard. (This is a new feature, and we’re still collecting feedback on enhancements.)

We currently set up the core HDFS and MapReduce services for the cluster. In the future, we will likely configure more services as part of the automatic deployment. Cloudera Manager role groups provide another interesting opportunity for further integration, since we have information about the actual hardware configuration in Crowbar, and can start grouping nodes together based on hardware-specific features.

In the meantime, Dell and Cloudera continue to collaborate on other new features to make sure the various systems and technologies integrate effectively, providing the easiest way to set up and manage Hadoop clusters.

Mike Pittaro (@pmikeyp) is Principal Architect on Dell's Cloud Software Solutions team. He has a background in high performance computing, data warehousing, and distributed systems, specializing in designing and developing big data solutions.

