Channel: Dell TechCenter

Designing and Deploying VMware’s Big Data Extensions to enable Multi-Tenant Hadoop on VMware


Dell Solution Centers, in conjunction with Intel, have established a Cloud and Big Data program to deliver briefings, workshops, and Proofs of Concept focused on hyper-scale programs such as OpenStack and Hadoop. The program’s Modular Data Center contains over 400 servers, providing hyper-scale capacity so customers can test-drive their solutions.

In this blog series, Cloud Solution Architect Kris Applegate discusses some of the technologies he is exploring as part of this program – and shares some really useful tips! You and your customer can learn more about these solutions at one of our global Solution Centers; all have access to the Modular Data Center capability and we can engage remotely with customers who cannot travel to us.

*******************************************

Big Data is something that is on everyone’s mind. Extracting business value from the large volumes of data our modern environments generate takes a special kind of tool. Hadoop is a Big Data analytics framework that has been in use for years at many top web properties, research institutes, and Fortune 500 companies, and it provides a scalable and affordable way to store and analyze this data. Whether parsing log files at petabyte scale or offloading work from existing data warehouse applications, Hadoop has a strong foothold in today’s IT world.

Historically, Hadoop has been deployed in one of two ways:

  1. In public clouds: Amazon Web Services’ Elastic MapReduce is a cloud-based, pay-as-you-go Hadoop offering. It runs on the same VM infrastructure as the Amazon Web Services Elastic Compute Cloud (EC2). These two offerings, when combined, form a powerful tool chain for automating large-scale analytics and using the output in your other cloud workloads.
  2. On bare metal: Dell and a number of other companies have bare-metal, hardware-based solutions (both reference architectures and appliances) that allow customers to set up their own dedicated Hadoop clusters on premises. These solutions emphasize raw speed and persistent environments, running Linux on bare-metal PowerEdge servers dedicated to the Hadoop role.

However, there is an emerging set of use cases that would benefit from running a private Hadoop installation on a virtualized framework:

  1. The need for multiple tenants with distinct boundaries between processing and storage resources. These could be separate projects, separate departments, or even separate business processes (ad click-stream analysis versus fraud detection).
  2. The need to live inside an existing VMware-based private cloud. This lets you share the pool of resources across many different symbiotic workloads, using VMware resource pools to create and enforce limits for each one. Potential symbiotic workloads include Virtual Desktop Infrastructure, content distribution, and high-performance clustered computing.

Using VMware’s new vSphere 5.5 Big Data Extensions (BDE), you can enable these functions inside your VMware vSphere environment. It’s as simple as downloading the OVA and importing it into your existing environment. By default the basic Apache Foundation distribution of Hadoop is included, but it’s very straightforward to add a number of other commercial distributions, depending on your level of comfort.
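To give a feel for how an additional distribution is described to the management server, here is a sketch of a distro manifest entry. This is illustrative only: the file layout, field names, and role names are assumptions modeled on the open-source Serengeti project that Big Data Extensions is built on, so consult the BDE documentation for the exact schema and paths.

```json
[
  {
    "name": "cdh4",
    "vendor": "CDH",
    "version": "4.2.0",
    "packages": [
      {
        "tarball": "cdh/4/hadoop-2.0.0-cdh4.2.0.tar.gz",
        "roles": ["hadoop_namenode", "hadoop_jobtracker",
                  "hadoop_datanode", "hadoop_tasktracker",
                  "hadoop_client"]
      }
    ]
  }
]
```

Once the management server knows about an entry like this, the new distribution shows up as a choice in the cluster-creation workflow alongside the bundled Apache distribution.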

Once installed, you can begin creating your first virtual Hadoop cluster. You can specify your distribution; your topology (basic, compute/storage separation, HBase-only, or custom); and the number and size of VMs for each of the Hadoop roles (NameNode, client, DataNodes, etc.). Keep in mind that the options presented in the web interface are only a fraction of what can be done through the advanced command-line tools and API. Whether you are creating a small persistent Hadoop cluster or a simple one-time cluster focused on a single task, the real value here is the automation and self-service capability you can put in your users’ hands.
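To make the cluster definition concrete, the short Python sketch below assembles the kind of cluster-specification payload that a Serengeti-style CLI or REST API might consume: one node group per Hadoop role, each with a VM count and size. The field names (`nodeGroups`, `instanceNum`, `memCapacityMB`) and role names are assumptions modeled on the open-source Serengeti project, not a verified BDE schema.

```python
import json

def node_group(name, roles, instance_num, cpu=2, mem_mb=4096, disk_gb=50):
    """Build one node-group entry for a hypothetical cluster spec."""
    return {
        "name": name,
        "roles": roles,
        "instanceNum": instance_num,   # how many VMs in this group
        "cpuNum": cpu,                 # vCPUs per VM
        "memCapacityMB": mem_mb,       # memory per VM
        "storage": {"type": "shared", "sizeGB": disk_gb},
    }

def cluster_spec(name, distro="apache"):
    """Assemble a basic-topology spec: one master, three workers, one client."""
    return {
        "name": name,
        "distro": distro,
        "nodeGroups": [
            node_group("master", ["hadoop_namenode", "hadoop_jobtracker"],
                       1, cpu=4, mem_mb=8192),
            node_group("worker", ["hadoop_datanode", "hadoop_tasktracker"], 3),
            node_group("client", ["hadoop_client"], 1),
        ],
    }

print(json.dumps(cluster_spec("tenant-a-hadoop"), indent=2))
```

Per-tenant isolation then falls out naturally: each tenant gets its own spec, and the resulting VMs land in that tenant’s resource pool.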

Once you hit OK in the wizard, VMware Big Data Extensions clones the appropriate VMs and begins the fully automated tasks of building out the cluster. Once the cluster is running, you can even scale up (increase each VM’s memory and CPU resources) or scale out (increase the number of VMs). These operations can even be set to run automatically as load dictates.
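To make the scale-up versus scale-out distinction concrete, the sketch below models the two operations as request payloads a management API might accept, plus a toy load-driven policy of the kind hinted at above. The parameter names and the policy thresholds are illustrative assumptions, not the documented BDE interface.

```python
def scale_out(cluster, node_group, target_instances):
    """Scale out: request more VMs in a node group (e.g. more workers)."""
    return {"cluster": cluster, "nodeGroup": node_group,
            "instanceNum": target_instances}

def scale_up(cluster, node_group, cpu, mem_mb):
    """Scale up: request bigger VMs in a node group (more vCPU / memory)."""
    return {"cluster": cluster, "nodeGroup": node_group,
            "cpuNum": cpu, "memCapacityMB": mem_mb}

def next_action(cluster, utilization, workers, max_workers):
    """Toy policy: widen the cluster while headroom exists, then grow VMs."""
    if utilization > 0.8 and workers < max_workers:
        return scale_out(cluster, "worker", workers + 1)
    if utilization > 0.8:
        return scale_up(cluster, "worker", cpu=4, mem_mb=8192)
    return None  # load is fine; leave the cluster alone
```

For example, `next_action("tenant-a-hadoop", 0.9, 3, 5)` would ask for a fourth worker VM, while the same load with all five workers already deployed would ask for larger worker VMs instead.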

 

VMware’s Big Data Extensions have successfully lowered the barriers to entry for harnessing the power of Big Data analytics. Whether you are giving Hadoop a test drive or further enhancing your private cloud’s capabilities, Dell and VMware are here with a solution.

 

