Feel free to skip any of the steps below if you already have the relevant packages installed.
brew install caskroom/cask/brew-cask
Get Vagrant & Vagrant plugins
brew cask install virtualbox
brew cask install vagrant
brew cask install vagrant-manager
vagrant plugin install vagrant-hostmanager
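Before continuing, it may be worth confirming that the tools installed above are on your PATH. This is a minimal sketch: `check_tool` is a helper name made up for illustration, and it assumes VirtualBox provides its usual `VBoxManage` command-line tool.

```shell
# check_tool is a hypothetical helper: it succeeds if the command is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Report each prerequisite without aborting on the first missing one.
for tool in VBoxManage vagrant; do
  check_tool "$tool" || echo "install $tool before continuing" >&2
done
```

If both are found, running `vagrant plugin list` should show vagrant-hostmanager among the installed plugins.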
git clone email@example.com:richardhe-awin/vagrant-hadoop-cluster.git
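The steps here stop at cloning, so the VMs still need to be started before Cloudera Manager becomes reachable. The following is a sketch only: it assumes the Vagrantfile sits at the top level of the cloned repository (not confirmed here), and `start_cluster` is a hypothetical helper name.

```shell
# start_cluster is a hypothetical helper. Assumption: the Vagrantfile is at
# the top level of the directory passed in (default: the cloned repository).
start_cluster() {
  dir="${1:-vagrant-hadoop-cluster}"
  if [ ! -f "$dir/Vagrantfile" ]; then
    echo "no Vagrantfile in $dir" >&2
    return 1
  fi
  # Run in a subshell so the caller's working directory is left unchanged.
  (cd "$dir" && vagrant up && vagrant status)
}
```

Call `start_cluster` from the directory where you ran git clone, or pass the path to the checkout as the first argument.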
Configure Cloudera Manager (mostly adapted from http://blog.cloudera.com/blog/2014/06/how-to-install-a-virtual-apache-hadoop-cluster-with-vagrant-and-cloudera-manager/)
- Go to http://hadoop-master:7180/ (the service may take a few minutes to boot before the page is available) and log in with admin/admin.
- Choose the Express version and continue.
- When you are asked for the host names, enter hadoop-node1 and hadoop-node2 and click Search. The two hosts should appear in the results; confirm them to continue.
- Keep the default options until you reach the page asking “Login to all hosts as”. Change this to “Another user”, enter “vagrant” as the username, and enter “vagrant” again in the password fields. Click Next and the installation should start (this will take a while).
- On the “Cluster Setup” page, choose “Custom Services” and select the following: HDFS, Hive, Hue, Impala, Oozie, Solr, Spark, Sqoop2, YARN and ZooKeeper. Click Continue.
- On the next page, you can choose which services run on which nodes. Cloudera Manager usually picks a sensible layout here, but you can change it if you want. For now, click Continue.
- On the “Database Setup” page, leave it on “Use Embedded Database.” Click Test Connection (it says it will skip this step) and click Continue.
- Click Continue on the “Review Changes” step. Cloudera Manager will now try to configure and start all services.
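The first step above involves waiting for Cloudera Manager to boot. Rather than refreshing the browser, you could poll the port from a terminal. This is a sketch under assumptions: `wait_for_cm` is a hypothetical helper, the attempt count and delay are arbitrary defaults, and it relies on `curl` being installed.

```shell
# wait_for_cm is a hypothetical helper: poll a URL until it answers.
# Arguments: URL, number of attempts, seconds between attempts.
wait_for_cm() {
  url="${1:-http://hadoop-master:7180/}"
  attempts="${2:-60}"
  delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failure; -s: quiet; -L: follow redirects.
    if curl -fsL -o /dev/null --max-time 5 "$url"; then
      echo "up: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up waiting for $url" >&2
  return 1
}
```

Calling `wait_for_cm` with no arguments checks http://hadoop-master:7180/ every 10 seconds for up to 60 attempts; once it prints the "up" line, the login page should load in the browser.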