Distributed Computing (Data, Hadoop, Elasticsearch and more…)

Elastic

My talk on building scalable reporting solutions with Elasticsearch: https://youtu.be/Z8CFWqqP3Jo

Interesting article about running Elasticsearch on AWS: https://www.elastic.co/blog/running-elasticsearch-on-aws

A good read on scaling Elasticsearch writes: http://news.appbase.io/scaling-elasticsearch-writes/

Apache Spark

Spark Architecture (very important to understand): https://0x0fff.com/spark-architecture/

Spark SQL (we use this a lot to analyse data on HDFS): https://spark.apache.org/docs/latest/sql-programming-guide.html

Spark Streaming (worth trying for near-real-time stream processing that can tolerate medium latency, since it processes data in micro-batches): http://spark.apache.org/streaming/

How to submit Spark jobs from a remote host: http://theckang.com/2015/remote-spark-jobs-on-yarn/

HBase

HBase Schema Design by Lars George: https://youtu.be/_HLoH_PgrLk