Sunday, May 26, 2019

AWS Policy editor

Thursday, May 2, 2019

Download files from Databricks Community Edition

Syntax is:<yournumber>

Wednesday, April 17, 2019

Run PySpark in a Jupyter notebook

$ which python
$ whereis python

# install EPEL repository first
$ sudo yum install epel-release
# install python-pip
$ sudo yum -y install python-pip

sudo pip install --upgrade setuptools


sudo sh

1) Install PySpark
pip install pyspark

2) Install Java

3) Install Jupyter notebook
pip install jupyter

4) Install findspark
pip install findspark
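
Before opening a notebook, it can help to confirm that the packages from steps 1, 3 and 4 are importable by the interpreter Jupyter will use. A minimal sketch using only the standard library; the helper name missing_packages is hypothetical:

```python
import importlib.util


def missing_packages(names):
    """Return the names that the current interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["pyspark", "jupyter", "findspark"])
    print("Missing packages:", missing or "none")
```

If a package shows up as missing even though pip reported success, pip and the notebook are probably using different Python installations.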

%env SPARK_HOME=c:\spark
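
The %env line above only works inside a notebook. In a plain Python script the same variable could be set through os.environ. A small sketch, where ensure_spark_home is a hypothetical helper and c:\spark is simply the path used in this post:

```python
import os


def ensure_spark_home(default_path):
    """Set SPARK_HOME to default_path unless it is already defined.

    Returns the value that is in effect afterwards.
    """
    return os.environ.setdefault("SPARK_HOME", default_path)


if __name__ == "__main__":
    print("SPARK_HOME =", ensure_spark_home(r"c:\spark"))
```

Using setdefault keeps any value already exported by the shell, so the script does not silently override an existing Spark installation.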

# Locate the PySpark installation and add it to sys.path
import findspark
findspark.init()

# Creating Spark Context
from pyspark import SparkContext
sc = SparkContext("local", "first app")

# Calculating words count
text_file = sc.textFile("OneSentence.txt")
counts = text_file.flatMap(lambda line: line.split(" ")) \
             .map(lambda word: (word, 1)) \
             .reduceByKey(lambda a, b: a + b)

# Printing each word with its respective count
output = counts.collect()
for (word, count) in output:
    print("{}: {}".format(word, count))

# Stopping Spark Context
sc.stop()
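
To sanity-check the Spark output on a small file, the same word count can be reproduced without Spark. This is only a sketch in plain Python, not a replacement for the job above; word_count is a hypothetical helper and the sample lines are made up:

```python
from collections import Counter


def word_count(lines):
    """Count words like the Spark job does: split each line on single spaces."""
    counts = Counter()
    for line in lines:
        counts.update(line.split(" "))
    return dict(counts)


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog"]
    for word, count in word_count(sample).items():
        print("{}: {}".format(word, count))
```

Note that split(" ") produces empty strings when a line contains consecutive spaces, both here and in the Spark version. Using line.split() with no argument would avoid counting them.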

Install Python 3.5 on CentOS 6

sudo yum install centos-release-scl
sudo yum info rh-python35
sudo yum install rh-python35
sudo scl enable rh-python35 bash
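
Once scl enable has opened the new shell, the active interpreter can be checked from Python itself. A minimal sketch; meets_minimum is a hypothetical helper, not part of any library:

```python
import sys


def meets_minimum(version_info, minimum=(3, 5)):
    """Return True if the (major, minor, ...) tuple is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    print("Running Python", sys.version.split()[0])
    print("At least 3.5:", meets_minimum(sys.version_info))
```

If this still reports a 2.x version, the script is most likely being run outside the shell that scl enable started.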

Tuesday, April 16, 2019

Useful icons

Monday, April 8, 2019

Set up Cloudera CDH 5.13 todo

sudo vi /etc/resolv.conf

Edit the VirtualBox port forwarding rules and add port 22

Troubleshooting Hive

Check if the Hive metastore is running:
$ sudo service hive-metastore status

Stop, start, or restart the Hive metastore:
$ sudo service hive-metastore stop
$ sudo service hive-metastore start
$ sudo service hive-metastore restart

Check if HiveServer2 is running:
$ sudo service hive-server2 status

Start HiveServer2:
$ sudo service hive-server2 start

Stop HiveServer2:
$ sudo service hive-server2 stop
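
The service status commands report whether the daemons are running, but not whether they accept connections. A rough way to test that is to try opening a TCP connection. The sketch below assumes the default ports (9083 for the metastore, 10000 for HiveServer2) and a local install; port_open is a hypothetical helper:

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed default ports; adjust if hive-site.xml overrides them.
    for name, port in [("hive-metastore", 9083), ("hive-server2", 10000)]:
        state = "reachable" if port_open("localhost", port) else "not reachable"
        print(name, state)
```

A service that shows as running but is not reachable usually points to a different configured port or a firewall rule.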

Check Hive warehouse
hadoop fs -ls /user/hive/warehouse

Connect using Beeline
beeline -u jdbc:hive2://

Or, from inside the Beeline prompt (no -u flag):
!connect jdbc:hive2://

Wednesday, April 3, 2019

Login Hortonworks Sandbox with WinSCP

ssh -p 2222

On first login, use hadoop as the password.
You will be asked to change the password immediately after logging in.


