Sunday, May 26, 2019

AWS Policy editor

https://awspolicygen.s3.amazonaws.com/policygen.html


Thursday, May 2, 2019

Download files from Databricks Community Edition

The download URL syntax is:

https://community.cloud.databricks.com/files/myStuffs/yourdata.csv/part-00000?o=<yournumber>
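The URL pattern above can be assembled programmatically. This is a minimal sketch, assuming the file sits under the path that Databricks serves at /files/. The helper name databricks_file_url and the sample organization ID are hypothetical; substitute the number from your own workspace URL.

```python
from urllib.parse import quote

BASE_URL = "https://community.cloud.databricks.com/files/"

def databricks_file_url(path_under_files, org_id):
    """Build a download URL for a file served under /files/.

    path_under_files: e.g. "myStuffs/yourdata.csv/part-00000"
    org_id: the number that follows "?o=" in your workspace URL
    """
    # Percent-encode unusual characters but keep the path separators.
    encoded_path = quote(path_under_files.strip("/"), safe="/")
    return "{}{}?o={}".format(BASE_URL, encoded_path, org_id)

# Hypothetical organization ID, for illustration only.
print(databricks_file_url("myStuffs/yourdata.csv/part-00000", 1234567890))
```

Paste the printed URL into a browser that is already logged in to Community Edition.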


Wednesday, April 17, 2019

Run PySpark in a Jupyter notebook

$ which python
$ whereis python

# install EPEL repository first
$ sudo yum install epel-release
# install python-pip
$ sudo yum -y install python-pip

sudo pip install --upgrade setuptools

wget https://repo.anaconda.com/archive/Anaconda2-5.0.1-Linux-x86_64.sh

sudo sh Anaconda2-5.0.1-Linux-x86_64.sh

1) Install PySpark
pip install pyspark

2) Install Java

3) Install Jupyter notebook
pip install jupyter

4) Install findspark
pip install findspark

%env SPARK_HOME=c:\spark

# Locate the PySpark installation and add it to sys.path
import findspark
findspark.init()

# Creating Spark Context
from pyspark import SparkContext
sc = SparkContext("local", "first app")

# Counting the occurrences of each word
text_file = sc.textFile("OneSentence.txt")
counts = text_file.flatMap(lambda line: line.split(" ")) \
             .map(lambda word: (word, 1)) \
             .reduceByKey(lambda a, b: a + b)

# Printing each word with its respective count
output = counts.collect()
for (word, count) in output:
    print("{}: {}".format(word, count))

# Stopping Spark Context
sc.stop()
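To sanity-check the Spark result on a small input, the same counting logic can be written in plain Python without a Spark context. This is only a sketch for comparison, not part of the Spark job; the function name word_count and the sample lines are made up for illustration.

```python
from collections import Counter

def word_count(lines):
    """Count words the way the Spark job does: split each line on a single space."""
    counts = Counter()
    for line in lines:
        # Equivalent of flatMap(split) + map((word, 1)) + reduceByKey(add)
        counts.update(line.split(" "))
    return dict(counts)

# Sample input is made up for illustration.
sample = ["the quick brown fox", "the lazy dog"]
for word, count in sorted(word_count(sample).items()):
    print("{}: {}".format(word, count))
```

Note that split(" ") yields empty strings when a line contains consecutive spaces, so both versions count "" as a word in that case; line.split() with no argument avoids this.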



Install Python 3.5 on CentOS 6

sudo yum install centos-release-scl
sudo yum info rh-python35
sudo yum install rh-python35
sudo scl enable rh-python35 bash

https://www.2daygeek.com/3-methods-to-install-latest-python3-package-on-centos-6-system/


Tuesday, April 16, 2019

Useful icons

https://pngtree.com/free-icon/node-other-database-cluster_741016
https://www.clipartmax.com/


Monday, April 8, 2019

Set up Cloudera CDH 5.13 todo

Edit /etc/resolv.conf as root, for example:
sudo vi /etc/resolv.conf

Edit the VirtualBox port-forwarding rules and add port 22


Troubleshooting Hive

https://www.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_ig_hiveserver2_start_stop.html

Check, stop, start, or restart the Hive metastore
sudo service hive-metastore status
sudo service hive-metastore stop
sudo service hive-metastore start
sudo service hive-metastore restart


Check if HiveServer2 is running:
$ sudo service hive-server2 status

Start HiveServer2:
$ sudo service hive-server2 start

Stop HiveServer2:
$ sudo service hive-server2 stop

Check Hive warehouse
hadoop fs -ls /user/hive/warehouse

Connect using Beeline (embedded mode, from the shell)
beeline -u jdbc:hive2://
Or, from inside the Beeline prompt (no -u flag here)
!connect jdbc:hive2://
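The empty URL above connects in embedded mode. To reach a HiveServer2 that runs as a separate service, the URL takes the form jdbc:hive2://host:port/database, with 10000 as the default port. The sketch below only builds that string; the helper name hive_jdbc_url is hypothetical, and the host is an assumption to replace with your own.

```python
def hive_jdbc_url(host=None, port=10000, database="default"):
    """Return a HiveServer2 JDBC URL for use with `beeline -u`.

    With no host, return the embedded-mode URL used in the notes above.
    """
    if host is None:
        return "jdbc:hive2://"
    return "jdbc:hive2://{}:{}/{}".format(host, port, database)

print(hive_jdbc_url())             # embedded mode
print(hive_jdbc_url("localhost"))  # HiveServer2 service on the default port
```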


Wednesday, April 3, 2019

Log in to the Hortonworks Sandbox with WinSCP

ssh root@sandbox-hdp.hortonworks.com -p 2222

On the first login, use hadoop as the password.
You will be asked to change the password immediately.
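If the ssh command hangs or is refused, it helps to confirm that the forwarded port is reachable before looking at credentials. This is a minimal sketch using only the Python standard library; the function name port_is_open is hypothetical, and the host and port are the ones from the ssh command above.

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False

if __name__ == "__main__":
    # Host and port are taken from the ssh command above.
    print(port_is_open("sandbox-hdp.hortonworks.com", 2222))
```

A False result points at the port-forwarding rule or the hosts-file entry rather than at the password.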

C:\Users\<username>\.ssh
