Showing posts from April, 2019

Run PySpark on Jupyter notebook

$ which python
$ whereis python

# install EPEL repository first
$ sudo yum install epel-release
# install python-pip
$ sudo yum -y install python-pip
$ sudo pip install --upgrade setuptools

$ wget https://repo.anaconda.com/archive/Anaconda2-5.0.1-Linux-x86_64.sh
$ sudo sh Anaconda2-5.0.1-Linux-x86_64.sh

1) Install PySpark
pip install pyspark

2) Install Java

3) Install Jupyter notebook
pip install jupyter

4) Install findspark
pip install findspark

Then, in a notebook:

# Set SPARK_HOME to the Spark install directory (the path here is a Windows example)
%env SPARK_HOME=c:\spark

# Locate the pyspark installation
import findspark
findspark.init()

# Creating Spark Context
from pyspark import SparkContext
sc = SparkContext("local", "first app")

# Calculating words count
text_file = sc.textFile("OneSentence.txt")
counts = text_file.flatMap(lambda line: line.split(" ")) \
             .map(lambda word: (word, 1)) \
             .reduceByKey(lambda a, b: a + b)

# Printing each word with its respective count
output = counts.collect()
for (word, count) in output:
    print("%s: %i" % (word, count))
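To sanity-check the Spark result on a small file without starting a SparkContext, the same count can be reproduced in plain Python. This is only a sketch: word_count is a hypothetical helper name, not part of PySpark, and the sample sentence is made up for illustration. It splits on a single space, as the Spark job does, so repeated spaces produce empty-string "words" in both versions.

```python
from collections import Counter


def word_count(lines):
    """Count words the way the Spark job does: split each line on a single space."""
    counts = Counter()
    for line in lines:
        # Mirrors flatMap(lambda line: line.split(" ")) followed by reduceByKey.
        counts.update(line.split(" "))
    return dict(counts)


if __name__ == "__main__":
    # Illustrative input; replace with the lines read from OneSentence.txt.
    sample = ["to be or not to be"]
    for word, count in sorted(word_count(sample).items()):
        print("%s: %i" % (word, count))
```

If the dictionary returned here matches dict(counts.collect()) from the notebook, the Spark pipeline is counting as expected.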

Useful icons

https://pngtree.com/free-icon/node-other-database-cluster_741016
https://www.clipartmax.com/

Set up Cloudera CDH 5.13 todo

Edit /etc/resolv.conf (for example, sudo nano /etc/resolv.conf)
Edit VirtualBox port forwarding, add port 22

Troubleshooting Hive

https://www.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_ig_hiveserver2_start_stop.html

Check if Hive metastore is running
$ sudo service hive-metastore status

Stop, start, or restart the Hive metastore
$ sudo service hive-metastore stop
$ sudo service hive-metastore start
$ sudo service hive-metastore restart

Check if HiveServer2 is running
$ sudo service hive-server2 status

Start HiveServer2:
$ sudo service hive-server2 start

To stop HiveServer2:
$ sudo service hive-server2 stop

Check Hive warehouse
$ hadoop fs -ls /user/hive/warehouse

Connect using Beeline
$ beeline -u jdbc:hive2://

Or, from inside the Beeline shell (no -u flag here):
!connect jdbc:hive2://
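When the service status looks fine but clients still cannot connect, it can help to check whether the port is actually accepting connections. A minimal sketch, assuming Python 3; port_is_open is a hypothetical helper name, and the ports mentioned below are the usual defaults (10000 for HiveServer2, 9083 for the metastore), which may differ on a given cluster.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, port_is_open("localhost", 10000) would test HiveServer2 and port_is_open("localhost", 9083) the metastore, assuming the default ports. A False result only says the TCP connection failed; it does not say why.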

Log in to Hortonworks Sandbox with WinSCP

ssh root@sandbox-hdp.hortonworks.com -p 2222

For the first login, use hadoop as the password. You will be asked to change the password upon first login.

Local SSH directory on Windows: C:\Users\<username>\.ssh

Kill yum process

yum list
ps -ef | grep 13023
kill -9 13023

(13023 here is the PID of the yum process to kill.)
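If the PID is not known up front, it can be pulled out of the ps -ef output by matching on the command name. A rough sketch, assuming the standard eight-column ps -ef layout (UID PID PPID C STIME TTY TIME CMD); find_pids is a hypothetical helper name.

```python
def find_pids(ps_output, name):
    """Return PIDs from `ps -ef` output whose command column contains name."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        # Split into at most 8 fields so the command keeps its spaces.
        fields = line.split(None, 7)
        if len(fields) == 8 and name in fields[7]:
            pids.append(int(fields[1]))
    return pids
```

The input can come from subprocess.check_output(["ps", "-ef"], text=True). Matching on a substring is loose, so review the returned PIDs before passing any of them to kill.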

Troubleshoot DNS on CDH

sudo nano /etc/resolv.conf

nameserver 8.8.8.8

# /etc/init.d/network stop
# /etc/init.d/network start
sudo systemctl restart network

https://www.cyberciti.biz/faq/linux-restart-network-interface/
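After the restart, it is worth confirming that the nameserver entry survived, since some network setups regenerate /etc/resolv.conf. A small sketch that lists the configured nameservers; the function name nameservers is hypothetical, and it only understands plain "nameserver <address>" lines.

```python
def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (starting with # or ;).
        if not line or line[0] in "#;":
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers
```

Calling nameservers(open("/etc/resolv.conf").read()) should include 8.8.8.8 if the edit above is still in place. A follow-up socket.gethostbyname("repo.anaconda.com") is one way to confirm that name resolution actually works.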