Dora the Techplorer

Posts

Showing posts from April, 2019

Run pyspark on Jupyter notebook

By Ms Dora Chua - April 17, 2019

$ which python
$ whereis python

# install EPEL repository first
$ sudo yum install epel-release

# install python-pip
$ sudo yum -y install python-pip
$ sudo pip install --upgrade setuptools

$ wget https://repo.anaconda.com/archive/Anaconda2-5.0.1-Linux-x86_64.sh
$ sudo sh Anaconda2-5.0.1-Linux-x86_64.sh

1) Install PySpark
pip install pyspark

2) Install Java

3) Install Jupyter notebook
pip install jupyter

4) Install findspark
pip install findspark

%env SPARK_HOME=c:\spark

# To find out where pyspark is installed
import findspark
findspark.init()

# Creating Spark Context
from pyspark import SparkContext
sc = SparkContext("local", "first app")

# Calculating words count
text_file = sc.textFile("OneSentence.txt")
counts = text_file.flatMap(lambda line: line.split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)

# Printing each word with its respective coun...
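To see what the flatMap / map / reduceByKey chain computes without starting Spark, here is a rough plain-Python sketch of the same three steps on an in-memory list of lines. The function name `word_count` and the sample sentences are made up for illustration; this is not how Spark runs the job internally, only a way to check the expected result by hand.

```python
from itertools import chain


def word_count(lines):
    """Mimic flatMap -> map -> reduceByKey on an in-memory list of lines."""
    # flatMap: split every line on single spaces and flatten into one stream of words
    words = chain.from_iterable(line.split(" ") for line in lines)
    # map: pair each word with an initial count of 1
    pairs = ((word, 1) for word in words)
    # reduceByKey: add up the counts that share the same word
    counts = {}
    for word, n in pairs:
        counts[word] = counts.get(word, 0) + n
    return counts


if __name__ == "__main__":
    # Hypothetical input standing in for the contents of OneSentence.txt
    sample = ["the quick brown fox", "jumps over the lazy dog"]
    for word, count in sorted(word_count(sample).items()):
        print("{}: {}".format(word, count))
```

Because the split is on a single space, consecutive spaces produce empty strings that get counted as a "word" too, which is also what the lambda in the Spark version does.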

Useful icons

By Ms Dora Chua - April 16, 2019

https://pngtree.com/free-icon/node-other-database-cluster_741016 https://www.clipartmax.com/

Set up Cloudera CDH 5.13 todo

By Ms Dora Chua - April 08, 2019

sudo /etc/resolv.conf
Edit VirtualBox port forwarding, add port 22

Troubleshooting Hive

By Ms Dora Chua - April 08, 2019

https://www.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_ig_hiveserver2_start_stop.html

Check if Hive metastore is running
$ sudo service hive-metastore status
$ sudo service hive-metastore stop
$ sudo service hive-metastore start
$ sudo service hive-metastore restart
$ sudo service hive-server2 stop

Check if HiveServer2 is running
$ sudo service hive-server2 status

Start HiveServer2:
$ sudo service hive-server2 start

To stop HiveServer2:
$ sudo service hive-server2 stop

Check Hive warehouse
$ hadoop fs -ls /user/hive/warehouse

Connect using Beeline
$ beeline -u jdbc:hive2://
Or, from inside the Beeline shell:
!connect jdbc:hive2://
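The status commands above tell you whether the service process is up. A complementary check is whether it actually accepts TCP connections. Below is a minimal sketch, assuming the services listen on localhost at the usual default ports (9083 for the metastore, 10000 for HiveServer2). Those ports and the helper name `port_is_open` are assumptions, not something from this post, so check your hive-site.xml if your setup differs.

```python
import socket
from contextlib import closing

# Assumed defaults, not taken from the post: verify against your hive-site.xml.
HIVE_PORTS = {"hive-metastore": 9083, "hive-server2": 10000}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # closing() makes sure the socket is released on Python 2 and 3 alike
        with closing(socket.create_connection((host, port), timeout=timeout)):
            return True
    except (socket.error, socket.timeout):
        # Connection refused, host unreachable, or timed out
        return False


if __name__ == "__main__":
    for name, port in sorted(HIVE_PORTS.items()):
        state = "listening" if port_is_open("localhost", port) else "not reachable"
        print("{} (port {}): {}".format(name, port, state))
```

If the service reports as running but the port is not reachable, the next place to look is usually the service log rather than the init script.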
