Finally Spark 2 runs on CDH 5.15! :)



These are the things I needed to set up or troubleshoot before it got running:

Upgrade Cloudera Manager

Upgrade to the latest version by following the instructions on this page:
https://www.cloudera.com/documentation/enterprise/5-13-x/topics/cm_ag_upgrading_cm.html

Note: to get Cloudera Manager Express to run on a VM with limited memory, use --force:
sudo /home/cloudera/cloudera-manager --pause --express --force

The important steps are as follows:
1) Download cloudera-manager.repo with wget https://archive.cloudera.com/cm5/redhat/6/x86_64/cm/cloudera-manager.repo
2) Put cloudera-manager.repo inside /etc/yum.repos.d/
3) Run these commands:
sudo yum clean all
sudo yum upgrade cloudera-manager-server cloudera-manager-daemons cloudera-manager-agent
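
The three steps above can be strung together roughly as follows. This is only a sketch, not a tested upgrade script: the `run` helper, the `DRY_RUN` switch and the /tmp download location are my own additions for illustration. By default it just prints the commands so they can be checked first.

```shell
#!/usr/bin/env bash
# Sketch of the three upgrade steps. DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"
REPO_URL="https://archive.cloudera.com/cm5/redhat/6/x86_64/cm/cloudera-manager.repo"

# run: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

upgrade_cm() {
  # 1) Download the repo file (the /tmp location is an assumption)
  run wget -O /tmp/cloudera-manager.repo "$REPO_URL"
  # 2) Put it inside /etc/yum.repos.d/
  run sudo mv /tmp/cloudera-manager.repo /etc/yum.repos.d/
  # 3) Refresh yum and upgrade the Cloudera Manager packages
  run sudo yum clean all
  run sudo yum upgrade cloudera-manager-server cloudera-manager-daemons cloudera-manager-agent
}

upgrade_cm
```

Once the printed commands look right, running the script with DRY_RUN=0 executes them for real.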

Upgrade JDK 1.7 to 1.8

Download JDK 1.8 or the latest version of the JDK and unpack it to a folder of your choice.
I put mine in /usr/java/jdk1.8
Set the JAVA_HOME path in .bashrc
Set JAVA_HOME in Cloudera Manager->Hosts->Configuration->Java Home Directory
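
For the .bashrc part, the lines would look something like this. The path is the one I used above. The PATH line and the version check are assumptions on my part (a common convention, not something Cloudera Manager requires).

```shell
# Lines to append to ~/.bashrc. The path is the one used in this post;
# adjust it to wherever the JDK was actually unpacked.
export JAVA_HOME=/usr/java/jdk1.8
# Assumption: put the JDK's bin directory first so `java` resolves to this JDK.
export PATH="$JAVA_HOME/bin:$PATH"

# Optional sanity check: report the JDK that JAVA_HOME points to, if present.
if [ -x "$JAVA_HOME/bin/java" ]; then
  "$JAVA_HOME/bin/java" -version
else
  echo "No java binary found under $JAVA_HOME" >&2
fi
```

Open a new terminal (or run source ~/.bashrc) for the change to take effect.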

Upgrade CDH to the latest version

Add this URL under Parcels->Configuration->Remote Parcel Repository URLs:
https://archive.cloudera.com/cdh5/parcels/latest/

Download Spark 2 Parcel

Add this URL under Parcels->Configuration->Remote Parcel Repository URLs:
https://archive.cloudera.com/spark2/parcels/latest/

Add the Spark2 CSD

Download http://archive.cloudera.com/spark2/csd/SPARK2_ON_YARN-2.1.0.cloudera2.jar
Put it in /opt/cloudera/csd

Distribute and activate the CDH and Spark 2 parcels

Go to Parcels
Click Distribute
Then click Activate

Adjust the Java heap size (not sure if this was really necessary)

sudo nano /etc/default/cloudera-scm-server
Change CMF_JAVA_OPTS and set the heap size (-Xmx) to 4 GB instead of the default 2 GB:
export CMF_JAVA_OPTS="-Xmx4G -XX:MaxPermSize=512m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"
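
If you would rather not edit the file by hand, the -Xmx change can be scripted along these lines. This is a sketch: `set_cm_heap` is a hypothetical helper name, and the demo runs on a scratch file containing a sample line modelled on the one above (with the 2G default), not on the real /etc/default/cloudera-scm-server.

```shell
# set_cm_heap FILE SIZE: rewrite the first -Xmx value on each line of FILE to SIZE (e.g. 4G).
# Writes to FILE.new and then replaces FILE, so it needs write access to FILE's directory.
set_cm_heap() {
  sed -E "s/-Xmx[0-9]+[GgMmKk]?/-Xmx$2/" "$1" > "$1.new" && mv "$1.new" "$1"
}

# Demonstrate on a scratch file rather than the real config.
demo="$(mktemp)"
echo 'export CMF_JAVA_OPTS="-Xmx2G -XX:MaxPermSize=512m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"' > "$demo"
set_cm_heap "$demo" 4G
cat "$demo"
```

On the real file this would have to run as root, and the Cloudera Manager server needs a restart before the new heap size is picked up.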

Create directories for Spark and give them write permission

sudo -u hdfs hdfs dfs -chmod 777 /user/spark

sudo -u hdfs hadoop fs -chmod 777 /user/spark
sudo -u hdfs hadoop fs -mkdir /user/spark/spark2ApplicationHistory
sudo -u hdfs hadoop fs -chmod 777 /user/spark/spark2ApplicationHistory

sudo -u spark hadoop fs -chmod 777 /user/spark/applicationHistory
sudo -u spark hadoop fs -chmod 777 /user/spark/spark2ApplicationHistory

https://community.cloudera.com/t5/Hadoop-101-Training-Quickstart/CDH-5-5-VirtualBox-unable-to-connect-to-Spark-Master-Worker/td-p/34491
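
The commands above can be collapsed into one loop over the three directories. This is a sketch under a few assumptions: `spark_dir_cmds` is a made-up helper name, it runs everything as the hdfs user (I used the spark user for some of the commands above), and it adds -mkdir -p so that directories that already exist do not cause an error. It only prints the commands.

```shell
# spark_dir_cmds: print (not run) the HDFS commands for each Spark directory.
spark_dir_cmds() {
  for dir in /user/spark /user/spark/applicationHistory /user/spark/spark2ApplicationHistory; do
    echo "sudo -u hdfs hadoop fs -mkdir -p $dir"
    echo "sudo -u hdfs hadoop fs -chmod 777 $dir"
  done
}

# Review the commands first; pipe them to a shell only once they look right:
#   spark_dir_cmds | bash
spark_dir_cmds
```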

Run pyspark2

pyspark2

textFile = spark.read.text("/loudacre/salesStaff.csv")
textFile.count()


