Finally Spark 2 runs on CDH 5.15! :)
These are the things I needed to set up or troubleshoot before it ran:
Upgrade Cloudera Manager
Upgrade to the latest version by following the instructions at https://www.cloudera.com/documentation/enterprise/5-13-x/topics/cm_ag_upgrading_cm.html
Note: to get Cloudera Manager Express to run on a VM with less memory, use --force:
sudo /home/cloudera/cloudera-manager --pause --express --force
The important steps are as follows:
1) Download cloudera-manager.repo with wget https://archive.cloudera.com/cm5/redhat/6/x86_64/cm/cloudera-manager.repo
2) Put cloudera-manager.repo inside /etc/yum.repos.d/
3) Run these commands:
sudo yum clean all
sudo yum upgrade cloudera-manager-server cloudera-manager-daemons cloudera-manager-agent
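The three steps above can be scripted. Here is a minimal Python sketch, assuming a RHEL/CentOS 6 host with sudo rights. The helper names `upgrade_commands` and `run_upgrade` are my own and not part of any Cloudera tooling, and it downloads the repo file straight into /etc/yum.repos.d/ with wget -O, which merges steps 1 and 2.

```python
import subprocess

REPO_URL = "https://archive.cloudera.com/cm5/redhat/6/x86_64/cm/cloudera-manager.repo"
REPO_FILE = "/etc/yum.repos.d/cloudera-manager.repo"
PACKAGES = [
    "cloudera-manager-server",
    "cloudera-manager-daemons",
    "cloudera-manager-agent",
]


def upgrade_commands():
    """Return the commands for steps 1-3, in order, as argument lists."""
    return [
        # Steps 1 and 2: download the repo file directly to its final location.
        ["sudo", "wget", "-O", REPO_FILE, REPO_URL],
        # Step 3: refresh yum metadata, then upgrade the Cloudera Manager packages.
        ["sudo", "yum", "clean", "all"],
        ["sudo", "yum", "upgrade"] + PACKAGES,
    ]


def run_upgrade(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in upgrade_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling run_upgrade() only prints the commands. Pass dry_run=False once they look right; yum will still ask for confirmation before upgrading.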
Upgrade JDK 1.7 to 1.8
Download JDK 1.8 (or a later JDK) and unpack it to a folder of your choice.
I put mine in /usr/java/jdk1.8
Set JAVA_HOME to that path in .bashrc
Set the same path in Cloudera Manager->Hosts->Configuration->Java Home Directory
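Before restarting anything, it helps to confirm that the new Java home really holds a 1.8 JDK. Here is a small Python sketch, assuming the usual first line of `java -version` output (for example: java version "1.8.0_181"). The function names are hypothetical helpers of mine.

```python
import os
import re
import subprocess


def parse_java_version(version_output):
    """Extract the quoted version string (e.g. '1.8.0_181') from `java -version` output."""
    match = re.search(r'version "([^"]+)"', version_output)
    return match.group(1) if match else None


def is_java_8(version_output):
    """True if the reported version is a 1.8.x release."""
    version = parse_java_version(version_output)
    return version is not None and version.startswith("1.8.")


def check_java_home(java_home):
    """Run <java_home>/bin/java -version and report whether it is Java 8."""
    java_bin = os.path.join(java_home, "bin", "java")
    # `java -version` writes its output to stderr, not stdout.
    result = subprocess.run(
        [java_bin, "-version"],
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return is_java_8(result.stderr)
```

With my layout the call would be check_java_home("/usr/java/jdk1.8").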
Upgrade CDH to latest version
Parcels->Configuration->Remote Parcel Repository URLs: add https://archive.cloudera.com/cdh5/parcels/latest/
Download Spark 2 Parcel
Parcels->Configuration->Remote Parcel Repository URLs: add https://archive.cloudera.com/spark2/parcels/latest/
Add the Spark2 CSD
Download http://archive.cloudera.com/spark2/csd/SPARK2_ON_YARN-2.1.0.cloudera2.jar
Put it in /opt/cloudera/csd
Distribute and activate the CDH and Spark 2 parcels
Go to Parcels, then click Distribute
Then click Activate
Adjust Java heap size (not sure if this was really necessary)
sudo nano /etc/default/cloudera-scm-server
Change CMF_JAVA_OPTS and set the maximum heap size (the -Xmx parameter) to 4 GB instead of the default 2 GB:
export CMF_JAVA_OPTS="-Xmx4G -XX:MaxPermSize=512m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"
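If you would rather not edit the file by hand, the change comes down to swapping one flag. Here is a Python sketch of that substitution. `set_max_heap` is a hypothetical helper, and the sample input line is only an illustration of what the file may contain, so check your own copy first.

```python
import re


def set_max_heap(line, new_size):
    """Replace the -Xmx value in a CMF_JAVA_OPTS line, leaving the other flags untouched."""
    # Match -Xmx followed by digits and an optional unit suffix (k, m or g).
    return re.sub(r"-Xmx[0-9]+[kKmMgG]?", "-Xmx" + new_size, line)


# Illustrative input; the actual line in /etc/default/cloudera-scm-server may differ.
sample = 'export CMF_JAVA_OPTS="-Xmx2G -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"'
print(set_max_heap(sample, "4G"))
```

Writing the result back to /etc/default/cloudera-scm-server needs root, and I assume the Cloudera Manager server has to be restarted before the new heap size takes effect.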
Create directory for spark and give it write permission
sudo -u hdfs hadoop fs -chmod 777 /user/spark
sudo -u hdfs hadoop fs -mkdir /user/spark/spark2ApplicationHistory
sudo -u hdfs hadoop fs -chmod 777 /user/spark/spark2ApplicationHistory
sudo -u spark hadoop fs -chmod 777 /user/spark/applicationHistory
sudo -u spark hadoop fs -chmod 777 /user/spark/spark2ApplicationHistory
https://community.cloudera.com/t5/Hadoop-101-Training-Quickstart/CDH-5-5-VirtualBox-unable-to-connect-to-Spark-Master-Worker/td-p/34491
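The directory commands above follow one pattern per path, so they can be generated instead of typed. Here is a Python sketch under my own assumptions: `hdfs_setup_commands` is a hypothetical helper, everything runs as the hdfs user, and -mkdir -p is used so that an already existing directory is not an error.

```python
import subprocess

# The two history directories used in this post.
HISTORY_DIRS = (
    "/user/spark/applicationHistory",
    "/user/spark/spark2ApplicationHistory",
)


def hdfs_setup_commands(dirs=HISTORY_DIRS, mode="777"):
    """Return a mkdir and a chmod command (as argument lists) for each directory."""
    commands = []
    for path in dirs:
        commands.append(["sudo", "-u", "hdfs", "hadoop", "fs", "-mkdir", "-p", path])
        commands.append(["sudo", "-u", "hdfs", "hadoop", "fs", "-chmod", mode, path])
    return commands


def run_hdfs_setup(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in hdfs_setup_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Mode 777 is what worked for me on a single-node VM; on a shared cluster you would probably want something tighter.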
Run pyspark2
pyspark2
textFile = spark.read.text("/loudacre/salesStaff.csv")
textFile.count()
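To make sure the shell is really Spark 2 and not the Spark 1.6 that ships with CDH 5, the same smoke test can be wrapped in two small functions. This is only a sketch: `is_spark2` and `count_lines` are names I made up, and both expect the `spark` session object that the pyspark2 shell creates on startup.

```python
def is_spark2(spark):
    """True if the session reports a 2.x Spark version."""
    return spark.version.startswith("2.")


def count_lines(spark, path):
    """Read a text file through the session and return its number of lines."""
    return spark.read.text(path).count()


# Inside the pyspark2 shell, where `spark` already exists:
#   is_spark2(spark)
#   count_lines(spark, "/loudacre/salesStaff.csv")
```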