Run PySpark on Jupyter notebook

By Ms Dora Chua - April 17, 2019

$ which python
$ whereis python

# install EPEL repository first
$ sudo yum install epel-release
# install python-pip
$ sudo yum -y install python-pip

sudo pip install --upgrade setuptools

wget https://repo.anaconda.com/archive/Anaconda2-5.0.1-Linux-x86_64.sh

sudo sh Anaconda2-5.0.1-Linux-x86_64.sh

1) Install PySpark
pip install pyspark

2) Install Java

3) Install Jupyter notebook
pip install jupyter

4) Install findspark
pip install findspark
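
Step 2 lists no command, so here is a minimal sketch for checking whether Java is already available before starting Spark. It assumes Python 3 and only looks for a `java` executable on the PATH; the helper name `find_java` is made up for illustration and is not part of PySpark or findspark.

```python
import shutil
import subprocess


def find_java():
    """Return the full path of the `java` executable on PATH, or None if absent."""
    return shutil.which("java")


java_path = find_java()
if java_path is None:
    print("java not found on PATH; install a JDK before starting Spark")
else:
    # `java -version` writes its report to stderr rather than stdout
    result = subprocess.run(
        [java_path, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    print(result.stderr.strip())
```

If this reports that Java is missing, install a JDK with your system's package manager and run the check again.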

%env SPARK_HOME=c:\spark

# Locate the pyspark installation and add it to sys.path
import findspark
findspark.init()

# Creating Spark Context
from pyspark import SparkContext
sc = SparkContext("local", "first app")

# Counting words
text_file = sc.textFile("OneSentence.txt")
counts = text_file.flatMap(lambda line: line.split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)

# Printing each word with its respective count
output = counts.collect()
for (word, count) in output:
    print("{}: {}".format(word, count))

# Stopping Spark Context
sc.stop()
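
To sanity-check the Spark output without a running Spark context, the same flatMap / map / reduceByKey steps can be mirrored in plain Python. This is only a sketch: `word_count` is a hypothetical helper, and the sample sentence is a stand-in for the contents of OneSentence.txt, which the post does not show.

```python
from collections import Counter
from itertools import chain


def word_count(lines):
    """Count words the way the Spark job does, splitting on single spaces."""
    # flatMap step: split every line and flatten into one stream of words
    words = chain.from_iterable(line.split(" ") for line in lines)
    # map + reduceByKey steps: Counter adds 1 for each occurrence of a word
    return dict(Counter(words))


# Hypothetical sample input, not the real file contents
sample = ["to be or not to be"]
for word, count in sorted(word_count(sample).items()):
    print("{}: {}".format(word, count))
```

Like the Spark version, this splits on a single space, so consecutive spaces produce empty-string "words". Using `line.split()` with no argument avoids that in both versions.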

Install Python 3.5 on CentOS 6

sudo yum install centos-release-scl
sudo yum info rh-python35
sudo yum install rh-python35
# run without sudo so the new shell belongs to your own user, not root
scl enable rh-python35 bash
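
Once the software collection is enabled, a quick way to confirm that the new shell picks up the expected interpreter is to check the version from Python itself. A small sketch; `meets_minimum` is a made-up helper name.

```python
import sys


def meets_minimum(required, actual=None):
    """Return True if the `actual` version tuple is at least `required`."""
    if actual is None:
        # default to the version of the interpreter running this script
        actual = sys.version_info[:3]
    return tuple(actual) >= tuple(required)


print("Running Python {}.{}.{}".format(*sys.version_info[:3]))
print("At least 3.5:", meets_minimum((3, 5)))
```

Run it with `python` inside the shell opened by `scl enable`. If it still reports an older version, the collection is probably not active in that shell.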

https://www.2daygeek.com/3-methods-to-install-latest-python3-package-on-centos-6-system/
