Posts

Showing posts from January, 2019

Tutorial: Using pandas with Large Data Sets

https://www.dataquest.io/blog/pandas-big-data/

Adjust width of pandas dataframe

https://stackoverflow.com/questions/52369572/python-how-to-get-data-types-for-all-columns-in-csv-file
pd.set_option('max_info_columns', 200)
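A minimal sketch of how the option above might be combined with a dtype check on a CSV. The in-memory CSV and its column names (id, name, salary) are made up for illustration; with a real file you would pass its path to read_csv instead. The full option name display.max_info_columns is used here, which the short form above resolves to.

```python
import io

import pandas as pd

# Raise the column limit used by DataFrame.info() so wide frames list every column.
pd.set_option("display.max_info_columns", 200)

# Hypothetical in-memory CSV standing in for a real file on disk.
csv_text = "id,name,salary\n1,Ann,50000.5\n2,Bob,61000.0\n"
df = pd.read_csv(io.StringIO(csv_text))

# dtypes holds the type pandas inferred for each column.
column_types = df.dtypes.astype(str).to_dict()
print(column_types)

# info() prints one line per column as long as the frame has
# no more columns than display.max_info_columns.
df.info()
```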

Create a Python 3.6 environment on top of Python 3.7

cd d:\
python -m venv D:\mypython36_venv
rem cd d:\mypython36_venv
D:\mypython36_venv\Scripts\activate
D:\mypython36_venv\Scripts\deactivate
This one is for Python 2.7 on Raspberry Pi:
python -m virtualenv python_venv
cd python_venv
source bin/activate
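The same environment creation can also be driven from Python with the standard-library venv module. This is only a sketch: the helper name create_env and the temporary demo_venv folder are illustrative, and the environment always uses the interpreter that runs the script, just as python -m venv does.

```python
import tempfile
import venv
from pathlib import Path


def create_env(target: Path) -> Path:
    """Create a virtual environment at `target` and return the path of its pyvenv.cfg."""
    # with_pip=False keeps the sketch fast and avoids needing ensurepip.
    venv.create(target, with_pip=False)
    return target / "pyvenv.cfg"


# Build the environment in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    cfg = create_env(Path(tmp) / "demo_venv")
    cfg_exists = cfg.exists()
    print(cfg_exists)
```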

Find out where the MySQL data directory is

SHOW VARIABLES WHERE Variable_Name LIKE "%dir";
SHOW VARIABLES WHERE Variable_Name = "datadir";

datasets

https://www.chicago.gov/city/en/depts/dhr/dataset/current_employeenamessalariesandpositiontitles.html
Sourced from: https://janakiev.com/blog/pandas-groupby/

Check if Ambari is ready

https://ambari.apache.org/1.2.0/installing-hadoop-using-ambari/content/ambari-chap2-3.html
3. Start the Ambari Server
To start the Ambari Server: ambari-server start
To check the Ambari Server processes: ps -ef | grep Ambari
To stop the Ambari Server: ambari-server stop
https://cwiki.apache.org/confluence/display/AMBARI/Ambari+Shell
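Besides checking the process list, one rough readiness test is to see whether the server accepts TCP connections. The sketch below assumes Ambari is listening on its default web port 8080 on localhost; both values are assumptions, so adjust them for your setup. The helper name port_is_open is made up for this example.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


# Assumed host and port; an open port only suggests the server is up,
# it does not prove that every Ambari service has finished starting.
print(port_is_open("localhost", 8080))
```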

Install MySQL on Hortonworks sandbox

https://community.hortonworks.com/questions/78566/where-is-mysql-in-hdp-25.html
https://community.hortonworks.com/questions/203206/mysql-default-password-first-time-sandbox-login.html

Transfer files from Windows laptop to the Hortonworks VM

On your Windows laptop, open a new terminal window and navigate to the folder where you have stored your files. Execute the following command to transfer the files from your laptop to your Hortonworks VM:
scp -P 2222 * filetotransfer.txt root@sandbox-hdp.hortonworks.com:/tmp/data

Hortonworks Settings to change

Minimum replicated blocks %: change to zero.
This prevents HDFS from going into safe mode. While HDFS is in safe mode, YARN does not run properly.

YARN service starting problem

Check if HDFS is in safe mode:
hdfs dfsadmin -safemode get
Force HDFS to leave safe mode:
hdfs dfsadmin -safemode leave
HDFS safemode command:
hdfs dfsadmin -safemode [enter | leave | get]
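If you want to script the check, something like the following could wrap the get command. It assumes the hdfs CLI is on the PATH and that its output contains the text "Safe mode is ON" or "Safe mode is OFF"; the function names are made up for this sketch.

```python
import subprocess


def parse_safemode(output: str) -> bool:
    """Return True if the text reports safe mode ON, False if it reports OFF."""
    text = output.upper()
    if "SAFE MODE IS ON" in text:
        return True
    if "SAFE MODE IS OFF" in text:
        return False
    raise ValueError("Unrecognized safemode output: " + repr(output))


def hdfs_in_safemode() -> bool:
    """Run `hdfs dfsadmin -safemode get` and interpret its output."""
    # Assumes the hdfs command is installed and on the PATH.
    result = subprocess.run(
        ["hdfs", "dfsadmin", "-safemode", "get"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_safemode(result.stdout)
```

On the sandbox you could call hdfs_in_safemode() before starting YARN and run the leave command only when it returns True.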