Wednesday, January 28, 2015

# Installing Python libraries: numpy, scipy, scikit-learn, pandas, etc.

yum install python-pip
yum install gcc-c++
pip install pandas
yum install numpy scipy python-matplotlib python-nose
pip install -U scikit-learn
pip install "ipython[notebook]"
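After the installs above, a quick sanity check can confirm the libraries are actually importable. This is a minimal sketch (the helper name and module list are just illustrative; adjust to whatever you installed):

```python
import importlib.util

def missing_modules(names):
    """Return the subset of `names` that cannot be found by the import system."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Check the stack installed above; anything printed still needs installing.
for mod in missing_modules(["numpy", "scipy", "pandas", "sklearn", "matplotlib"]):
    print("missing: " + mod)
```

Note: `importlib.util.find_spec` is Python 3; on the Python 2.6/2.7 boxes discussed below you would use `imp.find_module` instead.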
To install Pydoop, first check that a JDK is installed, then follow the steps below:
$ readlink -f $(which java)
/usr/lib/jvm/java-6-oracle/jre/bin/java
$ export JAVA_HOME=/usr/lib/jvm/java-6-oracle
$ export HADOOP_HOME=/opt/hadoop
$ git clone https://github.com/crs4/pydoop.git
$ cd pydoop
$ export LD_LIBRARY_PATH="${JAVA_HOME}/jre/lib/amd64/server:${LD_LIBRARY_PATH}"
$ python setup.py build
$ sudo python setup.py install --skip-build
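The JAVA_HOME step above just strips the trailing jre/bin/java from the resolved java binary path. A small helper (hypothetical, purely for illustration) captures the idea:

```python
import os.path

def jdk_root(java_binary_path):
    """Given the resolved java binary path (what `readlink -f $(which java)` prints),
    strip a trailing jre/bin/java or bin/java to get a JAVA_HOME candidate."""
    path = os.path.normpath(java_binary_path)
    for suffix in (os.path.join("jre", "bin", "java"), os.path.join("bin", "java")):
        if path.endswith(os.sep + suffix):
            return path[: -(len(suffix) + 1)]
    return None  # not a recognizable java binary path

print(jdk_root("/usr/lib/jvm/java-6-oracle/jre/bin/java"))
```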

Monday, January 19, 2015

Pain in the backside setting up IPython on CentOS 6.5

The first pain in the arse is that you are tied to Python 2.6: if you upgrade to 2.7 (tempting, because the newest versions of IPython, i.e. 2.x, are supported only on Python 2.7) you break Yum, and that is not a good idea. To be clear: CentOS 6.5 ships with Python 2.6, Yum depends on it, and the system Python can't be changed.

So IPython recommends installing 1.0 if you need to stay on Python 2.6. You have to download the source and install it manually (don't forget to install all the dependencies first):
untar the tarball and run: python setup.py install
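The constraint above can be written down as a tiny helper, just to make the rule explicit (the function name is made up; it only encodes what this post states: IPython 2.x needs Python 2.7, while 1.x still runs on 2.6):

```python
def ipython_series_for(python_version):
    """Return the newest IPython series usable on a (major, minor) Python version,
    per the rule above: 2.x requires Python 2.7, 1.x still supports 2.6."""
    if python_version >= (2, 7):
        return "2.x"
    if python_version == (2, 6):
        return "1.x"
    return None  # older Pythons: out of scope here
```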

The next step is to start the IPython server listening on any IP, as by default it listens only on localhost:8888:

ipython notebook --ip='*'
The errors it logs, "TypeError: super() argument 1 must be type, not classobj",
are not relevant; you can still use IPython without issue.
If you are using Vagrant, remember to forward the port to get access from the browser at http://127.0.0.1:8888.
Et voilà!
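Before opening the browser, you can verify the forwarded port is actually reachable with a quick TCP probe. A minimal sketch (the helper name is mine; host and port match the defaults above):

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. port_open("127.0.0.1", 8888) once the notebook server is up
```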


Sunday, January 18, 2015

Quick trick: updating Python to version 2.7 on CentOS 6.5

yum install centos-release-SCL
yum install python27
Then, if you want to use it in your shell, run something like:
scl enable python27 bash

Disclaimer: this is a trick for using Python 2.7 in a separate environment, without removing 2.6, because removing it would break Yum on this CentOS version.
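To confirm which interpreter is active inside (or outside) the SCL shell, a one-liner check against sys.version_info does the job. A sketch (the helper name is illustrative):

```python
import sys

def is_at_least(major, minor):
    """True if the running interpreter is at least version major.minor."""
    return sys.version_info[:2] >= (major, minor)

if not is_at_least(2, 7):
    print("still on the system Python; run: scl enable python27 bash")
```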

Thursday, January 15, 2015

Heads up: starting Hadoop

If you see this when starting or stopping Hadoop:


[hadoop@dev ~]$ start-dfs.sh 
13/10/25 22:21:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable Starting namenodes on [Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /home/hadoop/2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now....... 
sed: -e expression #1, char 6: unknown option to `s' HotSpot(TM): ssh: 
Could not resolve hostname HotSpot(TM): Name or service not known 64-Bit: ssh: 
Could not resolve hostname 64-Bit: 
Name or service not known 

The workaround is to set the following environment variables:
export HADOOP_HOME=/usr/local/hadoop 
export PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native 
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
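The same exports can be generated programmatically, which is handy if you launch Hadoop from a Python wrapper. A sketch under that assumption (hadoop_env is a hypothetical helper, not a Hadoop API):

```python
import os

def hadoop_env(hadoop_home, base_env=None):
    """Build the environment overrides equivalent to the exports above."""
    env = dict(base_env if base_env is not None else os.environ)
    env["HADOOP_HOME"] = hadoop_home
    env["PATH"] = os.path.join(hadoop_home, "bin") + os.pathsep + env.get("PATH", "")
    env["HADOOP_COMMON_LIB_NATIVE_DIR"] = os.path.join(hadoop_home, "lib", "native")
    env["HADOOP_OPTS"] = "-Djava.library.path=" + os.path.join(hadoop_home, "lib")
    return env
```

You would then pass the result as the env argument of subprocess when invoking start-dfs.sh.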

Cheat sheet for Hadoop, up, up!!!

[root@hulk sbin]# hadoop version
Hadoop 2.4.1
Subversion http://svn.apache.org/repos/asf/hadoop/common -r 1604318
Compiled by jenkins on 2014-06-21T05:43Z
Compiled with protoc 2.5.0
From source with checksum bb7ac0a3c73dc131f4844b873c74b630
This command was run using /usr/local/hadoop-2.4.1/share/hadoop/common/hadoop-common-2.4.1.jar

Check Hadoop is running (in Hadoop 2.x, hdfs dfsadmin replaces the deprecated hadoop dfsadmin)
[root@hulk sbin]# hdfs dfsadmin -report
List the content of home directory
$ hdfs dfs -ls /user/claudio
Upload file from local file to HDFS
$ hdfs dfs -put songs.txt /user/claudio
Cat the content of the file from HDFS
$ hdfs dfs -cat /user/claudio/songs.txt
Change permissions to file
$ hdfs dfs -chmod 700 /user/claudio/songs.txt
Set the replication factor of the file to 4
$ hdfs dfs -setrep -w 4 /user/claudio/songs.txt
Check the file size
$ hdfs dfs -du -h /user/claudio/songs.txt
Create subdirectory in your home directory
$ hdfs dfs -mkdir songs
Move files across
$ hdfs dfs -mv songs.txt songs/
Remove directory from HDFS
$ hdfs dfs -rm -r songs
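All of these commands can be scripted from Python by building the argv for subprocess. A minimal sketch (hdfs_dfs and run_hdfs are hypothetical helpers; run_hdfs obviously needs a working Hadoop install to do anything):

```python
import subprocess

def hdfs_dfs(*args):
    """Return the argv for an 'hdfs dfs' subcommand,
    e.g. hdfs_dfs('-ls', '/user/claudio')."""
    return ["hdfs", "dfs"] + list(args)

def run_hdfs(*args):
    """Run the command and return its stdout (requires Hadoop on PATH)."""
    return subprocess.check_output(hdfs_dfs(*args))

# e.g. run_hdfs("-put", "songs.txt", "/user/claudio")
```

This is the same pattern Pydoop wraps more completely with a real HDFS API, which is why it was worth installing above.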