Tuesday, March 24, 2015

Spark Streaming - Using the twitter4j package

Install conscript, which giter8 needs:
curl https://raw.githubusercontent.com/n8han/conscript/master/setup.sh | sh

Install giter8:
cs n8han/giter8

Create a basic project structure for sbt:
g8 chrislewis/basic-project

... and you'll see a directory tree like the one below:

[root@nodo1 ejemplo-test-1]# ls -lrt
total 4
drwxr-xr-x. 4 root root  28 Mar 19 10:12 src
-rw-r--r--. 1 root root 354 Mar 19 10:12 build.sbt
drwxr-xr-x. 3 root root  42 Mar 19 10:19 project
drwxr-xr-x. 5 root root  75 Mar 19 10:21 target

Download twitter4j.jar 3.0.3 and place it under ./target

Install sbt if it is not installed yet:
curl https://bintray.com/sbt/rpm/rpm | sudo tee /etc/yum.repos.d/bintray-sbt-rpm.repo
sudo yum install sbt

cd ejemplo-test-1
sbt
> compile
> package (produces the *.jar)
> run
[info] Running com.example.ejemplotest1.App
Hello com.example.ejemplo test 1![success] Total time: 0 s, completed Mar 19, 2015 10:22:09 AM

Under ./project, create assembly.sbt:

[root@nodo1 project]# more assembly.sbt
resolvers += Resolver.url("sbt-plugin-releases-scalasbt", url("http://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/"))
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")

In the project root (./), create build.sbt:

[root@nodo1 ejemplo-test-1]# more build.sbt
name := "ejemplo test 1"

organization := "com.example"

version := "0.1.0-SNAPSHOT"

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
   {
    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case x => MergeStrategy.first
   }
}

scalaVersion := "2.11.2"

crossScalaVersions := Seq("2.10.4", "2.11.2")

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.2.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.2.0"

libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"

initialCommands := "import com.example.ejemplotest1._"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

$ sbt assembly (if this fails, try running sbt compile first)
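
In build.sbt only spark-core and spark-streaming are marked "provided", so the assembly jar should bundle spark-streaming-twitter and twitter4j. One way to check this is to list the jar contents. This is a minimal sketch, assuming unzip is installed. The function names jar_contains and check_assembly are hypothetical helpers, and the two package paths are the ones those libraries normally use:

```shell
#!/bin/sh

# Return 0 if the jar at $1 has at least one entry whose path contains $2.
jar_contains() {
  unzip -l "$1" 2>/dev/null | grep -q "$2"
}

# Print, for each expected package path, whether the assembly bundles it.
# Returns non-zero if any of them is missing (or the jar cannot be read).
check_assembly() {
  jar_path="$1"
  status=0
  for entry in twitter4j/ org/apache/spark/streaming/twitter/; do
    if jar_contains "$jar_path" "$entry"; then
      echo "found:   $entry"
    else
      echo "missing: $entry"
      status=1
    fi
  done
  return "$status"
}
```

Usage, from the project root: check_assembly target/scala-2.11/ejemplo_test_1-assembly-0.1.0-SNAPSHOT.jar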

$ spark-submit --class com.example1.CollectTweets /opt/spark/ejemplo1/ejemplo-test-1/target/scala-2.11/ejemplo_test_1-assembly-0.1.0-SNAPSHOT.jar  XX YY ZZ WW
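
XX YY ZZ WW stand for the four arguments CollectTweets receives. Assuming they are the Twitter OAuth credentials in the order Spark's TwitterPopularTags example expects (consumer key, consumer secret, access token, access token secret), a small wrapper can read them from environment variables so they do not have to be typed on the command line. The variable and function names below are hypothetical:

```shell
#!/bin/sh

# Fail with a message if any of the four (hypothetically named)
# credential variables is unset or empty.
require_twitter_creds() {
  for name in TWITTER_CONSUMER_KEY TWITTER_CONSUMER_SECRET \
              TWITTER_ACCESS_TOKEN TWITTER_ACCESS_TOKEN_SECRET; do
    # Indirect lookup: read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      return 1
    fi
  done
}

# Submit the assembly jar, passing the credentials as the four
# positional arguments (order is an assumption, see above).
submit_collect_tweets() {
  require_twitter_creds || return 1
  spark-submit --class com.example1.CollectTweets \
    /opt/spark/ejemplo1/ejemplo-test-1/target/scala-2.11/ejemplo_test_1-assembly-0.1.0-SNAPSHOT.jar \
    "$TWITTER_CONSUMER_KEY" "$TWITTER_CONSUMER_SECRET" \
    "$TWITTER_ACCESS_TOKEN" "$TWITTER_ACCESS_TOKEN_SECRET"
}
```

After exporting the four variables, calling submit_collect_tweets runs the same spark-submit command as above. If one variable is missing, the function stops before Spark is started.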

Friday, March 6, 2015

Spark Streaming - Scala Twitter Popular Tags

[root@nodo1 spark]# pwd
/usr/local/spark

[root@nodo1 spark]# bin/spark-submit \
--class org.apache.spark.examples.streaming.TwitterPopularTags \
--master local[2]  \
./lib/spark-examples-1.2.1-hadoop2.4.0.jar \
pTNQtr6CSXdu YIDalLhnh5ys BbpliqLm3j5Ilzn GzjjzTmid