Friday, February 6, 2015

PySpark example: parse, filter and aggregation

PYSPARK

# Load the CSV from HDFS and drop the header row
file = sc.textFile("hdfs://nodo1/claudio/partesin.csv")
header = file.first()
rows = file.filter(lambda line: line != header)
# Split each line into its comma-separated fields
enTuples = rows.map(lambda x: x.split(","))
# Key by column 16, sum column 18 per key, and collect 10 results
enKeyValuePairs = enTuples.map(lambda x: (x[16], int(x[18]))).reduceByKey(lambda a, b: a + b).take(10)
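The same parse → filter → aggregate pipeline can be sketched in plain Python, without a Spark cluster, which is handy for checking the column logic before submitting a job. This is a simplified sketch with hypothetical sample data and only two columns (a key and a value), standing in for columns 16 and 18 of the real file:

```python
from collections import defaultdict

# Hypothetical lines standing in for partesin.csv: a header plus data rows
lines = [
    "key,value",  # header
    "a,10",
    "b,5",
    "a,3",
]

header = lines[0]
rows = [line for line in lines if line != header]   # filter out the header
tuples = [line.split(",") for line in rows]         # parse each row
pairs = [(t[0], int(t[1])) for t in tuples]         # (key, value) pairs

# Equivalent of reduceByKey(lambda a, b: a + b)
totals = defaultdict(int)
for key, value in pairs:
    totals[key] += value

print(dict(totals))  # {'a': 13, 'b': 5}
```

In Spark the same reduction happens in parallel per partition, with a shuffle bringing equal keys together before the final sums are produced.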
