PYSPARK
file = sc.textFile("hdfs://nodo1/claudio/partesin.csv")  # load the CSV from HDFS as an RDD of lines
header = file.take(1)[0]  # the first line is the header
rows = file.filter(lambda line: line != header)  # drop the header row
enTuples = rows.map(lambda x: x.split(","))  # split each line into fields
# key by column 16, sum column 18 as an int per key, then collect the first 10 results
enKeyValuePairs = enTuples.map(lambda x: (x[16], int(x[18]))).reduceByKey(lambda a, b: a + b).take(10)
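The same pipeline logic (take the header, filter it out, split each row, then sum values per key as `reduceByKey` does) can be sketched in plain Python without a Spark cluster. The sample rows and the two-column layout below are illustrative assumptions, not the actual contents of partesin.csv:

```python
from collections import defaultdict

# Illustrative stand-in for the CSV lines (assumed format, not the real file).
lines = [
    "name,amount",  # header row
    "alice,10",
    "bob,5",
    "alice,7",
]

header = lines[0]                          # take(1)[0]
rows = [l for l in lines if l != header]   # filter out the header line
tuples = [l.split(",") for l in rows]      # map: split each line on commas

# reduceByKey(lambda a, b: a + b): sum the amounts per key
totals = defaultdict(int)
for t in tuples:
    totals[t[0]] += int(t[1])

print(dict(totals))  # {'alice': 17, 'bob': 5}
```

In the real job the key and value come from columns 16 and 18 rather than 0 and 1, and Spark performs the per-key reduction in parallel across partitions instead of in a single loop.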