Map 예제
코드 예시(notebook 코드 참조)
import org.apache.spark._
val lines = sc.textFile("./data/redfox.txt")
lines: org.apache.spark.rdd.RDD[String] = ./data/redfox.txt MapPartitionsRDD[9] at textFile at <console>:34
println(lines.collect().toList)
List(The quick red fox jumped over the lazy brown dogs, I am donghwa)
val rageCaps = lines.map(x => x.toUpperCase)
rageCaps: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[10] at map at <console>:36
println(rageCaps.collect().toList(0))
THE QUICK RED FOX JUMPED OVER THE LAZY BROWN DOGS
flatMap
1) map을 사용할 때
val _words = lines.map(x => x.split(" "))
_words: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[11] at map at <console>:36
_words.collect()
res18: Array[Array[String]] = Array(Array(The, quick, red, fox, jumped, over, the, lazy, brown, dogs), Array(I, am, donghwa))
2) flatMap을 사용할 때
val words = lines.flatMap(x => x.split(" "))
words: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[12] at flatMap at <console>:36
words.collect()
res19: Array[String] = Array(The, quick, red, fox, jumped, over, the, lazy, brown, dogs, I, am, donghwa)
val input = sc.textFile("./data/book.txt")
input: org.apache.spark.rdd.RDD[String] = ./data/book.txt MapPartitionsRDD[16] at textFile at <console>:34
input.take(8).foreach(println)
Self-Employment: Building an Internet Business of One
Achieving Financial and Personal Freedom through a Lifestyle Technology Business
By Frank Kane
Copyright � 2015 Frank Kane.
All rights reserved worldwide.
val words = input.flatMap( x => x.split(" "))
words: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[17] at flatMap at <console>:36
words.take(15).foreach(println)
Self-Employment:
Building
an
Internet
Business
of
One
Achieving
Financial
and
Personal
Freedom
through
a
Lifestyle
val wordCounts= words.countByValue()
wordCounts: scala.collection.Map[String,Long] = Map(foolproof -> 1, precious -> 1, inflammatory -> 1, referrer, -> 1, hourly -> 3, embedded -> 1, way). -> 1, touch, -> 1, of. -> 3, salesperson -> 5, Leeches -> 1, expansion. -> 1, rate -> 7, appropriate. -> 2, CPA�s -> 1, 2014 -> 2, WELL-MEANING -> 1, Talk -> 5, Much -> 1, Builder" -> 1, plugin -> 3, headache -> 1, purchasing -> 9, China" -> 1, looks -> 2, site, -> 7, ranking -> 2, scare -> 1, hard-earned -> 1, freedom� -> 1, Seattle, -> 3, PULLING -> 1, action. -> 1, accident -> 3, scale. -> 2, looking. -> 1, physically -> 1, 27, -> 1, call. -> 1, contracts -> 3, twofold. -> 1, scenario -> 1, Advertising -> 3, way? -> 2, nudge -> 1, gamble -> 1, ideas -> 19, sketches -> 1, static -> 1, freelancer. -> 1, �PR:� -> 1, joining -> 1, particu...
wordCounts.take(10).foreach(println)
(foolproof,1)
(precious,1)
(inflammatory,1)
(referrer,,1)
(hourly,3)
(embedded,1)
(way).,1)
(touch,,1)
(of.,3)
(salesperson,5)