Latest Spark nightly master builds and docs

Binaries are available at

http://people.apache.org/~pwendell/spark-nightly/spark-master-bin/latest/

You can see the latest docs at

http://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/

Prepare Spark API docs for offline browsing

Download the source code from the Spark website

https://spark.apache.org/downloads.html

Then run the following command from the root of the extracted source tree

mvn scala:doc

This will build scaladocs for all the projects.

Go to the target folder of the module you are interested in and open the generated scaladocs.
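
A minimal end-to-end sketch, assuming the 1.6.0 source tarball (adjust the version for your download; the exact scaladoc output path may vary by Spark version):

wget https://archive.apache.org/dist/spark/spark-1.6.0/spark-1.6.0.tgz
tar -xzf spark-1.6.0.tgz
cd spark-1.6.0
mvn scala:doc
# per-module scaladocs typically land under target/site/scaladocs, e.g.
ls core/target/site/scaladocs/index.html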

If you want the nightly release code for the latest version

Go to the URLs below to get the code and see the docs

http://people.apache.org/~pwendell/spark-nightly/

https://repository.apache.org/content/repositories/snapshots/org/apache/spark/
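
To pull a snapshot build into your own project, you can point sbt at the Apache snapshots repository. A minimal build.sbt sketch, assuming a 1.6.0-SNAPSHOT version (check the repository above for the current snapshot version):

// build.sbt
resolvers += "Apache Snapshots" at "https://repository.apache.org/content/repositories/snapshots/"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0-SNAPSHOT"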

Spark Pivot example

Spark 1.6 adds pivot functionality for DataFrames (see the pull request linked under References).

Let's try that out.

Create a simple file with the following data

cat /tmp/sample.csv
language,year,earning
net,2012,10000
java,2012,20000
net,2012,5000
net,2013,48000
java,2013,30000
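
If you want to create the file in one step, a shell heredoc works (same data as above):

cat > /tmp/sample.csv <<'EOF'
language,year,earning
net,2012,10000
java,2012,20000
net,2012,5000
net,2013,48000
java,2013,30000
EOF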

Start the Spark shell with the spark-csv package

bin/spark-shell --packages "com.databricks:spark-csv_2.10:1.2.0"

Load the sample file

scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("file:///tmp/sample.csv")
df: org.apache.spark.sql.DataFrame = [language: string, year: int, earning: int]
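
Before pivoting, it is worth confirming what inferSchema produced; a quick check on the same df:

scala> df.printSchema
root
 |-- language: string (nullable = true)
 |-- year: integer (nullable = true)
 |-- earning: integer (nullable = true)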

Run a simple pivot (in the released 1.6 API, pivot takes the list of pivot values as a Seq)

scala> df.groupBy("language").pivot("year", Seq(2012, 2013)).agg(sum("earning")).show
+--------+-----+-----+
|language| 2012| 2013|
+--------+-----+-----+
|    java|20000|30000|
|     net|15000|48000|
+--------+-----+-----+
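
If you leave out the value list, Spark computes the distinct values of the pivot column itself and produces the same table here; listing the values up front, as above, skips that extra pass:

scala> df.groupBy("language").pivot("year").agg(sum("earning")).show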

References

https://github.com/apache/spark/pull/7841

http://sqlhints.com/2014/03/10/pivot-and-unpivot-in-sql-server/