How does Apache Spark read a Parquet file?
In this post I will explain what happens when Apache Spark reads a Parquet file. Apache Parquet is a popular columnar storage format that stores a dataset as a collection of files, typically on HDFS. In a separate post I will cover the internals of Parquet in more detail; here we focus on what happens when you call
val parquetFileDF = spark.read.parquet("intWithPayload.parquet")
as documented in the Spark SQL programming guide.
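To make the call above self-contained, here is a minimal sketch of the surrounding setup. It assumes a local Spark session (in `spark-shell` the `spark` variable already exists) and uses the same file name as above; the app name and `local[*]` master are placeholder choices for a standalone run.

```scala
import org.apache.spark.sql.SparkSession

// Build a SparkSession; in spark-shell this is already provided as `spark`.
val spark = SparkSession.builder()
  .appName("ParquetReadExample")
  .master("local[*]")   // run locally with all cores; on a cluster this differs
  .getOrCreate()

// spark.read returns a DataFrameReader; parquet(...) reads the file's footer,
// recovers the schema stored there, and returns a DataFrame.
val parquetFileDF = spark.read.parquet("intWithPayload.parquet")

// The schema is inferred from the Parquet metadata, not from the data itself.
parquetFileDF.printSchema()
```

Note that no data is scanned at this point: reading the footer for the schema is cheap, and the actual column data is only fetched when an action (such as `count()` or `show()`) is executed.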