Hello friends, I hope you are all doing well.
In this blog, we consider a situation where we want to read a CSV file with Spark, but the CSV contains some timestamp columns. Is this going to be a problem when inferring the schema at the time of reading the CSV with Spark?
Well, the answer may be no, if the CSV has its timestamp fields in the specific yyyy-MM-dd HH:mm:ss format. In that case, the Spark CSV reader can infer them as timestamps, since this is the default format.
id,name,age,joining_date,wedding_date
1,Joseph,25,1999-09-04 15:50:46,2014-11-22 00:00:00
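To see why the exact format matters, here is a minimal, Spark-free sketch using only Python's standard library. The pattern %Y-%m-%d %H:%M:%S is the strptime equivalent of yyyy-MM-dd HH:mm:ss; the function name and sample values are illustrative, not part of Spark's API:

```python
from datetime import datetime

# Python strptime equivalent of the yyyy-MM-dd HH:mm:ss default format
DEFAULT_FMT = "%Y-%m-%d %H:%M:%S"

def parses_as_timestamp(value: str) -> bool:
    """Return True if the string matches the default timestamp format."""
    try:
        datetime.strptime(value, DEFAULT_FMT)
        return True
    except ValueError:
        return False

print(parses_as_timestamp("2014-11-22 00:00:00"))  # matches the default format
print(parses_as_timestamp("22/11/2014 00:00"))     # a non-default format
```

A value in any other layout would fail this check, which is essentially why Spark falls back to treating such columns as strings.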
When you read the schema of the DataFrame after reading the CSV, you will see that every field has been inferred correctly by the CSV reader:
root
 |-- id: integer (nullable = true)
 |-- name: string (nullable = true)
 |-- age: integer (nullable = true)
 |-- joining_date: timestamp (nullable = true)
 |-- wedding_date: timestamp (nullable = true)
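Spark's actual inference logic is more involved, but the idea behind the schema above can be sketched with the standard library: for each column, try the narrower types first (integer, then timestamp in the default format) and fall back to string. This is an illustrative approximation, not Spark's implementation:

```python
import csv
import io
from datetime import datetime

def infer_type(values):
    """Crude per-column type inference: integer, then timestamp, else string."""
    candidates = [
        (int, "integer"),
        (lambda v: datetime.strptime(v, "%Y-%m-%d %H:%M:%S"), "timestamp"),
    ]
    for cast, name in candidates:
        try:
            for v in values:
                cast(v)  # raises ValueError if the value does not fit this type
            return name
        except ValueError:
            continue
    return "string"

data = """id,name,age,joining_date,wedding_date
1,Joseph,25,1999-09-04 15:50:46,2014-11-22 00:00:00
"""

reader = csv.DictReader(io.StringIO(data))
rows = list(reader)
schema = {col: infer_type([row[col] for row in rows]) for col in reader.fieldnames}
print(schema)
# → {'id': 'integer', 'name': 'string', 'age': 'integer',
#    'joining_date': 'timestamp', 'wedding_date': 'timestamp'}
```

The result mirrors the printSchema output above: the two date columns are recognized as timestamps only because every value matches the default format.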
But, what if the timestamp fields in the csv…