Apache Spark: Reading CSV using custom timestamp format

Knoldus

Hello Friends, hope you all are doing well.

In this blog, we consider a situation where we want to read a CSV file through Spark, but the CSV contains some timestamp columns. Is this going to be a problem when Spark infers the schema while reading the CSV?

Well, the answer may be no, if the CSV has its timestamp fields in the specific yyyy-MM-dd HH:mm:ss format. In this particular case, the Spark CSV reader can infer them as timestamps, since it treats this as the default format.

id,name,age,joining_date,wedding_date
1,Joseph,25,1999-09-04 45:50:46,2014-11-22 00:00:00

When you print the schema of the dataframe after reading the CSV, you will see that every field has been inferred correctly by Spark:

root
 |-- id: integer (nullable = true)
 |-- name: string (nullable = true)
 |-- age: integer (nullable = true)
 |-- joining_date: timestamp (nullable = true)
 |-- wedding_date: timestamp (nullable = true)
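
A read that produces a schema like the one above might look roughly like the following. This is only a minimal sketch, assuming Spark 2.x or later running locally. The file name employees.csv, the object name and the application name are hypothetical and not taken from the original post.

```scala
import org.apache.spark.sql.SparkSession

object CsvSchemaInference {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation; the app name is arbitrary.
    val spark = SparkSession.builder()
      .appName("csv-timestamp-inference")
      .master("local[*]")
      .getOrCreate()

    // "employees.csv" is a hypothetical path holding the sample rows shown earlier.
    val df = spark.read
      .option("header", "true")      // first line holds the column names
      .option("inferSchema", "true") // ask Spark to guess each column's type
      .csv("employees.csv")

    // Prints the inferred column names and types as a tree.
    df.printSchema()

    spark.stop()
  }
}
```

Without the inferSchema option, Spark reads every CSV column as a string, so the option is what makes the timestamp question relevant in the first place.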

But, what if the timestamp fields in the csv…

