I have a dataframe that, among other things, contains a column holding the number of milliseconds elapsed since 1970-1-1. I need to convert this column of ints to timestamp data, so that I can ultimately turn it into a column of datetime data by adding the timestamp series to a series consisting entirely of datetime values for 1970-1-1.

I know how to convert a series of strings to datetime data (pandas.to_datetime), but I can't find or come up with any solution to convert the entire column of ints to datetime data OR to timestamp data.

Austin Capobianco

2 Answers

You can specify the unit of a pandas.to_datetime call.

Stolen from here:

import pandas

# assuming `df` is your data frame and `date` is your column of epoch timestamps

df['date'] = pandas.to_datetime(df['date'], unit='ms')

This works with integer datatypes. Use unit='ms' when the values are milliseconds since the epoch, as in the question, or unit='s' when they are seconds since the epoch.
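As a minimal sketch of the `unit` argument, assuming a small made-up `date` column of millisecond values (the sample numbers are hypothetical, not from the question), the direct conversion can be compared against the epoch-plus-offset route the question describes:

```python
import pandas as pd

# Hypothetical sample data: milliseconds since 1970-01-01
df = pd.DataFrame({"date": [0, 1_000, 1_500_000_000_000]})

# Interpret the integers as milliseconds since the epoch
converted = pd.to_datetime(df["date"], unit="ms")

# Route described in the question: add the offsets to the epoch
epoch = pd.Timestamp("1970-01-01")
via_timedelta = epoch + pd.to_timedelta(df["date"], unit="ms")

print(converted)
print((converted == via_timedelta).all())
```

Both routes should give the same values, so the single `to_datetime` call is usually enough.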

tdy
R Hill
I have these int columns:

import pandas as pd
import numpy as np
dplyr_1.dtypes
year             int64
 dplyr           int64
  data.table     int64
   pandas        int64
 apache-spark    int64
dtype: object

Convert the Int column to string:

dplyr_1.year = dplyr_1.year.astype(str)
dplyr_1.dtypes
year             object
 dplyr            int64
  data.table      int64
   pandas         int64
 apache-spark     int64
dtype: object

Make sure to convert the column to str first. Otherwise pd.to_datetime treats each integer as nanoseconds since the epoch, and the output will be Timestamp('1970-01-01 00:00:00.000002010').

dplyr_1.year = pd.to_datetime(dplyr_1.year)
dplyr_1.year[0]

Timestamp('2010-01-01 00:00:00')

So this is the Timestamp datatype. If you want to check all dtypes and the output column:

dplyr_1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   year           12 non-null     datetime64[ns]
 1    dplyr         12 non-null     int64         
 2     data.table   12 non-null     int64         
 3      pandas      12 non-null     int64         
 4    apache-spark  12 non-null     int64         
dtypes: datetime64[ns](1), int64(4)
memory usage: 608.0 bytes

dplyr_1.year
0    2010-01-01
1    2011-01-01
2    2012-01-01
3    2013-01-01
4    2014-01-01
5    2015-01-01
6    2016-01-01
7    2017-01-01
8    2018-01-01
9    2019-01-01
10   2020-01-01
11   2021-01-01
Name: year, dtype: datetime64[ns]
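The pitfall mentioned above can be reproduced with a short sketch. The `years` series here is a hypothetical stand-in for the `year` column, and passing `format="%Y"` is an optional addition, not part of the answer, that makes the intended parsing explicit:

```python
import pandas as pd

# Hypothetical stand-in for the integer `year` column
years = pd.Series([2010, 2011, 2012], name="year")

# Default behaviour: integers are read as nanoseconds since the epoch
as_nanoseconds = pd.to_datetime(years)

# Converting to str first, with an explicit format, parses each value as a year
as_years = pd.to_datetime(years.astype(str), format="%Y")

print(as_nanoseconds.iloc[0])
print(as_years.iloc[0])
```

The first result lands a few microseconds after 1970-01-01, while the second is the first day of the given year.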

rubengavidia0x