How to concatenate pandas dataframe date and different time formats to single timestamp? - python

I have two columns in a pandas data frame as outlined below. Notice how some of the EVENT_TIME is in hh.mm.ss, some is in hh:mm:ss AM/PM format.
When running...
import pandas as pd
df['EVENT_DATE'] = pd.to_datetime(df['EVENT_DATE'], format='%Y%m%d')
print(df['EVENT_DATE'])
...I can get EVENT_DATE in a consumable (for my purposes) format (e.g. 1999-07-28).
But when running...
df['EVENT_TIME'] = pd.to_datetime(df['EVENT_TIME'], format='%H.%M.%S', errors='coerce')
df['EVENT_TIME'] = pd.to_datetime(df['EVENT_TIME'], format='%I:%M:%S %p', errors='coerce')
print(df['EVENT_TIME'])
...the time data looks goofy.
1900-01-01 16:40:00
1900-01-01 15:55:00
1900-01-01 14:30:00
1900-01-01 13:26:00
NaT
NaT
NaT
NaT
How do I concatenate the date and times (which include multiple time formats) in a single timestamp?
I appreciate your help as I'm new to python (obviously!).

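Use fillna: parse each format into its own Series, then fill the NaT values of the first with the values of the second.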
s1=pd.to_datetime(df['EVENT_TIME'], format='%H.%M.%S', errors='coerce')
s2=pd.to_datetime(df['EVENT_TIME'], format='%I:%M:%S %p', errors='coerce')
df['EVENT_TIME']=s1.fillna(s2)
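The fillna step merges the two parses, but the result still carries the placeholder date 1900-01-01. A minimal sketch of one way to attach the real date follows. The sample rows and the EVENT_TS column name are assumptions for illustration, since the question's data is not shown.

```python
import pandas as pd

# Hypothetical sample rows; the question's actual data is not shown.
df = pd.DataFrame({
    'EVENT_DATE': ['19990728', '19990728', '19990729'],
    'EVENT_TIME': ['16.40.00', '01:26:00 PM', '09.05.30'],
})

date = pd.to_datetime(df['EVENT_DATE'], format='%Y%m%d')

# Parse each time format separately; rows that do not match become NaT.
t1 = pd.to_datetime(df['EVENT_TIME'], format='%H.%M.%S', errors='coerce')
t2 = pd.to_datetime(df['EVENT_TIME'], format='%I:%M:%S %p', errors='coerce')
tod = t1.fillna(t2)

# Subtract midnight of the placeholder date to get a time-of-day offset,
# then add that offset to the real date.
df['EVENT_TS'] = date + (tod - tod.dt.normalize())
print(df['EVENT_TS'])
```

Rows that match neither format stay NaT in EVENT_TS, so they are easy to find afterwards with isna().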

Related

Convert 'hhmm' int to proper format

I'm relatively new to Python.
I have a column of data which represents time of the day - but in an integer format hhmm - i.e. 1230, 1559.
I understand that this should be converted to a correct time format so that it can be used correctly.
I've spent a while googling for an answer but I haven't found a definitive solution.
Thank you
If you need datetimes, to_datetime also has to supply a date (it defaults to 1900-01-01); if you only need times, add dt.time.
Another solution is to convert the values to timedeltas, but this requires strings in HH:MM:SS format:
df = pd.DataFrame({'col':[1230,1559]})
df['date'] = pd.to_datetime(df['col'], format='%H%M')
df['time'] = pd.to_datetime(df['col'], format='%H%M').dt.time
s = df['col'].astype(str)
df['td'] = pd.to_timedelta(s.str[:2] + ':' + s.str[2:] + ':00')
print (df)
col date time td
0 1230 1900-01-01 12:30:00 12:30:00 12:30:00
1 1559 1900-01-01 15:59:00 15:59:00 15:59:00
print (df.dtypes)
col int64
date datetime64[ns]
time object
td timedelta64[ns]
dtype: object
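The string slicing above assumes every value has four digits. A hedged variation that also handles times before 10:00 pads the strings first; the extra values 930 and 5 are hypothetical and not part of the original question.

```python
import pandas as pd

# 930 and 5 are hypothetical extra values (09:30 and 00:05).
df = pd.DataFrame({'col': [1230, 1559, 930, 5]})

# Left-pad to four digits so 930 becomes '0930' and 5 becomes '0005'.
s = df['col'].astype(str).str.zfill(4)

df['time'] = pd.to_datetime(s, format='%H%M').dt.time
df['td'] = pd.to_timedelta(s.str[:2] + ':' + s.str[2:] + ':00')
print(df)
```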

Hourly average data of CSV data

My data is in CSV format which is minute resolution. It looks like
Timestamp value
6/10/2018 0:00 23.9
6/10/2018 0:01 19.8
6/10/2018 0:02 20.3
-------------------------
-------------------------
6/18/2018 23:59 25.9
Now I need the hourly average of this data. The code I have done so far is
import pandas as pd
df = pd.read_csv("filename.csv")
df['DateTime'] = pd.to_datetime(df['Timestamp'])
df.index = df['DateTime']
df1 = df.resample('H').mean()
print(df1)
But the output is not correct. It looks like this:
DateTime Value
2018-06-13 00:00:00 16.19
2018-06-13 01:00:00 20.80
----------------------------
----------------------------
2018-12-06 23:00:00 19.09
The date is far from the actual data table. So please help me to debug it.
Try this
df["DateTime"] = pd.to_datetime(df['Timestamp'], format="%d/%m/%Y %H:%M")
instead of this:
df['DateTime'] = pd.to_datetime(df['Timestamp'])
pandas has trouble parsing your Timestamp column, probably because the strings begin with the month. I think that without an explicit format pandas has to guess the day/month order, and the guess can be wrong for ambiguous values such as 6/10/2018.
You should specify a format string:
df['DateTime'] = pd.to_datetime(df['Timestamp'], format='%m/%d/%Y %H:%M')
The conventions for format strings are documented on this page:
https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior
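As a rough check of the two suggested formats, here is a small sketch on a few rows laid out like the question's sample. The inline CSV text and the comma separator are assumptions, since the real file is not shown. The month-first format parses every row, while the day-first format cannot parse 6/18/2018 because there is no month 18.

```python
import io
import pandas as pd

# Hypothetical rows in the same layout as the question's sample.
csv = io.StringIO(
    "Timestamp,value\n"
    "6/10/2018 0:00,23.9\n"
    "6/10/2018 0:01,19.8\n"
    "6/10/2018 1:00,20.3\n"
    "6/18/2018 23:59,25.9\n"
)
df = pd.read_csv(csv)

# Month-first format: matches the sample, where 6/18/2018 is June 18.
df['DateTime'] = pd.to_datetime(df['Timestamp'], format='%m/%d/%Y %H:%M')

# Day-first format for comparison; unparseable rows become NaT.
day_first = pd.to_datetime(df['Timestamp'], format='%d/%m/%Y %H:%M',
                           errors='coerce')
print(day_first.isna().sum(), "row(s) fail with the day-first format")

# Hourly mean of the numeric column, indexed by the parsed timestamps.
hourly = df.set_index('DateTime')['value'].resample('h').mean()
print(hourly.dropna())
```

Hours with no rows show up as NaN in the resampled result, which is why the final print drops them.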

Formatting datetimes without two digits in month and day?

I have a dataframe with a column of datetimes in the following format:
df['A']
1/23/2008 15:41
3/10/2010 14:42
10/14/2010 15:23
1/2/2008 11:39
4/3/2008 13:35
5/2/2008 9:29
I need to convert df['A'] into df['Date'], df['Time'], and df['Timestamp'].
I tried to first convert df['A'] to a datetime by using
df['Datetime'] = pd.to_datetime(df['A'],format='%m/%d/%y %H:%M')
from which I would've created my three columns above, but my formatting codes for %m/%d do not pick up the single digit month and days.
Does anyone know a quick fix to this?
There's a bug in your format: %y expects a two-digit year, but your data has four-digit years, so you need %Y. As @MaxU commented, if you don't pass a format argument, then pandas will automagically convert your column to datetime.
df['Timestamp'] = pd.to_datetime(df['A'])
Or, to fix your code -
df['Timestamp'] = pd.to_datetime(df['A'], format='%m/%d/%Y %H:%M')
For your first query, use dt.normalize or dt.floor (thanks, MaxU, for the suggestion!) -
df['Date'] = df['Timestamp'].dt.normalize()
Or,
df['Date'] = df['Timestamp'].dt.floor('D')
For your second query, use dt.time.
df['Time'] = df['Timestamp'].dt.time
df.drop(columns='A')
Date Time Timestamp
0 2008-01-23 15:41:00 2008-01-23 15:41:00
1 2010-03-10 14:42:00 2010-03-10 14:42:00
2 2010-10-14 15:23:00 2010-10-14 15:23:00
3 2008-01-02 11:39:00 2008-01-02 11:39:00
4 2008-04-03 13:35:00 2008-04-03 13:35:00
5 2008-05-02 09:29:00 2008-05-02 09:29:00
I believe you can use %-m instead of %m, if this works in the same way as strftime() function.
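When parsing, %m, %d and %H already accept values without a leading zero, so no special directive should be needed here. As far as I know, %-m is a platform-specific extension for formatting with strftime and is not recognized when parsing. A small sketch using three of the question's values:

```python
import pandas as pd

s = pd.Series(['1/23/2008 15:41', '10/14/2010 15:23', '5/2/2008 9:29'])

# %m, %d and %H match one- or two-digit values when parsing.
# %Y (four-digit year) is the only change from the question's format.
ts = pd.to_datetime(s, format='%m/%d/%Y %H:%M')
print(ts)
```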

Plotting 1D Time Series from 2D Hourly DataFrame

I'm trying to plot a year's worth of utility data downloaded from my utility provider. The data is provided in a matrix where each row is a different day (most recent at the top), and each column is an hour of the day (11:00 AM, 12:00 PM, 1:00 PM, etc). I'd like to transform this 2D DataFrame into a 1D Timeseries, then plot the series.
Using .stack() gets me close, but I can't seem to create a datetime from the date and time column after they are stacked. Also, when plotted it plots the hours correctly from left to right, but the dates descend from left to right. For example it plots the 25th (1am, 2am 3am, etc), 24th (1am, 2am, 3am, etc), 23rd (1am, 2am, 3am, etc). I'm sure this will fix itself after a true datetime is created.
The code below generates a small sample df, but in the real data set all 24 hours are columns and all dates of the year are rows.
df=pd.DataFrame({'Date':['06/25/17','06/24/17','06/23/17'], '12:00 AM':
[1,2,3],'1:00 AM':[4,5,6],'2:00 AM':[7,8,9],})
df.set_index(['Date'], inplace = True)
df
The goal would be to have a series where the index is the time series and the utility usage is the data.
Thank you!
I think you need to unstack your data frame, concatenate the columns Date and level_0 to make a time stamp. Then set the index to the timestamp and drop the extra columns.
import pandas as pd
df = pd.DataFrame({'Date': ['06/25/17', '06/24/17', '06/23/17'],
                   '12:00 AM': [1, 2, 3], '1:00 AM': [4, 5, 6], '2:00 AM': [7, 8, 9]})
df = df.set_index('Date')
# Unstack to long form; reset_index yields the columns level_0 (hour label), Date and 0 (value)
df = df.unstack().reset_index()
# Concatenate the date and the hour label, then convert to datetime
df['Timestamp'] = df['Date'] + ' ' + df['level_0']
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%m/%d/%y %I:%M %p')
# Sort chronologically and use the timestamp as the index
df = df.sort_values(by='Timestamp').set_index('Timestamp')
# Drop the helper columns
df = df.drop(['Date', 'level_0'], axis=1)
returns df looking like:
0
Timestamp
2017-06-23 00:00:00 3
2017-06-23 01:00:00 6
2017-06-23 02:00:00 9
2017-06-24 00:00:00 2
2017-06-24 01:00:00 5
2017-06-24 02:00:00 8
2017-06-25 00:00:00 1
2017-06-25 01:00:00 4
2017-06-25 02:00:00 7
You could then plot your time series with
df.plot()
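An alternative sketch of the same reshaping uses melt, which names the hour and value columns explicitly and returns a Series. The names Hour, usage, long and series are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['06/25/17', '06/24/17', '06/23/17'],
                   '12:00 AM': [1, 2, 3],
                   '1:00 AM': [4, 5, 6],
                   '2:00 AM': [7, 8, 9]})

# One row per (Date, Hour) pair; 'Hour' and 'usage' are illustrative names.
long = df.melt(id_vars='Date', var_name='Hour', value_name='usage')

# Build a real timestamp from the date string and the hour label.
long['Timestamp'] = pd.to_datetime(long['Date'] + ' ' + long['Hour'],
                                   format='%m/%d/%y %I:%M %p')

# Series indexed by time, sorted so the dates ascend from left to right.
series = long.set_index('Timestamp')['usage'].sort_index()
print(series)
```

Calling series.plot() on the result should then draw the hours in chronological order.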

Concatenate two dataframe columns as one timestamp

I'm working on a pandas dataframe. One of my columns is a date (YYYYMMDD) and another is an hour (HH:MM). I would like to concatenate the two columns into one timestamp or datetime64 column, to later use that column as an index (for a time series).
Do you have any ideas? The classic pandas.to_datetime() seems to work only if the columns contain hours only, days only, years only, etc.
Setup
df
Out[1735]:
id date hour other
0 1820 20140423 19:00:00 8
1 4814 20140424 08:20:00 22
Solution
import datetime as dt
#convert date and hour to str, concatenate them and then convert them to datetime format.
df['new_date'] = df[['date','hour']].astype(str).apply(lambda x: dt.datetime.strptime(x.date + x.hour, '%Y%m%d%H:%M:%S'), axis=1)
df
Out[1756]:
id date hour other new_date
0 1820 20140423 19:00:00 8 2014-04-23 19:00:00
1 4814 20140424 08:20:00 22 2014-04-24 08:20:00
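A vectorized variation on the same idea, as a sketch: concatenate the two columns as strings and let pd.to_datetime parse the whole column at once instead of calling strptime row by row. The frame below rebuilds the Setup data, and storing date as integers is an assumption.

```python
import pandas as pd

# Rebuilt from the Setup above; the integer dtype of 'date' is assumed.
df = pd.DataFrame({
    'id': [1820, 4814],
    'date': [20140423, 20140424],
    'hour': ['19:00:00', '08:20:00'],
    'other': [8, 22],
})

# Join date and hour as strings, then parse the whole column in one call.
df['new_date'] = pd.to_datetime(
    df['date'].astype(str) + ' ' + df['hour'].astype(str),
    format='%Y%m%d %H:%M:%S',
)

# Use the combined column as the index for time series work.
df = df.set_index('new_date')
print(df)
```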
