upsampling with multiindex

I want to resampling(more specifically, upsampling) data with multiindex.

Thanks for stackoverflow, https://stackoverflow.com/questions/39737890/, I find a solution.  the code has some minor error, I have corret it.

# -*- coding: utf-8 -*-
“””
upsampling with multiindex
“””

import pandas as pd
import datetime
import numpy as np
np.random.seed(1234)

arrays = [np.sort([datetime.date(2016, 8, 31),
datetime.date(2016, 7, 31),
datetime.date(2016, 6, 30)]*5),
[‘A’, ‘B’, ‘C’, ‘D’, ‘E’]*3]
df = pd.DataFrame(np.random.randn(15, 4), index=arrays)
df.index.rename([‘date’, ‘id’], inplace=True)
print(df)

df1=df.unstack().resample(‘W-FRI’).last().ffill().stack()
print(‘\n df1:\n’,df1)

the result is as following:

                         0                    1                2                 3
date            id
2016-06-30 A 0.471435 -1.190976 1.432707 -0.312652
B -0.720589 0.887163 0.859588 -0.636524
C 0.015696 -2.242685 1.150036 0.991946
D 0.953324 -2.021255 -0.334077 0.002118
E 0.405453 0.289092 1.321158 -1.546906
2016-07-31 A -0.202646 -0.655969 0.193421 0.553439
B 1.318152 -0.469305 0.675554 -1.817027
C -0.183109 1.058969 -0.397840 0.337438
D 1.047579 1.045938 0.863717 -0.122092
E 0.124713 -0.322795 0.841675 2.390961
2016-08-31 A 0.076200 -0.566446 0.036142 -2.074978
B 0.247792 -0.897157 -0.136795 0.018289
C 0.755414 0.215269 0.841009 -1.445810
D -1.401973 -0.100918 -0.548242 -0.144620
E 0.354020 -0.035513 0.565738 1.545659

df1:
0              1                2                 3
date            id
2016-07-01 A 0.471435 -1.190976 1.432707 -0.312652
B -0.720589 0.887163 0.859588 -0.636524
C 0.015696 -2.242685 1.150036 0.991946
D 0.953324 -2.021255 -0.334077 0.002118
E 0.405453 0.289092 1.321158 -1.546906
2016-07-08 A 0.471435 -1.190976 1.432707 -0.312652
B -0.720589 0.887163 0.859588 -0.636524
C 0.015696 -2.242685 1.150036 0.991946
D 0.953324 -2.021255 -0.334077 0.002118
E 0.405453 0.289092 1.321158 -1.546906
2016-07-15 A 0.471435 -1.190976 1.432707 -0.312652
B -0.720589 0.887163 0.859588 -0.636524
C 0.015696 -2.242685 1.150036 0.991946
D 0.953324 -2.021255 -0.334077 0.002118
E 0.405453 0.289092 1.321158 -1.546906
2016-07-22 A 0.471435 -1.190976 1.432707 -0.312652
B -0.720589 0.887163 0.859588 -0.636524
C 0.015696 -2.242685 1.150036 0.991946
D 0.953324 -2.021255 -0.334077 0.002118
E 0.405453 0.289092 1.321158 -1.546906
2016-07-29 A 0.471435 -1.190976 1.432707 -0.312652
B -0.720589 0.887163 0.859588 -0.636524
C 0.015696 -2.242685 1.150036 0.991946
D 0.953324 -2.021255 -0.334077 0.002118
E 0.405453 0.289092 1.321158 -1.546906
2016-08-05 A -0.202646 -0.655969 0.193421 0.553439
B 1.318152 -0.469305 0.675554 -1.817027
C -0.183109 1.058969 -0.397840 0.337438
D 1.047579 1.045938 0.863717 -0.122092
E 0.124713 -0.322795 0.841675 2.390961
2016-08-12 A -0.202646 -0.655969 0.193421 0.553439
B 1.318152 -0.469305 0.675554 -1.817027
C -0.183109 1.058969 -0.397840 0.337438
D 1.047579 1.045938 0.863717 -0.122092
E 0.124713 -0.322795 0.841675 2.390961
2016-08-19 A -0.202646 -0.655969 0.193421 0.553439
B 1.318152 -0.469305 0.675554 -1.817027
C -0.183109 1.058969 -0.397840 0.337438
D 1.047579 1.045938 0.863717 -0.122092
E 0.124713 -0.322795 0.841675 2.390961
2016-08-26 A -0.202646 -0.655969 0.193421 0.553439
B 1.318152 -0.469305 0.675554 -1.817027
C -0.183109 1.058969 -0.397840 0.337438
D 1.047579 1.045938 0.863717 -0.122092
E 0.124713 -0.322795 0.841675 2.390961
2016-09-02 A 0.076200 -0.566446 0.036142 -2.074978
B 0.247792 -0.897157 -0.136795 0.018289
C 0.755414 0.215269 0.841009 -1.445810
D -1.401973 -0.100918 -0.548242 -0.144620
E 0.354020 -0.035513 0.565738 1.545659

read h5 and remove prefix ‘b’

h5文件是存储大数据的一种较好格式,在读取该文件的时候,有时候会有prefix ‘b’.因此,要设法去除该prefix。

# -*- coding: utf-8 -*-
import tables as tb
import pandas as pd
import numpy as np
import time

time0=time.time()
pth=’d:/download/’

# 读取交易的数据
data_trading=pth+’Trading_v01.h5′
filem=tb.open_file(data_trading,mode=’a’,driver=”H5FD_CORE”)
tb_trading=filem.get_node(where=’/’, name=’wind_data’)
df=pd.DataFrame.from_records(tb_trading[:])
time1=time.time()
print(‘\ntime on reading data: %6.3fs’ %(time1-time0))
# in python3, remove prefix ‘b’
for key in [‘Date’, ‘Code’]:
df[key] = df[key].str.decode(“utf-8”)

# the following two methods are
# method 1
#str_df = df.loc[:,[‘Date’,’Code’]]
#str_df = str_df.stack().str.decode(‘utf-8′).unstack()
#for col in str_df:
# df[col] = str_df[col]
#method 2
#df.loc[:,’Date’]=[[dt.decode(‘utf-8′)] for dt in df.loc[:,’Date’]]
#df.loc[:,’Code’]=[[cd.decode(‘utf-8′)] for cd in df.loc[:,’Code’]]

time2=time.time()
print(“\ntime on removing prefix ‘b’: %6.3fs” %(time2-time1))
print(‘\ntotal time: %6.3fs’ %(time2-time0))

运行结果如下

time on reading data: 1.508s

time on removing prefix ‘b’: 9.315s

total time: 10.823s

不知道有没有更快捷的方式。