read h5 and remove prefix ‘b’ – Renke's 所见所思

h5文件是存储大数据的一种较好格式，在读取该文件的时候，有时候会有prefix ‘b’.因此，要设法去除该prefix。

# -*- coding: utf-8 -*-
import tables as tb
import pandas as pd
import numpy as np
import time

time0=time.time()
pth=’d:/download/’

# 读取交易的数据
data_trading=pth+’Trading_v01.h5′
filem=tb.open_file(data_trading,mode=’a’,driver=”H5FD_CORE”)
tb_trading=filem.get_node(where=’/’, name=’wind_data’)
df=pd.DataFrame.from_records(tb_trading[:])
time1=time.time()
print(‘\ntime on reading data: %6.3fs’ %(time1-time0))
# in python3, remove prefix ‘b’
for key in [‘Date’, ‘Code’]:
df[key] = df[key].str.decode(“utf-8”)

# the following two methods are
# method 1
#str_df = df.loc[:,[‘Date’,’Code’]]
#str_df = str_df.stack().str.decode(‘utf-8′).unstack()
#for col in str_df:
# df[col] = str_df[col]
#method 2
#df.loc[:,’Date’]=[[dt.decode(‘utf-8′)] for dt in df.loc[:,’Date’]]
#df.loc[:,’Code’]=[[cd.decode(‘utf-8′)] for cd in df.loc[:,’Code’]]

time2=time.time()
print(“\ntime on removing prefix ‘b’: %6.3fs” %(time2-time1))
print(‘\ntotal time: %6.3fs’ %(time2-time0))

运行结果如下

time on reading data: 1.508s

time on removing prefix ‘b’: 9.315s

total time: 10.823s

不知道有没有更快捷的方式。

分享到：微信新浪微博更多

日	一	二	三	四	五	六
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31

发表评论 取消回复

发表评论取消回复