Reading an h5 file and removing the 'b' prefix

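The h5 (HDF5) format is a good way to store large datasets. When such a file is read in Python 3, however, string fields sometimes come back with a 'b' prefix, because they are returned as bytes rather than str. That prefix therefore has to be removed.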
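To show where the prefix comes from, here is a minimal sketch that does not need an h5 file. It assumes the string fields arrive as fixed-width NumPy bytes (dtype "S"), which is how HDF5 string columns typically surface. The 'b' is not part of the data; it is how Python displays a bytes object, and decoding turns it into a str.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: a fixed-width bytes column, as read from an h5 table
s = pd.Series(np.array([b"2019-01-02", b"2019-01-03"], dtype="S10"))
print(s[0])  # a bytes object, displayed with the b prefix

# Series.str.decode converts every bytes element to str
decoded = s.str.decode("utf-8")
print(decoded[0])  # a plain str, displayed without the prefix
```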

# -*- coding: utf-8 -*-
import tables as tb
import pandas as pd
import numpy as np
import time

time0 = time.time()
pth = 'd:/download/'

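# Read the trading data
data_trading = pth + 'Trading_v01.h5'
# Open read-only; the H5FD_CORE driver loads the whole file into memory
filem = tb.open_file(data_trading, mode='r', driver='H5FD_CORE')
tb_trading = filem.get_node(where='/', name='wind_data')
# tb_trading[:] returns an in-memory NumPy structured array
df = pd.DataFrame.from_records(tb_trading[:])
filem.close()
time1 = time.time()
print('\ntime on reading data: %6.3fs' % (time1 - time0))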
# In Python 3 the string fields come back as bytes (b'...'); decode them to str
for key in ['Date', 'Code']:
    df[key] = df[key].str.decode('utf-8')

# Two alternative methods (commented out) that give the same result:
# method 1: stack both columns, decode them in one pass, then unstack
#str_df = df.loc[:, ['Date', 'Code']]
#str_df = str_df.stack().str.decode('utf-8').unstack()
#for col in str_df:
#    df[col] = str_df[col]
# method 2: decode each value with a list comprehension
#df.loc[:, 'Date'] = [dt.decode('utf-8') for dt in df.loc[:, 'Date']]
#df.loc[:, 'Code'] = [cd.decode('utf-8') for cd in df.loc[:, 'Code']]

time2 = time.time()
print("\ntime on removing prefix 'b': %6.3fs" % (time2 - time1))
print('\ntotal time: %6.3fs' % (time2 - time0))
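The three decoding approaches in the script (Series.str.decode, method 1 and method 2) can be checked against each other without the real file. The sketch below builds a small hypothetical structured array standing in for the 'wind_data' table and confirms that all three produce the same values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for tb_trading[:]
recs = np.array(
    [(b"2019-01-02", b"600000.SH"), (b"2019-01-03", b"000001.SZ")],
    dtype=[("Date", "S10"), ("Code", "S9")],
)
base = pd.DataFrame.from_records(recs)

# Main approach: Series.str.decode column by column
a = base.copy()
for key in ["Date", "Code"]:
    a[key] = a[key].str.decode("utf-8")

# Method 1: stack both columns, decode once, unstack back
b = base.copy()
str_df = b.loc[:, ["Date", "Code"]].stack().str.decode("utf-8").unstack()
for col in str_df:
    b[col] = str_df[col]

# Method 2: list comprehension calling bytes.decode on each value
c = base.copy()
for key in ["Date", "Code"]:
    c[key] = [v.decode("utf-8") for v in c[key]]

# Compare the values (not dtypes, which can differ between pandas versions)
same = a.values.tolist() == b.values.tolist() == c.values.tolist()
print(same)
```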

The output is as follows:

time on reading data: 1.508s

time on removing prefix 'b': 9.315s

total time: 10.823s

Decoding takes about six times as long as reading the file. Is there a faster way?
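One idea worth trying, sketched here but not benchmarked on the author's data, is to convert the bytes fields at the NumPy level before the DataFrame is built. For pure-ASCII columns such as dates and stock codes, a single vectorized cast from "S" to "U" avoids calling bytes.decode on every element; non-ASCII columns fall back to numpy.char.decode. The function name records_to_frame is hypothetical, and the sketch assumes tb_trading[:] is a structured array whose string fields have a fixed-width bytes dtype.

```python
import numpy as np
import pandas as pd


def records_to_frame(records, byte_fields, encoding="utf-8"):
    """Hypothetical helper: build a DataFrame from a NumPy structured array,
    converting the fixed-width bytes ('S') fields in byte_fields to str."""
    out = {}
    for name in records.dtype.names:
        col = records[name]
        if name in byte_fields and col.dtype.kind == "S":
            # Field views of a structured array are strided; copy to contiguous
            raw = np.ascontiguousarray(col)
            if np.all(raw.view(np.uint8) < 128):
                # Pure ASCII: one vectorized cast from S<n> to U<n>
                col = raw.astype(f"U{raw.dtype.itemsize}")
            else:
                # Non-ASCII bytes: decode element by element
                col = np.char.decode(raw, encoding)
        out[name] = col
    return pd.DataFrame(out)


# Hypothetical sample data standing in for tb_trading[:]
recs = np.array(
    [(b"2019-01-02", b"600000.SH", 10.5), (b"2019-01-03", b"000001.SZ", 11.0)],
    dtype=[("Date", "S10"), ("Code", "S9"), ("Close", "f8")],
)
df = records_to_frame(recs, ["Date", "Code"])
print(df)
```

Against the real file, the call would be df = records_to_frame(tb_trading[:], ['Date', 'Code']) in place of from_records followed by the decode loop. Whether it is actually faster depends on the data, so it should be timed the same way as above.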
