Python – Renke's 所见所思

how to get the maximum result with quadratic programming on CVXOPT ？

the question is posted in follow page as well. https://stackoverflow.com/questions/47119580/how-to-get-the-maximum-result-with-quadratic-programming-on-cvxopt

I want to use CVXOPT to solve a quadratic programming.

'''
                        Value       =2      =3     =5        
                    Value*return    >=9    >=9      no requirement       
                   Value*Duration   <=5    <=9      no requirement
      Value  return  Duration           
        4       5       2           x11      x12   x13    
        6       4       4           x21      x22   x23

the matrix is like about, I want to get x11, x12,…x23.

the subject is as following:

G*x<=h
-5*x11         -4*x21      <=-9
      -5*x12         -4*x22<=-9
                           <=0
 2*x11         +4*x21      <=5
       2*x12        +4*x22 <=9
                           <=0


A*x=b

x11+x12+x13           =4
           x21+x22+x23=6
x11       +x21        =2
    x12       +x22    =3
        x13       +x23=5

I want get the target Max (x11^2+x12^2+x13^2+x21^2+x22^2+x23^2)

I wrote the following code:

import cvxopt as op
p_tg=op.matrix(0.0,(6,6))
p_tg[::7]=1
P = 2*op.matrix(p_tg)

q = op.matrix( [0.0,0,0,0,0,0])

G=op.matrix([[-5.0,0,0,2,0,0],
             [0,-5,0,0,2,0],
             [0,0,0,0,0,0],
             [-4,0,0,4,0,0],
             [0,-4,0,0,4,0],
             [0,0,0,0,0,0]])

h=op.matrix([-9.0,-9,0,5,9,0])  

A=op.matrix([[1.0,0,1,0,0],
             [1,0,0,1,0],
             [1,0,0,0,1],
             [0,1,1,0,0],
             [0,1,0,1,0],
             [0,1,0,0,1]])              
b=op.matrix([4,6,2.0,3,5])

sol=op.solvers.qp(P, q, G, h, A, b)

print(sol['x'])

of course, the above code get the minimum, not the maximum.

how to get the max result? I try to p_tg[::7]=-1, while it report error.

raise ValueError(“Rank(A) < p or Rank([P; A; G]) < n”)

ValueError: Rank(A) < p or Rank([P; A; G]) < n

cvxopt for linear and quadratic program

# -*- coding: utf-8 -*-
“””
Created on Sun Nov 5 08:03:26 2017
http://cvxopt.org/examples/tutorial/qp.html
“””

#1. Solving a linear program
”’ Linear programs can be specified via the solvers.lp() function.
As an example, we can solve the problem

minimize 2*x1+x2
subject to
-x1+x2<=1
x1+x2>=2
x2>=0
x1-2*x2<=4

the st is equal to
-x1+x2<=1
-x1-x2<=-2
-x2<=0
x1-2*x2<=4

”’

import cvxopt as op
A = op.matrix([ [-1.0, -1.0, 0.0, 1.0],
[1.0, -1.0, -1.0, -2.0] ])
b = op.matrix( [ 1.0, -2.0, 0.0, 4.0 ])
c = op.matrix( [ 2.0, 1.0 ])
sol=op.solvers.lp(c,A,b)
print(sol[‘x’])

#%%2. Solving a quadratic program

”’
minimize 2*x1**2+x2**2+x1*x2+x1+x2
st
x1>=0
x2>=0
x1+x2=1

the object and st are equal to

[x1, x2] *[[2,.5],[.5,1]] *[[x1],[x2]]

st
-x1 <=0
-x2<=0
x1+x2 =1

op.solvers.qp(P, q, G=None, h=None, A=None, b=None, solver=None,
kktsolver=None, initvals=None, **kwargs)

Solves a quadratic program
minimize (1/2)*x’*P*x + q’*x
subject to G*x <= h
A*x = b.
Input arguments.
P is a n x n dense or sparse ‘d’ matrix with the lower triangular part of P stored in the lower triangle. Must be positive semidefinite.
q is an n x 1 dense ‘d’ matrix.
G is an m x n dense or sparse ‘d’ matrix.
h is an m x 1 dense ‘d’ matrix.
A is a p x n dense or sparse ‘d’ matrix.
b is a p x 1 dense ‘d’ matrix or None.
solver is None or ‘mosek’.
The default values for G, h, A and b are empty matrices with zero rows.
”’

P = 2*op.matrix([ [2, .5],
[.5, 1] ])
q = op.matrix( [1.0, 1.0])

G = op.matrix([[-1.0,0.0],
[0.0,-1.0]])
h = op.matrix( [0.0,0.0])

A = op.matrix( [1.0, 1.0], (1,2)) # with 1 row, 2 columns
b = op.matrix(1.0)
sol=op.solvers.qp(P, q, G, h, A, b)

print(sol[‘x’])

solve LP problem with Python

I need to solve a problem, at first I tried to use VBA, while I found it’s a Linear Programming or nonlinear programming problem. Excel does not works for the problem because of it beyound the limit of Excel, then I think I may need to use Python to solve this problem.

I read some articles in stackoverflow and want to try cvxopt and cvxpy.

I visited official website of cvxopt(www.cvxopt.org) and cvxpy(www.cvxpy.org), and found they does not support Python 3.6, thanks to https://www.lfd.uci.edu/~gohlke/pythonlibs , which provide support version for python 3.6.

well, I may also need to try pyscipopt in Python.

Multiprocessing

最近在写一个代码，感觉要run很久，在群里问，有人建议采用multiprocessing。开始看官方文档。然而有一个程序却测试跑不出来。

# -*- coding: utf-8 -*-
"""
Created on Sun Jul 30 19:42:12 2017
multiprocessing_test
"""

import multiprocessing as mp

def f(name):
      print('hello', name)

if __name__ == '__main__':
       p = mp.Process(target=f, args=('bob',))
       p.start()
       p.join()
# 在anaconda的spyder中，一定要有这一句才ok。
# 但是在fluent python中，不需要这一句
      p.run() #output为 hello bob

#下面的程序运行正常
def f2(x):
     return x**x

if __name__ == '__main__':
     with mp.Pool(5) as p:
            print(p.map(f2, [1, 2, 3])) #output为[1, 4, 9]

groupby_fillna_for special columns

Well, there is no problem to merge data and fillna.

sometimes we want to fillna with groupby, more specifically, we need to fillna with different methods, some are ffill, while others are fillna(0).

the following code can deal with this situation.

# -*- coding: utf-8 -*-
“””
groupby_fillna_special columns
https://stackoverflow.com/questions/21284585
“””

import pandas as pd
from io import StringIO

# fillna with groupby
csvdata=”””
STK_ID RPT_Date sales opr_pft net_pft
002138 20130331 2.0703 0.3373 0.2829
002138 20130630 NaN NaN NaN
002138 20130930 7.4993 1.2248 1.1630
002138 20140122 NaN NaN NaN

600004 20130331 11.8429 3.0816 2.1637
600004 20130630 24.6232 6.2152 4.5135
600004 20130930 37.9673 9.2088 6.6463
600004 20140122 NaN NaN NaN

600809 20130331 27.9517 9.9426 7.5182
600809 20130630 40.6460 13.9414 9.8572
600809 20130930 53.0501 16.8081 11.8605
600809 20140122 NaN NaN NaN
“””
sio=StringIO(csvdata)

df=pd.read_csv(sio, sep=’\s+’,header=0,dtype={“STK_ID”:object,”RPT_Date”:object,
“sales”:float,’apr_pft’:float,
‘net_pft’:float})
df.set_index([‘STK_ID’,’RPT_Date’],inplace=True)

#ffill all columns with groupby
#DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)[source]
df.groupby(level=0).apply(lambda grp: grp.fillna(method=’ffill’))

#fillnow the lastrow with groupby
def f(g):
last = len(g.values)-1
g.iloc[last,:] = g.iloc[last-1,:]
return g
df.groupby(level=0).apply(f)

# fill certain columns
df[[‘sales’,’opr_pft’]]=df.groupby(level=0)[[‘sales’,’opr_pft’]].fillna(method=’ffill’)
#or you can write as following
df[[‘sales’,’opr_pft’]]=df.groupby(level=0)[[‘sales’,’opr_pft’]].apply(lambda grp: grp.fillna(method=’ffill’))

upsampling with multiindex

I want to resampling(more specifically, upsampling) data with multiindex.

Thanks for stackoverflow, https://stackoverflow.com/questions/39737890/, I find a solution. the code has some minor error, I have corret it.

# -*- coding: utf-8 -*-
“””
upsampling with multiindex
“””

import pandas as pd
import datetime
import numpy as np
np.random.seed(1234)

arrays = [np.sort([datetime.date(2016, 8, 31),
datetime.date(2016, 7, 31),
datetime.date(2016, 6, 30)]*5),
[‘A’, ‘B’, ‘C’, ‘D’, ‘E’]*3]
df = pd.DataFrame(np.random.randn(15, 4), index=arrays)
df.index.rename([‘date’, ‘id’], inplace=True)
print(df)

df1=df.unstack().resample(‘W-FRI’).last().ffill().stack()
print(‘\n df1:\n’,df1)

the result is as following:

0 1 2 3
date id
2016-06-30 A 0.471435 -1.190976 1.432707 -0.312652
B -0.720589 0.887163 0.859588 -0.636524
C 0.015696 -2.242685 1.150036 0.991946
D 0.953324 -2.021255 -0.334077 0.002118
E 0.405453 0.289092 1.321158 -1.546906
2016-07-31 A -0.202646 -0.655969 0.193421 0.553439
B 1.318152 -0.469305 0.675554 -1.817027
C -0.183109 1.058969 -0.397840 0.337438
D 1.047579 1.045938 0.863717 -0.122092
E 0.124713 -0.322795 0.841675 2.390961
2016-08-31 A 0.076200 -0.566446 0.036142 -2.074978
B 0.247792 -0.897157 -0.136795 0.018289
C 0.755414 0.215269 0.841009 -1.445810
D -1.401973 -0.100918 -0.548242 -0.144620
E 0.354020 -0.035513 0.565738 1.545659

df1:
0 1 2 3
date id
2016-07-01 A 0.471435 -1.190976 1.432707 -0.312652
B -0.720589 0.887163 0.859588 -0.636524
C 0.015696 -2.242685 1.150036 0.991946
D 0.953324 -2.021255 -0.334077 0.002118
E 0.405453 0.289092 1.321158 -1.546906
2016-07-08 A 0.471435 -1.190976 1.432707 -0.312652
B -0.720589 0.887163 0.859588 -0.636524
C 0.015696 -2.242685 1.150036 0.991946
D 0.953324 -2.021255 -0.334077 0.002118
E 0.405453 0.289092 1.321158 -1.546906
2016-07-15 A 0.471435 -1.190976 1.432707 -0.312652
B -0.720589 0.887163 0.859588 -0.636524
C 0.015696 -2.242685 1.150036 0.991946
D 0.953324 -2.021255 -0.334077 0.002118
E 0.405453 0.289092 1.321158 -1.546906
2016-07-22 A 0.471435 -1.190976 1.432707 -0.312652
B -0.720589 0.887163 0.859588 -0.636524
C 0.015696 -2.242685 1.150036 0.991946
D 0.953324 -2.021255 -0.334077 0.002118
E 0.405453 0.289092 1.321158 -1.546906
2016-07-29 A 0.471435 -1.190976 1.432707 -0.312652
B -0.720589 0.887163 0.859588 -0.636524
C 0.015696 -2.242685 1.150036 0.991946
D 0.953324 -2.021255 -0.334077 0.002118
E 0.405453 0.289092 1.321158 -1.546906
2016-08-05 A -0.202646 -0.655969 0.193421 0.553439
B 1.318152 -0.469305 0.675554 -1.817027
C -0.183109 1.058969 -0.397840 0.337438
D 1.047579 1.045938 0.863717 -0.122092
E 0.124713 -0.322795 0.841675 2.390961
2016-08-12 A -0.202646 -0.655969 0.193421 0.553439
B 1.318152 -0.469305 0.675554 -1.817027
C -0.183109 1.058969 -0.397840 0.337438
D 1.047579 1.045938 0.863717 -0.122092
E 0.124713 -0.322795 0.841675 2.390961
2016-08-19 A -0.202646 -0.655969 0.193421 0.553439
B 1.318152 -0.469305 0.675554 -1.817027
C -0.183109 1.058969 -0.397840 0.337438
D 1.047579 1.045938 0.863717 -0.122092
E 0.124713 -0.322795 0.841675 2.390961
2016-08-26 A -0.202646 -0.655969 0.193421 0.553439
B 1.318152 -0.469305 0.675554 -1.817027
C -0.183109 1.058969 -0.397840 0.337438
D 1.047579 1.045938 0.863717 -0.122092
E 0.124713 -0.322795 0.841675 2.390961
2016-09-02 A 0.076200 -0.566446 0.036142 -2.074978
B 0.247792 -0.897157 -0.136795 0.018289
C 0.755414 0.215269 0.841009 -1.445810
D -1.401973 -0.100918 -0.548242 -0.144620
E 0.354020 -0.035513 0.565738 1.545659

read h5 and remove prefix ‘b’

h5文件是存储大数据的一种较好格式，在读取该文件的时候，有时候会有prefix ‘b’.因此，要设法去除该prefix。

# -*- coding: utf-8 -*-
import tables as tb
import pandas as pd
import numpy as np
import time

time0=time.time()
pth=’d:/download/’

# 读取交易的数据
data_trading=pth+’Trading_v01.h5′
filem=tb.open_file(data_trading,mode=’a’,driver=”H5FD_CORE”)
tb_trading=filem.get_node(where=’/’, name=’wind_data’)
df=pd.DataFrame.from_records(tb_trading[:])
time1=time.time()
print(‘\ntime on reading data: %6.3fs’ %(time1-time0))
# in python3, remove prefix ‘b’
for key in [‘Date’, ‘Code’]:
df[key] = df[key].str.decode(“utf-8”)

# the following two methods are
# method 1
#str_df = df.loc[:,[‘Date’,’Code’]]
#str_df = str_df.stack().str.decode(‘utf-8′).unstack()
#for col in str_df:
# df[col] = str_df[col]
#method 2
#df.loc[:,’Date’]=[[dt.decode(‘utf-8′)] for dt in df.loc[:,’Date’]]
#df.loc[:,’Code’]=[[cd.decode(‘utf-8′)] for cd in df.loc[:,’Code’]]

time2=time.time()
print(“\ntime on removing prefix ‘b’: %6.3fs” %(time2-time1))
print(‘\ntotal time: %6.3fs’ %(time2-time0))

运行结果如下

time on reading data: 1.508s

time on removing prefix ‘b’: 9.315s

total time: 10.823s

不知道有没有更快捷的方式。

Add new column to Pytables

h5是很好的保存文件的格式，该文件格式有一个优势是支持快速读取，但是在写入列上有些麻烦。比较好的写入新列的方式，是将新生成的数据写入到一个新的表中，然后删除原来的表。

以下代码就是实现上述功能（此后文章未加说明，均为在Windows+Python 3 上实现）

# -*- coding: utf-8 -*-

# read data from h5 file and add one colume, then write new data to a new file

import tables as tb

import pandas as pd

import numpy as np

import time

import sys

time0=time.time()

pth=‘d:/download/’

data_file=pth+‘data0.h5’

#read data from the file

filem=tb.open_file(data_file,mode=‘a’,driver=“H5FD_CORE”)

df0=filem.get_node(where=‘/’,name=‘FinancialData’)

df = pd.DataFrame.from_records(df0[:])

df[‘UniqueorNot’]=np.where(df.duplicated(subset=[‘Date’,‘Code’]),‘Duplicated’,‘Unique’)

# convert object to string, because table can not accept object

def foo(atype):

  if atype==np.object_:

  return ‘S10’

  return atype

def df_dtype(df):

cols = df.columns

  if  sys.version_info[0] == 2: # python 2 needs .encode() but 3 does not

  types = [(cols[i].encode(), foo(df[k].dtype.type)) for (i, k) in enumerate(cols)]

  else:

  types = [(cols[i], foo(df[k].dtype.type)) for (i, k) in enumerate(cols)]

dtype = np.dtype(types)

  return dtype

dty = df_dtype(df)

# write to a new PyTables table

data_file=pth+‘data1.h5’

data_h5 = tb.open_file(data_file, ‘a’)

# make a definiton of creating Pytables

def c_table(f,tname,tdty,data):

  try:

f.create_table(where=‘/’, name=tname, description=tdty)

# if the table has already been created, then do nothing

  except tb.NodeError:

  print(‘\n‘, tname, ‘is existing’)

  pass

# get data from table

c=f.get_node(where=‘/’,name=tname)

# data can be list, and internal type is tuple

ts = [tuple(s) for s in data]

c.append(rows=ts)

c.flush()

# if you do not want to save the file to csv, you can drop the following two rows.

# df = pd.DataFrame.from_records(c[:])

# df.to_csv(pth+’data1.csv’)

# write new data to file

namelist=[‘FinancialData’]

for c in namelist:

c_table(data_h5,c,dty,df.as_matrix())

#flush and close the file

data_h5.flush()

data_h5.close()

# check the time elapsed

time2=time.time()

print(‘\n%8.4fs’ %(time2-time0))

2024年 4月
日	一	二	三	四	五	六
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30