I have a linear equation system with a full (not sparse) matrix of size 5000x5000. The solution time using pure SciPy is approximately 0.86 seconds. Using Rhino + compas.rpc + SciPy, the solution time is 185 seconds.
Are such solution times expected, so that I have to live with them, or is there a way to speed up the process?
As a first guess, I would assume that the matrix is slow to serialize and deserialize. You need to take into account that the deserialized matrix on the Rhino side will not be a numpy array, but a plain Python list, with the impact on performance that this implies.
Would it be possible for you to post a snippet that reproduces the issue?
For my testing purposes I stored the matrix A and the vector b as JSON files on disk.
Reading the JSON files and converting them to Python lists is not included in the time. The (time3-time2) interval alone takes about 180 seconds.
from System import Array
import rhinoscriptsyntax as rs
from time import time
import json
import compas
from compas.rpc import Proxy
linalg = Proxy('scipy.linalg')
time0 = time()
with open('A.json', 'r') as f:
    datastore = json.load(f)
AA = datastore['A']
with open('b.json', 'r') as f:
    datastore = json.load(f)
bb = datastore['b']
time1 = time()
time2 = time()
x = linalg.solve(AA, bb)
time3 = time()
print 'JSON conversion took', (time1-time0), 'seconds'
print 'Linear system solution took', (time3-time2), 'seconds'
The JSON file with the A matrix is about 812 MB, since it contains 5000x5000 elements. If you think it is important I can send it over, but the matrix (and vector) were just randomly generated outside Rhino using the following code:
import time
import json
import numpy
size = 5000
time0 = time.time()
A = numpy.random.rand(size,size)
b = numpy.random.rand(size,1)
x = numpy.linalg.solve(A,b)
print('Elapsed time is',time.time()-time0,'seconds')
could you provide a bit more context about what you are trying to accomplish?
if the data is generated outside of Rhino in the first place, why load it in Rhino, then send it back out via RPC to calculate, and load it in Rhino again?
i just ask because although it is possible to use RPC in the way you describe (wrapping scipy.linalg directly) it is not necessarily what it was designed for…
maybe also just to clarify.
when you do
x = linalg.solve(A, b) # via RPC
you are serializing A and b to JSON to be able to send it to the CPython server where it will be deserialized and used in scipy.linalg.solve. the result is then again serialized on the server side, and sent back to Rhino, where it is deserialized into x.
therefore, the fact that excluding the loading of the original pre-generated JSON files from the timing has very little impact on the result makes sense, because the bottleneck is the amount of data you send back and forth during the RPC call…
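to get a rough feel for that cost, here is a minimal sketch that times only the JSON round trip of a dense matrix held as nested lists, using nothing but the standard library. the helper name `json_roundtrip_cost` and the size 300 are made up for illustration, and a real RPC call adds transport overhead on top of what is measured here:

```python
import json
import random
from time import time


def json_roundtrip_cost(n, seed=0):
    """Time JSON encoding and decoding of a dense n x n matrix of floats.

    Hypothetical helper for illustration only: it mimics the
    serialize/deserialize steps of an RPC call, not the transport.
    """
    rng = random.Random(seed)
    matrix = [[rng.random() for _ in range(n)] for _ in range(n)]

    t0 = time()
    payload = json.dumps(matrix)   # text that would be sent to the server
    t1 = time()
    decoded = json.loads(payload)  # nested lists the receiver has to rebuild
    t2 = time()

    return {
        'payload_chars': len(payload),
        'encode_seconds': t1 - t0,
        'decode_seconds': t2 - t1,
        'lossless': decoded == matrix,
    }


if __name__ == '__main__':
    print(json_roundtrip_cost(300))
```

the payload grows with the square of n, so going from 300 to 5000 multiplies the amount of data by a factor of roughly 280, and the same work is repeated for every argument and result that crosses the RPC boundary.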
I am trying to use Rhino as a geometrical preprocessor for my calculations. Since the coefficient matrices of my problems can exceed a size of 5000, I am testing different solvers to find the one with the best performance. Although the example above was generated outside Rhino, the real problems should be built, solved and visualized inside Rhino. I am trying to learn whether Compas + RPC can be used for the solution step.
So if I understand you correctly, such solution times should be expected for similarly sized problems?
yes, when amounts of data in the order of 1 GB need to be sent back and forth via RPC, i assume you will not get the performance that you are looking for.
however, perhaps you can approach the problem differently.
for example, things might work better if you let the algorithm on the server side compile the matrices from the geometrical input, compute the result, and send processed geometrical data back, rather than send large amounts of numerical data back and forth…
so something like
algorithm = Proxy('mypackage.algorithm')
geometry = ...
result = algorithm(geometry)
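as a sketch of what such a server-side function could look like: the function name `solve_from_geometry` and the input format (vertex count, edge list, prescribed values, loads) are assumptions for illustration and not part of COMPAS. the matrix here is a simple graph Laplacian standing in for whatever your real problem assembles. the point is only that the dense matrix is built and solved inside CPython, so that just small lists have to be serialized:

```python
import numpy as np


def solve_from_geometry(n_vertices, edges, fixed, loads):
    """Assemble and solve a dense system from compact geometric input.

    Hypothetical example function, not part of any library.

    n_vertices : number of vertices
    edges      : list of [i, j] vertex index pairs
    fixed      : list of [index, value] pairs with prescribed values
    loads      : list of n_vertices right-hand-side values
    Returns a plain list of floats so the result is JSON-serializable.
    """
    # Build the dense matrix on the server side (graph Laplacian stand-in).
    A = np.zeros((n_vertices, n_vertices))
    for i, j in edges:
        A[i, i] += 1.0
        A[j, j] += 1.0
        A[i, j] -= 1.0
        A[j, i] -= 1.0
    b = np.array(loads, dtype=float)

    # Impose prescribed values by replacing the corresponding rows.
    for k, value in fixed:
        A[k, :] = 0.0
        A[k, k] = 1.0
        b[k] = value

    x = np.linalg.solve(A, b)
    return x.tolist()


if __name__ == '__main__':
    # Three vertices in a chain, both ends prescribed.
    print(solve_from_geometry(3, [[0, 1], [1, 2]], [[0, 0.0], [2, 2.0]], [0.0, 0.0, 0.0]))
```

with an input like this, the data sent through the proxy scales with the number of vertices and edges rather than with the square of the matrix size. this assumes the module that contains the function is importable in the CPython environment the RPC server runs in.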