


How can I use OpenBLAS in multi-threaded applications? The OpenBLAS FAQ answers this directly: if your application is already multi-threaded, it will conflict with OpenBLAS multi-threading. Thus, you must set OpenBLAS to use a single thread, in one of the following ways:

1. export OPENBLAS_NUM_THREADS=1 in the environment variables. Or
2. Call openblas_set_num_threads(1) in the application at runtime. Or
3. Build the single-threaded version of OpenBLAS, e.g. with USE_THREAD=0.

If the application is parallelized by OpenMP, build OpenBLAS with USE_OPENMP=1.

Before version 3.8.0, Octave ran a single thread. Now it runs two threads: 1) for the Qt GUI, and 2) for the main interpreter. This is probably why there is a problem now.

Octave requires that a BLAS library be installed, but it is not compiled specifically for any one library. If Octave were compiled against a specific library, then distributions would also need to maintain multiple Octave packages, one for each combination. Also, many Linux distributions make it possible to quickly install a different BLAS implementation, such as OpenBLAS, ATLAS, or the reference BLAS. This suggests that Option 2 isn't the right approach, because we would need multiple configure options for OpenBLAS, Intel MKL, etc.

Option 3 might be a possibility, but it would mean getting agreement from the packaging managers at the various distributions, like Debian, to always compile with USE_THREAD=0. It seems like Option 1 may be the best because it is configurable by the user and can therefore adapt to the local computing environment as needed.
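As a concrete illustration of Option 1 (this sketch is not from the original discussion), the snippet below assumes an OpenBLAS-linked Octave and merely checks whether the limit is in effect. OpenBLAS reads OPENBLAS_NUM_THREADS when it initializes, so the variable normally has to be exported before Octave is launched, e.g. from the shell with "export OPENBLAS_NUM_THREADS=1" followed by "octave --gui".

% Check whether OpenBLAS has been restricted to a single thread (Option 1).
nthreads = getenv ("OPENBLAS_NUM_THREADS");
if (isempty (nthreads))
  warning ("OPENBLAS_NUM_THREADS is not set; OpenBLAS may use up to %d threads", nproc ());
elseif (str2double (nthreads) != 1)
  warning ("OPENBLAS_NUM_THREADS=%s; set it to 1 to avoid clashing with Octave's own threads", nthreads);
endif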
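Related to the point that distributions can swap the BLAS implementation underneath Octave, the hypothetical check below reports which BLAS/LAPACK the running Octave is linked against. The "-blas" and "-lapack" options to version() exist only in recent Octave releases, so treat this purely as a sketch; on older releases the call errors out and the catch branch runs.

% Ask Octave which BLAS/LAPACK it is using (recent Octave releases only).
try
  printf ("BLAS   : %s\n", version ("-blas"));
  printf ("LAPACK : %s\n", version ("-lapack"));
catch
  disp ("This Octave version cannot report its BLAS/LAPACK libraries.");
end_try_catch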
Here is the snippet I am using:

pkg load parallel
vector_y = pararrayfun (nproc, @(x) 10.*log10 (x.^2), vector_x, "Vectorized", true, "ChunksPerProc", 1)
vector_y = pararrayfun (1, @(x) 10.*log10 (x.^2), vector_x, "Vectorized", true, "ChunksPerProc", 1)

While I cannot answer your question (as I cannot reproduce the error), I would suggest that you not use pararrayfun for this case. I have a philosophical issue with arrayfun/pararrayfun (what is the use of arrayfun if a for loop is faster?): I believe that they are completely redundant, and they fool MATLAB/Octave users into a false sense of efficiency. In my experience, MATLAB's JIT compiler is smart enough to vectorize simple for loops like the one we have here. However, apparently the Octave interpreter is not smart enough.

I was fooling around and testing some other options to do the same calculation and comparing the time spent for each one. I am on a machine with 64 threads, and here are the timing results:

% pararrayfun using all available threads with ChunksPerProc = 1.
% (in theory) decreases time on busy machines, but in my case it didn't help.
% pararrayfun using 1 thread with ChunksPerProc = 1.
% Uses half the amount of time and 1/64th of the cores.
% Simple vectorization to let Octave figure out the threading itself.

Also, I have yet to encounter a single case where arrayfun/pararrayfun beats simple vectorization (or simple vectorization combined with parfor). Adding Mike Miller to the CC list since he has worked with OMP and OMP_NUM_THREADS and this is superficially related.
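A runnable sketch of that comparison is shown below. It is a reconstruction for illustration only: the input vector, its size, and the printed labels are invented, since the original vector_x and the actual timing numbers are not quoted in the thread. It assumes the parallel package is installed.

% Compare pararrayfun (all processes), pararrayfun (one process), and plain vectorization.
pkg load parallel

vector_x = rand (1, 1e6);            % made-up input; the real vector_x is not shown above
f = @(x) 10.*log10 (x.^2);

tic;
y1 = pararrayfun (nproc, f, vector_x, "Vectorized", true, "ChunksPerProc", 1);
printf ("pararrayfun, %d processes : %.3f s\n", nproc, toc);

tic;
y2 = pararrayfun (1, f, vector_x, "Vectorized", true, "ChunksPerProc", 1);
printf ("pararrayfun, 1 process    : %.3f s\n", toc);

tic;
y3 = f (vector_x);                   % simple vectorization
printf ("vectorized                : %.3f s\n", toc);

printf ("max |pararrayfun - vectorized| = %g\n", max (abs (y1(:) - y3(:))));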
