{
"metadata": {
"name": "parallel_vectorize"
},
"nbformat": 2,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"source": [
"Parallel Vectorize",
"------------------",
"",
"The `parallel_vectorize.py` module contains a set of llvmpy code generators",
"for creating multithreaded _ufuncs_.",
"It depends on the new `numpy.fromfunc` for turning arbitrary function pointers into _ufuncs_.",
"",
"From LLVM Function",
"------------------",
"",
"The `parallel_vectorize_from_func` function generates a multithreaded _ufunc_ from an LLVM function.",
"",
"First, we will implement a workload function:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from llvm_cbuilder import *",
"from llvm_cbuilder import shortnames as C",
"from llvm.core import *",
"",
"# Implement a workload",
"class Square(CDefinition):",
"    _name_ = 'square'",
"    _retty_ = C.double  # 1 output: double",
"    _argtys_ = [('x', C.double)]  # 1 input: double",
"",
"    def body(self, x):",
"        self.ret(x * x)",
"",
"m = Module.new('my_module')",
"llvm_square = Square()(m)  # Generate an LLVM function",
"print(llvm_square)"
],
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"",
"define double @square(double %x) {",
"decl:",
"  %x1 = alloca double",
"  br label %body",
"",
"body: ; preds = %decl",
"  store double %x, double* %x1",
"  %0 = load double* %x1",
"  %1 = load double* %x1",
"  %2 = fmul double %0, %1",
"  ret double %2",
"}",
""
]
}
],
"prompt_number": 1
},
{
"cell_type": "markdown",
"source": [
"Then, we will generate a _ufunc_ from `llvm_square`:"
]
},
{
"cell_type": "code",
"collapsed": true,
"input": [
"from llvm.ee import *",
"engine = EngineBuilder.new(m).create()  # Generate JIT engine",
"",
"from parallel_vectorize import parallel_vectorize_from_func",
"ufunc_square = parallel_vectorize_from_func(llvm_square, engine)  # Generate UFunc"
],
"language": "python",
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"source": [
"We are ready to use `ufunc_square` as a regular _ufunc_."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np",
"A = np.arange(10., dtype=np.double)",
"ufunc_square(A)"
],
"language": "python",
"outputs": [
{
"output_type": "pyout",
"prompt_number": 3,
"text": [
"array([ 0., 1., 4., 9., 16., 25., 36., 49., 64., 81.])"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"source": [
"Here's another example that uses three inputs:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"class SumOfThree(CDefinition):",
"    _name_ = 'sum.of.three'",
"    _retty_ = C.int",
"    _argtys_ = [('x', C.int),",
"                ('y', C.int),",
"                ('z', C.int)]",
"",
"    def body(self, x, y, z):",
"        self.ret(x + y + z)",
"",
"llvm_sum3 = SumOfThree()(m)",
"ufunc_sum3 = parallel_vectorize_from_func(llvm_sum3, engine)",
"A = np.arange(10, dtype=np.int32)",
"B = A * 10",
"C = A * 100",
"ufunc_sum3(A, B, C)"
],
"language": "python",
"outputs": [
{
"output_type": "pyout",
"prompt_number": 4,
"text": [
"array([ 0, 111, 222, 333, 444, 555, 666, 777, 888, 999], dtype=int32)"
]
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"source": [
"* * *",
"",
"Internals",
"---------",
"",
"There are four functions behind each multithreaded _ufunc_:",
"",
"1. the workload function (user defined);",
"2. the thread worker function (`UFuncCoreGeneric`);",
"3. the thread manager function (`ParallelUFuncPlatform`);",
"4. the ufunc entry point function (`SpecializedParallelUFunc`).",
"",
"**UFuncCoreGeneric** is specialized for an LLVM function type.",
"**It currently supports only simple built-in scalar types (integers, float, double) as the argument and return types of the workload function.**",
"It sends work items to the workload function and performs work-stealing once its own work queue is empty.",
"Work-stealing uses an atomic compare-exchange (CAS) instruction to acquire ownership of a work queue.",
"Work-stealing is implemented in `UFuncCore._do_work_stealing`.",
"It can be disabled on platforms that do not support atomic operations.",
"",
"**ParallelUFuncPlatform** is specialized for the maximum number of threads.",
"It divides the work items equally among the threads.",
"Each thread executes the function generated by `UFuncCoreGeneric` once.",
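"",
"The division of work and the stealing step described above can be pictured with plain Python threads.",
"This is only an illustrative sketch, not the code that the module generates: `WorkQueue`, `worker` and `run_parallel` are hypothetical names, and a `threading.Lock` stands in for the atomic compare-exchange instruction.",
"",
"```python",
"import threading",
"",
"class WorkQueue:",
"    # Half-open index range [next, end) that initially belongs to one thread.",
"    def __init__(self, start, end):",
"        self.next, self.end = start, end",
"        self.lock = threading.Lock()  # assumption: stand-in for an atomic CAS",
"",
"    def take(self):",
"        # Claim one index, or return None when the queue is empty.",
"        with self.lock:",
"            if self.next >= self.end:",
"                return None",
"            index = self.next",
"            self.next += 1",
"            return index",
"",
"def worker(me, queues, func, src, dst):",
"    # Drain our own queue first, then try to steal from the other queues.",
"    for queue in queues[me:] + queues[:me]:",
"        index = queue.take()",
"        while index is not None:",
"            dst[index] = func(src[index])",
"            index = queue.take()",
"",
"def run_parallel(func, src, num_thread=2):",
"    # Divide the items as equally as possible among the threads.",
"    n = len(src)",
"    bounds = [n * t // num_thread for t in range(num_thread + 1)]",
"    queues = [WorkQueue(bounds[t], bounds[t + 1]) for t in range(num_thread)]",
"    dst = [None] * n",
"    threads = [threading.Thread(target=worker, args=(t, queues, func, src, dst))",
"               for t in range(num_thread)]",
"    for thread in threads:",
"        thread.start()  # each thread runs the worker exactly once",
"    for thread in threads:",
"        thread.join()",
"    return dst",
"",
"print(run_parallel(lambda x: x * x, list(range(10))))",
"```",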
"",
"**SpecializedParallelUFunc** is the specialized _ufunc_ entry point for a specific combination of",
"workload, `UFuncCoreGeneric` and `ParallelUFuncPlatform`.",
"",
"Here's an example that uses `SpecializedParallelUFunc` directly for the `SumOfThree` workload."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import parallel_vectorize as pv",
"# specialize",
"def_spuf = pv.SpecializedParallelUFunc(pv.ParallelUFuncPlatform(num_thread=2),",
"                                       pv.UFuncCoreGeneric(llvm_sum3.type.pointee),",
"                                       CFuncRef(llvm_sum3))",
"# define",
"spuf = def_spuf(m)",
"print(spuf.name)"
],
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"specialized_parallel_ufunc_2_ufunc_worker.i32.i32.i32.i32_sum.of.three"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"source": [
"`CFuncRef` also accepts an arbitrary function pointer, as long as the function type is provided."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# specialize",
"fnty = llvm_sum3.type.pointee",
"sum3ptr = engine.get_pointer_to_function(llvm_sum3)",
"print(\"as function pointer: %x\" % sum3ptr)",
"def_spuf = pv.SpecializedParallelUFunc(pv.ParallelUFuncPlatform(num_thread=2),",
"                                       pv.UFuncCoreGeneric(fnty),",
"                                       CFuncRef('sum3.as.ptr', fnty, sum3ptr))  # name, type, ptr",
"# define",
"spuf = def_spuf(m)",
"print(spuf.name)"
],
"language": "python",
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"as function pointer: 7f0bfc090740",
"specialized_parallel_ufunc_2_ufunc_worker.i32.i32.i32.i32_sum3.as.ptr"
]
}
],
"prompt_number": 6
}
]
}
]
}