On each processor, a Thomas-like forward sweep is first performed in order to zero out the first subdiagonal in each diagonal block (see Figure 2.2b). These forward sweeps are done in parallel on the p processors; on p-1 of the processors, an extra column fills up during these forward sweeps. A back sweep, analogous to the forward sweep, is also introduced to zero out elements in the first superdiagonal of each block, starting from the element above the (m-1)'th element of each block and proceeding upward through the block (see Figure 2.2c). These back sweeps are also done in parallel on the p processors; on all of the processors, an extra column also fills up during these back sweeps. Finally, on the last p-1 of the processors, each back sweep is extended one extra step into the adjacent block above (see Figure 2.2d). The m'th equation on each processor (see arrows in Figure 2.2d) is now lumped together into a single, tridiagonal, fairly small (that is, p x p) system and solved (on the "master" processor) with the ordinary Thomas algorithm. Once the m'th variable in each block is determined in this fashion, the remaining variables in each block are determined by substitution. The leading-order cost of this algorithm is ~ (17n) (about twice that of the standard Thomas algorithm).

An implementation of the parallel Thomas algorithm described above is given in Algorithm 2.12; as opposed to most codes presented in this text, which are based solely on for loops, if statements, function calls, and floating-point operations on vectors and matrices, Algorithm 2.12 makes use of a few of the advanced parallel programming features of Matlab. Unfortunately, the performance of this code using the 2011a release of Matlab is poor, and the parallelized code is actually slower than the standard (serial) Thomas algorithm, even when using four processors and large values of n.
This is likely due to the fact that, as of 2011, parallel programming is a fairly new addition to Matlab; it is anticipated that the efficiency and flexibility of the parallel Matlab tools will improve significantly in the near future. In contrast, the parallel capabilities of lower-level languages like Fortran and C (as discussed elsewhere) are much more mature and can provide a big speedup using the algorithm described above. Another strategy for parallel solution of tridiagonal systems is cyclic reduction (see Exercise 4.5). ...
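
The block-wise elimination described above can be sketched in a few lines. The following is a minimal serial emulation in plain Python, not Algorithm 2.12 itself: the function names `thomas` and `partitioned_thomas`, the storage of the three diagonals as lists `a`, `b`, `c`, and the requirement that n be a multiple of p with m = n/p >= 2 are all assumptions made for illustration. The loop over blocks in phase 1 is the part that each of the p processors would execute independently; here it simply runs one block after another. No pivoting is performed, so the sketch is intended for diagonally dominant systems.

```python
def thomas(a, b, c, g):
    """Ordinary (serial) Thomas algorithm, without pivoting.

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = g[i];
    a[0] and c[-1] are never used.
    """
    n = len(b)
    b, g = list(b), list(g)              # work on copies
    for i in range(1, n):                # forward sweep
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        g[i] -= w * g[i - 1]
    x = [0.0] * n
    x[-1] = g[-1] / b[-1]
    for i in range(n - 2, -1, -1):       # back substitution
        x[i] = (g[i] - c[i] * x[i + 1]) / b[i]
    return x


def partitioned_thomas(a, b, c, g, p):
    """Serial emulation of the p-block scheme (illustrative assumption)."""
    n = len(b)
    assert n % p == 0 and n // p >= 2, "need n = p*m with m >= 2"
    m = n // p
    a, b, c, g = list(a), list(b), list(c), list(g)
    a[0] = 0.0
    c[-1] = 0.0

    # Phase 1: sweeps inside each block (independent across blocks).
    for k in range(p):
        s, e = k * m, k * m + m - 1      # first and last row of block k
        # Forward sweep: zero the subdiagonal; a[i] becomes the fill-in
        # coefficient multiplying x[s-1], the last unknown of block k-1.
        for i in range(s + 1, e + 1):
            w = a[i] / b[i - 1]
            b[i] -= w * c[i - 1]
            g[i] -= w * g[i - 1]
            a[i] = -w * a[i - 1]
        # Back sweep: zero the superdiagonal above row e-1; c[i] becomes
        # the fill-in coefficient multiplying x[e].
        for i in range(e - 2, s - 1, -1):
            w = c[i] / b[i + 1]
            a[i] -= w * a[i + 1]
            g[i] -= w * g[i + 1]
            c[i] = -w * c[i + 1]

    # Phase 2: extend each back sweep one step into the block above,
    # removing x[s] from the last equation (row s-1) of block k-1.
    for k in range(1, p):
        s = k * m
        r = s - 1
        w = c[r] / b[s]
        b[r] -= w * a[s]
        g[r] -= w * g[s]
        c[r] = -w * c[s]

    # Phase 3: the m'th equation of every block forms a p x p
    # tridiagonal system, solved with the ordinary Thomas algorithm.
    last = [k * m + m - 1 for k in range(p)]
    xr = thomas([a[i] for i in last], [b[i] for i in last],
                [c[i] for i in last], [g[i] for i in last])

    # Phase 4: recover the remaining unknowns of each block; every
    # row now involves only x[i] and the two known block-end values.
    x = [0.0] * n
    for k in range(p):
        s, e = k * m, k * m + m - 1
        x[e] = xr[k]
        x_left = xr[k - 1] if k > 0 else 0.0
        for i in range(s, e):
            x[i] = (g[i] - a[i] * x_left - c[i] * x[e]) / b[i]
    return x
```

On a diagonally dominant system, `partitioned_thomas(a, b, c, g, p)` should agree with `thomas(a, b, c, g)` to within rounding error for any admissible p. In a genuinely parallel setting, phase 2 is where neighboring processors would need to exchange one row of data, and phase 3 is the step assigned to the "master" processor.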