- Hands-On GPU Programming with Python and CUDA
- Dr. Brian Tuomanen
The Mandelbrot set
We are now prepared to see a very standard example for parallel computing that we will revisit later in this text—an algorithm to generate an image of the Mandelbrot set. Let's first define exactly what we mean.
For a given complex number, c, we define a recursive sequence z_n for n ≥ 0, with z_0 = 0 and z_n = z_{n-1}^2 + c for n ≥ 1. If |z_n| remains bounded by 2 as n increases to infinity, then we will say that c is a member of the Mandelbrot set.
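The definition above translates directly into a membership check for a single point. The following is a minimal sketch; the function name and the iteration cutoff are our own illustrative assumptions, since in practice we can only check the sequence up to some finite n:

```python
def in_mandelbrot(c, max_iters=256):
    # iterate z_n = z_{n-1}^2 + c starting from z_0 = 0
    z = 0 + 0j
    for _ in range(max_iters):
        z = z**2 + c
        if abs(z) > 2:
            return False   # the sequence escaped: c is not a member
    return True            # |z_n| stayed bounded for every n we checked

print(in_mandelbrot(-1 + 0j))   # True: the sequence cycles 0, -1, 0, -1, ...
print(in_mandelbrot(1 + 0j))    # False: 0, 1, 2, 5, 26, ... grows without bound
```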
Recall that we can visualize the complex numbers as residing on a two-dimensional Cartesian plane, with the x-axis representing the real components and the y-axis representing the imaginary components. We can therefore easily visualize the Mandelbrot set with a very appealing (and well-known) graph. Here, we will represent members of the Mandelbrot set with a lighter shade, and nonmembers with a darker shade on the complex Cartesian plane as follows:
(Figure: the Mandelbrot set plotted on the complex plane, with members shown in a lighter shade.)
Now, let's think about how we would go about generating this set in Python. We have to consider a few things first: since we obviously can't check whether every single complex number is in the Mandelbrot set, we have to choose a certain range to check over; we have to determine how many points in each range we will consider (width, height); and we need the maximum value of n up to which we will check |z_n| (max_iters). We can now prepare to implement a function to generate a graph of the Mandelbrot set; here, we do this by iterating over every single point in the graph in serial.
We will start by importing the NumPy library, which is a numerical library that we will be making ample use of throughout this text. Our implementation here is in the simple_mandelbrot function. We start by using NumPy's linspace function to generate a lattice that will act as a discrete complex plane (the rest of the code that follows should be fairly straightforward):
import numpy as np

def simple_mandelbrot(width, height, real_low, real_high, imag_low, imag_high, max_iters):
    real_vals = np.linspace(real_low, real_high, width)
    imag_vals = np.linspace(imag_low, imag_high, height)
    # we will represent members as 1, non-members as 0.
    mandelbrot_graph = np.ones((height, width), dtype=np.float32)
    for x in range(width):
        for y in range(height):
            c = np.complex64(real_vals[x] + imag_vals[y] * 1j)
            z = np.complex64(0)
            for i in range(max_iters):
                z = z**2 + c
                if np.abs(z) > 2:
                    mandelbrot_graph[y, x] = 0
                    break
    return mandelbrot_graph
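Since the lattice construction relies on NumPy's linspace, it is worth seeing exactly what it returns; for instance, with a width of 5 over the real range [-2, 2], the real coordinates of our lattice would be:

```python
import numpy as np

# linspace gives evenly spaced samples over a closed interval, endpoints included
real_vals = np.linspace(-2, 2, 5)
print(real_vals)       # [-2. -1.  0.  1.  2.]
print(len(real_vals))  # 5
```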
Now, we want to add some code to dump the image of the Mandelbrot set to a PNG format file, so let's add the appropriate headers at the beginning:
from time import time
import matplotlib
# the following will prevent the figure from popping up
matplotlib.use('Agg')
from matplotlib import pyplot as plt
Now, let's add some code to generate the Mandelbrot set and dump it to a file, and use the time function to time both operations:
if __name__ == '__main__':
    t1 = time()
    mandel = simple_mandelbrot(512, 512, -2, 2, -2, 2, 256)
    t2 = time()
    mandel_time = t2 - t1
    t1 = time()
    fig = plt.figure(1)
    plt.imshow(mandel, extent=(-2, 2, -2, 2))
    plt.savefig('mandelbrot.png', dpi=fig.dpi)
    t2 = time()
    dump_time = t2 - t1
    print('It took {} seconds to calculate the Mandelbrot graph.'.format(mandel_time))
    print('It took {} seconds to dump the image.'.format(dump_time))
Now let's run this program (this is also available as the mandelbrot0.py file, in folder 1, within the GitHub repository):

It took about 14.62 seconds to generate the Mandelbrot set, and about 0.11 seconds to dump the image. As we have seen, we generate the Mandelbrot set point by point; since there is no interdependence between the values of different points, this is an intrinsically parallelizable function. In contrast, the code to dump the image cannot be parallelized.
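To see the independence of the per-point work concretely, the same computation can be expressed "all points at once" with NumPy array operations. This is an illustrative vectorized sketch of our own (not the book's GPU implementation), but it shows that every lattice point can be updated in the same step with no cross-point dependencies:

```python
import numpy as np

def vectorized_mandelbrot(width, height, real_low, real_high,
                          imag_low, imag_high, max_iters):
    real_vals = np.linspace(real_low, real_high, width)
    imag_vals = np.linspace(imag_low, imag_high, height)
    # build the whole lattice of c values at once via broadcasting
    c = (real_vals[np.newaxis, :] + imag_vals[:, np.newaxis] * 1j).astype(np.complex64)
    z = np.zeros_like(c)
    graph = np.ones((height, width), dtype=np.float32)
    for _ in range(max_iters):
        z = z**2 + c                # one update step for every point at once
        escaped = np.abs(z) > 2
        graph[escaped] = 0          # mark escapees as non-members (stays 0)
        z[escaped] = 0              # keep magnitudes small to avoid overflow
    return graph
```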
Now, let's analyze this in terms of Amdahl's Law. What sort of speedups can we get if we parallelize our code here? In total, both pieces of the program took about 14.73 seconds to run; since we can parallelize the Mandelbrot set generation, we can say that the portion of execution time for parallelizable code is p = 14.62 / 14.73 ≈ 0.99. This program is 99% parallelizable!
What sort of speedup can we potentially get? Well, I'm currently working on a laptop with an entry-level GTX 1050 GPU with 640 cores; our N will thus be 640 when we use the formula. We calculate the speedup as follows:
Speedup = 1 / ((1 − p) + p / N) = 1 / (0.01 + 0.99 / 640) ≈ 86.6
A potential speedup of more than 80x is definitely very good, and would indicate to us that it is worth our effort to program our algorithm to use the GPU. Keep in mind that Amdahl's Law only gives a very rough estimate! Additional considerations come into play when we offload computations onto the GPU, such as the extra time it takes for the CPU to send data to and receive data from the GPU, or the fact that algorithms offloaded to the GPU may themselves be only partially parallelizable.
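The calculation above is simple enough to express as a small helper, plugging in the measured numbers from this chapter (the function name is our own):

```python
def amdahl_speedup(p, n):
    # Amdahl's Law: p is the parallelizable fraction of the execution
    # time, n is the number of cores available for the parallel portion
    return 1.0 / ((1.0 - p) + p / n)

# our measured 99% parallelizable program on a 640-core GTX 1050:
print(amdahl_speedup(0.99, 640))   # about 86.6
```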