<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Concurrency | Haobin Tan</title><link>https://haobin-tan.netlify.app/tags/concurrency/</link><atom:link href="https://haobin-tan.netlify.app/tags/concurrency/index.xml" rel="self" type="application/rss+xml"/><description>Concurrency</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Tue, 13 Feb 2024 00:00:00 +0000</lastBuildDate><image><url>https://haobin-tan.netlify.app/media/icon_hu7d15bc7db65c8eaf7a4f66f5447d0b42_15095_512x512_fill_lanczos_center_3.png</url><title>Concurrency</title><link>https://haobin-tan.netlify.app/tags/concurrency/</link></image><item><title>Concurrency</title><link>https://haobin-tan.netlify.app/docs/coding/python/concurrency/</link><pubDate>Tue, 13 Feb 2024 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/coding/python/concurrency/</guid><description/></item><item><title>Concurrency 101</title><link>https://haobin-tan.netlify.app/docs/coding/python/concurrency/concurrency_in_python/</link><pubDate>Tue, 13 Feb 2024 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/coding/python/concurrency/concurrency_in_python/</guid><description>&lt;h2 id="what-is-concurrency">What is Concurrency?&lt;/h2>
&lt;p>The dictionary definition of concurrency is &lt;strong>simultaneous occurrence.&lt;/strong> In Python, the things that are occurring &lt;em>simultaneously&lt;/em> are called by different names (thread, task, process) but at a high level, they all refer to &lt;strong>a sequence of instructions that run in order&lt;/strong>.&lt;/p>
&lt;blockquote>
&lt;p>Think of them as different &lt;strong>trains of thought.&lt;/strong> Each one can be stopped at certain points, and the CPU or brain that is processing them can switch to a different one. The state of each one is saved so it can be restarted right where it was interrupted.&lt;/p>
&lt;/blockquote>
&lt;p>But threads, tasks, and processes are different in detail:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Only &lt;code>multiprocessing&lt;/code> actually runs these trains of thought at literally the same time.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://realpython.com/intro-to-python-threading/">&lt;code>Threading&lt;/code>&lt;/a> and &lt;code>asyncio&lt;/code> both run on a &lt;em>single&lt;/em> processor and therefore only run one at a time. They just cleverly find ways to take turns to speed up the overall process.&lt;/p>
&lt;p>But there is a big difference between &lt;code>threading&lt;/code> and &lt;code>asyncio&lt;/code> in the way threads or tasks take turns.&lt;/p>
&lt;ul>
&lt;li>In &lt;code>threading&lt;/code>, the operating system actually knows about each thread and can interrupt it at any time to start running a different thread. This is called &lt;strong>&lt;a href="https://en.wikipedia.org/wiki/Preemption_(computing)#Preemptive_multitasking">pre-emptive multitasking&lt;/a>&lt;/strong> since the operating system can pre-empt your thread to make the switch.
&lt;ul>
&lt;li>Pre-emptive multitasking is handy in that the code in the thread does &lt;em>not&lt;/em> need to do anything to make the switch.&lt;/li>
&lt;li>It can also be difficult because of that “at any time” phrase. This switch can happen in the middle of a single Python statement, even a trivial one like &lt;code>x = x + 1&lt;/code>!&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>Asyncio&lt;/code> uses &lt;a href="https://en.wikipedia.org/wiki/Cooperative_multitasking">cooperative multitasking&lt;/a>. The tasks must cooperate by announcing when they are ready to be switched out. That means that the code in the task has to change slightly to make this happen.
&lt;ul>
&lt;li>The benefit of doing this extra work up front is that you always know where your task will be swapped out. It will not be swapped out in the middle of a Python statement unless that statement is marked.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
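&lt;p>The cooperative switching described above can be seen in a minimal sketch (the task names and log values here are illustrative, not from any particular library example): each task runs uninterrupted until its &lt;code>await&lt;/code>, so the interleaving is deterministic.&lt;/p>

```python
import asyncio


async def worker(name, log):
    log.append(f"{name} step 1")
    await asyncio.sleep(0)  # the ONLY point where this task can be switched out
    log.append(f"{name} step 2")


async def main():
    log = []
    # both coroutines share a single thread; they interleave only at await points
    await asyncio.gather(worker("A", log), worker("B", log))
    return log


log = asyncio.run(main())
print(log)
```

Because each task announces its switch point with `await`, the two workers alternate in a fixed, predictable order, which is exactly the "you always know where your task will be swapped out" benefit mentioned above.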
&lt;h2 id="what-is-parallelism">What is Parallelism?&lt;/h2>
&lt;p>&lt;code>multiprocessing&lt;/code> allows us to use all the CPU cores we have. With &lt;code>multiprocessing&lt;/code>, Python creates new processes.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>A &lt;strong>process&lt;/strong> here can be thought of as almost a completely different program, though technically they’re usually defined as a collection of resources where the resources include memory, file handles and things like that. &lt;em>One way to think about it is that each process runs in its own Python interpreter.&lt;/em>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Because they are different processes, each of your trains of thought in a multiprocessing program can run on a different core. Running on a different core means that they &lt;em>actually can run at the same time&lt;/em>, which is fabulous 👏. There are some complications that arise from doing this, but Python does a pretty good job of smoothing them over most of the time.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Comparison between concurrency and parallelism:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Concurrency Type&lt;/th>
&lt;th>Switching Decision&lt;/th>
&lt;th>Number of Processors&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Pre-emptive multitasking (&lt;code>threading&lt;/code>)&lt;/td>
&lt;td>The operating system decides when to switch tasks external to Python.&lt;/td>
&lt;td>1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Cooperative multitasking (&lt;code>asyncio&lt;/code>)&lt;/td>
&lt;td>The tasks decide when to give up control.&lt;/td>
&lt;td>1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Multiprocessing (&lt;code>multiprocessing&lt;/code>)&lt;/td>
&lt;td>The processes all run at the same time on different processors.&lt;/td>
&lt;td>Many&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="when-is-concurrency-useful">When is Concurrency Useful?&lt;/h2>
&lt;p>Concurrency can make a big difference for two types of problems: &lt;strong>I/O-bound&lt;/strong> and &lt;strong>CPU-bound&lt;/strong>.&lt;/p>
&lt;p>I/O-bound problems cause your program to slow down because &lt;em>it frequently must wait for &lt;a href="https://realpython.com/python-input-output/">input/output&lt;/a> (I/O) from some external resource&lt;/em>. They arise frequently when your program is working with things that are much slower than your CPU.&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/IOBound.4810a888b457.png" alt="Timing Diagram of an I/O Bound Program">&lt;/p>
&lt;p>&lt;strong>CPU-bound&lt;/strong> programs do significant computation without talking to the network or accessing a file. In this case, the resource limiting the speed of your program is the CPU, not the network or the file system.&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/CPUBound.d2d32cb2626c.png" alt="CPUBound.d2d32cb2626c">&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>I/O-Bound Process&lt;/th>
&lt;th>CPU-Bound Process&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Your program spends most of its time talking to a slow device, like a network connection, a hard drive, or a printer.&lt;/td>
&lt;td>Your program spends most of its time doing CPU operations.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Speeding it up involves overlapping the times spent waiting for these devices.&lt;/td>
&lt;td>Speeding it up involves finding ways to do more computations in the same amount of time.&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="how-to-speed-up-an-io-bound-program">How to Speed Up an I/O-Bound Program?&lt;/h2>
&lt;h2 id="how-to-speed-up-a-cpu-bound-program">How to Speed Up a CPU-Bound Program?&lt;/h2>
&lt;p>A CPU-bound problem does few I/O operations, and its overall execution time is a factor of how fast it can process the required data.&lt;/p>
&lt;p>We’ll use a somewhat silly function to create something that takes a long time to run on the CPU&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">cpu_bound&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">number&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># computes the sum of the squares of each number from 0 to the passed-in value&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nb">sum&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">i&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">i&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">i&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">number&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="synchronous-version">Synchronous Version&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">time&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">cpu_bound&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">number&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nb">sum&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">i&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">i&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">i&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">number&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">find_sums&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">numbers&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">number&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">numbers&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">cpu_bound&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">number&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">if&lt;/span> &lt;span class="vm">__name__&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s2">&amp;#34;__main__&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">numbers&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mi">5_000_000&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">x&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">x&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">20&lt;/span>&lt;span class="p">)]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">start_time&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">time&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">time&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">find_sums&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">numbers&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">duration&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">time&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">time&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="n">start_time&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;Duration &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">duration&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2"> seconds&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This code calls &lt;code>cpu_bound()&lt;/code> 20 times with a different large number each time. It does all of this on a single thread in a single process on a single CPU. The execution timing diagram looks like this:&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/CPUBound.d2d32cb2626c-20240213160924535.png" alt="CPUBound.d2d32cb2626c">&lt;/p>
&lt;p>This program takes about 7.1 seconds on my machine:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">Duration 7.118567943572998 seconds
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="cpu-bound-multiprocessing-version">CPU-Bound &lt;code>multiprocessing&lt;/code> Version&lt;/h3>
&lt;p>&lt;code>multiprocessing&lt;/code> is explicitly designed to share heavy CPU workloads across multiple CPUs. Here’s what its execution timing diagram looks like:&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/CPUMP.69c1a7fad9c4.png" alt="CPUMP.69c1a7fad9c4">&lt;/p>
&lt;p>Let&amp;rsquo;s apply &lt;code>multiprocessing&lt;/code> to accelerate our code above:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">multiprocessing&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">time&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">cpu_bound&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">number&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="nb">sum&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">i&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="n">i&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">i&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">number&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">find_sums&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">numbers&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">with&lt;/span> &lt;span class="n">multiprocessing&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">Pool&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">pool&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">pool&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">cpu_bound&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">numbers&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">if&lt;/span> &lt;span class="vm">__name__&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="s2">&amp;#34;__main__&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">numbers&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="mi">5_000_000&lt;/span> &lt;span class="o">+&lt;/span> &lt;span class="n">x&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">x&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">20&lt;/span>&lt;span class="p">)]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">start_time&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">time&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">time&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">find_sums&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">numbers&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">duration&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">time&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">time&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="o">-&lt;/span> &lt;span class="n">start_time&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;Duration &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">duration&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2"> seconds&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ul>
&lt;li>In &lt;code>find_sums()&lt;/code>, we change from looping through the numbers to creating a &lt;code>multiprocessing.Pool&lt;/code> object and using its &lt;code>.map()&lt;/code> method to send individual numbers to worker processes as they become free.&lt;/li>
&lt;li>The &lt;code>multiprocessing.Pool()&lt;/code> constructor accepts an optional &lt;code>processes&lt;/code> parameter.
&lt;ul>
&lt;li>You can specify how many &lt;code>Process&lt;/code> objects you want created and managed in the &lt;code>Pool&lt;/code>. By default, it will determine how many CPUs are in your machine and create a process for each one.&lt;/li>
&lt;li>In a production environment, you might want to have a little more control.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>On my machine, the running time is reduced to about 2.3 seconds.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">Duration 2.3258490562438965 seconds
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="when-to-use-concurrency">When to Use Concurrency?&lt;/h2>
&lt;h2 id="reference">Reference&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://realpython.com/python-concurrency/#what-is-concurrency">Speed Up Your Python Program With Concurrency&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Thread and Thread Pool</title><link>https://haobin-tan.netlify.app/docs/coding/python/concurrency/python_thread/</link><pubDate>Tue, 13 Feb 2024 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/coding/python/concurrency/python_thread/</guid><description>&lt;h2 id="python-threads">Python Threads&lt;/h2>
&lt;p>A &lt;a href="https://en.wikipedia.org/wiki/Thread_(computing)">thread&lt;/a> refers to a thread of execution by a computer program.&lt;/p>
&lt;p>Every Python program is a process with one thread called the &lt;em>&lt;strong>main thread&lt;/strong>&lt;/em> used to execute your program instructions.&lt;/p>
&lt;ul>
&lt;li>Each process is in fact one instance of the Python interpreter that executes Python instructions (Python bytecode)&lt;/li>
&lt;/ul>
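&lt;p>This can be inspected directly with the &lt;code>threading&lt;/code> module (a minimal sketch):&lt;/p>

```python
import threading

# every Python program starts as one process with a single thread
t = threading.current_thread()
print(t.name)                        # -> MainThread
print(t is threading.main_thread())  # -> True
```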
&lt;p>Each thread that is created requires the allocation of resources (e.g. memory for the thread’s stack space). The computational cost of setting up threads can become expensive if we are creating and destroying many threads over and over for ad hoc tasks.&lt;/p>
&lt;p>Instead, we would prefer to keep worker threads around for reuse if we expect to run many ad hoc tasks throughout our program. This can be achieved using a &lt;strong>thread pool&lt;/strong>.&lt;/p>
&lt;h2 id="thread-pools">Thread Pools&lt;/h2>
&lt;p>A &lt;a href="https://en.wikipedia.org/wiki/Thread_pool">thread pool&lt;/a> is a &lt;strong>programming pattern for automatically managing a pool of worker threads&lt;/strong>. The pool is responsible for a fixed number of threads.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>It controls when the threads are created, such as just-in-time when they are needed.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>It also controls what threads should do when they are not being used, such as making them wait without consuming computational resources.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Each thread in the pool is called a &lt;strong>worker&lt;/strong> or a &lt;strong>worker thread&lt;/strong>.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Each worker is &lt;em>agnostic&lt;/em> to the type of tasks that are executed, allowing the user of the thread pool to execute a suite of similar (homogeneous) or dissimilar (heterogeneous) tasks in terms of the function called, function arguments, task duration, and more.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Worker threads are designed to be re-used once the task is completed and provide protection against the unexpected failure of the task, such as raising an exception, without impacting the worker thread itself.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>The pool may provide some facility to configure the worker threads, such as running an initialization function and naming each worker thread using a specific naming convention.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Thread pools can provide a generic interface for executing ad hoc tasks with a variable number of arguments, but do not require that we choose a thread to run the task, start the thread, or wait for the task to complete.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>It can be significantly more efficient to use a thread pool instead of manually starting, managing, and closing threads, especially with a large number of tasks.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Python provides a thread pool via the &lt;strong>&lt;code>ThreadPoolExecutor&lt;/code>&lt;/strong> class.&lt;/p>
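&lt;p>A minimal sketch of its use (the task function here is illustrative):&lt;/p>

```python
from concurrent.futures import ThreadPoolExecutor


def task(n):
    # illustrative unit of work
    return n * n


# the pool creates, reuses, and tears down the worker threads for us
with ThreadPoolExecutor(max_workers=3) as pool:
    future = pool.submit(task, 4)  # dispatch an ad hoc task to a free worker
    result = future.result()       # block until the worker finishes

print(result)  # -> 16
```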
&lt;h2 id="reference">Reference&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://superfastpython.com/threadpoolexecutor-in-python/#Python_Threads_and_the_Need_for_Thread_Pools">Python Threads and the Need for Thread Pools&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>ThreadPoolExecutor</title><link>https://haobin-tan.netlify.app/docs/coding/python/concurrency/thread_pool_executor/</link><pubDate>Tue, 13 Feb 2024 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/coding/python/concurrency/thread_pool_executor/</guid><description>&lt;h2 id="executors-and-features">Executors and Futures&lt;/h2>
&lt;p>The &lt;strong>ThreadPoolExecutor&lt;/strong> Python class is used to create and manage thread pools and is provided in the &lt;a href="https://docs.python.org/3/library/concurrent.futures.html">concurrent.futures module&lt;/a>. The &lt;strong>ThreadPoolExecutor&lt;/strong> extends the &lt;strong>&lt;code>Executor&lt;/code>&lt;/strong> class and will return &lt;strong>&lt;code>Future&lt;/code>&lt;/strong> objects when it is called.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Executor&lt;/strong>: Parent class for the ThreadPoolExecutor that defines basic lifecycle operations for the pool.&lt;/li>
&lt;li>&lt;strong>Future&lt;/strong>: Object returned when submitting tasks to the thread pool that may complete later.&lt;/li>
&lt;/ul>
&lt;h3 id="executors">Executors&lt;/h3>
&lt;p>The &lt;strong>ThreadPoolExecutor&lt;/strong> class extends the abstract &lt;strong>Executor&lt;/strong> class.&lt;/p>
&lt;p>The &lt;strong>Executor&lt;/strong> class defines three methods used to control our thread pool&lt;/p>
&lt;ul>
&lt;li>&lt;strong>&lt;code>submit()&lt;/code>&lt;/strong>: Dispatch a function to be executed and return a future object.&lt;/li>
&lt;li>&lt;strong>&lt;code>map()&lt;/code>&lt;/strong>: Apply a function to an iterable of elements.&lt;/li>
&lt;li>&lt;strong>&lt;code>shutdown()&lt;/code>&lt;/strong>: Shut down the executor.&lt;/li>
&lt;/ul>
&lt;p>The &lt;strong>Executor&lt;/strong> is started when the class is created and must be shut down explicitly by calling &lt;strong>shutdown()&lt;/strong>, which will release any resources held by the &lt;strong>Executor&lt;/strong>.&lt;/p>
&lt;p>The &lt;strong>submit()&lt;/strong> and &lt;strong>map()&lt;/strong> functions are used to submit tasks to the Executor for asynchronous execution.&lt;/p>
&lt;ul>
&lt;li>The &lt;strong>map()&lt;/strong> function operates just like the &lt;strong>built-in map()&lt;/strong> function and is used to apply a function to each element in an iterable object (e.g., list). Unlike the built-in &lt;strong>map()&lt;/strong> function, each application of the function to an element will happen &lt;em>asynchronously&lt;/em> instead of sequentially.&lt;/li>
&lt;li>The &lt;strong>submit()&lt;/strong> function takes a function, as well as any arguments, and will execute it asynchronously, although the call returns immediately and provides a &lt;strong>Future&lt;/strong> object.&lt;/li>
&lt;/ul>
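&lt;p>The two approaches can be sketched side by side (the &lt;code>double()&lt;/code> function is illustrative):&lt;/p>

```python
from concurrent.futures import ThreadPoolExecutor


def double(x):
    return 2 * x


with ThreadPoolExecutor() as pool:
    # map(): apply the function to every element; results come back in input order
    doubled = list(pool.map(double, [1, 2, 3]))

    # submit(): dispatch a single call; returns a Future immediately
    future = pool.submit(double, 10)
    value = future.result()  # block for this one result

print(doubled)  # -> [2, 4, 6]
print(value)    # -> 20
```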
&lt;h3 id="features">Features&lt;/h3>
&lt;p>A future is &lt;strong>an object that represents a &lt;a href="https://en.wikipedia.org/wiki/Futures_and_promises">delayed result for an asynchronous task&lt;/a>.&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>It is also sometimes called a &lt;strong>&lt;mark>promise&lt;/mark>&lt;/strong> or a &lt;strong>&lt;mark>delay&lt;/mark>&lt;/strong>.&lt;/li>
&lt;li>It provides a context for the result of a task that may or may not be executing and a way of getting a result once it is available.&lt;/li>
&lt;li>In Python, the &lt;code>Future&lt;/code> object is returned from an &lt;code>Executor&lt;/code>, such as a &lt;code>ThreadPoolExecutor&lt;/code> when calling the &lt;strong>submit()&lt;/strong> function to dispatch a task to be executed asynchronously.&lt;/li>
&lt;li>In general, we do &lt;em>not&lt;/em> create Future objects; we only receive them and we may need to call functions on them. There is always one &lt;code>Future&lt;/code> object for each task sent into the &lt;code>ThreadPoolExecutor&lt;/code> via a call to &lt;code>submit()&lt;/code>.&lt;/li>
&lt;/ul>
&lt;p>The &lt;code>Future&lt;/code> object provides a number of helpful functions for inspecting the status of the task&lt;/p>
&lt;ul>
&lt;li>&lt;strong>&lt;code>cancelled()&lt;/code>&lt;/strong>: Returns &lt;strong>&lt;code>True&lt;/code>&lt;/strong> if the task was cancelled before being executed.&lt;/li>
&lt;li>&lt;strong>&lt;code>running()&lt;/code>&lt;/strong>: Returns &lt;strong>&lt;code>True&lt;/code>&lt;/strong> if the task is currently running.&lt;/li>
&lt;li>&lt;strong>&lt;code>done()&lt;/code>&lt;/strong>: Returns &lt;strong>&lt;code>True&lt;/code>&lt;/strong> if the task has completed or was cancelled.&lt;/li>
&lt;/ul>
&lt;p>A running task cannot be cancelled and a done task could have been cancelled.&lt;/p>
&lt;p>A &lt;code>Future&lt;/code> object also provides access to the result of the task via the &lt;strong>&lt;code>result()&lt;/code>&lt;/strong> function. If an exception was raised while executing the task, it will be re-raised when calling the &lt;code>result()&lt;/code> function or can be accessed via the &lt;strong>&lt;code>exception()&lt;/code>&lt;/strong> function.&lt;/p>
&lt;ul>
&lt;li>&lt;code>result()&lt;/code>: Access the result from running the task.&lt;/li>
&lt;li>&lt;code>exception()&lt;/code>: Access any exception raised while running the task.&lt;/li>
&lt;/ul>
&lt;p>Both the &lt;code>result()&lt;/code> and &lt;code>exception()&lt;/code> functions allow a timeout to be specified as an argument, which is the number of seconds to wait for a return value if the task is not yet complete. If the timeout expires, then a &lt;strong>&lt;code>TimeoutError&lt;/code>&lt;/strong> will be raised.&lt;/p>
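&lt;p>These accessors can be sketched as follows (the task functions and timings are illustrative):&lt;/p>

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as PoolTimeoutError


def slow_square(n):
    time.sleep(0.5)
    return n * n


def broken_task():
    raise ValueError("boom")


with ThreadPoolExecutor() as pool:
    ok = pool.submit(slow_square, 3)
    bad = pool.submit(broken_task)

    try:
        ok.result(timeout=0.01)  # far too short a wait: raises TimeoutError
    except PoolTimeoutError:
        print("not ready yet")

    print(ok.result())            # blocks until completion -> 9
    err = bad.exception()         # returns the exception instead of re-raising it
    print(type(err).__name__)     # -> ValueError
```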
&lt;p>If we want to have the thread pool automatically call a function once the task is completed, we can attach a callback to the &lt;code>Future&lt;/code> object for the task via the &lt;strong>&lt;code>add_done_callback()&lt;/code>&lt;/strong> function.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>&lt;code>add_done_callback()&lt;/code>&lt;/strong>: Add a callback function to the task to be executed by the thread pool once the task is completed.
&lt;ul>
&lt;li>We can add more than one callback to each task and they will be executed in the order they were added. If the task has already completed before we add the callback, then the callback is executed immediately.&lt;/li>
&lt;li>Any exceptions raised in the callback function will not impact the task or thread pool.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
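&lt;p>A minimal callback sketch (the function names are illustrative):&lt;/p>

```python
from concurrent.futures import ThreadPoolExecutor

results = []


def task():
    return 42


def on_done(future):
    # receives the completed Future; exceptions raised here do not affect the pool
    results.append(future.result())


with ThreadPoolExecutor() as pool:
    f = pool.submit(task)
    f.add_done_callback(on_done)  # runs immediately if the task already finished

print(results)  # -> [42]
```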
&lt;h2 id="threadpoolexecutor-lifecycle">ThreadPoolExecutor Lifecycle&lt;/h2>
&lt;p>There are four main steps in the &lt;a href="https://superfastpython.com/threadpoolexecutor-quick-start-guide/">lifecycle of using the ThreadPoolExecutor class&lt;/a>:&lt;/p>
&lt;ul>
&lt;li>Create: Create the thread pool by calling the constructor &lt;strong>&lt;code>ThreadPoolExecutor()&lt;/code>&lt;/strong>.&lt;/li>
&lt;li>Submit: Submit tasks and get futures by calling &lt;strong>&lt;code>submit()&lt;/code>&lt;/strong> or &lt;strong>&lt;code>map()&lt;/code>&lt;/strong>.&lt;/li>
&lt;li>Wait: Wait and get results as tasks complete (optional).&lt;/li>
&lt;li>Shut down: Shut down the thread pool by calling &lt;strong>&lt;code>shutdown()&lt;/code>&lt;/strong>.&lt;/li>
&lt;/ul>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2024-02-18%2018.44.08.png" alt="截屏2024-02-18 18.44.08" style="zoom:50%;" />
&lt;h3 id="1-create-the-thread-pool">1. Create the Thread Pool&lt;/h3>
&lt;p>When an instance of a &lt;strong>&lt;code>ThreadPoolExecutor&lt;/code>&lt;/strong> is created, it must be configured with&lt;/p>
&lt;ul>
&lt;li>
&lt;p>the fixed number of threads in the pool&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Default Total Threads = min(32, Total CPUs + 4) (since Python 3.8)&lt;/p>
&lt;blockquote>
&lt;p>If you have 4 CPUs, each with hyperthreading (most modern CPUs have this), then Python will see 8 CPUs and will allocate (8 + 4) = 12 threads to the pool by default.&lt;/p>
&lt;/blockquote>
&lt;/li>
&lt;li>
&lt;p>It is typically not a good idea to have thousands of threads, as this may start to impact the amount of available RAM and result in a large amount of switching between threads, which may worsen performance.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>a prefix used when naming each thread in the pool, and&lt;/p>
&lt;/li>
&lt;li>
&lt;p>the name of a function to call when initializing each thread along with any arguments for the function&lt;/p>
&lt;/li>
&lt;/ul>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># create a thread pool with the default number of worker threads&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">executor&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">ThreadPoolExecutor&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># create a thread pool with 10 worker threads&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">executor&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">ThreadPoolExecutor&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">max_workers&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="2-submit-tasks-to-the-thread-pool">2. Submit tasks to the thread pool&lt;/h3>
&lt;p>Once the thread pool has been created, you can submit tasks for asynchronous execution. There are two main approaches for submitting tasks defined on the Executor parent class: &lt;code>map()&lt;/code> and &lt;code>submit()&lt;/code>.&lt;/p>
&lt;h4 id="submit-tasks-with-map">Submit tasks with &lt;code>map&lt;/code>&lt;/h4>
&lt;p>The &lt;strong>&lt;code>map()&lt;/code>&lt;/strong> function is an &lt;em>asynchronous&lt;/em> version of the &lt;a href="https://docs.python.org/3/library/functions.html#map">built-in map() function&lt;/a> for applying a function to each element in an iterable, like a list. You can call the &lt;a href="https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor.map">map() function&lt;/a> on the pool and pass it the name of your function and the iterable. One common use case for &lt;code>map()&lt;/code> is to convert a &lt;code>for&lt;/code>-loop to run using one thread per loop iteration:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># perform all tasks in parallel&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">results&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">pool&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">my_task&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">my_items&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># does not block&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ul>
&lt;li>&lt;code>my_task&lt;/code> : the name of the function you want to execute&lt;/li>
&lt;li>&lt;code>my_items&lt;/code>: iterable of objects, each to be executed by the &lt;code>my_task&lt;/code> function&lt;/li>
&lt;/ul>
&lt;p>The tasks will be queued up in the thread pool and executed by worker threads in the pool as they become available. The &lt;code>map()&lt;/code> function will return an iterable immediately. This iterable can be used to access the results from the target task function as they are available &lt;strong>in the order that the tasks were submitted (e.g. order of the iterable you provided)&lt;/strong>.&lt;/p>
&lt;div class="flex px-4 py-3 mb-6 rounded-md bg-primary-100 dark:bg-primary-900">
&lt;span class="pr-3 pt-1 text-primary-600 dark:text-primary-300">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="m11.25 11.25l.041-.02a.75.75 0 0 1 1.063.852l-.708 2.836a.75.75 0 0 0 1.063.853l.041-.021M21 12a9 9 0 1 1-18 0a9 9 0 0 1 18 0m-9-3.75h.008v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">Even though the tasks are executed concurrently, the &lt;code>executor.map()&lt;/code> method ensures that the results are returned in the original order of the input iterable.&lt;/span>
&lt;/div>
&lt;p>Example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">time&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">sleep&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">random&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">random&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">concurrent.futures&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">ThreadPoolExecutor&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">task&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">num&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">sleep&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">random&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">num&lt;/span> &lt;span class="o">*&lt;/span> &lt;span class="mi">2&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">ThreadPoolExecutor&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">executor&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># execute tasks concurrently and process results in order&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">result&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">executor&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">task&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">5&lt;/span>&lt;span class="p">)):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># retrieve the result&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">result&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Output:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">0
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">2
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">4
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">6
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">8
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>You can also set a timeout in seconds when calling &lt;strong>map()&lt;/strong> via the &lt;strong>&lt;code>timeout&lt;/code>&lt;/strong> argument if you wish to limit how long you are willing to wait for each task to complete as you iterate, after which a &lt;code>TimeoutError&lt;/code> will be raised.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># perform all tasks in parallel&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># iterate over results as they become available&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">for&lt;/span> &lt;span class="n">result&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">executor&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">map&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">my_task&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">my_items&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">timeout&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">5&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># wait for task to complete or timeout expires&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">result&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h4 id="submit-tasks-with-submit">Submit tasks with &lt;code>submit()&lt;/code>&lt;/h4>
&lt;p>The &lt;strong>&lt;code>submit()&lt;/code>&lt;/strong> function submits one task to the thread pool for execution.&lt;/p>
&lt;p>The function takes the name of the function to call and all arguments to the function, then returns a &lt;strong>&lt;code>Future&lt;/code>&lt;/strong> object immediately.&lt;/p>
&lt;ul>
&lt;li>The &lt;code>Future&lt;/code> object is a promise to return the results from the task (if any) and provides a way to determine if a specific task has been completed or not.&lt;/li>
&lt;/ul>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">ThreadPoolExecutor&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">executor&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># submit a task with arguments and get a future object&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">future&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">executor&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">submit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">my_task&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">arg1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">arg2&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># does not block&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ul>
&lt;li>&lt;code>my_task&lt;/code> : the name of the function you want to execute&lt;/li>
&lt;li>&lt;code>arg1&lt;/code>, &lt;code>arg2&lt;/code>: the first and second arguments to pass to the &lt;code>my_task&lt;/code> function&lt;/li>
&lt;/ul>
&lt;p>You can access the result of the task via the &lt;strong>&lt;code>result()&lt;/code>&lt;/strong> function on the returned &lt;code>Future&lt;/code> object. This call will &lt;em>block until the task is completed.&lt;/em>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># get the result from a future&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">result&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">future&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">result&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="c1"># blocks&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>You can also set a timeout in seconds when calling &lt;code>result()&lt;/code> via the &lt;strong>&lt;code>timeout&lt;/code>&lt;/strong> argument if you wish to limit how long you are willing to wait for the task to complete, after which a &lt;code>TimeoutError&lt;/code> will be raised.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># wait for task to complete or timeout expires&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">result&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">future&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">result&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">timeout&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">5&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># blocks&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="3-wait-for-tasks-to-complete-optional">3. Wait for Tasks to Complete (Optional)&lt;/h3>
&lt;p>The &lt;code>concurrent.futures&lt;/code> module provides two module-level utility functions for waiting for tasks via their &lt;code>Future&lt;/code> objects, which are only created when we call &lt;code>submit()&lt;/code> to push tasks into the thread pool.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>&lt;code>wait()&lt;/code>&lt;/strong>: Wait on one or more &lt;code>Future&lt;/code> objects until they are completed.&lt;/li>
&lt;li>&lt;strong>&lt;code>as_completed()&lt;/code>&lt;/strong>: Returns &lt;code>Future&lt;/code> objects from a collection as they complete their execution.&lt;/li>
&lt;/ul>
&lt;p>(These functions are optional, as you can wait for results directly after calling &lt;strong>map()&lt;/strong> or &lt;strong>submit()&lt;/strong>, or simply wait for all tasks in the thread pool to finish.)&lt;/p>
&lt;p>Both functions are useful with the idiom of dispatching multiple tasks into the thread pool via &lt;code>submit()&lt;/code> in a list comprehension:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># dispatch tasks into the thread pool and create a list of futures&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">futures&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">executor&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">submit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">my_task&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">my_data&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">my_data&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">my_datalist&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">time&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">sleep&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">random&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">random&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">concurrent.futures&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">ThreadPoolExecutor&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">concurrent.futures&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">as_completed&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># custom task that will sleep for a variable amount of time&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">def&lt;/span> &lt;span class="nf">task&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">name&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># sleep for less than a second&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">sleep&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">random&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">return&lt;/span> &lt;span class="n">name&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># start the thread pool&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">with&lt;/span> &lt;span class="n">ThreadPoolExecutor&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="n">executor&lt;/span>&lt;span class="p">:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># submit tasks and collect futures&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">futures&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="n">executor&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">submit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">task&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">i&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="k">for&lt;/span> &lt;span class="n">i&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">range&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">10&lt;/span>&lt;span class="p">)]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># process task results as they are available&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">for&lt;/span> &lt;span class="n">future&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">as_completed&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">futures&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="c1"># retrieve the result&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">future&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">result&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Output:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">6
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">7
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">9
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">8
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">4
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">0
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">3
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">2
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">5
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">1
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Note: the output may vary from run to run, as the &lt;code>task()&lt;/code> functions are executed concurrently in different threads and the order of completion cannot be guaranteed. Using &lt;code>as_completed()&lt;/code> prints each result as soon as its task completes, regardless of the order in which the tasks were submitted.&lt;/p>
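&lt;p>The companion &lt;code>wait()&lt;/code> function can be sketched the same way (a minimal example; the task is illustrative). By default it blocks until all futures are done and returns two sets, &lt;code>done&lt;/code> and &lt;code>not_done&lt;/code>:&lt;/p>

```python
from time import sleep
from random import random
from concurrent.futures import ThreadPoolExecutor, wait

# custom task that will sleep for less than a second
def task(name):
    sleep(random())
    return name

with ThreadPoolExecutor(10) as executor:
    # submit tasks and collect futures
    futures = [executor.submit(task, i) for i in range(10)]
    # block until every future has finished (return_when defaults to ALL_COMPLETED)
    done, not_done = wait(futures)

print(len(done), len(not_done))  # 10 0
```

&lt;p>&lt;code>wait()&lt;/code> also accepts &lt;code>return_when=FIRST_COMPLETED&lt;/code> or &lt;code>FIRST_EXCEPTION&lt;/code> if you only need to react to the first result or failure.&lt;/p>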
&lt;h2 id="threadpoolexecutor-example">ThreadPoolExecutor Example&lt;/h2>
&lt;h2 id="reference">Reference&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="https://superfastpython.com/threadpoolexecutor-in-python/#ThreadPoolExecutor_for_Thread_Pools_in_Python">ThreadPoolExecutor for Thread Pools in Python&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://superfastpython.com/threadpoolexecutor-in-python/#LifeCycle_of_the_ThreadPoolExecutor">LifeCycle of the ThreadPoolExecutor&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul></description></item></channel></rss>