Overhead of creating a thread vs. a process on Linux

2024-05-26

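I am trying to answer the question of how much overhead there is in creating a thread versus a process in Python. I modified code from a similar question, which basically runs a function with two threads, then runs the same function with two processes, and reports the time.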

import time, sys
NUM_RANGE = 100000000

from multiprocessing import Process
import threading

def timefunc(f):
    t = time.time()
    f()
    return time.time() - t

def multiprocess():
    class MultiProcess(Process):
        def __init__(self):
            Process.__init__(self)

        def run(self):
            # Alter string + test processing speed
            for i in xrange(NUM_RANGE):
                a = 20 * 20


    for _ in xrange(300):
        MultiProcess().start()

def multithreading():
    class MultiThread(threading.Thread):
        def __init__(self):
            threading.Thread.__init__(self)

        def run(self):
            # Alter string + test processing speed
            for i in xrange(NUM_RANGE):
                a = 20 * 20

    for _ in xrange(300):
        MultiThread().start()

print "process run time" + str(timefunc(multiprocess))
print "thread run time" + str(timefunc(multithreading))

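I then got 7.9 seconds for multiprocessing and 7.9 seconds for multithreading.

The main question I am trying to answer is whether multithreading or multiprocessing is appropriate for handling thousands of network requests on Linux. According to this code they appear to be the same in terms of start-up time, but perhaps processes are much heavier in memory usage?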

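Your code is not suitable for benchmarking start-up times of processes against threads. Multithreaded Python code (in CPython) means single core. While one thread holds the Global Interpreter Lock (GIL, https://wiki.python.org/moin/GlobalInterpreterLock), any Python code executing in that thread excludes all other threads in the same process from executing. This means that, as long as Python bytecode is involved, threads only give you concurrency, not true parallelism.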
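A rough way to see this effect is to time the same CPU-bound loop once in a single thread and then in two threads at the same time. The sketch below assumes a standard CPython build with the GIL enabled; `count_down`, `time_threads` and the loop size are illustrative names and values, not part of the question's code.

```python
import threading
import time


def count_down(n):
    """Pure-Python busy loop; holds the GIL while it executes bytecode."""
    while n > 0:
        n -= 1


def time_threads(n_threads, n):
    """Run count_down(n) in n_threads threads and return the wall time."""
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000
    one = time_threads(1, n)
    two = time_threads(2, n)
    # With the GIL, two threads do twice the work in roughly twice the time
    # instead of finishing in about the same time as a single thread.
    print(f"1 thread: {one:.3f} s, 2 threads: {two:.3f} s")
```

On a GIL-enabled interpreter the two-thread run would be expected to take noticeably longer than the one-thread run, which is why a tight loop like the one in the question says little about start-up cost.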

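Your example mainly benchmarks the performance of one specific CPU-bound workload (a computation running in a tight loop), something you would not use threads for anyway. If you want to measure creation overhead, you must strip everything except the creation itself out of the benchmark (as far as possible).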
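As a minimal sketch of that idea, the snippet below times nothing but creating, starting and joining threads whose target does no work. `noop` and `time_thread_creation` are hypothetical helper names; because the join is included in the measurement, the result is an upper bound on the pure start-up cost.

```python
import threading
import time


def noop():
    """Target that returns immediately, so only the creation cost remains."""


def time_thread_creation(n_runs=1000):
    """Return the fastest observed create/start/join cycle in seconds."""
    best = float("inf")
    for _ in range(n_runs):
        start = time.perf_counter()
        t = threading.Thread(target=noop)
        t.start()
        t.join()
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    print(f"fastest thread start/join: {time_thread_creation() * 1e6:.1f} µs")
```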

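TL;DR

Starting a thread (benchmarked on Ubuntu 18.04) is many times cheaper than starting a process.

Compared to thread start-up, process start-up with the specified start method takes:

fork: ~33x longer
forkserver: ~6693x longer
spawn: ~7558x longer

Full results are at the bottom.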
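The start methods named above are selected through the `multiprocessing` module. A small sketch of how one might list what the current platform supports and obtain a context bound to one method (the variable names are illustrative):

```python
import multiprocessing as mp

# Start methods supported on this platform, e.g. fork, spawn and forkserver
# on Linux. The contents depend on the OS and the Python version.
methods = mp.get_all_start_methods()
print(methods)

# A context object carries its own start method and leaves the global
# default untouched, unlike mp.set_start_method().
ctx = mp.get_context("spawn")
print(ctx.get_start_method())
```

Using a context instead of `mp.set_start_method` is a design choice that lets several start methods be used within one program.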

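Benchmark

I recently upgraded to Ubuntu 18.04 and tested start-up with a script that is hopefully closer to the truth. Note that this code is Python 3.

Some utilities for formatting and comparing the test results: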

# thread_vs_proc_start_up.py
import sys
import time
import pandas as pd
from threading import Thread
import multiprocessing as mp
from multiprocessing import Process, Pipe


def format_secs(sec, decimals=2) -> str:
    """Format a duration given in seconds with a suitable unit.

    Example:
    >>> format_secs(0.000_000_001)
    '1.00 ns'
    """
    if sec < 1e-6:
        return f"{sec * 1e9:.{decimals}f} ns"
    elif sec < 1e-3:
        return f"{sec * 1e6:.{decimals}f} µs"
    elif sec < 1:
        return f"{sec * 1e3:.{decimals}f} ms"
    else:
        return f"{sec:.{decimals}f} s"

def compare(value, base):
    """Return x-times relation of value and base."""
    return f"{(value / base):.2f}x"


def display_results(executor, result_series):
    """Display results for Executor."""
    exe_str = str(executor).split(".")[-1].strip('\'>')
    print(f"\nresults for {exe_str}:\n")

    print(result_series.describe().to_string(), "\n")
    print(f"Minimum with {format_secs(result_series.min())}")
    print("-" * 60)

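The benchmark function follows. For each of the n_runs tests, a fresh pipe is created. A new process or thread (an executor) is started, and the target function calc_start_up_time immediately sends back the time difference. That's all.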

def calc_start_up_time(pipe_in, start):
    pipe_in.send(time.perf_counter() - start)
    pipe_in.close()


def run(executor, n_runs):

    results = []
    for _ in range(int(n_runs)):
        pipe_out, pipe_in = Pipe(duplex=False)
        exe = executor(target=calc_start_up_time, args=(pipe_in,
                                                    time.perf_counter(),))
        exe.start()
        # Note: Measuring only the time for exe.start() returning like:
        # start = time.perf_counter()
        # exe.start()
        # end = time.perf_counter()
        # would not include the full time a new process needs to become
        # production ready.
        results.append(pipe_out.recv())
        pipe_out.close()
        exe.join()

    result_series = pd.Series(results)
    display_results(executor, result_series)
    return result_series.min()
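The note inside `run` distinguishes two measurements. The sketch below records both for a single thread: the time until `start()` returns and the time until the target actually runs and reports back through the pipe. `report_ready` and `measure_once` are hypothetical names; the same pattern should also work with `Process` when the script is guarded by `if __name__ == '__main__'`, though that is not exercised here.

```python
import time
from multiprocessing import Pipe
from threading import Thread


def report_ready(pipe_in, start):
    """Send the elapsed time since `start` as soon as the target runs."""
    pipe_in.send(time.perf_counter() - start)
    pipe_in.close()


def measure_once(executor=Thread):
    """Return (time until start() returned, time until target was running)."""
    pipe_out, pipe_in = Pipe(duplex=False)
    start = time.perf_counter()
    exe = executor(target=report_ready, args=(pipe_in, start))
    exe.start()
    start_returned = time.perf_counter() - start
    target_running = pipe_out.recv()
    pipe_out.close()
    exe.join()
    return start_returned, target_running


if __name__ == "__main__":
    returned, running = measure_once()
    print(f"start() returned after {returned * 1e6:.1f} µs")
    print(f"target running after   {running * 1e6:.1f} µs")
```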

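The script is built to be started from the terminal, with the start_method and the number of runs passed as command-line arguments. The benchmark always runs n_runs process start-ups with the specified start_method (available on Ubuntu 18.04: fork, spawn, forkserver) and then compares them with n_runs thread start-ups. The results focus on the minima because they show how fast a start-up can possibly be.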

if __name__ == '__main__':

    # Usage:
    # ------
    # Start from terminal with start_method and number of runs as arguments:
    #   $python thread_vs_proc_start_up.py fork 100
    #
    # Get all available start methods on your system with:
    # >>>import multiprocessing as mp
    # >>>mp.get_all_start_methods()

    start_method, n_runs = sys.argv[1:]
    mp.set_start_method(start_method)

    mins = []
    for executor in [Process, Thread]:
        mins.append(run(executor, n_runs))
    print(f"Minimum start-up time for processes takes "
          f"{compare(*mins)} "
          f"longer than for threads.")
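The script unpacks `sys.argv` directly, so a wrong number of arguments or an unsupported start method surfaces as a bare traceback. One possible hardening, shown as a sketch rather than as part of the original benchmark, is to validate the arguments with `argparse`; `parse_args` is a hypothetical helper.

```python
import argparse
import multiprocessing as mp


def parse_args(argv=None):
    """Parse the start method and the number of runs from the command line."""
    parser = argparse.ArgumentParser(
        description="Compare start-up time of processes and threads.")
    parser.add_argument("start_method",
                        choices=mp.get_all_start_methods(),
                        help="multiprocessing start method to benchmark")
    parser.add_argument("n_runs", type=int,
                        help="number of executors to start per benchmark")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.start_method, args.n_runs)
```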

Results

With n_runs=1000 on my rusty machine:

# Ubuntu 18.04 start_method: fork
# ================================
results for Process:

count    1000.000000
mean        0.002081
std         0.000288
min         0.001466
25%         0.001866
50%         0.001973
75%         0.002268
max         0.003365 

Minimum with 1.47 ms
------------------------------------------------------------

results for Thread:

count    1000.000000
mean        0.000054
std         0.000013
min         0.000044
25%         0.000047
50%         0.000051
75%         0.000058
max         0.000319 

Minimum with 43.89 µs
------------------------------------------------------------
Minimum start-up time for processes takes 33.41x longer than for threads.

# Ubuntu 18.04 start_method: spawn
# ================================

results for Process:

count    1000.000000
mean        0.333502
std         0.008068
min         0.321796
25%         0.328776
50%         0.331763
75%         0.336045
max         0.415568 

Minimum with 321.80 ms
------------------------------------------------------------

results for Thread:

count    1000.000000
mean        0.000056
std         0.000016
min         0.000043
25%         0.000046
50%         0.000048
75%         0.000065
max         0.000231 

Minimum with 42.58 µs
------------------------------------------------------------
Minimum start-up time for processes takes 7557.80x longer than for threads.

# Ubuntu 18.04 start_method: forkserver
# =====================================


results for Process:

count    1000.000000
mean        0.295011
std         0.007157
min         0.287871
25%         0.291440
50%         0.293263
75%         0.296185
max         0.361581 

Minimum with 287.87 ms
------------------------------------------------------------

results for Thread:

count    1000.000000
mean        0.000055
std         0.000014
min         0.000043
25%         0.000045
50%         0.000047
75%         0.000064
max         0.000251 

Minimum with 43.01 µs
------------------------------------------------------------
Minimum start-up time for processes takes 6693.44x longer than for threads.
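
The factors quoted in the TL;DR can be roughly reproduced from the rounded minima printed above. Because the script divides the unrounded minima, the values computed here differ slightly from the reported ones.

```python
# Rounded minimum start-up times from the result listings, in seconds:
# (process minimum, thread minimum) per start method.
minima = {
    "fork": (1.47e-3, 43.89e-6),
    "spawn": (321.80e-3, 42.58e-6),
    "forkserver": (287.87e-3, 43.01e-6),
}

ratios = {method: proc / thread for method, (proc, thread) in minima.items()}

for method, ratio in ratios.items():
    print(f"{method}: process start-up takes about {ratio:.0f}x longer")
```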
