tags: C++ Concurrency
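Preface
Multithreading is fairly easy in C++, unlike with C's pthreads interface. This post summarizes some basic C++ thread operations: creating, joining, and detaching threads, getting thread IDs, and so on. It mainly follows chapters 1 and 2 of **C++ Concurrency in Action (2nd edition)**, which is pretty much the must-read classic on C++ concurrency.
See also: std::thread.
Some useful snippets
Helpers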
#include <iostream>
#include <cassert>
#include <chrono>
#include <thread>
using namespace std;
using namespace std::chrono;
using namespace std::literals;
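Sleep
When testing multithreaded code, the system hangs far too easily without a sleep.
this_thread::sleep_for(1s);  // the 1s literal comes from std::literals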
Timing
auto start = system_clock::now();
// ... code being timed ...
auto end = system_clock::now();
auto duration = duration_cast<microseconds>(end - start);
cout << "Time spent: "
     << double(duration.count()) * microseconds::period::num /
            microseconds::period::den
     << "s" << endl;
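Thread basics
Header: <thread>
Querying hardware support
My machine has an 8-core CPU.
#include <iostream>
#include <thread>
int main(int argc, char const *argv[]) {
  std::cout << std::thread::hardware_concurrency();
  return 0;
}
With a single core you only get concurrency (task switching), not true parallelism.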
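One detail worth handling: `std::thread::hardware_concurrency()` is only a hint, and it returns 0 when the value cannot be determined. A minimal sketch of a fallback follows; the helper name `usable_thread_count` and the default of 2 are assumptions made for illustration, not part of the standard library.

```cpp
#include <thread>

// Hypothetical helper: returns the hardware hint, or `fallback` when the
// implementation reports 0 (meaning "unknown").
unsigned usable_thread_count(unsigned fallback = 2) {
  unsigned const hint = std::thread::hardware_concurrency();
  return hint != 0 ? hint : fallback;
}
```

Calling `usable_thread_count()` instead of the raw function avoids accidentally sizing a thread pool to zero.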
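Creating and joining (join)
Constructor: pass the function name (a function pointer) directly, followed by its arguments, if any. Remember to join() the thread: otherwise the main thread does not wait for the child thread to finish, and destroying a std::thread that is still joinable calls std::terminate.
void fun() { cout << "Hello t1!\n"; }
void t1() {
  thread t1(&fun);
  if (t1.joinable()) {
    cout << "t1 is joinable\n";
    t1.join();
  }
}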
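Because a forgotten join() ends the program, one common safeguard is an RAII wrapper that joins in its destructor, in the spirit of the `thread_guard` class from chapter 2 of the book. The version below is a simplified sketch; `guarded_run` is a hypothetical function added only to show the guard in use.

```cpp
#include <thread>

// Joins the referenced thread when the guard goes out of scope.
class thread_guard {
 public:
  explicit thread_guard(std::thread& t) : t_(t) {}
  ~thread_guard() {
    if (t_.joinable()) t_.join();
  }
  thread_guard(thread_guard const&) = delete;
  thread_guard& operator=(thread_guard const&) = delete;

 private:
  std::thread& t_;
};

// Hypothetical usage: the worker writes a value, the guard joins it.
int guarded_run() {
  int result = 0;
  std::thread worker([&result] { result = 42; });
  {
    thread_guard guard(worker);
  }  // guard's destructor joins the worker here
  return result;
}
```

The thread must be declared before the guard so that the guard is destroyed first. In C++20, `std::jthread` joins automatically in its destructor and covers the same need.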
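Other ways to create a thread:
- pass a function object: a temporary, i.e. an rvalue
- pass a function object: a named object, i.e. an lvalue
- pass a lambda expression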
void t2() {
  thread t2([] { cout << "Hello t2!\n"; });
  t2.join();
}
struct Foo {
  void operator()() const { cout << "Hello t3!\n"; }
};
void t3() {
  thread t3{Foo()};  // braces avoid parsing this as a function declaration
  t3.join();
}
struct Foo1 {
  void operator()() const { cout << "Hello t4!\n"; }
};
void t4() {
  Foo1 f;
  thread t4(f);  // f is copied into the thread
  t4.join();
}
void t5() {
  auto t5 = thread([] { cout << "Hello t5!\n"; });
  t5.join();
}
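In fact, waiting with join() is all-or-nothing: you either do not wait at all or wait until the thread finishes. Finer-grained waiting is done later with futures (future) or condition variables (condition_variable).
Also, a thread can be joined only once.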
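int main() {
  thread t1([] { cout << "AA\n"; });
  t1.join();
  cout << t1.joinable() << endl;  // prints 0: t1 no longer owns a thread
  try {
    t1.join();  // the second join throws std::system_error
  } catch (system_error const& e) {
    cout << "second join failed: " << e.what() << endl;
  }
}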
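To illustrate the earlier remark that join() is all-or-nothing, here is a minimal sketch of a bounded wait using `std::async` and `std::future::wait_for`. The function name `finished_within` and its parameters are assumptions made for this example.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical helper: starts a task that sleeps for `work` and reports
// whether it finished within `timeout`.
bool finished_within(std::chrono::milliseconds work,
                     std::chrono::milliseconds timeout) {
  std::future<void> done = std::async(
      std::launch::async, [work] { std::this_thread::sleep_for(work); });
  // wait_for returns after at most `timeout`, unlike thread::join().
  bool const ready = done.wait_for(timeout) == std::future_status::ready;
  // Note: the destructor of a future obtained from std::async still blocks
  // until the task completes, so this function returns only after `work`.
  return ready;
}
```

The caller learns whether the task is done without committing to an unbounded wait, which is the kind of control join() alone does not offer.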
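Detaching a thread: detach
A detached thread is no longer managed by the main thread (i.e. the main function); the C++ runtime library takes over and it runs in the background as a daemon thread.
Once a thread is detached, however, there is no longer any way to wait for it to finish.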
void t1() {
  thread t([] {
    cout << "detached thread\n";
    this_thread::sleep_for(1s);
  });
  t.detach();
  assert(!t.joinable());
  this_thread::sleep_for(1s);
  cout << "Main thread\n";
}
int main(int argc, char const* argv[]) {
  auto start = system_clock::now();
  t1();
  auto end = system_clock::now();
  auto duration = duration_cast<microseconds>(end - start);
  cout << "Time spent: "
       << double(duration.count()) * microseconds::period::num /
              microseconds::period::den
       << "s" << endl;
  return 0;
}
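As you can see, the main thread and the detached thread finish at (almost) the same time, after about 1s.
If the code above used join instead of detach, it would take about 2s. Feel free to try it.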
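The join-versus-detach timing can be checked with a small experiment. The sketch below uses shorter sleeps and `steady_clock`, which is generally better suited to measuring intervals than `system_clock`. The names `seconds_taken`, `run_joined`, and `run_detached` are hypothetical helpers introduced for this example.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper: runs `work` once and returns the elapsed seconds.
template <typename Callable>
double seconds_taken(Callable&& work) {
  auto const start = std::chrono::steady_clock::now();
  work();
  auto const stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}

// The worker and the caller each sleep for `nap`; only join/detach differs.
void run_joined(std::chrono::milliseconds nap) {
  std::thread t([nap] { std::this_thread::sleep_for(nap); });
  t.join();                          // wait for the worker first ...
  std::this_thread::sleep_for(nap);  // ... then sleep: about 2 * nap in total
}

void run_detached(std::chrono::milliseconds nap) {
  std::thread t([nap] { std::this_thread::sleep_for(nap); });
  t.detach();                        // do not wait for the worker
  std::this_thread::sleep_for(nap);  // the two sleeps overlap: about nap
}
```

For example, `seconds_taken([] { run_joined(std::chrono::milliseconds(200)); })` should come out at roughly twice the value measured for `run_detached` with the same argument.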
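Getting the id
There are two ways:
- call the member function .get_id() on the thread object;
- call this_thread::get_id() from inside the thread itself (i.e. in the function passed to the thread).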
int main() {
  // A default-constructed thread has no associated thread of execution,
  // so its id equals the default thread::id().
  cout << "null thread id: " << thread().get_id() << endl;
  cout << "null thread id: " << thread::id() << endl;
  thread t1([] {
    cout << "Hello t1!\n";
    cout << "t1 thread id(use this_thread::get_id): "
         << this_thread::get_id() << endl;
  });
  cout << "main thread id: " << this_thread::get_id() << endl;
  cout << "t1 id(use t1.get_id): " << t1.get_id() << endl;
  t1.join();
}
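Thread ids can also be compared, which is handy for checking which thread a piece of code runs on. A minimal sketch, with the hypothetical helper names `running_on` and `worker_differs_from_caller`:

```cpp
#include <thread>

// Returns true when called on the thread whose id is `owner`.
bool running_on(std::thread::id owner) {
  return std::this_thread::get_id() == owner;
}

// Starts a worker and checks that it sees a different id than the caller.
bool worker_differs_from_caller() {
  std::thread::id const caller = std::this_thread::get_id();
  bool same = true;
  std::thread t([&same, caller] { same = running_on(caller); });
  t.join();
  return !same;
}
```

`std::thread::id` also supports ordering and `std::hash`, so ids can serve as keys in associative containers.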
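Threads in practice
Pitfalls when passing arguments
case 1: const reference
A thread has internal storage. By default the arguments are first copied there, so that the newly created thread can access them directly.
These copies are then passed, as temporaries (rvalues), to the function or callable object running on the new thread.
This happens even when the corresponding parameter of the function is a reference.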
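void oops() {
  auto f = [](int i, string const& s) { cout << i << s << endl; };
  char buf[1024];  // automatic variable, destroyed when oops() returns
  snprintf(buf, sizeof(buf), "%i", 100);  // needs <cstdio>
  // Convert to std::string here. Passing buf itself would copy only the
  // pointer, which may dangle by the time the detached thread reads it.
  thread t(f, 3, string(buf));
  t.detach();
}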
Automatic variable: a local variable declared or defined inside a block; it lives on the stack.
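The copy into the thread's internal storage can be observed by comparing addresses. In the sketch below, all names (`check_identity`, `passed_by_copy`, `passed_by_cref`) are hypothetical; the thread function records whether the reference it receives refers to the caller's object.

```cpp
#include <functional>
#include <string>
#include <thread>

// Records whether `s` is the very object that `original` points to.
void check_identity(std::string const& s, std::string const* original,
                    bool* same) {
  *same = (&s == original);
}

// Plain argument: the thread receives a reference to its own internal copy.
bool passed_by_copy() {
  std::string text = "hello";
  bool same = true;
  std::thread t(check_identity, text, &text, &same);
  t.join();
  return !same;
}

// std::cref: the thread receives a reference to the caller's object.
bool passed_by_cref() {
  std::string text = "hello";
  bool same = false;
  std::thread t(check_identity, std::cref(text), &text, &same);
  t.join();
  return same;
}
```

Both functions return true: even though the parameter is a const reference, the plain call binds it to a copy, and only `std::cref` makes it refer to the original.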
case 2: non-const reference
class Widget {};
void oops_again() {
  auto f = [](int id, Widget& w) {};
  Widget w1;
  thread t(f, 10, std::ref(w1));
  t.join();
}
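A non-const reference parameter cannot bind to an rvalue, so the argument must be wrapped in std::ref (a reference wrapper); without it the code does not compile.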
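A small sketch that makes the effect of `std::ref` visible: the thread modifies the caller's object through the reference. `Counter`, `bump`, and `bump_via_thread` are hypothetical names used only for this example.

```cpp
#include <functional>
#include <thread>

struct Counter {
  int value = 0;
};

// Takes a non-const reference, so it needs the caller's actual object.
void bump(Counter& c, int amount) { c.value += amount; }

int bump_via_thread(int amount) {
  Counter c;
  // std::thread t(bump, c, amount);  // would not compile: rvalue copy
  std::thread t(bump, std::ref(c), amount);
  t.join();
  return c.value;  // the change made on the worker thread is visible here
}
```

Since the thread now refers to the caller's object, that object must outlive the thread; joining before `c` goes out of scope takes care of that here.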
case 3: member function
class X {
 public:
  void do_something() { cout << "do_something\n"; }
};
void t2() {
  X my_x;
  thread t(&X::do_something, &my_x);
  t.join();  // without this, destroying t calls std::terminate
}
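When running a member function on a thread, mind the argument order: the pointer to the member function comes first, then the address of the object, and only then the member function's own arguments.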
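The argument order described above can be sketched with a member function that takes a parameter. The class `Account` and the function `deposit_on_thread` are hypothetical.

```cpp
#include <thread>

class Account {
 public:
  void deposit(int amount) { balance_ += amount; }
  int balance() const { return balance_; }

 private:
  int balance_ = 0;
};

int deposit_on_thread(int amount) {
  Account acct;
  // Order: member function pointer, object address, then the arguments.
  std::thread t(&Account::deposit, &acct, amount);
  t.join();
  return acct.balance();
}
```

Passing `&acct` means the thread works on the caller's object, so `acct` has to stay alive until the thread is joined.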
case 4: transferring ownership of a smart pointer
void process(unique_ptr<X>) {}  // unique_ptr needs <memory>
void t3() {
  unique_ptr<X> p(new X);
  p->do_something();
  thread t(process, std::move(p));  // p is null from here on
  t.join();
}
Ownership is handed over with std::move().
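The handover can be checked from the caller's side: once the thread has been constructed, the original `unique_ptr` is empty. The names `Payload`, `consume`, and `ownership_moved` below are hypothetical.

```cpp
#include <memory>
#include <thread>
#include <utility>

struct Payload {
  int value;
};

// Takes ownership of the payload and reports the value it saw.
void consume(std::unique_ptr<Payload> p, int* out) { *out = p->value; }

bool ownership_moved(int value, int* seen) {
  auto p = std::make_unique<Payload>(Payload{value});
  std::thread t(consume, std::move(p), seen);
  // The pointer was moved into the thread's internal storage during
  // construction, so the caller's pointer is already null.
  bool const caller_is_empty = (p == nullptr);
  t.join();
  return caller_is_empty;
}
```

Here the object first moves into the thread's internal storage and from there into the parameter of `consume`, so exactly one owner exists at every point.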
Move semantics support
Thanks to move semantics, ownership of a thread can be transferred between std::thread objects.
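void f() { cout << "f()\n"; }
void g() { cout << "g()\n"; }
void test1() {
  thread t1(f);
  t1.join();
  thread t2 = move(t1);  // t1 was already joined, so t2 is empty too
  t1 = thread(g);        // move-assign from a temporary
  t1.join();
  thread t3;
  t3 = move(t2);
  t1 = move(t3);  // fine only because t1 is not joinable at this point
}
void f3(thread t) { t.join(); }  // takes ownership, so it must join or detach
void g3() {
  f3(thread(f));
  thread t(f);
  f3(std::move(t));
}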
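std::move() only casts an lvalue to an rvalue and does nothing else. Ownership is actually transferred when the move constructor (on initialization, as with t2) or the move assignment operator runs.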
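Two common uses of thread move semantics are returning a thread from a function and storing threads in a container. The sketch below uses the hypothetical names `spawn`, `run_workers`, and `moved_from_is_empty`.

```cpp
#include <atomic>
#include <thread>
#include <utility>
#include <vector>

// Returning a std::thread by value moves it out of the function.
std::thread spawn(std::atomic<int>& counter) {
  return std::thread([&counter] { ++counter; });
}

// Threads are movable, so they can be stored in a vector and joined later.
int run_workers(unsigned n) {
  std::atomic<int> counter{0};
  std::vector<std::thread> workers;
  for (unsigned i = 0; i < n; ++i) workers.push_back(spawn(counter));
  for (auto& w : workers) w.join();
  return counter.load();
}

// After a move, the source no longer owns a thread of execution.
bool moved_from_is_empty() {
  std::thread a([] {});
  std::thread b = std::move(a);
  bool const ok = !a.joinable() && b.joinable();
  b.join();
  return ok;
}
```

One thing to keep in mind: move-assigning to a `std::thread` that is still joinable calls `std::terminate`, so the target has to be joined or detached first.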
Parallel version of accumulate
// Besides the helper headers listed earlier, this needs:
#include <algorithm>  // min
#include <iterator>   // distance, advance
#include <numeric>    // accumulate
#include <vector>
template <typename Iterator, typename T>
struct accumulate_block {
  void operator()(Iterator first, Iterator last, T& result) {
    result = accumulate(first, last, result);
  }
};
template <typename Iterator, typename T>
T parallel_accumulate(Iterator first, Iterator last, T init) {
  unsigned long const length = distance(first, last);
  if (!length) return init;
  unsigned long const min_per_thread = 25;
  unsigned long const max_threads =
      (length + min_per_thread - 1) / min_per_thread;
  unsigned long const hardware_threads = thread::hardware_concurrency();
  // hardware_concurrency() may return 0, so fall back to 2 threads.
  unsigned long const num_threads =
      min(hardware_threads != 0 ? hardware_threads : 2, max_threads);
  unsigned long const block_size = length / num_threads;
  vector<T> results(num_threads);
  vector<thread> threads(num_threads - 1);  // the calling thread works too
  Iterator block_start = first;
  for (unsigned long i{}; i < num_threads - 1; ++i) {
    Iterator block_end = block_start;
    advance(block_end, block_size);
    threads[i] = thread(accumulate_block<Iterator, T>(), block_start,
                        block_end, ref(results[i]));
    block_start = block_end;
  }
  // The last block (including any remainder) runs on the calling thread.
  accumulate_block<Iterator, T>()(block_start, last,
                                  results[num_threads - 1]);
  for (auto& entry : threads) entry.join();
  return accumulate(results.begin(), results.end(), init);
}
// The sum of 0..9999999 does not fit in a 32-bit int, so use long long.
vector<long long> get_vec() {
  vector<long long> v;
  for (long long i{}; i < 10000000; ++i) v.emplace_back(i);
  return v;
}
void t1() {
  auto v = get_vec();
  auto start = system_clock::now();
  long long ans = parallel_accumulate(v.begin(), v.end(), 0LL);
  auto end = system_clock::now();
  auto duration = duration_cast<microseconds>(end - start);
  cout << "Time spent: "
       << double(duration.count()) * microseconds::period::num /
              microseconds::period::den
       << "s" << endl;
  cout << ans;
}
In my test on the 8-core machine, this was indeed nearly 8 times faster than a sequential accumulate.
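For comparison, the same chunked summation can be sketched with `std::async`, which hands back each partial result through a future instead of a shared results vector. This swaps in a different technique from the std::thread version above; the function name `async_accumulate` and the default of 4 chunks are assumptions made for illustration.

```cpp
#include <cstddef>
#include <future>
#include <iterator>
#include <numeric>
#include <vector>

// Sketch: splits [first, last) into `chunks` blocks, sums all but the last
// on std::async tasks, and sums the last block on the calling thread.
template <typename Iterator, typename T>
T async_accumulate(Iterator first, Iterator last, T init,
                   std::size_t chunks = 4) {
  auto const length = static_cast<std::size_t>(std::distance(first, last));
  if (length == 0 || chunks == 0) return init;
  if (chunks > length) chunks = length;
  std::size_t const block = length / chunks;

  std::vector<std::future<T>> parts;
  for (std::size_t i = 0; i + 1 < chunks; ++i) {
    Iterator block_end = first;
    std::advance(block_end, block);
    // The lambda copies both iterators, so advancing `first` below is safe.
    parts.push_back(std::async(std::launch::async, [first, block_end] {
      return std::accumulate(first, block_end, T{});
    }));
    first = block_end;
  }

  T total = std::accumulate(first, last, init);  // remainder, on this thread
  for (auto& part : parts) total += part.get();  // get() waits for each task
  return total;
}
```

A call such as `async_accumulate(v.begin(), v.end(), 0LL)` mirrors the usage above. Whether it is faster than the std::thread version depends on the implementation and the workload, so it is worth measuring rather than assuming.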