Why is L1 write access with two threads on the same core (hyperthreading) worse than with two threads on two cores?

2024-02-01

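I wrote a C/C++ program (mixing printf and std::) to get an idea of the performance of the different caches. I would like to parallelize a process that computes over a large chunk of memory. I have to do several calculations on the same memory locations, so I write the results in place, overwriting the source data. When the first calculation is finished, I do another one using the previous results.

I guessed that with two threads, one doing the first calculation and the other the second, I would improve performance, because each thread does half of the work and the process would run twice as fast. I have read how caches work, so I know that if this isn't done well it may be even worse, so I wrote a small program to measure everything.

(See below for the machine topology, CPU type and flags, and the source code.)

I see some strange results. Apparently, it makes no difference whether the data comes from L1, L2, L3 or RAM for the calculations. It doesn't matter whether I work in the same buffer or in two different buffers (with some memory distance between them), unless the threads are on the same core. I mean: the worst results are when the two threads are on the same core (hyperthreading). I pin them using CPU affinity.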
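Which logical CPU numbers share a physical core differs between machines (0 and 6 on this one). On Linux the kernel exposes the siblings of CPU N in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list. The following is a minimal sketch of parsing that file's contents; the helper name parse_cpu_list is hypothetical, and it assumes the usual comma-and-range list format such as "0,6" or "0-1".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Parse a Linux CPU list such as "0,6" or "0-1,8" into CPU numbers.
// Hypothetical helper; the input is assumed to hold only digits, ',', '-'
// and an optional trailing newline.
std::vector<int> parse_cpu_list(const std::string& text)
{
    std::vector<int> cpus;
    std::size_t pos = 0;
    while (pos < text.size()) {
        if (text[pos] < '0' || text[pos] > '9') { // skip separators and '\n'
            ++pos;
            continue;
        }
        std::size_t used = 0;
        int first = std::stoi(text.substr(pos), &used);
        pos += used;
        int last = first;
        if (pos < text.size() && text[pos] == '-') { // a range "first-last"
            ++pos;
            last = std::stoi(text.substr(pos), &used);
            pos += used;
        }
        assert(first <= last);
        for (int cpu = first; cpu <= last; ++cpu)
            cpus.push_back(cpu);
    }
    return cpus;
}
```

Reading the file with std::ifstream and std::getline and passing the line to parse_cpu_list would give the CPU numbers to use with pthread_setaffinity_np, instead of hard-coding 6.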

My program has some options, but they are self-explanatory.

These are the commands and the results:

./main --loops 200 --same-buffer --flush

200000 loops.
Flushing caches.
Cache size: 32768
Using same buffer.
Running in cores 0 and 1.
Waiting 2 seconds just for threads to be ready.
Post threads to begin work 200000 iterations.
Thread two created, pausing.
Go ahead and calculate in 2...
Buffer address: 0x7f087c156010.
Waiting for thread semaphores.
Thread one created, pausing.
Go ahead and calculate in 1...
Buffer address: 0x7f087c156010.
Time 1 18.436685
Time 2 18.620263
We don't wait anymore.
Joining threads.
Dumping data.
Exiting from main thread.

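We can see it runs on CPUs 0 and 1 which, according to my topology, are different cores. The buffer address is the same in both threads: 0x7f087c156010.

Time: 18 seconds.

Now on the same core: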

./main --loops 200 --same-buffer --same-core --flush

200000 loops.
Flushing caches.
Cache size: 32768
Using same buffer.
Using same core. (HyperThreading)
Thread one created, pausing.
Thread two created, pausing.
Running in cores 0 and 6.
Waiting 2 seconds just for threads to be ready.
Post threads to begin work 200000 iterations.
Waiting for thread semaphores.
Go ahead and calculate in 1...
Buffer address: 0x7f0a6bbe1010.
Go ahead and calculate in 2...
Buffer address: 0x7f0a6bbe1010.
Time 1 26.572419
Time 2 26.951195
We don't wait anymore.
Joining threads.
Dumping data.
Exiting from main thread.

We can see it runs on CPUs 0 and 6 which, according to my topology, are the same core: two hyperthreads. Same buffer.

Time: 26 seconds.

So 10 seconds slower.

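How is this possible? I understand that if a cache line is not dirty, it won't be fetched from memory (L1, L2, L3 or RAM). I made the program write alternating 64-byte arrays, each the same size as one cache line. If one thread writes cache line 0, the other writes cache line 1, so there is no cache-line clash.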
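One thing worth checking is whether each 64-byte struct really coincides with one hardware cache line. That only holds when the buffer starts on a 64-byte boundary, and the address printed in the runs above, 0x7f087c156010, ends in 0x10: it is 16-byte aligned but not 64-byte aligned. A minimal sketch of the check follows; the helper names are hypothetical and the 64-byte line size is an assumption taken from the "clflush size : 64" entry in the CPU information.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of cache lines touched by `size` bytes starting at address `addr`.
// The default line size of 64 bytes is an assumption for this CPU.
std::size_t lines_touched(std::uintptr_t addr, std::size_t size, std::size_t line = 64)
{
    assert(line > 0 && size > 0);
    std::uintptr_t first = addr / line;
    std::uintptr_t last = (addr + size - 1) / line;
    return static_cast<std::size_t>(last - first + 1);
}

// True when two byte ranges have at least one cache line in common.
bool share_line(std::uintptr_t a, std::size_t a_size,
                std::uintptr_t b, std::size_t b_size, std::size_t line = 64)
{
    assert(line > 0 && a_size > 0 && b_size > 0);
    std::uintptr_t a_first = a / line, a_last = (a + a_size - 1) / line;
    std::uintptr_t b_first = b / line, b_last = (b + b_size - 1) / line;
    return a_first <= b_last && b_first <= a_last;
}
```

For the address above, lines_touched(0x7f087c156010, 64) is 2, and cache[0] shares a line with cache[1], so the two threads would not be writing fully disjoint lines unless the buffer is aligned to 64 bytes.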

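Does this mean that two hyperthreads cannot write simultaneously even though they share the L1 cache?

Apparently, working on two distinct cores performs better than working on a single core.

-- EDIT --

Following suggestions from the commenters and Max Langhof (https://stackoverflow.com/users/9528746/max-langhof), I have included code to align the buffers. I have also added an option to misalign the buffers to see the difference.

I'm not sure about the align and misalign code, but I copied it from here: https://stackoverflow.com/questions/227897/how-to-allocate-aligned-memory-only-using-the-standard-library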
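Note that the --align option in the source below aligns to 16 bytes, while a cache line on this CPU is 64 bytes. As an alternative to manual pointer rounding, here is a minimal sketch that asks the allocator for cache-line alignment directly. It assumes a C++17 standard library that provides std::aligned_alloc (glibc does; some platforms do not), and the names alloc_aligned and FreeDeleter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Releases memory obtained from std::aligned_alloc.
struct FreeDeleter {
    void operator()(void* p) const { std::free(p); }
};

// Allocate at least `bytes` bytes aligned to `alignment` (a power of two).
// std::aligned_alloc requires the size to be a multiple of the alignment,
// so the request is rounded up first.
std::unique_ptr<char, FreeDeleter> alloc_aligned(std::size_t bytes,
                                                 std::size_t alignment = 64)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    std::size_t rounded = (bytes + alignment - 1) / alignment * alignment;
    return std::unique_ptr<char, FreeDeleter>(
        static_cast<char*>(std::aligned_alloc(alignment, rounded)));
}
```

A call such as alloc_aligned(32768) would return a buffer whose 64-byte structs each map onto exactly one cache line.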

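As they told me, measuring non-optimized code is a waste of time.

For optimized code the results are quite interesting. What I find surprising is that it takes the same time even with misaligned data and with two cores, but I suppose that is because of the small amount of work in the inner loop. (And I guess this shows how well designed today's processors are.)

Numbers (obtained with perf stat -d -d -d):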

*** Same core

No optimization
---------------
No aligment
    39.866.074.445      L1-dcache-loads           # 1485,716 M/sec                    (21,75%)
        10.746.914      L1-dcache-load-misses     #    0,03% of all L1-dcache hits    (20,84%)
Aligment
    39.685.928.674      L1-dcache-loads           # 1470,627 M/sec                    (22,77%)
        11.003.261      L1-dcache-load-misses     #    0,03% of all L1-dcache hits    (27,37%)
Misaligment
    39.702.205.508      L1-dcache-loads           # 1474,958 M/sec                    (24,08%)
        10.740.380      L1-dcache-load-misses     #    0,03% of all L1-dcache hits    (29,05%)


Optimization
------------
No aligment
    39.702.205.508      L1-dcache-loads           # 1474,958 M/sec                    (24,08%)
        10.740.380      L1-dcache-load-misses     #    0,03% of all L1-dcache hits    (29,05%)
       2,390298203 seconds time elapsed
Aligment
        19.450.626      L1-dcache-loads           #   25,108 M/sec                    (23,21%)
         1.758.012      L1-dcache-load-misses     #    9,04% of all L1-dcache hits    (22,95%)
       2,400644369 seconds time elapsed
Misaligment
         2.687.025      L1-dcache-loads           #    2,876 M/sec                    (24,64%)
           968.413      L1-dcache-load-misses     #   36,04% of all L1-dcache hits    (12,98%)
       2,483825841 seconds time elapsed

*** Two cores

No optimization
---------------
No aligment
    39.714.584.586      L1-dcache-loads           # 2156,408 M/sec                    (31,17%)
       206.030.164      L1-dcache-load-misses     #    0,52% of all L1-dcache hits    (12,55%)
Aligment
    39.698.566.036      L1-dcache-loads           # 2129,672 M/sec                    (31,10%)
       209.659.618      L1-dcache-load-misses     #    0,53% of all L1-dcache hits    (12,54%)
Misaligment
         2.687.025      L1-dcache-loads           #    2,876 M/sec                    (24,64%)
           968.413      L1-dcache-load-misses     #   36,04% of all L1-dcache hits    (12,98%)


Optimization
------------
No aligment
        16.711.148      L1-dcache-loads           #    9,431 M/sec                    (31,08%)
       202.059.646      L1-dcache-load-misses     # 1209,13% of all L1-dcache hits    (12,87%)
       2,898511757 seconds time elapsed
Aligment
        18.476.510      L1-dcache-loads           #   10,484 M/sec                    (30,99%)
       202.180.021      L1-dcache-load-misses     # 1094,25% of all L1-dcache hits    (12,83%)
       2,894591875 seconds time elapsed
Misaligment
        18.663.711      L1-dcache-loads           #   11,041 M/sec                    (31,28%)
       190.887.434      L1-dcache-load-misses     # 1022,77% of all L1-dcache hits    (13,22%)
       2,861316941 seconds time elapsed
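The percentages above 100% look odd, so here is how they arise. perf prints the miss count divided by the load count, and the trailing figures in parentheses show that each event was only counted for part of the run and then scaled, because more events were requested than there are hardware counters. The ratio is therefore not a literal hit rate. A minimal sketch of the arithmetic, using the "Two cores, optimization, no alignment" row (the figures use "." as thousands separator and "," as decimal separator; the function name is hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Miss ratio in percent as perf reports it: misses divided by loads.
double miss_percent(std::uint64_t misses, std::uint64_t loads)
{
    assert(loads > 0);
    return 100.0 * static_cast<double>(misses) / static_cast<double>(loads);
}
```

With 202.059.646 misses and 16.711.148 loads this gives about 1209,13%, matching the perf output; the optimized loop performs almost no loads, only stores, so the denominator is tiny.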

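-- END EDIT --

The program creates a log file with a dump of the buffers, so I have verified that it works as expected (you can see it below).

I also have the ASM, where we can see that the loop is actually doing something.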

 269:main.cc       ****             for (int x = 0; x < 64; ++x)
 1152                   .loc 1 269 0 is_stmt 1
 1153 0c0c C745F000         movl    $0, -16(%rbp)   #, x
 1153      000000
 1154               .L56:
 1155                   .loc 1 269 0 is_stmt 0 discriminator 3
 1156 0c13 837DF03F         cmpl    $63, -16(%rbp)  #, x
 1157 0c17 7F26             jg  .L55    #,
 270:main.cc       ****                 th->cache->cache[i].data[x] = '2';
 1158                   .loc 1 270 0 is_stmt 1 discriminator 2
 1159 0c19 488B45E8         movq    -24(%rbp), %rax # th, tmp104
 1160 0c1d 488B4830         movq    48(%rax), %rcx  # th_9->cache, _25
 1161 0c21 8B45F0           movl    -16(%rbp), %eax # x, tmp106
 1162 0c24 4863D0           movslq  %eax, %rdx  # tmp106, tmp105
 1163 0c27 8B45F4           movl    -12(%rbp), %eax # i, tmp108
 1164 0c2a 4898             cltq
 1165 0c2c 48C1E006         salq    $6, %rax    #, tmp109
 1166 0c30 4801C8           addq    %rcx, %rax  # _25, tmp109
 1167 0c33 4801D0           addq    %rdx, %rax  # tmp105, tmp110
 1168 0c36 C60032           movb    $50, (%rax) #, *_25
 269:main.cc       ****             for (int x = 0; x < 64; ++x)

This is part of the dump:

== buffer ==============================================================================================================
00000001 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 
00000002 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 
00000003 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 
00000004 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 0x31 
00000005 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 
00000006 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 
00000007 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 
00000008 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32 0x32

My machine topology:

[Figure: machine topology diagram]

This is the CPU type and flags.

processor   : 11
vendor_id   : GenuineIntel
cpu family  : 6
model       : 45
model name  : Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
stepping    : 7
microcode   : 0x70b
cpu MHz     : 1504.364
cache size  : 15360 KB
physical id : 0
siblings    : 12
core id     : 5
cpu cores   : 6
apicid      : 11
initial apicid  : 11
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb kaiser tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts
bugs        : cpu_meltdown spectre_v1 spectre_v2
bogomips    : 4987.77
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

This is the full source code:

//
//
//
//
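#include <emmintrin.h>
#include <x86intrin.h>
#include <stdint.h>     // uintptr_t, uint64_t
#include <stdio.h>
#include <stdlib.h>     // strtol
#include <time.h>
#include <ctime>
#include <sched.h>      // cpu_set_t, CPU_ZERO, CPU_SET
#include <semaphore.h>
#include <pthread.h>
#include <string.h>
#include <string>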


struct cache_line {
    char data[64];
};

//
// 32768 bytes = 32 KB = 512 cache lines of 64 bytes
struct cache_l1 {
    struct cache_line cache[512];
};

size_t TOTAL = 100000;

void * thread_one (void * data);
void * thread_two (void * data);

void dump (FILE * file, char * buffer, size_t size);

class thread {
public:
    sem_t sem;
    sem_t * glob;
    pthread_t thr;
    struct cache_l1 * cache;
};

bool flush = false;

int main (int argc, char ** argv)
{
    bool same_core = false;
    bool same_buffer = false;
    bool align = false;
    bool misalign = false;
    size_t reserve_mem = 32768; // 32 KB (L1 data cache); scaled up below for L2, L3 and RAM
    std::string file_name ("pseudobench_");
    std::string core_option ("diffcore");
    std::string buffer_option ("diffbuff");
    std::string cache_option ("l1");

    for (int i = 1; i < argc; ++i) {
        if (::strcmp("--same-core", argv[i]) == 0) {

            same_core = true;
            core_option = "samecore";

        } else if (::strcmp("--same-buffer", argv[i]) == 0) {

            same_buffer = true;
            buffer_option = "samebuffer";

        } else if (::strcmp("--l1", argv[i]) == 0) {

            // nothing already L1 cache size

        } else if (::strcmp("--l2", argv[i]) == 0) {

            reserve_mem *= 8; // 256KB, L2 cache size
            cache_option = "l2";

        } else if (::strcmp("--l3", argv[i]) == 0) {

            reserve_mem *= 480; // 15MB, L3 cache size
            cache_option = "l3";

        } else if (::strcmp("--ram", argv[i]) == 0) {

            reserve_mem *= 480; // 15MB, plus two times L1 cache size
            reserve_mem += sizeof(struct cache_l1) * 2;
            cache_option = "ram";

        } else if (::strcmp("--loops", argv[i]) == 0) {

            if (i + 1 < argc) // guard against a missing value after --loops
                TOTAL = ::strtol(argv[++i], nullptr, 10) * 1000;
            printf ("%ld loops.\n", TOTAL);

        } else if (::strcmp("--align", argv[i]) == 0) {

            align = true;
            printf ("Align memory to 16 bytes.\n");

        } else if (::strcmp("--misalign", argv[i]) == 0) {

            misalign = true;
            printf ("Misalign memory.\n");

        } else if (::strcmp("--flush", argv[i]) == 0) {

            flush = true;
            printf ("Flushing caches.\n");

        } else if (::strcmp("-h", argv[i]) == 0) {

            printf ("There is no help here. Please give loops in units, "
                    "they will be multiplied by one thousand. (Default 100.000, EU separator)\n");
        } else {
            printf ("Unknown option: '%s', ignoring it.\n", argv[i]);
        }
    }

    char * ch = new char[(reserve_mem * 2) + (sizeof(struct cache_l1) * 2) + 16];
    struct cache_l1 * cache4 = nullptr;
    struct cache_l1 * cache5 = nullptr;

    if (align) {
        // Round the start address up to the next 16-byte boundary.
        cache4 = (struct cache_l1 *) (((uintptr_t)ch + 15) & ~(uintptr_t)0x0F);
        // Byte offset, as in the unaligned branch (indexing cache4[] would step in 32 KB units).
        cache5 = (struct cache_l1 *) ((char *)cache4 + reserve_mem - sizeof(struct cache_l1));
        cache5 = (struct cache_l1 *)(((uintptr_t)cache5) & ~(uintptr_t)0x0F);
    } else {
        cache4 = (struct cache_l1 *) ch;
        cache5 = (struct cache_l1 *) &ch[reserve_mem - sizeof(struct cache_l1)];
    }

    if (misalign) {
        cache4 = (struct cache_l1 *) ((char *)cache4 + 5);
        cache5 = (struct cache_l1 *) ((char *)cache5 + 5);
    }

    (void)cache4;
    (void)cache5;

    printf ("Cache size: %ld\n", sizeof(struct cache_l1));

    if (cache_option == "l1") {
        // L1 doesn't allow two buffers, so same buffer
        buffer_option = "samebuffer";
    }

    sem_t globsem;

    thread th1;
    thread th2;

    if (same_buffer) {
        printf ("Using same buffer.\n");
        th1.cache = cache5;
    } else {
        th1.cache = cache4;
    }
    th2.cache = cache5;

    sem_init (&globsem, 0, 0);

    if (sem_init(&th1.sem, 0, 0) < 0) {
        printf ("There is an error with the 1 semaphore.\n");
    }
    if (sem_init(&th2.sem, 0, 0) < 0) {
        printf ("There is an error with the 2 semaphore.\n");
    }

    th1.glob = &globsem;
    th2.glob = &globsem;

    cpu_set_t cpuset;
    int rc = 0;

    pthread_create (&th1.thr, nullptr, thread_one, &th1);
    CPU_ZERO (&cpuset);
    CPU_SET (0, &cpuset);
    rc = pthread_setaffinity_np(th1.thr,
                                sizeof(cpu_set_t),
                                &cpuset);
    if (rc != 0) {
        printf ("Can't change affinity of thread one!\n");
    }

    pthread_create (&th2.thr, nullptr, thread_two, &th2);
    CPU_ZERO (&cpuset);
    int cpu = 1;

    if (same_core) {
        printf ("Using same core. (HyperThreading)\n");
        cpu = 6; // Depends on CPU topology (check it with lstopo)
    }

    CPU_SET (cpu, &cpuset);
    rc = pthread_setaffinity_np(th2.thr,
                                sizeof(cpu_set_t),
                                &cpuset);
    if (rc != 0) {
        printf ("Can't change affinity of thread two!\n");
    }

    printf ("Running in cores 0 and %d.\n", cpu);

    fprintf (stderr, "Waiting 2 seconds just for threads to be ready.\n");
    struct timespec time;
    time.tv_sec = 2;
    time.tv_nsec = 0; // must be initialized, otherwise nanosleep may fail with EINVAL
    nanosleep (&time, nullptr);

    fprintf (stderr, "Post threads to begin work %ld iterations.\n", TOTAL);

    sem_post (&globsem);
    sem_post (&globsem);

    printf ("Waiting for thread semaphores.\n");

    sem_wait (&th1.sem);
    sem_wait (&th2.sem);

    printf ("We don't wait anymore.\n");

    printf ("Joining threads.\n");
    pthread_join (th1.thr, nullptr);
    pthread_join (th2.thr, nullptr);

    printf ("Dumping data.\n");
    file_name += core_option;
    file_name += "_";
    file_name += buffer_option;
    file_name += "_";
    file_name += cache_option;
    file_name += ".log";
    FILE * file = ::fopen(file_name.c_str(), "w");
    if (file == nullptr) {
        printf ("Can't open %s for writing.\n", file_name.c_str());
        return 1;
    }
    if (same_buffer)
        dump (file, (char *)cache5, sizeof(struct cache_l1));
    else {
        dump (file, (char *)cache4, sizeof(struct cache_l1));
        dump (file, (char *)cache5, sizeof(struct cache_l1));
    }
    ::fclose (file);
    printf ("Exiting from main thread.\n");
    return 0;
}

void * thread_one (void * data)
{
    thread * th = (thread *) data;
    printf ("Thread one created, pausing.\n");
    if (flush)
        _mm_clflush (th->cache);
    sem_wait (th->glob);

    printf ("Go ahead and calculate in 1...\n");
    printf ("Buffer address: %p.\n", th->cache);
    clock_t begin, end;
    double time_spent;
    register uint64_t counter = 0;
    begin = clock();
    for (size_t z = 0; z < TOTAL; ++z ) {
        ++counter;
        for (int i = 0; i < 512; i += 2) {
            ++counter;
            for (int x = 0; x < 64; ++x) {
                ++counter;
                th->cache->cache[i].data[x] = '1';
            }
        }
    }
    end = clock();
    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;    
    printf ("Time 1 %f %ld\n", time_spent, counter);

    sem_post (&th->sem);

    return nullptr;
}

void * thread_two (void * data)
{
    thread * th = (thread *) data;
    printf ("Thread two created, pausing.\n");
    if (flush)
        _mm_clflush (th->cache);
    sem_wait (th->glob);

    printf ("Go ahead and calculate in 2...\n");
    printf ("Buffer address: %p.\n", th->cache);
    clock_t begin, end;
    double time_spent;
    register uint64_t counter = 0;
    begin = clock();
    for (size_t z = 0; z < TOTAL; ++z ) {
        ++counter;
        for (int i = 1; i < 512; i += 2) {
            ++counter;
            for (int x = 0; x < 64; ++x) {
                ++counter;
                th->cache->cache[i].data[x] = '2';
            }
        }
    }
    end = clock();
    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;    
    printf ("Time 2 %f  %ld\n", time_spent, counter);

    sem_post (&th->sem);

    return nullptr;
}

void dump (FILE * file, char * buffer, size_t size)
{
    size_t lines = 0;
    fprintf (file, "\n");
    fprintf (file, "== buffer =================================================="
             "============================================================\n");

    for (size_t i = 0; i < size; i += 16) {
        fprintf (file, "%08ld %p ", ++lines, &buffer[i]);
        for (size_t x = i; x < (i+16); ++x) {
            if (buffer[x] >= 32 && buffer[x] < 127)
                fprintf (file, "%c ", buffer[x]);
            else
                fprintf (file, ". ");
        }
        for (size_t x = i; x < (i+16); ++x) {
            fprintf (file, "0x%02x ", buffer[x]);
        }
        fprintf (file, "\n");
    }
    fprintf (file, "== buffer =================================================="
             "============================================================\n");
}
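A note on the timing itself: clock() reports processor time consumed by the whole process, which on Linux includes every thread, so with two busy threads each printed "Time" is closer to the sum of both threads' CPU time than to elapsed time. A minimal sketch of measuring wall-clock time with std::chrono::steady_clock follows; the helper names time_seconds and fill_even_lines are hypothetical, and the loop only mimics the store pattern of thread_one.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Run `work` once and return the elapsed wall-clock time in seconds.
template <typename Work>
double time_seconds(Work&& work)
{
    auto begin = std::chrono::steady_clock::now();
    work();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - begin).count();
}

// Same store pattern as thread_one: every second 64-byte line of a 32 KB buffer.
// `volatile` keeps an optimizing compiler from merging or dropping the byte stores.
void fill_even_lines(volatile char* buffer, std::size_t loops)
{
    assert(buffer != nullptr);
    for (std::size_t z = 0; z < loops; ++z)
        for (std::size_t i = 0; i < 512; i += 2)
            for (std::size_t x = 0; x < 64; ++x)
                buffer[i * 64 + x] = '1';
}
```

Inside each thread, something like time_seconds([&] { fill_even_lines(buffer, TOTAL); }) would give per-thread elapsed time that can be compared directly between the same-core and two-core runs.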

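"Apparently, it makes no difference whether the data comes from L1, L2, L3 or RAM for the calculations."

You fully traverse every cache line of each level (and each page) before requesting the next one. Memory accesses are slow, but not so slow that you could traverse a whole page before the next one arrives. If you were touching a different L3 cache line or a different RAM page on every access, you would certainly notice a difference. But the way you are doing it, the CPU crunches through a large number of instructions between each L2, L3 or RAM request, completely hiding any cache-miss latency.

So you are not memory-bound in the slightest. You basically have the most benign usage pattern possible: all your data is in cache almost all the time. You will occasionally get a cache miss, but the time to fetch the line pales in comparison to the time you spend working on cached data. Furthermore, your CPU will likely predict your (extremely predictable) access pattern and prefetch the memory before you even touch it.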
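For contrast, here is a minimal sketch of an access pattern where the cache level would show up: every load depends on the previous one and jumps to an unpredictable slot, so the prefetcher cannot help. It builds a random single-cycle permutation with Sattolo's algorithm and follows it. The names make_cycle and chase are hypothetical, and this is an illustration rather than a calibrated benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Build a random permutation that forms one single cycle (Sattolo's algorithm),
// so following next[i] visits every slot exactly once before returning to 0.
std::vector<std::size_t> make_cycle(std::size_t n, unsigned seed = 42)
{
    assert(n >= 2);
    std::vector<std::size_t> next(n);
    std::iota(next.begin(), next.end(), std::size_t{0});
    std::mt19937 rng(seed);
    for (std::size_t i = n - 1; i > 0; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1); // never i itself
        std::swap(next[i], next[pick(rng)]);
    }
    return next;
}

// Follow the chain for `steps` hops; each load address comes from the previous load.
std::size_t chase(const std::vector<std::size_t>& next, std::size_t steps)
{
    std::size_t pos = 0;
    for (std::size_t s = 0; s < steps; ++s)
        pos = next[pos];
    return pos;
}
```

Timing chase for a vector that fits in L1 and again for one much larger than the 15 MB L3 should show a clear per-hop difference, unlike the sequential byte stores in the question.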

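"So 10 seconds slower.
How is this possible? I understand that if a cache line is not dirty, it won't be fetched from memory (L1, L2, L3 or RAM)."

As shown above, you are not memory-bound. You are bound by how fast the CPU processes instructions (EDIT: this is exacerbated by disabling optimizations, which bloats the instruction count), and it shouldn't be too surprising that two hyperthreads are not as good at that as two threads on different physical cores.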

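Of particular importance for this observation is that not all resources are duplicated for each pair of hyperthreads. For example, the execution ports (adders, dividers, floating-point units and so on) are not duplicated: the two hardware threads of a core share them. Here is a schematic of the Skylake scheduler to demonstrate the concept:

[Figure: Skylake scheduler and execution ports schematic]

When hyperthreading, both threads have to compete for these resources (and even a single-threaded program is affected by this design because of out-of-order execution). There are four simple integer ALUs in this design, but only one Store Data port. So two threads on the same core (in the Skylake design shown) cannot store data simultaneously, yet they can compute multiple integer operations simultaneously (note: there is no guarantee that port 4 is actually the source of the contention; some Intel tooling might be able to figure that out for you). No such restriction exists when the load is split between two different physical cores.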
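If the store-data port really is the limiting resource (which is not confirmed here), one way to probe it is to issue fewer, wider stores and see whether the same-core penalty shrinks. A minimal sketch, assuming a 64-byte line: it fills one line with eight 8-byte copies instead of sixty-four 1-byte stores. The name fill_line_wide is hypothetical, and an optimizing compiler may already perform a similar transformation on the original loop.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Fill one 64-byte line with `value` using eight 8-byte copies
// instead of sixty-four 1-byte stores.
void fill_line_wide(char* line, char value)
{
    assert(line != nullptr);
    std::uint64_t pattern;
    std::memset(&pattern, static_cast<unsigned char>(value), sizeof pattern);
    for (int word = 0; word < 8; ++word) {
        // A fixed-size memcpy is typically compiled to a single 8-byte store
        // when optimizations are enabled.
        std::memcpy(line + word * sizeof pattern, &pattern, sizeof pattern);
    }
}
```

Replacing the inner x loop of thread_one and thread_two with a call to this function would cut the number of store instructions per line by a factor of eight, which makes it easier to tell store throughput apart from general instruction overhead.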

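There might be some overhead from synchronizing L2 cache lines between different physical cores (since the L2 cache is apparently not shared between all cores of the CPU), but that is hard to assess from here.

I found the picture above on this page, which explains all of the above (and more) in greater depth: https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)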
