Don't copy space nor new line strcpy in C
9/19/2023

Your 2nd way is equivalent to strcpy for an implicit-length string. That's slower because it has to search for the terminating 0 byte, if it wasn't known at compile time after inlining and unrolling the loop.

It's especially slow if you do it by hand like this for non-constant strings: modern gcc/clang are unable to auto-vectorize loops where the program can't calculate the trip-count ahead of the first iteration, i.e. they fail at search loops like strlen and strcpy.

If you actually just call strcpy(dst, src), the compiler will either expand it inline in some efficient way, or emit an actual call to the library function. The libc function uses hand-written asm to do it efficiently as it goes, especially on ISAs like x86 where SIMD can help. For example for x86-64, glibc's AVX2 version should be able to copy 32 bytes per clock cycle for medium-sized copies with source and destination hot in cache, on mainstream CPUs like Zen2 and Skylake.

It seems modern GCC/clang do not recognize this pattern as strcpy the way they recognize memcpy-equivalent loops, so if you want efficient copying for unknown-size C strings, you need to use actual strcpy. (Or better, stpcpy to get a pointer to the end, so you know the string length afterwards, allowing you to use explicit-length stuff instead of the next function also having to scan the string for length.)

Writing it yourself with one char at a time will end up using byte load/store instructions, so it can go at most 1 byte per clock cycle. (Or close to 2 on Ice Lake, probably bottlenecked on the 5-wide front-end for the load / macro-fused test/jz / store.) So it's a disaster for medium to large copies with runtime-variable source where the compiler can't remove the loop.

Other architectures are broadly similar, except for how useful SIMD is for strcpy.

For benchmarking, see "Idiomatic way of performance evaluation?".
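The contrast between a byte-at-a-time loop and the library string functions can be sketched in C roughly as follows. This is only an illustration, not the code from the question: `copy_bytewise` and `join_two` are hypothetical names, and `stpcpy` is assumed to be available (it is POSIX.1-2008, not ISO C, so it may be missing on non-POSIX toolchains).

```c
#define _POSIX_C_SOURCE 200809L /* exposes stpcpy(), which is POSIX.1-2008, not ISO C */
#include <stddef.h>
#include <string.h>

/* Hand-written equivalent of strcpy: one byte load, one test and one byte
 * store per iteration. Shown only for comparison with the library calls. */
char *copy_bytewise(char *dst, const char *src)
{
    char *d = dst;
    while ((*d++ = *src++) != '\0') {
        /* empty body: the copy happens in the loop condition */
    }
    return dst;
}

/* Hypothetical helper: writes `first` followed by `second` into dst and
 * returns the length of the result. The caller must provide room for both
 * strings plus the terminating 0 byte. */
size_t join_two(char *dst, const char *first, const char *second)
{
    char *end = stpcpy(dst, first); /* points at the terminator just written */
    end = stpcpy(end, second);      /* append without rescanning dst */
    return (size_t)(end - dst);     /* length is known, no strlen needed */
}
```

Calling `join_two(buf, "foo", "bar")` on a large enough `buf` leaves "foobar" in it and returns 6. The returned length can then be passed to explicit-length functions such as memcpy, so later code does not have to scan the string again.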
Don't write your own copy loops when you can use a standard function like memcpy (when the length is known) or strcpy (when it isn't). Modern compilers treat these as "builtin" functions, so for constant sizes they can expand them to a few asm instructions instead of actually setting up a call to the library implementation, which would have to branch on the size and so on. So if you're avoiding memcpy because of the overhead of a library function call for a short copy, don't worry, there won't be one if the length is a compile-time constant.

But even in the unknown / runtime-variable length cases, the library functions will usually be an optimized version hand-written in asm that's much faster (especially for medium to large strings) than anything you can do in pure C, especially for strcpy without undefined behaviour from reading past the end of a buffer.

Your first block of code has a compile-time-constant size (you were able to use sizeof instead of strlen). Your copy loop will actually get recognized by modern compilers as a fixed-size copy, and (if large) turned into an actual call to memcpy, otherwise usually optimized similarly.

It doesn't matter how you do the array indexing; optimizing compilers can see through size_t indices or pointers and make good asm for the target platform. See this and this Q&A for examples of how code actually compiles. Remember that CPUs run asm, not C directly.

This example is too small and too simplistic to actually be usable as a benchmark, though.
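As a rough sketch of the compile-time-constant case, the two functions below copy the same fixed-size array, once with a hand-written indexed loop and once with memcpy. The names `GREETING`, `copy_greeting_loop` and `copy_greeting_memcpy` are hypothetical stand-ins, since the question's code is not reproduced here. With optimization enabled, a modern compiler will typically produce similar code for both, but that is worth checking in the asm output for your own compiler and target (for example with `gcc -O2 -S`).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical source string. Because it is an array, sizeof gives its size
 * at compile time, including the terminating 0 byte (13 bytes here). */
static const char GREETING[] = "hello, world";

/* Hand-written loop with a compile-time-constant trip count. An optimizing
 * compiler can recognize this as a fixed-size copy. */
void copy_greeting_loop(char *dst)
{
    for (size_t i = 0; i < sizeof GREETING; i++)
        dst[i] = GREETING[i];
}

/* The same copy expressed with the standard function. With a constant size,
 * compilers usually expand this inline instead of calling into libc. */
void copy_greeting_memcpy(char *dst)
{
    memcpy(dst, GREETING, sizeof GREETING);
}
```

Both functions expect `dst` to have room for at least `sizeof GREETING` bytes. The memcpy version states the intent directly and does not depend on the compiler recognizing a loop pattern.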