
Use find_char/memchr to improve performance of str.split #145797

@KowalskiThomas

Description

I propose using memchr (and memrchr) to speed up str.split (and str.rsplit).

Currently, str.split (and str.rsplit) find delimiters with a basic loop that checks the string one character at a time.
Using memchr and memrchr can give a significant performance boost, because they are typically heavily optimised and often implemented with SIMD instructions.
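To show why a library memchr can beat a one-byte-at-a-time loop, here is a minimal sketch of the classic word-at-a-time ("SWAR") trick. It tests 8 bytes with a few integer operations and only falls back to a byte loop inside the word that contains a match. This is an illustration, not glibc's or Apple's actual memchr: real implementations typically use SIMD registers instead. In pure Python it is also slower than bytes.find, so it only demonstrates the idea. The names `find_byte_swar` and `has_zero_byte` are hypothetical.

```python
LO = 0x0101010101010101  # 0x01 in every byte of a 64-bit word
HI = 0x8080808080808080  # 0x80 in every byte of a 64-bit word
MASK64 = (1 << 64) - 1   # emulate unsigned 64-bit wraparound


def has_zero_byte(word: int) -> bool:
    """True if any of the 8 bytes of a 64-bit word is zero."""
    return (((word - LO) & MASK64) & ~word & HI) != 0


def find_byte_swar(data: bytes, needle: int) -> int:
    """Return the index of the first `needle` byte in `data`, or -1.

    Examines 8 bytes per step; only the word that contains a match
    (and the final partial word) is scanned byte by byte."""
    pattern = needle * LO  # needle copied into all 8 bytes
    i, n = 0, len(data)
    while i + 8 <= n:
        word = int.from_bytes(data[i:i + 8], "little")
        if has_zero_byte(word ^ pattern):  # some byte equals needle
            break
        i += 8
    while i < n:
        if data[i] == needle:
            return i
        i += 1
    return -1


data = b"a" * 1000 + b" " + b"a" * 10
print(find_byte_swar(data, ord(" ")))  # 1000
```

The expensive part of the plain loop is one compare and branch per byte; the word-at-a-time version does one compare per 8 bytes, and SIMD versions widen that to 16, 32 or 64 bytes.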

Note that no complicated change is needed, because STRINGLIB already exposes both searches as find_char and rfind_char.

Interestingly, there is currently a comment in the code stating that using memchr does not provide any meaningful improvement, but the benchmarks disagree. I suspect this is because the code was written about 16 years ago (f2c5484), and memchr implementations (and, potentially, hardware) have evolved and diverged since then.

Proposed change

I have a branch in my fork that implements this change, and I can open a PR from it if we want to go forward.

In short, the existing for loop that iterates over characters would be replaced with a call to STRINGLIB(find_char) (and STRINGLIB(rfind_char) for rsplit).
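For illustration, here is a pure-Python model of the two strategies. `split_scan` mirrors the current per-character loop; `split_find` and `rsplit_rfind` jump straight to the next (or previous) separator, with str.find and str.rfind standing in for STRINGLIB(find_char) and STRINGLIB(rfind_char). These helper names are hypothetical, they only handle a single-character separator, and the real C code has other details (such as maxsplit for split and the no-argument whitespace mode) that are left out here.

```python
def split_scan(s: str, sep: str) -> list[str]:
    """Current strategy (modelled): compare every character with `sep`."""
    parts, start = [], 0
    for i, ch in enumerate(s):
        if ch == sep:
            parts.append(s[start:i])
            start = i + 1
    parts.append(s[start:])
    return parts


def split_find(s: str, sep: str) -> list[str]:
    """Proposed strategy (modelled): jump to the next `sep` with a find primitive."""
    parts, start = [], 0
    while (j := s.find(sep, start)) != -1:
        parts.append(s[start:j])
        start = j + 1
    parts.append(s[start:])
    return parts


def rsplit_rfind(s: str, sep: str, maxsplit: int = -1) -> list[str]:
    """rsplit counterpart: search backwards with rfind, honouring maxsplit."""
    parts, end = [], len(s)
    while maxsplit != 0 and (j := s.rfind(sep, 0, end)) != -1:
        parts.append(s[j + 1:end])
        end = j
        maxsplit -= 1
    parts.append(s[:end])
    parts.reverse()
    return parts


s = "a b  c"
print(split_find(s, " "))       # ['a', 'b', '', 'c']
print(rsplit_rfind(s, " ", 1))  # ['a b ', 'c']
```

Both versions produce the same result; the difference is that the find-based one hands the "where is the next separator?" question to a single search routine, which is exactly the part memchr/memrchr can accelerate. With long segments, almost all the time is spent in that search, which matches the benchmark trend.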

Benchmarks

I wrote the following pyperf benchmark script to check whether the proposed change helps (as far as I know, there is no pyperformance benchmark covering this).

```python
import pyperf

SIZES = [100, 1_000, 10_000, 100_000, 1_000_000]
SEGMENT_LENGTHS = [2, 10, 50, 250, 500, 1000, 10_000, 25_000, 100_000, 250_000, 500_000, 999_999]

CHAR_SETS = {
    # name: (filler character, separator)
    "ascii": ("a", " "),
    "latin": ("é", " "),
    # we have two CJK cases because find_char has a different implementation for
    # (ord(sep) & 0xff) == 0 and (ord(sep) & 0xff) != 0
    "cjk": ("\u4e16", "\u3000"),  # CJK ideograph + ideographic space ((ord(sep) & 0xff) == 0)
    "cjk_nz": ("\u4e16", "\u3001"),  # CJK ideograph + ideographic comma ((ord(sep) & 0xff) != 0)
    # the emoji makes the string use 4 bytes per character; the separator is a plain space
    "emoji": ("\U0001f600", " "),
}


def make_string(char: str, sep: str, size: int, seg_len: int) -> str:
    # Repeat "<seg_len filler characters><sep>" and cut to at most `size` characters.
    segment = char * seg_len + sep
    repeats = max(1, size // len(segment))
    return (segment * repeats)[:size]


def bench_split_sep(s: str, sep: str):
    # The timed operation: a single str.split call with an explicit separator.
    s.split(sep)


def add_benchmarks(runner: pyperf.Runner):
    for charset_name, (char, sep) in CHAR_SETS.items():
        for size in SIZES:
            for seg_len in SEGMENT_LENGTHS:
                # skip segments longer than the whole string
                if seg_len > size:
                    continue

                s = make_string(char, sep, size, seg_len)
                name = f"split_{charset_name}_size{size}_seg{seg_len}"
                runner.bench_func(name, bench_split_sep, s, sep)


runner = pyperf.Runner()
add_benchmarks(runner)
```
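As a quick sanity check of the inputs, make_string (copied unchanged from the script) can be called directly. Two properties are worth knowing when reading the results: because `repeats` uses floor division, the generated string can be shorter than `size`, and when `seg_len == size` the final slice removes the only separator, so those cases time a split that finds no delimiter at all.

```python
def make_string(char: str, sep: str, size: int, seg_len: int) -> str:
    # copied from the benchmark script
    segment = char * seg_len + sep
    repeats = max(1, size // len(segment))
    return (segment * repeats)[:size]


s = make_string("a", " ", 10, 2)
print(repr(s), len(s))              # 'aa aa aa ' 9

edge = make_string("a", " ", 1000, 1000)
print(len(edge), edge.count(" "))   # 1000 0
```

For example, size1000_seg500 actually splits a 501-character string with one separator, which may explain why some seg500 timings are lower than the neighbouring seg250 and seg1000 ones.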

I ran the benchmarks on two computers (one running macOS, one running Linux), with --enable-optimizations --enable-lto. I don't have a Windows machine around, let alone one with a C compiler, but if needed I may be able to find one.
The optimisation flags do matter here. I initially ran the benchmarks on a build without LTO, and it gave slightly worse results in some cases (probably because find_char was not inlined, although I have no proof of that).

Overall, the benchmarks show that:

  • There is a performance gain, and a significant one.
  • The gain grows with segment length, which makes sense, given that what we optimise here is how fast we find the next delimiter.
    • With very short segments, the gain is small on Linux (at most about 10%), and on macOS there are regressions (up to about +23% with 2-character segments).
    • With long segments, the time can drop by more than 90%.
  • The gain is much lower, or nonexistent, when the searched character is "not well supported":
    • Splits of ASCII strings show significant improvements.
    • The same goes for Latin strings.
    • For CJK strings, it depends on the delimiter: with U+3001 (non-zero low byte) the improvement is comparable to ASCII/Latin, while with U+3000 (zero low byte) it is about 25-33% on Linux and roughly zero on macOS.
    • For emoji strings (split on a space), the improvement is large on Linux but roughly zero on macOS.
  • One caveat to all this is that the absolute times are already small today. However, when splitting many strings in a loop, the savings could quickly add up.
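The CJK results hinge on how 2-byte strings are searched. Here is a minimal model, assuming (as the comment in the benchmark script suggests) that for 2-byte strings find_char runs memchr on the separator's low byte only when that byte is non-zero, then confirms the whole code unit. `storage_width` follows the PEP 393 rule for how many bytes per character CPython uses; `find_char_lowbyte` is a hypothetical Python stand-in with bytes.find playing the role of memchr on little-endian data, not the actual C code (which may use other routines, e.g. for 4-byte strings).

```python
def storage_width(s: str) -> int:
    """Bytes per character CPython uses for `s` (PEP 393 rule)."""
    biggest = max(map(ord, s), default=0)
    if biggest <= 0xFF:
        return 1
    if biggest <= 0xFFFF:
        return 2
    return 4


def find_char_lowbyte(data: bytes, ch: int, width: int) -> int:
    """Model: memchr-like search for the low byte of `ch` over little-endian
    code units of `width` bytes, then confirm the full code unit.
    Returns a code-unit index or -1."""
    needle = ch & 0xFF
    if needle == 0:
        # A zero needle would hit the zero high bytes of many characters,
        # so fall back to checking every code unit in turn.
        for i in range(0, len(data), width):
            if int.from_bytes(data[i:i + width], "little") == ch:
                return i // width
        return -1
    pos = 0
    while True:
        cand = data.find(bytes([needle]), pos)  # stands in for memchr
        if cand == -1:
            return -1
        start = cand - cand % width  # align down to a code-unit boundary
        if int.from_bytes(data[start:start + width], "little") == ch:
            return start // width
        pos = start + width  # false positive: resume after this code unit


CHAR_SETS = {
    "ascii": ("a", " "),
    "latin": ("é", " "),
    "cjk": ("\u4e16", "\u3000"),
    "cjk_nz": ("\u4e16", "\u3001"),
    "emoji": ("\U0001f600", " "),
}
for name, (char, sep) in CHAR_SETS.items():
    print(name, storage_width(char + sep), hex(ord(sep) & 0xFF))
# ascii 1 0x20 / latin 1 0x20 / cjk 2 0x0 / cjk_nz 2 0x1 / emoji 4 0x20

for sep in ("\u3000", "\u3001"):
    data = ("\u4e16" * 3 + sep).encode("utf-16-le")  # same layout as 2-byte storage for BMP text
    print(hex(ord(sep)), find_char_lowbyte(data, ord(sep), 2))  # index 3 in both cases
```

Both separators are found, but only U+3001 takes the fast byte-search path in this model; U+3000 falls back to the unit-by-unit loop, which is consistent with the cjk rows improving much less than the cjk_nz rows.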

The raw results are attached here.

Details

Note: the Change column is the relative difference between the optimised and baseline times, (optimised - baseline) / baseline, so negative values mean the optimised build is faster.
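The Change column can be reproduced from the two timing columns. This small sketch (the helper names are just for illustration) checks two Linux rows and also converts the percentage into a speedup factor, since large negative percentages are easier to compare that way.

```python
def change_pct(optimised: float, baseline: float) -> float:
    """Relative change in time, in percent (negative = optimised is faster)."""
    return (optimised - baseline) / baseline * 100


def speedup(optimised: float, baseline: float) -> float:
    """How many times faster the optimised build is (baseline / optimised)."""
    return baseline / optimised


# split_ascii_size100_seg2 on Linux: 1.22 us vs 1.35 us
print(f"{change_pct(1.22, 1.35):+.1f}%  {speedup(1.22, 1.35):.2f}x")  # -9.6%  1.11x
# split_ascii_size10000_seg10000 on Linux: 255 ns vs 3.52 us (both in ns)
print(f"{change_pct(255, 3520):+.1f}%  {speedup(255, 3520):.1f}x")    # -92.8%  13.8x
```

A change of -90% therefore corresponds to a 10x speedup and -95% to 20x, so differences between large negative percentages are bigger than they look.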

Linux

Benchmark                           Optimised           Baseline            Change
----------------------------------------------------------------------------------
split_ascii_size100_seg2            1.22 us +- 0.02 us  1.35 us +- 0.03 us  -9.6%
split_ascii_size100_seg10           421 ns +- 6 ns      497 ns +- 4 ns      -15.3%
split_ascii_size100_seg50           212 ns +- 1 ns      242 ns +- 12 ns     -12.4%

split_ascii_size1000_seg2           11.1 us +- 0.2 us   12.3 us +- 0.2 us   -9.8%
split_ascii_size1000_seg10          3.30 us +- 0.05 us  4.08 us +- 0.05 us  -19.1%
split_ascii_size1000_seg50          768 ns +- 9 ns      1.22 us +- 0.03 us  -37.0%
split_ascii_size1000_seg250         289 ns +- 5 ns      611 ns +- 8 ns      -52.7%
split_ascii_size1000_seg500         245 ns +- 9 ns      429 ns +- 3 ns      -42.9%
split_ascii_size1000_seg1000        191 ns +- 5 ns      529 ns +- 17 ns     -63.9%

split_ascii_size10000_seg2          105 us +- 4 us      116 us +- 3 us      -9.5%
split_ascii_size10000_seg10         28.4 us +- 0.4 us   36.8 us +- 0.3 us   -22.8%
split_ascii_size10000_seg50         7.05 us +- 0.09 us  11.7 us +- 0.1 us   -39.7%
split_ascii_size10000_seg250        1.74 us +- 0.05 us  5.95 us +- 0.20 us  -70.8%
split_ascii_size10000_seg500        1.84 us +- 0.06 us  5.61 us +- 0.08 us  -67.2%
split_ascii_size10000_seg1000       1.27 us +- 0.01 us  4.43 us +- 0.04 us  -71.3%
split_ascii_size10000_seg10000      255 ns +- 4 ns      3.52 us +- 0.02 us  -92.8%

split_ascii_size100000_seg2         1.54 ms +- 0.03 ms  1.65 ms +- 0.03 ms  -6.7%
split_ascii_size100000_seg10        290 us +- 21 us     361 us +- 4 us      -19.7%
split_ascii_size100000_seg50        65.8 us +- 1.1 us   110 us +- 1 us      -40.2%
split_ascii_size100000_seg250       18.0 us +- 0.2 us   57.7 us +- 1.8 us   -68.8%
split_ascii_size100000_seg500       19.3 us +- 0.3 us   56.4 us +- 0.7 us   -65.8%
split_ascii_size100000_seg1000      12.7 us +- 0.4 us   47.3 us +- 0.3 us   -73.2%
split_ascii_size100000_seg10000     4.26 us +- 0.24 us  33.6 us +- 1.0 us   -87.3%
split_ascii_size100000_seg25000     3.51 us +- 0.18 us  28.2 us +- 1.2 us   -87.6%
split_ascii_size100000_seg100000    1.35 us +- 0.02 us  33.2 us +- 0.9 us   -95.9%

split_ascii_size1000000_seg2        18.8 ms +- 0.5 ms   20.7 ms +- 0.7 ms   -9.2%
split_ascii_size1000000_seg10       5.40 ms +- 0.08 ms  6.12 ms +- 0.06 ms  -11.8%
split_ascii_size1000000_seg50       1.19 ms +- 0.04 ms  1.59 ms +- 0.01 ms  -25.2%
split_ascii_size1000000_seg250      677 us +- 10 us     1.07 ms +- 0.01 ms  -36.7%
split_ascii_size1000000_seg500      200 us +- 2 us      554 us +- 2 us      -63.9%
split_ascii_size1000000_seg1000     143 us +- 2 us      459 us +- 9 us      -68.8%
split_ascii_size1000000_seg10000    71.9 us +- 2.6 us   375 us +- 10 us     -80.8%
split_ascii_size1000000_seg25000    77.2 us +- 2.2 us   369 us +- 4 us      -79.1%
split_ascii_size1000000_seg100000   61.6 us +- 1.6 us   346 us +- 13 us     -82.2%
split_ascii_size1000000_seg250000   45.0 us +- 1.0 us   284 us +- 5 us      -84.2%
split_ascii_size1000000_seg500000   27.5 us +- 0.8 us   189 us +- 5 us      -85.4%
split_ascii_size1000000_seg999999   73.7 us +- 3.4 us   387 us +- 2 us      -81.0%

split_latin_size100_seg2            1.53 us +- 0.07 us  1.62 us +- 0.01 us  -5.6%
split_latin_size100_seg10           499 ns +- 13 ns     570 ns +- 12 ns     -12.5%
split_latin_size100_seg50           217 ns +- 3 ns      255 ns +- 4 ns      -14.9%

split_latin_size1000_seg2           14.4 us +- 0.2 us   15.2 us +- 0.2 us   -5.3%
split_latin_size1000_seg10          4.13 us +- 0.11 us  4.81 us +- 0.10 us  -14.1%
split_latin_size1000_seg50          940 ns +- 12 ns     1.62 us +- 0.10 us  -42.0%
split_latin_size1000_seg250         307 ns +- 2 ns      835 ns +- 12 ns     -63.2%
split_latin_size1000_seg500         247 ns +- 1 ns      584 ns +- 12 ns     -57.7%
split_latin_size1000_seg1000        190 ns +- 4 ns      844 ns +- 7 ns      -77.5%

split_latin_size10000_seg2          138 us +- 3 us      145 us +- 4 us      -4.8%
split_latin_size10000_seg10         37.6 us +- 0.7 us   44.0 us +- 0.5 us   -14.5%
split_latin_size10000_seg50         9.51 us +- 0.59 us  15.4 us +- 0.5 us   -38.2%
split_latin_size10000_seg250        2.07 us +- 0.07 us  8.76 us +- 0.11 us  -76.4%
split_latin_size10000_seg500        1.92 us +- 0.02 us  8.46 us +- 0.13 us  -77.3%
split_latin_size10000_seg1000       1.38 us +- 0.03 us  7.30 us +- 0.10 us  -81.1%
split_latin_size10000_seg10000      255 ns +- 2 ns      6.90 us +- 0.32 us  -96.3%

split_latin_size100000_seg2         2.26 ms +- 0.21 ms  2.26 ms +- 0.27 ms  ~same
split_latin_size100000_seg10        371 us +- 5 us      435 us +- 11 us     -14.7%
split_latin_size100000_seg50        86.8 us +- 1.2 us   146 us +- 5 us      -40.5%
split_latin_size100000_seg250       20.9 us +- 0.4 us   87.0 us +- 0.7 us   -76.0%
split_latin_size100000_seg500       21.4 us +- 0.6 us   87.5 us +- 1.0 us   -75.5%
split_latin_size100000_seg1000      13.7 us +- 0.5 us   78.8 us +- 0.4 us   -82.6%
split_latin_size100000_seg10000     4.27 us +- 0.19 us  63.0 us +- 1.1 us   -93.2%
split_latin_size100000_seg25000     3.60 us +- 0.23 us  52.6 us +- 0.9 us   -93.2%
split_latin_size100000_seg100000    1.34 us +- 0.01 us  66.3 us +- 2.6 us   -98.0%

split_latin_size1000000_seg2        26.0 ms +- 0.8 ms   26.5 ms +- 0.7 ms   -1.9%
split_latin_size1000000_seg10       7.18 ms +- 0.23 ms  8.05 ms +- 0.62 ms  -10.8%
split_latin_size1000000_seg50       1.88 ms +- 0.01 ms  2.47 ms +- 0.09 ms  -23.9%
split_latin_size1000000_seg250      709 us +- 8 us      1.37 ms +- 0.01 ms  -48.2%
split_latin_size1000000_seg500      217 us +- 5 us      869 us +- 11 us     -75.0%
split_latin_size1000000_seg1000     149 us +- 2 us      781 us +- 5 us      -80.9%
split_latin_size1000000_seg10000    74.9 us +- 3.7 us   700 us +- 7 us      -89.3%
split_latin_size1000000_seg25000    76.7 us +- 2.4 us   690 us +- 3 us      -88.9%
split_latin_size1000000_seg100000   62.1 us +- 2.2 us   639 us +- 14 us     -90.3%
split_latin_size1000000_seg250000   45.3 us +- 1.3 us   529 us +- 3 us      -91.4%
split_latin_size1000000_seg500000   27.5 us +- 0.9 us   357 us +- 11 us     -92.3%
split_latin_size1000000_seg999999   74.8 us +- 3.2 us   757 us +- 56 us     -90.1%

split_cjk_size100_seg2              1.46 us +- 0.02 us  1.54 us +- 0.04 us  -5.2%
split_cjk_size100_seg10             519 ns +- 5 ns      605 ns +- 4 ns      -14.2%
split_cjk_size100_seg50             248 ns +- 4 ns      278 ns +- 1 ns      -10.8%

split_cjk_size1000_seg2             13.8 us +- 0.5 us   14.2 us +- 0.1 us   -2.8%
split_cjk_size1000_seg10            4.53 us +- 0.26 us  5.03 us +- 0.12 us  -9.9%
split_cjk_size1000_seg50            1.52 us +- 0.04 us  1.99 us +- 0.05 us  -23.6%
split_cjk_size1000_seg250           880 ns +- 19 ns     1.17 us +- 0.03 us  -24.8%
split_cjk_size1000_seg500           630 ns +- 5 ns      807 ns +- 9 ns      -21.9%
split_cjk_size1000_seg1000          842 ns +- 18 ns     1.18 us +- 0.01 us  -28.6%

split_cjk_size10000_seg2            130 us +- 1 us      136 us +- 8 us      -4.4%
split_cjk_size10000_seg10           41.5 us +- 0.3 us   46.1 us +- 1.5 us   -10.0%
split_cjk_size10000_seg50           14.6 us +- 0.5 us   19.4 us +- 0.6 us   -24.7%
split_cjk_size10000_seg250          10.6 us +- 0.2 us   14.1 us +- 0.1 us   -24.8%
split_cjk_size10000_seg500          9.07 us +- 0.06 us  12.3 us +- 0.1 us   -26.3%
split_cjk_size10000_seg1000         7.44 us +- 0.15 us  10.5 us +- 0.2 us   -29.1%
split_cjk_size10000_seg10000        6.80 us +- 0.12 us  10.1 us +- 0.0 us   -32.7%

split_cjk_size100000_seg2           2.18 ms +- 0.18 ms  2.21 ms +- 0.20 ms  -1.4%
split_cjk_size100000_seg10          410 us +- 17 us     443 us +- 4 us      -7.4%
split_cjk_size100000_seg50          138 us +- 4 us      188 us +- 12 us     -26.6%
split_cjk_size100000_seg250         105 us +- 1 us      145 us +- 6 us      -27.6%
split_cjk_size100000_seg500         92.8 us +- 0.7 us   127 us +- 1 us      -26.9%
split_cjk_size100000_seg1000        80.5 us +- 1.7 us   113 us +- 1 us      -28.8%
split_cjk_size100000_seg10000       66.5 us +- 0.3 us   96.1 us +- 0.6 us   -30.8%
split_cjk_size100000_seg25000       55.8 us +- 1.1 us   80.6 us +- 1.6 us   -30.8%
split_cjk_size100000_seg100000      65.7 us +- 0.4 us   98.4 us +- 0.5 us   -33.2%

split_cjk_size1000000_seg2          25.7 ms +- 1.8 ms   25.3 ms +- 0.9 ms   +1.6%
split_cjk_size1000000_seg10         7.40 ms +- 0.30 ms  7.86 ms +- 0.31 ms  -5.9%
split_cjk_size1000000_seg50         2.76 ms +- 0.25 ms  3.17 ms +- 0.22 ms  -12.9%
split_cjk_size1000000_seg250        1.05 ms +- 0.01 ms  1.42 ms +- 0.06 ms  -26.1%
split_cjk_size1000000_seg500        949 us +- 7 us      1.32 ms +- 0.07 ms  -28.1%
split_cjk_size1000000_seg1000       804 us +- 14 us     1.13 ms +- 0.01 ms  -28.8%
split_cjk_size1000000_seg10000      748 us +- 15 us     1.07 ms +- 0.01 ms  -30.1%
split_cjk_size1000000_seg25000      741 us +- 4 us      1.06 ms +- 0.00 ms  -30.1%
split_cjk_size1000000_seg100000     682 us +- 7 us      976 us +- 3 us      -30.1%
split_cjk_size1000000_seg250000     572 us +- 22 us     812 us +- 4 us      -29.6%
split_cjk_size1000000_seg500000     390 us +- 21 us     550 us +- 3 us      -29.1%
split_cjk_size1000000_seg999999     815 us +- 4 us      1.14 ms +- 0.01 ms  -28.5%

split_cjk_nz_size100_seg2           1.49 us +- 0.02 us  1.54 us +- 0.04 us  -3.2%
split_cjk_nz_size100_seg10          499 ns +- 8 ns      609 ns +- 11 ns     -18.1%
split_cjk_nz_size100_seg50          223 ns +- 5 ns      282 ns +- 10 ns     -20.9%

split_cjk_nz_size1000_seg2          14.0 us +- 0.1 us   14.3 us +- 0.3 us   -2.1%
split_cjk_nz_size1000_seg10         3.95 us +- 0.05 us  5.01 us +- 0.04 us  -21.2%
split_cjk_nz_size1000_seg50         951 ns +- 17 ns     1.98 us +- 0.08 us  -52.0%
split_cjk_nz_size1000_seg250        376 ns +- 10 ns     1.16 us +- 0.01 us  -67.6%
split_cjk_nz_size1000_seg500        312 ns +- 11 ns     814 ns +- 33 ns     -61.7%
split_cjk_nz_size1000_seg1000       203 ns +- 12 ns     1.18 us +- 0.03 us  -82.8%

split_cjk_nz_size10000_seg2         132 us +- 3 us      134 us +- 1 us      -1.5%
split_cjk_nz_size10000_seg10        36.0 us +- 1.2 us   45.4 us +- 0.5 us   -20.7%
split_cjk_nz_size10000_seg50        9.05 us +- 0.07 us  19.2 us +- 0.4 us   -52.9%
split_cjk_nz_size10000_seg250       3.93 us +- 0.03 us  14.6 us +- 1.1 us   -73.1%
split_cjk_nz_size10000_seg500       2.70 us +- 0.02 us  12.3 us +- 0.1 us   -78.0%
split_cjk_nz_size10000_seg1000      1.65 us +- 0.02 us  10.4 us +- 0.1 us   -84.1%
split_cjk_nz_size10000_seg10000     344 ns +- 5 ns      10.1 us +- 0.1 us   -96.6%

split_cjk_nz_size100000_seg2        2.18 ms +- 0.20 ms  2.18 ms +- 0.21 ms  ~same
split_cjk_nz_size100000_seg10       347 us +- 5 us      444 us +- 14 us     -21.8%
split_cjk_nz_size100000_seg50       87.6 us +- 5.6 us   186 us +- 8 us      -52.9%
split_cjk_nz_size100000_seg250      41.0 us +- 1.0 us   141 us +- 1 us      -70.9%
split_cjk_nz_size100000_seg500      29.0 us +- 0.8 us   127 us +- 3 us      -77.2%
split_cjk_nz_size100000_seg1000     17.7 us +- 0.3 us   113 us +- 1 us      -84.3%
split_cjk_nz_size100000_seg10000    8.53 us +- 0.52 us  99.7 us +- 5.7 us   -91.4%
split_cjk_nz_size100000_seg25000    7.04 us +- 0.46 us  80.5 us +- 1.3 us   -91.3%
split_cjk_nz_size100000_seg100000   2.50 us +- 0.05 us  98.4 us +- 0.3 us   -97.5%

split_cjk_nz_size1000000_seg2       25.1 ms +- 0.9 ms   25.5 ms +- 0.6 ms   -1.6%
split_cjk_nz_size1000000_seg10      7.01 ms +- 0.17 ms  7.82 ms +- 0.21 ms  -10.4%
split_cjk_nz_size1000000_seg50      2.26 ms +- 0.16 ms  3.24 ms +- 0.16 ms  -30.2%
split_cjk_nz_size1000000_seg250     440 us +- 27 us     1.41 ms +- 0.02 ms  -68.8%
split_cjk_nz_size1000000_seg500     353 us +- 10 us     1.29 ms +- 0.01 ms  -72.6%
split_cjk_nz_size1000000_seg1000    235 us +- 5 us      1.13 ms +- 0.01 ms  -79.2%
split_cjk_nz_size1000000_seg10000   193 us +- 3 us      1.07 ms +- 0.00 ms  -82.0%
split_cjk_nz_size1000000_seg25000   197 us +- 3 us      1.07 ms +- 0.01 ms  -81.6%
split_cjk_nz_size1000000_seg100000  162 us +- 3 us      979 us +- 8 us      -83.5%
split_cjk_nz_size1000000_seg250000  122 us +- 3 us      813 us +- 4 us      -85.0%
split_cjk_nz_size1000000_seg500000  72.9 us +- 2.8 us   552 us +- 6 us      -86.8%
split_cjk_nz_size1000000_seg999999  234 us +- 9 us      1.14 ms +- 0.00 ms  -79.5%

split_emoji_size100_seg2            1.69 us +- 0.06 us  1.83 us +- 0.02 us  -7.7%
split_emoji_size100_seg10           615 ns +- 38 ns     674 ns +- 8 ns      -8.8%
split_emoji_size100_seg50           249 ns +- 9 ns      300 ns +- 5 ns      -17.0%

split_emoji_size1000_seg2           15.6 us +- 0.7 us   17.1 us +- 0.5 us   -8.8%
split_emoji_size1000_seg10          4.92 us +- 0.12 us  5.67 us +- 0.10 us  -13.2%
split_emoji_size1000_seg50          1.22 us +- 0.01 us  2.10 us +- 0.09 us  -41.9%
split_emoji_size1000_seg250         620 ns +- 11 ns     1.39 us +- 0.04 us  -55.4%
split_emoji_size1000_seg500         331 ns +- 7 ns      816 ns +- 5 ns      -59.4%
split_emoji_size1000_seg1000        231 ns +- 2 ns      1.19 us +- 0.00 us  -80.6%

split_emoji_size10000_seg2          148 us +- 3 us      164 us +- 1 us      -9.8%
split_emoji_size10000_seg10         45.9 us +- 0.5 us   52.1 us +- 0.3 us   -11.9%
split_emoji_size10000_seg50         11.9 us +- 0.1 us   20.2 us +- 0.2 us   -41.1%
split_emoji_size10000_seg250        5.82 us +- 0.15 us  15.7 us +- 0.1 us   -62.9%
split_emoji_size10000_seg500        3.45 us +- 0.05 us  12.7 us +- 0.1 us   -72.8%
split_emoji_size10000_seg1000       2.21 us +- 0.03 us  10.7 us +- 0.1 us   -79.3%
split_emoji_size10000_seg10000      669 ns +- 12 ns     10.6 us +- 0.8 us   -93.7%

split_emoji_size100000_seg2         2.53 ms +- 0.02 ms  2.62 ms +- 0.12 ms  -3.4%
split_emoji_size100000_seg10        457 us +- 4 us      514 us +- 6 us      -11.1%
split_emoji_size100000_seg50        115 us +- 1 us      194 us +- 1 us      -40.7%
split_emoji_size100000_seg250       64.5 us +- 1.8 us   159 us +- 3 us      -59.4%
split_emoji_size100000_seg500       35.4 us +- 1.2 us   128 us +- 0 us      -72.3%
split_emoji_size100000_seg1000      26.7 us +- 1.2 us   118 us +- 1 us      -77.4%
split_emoji_size100000_seg10000     19.1 us +- 1.0 us   105 us +- 2 us      -81.8%
split_emoji_size100000_seg25000     15.0 us +- 1.1 us   87.3 us +- 1.4 us   -82.8%
split_emoji_size100000_seg100000    4.63 us +- 0.04 us  98.8 us +- 2.1 us   -95.3%

split_emoji_size1000000_seg2        31.4 ms +- 0.7 ms   32.7 ms +- 0.8 ms   -4.0%
split_emoji_size1000000_seg10       9.30 ms +- 0.18 ms  10.1 ms +- 0.4 ms   -7.9%
split_emoji_size1000000_seg50       3.68 ms +- 0.09 ms  4.47 ms +- 0.04 ms  -17.7%
split_emoji_size1000000_seg250      748 us +- 12 us     1.62 ms +- 0.03 ms  -53.8%
split_emoji_size1000000_seg500      546 us +- 11 us     1.36 ms +- 0.01 ms  -59.9%
split_emoji_size1000000_seg1000     412 us +- 6 us      1.22 ms +- 0.01 ms  -66.2%
split_emoji_size1000000_seg10000    429 us +- 4 us      1.19 ms +- 0.01 ms  -63.9%
split_emoji_size1000000_seg25000    382 us +- 19 us     1.17 ms +- 0.01 ms  -67.4%
split_emoji_size1000000_seg100000   338 us +- 4 us      1.07 ms +- 0.01 ms  -68.4%
split_emoji_size1000000_seg250000   314 us +- 4 us      924 us +- 6 us      -66.0%
split_emoji_size1000000_seg500000   237 us +- 2 us      652 us +- 6 us      -63.7%
split_emoji_size1000000_seg999999   869 us +- 6 us      1.77 ms +- 0.14 ms  -50.9%

macOS

Benchmark                           Optimised           Baseline            Change
----------------------------------------------------------------------------------
split_ascii_size100_seg2            435 ns +- 28 ns     403 ns +- 10 ns     +7.9%
split_ascii_size100_seg10           145 ns +- 3 ns      150 ns +- 8 ns      -3.3%
split_ascii_size100_seg50           59.8 ns +- 3.5 ns   70.2 ns +- 0.5 ns   -14.8%

split_ascii_size1000_seg2           4.50 us +- 0.15 us  3.86 us +- 0.05 us  +16.6%
split_ascii_size1000_seg10          1.15 us +- 0.01 us  1.35 us +- 0.02 us  -14.8%
split_ascii_size1000_seg50          291 ns +- 6 ns      573 ns +- 34 ns     -49.2%
split_ascii_size1000_seg250         92.6 ns +- 0.6 ns   321 ns +- 10 ns     -71.2%
split_ascii_size1000_seg500         94.3 ns +- 1.5 ns   233 ns +- 5 ns      -59.5%
split_ascii_size1000_seg1000        64.4 ns +- 0.6 ns   316 ns +- 3 ns      -79.6%

split_ascii_size10000_seg2          45.6 us +- 0.3 us   37.0 us +- 0.8 us   +23.2%
split_ascii_size10000_seg10         11.4 us +- 0.1 us   12.1 us +- 0.2 us   -5.8%
split_ascii_size10000_seg50         2.86 us +- 0.03 us  5.79 us +- 0.12 us  -50.6%
split_ascii_size10000_seg250        684 ns +- 9 ns      3.61 us +- 0.05 us  -81.1%
split_ascii_size10000_seg500        846 ns +- 16 ns     3.44 us +- 0.30 us  -75.4%
split_ascii_size10000_seg1000       539 ns +- 11 ns     2.80 us +- 0.02 us  -80.8%
split_ascii_size10000_seg10000      231 ns +- 2 ns      2.63 us +- 0.02 us  -91.2%

split_ascii_size100000_seg2         495 us +- 5 us      413 us +- 6 us      +19.9%
split_ascii_size100000_seg10        115 us +- 3 us      129 us +- 1 us      -10.9%
split_ascii_size100000_seg50        27.8 us +- 0.3 us   54.6 us +- 1.5 us   -49.1%
split_ascii_size100000_seg250       8.16 us +- 0.07 us  37.4 us +- 0.8 us   -78.2%
split_ascii_size100000_seg500       10.9 us +- 0.2 us   37.3 us +- 0.9 us   -70.8%
split_ascii_size100000_seg1000      7.30 us +- 0.18 us  32.2 us +- 0.1 us   -77.3%
split_ascii_size100000_seg10000     2.99 us +- 0.04 us  24.8 us +- 0.4 us   -87.9%
split_ascii_size100000_seg25000     2.14 us +- 0.02 us  20.3 us +- 0.1 us   -89.5%
split_ascii_size100000_seg100000    1.69 us +- 0.01 us  25.8 us +- 0.2 us   -93.4%

split_ascii_size1000000_seg2        5.26 ms +- 0.19 ms  4.47 ms +- 0.05 ms  +17.7%
split_ascii_size1000000_seg10       1.43 ms +- 0.01 ms  1.56 ms +- 0.06 ms  -8.3%
split_ascii_size1000000_seg50       329 us +- 3 us      628 us +- 14 us     -47.6%
split_ascii_size1000000_seg250      77.4 us +- 0.5 us   373 us +- 6 us      -79.2%
split_ascii_size1000000_seg500      108 us +- 1 us      380 us +- 7 us      -71.6%
split_ascii_size1000000_seg1000     79.8 us +- 0.7 us   337 us +- 5 us      -76.3%
split_ascii_size1000000_seg10000    34.3 us +- 0.3 us   274 us +- 2 us      -87.5%
split_ascii_size1000000_seg25000    27.6 us +- 0.3 us   264 us +- 1 us      -89.5%
split_ascii_size1000000_seg100000   28.6 us +- 0.3 us   248 us +- 2 us      -88.5%
split_ascii_size1000000_seg250000   24.0 us +- 0.3 us   206 us +- 2 us      -88.3%
split_ascii_size1000000_seg500000   15.3 us +- 0.2 us   136 us +- 3 us      -88.8%
split_ascii_size1000000_seg999999   29.8 us +- 0.5 us   271 us +- 2 us      -89.0%

split_latin_size100_seg2            484 ns +- 6 ns      444 ns +- 8 ns      +9.0%
split_latin_size100_seg10           155 ns +- 6 ns      159 ns +- 2 ns      -2.5%
split_latin_size100_seg50           60.1 ns +- 0.6 ns   70.6 ns +- 1.4 ns   -14.9%

split_latin_size1000_seg2           5.09 us +- 0.06 us  4.58 us +- 0.13 us  +11.1%
split_latin_size1000_seg10          1.42 us +- 0.01 us  1.49 us +- 0.01 us  -4.7%
split_latin_size1000_seg50          336 ns +- 6 ns      573 ns +- 13 ns     -41.4%
split_latin_size1000_seg250         102 ns +- 1 ns      336 ns +- 7 ns      -69.6%
split_latin_size1000_seg500         95.3 ns +- 1.2 ns   232 ns +- 3 ns      -58.9%
split_latin_size1000_seg1000        66.0 ns +- 0.7 ns   327 ns +- 2 ns      -79.8%

split_latin_size10000_seg2          51.2 us +- 0.4 us   47.4 us +- 0.5 us   +8.0%
split_latin_size10000_seg10         13.8 us +- 0.6 us   14.3 us +- 0.5 us   -3.5%
split_latin_size10000_seg50         3.54 us +- 0.06 us  5.94 us +- 0.07 us  -40.4%
split_latin_size10000_seg250        844 ns +- 10 ns     3.80 us +- 0.03 us  -77.8%
split_latin_size10000_seg500        919 ns +- 10 ns     3.50 us +- 0.02 us  -73.7%
split_latin_size10000_seg1000       565 ns +- 9 ns      2.88 us +- 0.03 us  -80.4%
split_latin_size10000_seg10000      234 ns +- 8 ns      2.79 us +- 0.04 us  -91.6%

split_latin_size100000_seg2         574 us +- 5 us      541 us +- 5 us      +6.1%
split_latin_size100000_seg10        136 us +- 1 us      145 us +- 1 us      -6.2%
split_latin_size100000_seg50        35.3 us +- 0.3 us   59.9 us +- 1.9 us   -41.1%
split_latin_size100000_seg250       9.38 us +- 0.19 us  39.1 us +- 0.2 us   -76.0%
split_latin_size100000_seg500       11.7 us +- 0.1 us   38.5 us +- 0.2 us   -69.6%
split_latin_size100000_seg1000      7.48 us +- 0.04 us  32.9 us +- 0.2 us   -77.3%
split_latin_size100000_seg10000     3.01 us +- 0.03 us  25.3 us +- 0.1 us   -88.1%
split_latin_size100000_seg25000     2.15 us +- 0.02 us  20.8 us +- 0.4 us   -89.7%
split_latin_size100000_seg100000    1.69 us +- 0.01 us  27.5 us +- 0.4 us   -93.9%

split_latin_size1000000_seg2        6.49 ms +- 0.07 ms  5.99 ms +- 0.07 ms  +8.3%
split_latin_size1000000_seg10       1.75 ms +- 0.02 ms  1.85 ms +- 0.02 ms  -5.4%
split_latin_size1000000_seg50       413 us +- 3 us      666 us +- 20 us     -38.0%
split_latin_size1000000_seg250      90.0 us +- 2.0 us   390 us +- 9 us      -76.9%
split_latin_size1000000_seg500      118 us +- 1 us      390 us +- 2 us      -69.7%
split_latin_size1000000_seg1000     83.5 us +- 0.6 us   343 us +- 6 us      -75.7%
split_latin_size1000000_seg10000    34.7 us +- 0.3 us   281 us +- 2 us      -87.7%
split_latin_size1000000_seg25000    27.8 us +- 0.2 us   270 us +- 2 us      -89.7%
split_latin_size1000000_seg100000   28.8 us +- 0.3 us   251 us +- 2 us      -88.5%
split_latin_size1000000_seg250000   24.0 us +- 0.3 us   211 us +- 1 us      -88.6%
split_latin_size1000000_seg500000   15.3 us +- 0.3 us   141 us +- 6 us      -89.1%
split_latin_size1000000_seg999999   29.7 us +- 0.4 us   279 us +- 2 us      -89.4%

split_cjk_size100_seg2              577 ns +- 15 ns     549 ns +- 14 ns     +5.1%
split_cjk_size100_seg10             212 ns +- 3 ns      195 ns +- 3 ns      +8.7%
split_cjk_size100_seg50             74.5 ns +- 0.4 ns   73.5 ns +- 2.7 ns   +1.4%

split_cjk_size1000_seg2             5.62 us +- 0.17 us  5.29 us +- 0.07 us  +6.2%
split_cjk_size1000_seg10            1.87 us +- 0.06 us  1.81 us +- 0.05 us  +3.3%
split_cjk_size1000_seg50            590 ns +- 6 ns      592 ns +- 11 ns     ~same
split_cjk_size1000_seg250           429 ns +- 13 ns     413 ns +- 10 ns     +3.9%
split_cjk_size1000_seg500           249 ns +- 4 ns      260 ns +- 4 ns      -4.2%
split_cjk_size1000_seg1000          322 ns +- 9 ns      330 ns +- 2 ns      -2.4%

split_cjk_size10000_seg2            54.0 us +- 0.5 us   51.6 us +- 0.6 us   +4.7%
split_cjk_size10000_seg10           17.5 us +- 0.2 us   16.7 us +- 0.2 us   +4.8%
split_cjk_size10000_seg50           6.03 us +- 0.06 us  6.07 us +- 0.10 us  ~same
split_cjk_size10000_seg250          4.77 us +- 0.10 us  4.86 us +- 0.04 us  -1.9%
split_cjk_size10000_seg500          3.65 us +- 0.03 us  3.73 us +- 0.04 us  -2.1%
split_cjk_size10000_seg1000         3.00 us +- 0.02 us  3.06 us +- 0.03 us  -2.0%
split_cjk_size10000_seg10000        2.63 us +- 0.02 us  2.80 us +- 0.07 us  -6.1%

split_cjk_size100000_seg2           587 us +- 17 us     568 us +- 7 us      +3.3%
split_cjk_size100000_seg10          178 us +- 3 us      168 us +- 3 us      +6.0%
split_cjk_size100000_seg50          59.2 us +- 0.6 us   59.3 us +- 0.6 us   ~same
split_cjk_size100000_seg250         51.8 us +- 0.4 us   52.7 us +- 1.2 us   -1.7%
split_cjk_size100000_seg500         41.4 us +- 0.4 us   41.9 us +- 0.3 us   -1.2%
split_cjk_size100000_seg1000        36.5 us +- 0.9 us   37.0 us +- 0.2 us   -1.4%
split_cjk_size100000_seg10000       25.7 us +- 0.2 us   26.2 us +- 0.2 us   -1.9%
split_cjk_size100000_seg25000       22.6 us +- 0.2 us   23.1 us +- 0.4 us   -2.2%
split_cjk_size100000_seg100000      25.8 us +- 0.1 us   27.8 us +- 0.5 us   -7.2%

split_cjk_size1000000_seg2          6.57 ms +- 0.10 ms  6.38 ms +- 0.11 ms  +3.0%
split_cjk_size1000000_seg10         2.18 ms +- 0.04 ms  2.11 ms +- 0.05 ms  +3.3%
split_cjk_size1000000_seg50         708 us +- 6 us      711 us +- 10 us     ~same
split_cjk_size1000000_seg250        514 us +- 10 us     520 us +- 2 us      -1.2%
split_cjk_size1000000_seg500        414 us +- 2 us      420 us +- 3 us      -1.4%
split_cjk_size1000000_seg1000       374 us +- 5 us      381 us +- 4 us      -1.8%
split_cjk_size1000000_seg10000      285 us +- 2 us      290 us +- 2 us      -1.7%
split_cjk_size1000000_seg25000      294 us +- 2 us      302 us +- 8 us      -2.6%
split_cjk_size1000000_seg100000     262 us +- 2 us      268 us +- 2 us      -2.2%
split_cjk_size1000000_seg250000     216 us +- 8 us      220 us +- 1 us      -1.8%
split_cjk_size1000000_seg500000     143 us +- 1 us      145 us +- 1 us      -1.4%
split_cjk_size1000000_seg999999     284 us +- 2 us      289 us +- 2 us      -1.7%

split_cjk_nz_size100_seg2           597 ns +- 7 ns      552 ns +- 12 ns     +8.2%
split_cjk_nz_size100_seg10          197 ns +- 3 ns      196 ns +- 3 ns      ~same
split_cjk_nz_size100_seg50          64.8 ns +- 1.8 ns   88.6 ns +- 17.2 ns  -26.9%

split_cjk_nz_size1000_seg2          6.41 us +- 0.08 us  5.34 us +- 0.12 us  +20.0%
split_cjk_nz_size1000_seg10         1.86 us +- 0.02 us  1.80 us +- 0.03 us  +3.3%
split_cjk_nz_size1000_seg50         397 ns +- 4 ns      593 ns +- 7 ns      -33.1%
split_cjk_nz_size1000_seg250        212 ns +- 9 ns      413 ns +- 11 ns     -48.7%
split_cjk_nz_size1000_seg500        107 ns +- 1 ns      260 ns +- 2 ns      -58.8%
split_cjk_nz_size1000_seg1000       94.0 ns +- 0.9 ns   329 ns +- 2 ns      -71.4%

split_cjk_nz_size10000_seg2         63.9 us +- 0.7 us   51.8 us +- 0.7 us   +23.4%
split_cjk_nz_size10000_seg10        18.0 us +- 0.3 us   16.7 us +- 0.3 us   +7.8%
split_cjk_nz_size10000_seg50        4.15 us +- 0.08 us  6.04 us +- 0.05 us  -31.3%
split_cjk_nz_size10000_seg250       2.08 us +- 0.03 us  4.84 us +- 0.03 us  -57.0%
split_cjk_nz_size10000_seg500       1.32 us +- 0.03 us  3.72 us +- 0.02 us  -64.5%
split_cjk_nz_size10000_seg1000      922 ns +- 9 ns      3.07 us +- 0.02 us  -70.0%
split_cjk_nz_size10000_seg10000     393 ns +- 3 ns      2.80 us +- 0.02 us  -86.0%

split_cjk_nz_size100000_seg2        697 us +- 19 us     567 us +- 14 us     +22.9%
split_cjk_nz_size100000_seg10       178 us +- 2 us      168 us +- 2 us      +6.0%
split_cjk_nz_size100000_seg50       40.6 us +- 0.4 us   59.2 us +- 0.7 us   -31.4%
split_cjk_nz_size100000_seg250      25.2 us +- 0.2 us   52.6 us +- 0.3 us   -52.1%
split_cjk_nz_size100000_seg500      16.9 us +- 0.1 us   42.0 us +- 0.9 us   -59.8%
split_cjk_nz_size100000_seg1000     13.5 us +- 0.2 us   37.0 us +- 0.2 us   -63.5%
split_cjk_nz_size100000_seg10000    5.35 us +- 0.06 us  26.1 us +- 0.2 us   -79.5%
split_cjk_nz_size100000_seg25000    5.75 us +- 0.07 us  23.1 us +- 0.2 us   -75.1%
split_cjk_nz_size100000_seg100000   3.39 us +- 0.08 us  27.6 us +- 0.4 us   -87.7%

split_cjk_nz_size1000000_seg2       7.77 ms +- 0.18 ms  6.43 ms +- 0.14 ms  +20.8%
split_cjk_nz_size1000000_seg10      2.18 ms +- 0.03 ms  2.13 ms +- 0.06 ms  +2.3%
split_cjk_nz_size1000000_seg50      527 us +- 5 us      711 us +- 7 us      -25.9%
split_cjk_nz_size1000000_seg250     244 us +- 2 us      521 us +- 2 us      -53.2%
split_cjk_nz_size1000000_seg500     167 us +- 1 us      420 us +- 2 us      -60.2%
split_cjk_nz_size1000000_seg1000    141 us +- 2 us      381 us +- 3 us      -63.0%
split_cjk_nz_size1000000_seg10000   61.0 us +- 0.6 us   290 us +- 2 us      -79.0%
split_cjk_nz_size1000000_seg25000   74.2 us +- 0.6 us   301 us +- 2 us      -75.3%
split_cjk_nz_size1000000_seg100000  59.0 us +- 1.2 us   268 us +- 2 us      -78.0%
split_cjk_nz_size1000000_seg250000  46.0 us +- 0.5 us   218 us +- 1 us      -78.9%
split_cjk_nz_size1000000_seg500000  29.8 us +- 0.3 us   146 us +- 1 us      -79.6%
split_cjk_nz_size1000000_seg999999  58.8 us +- 0.4 us   289 us +- 2 us      -79.7%

split_emoji_size100_seg2            591 ns +- 13 ns     560 ns +- 6 ns      +5.5%
split_emoji_size100_seg10           224 ns +- 3 ns      211 ns +- 6 ns      +6.2%
split_emoji_size100_seg50           83.5 ns +- 1.1 ns   80.3 ns +- 1.3 ns   +4.0%

split_emoji_size1000_seg2           5.98 us +- 0.11 us  5.49 us +- 0.06 us  +8.9%
split_emoji_size1000_seg10          2.02 us +- 0.04 us  1.93 us +- 0.02 us  +4.7%
split_emoji_size1000_seg50          649 ns +- 5 ns      647 ns +- 7 ns      ~same
split_emoji_size1000_seg250         423 ns +- 3 ns      422 ns +- 3 ns      ~same
split_emoji_size1000_seg500         260 ns +- 2 ns      269 ns +- 2 ns      -3.3%
split_emoji_size1000_seg1000        328 ns +- 12 ns     333 ns +- 2 ns      -1.5%

split_emoji_size10000_seg2          58.6 us +- 0.8 us   54.8 us +- 0.5 us   +6.9%
split_emoji_size10000_seg10         19.4 us +- 0.2 us   18.7 us +- 0.4 us   +3.7%
split_emoji_size10000_seg50         6.61 us +- 0.10 us  6.60 us +- 0.08 us  ~same
split_emoji_size10000_seg250        5.00 us +- 0.05 us  5.05 us +- 0.03 us  ~same
split_emoji_size10000_seg500        3.94 us +- 0.03 us  4.02 us +- 0.02 us  -2.0%
split_emoji_size10000_seg1000       3.28 us +- 0.03 us  3.31 us +- 0.03 us  ~same
split_emoji_size10000_seg10000      2.64 us +- 0.01 us  2.79 us +- 0.03 us  -5.4%

split_emoji_size100000_seg2         689 us +- 6 us      642 us +- 5 us      +7.3%
split_emoji_size100000_seg10        194 us +- 6 us      183 us +- 2 us      +6.0%
split_emoji_size100000_seg50        64.4 us +- 0.8 us   63.8 us +- 0.5 us   ~same
split_emoji_size100000_seg250       57.2 us +- 0.3 us   58.0 us +- 0.3 us   -1.4%
split_emoji_size100000_seg500       49.3 us +- 0.2 us   49.9 us +- 0.4 us   -1.2%
split_emoji_size100000_seg1000      41.7 us +- 0.9 us   42.0 us +- 0.5 us   ~same
split_emoji_size100000_seg10000     31.9 us +- 0.2 us   32.5 us +- 0.3 us   -1.8%
split_emoji_size100000_seg25000     24.1 us +- 0.2 us   24.9 us +- 0.8 us   -3.2%
split_emoji_size100000_seg100000    25.7 us +- 0.1 us   27.7 us +- 0.4 us   -7.2%

split_emoji_size1000000_seg2        7.40 ms +- 0.42 ms  7.18 ms +- 0.27 ms  +3.1%
split_emoji_size1000000_seg10       2.51 ms +- 0.07 ms  2.40 ms +- 0.03 ms  +4.6%
split_emoji_size1000000_seg50       916 us +- 16 us     929 us +- 36 us     -1.4%
split_emoji_size1000000_seg250      566 us +- 4 us      576 us +- 4 us      -1.7%
split_emoji_size1000000_seg500      495 us +- 2 us      500 us +- 4 us      ~same
split_emoji_size1000000_seg1000     427 us +- 10 us     427 us +- 3 us      ~same
split_emoji_size1000000_seg10000    353 us +- 4 us      362 us +- 14 us     -2.5%
split_emoji_size1000000_seg25000    315 us +- 2 us      319 us +- 3 us      -1.3%
split_emoji_size1000000_seg100000   287 us +- 2 us      292 us +- 2 us      -1.7%
split_emoji_size1000000_seg250000   237 us +- 13 us     239 us +- 2 us      ~same
split_emoji_size1000000_seg500000   155 us +- 1 us      158 us +- 1 us      -1.9%
split_emoji_size1000000_seg999999   309 us +- 2 us      315 us +- 2 us      -1.9%
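To make the percentage column easier to check: it appears to equal `(first mean / second mean - 1) * 100` once both means are converted to the same unit. A positive value therefore means the second column's mean is lower. For example, 922 ns vs 3.07 us gives -70.0%. The minimal sketch below recomputes the value from one raw row. The row regex and the `UNITS` mapping are assumptions about this text format, not a pyperf API. The `~same` label is not reproduced because its threshold isn't stated.

```python
import re

# Assumed unit suffixes used in the table, mapped to seconds.
UNITS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "sec": 1.0}

# Assumed row shape: "<name>  <mean> <unit> +- <std> <unit>  <mean> <unit> +- <std> <unit>  <change>"
_VALUE = r"([\d.]+) (ns|us|ms|sec) \+- [\d.]+ (?:ns|us|ms|sec)"
ROW = re.compile(rf"^(\S+)\s+{_VALUE}\s+{_VALUE}")


def relative_change(row: str) -> tuple[str, float]:
    """Return (benchmark name, percentage change) for one table row."""
    m = ROW.match(row.strip())
    if m is None:
        raise ValueError(f"unrecognised row: {row!r}")
    name, first, first_unit, second, second_unit = m.groups()
    first_s = float(first) * UNITS[first_unit]
    second_s = float(second) * UNITS[second_unit]
    # Positive when the second column's mean is lower than the first one's.
    return name, (first_s / second_s - 1.0) * 100.0


row = "split_cjk_nz_size10000_seg1000      922 ns +- 9 ns      3.07 us +- 0.02 us  -70.0%"
name, change = relative_change(row)
print(f"{name}: {change:+.1f}%")  # prints: split_cjk_nz_size10000_seg1000: -70.0%
```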

I have also checked that the existing split-related tests still pass with my changes:

$ ./python.exe -m unittest Lib/test/test_str.py -v -k split 
test_additional_rsplit (Lib.test.test_str.StrTest.test_additional_rsplit) ... ok
test_additional_split (Lib.test.test_str.StrTest.test_additional_split) ... ok
test_rsplit (Lib.test.test_str.StrTest.test_rsplit) ... ok
test_split (Lib.test.test_str.StrTest.test_split) ... ok
test_splitlines (Lib.test.test_str.StrTest.test_splitlines) ... ok
test_formatter_field_name_split (Lib.test.test_str.StringModuleTest.test_formatter_field_name_split) ... ok

----------------------------------------------------------------------
Ran 6 tests in 0.001s

OK
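The six tests above are fairly small. A randomized differential check could complement them by comparing `str.split`/`str.rsplit` against a deliberately naive, character-by-character reference, using the same character sets as the benchmark. This is only a sketch. `reference_split`, `reference_rsplit`, `check_all` and `DECOYS` are hypothetical helpers. The decoy characters share the separator's low byte. They are included on the assumption that they are a useful edge case, given the benchmark script's note that `find_char` handles `sep & 0xff == 0` and `sep & 0xff != 0` differently.

```python
import random

# Same character sets as the benchmark script.
CHAR_SETS = {
    "ascii": ("a", " "),
    "latin": ("é", " "),
    "cjk": ("\u4e16", "\u3000"),
    "cjk_nz": ("\u4e16", "\u3001"),
    "emoji": ("\U0001f600", " "),
}

# Hypothetical decoys: same low byte as the separator (0x00, 0x01, 0x20)
# and the same storage width as the set's main character.
DECOYS = {"cjk": "\u4e00", "cjk_nz": "\u4e01", "emoji": "\U0001f620"}


def reference_split(s: str, sep: str, maxsplit: int = -1) -> list[str]:
    """Split on a single-character sep one character at a time (no str.find/split)."""
    parts, current = [], []
    for ch in s:
        if ch == sep and maxsplit != 0:
            parts.append("".join(current))
            current = []
            maxsplit -= 1
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def reference_rsplit(s: str, sep: str, maxsplit: int = -1) -> list[str]:
    """rsplit is split applied to the reversed string, with everything reversed back."""
    return [p[::-1] for p in reversed(reference_split(s[::-1], sep, maxsplit))]


def check_all(rounds: int = 200, seed: int = 0) -> int:
    """Compare str.split/str.rsplit with the references on random strings."""
    rng = random.Random(seed)
    checked = 0
    for name, (char, sep) in CHAR_SETS.items():
        alphabet = [char, sep]
        if name in DECOYS:
            alphabet.append(DECOYS[name])
        for _ in range(rounds):
            s = "".join(rng.choice(alphabet) for _ in range(rng.randrange(300)))
            for maxsplit in (-1, 0, 1, 5):
                assert s.split(sep, maxsplit) == reference_split(s, sep, maxsplit), (name, s, maxsplit)
                assert s.rsplit(sep, maxsplit) == reference_rsplit(s, sep, maxsplit), (name, s, maxsplit)
                checked += 1
    return checked


print(check_all(), "cases checked")
```

Run with the patched `./python.exe`, an `AssertionError` would show the character set, input string and `maxsplit` that disagree.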

I haven't benchmarked the rsplit case yet. If that would be useful, I can add those benchmarks too (I first wanted to gauge interest and check that the approach is sound).
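If `rsplit` benchmarks turn out to be useful, they could probably mirror the `split` ones. The sketch below reuses the script's `SIZES`, `SEGMENT_LENGTHS` and `CHAR_SETS`. `make_string` is a hypothetical stand-in because its body isn't shown here. `register_rsplit_benchmarks` and the `rsplit_...` names are illustrative. Skipping segments longer than the string is inferred from the rows present in the `split` table. The only pyperf call used is `Runner.bench_func(name, func, *args)`.

```python
# Values copied from the benchmark script.
SIZES = [100, 1_000, 10_000, 100_000, 1_000_000]
SEGMENT_LENGTHS = [2, 10, 50, 250, 500, 1000, 10_000, 25_000, 100_000, 250_000, 500_000, 999_999]
CHAR_SETS = {
    "ascii": ("a", " "),
    "latin": ("é", " "),
    "cjk": ("\u4e16", "\u3000"),
    "cjk_nz": ("\u4e16", "\u3001"),
    "emoji": ("\U0001f600", " "),
}


def make_string(char: str, sep: str, size: int, seg_len: int) -> str:
    # Hypothetical stand-in for the script's make_string (its body is not shown):
    # repeat `seg_len` copies of `char` followed by `sep`, truncated to `size`.
    segment = char * seg_len + sep
    return (segment * (size // len(segment) + 1))[:size]


def register_rsplit_benchmarks(runner, sizes=SIZES) -> list[str]:
    """Register one rsplit benchmark per (char set, size, segment length).

    `runner` is expected to behave like pyperf.Runner (only bench_func is used).
    """
    names = []
    for set_name, (char, sep) in CHAR_SETS.items():
        for size in sizes:
            for seg_len in SEGMENT_LENGTHS:
                # The split table has no row with a segment longer than the string.
                if seg_len > size:
                    continue
                s = make_string(char, sep, size, seg_len)
                name = f"rsplit_{set_name}_size{size}_seg{seg_len}"
                # bench_func times s.rsplit(sep).
                runner.bench_func(name, s.rsplit, sep)
                names.append(name)
    return names
```

Assuming pyperf is installed, a benchmark script would call `register_rsplit_benchmarks(pyperf.Runner())` once at top level. The results from the two builds could then be compared with `python -m pyperf compare_to`.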

Metadata

Assignees

No one assigned

    Labels

    interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage), type-feature (A feature request or enhancement)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
