Yup, that makes a lot of sense - I didn't realize Python cached so many previous regexes, which is nice!
I did use cProfile a while back, and I think at the time it showed a lot of time in the regexp matching (I don't think I posted about it, though). Honestly, once I implemented the parallelism I got less motivated to make it faster. But now I am kinda curious what the speed difference is between Rust and Python if I spend time trying to optimize both.
Rust should be a lot faster than Python if your time is mostly spent parsing the contents of those lines. Keep in mind that each Python bytecode instruction runs extra machine instructions just to handle the virtual machine overhead.
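To get a rough feel for that overhead, the standard dis module can list the bytecode behind a one-line attribute update. This is only a sketch: record_out and its game_situation parameter are made-up names, and the exact instruction names differ between CPython versions.

```python
import dis


def record_out(game_situation):
    # Hypothetical stand-in for the "gameSituation.outs += 1" style of update.
    game_situation.outs += 1


# One line of source becomes several bytecode instructions, each of which
# goes through the interpreter's dispatch loop.
opnames = [ins.opname for ins in dis.get_instructions(record_out)]
print(opnames)
```

The list should include both an attribute load and an attribute store, plus the add itself and some stack shuffling around them.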
I see a number of micro-optimizations that may give you a few percent more in Python.
For example, you use GameSituation as a mutable way to maintain parse state. You modify it with things like "gameSituation.outs += 1".
Mutating instance attributes has a much higher overhead in CPython than in C/C++ (and presumably Rust). You can reduce some of that overhead by declaring the attributes in the class's __slots__.
Consider "spam.py" containing the following:
    class Foo:
        def __init__(self):
            self.a = 0

    class Bar(Foo):
        __slots__ = ("a",)

    % python -m timeit -s 'import spam; x=spam.Foo()' 'x.a = 3'
    5000000 loops, best of 5: 41 nsec per loop
    % python -m timeit -s 'import spam; x=spam.Bar()' 'x.a = 3'
    10000000 loops, best of 5: 32.4 nsec per loop
If you replace your GameSituation with a dict then you can get a little faster, but not enough to worry about.
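If you want to check all three options on your own machine, something like the following should work. It is a minimal sketch: PlainState and SlottedState are hypothetical stand-ins for GameSituation with a single "outs" field, and the timings will vary by machine and Python version.

```python
import timeit


class PlainState:
    """Hypothetical stand-in for GameSituation with a normal per-instance __dict__."""

    def __init__(self):
        self.outs = 0


class SlottedState:
    """Same field, stored in a fixed slot instead of a per-instance __dict__."""

    __slots__ = ("outs",)

    def __init__(self):
        self.outs = 0


def bump_attr(state, n=100_000):
    # Repeated attribute mutation, like "gameSituation.outs += 1".
    for _ in range(n):
        state.outs += 1
    return state.outs


def bump_dict(state, n=100_000):
    # The same update with a plain dict holding the parse state.
    for _ in range(n):
        state["outs"] += 1
    return state["outs"]


if __name__ == "__main__":
    cases = [
        ("plain", bump_attr, PlainState),
        ("slots", bump_attr, SlottedState),
        ("dict", bump_dict, lambda: {"outs": 0}),
    ]
    for label, fn, make in cases:
        # Take the best of a few repeats to reduce noise.
        best = min(timeit.repeat(lambda: fn(make()), number=5, repeat=3))
        print(f"{label:6s} {best:.4f} s")
```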
Another micro-optimization is to reduce the number of temporary strings. Consider:
    if (batterEvent.startswith('W+') or batterEvent.startswith('IW+') or batterEvent.startswith('I+')):
        tempEvent = batterEvent[2:]
If you track the current offset in the string, then you can do things like:
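    if batterEvent[i:i+2] in ("W+", "I+"):
        i += 2
    elif batterEvent[i:i+3] == "IW+":
        i += 3
and pass the start position to the match() method of a compiled pattern, e.g. pattern.match(batterEvent, i). (The module-level re.match() function doesn't take a start position.)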
(BTW, there appears to be a bug in your original code, since "IW+" is 3 letters long.)
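Here is a rough sketch of the offset-tracking idea. The names skip_walk_prefix, parse_after_walk and BASE_RE are made up for illustration, and the pattern and sample event strings are assumptions, not your real grammar. str.startswith() accepts a start offset, so no temporary slice is needed for the prefix test.

```python
import re

# Hypothetical pattern for illustration only; the real event grammar
# is not shown in this thread.
BASE_RE = re.compile(r"[A-Z0-9]+")

# The three walk prefixes from the original code.
WALK_PREFIXES = ("IW+", "W+", "I+")


def skip_walk_prefix(event, i=0):
    """Return the offset just past a walk prefix at position i, or i if none."""
    for prefix in WALK_PREFIXES:
        # startswith() with a start offset avoids building event[i:].
        if event.startswith(prefix, i):
            return i + len(prefix)
    return i


def parse_after_walk(event):
    """Match the hypothetical pattern right after any leading walk prefix."""
    i = skip_walk_prefix(event)
    # A compiled pattern's match() takes a start position as its second argument.
    m = BASE_RE.match(event, i)
    return m.group() if m else None


print(skip_walk_prefix("IW+SB2"), parse_after_walk("IW+SB2"))
```

Using len(prefix) for the advance means the 3-letter "IW+" case is handled the same way as the 2-letter ones.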
Another BTW, you might change "for line in f.readlines()" to "for line in f". Shouldn't affect performance but should reduce your overall memory use.
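A small sketch of the difference, using io.StringIO as a stand-in for your real file (the function names are made up). Both loops see the same lines, but the first never builds the full list.

```python
import io


def total_length_lazy(f):
    """Iterate the file object directly; lines are read as the loop needs them."""
    total = 0
    for line in f:
        total += len(line)
    return total


def total_length_eager(f):
    """readlines() builds a list of every line before the loop starts."""
    total = 0
    for line in f.readlines():
        total += len(line)
    return total


sample = "play,1,0\nplay,1,1\nplay,2,0\n"  # made-up sample data
print(total_length_lazy(io.StringIO(sample)), total_length_eager(io.StringIO(sample)))
```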
In closing, character-level string processing in CPython is slow so I doubt you'll get all that much faster.
You might try PyPy, but with the number of temporary strings you create, my guess is PyPy still won't be that much faster. Should be easy to test though.
Thanks for the suggestions!