Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

re2 is not only about exponential time: matching of regexes like a|b|c is O(N) in backtracking engines and O(1) in DFA-based engines like re2. It can make a big difference in practice for generated regexes - e.g. if you want to check if an URL has one of the thousand substrings in it (think adblock-like use cases). With backtracking regex or with a loop it'd be O(N) regarding the number of options, but with DFA it is O(1) regarding the number of options. I've seen 1000x speedups for similar use cases with re2 vs re from Python stdlib.


How does the DFA engine match anything without looking at the whole input?


I think s/he meant to say O(nm) vs O(m), n being the number of branches in the regex and m being the length of the input.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: