re2 is not only about exponential time: matching of regexes like a|b|c is O(N) i... | Hacker News

Hacker Newsnew | past | comments | ask | show | jobs | submit

		kmike84 on Feb 8, 2017 \| parent \| context \| favorite \| on: Regex: badly needs fuzzing re2 is not only about exponential time: matching of regexes like a\|b\|c is O(N) in backtracking engines and O(1) in DFA-based engines like re2. It can make a big difference in practice for generated regexes - e.g. if you want to check if an URL has one of the thousand substrings in it (think adblock-like use cases). With backtracking regex or with a loop it'd be O(N) regarding the number of options, but with DFA it is O(1) regarding the number of options. I've seen 1000x speedups for similar use cases with re2 vs re from Python stdlib.

adrianN on Feb 9, 2017 [–]

How does the DFA engine match anything without looking at the whole input?

haldean on Feb 9, 2017 | [–]

I think s/he meant to say O(nm) vs O(m), n being the number of branches in the regex and m being the length of the input.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact