Check If A Number is Prime Using A Regular Expression

The implementation is using Python, but can easily be ported to other languages:

The idea is to “flatten” the number  (e.g. 3 becomes “111” and   5 becomes “11111”), and then try to group the 1’s into multiple equal groups. If we succeed to group, the number isn’t prime, otherwise it’s prime (with the exception of the number 1).

It works because the so-called “regular expression”, isn’t actually regular.


import re

def is_prime(n):
    return re.match(r'^1?$|^(11+?)\1+$', "1" * n) == None;

The first part of the regexp “^1?$” matches a string with an optional “1″, that is, it matches the empty string, and the string “1″, which are the “flat” representations of 0 and 1 respectively. This is because Zero and One are special cases, and are not prime.

The second part tries to match a string of “start of line”, then two 1’s or more ^(11+?), then it tries to find one or more times the previous match (\1+). then end of line $.
i.e. try to divide the 1’s into two or more groups of two, if failed, try two or more groups of three, etc… in general: try to divide the sticks into two or more groups of “two or more”. Which is exactly the definition of a non-prime number (a.k.a composite number).
So, if the regexp succeeds to match the pattern, the number is not prime. If it fails, the number is prime.

Note about ^(11+?):
This is almost the same as ^(11+) but, with one difference: using (+?) instead of (+) means “match minimally”, as opposed to “match greedily”.
For example, consider the string “aaaa” and the regexp ‘(a+)\1+’ the regexp will match the string, the value of \1 will be “aa”, since the a+ pattern matched the longest string which allows the regexp to match the string.
The regexp ‘(a+?)\1+’ will also match the string, but the value of \1 will be “a”, because this time (a+?) looked for the minimal match which will allow the regexp to match the string.
In our case, the use of (+?) instead of (+) does not affect correctness. The only difference is related to performance, because it tries small groups first, rather than big groups first.

Surviving sed When Replacing Slashes

Usually, searching and replacing using sed looks something like this:

sed -e ‘s/abc/yyy/g’ file.txt

Now, how would we for example modify a string of the form:

http://example.com/directory/index.html

to become

http::example.com::directory::index.html

We can use something like this:

sed -e ‘s/:\/\//::/g;s/\//::/g’ file.txt

But who said that we have to use a slash as a separator when our regular expression is full of slashes?
The following is equivalent but much cleaner:

sed -e ‘s#://#::#g;s#/#::#g’ file.txt

You can use any character as a separator.

Enjoy.