Peter Stuifzand

Simple Haskell regex program

I wrote a small piece of code to match a regex to a string in Haskell. The code is based on the example from Beautiful code.

match                       :: String -> String -> Bool
match ('^':reg) s           = matchhere reg s
match reg s@(x:xs)          = if matchhere reg s
                                then True
                                else matchhere reg xs

matchhere                   :: String -> String -> Bool
matchhere []         _      = True
matchhere ('$':[])   []     = True
matchhere (x:'*':xs) s      = matchstar x xs s
matchhere (r:rs)     (x:xs) | r == x    = matchhere rs xs
                            | r == '.'  = matchhere rs xs
                            | otherwise = False

matchstar                   :: Char -> String -> String -> Bool
matchstar c []  []          = True
matchstar c _   []          = False
matchstar c reg (x:xs)      = matchhere reg xs
                              || ((c == '.' || c == x) && matchstar c reg xs)

The matchstar function in this example is non-greedy, because it will try to match the character after * before it tries to match the star.

The other way this could be written is:

matchstar c reg str@(x:xs)  = matchstar c reg xs || matchhere reg str

This will first try to matchstar and only if that fails, try to match it with matchhere. A string like:

aba*ab

will never match. Because the last a will always be matched by the a*.

I also wrote this simple test suite with this code:

main = do
            plan 11
            ok $ match       "es"           "test"
            ok $ match       "test"         "test"
            ok $ match       "^test$"       "test"
            ok $ not $ match "adf"          "test"
            ok $ match       "^..$"         "aa"
            ok $ match       ".*"           "aa"
            ok $ match       "abc.*"        "abcaa"
            ok $ match       "abc.*abc"     "abcaaabc"
            ok $ match       "abc.*a"       "abcaaa"
            ok $ match       "^abc.*a"      "abcaaa"
            ok $ match       "^abc.*a$"     "abcaaa"
            return ()

ok  :: Bool -> IO ()

ok True  = do putStrLn "ok"
ok False = do putStrLn "not ok"

plan         :: Int -> IO ()
plan tests = do putStrLn $ "1.." ++ (show tests)
                return ()

Each test will print ok (if it passed) or not ok (if it didn’t). It look like TAP, because it is easy to do. I used Test::Harness 3.10, which is a perl module. Test::Harness contains a program called prove. This will run code and gather information about the TAP output. It works really nice together.

prove -e runhaskell Regex.hs

And the output is:

Regex.......ok
All tests successful.
Files=1, Tests=11,  1 wallclock secs ( 0.00 usr  0.00 sys +  0.29 cusr  0.02 csys =  0.31 CPU)
Result: PASS

All tests pass. My little test framework doesn’t implement everything in the TAP specification. The most important missing things are the test numbers and an error message when a tests fails. Maybe I will look at this sometime, but at the moment I’m with how this works.

© 2023 Peter Stuifzand