Skip to main content

Boyer Moore Search Algorithm

I needed to find a String in a text file, so I wrote(rather hacked) a scheme imlementation of the boyer moore string search algoritm.

This is just a hack. But it is commented. What do you think?
(I decided to use this blog also as my cut-paste-source from now on.)


;; searches for string using boyer moore algorithm
(define (>>boyer-moore needle haystack)
(define needle-len (string-length needle))
(define hs-len (string-length haystack))
(define r-needle-list (reverse (string->list needle)))
;; two tables are build
;; compute the bad character shift table
;; it contains the number of chars to skip, if a character is encountered that is not the last of the search string.
;; (this table is only used after the search cursor was replaced)
(define bad-char-shift-table
(let loop
((shift 0)
(nlist r-needle-list)
(table '()))
(if (eq? nlist '())
table
(if (assv (car nlist) table)
(loop (+ 1 shift) (cdr nlist) table)
(loop (+ 1 shift) (cdr nlist) `((,(car nlist) ,shift) ,@table))))))
;; the good char table contains the number of chars to skip forwar, if a substring starting from the
;; end of a needle was matched befor a mismatch occurs
;; it contains the next possible position from the current search position where a
;; string match might end...
(define good-suffix-shift-table
;; for every reverse substring define a shift value
(let char-pattern-loop
((pattern '())
(pattern-len 0)
(nlist r-needle-list)
(table '()))
(if (eq? nlist '())
table
(char-pattern-loop
`( ,@pattern ,(car nlist))
(+ 1 pattern-len)
(cdr nlist)
`(,@table
(,pattern ,(let loop
((shift 0)
(unmatched (car nlist)))
(if (equal? (ncar pattern-len (ncdr shift r-needle-list)) (ncar (- needle-len shift) pattern))
(if (eqv? shift needle-len)
shift
(if (>= (+ shift pattern-len) needle-len)
shift
(if (eqv? (car (ncdr shift nlist)) unmatched)
;; ok nicht gefunden weiter schieben/suchen
(loop (+ 1 shift) unmatched)
shift)))
(loop (+ 1 shift) unmatched)))))))))
;; searching at a position
(define (search-needle-at index)
(letrec ((sub-hs (reverse (string->list (substring haystack index (+ needle-len index))))) ;; den kaefer erstma aufn ruecken drehen...
(first-char (car sub-hs)))
(if (eqv? first-char (car r-needle-list)) ;; first time is special
;; if the fist char matches, proceed with subpattern search
(let ((common (common-sublist sub-hs r-needle-list)))
(if (= (car common) needle-len)
0 ;; found
(cadr (assoc (cdr common) good-suffix-shift-table))))
;;if the first char did not match, look up shift in bad-char-shift table
(let ((shift (assv first-char bad-char-shift-table)))
(if (eq? shift #f)
needle-len ;; return the needle length if nothing better could be found in the bad-char jump table
(cadr shift)))))) ;; ...otherwise return the value obtained from the table

;; search mainloop
(let main-loop ((current-index 0))
(if (> (+ needle-len current-index) hs-len)
#f
(let ((minimum-chars-to-skip (search-needle-at current-index)))
(if (= 0 minimum-chars-to-skip)
current-index ;; juhu found string!
(main-loop (+ current-index minimum-chars-to-skip)))))))


(>>boyer-moore "ANPANMAN" "NNNNNAXPANPANMANANMAN")


Some utility definitions are missing from the above code, these are:


;; returns the rest of the list after removing n elements
(define (ncdr n list)
(if (eqv? n 0)
list
(if (eq? list '())
list
(ncdr (- n 1) (cdr list)))))

;; returns the fist n items of the list
(define (ncar n list)
(let loop ((result '())
(rest list)
(c n))
(if (eqv? c 0)
result
(if (eq? rest '())
result
(loop `(,(car rest) ,@result) (cdr rest) (- c 1) )))))

;; return the common begining sublist of two lists
(define (common-sublist listA listB)
(let loop
((listC '())
(restA listA)
(restB listB)
(count 0))
(if (or (eq? restA '()) (eq? restB '()))
(cons count listC)
(if (eqv? (car restA) (car restB))
(loop `(,@listC ,(car restA)) (cdr restA) (cdr restB) (+ 1 count))
(cons count listC)))))



The above code might be complete bullsh*t, I dont know I just hacked it down while reading the wikipedia article of the algorithm. I didn't bother to lookup a reference implementation...
Also it was like 4:00 am when I hacked it...(apologies accepted?)

Comments

Popular posts from this blog

Lazy Evaluation(there be dragons and basement cats)

Lazy Evaluation and "undefined" I am on the road to being a haskell programmer, and it still is a long way to go. Yesterday I had some nice guys from #haskell explain to me lazy evaluation. Take a look at this code: Prelude> let x = undefined in "hello world" "hello world" Prelude> Because of Haskells lazyness, x will not be evaluated because it is not used, hence undefined will not be evaluated and no exception will occur. The evaluation of "undefined" will result in a runtime exception: Prelude> undefined *** Exception: Prelude.undefined Prelude> Strictness Strictness means that the result of a function is undefined, if one of the arguments, the function is applied to, is undefined. Classical programming languages are strict. The following example in Java will demonstrate this. When the programm is run, it will throw a RuntimeException, although the variable "evilX" is never actually used, strictness requires that all argu

Learning Haskell, functional music

As you might have realized, I started to learn Haskell. One of the most fun things to do in any programming language is creating some kind of audible side effects with a program. Already back in the days when I started programming, I always played around with audio when toying around with a new language. I have found a wonderful set of lecture slides about haskell and multimedia programming, called school of expression. Inspired by the slides about functional music I implemented a little song. Ahh ... and yes it is intended to sound slightly strange . I used the synthesis toolkit to transform the music to real noise, simply by piping skini message to std-out. I used this command line to achieve the results audible in the table: sven@hhi1214a:~/Mukke$ ghc -o test1 test1.hs && ./test1 | stk-demo Plucked -n 16 -or -ip Sound samples: Plucked play Clarinet play Whistle(attention very crazy!) play As always the source... stueck = anfang :+: mitte :+: ende anfang = groovy :+: (Trans

The purpose of the MOCK

In response to a much nicer blog entry, that can be found here . There are actually several distinct "tests" that make up usual unit tests, among them two that really do stand out: one kind of testing to test method flows, one to test some sort of computation. Mock objects are for the purpose of testing method flows. A method flow is a series of message transmissions to dependent objects. The control flow logic inside the method(the ifs and whiles) will alter the flow in repsonse to the parameters of the method call parameters passed by calling the method under test, depending on the state of the object that contains the method under test and the return values of the external method calls(aka responses to the messages sent). There should be one test method for every branch of an if statement, and usuale some sort of mock control objects in the mock framework will handle loop checking. BTW: I partly use message transmission instead of method invocation to include other kind