Skip to main content

Boyer Moore Search Algorithm

I needed to find a String in a text file, so I wrote(rather hacked) a scheme imlementation of the boyer moore string search algoritm.

This is just a hack. But it is commented. What do you think?
(I decided to use this blog also as my cut-paste-source from now on.)


;; searches for string using boyer moore algorithm
(define (>>boyer-moore needle haystack)
(define needle-len (string-length needle))
(define hs-len (string-length haystack))
(define r-needle-list (reverse (string->list needle)))
;; two tables are build
;; compute the bad character shift table
;; it contains the number of chars to skip, if a character is encountered that is not the last of the search string.
;; (this table is only used after the search cursor was replaced)
(define bad-char-shift-table
(let loop
((shift 0)
(nlist r-needle-list)
(table '()))
(if (eq? nlist '())
table
(if (assv (car nlist) table)
(loop (+ 1 shift) (cdr nlist) table)
(loop (+ 1 shift) (cdr nlist) `((,(car nlist) ,shift) ,@table))))))
;; the good char table contains the number of chars to skip forwar, if a substring starting from the
;; end of a needle was matched befor a mismatch occurs
;; it contains the next possible position from the current search position where a
;; string match might end...
(define good-suffix-shift-table
;; for every reverse substring define a shift value
(let char-pattern-loop
((pattern '())
(pattern-len 0)
(nlist r-needle-list)
(table '()))
(if (eq? nlist '())
table
(char-pattern-loop
`( ,@pattern ,(car nlist))
(+ 1 pattern-len)
(cdr nlist)
`(,@table
(,pattern ,(let loop
((shift 0)
(unmatched (car nlist)))
(if (equal? (ncar pattern-len (ncdr shift r-needle-list)) (ncar (- needle-len shift) pattern))
(if (eqv? shift needle-len)
shift
(if (>= (+ shift pattern-len) needle-len)
shift
(if (eqv? (car (ncdr shift nlist)) unmatched)
;; ok nicht gefunden weiter schieben/suchen
(loop (+ 1 shift) unmatched)
shift)))
(loop (+ 1 shift) unmatched)))))))))
;; searching at a position
(define (search-needle-at index)
(letrec ((sub-hs (reverse (string->list (substring haystack index (+ needle-len index))))) ;; den kaefer erstma aufn ruecken drehen...
(first-char (car sub-hs)))
(if (eqv? first-char (car r-needle-list)) ;; first time is special
;; if the fist char matches, proceed with subpattern search
(let ((common (common-sublist sub-hs r-needle-list)))
(if (= (car common) needle-len)
0 ;; found
(cadr (assoc (cdr common) good-suffix-shift-table))))
;;if the first char did not match, look up shift in bad-char-shift table
(let ((shift (assv first-char bad-char-shift-table)))
(if (eq? shift #f)
needle-len ;; return the needle length if nothing better could be found in the bad-char jump table
(cadr shift)))))) ;; ...otherwise return the value obtained from the table

;; search mainloop
(let main-loop ((current-index 0))
(if (> (+ needle-len current-index) hs-len)
#f
(let ((minimum-chars-to-skip (search-needle-at current-index)))
(if (= 0 minimum-chars-to-skip)
current-index ;; juhu found string!
(main-loop (+ current-index minimum-chars-to-skip)))))))


(>>boyer-moore "ANPANMAN" "NNNNNAXPANPANMANANMAN")


Some utility definitions are missing from the above code, these are:


;; returns the rest of the list after removing n elements
(define (ncdr n list)
(if (eqv? n 0)
list
(if (eq? list '())
list
(ncdr (- n 1) (cdr list)))))

;; returns the fist n items of the list
(define (ncar n list)
(let loop ((result '())
(rest list)
(c n))
(if (eqv? c 0)
result
(if (eq? rest '())
result
(loop `(,(car rest) ,@result) (cdr rest) (- c 1) )))))

;; return the common begining sublist of two lists
(define (common-sublist listA listB)
(let loop
((listC '())
(restA listA)
(restB listB)
(count 0))
(if (or (eq? restA '()) (eq? restB '()))
(cons count listC)
(if (eqv? (car restA) (car restB))
(loop `(,@listC ,(car restA)) (cdr restA) (cdr restB) (+ 1 count))
(cons count listC)))))



The above code might be complete bullsh*t, I dont know I just hacked it down while reading the wikipedia article of the algorithm. I didn't bother to lookup a reference implementation...
Also it was like 4:00 am when I hacked it...(apologies accepted?)

Comments

Popular posts from this blog

Learning Haskell, functional music

As you might have realized, I started to learn Haskell. One of the most fun things to do in any programming language is creating some kind of audible side effects with a program. Already back in the days when I started programming, I always played around with audio when toying around with a new language. I have found a wonderful set of lecture slides about haskell and multimedia programming, called school of expression. Inspired by the slides about functional music I implemented a little song. Ahh ... and yes it is intended to sound slightly strange . I used the synthesis toolkit to transform the music to real noise, simply by piping skini message to std-out. I used this command line to achieve the results audible in the table: sven@hhi1214a:~/Mukke$ ghc -o test1 test1.hs && ./test1 | stk-demo Plucked -n 16 -or -ip Sound samples: Plucked play Clarinet play Whistle(attention very crazy!) play As always the source... stueck = anfang :+: mitte :+: ende anfang = groovy :+: (Trans ...

The purpose of the MOCK

In response to a much nicer blog entry, that can be found here . There are actually several distinct "tests" that make up usual unit tests, among them two that really do stand out: one kind of testing to test method flows, one to test some sort of computation. Mock objects are for the purpose of testing method flows. A method flow is a series of message transmissions to dependent objects. The control flow logic inside the method(the ifs and whiles) will alter the flow in repsonse to the parameters of the method call parameters passed by calling the method under test, depending on the state of the object that contains the method under test and the return values of the external method calls(aka responses to the messages sent). There should be one test method for every branch of an if statement, and usuale some sort of mock control objects in the mock framework will handle loop checking. BTW: I partly use message transmission instead of method invocation to include other kind...

Keys, Values and Rules: Three Important Shake Concepts

The title was a click-bait! This article will actually try to explain five instead of three important notions in Shake. These are: Rules Keys Values The Build Database Actions This short blog post was inspired by the hurdles with my Shake based build, after the new Shake version was released, which had breaking API changes. Jump to the next section if you are not interested in the why and how of this blog post. Shake is rule based build system much like GNU make. Like make it is robust, unlike make, it is pretty fast and supports dynamic build dependencies. But you knew all that already, if you are the target audience of this post, since this post is about me explaining to myself by explaining to you, how that build tool, I used for years, actually works. Although I used it for years, I never read the paper or wrapped my head around it more than absolutely necessary to get the job done. When Shake was updated to version 0.16.x, the internal API for custom rules w...