
Boyer Moore Search Algorithm

I needed to find a string in a text file, so I wrote (or rather hacked) a Scheme implementation of the Boyer-Moore string search algorithm.

This is just a hack. But it is commented. What do you think?
(I decided to use this blog also as my cut-paste-source from now on.)

;; searches for a string using the Boyer-Moore algorithm
(define (>>boyer-moore needle haystack)
  (define needle-len (string-length needle))
  (define hs-len (string-length haystack))
  (define r-needle-list (reverse (string->list needle)))
  ;; two tables are built
  ;; compute the bad character shift table
  ;; it contains the number of chars to skip if a character is encountered that is not the last of the search string.
  ;; (this table is only used after the search cursor has been moved)
  ;; example: for the needle "needle" the table is ((#\n 5) (#\d 2) (#\l 1) (#\e 0))
  (define bad-char-shift-table
    (let loop
        ((shift 0)
         (nlist r-needle-list)
         (table '()))
      (if (eq? nlist '())
          table
          (if (assv (car nlist) table)
              (loop (+ 1 shift) (cdr nlist) table)
              (loop (+ 1 shift) (cdr nlist) `((,(car nlist) ,shift) ,@table))))))
;; the good suffix table contains the number of chars to skip forward if a substring starting from the
;; end of the needle was matched before a mismatch occurred
;; it contains the next possible position from the current search position where a
;; string match might end...
(define good-suffix-shift-table
;; for every reverse substring define a shift value
(let char-pattern-loop
((pattern '())
(pattern-len 0)
(nlist r-needle-list)
(table '()))
(if (eq? nlist '())
`( ,@pattern ,(car nlist))
(+ 1 pattern-len)
(cdr nlist)
(,pattern ,(let loop
((shift 0)
(unmatched (car nlist)))
(if (equal? (ncar pattern-len (ncdr shift r-needle-list)) (ncar (- needle-len shift) pattern))
(if (eqv? shift needle-len)
(if (>= (+ shift pattern-len) needle-len)
(if (eqv? (car (ncdr shift nlist)) unmatched)
;; ok, not found: keep shifting/searching
(loop (+ 1 shift) unmatched)
(loop (+ 1 shift) unmatched)))))))))
  ;; searching at a position
  (define (search-needle-at index)
    ;; let* instead of letrec: first-char depends on the value of sub-hs
    (let* ((sub-hs (reverse (string->list (substring haystack index (+ needle-len index))))) ;; first flip the beetle onto its back (reverse the window)...
           (first-char (car sub-hs)))
      (if (eqv? first-char (car r-needle-list)) ;; first time is special
          ;; if the first char matches, proceed with subpattern search
          (let ((common (common-sublist sub-hs r-needle-list)))
            (if (= (car common) needle-len)
                0 ;; found
                (cadr (assoc (cdr common) good-suffix-shift-table))))
          ;; if the first char did not match, look up the shift in the bad-char-shift table
          (let ((shift (assv first-char bad-char-shift-table)))
            (if (eq? shift #f)
                needle-len ;; return the needle length if nothing better could be found in the bad-char jump table
                (cadr shift)))))) ;; ...otherwise return the value obtained from the table

  ;; search mainloop
  (let main-loop ((current-index 0))
    (if (> (+ needle-len current-index) hs-len)
        #f ;; the needle no longer fits into the rest of the haystack: not found
        (let ((minimum-chars-to-skip (search-needle-at current-index)))
          (if (= 0 minimum-chars-to-skip)
              current-index ;; yay, found the string!
              (main-loop (+ current-index minimum-chars-to-skip)))))))


The >>boyer-moore code relies on a few utility definitions that are not shown above. These are:

;; returns the rest of the list after removing n elements
(define (ncdr n list)
  (if (eqv? n 0)
      list
      (if (eq? list '())
          '()
          (ncdr (- n 1) (cdr list)))))

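;; returns the first n items of the list
(define (ncar n list)
  (let loop ((result '())
             (rest list)
             (c n))
    ;; result is accumulated back to front, so it is reversed before returning
    (if (eqv? c 0)
        (reverse result)
        (if (eq? rest '())
            (reverse result)
            (loop `(,(car rest) ,@result) (cdr rest) (- c 1))))))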

;; returns (count . prefix): the length of the common leading sublist of two lists, and that sublist
(define (common-sublist listA listB)
  (let loop
      ((listC '())
       (restA listA)
       (restB listB)
       (count 0))
    (if (or (eq? restA '()) (eq? restB '()))
        (cons count listC)
        (if (eqv? (car restA) (car restB))
            (loop `(,@listC ,(car restA)) (cdr restA) (cdr restB) (+ 1 count))
            (cons count listC)))))

The above code might be complete bullsh*t; I don't know. I just hacked it together while reading the Wikipedia article on the algorithm, and I didn't bother to look up a reference implementation...
Also, it was around 4:00 am when I hacked it... (apologies accepted?)
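
For comparison, here is a minimal sketch of the simpler Horspool variant of Boyer-Moore. It keeps only a bad-character style shift table and drops the good-suffix table entirely, so it is a different (swapped-in) technique and not a cleaned-up version of the code above. The names horspool-search, shift-for and match-at?, and the convention of returning the match index or #f, are assumptions made for this illustration.

```scheme
;; Boyer-Moore-Horspool sketch: bad-character shifts only, no good-suffix table.
;; Returns the index of the first match, or #f if there is none (assumed convention).
(define (horspool-search needle haystack)
  (let* ((n (string-length needle))
         (h (string-length haystack))
         ;; alist char -> distance from that char to the end of the needle.
         ;; The last needle char is left out; later (rightmost) occurrences
         ;; are consed to the front, so assv finds the smallest shift first.
         (shifts
          (let loop ((i 0) (table '()))
            (if (>= i (- n 1))
                table
                (loop (+ i 1)
                      (cons (cons (string-ref needle i) (- n 1 i)) table))))))
    ;; shift for the haystack char aligned with the needle's last position
    (define (shift-for c)
      (let ((entry (assv c shifts)))
        (if entry (cdr entry) n)))
    ;; compare right to left; #t if the needle matches at offset pos
    (define (match-at? pos)
      (let loop ((j (- n 1)))
        (cond ((< j 0) #t)
              ((char=? (string-ref needle j) (string-ref haystack (+ pos j)))
               (loop (- j 1)))
              (else #f))))
    ;; main loop: stop as soon as the needle no longer fits
    (let loop ((pos 0))
      (cond ((> (+ pos n) h) #f)
            ((match-at? pos) pos)
            (else
             (loop (+ pos (shift-for (string-ref haystack (+ pos n -1))))))))))

(horspool-search "abc" "xxabcxx") ; => 2
(horspool-search "abc" "ababab")  ; => #f
```

Because it works directly on string indices instead of reversed char lists, it is easy to run against the same inputs as >>boyer-moore to see whether the two agree.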

