Notes on the Gwydion Dylan Bindings to PCRE
Written by Tom Emerson, tree@tiac.net
Revision 0.1

Overview
--------

PCRE (Perl-compatible Regular Expressions) is a C library written by
Philip Hazel (ph10@cam.ac.uk) that provides Perl 5.004 and 5.005
compatible regexp functionality. The Python scripting language uses
PCRE as its regular expression engine.

This is the first go at a set of Gwydion Dylan bindings for PCRE; the
eventual goal is to replace the existing regexp library with PCRE
(while maintaining the interface, of course). The mappings are spartan
and need to be improved, but I wanted to get this out for people to
start using while I muck with cleaning up the Dylan wrappers.

Included in this distribution are the following:

    GDREADME            this file
    gd-makefile.gmk     makefile for building the Dylan library
    pcre-exports.dylan  library and module definitions
    pcre-intr.intr      Melange interface definitions
    pcre.lid            LID file for the library
    pcretest.patch      patch to the pcretest program (see below)
    gd-pcretest/        Dylan implementation of the pcretest program
    gd-pgrep/           Dylan implementation of the pgrep program

Building
--------

1. Build PCRE according to its directions (pretty much just type
   'make').

2. Build the GD libraries with 'make -f gd-makefile.gmk'.

I haven't bothered to create makefiles for gd-pcretest and gd-pgrep
yet: both can be compiled by dropping into their directory and typing

    d2c -L..

gd-pcretest works very much like its C and Perl counterparts. It takes
two arguments: the name of the input file containing test data, and
the name of the output file to which the results are written.

Unfortunately the "%x" specifier to 'format' doesn't have the same
semantics as the C version. To make it easy to compare the Dylan test
results with those for the C implementation, I made a slight
modification to pcretest.c to output hex numbers in the same format as
GD's 'format'. Just apply 'pcretest.patch' to pcretest.c and rebuild.
Then regenerate the test output for 'testinput' and 'testinput3':

    % ./pcretest testinput testoutput.gd
    % ./pcretest testinput3 testoutput3.gd

Given this, the output of gd-pcretest for testinput and testinput3
should be identical to the regenerated C output. Note that testinput2
and testinput4 test features of PCRE that I am not currently
supporting.

The performance of gd-pcretest leaves a lot to be desired: I was not
careful about minimizing the number of string copies that are
performed, so I'm sure there is much room for improvement. However,
since the purpose is simply to test the operation of the interface,
I'm not much concerned about the performance.

The gd-pgrep utility behaves almost exactly like its C counterpart.
The only differences are:

- *standard-input* is output in place of 'stdin'.
- Return values don't match (this will be fixed later).

Future Directions
-----------------

- Improve error handling: the nonsense of expecting the user to deal
  with error strings and whatnot is silly. Conditions should be used
  instead.
- Harlequin C-FFI version, perhaps as a test of Pidgin to compare with
  a hand-generated interface.
- Write extraction functions to extract matches from the search
  string.
- Fix bugs and improve my Dylan style.
- Investigate i18n issues in the engine.

1999-Mar-30
tree@tiac.net