2009-02-08 Ger Hobbelt * Merged in BlameBarack vanilla changes since 2008/10. * Notice that the unification of mailfilter + mailreaver + mailtrainer has an unexpected side-effect: mailfilter would add the 'X-CRM114-Status: Good ( )' mail header while the more modern mailreaver would add the 'X-CRM114-Status: GOOD ( )' mail header, which is SIMILAR but NOT IDENTICAL. This vanilla issue turned up while working on TREC-compatible tests using Gordon Cormack / Lyman published spamfilterjig toolkit, which expected this instead: 'X-CRM114-Status: GOOD ( pR: - )' ... notice that extra 'pR:' string AND the mandatory '-' minus in there! * Added the mkcss.crm script (derived from work on TREC-compatible tests) * Added tenfold_validate_ex.crm and tenfold_validate_mailreaver.crm scripts to ./tests/ directory. tenfold_validate_ex spits out TREC-compatible result files and includes additional features, such as 'forced training' (Train Always) vs. the regular TOE (Train On Error). Another option is THTTR (Thick Threshold Training) which will repeatedly (for a maximum number of iterations) train a message until its classification passes a preconfigured 'thick' threshold. * alt.winnow classifier has some VERY EXPERIMENTAL new pR calculus. * Added vanilla CRM114 bug exhibiting tests in ./tests/ directory: vanilla_*.crm * Fixed the trap-within-a-call handling bug in GerH builds. Test added to ./tests/ directory: trap_inside_call_issue1 * Hyperspace-style has been moved into the VT engine so that any classifier using VT now automagically supports the attribute as well. * Bugfix: script parser would not accept the Bayesian/Markovian 'alternative' classifiers (markovian.alt, osb.alt, osbf.alt and winnow.alt) as arguments of the 'learn', 'classify' nor the 'css*' script commands. * The experimental Bayesian/Markovian 'alternative' classifiers (markovian.alt, osb.alt, osbf.alt and winnow.alt) now include full VT support. 
	* VT engine has been extended to include configurable leading and trailing padding in the tokenizer (vanilla uses hardcoded DEADBEEF trailing padding only). The intent is to make the padding script-configurable to enable quick & easy testing of various edge behaviours of the VT engine; this is expected to be quite important when classifying small messages.
	* VT engine has been augmented to include feature hash weight factors and order number feedback to allow Bayesian/Markovian classifiers full, backwards-compatible VT use. Of course, feature weight lists can be script-configured, just like VT matrices already could (weight:, vector:)
	* VT engine now detects and signals when Arne's optimization is feasible as this depends very much on the VT matrix contents.
	* All Bayesian / Markovian classifiers now use the same left-truncatable prime as their default hashtable/CSS store size.

2008-10-20  Ger Hobbelt

	* Added Bourne Shell fixes for those systems which have the original Bourne shell as /bin/sh:
	  - '[...]' is replaced with 'test ...' -- Bourne doesn't recognize the [] alias
	  - 'shift' is surrounded by 'if test $# != 0; then ... fi' as 'shift' will print an error and abort in Bourne when there are no more arguments to shift (BASH will silently ignore this)
	  - '\n' in tests/Makefile CRM114 inline test scripts is replaced by CRM114's :*:_nl: as Bourne unescapes the \n even within '...' quotes and that ruins our test validations (the output produced by bash and Bourne differs when you keep those \n around instead of using :*:_nl:)
	  - added check for the existence of '/usr/share/dict/words' as not all systems have that one and not having it around would previously result in an aborted (FAILed) 'make check'.
	* bugfixes for mawk vs. gawk in ./tests/*.awk:
	  - gawk accepts /* .. */ comments, mawk does not.
	  - gawk understands sprintf("%.0lf", ...), mawk does not. For mawk to work it must be: %.0f without the 'l'.

2008-10-04  Ger Hobbelt

	* Added a few extra test cases to 'make check': alius_w_comment2, ...
	* Updated the makefiles to make sure that all MSVC/Windows required files are included in the distro (some projects were lacking). (NOTE: mandatory libraries are not included; they are available as separate downloads at http://hebbut.net/ )

2008-10-02 / 2008-09-**  Ger Hobbelt

	* Full review of mailreaver/mailtrainer/mailfilter/maillib.crm and user request resulted in a new version with several fixes to both scripts and C source code. Script fixes include the important upgrade for mailtrainer, which now uses the exact mail preprocessing code as mailreaver, so that mailtrainer does NOT use a message with its content duplicated for learning and re-classification.
	* Added '--dontstore' non-cached classification and training support to mailreaver + mailtrainer.
	* Integrated almost all still missing mailfilter features (e.g. forwarding to error-handling email address) to mailreaver. NOT INCLUDED: special 'secured' command mail processing; '--learn' handling, '--unlearn' handling, 'automatic' training. Extra mailfilter.CF options:

	    # mailfilter log options:
	    :log_to_nonspamtext: /yes/
	    :log_to_spamtext: /yes/

	* mail***.crm '--help' command line option is now always detected and processed, no matter its position on the command line.
	* Fixed Makefiles, scripts and ./configure to properly support 'make install'; now bang lines will ensure the *installed* crm114 binary will be used to execute them. When you want to use a different crm114 binary, you should run the script as

	    path-to/crm114 your_script.crm

	* Extra mailfilter.CF configuration parameters added:

	    #
	    # specify your system's 'mail' command:
	    #
	    # Should support the commandline
	    # -s
	    # and content via stdin.
	    :mail: /mail/

	  and

	    :do_refute_training: /SET/

	* Made darn sure the message content passed to crm*expand() functions is NUL terminated, no matter what.
	  This prevents spurious crashes and extra weird character output when running in -t and/or -T mode.
	* Fix for crm script 'input' statement when filename is constructed using a crm expression which is itself larger than the maximum allowed filepath on your OS.
	* FAILED ATTEMPT to get rid of the global variables; tdw and vht are now generally passed as extra arguments, but this is not sufficient; there are a zillion spots in the code which require almost all globals (error handling/reporting routines), which would bloat the call interfaces tremendously had we continued with this particular effort. ABORTED EFFORT.
	* Fix for fringe case in 'isolate' statement regarding buffer overruns.
	* script 'match' fix for .
	* script 'match' addition: now /, etc. can all be mixed instead of only the last one being used with warning that extra (previous) attributes would be discarded. Provisional fix for as well, but this has not been tested yet.
	* Fix for failing 'syscall': serious application invoke errors are now non-trappable as the system state is indeterminable by then anyhow.
	* Fix for fringe case buffer overrun error in 'window' script statement where one would specify illegally large variable names.
	* Important FIX for FSCM: now with corrected VT invocation. The existing situation would produce extremely corrupted hash feature series and core dumps in particular circumstances.
	* Fix for several fringe case buffer overrun errors in Markovian classifier.
	* Fix memory mapped access request for the Markovian classify statement: read-only access - as it should be.
	* Fix for memory and resource leaks in Markovian classifier.
	* Update for the crm script :@: operator: the '=' equality compare operation will try to cope with the slight errors which will occur in classifiers as they use floating point arithmetic. The equal operator will now compare the values within a band of FLT_EPSILON.
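
The FLT_EPSILON band comparison noted in the last item can be sketched in C as below. This is a minimal illustration only: the helper name nearly_equal() is hypothetical, and the sketch assumes an absolute band around the difference; the actual :@: operator code may scale or apply the band differently.

```c
#include <float.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not the actual CRM114 routine): report two values
 * as equal when their absolute difference is at most FLT_EPSILON.
 * Assumption: the band is absolute, not relative to the operands.
 */
static bool nearly_equal(double a, double b)
{
    double diff = a - b;

    if (diff < 0.0)
        diff = -diff;            /* absolute value without needing libm */
    return diff <= FLT_EPSILON;
}
```

With such a band, a script-level test like 0.1 + 0.2 = 0.3 would succeed even though the two doubles differ in their last bits.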
	* Fix for Win64 versus Linux 64-bit printf() operations in the crm114 script math operator :@:
	* Fix for OSB learncount updates in the CSS databases; before, illegal NEGATIVE values could easily occur, thus destroying your CSS databases completely.
	* Fixes for Markovian repeated for OSB Bayes, Winnow and OSBF classifiers.
	* Important FIX for Hyperspace VT: same as for FSCM (see above).
	* Fixed Hyperspace 'learn': would always print some '-t' output, even when '-t' was NOT enabled.
	* Fixed bug where script statements such as 'input' would try to store too much data into a variable: now such failures are properly trappable, as originally intended.
	* VERSIONING HEADER *update* now also stores the 'REVISION' to assist future migration operations. This BREAKS the current versioned CSS format, but it is better to break it now, when few people use it, than later when folks truly depend on it.
	* TESTS: added several script tests which verify additional normal and illegal behaviours.
	* configure: now also reports the install directory when done. This is done to help users who compile crm114 GerH from source and wish to know where their binaries are going to end up.
	* Fixed the distribute_files_from_src_dir.sh script which redistributes Bill's scripts in ./src/
	* configure: added the [sub]category testset make targets to the main 'make' file for ease of use:

	    megatest_ng:
	    timing_tests:
	    test_megatest:
	    test_megatest_ng:
	    test_basics:
	    test_other:
	    test_mailfilter:
	    test_mailreaver:
	    test_classifiers:
	    script:
	    initial_tests1:
	    fringe_cases_compiler_tests1:
	    initial_tests2:
	    further_tests:
	    all_classifier_tests:
	    markovian_classifier_tests:
	    OSBF_classifier_tests:
	    OSB_classifier_tests:
	    Hyperspace_classifier_tests:
	    Bit_Entropy_classifier_tests:
	    FSCM_classifier_tests:
	    Neural_Net_classifier_tests:
	    SVM_SKS_classifier_tests:
	    Correlator_classifier_tests:
	    CLUMP_classifier_tests:

2008-07-17  Ger Hobbelt

	* Fixed Makefiles, scripts and ./configure to fully support VPATH: e.g.
	  you can now build and test CRM114 from a specific (different) build directory, using this sequence for example:

	    mkdir _build
	    cd _build
	    ../configure --srcdir=..
	    make
	    make check

	* Added 'statistics gathering' (I called it 'profiling' when I wrote it about a month ago, but that name has so many re-uses here, and besides, 'statistics gathering' covers much better what it actually does) to OSB; I intend to add it to the other classifiers too for deep analysis support. With that comes a new tool to process these 'data traces' into useful numbers and graphs. Currently, the tool only produces some stats and some crude ASCII art, but that's just the start. 'Statistics gathering' this way has been chosen in order to permit 'tracing' multiple runs on CRM114 with the absolute minimum of extra load caused by 'collecting' the information. A separate tool has been written as I believe the data processing should NOT occur in CRM114 itself; I envision two major output channels, both for processing large quantities of data into human-parsable stats and graphs: number series and/or scripts, which should be fed to R (the Open Source 'S'-equivalent statistics package) and a second feed of a rather more novel type: rendered video. For that purpose, a future version of the processing tool will have a dependency on OpenEXR, as the HDRI nature of that particular format is especially suited to display data with a wide dynamic range.
	* Taking care of 'make check' and other 'make' test targets' verbosity: a few environment variables have been introduced:

	    CRM114_MAKE_SCRIPTS_DEBUG=1
	    export CRM114_MAKE_SCRIPTS_DEBUG

	  ... and, yes, don't forget that 'export' statement or it won't work within 'make': full verbosity for when you wish to debug the test/filter scripts themselves. (tests/testscript.sh.in + tests/crm114*filter.sh.in test output filter scripts) 'unset CRM114_MAKE_SCRIPTS_DEBUG' will turn test script debugging verbosity OFF again.

	    CRM114_CHECK_QUIET=1
	    export CRM114_CHECK_QUIET

	  Don't forget to assign it a non-zero number to turn it ON; setting it to '0' or 'unset'ting it will turn this option OFF: when ON, don't show diff reports unless we've got a test that FAILs. In other words: use CRM114_CHECK_QUIET=1;export CRM114_CHECK_QUIET to shut up 'make check' et al, so you only see a list of test titles and 'OK's flowing by - until you hit a FAIL, which will show you WHY it failed (diff output between actual output and reference (= expected) output).
	* Test script / 'make check' notes:

	  A) hassle-free refreshing / updating test references

	  The generic test script has been designed so that both testing AND refreshing reference data is a breeze: simply delete (or move) the reference data in ./tests/ref and the testscript will automatically copy the test outputs for the next runs to the reference directory (while printing a warning for each test reference which is updated in this way). Of course, you can also delete a single reference file: the next time this test is performed, its test result will be used as reference from then on. Procedure when validating new test results as 'correct' and old results as 'NOT correct anymore' is thus:

	    rm tests/ref/.*
	    make check (or other 'make ' which tests )

	  to update the reference. Any subsequent 'make check' or other 'make ' which tests will use that reference to test as usual.

	  B) test filters to cope with permissible variations in test output

	  The generic test script comes with a set of 'filter' scripts to 'postprocess' test output, BEFORE that output is fed to a comparator (diff). In fact, when you specify a filter script (4th commandline argument for testscript.sh) the testscript assumes the filter WILL ALSO PERFORM THE COMPARISON ITSELF. This is done to allow the filter scripts to compare output with reference in other ways that are not possible with UNIX diff.
	  Hence, the exit value of the filter script is used as an OK/FAIL signal: 0 = OK, anything else = FAIL. Note that the filter scripts accept custom environment variables to drive their behaviour (see tests/Makefile.am for several examples of this, especially with the classifier tests). The generic testscript itself also accepts an additional environment variable:

	    CRM114_CHECK_OVERRIDE=0; export CRM114_CHECK_OVERRIDE;

	  which can also be used as part of a 'make' action (see tests/Makefile.am for several examples) to 'override' the test results. In effect, those tests are thus completely overridden; they do execute, but since we know their results are buggy / unstable, we can prevent those tests from aborting test sets of which they are part. ('make check' and other 'make' test targets will stop at the first FAILed test.) Note that the value assigned to CRM114_CHECK_OVERRIDE is used as the 'permanent' test result; 0=OK, any other value will cause a 'permanent' failure.
	* All major/medium test targets have been 'exported' down to the root Makefile, so you don't need to 'cd tests' anymore to run a particular set of tests.

2008-05-25  Ger Hobbelt

	* Upgraded the custom M4 macros and otherwise pushed the autoconf bits into the Century of the Fruitbat, kicking and screaming. Seriously, the compiler flags, etc. are working as expected. Finally. And all this goodness requires autoconf 2.62 or beyond now, so stay abreast, my dears.
	* UNIX: CRM114 'syscall' acted up and barfed in the tests; amazingly this hadn't happened before, but it turns out it's a big difference if you compile with no optimizations (because the configure script did not properly check for them) or suddenly have access to -O3 / -O2 optimized binaries. Turns out there's an uninitialized variable in there.
	  While at it, checked the waitpid() man page, decided NOT to check the return value despite my anal retentive preferences in that regard, but instead made sure the error report now shows the correct info as children can also be aborted in various ways.
	* GCC 3 produced wads of warnings when finally -O3 was turned on. A single warning was particularly annoying and happened only in those places where I had overridden the free() function to ensure that the pointer passed to it is also NULLed once done to prevent any remaining code from accessing unalloced memory anymore - at least for the usual cases of such.

	    warning: dereferencing type-punned pointer will break strict-aliasing rules

	  Turns out the warning is not useful, but it took a bit of work to get rid of it. This required the use of GCC specific code constructs, which have been properly encapsulated in crm114_sysincludes.h
	* Added Neural Net tests to 'make check' again. Thanks to the new random generator, these don't take 15 whole minutes anymore: first test fails, second succeeds - within tolerances. (See tests/Makefile.am)

2008-05-21  Ger Hobbelt

	* Added the match_re_fringe1.crm test case after some new discoveries regarding crm114 matching and variable rewriting which caught me unaware.
	* mailtrainer.crm had a bad return statement, which nevertheless worked out in the old days as all CRM114 variables have global scope and the called function used the same variable as the caller. Harumph.
	* QUICKREF.txt now includes preliminary documentation for the new CRM statement:

	    cssmigrate (:report:) [:src:] (:dst:) /params/

	  This is not yet implemented, but intended to load vanilla CSS files for production classifiers and convert those files to something 'native' for the GerH builds on all platforms, both 32- and 64-bit.
	* Added extra syntax checks for liaf/fail/alius script commands in the script compiler to warn script programmers of misuse of these: these commands expect to live in a {} block.
* Fixed the compiler to properly handle MSDOS formatted script code: the number of generated opcodes would sometimes overflow the pre-allocated space without warning, resulting in silently(!) truncated scripts when CRM114 was fed CRLF-line terminated code. (This is very probably a bug introduced by yours truly when starting work on the script compiler. :-( ) * '-T' mode now dumps the compiled opcodes of the script with all attributes for further analysis. * Tweaked the FAIL targets to NOT point past the last valid line of script anymore as the debugger would not like such a thing. As we have the extra empty line added to every loaded script anyhow, the outer-most FAIL target will be that empty line. This does not make a difference to the script execution; it is merely to keep the debugger a bit simpler (it is already complex enough) and still allowing the debugger to catch possible crm-internal compiler errors. * Fixed the debugger to allow it to have command line history recall on UNIX when 'readline()' is available. This now makes the debugger as capable on readline-enabled UNIX systems as it already was on Win32. * Added the mk_absolute_path() function to the code for cross-platform filepath expansion. Can be used to report the precise filepaths for files which report access failures, etc. - sometimes files can be found in several places on your discs and then it is handy to know _exactly_ which file crm114 has been looking for. * Few minor fixes in the error reporting functions, following a partial code review, and added (int) typecasts to strlen() and other functions which produce 'size_t' results, which are fed into 'int' targets. This has been done to reduce the number of warnings when compiling this code in pedantic mode on UNIX and Win32/64 platforms. 
	* fixed code for eval to ensure we have at least a chance at retrieving a valid variable name (where the empty var '::' is also accepted) using the generally applied coding pattern:

	    len = crm_get_pgm_arg(varname, MAX_VARNAME, apb->p1start, apb->p1len);
	    len = crm_nexpandvar(varname, len, MAX_VARNAME);
	    if (!crm_nextword(varname, len, 0, &varnamestart, &varnamelen)
	        || varnamelen < 2)
	    {
	        // we do accept the special 'empty var' :: here as a valid var: it does exist after all :-)
	        nonfatalerror("bla bla bla");
	    }

	  As it is, this should rather be packaged into a 'get_variable_name()' wrapper call or something, because we still don't check for proper ':' delimiting, so code like this is okayed by the compiler but produces inaccessible results (or other errors):

	    alter (x) /foo/ #note the lacking : : around the 'x'

	* Code has been reviewed for the variable fetch pattern described above; several other spots have seen minor edits too. (e.g. 'isolate', 'window')
	* FSCM contained code to accept \-escaped whitespace in filenames, but only in 'classify'. This recurred in the apparent copy&paste Neural Net code which had the same 'feature'. REMOVED. (Until we re-introduce such a feature in a later release, but then for all filenames in script parameters everywhere.)
	* Cleaned up OSB a little bit (RIDICULOUS_CODE): there was some old stuff in there that would not do what it was expected to in non-standard situations anyway, so it could be easily discarded.
	* Removed the last few remaining vestiges of this abomination to fetch a word from string space:

	    while (htext[i] < 0x021) i++;
	    j = i;
	    while (htext[j] >= 0x021) j++;

	* '-T' mode now dumps the (pre-)processed script code with line numbers for easier analysis of medium and large scripts.
	* Added Marsaglia's CMWC RNG code to have a cross-platform reproducible good quality random generator: rand() is non-portable.
	  This showed in the Neural Net tests; that classifier requires a little 'noise' to settle; the new CMWC was chosen because it is fast, has good random characteristics and is very portable so that test results can be compared properly across platforms (32/64 UNIX/WIN). As I am a lazy bum, I took the implementation from here: http://www.agner.org/random/ which is GPL'd code so should be okay license-wise inside CRM114.
	* Fixed script command flags decoder routine crm_flagparse() so it would not cause the code to jump multiple traps when several flags would report an error. Come to think of it, I should rather make sure the nonfatal() code (and other bits) NEVER get a chance to jump several trap targets as multiple errors are reported by a single script command being executed. HMMMMM.... Ah well, a single message for several flags being not allowed or otherwise faulty is an improvement anyhow.
	* DITCHED the special code for the tools from the Win32 and UNIX builds in a pre-emptive move towards including those features inside CRM114 itself. Too much trouble to keep these up-to-date when I know their lifetime is severely limited. crm_util_errorhandlers.c is now officially obsoleted and discarded from the distribution.
	* Fixed CRM114 to ALWAYS check if the :_env_PWD: and :_env_USER: global vars can be set from the environment variables. If not, set these using system calls so as to guarantee that both are set and at least filled with _something_ that's probably valid when the actual script is executed.
	* Added part of the vanilla VT code into the VT source so I can better validate the test results against megatest_knowngood.log - the new VT code includes some (I think) bug fixes which make it produce a slightly different number of features, which results in different test results. And that annoys 'make check' too much.
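
For readers unfamiliar with the CMWC generator mentioned earlier in this entry, the sketch below shows the widely circulated CMWC4096 form of Marsaglia's complementary-multiply-with-carry scheme. It is NOT the code shipped in CRM114 (that was taken from the URL given above); the type and function names (cmwc_state, cmwc_seed, cmwc_next) and the LCG-based seeding are assumptions made for illustration. It uses only fixed-width integer arithmetic, which is why the same seed reproduces the same sequence on 32- and 64-bit platforms.

```c
#include <stdint.h>

#define CMWC_LAG 4096            /* lag table size; must be a power of two */

/* Hypothetical state container; not a CRM114 data structure. */
typedef struct {
    uint32_t q[CMWC_LAG];        /* lag table */
    uint32_t carry;
    uint32_t index;
} cmwc_state;

/* Fill the lag table from a simple LCG (illustrative seeding only). */
static void cmwc_seed(cmwc_state *s, uint32_t seed)
{
    for (int i = 0; i < CMWC_LAG; i++) {
        seed = seed * 1664525u + 1013904223u;
        s->q[i] = seed;
    }
    s->carry = 362436u;          /* initial carry from the commonly published version */
    s->index = CMWC_LAG - 1;
}

/* One complementary-multiply-with-carry step. */
static uint32_t cmwc_next(cmwc_state *s)
{
    const uint64_t a = 18782u;          /* multiplier */
    const uint32_t r = 0xfffffffeu;     /* base - 1 (base is 2^32 - 1) */
    uint64_t t;
    uint32_t x;

    s->index = (s->index + 1) & (CMWC_LAG - 1);
    t = a * s->q[s->index] + s->carry;
    s->carry = (uint32_t)(t >> 32);
    x = (uint32_t)t + s->carry;
    if (x < s->carry) {                 /* wrapped: propagate into the carry */
        x++;
        s->carry++;
    }
    return s->q[s->index] = r - x;
}
```

Two states seeded with the same value yield identical sequences, which is the property the Neural Net tests rely on when results are compared across platforms.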
	* Had been a bit too enthusiastic with applying the crmhash_t type throughout the VT code: now the coefficients have the regular 'int' type they should have had from the start.
	* Fixed dumb bug in VT code when custom vectors are supplied. 'make check' is happy again.
	* MAILREAVER/MAILTRAINER/MAILFILTER: preliminary port to Win32 included.
	* MAILFILTER.CF: BASE64 decoding is now DISabled by default; this is just for testing (though I found OSBF has a provision for big token clustering into a single feature, so only emails with human readable content encoded as BASE64 are adversely affected by this move when you don't revert it in production - note that the new VT tokenizer also includes this 'big token clustering' feature).
	* There is now a better way to keep the vanilla and Win32 build header files in sync regarding build numbers, etc.: these have been added to the configure script so I don't have to update them by hand every time (which I often forgot).
	* Fixed bug in crm114_classify.filter.awk test result postprocessing script, which made the Correlate classifier tests barf.

2008-05-05  Ger Hobbelt

	* Of course I had to screw up again. Apparently I had never hit any error reports larger than ~ 256 characters, as when I was taking the next step towards a single-scan compiler, a bug in fwrite4stdio() caught me behind the ear: error messages and other displays would repeat after 256 chars instead of printing the _complete_ message. FIXED.
	* Compiler now comes with full argument parsing and argument count validation using the table in crm_compiler.c. Of course, one may expect trouble then and several troubles have already been addressed in the table and the validation code. Basically, JIT is dead and gone now.
The things left TODO: - get rid of the preprocessor - make the mct[] array a single allocated array instead of a series of blocks - no use having that - refactor the code, such that calls like crm_statement_parse() do not needlessly rescan the command too, or rather have it's state machine parse the complete command for us as it does that already basically: this functionality is duplicated in several ways in both the preprocessor and the compiler core. * Fixed a few bugs in the latest boundary checking and error reporting code. This was not catched by 'make check' and I just found out when trying to run mailreaver.crm - oops. :-(( Definitely in need of a more sophisticated test set to prevent this sort of boogers. * Further preparation for the css* commands in the CRM114 compiler: the compiler should now be able to cope with 'non standard flags', i.e. ignore them when the statement definition table (crm_compiler.c) says we should expect them for this particular command. * Programming bugs in the vanilla test scripts: fataltraptest.crm --> an extra fataltrapprogramerr.crm test case has been created to test this situation, while fataltraptest.crm has been fixed to perform the intended tests. +Too many angled '<>' arguments were specified for this command: we see you specified 1 args while the maximum required is 0. +Sorry, but this program is very sick and probably should be killed off. +This happened at line 14 of file overalterisolatedtest.crm: + alter <> (:z:) /:*:2:/ +(runtime system location: crm_stmt_parser.c(630) in routine: crm_statement_parse) --> overalterisolatedtest.crm has been fixed to adhere. +crm114: *ERROR* +Too many slashes '//' arguments were specified for this command: we see you specified 2 args while the maximum required is 1. +Sorry, but this program is very sick and probably should be killed off. 
+This happened at line 6 of file exectest.crm: + syscall ( ) (:lsout:) /ls *.c/ /[Windows-MS]dir *.c / +(runtime system location: crm_stmt_parser.c(651) in routine: crm_statement_parse) --> exectest.crm has been fixed too: this was my own mistake as I hadn't removed the old Win32 syscall hack in there. :-( This code does not run (anymore when ever it did) on Win32 as it is; I should code an extra bit code in there to check that hosttype and make it Win32 compatible the /proper/ way. +crm114: *ERROR* +Too many slashes '//' arguments were specified for this command: we see you specified 2 args while the maximum required is 1. +Sorry, but this program is very sick and probably should be killed off. +This happened at line 10 of file randomiotest.crm: + syscall /rm -f randtst.txt/ /[Windows-MS]del randtst.txt/ +(runtime system location: crm_stmt_parser.c(651) in routine: crm_statement_parse) --> one more of my own legacy boogers. I *LIKE* the new compiler already. :-) Yes, I do! +crm114: *ERROR* +Too few angled '<>' arguments were specified for this command: we see you specified 0 args while the minimum required is 1. +Sorry, but this program is very sick and probably should be killed off. +This happened at line 2 of file (from command line): + learn (q_test.css) /[[:graph:]]+/ +(runtime system location: crm_stmt_parser.c(630) in routine: crm_statement_parse) --> heck, this was legal in the old days, right? The new CRM114 has a dummy 'autodetect' call in there, which does not do any autodetecting at all but just point to the old default. Which is another way of saying it's a work in progress, that. Nevertheless, this should stay, so the compiler definitions will be fixed for this. Fixed in the compiler table. * Fixed configure.ac as 7zip was suddenly unrecognized since a few releases. 
	  :-(

2008-05-01  Ger Hobbelt

	* BillY-equivalent src makefile is now renamed to Makefile.vanilla to better signify the raison d'etre for this file: a 'vanilla UNIX setup' for those who won't or can't compile the ./configure-based set. With that comes an appropriately preconfigured config headerfile and #define to turn this on - set by default in src/Makefile.vanilla, of course:

	    #define             : ORIGINAL_VANILLA_UNIX_MAKEFILE
	    #include headerfile : "config_vanilla_UNIX_sys_defaults.h"

	* CHANGE: now all :@:: and :+:: accepting script elements also accept :#:: - it was very handy to be able to print string/variable content length in output /.../ statements, that's why.
	* CHANGE: Vector Tokenizer now accepts 3D vector matrices (i.e. multiple matrices for stride != 1 processes), comes with its own structures to define custom tokenizer functions for use by the tokenizer and the code has been completely refactored to facilitate these new features in the best possible way. Currently the VT code accepts up to 4 2D vector matrices in parallel; currently published classifiers only use 2. This is a compile-time constant:

	    #define UNIFIED_VECTOR_STRIDE 4

	* Fixed input line handling for MSDOS & older MAC which do not use the UNIX LF ('\n') line termination, but CRLF and CR respectively. (crm_expr_file_io.c)
	* CHANGE: speed improvement for Win32 stdio by removing the fflush() calls, which are useless anyway. WARNING: those statements proved not THAT useless: without them the syscall command would keep repeating previously generated output when stdout is redirected to file. See also the crm114 dev mailing list; suffice to say that after a long time looking for the cause of this issue, the _proper_ fflush() calls were added to the syscall() code itself - where they _should_ be.
	  (Though I hate to think of a fork()ing system which requires these tricks to keep things going smoothly; that's one of the reasons why I personally avoid fork() like the Plague; it's only useful in very particular circumstances and besides, Win32 doesn't have fork() at all, so fork()ing code is rather unportable to boot. Use threads or other modern multitasking/async I/O means instead.)
	* Win32 does NOT like writing (fwrite(), ...) large amounts of data to stdout/err at once (errors happen beyond 64K at least, sometimes earlier, depending on your setup and Win32 platform). This has been fixed by replacing relevant calls by the new function: fwrite4stdio() which writes the data in 'reasonable' chunks. 'reasonable' is calculated as about half of a regular FILE buffer. This causes no noticeable performance loss on either Win32 or Linux (SuSe64).
	* SKS/SVM: removed useless expandvar() calls for the filename regexes.
	* SCM: removed from the source tree and make files: this code was long since antiquated and does not show up in the BillY releases anyway.
	* FSCM: file I/O data structures have been made cross-platform portable using [u]int32_t and other fixed-width data types; other classifiers already had this treatment but this one had apparently slipped the dance.
	* The error handling routines now 'know' when we're inside the debugger, so failures in watched expressions (which can easily happen ;-) ) won't have side-effects such as execution of trap handlers in the script itself (ouch!).
	* compiler checks (table in crm_compiler.c) have been corrected for some of the more modern (experimental) classifiers.
	* Improved the debugger and updated QUICKREF.txt accordingly.
	* Added the debugger-related script system variable :_?: which contains the latest error message from the CRM114 run.
	* Debugger now also pops up when the complete script has run so the debuggee/user can view the last results before the application terminates.
	  Sometimes code jump commands may still work, though the complete context cannot be reset of course (feed from stdin, etc.).
	* Debugger 'v' command: line range handling split off to a separate routine; now handles a more flexible format too (every number can be 'defaulted').
	* CRM114 error reports now also show the offending source code in print.
	* Did we now finally stamp out all the boundary bugs in expandvar()? -- This following a bug report by Jason Lewis, which showed a boundary bug in vanilla CRM114, but upon renewed inspection I found the GerH copy still could also suffer from boundary issues (though in fewer, rather obscure conditions, yet a boundary bug is one too many anyhow). This build includes fixes for all known/detected boundary issues in there (see for a list of similar material in the calling code further below), including fixes for coping with extremely large variable names which need :@: or other processing. KNOWN BUG: nested :@: statements are not parsed properly, e.g.

	    eval /:@::@:1+1::/

	  does not produce what you'd expect ('2'); this is important to know when using the CRM114 debugger as the watched expressions use the same expandvar() routine to produce their results. See crm_debugger.c
	* Following discussion about the need to escape other characters in any crm114 command arg, here's the comment from crm_Expandvar.c (that one again):

	    //
	    // As 'datastring' points to a text buffer which has - of course -
	    // already been \-de-escaped, we perform a little trick here to
	    // prevent requiring the user to specify double \-escaped '/'
	    // slashes in his/her restriction regex: since we know we're
	    // the only regex in this box, we do this by walking back from the
	    // end of the string, looking for the last '/' there is.
	    //

	* Checked all malloc/calloc/realloc/strdup class calls and added untrappableerror() statements for those spots which did not check for NULL returns - which would otherwise lead to crashes/coredumps.
	* expandvar() code cleanup - at least a bit.
KNOWN BUG: ":@: :@: 1 + 1 ::" is not handled properly by expandvar: nested match expressions in an 'eval' statement screw up BADLY. TODO: for performance reasons, the quad loop through the buffer for each expansion should be reduced to a single round only. This can be nicely combined with the fix for KNOWN BUG above. * debugger: 'c' restored to its proper glory: 'c' without any arg should just GO. We've got 'n' for single stepping, don't we? * UNIX syscall: KNOWN BUG: some weirdness is now occurring when crm114 scripts execute syscall and their own output is redirected to file (such as for the 'make check' testcases): it looks like the crm114 output buffer gets rewritten to stdout completely, every time a syscall is executed. :-S * makefiles/configure: upgraded to the latest autoconf/automake/etc. KNOWN BUGS: some of the AX_... macros use undocumented AC_RUN_LOG() which does not exist anymore in the latest release. This must be fixed (diff commandline flag checks and a few others suffer from this) * makefiles: 'make dist' will not work from a VPATH-style setup (see automake/autoconf manuals for what VPATH is). 'make distcheck' does not fail on this, but you can see it happen when running this check. * makefiles: 'make check' target now does something VERY useful: it runs all the megatest tests (except Neural Net) as separate tests. See tests/Makefile.am for the individual make targets: use these to test only a subset of the available tests -- very handy when debugging a certain bit of code or classifier. This is another step towards a complete 'standardized' make/conf setup: ./bootstrap <-- for maintainers only! ./configure make make check make install make clean and for maintainers the extra: make distcheck make dist make distclean * KNOWN BUG: 'make distcheck' does not really like my 'alternative' .tar.gz distro archive naming scheme, so it barfs after running most of the tests. This needs to be fixed. 
* Win32: KNOWN BUG: syscall() is b0rked: generally it'll work, but there are certain spurious conditions related to the OS and environment which can cause the current code to hang forever. This should be fixed by moving to the new fully async handling scheme -- that one is not ready for release yet. :-(
* makefiles: if you want a verbose 'make check', i.e. when you want to debug the scripts, do so using the environment variable:
      CRM114_MAKE_SCRIPTS_DEBUG=1; export CRM114_MAKE_SCRIPTS_DEBUG;
* makefiles/tests: a new testscript.sh[.in] test driver has been added. This driver is used to run the individual tests; several specialized postprocessing filter scripts (using SED and AWK) have been added too, so results of a test can be validated in a flexible way. For example, see this classifier test which takes certain floating point issues into account upon auto-comparison/validation:
      Support_Vector_Machine_Unigram_test1: $(TEST_PREREQUISITES) test_timing_seconds_announcement
          $(E) "*******************************************"
          $(E) "* Support Vector Machine (SVM) unigram classifier "
          $(E) "*******************************************"
          [...]
          $(SILENT)\
          CRM114_CHECK_FILTER_ARGS="-v prob=0.1 -v pR=0.2"; export CRM114_CHECK_FILTER_ARGS; \
          ./testscript.sh '-{ isolate (:s:); {classify < svm unigram unique > ( i_test.css | q_test.css | i_vs_q_test.css ) (:s:) /[[:graph:]]+/ /0 0 100 1e-3 1 0.5 1 1/ [:_dw:] ; output / type I \n:*:s:\n/} alius { output / type Q \n:*:s:\n/ } }' \
              $(srcdir)/mt_ng_Support_Vector_Machine_SVM_unigram_1.input \
              $(refdir)/Support_Vector_Machine_Unigram_test1.step4.refoutput \
              $(builddir)/crm114_classify.filter.sh
          [...]
  Note the CRM114_CHECK_FILTER_ARGS environment variable in there, which sets custom AWK variables to configure the custom AWK/SH postprocess filter script 'crm114_classify.filter.sh' to tolerate a certain spread in 'prob' probability and 'pR' values. See also the 'crm114_classify.filter.awk' AWK script which does the postprocessing.
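  The 'tolerated spread' idea behind the prob=0.1 / pR=0.2 settings above can be sketched in C as follows. This is only an illustration under assumptions: the real filter is the AWK script named above, within_spread() is a hypothetical helper, and the comparison is assumed to be on the absolute difference, which the AWK script may or may not do.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of CRM114): returns 1 when 'actual'
 * lies within 'spread' of 'reference', 0 otherwise.
 * ASSUMPTION: the spread is an absolute difference.
 */
static int within_spread(double reference, double actual, double spread)
{
    double delta = reference - actual;

    if (delta < 0.0)
        delta = -delta;
    return delta <= spread;
}

/*
 * Hypothetical check for one classifier result line: both the
 * probability and the pR value must stay within their own spread.
 */
static int result_matches(double ref_prob, double prob, double prob_spread,
                          double ref_pr, double pr, double pr_spread)
{
    return within_spread(ref_prob, prob, prob_spread)
           && within_spread(ref_pr, pr, pr_spread);
}
```

  With the settings from the example, a reference line with prob 1.0 and pR 12.5 would accept a test line with prob 0.95 and pR 12.6, and reject one with pR 13.5.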
The concept
-----------
testscript.sh is the main driver: it starts CRM114 with the specified script (either a file or 'literal' '-{...}' script code), feeds CRM114 any specified input file/literal text over stdin, and stores the stdout and stderr output in temporary files. These are merged with the CRM114 exit code into a template file, which is fed, together with its equivalent reference file from the tests/ref/ directory, to the postprocessing section: this is either a custom script which compares both files, or standard UNIX diff. Any differences found are reported to the console and will cause the test to 'FAIL'. A match with the reference file will produce an 'OK' instead.

The only custom postprocessing/diff script at this date which accepts extra parameters is crm114_classify.filter.sh, which gets its extra args through the aforementioned environment variable. The other custom scripts (tests/ref/*.filter.sh) serve particular purposes and don't require parameterization.

Want to re-generate the master 'known good' files?
..................................................
Simple.
    cd tests && rm -rf ref/
and you got rid of them all. Next, run
    make check
as usual and you'll get a warning for every missing reference file, which will be automatically generated. Though, of course, I'd rather see you just 'rm -f' only those ref/ files that need updating...

* TODO: tests/Makefile.am cleanup: some of those lines are now _extremely_ long; these may be broken into readable bits.
* Removed old cruftiness in classifiers:
      i = 0;
      while (htext[i] < 0x021) i++;
      CRM_ASSERT(i < hlen);
      j = i;
      while (htext[j] >= 0x021) j++;
  Now only the new crm_nextword() replacement calls remain in the code.
* Code Cleanup: checked all heap allocation calls and made sure return values are checked for undesirable NULL pointers. This should later be migrated to a CRM114 equivalent of GNUlib xmalloc() et al - but this time with the option to pass along a customized error message to provide the user with more information when such a thing happens.
  + crm_nextword() calls are now all properly checked for their success/fail return code, instead of the previous situation: mostly no check at all, sometimes a check on the return code, sometimes a check on the side effect that the returned length will be zero. Some of the superfluous length checks still remain, e.g.
        if (crm_nextword(htext, hlen, fn_start_here, &fnstart, &fnlen) && fnlen > 0) {
    where this suffices completely and utterly:
        if (crm_nextword(htext, hlen, fn_start_here, &fnstart, &fnlen)) {
    as crm_nextword() will not return zero-length tokens as 'valid' items.
* Code Cleanup / Security Fixes for potential out-of-bounds accesses:
  - crm_nextword() can always report there are no more tokens to be had (fail return code); several spots in the code did not explicitly take care of this situation, some obviously assuming 'nothing can go wrong here'. Right. The complete code has been inspected for this flaw and wherever it was found, it has been fixed.
  - This three line pattern which appears throughout the code is a potential security hazard:
        crm_get_pgm_arg(htext, htext_maxlen, apb->p1start, apb->p1len);
        hlen = apb->p1len;
        hlen = crm_nexpandvar(htext, hlen, htext_maxlen);
    as the actual product of crm_get_pgm_arg() does not HAVE to be apb->p1len (4th arg) characters long. This can happen when working with very large strings which do not fit in MAX_PATTERN, MAX_FILE_NAME_LEN or other pattern buffer sizes used in the code. Unfortunately crm_get_pgm_arg() was a void function so the caller would not hear about the actual length of the buffer returned by this routine. Hence the call now has an 'int' return type and returns the actual number of characters written to the pattern buffer. This means the subsequent crm_nexpandvar() call no longer runs the risk of processing too much input which it must believe is legal data. The new coding pattern looks like this:
        hlen = crm_get_pgm_arg(htext, htext_maxlen, apb->p1start, apb->p1len);
        hlen = crm_nexpandvar(htext, hlen, htext_maxlen);
    and has been applied throughout the code: all source code has been reviewed for this issue. The same goes for the
        hlen = crm_get_pgm_arg(htext, htext_maxlen, apb->p1start, apb->p1len);
        hlen = crm_restrictvar(...
    sequence, by the way...
  - A related issue is that callers of crm_nexpandvar() quite often write an extra NUL byte into the buffer, matching the sample shown above:
        htext[hlen] = 0;
    This happens either directly (shortly afterwards in the code) or indirectly (through one or more crm_nextword() calls which point at a part of the buffer, after which the NUL byte is written at the length position returned by the crm_nextword() call). Both situations will cause an out-of-bounds write operation, which will corrupt your data/stack, when working with very large strings and variable names. I'm sure variable names of ~ 2046 characters (excluding the two ':' delimiting colons) can be made to cause this issue; of course it is not 'average day use' but security risks are by definition situations which MAY cause trouble, and here we have one such situation. The code has been reviewed for this issue and a multitude of places where this applies were located. Due to the amount of effort required to fix this in the callers themselves, it is now officially assumed that crm_nexpandvar() will be used in situations where the returned length will be used by the caller to write an _extra_ sentinel byte to the buffer. To ensure that the caller can do so legally, crm_nexpandvar() will AT ALL TIMES return a length of no more than the maximum buffer width passed to it MINUS ONE. This rule applies even under the harshest error conditions inside crm_expandvar(). That means that this call
        hlen = crm_nexpandvar(htext, hlen, MAX_PATTERN);
    will never fail this subsequent assertion:
        CRM_ASSERT(hlen >= 0 && hlen <= MAX_PATTERN-1);
    so that a subsequent line like this:
        htext[hlen] = 0;
    is LEGAL from now on. The impact is that variable names and other text segments are now limited to one character less in size, but at 2-16K limits this is very acceptable.
    ALSO, several code sections calling crm_expandvar() assumed it would write such a NUL sentinel itself; to prevent other odd behaviour inside the expandvar() code, this is no longer the case; besides, the old code did not do it either under erroneous conditions, so there was no guarantee at all the NUL sentinel would be there when the crm_expandvar() call was done. The complete source code has been inspected for these issues and all locations have been fixed/updated to show this behaviour consistently. THE ONLY THING we did NOT do EVERYWHERE is add that NUL-sentinel anyhow: only the code sections that seem to require this NUL sentinel have been updated, if the sentinel wasn't yet written by the calling code, due to the incorrect assumption mentioned in the previous paragraph.
  - In multiple locations where script variables and buffers were copied/loaded, no length check was performed, so invalid results would be generated for extremely large variable names, which are longer than the compile-time defined sizes. Now many places in the code have added boundary checking and additional error messages to report this type of situation back to the user. TODO: inspect the complete source code for this issue and fix accordingly. Now several boundaries seem to be silently applied: no crash, indeed, but still behaviour based on partial (stripped) inputs and no failure notices for the user.
* 'crm114 -v' now also prints the revision number for proper version checking.
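  The 'maximum buffer width MINUS ONE' contract described above for crm_nexpandvar() can be sketched in C as follows. This is a minimal illustration and not the CRM114 code: bounded_fill() is a hypothetical stand-in which only copies text, but it follows the same rule of returning the number of characters actually written, capped at the buffer size minus one, and it leaves the sentinel to the caller.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for a routine following the contract
 * described above (NOT the real crm_nexpandvar()): copy at most
 * (maxlen - 1) characters of src into dst and return the number of
 * characters actually written. No NUL sentinel is written here;
 * that remains the caller's job.
 */
static int bounded_fill(char *dst, int maxlen, const char *src, int srclen)
{
    int n;

    if (dst == NULL || src == NULL || maxlen <= 0 || srclen <= 0)
        return 0;
    n = (srclen > maxlen - 1) ? maxlen - 1 : srclen;
    memcpy(dst, src, (size_t)n);
    return n;
}

/*
 * Caller side: because the returned length never exceeds
 * (maxlen - 1), writing the sentinel at dst[len] stays in bounds
 * whenever maxlen is at least 1.
 */
static int fill_and_terminate(char *dst, int maxlen,
                              const char *src, int srclen)
{
    int len = bounded_fill(dst, maxlen, src, srclen);

    if (maxlen > 0) {
        assert(len >= 0 && len <= maxlen - 1);
        dst[len] = 0;
    }
    return len;
}
```

  Calling fill_and_terminate() with an 8 byte buffer and a 25 character source keeps only the first 7 characters: the truncation costs one character of capacity, which is the trade-off noted above.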
* SCRIPT: fixed :@: integer printing: integers are now always printed without any trailing decimal part a la floating point values.
* Significantly reduced the number of annoying warnings on Win32/64 by casting size_t returning functions to (int) where applicable.
* The script compiler now adds a reference to the command definition record for each JIT/compiled command, so other code (debugger!) may reference this extra info.

2008-04-13 Ger Hobbelt

* COLOPHON.txt fixed (Debian bug #471554: typo in COLOPHON.txt)
* Vector Tokenizer changes to allow full custom 3D vector matrices; also refactored to permit the use of custom tokenizer functions, etc.

2008-03-31 Ger Hobbelt

* Fixes in -err/-out handling; also fixed the 'output' script command: output [stderr] will _still_ go to the '-err filename' redirected stderr, just like it would when you'd called crm114 using
      crm114 .... 2> filename
  instead.
* Fixes in the 'updated' crm114 parser/compiler to properly run mailreaver.crm et al again.
* Added stricter checks (now checks at all ;-) ) in the compiler: decoded flags in element are checked against the predefined allowed flag set (in crm_compiler.c). If any flag is NOT in the allowed set, an error will be reported. This resulted in a significant update of the stmt_table[] flag set definitions for each command to match the rest of the code. QUICKREF.txt has been updated too in several places to mimic these.
* mailreaver et al contained a few \: escapes in their trap handlers which shouldn't be there. Removed.
* fixed the fault/trap handlers: now the offending source line itself is included in the error report. Yes, this is a kinda kludgy fix for another nasty issue: the line numbers reported by CRM114 are 'nearby' at best, due to the fact that the [errorhandling] code is not aware of the modifications performed by the preprocessor: line breaking, 'insert' processing, line merging ('\' continuation char at end of line pulls next line up). In fact, the code NEVER tracks 'original source line/position/filename' info for the script sources. This also means that 'insert' will cause crm114 to report some really fancy linenumbers in the _main_ source file when the offending line is in the insert-ed code or follows that 'insert' statement in the main source file. This will not be fixed until we introduce 'sourceinfo' tracking code which 'remembers' where the source code came from on a per-char basis: filename/line/position -- remember the preprocessor does line breaking, triggered by comments, semicolons and the recognition of another 'token' outside []()//<> braces anyway!
* Fixed a nonfatal OSB Bayes file access failure report:
  WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING
  The complete code should be inspected to
  (a) determine which errors are _really_ fatal, as the code is littered with fatalerror()s which at least look like they're not all that fatal anyhow: nothing a trap handler in the script can't handle.
  (b) nonfatalerror()s should probably abort the current command.
  An example for (b) which was changed in the code: when one of the CSS databases cannot be opened for some reason, the classify run should be aborted immediately, because that file will be lacking from the evaluation, thus - at the very least - inadvertently skewing the pR results to numbers which might lead to wrong conclusions. In this particular case, the good.css was missing, but the code continued on and was _lucky_ to find spam.css was also lacking, or spam.css would've been used to find the 'good guys' this way. This is not a severe issue for classify calls which include only 2 files - though the validation for this is also lacking, so you still get weird results at times - but it is harder to tolerate when the classify should run across multiple CSS files: if one is missing from the action for some reason, the 'pR' output cannot be trusted.
Hence, such nonfatalerror() issues should abort the command by returning to caller immediately. GROT GROT GROT: Currently there's also been [almost] no checking if all memory and handles are correctly freed/released when aborting a command due to a nonfatalerror(). 2008-03-26 Ger Hobbelt * updated to Bill's latest release (20080326) * fixed a dumb mistake in crmregex_tre.c which I'd introduced myself :-( 2008-03-25 Ger Hobbelt * 'int'ified ('INTifada') crm114 sources - finally got fed up and killed the 'long' fetish. INTifada: 'long' fetish has been removed from the code: now everything is an 'int' like every good boy should, unless it specifically NEEDS to be something else. Process to redo this in a blunt way (I did it the hard way: partial code inspection): convert 'long long' to int64_t and unsigned long long to uint64_t find timing code and make sure those clock_t derivatives are 'clock_t' instead of long s/\/int/g * Added fwrite_ASCII_Cfied() routine to report variables, etc. to stderr/stdout, so non-ascii variables, mucked up scripts, etc. don't get a chance to blow up the console anymore. * Merged latest Neural Net code * Bugfixes in the crm language parser (variable hash storage) * Language parser could not cope with the line alius # line comment Nor would it handle # urgh \# alius correctly by processing - you wish - the 'alius' opcode in there. The new code can cope with series of comments and inter-command comments like these lines: # so? \# output # what? \#/checking for a foo.../#if we mingle comments with code? \# # like this? See alius_w_comment.crm for a few test cases. 
* Bugfix in 'file1 | file2 | file3' regexes in SKS and SVM: now correctly copes with megatest.sh's
      ./crm114 '-{window; learn ( i_test.css | q_test.css| i_vs_q_test.css ) < svm unigram unique > /[[:graph:]]+/ /0 0 100 1e-3 1 0.5 1 1/ }'
  Note that 'q_test.css|' in there: file2 would be decoded by the old regex, but there was no match for file3 as the relevant part of the old regex said:
      [[:space:]]+\\|[[:space:]]+([[:graph:]]+)
  Note the '+'-es in there following those [[:space:]] classes.

2008-03-24 Ger Hobbelt

* Latest gcc versions recognize and handle trigraphs like you'd expect ;-) so all "???" sequences have been converted to "\?\?\?" to keep GCC and other compilers from unintentionally recognizing a trigraph there.
* added second // slash arg to the CRM 'syscall' command to specify alternative commands for other OS-es. This is intended to allow CRM114 and CRM script writers to cope with the absolutely NON-portable syscall operation: this is only portable across UNIX boxes - and then only to a degree. So the second slash arg has format:
      /[hoststring]commandline/
  where 'hoststring' is the compile-time HOSTTYPE string, which, for example, for the Win32 builds is 'Windows-MS'. BTW: hoststring-comparison is caseINSENSITIVE. The square brackets are used to delimit the hoststring, so when the 'hoststring' matches the compile-time constant in your CRM114 binary, the 'commandline' will be used to replace the commandline as specified in the first // slash arg. See randomiotest.crm for an example.
* Fixed OSBF/OSB/Markovian classify code when too many CSS databases to use for classification have been specified - this prevents an array-overrun & out-of-bounds memory access.
* Got fed up with the minor differences between Win32 and UNIX megatest: created a copy in megatest_ng.sh, which should produce identical results to megatest.sh, but uses input files (*.input) instead of the UNIX shell <<-EOF...EOF code. This is mirrored in the new megatest.bat, which can thus use the same input files; I've found that the difference between a LF or CRLF line termination byte sequence does lead to some significant differences in the test results.
* Major fixes to the Win32 syscall command; now exectest.crm does actually run on Win32. Please note that UNIX commands still won't work all of a sudden on a Win32 box - unless you provide the proper binaries and environment - but at least vanilla syscall, syscall and even syscall now work as expected: no crash or indefinite lockup anymore! Note that this still applies:
      "Sorry, syscall to a label isn't implemented in this version "
* Adapted error reporting to match Bill's code. See the bill_style_errormessage constant in crm_[util_]errorhandlers.c - maybe we'll make this run-time configurable one day, instead of compile-time like it is today... Bill's format puts the code info at the end, instead of at the start, and looks like this:
      (runtime system location: crm_expr_file_io.c(151) in routine: crm_expr_input)
* Win32 megatest now tests OK for these classifiers (run
      ( megatest.bat 2 2>&1 ) > megatest_win32.log
  in a DOS box to verify):
      **** Default (SBPH Markovian) classifier
      **** OSB Markovian classifier
      **** OSB Markov Unique classifier
      **** OSB Markov Chisquared Unique classifier
      **** OSBF Local Confidence (Fidelis) classifier
      **** OSB Winnow classifier
      **** Unigram Bayesian classifier
      **** unigram Winnow classifier
      **** OSB Hyperspace classifier <-- matches better when you define
               #define USE_FIXED_UNIQUE_MODE 0 /* zero! turn OFF to match Bill's! */
           WARNING: there's a VT bug in the 20080324 wget copy of CRM114, which reflects on the knowngood: it is NOT good in that regard. The GerH build includes the fixes, but will differ from knowngood until the base has been fixed too.
      **** OSB three-letter Hyperspace classifier <-- as above
      **** Unigram Hyperspace classifier <-- as above
      **** String Hyperspace classifier <-- as above
      **** String Unigram Hyperspace classifier <-- as above
      **** Vector 3-word-bag Hyperspace classifier <-- as above
      **** Bit-Entropy classifier <-- entropy numbers differ 0.00001 percent at most. Which might be considered excellent.
      **** Fast Substring Compression Match Classifier
      **** Bytewise Correlation classifier <--- turned out A-OK, when I finally discovered that fixing typos etc. in documents does NOT help megatest.sh to produce identical results. I'm a dunce.
  dubious:
      **** Bit-Entropy Toroid classifier <-- 10% difference in pR and slight difference in jump counts. Cause yet unknown.
      **** Support Vector Machine (SVM) unigram classifier <-- passes the test with 0.2% difference in pR tops, but yaks about some files in certain tests, due to tightened error checking in GerH. A 'hm'.
      **** Support Vector Machine (SVM) classifier <-- as above. Another 'hm'.
      **** String Kernel SVM (SKS) classifier
      **** String Kernel SVM (SKS) Unique classifier (maybe a bug in megatest_ng.sh ??? )
  dangerous:
      **** Neural Network Classifier <-- crashes; feature counts which differ wildly between 64-bit UNIX, knowngood and Win32.
      **** Clump / Pmulc Test <-- if you like weird crashes and code which shouldn't have worked in the first place...
  Meanwhile
      echo === TEST 014 ===
      .\crm114 matchtest.crm < matchtest_mt_ng_1.input
  still needs some medical attention IMHO, as CRM114 on Win32 is much more 'picky' about whitespace than its UNIX brethren. Why? Don't know yet. But it sometimes shows in line leading and trailing whitespace. Probably due to the clunky whitespace skip loops in the CRM114 code.
* UNIX / GNU make now comes with a working 'test' target, which actually uses the new megatest_ng.sh New Generation ;-) shell script. As before, a diff with Bill's knowngood is provided when done. So now you can
      make test
  like everybody else out there!
* Included VT and other fixes reported in the CRM114 ML these last few days: Hyperspace featurecounts Fixed TRE interface/regex cache fixed (Paolo) -- I discarded my older fix, because I have not been able to guarantee that it would've worked, where Paolo's does - has been tested on a victim. * Updated and reformatted (indent cleanup) QUICKREF.txt. * QUICKREF.txt now includes preliminary documentation for the new CRM statements: cssanalyze (:report:) (:c1:) /params/ cssbackup (:report:) (:dst: :c1:) /params/ csscreate (:report:) (:c1:) /params/ cssdiff (:report:) (:c1: :c2:) /params/ cssinfo (:report:) (:c1:) /params/ cssmerge (:report:) (destfile) (:c1:...:cN:) /params/ cssrestore (:report:) (:c1: :src:) /params/ These are intended to obsolete the external css* tools in due time and should be able to handle each classifier (see the bottom of classifier sources for the preliminary hooks/code). 2008-01-28 Ger Hobbelt * (BillY) Added VT (Vector Tokenization) to the Hyperspace classifier. The others will follow its lead in this. * Included qsort implementation by Michael Tokarev (http://www.corpit.ru/mjt/qsort.html) for a bit of improved performance as suggested by Paolo. * Fixed Makefile.am so standard ./bootstrap script works as expected, including 'make distcheck': that one doesn't fail anymore. * Included the original BillY makefile in src/: it is minimally patched to make it behave as expected with the new GNU/autoconf enabled code in here. For those who like to compile this thing the old fashioned way on compatible systems, please run cd src ./make_vanilla_UNIX.sh ./make_vanilla_UNIX.sh megatest as a replacement for the old BillY make action: cd src make make megatest i.e. replace 'make' in Bill's documents with make_vanilla_UNIX.sh if you want to go down that road (you'll miss out on all the ./configure goodness though ;-) ). 
Note that a special configuration headerfile (src/config_vanilla_UNIX_sys_defaults.h) is provided to help make_BillY compile on compatible UNIX systems. This headerfile includes a toned down UNIX build configuration. Tested on SuSE Linux, but should (hopefully) build on other UNIXes as well. If any compile time errors occur here, please edit the src/config_vanilla_UNIX_sys_defaults.h headerfile accordingly and please post the diffs/changes to the crm developer mailing list.
  WARNING: src/make_vanilla_UNIX.sh et al are provided as a 'backwards compatible build fix' ONLY and are NOT meant to replace the ./configure-driven build/install available in this or future GerXXXX crm distro/builds!
* Made initial provisions to merge the crm tools with crm itself: added the statements cssmerge / cssdiff / cssinfo / cssanalyze / cssbackup / cssrestore / csscreate. These statements should suffice to cover the functionality found in the cssutil / osbf_util / cssmerge / cssdiff tools.
* Improved the indent.sh shell script and src/Makefile a bit; the included uncrustify.cfg config file is compatible with the latest uncrustify version today (jan/2008): v0.42
* KNOWN BUG: 'make reindent': uncrustify will not format the last few lines of crm_mjt_qsort.h in a suitable layout. The code will compile, but correct this one by hand to undo the damage there (until I find what's wrong with uncrustify :-( ).
* KNOWN BUG for clump/pmulc still applies!
:-( Version 20071015-BlameGerHobbelt 2007-08-03 Ger Hobbelt * revived the autoconfiscated crm114 build setup * fixed several items * added stuff for the Win32 port * various minor tweaks and fixes * improved internal error checking 2004-08-21 12:32 Joost van Baal * man/include.zmm.in: make ./configure's prefix available to manpages 2004-08-21 12:32 Joost van Baal * man/: cssutil.azm, cssmerge.azm: reverted some of Shalen's commits back: use include.zmm 2004-08-21 12:26 Joost van Baal * man/cssdiff.azm: reverted some of Shalen's commits back: use include.zmm 2004-08-21 12:22 Joost van Baal * man/crm.azm: reverted some of Shalen's commits back: use include.zmm. Refer to QUICKREF.txt in the right location 2004-08-20 02:32 schhabra * man/: crm.azm, cssdiff.azm, cssmerge.azm, cssutil.azm: Have Updated Man Pages at Thu Aug 19 20:31:38 EDT 2004 .Now Man Pages have Version 2. However could not find anything in doc/docs directory to update. Thanks Shalendra Chhabra (schhabra@cs.ucr.edu) 2004-08-19 13:34 Joost van Baal * NEWS: document changes in 20040816.BlameClockworkOrange-auto.3 2004-08-19 13:23 Joost van Baal * src/001_crm_main.c_missing_exit_code.patch: contributed patch from Johan Petersson, Message-Id: <6.1.1.1.2.20040818184005.054339b8@mail.trilithium.net>, To: crm114-general@lists.sourceforge.net, Date: Wed, 18 Aug 2004 18:52:12 +0200 2004-08-19 13:18 Joost van Baal * examples/Makefile.am: ship whitelist.mfp.example too, tnx Paolo 2004-08-19 13:10 Joost van Baal * man/crm114.azm: warning about possible obsoleteness 2004-08-19 13:10 Joost van Baal * README.1st: credits, note on status of ChangeLog 2004-08-19 11:36 Joost van Baal * Makefile.am: autogenerate ChangeLog 2004-08-19 11:33 Joost van Baal * NEWS: documented manpage stuff 2004-08-19 11:30 Joost van Baal * man/Makefile.am: extra manpage crm(1) added 2004-08-19 11:28 Joost van Baal * man/crm.azm: cosmetics, refer to crm114(1) 2004-08-19 11:24 Joost van Baal * man/crm.azm: another generic crm manpage; 
contributed by Shalendra Chhabra 2004-08-19 11:23 Joost van Baal * man/cssutil.azm: cosmetics, include example 2004-08-19 11:23 Joost van Baal * man/include.zmm.in: credits 2004-08-19 11:14 Joost van Baal * man/cssutil.azm: manpage update, contributed by Shalendra Chhabra 2004-08-19 11:12 Joost van Baal * man/cssmerge.azm: use include.zmm, cosmetics 2004-08-19 11:08 Joost van Baal * man/cssmerge.azm: manpage update, contributed by Shalendra Chhabra 2004-08-19 11:06 Joost van Baal * man/cssdiff.azm: use include.zmm, document version, add verbose example 2004-08-19 10:56 Joost van Baal * man/cssdiff.azm: manpage update, contributed by Shalendra Chhabra 2004-08-18 20:17 Joost van Baal * ChangeLog: generated by running "cvs2cl --prune --stdout -U ../CVSROOT/users" 2004-08-18 16:53 Joost van Baal * Makefile.am, NEWS, TODO: some stuff for next release 2004-08-18 15:52 Joost van Baal * NEWS, TODO, configure.ac: more hassle with new tests 2004-08-18 15:26 Joost van Baal * tests/Makefile.am: ship 5 more tests from Bill 2004-08-18 15:17 Joost van Baal * docs/Makefile.am: fix missing doc, thanks Ondrej 2004-08-18 11:44 Joost van Baal * README: install README in topsrcdir: autotools compliant 2004-08-18 11:29 Joost van Baal * NEWS: make sure intro in NEWS file is not too long: current version number should appear in the first few lines of this file 2004-08-18 11:21 Joost van Baal * bootstrap: we generate .bz2 out of the box 2004-08-18 11:20 Joost van Baal * Makefile.am: create a .bz2 compressed tarball too. 
check NEWS file for sanity 2004-08-18 11:03 Joost van Baal * src/Makefile.am: minor change in code layout: crm_css_maintenance 2004-08-18 10:50 Joost van Baal * README: make autotools stop whining 2004-08-18 10:26 Joost van Baal * README.1st: Bill has incorporated some of the splitting, we no longer have to do that 2004-08-18 10:20 Joost van Baal * NEWS: document missed releases 2004-08-18 10:10 Joost van Baal * README: i dont feel like maintaining my own fork of Bill's README file 2004-07-18 13:47 Joost van Baal * TODO: one more thing TODO 2004-07-18 13:46 Joost van Baal * configure.ac: added comment about pkg-config 2004-06-29 21:25 Joost van Baal * .cvsignore, Makefile.am, bootstrap, configure.ac, setversion: make sure date stamp files for inclusion in zoem manpage source are generated 2004-06-29 19:33 Joost van Baal * man/crm114.azm: make it work with zoem-04-108 and newer 2004-06-29 03:13 Raul Miller * configure.ac, src/Makefile.am: updates for BlameSeifkes 2004-04-21 05:37 Peter E. Popovich * NEWS, configure.ac, examples/Makefile.am, mailfilter/Makefile.am: updated for 2004018-BlameEasterBunny 2004-04-12 19:52 Raul Miller * src/Makefile.am: redoing commit, just to be sure... 2004-04-12 19:45 Raul Miller * src/crm114.c: remove files from cvs which are supplied (or were supplied) by upstream tarball 2004-04-12 19:43 Raul Miller * tests/.cvsignore: added .mfp to tests cvsignore, keeping with the grand tradition that nothing of any practical use actually goes into cvs. 
2004-04-12 19:42 Raul Miller * src/Makefile.am, tests/.cvsignore, tests/testscript.sh: BlameMarys, with testscript.sh -- this time, for sure 2004-04-12 19:25 Raul Miller * src/Makefile.am: fix stupid bug in tests/testscript.sh 2004-04-12 19:20 Raul Miller * NEWS: updated NEWS for BlameMarys 2004-04-12 19:17 Raul Miller * configure.ac, src/Makefile.am: Updates for BlameMarys 2004-04-09 10:36 Joost van Baal * man/Makefile.am: note on zoem version requirement added 2004-04-09 10:34 Joost van Baal * bootstrap: added note on bzip2 target 2004-04-09 10:33 Joost van Baal * configure.ac: added cvs tags 2004-04-09 10:28 Joost van Baal * man/crm114.azm: made zoem >= 04-027 compliant 2004-04-08 21:51 Raul Miller * NEWS, configure.ac, src/Makefile.am: Initial updates for StPatrick (sources are now split, upstream) 2004-03-13 22:13 Raul Miller * configure.ac: fix version date 2004-03-13 21:58 Raul Miller * configure.ac: new autoconfiscated version 2004-03-10 20:19 Bill Yerazunis * src/crm114.c: First import by wsy - thank you for not making fun of this. 2004-02-25 20:07 Joost van Baal * NEWS: do not duplicate docs 2004-02-25 19:57 Joost van Baal * NEWS, configure.ac: new release: manpage finetuning 2004-02-25 19:53 Joost van Baal * man/crm114.azm: finetuning of spaces. thanks Seth Hanford 2004-02-25 10:41 Joost van Baal * man/: cssdiff.azm, cssmerge.azm: recent zoem is more picky on syn*opt syntax 2004-02-19 11:17 Joost van Baal * man/: crm114.azm, cssmerge.azm: now is zoem v 2003-300 compliant 2004-02-18 21:40 Joost van Baal * man/crm114.azm: added DESCRIPTION section, from first part of INTRO.txt 2004-02-18 15:44 Joost van Baal * man/crm114.azm: finished crm114(1) manpage 2004-02-18 14:29 Joost van Baal * man/crm114.azm: converted some more, valid zoem: zoem 03-265-1 knows how to typeset this 2004-02-18 13:17 Joost van Baal * man/: crm114.azm, include.zmm.in: converted a bit more from QUICKREF to zoem syntax. beware! current file will likely fail to be parsed by zoem. 
	working on it

2004-02-16 20:11 Raul Miller

	* configure.ac: cleanup version number, install tests under
	doc/crm114/ instead of just doc//

2004-02-16 20:00 Raul Miller

	* tests/Makefile.am: use PACKAGE instead of PACKAGE_TARNAME at
	install time

2004-02-16 18:36 Raul Miller

	* src/Makefile.am: combined crm_classify.c and crm_learn.c into
	crm_features.c
	renamed crm_helpers.c as crm_errohandlers.c

2004-02-16 07:06 Joost van Baal

	* TODO: Stefan Seyfried's hack should be in docs

2004-02-15 16:35 Raul Miller

	* ChangeLog, NEWS, configure.ac, src/Makefile.am: Updated for
	Jetlag

2004-02-11 22:31 Joost van Baal

	* man/Makefile.am: added crm114.azm, in order to be able to build
	it while testing

2004-02-11 22:31 Joost van Baal

	* man/crm114.azm: converted some more of QUICKREF.txt

2004-02-11 08:38 Joost van Baal

	* man/crm114.azm: skeleton of crm114 manpage

2004-02-08 12:49 Joost van Baal

	* NEWS, README.1st: another link to more info

2004-02-08 11:10 Joost van Baal

	* TODO: found url of .spec

2004-02-07 21:53 Joost van Baal

	* examples/Makefile.am: oops, should have fixed this earlier:
	megatest.log is renamed

2004-02-07 21:50 Joost van Baal

	* NEWS, TODO, configure.ac: cosmetics on NEWS layout

2004-02-07 19:30 Joost van Baal

	* .cvsignore, NEWS, README: update copyright notice, ship a patched
	README from Bill, Bill notes in our NEWS

2004-02-07 18:00 Joost van Baal

	* NEWS, man/Makefile.am: oops, we are shipping obsolete manpages

2004-02-07 14:07 Joost van Baal

	* NEWS, configure.ac: new upstream

2004-01-22 21:12 Joost van Baal

	* NEWS, configure.ac, mailfilter/Makefile.am: new upstream, new
	quirks

2004-01-14 19:17 Joost van Baal

	* ChangeLog, NEWS, README.1st, TODO, configure.ac,
	examples/.cvsignore, src/Makefile.am: support both gnu and tre
	regex lib

2004-01-11 13:29 Joost van Baal

	* configure.ac: inspiration from mutt configure.in

2004-01-11 13:23 Joost van Baal

	* NEWS, README.1st, TODO, configure.ac: did some research on how to
	enable multiple regex libs.
	offering just gnu and tre seems enough to me

2004-01-07 10:07 Joost van Baal

	* TODO: and more

2004-01-07 09:31 Joost van Baal

	* TODO: more stuff to do

2004-01-06 14:36 Joost van Baal

	* README.1st: updated

2004-01-06 14:26 Joost van Baal

	* NEWS, configure.ac: building from new release by Bill

2004-01-06 14:05 Joost van Baal

	* TODO, bootstrap: release a .tar.bz2 too

2004-01-01 21:52 Joost van Baal

	* NEWS: new release

2004-01-01 21:42 Joost van Baal

	* Makefile.am, NEWS, TODO, configure.ac, src/Makefile.am: flexible
	sh-bang path, symlink /usr/bin/crm

2004-01-01 20:09 Joost van Baal

	* Makefile.am, NEWS, configure.ac, man/Makefile.am: manpages get
	built and installed

2004-01-01 19:56 Joost van Baal

	* man/: cssdiff.azm, cssmerge.azm, cssutil.azm, include.zmm.in:
	added first shot at manpages

2003-12-30 18:46 Joost van Baal

	* TODO: another idea

2003-12-30 18:23 Joost van Baal

	* NEWS, TODO, configure.ac, src/Makefile.am: new split scheme,
	build from new release by Bill

2003-12-26 17:56 Joost van Baal

	* configure.ac: runtime check, eyecandy

2003-12-26 12:11 Joost van Baal

	* TODO: runs on RH

2003-12-23 23:13 Joost van Baal

	* NEWS, TODO, configure.ac, examples/Makefile.am,
	mailfilter/Makefile.am, tests/Makefile.am: preliminary support for
	expanding shebang path in .crm scripts

2003-12-22 22:51 Joost van Baal

	* examples/Makefile.am: typo

2003-12-22 22:50 Joost van Baal

	* NEWS, TODO, configure.ac, examples/Makefile.am: configure.ac
	rebuild and cleanup, proper location of .mfp files

2003-12-22 07:44 Joost van Baal

	* TODO: some ideas of Paolo merged in

2003-12-21 22:12 Joost van Baal

	* TODO, configure.ac: robust libtre check enabled

2003-12-21 20:55 Joost van Baal

	* NEWS, TODO, configure.ac: preparing next release, more TODO stuff

2003-12-21 20:46 Joost van Baal

	* configure.ac: we do not need c++!

2003-12-21 18:21 Joost van Baal

	* HACKING, NEWS, configure.ac: new release, new hacking
	instructions

2003-12-21 16:25 Joost van Baal

	* src/Makefile.am, configure.ac: paolo split: crm114.c split into
	per-function source files

2003-12-21 14:47 Joost van Baal

	* TODO: TODO file for autoconfiscation

2003-12-20 22:48 Joost van Baal

	* Makefile.am, NEWS, README.1st, configure.ac, examples/.cvsignore,
	mailfilter/Makefile.am, tests/Makefile.am: note on what this is
	about added

2003-12-20 20:39 Joost van Baal

	* Makefile.am, NEWS, configure.ac, docs/.cvsignore,
	docs/Makefile.am, mailfilter/.cvsignore, mailfilter/Makefile.am,
	tests/.cvsignore, tests/Makefile.am: restructuring tarball layout,
	phase 1

2003-12-19 22:24 Joost van Baal

	* NEWS, configure.ac, examples/Makefile.am: fixed typos:
	"crm114-20031219-RC12.3.tar.gz is ready for distribution"

2003-12-19 22:09 Joost van Baal

	* .cvsignore, Makefile.am, configure.ac, examples/.cvsignore,
	examples/Makefile.am, src/.cvsignore, src/Makefile.am: new layout
	of tarball: split stuff among directories

2003-12-19 19:10 Joost van Baal

	* .cvsignore, Makefile.am, NEWS, configure.ac: now installs some
	docs

2003-12-19 18:16 Joost van Baal

	* crm114.lsm.in: description fixed

2003-12-19 17:38 Joost van Baal

	* configure.ac: new release: version updated

2003-12-19 17:29 Joost van Baal

	* .cvsignore, AUTHORS, ChangeLog, HACKING, NEWS, bootstrap,
	configure.ac: make distcheck works

2003-12-19 17:11 Joost van Baal

	* HACKING, Makefile.am, bootstrap: start documenting how to work
	with this stuff, added Id tags

2003-12-19 17:04 Joost van Baal

	* Makefile.am, configure.ac, crm114.lsm.in: autoconfiscation
	skeleton

2003-12-19 17:02 Joost van Baal

	* bootstrap: Initial import.

2003-12-19 17:02 Joost van Baal

	* bootstrap, .cvsignore, HACKING, Makefile.am, NEWS, README, TODO,
	configure.ac, docs/.cvsignore, docs/Makefile.am,
	examples/.cvsignore, examples/Makefile.am, mailfilter/.cvsignore,
	mailfilter/Makefile.am, man/.cvsignore, man/Makefile.am,
	man/crm114.azm, man/cssdiff.azm, man/cssmerge.azm, man/cssutil.azm,
	man/include.zmm.in, src/.cvsignore, src/Makefile.am,
	tests/.cvsignore, tests/Makefile.am: Initial revision