HOME © 2012

Michael Thompson

DSPAM php Control Center


Just get gmail for your entire domain. It's free.

DSPAM is an incredibly discerning open-source spam removal utility available from the folks over at nuclearelephant.com. If your mail system's users are complaining about spam, it is worth a look.

This site has a php implementation of the dspam user interface. I wrote it because I wanted to customize and augment some of the features found in dspam.cgi but it was written in perl and perl hurts my eyes.

dspamCC aspires to be a simple spam quarantine control center suitable for 'Executive Use' with the following features:

  • has "more" style paging to help with long lists of spam.
  • handles messages quarantined by methods other than DSPAM such as procmail rules, simple black lists, spamassassin, etc.
  • resends false positives.
  • optionally displays MIME content of messages
  • supports message searching
  • sends "From: " contents to whitelist handler
  • forwards messages to designated address

Here is the latest version of the interface:

    Development environment: dspam v 2.10.3-6, php v 4.3.4, Apache v 2.0.48, UNIX System V
    (should be functional with dspam V3Alpha|Beta)

    Version 1.09 2004-05-14 [ Download ]
    - added search function
    - optimized layout of controls
    Version 1.08 2004-05-13
    - better handling of missing mbox file
    Version 1.07 2004-05-10
    - more accurate & descriptive paging
    Version 1.06 2004-05-06
    - eliminated dependence on register_globals=on
    Version 1.05 2004-05-05
    - added page delete
    Version 1.04 2004-04-27
    - more correct statistics
    Version 1.03 2004-04-25
    - better documentation
    Version 1.02 2004-04-21
    - bugfix in javascript fpChecked()
    - correct handling of messages quarantined by sources other than DSPAM such as procmail, spamassassin, et al
    Version 1.01 2004-04-20
    - MIME handling uses PEAR::Mail/mimeDecode.php if available
    - more correct statistics
    Version 1.0 2004-04-19
    - production release
    - optional HTML content viewing in <iframe>
    - added accuracy calculations
    2004-04-18
    - handle nonDSPAM messages in the quarantine
    - slightly better "view" mode
    2004-04-13
    - implemented '--domain-scale' handling
    2004-04-10
    - whitelisting
    - added graphics

    Development environment: dspam v 2.8.3, php v 4.3.4, Apache v 2.0.48, UNIX System V

    2004-04-09
    - better handling of envelope headers in forwarded mail
    2004-04-06
    - Bug fix for Subject line
    - php version issues
    2004-03-22
    - better handling of dspam configuration options

    Development environment: dspam v 2.8, php v 4.3.2, Apache v 2.0.48, UNIX System V

    29-Feb-2004
    - EXPERIMENTAL


FAQ

  1. The installation includes dspam.php and dspam.cgi. Which one should I use?

    If your web server is running php as an Apache2 module, use dspam.php; otherwise, use dspam.cgi. If you use dspam.cgi, you will probably need to change the first line of dspam.cgi to match your installation.

  2. All I get is 500 Internal Server Error. What should I do?

    Try using dspam.php instead of dspam.cgi.

  3. I see an error in my server logs that says "/usr/local/apache2/htdocs/bin.dspam/php not found." What should I do?

    Change the first line of dspam.cgi to match your installation of the php binary.

  4. All I get is a short message "did you authenticate?" What should I do?

    Make sure to set up the .htaccess file or httpd.conf so that authentication is required to access the Control Center. The Control Center assumes that user authentication will be handled by the web server. It attempts to find the user name in any one of these three php server variables:

      SERVER["REMOTE_USER"]
      SERVER["REDIRECT_REMOTE_USER"]
      SERVER["PHP_AUTH_USER"] 
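The lookup described above can be sketched as follows. This is a minimal illustration in Python rather than the Control Center's own php. The function name find_authenticated_user, the plain dictionary standing in for the server variables, and the order in which the three keys are tried are all assumptions made for the example.

```python
# Hypothetical sketch: return the first non-empty user name found in the
# three server variables the Control Center is described as checking.
USER_KEYS = ("REMOTE_USER", "REDIRECT_REMOTE_USER", "PHP_AUTH_USER")


def find_authenticated_user(server):
    """Return the authenticated user name, or None if none is set."""
    for key in USER_KEYS:
        value = server.get(key, "")
        if value:
            return value
    # No variable was set: this is the case that would lead to the
    # "did you authenticate?" message.
    return None


if __name__ == "__main__":
    print(find_authenticated_user({"REDIRECT_REMOTE_USER": "tammy"}))
    print(find_authenticated_user({}))
```

If none of the three variables is set, the web server is probably not requiring authentication for the Control Center's directory.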
      

  5. All I get is a short message "/etc/mail/dspam/user/user.mbox could not be read" What should I do?

    Check the ownership and permission settings on the file and the directories in the path leading to the file. (see Configuring Apache2)

  6. I get an error message "could not lock /etc/mail/dspam/user.mbox" What should I do?

    File locking relies on the php function flock(), which does not work on NFS and many other networked file systems. Check your operating system documentation for more details. On some operating systems flock() is implemented at the process level, so when using a multithreaded server API like ISAPI you may not be able to rely on flock() to protect files against other PHP scripts running in parallel threads of the same server instance. flock() is not supported on antiquated file systems like FAT and its derivatives and will therefore always return FALSE in those environments (this is especially true for Windows 98 users). You can disable file locking in inc/config.inc.php like this:

      $bLock=false;
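To illustrate what an optional lock of this kind does, here is a minimal Python sketch (the Control Center itself is php). The function append_with_optional_lock and its use_lock parameter are hypothetical names that play the role of $bLock. The sketch relies on Python's fcntl.flock, which is available on Unix only.

```python
import fcntl


def append_with_optional_lock(path, data, use_lock=True):
    """Append bytes to a file, optionally holding an exclusive flock()."""
    with open(path, "ab") as handle:
        if use_lock:
            # Blocks until the lock is granted. On file systems without
            # working flock() support this call may fail with OSError.
            fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            handle.write(data)
            handle.flush()
        finally:
            if use_lock:
                fcntl.flock(handle, fcntl.LOCK_UN)
```

With use_lock=False the write still happens, but nothing prevents another process from modifying the file at the same time, which is the trade-off accepted when locking is disabled.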
      

  7. I want to be able to see the HTML versions of the messages in the quarantine. How do I do that?

    This feature is not enabled by default. In inc/config.inc.php set these variables accordingly and a "Display HTML" link will appear at the top of the screen when a message is being viewed:

      $dirWebTMP = "tmp/"; # must exist and be writable (see Configuring Apache2)
      #$bDefangPolicy=true; # default is to defang messages
      $bDefangPolicy=false; 
      

  8. Can I add other messages to the quarantine besides those from DSPAM?

    Yes. You can use procmail rules, spamassassin, etc. As long as the program using the quarantine writes the messages in Unix mail format the Control Center will work.
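As a rough illustration, and assuming "Unix mail format" here means the traditional mbox format, another program could append a message to a quarantine file like this. The sketch uses Python's standard mailbox module. The function name quarantine_message and its parameters are made up for the example.

```python
import mailbox
from email.message import EmailMessage


def quarantine_message(mbox_path, sender, subject, body):
    """Append one message to an mbox-format quarantine file."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["Subject"] = subject
    msg.set_content(body)

    box = mailbox.mbox(mbox_path)  # the file is created if it is missing
    box.lock()
    try:
        box.add(msg)   # mailbox writes the leading "From " separator line
        box.flush()
    finally:
        box.unlock()
        box.close()
```

The "From " separator line at the start of each message is what lets an mbox reader find the message boundaries.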

  9. What configuration changes do I need to make to my mailer?

    Use dspam as the local delivery agent. It must be built without --enable-delivery-to-stdout and --enable-spam-delivery, either of which would prevent a spam quarantine from being created.

    E-mail aliases are used to handle false positives, spam reporting and whitelisting. Here are some example alias definitions suitable for sendmail, given a username "tammy":

      spammy-tammy: "|'/usr/local/bin/dspam' --user 'tammy' --addspam"
      hammy-tammy: "|'/usr/local/bin/dspam' --user 'tammy' --falsepositive"
      family-tammy: "|'/usr/local/bin/handleWhitelist.sh' 'tammy'"
      
    In the file inc/config.inc.php the correct prefixes for the aliases as defined above would be:
      $sp_prefix="spammy-"; 
      $fp_prefix="hammy-";
      $wl_prefix="family-";
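The relationship between the prefixes and the alias names can be sketched like this. The Python below is only an illustration, not the Control Center's code. The dictionary keys and the function alias_for are hypothetical, and it is an assumption that the alias name is formed by simply putting the prefix in front of the user name, as the example definitions suggest.

```python
# Prefixes as configured in the example above.
PREFIXES = {
    "spam": "spammy-",            # $sp_prefix
    "false_positive": "hammy-",   # $fp_prefix
    "whitelist": "family-",       # $wl_prefix
}


def alias_for(action, user):
    """Return the local alias name for an action and a user name."""
    return PREFIXES[action] + user


if __name__ == "__main__":
    for action in PREFIXES:
        print(action, "->", alias_for(action, "tammy"))
```

Each name produced this way has to exist as an alias on the mail server, or the message sent to it will bounce.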
      

  10. We see messages in our webserver error log like this "PHP Notice: Undefined variable: ..."

    Those are informational messages. Try using the default settings for error reporting in /usr/local/lib/php.ini or change it to something like this:

      error_reporting = E_ALL & ~E_NOTICE
      

  11. My question is not answered here. Would someone help us get this thing working?

    Sure. e-mail dspam@michaelthompson.org


DSPAM Case Study

We are a privately held multinational manufacturer. On an average day we quarantine around 30,000 junk messages with a combination of spamassassin and DSPAM, with DSPAM being responsible for quarantining about 15% of the total each day. We trained the DSPAM filter for about one month prior to putting it into production use. For the first three days of production use we had to monitor the results closely for false positives, but within five days false positives were no longer an issue.[1]

We have about 550 e-mail accounts handled by a Linux server running postfix and we were filtering exclusively with spamassassin. Just after the beginning of the year (2004) we received a complaint regarding a single mail message from a vendor to a high-level executive that had been blocked by spamassassin. We found that the vendor's message was not only sent from an advertising-supported free e-mail service but also was delivered by a black-listed server. In spite of the obvious considerations, the repercussions of this one incident became so dramatized that we were forced to remove the blacklisting from our spamassassin filtering scheme altogether.

At about the same time other high-level executives began complaining about increasing levels of spam. The worst cases were those whose addresses had been published on our company's website since it went on-line in the late 1990s and those whose jobs required them to make frequent purchases on-line. The number of spam messages received by these individuals ranged from about a dozen per day in one case to over 200 per day in the worst case we found. In many cases we found that the spammers had designed their messages such that they would just pass under spamassassin's radar. We experienced a surge in messages that included random word lists and obfuscated spelling designed to fool spamassassin's style of filtering, e.g. V\@gra and the like.

We noticed a slashdot report about DSPAM and decided to test it. We installed it on a UNIX System V server running sendmail and configured it for one user. DSPAM performed so well that after just a few days of testing we were confident enough to announce to the corporation that we had begun training an intelligent e-mail filter and would be ready to put it into production use within one month.

Our initial DSPAM filter training procedure consisted of a number of elements:

  • First we exposed it to all the messages for a single user who used the Control Center to report false positives. One stunning result was that fewer than 50 false positives were encountered in over 3000 messages received during the initial month of training.
  • Another element of training involved creating half a dozen "honeypot" addresses and feeding any messages sent to those addresses directly to DSPAM on the test server as spam like this:
      /usr/local/bin/dspam --user 'michael' --corpus --addspam
  • On several occasions we forwarded selected groups of around two hundred messages quarantined by spamassassin from the production server to DSPAM (with the spamassassin tags removed) using the same technique shown above.
  • We also trained the DSPAM filter using messages that had been passed by spamassassin but had been manually identified as spam by selected recipients.
  • Selected users "Blind copied" their outgoing messages to DSPAM as "ham" like this:
      /usr/local/bin/dspam --user 'michael' --corpus
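The two command lines above differ only in the --addspam flag. A small wrapper, sketched here in Python, makes that explicit. The function names corpus_command and feed_corpus are hypothetical, and only the flags shown in the commands above are used.

```python
import subprocess

DSPAM = "/usr/local/bin/dspam"  # path used in the commands above


def corpus_command(user, spam):
    """Build the dspam argument list for corpus training."""
    args = [DSPAM, "--user", user, "--corpus"]
    if spam:
        args.append("--addspam")
    return args


def feed_corpus(message_bytes, user, spam):
    """Pipe one raw message to dspam on standard input."""
    subprocess.run(corpus_command(user, spam), input=message_bytes, check=True)
```

A honeypot message would be passed with spam=True and a blind-copied outgoing message with spam=False.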

    At the end of one month of training we had accumulated a MySQL database containing over 800,000 tokens (~104 MB), which we copied from the test server to the production server. We created a "dspam" user account on the production server and updated the database with the corresponding user ID. To put the DSPAM filter into production we populated the user directories with a procmail recipe based on the template below. The template shown here has been obfuscated a little bit regarding directory names and aliases. Some customization and all the logging have been removed. In this example the aliases are mapped as follows:

      spam: the corporate spam quarantine account
      learn-ham: /usr/local/bin/dspam --user dspam --corpus
      learn-spam: /usr/local/bin/dspam --user dspam --corpus --addspam
      
      # USER_NAME_HERE's procmail rules
      #
      #
      SYSDIR=/a_system/directory/we_use/
      USER=USER_NAME_HERE
      HOME=/home/$USER
      LOCFILE=$HOME/spamc.lock
      
      :0
      * ! ^X-AFLAGWEUSE: 
      {
        # tag the message so that it is only processed once
        :0 fhw
        | formail -A"X-AFLAGWEUSE: procmail"
      
        # handle certain virus signatures.
        :0B
        * ! $ ? /usr/local/bin/php $SYSDIR/bin/fmail.php
        ! spam
      
        FROM=`$SYSDIR/bin/getFrom.sh`
      
        # not if "From:" is on the system whitelist
        :0
        * ! $ ? echo "$FROM" | egrep -i -f $SYSDIR/white.lis 2>/dev/null
        {
          # not if "From:" is on the user whitelist
          :0
          * ! $ ? echo "$FROM" | egrep -i -f $HOME/white.lis 2>/dev/null
          {
            # check with spamassassin
            DROPPRIVS=yes
            :0 fw: $LOCFILE
            * < 256000
            | spamc 
      
            # this section will not be entered unless spamc failed
            :0
            * ! ^X-Spam-Status
            * < 256000
            {
               # spamd process failure detected
              :0 fw: $LOCFILE
              | spamassassin -x
            }
      
            :0
            * ^X-Spam-Status: Yes
            {
              # well, if it is this spammy let dspam "learn" it
              :0
              * ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
              {
                # remove spamassassin markup
                :0 fwh
                | spamassassin -x -d
      
                :0 c
                ! learn-spam
              }
      
              :0
              ! spam
            }
      
            
            # check for messages whitelisted by spamassassin
            # and pass them around the dspam filter
            SAS=`formail -cx"X-Spam-Status: "`
            :0
            * ! $ ? echo "$SAS" | egrep "WHITELIST"
            {
              # ok, let dspam have a look at it
              :0 fw: $LOCFILE
              |dspam --user dspam
       
              :0
              * ^X-DSPAM-Result: Spam
              {
                :0
                ! spam
              }
            }
          }
        }
      
        :0
        * ! ^X-Spam-Status:
        * ! ^X-DSPAM-Result:
        * ! $ ? echo "$FROM" | egrep -i -f /home/$USER/white.lis 2>/dev/null
        {
          # it must have been system white-listed so presumably 
          # it is the kind of message we almost always want to pass
          # so let dspam learn it as "ham"
          :0 c
          ! learn-ham
        }
      
        # deliver it now 
        :0
        ! "$@"
      }
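The white list tests in the recipe pipe the "From:" value through egrep -i -f, so each line of white.lis acts as a case-insensitive pattern. The Python sketch below approximates that check. The function names are hypothetical, Python's regular expression syntax is not identical to egrep's, and unlike egrep this sketch skips blank lines instead of letting them match everything.

```python
import re


def load_patterns(lines):
    """Compile each non-blank line as one case-insensitive pattern."""
    return [re.compile(line.strip(), re.IGNORECASE)
            for line in lines if line.strip()]


def is_whitelisted(from_header, patterns):
    """Return True if any pattern matches anywhere in the From value."""
    return any(p.search(from_header) for p in patterns)


if __name__ == "__main__":
    patterns = load_patterns([r"@customer\.example$", r"partner@vendor\.example"])
    print(is_whitelisted("Bob@Customer.Example", patterns))
    print(is_whitelisted("someone@elsewhere.example", patterns))
```

Because the match is a search rather than a full comparison, an unanchored pattern such as a bare domain name will also match longer addresses that merely contain it.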
      

    The procmail recipe and the discussion above demonstrate several ongoing methods for training the DSPAM filter automatically:

  • We instruct it to learn as "ham" any messages from senders found on the system white list. The system white list contains only the domains or addresses of customers, partners and vendors. Presumably any messages we receive in the future that are like the ones we typically receive from these known sources will be the kinds of messages we want to deliver.
  • We continue to feed it as "ham" outgoing messages from selected senders.
  • We continue to feed it as "spam" any messages received by the honeypot addresses. We think this will help our filter more quickly adapt to the latest spammer techniques.
  • We instruct it to learn as "spam" any messages that were scored very high by a (nearly) default installation of spamassassin.

    Here are some things we avoid doing to protect our corporate filter's "intelligence" (had we decided to implement DSPAM on a per-user level, they would not be necessary):

  • We do not expose the DSPAM filter to messages from senders on the user white lists. One person's treasure can be another person's trash. The user white lists handle those cases where someone actually wants to receive the latest airline specials or certain advertisements from some companies. A future version of the Control Center will include completely user-administered whitelisting.
  • We do not expose the DSPAM filter to messages from senders on the spamassassin white lists. The senders placed on the spamassassin white lists are typically subscribed lists, trade journals, local news feeds, etc. that certain groups of people want to receive. We have to deliver these messages to those groups that want them but we don't want DSPAM to learn them as "ham."
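The automatic training rules for incoming mail can be condensed into one decision function. The Python below is a simplified sketch of what the procmail recipe does, not a replacement for it. The function name, its parameters and the return values are made up for illustration, and the star threshold is passed in as a parameter because the recipe encodes it as a literal run of asterisks. Honeypot mail and blind-copied outgoing mail are handled by aliases and are not covered here.

```python
def training_action(on_system_whitelist, on_user_whitelist,
                    sa_is_spam, sa_stars, learn_spam_stars):
    """Return 'learn-ham', 'learn-spam', or None for one incoming message."""
    if on_user_whitelist:
        # User white lists are never shown to the shared filter.
        return None
    if on_system_whitelist:
        # Known customers, partners and vendors are learned as ham.
        return "learn-ham"
    if sa_is_spam and sa_stars >= learn_spam_stars:
        # Only mail that spamassassin scored very high is learned as spam.
        return "learn-spam"
    # Everything else, including mail whitelisted inside spamassassin,
    # is not used for automatic corpus training.
    return None
```

Keeping the rules in one place like this makes it easier to see that a message is learned as "ham" only through the system white list, never through a user's personal list.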

    As our experimental Control Center is nearly ready for "executive use", the next phase of our implementation will be to provide user-administered spam quarantines and whitelists.

    The greatest vulnerability of our system is the reliance on whitelisting, as the "From: " field of a message is easily faked. By auditing our results on an ongoing basis we hope to be able to fend off those kinds of attempts with countermeasures tailored to specific instances.


    [1] DSPAM seems to learn from false positives very quickly, but we think the use of whitelisting greatly reduced the number of false positives we had to deal with at the beginning of production use. We created an e-mail alias such that users could forward a message to it and the system would extract the "From: " plus the original sender of the forwarded message, which would then be added to the user's personal whitelist.


Configuring Apache2

    This is an example of a working Apache configuration such that dspam.cgi runs as the user dspam on a virtual host like dspam.yourdomain.com.

    The assumptions are that dspam is a valid user on the webserver's system, that user mailbox files will belong to the "dspam" group and that the group access is "rw", e.g.

    -rw-rw----    1 michael  dspam          4386 Mar 27 11:06 /etc/mail/dspam/michael/michael.mbox
    

      <VirtualHost *>
      ################################################################
      #
      # dspam.yourdomain.com
      # the Dspam virtual host - it runs as user dspam:dspam
      #
      #
      ################################################################
      # to make this work Apache has to be configured so that
      # --with-suexec-docroot=[dir] is at or above the value
      # of DocumentRoot [dir] in the directory hierarchy.
      # In this example we are using /usr/local/apache2/htdocs and
      # /usr/local/apache2/htdocs/dspam respectively.
      #
      # suexec's verification step for being "in the document root" is
      # a simple strncmp using the length of the value given to
      # --with-suexec-docroot= so symbolic links will NEVER work.
      #
      # This is the pertinent portion of the Apache build:
      #
      #   --with-suexec-bin=/usr/local/apache2/bin/suexec
      #   --with-suexec-caller=nobody
      #   --with-suexec-userdir=public_html
      #   --with-suexec-docroot=/usr/local/apache2/htdocs
      #   --with-suexec-uidmin=100
      #   --with-suexec-gidmin=100
      #   --with-suexec-log=/usr/local/apache2/logs/suexec_log
      #   --with-suexec-safe-path=/usr/local/bin:/usr/bin:/bin
      #
      # The usernames for web server authentication need to match
      # the user names for mailbox owners.
      #
      # Finally, there has to be a copy of the entire php binary owned
      # by dspam residing in /usr/local/apache2/htdocs/bin.dspam/
      # which is referenced in the first line of dspam.cgi like this:
      #
      # in dspam.cgi:
      #   #!/usr/local/apache2/htdocs/bin.dspam/php
      #   <?php
      #   ...
      #
      ################################################################
      ServerName dspam.yourdomain.com
      DocumentRoot /usr/local/apache2/htdocs/dspam
      SuexecUserGroup dspam dspam
      Alias /images/ "/usr/local/apache2/htdocs/dspam/images/"

      <Directory "/usr/local/apache2/htdocs/dspam">
        Options -Indexes
        Options ExecCGI
      </Directory>

      <Files dspam.cgi>
        Order deny,allow
        Deny from all
        AuthType Basic
        AuthName "DSPAM Control Center"
        AuthUserFile /some/where/auth/upass
        Require valid-user
        Satisfy Any
      </Files>
      </VirtualHost>

Reference Links

  • DSPAM Project Home Page
  • RFC 822 Standard for the format of ARPA Internet text messages
  • Procmail Quick Reference Guide
  • SuExec Support for Apache 2
  • This project at Freshmeat

