[Zope-dev] zope.sendmail grantma-retryfixes branch review

Marius Gedminas mgedmin at b4net.lt
Wed Mar 19 17:32:35 EDT 2008


I'm starting a new thread, because the original one got hijacked by the
doctest/unittest flamefest.

I appear to have unsubscribed from zope3-checkins a while ago, which is
why I'm posting here.

Disclaimer: I'm not very good at doing code reviews.  I tend to mention
every little detail that could've been improved and forget to praise
everything that's already perfect.

There are two checkins on the branch.

=========
rev 84645
=========
http://svn.zope.org/zope.sendmail/branches/grantma-retryfixes/?rev=84645&view=rev

Matthew, Subversion thinks the two PDF files you added are plain-text
(svn:eol-style is "native", and svn diff shows the actual contents of
the PDFs, which isn't very useful to human readers).  I would suggest

  svn propdel svn:eol-style *.pdf
  svn propset svn:mime-type application/pdf *.pdf

Also, the Quue_*.pdf seems to have a typo in the file name.

At first sight those diagrams look scary and complicated.  I'd look into
ways to express their content in a text file with plain-text
explanations and ASCII-art diagrams.

=========
rev 84551
=========
http://svn.zope.org/zope.sendmail/branches/grantma-retryfixes/?rev=84551&view=rev

> Index: src/zope/sendmail/configure.zcml
> ===================================================================
> --- src/zope/sendmail/configure.zcml	(revision 84550)
> +++ src/zope/sendmail/configure.zcml	(revision 84551)
> @@ -14,11 +14,18 @@
>  
>    <!--
>      To send mail, uncomment the following directive and be sure to
> -    create the queue directory.
> +    create the queue directory. 
> +    
> +    Other parameters sepcify the polling interval, and the retry 

"sepcify"?

> +    interval used when a temporary failure is detected.  Lock links
> +    can also be cleared on server start.
>  
>    <mail:queuedDelivery permission="zope.SendMail"
>                        queuePath="./queue"
> -                      mailer="smtp" />
> +                      retryInterval="300"
> +                      pollingInterval="3000"
> +	              cleanLockLinks="False"
> +		      mailer="smtp" />
>    -->

Please avoid literal tab characters in source files.

> Index: src/zope/sendmail/zcml.py
> ===================================================================
> --- src/zope/sendmail/zcml.py	(revision 84550)
> +++ src/zope/sendmail/zcml.py	(revision 84551)
> @@ -70,7 +70,35 @@ class IQueuedDeliveryDirective(IDelivery
>          description=u"Defines the path for the queue directory.",
>          required=True)
>  
> -def queuedDelivery(_context, permission, queuePath, mailer, name="Mail"):
> +    pollingInterval = Int(
> +        title=u'Polling Interval',
> +        description=u'Defines the polling interval for queue processing in'
> +                    ' milliseconds',
> +        default=3000,
> +        required=False)
...
> -        thread = QueueProcessorThread()
> +        thread = QueueProcessorThread(interval=pollingInterval/1000,

You're using integer division here and losing sub-second precision.  If you
don't want that, write interval=pollingInterval/1000.0.

> +                                      retry_interval=retryInterval,
> +                                      clean_lock_links=cleanLockLinks)
>          thread.setMailer(mailerObject)
>          thread.setQueuePath(queuePath)
>          thread.start()
 
> Index: src/zope/sendmail/mailer.py
> ===================================================================
> --- src/zope/sendmail/mailer.py	(revision 84550)
> +++ src/zope/sendmail/mailer.py	(revision 84551)
...
> @@ -41,22 +49,105 @@ class SMTPMailer(object):
>          self.password = password
>          self.force_tls = force_tls
>          self.no_tls = no_tls
> +        self.logger = None
>  
> -    def send(self, fromaddr, toaddrs, message):
> -        connection = self.smtp(self.hostname, str(self.port))
> +    def set_logger(self, logger):
> +        if not isinstance(logger, logging.Logger):
> +            raise DoesNotImplement('Invalid logger given')

I don't think DoesNotImplement is right here -- there are no interfaces
you're looking for.  Better use the built-in TypeError.

But is the type check necessary at all?  The Pythonic thing is to rely on duck
typing.

> +        self.logger = logger
> +
> +    def _log(self, log_level, *args):
> +        if self.logger is None:
> +            return
> +        # This is not elegant, but it can be fixed later

In my experience "later" usually doesn't happen.  Why not fix it now?

> +        if log_level == 'debug':
> +            self.logger.debug(*args)
> +        elif log_level =='info':
> +            self.logger.info(*args)
> +        elif log_level =='error':
> +            self.logger.error(*args)
> +        elif log_level =='warning':
> +            self.logger.warning(*args)
> +

Just use
    
           self.logger.log(log_level, *args)

and logging.DEBUG/logging.INFO/logging.ERROR/logging.WARNING as the
values for log_level.

> +    def _handle_smtp_error(self,
> +                           smtp_code,
> +                           smtp_error,
> +                           fromaddr,
> +                           toaddrs,
> +                           message_id):
> +        """Process results of an SMTP error
> +           returns True to indicate break needed"""

The standard docstring convention is to do this:

           """Process results of an SMTP error.

           Returns True to indicate break needed.
           """

One sentence on the first line, then an empty line, then additional
paragraphs after that.  The closing """ on a separate line.  All
sentences terminated by periods.

> +    def send(self, fromaddr, toaddrs, message, message_id):
> +        try:
> +            connection = self.smtp(self.hostname, str(self.port))
> +        except socket.error, e:
> +            self._log('info',
> +                "%s - SMTP server %s:%s - could not connect(),"
> +                " %s.",
> +                message_id,
> +                self.hostname,
> +                str(self.port),
> +                str(e),)
> +            # temporary failure, go and sleep for 
> +            # retry_interval
> +            raise MailerTemporaryFailureException()
>  
>          # send EHLO
>          code, response = connection.ehlo()
>          if code < 200 or code >= 300:
>              code, response = connection.helo()
>              if code < 200 or code >= 300:
> -                raise RuntimeError('Error sending HELO to the SMTP server '
> -                                   '(code=%s, response=%s)' % (code, response))
> +                self._log('warning',
> +                          SMTP_ERR_MEDIUM_LOG_MSG,
> +                          message_id,
> +                          code,
> +                          str(response),
> +                          'error sending HELO')
> +                raise MailerTemporaryFailureException()
>  
>          # encryption support
> -        have_tls =  connection.has_extn('starttls')
> +        have_tls = connection.has_extn('starttls')
>          if not have_tls and self.force_tls:
> -            raise RuntimeError('TLS is not available but TLS is required')
> +            error_str = 'TLS is not available but TLS is required'
> +            self._log('warning',
> +                      SMTP_ERR_SERVER_LOG_MSG,
> +                      message_id,
> +                      self.hostname,
> +                      self.port,
> +                      error_str)
> +            raise MaileriTemporaryFailure(error_str)

Typo: "Mailer*i*TemporaryFailure".  Also an indication that this code
branch is not tested, otherwise the typo would've been caught.

> +        except smtplib.SMTPRecipientsRefused, e:
> +            # This exception is raised because no recipients
> +            # were acceptable - lets take the most common error
> +            # code and proceed with that
> +            freq = {}
> +            for result in e.recipients.values():
> +                if freq.has_key(result):
> +                    freq[result] += 1
> +                else:
> +                    freq[result] = 1
> +            max_ = 0
> +            for result in freq.keys():
> +                if freq[result] > max_:
> +                    most_common = result
> +                    max_ = freq[result]
> +            (smtp_code, smtp_error) = most_common

Please extract this bit into a separate function.  The send method is
already way too long and complicated.

Once it's extracted an tested with an isolated test, you can refactor
it to something like

    freq = collections.defaultdict(int)
    for smtp_code, smtp_error in e.recipients.values():
        freq[smtp_code, smtp_error] += 1
    smtp_code, smtp_error = max(freq, key=freq.get)

Oh, wait, that only works on Python 2.5.  Scratch the refactoring.
    

> +        # Log ANY errors
> +        if send_errors is not None:

If an exception happens in this bit above:

> +        try:
> +            send_errors = connection.sendmail(fromaddr, toaddrs, message)
> +        except smtplib.SMTPSenderRefused, e:

then send_errors will be undefined and you'll get a NameError.
Currently this is mostly not a problem, because almost all except
clauses for that try re-raise a different exception (either directly, or
via _handle_smtp_error).

I suggest adding 

           send_errors = None

above the try:, just in case.

> +        # Log ANY errors
> +        if send_errors is not None:
> +            sentaddrs = [x for x in toaddrs
> +                         if x not in send_errors.keys()]
> +            for address in send_errors.keys():
> +                self._log('warning',
> +                    SMTP_ERR_LOG_MSG,
> +                    message_id,
> +                    str(send_errors[address][0]),
> +                    send_errors[address][1],
> +                    fromaddr,
> +                    address)

Suggest rewriting as

               sentaddrs = [x for x in toaddrs
                            if x not in send_errors]
               for address, (smtp_code, smtp_error) in send_errors.items():
                   self._log('warning',
                       SMTP_ERR_LOG_MSG,
                       message_id,
                       str(smtp_code),
                       smtp_error,
                       fromaddr,
                       address)

(Also note that you don't have to explicitly call str() on all ints you
pass to a message; '%s' % 42 works perfectly fine.)

> +        else:
> +            sentaddrs = list(toaddrs)


> Index: src/zope/sendmail/tests/test_mailer.py
> ===================================================================
> --- src/zope/sendmail/tests/test_mailer.py	(revision 84550)
> +++ src/zope/sendmail/tests/test_mailer.py	(revision 84551)
...
>          class SMTP(object):
> +            _exception = None
> +            _exception_args = None
> +            _send_errors = None
>  
>              def __init__(myself, h, p):
>                  myself.hostname = h
>                  myself.port = p
> +                self.smtp = myself
>                  if type(p) == type(u""):
>                      raise socket.error("Int or String expected")
> -                self.smtp = myself
> +                if p == '0':
> +                    raise socket.error("Emulated connect() failure")
>  
>              def sendmail(self, f, t, m):
>                  self.fromaddr = f
>                  self.toaddrs = t
>                  self.msgtext = m
> +                if hasattr(self, '_exception'):
> +                    if self._exception and issubclass(self._exception, Exception):
> +                        if hasattr(self, '_exception_args') \
> +                            and type(self._exception_args) is tuple:
> +                                raise self._exception(*self._exception_args)
> +                        else:
> +                            raise self._exception('Crazy Arguments WANTED!')

That's a bit of an overkill.  If you change the _exception_args default
value to () above, it would be perfectly fine to replace this lats bit
with

                   if self._exception:
                       raise self._exception(*self._exception_args)

It's the job of the Python interpreter to check the suitability of
_exception and _exception_args for the raise statement, you don't have
to do everything by hand.

> +                if self._send_errors:
> +                    return self._send_errors
...
> +    def test_set_logger(self):
> +        # Do this with throw away instances...
> +        test_mailer = SMTPMailer()
> +        log_object = logging.getLogger('test_logger')
> +        test_mailer.set_logger(log_object)
> +        self.assertEquals(isinstance(test_mailer.logger, logging.Logger), True)

Why not

           self.assertTrue(test_mailer.logger is log_object)

instead?
     
>      def test_send(self):
>          for run in (1,2):
>              if run == 2:
>                  self.setUp(u'25')
> -            fromaddr = 'me at example.com'
> -            toaddrs = ('you at example.com', 'him at example.com')
> -            msgtext = 'Headers: headers\n\nbodybodybody\n-- \nsig\n'
> -            self.mailer.send(fromaddr, toaddrs, msgtext)
> -            self.assertEquals(self.smtp.fromaddr, fromaddr)
> -            self.assertEquals(self.smtp.toaddrs, toaddrs)
> -            self.assertEquals(self.smtp.msgtext, msgtext)
> +            else:
> +                self.setUp()

Pointless, the unittest framework calls self.setUp() for you
automatically. 

> +    def test_mailer_no_connect(self):
> +        # set up test value to raise socket.error exception
> +        self.setUp('0')
> +        try:
> +            self.mailer.send(**self.test_kwargs)
> +        except MailerTemporaryFailureException:
> +            pass

You want

           self.assertRaises(MailerTemporaryFailureException,
                             self.mailer.send, **self.test_kwargs)

here (and elsewhere)

> +        self.assertEquals(self.mailer.logger.infos,


> Index: src/zope/sendmail/tests/test_delivery.py
> ===================================================================
> --- src/zope/sendmail/tests/test_delivery.py	(revision 84550)
> +++ src/zope/sendmail/tests/test_delivery.py	(revision 84551)
> @@ -21,6 +21,7 @@ $Id$
...
> +from socket import error as socket_error

What's the advantage of this over

  import socket

and then using socket.error elsewhere?
 
> +    def test_unlink(self):
> +        self.thread.log = LoggerStub()          # Clean log
> +        self.filename = os.path.join(self.dir, 'message')
> +        self.tmp_filename = os.path.join(self.dir, '.sending-message')
> +        temp = open(self.filename, "w+b")
> +        temp.write('X-Zope-From: foo at example.com\n'
> +                   'X-Zope-To: bar at example.com, baz at example.com\n'
> +                   'Header: value\n\nBody\n')
> +        temp.close()
> +        self.md.files.append(self.filename)
> +        os.link(self.filename, self.tmp_filename)

Is os.link available on Windows?

> +        self.thread._unlinkFile(self.tmp_filename)
> +        self.failUnless(os.path.exists(self.filename))
> +        self.failIf(os.path.exists(self.tmp_filename), 'File exists')

> Index: src/zope/sendmail/maildir.py
> ===================================================================
> --- src/zope/sendmail/maildir.py	(revision 84550)
> +++ src/zope/sendmail/maildir.py	(revision 84551)
...
> +    def _cleanLockLinks(self):
> +        """Clean the maildir of any .sending-* lock files"""
> +        # Fish inside the Maildir queue directory to get
> +        # .sending-* files
> +        # Get The links
> +        join = os.path.join
> +        subdir_cur = join(self.path, 'cur')
> +        subdir_new = join(self.path, 'new')
> +        lock_links = [join(subdir_new, x) for x in os.listdir(subdir_new)
> +                        if x.startswith(SENDING_MSG_LOCK_PREFIX)]
> +        lock_links += [join(subdir_cur, x) for x in os.listdir(subdir_cur)
> +                        if x.startswith(SENDING_MSG_LOCK_PREFIX)]
> +        # Remove any links
> +        for link in lock_links:
> +            try:
> +                os.unlink(link)
> +            except OSError, e:
> +                if e.errno == 2: # file does not exist

If you import errno, you can write

                   if e.errno = errno.ENOENT:

instead.  I find that a bit clearer.

> +                    # someone else unlinked the file; oh well
> +                    pass
> +                else:
> +                    # something bad happend
> +                    raise

> Index: src/zope/sendmail/interfaces.py
> ===================================================================
> --- src/zope/sendmail/interfaces.py	(revision 84550)
> +++ src/zope/sendmail/interfaces.py	(revision 84551)
...
> +#
> +# Exception classes for use within Zope Sendmail, for use of Mailers
> +#
> +class IMailerFailureException(IException):
> +    """Failure in sending mail"""
> +    pass

The 'pass' is not needed here.

> +class MailerFailureException(Exception):
> +    """Failure in sending mail"""
> +
> +    implements(IMailerFailureException)
> +
> +    def __init__(self, message="Failure in sending mail"):
> +        self.message = message
> +        self.args = (message,)

Is the __init__ needed?  Exception.__init__ will do essentially the same
thing.

> +
> +class IMailerTemporaryFailureException(IMailerFailureException):
> +    """Temporary failure in sending mail - retry later"""
> +    pass
> +
> +class MailerTemporaryFailureException(MailerFailureException):
> +    """Temporary failure in sending mail - retry later"""
> +
> +    implements(IMailerTemporaryFailureException)
> +
> +    def __init__(self, message="Temporary failure in sending mail - retry later"):
> +        self.message = message
> +        self.args = (message,)
> +
> +
> +class IMailerPermanentFailureException(IMailerFailureException):
> +    """Permanent failure in sending mail - take reject action"""
> +    pass
> +
> +class MailerPermanentFailureException(MailerFailureException):
> +    """Permanent failure in sending mail - take reject action"""
> +
> +    implements(IMailerPermanentFailureException)
> +
> +    def __init__(self, message="Permanent failure in sending mail - take reject action"):
> +        self.message = message
> +        self.args = (message,)
> +
>  
>  class IMailer(Interface):
> -    """Mailer handles synchronous mail delivery."""
> +    """Mailer handles synchronous mail delivery.
>  
> -    def send(fromaddr, toaddrs, message):
> +    Mailer can raise the exceptions
> +
> +        MailerPermanentFailure
> +        MailerTemporaryFailure

Well, no, your exception classes are named
MailerTemporaryFailureException and MailerPermanentFailureException.

(I'd prefer the shorter names, personally.  Maybe MailerTemporaryError
and MailerPermanentError, because FooError is the common naming
convention for exceptions.)

> +
> +    to indicate to sending process what action to take.
> +    """
> +
> +    def send(fromaddr, toaddrs, message, message_id):
>          """Send an email message.
>  
>          `fromaddr` is the sender address (unicode string),
> @@ -138,12 +203,18 @@ class IMailer(Interface):
>          2822.  It should contain at least Date, From, To, and Message-Id
>          headers.
>  
> +        `message_id` is an id for the message, typically a filename.
> +

The naming clash with the Message-ID header (which has got *absolutely
nothing* to do with this message_id argument) is unfortunate.  I think
this should be clarified in the docstring.  No, I think someone should
go through the source code and replace message_id with queue_id
everywhere where it refers to the os.path.basename(file_in_queue) and
not to the Message-ID header.

>          Messages are sent immediatelly.
>  
>          Dispatches an `IMailSentEvent` on successful delivery, otherwise an
>          `IMailErrorEvent`.
>          """
>  
> +    def set_logger(logger):
> +        """Set the log object for the Mailer - this is for use by
> +           QueueProcessorThread to hand a logging object to the mailer
> +        """

Docstring formatting convention.

           """Set the logger for additional messages.

           The additional messages include information about SMTP
           errors, connection timeouts, etc.

           The logger should be a logging.Logger object.  If you pass
           None, no messages will be logged.
           """

Also, it's conventional to leave two blank lines between class and
interface declarations.  (PEP-8 says this.)

>  class ISMTPMailer(IMailer):
>      """A mailer that delivers mail to a relay host via SMTP."""


> Index: src/zope/sendmail/delivery.py
> ===================================================================
> --- src/zope/sendmail/delivery.py	(revision 84550)
> +++ src/zope/sendmail/delivery.py	(revision 84551)
> @@ -223,6 +241,8 @@ class QueueProcessorThread(threading.Thr
>          self.maildir = Maildir(path, True)
>  
>      def setMailer(self, mailer):
> +        if not(IMailer.providedBy(mailer)):
> +            raise (DoesNotImplement)

This doesn't look like Python -- all those extra parens.

           if not IMailer.providedBy(mailer):
                raise TypeError('not an IMailer', mailer)

>          self.mailer = mailer
>  
>      def _parseMessage(self, message):
...
> +    def _unlinkFile(self, filename):
> +        """Unlink the message file """
> +        try:
> +            os.unlink(filename)
> +        except OSError, e:
> +            if e.errno == 2: # file does not exist
> +                # someone else unlinked the file; oh well
> +                pass
> +            else:
> +                # something bad happend, log it
> +                raise

I've seen this function before.  Can you merge all the copies into a
global function, say, called safe_delete()?

> +    def _queueRetryWait(self, tmp_filename, forever):
> +        """Implements Retry Wait if there is an SMTP Connection
> +           Failure or error 4xx due to machine load etc

Blank line after the first one, then no extra indentation.

> +        """
> +        # Clean up by unlinking lock link
> +        self._unlinkFile(tmp_filename)
> +        # Wait specified retry interval in stages of self.interval
> +        count = self.retry_interval
> +        while(count > 0 and not self.__stopped):
> +            if forever:
> +                time.sleep(self.interval)
> +            count -= self.interval

Interesting.  But lose the unnecessary parens after the while, at least.

Is there any point to enter the loop if the forever arg is False?

> +        # Plug for test routines so that we know we got here
> +        if not forever:
> +            self.test_results['_queueRetryWait'] \
> +                    = "Retry timeout: %s count: %s" \
> +                        % (str(self.retry_interval), str(count))

Ouch.  I think monkeypatching this method would be cleaner.

>      def run(self, forever=True):
>          atexit.register(self.stop)
> +        # Clean .sending- lock files from queue
> +        if self.clean_lock_links:
> +            self.maildir._cleanLockLinks()
> +        # Set up logger for mailer
> +        if hasattr(self.mailer, 'set_logger'):
> +            self.mailer.set_logger(self.log)

The interface requires all mailers to have a set_logger(), so why the
hasattr test?

> @@ -263,8 +320,9 @@ class QueueProcessorThread(threading.Thr
>                  fromaddr = ''
>                  toaddrs = ()
>                  head, tail = os.path.split(filename)
> -                tmp_filename = os.path.join(head, '.sending-' + tail)
> -                rejected_filename = os.path.join(head, '.rejected-' + tail)
> +                tmp_filename = os.path.join(head, SENDING_MSG_LOCK_PREFIX + tail)
> +                rejected_filename = os.path.join(head, REJECTED_MSG_PREFIX + tail)
> +                message_id = os.path.basename(filename)
>                  try:
>                      # perform a series of operations in an attempt to ensure
>                      # that no two threads/processes send this message
> @@ -339,53 +397,46 @@ class QueueProcessorThread(threading.Thr
>                      file.close()
>                      fromaddr, toaddrs, message = self._parseMessage(message)
>                      try:
> -                        self.mailer.send(fromaddr, toaddrs, message)

Oh, the message_id argument to mailer.send() is new!  Please, please,
please, call it queue_id, and not message_id.

> -                    except smtplib.SMTPResponseException, e:
> -                        if 500 <= e.smtp_code <= 599:
> -                            # permanent error, ditch the message
> -                            self.log.error(
> -                                "Discarding email from %s to %s due to"
> -                                " a permanent error: %s",
> -                                fromaddr, ", ".join(toaddrs), str(e))
> -                            os.link(filename, rejected_filename)
> -                        else:
> -                            # Log an error and retry later
> -                            raise
> +                        sentaddrs = self.mailer.send(fromaddr,
> +                                                     toaddrs,
> +                                                     message,
> +                                                     message_id)
> +                    except MailerTemporaryFailureException, e:
> +                        self._queueRetryWait(tmp_filename, forever)
> +                        # We break as we don't want to send message later
> +                        break;

No semicolons please.

> +                    except MailerPermanentFailureException, e:
> +                        os.link(filename, rejected_filename)
> +                        sentaddrs = []


Overall I like the new features *a lot*.  The stylistic issues are
mostly minor.  The most important of those are

  * Fix the integer division bug (easy, but a unit test would be good).

  * Resolve the exception name confusion (easy).

  * Clearly distinguish queue IDs from RFC-2822 message IDs (should be easy).

And I have a questions:

  * What's up with retry_interval?  _queueRetryWait doesn't do anything
    in the inner loop, after sleeping for retry_interval seconds -- it
    just checks self.__stopped.  I suppose this makes Zope 3 shutdown
    (or restart) faster, but the name seems to be wrong -- there are no
    actual retries made after retry_interval seconds.

Thank you very much for diving into the murky depths and improving
zope.sendmail!

Marius Gedminas
-- 
If it weren't for the last minute, nothing would get done.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : http://mail.zope.org/pipermail/zope-dev/attachments/20080319/66b5f9d5/attachment-0001.bin


More information about the Zope-Dev mailing list