[Zope] Using RegEx in Script(Python)

Joel Burton joel@joelburton.com
Mon, 30 Jun 2003 19:21:07 -0400


On Fri, Jun 27, 2003 at 01:09:27AM +0200, Andreas Pakulat wrote:
> Hi,
> 
> I wanted to know if the above can be done? What I need is a function
> that replaces every character of a string, that is not in [a-zA-Z1-9]
> with an underscore. I want to use this to automatically create an
> Object-Id from a title, to create a new Object.
> 
> If this is not possible directly within a Script(Python), can it be done
> using an ExternalMethod? I suppose yes.
> 
> Andreas

If you're looking for a "clean-zope-id" method, we use the following. A
simple regex solution can easily miss things like stripping leading
underscores or collapsing double underscores. I actually do this without
regexes, using translate(), but regexes might be faster. Feel free to
benchmark and say so. ;)
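
For comparison, a regex version that does cover those edge cases might
look roughly like the sketch below. It is only an illustration written
for a current Python, not the code we run: the name clean_id_regex is
made up, and the set of punctuation that gets deleted outright (rather
than turned into an underscore) is assumed to be the same one the
translate() script uses.

```python
import re

# Punctuation that should simply vanish (assumed set; adjust to taste).
_DELETE = re.compile("[" + re.escape("!@#$%^&*()-=+,'\"") + "]")
# Any run of characters that is not a lowercase letter, digit or ".".
_UNSAFE = re.compile(r"[^a-z0-9.]+")


def clean_id_regex(title, maxlen=None):
    """Hypothetical regex-based take on a Zope-safe ID."""
    s = _DELETE.sub("", title)
    # Lowercase first, then collapse each unsafe run into ONE underscore,
    # so double underscores never appear in the first place.
    s = _UNSAFE.sub("_", s.lower())
    # Leading underscores are the part a naive one-line regex forgets.
    s = s.strip("_")
    if maxlen and len(s) > maxlen:
        s = s[:maxlen]
    return s
```

Because the second pattern matches whole runs, the "replace double
underscores" loop is not needed here; the strip("_") call is what takes
care of the leading and trailing ones.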


#!/usr/bin/env python2.1

"""ConvertStringToID

   Converts a string into a Zope-safe ID.

   This removes all non-identifier safe characters. It replaces
   most with underscores, while trying to make the ID match a
   sensible choice (e.g. "Bill's House" -> "bills_house", not "bill_s_house").
   The output is always lowercase, and any leading underscores are
   removed (as they would be illegal in Zope).
"""

import string

# 256-entry translation table: ".", digits and lowercase letters map to
# themselves, uppercase letters map to lowercase, everything else to "_".
# (Built in code so that no entry, such as "Z", can go missing.)
tt = ['_'] * 256
for c in '.0123456789abcdefghijklmnopqrstuvwxyz':
    tt[ord(c)] = c
for c in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
    tt[ord(c)] = c.lower()
tt = ''.join(tt)

def ConvertStringToID(s, maxlen=None):
    """
    Convert String to ID

    s = string to convert
    maxlen = maximum length of ID

    returns string.
    """
    
    # delete the listed punctuation outright, then map every remaining
    # character through tt (anything not a letter, digit or "." becomes "_")
    s = string.translate(s, tt, '!@#$%^&*()-=+,\'"')

    # collapse every run of underscores down to a single underscore
    while s.find("__") > -1:
        s = s.replace('__','_')

    # when we use py2.2.2, this and below can simply be s = s.strip("_"). yeah!
    # trim underscores off front
    while s.startswith("_"):
        s = s[1:]

    # trim underscores off end
    while s.endswith("_"):
        s = s[:-1]

    # trim to maxlength
    if maxlen and len(s) > maxlen:
        s = s[:maxlen]

    return s


if __name__ == '__main__':
    assert ConvertStringToID("____A Lover's  %   Tale (Of 2 Cities).doc_") == "a_lovers_tale_of_2_cities.doc"
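
The script above targets Python 2.1, where string.translate() takes a
256-character table plus a string of characters to delete. On a current
Python that function no longer exists, and str.translate() takes a
mapping from code points instead. A rough port might look like the
sketch below. The name convert_string_to_id is hypothetical, the
defaultdict trick is one way (not the only one) to say "everything else
becomes an underscore", and the rstrip after truncation is an addition
that the original script does not have.

```python
from collections import defaultdict
import string

# Mapping from code point to replacement; unknown code points fall back to "_".
_TABLE = defaultdict(lambda: "_")
for ch in "." + string.digits + string.ascii_lowercase:
    _TABLE[ord(ch)] = ch                  # keep as-is
for ch in string.ascii_uppercase:
    _TABLE[ord(ch)] = ch.lower()          # fold to lowercase
for ch in "!@#$%^&*()-=+,'\"":
    _TABLE[ord(ch)] = None                # None means "delete this character"


def convert_string_to_id(s, maxlen=None):
    """Hypothetical Python 3 port of ConvertStringToID."""
    s = s.translate(_TABLE)
    # Collapse runs of underscores into a single one.
    while "__" in s:
        s = s.replace("__", "_")
    # Trim underscores off both ends.
    s = s.strip("_")
    if maxlen and len(s) > maxlen:
        # Addition over the original: truncating can expose a trailing "_".
        s = s[:maxlen].rstrip("_")
    return s


if __name__ == "__main__":
    title = "____A Lover's  %   Tale (Of 2 Cities).doc_"
    assert convert_string_to_id(title) == "a_lovers_tale_of_2_cities.doc"
```

The same test string as in the original passes, so for plain ASCII titles
the two versions should agree; any non-ASCII character simply becomes an
underscore here.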



HTH.
-- 

Joel BURTON  |  joel@joelburton.com  |  joelburton.com  |  aim: wjoelburton
Independent Knowledge Management Consultant