First Python Script: Roman Numbers

Sat Apr 14 04:22:21 EDT 2001

In article <9b6pqv$1seu$1 at news.idiom.com>, Jim Dennis wrote:

> Here's my first Python script.  Way back when I was taking classes
> at a community college and learning Pascal, I got an assignment to
> convert numbers to roman numeral strings.  After tangling with the
> various rules for a bit, I realized that I could actually eliminate
> most of the conditionals by using a more elegant list of tokens.

> Since then I use the "toRoman" as a stock exercise for teaching
> myself new scripting and programming languages.  I've even used
> version in Bourne syntax for teaching classes in shell scripting
> (I've included that, too).

 I didn't mention it in my original post, but my fromRoman
 function is new.  I'd never spend the time to get that working
 in the various other forms of this exercise that I've done.

 Naturally there was a bug in the first version that I posted
 (thus, this follow up).  The problem was that strings like
 IXIV are not canonical Roman numbers (that would add up to 
 13 or XIII).  It took about six lines of code to fix that 
 (make a dictionary of illegal combinations and a list of 
 tokens to skip.  If we see CM, XC or IX we add the appropriate
 patterns, CD, XL or IV respectively to the skip list).

 For fun, I'll add a new test suite generator.  It generates
 every string consisting of the single letter roman numeral
 tokens: MDCLXVI;  In other words, it counts in base 7 using
 those as the digits.    Actually I'll simply re-write and 
 generate something I did in Pascal about 10 years ago --- a toy
 to count in "any" base --- which I orginally wrote to "count"
 in base 26 using the letters of the Roman alphabet as the "base."

 Maybe I'll post that tomorrow.

> Here's the script, I'll let it speak for itself:

>#!/usr/bin/env python
># Roman Number Converter 
>#	by James T. Dennis (c)2001 <jimd at starshine.org>

>#  This code is released and licensed under the terms of the GPL
>#  or, at the user's option, the BSD license as specified at the
>#  following URLs:
>#	http://www.freebsd.org/copyright/freebsd-license.html
>#	http://www.gnu.org/copyleft/gpl.html
>#
>#   In any event it is provided free of charge, "as-is" and wholly
>#   without any warranty.   Use it, mangle it, incorporate it into
>#   any software that will have it.
>
># Convert natural numbers to their Roman numeral representations 
># and vice versa.
>
># First we associate a list of numeric values with
># their Roman numeral (string token) equivalents as follows:
>
>Rom={}
>Rom["M"] = 	1000
>Rom["CM"] =  	 900
>Rom["D"] =  	 500
>Rom["CD"] =  	 400
>Rom["C"] = 	 100
>Rom["XC"] =	  90
>Rom["L"] =	  50
>Rom["XL"] =	  40
>Rom["X"] =	  10
>Rom["IX"] =	   9
>Rom["V"] =	   5
>Rom["IV"] =	   4
>Rom["I"] =	   1
>
>RomSeq = ( "M", "CM", "D", "CD", "C", "XC", "L", "XL", 
>	   "X", "IX", "V", "IV", "I" )
>
># In this case we create a Python dictionary (otherwise we'd
># create two arrays, one of integer values and the other of strings
># and use our own functions to search them and translate using a 
># common index).  We also create a sequence tuple in descending
># order.  It's for interating over the value list in a convenient order.
>
># We include the two letter tokens (IV, CM, CD, XC, etc) because
># it makes our main conversion loop simpler (as we'll see).
># Basically it means we can loop straight through without having
># to encode a bunch of parsing conditionals (the sequence, including
># the two letter tokens already incorporates most the the parsing
># rules for roman numeral strings).  
>
># This allows us to convert from Arabic to Roman in about 7 lines 
># of code; and from Roman back to Arabic less than 20
>
># Here's how we use these data structures:
>
>def toRoman (n):
>	result=""
>	if n < 1 or n > 4000:
>		return None
>		## raise IndexError?
>	for i in RomSeq:
>		while Rom[i] <= n:
>			result = result + i
>			n = n - Rom[i]
>	return result
>
># Our result starts as an empty string.
># We interate over the sequence of roman numeral component strings
># if the corresponding value (the value associated with "M" or "CM"
># etc) is greater than our number, we append the current string to 
># our result and subtract its corresponding value from our copy of n
>
># Converting from a roman numeral string back to arabic representation
># is a bit trickier.  We can try using the same data structure.  However,
># we could encounter many types of errors.  
># We were able to filter out all invalid integers with a single compound 
># conditional but there are lots of ways to compose strings that are 
># *not* valid roman numerals.  
>
>def fromRoman (s):
>	result=0
>	s = s.upper()
>	# Start by converting to upper case for convenience

	# IXIV is not a valid canonical roman number
	# (13? = XIII) 
	# So let's make a list of the two character token
	# pairs that are verbotten

	verbotten = { "CM" : "CD", "XC" : "XL", "IX" : "IV" }

	# And a list of tokens to skip if we've seen their
	# (larger) key.  Remember that our parsing is "orderly"
	# so we'll never consider a CM after we've encountered
	# a CD --- the M, CM, and CD cases will always be 
	# considered in that order.
	# The list starts empty: we just append values to it
	# *if* we encounter any of the keys in verbotten.
	skip = []

>
>	for i in RomSeq:
>		# A Roman number, in canonical form, is 
>		# sorted from the larger tokens towards the 
>		# lower valued ones.  Thus IIMXCC is not 
>		# a proper roman number (it would have to be
>		# sorted to MCXCII or 1192).  In fact CMM is
>		# also not valid --- it would have to be MCM
>		# and CMCM is is not valid at all it can't
>		# be simply re-arranged, it must be re-written 
>		# (MDCCC)
>
>		# So we simply walk through the list of tokens
>		# in sequence (just as we did for convert toRoman)
>		# and we compare each token to the "head" of the
>		# remaining string.

		# However we have to mix in a couple extra checks
		# to prevent sequences of more than three of the
		# [MCXI] tokens or more than one of any of the
		# others.

		if i in skip:
		# we've seen CM, XC, or IX so we 
		# skip the corresponding CD, XL, and IV tokens
			continue

>
>		seen = 0
>		limit = 1
>		if i in ("M", "C", "X", "I"):
>			limit = 3
>		# The M, C, X and I tokens make appear in sequences
>		# up to three times.  All others may only appear once
>		# each

		# The seen/limit parsing works for most cases.
		# One case that is unaddressed by it is exlemplified
		# by the sequences IXIV (13? == XIII) or CMCD (1300)

		# See we use the verbotten dictionary to set our 
		# a "skip" list, if we see 

>
>		head = s[:len(i)]
>		while i == head:
>			# on a match (head of string matches token):
>			# track number of times we've see this token
>			# vs. the token's sequence limit
>			seen = seen + 1
>			if seen > limit:
>				break

			if verbotten.has_key(i):
				skip.append(verbotten[i])

>
>			s = s[len(i):]
>			# behead the remaining string
>			head = s[:len(i)]
>			# and get the new head of the string
>			result = result + Rom[i]
>			# oh yeah, and add corresponding value to result
>
>	if s == "":
>		return result
>		
>## The following simply prints a list
>## by converting to a roman number *and back*.
>longest=""
>for i in range(1,4000):
>	s=toRoman(i)
>	if i != fromRoman(s):
>		print "Error?  is %s really %d" % (s, i)
>	print "%5d\t%s" % (fromRoman(s), s) 
>	if len(s) > len(longest):
>		longest = s
>print longest
>
>
>#### End of script
>
> The last 10 lines are just a test suite.
>
> I've noticed that this script works fine under Python 2.x
> but it seems that s.upper() isn't in Python 1.5.x (or that
> I have to import a string module; I didn't spend time on that).
>
> Here's a shell script version of just the "toRoman()"
> function:
>
>
>#!/bin/bash
>valarray="1000 M 900 CM 500 D 400 CD 100 C 90 XC 50 L 40 XL 10"
>valarray="$valarray X 9 IX 5 V 4 IV 1 I"
>
>[ "$1" -lt 4000 ] || {
>   echo "Can't Represent numbers over 4000 in this character set" >&2
>   exit 1
>   }
>
>n="$1"
>set -- $valarray
>while [ "$n" -gt 0 ]; do  # build roman numeral
>        while [ "$n" -lt "$1" ]; do  # find scale
>                shift 2
>                done
>        while [ "$n" -ge $1 ]; do  # add values to result
>                let n-=$1
>                result=$result$2
>                done
>        done
>echo $result