41 9th March 03:17
antoon pardon
Default Bug in string.find; was: Re: Proposed PEP: New style indexing, was Re: Bug in slice type



On 2005-08-27, Steve Holden <steve@holdenweb.com> wrote:

Sometimes it is convenient to have the exception thrown at a later
time.


And maybe the more convenient place for this "if" is in a whole different
part of your program, a part where using -1 as an invalid index isn't
at all obvious.

You always seem to look at such things in a very narrow scope. You never
seem to consider that various parts of a program have to work together.

So what happens if you have a module that is collecting string-index
pairs, collected from various other parts? In one part you
want to select the last letter, so you pythonically choose -1 as
the index. In another part you get a result from find and are happy
with -1 as an indication of an invalid index. Then these
data meet.
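
The scenario above can be sketched in a few lines (Python 3). The `pairs` list and the two "producers" are hypothetical stand-ins for the separate parts of a program; only `str.find()` and negative indexing come from the discussion.

```python
# Hypothetical collector: each entry is a (text, index) pair.
pairs = []

text = "buggy"

# Producer A deliberately asks for the last character.
pairs.append((text, -1))

# Producer B stores the result of a failed search; find() returns -1.
pairs.append((text, text.find("w")))

# The consumer cannot tell the intentional -1 from the failure flag.
letters = [s[i] for s, i in pairs]
print(letters)
```

Both entries compare equal, so nothing downstream can recover which one meant "not found".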

--
Antoon Pardon
42 11th March 06:07
steven bethard
Default Bug in string.find; was: Re: Proposed PEP: New style indexing,was



See the current thread in python-dev[1], which proposes a new method,
str.partition(). I believe that Raymond Hettinger has shown that almost
all uses of str.find() can be more clearly represented with his
proposed function.
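
A small sketch (Python 3) of the contrast being described, using `str.partition()` in the form that was later added to the language in Python 2.5. The sample strings and variable names are made up for illustration.

```python
line = "key=value"

# With find(), the caller must remember to test for -1 before slicing.
pos = line.find("=")
if pos == -1:
    head_f, tail_f = line, ""
else:
    head_f, tail_f = line[:pos], line[pos + 1:]

# With partition(), an empty separator slot signals "not found",
# and no index ever has to be handled.
head_p, sep, tail_p = line.partition("=")

missing = "novalue".partition("=")
print((head_f, tail_f), (head_p, sep, tail_p), missing)
```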

STeVe

[1]http://mail.python.org/pipermail/python-dev/2005-August/055781.html
43 11th March 20:49
steve holden
Default Bug in string.find; was: Re: Proposed PEP: New style indexing,


Or perhaps it's just that I try not to mix parts inappropriately.

That's when debugging has to start. Mixing data of such types is
somewhat inadvisable, don't you agree?

I suppose I can't deny that people do things like that, myself included,
but mixing data sets where -1 is variously an error flag and a valid
index is only going to lead to trouble when the combined data is used.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
44 12th March 07:18
terry reedy
Default Bug in string.find;


The fact that the -1 return *has* led to bugs in actual code is the
primary reason Guido has currently decided that find and rfind should go.
A careful review of current usages in the standard library revealed at
least a couple bugs even there.

Terry J. Reedy
45 12th March 10:49
paul rubin
Default Bug in string.find; was: Re: Proposed PEP: New style indexing,was Re: Bug in slice type


"Terry Reedy" <tjreedy@udel.edu> writes:

Really it's x[-1]'s behavior that should go, not find/rfind.

Will socket.connect_ex also go? How about dict.get? Being able to
return some reasonable value for "failure" is a good thing, if failure
is expected. Exceptions are for unexpected, i.e., exceptional failures.
46 12th March 10:49
antoon pardon
Default Bug in string.find; was: Re: Proposed PEP: New style indexing, was Re: Bug in slice type


On 2005-08-29, Steve Holden <steve@holdenweb.com> wrote:

I didn't know it was inappropriate to mix certain parts. Can you
give a list of modules in the standard library I shouldn't mix?


The type of both data is the same, it is a string-index pair in
both cases. The problem is that a module from the standard lib
uses a certain value to indicate an illegal index, that has
a very legal value in python in general.


It is not about what people do. If this was about someone implementing
find himself and using -1 as an illegal index, I would certainly agree
that it was inadvisable to do so. Yet when this is what python with
its library offers the programmer, you seem reluctant to find fault with it.

Yet this is what python does: it uses -1 variously as an error flag and
a valid index, and when people complain about that, you say it sounds like
whining.

--
Antoon Pardon
47 12th March 10:50
antoon pardon
Default Bug in string.find; was: Re: Proposed PEP: New style indexing,was Re: Bug in slice type


On 2005-08-29, Steven Bethard <steven.bethard@gmail.com> wrote:


Do we really need this? As far as I understand, most of this
functionality is already provided by str.split and str.rsplit.

I think adding an optional third parameter 'full=False' to these
methods would be all that is useful here. If full was set
to True, split and rsplit would enforce that a list with
maxsplit + 1 elements was returned, filling up the list with
Nones if necessary.


head, sep, tail = str.partition(sep)

would then almost be equivalent to

head, tail = str.split(sep, 1, True)
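
The 'full' parameter suggested here does not exist on `str.split`. As a rough sketch of the intended behaviour (Python 3), a hypothetical helper named `split_full` can pad the result with None; the name and signature are assumptions made for this illustration, and `maxsplit` is assumed to be non-negative.

```python
def split_full(text, sep, maxsplit):
    """Hypothetical stand-in for the proposed text.split(sep, maxsplit, full=True)."""
    parts = text.split(sep, maxsplit)
    # Pad with None so the result always has maxsplit + 1 elements.
    parts.extend([None] * (maxsplit + 1 - len(parts)))
    return parts

head, tail = split_full("a b c", " ", 1)    # separator present
head2, tail2 = split_full("abc", " ", 1)    # separator absent
print((head, tail), (head2, tail2))
```

One difference from partition() shows up when the separator is absent: the padded tail is None, whereas partition() returns an empty string for it.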


Code like the following:

head, found, tail = result.partition(' ')
if not found:
    break
result = head + tail


Could be replaced by:

head, tail = result.split(' ', 1, full = True)
if tail is None:
    break
result = head + tail


I also think that code like this:

while tail:
    head, _, tail = tail.partition('.')
    mname = "%s.%s" % (m.__name__, head)
    m = self.import_it(head, mname, m)
    ...


Would probably better be written as follows:

for head in tail.split('.'):
    mname = "%s.%s" % (m.__name__, head)
    m = self.import_it(head, mname, m)
    ...


Unless I'm missing something.
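
One possible difference between the two loops, sketched in Python 3: they agree on ordinary dotted names but not on an empty string or a trailing dot. The two helper functions are hypothetical; they only record which `head` values each loop would visit, leaving out the import logic.

```python
def heads_via_partition(dotted):
    """Collect the heads visited by the partition-based while loop."""
    heads, tail = [], dotted
    while tail:
        head, _, tail = tail.partition(".")
        heads.append(head)
    return heads

def heads_via_split(dotted):
    """Collect the heads visited by the split-based for loop."""
    return list(dotted.split("."))

for name in ("os.path", "os.path.", ""):
    print(repr(name), heads_via_partition(name), heads_via_split(name))
```

Whether the extra empty head matters depends on what values `tail` can take in the real code, which the thread does not show.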


--
Antoon Pardon


48 12th March 10:50
terry reedy
Default Bug in string.find;


I completely disagree; x[-1] as an abbreviation of x[len(x)-1] is extremely
useful, especially when 'x' is an expression instead of a name. But even
if -1 were not a legal subscript, I would still consider it a design error
for Python to mistype a non-numeric singleton indicator as an int. Such
mistyping is only necessary in a language like C that requires all return
values to be of the same type, even when the 'return value' is not really a
return value but an error signal. Python does not have that typing
restriction and should not act as if it does by copying C.


Not familiar with it.


A default value is not necessarily an error indicator. One can regard a
dict that is 'getted' as an infinite dict matching all keys with the
default except for a finite subset of keys, as recorded in the dict.

If the default is to be regarded a 'Nothing to return' indicator, then that
indicator *must not* be in the dict. A recommended idiom is to then create
a new, custom subset of object which *cannot* be a value in the dict.
Return values can then safely be compared with that indicator by using the
'is' operator.
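
A minimal sketch of that idiom (Python 3). The names `_MISSING`, `settings` and `lookup` are hypothetical; the only library call used is `dict.get` with a default argument.

```python
# A unique object that no caller could have stored as a value.
_MISSING = object()

settings = {"timeout": None, "retries": 3}

def lookup(mapping, key):
    """Return (found, value) without confusing a stored None with absence."""
    value = mapping.get(key, _MISSING)
    if value is _MISSING:      # identity test, as recommended above
        return False, None
    return True, value

print(lookup(settings, "timeout"), lookup(settings, "verbose"))
```

Here a stored None is reported as found, while a truly absent key is not, which a plain `settings.get(key)` could not distinguish.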

In either case, .get is significantly different from .find.

Terry J. Reedy
49 12th March 10:50
paul rubin
Default Bug in string.find; was: Re: Proposed PEP: New style indexing,was Re: Bug in slice type


"Terry Reedy" <tjreedy@udel.edu> writes:

There are other abbreviations possible, for example the one in the
proposed PEP at the beginning of this thread.

OK, .find should return None if the string is not found.
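
To illustrate what that would change (Python 3): `find_or_none` below is a hypothetical wrapper, not an existing method. Because None is not a valid subscript, a failed search used as an index would raise instead of silently selecting the last character.

```python
def find_or_none(text, sub):
    """Hypothetical wrapper: like str.find(), but returns None on failure."""
    pos = text.find(sub)
    return None if pos == -1 else pos

s = "buggy"
try:
    s[find_or_none(s, "w")]
except TypeError:
    outcome = "TypeError"      # a str cannot be indexed with None
else:
    outcome = "no error"
print(outcome)
```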
50 12th March 14:35
bryan olson
Default Bug in string.find; was: Re: Proposed PEP: New style indexing,was


Hear us out; your disagreement might not be so complete as you
think. From-the-far-end indexing is too useful a feature to
trash. If you look back several posts, you'll see that the
suggestion here is that the index expression should explicitly
call for it, rather than treat negative integers as a special
case.

I wrote up and sent off my proposal, and once the PEP-Editors
respond, I'll be pitching it on the python-dev list. Below is
the version I sent (not yet a listed PEP).


--
--Bryan


PEP: -1
Title: Improved from-the-end indexing and slicing
Version: $Revision: 1.00 $
Last-Modified: $Date: 2005/08/26 00:00:00 $
Author: Bryan G. Olson <bryan.olson@acm.org>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 26 Aug 2005
Post-History:


Abstract

To index or slice a sequence from the far end, we propose
using a symbol, '$', to stand for the length, instead of
Python's current special-case interpretation of negative
subscripts. Where Python currently uses:

sequence[-i]

We propose:

sequence[$ - i]

Python's treatment of negative indexes as offsets from the
high end of a sequence causes minor obvious problems and
major subtle ones. This PEP proposes a consistent meaning
for indexes, yet still supports from-the-far-end
indexing. Use of new syntax avoids breaking existing code.


Specification

We propose a new style of slicing and indexing for Python
sequences. Instead of:

sequence[start : stop : step]

new-style slicing uses the syntax:

sequence[start ; stop ; step]

It works like current slicing, except that negative start or
stop values do not trigger from-the-high-end interpretation.
Omissions and 'None' work the same as in old-style slicing.

Within the square-brackets, the '$' symbol stands for the
length of the sequence. One can index from the high end by
subtracting the index from '$'. Instead of:

seq[3 : -4]

we write:

seq[3 ; $ - 4]

When square-brackets appear within other square-brackets,
the inner-most bracket-pair determines which sequence '$'
describes. The length of the next-outer sequence is denoted
by '$1', and the next-outer after that by '$2', and so on. The
symbol '$0' behaves identically to '$'. Resolution of $x is
syntactic; a callable object invoked within square brackets
cannot use the symbol to examine the context of the call.

The '$' notation also works in simple (non-slice) indexing.
Instead of:

seq[-2]

we write:

seq[$ - 2]

If we did not care about backward compatibility, new-style
slicing would define seq[-2] to be out-of-bounds. Of course
we do care about backward compatibility, and rejecting
negative indexes would break way too much code. For now,
simple indexing with a negative subscript (and no '$') must
continue to index from the high end, as a deprecated
feature. The presence of '$' always indicates new-style
indexing, so a programmer who needs a negative index to
trigger a range error can write:

seq[($ - $) + index]
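
The '$' syntax is only a proposal and cannot be run. As a rough approximation in today's Python (Python 3), the hypothetical helper `strict_index` below rejects negative subscripts, and the caller writes `len(seq) - 2` where the proposal would write `$ - 2`. It is a sketch of the intended semantics for simple indexing only, not part of the PEP.

```python
def strict_index(seq, index):
    """Hypothetical helper: index seq, treating negative indexes as out of range."""
    if index < 0:
        raise IndexError("negative index %d is out of range" % index)
    return seq[index]

seq = list(range(10))

# Plays the role of seq[$ - 2] in the proposed syntax.
last_but_one = strict_index(seq, len(seq) - 2)

try:
    strict_index(seq, -2)
    rejected = False
except IndexError:
    rejected = True
print(last_but_one, rejected)
```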


Motivation

From-the-far-end indexing is such a useful feature that we
cannot reasonably propose its removal; nevertheless Python's
current method, which is to treat a range of negative
indexes as special cases, is warty. The wart bites novice or
imperfect Pythoners by not raising an exception when they
need to know about a bug. For example, the following code
prints 'y' with no sign of error:

s = 'buggy'
print s[s.find('w')]
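
For comparison, a Python 3 sketch of the same lookup written with `str.index()`, which exists alongside `find()` and reports a failed search by raising ValueError. The variable names are made up for this illustration.

```python
s = "buggy"

# find() reports failure as -1, which is also a valid subscript.
silent = s[s.find("w")]

# index() reports failure by raising ValueError instead.
try:
    s[s.index("w")]
    loud = None
except ValueError:
    loud = "ValueError"

print(silent, loud)
```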

The wart becomes an even bigger problem with more
sophisticated use of Python sequences. What is the 'stop'
value for a slice when the step is negative and the slice
includes the zero index? An instance of Python's slice type
will report that the stop value is -1, but if we use this
stop value to slice, it gets misinterpreted as the last
index in the sequence. Here's an example:

class BuggerAll:

    def __init__(self, somelist):
        self.sequence = somelist[:]

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(len(self.sequence))
            # print 'Slice says start, stop, step are:', start, stop, step
            return self.sequence[start : stop : step]


print range(10) [None : None : -2]
print BuggerAll(range(10))[None : None : -2]

The above prints:

[9, 7, 5, 3, 1]
[]

Un-commenting the print statement in __getitem__ shows:

Slice says start, stop, step are: 9 -1 -2

The slice object seems to think that -1 is a valid exclusive
bound, but when using it to actually slice, Python
interprets the negative number as an offset from the high
end of the sequence.

Steven Bethard offered the simpler example:
py> range(10)[slice(None, None, -2)]
[9, 7, 5, 3, 1]
py> slice(None, None, -2).indices(10)
(9, -1, -2)
py> range(10)[9:-1:-2]
[]
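
One common way to sidestep the problem in current Python (shown here for Python 3, and not part of the proposal) is to feed the result of `slice.indices()` to `range()`, which treats -1 as a plain exclusive bound. The class name `SliceAware` is hypothetical.

```python
class SliceAware:
    """Hypothetical variant of BuggerAll that maps indices() to positions via range()."""

    def __init__(self, items):
        self.sequence = list(items)

    def __len__(self):
        return len(self.sequence)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # range() never reinterprets -1 as "the last item".
            positions = range(*key.indices(len(self.sequence)))
            return [self.sequence[i] for i in positions]
        return self.sequence[key]

result = SliceAware(range(10))[::-2]
print(result)
```

This works around the double meaning rather than removing it, which is the point the text goes on to make.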

The double-meaning of -1, as both an exclusive stopping
bound and an alias for the highest valid index, is just
plain whacked. So what should the slice object return? With
Python's current indexing/slicing, there is no value that
just works. 'None' will work as a stop value in a slice, but
index arithmetic will fail. The value 0 - (len(sequence) +
1) will work as a stop value, and slice arithmetic and
range() will happily use it, but the result is not what the
programmer probably intended.
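
The two candidate stop values can be checked directly. The snippet below is a Python 3 sketch (the PEP's own examples are Python 2); the variable names are made up for illustration.

```python
seq = list(range(10))
n = len(seq)

# None works as a stop value in a slice...
with_none = seq[9:None:-2]

# ...and so does 0 - (len(sequence) + 1), which is -11 here.
with_offset = seq[9:0 - (n + 1):-2]

# But range() takes -11 literally and keeps going below zero.
as_range = list(range(9, 0 - (n + 1), -2))

print(with_none, with_offset, as_range)
```

The slice results agree, while the range runs on through negative values, and arithmetic on a None stop value would raise TypeError.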

The problem is subtle. A Python sequence starts at index
zero. There is some appeal to giving negative indexes a
useful interpretation, on the theory that they were invalid
as subscripts and thus useless otherwise. That theory is
wrong, because negative indexes were already useful, even
though not legal subscripts, and the reinterpretation often
breaks their existing use. Specifically, negative indexes are
useful in index arithmetic, and as exclusive stopping
bounds.

The problem is fixable. We propose that negative indexes not
be treated as a special case. To index from the far end of a
sequence, we use a syntax that explicitly calls for far-end
indexing.


Rationale

New-style slicing/indexing is designed to fix the problems
described above, yet live happily in Python along-side the
old style. The new syntax leaves the meaning of existing
code unchanged, and is even more Pythonic than current
Python.

Semicolons look a lot like colons, so the new semicolon
syntax follows the rule that things that are similar should
look similar. The semicolon syntax is currently illegal, so
its addition will not break existing code. Python is
historically tied to C, and the semicolon syntax is
evocative of the similar start-stop-step expressions of C's
'for' loop. JPython is tied to Java, which uses a similar
'for' loop syntax.

The '$' character currently has no place in a Python index,
so its new interpretation will not break existing code. We
chose it over other unused symbols because the usage roughly
corresponds to its meaning in the Python library's regular
expression module.

We expect use of the $0, $1, $2 ... syntax to be rare;
nevertheless, it has a Pythonic consistency. Thanks to Paul
Rubin for advocating it over the inferior multiple-$ syntax
that this author initially proposed.


Backwards Compatibility

To avoid breaking code, we use new syntax that is currently
illegal. The new syntax more-or-less looks like current
Python, which may help Python programmers adjust.

User-defined classes that implement the sequence protocol
are likely to work, unchanged, with new-style slicing.
'Likely' is not certain; we've found one subtle issue (and
there may be others):

Currently, user-defined classes can implement Python
subscripting and slicing without implementing Python's len()
function. In our proposal, the '$' symbol stands for the
sequence's length, so classes must be able to report their
length in order for $ to work within their slices and
indexes.

Specifically, to support new-style slicing, a class that
accepts index or slice arguments to any of:

__getitem__
__setitem__
__delitem__
__getslice__
__setslice__
__delslice__

must also consistently implement:

__len__

Sane programmers already follow this rule.
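
A small illustration of the issue (Python 3): the hypothetical class `Squares` supports subscripting but defines no `__len__`, so there would be no length for '$' to stand for.

```python
class Squares:
    """Hypothetical class that supports subscripting but defines no __len__."""

    def __getitem__(self, index):
        return index * index

squares = Squares()
value = squares[4]            # subscripting works

try:
    len(squares)
    has_length = True
except TypeError:
    has_length = False        # len() is unavailable without __len__

print(value, has_length)
```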

Copyright:

This document has been placed in the public domain.