Mombu the Programming Forum

1 23rd October 17:06
andrew hamm
External User
 
Posts: 1
Default Tied arrays and a Perl limitation



[Ilya, I notice your name in the Authors section of the "overload"
perldoc, so this is most likely right up your alley, but of course not
demanding or limiting any replies to your responsibility...]

I am working on a module to implement a flexible array type using
TIEARRAY. I wish it to look as natural as possible when used in code, and
prefer not to have to use the clumsier OO syntax for an object as natural
as an array.

I.e. I don't like

$faref->item($ofs)
$faref->splice($ofs, $len, @list)
@{$faref->items($from .. $to)}

when

$flexarray[$ofs]
splice @flexarray, $ofs, $len, @list
@flexarray[$from .. $to]

etc is so much more appropriate.

The principle of the Array::Flex is simply to have a non-zero origin for
the array. An interesting variant will be an array where the lower bound
(lwb) will shift dynamically due to assignments to entries below the
current lwb, or due to shift, unshift and splice activities. Another
interesting variation would be an array with a fixed lower and upper bound
(upb) with suitable semantics and runtime checks. Language connoisseurs
will probably pick that "flex" is borrowed from Algol68 and that is close
to my design goal.

NOTE: variations have not been implemented in my sample code; either tie
options or subclasses are suitable choices to offer these variations.

The module is quite simple, contents as follows:

package Array::Flex;

use strict;
use warnings;
use Carp;

use fields qw(array lwb);
our $NEGATIVE_INDICES = 1;

# normalise an index to suit the embedded array
sub _idx {
    my $index = $_[1] - $_[0]{lwb};
    $index < 0 and
        croak ref($_[0]), ": index $_[1] out of bounds";
    # subclassed variants might also check the upper bound
    return $index;
} # _idx

# helper subs accessible only by (tied @x)->SUB
# which is sad...
sub lwb { return $_[0]{lwb} }
sub upb { return $_[0]{lwb} + $#{$_[0]{array}} }
sub range { return $_[0]->lwb .. $_[0]->upb }

sub TIEARRAY {
    my ($class, $lwb) = splice @_, 0, 2;
    my Array::Flex $self = fields::new($class);
    $self->{array} = [ @_ ];
    $self->{lwb} = $lwb;
    return $self;
}

sub FETCH { return $_[0]{array}[$_[0]->_idx($_[1])] }
sub STORE { return $_[0]{array}[$_[0]->_idx($_[1])] = $_[2] }

# I've elected that scalar(@flex) registers the true count of elements
# as discussed below, $#flex and another (missing) feature can
# offer the upper and lower bounds of the array.
# $#flex should also yield the true upper bound as normal IMO.
#
sub FETCHSIZE { $#{$_[0]{array}} + 1 }
sub STORESIZE { $#{$_[0]{array}} = $_[1] - 1 }
sub CLEAR { $_[0]{array} = [ ] }
## EXISTS DELETE not shown - trivial beasts

# for the basic class, these do not affect the lower bound
sub POP { return pop @{$_[0]{array}} }
sub SHIFT { return shift @{$_[0]{array}} }
sub PUSH {
    my Array::Flex $self = shift;
    return push @{$self->{array}}, @_;
}
sub UNSHIFT {
    my Array::Flex $self = shift;
    return unshift @{$self->{array}}, @_;
}

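sub SPLICE {
    my Array::Flex $self = shift;
    # an omitted offset means the lower bound; an omitted length means
    # "everything from the offset onward", as with the builtin splice
    my $offset = @_ ? $self->_idx(shift) : 0;
    my $length = @_ ? shift : @{$self->{array}} - $offset;
    return splice @{$self->{array}}, $offset, $length, @_;
}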

1;

__END__

and a simple test script:

#!/usr/bin/perl

use strict;
use warnings;
use Array::Flex;

tie my @flex, 'Array::Flex', -2, 0, 1, 2, 3, 4;

print "size : ", scalar(@flex), "\n";
print "$flex[-2] $flex[-1] $flex[0] $flex[1] $flex[2]\n";
print "@flex[-2 .. 2]\n";

eval { print "@flex\n" };
# following is a failed idea for kludgy access to lwb
eval { print "$flex[-lwb]\n" };

print "\ntesting splice now\n";

splice @flex, 0, -2, 10, 20, 30;
print "@flex[-2 .. 4]\n";
print "last : ", $#flex, "\n";
print "size : ", scalar(@flex), "\n";

print "lwb : ", tied(@flex)->lwb, "\n";
print "upb : ", tied(@flex)->upb, "\n";
print "range: ", join($", tied(@flex)->range), "\n";

__END__

output is:

size : 5
0 1 2 3 4
0 1 2 3 4
Use of uninitialized value in join or string at flextest line 14.
Use of uninitialized value in join or string at flextest line 14.
2 3 4 # the eval'd print struggles valiantly
Argument "-lwb" isn't numeric in array element at flextest line 16.
2 # the eval'd print struggles valiantly

testing splice now
0 1 10 20 30 3 4
last : 6
size : 7
lwb : -2
upb : 4
range: -2 -1 0 1 2 3 4

__END__

In the test, if I make @flex range from a lower bound of 4 instead of -2,
then the "@flex" in the eval'd prints causes the croak to fire in FETCH
(actually, in the called _idx function) because it asks for an illegal
element at index zero when expanding @flex.
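The croak described above can be reproduced with a stripped-down class that logs the calls perl makes. This is only a sketch under assumptions: Tie::OffsetLog and its @LOG are made-up names standing in for Array::Flex, and the exact call sequence may differ between perl versions.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Array::Flex: an offset array that logs tie calls.
package Tie::OffsetLog;
use Carp;
our @LOG;

sub TIEARRAY {
    my ($class, $lwb, @items) = @_;
    return bless { lwb => $lwb, array => [@items] }, $class;
}
sub FETCHSIZE { push @LOG, 'FETCHSIZE'; return scalar @{ $_[0]{array} } }
sub FETCH {
    my ($self, $index) = @_;
    push @LOG, "FETCH($index)";
    my $slot = $index - $self->{lwb};
    croak "index $index out of bounds" if $slot < 0;
    return $self->{array}[$slot];
}
sub lwb { return $_[0]{lwb} }
sub upb { return $_[0]{lwb} + $#{ $_[0]{array} } }

package main;

tie my @flex, 'Tie::OffsetLog', 4, qw(a b c);    # valid indices are 4 .. 6
my $obj = tied @flex;

# Naming the range explicitly only ever asks for valid indices.
my @explicit = @flex[ $obj->lwb .. $obj->upb ];

# Implicit expansion asks for the size, then starts fetching at index 0.
@Tie::OffsetLog::LOG = ();
my $ok = eval { my @implicit = @flex; 1 };

print "explicit: @explicit\n";
print "implicit: ", ($ok ? "ok\n" : "died: $@");
print "calls   : @Tie::OffsetLog::LOG\n";
```

The logged calls should show a FETCHSIZE followed by a FETCH at index 0, which is where the bounds check fires.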

Here are my observations:

1) The Array::Flex works quite well when accessing elements explicitly, eg
with single indexes, an index range or a list of indexes. However Perl
normally lets us deal with more than one element of an array in a single
expression...

2) The Array::Flex does not "play well" with other arrays, even other flex
arrays. Perl's implicit assumption that arrays always start at origin
zero* means that

@x = @flex;

fails because the runtime tries to expand @flex into

@flex[0..$#flex]
or
@flex[0..tied(@flex)->FETCHSIZE-1]

i.e. it forces the assumption that the lower bound is zero*. If the flex's lwb
is below zero, we only get a part of the array, and if the flex's lwb is
above zero, then the first FETCH at offset zero causes a runtime croak.

To make this class play nicely, Perl would need to expand @flex into

@flex[tied(@flex)->FETCHLWB..$#flex]
or
@flex[tied(@flex)->FETCHLWB..tied(@flex)->FETCHSIZE-1]
or
@flex[tied(@flex)->FETCHLWB..tied(@flex)->FETCHUPB]

where FETCHLWB and FETCHUPB are suggested tie methods that are not
currently implemented.

* let's ignore $[ since it's deprecated and also not powerful enough for
the job in its current form (values 0 or 1 allowed, and effective
globally, not very locally)

3) If the FETCHLWB and FETCHUPB methods were available, I think it's
entirely reasonable for

@flex

to simply yield a non-magical list in the usual manner; the origin is
therefore not an issue when the contents of the @flex are fetched.
Assignment from an Array::Flex[10:20] to an Array::Flex[40:50] would
simply work as expected without assistance I believe.

4) FETCHLWB, FETCHUPB and FETCHSIZE are naturally related, and presumably
one of these functions could be left undefined; the runtime should be able
to automagically flesh out the missing function in the same way that the
overload module does. If a programmer fails to implement the proper
relationship when defining all three (six)? Well, I guess they'll have a
bug to catch...

5) STORELWB and STOREUPB are most likely useful or even demanded when
FETCH??B is defined. See below regarding "natural" syntax for accessing
?????LWB functions from user code.

6) $#flex is naturally a call to FETCHUPB and STOREUPB if it is defined,
otherwise FETCHSIZE and STORESIZE as usual. autodefinition in the style of
the overload module will handle this quite adequately.

7) There is no syntax available to access the lwb easily. Since the upper
bound is normally accessed with $#array, it makes sense to invent some
syntax close to this; either:

$#?array where ? is an unambiguous modifier character
or
$?array where ? is some unambiguous character
such as
$[array

$[array looks very nice; although it will be ambiguous with $[, I notice
that the conflict between $# and $#array seems to be well-handled in the
compiler.

I have not searched for other available characters that might fit in the
$?array suggestion.

For the first suggestion $#?array, a few possibilities that look OK are

$#-array # clashes with access to upb of @-

$#[array

$##array # vaguely reminiscent of ksh's ${var##pattern} for what it's worth

$#?array # ie a literal ? - many more would also fit

8) Finally, perl would need to consult FETCHLWB in various circumstances
where an implicit lwb of zero is currently assumed... Somewhat of a task
but I hope not impossible. My first approach (if I had enough knowledge of
internals) would be to seek out calls to FETCHSIZE and STORESIZE because
surely this is always done when @x is expanded implicitly.
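The relationship in point 4 is plain arithmetic: size = upb - lwb + 1, so any two of the three values determine the third. A minimal sketch of that derivation follows. FETCHLWB and FETCHUPB are the hypothetical methods proposed in this post (perl never calls them), and bounds_of() and Demo::LwbAndSize are illustrative names only.

```perl
use strict;
use warnings;

# FETCHLWB and FETCHUPB are the hypothetical methods proposed in this post;
# perl never calls them. bounds_of() is an illustrative helper, not an API.
sub bounds_of {
    my ($obj) = @_;
    my %value = map  { ( $_ => $obj->$_() ) }
                grep { $obj->can($_) } qw(FETCHLWB FETCHUPB FETCHSIZE);
    die "need at least two of FETCHLWB, FETCHUPB, FETCHSIZE\n"
        if keys(%value) < 2;

    # Fill in whichever value the class left undefined.
    my ($lwb, $upb, $size) = @value{qw(FETCHLWB FETCHUPB FETCHSIZE)};
    $lwb  //= $upb - $size + 1;
    $upb  //= $lwb + $size - 1;
    $size //= $upb - $lwb + 1;
    return ($lwb, $upb, $size);
}

# A class that defines only the lower bound and the size.
package Demo::LwbAndSize;
sub new       { return bless {}, shift }
sub FETCHLWB  { return 10 }
sub FETCHSIZE { return 11 }

package main;

my ($lwb, $upb, $size) = bounds_of(Demo::LwbAndSize->new);
print "lwb=$lwb upb=$upb size=$size\n";    # lwb=10 upb=20 size=11
```

If a class defines all three inconsistently, this helper simply trusts what it is given, which matches the "they'll have a bug to catch" attitude above.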


If you've made it this far, congratulations and thanks for your attention.
Any thoughts, including the achievability of my suggestions? Are there any
TIE tricks I'm missing that would solve the limitations I've found? Maybe
some clever use of overloading?

Please don't hesitate to ask for clarification if I've been a bit vague in
too many places.


 


2 23rd October 17:07
terrence brannon



"Andrew Hamm" <ahamm@mail.com> writes:

What problem does having a non-zero origin for an array solve?

--
Carter's Compass: I know I'm on the right track when,
by deleting something, I'm adding functionality.
3 23rd October 17:07
andrew hamm


What problem does any algorithm or data structure solve? I thought
moderation should be enough to stop c.l.p.misc style
questions-in-lieu-of-answers, but I'll grace your question with an answer,
just for the hell of it.

* If the only valid indexes for an array range from 0 to N, why does Perl
have hashes? Surely we can use zero-based arrays for all code.

* why do some languages base their arrays from 1 instead of zero, for
example SQL and 4GLs?

* why do some languages offer user-assignable ranges for arrays?

* why do some languages offer, and some programmers benefit from, strong
array bounds checking?

* storing data for measurements made at a range of temperatures, ranging
from -20 to +100

* storing info related to a range of dates, for example from 1990 to the
present day, or the future. In a database engine I use (informix) dates
are represented as an integral number of days from 1st Jan 1900 (day zero)
so the LWB of an array would be int(dmy(1, 1, 1990)) to borrow notation
of an Informix function DMY which constructs DATEs from a d/m/y triple.

* basically, any orderable sequence which does not start at zero (or one)
is a prime candidate for a non-zero origin. Use your imagination in any
field you are familiar with.

I'd appreciate it if someone considered my question instead of questioning
my motives or intelligence. I think I've presented the case in a very
strong and clear way.
4 23rd October 17:07
yitzchak scott-thoennes


I thought it was a reasonable question, and didn't see any sign of
"questioning [your] motives or intelligence".

To make your module work right, it seems that perl's concept of what
constitutes an array would have to expand. But if syntax were added
to get/set the lower bound as you suggest, it doesn't seem like your
module is needed any more (though some of the existing modules that
impose various restrictions on arrays could take advantage of it).

In any case, there are some very good questions to ask yourself about
suggested new features in perldoc perlhack. If you haven't perused
them, it might be a good idea.

For what it's worth, I don't see this as a problem that needs to be
solved; using either a standard array or hash or some existing tie
class will meet the needs of most data structures, and I can't see how
to introduce a variable lower bound in a way that older perl code
(containing hardcoded limits of 0 and $#a or @a) wouldn't choke on
when encountering a flex array.
5 23rd October 17:08
terrence brannon


"Andrew Hamm" <ahamm@mail.com> writes:


Between zero-based arrays and Perl hashes, most common problems can be solved.

Changing the start index from 0 to 1 does not solve any new
problems. My question was about the functional utility of such a
change. This change does not offer any new functional utility.

I don't know -- why don't you tell us? I have an M.S. in Computer Science
and 5 years of professional experience with Perl, and have never had a
need for such a beast.

This appears irrelevant to my question and your initial topic.

search.cpan.org for Tie::RangeHash


search.cpan.org for Date::Range

Perl hashes can take numbers as indices starting wherever you
want. Keep a second array with your desired ordering and you are done.


Must you go on the defensive? I was hoping to learn something. It
seemed like an unusual thing to need and I thought you had a solid
motivation for your undertakings.


I don't. Your needs do not appear novel or necessary.

--
Carter's Compass: I know I'm on the right track when,
by deleting something, I'm adding functionality.

----- End forwarded message -----
6 23rd October 17:08
andrew hamm


Possibly, but often there's a lot of "why do you want to do that"
preconceptions when it's just a matter of knowing a particular class of
endeavour... Still, I'm not upset :-) and I apologise immediately to
Terrence if I've caused him any offence.


Yes indeed (he says with a twinkle in his eye) I see a lot of merit in
revisiting the concept behind $[ but "doing it properly" this time...


"Where is the implementation?"

I believe that if I actually contributed a patch-bundle that implemented
the entire idea (of a variable lwb complete with my suggested $[array
syntax) then there would be something concrete to discuss, swirl around
the palate and spit into a bucket, and also there would be far less
resistance of the "but we don't have the bloody time to do it" variety. I
fully understand the unfairness of asking for a feature...

However, sadly, I do not have the time no mo' to become a hacker. I used
to understand the runtime guts of the Icon language, and even contributed
a simple x86 stack-switch function for co-routines on Intel, but alas
those happy times seem but a distant memory. I wish. Perhaps this could be
my New Year's resolution.

"Backwards compatibility"

See below for my parallel drawn to threading.

"Could it be a module instead?"

Well, that's what I'm trying to achieve in the first instance, but the
assumption built into the implicit expansion of @array is causing grief.

"Is the feature generic enough?"

I think if it was there then people would begin to use it. There's been
enough non-zero origins in programming since programming was invented.
Sure, a conventional zero or one origin tends to be most useful, but that
doesn't mean that a class of problem can't be solved better if that
limitation is removed.

"Does it potentially introduce new bugs?"

hmmmmm.... passing a flex to any code which says:

for my $i (0 .. $#array)

will provoke a bug for sure. My attitude is that the user of a flex would
need to take responsibility, and if the feature settles well with people's
perception, then perhaps modules could slowly "come around". Currently, a
similar issue revolves around the question "is a module thread-safe?" and
the contemporary answer is - "not necessarily: if you are threading, you
need to take responsibility". Perhaps in a year or three, most modules
will be validated for threading. Here's hoping, because I like threads...

"Does it preclude other desirable features?"

I doubt it.

The remaining points in this section of perlhack don't need treating right now.

A hash is not suitable when there is a natural ordering of the indexes. It
becomes clumsy to encode perpetual calls to "sort keys" etc.
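To make the "sort keys" complaint concrete, here is a small sketch of the hash approach. The data is invented for illustration; nothing here comes from a real application.

```perl
use strict;
use warnings;

# Illustrative data only: one reading per year, stored in a hash.
my %reading = (1992 => 'c', 1990 => 'a', 1991 => 'b');

# Hash keys come back in no useful order, so every ordered walk needs a sort.
my @years    = sort { $a <=> $b } keys %reading;
my @in_order = @reading{@years};

print "$_: $reading{$_}\n" for @years;
```

The sort has to be repeated (or its result cached and kept in sync) wherever the code needs the entries in index order.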

Standard arrays can leave large gaps of useless allocation if your indexes
start way up in the thousands.

Using an offset to "correct" an index leaves you open to bugs (if you
forget to encode the offset subtraction) Furthermore, the offset will need
to be stored in a separate variable which is not encapsulated safely where
it belongs. And we all know the dangers of not encapsulating.
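The last two points (wasted slots and the forgotten offset) can be seen together in a few lines. This is a sketch with invented data; $lwb and @by_year are illustrative names.

```perl
use strict;
use warnings;

# Illustrative only: yearly data kept in a zero-based array with a manual offset.
my $lwb     = 1990;
my @by_year = qw(a b c);               # stands for years 1990 .. 1992

my $right = $by_year[1991 - $lwb];     # offset applied correctly

# Forgetting the offset is not an error as far as perl is concerned:
# the store silently grows the array out to index 1991.
$by_year[1991] = 'oops';
my $length_after = scalar @by_year;

print "with offset   : $right\n";
print "elements after: $length_after\n";
```

Nothing warns about the stray store, and the array now carries well over a thousand unused slots between the real data and the misplaced element.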

Standard arrays do not offer bounds checking (which I feel should probably
be limited to a module, and not necessarily native - wonder how other
people feel about that)

If a programmer uses a flex array (either a new-fangled native or my 1/2
complete tie module) then of course the programmer must accept the
consequences. If there are unavoidable issues when using a flex array (eg
passing them to a module provokes failure) then the user who wants to use
the flex AND a specific module must make a decision. However until it's
thought about deeper it's hard to predict whether it would slip in easily
or slip in with bruises and blistering.

Using a tie module would be a baby step. Going native would only be worth
the effort if it can be shown to be both easy and seamless.

Also, I'm wondering if I'm missing some possible use of overloads. I see
that @{ } can be overloaded. However I'm very new to Perl's advanced
overloading. I also need to re-visit "Object Oriented Perl" by Damian
Conway; specifically the chapter and section that discusses both tying and
blessing into the same class in a specific way.

This project is in some ways, an experiment into an idea.


----- End forwarded message -----
7 23rd October 17:08
mark jason dominus


Andrew Hamm <ahamm@mail.com>:


I don't think so. I've felt for a long time that the tie interface
was not quite carefully-enough thought out. I think that at the
bottom there is a philosophical conflict between

(a) Even a tied array should behave like an array.

and

(b) tie() should allow you to overload array syntax with any
weirdo semantics you want.

Usually (a) was the winner in the past, but there has been a trend
toward (b).

For example, consider

($a, $b, $c) = @tiedarray[4,6,3];

The (b) philosophy says that this should translate to

($a, $b, $c) = $tied_object->FETCHSLICE(4,6,3);

but in this case the (a) philosophy won out, and Perl actually does


$a = $tied_object->FETCH(4);
$b = $tied_object->FETCH(6);
$c = $tied_object->FETCH(3);

instead. Thinking of useful and interesting semantics that would be
enabled by (b) and are impossible with (a) is left as an exercise.
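The slice behaviour is easy to observe with a logging class. This is a sketch: Tie::FetchLog is a made-up name, and FETCHSLICE is defined only to show that perl does not look for it.

```perl
use strict;
use warnings;

# Hypothetical logging class; FETCHSLICE is not part of perl's tie interface.
package Tie::FetchLog;
our @LOG;
sub TIEARRAY   { my ($class, @items) = @_; return bless [@items], $class }
sub FETCHSIZE  { return scalar @{ $_[0] } }
sub FETCH      { push @LOG, "FETCH($_[1])"; return $_[0][ $_[1] ] }
sub FETCHSLICE { push @LOG, 'FETCHSLICE'; return }    # never called by perl

package main;

tie my @tiedarray, 'Tie::FetchLog', 'a' .. 'h';
my ($x, $y, $z) = @tiedarray[4, 6, 3];

print "values: $x $y $z\n";
print "calls : @Tie::FetchLog::LOG\n";
```

The log should contain three separate FETCH calls and no FETCHSLICE, so the class never learns that the three indices belonged to one slice.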

Sometimes the (a) philosophy rules out interesting and useful
features. For example, consider Tie::File, which ties an array to a
file, one element per line. One might like

delete $FILE[3];

to excise record 3 from the file entirely, so that for example

0 Maximo Perez
1 The Train
2 Luis Melian Lafineur
3 Olimar
4 Brimstone
5 Clubs

becomes

0 Maximo Perez
1 The Train
2 Luis Melian Lafineur
4 Brimstone
5 Clubs

This is not what "delete $array[3]" does on an ordinary array, but it
seems like a useful extension. However, with Perl's present 'tie'
semantics, it is impossible to get the extension to work consistently,
because of:

delete @FILE[2,3];

which should yield this file:

0 Maximo Perez
1 The Train
4 Brimstone
5 Clubs

The problem is that Perl translates this to
$tied_object->DELETE(2);
$tied_object->DELETE(3);

and a naive implementation of DELETE will instead yield

0 Maximo Perez
1 The Train
3 Olimar
5 Clubs
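A short sketch reproduces the naive result. Tie::Excise is a hypothetical class whose DELETE excises a record in the way imagined above; it is not how Tie::File itself behaves.

```perl
use strict;
use warnings;

# Hypothetical class whose DELETE removes the record and closes the gap.
package Tie::Excise;
sub TIEARRAY  { my ($class, @records) = @_; return bless [@records], $class }
sub FETCHSIZE { return scalar @{ $_[0] } }
sub FETCH     { return $_[0][ $_[1] ] }
sub DELETE    { my ($self, $index) = @_; return splice @$self, $index, 1 }

package main;

tie my @FILE, 'Tie::Excise',
    'Maximo Perez', 'The Train', 'Luis Melian Lafineur',
    'Olimar', 'Brimstone', 'Clubs';

delete @FILE[2, 3];    # arrives as DELETE(2) followed by DELETE(3)

my @left = @FILE;
print "$_\n" for @left;
```

By the time DELETE(3) arrives, the first deletion has already shifted the records down, so index 3 now names Brimstone rather than Olimar.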

There is really no way around this. The simple fact is that although
we want

delete @FILE[2,3];

and

delete $FILE[2]; delete $FILE[3];

to have different meanings and different effects on the array, there is
no way, from behind the tied array interface, to tell which of them
was requested. The only solution is to change the API. But
philosophy (a) says that there is no need to change the API because
what we are trying to do is to make an array behave in a
non-array-like way.

Here is another example, which is simpler and so may be more
illustrative of the conflict. Consider

$z = pop @array;

On an ordinary array, if the array is empty, this returns undef. One
might decide that that is an invariant property of all arrays, and
should be obeyed by tied arrays as well as by regular arrays. That
would suggest that "pop" should be implemented as
if ($tied_object->FETCHSIZE == 0) {
    $z = undef;
} else {
    $z = $tied_object->POP;
}

This is philosophy (a). Or one could take the point of view that the
return value of "pop" on an empty tied array should be dictated by the
author of the tied array class, so that the implementation would be simply
$z = $tied_object->POP;


This is philosophy (b). The initial implementation took philosophy
(a); in 2002 I put in a patch to make it (b) instead, but I am still
wondering whether I might have made a mistake by doing so.
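Under philosophy (b), a class can give pop on an empty array a meaning of its own. The sketch below assumes a perl that includes the 2002 change described above; Tie::SentinelPop and its 'EMPTY' sentinel are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical class: POP reports exhaustion with a sentinel instead of undef.
package Tie::SentinelPop;
sub TIEARRAY  { my ($class, @items) = @_; return bless [@items], $class }
sub FETCHSIZE { return scalar @{ $_[0] } }
sub POP       { my ($self) = @_; return @$self ? pop @$self : 'EMPTY' }

package main;

tie my @stack, 'Tie::SentinelPop', 'only';
my $first  = pop @stack;    # the real element
my $second = pop @stack;    # array is now empty; the class decides the result

print "first : $first\n";
print "second: ", (defined $second ? $second : 'undef'), "\n";
```

Under philosophy (a) the second pop would yield undef without POP ever being consulted.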

Here's a third example, close to your heart. What does

$z = $tiedarray[$x]

do when $x is negative? For an ordinary array, it essentially does

if (@tied_array < -$x) {
$z = undef; # Out of bounds subscript
} else {
$z = $tied_array[@tied_array + $x];
}

and one might like to require that the semantics of negative values be
the same for all arrays, even tied arrays. Perl 5's original
implementation did do this. If $x was negative, Perl would actually call
$z = $tied_object->FETCH($tied_object->FETCHSIZE() + $x);

so that FETCH would never know that the index had originally been
negative. In fact, by default, it still does this.

However, as you discovered, this completely rules out a number of
interesting behaviors. And as I discovered, even for tied arrays that
would like to assign the usual meaning to negative subscripts, it can
be useful for the tied methods to find out whether the subscript was
negative. Consider Tie::File again. Suppose someone says

$z = $FILE[-1];

There is a simple and efficient way to retrieve the last record of the
associated file, even if the file is very large. But Tie::File never
gets a chance to do that. Instead, Perl calls
$tied_object->FETCHSIZE

which forces Tie::File to scan the *entire* file and count all the
records; then Perl calls $tied_object->FETCH() with a positive value.
This annoyance was the reason that the NEGATIVE_INDICES feature was put
into the tied array interface in the first place. You can view this
as yet another step away from (a) toward (b).
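The difference NEGATIVE_INDICES makes can be watched directly. In this sketch Tie::SeenIndex and Tie::SeenIndex::Raw are made-up classes; only the package variable $NEGATIVE_INDICES is the real switch.

```perl
use strict;
use warnings;

# Hypothetical class that records the index FETCH actually receives.
package Tie::SeenIndex;
sub TIEARRAY {
    my ($class, @items) = @_;
    return bless { items => [@items], seen => [] }, $class;
}
sub FETCHSIZE { return scalar @{ $_[0]{items} } }
sub FETCH {
    my ($self, $index) = @_;
    push @{ $self->{seen} }, $index;
    return $self->{items}[$index];
}

# Same behaviour, but this class opts in to receiving negative indices as-is.
package Tie::SeenIndex::Raw;
our @ISA = ('Tie::SeenIndex');
our $NEGATIVE_INDICES = 1;

package main;

tie my @plain, 'Tie::SeenIndex',      qw(a b c);
tie my @raw,   'Tie::SeenIndex::Raw', qw(a b c);

my $from_plain = $plain[-1];
my $from_raw   = $raw[-1];

print "default         : FETCH saw @{ tied(@plain)->{seen} }\n";
print "NEGATIVE_INDICES: FETCH saw @{ tied(@raw)->{seen} }\n";
```

The default class should see index 2 (perl has already added FETCHSIZE), while the opted-in class should see the raw -1 and can decide for itself what that means.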

One thing you might want to notice here, however, is that these
problems can be tricky. It is hard to see in advance what methods
will really be required. I don't remember who did the original
implementation of tied arrays. (They were missing from Perl up until
version 5.005, I believe.) But I imagine that whoever it was probably
expected that they were providing a fully useful and complete
interface, and perhaps was even a little diffident about the size of
the API, which was much larger than the corresponding API for tied
hashes. And in some sense, I think they *did* provide a fully useful
and complete API. It just wasn't fully complete enough.

Then a couple of years ago, someone got frustrated and put in the
NEGATIVE_INDICES feature, and---here's the punch line---he apparently
thought that this would be enough to properly support arrays with
arbitrary-range indices. If you look at the tests for this feature in
t/op/tiearray.t, you will see that the author thought he was actually
implementing an array with indices -2 .. 2. He even calls his tied
array class "NegIndex". The problem you point out---that it doesn't
work, and can't be made to work---never occurred to him. He wanted to
fix the Tie::File behavior; NEGATIVE_INDICES was sufficient for this;
and as a bonus, it seemed that this would also be enough to allow
arrays with subscript ranges -2..2---but he totally missed that it was
*not* enough. What a pinhead! I wonder who it could have been? The
initials in the test file say it is someone named "mjd", but I can't
think of anyone offhand who has those initials.

So I have two points in all of this. First, that it might be hard to
see what is the minimal API for emulating all the weird things people
would want to do with an array. You think you have got it all, but
then someone else comes along and wants to do something you hadn't
considered. Rather than continuing to add the features
piece-by-piece, it might be better for someone good at thinking to sit
down and think really hard, not just about their own need-of-the-week,
but about the Perl array guts-API and what it supports. And I mean
*really* hard. It is not even immediately clear, for example, what

@tied_array = (1, 2, 3);

should do to be most useful.

And second, that there is another point of view, with a consistent
philosophical position, that says that although it is understandable
that you want to do this stuff, that is too bad, and that is not what
tied arrays are for.

Incidentally, similar problems arise with tied hashes.
values(%tied_hash) does not call $tied_object->VALUES(). Instead, it calls
$tied_object->FETCH($tied_object->FIRSTKEY())
$tied_object->FETCH($tied_object->NEXTKEY())
$tied_object->FETCH($tied_object->NEXTKEY())
$tied_object->FETCH($tied_object->NEXTKEY())
...

which again rules out a number of useful behaviors.
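The same kind of logging shows the tied-hash case. This is a sketch: Tie::HashLog is a made-up class, VALUES is defined only to show that perl does not call it, and the interleaving of the logged calls may vary between perl versions.

```perl
use strict;
use warnings;

# Hypothetical logging class. VALUES is not part of perl's tie interface;
# it is defined here only to show that perl never calls it.
package Tie::HashLog;
our @LOG;
sub TIEHASH  { my ($class, %data) = @_; return bless { %data }, $class }
sub FETCH    { push @LOG, "FETCH($_[1])"; return $_[0]{ $_[1] } }
sub FIRSTKEY {
    my ($self) = @_;
    push @LOG, 'FIRSTKEY';
    my $reset = keys %$self;          # rewind the underlying iterator
    return scalar each %$self;
}
sub NEXTKEY  { push @LOG, 'NEXTKEY'; return scalar each %{ $_[0] } }
sub VALUES   { push @LOG, 'VALUES'; return }

package main;

tie my %tied_hash, 'Tie::HashLog', one => 1, two => 2;
my @values = values %tied_hash;
my @sorted = sort { $a <=> $b } @values;

print "values: @sorted\n";
print "calls : @Tie::HashLog::LOG\n";
```

The log should show key iteration plus one FETCH per key, so a class cannot supply all values in a single, cheaper operation.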

I was then going to address your specific proposals, but I realized
that I was about to start writing about how you might not be doing
what you really wanted, and how this was perhaps not the best way to
achieve your ends, and you said you didn't want that, so I won't.
I'll just say that I don't like your proposal, for several reasons,
and I'm glad that it seems unlikely that you'll actually implement it,
and even less likely that any implementation of yours would be
adopted. But I hope at least you feel that I took your question
seriously.
8 23rd October 17:08
andrew hamm


I don't want to get into a belligerent exchange on a moderated group. I
don't like getting into them on unmoderated groups.


Doesn't cover the task. Tie::RangeHash is more of a compression creature.
Saves a lot of space when elements ranging from 10000 .. 30000 all have
the same value, or

$cost{'1999-12-15,2000-01-14'} = 150;

as taken from the doco. This is not the same as having a discrete value
for each element ranging from index '1999-12-15' to '2000-01-14'


And how can that type be used as an index into an array? Once again its
thrust is similar to Tie::RangeHash, and does not focus on discrete values
for each discrete "key".

But you are never done with writing double the code to get to your values.
It's hardly elegant or robust compared to a proper flex array.


ok. I'll switch off the shields. I'm a bit gunshy of perl newsgroups. I'll
assume you are being friendly, so pleased to meet you. Let's go...

One of our departments codes for Health Insurance companies. Many of their
records are keyed off the date of the event. As I've mentioned in another
message, in Informix SQL, the underlying integral value representing a
date is the number of days since 1/1/1900, so 1/1/2000 is the value 36525
for example. A flex array would very naturally represent a series of data.

One of the newsreaders I use is X-news. It seems to give a sequentially
increasing value to the messages. Old messages fall off the end. The
numbers are up in the thousands for busy newsgroups. Seems like the code
in X-news could be represented in a flex array.

An engineering analysis program might want to evaluate and record a load
from -90 degrees to 90 degrees. A flex array ranged from -90 to 90
directly provides the slots to store the data.

In Australia, the postcodes (aka ZIP codes) in my state range from 2000.
If you were storing statistical facts about this, why muck around with a
zero-origin array? Range checking would also be valuable when developing
code. Sometimes a fixed length array helps you find bugs. A tied array
providing that capability would not interfere with any perler who is happy
and comfortable with the indeterminate and dynamic length of a Perl array.

A non-zero origin to arrays is just a simple and sweet way to represent a
list of data which is keyed by a range that does not start at zero, and
where the range is monotonically increasing in steps of one. I would not
even demand that the steps should be anything other than value of one,
although there probably could be cases for that. I do not recall ever
seeing a programming language that offered steps that were not integral or
stepped by a value other than one.
9 23rd October 17:08
andrew hamm


Yes - I think it proceeded more along a "what-if" progression. And
available time from keen people too.


yes indeed - playing around with this module, I dabbled with implementing

$flex["lwb"]

to get reasonably sexy access to the lower bound, but alas the non-numeric
index is rejected with a runtime warning iirc, and the index value ends up
falling back to zero because of string conversion.

I can see you've thought about this quite a bit. Interesting points you
raise. The lack of a SLICE type of method also precluded a few possible
code alternatives I looked at. Still, I do not see any reason why it could
not be added. It might break existing tie code, but then again, the lack
of a SLICE in the class would be sufficient to make it behave in the
current fashion. I don't know if that would affect the compiler or
runtime, however. [yes, I must get into hacking, sometimes it's
frustrating not knowing the guts]


Absolutely! If my code could know that implicit expansion is being
performed on @flex, then I could bypass the use of the offset for the
duration, so that

@a = @flex;

could be fudged into working.

I like to look at things more abstractly, and I feel it's entirely
reasonable to view an orderable domain as suitable indexes. From my point
of view, the fundamental difference between a hash and an array is simply
that the keys of a hash are an unordered type, and the keys of an array
are an ordered type. Other constraints on the indexes of an array are that
the values are steadily increasing in steps equivalent to one.

Now, I'm not asking for a Perl array to support indexes from 'a' to 'z'
for example (although Pascal and similar languages do indeed support this
capability) but that's an interesting suggestion. With the right notation,
all that is needed to index an array is some "class" that implements a
sequential, ordered range. Clearly this is beyond the scope of Perl's
native types, but it should become possible via ties or classes. If we
were discussing Java, we might say that a type usable as an index need
only implement a MonotonicRange interface.

How so? I agree with b, and it has allowed me the scope to derive a class
where the lower bound changes (actually, the lwb changes with a
shift/unshift, not pop) What have you seen that suggests b is a mistake? I
think it's too much assumption to take over the behaviour of pop like
that. Thanks for the patch for (b).


I don't know when a negative index was given a meaning in Perl (I noticed
it sometime during the life of Perl 5 in the last few years, but never
chased down its history). I liked it; my other favourite language is
Icon, and this language has given formal meaning to negative indexes from
its inception. Perl's interpretation is extremely similar to Icon's. I
always assumed that some connoisseur of languages borrowed it from
Icon; then again, it's not such an alien concept that it couldn't be
invented independently...


When $x is discovered to be negative? yes:

@x = (0, 1, 2);
print "$x[-1]\n";
print "$x[-2]\n";
print "$x[-3]\n";
print "$x[-4]\n";

illustrates the point clearly. A tie class can surely choose to croak at
its pleasure, depending on the behaviour it is providing. What's so tricky
about that?

I was relieved to discover $NEGATIVE_INDICES since it solved the first
stumbling block. You might have noticed my posting about this nearly two
weeks ago. I answered that one myself and moved on.

I'm not complaining, but from its inception it's been growing in leaps
and bounds through the versions. The progress has been good, but it's not
complete. All volunteer work too - you've got to love this hacker
community.

Yes the API is closer to complete, and complete enough for many purposes,
but Perl has shown a very effective history of growth. The more recent
optional DELETE and EXISTS methods plug in quite nicely without blowing up
existing code. I think the further growth of the tie interface can only
come from real world experience leading the way. My experiments have shown
the value of an optional FETCHLWB method at the very least, and I
suggested a FETCHUPB method for symmetry, which would allow FETCHSIZE to
be optional under the right circumstances.

[OK mjd, I see you are deep into the tie api. I'm now talking to the
source. I had hoped for that :=)]

Ummm, what? I'm not saying it can't work, I'm saying that

@x = @flex

doesn't work. $flex[-2] works fine. Used in an ordinary manner, the flex
arrays are simple and going strong. It's only when attempting to use the
implicit ranges that I'm finding problems. If you only ever code

$flex[$idx]
or
@flex[$x .. $y]

then it works properly. You just cannot play with @x = @flex, or even
@flex1 = @flex2 at the moment.
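
A small tracing class shows where the zero-origin assumption bites. This is
only a sketch (Trace::Sketch and @SEEN are invented names): it records the
indices perl passes to FETCH when a tied array is read in list context.

```perl
package Trace::Sketch;    # hypothetical name, for illustration only
use strict;
use warnings;

our @SEEN;                # every index perl passed to FETCH

sub TIEARRAY  { my ($class, @items) = @_; return bless { array => [@items] }, $class }
sub FETCHSIZE { my ($self) = @_; return scalar @{ $self->{array} } }
sub FETCH     { my ($self, $i) = @_; push @SEEN, $i; return $self->{array}[$i] }

package main;

tie my @tied, 'Trace::Sketch', qw(a b c);
my @copy = @tied;         # list-context read of the whole array

print "copied: @copy\n";
print "FETCH saw: ", join(', ', @Trace::Sketch::SEEN), "\n";
```

Perl asks FETCHSIZE for a count and then fetches indices counted up from
zero, so a class whose lowest valid index is not zero never gets the chance
to say where its elements really start.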


Weird features should not be accommodated. What weird features? In the
beginning Larry gave us $[. Unfortunately it was not convenient and caused
all sorts of problems. Now it's deprecated, and rightly so. That doesn't
mean that a non-zero lwb is invalid or "weird". It just means it could be
better handled.


See, your code sample there contains some assumptions. This assumption
needs exposing. The SLICE method you suggested sounds ideal. FETCHSLICE
and STORESLICE as optional methods. I like it. FETCHLWB and STORELWB -
what else is left for arrays?

@tied = (1, 2, 3)

looks like it could be
tied(@tied)->STORESLICE(tied(@tied)->FETCHLWB, 3, 1, 2, 3)

assuming that STORESLICE and FETCHLWB exist. Presumably the STORESLICE
will decide whether it truncates the array or not depending on the
implemented behaviour. Otherwise, lack of definition of STORESLICE would
fall back to STORE for individual fields. Lack of FETCHLWB would fall back
to an assumption of a zero origin. Sounds clear to me and does not
preclude a variety of implementations within the module. [Pity someone
with initials ah doesn't have the skillset to provide the patch]
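
The fallback rules can be sketched in plain Perl. Everything here is
hypothetical: STORESLICE and FETCHLWB are only proposals, assign_list stands
in for what perl itself might do for "@tied = LIST", and the two classes
exist just to exercise both paths.

```perl
use strict;
use warnings;

package Plain::Sketch;     # implements only today's methods
sub new   { return bless { array => [] }, shift }
sub CLEAR { $_[0]{array} = [] }
sub STORE { my ($self, $i, $v) = @_; $self->{array}[$i] = $v }

package Sliced::Sketch;    # adds the two proposed optional methods
sub new      { return bless { lwb => 10, array => [] }, shift }
sub FETCHLWB { return $_[0]{lwb} }
sub STORESLICE {
    my ($self, $start, $len, @values) = @_;
    splice @{ $self->{array} }, $start - $self->{lwb}, $len, @values;
    return;
}

package main;

# Stand-in for the dispatch perl could perform on list assignment.
sub assign_list {
    my ($obj, @values) = @_;
    # No FETCHLWB means a zero origin is assumed.
    my $lwb = $obj->can('FETCHLWB') ? $obj->FETCHLWB : 0;
    if ($obj->can('STORESLICE')) {
        return $obj->STORESLICE($lwb, scalar(@values), @values);
    }
    # No STORESLICE: fall back to one STORE per element.
    $obj->CLEAR;
    $obj->STORE($lwb + $_, $values[$_]) for 0 .. $#values;
    return;
}

my $plain  = Plain::Sketch->new;
my $sliced = Sliced::Sketch->new;
assign_list($_, 1, 2, 3) for $plain, $sliced;
print "plain:  @{ $plain->{array} }\n";
print "sliced: @{ $sliced->{array} }\n";
```

Existing tie classes keep working unchanged, because both new methods are
looked up with can() before being called.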

So what exactly are they for? Just for half-assed limited enhancements?
The real question has got to be - just what is an array anyway? Are you
saying that the Perl philosophy doesn't trust fancy city-folk ideas?

Why did I complain in another posting about a suggestion that I'm doing it
"the wrong way"?

You've probably noticed that some people with a certain background scoff
at hashes with string indexes. They are horrified! No, you're joking! I
seem to recall that the first camel book addressed this point of view, but
I gave that copy away several years ago now. There now appears to be a
Perl orthodoxy which claims that only a zero-based index for arrays is
natural. If I was motivated enough, I'd trot over to the area of this
office full of weird people wearing ties (some cobweb covered COBOL
programmers) and ask them if COBOL starts at zero or one, or gives you a
choice. I'll bet that if COBOL only starts at one, they will be horrified
at zero origins. The 4GL I allude to below starts at origin 1. I expect I
could find at least one 4GL programmer in this building who will start
saying the rosary if I try to tell them about zero origin arrays.

This is the only reason why I call the bluff of anyone who questions the
motivation of a particular coding choice. From the wrong point of view,
any coding solution looks like "the wrong way to do it".

I sometimes give answers in c.l.p.misc (when I can be arsed to sift
through the belligerence and fighting) and I never attack people's problem
space. I have also spent a huge amount of time answering questions on
comp.databases.informix (enough to make it into the top 10 all-time
posters on that newsgroup) and I've always applied the policy of trusting
that people at least know the problem they are trying to solve. If there
are alternative solutions it always shakes out as you engage in friendly
exchange. This is why I think it's inappropriate to lead with "you're
probably approaching YOUR problem the wrong way, and you should think like
me".

[btw, I'm not saying that this exchange is unfriendly in any way. It's
quite pleasant and nice to be challenged in a positive way on a perl ng
instead of abusively as happens so often on c.l.p.misc]

OK - I'm willing to hear what your several reasons are. You sound like you
are gravitating back to the (a) philosophy you listed, and away from (b). I
would always put myself into the (b) group because I really like abstraction.


yes - I do, you've given me some valuable debate, and I thank you for
that. And because you've been a decent person, I trust you enough to
invite you to present some suggestions of alternative implementations.
Just how would you go about coding the storage of data where the first
index is (for example) 10_000? Would you really leave a large hole in the
front of the array? What if the number was not as neat as 10_000?

[just for the record, in Informix the integral value of the date
"1/1/2000" is 36_525 - the number of days since 31/12/1899]

You don't need to do full code, just some outlines, and I can flesh out
the nitty gritty. If the only alternative is a class, I can implement
that, but the syntax sucks imho. $ref->method(args) does not look like an
array. Must all code, sufficiently high level, degenerate into this
uniform sea of non-intuitive tokens? Let me illustrate that with another
story.

Another problem I have to chew over is re-implementing a mountain (a
huuuuge mountain) of 4GL code (Informix 4gl) into some other "modern"
language. Java is the fairly obvious language to push the 4GL code, for a
number of reasons:

1) Java programmers are a dime a dozen, and Java appears to be
understandable by the simpler mind of business programmers. I don't mean
to disparage business programmers, but few of them have studied computer
science, advanced mathematics or any of the hard sciences. They are far
more likely to have done business courses. COBOL is still endemic in so
many parts of the commercial programming world, and it's not merely
because of legacy code...

Most of the Java programmers in my company are young graduates who are
willing to spend time in the world of business programming. I started life
as a hard-core C programmer working with engineers and I really hated the
idea of doing business programming and SQL. It's funny where life takes
you :-) Anyway, the young grads don't seem to fall into many programming
traps with Java, so it seems to have fulfilled that part of its design
admirably - ie safe programming.

2) Java is trendy and saleable.

3) the bosses think it should be Java

Now, the 4GL in use here has some interesting characteristics in its
fundamental data types. All types can have a NULL value (this 4GL works
with SQL, after all) but the propagation of NULL is interesting and
different from either Perl or Java's treatment:
a + b => NULL if a or b is null.
The same holds for many of the operators, in fairly obvious ways to suit
the operator.
a == b => true/false/UNKNOWN.
The same holds for all the comparison operators.

The unknown boolean value comes about if a or b are NULL. The truth tables
for the comparison and boolean operators are tri-state; they indicate T,F
and NULL inputs, and yield T, F and UNKNOWN results. The unknown value
propagates up the expression in clearly defined ways:

if a == b then .... else ..... end if

goes into the ELSE part if a NULL value propagates to the top of the
conditional as an UNKNOWN.

CASE a == b
WHEN TRUE ....
WHEN FALSE ....
OTHERWISE ....
END CASE

is perfectly legitimate AND useful code.

With Perl, I know I will be able to emulate this behaviour quite nicely
using classes and overloads, so coding 4GL style in Perl (ie mechanically
translating 4GL to Perl) is achievable. However, the project to translate
to Perl would not get approval here, and some of Perl syntax will freak
out business programmers.
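
As a rough sketch of how that emulation could look (Null::Sketch is an
invented name, and an undefined inner value is assumed to stand for NULL),
overloading + and == is enough to get both the NULL propagation and the
three-way comparison result:

```perl
package Null::Sketch;    # hypothetical name, for illustration only
use strict;
use warnings;
use overload
    '+'  => \&_add,
    '==' => \&_eq,
    '""' => sub { defined $_[0]{value} ? "$_[0]{value}" : 'NULL' };

sub new { my ($class, $value) = @_; return bless { value => $value }, $class }

# Accept either a wrapped value or a plain Perl number.
sub _value { my ($x) = @_; return ref $x ? $x->{value} : $x }

# a + b => NULL if either operand is NULL.
sub _add {
    my ($lhs, $rhs) = map { _value($_) } @_[0, 1];
    return Null::Sketch->new(defined $lhs && defined $rhs ? $lhs + $rhs : undef);
}

# a == b => 1 (true), 0 (false) or undef (UNKNOWN).
sub _eq {
    my ($lhs, $rhs) = map { _value($_) } @_[0, 1];
    return undef unless defined $lhs && defined $rhs;
    return $lhs == $rhs ? 1 : 0;
}

package main;

my $two  = Null::Sketch->new(2);
my $null = Null::Sketch->new(undef);

my $sum = $two + $null;
print "2 + NULL = $sum\n";

my $cmp = ($two == $null);
print "2 == NULL is ", (defined $cmp ? ($cmp ? 'TRUE' : 'FALSE') : 'UNKNOWN'), "\n";

# An UNKNOWN condition falls into the else branch, as in the 4GL.
if ($two == $null) { print "then branch\n" } else { print "else branch\n" }
```

The defined/true/false test on the comparison result plays the role of the
4GL's CASE with WHEN TRUE, WHEN FALSE and OTHERWISE.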

Unfortunately, the situation is not nearly so easy for Java. Java's native
numerics do not have a NULL value (unless I've missed something). Java
provides class-based alternatives to the fundamental types, but they don't
appear to support operators - only method calls (unless I've missed
something). If I have to implement a class representing all the Informix
database types, then Java's lack of operator overloading forces code to
look like this

a = b.add(c.times(d.negate()))

where methods add, times, etc take care of implementing the NULL
propagation. I think that looks really nasty and makes me go cross-eyed,
and it looks like a dodgy translation too, not nice neat hand-written
code.

Actually it gets worse in Java. We can't use Java's native null value,
because then

b.add(x)

will fail miserably if b is null! So I would have to add a null indicator
to the class for each mapped type, and a NULL value specific to each
class, and the translated 4GL code would end up looking like a nasty
translation instead of neat first-class Java. Furthermore, the initial
value of a variable would be a Java null, and would require

a = Informix.Integer.nullvalue;

to make the variable initially have a logical null value. Nasty nasty
nasty.

Java's lack of operator overloading might be "good" because it stops a lot
of the abuse that operator overloading brings to C++, but occasionally
there is a legitimate need for operator overloading...

The mapping from 4GL would result in clean code if translated to C++, but
I recoil in horror at the idea that we present C++ to business
programmers.

Anyway, this diversion into another problem and Java's weakness is because
I want to illustrate the point about how sucky solutions like
b.add(c.times(d.negate())) are. I draw the parallel with implementing flex
arrays as references to classes. Using $aref in place of @a is imho ugly
when it can be avoided. Arrays are royalty in almost all languages - why
lose that status just because you want to extend the behaviour?

Perhaps a complete massive lump of overloads will implement a flex class
more effectively than a TIEARRAY. I haven't been there yet... However a
class-based array will mostly be called $flex, not @flex, so we lose a
simple clue as to the nature of the beast. Call me old-fashioned, but I'd
like an array in Perl to look like .... a Perl array. If that's the worst
artifact of a class-based solution, I guess I could live with it, but I
haven't discovered yet whether overloaded @{} will make it easy to write
pretty code like

@$flexref[-10 .. 10]
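
One way this might be approached, offered as an untested-in-anger sketch:
overload @{} so that dereferencing the object yields a tied array. The names
Flex::Tie and Flex::Object are invented for the example, and the tie class
is cut down to the two methods a read-only slice needs.

```perl
package Flex::Tie;       # hypothetical tie class with a lower bound
use strict;
use warnings;
use Carp;

our $NEGATIVE_INDICES = 1;    # pass negative indices through untranslated

sub TIEARRAY {
    my ($class, $lwb, @items) = @_;
    return bless { lwb => $lwb, array => [@items] }, $class;
}
sub FETCHSIZE { my ($self) = @_; return scalar @{ $self->{array} } }
sub FETCH {
    my ($self, $i) = @_;
    my $offset = $i - $self->{lwb};
    croak "index $i out of bounds" if $offset < 0;
    return $self->{array}[$offset];
}

package Flex::Object;    # hypothetical class-based wrapper
use overload '@{}' => sub { return $_[0]{view} };

sub new {
    my ($class, $lwb, @items) = @_;
    tie my @view, 'Flex::Tie', $lwb, @items;
    return bless { view => \@view }, $class;
}

package main;

my $flex = Flex::Object->new(-1, qw(a b c));    # indices -1, 0, 1
my @part = @$flex[-1 .. 1];                     # slice through overloaded @{}
print "@part\n";
print "element 0 is $$flex[0]\n";
```

So the slice notation survives, at the cost of the extra sigil, and the
variable is still called $flex rather than @flex.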

I'm really puzzled and a bit bemused by the fear of non-zero array bounds,
or the fear of high-level array semantics. I thought the tied array
facility was precisely for adding semantics that goes beyond Perl's
ordinary arrays. Am I wrong? Is it only intended to allow you to add
events and manipulators to arrays, but otherwise leave you stuck with
rather ordinary behaviour?

The first implementation of tie came from tying hashes to dbm, correct?
I'm sure it was originally added as a "neat" feature that was somewhat
complicated to implement, but then some bright soul(s) realised that it
leads on to a more powerful notation and programming facility which allows
programmers to go beyond the rules imposed by Perl's notion of data types.
Therefore I think that very last sentence I just wrote virtually mandates
that your interpretation (b) is the correct one for tied anything.

So what still brings reluctance to your mind? You suggest that a deep
thoughtful look at the entire API is needed. On the other hand, a deep
thoughtful look at what is still missing might be useful and would not
throw away what has already been achieved. How many more missing features
are there before the TIE is complete? I seriously doubt that an endless
stream of suggestions can be thrown at TIEs.

I think it's more a case of needing some deep introspection
(psychoanalysis if you like) to discover any deeply held notions left deep
within. I have discovered the deep assumption that origin is zero. Someone
else discovered the need to block the magical effect of negative indexes
(ok it was you). You have mentioned a few others including for hashes and
other types.

SLICE, LWB and UPB functions sound like they will pretty much wrap up the
tie array API. Maybe there's one or two more; I've only spent a short time
thinking about it, and I certainly don't know all the nooks and crannies
of Perl which make a few more assumptions about arrays.

So, if you can spare the time, hit me with your reasons you "hope" I never
implement the patches. What is so dirty about the idea? I am open to a
well-reasoned argument which says clearly why further additions to the tie
API are a hopeless waste of time. Are you so despairing of the whole
subject just because a few suggestions from my direction would still not
complete the tie facility? Are you despairing because it's all a lot of
hard work that may not benefit a lot of people?
10 23rd October 17:08
terrence brannon


"Andrew Hamm" <ahamm@mail.com> writes:


(Set|Array)::IntSpan looks to be exactly what the doctor (or in this
case, The Hamm) ordered

--
Carter's Compass: I know I'm on the right track when,
by deleting something, I'm adding functionality.

----- End forwarded message -----