Ordered list question

Thomas 'PointedEars' Lahn PointedEars at web.de
Sun Jul 17 15:00:11 EDT 2011


jyoung79 at kc.rr.com wrote:
^^^^^^^^^^^^^^^^^^
Something is missing there.

> I'm currently working on a project where I'm looping through xml elements,
> pulling the 'id' attribute (which will be coerced to a number)

No, usually it won't.

> as well as the element tag.

That's element _type name_.

> I'm needing these elements in numerical order (from the id).

Attribute values of type ID MUST NOT start with a decimal digit in XML [1].

> Example xml might look like:
> 
> <price id="5">
> <copyright id="1">
> <address id="3">

That is not even well-formed, as the end tags of the `address', `copyright', 
and `price' elements (in that order) are missing.  Well-formed XML would be 
either

  <foo>
    <price id="5"/>
    <copyright id="1"/>
    <address id="3"/>
  </foo>

or

  <foo>
    <price id="5">
      <copyright id="1"/>
    </price>
    <address id="3"/>
  </foo>

or

  <foo>
    <price id="5"/>
    <copyright id="1">
      <address id="3"/>
    </copyright>
  </foo>

or

  <price id="5">
    <copyright id="1"/>
    <address id="3"/>
  </price>  

or

  <price id="5">
    <copyright id="1">
      <address id="3"/>
    </copyright>
  </price>  

but neither might be Valid (or make sense).  Check your DTD or XML Schema.

> There will be cases where elements might be excluded, but I'd still need
> to put what I find in id numerical order.  In the above example I would
> need the order of 1, 3, 5 (or copyright, address, price).  In javascript I
> can easily index an array, and any preceding elements that don't exist
> will be set to 'undefined':
> 
> -----
> var a = [];
> 
> a[parseInt('5')] = 'price';
> a[parseInt('1')] = 'copyright';
> a[parseInt('3')] = 'address';
> 
> //  a is now   [undefined, copyright, undefined, address, undefined,
> price] -----

This is nonsense even in "javascript" (there really is no such language 
[1]).  In ECMAScript implementations like JavaScript you would write

  var a = [];
  a[5] = "price";
  a[1] = "copyright";
  a[3] = "address";

as array indexes are only special object properties, and properties are 
stored as strings anyway.  However, the highest index you can store this 
way, in the sense that it increases the `length' of the array, would be 
2³²−2 (as the value of the `length' property ranges from 0 to 2³²–1).

Python's `list' type is roughly equivalent to ECMAScript's `Array' type.  
Important differences include that apparently you cannot store as much items 
in a Python list as in an ECMAScript Array – 

>>> for p in range(0, pow(2, 31)-1): a.append(p)
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

  [Kids, don't try this at home!]

>>> for p in range(0, pow(2, 31)): a.append(p)
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: range() result has too many items 

–, and that you need to add enough items in order to access one (so there 
are no sparse lists):

>>> a[23] = 42
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list assignment index out of range

(I was not aware of that.)  Also, the access parameter must be integer:

>>> a["23"] = 42
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not str

>>> a["foo"] = "bar"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not str

[Using a non-numeric or out-of-range parameter (see above) for bracket 
property access on an ECMAScript Array means that the number of elements in 
the array does not increase, but that the Array instance is augmented with a 
non-element property, or an existing non-element property is overwritten; 
this cannot happen with Python lists.]

> Next, I can loop through the array and remove every 'undefined' in order
> to get the ordered array I need:

Or you could be using an ECMAScript Object instance in the first place, and 
iterate over its enumerable properties.  This would even work with proper 
IDs, but if you are not careful – chances of which your statements about the 
language indicate – you might need further precautions to prevent showing up 
of user-defined enumerable properties inherited from Object.prototype:

  var o = {
    5: "price",
    1: "copyright",
    3: "address"
  };

or programmatically:

  var o = {};
  o[5] = "price";
  o[1] = "copyright";
  o[3] = "address";

Then:

  for (var prop in o)
  {
    /* get prop or o[prop] */
  }

> -----
>> var newA = [];
> 
>> for (var x = 0; x < a.length; x++) {

Unless a.length changes:

  for (var x = 0, len = a.length; x < len; ++x) {

The variable name `x' should also be reserved for non-counters, e. g. object 
references.  Use i, j, k, and so forth in good programming tradition here 
instead.

>     if (a[x] != undefined) {

  if (typeof a[x] != "undefined") {

as your variant would also evaluate to `false' if a[x] was `null', and would 
throw an exception in older implementations at no advantage (however, you 
might want to consider using `a[x] !== undefined' for recent implementations 
only).

>         newA.push(a[x]);
>     }
> }
> 
> // newA is now   [copyright, address, price]

Or you would be using Array.prototype.push() (or a[a.length] = …) in the 
first place instead of this, as contrary to what you stated above you appear 
to be only interested in the element type names:

  var a = [];
  a.push("price");
  a.push("copyright");
  a.push("address");

> -----
> 
> My question is, does python have a similar way to do something like this?
> I'm assuming the best way is to create a dictionary and then sort it by
> the keys?

As are ECMAScript objects, Python's dictionaries are an *unordered* 
collection of name-value pairs.  You would be using Python's `list' type, 
and its append() method (foo.append(bar)) or concatenate two lists instead 
(foo += [bar]).  Then you would sort the list (see below).  (You could also 
use a dictionary object, and use its keys() method and then sort its return 
value.  Depends on your use-case.)

I am getting the idea here that you intend to apply string parsing on XML.  
However, when working with XML you should instead be using an XML parser to 
get a document object, then XPath on the document object to retrieve the 
`id' attribute values of the elements that have an `id' attribute 
('//*[@id]/@id') or the elements themselves ('//*[@id]'), in which case you 
would use XPathResult::*ORDERED_NODE_SNAPSHOT_TYPE to apply the 
snapshotItem() method to generate a list, and then you would probably simply 
say mylist.sort() [but see below].  Different XPath APIs for Python might 
also present the result as a list already without you having to call the 
snapshotItem() method.  You should look into the libxml2 module and lxml.

If you are instead interested in finding out, e.g., the element type for a 
specific ID, without using XPath again, then you should build and sort a 
list of dictionaries –

  a = [{"id": "5", "type": "price"},
       {"id": "1", "type": "copyright"},
       {"id": "3", "type": "address"}]

or, programmatically

  a = []
  a.append({"id": "5", "type": "price"})
  a.append({"id": "1", "type": "copyright"})
  a.append({"id": "3", "type": "address"})

– which is BTW syntactically exactly the approach that you would use in an 
ECMAScript implementation (except for the trailing semicolon that should not 
be missing in ECMAScript).  The Python solution [3] –

  a.sort(cmp=lambda x,y: cmp(x['id'], y['id']))

or (since Python 2.4)

  a.sort(key=lambda x: x['id'])

or (since Python 2.4)

  sorted(a, cmp=lambda x, y: cmp(x['id'], y['id']))

or (since Python 2.4)

  sorted(a, key=lambda x: x['id'])

– only differs from the ECMAScript-based one in the way that the lambda 
expression for the comparator is written [4]:

  a.sort(function(x, y) {
    var x_id = x.id,
        y_id = y.id;
    return ((x_id < y_id) ? -1 : ((x_id == y_id) ? 0 : 1));
  });

(The local variables should improve the efficiency in the worst case.  You 
may omit some parentheses there at your discretion.)

A difference between sorted() and the other Python ways is that the former 
returns a sorted list but leaves the original list as it is.  (ECMAScript 
does not provide a built-in method to sort an array of objects by the 
object's property values, nor does it provide a built-in one that sorts an 
array or array-like object not-in-place.  But such is easily implemented.)


HTH
__________
[1] <http://www.w3.org/TR/xml/#id>
[2] <http://PointedEars.de/es-matrix>
[3] <http://wiki.python.org/moin/HowTo/Sorting/>
[4] 
<https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/array/sort>
-- 
PointedEars

Bitte keine Kopien per E-Mail. / Please do not Cc: me.



More information about the Python-list mailing list