[XML-SIG] Re: ElementTree

Greg Wilson gvwilson at cs.utoronto.ca
Wed Mar 16 22:08:27 CET 2005


Hi everyone.  I posted a problem with ElementTree to c.l.py yesterday.
Fredrik sent me a one-line patch (included below).  I applied it, but
ElementTree still fails in the same place, the same way, so I switched to
cElementTree.  It parses half of my input document, but fails on the
fourth occurrence of &rquot; --- it handles the previous three, and
occurrences of &lquot; and &ldots;, just fine.  As before, xml.dom.minidom
parses the document without complaint.  Any ideas?  My file, the DTD, and
my script are attached; validate.py is dying on line 130 of the input
file.
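
A minimal sketch for narrowing this down without the attachments. It assumes the two quote entities are declared in an internal DTD subset (the real file gets them from swc.dtd instead) and parses the same string with both libraries. The module path xml.etree.ElementTree and the helper names are assumptions for illustration, not what validate.py uses.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Assumption: entities are declared inline here; the real document
# references them through the external file swc.dtd instead.
DOC = """<!DOCTYPE lec [
  <!ENTITY lquot "&#x201C;">
  <!ENTITY rquot "&#x201D;">
]>
<lec><b2>0 for &lquot;no&rquot;, 1 for &lquot;yes&rquot;</b2></lec>"""


def text_via_elementtree(doc):
    """Return the text of the first <b2> child using ElementTree."""
    return ET.fromstring(doc).find("b2").text


def text_via_minidom(doc):
    """Return the text of the first <b2> element using minidom."""
    node = minidom.parseString(doc).getElementsByTagName("b2")[0]
    # Join all text children in case the character data arrives in pieces.
    return "".join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE)


if __name__ == "__main__":
    print(repr(text_via_elementtree(DOC)))
    print(repr(text_via_minidom(DOC)))
```

If both calls return the same string here, the entity references themselves are fine and the difference is more likely in how each library deals with the external DTD.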

Thanks,
Greg

On Wed, 16 Mar 2005, Fredrik Lundh wrote:

> hi greg,
>
> > Hi Fredrik.  I added ElementTree to the data crunching book, then went
> > back and started revising my PSF-funded course material tools to use it.
> > Immediately ran into a problem with DTDs (described in the first
> > attachment, which I posted to c.l.python y'day).
>
> did you see my reply?
>
>     http://article.gmane.org/gmane.comp.python.general/392915
-------------- next part --------------
<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE lec SYSTEM "swc.dtd">

<lec title="Introduction" id="intro" svn="$Id: intro.swc 21 2005-03-16 18:08:45Z gvwilson $">



<topic title="Motivation" summary="motivation for course">

 <slide>

  <b1>Computers are as important to scientists as telescopes and test tubes

   <b2>Analyze problems that are too complex for traditional means</b2>

   <b2>Simulate things that can't be studied in laboratories</b2>

  </b1>

  <b1>Many scientists now spend much of their professional lives writing and maintaining software

   <b2>A quarter of graduate students in science and engineering spend 25-50% of their time programming</b2>

  </b1>

  <b1>But most scientists have never been taught how to do this efficiently

   <b2>It's a long way from the loops and arrays of first year to simulating bone development in foetal marsupials&ldots;</b2>

   <b2>Like being shown how to differentiate polynomials, then expected to invent the rest of calculus</b2>

  </b1>

  <b1>This course will teach you how to design, build, maintain, and share programs more efficiently

   <b2>Focus: tools and techniques appropriate for half a dozen people working together for a year

    <b3>Everything you do at that scale will also make you more productive when you're working on your own for a week</b3>

   </b2>

   <b2>Will <e>not</e> turn you into a computer scientist

    <b3>Far too many of them around anyway</b3>

   </b2>

   <b2>Instead, goal is to teach you the equivalent of good laboratory technique for computational science

    <b3>The 20% of ideas that account for 80% of real world use</b3>

    <b3>Software carpentry, rather than software engineering</b3>

   </b2>

  </b1>

 </slide>

</topic>



<topic title="Meeting Standards" summary="need to improve quality as well as efficiency">

 <slide>

  <b1>Experimental results are only publishable if they are believed to be <e>correct</e> and <e>reproducible</e>

   <b2>Equipment calibrated, samples uncontaminated, relevant steps recorded</b2>

   <b2>In practice, almost always rely on the professionalism of the people doing the work</b2>

  </b1>

  <b1>How well do computational scientists meet these standards?

   <b2>Correctness of code rarely questioned

    <b3>We all know programs are buggy, but when was the last time you saw a paper rejected because of concerns over the quality of the software used to produce the results?</b3>

   </b2>

   <b2>Reproducibility often nonexistent

    <b3>How many people can reproduce, much less trace, each result in their thesis?</b3>

   </b2>

  </b1>

  <b1>Quality expectations can change overnight

   <b2>Like the American car market when German and Japanese imports appeared in the 1970s</b2>

  </b1>

 </slide>

</topic>



<topic title="Who You Are" summary="target audience">

 <slide>

  <b1>User stories

   <b2>Important part of designing user interfaces for mass-distribution software</b2>

   <b2>Helps make discussion of features and usability more concrete</b2>

  </b1>

  <b1>Bhargan Basepair

   <b2>27; B.Sc. in zoology</b2>

   <b2>Did an introductory Fortran course nine years ago, and attended a workshop on web-based bioinformatics tools when he started his job</b2>

   <b2>Now developing fuzzy pattern-matching algorithms for Genes'R'Us, a biotech firm with labs in four countries</b2>

  </b1>

  <b1>Harald Helmet

   <b2>23; B.Eng. in mechanical engineering, now doing an M.Sc. part time</b2>

   <b2>Did C in first year; has been using MATLAB ever since</b2>

   <b2>Modeling thermal degradation (a.k.a. &lquot;melting&rquot;) of firefighters' helmets</b2>

  </b1>

  <b1>Rachel Rotor

   <b2>34; Ph.D. in physics</b2>

   <b2>Took two courses on C and two on numerical analysis as an undergrad, and a computer graphics course as a graduate student</b2>

   <b2>Now in charge of the 5-person flywheel braking group at Yoyodyne Inc.</b2>

  </b1>

  <b1>Sally Synthesis

   <b2>22; finished a B.Eng. in chemical engineering last year, now doing an M.Sc. in chemistry</b2>

   <b2>Did Java in first year, taught herself C, and has built a personal web site (static HTML only)</b2>

   <b2>Thesis topic is improving the yield of fullerene production processes</b2>

  </b1>

 </slide>

</topic>



<topic title="A Quick Self-Test" summary="self-test">

 <slide>

  <b1>Adapted from Joel Spolsky <cite ref="spolsky-joel-on-software"/>

   <b2>0 for &lquot;no&rquot;, 1 for &lquot;yes&rquot;</b2>

   <b2>-1 if you don't know what the term means, or how to tell</b2>

  </b1>

  <b1>So:

   <b2>Do you use version control?</b2>

   <b2>Can you rebuild everything in one step?</b2>

   <b2>Do you have an automated test suite?

    <b3>Bonus marks if the tests report how much of the code they exercise</b3>

   </b2>

   <b2>Do you build the software, and run the test suite, daily?</b2>

   <b2>Do you have a bug database?</b2>

   <b2>Do you use a symbolic debugger?</b2>

   <b2>Is your code written in a uniform, readable way?

    <b3>Bonus marks if you use a style checker to check this automatically</b3>

   </b2>

   <b2>Is there a searchable archive of project-related communication?</b2>

   <b2>Do you have an up-to-date schedule with binary milestones?</b2>

   <b2>Can you trace everything you release back to the software that produced it?</b2>

   <b2>Do you do code reviews?</b2>

   <b2>Is time set aside in the schedule for infrastructure development and training?</b2>

  </b1>

  <b1>And your score is?</b1>

 </slide>

</topic>



<topic title="Learn by Building" summary="course philosophy">

 <slide>

  <b1>So why are we where we are?

   <b2>It's difficult to learn these things from academic computer scientists

    <b3>CS research is more concerned with rapid prototyping than with reliability</b3>

   </b2>

   <b2>People are naturally sceptical of innovation

    <b3>Particularly after they've seen a few bandwagons roll through</b3>

    <b3>Glass's Law <cite ref="glass-software-engineering-facts"/>: any new way of doing things initially slows you down</b3>

   </b2>

   <b2>You only have to be as good as the competition

    <b3>American auto makers in the 1970s</b3>

   </b2>

  </b1>

  <b1>This course's approach:

   <b2>Introduce some basic tools

    <b3>Students immediately see benefit of taking the course</b3>

    <b3>Tools can be used to manage the course itself</b3>

   </b2>

   <b2>Show students how to build tools like these

    <b3>Where &lquot;how&rquot; includes both what goes into the software, and how to create it</b3>

    <b3>Solidifies understanding of tools' capabilities and limitations</b3>

    <b3>Makes discussion of technique more concrete</b3>

   </b2>

   <b2>Show students what else they can do with their new skills

    <b3>The right way to tackle issues that come up over and over again</b3>

   </b2>

  </b1> 

  <b1>Key point: avoid overload

   <b2>People who already know these things tend to underestimate how hard they are to learn</b2>

   <b2>No point preaching to the top 10%</b2>

   <b2>Try instead to move the middle of the bell curve to the right</b2>

  </b1>

 </slide>

</topic>



<topic title="Topics" summary="topics">

 <slide>

  <b1>Three Tools

   <b2><ref sec="shell" text="title"/></b2>

   <b2><ref sec="version" text="title"/></b2>

   <b2><ref sec="make" text="title"/></b2>

  </b1>

  <b1>Programming

   <b2><ref sec="py01" text="title"/></b2>

   <b2><ref sec="py02" text="title"/></b2>

   <b2><ref sec="py03" text="title"/></b2>

   <b2><ref sec="ads" text="title"/></b2>

   <b2><ref sec="py04" text="title"/></b2>

  </b1>

  <b1>Individual Practices

   <b2><ref sec="test01" text="title"/></b2>

   <b2><ref sec="test02" text="title"/></b2>

   <b2><ref sec="debugger" text="title"/></b2>

   <b2><ref sec="debugging" text="title"/></b2>

   <b2><ref sec="style" text="title"/></b2>

   <b2><ref sec="team" text="title"/></b2>

  </b1>

  <b1>Data Crunching

   <b2><ref sec="re" text="title"/></b2>

   <b2><ref sec="xml01" text="title"/></b2>

   <b2><ref sec="xml02" text="title"/></b2>

   <b2><ref sec="sql01" text="title"/></b2>

   <b2><ref sec="sql02" text="title"/></b2>

  </b1>

  <b1>The Web

   <b2><ref sec="http" text="title"/></b2>

   <b2><ref sec="cgi01" text="title"/></b2>

   <b2><ref sec="security" text="title"/></b2>

   <b2><ref sec="cgi02" text="title"/></b2>

  </b1>

  <b1>Putting It All Together

   <b2><ref sec="proj101" text="title"/></b2>

   <b2><ref sec="proj102" text="title"/></b2>

   <b2><ref sec="summary" text="title"/></b2>

  </b1>

 </slide>

</topic>



<topic title="Setting Up" summary="what you will need">

 <slide>

  <b1>Some previous programming experience

   <b2><c>for</c> loops, <c>if</c>/<c>then</c>/<c>else</c></b2>

   <b2>Function calls</b2>

   <b2>Arrays</b2>

   <b2>File I/O</b2>

   <b2>Compilation</b2>

  </b1>

  <b1>Individual setup

   <b2>Python (version 2.4 or higher)</b2>

   <b2>Cygwin (on Windows)</b2>

   <b2>DrPython

    <b3>Or Komodo</b3>

    <b3>At least get a smart editor</b3>

   </b2>

  </b1>

  <b1>Course setup

   <b2>Subversion</b2>

   <b2>Trac

    <b3>Apache</b3>

    <b3>SQLite or PostgreSQL</b3>

    <b3>PySVN</b3>

   </b2>

  </b1>

  <b1>Time

    <b2>Expect to spend 2-3 hours outside class for each lecture</b2>

  </b1>

 </slide>

</topic>



<topic title="Recommended Reading" summary="recommended reading">

 <slide>

  <b1>Books

   <b2><cite ref="hunt-thomas-pragmatic-programmer"/></b2>

   <b2><cite ref="glass-software-engineering-facts"/></b2>

   <b2><cite ref="spolsky-joel-on-software"/></b2>

   <b2><cite ref="lutz-ascher-learning-python"/></b2>

   <b2><cite ref="wilson-data-crunching"/></b2>

  </b1>

  <b1>Web resources

   <b2><fixme>Create list of useful links</fixme></b2>

  </b1>

 </slide>

</topic>



<topic title="The Rules" summary="rules of programming">

 <slide format="enum">

  <b1>A week of hard work can sometimes save you an hour of thought.</b1>

  <b1>If it's worth doing again, it's worth automating.</b1>

  <b1>Anything repeated in two or more places will eventually be wrong in at least one.</b1>

  <b1>The three chief virtues of a programmer are laziness, impatience, and hubris.</b1>

  <b1>It's not what you know, it's what you can.</b1>

  <b1>The deadline isn't when you're supposed to finish; the deadline is when it starts to be late.</b1>

  <b1>Never debug standing up.</b1>

  <b1>Tools are signposts, not destinations.</b1>

  <b1>Not everything worth doing is worth doing well.</b1>

  <b1>Code unto others as you would have others code unto you.</b1>

  <b1>Every complex file format eventually turns into a badly-designed programming language.</b1>

 </slide>

</topic>



</lec>

-------------- next part --------------
<!-- $Id: swc.dtd 22 2005-03-16 18:09:19Z gvwilson $ -->

<!ENTITY ldots "&#x2026;">

<!ENTITY lquot "&#x201C;">

<!ENTITY rquot "&#x201D;">
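
A quick sanity check for numeric character references like the ones in this DTD, sketched with the standard unicodedata module. The helper name char_ref_name is made up for illustration. The thing to watch is the base: &#x...; is hexadecimal and &#...; is decimal, so decimal 8230 and hexadecimal 2026 name the same character, while hexadecimal 8230 is a different one.

```python
import re
import unicodedata


def char_ref_name(ref):
    """Return the Unicode name of a reference such as '&#x201C;' or '&#8230;'."""
    match = re.fullmatch(r"&#(x?)([0-9A-Fa-f]+);", ref)
    if match is None:
        raise ValueError("not a numeric character reference: %r" % ref)
    # A leading 'x' means the digits are hexadecimal; otherwise decimal.
    base = 16 if match.group(1) else 10
    return unicodedata.name(chr(int(match.group(2), base)))


if __name__ == "__main__":
    for ref in ("&#x201C;", "&#x201D;", "&#x2026;", "&#8230;", "&#x8230;"):
        print(ref, char_ref_name(ref))
```

The two quote references resolve to the left and right double quotation marks; the ellipsis is U+2026, which is 8230 only when written in decimal.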

-------------- next part --------------
#!/usr/bin/env python



import sys, os

import cElementTree as ElementTree



for filename in sys.argv[1:]:

    ElementTree.parse(filename)
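
One possible workaround, sketched against the standard-library xml.etree.ElementTree rather than the cElementTree module imported above: hand the parser a table of entity values through the entity dictionary on XMLParser, and report where parsing stops if it still fails. The ENTITIES table and the function name parse_with_entities are assumptions for illustration; the values are the intended quote and ellipsis characters typed in by hand, not read from swc.dtd.

```python
import sys
import xml.etree.ElementTree as ET

# Assumption: replacement text for the entities, typed in by hand
# rather than loaded from swc.dtd.
ENTITIES = {
    "ldots": "\u2026",  # horizontal ellipsis
    "lquot": "\u201c",  # left double quotation mark
    "rquot": "\u201d",  # right double quotation mark
}


def parse_with_entities(source, entities=ENTITIES):
    """Parse a filename or file object, resolving otherwise-undefined entities."""
    parser = ET.XMLParser()
    # The parser consults this dictionary for entity references that the
    # underlying XML parser leaves unresolved.
    parser.entity.update(entities)
    return ET.parse(source, parser=parser)


if __name__ == "__main__":
    for filename in sys.argv[1:]:
        try:
            parse_with_entities(filename)
        except ET.ParseError as err:
            line, column = err.position
            print("%s: line %d, column %d: %s" % (filename, line, column, err))
```

Keeping the table in the script duplicates what the DTD already says, so this is a stopgap for getting a parse through, not a substitute for loading the DTD.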


