Parsing An Industrial XML File

Attached is code to parse an XML files that we will soon receive from our Automated Meter Reading (AMR) vendor. We bill our water accounts from daily reads, which we receive seven days a week. Hourly reads will assist our water department with helping customers figure out water usage. In other words, hourly reads serve as diagnostic information to see when water was used during a twenty-four hour period.

Each of the search tags in the Python program, which I believe are XML leaves, are the actual parts of the XML file. Adding these to the dictionary this way ties the XML file to the dictionary, which in this case I believe is a good thing, if someone looks at the XML file all on its own.

This example uses lxml, which from what I can interpret out in the Python community is the XML parsing library of choice.

Pulls a name space away from an element. This is for easier searching.

def parseHrNs(qname):
 namespace, element_name ='^{(.+)}(.+)$', qname).groups()

 namespace = None
 element_name = qname

 return namespace, element_name

def parseHrXmlDoc(fnam):

 rc = 0
 se_read = []

 current_ch = 0
 current_endpoint = 0

 ert_ch_text_key = ' '

 hrDict = {}

 context = etree.iterparse(fnam)

 context = None
 rc = -1

 if 0 == rc:
 nspace = None

 These searches are roughly in order of appearance in the XML document.

 Channel number found after ID.

 ID is the place to write the last data before assigning new ID.

 Check on se_read[] provides initialization step, so we don't assign rubbish at the

 for action, elem in context:
 nspace, search_tag = parseHrNs(elem.tag)

 #print("%s: %s" % (search_tag, elem.text))

 if 'exportGuid' == search_tag:
 hrDict[search_tag] = elem.text

 elif 'exportDateTime' == search_tag:
 hrDict[search_tag] = elem.text

 elif 'collectionSystemType' == search_tag:
 hrDict[search_tag] = elem.text

 elif 'collectionSystemID' == search_tag:
 hrDict[search_tag] = elem.text

 elif 'startTime' == search_tag:
 hrDict[search_tag] = elem.text

 elif 'endTime' == search_tag:
 hrDict[search_tag] = elem.text

 elif 'intervalLengthInSeconds' == search_tag:
 hrDict[search_tag] = elem.text

 elif 'ID' == search_tag:

 There really are null elements, so perform a continue.

 if not elem.text:
 This is an initialization step.
 se_read is an empty list on initialization, so
 just assign the current endpoint if se_read is an empty list. That
 should only happen once.

 After initialization, basically, this is already visted logic.
 We've already seen and cached the ID in current_endpoint and
 the channel in current_ch. Before assigning the new ID,
 assign what we have in the dictionary. The fact we're here
 means we've seen a new ID, so the prior data must be put in dictionary.

 if se_read:

 Endpoint ids are not unique for the dictionary if they are dual-port, so
 we need to create a unique dual key based on the channel number.

 current_ch was already saved, and look below. We have not taken
 elem.text's value yet. We're still going on already cached data, which
 is our intent. See explanation above.

 ert_ch_text_key = str(current_endpoint) + '-' + str(current_ch)

 hrDict[ert_ch_text_key] = se_read

 se_read = []

 current_endpoint = elem.text

 elif 'channelID' == search_tag:
 Channel must be preserved, even though we've already seen ID.
 current_ch will be used before next ID assigned.

 current_ch = elem.text

 elif 'value' == search_tag:

 return hrDict

Decision To Implement Progamming Language

As many already know, our town started on a large water project nearly two years ago. Specifically, the project’s goal was to change how water meters were read. Nearly twenty years ago, a similar project made the meter’s reading accessible on the outside of most structures, except where meters were installed in manholes and so-called “pits”. Either an ARB in early years or a touchpad within the last five years, was installed to facilitate reading the meter from outside the structure, that is within 50-100′ of a meter.

The current project is implementing a “fixed network”, by replacing the ARB or touchpad with a radio endpoint. The fixed part of the fixed network means that collectors listen to the endpoints; gather reads that are bubbled up throughout the day; and send them back to an application that matches the endpoint serial number with its associated customer data. Electric and gas meters often participate in a mobile network, where a vehicle traverses a route that comes near to all endpoints in the area, so they can be read.

The piece that sent customer configuration to the fixed network application and processed the water meter reads generated daily fell to the town’s IT department to implement. We call the application and server AMR (automated meter reading).

Two of this server’s primary functions are automatic. It checks for and processes customer configuration data, and sends it to the endpoint configuration system, where the meter reads are processed from the fixed network. The other function is to process daily meter reads. Other functions, like collecting reads for a billing cycle and reports comprise the other functions. It is a goal to have a web application facilitate configuration, report generation, and snap-shotting reads for a billing cycle.

One of our goals was to use as much open source as possible, including Python. Python makes up nearly 100% of the software implementation. Our python software makes use of the mysqldb module, which facilitates a SQL access to the MySQL database.

Our current feelings are that Python was the right choice. It is a language widely taught in high school, and there is a large pool of Python programming talent. Any instances where we felt speed was an issue, was not due to Python, but both schema implementation and a need to re-write the SQL code.

And, like most software projects, there is a constant need to continue implementing functions and fix bugs. Python has worked well in this area, without encouraging a lot of patch-work (spaghetti) code.

A Good Example of Time Wasted

This is a story about letting something be important that is not; poking at — investigating — that unimportant thing over several months without devoting enough time to find out what was really going on; and finally solving the problem.

As readers of already know, I’m involved in working on an Automated Meter Reading (AMR) data transfer system. The main functions of the system are to store daily water meter reads from our Itron wireless meter reading system; store and push meter configuration to that wireless system; and snap-shot meter reads for water bills.

From the very moment the daily reads were automated in late May 2010, the python code would spit out this error:

/home/amr/bin/ Warning: Incorrect integer value: '' for column 'MeterID' at row 1
rc = ins_cursor.execute(sql_info[sqlCd.cmd], sql_info[])
/home/amr/bin/ Warning: Out of range value adjusted for column 'MeterID' at row 1
rc = ins_cursor.execute(sql_info[sqlCd.cmd], sql_info[])
Downloading files from Itron.

There seemed to be two obvious ways to solve this problem:


  test_meter_val = int(meter_val)
except ValueError:
  meter_val = 0


  rc = \
  ins_cursor.execute(sql_info[sqlCd.cmd], \
except MySQLdb.Error, e:
  raise e
except MySQLdb.Warning, e:
  print("Here is the MySQL warning ", e)

Well as it stands, the MySQLdb.Warning never triggered, because I needed to re-read this article —

Then I finally got the plain try except block to work.

However, getting it to work required more than a cursory glance and a one line code fix. It required a test system where the database was not the production database; hence no production data was at risk.

The moral of the story is this. Don’t chase down this kind of warning, unless it is going to cost you something big, and if you are going to chase down the problem, sink enough time in it to understand fully what is going on. I failed on both counts.