TxDPS geocoding, continued

Aren Cambre's picture

[Image: TxDPS ticketing along Dallas North Tollway]

I've continued to work on the TxDPS dataset.

The dataset has two types of tickets, for location purposes: those with GPS coordinates, and those without.

It's straightforward to geolocate the tickets with GPS coordinates. To the right is an example from the Dallas North Tollway. You'll notice that the tickets match the road's actual path really well. Sure, there are some stray tickets here and there, but they are few. I probably need to inspect the average straight-line distance of each ticket from the road's centerline to ascertain the true variance over one dimension, but I imagine it will be minimal.
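The centerline check mentioned above could look something like the following. This is only a sketch under assumptions: the centerline is taken to be a list of (lat, lon) vertices, the function names are hypothetical, and a flat-plane (equirectangular) projection is used, which should be adequate over the length of a single road but is not exact.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def to_local_xy(lat, lon, ref_lat, ref_lon):
    """Project lat/lon to metres on a flat plane around a reference point."""
    x = math.radians(lon - ref_lon) * math.cos(math.radians(ref_lat)) * EARTH_RADIUS_M
    y = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    return x, y


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b, all in planar metres."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection of p onto the segment to the segment's ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_centerline(ticket, centerline):
    """Distance in metres from a (lat, lon) ticket to a polyline of (lat, lon) vertices."""
    ref_lat, ref_lon = centerline[0]
    p = to_local_xy(ticket[0], ticket[1], ref_lat, ref_lon)
    pts = [to_local_xy(lat, lon, ref_lat, ref_lon) for lat, lon in centerline]
    return min(point_segment_distance(p, a, b) for a, b in zip(pts, pts[1:]))


def mean_offset(tickets, centerline):
    """Average straight-line distance of tickets from the centerline, in metres."""
    distances = [distance_to_centerline(t, centerline) for t in tickets]
    return sum(distances) / len(distances)
```

The centerline needs at least two vertices. Averaging the distances gives a single number per road; keeping the individual distances instead would show how many stray tickets there are.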

The other challenge is tickets without GPS coordinates. This applies to almost all tickets issued before 2008.

TxDPS records a ticket's location by noting the road's class and name and the nearest reference marker. (I discussed reference markers in an earlier post.)

Fortunately, I have a separate database of all Texas Department of Transportation reference markers with GPS coordinates, so as long as I can correlate to the nearest reference marker, I can geolocate the ticket within roughly 1 mile.

The TxDPS road class designations are:

  • 1 = Interstate (also used for tollways)
  • 2 = US or state highways
  • 3 = Farm to Market or Ranch to Market
  • 4 = county road
  • 5 = city street
  • & = other
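For anyone working with the same data, the list above can be kept as a small lookup table. This is a sketch only: the codes are stored as strings exactly as listed in this post, and the dictionary and function names are made up for illustration.

```python
# Road class codes as listed in this post, kept as strings so "&" fits.
ROAD_CLASSES = {
    "1": "Interstate (also used for tollways)",
    "2": "US or state highways",
    "3": "Farm to Market or Ranch to Market",
    "4": "county road",
    "5": "city street",
    "&": "other",
}


def describe_road_class(code):
    """Return the description for a road class code, or 'unknown' if unlisted."""
    return ROAD_CLASSES.get(str(code).strip(), "unknown")
```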

Now here's a problem: both US 71 and TX 71 show up in the TxDPS database with name 0071 and road class 2. How do you tell roads like these apart?

If you have GPS, it's easy. But for the majority of tickets without GPS, it's more challenging.

One way is to hope both roads aren't in the same county. TxDPS also stores a county code, so you do the lookup on road name, road class, reference marker, and county.
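A four-part lookup like that can be sketched with a dictionary keyed on all four fields. The field names, county codes, and coordinates below are hypothetical placeholders, not values from the TxDPS or TxDOT data.

```python
def build_marker_index(markers):
    """Index reference markers by (county, road_class, road_name, marker)."""
    index = {}
    for m in markers:
        key = (m["county"], m["road_class"], m["road_name"], m["marker"])
        index[key] = (m["lat"], m["lon"])
    return index


def locate_ticket(ticket, index):
    """Return the (lat, lon) of the ticket's reference marker, or None if no match."""
    key = (ticket["county"], ticket["road_class"], ticket["road_name"], ticket["marker"])
    return index.get(key)


# Made-up rows: two roads that share name "0071" and class "2"
# but sit in different (hypothetical) counties 11 and 22.
markers = [
    {"county": 11, "road_class": "2", "road_name": "0071", "marker": 600,
     "lat": 30.1, "lon": -97.5},
    {"county": 22, "road_class": "2", "road_name": "0071", "marker": 600,
     "lat": 33.4, "lon": -94.1},
]
index = build_marker_index(markers)
ticket = {"county": 22, "road_class": "2", "road_name": "0071", "marker": 600}
location = locate_ticket(ticket, index)
```

Including the county in the key is what keeps the two 0071 rows from colliding. It does nothing for two same-named roads inside one county.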

But what if a state highway and US highway are both in the same county? This is the case for US 70 and TX 70!

In that case, all you can really do is hope that the reference markers are sufficiently apart to prevent an incorrect match. And in the case of US 70 and TX 70, where they intersect, US 70 will have higher reference markers that do not overlap with TX 70's markers.
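One way to make that hope explicit is to accept a match only when the ticket's marker number exists on exactly one of the candidate roads, and to treat anything else as ambiguous. The marker ranges below are invented for illustration; they only mimic the situation where US 70's markers are higher than TX 70's.

```python
def candidates_for_marker(marker, roads):
    """Return the names of roads whose known marker numbers include `marker`."""
    return [name for name, known in roads.items() if marker in known]


def resolve_road(marker, roads):
    """Return the single matching road, or None when zero or several roads match."""
    matches = candidates_for_marker(marker, roads)
    return matches[0] if len(matches) == 1 else None


# Hypothetical marker ranges for two same-named roads in one county.
roads_in_county = {
    "US 70": range(400, 450),
    "TX 70": range(200, 260),
}
```

Returning None for an ambiguous marker means the ticket is left unlocated rather than silently assigned to the wrong road.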

Two more problems, which seem relatively minor, come from mistakes by officers and clerical staff. I've seen many examples where the GPS coordinates are a good distance from the indicated reference marker, and the error isn't explained by a mistyped key. All I can guess is that the officer wasn't paying attention and just wrote down the most recent reference marker he remembered.

The other problem is likely clerical. I've seen some tickets where the GPS location is clearly on one road, but the route class and name belong to a different road. This is only a guess, but I figure that if an officer focuses on road X and turns in a stack of tickets from that road, and the stack includes a small number of tickets from road Y, the typist may make some mistakes with road Y.
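For tickets that have both GPS coordinates and a reference marker, both kinds of mistake show up as a large gap between the two locations. A rough way to flag them is to compute the great-circle distance and compare it with a threshold. The 2-mile default and the function names here are assumptions chosen for illustration, not values from the data.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def is_suspect(ticket_gps, marker_gps, threshold_miles=2.0):
    """True when the ticket's GPS point is farther than the threshold from its marker."""
    return haversine_miles(*ticket_gps, *marker_gps) > threshold_miles
```

Since markers are roughly a mile apart, a threshold somewhat above one mile lets through tickets written between markers while still catching the gross mismatches.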

Now for an error I made.

The TxDPS county code correlates to a TxDOT county code, but indirectly.

The table that holds the tickets contains the county code, and I figured these codes corresponded directly to the TxDOT county codes.

Well, not exactly. If you look at the TxDOT county codes, you'll notice they don't all increment perfectly. For example, look at Kenedy County. The neighboring counties in alpha sort are Kendall (131) and Kent (132). Kenedy's number is 66.

I have no idea why it is ordered this way. All I do know is that, being formed in 1921, Kenedy County is among the newest Texas counties. So that might explain the discontinuity. But why number 66? Was there a now-disbanded county whose name was between Donley (65) and Duval (67)? Even if this happened way back when, why would it matter in the late 20th century, when this numeric convention was likely set? That would have been decades after Kenedy County's formation.

That curiosity notwithstanding, the TxDPS county reference was offset by 1 for many counties. In fact, the county number referred to a different table in the TxDPS database, one that correlates the TxDPS table reference number to the TxDOT number.
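In code, the fix amounts to translating every TxDPS county reference through that table before touching any TxDOT data. The sketch below is hypothetical in its names and structure; the only real entries are the 68 and 69 values from the Eastland and Ector example later in this post. A real mapping would be loaded from the TxDPS correlation table.

```python
# TxDPS county reference number -> TxDOT county number.
# Only the pair mentioned in this post is filled in.
DPS_TO_DOT_COUNTY = {68: 69}

# TxDOT county numbers -> names, again limited to the counties named in this post.
TXDOT_COUNTY_NAMES = {68: "Eastland", 69: "Ector"}


def to_txdot_county(dps_code, mapping):
    """Translate a TxDPS county reference to a TxDOT county number.

    Raises KeyError for an unmapped code instead of guessing an offset,
    because the offset of 1 holds for many counties but not all.
    """
    if dps_code not in mapping:
        raise KeyError(f"no TxDOT county mapping for TxDPS code {dps_code}")
    return mapping[dps_code]
```

Failing loudly on an unmapped code is deliberate: quietly assuming the codes match is exactly the mistake described here.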

It admittedly took me way too long to figure this out. Until I did, I was flummoxed by things like this:

This is where I was analyzing variance in my data. The red lines are between the locations of reference markers and the average location of traffic tickets for a given reference marker. Why was there such a strong correlation between Eastland County (Eastland, TX) and Ector County (Odessa, TX)?

Well, turns out that the tickets referenced county ID 68. County 68 in the TxDOT dataset is Eastland County. But the problem is that county 68 in the TxDPS dataset is in fact county 69 in the TxDOT data, which is Ector County!
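The check that exposed this problem, comparing each reference marker's location with the average location of the GPS tickets assigned to it, can be sketched as follows. The data layout and names are assumptions, and the plain arithmetic mean of latitude and longitude is a simplification that is reasonable only when a marker's tickets are clustered closely together.

```python
import math
from collections import defaultdict

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def ticket_centroids(tickets):
    """Average (lat, lon) per marker key from (key, lat, lon) ticket rows."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for key, lat, lon in tickets:
        s = sums[key]
        s[0] += lat
        s[1] += lon
        s[2] += 1
    return {key: (s[0] / s[2], s[1] / s[2]) for key, s in sums.items()}


def marker_offsets(markers, tickets):
    """Miles between each marker and the centroid of its tickets.

    `markers` maps a marker key to (lat, lon); keys with no tickets
    or no known marker location are skipped.
    """
    centroids = ticket_centroids(tickets)
    return {
        key: haversine_miles(*markers[key], *centroids[key])
        for key in centroids
        if key in markers
    }
```

Sorting the result by offset puts mismatches like the Eastland and Ector one at the top, since a correct match should land within about a mile of its marker.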

As I'm typing, I have a program running that re-analyzes all my TxDPS tickets, and this time I have corrected for the bad county numbers. Hopefully the second pass will work! I'll find out when it finishes running in a few hours.
