Data processing steps

The figure to the right ("Research data process steps") shows the data processing steps. An overview of each step follows.

Request data from jurisdiction

This is usually an open records request that invokes a combination of the Texas Public Information Act (TPIA), Rule 12 of the Texas Rules of Judicial Administration, and the common law right of access to judicial records recognized by the U.S. Supreme Court in Nixon v. Warner Communications, Inc., 435 U.S. 589 (1978).

Generally, the request is emailed, faxed, or mailed to the jurisdiction.

All actions related to each request are recorded in my open records log.

Review data quality

Reviewing data quality means determining whether the data provided is what I requested. Often the data is inadequate, either because my request wasn't specific enough or because the city is unwilling or unable to produce what I asked for. If the data is inadequate, I keep working with the city until continuing is no longer economical.
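Part of this review can be automated. The sketch below assumes the city delivered a CSV and checks two things: which requested fields are absent from the header, and how many rows leave each present field blank. The field names in REQUESTED_FIELDS and the function review_quality are hypothetical, chosen only for illustration; the real list would come from the wording of the records request.

```python
import csv
import io

# Hypothetical fields named in a records request (illustration only).
REQUESTED_FIELDS = ["citation_number", "violation_date", "violation", "address"]


def review_quality(csv_text, requested=REQUESTED_FIELDS):
    """Report requested fields missing from the header and blank counts for the rest."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [f for f in requested if f not in header]
    present = [f for f in requested if f in header]
    blanks = {f: 0 for f in present}
    rows = 0
    for row in reader:
        rows += 1
        for f in present:
            # A short row yields None for trailing fields; treat that as blank too.
            if not (row.get(f) or "").strip():
                blanks[f] += 1
    return {"rows": rows, "missing_fields": missing, "blank_counts": blanks}


sample = "citation_number,violation_date,violation\n1001,2015-03-02,Speeding\n1002,,Red light\n"
report = review_quality(sample)
print(report)
```

A missing field usually points back to the request wording, while a high blank count suggests the city has the field but does not record it consistently.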

Transform data into CSV

I convert all useful data into CSV format so it can be imported easily into a database. This step uses various tools depending on the format each city provides, although in some cases the data is already in CSV or Excel and is easy to import.
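As one example of this conversion, suppose a city sends a fixed-width text export. The sketch below slices each line by column offsets and writes proper CSV, letting the csv module quote values that contain commas. The layout in LAYOUT and the function fixed_width_to_csv are hypothetical; a real layout would come from the city's file description, and other formats would need different tooling.

```python
import csv
import io

# Hypothetical fixed-width layout: (column name, start offset, end offset).
LAYOUT = [("citation_number", 0, 8), ("violation_date", 8, 18), ("violation", 18, 40)]


def fixed_width_to_csv(text, layout=LAYOUT):
    """Slice each non-blank line by the layout and return the fields as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([name for name, _, _ in layout])
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines in the export
        writer.writerow([line[start:end].strip() for _, start, end in layout])
    return out.getvalue()


export = "00001001" + "2015-03-02" + "Speeding, 45 in 30" + "\n\n"
converted = fixed_width_to_csv(export)
print(converted)
```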

Load CSV into raw data table

This is a simple import: no transformations, just getting the raw data into the database.
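A minimal sketch of a raw load, assuming SQLite as the database (the original does not name one). Every column is created as TEXT and values are inserted exactly as they appear in the CSV, so nothing is cleaned or converted at this stage. The function load_raw and the table name raw_city_a are hypothetical.

```python
import csv
import io
import sqlite3


def load_raw(conn, table, csv_text):
    """Create a table mirroring the CSV header (all TEXT) and insert rows unchanged.

    Assumes table and column names contain no double quotes.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    cols = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    # Remaining rows go in as plain strings; no type conversion happens here.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_raw(conn, "raw_city_a", "citation_number,violation_date\n1001,2015-03-02\n1002,2015-03-05\n")
count = conn.execute('SELECT COUNT(*) FROM "raw_city_a"').fetchone()[0]
print(count)
```

Keeping everything as text means a malformed date or number cannot block the import; those problems are dealt with in the next step.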

Transform data into common dataset

Ideally, I will end up with a single large table containing the data from all cities, refined enough that it can be treated as one dataset. However, I already know this vision cannot be fully achieved, because some cities did not produce all of the requested data.
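One way to picture this step: each city's raw columns are mapped onto a shared set of fields, and any field a city never produced becomes an explicit gap (None, i.e. NULL in a database) rather than disappearing silently. The schema in COMMON_FIELDS, the mappings in CITY_MAPPINGS, and the function to_common are all hypothetical examples, not the actual schema used here.

```python
# Hypothetical common schema shared by every city.
COMMON_FIELDS = ["city", "citation_number", "violation_date", "violation"]

# Hypothetical per-city mappings: common field -> that city's raw column name.
# None marks a field the city did not produce.
CITY_MAPPINGS = {
    "City A": {"citation_number": "CitNo", "violation_date": "OffDate", "violation": "Charge"},
    "City B": {"citation_number": "ticket_id", "violation_date": None, "violation": "offense"},
}


def to_common(city, raw_rows):
    """Rename each raw row's columns into the common schema, filling gaps with None."""
    mapping = CITY_MAPPINGS[city]
    common = []
    for raw in raw_rows:
        record = {"city": city}
        for field in COMMON_FIELDS[1:]:
            source = mapping.get(field)
            record[field] = raw.get(source) if source is not None else None
        common.append(record)
    return common


city_a = to_common("City A", [{"CitNo": "1001", "OffDate": "2015-03-02", "Charge": "Speeding"}])
city_b = to_common("City B", [{"ticket_id": "B-7", "offense": "Red light"}])
rows = city_a + city_b
print(rows)
```

Because the gaps are explicit, any analysis over the combined table can filter or report on cities that lack a given field.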

Geocode location information

I want to perform spatial analysis on the data I received. To do that, I need to convert the address information contained in most tickets to latitude and longitude. This presents several challenges, some related to approximation of addresses or incomplete data,