American Grand Jury Foundation

Local Government Facts

Some Notes on the Fine Art of Data Dredging

Nothing is more important for active citizens than using facts, rather than unfounded conclusions and emotions, to make a case for needed improvements in local government. Well, that might not be altogether true in these sensate times, but a few traditionalists still prefer factually supported conclusions to guesswork, wishful thinking, self-interest, or examining the entrails of slaughtered animals.

When training grand jurors, we always urge them to support their recommendations with facts. Invariably, someone in the class contemptuously recites the old saw about “lies, damned lies, and statistics.” This cliché is variously attributed to Benjamin Disraeli, Will Rogers, Harry Truman, Mark Twain, and others, but in the many decades since it was first uttered, its luster has faded. However much the bromide means to the speaker, people familiar with the tools and methods of objective analysis will conclude that he misses an important point: Numbers are essential for evaluating public services, but they must always be used with caution, as the following suggestions show.

For decision making, one must never rely on numbers, quotations, and the like without confirming their validity. To help you do this, we will generally cite the source of the data we display on this Web site. If you visit a source we cite to verify our information, you might occasionally be surprised to discover that some of its figures differ from what you see on our Web site. One reason for the discrepancy might be that we changed some of the source data slightly to illustrate something, such as an odd trend line in a graph or “data spikes.” This, of course, is not a matter of error but a deliberate alteration of data for instructional purposes.

Error is the more troublesome problem with numbers, no matter where you find them. Anyone working with local-government statistics published by federal or state agencies sooner or later encounters it in governmental databases. Suppose, for example, that a state law requires the county sheriff to forward certain crime statistics to a state agency. Typically, a clerk in the sheriff’s department obtains the needed information from forms that deputy sheriffs complete. The clerk probably types the numbers onto a state form and sends it to the state agency. Perhaps typists in the state agency then transfer the numbers from the form to a computer database. At this point, the numbers have passed through three sets of hands: those of the deputy sheriffs, the clerk in the sheriff’s department who compiles the data for the state agency, and the typists in the state agency. Now imagine that someone in the sheriff’s department discovers that one or more of the deputies incorrectly completed the “source document.” The corrected data must then follow the same path, from the sheriff’s department clerk to the typists in the state agency, with a fresh chance for error at every handoff.

Here is another hazard that might lurk in data sets. In your own data-dredging activities, you might discover discrepancies between what you obtained, say, from one year’s official statistical report and what a later year’s report shows for the same subject. This is not necessarily a sign of error. Sometimes, for example, state agencies must meet a legal deadline for publishing an annual report based on data they collect from local governments. Therefore, at a certain point in the year, they must “close the books” on the local-government data that will appear in the report. A few months after the report is issued, employees of the agency might receive revised or corrected data that arrived too late to be included. In this case, the updated information is added to the state’s database. However, suppose you are working with a copy of the report as originally published. If you prepare a document based on that report, you might be surprised when someone objects to your findings because some of your data do not match what is in the updated state database. Not only is your report compromised; so is your credibility. Much more could be said about this, but the point should by now be obvious: Data dredging is hard work, and the sources of error are numerous.
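For readers who keep such figures in electronic form, a simple cross-check can reveal this kind of discrepancy before someone else does. What follows is a minimal Python sketch, not a tool we use or endorse. The county names and counts are invented for illustration, and the function name `find_discrepancies` is hypothetical. It compares the figures printed in an originally published report with the same figures in a later copy of the database and lists every entry that changed or that appears in only one of the two sources.

```python
# A minimal sketch for comparing two versions of the same statistics.
# ASSUMPTION: the county names and counts below are invented for illustration;
# they do not come from any real report or database.

# Figures as printed in the originally published annual report.
published_report = {"County A": 412, "County B": 1087, "County C": 256}

# The same figures as they appear later in the updated state database,
# after revised data arrived (in this invented example, County D reported
# too late to be included in the published report).
updated_database = {"County A": 412, "County B": 1112, "County C": 256, "County D": 74}


def find_discrepancies(published, updated):
    """Return (name, published_value, updated_value) for every entry whose
    value differs between the two sources or that appears in only one.
    A missing entry is shown as None."""
    discrepancies = []
    # Examine every name found in either source, in alphabetical order.
    for name in sorted(set(published) | set(updated)):
        old = published.get(name)
        new = updated.get(name)
        if old != new:
            discrepancies.append((name, old, new))
    return discrepancies


if __name__ == "__main__":
    for name, old, new in find_discrepancies(published_report, updated_database):
        print(f"{name}: published {old}, updated {new}")
```

With the invented figures above, the sketch flags County B, whose count changed from 1087 to 1112, and County D, which appears only in the updated database. A flagged difference is a question to put to the agency, not proof that anyone made a mistake.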

In most cases, the kinds of data we present on this site illustrate facts that civil grand jurors and other citizens in California have used, or should use, in issuing reports about their investigations. You will find examples of frequent subjects of civil grand jury investigations in Grand Juries in California (pp. 96-105). We will sometimes use this list to decide what kinds of data to present on this site. We may also find other data, in a state report, a newspaper article, or an academic study, that we think might be useful in evaluating local-government services, functions, and programs. In some cases, we might verify these data by checking them against their original source; on other occasions, we might reproduce them without verification. This is another reason why visitors to this Web site should always check data on our site before using them for their own purposes.

In general, we present data on this site with no public-policy purpose in mind. The Local Government Facts section of this site is not principally about topics; rather, it serves three purposes: (a) to show the variety and nature of data concerning local government; (b) to illustrate and encourage the fair and objective use of data; and (c) to encourage citizen activists to use sound information in their advocacy projects. Because our emphasis is generally the method of using data (not necessarily the subject itself), we might occasionally alter the original data to illustrate a point about designing tables, graphs, or charts.

Data, particularly of the numerical variety, should rarely be used as the sole basis for judging the effectiveness of a public organization, the need for it, or the case for realigning it. Think of them as tools for discovery. When you find interesting, provocative, or paradoxical data, use them as the basis for interviews with people who are qualified to help you understand what the data might mean. For example, try to obtain answers to questions such as these: How and why are the data collected? To whom are they reported? How are the data used in the everyday business of the bureaucracy? What do people who work for the organization think about them? What is done to make certain that the data are accurate? Are there penalties for falsifying them or failing to report them?

Finally, keep this maxim in mind when you work with numbers in studying local government services and functions: Facts are always incontrovertibly accurate and reliable when they reflect favorably on the organization; otherwise, they are indeed “lies, damned lies, and statistics.”


May 1, 2008

©2008-2017 American Grand Jury Foundation, All Rights Reserved