Ok, who has the egov license plate? Vivek? Steve?
This article makes me very sad. With the elimination of the Congressional Office of Technology Assessment in 1995 and the massive staff cuts to the Congressional Research Service and GAO as well as Congress itself (House, Senate), Professor Saloma’s hopes are but a pipe dream. He prognosticated, in 1968, that IT development would allow a future congressman to …
[sit] at a console in his office poring over computer print-outs into the late evening hours or over the weekend and cutting through the paper arguments and justifications of executive programs with penetrating lines of questions. … In situations that invite adversary argument, alternative positions and points of view will be more thoroughly developed and cogently presented.
What we’ve seen instead is the rise of partisan think tanks that have their own particular brew of “facts,” and Members of Congress who lack the resources to perform (or have performed for them) many of the basic analyses that would enable them to do their jobs.
Today, the National Day of Civic Hacking, I am at home, taking care of my 7-month-old, having given up my ticket to #hackfordc so that someone else can meet the cool folks I’ve had the pleasure of working with over the years. Even as I struggle through Learn Python the Hard Way, I am reminded that you don’t have to be a coder to contribute.
In that spirit, here are 3 project ideas that make use of congressional information.
WHAT’S HAPPENING ON THE HILL?
It’s just about impossible to follow when Congress has scheduled a committee or subcommittee hearing or meeting without a paid subscription to a news service that gathers this info. But over the last few years, the Senate and House have begun releasing meeting notices online in parsable formats. Unfortunately, there’s no publicly available central place to see all the notices from the different committees, and it’s not possible to sign up for official alerts for a particular subcommittee. All the data is there, but it isn’t being corralled.
For most people, it may be useful to follow a few particular subcommittees, but information about the others is distracting. For example, I pay attention to the Legislative Branch Appropriations Subcommittee, but don’t really care much about the other Appropriations Subcommittees. There should be a way to filter out the noise.
It would be great if one could subscribe to subcommittee notices as RSS feeds or, even better, have them pushed by email as information is updated. A user could subscribe to the subcommittees (including the full committee itself) of his or her choosing, and ignore the rest.
Here’s where you can find the data:
The Senate meeting calendar is available in XML here. As you look at the XML, you can see that the calendar identifies both the name of the full committee and the subcommittee.
The House publishes notices of meetings and markups weekly here, and if you go to a particular committee (say Appropriations), there’s an RSS feed for upcoming committee meetings. (It’s also possible to filter the docs.house.gov calendar by subcommittee, but I’m not sure how you get at the underlying data.) The subcommittee is identified in the description tag, along with other details.
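For the coders out there, here’s a rough sketch in Python of what the subcommittee filtering could look like. The XML below is made up to mimic the general shape of the Senate calendar; the real feed’s element names may differ, so treat them as placeholders to be checked against the actual file.

```python
import xml.etree.ElementTree as ET

# Invented sample that mimics the general shape of the Senate
# meeting calendar XML; real element names may differ.
SAMPLE = """
<css_meetings_scheduled>
  <meeting>
    <committee>Committee on Appropriations</committee>
    <sub_committee>Subcommittee on the Legislative Branch</sub_committee>
    <date>06-SEP-2013 10:00 AM</date>
    <matter>FY2014 Legislative Branch appropriations</matter>
  </meeting>
  <meeting>
    <committee>Committee on Appropriations</committee>
    <sub_committee>Subcommittee on Defense</sub_committee>
    <date>10-SEP-2013 02:00 PM</date>
    <matter>Defense budget overview</matter>
  </meeting>
</css_meetings_scheduled>
"""

def meetings_for(xml_text, subcommittee):
    """Return (date, matter) pairs for meetings of one subcommittee."""
    root = ET.fromstring(xml_text)
    hits = []
    for meeting in root.iter("meeting"):
        sub = meeting.findtext("sub_committee", default="")
        if subcommittee.lower() in sub.lower():
            hits.append((meeting.findtext("date"),
                         meeting.findtext("matter")))
    return hits

for date, matter in meetings_for(SAMPLE, "Legislative Branch"):
    print(date, "-", matter)
```

From there, turning the filtered list into an RSS feed or a nightly email digest is mostly plumbing.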
OPEN UP DRAFT LEGISLATION
It’s important to have plain-text versions of bills, especially draft legislation. Why? Clean (non-PDF) versions can be compared against other iterations to see what has changed, and marked up so that you can easily suggest improvements. Unfortunately, pre-introduction legislation is made available to staff only as a PDF, which is hardly useful to anyone. And sometimes even introduced legislation is available first as a PDF and only later as XML.
What would be helpful is a tool that ingests PDFs of draft legislation and returns plain text. But converting the PDF to text isn’t enough. It also would need to remove the line numbers, the headers (e.g. “F:\M13\ROYCE\ROYCE_005.XML” as well as the page numbers), and the footers (e.g. “F:\M13\ROYCE\ROYCE_005.XML f:\VHLC\022613\022613.176.xml (542138|23)”). By clearing out this additional stuff, you’re left with the text of the legislation only, which can then be used in many ways.
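Here’s a rough sketch in Python of the kind of cleanup I mean, applied to a made-up fragment of pdftotext-style output. The regexes are guesses that would need tuning against real bills, but they show the general approach: drop lines that look like drafting-file paths, and strip leading line numbers.

```python
import re

# Invented raw pdftotext-style output, with the line numbers and
# header/footer residue described above.
RAW = """F:\\M13\\ROYCE\\ROYCE_005.XML
1  SECTION 1. SHORT TITLE.
2  This Act may be cited as the
3  ``Example Act of 2013''.
F:\\M13\\ROYCE\\ROYCE_005.XML f:\\VHLC\\022613\\022613.176.xml (542138|23)
"""

def clean_bill_text(raw):
    """Strip line numbers and header/footer residue from extracted text."""
    kept = []
    for line in raw.splitlines():
        # Drop header/footer lines that are drafting-file paths.
        if re.match(r"\s*[Ff]:\\", line):
            continue
        # Drop a leading line number (digits followed by spaces).
        line = re.sub(r"^\s*\d+\s{2,}", "", line)
        kept.append(line)
    return "\n".join(kept).strip()

print(clean_bill_text(RAW))
```

The cleaned output can then be pasted into Word’s compare feature, or diffed directly.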
Here are some examples of Senate pre-introduced legislation. Example 1, Example 2, Example 3, Example 4. You can use this Google search to find more: ‘S. ll “In the Senate of the United States” filetype:pdf’. Note that the S.L.C. in the top right corner means it was drafted by Senate Legislative Counsel, indicating it likely will follow standard formatting.
Here are some examples of House pre-introduced legislation. Example 1, Example 2, Example 3, Example 4. You can use this Google search to find more: ‘H. R. ll IN THE HOUSE OF REPRESENTATIVES “(Original Signature of Member) ” filetype:pdf’. In the House, nearly all legislation is drafted by House Legislative Counsel, so they all follow a pretty standard format.
CRS REPORT FRESHNESS RATINGS
The Congressional Research Service is a congressional think tank, and it issues reports on important issues of the day. Over time, CRS will update a report to reflect new facts or changing circumstances. Sometimes these changes are significant, but other times the update could be as minor as the addition of punctuation or removal of a citation. However, there’s no way for the reader to know whether the new report needs to be read closely or if there’s just been a cosmetic change.
CRS reports should have freshness ratings based on a comparison of the current text to the previous iteration. So, if the language is virtually identical except for the addition of a sentence, it would receive a low rating (e.g. 1% fresh), but if the report has been largely rewritten, it would receive a high rating (e.g. 80% fresh).
All CRS reports have a unique identifier on their front page as well as the date of issue. For example, a report could have the unique ID RL1234 and an issue date of May 1, 2012. If it is reissued, the unique ID stays the same, but it gets a new issue date of September 1, 2013. Alas, the reports are in PDF format, so it’s probably a non-trivial problem to show what text has changed. But using PDF-to-text, you can at least compare the output files to see whether there’s a trivial or significant difference.
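A crude version of this rating could be computed with difflib from Python’s standard library, as in this sketch (the report text is invented). It compares the two versions character by character and reports the percentage that changed; a real tool would want something smarter, but this is enough to separate cosmetic edits from rewrites.

```python
import difflib

# Invented stand-ins for two iterations of the same CRS report.
OLD = ("The Widget Safety Act would require annual inspections. "
       "CBO estimates the cost at $5 million over five years.")
NEW = ("The Widget Safety Act would require annual inspections. "
       "CBO now estimates the cost at $12 million over five years, "
       "reflecting new staffing assumptions.")

def freshness(old_text, new_text):
    """Rough percentage of the new report that differs from the old one."""
    similarity = difflib.SequenceMatcher(None, old_text, new_text).ratio()
    return round((1 - similarity) * 100)

print(f"{freshness(OLD, NEW)}% fresh")
```

An unchanged report scores 0% fresh; a heavily rewritten one scores high.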
So where can you find CRS reports? That’s another problem, but a large corpus is available at opencrs, which just happens to have an API. If you want to gather more, there are other aggregators, or you could use this Google search: ‘7-5700 “Congressional Research Service” filetype:pdf’.
I published this article in Slate earlier this week and have been floored by the response. I don’t mind when people disagree with me, but I was astonished at (1) how poorly people read and (2) how much their emotions rule their decision-making.
In the article, I argue that we should pay members of Congress more money so as to align their incentives with ours. This is fairly controversial. The responses, however, have largely fallen into these categories:
Except for the final two, these arguments merely vent rage. I wonder whether the authors even got past the first paragraph, or even the title of the article. Probably not.
This expression of populist rage and false economy is counterproductive to the goals of the writers. They want Congress to be more responsive, or at least more responsible. They don’t see how low pay is connected to poor performance.
Unfortunately, it’s this kind of rage that leads Congress to do stupid things like cutting its own pay and decimating its staff. You can’t cut off oxygen to the brain and expect everything to work like normal.
The end of the Roman republic was marked when rich populists used their vast wealth to buy the sentiment of their fellow citizens. I wonder if we’re heading towards our own Rubicon.
In honor of Constitution Day, here’s an interactive 360 degree view of the U.S. Capitol’s Rotunda that I took earlier this year.
Olympics vs Mars
Last night, we attended a Board of Architectural Review hearing on the suitability of Alexandria School Board’s proposed architectural plans [large PDF] for the rebooted Jefferson-Houston Ele-middle school. While we do not oppose the construction of a new school, we think it’s important to mitigate the adverse effects on our neighborhood that the current plans would create, particularly as the building will be much larger, will draw more traffic and create more noise, and will be relocated right on top of West Street and thereby affect the look, light, and feel of the neighborhood.
The Parker-Gray Board of Architectural Review (aka BAR) voted 4-1 to defer a final decision on architectural style until September 12, but gave (tentative?) approval for the building’s massing and size. Read this Alexandria News article for more details, and see Alexandria City Public Schools’ (ACPS’s) website here.
I am concerned that ACPS will not hold another community meeting prior to re-submitting its plans to BAR; this omission is made more stark by the ongoing meetings of the school-official-centered Steering Committee during this hiatus. Considering that the last community meeting in June was intended only to inform the public, and not to gather our feedback prior to the BAR submission, it seems that ACPS will hold these final negotiations with city staff close to the vest. While the public meetings haven’t been as responsive to big-picture concerns as I (and many others) would have hoped, the community still should be consulted before the school board makes a final submission to BAR.
The following is video of my (short) statement to BAR. Two members of BAR made much more eloquent statements than I did as they cast their votes, and it’s worth your while to watch them in the second video below. (The whole hearing can be watched here.)
Statements of Philip Moffat and Robert Duffy
My work requires me to advocate for legal and political information to be released by the government in easily reusable formats, such as those that can be digested by computers. While we all benefit from “machine readable” data, most of us rely on programmers to get the most out of this information.
This is only natural. Lawyers are conduits for the law, doctors are conduits for medicine, architects are conduits for buildings, and programmers are conduits for datasets.
While I’ve had some success in getting others to build tools for me, it’s time to become more savvy. Computer languages have changed a lot since I tried to learn C and C++ in the late 1990s. My colleagues have suggested Python as a flexible, intuitive, and reasonably friendly language, so I’m going to give it a shot.
Google’s Python learning module was unfriendly and difficult to follow, but I’m having success with this Hello World book, which is mercifully geared towards an intelligent 8-year-old. Most importantly, it comes with an installer that sets everything up — incredibly helpful when you’re starting from scratch.
My goals are pretty modest:
(1) Take draft legislation released as a PDF from the House and transform it into a clean document that can be compared to others. (This will require use of a PDF-to-text program to transform the document into a .txt file, regex to clean up the text and remove common formatting crapola, and MS Word to compare the document to different versions of the same legislation to identify changes.)
(2) Get a value for two documents (e.g. CRS Reports) to indicate what percentage of the document is new information and what percentage is old. In essence, this will be a way of assigning “freshness” ratings to CRS reports. (This will again require a PDF-to-text program, the ability to identify dates and report numbers, and to run a basic diff.)
Of course, there are cleverer things I’d like to do, but those are probably beyond what my time and abilities will allow. Regardless, I hope to gain better insight into how the technology side works, and there’s no better way than to get my hands dirty.
Brilliant way to find art in DC — and not just the usual suspects.