As part of BlueLightCamp ’14, a group of civil servants, hackers, and emergency service workers got together for a weekend of furious creation.

I decided to look at flooding data. The recent floods in the UK are a brutal reminder of the realities of climate change and our poor stewardship of the nation’s waterways.

The UK Government has a large collection of Flooding Data online – including some very detailed river-by-river data.

Initially, we thought it would be a great idea if every river in the country could Tweet, telling local residents how high its water levels were and how likely it was to flood.

Unfortunately, the data isn’t in brilliant shape. It’s hard to find a reference guide explaining what the data mean, where the measuring stations are, and which levels indicate likely flooding.

River Data

Instead, we settled on something simpler. There is a 3 Day Flooding Forecast XML file.

First, the good news! We were able to successfully parse the data and get a Twitter bot running which, once per day, publishes a three-day forecast.

The flood forecast for Wednesday is…pic.twitter.com/kmL7FS4vlN — bluelight (@3dayflood) May 26, 2014

Check out the @3dayflood @FloodForecast twitter feed

@3dayflood

The source code is available on GitHub. To run it, you’ll need three things.

  1. Python 2.7.
  2. Tweepy – needed to interact with Twitter.
  3. Twitter Developer Tokens – needed to post to Twitter.

Once you’ve installed Python and Tweepy, and added your OAuth keys to the script, you can run it with this command:

python 3dayflood.py
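
For readers who want a feel for the posting step, here is a minimal sketch of composing and sending a forecast tweet. It is not the code from the repository: the function names, the tweet wording, and the 140-character limit are assumptions made for illustration. The `api` argument can be any object with an `update_status` method, which is how a Tweepy 3.x `tweepy.API` object behaves.

```python
TWEET_LIMIT = 140  # assumption: Twitter's character limit at the time of the post


def build_forecast_tweet(day_name, risk, limit=TWEET_LIMIT):
    """Compose the tweet text, truncating it if it would exceed the limit."""
    # The wording is hypothetical; the real bot may phrase it differently.
    text = "The flood forecast for {0} is: {1}".format(day_name, risk)
    if len(text) > limit:
        text = text[:limit - 3] + "..."
    return text


def post_forecast(api, day_name, risk):
    """Send the forecast through `api` and return the text that was posted."""
    text = build_forecast_tweet(day_name, risk)
    api.update_status(status=text)
    return text


# With Tweepy 3.x, `api` could be built roughly like this
# (the four token names are placeholders for your own OAuth keys):
#   import tweepy
#   auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
#   auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
#   api = tweepy.API(auth)
#   post_forecast(api, "Wednesday", "low")
```

Keeping the text-building separate from the posting makes it possible to check the wording without touching Twitter at all.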

Ok, so now the bad news 🙂

The data wasn’t in a particularly great state. Let’s go through it step-by-step.

XML Data

  • XML – ok, just a minor gripe. XML is just as easy to work with as JSON, but it’s nice to be offered a choice.
  • No valid schema – if you’re going to use XML, you might as well do it properly!
  • Welsh Dates. The UK Government has a statutory obligation to publish in English and Welsh – I don’t have a problem with that. There is, however, no need to print dates in both languages. Dates should either be represented as ISO 8601 (2014-05-25T10:30:00+0100) or as a UNIX Timestamp (1401046863). That way, the programmer can easily determine the time and, if needed, format it in English, Welsh, French, Esperanto, and Klingon.
  • Ideally, each day should be in its own object – rather than split several ways.
  • The images… *sigh*… I was expecting that there would be a link to each image. Nope! It’s a Base64 encoded PNG. Not terribly hard to decode, true, but not the best way. They are fairly low resolution, which is a shame.
  • The summary stuff is fine. But the next lot of data are rather tangled.
  • The model here is “Risk → Day → Area → Region”. This seems somewhat illogical to me. Surely the user wants to see “Day → Region → Risk”? The area is fairly inconsequential – I don’t care if my county is flooding, just if I am. The areas seem fairly nebulous and don’t conform to any normal geospatial coding of which I’m aware.
  • Or, perhaps, the model should be “Region → Day → Risk” – that way, rather than searching through each Risk in order to find my local area, I can get a direct forecast for my specific area and ignore everything else.
  • Finally, there’s no coding on the regions – they should at least have a WOEID, Lat/Long pair, or similar.
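
Two of the gripes above are mechanical enough to sketch in code: turning a machine-readable date into whichever language is needed, and decoding a Base64 image. The sketch below is illustrative only and is written for Python 3 (the `%z` directive is not available to `strptime` in Python 2.7, which the bot itself targets). The function names and the hand-written English and Welsh lookup tables are assumptions, not part of the feed or the repository.

```python
import base64
from datetime import datetime, timezone

# Hand-written lookup tables for illustration; index 0 is Monday / January.
DAY_NAMES = {
    "en": ["Monday", "Tuesday", "Wednesday", "Thursday",
           "Friday", "Saturday", "Sunday"],
    "cy": ["Dydd Llun", "Dydd Mawrth", "Dydd Mercher", "Dydd Iau",
           "Dydd Gwener", "Dydd Sadwrn", "Dydd Sul"],
}
MONTH_NAMES = {
    "en": ["January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December"],
    "cy": ["Ionawr", "Chwefror", "Mawrth", "Ebrill", "Mai", "Mehefin",
           "Gorffennaf", "Awst", "Medi", "Hydref", "Tachwedd", "Rhagfyr"],
}

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first eight bytes of every PNG file


def parse_iso8601(text):
    """Parse a date such as 2014-05-25T10:30:00+0100 into an aware datetime."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S%z")


def from_unix_timestamp(seconds):
    """Turn a UNIX timestamp into an aware datetime in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def format_date(moment, language):
    """Render one datetime in the requested language ("en" or "cy")."""
    day = DAY_NAMES[language][moment.weekday()]
    month = MONTH_NAMES[language][moment.month - 1]
    return "{0} {1} {2} {3}".format(day, moment.day, month, moment.year)


def decode_png(encoded):
    """Decode a Base64 string and check that the result looks like a PNG."""
    data = base64.b64decode(encoded)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return data
```

With a single machine-readable date in the feed, `format_date(parse_iso8601("2014-05-25T10:30:00+0100"), "cy")` gives the Welsh rendering on demand, and adding another language is only a matter of adding another lookup table.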

Here’s a quick and dirty look at how I would have structured the data (in JSON).

{
  "date" : "2014-05-25T10:30:00+0100",
  "summary" : {
    "english" : "…",
    "cymru"   : "…."
  },
  "days" : [
    {
      "day"     : 1,
      "date"    : "2014-05-25",
      "image"   : "http://example.com/day1.png",
      "regions" : [
        {
          "name" : "Cambridgeshire",
          "risk" : "low",
          "id"   : "123456"
        },
        {},
        {}
      ]
    },
    {},
    {}
  ],
  "regions" : [
    {
      "name" : "Cambridgeshire",
      "id"   : "123456",
      "days" : [
        {
          "day"  : 1,
          "date" : "2014-05-25",
          "risk" : "low"
        },
        {},
        {}
      ]
    },
    {},
    {}
  ],
  "risks" : [
    {
      "risk" : "low",
      "summary" : {
        "english" : "…",
        "cymru"   : "…"
      }
    },
    {},
    {}
  ],
}
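
The “days” and “regions” sections in that structure hold the same information from two angles, so one can be derived from the other. As a rough sketch (the function names and the second day’s values are made up for illustration), this builds a Region → Day → Risk lookup from the Day → Region → Risk list. It keys the regions by their id rather than keeping them in a list, which is a design choice made here so that a single area can be fetched directly.

```python
def regions_from_days(forecast):
    """Build a Region -> Day -> Risk view, keyed by region id, from "days"."""
    regions = {}
    for day in forecast["days"]:
        for region in day["regions"]:
            entry = regions.setdefault(
                region["id"],
                {"name": region["name"], "id": region["id"], "days": []},
            )
            entry["days"].append(
                {"day": day["day"], "date": day["date"], "risk": region["risk"]}
            )
    return regions


def risk_for(regions, region_id, day_number):
    """Return the risk for one region on one day, or None if it is missing."""
    for day in regions[region_id]["days"]:
        if day["day"] == day_number:
            return day["risk"]
    return None


# Sample data in the shape proposed above. Day 1 reuses the values from the
# example; day 2 and its "medium" risk are hypothetical.
SAMPLE_FORECAST = {
    "days": [
        {"day": 1, "date": "2014-05-25", "regions": [
            {"name": "Cambridgeshire", "risk": "low", "id": "123456"}]},
        {"day": 2, "date": "2014-05-26", "regions": [
            {"name": "Cambridgeshire", "risk": "medium", "id": "123456"}]},
    ]
}

SAMPLE_REGIONS = regions_from_days(SAMPLE_FORECAST)
```

A call such as `risk_for(SAMPLE_REGIONS, "123456", 1)` then answers “will my area flood on day 1?” without walking through every risk level first.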

You can get the code from GitHub. Enjoy!

Posted on behalf of Terence Eden


Putting UK Flooding Alerts Onto Twitter #UKBLC14