In a section of this site called the “PizzyLabs,” you’ll find a link to a service I created called “SKYWARN Storm Spotter Status“.
A little over a year ago, I wrote about my creation of a service to check Hazardous Weather Outlooks (HWOs) posted to the National Weather Service (NWS) website. Everything worked fine for about a year – until March 31st, 2015.
Oops…they noticed
Alerts suddenly stopped: no warning, just silence. After some investigation and debugging, it appeared that my domain had been blacklisted from scraping HWOs off of the NWS website.
When I developed the service, I looked at several APIs and feeds (RSS, XML, ATOM) to get the information I needed, but none of them covered HWOs, and none of them included the Spotter Information Statement. I searched for months without any luck before deciding to scrape their website; I figured they wouldn’t mind – being a government website with lots of visitors – and probably wouldn’t even notice.
I guess they did.
What do I do now?
For months, I dug through different parts of the www.weather.gov website, its documentation, Google searches, and third-party weather APIs. It wasn’t until June that I finally stumbled upon the AERIS weather API.
I used this new API documentation in conjunction with some documents found during my research:
- NWS County-Public Forecast Zones Correlation file [Website w/ download link]
- I used the newer link on that page to create the locations database, allowing me to offer the county/area name, with state breakdown, and an association to Weather Zone and FIPS code (a rough schema sketch follows this list).
- The Zone+FIPS row is given a unique ID in the database, since neither of those values is unique on its own.
- FIPS to County-code [PDF]
- This did not make it into the database, but it was valuable in determining what information was available to make informed decisions about API queries.
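To make the shape of that data concrete, here is a minimal sketch of the locations table. The table and column names are my own illustration, not the service’s actual schema, and the sample row comes straight from the example response later in this post:

```python
import sqlite3

# Illustrative schema only -- table and column names are placeholders.
conn = sqlite3.connect("locations.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS locations (
        location_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- unique ID per Zone+FIPS row
        county      TEXT NOT NULL,                      -- county/area name
        state       TEXT NOT NULL,                      -- two-letter state
        wx_zone     TEXT NOT NULL,                      -- public forecast zone, e.g. "TXZ012"
        fips        TEXT NOT NULL                       -- county FIPS code, e.g. "48375"
    )
""")

# Neither the zone nor the FIPS code is unique on its own (a zone can span
# several counties and vice versa), so each Zone+FIPS pair gets a surrogate key.
conn.execute(
    "INSERT INTO locations (county, state, wx_zone, fips) VALUES (?, ?, ?, ?)",
    ("Potter", "TX", "TXZ012", "48375"),
)
conn.commit()
```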
I have fully recoded the Storm Spotter Status service to make use of the AERIS weather API “advisory” endpoint, with a filter on “outlooks.” Currently, it is set up to use a Developer API key, as that is free and the service is used by only a few people. However, there are throttling limits on a developer account: only 10 requests per minute, and only 750,000 requests per day.
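To give a feel for what a single request looks like, here is a minimal sketch in Python. The host, path, and parameter names (client_id/client_secret credentials, a filter, a limit) follow the Aeris documentation as I read it, so treat the exact spellings as assumptions to verify against the current docs rather than the service’s actual code:

```python
import requests

# Endpoint path and parameter names assumed from the Aeris API docs -- verify before use.
AERIS_BASE = "https://api.aerisapi.com/advisories"

def fetch_hwo(wx_zone, client_id, client_secret):
    """Fetch the Hazardous Weather Outlook advisory for one forecast zone (e.g. 'TXZ012')."""
    resp = requests.get(
        f"{AERIS_BASE}/{wx_zone}",
        params={
            "client_id": client_id,
            "client_secret": client_secret,
            "filter": "outlooks",  # restrict results to HWO/outlook-type advisories
            "limit": 1,            # one dataset per response (see "Caching Results" below)
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```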
That may sound like a lot, but if you do the math for a user base as small as 15 people, you get something like this:
- 1 request per user per hour x 15 users = 15 requests per hour
- 15 requests per hour x 24 hours = 360 requests per day
If you allow people to set multiple locations, as version 1 of the service did, that 360 jumps up pretty quickly (those same 15 users watching 3 locations each would already be 1,080 requests per day). So a couple of things were put in place with version 2 to help keep things under control…and free.
Keeping it free
Limiting locations
The main way to keep things on the free side of the service is to limit every user to only 1 location. Previously, it didn’t matter how many locations you wanted to watch, because there was no cost to gathering the data. Now there is a potential cost.
Limiting everyone to a single location allows for a maximum of 31,250 users: 750,000 requests per day ÷ 24 hours is 31,250 requests per hour, or one hourly check per user. There are fewer than 4,400 distinct locations in the database.
Caching Results
The second thing I had to come up with was how to cache results and make sure I didn’t waste another request to get data I already retrieved. This was the source of many evenings and iterations of code to decide what “supported place” would return the most relevant and useful information.
Originally I did everything by FIPS code because a request would return something like this:
{ "details": { "type": "HWO", "name": "HAZARDOUS WEATHER OUTLOOK", "loc": "TXZ012", "body": "THIS HAZARDOUS WEATHER OUTLOOK IS FOR THE TEXAS AND OKLAHOMA\nPANHANDLES.\n\n.DAY ONE...TONIGHT.\n\nISOLATED THUNDERSTORMS ARE POSSIBLE THROUGH TONIGHT. SEVERE WEATHER\nIS NOT EXPECTED...BUT A FEW STRONGER STORMS COULD PRODUCE WIND GUSTS\nTO AROUND 50 MPH AND SMALL HAIL. \n\n.DAYS TWO THROUGH SEVEN...WEDNESDAY THROUGH MONDAY.\n\nISOLATED TO SCATTERED THUNDERSTORMS ARE POSSIBLE WEDNESDAY THROUGH\nTHURSDAY NIGHT. THUNDERSTORM CHANCES WILL INCREASE FRIDAY INTO THE\nWEEKEND. SEVERE THUNDERSTORMS ARE POSSIBLE ON FRIDAY AND THE\nPOTENTIAL FOR HEAVY RAIN INCREASES FRIDAY THROUGH THE\nWEEKEND...WHICH COULD LEAD TO SOME FLOODING OR FLASH FLOODING.\n\n.SPOTTER INFORMATION STATEMENT...\n\nSPOTTER ACTIVATION IS NOT ANTICIPATED AT THIS TIME.", "bodyFull": "FLUS44 KAMA 091926\nHWOAMA\n\nHAZARDOUS WEATHER OUTLOOK\nNATIONAL WEATHER SERVICE AMARILLO TX\n226 PM CDT TUE JUN 9 2015\n\n\n\nOKZ001>003-TXZ001>020-101100-\nCIMARRON-TEXAS-BEAVER-DALLAM-SHERMAN-HANSFORD-OCHILTREE-LIPSCOMB-\nHARTLEY-MOORE-HUTCHINSON-ROBERTS-HEMPHILL-OLDHAM-POTTER-CARSON-\nGRAY-WHEELER-DEAF SMITH-RANDALL-ARMSTRONG-DONLEY-COLLINGSWORTH-\n226 PM CDT TUE JUN 9 2015\n\n\n\nTHIS HAZARDOUS WEATHER OUTLOOK IS FOR THE TEXAS AND OKLAHOMA\nPANHANDLES.\n\n.DAY ONE...TONIGHT.\n\nISOLATED THUNDERSTORMS ARE POSSIBLE THROUGH TONIGHT. SEVERE WEATHER\nIS NOT EXPECTED...BUT A FEW STRONGER STORMS COULD PRODUCE WIND GUSTS\nTO AROUND 50 MPH AND SMALL HAIL. \n\n.DAYS TWO THROUGH SEVEN...WEDNESDAY THROUGH MONDAY.\n\nISOLATED TO SCATTERED THUNDERSTORMS ARE POSSIBLE WEDNESDAY THROUGH\nTHURSDAY NIGHT. THUNDERSTORM CHANCES WILL INCREASE FRIDAY INTO THE\nWEEKEND. SEVERE THUNDERSTORMS ARE POSSIBLE ON FRIDAY AND THE\nPOTENTIAL FOR HEAVY RAIN INCREASES FRIDAY THROUGH THE\nWEEKEND...WHICH COULD LEAD TO SOME FLOODING OR FLASH FLOODING.\n\n.SPOTTER INFORMATION STATEMENT...\n\nSPOTTER ACTIVATION IS NOT ANTICIPATED AT THIS TIME." 
}, "timestamps": { "issued": 1433877960, "issuedISO": "2015-06-09T14:26:00-05:00", "begins": 1433877960, "beginsISO": "2015-06-09T14:26:00-05:00", "expires": 1433934000, "expiresISO": "2015-06-10T06:00:00-05:00", "added": 1433878037, "addedISO": "2015-06-09T14:27:17-05:00" }, "poly": "", "includes": { "counties": [ "OKC007", "OKC025", "OKC139", "TXC011", "TXC065", "TXC087", "TXC111", "TXC117", "TXC129", "TXC179", "TXC195", "TXC205", "TXC211", "TXC233", "TXC295", "TXC341", "TXC357", "TXC359", "TXC375", "TXC381", "TXC393", "TXC421", "TXC483" ], "fips": [ "40007", "40025", "40139", "48011", "48065", "48087", "48111", "48117", "48129", "48179", "48195", "48205", "48211", "48233", "48295", "48341", "48357", "48359", "48375", "48381", "48393", "48421", "48483" ], "wxzones": [ "OKZ001", "OKZ002", "OKZ003", "TXZ001", "TXZ002", "TXZ003", "TXZ004", "TXZ005", "TXZ006", "TXZ007", "TXZ008", "TXZ009", "TXZ010", "TXZ011", "TXZ012", "TXZ013", "TXZ014", "TXZ015", "TXZ016", "TXZ017", "TXZ018", "TXZ019", "TXZ020" ], "zipcodes": [ 73844, 73901, 73931, 73932, 73933, 73937, 73938, 73939, 73942, 73944, 73945, 73946, 73947, 73949, 73950, 73951, 73960, 79001, 79002, 79003, 79005, 79007, 79008, 79010, 79011, 79012, 79013, 79014, 79015, 79016, 79018, 79019, 79022, 79024, 79025, 79029, 79033, 79034, 79036, 79039, 79040, 79044, 79045, 79046, 79051, 79054, 79056, 79057, 79058, 79059, 79061, 79062, 79065, 79066, 79068, 79070, 79077, 79078, 79079, 79080, 79081, 79083, 79084, 79086, 79087, 79091, 79092, 79093, 79094, 79095, 79096, 79097, 79098, 79101, 79102, 79103, 79104, 79105, 79106, 79107, 79108, 79109, 79110, 79111, 79114, 79116, 79117, 79118, 79119, 79120, 79121, 79124, 79159, 79166, 79168, 79172, 79174, 79178, 79185, 79189, 79226, 79230, 79237, 79240, 79251 ] }, "place": { "name": "potter", "state": "tx", "country": "us" }, "profile": { "tz": "America/Chicago" } }
There is an array of FIPS codes included in the response, as well as counties, zip codes, and weather zones. But the response also included 9 other datasets with the same content, except for the details.loc value and the place information. That made associating a report with a given FIPS code nearly impossible without several extra queries built from those two fields…while the rest of the data was simply duplicated across the other 9 datasets.
So I decided on a rewrite that keys off the “wxzones” and “details.loc” values, along with a limit of 1 dataset in the response.
Now, every State Zone (or “wxzone”) included in a response gets a cached copy of the report in the database. This means a single request covers everyone near the first person to request that area. With this in mind, the request throttling had to be updated to prevent duplicate requests.
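As a rough sketch (again with placeholder names, including the hwo_cache table), the cache write for a single response looks something like this, using the fields visible in the example response above:

```python
# Sketch of the cache write: every State Zone listed in the response gets the
# same report, so a later user in any of those zones reuses this one request.
def cache_advisory(db, advisory):
    issued = advisory["timestamps"]["issued"]
    expires = advisory["timestamps"]["expires"]
    body = advisory["details"]["body"]
    for zone in advisory["includes"]["wxzones"]:  # e.g. "TXZ012", "OKZ001", ...
        db.execute(
            "REPLACE INTO hwo_cache (wx_zone, issued, expires, body) VALUES (?, ?, ?, ?)",
            (zone, issued, expires, body),
        )
    db.commit()
```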
Throttling Requests
With a single request returning information for multiple locations, and a limit of 10 requests every 60 seconds in place, I had to build a loop that would keep track of three things in a somewhat unusual way:
- What State Zones have been seen
- How many requests have been made
- Time
The algorithm works like this (a rough code sketch follows the list):
- Get all the unique locations requested by users (and sort them into various arrays with varying keys…user to location, location to user, location to data, location to preformatted data)
- Start the loop – while less than 15 minutes have elapsed total, or until we’re done
- If the counter is less than 10 and the timer is less than 60 seconds, continue; else skip to the step marked ** below
- Get the next item (State Zone) in the location list
- If there are no more items, break out of the loop
- Check if this location has been added to the “already checked” list, and continue if not
- Request the data from the API
- Process the data for storage and alert (create database row and hash spotter information statement for comparison)
- Add the State Zone to the “already checked” list
- Iterate over the State Zones in the response
- If there are any users requesting an included State Zone
- Compare the statement’s issued time against the user’s last alert time, and the statement hash against the last alert hash (don’t double-alert on a matching time or hash)
- If the times differ, add the user to an “Alert List”
- Write the advisory data to the database (prior to alerting users)
- If there are any users in the “Alert List”
- Pull the corresponding Location Row for the user’s location selection and cache it (by db Location ID)
- If we can find the Location Row, and the user can receive DMs (Twitter settings)
- Construct the message from the Location Row data (county, state, NWS Office [cwa]) and Response data (spotter statement, URL construction)
- Attempt to alert the user
- Record any errors
- ** If we skipped right to here, don’t increment the Request Counter, and continue the loop
- Else: Check the time from the start of the loop
- If time elapsed is < 60 seconds, increment the counter (keep going until 10)
- If time elapsed is > 60, reset the timer and reset the counter to 0 (time ran out, reset it all)
- Handle any Developer Errors
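Stripped of the alerting details, that loop boils down to something like the sketch below. Names are placeholders: fetch stands in for the fetch_hwo helper sketched earlier, and process stands in for the storage, hashing, and alerting steps. For readability, this version sleeps out the rest of the 60-second window instead of spinning through the “dead” iterations described above:

```python
import time

MAX_PER_MINUTE = 10    # Developer-tier limit: 10 requests per 60 seconds
LOOP_BUDGET = 15 * 60  # give the whole polling run at most 15 minutes

def poll_zones(zones, fetch, process):
    """Request each State Zone at most once, never exceeding 10 API calls per minute."""
    already_checked = set()
    loop_start = time.monotonic()
    window_start = time.monotonic()
    counter = 0

    for zone in zones:
        if time.monotonic() - loop_start > LOOP_BUDGET:
            break                          # overall time budget exhausted

        if zone in already_checked:
            continue                       # covered by an earlier response; no request needed

        # Throttle: once 10 requests have gone out, wait for the window to expire.
        if counter >= MAX_PER_MINUTE:
            elapsed = time.monotonic() - window_start
            if elapsed < 60:
                time.sleep(60 - elapsed)
            window_start = time.monotonic()
            counter = 0

        data = fetch(zone)                 # one API request against the budget
        counter += 1

        # A single response covers many State Zones; mark them all as checked.
        for covered in data.get("includes", {}).get("wxzones", []):
            already_checked.add(covered)
        already_checked.add(zone)

        # Store the report, hash the Spotter Information Statement, and alert users.
        process(data)
```

Because each response marks a couple of dozen zones as “already checked,” the checked list grows far faster than the request counter, which is what keeps the run short.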
It took a while to put all that together, and it was a departure from how I normally program. My normal work is intended to be efficient and streamlined. This required me to be willing to do things I wouldn’t normally do in a loop so that I could throttle processing time and avoid “dead loops” where nothing gets done because the 10 allowed requests were used up almost immediately.
Conclusions and Future Updates
After writing all of this up, some thoughts occurred to me while looking at the data. Chief among them is that there are only 4,389 distinct locations. The limit of one location per user and 750,000 requests in a 24-hour period comes out to 31,250 requests every hour before that 750k limit is reached. That means there are enough requests available to make a call for every distinct location every hour.
However, there is still the limit of 10 requests per minute. This means 439 batches of 10 requests must be made to cover every location, which would take a little over 7 hours due to the throttling. When caching is taken into account, though, a single request can cover around 23 State Zones (as in the example above), which brings the total time to request everything down to around 19 minutes.
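A quick sanity check of those numbers, using the counts from above (the zones-per-response figure comes from the example response and will vary by office):

```python
locations = 4389          # distinct locations in the database
per_minute = 10           # request throttle
zones_per_response = 23   # State Zones covered by the example response above

minutes_uncached = locations / per_minute          # ~439 one-minute batches
hours_uncached = minutes_uncached / 60             # ~7.3 hours
requests_cached = locations / zones_per_response   # ~191 requests
minutes_cached = requests_cached / per_minute      # ~19 minutes

print(round(hours_uncached, 1), round(minutes_cached))  # 7.3 19
```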
With this understanding of just how big (or small) the dataset is and what it takes to fetch it in its entirety, everything seems doable while maintaining free status. However, in an effort to stay ahead of the throttling and possible pricing plan changes – and in case the service becomes popular – I will be researching a tiered pricing plan to allow multiple locations.
It would take a while to design how free and paid tiers would work in parallel, but it’s good to know that the service can operate on the free tier as-is for now.