The data collection and data retrieval side of this application has several parts. Firstly, we obtained geo-referenced data on 1.88 million U.S. wildfires covering a 24-year period (1992-2015). These records were collected by the USDA Forest Service.
https://www.kaggle.com/rtatman/188-million-us-wildfires Short, Karen C. 2017. Spatial wildfire occurrence data for the United States, 1992-2015 [FPA_FOD_20170508]. 4th Edition. Fort Collins, CO: Forest Service Research Data Archive. https://doi.org/10.2737/RDS-2013-0009.4
The main information we used from this dataset is fire size, fire cause, fire date and fire location.
Obtaining historical weather data for all of these locations was beyond our budget. We also needed weather data for a large number of non-wildfire dates and locations, so that we could analyse the differences between situations in which wildfires occur and those in which they don't.
We used historical frequency analysis to identify dates and locations close to where a wildfire occurred, but where no wildfire took place. This lets us compare the small differences between conditions that led to a wildfire and conditions that did not.
We randomly selected 100,000 wildfires from the dataset to create a new dataset. For each location in the new dataset, we used the results of the historical frequency analysis to choose 3 dates on which a wildfire did not occur. This gave us 400,000 datapoints in the new dataset.
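The sampling step above can be sketched roughly as follows. This is only an illustration, not the project's actual script: the function name `build_training_rows`, the record keys and the `candidate_dates` mapping (standing in for the output of the historical frequency analysis) are all hypothetical.

```python
import random

def build_training_rows(fires, candidate_dates, n_fires, negatives_per_fire=3, seed=0):
    """Sample fires and pair each one with dates on which no fire occurred.

    fires: list of dicts with hypothetical keys "location" and "date".
    candidate_dates: dict mapping a location to nearby no-fire dates
        (assumed to come from the historical frequency analysis).
    """
    rng = random.Random(seed)
    rows = []
    for fire in rng.sample(fires, n_fires):
        loc = fire["location"]
        # One positive record for the wildfire itself.
        rows.append({"location": loc, "date": fire["date"], "wildfire": 1})
        # Several negative records for the same location.
        for date in rng.sample(candidate_dates[loc], negatives_per_fire):
            rows.append({"location": loc, "date": date, "wildfire": 0})
    return rows
```

With `n_fires=100000` and the default of 3 negatives per fire, this produces 400,000 rows, matching the figure above.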
Next, we used the Dark Sky API to obtain historical weather data for each of these 400,000 points. This process raised several challenges. The wildfire dataset provides dates in Julian date format, whereas the Dark Sky API requires UNIX timestamps.
The following methodology was used to convert timescales:
import calendar
from astropy.time import Time

# Interpret the Julian date as UTC, then convert it to a UNIX timestamp.
element = Time(element, format='jd', scale='utc')
element = element.datetime
element = calendar.timegm(element.timetuple())
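As a cross-check on the conversion, roughly the same result can be computed with plain arithmetic, because the Unix epoch (1970-01-01 00:00:00 UTC) corresponds to Julian date 2440587.5. This is an alternative sketch rather than what the project used; `julian_to_unix` is a hypothetical helper, and it ignores leap-second subtleties.

```python
UNIX_EPOCH_JD = 2440587.5  # Julian date of 1970-01-01 00:00:00 UTC
SECONDS_PER_DAY = 86400

def julian_to_unix(jd):
    """Convert a Julian date (in days) to whole seconds since the Unix epoch."""
    return int(round((jd - UNIX_EPOCH_JD) * SECONDS_PER_DAY))

# Julian date 2451545.0 is 2000-01-01 12:00:00 UTC.
print(julian_to_unix(2451545.0))  # 946728000
```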
The types of weather data we retrieved were: Dew Point, Humidity, Day Time Temperature (average Temperature between sunrise and sunset), 24-hour average Temperature, Day Time Wind Gust (average Wind Gust between sunrise and sunset), 24-hour average Wind Gust, average Wind Speed, average Wind Bearing, Air Pressure, Precipitation Intensity, Precipitation Type and Cloud Cover.
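The daytime and 24-hour averages listed above could be derived from hourly observations along these lines. This is a sketch under assumptions: `summarise_day` is a hypothetical helper, and the input is assumed to be a list of hourly records that each carry a UNIX `time` plus the measured field (for example `temperature` or `windGust`), with sunrise and sunset supplied as UNIX timestamps.

```python
def mean(values):
    """Arithmetic mean, or None when there are no values."""
    values = list(values)
    return sum(values) / len(values) if values else None

def summarise_day(hourly, sunrise, sunset, field):
    """Return (daytime average, 24-hour average) of `field`.

    hourly: list of dicts, each with a UNIX "time" and the requested field.
    sunrise, sunset: UNIX timestamps bounding the daytime window.
    """
    # Skip records where the field is missing, since historical data can be incomplete.
    all_values = [h[field] for h in hourly if field in h]
    day_values = [h[field] for h in hourly
                  if field in h and sunrise <= h["time"] <= sunset]
    return mean(day_values), mean(all_values)
```

Returning `None` for an empty window keeps missing data distinguishable from a genuine reading of zero.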
We also attempted to retrieve UV Index information; however, the data was incomplete for many historical dates. We also tried using Google APIs to obtain elevation information for all of these points, and for real-time points, but judged the financial cost to be too high at the time.
Through this process we produced a dataset of 400,000 records: 4 records for each location, 1 from a date when a wildfire occurred and 3 from dates when one did not. Every record also had full weather information for its date and location. This dataset was then used to train our model.
For real-time data retrieval, we first needed accurate forest data. We judged that we could not use hard-coded locations or obtain locations from OpenStreetMap, as the density and accuracy of its locations vary too much between countries and even between cities.
We currently obtain forest locations from a 2014 world tree cover dataset. https://earthenginepartners.appspot.com/science-2013-global-forest/download_v1.2.html Data is sourced from: Hansen/UMD/Google/USGS/NASA
We consider every location where tree cover is over 50% to be a forest. The resulting locations were then filtered by granule to ensure we didn't have more locations than we could afford to purchase real-time weather data for.
The real-time data retrieval scripts read locations from the dataset produced by this tree cover analysis and fetch weather data for the following 6 days for each location.
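A minimal sketch of this filtering step is shown below. It reflects one possible reading of "filtered by granules": granules are assumed to be 10x10 degree tiles, and only points inside a chosen set of tiles are kept. The function name `select_forest_points` and the input layout are hypothetical.

```python
import math

def select_forest_points(points, allowed_granules, min_cover=50):
    """Keep points with tree cover above min_cover inside allowed granules.

    points: iterable of (lat, lon, cover_percent) tuples.
    allowed_granules: set of (lat_index, lon_index) pairs, where each index
        is floor(coordinate / 10), i.e. a 10x10 degree tile (assumption).
    """
    selected = []
    for lat, lon, cover in points:
        granule = (math.floor(lat / 10), math.floor(lon / 10))
        # Strictly greater than the threshold, matching "over 50%".
        if cover > min_cover and granule in allowed_granules:
            selected.append((lat, lon))
    return selected
```

Shrinking `allowed_granules` is then a simple way to cap the number of locations that need paid weather lookups.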
The real-time weather data retrieved includes: Dew Point, Humidity, Day Time Temperature (average Temperature between sunrise and sunset), 24-hour average Temperature, Day Time Wind Gust (average Wind Gust between sunrise and sunset), 24-hour average Wind Gust, average Wind Speed, average Wind Bearing, Air Pressure, Precipitation Intensity, Precipitation Type and Cloud Cover.
A Python script runs in the background on our server and reruns the real-time data retrieval and wildfire prediction processes daily, so that the user always sees the most up-to-date information.
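A background job of this kind might look roughly like the sketch below. It is not the project's script: the function names, the 02:00 run time and the idea of passing the retrieval and prediction steps in as callables are all assumptions made for illustration.

```python
import logging
import time
from datetime import datetime, timedelta

logger = logging.getLogger("daily_refresh")

def seconds_until(hour, now):
    """Seconds from `now` until the next occurrence of `hour`:00."""
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)
    return (target - now).total_seconds()

def run_once(steps):
    """Run each step in order; log the first failure and stop."""
    for step in steps:
        try:
            step()
        except Exception:
            logger.exception("Step %s failed", getattr(step, "__name__", step))
            return False
    return True

def run_daily(steps, hour=2):
    """Loop forever, running the steps once per day at the given hour."""
    while True:
        time.sleep(seconds_until(hour, datetime.now()))
        run_once(steps)
```

Calling `run_daily([retrieve_weather, predict_wildfires])`, with those two hypothetical functions standing in for the real scripts, would refresh the data once a day and log any failure instead of stopping the loop.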
The retrieved weather data for each day is stored in a separate table in our centralised database. Our prediction scripts retrieve their input data from these tables.
Whenever a user accesses the application, the Django framework's 'model' mechanism is used to retrieve wildfire predictions, fire station information and wind information all in one go, rather than through multiple for loops, which would have increased lag for the user.
There is also full logging and error catching throughout the data processing flow.