



Project Specification


We first learnt about the Europa Challenge towards the end of May 2017. The goal of creating an application that uses NASA's WebWorldWind to help save the planet was an appealing target. With our strong collective experience of software engineering, we saw potential in giving the challenge a go.


Initial brainstorming revolved around predicting natural disasters, such as hurricanes, tornadoes, wildfires and flooding. Wildfires eventually came out on top, following several fierce debates within the team. We then spent a week researching wildfires, how they can be predicted, and the types of data we would need to do this. Eventually, we came up with a project specification to show to our mentor Lilian.






Retrieving training data for the SVM



The Beginnings

For our support vector machine to give us a reliable model, the most important thing was to feed it a large amount of accurate and relevant data in the correct format.

We spent many hours looking for ways to obtain the necessary weather data, but we couldn't find a single free source that could give us sufficient data in the correct format. Every API we could have used required funding, and we had none.

Finally, we decided that we had no choice but to use the BeautifulSoup library in Python to scrape weather data from the web. We used this to obtain 6.5 years of weather data for 19 locations in the United States, covering temperature, humidity, wind speed and rain. The script was optimised so that a single request returned 1 month of weather data. The source provided 6-hour averages (00:00 to 06:00, 06:00 to 12:00, 12:00 to 18:00 and 18:00 to 00:00), giving us 4 data points for each day. Of course, the source occasionally had no data for some periods, and we had to take this into account.

In [ ]:
# tempBool is True when the current element is a temperature.
if tempBool:
    # Convert the temperature from string to float.
    word = float(word)

    # Append it to a temporary array, which holds at most
    # 4 temperatures at any given time.
    tempotemp.append(word)
    tempcount += 1

    # A value of 0 marks a missing data point, so count the non-zero
    # values; this count is the divisor when calculating averages.
    if word != 0.0:
        tempnonz += 1

    tempBool = False

# There are 4 temperature data points per day; tempcount tracks how many
# have been added to the temporary array. Once there are 4, they and the
# daily average calculated from them are added to the final temperature array.
if tempcount == 4:
    if sum(tempotemp) == 0:
        tempotemp.append("no data")
    else:
        # Calculate the average and append it to the end of the temporary array.
        tempotemp.append(sum(tempotemp) / float(tempnonz))

        # lastWeekTemp holds the last 14 daily averages, used to provide
        # the past 14-day average for every day.
        if len(lastWeekTemp) == 14:
            lastWeekTemp.pop(0)
        # The last element in the temporary array is the daily average.
        lastWeekTemp.append(tempotemp[4])

    tempAvg.append(sum(lastWeekTemp) / float(len(lastWeekTemp)))
    # Append the 4 data points + average to the final temperature array.
    temperature.append(tempotemp)
    # Reset everything, ready for the next day's temperature values.
    tempnonz = 0
    tempcount = 0
    tempotemp = []

This system was applied to all four data types, giving us 4 large arrays, each containing multiple nested arrays. Each nested array contained 1 day of data. There were also 4 large arrays containing the past 14-day averages for each data type for each day.
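
The averaging scheme described above can be sketched more compactly. This is a minimal illustration rather than the script we ran: the function names are hypothetical, and it assumes, as the scraper does, that a reading of 0 marks a missing value. A deque with maxlen=14 discards the oldest daily average automatically.

```python
from collections import deque


def daily_average(readings):
    """Average the non-zero readings; return None when every reading is missing.

    Assumption: a value of 0.0 marks a missing reading.
    """
    present = [r for r in readings if r != 0.0]
    if not present:
        return None
    return sum(present) / len(present)


def rolling_averages(days, window=14):
    """Yield, for each day, the mean of the last `window` daily averages."""
    recent = deque(maxlen=window)  # the oldest entry falls off automatically
    for readings in days:
        avg = daily_average(readings)
        if avg is not None:
            recent.append(avg)
        yield sum(recent) / len(recent) if recent else None


# Three days of four 6-hour readings; the second day has no data at all.
days = [
    [10.0, 12.0, 0.0, 14.0],
    [0.0, 0.0, 0.0, 0.0],
    [20.0, 20.0, 20.0, 20.0],
]
rolling = list(rolling_averages(days))
print(rolling)
```

A day with no data simply leaves the rolling window unchanged, so the average never divides by the number of missing values.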

Wildfire data for every single day was obtained through the USDA Forest Service. This included whether a fire occurred and what the cause of the fire was. We adapted this to a useful format and filtered out all fires that were caused by arson, railroads and even nearby children.

In [ ]:
import csv

# Read the csv containing 6.5 years of wildfire records.
with open('6_years.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        # Ignore empty rows.
        if len(row) != 0:
            wildfires.append(row)

for element in wildfires:
    # Each record is tab-separated, so rejoin the row and split it on tabs.
    x = ''.join(element).split('\t')
    # Remove wildfires caused by arson, nearby children or railroads.
    # The cause codes are read from the file as strings, so compare against strings.
    if x[4] not in ('5', '6', '7'):
        data.append(x)

# Drop the header row.
data = data[1:]

for element in data:
    # Obtain the date and location of each wildfire,
    # converting the date from YYYY-MM-DD to MM/DD/YYYY.
    day = str(element[17]).split()[0].split("-")
    day = day[1] + "/" + day[2] + "/" + day[0]

    # An array exists for each location we are obtaining data for. If a fire
    # occurred in any of those locations, its date is added to that array.
    fireDict[locationUrl[forests[element[0]]]].append(day)
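
The filtering and date conversion can also be written with the csv module's tab delimiter and datetime. This is only a sketch: the function name, the column positions and the sample rows are hypothetical, and it assumes the date column looks like "2015-07-04 00:00:00". The excluded cause codes are the ones used in the script.

```python
import csv
import io
from datetime import datetime

# Cause codes dropped by the script (compared as strings, as read from the file).
EXCLUDED_CAUSES = {"5", "6", "7"}


def natural_fire_dates(tsv_text, cause_col, date_col):
    """Return MM/DD/YYYY dates of fires whose cause code is not excluded."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader)  # skip the header row
    dates = []
    for row in reader:
        # Skip empty rows and fires with an excluded cause.
        if not row or row[cause_col] in EXCLUDED_CAUSES:
            continue
        parsed = datetime.strptime(row[date_col].split()[0], "%Y-%m-%d")
        dates.append(parsed.strftime("%m/%d/%Y"))
    return dates


# Hypothetical sample: forest, cause code, discovery date.
sample = (
    "forest\tcause\tdiscovered\n"
    "A\t1\t2015-07-04 00:00:00\n"
    "A\t7\t2015-07-05 00:00:00\n"
    "\n"
    "B\t9\t2016-01-13 00:00:00\n"
)
fire_dates = natural_fire_dates(sample, cause_col=1, date_col=2)
print(fire_dates)
```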

We merged all the data we had, giving us 1 massive array. This contained multiple nested arrays, each representing 1 day. Each nested array contained temperature, humidity, wind and rain data, plus daily averages and 14-day averages. Each also contained a 1 or 0 representing whether a fire occurred at that location on that day.

In [ ]:
# For each date we have weather data for
for each in currentDate:
    currentBool = 0
    # If there was a fire on that day, append 1 to fireList, else append 0.
    for day in fireDict[locationList]:
        if each == day:
            currentBool = 1
    fireList.append(currentBool)

# Merge the arrays for all data types to create 1 massive array of arrays.
# Each array within it represents 1 day of data. This works since there's
# exactly the same amount of data for each data type. We also append a flag
# indicating whether there was a fire on that day. This loop runs after
# fireList has been filled, because it reads fireList[each] for every day.
for each in range(len(temperature)):
    finalArray.append(temperature[each] + humidity[each] + wind[each])
    finalArray[Counter].append(currentDate[each])
    finalArray[Counter].append(weather[each])
    finalArray[Counter].append(rainAvg[each])
    finalArray[Counter].append(tempAvg[each])
    finalArray[Counter].append(humAvg[each])
    finalArray[Counter].append(windAvg[each])
    # Append the name of the location the data belongs to.
    finalArray[Counter].append(list(locationUrl.keys())[list(locationUrl.values()).index(locationList)])
    finalArray[Counter].append(fireList[each])
    Counter += 1
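
Labelling each day with a fire flag amounts to a membership test. The sketch below is an assumption-laden simplification (the function name and sample dates are made up): it puts the fire dates in a set so each lookup is constant time, rather than scanning the whole list of fire dates for every day.

```python
def label_fire_days(dates, fire_dates):
    """Return 1 for each date on which a fire occurred, otherwise 0."""
    fire_lookup = set(fire_dates)  # set membership avoids a nested loop
    return [1 if date in fire_lookup else 0 for date in dates]


# Hypothetical dates in the MM/DD/YYYY format used by the scraper.
weather_dates = ["01/12/2011", "01/13/2011", "01/14/2011", "01/15/2011"]
fires = ["01/13/2011", "01/15/2011", "01/15/2011"]
labels = label_fire_days(weather_dates, fires)
print(labels)
```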

We used the pandas library to convert the array to a pandas DataFrame, which was then converted to a csv. Each nested array became 1 row in the csv and the date was made the index.

In [ ]:
df= pd.DataFrame(finalArray,columns=["Temp1","Temp2", "Temp3","Temp4", "avgTemp", "Hum1","Hum2","Hum3","Hum4"
                                     ,"avgHum","Wind1","Wind2","Wind3","Wind4","avgWind", "CurrentDate", "Rain"
                                     ,"14DayAvgRain","14DayAvgTemp", "14dayAvgHum", "14DayAvgWind", "Location" ,"Fire"])
df = df.set_index('CurrentDate')

df.to_csv("MergedWildfireWeather.csv")

The script only had to be run once to obtain the data, hence some inefficiencies in the code were tolerated. The csv it produced was balanced and then fed into the support vector machine.
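
How the csv was balanced is not described here. One common approach, shown purely as an assumption and not as the method we used, is random undersampling: rows from the larger class are dropped at random until both classes are the same size. The function name and sample rows are hypothetical.

```python
import random


def undersample(rows, label_index=-1, seed=0):
    """Randomly drop rows from the larger class until both classes are equal."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    fires = [r for r in rows if r[label_index] == 1]
    no_fires = [r for r in rows if r[label_index] == 0]
    n = min(len(fires), len(no_fires))
    balanced = rng.sample(fires, n) + rng.sample(no_fires, n)
    rng.shuffle(balanced)  # avoid all fire rows appearing first
    return balanced


# Hypothetical rows: [feature, fire flag]; 3 fire days and 10 non-fire days.
rows = [[i, 1] for i in range(3)] + [[i, 0] for i in range(10)]
balanced = undersample(rows)
print(len(balanced), sum(r[-1] for r in balanced))
```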




Obtaining Daily Real-Time Data



We used the numpy and pandas libraries to do most of our data analysis. We obtained our real-time weather data through the Dark Sky API. We created a csv_to_array method to convert our forest locations csv to a Python array.

In [ ]:
def csv_to_array(self, file):
    # Read the comma-separated forest locations into a nested Python list.
    forest_locations = np.genfromtxt(file, delimiter=',')
    return forest_locations.tolist()
    

With regard to past data, Dark Sky only lets us request either the last 24 hours or the last 48 hours. It returns data for every hour in that period, so we have to combine the hourly values to get daily averages, and we want those averages for each location in our forest locations csv. To do this we add every hour of temperature, humidity and wind data to temporary arrays and count the hours in which it rained. Once all the data for a location has been read, we average the values in the temporary arrays and append the results to a final array containing the daily averages for each location. Note that Dark Sky may not be able to provide all 24 or 48 hours of data for every location, so the script must not rely on there being 24 or 48 data points.

In [ ]:
for element in hourly_data:
    count += 1

    # Add hourly data to temporary arrays
    temperature_array.append(element['temperature'])
    humidity_array.append(element['humidity'] * 100)
    wind_array.append(element['windSpeed'])

    # Count the hours in which it rained
    if element['precipIntensity'] != 0:
        rain_counter += 1

    hour_counter += 1

# Convert the number of rainy hours into a rain category. Ideally we would use
# mm/hour, but we need to use the same standard as the training data.
if rain_counter == 0:
    rain_amount = 0
elif rain_counter < 3:
    rain_amount = 1
elif rain_counter < 5:
    rain_amount = 2
elif rain_counter < 8:
    rain_amount = 3
else:
    rain_amount = 4

# Once you have hourly weather data, you merge it to obtain daily averages
temp_list = []
temp_list.append(locationCount)
temp_list.append(loc[0])
temp_list.append(loc[1])
temp_list.append(sum(temperature_array) / float(len(temperature_array)))
temp_list.append(sum(humidity_array) / float(len(humidity_array)))
temp_list.append(sum(wind_array) / float(len(wind_array)))
temp_list.append(rain_amount)

# Append the daily averages to a final array
all_data.append(temp_list)
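
The rain category is a lookup of the rainy-hour count against fixed thresholds, which can be expressed with bisect. This is a sketch with a hypothetical function name; it uses the same thresholds as the code above and assumes that an hour with no 'precipIntensity' field counts as dry.

```python
from bisect import bisect_right

# Category boundaries: 0 hours -> 0, 1-2 -> 1, 3-4 -> 2, 5-7 -> 3, 8+ -> 4.
RAIN_THRESHOLDS = [1, 3, 5, 8]


def rain_category(hourly_data):
    """Map the number of rainy hours in a day to a category from 0 to 4."""
    rainy_hours = sum(
        1 for hour in hourly_data if hour.get("precipIntensity", 0) > 0
    )
    return bisect_right(RAIN_THRESHOLDS, rainy_hours)


def hours(rainy, dry):
    """Build a hypothetical list of hourly records for the examples below."""
    return [{"precipIntensity": 0.4}] * rainy + [{"precipIntensity": 0}] * dry


print([rain_category(hours(n, 24 - n)) for n in (0, 2, 3, 7, 8)])
```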

To have any real accuracy, we need more than just one or two days of data per location. Hence we need a method that reads in a csv containing an unknown number of days of weather data per location, obtains another day of data per location and appends this to the end of each location's data in the csv.

We do this by using the length of the forest locations csv. As each location has the same number of days of data, we merely have to divide the length of the data csv by the length of the forest locations csv (the number of forests). This tells us how many days of data we have per location, and hence at what positions we must append new data once it's been gathered through the API. Note that we only want the past 14 days of data per location, so once we have that much data we must remove the oldest day every time we add a day.

In [ ]:
    #imports the current csv and adds either 1 or 2 more days of data to each location
    #The "days" parameter determines how many days of data you are adding to each location
    def add_dayData(self, days):
        all_data= self.get_Data(self.key, days)
        currentData=(np.genfromtxt("initial.csv",delimiter=','))
        currentData= currentData.tolist()
        #keeps track of what position you're on in the array obtained from the get_Data method
        dataCounter=0
        new_data=[]

        #Here we're calculating the number of days of data we have per location
        currentData.pop(0)
        currentDays= len(currentData)/self.size
        

        #for each location, if its the last day of data we have for that location, append the new data to the end of it.
        #the all_data array contains 2 days of data per location, so once you get past the last element per location, add 2 to the counter.
        for element in currentData:
            #print(element[0])
            
            #if we have 14 days of data per location, drop the oldest days whilst adding to the end.
            if currentDays==14:
                if days==2:
                    #element[0] contains the index, which tells you which day of data the row holds for that location (e.g. the 5th day or the 14th).
                    #If there's already 14 days of data, simply don't append the 2 oldest days of data to the new array.
                    if element[0]==1 or element[0]==2:
                        pass
                    elif element[0]== currentDays:
                        #As we've removed the 2 oldest days of data, we must shift the remaining indexes down by 2
                        element[0]=element[0]-2
                        
                        new_data.append(element)
                        #Updating the indexes of the 2 next days of data in the array obtained from the get_Data method
                        all_data[dataCounter][0]=currentDays-1
                        all_data[dataCounter+1][0]=currentDays
                        #Appending this to the new array
                        new_data.append(all_data[dataCounter])
                        new_data.append(all_data[dataCounter+1])
                        dataCounter+=days
                    else:
                        element[0]=element[0]-2
                        new_data.append(element)

                elif days==1:
                    if element[0]==1:
                        pass
                    
                    elif element[0]== currentDays:
                        #As we're removing he oldest day of data, we must update the indexes of the rest
                        element[0]=element[0]-1
                        #Appended data from day 14, changing index to show its day 13
                        new_data.append(element)
                        #Data from the day whose data was just retrieved by the api
                        all_data[dataCounter][0]=currentDays
                        new_data.append(all_data[dataCounter])
                        dataCounter+=1
                    else:
                        #If there's 14 days of data per location but you aren't currently iterating over the 14th day then just adjust the index and append it to the new array
                        element[0]= element[0]-1
                        new_data.append(element)

            else:
                #If there's not already 14 days of data per location then just append all the data from the imported csv to the new array
                if element[0]== currentDays:
                    new_data.append(element)
                    #add the new day of data obtained from the get_Data method to the bottom of each location with the correct index.
                    #ie: if there's currently 5 days of data per location, add the new day with an index of 6.
                    all_data[dataCounter][0]=currentDays+1
                    new_data.append(all_data[dataCounter])
                    if days==2:
                        all_data[dataCounter+1][0]=currentDays+2
                        new_data.append(all_data[dataCounter+1])
                    dataCounter+=days
                else:
                    new_data.append(element)

        return(new_data)
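
The same sliding window can be kept with one bounded queue per location, which removes the need to renumber indexes by hand. This is an alternative sketch, not the method above: the class name, the location key and the record layout are all hypothetical.

```python
from collections import defaultdict, deque


class LocationHistory:
    """Keep at most `max_days` of daily records for each location."""

    def __init__(self, max_days=14):
        # Each location gets its own deque; appending to a full deque
        # silently discards the oldest day.
        self._days = defaultdict(lambda: deque(maxlen=max_days))

    def add_day(self, location, record):
        self._days[location].append(record)

    def rows(self):
        """Flatten to [day_index, location, *record] rows, indexes from 1."""
        out = []
        for location, records in self._days.items():
            for index, record in enumerate(records, start=1):
                out.append([index, location, *record])
        return out


history = LocationHistory(max_days=14)
for day in range(1, 17):  # add 16 days; only the newest 14 are kept
    history.add_day("forest_a", [day])
history_rows = history.rows()
print(history_rows[0], history_rows[-1])
```

Because the day index is generated when the rows are written out, adding one day or two days at a time needs no special cases.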

We also need to obtain 14-day averages. We want this to be dynamic so that if at any point we choose to expand the system to work with 25- or 30-day averages, that works as well. When new locations are added to forest_locations.csv, they also need to be added to the csv containing all the current data, because several methods rely on the length of forest_locations being accurate. When locations are added to that csv, all of their days of data are zeroed, and we don't want these zeroes to be used in the averages. As data is added for each location every single day, within 14 days all the zeroes will have been replaced with recent data.

In [ ]:
    #Will produce a csv that gives 14 day averages
    def produce_average(self):
        currentData=(np.genfromtxt("initial.csv",delimiter=','))
        currentData= currentData.tolist()

        #removes column headings
        currentData.pop(0)
        currentDays= len(currentData)/self.size
        
        temperatureavg=0
        humavg=0
        windavg=0
        rainavg=0
        tempArray=[]

        #counts how many empty days of data there are for each location
        zeroedvalues=0
        
        avgArray=[]
        for element in currentData:
            #if both humidity (element[4]) and wind speed (element[5]) are 0 then it's a zeroed day
            if element[4]==0 and element[5]==0:
                zeroedvalues+=1
            else:
                #add data to temporary arrays until you reach the end of the data you have for that location
                humavg+=element[4]
                temperatureavg+=element[3]
                windavg+=element[5]
                rainavg+=element[6]
            
            if element[0]==currentDays:
                if zeroedvalues==currentDays:
                    zeroedvalues-=1
                    
                #temporary arrays are used to calculate averages    
                humavg=humavg/(currentDays-zeroedvalues)
                temperatureavg=temperatureavg/(currentDays-zeroedvalues)
                windavg=round(windavg/(currentDays-zeroedvalues))
                rainavg=round(rainavg/(currentDays-zeroedvalues))

                #all averages are put into one array
                tempArray.append(temperatureavg)
                tempArray.append(humavg)
                tempArray.append(windavg)
                tempArray.append(rainavg)

                #this one array is nested into a final avgArray - nested arrays are required for conversion to csv.
                #Each nested array is a row
                avgArray.append(tempArray)
                
                tempArray=[]
                temperatureavg=0
                humavg=0
                windavg=0
                rainavg=0
                zeroedvalues=0
        
        df= pd.DataFrame(avgArray,columns=["Tempavg", "Humavg","Windavg", "Rainavg"])

        df.to_csv("averageWeather.csv")       
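
The core of the averaging, with the window length as a parameter and zeroed placeholder days ignored, can be sketched as follows. The function name and sample values are hypothetical, a day is treated as zeroed only when every value in it is 0, and the rounding applied to wind and rain above is left out for brevity.

```python
def window_average(rows, window=14):
    """Average each column over the last `window` rows, skipping zeroed rows.

    Returns None when the window contains no real data.
    """
    recent = [row for row in rows[-window:] if any(v != 0 for v in row)]
    if not recent:
        return None
    return [sum(column) / len(recent) for column in zip(*recent)]


# Hypothetical rows of [temperature, humidity, wind, rain] for one location.
location_rows = [
    [0, 0, 0, 0],      # placeholder for a newly added location
    [20.0, 50.0, 2.0, 1],
    [30.0, 70.0, 4.0, 3],
]
averages = window_average(location_rows)
print(averages)
```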

The API call and data manipulation required to obtain future weather data are essentially the same as for past weather data, hence we have a simple "getFutureData" method that does this. Ideally we also want a method that calls the other methods and uses the returned arrays to create csvs. For that we have the relatively straightforward "produce_Output" method:

In [ ]:
def produce_Output(self, which):

        if (which)== "initial":
            all_data= self.get_Data(self.key, 2)
            name="initial.csv"
        elif (which)== "add1":
            all_data= self.add_dayData(1)
            name="initial.csv"
        elif(which)=="add2":
            all_data= self.add_dayData(2)
            name="initial.csv"
        elif(which)=="future":
            all_data= self.getFutureData(self.key)
            name="future.csv"
        else:
            print("Wrong produce_Output parameters entered")
            #Nothing to write, so return before all_data and name are used
            return

        df= pd.DataFrame(all_data,columns=["Days", "Longitude","Latitude", "Temp1", "Hum1","Wind1", "Rain"])

        df = df.set_index('Days')
        #Many are called a similar name as they simply add a day onto the main csv and then overwrite it.
        df.to_csv(name)

We want to use the data obtained from all the previous methods to produce a final csv that our model can use to predict wildfires. As we want to predict wildfires for the next 6 days, we need to give our model both the past 14-day averages and the 6-day forecast in the same csv. So we need a method to merge the two:

In [ ]:
#merges future weather data with past 14 day averages
    def output_to_svm(self):
        currentData=(np.genfromtxt("averageWeather.csv",delimiter=','))
        currentData= currentData.tolist()
        currentData.pop(0)

        futureData=(np.genfromtxt("future.csv",delimiter=','))
        futureData= futureData.tolist()
        futureData.pop(0)

        finalArray=[]
        counter=0
        location=0
        for element in futureData:
            #appending averages to the end of the future weather data array
            element.extend(currentData[location][1:5])
            
            finalArray.append(element)
            counter+=1
            #We have forecasts for the next 6 days of data per location
            if counter==6:
                location+=1
                counter=0

        df= pd.DataFrame(finalArray,columns=["Days", "Longitude","Latitude", "AvgTemp", "AvgHum","AvgWind", "AvgRain", "14dayAvgTemp", "14dayAvgHum", "14dayAvgWind", "14dayAvgRain"])
        df.to_csv("svminput.csv")
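
The merge pairs every forecast row with the averages of the location it belongs to. Since each location contributes a fixed number of forecast days, the location can be found by integer division, as in this sketch. The function name and sample values are hypothetical, and it assumes the forecast rows are grouped by location in the same order as the averages.

```python
def merge_forecast(forecast_rows, averages, days_per_location=6):
    """Append each location's averages to every one of its forecast rows."""
    merged = []
    for i, row in enumerate(forecast_rows):
        location = i // days_per_location  # rows 0-5 -> location 0, and so on
        merged.append(list(row) + list(averages[location]))
    return merged


# Two hypothetical locations with a 2-day forecast each: [day, temperature].
forecast = [[0, 24.0], [1, 19.5], [0, 26.0], [1, 27.5]]
location_averages = [[19.2, 64.1], [23.4, 35.3]]
merged_rows = merge_forecast(forecast, location_averages, days_per_location=2)
print(merged_rows)
```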

The first time you call the script, you would require the following:

In [ ]:
file='forest_locations.csv'
key = ''

weatherGetter = Weather_data(file,key, 210)
weatherGetter.produce_Output("initial")
weatherGetter.produce_average()
weatherGetter.produce_Output("future")
weatherGetter.output_to_svm()

After that, you just have to call weatherGetter.produce_Output("add1") instead of "initial". But of course, once this is on the server, we don't want to do this manually every single day. Hence we created a script called "call_daily.py" that uses the "schedule" library to call all the necessary methods in the right order every day. It adds a day of data to each location, then uses the newly produced csv to create a new svminput.csv and run the prediction.

In [ ]:
from threading import Timer
import dark_sky as darksky
import MakePrediction as MakePrediction
import schedule
import time

"""The url for the csv containing forest locations and our Dark Sky key.
Dark Sky key has been removed so that it cannot be copied and used by another"""

file='forest_locations.csv'
key = ''

"""Initialising once, all scripts have been optimised so that they don't 
have to be reinitialised to handle updated data"""
weatherGetter = darksky.Weather_data(file,key, 210)
makePrediction= MakePrediction.MakePrediction()

def call_daily():

    """Updating everything in the right order, the output of
    "makePrediction.prediction()" is used by webworldwind"""
    weatherGetter.produce_Output("add1")
    print("1 day of data added")
    weatherGetter.produce_average()
    print("14 day averages updated")
    weatherGetter.produce_Output("future")
    print("Future forecast obtained")
    weatherGetter.output_to_svm()
    print("SVM input updated")
    makePrediction.prediction()
    print("SVM output updated/created")

"""The script runs automatically at 1 am EST. The timezone used depends on the timezone
of the pc or server running the scripts"""
schedule.every().day.at("01:00").do(call_daily)
while True:
    schedule.run_pending()
    time.sleep(1)



Algorithm



Considering our data

Our initial goal was to apply a machine learning (ML) approach to accurately predict the likelihood of a wildfire occurring. The data we used was first balanced so that we had an equal amount of data for days with and without fires. This data was stored in CSV format. Using the Python library pandas we can analyse the structure of this CSV.

In [12]:
import pandas as pd

df = pd.read_csv('Data/NewBalanced.csv')

print(df.shape)
print('\n')
print(df.head(5))
print('\n')
print(df.tail(1))
(10341, 23)


  CurrentDate  Temp1  Temp2  Temp3  Temp4  avgTemp  Hum1  Hum2  Hum3  Hum4  \
0  01/13/2011     24     16     13     24    19.25    35    32    34    57   
1  01/15/2011     28     18     14     25    21.25    64    40    20    45   
2  01/20/2011     18     12     22     23    18.75    90    69    18    32   
3  01/28/2011     13     11     17     19    15.00    43    26    27    84   
4  02/04/2011      8     17     18     13    14.00    70    57    56    84   

   ...   Wind3  Wind4   avgWind  Rain  14DayAvgTemp  14dayAvgHum  \
0  ...   1.864  0.000  1.242667     0     14.230769    53.916667   
1  ...   0.621  0.000  0.621000     0     15.392857    46.059524   
2  ...   0.621  0.000  0.621000     0     18.160714    52.333333   
3  ...   0.621  0.000  1.242500     0     19.125000    46.107143   
4  ...   2.486  0.621  1.553500     0     16.160714    64.142857   

   14DayAvgWind  14DayAvgRain              Location  Fire  
0             2             0  La Ca_ada Flintridge     1  
1             2             0  La Ca_ada Flintridge     1  
2             1             0  La Ca_ada Flintridge     1  
3             1             0  La Ca_ada Flintridge     1  
4             2             0  La Ca_ada Flintridge     1  

[5 rows x 23 columns]


      CurrentDate  Temp1  Temp2  Temp3  Temp4  avgTemp  Hum1  Hum2  Hum3  \
10340  07/05/2017     31     33     32     25    30.25    33    22    32   

       Hum4  ...   Wind3  Wind4  avgWind  Rain  14DayAvgTemp  14dayAvgHum  \
10340    36  ...   5.593  7.457  5.59275     0     28.357143    36.678571   

       14DayAvgWind  14DayAvgRain     Location  Fire  
10340             5             0  placerville     0  

[1 rows x 23 columns]

This file has 23 columns and 10,341 data points. Clearly not all of these columns are useful as features for training a model: date and location, for example, are not. By applying principal component analysis (https://en.wikipedia.org/wiki/Principal_component_analysis) to our data we decided that the 7 features 'avgTemp', 'avgHum', 'avgWind', '14DayAvgTemp', '14dayAvgHum', '14DayAvgWind' and '14DayAvgRain' were most predictive of the values in the 'Fire' column, in which a 1 corresponds to there being a fire and a 0 to no fire. Next came model selection. Due to the relatively small amount of data we had relative to the number of useful features, we opted to go with a Support Vector Machine based model.

Support Vector Machines

A Support Vector Machine (or SVM for short) is a supervised ML method that can be used for classification of data. Each data item is plotted in n-dimensional space (where n corresponds to the number of features) and then classification is performed by finding a hyperplane that differentiates the classes of the data well. It does so by finding vectors (data points) belonging to each class and basing the position of the hyperplane upon the position of these vectors. These vectors are known as support vectors hence the name.
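
For a linear SVM, classifying a point comes down to checking which side of the hyperplane it falls on, i.e. the sign of w·x + b. The sketch below illustrates only that decision rule; the weights, bias and points are made-up values, not parameters from our trained model.

```python
def classify(weights, bias, point):
    """Return 1 if the point lies on the positive side of the hyperplane."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if score >= 0 else 0


# Hypothetical hyperplane in a 2-feature space: temperature minus humidity.
weights = [1.0, -1.0]
bias = 0.0

hot_and_dry = [35.0, 20.0]
cool_and_humid = [15.0, 80.0]
print(classify(weights, bias, hot_and_dry), classify(weights, bias, cool_and_humid))
```

Training an SVM is the process of choosing the weights and bias so that the hyperplane separates the classes with the widest possible margin.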


SVMs work particularly well when the dimensionality of the feature space is large (as in our case), because as the number of dimensions increases it becomes more likely that the classes will form distinct clusters, allowing for a better-fitting hyperplane. In addition, having a relatively small amount of data isn't game over: presuming the classes form relatively tight clusters, a hyperplane fitted to a smaller dataset will be in a similar place to one fitted to a larger dataset. It is for these reasons we opted to go with an SVM model to make our predictions. Obviously we are assuming that our relatively small dataset is representative of what data in each class generally looks like, and training on a larger dataset likely wouldn't hurt. This is something we would like to do to improve our model if provided with the resources.

An SVM in the context of Wildfires

In the context of our data, the classes we are training the model to distinguish are given by the 'Fire' column in the CSV file (0 for no fire and 1 for fire). We shall consider only the features we deemed important through PCA when training our model. Construction of the DataFrame and splitting of the data looks like this:

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
df = pd.io.parsers.read_csv(
    'Data/NewBalanced.csv',
    header=None,
    skiprows = [0],
    usecols=[5,10,15,17,18,19,20,22]
)

X = df.values[:,:7]

y = df.values[:,7]

#split the data into training and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=12345)

Next we have to standardise the data. Standardisation is an integral part of preprocessing for an SVM. It ensures all features exist on the same scale.

In [3]:
from sklearn import preprocessing
std_scale = preprocessing.StandardScaler().fit(X_train) #allows data to be standardised under the same scale
X_train_std = std_scale.transform(X_train)
X_test_std = std_scale.transform(X_test)
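
What StandardScaler does to each feature can be written out by hand: subtract the mean and divide by the standard deviation, both computed from the training data only. The sketch below shows this for a single column; the function names and values are hypothetical. It uses the population standard deviation, as StandardScaler does.

```python
from statistics import mean, pstdev


def fit_scaler(column):
    """Return the mean and population standard deviation of a training column."""
    sigma = pstdev(column)
    # A constant column has zero spread; divide by 1 to leave it unchanged.
    return mean(column), (sigma if sigma > 0 else 1.0)


def standardise(column, mu, sigma):
    """Rescale values using statistics learnt from the training data."""
    return [(x - mu) / sigma for x in column]


train_temps = [10.0, 20.0, 30.0]
test_temps = [20.0, 40.0]

mu, sigma = fit_scaler(train_temps)          # learnt from training data only
train_std = standardise(train_temps, mu, sigma)
test_std = standardise(test_temps, mu, sigma)  # same scale reused for test data
print(train_std, test_std)
```

Reusing the training mean and deviation on the test and forecast data is what keeps all inputs on the scale the model sees during training, so the standardised arrays are the ones to pass to fit, predict and score.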

Implementing an SVM from scratch would be a tedious and tricky process. Luckily scikit-learn has already done the hard work by providing a Python wrapper for the C++ library LibSVM, a very efficient library for running SVM-related tasks.

In [4]:
from sklearn.svm import SVC
clf = SVC(C=1.0, cache_size=200, class_weight='balanced', coef0=0.0,
decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
max_iter=-1, probability=True, random_state=None, shrinking=True,
tol=0.001, verbose=False) 

clf.fit(X_train,y_train)
Out[4]:
SVC(C=1.0, cache_size=200, class_weight='balanced', coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=True, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

Of the input parameters above, the most important are C, class_weight, gamma and kernel. C sets the trade-off between fitting the training set closely and keeping the decision boundary smooth. class_weight scales C for each class; 'balanced' weights each class inversely to its frequency in the training data, which changes little here because our data is already balanced. gamma controls how far the influence of a single training point reaches when fitting the boundary. Here we used 'auto', which sets gamma to 1/n_features. Finally, the kernel is the function that measures the similarity between pairs of feature vectors, and so determines the shape of boundary the model can fit. In our case we have selected 'rbf', the radial basis function kernel. This kernel allows a non-linear decision boundary to be fitted to the data, which is useful as the relationship between our features (i.e. avgTemp, avgHum etc.) and our classes (fire, no fire) may not necessarily be linear.
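
The RBF kernel itself is a short formula, K(x, z) = exp(-gamma * ||x - z||^2): identical points score 1 and the score decays towards 0 as points move apart, with gamma setting how quickly. A minimal sketch with a hypothetical function name and made-up, already-standardised points:

```python
import math


def rbf_kernel(x, z, gamma):
    """Similarity of two feature vectors under the RBF kernel."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


point = [0.0, 0.0]
near = [0.1, 0.1]
far = [1.0, 1.0]
gamma = 1.0 / len(point)  # the value gamma='auto' uses: 1 / n_features

print(rbf_kernel(point, near, gamma), rbf_kernel(point, far, gamma))
```

A larger gamma makes the similarity fall off faster, so each training point influences a smaller region around itself.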

The above code equates to training the model. Predictions can now be made with the following code snippet:

In [9]:
clf.predict(X_test)
Out[9]:
array([ 0.,  0.,  0., ...,  1.,  0.,  1.])

In addition, an accuracy score can be calculated similarly:

In [14]:
print('Accuracy is: {}%'.format(clf.score(X_test, y_test, sample_weight=None)*100))
Accuracy is: 72.5749274895%

This accuracy can be tweaked by changing hyper-parameters used to train the model as well as altering the data that is trained upon by means of changing the seed when splitting the data. Our final model obtains an accuracy of x%.

What about probability though?

When designing a model for a scenario such as a wildfire, it would be far more useful to have probability values associated with the predicted class outcomes. Unfortunately this is not out-of-the-box functionality for SVMs. Luckily, however, it can be achieved by applying a procedure known as Platt scaling (https://en.wikipedia.org/wiki/Platt_scaling). Essentially, this approach fits a sigmoid function to the SVM's decision values, which maps each value to a probability of belonging to a class. LibSVM, and thus by proxy scikit-learn, has an efficient implementation of this procedure, so it only takes a single line to use (the model was created with probability=True above):

In [24]:
clf.predict_proba(X_test)
Out[24]:
array([[ 0.5684794 ,  0.4315206 ],
       [ 0.58296673,  0.41703327],
       [ 0.88029603,  0.11970397],
       ..., 
       [ 0.12410921,  0.87589079],
       [ 0.8163141 ,  0.1836859 ],
       [ 0.11962028,  0.88037972]])
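
The sigmoid at the heart of Platt scaling can be sketched in a few lines of plain Python. This is only an illustration of the formula P(y = 1 | f) = 1 / (1 + exp(A*f + B)), where f is the SVM's decision value. The coefficients A and B below are made-up values. In practice they are fitted from the training data by the library, and the function name platt_probability is hypothetical.

```python
import math


def platt_probability(decision_value, a, b):
    """Map an SVM decision value to P(class = 1) with a sigmoid."""
    return 1.0 / (1.0 + math.exp(a * decision_value + b))


# Hypothetical coefficients; real values are fitted during training.
A, B = -1.5, 0.0

# Decision values far from the boundary map to confident probabilities,
# while a value on the boundary (0.0) maps to 0.5 when B is 0.
for f in (-2.0, 0.0, 2.0):
    p_fire = platt_probability(f, A, B)
    print(f, [1.0 - p_fire, p_fire])  # [P(no fire), P(fire)], like a predict_proba row
```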

Making predictions¶

Predictions can be made using this model by first retrieving the prediction data values from a CSV, standardising them under the same scale used for the training and then running Scikit-Learn's predict() function as shown earlier. For example:

In [17]:
foredf = pd.io.parsers.read_csv(
    'Data/svminput.csv',
    header=None,
    skiprows = [0],
    usecols=[1,2,3,4,5,6,8,9,10,11]
)

X_forecast = foredf.values[:,3:]
X_forecast_std = std_scale.transform(X_forecast)

fore_pred = clf.predict(X_forecast_std)
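
The phrase "the same scale used for the training" is the important detail here: the mean and standard deviation are computed from the training data once and then reused for the forecast data. The sketch below shows that idea in plain Python with made-up numbers. The helpers fit_scale and transform are hypothetical stand-ins for what a fitted scaler such as std_scale does, and they assume no column has zero spread.

```python
from statistics import mean, pstdev


def fit_scale(rows):
    """Return per-column means and population standard deviations."""
    columns = list(zip(*rows))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def transform(rows, means, stds):
    """Standardise rows using previously fitted means and deviations."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Made-up training rows: [average temperature, average humidity].
train = [[20.0, 40.0], [24.0, 60.0], [28.0, 50.0]]
# A made-up forecast row to be scaled with the TRAINING statistics.
forecast = [[26.0, 45.0]]

means, stds = fit_scale(train)
print(transform(forecast, means, stds))
```

Fitting a fresh scale on the forecast data instead would shift the inputs relative to what the model saw during training, so the predictions would no longer be comparable.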

We then opted to append the predictions array above to a pandas DataFrame and write that DataFrame out as a new CSV, 'svmoutput.csv':

In [18]:
forearray = foredf.values.tolist()

# Append each prediction to the end of its corresponding input row.
for i, element in enumerate(forearray):
    element.append(fore_pred[i])
    #element.append(fore_prob[i][1])

df = pd.DataFrame(forearray)

df.to_csv('Data/svmoutput.csv')

As you can imagine the generated CSV has the same format as the input CSV with the only exception being the appended prediction column added to the end.

In [28]:
df = pd.io.parsers.read_csv(
    'Data/svmoutput.csv', 

)

print(df.shape)
print('\n')
print(df.head(10))
(1158, 11)


   0          1           2       3   4     5          6          7  8  9  10
0  0  37.810696 -122.183232  24.285  49  2.35  19.210625  64.125000  3  1   0
1  1  37.810696 -122.183232  19.725  61  3.53  19.210625  64.125000  3  1   0
2  2  37.810696 -122.183232  17.725  69  3.85  19.210625  64.125000  3  1   0
3  3  37.810696 -122.183232  18.635  67  3.35  19.210625  64.125000  3  1   0
4  4  37.810696 -122.183232  19.520  62  2.73  19.210625  64.125000  3  1   0
5  5  37.810696 -122.183232  19.560  62  2.76  19.210625  64.125000  3  1   0
6  0  38.267458 -114.595831  26.085  26  2.31  23.372187  35.270833  2  2   1
7  1  38.267458 -114.595831  24.195  32  4.26  23.372187  35.270833  2  2   1
8  2  38.267458 -114.595831  24.155  34  4.10  23.372187  35.270833  2  2   1
9  3  38.267458 -114.595831  23.075  43  3.87  23.372187  35.270833  2  2   1

Our code in practice¶

The ideas and code snippets above are the basis for our code. We have opted to use an object-oriented approach as this lends us several advantages, such as clarity and generalisability. Below is an example of a script that utilises our code to process, standardise, train and then test a model. It outputs a prediction for every value in the test dataset along with its probability (calculated through Platt scaling) and the correct value. Finally, it outputs the overall accuracy the model achieved when making predictions upon the dataset.

In [ ]:
'''
Author: Flinn Dolman

@License: MIT

An example script that leverages our code to train a model and make predictions based upon it. Predictions
are printed to stdout and then the model used to make the predictions is saved.
'''
from SVM import SVM
from Standardiser import Standardiser


def Main():
    forecast_loc = 'Data/svminput.csv'
    standard_data = Standardiser()
    standard_data.initialise()
    clf = SVM()
    clf.initialise(standard_data.get_std_X_train(),standard_data.get_std_X_test(),standard_data.get_y_train(),standard_data.get_y_test())
    print('\nThese are the predictions: {}\n'.format(clf.predictions()))
    predictions, probs = clf.predictions()
    y_test = standard_data.get_y_test()


    # Iterate over every prediction, including the last one.
    for i in range(len(predictions)):
        print('Prediction: {}, with probability: {}, correct value: {}'.format(predictions[i],probs[i], y_test[i]))

    print('Accuracy is: {}%'.format(clf.accuracy()*100))

    fore_Pred, fore_Prob = clf.forecast_Pred(standard_data.loadForecast(forecast_loc))

    standard_data.make_CSV(fore_Pred,fore_Prob,'Data/svmoutputnew.csv')

    clf.saveModel()

if __name__ =="__main__":

    Main()

The structure of our code means that all this script is really responsible for is initialisation of objects and the formatting of the predictions, probabilities and correct values.




WebWorldWind


SVMDocs

The Foundations¶

We decided to use the Django web framework for our site. We believed that it would give us the flexibility we needed to perform our backend Python operations. If the reader is not familiar with Django, we recommend reading its documentation here: https://docs.djangoproject.com/en/1.11/ before continuing with this document. From here on we assume a basic level of proficiency with the framework, as explaining its intricacies would distract from the 'meat' of our application. Interpreting the predictions produced by our model involved using Python's 'csv' module to read the CSV generated by the model, 'svmoutput.csv', and build a dictionary containing latitude, longitude and fire probability values. This dictionary could then be handed to app.html via Django. The action takes place in views.py:

In [ ]:
import csv
import os
from django.shortcuts import render
from django import template
def index(request):
    return render(request, 'WPSsite/index.html')

def app(request):
    base_dir = os.path.dirname(os.path.realpath(__file__))
    my_dict = {}

    # Load the fire station rows; they are matched to locations in file order.
    with open(os.path.join(base_dir, 'firestations.csv'), 'r') as stations_file:
        station = list(csv.reader(stations_file))

    counter = 0
    with open(os.path.join(base_dir, 'svmoutput.csv'), 'r') as output_file:
        for row in csv.reader(output_file):
            # Skip rows whose first cell is empty (e.g. the CSV header).
            if row[0] == "":
                continue

            # Key each location by its latitude and longitude to 4 decimal places.
            key = repr(["{0:.4f}".format(float(row[2])),
                        "{0:.4f}".format(float(row[3]))])
            perc = float(row[11])

            if key not in my_dict:
                # The first element for a new location is its fire station row.
                my_dict[key] = [station[counter]]
                counter = counter + 1
            my_dict[key].append(perc)

    return render(request, 'WPSsite/app.html', {'my_dict':my_dict})

def outreach(request):
    return render(request, 'WPSsite/outreach.html')

def docs(request):
    return render(request, 'WPSsite/docs.html')

def mission(request):
    return render(request, 'WPSsite/mission.html')

def team(request):
    return render(request, 'WPSsite/team.html')

def contact(request):
    return render(request, 'WPSsite/contact.html')

We also placed the nearest fire station at the beginning of each location's entry in the dictionary, using the 'firestations.csv' file. This was needed to show the nearest fire stations in our app's infoboxes. By including 'my_dict' as part of the return call to app.html we can use its contents in app.html via Django. More on that later. Let's now go through app.html.
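
Before moving on, the shape of my_dict can be made concrete with a small standalone sketch. It mirrors the loop in views.py but reads from in-memory strings rather than files, so it runs without Django. The CSV contents, the station names and the helper build_location_dict are all made up for illustration. Only the column positions (latitude in column 2, longitude in column 3, probability in column 11) follow the view code above.

```python
import csv
import io

# Hypothetical excerpts standing in for svmoutput.csv and firestations.csv.
SVM_OUTPUT = """,0,1,2,3,4,5,6,7,8,9,10
0,0,37.810696,-122.183232,24.285,49,2.35,19.21,64.125,3,1,0.12
1,1,37.810696,-122.183232,19.725,61,3.53,19.21,64.125,3,1,0.34
2,0,38.267458,-114.595831,26.085,26,2.31,23.37,35.27,2,2,0.81
"""
FIRE_STATIONS = "Station A (hypothetical)\nStation B (hypothetical)\n"


def build_location_dict(output_rows, station_rows):
    """Group probabilities by location, with the station row stored first."""
    stations = list(station_rows)
    locations = {}
    counter = 0
    for row in output_rows:
        if row[0] == "":  # skip the header row
            continue
        key = repr(["{0:.4f}".format(float(row[2])),
                    "{0:.4f}".format(float(row[3]))])
        if key not in locations:
            locations[key] = [stations[counter]]
            counter += 1
        locations[key].append(float(row[11]))
    return locations


result = build_location_dict(csv.reader(io.StringIO(SVM_OUTPUT)),
                             csv.reader(io.StringIO(FIRE_STATIONS)))
for key, value in result.items():
    print(key, value)
```

Each key is the string form of a [latitude, longitude] pair, and each value is a list whose first element is the fire station row, followed by one probability per forecast day for that location.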

The App¶

In [ ]:
<head lang="en">
    <script src="http://worldwindserver.net/webworldwind/worldwind.min.js" type="text/javascript"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js" type = "text/javascript"></script>
    <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/fancybox/3.1.20/jquery.fancybox.js"></script>
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/fancybox/3.1.20/jquery.fancybox.css" type="text/css" media="screen"/>



    <style>

    .panels{
        /* Permalink - use to edit and share this gradient: http://colorzilla.com/gradient-editor/#ededf5+0,eeeef5+100 */
        background: rgb(237,237,245); /* Old browsers */
        background: -moz-linear-gradient(45deg, rgba(237,237,245,1) 0%, rgba(238,238,245,1) 100%); /* FF3.6-15 */
        background: -webkit-linear-gradient(45deg, rgba(237,237,245,1) 0%,rgba(238,238,245,1) 100%); /* Chrome10-25,Safari5.1-6 */
        background: linear-gradient(45deg, rgba(237,237,245,1) 0%,rgba(238,238,245,1) 100%); /* W3C, IE10+, FF16+, Chrome26+, Opera12+, Safari7+ */
        filter: progid:DXImageTransform.Microsoft.gradient( startColorstr='#ededf5', endColorstr='#eeeef5',GradientType=1 ); /* IE6-9 fallback on horizontal gradient */
        border-radius: 5px;
    }

    .apppanel{
        /* Permalink - use to edit and share this gradient: http://colorzilla.com/gradient-editor/#ffffff+0,f2f2f2+100 */
        background: #ffffff; /* Old browsers */
        background: -moz-linear-gradient(top, #ffffff 0%, #f2f2f2 100%); /* FF3.6-15 */
        background: -webkit-linear-gradient(top, #ffffff 0%,#f2f2f2 100%); /* Chrome10-25,Safari5.1-6 */
        background: linear-gradient(to bottom, #ffffff 0%,#f2f2f2 100%); /* W3C, IE10+, FF16+, Chrome26+, Opera12+, Safari7+ */
        filter: progid:DXImageTransform.Microsoft.gradient( startColorstr='#ffffff', endColorstr='#f2f2f2',GradientType=0 ); /* IE6-9 */
        border-radius: 5px;
        margin-left:10px;
        margin-right:10px;
        padding: 10px 10px 10px 10px;
        width:1024px;
        margin-bottom:5px;
        margin: 0 auto;
    }

    .legend{
        /* Permalink - use to edit and share this gradient: http://colorzilla.com/gradient-editor/#ffffff+0,f2f2f2+100 */
        background: #ffffff; /* Old browsers */
        background: -moz-linear-gradient(top, #ffffff 0%, #f2f2f2 100%); /* FF3.6-15 */
        background: -webkit-linear-gradient(top, #ffffff 0%,#f2f2f2 100%); /* Chrome10-25,Safari5.1-6 */
        background: linear-gradient(to bottom, #ffffff 0%,#f2f2f2 100%); /* W3C, IE10+, FF16+, Chrome26+, Opera12+, Safari7+ */
        filter: progid:DXImageTransform.Microsoft.gradient( startColorstr='#ffffff', endColorstr='#f2f2f2',GradientType=0 ); /* IE6-9 */	

        border-radius: 5px;
        margin-left:10px;
        margin-right:10px;
        padding: 0px 10px 0px 10px;
        width:1024px;
        margin: 0 auto;
    }

    .fullwidth{
        margin-left:10px;
        margin-right:10px;
    }

    .webworldfull{
        margin-left:10px;
        margin-right:10px;

    }

    .wholeapp{
        border-radius: 5px;
        margin-left:10px;
        margin-right:10px;
        background-color:#e3e3e3;
    }
    
    #canvasOne{
        background-image: url("{% static 'img/starry.jpg' %}");
    }
    
    button{
        height:34px;
    }

    </style>

</head>

To make the UI look good and easy to use, we needed some CSS, which we defined within the head of the document. A lot of the general website CSS was imported by extending the header.html file through the use of Django's Jinja-style templating.

Next, buttons and the search box are defined:

In [ ]:
<div style="display:inline-block; padding-right:50px;">
    <button type="button" id = "button1" class="btn btn-primary btn-md">Today</button>
    <button type="button" id = "button2" class="btn btn-primary btn-md">Tomorrow</button>
    <button type="button" id = "button3" class="btn btn-primary btn-md">29th July</button>
    <button type="button" id = "button4" class="btn btn-primary btn-md">30th July</button>
    <button type="button" id = "button5" class="btn btn-primary btn-md">31st July</button>
    <button type="button" id = "button6" class="btn btn-primary btn-md">1st August</button>
</div>

<div class="input-group" id="searchBox" style="position:absolute; display:inline-block;">
    <input type="text" class="form-control" placeholder="Location" id="searchText" style="width:475px; left:0%; border: 1px solid black; border-color: #2e6da4;" />
    <span class="input-group-btn">
        <button id="searchButton" class="btn btn-primary" type="button">
            <span class="glyphicon glyphicon-search"></span>
        </button>
    </span>
</div>

Notice that the button initialisation text is static. Next comes the JavaScript that circumvents this.

In [ ]:
    //on page ready set the dates for the buttons.
    $(document).ready(function(){
        var one = new Date();
        var two = new Date(one.getTime() + (24 * 60 * 60 * 1000));
        var three = new Date(two.getTime() + (24 * 60 * 60 * 1000));
        var four = new Date(three.getTime() + (24 * 60 * 60 * 1000));
        var five = new Date(four.getTime() + (24 * 60 * 60 * 1000));
        var six = new Date(five.getTime() + (24 * 60 * 60 * 1000));

        var month = new Array();
        month[0] = "January";
        month[1] = "February";
        month[2] = "March";
        month[3] = "April";
        month[4] = "May";
        month[5] = "June";
        month[6] = "July";
        month[7] = "August";
        month[8] = "September";
        month[9] = "October";
        month[10] = "November";
        month[11] = "December";

        $("#button1").text("Today");
        $("#button2").text("Tomorrow");
        $("#button3").text(three.getDate() + " " + (month[three.getMonth()]).substring(0, 3));
        $("#button4").text(four.getDate() + " " + (month[four.getMonth()]).substring(0, 3));
        $("#button5").text(five.getDate() + " " + (month[five.getMonth()]).substring(0, 3));
        $("#button6").text(six.getDate() + " " + (month[six.getMonth()]).substring(0, 3));
    });
    //create a 1 min timer before launching survey pop up
    setTimeout(function(){
        $(document).ready(function() {
            $("#button7").trigger("click");
        });
    }, 1000*60);

Essentially, jQuery is used so that when the page is loaded by a browser the button text is updated in line with the current date. In addition, a timer is started with a 60 second countdown that leads to a popup window for our survey. We included this to ascertain users' experience with our app. The next section contains the bulk of the application. We will go through it step by step.

In [ ]:
    //import prerequisite js files.
    require(['../../static/js/src/WorldWind','../../static/js/LayerManager'], function(ww, LayerManager) {

        var wwd = new WorldWind.WorldWindow("canvasOne");
        var new_layers;

        // Tell World Wind to log only warnings.
        WorldWind.Logger.setLoggingLevel(WorldWind.Logger.LEVEL_WARNING);

            // Add imagery layers.
            var layers = [
                {layer: new WorldWind.BMNGLayer(), enabled: true},
                {layer: new WorldWind.BMNGLandsatLayer(), enabled: false},
                {layer: new WorldWind.BingAerialWithLabelsLayer(null), enabled: true},
                {layer: new WorldWind.CompassLayer(), enabled: true},
                {layer: new WorldWind.CoordinatesDisplayLayer(wwd), enabled: true},
                {layer: new WorldWind.ViewControlsLayer(wwd), enabled: true},
                {layer: new WorldWind.AtmosphereLayer(), enabled:true}
                ];

            var annotationsLayer = new WorldWind.RenderableLayer("Annotations");

            for (var l = 0; l < layers.length; l++) {
                layers[l].layer.enabled = layers[l].enabled;
                wwd.addLayer(layers[l].layer);
            }
            //dont really need this. Kept it in for easy switching of annotation colour.
            var backgroundColors = [
            WorldWind.Color.RED,
            WorldWind.Color.GREEN,
            WorldWind.Color.MAGENTA,
            WorldWind.Color.BLUE,
            WorldWind.Color.DARK_GRAY,
            WorldWind.Color.BLACK,
            WorldWind.Color.BLACK,
            WorldWind.Color.RED,
            WorldWind.Color.BLACK,
            WorldWind.Color.BLACK,
            WorldWind.Color.BLACK];
            


            var placemark,
                placemarkAttributes = new WorldWind.PlacemarkAttributes(null),
                highlightAttributes,
                placemarkLayer1 = new WorldWind.RenderableLayer("Placemarks"),
                placemarkLayer2 = new WorldWind.RenderableLayer("Placemarks"),
                placemarkLayer3 = new WorldWind.RenderableLayer("Placemarks"),
                placemarkLayer4 = new WorldWind.RenderableLayer("Placemarks"),
                placemarkLayer5 = new WorldWind.RenderableLayer("Placemarks"),
                placemarkLayer6 = new WorldWind.RenderableLayer("Placemarks"),
                latitude,
                longitude;

            // Set up the common placemark attributes.
            placemarkAttributes.imageScale = 1;
            placemarkAttributes.imageOffset = new WorldWind.Offset(
                WorldWind.OFFSET_FRACTION, 0.5,
                WorldWind.OFFSET_FRACTION, 0.5);
            placemarkAttributes.imageColor = WorldWind.Color.WHITE;
            //variables for placemark classification.
            var low = [];
            var mid = [];
            var high = [];
            var placemarks = [];
            var Objs = [];
            var probabilities = [];

RequireJS is used to load external JS files from our site's js folder. In this case we are using WebWorldWind, so we load its initialisation file, WorldWind.js, as well as LayerManager.js. Our LayerManager.js file is similar to the one that can be found at https://webworldwind.org/examples/ but is slightly modified so that when the search bar is used it also zooms into the location rather than just finding it on the globe. The rest of the code just initialises the WorldWindow, the globe and placemark-related features.

In [ ]:
//Django cycle handles placemark generation and rendering.		
            {% for key, value in my_dict.items %}
                //probs is an array of svm predictions.			
                var probs = ['','','','','',''];
                var values = []
                //give a colour to a placemark based upon its predicted probability.
                for (var y = 0; y < probs.length; y=y+1){
                    if(probs[y] < 0.20){
                        values.push('rgba(34, 139, 34, 0)');
                    }
                    else if(probs[y] > 0.19 && probs[y] < 0.3){
                        values.push('rgb(255, 214, 51)');
                    }
                    else if(probs[y] > 0.29 && probs[y] < 0.4){
                        values.push('rgb(255, 167, 0)');
                    }
                    else if(probs[y] > 0.39 && probs[y] < 0.5){
                        values.push('rgb(255, 131, 0)');
                    }
                    else if(probs[y] > 0.49 && probs[y] < 0.6){
                        values.push('rgb(255, 84, 0)');
                    }
                    else if(probs[y] > 0.59 && probs[y] < 0.7){
                        values.push('rgb(255, 42, 0)');
                    }
                    else if(probs[y] > 0.69 && probs[y] < 0.8){
                        values.push('rgb(230, 0, 0)');
                    }
                    else if(probs[y] > 0.79 && probs[y] < 0.9){
                        values.push('rgb(26, 0, 0)');
                    }
                    else{
                        values.push('rgb(0, 0, 0)');
                    }
                }

                for (var x = 0; x < values.length; x=x+1) {
                    var canvas = document.createElement("canvas"),
                        ctx2d = canvas.getContext("2d"),
                        size = 64, c = size / 2  - 0.5, innerRadius = 2, outerRadius = 7;

                    canvas.width = size;
                    canvas.height = size;

                    // Create the custom image for the placemark.


                    var gradient = ctx2d.createRadialGradient(c, c, innerRadius, c, c, outerRadius);
                    gradient.addColorStop(0, values[x]);		

                    ctx2d.fillStyle = gradient;
                    ctx2d.arc(c, c, outerRadius, 0, 2 * Math.PI, false);
                    ctx2d.fill();


                    // Create the placemark.
                    placemark = new WorldWind.Placemark(new WorldWind.Position(",", 1e2), false, null);
                    placemark.altitudeMode = WorldWind.RELATIVE_TO_GROUND;
                    //define how we wish annotations to appear.
                    var annotationAttributes = new WorldWind.AnnotationAttributes(null);
                    annotationAttributes.cornerRadius = 14;
                    annotationAttributes.backgroundColor = backgroundColors[4];
                    annotationAttributes.textColor = new WorldWind.Color(1, 1, 1, 1);
                    annotationAttributes.drawLeader = true;
                    annotationAttributes.leaderGapWidth = 40;
                    annotationAttributes.leaderGapHeight = 30;
                    annotationAttributes.opacity = 1;
                    annotationAttributes.scale = 1;
                    annotationAttributes.width = 200;
                    annotationAttributes.height = 150;
                    annotationAttributes.textAttributes.color = WorldWind.Color.WHITE;
                    annotationAttributes.insets = new WorldWind.Insets(10, 10, 10, 10);
                    //annotation @ placemark location
                    var annotation = new WorldWind.Annotation(new WorldWind.Position(",", 1e2), annotationAttributes);

                    annotation.enabled = false;
               
                   
                    annotation.label = "Lat: "+ "" + "\nLong: " + "" + '\n Probability of fire: ' + ((parseFloat(probs[x])*100).toFixed(1)).toString() + "%" + "\n\nNearest Fire Station: \n"; 
                    
                  
                    placemarks.push(placemark);
                    probabilities.push(((parseFloat(probs[x])*100).toFixed(1)).toString());

                    //generate placemark labels that give placemark probs and classify placemarks based on probs
                    //only label placemarks whose probability is at least 20%
                    if (parseFloat(probs[x]) >= 0.2){
                        placemark.label = ((parseFloat(probs[x])*100).toFixed(1)).toString() + "%";
                    }

                    //every placemark starts hidden and is sorted into a low, mid or high band
                    placemark.enabled = false;
                    if (probs[x] < 0.3){
                        low.push(placemark);
                    }
                    else if (probs[x] < 0.5){
                        mid.push(placemark);
                    }
                    else{
                        high.push(placemark);
                    }

                    


                    
                    // Create the placemark attributes for the placemark.
                    placemarkAttributes = new WorldWind.PlacemarkAttributes(placemarkAttributes);
                    // Wrap the canvas created above in an ImageSource object to specify it as the placemark image source.
                    placemarkAttributes.imageSource = new WorldWind.ImageSource(canvas);
                    placemark.attributes = placemarkAttributes;

                    // Create the highlight attributes for this placemark. Note that the normal attributes are specified as
                    // the default highlight attributes so that all properties are identical except the image scale. You could
                    // instead vary the color, image, or other property to control the highlight representation.
                    highlightAttributes = new WorldWind.PlacemarkAttributes(placemarkAttributes);
                    highlightAttributes.imageScale = 1.2;
                    placemark.highlightAttributes = highlightAttributes;

                    //create an object that we will use for associating annotations with placemarks. We need this later for picking.
                    var Obj = new Object();  
                    Obj.placemark = placemark;  
                    Obj.annotation = annotation;  

                    Obj.getPlacemark = function () {  
                        return this.placemark;  
                    };

                    Obj.getAnnotation = function () {  
                        return this.annotation;  
                    };

                    // if all the objects are in an array we can iterate through them easily.
                    Objs.push(Obj)


                    // Add the placemark to the layer.
                    
                    if(x==0){
                        placemarkLayer1.addRenderable(placemark);
                    }
                    else if(x==1){
                        placemarkLayer2.addRenderable(placemark);
                    }
                    else if(x==2){
                        placemarkLayer3.addRenderable(placemark);
                    }
                    else if(x==3){
                        placemarkLayer4.addRenderable(placemark);
                    }
                    else if(x==4){
                        placemarkLayer5.addRenderable(placemark);
                    }
                    else if(x==5){
                        placemarkLayer6.addRenderable(placemark);
                    }
                    else{
                        console.log("ERROR");
                    }

                    annotationsLayer.addRenderable(annotation);
                }
                
            
            
            {% endfor %}

Here is where we use the dictionary my_dict we discussed earlier. We use Django's Jinja-style template logic to go through each element of the dictionary with a for loop. This code creates custom placemarks whose colour is determined by the probability of fire in svmoutput.csv and whose location is determined by the latitudes and longitudes in the same file. Annotations are then created, and the placemarks are added to layers that are loaded according to which button the user presses.
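
The colour banding in the cell above can be easier to follow as a lookup table. The sketch below restates the same bands in Python purely for illustration, since the app itself does this in JavaScript. The names BOUNDS, COLOURS and colour_for are hypothetical, while the colour strings and the 0.2 to 0.9 boundaries are taken from the code above.

```python
from bisect import bisect_right

# Lower bounds of each band after the first; a probability equal to a
# bound belongs to the band that starts at that bound.
BOUNDS = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
COLOURS = [
    'rgba(34, 139, 34, 0)',  # below 0.2: fully transparent
    'rgb(255, 214, 51)',     # 0.2 to 0.3
    'rgb(255, 167, 0)',      # 0.3 to 0.4
    'rgb(255, 131, 0)',      # 0.4 to 0.5
    'rgb(255, 84, 0)',       # 0.5 to 0.6
    'rgb(255, 42, 0)',       # 0.6 to 0.7
    'rgb(230, 0, 0)',        # 0.7 to 0.8
    'rgb(26, 0, 0)',         # 0.8 to 0.9
    'rgb(0, 0, 0)',          # 0.9 and above
]


def colour_for(probability):
    """Return the placemark colour for a fire probability between 0 and 1."""
    return COLOURS[bisect_right(BOUNDS, probability)]


for p in (0.05, 0.25, 0.55, 0.95):
    print(p, colour_for(p))
```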

In [ ]:
new_layers = [placemarkLayer1,placemarkLayer2,placemarkLayer3,placemarkLayer4,placemarkLayer5,placemarkLayer6];
            wwd.addLayer(new_layers[0]);
            wwd.addLayer(annotationsLayer);

            var highlightedItems = [];
            var highlightedAnnotations = [];
            var layerManger = new LayerManager(wwd);

            //Reveal all placemarks generated
            uncheckbox();
            //Activate highlight observation.
            highlight();

            //On button click show different day.
            $.each([0, 1, 2, 3, 4, 5], function(index, day) {
                //button1 shows day 0, button2 shows day 1, and so on.
                $("#button" + (day + 1)).click(function() {
                    dayLoaded(day);
                });
            });

The above code does much of the preparation work for the viewer to use the app. It adds the generated layers to the globe and enables the placemarks to be seen (uncheckbox()). Highlighting functionality, such as the appearance of annotation boxes, is then enabled through highlight(), and jQuery waits to run the dayLoaded() function, which controls which placemark layer (each corresponding to a day) appears when a button is pushed. The next section of code controls placemark filtering based on checkboxes. It is omitted here to keep this documentation from becoming too long, but if the reader is interested they can check out app.html in our source code. We will move on to the highlight() function and its dependencies.

In [ ]:
function highlight(){
            wwd.addEventListener("mousemove", handlePick);
            
        }
 
        function handlePick(o) {
                // the mouse location.
                var x = o.clientX,
                    y = o.clientY;
       
                var redrawRequired = highlightedItems.length > 0; // must redraw if we de-highlight previous shapes

                // De-highlight any previously highlighted shapes. Reset the label back to its original value.
                for (var h = 0; h < highlightedItems.length; h++) {
                    for (var i = 0; i < Objs.length; i++){
                        
                        if(parseFloat(probabilities[i]) > 20 ){
                            placemarks[i].label = probabilities[i] + "%";
                        }
                    }
                    highlightedItems[h].highlighted = false;
                    //make annotation invisible
                    highlightedAnnotations[h].enabled = false;
           
                    
                }
                highlightedItems = [];
                highlightedAnnotations = [];

                // Perform the pick. Must first convert from window coordinates to canvas coordinates, which are
                // relative to the upper left corner of the canvas rather than the upper left corner of the page.
                var pickList = wwd.pick(wwd.canvasCoordinates(x, y));
                if (pickList.objects.length > 0) {
                    redrawRequired = true;
                }

                // Highlight the items picked by simply setting their highlight flag to true. 
                if (pickList.objects.length > 0) {
                    for (var p = 0; p < pickList.objects.length; p++) {
                        if (!pickList.objects[p].isTerrain) {
                            //iterate through our list of objects that associate placemarks and annotations 
                            for (var i = 0; i < Objs.length; i++){
                                //temporarily scrap labels so they dont appear over the annotations
                                placemarks[i].label = "";
                                //this if statement tests which highlighted placemark is part of the object that associates placemarks and annotations 
                                if(Object.is(pickList.objects[p].userObject,Objs[i].getPlacemark())){
                                    
                                    //if highlightedplacemark is part of object then make the associated annotation visible.
                                    
                                    Objs[i].getAnnotation().enabled = true;
                                    highlightedAnnotations.push(Objs[i].getAnnotation());
                                }
                            }

                            pickList.objects[p].userObject.highlighted = true;
                            
                            

                            // Keep track of highlighted items in order to de-highlight them later.
                            highlightedItems.push(pickList.objects[p].userObject);
                        }
                    }
                }

                // Update the window if we changed anything.
                if (redrawRequired) {
                    wwd.redraw(); // redraw to make the highlighting changes take effect on the screen
                }
            }

This section is based upon the annotation and picking examples at https://webworldwind.org/examples/. Essentially, when a placemark is highlighted its corresponding annotation is enabled. At the same time the placemark labels are set blank to ensure the clarity of the annotation. When it's time to de-highlight the placemarks, the labels are reset to their initial values. Finally, we will look at the dayLoaded() function.

In [ ]:
//Adds and removes placemark layers depending on which button the user has clicked.			
        function dayLoaded(day) {
            // Remove all placemark layers and the annotations layer to stop duplicate layers from being added.
            for (var i = 0; i < 6; i++) {
                wwd.removeLayer(new_layers[i]);
            }
            wwd.removeLayer(annotationsLayer);

            // Add back only the layer for the requested day, followed by the annotations layer.
            for (var d = 0; d < 6; d++) {
                if (day == d) {
                    wwd.addLayer(new_layers[d]);
                    wwd.addLayer(annotationsLayer);
                    return;
                }
            }

            // The day did not match any of the six layers.
            console.log("ERROR");
        }
    });	


<br />
<center>
< 30%: <input type="checkbox" id="check1">
 30-50%: <input type="checkbox" id="check2">
> 50%: <input type="checkbox" id="check3">
<br/>
<input type="button" id="btnCheck" value = "Check" />
<br/>
Swiggity Swoo! If you are liking this app and want to help us out, fill in the survey <a href="https://goo.gl/forms/uiuZsFpV8XU2ZiO32" target="_blank">here.</a> Thanks, it means a lot!
</center>
<a id="various3" href="https://goo.gl/forms/uiuZsFpV8XU2ZiO32"><input id="button7" type="button" value="sneaky_button"/></a>
<script type="text/javascript">document.getElementById('button7').style.visibility = 'hidden';</script>

This function is called on button clicks. When that happens, any existing placemark layer in the layer list is removed and a new layer is added according to which button was pressed. The final piece of HTML after this function refers to the survey that pops up. It also implements the hyperlink on the page that takes users to the survey.
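To illustrate how a button click might end up swapping layers, here is a minimal sketch that runs without a browser or WorldWind. It rests on several assumptions: `fakeWwd` is a stand-in object that mimics only the `addLayer` and `removeLayer` calls used above, the layers are placeholder strings, and `showDay` and `makeDayHandler` are hypothetical helper names, not part of the project's code.

```javascript
// Stand-in for the WorldWindow: only a layer list plus the two methods used here (assumption).
var fakeWwd = {
    layers: [],
    addLayer: function (layer) {
        this.layers.push(layer);
    },
    removeLayer: function (layer) {
        var index = this.layers.indexOf(layer);
        if (index !== -1) {
            this.layers.splice(index, 1);
        }
    }
};

// Placeholder layers; in the real page these would be the six placemark layers.
var dayLayers = ["day0", "day1", "day2", "day3", "day4", "day5"];
var annotations = "annotations";

// Same idea as dayLoaded(): clear every day layer, then add back the requested one.
function showDay(wwd, day) {
    dayLayers.forEach(function (layer) {
        wwd.removeLayer(layer);
    });
    wwd.removeLayer(annotations);

    if (day >= 0 && day < dayLayers.length) {
        wwd.addLayer(dayLayers[day]);
        wwd.addLayer(annotations);
    } else {
        console.log("ERROR");
    }
}

// Hypothetical wiring helper: returns a click handler bound to one day.
function makeDayHandler(wwd, day) {
    return function () {
        showDay(wwd, day);
    };
}
```

In a browser, a handler like this could be attached with `addEventListener("click", makeDayHandler(wwd, 0))` on whichever element serves as the day-0 button. Because every day layer is removed before one is added back, clicking the same button repeatedly never leaves duplicate layers in the list.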