Wildfires Valparaiso
Demo on Integrating geospatial information for effective forest fire prevention
Tools | story map (on-line) | model (960 Mb) | shp (36 Mb)
Motivation
- Fires affect the environment, the population, and the economic activity of a territory.
- Early forecasting of forest fire danger and unraveling the relevant mechanisms affecting their occurrence are essential for efficient management of available resources.
- The availability of satellite image data and machine learning models presents a significant opportunity to address this challenge.
Sentinel 3 image during the Valparaíso fire (February 3, 2024), projected towards urban centers to the north.

Landsat images (SWIR-NIR-Red) from December 20, 2023, before the Valparaíso megafire (top), and from February 5, 2024, after the fire (bottom).

Reference: https://www.pucv.cl/
Based on a thorough review of scientific literature on fire prediction and the following criteria, we selected a study that we aim to implement in Chile:
- Clarity and organization in the presentation of datasets and methodologies used. Published code and data, allowing the reproducibility of the study.
- Prediction of wildfire danger for the following day in a fire-prone region of the Eastern Mediterranean, based on Deep Learning.
- DL models are trained to capture the temporal and spatiotemporal context, generalize well for extreme wildfires, and demonstrate improved performance over the traditional Fire Weather Index (FWI).
- Explainable Artificial Intelligence (xAI) techniques are applied to interpret the model's predictions and understand the mechanisms associated with fire production.
Wildfire Danger Prediction and Understanding With Deep Learning (Kondylatos et al., 2022. GRL)
Data Processing Workflow
The main steps of the data processing workflow are explained below: sampling the points, preparing the data samples used to train the model, training the model, and evaluating the final model.
1. Sampling of data points
In the first stage, we select points within our study region, the Region of Valparaiso in Central Chile, which we use to build the positive and negative class samples for training a machine learning model.
Positive class
We define the positive class as representing the occurrence of a fire. To obtain the samples associated with this class, we randomly sample 400 points within each of the fire geometries used in this study (01_get_sample_points_fire.ipynb). The fires dataset was collected from the National Forest Corporation (CONAF) website and can be accessed from a shared drive (folder incendios_valparaiso).
To ensure high data quality in the positive samples, we discard sampled points located close to the border of the fire geometries, based on the geometry of the satellite pixel from which the data for each sampled point is extracted.
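The sampling step described above can be sketched in plain Python, so that it runs without geospatial libraries. This is not the code in 01_get_sample_points_fire.ipynb: the rectangular perimeter, the 500 m margin, and all function names are hypothetical, and coordinates are assumed to be projected, in metres.

```python
import math
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_boundary(x, y, polygon):
    """Minimum Euclidean distance from (x, y) to any polygon edge."""
    best = math.inf
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        dx, dy = x2 - x1, y2 - y1
        seg_len2 = dx * dx + dy * dy
        if seg_len2 == 0:
            t = 0.0
        else:
            # Projection of the point onto the segment, clamped to [0, 1].
            t = max(0.0, min(1.0, ((x - x1) * dx + (y - y1) * dy) / seg_len2))
        best = min(best, math.hypot(x - (x1 + t * dx), y - (y1 + t * dy)))
    return best


def sample_interior_points(polygon, n_points, margin, seed=0, max_tries=1_000_000):
    """Rejection-sample points inside polygon, at least `margin` from its border."""
    rng = random.Random(seed)
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    for _ in range(max_tries):
        if len(points) == n_points:
            break
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon) and distance_to_boundary(x, y, polygon) >= margin:
            points.append((x, y))
    return points


# Hypothetical fire perimeter (metres) and an assumed margin of one 500 m pixel.
fire = [(0, 0), (4000, 0), (4000, 3000), (0, 3000)]
points = sample_interior_points(fire, n_points=400, margin=500)
```

With real fire perimeters, a geometry library would normally handle the containment and distance tests; the rejection loop stays the same.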
Negative class
To obtain the samples of the negative class, we sample points from within the entire Region of Valparaiso but outside all fire geometries, and we explore two approaches. In the first, we simply consider points outside the fire geometries (negative class 1, see notebook 01_get_sample_points_no_fire.ipynb). The second approach is more restrictive: we select points outside a buffer around the original fire geometries (negative class 2, see notebook 1_get_sample_points_with_dates_no_fire.ipynb).
2. Extraction of data from MODIS satellite
For the sampled points belonging to the positive and negative classes, we use the Google Earth Engine Python API to extract data from several daily MODIS collections, such as Surface Reflectance and Surface Temperature (Day and Night), together with several relevant spectral indices (NDVI, NDWI, EVI, BAI) calculated directly from the appropriate reflectance bands. We additionally extract data from the ERA5 Climate Reanalysis and NASA NASADEM Digital Elevation 30m datasets.
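For reference, the four spectral indices named above can be computed from reflectance bands as in the sketch below. It uses the commonly published formulas; which NDWI variant the notebooks use is not stated here, so the green/NIR (McFeeters) form is an assumption, as are the function name and the example values. Inputs are assumed to be reflectances already scaled to the 0 to 1 range.

```python
def spectral_indices(red, nir, blue, green):
    """Return NDVI, NDWI, EVI and BAI for one pixel of scaled reflectances."""
    ndvi = (nir - red) / (nir + red)
    # Assumption: McFeeters NDWI (green vs. NIR); a NIR/SWIR variant also exists.
    ndwi = (green - nir) / (green + nir)
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    # Burn Area Index: inverse squared distance to a reference "charcoal" point.
    bai = 1.0 / ((0.1 - red) ** 2 + (0.06 - nir) ** 2)
    return {"ndvi": ndvi, "ndwi": ndwi, "evi": evi, "bai": bai}


# Hypothetical reflectances for a vegetated pixel.
idx = spectral_indices(red=0.1, nir=0.5, blue=0.05, green=0.08)
```

The sketch does not guard against zero denominators, which can occur for masked or no-data pixels.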
The data extraction is done in the notebook 02_extract_pixel_data_fire.ipynb for the sampled points in the positive class, and in the notebooks 02_extract_pixel_data_no_fire.ipynb and 02_extract_pixel_data_with_dates_no_fire.ipynb for the sampled points of the negative classes 1 and 2 described above.
For each sampled point, we extract data for a sequence of 10 consecutive days before a predefined prediction day. These sequences form the time series used to train our model. The prediction day of each sequence is defined according to the class. For the positive class and negative class 1, we define the prediction day as the start date of the fire to which each sampled point belongs.... For negative class 2, the prediction day is randomly sampled from a set of possible dates on which no fire occurred in the study region.
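The date logic can be sketched with the standard library alone. The function names and the example dates are hypothetical, and the sketch assumes that the 10-day window ends on the day immediately before the prediction day.

```python
import random
from datetime import date, timedelta


def feature_window(prediction_day, n_days=10):
    """Dates of the n_days consecutive days before prediction_day, oldest first."""
    return [prediction_day - timedelta(days=k) for k in range(n_days, 0, -1)]


def sample_no_fire_day(start, end, fire_days, rng):
    """Pick a random date in [start, end] on which no fire was recorded."""
    n = (end - start).days + 1
    candidates = [start + timedelta(days=i) for i in range(n)]
    candidates = [d for d in candidates if d not in fire_days]
    return rng.choice(candidates)


# Positive class / negative class 1: prediction day = start date of the fire.
fire_start = date(2017, 1, 15)  # hypothetical fire start date
window = feature_window(fire_start)

# Negative class 2: prediction day drawn from the dates without fires.
rng = random.Random(0)
no_fire_day = sample_no_fire_day(
    date(2016, 11, 4), date(2017, 2, 1), fire_days={fire_start}, rng=rng
)
```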
3. Adding extra features
To enrich our dataset, we add features defined as the minimum distances from the sampled points of both classes to various relevant geographic features in the region: water sources, routes, protected wildlife areas, and urban areas (03_create_features_distances.ipynb).
The shapefiles containing these geometries can be downloaded from https://drive.google.com/file/d/1iJqbJbAwkRNWS_f-iDqLk0WyCX_8KI_W/view?usp=sharing.
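As a rough illustration of a minimum-distance feature, the sketch below measures the great-circle distance from a sampled point to the nearest vertex of each layer. This is a simplification and not the method of 03_create_features_distances.ipynb: real geometries are lines and polygons, so the distance to the nearest vertex only approximates the distance to the geometry. The layer contents, coordinates, and function names are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def min_distance_km(point, vertices):
    """Distance from point to the closest vertex in a layer."""
    lat, lon = point
    return min(haversine_km(lat, lon, vlat, vlon) for vlat, vlon in vertices)


# Hypothetical layers, each reduced to a few (lat, lon) vertices.
layers = {
    "water": [(-33.10, -71.50), (-32.90, -71.20)],
    "routes": [(-33.04, -71.58), (-33.00, -71.40)],
    "urban": [(-33.05, -71.62)],
}
sample_point = (-33.05, -71.60)
features = {f"dist_{name}_km": min_distance_km(sample_point, verts)
            for name, verts in layers.items()}
```

Each sampled point gains one distance column per layer, which can then be joined to its time series.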
4. Preparation of model samples
Once we have a complete dataset of sampled points and their features, we prepare the positive and negative class samples that are input to the model training (04_prepare_model_data.ipynb). This process consists of transforming the 10-day time series of each sampled point, with all its associated feature values, into arrays, which are a convenient input format for the model....
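The reshaping can be pictured as turning a long table (one row per point and day) into a nested structure indexed as [sample][day][feature]. The sketch below is only an illustration under assumed column names (point_id, date, label, ndvi, lst_day); it is not the code of 04_prepare_model_data.ipynb.

```python
from collections import defaultdict


def build_sequences(rows, feature_names, n_days=10):
    """Group rows by point and return (X, y, ids) with X[sample][day][feature]."""
    by_point = defaultdict(list)
    for row in rows:
        by_point[row["point_id"]].append(row)

    X, y, ids = [], [], []
    for point_id in sorted(by_point):
        records = by_point[point_id]
        if len(records) != n_days:
            continue  # drop points with an incomplete time series
        records.sort(key=lambda r: r["date"])  # ISO dates sort chronologically
        X.append([[r[name] for name in feature_names] for r in records])
        y.append(records[0]["label"])
        ids.append(point_id)
    return X, y, ids


# Hypothetical long-format table: two points, ten days each.
rows = []
for point_id, label in (("p1", 1), ("p2", 0)):
    for day in range(1, 11):
        rows.append({
            "point_id": point_id,
            "date": f"2017-01-{day:02d}",
            "label": label,
            "ndvi": 0.5 - 0.01 * day,
            "lst_day": 300.0 + day,
        })
rows.reverse()  # row order should not matter

X, y, ids = build_sequences(rows, ["ndvi", "lst_day"])
```

Converting `X` with `numpy.asarray` would give an array of shape (samples, 10, features), the layout that sequence models such as LSTMs typically expect.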
5. Model training
The hypothesis behind the model training is that a fire occurring on day t can be predicted from the conditions observed during the 10 preceding days.
Following the work of Kondylatos et al. (GRL, 2022), we formulate the task as a supervised binary classification problem and train an LSTM model.
To reproduce the training of the LSTM model, please refer to the file model/README.md.
6. Model evaluation
The notebook 06_model_evaluation.ipynb implements the calculation of relevant metrics used to measure the performance of the best trained model on a holdout testing dataset, which is set apart from the training process.
The model metrics can be found in this repository.
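The text does not list which metrics are computed, so the sketch below shows, as an assumption, the usual choices for binary classification: precision, recall, F1 and accuracy derived from the confusion matrix. The function name, the 0.5 threshold, and the example labels and probabilities are hypothetical.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Confusion-matrix metrics for predicted probabilities of the fire class."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical holdout labels (1 = fire) and model probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1, 0.7, 0.4]
metrics = binary_metrics(y_true, y_prob)
```

For fire danger, recall on the positive class is often the metric to watch, since a missed fire usually costs more than a false alarm.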
7. Model prediction
After training and evaluating the model, and for demonstration purposes, we generate predictions for the points of a grid that covers the entire Valparaiso Region, for prediction days from 2016-11-04 to 2017-02-01 (see notebook 07_model_prediction_rHEALPix.ipynb).
These predictions can be accessed from a shared drive (file ../model_predictions/test_classneg_2/prediction_pixels_grid_rhealpix_valparaiso_clipped_all_distances.csv) and visualized with an interactive web application at https://cienciadedatos.inegi.org.mx/laboratorio/wildfires.
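To give a sense of the size of the prediction task, the sketch below enumerates the prediction days stated above and pairs them with grid points. A coarse regular latitude/longitude grid stands in for the rHEALPix cells that the notebook name suggests; the bounding box, the 0.5 degree step, and the function names are hypothetical and only roughly cover the region.

```python
from datetime import date, timedelta
from itertools import product


def date_range(start, end):
    """All dates from start to end, both inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def regular_grid(lat_min, lat_max, lon_min, lon_max, step):
    """Regular (lat, lon) grid; a stand-in for the actual rHEALPix cells."""
    n_lat = int(round((lat_max - lat_min) / step)) + 1
    n_lon = int(round((lon_max - lon_min) / step)) + 1
    lats = [lat_min + i * step for i in range(n_lat)]
    lons = [lon_min + j * step for j in range(n_lon)]
    return list(product(lats, lons))


days = date_range(date(2016, 11, 4), date(2017, 2, 1))
grid = regular_grid(-34.0, -32.0, -72.0, -70.0, step=0.5)

# One model input (a 10-day sequence) is needed per grid point and prediction day.
tasks = list(product(grid, days))
```

The stated date range covers 90 prediction days, so the number of model inputs is 90 times the number of grid cells.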
Future work
In the present work, we introduce an initial version of the trained model, which we plan to continue improving in the near future by:
- Performing a more exhaustive optimization of the model hyperparameters.
- Including additional features in the model, such as other spectral indices that may be relevant for the fire prediction problem.
- Testing our modelling workflow in other regions of interest in Chile or Mexico.