Improving landslide susceptibility mapping in semi-arid regions using machine learning and geospatial techniques

Youssef Bammou 1; Brahim Benzougagh 2; Abdessalam Ouallali 3; Shuraik Kader 4,5; Mustapha Raougua 6; Brahim Igmoullan 1

1, Department of Geology, Faculty of Science and Technology, Laboratory of Geo-Resources, Geo-Environment and Civil Engineering (L3G), Cadi Ayyad University, Marrakech, Morocco

2, Department of Geomorphology and Geomatics, Scientific Institute, Mohammed V University in Rabat, Rabat‑City, Morocco

3, Process Engineering and Environment Laboratory, Hassan II University of Casablanca, Faculty of Sciences and Techniques of Mohammedia, Mohammedia, Morocco

4, School of Engineering and Built Environment, Griffith University, Nathan, QLD 4111, Australia

5, Green Infrastructure Research Labs (GIRLS), Cities Research Institute, Griffith University, Gold Coast, QLD 4215, Australia

6, Department of Geology, Faculty of Science and Technology, Data Science for Sustainable Earth Laboratory (Data4Earth), Sultan Moulay Slimane University, Beni Mellal, Morocco

E-mail:
youssef.bammou@ced.uca.ma

Received: 22/10/2024
Acceptance: 25/01/2025
Available Online: 26/01/2025
Published: 01/07/2025

DYSONA – Applied Science

 

Manuscript link
http://dx.doi.org/10.30493/DAS.2025.484839

Abstract

This research evaluates landslide susceptibility in the Tensift sub-catchment of Morocco. Although landslide susceptibility mapping is a well-established research topic, this study applies machine learning (ML) models to this area while accounting for its distinctive traits and contributing elements. Landslide susceptibility was estimated using seven ML models (KNN, SVM, RF, XGBoost, ANN, LR, and DT), and their results were compared to identify the most suitable ML model for this application. The research combines quantitative and qualitative spatial data, such as slope, elevation, precipitation, and land use, with ML models to map landslide susceptibility. Tolerance (TOL) and variance inflation factor (VIF) analyses were performed to select the conditioning factors most relevant to the accuracy of the landslide susceptibility model. The data were combined with a geographic database of 1,291 landslide and stable (non-landslide) locations in the region, compiled from historical records. The dataset was randomly split into a training set (70%) and a validation set (30%). The models were evaluated using statistical indices and the ROC curve method. The XGBoost model was identified as the best-performing model with an area under the curve (AUC) of 93.41%, followed by RF and KNN with AUC values of 91.09%. In addition, the root mean square error (RMSE) values were relatively low, ranging from a minimum of 0.257 for XGBoost to a maximum of 0.53 for ANN. The results of this study suggest that XGBoost is particularly effective for modeling landslide susceptibility in the Tensift sub-catchment, which may have implications for future research in similar semi-arid regions.

Keywords: Landslide susceptibility, Machine learning, GIS, Tensift, Morocco

Introduction

Landslides are significant geomorphologic and geologic phenomena characterized by mass movements driven by gravitational forces, particularly in mountainous regions [1-3]. Every year they cause significant destruction of infrastructure and loss of life across multiple regions of the world. A comprehensive analysis of 4,862 landslide occurrences from 2004 to 2018 reveals that these events resulted in roughly 60,991 fatalities and considerable economic repercussions, placing landslides among the most destructive natural hazards, comparable to floods and earthquakes [4][5]. The occurrence of landslides has garnered significant interest from governmental bodies and researchers globally, prompting efforts to devise and implement protective strategies due to the substantial risks they present [6].

Investigating the significant elements related to the characteristics of the study area and empirical evidence is essential for reducing the effects of landslides [7]. Various susceptibility models can be formulated from existing data to evaluate the probability of landslide occurrence in a specific region. Such models assess the likelihood of potential landslide events using methods that include direct mapping and expert evaluation [8]. The development of these methods coincided with the emergence of spatial data processing and analysis algorithms, which gained prominence in the 1990s [9-11].

Landslides in Morocco are predominantly observed in the northern regions, particularly within the Rifain overthrust [12]. However, they also manifest in other areas, including the Tensift watershed in the High Atlas of Marrakech. Notably, the Tizi N’tichka national road, which links Marrakech to Ouarzazate, has been affected [13]. A significant landslide event in the village of Ijoukak in July 2019 resulted in the tragic loss of over 20 lives. Identifying regions susceptible to landslides is essential for the implementation of effective prevention strategies and risk management practices.

Extensive research conducted in Morocco indicates that the seven primary catchments of Tensift are recognized as regions significantly vulnerable to a range of natural hazards [14-16]. This region is defined by its steep inclines and high altitudes, which contribute to the potential for landslide hazards. The spatial analyses reveal the potential hazards posed by landslides to both infrastructure and human safety. Anticipating vulnerability to landslides represents a critical strategy for addressing this issue effectively.

The spatial prediction of landslide susceptibility is underpinned by a range of physical modeling methods and techniques employed by researchers for simulating and mapping landslide susceptibility [17]. Developing the database of rock properties that such models require demands considerable time and rigorous research effort. Furthermore, a majority of scholars have shown that this model category behaves irregularly across extensive spatial regions [18][19] and presents additional anomalies typically associated with the weighting of the chosen factors and the extent of their impact on landslides. A variety of statistical models have also been developed and implemented on a global scale. These models require remote sensing data alongside geographic information systems (GIS). They rely primarily on parameterization techniques, which include the frequency ratio (FR) [13], the value of information (VOI) [20], the analytic hierarchy process (AHP) [21], the index of entropy (IoE) [22], multi-criteria decision analysis (MCDA) [23][24], and the statistical index (SI) [25]. The integration of landslide inventories with a range of influencing factors, including precipitation, soil type and texture, and fault direction, represents a critical advancement in the development of these models and has significantly streamlined the estimation of landslide susceptibility. The dependability of these approaches is primarily attributed to the availability of remote sensing data, including terrain geomorphology, water accumulation, and topography, which enhance the efficacy of these large-scale models.

Recently, a variety of machine learning algorithms have emerged for predicting landslide susceptibility through classification or regression techniques. Those employed for mapping landslide-prone areas include Support Vector Machines (SVM) [26][27], K-Nearest Neighbors (KNN) [28], Random Forest (RF) [29], Extreme Gradient Boosting (XGBoost) [30], Artificial Neural Networks (ANN) [31], Decision Trees (DT) [27], and Logistic Regression (LR) [32].

This study aims to evaluate the effectiveness of these techniques in Morocco, specifically in the mountainous region of the High Atlas of Marrakech, which is recognized for its historical and contemporary landslide occurrences. Seven machine learning algorithms were employed for the mapping process: Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Artificial Neural Network (ANN), Decision Tree (DT), and Logistic Regression (LR). Notably, these algorithms have not previously been investigated for landslide simulation in the specified area, and their performance has not been compared across the various sub-catchments of the study area, which represents a significant contribution of this research. The resulting landslide susceptibility maps delineate the regions prone to landslides, providing essential insights for planners to identify optimal locations for future development projects aimed at mitigating this risk.

This study’s findings contribute to the broader academic discourse, emphasizing the limitations encountered and proposing avenues for addressing these challenges in subsequent investigations. The originality of this research lies in its thorough assessment and comparison of various machine learning models for estimating landslide susceptibility, an approach applicable not only to Morocco but also beyond its geographical confines, taking into account climatic, geological, and morphological variations. The methodology employs a dataset comprising 14 conditioning factors alongside historical landslide records, integrated with rigorous statistical analysis. Performance is assessed with a range of statistical indices alongside the ROC curve method.

Material and Methods

Study area

The Tensift watershed is located in west-central Morocco, covering an area of 18,210 km². It typically comprises seven distinct sub-catchments. This study focuses on the R’dat, Zat, Ourika, Rheraya, N’Fis, El Mal, and Chichaoua catchments, extending from east to west of the watershed (Fig. 1) (Table 1).

Figure 1. Geographical location of the Tensift watershed (A) and its sub-basins (R’dat, Zat, Ourika, Rheraya, N’Fis, El Mal, and Chichaoua catchments, extending from east to west) (B and C). Panel B shows the altitude map of the area, with red indicating mountainous, high-altitude regions and green indicating low-altitude regions.
Table 1. Physiographic and geological characteristics of the seven Atlas sub-basins

Landslide inventory map

This study established a database of landslide locations and stable sites following a comprehensive survey of the majority of the study area using a GPS device, complemented by an analysis of Google Earth imagery for inaccessible regions. A total of 1,291 locations were documented, with 620 classified as landslide areas and the remaining 671 as stable locations (Fig. 2). In landslide modeling studies, the partitioning of inventory data typically depends on the quality and availability of the data, and a 70/30% split between training and validation datasets is prevalent in the literature. A binary coding scheme (0, 1) was employed, where 1 denotes landslide-related pixels and 0 signifies non-landslide-related pixels. Subsequently, the landslide inventory map was converted to raster data with a resolution of 30 m.
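The 70/30 partition of the coded inventory can be illustrated with a minimal sketch. It assumes a class-stratified random split (the text only says the split was random), a fixed seed, and a hypothetical list of point records; none of these details come from the study itself.

```python
import random

def split_inventory(points, train_fraction=0.7, seed=42):
    """Randomly split labelled inventory points into training and validation sets.

    Assumption: the split is stratified by class so that both sets keep the
    landslide / stable proportions of the full inventory.
    """
    rng = random.Random(seed)
    train, valid = [], []
    for label in (0, 1):
        group = [p for p in points if p["label"] == label]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        valid.extend(group[cut:])
    return train, valid

# Hypothetical inventory matching the reported counts:
# 620 landslide points (label 1) and 671 stable points (label 0).
inventory = [{"id": i, "label": 1} for i in range(620)]
inventory += [{"id": 620 + i, "label": 0} for i in range(671)]

train_set, valid_set = split_inventory(inventory)
print(len(train_set), len(valid_set))
```

Stratifying is one reasonable choice when the two classes are nearly balanced, as they are here; a plain shuffle of all 1,291 points would give a similar result.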

Figure 2. Landslide inventories in the seven Tensift sub-basins

Data collection

The creation of the landslide susceptibility grids was based on different data sources (Table 2).

Table 2. Data and sources used in the creation of landslide susceptibility grids

Landslide conditioning factors (LCFs)

There are many elements that favor the occurrence of landslides [33]. The choice of these criteria is predominantly associated with the distinct attributes of the study region, the utilized methodology, and the type and magnitude of landslides. This guarantees the development of a reliable and verified landslide susceptibility map through the appropriate execution of the selected methodology [34].

Fourteen causative factors were identified in this investigation (Fig. 3). These are founded on the most indicative local spatial, climatic, and morphological characteristics. UTM zone 29N was utilized for the georeferencing of all input data in this investigation. Aspect is a critical factor in landslide occurrences [35][36] and precisely denotes the direction of the maximum slope of the earth’s surface [37]. It is categorized as flat, north, northeast, east, southeast, south, southwest, west, and northwest (Fig. 3 A).

The type of soil significantly influences infiltration, runoff, and evaporation processes, which may affect site stability. This study uses soil types from the FAO’s Harmonized World Soil Database (HWSD), categorized into five classes based on coarse sand, silt/clay, and organic carbon contents (Fig. 3 B).

Proximity to roads is an external anthropogenic factor: road construction creates cut slopes that reduce slope stability and disrupt the region’s topography [38]. According to their distance from roads, the Tensift sub-catchments were categorized into seven classes with intervals of 100 m (<100 m, 100-200 m, 200-300 m, 300-400 m, 400-500 m, 500-600 m, and >600 m) (Fig. 3 C).

The topographic position index (TPI) expresses the difference between the elevation of a cell and the mean elevation of its neighboring cells [39]. Generally, high positive TPI values correspond to ridges, negative values indicate valleys, and values near zero describe flat terrain or slopes of constant gradient. The TPI of the Tensift sub-catchments exhibits a combination of negative and positive values, spanning from -65.23 to 86.87 (Fig. 3 D).
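As a minimal sketch of how the sign of TPI relates to landform, the snippet below computes it for the central cell of three hypothetical 3x3 elevation grids. The window size and the grids are illustrative assumptions; the study does not state the neighborhood used.

```python
def tpi(dem, row, col):
    """TPI of one cell: its elevation minus the mean elevation of its
    neighbours in a 3x3 window (the window is clipped at grid edges)."""
    rows, cols = len(dem), len(dem[0])
    neighbours = [
        dem[r][c]
        for r in range(max(0, row - 1), min(rows, row + 2))
        for c in range(max(0, col - 1), min(cols, col + 2))
        if (r, c) != (row, col)
    ]
    return dem[row][col] - sum(neighbours) / len(neighbours)

# Hypothetical elevation grids in metres
ridge = [[100, 100, 100], [100, 120, 100], [100, 100, 100]]
valley = [[100, 100, 100], [100, 80, 100], [100, 100, 100]]
plane = [[100, 110, 120], [100, 110, 120], [100, 110, 120]]

print(tpi(ridge, 1, 1))   # positive: the cell stands above its surroundings
print(tpi(valley, 1, 1))  # negative: the cell lies below its surroundings
print(tpi(plane, 1, 1))   # zero: constant slope
```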

The topographic wetness index (TWI) quantifies the topographic control on drainage and water accumulation [18]. The TWI values in this study were derived from the same DEM and classified into six categories (<5.03, 5.03-6.46, 6.46-8.23, 8.23-10.81, 10.81-14.73, and >14.73) (Fig. 3 E).
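TWI is commonly defined as ln(a / tan β), where a is the specific catchment area and β the local slope. The sketch below assumes that definition (the study does not spell out its formula), adds a small minimum slope to avoid division by zero on flat cells, and assigns the six classes listed above; the input values are hypothetical.

```python
import math
from bisect import bisect_right

# Class boundaries taken from the six TWI categories listed in the text
TWI_BREAKS = [5.03, 6.46, 8.23, 10.81, 14.73]

def twi(specific_catchment_area, slope_degrees, min_slope_degrees=0.1):
    """TWI = ln(a / tan(beta)); min_slope_degrees is an assumed safeguard
    for flat cells, where tan(beta) would otherwise be zero."""
    beta = math.radians(max(slope_degrees, min_slope_degrees))
    return math.log(specific_catchment_area / math.tan(beta))

def twi_class(value):
    """Return the class number (1 to 6) for a TWI value."""
    return bisect_right(TWI_BREAKS, value) + 1

# Hypothetical cell: 1000 m^2 per metre of contour width on a 5 degree slope
value = twi(1000.0, 5.0)
print(value, twi_class(value))
```

Gentle slopes with large contributing areas give high TWI values, which is why the index is used as a proxy for where water tends to accumulate.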

Faults are critical elements that directly affect vulnerability to ground movement. They significantly influence the degree of slope instability in fault-prone regions [6]. Numerous landslides are prevalent in regions adjacent to lineaments. This investigation delineated seven zones around the faults in the study area (<100, 100-200, 200-300, 300-400, 400-500, 500-600, and >600) (Fig. 3 F).

The normalized difference vegetation index (NDVI) measures vegetation cover in a certain area. The yearly NDVI map was produced using Sentinel-2 survey data from January 1, 2022, to December 31, 2022, via the Google Earth Engine platform. The resultant data was classified into five subclasses (-0.85-0, 0-0.10, 0.10-0.2, 0.2-0.3, and 0.3-1) (Fig. 3 G).
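NDVI is computed from near-infrared and red reflectance as (NIR - Red) / (NIR + Red); for Sentinel-2 these are bands B8 and B4. The sketch below applies that standard formula and the five subclasses listed above. The reflectance values are hypothetical, and the study's Google Earth Engine processing is not reproduced here.

```python
from bisect import bisect_right

# Upper boundaries separating the five NDVI subclasses listed in the text
NDVI_BREAKS = [0.0, 0.10, 0.2, 0.3]

def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    if nir + red == 0:
        return 0.0  # assumed convention for cells with no signal
    return (nir - red) / (nir + red)

def ndvi_class(value):
    """Return the subclass number (1 to 5) for an NDVI value."""
    return bisect_right(NDVI_BREAKS, value) + 1

# Hypothetical surface reflectances
vegetated = ndvi(nir=0.30, red=0.10)
water = ndvi(nir=0.02, red=0.05)
print(vegetated, ndvi_class(vegetated))
print(water, ndvi_class(water))
```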

The land use classification within the study area includes water bodies, agricultural fields, barren ground, forests, cultivated land, and pastures (Fig. 3 H). Previous studies demonstrate that areas with dense vegetation generally show reduced vulnerability to landslides compared with sparsely vegetated locations [39].

Precipitation can induce landslides by decreasing the safety factor of slopes [40]. Shear strength is reduced as rainfall infiltrates, weakening the slope materials [41]. Rainfall records supplied by the Tensift Water Basin Agency (ABHT) for nine stations were used to create a rainfall map by interpolating mean annual rainfall with the kriging method. The outcome was categorized into eight subclasses (<154, 154-190, 190-230, 230-270, 270-320, 320-350, 350-390, 390-400) (Fig. 3 I).

Proximity to rivers is significant in mountainous terrain, as waterways are crucial in the occurrence of landslides [37][42]. This study categorized the sub-catchments of Tensift into seven classes based on distance intervals of 100 m (<100 m, 100-200 m, 200-300 m, 300-400 m, 400-500 m, 500-600 m, and >600 m) (Fig. 3 J).

The elevation influences the spatial arrangement of slope forces [43]. Elevation of the study area varies from 385 to 4180 m. This variation was grouped into seven subclasses (<400, 700-1100, 1100-1900, 1900-2300, 2300-2700, 2700-4100 and >4100 m) (Fig. 3 K).

Curvature denotes the rate of change of the slope and affects surface flow velocity, which may result in sediment displacement downslope [44]. Curvature was derived in ArcGIS 10.5 from a digital elevation model (DEM) and classified as concave, flat, or convex (Fig. 3 L).

Slope is a crucial element frequently included in landslide susceptibility mappings, profoundly influencing slope stability [45]. The slope map for the Tensift sub-catchments was created using a DEM and categorized into six classifications (<7, 7-15, 15-25, 25-35, 35-40, and 40-71 degrees) (Fig. 3 M).

The lithology of the study area comprises various geological formations and facies (Table 1), which may influence the extent and severity of susceptibility to landslides. The lithological map is derived from the digitization of the 1:500,000 geological map of Morocco (Fig. 3 N).

Figure 3. The prepared landslide conditioning factor maps of the Tensift watershed sub-basins: Aspect (A), Soil type (B), Distance to roads (C), TPI (D), TWI (E), Distance to faults (F), NDVI (G), LULC (H), Rainfall (I), Distance to rivers (J), Elevation (K), Curvature profile (L), Slope (M), and Lithology (N).

Selection of landslide factors

To improve the machine learning predictions of landslide susceptibility, the conditioning factors supplied to the seven predictive models were subjected to statistical analyses to uncover strong linear correlations among them. The assessments comprised correlation matrix (CM) analysis, variance inflation factor (VIF) (Eq. 1), tolerance (TOL) (Eq. 2), and mutual information (MI) (Eq. 3). The analysis sought to identify and eliminate non-significant factors. Substantial multicollinearity among factors is indicated by VIF values above ten and TOL values below 0.1 [46]. Where two variables were strongly correlated in the CM analysis and satisfied the multicollinearity criterion, the variable with the greater VIF value was discarded. MI analysis ranked the importance of the factors contributing to landslides, with low MI values indicating negligible influence and warranting exclusion.

$$\mathrm{VIF}_j = \frac{1}{\mathrm{Tol}_j} \tag{1}$$

$$\mathrm{Tol}_j = 1 - R_j^2 \tag{2}$$

$$\mathrm{MI}(n; j) = H(n) - H(n \mid j) \tag{3}$$

Where:

j: the LS (landslide susceptibility) influence factor,

n: the subclass of LS influence factors,

Tol j: the tolerance of j,

VIF j: the variance inflation factor of j,

MI (n; j): the mutual information for n and j,

R² j: the coefficient of determination of the regression of factor j on all other conditioning factors,

H(n): the entropy of n,

H (n|j): the conditional entropy of n given the state of factor j.
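The three screening quantities can be sketched in a few lines. The snippet assumes that the R² of each factor has already been obtained from a regression on the other factors (not shown), applies the thresholds quoted above (VIF > 10, TOL < 0.1), and estimates mutual information from discrete labels using base-2 logarithms. The function names, the log base, and the example data are illustrative assumptions.

```python
import math
from collections import Counter

def tolerance_and_vif(r_squared):
    """Tol = 1 - R^2 and VIF = 1 / Tol for one conditioning factor."""
    tol = 1.0 - r_squared
    return tol, 1.0 / tol

def is_collinear(r_squared, vif_limit=10.0, tol_limit=0.1):
    """Apply the multicollinearity thresholds quoted in the text."""
    tol, vif = tolerance_and_vif(r_squared)
    return vif > vif_limit or tol < tol_limit

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def mutual_information(n_labels, j_classes):
    """MI(n; j) = H(n) - H(n | j) for two equal-length discrete sequences."""
    total = len(n_labels)
    conditional = 0.0
    for cls in set(j_classes):
        subset = [n for n, j in zip(n_labels, j_classes) if j == cls]
        conditional += len(subset) / total * entropy(subset)
    return entropy(n_labels) - conditional

# Hypothetical examples
print(tolerance_and_vif(0.70), is_collinear(0.70))
print(tolerance_and_vif(0.95), is_collinear(0.95))
landslide = [1, 1, 0, 0]
print(mutual_information(landslide, ["steep", "steep", "flat", "flat"]))
print(mutual_information([1, 0, 1, 0], ["steep", "steep", "flat", "flat"]))
```

A factor whose classes fully separate landslide from stable points carries the maximum information (1 bit for two balanced classes), while a factor unrelated to the labels carries none.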

The selection of landslide susceptibility (LS) influencing factors and the model application relied on the normalized frequency ratio (NFR) (Eq. 4), a method recently advocated to standardize the interpretation of input data across factors [47][48]. The frequency ratio (FR) (Eq. 5) was first used to quantify the relationship between landslide locations and each subclass of the factors influencing LS [49]. The FR values were then normalized, transforming all maps to an NFR scale from 0 (representing low landslide susceptibility) to 1 (representing high landslide susceptibility).

$$\mathrm{NFR}_n = \frac{\mathrm{FR}_n - \min(\mathrm{FR})}{\max(\mathrm{FR}) - \min(\mathrm{FR})} \tag{4}$$

$$\mathrm{FR}_n = \frac{W_n / W_t}{P_n / P_t} \tag{5}$$

Where: n represents the subclass of factors influencing landslide susceptibility (LS), FRn denotes the frequency ratio of n, NFRn signifies the normalized frequency ratio of n, Wn indicates the number of landslide points located in n, Wt stands for the total number of landslide points, Pn represents the number of pixels in n, and Pt represents the total number of all pixels.
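A short sketch of the frequency ratio calculation for one conditioning factor follows. The frequency ratio is the share of landslide points in a subclass divided by the share of pixels in that subclass. The normalization is assumed here to be a min-max rescaling to the 0 to 1 range described above, which is an assumption rather than something the text confirms; the counts are hypothetical.

```python
def frequency_ratios(landslides_per_class, pixels_per_class):
    """FR_n = (W_n / W_t) / (P_n / P_t) for every subclass n of one factor."""
    w_total = sum(landslides_per_class)
    p_total = sum(pixels_per_class)
    return [
        (w / w_total) / (p / p_total)
        for w, p in zip(landslides_per_class, pixels_per_class)
    ]

def normalise(fr_values):
    """Assumed min-max normalisation of FR values to the range 0 to 1."""
    lowest, highest = min(fr_values), max(fr_values)
    if highest == lowest:
        return [0.0 for _ in fr_values]
    return [(fr - lowest) / (highest - lowest) for fr in fr_values]

# Hypothetical factor with three subclasses (e.g. gentle, moderate, steep slope)
landslide_points = [10, 30, 60]
pixels = [5000, 3000, 2000]

fr = frequency_ratios(landslide_points, pixels)
nfr = normalise(fr)
print(fr)
print(nfr)
```

An FR above 1 means a subclass holds more landslides than its areal share would suggest, so the steepest hypothetical class ends up with the highest normalized value.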

The LS influencing factors were categorized into subclasses by analyzing maps generated using the Jenks natural breaks method [50]. However, categorical factors such as aspect, land use and land cover (LULC), soil type, and lithology were treated differently. Aspect was categorized by directional units, LULC was classified using supervised classification, and soil and lithology were categorized according to their respective mapped units.

Landslide dataset preparation

The methodology employed for generating landslide susceptibility maps in this study is outlined in Fig. 4.

Figure 4. The study workflow

Initially, the database was constructed from the landslide inventories (Fig. 2) and the 14 factors influencing landslide susceptibility (LS) (Fig. 3). The database was converted into a numerical representation using the frequency ratio (FR) approach to clarify the relationships between the factors and LS. Subsequently, multicollinearity assessments, comprising correlation matrix (CM) analysis, variance inflation factors (VIF), tolerances (Tol), and mutual information (MI) tests, were performed to identify the key factors influencing LS.

In the second phase, the performance and effectiveness of seven algorithms (Decision Tree (DT), Artificial Neural Network (ANN), Logistic Regression (LR), k-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest (RF), and XGBoost) were assessed using validation metrics including specificity, sensitivity, false positive rate, precision, F1 score, accuracy, mean absolute error (MAE), root mean square error (RMSE), and the area under the receiver operating characteristic curve (AUC-ROC).

The database is partitioned into training and validation datasets, allocating 70% of the total data samples for training and 30% for validation, respectively, to produce LS maps. External validation of the random sample for each site is performed utilizing ArcGIS 10.5.1 software to guarantee an impartial sampling methodology.

Modeling approach based on ML models

This research is based on seven algorithms used for landslide susceptibility assessment: SVM, RF, LR, KNN, DT, ANN, and XGBoost (Table 3).

Table 3. Description of the algorithms applied in this study
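Of the seven algorithms, KNN is simple enough to sketch without external libraries. The example below is illustrative only: it assumes Euclidean distance, k = 3, and two hypothetical normalized features per point, none of which are settings reported by the study. The fraction of landslide neighbours serves as a susceptibility score between 0 and 1.

```python
import math
from collections import Counter

def knn_susceptibility(train_x, train_y, query, k=3):
    """Fraction of the k nearest training points that are landslides (label 1)."""
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes[1] / k

# Hypothetical training points: (normalized slope, normalized elevation)
train_x = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3)]
train_y = [1, 1, 1, 0, 0, 0]

print(knn_susceptibility(train_x, train_y, (0.85, 0.85)))
print(knn_susceptibility(train_x, train_y, (0.15, 0.15)))
```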

Model evaluation measures

The developed models were validated using various performance measures, including specificity (Eq. 6), sensitivity (Eq. 7), accuracy (Eq. 8), and precision (Eq. 9).

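$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{6}$$

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{7}$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{8}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{9}$$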

Here, TP, TN, FP, and FN denote true positive, true negative, false positive, and false negative results, respectively. The receiver operating characteristic (ROC) curve was also used in the analysis, with the area under the curve (AUC) serving as the measure of the predictive accuracy of the models. RMSE and MAE were also computed. Both types of indices have been used in numerous landslide susceptibility studies.

$$\mathrm{AUC} = \frac{\sum TP + \sum TN}{P + N}$$

Here, P and N denote the total number of pixels with and without landslides, respectively. TP is the number of true positives, and TN is the number of true negatives.
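For readers who want to see how an area under the ROC curve is obtained from predicted scores, the sketch below uses the rank-based definition: the probability that a randomly chosen landslide sample receives a higher score than a randomly chosen stable sample, with ties counted as one half. This is the standard ROC-area definition and is offered as an illustration, not as the exact computation used in the study; the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and
    negative scores (ties count as half a win)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical validation samples: 1 = landslide, 0 = stable
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(roc_auc(labels, scores))
```

In this example one landslide sample (score 0.4) ranks below one stable sample (score 0.5), so eight of the nine landslide-stable pairs are ordered correctly.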

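$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( X_{\mathrm{predicted},i} - X_{\mathrm{actual},i} \right)^2}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| X_{\mathrm{predicted},i} - X_{\mathrm{actual},i} \right|$$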

Where n represents the total number of samples in either the training or testing phase, Xpredicted denotes the value predicted by the landslide susceptibility model, and Xactual represents the observed value.
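The confusion-matrix indices and the two error measures can be computed as in the following sketch. It uses the standard definitions of specificity, sensitivity, accuracy, precision, RMSE, and MAE; the counts and predicted probabilities are hypothetical and do not come from the study.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Confusion-matrix indices for a binary susceptibility model."""
    return {
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
    }

def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(
        sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def mae(actual, predicted):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical confusion-matrix counts and predicted probabilities
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
observed = [1, 1, 0, 0]
predicted = [0.9, 0.6, 0.2, 0.1]

print(metrics)
print(rmse(observed, predicted), mae(observed, predicted))
```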

Results

Multicollinearity and factor selection

Multicollinearity analysis showed significant correlations among multiple parameters (Fig. 5). A strong positive correlation value of 0.63 was noted between the distance from roads and elevation. Moreover, significant linear correlations were observed between NDVI and precipitation, elevation and slope, lithology and elevation, slope and lithology, as well as between distances to roads and rivers.

Figure 5. Pearson’s correlation coefficient (PCC) matrix among the included conditioning factors

Additionally, tolerance and variance inflation factor (VIF) analyses were performed to evaluate multicollinearity among the contributing variables. The tolerance (Tol) values varied from 0.30 to 0.92, with elevation showing the lowest value and land use and land cover (LULC) the highest. Correspondingly, the highest reported VIF value was 3.30 for elevation, whilst the lowest was 1.08 for LULC (Fig. 6 A).
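Because VIF is by definition the reciprocal of tolerance, the reported extremes can be cross-checked. A quick worked check, assuming the rounded values given in the text:

$$\frac{1}{0.30} \approx 3.33, \qquad \frac{1}{0.92} \approx 1.09$$

These agree with the reported VIF values of 3.30 and 1.08 to within rounding, and both lie far from the multicollinearity thresholds (VIF > 10, TOL < 0.1).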

The mutual information (MI) values of the 14 factors (Fig. 6 B) are all positive, ranging from 0.132 (slope) to 0.021 (LULC). Slope is the primary determinant, followed by elevation (0.120), lithology (0.118), distance from roads (0.095), and distance from faults (0.091).

Figure 6. Variance inflation factor (VIF) and tolerance (TOL) of the included conditioning factors (A), and relative importance (mutual information) of the conditioning factors used in this study (B)

Landslide susceptibility mapping

The landslide susceptibility (LS) models were built with seven different algorithms. Each algorithm generated probability predictions ranging from 0 to 1, indicating the lowest and highest LS values, respectively. To make the maps easier to interpret, they were divided into five zones using the Jenks natural breaks classification: very low, low, moderate, high, and very high landslide susceptibility.
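The Jenks natural breaks method chooses class boundaries that minimize the squared deviation of values within each class. The sketch below implements this with a small dynamic program and assigns the five susceptibility labels; it is meant for short illustrative lists (its running time grows with the square of the number of values) and is not the GIS implementation used in the study. The probabilities are hypothetical.

```python
from bisect import bisect_left

LABELS = ["very low", "low", "moderate", "high", "very high"]

def jenks_upper_bounds(values, n_classes):
    """Upper bound of each class under the minimum within-class
    squared-deviation partition of the sorted values."""
    data = sorted(values)
    n = len(data)
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean of data[i:j]."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[k][j]: lowest cost of splitting the first j values into k classes
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    split[k][j] = i
    # Walk back through the stored split points to recover class bounds
    bounds, j = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = split[k][j]
    return bounds[::-1]

def susceptibility_zone(probability, bounds):
    """Map a probability to one of the five susceptibility labels."""
    return LABELS[min(bisect_left(bounds, probability), len(LABELS) - 1)]

# Hypothetical predicted probabilities
probabilities = [0.02, 0.05, 0.21, 0.25, 0.48, 0.52, 0.71, 0.75, 0.93, 0.97]
bounds = jenks_upper_bounds(probabilities, 5)
print(bounds)
print(susceptibility_zone(0.5, bounds))
```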

Initial visual inspection of the maps (Fig. 7) revealed certain trends. Areas with very high LS values are concentrated in the center of the Tensift sub-basins, and these values become more critical downstream, especially in the Chichaoua sub-basin. Isolated occurrences of very high LS values were observed in the western areas. In contrast, regions with very low LS values were generally found in areas of low elevation and gentle slope. This initial visual analysis provides valuable information on the spatial distribution of LS across the Tensift sub-catchments, highlighting areas of high and low susceptibility. The data extracted from these maps are essential for understanding potential risks and directing management interventions within the study area.

Notable classification similarities are evident among the employed algorithms. The similarities were most evident among RF (Fig. 7 B), KNN (Fig. 7 D), DT (Fig. 7 E), and XGBoost (Fig. 7 G). The percentages of each landslide susceptibility class in the sub-basins of the Tensift Basin were compared across the classification methods (Fig. 8). All algorithms concurred that the Chichaoua sub-basin predominantly comprised regions with very low to low landslide susceptibility (Fig. 8 B). Conversely, all models indicated the predominance of regions with high and very high susceptibility in the Ourika (Fig. 8 A), R’dat (Fig. 8 E), and N’Fis (Fig. 8 F) sub-basins. Notably, certain sub-basins exhibited significant discrepancies in classification among the employed algorithms. The area of the El Mal sub-basin was evenly distributed across all landslide susceptibility categories in every method except the ANN, which classified the majority of this sub-basin as highly or very highly susceptible (Fig. 8 G).

Comparing the areas susceptible to landslide risk (Fig. 9) reveals that the SVM (Fig. 9 A), RF (Fig. 9 B), LR (Fig. 9 C), KNN (Fig. 9 D), DT (Fig. 9 E), and XGBoost (Fig. 9 G) models indicate that the R’dat, Zat, and Ourika sub-basins are high-risk zones, with risk values ranging from 18.27% to 61.48%, 22.86% to 31.95%, and 10.34% to 44.55%, respectively. The ANN model (Fig. 9 F) indicates that 51.51% of the N’Fis sub-catchment, 40.98% of the Ourika sub-catchment, and 35.83% of the Rheraya sub-catchment are susceptible to landslide risk.

Figure 7. Landslide susceptibility maps of the Tensift sub-basins, estimated by the Support Vector Machine (SVM) (A), Random Forest (RF) (B), Logistic Regression (LR) (C), k-Nearest Neighbors (KNN) (D), Decision Tree (DT) (E), Artificial Neural Network (ANN) (F), and XGBoost (G) algorithms.
Figure 8. Percentages of landslide susceptibility coverage in the Tensift Basin sub-basins according to the classification algorithm used
Figure 9. Percentage areas exposed to landslides in the Tensift sub-catchments using Support Vector Machine (SVM) (A), Random Forest (RF) (B), Logistic Regression (LR) (C), k-Nearest Neighbors (KNN) (D), Decision Tree (DT) (E), Artificial Neural Network (ANN) (F), and XGBoost (G) algorithms, and an overall comparison of the algorithms used (H). The colors in plots A-G indicate landslide susceptibility, with green denoting lower susceptibility and red denoting higher susceptibility

Validation and comparison of models

Model performance on the training (70%) and validation (30%) data was evaluated using several performance metrics, including Precision (Pr), Sensitivity (Se), Specificity (Sp), Accuracy (Ac), F1 score, False Positive Rate (FPR), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and the Area Under the Receiver Operating Characteristic curve (AUC-ROC).

The XGBoost model performed best on the training dataset, achieving a Precision Pr=0.971, a Sensitivity (recall) Se=0.985, a Specificity Sp=0.966, an Accuracy Ac=0.976, an F1 score of 0.978, an RMSE of 0.233, a False Positive Rate (FPR) of 0.03, a Mean Absolute Error MAE=0.05, and an Area Under the Curve AUC=94.57% (Table 4). The Random Forest (RF) and k-Nearest Neighbors (KNN) models also performed strongly, with Pr=0.958, Se (recall)=0.895, Sp=0.94, Ac=0.915, F1 score=0.925, FPR=0.057, MAE=0.085, and AUC=91.47%. The Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), and Artificial Neural Network (ANN) models exhibited average to high performance, with sensitivity, specificity, precision, accuracy, area under the curve, and F1 scores exceeding 0.70, alongside low false positive rates and mean absolute errors.
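The reported F1 scores can be cross-checked from the reported precision and sensitivity, since F1 is their harmonic mean; likewise, the false positive rate is one minus the specificity. The sketch below applies these standard identities to the rounded training values quoted above.

```python
def f1_score(precision, sensitivity):
    """F1 score: harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)

def false_positive_rate(specificity):
    """FPR is the complement of specificity."""
    return 1 - specificity

# Rounded training-set values quoted in the text
print(round(f1_score(0.971, 0.985), 3))        # XGBoost, reported F1 = 0.978
print(round(f1_score(0.958, 0.895), 3))        # RF and KNN, reported F1 = 0.925
print(round(false_positive_rate(0.966), 3))    # XGBoost, reported FPR = 0.03
```

Both recomputed F1 values match the reported ones to three decimals, and the recomputed XGBoost FPR of 0.034 is consistent with the reported 0.03 after rounding.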

All models also performed well on the validation dataset. Sensitivity values ranged from a minimum of 0.71 for LR to a maximum of 0.95 for XGBoost (Table 5). The XGBoost model attained the highest AUC (93.41%), whereas the SVM model had the lowest (70.03%) (Fig. 10). The low RMSE of XGBoost indicates close agreement between the observed and predicted values, and therefore a reliable estimate of the susceptibility probability.
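For readers less familiar with the AUC, the sketch below shows one standard way to obtain it: the area under the ROC curve equals the probability that a randomly chosen landslide point receives a higher score than a randomly chosen landslide-free point, with ties counted as one half. The function name `roc_auc` and the toy scores are hypothetical and serve only to illustrate the definition; this is not the code used in the study.

```python
def roc_auc(y_true, y_score):
    """Pairwise (rank-based) AUC: fraction of positive/negative pairs
    in which the positive sample has the higher score (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy scores, purely illustrative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
auc = roc_auc(y_true, y_score)  # 15 of the 16 pairs are ordered correctly: 0.9375
```

Under this reading, an AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes, which is why the measure does not depend on any single classification threshold.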

Table 4. Assessment of the performance of the studied algorithms using training data.
Table 5. Assessment of the performance of the studied algorithms using validation data
Figure 10. ROC curve analysis of the different landslide susceptibility models using training (A) and validation (B) data.

Discussion

This research employed seven machine learning models to assess landslide susceptibility in the Tensift sub-basins. A range of conditioning factors was identified and linked to a database derived from a historical landslide inventory. The CM, tolerance (TOL), variance inflation factor (VIF), and MI analyses indicated that all 14 identified factors can affect landslide occurrence in the study area.
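To make the TOL and VIF screening more concrete, the sketch below computes both quantities for a small set of factor columns. It relies on the identity that the VIF of factor j equals the j-th diagonal element of the inverse of the factors' correlation matrix, which is equivalent to 1 / (1 - R²) from regressing factor j on the remaining factors, with TOL = 1 / VIF. The function names, the toy values, and the factor labels are hypothetical. A commonly cited rule of thumb treats VIF above 10 (TOL below 0.1) as a sign of problematic multicollinearity; this is a general convention and is not necessarily the threshold applied in the study.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of equally long numeric columns."""
    n = len(columns[0])
    centered = [[v - sum(c) / n for v in c] for c in columns]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(k)] for i in range(k)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vif_and_tolerance(columns):
    """Return (VIF list, TOL list), one value per factor column."""
    inverse = invert(correlation_matrix(columns))
    vif = [inverse[j][j] for j in range(len(columns))]
    tol = [1.0 / v for v in vif]
    return vif, tol

# Toy factor columns (hypothetical values, not the study's data).
slope = [1.0, 2.0, 3.0, 4.0, 5.0]
elevation = [2.0, 1.0, 4.0, 3.0, 5.0]
rainfall = [5.0, 3.0, 4.0, 1.0, 2.0]
vif, tol = vif_and_tolerance([slope, elevation, rainfall])
```

Factors whose VIF stays low under such a check can be retained together without one of them being largely redundant with the others.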

The findings reveal that five factors contribute significantly to the occurrence of landslides: slope gradient, altitude, geological composition, distance to roads, and distance to rivers. Slope and elevation control how gravity acts on unstable soil, which moves downslope along inclines, so steeper slopes correlate positively with the likelihood of landslides. Excavation related to the construction or expansion of road networks may compromise slope stability and raise the landslide risk near roads [51]. Proximity to rivers also has a significant influence: bank and gully erosion undermines the stability of slopes adjacent to rivers, heightening the risk of landslides in these areas [52].

Lithology and soil type were found to be significant factors in determining landslide susceptibility. The models could therefore be improved by incorporating additional conditioning factors, particularly soil-related ones such as soil depth, moisture content, permeability, and surface roughness. Including these factors in subsequent research is advisable to enhance the models’ predictive capability, enable more precise landslide susceptibility maps, and clarify the link between these additional agroecological parameters and landslide events. Hybridizing learning methods could also improve model accuracy [53] by increasing robustness through the combination of various parameters.

This study also considers the relationship between the landslide-free zone and the landslide-affected zone. The actual landslide-free zone was far larger than the landslide zone. To explain these disparities, future studies should further examine the ratio and correlation matrix of the landslide-free zone. Furthermore, the 70%/30% split ratio used for the training and validation datasets, although effective in this context, warrants re-assessment to account for the aforementioned influencing factors. Various split ratios may be examined to ensure more representative data and more dependable outcomes.
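One way to examine alternative split ratios while keeping the proportion of landslide and landslide-free samples constant in both subsets is a stratified random split, sketched below. The study itself reports a random 70%/30% split; the stratification, the function name `stratified_split`, the fixed seed, and the sample counts are assumptions introduced for this illustration only.

```python
import random

def stratified_split(labels, train_fraction, seed=0):
    """Split sample indices into training and validation sets,
    preserving the class proportions of `labels` in both sets."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    train_idx, valid_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train_idx.extend(idx[:cut])
        valid_idx.extend(idx[cut:])
    return train_idx, valid_idx

# Hypothetical imbalanced inventory: 100 landslide and 300 landslide-free points.
labels = [1] * 100 + [0] * 300

# Candidate ratios to compare: 60/40, 70/30, and 80/20.
splits = {fraction: stratified_split(labels, fraction) for fraction in (0.6, 0.7, 0.8)}
```

Each candidate split can then be used to retrain and validate the same model, so that differences in the resulting indices reflect the ratio rather than a shift in class balance.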

The characteristics, classification, and operation of the tested ML algorithms have been examined in previous research [50][54]. The current findings show that the XGBoost model was more accurate in classifying the studied regions according to their landslide susceptibility, which highlights the robustness of this model for similar studies and under similar semi-arid settings.

To gain a comprehensive understanding of landslide processes, it is advisable to concentrate on deep learning models capable of inferring intricate spatiotemporal correlations among variables. Integrating data-driven models with physical models can enhance the explainability and interpretability of machine learning models.

Conclusion

The results of this research show the effectiveness of integrating geospatial data into ML for landslide risk assessment. The advantages of these models are reduced evaluation time, high accuracy, and the ability to incorporate both quantitative and qualitative data (e.g., NDVI, elevation, precipitation, slope) without reclassifying continuous factors, which provides more reliable results than classical statistical models. Implementing these models and methodologies in remote regions of Morocco, where much infrastructure lacks monitoring systems, will improve understanding of the nature, complexity, and severity of landslide-related issues and facilitate the development of secure and sustainable communities. The resulting landslide susceptibility maps can provide decision-makers and stakeholders with critical insights to better understand landslide-related hazards and formulate effective plans for the prevention and mitigation of these natural disasters.

This study’s novelty resides in its thorough methodology for landslide susceptibility assessment, encompassing the intensified watershed, evaluation of various machine learning models, meticulous feature selection, and the identification of a robust model (i.e., XGBoost) for ongoing and comprehensive landslide susceptibility evaluation. These factors enhance understanding of machine learning applications for landslide susceptibility evaluation, specifically within the Tensift sub-basin in Morocco and other semi-arid regions.

References

  1. Park S, Kim J. Landslide susceptibility mapping based on random forest and boosted regression tree models, and a comparison of their performance. Appl. Sci. 2019;9(5):942. DOI
  2. Rahman G, Bacha AS, Ul Moazzam MF, Rahman AU, Mahmood S, Almohamad H, Al Dughairi AA, Al-Mutiry M, Alrasheedi M, Abdo HG. Assessment of landslide susceptibility, exposure, vulnerability, and risk in Shahpur Valley, eastern Hindu Kush. Front. Earth Sci. 2022;10:953627. DOI
  3. Silalahi FE, Pamela, Arifianti Y, Hidayat F. Landslide susceptibility assessment using frequency ratio model in Bogor, West Java, Indonesia. Geosci. Lett. 2019;6(1):10. DOI
  4. Froude MJ, Petley DN. Global fatal landslide occurrence from 2004 to 2016. Nat. Hazards Earth Syst. Sci. 2018;18(8):2161-81. DOI
  5. Pham BT, Nguyen-Thoi T, Qi C, Van Phong T, Dou J, Ho LS, Van Le H, Prakash I. Coupling RBF neural network with ensemble learning techniques for landslide susceptibility mapping. CATENA. 2020;195:104805. DOI
  6. Shahabi H, Khezri S, Ahmad BB, Hashim M. RETRACTED: Landslide susceptibility mapping at central Zab basin, Iran: A comparison between analytical hierarchy process, frequency ratio and logistic regression models. CATENA. 2014;115:55-70. DOI
  7. Kim JC, Lee S, Jung HS, Lee S. Landslide susceptibility mapping using random forest and boosted tree models in Pyeong-Chang, Korea. Geocarto Int. 2018;33(9):1000-15. DOI
  8. Salvatici T, Tofani V, Rossi G, D’Ambrosio M, Tacconi Stefanelli C, Masi EB, Rosi A, Pazzi V, Vannocci P, Petrolo M, Catani F. Application of a physically based model to forecast shallow landslides at a regional scale. Nat. Hazards Earth Syst. Sci. 2018;18(7):1919-35. DOI
  9. Carrara A, Guzzetti F, Cardinali M, Reichenbach P. Use of GIS technology in the prediction and monitoring of landslide hazard. Nat. Hazards. 1999;20:117-35. DOI
  10. Fell R, Corominas J, Bonnard C, Cascini L, Leroi E, Savage WZ. Guidelines for landslide susceptibility, hazard and risk zoning for land use planning. Eng. Geol. 2008;102(3-4):85-98. DOI
  11. Thiery Y, Malet JP, Sterlacchini S, Puissant A, Maquaire O. Landslide susceptibility assessment by bivariate methods at large scales: application to a complex mountainous environment. Geomorphology. 2007;92(1-2):38-59. DOI
  12. Elmoulat M, Brahim LA, Mastere M, Jemmah AI. Mapping of mass movements susceptibility in the Zoumi Region using satellite image and GIS technology (Moroccan Rif). Int. J. Sci. Eng. Res. 2015;6(2):210-7.
  13. Youssef B, Bouskri I, Brahim B, Kader S, Brahim I, Abdelkrim B, Spalević V. The contribution of the frequency ratio model and the prediction rate for the analysis of landslide risk in the Tizi N’tichka area on the national road (RN9) linking Marrakech and Ouarzazate. CATENA. 2023;232:107464. DOI
  14. Cotti D, Harb M, Hadri A, Aboufirass M, Chaham KR, Libertino A, Campo L, Trasforini E, Krätzschmar E, Bellert F, Hagenlocher M. An integrated multi-risk assessment for floods and drought in the Marrakech-Safi region (Morocco). Front. Water. 2022;4:886648. DOI
  15. Karmaoui A, Zerouali S, Ayt Ougougdal H, Shah AA. A new mountain flood vulnerability index (MFVI) for the assessment of flood vulnerability. Sustain. Water Resour. Manag. 2021;7:1-3. DOI
  16. Meliho M, Khattabi A, Mhammdi N. Spatial assessment of soil erosion risk by integrating remote sensing and GIS techniques: a case of Tensift watershed in Morocco. Environ. Earth Sci. 2020;79(10):207. DOI
  17. Wang LJ, Guo M, Sawada K, Lin J, Zhang J. Landslide susceptibility mapping in Mizunami City, Japan: A comparison between logistic regression, bivariate statistical analysis and multivariate adaptive regression spline models. CATENA. 2015;135:271-82. DOI
  18. Pourghasemi HR, Rahmati O. Prediction of the landslide susceptibility: Which algorithm, which precision? CATENA. 2018;162:177-92. DOI
  19. Zhao X, Chen W. Optimization of computational intelligence models for landslide susceptibility evaluation. Remote Sens. 2020;12(14):2180. DOI
  20. Manchar N, Benabbas C, Hadji R, Bouaicha F, Grecu F. Landslide susceptibility assessment in Constantine region (NE Algeria) by means of statistical models. Stud. Geotech. Mech. 2018;40(3):208-19. DOI
  21. Panchal S, Shrivastava AK. Landslide hazard assessment using analytic hierarchy process (AHP): A case study of National Highway 5 in India. Ain Shams Eng. J. 2022;13(3):101626. DOI
  22. Wang Y, Song C, Lin Q, Li J. Occurrence probability assessment of earthquake-triggered landslides with Newmark displacement values and logistic regression: The Wenchuan earthquake, China. Geomorphology. 2016;258:108-19. DOI
  23. Feizizadeh B, Blaschke T. Comparing GIS-Multicriteria Decision Analysis for landslide susceptibility mapping for the lake basin, Iran. In 2012 IEEE International Geoscience and Remote Sensing Symposium. IEEE. 2012:5390-3. DOI
  24. Saha A, Villuri VG, Bhardwaj A, Kumar S. A multi-criteria decision analysis (MCDA) approach for landslide susceptibility mapping of a part of Darjeeling District in North-East Himalaya, India. Appl. Sci. 2023;13(8):5062. DOI
  25. Wubalem A. Landslide susceptibility mapping using statistical methods in Uatzau catchment area, northwestern Ethiopia. Geoenvironmental Disasters. 2021;8(1):1. DOI
  26. Huang Y, Zhao L. Review on landslide susceptibility mapping using support vector machines. CATENA. 2018;165:520-9. DOI
  27. Zhiyong F, Changdong L, Wenmin Y. Landslide susceptibility assessment through TrAdaBoost transfer learning models using two landslide inventories. CATENA. 2023;222:106799. DOI
  28. Pradhan B, Jebur MN. Spatial prediction of landslide-prone areas through k-nearest neighbor algorithm and logistic regression model using high resolution airborne laser scanning data. Laser scanning applications in landslide assessment. Springer International Publishing. 2017:151-65. DOI
  29. Xu K, Zhao Z, Chen W, Ma J, Liu F, Zhang Y, Ren Z. Comparative study on landslide susceptibility mapping based on different ratios of training samples and testing samples by using RF and FR-RF models. Nat. Hazards Res. 2024;4(1):62-74. DOI
  30. Can R, Kocaman S, Gokceoglu C. A comprehensive assessment of XGBoost algorithm for landslide susceptibility mapping in the upper basin of Ataturk dam, Turkey. Appl. Sci. 2021;11(11):4993. DOI
  31. Gameiro S, Riffel ES, de Oliveira GG, Guasselli LA. Artificial neural networks applied to landslide susceptibility: The effect of sampling areas on model capacity for generalization and extrapolation. Appl. Geogr. 2021;137:102598. DOI
  32. Das G, Lepcha K. Application of logistic regression (LR) and frequency ratio (FR) models for landslide susceptibility mapping in Relli Khola river basin of Darjeeling Himalaya, India. SN Appl. Sci. 2019;1:1-22. DOI
  33. Sandric I, Ionita C, Chitu Z, Dardala M, Irimia R, Furtuna FT. Using CUDA to accelerate uncertainty propagation modelling for landslide susceptibility assessment. Environ. Model. Softw. 2019;115:176-86. DOI
  34. Jaafari A, Panahi M, Pham BT, Shahabi H, Bui DT, Rezaie F, Lee S. Meta optimization of an adaptive neuro-fuzzy inference system with grey wolf optimizer and biogeography-based optimization algorithms for spatial prediction of landslide susceptibility. CATENA. 2019;175:430-45. DOI
  35. Meinhardt M, Fink M, Tünschel H. Landslide susceptibility analysis in central Vietnam based on an incomplete landslide inventory: Comparison of a new method to calculate weighting factors by means of bivariate statistics. Geomorphology. 2015;234:80-97. DOI
  36. Song Y, Niu R, Xu S, Ye R, Peng L, Guo T, Li S, Chen T. Landslide susceptibility mapping based on weighted gradient boosting decision tree in Wanzhou section of the Three Gorges Reservoir Area (China). ISPRS Int. J. Geo-Inf. 2018;8(1):4. DOI
  37. Fang Z, Wang Y, Peng L, Hong H. A comparative study of heterogeneous ensemble-learning techniques for landslide susceptibility mapping. Int. J. Geogr. Inf. 2021;35(2):321-47. DOI
  38. Al-Najjar HA, Pradhan B. Spatial landslide susceptibility assessment using machine learning techniques assisted by additional data created with generative adversarial networks. Geosci. Front. 2021;12(2):625-37. DOI
  39. Yusof NM, Pradhan B. Landslide susceptibility mapping along PLUS expressways in Malaysia using probabilistic based model in GIS. In IOP Conference Series: Earth and Environmental Science 2014;20(1):012031. DOI
  40. Grelle G, Soriano M, Revellino P, Guerriero L, Anderson MG, Diambra A, Fiorillo F, Esposito L, Diodato N, Guadagno FM. Space–time prediction of rainfall-induced shallow landslides through a combined probabilistic/deterministic approach, optimized for initial water table conditions. Bull. Eng. Geol. Environ. 2014;73:877-90. DOI
  41. Capparelli G, Tiranti D. Application of the MoniFLaIR early warning system for rainfall-induced landslides in Piedmont region (Italy). Landslides. 2010;7(4):401-10. DOI
  42. Park S, Choi C, Kim B, Kim J. Landslide susceptibility mapping using frequency ratio, analytic hierarchy process, logistic regression, and artificial neural network methods at the Inje area, Korea. Environ. Earth Sci. 2013;68:1443-64. DOI
  43. Althuwaynee OF, Pradhan B, Lee S. A novel integrated model for assessing landslide susceptibility mapping using CHAID and AHP pair-wise comparison. Int. J. Remote Sens. 2016;37(5):1190-209. DOI
  44. Chen W, Chai H, Sun X, Wang Q, Ding X, Hong H. A GIS-based comparative study of frequency ratio, statistical index and weights-of-evidence models in landslide susceptibility mapping. Arab. J. Geosci. 2016;9:1-6. DOI
  45. Hadji R, Limani Y, Baghem M, Demdoum A. Geologic, topographic and climatic controls in landslide hazard assessment using GIS modeling: a case study of Souk Ahras region, NE Algeria. Quat. Int. 2013;302:224-37. DOI
  46. Miao F, Zhao F, Wu Y, Li L, Török Á. Landslide susceptibility mapping in Three Gorges Reservoir area based on GIS and boosting decision tree model. Stoch. Environ. Res. Risk Assess. 2023;37(6):2283-303. DOI
  47. Namous M, Hssaisoune M, Pradhan B, Lee CW, Alamri A, Elaloui A, Edahbi M, Krimissa S, Eloudi H, Ouayah M, Elhimer H. Spatial prediction of groundwater potentiality in large semi-arid and karstic mountainous region using machine learning models. Water. 2021;13(16):2273. DOI
  48. Mao W, Xu C, Yang Y. Investigation on strength degradation of sandy soil subjected to concentrated particle erosion. Environ. Earth Sci. 2022;81. DOI
  49. Masoud AM, Pham QB, Alezabawy AK, El-Magd SA. Efficiency of geospatial technology and multi-criteria decision analysis for groundwater potential mapping in a Semi-Arid region. Water. 2022;14(6):882. DOI
  50. Sarker IH. Machine learning: Algorithms, real-world applications and research directions. SN Comput. Sci. 2021;2(3):160. DOI
  51. Omar H, Ibrahim AL, Hashim M. Slope stability analysis using remote sensing data. Department of Remote Sensing. Faculty of Geoinformation Science and Engineering, Universiti Teknologi Malaysia. 2007.
  52. Kukemilks K, Saks T. Landslides and gully slope erosion on the banks of the Gauja River between the towns of Sigulda and Ligatne. Est. J. Earth Sci. 2013;62(4):231. DOI
  53. Yariyan P, Avand M, Soltani F, Ghorbanzadeh O, Blaschke T. Earthquake vulnerability mapping using different hybrid models. Symmetry. 2020;12(3):405. DOI
  54. Liu H, Lang B. Machine learning and deep learning methods for intrusion detection systems: A survey. Appl. Sci. 2019;9(20):4396. DOI

Cite this article:

Bammou, Y., Benzougagh, B., Ouallali, A., Kader, S., Raougua, M., Igmoullan, B. Improving landslide susceptibility mapping in semi-arid regions using machine learning and geospatial techniques. DYSONA – Applied Science, 2025;6(2): 269-290. doi: 10.30493/das.2025.484839