{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Supplementary Script S1: Data Acquisition and Preprocessing (Google Earth Engine)\n",
    "\n",
    "**Study:** Assessing Drivers of Forest Loss as Indicators of Habitat Degradation in the Greater Kafue Ecosystem of Zambia Using a Random Forest Approach  \n",
    "**Authors:** Gift Mulenga, Darius Phiri, Matamyo Simwanda, Vincent R. Nyirenda  \n",
    "\n",
    "\n",
    "---\n",
    "\n",
    "## Purpose\n",
    "\n",
    "This script acquires and pre-processes the 14 predictor (driver) variables used in the Random Forest analysis.  \n",
    "All data are retrieved via the Google Earth Engine (GEE) Python API and exported to Google Drive as 30 m GeoTIFF rasters.\n",
    "\n",
    "## Inputs Required\n",
    "\n",
    "| Input | Source | Notes |\n",
    "|-------|--------|-------|\n",
    "| GKE boundary shapefile | Local file (provided with manuscript) | `GKE_Boundary.shp` |\n",
    "| KNP boundary shapefile | Local file | `nationalParks.shp` |\n",
    "| GMA boundary shapefile | Local file | `GMA.shp` |\n",
    "| Roads shapefile | Local file (derived from OpenStreetMap) | `Roads.shp`; GRIP4 used as GEE fallback |\n",
    "| Settlement footprints | GEE asset | `projects/ee-giftmulenga13/assets/gke_buildings` |\n",
    "\n",
    "## Outputs Produced\n",
    "\n",
    "All 14 predictor rasters (30 m, EPSG:32735) exported to Google Drive folder `GKE_Drivers`:\n",
    "\n",
    "| Category | Variables |\n",
    "|----------|-----------|\n",
    "| Proximity | `dist_roads`, `dist_settlements`, `dist_rivers`, `dist_knp` |\n",
    "| Socio-economic | `pop_density`, `pop_change`, `pct_cultivated` |\n",
    "| Conservation | `protection_status`, `years_protected` |\n",
    "| Topographic | `elevation`, `slope`, `aspect`, `twi` |\n",
    "| Climatic | `mean_rainfall`, `mean_temp` |\n",
    "\n",
    "> **Note:** `years_protected` is exported here but excluded from the final RF model after VIF screening (see Script S2).\n",
    "\n",
    "## How to Use\n",
    "\n",
    "> **⚠ USER ACTION REQUIRED:** Update all paths in the `Config` class (Section 1) before running.\n",
    "\n",
    "1. Install requirements: `pip install earthengine-api geemap geopandas rasterio`\n",
    "2. Authenticate GEE: `ee.Authenticate()` (one-time setup)\n",
    "3. Update paths in `Config` to match your local file locations and GEE assets\n",
    "4. Run all cells in order\n",
    "5. Monitor export tasks at: https://code.earthengine.google.com/tasks\n",
    "6. Download completed files to `Config.OUTPUT_DIR` before running Script S2\n",
    "\n",
    "## Scientific Notes\n",
    "\n",
    "- **Spatial resolution:** 30 m (matching Landsat-derived land-cover maps)\n",
    "- **CRS:** EPSG:32735 (UTM Zone 35S), appropriate for central-western Zambia\n",
    "- **Climate data period:** 2000–2020 (20-year mean), representing long-term average conditions rather than a single year, to avoid annual anomalies\n",
    "- **Roads:** Local shapefile derived from OpenStreetMap (OSM); GRIP4 global roads dataset used as GEE fallback if local file unavailable\n",
    "- **pct_cultivated:** Uses ESA WorldCover 2021 (class 40 = Cropland) rather than the study's own LULC maps, to avoid circular prediction (a predictor should not be derived from the same classification used as the response variable)\n",
    "- **Protection dates:** Kafue National Park established 1950; GMAs established ~1972 under the Zambia National Parks and Wildlife Act\n",
    "\n",
    "## Dependencies\n",
    "\n",
    "```\n",
    "earthengine-api >= 0.1.370\n",
    "geemap >= 0.20.0\n",
    "geopandas >= 0.12.0\n",
    "rasterio >= 1.3.0\n",
    "numpy >= 1.23.0\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0a2eace7",
   "metadata": {},
   "outputs": [],
   "source": [
    "import ee\n",
    "import geemap\n",
    "import geopandas as gpd\n",
    "import os\n",
    "from datetime import datetime\n",
    "import json\n",
    "import numpy as np\n",
    "\n",
    "# Try importing rasterio for validation (optional but recommended)\n",
    "try:\n",
    "    import rasterio\n",
    "    from rasterio.mask import mask\n",
    "    RASTERIO_AVAILABLE = True\n",
    "except ImportError:\n",
    "    RASTERIO_AVAILABLE = False\n",
    "    print(\"Warning: rasterio not available. Post-export validation will be limited.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8f80af5f",
   "metadata": {},
   "source": [
    "#### 1. CONFIGURATION"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12246ba2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# =============================================================================\n",
    "# 1. CONFIGURATION\n",
    "# =============================================================================\n",
    "\n",
    "class Config:\n",
    "    \"\"\"Configuration parameters for the analysis\"\"\"\n",
    "    \n",
    "    # Study Area\n",
    "    STUDY_AREA_NAME = \"Greater Kafue Ecosystem\"\n",
    "    BUFFER_DISTANCE = None  # 20km buffer in meters\n",
    "    \n",
    "    # Coordinate Reference System\n",
    "    TARGET_CRS = \"EPSG:32735\"  # UTM Zone 35S\n",
    "    TARGET_RESOLUTION = 30  # meters\n",
    "    \n",
    "    # ==========================================================================\n",
    "    # DIRECTORY PATHS - UPDATE THESE TO YOUR LOCATIONS!\n",
    "    # ==========================================================================\n",
    "    BASE_DIR = \"D:/Publication/Habitat Loss in the Greater Kafue Ecosystem/Thesis_anlysis/Working/11122025_GKE_Analysis_Scripts\"\n",
    "    DATA_DIR = os.path.join(BASE_DIR, \"data/data\")\n",
    "    OUTPUT_DIR = \"E:/Research/Msc_Tropical_Ecology/GKE_Objective_B/data\"\n",
    "    \n",
    "    # ==========================================================================\n",
    "    # INPUT FILE PATHS - Using existing shapefiles from Objective A\n",
    "    # ==========================================================================\n",
    "    BOUNDARIES = {\n",
    "        \"gke\": os.path.join(DATA_DIR, \"GKE_Boundary.shp\"),\n",
    "        \"knp\": os.path.join(DATA_DIR, \"nationalParks.shp\"),\n",
    "        \"gma\": os.path.join(DATA_DIR, \"GMA.shp\"),\n",
    "        \"roads\": os.path.join(DATA_DIR, \"Roads.shp\"),\n",
    "    }\n",
    "    \n",
    "    # LULC Raster files for each time point (for change detection)\n",
    "    LULC_RASTERS = {\n",
    "        1984: os.path.join(DATA_DIR, \"GKE_1984.tif\"),\n",
    "        1994: os.path.join(DATA_DIR, \"GKE_1994.tif\"),\n",
    "        2004: os.path.join(DATA_DIR, \"GKE_2004.tif\"),\n",
    "        2014: os.path.join(DATA_DIR, \"GKE_2014.tif\"),\n",
    "        2024: os.path.join(DATA_DIR, \"GKE_2024.tif\"),\n",
    "    }\n",
    "    \n",
    "    # ==========================================================================\n",
    "    # GOOGLE EARTH ENGINE ASSETS\n",
    "    # ==========================================================================\n",
    "    GEE_ASSETS = {\n",
    "        \"gke\": None,\n",
    "        \"knp\": None,\n",
    "        \"gma\": None,\n",
    "        \"roads\": \"projects/ee-giftmulenga13/assets/Roads\",\n",
    "        \"lulc_2024\":None, # \"projects/ee-giftmulenga13/assets/GKE_2024\",\n",
    "        \"settlements\": \"projects/ee-giftmulenga13/assets/gke_buildings\",\n",
    "    }\n",
    "    \n",
    "    # ==========================================================================\n",
    "    # Date Ranges for Temporal Variables\n",
    "    # ==========================================================================\n",
    "    CLIMATE_START = \"2000-01-01\"\n",
    "    CLIMATE_END = \"2020-12-31\"\n",
    "    POPULATION_YEARS = [2000, 2010, 2020]\n",
    "    \n",
    "    # Analysis Year\n",
    "    ANALYSIS_YEAR = 2024\n",
    "    \n",
    "    # Export settings\n",
    "    EXPORT_FOLDER = \"GKE_Drivers\"\n",
    "    EXPORT_SCALE = 30  # meters\n",
    "    \n",
    "    # ==========================================================================\n",
    "    # DRIVER VARIABLE DEFINITIONS (15 candidate variables; 14 retained after VIF screening)\n",
    "    # ==========================================================================\n",
    "    DRIVER_VARIABLES = {\n",
    "        'proximity': ['dist_roads', 'dist_settlements', 'dist_rivers', 'dist_knp'],\n",
    "        'socioeconomic': ['pop_density', 'pop_change', 'pct_cultivated'],\n",
    "        'conservation': ['protection_status', 'years_protected'],\n",
    "        'topographic': ['elevation', 'slope', 'aspect', 'twi'],\n",
    "        'climatic': ['mean_rainfall', 'mean_temp']\n",
    "    }\n",
    "    \n",
    "    # Protection establishment dates (with citations)\n",
    "    # KNP: Established 1950 (Zambia Wildlife Authority)\n",
    "    # GMAs: Established ~1972 under Zambia National Parks and Wildlife Act\n",
    "    PROTECTION_DATES = {\n",
    "        'knp': 1950,  # Kafue National Park established\n",
    "        'gma': 1972   # Game Management Areas system established\n",
    "    }\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bb64ee38",
   "metadata": {},
   "source": [
    "#### 2.INITIALIZATION"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bf192bd3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# =============================================================================\n",
    "# 2. INITIALIZATION\n",
    "# =============================================================================\n",
    "\n",
    "def initialize_gee():\n",
    "    \"\"\"Initialize Google Earth Engine\"\"\"\n",
    "    try:\n",
    "        ee.Initialize()\n",
    "        print(\"✓ Google Earth Engine initialized successfully\")\n",
    "    except Exception as e:\n",
    "        print(f\"Initializing GEE with authentication...\")\n",
    "        ee.Authenticate()\n",
    "        ee.Initialize()\n",
    "        print(\"✓ Google Earth Engine initialized after authentication\")\n",
    "\n",
    "\n",
    "def create_output_directories():\n",
    "    \"\"\"Create output directories if they don't exist\"\"\"\n",
    "    dirs = [\n",
    "        Config.OUTPUT_DIR,\n",
    "        os.path.join(Config.OUTPUT_DIR, \"proximity\"),\n",
    "        os.path.join(Config.OUTPUT_DIR, \"socioeconomic\"),\n",
    "        os.path.join(Config.OUTPUT_DIR, \"conservation\"),\n",
    "        os.path.join(Config.OUTPUT_DIR, \"topographic\"),\n",
    "        os.path.join(Config.OUTPUT_DIR, \"climatic\"),\n",
    "        os.path.join(Config.OUTPUT_DIR, \"validation\"),\n",
    "    ]\n",
    "    for d in dirs:\n",
    "        os.makedirs(d, exist_ok=True)\n",
    "    print(f\"✓ Output directories created/verified at {Config.OUTPUT_DIR}\")\n",
    "\n",
    "\n",
    "def verify_input_files():\n",
    "    \"\"\"Verify that all input files exist\"\"\"\n",
    "    print(\"\\n--- Verifying Input Files ---\")\n",
    "    all_exist = True\n",
    "    \n",
    "    for name, path in Config.BOUNDARIES.items():\n",
    "        if os.path.exists(path):\n",
    "            print(f\"  ✓ {name}: {path}\")\n",
    "        else:\n",
    "            print(f\"  ✗ {name}: {path} NOT FOUND\")\n",
    "            all_exist = False\n",
    "    \n",
    "    return all_exist"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5970763a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# =============================================================================\n",
    "# 3. STUDY AREA DEFINITION\n",
    "# =============================================================================\n",
    "\n",
    "def load_local_shapefile(filepath, to_epsg=4326):\n",
    "    \"\"\"Load a local shapefile and return as GeoJSON for GEE\"\"\"\n",
    "    gdf = gpd.read_file(filepath)\n",
    "    \n",
    "    # Reproject to WGS84 if needed (GEE requires geographic coordinates)\n",
    "    if gdf.crs and gdf.crs.to_epsg() != to_epsg:\n",
    "        gdf = gdf.to_crs(epsg=to_epsg)\n",
    "    \n",
    "    return json.loads(gdf.to_json())\n",
    "\n",
    "\n",
    "def shapefile_to_ee_geometry(filepath):\n",
    "    \"\"\"Convert shapefile to EE Geometry\"\"\"\n",
    "    geojson = load_local_shapefile(filepath)\n",
    "    \n",
    "    # Handle multiple features by getting the union\n",
    "    features = geojson['features']\n",
    "    if len(features) == 1:\n",
    "        return ee.Geometry(features[0]['geometry'])\n",
    "    else:\n",
    "        geometries = [ee.Geometry(f['geometry']) for f in features]\n",
    "        return ee.Geometry.MultiPolygon([g.coordinates().getInfo() for g in geometries]).dissolve()\n",
    "\n",
    "\n",
    "def shapefile_to_ee_featurecollection(filepath):\n",
    "    \"\"\"Convert shapefile to EE FeatureCollection\"\"\"\n",
    "    geojson = load_local_shapefile(filepath)\n",
    "    return ee.FeatureCollection(geojson['features'])\n",
    "\n",
    "\n",
    "def get_study_area():\n",
    "    \"\"\"Get study area geometry from local shapefile or GEE asset\"\"\"\n",
    "    print(\"\\n--- Loading Study Area ---\")\n",
    "    \n",
    "    # Try local shapefile first\n",
    "    if os.path.exists(Config.BOUNDARIES[\"gke\"]):\n",
    "        print(\"  Using local shapefile...\")\n",
    "        study_area = shapefile_to_ee_geometry(Config.BOUNDARIES[\"gke\"])\n",
    "        print(f\"  ✓ Loaded from: {Config.BOUNDARIES['gke']}\")\n",
    "    elif Config.GEE_ASSETS.get(\"gke\"):\n",
    "        print(\"  Using GEE asset...\")\n",
    "        study_area = ee.FeatureCollection(Config.GEE_ASSETS[\"gke\"]).geometry()\n",
    "        print(f\"  ✓ Loaded from: {Config.GEE_ASSETS['gke']}\")\n",
    "    else:\n",
    "        raise FileNotFoundError(\"No GKE boundary found. Please provide shapefile or GEE asset path.\")\n",
    "    \n",
    "    return study_area\n",
    "\n",
    "\n",
    "def get_boundary_features():\n",
    "    \"\"\"Get boundary FeatureCollections for KNP, GMAs, and roads\"\"\"\n",
    "    print(\"\\n--- Loading Boundary Features ---\")\n",
    "    \n",
    "    boundaries = {}\n",
    "    \n",
    "    # KNP\n",
    "    if os.path.exists(Config.BOUNDARIES[\"knp\"]):\n",
    "        boundaries['knp'] = shapefile_to_ee_featurecollection(Config.BOUNDARIES[\"knp\"])\n",
    "        print(f\"  ✓ KNP loaded from local shapefile\")\n",
    "    elif Config.GEE_ASSETS.get(\"knp\"):\n",
    "        boundaries['knp'] = ee.FeatureCollection(Config.GEE_ASSETS[\"knp\"])\n",
    "        print(f\"  ✓ KNP loaded from GEE asset\")\n",
    "    \n",
    "    # GMAs\n",
    "    if os.path.exists(Config.BOUNDARIES[\"gma\"]):\n",
    "        boundaries['gma'] = shapefile_to_ee_featurecollection(Config.BOUNDARIES[\"gma\"])\n",
    "        print(f\"  ✓ GMAs loaded from local shapefile\")\n",
    "    elif Config.GEE_ASSETS.get(\"gma\"):\n",
    "        boundaries['gma'] = ee.FeatureCollection(Config.GEE_ASSETS[\"gma\"])\n",
    "        print(f\"  ✓ GMAs loaded from GEE asset\")\n",
    "    \n",
    "    # Roads\n",
    "    if os.path.exists(Config.BOUNDARIES[\"roads\"]):\n",
    "        boundaries['roads'] = shapefile_to_ee_featurecollection(Config.BOUNDARIES[\"roads\"])\n",
    "        print(f\"  ✓ Roads loaded from local shapefile\")\n",
    "    elif Config.GEE_ASSETS.get(\"roads\"):\n",
    "        boundaries['roads'] = ee.FeatureCollection(Config.GEE_ASSETS[\"roads\"])\n",
    "        print(f\"  ✓ Roads loaded from GEE asset\")\n",
    "    \n",
    "    return boundaries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9883fec5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# =============================================================================\n",
    "# 4. PROXIMITY VARIABLES (INCLUDING dist_knp)\n",
    "# =============================================================================\n",
    "\n",
    "def calculate_distance_to_feature(feature_collection, study_area, pixel_size=30, max_distance=100000):\n",
    "    \"\"\"\n",
    "    Calculate Euclidean distance to nearest feature\n",
    "    \n",
    "    Parameters:\n",
    "    -----------\n",
    "    feature_collection : ee.FeatureCollection\n",
    "        Features to calculate distance to\n",
    "    study_area : ee.Geometry\n",
    "        Study area boundary\n",
    "    pixel_size : int\n",
    "        Pixel size in meters\n",
    "    max_distance : int\n",
    "        Maximum distance to calculate (meters)\n",
    "    \"\"\"\n",
    "    # Rasterize features\n",
    "    feature_img = ee.Image(0).paint(feature_collection, 1)\n",
    "    \n",
    "    # Calculate distance using fastDistanceTransform\n",
    "    # Note: fastDistanceTransform returns squared distance in pixels\n",
    "    distance = feature_img.Not().fastDistanceTransform(256).sqrt().multiply(pixel_size)\n",
    "    \n",
    "    # Clip to study area and cap at max distance\n",
    "    distance = distance.clip(study_area).min(max_distance)\n",
    "    \n",
    "    return distance\n",
    "\n",
    "\n",
    "def calculate_distance_to_knp(study_area, knp_fc, pixel_size=30):\n",
    "    \"\"\"\n",
    "    Calculate distance to KNP boundary\n",
    "    \n",
    "    This is important for understanding edge effects and spillover\n",
    "    of human activities from GMAs into the protected area.\n",
    "    \"\"\"\n",
    "    print(\"Processing: Distance to KNP Boundary...\")\n",
    "    \n",
    "    # Calculate distance\n",
    "    dist_knp = calculate_distance_to_feature(knp_fc, study_area, pixel_size)\n",
    "    \n",
    "    return dist_knp.rename('dist_knp')\n",
    "\n",
    "\n",
    "def get_proximity_variables(study_area, boundaries, pixel_size=30):\n",
    "    \"\"\"\n",
    "    Get proximity-based driver variables\n",
    "    \n",
    "    Variables:\n",
    "    - dist_roads: Distance to nearest road (m)\n",
    "    - dist_settlements: Distance to nearest settlement (m)\n",
    "    - dist_rivers: Distance to nearest river (m)\n",
    "    - dist_knp: Distance to KNP boundary (m) - NEW\n",
    "    \"\"\"\n",
    "    print(\"\\n--- Processing Proximity Variables ---\")\n",
    "    \n",
    "    # Distance to roads\n",
    "    print(\"Processing: Distance to Roads...\")\n",
    "    if 'roads' in boundaries:\n",
    "        dist_roads = calculate_distance_to_feature(\n",
    "            boundaries['roads'], study_area, pixel_size\n",
    "        ).rename('dist_roads')\n",
    "    else:\n",
    "        # Fallback to GRIP roads\n",
    "        print(\"  Using GRIP global roads dataset...\")\n",
    "        grip = ee.FeatureCollection(\"projects/sat-io/open-datasets/GRIP4/Africa\")\n",
    "        grip_clipped = grip.filterBounds(study_area)\n",
    "        dist_roads = calculate_distance_to_feature(\n",
    "            grip_clipped, study_area, pixel_size\n",
    "        ).rename('dist_roads')\n",
    "    \n",
    "    # Distance to settlements\n",
    "    print(\"Processing: Distance to Settlements...\")\n",
    "    if Config.GEE_ASSETS.get(\"settlements\"):\n",
    "        settlements = ee.FeatureCollection(Config.GEE_ASSETS[\"settlements\"])\n",
    "        dist_settlements = calculate_distance_to_feature(\n",
    "            settlements, study_area, pixel_size\n",
    "        ).rename('dist_settlements')\n",
    "    else:\n",
    "        # Fallback to GHSL built-up areas\n",
    "        print(\"  Using GHSL global settlements...\")\n",
    "        ghsl = ee.Image(\"JRC/GHSL/P2023A/GHS_BUILT_S/2020\")\n",
    "        built = ghsl.select('built_surface').gt(0)\n",
    "        dist_settlements = built.Not().fastDistanceTransform(256).sqrt().multiply(pixel_size)\n",
    "        dist_settlements = dist_settlements.clip(study_area).rename('dist_settlements')\n",
    "    \n",
    "    # Distance to rivers\n",
    "    print(\"Processing: Distance to Rivers...\")\n",
    "    # Using HydroSHEDS river network\n",
    "    rivers = ee.FeatureCollection(\"WWF/HydroSHEDS/v1/FreeFlowingRivers\")\n",
    "    rivers_clipped = rivers.filterBounds(study_area)\n",
    "    dist_rivers = calculate_distance_to_feature(\n",
    "        rivers_clipped, study_area, pixel_size\n",
    "    ).rename('dist_rivers')\n",
    "    \n",
    "    # Distance to KNP boundary (NEW)\n",
    "    print(\"Processing: Distance to KNP Boundary...\")\n",
    "    knp_fc = boundaries.get('knp')\n",
    "    if knp_fc:\n",
    "        dist_knp = calculate_distance_to_knp(study_area, knp_fc, pixel_size)\n",
    "    else:\n",
    "        print(\"  Warning: KNP boundary not available, using WDPA...\")\n",
    "        wdpa = ee.FeatureCollection(\"WCMC/WDPA/current/polygons\")\n",
    "        kafue = wdpa.filter(ee.Filter.eq('NAME', 'Kafue'))\n",
    "        dist_knp = calculate_distance_to_feature(\n",
    "            kafue, study_area, pixel_size\n",
    "        ).rename('dist_knp')\n",
    "    \n",
    "    print(\"✓ Proximity variables completed.\")\n",
    "    \n",
    "    return {\n",
    "        'dist_roads': dist_roads,\n",
    "        'dist_settlements': dist_settlements,\n",
    "        'dist_rivers': dist_rivers,\n",
    "        'dist_knp': dist_knp\n",
    "    }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d92b688e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# =============================================================================\n",
    "# 5. SOCIO-ECONOMIC VARIABLES (REVISED - ESA WorldCover for pct_cultivated)\n",
    "# =============================================================================\n",
    "\n",
    "def get_socioeconomic_variables(study_area):\n",
    "    \"\"\"\n",
    "    Get socio-economic driver variables\n",
    "    \n",
    "    IMPORTANT REVISION:\n",
    "    - pct_cultivated now uses ESA WorldCover 2021 instead of local 2024 LULC\n",
    "    - This avoids circular predictor issue where the predictor would be \n",
    "      derived from the same data used to define the target variable\n",
    "    \n",
    "    Variables:\n",
    "    - pop_density: Population density from WorldPop (persons per 100m pixel)\n",
    "    - pop_change: Population change 2000-2020 (absolute change)\n",
    "    - pct_cultivated: Percent cultivated land in 1km radius (from ESA WorldCover 2021)\n",
    "    \"\"\"\n",
    "    print(\"\\n--- Processing Socio-economic Variables ---\")\n",
    "    \n",
    "    # Population density (WorldPop 2020)\n",
    "    print(\"Processing: Population Density...\")\n",
    "    worldpop = ee.ImageCollection(\"WorldPop/GP/100m/pop\")\n",
    "    pop_2020 = worldpop.filterDate('2020-01-01', '2020-12-31').filter(\n",
    "        ee.Filter.eq('country', 'ZMB')\n",
    "    ).first()\n",
    "    \n",
    "    # Note: WorldPop values are persons per 100m pixel, not per km²\n",
    "    # To convert to persons/km²: multiply by 100 (since 1km² = 100 × 100m pixels)\n",
    "    pop_density = pop_2020.clip(study_area).rename('pop_density')\n",
    "    \n",
    "    # Population change (2000-2020)\n",
    "    print(\"Processing: Population Change...\")\n",
    "    pop_2000 = worldpop.filterDate('2000-01-01', '2000-12-31').filter(\n",
    "        ee.Filter.eq('country', 'ZMB')\n",
    "    ).first()\n",
    "    \n",
    "    pop_change = pop_2020.subtract(pop_2000).clip(study_area).rename('pop_change')\n",
    "    \n",
    "    # ==========================================================================\n",
    "    # PERCENT CULTIVATED - TO USE ESA WORLDCOVER 2021\n",
    "    # ==========================================================================\n",
    "    # CRITICAL: Using ESA WorldCover 2021 instead of local 2024 LULC\n",
    "    # This avoids the circular predictor problem where pct_cultivated would\n",
    "    # be derived from the same classification used to define habitat loss\n",
    "    print(\"Processing: Percent Cultivated (ESA WorldCover 2021)...\")\n",
    "    print(\"  NOTE: Using ESA WorldCover 2021 to avoid circular predictor issue\")\n",
    "    \n",
    "    # ESA WorldCover 2021 - cropland class = 40\n",
    "    worldcover = ee.ImageCollection(\"ESA/WorldCover/v200\").first()\n",
    "    cropland = worldcover.select('Map').eq(40)  # Cropland class in ESA WorldCover\n",
    "    \n",
    "    # Calculate percent cultivated in 1km neighborhood\n",
    "    kernel = ee.Kernel.circle(radius=1000, units='meters')\n",
    "    pct_cultivated = cropland.reduceNeighborhood(\n",
    "        reducer=ee.Reducer.mean(),\n",
    "        kernel=kernel\n",
    "    ).multiply(100).clip(study_area).rename('pct_cultivated')\n",
    "    \n",
    "    print(\"✓ Socio-economic variables completed.\")\n",
    "    \n",
    "    return {\n",
    "        'pop_density': pop_density,\n",
    "        'pop_change': pop_change,\n",
    "        'pct_cultivated': pct_cultivated\n",
    "    }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "76d68851",
   "metadata": {},
   "outputs": [],
   "source": [
    "# =============================================================================\n",
    "# 6. CONSERVATION VARIABLES\n",
    "# =============================================================================\n",
    "\n",
    "def get_protection_variables(study_area, knp_fc, gma_fc):\n",
    "    \"\"\"\n",
    "    Get conservation/protection driver variables\n",
    "    \n",
    "    Variables:\n",
    "    - protection_status: Categorical (0=unprotected, 1=GMA, 2=KNP)\n",
    "    - years_protected: Years since protection establishment\n",
    "    \"\"\"\n",
    "    print(\"\\n--- Processing Conservation Variables ---\")\n",
    "    \n",
    "    # Protection status (0=unprotected, 1=GMA, 2=KNP)\n",
    "    print(\"Processing: Protection Status...\")\n",
    "    \n",
    "    # Create base image (unprotected = 0)\n",
    "    protection = ee.Image(0).clip(study_area)\n",
    "    \n",
    "    # Paint GMAs as 1\n",
    "    gma_mask = ee.Image(0).paint(gma_fc, 1).clip(study_area)\n",
    "    protection = protection.where(gma_mask.eq(1), 1)\n",
    "    \n",
    "    # Paint KNP as 2 (higher protection)\n",
    "    knp_mask = ee.Image(0).paint(knp_fc, 1).clip(study_area)\n",
    "    protection = protection.where(knp_mask.eq(1), 2)\n",
    "    \n",
    "    protection_status = protection.rename('protection_status')\n",
    "    \n",
    "    # Years protected\n",
    "    print(\"Processing: Years Protected...\")\n",
    "    # Using documented establishment dates\n",
    "    current_year = Config.ANALYSIS_YEAR\n",
    "    knp_years = current_year - Config.PROTECTION_DATES['knp']  # 74 years (1950-2024)\n",
    "    gma_years = current_year - Config.PROTECTION_DATES['gma']  # 52 years (1972-2024)\n",
    "    \n",
    "    print(f\"  KNP established: {Config.PROTECTION_DATES['knp']} ({knp_years} years)\")\n",
    "    print(f\"  GMAs established: {Config.PROTECTION_DATES['gma']} ({gma_years} years)\")\n",
    "    \n",
    "    years_protected = ee.Image(0).clip(study_area)\n",
    "    years_protected = years_protected.where(gma_mask.eq(1), gma_years)\n",
    "    years_protected = years_protected.where(knp_mask.eq(1), knp_years)\n",
    "    \n",
    "    years_protected = years_protected.rename('years_protected')\n",
    "    \n",
    "    print(\"✓ Conservation variables completed.\")\n",
    "    \n",
    "    return {\n",
    "        'protection_status': protection_status,\n",
    "        'years_protected': years_protected\n",
    "    }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e284374",
   "metadata": {},
   "outputs": [],
   "source": [
    "# =============================================================================\n",
    "# 7. TOPOGRAPHIC VARIABLES\n",
    "# =============================================================================\n",
    "\n",
    "def get_topographic_variables(study_area):\n",
    "    \"\"\"\n",
    "    Get topographic driver variables from SRTM DEM\n",
    "    \n",
    "    Variables:\n",
    "    - elevation: Elevation (m)\n",
    "    - slope: Slope (degrees)\n",
    "    - aspect: Aspect (degrees from north)\n",
    "    - twi: Topographic Wetness Index\n",
    "    \"\"\"\n",
    "    print(\"\\n--- Processing Topographic Variables ---\")\n",
    "    \n",
    "    # Load SRTM DEM\n",
    "    dem = ee.Image(\"USGS/SRTMGL1_003\").clip(study_area)\n",
    "    \n",
    "    # Elevation\n",
    "    print(\"Processing: Elevation...\")\n",
    "    elevation = dem.rename('elevation')\n",
    "    \n",
    "    # Slope\n",
    "    print(\"Processing: Slope...\")\n",
    "    slope = ee.Terrain.slope(dem).rename('slope')\n",
    "    \n",
    "    # Aspect\n",
    "    print(\"Processing: Aspect...\")\n",
    "    aspect = ee.Terrain.aspect(dem).rename('aspect')\n",
    "    \n",
    "    # Topographic Wetness Index (TWI)\n",
    "    print(\"Processing: Topographic Wetness Index...\")\n",
    "    # TWI = ln(a / tan(b)) where a = flow accumulation, b = slope\n",
    "    \n",
    "    # Calculate flow accumulation\n",
    "    flow_acc = ee.Image(\"WWF/HydroSHEDS/15ACC\").clip(study_area)\n",
    "    \n",
    "    # Calculate TWI\n",
    "    slope_rad = slope.multiply(ee.Number(3.14159).divide(180))\n",
    "    tan_slope = slope_rad.tan().max(0.001)  # Avoid division by zero\n",
    "    twi = flow_acc.log().subtract(tan_slope.log()).rename('twi')\n",
    "    \n",
    "    print(\"✓ Topographic variables completed.\")\n",
    "    \n",
    "    return {\n",
    "        'elevation': elevation,\n",
    "        'slope': slope,\n",
    "        'aspect': aspect,\n",
    "        'twi': twi\n",
    "    }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10c6a08a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# =============================================================================\n",
    "# 8. CLIMATIC VARIABLES\n",
    "# =============================================================================\n",
    "\n",
    "def get_climatic_variables(study_area):\n",
    "    \"\"\"\n",
    "    Get climatic driver variables\n",
    "    \n",
    "    Variables:\n",
    "    - mean_rainfall: Mean annual rainfall (mm) from CHIRPS\n",
    "    - mean_temp: Mean annual temperature (°C) from ERA5\n",
    "    \"\"\"\n",
    "    print(\"\\n--- Processing Climatic Variables ---\")\n",
    "    \n",
    "    # Mean annual rainfall (CHIRPS 2000-2020)\n",
    "    print(\"Processing: Mean Annual Rainfall...\")\n",
    "    chirps = ee.ImageCollection(\"UCSB-CHG/CHIRPS/DAILY\").filterDate(\n",
    "        Config.CLIMATE_START, Config.CLIMATE_END\n",
    "    )\n",
    "    \n",
    "    # Calculate annual totals then mean\n",
    "    years = ee.List.sequence(2000, 2020)\n",
    "    \n",
    "    def annual_precip(year):\n",
    "        start = ee.Date.fromYMD(year, 1, 1)\n",
    "        end = start.advance(1, 'year')\n",
    "        return chirps.filterDate(start, end).sum()\n",
    "    \n",
    "    annual_precip_images = ee.ImageCollection.fromImages(\n",
    "        years.map(lambda y: annual_precip(ee.Number(y)))\n",
    "    )\n",
    "    \n",
    "    mean_rainfall = annual_precip_images.mean().clip(study_area).rename('mean_rainfall')\n",
    "    \n",
    "    # Mean annual temperature (ERA5-Land 2000-2020)\n",
    "    print(\"Processing: Mean Annual Temperature...\")\n",
    "    era5 = ee.ImageCollection(\"ECMWF/ERA5_LAND/MONTHLY_AGGR\").filterDate(\n",
    "        Config.CLIMATE_START, Config.CLIMATE_END\n",
    "    )\n",
    "    \n",
    "    # Convert from Kelvin to Celsius\n",
    "    mean_temp = era5.select('temperature_2m').mean().subtract(273.15)\n",
    "    mean_temp = mean_temp.clip(study_area).rename('mean_temp')\n",
    "    \n",
    "    print(\"✓ Climatic variables completed.\")\n",
    "    \n",
    "    return {\n",
    "        'mean_rainfall': mean_rainfall,\n",
    "        'mean_temp': mean_temp\n",
    "    }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4af1cc7f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# =============================================================================\n",
    "# 9. EXPORT FUNCTIONS\n",
    "# =============================================================================\n",
    "\n",
    "def export_image_to_drive(image, description, folder, region, scale=30, crs=\"EPSG:32735\"):\n",
    "    \"\"\"\n",
    "    Export image to Google Drive\n",
    "    \n",
    "    Returns task object for monitoring\n",
    "    \"\"\"\n",
    "    task = ee.batch.Export.image.toDrive(\n",
    "        image=image,\n",
    "        description=description,\n",
    "        folder=folder,\n",
    "        region=region,\n",
    "        scale=scale,\n",
    "        crs=crs,\n",
    "        maxPixels=1e13,\n",
    "        fileFormat='GeoTIFF'\n",
    "    )\n",
    "    task.start()\n",
    "    return task\n",
    "\n",
    "\n",
    "def export_all_variables(all_variables, study_area):\n",
    "    \"\"\"\n",
    "    Export all driver variables to Google Drive\n",
    "    \n",
    "    Returns dict of task objects\n",
    "    \"\"\"\n",
    "    print(\"\\n--- Exporting Variables to Google Drive ---\")\n",
    "    \n",
    "    tasks = {}\n",
    "    \n",
    "    for category, variables in all_variables.items():\n",
    "        print(f\"\\n  {category.upper()}:\")\n",
    "        for var_name, image in variables.items():\n",
    "            description = f\"{var_name}\"\n",
    "            print(f\"    Exporting: {var_name}...\")\n",
    "            \n",
    "            task = export_image_to_drive(\n",
    "                image=image,\n",
    "                description=description,\n",
    "                folder=Config.EXPORT_FOLDER,\n",
    "                region=study_area,\n",
    "                scale=Config.EXPORT_SCALE,\n",
    "                crs=Config.TARGET_CRS\n",
    "            )\n",
    "            \n",
    "            tasks[var_name] = task\n",
    "    \n",
    "    print(f\"\\n✓ {len(tasks)} export tasks started\")\n",
    "    return tasks"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b6878d75",
   "metadata": {},
   "outputs": [],
   "source": [
    "# =============================================================================\n",
    "# 10. POST-EXPORT VALIDATION\n",
    "# =============================================================================\n",
    "\n",
    "def validate_exported_raster(filepath, expected_crs=\"EPSG:32735\", expected_resolution=30):\n",
    "    \"\"\"\n",
    "    Validate an exported raster file\n",
    "    \n",
    "    Checks:\n",
    "    - File exists and is readable\n",
    "    - CRS matches expected\n",
    "    - Resolution matches expected\n",
    "    - NoData coverage percentage\n",
    "    - Basic statistics (min, max, mean, std)\n",
    "    - Extreme value detection\n",
    "    \n",
    "    Returns:\n",
    "    --------\n",
    "    dict : Validation results\n",
    "    \"\"\"\n",
    "    if not RASTERIO_AVAILABLE:\n",
    "        return {'status': 'skipped', 'reason': 'rasterio not available'}\n",
    "    \n",
    "    if not os.path.exists(filepath):\n",
    "        return {'status': 'failed', 'reason': 'file not found'}\n",
    "    \n",
    "    results = {\n",
    "        'filepath': filepath,\n",
    "        'filename': os.path.basename(filepath),\n",
    "        'status': 'passed',\n",
    "        'warnings': [],\n",
    "        'errors': []\n",
    "    }\n",
    "    \n",
    "    try:\n",
    "        with rasterio.open(filepath) as src:\n",
    "            # Basic metadata\n",
    "            results['crs'] = str(src.crs)\n",
    "            results['shape'] = src.shape\n",
    "            results['resolution'] = src.res\n",
    "            results['bounds'] = src.bounds\n",
    "            results['dtype'] = str(src.dtypes[0])\n",
    "            results['nodata_value'] = src.nodata\n",
    "            \n",
    "            # CRS check\n",
    "            if src.crs and src.crs.to_string() != expected_crs:\n",
    "                results['warnings'].append(f\"CRS mismatch: expected {expected_crs}, got {src.crs}\")\n",
    "            \n",
    "            # Resolution check\n",
    "            if abs(src.res[0] - expected_resolution) > 1:\n",
    "                results['warnings'].append(\n",
    "                    f\"Resolution mismatch: expected {expected_resolution}m, got {src.res[0]:.1f}m\"\n",
    "                )\n",
    "            \n",
    "            # Read data and calculate statistics\n",
    "            data = src.read(1)\n",
    "            \n",
    "            # Handle NoData\n",
    "            if src.nodata is not None:\n",
    "                valid_mask = data != src.nodata\n",
    "                valid_data = data[valid_mask]\n",
    "            else:\n",
    "                valid_mask = ~np.isnan(data)\n",
    "                valid_data = data[valid_mask]\n",
    "            \n",
    "            # NoData coverage\n",
    "            total_pixels = data.size\n",
    "            valid_pixels = valid_data.size\n",
    "            nodata_pct = ((total_pixels - valid_pixels) / total_pixels) * 100\n",
    "            \n",
    "            results['total_pixels'] = int(total_pixels)\n",
    "            results['valid_pixels'] = int(valid_pixels)\n",
    "            results['nodata_percentage'] = round(nodata_pct, 2)\n",
    "            \n",
    "            # Statistics\n",
    "            if valid_pixels > 0:\n",
    "                results['statistics'] = {\n",
    "                    'min': float(np.min(valid_data)),\n",
    "                    'max': float(np.max(valid_data)),\n",
    "                    'mean': float(np.mean(valid_data)),\n",
    "                    'std': float(np.std(valid_data)),\n",
    "                    'median': float(np.median(valid_data))\n",
    "                }\n",
    "                \n",
    "                # Extreme value detection\n",
    "                q1, q3 = np.percentile(valid_data, [25, 75])\n",
    "                iqr = q3 - q1\n",
    "                lower_bound = q1 - 3 * iqr\n",
    "                upper_bound = q3 + 3 * iqr\n",
    "                \n",
    "                extreme_low = np.sum(valid_data < lower_bound)\n",
    "                extreme_high = np.sum(valid_data > upper_bound)\n",
    "                \n",
    "                results['extreme_values'] = {\n",
    "                    'below_lower_bound': int(extreme_low),\n",
    "                    'above_upper_bound': int(extreme_high),\n",
    "                    'lower_bound': float(lower_bound),\n",
    "                    'upper_bound': float(upper_bound)\n",
    "                }\n",
    "                \n",
    "                # Warnings for high NoData or extreme values\n",
    "                if nodata_pct > 10:\n",
    "                    results['warnings'].append(f\"High NoData coverage: {nodata_pct:.1f}%\")\n",
    "                \n",
    "                if (extreme_low + extreme_high) / valid_pixels > 0.05:\n",
    "                    results['warnings'].append(\n",
    "                        f\"High proportion of extreme values: {(extreme_low + extreme_high) / valid_pixels * 100:.1f}%\"\n",
    "                    )\n",
    "            else:\n",
    "                results['errors'].append(\"No valid pixels found\")\n",
    "                results['status'] = 'failed'\n",
    "                \n",
    "    except Exception as e:\n",
    "        results['errors'].append(str(e))\n",
    "        results['status'] = 'failed'\n",
    "    \n",
    "    # Set overall status\n",
    "    if results['errors']:\n",
    "        results['status'] = 'failed'\n",
    "    elif results['warnings']:\n",
    "        results['status'] = 'warning'\n",
    "    \n",
    "    return results\n",
    "\n",
    "\n",
    "def validate_all_exports(output_dir):\n",
    "    \"\"\"\n",
    "    Validate all exported raster files in the output directory\n",
    "    \n",
    "    Returns comprehensive validation report\n",
    "    \"\"\"\n",
    "    print(\"\\n\" + \"=\"*60)\n",
    "    print(\"POST-EXPORT VALIDATION\")\n",
    "    print(\"=\"*60)\n",
    "    \n",
    "    if not RASTERIO_AVAILABLE:\n",
    "        print(\"  Warning: rasterio not available. Install with 'pip install rasterio'\")\n",
    "        return None\n",
    "    \n",
    "    validation_results = {\n",
    "        'timestamp': datetime.now().isoformat(),\n",
    "        'output_directory': output_dir,\n",
    "        'variables': {},\n",
    "        'summary': {\n",
    "            'total': 0,\n",
    "            'passed': 0,\n",
    "            'warning': 0,\n",
    "            'failed': 0,\n",
    "            'skipped': 0\n",
    "        }\n",
    "    }\n",
    "    \n",
    "    # Expected variables by category\n",
    "    expected_files = {\n",
    "        'proximity': ['dist_roads.tif', 'dist_settlements.tif', 'dist_rivers.tif', 'dist_knp.tif'],\n",
    "        'socioeconomic': ['pop_density.tif', 'pop_change.tif', 'pct_cultivated.tif'],\n",
    "        'conservation': ['protection_status.tif', 'years_protected.tif'],\n",
    "        'topographic': ['elevation.tif', 'slope.tif', 'aspect.tif', 'twi.tif'],\n",
    "        'climatic': ['mean_rainfall.tif', 'mean_temp.tif']\n",
    "    }\n",
    "    \n",
    "    for category, files in expected_files.items():\n",
    "        print(f\"\\n  {category.upper()}:\")\n",
    "        category_dir = os.path.join(output_dir, category)\n",
    "        \n",
    "        for filename in files:\n",
    "            filepath = os.path.join(category_dir, filename)\n",
    "            var_name = filename.replace('.tif', '')\n",
    "            \n",
    "            result = validate_exported_raster(filepath)\n",
    "            validation_results['variables'][var_name] = result\n",
    "            validation_results['summary']['total'] += 1\n",
    "            validation_results['summary'][result['status']] += 1\n",
    "            \n",
    "            # Print status\n",
    "            status_symbol = {'passed': '✓', 'warning': '⚠', 'failed': '✗', 'skipped': '○'}\n",
    "            symbol = status_symbol.get(result['status'], '?')\n",
    "            \n",
    "            if result['status'] == 'passed':\n",
    "                stats = result.get('statistics', {})\n",
    "                print(f\"    {symbol} {var_name}: min={stats.get('min', 'N/A'):.2f}, \"\n",
    "                      f\"max={stats.get('max', 'N/A'):.2f}, NoData={result.get('nodata_percentage', 'N/A'):.1f}%\")\n",
    "            elif result['status'] == 'warning':\n",
    "                print(f\"    {symbol} {var_name}: {', '.join(result['warnings'])}\")\n",
    "            else:\n",
    "                print(f\"    {symbol} {var_name}: {result.get('reason', 'Unknown error')}\")\n",
    "    \n",
    "    # Print summary\n",
    "    print(f\"\\n  --- Validation Summary ---\")\n",
    "    print(f\"  Total variables: {validation_results['summary']['total']}\")\n",
    "    print(f\"  Passed: {validation_results['summary']['passed']}\")\n",
    "    print(f\"  Warnings: {validation_results['summary']['warning']}\")\n",
    "    print(f\"  Failed: {validation_results['summary']['failed']}\")\n",
    "    print(f\"  Skipped: {validation_results['summary']['skipped']}\")\n",
    "    \n",
    "    # Save validation report\n",
    "    report_path = os.path.join(output_dir, 'validation', 'validation_report.json')\n",
    "    os.makedirs(os.path.dirname(report_path), exist_ok=True)\n",
    "    \n",
    "    with open(report_path, 'w') as f:\n",
    "        json.dump(validation_results, f, indent=2, default=str)\n",
    "    \n",
    "    print(f\"\\n  ✓ Validation report saved: {report_path}\")\n",
    "    \n",
    "    return validation_results\n",
    "\n",
    "\n",
    "def generate_validation_summary_table(validation_results):\n",
    "    \"\"\"\n",
    "    Generate a summary table of validation results for the thesis\n",
    "    \"\"\"\n",
    "    if validation_results is None:\n",
    "        return None\n",
    "    \n",
    "    summary_data = []\n",
    "    \n",
    "    for var_name, result in validation_results['variables'].items():\n",
    "        if result['status'] == 'skipped':\n",
    "            continue\n",
    "            \n",
    "        stats = result.get('statistics', {})\n",
    "        \n",
    "        row = {\n",
    "            'Variable': var_name,\n",
    "            'Status': result['status'],\n",
    "            'Valid Pixels': result.get('valid_pixels', 'N/A'),\n",
    "            'NoData (%)': result.get('nodata_percentage', 'N/A'),\n",
    "            'Min': round(stats.get('min', np.nan), 2),\n",
    "            'Max': round(stats.get('max', np.nan), 2),\n",
    "            'Mean': round(stats.get('mean', np.nan), 2),\n",
    "            'Std': round(stats.get('std', np.nan), 2),\n",
    "            'Warnings': len(result.get('warnings', []))\n",
    "        }\n",
    "        summary_data.append(row)\n",
    "    \n",
    "    # Save as CSV\n",
    "    import csv\n",
    "    csv_path = os.path.join(Config.OUTPUT_DIR, 'validation', 'validation_summary.csv')\n",
    "    \n",
    "    with open(csv_path, 'w', newline='') as f:\n",
    "        writer = csv.DictWriter(f, fieldnames=summary_data[0].keys())\n",
    "        writer.writeheader()\n",
    "        writer.writerows(summary_data)\n",
    "    \n",
    "    print(f\"  ✓ Validation summary table saved: {csv_path}\")\n",
    "    \n",
    "    return summary_data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "855e55c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# =============================================================================\n",
    "# 11. INTERACTIVE VISUALIZATION\n",
    "# =============================================================================\n",
    "\n",
    "def create_visualization_map(all_variables, study_area):\n",
    "    \"\"\"\n",
    "    Create an interactive map with all driver variables\n",
    "    \"\"\"\n",
    "    print(\"\\n--- Creating Interactive Map ---\")\n",
    "    \n",
    "    # Initialize map centered on GKE\n",
    "    Map = geemap.Map()\n",
    "    \n",
    "    # Get centroid for centering\n",
    "    centroid = study_area.centroid().getInfo()['coordinates']\n",
    "    Map.setCenter(centroid[0], centroid[1], 7)\n",
    "    \n",
    "    # Add study area boundary\n",
    "    Map.addLayer(\n",
    "        ee.Image().paint(ee.FeatureCollection([ee.Feature(study_area)]), 1, 2),\n",
    "        {'palette': ['red']},\n",
    "        'Study Area'\n",
    "    )\n",
    "    \n",
    "    # Visualization parameters for different variable types\n",
    "    vis_params = {\n",
    "        'dist_roads': {'min': 0, 'max': 50000, 'palette': ['red', 'yellow', 'green']},\n",
    "        'dist_settlements': {'min': 0, 'max': 50000, 'palette': ['red', 'yellow', 'green']},\n",
    "        'dist_rivers': {'min': 0, 'max': 10000, 'palette': ['blue', 'cyan', 'white']},\n",
    "        'dist_knp': {'min': 0, 'max': 50000, 'palette': ['green', 'yellow', 'red']},\n",
    "        'pop_density': {'min': 0, 'max': 1, 'palette': ['white', 'yellow', 'red']},\n",
    "        'pop_change': {'min': -10, 'max': 100, 'palette': ['blue', 'white', 'red']},\n",
    "        'pct_cultivated': {'min': 0, 'max': 100, 'palette': ['green', 'yellow', 'brown']},\n",
    "        'protection_status': {'min': 0, 'max': 2, 'palette': ['gray', 'orange', 'green']},\n",
    "        'years_protected': {'min': 0, 'max': 75, 'palette': ['white', 'lightgreen', 'darkgreen']},\n",
    "        'elevation': {'min': 900, 'max': 1500, 'palette': ['green', 'yellow', 'brown', 'white']},\n",
    "        'slope': {'min': 0, 'max': 30, 'palette': ['green', 'yellow', 'red']},\n",
    "        'aspect': {'min': 0, 'max': 360, 'palette': ['red', 'yellow', 'green', 'cyan', 'blue', 'magenta', 'red']},\n",
    "        'twi': {'min': 0, 'max': 15, 'palette': ['brown', 'yellow', 'blue']},\n",
    "        'mean_rainfall': {'min': 600, 'max': 1200, 'palette': ['brown', 'yellow', 'green', 'blue']},\n",
    "        'mean_temp': {'min': 18, 'max': 25, 'palette': ['blue', 'green', 'yellow', 'red']},\n",
    "    }\n",
    "    \n",
    "    # Add layers (initially hidden)\n",
    "    for category, variables in all_variables.items():\n",
    "        for var_name, image in variables.items():\n",
    "            vis = vis_params.get(var_name, {'min': 0, 'max': 1, 'palette': ['white', 'black']})\n",
    "            Map.addLayer(image, vis, var_name, shown=False)\n",
    "    \n",
    "    print(\"✓ Interactive map created\")\n",
    "    \n",
    "    return Map"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a15ef516",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "# =============================================================================\n",
    "# 12. MAIN EXECUTION\n",
    "# =============================================================================\n",
    "\n",
    "def main():\n",
    "    \"\"\"\n",
    "    Main execution function\n",
    "    \"\"\"\n",
    "    print(\"=\"*60)\n",
    "    print(\"SCRIPT 1: DATA ACQUISITION AND PREPROCESSING (REVISED)\")\n",
    "    print(\"Greater Kafue Ecosystem - Driver Variables\")\n",
    "    print(\"=\"*60)\n",
    "    print(f\"\\nTimestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\")\n",
    "    \n",
    "    # Print configuration\n",
    "    print(\"\\n--- Configuration ---\")\n",
    "    print(f\"  Study Area: {Config.STUDY_AREA_NAME}\")\n",
    "    print(f\"  Target CRS: {Config.TARGET_CRS}\")\n",
    "    print(f\"  Resolution: {Config.TARGET_RESOLUTION}m\")\n",
    "    print(f\"  Output Directory: {Config.OUTPUT_DIR}\")\n",
    "    \n",
    "    # Print driver variables\n",
    "    total_vars = sum(len(v) for v in Config.DRIVER_VARIABLES.values())\n",
    "    print(f\"\\n  Driver Variables ({total_vars} total):\")\n",
    "    for category, vars in Config.DRIVER_VARIABLES.items():\n",
    "        print(f\"    {category}: {', '.join(vars)}\")\n",
    "    \n",
    "    # Initialize\n",
    "    initialize_gee()\n",
    "    create_output_directories()\n",
    "    \n",
    "    # Verify inputs\n",
    "    if not verify_input_files():\n",
    "        print(\"\\n⚠ Warning: Some input files not found. Will use GEE fallbacks where available.\")\n",
    "    \n",
    "    # Load study area and boundaries\n",
    "    study_area = get_study_area()\n",
    "    boundaries = get_boundary_features()\n",
    "    \n",
    "    # Process all variable categories\n",
    "    all_variables = {}\n",
    "    \n",
    "    # Proximity variables\n",
    "    all_variables['proximity'] = get_proximity_variables(study_area, boundaries)\n",
    "    \n",
    "    # Socio-economic variables (using ESA WorldCover for pct_cultivated)\n",
    "    all_variables['socioeconomic'] = get_socioeconomic_variables(study_area)\n",
    "    \n",
    "    # Conservation variables\n",
    "    all_variables['conservation'] = get_protection_variables(\n",
    "        study_area, \n",
    "        boundaries.get('knp'),\n",
    "        boundaries.get('gma')\n",
    "    )\n",
    "    \n",
    "    # Topographic variables\n",
    "    all_variables['topographic'] = get_topographic_variables(study_area)\n",
    "    \n",
    "    # Climatic variables\n",
    "    all_variables['climatic'] = get_climatic_variables(study_area)\n",
    "    \n",
    "    # Summary of processed variables\n",
    "    print(\"\\n\" + \"=\"*60)\n",
    "    print(\"PROCESSING SUMMARY\")\n",
    "    print(\"=\"*60)\n",
    "    \n",
    "    for category, variables in all_variables.items():\n",
    "        print(f\"\\n  {category.upper()}: {len(variables)} variables\")\n",
    "        for var_name in variables.keys():\n",
    "            print(f\"    - {var_name}\")\n",
    "    \n",
    "    total_processed = sum(len(v) for v in all_variables.values())\n",
    "    print(f\"\\n  Total variables processed: {total_processed}\")\n",
    "    \n",
    "    # Export to Google Drive\n",
    "    print(\"\\n\" + \"=\"*60)\n",
    "    print(\"EXPORTING TO GOOGLE DRIVE\")\n",
    "    print(\"=\"*60)\n",
    "    \n",
    "    tasks = export_all_variables(all_variables, study_area)\n",
    "    \n",
    "    print(f\"\\n  Export folder: {Config.EXPORT_FOLDER}\")\n",
    "    print(f\"  Monitor progress at: https://code.earthengine.google.com/tasks\")\n",
    "    \n",
    "    # Create interactive map\n",
    "    Map = create_visualization_map(all_variables, study_area)\n",
    "    \n",
    "    # Instructions for validation\n",
    "    print(\"\\n\" + \"=\"*60)\n",
    "    print(\"NEXT STEPS\")\n",
    "    print(\"=\"*60)\n",
    "    print(\"\"\"\n",
    "    1. Monitor export tasks in GEE Code Editor\n",
    "    2. Download completed files from Google Drive\n",
    "    3. Organize files into category subdirectories\n",
    "    4. Run validation:\n",
    "    \n",
    "       validation_results = validate_all_exports(Config.OUTPUT_DIR)\n",
    "       generate_validation_summary_table(validation_results)\n",
    "    \n",
    "    5. Review validation report before proceeding to Script 02\n",
    "    \"\"\")\n",
    "    \n",
    "    return {\n",
    "        'variables': all_variables,\n",
    "        'tasks': tasks,\n",
    "        'map': Map,\n",
    "        'study_area': study_area\n",
    "    }\n",
    "\n",
    "\n",
    "if __name__ == \"__main__\":\n",
    "    results = main()\n",
    "    \n",
    "    # Display interactive map (if in Jupyter/notebook environment)\n",
    "    # results['map']"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.9.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
