# SharePoint Download Tool - Technical Documentation

A production-ready Python utility that synchronizes SharePoint Online folders to a local directory using the Microsoft Graph API.

## Project Overview

*   **Purpose:** Enterprise-grade synchronization tool for local mirroring of SharePoint content.
*   **Technologies:**
    *   **Microsoft Graph API:** REST API used to enumerate and download SharePoint content.
    *   **MSAL:** Authentication via the Azure AD client credentials flow.
    *   **Requests:** HTTP client used for streaming downloads with `Range` header support.
    *   **ThreadPoolExecutor:** Parallel file processing for optimized throughput.
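
The streaming and `Range` header support listed above is what makes resuming a partial download possible. The following is a minimal sketch, not the project's actual code: the name `download_with_resume`, its parameters, and the way it treats the 200, 206, and 416 status codes are assumptions, and `session` is assumed to behave like a `requests.Session`.

```python
import os


def download_with_resume(session, url, dest_path, chunk_size=1024 * 1024):
    """Download url to dest_path, continuing from a partial file if present.

    Hypothetical helper: `session` is assumed to be a requests.Session
    (or any object whose `get` accepts the same keyword arguments).
    Returns the number of bytes present in dest_path afterwards.
    """
    offset = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    # Ask only for the bytes that are still missing.
    headers = {"Range": f"bytes={offset}-"} if offset else {}

    with session.get(url, headers=headers, stream=True, timeout=60) as resp:
        if resp.status_code == 416:
            # Range starts at or past the end of the remote file.
            # Assumption: the local copy is treated as already complete.
            return offset
        resp.raise_for_status()
        # 206: the server honored the range, so append.
        # 200: the server sent the whole file, so overwrite the partial data.
        mode = "ab" if resp.status_code == 206 else "wb"
        written = offset if mode == "ab" else 0
        with open(dest_path, mode) as fh:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                fh.write(chunk)
                written += len(chunk)
    return written
```

Treating 416 as "already complete" keeps the sketch short; a stricter tool would confirm this by comparing the local size or hash against the remote metadata.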

## Core Features (Production Ready)

1.  **Windows Long Path Support:** Automatically handles Windows path limitations by using `get_long_path` and `\\?\` absolute path prefixing.
2.  **High-Performance Integrity:** Uses the `quickxorhash` C library, if available, for fast validation of large files, and otherwise falls back to a manual 160-bit circular XOR implementation.
3.  **Timestamp Synchronization:** Compares the SharePoint `lastModifiedDateTime` with the local file's `mtime` and downloads a file only if the remote copy is newer, which significantly reduces sync time.
4.  **Optimized Integrity Validation:** Provides a configurable size threshold (default 30 MB) and a global toggle to balance integrity checking against performance for large assets.
5.  **Resumable Downloads:** Implements HTTP `Range` headers to resume partially downloaded files, critical for multi-gigabyte assets.
6.  **Reliability:** Uses a custom `retry_request` decorator that applies exponential backoff to handle throttling (HTTP 429) and transient network errors.
7.  **Robust Library Discovery:** Automatic resolution of document library IDs with built-in fallbacks for localized names.
8.  **Self-Healing Sessions:** Automatically refreshes expiring Microsoft Graph download URLs and MSAL access tokens mid-process.
9.  **Concurrency:** Multi-threaded architecture (5 workers) for simultaneous scanning and downloading.
10. **Pagination:** Follows OData pagination links (`@odata.nextLink`) so that large folders are traversed completely.
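
To illustrate the exponential backoff behind the Reliability feature, here is a minimal sketch of a retry decorator. It is not the project's `retry_request`: the name `retry_with_backoff`, its parameters, the set of retryable status codes, and the use of the `Retry-After` header are all assumptions made for the example. The wrapped function is assumed to return an object with `status_code` and `headers` attributes, as a `requests.Response` does.

```python
import functools
import time

# Assumed set of status codes worth retrying (throttling and server errors).
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def retry_with_backoff(max_attempts=5, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep):
    """Retry the wrapped call, doubling the wait after each failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                is_last = attempt == max_attempts - 1
                # 1x, 2x, 4x, ... the base delay, capped at max_delay.
                delay = min(max_delay, base_delay * (2 ** attempt))
                try:
                    response = func(*args, **kwargs)
                except (ConnectionError, TimeoutError):
                    # With `requests`, one would catch
                    # requests.exceptions.RequestException here instead.
                    if is_last:
                        raise
                else:
                    if response.status_code not in RETRYABLE_STATUS or is_last:
                        return response
                    # Respect the server's hint when it asks for a longer wait.
                    retry_after = response.headers.get("Retry-After", "")
                    if retry_after.isdigit():
                        delay = max(delay, float(retry_after))
                sleep(delay)
        return wrapper
    return decorator
```

Passing `sleep` as a parameter is a design choice for testability: a test can substitute a recording function and check the computed delays without actually waiting.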

## Building and Running

### Setup
1.  **Dependencies:** `pip install -r requirements.txt`. Installing `quickxorhash`, which needs a C compiler to build, is recommended for best performance.
2.  **Configuration:** Settings are managed via `connection_info.txt` or the GUI.
    *   `ENABLE_HASH_VALIDATION`: `True` or `False`; the global toggle for hash validation.
    *   `HASH_THRESHOLD_MB`: Size limit, in MB, for hashing (default 30).
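
As an illustration of how these two settings might interact, the sketch below parses them and decides whether a file should be hashed. Several things here are assumptions rather than documented behavior: the `KEY=VALUE` line format of `connection_info.txt`, the default of `True` for the toggle, the policy that files larger than the threshold skip hashing, and the function names `parse_settings` and `should_validate_hash`.

```python
# Assumed defaults; only the 30 MB threshold is stated in this document.
DEFAULTS = {"ENABLE_HASH_VALIDATION": True, "HASH_THRESHOLD_MB": 30}


def parse_settings(text):
    """Parse KEY=VALUE lines (assumed file format) into typed settings."""
    settings = dict(DEFAULTS)
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines without a key/value separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "ENABLE_HASH_VALIDATION":
            settings[key] = value.lower() == "true"
        elif key == "HASH_THRESHOLD_MB":
            settings[key] = int(value)
    return settings


def should_validate_hash(size_bytes, settings):
    """Assumed policy: hash only when enabled and within the size limit."""
    if not settings["ENABLE_HASH_VALIDATION"]:
        return False
    return size_bytes <= settings["HASH_THRESHOLD_MB"] * 1024 * 1024
```

For example, `should_validate_hash(10 * 1024 * 1024, parse_settings(""))` evaluates to `True` under these assumed defaults, while any size returns `False` once `ENABLE_HASH_VALIDATION=False` is set.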

### Execution
*   **GUI:** `python sharepoint_gui.py`
*   **CLI:** `python download_sharepoint.py`

## Development Conventions

*   **QuickXorHash:** When implementing or updating hashing, ensure the file length is XORed into the **last 64 bits** (bits 96-159) of the 160-bit state, per the Microsoft specification.
*   **Long Paths:** Always use `get_long_path()` when interacting with the local file system (`open`, `os.path.exists`, etc.).
*   **Timezone Handling:** Always use UTC (ISO 8601) when comparing timestamps with SharePoint.
*   **Error Handling:** Always use the `safe_get` (retry-wrapped) method for Graph API calls. For item-specific operations, use `get_fresh_download_url`.
*   **Authentication:** Use `get_headers(app, force_refresh=True)` when a 401 error is encountered.
*   **Logging:** Prefer `logger.info()` or `logger.error()` over `print()`.
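
The QuickXorHash convention above can be made concrete with a short pure-Python sketch. It follows Microsoft's published description of the algorithm (each byte is XORed into a 160-bit state after a circular left rotation of 11 bits per byte position, and the length is then XORed into the top 64 bits), but it is not the project's fallback implementation. The function names are hypothetical, and the whole input is taken as one `bytes` object for brevity rather than streamed.

```python
import base64

WIDTH_BITS = 160          # size of the hash state
SHIFT = 11                # rotation step per input byte
MASK = (1 << WIDTH_BITS) - 1
LENGTH_MASK = (1 << 64) - 1


def quick_xor_hash(data: bytes) -> bytes:
    """Return the 20-byte QuickXorHash digest of data (illustrative sketch)."""
    state = 0
    for index, byte in enumerate(data):
        shift = (index * SHIFT) % WIDTH_BITS
        # Circular left rotation of the byte within the 160-bit state:
        # bits pushed past bit 159 wrap around to bit 0.
        rotated = ((byte << shift) | (byte >> (WIDTH_BITS - shift))) & MASK
        state ^= rotated
    # XOR the length into the last 64 bits (bits 96-159) of the state.
    state ^= (len(data) & LENGTH_MASK) << (WIDTH_BITS - 64)
    # The state is serialized least significant byte first.
    return state.to_bytes(WIDTH_BITS // 8, "little")


def quick_xor_hash_b64(data: bytes) -> str:
    """Base64 form, matching how Graph reports `quickXorHash` values."""
    return base64.b64encode(quick_xor_hash(data)).decode("ascii")
```

A per-byte Python loop like this is slow on large files, which is consistent with the recommendation to install the `quickxorhash` C library where possible.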