# SharePoint Download Tool - Technical Documentation

A production-ready Python utility for robust synchronization of SharePoint Online folders using Microsoft Graph API.

## Project Overview

* **Purpose:** Enterprise-grade synchronization tool for local mirroring of SharePoint content.
* **Technologies:**
    * **Microsoft Graph API:** Advanced REST API for SharePoint data.
    * **MSAL:** Secure authentication using Azure AD Client Credentials.
    * **Requests:** High-performance HTTP client with streaming and Range header support.
    * **ThreadPoolExecutor:** Parallel file processing for optimized throughput.

## Core Features (Production Ready)

1. **Resumable Downloads:** Implements HTTP `Range` headers to resume partially downloaded files, critical for multi-gigabyte assets.
2. **Reliability:** Includes a custom `retry_request` decorator for Exponential Backoff, handling throttling (429) and transient network errors.
3. **Concurrency:** Multi-threaded architecture (5 workers) for simultaneous scanning and downloading.
4. **Pagination:** Full support for OData pagination, ensuring complete folder traversal regardless of item count.
5. **Self-Healing Sessions:** Automatically detects and resolves 401 Unauthorized errors by refreshing both expiring Microsoft Graph Download URLs and MSAL Access Tokens mid-process.
6. **Logging & Audit:** Integrated Python `logging` to `sharepoint_download.log` and structured CSV reports for error auditing.

## Building and Running

### Setup

1. **Dependencies:** `pip install -r requirements.txt`
2. **Configuration:** Use `connection_info.template.txt` to create `connection_info.txt`.

### Execution

`python download_sharepoint.py`

## Development Conventions

* **Error Handling:** Always use the `safe_get` (retry-wrapped) method for Graph API calls. For item-specific operations, use `get_fresh_download_url` to handle token/URL expiry.
* **Authentication:** Use `get_headers(app, force_refresh=True)` when a 401 error is encountered from Graph API to ensure session continuity.
* **Thread Safety:** Use `report_lock` when updating the shared error list from worker threads.
* **Logging:** Prefer `logger.info()` or `logger.error()` over `print()` to ensure persistence in `sharepoint_download.log`.
* **Integrity:** Always verify file integrity using `size` and `quickXorHash` where available.
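
The retry behavior that the Error Handling convention relies on can be illustrated with a small decorator. This is a minimal sketch and not the project's actual `retry_request` implementation: the name `retry_with_backoff`, the `TransientError` exception, the injectable `sleep` argument, and every default value are assumptions made for illustration.

```python
import functools
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a 429 response or a dropped connection."""


def retry_with_backoff(max_attempts=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Retry the wrapped callable on TransientError with exponential backoff.

    All parameter names and defaults are assumptions, not the tool's real values.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except TransientError:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Delay doubles on each attempt (base, 2x, 4x, ...), capped.
                    delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                    # Up to 10% random jitter keeps worker threads from
                    # retrying in lockstep.
                    sleep(delay + random.uniform(0, delay * 0.1))
        return wrapper
    return decorator
```

Passing `sleep` as a parameter is only a convenience for testing the delay schedule without waiting. When Graph throttles a request, the 429 response carries a `Retry-After` header, and a production version would likely prefer that value over the computed delay.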
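
Resumable downloads, as described under Core Features, come down to asking the server for only the bytes that are still missing. The helper below sketches that decision under stated assumptions: `resume_headers` is a hypothetical name, and it assumes the expected size is already known from the item's `size` property.

```python
import os
import tempfile


def resume_headers(path, expected_size):
    """Decide how to continue a download of `expected_size` bytes to `path`.

    Returns None when the local file is already complete, otherwise a
    (headers, file_mode) pair. Hypothetical helper, for illustration only.
    """
    have = os.path.getsize(path) if os.path.exists(path) else 0
    if have == expected_size:
        return None  # nothing left to fetch
    if have == 0 or have > expected_size:
        # No partial data, or a local file larger than the remote one:
        # start over and overwrite.
        return {}, "wb"
    # Request everything from the first missing byte and append to the file.
    return {"Range": f"bytes={have}-"}, "ab"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "asset.bin")
        with open(target, "wb") as fh:
            fh.write(b"x" * 1024)  # simulate an interrupted download
        print(resume_headers(target, 4096))
```

The returned headers would be passed to a streaming request such as `requests.get(url, headers=headers, stream=True)`. A server that honors the range answers with status 206; if it answers 200 instead, the body is the whole file and the local copy should be overwritten rather than appended to.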