- Updates README.md with a description of Timestamp Sync, Hash Toggle, and the 30 MB limit
- Updates GEMINI.md with technical specifications for QuickXorHash and the library fallback
- Adds guidance for the new configuration options in the GUI
# SharePoint Download Tool - Technical Documentation
A production-ready Python utility for robust synchronization of SharePoint Online folders using the Microsoft Graph API.
## Project Overview
* **Purpose:** Enterprise-grade synchronization tool for local mirroring of SharePoint content.
* **Technologies:**
    * **Microsoft Graph API:** Advanced REST API for SharePoint data.
    * **MSAL:** Secure authentication using Azure AD Client Credentials.
    * **Requests:** High-performance HTTP client with streaming and Range header support.
    * **ThreadPoolExecutor:** Parallel file processing for optimized throughput.
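
As a rough illustration of how streaming and `Range` support can combine to resume a partial file, consider the sketch below. The helper names `resume_headers` and `download_with_resume`, the 1 MiB chunk size, and the 60-second timeout are assumptions made for this example, not the tool's actual code; `session` is expected to be a `requests.Session`.

```python
import os


def resume_headers(path):
    """Build a Range header that asks only for the bytes not yet on disk."""
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    return {"Range": f"bytes={offset}-"} if offset else {}


def download_with_resume(session, url, path, chunk_size=1024 * 1024):
    """Stream `url` into `path`, appending when the server honours the Range."""
    headers = resume_headers(path)
    with session.get(url, headers=headers, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        # 206 Partial Content: the server resumed at our offset, so append.
        # Any other success status carries the full body, so start over.
        mode = "ab" if resp.status_code == 206 else "wb"
        with open(path, mode) as fh:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                fh.write(chunk)
```

With `requests` installed, `download_with_resume(requests.Session(), url, "report.pdf")` would continue a partial `report.pdf`. A file that is already complete typically triggers a 416 response, which this sketch surfaces as an exception rather than handling.
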
## Core Features (Production Ready)
1. **Timestamp Synchronization:** Compares the SharePoint `lastModifiedDateTime` with the local file's `mtime` and downloads only when the remote copy is newer, so unchanged files are skipped and sync time drops accordingly.
2. **Optimized Integrity Validation:** Implements the official Microsoft **QuickXorHash** (160-bit circular XOR). A configurable size threshold (default 30 MB) and a global toggle balance security and performance for large assets.
3. **Resumable Downloads:** Uses HTTP `Range` headers to resume partially downloaded files, which is critical for multi-gigabyte assets.
4. **Reliability:** Includes a custom `retry_request` decorator that applies exponential backoff to handle throttling (HTTP 429) and transient network errors.
5. **Robust Library Discovery:** Automatic resolution of document library IDs with built-in fallbacks for localized names (e.g., "Delte dokumenter" to "Documents").
6. **Self-Healing Sessions:** Automatically detects and resolves 401 Unauthorized errors by refreshing both expiring Microsoft Graph Download URLs and MSAL Access Tokens mid-process.
7. **Concurrency:** Multi-threaded architecture (5 workers) for simultaneous scanning and downloading.
8. **Pagination:** Full support for OData pagination, ensuring complete folder traversal regardless of item count.
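
The timestamp rule in item 1 can be sketched as follows. `parse_graph_timestamp`, `needs_download`, and the one-second tolerance are hypothetical names and values chosen for illustration; the tool's own comparison may differ.

```python
import os
from datetime import datetime, timezone


def parse_graph_timestamp(value):
    """Parse an ISO 8601 string such as '2024-05-01T12:30:00Z' into aware UTC."""
    # fromisoformat() accepts a trailing 'Z' only on Python 3.11+, so normalise it.
    # Unusual fractional-second precision may need extra handling on older versions.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: a timestamp without an offset is already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def needs_download(remote_modified, local_path, tolerance_seconds=1.0):
    """Return True when the file is missing or the remote copy is newer."""
    if not os.path.exists(local_path):
        return True
    # Interpret the local mtime as UTC so both sides share one timezone.
    local_mtime = datetime.fromtimestamp(os.path.getmtime(local_path), tz=timezone.utc)
    remote = parse_graph_timestamp(remote_modified)
    return (remote - local_mtime).total_seconds() > tolerance_seconds
```

The tolerance absorbs sub-second differences between filesystems and the service; setting it to `0` would make the comparison strict.
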
## Building and Running
### Setup
1. **Dependencies:** `pip install -r requirements.txt`
2. **Configuration:** Settings are managed via `connection_info.txt` or the GUI.
    * `ENABLE_HASH_VALIDATION`: `True` or `False`; turns hash validation on or off globally.
    * `HASH_THRESHOLD_MB`: size threshold, in MB, for hash validation (default 30).
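
For illustration only, the two hash settings could be read from a simple `KEY=value` file as shown below. The file format, the helper names `parse_settings` and `hash_options`, and the default of `True` for `ENABLE_HASH_VALIDATION` are assumptions; only the 30 MB default threshold comes from this document.

```python
def parse_settings(text):
    """Parse KEY=value lines, skipping blanks and '#' comments (assumed format)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate optional quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def hash_options(settings):
    """Return (enabled, threshold_mb) with fallbacks for missing keys."""
    enabled = settings.get("ENABLE_HASH_VALIDATION", "True").lower() == "true"
    threshold_mb = int(settings.get("HASH_THRESHOLD_MB", "30"))
    return enabled, threshold_mb
```

For example, `hash_options(parse_settings(open("connection_info.txt").read()))` would yield the pair of values to pass into the sync logic, assuming the file follows the format above.
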
### Execution
* **GUI:** `python sharepoint_gui.py`
* **CLI:** `python download_sharepoint.py`
## Development Conventions
* **QuickXorHash:** When implementing or updating the hashing code, ensure the file length is XORed into the **last 64 bits** (bits 96-159) of the 160-bit state, per the Microsoft specification.
* **Timezone Handling:** Always use UTC (ISO 8601) when comparing timestamps with SharePoint to avoid daylight saving time mismatches.
* **Error Handling:** Always use the `safe_get` (retry-wrapped) method for Graph API calls. For item-specific operations, use `get_fresh_download_url`.
* **Authentication:** Use `get_headers(app, force_refresh=True)` when a 401 error is encountered.
* **Logging:** Prefer `logger.info()` or `logger.error()` over `print()`.
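
The QuickXorHash convention above can be made concrete with a compact sketch. It models the 160-bit state as a Python integer in little-endian bit order, rotates each input byte by 11 bits per position, and XORs the length into the final 8 bytes. This is an illustrative reading of the algorithm, not the tool's implementation; the function name and the chunk-iterable interface are assumptions, and results should be checked against a hash returned by Graph before relying on it.

```python
import base64

WIDTH_BITS = 160  # size of the circular state
SHIFT = 11        # each successive byte lands 11 bits further along
MASK = (1 << WIDTH_BITS) - 1


def quickxorhash(chunks):
    """Return the base64 QuickXorHash of an iterable of bytes objects."""
    state = 0
    length = 0
    for chunk in chunks:
        for byte in chunk:
            shift = (length * SHIFT) % WIDTH_BITS
            widened = byte << shift
            # Bits pushed past bit 159 wrap around to bit 0 (circular XOR).
            state ^= (widened & MASK) | (widened >> WIDTH_BITS)
            length += 1
    digest = bytearray(state.to_bytes(WIDTH_BITS // 8, "little"))
    # XOR the 64-bit little-endian length into bytes 12..19 (bits 96-159).
    for i, b in enumerate(length.to_bytes(8, "little")):
        digest[12 + i] ^= b
    return base64.b64encode(bytes(digest)).decode("ascii")
```

Because the state only depends on byte positions, feeding the same data in different chunk sizes gives the same digest, which suits streamed downloads. The per-byte loop is slow in pure Python, so a production version would likely batch the XOR work.
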