Update documentation with timestamp sync and hash optimizations

- Updates README.md with a description of Timestamp Sync, Hash Toggle, and the 30MB limit
- Updates GEMINI.md with technical specifications for QuickXorHash and the library fallback
- Adds a guide to the new configuration options in the GUI
2026-03-29 19:25:28 +02:00
parent acede4a867
commit 367d31671d
2 changed files with 38 additions and 26 deletions
@@ -13,26 +13,31 @@ A production-ready Python utility for robust synchronization of SharePoint Onlin

 ## Core Features (Production Ready)

-1.  **Resumable Downloads:** Implements HTTP `Range` headers to resume partially downloaded files, critical for multi-gigabyte assets.
-2.  **Reliability:** Includes a custom `retry_request` decorator for Exponential Backoff, handling throttling (429) and transient network errors.
-3.  **Concurrency:** Multi-threaded architecture (5 workers) for simultaneous scanning and downloading.
-4.  **Pagination:** Full support for OData pagination, ensuring complete folder traversal regardless of item count.
-5.  **Self-Healing Sessions:** Automatically detects and resolves 401 Unauthorized errors by refreshing both expiring Microsoft Graph Download URLs and MSAL Access Tokens mid-process.
-6.  **Logging & Audit:** Integrated Python `logging` to `sharepoint_download.log` and structured CSV reports for error auditing.
+1.  **Timestamp Synchronization:** Intelligent sync logic that compares SharePoint `lastModifiedDateTime` with local file `mtime`. Only downloads if the remote source is newer, significantly reducing sync time.
+2.  **Optimized Integrity Validation:** Implements the official Microsoft **QuickXorHash** (160-bit circular XOR). Includes a configurable threshold (default 30MB) and a global toggle to balance security and performance for large assets.
+3.  **Resumable Downloads:** Implements HTTP `Range` headers to resume partially downloaded files, critical for multi-gigabyte assets.
+4.  **Reliability:** Includes a custom `retry_request` decorator for Exponential Backoff, handling throttling (429) and transient network errors.
+5.  **Robust Library Discovery:** Automatic resolution of document library IDs with built-in fallbacks for localized names (e.g., "Delte dokumenter" to "Documents").
+6.  **Self-Healing Sessions:** Automatically detects and resolves 401 Unauthorized errors by refreshing both expiring Microsoft Graph Download URLs and MSAL Access Tokens mid-process.
+7.  **Concurrency:** Multi-threaded architecture (5 workers) for simultaneous scanning and downloading.
+8.  **Pagination:** Full support for OData pagination, ensuring complete folder traversal regardless of item count.

 ## Building and Running

 ### Setup
 1.  **Dependencies:** `pip install -r requirements.txt`
-2.  **Configuration:** Use `connection_info.template.txt` to create `connection_info.txt`.
+2.  **Configuration:** Settings are managed via `connection_info.txt` or the GUI.
+    *   `ENABLE_HASH_VALIDATION`: (True/False)
+    *   `HASH_THRESHOLD_MB`: (Size limit for hashing)

 ### Execution
-`python download_sharepoint.py`
+*   **GUI:** `python sharepoint_gui.py`
+*   **CLI:** `python download_sharepoint.py`

 ## Development Conventions

-*   **Error Handling:** Always use the `safe_get` (retry-wrapped) method for Graph API calls. For item-specific operations, use `get_fresh_download_url` to handle token/URL expiry.
-*   **Authentication:** Use `get_headers(app, force_refresh=True)` when a 401 error is encountered from Graph API to ensure session continuity.
-*   **Thread Safety:** Use `report_lock` when updating the shared error list from worker threads.
-*   **Logging:** Prefer `logger.info()` or `logger.error()` over `print()` to ensure persistence in `sharepoint_download.log`.
-*   **Integrity:** Always verify file integrity using `size` and `quickXorHash` where available.
+*   **QuickXorHash:** When implementing/updating hashing, ensure the file length is XORed into the **last 64 bits** (bits 96-159) of the 160-bit state per MS spec.
+*   **Timezone Handling:** Always use UTC (ISO 8601) when comparing timestamps with SharePoint to avoid daylight saving time mismatches.
+*   **Error Handling:** Always use the `safe_get` (retry-wrapped) method for Graph API calls. For item-specific operations, use `get_fresh_download_url`.
+*   **Authentication:** Use `get_headers(app, force_refresh=True)` when a 401 error is encountered.
+*   **Logging:** Prefer `logger.info()` or `logger.error()` over `print()`.
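
The timestamp rule in the diff above (compare SharePoint `lastModifiedDateTime` with local `mtime`, always in UTC) can be sketched roughly as follows. This is a minimal illustration, not the code from `download_sharepoint.py`: the helper names `parse_graph_timestamp` and `needs_download` and the one-second tolerance are assumptions made for the example, and it assumes the timestamp has no more than six fractional-second digits.

```python
import os
from datetime import datetime, timezone


def parse_graph_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2026-03-29T17:25:28Z' into aware UTC."""
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption for this sketch: a naive timestamp is already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def needs_download(local_path: str, remote_last_modified: str,
                   tolerance_seconds: float = 1.0) -> bool:
    """Return True if the file is missing locally or the remote copy is newer."""
    if not os.path.exists(local_path):
        return True
    remote = parse_graph_timestamp(remote_last_modified)
    # os.path.getmtime() returns seconds since the epoch, which is
    # timezone-independent; convert it to an aware UTC datetime to compare.
    local = datetime.fromtimestamp(os.path.getmtime(local_path), tz=timezone.utc)
    # The tolerance (hypothetical value) absorbs coarse filesystem mtime precision.
    return (remote - local).total_seconds() > tolerance_seconds
```

Because both sides are converted to aware UTC datetimes before subtraction, a remote value written as `2026-03-29T19:25:28+02:00` compares equal to `2026-03-29T17:25:28Z`, which is the daylight-saving mismatch the convention is meant to avoid.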