Update documentation (README and GEMINI.md) with Production Ready specifications

2026-03-26 15:44:30 +01:00
parent 1ed21e4184
commit 4c52b0c8db
2 changed files with 33 additions and 49 deletions
@@ -1,51 +1,36 @@
-# SharePoint Download Tool
+# SharePoint Download Tool - Technical Documentation

-A Python-based utility designed to recursively download folders and files from a specific SharePoint Online Site using the Microsoft Graph API.
+A production-ready Python utility for robust synchronization of SharePoint Online folders using the Microsoft Graph API.

 ## Project Overview

-*   **Purpose:** Automates the synchronization of specific SharePoint document library folders to a local directory.
+*   **Purpose:** Enterprise-grade synchronization tool for local mirroring of SharePoint content.
 *   **Technologies:** 
-    *   **Python 3.x**
-    *   **Microsoft Graph API:** Used for robust data access.
-    *   **MSAL (Microsoft Authentication Library):** Handles Entra ID (Azure AD) authentication using Client Credentials flow.
-    *   **Requests:** Manages HTTP streaming for large file downloads.
-*   **Architecture:**
-    *   `download_sharepoint.py`: The core script that orchestrates authentication, site/drive discovery, and recursive folder traversal.
-    *   `connection_info.txt`: Centralized configuration file for credentials and target paths.
-    *   `requirements.txt`: Defines necessary Python dependencies.
+    *   **Microsoft Graph API:** Advanced REST API for SharePoint data.
+    *   **MSAL:** Secure authentication using Azure AD Client Credentials.
+    *   **Requests:** High-performance HTTP client with streaming and Range header support.
+    *   **ThreadPoolExecutor:** Parallel file processing for optimized throughput.
+
+## Core Features (Production Ready)
+
+1.  **Resumable Downloads:** Implements HTTP `Range` headers to resume partially downloaded files, critical for multi-gigabyte assets.
+2.  **Reliability:** Includes a custom `retry_request` decorator that applies exponential backoff to handle throttling (HTTP 429) and transient network errors.
+3.  **Concurrency:** Multi-threaded architecture (5 workers) for simultaneous scanning and downloading.
+4.  **Pagination:** Full support for OData pagination, ensuring complete folder traversal regardless of item count.
+5.  **Logging & Audit:** Integrated Python `logging` to `sharepoint_download.log` and structured CSV reports for error auditing.

 ## Building and Running

-### Prerequisites
-*   Python 3.x installed.
-*   A registered application in Microsoft Entra ID with `Sites.Read.All` (or higher) application permissions.
-
 ### Setup
-1.  **Install Dependencies:**
-    ```bash
-    pip install -r requirements.txt
-    ```
-2.  **Configure Connection:**
-    Edit `connection_info.txt` with your specific details:
-    *   `TENANT_ID`, `CLIENT_ID`, `CLIENT_SECRET`
-    *   `SITE_URL`: Full URL to the SharePoint site.
-    *   `DOCUMENT_LIBRARY`: The name of the target library (e.g., "Documents").
-    *   `FOLDERS_TO_DOWNLOAD`: Comma-separated list of folder names to sync.
-    *   `LOCAL_PATH`: The destination path on your local machine.
+1.  **Dependencies:** `pip install -r requirements.txt`
+2.  **Configuration:** Use `connection_info.template.txt` to create `connection_info.txt`.

 ### Execution
-Run the main download script:
-```bash
-python download_sharepoint.py
-```
-
-### Validation
-After execution, a CSV report named `download_report_YYYYMMDD_HHMMSS.csv` is generated, detailing any failed downloads or size mismatches for verification.
+`python download_sharepoint.py`

 ## Development Conventions

-*   **Authentication:** Always use the Graph API with MSAL for app-only authentication.
-*   **Error Handling:** All file and folder operations should be wrapped in try-except blocks, with errors logged to the generated CSV report.
-*   **Verification:** Post-download verification is performed by comparing the local file size against the `size` property returned by the Graph API.
-*   **Security:** Never commit `connection_info.txt` or any file containing secrets. Use the provided `.gitignore`.
+*   **Error Handling:** Always use the `safe_get` (retry-wrapped) method for Graph API calls.
+*   **Thread Safety:** Use `report_lock` when updating the shared error list from worker threads.
+*   **Logging:** Prefer `logger.info()` or `logger.error()` over `print()` to ensure persistence in `sharepoint_download.log`.
+*   **Integrity:** Always verify file integrity using `size` and `quickXorHash` where available.
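
The updated README names a `retry_request` decorator and `Range`-based resumption but does not show either. The sketch below is one possible shape for them, not the code in `download_sharepoint.py`: the decorator parameters (`max_attempts`, `base_delay`, `sleep`), the `RETRYABLE_STATUS` set, and the `resume_headers` helper are all assumptions made for illustration. It uses only the standard library and treats the response as any object with `status_code` and `headers`, which matches a Requests response.

```python
import functools
import os
import time

# Assumed set of status codes worth retrying: throttling plus common 5xx errors.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def retry_request(max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry the wrapped call with exponential backoff (hypothetical signature)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                last_try = attempt == max_attempts - 1
                # Delay doubles on every attempt: base, 2*base, 4*base, ...
                delay = base_delay * 2 ** attempt
                try:
                    response = func(*args, **kwargs)
                except (ConnectionError, TimeoutError):
                    # With Requests, one would catch
                    # requests.exceptions.RequestException here instead.
                    if last_try:
                        raise
                else:
                    if response.status_code not in RETRYABLE_STATUS or last_try:
                        return response
                    # Prefer the server's own hint when it sends one.
                    retry_after = response.headers.get("Retry-After", "")
                    if retry_after.isdigit():
                        delay = float(retry_after)
                sleep(delay)
        return wrapper
    return decorator


def resume_headers(local_path):
    """Build a Range header requesting only the bytes not yet on disk."""
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    return {"Range": f"bytes={offset}-"} if offset else {}
```

A caller would wrap its HTTP GET function with `@retry_request()` and pass `headers=resume_headers(path)` when reopening a partial file in append mode. The server must answer `206 Partial Content` for the append to be valid; on a plain `200` the file should be rewritten from the start.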