- Updates README.md with a description of Timestamp Sync, Hash Toggle, and the 30 MB limit
- Updates GEMINI.md with technical specifications for QuickXorHash and the library fallback
- Adds guidance for the new configuration options in the GUI
SharePoint Download Tool - Technical Documentation
A production-ready Python utility for robust synchronization of SharePoint Online folders using Microsoft Graph API.
Project Overview
- Purpose: Enterprise-grade synchronization tool for local mirroring of SharePoint content.
- Technologies:
  - Microsoft Graph API: Advanced REST API for SharePoint data.
  - MSAL: Secure authentication using Azure AD Client Credentials.
  - Requests: High-performance HTTP client with streaming and Range header support.
  - ThreadPoolExecutor: Parallel file processing for optimized throughput.
Core Features (Production Ready)
- Timestamp Synchronization: Intelligent sync logic that compares SharePoint `lastModifiedDateTime` with local file `mtime`. Only downloads if the remote source is newer, significantly reducing sync time.
- Optimized Integrity Validation: Implements the official Microsoft QuickXorHash (160-bit circular XOR). Includes a configurable threshold (default 30 MB) and a global toggle to balance security and performance for large assets.
- Resumable Downloads: Implements HTTP `Range` headers to resume partially downloaded files, critical for multi-gigabyte assets.
- Reliability: Includes a custom `retry_request` decorator for exponential backoff, handling throttling (429) and transient network errors.
- Robust Library Discovery: Automatic resolution of document library IDs with built-in fallbacks for localized names (e.g., "Delte dokumenter" to "Documents").
- Self-Healing Sessions: Automatically detects and resolves 401 Unauthorized errors by refreshing both expiring Microsoft Graph download URLs and MSAL access tokens mid-process.
- Concurrency: Multi-threaded architecture (5 workers) for simultaneous scanning and downloading.
- Pagination: Full support for OData pagination, ensuring complete folder traversal regardless of item count.
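The exponential backoff behaviour described under Reliability can be sketched roughly as follows. This is a minimal illustration, not the project's `retry_request`: the names `retry_with_backoff` and `TransientError`, the default delays, and the jitter amount are all assumptions made for the example.

```python
import functools
import random
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures (e.g. HTTP 429 or a timeout)."""


def retry_with_backoff(max_attempts=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Retry the wrapped call, doubling the wait after each transient failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except TransientError:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the last error
                    # 1s, 2s, 4s, ... capped at max_delay
                    delay = min(max_delay, base_delay * 2 ** attempt)
                    # Add up to 10% jitter so parallel workers do not retry in lockstep.
                    sleep(delay + random.uniform(0, delay * 0.1))
        return wrapper
    return decorator
```

The `sleep` parameter is injectable only to make the sketch easy to exercise without waiting. Throttled Graph responses generally carry a `Retry-After` header, which a real implementation would likely prefer over a computed delay.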
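OData pagination, as mentioned in the last item above, usually amounts to following `@odata.nextLink` until it is absent. The sketch below assumes a hypothetical `fetch_json` callable that takes a URL and returns the decoded JSON page; in this project that role would presumably be filled by a retry-wrapped Graph call.

```python
from typing import Any, Callable, Dict, Iterator


def iter_items(first_url: str,
               fetch_json: Callable[[str], Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every item across all pages of an OData collection response."""
    url = first_url
    while url:
        page = fetch_json(url)
        yield from page.get("value", [])
        # The next-page link is present only while more pages remain.
        url = page.get("@odata.nextLink")
```

Writing this as a generator keeps memory use flat for very large folders, since items are handed to the caller page by page.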
Building and Running
Setup
- Dependencies: `pip install -r requirements.txt`
- Configuration: Settings are managed via `connection_info.txt` or the GUI.
  - `ENABLE_HASH_VALIDATION`: (True/False)
  - `HASH_THRESHOLD_MB`: (Size limit for hashing)
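As an illustration of how the two hash settings might be read, here is a minimal sketch. It assumes `connection_info.txt` holds `KEY=VALUE` lines, that validation defaults to enabled, and that files larger than the threshold skip hashing. None of these are confirmed by this document, and `parse_settings`, `hash_policy`, and `should_hash` are hypothetical helper names.

```python
def parse_settings(text: str) -> dict:
    """Parse KEY=VALUE lines (assumed format), ignoring blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def hash_policy(settings: dict):
    """Return (enabled, threshold_bytes); the default of True is an assumption."""
    enabled = settings.get("ENABLE_HASH_VALIDATION", "True").lower() == "true"
    threshold_mb = int(settings.get("HASH_THRESHOLD_MB", "30"))
    return enabled, threshold_mb * 1024 * 1024


def should_hash(size_bytes: int, enabled: bool, threshold_bytes: int) -> bool:
    """Assumed rule: hash only when enabled and the file is within the threshold."""
    return enabled and size_bytes <= threshold_bytes
```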
Execution
- GUI: `python sharepoint_gui.py`
- CLI: `python download_sharepoint.py`
Development Conventions
- QuickXorHash: When implementing/updating hashing, ensure the file length is XORed into the last 64 bits (bits 96-159) of the 160-bit state per MS spec.
- Timezone Handling: Always use UTC (ISO 8601) when comparing timestamps with SharePoint to avoid daylight saving time mismatches.
- Error Handling: Always use the `safe_get` (retry-wrapped) method for Graph API calls. For item-specific operations, use `get_fresh_download_url`.
- Authentication: Use `get_headers(app, force_refresh=True)` when a 401 error is encountered.
- Logging: Prefer `logger.info()` or `logger.error()` over `print()`.
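The QuickXorHash convention above can be sketched compactly by treating the 160-bit state as one Python integer. This is an unoptimized illustration rather than the project's implementation. The function name `quick_xor_hash` is hypothetical, and the 11-bit rotation step is taken from Microsoft's published reference code, not from this document.

```python
import base64

WIDTH_BITS = 160   # size of the circular state
SHIFT = 11         # each successive byte lands 11 bits further along
MASK = (1 << WIDTH_BITS) - 1


def quick_xor_hash(chunks) -> str:
    """Return the base64 QuickXorHash of an iterable of bytes chunks."""
    state = 0
    length = 0
    for chunk in chunks:
        for byte in chunk:
            offset = (length * SHIFT) % WIDTH_BITS
            shifted = byte << offset
            # Bits pushed past bit 159 wrap around to the low end (circular XOR).
            state ^= (shifted & MASK) | (shifted >> WIDTH_BITS)
            length += 1
    # XOR the total length into the last 64 bits (bits 96-159) of the state.
    state ^= (length << (WIDTH_BITS - 64)) & MASK
    return base64.b64encode(state.to_bytes(WIDTH_BITS // 8, "little")).decode("ascii")
```

Passing chunks (for example, blocks read from a file) avoids loading a large file into memory. A per-byte Python loop is slow, though, so a production version would likely process whole blocks at once.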
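For the Timezone Handling convention, a UTC-only comparison between a remote timestamp and a local `mtime` might look like the sketch below. The helper names and the one-second tolerance are assumptions. The parser expects a timestamp of the form `2024-05-01T12:30:00Z`; other fractional-second formats may need extra handling on older Python versions.

```python
import os
from datetime import datetime, timezone


def parse_graph_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2024-05-01T12:30:00Z."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00")).astimezone(timezone.utc)


def needs_download(remote_modified: str, local_path: str, tolerance_s: float = 1.0) -> bool:
    """Return True if the local file is missing or older than the remote item."""
    if not os.path.exists(local_path):
        return True
    remote_ts = parse_graph_timestamp(remote_modified).timestamp()
    local_ts = os.path.getmtime(local_path)  # epoch seconds, independent of local timezone
    return remote_ts > local_ts + tolerance_s
```

Comparing epoch seconds on both sides sidesteps local-time conversions entirely. The small tolerance guards against filesystems that store `mtime` at coarse resolution.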