- Add `get_long_path` to support Windows paths longer than 260 characters
- Implement dual-mode hashing: use the `quickxorhash` C library when possible, otherwise a manual Python fallback
- Update requirements.txt with `quickxorhash`
- Update README.md and GEMINI.md with the latest features and technical specifications
SharePoint Download Tool - Technical Documentation
A production-ready Python utility for robust synchronization of SharePoint Online folders using the Microsoft Graph API.
Project Overview
- Purpose: Enterprise-grade synchronization tool for local mirroring of SharePoint content.
- Technologies:
  - Microsoft Graph API: REST API used to access SharePoint sites, document libraries, and files.
  - MSAL: Secure authentication against Azure AD using the client credentials flow.
  - Requests: HTTP client with streaming downloads and `Range` header support.
  - ThreadPoolExecutor: Thread pool for processing files in parallel, improving throughput on I/O-bound work.
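A minimal sketch of how a `ThreadPoolExecutor` with five workers might drive per-file work. The `run_parallel` helper and its `worker` callable are hypothetical names, not the tool's actual API. The sketch collects exceptions per job so that one failing file does not abort the whole sync.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def run_parallel(
    jobs: Iterable[str],
    worker: Callable[[str], None],
    max_workers: int = 5,
) -> Dict[str, BaseException]:
    """Run worker(job) for every job on a thread pool (hypothetical helper).

    Returns a mapping of job -> exception for the jobs that failed.
    """
    failures: Dict[str, BaseException] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the job that produced it.
        futures = {pool.submit(worker, job): job for job in jobs}
        for future in as_completed(futures):
            exc = future.exception()  # None if the worker returned normally
            if exc is not None:
                failures[futures[future]] = exc
    return failures
```

Threads fit this workload because downloads are I/O-bound: a thread blocked on a socket releases the GIL, so the other workers keep running.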
Core Features (Production Ready)
- Windows Long Path Support: Automatically works around the Windows 260-character path limit by converting paths with `get_long_path()` to absolute paths carrying the `\\?\` prefix.
- High-Performance Integrity: Uses the `quickxorhash` C library, if available, for fast validation of large files. Includes a manual 160-bit circular XOR (QuickXorHash) fallback implemented in Python.
- Timestamp Synchronization: Compares the SharePoint `lastModifiedDateTime` with the local file's `mtime` and only downloads when the remote file is newer, significantly reducing sync time.
- Optimized Integrity Validation: A configurable size threshold (default 30 MB) and a global toggle balance integrity checking against performance for large assets.
- Resumable Downloads: Uses HTTP `Range` headers to resume partially downloaded files, which is critical for multi-gigabyte assets.
- Reliability: A custom `retry_request` decorator applies exponential backoff to handle throttling (HTTP 429) and transient network errors.
- Robust Library Discovery: Automatically resolves document library IDs, with built-in fallbacks for localized library names.
- Self-Healing Sessions: Automatically refreshes expiring Microsoft Graph download URLs and MSAL access tokens mid-process.
- Concurrency: Multi-threaded architecture (5 workers) for simultaneous scanning and downloading.
- Pagination: Full support for OData pagination, ensuring complete folder traversal.
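A minimal sketch of what a `get_long_path()` helper could look like. The `is_windows` parameter is a hypothetical addition that makes the behaviour testable on any OS; the project's real signature may differ. The `\\?\` prefix turns off Win32 path normalization, so the path must be made absolute and backslash-separated before the prefix is added, and UNC shares need the `\\?\UNC\` form.

```python
import ntpath
import os
from typing import Optional


def get_long_path(path: str, is_windows: Optional[bool] = None) -> str:
    """Return a Windows extended-length path; leave the path unchanged elsewhere."""
    if is_windows is None:
        is_windows = os.name == "nt"
    if not is_windows:
        return path  # POSIX systems have no MAX_PATH limit to work around
    if path.startswith("\\\\?\\"):
        return path  # already an extended-length path
    # The prefix disables normalization, so make the path absolute and
    # backslash-separated first; ntpath behaves the same on every host OS.
    full = ntpath.abspath(path)
    if full.startswith("\\\\"):
        # UNC share: \\server\share\... becomes \\?\UNC\server\share\...
        return "\\\\?\\UNC\\" + full[2:]
    return "\\\\?\\" + full
```

On POSIX the function returns its input unchanged, so callers can wrap every `open()` or `os.path.exists()` call unconditionally.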
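The timestamp comparison can be sketched as follows, assuming `lastModifiedDateTime` arrives as an ISO 8601 string ending in `Z`. Both sides are converted to timezone-aware UTC datetimes before comparing. The helper names are illustrative. The fractional-second normalization is there only because `datetime.fromisoformat` before Python 3.11 rejects a trailing `Z` and accepts exactly three or six fractional digits.

```python
import os
import re
from datetime import datetime, timezone

_FRACTION = re.compile(r"\.(\d+)")


def parse_graph_timestamp(value: str) -> datetime:
    """Parse e.g. '2024-05-01T12:34:56Z' into an aware UTC datetime."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    # Normalize fractional seconds to exactly 6 digits for older fromisoformat().
    value = _FRACTION.sub(lambda m: "." + m.group(1)[:6].ljust(6, "0"), value, count=1)
    return datetime.fromisoformat(value).astimezone(timezone.utc)


def remote_is_newer(last_modified: str, local_path: str) -> bool:
    """True if the file is missing locally or SharePoint has a newer version."""
    if not os.path.exists(local_path):
        return True
    remote = parse_graph_timestamp(last_modified)
    local = datetime.fromtimestamp(os.path.getmtime(local_path), tz=timezone.utc)
    return remote > local


def stamp_with_remote_time(local_path: str, last_modified: str) -> None:
    """Optionally set the local mtime to the SharePoint time after a download."""
    ts = parse_graph_timestamp(last_modified).timestamp()
    os.utime(local_path, (ts, ts))
```

Stamping the downloaded file with the remote time is a design choice in this sketch. Without it, the local `mtime` is the download time, which still compares correctly but no longer mirrors SharePoint.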
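Resuming with a `Range` header can be sketched as follows. The `session` argument is assumed to behave like a `requests.Session` (its streamed response supports `with`, `raise_for_status()` and `iter_content()`); the function names and the `expected_size` parameter are hypothetical. A `206 Partial Content` reply means the server honoured the range and the data is appended; a plain `200 OK` means the range was ignored, so the file is rewritten from the start.

```python
import os
from typing import Dict, Optional, Tuple


def plan_resume(part_path: str) -> Tuple[Dict[str, str], int]:
    """Return (extra request headers, bytes already on disk) for a partial file."""
    offset = os.path.getsize(part_path) if os.path.exists(part_path) else 0
    headers = {"Range": f"bytes={offset}-"} if offset else {}
    return headers, offset


def download_with_resume(
    session,
    url: str,
    part_path: str,
    expected_size: Optional[int] = None,
    chunk_size: int = 1024 * 1024,
) -> int:
    """Download url into part_path, resuming if possible; returns bytes on disk."""
    headers, offset = plan_resume(part_path)
    if expected_size is not None and offset == expected_size:
        return offset  # already complete, nothing to request
    with session.get(url, headers=headers, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        # 206: server honoured the Range header, so append to the partial file.
        # 200: server sent the whole file, so overwrite from the beginning.
        resuming = resp.status_code == 206
        written = offset if resuming else 0
        with open(part_path, "ab" if resuming else "wb") as fh:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                if chunk:  # skip empty chunks
                    fh.write(chunk)
                    written += len(chunk)
    return written
```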
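An exponential-backoff decorator in the spirit of `retry_request` might look like this. It is a sketch, not the project's decorator: the parameters, the set of retried status codes, and the injectable `sleep` (so tests do not actually wait) are assumptions. It retries on exceptions derived from `OSError`, which covers `requests` exceptions because `requests.RequestException` subclasses `IOError`, and on 429/5xx responses, honouring a numeric `Retry-After` header when the server sends one.

```python
import functools
import random
import time
from typing import Callable, Optional, Tuple, Type

RETRYABLE_STATUS = frozenset({429, 500, 502, 503, 504})


def _retry_after_seconds(response) -> Optional[float]:
    """Numeric Retry-After header value, or None (HTTP-date form not handled)."""
    headers = getattr(response, "headers", None) or {}
    value = headers.get("Retry-After")
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def retry_request(
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    jitter: bool = True,
    retry_exceptions: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
):
    """Decorator factory: retry on network errors and 429/5xx with exponential backoff."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                response = None
                try:
                    response = func(*args, **kwargs)
                except retry_exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the network error
                else:
                    status = getattr(response, "status_code", None)
                    if status not in RETRYABLE_STATUS or attempt == max_attempts:
                        return response  # success, non-retryable error, or give up
                # Exponential backoff: base, 2*base, 4*base, ... capped at max_delay.
                delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                if jitter:
                    delay += random.uniform(0, delay / 2)  # spread out concurrent retries
                hint = _retry_after_seconds(response) if response is not None else None
                if hint is not None:
                    delay = max(delay, hint)  # never retry sooner than the server asked
                sleep(delay)
            return None  # only reached if max_attempts < 1

        return wrapper

    return decorator
```

Jitter matters here because five worker threads that hit a 429 at the same moment would otherwise all retry in lockstep.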
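Following OData pagination can be sketched as a generator that keeps requesting `@odata.nextLink` until a page no longer contains one. `fetch_json` is a hypothetical callable standing in for a retry-wrapped GET that returns the parsed JSON body. `walk_drive` recurses into items that carry a `folder` facet, using the Graph `/drives/{drive-id}/items/{item-id}/children` endpoint.

```python
from typing import Any, Callable, Dict, Iterator

GRAPH = "https://graph.microsoft.com/v1.0"
Json = Dict[str, Any]


def iter_graph_items(fetch_json: Callable[[str], Json], url: str) -> Iterator[Json]:
    """Yield every item in a Graph collection, following @odata.nextLink."""
    next_url = url
    while next_url:
        page = fetch_json(next_url)
        yield from page.get("value", [])
        # Graph omits @odata.nextLink on the last page.
        next_url = page.get("@odata.nextLink")


def walk_drive(fetch_json: Callable[[str], Json], drive_id: str, item_id: str) -> Iterator[Json]:
    """Recursively yield file items below a folder, descending into subfolders."""
    children = f"{GRAPH}/drives/{drive_id}/items/{item_id}/children"
    for item in iter_graph_items(fetch_json, children):
        if "folder" in item:
            yield from walk_drive(fetch_json, drive_id, item["id"])
        else:
            yield item
```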
Building and Running
Setup
- Dependencies: `pip install -r requirements.txt`. Installing `quickxorhash` (a C extension, so building it may require a C compiler) is recommended for best performance.
- Configuration: Settings are managed via `connection_info.txt` or the GUI.
  - `ENABLE_HASH_VALIDATION`: `True`/`False`
  - `HASH_THRESHOLD_MB`: size limit, in MB, for hash validation
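The format of `connection_info.txt` is not documented here. The sketch below assumes simple `KEY=VALUE` lines and shows one plausible reading of the two hashing settings, in which files larger than `HASH_THRESHOLD_MB` skip hash validation. Both helper names, the "enabled by default" fallback, and the reading of MB as 1024 × 1024 bytes are assumptions.

```python
from typing import Dict

MIB = 1024 * 1024  # assumption: "MB" means mebibytes


def parse_settings(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blanks and '#' comments (assumed format)."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def should_validate_hash(file_size: int, settings: Dict[str, str]) -> bool:
    """Hash only when validation is enabled and the file is within the threshold."""
    enabled = settings.get("ENABLE_HASH_VALIDATION", "True").strip().lower() == "true"
    threshold_mb = float(settings.get("HASH_THRESHOLD_MB", "30"))  # documented default
    return enabled and file_size <= threshold_mb * MIB
```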
Execution
- GUI: `python sharepoint_gui.py`
- CLI: `python download_sharepoint.py`
Development Conventions
- QuickXorHash: When implementing or updating hashing, XOR the total file length (as a 64-bit little-endian integer) into the last 64 bits (bits 96-159) of the 160-bit state, per Microsoft's specification.
- Long Paths: Always use `get_long_path()` when interacting with the local file system (`open`, `os.path.exists`, etc.).
- Timezone Handling: Always use UTC (ISO 8601) when comparing timestamps with SharePoint.
- Error Handling: Always use the retry-wrapped `safe_get` method for Graph API calls. For item-specific operations, use `get_fresh_download_url`.
- Authentication: Call `get_headers(app, force_refresh=True)` when a 401 error is encountered.
- Logging: Prefer `logger.info()` or `logger.error()` over `print()`.
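A pure-Python QuickXorHash, the kind of fallback the Core Features list mentions, can be sketched as follows. Each input byte is XORed into a 160-bit state after being rotated left by `11 * index mod 160` bits, and the total length is XORed into bits 96-159 at the end. Graph reports the value base64-encoded in the item's `file.hashes.quickXorHash` property. The class and function names are illustrative rather than the project's or the `quickxorhash` package's API, and the sketch is far slower than the C library.

```python
import base64
import functools
import operator


class QuickXorHash:
    """Pure-Python QuickXorHash (sketch of the spec; slow compared to C)."""

    WIDTH = 160  # state size in bits
    SHIFT = 11   # each successive byte is rotated 11 bits further
    _MASK = (1 << WIDTH) - 1

    def __init__(self) -> None:
        self._state = 0   # 160-bit state kept as a Python int
        self._shift = 0   # rotation for the next byte, mod WIDTH
        self._length = 0  # total bytes hashed so far

    def update(self, data: bytes) -> None:
        n = len(data)
        for offset in range(min(n, self.WIDTH)):
            # Bytes WIDTH positions apart share a rotation (11 * 160 % 160 == 0),
            # so XOR them together first and rotate once.
            folded = functools.reduce(operator.xor, data[offset::self.WIDTH], 0)
            if folded:
                s = (self._shift + self.SHIFT * offset) % self.WIDTH
                # Circular left rotation of the byte within the 160-bit state.
                self._state ^= ((folded << s) | (folded >> (self.WIDTH - s))) & self._MASK
        self._shift = (self._shift + self.SHIFT * n) % self.WIDTH
        self._length += n

    def digest(self) -> bytes:
        # Per the spec, XOR the length (64-bit little-endian) into bytes 12-19,
        # i.e. bits 96-159 of the little-endian 160-bit state.
        length = self._length & 0xFFFFFFFFFFFFFFFF
        return (self._state ^ (length << 96)).to_bytes(20, "little")

    def b64digest(self) -> str:
        """Base64 form, as reported by Graph in file.hashes.quickXorHash."""
        return base64.b64encode(self.digest()).decode("ascii")


def quickxorhash_file(path: str, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Hash a local file in chunks and return the base64 digest."""
    h = QuickXorHash()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.b64digest()
```

Because the rotation offset carries over between `update()` calls, hashing a file chunk by chunk gives the same digest as hashing it in one piece.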
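The authentication convention amounts to retrying a request once with a freshly acquired token. In this sketch, `send` is a hypothetical callable that performs the HTTP request with the given headers, and `get_headers` is assumed to accept `(app, force_refresh=...)` and return a headers dict, matching the call shown in the conventions.

```python
from typing import Any, Callable, Dict

Headers = Dict[str, str]


def send_with_token_refresh(
    send: Callable[[Headers], Any],
    get_headers: Callable[..., Headers],
    app: Any,
) -> Any:
    """Send once with cached credentials; on 401, force a new token and retry once."""
    response = send(get_headers(app, force_refresh=False))
    if getattr(response, "status_code", None) == 401:
        # Token expired or was revoked mid-run: acquire a fresh one and try again.
        response = send(get_headers(app, force_refresh=True))
    return response
```

Retrying only once avoids an endless loop when the credentials themselves are invalid; a second 401 is returned to the caller.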