Sharepoint-Download-Tool/GEMINI.md
Martin Tranberg c5d4ddaab0 Enterprise-grade optimizations: Windows Long Path, High-Performance Hashing, and Documentation
- Adds 'get_long_path' to support Windows paths longer than 260 characters
- Implements dual-mode hashing: uses the 'quickxorhash' C library if available, otherwise a manual Python fallback
- Updates requirements.txt with quickxorhash
- Updates README.md and GEMINI.md with the latest features and technical specifications
2026-03-29 19:33:31 +02:00

SharePoint Download Tool - Technical Documentation

A production-ready Python utility that mirrors SharePoint Online folders to local disk using the Microsoft Graph API.

Project Overview

  • Purpose: Enterprise-grade synchronization tool for local mirroring of SharePoint content.
  • Technologies:
    • Microsoft Graph API: REST API used to enumerate and download SharePoint content.
    • MSAL: App-only authentication via the client credentials flow against Azure AD (now Microsoft Entra ID).
    • Requests: HTTP client used for streamed downloads and Range header requests.
    • ThreadPoolExecutor: Parallel file processing for optimized throughput.
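
A minimal sketch of how a thread pool can process files in parallel. `download_one` and `process_all` are hypothetical names, not the tool's own functions, and the default of five workers simply mirrors the worker count this document quotes; a real worker would stream the file with Requests instead of returning a string.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_one(name: str) -> str:
    """Hypothetical stand-in for the per-file download; returns a status string."""
    return f"done:{name}"


def process_all(names, max_workers=5):
    """Run download_one over all names in parallel and collect results by name."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the file it belongs to.
        futures = {pool.submit(download_one, n): n for n in names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # Record the failure so one bad file does not abort the batch.
                results[name] = f"failed:{exc}"
    return results
```

Collecting results per file, rather than letting the first exception propagate, lets a long sync finish and report failures at the end.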

Core Features (Production Ready)

  1. Windows Long Path Support: Automatically works around the 260-character Windows path limit by routing local paths through get_long_path, which applies the \\?\ prefix to absolute paths.
  2. High-Performance Integrity: Uses the quickxorhash C-library if available for fast validation of large files. Includes a manual 160-bit circular XOR fallback implementation.
  3. Timestamp Synchronization: Compares SharePoint lastModifiedDateTime with local file mtime. Only downloads if the remote source is newer, significantly reducing sync time.
  4. Optimized Integrity Validation: Includes a configurable size threshold (default 30 MB) and a global toggle to balance integrity checking against performance for large assets.
  5. Resumable Downloads: Implements HTTP Range headers to resume partially downloaded files, critical for multi-gigabyte assets.
  6. Reliability: Includes a custom retry_request decorator that applies exponential backoff when requests are throttled (HTTP 429) or fail with transient network errors.
  7. Robust Library Discovery: Automatic resolution of document library IDs with built-in fallbacks for localized names.
  8. Self-Healing Sessions: Automatically refreshes expiring Microsoft Graph Download URLs and MSAL Access Tokens mid-process.
  9. Concurrency: Multi-threaded architecture (5 workers) for simultaneous scanning and downloading.
  10. Pagination: Full support for OData pagination, ensuring complete folder traversal.
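
To illustrate the retry behaviour described in feature 6, here is a minimal sketch of an exponential backoff decorator. It is not the tool's retry_request: the name `retry_with_backoff`, its parameters, the list of retryable status codes, and the `FakeResponse` class used in the demo are all assumptions made for illustration. The decorated function is assumed to return an object with `status_code` and `headers`, as a Requests response does.

```python
import functools
import time

RETRYABLE_STATUS = {429, 500, 502, 503, 504}  # assumed set of retryable codes


def retry_with_backoff(max_attempts=5, base_delay=1.0,
                       transient=(ConnectionError, TimeoutError),
                       sleep=time.sleep):
    """Hypothetical decorator: retry throttled or transiently failing calls.

    With Requests, callers would pass requests.exceptions.ConnectionError and
    requests.exceptions.Timeout as `transient`.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                last_try = attempt == max_attempts - 1
                response = None
                try:
                    response = func(*args, **kwargs)
                except transient:
                    if last_try:
                        raise
                else:
                    if last_try or response.status_code not in RETRYABLE_STATUS:
                        return response
                # Default wait doubles each attempt: base, 2*base, 4*base, ...
                delay = base_delay * (2 ** attempt)
                # Prefer the server's Retry-After hint (in seconds) when present.
                retry_after = response.headers.get("Retry-After") if response is not None else None
                if retry_after and retry_after.isdigit():
                    delay = float(retry_after)
                sleep(delay)
        return wrapper
    return decorator


class FakeResponse:
    """Demo-only stand-in for an HTTP response."""
    def __init__(self, status_code, headers=None):
        self.status_code = status_code
        self.headers = headers or {}


# Demo: record the waits instead of sleeping.
delays = []
queue = [FakeResponse(429, {"Retry-After": "7"}), FakeResponse(503), FakeResponse(200)]


@retry_with_backoff(sleep=delays.append)
def fetch():
    return queue.pop(0)


result = fetch()
```

In the demo, the first wait comes from the Retry-After header and the second from the doubling schedule. Injecting `sleep` keeps the behaviour testable without real delays.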

Building and Running

Setup

  1. Dependencies: pip install -r requirements.txt (installing quickxorhash, which may need a C compiler to build, is recommended for best performance).
  2. Configuration: Settings are managed via connection_info.txt or the GUI.
    • ENABLE_HASH_VALIDATION: (True/False)
    • HASH_THRESHOLD_MB: (Size limit for hashing)
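
The two settings above could be read and applied roughly as follows. This is a sketch under stated assumptions: the layout of connection_info.txt is not shown in this document, so the KEY=VALUE line format, the helper names `parse_settings` and `should_validate_hash`, and the rule that files larger than the threshold skip hashing are all guesses. Only the setting names and the 30 MB default come from the text.

```python
def parse_settings(text: str) -> dict:
    """Parse KEY=VALUE lines (assumed format); skip blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def should_validate_hash(settings: dict, size_bytes: int) -> bool:
    """Decide whether to hash a file, given the toggle and the size threshold.

    Assumption: files larger than HASH_THRESHOLD_MB are not hashed.
    """
    enabled = settings.get("ENABLE_HASH_VALIDATION", "True").lower() == "true"
    threshold_mb = float(settings.get("HASH_THRESHOLD_MB", "30"))
    return enabled and size_bytes <= threshold_mb * 1024 * 1024
```

For example, with the defaults a 10 MB file would be hashed and a 31 MB file would not.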

Execution

  • GUI: python sharepoint_gui.py
  • CLI: python download_sharepoint.py

Development Conventions

  • QuickXorHash: When implementing or updating hashing, ensure the file length is XORed into the last 64 bits (bits 96-159) of the 160-bit state, per the Microsoft specification.
  • Long Paths: Always use get_long_path() when interacting with the local file system (open, os.path.exists, etc.).
  • Timezone Handling: Always use UTC (ISO 8601) when comparing timestamps with SharePoint.
  • Error Handling: Always use the safe_get (retry-wrapped) method for Graph API calls. For item-specific operations, use get_fresh_download_url.
  • Authentication: Use get_headers(app, force_refresh=True) when a 401 error is encountered.
  • Logging: Prefer logger.info() or logger.error() over print().
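
The QuickXorHash convention above can be illustrated with a pure-Python sketch. It is not the tool's actual fallback: the function names are hypothetical, and the layout is based on Microsoft's published reference algorithm, so it should be checked against hashes returned by Graph before being relied on. Each input byte is XORed into a 160-bit state at a bit offset that advances by 11 per byte and wraps around. The 64-bit length is then XORed into bits 96-159.

```python
import base64

WIDTH_BITS = 160                  # size of the hash state
SHIFT = 11                        # bit offset advance per input byte
MASK = (1 << WIDTH_BITS) - 1


def quickxorhash_digest(chunks) -> bytes:
    """Return the 20-byte digest for an iterable of bytes chunks."""
    state = 0
    length = 0
    for chunk in chunks:
        for byte in chunk:
            pos = (length * SHIFT) % WIDTH_BITS
            shifted = byte << pos
            # Bits pushed past bit 159 wrap around to the low end (circular XOR).
            state ^= (shifted & MASK) | (shifted >> WIDTH_BITS)
            length += 1
    # XOR the 64-bit length into the last 64 bits (bits 96-159) of the state.
    state ^= (length & 0xFFFFFFFFFFFFFFFF) << 96
    return state.to_bytes(WIDTH_BITS // 8, "little")


def quickxorhash_base64(chunks) -> str:
    """Base64 form, for comparison with the value Graph reports."""
    return base64.b64encode(quickxorhash_digest(chunks)).decode("ascii")
```

Graph reports the hash base64-encoded in the item's file.hashes.quickXorHash field, so `quickxorhash_base64` is the form to compare. Because the function takes an iterable of chunks, a file can be hashed while it is read in blocks. A per-byte Python loop is slow on large files, which is why the C library is preferred when available.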