Sharepoint-Download-Tool/GEMINI.md
Martin Tranberg 367d31671d Update documentation with timestamp sync and hash optimizations
- Updates README.md with a description of Timestamp Sync, Hash Toggle, and the 30MB limit
- Updates GEMINI.md with technical specifications for QuickXorHash and library fallback
- Adds guidance for the new configuration options in the GUI
2026-03-29 19:25:28 +02:00

SharePoint Download Tool - Technical Documentation

A production-ready Python utility for robust synchronization of SharePoint Online folders using the Microsoft Graph API.

Project Overview

  • Purpose: Enterprise-grade synchronization tool for local mirroring of SharePoint content.
  • Technologies:
    • Microsoft Graph API: Advanced REST API for SharePoint data.
    • MSAL: Secure authentication using Azure AD Client Credentials.
    • Requests: High-performance HTTP client with streaming and Range header support.
    • ThreadPoolExecutor: Parallel file processing for optimized throughput.
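
As a rough illustration of how ThreadPoolExecutor can spread per-file work across threads, the sketch below submits one task per item and collects results as they complete. The `process_item` function, the `process_all` helper, and the item dictionaries are hypothetical stand-ins, not the tool's actual code; the worker count of 5 is the figure given under Core Features.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS = 5  # worker count stated under Core Features


def process_item(item):
    """Hypothetical per-file task; a real one would download the file."""
    return item["name"], item["size"]


def process_all(items, worker=process_item, max_workers=MAX_WORKERS):
    """Run `worker` over all items in parallel and return {name: result}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its item so failures can be attributed.
        futures = {pool.submit(worker, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item["name"]] = future.result()
            except Exception as exc:  # one failed file should not stop the batch
                results[item["name"]] = exc
    return results
```

Storing the exception instead of re-raising it lets the caller decide afterwards which files to retry.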

Core Features (Production Ready)

  1. Timestamp Synchronization: Compares the SharePoint lastModifiedDateTime with the local file's mtime and downloads a file only if the remote copy is newer, which avoids re-downloading unchanged files.
  2. Optimized Integrity Validation: Implements the official Microsoft QuickXorHash (160-bit circular XOR). A configurable size threshold (default 30 MB) and a global toggle let you trade integrity checking against performance for large assets.
  3. Resumable Downloads: Implements HTTP Range headers to resume partially downloaded files, critical for multi-gigabyte assets.
  4. Reliability: Includes a custom retry_request decorator that applies exponential backoff to throttling responses (HTTP 429) and transient network errors.
  5. Robust Library Discovery: Automatic resolution of document library IDs with built-in fallbacks for localized names (e.g., "Delte dokumenter" to "Documents").
  6. Self-Healing Sessions: Automatically detects and resolves 401 Unauthorized errors by refreshing both expired Microsoft Graph download URLs and MSAL access tokens mid-process.
  7. Concurrency: Multi-threaded architecture (5 workers) for simultaneous scanning and downloading.
  8. Pagination: Full support for OData pagination, ensuring complete folder traversal regardless of item count.
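
The QuickXorHash mentioned in item 2 can be sketched in a few lines of Python. This is a minimal, unoptimized illustration that hashes an in-memory byte string; it is not the tool's implementation, and the function names are hypothetical. It follows the published algorithm: each input byte is XORed into a 160-bit circular state, 11 bits further along than the previous byte, and the length is XORed into the last 8 bytes.

```python
import base64

WIDTH_BITS = 160                 # size of the circular state
SHIFT_BITS = 11                  # each byte lands 11 bits after the previous one
MASK = (1 << WIDTH_BITS) - 1


def quickxorhash(data: bytes) -> bytes:
    """Return the 20-byte QuickXorHash digest of `data` (unoptimized sketch)."""
    state = 0
    for index, byte in enumerate(data):
        shift = (index * SHIFT_BITS) % WIDTH_BITS
        # Rotate the byte left by `shift` inside the 160-bit state, then XOR it in.
        rotated = ((byte << shift) | (byte >> (WIDTH_BITS - shift))) & MASK
        state ^= rotated
    digest = bytearray(state.to_bytes(WIDTH_BITS // 8, "little"))
    # XOR the 64-bit little-endian length into the last 8 bytes (bits 96-159).
    for offset, length_byte in enumerate(len(data).to_bytes(8, "little")):
        digest[12 + offset] ^= length_byte
    return bytes(digest)


def quickxorhash_b64(data: bytes) -> str:
    """Base64 form, the encoding Graph uses for a file's quickXorHash value."""
    return base64.b64encode(quickxorhash(data)).decode("ascii")
```

A per-byte Python loop like this is slow on large files, which is one reason a size threshold for hashing is useful; a real implementation would also process the file in chunks rather than loading it whole.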

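The exponential backoff described in item 4 might look like the following sketch. It is not the project's retry_request decorator: the `retry_with_backoff` name, the `TransientError` class, and the default delays are assumptions made for illustration. The `sleep` parameter exists only so the waiting can be replaced in tests.

```python
import functools
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. HTTP 429)."""


def retry_with_backoff(max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry the wrapped call, doubling the wait after each failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except TransientError:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the error
                    # Waits are base_delay * 1, 2, 4, ... seconds.
                    sleep(base_delay * (2 ** attempt))
        return wrapper
    return decorator
```

When a throttled response carries a Retry-After header, honoring that value instead of the computed delay is usually the safer choice.
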
Building and Running

Setup

  1. Dependencies: pip install -r requirements.txt
  2. Configuration: Settings are managed via connection_info.txt or the GUI.
    • ENABLE_HASH_VALIDATION: (True/False)
    • HASH_THRESHOLD_MB: (Size limit for hashing)
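
To show how the two hash settings above could be read, here is a small sketch. The layout of connection_info.txt is not described here, so the `KEY=VALUE` line format, the `parse_settings` name, and the default of True for ENABLE_HASH_VALIDATION are all assumptions; only the 30 MB threshold default comes from this document.

```python
def parse_settings(text):
    """Parse KEY=VALUE lines (assumed format) into typed hash settings."""
    raw = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without a key/value separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        raw[key.strip()] = value.strip()
    return {
        # Assumed default: validation on unless explicitly disabled.
        "ENABLE_HASH_VALIDATION": raw.get("ENABLE_HASH_VALIDATION", "True").lower() == "true",
        # Documented default threshold of 30 MB.
        "HASH_THRESHOLD_MB": int(raw.get("HASH_THRESHOLD_MB", "30")),
    }
```

For example, `parse_settings("HASH_THRESHOLD_MB=50")` yields a threshold of 50 with validation left enabled.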

Execution

  • GUI: python sharepoint_gui.py
  • CLI: python download_sharepoint.py

Development Conventions

  • QuickXorHash: When implementing or updating hashing, ensure the file length (a 64-bit value) is XORed into the last 64 bits (bits 96-159) of the 160-bit state, per the Microsoft specification.
  • Timezone Handling: Always use UTC (ISO 8601) when comparing timestamps with SharePoint to avoid daylight saving time mismatches.
  • Error Handling: Always use the safe_get (retry-wrapped) method for Graph API calls. For item-specific operations, use get_fresh_download_url.
  • Authentication: Use get_headers(app, force_refresh=True) when a 401 error is encountered.
  • Logging: Prefer logger.info() or logger.error() over print().
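
The UTC rule above can be illustrated with a short comparison helper. This is a sketch under stated assumptions rather than the tool's code: the function names and the one-second tolerance are hypothetical, and the input is assumed to be an ISO 8601 string such as the lastModifiedDateTime value returned by Graph.

```python
from datetime import datetime, timezone


def parse_graph_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2026-03-29T17:25:28Z' as UTC."""
    # fromisoformat() rejects a trailing 'Z' before Python 3.11, so swap it.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed.astimezone(timezone.utc)


def remote_is_newer(remote_modified: str, local_mtime: float,
                    tolerance_s: float = 1.0) -> bool:
    """True if the remote copy is newer than the local mtime (epoch seconds)."""
    # Build an aware UTC datetime so no local timezone or DST rule is involved.
    local_dt = datetime.fromtimestamp(local_mtime, tz=timezone.utc)
    delta = parse_graph_timestamp(remote_modified) - local_dt
    # The tolerance (an assumption) absorbs coarse filesystem mtime resolution.
    return delta.total_seconds() > tolerance_s
```

Passing `os.path.getmtime(path)` as `local_mtime` works directly, since that value is already seconds since the epoch and independent of the local timezone.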