SharePoint Download Tool - Technical Documentation

A production-ready Python utility that synchronizes SharePoint Online folders to local storage using the Microsoft Graph API.

Project Overview

  • Purpose: Enterprise-grade synchronization tool for local mirroring of SharePoint content.
  • Technologies:
    • Microsoft Graph API: REST API used to enumerate and download SharePoint content.
    • MSAL: Authentication against Azure AD using the client credentials flow.
    • Requests: HTTP client used for streaming downloads and Range header requests.
    • ThreadPoolExecutor: Parallel file processing for optimized throughput.
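
The Range header support mentioned above can be sketched as follows. This is a minimal illustration and not the tool's actual code: the helper names resume_headers and open_mode_for are hypothetical, and the sketch assumes a partial file on disk holds a valid prefix of the remote file.

```python
import os


def resume_headers(partial_path, base_headers=None):
    """Build request headers for resuming a download (hypothetical helper).

    Returns (headers, offset), where offset is the number of bytes
    already present in partial_path.
    """
    headers = dict(base_headers or {})
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    if offset > 0:
        # Ask the server for everything from the first missing byte onward.
        headers["Range"] = f"bytes={offset}-"
    return headers, offset


def open_mode_for(status_code, offset):
    """Choose the file mode for writing the response body.

    Append only if the server honored the Range request (206 Partial
    Content). A plain 200 means the full body is being sent again, so
    the local file must be overwritten from the start.
    """
    return "ab" if offset > 0 and status_code == 206 else "wb"
```

With Requests, the headers would typically be passed to requests.get(url, headers=headers, stream=True) and the body written chunk by chunk from response.iter_content(). Checking the status code before appending guards against a server that ignores the Range header.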

Core Features (Production Ready)

  1. Resumable Downloads: Implements HTTP Range headers to resume partially downloaded files, critical for multi-gigabyte assets.
  2. Reliability: A custom retry_request decorator retries failed requests with exponential backoff, covering throttling responses (429) and transient network errors.
  3. Concurrency: Multi-threaded architecture (5 workers) for simultaneous scanning and downloading.
  4. Pagination: Full support for OData pagination, ensuring complete folder traversal regardless of item count.
  5. Self-Healing Sessions: Detects 401 Unauthorized errors mid-process and recovers by refreshing whichever credential has expired: the Microsoft Graph download URL or the MSAL access token.
  6. Logging & Audit: Integrated Python logging to sharepoint_download.log and structured CSV reports for error auditing.
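
The retry and pagination features above could look roughly like the sketch below. It is an assumption-laden illustration, not the project's implementation: the TransientError class, the RETRYABLE set, the parameter names, and iter_items are all hypothetical. The only Graph-specific details relied on are the "value" array and the "@odata.nextLink" property of a paged response.

```python
import functools
import time


class TransientError(Exception):
    """Hypothetical exception carrying the HTTP status of a failed call."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


# Assumed set of statuses worth retrying (throttling and server errors).
RETRYABLE = {429, 500, 502, 503, 504}


def retry_request(max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry the wrapped call with exponential backoff (sketch)."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except TransientError as exc:
                    last_attempt = attempt == max_attempts - 1
                    if exc.status not in RETRYABLE or last_attempt:
                        raise
                    # Delay doubles each time: base, 2*base, 4*base, ...
                    sleep(base_delay * 2 ** attempt)

        return wrapper

    return decorator


def iter_items(get_json, first_url):
    """Yield every item of a paged collection by following nextLink.

    get_json is any callable mapping a URL to a parsed JSON dict, for
    example a retry-wrapped getter.
    """
    url = first_url
    while url:
        page = get_json(url)
        yield from page.get("value", [])
        url = page.get("@odata.nextLink")
```

Injecting sleep and get_json keeps the sketch testable without network access. A fuller version would likely also honor the Retry-After header that Graph generally sends with throttling responses, rather than relying on the computed delay alone.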

Building and Running

Setup

  1. Dependencies: pip install -r requirements.txt
  2. Configuration: Use connection_info.template.txt to create connection_info.txt.

Execution

python download_sharepoint.py

Development Conventions

  • Error Handling: Always use the safe_get (retry-wrapped) method for Graph API calls. For item-specific operations, use get_fresh_download_url to handle token/URL expiry.
  • Authentication: Use get_headers(app, force_refresh=True) when a 401 error is encountered from Graph API to ensure session continuity.
  • Thread Safety: Use report_lock when updating the shared error list from worker threads.
  • Logging: Prefer logger.info() or logger.error() over print() to ensure persistence in sharepoint_download.log.
  • Integrity: Verify every downloaded file against the size reported by Graph and, where available, its quickXorHash.
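
The thread-safety and logging conventions above can be illustrated with a small sketch. Only report_lock, the log name, and the worker count of 5 come from this document; error_report, record_error, process_all, write_report, and the CSV column names are hypothetical and chosen for illustration.

```python
import csv
import logging
import threading
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger("sharepoint_download")

report_lock = threading.Lock()
error_report = []  # shared list of (path, message) tuples (assumed shape)


def record_error(path, message):
    """Append to the shared error list while holding report_lock."""
    with report_lock:
        error_report.append((path, message))


def process_all(paths, download_one, max_workers=5):
    """Run download_one over paths in a thread pool, collecting failures."""

    def worker(path):
        try:
            download_one(path)
        except Exception as exc:
            # Log for persistence and record for the CSV audit report.
            logger.error("Failed to download %s: %s", path, exc)
            record_error(path, str(exc))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the iterator so all tasks finish before returning.
        list(pool.map(worker, paths))


def write_report(stream):
    """Write the collected errors as CSV to an open text stream."""
    writer = csv.writer(stream)
    writer.writerow(["path", "error"])
    with report_lock:
        writer.writerows(sorted(error_report))
```

Catching exceptions inside the worker means one failed file does not abort the remaining downloads, and every failure still ends up in both the log and the report.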