SharePoint Download Tool - Technical Documentation
A production-ready Python utility that synchronizes SharePoint Online folders to a local directory using the Microsoft Graph API.
Project Overview
- Purpose: Maintain a local mirror of SharePoint Online content for enterprise use.
- Technologies:
  - Microsoft Graph API: REST API used to list and download SharePoint items.
  - MSAL (Microsoft Authentication Library): App-only authentication against Azure AD using the client credentials flow.
  - Requests: HTTP client used for streamed downloads and `Range` header requests.
  - ThreadPoolExecutor: Runs file operations in parallel to increase throughput.
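The parallel pattern can be sketched with the standard library alone. This is a minimal illustration, not the tool's code: `process_all` is a hypothetical name, each item is assumed to be a dict with a `name` key, and the real per-file download function is assumed to be passed in as `worker`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Iterable

MAX_WORKERS = 5  # the worker count stated in this document


def process_all(
    items: Iterable[Dict[str, Any]],
    worker: Callable[[Dict[str, Any]], Any],
    max_workers: int = MAX_WORKERS,
) -> Dict[str, Dict[str, Any]]:
    """Run `worker` on every item in a thread pool and collect outcomes by item name."""
    ok: Dict[str, Any] = {}
    failed: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the name of the item it was submitted for.
        futures = {pool.submit(worker, item): item["name"] for item in items}
        # as_completed yields futures as they finish; `ok` and `failed` are only
        # touched from this (main) thread, so no lock is needed for them.
        for future in as_completed(futures):
            name = futures[future]
            try:
                ok[name] = future.result()
            except Exception as exc:  # one failed file should not abort the batch
                failed[name] = repr(exc)
    return {"ok": ok, "failed": failed}
```

Because `future.result()` re-raises a worker's exception in the main thread, failures are collected per file instead of being silently lost inside the pool.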
Core Features (Production Ready)
- Resumable Downloads: Uses HTTP `Range` headers to resume partially downloaded files, which is critical for multi-gigabyte assets.
- Reliability: A custom `retry_request` decorator retries with exponential backoff on throttling (HTTP 429) and transient network errors.
- Concurrency: A multi-threaded design (5 workers) scans folders and downloads files simultaneously.
- Pagination: Follows OData pagination (`@odata.nextLink`) so that folder traversal is complete regardless of item count.
- Self-Healing Sessions: Detects 401 Unauthorized errors mid-process and recovers by refreshing both the expiring Microsoft Graph download URL and the MSAL access token.
- Logging & Audit: Logs through Python's `logging` module to `sharepoint_download.log` and writes structured CSV reports for error auditing.
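The retry behaviour described above might look roughly like the following. This is a hedged sketch rather than the project's actual `retry_request`: `HTTPStatusError`, the retryable status set, and the delay parameters are assumptions. With `requests` you would catch `requests.exceptions.ConnectionError` and `requests.exceptions.Timeout` instead of (or in addition to) the built-in exceptions used here. The `sleep` parameter exists only so the backoff can be tested without waiting.

```python
import functools
import random
import time
from typing import Callable, Optional

# Assumed set of statuses worth retrying: throttling plus common transient 5xx errors.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


class HTTPStatusError(Exception):
    """Hypothetical error carrying an HTTP status and an optional Retry-After value (seconds)."""

    def __init__(self, status: int, retry_after: Optional[float] = None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after


def retry_request(max_attempts: int = 5, base_delay: float = 1.0,
                  max_delay: float = 60.0, sleep: Callable[[float], None] = time.sleep):
    """Decorator factory: retry transient failures with exponential backoff and jitter."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except (HTTPStatusError, ConnectionError, TimeoutError) as exc:
                    status = getattr(exc, "status", None)
                    # Non-transient HTTP errors (e.g. 404) are re-raised immediately.
                    if status is not None and status not in RETRYABLE_STATUS:
                        raise
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    retry_after = getattr(exc, "retry_after", None)
                    if retry_after is not None:
                        delay = float(retry_after)  # honour the server's Retry-After hint
                    else:
                        # base_delay, 2x, 4x, ... capped at max_delay, plus up to 10% jitter.
                        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                        delay += random.uniform(0, delay * 0.1)
                    sleep(delay)
        return wrapper
    return decorator
```

Honouring `Retry-After` when the server supplies it matters most for 429 responses, where Graph is explicitly telling the client how long to back off.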
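Resuming with a `Range` header reduces to three decisions: how many bytes are already on disk, what to request, and whether the server honoured the range (206 Partial Content) or sent the whole body (200 OK). The sketch below assumes the expected size comes from the item's Graph `size` property and takes the HTTP call as an injected `fetch(headers)` callable returning `(status_code, chunks)`; `resume_download` and `fetch` are hypothetical names. With `requests`, `fetch` could call `requests.get(url, headers=headers, stream=True)` and return `(resp.status_code, resp.iter_content(chunk_size=1024 * 1024))`.

```python
import os
from typing import Callable, Dict, Iterable, Tuple

# fetch(headers) -> (status_code, iterable of byte chunks); injected so any HTTP client works.
Fetch = Callable[[Dict[str, str]], Tuple[int, Iterable[bytes]]]


def resume_download(dest_path: str, expected_size: int, fetch: Fetch) -> int:
    """Download into dest_path, resuming from any partial file; return the final size."""
    exists = os.path.exists(dest_path)
    offset = os.path.getsize(dest_path) if exists else 0
    if exists and offset == expected_size:
        return offset  # already complete: skip the request entirely
    if offset > expected_size:
        offset = 0  # local file is larger than the remote one: start over
    # "bytes=N-" asks for everything from byte N to the end of the file.
    headers = {"Range": f"bytes={offset}-"} if offset else {}
    status, chunks = fetch(headers)
    if status == 206 and offset > 0:
        mode = "ab"  # server honoured the range: append the missing tail
    elif status == 200:
        mode = "wb"  # full body returned: overwrite from the beginning
    else:
        raise RuntimeError(f"unexpected HTTP status {status}")
    with open(dest_path, mode) as fh:
        for chunk in chunks:
            if chunk:  # streamed iterators can yield empty keep-alive chunks
                fh.write(chunk)
    return os.path.getsize(dest_path)
```

Treating a 200 as "start over" is the safe fallback: a server that ignores `Range` sends the full body, and appending it to the partial file would corrupt the download.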
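Graph returns folder listings in pages and includes an `@odata.nextLink` URL while more pages remain. A minimal traversal sketch, assuming `get_json(url)` stands in for a retry-wrapped GET (for example via `safe_get`) that returns the parsed JSON body, and `children_url(item_id)` builds a URL such as `https://graph.microsoft.com/v1.0/drives/{drive-id}/items/{item-id}/children`; both helper names are hypothetical. Items carrying a `folder` facet are descended into; everything else is yielded as a file.

```python
from typing import Any, Callable, Dict, Iterator, Tuple

JSON = Dict[str, Any]


def iter_children(first_url: str, get_json: Callable[[str], JSON]) -> Iterator[JSON]:
    """Yield every child item, following @odata.nextLink until the last page."""
    url = first_url
    while url:
        page = get_json(url)
        yield from page.get("value", [])
        url = page.get("@odata.nextLink")  # absent on the final page


def walk(children_url: Callable[[str], str], item_id: str,
         get_json: Callable[[str], JSON], prefix: str = "") -> Iterator[Tuple[str, JSON]]:
    """Recursively yield (relative_path, item) for every file below item_id."""
    for child in iter_children(children_url(item_id), get_json):
        path = f"{prefix}/{child['name']}" if prefix else child["name"]
        if "folder" in child:
            yield from walk(children_url, child["id"], get_json, path)
        else:
            yield path, child
```

Using generators means the scan can hand files to the download pool as soon as they are discovered rather than after the whole tree is listed.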
Building and Running
Setup
- Dependencies: `pip install -r requirements.txt`
- Configuration: Use `connection_info.template.txt` to create `connection_info.txt`.
Execution
python download_sharepoint.py
Development Conventions
- Error Handling: Always use the `safe_get` (retry-wrapped) method for Graph API calls. For item-specific operations, use `get_fresh_download_url` to handle token/URL expiry.
- Authentication: Call `get_headers(app, force_refresh=True)` when the Graph API returns a 401 error to ensure session continuity.
- Thread Safety: Acquire `report_lock` when updating the shared error list from worker threads.
- Logging: Prefer `logger.info()` or `logger.error()` over `print()` so that messages persist in `sharepoint_download.log`.
- Integrity: Always verify downloaded files using `size` and `quickXorHash` where available.
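OneDrive and SharePoint report a base64-encoded `quickXorHash` under an item's `file.hashes` facet, alongside its `size`. The sketch below is a pure-Python reading of Microsoft's published QuickXorHash description: each byte is XORed into a 160-bit state at a bit position that advances 11 bits per byte, and the total length is XORed into the last 8 bytes. `QuickXorHash` and `verify_file` are hypothetical names; the per-byte loop is far too slow for multi-gigabyte files, and the output should be checked against a hash reported by Graph before it is relied on.

```python
import base64
import os
from typing import Optional


class QuickXorHash:
    """Incremental QuickXorHash: 160-bit XOR state, 11-bit shift per byte."""

    WIDTH = 160
    SHIFT = 11
    MASK = (1 << WIDTH) - 1

    def __init__(self) -> None:
        self._state = 0   # 160-bit accumulator held as a Python int
        self._shift = 0   # bit position for the next byte
        self._length = 0  # total bytes consumed

    def update(self, data: bytes) -> None:
        state, shift = self._state, self._shift
        for byte in data:
            v = byte << shift
            # Bits pushed past bit 159 wrap around to the low end of the state.
            state ^= (v & self.MASK) ^ (v >> self.WIDTH)
            shift = (shift + self.SHIFT) % self.WIDTH
        self._state, self._shift = state, shift
        self._length += len(data)

    def digest(self) -> bytes:
        out = bytearray(self._state.to_bytes(20, "little"))
        # XOR the 64-bit little-endian length into the last 8 of the 20 bytes.
        for i, b in enumerate(self._length.to_bytes(8, "little")):
            out[12 + i] ^= b
        return bytes(out)

    def b64digest(self) -> str:
        return base64.b64encode(self.digest()).decode("ascii")


def verify_file(path: str, expected_size: int, expected_hash: Optional[str],
                chunk_size: int = 1 << 20) -> bool:
    """Check size first (cheap), then quickXorHash when Graph supplied one."""
    if os.path.getsize(path) != expected_size:
        return False
    if not expected_hash:
        return True  # no hash available: a size match is the best check possible
    h = QuickXorHash()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.b64digest() == expected_hash
```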
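The thread-safety and logging conventions above can be combined as in the following sketch. The names `report_lock` and `sharepoint_download.log` come from this document; `record_error`, `write_report`, `configure_logging`, the logger name, and the CSV column names are hypothetical.

```python
import csv
import logging
import threading
from typing import Dict, List

logger = logging.getLogger("sharepoint_download")

report_lock = threading.Lock()
error_report: List[Dict[str, str]] = []  # shared across worker threads


def configure_logging(path: str = "sharepoint_download.log") -> None:
    """Send INFO and above to the log file so messages persist after the run."""
    logging.basicConfig(
        filename=path,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s [%(threadName)s] %(message)s",
    )


def record_error(path: str, stage: str, message: str) -> None:
    """Called from worker threads: log the failure and append it under the lock."""
    logger.error("%s failed during %s: %s", path, stage, message)
    with report_lock:
        error_report.append({"path": path, "stage": stage, "error": message})


def write_report(csv_path: str) -> int:
    """Snapshot the shared list under the lock, then write it as CSV outside the lock."""
    with report_lock:
        rows = list(error_report)
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["path", "stage", "error"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Copying the list while holding the lock and doing the file I/O afterwards keeps the critical section short, so workers are not blocked while the report is written.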
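The 401 recovery rule might be implemented along these lines. In this sketch `app` is assumed to be an MSAL `ConfidentialClientApplication`, whose `acquire_token_for_client(scopes=...)` returns a dict containing `access_token` (and usually `expires_in`) on success. The module-level cache, the 60-second safety margin, the `send(url, headers)` callable, and `get_with_reauth` are illustrative assumptions, not the tool's code. MSAL also keeps its own token cache, so whether a forced call yields a brand-new token can depend on the MSAL version.

```python
import threading
import time
from typing import Any, Callable, Dict, Tuple

GRAPH_SCOPES = ["https://graph.microsoft.com/.default"]
_token_lock = threading.Lock()
_cached: Dict[str, Any] = {"token": None, "expires_at": 0.0}  # one app per process assumed


def get_headers(app: Any, force_refresh: bool = False) -> Dict[str, str]:
    """Return Graph auth headers, reusing a cached token unless refresh is forced."""
    with _token_lock:  # workers share one token; only one thread refreshes at a time
        expired = time.time() >= _cached["expires_at"] - 60  # refresh a minute early
        if force_refresh or _cached["token"] is None or expired:
            result = app.acquire_token_for_client(scopes=GRAPH_SCOPES)
            if "access_token" not in result:
                raise RuntimeError(f"token request failed: {result.get('error')}")
            _cached["token"] = result["access_token"]
            _cached["expires_at"] = time.time() + float(result.get("expires_in", 3600))
        return {"Authorization": f"Bearer {_cached['token']}"}


def get_with_reauth(app: Any, url: str,
                    send: Callable[[str, Dict[str, str]], Tuple[int, Any]]) -> Tuple[int, Any]:
    """Call `send`; on a 401, force a token refresh and retry exactly once."""
    status, body = send(url, get_headers(app))
    if status == 401:
        status, body = send(url, get_headers(app, force_refresh=True))
    return status, body
```

Retrying only once avoids an endless loop when the 401 is caused by missing permissions rather than an expired token.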