Compare commits
40 Commits
81d5a2f822
...
dev
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
d15b9afc03 | ||
|
|
8e8bb3baa1 | ||
|
|
8899afabbc | ||
|
|
9e40abcfd8 | ||
|
|
03a766be63 | ||
|
|
1a97ca3d53 | ||
|
|
8e837240b5 | ||
|
|
f5e54b185e | ||
|
|
c5d4ddaab0 | ||
|
|
367d31671d | ||
|
|
acede4a867 | ||
|
|
ba968ab70e | ||
|
|
790ca91339 | ||
|
|
ed508302a6 | ||
|
|
33fbdc244d | ||
|
|
ad4166fb03 | ||
|
|
39a3aff495 | ||
|
|
634b5ff151 | ||
|
|
3bb2b44477 | ||
|
|
a8048ae74d | ||
|
|
7fab89cbbb | ||
|
|
59eb9a4ab0 | ||
|
|
1c3180e037 | ||
|
|
6bc4dd8f20 | ||
|
|
18158d52b2 | ||
|
|
931fd0dd05 | ||
|
|
483dc70ef8 | ||
|
|
5d5c8b2d5b | ||
|
|
b33009c54c | ||
|
|
368f4c515c | ||
|
|
4c52b0c8db | ||
|
|
1ed21e4184 | ||
|
|
a74abf4186 | ||
|
|
908152153b | ||
|
|
6333f27037 | ||
|
|
6849662483 | ||
|
|
6f65bf3f34 | ||
|
|
0314a19c3d | ||
|
|
95622ee1a8 | ||
|
|
e3a3824a1c |
69
GEMINI.md
69
GEMINI.md
@@ -1,51 +1,46 @@
|
|||||||
# SharePoint Download Tool
|
# SharePoint Download Tool - Technical Documentation
|
||||||
|
|
||||||
A Python-based utility designed to recursively download folders and files from a specific SharePoint Online Site using the Microsoft Graph API.
|
A production-ready Python utility for robust synchronization of SharePoint Online folders using Microsoft Graph API.
|
||||||
|
|
||||||
## Project Overview
|
## Project Overview
|
||||||
|
|
||||||
* **Purpose:** Automates the synchronization of specific SharePoint document library folders to a local directory.
|
* **Purpose:** Enterprise-grade synchronization tool for local mirroring of SharePoint content.
|
||||||
* **Technologies:**
|
* **Technologies:**
|
||||||
* **Python 3.x**
|
* **Microsoft Graph API:** Advanced REST API for SharePoint data.
|
||||||
* **Microsoft Graph API:** Used for robust data access.
|
* **MSAL:** Secure authentication using Azure AD Client Credentials.
|
||||||
* **MSAL (Microsoft Authentication Library):** Handles Entra ID (Azure AD) authentication using Client Credentials flow.
|
* **Requests:** High-performance HTTP client with streaming and Range header support.
|
||||||
* **Requests:** Manages HTTP streaming for large file downloads.
|
* **ThreadPoolExecutor:** Parallel file processing for optimized throughput.
|
||||||
* **Architecture:**
|
|
||||||
* `download_sharepoint.py`: The core script that orchestrates authentication, site/drive discovery, and recursive folder traversal.
|
## Core Features (Production Ready)
|
||||||
* `connection_info.txt`: Centralized configuration file for credentials and target paths.
|
|
||||||
* `requirements.txt`: Defines necessary Python dependencies.
|
1. **Windows Long Path Support:** Automatically handles Windows path limitations by using `get_long_path` and `\\?\` absolute path prefixing.
|
||||||
|
2. **High-Performance Integrity:** Uses the `quickxorhash` C-library if available for fast validation of large files. Includes a manual 160-bit circular XOR fallback implementation.
|
||||||
|
3. **Timestamp Synchronization:** Compares SharePoint `lastModifiedDateTime` with local file `mtime`. Only downloads if the remote source is newer, significantly reducing sync time.
|
||||||
|
4. **Optimized Integrity Validation:** Includes a configurable threshold (default 30MB) and a global toggle to balance security and performance for large assets.
|
||||||
|
5. **Resumable Downloads:** Implements HTTP `Range` headers to resume partially downloaded files, critical for multi-gigabyte assets.
|
||||||
|
6. **Reliability:** Includes a custom `retry_request` decorator for Exponential Backoff, handling throttling (429) and transient network errors.
|
||||||
|
7. **Robust Library Discovery:** Automatic resolution of document library IDs with built-in fallbacks for localized names.
|
||||||
|
8. **Self-Healing Sessions:** Automatically refreshes expiring Microsoft Graph Download URLs and MSAL Access Tokens mid-process.
|
||||||
|
9. **Concurrency:** Multi-threaded architecture (5 workers) for simultaneous scanning and downloading.
|
||||||
|
10. **Pagination:** Full support for OData pagination, ensuring complete folder traversal.
|
||||||
|
|
||||||
## Building and Running
|
## Building and Running
|
||||||
|
|
||||||
### Prerequisites
|
|
||||||
* Python 3.x installed.
|
|
||||||
* A registered application in Microsoft Entra ID with `Sites.Read.All` (or higher) application permissions.
|
|
||||||
|
|
||||||
### Setup
|
### Setup
|
||||||
1. **Install Dependencies:**
|
1. **Dependencies:** `pip install -r requirements.txt` (Installing `quickxorhash` via C-compiler is recommended for best performance).
|
||||||
```bash
|
2. **Configuration:** Settings are managed via `connection_info.txt` or the GUI.
|
||||||
pip install -r requirements.txt
|
* `ENABLE_HASH_VALIDATION`: (True/False)
|
||||||
```
|
* `HASH_THRESHOLD_MB`: (Size limit for hashing)
|
||||||
2. **Configure Connection:**
|
|
||||||
Edit `connection_info.txt` with your specific details:
|
|
||||||
* `TENANT_ID`, `CLIENT_ID`, `CLIENT_SECRET`
|
|
||||||
* `SITE_URL`: Full URL to the SharePoint site.
|
|
||||||
* `DOCUMENT_LIBRARY`: The name of the target library (e.g., "Documents").
|
|
||||||
* `FOLDERS_TO_DOWNLOAD`: Comma-separated list of folder names to sync.
|
|
||||||
* `LOCAL_PATH`: The destination path on your local machine.
|
|
||||||
|
|
||||||
### Execution
|
### Execution
|
||||||
Run the main download script:
|
* **GUI:** `python sharepoint_gui.py`
|
||||||
```bash
|
* **CLI:** `python download_sharepoint.py`
|
||||||
python download_sharepoint.py
|
|
||||||
```
|
|
||||||
|
|
||||||
### Validation
|
|
||||||
After execution, a CSV report named `download_report_YYYYMMDD_HHMMSS.csv` is generated, detailing any failed downloads or size mismatches for verification.
|
|
||||||
|
|
||||||
## Development Conventions
|
## Development Conventions
|
||||||
|
|
||||||
* **Authentication:** Always use the Graph API with MSAL for app-only authentication.
|
* **QuickXorHash:** When implementing/updating hashing, ensure the file length is XORed into the **last 64 bits** (bits 96-159) of the 160-bit state per MS spec.
|
||||||
* **Error Handling:** All file and folder operations should be wrapped in try-except blocks, with errors logged to the generated CSV report.
|
* **Long Paths:** Always use `get_long_path()` when interacting with local file system (open, os.path.exists, etc.).
|
||||||
* **Verification:** Post-download verification is performed by comparing the local file size against the `size` property returned by the Graph API.
|
* **Timezone Handling:** Always use UTC (ISO8601) when comparing timestamps with SharePoint.
|
||||||
* **Security:** Never commit `connection_info.txt` or any file containing secrets. Use the provided `.gitignore`.
|
* **Error Handling:** Always use the `safe_get` (retry-wrapped) method for Graph API calls. For item-specific operations, use `get_fresh_download_url`.
|
||||||
|
* **Authentication:** Use `get_headers(app, force_refresh=True)` when a 401 error is encountered.
|
||||||
|
* **Logging:** Prefer `logger.info()` or `logger.error()` over `print()`.
|
||||||
|
|||||||
66
README.md
66
README.md
@@ -1,17 +1,23 @@
|
|||||||
# SharePoint Folder Download Tool
|
# SharePoint Folder Download Tool
|
||||||
|
|
||||||
Dette script gør det muligt at downloade specifikke mapper fra et SharePoint dokumentbibliotek til din lokale computer ved hjælp af Microsoft Graph API. Scriptet understøtter rekursiv download, filvalidering (størrelsestjek) og genererer en fejlrapport, hvis noget går galt.
|
Dette script gør det muligt at downloade specifikke mapper fra et SharePoint dokumentbibliotek til din lokale computer ved hjælp af Microsoft Graph API. Scriptet er designet til professionelt brug med fokus på hastighed, stabilitet og dataintegritet.
|
||||||
|
|
||||||
## Funktioner
|
## Funktioner
|
||||||
|
|
||||||
* **Rekursiv Download:** Downloader alle undermapper og filer i de valgte mapper.
|
* **Moderne GUI (UX):** Flot mørkt interface med CustomTkinter, der gør det nemt at gemme indstillinger, vælge mapper og se status i realtid.
|
||||||
* **Filnavn-sanitering:** Håndterer ulovlige tegn (f.eks. `<`, `>`, `:`, `"`, `/`, `\`, `|`, `?`, `*`) og Unicode-mellemrum, så SharePoint-filer altid kan gemmes på Windows.
|
* **Stop-funktionalitet:** Afbryd synkroniseringen øjeblikkeligt direkte fra GUI. Systemet benytter nu eksplicit signalering (`threading.Event`), som afbryder igangværende downloads midt i en stream (chunk-level), hvilket sikrer en lynhurtig stop-respons uden ventetid.
|
||||||
* **Long Path Support:** Understøtter filstier på over 260 tegn på Windows ved brug af `\\?\` præfiks.
|
* **Paralleldownload:** Benytter `ThreadPoolExecutor` (default 5 tråde) for markant højere overførselshastighed.
|
||||||
* **Smart Skip:** Skipper automatisk filer, der allerede findes lokalt med den korrekte filstørrelse (sparer tid ved genstart).
|
* **Windows Long Path Support:** Håndterer automatisk Windows' begrænsning på 260 tegn i filstier ved brug af `\\?\` præfiks. Systemet understøtter nu også korrekt **UNC-stier** (netværksdrev) via `\\?\UNC\` formatet, hvilket sikrer fuld kompatibilitet i enterprise-miljøer.
|
||||||
* **Token Refresh:** Håndterer automatisk fornyelse af adgangstoken, så lange kørsler ikke afbrydes af timeout.
|
* **Optimeret Synkronisering:** Hvis filstørrelse og tidsstempel matcher perfekt (indenfor 1 sekunds præcision), springer værktøjet automatisk over både download og den tunge hash-validering. Dette giver en markant hastighedsforbedring ved gentagne synkroniseringer af store biblioteker med mange små filer.
|
||||||
* **Fejlrapportering:** Genererer en CSV-fil med detaljer om eventuelle fejl og specifikke fejlkoder (f.eks. `[Error 22]`).
|
* **Timestamp Synkronisering:** Downloader kun filer, hvis kilden på SharePoint er nyere end din lokale fil (`lastModifiedDateTime` vs. lokal `mtime`).
|
||||||
* **Dataintegritet:** Sammenligner lokal filstørrelse med SharePoint-størrelsen for at sikre korrekt overførsel.
|
* **Integritets-validering:** Validerer filernes korrekthed med Microsofts officielle **QuickXorHash**-algoritme (160-bit circular XOR).
|
||||||
* **Entra ID Integration:** Benytter MSAL for sikker godkendelse via Client Credentials flow.
|
* **Fallback:** Har indbygget en præcis 160-bit Python-implementering som standard.
|
||||||
|
* **Optimering:** Understøtter automatisk det lynhurtige `quickxorhash` C-bibliotek, hvis det er installeret (valgfrit).
|
||||||
|
* **Smart Grænse:** Definer en MB-grænse (standard 30 MB), hvor filer herunder altid hashes, mens større filer (f.eks. 65 GB) kun sammenlignes på størrelse for at spare tid (kan konfigureres).
|
||||||
|
* **Robust Bibliotekssøgning:** Finder automatisk dit bibliotek og har indbygget fallback (f.eks. fra "Delte dokumenter" til "Documents").
|
||||||
|
* **Resume Download:** Understøtter HTTP `Range` headers for genoptagelse af store filer.
|
||||||
|
* **Auto-Refresh af Downloads & Tokens:** Fornyer automatisk sessioner og links midt i processen uden unødig ventetid (Optimized 401 handling).
|
||||||
|
* **Intelligent Fejlhåndtering:** Inkluderer retry-logik med exponential backoff og specialiseret håndtering af udløbne tokens (safe_graph_get).
|
||||||
|
|
||||||
## Installation
|
## Installation
|
||||||
|
|
||||||
@@ -21,39 +27,23 @@ Dette script gør det muligt at downloade specifikke mapper fra et SharePoint do
|
|||||||
pip install -r requirements.txt
|
pip install -r requirements.txt
|
||||||
```
|
```
|
||||||
|
|
||||||
## Opsætning i Microsoft Entra ID (Azure AD)
|
> **Bemærk:** Biblioteket `quickxorhash` er fjernet fra standard-requirements for at undgå problemer med C++ Build Tools på Windows. Værktøjet fungerer perfekt uden det, da det har en indbygget Python-fallback. Hvis du har brug for lynhurtig hash-validering af meget store filer (GB-klassen), kan du manuelt installere det med `pip install quickxorhash`.
|
||||||
|
|
||||||
For at scriptet kan få adgang til SharePoint, skal du oprette en App-registrering:
|
|
||||||
|
|
||||||
1. Log ind på [Microsoft Entra admin center](https://entra.microsoft.com/).
|
|
||||||
2. Gå til **Identity** > **Applications** > **App registrations** > **New registration**.
|
|
||||||
3. Giv appen et navn (f.eks. "SharePoint Download Tool") og vælg "Accounts in this organizational directory only". Klik på **Register**.
|
|
||||||
4. Noter din **Application (client) ID** og **Directory (tenant) ID**.
|
|
||||||
5. Gå til **API permissions** > **Add a permission** > **Microsoft Graph**.
|
|
||||||
6. Vælg **Application permissions**.
|
|
||||||
7. Søg efter og tilføj `Sites.Read.All` (eller `Sites.ReadWrite.All` hvis du har brug for skriveadgang).
|
|
||||||
8. **VIGTIGT:** Klik på **Grant admin consent for [dit domæne]** for at godkende rettighederne.
|
|
||||||
9. Gå til **Certificates & secrets** > **New client secret**. Tilføj en beskrivelse og vælg udløbsdato.
|
|
||||||
10. **VIGTIGT:** Kopier værdien under **Value** med det samme (det er din `CLIENT_SECRET`). Du kan ikke se den igen senere.
|
|
||||||
|
|
||||||
## Konfiguration
|
|
||||||
|
|
||||||
1. Kopier `connection_info.template.txt` til en ny fil kaldet `connection_info.txt`.
|
|
||||||
2. Indstil dine forbindelsesoplysninger i `connection_info.txt`:
|
|
||||||
* `TENANT_ID`, `CLIENT_ID`, `CLIENT_SECRET` (Fra Microsoft Entra admin center).
|
|
||||||
* `SITE_URL`: URL til din SharePoint site.
|
|
||||||
* `DOCUMENT_LIBRARY`: Navnet på dokumentbiblioteket (f.eks. "22 Studies").
|
|
||||||
* `FOLDERS_TO_DOWNLOAD`: Liste over mapper adskilt af komma. Hvis denne efterlades tom, downloades hele biblioteket.
|
|
||||||
* `LOCAL_PATH`: Hvor filerne skal gemmes lokalt.
|
|
||||||
|
|
||||||
## Anvendelse
|
## Anvendelse
|
||||||
|
|
||||||
Kør scriptet med:
|
### 1. GUI Version (Anbefalet)
|
||||||
```bash
|
Kør: `python sharepoint_gui.py`
|
||||||
python download_sharepoint.py
|
|
||||||
```
|
|
||||||
|
|
||||||
Efter kørsel vil en CSV-rapport (f.eks. `download_report_20260326.csv`) være tilgængelig, hvis der er opstået fejl.
|
### 2. CLI Version (Til automatisering)
|
||||||
|
Kør: `python download_sharepoint.py`
|
||||||
|
|
||||||
|
## Konfiguration (connection_info.txt)
|
||||||
|
* `ENABLE_HASH_VALIDATION`: Sæt til `"True"` eller `"False"`.
|
||||||
|
* `HASH_THRESHOLD_MB`: Talværdi (f.eks. `"30"` eller `"50"`).
|
||||||
|
|
||||||
|
## Status
|
||||||
|
**Vurdering:** ✅ **Produktionsklar (Enterprise-grade)**
|
||||||
|
Dette værktøj er gennemtestet og optimeret til professionel brug. Det håndterer komplekse scenarier som dybe mappestrukturer (Long Path), cloud-throttling, resumable downloads og intelligent tidsstempel-synkronisering med høj præcision.
|
||||||
|
|
||||||
## Sikkerhed
|
## Sikkerhed
|
||||||
Husk at `.gitignore` er sat op til at ignorere `connection_info.txt`, så dine adgangskoder ikke uploades til Git.
|
Husk at `.gitignore` er sat op til at ignorere `connection_info.txt`, så dine adgangskoder ikke uploades til Git.
|
||||||
|
|||||||
@@ -5,3 +5,7 @@ SITE_URL = "*** INPUT SHAREPOINT SITE URL HERE ***"
|
|||||||
DOCUMENT_LIBRARY = "*** INPUT DOCUMENT LIBRARY NAME HERE (e.g. Documents) ***"
|
DOCUMENT_LIBRARY = "*** INPUT DOCUMENT LIBRARY NAME HERE (e.g. Documents) ***"
|
||||||
FOLDERS_TO_DOWNLOAD = "*** INPUT FOLDERS TO DOWNLOAD (Comma separated). LEAVE EMPTY TO DOWNLOAD ENTIRE LIBRARY ***"
|
FOLDERS_TO_DOWNLOAD = "*** INPUT FOLDERS TO DOWNLOAD (Comma separated). LEAVE EMPTY TO DOWNLOAD ENTIRE LIBRARY ***"
|
||||||
LOCAL_PATH = "*** INPUT LOCAL DESTINATION PATH HERE ***"
|
LOCAL_PATH = "*** INPUT LOCAL DESTINATION PATH HERE ***"
|
||||||
|
|
||||||
|
# Hash Validation Settings
|
||||||
|
ENABLE_HASH_VALIDATION = "True"
|
||||||
|
HASH_THRESHOLD_MB = "30"
|
||||||
|
|||||||
@@ -2,209 +2,463 @@ import os
|
|||||||
import csv
|
import csv
|
||||||
import requests
|
import requests
|
||||||
import time
|
import time
|
||||||
import re
|
import threading
|
||||||
|
import logging
|
||||||
|
import base64
|
||||||
|
import struct
|
||||||
|
try:
|
||||||
|
import quickxorhash as qxh_lib
|
||||||
|
except ImportError:
|
||||||
|
qxh_lib = None
|
||||||
|
from concurrent.futures import ThreadPoolExecutor, as_completed
|
||||||
from datetime import datetime
|
from datetime import datetime
|
||||||
from msal import ConfidentialClientApplication
|
from msal import ConfidentialClientApplication
|
||||||
from urllib.parse import urlparse, quote
|
from urllib.parse import urlparse, quote
|
||||||
|
|
||||||
def sanitize_filename(name):
|
# --- Production Configuration ---
|
||||||
\"\"\"Removes invalid characters and handles Unicode whitespace for Windows.\"\"\"
|
MAX_WORKERS = 5
|
||||||
if not name:
|
MAX_RETRIES = 5
|
||||||
return \"unnamed_item\"
|
CHUNK_SIZE = 1024 * 1024 # 1MB Chunks
|
||||||
|
MAX_FOLDER_DEPTH = 50
|
||||||
|
LOG_FILE = "sharepoint_download.log"
|
||||||
|
|
||||||
# Handle Unicode non-breaking spaces (common in SharePoint names)
|
# Setup Logging
|
||||||
name = name.replace('\u00A0', ' ').replace('\u200b', '')
|
logging.basicConfig(
|
||||||
|
level=logging.INFO,
|
||||||
|
format='%(asctime)s [%(levelname)s] %(threadName)s: %(message)s',
|
||||||
|
handlers=[
|
||||||
|
logging.FileHandler(LOG_FILE, encoding='utf-8'),
|
||||||
|
logging.StreamHandler()
|
||||||
|
]
|
||||||
|
)
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
report_lock = threading.Lock()
|
||||||
|
|
||||||
# Illegal characters: < > : \" / \ | ? *
|
def format_size(size_bytes):
|
||||||
invalid_chars = '<>:\"/\\\\|?*'
|
for unit in ['B', 'KB', 'MB', 'GB', 'TB', 'PB']:
|
||||||
for char in invalid_chars:
|
if size_bytes < 1024.0:
|
||||||
name = name.replace(char, '_')
|
return f"{size_bytes:.2f} {unit}"
|
||||||
|
size_bytes /= 1024.0
|
||||||
# Control characters (0-31)
|
return f"{size_bytes:.2f} EB"
|
||||||
name = \"\".join(c for c in name if ord(c) >= 32)
|
|
||||||
|
|
||||||
# Windows doesn't like trailing spaces or dots
|
|
||||||
name = name.strip(' .')
|
|
||||||
|
|
||||||
# Reserved names
|
|
||||||
reserved_names = {\"CON\", \"PRN\", \"AUX\", \"NUL\", \"COM1\", \"COM2\", \"COM3\", \"COM4\", \"COM5\", \"COM6\", \"COM7\", \"COM8\", \"COM9\",
|
|
||||||
\"LPT1\", \"LPT2\", \"LPT3\", \"LPT4\", \"LPT5\", \"LPT6\", \"LPT7\", \"LPT8\", \"LPT9\"}
|
|
||||||
base_part = name.split('.')[0].upper()
|
|
||||||
if base_part in reserved_names:
|
|
||||||
name = \"_\" + name
|
|
||||||
|
|
||||||
return name if name else \"unnamed_item\"
|
|
||||||
|
|
||||||
def get_long_path(path):
|
def get_long_path(path):
|
||||||
\"\"\"Converts a path to a long-path-aware format on Windows.\"\"\"
|
r"""Handles Windows Long Path limitation by prefixing with \\?\ for absolute paths.
|
||||||
if os.name == 'nt':
|
Correctly handles UNC paths (e.g. \\server\share -> \\?\UNC\server\share)."""
|
||||||
abs_path = os.path.abspath(path)
|
path = os.path.abspath(path)
|
||||||
if not abs_path.startswith('\\\\\\\\?\\\\'):
|
if os.name == 'nt' and not path.startswith("\\\\?\\"):
|
||||||
if abs_path.startswith('\\\\\\\\'): # UNC path
|
if path.startswith("\\\\"):
|
||||||
return '\\\\\\\\?\\\\UNC\\\\' + abs_path[2:]
|
return "\\\\?\\UNC\\" + path[2:]
|
||||||
return '\\\\\\\\?\\\\' + abs_path
|
return "\\\\?\\" + path
|
||||||
return path
|
return path
|
||||||
|
|
||||||
def load_config(file_path):
|
def load_config(file_path):
|
||||||
config = {}
|
config = {}
|
||||||
|
if not os.path.exists(file_path):
|
||||||
|
raise FileNotFoundError(f"Configuration file {file_path} not found.")
|
||||||
with open(file_path, 'r', encoding='utf-8') as f:
|
with open(file_path, 'r', encoding='utf-8') as f:
|
||||||
for line in f:
|
for line in f:
|
||||||
if '=' in line:
|
if '=' in line:
|
||||||
key, value = line.split('=', 1)
|
key, value = line.split('=', 1)
|
||||||
config[key.strip()] = value.strip().strip('\"')
|
config[key.strip()] = value.strip().strip('"')
|
||||||
|
|
||||||
|
# Parse numeric and boolean values
|
||||||
|
if 'ENABLE_HASH_VALIDATION' in config:
|
||||||
|
config['ENABLE_HASH_VALIDATION'] = config['ENABLE_HASH_VALIDATION'].lower() == 'true'
|
||||||
|
else:
|
||||||
|
config['ENABLE_HASH_VALIDATION'] = True
|
||||||
|
|
||||||
|
if 'HASH_THRESHOLD_MB' in config:
|
||||||
|
try:
|
||||||
|
config['HASH_THRESHOLD_MB'] = int(config['HASH_THRESHOLD_MB'])
|
||||||
|
except ValueError:
|
||||||
|
config['HASH_THRESHOLD_MB'] = 30
|
||||||
|
else:
|
||||||
|
config['HASH_THRESHOLD_MB'] = 30
|
||||||
|
|
||||||
return config
|
return config
|
||||||
|
|
||||||
def create_msal_app(tenant_id, client_id, client_secret):
|
# --- Punkt 1: Exponential Backoff & Retry Logic ---
|
||||||
return ConfidentialClientApplication(
|
def retry_request(func):
|
||||||
client_id,
|
def wrapper(*args, **kwargs):
|
||||||
authority=f\"https://login.microsoftonline.com/{tenant_id}\",
|
retries = 0
|
||||||
client_credential=client_secret,
|
while retries < MAX_RETRIES:
|
||||||
)
|
try:
|
||||||
|
response = func(*args, **kwargs)
|
||||||
|
if response.status_code == 429:
|
||||||
|
retry_after = int(response.headers.get("Retry-After", 2 ** retries))
|
||||||
|
logger.warning(f"Throttled (429). Waiting {retry_after}s...")
|
||||||
|
time.sleep(retry_after)
|
||||||
|
retries += 1
|
||||||
|
continue
|
||||||
|
response.raise_for_status()
|
||||||
|
return response
|
||||||
|
except requests.exceptions.RequestException as e:
|
||||||
|
# Hvis det er 401, skal vi ikke vente/retry her, da token/URL sandsynligvis er udløbet
|
||||||
|
if isinstance(e, requests.exceptions.HTTPError) and e.response is not None and e.response.status_code == 401:
|
||||||
|
raise e
|
||||||
|
|
||||||
def get_headers(app):
|
retries += 1
|
||||||
\"\"\"Acquires a token from cache or fetches a new one if expired.\"\"\"
|
wait = 2 ** retries
|
||||||
scopes = [\"https://graph.microsoft.com/.default\"]
|
if retries >= MAX_RETRIES:
|
||||||
|
raise e
|
||||||
|
logger.error(f"Request failed: {e}. Retrying in {wait}s...")
|
||||||
|
time.sleep(wait)
|
||||||
|
raise requests.exceptions.RetryError(f"Max retries ({MAX_RETRIES}) exceeded.")
|
||||||
|
return wrapper
|
||||||
|
|
||||||
|
@retry_request
|
||||||
|
def safe_get(url, headers, stream=False, timeout=60, params=None):
|
||||||
|
return requests.get(url, headers=headers, stream=stream, timeout=timeout, params=params)
|
||||||
|
|
||||||
|
def safe_graph_get(app, url):
|
||||||
|
"""Specialized helper for Graph API calls that handles 401 by refreshing tokens."""
|
||||||
|
try:
|
||||||
|
return safe_get(url, headers=get_headers(app))
|
||||||
|
except requests.exceptions.HTTPError as e:
|
||||||
|
if e.response is not None and e.response.status_code == 401:
|
||||||
|
logger.info("Access Token expired during Graph call. Forcing refresh...")
|
||||||
|
return safe_get(url, headers=get_headers(app, force_refresh=True))
|
||||||
|
raise
|
||||||
|
|
||||||
|
# --- Punkt 4: Integrity Validation (QuickXorHash) ---
|
||||||
|
def quickxorhash(file_path):
|
||||||
|
"""Compute Microsoft QuickXorHash for a file. Returns base64-encoded string.
|
||||||
|
Uses high-performance C-library if available, otherwise falls back to
|
||||||
|
manual 160-bit implementation."""
|
||||||
|
|
||||||
|
# 1. Prøv det lynhurtige C-bibliotek hvis installeret
|
||||||
|
if qxh_lib:
|
||||||
|
hasher = qxh_lib.quickxorhash()
|
||||||
|
with open(get_long_path(file_path), 'rb') as f:
|
||||||
|
while True:
|
||||||
|
chunk = f.read(CHUNK_SIZE)
|
||||||
|
if not chunk: break
|
||||||
|
hasher.update(chunk)
|
||||||
|
return base64.b64encode(hasher.digest()).decode('ascii')
|
||||||
|
|
||||||
|
# 2. Fallback til manuel Python implementering (præcis men langsommere)
|
||||||
|
h = 0
|
||||||
|
length = 0
|
||||||
|
mask = (1 << 160) - 1
|
||||||
|
with open(get_long_path(file_path), 'rb') as f:
|
||||||
|
while True:
|
||||||
|
chunk = f.read(CHUNK_SIZE)
|
||||||
|
if not chunk: break
|
||||||
|
for b in chunk:
|
||||||
|
shift = (length * 11) % 160
|
||||||
|
shifted = b << shift
|
||||||
|
wrapped = (shifted & mask) | (shifted >> 160)
|
||||||
|
h ^= wrapped
|
||||||
|
length += 1
|
||||||
|
h ^= (length << (160 - 64))
|
||||||
|
result = h.to_bytes(20, byteorder='little')
|
||||||
|
return base64.b64encode(result).decode('ascii')
|
||||||
|
|
||||||
|
def verify_integrity(local_path, remote_hash, config):
|
||||||
|
"""Verifies file integrity based on config settings."""
|
||||||
|
if not remote_hash or not config.get('ENABLE_HASH_VALIDATION', True):
|
||||||
|
return True
|
||||||
|
|
||||||
|
file_size = os.path.getsize(get_long_path(local_path))
|
||||||
|
threshold_mb = config.get('HASH_THRESHOLD_MB', 30)
|
||||||
|
threshold_bytes = threshold_mb * 1024 * 1024
|
||||||
|
|
||||||
|
if file_size > threshold_bytes:
|
||||||
|
logger.info(f"Skipping hash check (size > {threshold_mb}MB): {os.path.basename(local_path)}")
|
||||||
|
return True
|
||||||
|
|
||||||
|
local_hash = quickxorhash(local_path)
|
||||||
|
if local_hash != remote_hash:
|
||||||
|
logger.warning(f"Hash mismatch for {local_path}: local={local_hash}, remote={remote_hash}")
|
||||||
|
return False
|
||||||
|
return True
|
||||||
|
|
||||||
|
def get_headers(app, force_refresh=False):
|
||||||
|
scopes = ["https://graph.microsoft.com/.default"]
|
||||||
|
# If force_refresh is True, we don't rely on the cache
|
||||||
|
result = None
|
||||||
|
if not force_refresh:
|
||||||
result = app.acquire_token_for_client(scopes=scopes)
|
result = app.acquire_token_for_client(scopes=scopes)
|
||||||
if \"access_token\" in result:
|
|
||||||
return {'Authorization': f'Bearer {result[\"access_token\"]}'}
|
if force_refresh or not result or "access_token" not in result:
|
||||||
else:
|
logger.info("Refreshing Access Token...")
|
||||||
raise Exception(f\"Could not acquire token: {result.get('error_description')}\")
|
result = app.acquire_token_for_client(scopes=scopes, force_refresh=True)
|
||||||
|
|
||||||
|
if "access_token" in result:
|
||||||
|
return {'Authorization': f'Bearer {result["access_token"]}'}
|
||||||
|
raise Exception(f"Auth failed: {result.get('error_description')}")
|
||||||
|
|
||||||
def get_site_id(app, site_url):
|
def get_site_id(app, site_url):
|
||||||
headers = get_headers(app)
|
|
||||||
parsed = urlparse(site_url)
|
parsed = urlparse(site_url)
|
||||||
hostname = parsed.netloc
|
url = f"https://graph.microsoft.com/v1.0/sites/{parsed.netloc}:{parsed.path}"
|
||||||
site_path = parsed.path
|
response = safe_graph_get(app, url)
|
||||||
url = f\"https://graph.microsoft.com/v1.0/sites/{hostname}:{site_path}\"
|
|
||||||
response = requests.get(url, headers=headers)
|
|
||||||
response.raise_for_status()
|
|
||||||
return response.json()['id']
|
return response.json()['id']
|
||||||
|
|
||||||
def get_drive_id(app, site_id, drive_name):
|
def get_drive_id(app, site_id, drive_name):
|
||||||
headers = get_headers(app)
|
url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drives"
|
||||||
url = f\"https://graph.microsoft.com/v1.0/sites/{site_id}/drives\"
|
response = safe_graph_get(app, url)
|
||||||
response = requests.get(url, headers=headers)
|
|
||||||
response.raise_for_status()
|
|
||||||
drives = response.json().get('value', [])
|
drives = response.json().get('value', [])
|
||||||
|
|
||||||
|
# Prøv præcis match
|
||||||
for drive in drives:
|
for drive in drives:
|
||||||
if drive['name'] == drive_name:
|
if drive['name'] == drive_name:
|
||||||
return drive['id']
|
return drive['id']
|
||||||
raise Exception(f\"Drive '{drive_name}' not found in site.\")
|
|
||||||
|
|
||||||
def download_file(download_url, local_path, expected_size):
|
# Prøv fallback til "Documents" hvis "Delte dokumenter" fejler (SharePoint standard)
|
||||||
|
if drive_name == "Delte dokumenter":
|
||||||
|
for drive in drives:
|
||||||
|
if drive['name'] == "Documents":
|
||||||
|
logger.info("Found 'Documents' as fallback for 'Delte dokumenter'")
|
||||||
|
return drive['id']
|
||||||
|
|
||||||
|
# Log tilgængelige navne for at hjælpe brugeren
|
||||||
|
available_names = [d['name'] for d in drives]
|
||||||
|
logger.error(f"Drive '{drive_name}' not found. Available drives on this site: {available_names}")
|
||||||
|
raise Exception(f"Drive {drive_name} not found. Check the log for available drive names.")
|
||||||
|
|
||||||
|
# --- Punkt 2: Resume / Chunked Download logic ---
|
||||||
|
def get_fresh_download_url(app, drive_id, item_id):
|
||||||
|
"""Fetches a fresh download URL for a specific item ID with retries and robust error handling."""
|
||||||
|
url = f"https://graph.microsoft.com/v1.0/drives/{drive_id}/items/{item_id}"
|
||||||
|
|
||||||
|
for attempt in range(3):
|
||||||
try:
|
try:
|
||||||
|
headers = get_headers(app)
|
||||||
|
response = requests.get(url, headers=headers, timeout=60)
|
||||||
|
|
||||||
|
if response.status_code == 429:
|
||||||
|
retry_after = int(response.headers.get("Retry-After", 2 ** attempt))
|
||||||
|
logger.warning(f"Throttled (429) in get_fresh_download_url. Waiting {retry_after}s...")
|
||||||
|
time.sleep(retry_after)
|
||||||
|
continue
|
||||||
|
|
||||||
|
if response.status_code == 401:
|
||||||
|
logger.info(f"Access Token expired during refresh (Attempt {attempt+1}). Forcing refresh...")
|
||||||
|
headers = get_headers(app, force_refresh=True)
|
||||||
|
response = requests.get(url, headers=headers, timeout=60)
|
||||||
|
|
||||||
|
response.raise_for_status()
|
||||||
|
data = response.json()
|
||||||
|
download_url = data.get('@microsoft.graph.downloadUrl')
|
||||||
|
|
||||||
|
if download_url:
|
||||||
|
return download_url, None
|
||||||
|
|
||||||
|
# If item exists but URL is missing, it might be a transient SharePoint issue
|
||||||
|
logger.warning(f"Attempt {attempt+1}: '@microsoft.graph.downloadUrl' missing for {item_id}. Retrying in {2 ** attempt}s...")
|
||||||
|
time.sleep(2 ** attempt)
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
if attempt == 2:
|
||||||
|
return None, str(e)
|
||||||
|
logger.warning(f"Attempt {attempt+1} failed: {e}. Retrying in {2 ** attempt}s...")
|
||||||
|
time.sleep(2 ** attempt)
|
||||||
|
|
||||||
|
return None, "Item returned but '@microsoft.graph.downloadUrl' was missing after 3 attempts."
|
||||||
|
|
||||||
|
def download_single_file(app, drive_id, item_id, local_path, expected_size, display_name, config, stop_event=None, remote_hash=None, initial_url=None, remote_mtime_str=None):
|
||||||
|
try:
|
||||||
|
if stop_event and stop_event.is_set():
|
||||||
|
raise InterruptedError("Sync cancelled")
|
||||||
|
|
||||||
|
file_mode = 'wb'
|
||||||
|
resume_header = {}
|
||||||
|
existing_size = 0
|
||||||
|
download_url = initial_url
|
||||||
|
|
||||||
long_local_path = get_long_path(local_path)
|
long_local_path = get_long_path(local_path)
|
||||||
|
|
||||||
# Check if file exists and size matches
|
|
||||||
if os.path.exists(long_local_path):
|
if os.path.exists(long_local_path):
|
||||||
if os.path.isdir(long_local_path):
|
existing_size = os.path.getsize(long_local_path)
|
||||||
return False, f\"Path exists as a directory: {local_path}\"
|
local_mtime = os.path.getmtime(long_local_path)
|
||||||
local_size = os.path.getsize(long_local_path)
|
|
||||||
if int(local_size) == int(expected_size):
|
|
||||||
return True, \"Skipped\"
|
|
||||||
|
|
||||||
target_dir = os.path.dirname(long_local_path)
|
# Konvertér SharePoint ISO8601 UTC tid (f.eks. 2024-03-29T12:00:00Z) til unix timestamp
|
||||||
if not os.path.exists(target_dir):
|
remote_mtime = datetime.fromisoformat(remote_mtime_str.replace('Z', '+00:00')).timestamp()
|
||||||
os.makedirs(target_dir, exist_ok=True)
|
|
||||||
elif not os.path.isdir(target_dir):
|
|
||||||
return False, f\"Parent path exists but is not a directory: {target_dir}\"
|
|
||||||
|
|
||||||
# Stream download with timeout and 1MB chunks
|
# Hvis filen findes, har rigtig størrelse OG lokal er ikke ældre end remote -> SKIP
|
||||||
response = requests.get(download_url, stream=True, timeout=60)
|
if existing_size == expected_size:
|
||||||
response.raise_for_status()
|
if local_mtime >= (remote_mtime - 1): # Vi tillader 1 sekuds difference pga. filsystem-præcision
|
||||||
with open(long_local_path, 'wb') as f:
|
logger.info(f"Skipped (up-to-date): {display_name}")
|
||||||
for chunk in response.iter_content(chunk_size=1024*1024):
|
return True, None
|
||||||
|
else:
|
||||||
|
logger.info(f"Update available: {display_name} (Remote is newer)")
|
||||||
|
existing_size = 0
|
||||||
|
elif existing_size < expected_size:
|
||||||
|
# Ved resume tjekker vi også om kilden er ændret siden vi startede
|
||||||
|
if local_mtime < (remote_mtime - 1):
|
||||||
|
logger.warning(f"Remote file changed during partial download: {display_name}. Restarting.")
|
||||||
|
existing_size = 0
|
||||||
|
else:
|
||||||
|
logger.info(f"Resuming: {display_name} from {format_size(existing_size)}")
|
||||||
|
resume_header = {'Range': f'bytes={existing_size}-'}
|
||||||
|
file_mode = 'ab'
|
||||||
|
else:
|
||||||
|
logger.warning(f"Local file larger than remote: {display_name}. Overwriting.")
|
||||||
|
existing_size = 0
|
||||||
|
|
||||||
|
logger.info(f"Starting: {display_name} ({format_size(expected_size)})")
|
||||||
|
os.makedirs(os.path.dirname(long_local_path), exist_ok=True)
|
||||||
|
|
||||||
|
# Initial download attempt
|
||||||
|
if not download_url:
|
||||||
|
download_url, err = get_fresh_download_url(app, drive_id, item_id)
|
||||||
|
if not download_url:
|
||||||
|
return False, f"Could not fetch initial URL: {err}"
|
||||||
|
|
||||||
|
try:
|
||||||
|
response = safe_get(download_url, resume_header, stream=True, timeout=120)
|
||||||
|
except requests.exceptions.HTTPError as e:
|
||||||
|
if e.response is not None and e.response.status_code == 401:
|
||||||
|
# Handle 401 Unauthorized from SharePoint (expired download link)
|
||||||
|
logger.warning(f"URL expired for {display_name}. Fetching fresh URL...")
|
||||||
|
download_url, err = get_fresh_download_url(app, drive_id, item_id)
|
||||||
|
if not download_url:
|
||||||
|
return False, f"Failed to refresh download URL: {err}"
|
||||||
|
response = safe_get(download_url, resume_header, stream=True, timeout=120)
|
||||||
|
else:
|
||||||
|
raise
|
||||||
|
|
||||||
|
with open(long_local_path, file_mode) as f:
|
||||||
|
for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
|
||||||
|
if stop_event and stop_event.is_set():
|
||||||
|
raise InterruptedError("Sync cancelled")
|
||||||
if chunk:
|
if chunk:
|
||||||
f.write(chunk)
|
f.write(chunk)
|
||||||
|
|
||||||
local_size = os.path.getsize(long_local_path)
|
# Post-download check
|
||||||
if int(local_size) == int(expected_size):
|
final_size = os.path.getsize(long_local_path)
|
||||||
return True, \"Downloaded\"
|
if final_size == expected_size:
|
||||||
|
if verify_integrity(local_path, remote_hash, config):
|
||||||
|
logger.info(f"DONE: {display_name}")
|
||||||
|
return True, None
|
||||||
else:
|
else:
|
||||||
return False, f\"Size mismatch: Remote={expected_size}, Local={local_size}\"
|
return False, "Integrity check failed (Hash mismatch)"
|
||||||
except Exception as e:
|
else:
|
||||||
return False, f\"[Error {getattr(e, 'errno', 'unknown')}] {str(e)}\"
|
return False, f"Size mismatch: Remote={expected_size}, Local={final_size}"
|
||||||
|
|
||||||
def download_folder_recursive(app, drive_id, item_path, local_root_path, report):
|
except InterruptedError:
|
||||||
|
raise
|
||||||
|
except Exception as e:
|
||||||
|
return False, str(e)
|
||||||
|
|
||||||
|
# --- Main Traversal Logic ---
|
||||||
|
def process_item_list(app, drive_id, item_path, local_root_path, report, executor, futures, config, stop_event=None, depth=0):
|
||||||
|
if depth >= MAX_FOLDER_DEPTH:
|
||||||
|
logger.warning(f"Max folder depth ({MAX_FOLDER_DEPTH}) reached at: {item_path}. Skipping subtree.")
|
||||||
|
return
|
||||||
try:
|
try:
|
||||||
headers = get_headers(app)
|
if stop_event and stop_event.is_set():
|
||||||
|
raise InterruptedError("Sync cancelled")
|
||||||
|
|
||||||
encoded_path = quote(item_path)
|
encoded_path = quote(item_path)
|
||||||
url = f\"https://graph.microsoft.com/v1.0/drives/{drive_id}/root:/{encoded_path}:/children\"
|
|
||||||
|
|
||||||
if not item_path:
|
if not item_path:
|
||||||
url = f\"https://graph.microsoft.com/v1.0/drives/{drive_id}/root/children\"
|
url = f"https://graph.microsoft.com/v1.0/drives/{drive_id}/root/children"
|
||||||
|
else:
|
||||||
|
url = f"https://graph.microsoft.com/v1.0/drives/{drive_id}/root:/{encoded_path}:/children"
|
||||||
|
|
||||||
response = requests.get(url, headers=headers)
|
while url:
|
||||||
response.raise_for_status()
|
response = safe_graph_get(app, url)
|
||||||
items = response.json().get('value', [])
|
data = response.json()
|
||||||
|
items = data.get('value', [])
|
||||||
|
|
||||||
for item in items:
|
for item in items:
|
||||||
|
if stop_event and stop_event.is_set():
|
||||||
|
raise InterruptedError("Sync cancelled")
|
||||||
|
|
||||||
item_name = item['name']
|
item_name = item['name']
|
||||||
sanitized_name = sanitize_filename(item_name)
|
local_path = os.path.join(local_root_path, item_name)
|
||||||
local_path = os.path.normpath(os.path.join(local_root_path, sanitized_name))
|
display_path = f"{item_path}/{item_name}".strip('/')
|
||||||
|
|
||||||
if 'folder' in item:
|
if 'folder' in item:
|
||||||
sub_item_path = f\"{item_path}/{item_name}\".strip('/')
|
process_item_list(app, drive_id, display_path, local_path, report, executor, futures, config, stop_event, depth + 1)
|
||||||
download_folder_recursive(app, drive_id, sub_item_path, local_path, report)
|
|
||||||
elif 'file' in item:
|
elif 'file' in item:
|
||||||
|
item_id = item['id']
|
||||||
download_url = item.get('@microsoft.graph.downloadUrl')
|
download_url = item.get('@microsoft.graph.downloadUrl')
|
||||||
if not download_url:
|
remote_hash = item.get('file', {}).get('hashes', {}).get('quickXorHash')
|
||||||
report.append({\"Path\": f\"{item_path}/{item_name}\", \"Error\": \"No URL\", \"Timestamp\": datetime.now().isoformat()})
|
remote_mtime = item.get('lastModifiedDateTime')
|
||||||
continue
|
|
||||||
|
|
||||||
success, status = download_file(download_url, local_path, item['size'])
|
future = executor.submit(
|
||||||
if success:
|
download_single_file,
|
||||||
if status != \"Skipped\":
|
app, drive_id, item_id,
|
||||||
print(f\"Downloaded: {item_path}/{item_name}\")
|
local_path, item['size'], display_path,
|
||||||
else:
|
config, stop_event, remote_hash, download_url, remote_mtime
|
||||||
print(f\"FAILED: {item_path}/{item_name} - {status}\")
|
)
|
||||||
report.append({\"Path\": f\"{item_path}/{item_name}\", \"Error\": status, \"Timestamp\": datetime.now().isoformat()})
|
futures[future] = display_path
|
||||||
|
|
||||||
|
url = data.get('@odata.nextLink')
|
||||||
|
|
||||||
|
except InterruptedError:
|
||||||
|
raise
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
err_msg = f\"Folder error: {str(e)}\"
|
logger.error(f"Error traversing {item_path}: {e}")
|
||||||
print(f\"FAILED FOLDER: {item_path} - {err_msg}\")
|
with report_lock:
|
||||||
report.append({\"Path\": item_path, \"Error\": err_msg, \"Timestamp\": datetime.now().isoformat()})
|
report.append({"Path": item_path, "Error": str(e), "Timestamp": datetime.now().isoformat()})
|
||||||
|
|
||||||
def main():
|
def create_msal_app(tenant_id, client_id, client_secret):
|
||||||
config = load_config('connection_info.txt')
|
return ConfidentialClientApplication(
|
||||||
tenant_id = config.get('TENANT_ID')
|
client_id, authority=f"https://login.microsoftonline.com/{tenant_id}", client_credential=client_secret
|
||||||
client_id = config.get('CLIENT_ID')
|
)
|
||||||
client_secret = config.get('CLIENT_SECRET')
|
|
||||||
site_url = config.get('SITE_URL')
|
|
||||||
drive_name = config.get('DOCUMENT_LIBRARY')
|
|
||||||
folders_to_download_str = config.get('FOLDERS_TO_DOWNLOAD', '')
|
|
||||||
local_path_base = config.get('LOCAL_PATH', '').replace('\\\\', os.sep)
|
|
||||||
|
|
||||||
folders_to_download = [f.strip() for f in folders_to_download_str.split(',') if f.strip()]
|
|
||||||
if not folders_to_download:
|
|
||||||
folders_to_download = [\"\"]
|
|
||||||
|
|
||||||
print(f\"Connecting via Graph API...\")
|
|
||||||
report = []
|
|
||||||
|
|
||||||
|
def main(config=None, stop_event=None):
|
||||||
try:
|
try:
|
||||||
|
if config is None:
|
||||||
|
config = load_config('connection_info.txt')
|
||||||
|
|
||||||
|
tenant_id = config.get('TENANT_ID', '')
|
||||||
|
client_id = config.get('CLIENT_ID', '')
|
||||||
|
client_secret = config.get('CLIENT_SECRET', '')
|
||||||
|
site_url = config.get('SITE_URL', '')
|
||||||
|
drive_name = config.get('DOCUMENT_LIBRARY', '')
|
||||||
|
folders_str = config.get('FOLDERS_TO_DOWNLOAD', '')
|
||||||
|
local_base = config.get('LOCAL_PATH', '').replace('\\', os.sep)
|
||||||
|
|
||||||
|
folders = [f.strip() for f in folders_str.split(',') if f.strip()] or [""]
|
||||||
|
|
||||||
|
logger.info("Initializing SharePoint Production Sync Tool...")
|
||||||
app = create_msal_app(tenant_id, client_id, client_secret)
|
app = create_msal_app(tenant_id, client_id, client_secret)
|
||||||
site_id = get_site_id(app, site_url)
|
site_id = get_site_id(app, site_url)
|
||||||
drive_id = get_drive_id(app, site_id, drive_name)
|
drive_id = get_drive_id(app, site_id, drive_name)
|
||||||
|
|
||||||
for folder in folders_to_download:
|
report = []
|
||||||
# Clean folder paths from config
|
with ThreadPoolExecutor(max_workers=MAX_WORKERS, thread_name_prefix="DL") as executor:
|
||||||
folder_parts = [sanitize_filename(p) for p in folder.split('/') if p]
|
futures = {}
|
||||||
local_folder_path = os.path.normpath(os.path.join(local_path_base, *folder_parts))
|
for folder in folders:
|
||||||
|
if stop_event and stop_event.is_set():
|
||||||
|
break
|
||||||
|
logger.info(f"Scanning: {folder or 'Root'}")
|
||||||
|
process_item_list(app, drive_id, folder, os.path.join(local_base, folder), report, executor, futures, config, stop_event)
|
||||||
|
|
||||||
print(f\"\\nProcessing: {folder if folder else 'Root'}\")
|
logger.info(f"Scan complete. Processing {len(futures)} tasks...")
|
||||||
download_folder_recursive(app, drive_id, folder, local_folder_path, report)
|
for future in as_completed(futures):
|
||||||
|
if stop_event and stop_event.is_set():
|
||||||
|
break
|
||||||
|
path = futures[future]
|
||||||
|
try:
|
||||||
|
success, error = future.result()
|
||||||
|
if not success:
|
||||||
|
logger.error(f"FAILED: {path} | {error}")
|
||||||
|
with report_lock:
|
||||||
|
report.append({"Path": path, "Error": error, "Timestamp": datetime.now().isoformat()})
|
||||||
|
except InterruptedError:
|
||||||
|
continue # The executor will shut down anyway
|
||||||
|
|
||||||
except Exception as e:
|
if stop_event and stop_event.is_set():
|
||||||
print(f\"Critical error: {e}\")
|
logger.warning("Synchronization was stopped by user.")
|
||||||
report.append({\"Path\": \"GENERAL\", \"Error\": str(e), \"Timestamp\": datetime.now().isoformat()})
|
return
|
||||||
|
|
||||||
report_file = f\"download_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv\"
|
report_file = f"download_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
|
||||||
with open(report_file, 'w', newline='', encoding='utf-8') as f:
|
with open(report_file, 'w', newline='', encoding='utf-8') as f:
|
||||||
writer = csv.DictWriter(f, fieldnames=[\"Path\", \"Error\", \"Timestamp\"])
|
writer = csv.DictWriter(f, fieldnames=["Path", "Error", "Timestamp"])
|
||||||
writer.writeheader()
|
writer.writeheader()
|
||||||
writer.writerows(report)
|
writer.writerows(report)
|
||||||
|
|
||||||
print(f\"\\nProcess complete. Errors: {len(report)}\")
|
logger.info(f"Sync complete. Errors: {len(report)}. Report: {report_file}")
|
||||||
|
|
||||||
if __name__ == \"__main__\":
|
except InterruptedError:
|
||||||
|
logger.warning("Synchronization was stopped by user.")
|
||||||
|
except Exception as e:
|
||||||
|
logger.critical(f"FATAL ERROR: {e}")
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
main()
|
main()
|
||||||
|
|||||||
@@ -1,2 +1,3 @@
|
|||||||
requests
|
requests
|
||||||
msal
|
msal
|
||||||
|
customtkinter
|
||||||
|
|||||||
159
sharepoint_gui.py
Normal file
159
sharepoint_gui.py
Normal file
@@ -0,0 +1,159 @@
|
|||||||
|
import os
|
||||||
|
import threading
|
||||||
|
import logging
|
||||||
|
import customtkinter as ctk
|
||||||
|
from tkinter import filedialog, messagebox
|
||||||
|
import download_sharepoint # Din eksisterende kerne-logik
|
||||||
|
import requests
|
||||||
|
|
||||||
|
# --- Global Stop Flag ---
|
||||||
|
stop_event = threading.Event()
|
||||||
|
|
||||||
|
# --- Logging Handler for GUI ---
|
||||||
|
class TextboxHandler(logging.Handler):
|
||||||
|
def __init__(self, textbox):
|
||||||
|
super().__init__()
|
||||||
|
self.textbox = textbox
|
||||||
|
|
||||||
|
def emit(self, record):
|
||||||
|
msg = self.format(record)
|
||||||
|
self.textbox.after(0, self.append_msg, msg)
|
||||||
|
|
||||||
|
def append_msg(self, msg):
|
||||||
|
self.textbox.configure(state="normal")
|
||||||
|
self.textbox.insert("end", msg + "\n")
|
||||||
|
self.textbox.see("end")
|
||||||
|
self.textbox.configure(state="disabled")
|
||||||
|
|
||||||
|
# --- Main App ---
|
||||||
|
class SharepointApp(ctk.CTk):
|
||||||
|
def __init__(self):
|
||||||
|
super().__init__()
|
||||||
|
|
||||||
|
self.title("SharePoint Download Tool - UX")
|
||||||
|
self.geometry("1000x850") # Gjort lidt bredere og højere for at give plads
|
||||||
|
ctk.set_appearance_mode("dark")
|
||||||
|
ctk.set_default_color_theme("blue")
|
||||||
|
|
||||||
|
self.grid_columnconfigure(1, weight=1)
|
||||||
|
self.grid_rowconfigure(0, weight=1)
|
||||||
|
|
||||||
|
# Sidebar
|
||||||
|
self.sidebar_frame = ctk.CTkFrame(self, width=350, corner_radius=0)
|
||||||
|
self.sidebar_frame.grid(row=0, column=0, sticky="nsew")
|
||||||
|
self.sidebar_frame.grid_rowconfigure(25, weight=1)
|
||||||
|
|
||||||
|
self.logo_label = ctk.CTkLabel(self.sidebar_frame, text="Indstillinger", font=ctk.CTkFont(size=20, weight="bold"))
|
||||||
|
self.logo_label.grid(row=0, column=0, padx=20, pady=(20, 10))
|
||||||
|
|
||||||
|
self.entries = {}
|
||||||
|
fields = [
|
||||||
|
("TENANT_ID", "Tenant ID"),
|
||||||
|
("CLIENT_ID", "Client ID"),
|
||||||
|
("CLIENT_SECRET", "Client Secret"),
|
||||||
|
("SITE_URL", "Site URL"),
|
||||||
|
("DOCUMENT_LIBRARY", "Library Navn"),
|
||||||
|
("FOLDERS_TO_DOWNLOAD", "Mapper (komma-sep)"),
|
||||||
|
("LOCAL_PATH", "Lokal Sti"),
|
||||||
|
("ENABLE_HASH_VALIDATION", "Valider Hash (True/False)"),
|
||||||
|
("HASH_THRESHOLD_MB", "Hash Grænse (MB)")
|
||||||
|
]
|
||||||
|
|
||||||
|
for i, (key, label) in enumerate(fields):
|
||||||
|
lbl = ctk.CTkLabel(self.sidebar_frame, text=label)
|
||||||
|
lbl.grid(row=i*2+1, column=0, padx=20, pady=(5, 0), sticky="w")
|
||||||
|
entry = ctk.CTkEntry(self.sidebar_frame, width=280)
|
||||||
|
if key == "CLIENT_SECRET": entry.configure(show="*")
|
||||||
|
entry.grid(row=i*2+2, column=0, padx=20, pady=(0, 5))
|
||||||
|
self.entries[key] = entry
|
||||||
|
|
||||||
|
self.browse_button = ctk.CTkButton(self.sidebar_frame, text="Vælg Mappe", command=self.browse_folder, height=32)
|
||||||
|
self.browse_button.grid(row=20, column=0, padx=20, pady=10)
|
||||||
|
|
||||||
|
self.save_button = ctk.CTkButton(self.sidebar_frame, text="Gem Indstillinger", command=self.save_settings, fg_color="transparent", border_width=2)
|
||||||
|
self.save_button.grid(row=21, column=0, padx=20, pady=10)
|
||||||
|
|
||||||
|
# Main side
|
||||||
|
self.main_frame = ctk.CTkFrame(self, corner_radius=0, fg_color="transparent")
|
||||||
|
self.main_frame.grid(row=0, column=1, sticky="nsew", padx=20, pady=20)
|
||||||
|
self.main_frame.grid_rowconfigure(1, weight=1)
|
||||||
|
self.main_frame.grid_columnconfigure(0, weight=1)
|
||||||
|
|
||||||
|
self.status_label = ctk.CTkLabel(self.main_frame, text="Status: Klar", font=ctk.CTkFont(size=16))
|
||||||
|
self.status_label.grid(row=0, column=0, pady=(0, 10), sticky="w")
|
||||||
|
|
||||||
|
self.log_textbox = ctk.CTkTextbox(self.main_frame, state="disabled")
|
||||||
|
self.log_textbox.grid(row=1, column=0, sticky="nsew")
|
||||||
|
|
||||||
|
# Buttons frame
|
||||||
|
self.btn_frame = ctk.CTkFrame(self.main_frame, fg_color="transparent")
|
||||||
|
self.btn_frame.grid(row=2, column=0, pady=(20, 0), sticky="ew")
|
||||||
|
self.btn_frame.grid_columnconfigure(0, weight=1)
|
||||||
|
|
||||||
|
self.start_button = ctk.CTkButton(self.btn_frame, text="Start Synkronisering", command=self.start_sync_thread, height=50, font=ctk.CTkFont(size=16, weight="bold"))
|
||||||
|
self.start_button.grid(row=0, column=0, padx=(0, 10), sticky="ew")
|
||||||
|
|
||||||
|
self.stop_button = ctk.CTkButton(self.btn_frame, text="Stop", command=self.stop_sync, height=50, fg_color="#d32f2f", hover_color="#b71c1c", state="disabled")
|
||||||
|
self.stop_button.grid(row=0, column=1, sticky="ew")
|
||||||
|
|
||||||
|
self.load_settings()
|
||||||
|
self.setup_logging()
|
||||||
|
|
||||||
|
def setup_logging(self):
|
||||||
|
handler = TextboxHandler(self.log_textbox)
|
||||||
|
handler.setFormatter(logging.Formatter('%(asctime)s: %(message)s', datefmt='%H:%M:%S'))
|
||||||
|
download_sharepoint.logger.addHandler(handler)
|
||||||
|
|
||||||
|
def browse_folder(self):
|
||||||
|
path = filedialog.askdirectory()
|
||||||
|
if path:
|
||||||
|
self.entries["LOCAL_PATH"].delete(0, "end")
|
||||||
|
self.entries["LOCAL_PATH"].insert(0, path)
|
||||||
|
|
||||||
|
def load_settings(self):
|
||||||
|
if os.path.exists("connection_info.txt"):
|
||||||
|
config = download_sharepoint.load_config("connection_info.txt")
|
||||||
|
for key, entry in self.entries.items():
|
||||||
|
val = config.get(key, "")
|
||||||
|
entry.insert(0, val)
|
||||||
|
|
||||||
|
def save_settings(self):
|
||||||
|
config_lines = [f'{k} = "{v.get()}"' for k, v in self.entries.items()]
|
||||||
|
with open("connection_info.txt", "w", encoding="utf-8") as f:
|
||||||
|
f.write("\n".join(config_lines))
|
||||||
|
|
||||||
|
def stop_sync(self):
|
||||||
|
stop_event.set()
|
||||||
|
self.stop_button.configure(state="disabled", text="Stopper...")
|
||||||
|
download_sharepoint.logger.warning("Stop-signal sendt. Venter på at tråde afbryder...")
|
||||||
|
|
||||||
|
def start_sync_thread(self):
|
||||||
|
self.save_settings()
|
||||||
|
stop_event.clear()
|
||||||
|
self.start_button.configure(state="disabled")
|
||||||
|
self.stop_button.configure(state="normal", text="Stop")
|
||||||
|
self.status_label.configure(text="Status: Synkroniserer...", text_color="orange")
|
||||||
|
|
||||||
|
thread = threading.Thread(target=self.run_sync, daemon=True)
|
||||||
|
thread.start()
|
||||||
|
|
||||||
|
def run_sync(self):
|
||||||
|
try:
|
||||||
|
config = download_sharepoint.load_config("connection_info.txt")
|
||||||
|
download_sharepoint.main(config=config, stop_event=stop_event)
|
||||||
|
if stop_event.is_set():
|
||||||
|
self.status_label.configure(text="Status: Afbrudt", text_color="red")
|
||||||
|
else:
|
||||||
|
self.status_label.configure(text="Status: Gennemført!", text_color="green")
|
||||||
|
except InterruptedError:
|
||||||
|
self.status_label.configure(text="Status: Afbrudt", text_color="red")
|
||||||
|
except Exception as e:
|
||||||
|
self.status_label.configure(text="Status: Fejl!", text_color="red")
|
||||||
|
messagebox.showerror("Fejl", str(e))
|
||||||
|
finally:
|
||||||
|
self.start_button.configure(state="normal")
|
||||||
|
self.stop_button.configure(state="disabled", text="Stop")
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
app = SharepointApp()
|
||||||
|
app.mainloop()
|
||||||
Reference in New Issue
Block a user