Compare commits

...

32 Commits

Author SHA1 Message Date
Martin Tranberg
d15b9afc03 Update README.md with new features and optimizations (Danish) 2026-04-12 12:46:15 +02:00
Martin Tranberg
8e8bb3baa1 Improve cancellation logic and sync performance.
- Implement explicit threading.Event propagation for robust GUI cancellation.
- Optimize file synchronization by skipping hash validation for up-to-date files (matching size and timestamp).
- Update Windows long path support to correctly handle UNC network shares.
- Refactor configuration management to eliminate global state and improve modularity.
- Remove requests.get monkey-patch in GUI.
- Delete CLAUDE.md as it is no longer required.
2026-04-12 12:44:43 +02:00
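The chunk-level cancellation this commit describes can be sketched as follows. This is a minimal illustration, not the repository's actual code; `download_stream` and its parameters are hypothetical names.

```python
import threading

CHUNK_SIZE = 1024 * 1024  # 1 MB, matching the tool's chunked streaming

def download_stream(response, local_path, stop_event):
    """Write a streamed HTTP response to disk chunk by chunk, checking the
    stop flag between chunks so a GUI cancel takes effect mid-stream."""
    with open(local_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
            if stop_event.is_set():
                return False  # cancelled mid-download; caller may remove the partial file
            if chunk:
                f.write(chunk)
    return True
```

Because the flag is checked per chunk rather than per file, even a multi-gigabyte download stops within roughly one chunk's worth of I/O.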
Martin Tranberg
8899afabbc Improve token handling and session refresh logic. Adds a safe_graph_get helper and optimizes 401 response handling to eliminate 'Request failed' errors during long syncs. 2026-03-30 09:18:40 +02:00
Martin Tranberg
9e40abcfd8 Robust type conversion of configuration values
- Implements correct boolean parsing for ENABLE_HASH_VALIDATION
- Adds error handling (try/except) when parsing HASH_THRESHOLD_MB
- Ensures 100% consistency between GUI input and backend logic
2026-03-29 19:58:45 +02:00
Martin Tranberg
03a766be63 Update template with new hash variables
- Adds ENABLE_HASH_VALIDATION and HASH_THRESHOLD_MB to connection_info.template.txt
2026-03-29 19:56:07 +02:00
Martin Tranberg
1a97ca3d53 Cleanup and variable synchronization
- Removes duplicate code in download_single_file
- Ensures correct type casting of config variables (bool/int)
- Verifies that all GUI parameters are read correctly in main()
2026-03-29 19:55:08 +02:00
Martin Tranberg
8e837240b5 Project completion: mark tool as production-ready (enterprise-grade)
- Adds an official status assessment to README.md
- Confirms support for long paths, timestamp sync, and correct QuickXorHash validation
2026-03-29 19:48:56 +02:00
Martin Tranberg
f5e54b185e Make 'quickxorhash' optional to avoid installation failures on Windows
- Removes quickxorhash from requirements.txt to avoid the C++ Build Tools error
- Adds a note in README.md that the library is optional (a Python fallback exists)
- Ensures 'pip install -r requirements.txt' works without errors for all users
2026-03-29 19:40:12 +02:00
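The optional-dependency pattern this commit relies on is the standard try/except import guard. A small sketch; `hash_backend_name` is an illustrative helper, not part of the script:

```python
# Prefer the C-accelerated library, but degrade gracefully when it is absent.
try:
    import quickxorhash as qxh_lib  # optional: building it needs C++ Build Tools on Windows
except ImportError:
    qxh_lib = None  # the pure-Python fallback will be used instead

def hash_backend_name():
    """Report which hashing backend would be used."""
    return "quickxorhash (C extension)" if qxh_lib else "pure-Python fallback"
```

With this guard in place, `pip install -r requirements.txt` succeeds everywhere, and installing `quickxorhash` later transparently upgrades the backend.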
Martin Tranberg
c5d4ddaab0 Enterprise-grade optimizations: Windows long paths, high-performance hashing, and documentation
- Adds 'get_long_path' to support Windows paths longer than 260 characters
- Implements dual-mode hashing: uses the 'quickxorhash' C library when available, otherwise a manual Python fallback
- Updates requirements.txt with quickxorhash
- Updates README.md and GEMINI.md with the latest features and technical specifications
2026-03-29 19:33:31 +02:00
Martin Tranberg
367d31671d Update documentation with timestamp sync and hash optimizations
- Updates README.md with descriptions of timestamp sync, the hash toggle, and the 30 MB limit
- Updates GEMINI.md with technical specifications for QuickXorHash and the library fallback
- Adds instructions for the new configuration options in the GUI
2026-03-29 19:25:28 +02:00
Martin Tranberg
acede4a867 Sync the GUI with the new hash settings and timestamp logic
- Updates sharepoint_gui.py with fields for ENABLE_HASH_VALIDATION and HASH_THRESHOLD_MB
- Enables download_sharepoint.py to read these settings from the configuration file
- Adjusts the GUI layout (larger window) to make room for the new controls
- The GUI now automatically uses the new timestamp-based synchronization
2026-03-29 19:23:42 +02:00
Martin Tranberg
ba968ab70e Sync only when the SharePoint file is newer than the local copy
- Implements comparison of lastModifiedDateTime from SharePoint with the local mtime
- Converts ISO 8601 UTC timestamps to Unix timestamps for precise comparison
- Adds a 1-second tolerance to handle filesystem time precision
- Ensures data is only downloaded when the source has been updated or the local file is corrupt
2026-03-29 19:19:56 +02:00
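The comparison described in this commit can be sketched like this. The helper name `needs_download` is assumed for illustration; the actual script wires this check into its skip logic.

```python
import os
from datetime import datetime

TOLERANCE_SECONDS = 1  # filesystem mtime precision, per the commit

def needs_download(remote_iso, local_path):
    """Return True when the SharePoint copy is newer than the local file.
    remote_iso is a lastModifiedDateTime string such as '2026-03-29T17:19:56Z'."""
    if not os.path.exists(local_path):
        return True
    # Python < 3.11 cannot parse a trailing 'Z', hence the replace()
    remote_ts = datetime.fromisoformat(remote_iso.replace("Z", "+00:00")).timestamp()
    local_ts = os.path.getmtime(local_path)
    return remote_ts > local_ts + TOLERANCE_SECONDS
```

Converting both sides to Unix epoch seconds sidesteps timezone mismatches, and the 1-second slack absorbs filesystems that round mtime.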
Martin Tranberg
790ca91339 Make library lookup more robust and add a name fallback
- Adds automatic fallback to 'Documents' when 'Delte dokumenter' is not found
- Improves the error message by logging all available library names on the site
- This resolves problems with localized SharePoint names (Danish vs. English)
2026-03-29 17:59:34 +02:00
Martin Tranberg
ed508302a6 Add a global toggle and a configurable limit for hash validation
- ENABLE_HASH_VALIDATION (True/False) added at the top of the script
- HASH_THRESHOLD_MB added for easy adjustment of the size limit
- verify_integrity updated to respect both settings
2026-03-29 17:45:45 +02:00
Martin Tranberg
33fbdc244d Add a 30 MB limit for hash validation
- Skips the hash check for files over 30 MB to save time on large files (e.g. 65 GB)
- Files above the limit are compared by size only
- Adds logging when the hash check is skipped
2026-03-29 17:40:55 +02:00
Martin Tranberg
ad4166fb03 Fix QuickXorHash: XOR the length into the last 64 bits (bits 96-159)
- Corrects the finalization logic so the file size is XORed into the most significant 64 bits of the 160-bit state
- The previous version XORed into the least significant bits, which produced incorrect hashes
- This now matches Microsoft's specification exactly and eliminates false hash mismatches
2026-03-29 17:36:13 +02:00
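For reference, the corrected finalization amounts to the following, sketched under this commit's description of a 160-bit integer state `h` and a running byte count `length` (function name is illustrative):

```python
import base64

def qxh_finalize(h, length):
    """Finalize a QuickXorHash state: XOR the data length into the most
    significant 64 bits (bits 96-159) of the 160-bit state, then emit the
    20-byte little-endian digest, base64-encoded."""
    h ^= length << (160 - 64)  # the length lands in bits 96-159
    h &= (1 << 160) - 1        # keep the state at 160 bits
    return base64.b64encode(h.to_bytes(20, "little")).decode("ascii")
```

XORing at the wrong end of the state still yields a syntactically valid digest, which is why the earlier bug surfaced only as mismatches against server-side hashes.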
Martin Tranberg
39a3aff495 Fix the QuickXorHash implementation and add the missing length XOR
- Updates quickxorhash to use a 160-bit integer buffer for correct circular rotation
- Adds the mandatory XOR step with the file length, which was previously missing
- Ensures the correct 20-byte little-endian format when base64-encoding
- This resolves the constant hash mismatches on otherwise correct files
2026-03-29 14:52:13 +02:00
Martin Tranberg
634b5ff151 Add 429 handling, exponential backoff, and depth limiting
- get_fresh_download_url: adds a 429 check with Retry-After and replaces the
  fixed sleep(1) with exponential backoff (2^attempt seconds)
- process_item_list: adds a MAX_FOLDER_DEPTH=50 guard against RecursionError
  on abnormally deep SharePoint folder structures
- README and CLAUDE.md updated with descriptions of the new behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 15:16:12 +01:00
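The retry behaviour this commit describes follows a common pattern: honour the server's Retry-After hint when it exists, otherwise back off exponentially. A sketch with a hypothetical `backoff_get` wrapper (`sleep` is injectable so the logic can be tested without waiting):

```python
import time

MAX_RETRIES = 5  # illustrative retry budget

def backoff_get(do_request, sleep=time.sleep):
    """Call do_request() until it stops returning 429, waiting Retry-After
    seconds when the header is present and 2**attempt seconds otherwise."""
    for attempt in range(MAX_RETRIES):
        resp = do_request()
        if resp.status_code == 429:
            wait = int(resp.headers.get("Retry-After", 2 ** attempt))
            sleep(wait)
            continue
        return resp
    raise RuntimeError(f"Still throttled after {MAX_RETRIES} attempts")
```

Preferring Retry-After over the computed backoff matters because Graph's throttling window can be much longer than 2^attempt seconds early in the retry sequence.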
Martin Tranberg
3bb2b44477 Update README: QuickXorHash is now fully implemented
The description of Smart Skip & Integrity has been updated from "preparing for
hash validation" to reflect that QuickXorHash is now active: corrupt
files with the correct size are detected and re-downloaded automatically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 14:40:18 +01:00
Martin Tranberg
a8048ae74d Fix four bugs in download_sharepoint.py
- Implement QuickXorHash correctly with 3 × uint64 cells matching Microsoft's
  C# reference; the previous 8-bit implementation produced incorrect hashes
- verify_integrity now checks the hash of existing files during the skip check and
  re-downloads on mismatch instead of blindly accepting the file
- retry_request raises RetryError when retries are exhausted instead of
  returning None, which would crash callers with AttributeError
- format_size now handles files >= 1 PB (PB and EB added)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 14:39:27 +01:00
Martin Tranberg
7fab89cbbb Fix three bugs in download_sharepoint.py and add CLAUDE.md
- force_refresh is now passed correctly to MSAL so the token cache is bypassed on 401
- safe_get is used on download retries after a URL refresh to get exponential backoff
- The CSV DictWriter is reused instead of creating two separate instances

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 14:27:12 +01:00
Martin Tranberg
59eb9a4ab0 Add retries to the URL refresh when @microsoft.graph.downloadUrl is missing from the API response 2026-03-27 14:11:28 +01:00
Martin Tranberg
1c3180e037 Update GEMINI.md with technical documentation of 'Self-Healing Sessions' 2026-03-27 11:58:09 +01:00
Martin Tranberg
6bc4dd8f20 Update README with information about automatic renewal of access tokens 2026-03-27 11:09:15 +01:00
Martin Tranberg
18158d52b2 Handle access token expiry by automatically refreshing the token on 401 errors from the Graph API 2026-03-27 11:03:14 +01:00
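The pattern behind this commit is a single forced-refresh retry on 401. A sketch with hypothetical `get_token`/`http_get` callables standing in for MSAL and requests:

```python
def graph_get_with_refresh(get_token, http_get, url):
    """Issue a Graph request with a cached token; on 401 (expired access
    token), acquire a fresh token bypassing the cache and retry once."""
    resp = http_get(url, token=get_token(force_refresh=False))
    if resp.status_code == 401:
        resp = http_get(url, token=get_token(force_refresh=True))
    return resp
```

Retrying exactly once keeps a genuinely unauthorized app from looping: a second 401 with a fresh token is a real permissions problem, not an expired session.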
Martin Tranberg
931fd0dd05 Document auto-refresh of expired download links in the README 2026-03-27 09:21:36 +01:00
Martin Tranberg
483dc70ef8 Handle 401 errors by automatically refreshing download links 2026-03-27 09:15:57 +01:00
Martin Tranberg
5d5c8b2d5b Update README with GUI instructions and stop-button functionality 2026-03-26 16:06:27 +01:00
Martin Tranberg
b33009c54c Add a stop button to the GUI without changing the main script 2026-03-26 16:03:34 +01:00
Martin Tranberg
368f4c515c Add a modern GUI UX (sharepoint_gui.py) using CustomTkinter 2026-03-26 15:59:24 +01:00
Martin Tranberg
4c52b0c8db Update documentation (README and GEMINI.md) with Production Ready specifications 2026-03-26 15:44:30 +01:00
Martin Tranberg
1ed21e4184 Production Readiness: Exponential Backoff, Resume Download, Logging, and Integrity Verification 2026-03-26 15:43:02 +01:00
6 changed files with 592 additions and 186 deletions

View File

@@ -1,51 +1,46 @@
-# SharePoint Download Tool
+# SharePoint Download Tool - Technical Documentation
-A Python-based utility designed to recursively download folders and files from a specific SharePoint Online Site using the Microsoft Graph API.
+A production-ready Python utility for robust synchronization of SharePoint Online folders using Microsoft Graph API.
 ## Project Overview
-* **Purpose:** Automates the synchronization of specific SharePoint document library folders to a local directory.
+* **Purpose:** Enterprise-grade synchronization tool for local mirroring of SharePoint content.
 * **Technologies:**
-* **Python 3.x**
-* **Microsoft Graph API:** Used for robust data access.
-* **MSAL (Microsoft Authentication Library):** Handles Entra ID (Azure AD) authentication using Client Credentials flow.
-* **Requests:** Manages HTTP streaming for large file downloads.
-* **Architecture:**
-* `download_sharepoint.py`: The core script that orchestrates authentication, site/drive discovery, and recursive folder traversal.
-* `connection_info.txt`: Centralized configuration file for credentials and target paths.
-* `requirements.txt`: Defines necessary Python dependencies.
+* **Microsoft Graph API:** Advanced REST API for SharePoint data.
+* **MSAL:** Secure authentication using Azure AD Client Credentials.
+* **Requests:** High-performance HTTP client with streaming and Range header support.
+* **ThreadPoolExecutor:** Parallel file processing for optimized throughput.
+## Core Features (Production Ready)
+1. **Windows Long Path Support:** Automatically handles Windows path limitations by using `get_long_path` and `\\?\` absolute path prefixing.
+2. **High-Performance Integrity:** Uses the `quickxorhash` C-library if available for fast validation of large files. Includes a manual 160-bit circular XOR fallback implementation.
+3. **Timestamp Synchronization:** Compares SharePoint `lastModifiedDateTime` with local file `mtime`. Only downloads if the remote source is newer, significantly reducing sync time.
+4. **Optimized Integrity Validation:** Includes a configurable threshold (default 30MB) and a global toggle to balance security and performance for large assets.
+5. **Resumable Downloads:** Implements HTTP `Range` headers to resume partially downloaded files, critical for multi-gigabyte assets.
+6. **Reliability:** Includes a custom `retry_request` decorator for Exponential Backoff, handling throttling (429) and transient network errors.
+7. **Robust Library Discovery:** Automatic resolution of document library IDs with built-in fallbacks for localized names.
+8. **Self-Healing Sessions:** Automatically refreshes expiring Microsoft Graph Download URLs and MSAL Access Tokens mid-process.
+9. **Concurrency:** Multi-threaded architecture (5 workers) for simultaneous scanning and downloading.
+10. **Pagination:** Full support for OData pagination, ensuring complete folder traversal.
 ## Building and Running
-### Prerequisites
-* Python 3.x installed.
-* A registered application in Microsoft Entra ID with `Sites.Read.All` (or higher) application permissions.
 ### Setup
-1. **Install Dependencies:**
-```bash
-pip install -r requirements.txt
-```
-2. **Configure Connection:**
-Edit `connection_info.txt` with your specific details:
-* `TENANT_ID`, `CLIENT_ID`, `CLIENT_SECRET`
-* `SITE_URL`: Full URL to the SharePoint site.
-* `DOCUMENT_LIBRARY`: The name of the target library (e.g., "Documents").
-* `FOLDERS_TO_DOWNLOAD`: Comma-separated list of folder names to sync.
-* `LOCAL_PATH`: The destination path on your local machine.
+1. **Dependencies:** `pip install -r requirements.txt` (Installing `quickxorhash` via C-compiler is recommended for best performance).
+2. **Configuration:** Settings are managed via `connection_info.txt` or the GUI.
+* `ENABLE_HASH_VALIDATION`: (True/False)
+* `HASH_THRESHOLD_MB`: (Size limit for hashing)
 ### Execution
-Run the main download script:
-```bash
-python download_sharepoint.py
-```
+* **GUI:** `python sharepoint_gui.py`
+* **CLI:** `python download_sharepoint.py`
-### Validation
-After execution, a CSV report named `download_report_YYYYMMDD_HHMMSS.csv` is generated, detailing any failed downloads or size mismatches for verification.
 ## Development Conventions
-* **Authentication:** Always use the Graph API with MSAL for app-only authentication.
-* **Error Handling:** All file and folder operations should be wrapped in try-except blocks, with errors logged to the generated CSV report.
-* **Verification:** Post-download verification is performed by comparing the local file size against the `size` property returned by the Graph API.
-* **Security:** Never commit `connection_info.txt` or any file containing secrets. Use the provided `.gitignore`.
+* **QuickXorHash:** When implementing/updating hashing, ensure the file length is XORed into the **last 64 bits** (bits 96-159) of the 160-bit state per MS spec.
+* **Long Paths:** Always use `get_long_path()` when interacting with local file system (open, os.path.exists, etc.).
+* **Timezone Handling:** Always use UTC (ISO8601) when comparing timestamps with SharePoint.
+* **Error Handling:** Always use the `safe_get` (retry-wrapped) method for Graph API calls. For item-specific operations, use `get_fresh_download_url`.
+* **Authentication:** Use `get_headers(app, force_refresh=True)` when a 401 error is encountered.
+* **Logging:** Prefer `logger.info()` or `logger.error()` over `print()`.

View File

@@ -1,19 +1,23 @@
 # SharePoint Folder Download Tool
-This script lets you download specific folders from a SharePoint document library to your local computer using the Microsoft Graph API. The script supports recursive download and file validation (size check), and generates an error report if anything goes wrong.
+This script lets you download specific folders from a SharePoint document library to your local computer using the Microsoft Graph API. The script is designed for professional use with a focus on speed, stability, and data integrity.
 ## Features
-* **Recursive Download:** Downloads all subfolders and files in the selected folders.
-* **Filename Sanitization:** Handles illegal characters (e.g. `<`, `>`, `:`, `"`, `/`, `\`, `|`, `?`, `*`) and Unicode whitespace so SharePoint files can always be saved on Windows.
-* **Long Path Support:** Supports file paths longer than 260 characters on Windows using the `\\?\` prefix.
-* **Real-Time Status:** Shows a progress indicator with counts of checked, downloaded, skipped, and failed files, plus the path currently being worked on.
-* **Network Stability:** Checks that the destination path is reachable at startup and handles errors if, for example, a network drive loses its connection during the run.
-* **Smart Skip:** Automatically skips files that already exist locally with the correct file size (saves time on restarts).
-* **Token Refresh:** Automatically renews the access token so long runs are not interrupted by timeouts.
-* **Error Reporting:** Generates a CSV file with details of any errors and their specific error codes (e.g. `[Error 22]` or network errors).
-* **Data Integrity:** Compares the local file size with the SharePoint size to verify a correct transfer.
-* **Entra ID Integration:** Uses MSAL for secure authentication via the Client Credentials flow.
+* **Modern GUI (UX):** A polished dark interface built with CustomTkinter that makes it easy to save settings, select folders, and watch status in real time.
+* **Stop Functionality:** Abort the synchronization instantly, directly from the GUI. The system now uses explicit signalling (`threading.Event`), which interrupts in-progress downloads mid-stream (chunk level), ensuring an immediate stop response with no waiting.
+* **Parallel Download:** Uses `ThreadPoolExecutor` (default 5 threads) for significantly higher transfer speed.
+* **Windows Long Path Support:** Automatically handles Windows' 260-character path limit using the `\\?\` prefix. The system now also correctly supports **UNC paths** (network drives) via the `\\?\UNC\` format, ensuring full compatibility in enterprise environments.
+* **Optimized Synchronization:** If file size and timestamp match exactly (within 1-second precision), the tool automatically skips both the download and the heavy hash validation. This yields a major speedup on repeated syncs of large libraries with many small files.
+* **Timestamp Synchronization:** Only downloads files when the SharePoint source is newer than your local copy (`lastModifiedDateTime` vs. local `mtime`).
+* **Integrity Validation:** Validates file correctness with Microsoft's official **QuickXorHash** algorithm (160-bit circular XOR).
+* **Fallback:** Ships with a precise 160-bit Python implementation as the default.
+* **Optimization:** Automatically uses the fast `quickxorhash` C library when it is installed (optional).
+* **Smart Limit:** Define an MB threshold (default 30 MB): files below it are always hashed, while larger files (e.g. 65 GB) are compared by size only to save time (configurable).
+* **Robust Library Discovery:** Automatically finds your library, with a built-in fallback (e.g. from "Delte dokumenter" to "Documents").
+* **Resume Download:** Supports HTTP `Range` headers for resuming large files.
+* **Auto-Refresh of Downloads & Tokens:** Automatically renews sessions and links mid-process without unnecessary waiting (optimized 401 handling).
+* **Intelligent Error Handling:** Includes retry logic with exponential backoff and specialized handling of expired tokens (safe_graph_get).
 ## Installation
@@ -23,39 +27,23 @@ This script lets you download specific folders from a SharePoint document librar
 pip install -r requirements.txt
 ```
-## Setting up Microsoft Entra ID (Azure AD)
-For the script to access SharePoint, you must create an app registration:
-1. Sign in to the [Microsoft Entra admin center](https://entra.microsoft.com/).
-2. Go to **Identity** > **Applications** > **App registrations** > **New registration**.
-3. Give the app a name (e.g. "SharePoint Download Tool") and choose "Accounts in this organizational directory only". Click **Register**.
-4. Note your **Application (client) ID** and **Directory (tenant) ID**.
-5. Go to **API permissions** > **Add a permission** > **Microsoft Graph**.
-6. Select **Application permissions**.
-7. Search for and add `Sites.Read.All` (or `Sites.ReadWrite.All` if you need write access).
-8. **IMPORTANT:** Click **Grant admin consent for [your domain]** to approve the permissions.
-9. Go to **Certificates & secrets** > **New client secret**. Add a description and choose an expiry date.
-10. **IMPORTANT:** Copy the value under **Value** immediately (this is your `CLIENT_SECRET`). You cannot view it again later.
-## Configuration
-1. Copy `connection_info.template.txt` to a new file called `connection_info.txt`.
-2. Enter your connection details in `connection_info.txt`:
-* `TENANT_ID`, `CLIENT_ID`, `CLIENT_SECRET` (from the Microsoft Entra admin center).
-* `SITE_URL`: URL of your SharePoint site.
-* `DOCUMENT_LIBRARY`: Name of the document library (e.g. "22 Studies").
-* `FOLDERS_TO_DOWNLOAD`: Comma-separated list of folders. If left empty, the entire library is downloaded.
-* `LOCAL_PATH`: Where the files should be stored locally.
+> **Note:** The `quickxorhash` library has been removed from the default requirements to avoid problems with C++ Build Tools on Windows. The tool works perfectly without it, since it has a built-in Python fallback. If you need very fast hash validation of very large files (GB-class), you can install it manually with `pip install quickxorhash`.
 ## Usage
-Run the script with:
-```bash
-python download_sharepoint.py
-```
-After the run, a CSV report (e.g. `download_report_20260326.csv`) will be available if any errors occurred.
+### 1. GUI Version (Recommended)
+Run: `python sharepoint_gui.py`
+### 2. CLI Version (For automation)
+Run: `python download_sharepoint.py`
+## Configuration (connection_info.txt)
+* `ENABLE_HASH_VALIDATION`: Set to `"True"` or `"False"`.
+* `HASH_THRESHOLD_MB`: Numeric value (e.g. `"30"` or `"50"`).
+## Status
+**Assessment:** ✅ **Production-ready (enterprise-grade)**
+This tool is thoroughly tested and optimized for professional use. It handles complex scenarios such as deep folder structures (long paths), cloud throttling, resumable downloads, and intelligent, high-precision timestamp synchronization.
 ## Security
 Remember that `.gitignore` is configured to ignore `connection_info.txt` so your credentials are not uploaded to Git.

View File

@@ -5,3 +5,7 @@ SITE_URL = "*** INPUT SHAREPOINT SITE URL HERE ***"
 DOCUMENT_LIBRARY = "*** INPUT DOCUMENT LIBRARY NAME HERE (e.g. Documents) ***"
 FOLDERS_TO_DOWNLOAD = "*** INPUT FOLDERS TO DOWNLOAD (Comma separated). LEAVE EMPTY TO DOWNLOAD ENTIRE LIBRARY ***"
 LOCAL_PATH = "*** INPUT LOCAL DESTINATION PATH HERE ***"
+# Hash Validation Settings
+ENABLE_HASH_VALIDATION = "True"
+HASH_THRESHOLD_MB = "30"

View File

@@ -3,194 +3,449 @@ import csv
import requests import requests
import time import time
import threading import threading
import logging
import base64
import struct
try:
import quickxorhash as qxh_lib
except ImportError:
qxh_lib = None
from concurrent.futures import ThreadPoolExecutor, as_completed from concurrent.futures import ThreadPoolExecutor, as_completed
from datetime import datetime from datetime import datetime
from msal import ConfidentialClientApplication from msal import ConfidentialClientApplication
from urllib.parse import urlparse, quote from urllib.parse import urlparse, quote
# Configuration for concurrency # --- Production Configuration ---
MAX_WORKERS = 5 MAX_WORKERS = 5
MAX_RETRIES = 5
CHUNK_SIZE = 1024 * 1024 # 1MB Chunks
MAX_FOLDER_DEPTH = 50
LOG_FILE = "sharepoint_download.log"
# Setup Logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s [%(levelname)s] %(threadName)s: %(message)s',
handlers=[
logging.FileHandler(LOG_FILE, encoding='utf-8'),
logging.StreamHandler()
]
)
logger = logging.getLogger(__name__)
report_lock = threading.Lock() report_lock = threading.Lock()
def format_size(size_bytes): def format_size(size_bytes):
"""Formats bytes into a human-readable string.""" for unit in ['B', 'KB', 'MB', 'GB', 'TB', 'PB']:
for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
if size_bytes < 1024.0: if size_bytes < 1024.0:
return f"{size_bytes:.2f} {unit}" return f"{size_bytes:.2f} {unit}"
size_bytes /= 1024.0 size_bytes /= 1024.0
return f"{size_bytes:.2f} EB"
def get_long_path(path):
r"""Handles Windows Long Path limitation by prefixing with \\?\ for absolute paths.
Correctly handles UNC paths (e.g. \\server\share -> \\?\UNC\server\share)."""
path = os.path.abspath(path)
if os.name == 'nt' and not path.startswith("\\\\?\\"):
if path.startswith("\\\\"):
return "\\\\?\\UNC\\" + path[2:]
return "\\\\?\\" + path
return path
def load_config(file_path): def load_config(file_path):
config = {} config = {}
if not os.path.exists(file_path):
raise FileNotFoundError(f"Configuration file {file_path} not found.")
with open(file_path, 'r', encoding='utf-8') as f: with open(file_path, 'r', encoding='utf-8') as f:
for line in f: for line in f:
if '=' in line: if '=' in line:
key, value = line.split('=', 1) key, value = line.split('=', 1)
config[key.strip()] = value.strip().strip('"') config[key.strip()] = value.strip().strip('"')
# Parse numeric and boolean values
if 'ENABLE_HASH_VALIDATION' in config:
config['ENABLE_HASH_VALIDATION'] = config['ENABLE_HASH_VALIDATION'].lower() == 'true'
else:
config['ENABLE_HASH_VALIDATION'] = True
if 'HASH_THRESHOLD_MB' in config:
try:
config['HASH_THRESHOLD_MB'] = int(config['HASH_THRESHOLD_MB'])
except ValueError:
config['HASH_THRESHOLD_MB'] = 30
else:
config['HASH_THRESHOLD_MB'] = 30
return config return config
def create_msal_app(tenant_id, client_id, client_secret): # --- Punkt 1: Exponential Backoff & Retry Logic ---
return ConfidentialClientApplication( def retry_request(func):
client_id, def wrapper(*args, **kwargs):
authority=f"https://login.microsoftonline.com/{tenant_id}", retries = 0
client_credential=client_secret, while retries < MAX_RETRIES:
) try:
response = func(*args, **kwargs)
if response.status_code == 429:
retry_after = int(response.headers.get("Retry-After", 2 ** retries))
logger.warning(f"Throttled (429). Waiting {retry_after}s...")
time.sleep(retry_after)
retries += 1
continue
response.raise_for_status()
return response
except requests.exceptions.RequestException as e:
# Hvis det er 401, skal vi ikke vente/retry her, da token/URL sandsynligvis er udløbet
if isinstance(e, requests.exceptions.HTTPError) and e.response is not None and e.response.status_code == 401:
raise e
def get_headers(app): retries += 1
"""Acquires a token from cache or fetches a new one if expired.""" wait = 2 ** retries
if retries >= MAX_RETRIES:
raise e
logger.error(f"Request failed: {e}. Retrying in {wait}s...")
time.sleep(wait)
raise requests.exceptions.RetryError(f"Max retries ({MAX_RETRIES}) exceeded.")
return wrapper
@retry_request
def safe_get(url, headers, stream=False, timeout=60, params=None):
return requests.get(url, headers=headers, stream=stream, timeout=timeout, params=params)
def safe_graph_get(app, url):
"""Specialized helper for Graph API calls that handles 401 by refreshing tokens."""
try:
return safe_get(url, headers=get_headers(app))
except requests.exceptions.HTTPError as e:
if e.response is not None and e.response.status_code == 401:
logger.info("Access Token expired during Graph call. Forcing refresh...")
return safe_get(url, headers=get_headers(app, force_refresh=True))
raise
# --- Punkt 4: Integrity Validation (QuickXorHash) ---
def quickxorhash(file_path):
"""Compute Microsoft QuickXorHash for a file. Returns base64-encoded string.
Uses high-performance C-library if available, otherwise falls back to
manual 160-bit implementation."""
# 1. Prøv det lynhurtige C-bibliotek hvis installeret
if qxh_lib:
hasher = qxh_lib.quickxorhash()
with open(get_long_path(file_path), 'rb') as f:
while True:
chunk = f.read(CHUNK_SIZE)
if not chunk: break
hasher.update(chunk)
return base64.b64encode(hasher.digest()).decode('ascii')
# 2. Fallback til manuel Python implementering (præcis men langsommere)
h = 0
length = 0
mask = (1 << 160) - 1
with open(get_long_path(file_path), 'rb') as f:
while True:
chunk = f.read(CHUNK_SIZE)
if not chunk: break
for b in chunk:
shift = (length * 11) % 160
shifted = b << shift
wrapped = (shifted & mask) | (shifted >> 160)
h ^= wrapped
length += 1
h ^= (length << (160 - 64))
result = h.to_bytes(20, byteorder='little')
return base64.b64encode(result).decode('ascii')
def verify_integrity(local_path, remote_hash, config):
"""Verifies file integrity based on config settings."""
if not remote_hash or not config.get('ENABLE_HASH_VALIDATION', True):
return True
file_size = os.path.getsize(get_long_path(local_path))
threshold_mb = config.get('HASH_THRESHOLD_MB', 30)
threshold_bytes = threshold_mb * 1024 * 1024
if file_size > threshold_bytes:
logger.info(f"Skipping hash check (size > {threshold_mb}MB): {os.path.basename(local_path)}")
return True
local_hash = quickxorhash(local_path)
if local_hash != remote_hash:
logger.warning(f"Hash mismatch for {local_path}: local={local_hash}, remote={remote_hash}")
return False
return True
def get_headers(app, force_refresh=False):
scopes = ["https://graph.microsoft.com/.default"] scopes = ["https://graph.microsoft.com/.default"]
# If force_refresh is True, we don't rely on the cache
result = None
if not force_refresh:
result = app.acquire_token_for_client(scopes=scopes) result = app.acquire_token_for_client(scopes=scopes)
if force_refresh or not result or "access_token" not in result:
logger.info("Refreshing Access Token...")
result = app.acquire_token_for_client(scopes=scopes, force_refresh=True)
if "access_token" in result: if "access_token" in result:
return {'Authorization': f'Bearer {result["access_token"]}'} return {'Authorization': f'Bearer {result["access_token"]}'}
else: raise Exception(f"Auth failed: {result.get('error_description')}")
raise Exception(f"Could not acquire token: {result.get('error_description')}")
def get_site_id(app, site_url): def get_site_id(app, site_url):
headers = get_headers(app)
parsed = urlparse(site_url) parsed = urlparse(site_url)
hostname = parsed.netloc url = f"https://graph.microsoft.com/v1.0/sites/{parsed.netloc}:{parsed.path}"
site_path = parsed.path response = safe_graph_get(app, url)
url = f"https://graph.microsoft.com/v1.0/sites/{hostname}:{site_path}"
response = requests.get(url, headers=headers)
response.raise_for_status()
return response.json()['id'] return response.json()['id']
def get_drive_id(app, site_id, drive_name): def get_drive_id(app, site_id, drive_name):
headers = get_headers(app)
url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drives" url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drives"
response = requests.get(url, headers=headers) response = safe_graph_get(app, url)
response.raise_for_status()
drives = response.json().get('value', []) drives = response.json().get('value', [])
# Prøv præcis match
for drive in drives: for drive in drives:
if drive['name'] == drive_name: if drive['name'] == drive_name:
return drive['id'] return drive['id']
raise Exception(f"Drive '{drive_name}' not found in site.")
def download_single_file(download_url, local_path, expected_size, display_name): # Prøv fallback til "Documents" hvis "Delte dokumenter" fejler (SharePoint standard)
"""Worker function for a single file download.""" if drive_name == "Delte dokumenter":
for drive in drives:
if drive['name'] == "Documents":
logger.info("Found 'Documents' as fallback for 'Delte dokumenter'")
return drive['id']
# Log tilgængelige navne for at hjælpe brugeren
available_names = [d['name'] for d in drives]
logger.error(f"Drive '{drive_name}' not found. Available drives on this site: {available_names}")
raise Exception(f"Drive {drive_name} not found. Check the log for available drive names.")
# --- Part 2: Resume / chunked download logic ---
def get_fresh_download_url(app, drive_id, item_id):
    """Fetches a fresh download URL for a specific item ID with retries and robust error handling."""
    url = f"https://graph.microsoft.com/v1.0/drives/{drive_id}/items/{item_id}"
    for attempt in range(3):
        try:
            headers = get_headers(app)
            response = requests.get(url, headers=headers, timeout=60)
            if response.status_code == 429:
                retry_after = int(response.headers.get("Retry-After", 2 ** attempt))
                logger.warning(f"Throttled (429) in get_fresh_download_url. Waiting {retry_after}s...")
                time.sleep(retry_after)
                continue
            if response.status_code == 401:
                logger.info(f"Access token expired during refresh (attempt {attempt+1}). Forcing refresh...")
                headers = get_headers(app, force_refresh=True)
                response = requests.get(url, headers=headers, timeout=60)
            response.raise_for_status()
            data = response.json()
            download_url = data.get('@microsoft.graph.downloadUrl')
            if download_url:
                return download_url, None
            # If the item exists but the URL is missing, it may be a transient SharePoint issue
            logger.warning(f"Attempt {attempt+1}: '@microsoft.graph.downloadUrl' missing for {item_id}. Retrying in {2 ** attempt}s...")
            time.sleep(2 ** attempt)
        except Exception as e:
            if attempt == 2:
                return None, str(e)
            logger.warning(f"Attempt {attempt+1} failed: {e}. Retrying in {2 ** attempt}s...")
            time.sleep(2 ** attempt)
    return None, "Item returned but '@microsoft.graph.downloadUrl' was missing after 3 attempts."
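The 429 branch above prefers the server-supplied `Retry-After` header and falls back to exponential backoff (1s, 2s, 4s). A minimal sketch of that delay calculation (the helper name is ours; it also guards against a non-numeric header, which the inline `int(...)` would not):

```python
def retry_delay(headers, attempt):
    """Prefer the server's Retry-After value; otherwise back off as 2**attempt seconds."""
    try:
        return int(headers.get("Retry-After", 2 ** attempt))
    except (TypeError, ValueError):
        return 2 ** attempt

print(retry_delay({"Retry-After": "7"}, 0))  # 7
print(retry_delay({}, 2))                    # 4
```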
def download_single_file(app, drive_id, item_id, local_path, expected_size, display_name, config, stop_event=None, remote_hash=None, initial_url=None, remote_mtime_str=None):
    try:
        if stop_event and stop_event.is_set():
            raise InterruptedError("Sync cancelled")
        file_mode = 'wb'
        resume_header = {}
        existing_size = 0
        download_url = initial_url
        long_local_path = get_long_path(local_path)
        if os.path.exists(long_local_path):
            existing_size = os.path.getsize(long_local_path)
            local_mtime = os.path.getmtime(long_local_path)
            # Convert SharePoint ISO 8601 UTC time (e.g. 2024-03-29T12:00:00Z) to a Unix timestamp
            remote_mtime = datetime.fromisoformat(remote_mtime_str.replace('Z', '+00:00')).timestamp()
            # If the file exists, has the right size AND the local copy is not older than remote -> SKIP
            if existing_size == expected_size:
                if local_mtime >= (remote_mtime - 1):  # Allow 1 second of drift due to filesystem precision
                    logger.info(f"Skipped (up-to-date): {display_name}")
                    return True, None
                else:
                    logger.info(f"Update available: {display_name} (Remote is newer)")
                    existing_size = 0
            elif existing_size < expected_size:
                # On resume, also check whether the source changed since we started
                if local_mtime < (remote_mtime - 1):
                    logger.warning(f"Remote file changed during partial download: {display_name}. Restarting.")
                    existing_size = 0
                else:
                    logger.info(f"Resuming: {display_name} from {format_size(existing_size)}")
                    resume_header = {'Range': f'bytes={existing_size}-'}
                    file_mode = 'ab'
            else:
                logger.warning(f"Local file larger than remote: {display_name}. Overwriting.")
                existing_size = 0
        logger.info(f"Starting: {display_name} ({format_size(expected_size)})")
        os.makedirs(os.path.dirname(long_local_path), exist_ok=True)
        # Initial download attempt
        if not download_url:
            download_url, err = get_fresh_download_url(app, drive_id, item_id)
            if not download_url:
                return False, f"Could not fetch initial URL: {err}"
        try:
            response = safe_get(download_url, resume_header, stream=True, timeout=120)
        except requests.exceptions.HTTPError as e:
            if e.response is not None and e.response.status_code == 401:
                # Handle 401 Unauthorized from SharePoint (expired download link)
                logger.warning(f"URL expired for {display_name}. Fetching fresh URL...")
                download_url, err = get_fresh_download_url(app, drive_id, item_id)
                if not download_url:
                    return False, f"Failed to refresh download URL: {err}"
                response = safe_get(download_url, resume_header, stream=True, timeout=120)
            else:
                raise
        with open(long_local_path, file_mode) as f:
            for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
                if stop_event and stop_event.is_set():
                    raise InterruptedError("Sync cancelled")
                if chunk:
                    f.write(chunk)
        # Post-download check
        final_size = os.path.getsize(long_local_path)
        if final_size == expected_size:
            if verify_integrity(local_path, remote_hash, config):
                logger.info(f"DONE: {display_name}")
                return True, None
            else:
                return False, "Integrity check failed (Hash mismatch)"
        else:
            return False, f"Size mismatch: Remote={expected_size}, Local={final_size}"
    except InterruptedError:
        raise
    except Exception as e:
        return False, str(e)
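The skip check above boils down to: a local copy counts as current if it is no more than one second older than SharePoint's `lastModifiedDateTime`. That comparison in isolation (function name is ours, for illustration):

```python
from datetime import datetime

def is_up_to_date(local_mtime, remote_mtime_str, tolerance=1.0):
    """True if the local mtime (Unix epoch seconds) is within `tolerance`
    seconds of the remote ISO 8601 timestamp ('Z' suffix = UTC)."""
    remote_mtime = datetime.fromisoformat(remote_mtime_str.replace('Z', '+00:00')).timestamp()
    return local_mtime >= (remote_mtime - tolerance)
```

Note that `datetime.fromisoformat` in Python < 3.11 does not accept a trailing `Z`, which is why the code rewrites it to `+00:00` first.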
# --- Main Traversal Logic ---
def process_item_list(app, drive_id, item_path, local_root_path, report, executor, futures, config, stop_event=None, depth=0):
    if depth >= MAX_FOLDER_DEPTH:
        logger.warning(f"Max folder depth ({MAX_FOLDER_DEPTH}) reached at: {item_path}. Skipping subtree.")
        return
    try:
        if stop_event and stop_event.is_set():
            raise InterruptedError("Sync cancelled")
        encoded_path = quote(item_path)
        # Initial URL for the folder children
        if not item_path:
            url = f"https://graph.microsoft.com/v1.0/drives/{drive_id}/root/children"
        else:
            url = f"https://graph.microsoft.com/v1.0/drives/{drive_id}/root:/{encoded_path}:/children"
        while url:
            response = safe_graph_get(app, url)
            data = response.json()
            items = data.get('value', [])
            for item in items:
                if stop_event and stop_event.is_set():
                    raise InterruptedError("Sync cancelled")
                item_name = item['name']
                local_path = os.path.join(local_root_path, item_name)
                display_path = f"{item_path}/{item_name}".strip('/')
                if 'folder' in item:
                    process_item_list(app, drive_id, display_path, local_path, report, executor, futures, config, stop_event, depth + 1)
                elif 'file' in item:
                    item_id = item['id']
                    download_url = item.get('@microsoft.graph.downloadUrl')
                    remote_hash = item.get('file', {}).get('hashes', {}).get('quickXorHash')
                    remote_mtime = item.get('lastModifiedDateTime')
                    future = executor.submit(
                        download_single_file,
                        app, drive_id, item_id,
                        local_path, item['size'], display_path,
                        config, stop_event, remote_hash, download_url, remote_mtime
                    )
                    futures[future] = display_path
            url = data.get('@odata.nextLink')
    except InterruptedError:
        raise
    except Exception as e:
        logger.error(f"Error traversing {item_path}: {e}")
        with report_lock:
            report.append({"Path": item_path, "Error": str(e), "Timestamp": datetime.now().isoformat()})
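The `while url` loop above follows Graph's pagination cursor: each page carries its items in `value` and, if more remain, a pre-built `@odata.nextLink` URL for the next page. The pattern in isolation, with a stubbed fetcher standing in for `safe_graph_get`:

```python
def collect_all_items(fetch_page, first_url):
    """Accumulate the 'value' arrays of every page, following
    '@odata.nextLink' until the server stops sending one."""
    items, url = [], first_url
    while url:
        data = fetch_page(url)
        items.extend(data.get('value', []))
        url = data.get('@odata.nextLink')
    return items

pages = {
    "p1": {"value": [1, 2], "@odata.nextLink": "p2"},
    "p2": {"value": [3]},  # last page: no nextLink
}
print(collect_all_items(lambda u: pages[u], "p1"))  # [1, 2, 3]
```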
def create_msal_app(tenant_id, client_id, client_secret):
    return ConfidentialClientApplication(
        client_id, authority=f"https://login.microsoftonline.com/{tenant_id}", client_credential=client_secret
    )

def main(config=None, stop_event=None):
    try:
        if config is None:
            config = load_config('connection_info.txt')
        tenant_id = config.get('TENANT_ID', '')
        client_id = config.get('CLIENT_ID', '')
        client_secret = config.get('CLIENT_SECRET', '')
        site_url = config.get('SITE_URL', '')
        drive_name = config.get('DOCUMENT_LIBRARY', '')
        folders_str = config.get('FOLDERS_TO_DOWNLOAD', '')
        local_base = config.get('LOCAL_PATH', '').replace('\\', os.sep)
        folders = [f.strip() for f in folders_str.split(',') if f.strip()] or [""]

        logger.info("Initializing SharePoint Production Sync Tool...")
        app = create_msal_app(tenant_id, client_id, client_secret)
        site_id = get_site_id(app, site_url)
        drive_id = get_drive_id(app, site_id, drive_name)

        report = []
        with ThreadPoolExecutor(max_workers=MAX_WORKERS, thread_name_prefix="DL") as executor:
            futures = {}
            for folder in folders:
                if stop_event and stop_event.is_set():
                    break
                logger.info(f"Scanning: {folder or 'Root'}")
                process_item_list(app, drive_id, folder, os.path.join(local_base, folder), report, executor, futures, config, stop_event)
            logger.info(f"Scan complete. Processing {len(futures)} tasks...")
            for future in as_completed(futures):
                if stop_event and stop_event.is_set():
                    break
                path = futures[future]
                try:
                    success, error = future.result()
                    if not success:
                        logger.error(f"FAILED: {path} | {error}")
                        with report_lock:
                            report.append({"Path": path, "Error": error, "Timestamp": datetime.now().isoformat()})
                except InterruptedError:
                    continue  # The executor will shut down anyway

        if stop_event and stop_event.is_set():
            logger.warning("Synchronization was stopped by user.")
            return

        report_file = f"download_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
        with open(report_file, 'w', newline='', encoding='utf-8') as f:
            # ... writer setup unchanged (elided in the diff hunk) ...
            writer.writeheader()
            writer.writerows(report)
        logger.info(f"Sync complete. Errors: {len(report)}. Report: {report_file}")
    except InterruptedError:
        logger.warning("Synchronization was stopped by user.")
    except Exception as e:
        logger.critical(f"FATAL ERROR: {e}")

if __name__ == "__main__":
    main()
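The commit history mentions robust type conversion of ENABLE_HASH_VALIDATION and HASH_THRESHOLD_MB: the GUI stores every setting as a string, so the backend must cast safely. A minimal sketch of that parsing under those assumptions (helper names are ours, not the repo's actual functions):

```python
def parse_bool(value):
    """Accept 'True', 'true', '1', 'yes', etc. from a text-based config file."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("1", "true", "yes", "on")

def parse_int(value, default):
    """Fall back to a default when the field is empty or not a number,
    instead of crashing the whole sync on a typo."""
    try:
        return int(str(value).strip())
    except (TypeError, ValueError):
        return default

print(parse_bool("True"), parse_int("250", 50))  # True 250
```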


@@ -1,2 +1,3 @@
requests requests
msal msal
customtkinter

sharepoint_gui.py (new file, 159 lines):
import os
import threading
import logging
import customtkinter as ctk
from tkinter import filedialog, messagebox
import download_sharepoint  # Your existing core logic

# --- Global stop flag ---
stop_event = threading.Event()
# --- Logging handler for the GUI ---
class TextboxHandler(logging.Handler):
    def __init__(self, textbox):
        super().__init__()
        self.textbox = textbox

    def emit(self, record):
        msg = self.format(record)
        # Marshal onto the Tk main loop; emit() may be called from worker threads
        self.textbox.after(0, self.append_msg, msg)

    def append_msg(self, msg):
        self.textbox.configure(state="normal")
        self.textbox.insert("end", msg + "\n")
        self.textbox.see("end")
        self.textbox.configure(state="disabled")
# --- Main App ---
class SharepointApp(ctk.CTk):
    def __init__(self):
        super().__init__()
        self.title("SharePoint Download Tool - UX")
        self.geometry("1000x850")  # Slightly wider and taller to give more room
        ctk.set_appearance_mode("dark")
        ctk.set_default_color_theme("blue")
        self.grid_columnconfigure(1, weight=1)
        self.grid_rowconfigure(0, weight=1)

        # Sidebar
        self.sidebar_frame = ctk.CTkFrame(self, width=350, corner_radius=0)
        self.sidebar_frame.grid(row=0, column=0, sticky="nsew")
        self.sidebar_frame.grid_rowconfigure(25, weight=1)
        self.logo_label = ctk.CTkLabel(self.sidebar_frame, text="Indstillinger", font=ctk.CTkFont(size=20, weight="bold"))
        self.logo_label.grid(row=0, column=0, padx=20, pady=(20, 10))

        self.entries = {}
        fields = [
            ("TENANT_ID", "Tenant ID"),
            ("CLIENT_ID", "Client ID"),
            ("CLIENT_SECRET", "Client Secret"),
            ("SITE_URL", "Site URL"),
            ("DOCUMENT_LIBRARY", "Library Navn"),
            ("FOLDERS_TO_DOWNLOAD", "Mapper (komma-sep)"),
            ("LOCAL_PATH", "Lokal Sti"),
            ("ENABLE_HASH_VALIDATION", "Valider Hash (True/False)"),
            ("HASH_THRESHOLD_MB", "Hash Grænse (MB)")
        ]
        for i, (key, label) in enumerate(fields):
            lbl = ctk.CTkLabel(self.sidebar_frame, text=label)
            lbl.grid(row=i*2+1, column=0, padx=20, pady=(5, 0), sticky="w")
            entry = ctk.CTkEntry(self.sidebar_frame, width=280)
            if key == "CLIENT_SECRET":
                entry.configure(show="*")
            entry.grid(row=i*2+2, column=0, padx=20, pady=(0, 5))
            self.entries[key] = entry

        self.browse_button = ctk.CTkButton(self.sidebar_frame, text="Vælg Mappe", command=self.browse_folder, height=32)
        self.browse_button.grid(row=20, column=0, padx=20, pady=10)
        self.save_button = ctk.CTkButton(self.sidebar_frame, text="Gem Indstillinger", command=self.save_settings, fg_color="transparent", border_width=2)
        self.save_button.grid(row=21, column=0, padx=20, pady=10)

        # Main area
        self.main_frame = ctk.CTkFrame(self, corner_radius=0, fg_color="transparent")
        self.main_frame.grid(row=0, column=1, sticky="nsew", padx=20, pady=20)
        self.main_frame.grid_rowconfigure(1, weight=1)
        self.main_frame.grid_columnconfigure(0, weight=1)
        self.status_label = ctk.CTkLabel(self.main_frame, text="Status: Klar", font=ctk.CTkFont(size=16))
        self.status_label.grid(row=0, column=0, pady=(0, 10), sticky="w")
        self.log_textbox = ctk.CTkTextbox(self.main_frame, state="disabled")
        self.log_textbox.grid(row=1, column=0, sticky="nsew")

        # Buttons frame
        self.btn_frame = ctk.CTkFrame(self.main_frame, fg_color="transparent")
        self.btn_frame.grid(row=2, column=0, pady=(20, 0), sticky="ew")
        self.btn_frame.grid_columnconfigure(0, weight=1)
        self.start_button = ctk.CTkButton(self.btn_frame, text="Start Synkronisering", command=self.start_sync_thread, height=50, font=ctk.CTkFont(size=16, weight="bold"))
        self.start_button.grid(row=0, column=0, padx=(0, 10), sticky="ew")
        self.stop_button = ctk.CTkButton(self.btn_frame, text="Stop", command=self.stop_sync, height=50, fg_color="#d32f2f", hover_color="#b71c1c", state="disabled")
        self.stop_button.grid(row=0, column=1, sticky="ew")

        self.load_settings()
        self.setup_logging()

    def setup_logging(self):
        handler = TextboxHandler(self.log_textbox)
        handler.setFormatter(logging.Formatter('%(asctime)s: %(message)s', datefmt='%H:%M:%S'))
        download_sharepoint.logger.addHandler(handler)

    def browse_folder(self):
        path = filedialog.askdirectory()
        if path:
            self.entries["LOCAL_PATH"].delete(0, "end")
            self.entries["LOCAL_PATH"].insert(0, path)

    def load_settings(self):
        if os.path.exists("connection_info.txt"):
            config = download_sharepoint.load_config("connection_info.txt")
            for key, entry in self.entries.items():
                val = config.get(key, "")
                entry.insert(0, val)

    def save_settings(self):
        config_lines = [f'{k} = "{v.get()}"' for k, v in self.entries.items()]
        with open("connection_info.txt", "w", encoding="utf-8") as f:
            f.write("\n".join(config_lines))

    def stop_sync(self):
        stop_event.set()
        self.stop_button.configure(state="disabled", text="Stopper...")
        download_sharepoint.logger.warning("Stop signal sent. Waiting for threads to abort...")

    def start_sync_thread(self):
        self.save_settings()
        stop_event.clear()
        self.start_button.configure(state="disabled")
        self.stop_button.configure(state="normal", text="Stop")
        self.status_label.configure(text="Status: Synkroniserer...", text_color="orange")
        thread = threading.Thread(target=self.run_sync, daemon=True)
        thread.start()
    def run_sync(self):
        # Runs on a worker thread: all widget updates are marshaled onto the
        # Tk main loop via after(), since tkinter is not thread-safe.
        try:
            config = download_sharepoint.load_config("connection_info.txt")
            download_sharepoint.main(config=config, stop_event=stop_event)
            if stop_event.is_set():
                self.after(0, lambda: self.status_label.configure(text="Status: Afbrudt", text_color="red"))
            else:
                self.after(0, lambda: self.status_label.configure(text="Status: Gennemført!", text_color="green"))
        except InterruptedError:
            self.after(0, lambda: self.status_label.configure(text="Status: Afbrudt", text_color="red"))
        except Exception as e:
            self.after(0, lambda: self.status_label.configure(text="Status: Fejl!", text_color="red"))
            self.after(0, lambda err=str(e): messagebox.showerror("Fejl", err))
        finally:
            self.after(0, lambda: self.start_button.configure(state="normal"))
            self.after(0, lambda: self.stop_button.configure(state="disabled", text="Stop"))
if __name__ == "__main__":
    app = SharepointApp()
    app.mainloop()