get-sources/README.md
Pierre-Olivier Mercier 459388fe60
Add S3 static page generator for bucket browsing
Implement Python-based generator that creates static HTML index pages for browsing S3 bucket contents. The generator produces nginx-style directory listings with hierarchical navigation.
2026-01-06 17:04:05 +07:00

# happyDomain S3 Static Page Generator
A Python-based tool that generates static HTML index pages for browsing S3 bucket contents. Creates nginx-style directory listings for the happyDomain download repository.
## Overview
This generator connects to an S3-compatible storage bucket, retrieves the list of all objects, and generates static `index.html` files for each directory. The generated pages provide a clean, browsable interface similar to nginx directory listings.
## Features
- Static HTML generation (no JavaScript required)
- nginx-style directory listings
- Support for S3-compatible storage
- Automatic pagination for large buckets (>1000 objects)
- Human-readable file sizes and dates
- Hierarchical directory navigation
## Requirements
- Python 3.11+
- Access to S3-compatible storage
- Environment variables for S3 credentials
## Local Development Setup
### 1. Install Dependencies
```bash
pip install -r requirements.txt
```
### 2. Configure Environment Variables
Create a `.env` file or export these variables:
```bash
export S3_ENDPOINT_URL="https://blob.nemunai.re"
export S3_BUCKET="happydomain-dl"
export S3_REGION="garage"
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export LOG_LEVEL="INFO" # Optional: DEBUG for verbose output
```
### 3. Run the Generator
```bash
python generator.py
```
The generated HTML files will be created in the `output/` directory, mirroring the structure of your S3 bucket.
## CI/CD Integration
This project is configured to run automatically in DroneCI. The pipeline:
1. **Generate Step**: Runs the Python generator to create all index.html files
2. **Deploy Step**: Uploads the generated files to the S3 bucket
### Environment Configuration
The following secrets must be configured in DroneCI:
- `s3_access_key`: S3 access key ID
- `s3_secret_key`: S3 secret access key
## How It Works
### 1. S3Client
Connects to the S3-compatible storage using boto3 and retrieves all objects in the bucket. Because a single list request returns at most 1,000 keys, it paginates through the results for larger buckets.
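A minimal sketch of the paginated listing, assuming boto3's `list_objects_v2` paginator. The function names `make_client` and `iter_objects` and the shape of the yielded dicts are illustrative, not taken from `generator.py`:
```python
from typing import Any, Iterator


def make_client(endpoint_url: str, region: str) -> Any:
    """Create a boto3 S3 client; credentials are read from the AWS_* variables."""
    import boto3  # imported lazily so iter_objects can be exercised without boto3

    return boto3.client("s3", endpoint_url=endpoint_url, region_name=region)


def iter_objects(client: Any, bucket: str) -> Iterator[dict]:
    """Yield every object in the bucket, following pagination transparently."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # "Contents" is absent when a page holds no objects, hence the default.
        for obj in page.get("Contents", []):
            yield {
                "key": obj["Key"],
                "size": obj["Size"],
                "last_modified": obj["LastModified"],
            }
```
Yielding plain dicts keeps the rest of the pipeline independent of boto3, which makes it easy to feed in a stub client when testing.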
### 2. DirectoryTree
Parses S3 object keys (which include full paths) into a hierarchical directory structure. Tracks files and their metadata (size, last modified date).
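The parsing step could look roughly like the sketch below. The class and function names (`FileEntry`, `Directory`, `build_tree`) are hypothetical, and each input object is assumed to be a dict with `key`, `size`, and `last_modified` fields:
```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class FileEntry:
    name: str
    size: int
    last_modified: datetime


@dataclass
class Directory:
    subdirs: dict[str, "Directory"] = field(default_factory=dict)
    files: list[FileEntry] = field(default_factory=list)


def build_tree(objects) -> Directory:
    """Turn flat keys such as 'v1/linux/app.tar.gz' into nested directories."""
    root = Directory()
    for obj in objects:
        *parents, name = obj["key"].split("/")
        node = root
        for part in parents:
            # Create intermediate directories on first sight.
            node = node.subdirs.setdefault(part, Directory())
        if name:  # a key ending in "/" is a directory placeholder, not a file
            node.files.append(FileEntry(name, obj["size"], obj["last_modified"]))
    return root
```
Because S3 has no real directories, the hierarchy exists only in the key names; splitting on `/` is what recovers it.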
### 3. HTMLGenerator
Uses Jinja2 templates to generate static HTML pages. Formats file sizes as human-readable values (e.g., "1.2M", "453K") and dates in a standard format.
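As an illustration of the formatting helpers, here is one way to produce values like those above. The rounding rule (one decimal below 10, whole numbers otherwise), the 1024-based units, and the nginx-like date pattern are assumptions, not necessarily what the generator uses:
```python
from datetime import datetime


def format_size(num_bytes: int) -> str:
    """Format a byte count using 1024-based units (K, M, G, T)."""
    if num_bytes < 1024:
        return str(num_bytes)
    value = float(num_bytes)
    unit = ""
    for unit in ("K", "M", "G", "T"):
        value /= 1024
        if value < 1024:
            break
    rounded = round(value, 1)
    # Assumed rule: keep one decimal for small values, none for larger ones.
    return f"{rounded:.1f}{unit}" if rounded < 10 else f"{value:.0f}{unit}"


def format_date(moment: datetime) -> str:
    """Format a timestamp in an nginx-like style (assumed pattern)."""
    return moment.strftime("%d-%b-%Y %H:%M")
```
For example, `format_size(453 * 1024)` gives `"453K"` and `format_size(1258291)` gives `"1.2M"`. In a Jinja2 setup, helpers like these would typically be registered as template filters.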
### 4. Main Orchestrator
Coordinates the entire process:
- Loads configuration
- Connects to S3
- Builds directory tree
- Generates HTML for each directory
- Writes files to output directory
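The final write step can be sketched as follows, assuming the earlier stages have produced a mapping from directory path (an empty string for the bucket root) to rendered HTML. The name `write_pages` and that mapping are hypothetical:
```python
from pathlib import Path


def write_pages(pages: dict[str, str], output_dir: Path) -> list[Path]:
    """Write one index.html per directory, mirroring the bucket layout."""
    written = []
    for dir_path, html in sorted(pages.items()):
        target = output_dir / dir_path / "index.html"
        # Nested directories may not exist yet on a fresh run.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(html, encoding="utf-8")
        written.append(target)
    return written
```
Calling `write_pages({"": root_html, "v1/linux": linux_html}, Path("output"))` would create `output/index.html` and `output/v1/linux/index.html`.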
## Generated HTML Structure
Each generated `index.html` page includes:
- Page title showing current directory path
- Parent directory link (..) for navigation up
- List of subdirectories (sorted alphabetically)
- List of files with metadata:
  - File name (linked to S3 object)
  - Last modified date
  - File size
The pages use a simple, monospace design similar to nginx directory listings.
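To make the page layout concrete, the sketch below renders a listing with only the standard library. It stands in for the project's Jinja2 template, whose actual markup is not shown here. The function name `render_index`, the relative links, and the exact HTML are assumptions:
```python
from html import escape
from urllib.parse import quote


def render_index(path: str, subdirs: list[str], files: list[tuple[str, str, str]]) -> str:
    """Render a listing; files holds (name, formatted date, formatted size)."""
    title = escape(f"Index of /{path}")
    rows = []
    if path:  # the bucket root has no parent to link to
        rows.append('<a href="../">../</a>')
    for name in sorted(subdirs):
        rows.append(f'<a href="{quote(name)}/">{escape(name)}/</a>')
    for name, date, size in files:
        # Assumption: files are linked relative to the page, not by absolute URL.
        link = f'<a href="{quote(name)}">{escape(name)}</a>'
        rows.append(f"{link}  {escape(date)}  {escape(size)}")
    body = "\n".join(rows)
    return (
        f"<html><head><title>{title}</title></head>\n"
        f"<body><h1>{title}</h1><hr><pre>\n{body}\n</pre><hr></body></html>\n"
    )
```
Escaping names for HTML and percent-encoding them in `href` attributes matters here because object keys can contain characters such as `&` or spaces.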
## Environment Variables
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `S3_ENDPOINT_URL` | Yes | - | S3 endpoint URL |
| `S3_BUCKET` | Yes | - | S3 bucket name |
| `S3_REGION` | No | `us-east-1` | S3 region |
| `AWS_ACCESS_KEY_ID` | Yes* | - | S3 access key |
| `AWS_SECRET_ACCESS_KEY` | Yes* | - | S3 secret key |
| `LOG_LEVEL` | No | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) |
\* Can also use `S3_ACCESS_KEY` and `S3_SECRET_KEY`
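A sketch of how the table above might translate into configuration loading. The `Config` class and `load_config` function are hypothetical, and checking the `AWS_*` names before the `S3_*` alternatives is an assumed precedence:
```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    endpoint_url: str
    bucket: str
    region: str
    access_key: str
    secret_key: str
    log_level: str


def _first(env, *names, default=None):
    """Return the first non-empty variable among names, or the default."""
    for name in names:
        value = env.get(name)
        if value:
            return value
    if default is None:
        raise KeyError("missing required variable: " + " or ".join(names))
    return default


def load_config(env=None) -> Config:
    env = os.environ if env is None else env
    return Config(
        endpoint_url=_first(env, "S3_ENDPOINT_URL"),
        bucket=_first(env, "S3_BUCKET"),
        region=_first(env, "S3_REGION", default="us-east-1"),
        # Assumed precedence: AWS_* first, then the S3_* alternatives.
        access_key=_first(env, "AWS_ACCESS_KEY_ID", "S3_ACCESS_KEY"),
        secret_key=_first(env, "AWS_SECRET_ACCESS_KEY", "S3_SECRET_KEY"),
        log_level=_first(env, "LOG_LEVEL", default="INFO"),
    )
```
Passing a plain dict as `env` makes the loader easy to test without touching the real process environment.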