beaky/CLAUDE.md
2026-03-25 19:47:10 +01:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

What This Project Does

Beaky is a CLI tool for verifying the truthfulness of sports betting tickets. It reads ticket URLs from an Excel file, classifies the bets on each ticket (via web scraping or OCR), then resolves each bet against a football statistics API to determine if the ticket is genuine.

Commands

# Install (with dev dependencies)
pip install -e ".[dev]"

# Install Playwright browser (required for link classifier and screenshotter)
playwright install chromium

# Run the CLI
beaky <mode> [--config config/application.yml] [--id <ticket_id>] [--classifier {link,img,both}] [--dump]

# Modes:
#   screen   - screenshot all ticket URLs to data/screenshots/<id>.png
#   parse    - print all links loaded from Excel
#   compare  - classify tickets and print bet comparison table
#   resolve  - classify via link classifier, then resolve bets against football API

# Run the REST API (default: http://0.0.0.0:8000)
beaky-api

# Run tests
pytest

# Lint
ruff check .

# Format
ruff format .

Architecture

Data flows through three pipeline stages (scanner, classifiers, resolver), which are driven by two front ends (CLI and REST API):

  1. Scanner (scanner/scanner.py) — Reads data/odkazy.xlsx and produces Link objects (id, url, date).

  2. Classifiers — Two independent classifiers both produce a Ticket (list of typed Bet objects):

    • Link classifier (link_classifier/classifier.py) — Launches a headless Chromium browser via Playwright, navigates to the ticket URL (a Czech Fortuna betting site), and parses the DOM using CSS selectors to extract bet details.
    • Image classifier (image_classifier/classifier.py) — Runs pytesseract OCR on screenshots in data/screenshots/, then uses regex to parse the raw text into bets. Block segmentation is driven by date-start and sport-prefix end triggers.
  3. Resolver (resolvers/resolver.py) — Takes a classified Ticket and resolves each bet's outcome (WIN/LOSE/VOID/UNKNOWN) by querying the api-sports.io football API. Matches fixtures using team name similarity (SequenceMatcher) and date proximity. Results are disk-cached in data/fixture_cache/ to avoid redundant API calls.

  4. CLI (cli.py) — Ties everything together. Handles --classifier and --dump flags; renders ANSI-colored comparison tables for side-by-side link-vs-image output.

  5. REST API (api/) — FastAPI app exposing a single endpoint. Runs the full pipeline (screenshot → both classifiers → resolve) for a given URL and returns the verdict. Classifiers and resolver are instantiated once at startup (app.state) and reused across requests.
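The fixture matching described for the resolver (team name similarity plus date proximity) can be sketched as follows. This is a minimal illustration, not the code in resolvers/resolver.py: the `Fixture` class, the function names, the 3-day window, the 0.6 threshold, and the way the two signals are combined are all assumptions. Only the use of `SequenceMatcher` comes from the description above.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from typing import Iterable, Optional


@dataclass(frozen=True)
class Fixture:
    """Hypothetical stand-in for one fixture returned by the football API."""
    fixture_id: int
    home: str
    away: str
    played_on: date


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score_fixture(fixture: Fixture, home: str, away: str,
                  bet_date: date, max_days: int = 3) -> float:
    """Average the two team-name similarities, then scale by date proximity."""
    days_off = abs((fixture.played_on - bet_date).days)
    if days_off > max_days:
        return 0.0  # too far from the date on the ticket (assumed window)
    names = (name_similarity(fixture.home, home)
             + name_similarity(fixture.away, away)) / 2
    return names * (1 - days_off / (max_days + 1))


def best_fixture(fixtures: Iterable[Fixture], home: str, away: str,
                 bet_date: date, threshold: float = 0.6) -> Optional[Fixture]:
    """Return the best-scoring fixture, or None if nothing clears the threshold."""
    scored = [(score_fixture(f, home, away, bet_date), f) for f in fixtures]
    if not scored:
        return None
    top_score, top = max(scored, key=lambda pair: pair[0])
    return top if top_score >= threshold else None
```

Returning None below the threshold lets the caller report UNKNOWN rather than resolving a bet against the wrong match.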

Core Domain Models (datamodels/ticket.py)

Bet is an abstract Pydantic dataclass with a resolve(MatchInfo) -> BetOutcome method. Concrete subtypes include: WinDrawLose, WinDrawLoseDouble, WinLose, BothTeamScored, GoalAmount, GoalHandicap, HalfTimeResult, HalfTimeDouble, HalfTimeFullTime, CornerAmount, TeamCornerAmount, MoreOffsides, Advance, UnknownBet. Adding a new bet type requires: a new subclass here, detection regex in both classifiers, and a resolve() implementation.
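To make the shape of the model concrete, here is a simplified sketch of an abstract `Bet` with one concrete subtype. It uses standard-library dataclasses so that it runs without dependencies, whereas the project uses Pydantic dataclasses. The field names (`team_home`, `team_away`, `goals_home`, `goals_away`) and the simplified `BothTeamScored` logic are assumptions for illustration, not the contents of datamodels/ticket.py.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BetOutcome(Enum):
    WIN = "WIN"
    LOSE = "LOSE"
    VOID = "VOID"
    UNKNOWN = "UNKNOWN"


@dataclass(frozen=True)
class MatchInfo:
    """Assumed minimal match result; the real MatchInfo likely carries more."""
    goals_home: Optional[int]
    goals_away: Optional[int]


@dataclass(frozen=True)
class Bet(ABC):
    team_home: str
    team_away: str

    @abstractmethod
    def resolve(self, match: MatchInfo) -> BetOutcome:
        """Decide the outcome of this bet for a finished match."""


@dataclass(frozen=True)
class BothTeamScored(Bet):
    """Simplified: wins when both teams scored at least one goal."""

    def resolve(self, match: MatchInfo) -> BetOutcome:
        if match.goals_home is None or match.goals_away is None:
            return BetOutcome.UNKNOWN  # score not available
        both = match.goals_home > 0 and match.goals_away > 0
        return BetOutcome.WIN if both else BetOutcome.LOSE
```

A new bet type would follow the same pattern: subclass `Bet`, add its own fields, and implement `resolve()`.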

REST API

Endpoint: POST /api/v1/resolve

{ "url": "<fortuna ticket url>", "debug": false }

Response includes verdict and per-bet outcome/fixture_id/confidence. With debug: true also returns raw link_ticket, img_ticket, and per-bet match_info.
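A client call against this endpoint might look like the sketch below, using only the standard library. The helper names and the `http://localhost:8000` base URL are assumptions. The path and the request body come from the description above.

```python
import json
import urllib.request


def build_resolve_request(ticket_url: str,
                          base_url: str = "http://localhost:8000",
                          debug: bool = False) -> urllib.request.Request:
    """Build (but do not send) the POST request for /api/v1/resolve."""
    payload = json.dumps({"url": ticket_url, "debug": debug}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/resolve",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def resolve_ticket(ticket_url: str, **kwargs) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_resolve_request(ticket_url, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the payload easy to inspect without a running server.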

Ticket ID is derived as md5(url) % 10^9 — stable across restarts. Screenshots are saved to data/screenshots/{ticket_id}.png.
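One way to compute such an ID is sketched below. The exact conversion is an assumption: the sketch encodes the URL as UTF-8 and reads the hex digest as an integer, and the real code may differ in either detail. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def ticket_id(url: str) -> int:
    """Derive a deterministic ID below 10^9 from the ticket URL."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return int(digest, 16) % 10**9


def screenshot_path(url: str, root: str = "data/screenshots") -> Path:
    """Location of the screenshot for this ticket."""
    return Path(root) / f"{ticket_id(url)}.png"
```

Because the ID depends only on the URL, the same ticket always maps to the same screenshot file, unlike Python's built-in `hash()`, which is salted per process for strings.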

Environment variables (all optional):

Var            Default
BEAKY_CONFIG   config/application.yml
BEAKY_HOST     0.0.0.0
BEAKY_PORT     8000
LOG_LEVEL      value of api.log_level in config/application.yml
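Reading these variables with their defaults could be done along the following lines. The `ServerSettings` class and `load_server_settings` function are hypothetical names, not part of the project. The variable names and defaults are the ones listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerSettings:
    config_path: str
    host: str
    port: int
    # None means "fall back to api.log_level from the config file".
    log_level: Optional[str]


def load_server_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Read the optional environment variables, applying documented defaults."""
    return ServerSettings(
        config_path=env.get("BEAKY_CONFIG", "config/application.yml"),
        host=env.get("BEAKY_HOST", "0.0.0.0"),
        port=int(env.get("BEAKY_PORT", "8000")),
        log_level=env.get("LOG_LEVEL"),
    )
```

Passing the environment in as a mapping makes the function easy to exercise in tests without touching the real process environment.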

OpenAPI docs available at /docs when the server is running.

Logging

All modules use the logging module rather than print(), with one exception: the CLI's user-facing output (cli.py) still uses print(). Resolver debug output (fixture matching, API calls) goes through _ansi.log(), which emits at DEBUG level with ANSI colors preserved. To see it, set api.log_level: DEBUG in config/application.yml or set the LOG_LEVEL=DEBUG environment variable.
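The behaviour described here can be approximated with the sketch below. The function names are hypothetical and this is not the real `_ansi.log()` signature. The sketch also assumes that the LOG_LEVEL environment variable takes precedence over the config value, and that an unrecognised level name falls back to INFO.

```python
import logging
import os
from typing import Mapping

CYAN = "\033[36m"
RESET = "\033[0m"


def resolve_log_level(config_level: str = "INFO",
                      env: Mapping[str, str] = os.environ) -> int:
    """Pick the numeric level: LOG_LEVEL env var first, then the config value."""
    name = env.get("LOG_LEVEL", config_level).upper()
    level = getattr(logging, name, None)
    return level if isinstance(level, int) else logging.INFO


def log_debug_colored(logger: logging.Logger, message: str,
                      color: str = CYAN) -> None:
    """Emit a DEBUG record whose message keeps its ANSI color codes."""
    logger.debug("%s%s%s", color, message, RESET)
```

Typical setup would be `logging.basicConfig(level=resolve_log_level("INFO"))` at startup. At any level above DEBUG, the colored resolver output is simply not emitted.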

Configuration

Config is loaded from config/application.yml into Pydantic dataclasses (Config, ScreenshotterConfig, ResolverConfig, ImgClassifierConfig, ApiConfig). Key fields:

  • path — path to the input Excel file
  • resolver.api_key — api-sports.io API key
  • resolver.league_map — maps Czech league name patterns to API league IDs (longest-match wins)
  • resolver.cache_path — disk cache directory (default: data/fixture_cache)
  • api.log_level — logging level for the API server (default: INFO)
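The "longest-match wins" rule for resolver.league_map can be illustrated as follows. This sketch assumes that patterns are plain case-insensitive substrings. The real matching may use a different pattern syntax, and the league IDs in the usage example are placeholders, not real API IDs.

```python
from typing import Mapping, Optional


def lookup_league_id(league_name: str,
                     league_map: Mapping[str, int]) -> Optional[int]:
    """Return the ID of the longest pattern contained in league_name."""
    name = league_name.lower()
    matches = [pattern for pattern in league_map if pattern.lower() in name]
    if not matches:
        return None
    # The longest pattern is the most specific one, so it takes priority.
    return league_map[max(matches, key=len)]
```

With `{"liga": 1, "1. liga": 2}`, the name "Česko / 1. liga" contains both patterns, and the longer one ("1. liga") decides the result. A name that contains only "liga" falls back to the shorter pattern.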

Bet text language

All bet type strings are in Czech (from the Fortuna betting platform). Regex patterns in both classifiers match Czech text (e.g. "Výsledek zápasu", "Počet gólů").
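As a rough illustration of matching Czech bet text, the sketch below maps the two example phrases to bet type names. The mapping itself is an assumption ("Výsledek zápasu" means "match result" and "Počet gólů" means "number of goals", so they are paired here with WinDrawLose and GoalAmount). The real classifiers may use different patterns and return Bet objects rather than strings. Unicode normalization is added because OCR output can encode diacritics in decomposed form.

```python
import re
import unicodedata


def _compile(phrase: str) -> "re.Pattern[str]":
    """Compile a case-insensitive pattern from an NFC-normalized phrase."""
    return re.compile(re.escape(unicodedata.normalize("NFC", phrase)),
                      re.IGNORECASE)


# Assumed phrase-to-type mapping, for illustration only.
BET_PATTERNS = [
    (_compile("Výsledek zápasu"), "WinDrawLose"),
    (_compile("Počet gólů"), "GoalAmount"),
]


def detect_bet_type(text: str) -> str:
    """Return the first matching bet type name, or 'UnknownBet'."""
    normalized = unicodedata.normalize("NFC", text)
    for pattern, bet_type in BET_PATTERNS:
        if pattern.search(normalized):
            return bet_type
    return "UnknownBet"
```

Falling back to "UnknownBet" mirrors the UnknownBet subtype in the domain model, so unrecognised text still yields a bet entry instead of being dropped.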