A web scraper for chess coaches.
 
 
 
Joshua Potter 99c89a3a6d Restructure and add documentation. Require specifying user-agent. 2023-11-27 20:06:42 -07:00
.githooks Initial commit. 2023-11-27 13:09:40 -07:00
.envrc Initial commit. 2023-11-27 13:09:40 -07:00
.gitignore Break coach listing download per-page. 2023-11-27 13:32:56 -07:00
README.md Restructure and add documentation. Require specifying user-agent. 2023-11-27 20:06:42 -07:00
default.nix Initial commit. 2023-11-27 13:09:40 -07:00
flake.lock Initial commit. 2023-11-27 13:09:40 -07:00
flake.nix Initial commit. 2023-11-27 13:09:40 -07:00
main.py Restructure and add documentation. Require specifying user-agent. 2023-11-27 20:06:42 -07:00
poetry.lock Initial commit. 2023-11-27 13:09:40 -07:00
pyproject.toml Initial commit. 2023-11-27 13:09:40 -07:00

README.md

chesscom-scraper

Caution! Be careful running this script.

We intentionally delay each batch of requests by 3 seconds. Make sure any adjustments to this script keep requests appropriately rate-limited.
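As a rough illustration of the kind of rate limiting meant here, the sketch below pauses between batches of work. It is not taken from main.py: the names `batched` and `run_rate_limited`, the `batch_size` default, and the injectable `sleep` parameter are all hypothetical, and only the 3-second delay comes from this README.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

DELAY_SECONDS = 3  # the per-batch delay described in this README


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_rate_limited(
    items: Iterable[T],
    fetch: Callable[[T], R],
    batch_size: int = 10,  # hypothetical default, not from main.py
    delay: float = DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Apply `fetch` to every item, sleeping `delay` seconds between batches."""
    results: List[R] = []
    for index, batch in enumerate(batched(items, batch_size)):
        if index > 0:
            # Pause before every batch except the first.
            sleep(delay)
        results.extend(fetch(item) for item in batch)
    return results
```

Passing `sleep` as a parameter makes it possible to exercise the batching logic without actually waiting. If the batch size or the work done per batch changes, the delay should be revisited so the overall request rate does not increase.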

Overview

This is a simple web scraper for chess.com coaches. Running:

$> python3 main.py --user-agent <your-email>

will query chess.com for all listed coach usernames as well as specific information about each corresponding coach (their profile, recent activity, and stats). The results are written to a newly created data directory with the following structure:

data
├── coach
│   ├── <username>
│   │   ├── <username>.html
│   │   ├── activity.json
│   │   └── stats.json
│   ├── ...
└── pages
    ├── <n>.txt
    ├── ...
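For downstream processing, the layout above can be read back with a few lines of Python. This is a minimal sketch rather than part of the scraper: `list_coaches` and `load_coach` are hypothetical helper names, and it assumes that activity.json and stats.json contain valid JSON, without assuming anything about their schema.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def list_coaches(data_dir: str) -> List[str]:
    """Return the usernames that have a directory under <data_dir>/coach."""
    coach_root = Path(data_dir) / "coach"
    return sorted(p.name for p in coach_root.iterdir() if p.is_dir())


def load_coach(data_dir: str, username: str) -> Dict[str, Any]:
    """Load the three files saved for a single coach."""
    coach_dir = Path(data_dir) / "coach" / username
    return {
        # Raw profile page, stored as <username>.html
        "profile_html": (coach_dir / f"{username}.html").read_text(encoding="utf-8"),
        # Parsed JSON; the structure of these files is not assumed here
        "activity": json.loads((coach_dir / "activity.json").read_text(encoding="utf-8")),
        "stats": json.loads((coach_dir / "stats.json").read_text(encoding="utf-8")),
    }
```

For example, `[load_coach("data", name) for name in list_coaches("data")]` would collect every downloaded coach into a list of dictionaries.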

Development

This script was written using Python (version 3.11.6). Packaging and dependency management relies on poetry (version 1.7.0). direnv can be used to launch a dev shell upon entering this directory (refer to .envrc). Otherwise, launch one manually via:

$> nix develop

Language Server

The python-lsp-server (version 1.9.0) is included in this flake, along with the python-lsp-black plugin for formatting purposes. pylsp is expected to be configured to use McCabe, pycodestyle, and pyflakes. Refer to your editor's documentation for configuration details.

Formatting

Formatting depends on the black (version 23.9.1) tool. A pre-commit hook is included in .githooks that can be used to format all *.py files prior to each commit. Install via:

$> git config --local core.hooksPath .githooks/

If running direnv, this hook is installed automatically when entering the directory.