A web scraper for chess coaches.
 
 
 
Joshua Potter 10801b560c
Generalize in anticipation of merging the lichess scraper. (#1)
* Add a general `Scraper` class.

* Setup main as primary entrypoint.

* Abstract original scraper into scraper class.

* Add better logging and cleaner bash commands.

* Ensure exporting works.
2023-11-30 15:15:15 -07:00
.githooks Initial commit. 2023-11-27 13:09:40 -07:00
app Generalize in anticipation of merging the lichess scraper. (#1) 2023-11-30 15:15:15 -07:00
.envrc Initial commit. 2023-11-27 13:09:40 -07:00
.gitignore Add `mypy` and lock poetry dependencies. 2023-11-28 04:55:09 -07:00
README.md Generalize in anticipation of merging the lichess scraper. (#1) 2023-11-30 15:15:15 -07:00
default.nix Initial commit. 2023-11-27 13:09:40 -07:00
flake.lock Initial commit. 2023-11-27 13:09:40 -07:00
flake.nix Package into app for `nix build`. 2023-11-28 05:53:09 -07:00
poetry.lock Generalize in anticipation of merging the lichess scraper. (#1) 2023-11-30 15:15:15 -07:00
pyproject.toml Generalize in anticipation of merging the lichess scraper. (#1) 2023-11-30 15:15:15 -07:00

README.md

chesscom-scraper

Caution! Be careful running this script.

We intentionally delay each batch of requests by 3 seconds. Make sure any changes to this script still rate-limit appropriately.
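Below is a minimal sketch of batch-level rate limiting. It assumes requests are grouped into fixed-size batches with a pause between consecutive batches. `run_in_batches`, `batch_size`, and the injected `fetch`/`sleep` callables are hypothetical names, not the project's actual code. Only the 3-second delay comes from this README.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

DELAY_SECONDS = 3.0  # Matches the per-batch delay described above.


def run_in_batches(
    items: Iterable[T],
    fetch: Callable[[T], R],
    batch_size: int = 10,
    delay: float = DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Call `fetch` on each item, pausing `delay` seconds between batches.

    Hypothetical helper: `batch_size` and the injectable `sleep` are
    illustrative choices, not taken from the project.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pending = list(items)
    results: List[R] = []
    for start in range(0, len(pending), batch_size):
        batch = pending[start : start + batch_size]
        results.extend(fetch(item) for item in batch)
        # Pause only if another batch remains, so the last batch returns promptly.
        if start + batch_size < len(pending):
            sleep(delay)
    return results
```

Injecting `sleep` makes the pause easy to test. In real use the default `time.sleep` applies, so any change to `delay` directly changes how hard the target site is hit.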

Overview

This is a simple web scraper for chess.com coaches. The program finds all listed coaches and collects specific information about each of them (their profile, recent activity, and stats). Results are written to a newly created data directory with the following structure:

data
├── coach
│   ├── <username>
│   │   ├── <username>.html
│   │   ├── activity.json
│   │   └── stats.json
│   ├── ...
└── pages
    ├── <n>.txt
    ├── ...
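A small set of path helpers could produce the layout above. This sketch rests on several assumptions. The function names are hypothetical, `activity` and `stats` are assumed to be JSON-serializable objects, and each `pages/<n>.txt` is assumed to hold one coach username per line. The README does not specify the page-file format.

```python
import json
from pathlib import Path
from typing import Any, List


def coach_dir(root: Path, username: str) -> Path:
    """Return data/coach/<username>/ (hypothetical helper)."""
    return root / "coach" / username


def write_coach(
    root: Path, username: str, profile_html: str, activity: Any, stats: Any
) -> Path:
    """Write <username>.html, activity.json, and stats.json for one coach."""
    directory = coach_dir(root, username)
    directory.mkdir(parents=True, exist_ok=True)
    (directory / f"{username}.html").write_text(profile_html, encoding="utf-8")
    (directory / "activity.json").write_text(json.dumps(activity), encoding="utf-8")
    (directory / "stats.json").write_text(json.dumps(stats), encoding="utf-8")
    return directory


def write_page(root: Path, n: int, usernames: List[str]) -> Path:
    """Write pages/<n>.txt, assuming one username per line."""
    pages = root / "pages"
    pages.mkdir(parents=True, exist_ok=True)
    path = pages / f"{n}.txt"
    path.write_text("\n".join(usernames) + "\n", encoding="utf-8")
    return path


def read_page(root: Path, n: int) -> List[str]:
    """Read usernames back from pages/<n>.txt, skipping blank lines."""
    text = (root / "pages" / f"{n}.txt").read_text(encoding="utf-8")
    return [line for line in text.splitlines() if line.strip()]
```

Keeping one directory per coach means a rerun can skip coaches whose directory already exists. That is one plausible reason for this layout, though the README does not say so.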

Usage

If you have nix available, run:

$> nix run . -- --user-agent <your-email> -s chesscom

If not, ensure you have poetry on your machine and instead run the following:

$> poetry run python3 -m app -u <your-email> -s chesscom
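Both commands pass the same options, so the entrypoint presumably parses `--user-agent`/`-u` and `-s`. Below is a minimal `argparse` sketch of such a parser. It assumes `-s` selects the target site. The long form `--site`, the `SITES` list, and the `build_headers` helper are hypothetical. Sending your e-mail as the `User-Agent` header is presumably meant to let the site operator identify and contact whoever runs the scraper.

```python
import argparse
from typing import Dict, List, Optional

# Hypothetical: only "chesscom" appears in this README.
SITES = ["chesscom"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="app", description="Scrape coach listings.")
    parser.add_argument(
        "-u",
        "--user-agent",
        required=True,
        help="Contact e-mail sent as the User-Agent header.",
    )
    parser.add_argument(
        "-s",
        "--site",  # Hypothetical long form; the README only shows -s.
        required=True,
        choices=SITES,
        help="Which site to scrape.",
    )
    return parser


def build_headers(user_agent: str) -> Dict[str, str]:
    """HTTP headers attached to every request (hypothetical helper)."""
    return {"User-Agent": user_agent}


def main(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # A real entrypoint would dispatch to the scraper for args.site here.
    return args
```

For example, `main(["-u", "me@example.com", "-s", "chesscom"])` yields a namespace with `user_agent == "me@example.com"` and `site == "chesscom"`. Restricting `-s` with `choices` makes argparse reject unknown sites before any request is sent.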

Development

nix is used for development. The included flake.nix file automatically loads in Python (version 3.11.6) with packaging and dependency management handled by poetry (version 1.7.0). direnv can be used to launch a dev shell upon entering this directory (refer to .envrc). Otherwise run:

$> nix develop

Language Server

The python-lsp-server (version 1.9.0) is included in this flake, along with the python-lsp-black plugin for formatting. pylsp is expected to be configured to use McCabe, pycodestyle, and pyflakes. Refer to your editor's documentation for configuration details.

Formatting

Formatting depends on black (version 23.9.1). A pre-commit hook included in .githooks can be used to format all *.py files before each commit. Install it via:

$> git config --local core.hooksPath .githooks/
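The hook in .githooks is not reproduced here. As a rough illustration only, a pre-commit hook written in Python might collect the tracked `*.py` files and hand them to black. The function names are hypothetical, and the real hook may be a shell script that works differently (for example, on staged files only).

```python
import subprocess
from typing import Iterable, List


def python_files(paths: Iterable[str]) -> List[str]:
    """Keep only paths ending in .py (hypothetical helper)."""
    return [path for path in paths if path.endswith(".py")]


def tracked_files() -> List[str]:
    """List files tracked by git, NUL-separated to survive unusual filenames."""
    out = subprocess.run(
        ["git", "ls-files", "-z"], check=True, capture_output=True, text=True
    ).stdout
    return [path for path in out.split("\0") if path]


def format_python_files() -> int:
    """Run black over every tracked *.py file and return black's exit code."""
    files = python_files(tracked_files())
    if not files:
        return 0
    return subprocess.run(["black", *files]).returncode
```

Git runs a file named `pre-commit` in the configured hooks path and aborts the commit if it exits non-zero. So an executable hook ending in `raise SystemExit(format_python_files())` would block commits when black fails. black exits 0 after reformatting, so reformatted files still need to be re-staged. This sketch leaves that step to the developer.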

If running direnv, this hook is installed automatically when entering the directory.