Commit 10801b560c (Joshua Potter):
* Add a general `Scraper` class.
* Setup main as primary entrypoint.
* Abstract original scraper into scraper class.
* Add better logging and cleaner bash commands.
* Ensure exporting works.
Files: .githooks, app, .envrc, .gitignore, README.md, default.nix, flake.lock, flake.nix, poetry.lock, pyproject.toml
README.md
chesscom-scraper
Caution! Be careful running this script.
We intentionally delay each batch of requests by 3 seconds. Make sure any changes to this script keep requests appropriately rate-limited.
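A minimal sketch of the batching described above, in Python. The names `batched` and `fetch_in_batches`, the batch size of 10, and the injectable `fetch` and `sleep` callables are hypothetical illustrations, not the scraper's actual code; only the 3-second delay comes from this README.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# Delay taken from the README; everything else here is illustrative.
BATCH_DELAY_SECONDS = 3.0


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def fetch_in_batches(
    urls: Iterable[str],
    fetch: Callable[[str], R],
    batch_size: int = 10,  # hypothetical batch size; not specified by the README
    delay: float = BATCH_DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Fetch every URL, pausing `delay` seconds between consecutive batches."""
    results: List[R] = []
    for i, batch in enumerate(batched(urls, batch_size)):
        if i > 0:
            # Pause before every batch except the first.
            sleep(delay)
        results.extend(fetch(url) for url in batch)
    return results
```

Injecting `sleep` keeps the rate limit testable without actually waiting; in real use `fetch` would wrap an HTTP request.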
Overview
This is a simple web scraper for chess.com coaches. The program searches for all listed coaches and collects specific information about each of them (their profile, recent activity, and stats). The results are written to a newly created data directory with the following structure:
data
├── coach
│ ├── <username>
│ │ ├── <username>.html
│ │ ├── activity.json
│ │ └── stats.json
│ ├── ...
└── pages
├── <n>.txt
├── ...
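To consume this output, a sketch like the following walks the coach directory and parses each coach's JSON files. `load_coaches` is a hypothetical helper; the README does not document the schema of activity.json or stats.json, so the parsed values are returned unchanged.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_coaches(data_dir: Path) -> Dict[str, Dict[str, Any]]:
    """Map each coach username to its parsed activity.json and stats.json.

    Hypothetical helper assuming the data/coach/<username>/ layout shown
    above; JSON contents are returned as-is because their schema is not
    documented.
    """
    coaches: Dict[str, Dict[str, Any]] = {}
    coach_root = data_dir / "coach"
    for coach_dir in sorted(p for p in coach_root.iterdir() if p.is_dir()):
        entry: Dict[str, Any] = {}
        for name in ("activity", "stats"):
            path = coach_dir / f"{name}.json"
            # Skip files that an interrupted run never wrote.
            if path.exists():
                entry[name] = json.loads(path.read_text(encoding="utf-8"))
        coaches[coach_dir.name] = entry
    return coaches
```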
Usage
If you have nix available, run:
$> nix run . -- --user-agent <your-email> -s chesscom
Otherwise, make sure poetry is installed and run the following instead:
$> poetry run python3 -m app -u <your-email> -s chesscom
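Both commands pass a user agent (your email) and a site selector. A hypothetical argparse sketch of how such a command line could be parsed is shown below; the long option `--site`, the `choices` list, and the assumption that the email ends up in a User-Agent header are guesses, not taken from the app's source.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical parser mirroring the -u/--user-agent and -s flags."""
    parser = argparse.ArgumentParser(prog="app")
    parser.add_argument(
        "-u",
        "--user-agent",
        required=True,
        help="Contact email (assumed to be sent as the User-Agent header).",
    )
    parser.add_argument(
        "-s",
        "--site",  # hypothetical long name; the README only shows -s
        required=True,
        choices=["chesscom"],
        help="Which site's scraper to run.",
    )
    return parser.parse_args(argv)
```

Restricting `-s` with `choices` makes argparse reject unknown sites before any request is sent.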
Development
nix is used for development. The included flake.nix file automatically loads Python (version 3.11.6), with packaging and dependency management handled by poetry (version 1.7.0). direnv can be used to launch a dev shell upon entering this directory (refer to .envrc). Otherwise, launch one via:
$> nix develop
Language Server
The python-lsp-server (version 1.9.0) is included in this flake, along with the python-lsp-black plugin for formatting. pylsp is expected to be configured to use McCabe, pycodestyle, and pyflakes. Refer to your editor's documentation for configuration details.
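As a sketch, assuming an editor that forwards settings to pylsp under the `pylsp.plugins.*` keys, the three linters above plus python-lsp-black can be enabled with a settings object like this one, written as a Python dict and serialized to JSON. How these settings reach the server varies by editor.

```python
import json

# Illustrative settings object; editors typically pass this to pylsp
# (e.g. via workspace/didChangeConfiguration) under the "pylsp" key.
PYLSP_SETTINGS = {
    "pylsp": {
        "plugins": {
            "mccabe": {"enabled": True},
            "pycodestyle": {"enabled": True},
            "pyflakes": {"enabled": True},
            "black": {"enabled": True},  # provided by python-lsp-black
        }
    }
}

if __name__ == "__main__":
    # Print as JSON for editors that take a JSON settings file.
    print(json.dumps(PYLSP_SETTINGS, indent=2))
```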
Formatting
Formatting relies on the black tool (version 23.9.1). A pre-commit hook included in .githooks can be used to format all *.py files prior to commit. Install it via:
$> git config --local core.hooksPath .githooks/
If you use direnv, this hook is installed automatically upon entering the directory.
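The hook's contents are not reproduced here. As a hypothetical Python stand-in, a pre-commit hook that runs black over the staged *.py files and re-stages them could look like this; the actual hook in .githooks may differ (for example, it may be a shell script that formats every *.py file).

```python
import subprocess
import sys
from typing import Iterable, List


def python_files(paths: Iterable[str]) -> List[str]:
    """Keep only paths ending in .py."""
    return [p for p in paths if p.endswith(".py")]


def staged_files() -> List[str]:
    """List added, copied, or modified files in the index (deletions excluded)."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def main() -> int:
    # Hypothetical stand-in for the hook shipped in .githooks.
    files = python_files(staged_files())
    if not files:
        return 0
    # Format in place, then re-stage so the commit contains the formatted code.
    subprocess.run(["black", *files], check=True)
    subprocess.run(["git", "add", *files], check=True)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Re-staging with `git add` also stages any unstaged edits in a partially staged file; a more careful hook would check for that first.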