# chesscom-scraper
**Caution! Be careful running this script.**
We intentionally delay each batch of requests by 3 seconds. Make sure any
adjustments to this script keep requests appropriately rate-limited.
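
As a rough illustration of the kind of rate limiting described above, the
sketch below pauses between batches of work. It is not the code in `main.py`:
the names `batched`, `run_rate_limited`, and `fetch`, and the default batch
size of 10, are assumptions made for this example. Only the 3 second delay
comes from this README.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

BATCH_DELAY_SECONDS = 3  # the delay stated in this README


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_rate_limited(
    items: Iterable[T],
    fetch: Callable[[T], R],  # hypothetical per-item request function
    batch_size: int = 10,  # assumed value, not taken from main.py
    delay: float = BATCH_DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Apply `fetch` to every item, sleeping `delay` seconds between batches."""
    results: List[R] = []
    for index, batch in enumerate(batched(items, batch_size)):
        if index > 0:
            sleep(delay)  # pause between batches, not before the first one
        results.extend(fetch(item) for item in batch)
    return results
```

Passing `sleep` as a parameter keeps the delay easy to stub out in tests
without weakening it in real runs.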
## Overview
This is a simple web scraper for [chess.com](https://www.chess.com/coaches)
coaches. Running:
```bash
$> python3 main.py --user-agent <your-email>
```
will query [chess.com](https://www.chess.com) for all listed coach usernames as
well as specific information about each corresponding coach (their profile,
recent activity, and stats). The results are written to a newly created `data`
directory with the following structure:
```
data
├── coach
│   ├── <username>
│   │   ├── <username>.html
│   │   ├── activity.json
│   │   └── stats.json
│   ├── ...
└── pages
    ├── <n>.txt
    ├── ...
```
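
Once a run has finished, the layout above can be read back with a few lines of
Python. The following is a minimal sketch, assuming only the directory
structure shown; the helper names `read_json` and `load_coaches` are
hypothetical, and nothing is assumed about the fields inside `activity.json`
or `stats.json`.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def read_json(path: Path) -> Optional[Any]:
    """Return the parsed JSON at `path`, or None if the file is missing."""
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def load_coaches(data_dir: str = "data") -> Dict[str, Dict[str, Any]]:
    """Map each username under `<data_dir>/coach` to its scraped files."""
    coach_root = Path(data_dir) / "coach"
    coaches: Dict[str, Dict[str, Any]] = {}
    for coach_dir in sorted(coach_root.iterdir()):
        if not coach_dir.is_dir():
            continue
        username = coach_dir.name
        coaches[username] = {
            # Path to the saved profile page; left unparsed here.
            "profile_html": coach_dir / f"{username}.html",
            "activity": read_json(coach_dir / "activity.json"),
            "stats": read_json(coach_dir / "stats.json"),
        }
    return coaches
```

For example, `load_coaches("data")` returns a dictionary keyed by username,
with `None` in place of any JSON file that was not written.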
## Development
This script was written using Python (version 3.11.6). Packaging and dependency
management rely on [poetry](https://python-poetry.org/) (version 1.7.0).
[direnv](https://direnv.net/) can be used to launch a dev shell upon entering
this directory (refer to `.envrc`). Otherwise run via:
```bash
$> nix develop
```
### Language Server
The [python-lsp-server](https://github.com/python-lsp/python-lsp-server)
(version 1.9.0) is included in this flake, along with the
[python-lsp-black](https://github.com/python-lsp/python-lsp-black)
plugin for formatting purposes. `pylsp` is expected to be configured to use
[McCabe](https://github.com/PyCQA/mccabe), [pycodestyle](https://pycodestyle.pycqa.org/en/latest/),
and [pyflakes](https://github.com/PyCQA/pyflakes). Refer to your editor for
configuration details.
### Formatting
Formatting depends on the [black](https://black.readthedocs.io/en/stable/index.html)
(version 23.9.1) tool. A `pre-commit` hook is included in `.githooks` that can
be used to format all `*.py` files prior to commit. Install via:
```bash
$> git config --local core.hooksPath .githooks/
```
If running [direnv](https://direnv.net/), this hook is installed automatically
when entering the directory.