scripts/
Tools that generate pages on the site.
arxiv_daily.py — daily arXiv feed
Builds /Users/yong/Desktop/website/_pages/daily-arxiv.md: top recent arXiv submissions in astro-ph.HE, gr-qc, and nucl-th, ranked against your and your collaborators’ research.
How the ranking works
- Pulls every paper listed under your INSPIRE-HEP author record (Yong.Gao.1).
- Walks the author lists of those papers to find your top 10 most frequent collaborators (by INSPIRE BAI), then fetches each of their 15 most recent papers.
- The combined corpus (~146 papers as of writing) becomes your “interest profile.”
- If scikit-learn is installed, candidates are scored with TF-IDF + cosine similarity. The score for a candidate is 0.7 × max(similarity to any corpus paper) + 0.3 × mean(similarity to top-5 corpus papers). The max term rewards a close match to a single corpus paper; the mean term rewards broader overlap with several.
- Otherwise, falls back to a log-weighted vocabulary profile (stdlib only, slightly worse but always works).
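The collaborator tally and the score blend described above can be sketched in a few lines of stdlib Python. This is an illustration, not the code in arxiv_daily.py: the function names, the input shapes (a list of author-BAI lists, a list of precomputed cosine similarities), and the handling of an empty or short similarity list are all assumptions.

```python
from collections import Counter
from statistics import mean


def top_collaborators(papers, me, n=10):
    """Return the n BAIs that appear most often across `papers`, excluding `me`.

    `papers` is assumed to be a list of author lists, one list of BAIs per paper.
    """
    counts = Counter(bai for authors in papers for bai in authors if bai != me)
    return [bai for bai, _ in counts.most_common(n)]


def combined_score(similarities, top_k=5, w_max=0.7, w_mean=0.3):
    """Blend the best match with the mean of the top_k matches.

    `similarities` is assumed to hold one candidate's cosine similarity
    to every corpus paper. An empty list scores 0.0 (an assumption).
    """
    if not similarities:
        return 0.0
    top = sorted(similarities, reverse=True)[:top_k]
    return w_max * top[0] + w_mean * mean(top)


# Hypothetical author lists, only to show the tally.
papers = [
    ["Yong.Gao.1", "A.Example.1", "B.Example.1"],
    ["Yong.Gao.1", "A.Example.1"],
    ["Yong.Gao.1", "C.Example.1", "A.Example.1"],
]
most_frequent = top_collaborators(papers, me="Yong.Gao.1", n=1)

# Narrow match: one very similar corpus paper, the rest unrelated.
narrow = combined_score([0.9, 0.0, 0.0, 0.0, 0.0, 0.0])
# Broad match: moderately similar to many corpus papers.
broad = combined_score([0.5, 0.5, 0.5, 0.5, 0.5, 0.5])
```

With these weights a single strong match (narrow) still outranks a uniformly moderate one (broad), which is the behaviour the 0.7/0.3 split is meant to give.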
To enable TF-IDF (recommended):
python3 -m pip install --user scikit-learn
Save / hide / saved-list
Each candidate has a ★ Save and a Hide button.
- Save stores {id, title, authors, abstract, category, saved_at} in localStorage under yg-arxiv-saved. The full payload means the Saved page can render without re-fetching anything.
- Hide stores the arXiv ID in localStorage under yg-arxiv-hidden; hidden papers get display: none on subsequent renders. (The hide list is separate from the save list, so a hidden paper can also be saved if you want.)
Run it once
python3 scripts/arxiv_daily.py
Run it daily and auto-commit
scripts/arxiv_daily_auto.sh regenerates the page and, if anything changed, commits and pushes it. Wire it up via:
macOS launchd — ~/Library/LaunchAgents/com.yong.arxiv-daily.plist:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key><string>com.yong.arxiv-daily</string>
<key>ProgramArguments</key>
<array>
<string>/Users/yong/Desktop/website/scripts/arxiv_daily_auto.sh</string>
</array>
<key>WorkingDirectory</key><string>/Users/yong/Desktop/website</string>
<key>StartCalendarInterval</key>
<dict><key>Hour</key><integer>8</integer><key>Minute</key><integer>30</integer></dict>
<key>StandardOutPath</key><string>/tmp/arxiv-daily.log</string>
<key>StandardErrorPath</key><string>/tmp/arxiv-daily.log</string>
</dict>
</plist>
Activate:
launchctl load ~/Library/LaunchAgents/com.yong.arxiv-daily.plist
Plain cron alternative:
30 8 * * * /Users/yong/Desktop/website/scripts/arxiv_daily_auto.sh >> /tmp/arxiv-daily.log 2>&1
The wrapper runs the generator, checks git diff --quiet -- _pages/daily-arxiv.md, and only commits + pushes when there’s a real change.
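For readers who prefer Python to shell, the same regenerate, check, commit, push flow might look like the sketch below. It is not the contents of arxiv_daily_auto.sh: the function names, the commit message, and the injectable run parameter (there only so the logic can be exercised without touching a real repository) are assumptions. It relies on git diff --quiet exiting with 1 when the file differs and 0 when it does not.

```python
import subprocess

PAGE = "_pages/daily-arxiv.md"


def page_changed(run=subprocess.run):
    """True when the generated page differs from what git has recorded."""
    result = run(["git", "diff", "--quiet", "--", PAGE])
    return result.returncode == 1


def publish_if_changed(run=subprocess.run):
    """Regenerate the page, then commit and push only if it changed."""
    run(["python3", "scripts/arxiv_daily.py"], check=True)
    if not page_changed(run):
        return False  # nothing new today, leave the repository alone
    run(["git", "add", PAGE], check=True)
    # Hypothetical commit message; the real wrapper may use a different one.
    run(["git", "commit", "-m", "Update daily arXiv page"], check=True)
    run(["git", "push"], check=True)
    return True
```

Calling publish_if_changed() from the repository root would mirror one run of the wrapper.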
Tuning
Constants at the top of scripts/arxiv_daily.py:
| Knob | Default | What it does |
|---|---|---|
| INSPIRE_AUTHOR | Yong.Gao.1 | Your INSPIRE BAI |
| ARXIV_CATS | astro-ph.HE, gr-qc, nucl-th | Which arXiv categories to crawl |
| DAYS_BACK | 7 | Submission window (days) |
| TOP_N | 25 | How many papers to surface |
| TOP_COLLABORATORS | 10 | How many co-authors to include |
| PAPERS_PER_COLLAB | 15 | Most-recent paper cap per collaborator |
| STOPWORDS | (built-in) | Extend if generic words dominate scores |
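As a rough picture of what that block of constants might look like with the defaults from the table, here is a sketch. The Python types (a string, a list of strings, plain integers) are assumptions; check the top of scripts/arxiv_daily.py for the actual definitions before editing. STOPWORDS is left out because its concrete type is not stated here.

```python
# Defaults as listed in the table; the types are assumed for illustration.
INSPIRE_AUTHOR = "Yong.Gao.1"                      # your INSPIRE BAI
ARXIV_CATS = ["astro-ph.HE", "gr-qc", "nucl-th"]   # arXiv categories to crawl
DAYS_BACK = 7                                      # submission window, in days
TOP_N = 25                                         # papers to surface on the page
TOP_COLLABORATORS = 10                             # co-authors added to the corpus
PAPERS_PER_COLLAB = 15                             # most-recent papers fetched per co-author

# Upper bound on collaborator papers added to the interest profile.
MAX_COLLAB_PAPERS = TOP_COLLABORATORS * PAPERS_PER_COLLAB
```

The last line shows why the two collaborator knobs matter together: with the defaults, collaborators can contribute at most 150 papers before any de-duplication.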