Sources & methodology
Where this material comes from
Storgy is built on a small number of well-known open data sources and a couple of AI models. Here’s what each one provides and where it appears on the site.
Poem texts — Project Gutenberg
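The full text of every poem on Storgy comes from Project Gutenberg, a digital library of public-domain books that has been operating since 1971. Each poem page links back to the source ebook so you can check the transcription against the original. We classify public-domain status with the “life + 70” rule: a work enters the public domain 70 years after the end of the year its author died. That is the copyright term in the EU, the UK, and Australia (the Berne Convention itself only sets a minimum of life + 50), so anything you read in full here is legally clear in those jurisdictions. It is usually clear in the US too, where the status of older works depends mainly on publication date rather than on the author’s death.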
Poet metadata — Wikidata
Birth and death dates, nationality, period, and the structured links between poets come from Wikidata, the Wikimedia Foundation’s structured-data project and a sister project of Wikipedia. Many Wikidata statements carry a reference to their source. When a Wikidata entry conflicts with a poet’s Wikipedia page, we prefer Wikidata, because it is the dataset the wider linked-data web draws on.
Biography source text — Wikipedia
The first draft of every poet biography is built from Wikipedia text, which is available under the CC BY-SA license. We then rewrite each biography in our own voice before publishing. Storgy bios are not Wikipedia copies, but Wikipedia is the factual ground we work from.
Poet portraits — Wikimedia Commons
Where a poet has a public-domain or freely licensed portrait on Wikimedia Commons, we use it. Images are loaded directly from upload.wikimedia.org, so each one stays tied to its Commons file page, where the author and license are recorded.
The AI explanations — Anthropic Claude
Summaries, line-by-line analyses, theme essays, and Poem Analyzer responses are produced by Anthropic’s Claude Sonnet model. We ask it to return a fixed JSON structure so that every output has the same seven sections. Each result then passes through a second model that rewrites the stiff, repetitive phrasing AI text tends to fall into. The aim is short, concrete prose rather than the padded essays a single LLM call tends to return.
What we don’t use
No scraped content from rights-managed sources. No copy-pasted Wikipedia biographies. No material from third-party SEO content farms. Living poets, and poets who died within the last 70 years, aren’t in the corpus because their work is still under copyright. The Poem Analyzer can still process anything you paste in, but we won’t publish full texts we don’t have the rights to.
Found a problem?
Misattributed poem, wrong birth year, dubious analysis, broken link: please tell us. Email nikola@gulevski.com and we’ll fix it.