Formulaic Language and Fixed Phrase Detection
Biblical Hebrew and Greek contain hundreds of formulaic expressions — fixed phrases with specialized theological or rhetorical functions that are more than the sum of their parts.
| Category | Examples |
|---|---|
| Prophetic oracle | כֹּה אָמַר יְהוָה (Thus says YHWH), נְאֻם יְהוָה (oracle of YHWH) |
| Prophetic narrative | וַיְהִי דְבַר יְהוָה (the word of YHWH came) |
| Covenantal | כָּרַת בְּרִית (cut a covenant) |
| Doxological | בָּרוּךְ יְהוָה (blessed be YHWH), הַלְלוּיָהּ |
| Dominical (NT) | ἀμὴν λέγω ὑμῖν (truly I say to you) |
| Citation (NT) | γέγραπται (it is written) |
| Epistolary (NT) | χάρις ὑμῖν καὶ εἰρήνη (grace and peace to you) |
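Method: All pattern matching runs on the MACULA lemma columns, so every inflected form of a word matches the same pattern element.
The '*' wildcard matches exactly one lemma of any value. Patterns are verse-boundary-safe: a match never spans two verses.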
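The matching rules above can be sketched in a few lines of plain Python. This is a minimal illustration, not the `bible_grammar` implementation: the function name `match_pattern`, the `(verse_ref, lemma)` input shape, and the transliterated toy lemmas are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

WILDCARD = "*"  # stands for exactly one lemma of any value


def match_pattern(
    tokens: Iterable[Tuple[str, str]],
    pattern: Sequence[str],
) -> List[Tuple[str, List[str]]]:
    """Return (verse_ref, matched_lemmas) for every hit of `pattern`.

    `tokens` is a hypothetical stream of (verse_ref, lemma) pairs in text order.
    """
    # Group lemmas by verse so that a window can never straddle two verses.
    verses: Dict[str, List[str]] = {}
    for ref, lemma in tokens:
        verses.setdefault(ref, []).append(lemma)

    hits = []
    n = len(pattern)
    for ref, lemmas in verses.items():
        # Slide a window of pattern length over the lemmas of one verse.
        for i in range(len(lemmas) - n + 1):
            window = lemmas[i:i + n]
            if all(p == WILDCARD or p == w for p, w in zip(pattern, window)):
                hits.append((ref, window))
    return hits


# Toy data with transliterated stand-ins for lemmas (not real MACULA rows).
tokens = [
    ("v1", "ko"), ("v1", "amar"), ("v1", "yhwh"),
    ("v2", "neum"), ("v2", "yhwh"), ("v2", "ko"),
    ("v3", "amar"), ("v3", "yhwh"),
]

exact = match_pattern(tokens, ["ko", "amar", "yhwh"])
loose = match_pattern(tokens, ["*", "yhwh"])
print(exact)
print(len(loose))
```

In the toy data, the `ko` that ends `v2` is followed by `amar yhwh` in `v3`, but that sequence is not reported because the window is confined to one verse.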
Related notebooks:
- both/lexicon/collocation_and_phrase.ipynb: PMI/G² collocation statistics
- both/lexicon/concordance.ipynb: general concordance generation
import sys
sys.path.insert(0, '../../../src')
from bible_grammar import (
    HEBREW_FORMULAS, GREEK_FORMULAS,
    ot_formula_frequency, nt_formula_frequency,
    ot_formula_search, nt_formula_search,
    formula_book_distribution,
    ot_formula_profile, nt_formula_profile,
    print_formula_concordance, print_formula_book_distribution,
    print_ot_formula_profile, print_nt_formula_profile,
    print_ot_top_ngrams, print_nt_top_ngrams,
    formula_book_chart, formula_chapter_chart,
)
import pandas as pd
1. Hebrew N-gram Frequency: Top Fixed Phrases
Verse-boundary-safe n-gram extraction over MACULA lemmas. Bigrams and trigrams reveal the most repeated lemma pairs and triples.
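As a rough sketch of what verse-boundary-safe n-gram counting involves, the example below counts lemma n-grams one verse at a time, so no n-gram joins the end of one verse to the start of the next. The helper `verse_ngrams` and the toy token stream are hypothetical; the actual logic behind `print_ot_top_ngrams` may differ.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter


def verse_ngrams(tokens, n, min_count=1):
    """Count lemma n-grams within each verse and return them by frequency.

    `tokens` is an assumed stream of (verse_ref, lemma) pairs in text order.
    """
    counts = Counter()
    # groupby yields one run of consecutive tokens per verse reference.
    for _, group in groupby(tokens, key=itemgetter(0)):
        lemmas = [lemma for _, lemma in group]
        # zip over n staggered copies of the verse to build its n-grams.
        counts.update(zip(*(lemmas[i:] for i in range(n))))
    return [(gram, c) for gram, c in counts.most_common() if c >= min_count]


# Toy data with transliterated stand-ins for lemmas (not real MACULA rows).
toy_tokens = [
    ("v1", "ko"), ("v1", "amar"), ("v1", "yhwh"),
    ("v2", "ko"), ("v2", "amar"), ("v2", "yhwh"),
    ("v3", "neum"), ("v3", "yhwh"),
]

bigrams = verse_ngrams(toy_tokens, 2, min_count=2)
print(bigrams)
```

The pair `("yhwh", "ko")` never appears in the counts, even though those lemmas are adjacent in the flat stream, because they fall on either side of a verse boundary.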
# Top-30 Hebrew bigrams
print_ot_top_ngrams(2, min_count=20, top_n=30)
# Top-30 Hebrew trigrams — fixed phrases
print_ot_top_ngrams(3, min_count=10, top_n=30)
# Psalms-specific bigrams — what formulaic language dominates poetry?
print_ot_top_ngrams(2, min_count=5, top_n=20, book='Psa')
2. Prophetic Formula Analysis
Three formulas anchor Israelite prophetic speech: the messenger formula כֹּה אָמַר יְהוָה, the oracle formula נְאֻם יְהוָה, and the word-event formula וַיְהִי דְבַר יְהוָה.
# Full profile of all curated Hebrew formulas
print_ot_formula_profile()
# כֹּה אָמַר יְהוָה — "Thus says YHWH" — distribution by book
ko_amar = HEBREW_FORMULAS['ko_amar_yhwh']['pattern']
print_formula_book_distribution(ko_amar, lang='H')
formula_book_chart(ko_amar, lang='H')
# Concordance for כֹּה אָמַר יְהוָה — first 20 occurrences
print_formula_concordance(ko_amar, lang='H', max_rows=20)
# נְאֻם יְהוָה — "oracle of YHWH"
neum_yhwh = HEBREW_FORMULAS['neum_yhwh']['pattern']
print_formula_book_distribution(neum_yhwh, lang='H')
# Compare the three prophetic formula distributions side by side
formulas_to_compare = ['ko_amar_yhwh', 'neum_yhwh', 'devar_yhwh']
prophetic_books = ['Isa', 'Jer', 'Eze', 'Hos', 'Joe', 'Amo', 'Oba', 'Jon', 'Mic', 'Nah', 'Hab', 'Zep', 'Hag', 'Zec', 'Mal']
rows = []
for fkey in formulas_to_compare:
    pat = HEBREW_FORMULAS[fkey]['pattern']
    gloss = HEBREW_FORMULAS[fkey]['gloss']
    df = formula_book_distribution(pat, lang='H')
    for _, row in df.iterrows():
        rows.append({'formula': gloss, 'book': row['book'], 'count': row['count']})
cmp = pd.DataFrame(rows)
pivot = cmp.pivot_table(index='book', columns='formula', values='count', fill_value=0)
pivot = pivot.reindex([b for b in prophetic_books if b in pivot.index])
pivot
# Chapter distribution of כֹּה אָמַר יְהוָה in Jeremiah
formula_chapter_chart('Jer', 'ko_amar_yhwh', lang='H')
# Chapter distribution in Ezekiel
formula_chapter_chart('Eze', 'ko_amar_yhwh', lang='H')
3. Blessing / Curse Formulas
Deuteronomy contains the most concentrated collection of curse formulas (Deut 27–28). Psalms and wisdom literature dominate blessing vocabulary.
# בָּרוּךְ יְהוָה — "Blessed be YHWH" — doxological
barukh = HEBREW_FORMULAS['barukh_yhwh']['pattern']
print_formula_book_distribution(barukh, lang='H')
print_formula_concordance(barukh, lang='H', max_rows=20)
# אָרוּר — curse formula, single token
arur_pat = HEBREW_FORMULAS['arur']['pattern']
print_formula_book_distribution(arur_pat, lang='H')
print_formula_concordance(arur_pat, lang='H', max_rows=15)
# כָּרַת בְּרִית — "cut a covenant"
karat_pat = HEBREW_FORMULAS['karat_berit']['pattern']
print_formula_book_distribution(karat_pat, lang='H')
print_formula_concordance(karat_pat, lang='H', max_rows=20)
4. Greek NT Formulas
# Full profile of curated Greek formulas
print_nt_formula_profile()
# Top Greek bigrams
print_nt_top_ngrams(2, min_count=20, top_n=30)
# ἀμὴν λέγω ὑμῖν — "Truly I say to you" — Synoptic
amen_pat = GREEK_FORMULAS['amen_lego_hymin']['pattern']
print_formula_book_distribution(amen_pat, lang='G')
print_formula_concordance(amen_pat, lang='G', max_rows=20)
formula_book_chart(amen_pat, lang='G')
# ἀμὴν ἀμὴν λέγω ὑμῖν — Johannine double amen
amen2_pat = GREEK_FORMULAS['amen_amen_lego_hymin']['pattern']
print_formula_book_distribution(amen2_pat, lang='G')
print_formula_concordance(amen2_pat, lang='G', max_rows=25)
# γέγραπται — "It is written" (all forms of γράφω)
gegraptai = GREEK_FORMULAS['gegraptai']['pattern']
print_formula_book_distribution(gegraptai, lang='G')
# χάρις ὑμῖν καὶ εἰρήνη — epistolary grace-peace greeting
charis_pat = GREEK_FORMULAS['charis_kai_eirene']['pattern']
print_formula_concordance(charis_pat, lang='G', max_rows=20)
5. Formula Density: Which Books Are Most Formulaic?
Normalize formula counts by total tokens to compare books fairly.
import sys
sys.path.insert(0, '../../../src')  # set the path before importing from bible_grammar
from bible_grammar._utils import load_ot_data
# Prophetic formula density per OT book
ot_df = load_ot_data()
ot_h = ot_df[ot_df['lang'] == 'H']
book_tokens = ot_h.groupby('book').size().rename('tokens')
# Per-book counts for each of the three prophetic formulas
ko_dist = formula_book_distribution(HEBREW_FORMULAS['ko_amar_yhwh']['pattern'], lang='H').set_index('book')['count']
neum_dist = formula_book_distribution(HEBREW_FORMULAS['neum_yhwh']['pattern'], lang='H').set_index('book')['count']
devar_dist = formula_book_distribution(HEBREW_FORMULAS['devar_yhwh']['pattern'], lang='H').set_index('book')['count']
density = pd.DataFrame({
    'ko_amar': ko_dist,
    'neum': neum_dist,
    'devar': devar_dist,
    'tokens': book_tokens
}).fillna(0)
density['prophetic_total'] = density[['ko_amar', 'neum', 'devar']].sum(axis=1)
density['density_per_1k'] = (density['prophetic_total'] / density['tokens'] * 1000).round(2)
density.sort_values('density_per_1k', ascending=False).head(20)[['tokens', 'ko_amar', 'neum', 'devar', 'prophetic_total', 'density_per_1k']]
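The normalization is simply count / tokens × 1000. The sketch below checks that arithmetic on invented numbers: `BookA`, `BookB`, `BookC`, and their counts are placeholders, not corpus results. It fills missing values only in the count column, on the assumption that a missing token total should surface as missing rather than be turned into a division by zero.

```python
import pandas as pd

# Invented toy numbers: two books with the same raw count but different lengths.
counts = pd.Series({"BookA": 30, "BookB": 30}, name="count")
tokens = pd.Series({"BookA": 10_000, "BookB": 60_000, "BookC": 5_000}, name="tokens")

# Series with different indexes are aligned on book name; gaps become NaN.
table = pd.DataFrame({"count": counts, "tokens": tokens})

# A book with no hits has a count of zero; token totals are left untouched.
table["count"] = table["count"].fillna(0)

# Hits per 1,000 tokens.
table["density_per_1k"] = (table["count"] / table["tokens"] * 1000).round(2)
print(table.sort_values("density_per_1k", ascending=False))
```

Here the two books share a raw count of 30, but the shorter one has six times the density, which is the comparison raw counts would hide.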
6. Variation Detection
Biblical formulas often appear in variant forms. Placing a '*' wildcard
between the core lemmas of a pattern allows one intervening lemma per wildcard.
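One way to search for variants systematically is to generate every pattern that places zero up to some maximum number of wildcards between adjacent core lemmas. The helper below, `gap_variants`, is a hypothetical illustration and not part of `bible_grammar`; it only builds pattern lists in the same list-of-lemmas shape the notebook uses.

```python
from itertools import product


def gap_variants(core, max_gap=1, wildcard="*"):
    """Return every pattern with 0..max_gap wildcards between adjacent core lemmas."""
    if not core:
        return []
    slots = len(core) - 1  # number of junctions between core lemmas
    variants = []
    # Each junction independently receives between 0 and max_gap wildcards.
    for gaps in product(range(max_gap + 1), repeat=slots):
        pattern = [core[0]]
        for gap, lemma in zip(gaps, core[1:]):
            pattern.extend([wildcard] * gap)
            pattern.append(lemma)
        variants.append(pattern)
    return variants


# Placeholder lemmas; real patterns would use MACULA lemma strings.
variants = gap_variants(["a", "b", "c"], max_gap=1)
for v in variants:
    print(v)
```

Each generated list could then be passed to a search function one at a time, assuming that function accepts '*' inside a pattern as described in the Method note. The number of variants grows as (max_gap + 1) raised to the number of junctions, so small gap limits are advisable.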
# Find variants of the divine-presence formula: YHWH was with X
# Matches the lemmas הָיָה + יְהוָה + עִם as a contiguous sequence; add '*' between them to allow an intervening lemma
presence_variants = ot_formula_search(['הָיָה', 'יְהוָה', 'עִם'])
print(f"'YHWH was with (X)' occurrences: {len(presence_variants)}")
presence_variants[['ref', 'context']].head(20)
# Wildcard: any lemma followed by צָוָה + יְהוָה, e.g. "as YHWH commanded"
tsivah = ot_formula_search(['*', 'צָוָה', 'יְהוָה'])
print(f"'(X) YHWH commanded' / 'as YHWH commanded': {len(tsivah)} occurrences")
tsivah[['ref', 'context']].head(15)
# NT: amen variations — compare single vs. double
single = nt_formula_search(GREEK_FORMULAS['amen_lego_hymin']['pattern'])
double = nt_formula_search(GREEK_FORMULAS['amen_amen_lego_hymin']['pattern'])
print(f"Single amen (Synoptic): {len(single)} occurrences")
print(f"Double amen (Johannine): {len(double)} occurrences")
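One caveat when comparing the two counts: the last three lemmas of the double-amen pattern are themselves a single-amen match, so a plain lemma search for the single form may also report the Johannine occurrences. Whether `nt_formula_search` already excludes such embedded hits is not stated here. The sketch below shows one way to separate them, assuming hits can be reduced to `(verse_ref, start_index)` tuples; that representation and the helper `standalone_hits` are hypothetical.

```python
def standalone_hits(single_hits, double_hits):
    """Drop single-amen hits that are the tail of a double-amen hit.

    Hits are assumed to be (verse_ref, start_index) tuples. A double-amen
    match starting at index i contains a single-amen match starting at i + 1.
    """
    embedded = {(ref, start + 1) for ref, start in double_hits}
    return [hit for hit in single_hits if hit not in embedded]


# Toy positions, not real search results.
toy_single = [("v1", 0), ("v2", 4), ("v3", 1)]
toy_double = [("v3", 0)]

independent = standalone_hits(toy_single, toy_double)
print(independent)
```

If the two raw counts from the search differ by exactly the double-amen total after this filtering, the single-amen figure was including the embedded matches.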