Formulaic Language and Fixed Phrase Detection

Biblical Hebrew and Greek contain hundreds of formulaic expressions — fixed phrases with specialized theological or rhetorical functions that are more than the sum of their parts.

Category	Examples
Prophetic oracle	כֹּה אָמַר יְהוָה (Thus says YHWH), נְאֻם יְהוָה (oracle of YHWH)
Prophetic narrative	וַיְהִי דְבַר יְהוָה (the word of YHWH came)
Covenantal	כָּרַת בְּרִית (cut a covenant)
Doxological	בָּרוּךְ יְהוָה (blessed be YHWH), הַלְלוּיָהּ
Dominical (NT)	ἀμὴν λέγω ὑμῖν (truly I say to you)
Citation (NT)	γέγραπται (it is written)
Epistolary (NT)	χάρις ὑμῖν καὶ εἰρήνη (grace and peace to you)

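Method: All pattern matching uses MACULA lemma columns, so inflected forms of the same word match the same pattern element. The '*' wildcard matches any single lemma. Patterns are verse-boundary-safe: a match never spans two verses.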
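The matching rule described above can be illustrated with a minimal sketch. This is not the `bible_grammar` implementation; `find_pattern` is a hypothetical helper and the verse data is a hand-typed, transliterated toy sample, used only to show how a single-lemma wildcard and per-verse windows behave.

```python
WILDCARD = "*"

def find_pattern(verses, pattern):
    """Return (ref, start_index) for each match of `pattern` within one verse.

    `verses` maps a verse reference to its list of lemmas (hypothetical layout).
    """
    n = len(pattern)
    hits = []
    for ref, lemmas in verses.items():
        # Windows are built per verse, so a match can never span two verses.
        for i in range(len(lemmas) - n + 1):
            window = lemmas[i:i + n]
            # A '*' element accepts exactly one lemma, whatever it is.
            if all(p == WILDCARD or p == lemma for p, lemma in zip(pattern, window)):
                hits.append((ref, i))
    return hits

# Toy data: transliterated placeholders, not MACULA output.
verses = {
    "Jer 2:2": ["halak", "qara", "ko", "amar", "yhwh"],
    "Jer 2:3": ["qodesh", "yisrael", "yhwh"],
    "Amo 1:3": ["ko", "amar", "yhwh", "al"],
}

print(find_pattern(verses, ["ko", "amar", "yhwh"]))  # exact sequence
print(find_pattern(verses, ["ko", "*", "yhwh"]))     # one open slot
print(find_pattern(verses, ["yhwh", "qodesh"]))      # would need to cross a verse boundary
```

The last call returns no hits even though the two lemmas are adjacent in running text, because they sit on either side of a verse boundary.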

Related notebooks:

both/lexicon/collocation_and_phrase.ipynb — PMI/G² collocation statistics
both/lexicon/concordance.ipynb — general concordance generation

import sys
sys.path.insert(0, '../../../src')

from bible_grammar import (
    HEBREW_FORMULAS, GREEK_FORMULAS,
    ot_formula_frequency, nt_formula_frequency,
    ot_formula_search, nt_formula_search,
    formula_book_distribution,
    ot_formula_profile, nt_formula_profile,
    print_formula_concordance, print_formula_book_distribution,
    print_ot_formula_profile, print_nt_formula_profile,
    print_ot_top_ngrams, print_nt_top_ngrams,
    formula_book_chart, formula_chapter_chart,
)
import pandas as pd

1. Hebrew N-gram Frequency — Top Fixed Phrases

This section extracts n-grams over MACULA lemmas without crossing verse boundaries. Bigrams and trigrams reveal the most frequently repeated lemma pairs and triples.
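As a rough illustration of what verse-boundary-safe n-gram counting involves, the sketch below counts n-grams one verse at a time. It is an assumption-laden stand-in, not the code behind `print_ot_top_ngrams`: `verse_ngrams` and `top_ngrams` are hypothetical names, and the lemma lists are toy transliterations.

```python
from collections import Counter

def verse_ngrams(verses, n):
    """Count lemma n-grams, sliding the window inside each verse separately."""
    counts = Counter()
    for lemmas in verses.values():
        for i in range(len(lemmas) - n + 1):
            counts[tuple(lemmas[i:i + n])] += 1
    return counts

def top_ngrams(verses, n, min_count=1, top_n=10):
    """Keep n-grams seen at least `min_count` times, most frequent first."""
    counts = verse_ngrams(verses, n)
    return [(gram, c) for gram, c in counts.most_common() if c >= min_count][:top_n]

# Toy data: transliterated placeholders, not MACULA output.
verses = {
    "v1": ["ko", "amar", "yhwh"],
    "v2": ["ko", "amar", "yhwh", "tsevaot"],
    "v3": ["neum", "yhwh"],
}

for gram, count in top_ngrams(verses, 2, min_count=2):
    print(" ".join(gram), count)
```

The pair ("yhwh", "ko") is never counted, although "yhwh" ends v1 and "ko" opens v2, because each verse is windowed on its own.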

# Top-30 Hebrew bigrams
print_ot_top_ngrams(2, min_count=20, top_n=30)

# Top-30 Hebrew trigrams — fixed phrases
print_ot_top_ngrams(3, min_count=10, top_n=30)

# Psalms-specific bigrams — what formulaic language dominates poetry?
print_ot_top_ngrams(2, min_count=5, top_n=20, book='Psa')

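2. Prophetic Formula Analysis

This section examines three major prophetic speech formulas: כֹּה אָמַר יְהוָה (Thus says YHWH), נְאֻם יְהוָה (oracle of YHWH), and וַיְהִי דְבַר יְהוָה (the word of YHWH came).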

# Full profile of all curated Hebrew formulas
print_ot_formula_profile()

# כֹּה אָמַר יְהוָה — "Thus says YHWH" — distribution by book
ko_amar = HEBREW_FORMULAS['ko_amar_yhwh']['pattern']
print_formula_book_distribution(ko_amar, lang='H')

formula_book_chart(ko_amar, lang='H')

# Concordance for כֹּה אָמַר יְהוָה — first 20 occurrences
print_formula_concordance(ko_amar, lang='H', max_rows=20)

# נְאֻם יְהוָה — "oracle of YHWH"
neum_yhwh = HEBREW_FORMULAS['neum_yhwh']['pattern']
print_formula_book_distribution(neum_yhwh, lang='H')

# Compare the three prophetic formula distributions side by side
formulas_to_compare = ['ko_amar_yhwh', 'neum_yhwh', 'devar_yhwh']
prophetic_books = ['Isa', 'Jer', 'Eze', 'Hos', 'Joe', 'Amo', 'Oba', 'Jon', 'Mic', 'Nah', 'Hab', 'Zep', 'Hag', 'Zec', 'Mal']

rows = []
for fkey in formulas_to_compare:
    pat = HEBREW_FORMULAS[fkey]['pattern']
    gloss = HEBREW_FORMULAS[fkey]['gloss']
    df = formula_book_distribution(pat, lang='H')
    for _, row in df.iterrows():
        rows.append({'formula': gloss, 'book': row['book'], 'count': row['count']})

cmp = pd.DataFrame(rows)
pivot = cmp.pivot_table(index='book', columns='formula', values='count', fill_value=0)
pivot = pivot.reindex([b for b in prophetic_books if b in pivot.index])
pivot

# Chapter distribution of כֹּה אָמַר יְהוָה in Jeremiah
formula_chapter_chart('Jer', 'ko_amar_yhwh', lang='H')

# Chapter distribution in Ezekiel
formula_chapter_chart('Eze', 'ko_amar_yhwh', lang='H')

3. Blessing / Curse Formulas

Deuteronomy contains the most concentrated collection of curse formulas (Deut 27–28). Psalms and wisdom literature dominate blessing vocabulary.

# בָּרוּךְ יְהוָה — "Blessed be YHWH" — doxological
barukh = HEBREW_FORMULAS['barukh_yhwh']['pattern']
print_formula_book_distribution(barukh, lang='H')
print_formula_concordance(barukh, lang='H', max_rows=20)

# אָרוּר — curse formula, single token
arur_pat = HEBREW_FORMULAS['arur']['pattern']
print_formula_book_distribution(arur_pat, lang='H')
print_formula_concordance(arur_pat, lang='H', max_rows=15)

# כָּרַת בְּרִית — "cut a covenant"
karat_pat = HEBREW_FORMULAS['karat_berit']['pattern']
print_formula_book_distribution(karat_pat, lang='H')
print_formula_concordance(karat_pat, lang='H', max_rows=20)

4. Greek NT Formulas

# Full profile of curated Greek formulas
print_nt_formula_profile()

# Top Greek bigrams
print_nt_top_ngrams(2, min_count=20, top_n=30)

# ἀμὴν λέγω ὑμῖν — "Truly I say to you" — Synoptic
amen_pat = GREEK_FORMULAS['amen_lego_hymin']['pattern']
print_formula_book_distribution(amen_pat, lang='G')
print_formula_concordance(amen_pat, lang='G', max_rows=20)

formula_book_chart(amen_pat, lang='G')

# ἀμὴν ἀμὴν λέγω ὑμῖν — Johannine double amen
amen2_pat = GREEK_FORMULAS['amen_amen_lego_hymin']['pattern']
print_formula_book_distribution(amen2_pat, lang='G')
print_formula_concordance(amen2_pat, lang='G', max_rows=25)

# γέγραπται — "It is written" (all forms of γράφω)
gegraptai = GREEK_FORMULAS['gegraptai']['pattern']
print_formula_book_distribution(gegraptai, lang='G')

# χάρις ὑμῖν καὶ εἰρήνη — epistolary grace-peace greeting
charis_pat = GREEK_FORMULAS['charis_kai_eirene']['pattern']
print_formula_concordance(charis_pat, lang='G', max_rows=20)

5. Formula Density — Which Books Are Most Formulaic?

Normalize formula counts by total tokens to compare books fairly.
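A small sketch of why normalization matters, using made-up numbers for three hypothetical books (A, B, C) rather than real corpus counts. Density is computed as occurrences per 1,000 tokens, and a book with no occurrences is given a count of zero rather than dropped.

```python
import pandas as pd

# Made-up illustration values, not corpus data.
counts = pd.Series({"A": 100, "B": 10})                  # formula occurrences per book
tokens = pd.Series({"A": 50000, "B": 2000, "C": 4000})   # total tokens per book

density = pd.DataFrame({"count": counts, "tokens": tokens})
# Book C has no occurrences, so its count is missing after alignment; treat it as 0.
density = density.fillna({"count": 0})
density["per_1k"] = (density["count"] / density["tokens"] * 1000).round(2)

print(density.sort_values("per_1k", ascending=False))
```

Book A has ten times as many raw occurrences as book B, yet B ranks first once length is taken into account (5.0 versus 2.0 per 1,000 tokens).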

# sys.path was already extended in the first cell, so the package import works here
from bible_grammar._utils import load_ot_data

# Prophetic formula density per OT book (Hebrew tokens only)
ot_df = load_ot_data()
ot_h = ot_df[ot_df['lang'] == 'H']
book_tokens = ot_h.groupby('book').size().rename('tokens')

# Per-book counts for each of the three prophetic formulas
ko_dist = formula_book_distribution(HEBREW_FORMULAS['ko_amar_yhwh']['pattern'], lang='H').set_index('book')['count']
neum_dist = formula_book_distribution(HEBREW_FORMULAS['neum_yhwh']['pattern'], lang='H').set_index('book')['count']
devar_dist = formula_book_distribution(HEBREW_FORMULAS['devar_yhwh']['pattern'], lang='H').set_index('book')['count']

# Align counts with token totals; books without a formula get a count of 0
density = pd.DataFrame({
    'ko_amar': ko_dist,
    'neum': neum_dist,
    'devar': devar_dist,
    'tokens': book_tokens
}).fillna(0)
density['prophetic_total'] = density[['ko_amar', 'neum', 'devar']].sum(axis=1)
# Occurrences per 1,000 tokens
density['density_per_1k'] = (density['prophetic_total'] / density['tokens'] * 1000).round(2)
cols = ['tokens', 'ko_amar', 'neum', 'devar', 'prophetic_total', 'density_per_1k']
density.sort_values('density_per_1k', ascending=False).head(20)[cols]

6. Variation Detection

Biblical formulas often appear in variant forms. Use the '*' wildcard to match the fixed core of a pattern while leaving a slot open; each '*' stands for exactly one intervening lemma.
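Because each '*' covers exactly one lemma, allowing a variable number of intervening words means trying several patterns. The sketch below builds one pattern per gap size; `gap_patterns` is a hypothetical helper, not part of `bible_grammar`.

```python
def gap_patterns(head, tail, max_gap):
    """Build one pattern per gap size from 0 to max_gap.

    Each '*' stands for exactly one lemma, so a gap of k lemmas needs k wildcards.
    """
    return [list(head) + ["*"] * k + list(tail) for k in range(max_gap + 1)]

# Lemma pairs separated by zero, one, or two intervening lemmas.
for pattern in gap_patterns(["הָיָה"], ["יְהוָה"], 2):
    print(pattern)
```

Each resulting list has the same shape as the patterns passed to `ot_formula_search` in this notebook, so the variants could presumably be searched one by one and the results combined.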

# Find variants of the divine-presence formula: YHWH was with X
# The pattern below contains no '*', so it matches הָיָה יְהוָה עִם as three adjacent lemmas;
# insert '*' between elements to allow one intervening lemma per wildcard
presence_variants = ot_formula_search(['הָיָה', 'יְהוָה', 'עִם'])
print(f"'YHWH was with (X)' occurrences: {len(presence_variants)}")
presence_variants[['ref', 'context']].head(20)

# Wildcard: any single lemma immediately before צָוָה יְהוָה ("YHWH commanded"),
# for example the conjunction in "as YHWH commanded"; YHWH is the subject of צָוָה here
tsivah = ot_formula_search(['*', 'צָוָה', 'יְהוָה'])
print(f"'(X) YHWH commanded' (e.g. 'as YHWH commanded'): {len(tsivah)} occurrences")
tsivah[['ref', 'context']].head(15)

# NT: amen variations — compare single vs. double
single = nt_formula_search(GREEK_FORMULAS['amen_lego_hymin']['pattern'])
double = nt_formula_search(GREEK_FORMULAS['amen_amen_lego_hymin']['pattern'])
print(f"Single amen (Synoptic):  {len(single)} occurrences")
print(f"Double amen (Johannine): {len(double)} occurrences")
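One caveat when comparing the two counts: if the single-amen pattern is matched as a plain lemma sequence, it would presumably also match the last three lemmas of every double-amen occurrence. Whether `nt_formula_search` already excludes these is not stated here, so the sketch below only shows one way to separate them. It assumes results carry a `ref` column, as the Hebrew search results above do, and uses hand-typed toy frames instead of library output.

```python
import pandas as pd

# Toy stand-ins for the two search results; the reference format is illustrative.
single = pd.DataFrame({"ref": ["Mat 6:2", "Mrk 3:28", "Jhn 5:24"]})
double = pd.DataFrame({"ref": ["Jhn 5:24"]})

# Drop single-amen hits whose verse also contains a double-amen hit.
# This is a verse-level filter: it assumes a verse does not hold both forms.
single_only = single[~single["ref"].isin(double["ref"])]

print(f"Single amen only: {len(single_only)} occurrences")
print(f"Double amen:      {len(double)} occurrences")
```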