Biblical Hebrew Preposition Analysis
This notebook demonstrates the preposition analysis capabilities added in bible_grammar.prepositions.
Data source: MACULA OT syntax data, which tokenizes inseparable prepositions (ב, ל, כ) as separate rows with clean pointed lemmas.
Coverage: ~64,000 preposition tokens across the OT, 46 unique lemmas.
Functions covered:
prep_frequency(): frequency table of all prepositions
prep_by_book(): one preposition's distribution across books
prep_distribution_table(): side-by-side comparison by book group
prep_collocates(): top collocates for a given preposition
prep_object_types(): grammatical breakdown of what follows a preposition
compare_preps(): side-by-side collocate comparison of two prepositions
import sys
sys.path.insert(0, '../../../src')
import pandas as pd
pd.set_option('display.max_rows', 50)
pd.set_option('display.max_colwidth', 40)
from bible_grammar.prepositions import (
prep_frequency, prep_by_book, prep_distribution_table,
prep_collocates, prep_object_types, compare_preps,
MAJOR_PREPS, PREP_GLOSS,
)
1. Overall Preposition Frequency (OT-wide)
The top 7 prepositions (ל, ב, מִן, עַל, אֶל, כְּ, עַד) account for over 93% of all preposition tokens.
freq = prep_frequency(top_n=15)
freq
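The coverage figure quoted above is a cumulative share: sort the counts, take a running total, and divide by the grand total. The sketch below shows that arithmetic on a toy Series. The labels and counts are hypothetical and are not the MACULA figures, and the column layout returned by prep_frequency() is not assumed here.

```python
import pandas as pd

# Hypothetical counts for illustration only; these are NOT the MACULA figures.
toy_counts = pd.Series(
    {"prep_a": 500, "prep_b": 300, "prep_c": 120, "prep_d": 50, "prep_e": 30},
    name="count",
)

# Sort descending, then express each running total as a share of all tokens.
sorted_counts = toy_counts.sort_values(ascending=False)
cumulative_pct = sorted_counts.cumsum().div(sorted_counts.sum()).mul(100).round(1)

# Share of all tokens covered by the three most frequent items.
top3_share = cumulative_pct.iloc[2]
print(cumulative_pct)
```

The same steps should apply to the real frequency table once its count column is selected.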
# Torah subset
print('Torah preposition frequency:')
prep_frequency(book_group='Torah', top_n=10)
2. Distribution by Book Group
How do the major prepositions vary across Torah, Former Prophets, Writings, and Latter Prophets?
dist = prep_distribution_table()
dist
# Normalized as percentage of total preps in each group
totals = dist.drop(index='Total').sum(axis=1)
dist_pct = dist.drop(index='Total').div(totals, axis=0).mul(100).round(1)
dist_pct.loc['Total'] = dist.loc['Total'].div(dist.loc['Total'].sum()).mul(100).round(1)
print('Percentage of all prepositions per group:')
dist_pct
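The normalization above only yields per-group percentages if the table has one row per book group (plus a 'Total' row) and one column per preposition. That layout is an assumption about what prep_distribution_table() returns. A minimal sketch on a toy table with hypothetical labels and numbers makes the row-wise division explicit:

```python
import pandas as pd

# Hypothetical table: rows are book groups, columns are prepositions.
# The layout and the numbers are assumptions for illustration.
toy = pd.DataFrame(
    {"prep_a": [60, 30], "prep_b": [40, 70]},
    index=["Group1", "Group2"],
)
toy.loc["Total"] = toy.sum()

# Divide each group row by its own row sum so every row adds up to 100.
groups = toy.drop(index="Total")
toy_pct = groups.div(groups.sum(axis=1), axis=0).mul(100).round(1)
print(toy_pct)
```

If the real table is oriented the other way (prepositions as rows), the division would need axis=1 with column sums instead.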
3. Single Preposition Distribution Across Books
Where does ל appear most densely? Compare with the distribution of אֶל.
lamed_dist = prep_by_book('לְ')
lamed_dist
# Distribution of אֶל, which skews toward narrative books
el_dist = prep_by_book('אֶל')
el_dist
4. Collocate Analysis: What Follows a Preposition?
prep_collocates() finds the word immediately following the preposition (word_num + 1 adjacency).
Filter by pos='noun' to see what nominal heads each preposition governs.
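To illustrate what word_num + 1 adjacency means, here is a minimal sketch on a toy token table. It is not the library's implementation: apart from word_num, the column names (verse, lemma, pos) and all the data are hypothetical.

```python
import pandas as pd

# Hypothetical token table; only `word_num` is named in the text above.
tokens = pd.DataFrame({
    "verse": ["v1", "v1", "v1", "v2", "v2"],
    "word_num": [1, 2, 3, 1, 2],
    "lemma": ["P", "house", "P", "P", "king"],
    "pos": ["prep", "noun", "prep", "prep", "noun"],
})

# Select the preposition rows and compute the position of the next word.
preps = tokens[tokens["pos"] == "prep"].assign(next_num=lambda d: d["word_num"] + 1)

# Join each preposition to the token at word_num + 1 in the same verse.
pairs = preps.merge(
    tokens,
    left_on=["verse", "next_num"],
    right_on=["verse", "word_num"],
    suffixes=("_prep", "_next"),
)

# Keep noun followers and count them.
collocates = pairs.loc[pairs["pos_next"] == "noun", "lemma_next"].value_counts()
print(collocates)
```

The inner join drops a preposition that ends its verse, since no token sits at word_num + 1. Adjacency is also purely positional, so it does not follow the syntactic head if another word intervenes.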
# Top noun collocates of lamed
prep_collocates('לְ', pos='noun', top_n=20)
# Top noun collocates of bet
prep_collocates('בְּ', pos='noun', top_n=20)
# Top noun collocates of min (מִן) — source/separation
prep_collocates('מִן', pos='noun', top_n=15)
5. Object Types: What Grammatical Category Follows Each Preposition?
What proportion of tokens following ל are nouns, verbs, suffixes, etc.?
print('Object types following לְ:')
display(prep_object_types('לְ'))
print('Object types following בְּ:')
display(prep_object_types('בְּ'))
print('Object types following מִן:')
display(prep_object_types('מִן'))
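A breakdown of this kind comes down to counting the part of speech of each following token and converting the counts to shares. The sketch below does that on a toy list of tags. The tags and the output column names (pos, count, pct) are hypothetical and may not match what prep_object_types() returns.

```python
import pandas as pd

# Hypothetical POS tags of the token following a preposition (toy data).
following_pos = pd.Series(
    ["noun", "noun", "verb", "suffix", "noun", "suffix", "noun", "verb"]
)

# Count each category, then add its share of all following tokens.
breakdown = (
    following_pos.value_counts()
    .rename_axis("pos")
    .reset_index(name="count")
)
breakdown["pct"] = breakdown["count"].div(breakdown["count"].sum()).mul(100).round(1)
print(breakdown)
```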
6. Compare Two Prepositions Side by Side
לְ and אֶל both mark direction/indirect object. How do their nominal collocates differ?
compare_preps('לְ', 'אֶל', pos='noun', top_n=20)
# עַל vs. בְּ (spatial prepositions)
compare_preps('עַל', 'בְּ', pos='noun', top_n=20)
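One way to read a side-by-side comparison is to align two collocate count tables on the collocate and see which nouns occur with both prepositions. The sketch below does this with pandas on toy counts. The labels and numbers are hypothetical, and the output shape of compare_preps() may differ.

```python
import pandas as pd

# Hypothetical collocate counts for two prepositions (toy data).
counts_a = pd.Series({"house": 5, "king": 3, "land": 1}, name="prep_a")
counts_b = pd.Series({"king": 4, "land": 2, "city": 2}, name="prep_b")

# Align on the collocate and fill missing combinations with 0.
side_by_side = pd.concat([counts_a, counts_b], axis=1).fillna(0).astype(int)

# Flag collocates attested with both prepositions.
side_by_side["shared"] = (side_by_side["prep_a"] > 0) & (side_by_side["prep_b"] > 0)
print(side_by_side)
```

Raw counts favor the more frequent preposition, so dividing each column by its own total before comparing may give a fairer picture.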
7. Book-Group Focused Collocate Analysis
Does the collocate profile of a preposition change between the Torah and the Writings?
print('לְ + noun collocates in Torah:')
display(prep_collocates('לְ', pos='noun', top_n=10, book_group='Torah'))
print('לְ + noun collocates in Writings:')
display(prep_collocates('לְ', pos='noun', top_n=10, book_group='Writings'))
8. Single-Book Focus
Drill into Genesis — what are the most common preposition + noun combinations?
print('Preposition frequency in Genesis:')
display(prep_frequency(book='Gen', top_n=10))
print('Top לְ + noun collocates in Genesis:')
display(prep_collocates('לְ', pos='noun', top_n=10, book='Gen'))