Biblical Hebrew Preposition Analysis

This notebook demonstrates the preposition analysis capabilities added in bible_grammar.prepositions.

Data source: MACULA OT syntax data, which tokenizes inseparable prepositions (ב, ל, כ) as separate rows with clean pointed lemmas.

Coverage: ~64,000 preposition tokens across the OT, 46 unique lemmas.

Functions covered:

prep_frequency() — frequency table of all prepositions
prep_by_book() — one preposition's distribution across books
prep_distribution_table() — side-by-side comparison by book group
prep_collocates() — top collocates for a given preposition
prep_object_types() — grammatical breakdown of what follows a prep
compare_preps() — side-by-side collocate comparison of two preps

In [ ]:

import sys
sys.path.insert(0, '../../../src')

import pandas as pd
pd.set_option('display.max_rows', 50)
pd.set_option('display.max_colwidth', 40)

from bible_grammar.prepositions import (
    prep_frequency, prep_by_book, prep_distribution_table,
    prep_collocates, prep_object_types, compare_preps,
    MAJOR_PREPS, PREP_GLOSS,
)

1. Overall Preposition Frequency (OT-wide)

The seven most frequent prepositions (לְ, בְּ, מִן, עַל, אֶל, כְּ, עַד) account for over 93% of all preposition tokens.

In [ ]:

freq = prep_frequency(top_n=15)
freq
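
A coverage figure like the one quoted above can be checked by summing the counts of the top-ranked lemmas and dividing by the grand total. The sketch below is a minimal illustration on made-up numbers. The helper name cumulative_share and the toy labels are hypothetical, and because the column layout of the prep_frequency() result is not shown here, the helper takes a plain pandas Series of counts rather than that DataFrame.

```python
import pandas as pd

def cumulative_share(counts: pd.Series, top_n: int) -> float:
    """Percentage of all tokens covered by the top_n most frequent items."""
    ranked = counts.sort_values(ascending=False)
    return float(ranked.head(top_n).sum() * 100 / ranked.sum())

# Made-up toy counts for illustration only; these are NOT MACULA figures.
toy_counts = pd.Series({'prep_a': 50, 'prep_b': 30, 'prep_c': 15, 'prep_d': 5})
share_top2 = cumulative_share(toy_counts, top_n=2)
print(f'Top 2 cover {share_top2:.1f}% of tokens')
```

To apply this to real data, pass the count column of the full (untruncated) frequency table, since a table already cut to top_n rows would make the denominator too small.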

In [ ]:

# Torah subset
print('Torah preposition frequency:')
prep_frequency(book_group='Torah', top_n=10)

2. Distribution by Book Group

How do the major prepositions vary across Torah, Former Prophets, Writings, and Latter Prophets?

In [ ]:

dist = prep_distribution_table()
dist

In [ ]:

# Normalized as percentage of total preps in each group
totals = dist.drop(index='Total').sum(axis=1)
dist_pct = dist.drop(index='Total').div(totals, axis=0).mul(100).round(1)
dist_pct.loc['Total'] = dist.loc['Total'].div(dist.loc['Total'].sum()).mul(100).round(1)
print('Percentage of all prepositions per group:')
dist_pct

3. Single Preposition Distribution Across Books

Where does ל appear most densely? Compare with the distribution of אֶל.

In [ ]:

lamed_dist = prep_by_book('לְ')
lamed_dist

In [ ]:

# el-distribution — אֶל skews toward narrative books
el_dist = prep_by_book('אֶל')
el_dist
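
Raw counts favour long books, so a question about density is easier to answer with a rate such as occurrences per 1,000 tokens. The sketch below shows that normalization on made-up numbers. It assumes two Series indexed by book name; rate_per_thousand, the book labels, and the counts are all hypothetical, and prep_by_book() may already report a comparable rate.

```python
import pandas as pd

def rate_per_thousand(prep_counts: pd.Series, book_sizes: pd.Series) -> pd.Series:
    """Occurrences per 1,000 tokens for each book, densest book first."""
    rates = (prep_counts * 1000 / book_sizes).round(1)
    return rates.sort_values(ascending=False)

# Made-up toy numbers for illustration only; NOT real book statistics.
toy_prep_counts = pd.Series({'BookA': 200, 'BookB': 90})
toy_book_sizes = pd.Series({'BookA': 4000, 'BookB': 1000})

rates = rate_per_thousand(toy_prep_counts, toy_book_sizes)
print(rates)
```

In the toy data BookA has more occurrences in absolute terms, but BookB ranks first once book length is taken into account.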

4. Collocate Analysis: What Follows a Preposition?

prep_collocates() finds the word immediately following the preposition (word_num + 1 adjacency). Filter by pos='noun' to see what nominal heads each preposition governs.
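
The word_num + 1 adjacency rule can be sketched with a self-join in pandas: shift each preposition's position forward by one and look up the token at that position in the same verse. This is only an illustration of the idea, not the library's actual implementation. The column names (verse, word_num, lemma, pos), the helper next_word_collocates, and the toy rows are all assumptions.

```python
import pandas as pd

LAMED = 'לְ'
KING = 'מֶלֶךְ'
DO = 'עָשָׂה'

# Toy token table; rows and column names are invented for illustration.
tokens = pd.DataFrame({
    'verse':    ['v1', 'v1', 'v2', 'v2', 'v3', 'v3'],
    'word_num': [1, 2, 1, 2, 1, 2],
    'lemma':    [LAMED, KING, LAMED, KING, LAMED, DO],
    'pos':      ['prep', 'noun', 'prep', 'noun', 'prep', 'verb'],
})

def next_word_collocates(tokens, prep_lemma, pos=None, top_n=10):
    """Count lemmas of the token directly after each occurrence of prep_lemma."""
    preps = tokens.loc[tokens['lemma'] == prep_lemma, ['verse', 'word_num']].copy()
    preps['word_num'] += 1  # position of the following token
    # Inner join: a preposition with no following token in its verse is dropped.
    following = preps.merge(tokens, on=['verse', 'word_num'], how='inner')
    if pos is not None:
        following = following[following['pos'] == pos]
    return following['lemma'].value_counts().head(top_n)

noun_collocates = next_word_collocates(tokens, LAMED, pos='noun')
all_collocates = next_word_collocates(tokens, LAMED)
print(noun_collocates)
```

Strict adjacency is a simplification: the next token may be an article or another particle rather than the governed noun, which is one reason the pos filter matters.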

In [ ]:

# Top noun collocates of lamed
prep_collocates('לְ', pos='noun', top_n=20)

In [ ]:

# Top noun collocates of bet
prep_collocates('בְּ', pos='noun', top_n=20)

In [ ]:

# Top noun collocates of min (מִן) — source/separation
prep_collocates('מִן', pos='noun', top_n=15)

5. Object Types: What Grammatical Category Follows Each Preposition?

What proportion of tokens following ל are nouns, verbs, suffixes, etc.?

In [ ]:

print('Object types following לְ:')
display(prep_object_types('לְ'))

print('Object types following בְּ:')
display(prep_object_types('בְּ'))

print('Object types following מִן:')
display(prep_object_types('מִן'))
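
A breakdown of this kind boils down to counting the part-of-speech tag of each following token and expressing the counts as percentages. The sketch below shows one way to do that with value_counts(). The helper pos_breakdown, the output column names, and the toy tags are hypothetical and need not match what prep_object_types() returns.

```python
import pandas as pd

def pos_breakdown(following_pos: pd.Series) -> pd.DataFrame:
    """Count and percentage of each part-of-speech tag, most frequent first."""
    counts = following_pos.value_counts()
    pct = (counts * 100 / counts.sum()).round(1)
    return pd.DataFrame({'count': counts, 'pct': pct})

# Made-up tags of tokens following one preposition; NOT real corpus data.
toy_following = pd.Series(['noun', 'noun', 'noun', 'verb', 'suffix'])
breakdown = pos_breakdown(toy_following)
print(breakdown)
```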

6. Compare Two Prepositions Side by Side

לְ and אֶל both mark direction/indirect object. How do their nominal collocates differ?

In [ ]:

compare_preps('לְ', 'אֶל', pos='noun', top_n=20)
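
A side-by-side view can be built from two independent collocate counts by ranking each one and concatenating the ranked tables column-wise. The sketch below illustrates that layout on made-up counts. The helper side_by_side and its column naming are assumptions, and the real compare_preps() output may be organized differently.

```python
import pandas as pd

def side_by_side(a: pd.Series, b: pd.Series, name_a: str, name_b: str,
                 top_n: int = 10) -> pd.DataFrame:
    """Place two ranked collocate lists next to each other, aligned by rank."""
    def ranked(counts: pd.Series, name: str) -> pd.DataFrame:
        top = counts.sort_values(ascending=False).head(top_n)
        return pd.DataFrame({f'{name}_collocate': top.index,
                             f'{name}_count': top.values})
    # Rows align on rank (0, 1, 2, ...); the shorter list is padded with NaN.
    return pd.concat([ranked(a, name_a), ranked(b, name_b)], axis=1)

# Made-up toy counts for illustration only; NOT real corpus data.
toy_a = pd.Series({'x': 5, 'y': 2})
toy_b = pd.Series({'x': 1, 'z': 4, 'w': 3})
comparison = side_by_side(toy_a, toy_b, 'A', 'B')
print(comparison)
```

Aligning by rank rather than by lemma keeps each list in its own frequency order, so a noun that tops one list and is rare in the other shows up in different rows.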

In [ ]:

# עַל vs. בְּ (spatial prepositions)
compare_preps('עַל', 'בְּ', pos='noun', top_n=20)

7. Book-Group Focused Collocate Analysis

Does the collocate profile of a preposition change between the Torah and the Writings (the group that includes Psalms)?

In [ ]:

print('לְ + noun collocates in Torah:')
display(prep_collocates('לְ', pos='noun', top_n=10, book_group='Torah'))

print('לְ + noun collocates in Writings:')
display(prep_collocates('לְ', pos='noun', top_n=10, book_group='Writings'))

8. Single-Book Focus

Drill into Genesis — what are the most common preposition + noun combinations?

In [ ]:

print('Preposition frequency in Genesis:')
display(prep_frequency(book='Gen', top_n=10))

print('Top לְ + noun collocates in Genesis:')
display(prep_collocates('לְ', pos='noun', top_n=10, book='Gen'))