API Reference
Query API: Narrative Guide
See Features & Code Examples for the full query API with worked examples.
Module Reference
bible_grammar
core
alignment
Hebrew MT ↔ LXX translation equivalence analysis.
Since no freely available word-level alignment dataset exists, this module uses verse-level co-occurrence statistics: for each OT verse, it pairs every Hebrew word (from TAHOT) with every Greek word (from LXX) in the same verse, then aggregates co-occurrence counts across the corpus.
This is a standard statistical approach to translation equivalence in computational biblical studies (cf. Tov 1999, Lust et al.), and works well for:
- High-frequency verbs with consistent translations (e.g., עָשָׂה → ποιέω)
- Asking "which Greek lemmas most often appear alongside Niphal of נָגַד?"
- Comparing how different Hebrew stems of the same root are rendered in LXX
Limitations
- Long verses introduce noise (a Greek word may co-occur with many Hebrew words in a long verse even if it translates only one of them)
- Short verses (~3-5 words) give much tighter, more reliable pairings
- Untranslated / added words in the LXX are not distinguished
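The verse-level pairing described above can be sketched with the standard library alone. This is a minimal illustration, not the module's implementation: the function name `cooccurrence_counts` and the two toy verses are assumptions made for the example.

```python
from collections import Counter
from itertools import product

# Toy verse-keyed data (illustrative Strong's numbers, not real corpus rows).
hebrew = {("Gen", 1, 1): ["H1254", "H430"], ("Gen", 1, 7): ["H6213", "H430"]}
greek = {("Gen", 1, 1): ["G4160", "G2316"], ("Gen", 1, 7): ["G4160", "G2316"]}

def cooccurrence_counts(heb_by_verse, grk_by_verse):
    """Count every (Hebrew, Greek) pairing that shares a verse."""
    counts = Counter()
    for ref, heb_words in heb_by_verse.items():
        # Cross product: each Hebrew word is paired with each Greek word.
        for pair in product(heb_words, grk_by_verse.get(ref, [])):
            counts[pair] += 1
    return counts

counts = cooccurrence_counts(hebrew, greek)
```

In this toy corpus H430 pairs with G4160 just as often as with G2316, which is the kind of noise the limitations above describe.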
build_alignment(heb_df=None, lxx_df=None)
Build verse-level co-occurrence alignment between TAHOT and LXX.
Returns a DataFrame with columns:
book_id, chapter, verse, heb_word, heb_strongs, heb_stem, heb_conjugation, heb_part_of_speech, lxx_word, lxx_lemma, lxx_lemma_translit, lxx_strongs, lxx_part_of_speech, lxx_tense, lxx_voice, lxx_mood
One row per (Hebrew word, Greek word) pair in the same verse.
hebrew_sources(*, lxx_lemma=None, lxx_strongs=None, heb_stem=None, heb_pos=None, book=None, min_count=2, top_n=20)
Reverse lookup: given a Greek LXX lemma, what Hebrew roots/stems does it translate?
Parameters
lxx_lemma   : Greek lemma (Unicode), e.g. 'ποιέω'
lxx_strongs : Greek Strong's, e.g. 'G4160'
heb_stem    : Filter Hebrew results by stem
heb_pos     : Filter Hebrew results by part of speech
book        : Restrict to a book or list
min_count   : Minimum co-occurrence count
top_n       : Return top N results
Returns a DataFrame ranked by count:
heb_strongs, heb_stem, heb_pos, count, pct
Examples
# What Hebrew roots does ποιέω translate?
hebrew_sources(lxx_lemma='ποιέω')
# What Hebrew stems does κύριος come from?
hebrew_sources(lxx_strongs='G2962')
save_alignment(df=None)
Build (if needed) and save the alignment table.
translation_equivalents(*, heb_strongs=None, heb_stem=None, heb_conjugation=None, heb_pos=None, lxx_pos=None, book=None, book_group=None, min_count=2, top_n=20)
Find the most common LXX Greek lemmas that co-occur with a Hebrew word/stem.
Parameters
heb_strongs     : Hebrew Strong's number, e.g. 'H1254' (בָּרָא) or '{H1254A}'
heb_stem        : Stem filter, e.g. 'Niphal', 'Qal', 'Piel'
heb_conjugation : Conjugation filter, e.g. 'Perfect', 'Imperfect'
heb_pos         : Part of speech filter, e.g. 'Verb', 'Noun'
lxx_pos         : Greek part of speech filter
book            : Restrict to a book or list of books
book_group      : 'torah', 'prophets', 'writings', etc.
min_count       : Minimum co-occurrence count to include
top_n           : Return top N lemmas by count
Returns a DataFrame ranked by co-occurrence count:
lxx_lemma, lxx_lemma_translit, lxx_strongs, lxx_pos, count, pct (percentage of total matches)
Examples
# What Greek lemmas translate בָּרָא (H1254)?
translation_equivalents(heb_strongs='H1254')
# How is the Niphal of נָתַן (H5414) rendered in the LXX?
translation_equivalents(heb_strongs='H5414', heb_stem='Niphal')
# What Greek verbs translate Qal verbs in Isaiah?
translation_equivalents(heb_pos='Verb', heb_stem='Qal', book='Isa')
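To make the count, pct, min_count and top_n semantics concrete, here is a small stdlib sketch of the ranking step. The helper `rank_equivalents` and the toy pairs are hypothetical, and it assumes pct is taken over all matches before the min_count filter, which the reference above does not spell out.

```python
from collections import Counter

def rank_equivalents(pairs, heb_strongs, min_count=2, top_n=20):
    """Rank Greek lemmas that co-occur with one Hebrew Strong's number."""
    counts = Counter(grk for heb, grk in pairs if heb == heb_strongs)
    total = sum(counts.values())
    rows = [
        # pct is relative to all matches (assumption, see lead-in).
        {"lxx_lemma": lemma, "count": n, "pct": round(100 * n / total, 1)}
        for lemma, n in counts.most_common()
        if n >= min_count
    ]
    return rows[:top_n]

# Toy (Hebrew Strong's, Greek lemma) pairs; counts are invented.
pairs = [("H1254", "ποιέω")] * 3 + [("H1254", "κτίζω")] * 2 + [("H1254", "θεός")]
ranked = rank_equivalents(pairs, "H1254")
```

With min_count=2 the single co-occurrence with θεός is dropped, which is how the threshold suppresses incidental pairings.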
db
Persist and load the word DataFrame to/from SQLite and Parquet.
invalidate_cache()
Clear all in-memory DataFrame caches (call after rebuilding the database).
load(parquet_path=PARQUET_PATH)
Load TAHOT/TAGNT words from Parquet (cached). Falls back to SQLite.
load_lxx(parquet_path=LXX_PARQUET)
Load LXX word data from Parquet (cached). Falls back to SQLite.
load_translations(parquet_path=TRANSLATIONS_PARQUET)
Load translation verses from Parquet (cached). Falls back to SQLite.
save(df, db_path=DB_PATH, parquet_path=PARQUET_PATH)
Write the words DataFrame to SQLite and Parquet.
save_lxx(df, db_path=DB_PATH, parquet_path=LXX_PARQUET)
Write the LXX DataFrame to SQLite and Parquet.
save_translations(df, db_path=DB_PATH, parquet_path=TRANSLATIONS_PARQUET)
Write the translations DataFrame to SQLite and Parquet.
ibm_align
IBM Model 1 word-level alignment for Hebrew MT ↔ LXX.
Trains a statistical translation model on the full OT parallel corpus (~23,000 verse pairs) using the EM algorithm. Produces token-level P(Greek | Hebrew) and P(Hebrew | Greek) translation probabilities, then applies the intersection heuristic to build a confident word-level alignment.
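For readers unfamiliar with IBM Model 1, the EM loop can be sketched in a few lines. This is a textbook version without a NULL word, offered as an illustration only; `train_ibm1` and the two toy verse pairs are assumptions, and the module may differ in details such as smoothing or tokenisation.

```python
from collections import defaultdict

def train_ibm1(pairs, n_iter=5):
    """Estimate P(target | source) with IBM Model 1 EM (no NULL word)."""
    targets = {g for _, tgt in pairs for g in tgt}
    uniform = 1.0 / len(targets)
    prob = defaultdict(lambda: uniform)  # prob[(g, h)] = P(g | h)
    for _ in range(n_iter):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in pairs:
            for g in tgt:
                # E-step: share each target word among the source words.
                norm = sum(prob[(g, h)] for h in src)
                for h in src:
                    frac = prob[(g, h)] / norm
                    count[(g, h)] += frac
                    total[h] += frac
        # M-step: renormalise expected counts per source word.
        prob = defaultdict(float, {(g, h): c / total[h] for (g, h), c in count.items()})
    return prob

# Toy corpus of (Hebrew tokens, Greek tokens) per verse; values are invented.
verses = [(["H1254", "H430"], ["G4160", "G2316"]), (["H430"], ["G2316"])]
p_g_given_h = train_ibm1(verses)
```

The second verse pins H430 to G2316, and EM then shifts the remaining probability mass for H1254 toward G4160. Training the reverse direction only requires swapping the two sides of each pair.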
Usage
from bible_grammar.core.ibm_align import build_word_alignment, load_word_alignment
# Build and save (takes ~30s)
build_word_alignment()
# Query
df = load_word_alignment()
# df has: book_id, chapter, verse,
#         heb_word_num, heb_strongs, heb_word,
#         lxx_word_num, lxx_strongs, lxx_lemma,
#         p_h2g, p_g2h
build_word_alignment(n_iter=5, min_prob=0.1)
Build word-level alignment using IBM Model 1 (intersection heuristic).
Trains P(LXX | Hebrew) and P(Hebrew | LXX) independently, then keeps only alignments where both directions agree (intersection). This gives high-precision alignments at the cost of some recall.
Parameters
n_iter   : EM iterations (5 is typically sufficient for IBM Model 1)
min_prob : Minimum probability threshold for inclusion
Returns a DataFrame saved to data/processed/word_alignment.parquet.
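One common reading of the intersection heuristic is "keep a pair only if each word is the other's best match", with min_prob applied to both directions. The sketch below follows that reading as an assumption; the function name, the dictionary layout and the toy probabilities are all hypothetical.

```python
def intersect_alignments(heb, grk, p_g_given_h, p_h_given_g, min_prob=0.1):
    """Keep (h, g) pairs that are each other's best match in both directions."""
    if not heb or not grk:
        return []
    best_g = {h: max(grk, key=lambda g: p_g_given_h.get((g, h), 0.0)) for h in heb}
    best_h = {g: max(heb, key=lambda h: p_h_given_g.get((h, g), 0.0)) for g in grk}
    return [
        (h, g)
        for h, g in best_g.items()
        if best_h[g] == h
        and p_g_given_h.get((g, h), 0.0) >= min_prob
        and p_h_given_g.get((h, g), 0.0) >= min_prob
    ]

# Toy probabilities for a single verse (invented values).
p_h2g = {("G4160", "H1254"): 0.7, ("G2316", "H1254"): 0.2,
         ("G2316", "H430"): 0.8, ("G3588", "H430"): 0.15}
p_g2h = {("H1254", "G4160"): 0.9, ("H430", "G2316"): 0.85, ("H430", "G3588"): 0.6}
links = intersect_alignments(["H1254", "H430"], ["G4160", "G2316", "G3588"], p_h2g, p_g2h)
```

G3588 points at H430, but H430 prefers G2316, so that link is dropped. This is the precision-for-recall trade described above.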
hebrew_sources_w(*, lxx_strongs=None, lxx_lemma=None, book=None, min_count=2, top_n=20)
Uses IBM Model 1 intersection alignment.
translation_equivalents_w(*, heb_strongs=None, heb_stem=None, heb_pos=None, book=None, book_group=None, min_count=2, top_n=20)
Find LXX Greek lemmas that are word-level aligned to a Hebrew lemma/stem. Uses IBM Model 1 intersection alignment, which is much more precise than verse-level co-occurrence.
Parameters match those of translation_equivalents() in alignment.py, except that heb_conjugation and lxx_pos are not accepted.
ingest
Parse STEPBible TAHOT and TAGNT TSV files into pandas DataFrames.
lxx
Load and export the CenterBLC/LXX (Rahlfs 1935) Septuagint via TextFabric.
Word-level features extracted:
word, lemma, transliteration, gloss, strongs, part_of_speech, tense, voice, mood, person, number, gender, case_, morphology
Books are tagged as canonical (matching our OT book_id list) or deuterocanonical. Both are stored; callers can filter on is_deuterocanon.
load_lxx()
Load the CenterBLC/LXX Septuagint into a DataFrame.
Requires TextFabric + GitHub access on first run (~200 MB download). Subsequent calls use the local TextFabric cache.
lxx_query
LXX (Septuagint) query module.
Wraps the CenterBLC/LXX Rahlfs 1935 data already loaded into data/processed/lxx.parquet by lxx.py. Provides the same query API style as query.py and syntax.py — a filtered DataFrame + convenience helpers.
Columns in the parquet:
source, book_id, lxx_book, chapter, verse, word_num, word, lemma, lemma_translit, transliteration, translation, strongs, morph_code, language, part_of_speech, tense, voice, mood, case_, number, gender, person, is_deuterocanon, stem, conjugation, state, noun_type, prefixes
Public API
load_lxx_data()      → full DataFrame (Parquet-cached)
query_lxx(...)       → filtered DataFrame
lxx_freq_table(...)  → frequency table of any column
lxx_concordance(...) → concordance rows for a Strong's number
lxx_verb_stats(...)  → tense/voice/mood breakdown for a lemma or Strong's
print_lxx_query(...) → formatted terminal summary
load_lxx_data(force_rebuild=False)
Return the full LXX DataFrame from Parquet cache.
lxx_by_book(strongs=None, lemma=None, *, include_deuterocanon=False)
Per-book occurrence count for a Strong's number or lemma. Books are in canonical OT order.
lxx_concordance(strongs, *, book=None, book_group=None, include_deuterocanon=False, top_n=None)
Return concordance rows for a Strong's number — one row per occurrence, with book, chapter, verse, word form, lemma, gloss, and morphology.
lxx_freq_table(group_by, *, book=None, book_group=None, part_of_speech=None, include_deuterocanon=False, top_n=None)
Frequency table over the LXX grouped by one or more columns.
Example
lxx_freq_table('tense', part_of_speech='Verb', book_group='prophets')
lxx_freq_table(['book_id', 'part_of_speech'])
lxx_verb_stats(strongs=None, lemma=None, *, book=None, book_group=None, include_deuterocanon=False)
Tense × Voice × Mood breakdown for a LXX verb lemma or Strong's number.
print_lxx_query(strongs=None, lemma=None, *, book=None, book_group=None, top_n=30, include_deuterocanon=False)
Print a formatted summary for a LXX word: lexical info, total count, per-book distribution, and (for verbs) tense/voice/mood breakdown.
query_lxx(*, book=None, book_group=None, chapter=None, verse=None, strongs=None, lemma=None, part_of_speech=None, tense=None, voice=None, mood=None, case_=None, person=None, number=None, gender=None, include_deuterocanon=False)
Filtered query over the LXX corpus.
Parameters
book                 : book_id or list of book_ids (e.g. 'Gen', ['Isa','Jer'])
book_group           : 'torah'|'historical'|'wisdom'|'prophets'
chapter              : chapter number
verse                : verse number
strongs              : Strong's G-number(s), e.g. 'G2316' or ['G2316','G2962']
lemma                : Greek lemma string
part_of_speech       : 'Verb'|'Noun'|'Adjective'|…
tense                : 'Aorist'|'Present'|'Perfect'|'Imperfect'|'Future'|'Pluperfect'
voice                : 'Active'|'Middle'|'Passive'
mood                 : 'Indicative'|'Subjunctive'|'Imperative'|'Infinitive'|'Participle'|'Optative'
case_                : 'Nominative'|'Genitive'|'Accusative'|'Dative'|'Vocative'
person               : '1st'|'2nd'|'3rd'
number               : 'Singular'|'Plural'|'Dual'
gender               : 'Masculine'|'Feminine'|'Neuter'
include_deuterocanon : include deuterocanonical / Apocrypha books (default False)
morphology
Decode STEPBible morphology codes into structured fields.
Hebrew/Aramaic (TAHOT) Grammar column examples:
HVqp3ms     = Hebrew Verb Qal Perfect 3ms
HNcmsa      = Hebrew Noun common masc sing absolute
HR/Ncfsa    = two tokens joined (prefix / main word)
HC/Td/Ncfsa = three tokens joined
Greek (TAGNT) dStrongs+Grammar column examples (after splitting on '='):
N-NSF    = Noun Nominative Singular Feminine
V-AAI-3S = Verb Aorist Active Indicative 3rd Singular
ADV      = Adverb
CONJ     = Conjunction
T-ASM    = Article Accusative Singular Masculine
decode_greek(grammar_field)
Decode the grammar portion of a TAGNT dStrongs+Grammar cell. The input should be just the grammar part (after stripping 'G1234=').
Examples:
'N-NSF'     -> Noun Nominative Singular Feminine
'V-AAI-3S'  -> Verb Aorist Active Indicative 3rd Singular
'V-PAP-DPM' -> Verb Present Active Participle Dative Plural Masculine
'ADV'       -> Adverb
'CONJ'      -> Conjunction
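The examples above follow a positional scheme that a short lookup-table decoder can illustrate. This sketch covers only the code shapes listed here; `decode_greek_sketch`, its output keys and the lookup tables are assumptions, and real TAGNT data contains forms (for instance two-character tense codes) that it does not handle.

```python
POS = {"N": "Noun", "V": "Verb", "T": "Article", "ADV": "Adverb", "CONJ": "Conjunction"}
TENSE = {"P": "Present", "I": "Imperfect", "F": "Future", "A": "Aorist",
         "R": "Perfect", "L": "Pluperfect"}
VOICE = {"A": "Active", "M": "Middle", "P": "Passive"}
MOOD = {"I": "Indicative", "S": "Subjunctive", "O": "Optative",
        "M": "Imperative", "N": "Infinitive", "P": "Participle"}
CASE = {"N": "Nominative", "G": "Genitive", "D": "Dative", "A": "Accusative", "V": "Vocative"}
NUMBER = {"S": "Singular", "P": "Plural"}
GENDER = {"M": "Masculine", "F": "Feminine", "N": "Neuter"}
PERSON = {"1": "1st", "2": "2nd", "3": "3rd"}

def decode_greek_sketch(code):
    """Decode codes shaped like 'N-NSF', 'V-AAI-3S', 'V-PAP-DPM' or 'ADV'."""
    head, *rest = code.split("-")
    out = {"part_of_speech": POS.get(head, head)}
    if head == "V" and rest:
        tense, voice, mood = rest[0]  # exactly three characters assumed
        out.update(tense=TENSE[tense], voice=VOICE[voice], mood=MOOD[mood])
        rest = rest[1:]
    if rest:
        tail = rest[0]
        if tail[0].isdigit():  # finite verb: person + number
            out.update(person=PERSON[tail[0]], number=NUMBER[tail[1]])
        else:  # nominal or participle: case + number + gender
            case, number, gender = tail
            out.update(case_=CASE[case], number=NUMBER[number], gender=GENDER[gender])
    return out
```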
decode_hebrew(morph_code)
Decode a full TAHOT Grammar cell which may contain slash-joined tokens. Returns fields for the main (last/rightmost) content word; prefix fields are stored under 'prefixes'.
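The slash-joined layout can be illustrated by separating the leading language letter, the prefix codes and the main (rightmost) code. This is only a sketch of the splitting step under that reading of the examples; `split_hebrew_code` and its return shape are hypothetical, and decoding the individual codes is left out.

```python
def split_hebrew_code(morph_code):
    """Split e.g. 'HC/Td/Ncfsa' into language letter, prefix codes and main code."""
    language, body = morph_code[0], morph_code[1:]
    *prefixes, main = body.split("/")  # the last token is the content word
    return {"language": language, "prefixes": prefixes, "main": main}

parts = split_hebrew_code("HC/Td/Ncfsa")
```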
extract_greek_grammar(dstrongs_grammar)
Split a TAGNT 'dStrongs = Grammar' cell into (strongs, grammar_code). E.g. 'G0976=N-NSF' -> ('G0976', 'N-NSF')
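A minimal sketch of that split, assuming a single '=' separator with optional surrounding spaces (the helper name is made up for the example):

```python
def split_dstrongs_grammar(cell):
    """Split 'G0976=N-NSF' into ('G0976', 'N-NSF')."""
    strongs, _, grammar = cell.partition("=")
    return strongs.strip(), grammar.strip()
```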
peshitta_query
Word-level Peshitta NT morphology via the ETCBC/syrnt Text-Fabric dataset.
Each row in the returned DataFrame represents one Syriac word token with:
book, chapter, verse : location
word         : Syriac script
sp           : part of speech (noun, verb, prep, conj, pron, det, part, intj)
gn           : gender (m, f, NA)
nu           : number (s, p, NA)
ps           : person (p1, p2, p3, NA)
st           : state (emphatic, construct, absolute, NA)
vs           : verbal stem (peal, pael, aphel, ethpeel, ethpaal, ettaphal, NA)
vt           : verb tense / aspect (perf, impf, imptv, ptca, ptcp, inf, NA)
stem_sedra   : Sedra transliteration of the word form
root_sedra   : Sedra transliteration of the lexical root
lexeme_sedra : Sedra transliteration of the dictionary lemma
query_peshitta()
Load and return all Peshitta NT word-level morphology as a DataFrame.
query
Filtered query API over the word DataFrame.
lxx_query(*, book=None, lxx_book=None, testament=None, chapter=None, verse=None, book_group=None, include_deuterocanon=False, part_of_speech=None, tense=None, voice=None, mood=None, person=None, number=None, gender=None, case_=None)
Query the LXX Septuagint word data (CenterBLC/LXX, Rahlfs 1935).
Parameters
book                 : canonical book_id (e.g. 'Gen', 'Isa') or list
lxx_book             : LXX-native book name (e.g. 'Exod', '1Mac')
testament            : 'OT' (canonical); the NT is not in the LXX
chapter / verse      : filter to specific chapter or verse
book_group           : 'torah', 'prophets', 'writings', 'gospels', 'pauline'
include_deuterocanon : include deuterocanonical books (default False)
part_of_speech       : 'Verb', 'Noun', 'Adjective', etc.
tense                : 'Aorist', 'Present', 'Perfect', etc.
voice                : 'Active', 'Middle', 'Passive'
mood                 : 'Indicative', 'Participle', 'Infinitive', etc.
person               : '1st', '2nd', '3rd'
number               : 'Singular', 'Plural'
gender               : 'Masculine', 'Feminine', 'Neuter'
case_                : 'Nominative', 'Genitive', 'Dative', 'Accusative', 'Vocative'
Examples
# Aorist passives in Isaiah LXX
lxx_query(book='Isa', tense='Aorist', voice='Passive')
# All verbs in LXX Genesis
lxx_query(book='Gen', part_of_speech='Verb')
# A single deuterocanonical book (Sirach)
lxx_query(include_deuterocanon=True, lxx_book='Sir')
query(*, source=None, book=None, testament=None, chapter=None, verse=None, language=None, part_of_speech=None, stem=None, conjugation=None, tense=None, voice=None, mood=None, person=None, number=None, gender=None, case_=None, state=None, book_group=None)
Return a filtered DataFrame of word rows.
All filters are case-insensitive substring matches unless noted. book_group accepts: 'torah', 'prophets', 'writings', 'gospels', 'pauline'
Examples
# Niphal perfect verbs in Genesis
query(book='Gen', stem='Niphal', conjugation='Perfect')
# All verbs in the Torah
query(book_group='torah', part_of_speech='Verb')
# Greek aorist passive indicatives in Paul
query(book_group='pauline', tense='Aorist', voice='Passive', mood='Indicative')
reload()
Force reload from disk (useful after rebuilding the database).
translation_query(*, translation=None, book=None, testament=None, chapter=None, verse=None, book_group=None, search=None)
Query translation verses (KJV, VulgClementine).
Parameters
translation : 'KJV' or 'VulgClementine' (or list of both)
book        : book_id string or list, e.g. 'Gen' or ['Mat','Mrk']
testament   : 'OT' or 'NT'
chapter     : filter to a single chapter number
verse       : filter to a single verse number
book_group  : 'torah', 'prophets', 'writings', 'gospels', 'pauline'
search      : case-insensitive substring search within verse text
Examples
translation_query(translation='KJV', book='Gen', chapter=1)
translation_query(book_group='pauline', search='grace')
translation_query(translation=['KJV','VulgClementine'], book='Jhn', chapter=3, verse=16)
reference
Book reference metadata: canonical order, names, testament, chapter counts.
book_ids_for_group(group)
Return book IDs in canonical order for a named group (e.g. 'torah', 'pauline').
syntax
MACULA Greek syntax layer — general-purpose query API.
Wraps the Nestle1904 TSV from the macula-greek submodule and caches it as Parquet for fast reloads. All downstream modules (speaker, syntactic studies, discourse analysis) import from here rather than reading the TSV directly.
Schema (column subset that matters)
xml_id   : unique word token ID (e.g. n40001001001)
ref      : "MAT 1:1!1" style reference
book     : our canonical book_id (e.g. "Mat")
chapter  : int
verse    : int
word_num : int (1-based position in verse)
text     : surface form
lemma    : dictionary headword
strong   : Strong's number as plain integer string (e.g. "2424")
strong_g : "G"-prefixed form compatible with rest of project (e.g. "G2424")
morph    : morph code (e.g. "N-NSF")
class_   : noun / verb / adj / prep / conj / det / ptcl / adv
type_    : common / proper / personal / relative / …
role     : s / v / o / io / p / vc / adv / aux / o2
gloss    : English word gloss
person   : first / second / third
number   : singular / plural
gender   : masculine / feminine / neuter
case_    : nominative / genitive / dative / accusative / vocative
tense    : aorist / present / perfect / future / imperfect / pluperfect
voice    : active / passive / middle
mood     : indicative / subjunctive / infinitive / participle / imperative / optative
subjref  : xml_id of the subject referent (links subject nouns/pronouns to clauses)
referent : xml_id this word refers to (coreferential chain)
domain   : Louw-Nida semantic domain code(s)
Usage
from bible_grammar.core.syntax import (
    load_syntax, query_syntax, speech_verbs, referent_chain,
    MACULA_BOOK_MAP,
)
df = load_syntax()                   # full 137k-row DataFrame
verbs = query_syntax(strong='2424')  # all Jesus tokens
speech = speech_verbs(book='Mat')    # speech-introducing verbs in Matthew
clause_roles(book=None, chapter=None, verse=None)
Return the syntactic role of each word in the given scope. Useful for building subject-verb-object triples per verse.
jesus_speaking_verses(books=None)
Return a set of (book, chapter, verse) tuples where a speech verb has Jesus (Strong 2424) as its subject via subjref.
This is the core of speaker attribution for christological titles.
load_syntax(force_rebuild=False)
Load MACULA Greek syntax DataFrame.
On first call, parses the TSV and caches as Parquet (~4 s). Subsequent calls load the Parquet (~0.3 s). Pass force_rebuild=True to re-parse the TSV.
query_syntax(*, book=None, chapter=None, verse=None, strong=None, lemma=None, role=None, class_=None, tense=None, voice=None, mood=None, case_=None, has_subjref=None, has_referent=None)
Filtered query over the MACULA syntax table.
strong can be given as a plain digit string ('2424'), a G-prefixed string ('G2424'), or an int (2424); all three forms are accepted.
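Accepting all three spellings usually comes down to normalising the argument before comparing it with the strong column. A possible sketch, assuming the target form is the plain digit string used in the schema (the helper `normalize_strong` is hypothetical):

```python
def normalize_strong(value):
    """Map '2424', 'G2424' or 2424 to the plain digit string '2424'."""
    text = str(value).strip().upper()
    if text.startswith("G"):
        text = text[1:]
    return str(int(text))  # int() also drops any zero padding
```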
referent_chain(xml_id)
Return all tokens whose 'referent' field points to xml_id — i.e., all tokens in the same co-reference chain.
speech_verbs(book=None, *, subject_strong=None)
Return rows for speech-introducing verbs (λέγω, φημί, etc.).
If subject_strong is given (e.g. 2424 for Jesus), further restrict to verbs whose subjref points to a token with that Strong's number.
Returns a DataFrame with one row per speech-introducing verb token, with an added 'speaker_strong' column.
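The subjref lookup can be pictured as a join from each speech verb to the token its subjref names. The sketch below uses plain dictionaries and invented rows; it assumes subjref holds a single xml_id, and the names `SPEECH_LEMMAS` and `speaking_verses` are illustrative rather than part of the module.

```python
SPEECH_LEMMAS = {"λέγω", "φημί"}  # illustrative subset

# Invented token rows using the schema's column names.
tokens = [
    {"xml_id": "n1", "book": "Mat", "chapter": 4, "verse": 17,
     "lemma": "Ἰησοῦς", "strong": "2424", "subjref": ""},
    {"xml_id": "n2", "book": "Mat", "chapter": 4, "verse": 17,
     "lemma": "λέγω", "strong": "3004", "subjref": "n1"},
    {"xml_id": "n3", "book": "Mat", "chapter": 3, "verse": 2,
     "lemma": "λέγω", "strong": "3004", "subjref": "n9"},
]

def speaking_verses(rows, subject_strong="2424"):
    """Verses where a speech verb's subjref resolves to the given Strong's number."""
    by_id = {row["xml_id"]: row for row in rows}
    verses = set()
    for row in rows:
        subject = by_id.get(row["subjref"])  # None when the link is empty or dangling
        if row["lemma"] in SPEECH_LEMMAS and subject and subject["strong"] == subject_strong:
            verses.add((row["book"], row["chapter"], row["verse"]))
    return frozenset(verses)

jesus_verses = speaking_verses(tokens)
```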
syntax_ot
MACULA Hebrew syntax layer — general-purpose query API for the OT.
Wraps the WLC lowfat XML files from the macula-hebrew submodule and caches a flattened DataFrame as Parquet for fast reloads. Parallel to syntax.py (MACULA Greek) so the same query patterns work for both testaments.
Schema (word-level columns)
xml_id          : unique token ID (e.g. o010010010011)
ref             : "GEN 1:1!1" style reference
book            : our canonical book_id (e.g. "Gen")
chapter         : int
verse           : int
word_num        : int (1-based position in verse)
text            : surface form with cantillation (unicode)
lemma           : lexical headword
transliteration : transliteration
strongnumberx   : extended Strong's number (e.g. "7225")
strong_h        : "H"-prefixed form compatible with rest of project (e.g. "H7225")
stronglemma     : lemma of the Strong's entry
morph           : morphology code (e.g. "Vqp3ms")
pos             : part of speech (verb / noun / prep / conj / …)
class_          : syntactic class (verb / noun / prep / …)
type_           : qatal / wayyiqtol / common / proper / …
lang            : H (Hebrew) or A (Aramaic)
stem            : qal / niphal / piel / pual / hiphil / hophal / hithpael / …
gender          : masculine / feminine / common
number          : singular / plural / dual
person          : first / second / third
state           : absolute / construct / determined
role            : s / v / o / io / p / adv / … (syntactic role in clause)
gloss           : short English gloss
english         : contextual English translation
frame           : argument frame (e.g. "A0:id; A1:id;")
subjref         : xml_id of the subject referent
participantref  : xml_id of the antecedent for pronouns/suffixes
greek           : LXX Greek word (inline OT↔LXX alignment)
greekstrong     : LXX Strong's number (plain integer string, e.g. "4160")
greek_g         : "G"-prefixed LXX Strong's (e.g. "G4160")
lexdomain       : Louw-Nida style semantic domain
coredomain      : core semantic domain
sdbh            : SDBH semantic domain
Usage
from bible_grammar.core.syntax_ot import (
    load_syntax_ot, query_syntax_ot,
    MACULA_OT_BOOK_MAP,
)
df = load_syntax_ot()                            # full 475k-row DataFrame
verbs = query_syntax_ot(strong_h='H1254')        # all בָּרָא tokens
niphal = query_syntax_ot(stem='niphal', book='Isa')
lxx_poieo = query_syntax_ot(greekstrong='4160')  # OT words translated as ποιέω
clause_roles_ot(book=None, chapter=None, verse=None)
Return the syntactic role of each word in the given scope. Useful for subject-verb-object analysis per clause.
load_syntax_ot(force_rebuild=False)
Load MACULA Hebrew syntax DataFrame.
On first call, parses all 930 lowfat XML files and caches as Parquet (~30–60 s depending on machine). Subsequent calls load from Parquet (~0.5 s). Pass force_rebuild=True to re-parse from XML.
lxx_alignment(strong_h=None, *, book=None, min_count=3, top_n=10)
Return the LXX word(s) most frequently used to translate a Hebrew lemma, derived from the inline greek/greekstrong columns in MACULA Hebrew.
Returns a DataFrame: greek_g, greek_lemma, greekstrong, count, pct. This is a word-level OT↔LXX alignment from the syntax tree itself (as opposed to the IBM Model 1 alignment in ibm_align.py).
query_syntax_ot(*, book=None, chapter=None, verse=None, strong_h=None, strongnumberx=None, lemma=None, role=None, stem=None, pos=None, lang=None, tense=None, person=None, gender=None, number=None, state=None, greekstrong=None, has_subjref=None, has_participantref=None)
Filtered query over the MACULA Hebrew syntax table.
strong_h accepts 'H7225', 'H7225A', or plain '7225'. greekstrong accepts G-prefixed ('G4160'), plain int (4160), or string ('4160'). tense is matched against the type_ column (e.g. 'qatal', 'wayyiqtol').
targum_query
Load cached Targum verse texts downloaded by scripts/fetch_targum_data.py.
Coverage:
Targum Onkelos   : Gen, Exo, Lev, Num, Deu
Targum Jonathan  : Jos, Jdg, Isa, Jer, Ezk, Hos, Amo, Mic, Nah, Hab, Zec
Targum to Psalms : Psa
Each row: targum, book_id, chapter, verse, text (Aramaic).
Usage:
from bible_grammar.core.targum_query import load_targum
tg = load_targum()
onkelos_gen = tg[(tg.targum == 'Onkelos') & (tg.book_id == 'Gen')]
load_targum(targum=None, book_id=None)
Load Targum data; optionally filter by targum name and/or book_id.
Parameters
targum  : str, optional. One of 'Onkelos', 'Jonathan', 'Psalms'. None returns all.
book_id : str, optional. OSIS book code (e.g. 'Gen', 'Isa', 'Psa'). None returns all.
Raises
FileNotFoundError : if the parquet cache is missing; run scripts/fetch_targum_data.py first.
discourse
formulaic
Formulaic language and fixed phrase detection for the Hebrew OT and Greek NT.
Biblical Hebrew and Greek contain hundreds of fixed expressions — prophetic formulas, doxological phrases, oath/blessing/curse formulas, epistolary greetings — that carry specialized theological or rhetorical functions.
This module provides:
- N-gram frequency extraction (verse-boundary-safe)
- Formula search by lemma sequence with optional '*' wildcard
- Curated HEBREW_FORMULAS and GREEK_FORMULAS reference dictionaries
- Book/chapter distribution analysis
- Whole-corpus formula profile (all curated formulas at once)
Notes on lemma matching
All pattern matching uses the MACULA Hebrew/Greek lemma column. Inflected forms (e.g. וַיְהִי) are matched via their root lemma (הָיָה). The special token '*' in a pattern matches any single lemma. Patterns must not cross verse boundaries (formulas are intra-verse).
Questions this answers
• How often does כֹּה אָמַר יְהוָה appear, and in which books?
• Where does ἀμὴν λέγω ὑμῖν cluster in the Gospels?
• What are the top-30 Hebrew bigrams in the Psalms?
• Which books use the most prophetic formulas?
Public API
HEBREW_FORMULAS → dict of curated OT formulas
GREEK_FORMULAS → dict of curated NT formulas
ot_formula_frequency(n, min_count, book) → DataFrame (ngram, count)
nt_formula_frequency(n, min_count, book) → DataFrame (ngram, count)
ot_formula_search(pattern, book) → DataFrame (ref, context)
nt_formula_search(pattern, book) → DataFrame (ref, context)
formula_book_distribution(pattern, lang) → DataFrame (book, count, pct)
ot_formula_profile() → DataFrame (all HEBREW_FORMULAS)
nt_formula_profile() → DataFrame (all GREEK_FORMULAS)
print_formula_concordance(pattern, lang) → None
print_formula_book_distribution(pattern, lang) → None
print_ot_formula_profile() → None
print_nt_formula_profile() → None
print_ot_top_ngrams(n, min_count, top_n) → None
print_nt_top_ngrams(n, min_count, top_n) → None
formula_book_chart(pattern, lang) → Path | None
formula_chapter_chart(book, formula_key, lang) → Path | None
formula_book_distribution(pattern, *, lang='H')
Count occurrences of a lemma pattern per book.
Returns: book, count, pct — ordered canonically.
formula_chapter_chart(book, formula_key, *, lang='H')
Bar chart of formula occurrences per chapter within a single book.
nt_formula_frequency(n=2, *, min_count=5, book=None)
Top Greek n-gram lemma sequences in the NT (verse-boundary-safe).
Returns: ngram (space-joined lemmas), count.
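"Verse-boundary-safe" means n-grams are collected inside each verse separately, so the last lemma of one verse is never joined to the first lemma of the next. A stdlib sketch of that idea, with a hypothetical helper name and toy lemma lists:

```python
from collections import Counter

def ngram_counts(verses, n=2, min_count=1):
    """Count lemma n-grams verse by verse, never spanning two verses."""
    counts = Counter()
    for lemmas in verses:
        for i in range(len(lemmas) - n + 1):
            counts[" ".join(lemmas[i:i + n])] += 1
    return [(gram, c) for gram, c in counts.most_common() if c >= min_count]

# Each inner list is the lemma sequence of one verse (toy data).
verses = [["ἀμήν", "λέγω", "σύ"], ["ἀμήν", "λέγω", "σύ"], ["λέγω", "σύ"]]
bigrams = dict(ngram_counts(verses, n=2))
```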
nt_formula_profile()
Run all GREEK_FORMULAS searches and return counts.
Returns: key, gloss, function, pattern, count — sorted by count desc.
nt_formula_search(pattern, *, book=None, context_words=3)
Search for a lemma sequence in the Greek NT.
pattern : list of lemmas or space-separated string. Use '*' as single-token wildcard.
Returns: ref, book, chapter, verse, match_text, context.
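The wildcard matching can be sketched as a sliding window over each verse's lemma list. The function `find_pattern`, its (ref, start index) return shape and the toy verses are assumptions for illustration; the real search also returns match text and context.

```python
def find_pattern(verses, pattern):
    """Return (ref, start_index) for each intra-verse match; '*' matches any one lemma."""
    if isinstance(pattern, str):
        pattern = pattern.split()
    if not pattern:
        return []
    hits = []
    for ref, lemmas in verses.items():
        for i in range(len(lemmas) - len(pattern) + 1):
            window = lemmas[i:i + len(pattern)]
            if all(p == "*" or p == w for p, w in zip(pattern, window)):
                hits.append((ref, i))
    return hits

# Toy verses keyed by (book, chapter, verse); lemma lists are illustrative.
verses = {
    ("Mat", 5, 18): ["ἀμήν", "γάρ", "λέγω", "σύ"],
    ("Jhn", 3, 3): ["ἀμήν", "ἀμήν", "λέγω", "σύ"],
}
hits = find_pattern(verses, "ἀμήν * λέγω")
```

Because each verse is scanned on its own, a pattern can never match across a verse boundary.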
ot_formula_frequency(n=2, *, min_count=5, book=None)
Top Hebrew n-gram lemma sequences in the OT (verse-boundary-safe).
Returns: ngram (space-joined lemmas), count.
ot_formula_profile()
Run all HEBREW_FORMULAS searches and return counts.
Returns: key, gloss, function, pattern, count — sorted by count desc.
ot_formula_search(pattern, *, book=None, context_words=3)
Search for a lemma sequence in the Hebrew OT.
pattern : list of lemmas or space-separated string. Use '*' as single-token wildcard.
Returns: ref, book, chapter, verse, match_text, context.
genre_compare
Genre comparison: morphological pattern heatmaps across canonical sections.
Compares how grammatical features distribute differently across the literary genres of the Hebrew OT and Greek NT:
OT comparisons
• Verb stem distribution (Qal / Niphal / Piel / Hiphil / …)
• Verb conjugation (Perfect / Imperfect / Wayyiqtol / Participle / …)
• Part-of-speech mix (Noun / Verb / Adjective / Particle / …)
• Noun state (Absolute / Construct)
NT comparisons
• Verb tense (Aorist / Present / Perfect / Imperfect / Future)
• Verb voice (Active / Middle / Passive / Deponent)
• Verb mood (Indicative / Participle / Infinitive / Subjunctive / Imperative)
• Part-of-speech mix
Usage
from bible_grammar.discourse.genre_compare import genre_compare, print_genre_compare
from bible_grammar.discourse.genre_compare import genre_heatmap, genre_report
# Terminal tables
print_genre_compare('OT', feature='verb_stem')
print_genre_compare('NT', feature='verb_tense')
# Heatmap chart
genre_heatmap('OT', feature='verb_conjugation', output_path='output/charts/ot-genre-conjugation.png')
# Full Markdown report
genre_report(output_dir='output/reports')
genre_compare(corpus='OT', feature='verb_stem', *, normalize=True)
Build a genre×feature count (or percentage) matrix.
Parameters
corpus    : 'OT' or 'NT'
feature   : one of
            OT: 'verb_stem', 'verb_conjugation', 'pos', 'noun_state'
            NT: 'verb_tense', 'verb_voice', 'verb_mood', 'pos'
normalize : if True, values are % of tokens in each genre (row %)
Returns
DataFrame with genres as rows and feature categories as columns. Includes a 'Total' column with the absolute token count.
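Row normalisation here means each genre's counts are divided by that genre's own total, so rows sum to 100 while the 'Total' column keeps the raw count. A small sketch with nested dictionaries standing in for the DataFrame; the genre labels and counts are invented, and `row_percent` is a hypothetical helper.

```python
def row_percent(counts):
    """Convert {genre: {category: count}} to row percentages plus a 'Total' count."""
    out = {}
    for genre, row in counts.items():
        total = sum(row.values())
        out[genre] = {k: (100 * v / total if total else 0.0) for k, v in row.items()}
        out[genre]["Total"] = total  # absolute token count, not a percentage
    return out

raw = {"Law": {"Qal": 60, "Piel": 40}, "Poetry": {"Qal": 30, "Piel": 10}}
matrix = row_percent(raw)
```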
genre_heatmap(corpus='OT', feature='verb_stem', *, output_path=None, pct=True, figsize=None, title=None)
Produce and save a heatmap: rows=genres, cols=feature categories.
Returns the path to the saved PNG.
genre_report(output_dir='output/reports/both/genre', *, ot_features=None, nt_features=None)
Generate a comprehensive Markdown genre comparison report.
Returns path to the saved Markdown file.
print_genre_compare(corpus='OT', feature='verb_stem')
Print a formatted genre comparison table to stdout.
information_structure
Information structure analysis — clause-linking, parataxis/hypotaxis ratios, fronted elements, and postpositive particle profiles for Hebrew OT and Greek NT.
Scope and limitations
True topic/focus identification requires full syntactic annotation beyond what MACULA provides. This module computes well-defined approximations:
- Pre-verbal element counts from verb-form order analysis (OT)
- Parataxis/hypotaxis ratios from connective inventory counts (OT + NT)
- Explicit subject-pronoun frequency as a potential focus marker (NT)
- Postpositive discourse particle density (NT)
These are labeled as proxies throughout; they are linguistic metrics, not full discourse analyses.
Hebrew metrics
parataxis_ratio    : verses beginning with וְ/וַ (wayyiqtol or waw-qatal)
hypotaxis_ratio    : subordinate clauses (כִּי, אֲשֶׁר, inf. construct, rel.)
fronted_ratio      : non-verb-initial clauses (nominal, adverbial before verb)
nominal_clause_pct : clauses with no verb (copula-free)
Greek metrics
de_density        : δέ per 1,000 tokens (topic shift / continuation)
gar_density       : γάρ per 1,000 tokens (explanatory/given)
oun_density       : οὖν per 1,000 tokens (inferential)
men_density       : μέν per 1,000 tokens (correlative/contrast)
asyndeton_pct     : verses with no clause-linking particle
explicit_subj_pct : finite verb clauses with explicit subject pronoun
Questions this answers
• Is Deuteronomy more hypotactic than Genesis? (law vs. narrative)
• Which NT book uses the most γάρ? (Paul is dominant)
• How does John's clause-linking compare to Mark's?
• Which OT books have the highest fronted-element ratio?
Public API
ot_information_profile(book) → dict
nt_information_profile(book) → dict
ot_clause_linking_comparison(books) → DataFrame
nt_clause_linking_comparison(books) → DataFrame
print_ot_information_profile(book) → None
print_nt_information_profile(book) → None
print_ot_clause_linking_comparison(books) → None
print_nt_clause_linking_comparison(books) → None
nt_clause_linking_chart(books) → Path | None
nt_information_heatmap(books) → Path | None
ot_clause_linking_chart(books) → Path | None
nt_clause_linking_chart(books)
Stacked bar chart of NT postpositive particle densities per book.
nt_clause_linking_comparison(books)
Side-by-side information structure metrics for NT books.
nt_information_heatmap(books)
Heatmap of NT information structure metrics across books.
nt_information_profile(book)
Clause-linking and information structure metrics for an NT Greek book.
Returns a dict with: total_tokens, de_density, gar_density, oun_density, men_density, kai_density, explicit_subj_pct.
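The *_density values are rates per 1,000 tokens, which normalises for book length. A minimal sketch of that arithmetic, using an invented lemma list and a hypothetical helper name:

```python
def density_per_1k(lemmas, target):
    """Occurrences of one lemma per 1,000 tokens."""
    return 1000 * lemmas.count(target) / len(lemmas) if lemmas else 0.0

# Toy book of 200 tokens containing δέ three times.
book_lemmas = ["δέ"] * 3 + ["λέγω"] * 197
de_density = density_per_1k(book_lemmas, "δέ")
```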
ot_clause_linking_chart(books)
Grouped bar chart of OT parataxis, hypotaxis, and fronted element ratios.
ot_clause_linking_comparison(books)
Side-by-side information structure metrics for OT books.
ot_information_profile(book)
Clause-linking and information structure metrics for an OT Hebrew book.
Returns a dict with: total_tokens, parataxis_ratio, hypotaxis_ratio, fronted_ratio, nominal_clause_pct, inf_construct_per1k.
speaker
Speaker attribution for NT direct speech.
Combines two complementary strategies:
1. Verse allowlists: hard-coded verse sets for titles where the self-referential instances are known and invariant (e.g. the 7 Johannine I AM predicate sayings).
2. MACULA subjref: speech verbs (λέγω, φημί, …) whose subjref field links to a Jesus token (Strong 2424), giving a verse-level set of "Jesus-speaking" verses detected from the syntax tree.
Strategy 2 is built once and cached as a frozenset for the session.
Public API
is_jesus_speaking(book, chapter, verse) → bool
jesus_speaking_verse_set(books) → frozenset of (book, ch, vs)
The christological_titles module uses both:
• Allowlist titles bypass the strategy-2 filter (confidence='high' with an explicit self_ref_verses list).
• For medium/low titles, filter the co-occurrence results to only verses where is_jesus_speaking() is True.
filter_to_jesus_speech(verses, title=None)
¶
From a list of (book, chapter, verse) tuples, return only those where Jesus is speaking. Uses the title's allowlist if one is available, otherwise MACULA subjref detection.
is_jesus_speaking(book, chapter, verse, title=None)
¶
Return True if Jesus is the speaker in the given verse.
Checks in order
- If title is given and has an allowlist, use that (exact and fast).
- Otherwise fall back to MACULA subjref detection.
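The lookup order above can be sketched as follows. This is a minimal illustration rather than the module's implementation: `TITLE_ALLOWLISTS`, `SUBJREF_VERSES`, the title key `"bread_of_life"`, and `is_jesus_speaking_sketch` are hypothetical names, and the verse sets are small hand-picked samples.

```python
from typing import Optional

Verse = tuple[str, int, int]  # (book_id, chapter, verse)

# Hypothetical per-title allowlist (strategy 1).
TITLE_ALLOWLISTS: dict[str, frozenset[Verse]] = {
    "bread_of_life": frozenset({("Jhn", 6, 35), ("Jhn", 6, 48)}),
}

# Hypothetical stand-in for the subjref-derived verse set (strategy 2).
SUBJREF_VERSES: frozenset[Verse] = frozenset({("Jhn", 6, 35), ("Mrk", 2, 5)})


def is_jesus_speaking_sketch(book: str, chapter: int, verse: int,
                             title: Optional[str] = None) -> bool:
    """Check the title's allowlist first, then fall back to the subjref set."""
    key = (book, chapter, verse)
    allowlist = TITLE_ALLOWLISTS.get(title) if title is not None else None
    if allowlist is not None:
        # An allowlist, when present, is authoritative for that title.
        return key in allowlist
    return key in SUBJREF_VERSES
```

A title without an allowlist simply falls through to the second strategy, so callers do not need to know which titles are covered.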
jesus_speaking_verse_set(books=None, *, force_rebuild=False)
¶
Return a frozenset of (book_id, chapter, verse) tuples where a speech-introducing verb has Jesus (Strong 2424) as its grammatical subject, as determined by MACULA's subjref links.
Cached after first call; pass force_rebuild=True to re-derive.
speech_acts
¶
Speech act classification for biblical direct discourse (OT Hebrew + NT Greek).
Applies a rule-based classifier, following Searle's (1969) speech act taxonomy, to verses with direct speech, using morphological and lexical cues.
Speech act types
────────────────
ASSERTIVE : claims a state of affairs (predicate nominals, "I am YHWH")
DIRECTIVE : commands/requests (imperatives, negated yiqtol/jussive)
COMMISSIVE : commits speaker to future action (1st-person yiqtol + promise)
EXPRESSIVE : praise, lament, thanksgiving (interjections, doxological lemmas)
DECLARATIVE : changes reality by utterance (blessing, curse, naming, forgiveness)
MIXED : verse with multiple dominant types (most common in short verses)
Classification is score-based: the type with the highest cue score wins. When no cue fires, the verse is labelled 'unclassified'.
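The "highest cue score wins" rule can be illustrated with a small sketch. The cue names and weights in `DEMO_CUE_WEIGHTS` are invented for the example and are not the module's OT/NT cue tables; treating an exact tie as MIXED is also an assumption.

```python
# Hypothetical cue table: cue name -> (speech act type, weight).
DEMO_CUE_WEIGHTS = {
    "imperative": ("DIRECTIVE", 2.0),
    "first_person_yiqtol": ("COMMISSIVE", 1.0),
    "predicate_nominal": ("ASSERTIVE", 1.0),
    "interjection": ("EXPRESSIVE", 1.5),
    "blessing_lemma": ("DECLARATIVE", 2.0),
}


def classify_sketch(cues):
    """Sum cue weights per type and return the top-scoring type."""
    scores = {}
    for cue in cues:
        if cue in DEMO_CUE_WEIGHTS:
            act_type, weight = DEMO_CUE_WEIGHTS[cue]
            scores[act_type] = scores.get(act_type, 0.0) + weight
    if not scores:
        return "unclassified"  # no cue fired
    best = max(scores.values())
    winners = [t for t, s in scores.items() if s == best]
    # Assumption: an exact tie between types is reported as MIXED.
    return winners[0] if len(winners) == 1 else "MIXED"
```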
Notes
─────
This is a lexical/morphological approximation. Full illocutionary analysis requires pragmatic context. Edge cases (rhetorical questions as directives, indirect speech) are not handled.
Questions this answers
──────────────────────
• What speech act types dominate YHWH's speech in Isaiah vs. Jeremiah?
• How does Jesus's directive frequency compare between Matthew and John?
• Which Pauline letters have the most commissive (promise) content?
• Are Deuteronomy's laws primarily directives or also declaratives?
Public API
──────────
SPEECH_ACT_TYPES → list of type labels
OT_SPEECH_CUE_WEIGHTS → dict of cue definitions (OT)
NT_SPEECH_CUE_WEIGHTS → dict of cue definitions (NT)
ot_speech_act_data(book, speaker) → DataFrame (verses with type)
ot_speech_act_profile(speaker, book) → DataFrame (type, count, pct)
nt_speech_act_data(book) → DataFrame
nt_speech_act_profile(book) → DataFrame
print_ot_speech_act_profile(speaker, book) → None
print_nt_speech_act_profile(book) → None
print_speech_act_comparison(speakers, lang) → None
speech_act_chart(books, lang) → Path | None
speech_act_heatmap(books, lang) → Path | None
nt_speech_act_comparison(books)
¶
Side-by-side speech act profiles for a list of NT books.
nt_speech_act_data(book=None)
¶
NT verses classified by speech act type.
Returns: ref, book, chapter, verse, speech_act_type.
nt_speech_act_profile(book=None)
¶
Distribution of speech act types for an NT book.
ot_speech_act_comparison(books, *, speaker=None)
¶
Side-by-side speech act type profiles for a list of OT books.
Returns a pivot: rows=speech_act_type, cols=books, cells=% of book's verses.
ot_speech_act_data(book=None, *, speaker=None)
¶
OT verses classified by speech act type.
Optionally filter by book and/or by speaker lemma (e.g., speaker='יְהוָה' for YHWH's speech).
Returns: ref, book, chapter, verse, speech_act_type — one row per verse.
ot_speech_act_profile(speaker=None, *, book=None)
¶
Distribution of speech act types for an OT book/speaker.
Returns: speech_act_type, count, pct.
speech_act_chart(books, *, lang='H', speaker=None)
¶
Stacked bar chart of speech act type percentages per book.
speech_act_heatmap(books, *, lang='H', speaker=None)
¶
Heatmap of speech act percentages: rows=type, cols=books.
stylometrics
¶
Register and style analysis — authorship stylometrics for the Hebrew OT and Greek NT.
Quantitative style metrics that capture lexical richness, syntactic register, and morphological fingerprints for individual books or proposed authorial units.
Hebrew metrics
──────────────
ttr : type-token ratio (unique lemmas / total tokens)
msttr : mean segmental TTR (window-based, fair for cross-length comparison)
hapax_density : hapax lemma tokens / total tokens (%)
wayyiqtol_density : wayyiqtol tokens / total tokens (%)
inf_construct_density : infinitive constructs per 1,000 tokens
asher_density : אֲשֶׁר (relative clause marker) per 1,000 tokens
particle_density : key discourse particles per 1,000 tokens
verb_pct : verbal tokens as % of total
noun_pct : nominal tokens as % of total
Greek metrics
─────────────
ttr
msttr
hapax_density
ptc_to_finite_ratio : participle tokens / finite verb tokens
optative_density : optative verbs per 1,000 tokens
hina_density : ἵνα per 1,000 tokens
inf_density : infinitive verbs per 1,000 tokens
verb_pct
noun_pct
Questions this answers
──────────────────────
• Is Isaiah 1–39 stylistically different from Isaiah 40–66?
• Which Pauline letters cluster together by style?
• How does Mark's Greek compare to Luke's in participle usage?
• Which OT books have the richest vocabulary (MSTTR)?
Public API
──────────
book_style_profile(book, lang) → dict of all style metrics
style_comparison(books, lang) → DataFrame (books × metrics)
msttr(book, lang, window) → float
print_style_profile(book, lang) → None
print_style_comparison(books, lang) → None
style_radar_chart(books, lang) → Path | None
style_heatmap(books, lang) → Path | None
book_style_profile(book, *, lang='H')
¶
Compute all style metrics for a single book.
Returns a dict keyed by metric name. All density values are per 1,000 tokens unless otherwise noted.
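To make the units concrete, here is a small sketch of three of the metric kinds: a plain ratio (ttr), a percentage (hapax_density), and a per-1,000-token density. The function name `basic_style_metrics` and the key `marker_per_1k` are hypothetical; the real profile works on corpus tokens rather than a bare list of lemma strings.

```python
from collections import Counter


def basic_style_metrics(lemmas, marker):
    """Compute ttr, hapax_density (%) and a per-1,000 density for one lemma."""
    total = len(lemmas)
    if total == 0:
        return {"ttr": 0.0, "hapax_density": 0.0, "marker_per_1k": 0.0}
    counts = Counter(lemmas)
    hapax_types = sum(1 for n in counts.values() if n == 1)
    return {
        "ttr": len(counts) / total,                        # unique lemmas / tokens
        "hapax_density": 100.0 * hapax_types / total,      # percentage
        "marker_per_1k": 1000.0 * counts[marker] / total,  # per 1,000 tokens
    }
```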
msttr(book, *, lang='H', window=1000)
¶
Mean segmental TTR (MSTTR) for a book. The default window is 1,000 tokens.
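MSTTR is commonly computed by cutting the text into consecutive, non-overlapping windows, taking the type-token ratio of each, and averaging. The sketch below follows that common definition; whether this module drops the trailing partial window or handles short texts the same way is an assumption, and `msttr_sketch` is a hypothetical name.

```python
def msttr_sketch(lemmas, window=1000):
    """Average the TTR of consecutive full windows of `window` tokens."""
    if window <= 0:
        raise ValueError("window must be positive")
    ratios = []
    # Only full windows are scored; a trailing partial window is dropped.
    for start in range(0, len(lemmas) - window + 1, window):
        segment = lemmas[start:start + window]
        ratios.append(len(set(segment)) / window)
    if not ratios:
        # Assumption: texts shorter than one window fall back to plain TTR.
        return len(set(lemmas)) / len(lemmas) if lemmas else 0.0
    return sum(ratios) / len(ratios)
```

Because every window has the same length, long and short books are compared on equal terms, which plain TTR does not allow.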
style_comparison(books, *, lang='H')
¶
Style metrics for a list of books, side by side.
Returns a DataFrame with books as rows and style metrics as columns. Counts are normalized per 1,000 tokens where applicable.
exercise_pdf
¶
exercise_pdf package — re-exports everything for backwards compatibility.
All public names remain importable from src.bible_grammar.exercise_pdf as before.
ExercisePDF
¶
add_answer_key_contrast(entries)
¶
Draw a compact answer key page for the contrast drill.
add_answer_key_sort(entries)
¶
Draw a compact answer key page for the function-sort exercise.
add_bg_table(entries, show_answers=True)
¶
Draw a Biconsonantal/Geminate drill table with 5 fillable fields per row.
add_contrast_table(entries, show_answers=True)
¶
Draw a Qal-Hiphil contrast table with fillable Translation and Function columns.
add_coverage_table(rows)
¶
Draw a simple 2-col coverage table.
add_drill_with_answer_key(headers, rows, answers, col_ratios=None, heb_cols=None, translit_cols=None, section_title='Items 1–20', answer_title='Answer Key', greek_cols=None, use_greek=False, answer_heb_cols=None)
¶
Render a drill section followed by an answer key section.
add_generic_table(headers, rows, col_ratios=None, heb_cols=None, translit_cols=None, show_answers=True, answer_rows=None, answer_heb_cols=None)
¶
Draw a generic parse table with arbitrary columns.
headers : column header strings
rows : list of row data (each row is a list of strings, same length as headers)
col_ratios : proportional widths (must sum to ~1.0); if None, equal widths
heb_cols : indices of columns that contain Hebrew/Aramaic text (right-aligned, ArialHebrew font)
translit_cols : indices of columns that contain transliteration (Latin Extended / IPA). Uses Arial Unicode MS if registered, otherwise falls back to Helvetica.
show_answers : if True, draw green answer rows below each input row
answer_rows : if show_answers, the answer data (same shape as rows); if None, answers = rows
answer_heb_cols : additional column indices that are Hebrew only in answer rows (blank input in question row)
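The col_ratios rule (proportional widths summing to roughly 1.0, equal widths when None) can be sketched as below. `resolve_col_widths`, the `table_width` argument, and the 0.01 tolerance are hypothetical; the class may normalise ratios instead of rejecting them.

```python
def resolve_col_widths(n_cols, table_width, col_ratios=None, tolerance=0.01):
    """Turn proportional ratios into absolute column widths."""
    if col_ratios is None:
        # No ratios given: split the table width evenly.
        return [table_width / n_cols] * n_cols
    if len(col_ratios) != n_cols:
        raise ValueError("col_ratios must have one entry per column")
    if abs(sum(col_ratios) - 1.0) > tolerance:
        raise ValueError("col_ratios must sum to ~1.0")
    return [ratio * table_width for ratio in col_ratios]
```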
add_multi_part_drill(parts, heb_cols=None, translit_cols=None, greek_cols=None, use_greek=False)
¶
Render multiple drill parts each followed by their answer key.
parts : list of dicts with keys:
title : section heading string (e.g. 'Part A: Long Vowels (1–5)')
headers : column header list
rows : blank/prompt rows
answers : answer rows
col_ratios : (optional) override col_ratios for this part
translit_cols : (optional) override translit_cols for this part
greek_cols : (optional) override greek_cols for this part
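As an illustration of the expected shape, the sketch below builds a two-part `parts` list and checks it before it would be handed to add_multi_part_drill. The cell contents are placeholders, the second title is invented, and `check_parts` is a hypothetical helper; requiring answers to match rows in shape is an assumption carried over from add_generic_table.

```python
REQUIRED_KEYS = {"title", "headers", "rows", "answers"}

parts = [
    {
        "title": "Part A: Long Vowels (1-5)",
        "headers": ["#", "Form", "Vowel name"],
        "rows": [["1", "<form 1>", ""], ["2", "<form 2>", ""]],
        "answers": [["1", "<form 1>", "<answer 1>"], ["2", "<form 2>", "<answer 2>"]],
    },
    {
        "title": "Part B: Short Vowels (6-10)",  # placeholder title
        "headers": ["#", "Form", "Vowel name"],
        "rows": [["6", "<form 6>", ""]],
        "answers": [["6", "<form 6>", "<answer 6>"]],
        "col_ratios": [0.1, 0.5, 0.4],  # optional per-part override
    },
]


def check_parts(parts):
    """Validate keys and row widths; return the number of parts."""
    for part in parts:
        missing = REQUIRED_KEYS - set(part)
        if missing:
            raise ValueError(f"missing keys: {sorted(missing)}")
        width = len(part["headers"])
        if len(part["rows"]) != len(part["answers"]):
            raise ValueError("rows and answers must have the same length")
        for row in part["rows"] + part["answers"]:
            if len(row) != width:
                raise ValueError("every row must match the header width")
        ratios = part.get("col_ratios")
        if ratios is not None and len(ratios) != width:
            raise ValueError("col_ratios must match the header width")
    return len(parts)
```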
add_nh_table(entries, show_answers=True)
¶
Draw a Niphal-Hiphil contrast table with fillable Stem/Conjugation/PGN/Root columns.
add_note(text)
¶
Draw a note/info box.
add_passage(block)
¶
Draw ref + Hebrew + English (+ optional watchout).
add_sort_table(entries, show_answers=True)
¶
Draw a semantic-function sorting table with a single fillable Function column.
add_verb_table(verbs, show_answers=True)
¶
Draw the parse table for one or more verbs.
GreekExercisePDF
¶
Bases: ExercisePDF
Base class for BBG Greek exercises. Uses GreekFont for Greek text columns.
add_greek_table(headers, rows, col_ratios=None, greek_cols=None, show_answers=True, answer_rows=None)
¶
Draw a parse table with Greek text support.
greek_cols : indices of columns that display Greek (rendered with GreekFont).
Other columns behave as in add_generic_table.
__main__
¶
Run all exercise PDF builders.
Usage
python3 -m src.bible_grammar.exercise_pdf
python3 src/bible_grammar/exercise_pdf/main.py
intertextuality
¶
intertextuality
¶
Intertextuality network: OT verse / chapter / book → NT quotations.
Given any OT anchor (verse, chapter, or whole book), finds all NT verses that quote or allude to it via the scrollmapper cross-reference data (OpenBible.info, CC-BY), scored by community vote confidence.
Three query modes¶
verse : single OT verse → all NT citations (e.g. Isa 53:5)
chapter : OT chapter → NT citation network (e.g. Psa 22)
book : whole OT book → NT network overview (e.g. Isaiah)
Output¶
• Terminal table (reference, NT verse, votes, KJV text)
• NetworkX graph with matplotlib layout saved as PNG
• Standalone HTML report (table + embedded graph + KJV snippets)
• CSV of all edges
Usage¶
from bible_grammar.intertextuality.intertextuality import (
    intertextuality, print_intertextuality,
    intertextuality_graph, intertextuality_report,
)
Terminal output¶
print_intertextuality('Isa', chapter=53)
print_intertextuality('Psa', chapter=22)
print_intertextuality('Isa', chapter=53, verse=5)
Network graph PNG¶
intertextuality_graph('Isa', chapter=53, output_path='output/charts/isa53-network.png')
Full HTML + CSV report¶
intertextuality_report('Isa', chapter=53, output_dir='output/reports')
intertextuality(ot_book, *, chapter=None, verse=None, min_votes=20, include_kjv=True)
¶
Return a DataFrame of OT→NT quotation links.
Parameters¶
ot_book : OT book ID (e.g. 'Isa', 'Psa', 'Gen')
chapter : OT chapter (None = all chapters in the book)
verse : OT verse (None = all verses in the chapter)
min_votes : minimum community-vote score for inclusion
include_kjv : fetch KJV text snippets for each NT verse
Returns¶
DataFrame with columns: ot_ref, ot_book, ot_chapter, ot_verse, nt_ref, nt_book, nt_chapter, nt_verse, votes, ot_text, nt_text
intertextuality_graph(ot_book, *, chapter=None, verse=None, min_votes=20, output_path=None, figsize=(14, 9), layout='spring')
¶
Render the OT→NT quotation network as a PNG.
Node types:
• OT verses: square markers, blue
• NT books: circle markers, coral (book-level aggregation)
• NT verses: circle markers, orange (if verse/chapter scope)
Edge weight proportional to vote score.
Returns the path to the saved PNG.
intertextuality_report(ot_book, *, chapter=None, verse=None, min_votes=20, output_dir='output/reports/both/intertextuality')
¶
Generate an HTML + CSV report for an OT→NT intertextuality network.
Returns the path to the saved HTML report.
print_intertextuality(ot_book, *, chapter=None, verse=None, min_votes=20)
¶
Print a formatted intertextuality table to stdout.
lxx_consistency
¶
LXX translation consistency: how uniformly does each LXX book/translator render a given Hebrew root?
Uses IBM Model 1 word-level alignment to measure
- Overall consistency score (0–100): percentage of aligned tokens that use the most common LXX rendering for that book
- Per-book rendering profile: which Greek lemma(s) each book uses and how often
- Cross-book divergences: books whose primary rendering differs from the corpus-wide primary
High consistency (>90%) = the LXX translator treated this word uniformly. Low consistency or cross-book divergence may indicate:
- Different translation philosophy between books
- Semantic range of the Hebrew root not captured by one Greek word
- Textual / recensional differences between LXX traditions
Usage¶
from bible_grammar.intertextuality.lxx_consistency import ( lxx_consistency, print_lxx_consistency, consistency_heatmap )
Single root¶
lxx_consistency('H7307') # רוּחַ spirit/wind
Print formatted report¶
print_lxx_consistency('H7307')
Another root¶
print_lxx_consistency('H2617') # חֶסֶד lovingkindness
Heatmap of rendering choices across books¶
consistency_heatmap('H7307', output_path='output/charts/ruach-lxx-consistency.png')
batch_consistency(roots, *, min_count=3)
¶
Run lxx_consistency() for multiple roots; return a summary DataFrame.
Columns: strongs, lemma, gloss, total_aligned, corpus_primary, corpus_primary_pct, overall_consistency, n_divergent_books
consistency_heatmap(roots, *, min_count=3, output_path=None, title=None)
¶
Generate a heatmap of LXX rendering choices across books for one or more Hebrew roots.
Rows = LXX lemmas, Columns = OT books, Cell value = % of tokens in that book using that rendering. Books/lemmas with no data are blank.
Parameters¶
roots : Single strongs string or list
min_count : Minimum tokens per book to include
output_path : Save to PNG if provided; otherwise display inline
title : Chart title (auto-generated if None)
lxx_consistency(heb_strongs, *, min_count=3, min_book_count=2)
¶
Measure per-book LXX translation consistency for a Hebrew root.
Parameters¶
heb_strongs : Hebrew Strong's number, e.g. 'H7307', 'H1697'
min_count : Minimum total aligned tokens in a book to include it
min_book_count : Minimum books required for a meaningful analysis
Returns a dict
strongs : normalised strongs
lemma : Hebrew lemma (from lexicon)
gloss : English gloss
total_aligned : total word-level alignment tokens
corpus_primary : most common LXX lemma across all books
corpus_primary_pct : percentage using corpus_primary
overall_consistency : weighted average consistency across books
books : list of per-book dicts (sorted by book order):
    {
      book_id, book_name,
      total,              # aligned tokens in this book
      primary_lemma,      # most-used LXX lemma in this book
      primary_pct,        # % using primary_lemma
      consistency,        # same as primary_pct (0–100)
      diverges,           # True if primary_lemma != corpus_primary
      rendering_profile,  # dict {lxx_lemma: count}
    }
divergent_books : list of book_ids where primary != corpus_primary
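The per-book fields above follow from simple counting once aligned (book, LXX lemma) pairs are available. The sketch below assumes such pairs as plain tuples and weights the overall score by each book's token count; `consistency_sketch` is a hypothetical name, the alignment step itself is not shown, and the sample counts are invented.

```python
from collections import Counter, defaultdict


def consistency_sketch(aligned, min_count=3):
    """aligned: list of (book_id, lxx_lemma) pairs for one Hebrew root."""
    corpus_counts = Counter(lemma for _, lemma in aligned)
    if not corpus_counts:
        return {}
    corpus_primary, corpus_primary_n = corpus_counts.most_common(1)[0]
    total_aligned = sum(corpus_counts.values())

    per_book = defaultdict(Counter)
    for book_id, lemma in aligned:
        per_book[book_id][lemma] += 1

    books = []
    for book_id, profile in per_book.items():  # first-seen order, not canon order
        total = sum(profile.values())
        if total < min_count:
            continue  # too few tokens to say anything about this book
        primary_lemma, n = profile.most_common(1)[0]
        pct = 100.0 * n / total
        books.append({
            "book_id": book_id, "total": total,
            "primary_lemma": primary_lemma, "primary_pct": pct,
            "consistency": pct,
            "diverges": primary_lemma != corpus_primary,
            "rendering_profile": dict(profile),
        })

    kept = sum(b["total"] for b in books)
    # Assumption: "weighted average" means weighted by tokens per book.
    overall = sum(b["consistency"] * b["total"] for b in books) / kept if kept else 0.0
    return {
        "total_aligned": total_aligned,
        "corpus_primary": corpus_primary,
        "corpus_primary_pct": 100.0 * corpus_primary_n / total_aligned,
        "overall_consistency": overall,
        "books": books,
        "divergent_books": [b["book_id"] for b in books if b["diverges"]],
    }


# Invented counts, for illustration only.
sample = ([("Gen", "πνεῦμα")] * 4 + [("Job", "ἄνεμος")] * 3
          + [("Job", "πνεῦμα")] + [("Psa", "πνεῦμα")] * 2)
report = consistency_sketch(sample)
```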
print_lxx_consistency(heb_strongs, *, min_count=3)
¶
Print a formatted LXX translation consistency report.
parallel
¶
Parallel passage viewer: Hebrew MT | LXX Greek | KJV side by side.
For OT passages, shows all three columns verse by verse. For NT passages, shows Greek (TAGNT) | KJV. Word-level detail is also available for deeper analysis.
Usage¶
from bible_grammar.intertextuality.parallel import parallel_passage, print_parallel, parallel_words
OT passage — three columns¶
parallel_passage('Gen', 1, 1, end_verse=5)
NT passage — two columns¶
parallel_passage('Jhn', 1, 1, end_verse=14)
Range with an explicit end chapter¶
parallel_passage('Isa', 53, 1, end_chapter=53, end_verse=12)
Print formatted to console¶
print_parallel('Gen', 1, 1, end_verse=5)
Word-level detail for a single verse¶
parallel_words('Gen', 1, 1)
parallel_passage(book_id, start_chapter, start_verse, *, end_chapter=None, end_verse=None, include_lxx=True, include_vulgate=False)
¶
Build a verse-by-verse parallel passage table.
Parameters¶
book_id : Bible book ID (e.g. 'Gen', 'Isa', 'Jhn')
start_chapter : Starting chapter number
start_verse : Starting verse number
end_chapter : Ending chapter (defaults to start_chapter)
end_verse : Ending verse (defaults to start_verse)
include_lxx : Include LXX column for OT books (default True)
include_vulgate : Include Latin Vulgate column (default False)
Returns¶
DataFrame with columns: reference, hebrew (OT) or greek_nt (NT), lxx (OT only), kjv, [vulgate]
Examples¶
parallel_passage('Gen', 1, 1, end_verse=5)
parallel_passage('Isa', 53, 1, end_verse=12)
parallel_passage('Jhn', 1, 1, end_verse=18)
parallel_passage('Psa', 22, 1, end_verse=5)
parallel_words(book_id, chapter, verse, *, include_lxx=True)
¶
Word-level parallel for a single verse.
Returns a dict with
'reference' : str
'hebrew' : DataFrame of TAHOT words (OT only)
'lxx' : DataFrame of LXX words (OT only)
'greek_nt' : DataFrame of TAGNT words (NT only)
'kjv' : str
print_parallel(book_id, start_chapter, start_verse, *, end_chapter=None, end_verse=None, include_lxx=True, include_vulgate=False, width=72)
¶
Print a formatted parallel passage to the console.
Each verse is printed as a block: reference header, then each text column labeled and wrapped.
quotation_align
¶
NT quotation word alignment: traces which Hebrew words NT authors quote, and whether they follow LXX vocabulary or diverge toward the MT.
For each NT→OT quotation pair, this module:
1. Fetches the NT verse (TAGNT), OT verse (TAHOT), and LXX verse
2. Identifies which NT Greek words appear in the LXX rendering of the same OT verse (LXX-following words)
3. Identifies NT Greek words whose Hebrew root equivalent (via IBM Model 1 alignment) differs from the LXX rendering (MT-leaning divergences)
4. Produces per-word alignment verdicts: LXX | MT-diverge | LXX+MT | neutral
Usage¶
from bible_grammar.intertextuality.quotation_align import quotation_align, print_quotation_align
Single NT verse¶
quotation_align('Mat', 4, 4)
Print formatted analysis¶
print_quotation_align('Mat', 4, 4)
Batch: all high-confidence quotations in Hebrews¶
from bible_grammar.intertextuality.quotation_align import batch_align
df = batch_align(nt_book='Heb', min_votes=50)
batch_align(*, nt_book=None, ot_book=None, min_votes=50)
¶
Run quotation_align() across all matching NT quotations.
Returns a summary DataFrame with one row per (NT verse, OT ref): nt_ref, ot_ref, votes, lxx_following_pct, mt_diverge_count, summary, total_content_words
print_quotation_align(nt_book, nt_chapter, nt_verse, *, min_votes=5)
¶
Print a formatted word-alignment analysis for an NT verse.
quotation_align(nt_book, nt_chapter, nt_verse, *, min_votes=5, content_only=True)
¶
Word-level alignment analysis for an NT verse's OT quotations.
For each OT cross-reference, returns a list of word-alignment dicts, one per content word in the NT verse.
Parameters¶
nt_book/nt_chapter/nt_verse : NT verse coordinates
min_votes : Minimum cross-reference vote threshold
content_only : If True, only analyse nouns, verbs, adjectives, adverbs
Returns a list of dicts, one per (NT verse, OT ref) pair:
{
  nt_ref : str, e.g. 'Mat 4:4'
  ot_ref : str, e.g. 'Deu 8:3'
  votes : int
  words : list of per-word dicts:
    {
      nt_word : surface form
      nt_strongs : normalised Greek strongs
      nt_lemma : lemma (from TAGNT)
      nt_pos : part of speech
      lxx_match : bool, True if this strongs appears in the LXX verse
      heb_root : str, Hebrew strongs aligned to this Greek word (IBM Model 1)
      heb_word : str, Hebrew surface form
      verdict : 'LXX' | 'MT-diverge' | 'LXX+MT' | 'neutral'
        LXX = matches LXX exactly (Greek strongs in LXX verse)
        MT-diverge = NT word's Hebrew root is present in OT verse but the LXX renders it differently
        LXX+MT = both (LXX uses this strongs AND it aligns to OT Hebrew)
        neutral = function word or no alignment data
    }
  lxx_following_pct : float, % of content words that follow LXX
  mt_diverge_count : int, content words that diverge from LXX toward MT
  summary : str, 'follows LXX' | 'mixed' | 'MT-leaning'
}
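The four verdicts reduce to two boolean tests per content word. The sketch below encodes the definitions given above; `verdict_sketch` and `lxx_following_pct_sketch` are hypothetical names, and counting 'LXX+MT' words as LXX-following is an assumption. The thresholds behind the summary label are not documented here, so they are left out.

```python
def verdict_sketch(is_content_word, lxx_match, heb_root_in_ot_verse):
    """Map the two alignment tests onto a verdict label."""
    if not is_content_word:
        return "neutral"  # function words are not judged
    if lxx_match and heb_root_in_ot_verse:
        return "LXX+MT"
    if lxx_match:
        return "LXX"
    if heb_root_in_ot_verse:
        return "MT-diverge"
    return "neutral"  # no alignment data either way


def lxx_following_pct_sketch(verdicts):
    """Percentage of verdicts that follow the LXX (assumes LXX+MT counts)."""
    if not verdicts:
        return 0.0
    following = sum(1 for v in verdicts if v in ("LXX", "LXX+MT"))
    return 100.0 * following / len(verdicts)
```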
quotations
¶
NT quotations / allusions of the OT — three-way text comparison.
Cross-reference data from scrollmapper (OpenBible.info CC-BY). For each NT→OT reference pair, retrieves:
- NT verse (TAGNT Greek word forms)
- OT Hebrew verse (TAHOT)
- LXX Greek verse (CenterBLC/LXX)
Usage¶
from bible_grammar.intertextuality.quotations import nt_quotations, verse_comparison
All NT→OT references with votes >= 50¶
df = nt_quotations(min_votes=50)
Detailed three-way comparison for a specific NT verse¶
cmp = verse_comparison('Heb', 2, 8)
nt_quotations(*, nt_book=None, ot_book=None, min_votes=10, top_n=None)
¶
Return NT→OT cross-references, filtered by relevance vote score.
Parameters¶
nt_book : Restrict NT side to one or more book_ids (e.g. 'Heb', 'Rom')
ot_book : Restrict OT side to one or more book_ids (e.g. 'Isa', 'Psa')
min_votes : Minimum vote score (higher = stronger consensus). Default 10. Use >= 50 for probable direct quotes; >= 100 for certain quotes.
top_n : Return only the top N results sorted by votes descending.
Returns¶
DataFrame with columns: nt_book, nt_chapter, nt_verse, ot_book, ot_chapter, ot_verse, votes, from_ref_raw, to_ref_raw
quotation_summary(*, nt_book=None, ot_book=None, min_votes=25)
¶
Summarize NT→OT quotation density: how many quotation pairs per NT book.
Returns a DataFrame with columns
nt_book, total_references, unique_nt_verses, unique_ot_verses, top_ot_source
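The summary columns can be derived by grouping the vote-filtered pairs by NT book. The sketch below works on plain tuples instead of a DataFrame; `quotation_summary_sketch` is a hypothetical name, the vote numbers are invented, and sorting by total_references is an assumption.

```python
from collections import Counter, defaultdict


def quotation_summary_sketch(pairs, min_votes=25):
    """pairs: iterable of (nt_ref, ot_ref, votes); refs are (book, ch, vs)."""
    by_book = defaultdict(list)
    for nt_ref, ot_ref, votes in pairs:
        if votes >= min_votes:
            by_book[nt_ref[0]].append((nt_ref, ot_ref))

    rows = []
    for nt_book, links in by_book.items():
        ot_books = Counter(ot_ref[0] for _, ot_ref in links)
        rows.append({
            "nt_book": nt_book,
            "total_references": len(links),
            "unique_nt_verses": len({nt_ref for nt_ref, _ in links}),
            "unique_ot_verses": len({ot_ref for _, ot_ref in links}),
            "top_ot_source": ot_books.most_common(1)[0][0],
        })
    rows.sort(key=lambda r: r["total_references"], reverse=True)
    return rows


# Vote scores below are made up for the example.
sample_pairs = [
    (("Heb", 1, 5), ("Psa", 2, 7), 120),
    (("Heb", 1, 5), ("2Sa", 7, 14), 90),
    (("Heb", 2, 6), ("Psa", 8, 4), 60),
    (("Mat", 4, 4), ("Deu", 8, 3), 150),
    (("Heb", 2, 6), ("Job", 7, 17), 10),  # below the default threshold
]
summary = quotation_summary_sketch(sample_pairs)
```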
quotation_table(book, chapter, verse, *, min_votes=5)
¶
Tabular form of verse_comparison: one row per (NT word position, OT ref), suitable for display in a notebook.
Columns: nt_verse_ref, ot_verse_ref, votes, nt_words (space-joined), ot_words (space-joined), lxx_words (space-joined)
verse_comparison(book, chapter, verse, *, min_votes=5)
¶
Three-way comparison for an NT verse: NT Greek | OT Hebrew | LXX Greek.
Parameters¶
book, chapter, verse : NT verse reference (book_id, e.g. 'Heb')
min_votes : Minimum vote threshold for cross-references to include
Returns a dict with
'nt' : list of word dicts (word, lemma, strongs, pos, tense, voice, mood)
'refs' : list of {ot_ref, ot_words, lxx_words, votes} dicts, one per OT target
lexical
¶
collocation
¶
Collocation statistics: find words that co-occur with a target word significantly more often than chance predicts.
Uses Pointwise Mutual Information (PMI) and log-likelihood ratio (G²) as association measures. Words with high PMI appear near the target far more often than their corpus frequency alone would predict.
Usage¶
from bible_grammar.lexical.collocation import collocations, print_collocations
What words appear near רוּחַ (spirit) in the OT?¶
print_collocations('H7307', window=5, corpus='OT')
What words cluster around λόγος in the NT?¶
print_collocations('G3056', window=5, corpus='NT')
Return raw DataFrame for notebook display¶
df = collocations('H7307', window=5, corpus='OT')
collocation_network(targets, *, window=5, corpus='OT', min_count=3, top_n=15, output_path=None)
¶
Generate a collocation network chart for one or more target roots.
Rows = target roots, Cols = top collocates (by combined log-likelihood), Cells = observed co-occurrence count. Saved as PNG.
collocations(target, *, window=5, corpus='OT', book=None, book_group=None, min_count=3, top_n=30, exclude_particles=True)
¶
Find words that significantly co-occur with a target Hebrew or Greek root.
Parameters¶
target : Strong's number, e.g. 'H7307', 'G3056'
window : Number of words either side to consider a co-occurrence
corpus : 'OT', 'NT', or 'LXX'
book : Restrict to a single book (e.g. 'Gen', 'Rom')
book_group : Restrict to a book group ('torah', 'pauline', etc.)
min_count : Minimum raw co-occurrence count to include a collocate
top_n : Return top N collocates by PMI
exclude_particles : If True, skip Hebrew grammatical particles (H9xxx)
Returns a DataFrame with columns
strongs, lemma, gloss, co_count, target_count, collocate_count, corpus_size, pmi, log_likelihood, expected
Sorted by log_likelihood descending.
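The returned count columns are enough to recompute the association scores. A common formulation is PMI = log2(observed / expected), with expected = target_count × collocate_count / corpus_size, and Dunning's G² = 2 Σ O·ln(O/E) over the 2×2 contingency table. The sketch below uses those textbook formulas; whether the module uses exactly this variant is an assumption, and the code assumes co_count does not exceed either marginal count.

```python
import math


def association_scores(co_count, target_count, collocate_count, corpus_size):
    """Return expected count, PMI (log base 2) and log-likelihood G²."""
    expected = target_count * collocate_count / corpus_size
    pmi = math.log2(co_count / expected) if co_count > 0 else float("-inf")

    # 2x2 table: rows = target present/absent, cols = collocate present/absent.
    observed = [
        [co_count, target_count - co_count],
        [collocate_count - co_count,
         corpus_size - target_count - collocate_count + co_count],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                e = row_totals[i] * col_totals[j] / corpus_size
                g2 += o * math.log(o / e)
    return {"expected": expected, "pmi": pmi, "log_likelihood": 2.0 * g2}
```

PMI rewards rare but exclusive pairings, while G² also takes the amount of evidence into account, which is one reason to sort by log-likelihood when low counts are present.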
print_collocations(target, *, window=5, corpus='OT', book=None, min_count=3, top_n=20)
¶
Print a formatted collocation report for a target Hebrew/Greek root.
concordance
¶
Works across all three corpora
- TAHOT (Hebrew/Aramaic OT) — match on strongs or surface word
- TAGNT (Greek NT) — match on strongs or surface word
- LXX (Greek OT) — match on lemma, lemma_translit, or strongs
Usage¶
from bible_grammar.lexical.concordance import concordance, lemma_frequency
Every occurrence of בָּרָא (H1254) in the OT¶
concordance(strongs='H1254')
Qal stem only, with KJV context¶
concordance(strongs='H1254', stem='Qal', context='KJV')
Greek ποιέω in NT¶
concordance(strongs='G4160', corpus='NT')
ποιέω in LXX with book filter¶
concordance(strongs='G4160', corpus='LXX', book='Gen')
Frequency table: how often does each lemma appear?¶
lemma_frequency(strongs='H5414', corpus='OT')
concordance(*, strongs=None, word=None, lemma=None, lemma_translit=None, stem=None, part_of_speech=None, corpus='OT', book=None, book_group=None, context='KJV', include_hebrew=True, include_lxx=False, sort_by='canonical')
¶
Find all occurrences of a lemma with verse context.
Parameters¶
strongs : Strong's number, e.g. 'H1254' or 'G4160'
word : Match on surface word form (partial match)
lemma : Match on lemma (LXX Unicode lemma, e.g. 'ποιέω')
lemma_translit : Match on LXX transliterated lemma, e.g. 'poieo'
stem : Filter by Hebrew stem (Qal, Niphal, etc.)
part_of_speech : Filter by POS (Verb, Noun, etc.)
corpus : 'OT', 'NT', or 'LXX'
book : book_id or list of book_ids to restrict search
book_group : 'torah', 'prophets', 'writings', 'gospels', 'pauline'
context : 'KJV', 'Vulgate', 'Hebrew', 'Greek', or None for no context
include_hebrew : Include Hebrew word column in output (OT only)
include_lxx : Also search the LXX when corpus='OT'
sort_by : 'canonical' (Bible order) or 'book' (alphabetical)
Returns¶
DataFrame with columns: reference, book_id, chapter, verse, word, strongs, [stem, part_of_speech, ...morphology...], context_text
lemma_frequency(*, strongs=None, stem=None, part_of_speech=None, corpus='OT', book_group=None, top_n=50)
¶
Frequency breakdown of a lemma by book.
Parameters¶
strongs : Strong's number
stem : Filter by Hebrew stem
part_of_speech : Filter by POS
corpus : 'OT', 'NT', or 'LXX'
book_group : Limit to a named group
top_n : Top N books by count
Returns a DataFrame: book_id, book_name, count, pct
top_lemmas(*, corpus='OT', part_of_speech=None, stem=None, book=None, book_group=None, top_n=30, min_count=1)
¶
Most frequent lemmas (by Strong's number) across a corpus or book set.
Returns a DataFrame: strongs, count, pct
domain_search
¶
Louw-Nida semantic domain search for the Greek NT (MACULA Greek).
Each word token in the MACULA Greek NT carries a domain column
(e.g. '033006') and an ln column (e.g. '33.69'), both referencing
the Louw-Nida Greek-English Lexicon of the New Testament.
Domain format
─────────────
domain : zero-padded 6-digit string, e.g. '033006' (domain 33, subdomain 6)
ln : dot-separated decimal, e.g. '33.69' (domain 33, section 69)
Top-level domains 1–93; subdomains further classify meaning. A single word may carry multiple domain entries (space-separated).
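Both encodings can be unpacked with a few lines of string handling. The sketch below assumes the 6-digit code splits as three digits of domain followed by three digits of subdomain, as the '033006' example suggests; the function names are hypothetical.

```python
def parse_domain_code(code):
    """'033006' -> (33, 6): three digits of domain, three of subdomain."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected a 6-digit code, got {code!r}")
    return int(code[:3]), int(code[3:])


def parse_ln(ref):
    """'33.69' -> (33, 69): domain and section of a Louw-Nida reference."""
    domain, _, section = ref.partition(".")
    return int(domain), int(section)


def parse_domain_field(field):
    """A token may carry several space-separated codes."""
    return [parse_domain_code(code) for code in field.split()]
```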
Questions this answers
──────────────────────
• What Communication-domain verbs does God use in the NT?
• Which Supernatural-Being words cluster in the book of Revelation?
• What Moral/Ethical-quality terms appear in Paul?
• Which Judgment-domain verbs does Jesus take as subject in the Gospels?
• What is the domain profile of Hebrews vs Romans?
Public API
──────────
query_domain(domain, ...) → filtered token DataFrame
domain_profile(book, ...) → domain distribution for a book
domain_role_search(domain, subject_strongs, ...) → domain words where entity is subject
top_domain_words(domain, ...) → most frequent words in a domain
print_domain_summary(domain, ...) → terminal summary
DOMAIN_NAMES → dict mapping domain number → name
domain_comparison(books, *, top_n=15, exclude_discourse=True)
¶
Compare domain profiles across multiple NT books.
Returns a pivot table: rows=domain, cols=books, cells=% of book's tokens.
domain_profile(book, *, top_n=20, exclude_discourse=True)
¶
Semantic domain profile for a single NT book — what percentage of its vocabulary falls in each top-level Louw-Nida domain?
Parameters¶
book : NT book_id (e.g. 'Rom', 'Rev')
top_n : return top N domains by token count
exclude_discourse : if True, exclude domains 83–92 (discourse markers, particles, pronouns) for a cleaner content-word profile
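A domain profile of this kind amounts to counting top-level domain numbers per token and optionally dropping the discourse range. In the sketch below, `domain_profile_sketch` is a hypothetical name, the input is a plain list of domain numbers (one per token), and computing percentages over the tokens that remain after exclusion is an assumption.

```python
from collections import Counter


def domain_profile_sketch(token_domains, top_n=20, exclude_discourse=True):
    """Return (domain, count, pct) tuples for the most frequent domains."""
    counts = Counter(
        d for d in token_domains
        if not (exclude_discourse and 83 <= d <= 92)
    )
    total = sum(counts.values())
    return [
        (domain, n, round(100.0 * n / total, 1))
        for domain, n in counts.most_common(top_n)
    ]
```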
domain_role_search(domain, subject_strongs, corpus='NT', *, books=None, top_n=20)
¶
Find tokens in the given Louw-Nida domain(s) where the grammatical subject is one of the given Strong's number(s).
Combines domain filtering with syntactic role search, e.g.:
"all Communication-domain verbs where God is the subject"
"all Judgment-domain words where Jesus is the agent"
Parameters¶
domain : Louw-Nida domain number(s)
subject_strongs : Strong's number(s) for the subject
corpus : 'NT' only (OT domain data uses different taxonomy)
books : restrict to specific books
top_n : return top N lemmas by count
Returns a DataFrame with lemma, gloss, ln, count columns.
print_domain_role(domain, subject_strongs, *, books=None, top_n=20, label=None, subject_label=None)
¶
Print a formatted table: domain words with given entity as subject.
print_domain_summary(domain, *, book=None, top_n=20, label=None)
¶
Print a formatted table of the top words in a Louw-Nida domain.
query_domain(domain, *, book=None, part_of_speech=None, subdomain=None, exact_ln=None, has_subjref=False)
¶
Return all NT tokens belonging to the given Louw-Nida domain(s).
Parameters¶
domain : top-level domain number(s), e.g. 33, '33', [12, 33], or a domain name that will be matched against DOMAIN_NAMES
book : restrict to specific book(s)
part_of_speech : filter by word class ('verb', 'noun', etc.)
subdomain : 6-digit domain code to match exactly, e.g. '033006'
exact_ln : exact Louw-Nida reference, e.g. '33.69'
has_subjref : if True, only return tokens that have a subjref link
top_domain_words(domain, *, book=None, part_of_speech=None, top_n=20)
¶
Most frequent lemmas in a given Louw-Nida domain.
Returns a DataFrame with columns: lemma, strong_g, gloss, ln, count.
hapax
¶
Hapax legomena: words occurring exactly once (or rarely) in a corpus or book.
A "hapax legomenon" (Greek: "said only once") is a word that appears exactly once in a given corpus. Biblical hapaxes are significant for lexicography and translation because their meaning must be inferred from context, cognates, or ancient translations.
Usage¶
from bible_grammar.lexical.hapax import hapax_legomena, hapax_table, hapax_summary
All hapaxes in the OT (by Strong's lemma)¶
hapax_legomena(corpus='OT')
Hapaxes in a specific book¶
hapax_legomena(book='Job') # Job has the most OT hapaxes
hapax_legomena(book='Rev') # NT book
Hapaxes by POS¶
hapax_legomena(corpus='OT', part_of_speech='Verb')
Allow 'rare' words (appearing <= N times)¶
hapax_legomena(corpus='OT', max_count=5)
Print formatted table¶
hapax_table(book='Job', top_n=20)
Summary stats: hapax count per book, sorted by count¶
hapax_summary(corpus='OT')
hapax_legomena(*, corpus=None, book=None, part_of_speech=None, max_count=1, min_count=1, include_gloss=True, include_context=True, scope='corpus')
¶
Find hapax legomena — words occurring rarely in a given scope.
Parameters¶
corpus : 'OT' or 'NT'. Defaults to 'OT' (or inferred from book).
book : Restrict search to a specific book (e.g. 'Job', 'Rev').
part_of_speech : Filter by POS (e.g. 'Verb', 'Noun').
max_count : Maximum total occurrences to qualify (default 1 = strict hapax).
min_count : Minimum occurrences (default 1, set higher to exclude missing data).
include_gloss : Add lemma/gloss columns from TBESH/TBESG lexicons.
include_context : Add KJV verse text for the (first) occurrence.
scope : 'corpus' counts occurrences across the whole OT/NT; 'book' counts occurrences within the specified book only. Use scope='book' with book= to find words unique to one book.
Returns¶
DataFrame with columns: strongs, lemma, gloss, word (surface), book_id, chapter, verse, reference, corpus_count, [context_text]
Sorted by canonical book order, then chapter, verse.
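The difference between the two scopes is only which tokens are counted. The sketch below uses (book_id, strongs) tuples with placeholder identifiers such as 'Hx1'; `hapax_sketch` is a hypothetical name and returns the first occurrence of each qualifying lemma.

```python
from collections import Counter


def hapax_sketch(tokens, book=None, max_count=1, min_count=1, scope="corpus"):
    """tokens: list of (book_id, strongs) pairs in canonical order."""
    if scope == "book":
        if book is None:
            raise ValueError("scope='book' requires book=")
        counted = [s for b, s in tokens if b == book]  # count inside the book
    elif scope == "corpus":
        counted = [s for _, s in tokens]               # count everywhere
    else:
        raise ValueError("scope must be 'corpus' or 'book'")
    counts = Counter(counted)

    seen = set()
    result = []
    for b, s in tokens:
        if book is not None and b != book:
            continue  # only report occurrences in the requested book
        if s in seen:
            continue  # keep the first occurrence only
        if min_count <= counts[s] <= max_count:
            seen.add(s)
            result.append((b, s, counts[s]))
    return result


# Placeholder identifiers, not real Strong's numbers.
demo_tokens = [("Gen", "Hx1"), ("Gen", "Hx2"), ("Job", "Hx2"),
               ("Job", "Hx3"), ("Job", "Hx4"), ("Job", "Hx4")]
```

With these tokens, 'Hx2' occurs once in Job and once in Genesis, so it qualifies under scope='book' for Job but not under scope='corpus'.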
hapax_summary(corpus='OT', *, max_count=1, part_of_speech=None)
¶
Summary statistics: hapax count per book, sorted by count.
hapax_table(book=None, corpus='OT', *, top_n=50, max_count=1, part_of_speech=None, scope='corpus')
¶
Print a formatted hapax legomena table to the console.
Parameters¶
book : Restrict to a specific book (optional)
corpus : 'OT' or 'NT' (used when book is None)
top_n : Max rows to print
max_count : Maximum occurrences to qualify
scope : 'corpus' or 'book'
lexicon
¶
Lexicon module — public API over the STEPBible TBESH (Hebrew) and TBESG (Greek) brief lexicons that ship with the stepbible-data submodule.
Previously these were private helpers inside wordstudy.py. Promoting them here makes lexicon lookups available to any module and to notebook code.
Data model
──────────
Each entry is a dict:
strongs : canonical extended Strong's (e.g. "H1254A", "G4160")
lemma : Hebrew/Greek headword
translit : transliteration
pos_code : language:type-gender-extra (e.g. "H:V", "G:N-M")
gloss : short English gloss
definition : full definition (HTML stripped)
Usage
─────
from bible_grammar.lexical.lexicon import lookup, search_gloss, lex_entry
Single lookup¶
entry = lookup('H1254') # בָּרָא
entry = lookup('G4160') # ποιέω
entry = lookup('G2424') # Ἰησοῦς
Fuzzy gloss search¶
results = search_gloss('create', lang='H')
Print formatted entry¶
lex_entry('H7965') # שָׁלוֹם
Available after submodule init; gracefully returns {} if lexicon missing.¶
lemma_index(lang)
¶
Return a {lemma → strongs} reverse lookup for fast lemma-to-Strong's resolution. lang='H' or 'G'.
Normalises Greek lemmas to lowercase NFC. Used by resolve_strongs() and the word-study module.
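Why the normalisation matters: the same Greek lemma can arrive precomposed or decomposed, and with different capitalisation. The sketch below shows one way to build such a reverse index with the standard library. The helper names and the "first entry wins" collision rule are assumptions for illustration, not the module's documented behaviour.

```python
import unicodedata


def normalise_lemma(lemma):
    """Lowercase and NFC-normalise a lemma so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", lemma.lower())


def build_lemma_index(entries):
    """Map normalised lemma -> Strong's number (first entry wins on collisions)."""
    index = {}
    for entry in entries:
        index.setdefault(normalise_lemma(entry["lemma"]), entry["strongs"])
    return index


LOGOS = "λόγος"
entries = [
    {"strongs": "G3056", "lemma": LOGOS},
    {"strongs": "G2316", "lemma": "θεός"},
]
index = build_lemma_index(entries)

# A capitalised, decomposed (NFD) spelling still resolves to the same entry.
decomposed = unicodedata.normalize("NFD", LOGOS[0].upper() + LOGOS[1:])
print(index.get(normalise_lemma(decomposed)))
```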
lex_entry(strongs)
¶
Print a formatted lexicon entry to stdout.
lookup(strongs)
¶
Return the lexicon entry for a Strong's number.
Accepts any of: 'H1254', 'H1254A', 'G2424', 'G2424G'. Returns {} if not found or lexicon file missing.
search_gloss(query, *, lang=None, max_results=20)
¶
Search entries whose gloss or definition contains query (case-insensitive).
lang : 'H' for Hebrew only, 'G' for Greek only, None for both. Returns a list of entry dicts (includes 'strongs' key).
morph_chart
¶
Morphological distribution charts: visualise how a root's grammatical forms distribute across books.
For Hebrew verbs: stem × conjugation breakdown per book (stacked bar). For Greek verbs: tense × voice per book (stacked bar). For Greek nouns: case distribution per book (stacked bar).
Usage¶
from bible_grammar.lexical.morph_chart import morph_distribution, morph_chart
Hebrew verb — stem breakdown across books¶
morph_distribution('H1696') # דָבַר to speak
morph_chart('H1696', output_path='output/charts/dabar-stems.png')
Greek verb — tense × voice across books¶
morph_distribution('G3004') # λέγω to say
morph_chart('G3004', output_path='output/charts/lego-tense.png')
Greek noun — case distribution¶
morph_distribution('G3056') # λόγος word
morph_chart('G3056', output_path='output/charts/logos-case.png')
morph_chart(strongs, *, chart_type='stacked_bar', min_book_count=3, output_path=None, title=None, pct=True)
¶
Generate a morphological distribution chart for a root.
Parameters¶
strongs : Hebrew or Greek Strong's number
chart_type : 'stacked_bar' (default) or 'heatmap'
min_book_count : Minimum tokens per book to include
output_path : Save to PNG if provided; otherwise /tmp/
title : Chart title (auto-generated if None)
pct : If True, normalise to 100% per book (stacked_bar only)
morph_distribution(strongs, *, min_book_count=3)
¶
Compute per-book morphological distribution for a root.
Returns a dict
strongs, lemma, gloss, is_hebrew, pos, dim1, dim2,
pivot : DataFrame (rows=books, cols=morph categories, values=count)
pivot_pct : same but normalised to 100% per book
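The relationship between pivot, pivot_pct and min_book_count can be sketched without pandas. The token data and function names below are invented for illustration; the real function returns DataFrames rather than nested dicts.

```python
from collections import Counter, defaultdict

# Toy tokens for one root: (book_id, stem). Values are illustrative only.
TOKENS = [
    ("Gen", "Qal"), ("Gen", "Qal"), ("Gen", "Piel"), ("Gen", "Piel"),
    ("Exo", "Piel"), ("Exo", "Piel"), ("Exo", "Piel"),
    ("Rut", "Qal"),
]


def pivot_counts(tokens, min_book_count=3):
    """Rows = books, columns = morph categories, values = token counts."""
    pivot = defaultdict(Counter)
    for book, category in tokens:
        pivot[book][category] += 1
    # Drop books with fewer than min_book_count tokens of this root.
    return {b: dict(c) for b, c in pivot.items() if sum(c.values()) >= min_book_count}


def to_percent(pivot):
    """Normalise every row so that it sums to 100."""
    return {
        book: {cat: 100.0 * n / sum(row.values()) for cat, n in row.items()}
        for book, row in pivot.items()
    }


pivot = pivot_counts(TOKENS)
pivot_pct = to_percent(pivot)
print(pivot)
print(pivot_pct)
```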
print_morph_distribution(strongs, *, min_book_count=3)
¶
Print a formatted morphological distribution table.
phrase
¶
Phrase search for Hebrew OT, Greek NT, and LXX.
Finds consecutive word sequences within a verse. Each position in the phrase can be specified as a Strong's number, a lemma, a morphology constraint dict, or a wildcard.
Usage¶
from bible_grammar.lexical.phrase import phrase_search
Hebrew: דְּבַר יְהוָה "word of the LORD"¶
phrase_search(['H1697', 'H3068'])
Greek NT: λόγος θεοῦ "word of God"¶
phrase_search(['G3056', 'G2316'], corpus='NT')
LXX: same phrase in Septuagint¶
phrase_search(['G3056', 'G2316'], corpus='LXX')
Accept lemmas directly (resolved automatically)¶
phrase_search(['λόγος', 'θεός'], corpus='NT')
phrase_search(['דָּבָר', 'יְהוָה'], corpus='OT')
Mixed: Niphal perfect followed by any noun¶
phrase_search([{'stem': 'Niphal', 'conjugation': 'Perfect'}, {'pos': 'Noun'}])
Wildcard: "word of ___ God" (any one word in between)¶
phrase_search(['H1697', '*', 'H0430'])
Constraint dict keys (for morphology-based positions): strongs, lemma, pos, stem, conjugation, tense, voice, mood, person, number, gender, case_, state
phrase_search(tokens, *, corpus='OT', book=None, book_group=None, chapter=None, include_kjv=True, max_results=500)
¶
Search for a consecutive word sequence within verses.
Parameters¶
tokens : List of search tokens. Each element may be:
- A Strong's number string: 'H1697', 'G3056'
- A lemma string: 'λόγος', 'שָׁלוֹם', 'דָּבָר'
- A constraint dict: {'pos': 'Verb', 'stem': 'Niphal'}
- '*' or None — wildcard (matches any word)
corpus : 'OT' (Hebrew/Aramaic TAHOT), 'NT' (Greek TAGNT), or 'LXX'
book : Restrict to one book or list of books (book_id, e.g. 'Gen')
book_group : 'torah', 'prophets', 'writings', 'gospels', 'pauline'
chapter : Restrict to a single chapter number
include_kjv : Attach KJV verse text to results (default True)
max_results : Cap on number of results returned (default 500)
Returns a DataFrame with columns
book_id, chapter, verse,
word_1 .. word_N (surface form of each matched word),
strongs_1 .. strongs_N,
reference (e.g. 'Gen 1:1'),
kjv_text (if include_kjv=True)
Examples¶
דְּבַר יְהוָה "word of the LORD" anywhere in OT¶
phrase_search(['H1697', 'H3068'])
Same phrase in Jeremiah only¶
phrase_search(['H1697', 'H3068'], book='Jer')
λόγος θεοῦ in NT¶
phrase_search(['G3056', 'G2316'], corpus='NT')
Using lemmas¶
phrase_search(['λόγος', 'θεός'], corpus='NT')
Niphal perfect followed by a noun in Isaiah¶
phrase_search([{'stem': 'Niphal', 'conjugation': 'Perfect'}, {'pos': 'Noun'}], book='Isa')
Wildcard: H1697 + anything + H3068¶
phrase_search(['H1697', '*', 'H3068'])
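The matching rule behind these examples (consecutive positions within one verse, each position a Strong's number, a constraint dict, or a wildcard) can be sketched as follows. This is a simplified illustration, not the module's code: the verse is a hand-made list of dicts, the helper names are hypothetical, and lemma resolution is left out.

```python
def token_matches(spec, word):
    """spec is '*' or None (wildcard), a Strong's string, or a constraint dict."""
    if spec is None or spec == "*":
        return True
    if isinstance(spec, dict):
        # Every key in the constraint must equal the word's value for that key.
        return all(word.get(key) == value for key, value in spec.items())
    return word.get("strongs") == spec


def find_phrase(verse_words, specs):
    """Return the start indexes where specs match consecutive words of one verse."""
    n = len(specs)
    return [
        start
        for start in range(len(verse_words) - n + 1)
        if all(token_matches(s, w) for s, w in zip(specs, verse_words[start:start + n]))
    ]


# Hand-made verse; the morphology values are illustrative.
verse = [
    {"strongs": "H1961", "pos": "Verb"},
    {"strongs": "H1697", "pos": "Noun"},
    {"strongs": "H3068", "pos": "Noun"},
    {"strongs": "H0413", "pos": "Preposition"},
]

print(find_phrase(verse, ["H1697", "H3068"]))
print(find_phrase(verse, ["H1961", "*", "H3068"]))
print(find_phrase(verse, [{"pos": "Verb"}, {"pos": "Noun"}]))
```

Because matching is positional, a wildcard consumes exactly one word; reversing the two Strong's numbers finds nothing in this verse.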
print_phrase_results(df, *, max_rows=30, show_strongs=False)
¶
Print phrase search results in a readable format.
print_proximity_results(df, *, max_rows=25)
¶
Print proximity search results in a readable format.
proximity_search(tokens, *, within=5, ordered=False, corpus='OT', book=None, book_group=None, include_kjv=True, max_results=500)
¶
Find verses where two or more tokens appear within N words of each other, optionally crossing verse boundaries.
Parameters¶
tokens : List of 2+ search tokens (same formats as phrase_search: Strong's numbers, lemmas, or morphology constraint dicts). Wildcards ('*') are not meaningful here and will be ignored.
within : Maximum word distance between the first and last matched token (counts intervening words, including across verse boundaries). E.g. within=5 means at most 4 words between the two terms.
ordered : If True, tokens must appear in the given order (left to right). If False (default), any order is accepted.
corpus : 'OT', 'NT', or 'LXX'
book : Restrict to one book or list of books
book_group : 'torah', 'prophets', 'writings', 'gospels', 'pauline'
include_kjv : Attach KJV verse text for the verse of the first match token
max_results : Cap on number of results (default 500)
Returns a DataFrame with columns
book_id, chapter_1, verse_1, word_num_1, word_1, strongs_1, book_id, chapter_2, verse_2, word_num_2, word_2, strongs_2, distance, reference, kjv_text (if include_kjv)
For 3+ tokens: columns extend to _3, _4 etc., distance is span of all.
Examples¶
אמונה and חסד within 5 words in the Writings¶
proximity_search(['H0530', 'H2617'], within=5, book_group='writings')
ברית and שלום within 8 words anywhere in OT¶
proximity_search(['H1285', 'H7965'], within=8)
πίστις and ἀγάπη within 7 words in Paul¶
proximity_search(['G4102', 'G26'], within=7, corpus='NT', book_group='pauline')
Ordered: H6944 (holy) before H2617 (kindness) within 10 words¶
proximity_search(['H6944', 'H2617'], within=10, ordered=True)
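The within and ordered semantics can be made concrete with a two-term sketch over a flat word stream. Running positions ignore verse breaks, which is how a window can cross a verse boundary. The function name, the filler code H0000 and the stream are assumptions for this example only.

```python
def proximity_pairs(stream, first, second, *, within=5, ordered=False):
    """Return (pos_first, pos_second, distance) for hits at most `within` words apart."""
    pos_first = [i for i, s in enumerate(stream) if s == first]
    pos_second = [i for i, s in enumerate(stream) if s == second]
    hits = []
    for a in pos_first:
        for b in pos_second:
            distance = abs(b - a)
            if distance == 0 or distance > within:
                continue  # same token, or too far apart
            if ordered and b < a:
                continue  # second term precedes the first
            hits.append((a, b, distance))
    return hits


# Flat stream of Strong's numbers; "H0000" stands in for unrelated words.
stream = ["H1285", "H0000", "H0000", "H7965"] + ["H0000"] * 6 + ["H1285"]

print(proximity_pairs(stream, "H1285", "H7965", within=5))
print(proximity_pairs(stream, "H1285", "H7965", within=8))
print(proximity_pairs(stream, "H1285", "H7965", within=8, ordered=True))
```

A position difference of 5 leaves 4 words between the two terms, which matches the within=5 description above.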
semantic_profile
¶
Semantic range explorer: unified full-profile report for any Hebrew or Greek root.
Combines
- Lexicon entry (lemma, gloss, definition, POS)
- Corpus frequency and book distribution
- Morphological form breakdown
- LXX translation equivalents and consistency (Hebrew only)
- OT → LXX → NT trajectory (Hebrew only)
- Top collocates (statistically significant neighbors)
- Example verses (KJV)
Produces either a formatted terminal report or a shareable Markdown file with an embedded distribution chart.
Usage¶
from bible_grammar.lexical.semantic_profile import semantic_profile, print_semantic_profile
from bible_grammar.lexical.semantic_profile import save_semantic_profile
Terminal report¶
print_semantic_profile('H7965') # שָׁלוֹם peace
print_semantic_profile('G3056') # λόγος word
Save as Markdown + PNG chart¶
save_semantic_profile('H7965', output_dir='output/reports')
save_semantic_profile('G3056', output_dir='output/reports')
print_semantic_profile(strongs, **kwargs)
¶
Print a formatted semantic profile to stdout.
save_semantic_profile(strongs, *, output_dir=None, collocate_window=5, min_collocate_count=3, top_collocates=10, example_verses=5)
¶
Save a complete semantic profile as a Markdown report with embedded chart.
Returns the path to the saved Markdown file.
semantic_profile(strongs, *, collocate_window=5, min_collocate_count=3, top_collocates=10, example_verses=4)
¶
Build a complete semantic profile for a Hebrew or Greek root.
Returns a dict combining all available analyses.
stats
¶
Frequency tables and aggregation helpers.
freq_table(df, groupby, sort=True)
¶
Return a count DataFrame grouped by one or more columns.
greek_verb_forms(book_group=None, book=None)
¶
Tense × voice × mood counts for Greek verbs.
niphal_perfects_by_book()
¶
Count of niphal perfect verbs in each OT book.
pos_distribution(source='TAHOT', book=None)
¶
Part-of-speech distribution for a source (TAHOT or TAGNT).
verb_stems_by_book(testament=None, book=None)
¶
Count Hebrew verb stems, optionally filtered by testament or book.
synonym
¶
Synonym comparison: side-by-side profile of near-synonym Hebrew or Greek roots.
For each root shows OT/NT frequency, morphological distribution, primary LXX translation equivalent(s), and NT usage — making differences in usage pattern, register, and theological trajectory visible at a glance.
Usage¶
from bible_grammar.lexical.synonym import compare_synonyms, print_synonym_comparison
Two Hebrew roots for "love"¶
compare_synonyms(['H157', 'H2836'])
Three roots for "word"¶
compare_synonyms(['H1697', 'H0565', 'H6310'])
Greek words for love¶
compare_synonyms(['G26', 'G5368', 'G5360'], corpus='NT')
Accept lemmas too¶
compare_synonyms(['אָהַב', 'חָשַׁק'])
compare_synonyms(['ἀγάπη', 'φιλία'], corpus='NT')
compare_synonyms(terms, *, corpus=None, book=None, book_group=None, top_books=5, top_forms=5)
¶
Build a side-by-side comparison profile for two or more synonym roots.
Parameters¶
terms : List of Strong's numbers or lemmas to compare
corpus : 'OT', 'NT', or None (auto-detected from strongs prefix)
book : Restrict all profiles to one book or list of books
book_group : 'torah', 'prophets', 'writings', 'gospels', 'pauline'
top_books : Number of top books to include in each profile (default 5)
top_forms : Number of top morphological forms to include (default 5)
Returns a list of profile dicts (one per term), each containing:
strongs, lemma, translit, gloss, definition, total,
by_book (DataFrame),
top_forms (DataFrame),
lxx_equivalents (DataFrame, Hebrew only),
nt_trajectory (list of dicts, Hebrew only),
shared_lxx (set — LXX lemmas shared with other terms in the set)
print_synonym_comparison(terms, *, corpus=None, book=None, book_group=None)
¶
Print a formatted side-by-side synonym comparison.
Parameters match compare_synonyms().
synonym_table(terms, *, corpus=None)
¶
Compact tabular summary — one row per term — suitable for notebook display.
Columns: strongs, lemma, gloss, total_occurrences, lxx_primary, lxx_primary_pct, nt_occurrences
termmap
¶
Theological term mapping: trace Hebrew roots across OT, LXX, and NT.
For a given set of Hebrew Strong's numbers, builds a structured table showing:
- OT occurrence count and distribution
- Primary LXX translation equivalent(s) with word-level alignment confidence
- NT occurrence count of the primary LXX term
This replaces the verse-level co-occurrence approach in alignment.py with IBM Model 1 word-level alignment for much higher precision.
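For readers unfamiliar with IBM Model 1, the sketch below shows the textbook EM procedure on a toy parallel corpus. It is not taken from termmap: it omits the NULL source word, the verse pairs are invented, and the function names are hypothetical. It only illustrates why word-level alignment is sharper than raw co-occurrence counting.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """Estimate t(greek | hebrew) from (hebrew_words, greek_words) verse pairs via EM."""
    greek_vocab = {g for _, greek in pairs for g in greek}
    uniform = 1.0 / len(greek_vocab)
    t = defaultdict(lambda: uniform)  # t[(g, h)] approximates P(g | h)
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each Greek word over the Hebrew words of its verse.
        for hebrew, greek in pairs:
            for g in greek:
                norm = sum(t[(g, h)] for h in hebrew)
                for h in hebrew:
                    frac = t[(g, h)] / norm
                    count[(g, h)] += frac
                    total[h] += frac
        # M-step: re-estimate the translation probabilities.
        for (g, h), c in count.items():
            t[(g, h)] = c / total[h]
    return t


def best_rendering(t, hebrew_word):
    """Greek word with the highest estimated probability for hebrew_word."""
    candidates = {g: p for (g, h), p in t.items() if h == hebrew_word}
    return max(candidates, key=candidates.get)


# Invented mini-corpus of Strong's numbers (Hebrew side, Greek side).
pairs = [
    (["H1285", "H7965"], ["G1242", "G1515"]),
    (["H1285"], ["G1242"]),
    (["H7965", "H2617"], ["G1515", "G1656"]),
    (["H2617"], ["G1656"]),
]
t = ibm_model1(pairs)
for h in ("H1285", "H7965", "H2617"):
    print(h, "->", best_rendering(t, h))
```

In the first verse pair, plain co-occurrence would link H7965 to both Greek words equally; the EM iterations shift that probability mass toward G1515 because G1242 is better explained by H1285.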
Usage¶
from bible_grammar.lexical.termmap import term_map, print_term_map, THEOLOGICAL_TERMS
Single term¶
term_map('H1285') # בְּרִית "covenant"
Batch — built-in set of key theological terms¶
df = term_map(THEOLOGICAL_TERMS)
print_term_map(df)
Custom list¶
term_map(['H1285', 'H2617', 'H571'])
print_term_map(df=None, *, theme=None)
¶
Print a formatted theological term mapping table.
Parameters¶
df : Output of term_map(). If None, runs term_map(THEOLOGICAL_TERMS).
theme : Filter to a single theme (substring match, case-insensitive).
term_map(strongs=None, *, min_alignment_count=3, top_lxx=3)
¶
Build a term-mapping table for one or more Hebrew Strong's numbers.
Parameters¶
strongs : Single strongs string, list of strings, or dict {theme: [strongs, ...]} (defaults to THEOLOGICAL_TERMS)
min_alignment_count : Minimum word-level alignment count to include an LXX equivalent
top_lxx : How many top LXX equivalents to show per root
Returns a DataFrame with columns
theme, heb_strongs, heb_lemma, heb_gloss, ot_count,
lxx_lemma_1, lxx_strongs_1, lxx_pct_1, lxx_nt_count_1,
lxx_lemma_2, lxx_strongs_2, lxx_pct_2, lxx_nt_count_2,
lxx_lemma_3, lxx_strongs_3, lxx_pct_3, lxx_nt_count_3
term_map_table(strongs=None)
¶
Compact pivot: one row per (Hebrew root, LXX equivalent), suitable for display in a notebook or export to CSV.
Columns: theme, heb_strongs, heb_lemma, heb_gloss, ot_count, lxx_lemma, lxx_strongs, lxx_gloss, lxx_pct, nt_count
trajectory
¶
Cross-testament word trajectory: Hebrew OT → LXX → Greek NT.
For any Hebrew or Greek Strong's number, stitches together the full lexical journey of a word across three corpora into a single pipeline report and chart.
Questions this answers
──────────────────────
• How does שָׁלוֹם (shalom) travel from Hebrew OT → LXX εἰρήνη → NT?
• Is NT δικαιοσύνη (righteousness) using LXX vocabulary or fresh coinage?
• How does the frequency and distribution of רוּחַ / πνεῦμα shift from OT to LXX to NT?
• Which Greek NT authors use the most LXX-derived covenant vocabulary?
Pipeline stages
───────────────
1. OT Hebrew — word_study(): lemma, definition, total count, by-book, morphological forms
2. OT→LXX — lxx_alignment(): which Greek words translate this Hebrew root, with frequency and percentage
3. LXX corpus — lxx_by_book() / query_lxx(): how the LXX Greek word is distributed across the Septuagint
4. NT Greek — TAGNT query: total count, by-book distribution
5. Continuity — compares LXX primary rendering with NT usage to assess whether NT adopts LXX vocabulary (high continuity) or diverges (new word choice)
Public API
──────────
word_trajectory(strongs, ...) → dict with all pipeline stages
print_trajectory(strongs, ...) → formatted terminal report
trajectory_chart(strongs, ...) → 3-panel bar chart PNG
save_trajectory_report(strongs, ...) → Markdown report + chart
batch_trajectories(strongs_list, *, output_dir='output/reports/ot/lexicon')
¶
Generate trajectory reports for a list of Strong's numbers. Returns list of Markdown file paths.
print_trajectory(strongs, *, top_n=10)
¶
Print a formatted cross-testament trajectory to stdout.
save_trajectory_report(strongs, *, output_dir='output/reports/ot/lexicon', top_n=15)
¶
Generate a full Markdown + chart report for a word's cross-testament trajectory. Returns path to saved Markdown file.
trajectory_chart(strongs, *, output_path=None, figsize=(15, 5))
¶
Three-panel horizontal bar chart: OT by-book | LXX by-book | NT by-book. Returns path to saved PNG.
word_trajectory(strongs, *, top_lxx_forms=3)
¶
Full cross-testament trajectory for a Hebrew or Greek Strong's number.
Parameters¶
strongs : Strong's number, e.g. 'H7965' or 'G1515'
top_lxx_forms : how many LXX rendering forms to include
Returns a dict with keys
strongs, lemma, translit, gloss, definition, is_hebrew,
ot_total, ot_by_book, # Hebrew/Aramaic OT (always present for H)
morph_forms, # morphological breakdown
lxx_primary, lxx_primary_g, # top LXX Greek word + Strong's (for H words)
lxx_alignment, # DataFrame: Greek renderings with %
lxx_total, lxx_by_book, # LXX corpus distribution
lxx_consistency, # % of tokens using primary rendering
nt_strongs, nt_lemma, # NT counterpart Strong's and lemma
nt_total, nt_by_book, # NT distribution
continuity, # 'high'|'medium'|'low'|'none'
continuity_note, # plain-text explanation
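As an illustration of how lxx_primary and lxx_consistency relate to the rendering table, here is a minimal sketch. The counts are invented and the helper name is hypothetical. The thresholds that map a percentage onto 'high', 'medium' or 'low' continuity are not documented here, so the sketch stops at the percentage.

```python
from collections import Counter


def rendering_profile(renderings):
    """Return (table, primary, consistency_pct) for a non-empty list of Greek renderings."""
    counts = Counter(renderings)
    total = sum(counts.values())
    # One row per rendering: (lemma, count, percentage), most frequent first.
    table = [
        (lemma, n, round(100.0 * n / total, 1))
        for lemma, n in counts.most_common()
    ]
    primary, _, consistency = table[0]
    return table, primary, consistency


# Invented alignment output: ten tokens of one Hebrew root and their renderings.
PRIMARY = "εἰρήνη"
OTHER = "σωτηρία"
renderings = [PRIMARY] * 8 + [OTHER] * 2

table, primary, consistency = rendering_profile(renderings)
print(primary, consistency)
print(table)
```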
wordstudy
¶
Word study tool: one-stop profile for any Hebrew or Greek lemma.
Combines lexicon definitions (TBESH/TBESG), corpus statistics, morphological form inventory, LXX translation equivalents (for Hebrew), and example verses into a single structured result.
Usage¶
from bible_grammar.lexical.wordstudy import word_study, print_word_study
Hebrew word by Strong's¶
word_study('H1254') # בָּרָא "to create"
Greek word by Strong's¶
word_study('G4160') # ποιέω "to do/make"
Print formatted output¶
print_word_study('H1254')
print_word_study('G4160')
print_word_study(strongs, *, example_verses=5)
¶
Print a formatted word study to stdout.
resolve_strongs(term)
¶
Resolve a term to a Strong's number.
Accepts
- A Strong's number directly: 'H1285', 'G3056'
- A Hebrew lemma (with or without vowel points): 'שָׁלוֹם', 'שלום'
- A Greek lemma: 'λόγος', 'εἰρήνη'
Returns the Strong's number string, or None if not found.
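Accepting Hebrew lemmas "with or without vowel points" implies some form of point-stripping before lookup. One plausible approach, shown here as an assumption rather than the module's actual method, is to drop Unicode nonspacing marks and look up the bare consonants. The index contents and helper names are illustrative, and the technique as written suits Hebrew input only.

```python
import unicodedata

# shalom with vowel points, written with escapes so the code points are explicit:
# shin, qamats, shin dot, lamed, vav, holam, final mem
POINTED = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"
UNPOINTED = "\u05e9\u05dc\u05d5\u05dd"


def strip_points(text):
    """Remove nonspacing marks (vowel points, accents) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def resolve(term, index):
    """Return a Strong's number for term, or None if it cannot be resolved."""
    if term[:1] in ("H", "G") and term[1:2].isdigit():
        return term  # already a Strong's number
    return index.get(strip_points(term))


# Hypothetical consonant-only index with a single entry.
index = {strip_points(POINTED): "H7965"}

print(resolve(POINTED, index), resolve(UNPOINTED, index), resolve("H1285", index))
```

A consonant-only key conflates homographs that differ only in pointing, so a real resolver needs a tie-breaking rule that this sketch does not attempt.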
word_study(strongs, *, example_verses=5)
¶
Complete word study for a Strong's number.
Parameters¶
strongs : e.g. 'H1254', 'H1254A', 'G4160', 'G4160G'
example_verses : Number of example verses to include (default 5)
Returns a dict with keys
strongs, lemma, translit, gloss, pos_code, definition,
total_occurrences,
by_book (DataFrame),
morphological_forms (DataFrame),
translation_equivalents (OT Hebrew only, DataFrame),
nt_usage (OT Hebrew only — NT occurrences if any),
lxx_usage (OT Hebrew only — LXX Greek equivalent stats),
examples (list of {reference, word, context} dicts)
word_study_table(strongs)
¶
Compact tabular form of a word study — one row per occurrence, with reference, inflected form, morphology, and KJV verse. Equivalent to concordance() but with lexicon gloss in the header.
names
¶
christological_titles
¶
Christological titles: frequency analysis of titles Jesus used to refer to Himself.
IMPORTANT — speaker attribution caveat
───────────────────────────────────────
The STEPBible TAGNT data does not tag speakers. This module therefore counts all occurrences of each title pattern in the Gospels (and Acts/Epistles where relevant), not only instances where Jesus is the speaker or referent.
For most titles this is a minor caveat
• "Son of Man" — nearly every instance in the Gospels is Jesus speaking of Himself (the crowd's question in Jhn 12:34 is the main exception).
• Johannine "I AM" sayings — all are Jesus speaking.
• "Son of God" — also used by demons, the high priest, the centurion; ~30% of occurrences are NOT Jesus self-referential.
• "Son of David" — almost entirely crowds/petitioners addressing Jesus, not self-reference.
• "Lord" / "Christ" — too broad; included for completeness with caveats.
The self_ref_confidence field in each title record documents the reliability.
Pattern strategy
────────────────
Each title is matched by requiring that a set of Strong's numbers ALL appear in the same verse (verse-level co-occurrence). This is conservative — it may miss a verse where the title is split across very long sentences, but it avoids false positives from phrase-search windowing across sentence boundaries.
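The verse-level rule amounts to a subset test: a verse counts for a title when every required Strong's number occurs somewhere in it. The sketch below uses a hypothetical two-entry pattern table and hand-made tokens; the real TITLE_REGISTRY layout is not shown in this reference and may differ.

```python
from collections import defaultdict

# Hypothetical patterns: title -> Strong's numbers that must all occur in one verse.
PATTERNS = {
    "Son of Man": {"G5207", "G444"},
    "Son of God": {"G5207", "G2316"},
}

# Hand-made tokens: (book_id, chapter, verse, strongs).
TOKENS = [
    ("Mrk", 1, 1, "G5207"), ("Mrk", 1, 1, "G2316"),
    ("Mrk", 1, 2, "G444"),
    ("Mrk", 2, 10, "G5207"), ("Mrk", 2, 10, "G444"), ("Mrk", 2, 10, "G1849"),
]


def title_verses(tokens, patterns):
    """Map each title to the sorted verses containing ALL of its Strong's numbers."""
    by_verse = defaultdict(set)
    for book, chapter, verse, strongs in tokens:
        by_verse[(book, chapter, verse)].add(strongs)
    return {
        title: sorted(ref for ref, seen in by_verse.items() if required <= seen)
        for title, required in patterns.items()
    }


hits = title_verses(TOKENS, PATTERNS)
print(hits)
```

Word order and adjacency are ignored by design, so a verse containing both lemmas in unrelated phrases would also count.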
Usage
─────
from bible_grammar.names.christological_titles import (
    title_counts, print_title_counts, title_report, TITLE_REGISTRY,
)
Terminal table¶
print_title_counts()
With NT epistles included¶
print_title_counts(scope='NT')
Markdown report¶
title_report(output_dir='output/reports')
print_title_counts(scope='gospels', *, speaker_filter=False)
¶
Print a formatted title frequency table to stdout.
title_chart(scope='gospels', *, groups=None, output_path=None, figsize=(13, 7))
¶
Grouped bar chart: x=titles, groups=Gospel books (or NT sections). Returns path to the saved PNG.
title_counts(scope='gospels', *, groups=None, speaker_filter=False)
¶
Return a DataFrame: rows=titles, columns=book counts + Total.
Parameters¶
scope : 'gospels' (Mat/Mrk/Luk/Jhn only) or 'NT' (all 27 books)
groups : optional list of group names to include; None = all
speaker_filter : if True, restrict each title to verses where Jesus is the speaker — using curated allowlists (high-confidence titles) or MACULA subjref detection (others).
title_report(output_dir='output/reports/nt/names', *, include_verses=True)
¶
Generate a Markdown report with frequency table, chart, and verse listings. Returns the path to the saved file.
divine_names
¶
Divine name and christological term frequency analysis.
Tracks the major divine names and christological titles across the OT, LXX, and NT:
OT Hebrew
─────────
H3068G  יהוה  YHWH (the Tetragrammaton)
H0430  אלהים  Elohim (God)
H0136  אדני  Adonai (Lord)
H3050  יה  Yah (short form of YHWH)
H7706  שדי  El Shaddai (God Almighty)
H0410  אל  El (God / Mighty One)
LXX Greek
─────────
G2962  κύριος  Kyrios (Lord — renders YHWH)
G2316  θεός  Theos (God — renders Elohim)
NT Greek
────────
G2316  θεός  Theos (God)
G2962  κύριος  Kyrios (Lord)
G2424G  Ἰησοῦς  Iesous (Jesus)
G5547  Χριστός  Christos (Christ / Anointed)
G3962  πατήρ  Pater (Father — in theological contexts)
G4151  πνεῦμα  Pneuma (Spirit — in theological contexts)
Usage¶
from bible_grammar.names.divine_names import divine_name_table, print_divine_names
from bible_grammar.names.divine_names import divine_names_chart, divine_names_report
Overview tables¶
print_divine_names('OT')
print_divine_names('NT')
Distribution chart¶
divine_names_chart('OT', output_path='output/charts/ot-divine-names.png')
Full Markdown report¶
divine_names_report(output_dir='output/reports')
divine_name_by_section(corpus='OT')
¶
Return a DataFrame: rows = divine names, columns = canonical sections, values = count per section.
divine_name_summary(corpus='OT')
¶
Return a summary DataFrame: one row per divine name with total count, percentage of all divine-name tokens, and top 3 books.
divine_name_table(corpus='OT')
¶
Return a DataFrame with one row per book, one column per divine name.
corpus: 'OT', 'NT', or 'LXX'
divine_names_chart(corpus='OT', *, output_path=None, top_n_books=20, chart_type='stacked_bar', figsize=(13, 6))
¶
Generate and save a chart of divine name distribution.
chart_type
'stacked_bar' — stacked bar chart, x=books, stacks=names (only meaningful for OT/LXX)
'heatmap' — rows=divine names, cols=sections, cells=counts
Returns path to saved PNG.
divine_names_report(output_dir='output/reports/both/names', *, corpora=None)
¶
Generate a comprehensive Markdown report covering OT, LXX, and NT divine names.
Returns the path to the saved Markdown file.
print_divine_names(corpus='OT')
¶
Print a formatted summary report to stdout.
role_search
¶
Syntactic role search — "who does what to whom" across OT and NT.
Uses MACULA Hebrew (syntax_ot) and MACULA Greek (syntax) subjref/frame links to find verbs by their grammatical subject, object, or argument.
Questions this answers
──────────────────────
• What verbs take God as their grammatical subject in the OT?
• What does God act upon — what are the objects of His verbs?
• What does creation/salvation/judgment language look like when YHWH acts?
• What verbs take Jesus as subject in the Gospels?
• What does God say about Himself? (speech verbs + divine subject)
• What verbs travel from OT (Hebrew) → LXX (Greek) when God is the agent?
• Cross-testament: same semantic field, God as subject, OT vs NT
Public API
──────────
subject_verbs(subject_strongs, corpus, ...) → DataFrame of verb tokens
verb_subjects(verb_strongs, corpus, ...) → DataFrame of subject tokens
subject_objects(subject_strongs, corpus, ...) → DataFrame of verb+object pairs
object_verbs(object_strongs, corpus, ...) → DataFrame of verbs that act on an entity
role_report(subject_strongs, ...) → Markdown report + chart
print_role_summary(subject_strongs, ...) → terminal table
print_object_summary(subject_strongs, ...) → terminal table of objects
Typical usage
─────────────
from bible_grammar.names.role_search import subject_verbs, print_role_summary, role_report
from bible_grammar.names.role_search import subject_objects, print_object_summary
What does YHWH do in the OT?¶
subject_verbs(['H3068','H0430'], corpus='OT')
What does YHWH act upon?¶
subject_objects(['H3068','H0430'], corpus='OT')
What does Jesus do in the Gospels?¶
subject_verbs(['G2424'], corpus='NT', books=['Mat','Mrk','Luk','Jhn'])
What does Jesus act upon in the Gospels?¶
subject_objects(['G2424'], corpus='NT', books=['Mat','Mrk','Luk','Jhn'])
Cross-testament: divine agency verbs (OT Hebrew + LXX equivalents)¶
role_report(['H3068','H0430'], corpus='OT', output_dir='output/reports')
divine_action_comparison(*, ot_strongs=None, nt_strongs=None, top_n=20, output_path=None, figsize=(14, 7))
¶
Side-by-side comparison: God's verbs in OT Hebrew vs NT Greek.
For OT verbs, also shows the inline LXX Greek equivalent (greek_g column) to facilitate direct lexical comparison with NT vocabulary.
Returns (ot_df, nt_df, chart_path).
object_verbs(object_strongs, corpus='OT', *, books=None)
¶
Return the verbs that take the given entity as their grammatical object.
Symmetric to subject_verbs: answers "what is done TO this entity?"
OT: uses the MACULA Hebrew frame A1 slot. NT: finds tokens with role='o'/'o2' matching the Strong's number, then collects the co-verse verb tokens.
Returns a DataFrame with verb_lemma, verb_gloss, count (+ subject info for OT).
print_object_summary(subject_strongs, corpus='OT', *, books=None, top_n=20, label=None)
¶
Print a formatted table of objects acted upon by verbs with the given subject.
print_role_summary(subject_strongs, corpus='OT', *, books=None, top_n=20, label=None)
¶
Print a formatted table of verbs with the given subject to stdout.
role_chart(subject_strongs, corpus='OT', *, books=None, top_n=20, label=None, output_path=None, figsize=(12, 6))
¶
Horizontal bar chart of top verbs for the given subject. Returns path to saved PNG.
role_report(subject_strongs, corpus='OT', *, books=None, top_n=30, label=None, output_dir=None, include_cross_testament=True)
¶
Generate a Markdown report: top verbs for the given subject, with chart, book distribution, and optional cross-testament panel.
Returns path to saved Markdown file.
subject_objects(subject_strongs, corpus='OT', *, books=None, verb_strongs=None, min_count=1, top_n=None, include_tokens=False)
¶
Return the objects (patients) acted upon by verbs whose subject is one of the given Strong's numbers.
OT method: parses the MACULA Hebrew verb frame column (A0=agent, A1=patient) to find object tokens when the A0 slot resolves to a target Strong's number.
NT method: finds verb tokens with the subject's subjref, then collects co-verse tokens tagged role='o' or role='io'.
Parameters¶
subject_strongs : Strong's number(s) for the acting subject
corpus : 'OT' or 'NT'
books : restrict to specific book_ids
verb_strongs : optional — restrict to specific verbs (by Strong's)
min_count : minimum object frequency
top_n : return only top N objects by count
include_tokens : if True return the raw token DataFrame
Returns a DataFrame with columns
subject_verbs(subject_strongs, corpus='OT', *, books=None, stem=None, tense=None, min_count=1, top_n=None, include_tokens=False)
¶
Return verbs whose grammatical subject is one of the given Strong's numbers.
Parameters¶
subject_strongs : Strong's number(s) for the subject, e.g. ['H3068','H0430'] or 'G2424'
corpus : 'OT' or 'NT'
books : restrict to specific book_ids
stem : OT only — filter by verb stem (e.g. 'qal', 'niphal')
tense : OT: clause type ('wayyiqtol', 'qatal', …); NT: tense ('aorist', 'present', …)
min_count : minimum occurrence count for aggregated results
top_n : return only top N verbs by count
include_tokens : if True return full token DataFrame; if False (default) return aggregated (lemma, gloss, count) summary
Returns a DataFrame sorted by count descending.
verb_subjects(verb_strongs, corpus='OT', *, books=None)
¶
Return the subjects (by Strong's number and lemma) that appear as the grammatical subject of a given verb.
Useful for: "who calls, commands, saves, creates…"
nt
¶
greek_prepositions
¶
Greek preposition frequency, case-binding, and collocate analysis.
Supports both the Greek NT (MACULA Nestle1904 via syntax.py) and the Septuagint/LXX (CenterBLC via lxx.py).
Case binding is determined by adjacency join: the case_ of the word at word_num+1 after the preposition. Preposition tokens themselves carry no case in either source.
Case values are normalized to Title case: Accusative, Genitive, Dative, Nominative (Nominative is rare and typically signals a set phrase or error).
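The adjacency join described above can be sketched with plain dicts: index every token by (verse, word_num), then read the case_ of the token at word_num + 1 for each preposition. The token layout, the invented verse-final example, and the choice not to look past the end of a verse are assumptions of this sketch, not documented behaviour.

```python
from collections import Counter

EN = "ἐν"

# Toy tokens; the last one is an invented verse-final preposition with no next word.
TOKENS = [
    {"ref": "Jhn 1:1", "word_num": 1, "lemma": EN, "case_": None},
    {"ref": "Jhn 1:1", "word_num": 2, "lemma": "ἀρχή", "case_": "dative"},
    {"ref": "Jhn 1:4", "word_num": 1, "lemma": EN, "case_": None},
    {"ref": "Jhn 1:4", "word_num": 2, "lemma": "αὐτός", "case_": "dative"},
    {"ref": "Jhn 9:9", "word_num": 7, "lemma": EN, "case_": None},
]


def case_profile(tokens, prep_lemma):
    """Count the case of the word at word_num + 1 after each prep_lemma token."""
    by_position = {(t["ref"], t["word_num"]): t for t in tokens}
    profile = Counter()
    for t in tokens:
        if t["lemma"] != prep_lemma:
            continue
        following = by_position.get((t["ref"], t["word_num"] + 1))
        if following is not None and following.get("case_"):
            # Normalise to Title case, e.g. 'dative' -> 'Dative'.
            profile[following["case_"].title()] += 1
    return profile


print(dict(case_profile(TOKENS, EN)))
```

Because only the immediately following word is inspected, an intervening particle or an indeclinable word yields no case for that occurrence.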
Primary functions
greek_prep_frequency() — frequency table of all prepositions
greek_prep_by_book() — one preposition across all books
greek_prep_distribution_table() — major preps by book group
greek_prep_cases() — case-binding profile for one prep
greek_prep_collocates() — top collocates, optionally filtered by case
compare_greek_preps() — side-by-side collocate comparison
nt_lxx_compare() — case profile comparison: NT vs. LXX
Print wrappers
print_greek_prep_frequency()
print_greek_prep_by_book()
print_greek_prep_distribution()
print_greek_prep_cases()
print_greek_prep_collocates()
print_compare_greek_preps()
print_nt_lxx_compare()
compare_greek_preps(lemma1, lemma2, corpus='nt', case=None, top_n=20, book_group=None)
¶
Side-by-side collocate comparison of two prepositions.
Returns¶
DataFrame: collocate, count_
greek_prep_by_book(lemma, corpus='nt')
¶
Distribution of a single preposition across all books in the corpus.
Returns¶
DataFrame: book, count, pct_of_lemma (canonical book order)
greek_prep_cases(lemma, corpus='nt', book=None, book_group=None)
¶
Case-binding profile for one preposition.
greek_prep_collocates(lemma, corpus='nt', case=None, obj_pos=None, top_n=20, book=None, book_group=None)
¶
Top words immediately following the preposition.
Parameters¶
lemma : Greek prep lemma, e.g. 'ἐν'
corpus : 'nt', 'lxx', or 'both'
case : filter by case binding, e.g. 'Dative', 'Accusative', 'Genitive'
obj_pos : filter the following word by POS substring, e.g. 'noun'
top_n : rows to return
book / book_group : scope
Returns¶
DataFrame: collocate, obj_class, count
greek_prep_distribution_table(corpus='nt', lemmas=None)
¶
Side-by-side count of major prepositions by book group.
Returns¶
DataFrame indexed by book group, one column per preposition. Includes a 'Total' row.
greek_prep_frequency(corpus='nt', book=None, book_group=None, top_n=20)
¶
Frequency table of all prepositions.
nt_coreference
¶
NT coreference and anaphora chain analysis.
The MACULA Greek NT dataset includes a referent column on ~14,471 tokens (primarily pronouns and relative clauses) that contains a MACULA xml_id pointing to the token's antecedent. This enables tracking who a pronoun or relative clause refers to throughout a passage or book.
Format: space-separated xml_id list (multiple antecedents for plural pronouns)
Example: 'n43014023002' → Ἰησοῦς (Jesus) @ Jhn 14:23
~14,471 NT tokens have referent data — primarily pronouns (αὐτός, relative pronoun ὅς, etc.) and occasionally demonstratives.
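A referent value can be unpacked in two steps: split the space-separated list, then decode each xml_id. The digit layout used below ('n', two-digit book number, three-digit chapter, verse and word) is inferred from the single example above and should be treated as an assumption. The second id in the usage line is made up, and the helper names are hypothetical.

```python
def parse_referents(referent):
    """Split a space-separated referent field into a list of xml_ids."""
    return referent.split() if referent else []


def decode_xml_id(xml_id):
    """Decode 'n' + BB CCC VVV WWW into (book_num, chapter, verse, word_num)."""
    digits = xml_id[1:]
    if not xml_id.startswith("n") or len(digits) != 11 or not digits.isdigit():
        raise ValueError(f"unexpected xml_id layout: {xml_id!r}")
    return int(digits[:2]), int(digits[2:5]), int(digits[5:8]), int(digits[8:])


# First id is the example from the text; the second is invented to show a plural referent.
ids = parse_referents("n43014023002 n43014023005")
print([decode_xml_id(i) for i in ids])
```

Under this reading, book number 43 with chapter 14 and verse 23 corresponds to Jhn 14:23, which agrees with the example.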
Questions this answers ────────────────────── • How many times is Jesus referenced by pronoun in each Gospel? • Where does Paul refer back to himself in his letters? • How dense is pronominal reference to the Spirit in John 14–16? • Which participants receive the most pronominal references per book? • What is the referent profile of αὐτός (him/her/it) per book?
Public API
──────────
nt_referent_data(book=None) → DataFrame (tokens with referent)
nt_referent_frequency(book=None, top_n=20) → DataFrame (antecedents by ref count)
nt_entity_chain(entity_xml_id, book=None) → DataFrame (all tokens referring to entity)
nt_pronoun_referents(pronoun, book=None, top_n=20) → DataFrame
nt_book_entity_density(book, top_n=15) → DataFrame
print_nt_referent_overview() → None
print_nt_referent_frequency(book=None, top_n=20) → None
print_nt_entity_chain(entity_xml_id, ...) → None
print_nt_pronoun_referents(pronoun, ...) → None
print_nt_book_entity_density(book, top_n=15) → None
nt_referent_book_chart(top_n=20) → Path | None
nt_entity_density_chart(book, top_n=15) → Path | None
KNOWN_ENTITIES → dict[str, str]
nt_book_entity_density(book, *, top_n=15)
¶
Most frequently referenced entities within a single NT book.
Returns: antecedent_lemma, antecedent_gloss, antecedent_ref, ref_count, chapter_spread.
nt_entity_chain(entity_xml_id, *, book=None)
¶
All tokens that reference a specific entity (given by its MACULA xml_id).
Useful for tracking how often and where a named person or entity is referred to by pronoun.
nt_entity_chapter_distribution(entity_xml_id, book)
¶
Count of pronominal references to an entity by chapter within a book.
nt_pronoun_referents(pronoun, *, book=None, top_n=20)
¶
What entities does a given pronoun lemma refer to, and how often?
Parameters¶
pronoun : lemma e.g. 'αὐτός', 'ὅς', 'ἐκεῖνος', 'οὗτος'
nt_referent_data(book=None)
¶
All NT tokens that have a referent annotation, with resolved columns.
Returns the original DataFrame rows plus
antecedent_lemma, antecedent_gloss, antecedent_book, antecedent_ref
(based on the first referent ID if multiple are present).
nt_referent_frequency(book=None, *, top_n=20)
¶
Most referenced antecedents (by count of pronominal references).
Returns¶
DataFrame: antecedent_lemma, antecedent_gloss, antecedent_ref, antecedent_book, ref_count
nt_demonstratives
¶
Greek NT demonstrative pronoun/adjective profile.
Analyses the 1,709 demonstrative tokens (type_='demonstrative') in the MACULA Greek Nestle1904 dataset. Covers οὗτος (this/these, 1,388 tokens) and ἐκεῖνος (that/those, 244 tokens), plus τοιοῦτος (such a kind, 57) and τοσοῦτος (so great, 20).
Key pedagogical points for BBG Ch13:
- οὗτος (near) vs. ἐκεῖνος (far): the two primary demonstratives
- Three uses: attributive (modifying a noun), substantival (standing alone as noun), predicate (with copula, "this is…")
- Attributive position: demonstrative FOLLOWS the article+noun (article-noun-dem) unlike ordinary adjectives
Public API
──────────
nt_demo_data(book=None) → DataFrame (all demonstrative tokens)
nt_demo_frequency() → DataFrame (lemma frequency table)
nt_demo_case_profile(lemma=None) → DataFrame (case distribution)
nt_demo_gender_profile(lemma=None) → DataFrame (gender distribution)
nt_demo_use_profile(lemma=None) → DataFrame (attributive vs. substantival)
nt_demo_book_distribution() → DataFrame (count + pct per NT book)
nt_demo_genre_profile() → DataFrame (count by genre group)
nt_demo_near_far_comparison() → DataFrame (οὗτος vs. ἐκεῖνος by genre)
nt_demo_top_cooccurrences(strong=OUTOS, n=15) → DataFrame (nouns most often in the same verse)
print_nt_demo_overview() → None
print_nt_demo_frequency() → None
print_nt_demo_case(lemma=None) → None
print_nt_demo_gender(lemma=None) → None
print_nt_demo_use() → None
print_nt_demo_book_distribution() → None
print_nt_demo_genre_profile() → None
print_nt_demo_near_far() → None
nt_demo_frequency_chart() → Path | None
nt_demo_case_chart(lemma=None) → Path | None
nt_demo_genre_heatmap() → Path | None
nt_demo_book_chart() → Path | None
nt_demo_book_distribution()
¶
Count of demonstrative tokens per NT book in canonical order.
nt_demo_case_profile(lemma=None)
¶
Case distribution, optionally filtered to one lemma ('οὗτος' or 'ἐκεῖνος').
nt_demo_data(book=None)
¶
All demonstrative pronoun/adjective tokens.
nt_demo_frequency()
¶
Lemma frequency table.
nt_demo_gender_profile(lemma=None)
¶
Gender distribution, optionally filtered to one lemma.
nt_demo_genre_profile()
¶
Demonstrative count and percentage per NT genre group.
nt_demo_near_far_comparison()
¶
οὗτος vs. ἐκεῖνος count by NT genre — near/far contrast.
nt_demo_top_cooccurrences(strong=OUTOS, n=15)
¶
Nouns that appear in the same verse as the given demonstrative most often. Useful for seeing what things/persons οὗτος and ἐκεῖνος point to.
nt_demo_use_profile(lemma=None)
¶
Attributive/predicate vs. substantival use distribution.
nt_discourse
¶
Greek NT discourse particles analysis.
Analyses the distribution and function of major Greek discourse particles (δέ, γάρ, οὖν, ἵνα, ὅτι, ἀλλά, καί) across the GNT, backed by the MACULA Greek syntax layer.
Public API
──────────
nt_particle_frequency(book=None) → DataFrame (particle frequency table)
nt_particle_by_book() → DataFrame (particle counts per NT book)
nt_particle_genre_profile() → DataFrame (particle % by genre group)
nt_particle_function(particle, book=None) → DataFrame (function classification)
nt_hina_profile(book=None) → DataFrame (ἵνα clause type counts)
nt_hoti_profile(book=None) → DataFrame (ὅτι function counts)
print_nt_particle_overview() → None
print_nt_particle_frequency(book=None) → None
print_nt_particle_genre_profile() → None
print_nt_hina_profile(book=None) → None
print_nt_hoti_profile(book=None) → None
nt_particle_frequency_chart(book=None) → Path | None
nt_particle_genre_heatmap() → Path | None
nt_particle_book_chart(particle_display='δέ') → Path | None
nt_hina_profile(book=None)
¶
Classify ἵνα tokens by clause function (purpose/content/result/epexegetic).
Classification uses a heuristic based on the governing verb's semantics:
- verbs of command/desire → content
- motion/goal verbs → purpose
- result context (ὥστε equivalent usage) → result
- adjective/noun head → epexegetic
Default: purpose.
nt_hoti_profile(book=None)
¶
Classify ὅτι tokens by function (recitative/causal/content).
Heuristic: tokens immediately after a speech/perception verb → recitative; tokens where gloss is 'because' → causal; remainder → content.
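The heuristic above amounts to a short decision chain. The sketch below is one possible reading, not the module's implementation: `classify_hoti` is a hypothetical name, the caller is assumed to have already determined whether the preceding token is a speech/perception verb, and the precedence of the recitative test over the causal test is an assumption.

```python
def classify_hoti(prev_is_speech_or_perception_verb, gloss):
    """Classify one ὅτι token as recitative, causal, or content.

    prev_is_speech_or_perception_verb: bool computed by the caller.
    gloss: the English gloss of the ὅτι token (may be None).
    """
    if prev_is_speech_or_perception_verb:
        return "recitative"
    if (gloss or "").strip().lower() == "because":
        return "causal"
    return "content"  # everything that matched neither rule
```

Because the final branch is a catch-all, the "content" count will absorb any token the first two rules miss, so it is best read as an upper bound.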
nt_particle_book_chart(particle_display='δέ')
¶
Bar chart of a single particle's count across NT books.
nt_particle_by_book()
¶
Particle token counts per NT book × particle. Returns a crosstab.
nt_particle_frequency(book=None)
¶
Particle frequency table. Columns: strong, lemma, display, count, pct, primary_function.
nt_particle_frequency_chart(book=None)
¶
Horizontal bar chart of particle frequency.
nt_particle_genre_heatmap()
¶
Heatmap of particle % by NT genre group.
nt_particle_genre_profile()
¶
Particle % by NT genre group. Rows=genre, cols=particles.
print_nt_hina_profile(book=None)
¶
Print ἵνα clause function classification.
print_nt_hoti_profile(book=None)
¶
Print ὅτι function classification.
print_nt_particle_frequency(book=None)
¶
Print particle frequency table for a book or the whole GNT.
print_nt_particle_genre_profile()
¶
Print particle % by NT genre group.
print_nt_particle_overview()
¶
Print a statistical overview of GNT discourse particles.
nt_louw_nida
¶
Louw-Nida sub-domain precision queries for the Greek NT.
The MACULA Greek NT dataset carries an ln column with sub-domain codes
in Louw-Nida format (e.g., '33.69', '31.35', '92.24'). The existing
domain_search.py module handles top-level domain queries (domain 33 =
Communication); this module exposes sub-domain granularity.
LN code format
──────────────
'33.69': domain 33, sub-entry 69 (teach/instruct)
'92.24': domain 92 (article ὁ)
'93.169a': proper noun variant
Multiple codes may be space-separated: '10.24 33.19'
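The code format above can be split into a top-level domain and a sub-entry with basic string handling. This is a minimal sketch; `parse_ln` is a hypothetical helper, not a function exported by this module. The sub-entry is kept as a string so that letter suffixes such as `169a` survive.

```python
def parse_ln(value):
    """Split an ln cell into a list of (domain, sub_entry) pairs.

    '10.24 33.19' -> [(10, '24'), (33, '19')]
    """
    pairs = []
    for code in value.split():
        domain, _, sub_entry = code.partition(".")
        pairs.append((int(domain), sub_entry))
    return pairs


def in_domain(value, domain):
    """True if any code in the cell belongs to the given top-level domain."""
    return any(d == domain for d, _ in parse_ln(value))
```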
~127,291 NT tokens with ln data (92.4% of all 137,779 NT tokens).
6,941 unique sub-domain codes across 93 top-level domains.
Questions this answers
──────────────────────
• What are the most frequent sub-domain codes in Romans?
• Which lemmas fall in LN 31 (Hold a View, Believe, Trust)?
• How does LN 33 (Communication) break down across its sub-domains?
• Which books use the most Judgment/Punishment vocabulary (LN 38/56)?
• How does Paul's ethics vocabulary (LN 88) compare with John's?
Public API
──────────
nt_ln_data(subdomain=None, domain=None, book=None) → DataFrame
nt_ln_subdomain_frequency(domain, book=None, top_n=20) → DataFrame
nt_ln_top_lemmas(subdomain, top_n=20, book=None) → DataFrame
nt_ln_book_distribution(subdomain) → DataFrame
nt_ln_genre_profile(subdomain) → DataFrame
nt_ln_domain_breakdown(domain, book=None, top_n=20) → DataFrame
nt_ln_comparison(books, domain, top_n=15) → pivot DataFrame
print_nt_ln_overview() → None
print_nt_ln_subdomain_frequency(domain, ...) → None
print_nt_ln_top_lemmas(subdomain, ...) → None
print_nt_ln_book_distribution(subdomain) → None
print_nt_ln_domain_breakdown(domain, ...) → None
print_nt_ln_comparison(books, domain, ...) → None
nt_ln_subdomain_chart(domain, ...) → Path | None
nt_ln_book_chart(subdomain, top_n=20) → Path | None
nt_ln_genre_heatmap(domains, books=None) → Path | None
LN_DOMAIN_NAMES → dict[int, str]
nt_ln_book_distribution(subdomain)
¶
Book-by-book distribution of tokens in a given LN sub-domain.
nt_ln_comparison(books, domain, *, top_n=15)
¶
Compare LN sub-domain profiles across multiple NT books.
Returns a pivot: rows=subdomain, cols=books, cells=% of book's domain tokens.
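To make the pivot layout concrete, the following sketch builds the same shape from toy `(book, ln_code)` pairs using only the standard library. `subdomain_pct_by_book` and the toy rows are illustrative assumptions; the real function returns a pandas pivot and applies `top_n`.

```python
from collections import Counter, defaultdict


def subdomain_pct_by_book(rows, domain):
    """Return {subdomain: {book: pct}} for one top-level LN domain.

    rows: iterable of (book, ln_code) pairs, one code per row (toy input).
    pct is the share of that book's tokens *within the domain*.
    """
    per_book = defaultdict(Counter)
    for book, code in rows:
        if code.split(".")[0] == str(domain):
            per_book[book][code] += 1

    pivot = defaultdict(dict)
    for book, counts in per_book.items():
        total = sum(counts.values())
        for code, n in counts.items():
            pivot[code][book] = round(100 * n / total, 1)
    return dict(pivot)


toy_rows = [
    ("Rom", "33.69"), ("Rom", "33.69"), ("Rom", "33.98"), ("Rom", "31.35"),
    ("Jhn", "33.69"), ("Jhn", "33.98"), ("Jhn", "33.98"), ("Jhn", "33.98"),
]
pivot = subdomain_pct_by_book(toy_rows, 33)
```

Each book's column sums to roughly 100 because the denominator is that book's own token count for the domain, which keeps long and short books comparable.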
nt_ln_data(subdomain=None, *, domain=None, book=None)
¶
NT tokens with LN data, optionally filtered by sub-domain or top-level domain.
Parameters¶
subdomain : exact LN code to match, e.g. '33.69', '31.35'
domain : top-level domain number to match (e.g. 33 matches all 33.x)
book : book abbreviation or list of abbreviations
nt_ln_domain_breakdown(domain, *, book=None, top_n=20)
¶
For a given top-level LN domain, show its sub-domain breakdown with the top lemma for each sub-domain.
nt_ln_genre_heatmap(domains, *, books=None)
¶
Heatmap of top-level LN domain percentages across a set of NT books.
nt_ln_genre_profile(subdomain)
¶
Genre distribution for a given LN sub-domain.
nt_ln_subdomain_frequency(domain, *, book=None, top_n=20)
¶
Frequency of sub-domain codes within a given top-level LN domain.
Returns: subdomain, count, pct, top_lemma.
nt_ln_top_lemmas(subdomain, *, top_n=20, book=None)
¶
Most frequent lemmas tagged with a given LN sub-domain code.
nt_moods
¶
Greek NT mood usage analysis — subjunctive, infinitive, and imperative.
Provides distribution, construction type classification, and genre comparison for the three non-indicative finite/non-finite moods taught in BBG ch31–33.
Public API
──────────
nt_mood_data(mood, book=None) → DataFrame (tokens for one mood)
nt_mood_profile(book=None) → DataFrame (all mood distribution)
nt_subjunctive_profile(book=None) → DataFrame (subj tense/voice)
nt_infinitive_profile(book=None) → DataFrame (inf tense/voice)
nt_imperative_profile(book=None) → DataFrame (imp tense/voice/person)
nt_subjunctive_constructions(book=None) → DataFrame (purpose/conditional/hortatory)
nt_infinitive_constructions(book=None) → DataFrame (complementary/articular/etc.)
nt_imperative_tense_comparison(book=None) → DataFrame (present vs aorist imperatives)
nt_mood_genre_profile() → DataFrame (mood % by genre)
nt_mood_book_distribution(mood) → DataFrame (count per NT book)
print_nt_mood_overview() → None
print_nt_subjunctive_profile(book=None) → None
print_nt_infinitive_profile(book=None) → None
print_nt_imperative_profile(book=None) → None
print_nt_subjunctive_constructions(book=None) → None
print_nt_infinitive_constructions(book=None) → None
print_nt_imperative_tense_comparison(book=None) → None
print_nt_mood_genre_profile() → None
nt_mood_chart(book=None) → Path | None
nt_subjunctive_chart(book=None) → Path | None
nt_imperative_chart(book=None) → Path | None
nt_mood_genre_heatmap() → Path | None
nt_imperative_chart(book=None)
¶
Bar chart comparing present vs. aorist imperatives.
nt_imperative_profile(book=None)
¶
Imperative tense × person distribution.
nt_imperative_tense_comparison(book=None)
¶
Present vs. aorist imperative comparison.
The tense distinction is meaningful:
- Present imperative = continue doing / do repeatedly
- Aorist imperative (2nd person) = start doing / single command
Returns DataFrame: tense, voice, person, count, pct.
nt_infinitive_constructions(book=None)
¶
Classify infinitive tokens by construction type.
Uses syntactic role ('role' column) and preceding article/particle:
- role='o' or complementary context → complementary infinitive
- preceded by article (G3588) → articular infinitive
- preceded by preposition → prepositional infinitive phrase
- role='s' → subject infinitive
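The rules above can be read as an ordered rule chain. The sketch below is only one possible reading: `classify_infinitive` is a hypothetical name, the rules are tried in the order they are listed (the actual precedence is not stated), the `'prep'` class label for the preceding word is an assumption, and the "complementary context" test is left out.

```python
def classify_infinitive(role, prev_strong=None, prev_class=None):
    """Classify one infinitive token by role and the word before it.

    role: MACULA syntactic role of the infinitive ('o', 's', ...).
    prev_strong: Strong's number of the preceding word, e.g. 'G3588'.
    prev_class: word class of the preceding word (assumed 'prep' for
        prepositions).
    """
    if role == "o":
        return "complementary"
    if prev_strong == "G3588":   # the article
        return "articular"
    if prev_class == "prep":
        return "prepositional"
    if role == "s":
        return "subject"
    return "other"               # fallback label added for this sketch
```

Rule order matters: an articular infinitive that also functions as an object would be labelled complementary under this ordering, so any real classifier needs its precedence documented.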
nt_infinitive_profile(book=None)
¶
Infinitive tense × voice distribution.
nt_mood_book_distribution(mood)
¶
Token count per NT book for a specific mood.
nt_mood_chart(book=None)
¶
Horizontal bar chart of mood distribution.
nt_mood_data(mood, book=None)
¶
All tokens for a specific mood ('subjunctive', 'infinitive', 'imperative', etc.).
nt_mood_genre_heatmap()
¶
Heatmap of mood % by NT genre group.
nt_mood_genre_profile()
¶
Mood % by NT genre group (excluding the indicative so that the non-indicative moods stand out clearly).
nt_mood_profile(book=None)
¶
All verb mood distribution. Returns DataFrame: form, count, pct.
nt_subjunctive_chart(book=None)
¶
Stacked bar chart of subjunctive constructions.
nt_subjunctive_constructions(book=None)
¶
Classify subjunctive tokens by construction type.
Looks at the immediately preceding particle to classify:
- ἵνα / ὅπως → purpose / content
- ἐάν → conditional (3rd class condition)
- μή → prohibitive (aorist subj. = negated command) or negative
- ἄν → indefinite / general
- no governing particle → hortatory (1st person) or other
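The particle lookup described above can be sketched as a dictionary plus a fallback. The names below (`PARTICLE_CLASSES`, `classify_subjunctive`), the category labels, and the `'first'` encoding for person are assumptions made for this example, not the module's own values. Lemmas are NFC-normalised so that visually identical Greek strings with different accent code points still match.

```python
import unicodedata


def _nfc(text):
    """Normalise Greek text so tonos/oxia variants compare equal."""
    return unicodedata.normalize("NFC", text)


PARTICLE_CLASSES = {
    _nfc(lemma): label
    for lemma, label in {
        "ἵνα": "purpose/content",
        "ὅπως": "purpose/content",
        "ἐάν": "conditional",
        "μή": "prohibitive/negative",
        "ἄν": "indefinite",
    }.items()
}


def classify_subjunctive(prev_lemma, person=None):
    """Classify a subjunctive token by the lemma immediately before it."""
    if prev_lemma is not None and _nfc(prev_lemma) in PARTICLE_CLASSES:
        return PARTICLE_CLASSES[_nfc(prev_lemma)]
    # No governing particle: first person is treated as hortatory.
    return "hortatory" if person == "first" else "other"
```

Looking only at the immediately preceding token is a deliberate simplification; a particle separated from its verb by other words falls through to the fallback branch.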
nt_subjunctive_profile(book=None)
¶
Subjunctive tense × voice distribution.
print_nt_mood_overview()
¶
Print a statistical overview of GNT verb moods.
nt_noun_profile
¶
Greek NT noun/case morphology profile.
Provides case distribution, declension pattern frequency, gender breakdown, and article co-occurrence statistics across the GNT, backed by the MACULA Greek syntax layer.
Public API
──────────
nt_noun_data(book=None) → DataFrame (all GNT noun tokens)
nt_noun_case_profile(book=None) → DataFrame (case distribution)
nt_noun_gender_profile(book=None) → DataFrame (gender distribution)
nt_noun_number_profile(book=None) → DataFrame (number distribution)
nt_noun_case_gender(book=None) → DataFrame (case × gender crosstab)
nt_noun_top_lemmas(n=30, book=None) → DataFrame (most frequent noun lemmas)
nt_noun_lemma_case(lemmas=None, top_n=15) → DataFrame (lemma × case crosstab)
nt_noun_book_distribution() → DataFrame (count + pct per NT book)
nt_noun_genre_profile() → DataFrame (case % by genre group)
nt_article_stats(book=None) → DataFrame (article vs anarthrous counts)
print_nt_noun_overview() → None
print_nt_noun_case(book=None) → None
print_nt_noun_gender(book=None) → None
print_nt_noun_case_gender(book=None) → None
print_nt_noun_top_lemmas(n=20) → None
print_nt_noun_genre_profile() → None
print_nt_noun_book_distribution() → None
print_nt_article_stats(book=None) → None
nt_noun_case_chart(book=None) → Path | None
nt_noun_gender_chart(book=None) → Path | None
nt_noun_genre_heatmap() → Path | None
nt_noun_case_gender_heatmap(book=None) → Path | None
nt_noun_book_chart() → Path | None
nt_article_stats(book=None)
¶
Article (ὁ/ἡ/τό) vs anarthrous count per case. Returns DataFrame: case, with_article, without_article, pct_articular.
nt_noun_book_chart()
¶
Bar chart of noun counts across NT books.
nt_noun_book_distribution()
¶
Noun token count and % per NT book. Columns: book, count, pct, pct_of_book_words.
nt_noun_case_chart(book=None)
¶
Horizontal bar chart of case distribution.
nt_noun_case_gender(book=None)
¶
Case × gender crosstab (counts).
nt_noun_case_gender_heatmap(book=None)
¶
Heatmap of case × gender (counts).
nt_noun_case_profile(book=None)
¶
Case distribution across GNT nouns. Returns DataFrame: form, count, pct.
nt_noun_data(book=None)
¶
All GNT noun tokens (class_='noun'), optionally filtered to one book.
nt_noun_gender_chart(book=None)
¶
Bar chart of gender distribution.
nt_noun_gender_profile(book=None)
¶
Gender distribution across GNT nouns. Returns DataFrame: form, count, pct.
nt_noun_genre_heatmap()
¶
Heatmap of case % by NT genre group.
nt_noun_genre_profile()
¶
Case % breakdown by NT genre group. Rows=genre, cols=cases.
nt_noun_lemma_case(lemmas=None, top_n=15)
¶
Lemma × case crosstab (counts). Pass a lemma list or get top-n by frequency.
nt_noun_number_profile(book=None)
¶
Number distribution across GNT nouns. Returns DataFrame: form, count, pct.
nt_noun_top_lemmas(n=30, book=None)
¶
Top-n most frequent noun lemmas. Columns: lemma, strong_g, count, pct, top_gloss.
print_nt_noun_overview()
¶
Print a statistical overview of GNT noun morphology.
nt_participles
¶
Greek NT participle usage analysis.
Provides adverbial vs. adjectival participle classification, tense × voice profiles, genitive absolute counts, perfect participle statistics, and genre comparison for all participial tokens in the GNT.
Public API
──────────
nt_participle_data(book=None) → DataFrame (all GNT participle tokens)
nt_participle_tense_profile(book=None) → DataFrame (tense distribution)
nt_participle_voice_profile(book=None) → DataFrame (voice distribution)
nt_participle_tense_voice(book=None) → DataFrame (tense × voice crosstab)
nt_participle_role_profile(book=None) → DataFrame (syntactic role counts)
nt_participle_top_lemmas(n=20, book=None) → DataFrame (most frequent ptc lemmas)
nt_participle_book_distribution() → DataFrame (count + pct per NT book)
nt_participle_genre_profile() → DataFrame (tense % by genre)
nt_genitive_absolutes(book=None) → DataFrame (genitive absolute tokens)
nt_perfect_participles(book=None) → DataFrame (perfect participle tokens)
print_nt_participle_overview() → None
print_nt_participle_tense(book=None) → None
print_nt_participle_voice(book=None) → None
print_nt_participle_tense_voice(book=None) → None
print_nt_participle_role(book=None) → None
print_nt_participle_top_lemmas(n=20) → None
print_nt_participle_genre_profile() → None
print_nt_genitive_absolutes(book=None) → None
print_nt_perfect_participles(n=20) → None
print_nt_participle_book_distribution() → None
nt_participle_tense_chart(book=None) → Path | None
nt_participle_genre_heatmap() → Path | None
nt_participle_book_chart() → Path | None
nt_genitive_absolutes(book=None)
¶
Genitive absolute participle tokens.
A genitive absolute has a participle in the genitive case whose subject is different from the main clause subject. Identified by: mood=participle, case_=genitive. Returns sample token rows with reference and gloss.
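The identification rule above is a two-column filter. The sketch below applies it to toy token dicts; `genitive_absolute_candidates` is a hypothetical name and the dict keys simply mirror the column names mentioned in the description. Note that the filter on its own also matches attributive or substantival participles that happen to be genitive, so its output is best treated as a candidate list.

```python
def genitive_absolute_candidates(tokens):
    """Keep tokens with mood == 'participle' and case_ == 'genitive'."""
    return [
        t for t in tokens
        if t.get("mood") == "participle" and t.get("case_") == "genitive"
    ]


toy_tokens = [
    {"ref": "A", "mood": "participle", "case_": "genitive"},
    {"ref": "B", "mood": "participle", "case_": "nominative"},
    {"ref": "C", "mood": "indicative", "case_": None},
    {"ref": "D", "mood": None, "case_": "genitive"},  # e.g. a noun
]
candidates = genitive_absolute_candidates(toy_tokens)
```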
nt_participle_book_distribution()
¶
Participle token count and % per NT book.
nt_participle_data(book=None)
¶
All GNT participle tokens (mood='participle'), optionally filtered to one book.
nt_participle_genre_profile()
¶
Tense % breakdown for participles by NT genre group.
nt_participle_role_profile(book=None)
¶
Syntactic role distribution for participles.
Roles from MACULA: 's' (subject), 'v' (verb/predicate), 'o' (object), 'adv' (adverbial), 'p' (predicate), 'io' (indirect object), etc. Adverbial participles have role 'adv'; adjectival have role 'p' or attributive.
nt_participle_tense_profile(book=None)
¶
Tense distribution for participles. Returns DataFrame: form, count, pct.
nt_participle_tense_voice(book=None)
¶
Tense × voice crosstab for participles (counts).
nt_participle_top_lemmas(n=20, book=None)
¶
Top-n most frequent participle lemmas.
nt_participle_voice_profile(book=None)
¶
Voice distribution for participles. Returns DataFrame: form, count, pct.
nt_perfect_participles(book=None)
¶
Perfect participle tokens with reference, lemma, and gloss.
print_nt_participle_overview()
¶
Print a statistical overview of GNT participle morphology.
nt_verb_profile
¶
Greek NT verb morphology profile.
Provides tense × voice × mood statistics across the GNT, paralleling the Hebrew stem notebooks (qal.py, niphal.py, etc.) but operating on the MACULA Greek syntax layer instead of the OT data.
Public API
──────────
nt_verb_data(book=None) → DataFrame (all GNT verb tokens)
nt_verb_tense_profile(book=None) → DataFrame (tense distribution)
nt_verb_voice_profile(book=None) → DataFrame (voice distribution)
nt_verb_mood_profile(book=None) → DataFrame (mood distribution)
nt_verb_tense_voice(book=None) → DataFrame (tense × voice crosstab)
nt_verb_tense_mood(book=None) → DataFrame (tense × mood crosstab)
nt_verb_top_lemmas(n=30, book=None) → DataFrame (most frequent verb lemmas)
nt_verb_lemma_tense(lemmas=None, top_n=15) → DataFrame (lemma × tense crosstab)
nt_verb_book_distribution() → DataFrame (count + pct per NT book)
nt_verb_genre_profile() → DataFrame (tense % by genre group)
print_nt_verb_overview() → None
print_nt_verb_tense(book=None) → None
print_nt_verb_voice(book=None) → None
print_nt_verb_mood(book=None) → None
print_nt_verb_tense_voice(book=None) → None
print_nt_verb_top_lemmas(n=20) → None
print_nt_verb_genre_profile() → None
print_nt_verb_book_distribution() → None
nt_verb_tense_chart(book=None) → Path | None
nt_verb_voice_chart(book=None) → Path | None
nt_verb_mood_chart(book=None) → Path | None
nt_verb_genre_heatmap() → Path | None
nt_verb_book_chart() → Path | None
nt_verb_tense_voice_heatmap(book=None) → Path | None
nt_verb_book_chart()
¶
Bar chart of verb counts across NT books.
nt_verb_book_distribution()
¶
Token count and % per NT book. Columns: book, count, pct, pct_of_book_words.
nt_verb_data(book=None)
¶
All GNT verb tokens with morphological columns, optionally filtered to one book.
nt_verb_genre_heatmap()
¶
Heatmap of tense % by NT genre group.
nt_verb_genre_profile()
¶
Tense % breakdown by NT genre group. Rows=genre, cols=tenses.
nt_verb_lemma_tense(lemmas=None, top_n=15)
¶
Lemma × tense crosstab (counts). Pass a lemma list or get top-n by frequency.
nt_verb_mood_chart(book=None)
¶
Horizontal bar chart of mood distribution.
nt_verb_mood_profile(book=None)
¶
Count verb tokens by mood. Returns DataFrame: form, count, pct.
nt_verb_tense_chart(book=None)
¶
Horizontal bar chart of tense distribution.
nt_verb_tense_mood(book=None)
¶
Tense × mood crosstab (counts).
nt_verb_tense_profile(book=None)
¶
Count verb tokens by tense. Returns DataFrame: form, count, pct.
nt_verb_tense_voice(book=None)
¶
Tense × voice crosstab (counts).
nt_verb_tense_voice_heatmap(book=None)
¶
Heatmap of tense × voice (counts).
nt_verb_top_lemmas(n=30, book=None)
¶
Top-n most frequent verb lemmas. Columns: lemma, strong_g, count, pct, top_gloss.
nt_verb_voice_chart(book=None)
¶
Bar chart of voice distribution.
nt_verb_voice_profile(book=None)
¶
Count verb tokens by voice. Returns DataFrame: form, count, pct.
print_nt_verb_overview()
¶
Print a quick statistical overview of GNT verb morphology.
ot
¶
aramaic_nominal
¶
Biblical Aramaic nominal morphology analysis.
Covers noun state/gender/number distribution, pronoun types, preposition frequency, and adjective usage in the Aramaic sections of Daniel and Ezra. All functions filter to lang='A'.
Public API
──────────
aramaic_noun_data(book=None) → DataFrame (all Aramaic noun tokens)
aramaic_pron_data(book=None) → DataFrame (all Aramaic pronoun tokens)
aramaic_prep_data(book=None) → DataFrame (all Aramaic preposition tokens)
aramaic_adj_data(book=None) → DataFrame (all Aramaic adjective tokens)
aramaic_noun_state_profile(book=None) → DataFrame (absolute/construct/determined)
aramaic_noun_gender_profile(book=None) → DataFrame (masculine/feminine)
aramaic_noun_number_profile(book=None) → DataFrame (singular/plural/dual)
aramaic_noun_gender_state(book=None) → DataFrame (gender × state crosstab)
aramaic_noun_top_lemmas(n=20, book=None) → DataFrame (most frequent noun lemmas)
aramaic_noun_state_by_book() → DataFrame (state counts per book)
aramaic_pron_type_profile(book=None) → DataFrame (pronoun type distribution)
aramaic_prep_frequency(n=15, book=None) → DataFrame (top prepositions by lemma)
aramaic_class_distribution(book=None) → DataFrame (all word classes)
print_aramaic_nominal_overview() → None
print_aramaic_noun_state(book=None) → None
print_aramaic_noun_gender(book=None) → None
print_aramaic_noun_top_lemmas(n=20, book=None) → None
print_aramaic_noun_state_by_book() → None
print_aramaic_pron_profile(book=None) → None
print_aramaic_prep_frequency(n=15, book=None) → None
aramaic_noun_state_chart(book=None) → Path | None
aramaic_noun_state_book_chart() → Path | None
aramaic_prep_chart(book=None) → Path | None
aramaic_adj_data(book=None)
¶
All Aramaic adjective tokens (lang='A', class_='adj').
aramaic_class_distribution(book=None)
¶
Distribution of all word classes in the Aramaic sections.
aramaic_noun_data(book=None)
¶
All Aramaic noun tokens (lang='A', class_='noun').
aramaic_noun_gender_profile(book=None)
¶
Noun gender distribution.
aramaic_noun_gender_state(book=None)
¶
Gender × state crosstab (counts).
aramaic_noun_number_profile(book=None)
¶
Noun number distribution.
aramaic_noun_state_book_chart()
¶
Stacked bar chart of noun state % per book (Daniel vs. Ezra).
aramaic_noun_state_by_book()
¶
Noun state counts per book (Daniel and Ezra).
aramaic_noun_state_chart(book=None)
¶
Horizontal bar chart of Aramaic noun state distribution.
aramaic_noun_state_profile(book=None)
¶
Noun state distribution (absolute/construct/determined).
aramaic_noun_top_lemmas(n=20, book=None)
¶
Top-n most frequent Aramaic noun lemmas.
aramaic_prep_chart(book=None)
¶
Horizontal bar chart of top Aramaic prepositions.
aramaic_prep_data(book=None)
¶
All Aramaic preposition tokens (lang='A', class_='prep').
aramaic_prep_frequency(n=15, book=None)
¶
Top-n Aramaic prepositions by frequency. Columns: lemma, count, pct, top_gloss.
aramaic_pron_data(book=None)
¶
All Aramaic pronoun tokens (lang='A', class_='pron').
aramaic_pron_type_profile(book=None)
¶
Pronoun type distribution (pronominal/personal/demonstrative/etc.).
print_aramaic_nominal_overview()
¶
Print a statistical overview of Biblical Aramaic nominal morphology.
aramaic_profile
¶
Biblical Aramaic verb morphology analysis.
Analyses the Aramaic verb stems (Peal, Pael, Haphel, Peil, Hithpeel, etc.) from the Daniel and Ezra sections of the MACULA Hebrew WLC dataset. All functions filter to tokens where lang='A'.
Public API
──────────
aramaic_data(book=None) → DataFrame (all Aramaic tokens)
aramaic_verb_data(book=None) → DataFrame (Aramaic verbs only)
aramaic_stem_profile(book=None) → DataFrame (stem distribution)
aramaic_conj_profile(book=None) → DataFrame (conjugation distribution)
aramaic_stem_conj(stem=None) → DataFrame (stem × conjugation crosstab)
aramaic_top_roots(n=30, book=None) → DataFrame (most frequent roots)
aramaic_book_distribution() → DataFrame (token count per book)
aramaic_stem_by_book() → DataFrame (stem counts per book)
print_aramaic_overview() → None
print_aramaic_stem_profile(book=None) → None
print_aramaic_conj_profile(book=None) → None
print_aramaic_stem_conj(stem=None) → None
print_aramaic_top_roots(n=20, book=None) → None
print_aramaic_book_distribution() → None
aramaic_stem_chart(book=None) → Path | None
aramaic_conj_chart(book=None) → Path | None
aramaic_stem_book_chart() → Path | None
aramaic_book_distribution()
¶
Token count (all classes) per book. Columns: book, tokens, verbs, pct_verbs.
aramaic_conj_profile(book=None)
¶
Conjugation (type_) distribution for Aramaic verbs.
aramaic_data(book=None)
¶
All Aramaic tokens (lang='A'), optionally filtered to one book.
aramaic_stem_by_book()
¶
Stem counts per book (Daniel and Ezra).
aramaic_stem_conj(stem=None)
¶
Stem × conjugation crosstab for Aramaic verbs.
If stem is None, shows all stems; otherwise filters to that stem.
aramaic_stem_profile(book=None)
¶
Stem distribution for Aramaic verbs. Returns DataFrame: form, count, pct.
aramaic_top_roots(n=30, book=None)
¶
Top-n most frequent Aramaic verb roots by lemma.
aramaic_verb_data(book=None)
¶
Aramaic verb tokens (lang='A', class_='verb').
print_aramaic_overview()
¶
Print a statistical overview of Biblical Aramaic in the MACULA dataset.
ot_discourse
¶
Hebrew OT discourse structure analysis — narrative peak and episode boundary detection.
Builds on the existing verbal_syntax module to provide higher-level discourse analysis:
- Narrative peak scoring — identify the climactic section of a narrative using density of: wayyiqtol, direct speech, rare lexical items, short clauses
- Episode boundary detection — identify probable episode breaks using: disjunctive clauses (waw + non-verb), scene-setting formulas (וַיְהִי), temporal/spatial markers, and wayyiqtol chain gaps
- Wayyiqtol density by chapter — where does the action concentrate?
- Direct speech density by chapter
Terminology (after Longacre, The Grammar of Discourse, 1983):
PEAK: the climactic paragraph/episode, marked by dense wayyiqtol + speech
EPISODE: a coherent narrative unit, separated by disjunctive/scene-setting clauses
BACKBONE: the wayyiqtol chain that drives the main narrative forward
Questions this answers
──────────────────────
• Where is the narrative peak of Genesis 22?
• How many episode boundaries are in Exodus 1–15?
• Which chapters of 1 Samuel have the highest wayyiqtol density?
• Where does direct speech cluster in Ruth?
Public API
──────────
ot_discourse_wayyiqtol_density(book) → DataFrame (chapter, count, density)
ot_discourse_speech_density(book) → DataFrame (chapter, speech_count, density)
ot_discourse_peak_score(book) → DataFrame (chapter, peak_score)
ot_discourse_episode_boundaries(book) → DataFrame (verse refs at episode breaks)
ot_discourse_narrative_profile(book) → dict (combined metrics)
print_ot_discourse_overview(book) → None
print_ot_wayyiqtol_density(book) → None
print_ot_speech_density(book) → None
print_ot_peak_score(book) → None
print_ot_episode_boundaries(book) → None
ot_discourse_density_chart(book) → Path | None
ot_discourse_peak_chart(book) → Path | None
ot_discourse_episode_boundaries(book, *, min_gap=3)
¶
Detect probable episode boundaries in an OT narrative book.
An episode boundary is flagged when
- A wayyiqtol chain ends (gap of >= min_gap verses without wayyiqtol)
- OR a וַיְהִי (wayehi, the wayyiqtol of הָיָה) scene-setting formula appears
- OR the first verb of a new section is disjunctive (waw + qatal)
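The first criterion (a gap in the wayyiqtol chain) can be sketched on its own as a run-length check. This is an illustrative reading only: `chain_gap_boundaries` is a hypothetical helper, the input is a toy list of `(ref, has_wayyiqtol)` pairs, and flagging the verse where the gap begins (rather than where it ends) is an assumption.

```python
def chain_gap_boundaries(verses, min_gap=3):
    """Return refs where a run of >= min_gap verses without wayyiqtol begins.

    verses: list of (ref, has_wayyiqtol) tuples in text order.
    """
    boundaries = []
    run_start = None
    run_len = 0
    for ref, has_wayyiqtol in verses:
        if has_wayyiqtol:
            # The chain continues (or resumes): reset the gap counter.
            run_start, run_len = None, 0
            continue
        if run_len == 0:
            run_start = ref
        run_len += 1
        if run_len == min_gap:
            # Flag each gap once, at the verse where it started.
            boundaries.append(run_start)
    return boundaries


toy_verses = [
    ("1:1", True), ("1:2", True),
    ("1:3", False), ("1:4", False), ("1:5", False),
    ("1:6", True), ("1:7", False), ("1:8", True),
]
```

With the default `min_gap=3` only the three-verse gap is flagged; lowering `min_gap` makes the detector more sensitive and also picks up the single-verse pause.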
ot_discourse_lexical_diversity(book)
¶
Type-token ratio (TTR) per chapter — higher TTR may signal heightened style.
Returns: chapter, total_tokens, unique_lemmas, ttr.
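For reference, the type-token ratio is the number of distinct lemmas divided by the number of tokens. The sketch below computes it per chapter from toy `(chapter, lemma)` pairs; `ttr_per_chapter` is a hypothetical helper and the placeholder lemmas are not real data.

```python
from collections import defaultdict


def ttr_per_chapter(tokens):
    """Compute type-token ratio per chapter.

    tokens: iterable of (chapter, lemma) pairs.
    Returns a list of dicts: chapter, total_tokens, unique_lemmas, ttr.
    """
    lemmas_by_chapter = defaultdict(list)
    for chapter, lemma in tokens:
        lemmas_by_chapter[chapter].append(lemma)

    rows = []
    for chapter in sorted(lemmas_by_chapter):
        lemmas = lemmas_by_chapter[chapter]
        rows.append({
            "chapter": chapter,
            "total_tokens": len(lemmas),
            "unique_lemmas": len(set(lemmas)),
            "ttr": len(set(lemmas)) / len(lemmas),
        })
    return rows


toy = [(1, "a"), (1, "b"), (1, "a"), (1, "c"), (2, "a"), (2, "a")]
rows = ttr_per_chapter(toy)
```

TTR falls as a text gets longer, so comparisons between chapters of very different length should be read with caution.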
ot_discourse_narrative_profile(book)
¶
Summary of discourse metrics for an OT book.
Returns a dict with: total_tokens, wayyiqtol_pct, peak_chapter, peak_score.
ot_discourse_peak_score(book)
¶
Composite narrative peak score per chapter.
Score = 0.4 × wayyiqtol_density + 0.3 × speech_density + 0.3 × ttr_normalised. Higher scores suggest narrative climax / peak. All components are normalised to 0–1 range before combining.
Returns: chapter, wayyiqtol_density, speech_density, ttr, peak_score.
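The weighted sum above can be reproduced in a few lines. The sketch assumes min-max scaling for the "normalised to 0–1" step, which the description does not specify; `min_max` and `peak_scores` are hypothetical names and the input numbers are invented.

```python
def min_max(values):
    """Scale values to the 0-1 range (all zeros if they are constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def peak_scores(wayyiqtol_density, speech_density, ttr):
    """0.4 * wayyiqtol + 0.3 * speech + 0.3 * TTR, each min-max scaled.

    The three lists hold one value per chapter, in chapter order.
    """
    w = min_max(wayyiqtol_density)
    s = min_max(speech_density)
    t = min_max(ttr)
    return [0.4 * wi + 0.3 * si + 0.3 * ti for wi, si, ti in zip(w, s, t)]


# Toy values for three chapters.
scores = peak_scores([2.0, 6.0, 4.0], [1.0, 3.0, 1.0], [0.5, 0.7, 0.6])
peak_chapter = scores.index(max(scores)) + 1  # 1-based chapter number
```

Because each component is scaled within the book, scores are comparable between chapters of one book but not between books.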
ot_discourse_speech_density(book)
¶
Speech-verb counts per chapter — a proxy for direct speech density.
Counts tokens with type_ == 'qatal' or 'wayyiqtol' whose lemma is אָמַר (the dominant speech verb). Also includes נָאַם (oracle of) tokens.
Returns: chapter, speech_count, total_tokens, density (per 100 tokens).
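The counting rule above can be sketched on toy token dicts as follows. The helper names are hypothetical, and one reading of the description is assumed: אָמַר counts only when `type_` is 'qatal' or 'wayyiqtol', while נָאַם counts regardless of `type_`.

```python
from collections import Counter

AMAR = "אָמַר"  # the dominant speech verb
NAAM = "נָאַם"  # 'oracle of'
SPEECH_TYPES = {"qatal", "wayyiqtol"}


def is_speech_token(token):
    """Apply the speech-verb rule to one token dict (lemma, type_)."""
    if token["lemma"] == NAAM:
        return True
    return token["lemma"] == AMAR and token["type_"] in SPEECH_TYPES


def speech_density(tokens):
    """Return per-chapter speech_count, total_tokens and density per 100."""
    totals, speech = Counter(), Counter()
    for token in tokens:
        totals[token["chapter"]] += 1
        if is_speech_token(token):
            speech[token["chapter"]] += 1
    return [
        {
            "chapter": ch,
            "speech_count": speech[ch],
            "total_tokens": totals[ch],
            "density": 100 * speech[ch] / totals[ch],
        }
        for ch in sorted(totals)
    ]


toy = [
    {"chapter": 1, "lemma": AMAR, "type_": "wayyiqtol"},
    {"chapter": 1, "lemma": AMAR, "type_": "yiqtol"},  # not counted
    {"chapter": 1, "lemma": "x", "type_": None},
    {"chapter": 1, "lemma": "y", "type_": None},
    {"chapter": 2, "lemma": NAAM, "type_": None},
    {"chapter": 2, "lemma": "z", "type_": None},
]
rows = speech_density(toy)
```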
ot_discourse_wayyiqtol_density(book, *, normalize_by_tokens=True)
¶
Wayyiqtol counts (and density) per chapter in a given OT book.
Returns: chapter, total_tokens, wayyiqtol_count, density (per 100 tokens).
ot_noun_profile
¶
Hebrew OT noun morphology profile.
Provides state (absolute/construct/determined), gender, number, and article usage statistics across the Hebrew Bible, backed by the MACULA Hebrew WLC dataset. All functions filter to lang='H' (Hebrew) unless otherwise noted.
Public API
──────────
ot_noun_data(book=None) → DataFrame (all Hebrew noun tokens)
ot_adj_data(book=None) → DataFrame (all Hebrew adjective tokens)
ot_noun_gender_profile(book=None) → DataFrame (gender distribution)
ot_noun_number_profile(book=None) → DataFrame (number distribution)
ot_noun_state_profile(book=None) → DataFrame (state distribution)
ot_noun_gender_state(book=None) → DataFrame (gender × state crosstab)
ot_noun_top_lemmas(n=30, book=None) → DataFrame (most frequent noun lemmas)
ot_noun_lemma_state(lemmas=None, top_n=15) → DataFrame (lemma × state crosstab)
ot_noun_book_distribution() → DataFrame (count + pct per OT book)
ot_noun_genre_profile() → DataFrame (state % by genre group)
ot_article_usage(book=None) → DataFrame (article token stats)
ot_construct_top_lemmas(n=20, book=None) → DataFrame (top construct-state nouns)
print_ot_noun_overview() → None
print_ot_noun_gender(book=None) → None
print_ot_noun_state(book=None) → None
print_ot_noun_top_lemmas(n=20, book=None) → None
print_ot_construct_top_lemmas(n=20, book=None) → None
print_ot_noun_genre_profile() → None
print_ot_noun_book_distribution() → None
print_ot_article_usage(book=None) → None
ot_noun_state_chart(book=None) → Path | None
ot_noun_gender_chart(book=None) → Path | None
ot_noun_genre_heatmap() → Path | None
ot_noun_book_chart() → Path | None
ot_adj_data(book=None)
¶
All Hebrew adjective tokens (class_='adj', lang='H').
ot_article_usage(book=None)
¶
Article token (class_='art') counts vs. noun counts by genre or single book.
In Hebrew the definite article ה attaches to the following word (noun, adj, participle). MACULA WLC represents it as a separate class_='art' token. Returns DataFrame: scope, nouns, articles, pct_articular.
ot_construct_top_lemmas(n=20, book=None)
¶
Top-n noun lemmas that most frequently appear in the construct state.
ot_noun_book_chart()
¶
Bar + line chart of noun counts and % across OT books.
ot_noun_book_distribution()
¶
Noun token count and % per OT book (Hebrew only). Includes % of book words.
ot_noun_data(book=None)
¶
All Hebrew noun tokens (class_='noun', lang='H'), optionally filtered to one book.
ot_noun_gender_chart(book=None)
¶
Bar chart of noun gender distribution.
ot_noun_gender_profile(book=None)
¶
Gender distribution across Hebrew nouns. Returns DataFrame: form, count, pct.
ot_noun_gender_state(book=None)
¶
Gender × state crosstab (counts).
ot_noun_genre_heatmap()
¶
Heatmap of noun state % by OT genre group.
ot_noun_genre_profile()
¶
State % breakdown by OT genre group. Rows=genre, cols=states.
ot_noun_lemma_state(lemmas=None, top_n=15)
¶
Lemma × state crosstab. Pass a lemma list or get top-n by frequency.
ot_noun_number_profile(book=None)
¶
Number distribution across Hebrew nouns. Returns DataFrame: form, count, pct.
ot_noun_state_chart(book=None)
¶
Horizontal bar chart of noun state distribution.
ot_noun_state_profile(book=None)
¶
State distribution (absolute/construct/determined). Returns DataFrame: form, count, pct.
ot_noun_top_lemmas(n=30, book=None)
¶
Top-n most frequent Hebrew noun lemmas. Columns: lemma, count, pct, top_gloss.
print_ot_noun_overview()
¶
Print a statistical overview of Hebrew OT noun morphology.
ot_numbers
¶
Hebrew OT number morphology profile.
Analyses the 6,881 number tokens (class_='num') in the MACULA Hebrew WLC dataset, covering cardinal and ordinal numbers, gender polarity, construct chains, and distribution across the OT corpus.
Key pedagogical point for BBH Ch11: the gender-polarity rule — cardinal numbers 3–10 take the opposite gender of the noun they count (masculine form counts feminine nouns and vice versa). אֶחָד/שְׁנַיִם (1–2) agree normally; 11–19 use both; 20+ are invariable.
Public API
──────────
ot_number_data(book=None) → DataFrame (all num-class tokens)
ot_number_frequency() → DataFrame (lemma frequency table)
ot_number_gender_profile(book=None) → DataFrame (gender distribution)
ot_number_state_profile(book=None) → DataFrame (state distribution)
ot_number_book_distribution() → DataFrame (count + pct per OT book)
ot_number_genre_profile() → DataFrame (count by genre group)
ot_number_polarity_table() → DataFrame (gender × value for cardinals 1–10)
ot_top_number_lemmas(n=20) → DataFrame (most frequent number lemmas)

print_ot_number_overview() → None
print_ot_number_frequency(n=20) → None
print_ot_number_gender(book=None) → None
print_ot_number_state(book=None) → None
print_ot_number_book_distribution() → None
print_ot_number_genre_profile() → None
print_ot_number_polarity() → None

ot_number_frequency_chart() → Path | None
ot_number_genre_chart() → Path | None
ot_number_book_chart() → Path | None
ot_number_book_distribution()
¶
Count of number tokens per OT book in canonical order.
ot_number_data(book=None)
¶
All Hebrew number tokens (class_='num', lang='H').
ot_number_frequency()
¶
Frequency table: lemma × count × pct, sorted by count.
ot_number_gender_profile(book=None)
¶
Gender distribution of number tokens.
ot_number_genre_profile()
¶
Number token count and percentage per genre group.
ot_number_polarity_table()
¶
Gender distribution for cardinals 1–10 illustrating the polarity rule.
Numbers 3–10 show the reverse-gender pattern: the masculine form (no ה-) counts feminine nouns; the feminine form (-ה) counts masculine nouns.
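The polarity rule can be written down as a small lookup. This is a teaching sketch only; the function name `expected_numeral_form` is hypothetical and it encodes just the rule as stated in this module's docstring, not the exceptions a full grammar would list.

```python
def expected_numeral_form(value, noun_gender):
    """Return the grammatical form of a cardinal expected with a counted noun.

    Returns 'masculine' or 'feminine', or None where the rule as stated
    gives no single form (11-19 use both; 20+ are invariable).
    """
    if noun_gender not in ("masculine", "feminine"):
        raise ValueError("noun_gender must be 'masculine' or 'feminine'")
    if value in (1, 2):
        return noun_gender  # 1-2 agree with the noun
    if 3 <= value <= 10:
        # 3-10 take the opposite gender of the noun they count
        return "feminine" if noun_gender == "masculine" else "masculine"
    return None


form_for_three_masc = expected_numeral_form(3, "masculine")
form_for_one_fem = expected_numeral_form(1, "feminine")
form_for_twenty = expected_numeral_form(20, "masculine")
```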
ot_number_state_profile(book=None)
¶
State distribution of number tokens.
ot_top_number_lemmas(n=20)
¶
Most frequent number lemmas with gloss and Strong's.
ot_participant
¶
OT participant tracking and entity chains for the Hebrew Old Testament.
Follows named participants (by lemma or participantref ID) through narrative:
what they do (subject position), what is done to them (object), and where
they speak. Uses the MACULA Hebrew WLC participantref column where populated,
with lemma-based fallback for the many cases where it is absent.
Data notes
──────────
participantref : sparse MACULA entity reference IDs; ~10–20% of tokens
speaker : verse-level speaker column (from MACULA lowfat XML)
type_ : verb form (wayyiqtol, qatal, etc.)
The OT lowfat syntax tree (load_syntax_ot) has subject/object role data.

Approach
────────
1. Primary: participantref lookup, the most precise method when populated
2. Fallback: lemma match, which finds all tokens with the named lemma

Questions this answers
──────────────────────
• What verbs does Abraham appear as subject of across Genesis?
• What happens to Moses (as object) in Exodus?
• In which chapters of Genesis does Jacob appear most?
• How do the subject-verb profiles of Abraham vs. Moses compare?
Public API
──────────
KNOWN_OT_PARTICIPANTS → dict of participant anchors

ot_participant_data(participant, book=None) → DataFrame
ot_participant_subject_verbs(participant, book=None) → DataFrame
ot_participant_object_verbs(participant, book=None) → DataFrame
ot_participant_chain(book, participant) → DataFrame (chapter presence)
ot_entity_density(book) → DataFrame (chapter, entity, mention_count)
ot_participant_compare(participants) → DataFrame (one row per participant)

print_ot_participant_profile(lemma, book=None) → None
print_ot_participant_chain(book, lemma) → None
print_ot_participant_compare(lemmas) → None

ot_participant_chain_chart(book, lemmas) → Path | None
ot_entity_density_chart(book) → Path | None
ot_entity_density(book, top_n_entities=15)
¶
Per-chapter count of distinct participant lemmas from KNOWN_OT_PARTICIPANTS.
Returns: chapter, entity, mention_count — one row per (chapter, entity) pair.
ot_participant_chain(book, participant)
¶
Chapter-by-chapter presence of a participant in a book.
Returns: chapter, mention_count — one row per chapter.
ot_participant_compare(participants, *, book=None)
¶
Side-by-side mention counts and book spread for a list of participants.
Returns: participant, total_mentions, books_present, top_book.
ot_participant_data(participant, *, book=None)
¶
All tokens referencing a participant (by KNOWN_OT_PARTICIPANTS key or Hebrew lemma).
Returns the raw token DataFrame filtered to that participant.
ot_participant_object_verbs(participant, *, book=None, top_n=20)
¶
Verbs for which this participant is the syntactic object/complement.
Uses same-verse co-occurrence heuristic: verbal tokens in verses containing the participant, where the participant is not the first nominal (rough proxy).
Returns: verb_lemma, gloss, count.
ot_participant_subject_verbs(participant, *, book=None, top_n=30)
¶
Verbs for which this participant is the syntactic subject.
Uses the OT lowfat syntax tree (subject role). Falls back to heuristic: wayyiqtol immediately following the participant's name token (same verse).
Returns: verb_lemma, gloss, count — sorted descending.
ot_predicate_args
¶
Hebrew OT predicate-argument structure (semantic role labeling).
The MACULA Hebrew WLC dataset includes a frame column on verb tokens that
encodes semantic role structure — who does what to whom — using PropBank-style
argument labels:
A0 : proto-agent (subject / initiator)
A1 : proto-patient (object / affected entity)
Each argument value is one or more MACULA xml_id references pointing to the actual token(s) that fill that role.
Format: 'A0:
~68,207 Hebrew verb tokens have frame data (out of ~475,911 total).
~27,395 have an A1 (patient) argument; ~40,812 have A0 only.
Questions this answers
──────────────────────
• What does God (A0) do as agent in the Torah?
• What gets created/destroyed/given (A1) across the OT?
• Which verbs does YHWH appear as agent most often?
• Which verbs take Israel/the people as patient most often?
• What does the agent of a given verb profile look like?

Public API
──────────
ot_frame_data(book=None, lang='H') → DataFrame (verb tokens with frame)
ot_agent_verbs(agent_lemma, top_n=20, ...) → DataFrame (what agent does as A0)
ot_patient_verbs(patient_lemma, top_n=20, ...) → DataFrame (what acts on patient as A1)
ot_verb_agents(verb_lemma, top_n=20, ...) → DataFrame (who is A0 of this verb)
ot_verb_patients(verb_lemma, top_n=20, ...) → DataFrame (who is A1 of this verb)
ot_frame_pairs(book=None, top_n=20) → DataFrame (most common A0,verb,A1 triples)

print_ot_agent_verbs(agent_lemma, ...) → None
print_ot_patient_verbs(patient_lemma, ...) → None
print_ot_verb_agents(verb_lemma, ...) → None
print_ot_verb_patients(verb_lemma, ...) → None
print_ot_frame_pairs(book=None, top_n=20) → None

ot_agent_verbs_chart(agent_lemma, ...) → Path | None
ot_patient_verbs_chart(patient_lemma, ...) → Path | None
ot_agent_verbs(agent_lemma, *, top_n=20, book=None, lang='H')
¶
Verbs where the given lemma is the A0 (proto-agent / subject).
Returns: lemma (verb), gloss, count.
ot_frame_data(book=None, *, lang='H')
¶
All Hebrew verb tokens that have frame (A0/A1) data.
Returns the original DataFrame rows plus resolved columns
a0_lemma, a0_gloss, a1_lemma, a1_gloss
(using the first listed A0/A1 reference; semicolon-separated multiples are stored in a0_refs/a1_refs as raw lists).
ot_frame_pairs(book=None, *, top_n=20, lang='H', require_a1=True)
¶
Most common (A0, verb, A1) triples across the OT.
Parameters¶
require_a1 : if True (default), only include rows where A1 is present.
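The effect of require_a1 is easiest to see on a toy input. The sketch below counts (A0, verb, A1) triples over plain dicts; the column names `a0_lemma`, `lemma` and `a1_lemma` follow the return descriptions in this module, while the function name `top_frame_triples` and the transliterated sample lemmas are hypothetical.

```python
from collections import Counter


def top_frame_triples(rows, top_n=20, require_a1=True):
    """Count (A0, verb, A1) triples; optionally drop rows with no A1."""
    counts = Counter()
    for row in rows:
        a0 = row.get("a0_lemma")
        a1 = row.get("a1_lemma")
        if require_a1 and not a1:
            continue
        counts[(a0, row["lemma"], a1)] += 1
    return counts.most_common(top_n)


# Hypothetical rows with transliterated placeholder lemmas.
rows = [
    {"a0_lemma": "elohim", "lemma": "bara", "a1_lemma": "shamayim"},
    {"a0_lemma": "elohim", "lemma": "bara", "a1_lemma": "shamayim"},
    {"a0_lemma": "elohim", "lemma": "amar", "a1_lemma": None},
]
with_a1 = top_frame_triples(rows)
all_frames = top_frame_triples(rows, require_a1=False)
```

With the default, A0-only frames are dropped, so intransitive and speech verbs tend to disappear from the ranking; pass require_a1=False to keep them.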
ot_patient_verbs(patient_lemma, *, top_n=20, book=None, lang='H')
¶
Verbs where the given lemma is the A1 (proto-patient / object).
Returns: lemma (verb), gloss, count.
ot_verb_agents(verb_lemma, *, top_n=20, book=None, lang='H')
¶
Who (A0) typically performs the given verb?
Returns: a0_lemma, a0_gloss, count.
ot_verb_patients(verb_lemma, *, top_n=20, book=None, lang='H')
¶
What (A1) typically receives the given verb?
Returns: a1_lemma, a1_gloss, count.
ot_semantic_domains
¶
Hebrew OT semantic domain analysis (SDBH / coredomain / lexdomain).
The MACULA Hebrew WLC dataset carries three semantic annotation columns:
coredomain — 190 thematic categories from the MARBLE SDBH project (e.g. 046=Deity, 042=Covenant, 189=Worship, 088=Justice) Space-separated when a word spans multiple categories. ~160,923 non-empty tokens.
lexdomain — Hierarchical SDBH code (up to 12 digits in groups of 3) Top level: 001=Objects, 002=Events, 003=Referents, 004=Markers ~244,721 non-empty tokens.
sdbh — Fine-grained 15-digit MARBLE lexical entry identifier. ~244,734 non-empty tokens.
Questions this answers
──────────────────────
• What are the most common semantic domains in Isaiah?
• Which books have the most "Deity" (046) vocabulary?
• What are the top lemmas in the "Covenant" (042) domain?
• How does "Worship" vocabulary distribute across Torah vs. Prophets?
• What coredomain categories cluster in Leviticus vs. Psalms?

Public API
──────────
ot_domain_data(domain=None, book=None, lang='H') → DataFrame
ot_domain_frequency(book=None, top_n=30) → DataFrame (domain counts)
ot_top_domain_lemmas(domain, top_n=20, book=None) → DataFrame
ot_domain_book_distribution(domain) → DataFrame
ot_domain_genre_profile(domain=None) → DataFrame
ot_domain_comparison(books, top_n=15) → pivot DataFrame
ot_coredomain_profile(book, top_n=20) → DataFrame

print_ot_domain_overview() → None
print_ot_domain_frequency(book=None, top_n=30) → None
print_ot_top_lemmas(domain, top_n=20, book=None) → None
print_ot_domain_book_distribution(domain) → None
print_ot_domain_genre_profile(domain) → None
print_ot_domain_comparison(books, top_n=15) → None

ot_domain_frequency_chart(book=None, top_n=25) → Path | None
ot_domain_book_chart(domain, top_n=20) → Path | None
ot_domain_genre_chart(domain) → Path | None
ot_domain_heatmap(books, top_n=15) → Path | None

COREDOMAIN_NAMES → dict[str, str]
THEOLOGY_COREDOMAINS → dict[str, list[str]]
ot_coredomain_profile(book, *, top_n=20, lang='H')
¶
Semantic domain fingerprint for a single OT book.
ot_domain_book_distribution(domain, *, lang='H')
¶
Book-by-book distribution of tokens in a given coredomain.
ot_domain_comparison(books, *, top_n=15, lang='H')
¶
Compare coredomain profiles across multiple OT books.
Returns a pivot: rows=domain, cols=books, cells=% of book's coded tokens.
ot_domain_data(domain=None, *, book=None, lang='H')
¶
All Hebrew OT tokens, optionally filtered by coredomain and/or book.
Parameters¶
domain : coredomain code (e.g. '46', 46, 'Deity') or None for all
book : book abbreviation or list of abbreviations
lang : 'H' (Hebrew, default) or 'A' (Aramaic) or None (all)
ot_domain_frequency(book=None, *, top_n=30, lang='H', exclude_empty=True)
¶
Count how many tokens fall in each coredomain category.
Returns a DataFrame with: code, domain_name, count, pct.
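Because a coredomain value may hold several space-separated codes, counting needs a split step first. The sketch below is one plausible reading, assuming a token with two codes is counted once under each; the function name `domain_frequency` and the sample values are hypothetical.

```python
from collections import Counter


def domain_frequency(coredomains, top_n=30, exclude_empty=True):
    """Count coredomain codes over an iterable of raw column values."""
    counts = Counter()
    for value in coredomains:
        codes = (value or "").split()
        if not codes:
            if not exclude_empty:
                counts[""] += 1  # keep uncoded tokens as their own bucket
            continue
        counts.update(codes)  # a multi-code token contributes to each code
    total = sum(counts.values())
    return [
        {"code": code, "count": n, "pct": round(100.0 * n / total, 1)}
        for code, n in counts.most_common(top_n)
    ]


# Hypothetical column values: single codes, a multi-code token, and blanks.
freq = domain_frequency(["046", "046 042", "", None, "042", "046"])
```

Under this reading the percentages are shares of code assignments, not of tokens, so they still sum to 100 when some tokens carry more than one code.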
ot_domain_genre_profile(domain=None, *, lang='H')
¶
Genre distribution for a given domain (or all domains if None).
Returns: genre, count, pct.
ot_theology_profile(group, *, book=None)
¶
Token counts for a named theological domain group.
Parameters¶
group : key in THEOLOGY_COREDOMAINS (e.g. 'Covenant', 'Worship')
book : optional book filter
Returns lemma-level counts within that theological cluster.
ot_top_domain_lemmas(domain, *, top_n=20, book=None, lang='H')
¶
Most frequent lemmas in a given coredomain category.
Returns: lemma, strong_h, gloss, count.
ot_speaker
¶
OT speaker attribution — who speaks in the Hebrew Bible.
Uses MACULA Hebrew subjref links on speech verb tokens to resolve
which entity is the grammatical subject of each speech act.
Questions this answers
──────────────────────
• What does YHWH say in Isaiah?
• How many verses in each OT book contain divine speech?
• What does Moses say vs. what does God say in Deuteronomy?
• Who speaks in Job: God, Job, the friends, the narrator?
• What proportion of Jeremiah is direct divine speech?

Public API
──────────
speaker_verses(speaker_strongs, ...) → DataFrame of speech-verb tokens
divine_speech_by_book(...) → per-book divine speech counts
print_speaker_summary(speaker_strongs, ...) → terminal table
speaker_report(speaker_strongs, ...) → Markdown report

OT speech verbs tracked
───────────────────────
אָמַר H0559 say
דָּבַר H1696 speak
קָרָא H7121 call / proclaim
עָנָה H6030 answer
צָוָה H6680 command
שָׁלַח H7971 send (indirect speech)
נָאַם H5001 declare (prophetic formula: 'oracle of YHWH')
divine_speech_by_book(speaker_strongs=None, *, count_mode='verses')
¶
Per-book count of divine speech events.
Parameters¶
speaker_strongs : Strong's for the speaker (default: GOD_OT_SPEECH)
count_mode : 'verses' (distinct ref strings) or 'tokens' (raw verb count)
Returns a DataFrame with columns: book, count, pct_of_total. The pct_of_total column is the percentage of all speech-verb tokens in that book attributed to the given speaker.
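The two count_mode values differ only in whether repeated verse references are collapsed. A minimal sketch, assuming the speech-verb tokens have already been filtered to one speaker and reduced to `(book, ref)` pairs; the function name `speech_counts_by_book` and the sample references are hypothetical.

```python
from collections import defaultdict


def speech_counts_by_book(tokens, count_mode="verses"):
    """Count speech events per book by distinct verse ref or by raw token."""
    refs_by_book = defaultdict(list)
    for book, ref in tokens:
        refs_by_book[book].append(ref)
    if count_mode == "verses":
        return {book: len(set(refs)) for book, refs in refs_by_book.items()}
    if count_mode == "tokens":
        return {book: len(refs) for book, refs in refs_by_book.items()}
    raise ValueError("count_mode must be 'verses' or 'tokens'")


# Hypothetical speech-verb tokens; Isa 1:2 holds two speech verbs.
tokens = [("Isa", "Isa 1:2"), ("Isa", "Isa 1:2"), ("Isa", "Isa 1:11"), ("Jer", "Jer 1:4")]
by_verse = speech_counts_by_book(tokens, "verses")
by_token = speech_counts_by_book(tokens, "tokens")
```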
divine_speech_verses(book, speaker_strongs=None, *, verb_lemma=None)
¶
Return sorted list of verse refs where the given speaker takes a speech verb as subject in the specified book.
Useful for: list all verses in Isaiah where YHWH speaks directly.
print_divine_speech_by_book(speaker_strongs=None, *, label=None, min_count=1)
¶
Print per-book divine speech verse counts with percentage.
print_speaker_summary(speaker_strongs, *, books=None, top_n=20, label=None)
¶
Print a formatted table of speech verbs attributed to the given speaker.
speaker_report(speaker_strongs, *, books=None, top_n=30, label=None, output_dir='output/reports')
¶
Generate a Markdown report of speech acts by the given speaker.
Returns path to saved Markdown file.
speaker_verses(speaker_strongs, *, books=None, speech_verbs=None, include_tokens=False, top_n=None)
¶
Return speech-verb tokens whose grammatical subject resolves to the given Strong's number(s).
Parameters¶
speaker_strongs : Strong's number(s) for the speaker, e.g. ['H3068','H0430']
books : restrict to specific book_ids
speech_verbs : set of Strong's numbers for speech verbs to check (default: SPEECH_VERB_STRONGS)
include_tokens : if True, return full token rows; if False, return aggregated (verb_lemma, gloss, book, count) summary
top_n : return top N results (aggregated mode only)
Returns a DataFrame sorted by count descending.
who_speaks(book, *, top_n=15)
¶
For a single OT book, show a breakdown of ALL speakers by frequency — how many speech-verb tokens does each entity take as subject?
Useful for character studies: who dominates dialogue in Job? in Genesis?
Returns a DataFrame with columns: speaker_strong, speaker_lemma, speaker_gloss, verb_count.
poetry
¶
Hebrew poetry analysis: cola splitting, parallelism detection, and word-pair statistics.
Hebrew poetry is organized into poetic lines (verses) divided into cola (half-lines) by the cantillation accent system embedded in the MT text. The Etnahta (U+0591), also called Athnach, marks the main mid-verse division (end of A-colon); the Silluq / Sof Pasuq marks verse end (end of B- or final colon). Lesser disjunctive accents (Zaqef, Revia, Tifha) subdivide further, yielding C-cola in longer verses.
Parallelism types (after Lowth / Watson / Alter):
synonymous : A and B express the same idea with different words
antithetic : A and B express contrasting ideas
synthetic : B extends, intensifies, or completes A
emblematic : one colon is literal, one figurative (simile/metaphor)
This module detects cola boundaries from accents, extracts content-word pairs across cola, scores lexical / semantic similarity, and classifies parallelism.
Public API
──────────
split_cola(verse_df) → list[DataFrame], one df per colon
verse_parallel_pairs(book, ch, vs) → DataFrame of A/B word pairs
parallelism_type(book, ch, vs) → str classification
book_word_pairs(book) → most common A/B parallel word pairs
print_verse_analysis(book, ch, vs) → formatted terminal output
parallel_word_pair_table(book) → canonical parallel pair inventory
POETRY_BOOKS → list of primary Hebrew poetry books
acrostic_known(book)
¶
Return list of known acrostic chapters/ranges for a book.
book_meter_stats(book)
¶
Compute meter statistics for every verse in a book.
Returns DataFrame with columns: chapter, verse, pattern, meter_type, stresses_a, stresses_b, syllables_a, syllables_b.
book_parallelism_stats(book)
¶
Count parallelism types across all verses in a book.
Returns a DataFrame: type | count | pct
book_word_pairs(book, *, min_count=2, top_n=40, content_only=True)
¶
Most frequent A/B parallel word pairs across all verses in a book.
Returns a DataFrame: lemma_a | gloss_a | lemma_b | gloss_b | count sorted by count descending.
compare_poetry_books(books=None)
¶
Compare parallelism type distribution across multiple poetry books. Returns a pivot DataFrame: rows=type, cols=books, values=pct.
detect_acrostic(book, chapter, start_verse, end_verse, *, stanza_size=1)
¶
Detect an alphabetic acrostic across a verse range.
Parameters¶
book, chapter : book and chapter
start_verse, end_verse : inclusive verse range
stanza_size : verses per stanza (1 for most acrostics; 8 for Ps 119 where each stanza of 8 verses starts with the same letter)
Returns dict with keys:
hits : list of (verse, expected_letter, actual_letter, match)
match_count : int
total : int
pct_match : float
is_acrostic : bool (match_count / total >= 0.75)
pattern : 'full' | 'partial' | 'none'
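The matching logic can be sketched over the first word of each verse. This is an illustration under assumptions: the helper names `first_letter` and `check_acrostic` are hypothetical, verse numbers are counted from the start of the range, and the 'full' / 'partial' / 'none' pattern label is left out because its thresholds are not documented here.

```python
import unicodedata

# The 22 letters of the Hebrew alphabet in order (no final forms).
HEBREW_ALPHABET = "אבגדהוזחטיכלמנסעפצקרשת"


def first_letter(word):
    """Return the first base letter of a word, skipping points and accents."""
    for ch in word:
        if unicodedata.category(ch) == "Lo":
            return ch
    return ""


def check_acrostic(first_words, stanza_size=1, threshold=0.75):
    """Compare verse-initial letters against the alphabet, stanza by stanza."""
    hits = []
    for i, word in enumerate(first_words):
        stanza = i // stanza_size
        expected = HEBREW_ALPHABET[stanza] if stanza < len(HEBREW_ALPHABET) else ""
        actual = first_letter(word)
        hits.append((i + 1, expected, actual, bool(expected) and expected == actual))
    total = len(hits)
    match_count = sum(1 for hit in hits if hit[3])
    pct_match = match_count / total if total else 0.0
    return {
        "hits": hits,
        "match_count": match_count,
        "total": total,
        "pct_match": pct_match,
        "is_acrostic": total > 0 and pct_match >= threshold,
    }


# Hypothetical verse-initial words: three follow the alphabet, one does not.
result = check_acrostic(["אשרי", "בכל", "גם", "על"])
```

With stanza_size=8, verses 1–8 are all expected to begin with the first letter, verses 9–16 with the second, and so on, which is the Psalm 119 layout described above.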
detect_chiasm(book, chapter, start_verse, end_verse, *, min_score=0.1)
¶
Detect chiastic (A B B' A') structure across a verse range.
Parameters¶
book, chapter : book ID and chapter number
start_verse, end_verse : inclusive verse range (within one chapter)
min_score : minimum average mirror-pair Jaccard to report a hit
Returns a dict with keys:
pattern : list of str labels, e.g. ["A", "B", "B'", "A'"]
verses : list of (ch, vs) tuples in order
pairs : list of ((ch1,vs1), (ch2,vs2), score) for each mirror pair
pivot : (ch, vs) or None if even number of verses
mean_score : float, the average Jaccard across mirror pairs
is_chiasm : bool, True if mean_score >= min_score
lemma_sets : dict mapping (ch,vs) → frozenset of lemmas
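The mirror-pair scoring reduces to Jaccard similarity between the first and last verse, the second and second-to-last, and so on. A minimal sketch over plain sets of lemmas, with positions given as list indices rather than (ch, vs) tuples; the function names and the placeholder lemmas are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mirror_scores(lemma_sets, min_score=0.1):
    """Score each verse against its mirror image around the centre."""
    n = len(lemma_sets)
    pairs = [
        (i, n - 1 - i, jaccard(lemma_sets[i], lemma_sets[n - 1 - i]))
        for i in range(n // 2)
    ]
    mean_score = sum(score for _, _, score in pairs) / len(pairs) if pairs else 0.0
    return {
        "pairs": pairs,
        "pivot": n // 2 if n % 2 else None,  # odd ranges have a centre verse
        "mean_score": mean_score,
        "is_chiasm": bool(pairs) and mean_score >= min_score,
    }


# Hypothetical lemma sets for four verses arranged A B B' A'.
verses = [{"a", "b"}, {"c"}, {"c", "d"}, {"a"}]
chiasm = mirror_scores(verses)
```

The default min_score of 0.1 is permissive, so a positive is_chiasm is better read as "worth inspecting" than as a confirmed structure.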
is_superscription(book, chapter, verse)
¶
Return True if this verse appears to be a Psalm superscription.
A verse is considered a superscription if:
- It is verse 1 of a Psalm chapter
- The majority of its content words are superscription lemmas
- The total word count is short (≤8 content words; headings are brief)
The "no mid-verse split" heuristic is unreliable for these verses, so the lemma ratio is used instead.
parallelism_type(book, chapter, verse)
¶
Classify the parallelism type for a verse.
Returns (type_label, confidence_score) where type_label is one of:
'synonymous' : high lexical/domain overlap, same polarity
'antithetic' : low lexical overlap, antithetic markers present
'synthetic' : low overlap, no antithetic marker; B extends A
'single_colon' : verse has only one colon (no split found)
Confidence is 0.0–1.0; classifications below 0.3 are uncertain.
This is a heuristic classifier — it captures the dominant type for clear cases and marks ambiguous ones as 'synthetic' (the catch-all).
poetry_report(book, *, output_dir='output/reports', top_n_pairs=30)
¶
Generate a Markdown report on the poetry of a book:
- parallelism type distribution
- top parallel word pairs
Returns path to saved Markdown file.
print_acrostic(book, chapter, start_verse, end_verse, *, stanza_size=1)
¶
Print an acrostic analysis for a verse range.
print_book_pairs(book, *, top_n=20, min_count=3)
¶
Print the most frequent parallel word pairs in a poetry book.
print_chiasm(book, chapter, start_verse, end_verse, *, min_score=0.1)
¶
Print a formatted chiasm analysis for a verse range.
print_meter_stats(book)
¶
Print a summary of meter patterns for a book.
print_parallelism_stats(book)
¶
Print parallelism type distribution for a book.
print_verse_analysis(book, chapter, verse, *, show_accents=False)
¶
Print a formatted cola analysis for a single verse.
print_verse_meter(book, chapter, verse)
¶
Print meter analysis for one verse.
split_cola(verse_df)
¶
Split a verse DataFrame (one row per word) into cola based on cantillation
accents embedded in the text column.
Returns a list of DataFrames: [colon_A, colon_B] or [A, B, C] for longer verses. Each colon includes the word that carries the dividing accent.
The primary split is at the Etnahta (U+0591). If no Etnahta is found, falls back to splitting on Zaqef Gadol (U+0595) or Revia (U+0597). Single-word or empty verses return [verse_df].
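The accent-based split can be illustrated on a plain list of word strings. This sketch covers only the primary two-way split with the documented fallback, not the further subdivision into C-cola; the function name `split_words_into_cola` and the placeholder words are hypothetical.

```python
ETNAHTA = "\u0591"
FALLBACK_ACCENTS = ("\u0595", "\u0597")  # Zaqef Gadol, Revia


def split_words_into_cola(words):
    """Split a verse (list of pointed word strings) into two cola.

    The word carrying the dividing accent closes the first colon.
    """
    if len(words) <= 1:
        return [words]
    # Try the Etnahta first; only if it is absent, try the fallback accents.
    for accents in ((ETNAHTA,), FALLBACK_ACCENTS):
        for i, word in enumerate(words[:-1]):  # an accent on the last word cannot divide
            if any(accent in word for accent in accents):
                return [words[: i + 1], words[i + 1:]]
    return [words]


# Placeholder words; the accent code points are appended to mark the divider.
with_etnahta = split_words_into_cola(["w1", "w2" + ETNAHTA, "w3", "w4"])
with_revia = split_words_into_cola(["w1\u0597", "w2", "w3"])
no_accent = split_words_into_cola(["w1", "w2"])
```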
verse_cola(book, chapter, verse)
¶
Return the cola for a single verse.
verse_meter(book, chapter, verse)
¶
Estimate the meter pattern for one verse.
Returns dict with keys:
cola : list of int, stress count per colon
pattern : str, e.g. '3+2' or '3+3'
syllables : list of int, syllable count per colon
meter_type : 'qinah(3+2)' | 'balanced(3+3)' | 'other'
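Given per-colon stress counts, the pattern string and meter label follow directly. The sketch below shows only that last step and assumes the stress counts are already known; how the module counts stresses is not described here, and `classify_meter` is a hypothetical name.

```python
def classify_meter(stress_counts):
    """Build the pattern string and meter label from per-colon stress counts."""
    pattern = "+".join(str(n) for n in stress_counts)
    if pattern == "3+2":
        meter_type = "qinah(3+2)"
    elif pattern == "3+3":
        meter_type = "balanced(3+3)"
    else:
        meter_type = "other"
    return {"cola": list(stress_counts), "pattern": pattern, "meter_type": meter_type}


qinah = classify_meter([3, 2])
balanced = classify_meter([3, 3])
tricolon = classify_meter([3, 3, 3])
```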
verse_parallel_pairs(book, chapter, verse)
¶
Return a DataFrame of content-word pairs across cola A and B for a verse.
For each content word in colon A, pairs it with each content word in colon B, yielding a row with: lemma_a, gloss_a, strong_a, lemma_b, gloss_b, strong_b, same_lemma (bool), domain_overlap (float), book, chapter, verse
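The pairing is a Cartesian product of the two cola. A minimal sketch with lemmas as plain strings and only the `same_lemma` flag computed; the function name `parallel_pairs` and the placeholder lemmas are hypothetical, and the gloss, Strong's and domain_overlap columns are omitted.

```python
from itertools import product


def parallel_pairs(colon_a, colon_b):
    """Pair every content lemma in colon A with every content lemma in colon B."""
    return [
        {"lemma_a": a, "lemma_b": b, "same_lemma": a == b}
        for a, b in product(colon_a, colon_b)
    ]


# Placeholder lemmas for a two-colon verse.
pairs = parallel_pairs(["heavens", "declare"], ["firmament", "declare"])
```

A verse with m content words in A and n in B always yields m × n rows, so long verses dominate any aggregate unless a min_count filter is applied afterwards.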
prepositions
¶
Preposition frequency and collocate analysis for Biblical Hebrew.
Data source: MACULA OT syntax data (syntax_ot), which provides clean pointed lemmas and positional word_num fields for adjacency joins.
Primary functions
prep_frequency() : frequency table of all prepositions
prep_by_book() : one preposition's distribution across books
prep_distribution_table() : side-by-side comparison of major prepositions
prep_collocates() : top collocates for a given preposition
prep_object_types() : grammatical breakdown of what follows a prep
compare_preps() : side-by-side collocate comparison of two preps
find_governing_prep() : look backwards from a word position for a governing preposition
inf_cst_by_prep() : frequency table of infinitive constructs by governing preposition
Print wrappers
print_prep_frequency()
print_prep_by_book()
print_prep_distribution()
print_prep_collocates()
print_compare_preps()
print_inf_cst_by_prep()
compare_preps(lemma1, lemma2, pos=None, top_n=20, book_group=None)
¶
Side-by-side collocate comparison of two prepositions.
Parameters¶
lemma1, lemma2 : str
    Pointed Hebrew lemmas, e.g. 'לְ' and 'בְּ'.
pos : str, optional
    Filter following words by POS substring.
top_n : int
    Number of top collocates per preposition.
book_group : str, optional
Returns¶
DataFrame with columns: collocate, gloss, count_lemma1, count_lemma2 Sorted by count_lemma1 descending.
find_governing_prep(book_df, pos)
¶
Look back up to 3 positions from pos in book_df to find a governing preposition. Stops early if a verb, noun, or adjective is encountered.
Parameters¶
book_df : DataFrame
    A reset-indexed slice of the MACULA syntax table for a single book.
pos : int
    Integer row position of the word whose governing prep is sought.
Returns¶
str Diacritics-stripped lemma of the governing preposition, or '(none)'.
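The look-back can be sketched over a list of row dicts instead of a DataFrame. The `pos` and `lemma` keys, the part-of-speech labels, and the function names are assumptions made for the example; only the window of 3, the stop classes, the diacritics stripping and the '(none)' sentinel come from the description above.

```python
import unicodedata

STOP_POS = ("verb", "noun", "adj")  # content words that end the search early


def strip_diacritics(text):
    """Drop combining marks (vowel points, accents) and keep base letters."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def governing_prep(rows, position, max_lookback=3):
    """Look back up to max_lookback rows from position for a preposition."""
    for offset in range(1, max_lookback + 1):
        i = position - offset
        if i < 0:
            break
        row = rows[i]
        if row["pos"] == "prep":
            return strip_diacritics(row["lemma"])
        if row["pos"] in STOP_POS:
            break
    return "(none)"


# Hypothetical rows: preposition, article, noun.
rows = [
    {"pos": "prep", "lemma": "\u05dc\u05b0"},  # lamed with sheva
    {"pos": "art", "lemma": "ha"},
    {"pos": "noun", "lemma": "melek"},
]
found = governing_prep(rows, 2)
blocked = governing_prep(
    [{"pos": "prep", "lemma": "be"}, {"pos": "verb", "lemma": "bo"}, {"pos": "noun", "lemma": "ish"}],
    2,
)
```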
inf_cst_by_prep(book=None)
¶
prep_by_book(lemma)
¶
prep_collocates(lemma, pos=None, top_n=20, book=None, book_group=None)
¶
Top words immediately following a preposition (direct object / NN head).
Uses word_num + 1 adjacency within the same verse. Skips conjunctions, particles, and other prepositions in the result unless pos is set.
Parameters¶
lemma : str
    Pointed Hebrew lemma of the preposition, e.g. 'לְ'.
pos : str, optional
    Filter the following word by part-of-speech substring, e.g. 'noun'.
top_n : int
    Number of collocates to return.
book : str, optional
book_group : str, optional
Returns¶
DataFrame with columns: collocate, pos, gloss, count
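The word_num + 1 adjacency join can be illustrated without pandas. In this sketch each token is a dict with hypothetical `verse`, `word_num`, `pos` and `lemma` keys, and the part-of-speech labels in `SKIPPED_POS` are placeholders for whatever labels the syntax table actually uses.

```python
from collections import Counter

SKIPPED_POS = {"conj", "ptcl", "prep"}  # ignored unless a pos filter is given


def collocates(tokens, prep_lemma, pos=None, top_n=20):
    """Count the lemma directly following a preposition within the same verse."""
    by_position = {(t["verse"], t["word_num"]): t for t in tokens}
    counts = Counter()
    for t in tokens:
        if t["pos"] != "prep" or t["lemma"] != prep_lemma:
            continue
        following = by_position.get((t["verse"], t["word_num"] + 1))
        if following is None:
            continue  # preposition is verse-final or the next word is missing
        if pos is not None:
            if pos not in following["pos"]:
                continue
        elif following["pos"] in SKIPPED_POS:
            continue
        counts[following["lemma"]] += 1
    return counts.most_common(top_n)


# Hypothetical tokens with transliterated placeholder lemmas.
tokens = [
    {"verse": "Gen 1:1", "word_num": 1, "pos": "prep", "lemma": "be"},
    {"verse": "Gen 1:1", "word_num": 2, "pos": "noun", "lemma": "reshit"},
    {"verse": "Gen 1:2", "word_num": 1, "pos": "prep", "lemma": "be"},
    {"verse": "Gen 1:2", "word_num": 2, "pos": "conj", "lemma": "we"},
    {"verse": "Gen 1:3", "word_num": 5, "pos": "prep", "lemma": "be"},
]
default_result = collocates(tokens, "be")
conj_result = collocates(tokens, "be", pos="conj")
```

Keying the lookup on (verse, word_num) keeps the join inside a verse, so a verse-final preposition never pairs with the first word of the next verse.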
prep_distribution_table(lemmas=None)
¶
prep_frequency(book=None, book_group=None, top_n=20)
¶
Frequency table of all prepositions by lemma.
Parameters¶
book : str, optional
    Single OT book abbreviation (e.g. 'Gen').
book_group : str, optional
    One of 'Torah', 'Former Prophets', 'Writings', 'Latter Prophets'.
top_n : int
    Number of rows to return (most frequent first).
Returns¶
DataFrame with columns: lemma, gloss, count, pct
reporting
¶
charts
¶
Matplotlib/seaborn chart helpers. All functions accept output_path to save PNG.
bar_chart(df, x, y='count', title='', xlabel='', ylabel='Count', top_n=20, output_path=None, figsize=_DEFAULT_FIGSIZE)
¶
Horizontal bar chart of a freq_table DataFrame.
grouped_bar(df, x, hue, y='count', title='', top_n=None, output_path=None, figsize=(14, 6))
¶
Grouped bar chart (e.g. stem counts per book).
heatmap(df, index, columns, values='count', title='', output_path=None, figsize=(14, 8))
¶
Pivot-based heatmap (e.g. tense × voice).
export
¶
HTML and CSV export utilities for berean-bible-bots analyses.
Provides two levels of export:
- Low-level helpers: convert any DataFrame or analysis result to CSV or an HTML fragment / standalone page.
- High-level exporters: one function per analysis type that collects all the relevant DataFrames, renders a styled standalone HTML report, and also writes companion CSV files.
Output directories¶
output/exports/csv/ — raw CSV files (one per table) output/exports/html/ — self-contained HTML reports (inline CSS, embedded charts)
Usage¶
from bible_grammar.reporting.export import ( export_csv, export_html_page, export_word_study, export_semantic_profile, export_genre_compare, export_divine_names, export_all, )
Single DataFrame → CSV¶
export_csv(df, 'my-table')
Full word study report¶
export_word_study('H7965') # שָׁלוֹם export_word_study('G3056') # λόγος
Genre comparison report¶
export_genre_compare('OT') export_genre_compare('NT')
Divine names report¶
export_divine_names()
Everything at once¶
export_all()
export_all(*, word_studies=None)
¶
Run all available exporters and return a dict of output paths.
word_studies : list of Strong's numbers to export individually. Defaults to the pre-generated semantic profiles.
export_csv(df, slug, *, subdir='')
¶
Write a DataFrame to CSV.
Parameters¶
df : DataFrame to export
slug : filename stem (no extension)
subdir : optional subdirectory under CSV_DIR
Returns the Path to the saved file.
export_divine_names(corpora=None)
¶
Export divine names analysis to CSV + HTML.
export_genre_compare(corpus='OT')
¶
Export genre comparison to CSV + HTML for all features.
export_html_page(sections, title, slug, *, subtitle='', source_note='STEPBible TAHOT/TAGNT/TALXX (CC BY 4.0, Tyndale House Cambridge)')
¶
Build and save a standalone HTML report.
Parameters¶
sections : list of dicts, each with keys:
    'heading' : str (h2 level)
    'subheading' : str (optional h3)
    'text' : str (optional paragraph)
    'df' : DataFrame to render as table (optional)
    'chart' : path to PNG to embed inline (optional)
    'html' : raw HTML fragment to insert verbatim (optional)
    'pct_cols' : list of column names to format as % (optional)
title : page title
slug : output filename stem
subtitle : optional subtitle line
source_note : footer attribution
Returns the Path to the saved file.
export_semantic_profile(strongs)
¶
Export a semantic profile to HTML + CSV.
export_word_study(strongs, *, example_verses=5)
¶
Export a word study to CSV + HTML.
Returns dict with keys 'html', 'csv_by_book', 'csv_morphology', 'csv_collocates'.
profiles
¶
Per-book language profile reports.
Generates standardized one-page statistical summaries for any Bible book:
- Word count, vocabulary richness (type-token ratio)
- Part-of-speech distribution vs. corpus average
- Verb stem breakdown (OT) or tense/voice distribution (NT)
- Top 20 most frequent lemmas (Strong's numbers)
- Hapax legomena count (words appearing only once in the book)
Usage¶
from bible_grammar.reporting.profiles import book_profile, print_profile, save_profile_report
In-memory dict of stats¶
profile = book_profile('Gen')
Print a formatted text summary to console¶
print_profile('Gen')
Save a markdown report¶
save_profile_report('Gen', 'reports/profiles/Gen_profile.md')
Batch: all OT books¶
from bible_grammar.core.reference import all_book_ids
for book_id in all_book_ids('OT'):
    save_profile_report(book_id)
batch_profiles(testament=None, book_ids=None, output_dir=None)
¶
Generate profile reports for multiple books.
Parameters¶
testament : 'OT', 'NT', or None for all books
book_ids : explicit list of book IDs (overrides testament)
output_dir : directory for reports (default: reports/profiles/)
Returns list of saved file paths.
book_profile(book_id)
¶
Compute a complete statistical profile for a single Bible book.
Returns a dict with keys
book_id, book_name, testament, canonical_order, total_words, unique_strongs, hapax_count, ttr, pos_distribution, verb_detail, top_lemmas, baseline_delta
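The vocabulary-richness keys in that dict (`total_words`, `unique_strongs`, `hapax_count`, `ttr`) are simple functions of the lemma counts. A minimal sketch, assuming one Strong's number per word and a type-token ratio defined as distinct types over total tokens; the function name `vocabulary_stats` and the sample list are hypothetical.

```python
from collections import Counter


def vocabulary_stats(strongs):
    """Compute size and richness measures from a list of Strong's numbers."""
    counts = Counter(strongs)
    total = len(strongs)
    return {
        "total_words": total,
        "unique_strongs": len(counts),
        "hapax_count": sum(1 for n in counts.values() if n == 1),
        "ttr": len(counts) / total if total else 0.0,
    }


# Hypothetical four-word book: H0559 occurs twice, the others once each.
stats = vocabulary_stats(["H0559", "H3068", "H0559", "H0430"])
```

Type-token ratio falls as text length grows, so comparing ttr between a short and a long book says more about length than about vocabulary.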
print_profile(book_id)
¶
Print a formatted profile to stdout.
theological_reports
¶
Pre-built theological study reports using the word-trajectory pipeline.
Generates curated cross-testament lexical studies for a standard set of theologically significant Hebrew/Greek terms.
Public API
──────────
run_theological_report(term_key, ...) → report dict
run_all_theological_reports(...) → list of paths
print_all_trajectories(...) → terminal survey
THEOLOGICAL_TRAJECTORIES → dict of term entries
print_all_trajectories(keys=None)
¶
Print a terminal survey of all (or selected) theological trajectories.
print_theological_summary()
¶
Print a compact cross-testament summary table for all terms.
run_all_theological_reports(*, output_dir='output/reports/theological', keys=None)
¶
Generate trajectory reports for all (or selected) theological terms.
Parameters¶
output_dir : directory for output
keys : optional subset of THEOLOGICAL_TRAJECTORIES keys
Returns list of Markdown file paths.
run_theological_report(term_key, *, output_dir='output/reports/theological')
¶
Generate a full trajectory report for one theological term.
Parameters¶
term_key : key from THEOLOGICAL_TRAJECTORIES (e.g. 'shalom', 'ruach')
output_dir : directory for Markdown + chart output
Returns the term entry dict with added 'report_path' and 'trajectory' keys.
theological_summary_table()
¶
Return a DataFrame summarising all theological term trajectories: term | strongs | ot_total | lxx_total | nt_total | continuity
stems
¶
hiphil
¶
Hiphil verb morphology analysis for Biblical Hebrew instruction.
The Hiphil (הִפְעִיל) is the causative-active stem of Biblical Hebrew. It typically:
• Causes an action: הֵבִיא (he brought) ← causative of בּוֹא (to come)
• Declares a state: הִצְדִּיק (declared righteous) ← declarative
• Produces an effect: הִמְטִיר (caused to rain) ← factitive / denominative
• Has unique roots that almost exclusively appear in the Hiphil
All functions use MACULA Hebrew WLC data (load_syntax_ot).
Public API
──────────
hiphil_data() → DataFrame (all Hiphil tokens)
hiphil_conjugation_profile(book=None) → DataFrame (type_ distribution)
hiphil_top_roots(n=30, book=None) → DataFrame (most frequent roots)
hiphil_root_conjugation(roots=None) → DataFrame (root × conjugation crosstab)
hiphil_book_distribution() → DataFrame (count + pct per book)
hiphil_stem_comparison(books=None) → DataFrame (all stems % by book)
hiphil_dominant_roots(min_pct=70) → DataFrame (roots >X% Hiphil)
hiphil_semantic_categories() → DataFrame (semantic function counts)
print_hiphil_overview() → None
print_hiphil_conjugation(book=None) → None
print_hiphil_top_roots(n=20, book=None) → None
print_hiphil_root_conjugation(roots=None) → None
print_hiphil_book_distribution() → None
print_hiphil_dominant_roots() → None
print_hiphil_semantic_categories() → None
hiphil_conjugation_chart(book=None) → Path | None
hiphil_book_chart() → Path | None
hiphil_stem_chart(books=None) → Path | None
hiphil_root_heatmap(top_n=15) → Path | None
hiphil_report(output_dir=None) → Path (full Markdown report)
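The "roots >X% Hiphil" filter listed for hiphil_dominant_roots(min_pct=70) can be illustrated as follows. This is a sketch under stated assumptions rather than the module's code: `dominant_roots` is a hypothetical helper, the (root, stem) pairs are invented, and the comparison is assumed to be strictly greater than min_pct.

```python
from collections import Counter, defaultdict


def dominant_roots(tokens, stem, min_pct=70):
    """Hypothetical helper: roots whose share of `stem` tokens exceeds min_pct."""
    per_root = defaultdict(Counter)
    for root, token_stem in tokens:
        per_root[root][token_stem] += 1

    rows = []
    for root, stems in per_root.items():
        total = sum(stems.values())
        pct = 100.0 * stems[stem] / total
        if pct > min_pct:  # assumed: strictly greater than the threshold
            rows.append({"root": root, "stem_count": stems[stem],
                         "total": total, "pct": round(pct, 1)})
    # Highest share first; ties broken alphabetically by root.
    return sorted(rows, key=lambda r: (-r["pct"], r["root"]))


# Made-up (root, stem) pairs, for illustration only.
toy_tokens = (
    [("נגד", "hiphil")] * 9 + [("נגד", "hophal")]
    + [("בוא", "qal")] * 8 + [("בוא", "hiphil")] * 2
)
result = dominant_roots(toy_tokens, "hiphil")
print(result)
```

The same helper works for any stem name, which mirrors how each stem module exposes its own *_dominant_roots() function.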
hiphil_object_verbs(book=None)
¶
Alias kept for backwards compatibility — use hiphil_top_roots() instead.
hiphil_report(output_dir=None)
¶
Generate a complete Hiphil morphology report (Markdown + PNG charts).
Saves:
output/reports/ot/verbs/hiphil_report.md
output/charts/ot/verbs/hiphil_*.png
Returns path to the Markdown report.
hithpael
¶
Hithpael verb morphology analysis for Biblical Hebrew instruction.
The Hithpael (הִתְפַּעֵל) is the reflexive-intensive stem of Biblical Hebrew. It typically:
• Expresses a reflexive of the Piel: הִתְקַדֵּשׁ (consecrated oneself) ← Piel קִדֵּשׁ
• Expresses a reciprocal action: הִתְרָאוּ (saw one another)
• Expresses an iterative/frequentative: הִתְהַלֵּךְ (walked about, walked continually)
• Expresses a tolerative/reflexive: הִתְמַכֵּר (sold oneself)
• Has a denominative sense for some roots: הִתְנַבֵּא (acted as a prophet)
All functions use MACULA Hebrew WLC data (load_syntax_ot).
Public API
──────────
hithpael_data() → DataFrame (all Hithpael tokens)
hithpael_conjugation_profile(book=None) → DataFrame (type_ distribution)
hithpael_top_roots(n=30, book=None) → DataFrame (most frequent roots)
hithpael_root_conjugation(roots=None) → DataFrame (root × conjugation crosstab)
hithpael_book_distribution() → DataFrame (count + pct per book)
hithpael_stem_comparison(books=None) → DataFrame (all stems % by book)
hithpael_dominant_roots(min_pct=70) → DataFrame (roots >X% Hithpael)
hithpael_semantic_categories() → DataFrame (semantic function counts)
print_hithpael_overview() → None
print_hithpael_conjugation(book=None) → None
print_hithpael_top_roots(n=20, book=None) → None
print_hithpael_root_conjugation(roots=None) → None
print_hithpael_book_distribution() → None
print_hithpael_dominant_roots() → None
print_hithpael_semantic_categories() → None
hithpael_conjugation_chart(book=None) → Path | None
hithpael_book_chart() → Path | None
hithpael_stem_chart(books=None) → Path | None
hithpael_root_heatmap(top_n=15) → Path | None
hithpael_report(output_dir=None) → Path (full Markdown report)
hophal
¶
Hophal verb morphology analysis for Biblical Hebrew instruction.
The Hophal (הָפְעַל) is the passive of the Hiphil stem. It typically:
• Expresses the passive of the Hiphil causative: הוּבָא (was brought) ← Hiphil הֵבִיא
• Expresses a causative-passive state: הוּמַת (was put to death) ← Hiphil הֵמִית
• Expresses a causative-passive result: הֻגַּד (was told / reported) ← Hiphil הִגִּיד
It is the least common of the seven major stems (~480 tokens), largely because the Niphal covers most simple passive needs and the Hophal is restricted to roots where the Hiphil causative is already established.
All functions use MACULA Hebrew WLC data (load_syntax_ot).
Public API
──────────
hophal_data() → DataFrame (all Hophal tokens)
hophal_conjugation_profile(book=None) → DataFrame (type_ distribution)
hophal_top_roots(n=30, book=None) → DataFrame (most frequent roots)
hophal_root_conjugation(roots=None) → DataFrame (root × conjugation crosstab)
hophal_book_distribution() → DataFrame (count + pct per book)
hophal_stem_comparison(books=None) → DataFrame (all stems % by book)
hophal_dominant_roots(min_pct=70) → DataFrame (roots >X% Hophal)
hophal_semantic_categories() → DataFrame (semantic function counts)
print_hophal_overview() → None
print_hophal_conjugation(book=None) → None
print_hophal_top_roots(n=20, book=None) → None
print_hophal_root_conjugation(roots=None) → None
print_hophal_book_distribution() → None
print_hophal_dominant_roots() → None
print_hophal_semantic_categories() → None
hophal_conjugation_chart(book=None) → Path | None
hophal_book_chart() → Path | None
hophal_stem_chart(books=None) → Path | None
hophal_root_heatmap(top_n=15) → Path | None
hophal_report(output_dir=None) → Path (full Markdown report)
niphal
¶
Niphal verb morphology analysis for Biblical Hebrew instruction.
The Niphal (נִפְעַל) is the reflexive-passive stem of Biblical Hebrew. It typically:
• Expresses the passive of the Qal: נִשְׁמַר (was kept) ← Qal שָׁמַר (kept)
• Expresses a reflexive action: נִלְחַם (fought) ← idea of fighting for oneself
• Expresses a reciprocal action: נוֹעֲדוּ (met each other) ← idea of mutual meeting
• Expresses a tolerative: נִמְכַּר (allowed himself to be sold)
• Has a middle / stative sense: נִכְלַם (felt ashamed)
All functions use MACULA Hebrew WLC data (load_syntax_ot).
Public API
──────────
niphal_data() → DataFrame (all Niphal tokens)
niphal_conjugation_profile(book=None) → DataFrame (type_ distribution)
niphal_top_roots(n=30, book=None) → DataFrame (most frequent roots)
niphal_root_conjugation(roots=None) → DataFrame (root × conjugation crosstab)
niphal_book_distribution() → DataFrame (count + pct per book)
niphal_stem_comparison(books=None) → DataFrame (all stems % by book)
niphal_dominant_roots(min_pct=70) → DataFrame (roots >X% Niphal)
niphal_semantic_categories() → DataFrame (semantic function counts)
print_niphal_overview() → None
print_niphal_conjugation(book=None) → None
print_niphal_top_roots(n=20, book=None) → None
print_niphal_root_conjugation(roots=None) → None
print_niphal_book_distribution() → None
print_niphal_dominant_roots() → None
print_niphal_semantic_categories() → None
niphal_conjugation_chart(book=None) → Path | None
niphal_book_chart() → Path | None
niphal_stem_chart(books=None) → Path | None
niphal_root_heatmap(top_n=15) → Path | None
niphal_report(output_dir=None) → Path (full Markdown report)
niphal_report(output_dir=None)
¶
Generate a complete Niphal morphology report (Markdown + PNG charts).
Returns path to the Markdown report.
piel
¶
Piel verb morphology analysis for Biblical Hebrew instruction.
The Piel (פִּעֵל) is the intensive-active stem of Biblical Hebrew. It typically:
• Expresses an intensive action: שִׁבֵּר (shattered) ← Qal שָׁבַר (broke)
• Expresses a factitive (makes state into action): קִדֵּשׁ (made holy) ← קָדוֹשׁ (holy)
• Expresses a declarative: צִדֵּק (declared righteous); compare Hiphil הִצְדִּיק
• Expresses a denominative (verb from noun): דִּבֵּר (spoke) ← דָּבָר (word)
• Has a simple active sense for some roots that rarely appear in Qal
All functions use MACULA Hebrew WLC data (load_syntax_ot).
Public API
──────────
piel_data() → DataFrame (all Piel tokens)
piel_conjugation_profile(book=None) → DataFrame (type_ distribution)
piel_top_roots(n=30, book=None) → DataFrame (most frequent roots)
piel_root_conjugation(roots=None) → DataFrame (root × conjugation crosstab)
piel_book_distribution() → DataFrame (count + pct per book)
piel_stem_comparison(books=None) → DataFrame (all stems % by book)
piel_dominant_roots(min_pct=70) → DataFrame (roots >X% Piel)
piel_semantic_categories() → DataFrame (semantic function counts)
print_piel_overview() → None
print_piel_conjugation(book=None) → None
print_piel_top_roots(n=20, book=None) → None
print_piel_root_conjugation(roots=None) → None
print_piel_book_distribution() → None
print_piel_dominant_roots() → None
print_piel_semantic_categories() → None
piel_conjugation_chart(book=None) → Path | None
piel_book_chart() → Path | None
piel_stem_chart(books=None) → Path | None
piel_root_heatmap(top_n=15) → Path | None
piel_report(output_dir=None) → Path (full Markdown report)
piel_report(output_dir=None)
¶
Generate a complete Piel morphology report (Markdown + PNG charts).
pual
¶
Pual verb morphology analysis for Biblical Hebrew instruction.
The Pual (פֻּעַל) is the passive of the Piel stem. It typically:
• Expresses the passive of the Piel intensive: שֻׁבַּר (was shattered) ← Piel שִׁבֵּר
• Expresses a passive factitive state: קֻדַּשׁ (was made holy / was consecrated)
• Expresses a passive declarative: צֻדַּק (was declared righteous)
It is far less common than the Piel (~450 tokens vs. ~6,500).
All functions use MACULA Hebrew WLC data (load_syntax_ot).
Public API
──────────
pual_data() → DataFrame (all Pual tokens)
pual_conjugation_profile(book=None) → DataFrame (type_ distribution)
pual_top_roots(n=30, book=None) → DataFrame (most frequent roots)
pual_root_conjugation(roots=None) → DataFrame (root × conjugation crosstab)
pual_book_distribution() → DataFrame (count + pct per book)
pual_stem_comparison(books=None) → DataFrame (all stems % by book)
pual_dominant_roots(min_pct=70) → DataFrame (roots >X% Pual)
pual_semantic_categories() → DataFrame (semantic function counts)
print_pual_overview() → None
print_pual_conjugation(book=None) → None
print_pual_top_roots(n=20, book=None) → None
print_pual_root_conjugation(roots=None) → None
print_pual_book_distribution() → None
print_pual_dominant_roots() → None
print_pual_semantic_categories() → None
pual_conjugation_chart(book=None) → Path | None
pual_book_chart() → Path | None
pual_stem_chart(books=None) → Path | None
pual_root_heatmap(top_n=15) → Path | None
pual_report(output_dir=None) → Path (full Markdown report)
qal
¶
Qal verb morphology analysis for Biblical Hebrew instruction.
The Qal (קַל) is the base stem of Biblical Hebrew — the simplest, most common, and most semantically transparent verb form. It accounts for roughly half of all OT verb tokens. Every other stem (Niphal, Piel, Pual, Hiphil, Hophal, Hithpael) is derived from it.
Semantic range of the Qal
• Simple action (active): כָּתַב (wrote), אָכַל (ate), הָלַךְ (went)
• Simple state (stative): יָדַע (knew), אָהַב (loved), כָּבֵד (was heavy)
• Fientive (dynamic) vs. stative verbs behave differently in pointing
All functions use MACULA Hebrew WLC data (load_syntax_ot).
Public API
──────────
qal_data() → DataFrame (all Qal tokens)
qal_conjugation_profile(book=None) → DataFrame (type_ distribution)
qal_top_roots(n=30, book=None) → DataFrame (most frequent roots)
qal_root_conjugation(roots=None) → DataFrame (root × conjugation crosstab)
qal_book_distribution() → DataFrame (count + pct per book)
qal_stem_comparison(books=None) → DataFrame (all stems % by book)
qal_dominant_roots(min_pct=70) → DataFrame (roots >X% Qal)
qal_semantic_categories() → DataFrame (semantic function counts)
print_qal_overview() → None
print_qal_conjugation(book=None) → None
print_qal_top_roots(n=20, book=None) → None
print_qal_root_conjugation(roots=None) → None
print_qal_book_distribution() → None
print_qal_dominant_roots() → None
print_qal_semantic_categories() → None
qal_conjugation_chart(book=None) → Path | None
qal_book_chart() → Path | None
qal_stem_chart(books=None) → Path | None
qal_root_heatmap(top_n=15) → Path | None
qal_report(output_dir=None) → Path (full Markdown report)
qal_report(output_dir=None)
¶
Generate a complete Qal morphology report (Markdown + PNG charts).
verbal_syntax
¶
Hebrew verbal syntax analysis for 2nd-year Biblical Hebrew study.
Focuses on the syntactic and discourse-level behaviour of the Hebrew verb system:
• verb_form_profile: distribution of all conjugation types by book / passage
• wayyiqtol_chains: find and describe consecutive wayyiqtol sequences
• infinitive_usage: inf construct vs. absolute with governing prepositions
• clause_type_profile: nominal vs. verbal clause ratio; question / negation / relative
• stem_distribution: Qal/Nif/Piel/Hif/Hitpael etc. by book
All functions use the MACULA Hebrew WLC data (syntax_ot), which provides word-level type_ (wayyiqtol, qatal, yiqtol, …) and role (v, s, o, adv, p) annotations. These are more reliable than the raw TAHOT morphology codes for syntactic work.
Sub-modules¶
verb_forms: verb_form_profile, wayyiqtol_chains, stem_distribution, aspect_comparison, GENRE_SETS
clause_types: clause_type_profile
infinitives: infinitive_usage
disjunctive: disjunctive_clauses, disjunctive_in_chains
conditionals: conditional_clauses, conditional_summary
relative_clauses: relative_clauses, relative_clause_summary
particles: discourse_particles, discourse_particle_summary
aspect_comparison(books, chapter=None)
¶
Build a side-by-side verb form profile for multiple books.
Returns a DataFrame indexed by verb form with one column per book (count, pct).
aspect_comparison_chart(books, chapter=None, *, output_path=None)
¶
Save a grouped bar chart comparing verb form percentages across books.
clause_type_profile(book)
¶
Compute clause-type statistics for a book.
Returns a DataFrame with columns: feature, count, per_100_verses. Features: verbal_clauses, nominal_clauses, negations, conditionals, relative_clauses, questions, total verses.
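The per_100_verses column is a simple rate normalisation, which makes books of different lengths comparable. A minimal sketch follows, assuming the rate is count divided by verse total times 100; `per_100_verses` here is a hypothetical standalone helper and the counts are invented.

```python
def per_100_verses(feature_counts, total_verses):
    """Hypothetical helper: normalise raw counts to a rate per 100 verses."""
    if total_verses <= 0:
        raise ValueError("total_verses must be positive")
    return [
        {"feature": name, "count": n,
         "per_100_verses": round(100.0 * n / total_verses, 1)}
        for name, n in feature_counts.items()
    ]


# Made-up counts for an 80-verse sample, for illustration only.
toy_counts = {"verbal_clauses": 120, "nominal_clauses": 30, "negations": 12}
rows = per_100_verses(toy_counts, total_verses=80)
for row in rows:
    print(row)
```

A rate above 100 simply means the feature occurs more than once per verse on average, as with verbal clauses in this toy sample.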
conditional_clauses(book, chapter=None)
¶
Find all conditional clauses in a book or chapter.
Each result carries the fields chapter, verse, particle, particle_label, protasis_verb_text, protasis_verb_form, protasis_verb_stem, condition_type, verse_text.
conditional_summary(book)
¶
Return a summary DataFrame of conditional clause type counts for a book. Columns: condition_type, count, pct.
discourse_particle_summary(book)
¶
Return a summary DataFrame of discourse particle function counts for a book. Columns: particle_label, discourse_function, count, pct_of_particle.
discourse_particles(book, chapter=None, *, particles=None)
¶
Tag all discourse particle tokens in a book or chapter.
Each tagged token carries the fields chapter, verse, particle_label, particle_text, discourse_function, following_text, verse_text.
disjunctive_clauses(book, chapter=None)
¶
Find all disjunctive (noun/subject-first) clauses in a book or chapter.
Each result carries the fields chapter, verse, opener_text, opener_class, opener_type, leading_conj, verb_form, discourse_function, full_text.
disjunctive_in_chains(book, chapter)
¶
Cross-reference wayyiqtol chains with disjunctive clauses in a chapter.
Returns a list of dicts describing each wayyiqtol chain, annotated with any disjunctive clauses that appear WITHIN or IMMEDIATELY AFTER the chain.
infinitive_usage(book)
¶
Analyse infinitive construct and absolute usage in a book.
Returns a dict with keys
inf_cst_total, inf_abs_total, inf_cst_by_prep, inf_cst_by_role, inf_abs_examples, inf_cst_examples
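How the totals and the by-preposition breakdown relate can be shown on toy data. The sketch below is an assumption-laden illustration, not the module's logic: `toy_infinitive_usage` is hypothetical, the token dicts use invented "form" and "prep" fields, and only three of the six documented keys are reproduced.

```python
from collections import Counter


def toy_infinitive_usage(tokens):
    """Hypothetical helper: count infinitives and group constructs by preposition."""
    cst = [t for t in tokens if t["form"] == "inf.cst"]
    absolute = [t for t in tokens if t["form"] == "inf.abs"]
    # Constructs without a governing preposition are grouped under "(none)".
    by_prep = Counter(t["prep"] or "(none)" for t in cst)
    return {
        "inf_cst_total": len(cst),
        "inf_abs_total": len(absolute),
        "inf_cst_by_prep": dict(by_prep.most_common()),
    }


# Made-up tokens, for illustration only.
toy_tokens = [
    {"form": "inf.cst", "prep": "ל"},
    {"form": "inf.cst", "prep": "ל"},
    {"form": "inf.cst", "prep": "ב"},
    {"form": "inf.cst", "prep": None},
    {"form": "inf.abs", "prep": None},
    {"form": "qatal", "prep": None},
]
usage = toy_infinitive_usage(toy_tokens)
print(usage)
```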
print_aspect_comparison(books, chapter=None, *, show_counts=False)
¶
Print a side-by-side verb form percentage comparison for multiple books.
print_clause_type_profile(book)
¶
Print a formatted clause-type profile for a book.
print_conditional_clauses(book, chapter=None, *, max_rows=40)
¶
Print a formatted table of conditional clauses in a book or chapter.
print_conditional_summary(book)
¶
Print a compact summary of conditional clause types for a book.
print_discourse_particles(book, chapter=None, *, particles=None, max_rows=50, omit_waw=True)
¶
Print a formatted report of discourse particles in a book or chapter.
print_disjunctive_clauses(book, chapter=None, *, max_rows=40)
¶
Print a formatted list of disjunctive clauses in a book or chapter.
print_disjunctive_in_chains(book, chapter)
¶
Print wayyiqtol chains annotated with interrupting disjunctive clauses.
print_infinitive_usage(book)
¶
Print a formatted infinitive usage analysis for a book.
print_particle_summary(book)
¶
Print a compact summary of all discourse particle functions for a book.
print_relative_clauses(book, chapter=None, *, max_rows=40)
¶
Print a formatted table of relative clauses in a book or chapter.
print_relative_summary(book)
¶
Print a compact summary of relative clause types for a book.
print_stem_distribution(book)
¶
Print a formatted stem distribution for a book.
print_verb_form_profile(book, chapter=None)
¶
Print a formatted verb form profile for a book or chapter.
print_wayyiqtol_chains(book, chapter)
¶
Print a formatted wayyiqtol chain analysis for a chapter.
relative_clause_summary(book)
¶
Return a cross-tabulation of inferred role × verb form for a book.
stem_chart(book, *, output_path=None)
¶
Horizontal bar chart of verb stem distribution. Returns Path or None.
stem_distribution(book)
¶
Count verb tokens by stem (binyan) for a book. Returns DataFrame: stem, count, pct.
verb_form_chart(book, chapter=None, *, output_path=None)
¶
Bar chart of verb form distribution. Returns Path or None.
verb_form_profile(book, chapter=None)
¶
Count occurrences of each Hebrew verb conjugation type in a book or chapter.
Returns a DataFrame with columns: form, count, pct. Rows are ordered by VERB_FORM_ORDER (wayyiqtol → inf.abs).
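The shape of this result (form, count, pct in a fixed order) can be sketched without the package. Everything named below is an assumption for illustration: `TOY_FORM_ORDER` is a guessed ordering that only shares its endpoints (wayyiqtol first, inf.abs last) with the documented VERB_FORM_ORDER, and `toy_verb_form_profile` is a hypothetical helper working on a plain list of form labels.

```python
from collections import Counter

# Assumed ordering for illustration; the package's VERB_FORM_ORDER may differ.
TOY_FORM_ORDER = ["wayyiqtol", "qatal", "yiqtol", "weqatal",
                  "imperative", "participle", "inf.cst", "inf.abs"]


def toy_verb_form_profile(forms):
    """Hypothetical helper: count verb forms and report them in a fixed order."""
    counts = Counter(forms)
    total = sum(counts.values())
    rows = []
    for form in TOY_FORM_ORDER:
        if counts[form]:  # skip forms that do not occur in the sample
            rows.append({"form": form, "count": counts[form],
                         "pct": round(100.0 * counts[form] / total, 1)})
    return rows


# Made-up form labels, for illustration only.
toy_forms = ["qatal", "wayyiqtol", "wayyiqtol", "yiqtol",
             "wayyiqtol", "inf.abs", "qatal", "wayyiqtol"]
profile = toy_verb_form_profile(toy_forms)
for row in profile:
    print(row)
```

Ordering by a fixed list rather than by frequency keeps profiles from different books row-aligned, which is what a side-by-side comparison needs.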
verbal_syntax_report(book, *, output_dir='output/reports/ot/verbs')
¶
Generate a Markdown report covering all five verbal syntax analyses for a book. Returns path to the saved file.
wayyiqtol_chains(book, chapter)
¶
Identify wayyiqtol chains in a chapter.
Returns a list of chain dicts, each with: start_verse, end_verse, length, verbs (list), break_type, break_form.
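Chain detection of this kind amounts to grouping consecutive wayyiqtol verbs and recording what ended each run. The following sketch is hypothetical throughout: `toy_wayyiqtol_chains`, the minimum length of two, and the break_type labels ("form_change", "end_of_input") are assumptions, since the source does not define those values.

```python
def toy_wayyiqtol_chains(verbs, min_length=2):
    """Hypothetical helper: group consecutive wayyiqtol verbs into chains.

    `verbs` is an ordered list of (verse, form, text) tuples for one chapter.
    """
    chains, run = [], []

    def close(break_type, break_form):
        # Keep the current run only if it is long enough to count as a chain.
        if len(run) >= min_length:
            chains.append({
                "start_verse": run[0][0],
                "end_verse": run[-1][0],
                "length": len(run),
                "verbs": [text for _, _, text in run],
                "break_type": break_type,  # assumed labels, see lead-in
                "break_form": break_form,
            })
        run.clear()

    for verse, form, text in verbs:
        if form == "wayyiqtol":
            run.append((verse, form, text))
        else:
            close("form_change", form)
    close("end_of_input", None)
    return chains


# Made-up verb sequence with transliterated placeholders, for illustration only.
toy_verbs = [
    (1, "wayyiqtol", "wayyomer"), (1, "wayyiqtol", "wayyar"),
    (2, "wayyiqtol", "wayyavdel"), (2, "qatal", "qara"),
    (3, "wayyiqtol", "wayhi"), (3, "yiqtol", "yihyeh"),
    (4, "wayyiqtol", "wayyaas"), (5, "wayyiqtol", "wayyiqra"),
]
chains = toy_wayyiqtol_chains(toy_verbs)
for chain in chains:
    print(chain)
```

The lone wayyiqtol in verse 3 is dropped because a single verb does not form a sequence under the assumed minimum length.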
clause_types
¶
Clause type profile and nominal vs. verbal clause analysis.
clause_type_profile(book)
¶
Compute clause-type statistics for a book.
Returns a DataFrame with columns: feature, count, per_100_verses. Features: verbal_clauses, nominal_clauses, negations, conditionals, relative_clauses, questions, total verses.
print_clause_type_profile(book)
¶
Print a formatted clause-type profile for a book.
conditionals
¶
Conditional clause (אִם / לוּ / לוּלֵא) analysis.
conditional_clauses(book, chapter=None)
¶
Find all conditional clauses in a book or chapter.
Each result carries the fields chapter, verse, particle, particle_label, protasis_verb_text, protasis_verb_form, protasis_verb_stem, condition_type, verse_text.
conditional_summary(book)
¶
Return a summary DataFrame of conditional clause type counts for a book. Columns: condition_type, count, pct.
print_conditional_clauses(book, chapter=None, *, max_rows=40)
¶
Print a formatted table of conditional clauses in a book or chapter.
print_conditional_summary(book)
¶
Print a compact summary of conditional clause types for a book.
disjunctive
¶
Disjunctive (noun/subject-first) clause analysis.
disjunctive_clauses(book, chapter=None)
¶
Find all disjunctive (noun/subject-first) clauses in a book or chapter.
Each result carries the fields chapter, verse, opener_text, opener_class, opener_type, leading_conj, verb_form, discourse_function, full_text.
disjunctive_in_chains(book, chapter)
¶
Cross-reference wayyiqtol chains with disjunctive clauses in a chapter.
Returns a list of dicts describing each wayyiqtol chain, annotated with any disjunctive clauses that appear WITHIN or IMMEDIATELY AFTER the chain.
print_disjunctive_clauses(book, chapter=None, *, max_rows=40)
¶
Print a formatted list of disjunctive clauses in a book or chapter.
print_disjunctive_in_chains(book, chapter)
¶
Print wayyiqtol chains annotated with interrupting disjunctive clauses.
infinitives
¶
Infinitive construct and absolute usage analysis.
particles
¶
Discourse particle tagging and analysis.
discourse_particle_summary(book)
¶
Return a summary DataFrame of discourse particle function counts for a book. Columns: particle_label, discourse_function, count, pct_of_particle.
discourse_particles(book, chapter=None, *, particles=None)
¶
Tag all discourse particle tokens in a book or chapter.
Each tagged token carries the fields chapter, verse, particle_label, particle_text, discourse_function, following_text, verse_text.
print_discourse_particles(book, chapter=None, *, particles=None, max_rows=50, omit_waw=True)
¶
Print a formatted report of discourse particles in a book or chapter.
print_particle_summary(book)
¶
Print a compact summary of all discourse particle functions for a book.
relative_clauses
¶
Relative clause (אֲשֶׁר / שֶׁ / דִּי) analysis.
print_relative_clauses(book, chapter=None, *, max_rows=40)
¶
Print a formatted table of relative clauses in a book or chapter.
print_relative_summary(book)
¶
Print a compact summary of relative clause types for a book.
relative_clause_summary(book)
¶
Return a cross-tabulation of inferred role × verb form for a book.
relative_clauses(book, chapter=None, *, markers=None)
¶
Find all relative clauses in a book or chapter.
Each result carries the fields chapter, verse, marker, antecedent_text, antecedent_class, inferred_role, rel_verb_form, rel_verb_text, verse_text.
verb_forms
¶
Verb form profile, wayyiqtol chains, stem distribution, and aspect comparison.
aspect_comparison(books, chapter=None)
¶
Build a side-by-side verb form profile for multiple books.
Returns a DataFrame indexed by verb form with one column per book (count, pct).
aspect_comparison_chart(books, chapter=None, *, output_path=None)
¶
Save a grouped bar chart comparing verb form percentages across books.
print_aspect_comparison(books, chapter=None, *, show_counts=False)
¶
Print a side-by-side verb form percentage comparison for multiple books.
print_stem_distribution(book)
¶
Print a formatted stem distribution for a book.
print_verb_form_profile(book, chapter=None)
¶
Print a formatted verb form profile for a book or chapter.
print_wayyiqtol_chains(book, chapter)
¶
Print a formatted wayyiqtol chain analysis for a chapter.
stem_chart(book, *, output_path=None)
¶
Horizontal bar chart of verb stem distribution. Returns Path or None.
stem_distribution(book)
¶
Count verb tokens by stem (binyan) for a book. Returns DataFrame: stem, count, pct.
verb_form_chart(book, chapter=None, *, output_path=None)
¶
Bar chart of verb form distribution. Returns Path or None.
verb_form_profile(book, chapter=None)
¶
Count occurrences of each Hebrew verb conjugation type in a book or chapter.
Returns a DataFrame with columns: form, count, pct. Rows are ordered by VERB_FORM_ORDER (wayyiqtol → inf.abs).
wayyiqtol_chains(book, chapter)
¶
Identify wayyiqtol chains in a chapter.
Returns a list of chain dicts, each with: start_verse, end_verse, length, verbs (list), break_type, break_form.