Digital Humanities

What happens when computers meet literary criticism? Digital Humanities is the field that asks that question. It uses data, algorithms, and digital tools to study literature and culture at scales no single reader could reach — while also asking what computers do to how we read, write, and understand the human self.

What Is Digital Humanities?

Start with a simple problem. The history of world literature involves tens of millions of texts written in hundreds of languages over thousands of years. No scholar can read all of it. Traditional literary criticism copes by focusing on a small set of texts judged worth reading: the canon. But that means literary history is actually the history of a tiny, carefully filtered selection.

Digital Humanities asks: what if we used computers to study the rest? What if, instead of reading a handful of novels intensively, we analysed thousands of novels computationally — finding patterns across the whole field that no single reader could see?

That is one side of Digital Humanities — the quantitative side. The other side asks a different question: what do computers do to literature itself? When novels are written for screens, when readers navigate hypertext rather than turning pages, when algorithms generate poetry — what happens to literary meaning?

Why it appears in two UGC NET units:

  • Research Methods unit: TEI encoding, corpus linguistics, digital archives, computational stylistics — these are methods that researchers use.
  • Theory unit: distant reading (Moretti), posthumanism (Hayles), hypertext and electronic literature — these are theoretical arguments about what literature is and how it works in the digital age.

Key Thinkers

📊

Franco Moretti

Distant reading & quantitative literary history

'Conjectures on World Literature' (2000); Graphs, Maps, Trees (2005); Distant Reading (2013)

Coined 'distant reading.' Proposed using statistical graphs, geographic maps, and evolutionary trees to study literary history at the scale of thousands of texts — not the individual masterwork.

💻

N. Katherine Hayles

Posthumanism, technotext & electronic literature

How We Became Posthuman (1999); Writing Machines (2002); Electronic Literature (2008)

Argued that digital media change how we read and write. She also argued they change how we understand human identity itself. Coined 'technotext' — works where the medium is part of the meaning, not just a container for it.

🖥️

Lev Manovich

Cultural analytics & the logic of new media

The Language of New Media (2001); Cultural Analytics (2020)

Identified five principles of new media. Founded 'cultural analytics' — the use of computational methods to study culture at scale. His Software Studies lab visualised millions of images and pages.

🔗

Ted Nelson

Hypertext theory

Literary Machines (1981); Computer Lib / Dream Machines (1974)

Coined 'hypertext' in 1963. Proposed Project Xanadu — a non-linear, two-way linked global information system. His ideas directly shaped Tim Berners-Lee's invention of the World Wide Web.

📈

Matthew Jockers

Macroanalysis & computational stylistics

Macroanalysis (2013); Text Analysis with R (2014)

Used machine learning to analyse roughly 3,500 nineteenth-century novels, tracking theme, style, and influence across a corpus too large for any single reader. A leading practitioner of computational stylistics at the scale of whole corpora.

⚙️

Stephen Ramsay

Algorithmic criticism

Reading Machines: Toward an Algorithmic Criticism (2011)

Argued that algorithms are hermeneutic tools — ways of 'deforming' texts to reveal patterns invisible to close reading. Proposed that building digital tools is itself a form of humanistic argument.

Key Concepts

Each concept starts with a simple everyday idea before the technical definition.

📊

Distant Reading

Franco Moretti — 'Conjectures on World Literature' (2000); Graphs, Maps, Trees (2005)

💬

Start Here — Simple Idea

Imagine trying to understand the history of the novel by reading one masterpiece at a time. You could read for your entire life and cover maybe a few hundred books. But the nineteenth century alone produced tens of thousands of novels in dozens of languages. You will never read them all. Franco Moretti asked: what if we stopped reading individual books and instead looked at patterns across thousands of them at once? That is distant reading — stepping back so far that the individual book disappears and the large pattern becomes visible.

📌

Definition

Distant reading is Franco Moretti's term for the analysis of literary history through large-scale computational data — graphs of genre trends over time, maps of where novels are set, evolutionary trees of narrative forms — rather than through close reading of individual texts. It treats literature as a system of thousands of texts and asks what patterns only become visible at that scale.

Explanation

Moretti's starting point is a paradox. Literary studies claims to study 'literature', but the canon of texts actually taught and analysed is a tiny fraction of all the literature ever produced. The novels scholars read and write about are perhaps 0.5% of all novels published. What about the other 99.5%? Distant reading proposes to study that larger field.

Three main methods in Moretti's work:

  1. Graphs: plotting the rise and fall of novel genres over time reveals macro-historical shifts. Why did the epistolary novel fade by the 1790s? Why did detective fiction flourish in the 1890s?
  2. Maps: plotting where novels are set, where characters travel, and where literature is published reveals the geography of literary culture: which cities appear, which regions are absent, and what the spatial imagination of a period looks like.
  3. Trees: borrowing from evolutionary biology, Moretti maps how literary forms branch, compete, and die. Why do some narrative devices spread across genres while others disappear?

The key controversy: Moretti does not actually read the thousands of novels himself. He works with summaries, databases, and research assistants. His critics, especially scholars committed to reading texts slowly and carefully, word by word, argue that this produces statistical noise, not literary insight. His defenders argue that it reveals genuine structures of literary history that close reading simply cannot see.

For UGC NET: Moretti is the exam's central Digital Humanities figure. Know 'distant reading,' Graphs, Maps, Trees (2005), and the contrast with close reading.

💡 Indian & Literary Examples

Moretti's most famous finding from distant reading: he analysed the titles of 7,000 British novels published between 1740 and 1850 and found that titles shortened dramatically over this period, from long descriptive titles (The Life and Strange Surprising Adventures of Robinson Crusoe...) to single-word or short titles (Emma, Waverley, Ivanhoe). This shift happened not because authors decided to change style but because of market pressures from circulating libraries. No scholar reading individual novels one by one would have noticed this pattern; it only emerges from the dataset.

Indian application: A distant reading of Indian English novels published between 1947 and 2000 could map how certain themes (Partition, urban modernity, diaspora) rose and fell in frequency across decades, revealing the large-scale patterns of postcolonial literary history that are invisible when you study Rushdie or Anita Desai one at a time.
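
The title-length finding can be illustrated with a few lines of Python. This is only a minimal sketch of the counting step, not Moretti's actual procedure: the five sample titles, the approximate years, and the function name `mean_title_length_by_decade` are chosen here for illustration, and a real study would load thousands of records from a bibliography.

```python
from collections import defaultdict
from statistics import mean

# Illustrative sample only: a handful of well-known titles with approximate
# publication years, NOT Moretti's dataset of 7,000 titles.
SAMPLE_TITLES = [
    (1740, "Pamela; or, Virtue Rewarded"),
    (1749, "The History of Tom Jones, a Foundling"),
    (1814, "Waverley"),
    (1815, "Emma"),
    (1819, "Ivanhoe"),
]


def mean_title_length_by_decade(records):
    """Group (year, title) pairs by decade and average the word count of titles."""
    lengths_by_decade = defaultdict(list)
    for year, title in records:
        decade = (year // 10) * 10
        lengths_by_decade[decade].append(len(title.split()))
    return {decade: mean(lengths) for decade, lengths in sorted(lengths_by_decade.items())}


if __name__ == "__main__":
    for decade, average in mean_title_length_by_decade(SAMPLE_TITLES).items():
        print(f"{decade}s: {average} words per title on average")
```

Even in this toy sample the 1740s titles average several words while the 1810s titles are single words. The point of distant reading is that the same simple count, run over a whole bibliography, shows a trend that no single title reveals.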

📝

Text Encoding Initiative (TEI)

Founded 1987 — international standard for digital text encoding in XML

💬

Start Here — Simple Idea

Imagine you are reading a Shakespeare play. The words are one thing — but there is so much more: which character is speaking, which lines are stage directions, which words appear differently in the First Folio versus the Second Quarto, which passages scholars have annotated. If you want to put this play on a computer so researchers can search it, compare editions, and add annotations — how do you record all of that information? You need a system that goes beyond just typing the words. TEI is that system. It is a standard way of tagging texts so computers can understand not just what the words say but what they mean structurally.

📌

Definition

The Text Encoding Initiative (TEI) is an international, community-maintained standard for encoding literary, linguistic, and historical texts in XML (eXtensible Markup Language). TEI allows scholars to represent not just the text itself but its structure, variants, annotations, and editorial decisions in a machine-readable, interoperable format suitable for digital archives, searchable databases, and computational analysis.

Explanation

XML (eXtensible Markup Language) is a way of tagging information so that a computer can understand its structure. HTML, the language of web pages, is a familiar example of the same tagging idea. TEI adapts XML specifically for humanities texts.

A TEI-encoded text can record:

  • The text itself, word by word
  • Who speaks each line (for plays and dialogues)
  • Structural divisions: chapters, stanzas, acts, scenes
  • Variant readings across different editions or manuscripts
  • Scholarly annotations and editorial notes
  • Named entities: people, places, dates mentioned in the text
  • Corrections, deletions, and additions in manuscripts

Why does this matter for research?

  1. Searchability: a TEI-encoded corpus of Victorian novels can be searched across thousands of texts at once for every occurrence of a specific word or of any feature that has been tagged, such as a speaker, a place name, or a stage direction.
  2. Interoperability: TEI files from different projects can be combined and analysed together because they follow the same standard.
  3. Preservation: TEI encoding separates the text from any specific software. A TEI file from 1990 is still readable today, while a file created in a proprietary word processor in 1990 may be inaccessible.
  4. Scholarly transparency: TEI encoding makes editorial decisions explicit, because each choice is documented in the markup.

Key TEI projects: the Folger Shakespeare Library's digital editions, the Women Writers Project (encoding women's writing in English from roughly 1526 to 1850), and the Walt Whitman Archive.
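
To make the idea of structural tagging concrete, here is a minimal sketch in Python. The XML fragment is a simplified illustration written for this guide, not an excerpt from any published edition (a complete TEI document would also need a header), and the function names are hypothetical. The elements used (`sp`, `speaker`, `l`, `stage`) and the namespace are standard TEI.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; every TEI element lives in it.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Simplified, illustrative fragment (no teiHeader), based on the opening of Hamlet.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="scene" n="1">
        <stage>Enter Barnardo and Francisco, two sentinels.</stage>
        <sp who="#Barnardo">
          <speaker>Barnardo</speaker>
          <l>Who's there?</l>
        </sp>
        <sp who="#Francisco">
          <speaker>Francisco</speaker>
          <l>Nay, answer me. Stand and unfold yourself.</l>
        </sp>
      </div>
    </body>
  </text>
</TEI>"""


def speeches(tei_xml):
    """Return a list of (speaker, [lines]) pairs, one per <sp> element."""
    root = ET.fromstring(tei_xml)
    result = []
    for sp in root.iterfind(".//tei:sp", TEI_NS):
        speaker = sp.findtext("tei:speaker", default="", namespaces=TEI_NS)
        lines = [line.text or "" for line in sp.findall("tei:l", TEI_NS)]
        result.append((speaker, lines))
    return result


def stage_directions(tei_xml):
    """Return the text of every <stage> element."""
    root = ET.fromstring(tei_xml)
    return [stage.text or "" for stage in root.iterfind(".//tei:stage", TEI_NS)]


if __name__ == "__main__":
    for speaker, lines in speeches(SAMPLE_TEI):
        print(speaker, "says:", " / ".join(lines))
    print("Stage directions:", stage_directions(SAMPLE_TEI))
```

Because the speaker and the stage direction are tagged rather than merely typed, a program can separate "who speaks" from "what is said" without guessing from layout. That is the practical meaning of machine-readable structure.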

💡 Indian & Literary Examples

Example of TEI in practice: The Bichitra project at Jadavpur University is a digital variorum edition of Tagore's complete works, encoding his Bengali manuscripts, published editions, and English translations in TEI. A researcher can click on any word in a Tagore poem and see how it appeared in different editions, what manuscript variants exist, and what the English translation says. This would be impossible without TEI encoding.

For UGC NET Research Methods: TEI is the most exam-critical technical concept in Digital Humanities. Know: XML, TEI, digital edition, variorum, interoperability.

🔗

Hypertext & Electronic Literature

Ted Nelson (coined 'hypertext,' 1963); N. Katherine Hayles (Electronic Literature, 2008)

💬

Start Here — Simple Idea

Every book you have ever read goes in one direction: page 1, page 2, page 3, to the end. The author controls the path. But what if a story had forks — choices at every paragraph that take you to a completely different next paragraph? What if the reader could determine the story's path? What if the story linked outward to other texts, images, and sounds? That is hypertext: non-linear text where the reader navigates through links rather than following a single fixed path.

📌

Definition

Hypertext is non-linear text with embedded links (hyperlinks) that allow readers to navigate between different sections, texts, or media in a non-sequential order determined by the reader's choices. Electronic literature is literature created specifically for and inseparable from digital environments — hypertext fiction, interactive narratives, code poetry, and other forms where the computational medium is integral to the work's meaning.

Explanation

Ted Nelson coined 'hypertext' in 1963, years before the internet existed. His vision, Project Xanadu, was a global information network where any text could link to any other text. The links would work in both directions. They would be preserved forever. No single organisation would control the network. The World Wide Web, invented by Tim Berners-Lee in 1989, was a simpler system with one-way links that realised only part of Nelson's vision.

For literary theory, hypertext matters because it changes the fundamental relationship between author, text, and reader:

  • In print, the author controls the sequence: the reader follows the author's path.
  • In hypertext, the author creates a network of possibilities: the reader chooses the path.
  • This connects to Roland Barthes's famous distinction. Barthes divided texts into two types: 'readerly' (the reader passively follows a fixed path) and 'writerly' (the reader actively produces meaning). Hypertext makes the writerly text literal, because the reader's choices determine what text they read.
  • It also connects to the idea of intertextuality. Intertextuality means that every text refers to, echoes, and links to other texts. Hypertext makes those invisible links visible and clickable.

Key works of electronic literature:

  • Michael Joyce, afternoon, a story (1987): often described as the first hypertext novel; 539 lexias (text nodes) connected by 950 links; no fixed beginning or end.
  • Shelley Jackson, Patchwork Girl (1995): a hypertext retelling of Frankenstein; the body of the monster as the body of the text (fragmented, assembled, linked).
  • Mark Z. Danielewski, House of Leaves (2000): a print novel that simulates hypertext through footnotes, multiple parallel texts, and non-linear navigation; the most influential 'print hypertext'.

N. Katherine Hayles's Electronic Literature (2008) maps the field of born-digital literature: works that cannot be meaningfully printed or read outside their digital medium.
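
The lexia-and-link structure described above can be modelled as a small graph. The sketch below is an assumption-laden illustration: the four nodes and their texts are invented for this guide (they are not quoted from afternoon, a story or from Storyspace), and `follow` and `link_count` are hypothetical helper names.

```python
# Invented lexias (text nodes). Each node has some text and a set of named
# links leading to other nodes. Nothing here is quoted from a real work.
LEXIAS = {
    "begin": {
        "text": "A man drives to work.",
        "links": {"remember": "morning", "forget": "office"},
    },
    "morning": {
        "text": "He passes a wreck on the road.",
        "links": {"stop": "wreck", "drive on": "office"},
    },
    "wreck": {
        "text": "He cannot tell whose car it is.",
        "links": {"remember": "morning"},
    },
    "office": {
        "text": "He makes a phone call.",
        "links": {},
    },
}


def follow(lexias, start, choices):
    """Return the sequence of node names one reader visits, given their choices."""
    path = [start]
    for choice in choices:
        links = lexias[path[-1]]["links"]
        if choice not in links:
            break  # no such link from the current node, so this reading stops
        path.append(links[choice])
    return path


def link_count(lexias):
    """Total number of links in the network."""
    return sum(len(node["links"]) for node in lexias.values())


if __name__ == "__main__":
    print(follow(LEXIAS, "begin", ["remember", "stop", "remember"]))
    print(follow(LEXIAS, "begin", ["forget"]))
    print("links:", link_count(LEXIAS))
```

Two readers starting at the same node produce two different texts, and the loop between "morning" and "wreck" means a reading need never reach an ending. This is the sense in which the reader, not the author, determines the sequence.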

💡 Indian & Literary Examples

Indian example: The digital poetry of Amarendra Chakrabarti and others working in Indian languages has begun to use hypertext and multimedia to create works that link Bengali or Hindi poetry to sound, image, and translation simultaneously, forms impossible in print. The e-lit movement in India is small but growing.

Literary connection: Salman Rushdie's Midnight's Children (1981) is often cited as a print text with hypertext qualities. Its digressions, nested stories, and multiple narrative threads simulate the non-linearity that electronic hypertext literalises. The novel reads like a text that wants to link outward in all directions at once.

📈

Corpus Linguistics & Computational Stylistics

Matthew Jockers — Macroanalysis (2013); Ted Underwood — Distant Horizons (2019)

💬

Start Here — Simple Idea

Suppose you want to know whether Jane Austen's writing style changed between her early and late novels. You could read all six novels carefully and note your impressions — but your impressions would be subjective. Or you could use a computer to count exactly how often she uses certain words, sentence structures, and grammatical patterns in each novel — and then compare the numbers. That is computational stylistics: using counting and statistics to study style with a precision that human reading alone cannot achieve.

📌

Definition

Corpus linguistics is the study of language through large electronically searchable text collections (corpora). Computational stylistics applies statistical and computational methods to the analysis of literary style — measuring word frequency, sentence length, vocabulary richness, and syntactic patterns across texts to identify stylistic signatures, track change over time, and test claims about authorship or influence.

Explanation

The key insight of computational stylistics: style is not just what you notice when you read. It is also produced by patterns of word choice so subtle and frequent that no human reader consciously registers them. The most stylistically distinctive words in an author's work are often not unusual vocabulary but the most common words ('the,' 'of,' 'and,' 'but') used at distinctive frequencies.

Major applications:

  1. Authorship attribution: can a computer determine who wrote an anonymous text by analysing its stylistic patterns? The most famous case: in 2013, stylometric analyses by Patrick Juola and, independently, Peter Millican supported the identification of J.K. Rowling as the author of The Cuckoo's Calling, published under the pseudonym Robert Galbraith. The technique compares word frequency distributions across known and unknown texts.
  2. Tracking literary change: Ted Underwood's Distant Horizons (2019) uses predictive models trained on thousands of texts to show how the concept of 'literary prestige' changed over 200 years: which features made a text likely to be reviewed, anthologised, or remembered.
  3. Genre analysis: Matthew Jockers analysed 3,500 nineteenth-century novels and found that themes cluster into recognisable patterns (sentiment, character types, and plot structures) that appear across the corpus and can be tracked statistically.
  4. Stylometric dating: computational analysis of manuscript texts can help date anonymous works by comparing their stylistic features to dated texts from the same period.

For UGC NET Research Methods: corpus linguistics is the most directly applicable Digital Humanities method to language and literary research. Know: corpus, concordance, word frequency, stylometry, authorship attribution.
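
The comparison of function-word frequencies can be sketched in a few lines. This is a deliberately simplified illustration, not the method used in any of the studies named above: the word list, the toy texts, and the function names are all assumptions made here. Established stylometric measures such as Burrows's Delta work on a much longer list of frequent words and standardise each frequency across a reference corpus before comparing.

```python
import re
from collections import Counter

# A tiny, illustrative list of function words. Real studies use far more.
FUNCTION_WORDS = ["the", "of", "and", "but", "to", "a"]


def profile(text, words=FUNCTION_WORDS):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in words]


def distance(profile_a, profile_b):
    """Mean absolute difference between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b)) / len(profile_a)


def closest_author(unknown_text, known_texts):
    """Name of the known author whose profile is nearest to the unknown text."""
    target = profile(unknown_text)
    return min(
        known_texts,
        key=lambda name: distance(target, profile(known_texts[name])),
    )


if __name__ == "__main__":
    # Toy 'authors': A favours 'the' and 'and', B favours 'a' and 'but'.
    known = {
        "A": "the cat and the dog and the bird",
        "B": "a cat but a dog but a bird",
    }
    print(closest_author("the fox and the hen", known))
```

With texts this short the result means nothing statistically. The sketch only shows the shape of the procedure: turn each text into a vector of common-word frequencies, then ask which known vector lies nearest.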

💡 Indian & Literary Examples

Indian application: computational stylistic analysis has been applied to disputed texts in Sanskrit and Tamil literature, attempting to determine whether works attributed to specific ancient authors share the stylistic fingerprints of those authors' securely attributed texts. The Mahabharata's accretion layers (different sections added at different historical periods) have been studied through corpus methods to identify linguistic markers of different compositional dates.

For contemporary Indian English fiction: a corpus of post-1947 Indian English novels could be analysed to track how the vocabulary of 'India' (which words, which concepts, which names) changed across decades, showing how the literary imagination of the nation shifted from independence to globalisation.
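
Tracking a word across decades reduces to a relative-frequency count per period. The following is a minimal sketch under stated assumptions: the three "texts" are invented one-line placeholders rather than real novels, and `relative_frequency_by_decade` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Invented placeholder corpus: (publication year, text). A real corpus would
# contain full novels, not single phrases.
TOY_CORPUS = [
    (1951, "the village and the river"),
    (1958, "village life"),
    (1994, "the city and the airport city"),
]


def relative_frequency_by_decade(corpus, term):
    """For each decade, occurrences of `term` divided by total word count."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in corpus:
        decade = (year // 10) * 10
        tokens = re.findall(r"[a-z]+", text.lower())
        totals[decade] += len(tokens)
        hits[decade] += tokens.count(term.lower())
    return {decade: hits[decade] / totals[decade] for decade in sorted(totals) if totals[decade]}


if __name__ == "__main__":
    for term in ("village", "city"):
        print(term, relative_frequency_by_decade(TOY_CORPUS, term))
```

Dividing by the total word count matters: decades with more published text would otherwise appear to use every word more often.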

🗄️

Digital Archives & Preservation

Field-wide — Internet Archive, Library of Congress, Project Gutenberg

💬

Start Here — Simple Idea

A book printed 500 years ago can still be read today if it has been kept in reasonable conditions. A computer file created 30 years ago may be completely unreadable today. The software that created it no longer runs. The hardware it ran on no longer exists. The file format is obsolete. Books are surprisingly durable. Digital files are surprisingly fragile. Digital preservation is the set of practices that fight this fragility — making sure that digital texts, archives, and born-digital literature remain accessible as technology constantly changes.

📌

Definition

Digital archives are organised, searchable collections of digitised or born-digital materials — manuscripts, newspapers, literary texts, photographs, audio recordings — made accessible online for research. Digital preservation is the active management of these materials over time: migrating files to new formats, maintaining hardware and software, documenting metadata, and ensuring long-term accessibility as technology changes.

Explanation

Digital preservation matters for three distinct types of material:

  1. Digitised heritage materials: manuscripts, rare books, newspapers, and historical documents scanned and encoded. Examples: the British Library's Digitised Manuscripts collection, the Wellcome Collection, the National Archives of India's digitisation projects. Without preservation, these files can become inaccessible through format obsolescence even if the physical originals survive.
  2. Born-digital literary works: hypertext fiction, digital poetry, and interactive narratives created for specific software or hardware platforms. The problem: a hypertext novel created for Storyspace software on a 1990s Macintosh may be completely unreadable on a modern computer. The Electronic Literature Organization's Preservation, Archiving, and Dissemination (PAD) project documents this crisis.
  3. Born-digital scholarly outputs: digital editions, databases, and research tools built on websites that become inaccessible when funding ends, servers shut down, or institutional priorities change. The resulting decay of web links is called 'link rot'. Estimates vary, but one commonly quoted figure puts the half-life of a web URL at about 5 years.

Key preservation strategies:

  • Format migration: regularly converting files to current standard formats
  • Emulation: creating software that mimics old hardware/software environments on new machines
  • Redundancy: keeping multiple copies in geographically distributed locations
  • Metadata: documenting what files are, how they were created, and how to access them

Key institutions: Internet Archive (archive.org, which preserves the web itself through the Wayback Machine), Library of Congress Digital Preservation, Digital Preservation Coalition (UK).
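
Redundancy and metadata work together in practice: a copy is only useful if you can show it is still identical to the original. One common way to do this, assumed here for illustration rather than taken from any of the institutions named above, is to store a checksum in the metadata record and recompute it for each copy. The record fields and function names below are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_record(path, description):
    """Build a minimal metadata record (hypothetical fields) for one file."""
    file_path = Path(path)
    return {
        "filename": file_path.name,
        "bytes": file_path.stat().st_size,
        "sha256": sha256_of(file_path),
        "description": description,
    }


def copy_is_intact(copy_path, record):
    """True if the copy's checksum matches the one stored in the record."""
    return sha256_of(copy_path) == record["sha256"]
```

A typical use would be to call `make_record` once when a file enters the archive, then run `copy_is_intact` periodically on every stored copy. A mismatch signals silent corruption, so the damaged copy can be replaced from a healthy one before all copies degrade.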

💡 Indian & Literary Examples

Indian example: The Endangered Archives Programme (British Library) has funded digitisation projects for endangered South Asian materials — palm-leaf manuscripts in Kerala, Bengali jatra scripts, oral literature recordings from tribal communities. Without active digital preservation, these materials risk double jeopardy: the physical originals deteriorate while the digital surrogates become inaccessible through format obsolescence. The Sahitya Akademi's digital library initiative — digitising award-winning Indian literature across 24 languages — faces the same preservation challenges. Creating a digital archive is only the first step; maintaining it over decades requires ongoing investment in preservation infrastructure.

📉

Cultural Analytics & Data Visualisation

Lev Manovich — Cultural Analytics (2020); Franco Moretti — Graphs, Maps, Trees (2005)

💬

Start Here — Simple Idea

You have data on 10,000 novels: publication date, genre, author's gender, setting, length, critical reception. How do you understand what all that data means? You could read a table of numbers. But the human brain is terrible at finding patterns in numbers. You could make a graph instead. Suddenly a pattern that was invisible in the numbers leaps out visually. Cultural analytics is the use of data and visualisation to study culture at scale. It turns humanities research into something you can see.

📌

Definition

Cultural analytics is the use of computational methods and data visualisation to study cultural objects — literature, film, art, music — at massive scale. It combines the interpretive questions of the humanities with the quantitative methods of data science. Data visualisation is the representation of cultural data through graphs, maps, network diagrams, and other visual forms that make large-scale patterns perceptible.

Explanation

Lev Manovich founded the Software Studies Initiative (later the Cultural Analytics Lab) at UC San Diego, where his team used computational methods to analyse millions of images, film frames, and book pages simultaneously. His approach treats culture as data: vast datasets that can be explored, visualised, and interpreted using the same tools that scientists use for genomic data or climate records.

Key visualisation methods in Digital Humanities:

  1. Timeline graphs: plotting the frequency of words, themes, or genres across time, revealing cultural trends invisible at the level of individual texts. Google Books Ngram Viewer allows anyone to do this with millions of digitised books.
  2. Geographic maps: plotting where novels are set, where authors were born, and where books were published, revealing the spatial imagination of literary periods and the uneven geography of literary production.
  3. Network diagrams: mapping relationships between characters in a novel (who speaks to whom, whose names appear near whose), between authors (who cites whom, who reviewed whom), or between texts (which books are mentioned in which other books).
  4. Image analysis: Manovich's team visualised the covers of Time magazine from 1923 to 2009, revealing how cover design changed over time; analysed 1,000 paintings per artist to show stylistic evolution; compared the page layouts of millions of books.

The key methodological question: do these visualisations reveal genuine patterns in cultural history, or do they make arbitrary patterns look meaningful through the authority of visual form? This is the central debate in cultural analytics, and it is a UGC NET Research Methods question.
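
Before a network diagram can be drawn, someone has to decide what counts as a connection. The sketch below uses one simple and debatable rule, chosen here as an assumption: two characters are linked whenever their names appear in the same passage. The passages are invented sentences using character names from Austen's Emma, not quotations, and `cooccurrence_edges` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations

# Invented passages for illustration (not quotations from the novel).
PASSAGES = [
    "Emma spoke to Harriet.",
    "Harriet and Emma walked with Knightley.",
    "Knightley left.",
]
CHARACTERS = ["Emma", "Harriet", "Knightley"]


def cooccurrence_edges(passages, characters):
    """Count how often each pair of character names shares a passage.

    Uses plain substring matching, so a name contained inside a longer word
    would also match. A real project would tokenise the text first.
    """
    edges = Counter()
    for passage in passages:
        present = sorted(name for name in characters if name in passage)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges


if __name__ == "__main__":
    for (first, second), weight in cooccurrence_edges(PASSAGES, CHARACTERS).items():
        print(f"{first} and {second}: {weight}")
```

The resulting edge weights are what a visualisation tool would draw as lines of varying thickness. Changing the rule (same sentence, same chapter, direct speech only) changes the picture, which is one concrete form of the methodological question raised above.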

💡 Indian & Literary Examples

The most useful Indian application: mapping the geography of Indian English fiction. A data visualisation project could plot where each major Indian English novel is set (Bombay, Bengal, Kerala, the diaspora), who the author is, and when it was published — revealing how the imaginary geography of Indian English literature has shifted from the independence-era focus on the rural and the village to the contemporary focus on the urban, the global city, and the diaspora. This kind of visualisation would make visible in a single image what would take hundreds of pages of descriptive criticism to establish — which is precisely cultural analytics' claim about its value.

Key Texts in Digital Humanities

Franco Moretti — Graphs, Maps, Trees (2005)

The foundational text of distant reading and cultural analytics

Graphs, Maps, Trees is not a work of literary criticism in the traditional sense: it is a manifesto and a demonstration. Moretti takes three abstract models (graphs from quantitative history, maps from geography, trees from evolutionary theory) and applies them to literary history.

  • Graphs: Moretti plots the rise of the novel in several countries, including Britain, Japan, and Nigeria, and the rise and fall of British novelistic genres between 1740 and 1900. The genre graphs suggest that genres tend to last about 25–30 years, roughly a generation, and that they appear and disappear in clusters. No reading of individual novels would reveal this pattern; it is invisible at the level of the single text.
  • Maps: Moretti turns narratives into maps to see how fictional space is organised. In Graphs, Maps, Trees the main case is the village story, such as Mary Mitford's Our Village, where the mapped episodes form rings around the village. His earlier Atlas of the European Novel (1998) maps where in Paris and London novels are set and how characters move through the city. In both books the maps show settings clustering in some places and leaving others almost blank: literature's spatial imagination excludes whole regions.
  • Trees: borrowing from evolutionary theory, Moretti maps how narrative devices (clues in detective fiction, free indirect style) branch and evolve across a body of texts. Some devices spread widely; others appear and die out. The evolutionary tree shows which formal features 'survive' and which go extinct.

The critical controversy: Fredric Jameson praised Moretti's method but noted that the graphs say little about individual texts or about the meaning of cultural forms. Contributors to Reading Graphs, Maps, Trees: Responses to Franco Moretti (edited by Jonathan Goodwin and John Holbo) argued, among other things, that Moretti's statistical methods were methodologically naive. The debate exposed the fault line between qualitative and quantitative approaches to literary history.

For the exam: Moretti, Graphs, Maps, Trees (2005), distant reading, quantitative literary history.

N. Katherine Hayles — How We Became Posthuman (1999)

Posthumanism, information, and the body in Digital Humanities theory

How We Became Posthuman is Hayles's account of how 'information' came to be understood as separable from its material embodiment, and why this is dangerous. The book traces three interlocking stories: how information 'lost its body', how the cyborg was constructed as a technological artefact and cultural icon, and how the human became the posthuman. To tell them, it moves between the history of cybernetics (the science of control and communication in animals and machines) and readings of science fiction.

Hayles's central argument: in mid-twentieth-century cybernetics and information theory (Norbert Wiener, Claude Shannon), information was reconceptualised as a pattern separable from the medium that carries it. Information could, in principle, be transferred from one medium (a human brain) to another (a computer). This gave rise to the fantasy of the 'posthuman': the idea that human consciousness could be uploaded into a digital substrate, achieving immortality by escaping the body.

Hayles argues this is wrong and dangerous. Information is never separable from its material embodiment. The 'information' in a human brain cannot be separated from the biological substrate that produces it. Consciousness is not a pattern floating free of matter; it is constituted by specific material processes.

For Digital Humanities, this argument has a direct implication: digital texts are not simply the same as print texts in a different container. The medium changes the meaning. A novel read on a screen is not the same as the same novel read on paper, because the embodied experience of reading, the interface, and the materiality of the medium all shape what the text means and how it is understood. This is the theoretical foundation for Hayles's concept of 'technotext': literary works in which the physical form (the screen, the interface, the code) is part of the text's meaning, not just its delivery mechanism.

For the exam: Hayles, posthumanism, technotext, How We Became Posthuman (1999), Writing Machines (2002).

Michael Joyce — afternoon, a story (1987)

The first published hypertext novel — non-linearity, reader agency, and the death of closure

afternoon, a story is among the most important and most studied works of electronic literature. Released on floppy disk in 1987 and later distributed by Eastgate Systems, it consists of 539 'lexias' (text nodes) connected by 950 links. There is no fixed beginning and no fixed ending. The reader navigates by clicking on words or pressing Enter, and different choices lead to different textual paths through the story.

The story concerns Peter, a man who may or may not have witnessed a car accident involving his ex-wife and son earlier that morning. But the narrative resists resolution: the same scenes appear in different orders, different narrative voices contradict each other, and the ending (if any) varies depending on which path the reader takes.

The literary theoretical connections are explicit: Joyce was aware of Barthes's 'writerly text,' Derrida's critique of linear narrative, and Bakhtin's polyphony. afternoon literalises all three. It is a text that actively requires the reader to produce meaning by choosing paths; it refuses the linear hierarchy of the printed book; it gives different 'voices' (narrative perspectives) equal weight without privileging any single authoritative view.

The problem: afternoon is difficult to run on current computers because Storyspace, the software it runs on, was designed for early Macintosh systems. This makes it a paradigm case for digital preservation: one of the most important works of electronic literature is at risk of becoming permanently inaccessible.

For the exam: afternoon, a story (1987); Michael Joyce; hypertext fiction; Eastgate Systems; Storyspace; electronic literature; digital preservation.

What DH Offers

  • Reveals patterns across thousands of texts that no single reader could discover: the macro-history of literature
  • Makes the canon's exclusions visible by showing what was not studied, not digitised, not preserved
  • Provides rigorous, transparent, reproducible methods for literary research, countering the subjectivity of impressionistic criticism
  • Creates searchable, preservable digital archives of endangered texts and manuscripts
  • Opens new questions about what literature is in the digital age: hypertext, electronic literature, algorithmic writing
  • Bridges Research Methods and Literary Theory, two separate UGC NET units that DH brings together

Critiques & Limitations

  • The canon problem at scale: DH can reproduce and amplify the canon's exclusions, because Western, anglophone, already-digitised texts dominate corpora
  • Counting words cannot produce literary understanding: computation reveals patterns but cannot interpret them (Stanley Fish's critique)
  • The 'hack vs. yack' problem: building impressive tools without genuine humanistic insight
  • Digital preservation is expensive and institutionally fragile: many DH projects disappear when funding ends
  • Requires technical skills (XML, Python, R, statistics) that are not part of standard humanities training
  • Non-Western, oral, and minority language traditions remain largely outside DH corpora

Two-Mark Exam Questions

What is 'distant reading' and who coined the term?

Distant reading was coined by Franco Moretti in 'Conjectures on World Literature' (2000). It proposes analysing literary history through large-scale computational data — graphs of genre trends, maps of literary settings, evolutionary trees of narrative forms — rather than close reading of individual texts. The goal is to see patterns across thousands of texts that no single reader could read.

What is the TEI and why is it important for Digital Humanities research?

The Text Encoding Initiative (TEI), founded in 1987, maintains an international standard for encoding literary and historical texts, today in XML. It allows scholars to record not just words but structure, variants, annotations, and editorial decisions in a machine-readable format. TEI-encoded texts are searchable, interoperable across projects, and format-independent, which is critical for digital editions and long-term preservation.
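
To make 'recording editorial decisions' concrete, here is a minimal sketch that reads a tiny TEI-style fragment with Python's standard library. The fragment uses the TEI elements choice, sic, and corr, but the sentence, the editor identifier "#ed1", and the omission of a full teiHeader are simplifications assumed for illustration, so it is not a schema-valid TEI file.

```python
import xml.etree.ElementTree as ET

# The TEI namespace, mapped to a prefix for use in search paths.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample: an editor corrects a misspelling but keeps the original.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p>She <choice><sic>recieved</sic><corr resp="#ed1">received</corr></choice> the letter.</p>
  </body></text>
</TEI>"""

root = ET.fromstring(SAMPLE)

# Collect (original reading, corrected reading, responsible editor) triples.
corrections = []
for choice in root.iterfind(".//tei:choice", TEI_NS):
    sic = choice.find("tei:sic", TEI_NS)
    corr = choice.find("tei:corr", TEI_NS)
    corrections.append((sic.text, corr.text, corr.get("resp")))

print(corrections)
```

Both readings survive in the file, so a reader or a program can choose to display the original, the correction, or both.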

What does N. Katherine Hayles mean by 'technotext'?

Technotext (Hayles, Writing Machines, 2002) refers to literary works in which the physical medium (the screen, the interface, the code, the book's own materiality) is integral to the meaning. The 'text' cannot be separated from the material form that presents it. Mark Z. Danielewski's House of Leaves is Hayles's key print example; Talan Memmott's Lexia to Perplexia is her key digital example.

Who coined the term 'hypertext' and what does it mean?

Ted Nelson coined 'hypertext' in 1963. It refers to non-linear text with embedded links that allow readers to navigate between nodes in reader-determined, non-sequential order. Hypertext is the theoretical foundation of the World Wide Web (HTML = HyperText Markup Language) and of electronic literature such as Michael Joyce's afternoon, a story (1987).

What are Lev Manovich's five principles of new media?

Manovich (The Language of New Media, 2001) identifies: (1) Numerical representation — all new media objects are numerical and algorithmically manipulable; (2) Modularity — made of independent elements that can be reassembled; (3) Automation — numerical and modular structure enables automated creation and distribution; (4) Variability — exist in potentially infinite versions; (5) Cultural transcoding — culture is translated into computer data and subjected to computer logic.

What is 'corpus linguistics' in the context of literary research?

Corpus linguistics uses large electronically searchable text collections (corpora) to study language patterns at scale. In literary research: it enables word frequency analysis across thousands of texts, tracks vocabulary change over time, identifies stylistic signatures for authorship attribution, and reveals patterns of representation (how often certain words or character types appear) across an entire genre or period.
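
A word-frequency comparison of the kind described above can be sketched in a few lines. This is a toy illustration under stated assumptions: the two 'texts' are invented sentences standing in for whole books, the tokenizer is deliberately crude, and the names CORPUS and rate_per_thousand are hypothetical.

```python
import re
from collections import Counter

# Invented one-sentence stand-ins for two texts from different periods.
CORPUS = {
    "sample_1800": "The ship sailed for the empire and the empire grew.",
    "sample_1900": "The city slept while the empire faded from the map.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rate_per_thousand(word, tokens):
    """Relative frequency of `word`, scaled to occurrences per 1,000 tokens."""
    return 1000 * tokens.count(word) / len(tokens)


# Raw counts per text.
counts = {name: Counter(tokenize(text)) for name, text in CORPUS.items()}

# Relative rates make texts of different lengths comparable.
rates = {name: rate_per_thousand("empire", tokenize(text))
         for name, text in CORPUS.items()}

print(counts["sample_1800"].most_common(2))
print(rates)
```

Relative rates rather than raw counts are used for the comparison because real corpora contain texts of very different lengths.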

What is 'computational stylistics' and what is it used for?

Computational stylistics (or stylometry) uses statistical analysis of style — word frequency, sentence length, vocabulary richness, function word patterns — to study literary texts. Main uses: (1) authorship attribution (identifying anonymous or disputed texts by stylistic fingerprint — famously used to identify J.K. Rowling as Robert Galbraith); (2) tracking stylistic change across an author's career; (3) identifying genre boundaries through shared stylistic features.
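
The 'stylistic fingerprint' idea can be sketched as a profile of function-word rates plus a distance between profiles. This is a simplified illustration, not a method taken from the answer above: it uses plain Manhattan distance on five function words, whereas real attribution studies rely on long texts, much larger word lists, and more careful statistics. The sample sentences and author labels are invented.

```python
import re
from collections import Counter

# A tiny, assumed list of function words; real studies use far more.
FUNCTION_WORDS = ["the", "of", "and", "to", "but"]


def profile(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return [counts[word] / len(tokens) for word in FUNCTION_WORDS]


def distance(p, q):
    """Manhattan distance between two profiles (smaller = more similar)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def attribute(disputed, candidates):
    """Return the candidate whose profile is closest to the disputed text."""
    target = profile(disputed)
    return min(candidates,
               key=lambda name: distance(target, profile(candidates[name])))


# Invented writing samples for two hypothetical authors.
CANDIDATES = {
    "author_a": "the house of the king and the garden of the queen",
    "author_b": "she wanted to go but he wanted to stay but not to wait",
}
DISPUTED = "the road of the hills and the river of the plain"

print(attribute(DISPUTED, CANDIDATES))
```

Function words are used because writers tend to choose them without conscious attention, which is why they are treated as a more stable signal than topic vocabulary.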

What is 'digital preservation' and why is it a Research Methods concern?

Digital preservation is the long-term management of digital materials — texts, images, code — to ensure continued accessibility as technology changes. It matters for research because: digital files degrade through format obsolescence and bit rot; born-digital literary works (hypertext fiction) may become unreadable as their software platforms become obsolete; and digital archives can disappear through funding loss or institutional change. Active preservation requires format migration, emulation, redundancy, and metadata documentation.
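
One common way to detect silent corruption across redundant copies is a checksum (fixity) check. The answer above does not name this technique, so treat the sketch as an assumed illustration of how redundancy and bit-rot detection can work together; the copy names and sample bytes are invented.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def damaged_copies(copies, recorded_digest):
    """Names of copies whose bytes no longer match the recorded digest."""
    return [name for name, data in copies.items()
            if sha256_of(data) != recorded_digest]


# Invented sample content; the digest is recorded when the file is archived.
original = b"Sample born-digital text."
recorded = sha256_of(original)

# Simulate bit rot in one copy by changing a single byte.
corrupted = b"X" + original[1:]

copies = {
    "server_a": original,
    "server_b": original,
    "backup_disk": corrupted,
}

print(damaged_copies(copies, recorded))
```

A check like this only reveals that a copy has changed. Repair still depends on having at least one intact copy, which is why redundancy matters.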

What is 'data visualisation' and why does Moretti use it?

Data visualisation represents large datasets through graphs, maps, and network diagrams to reveal patterns imperceptible in raw data. Moretti uses graphs to show genre trends over time, maps to show the geography of literary settings, and evolutionary trees to show how narrative forms branch and change. Visualisation converts statistical patterns into a form the human visual system can interpret — making large-scale literary history accessible to humanistic analysis.
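
The step from raw records to a readable graph can be sketched with counts per decade and a text bar chart. The catalogue records below are invented for illustration and are not Moretti's data; the function names are hypothetical, and a real project would normally use a plotting library rather than '#' characters.

```python
from collections import Counter

# Hypothetical catalogue records: (publication year, genre label).
RECORDS = [
    (1802, "gothic"), (1805, "gothic"), (1808, "gothic"), (1814, "gothic"),
    (1816, "historical"), (1818, "historical"),
    (1823, "historical"), (1825, "historical"), (1827, "historical"),
]


def decade(year):
    """Map a year to the first year of its decade, e.g. 1814 -> 1810."""
    return year // 10 * 10


def counts_by_decade(records, genre):
    """Count how many titles of one genre fall in each decade."""
    return Counter(decade(year) for year, label in records if label == genre)


def bar_chart(counts):
    """Render decade counts as one line per decade, one '#' per title."""
    return "\n".join(f"{d}s {'#' * counts[d]}" for d in sorted(counts))


for genre in ("gothic", "historical"):
    print(genre)
    print(bar_chart(counts_by_decade(RECORDS, genre)))
```

Even in this toy output one genre thins out as the other grows, which is the kind of rise-and-fall shape a graph makes visible at a glance.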

What is the central tension or critique of Digital Humanities?

The central tension is between computation and interpretation: whether quantitative methods (distant reading, data visualisation) produce genuine humanistic insight or merely repackage existing knowledge in visual form. Critics (Stanley Fish) argue that counting words cannot produce literary understanding. Defenders (Moretti, Ramsay) argue that computation reveals patterns genuinely inaccessible to reading. A second critique: DH has focused on Western, anglophone, already-digitised texts, reproducing the canon's exclusions at massive scale.

Name two early published works of hypertext fiction.

(1) Michael Joyce, afternoon, a story (1987) — 539 lexias, 950 links, no fixed beginning or end; published by Eastgate Systems on floppy disk. (2) Shelley Jackson, Patchwork Girl (1995) — a hypertext retelling of Frankenstein; the body of the monster as the fragmented, linked body of the text. Both run on Storyspace software and face digital preservation challenges on current systems.

What is Franco Moretti's 'Graphs, Maps, Trees' about?

Graphs, Maps, Trees (2005) is Moretti's demonstration of distant reading using three borrowed methods: graphs (plotting genre rise and fall over time, revealing 25–30 year cycles), maps (the geography of literary settings, revealing which regions literature imagines and which it ignores), and trees (evolutionary diagrams of how narrative devices branch, compete, and go extinct). It is a foundational text of quantitative literary history within Digital Humanities.

Model Essay Answers

Explain Franco Moretti's concept of 'distant reading' and its implications for literary study.

Franco Moretti coined 'distant reading' in 'Conjectures on World Literature' (2000) as a deliberate provocation against the dominant tradition of close reading. Close reading (examining the language, imagery, and form of individual texts) has been the core method of literary criticism from New Criticism onwards. But Moretti's provocation is simple: close reading can only study the texts we have already decided are worth reading. By Moretti's estimate, even a generous canon of around two hundred nineteenth-century British novels is less than 1% of the novels actually published. What about the rest? Distant reading proposes to study 'the rest' (the enormous mass of non-canonical texts) by stepping back from individual works and looking at large-scale patterns across thousands or millions of texts simultaneously. The tools are borrowed from the natural sciences: statistical graphs, geographic maps, and evolutionary trees. In Graphs, Maps, Trees (2005), Moretti demonstrates the method. Graphs plot the rise and fall of novel genres over time, showing that genres cycle on 25–30 year rhythms regardless of national literary tradition, a pattern invisible at the level of individual texts. Maps reveal the geography of literary settings: which cities, regions, and countries literature imagines; which it leaves blank. Trees borrow from evolutionary biology to show how narrative devices branch, compete, and go extinct across literary history. The implications are radical: First, it changes the object of study. Literary criticism has always studied individual texts and authors. Distant reading studies systems, patterns, and structures that only emerge at scale. The unit of analysis is not the novel but the genre, not the author but the literary field. Second, it changes the relationship between reading and evidence. In distant reading, the critic often does not read the texts being studied; they work from databases, summaries, and statistical analyses. 
Critics like Jonathan Goodwin have questioned whether this produces genuine literary insight or merely sophisticated counting. Third, it reveals the politics of the canon. If distant reading shows that the texts we study are a tiny, systematically selected fraction of all texts produced, it raises the question: what principle of selection produced the canon? Distant reading cannot answer that question — but it makes the question unavoidable. The key limitation: distant reading excels at revealing macro-historical patterns but cannot tell us what any individual text means or why it matters. It is a complement to close reading, not a replacement. Moretti has always acknowledged this — his critics sometimes forget it. For UGC NET: Moretti (distant reading, Graphs Maps Trees), the contrast with close reading, and the key finding about genre cycles are exam-critical.

What is the significance of TEI (Text Encoding Initiative) for humanities research?

The Text Encoding Initiative (TEI), founded in 1987, is the most important infrastructure project in Digital Humanities. It provides a standardised framework, based today on XML (originally on SGML), for encoding literary, linguistic, and historical texts in a form that is simultaneously human-readable, machine-processable, and designed for long-term preservation. The significance of TEI operates at four levels: 1. Scholarly precision: TEI encoding allows editors to be explicit about decisions that print editions handle invisibly. When an editor prints a corrected reading, the TEI file can record both the correction and the original, along with who made the correction and why. Manuscript variants, disputed readings, emendations, and editorial choices are all encoded explicitly. This makes the scholarly apparatus transparent and machine-searchable in ways that footnotes and textual apparatus in print editions cannot achieve. 2. Searchability and analysis: a TEI-encoded corpus of Victorian novels can be searched for every mention of a specific word across all texts, every passage spoken by female characters, every reference to a specific place. This enables research questions that close reading simply cannot answer at scale. The Women Writers Project (encoding women's writing from 1526–1850) has enabled scholarship on the full range of women's literary production, not just the few canonical figures who survived into the twentieth-century canon. 3. Interoperability: because all TEI files follow the same standard, different projects can share data. A TEI edition of Tagore's poems created in Kolkata can be combined with a TEI edition created in Oxford, because the files speak the same language. This enables comparative research across institutions and across national traditions. 4. Preservation: TEI separates the intellectual content of a text from any specific software or hardware platform. 
A TEI file created in 1995 is still fully readable and usable today, even though the software of 1995 is largely obsolete. By contrast, texts encoded in proprietary formats of the same period may be permanently inaccessible. TEI's format independence is its most important long-term contribution. For India, TEI is particularly significant for endangered language and manuscript traditions. Projects like the Bichitra digital Tagore edition (Jadavpur University) and the Endangered Archives Programme (British Library) digitise materials at risk of physical loss, and standards-based encoding of the kind TEI provides is one way to turn such material into permanent, accessible, scholarly records. For UGC NET Research Methods: TEI is the most exam-critical technical concept. Know: XML, encoding standard, digital edition, variorum, preservation, interoperability.
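
The searchability point above ('every passage spoken by female characters') can be sketched as a query over encoded speech. The fragment below uses the TEI elements person and said, but it is an invented, simplified sample rather than a schema-valid edition, and the "F"/"M" values for the sex attribute are an assumed project convention.

```python
import xml.etree.ElementTree as ET

# Namespace-qualified names as ElementTree expects them.
TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented sample: two characters declared in the header, three speeches.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><profileDesc><particDesc><listPerson>
    <person xml:id="anna" sex="F"/>
    <person xml:id="john" sex="M"/>
  </listPerson></particDesc></profileDesc></teiHeader>
  <text><body>
    <p><said who="#anna">I will not go.</said> <said who="#john">Then stay.</said></p>
    <p><said who="#anna">I intend to.</said></p>
  </body></text>
</TEI>"""


def speeches_by_sex(xml_text, sex):
    """Return the text of every speech whose speaker is recorded with this sex."""
    root = ET.fromstring(xml_text)
    # Speaker references look like "#anna", so build the ids in that form.
    speaker_refs = {"#" + person.get(XML_ID)
                    for person in root.iter(TEI + "person")
                    if person.get("sex") == sex}
    return [said.text for said in root.iter(TEI + "said")
            if said.get("who") in speaker_refs]


print(speeches_by_sex(SAMPLE, "F"))
```

The query works only because an editor has already marked who speaks each passage, so the search is as good as the encoding behind it.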

Discuss the relationship between Digital Humanities and traditional literary criticism. Are they in conflict?

The relationship between Digital Humanities and traditional literary criticism has been marked by genuine intellectual tension, but also, increasingly, by productive convergence. Understanding the debate requires distinguishing what each tradition does well. Traditional literary criticism, from New Criticism through deconstruction, postcolonialism, and feminist theory, excels at close, interpretive engagement with individual texts. It asks: what does this specific passage mean? How does this metaphor function? What do this novel's silences reveal about power and ideology? These questions require sustained, attentive reading of the kind that only human interpretation can provide. Digital Humanities (distant reading, corpus linguistics, data visualisation) excels at revealing patterns across large numbers of texts simultaneously. It asks: how did this genre evolve over a century? Which words are statistically more common in men's writing than women's writing in the Victorian period? How did the vocabulary of 'empire' change between 1800 and 1900? These questions require computational scale that human reading cannot achieve. The apparent conflict arises when advocates of each method make totalising claims. Franco Moretti's early formulations of distant reading implied that close reading was methodologically limited, and that literary history required quantitative scale to be rigorous. Critics like Stanley Fish responded that counting words cannot produce literary understanding, and that humanistic insight requires interpretation, not calculation. But this opposition is false. Distant reading and close reading answer different questions. They are not competitors but collaborators. Moretti's graphs showing genre cycles raise the interpretive question: why do genres cycle on 25–30 year rhythms? The graph cannot answer that question; it requires the interpretive resources of traditional criticism, including sociology, psychology, and literary history. 
Similarly, a computational finding that a specific word appears more frequently in postcolonial novels than in colonial-era novels is not itself an interpretation. It becomes interpretation when a critic asks: what does this frequency reveal about the changing concerns of the postcolonial literary imagination? The computation produces a pattern; the critic produces the meaning. The most productive current work in Digital Humanities holds both methods together — using computation to identify patterns, and close reading to interpret them. Ted Underwood's Distant Horizons (2019) is the best example: predictive modelling identifies what features distinguished reviewed from unreviewed novels in the nineteenth century, and then close reading interprets what those features reveal about the politics of literary taste. For UGC NET: the key position is that DH and traditional criticism are complementary, not in conflict — each answering questions the other cannot.

Frequently Asked Questions

Is Digital Humanities in the UGC NET English syllabus?

Yes — Digital Humanities appears in two units of the UGC NET English syllabus. In the Research Methods unit, it appears as computational methods, digital archives, and TEI. In the Literary Theory and Criticism unit, it appears through figures like N. Katherine Hayles (posthumanism, technotext) and as part of discussions of new media and electronic literature. Franco Moretti's distant reading and Lev Manovich's five principles of new media are the most frequently cited in exam question predictions for 2025–26.

What is the difference between distant reading and close reading?

Close reading (the method of New Criticism, deconstruction, etc.) analyses individual texts with sustained interpretive attention to language, form, and meaning. Distant reading (Moretti) analyses patterns across thousands or millions of texts simultaneously using computational methods — graphs, maps, statistical analysis. Close reading asks 'what does this passage mean?'; distant reading asks 'how did this genre change over a century?' They answer different questions and are best understood as complementary, not competing.

What are the five principles of new media (Manovich)?

Lev Manovich (The Language of New Media, 2001) identifies: (1) Numerical representation — all new media objects are numerical; (2) Modularity — made of independent modules (pixels, characters, objects) that can be assembled and reassembled; (3) Automation — numerical and modular structure allows automated creation and distribution; (4) Variability — new media exist in potentially infinite versions, not fixed objects; (5) Cultural transcoding — culture is translated into computer data, subjected to computer logic. These five are exam-critical for UGC NET.

Who is N. Katherine Hayles and why does she matter?

N. Katherine Hayles is the key theorist of the humanities-technology interface. Her How We Became Posthuman (1999) argues that the human/machine distinction has always been blurred — we are constitutively entangled with technology. Her concept of 'technotext' (Writing Machines, 2002) argues that the physical medium — screen, interface, book's materiality — is part of a text's meaning, not just its container. Electronic Literature (2008) maps born-digital literary forms. For UGC NET: Hayles, posthumanism, technotext.

What is the biggest critique of Digital Humanities?

Two main critiques: (1) The 'hack vs. yack' problem — DH has been criticised for privileging technical 'making' (building tools and archives) over interpretive 'thinking' (humanistic theory and criticism), producing impressive visualisations without genuine insight. Stanley Fish argued that counting words cannot produce literary understanding. (2) The canon problem — DH has focused on Western, anglophone, already-digitised texts, reproducing and amplifying the exclusions of the traditional canon at massive computational scale. Non-Western, non-English, oral, and marginally literate traditions remain largely absent from DH corpora.

What is the relationship between hypertext and poststructuralism?

Hypertext theory and poststructuralism developed in parallel and share deep structural similarities. Barthes's 'writerly text' (S/Z, 1970) — a text that the reader produces rather than consumes — is literalised by hypertext, where the reader's link choices determine the textual path. Derrida's critique of linear, hierarchical 'logocentrism' maps onto hypertext's non-hierarchical, multi-directional network structure. Foucault's concept of the 'archive' informs digital archive theory. Hypertext did not emerge from poststructuralism — but the two traditions recognised each other as intellectual allies.

What should I prioritise for UGC NET Digital Humanities questions?

Priority 1 (highest exam probability): Franco Moretti — distant reading, Graphs Maps Trees (2005), genre cycles. Priority 2: Lev Manovich — five principles of new media (numerical representation, modularity, automation, variability, cultural transcoding). Priority 3: N. Katherine Hayles — posthumanism, technotext, How We Became Posthuman (1999). Priority 4: TEI — XML, digital edition, encoding standard. Priority 5: Ted Nelson — hypertext (coined 1963); Michael Joyce — afternoon, a story (1987). Common trap: confusing 'distant reading' with 'not reading' — Moretti's method is a form of reading, just at a different scale.

How does Digital Humanities connect to Research Methods for UGC NET?

Digital Humanities is directly relevant to three Research Methods topics in the UGC NET syllabus: (1) Data collection and corpus building — corpus linguistics, digital archives, TEI as methods of gathering and organising research data; (2) Research tools and software — concordance software, text analysis programs, data visualisation tools; (3) Research ethics and citation — open access, copyright in digital editions, proper citation of digital resources. The overlap between DH and Research Methods is the reason the field appears in both theory and methods units of the exam.