[T]hey fancied that they could detect in numbers, to a greater extent than in fire and earth and water, many analogues of what is and comes into being—such and such a property of number being justice, and such and such soul and mind, another opportunity, and similarly, more or less, with all the rest—and since they saw further that the properties and ratios of the musical scales are based on numbers, and since it seemed clear that all other things have their whole nature modeled upon numbers, and that numbers are the ultimate things in the whole physical universe.
-Aristotle, Metaphysics, Book 1, 985b
Much of what interests me about data mapping and data extraction–network maps, concept modeling, vector space modeling, and the like–is that these are not only methods, but also metaphysics.
Dealing with a corpus is a bit like the Pre-Socratics trying to find the underlying something that comprises reality: the big ideas–or Big Idea–that connect or thread the works together. The corpus may have alcoves and pockets, islands and peninsulas, but unity and commonality exist. Patterns exist.
This takes me back to Thales, who said that everything was water, which makes sense when you compare water with the other elements: earth, air, and fire. It can be liquid, solid, and steam under fairly everyday conditions. We had a water world. But not like the Kevin Costner movie.
But then came Thales’ disciple, Anaximander, who argued that reality could not be composed of any single element because an infinity of any single element would extinguish the others. A water world would not give birth to fire, for example. So for Anaximander, the answer was abstraction, creating an almighty, unifying One. This would be the source for the others, but not exhibit any singular quality, like dryness.
And so on and so forth, the Pre-Socratics speculating on the foundations of reality. I’ve always enjoyed Heraclitus’ insistence on flux and impermanence, “that all things are in motion all the time, but this escapes our perception,” as Aristotle describes in his Physics. Such constant change evens out in the grand scheme.
Pythagoras and the Pythagorean outlook, as described by Aristotle in his Metaphysics, proves to be one of the more fascinating foundations: numbers. For them, numbers structured the universe, grounding reality, crafting the cosmos. There must be ten planets, they argued, because ten was the perfect number, even though observation turned up only nine. One was especially valuable for its generative properties. And the musicality of ratios–the groundwork for much of our musical vocabulary, like “octave”–governed the movement of spheres and heavens.
Of this metaphysical outlook, Aristotle was critical, arguing that the Pythagoreans, as he describes them, imposed this structure on the world instead of trying to rely on observation. Aristotle also argued that one could not use abstract, non-sensible elements, like numbers, to compose sensible reality, as such elements had no physical properties. How can a five be heavy?
Aristotle was likely guilty of misrepresentation and misunderstanding to some extent, confusing epistemology with metaphysics–i.e., how we “know” the world through numbers is not how the world is composed. But, still, he raises a key point: how our representations of reality construct our reality, and the role that quantitative data has in constructing experience.
The Pre-Socratics were trying to fathom the metaphysical bedrock of being. Today, in DH, the goal is often far less expansive and the methods are far more scientific, but as we build our models and visualize our word clouds, we must be careful to avoid the Pythagorean pitfall: narrating reality as numbers, not just with numbers.
-2-
A sample is like a slice of pizza, and a population is a whole pizza. If you take a slice of a pepperoni pizza, your sample slice will likely reflect the population. But taking a single slice from a half-pepperoni, half-cheese pizza may misrepresent the whole.
I used this analogy countless times when tutoring. It often worked. And it captures one of my favorite things about statistics: its function as synecdoche. The slice represents the whole pizza. The sample represents the population.
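To make the analogy concrete, here is a toy simulation in Python (the pizza and all the numbers are invented for illustration): a single contiguous slice of a half-and-half pie misleads, while a sample drawn at random across the pie recovers the true proportion.

```python
import random

random.seed(0)

# A half-pepperoni, half-cheese pizza, cut into eight slices.
pizza = ["pepperoni"] * 4 + ["cheese"] * 4

# A single contiguous slice: the sample is 100% one topping.
one_slice = pizza[:1]
print("contiguous slice:", one_slice)  # all pepperoni

# A random sample drawn across the whole pie does better at
# recovering the true 50/50 proportion.
random_sample = random.sample(pizza, 4)
share = random_sample.count("pepperoni") / len(random_sample)
print("random sample, share pepperoni:", share)
```

The lesson is the statistician’s: how the slice is taken matters as much as how big it is.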
And yet, it gets more complicated, because the statistic is also constructing the population in a sense. As Tavia Nyong’o pointed out at the Affect conference, no one is the “average.” We are all part of it, but not necessarily representative of it–though it is meant to represent us. And yet, such averages create structures that do, in very material ways, affect us: in K-12 pedagogy, for example, or in the protocols and practices that political science and sociology establish across the country.
Thus, the role of representation, the way the population gets mapped and centralized, has considerable power and brings considerable baggage, and as we explore the corpus that surrounds us, we must be careful that we don’t construct a tenth planet.
As Lisa Rhody writes:
Topic models (and LDA is one kind of topic modeling algorithm) are generative, unsupervised methods of discovering latent patterns in large collections of natural language text: generative because topic models produce new data that describe the corpora without altering it; unsupervised because the algorithm uses a form of probability rather than metadata to create the model; and latent patterns because the tests are not looking for top-down structural features but instead use word-by-word calculations to discover trends in language.
Thus, topic modeling is a form of pattern-finding, using word-by-word “co-occurring” patterns to construct topics that can then get mapped and tracked to highlight certain latent or emergent patterns. Done by a machine, it can comb through thousands of texts and tinker with these topics, representing patterns. They may or may not be significant patterns, but that forms the next step.
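As a rough illustration of the mechanics, here is a minimal sketch of LDA topic modeling, assuming scikit-learn and a three-document toy corpus invented for the occasion (Rhody’s own work uses different tooling and corpora of actual scale):

```python
# A minimal LDA sketch: the model sees only word counts, no metadata.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy documents invented for illustration; real corpora run to thousands.
docs = [
    "the sea and the ships and the salt wind over the water",
    "ledgers and interest rates at the bank where cashiers count money",
    "reeds along the river bank and slow brown water in the shallows",
]

# Bag-of-words counts: word-by-word co-occurrence is all the model gets.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

# Two topics: the number is our choice, and the corpus must fit that frame.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Each "topic" is just a probability distribution over the vocabulary.
words = vectorizer.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top = [words[j] for j in weights.argsort()[::-1][:4]]
    print(f"topic {i}: {', '.join(top)}")
```

Note that nothing here knows what a document is “about”; the topics are whatever distributions best explain the co-occurrences, which is exactly why poetic language strains the frame.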
But, as Rhody continues, “As literary scholars well know, however, poems exercise language in ways purposefully inverse to other forms of writing, such as journal articles, encyclopedia entries, textbooks, and newspaper articles.” And as she goes on to argue, and show, in her paper, word use in a poetic corpus doesn’t fit the framework of the topic model as snugly as we might wish.
And that is the interesting thing about these word-centered mapping techniques, like topic modeling or the WEMs that Ben Schmidt writes about: they do not use metadata to drive them in the way that a database search might. The words themselves, culled from texts, provide the material. LANGUAGE and concrete poets would love this sort of work, especially WEMs. And one wonders, looking at these maps, where the line falls between art, composition, and craft, on the one side, and method and statistics on the other.
As Ben Schmidt writes, “A topic model aims to reduce words down to some core meaning so you can see what each individual document in a library is really about. Effectively, this is about getting rid of words so we can understand documents more clearly. WEMs do nearly the opposite: they try to ignore information about individual documents so that you can better understand the relationships between words.”
By showing these relationships, one maps and orients, according to the programming, a set of words. Sometimes the connections are intuitive. “Bank,” for example, connects with “cashier” and “hank,” but not with words like “riverbank,” reflecting the polysemy (or multiple meanings) of the word. Heavily drawing on analogies and phonemic connections, these word maps construct their own sorts of texts. Not read in a linear or “human” way per se, but in terms of their own patterns. Like a piece by Morton Feldman, the unique textuality and thinking of the piece evolves according to its own dynamics and tonalities.
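For a sense of the pipeline, here is a minimal word-embedding sketch, assuming gensim’s Word2Vec and a handful of invented sentences (Schmidt’s own experiments use different tooling, and a corpus this small yields only statistical noise; the point is the shape of the method, not the output):

```python
# A minimal Word2Vec sketch: words become vectors, and nearness in the
# vector space stands in for relatedness.
from gensim.models import Word2Vec

# Tiny invented corpus; real embeddings need millions of words.
sentences = [
    ["the", "bank", "cashier", "counted", "the", "money"],
    ["she", "deposited", "her", "money", "at", "the", "bank"],
    ["reeds", "grew", "along", "the", "river", "bank"],
]

# Each word gets a 16-dimensional vector learned from its contexts.
model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, seed=0)

# The "map" of words: nearest neighbors of "bank" in the learned space.
print(model.wv.most_similar("bank", topn=3))
```

Plotted in two dimensions, such a space becomes the kind of word map discussed above.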
Indeed, these visualizations have a fascinating art to them, both in terms of construction and seeing, interpreting and reading, making the interpretation of data deeply hermeneutic.
In this way, such displays and results–in some methods more than others–may be quantitative and mined at a mass scale, but they still require an interpretation that is far from straightforward. The work may be quantitative, but it falls closer to the qualitative end of the spectrum, or rather it helps us draw out qualities and conclusions in ways that much data does not.
It is not a mere schema being used to represent the world and construct the world. Instead, it is a schema being used to visualize and interpret the world. A schema that may use data and metadata but, much like a writer, constructs a text that says something unique. That has a generative quality.
In The Gay Science, Nietzsche describes the searching of most thinkers in a troubling yet provocative image: someone pulling back veils. But no one veil reveals the face. There is no end to the pulling back.
To put it in a Pre-Socratic sense, there is no bedrock.
In this way, these tools may let us reach certain knowledge that was previously beyond our grasp. But this is not “too scientific.” It is not the death of traditionally “humanistic” values like interpretation. Instead, it is simply a different tactic of narration and creation.