26.2 Picture Perception

26: Visual Message Design and Learning: The Role of Static and Dynamic Illustrations

26.1	Scope
26.2	Picture Perception
26.3	Memory Models
26.4	Pictures and Knowledge Acquisition
26.5	Conclusions
	References

26.2.1 Theories of Picture Perception

When is a surface with marks on it a "picture?" How do pictures carry meaning? What kinds of meaning can pictures carry (see 8.8.2, 15.8)? Is there a grammar of picturing? Is picture perception essentially innate, or is it a skill that must be learned?

Questions such as these have provoked conjecture from philosophers, psychologists, art historians, semioticians, and computer scientists. It is a fascinating, disputatious literature, one with implications for researchers in educational communication and technology, although it is widely neglected.

This section of the chapter provides a concise introduction to the major scientific theories of picture perception. To set the discussion of modern theories in historical context, the article begins with a description of the theory of linear perspective developed during the Italian Renaissance. Then two major conflicting theories are introduced: James J. Gibson's resemblance theory, in which meaning is [...] we are able to perceive the invariant shapes of the objects in a picture (e.g., what does an "invariant cat" look like?), Gibson uses the concept to avoid some of the problems of perspective theory (e.g., how can we identify an object in a picture if it is depicted from a point of view we have never seen?). Nevertheless, Gibson's theory of pictorial representation is based primarily on the optical correspondence of the picture and the environment, and it is the structure of the stimulus that is the driving force in picture perception.

For recent discussions of Gibson's work see Cutting (1982, 1987), Fodor and Pylyshyn (1981), Natsoulas (1983), Reed and Jones (1982), Rogers and Costall (1983), and Wilcox and Edwards (1982).

26.2.4 Constructivism: E. H. Gombrich

Perception, as Neisser (1976) puts it, is where reality and cognition meet. Whereas Gibson assigns the major role in this meeting to reality, constructivists such as Gombrich emphasize the role of cognition. Pictures do not "tell their own story," Gombrich argues; the viewer must construct a meaning (see also 7.3.1).

Pictures will be interpreted differently depending on the attitude taken by the eye of the beholder. What we see, or think we see, is filtered through a variety of mental sets and expectations. For example, briefly shown playing cards in which hearts are colored black are sometimes seen as purple (Bruner & Postman, 1949).

One special class of expectations consists of the artistic conventions in common use. Gombrich (1969) traces the history of Western art, showing how cultural and technological changes have altered the criteria for pictorial realism. What is judged to be a "good likeness" is a function of the conventions and drawing techniques that now look "wrong" and amateurish to our modern eye.

A more pervasive example of a system of pictorial convention in use today is the outline drawing. The use of lines to represent the edges of objects is a substantial departure from nature. The objects in the world are not bounded by lines, and it is due to convention that we perceive outline drawings as depicting shapes rather than arrangements of wires. Whereas the convention that shapes can be represented by outlines is a rapidly acquired understanding, the ability to interpret some conventions such as implied motion cues may require extensive experience or even direct instruction (Levie, 1978).

Such conventions are not arbitrary. Artists are not free to adopt any technique they choose. In fact, the history of naturalistic art can be thought of as a series of innovations in the technique of approximating what is seen by viewing the environment. But Gombrich argues that realism in art is more than just an effort to record the optical data present in nature. The artist must produce an "illusion of reality" that matches the viewer's concept (schema) of what a picture of a given kind should look like. And how are these schemata acquired? By repeated exposure to the art of the day. These schemata then function as the standards for judging reality in subsequent picture viewing.

Such schemata can also affect our perceptions of nature. "We not only believe what we see: to some extent we see what we believe" (Gregory, 1970, p. 86). Our experience with art may lead us to look at the natural environment in new ways. For example, the sensitive museum visitor may note that the pastel patches of impressionist paintings can be observed in nature as well. So the ways of representing nature can become ways of seeing nature. Similarly, artists vacillate between painting what they see in nature and seeing in nature what they paint on canvas.

One controversial claim by Gombrich (1972) is that pictures lack the "statement function" of words. For example, he argues that the statement "The cat sits on the mat" cannot be directly pictured. A picture of a cat on a mat depicts a particular cat in a particular environment as seen from a particular viewpoint. An equivalent verbal message would be something like: "There is a cat seen from behind." Gombrich would not, however, propose that pictures are a poor source of ideas. Indeed, the conceptual richness of pictorial representation is a central theme of his work.

For further comment on this approach, see Blinder (1983), Carrier (1983), Gregory (1973, 1981), Heffernan (1985), and Katz (1983).

26.2.5 A Generative Theory: Margaret Hagen

Is picture perception primarily a bottom-up process as Gibson claims, or a top-down process as Gombrich claims? Hagen (1978, 1980a) provides a generative theory of representation that suggests a reconciliation: "Meaning is not given by the head to the unstructured stimulus, nor is it given by the stimulus to the unstructured head. The relation between the two is reciprocal and symmetrical" (1980a, p. 45).

In developing her thesis, Hagen describes differences between how we perceive the natural world and how we perceive "the world within the picture." For example, as compared to natural perception, picture perception compresses the perceived third dimension and increases the awareness of the angle among objects (the spread). Thus, picture perception has a special character that is based partly on ecological geometry (the natural perspective of the visual environment), and partly on the creativity or generativity of the perceiver.

Recently Hagen (1986) has provided a category system for describing the geometrical foundations of many styles of representational art: early Egyptian art, Roman murals, Northwest Coast Indian art, Japanese art, Mayan art, and ice age cave art, to name just a few. For example, there are several options for the location of the artist's station point. It can be close to the subject of the picture, at a moderate distance, or at optical infinity, in which case vanishing points and the convergence of parallel lines (e.g., railroad tracks meeting at the horizon) are obviated. Also, the system can involve the use of a single station point or multiple station points. Hagen observes that each system of depiction is "correct" when judged according to its assumptions. Thus in evaluating the art of other times and cultures, we must reject the premise that the prevailing post-Renaissance system of Western art is the only valid system for representing reality, a position also taken by Arnheim.

26.2.6 A Gestalt Approach: Rudolf Arnheim

According to Arnheim, picture perception is not primarily an act of direct perception as Gibson claims, nor is it a response to changing conventions as Gombrich claims. Picture perception is primarily a matter of organizing the lines and other elements of a picture into shapes and patterns according to innate laws of structure. Arnheim (1954) applies the principles of Gestalt psychology to the study of art. He shows how the laws of organization (e.g., the rules of grouping, the laws of simplicity and good continuation) can be found in the art of many periods. Meaning, he argues, has always been embodied in the Gestalt, the whole which is greater than the sum of its parts. Picture making is also derived from Gestalt principles:

The urge to create simple shapes ... cannot be explained as an urge to copy nature; it can be understood only when one realizes that perceiving is not passive recording but understanding, that understanding can take place only through the conception of definable shapes. For this reason art begins not with attempts to duplicate nature, but with highly abstract general principles that take the form of elementary shapes (Arnheim, 1986, pp. 161-162).

Arnheim observes that our judgment of the art of other times and cultures suffers from "a prejudice generated by the particular conventions of Western art since the Renaissance" (Arnheim, 1986, p. 159). Furthermore, current technique is so pervasive that we assume that it is the only correct way to make pictures. But the techniques of unfamiliar art styles are not, as sometimes supposed, due to lack of skill or accidentally acquired convention; nor are they deliberate distortions devised for some artistic purpose. Each style is based on an internally consistent system of solutions to visual problems, solutions that are no more in need of justification than contemporary technique.

Arnheim (1969) is also known for his advocacy of "visual thinking." He rejects the belief that reasoning occurs only through the use of language. In fact, he argues that thinking occurs primarily through abstract imagery. Arnheim champions the role of art in education and stresses the importance of teaching students to become fluent in thinking with shapes.

Another recurrent theme in Arnheim's work is the nature of abstraction. Representational art involves one kind of abstraction. Portraits, for example, are more abstract than their real-world referents. In such cases, "abstractness is a means by which a picture interprets what it portrays" (Arnheim, 1969, p. 137). On the other hand, pictures may be less abstract than the concepts they symbolize. For example, the silhouette of a cow on a roadside sign, although quite abstract, is still less abstract than the concept "cattle crossing." Arnheim (1974) discusses some of the problems faced by educators in determining the most effective kind and level of abstraction to use in instructional illustrations.

Although Gestalt ideas have been eschewed by cognitive psychologists, recent discoveries in visual anatomy and physiology and the study of perceptual organization have attracted some renewed interest in the area (Hoffman & Dodwell, 1985; Kubovy, 1981).

26.2.7 Picture Perception as Purposive Behavior: Julian Hochberg

Hochberg opposes the Gestalt approach, arguing that "the whole stimulus configuration cannot in general be taken as the effective determinant for perception" (Peterson & Hochberg, 1983, p. 192). Here is why: Not all aspects of a picture can be perceived in a single glance. Vision is sharp only in a small central area of the visual field, an area about the size of your thumbnail when held at arm's length. On the retina of the eye, acuity falls off rapidly from this area (the fovea). Since detailed discriminations are possible only on the fovea, it is necessary to scan pictures in order to take in all the details. Scanning does not occur in smooth sweeps but rather as a series of very rapid jumps called saccades and brief stops called fixations, normally about one-third of a second each. The information obtained from these separate fixations must be integrated into a mental map. Thus "at any given time most of the picture as we perceive it is not on the retina of the eye, nor on the plane of the picture; it is in the mind's eye" (Hochberg, 1972). So the whole is not perceived directly, as Arnheim claims; it is the result of synthesis based on the analysis of parts. These interactions between the picture, eye movements, and cognitions are "highly skilled sequential purposive behaviors" that are, according to Hochberg, the keys to understanding picture perception.
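The "thumbnail at arm's length" comparison and the fixation rate can be made concrete with a little arithmetic. The sketch below is only illustrative: the thumbnail width (2 cm) and arm's length (60 cm) are assumed round numbers, not values given in the text, and the rate of three fixations per second simply restates "about one-third of a second each."

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle, in degrees, subtended by an object of a given size at a given distance."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# Assumed, illustrative measurements (not from the text).
THUMBNAIL_WIDTH_CM = 2.0
ARM_LENGTH_CM = 60.0

# Roughly three fixations per second, i.e., about one-third of a second each.
FIXATIONS_PER_SECOND = 3

def fixations_during(viewing_seconds: int) -> int:
    """Approximate number of fixations made while viewing a picture for a given time."""
    return viewing_seconds * FIXATIONS_PER_SECOND

thumb_angle = visual_angle_deg(THUMBNAIL_WIDTH_CM, ARM_LENGTH_CM)
print(f"Thumbnail at arm's length subtends about {thumb_angle:.1f} degrees")
print(f"Fixations in a 10-second viewing: about {fixations_during(10)}")
```

Under these assumptions the sharply seen region spans only a couple of degrees of visual angle, so even a brief viewing of a picture involves dozens of separate samples that must be integrated.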

Hochberg (1979, 1980) describes how certain techniques used in painting can be thought to mimic the workings of the visual system. For example, in some of Rembrandt's paintings most of the canvas is blurred; only a few areas are rendered in sharp detail, simulating what is registered by the eye in a series of fixations. Similarly, techniques used in impressionistic paintings (which Hochberg calls "painting for parafoveal viewing"), pointillist paintings, and Op Art (Vitz & Glimcher, 1984) mirror processes of the human perceptual system.

Another issue discussed by Hochberg concerns the question of which picture of an object is the "best" picture. Hochberg uses the term canonical form to refer to "the most readily recognized and remembered view or 'cleaned up' version of some form or object" (Hochberg, 1980, p. 76). Canonical form preserves the most distinctive features of an object and eliminates noninformative features. Another factor in determining canonical form is the point of view from which an object is depicted.

26.2.8 A Mentalistic Approach: John M. Kennedy

Kennedy is supportive of Arnheim's approach and opposed to Gibson and Gombrich. He argues that we will learn very little about how pictures are perceived by studying the optical geometry of naturalistic art. Understanding picture perception should begin with the realization that pictures are made by people trying to communicate to receivers who are themselves intelligent perceivers striving to grasp the sender's intent. Pictures are made to communicate ideas, not just show scenes. To exemplify his approach, Kennedy discusses the pictorial metaphor:

Imagine a picture of a businessman with as many arms as an octopus, each hand holding a telephone. Or imagine a picture of a bride looking into a mirror and seeing a harried housewife. These pictures violate the laws of physics; they break the rules that Gibson called on.... And they do so precisely because the artist wants to put across ideas: that business men are overworked; that present bliss gives rise to future stress (Kennedy, 1985, p. 38).

Metaphoric pictures present two meanings, one false, the other intended. Understanding the perception of such pictures requires a "mentalistic analysis" in which assumptions are made about the experience and mental processes of the sender and the receiver. "The person who makes the metaphor expects the recipient to notice both meanings, and expects the recipient to know which was intended, and expects the recipient to know what the maker expected from the recipient" (Kennedy, 1984b, p. 901). Kennedy also argues that pictorial cues such as implied motion cues can be conceived of as metaphor rather than as pictorial convention.

As a historical footnote, Kennedy was Gibson's student at Cornell, and at one time followed in his footsteps, writing a survey of the field that was based largely on Gibsonian ideas (1974). But a decade later, Kennedy would write: "Regrettably scientific psychology as found in our universities can never be anything more than a trivial pursuit. By its very nature it is incapable of profound insights into humankind" (Kennedy, 1984a, p. 30). Although this represents a dramatic change in philosophy on Kennedy's part, the attack on a competing approach is by no means unusual. The picture perception literature is an intellectual battlefield delightfully seasoned with charge and countercharge. Theorists are robustly combative in attacking opposing views while defending their own.

26.2.9 A Semiotic Approach: James Knowlton

The theories discussed so far approach the topic from points of view related to visual perception, either by way of perceptual psychology or through the analysis of visual art. The next two theories have a different starting point; they derive from a concern with symbol using in general, thus placing the discussion of picture perception in a broader context.

The boundaries of semiotics, the science of signs, are wide and indistinct. The domain includes questions of the meaning of, as well as the communication of, meaning. Among the central figures in this field are Cassirer (1944), Morris (1946), Peirce (1960), and Sebeok (1976). For further commentary on the contribution of semiotics to picture perception see Cassidy (1982), Eco (1976), Holowka (1981), Langer (1976), Sless (1986), and Veltrusky (1976). Here, however, we will focus on the theorist in this tradition who speaks most directly to our present concerns with visual message design research: James Knowlton.

Knowlton (1964, 1966) develops a metalanguage for talking about pictures beginning with the term sign. A sign is a stimulus intentionally produced for the purpose of making reference to some other object or concept. A key distinction is that between digital signs and iconic signs. Digital signs bear no resemblance to their referents. For example, the physical appearance of the signs "man" and "hombre" does not in any way resemble their referent. Examples of digital signs are words, numbers, Morse code, Braille, and semaphore. Iconic signs, on the other hand, are not arbitrary in their appearance. In some way, iconic signs resemble their referents; they include drawings, photographs, maps, and blueprints.

Usually pictures are thought to resemble their referents in terms of visual appearance. Resemblance can, however, take other forms. Knowlton broadens the concept of "picture" to include "logical pictures" and "analogical pictures." Logical pictures resemble their referents in terms of the relationships between elements. An electrical wiring schematic, for example, bears no visual resemblance to the piece of apparatus it represents; it is a picture of the pattern of connections between elements. Flowcharts and diagrams are other examples of logical pictures. In analogical pictures, the intent is to portray a resemblance in function. For example, a pictorial analogy could be made between a suit of armor and an insect's exoskeleton. Thus Knowlton's definition of "resemblance" goes far beyond Gibson's concept in which resemblance is based on the optical equivalence of pictures and their referents. And, even when resemblance is based on physical appearance, the resemblance of a picture to its referent can, according to Knowlton, be slight. Sometimes a simple silhouette will do the job. Additionally, the ways in which resemblance functions in pictorial communication often depend on factors that are extrinsic to the picture itself.

Resemblance does not designate a single relation between pictures and their subjects; it designates the members of a fairly comprehensive class of relations, a class whose boundaries are not clear. And relations of resemblance are not always immediately evident to the uneducated eye. Knowing how to look at a picture is required to discern the ways it resembles its subject. Knowledge of other matters may be required as well: pictorial conventions, referential connections, historical, scientific, or mythical lore that sets the context of the work. Such matters are not taken in at a glance (Elgin, 1984, p. 919).

The most extreme and controversial position on the role of resemblance is taken by Goodman. He asserts that resemblance between picture and nature is not necessary, and that "A picture is realistic to the extent that it is correct under the accustomed system of representation" (Goodman, 1978, p. 130).

26.2.10 Symbol Systems Theory: Nelson Goodman

Goodman (1976) has devised a detailed theory of symbol systems. A symbol system consists of a set of inscriptions (e.g., phonemes, numbers) organized into a scheme that correlates with a field of reference. For example, musical staff notation consists of five horizontal lines on which notes and other marks are placed that correlate with a musical performance. As another example, maps consist of lines, shapes, and symbols that correlate with roads, boundaries, and landmarks. Thus the analysis of a symbol system involves an examination of (1) the scheme of representation, (2) the field of reference, and (3) the rules of correspondence between the two.
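The three-part analysis (scheme, field of reference, rules of correspondence) can be pictured as a small data structure. What follows is a loose sketch rather than Goodman's own formalism: the class name `SymbolSystem`, its methods, and the map-legend entries are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class SymbolSystem:
    """Hypothetical container for the three components of a symbol system."""
    scheme: set              # (1) the inscriptions available for representation
    field_of_reference: set  # (2) the things the inscriptions can refer to
    correspondence: dict     # (3) rules linking each inscription to a referent

    def denote(self, inscription: str) -> str:
        """Return the referent that an inscription stands for."""
        if inscription not in self.scheme:
            raise KeyError(f"{inscription!r} is not part of the scheme")
        return self.correspondence[inscription]

    def is_consistent(self) -> bool:
        """Check that every rule links a scheme element to a member of the field."""
        return (set(self.correspondence) <= self.scheme
                and set(self.correspondence.values()) <= self.field_of_reference)

# Illustrative legend for a simple map (invented entries).
road_map = SymbolSystem(
    scheme={"solid line", "dashed line", "star"},
    field_of_reference={"road", "boundary", "landmark"},
    correspondence={"solid line": "road", "dashed line": "boundary", "star": "landmark"},
)

print(road_map.denote("dashed line"))
print(road_map.is_consistent())
```

A lookup table of this kind fits a system with discrete elements. It is offered only as a way to keep the three components apart, not as a model of how pictures refer.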

Goodman provides several conceptual tools that can be used for analyzing symbol systems. One key concept is notationality. Notationality is the degree to which the elements of a symbol system are distinct and are combined according to precise rules. Music is high in notationality. The notes on the scale are distinct in terms of pitch and duration, and the rules for combining them are clear. Mathematical systems are also high in notationality; each number is distinct and the rules for "making statements" are precise. Pictures, on the other hand, are nonnotational. The "elements" of picturing are overlapping, confusable, and lacking in syntax. The lines and shadings that pictures are built from are without limit, and the ways they are combined to produce a symbol are undefined.

Notationality is an aspect of symbol using that may have implications for human information processing. Gardner (1982) speculates that "a case can be made that the left hemisphere of the human brain is relatively more effective than the right at dealing with notational symbol systems, ... while the right hemisphere is more at ease in dealing with ... nonnotational systems" (p. 59).

Another key concept in Goodman's theory is repleteness. Some symbol schemes, such as most pictures, are replete (or dense), whereas other schemes, such as printed words, are lacking in repleteness. The degree of repleteness is an index of how many aspects of a scheme are significant. In printed text, changes in the typeface, boldness, ink color, and other physical parameters do not necessarily alter meaning in any significant way. Drawings, on the other hand, are relatively replete, since several aspects of the marks in a drawing are often critical. Paintings are very high in repleteness. "Everything about a painting is part of it: design, coloration, brush stroke, texture, and so on. A painting is unrepeatable in the strict sense of the term" (Kolers, 1983, p. 146).

Goodman distinguishes three primary functions of symbol systems. Symbols can represent concepts by denoting or depicting them. Symbols can exemplify ideas or qualities by providing a sample of the concept. And symbols can express affective meaning (emotions).

Symbol systems differ in respect to the ease with which they can perform the functions of representation, exemplification, and expression. For example, music, although richly expressive, has no literal denotation. Music in the absence of a title or lyrics is not "about" anything. Number systems are limited in a different way. Numbers represent quantities, but they normally have no expressive function. Most pictorial systems are versatile. Line drawings, photographs, and representational paintings can depict, exemplify, and express forcefully.

Pictures exemplify qualities such as color and shape through the possession and presentation of them. The qualities exemplified are properties of the picture. Pictures express through "metaphorical exemplification", the figurative possession and presentation of emotion. For example, when a picture expresses sorrow, the feeling can be said to be "in the picture." We must, however, learn how to decode the expressive features of pictorial systems. "Emotions are everywhere the same; but the artistic expression of them varies from age to age and from one country to another" (Goodman, 1976, p. 90).

For other comments on Goodman's theory see Coldron (1982), Gardner, Howard, and Perkins (1974), Roupas (1977), Salomon (1979a, 1979b), and Scruton (1974).

26.2.11 Cognitive Science: David Marr

Artificial-intelligence research on computer vision is a rapidly developing area that may contribute to understanding picture perception by humans. One focus of this work involves determining the computations that are required in order to program a computer to see. To do this, it is necessary to specify the nature of the visual input, to describe how this input is transformed into data that can be handled by a computer, and to enumerate the computations that are carried out on-line to produce solutions to visual problems. Such problems include the detection of shape contours and surface textures.

A central figure in this area is David Marr. Marr's (1982) theory of vision involves the analysis of visual input through a series of stages that culminates in the meaningful interpretation of an image. In Marr's theory, an initial analysis involves the detection of features such as boundaries. These determinations are used to construct a "primal sketch" that distinguishes the sections of the display. From these sections, surface data such as shading are used to define the simple three-dimensional shapes in the scene. Finally, "generalized cones" form the basis for the representation and recognition of complex shapes such as animals.
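To give a feel for what "detection of features such as boundaries" can mean computationally, here is a minimal sketch. It is not Marr's algorithm: Marr's own proposal relied on zero-crossings in a filtered image, whereas this example substitutes a much simpler intensity-difference threshold. The function name, the threshold value, and the tiny test image are all assumptions made for illustration.

```python
def detect_boundaries(image, threshold=50):
    """Mark pixels whose intensity differs sharply from the right or lower neighbor.

    `image` is a list of rows of grayscale values; the result has the same shape,
    with 1 where a boundary is detected and 0 elsewhere.
    """
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Horizontal and vertical intensity changes (0 at the image border).
            dx = abs(image[r][c + 1] - image[r][c]) if c + 1 < cols else 0
            dy = abs(image[r + 1][c] - image[r][c]) if r + 1 < rows else 0
            if max(dx, dy) >= threshold:
                edges[r][c] = 1
    return edges

# Illustrative image: a dark region on the left, a bright region on the right.
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]

edges = detect_boundaries(image)
for row in edges:
    print(row)
```

The output marks a single vertical boundary between the dark and bright regions. A sketch-like description of the display would be built from many such local determinations.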

Marr (1982) asserts that since the early days of the Gestalt school "students of the psychology of perception have made no serious attempts at an overall understanding of what perception is" (p. 9). Some psychologists are equally skeptical of the reciprocal value of Marr's work. Kolers (1983), for example, comments that "Although the study of human perceiving may continue to inform the study of machine vision, it remains to be seen whether students of computer vision will teach us much about human perceiving" (p. 160). For comments on Marr's work and other recent approaches to computer vision, see Connell and Brady (1987), Fischler and Firschein (1987), Gregory (1981), Jackendoff (1987), Kitcher (1988), Kolers and Smythe (1984), Lowe (1987), and Rosenfeld (1986).

A theory that is closely related to Marr's approach has been proposed by Biederman (1985, 1987). Biederman describes a process by which an object in a two-dimensional image can be recognized. The process uses a set of primitive elements: 36 generalized-cone components called geons. These geons are derived from the combination of only five aspects of the edges of objects (e.g., curvature and symmetry). The process of interpreting a picture involves detecting the edge elements in an image, generating the resulting geons, combining these geons to produce meaningful forms, and matching them to known forms in the visual environment. Only 36 geons are needed for the perception of all possible images, a situation that is analogous to speech perception in which only 44 phonemes are needed to encode all the words in the English language. Biederman invokes evidence showing that the recognition of objects is robust across a wide range of viewing conditions (e.g., occluded views) and viewpoints (e.g., rotations in depth). Biederman's theory would appear to be in opposition to the views of most other theorists, who contend that it makes little sense to talk of a "vocabulary" and "grammar" of picturing.
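One way to see how a small vocabulary of 36 components could arise is to count combinations of a few attribute values. The sketch below assumes a four-attribute breakdown of generalized cones recalled from Biederman (1987): cross-section edge, cross-section symmetry, cross-section size, and axis. This breakdown is not stated in the text above, which speaks only of five edge properties, and the attribute labels are illustrative.

```python
from itertools import product

# Assumed attribute values (recalled from Biederman, 1987; not given in the text above).
GEON_ATTRIBUTES = {
    "cross_section_edge": ["straight", "curved"],
    "cross_section_symmetry": ["rotational_and_reflective", "reflective", "asymmetric"],
    "cross_section_size": ["constant", "expanded", "expanded_and_contracted"],
    "axis": ["straight", "curved"],
}

def enumerate_geons(attributes):
    """List every combination of attribute values as one candidate component."""
    names = list(attributes)
    return [dict(zip(names, values)) for values in product(*attributes.values())]

geons = enumerate_geons(GEON_ATTRIBUTES)
print(len(geons))  # 2 * 3 * 3 * 2 = 36
print(geons[0])
```

Under this assumption, a component such as a brick-like or cylinder-like volume corresponds to one particular combination of values, and the full set stays small because each attribute takes only two or three values.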

Another area that should be mentioned is neurophysiology. Kosslyn (1986, 1987) suggests how neurophysiology might be combined with AI computational theory to yield a more complete understanding of vision. After all, Kosslyn observes, perception and cognition are something the brain does. The extreme belief regarding the potential importance of neurophysiology is expressed by Kitcher (1988): "Ultimately, all phenomena currently regarded as psychological will either be explained by neurophysiology or not at all" (p. 10).

26.2.12 Implications for Media Researchers: An Example

Picture perception theorists have challenged many of our orthodox beliefs about pictures. For example, consider the question of what constitutes "realism" in pictures. In the media research literature, realism is generally defined as a matter of faithfully copying nature. A picture is said to be "realistic" to the degree that it mirrors the visual information provided by the real-world referent, and researchers studying the effects of pictorial realism have manipulated "realism cues" such as amount of detail, color, and motion. The outcomes of this research have been frequently disappointing.

Picture perception theorists have offered alternatives to the simple "copy theory" of realism. Although Gibson's approach stresses the fidelity of picture to referent, he adds the qualification that a successful picture copies the invariant visual information in nature: the optical data about reality that remains constant across time and across different views of an object. Goodman (1976) contends that realism is "... not a matter of copying but of conveying. It is more a matter of 'catching a likeness' than of duplicating, in the sense that a likeness lost in a photograph may be caught in a caricature" (p. 14). For Gombrich, the criteria for realism are not in nature, but in the perceiver's head in the form of expectations for what pictures of a given type "should" look like. These expectations are built up during extensive experience with the prevailing pictorial system and function as the standards for judging realism. Arnheim argues that perceptions of realism are relative to pictorial style, and are particularly influenced by how a style represents what we know about an object (conceptual reality) as compared to what the object looks like (perceptual reality). Marr and Biederman propose bottom-up theories that focus on the match between abstract elementary forms in pictures and their referents.

Thus the copy theory of pictorial realism and the accounts of the picture perception theorists differ in emphasis. The copy theory stresses the exact visual match between pictures and referents, whereas the theorists stress the nature of the departures of picture from reality: surface-level correspondence versus deeper semantic and psychological correspondence, and the stimulus alone versus the stimulus together with the contribution of the perceiver.
