AECT Handbook of Research

39: Experimental Research Methods

39.5 - Contemporary Issues in Educational Technology Experimentation

39.5.1 Balancing Internal and External Validity

Frequently in this chapter, we have discussed the traditional importance to experimenters of establishing high internal validity by eliminating sources of extraneous variance in testing treatment effects. Consequently, any differences favoring one treatment over another can be attributed confidently to the intrinsic properties of those treatments rather than to confounding variables, such as one group having a better teacher or more comfortable conditions for learning (see e.g., reviews by Ross & Morrison, 1989; Slavin, 1993).

The quest for high internal validity orients researchers to design experiments in which treatment manipulations can be tightly controlled. In the process, using naturalistic conditions (e.g., real classrooms) is discouraged, given the many extraneous sources of variance that are likely to operate in those contexts. For example, the extensive research conducted on "verbal learning" in the 1960s and 1970s largely involved associative learning tasks using simple words and nonsense syllables (e.g., see Underwood, 1966). With simplicity and artificiality comes greater opportunity for control.

This orientation directly supports the objectives of the basic learning or educational psychology researcher whose interests lie in testing the generalized theory associated with treatment strategies, independent of the specific methods used in their administration. Educational technology researchers, however, are directly interested in the interaction of medium and method (Kozma, 1991, 1994; Ullmer, 1994). To learn about this interaction, realistic media applications rather than artificial ones need to be established. In other words, external validity becomes as important a concern as internal validity.

Discussing these issues brings to mind a manuscript that one of us was asked to review about 5 years ago for publication in an educational research journal. The author's intent was to compare, using an experimental design, the effects on learning of programmed instruction and computer-based instruction. To avoid Clark's (1983) criticism of performing a media comparison, i.e., confounding media with instructional strategies (see 23.6), the author decided to make the two "treatments" as similar as possible in all characteristics except delivery mode. This essentially involved replicating the exact programmed instruction design in the CBI condition. Not surprisingly, the findings showed no difference between treatments, a direct confirmation of Clark's (1983) position. Unfortunately, this result (or, for that matter, one showing an actual treatment effect) would be meaningless for advancing theory or practice in educational technology (see 3.4). By stripping away the special attributes of a normal CBI lesson (e.g., interaction, sound, adaptive feedback, animation, etc.), all that remained were alternative forms of programmed instruction and the unexciting finding, to use Clark's (1983) metaphor, that groceries delivered in different, but fundamentally similar, ways still have the same nutritional value. Needless to say, this study, with its high internal validity but very low external validity, was evaluated as unsuitable for publication. Two more appropriate orientations for educational technology experiments are proposed in the following sections.

39.5.1.1. Randomized Field Experiments. Given the importance of balancing external validity (application) and internal validity (control) in educational technology research, an especially appropriate design is the randomized field experiment (Slavin, 1993) in which instructional programs are evaluated over relatively long periods of time under realistic conditions. In contrast to descriptive or quasiexperimental designs, the randomized field experiment requires random assignment of subjects to treatment groups, thus eliminating differential selection as a validity threat.
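
The random-assignment step itself is straightforward to operationalize. The sketch below is our own minimal illustration (in Python, with hypothetical subject IDs and condition names, not taken from any study cited here): the full roster is shuffled and then dealt out in turn, so that group sizes stay nearly equal while assignment remains independent of any subject characteristic.

    import random

    def randomly_assign(subjects, treatments, seed=None):
        """Shuffle the roster and deal subjects into treatment groups in turn."""
        rng = random.Random(seed)  # seeded generator so the assignment is reproducible
        roster = list(subjects)
        rng.shuffle(roster)
        groups = {t: [] for t in treatments}
        for i, subject in enumerate(roster):
            groups[treatments[i % len(treatments)]].append(subject)
        return groups

    # Hypothetical example: 90 students assigned to three mastery-criterion conditions
    students = [f"S{n:03d}" for n in range(1, 91)]
    conditions = ["ascending", "descending", "fixed"]
    assignment = randomly_assign(students, conditions, seed=2024)
    print({condition: len(members) for condition, members in assignment.items()})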

For example, Reiser, Driscoll, and Vergara (1987) randomly assigned undergraduate students in a mastery-oriented educational psychology course to one of three treatment groups differing in mastery criteria (ascending difficulty on unit quizzes, descending difficulty, and fixed criterion). At the end of 15 weeks, students completed a comprehensive final examination. Results indicated that those in the fixed criterion group proceeded through the course at the steadiest pace and performed better than the other students.

The obvious advantage of the randomized field experiment is high external validity. Had Reiser et al. (1987) assigned volunteer subjects to the same treatments for a 1- or 2-hour "experimental" lesson, the actual conditions of learning would have been substantially altered and likely to have yielded different results. On the other hand, the randomized field experiment concomitantly sacrifices internal validity, since its length and complexity permit interactions to occur with confounding variables. Reiser et al.'s (1987) results, for example, might have been influenced by subjects discussing the study and its different conditions with one another after class (e.g., diffusion of treatments). The experimental results from such studies, therefore, reflect "what really happens" from the combined effects of treatment and environmental variables rather than the pure effects of an isolated instructional strategy.

39.5.1.2. Basic-Applied Design Replications. Basic research designs demand a high degree of control to provide valid tests of principles of instruction and learning. Once a principle has been thoroughly tested with consistent results, the natural progression is to evaluate its use in a real-world application. For educational technologists interested in how learners are affected by new technologies, the question of which route to take, basic versus applied, may pose a real dilemma. Typically, existing theory and prior research on related interventions will be sufficient to raise the possibility that further basic research may not be necessary. Making the leap to a real-life application, however, runs the risk of clouding the underlying causes of obtained treatment effects due to their confounding with extraneous variables.

To avoid the limitations of addressing one perspective only, a potentially advantageous approach is to look at both using a replication design. "Experiment 1," the basic research part, would examine the variables of interest by establishing a relatively high degree of control and high internal validity. "Experiment 2," the applied component, would then reexamine the same learning variables by establishing more realistic conditions and high external validity. Consistency of findings across experiments would provide strong convergent evidence supporting the obtained effects and underlying theoretical principles. Inconsistency of findings, however, would suggest influences of intervening variables that alter the effects of the variables of interest when converted from their "pure" form to realistic applications. Such contamination may often represent "media effects," as might occur, for example, when feedback strategies used with print material are naturally made more adaptive (i.e., powerful and effectual) via interactive CBI (see Kozma, 1991). (For example, a learner who confuses discovery learning with inquiry learning in response to an inserted lesson question may be branched immediately to a remedial CBI frame that differentiates between the two approaches, whereas his or her counterpart in a parallel print lesson might experience the same type of feedback by having to reference the response selected on an answer page and manually locate the appropriate response-sensitive feedback in another section of the lesson.) The next implied step of a replication design would be further experimentation on the nature and locus of the altered effects in the applied situation. Several examples from the literature of the basic-applied replication orientation follow.

Example 1. In a repeated-measures experiment conducted by our research group, adult subjects were asked to indicate their preferences for screen designs representing differing degrees of text density (Ross, Morrison, Schultz, & O'Dell, 1989). In one experiment, high internal validity was established by having learners judge only the initial screen of a given text presentation, thus keeping the number of displays across higher- and lower-density variations constant. In realistic lessons, however, using lower-density displays requires the use of additional screens (or more scrolling) to view the content fully. Accordingly, a parallel experiment, having higher external validity but lower internal validity, was conducted in which the number of screens was allowed to vary naturally in accord with the selected density level.

Both experiments produced similar results, supporting higher- over lower-density displays, regardless of the quantity of screens that conveyed a particular density condition. Consequently, we were able to make a stronger case both for the theoretical assumption that higher density would provide greater contextual support for comprehending expository text and for the practical recommendation that such density levels be considered for the design of actual CBI lessons.

Example 2. In the previously described replication design by Winn and Solomon (1993), nonsense syllables were used as verbal stimuli in experiment 1. Findings indicated that the interpretation of diagrams containing verbal labels (e.g., "Yutcur" in box A and "Nipden" in box B) was mainly determined by syntactic rules of English. For example, if box B were embedded in box A, subjects were more likely to select, as an interpretation, "Yutcur are Nipden" than the converse description. However, when English words were substituted for the nonsense syllables (e.g., "sugar" in box A and "spice" in box B) in experiment 2, this effect was overridden by common semantic meanings. For example, "Sugar is spice" would be a more probable response than the converse, regardless of the diagram arrangement. Taken together, the two experiments supported theoretical assumptions about the influences of diagram arrangement on the interpreted meaning of concepts, while suggesting for designers that appropriate diagram arrangements become increasingly critical as the meaningfulness of the material decreases.

Example 3. Although using a descriptive rather than experimental design, Grabinger (1993) asked subjects to judge the readability of "model" screens that presented symbolic notation as opposed to real content in different formats (e.g., using or not using illustrations, status bars, headings, etc.). Using multidimensional scaling analysis, he found that evaluations were made along two dimensions: organization and structure. In a second study, he replicated the procedure using real content screens. Results yielded only one evaluative dimension that emphasized organization and visual interest. In this case, somewhat conflicting results from the basic and applied designs required the researcher to evaluate the implications of each relative to the research objectives. The basic conclusion reached was that while the results of study 1 were free from content bias, the results of study 2 more meaningfully reflected the types of decisions that learners make in viewing CBI information screens.

Example 4. Recently, Morrison et al. (1995) examined uses of different feedback strategies in learning from CBI. Built into the experimental design was a factor representing the conditions under which college student subjects participated in the experiment: simulated or realistic. Specifically, in the simulated condition, the students from selected education courses completed the CBI lesson to earn extra credit toward their course grade. The advantage of using this sample was increased internal validity, given that students were not expected to be familiar with the lesson content (writing instructional objectives) or to be studying it during the period of their participation. In the realistic condition, subjects were students in an instructional media course for which performance on the CBI unit (posttest score) would count toward their final average.

Interestingly, the results showed similar relative effects of the different feedback conditions; for example, knowledge of correct response (KCR) and delayed feedback tended to surpass no-feedback and answer-until-correct (AUC) feedback. Examination of learning process variables, however, further revealed that students in the realistic conditions performed better, while making greater and more appropriate use of instructional support options provided in association with the feedback. While the simulated condition was valuable as a more basic and purer test of theoretical assumptions, the realistic condition provided more valid insights into how the different forms of feedback would likely be used in combination with other learning resources on an actual learning task.

39.5.2 Assessing Multiple Outcomes in Educational Technology Experiments

The classic conception of an experiment might be to imagine two groups of white rats, one trained in a Skinner Box under a continuous schedule of reinforcement and the other under an intermittent schedule. After a designated period of training, reinforcement (food) is discontinued, and the two groups of rats are compared on the number of trials to extinction. That is, how long will they continue to press the bar even though food is withheld?

In this type of experiment, it is probable that the single dependent measure of "trials" would be sufficient to answer the research question of interest. In educational technology research, however, research questions are not likely to be resolved in so straightforward a manner (see 24.11). Merely knowing that one instructional strategy produced better achievement than another provides little insight into how those effects occurred or about other possible effects of the strategies. Earlier educational technology experiments, influenced by behavioristic approaches to learning (see 2.2), were often subject to this limitation.

For example, Shettel, Faison, Roshal, and Lumsdaine (1956) compared live lectures and identical film lectures on subjects (Air Force technicians) learning fuel and rudder systems. The dependent measures were immediate and delayed multiple-choice tests on three content areas. Two outcomes were significant, both favoring the live-lecture condition on the immediate test. Although the authors concluded that the films taught the material less well than the "live" lectures, they were unable to provide any interpretation as to why. Observation of students might have revealed greater attentiveness to the live lecture; student interviews might have indicated that the film audio was hard to hear; or a problem-solving test might have shown that application skills were low (or high) under both presentations.

Released from the rigidity of behavioristic approaches, contemporary educational technology experimenters are likely to employ more and richer outcome measures than did their predecessors. Two factors have been influential in promoting this development. One has been the predominance of cognitive learning perspectives in the past 2 decades (Tennyson, 1992; Tennyson & Rasch, 1988; Snow & Lohman, 1989; see also 5.2); the other has been the growing influence of qualitative research methods (see 40.1).

39.5.2.1. Cognitive Applications. In their comprehensive review paper, Snow and Lohman (1989) discuss the influences of cognitive theory on contemporary educational measurement practices. One key contribution has been the expansion of conventional assessment instruments so as to describe more fully the "cognitive character" of the target. Among the newer, cognitively derived measurement applications that are receiving greater usage in research are tests of declarative and procedural knowledge, componential analysis, computer simulations, faceted tests, and coaching methods, to name only a few.

Whereas behavioral theory stressed learning products, such as accuracy and rate, cognitive approaches also emphasize learning processes (Brownell, 1992). The underlying assumption is that learners may appear to reach similar destinations in terms of observable outcomes but take qualitatively different routes to arrive at those points. Importantly, the routes or "processes" used determine the durability and transferability of what is learned (Mayer, 1989). Process measures may include such variables as the problem-solving approach employed, level of task interest, resources selected, learning strategies used, and responses made on task. At the same time, the cognitive approach expands the measurement of products to include varied, multiple learning outcomes such as declarative knowledge, procedural knowledge, long-term retention, and transfer (Tennyson & Rasch, 1988).

This expanded approach to assessment is exemplified in a recent experiment by Hicken, Sullivan, and Klein (1992). The focus of the study was comparing two types of learner control ("FullMinus" vs. "LeanPlus") under two conditions of incentives (performance contingent vs. task contingent). In the FullMinus condition, learners could selectively bypass elements of a full instructional program, whereas in the LeanPlus condition, they could opt to add elements to a core program. Degree of learning, assessed via a posttest on the unit studied, reflected advantages for FullMinus learner control and performance-contingent incentives. This information alone, however, would have provided little insight into why those strategies were effective. Accordingly, Hicken et al. also examined learner-control option use (i.e., optional examples, practice examples, and review screens), which showed that FullMinus subjects used 80% of the options, whereas LeanPlus subjects used only 37%. Apparently, learners, given individual control over instruction, are inclined to choose the "default" option, which in the case of FullMinus produces exposure to higher levels of instructional support and, in turn, better learning. Further analyses showed typical patterns of option use by learners in the four conditions, time spent on the overall program and on option usage, and student attitudes. Using these multiple outcome measures, the researchers acquired a comprehensive perspective on how processes induced by the different strategies culminated in the learning products obtained.

Use of special assessments that directly relate to the treatment is illustrated in a study by Shin, Schallert, and Savenye (1994). Both quantitative and qualitative data were collected to determine the effectiveness of learner control with elementary students who varied in prior knowledge. An advisement condition that provided the subject with specific directions as to what action to take next was also employed. The quantitative data consisted of immediate and delayed posttest scores, preferences for the method, self-ratings of difficulty, and lesson completion time. The qualitative data included an analysis of the path each learner took through the materials. This analysis revealed that nonadvisement students became lost in the hypertext "maze" and often went back and forth between two sections of the lessons as though searching for a way to complete the lesson. In contrast, students who received advisement used the information to make the proper navigation decisions more than 70% of the time. Based on the qualitative analysis, the researchers concluded that advisement (e.g., orientation information, what to do next) was necessary when learners can freely access (e.g., learner control) different parts of the instruction at will. They also concluded that advisement was not necessary when the program controlled access to the instruction.

Another example of multiple and treatment-oriented assessments is found in Neuman's (1994) study on the applicability of databases for instruction. Neuman used observations of the students using the database, informal interviews, and document analysis (e.g., review of assignments, search plans, and search results). This triangulation of data provided information on the design and interface of the database. Had data collection been limited to only the number of citations found or used in the students' assignments, the results might have shown merely that the database was quite effective. Using a variety of sources allowed the researcher to make specific recommendations for improving the database rather than simply concluding that it was or was not beneficial.

39.5.2.2. Qualitative Research. In recent years, educational researchers have shown increasing interest in qualitative research approaches (see 40.2). Such research involves naturalistic inquiries using techniques such as in-depth interviews, direct observation, and document analysis (Patton, 1990). Unfortunately, judging from our personal experiences at recent AECT meetings, some researchers have reacted by viewing quantitative and qualitative paradigms as competing or even mutually exclusive (see Bruner, 1990). Our position, in congruence with what is likely the majority opinion (albeit a silent one at times), is that quantitative and qualitative research are each more useful when used together than when used alone (Warwick, 1990, as cited by Peshkin, 1993). Both provide unique perspectives, which, when combined, are likely to yield a richer and more valid understanding.

To date, experimentalists in educational technology research have been slow to incorporate qualitative measures as part of their overall research methodology. To illustrate how such an integration could be useful, we recall conducting an editorial review of a manuscript submitted for publication in ETR&D by Klein and Pridemore (1992). The focus of their study was the effects of cooperative learning and need for affiliation on performance and satisfaction in learning from instructional television. Findings showed benefits for cooperative learning over individual learning, particularly when students were high in affiliation needs. While we and the reviewers evaluated the manuscript positively, a shared criticism was the lack of data reflecting the nature of the cooperative interactions. It was felt that such qualitative information would have increased understanding of why the obtained treatment effects occurred. Seemingly, the same recommendation could be made for nearly any applied experiment on educational technology uses. The following excerpt from the published version of Klein and Pridemore (1992) illustrates the potential value of this approach:

... observations of subjects who worked cooperatively suggested that they did, in fact, implement these directions [to work together, discuss feedback, etc.]. After each segment of the tape was stopped, one member of the dyad usually read the practice question aloud. If the question was unclear to either member, the other would spend time explaining it ... [in contrast to individuals who worked alone] read each question quietly and would either immediately write their answer in the workbook or would check the feedback for the correct answer. These informal observations tend to suggest that subjects who worked cooperatively were more engaged than those who worked alone (p. 45).

Qualitative and quantitative measures can thus be used collectively in experiments to provide complementary perspectives on research outcomes.

39.5.3 Item Responses vs. Aggregate Scores as Dependent Variables

Consistent with the "expanded assessment" trend, educational technology experiments are likely to include dependent variables consisting of one or more achievement (learning) measures, attitude measures, or a combination of both types. In the typical case, the achievement or attitude measure will be a test comprised of multiple items. By summing item scores across items, a total or "aggregate" score is derived. To support the validity of this score, the experimenter may report the test's internal-consistency reliability (computed using Cronbach's alpha or the KR-20 formula) or some other reliability index. Internal consistency represents "equivalence reliability," the extent to which parts of a test are equivalent (Wiersma & Jurs, 1985). Depending on the situation, these procedures could prove limiting or even misleading with regard to answering the experimental research questions.
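
For readers who want the index made concrete, coefficient alpha can be computed directly from a subjects-by-items score matrix using the standard formula alpha = [k/(k-1)] x [1 - (sum of item variances / variance of total scores)]; with dichotomously scored items it reduces to KR-20. The short Python sketch below is our own illustration with invented ratings, not data from any study discussed here.

    import numpy as np

    def cronbach_alpha(item_scores):
        """Coefficient alpha for a (subjects x items) matrix of item scores.

        alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
        """
        scores = np.asarray(item_scores, dtype=float)
        k = scores.shape[1]
        item_variances = scores.var(axis=0, ddof=1)      # variance of each item across subjects
        total_variance = scores.sum(axis=1).var(ddof=1)  # variance of subjects' total scores
        return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

    # Invented data: 6 subjects rating a 4-item attitude scale (1-5)
    ratings = [
        [4, 5, 4, 5],
        [2, 3, 2, 2],
        [5, 5, 4, 4],
        [3, 3, 3, 4],
        [1, 2, 2, 1],
        [4, 4, 5, 5],
    ]
    print(round(cronbach_alpha(ratings), 2))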

A fundamental question to consider is whether the test is designed to measure a unitary construct (e.g., ability to reduce fractions or level of test anxiety) or multiple constructs (e.g., how much students liked the lesson and how much they liked using a computer). In the latter case, internal-consistency reliability might well be low, because students vary in how they perform or how they feel across the separate measures. Specifically, there may be no logical reason why good performances on, say, the "math facts" portion of the test should be highly correlated with those on the problem-solving portion (or why reactions to the lesson should strongly correlate with reactions to the computer). It may even be the case that the treatments being investigated are geared to affect one type of performance or attitude more than another. Accordingly, one caution is that, where multiple constructs are being assessed by design, internal-consistency reliability may be a poor indicator of construct validity. More appropriate indices would assess the degree to which: (a) items within the separate subscales intercorrelate (subscale internal consistency), (b) the makeup of the instruments conforms with measurement objectives (content validity), (c) students answer particular questions in the same way on repeated administrations (test-retest reliability), and (d) subscale scores correlate with measures of similar constructs or identified criteria (construct or predictive validity).

Separate from the test validation issue is the concern that aggregate scores may mask revealing patterns that occur across different subscales and items. We will explore this issue further by examining some negative and positive examples from actual studies.

39.5.3.1. Aggregating Achievement Results. Recently, we evaluated a manuscript for publication which described an experimental study on graphic aids. The main hypothesis was that such aids would primarily promote better understanding of the science concepts being taught. The dependent measure was an achievement test consisting of factual (fill-in-the-blank), application (multiple-choice and short answer), and problem-solving questions. The analysis, however, examined total score only in comparing treatments. Because the authors had not recorded subtest scores and were unable to rerun the analysis to provide such breakdowns (and, thereby, directly address the main research question), the manuscript was rejected.

39.5.3.2. Aggregating Attitude Results. More commonly, educational technology experimenters commit comparable oversights in analyzing attitude data. When attitude questions concern different properties of the learning experience or instructional context, it may make little sense to compute a total score, unless there is an interest in an overall attitude score. For example, in a study using elaborative feedback as a treatment strategy, students may respond that they liked the learning material but did not use the feedback. The overall attitude score would mask the latter, important finding.

For a brief illustration, we recall a manuscript recently submitted to ETR&D in which the author reported only aggregate results on a postlesson attitude survey. When individual item information was requested, the author replied, "the KR-20 reliability of the scale was .84; therefore, all items are measuring the same thing." While high internal-consistency reliability implies that the items are "pulling in the same direction," it does not necessarily mean that all items yielded equally positive responses. For example, as a group, learners might have rated the lesson material very high, but the instructional delivery very low. Such specific information might have been useful in furthering understanding of why certain achievement results occurred.

Effective reporting of item results was done by Welsh, Murphy, Duffy, and Goodrum (1993) in investigating the effects of different link displays for accessing information in a hypermedia system. The three displays were (a) an arrow indicating that some type of elaboration was available; (b) six unique icons, each designating a different elaboration type; and (c) a submenu structure. In addition to an assessment of the number and type of elaborations students accessed, one of the dependent measures was an attitude measure of "ease of reading." The analysis of total attitude scores showed, as the only trend, a predictable preference for the submenu (since the display was less cluttered). Individual item results further revealed that participants in general tended to agree with the statements, "The arrows that I clicked on in the text were distracting," and "A computer text screen without arrows would have been easier to read." The authors concluded on this basis that, regardless of the link strategy used, novice users of hypermedia are initially distracted by unfamiliar symbols embedded in the text (p. 31). More insight into user experiences was thus obtained relative to examining the aggregate score only. It is important to keep in mind, however, that the multiple statistical tests resulting from individual item analyses can drastically inflate the chances of making a type I error (falsely concluding that treatment effects exist). Use of appropriate statistical controls, such as MANOVA (see Table 39-1) or a reduced alpha (significance) level, is required.
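
To see why item-level testing demands such controls, note that with, say, six items each tested at alpha = .05, the chance of at least one false positive under the null hypothesis rises to roughly 1 - (.95)^6, or about .26, assuming independent tests. A Bonferroni-style adjustment divides the familywise alpha by the number of tests. The Python sketch below is a simple illustration with invented p values, not a reanalysis of the Welsh et al. (1993) data.

    # Bonferroni-style control of the familywise type I error rate across
    # item-level tests; the p values are invented for illustration only.

    def bonferroni(p_values, familywise_alpha=0.05):
        """Return the adjusted per-test alpha and the indices of tests that survive it."""
        per_test_alpha = familywise_alpha / len(p_values)
        survivors = [i for i, p in enumerate(p_values) if p < per_test_alpha]
        return per_test_alpha, survivors

    item_p_values = [0.001, 0.020, 0.004, 0.300, 0.048, 0.700]  # one test per attitude item
    alpha, flagged = bonferroni(item_p_values)
    print(f"per-item alpha = {alpha:.4f}; items significant after adjustment: {flagged}")

MANOVA, as noted above, offers an alternative that tests the set of item responses jointly before any item-level comparisons are examined.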

39.5.4 Media Studies vs. Media Comparisons

As confirmed by our analysis of trends in educational technology experimentation, a popular focus of the past was comparing different types of media-based instruction to one another or to teacher-based instruction to determine which approach was "best." The fallacy, or at least unreasonableness, of this orientation, now known as "media comparison studies," was forcefully explicated by Clark (1983) in his now classic article (see also Hagler & Knowlton, 1987; Petkovich & Tennyson, 1984; Ross & Morrison, 1989; Salomon & Clark, 1977). As previously discussed, in that paper Clark argued that media were analogous to grocery trucks that carry food but do not in themselves provide nourishment (i.e., instruction). It therefore makes little sense to compare delivery methods when instructional strategies are the variables that impact learning.

For present purposes, these considerations present a strong case against experimentation that simply compares media. Specifically, two types of experimental designs seem particularly unproductive in this regard. One of these represents treatments as amorphous or "generic" media applications, such as CBI, interactive video, Personalized System of Instruction, lecture, and the like. The focus of the experiment then becomes which medium "produces" the highest achievement. The obvious problem with such research is the confounding of results with numerous media attributes. For example, because CBI may offer immediate feedback, animation, and sound, while a print lesson may not, differences in outcomes from the two types of presentations would be expected to the extent the differentiating attributes impact criterion performance.

A second type of inappropriate media comparison experiment is to create artificially comparable alternative media presentations, such that both variations contain identical attributes but use different modes of delivery. In an earlier section, we described a study in which CBI and a print manual were used to deliver the identical programmed instruction lesson. The results, which predictably showed no treatment differences, revealed little about CBI's capabilities as a medium compared to those of print lessons. Similarly, to learn about television's "effects" as a medium, it seems to make more sense to use a program like Sesame Street as an exemplar (see, e.g., Reiser et al., 1988) than a "talking head" from a taped, unedited lecture. (This does not mean, however, that comparing the talking head to a live head would be inappropriate in an evaluation study of a particular instructional program that uses taped lectures.)

So where does this leave us with regard to experimentation on media differences? We propose that researchers consider two related orientations for "media studies." Both orientations involve conveying media applications realistically, whether "conventional" or "ideal" (cutting edge) in form. Both also directly compare educational outcomes from the alternative media presentations. However, as will be explained below, one orientation is deductive in nature and the other is inductive.

39.5.4.1. Deductive Approach: Testing Hypotheses about Media Differences. In this first approach, the purpose of the experiment is to test a priori hypotheses of differences between the two media presentations based directly on analyses of their different attributes (see Kozma, 1991, 1994). For example, it might be hypothesized that for teaching an instructional unit on a cardiac surgery procedure, a conventional lecture presentation might be superior to an interactive video presentation for facilitating retention of factual information, whereas the converse would be true for facilitating meaningful understanding of the procedure. The rationale for these hypotheses would be directly based on analyses of the special capabilities (embedded attributes or instructional strategies) of each medium in relation to the type of material taught. Findings would be used to support or refute these assumptions.

An example of this a priori search for media differences is the recent study by Aust, Kelley, and Roby (1993) on "hypereference" (online) and conventional paper dictionary use in foreign-language learning. Because hypereferences offer immediate access to supportive information, it was hypothesized and confirmed that learners would consult such dictionaries more frequently and with greater efficiency than they would conventional dictionaries.

39.5.4.2. Inductive Approach: Replicating Findings Across Media. The second type of study, which we have called media replications (Ross & Morrison, 1989), examines the consistency of effects of given instructional strategies delivered by alternative media. Consistent findings, if obtained, are treated as corroborative evidence to strengthen theoretical understanding of the instructional variables in question as well as claims concerning the associated strategy's effectiveness for learning. If inconsistent outcomes are obtained, methods and theoretical assumptions are reexamined and the target strategy subjected to further empirical tests using diverse learners and conditions. Key interests are why results were better or worse with a particular medium and how the strategy might be more powerfully represented by the alternative media. Subsequent developmental research might then explore ways of incorporating the suggested refinements in actual systems and evaluating those applications. In this manner, media replication experiments use an inductive, post hoc procedure to identify media attributes that differentially impact learning. At the same time, they provide valuable generalizability tests of the effects of particular instructional strategies.

The continuing debate on media effects (Clark, 1983, 1994; Kozma, 1994) is important for sharpening conceptualization of the role of media in enhancing instruction. However, Clark's focal argument that media do not affect learning should not be used as a basis for discouraging experimentation that compares educational outcomes using different media. In the first orientation reviewed above, the focus of the experiment is hypothesized effects on learning of instructional strategies embedded in media. In the second orientation, the focus is the identified effects of media in altering how those strategies are conveyed. In neither case is the medium itself conceptualized as the direct cause of learning. In both cases, the common goal is increasing theoretical and practical understanding of how to use media more effectively to deliver instruction.

