AECT Handbook of Research

Table of Contents

22: Adaptive Instructional Systems

22.1 Adaptive instructional systems: three approaches
22.2 Macro-adaptive instructional systems
22.3 Macro-adaptive instructional models
22.4 Micro-adaptive instructional models
22.5 Attitudes, on-task performance, and response-sensitive adaptation
22.6 Interactive communication in adaptive instruction
22.7 A model of adaptive instructional systems
22.8 Conclusion
References

22.4 Micro-adaptive instructional models

Although the research evidence has failed to show the advantage of the ATI approach for the development of adaptive instructional systems, research on aptitude constructs relevant to learning, on learning and instructional strategies, and on their interactions continues. However, the outlook is not optimistic for developing, in the near future, a comprehensive ATI model or set of principles for adaptive instruction that is empirically tractable and theoretically coherent. Thus, some researchers have attempted to establish micro-adaptive instructional models using on-task measures rather than pretask measures. On-task measures of student behavior and performance, such as response errors, response latencies, and emotional states, can be valuable sources for making adaptive instructional decisions during the instructional process. Such measures taken during the course of instruction can be applied to the manipulation and optimization of instructional treatments and sequences on a much more refined scale (Federico, 1983). Thus, micro-adaptive instructional models using on-task measures are likely to be more sensitive to the student's needs.

A typical example of micro-adaptive instruction is one-on-one tutoring. The tutor selects the most appropriate information to teach based on his or her judgment of the student's learning ability, including prior knowledge, intellectual ability, and motivation. Then, the tutor continuously monitors and diagnoses the student's learning process and determines the next instructional actions. The instructional actions could be questions, feedback, explanations, or others that maximize the student's learning. Although the instructional effect of one-on-one tutoring has long been recognized and empirically demonstrated (Bloom, 1984; Kulik, 1982), few systematic guidelines have been developed. That is, most tutoring activities are determined by the tutor's intuitive judgments about the student's learning needs and ability for the given task. Also, one-on-one tutoring is virtually impossible in most educational situations because of the lack of both qualified tutors and resources.

As the one-on-one tutorial process suggests, the essential element of micro-adaptive instruction is the ongoing diagnosis of the student's learning needs and the prescription of instructional treatments based on the diagnosis. Holland (1977) emphasized the importance of the diagnostic and prescriptive process by defining adaptive instruction as a set of processes by which individual differences in student needs are diagnosed in an attempt to present each student with only those teaching materials necessary to reach proficiency in the terminal objectives of instruction. Landa (1976) also said that adaptive instruction is the diagnostic and prescriptive processes aimed at adjusting the basic learning environment to the unique learning characteristics and needs of each learner. According to Rothen and Tennyson (1978), the diagnostic process should assess a variety of learner indices (e.g., aptitudes and prior achievement) and characteristics of the learning task (e.g., difficulty level, content structure, and conceptual attributes). Hansen, Ross, and Rakow (1977) described the instructional prescription as a corrective process that facilitates a more appropriate interaction between the individual learner and the targeted learning task by systematically adapting the allocation of learning resources to the learner's aptitudes and recent performance.

Instructional researchers and developers have different views about the variables, indices, procedures, and actions that should be included in the diagnostic and prescriptive processes. For example, Atkinson (1976) argued that an adaptive instructional system should have the capability of varying the sequence of instructional actions as a function of a given learner's performance history. According to Rothen and Tennyson (1977), a strategy for selecting the optimal amount of instruction and time necessary to achieve a given objective is the essential ingredient in an adaptive instructional system. This observation suggests that different adaptive systems have been developed to adapt different features of instruction to learners in different ways.

Micro-adaptive instructional systems have been developed through a series of different attempts, beginning with programmed instruction and extending to the recent application of artificial intelligence (AI) methodology for the development of intelligent tutoring systems (ITS) (see 19.3 to 19.5).

22.4.1 Programmed Instruction

Skinner has generally been considered the pioneer of programmed instruction (see 2.3.4). However, three decades before Skinner (1954, 1958), Pressey (1926) used a mechanical device to assess a student's achievement and to provide further instruction in the learning process. The mechanical device, which used a keyboard, presented a series of multiple-choice questions and required the student to respond by pressing the appropriate key. If the student pressed the correct key to answer the question, the device would present the next question. However, if the student pressed a wrong key, the device would ask the student to choose another answer without advancing to the next question. Using Thorndike's (1913) "Law of Effect" as the theoretical base for the teaching methodology incorporated in his mechanical device, Pressey (1927) claimed that its purpose was to ensure mastery of a given instructional objective. If the student correctly answered two questions in succession, mastery was accomplished, and no additional questions were given. The device also recorded responses to determine whether the student needed more instruction (further questions) to master the objective. According to Pressey, this made use of a modified form of Thorndike's "Law of Exercise." Little's (1934) study demonstrated the effectiveness of Pressey's testing-drill device against a testing-only device.

Skinner (1954) criticized Pressey's work by stating that it was not based on a thorough understanding of learning behavior. However, Pressey's work contained some noticeable instructional principles (see 2.3.4.2). First, he brought the mastery learning concept into his programmed instructional device, although the determination of mastery was arbitrary and did not consider measurement or testing theory. Second, he considered the difficulty level of the instructional objectives, suggesting that more difficult objectives would need additional instructional items (questions) for the student to reach mastery. Finally, his procedure exhibited a diagnostic characteristic in that, although the criterion level was based on intuition, he determined from the student's responses whether or not more instruction was needed.

Using Pressey's (1926, 1927) basic idea, Skinner (1954, 1958) designed a teaching machine to arrange contingencies of reinforcement in school learning (see 2.3.4.1). The instructional program format used in the teaching machine had the following characteristics: (a) It was made up of small, relatively easy-to-learn steps; (b) the student had an active role in the instructional process; and (c) positive reinforcement was given immediately following each correct response. In particular, Skinner's (1968) linear programmed instruction emphasized an individually different learning rate. However, the programmed material itself was not individualized since all students received the same instructional sequence (Cohen, 1963). In 1959, Pressey criticized this nonadaptive nature of the Skinnerian programmed instruction.

The influx of technology influenced Crowder's (1959) procedure of intrinsic programming, which branched able students through the same material more rapidly than slower students, who received remedial frames whenever a question was missed (see 2.3.4.2). Crowder's intrinsic program was based totally on the nature of the student's response. The response to a particular frame was used both to determine whether the student learned from the preceding material and to determine the material to be presented next. The student's response was thought to reflect her or his knowledge rate, and the program was designed to adapt to that rate. Having provided only a description of his intrinsic programming, however, Crowder revealed no underlying theory or empirical evidence that could support its effectiveness against other kinds of programmed instruction. Because of the difficulty in developing tasks that required review sections for each alternative answer, Crowder's procedure was not widely used in instructional situations (Merrill, 1971).

In 1957, Pask described a perceptual motor training device in which differences in task difficulty were considered for different learners. The instructional target was made progressively more difficult until the student made an error, at which point the device would make the target somewhat easier to detect. From that point, the level of difficulty would build again. Remediation consisted of a step backward on a difficulty dimension to provide the student with further practice on the task. Pask's (1960a, 1960b) Solartron Automatic Keyboard Instructor (SAKI) was capable of electronically measuring the student's performance and storing it in a diagnostic history that included response latency, error number, and error pattern. On the basis of this diagnostic history, the machine prescribed the exercises to be presented next and varied the rate and amount of material presented in accordance with the student's proficiency. Lewis and Pask (1965) demonstrated the effectiveness of Pask's device by testing the hypothesis that adjusting difficulty level and amount of practice would be more effective than adjusting difficulty level alone. Though the application of the device was limited to instruction of perceptual motor tasks, Pask (1960a) described a general framework for the device that included instruction of conceptual as well as perceptual motor tasks.

As described above, most early programmed instruction methods relied primarily on intuitions about the school learning process rather than on a particular model or theory of learning, instruction, or measurement. Although some of the methods were designed on a theoretical basis (for example, Skinner's teaching machine), they were primitive in terms of adapting the learning environment to the individual differences of students. However, programmed instruction did provide some important implications for the development of more sophisticated instructional strategies made possible by advances in computer technology.

22.4.2 Micro-Adaptive Instructional Models

Using computer technology, a number of micro-adaptive instructional models have been developed. An adaptive instructional model differs from programmed instruction techniques in that it is based on a particular model or theory of learning, and its adaptation of the learning environment is rather sophisticated, whereas early programmed instruction was primarily based on intuition and its adaptation was primitive. Unlike macro-adaptive models, the micro-adaptive model uses the temporal nature of learner abilities and characteristics as a major source of diagnostic information on which an instructional treatment is prescribed. Thus, an attribute of a micro-adaptive model is its dynamic nature, as contrasted with a macro-adaptive model. A typical micro-adaptive model includes more variables related to instruction than a macro-adaptive model or programmed instruction. It thus provides a better control process than a macro-adaptive model or programmed instruction in responding to the student's performance in reference to the type of content and behavior required in a learning task (Merrill & Boutwell, 1973).

As described by Suppes, Fletcher, and Zanotti (1976), most micro-adaptive models use a quantitative representation and trajectory methodology. The most important feature of a micro-adaptive model relates to the timeliness and accuracy with which it can determine and adjust learning prescriptions during instruction. A conventional instructional method identifies how the student answers but does not identify the reasoning process that leads the student to that answer. An adaptive model, however, relies on the different processes that lead to given outcomes. Discrimination between the different processes is possible when on-task information is used. The importance of the adaptive model is not that the instruction can correct each mistake, but that it attempts to identify the psychological cause of mistakes and thereby lower the probability that such mistakes will occur again.

Several examples of micro-adaptive models are described in the following sections. Although some of these models are a few decades old, an attempt was made to provide a rather detailed review because the theoretical bases and technical (nonprogramming) procedures used in these models are still relevant and valuable in identifying research issues related to adaptive instruction and in designing future adaptive systems. In particular, considering that some theoretical issues and ideas proposed in these models could not be fully explored because of the limited computing power of the time, the review may suggest some valuable research and development agendas.

22.4.2.1. Mathematical Model. According to Atkinson (1972), an optimal instructional strategy must be derived from a model of learning. In mathematical learning theory, two general models describe the learning process: a linear (or incremental) model and an all-or-none (or one-element) model. From these two models, Atkinson and Paulson (1972) deduced three strategies for prescribing the most effective instructional sequence for a few special subjects, such as foreign-language vocabulary (Atkinson, 1968, 1974, 1976; Atkinson & Fletcher, 1972).

In the linear model, learning is defined as the gradual reduction in probability of error by repeated presentations of the given instructional items. The strategy in this model orders the instructional materials without taking into account the student's responses or abilities, since it is assumed that all students learn with the same probability. Because the probability of student error on each item is determined in advance, prediction of his or her success depends only on the number of presentations of the items.
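The response-insensitive character of the linear model can be sketched in a few lines. The geometric form and the parameter values below are illustrative assumptions, not taken from the chapter:

```python
def error_prob(q1, theta, n):
    """Error probability on the n-th presentation under a linear
    (incremental) model: each presentation reduces the error
    probability by the fixed proportion theta, regardless of the
    student's actual responses. q1 is the initial error probability.
    This geometric form is one common formulation, assumed here
    for illustration."""
    return q1 * (1 - theta) ** (n - 1)

# Every student is assumed to learn at the same rate, so predicted
# success depends only on how many times an item has been presented.
print([round(error_prob(0.8, 0.25, n), 3) for n in (1, 2, 3, 4)])
```

Note that the student's responses never enter the computation: the prescription is fixed in advance by the presentation count alone.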

In the all-or-none model, learning an item is not gradual but occurs on a single trial. An item is in one of two states, a learned state or an unlearned state. If an item in the learned state is presented, the correct response is always given; however, if an item in the unlearned state is presented, an incorrect response is given unless the student makes a correct response by guessing. The optimal strategy in this model is to select for presentation the item least likely to be in the learned state, because once an item has been learned, there is no further reason to present it again. If an item in the unlearned state is presented, it changes to the learned state with a probability that remains constant throughout the procedure. Unlike the strategy in the linear model, this strategy is response sensitive. A student's response protocol for a single item provides a good index of the likelihood of that item's being in the learned state (Groen & Atkinson, 1966). This response-sensitive strategy used a dynamic programming technique (Smallwood, 1962). Dynamic programming is a method for finding an optimal strategy by systematically varying the number of learning stages and obtaining an expression that gives the return for a process with n stages as a function of the return for a process with n - 1 stages. The operational function in a deterministic process of the all-or-none strategy is Wn = T(Wn-1, dn-1), where W is the student's learning state, n is the stage, and d is the decision (Groen & Atkinson, 1966). In this strategy, the items should be presented at well-spaced intervals, because the strategy is not effective under massed presentation (Dear, Silberman, Estavan & Atkinson, 1967).
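The response-sensitive selection rule can be sketched as follows. This is a minimal illustration assuming particular values for the learning, guessing, and prior parameters; it is not the actual Groen and Atkinson implementation:

```python
def p_learned_after(responses, c=0.3, g=0.25, p0=0.1):
    """Probability that an item is in the learned state, given its
    response history (True = correct), under the all-or-none model.
    The parameter values are illustrative assumptions: c = probability
    an unlearned item becomes learned on a presentation, g = guessing
    probability, p0 = prior probability of starting learned."""
    p = p0
    for correct in responses:
        # Bayes step: weight each state by the likelihood of the
        # observed response (a learned item never errs; an unlearned
        # item is correct only by guessing).
        like_learned = 1.0 if correct else 0.0
        like_unlearned = g if correct else 1.0 - g
        denom = p * like_learned + (1 - p) * like_unlearned
        p = (p * like_learned) / denom if denom > 0 else p
        # Transition step: the presentation itself may teach the item.
        p = p + (1 - p) * c
    return p

def next_item(histories):
    """Response-sensitive selection: present the item least likely
    to be in the learned state."""
    return min(histories, key=lambda item: p_learned_after(histories[item]))

# Hypothetical vocabulary items with per-item response histories.
histories = {"casa": [True, True], "perro": [False], "libro": []}
print(next_item(histories))  # "libro": never presented, weakest evidence
```

The key contrast with the linear model is that the response history, not the presentation count, drives the next selection.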

This all-or-none model does not depend on the values of its parameters. In other words, such parameters as the probability of changing an item in the unlearned state to the learned state with a presentation of the item, probability of making a correct response by guessing, and the initial error probability are constant throughout all stages because of the assumption that items and students are homogeneous.

On the basis of Norman's (1964) work, Atkinson and Paulson (1972) proposed the random-trial incremental model, a compromise between the linear and all-or-none models. The instructional strategy derived for this model is parameter dependent, allowing the parameters to vary with student abilities and item difficulty. This strategy determines which item, if presented, has the best expected immediate gain, using a reasonable approximation (Calfee, 1970). The initial parameter of a student-item pair (πij) is estimated with an analysis-of-variance model: E(πij) = m + ai + dj, where m is the mean, ai is the ability of student i, and dj is the difficulty of item j. Because this equation cannot guarantee that the parameters are probabilities (0 < πij < 1), it is transformed into a logistic equation through an algebraic operation and a logarithmic procedure (see Atkinson & Paulson, 1972). The logistic equation is logit πij = μ + Ai + Dj, where μ is the mean, Ai is the ability of student i applied across all items, and Dj is the difficulty of item j applied across all students. This logistic equation reduces the number of parameters from N items × S students to N + S parameters. The student and item effects (Ai and Dj) are estimated by standard analysis-of-variance procedures. Using the estimated student and item effects, logit πij is transformed back to obtain the final estimates of the original student-item parameter.
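The parameter reduction from N × S student-item values to N + S student and item effects can be sketched with a logit transform and row/column means standing in for the analysis-of-variance estimates. The data values below are hypothetical:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

# Hypothetical probabilities pi[i][j] for 3 students x 4 items;
# illustrative values, not data from the chapter.
pi = [
    [0.20, 0.35, 0.50, 0.60],
    [0.10, 0.20, 0.30, 0.45],
    [0.30, 0.45, 0.60, 0.70],
]

# Work on the logit scale, where the model is additive:
# logit(pi_ij) = mu + A_i + D_j.
L = [[logit(p) for p in row] for row in pi]
mu = sum(sum(row) for row in L) / (len(L) * len(L[0]))    # grand mean
A = [sum(row) / len(row) - mu for row in L]               # student effects A_i
D = [sum(L[i][j] for i in range(len(L))) / len(L) - mu    # item effects D_j
     for j in range(len(L[0]))]

# 3 x 4 = 12 parameters are summarized by 3 + 4 effects (plus mu);
# any student-item estimate is recovered by transforming back.
est = inv_logit(mu + A[0] + D[2])
print(round(est, 3))
```

With real data, the effects would be estimated from observed responses rather than read off known probabilities, but the additive-on-the-logit-scale structure is the same.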

Atkinson and Crothers (1964) assumed that the all-or-none model provided a better account of data than the linear model, and that the random-trial incremental model was better than either of them. This assumption was supported by testing the effectiveness of the strategies (Atkinson, 1976).

The all-or-none strategy was more effective than the standard linear procedure for spelling instruction, while the parameter-dependent strategy was better than the all-or-none strategy for teaching foreign vocabularies (Lorton, 1972).

In the context of instruction, cost-benefit analysis is one of the key elements in describing the learning process and determining instructional actions (Atkinson, 1972). In the mathematical adaptive strategies, however, it is assumed that the costs of instruction are equal for all strategies, since the instructional formats and the time allocated to instruction are all the same. If both costs and benefits vary significantly in a problem, then it is essential that both quantities be estimated accurately. Smallwood (1970, 1971) treated this problem by incorporating a utility function into the mathematical model. The utility function, Qa(k, h), specifies the immediate value accrued if alternative a is presented to a student with response history h, and k is the response elicited. The terminal utility function, U(h), describes the utility associated with terminating the instruction for a student with past history h. Smallwood's (1971) economic teaching strategy is a special form of the all-or-none model strategy, except that it can be applied to an instructional situation in which the instructional alternatives have different costs and benefits.

Recently, Townsend (1992) and Fisher and Townsend (1993) applied a mathematical model to the development of a computer simulation and testing system for predicting the probability and duration of student responses in the acquisition of Morse code classification skills. The mathematical adaptive model, however, has never been widely used, probably because the learning process in the model is oversimplified and the applicability is limited to a relatively simple range of instructional contents.

There are several criticisms of the mathematical adaptive instructional models. First, the learning process in the mathematical model is oversimplified when implemented in a practical teaching system. Yet it may not be so simple to quantify the transition probability of a learning state and the response probabilities that are uniquely associated with the student's internal states of knowledge and with the particular alternatives for presentation (Glaser, 1976). Although quantitative knowledge can be obtained about how the variables in the model interact, reducing computer decision time has little overall importance if the system can handle only a limited range of instructional materials and objectives, such as foreign-language vocabulary items (Gregg, 1970). Also, a two-state, three-state, or n-state model cannot be chosen arbitrarily, because the values for the transitional probabilities of a learning state can change depending on how one chooses to aggregate over states. Nor can the response probabilities be assumed to be equally likely in a multiple-choice test question; this kind of assumption would hold only for homogeneous materials and highly sophisticated preliminary item analyses (Gregg, 1970).

Another disadvantage of the mathematical adaptive model is that its estimates for instructional diagnosis and prescription cannot be reliable until a significant amount of student and content data has been accumulated. For example, the parameter-dependent strategy is supposed to predict the performance of other students, or of the same student on other items, from the estimates computed by the logistic equation. However, the first students in an instructional program employing this strategy do not benefit from the program's sensitivity to individual differences in students or items, because the initial parameter estimates must be based on data from these students. Thus, the effectiveness of this strategy is questionable unless the instructional program continues over a long period of time.

Atkinson (1972) admitted that the mathematical adaptive models are very simple, and the identification of truly effective strategies will not be possible until the learning process is better understood. However, Atkinson (1972, 1976) contended that an all-inclusive theory of learning is not a prerequisite for the development of optimal procedures. Rather, a model is needed that captures the essential features of that part of the learning process being tapped by a given instructional task.

22.4.2.2. Trajectory Model: Multiple Regression Analysis Approach. In a typical adaptive instructional program, the diagnostic and prescriptive decisions are frequently made based on the estimated contribution of one or two particular variables; the possible contributions of other variables are ignored. In a trajectory model, however, numerous variables can be included with the use of a multiple regression technique to yield what may be a more powerful and precise predictive base than is obtained by considering a particular variable alone.

The theoretical view in the trajectory model is that the expected course of the adaptive instructional trajectory is determined primarily by generic or trait factors that define the student group. The actual proceeding of the trajectory is dependent on the specific effects of individual learner parameters and variables derived from the task situation (Suppes, Fletcher & Zanotti, 1976). Using this theoretical view, Hansen, Ross, and Rakow (1977; Ross & Rakow, 1982; Ross & Morrison, 1988) developed an adaptive model that reflects both group and individual indices and matches them to appropriate changes both for predictions on entry and adjustments during the treatment process. The model was developed to find an optimal strategy for selecting the appropriate number of examples in a mathematical rule-learning task.

The procedures Hansen et al. used to develop an adaptive system using the trajectory model are as follows:

(a) Learning and test materials were prepared (for example, the instructional unit consisted of 10 basic algebra rules), and the predictive input database was obtained from two measures of personality variables (locus of control and trait anxiety), one measure of general aptitude related to the task (math and verbal), and one measure of subject familiarity (a pretest). Upon completion of the pretest, the subject was given the programmed manual and task instructions. After working through the manual, the student took the posttest, which was matched to the pretest in number of items, format, and level of difficulty. The measures of the four entry variables and the posttest score provided the predictive database for the formulation of adaptive grouping.

(b) With the cluster analysis technique, students who had similar characteristics according to the predictive database were clustered into one of a reasonably small number of mutually exclusive groups. The purpose of grouping was to aggregate students so that those within a group were relatively homogeneous among themselves and relatively different from students in other groups. Hansen et al. assumed that, for instructional purposes, approximately three to five groups best characterize the cultural and psychological characteristics that are to be differentially treated.

(c) The new students who would receive the adaptive treatments were classified into one of the groups by discriminant analysis. This is a method used to seek the linear combination of variables that will maximize the difference between the groups relative to the difference within the groups.

(d) Multiple regression analysis was used to derive differential predictions about the number of instructional items (examples) to assign to the student. From regression equations based on group parameter characteristics, initial performance estimates were derived for all subjects. In order to derive a decision rule for converting the performance estimate on the test into the prescribed number of examples, a quasi-standard score (Z score) procedure was employed. To systematize the matching of Z scores to example prescriptions, the latter were treated as whole numbers on a score continuum having a median number and a range from the minimum to the maximum number of examples for learning each rule (for example, minimum = 2, median = 6, and maximum = 10). A student who had a predicted score close to the mean (Z = 0) on a given rule received a prescription of the median number of examples. If the student was predicted to perform below or above the mean, he or she received more or fewer examples, respectively. For example, a student predicted to score more than one standard deviation below the mean was given nine examples. This decision rule was arbitrary.

(e) The initial prescriptions derived from the group characteristics were refined during instruction on the basis of the student's performance on the immediately preceding rule posttest (termed the minitest). The decision rule employed in making this refinement was again arbitrary. For example, two examples were added following the rule prescription for a minitest score of 0; one example was added for a minitest score of 1; one example was subtracted for a minitest score of 3; two examples were subtracted for a minitest score of 4; and no adjustment was made for a minitest score of 2. These adjustments were made only on the next rule in the sequence. To maintain the arbitrarily established boundaries, the prescriptions were not allowed to vary beyond the minimum or maximum number of examples, regardless of minitest performance.
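Steps (d) and (e) can be sketched as a pair of decision rules. The minitest adjustments follow step (e); the Z-score bands, apart from the two anchor points given in the text (the median prescription at Z = 0 and nine examples more than one standard deviation below the mean), are illustrative assumptions, as are the default boundaries:

```python
def prescribe(z, minimum=2, median=6, maximum=10):
    """Map a predicted quasi-standard (Z) score to a number of
    examples for a rule. The intermediate band widths are
    illustrative; the chapter notes the original rule was arbitrary."""
    if z <= -1.0:
        n = 9        # predicted well below the mean: many examples
    elif z < -0.25:
        n = 7
    elif z <= 0.25:
        n = median   # near the mean: the median prescription
    elif z < 1.0:
        n = 4
    else:
        n = minimum  # predicted well above the mean: few examples
    return max(minimum, min(maximum, n))

def adjust(prescription, minitest_score, minimum=2, maximum=10):
    """On-task refinement from the preceding rule's minitest (0-4),
    matching the adjustments described in step (e) and clamped to
    the established boundaries."""
    delta = {0: +2, 1: +1, 2: 0, 3: -1, 4: -2}[minitest_score]
    return max(minimum, min(maximum, prescription + delta))

print(prescribe(0.0))             # 6: the median prescription
print(adjust(prescribe(0.0), 0))  # 8: two examples added after a score of 0
```

The design point is the two-step structure: a group-based initial prescription, then a small response-sensitive correction applied only to the next rule in the sequence.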

Hansen et al. (1977) assessed their trajectory adaptive model with a validation study that supported the basic tenets of the model. A desirable number of groups (four) with differential characteristics was found, and the outcomes were as predicted: superior for the adaptive group, highly positive for the cluster group, and poorest for the mismatched groups. The outcome of the regression analysis revealed that the pretest yielded the largest amount of explained variance within the regression coefficient. The math and reading comprehension measures seemed to contribute to the assignment of the broader skill domain involved in the learning task. However, the two personality measures varied in terms of direction as well as magnitude.

This regression model is apparently helpful in estimating the relative importance of different variables for instruction. However, it does not seem to be a very useful adaptive instructional strategy. Even though many variables can be included in the analysis process, the evaluation study results indicate that only one or two are needed in the instructional prescription process because of the inconsistent or negligible contributions of other variables to the instruction. Unless the number of students to be taught is large, this approach cannot be effective, since the establishment of the predictive database in advance requires a considerable number of students, and the strategy cannot be applied to those students who make up the initial database. Furthermore, a new predictive database has to be established whenever the characteristics of the learning task are changed. Transforming the student's score, as predicted from the regression equation, into the necessary number of examples does not have strong justification when a quasi-standard score procedure is used. The decision rules for adjusting the instructional treatment during on-task performance, as well as for the initial instructional prescription, are entirely arbitrary. Since the regression analyses are based on group characteristics, shrinkage of the degrees of freedom due to reduced sample size may raise questions about the value of this approach.

To offset the shortcoming of the regression model that is limited to the adaptation of instructional amount (e.g., selection of the number of examples in concept or rule learning), Ross and Morrison (1988) attempted to expand its functional scope by adding the capability for selecting the appropriate instructional content based on the student's interest and other background information. This contextual adaptation was based on empirical research evidence that the personalized context based on an individual student's interest and orientation facilitates the student's understanding of the problem and learning of the solution. A field study demonstrated the effectiveness of the contextual adaptation (Ross & Anand, 1986).

Ross and Morrison (1988) further extended their idea of contextual adaptation by allowing the system to select different densities (or "detailedness") of textual explanation based on the student's predicted learning needs. The predicted learning needs were estimated using a multiple regression model described above. A preliminary evaluation study showed the superior effect of the adaptation of contextual density over a standard contextual density condition or learner-control condition.

Ross and Morrison's approaches to contextual adaptation alone cannot be considered micro-adaptive systems, because they do not have the capability of performing ongoing diagnosis and prescription generation during task performance. Their diagnostic and prescriptive decisions are made on the basis of preinstructional data. The contextual adaptation approach, however, can be a significant addition to a micro-adaptive model like the regression analysis approach, which has only a limited capability for adapting the quality of instruction, including the content. Although we presume that the contextual adaptation approaches were originally developed with the intent of being incorporated into the regression analysis model, the incorporation has not yet been fully accomplished.

22.4.2.3. Bayesian Probability Model. The Bayesian probability model employs a two-step approach for adapting instruction to individual students. After the initial assignment of the instructional treatment is made on the basis of preinstructional measures (e.g., pretest scores), the treatment prescription is continuously adjusted according to student on-task performance data.

To operationalize this approach in CBI, a Bayesian statistical model was used. Bayes's theorem of conditional probability seems appropriate for the development of an adaptive instructional system because it can predict the probability of mastery of the new learning task from the student's preinstructional characteristics and then continuously update that probability according to on-task performance data (Rothen & Tennyson, 1978; Tennyson & Christensen, 1988). Accordingly, the instructional treatment is selected and adjusted.

The functional operation of this model is related to guidelines described by Novick and Lewis (1974) for determining the minimal length of a test adequate to provide sufficient information about the learner's degree of mastery of the behavior being tested. Novick and Lewis's procedure uses a pretest on a set of objectives. From this pretest, the initial prior estimate of a student's ability per objective is combined in a Bayesian manner with information accumulated from previous students to generate a posterior estimate (using the beta, β, distribution) of the student's probability of mastery of each objective. This procedure generates a table of values for different test lengths for the objectives and selects from this table a number of test items that seem adequate to predict mastery of each objective. Rothen and Tennyson (1978) modified Novick and Lewis's (1974) model in such a way that a definite rule or algorithm selects an instructional prescription from the table of generated values. In addition, this prescription is updated according to the individual student's on-task learning performance. The implementation of this procedure requires the establishment of three parameters:

(a) An estimate is made of the student's initial ability based on prior knowledge. The beta distribution is used to characterize this information in probabilistic terms. This involves making an initial estimate of probability by administering a pretest and comparing the score to historically accumulated data (i.e., information collected from previous students). Rothen and Tennyson used the procedure described by Novick and Jackson (1974) for selecting a particular beta distribution to characterize prior beliefs. Novick and Lewis (1974) suggested using a prior distribution, β(a, b), which assigns a probability slightly greater than .5 to the region above the criterion level set in advance to determine mastery of the given objective.

(b) A criterion level (π₀) for the objective is set. To decide on a student's attainment of mastery, it is necessary to select a minimum acceptance probability that the student's true level (π) exceeds or equals the criterion. For a test of length n with a student score of x, a value of π₀ must be selected such that P(π ≥ π₀ | x, n) ≥ .5. This is equivalent to at least 50% certainty that the student's level of functioning is above π₀.

(c) The loss ratio (R) is defined as the ratio of the disutility associated with a false advance decision to that of a false retain decision. R refers to the relative losses associated with advancing a learner whose true level of functioning is below π₀ and retaining a learner whose true level exceeds π₀.

Specification of these parameters, β(a, b), R, and π₀, affects the minimum necessary instructional presentation. As the prior probability of mastery approaches unity, the length of the instructional presentation decreases. A large loss ratio increases the length of the instructional presentation to allow the possibility of a high posterior probability of mastery. As the criterion level approaches 1, the instructional length increases to provide adequate information about the student's level of functioning in the region of the criterion.

The amount of instruction is selected by establishing the operating level for the student. The operating level is updated with each on-task response and compared to the posterior distribution. If the student's operating level is greater than or equal to the posterior probability, the student is judged to have mastered the objective, and no further instruction is given. If the student's operating level is below that generated from the posterior distribution, his or her posterior distribution is used as the prior distribution, with the same parameters for the criterion level and loss ratio as before, and a new instructional presentation is generated. This procedure is applied iteratively until either the student is judged to have mastered the objective or the pool of instructional materials is exhausted.
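The iterative update-and-test cycle above can be sketched compactly if the beta parameters are treated as integer pseudo-counts, which permits an exact mastery probability via the identity relating the beta and binomial distributions. The batch size, criterion level, and acceptance probability in this sketch are illustrative, not values from Rothen and Tennyson.

```python
from math import comb

def prob_mastery(a: int, b: int, criterion: float) -> float:
    """P(pi >= criterion) under a Beta(a, b) posterior over the student's
    true ability pi. Uses the identity I_x(a, b) = P(Binomial(a+b-1, x) >= a),
    valid for integer a and b, so no special-function library is needed."""
    n = a + b - 1
    return sum(comb(n, k) * criterion**k * (1 - criterion)**(n - k)
               for k in range(a))

def adaptive_lesson(prior, responses, criterion=0.8, accept=0.5, batch=5):
    """Present instruction in batches, fold each batch of correct/incorrect
    responses into the beta posterior, and stop when the posterior mastery
    probability reaches the acceptance level or the item pool is exhausted.
    `prior` holds pseudo-counts derived from the pretest and historical data."""
    a, b = prior
    presented = 0
    while presented < len(responses):
        batch_items = responses[presented:presented + batch]
        correct = sum(batch_items)              # 1 = correct, 0 = incorrect
        a += correct
        b += len(batch_items) - correct         # posterior becomes new prior
        presented += len(batch_items)
        if prob_mastery(a, b, criterion) >= accept:
            return "advance", presented
    return "retain", presented                  # materials pool exhausted
```

A student answering consistently correctly advances after a short presentation, while a consistently incorrect student is retained once the pool runs out, mirroring the advance/retain decisions governed by the loss ratio in the text.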

Studies by Tennyson and his associates (see Tennyson & Christensen, 1988) demonstrated the effectiveness of the Bayesian probabilistic adaptive model in selecting the appropriate number of examples in concept learning. Posttest scores showed that the adaptive group performed significantly better than the nonadaptive groups. In particular, students in the adaptive group required significantly less learning time than students in the nonadaptive groups. This model was also effective in selecting the appropriate amount of instructional time for each student based on his or her on-task performance (Tennyson & S. Park, 1984; Tennyson, Park & Christensen, 1985).

If the instructional system uses mastery learning as its primary goal (Glaser, 1963) and adjustment of the instructional treatment is critical for learning, this model may be ideal. Another advantage of this model is that no assumption of instructional-item homogeneity (in content or difficulty) is needed. A questionable aspect of the model, however, is whether variables other than prior achievement and on-task performance can be effectively incorporated. Tennyson and Rothen (1977) used a task-related aptitude measure (logical reasoning ability) in deciding the loss ratio and included a response-confidence measure in weighting the on-task performance score. However, these procedures were employed without a theoretical base. Another difficulty of this model is constructing a prior distribution from the pretest score and the historical information collected from previous students. Although Hambleton and Novick (1973) suggested the possibility of using the student's performance level on other referral tasks as the historical data, this model cannot be utilized until enough historical data are accumulated. Also, the application of this model is limited to rather simple tasks such as concept and rule learning.

Park and Tennyson (1980, 1986) extended the function of the Bayesian model by incorporating a sequencing strategy. Park and Tennyson (1980) developed a response-sensitive strategy for selecting the presentation order of examples in concept learning from an analysis of the cognitive requirements of concept learning (Tennyson & Park, 1982). Studies by Park and Tennyson (1980, 1986) and Tennyson, Park, and Christensen (1985) showed that the response-sensitive sequence was not only more effective than a non-response-sensitive strategy but also reduced the number of examples the Bayesian model predicted the student would need. Park and Tennyson's studies also found that the value of the pretask information decreases as instruction progresses, whereas the contribution of the on-task performance data to the model's prediction increases.

22.4.2.4. Structural and Algorithmic Approach. The optimization of instruction in Scandura's (1973, 1977a, 1977b, 1983) structural learning theory consists of finding optimal trade-offs between the sum of the values of the objectives achieved and the total time required for instruction. Optimization involves balancing gains against costs (a form of cost-benefit analysis). This notion is conceptually similar to Atkinson's (1976) and Atkinson and Paulson's (1972) cost-benefit dimension of instructional theory, Smallwood's (1971) economic teaching strategy, and Chant and Atkinson's (1973) optimal allocation of instructional efforts.

In structural learning theory, structural analysis of content is especially important as a means of finding optimal trade-offs. According to Scandura (1977a, 1977b), the competence underlying a given task domain is represented in terms of sets of processes, or rules for problem solving. Analysis of content structure is a method for identifying those processes.

Given a class of tasks, the structural analysis of content involves (a) sampling a wide variety of tasks, (b) identifying a set of problem-solving rules (R) for performing the tasks (as an ideal student in the target population might use), (c) identifying parallels among the rules and devising higher-order rules that reflect these parallels, (d) constructing more basic rule sets that incorporate higher-order and other rules, (e) testing and refining the resulting rule set on new problems, and (f) extending the rule set when necessary so that it accounts for both familiar and novel tasks in the domain. This method may be reapplied to the obtained rule set and repeated as many times as desired. Each time the method is applied, the resulting rule set tends to become more basic in two senses: first, the individual rules become simpler; second, the new rule set as a whole has greater generating power for solving a wider variety of problems.

Once a basic rule set B(R) has been identified, and if B can be considered the student's entering knowledge from assessment of prior knowledge, it is possible to determine whether or not given problems might be solved by applying rules to other available rules and, correspondingly, which rules might be learned (derived) as a result. The rule set that might be learned (at a given stage) by the student with exact knowledge of the rules in B is denoted B². The rule set immediately learnable, given the rules in Bⁿ⁻¹, is denoted Bⁿ. Each rule in Bⁿ represents a unit of knowledge that might be acquired by the student whose entry knowledge (B) includes only the initial rules (R). In general, Bⁿ will be a far more encompassing and powerful rule set than the initial rule set R from which it is derived. The ability to solve problems associated with Bⁿ comes about gradually as a result of solving sequences of simpler problems associated with B, B², ..., Bⁿ⁻¹.

Hence, given any random selection of problems from the domain and a set of rules available in the learner's knowledge, it is possible to determine algorithmically which of the problems might be learned at any given stage and which problems require further instruction (e.g., in the form of prior problem-solving experience). In turn, this makes it possible to arrange the problems algorithmically in a learnable order. In general, it would be impossible or impractical to teach directly all of the solution rules contained in Bⁿ.

The algorithmic sequence can be determined by the computer alone, without the student's involvement. The operational procedure of the computer program is as follows: the program takes as input the initial set of rules, assumed available in the student's knowledge, and an arbitrary list of problems. It then attempts the given problems in turn. Solved problems are added to a learnable sequence, and rules derived from solving them are added to the rule set. Failed problems are retained on a failed-problem list and reattempted after the remaining problems are solved, or until the number of failed problems reaches a prespecified limit. This process has the effect of reordering the presented problems so that each problem is solvable on its first presentation. That is, the program's output may be used to discard redundant problems, rearrange problems, or add intermediate problems so that unsolved problems become solvable.
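A minimal sketch of this reordering procedure follows. The `solve` and `derive` callbacks are hypothetical stand-ins for the rule-application machinery of structural learning theory: `solve` reports whether a problem is solvable with the current rule set, and `derive` returns the new rules acquired by solving it.

```python
def learnable_order(initial_rules, problems, solve, derive):
    """Reorder problems so each is solvable on its first presentation.

    Repeatedly sweeps the pending list: solvable problems are appended to
    the learnable sequence and their derived rules added to the rule set;
    failed problems are retried on the next sweep. Stops when a full sweep
    makes no progress (those problems need intermediate problems added)."""
    rules = set(initial_rules)
    sequence, pending = [], list(problems)
    while pending:
        progressed = False
        still_failing = []
        for prob in pending:
            if solve(rules, prob):
                sequence.append(prob)
                rules |= derive(rules, prob)
                progressed = True
            else:
                still_failing.append(prob)   # failed-problem list
        if not progressed:
            break
        pending = still_failing
    return sequence, pending   # learnable order, unreachable problems
```

For illustration, a problem can be modeled as a tuple of (name, prerequisite rules, rules granted on solution); the procedure then outputs the problems in an order where every prerequisite has been derived before it is needed.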

According to Scandura (1977a) and Wulfeck and Scandura (1977), the instructional sequence determined by this algorithmic procedure is optimal. The algorithmically designed sequence was superior to learner-controlled and random sequences in terms of performance scores and problem-solution time (Wulfeck & Scandura, 1977). Also, Scandura and Durnin (1977) reported that a testing method based on the algorithmic sequence could assess the student's performance potential more accurately, with fewer test items and less time, than a domain-referenced item-generation procedure or a hierarchical item-generation procedure.

Since the algorithmic sequence is determined only by the structural characteristics of the given problems and the prior knowledge of the target population (not of individual students), the instructional process in structural learning theory is not adaptive to individual learner differences. Stressing the importance of individual differences in his structural learning theory, Scandura (1977a, 1977b, 1983) states that what is learned at each stage depends both on what is presented to the learner and on what the learner knows. Based on the algorithmic sequence in structural learning theory, Scandura and his associates (Scandura & E. Scandura, 1988) developed a rule-based CBI system. However, there has been no combined study of algorithmic sequencing and individual differences that might show how individual differences could be used to determine the algorithmic sequences.

Landa's (1976) structural psychodiagnostic method may be well combined with Scandura's algorithmic sequence strategy to adapt the sequential procedure to individual differences that would emerge as the student learns a given task using the predetermined algorithmic sequence. According to Landa (1976), the structural psychodiagnostic method can identify the specific defects in the student's psychological mechanisms of cognitive activity by isolating the attributes of the given learning task which define the required actions and then by joining these attributes with the student's logical operations.

22.4.2.5. Other Micro-Adaptive Models. Over the last two decades, some other micro-adaptive instructional systems have been developed to optimize the effectiveness or efficiency of instruction for individual students. For example, McCombs and McDaniel (1981) developed a two-step (macro and micro) adaptive system to accommodate the multivariate nature of learning characteristics and idiosyncratic learning processes in the ATI paradigm. McCombs and McDaniel identified the important learning characteristics (e.g., reading/reasoning and memory ability, anxiety, and curiosity) from the results of multiple stepwise regression analyses of existing student performance data. To compensate for students' deficiencies in these learning characteristics, they added a number of special-treatment components to the main track of instructional materials. For example, to assist low-ability students in reading comprehension or information-processing skills, schematic visual organizers were added. However, most systems like McCombs and McDaniel's are not included in this review because they do not have true on-task adaptive capability, which is the most important criterion for qualifying as a micro-adaptive model. In addition, these systems are task dependent, and their applicability to other tasks is very limited, although the basic principles or ideas of the systems are plausible.

22.4.3 Treatment Variables in Micro-Adaptive Models

As reviewed above, micro-adaptive models are primarily developed to adapt two instructional variables: amount of content to be presented and presentation sequence of content. The Bayesian probabilistic model and the multiple regression model are designed to select the amount of instruction needed to learn the given task. Park and Tennyson (1980, 1986) incorporated sequencing strategies in the Bayesian probability model, and Ross and his associates (Ross & Anand, 1986; Ross & Morrison, 1986) investigated strategies for selecting content in the multiple regression model. Although these efforts showed that other instructional strategies could be incorporated in the model, they did not change the primary instructional variables and the operational procedure of the model. The mathematical model and the structural/algorithmic approach are designed mainly to select the optimal sequence of instruction. According to the Bayesian model and the multiple regression approach, the appropriate amount of instruction is determined by individual learning differences (aptitudes, including prior knowledge) and the individual's specific learning needs (on-task requirements). In the mathematical model, the history of the student's response pattern determines the sequence of instruction. However, an important implication of the structural/algorithmic approach is that the sequence of instruction should be decided by the content structure of the learning task as well as the student's performance history.

The Bayesian model and the multiple regression model use both pretask and on-task information to prescribe the appropriate amount of instruction. Studies by Tennyson and his associates (Tennyson & Rothen, 1977; Park & Tennyson, 1980) and Hansen et al. (1977) demonstrated the relative importance of these variables in predicting the appropriate amount of instruction. Subjects who received the amount of instruction selected on the basis of pretask measures (e.g., prior achievement, aptitude related to the task) needed less time to complete the task and showed a higher performance level on the posttest than subjects who received the same amount of instruction regardless of individual differences. In addition, some studies (Hansen et al., 1977; Ross & Morrison, 1988) indicated that, among pretask measures (e.g., anxiety, locus of control, etc.), only prior achievement provides consistent and reliable information for prescribing the amount of instruction. However, subjects who received the amount of instruction selected on the basis of both pretask and on-task measures needed less time and showed higher test scores than subjects who received the amount of instruction based only on pretask measures. The results of the response-sensitive strategies studied by Park and Tennyson (1980, 1986) suggest that the predictive power of pretask measures, including prior knowledge, decreases, while that of on-task measures increases, as instruction progresses.

As reviewed above, a common characteristic of micro-adaptive instructional models is response sensitivity. In response-sensitive instruction, the diagnostic and prescriptive processes attempt to change the student's internal state of knowledge about the content being presented. Therefore, the optimal presentation of the instructional stimulus should be determined on the basis of the student's response pattern.

Response-sensitive instruction has a long history of development, from Crowder's (1959) simple branching program to Atkinson's mathematical model of adaptive instruction. Until the late 1960s, technology was not readily available to implement response-sensitive diagnostic and prescriptive procedures as a general practice outside the experimental laboratory (Hall, 1977). Although the recent development of computer technology has made the implementation of such adaptive procedures possible and allowed further investigation of their instructional effects, as seen in the descriptions of the micro-adaptive models, they have mostly been limited to simple tasks that can easily be analyzed for quantitative applications.

However, AI methodology has provided a powerful tool for overcoming this primary limitation of micro-adaptive instructional models, so that response-sensitive procedures can be applied to broader and more complex domains.

22.4.4 Intelligent Tutoring Systems

Intelligent tutoring systems (ITS) are adaptive instructional systems developed with the application of AI methods and techniques. ITSs are developed to resemble what actually occurs when student and teacher sit down one-on-one and attempt to teach and learn together (see 19.3). As in other instructional systems, ITSs have components representing the content to be taught, the inherent teaching or instructional strategy, and mechanisms for understanding what the student does and does not know. In ITSs, these components are referred to as the problem-solving or expertise module, the student-modeling module, and the tutoring module. The expertise module evaluates the student's performance and generates instructional content during the instructional process. The student-modeling module assesses the student's current knowledge state and makes hypotheses about his or her conceptions and the reasoning strategies employed to achieve the current state of knowledge. The tutorial module usually consists of a set of specifications for selecting the instructional materials the system should present and for how and when they should be presented. AI methods for the representation of knowledge (e.g., production rules, semantic networks, scripts, and frames) make it possible for the ITS to generate the knowledge to present to the student based on his or her performance on the task, rather than selecting the presentation according to predetermined branching rules. Methods and techniques for natural-language dialogue allow much more flexible interactions between the system and the student. The capability of making inferences about the causes of the student's misconceptions and learning needs allows the ITS to make qualitative decisions about learning diagnosis and instructional prescription, unlike the micro-adaptive model, in which the decision is based entirely on quantitative data. (For a detailed description of the ITS components and the AI methods used in the systems, see Chapter 19.)

Furthermore, ITS techniques provide a powerful tool for effectively capturing human learning and teaching processes. ITS development has apparently contributed to a better understanding of the cognitive processes involved in learning specific skills and knowledge (see 19.4). Some ITSs have not only demonstrated their effects for teaching specific domain content but also provided research environments for investigating specific instructional strategies, as well as tools for modeling human tutors and simulating human learning and cognition (Seidel & Park, 1994; see also 19.5). However, there are criticisms that ITS developers have failed to incorporate many valuable learning principles and instructional strategies developed by instructional researchers and educators (Park, Perez & Seidel, 1987). Cooperative efforts among experts in different domains, including learning/instruction and AI, are required to develop more powerful adaptive systems using ITS methods and techniques (Park & Seidel, 1989; Seidel, Park & Perez, 1988). However, the theoretical issue of how to learn and teach with emerging technology, including AI, will likely remain the most challenging problem.


Updated August 3, 2001
Copyright © 2001
The Association for Educational Communications and Technology
