Educational Researcher

Matrix Summaries Improve Research Reports: Secondary Analyses Using Published Literature

Linda Reichwein Zientek and Bruce Thompson

LINDA REICHWEIN ZIENTEK is an assistant professor of mathematics education at Sam Houston State University, Department of Mathematics and Statistics, P.O. Box 2206, Huntsville, TX 77341; lrzientek{at}shsu.edu. Her research focuses on mathematics teacher preparation, teacher induction programs, and quantitative research methods.

BRUCE THOMPSON is Distinguished Professor and College Distinguished Research Fellow of Educational Psychology, and Distinguished Professor of Library Sciences, Texas A&M University, and adjunct professor of allied health sciences, Baylor College of Medicine (Houston), Department of Educational Psychology, 4225 TAMU, College Station, TX 77843; bruce-thompson{at}tamu.edu. His research interests include effect sizes, multivariate statistics, and LibQUAL+™.


    Abstract
 
Correlation matrices and standard deviations are the building blocks of many of the commonly conducted analyses in published research, and AERA and APA reporting standards recommend their inclusion when reporting research results. The authors argue that the inclusion of correlation/covariance matrices, standard deviations, and means can enhance findings in education and psychology by permitting secondary researchers to (a) conduct commonly utilized traditional univariate and multivariate analyses not initially performed in primary studies, (b) produce effect sizes and other statistics not included in prior published literature, and (c) conduct analyses once difficult to perform. Furthermore, meta-analytic thinking is encouraged when researchers have the ability to conduct the same analyses on multiple studies and then compare these findings across studies.

Key Words: correlation • general linear model • reporting standards • secondary data analyses

The 2006 American Educational Research Association (AERA) Standards for Reporting on Empirical Social Science Research in AERA Publications required that in quantitative studies "relevant descriptive statistics (such as means and standard deviations for continuous variables, frequencies for discrete variables with few categories, and correlation matrices)" (p. 36) should be reported or made available by authors. The 2001 Publication Manual of the American Psychological Association (APA) noted that in research reports, "for correlational analyses (e.g., multiple regression analysis, factor analysis, and structural equation modeling), the sample size and variance-covariance (or correlation) matrix are needed" (p. 23). Thompson (2008) reviewed the etiology of these and related publication standards.

Because all analyses (e.g., t tests, ANOVAs, MANOVAs) are correlational (see Bagozzi, Fornell, & Larcker, 1981; Cohen, 1968; Knapp, 1978), the necessity of reporting matrix summaries is not limited to the three specific analyses provided in APA’s noninclusive list. Scholars benefit greatly when matrix summaries (e.g., correlation matrix and standard deviations, or the variance-covariance matrix) are made available for continuous data.

Matrix Summaries
When researchers adhere to the standards for correlational analyses and provide matrix summaries, sample sizes, and means, the interested reader has sufficient information to conduct a variety of secondary analyses on the same data without needing the raw data set. Advances in technology have made it possible for just about every statistical analysis to be conducted with standard deviations and a correlation matrix (Cudeck, 1989).

The correlation matrix is a symmetric matrix whose off-diagonal entries are the Pearson product-moment correlation coefficients (rXY). Because the bivariate correlation between a variable and itself is always 1, the diagonal entries of the correlation matrix are all 1s. When reporting correlation matrices, the APA (2001) Publication Manual recommended entering dashes in place of the 1s along the diagonal. When the correlation matrix is not sufficient for further secondary analyses and the variance-covariance matrix is needed, the variance-covariance matrix can be computed given (a) the correlation matrix and (b) the standard deviations. The variance-covariance matrix is a symmetric matrix whose diagonal entries are the variances and whose off-diagonal entries are the covariances between X and Y (COVXY), where COVXY = rXY × SDX × SDY. The APA Publication Manual recommended reporting only the upper or the lower portion of correlation and variance-covariance matrices, because the two off-diagonal triangles of symmetric matrices are redundant. The correlation matrix serves as the foundation for all analytic methods, as suggested by the concept called the general linear model (GLM).
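
The conversion from a published correlation matrix and standard deviations to a variance-covariance matrix can be sketched in a few lines. The following is a minimal illustration in Python with NumPy rather than the SPSS syntax used in this article; the function names and the numerical values are hypothetical and are not taken from any cited study.

```python
import numpy as np

def full_from_lower(lower_rows):
    """Expand a published lower triangle (diagonal included) into a full symmetric matrix."""
    k = len(lower_rows)
    full = np.zeros((k, k))
    for i, row in enumerate(lower_rows):
        full[i, : i + 1] = row
    # Mirror the strictly lower triangle into the upper triangle.
    return full + np.tril(full, -1).T

def cov_from_corr(corr, sds):
    """Rebuild the variance-covariance matrix from a correlation matrix and SDs."""
    corr = np.asarray(corr, dtype=float)
    sds = np.asarray(sds, dtype=float)
    # COV_XY = r_XY * SD_X * SD_Y; the outer product supplies SD_X * SD_Y for every pair.
    return corr * np.outer(sds, sds)

# Hypothetical summaries for three variables (illustration only).
R = full_from_lower([[1.0], [0.5, 1.0], [0.3, 0.4, 1.0]])
SD = [2.0, 3.0, 5.0]
S = cov_from_corr(R, SD)
```

The diagonal of S holds the variances (the squared standard deviations), and each off-diagonal entry is the corresponding correlation rescaled by the two standard deviations.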

The General Linear Model
The GLM is the concept that "all analytic methods are correlational . . . and yield variance-accounted-for effect sizes analogous to r² (e.g., R², η², ω²)" (Thompson, 2000b, p. 263). As Graham (2008) explained,

The vast majority of parametric statistical procedures in common use are part of [a single analytic family called] the General Linear Model (GLM), including the t test, analysis of variance (ANOVA), multiple regression, descriptive discriminant analysis (DDA), multivariate analysis of variance (MANOVA), canonical correlation analysis (CCA), and structural equation modeling (SEM). Moreover, these procedures are hierarchical [italics added], in that some procedures are special cases of others. (p. 485)

In 1968, Cohen argued that multiple regression analysis subsumes all univariate parametric statistical analyses as special cases. In 1978, Knapp showed that all commonly utilized univariate and multivariate analyses are special cases of canonical correlation analysis (CCA). In 1981, Bagozzi and colleagues demonstrated that structural equation modeling (SEM) is an even more general case of the GLM (see Fan, 1997, for a cogent explanation).
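
The simplest instance of this hierarchy can be checked numerically: an independent-samples t test gives the same result as a Pearson correlation between the outcome scores and a 0/1 group code. The sketch below uses Python with NumPy and hypothetical scores invented for illustration; it is not drawn from any of the cited studies.

```python
import numpy as np

# Hypothetical scores for two groups (illustration only).
group_a = np.array([4.0, 5.0, 6.0, 7.0, 8.0])
group_b = np.array([6.0, 7.0, 9.0, 10.0, 11.0])

n_a, n_b = len(group_a), len(group_b)
df = n_a + n_b - 2

# Classical pooled-variance independent-samples t statistic.
pooled_var = ((n_a - 1) * group_a.var(ddof=1) + (n_b - 1) * group_b.var(ddof=1)) / df
t = (group_b.mean() - group_a.mean()) / np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# The same analysis as a correlation: Pearson r between a 0/1 group code and the scores.
scores = np.concatenate([group_a, group_b])
group_code = np.concatenate([np.zeros(n_a), np.ones(n_b)])
r = np.corrcoef(group_code, scores)[0, 1]

# Both routes give the same variance-accounted-for effect size and the same t.
eta_sq_from_t = t**2 / (t**2 + df)
t_from_r = r * np.sqrt(df / (1 - r**2))
```

Here eta_sq_from_t equals r squared and t_from_r equals t, which is the sense in which the t test is a special case of correlation and regression.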

Figure 1 presents a conceptual map of the commonly used statistical analyses falling within GLM. Note that predictive discriminant analysis, unlike descriptive discriminant analysis (DDA), is not part of the GLM (Huberty, 1994). It can also be shown that the mathematics of factor analysis are used to compute the multiplicative weights applied to the measured variables, either explicitly or implicitly, in all analyses throughout the GLM.


FIGURE 1. Conceptual map of the general linear model (GLM). Predictive discriminant analysis (PDA), unlike descriptive discriminant analysis (DDA), is not part of the GLM. The mathematics of factor analysis are used to compute the weights applied to the measured variables, either explicitly or implicitly, in all analyses throughout the GLM. SEM = structural equation modeling; CCA = canonical correlation analysis.

 
As noted by Thompson (1984), "The first step in a canonical correlation analysis involves the calculation of the intervariable correlation matrix" (p. 11), which is precisely why the majority of the commonly utilized analyses within the GLM can be conducted using only matrix summaries. In 1996, Campbell and Taylor demonstrated Knapp’s conclusion (i.e., CCA subsumes both univariate and multivariate parametric analyses) by conducting all common quantitative analyses with a CCA computer program syntax. The necessity of the correlation matrix for conducting CCA, in conjunction with evidence that CCA subsumes the most commonly utilized univariate and multivariate parametric analyses (e.g., t tests, ANOVA, MANOVA, DDA), logically confirms that both univariate and multivariate parametric analyses are correlational in nature.

Purpose
The purpose of the present article is to suggest the benefits of providing—for continuous variables—(a) matrix summaries, (b) sample sizes, and (c) means. We argue that the inclusion of correlation/covariance matrices, standard deviations, and means can enhance findings in education and psychology by permitting secondary researchers to (a) conduct commonly utilized traditional univariate and multivariate analyses not initially performed in primary studies (e.g., analysis of variance, multivariate analysis of variance, multiple regressions, and exploratory factor analyses), (b) produce effect sizes and other statistics not included in prior published literature, and (c) conduct analyses once difficult to perform and thus not reported in older literature (e.g., SEM).

We hope our few examples illustrate how "meta-analytic thinking" (cf. Cumming & Finch, 2001; Fidler & Thompson, 2001; Thompson, 2006b) can be encouraged when researchers have the ability to conduct the same analyses on multiple studies and then compare these findings across studies and over time. Thompson (2002) defined meta-analytic thinking as

both (a) the prospective formulation of study expectations and design by explicitly invoking prior effect sizes and (b) the retrospective interpretation of new results, once they are in hand, via explicit, direct [italics added] comparison with the prior effect sizes in the related literature. (p. 28)

Our examples may also facilitate broader recognition that all parametric statistical analyses are correlational in nature.

Syntax Commands and Our Reporting of Decimal Places
We provide here the necessary SPSS syntax commands to conduct these secondary analyses. In an era of point-and-click analyses, many if not most researchers are unaware that these secondary analyses can be conducted with commonly available software by typing the needed syntax commands. The interested reader can apply these syntaxes to other published matrix summaries merely by modifying the syntax commands we present.

We present our results here to more decimal places than many researchers use, to allow readers to replicate our secondary analysis examples with more precision. We also believe reporting numerical results to more decimal places in primary reports might facilitate more precise secondary analyses. Historically, the potential contributions of secondary analyses may have been underestimated within the field.


    Benefits to Secondary Research
 
Reporting matrix summaries provides researchers the necessary resources to answer different questions and conduct different analyses than those performed in initial research reports. As advances in technology are made, researchers can locate archived studies and conduct analyses once considered difficult or impossible to conduct.

Univariate Analysis
ANOVA and regression have been the most utilized univariate methods in published education and psychology research (Edgington, 1964, 1974; Elmore & Woehlke, 1988; Goodwin & Goodwin, 1985; Kieffer, Reese & Thompson, 2001; Willson, 1980; Zientek, Capraro, & Capraro, 2008). Aiken, West, and Millsap (2008; cf. Aiken, West, Sechrest, & Reno, 1990) reported that 95% of Ph.D. psychology programs offered courses covering the topics of ANOVA and regression at least every 2 years. Henson and Williams (2006) found that faculty members of education doctoral programs believed their graduates to be proficient in the use of ANOVA, whereas faculty members at fewer doctoral programs (i.e., 42.3%) believed the preponderance of their graduates to be proficient in using regression analyses.

When matrix summaries are provided, secondary researchers can (a) conduct univariate analyses such as ANOVA, multiple regression, and path analysis; (b) produce information lacking in prior published literature; and (c) conduct analyses once considered difficult to conduct, such as commonality analyses (Nimon, Lewis, Kane, & Haynes, 2008; Zientek & Thompson, 2006). This is important because despite the widespread use of ANOVA and regressions, effect sizes have not historically been reported for ANOVA (cf. Kieffer et al., 2001; Snyder & Thompson, 1998; Thompson & Snyder, 1998), and researchers have been inconsistent in the reporting of effect sizes, inferential statistics, and p values—particularly when results are not statistically significant (Zientek et al., 2008). Both AERA (2006) and APA (2001) recommended the reporting of inferential statistics, p values, and effect sizes. Reporting matrix summaries allows secondary researchers to compute and interpret all relevant findings, even those not published in the original study.
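
As an illustration of how a multiple regression can be recovered from matrix summaries alone, the following Python/NumPy sketch computes standardized weights, R², the F statistic, and unstandardized coefficients from a correlation matrix, standard deviations, means, and the sample size. The function name and all numerical values are hypothetical; this is a sketch of the standard normal-equations solution, not the SPSS syntax used in this article.

```python
import numpy as np

def regression_from_summaries(R, sds, means, n, y=0):
    """Multiple regression of variable y on all other variables, from summaries only."""
    R = np.asarray(R, dtype=float)
    sds = np.asarray(sds, dtype=float)
    means = np.asarray(means, dtype=float)
    x = [j for j in range(R.shape[0]) if j != y]

    Rxx = R[np.ix_(x, x)]              # predictor intercorrelations
    rxy = R[x, y]                      # predictor-outcome correlations
    beta = np.linalg.solve(Rxx, rxy)   # standardized (beta) weights
    r2 = float(rxy @ beta)             # variance-accounted-for effect size

    k = len(x)
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))

    b = beta * sds[y] / sds[x]         # unstandardized weights
    intercept = means[y] - b @ means[x]
    return {"beta": beta, "r2": r2, "F": f_stat, "df": (k, n - k - 1),
            "b": b, "intercept": float(intercept)}

# Hypothetical summaries: variable 0 is the outcome, variables 1 and 2 are predictors.
R = [[1.0, 0.5, 0.4],
     [0.5, 1.0, 0.3],
     [0.4, 0.3, 1.0]]
result = regression_from_summaries(R, sds=[10.0, 2.0, 4.0], means=[50.0, 5.0, 20.0], n=100)
```

Because R² is returned alongside the test statistic, an effect size is available even when the primary report gave only p values.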

Multivariate Analyses
When matrix summaries are provided, multivariate methods often not conducted in published studies can still be performed. The dearth of multivariate methods in published education research suggests that some researchers may not understand the benefits of multivariate methods (Zientek et al., 2008). First, multivariate methods decrease the probability of experiment-wise Type I errors. Second, multivariate tests can be "much more powerful" (p. 35) against Type II error than univariate tests of the same data, but only under certain conditions as explained by Bray and Maxwell (1985). Third, and most important, "multivariate methods best honor the reality to which the researcher is purportedly trying to generalize" (Thompson, 1991, p. 80), because only multivariate analyses simultaneously consider all relationships among the variables and honor the fact that all the variables coexist in reality. Multivariate analyses are consistent with a worldview that (a) most effects are multiply caused (e.g., reading curricula may affect reading achievement but so may free-lunch programs) and, conversely, (b) most interventions have multiple effects (e.g., successful reading interventions may affect reading achievement, but children who read better also may develop more positive self-concepts and more positive attitudes toward schooling).

Furthermore, univariate and multivariate analyses even of the same data (a) address different research questions and (b) analyze different variables. For example, in a single study looking at gender differences across two outcome variables, X and Y, two univariate ANOVAs might yield two p_CALCULATED values both of .77 and two univariate η² values both of 0.5%, whereas a MANOVA of the same data might yield a p_CALCULATED value less than .001 and a multivariate η² of 62.5%! The two ANOVAs actually test the differences of the means on the observed or measured variables, X and Y, whereas the MANOVA actually tests the difference of the mean of the DDA function scores of the women versus the mean of the DDA function scores of the men. Thus ANOVAs ask research questions about measured outcome variables, whereas MANOVAs ask questions about means on unobserved latent variables or constructs.

This is why night-and-day different results can be obtained when using multivariate as opposed to univariate analyses of exactly the same data (Fish, 1988). This is also why the use of ANOVAs post hoc to find statistically significant MANOVA effects is illogical, given that the two analyses address different research questions and also focus on different variables. Notwithstanding the fact that about three quarters of published MANOVA articles incorrectly report post hoc ANOVAs (Kieffer et al., 2001), "one would usually be well advised to follow up a significant multivariate test with a [descriptive] discriminant function analysis in order to study the nature of the group differences more closely" (Tatsuoka, 1971, p. 50).

Multivariate Analysis of Variance
When matrix summaries, sample sizes, and group means are provided, MANOVAs can be conducted by the researcher. MANOVAs enable researchers to investigate "groups of subjects on several dependent variables simultaneously; focusing on cases where the variables are correlated and share a common conceptual meaning" (Stevens, 2002, p. 173). In contrast, ANOVAs allow groups to be investigated on only one outcome variable at a time.

Score Validity: Factor Analysis
When matrix summaries are provided, score validity can be explored analytically. Exploratory factor analysis (EFA) and SEM are common methods for examining construct validity (Kline, 2005; Thompson, 2004). Validity, which is "broadly defined as the extent to which a measure is meaningful, relevant, and useful for the research at hand [italics removed]" (American Statistical Association [ASA], 2007, p. 12), should be investigated for the data in hand. The ASA recommended that "for every measure in every research process it is essential to provide appropriately defensible evidence for the validity, reliability, and fairness of [scores on] the measure" (p. 11).

In a review of quantitative reporting practices in teacher education research, Zientek et al. (2008) found that relatively few (6%) of the 174 reviewed studies conducted EFAs. When EFAs were conducted, the majority of the studies did not report the rotation method or both pattern and structure coefficients (Zientek et al., 2008), even when oblique rotations were performed, in which case these two matrices differ (Thompson, 2004). Henson and Williams (2006) found that faculty members at 38.5% of education doctoral programs believed that few or none (i.e., less than 25%) of their doctoral graduates were "capable of applying the technique or concept [factor analysis] independently in their own research" (p. 30). Advances in technology have increased the ease with which EFAs can be conducted (Henson & Roberts, 2006). However, EFAs require a number of decisions in determining the number of factors to extract and the rotation methods to be used. In published research where authors conducted EFAs but did not report rotation methods, pattern/structure coefficients, or scree plots, the secondary researcher can use the published matrix summaries to reproduce full EFA results and thus evaluate reported results based on complete information and correct analyses (Thompson, 2004).

Structural Equation Modeling
Henson and Williams (2006) found that faculty members at most education doctoral programs did not believe that the majority of their graduates were capable of conducting SEM, which may help explain the scarcity of SEM reports in published research studies. SEM is an a priori theory-oriented analysis (Kline, 2005). The technology to conduct an SEM using a correlation matrix and standard deviations has only become available over roughly the past 35 years (Cudeck, 1989), but secondary SEM analyses in some cases can be conducted on reports published prior to 1970 or using reports published more recently when more complete or alternative analyses are of interest.

When conducting an SEM, there is the occasional case when using the raw data set is necessary. According to Kline (2005), when non-normal data are being analyzed or special estimation methods are being used that do "not assume normal distributions or accommodate cases with missing observations," matrix summaries may not be appropriate (p. 46). However, for the majority of all other cases, inputting matrix summaries is appropriate and indeed may offer advantages to conducting the analysis with the raw data set. According to Kline, these advantages include the following: (a) When sample sizes are large, the statistical program is only required to work with a data file that is as long as the number of variables; (b) conducting analyses with matrix summaries allows readers to "either replicate the original analyses or estimate alternative models not considered in the original work" (p. 46); and (c) models can be formulated based on theory or meta-analysis and be submitted for secondary exploratory or diagnostic purposes.

Encourage Meta-Analytic Thinking
Reporting correlation/covariance matrices can encourage meta-analytic thinking. For example, presume two researchers each published research findings on a similar topic but conducted different analyses. A secondary researcher may wish to know whether the same results would replicate across the studies and whether the seemingly different results were merely methodological artifacts. If matrix summaries are provided, interested researchers can conduct a variety of analyses and compare findings across studies and methods.


    Methodology
 
Published articles were utilized to illustrate how reporting matrix summaries (i.e., standard deviations and correlation matrix, or variance-covariance matrix), means, and sample size can be used in secondary research using commonly available software. Space precludes illustration of all possible secondary analyses or the pursuit of all possible secondary research purposes.

The criteria for article selection were (a) matrix summaries, means, and sample sizes for the variables of interest were reported; (b) the new analyses were logical for the variables in the study; and (c) the article was archived in a peer-reviewed education or psychology journal or in ERIC. Because the most commonly utilized univariate and multivariate parametric analyses are all correlational in nature, our examples were not limited to multiple regressions, factor analyses, and SEMs.

The statistical analyses were conducted in SPSS by inputting matrix summaries, means, and sample sizes. The syntax commands we present here may be modified by readers to conduct similar secondary analyses with other research reports.


    Results
 
MANOVA, CCA, and EFA were conducted as secondary analyses for selected prior studies. In addition, a commonality analysis was performed as a supplementary secondary analysis for multiple regression.

Multivariate Analysis of Variance
MANOVAs can be conducted without a raw data set when (a) matrix summaries, (b) group means, and (c) sample sizes are provided. MANOVAs allow researchers to simultaneously investigate a set of dependent variables "focusing on cases where the variables are correlated and share a common conceptual meaning" (Stevens, 2002, p. 173). Our first example of performing MANOVAs in secondary analysis is from the study reported by Wilkins (2008).

Wilkins (2008) "investigated 481 in-service elementary teachers’ levels of mathematical content knowledge, attitudes toward mathematics, beliefs about the effectiveness of inquiry-based instruction, [and] use of inquiry-based instruction" (p. 139). Among other analyses, Wilkins ran four separate ANOVAs to test mean differences between primary (Grades K–2) and upper elementary (Grades 3–5) teachers, with respect to content knowledge, mathematics attitudes, instructional beliefs, and instructional practices.

In Wilkins’ (2008) study, many of these variables were correlated at a noteworthy level, and the items shared a common conceptual meaning. In the present analyses, MANOVA was employed to investigate mean differences between primary (Grades K–2) and upper elementary (Grades 3–5) teachers on four variables. Wilkins provided the recommended means and matrix summaries within the primary report (p. 149). The SPSS syntax for conducting a MANOVA in secondary analysis is presented in Figure 2. The interested reader is encouraged to consult the original report and then retype and run our syntax to see how to conduct such secondary analyses.


FIGURE 2. SPSS syntax for conducting a MANOVA in a secondary analysis.

 
The secondary MANOVA results indicated that statistically significant differences existed between the primary (Grades K–2) and upper elementary (Grades 3–5) teachers across the four response variables, F(4, 476) = 17.16, p < .001, with a noteworthy effect size (η² = 12.6%). The effect size was computed as 1 – Wilks’ lambda (Stevens, 2002). In contrast with Wilkins’ ANOVA results, the secondary analysis revealed that the largest of the four (unreported) univariate η² values (i.e., ANOVA η² = 7.4%) was only roughly half our multivariate effect size. Thus our secondary analysis also enabled the computation of univariate effect sizes not reported in the primary report.
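
The arithmetic behind a one-way MANOVA from summary statistics can be sketched as follows in Python with NumPy. The sketch assumes that the published covariance matrix was computed on the total sample, ignoring group membership, with N – 1 in the denominator; if a report instead gives the pooled within-group matrix, the within-groups matrix must be built from that directly. The function names are hypothetical, and the final lines use only the N, the number of outcomes, and the η² reported in the text.

```python
import numpy as np

def wilks_from_summaries(group_means, group_ns, total_cov):
    """Wilks' lambda for a one-way MANOVA, using summary statistics only.

    Assumes total_cov is the total-sample covariance matrix (N - 1 denominator).
    """
    means = np.asarray(group_means, dtype=float)   # shape: (groups, variables)
    ns = np.asarray(group_ns, dtype=float)
    n_total = ns.sum()

    grand_mean = ns @ means / n_total
    dev = means - grand_mean
    between = (dev * ns[:, None]).T @ dev           # between-groups SSCP matrix
    total = (n_total - 1) * np.asarray(total_cov, dtype=float)  # total SSCP matrix
    within = total - between                        # within-groups SSCP matrix
    return np.linalg.det(within) / np.linalg.det(total)

def two_group_f(wilks_lambda, n_total, p):
    """Exact F transformation of Wilks' lambda when there are exactly two groups."""
    df1, df2 = p, n_total - p - 1
    f_stat = ((1.0 - wilks_lambda) / wilks_lambda) * (df2 / df1)
    return f_stat, df1, df2

# Consistency check with the values reported above: N = 481, four outcomes, eta^2 = 12.6%.
# The result is close to the reported F; small differences reflect rounding of eta^2.
f_stat, df1, df2 = two_group_f(1.0 - 0.126, n_total=481, p=4)
```

With two groups and four outcomes the degrees of freedom are 4 and 476, matching the values reported above, and the multivariate η² is simply 1 minus the returned lambda.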

As emphasized previously, it is theoretically conceivable for ANOVA and MANOVA analyses of the same data to yield results that are night-and-day different (i.e., respectively, have near 1.0 p values and near-zero effect sizes versus near-zero p values and near 1.0 variance-accounted-for effect sizes; see Fish, 1988). In such cases, only the multivariate analyses take into account all possible simultaneous relationships among the variables and thus honor the ecological reality that the variables coexist.

Canonical Correlation Analysis
Figure 3 presents the SPSS syntax for conducting a secondary CCA when the primary report includes means, standard deviations, and the correlation matrix. The syntax is for the heuristic data in Thompson’s (1984) explanation of CCA. The same syntax, minus the MATRIX=IN subcommand, is typed to run a CCA in a primary analysis.


FIGURE 3. SPSS syntax for conducting a secondary canonical correlation analysis.

 
Here we conducted a CCA in secondary analysis of the data reported by Wentzel (2002, p. 294). In the primary report, Wentzel conducted four separate hierarchical multiple regressions to determine if "the five teaching dimensions [Fairness, Teacher Motivation, Rule Setting, Negative Feedback, High Expectations] explained significant amounts of variance in student motivation, social behavior, and achievement [Prosocial Pursuit, Responsibility Pursuit, Class Interest, Mastery Orientation]" (p. 287). Conducting multiple tests increases the likelihood of experiment-wise Type I errors and fails to honor a reality in which the variables simultaneously coexist. Therefore, we conducted a CCA (see Thompson, 1984, 2000a) between the variable sets of (a) five teacher beliefs/behaviors and (b) four student goals and interests outcomes.

CCA produces orthogonal (i.e., uncorrelated) functions (i.e., sets of multiplicative weights, like regression β weights or factor pattern coefficients) that optimize the multivariate relationships between the variable sets. In the more familiar regression analysis, multiplicative weights (often called β weights) are applied to the measured predictor variables to obtain scores on the latent predicted Ŷ (i.e., Y-hat) variable. In regression, no multiplicative weight is explicitly applied to the outcome variable Y scores, but regression analysis can be conceptualized as also applying a weight to the outcome variable, albeit always a multiplicative weight of 1.0.

In canonical analysis, sets of multiplicative weights are applied to the predictor variables to obtain scores on unobserved latent predictor variables, and sets of multiplicative weights are also applied to the criterion or outcome variables to obtain scores on unobserved latent outcome variables. Indeed, the Pearson r² values between the latent predictor and the latent outcome variables are the multivariate squared canonical correlations (i.e., the Rc² values). A more complete explanation of this weighting process using manageable examples involving only 10 or so people is provided by Thompson (1984, 1991).

The number of uncorrelated canonical functions equals the number of variables in the smaller variable set (i.e., in our secondary analysis, four). However, the squared canonical correlation coefficients (i.e., respectively, 44.0%, 4.5%, 0.8%, and 0.5%) from our secondary analyses suggested that only the first two functions were noteworthy. The canonical coefficients from our secondary analysis are reported in Table 1.
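
The squared canonical correlations can be obtained directly from a correlation matrix as the eigenvalues of a product of its partitions. The following Python/NumPy sketch shows that computation under stated assumptions: the function name is hypothetical, the 4 × 4 matrix is invented for illustration, and none of the values come from the Wentzel (2002) data.

```python
import numpy as np

def squared_canonical_correlations(R, x_idx, y_idx):
    """Squared canonical correlations between two variable sets, from a correlation matrix."""
    R = np.asarray(R, dtype=float)
    Rxx = R[np.ix_(x_idx, x_idx)]
    Ryy = R[np.ix_(y_idx, y_idx)]
    Rxy = R[np.ix_(x_idx, y_idx)]

    # Eigenvalues of Ryy^-1 Ryx Rxx^-1 Rxy are the squared canonical correlations.
    product = np.linalg.solve(Ryy, Rxy.T) @ np.linalg.solve(Rxx, Rxy)
    eigenvalues = np.linalg.eigvals(product).real

    # The number of functions equals the size of the smaller variable set.
    n_functions = min(len(x_idx), len(y_idx))
    return np.sort(eigenvalues)[::-1][:n_functions]

# Hypothetical correlation matrix: variables 0-1 are predictors, 2-3 are outcomes.
R = [[1.0, 0.3, 0.5, 0.4],
     [0.3, 1.0, 0.2, 0.6],
     [0.5, 0.2, 1.0, 0.3],
     [0.4, 0.6, 0.3, 1.0]]
rc_squared = squared_canonical_correlations(R, x_idx=[0, 1], y_idx=[2, 3])
```

When one of the two sets contains a single variable, the single value returned is the multiple R² from the corresponding regression, which is consistent with regression being a special case of CCA.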


Table 1 Canonical Correlation Analysis Results for Secondary Analysis of the Wentzel (2002) Data

 
Our results were consistent with Wentzel’s finding that High Expectations was a positive predictor of motivation outcomes. However, when comparing our findings with the four multiple regression analyses originally reported, the contributions of Teacher Motivation and Fairness were underestimated in the primary regression analyses. In the CCA results, Teacher Motivation and Fairness each accounted for approximately 44% of the variance in Function I, even though High Expectations also contributed 65.1% of its variance to Function I. And our findings indicate that when all four motivation outcomes were simultaneously entered in a single model, Class Interest shared almost all its variance (89.7%) with Function I and thus explained most of the multivariate relationship between the two sets of latent variables.

Factor Analysis
Because the covariance matrix is "the basic statistic" in factor analysis and SEM, textbook authors often suggest that students input illustrative covariance matrices into computer programs as a teaching tool for learners to be able to replicate the SEM, confirmatory factor analysis (CFA), and EFA results explained in texts (Kline, 2005, p. 10). According to Thompson (2004), EFA can be conducted to "inform evaluations of score validity," "to develop theory regarding the nature of constructs," and to "summarize relationships in the form of a more parsimonious set of factor scores that can then be used in subsequent analyses" (pp. 4–5).

EFAs require a logical sequence of steps. For example, the number of factors to extract and the rotation method must be determined. Guttman’s (1954) eigenvalue-greater-than-1 rule is often used to determine the number of factors, but this method may lead to extraction of too many factors (Gorsuch, 1983; Thompson, 2004). Alternatively, the scree test (Cattell, 1966) determines the number of factors as the point where the graph of the eigenvalues across the unrotated factors tapers off.
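
Both decision rules start from the eigenvalues of the published correlation matrix, which a secondary researcher can compute without raw data. The Python/NumPy sketch below returns the eigenvalues in descending order (the values one would plot for a scree test) and the count retained under Guttman’s rule. The function name and the correlation matrix are hypothetical and are not taken from the Steinmayr and Spinath (2009) report.

```python
import numpy as np

def eigenvalue_summary(R):
    """Eigenvalues of a correlation matrix (descending) and the Guttman-rule factor count."""
    R = np.asarray(R, dtype=float)
    # eigvalsh is appropriate because a correlation matrix is symmetric.
    eigenvalues = np.linalg.eigvalsh(R)[::-1]
    n_guttman = int(np.sum(eigenvalues > 1.0))
    return eigenvalues, n_guttman

# Hypothetical matrix with two clusters of correlated variables.
R = [[1.0, 0.6, 0.1, 0.1],
     [0.6, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.6],
     [0.1, 0.1, 0.6, 1.0]]
eigenvalues, n_guttman = eigenvalue_summary(R)
```

The eigenvalues sum to the number of variables, so each eigenvalue divided by that number gives the proportion of total variance reproduced by the corresponding unrotated factor.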

Advantages to conducting an EFA from published studies, even when researchers themselves conducted an EFA, include the ability to (a) base factor extraction decisions on scree plot or parallel analysis results, even if authors based their decisions on Guttman’s rule, and (b) conduct an EFA with a different rotation method when theory supports alternate models (e.g., correlated factors).

To illustrate, an EFA with a varimax rotation was conducted to investigate the nature of the constructs in the Steinmayr and Spinath (2009) study, which investigated the role of motivation and ability in predicting student achievement, by using a combination of well-known instruments. In the primary study, a number of regression analyses were performed to predict student achievement, even though the authors noted (p. 5) substantial multicollinearity among some of the predictors.

Steinmayr and Spinath (2009) did not conduct a factor analysis of the predictors, notwithstanding the noted multicollinearity. However, sufficient statistics for an EFA were reported (p. 5). The scree plot and Guttman rule results were in agreement on the number of factors to extract. The varimax-rotated pattern/structure coefficients for the four factors are presented in Table 2.


Table 2 Varimax-Rotated Pattern/Structure Coefficients From Secondary Analysis of Variables in the Steinmayr and Spinath (2009) Study

 
Our analyses identified four interpretable uncorrelated factors, which we labeled Intelligence, Performance Preferences, Mathematics Versus Language Orientation, and Achievement Motivation. Our results suggest that a more parsimonious analysis could have been performed by using factor scores on these four constructs to predict academic success, rather than using the original numerous collinear predictors.

Commonality Analysis
Commonality analysis (a) provides an alternative to stepwise methods and (b) is "designed to identify the proportions of variance in the dependent variables that may be attributed uniquely to each of the independent variables, and the proportions of variance that are attributed to various combinations of independent variables" (Pedhazur, 1997, pp. 261–262). For example, if we are predicting outcome variable $Y$ with two predictor variables, $X_1$ and $X_2$, we can conduct the regression to estimate the multiple $R^2_{Y \times X_1, X_2}$. We can use commonality analysis to determine the proportion of the $R^2_{Y \times X_1, X_2}$ that is (a) uniquely due to $X_1$ (i.e., $U_{X_1}$), (b) uniquely due to $X_2$ (i.e., $U_{X_2}$), and (c) common to both $X_1$ and $X_2$ (i.e., $C_{X_1, X_2}$). And, as Thompson (2006a) explained in more detail, for this analysis:

Formula
Our secondary analysis example will illustrate these dynamics.
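For reference, the standard two-predictor partition that is consistent with the definitions above can be written as follows. This is a sketch of the usual textbook form (e.g., Pedhazur, 1997), not necessarily the exact layout used by Thompson (2006a); $r^2_{Y \times X_1}$ and $r^2_{Y \times X_2}$ denote the squared bivariate correlations of each predictor with the outcome.

```latex
\begin{align}
U_{X_1} &= R^2_{Y \times X_1, X_2} - r^2_{Y \times X_2} \\
U_{X_2} &= R^2_{Y \times X_1, X_2} - r^2_{Y \times X_1} \\
C_{X_1, X_2} &= r^2_{Y \times X_1} + r^2_{Y \times X_2} - R^2_{Y \times X_1, X_2} \\
R^2_{Y \times X_1, X_2} &= U_{X_1} + U_{X_2} + C_{X_1, X_2}
\end{align}
```

The last line shows that the three partitions sum to the full-model $R^2$; the common component can be negative when suppression is present.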

With more than three predictor variables, commonality analyses have historically been difficult to conduct because as the number of predictor variables ($k$) increases, the number of coefficients ($2^k - 1$) increases exponentially. However, the recent development of a free computer program by Nimon et al. (2008) facilitates the analysis of even complex problems. Commonality analysis can be employed to supplement either univariate (Mood, 1969; Zientek & Thompson, 2006) or multivariate (Frederick, 1999) analyses.
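To make the growth in the number of coefficients concrete, the short Python sketch below enumerates every non-empty subset of a predictor set, since each subset corresponds to one commonality coefficient. The helper name `predictor_subsets` is hypothetical and is used only for illustration.

```python
from itertools import combinations


def predictor_subsets(predictors):
    """Return every non-empty subset of the predictors as a tuple."""
    subsets = []
    for size in range(1, len(predictors) + 1):
        subsets.extend(combinations(predictors, size))
    return subsets


three = predictor_subsets(["Self-Efficacy", "Anxiety", "Self-Concept"])
print(len(three), "coefficients for 3 predictors")

# The count doubles (plus one) with each added predictor.
for k in (2, 3, 5, 10):
    print(k, "predictors ->", 2 ** k - 1, "commonality coefficients")
```

With three predictors the enumeration yields the seven partitions discussed in this section; with 10 predictors it already exceeds 1,000.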

The primary report by Pajares and Graham (1999, p. 130) provides the data for our secondary analysis. The researchers employed a number of variables to predict school performance in a series of regression analyses and did not conduct commonality analysis as a vehicle to explore all the unique and common explanatory abilities of the predictors. Here we use a subset of their predictors (i.e., Self-Efficacy, Anxiety, and Self-Concept) to illustrate secondary commonality analysis.

This analysis quantifies the percentage of the sum of squares of the outcome variable associated uniquely with each of the three predictor variables (i.e., areas 1–3 in Figure 4), the percentage associated with each of the three combinations of the predictors taken two at a time (i.e., areas 4–6 in Figure 4), and the percentage explained in common by all three predictors (i.e., area 7 in Figure 4). Figure 4 is a generic graphic describing any commonality analysis involving exactly three predictor variables.


FIGURE 4. Venn diagram illustrating commonality variance partitions for cases involving three predictor variables.

 
As explained by Thompson (2006a), the first step in a commonality analysis is to compute the $R^2$ values for all combinations of the predictors. The SPSS syntax to obtain the required values in a secondary analysis is presented in Figure 5. For these data, the relevant values are:

Formula


FIGURE 5. SPSS syntax to obtain the $R^2$ values for all combinations of the predictors for a secondary commonality analysis.
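The first step can also be sketched outside SPSS. The Python code below is an illustration under stated assumptions, not the authors' syntax: it computes $R^2$ for every combination of three predictors directly from a correlation matrix, using the identity that $R^2$ equals the sum of each standardized weight multiplied by the corresponding predictor-outcome correlation. The correlations are hypothetical and are not the values reported by Pajares and Graham (1999); the helper functions are likewise written only for this example.

```python
from itertools import combinations


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(corr, names, outcome, predictors):
    """R^2 for regressing `outcome` on `predictors`, from correlations only."""
    idx = [names.index(p) for p in predictors]
    y = names.index(outcome)
    r_xx = [[corr[i][j] for j in idx] for i in idx]
    r_xy = [corr[i][y] for i in idx]
    betas = solve_linear_system(r_xx, r_xy)  # standardized weights
    return sum(beta * r for beta, r in zip(betas, r_xy))


# Hypothetical correlations; NOT the values reported by Pajares and Graham (1999).
names = ["Performance", "Self-Efficacy", "Anxiety", "Self-Concept"]
corr = [
    [1.00, 0.50, -0.40, 0.45],
    [0.50, 1.00, -0.50, 0.60],
    [-0.40, -0.50, 1.00, -0.45],
    [0.45, 0.60, -0.45, 1.00],
]

predictors = names[1:]
r2_by_subset = {}
for size in range(1, len(predictors) + 1):
    for subset in combinations(predictors, size):
        r2_by_subset[subset] = r_squared(corr, names, "Performance", subset)

for subset, value in r2_by_subset.items():
    print(" + ".join(subset), round(value, 4))
```

No raw data are needed: the seven $R^2$ values follow from the published correlation matrix alone, which is what makes the secondary analysis possible.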

 
Next, formulas (Thompson, 2006a, p. 279) are used to compute the predictive power unique only to a given predictor, and the predictive power common to the predictors in all their combinations. For these data the results are:

Formula
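The partitioning step can be sketched in Python as follows. The formulas are the standard three-predictor commonality formulas (each unique component is the full-model $R^2$ minus the $R^2$ of the model omitting that predictor); they are offered as an illustration and are not claimed to reproduce the cited page verbatim. The input $R^2$ values are hypothetical, and the labels "1", "2", and "3" are placeholders for any three predictors.

```python
def commonality_three(r2):
    """Partition the full-model R^2 for predictors labeled 1, 2, and 3.

    `r2` maps a string of predictor labels (e.g., "13") to the R^2 obtained
    when only those predictors are in the model.
    """
    full = r2["123"]
    return {
        "U1": full - r2["23"],
        "U2": full - r2["13"],
        "U3": full - r2["12"],
        "C12": r2["13"] + r2["23"] - r2["3"] - full,
        "C13": r2["12"] + r2["23"] - r2["2"] - full,
        "C23": r2["12"] + r2["13"] - r2["1"] - full,
        "C123": (r2["1"] + r2["2"] + r2["3"]
                 - r2["12"] - r2["13"] - r2["23"] + full),
    }


# Hypothetical R^2 values for all seven predictor combinations.
r2_values = {
    "1": 0.25, "2": 0.16, "3": 0.2025,
    "12": 0.27, "13": 0.28, "23": 0.24,
    "123": 0.30,
}

partitions = commonality_three(r2_values)
for label, value in partitions.items():
    print(label, round(value, 4))

# The seven partitions reproduce the full-model R^2.
print("Sum of partitions:", round(sum(partitions.values()), 4))
```

Two checks are useful in practice: the seven partitions sum to the full-model $R^2$, and the partitions involving a given predictor sum to that predictor's squared bivariate correlation with the outcome.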

These seven partitions of the $R^2$ of 34.4% using all three predictors are presented in Table 3. The results make clear that Self-Efficacy dominates the explanation of School Performance in the analysis. The Self-Efficacy variable alone explained $r^2_{\text{School Performance} \times \text{Self-Efficacy}}$ = 32.49% of the variance in School Performance, and of the total variance explained by all three predictors together (i.e., $R^2$ = 34.40%), Self-Efficacy uniquely accounted for nearly a third of the explained variance (i.e., $U_{\text{Self-Efficacy}} / R^2$ = 10.35% / 34.40% = 0.301).
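The arithmetic behind these statements can be checked with a few lines of Python. Only the three percentages reported in the text are used as inputs; the two derived quantities (the variance explained by the other two predictors together, and the variance Self-Efficacy shares with them) are implied by the commonality definitions rather than quoted from the primary or secondary report.

```python
# Values reported in the text, expressed as proportions of variance.
r2_full = 0.3440               # all three predictors
r2_self_efficacy = 0.3249      # Self-Efficacy alone
unique_self_efficacy = 0.1035  # unique to Self-Efficacy

# U = R^2(full) - R^2(other predictors), so Anxiety and Self-Concept
# together explain:
r2_anxiety_self_concept = r2_full - unique_self_efficacy

# Variance Self-Efficacy explains in common with the other predictors.
shared_self_efficacy = r2_self_efficacy - unique_self_efficacy

# Unique share of the total explained variance.
unique_share = unique_self_efficacy / r2_full

print(round(r2_anxiety_self_concept, 4))
print(round(shared_self_efficacy, 4))
print(round(unique_share, 3))
```

Under these definitions, most of what Self-Efficacy explains is shared with Anxiety and Self-Concept, which is the kind of overlap that a series of separate regressions cannot reveal.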


Table 3 Seven Partitions (All Percentages) of the $R^2$ = 34.4% From the Secondary Commonality Analysis

 

    Discussion
Conducting a study is analogous to detective work. Given the same research questions, two different researchers may develop completely different study designs and conduct completely different analyses. When matrix summaries are provided in primary research, variables later can be investigated from different perspectives and new secondary analyses can be conducted—even after the passage of time as new statistical methods are developed.

Secondary analysis also supports the maturation of the culture of respectful criticism so important in academic organizations. In their article in Educational Researcher, Feuer, Towne, and Shavelson (2002) asked,

What are the most effective means of stimulating more and better scientific educational research? . . . The primary emphasis [italics added] should be on nurturing and reinforcing a scientific culture of educational research. . . . The development of a scientific culture rests with individual researchers, supported by leadership in their professional associations. (p. 4)

They defined scientific culture as "a set of norms and practices and an ethos of honesty, openness, and continuous reflection, including how research quality is judged [italics added]" (Feuer et al., 2002, p. 4). Thompson (2008) argued that an organizational culture of respectful criticism is needed to support vibrant, mature scholarly dialogue.

The present report illustrated the value of providing means, sample size, and matrix summaries (e.g., means, standard deviations, and the correlation matrix or variance-covariance matrix) for (a) reanalyzing data from different perspectives or for addressing different research questions, (b) producing information lacking in primary literature (e.g., effect sizes), and (c) conducting analyses once considered too difficult to conduct (e.g., SEM prior to the creation of "graphics in/graphics out" software interfaces for declaring models). These illustrations endorse the underlying realization that all common analyses are correlational. Our report also illustrated how to conduct secondary analyses by typing syntax for use in commonly available software.

Correlation matrices and standard deviations—or variance-covariance matrices—are the building blocks of many of the analyses commonly conducted in published research (Cohen, 1968; Knapp, 1978). Because the first step in many of these analyses involves the use of the correlation or covariance matrix, the computer must first compute the correlation or covariance matrix. Providing the association coefficient matrix saves the computer one set of calculations but, more important, allows researchers to conduct a number of secondary analyses without the original data set.
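As a brief illustration of why the correlation matrix and standard deviations are interchangeable with the variance-covariance matrix, the Python sketch below rescales one into the other using the relation that each covariance equals the correlation multiplied by the two standard deviations. The summary statistics are hypothetical, and the function name is introduced only for this example.

```python
def covariance_from_correlation(corr, sds):
    """Rescale a correlation matrix into a variance-covariance matrix."""
    n = len(sds)
    return [[corr[i][j] * sds[i] * sds[j] for j in range(n)] for i in range(n)]


# Hypothetical summary statistics for three variables.
corr = [
    [1.00, 0.30, 0.50],
    [0.30, 1.00, 0.20],
    [0.50, 0.20, 1.00],
]
sds = [2.0, 5.0, 10.0]

cov = covariance_from_correlation(corr, sds)
for row in cov:
    print([round(value, 2) for value in row])
```

The diagonal of the result holds the variances (the squared standard deviations), so a reader who is given either summary can recover the other.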

The illustrations in the present study suggest that reporting matrix summaries, means, and sample sizes can benefit research. In particular, reporting this information allows secondary researchers to investigate the variables further to address either the research question in the primary report using alternative analyses or different research questions not considered in the original report. Furthermore, limitations of prior statistical methods can be examined. For example, the inherent defects of stepwise analyses have been exposed (Thompson, 1995, 2006a). When sufficient information is reported, the interested reader can conduct and compare results of stepwise regressions (or other analyses) from prior published reports with more suitable analyses (e.g., all-possible-subsets analyses) to better investigate research phenomena.

The inclusion of matrix summaries, means, and sample sizes enables researchers to explore further the nature of constructs (e.g., EFA) and estimate alternative models (e.g., Kline, 2005). The secondary analysis reported by Marsh, Dowson, Pietsch, and Walker (2004) is an excellent example of the benefits of such secondary analyses.

Meta-Analytic Thinking
Reporting matrix summaries can encourage meta-analytic thinking by allowing researchers to conduct the same analyses and compare results across studies and over time. In the present report, data from only a few prior studies were reinvestigated in order to illustrate the mechanics of such reanalyses. However, we hope that our examples point to the potential to apply secondary analyses to the full corpus of a given literature.

Advances in technology have made it easier for researchers to conduct more complicated analyses. In the 1960s, 1970s, and 1980s, limitations in software hindered conducting complicated analyses. For example, in the early days of SEM, researchers had to be masters of both matrix algebra and virtually the full Greek alphabet! Today, however, with means, sample sizes, and matrix summaries, the same analyses can be conducted regardless of when the article was published, and results can be compared across studies and time. Unfortunately, reporting correlation matrices has not been common practice in many journals; therefore, researchers’ ability to compare studies over time on some topics may be limited.

Future Enforcement of Reporting Standards
We cannot change the past, but we can change the future. Research benefits when standards for empirical research are faithfully followed. Both AERA (2006) and APA (2001) advocated the reporting of covariance or correlation matrices in primary reports. In order to encourage adherence to reporting standards for empirical research, researchers need to understand the value of adhering to these standards and the benefits such reporting affords to secondary research. Editors, reviewers, and researchers should demand that these standards be followed.

The present report illustrated the value and benefits that reporting means, sample size, and matrix summaries can bring to secondary research, particularly with regard to encouraging the use of multivariate methods and meta-analytic thinking. We hope that our illustrations will also pique readers’ curiosity about the rationale underlying the development of standards and encourage further exploration of how adhering to the reporting standards can benefit the research community.

Received for publication February 17, 2009. Revision received May 5, 2009. Accepted for publication May 7, 2009.


    REFERENCES

  • Aiken, LS, West, SG, & Millsap, RE (2008). Doctoral training in statistics, measurement, and methodology in psychology: Replication and extension of Aiken, West, Sechrest, and Reno’s (1990) survey of PhD programs in North America. American Psychologist, 63, 32–50.
  • Aiken, LS, West, SG, Sechrest, L, & Reno, RR (1990). Graduate training in statistics, methodology, and measurement in psychology: A survey of PhD programs in North America. American Psychologist, 45, 721–734.
  • American Educational Research Association (2006). Standards for reporting on empirical social science research in AERA publications. Educational Researcher, 35(6), 33–40.
  • American Psychological Association (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
  • American Statistical Association (2007). Using statistics effectively in mathematics education research. Alexandria, VA: Author.
  • Bagozzi, RP, Fornell, C, & Larcker, DF (1981). Canonical correlation analysis as a special case of a structural relations model. Multivariate Behavioral Research, 16, 437–454.
  • Bray, JH, & Maxwell, SE (1985). Multivariate analysis of variance. Thousand Oaks, CA: Sage.
  • Campbell, KT, & Taylor, DL (1996). Canonical correlation analysis as a general linear model: A heuristic lesson for teachers and students. Journal of Experimental Education, 64, 157–171.
  • Cattell, RB (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245–276.
  • Cohen, J (1968). Multiple regression as a general data-analytic system. Psychological Bulletin, 70, 426–433.
  • Cudeck, R (1989). Analysis of correlation matrices using covariance structure models. Psychological Bulletin, 105, 317–327.
  • Cumming, G, & Finch, S (2001). A primer on the understanding, use and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 532–575.
  • Edgington, ES (1964). A tabulation of inferential statistics used in psychology journals. American Psychologist, 19, 202–203.
  • Edgington, ES (1974). A new tabulation of statistical procedures used in APA journals. American Psychologist, 29, 25–26.
  • Elmore, PB, & Woehlke, PL (1988). Statistical methods employed in American Educational Research Journal, Educational Researcher, and Review of Educational Research from 1978 to 1987. Educational Researcher, 17(9), 19–20.
  • Fan, X (1997). Canonical correlation analysis and structural equation modeling: What do they have in common? Structural Equation Modeling, 4, 65–79.
  • Feuer, MJ, Towne, L, & Shavelson, RJ (2002). Scientific culture and educational research. Educational Researcher, 31(8), 4–14.
  • Fidler, F, & Thompson, B (2001). Computing correct confidence intervals for ANOVA fixed- and random-effects effect sizes. Educational and Psychological Measurement, 61, 575–604.
  • Fish, LJ (1988). Why multivariate methods are usually vital. Measurement and Evaluation in Counseling and Development, 21, 130–137.
  • Frederick, BN (1999). Partitioning variance in the multivariate case: A step-by-step guide to canonical commonality analysis. In Thompson, B (Ed.), Advances in social science methodology (Vol. 5, pp. 305–318). Stamford, CT: JAI.
  • Goodwin, LD, & Goodwin, WL (1985). Statistical techniques in AERJ articles, 1979–1983: The preparation of graduate students to read the educational research literature. Educational Researcher, 14(2), 5–11.
  • Gorsuch, RL (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
  • Graham, JM (2008). The general linear model as structural equation modeling. Journal of Educational and Behavioral Statistics, 33, 485–506.
  • Guttman, L (1954). Some necessary conditions for common-factor analysis. Psychometrika, 19, 149–161.
  • Henson, RK, & Roberts, JK (2006). Use of exploratory factor analysis in published research: Common errors and some comment on improved practice. Educational and Psychological Measurement, 66, 393–416.
  • Henson, RK, & Williams, C (2006, April). Doctoral training in research methodology: A national survey of education-related degrees. Paper presented at the annual meeting of the American Educational Research Association, San Francisco.
  • Huberty, C (1994). Applied discriminant analysis. New York: Wiley.
  • Kieffer, KM, Reese, RJ, & Thompson, B (2001). Statistical techniques employed in AERJ and JCP articles from 1988 to 1997: A methodological review. Journal of Experimental Education, 69, 280–309.
  • Kline, RB (2005). Principles and practice of structural equation modeling (2nd ed.). New York: Guilford.
  • Knapp, TR (1978). Canonical correlation analysis: A general parametric significance testing system. Psychological Bulletin, 85, 410–416.
  • Marsh, H, Dowson, M, Pietsch, J, & Walker, R (2004). Why multicollinearity matters: A reexamination of relations between self-efficacy, self-concept, and achievement. Journal of Educational Psychology, 96, 518–522.
  • Mood, AR (1969). Macro-analysis of the American educational system. Operations Research, 17, 770–784.
  • Nimon, K, Lewis, M, Kane, R, & Haynes, RM (2008). An R package to compute commonality coefficients in the multiple regression case: An introduction to the package and a practical example. Behavior Research Methods, 40, 457–466.
  • Pajares, F, & Graham, L (1999). Self-efficacy, motivation constructs, and mathematics performance of entering middle school students. Contemporary Educational Psychology, 24, 124–139.
  • Pedhazur, EJ (1997). Multiple regression in behavioral research: Explanation and prediction (3rd ed.). Fort Worth, TX: Harcourt Brace.
  • Snyder, PA, & Thompson, B (1998). Use of tests of statistical significance and other analytic choices in a school psychology journal: Review of practices and suggested alternatives. School Psychology Quarterly, 13, 335–348.
  • Steinmayr, R, & Spinath, B (2009). The importance of motivation as a predictor of school achievement. Learning and Individual Differences, 19, 80–90.
  • Stevens, JP (2002). Applied multivariate statistics for the social sciences (4th ed.). Mahwah, NJ: Lawrence Erlbaum.
  • Tatsuoka, MM (1971). Significance tests: Univariate and multivariate. Champaign, IL: Institute for Personality and Ability Testing.
  • Thompson, B (1984). Canonical correlation analysis: Uses and interpretations. Thousand Oaks, CA: Sage.
  • Thompson, B (1991). A primer on the logic and use of canonical correlation analysis. Measurement and Evaluation in Counseling and Development, 24, 80–95.
  • Thompson, B (1995). Stepwise regression and stepwise discriminant analysis need not apply here: A guidelines editorial. Educational and Psychological Measurement, 55, 525–534.
  • Thompson, B (2000a). Canonical correlation analysis. In Grimm, L, & Yarnold, P (Eds.), Reading and understanding more multivariate statistics (pp. 285–316). Washington, DC: American Psychological Association.
  • Thompson, B (2000b). Ten commandments of structural equation modeling. In Grimm, L, & Yarnold, P (Eds.), Reading and understanding more multivariate statistics (pp. 261–284). Washington, DC: American Psychological Association.
  • Thompson, B (2002). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31(3), 24–32.
  • Thompson, B (2004). Exploratory and confirmatory factor analysis: Understanding concepts and applications. Washington, DC: American Psychological Association.
  • Thompson, B (2006a). Foundations of behavioral statistics: An insight-based approach. New York: Guilford.
  • Thompson, B (2006b). Research synthesis: Effect sizes. In Green, J, Camilli, G, & Elmore, PB (Eds.), Handbook of complementary methods in education research (pp. 583–603). Washington, DC: American Educational Research Association.
  • Thompson, B (2008). Standards in conducting and publishing research in education. Mid-Western Educational Researcher, 21(1), 10–16. Available at http://www.coe.tamu.edu/~bthompson/mwera.htm
  • Thompson, B, & Snyder, PA (1998). Statistical significance and reliability analyses in recent JCD research articles. Journal of Counseling and Development, 76, 436–441.
  • Wentzel, K (2002). Are effective teachers like good parents? Teaching styles and student adjustment in early adolescence. Child Development, 73, 287–301.
  • Wilkins, JLM (2008). The relationship among elementary teachers’ content knowledge, attitudes, beliefs, and practices. Journal of Mathematics Teacher Education, 11, 139–164.
  • Willson, VL (1980). Research techniques in AERJ articles: 1969 to 1978. Educational Researcher, 9(6), 5–10.
  • Zientek, LR, Capraro, MM, & Capraro, RM (2008). Reporting practices in quantitative teacher education research: One look at the evidence cited in the AERA panel report. Educational Researcher, 37(4), 208–216.
  • Zientek, LR, & Thompson, B (2006). Commonality analysis: Partitioning variance to facilitate better understanding of data. Journal of Early Intervention, 28, 299–307.

Educational Researcher, Vol. 38, No. 5, 343-352 (2009)
DOI: 10.3102/0013189X09339056

