A Comparative Study of Two-Sample Tests for High-Dimensional Covariance Matrices

ABSTRACT: The equality of covariance matrices is an essential assumption in analyses of means and discriminant analysis for high-dimensional data. The performance of tests for covariance matrices may vary substantially depending on the covariance structure, so verifying the assumption with an inappropriate test leads to poorer performance. The purpose of this study is to assess and compare the performance of three tests for two-sample high-dimensional covariance matrices, namely those of Schott (2007), Srivastava and Yanagihara (2010), and Li and Chen (2012), under various covariance structures. A simulation study was conducted with spherical, compound symmetric, block-diagonal, and first-order autoregressive covariance structures with homogeneous variances. The results show that Li and Chen's test outperforms the others with a sample size of at least 10 under particular covariance structures. When the number of variables is increased with a fixed sample size, Li and Chen's test still performs well, whereas the performance of Schott's test deteriorates. Some recommendations for selecting appropriate tests are also provided in this paper.


I. INTRODUCTION
As measurement technology has advanced, high-dimensional data have become more common in a variety of fields, including medical science, genomics, and economics. DNA microarrays, a powerful technology for studying gene expression on a genomic scale, are an example of high-dimensional data in medical science, since they involve thousands of gene expression variables with a very small sample size. In Alon et al. [1], the number of variables reached 6,500 while the sample sizes of the first and second groups were 22 and 40, respectively; even after a classification method was applied to reduce the variables into groups, 2,000 variables remained.
The analytical methods used for high-dimensional data differ from those used for low-dimensional data. When the number of variables exceeds the sample size, as in high-dimensional data, statistical methods become very complicated, and in many cases effective methods from univariate and multivariate analysis are inapplicable. Two prominent instances are Hotelling's $T^2$ test [2] for comparing two mean vectors and the likelihood ratio test for comparing two or more covariance matrices.
In analyses of means and discriminant analysis, testing the equality of two covariance matrices is an important step for data analysts to ensure that the data satisfy the assumption of homogeneous covariance matrices. However, methods for high-dimensional data are still limited, and their performance depends on the covariance structure, the number of variables, and the sample size. Existing tests include those of Schott [3], Srivastava and Yanagihara [4], Li and Chen [5], and Srivastava, Yanagihara, and Kubokawa [6], the last of which is a modification of Schott's test [3]. The goal of this study is to examine and compare the performance of the three tests established by Schott [3], Srivastava and Yanagihara [4], and Li and Chen [5] for the equality of two covariance matrices in high-dimensional data under various covariance structures.

II. TESTS FOR COVARIANCE MATRICES IN HIGH-DIMENSIONAL DATA
Let $x_{ij}$ be independently and identically distributed as $N_p(\mu_i, \Sigma_i)$, $i = 1, 2$; $j = 1, 2, \ldots, n_i$, from population $i$, and assume that the two samples are independent. In this study, we consider the high-dimensional case, in which the number of variables exceeds the sample size, $p > n$. One important obstacle that makes most multivariate statistical methods inapplicable in the high-dimensional case is the singularity of the high-dimensional sample covariance matrix. As a result, existing tests for high-dimensional covariance matrices, such as those presented by Schott [3], Srivastava and Yanagihara [4], Li and Chen [5], and Cai, Liu and Xia [7], were developed without using the inverse of the sample covariance matrix. Additionally, since the test proposed by Cai et al. [7] was based on a sparse matrix, a narrower setting than that of the other three tests, it was not included in this study.
The hypothesis testing problem in this study is $H_0: \Sigma_1 = \Sigma_2$ versus $H_1: \Sigma_1 \neq \Sigma_2$. The sample mean vectors are $\bar{x}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij}$, and the sample covariance matrices, denoted by $S_i$, are computed from the centered observations $x_{ij} - \bar{x}_i$, $i = 1, 2$.
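The singularity of the sample covariance matrix when the number of variables exceeds the sample size can be checked numerically. The following is a minimal sketch in standard-library Python, not the code used in the study: the helper names `sample_covariance` and `matrix_rank`, the divisor $n - 1$, the sizes $n = 5$ and $p = 12$, and the seed are all assumptions chosen for illustration.

```python
import random


def sample_covariance(rows):
    """Sample covariance matrix (divisor n - 1, an assumption of this sketch)."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[k] for row in rows) / n for k in range(p)]
    centered = [[row[k] - means[k] for k in range(p)] for row in rows]
    return [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
        for a in range(p)
    ]


def matrix_rank(matrix, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


rng = random.Random(0)  # fixed seed so the run is repeatable
n, p = 5, 12            # illustrative sizes with p > n
data = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
S = sample_covariance(data)
rank_S = matrix_rank(S)
# The centered observations span at most n - 1 dimensions, so S cannot be inverted.
print(f"p = {p}, rank(S) = {rank_S}")
```

Because the rank of $S_i$ cannot exceed $n_i - 1$, any statistic that requires $S_i^{-1}$ is undefined in this setting, which is why the tests compared here avoid the inverse.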

A. Schott's Test
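The test presented by Schott [3] is based on the square of the Frobenius norm of the difference between the two covariance matrices, $\|\Sigma_1 - \Sigma_2\|_F^2 = \mathrm{tr}\{(\Sigma_1 - \Sigma_2)^2\}$. The test statistic is denoted by $T_S$. The $T_S$ test does not perform well when the data are right-skewed, particularly when the number of variables is increased while the sample size remains constant [6].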
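To make the squared Frobenius norm concrete, the sketch below computes the plug-in quantity $\|S_1 - S_2\|_F^2$ for two small samples and confirms that it equals $\mathrm{tr}\{(S_1 - S_2)^2\}$. This is only the raw distance, not Schott's statistic: the bias correction and standardization used in [3] are not reproduced here. The function names and the toy data are assumptions made for illustration.

```python
def sample_covariance(rows):
    """Sample covariance matrix (divisor n - 1, an assumption of this sketch)."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[k] for row in rows) / n for k in range(p)]
    centered = [[row[k] - means[k] for k in range(p)] for row in rows]
    return [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
        for a in range(p)
    ]


def squared_frobenius_distance(a, b):
    """Sum of squared element-wise differences between two matrices."""
    return sum(
        (a[i][j] - b[i][j]) ** 2
        for i in range(len(a))
        for j in range(len(a[0]))
    )


def trace_of_squared_difference(a, b):
    """tr{(A - B)^2}, which equals the squared Frobenius norm for symmetric A - B."""
    p = len(a)
    d = [[a[i][j] - b[i][j] for j in range(p)] for i in range(p)]
    return sum(d[i][k] * d[k][i] for i in range(p) for k in range(p))


# Hypothetical toy samples: 4 observations on 3 variables per group.
sample_1 = [[1.0, 2.0, 0.5], [0.0, 1.5, 1.0], [2.0, 0.5, 0.0], [1.0, 1.0, 2.0]]
sample_2 = [[3.0, 0.0, 1.0], [0.5, 2.5, 0.0], [1.0, 4.0, 2.0], [2.5, 1.0, 3.0]]

S1 = sample_covariance(sample_1)
S2 = sample_covariance(sample_2)
distance = squared_frobenius_distance(S1, S2)
print(distance, trace_of_squared_difference(S1, S2))
```

The quantity is zero only when the two matrices coincide, which is why it serves as a natural measure of departure from the null hypothesis.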

B. Srivastava and Yanagihara's Test
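The test developed by Srivastava and Yanagihara [4], denoted here by $T_{SY}$, is based on estimators computed separately from each sample $i$, $i = 1, 2$. The null hypothesis is rejected when the test statistic falls in the critical region determined by the standard normal distribution, where $Z$ is a standard normal random variable. In addition, the statistic $Q_2$ can be referred to the chi-squared distribution with 1 degree of freedom, and the null hypothesis is rejected when it exceeds the corresponding critical value. When the number of variables was 200, the test statistic $Q_2$ was shown to perform well [4].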

C. Li and Chen's Test
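The test developed by Li and Chen [5] is based on an unbiased and consistent estimator of $\mathrm{tr}\{(\Sigma_1 - \Sigma_2)^2\}$. The test statistic, denoted by $T_{LC}$, is built from summations over the observations whose number of terms grows rapidly with the sample sizes $n_i$; this complicates the calculation and takes a long time if the sample sizes are very large [5].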
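A rough sense of this computational burden can be obtained by counting terms. The sketch below assumes that the statistic contains a summation over ordered quadruples of distinct observations within a sample, as is typical of U-statistic-type estimators; the function name `four_index_terms` and the chosen sample sizes are illustrative only.

```python
import math


def four_index_terms(n):
    """Number of ordered quadruples of distinct indices drawn from n observations."""
    return math.perm(n, 4)  # n * (n - 1) * (n - 2) * (n - 3)


for n in (10, 20, 40, 80):
    print(f"n = {n:3d}: {four_index_terms(n):,} terms in one four-index sum")
```

Doubling the sample size multiplies the number of terms by roughly sixteen, so a direct evaluation becomes slow well before the sample sizes are large by conventional standards.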

III. SIMULATION PROCEDURE
To evaluate the performance of the tests developed by Schott [3], Srivastava and Yanagihara [4], and Li and Chen [5], four covariance structures were considered in a simulation study: sphericity, compound symmetry (CS), block-diagonal structure (BD), and first-order autoregressive structure with homogeneous variances, or AR(1). The simulation study was conducted using R. We set $\Sigma_1 = \Sigma_2$ under the null hypothesis; under the alternative, the first population covariance matrix $\Sigma_1$ was kept the same as under the null hypothesis, while the second population covariance matrix $\Sigma_2$ differed from $\Sigma_1$ but had the same structure, as follows:

A. Sphericity
Under the null hypothesis, set $\Sigma_1 = \Sigma_2$ to a spherical matrix, and under the alternative hypothesis, set $\Sigma_2 = 3I_p$.
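The spherical setting, together with the other three covariance structures named in this section, can be sketched as follows. The function names and the parameter values (the correlation `rho`, the variance, and the block sizes) are assumptions for illustration and are not the values used in the study; only the factor 3 in the spherical alternative comes from the text.

```python
def spherical(p, scale=1.0):
    """scale * I_p: equal variances, no correlation."""
    return [[scale if i == j else 0.0 for j in range(p)] for i in range(p)]


def compound_symmetric(p, variance=1.0, rho=0.5):
    """Equal variances and one common correlation rho between all pairs."""
    return [
        [variance if i == j else variance * rho for j in range(p)]
        for i in range(p)
    ]


def ar1(p, variance=1.0, rho=0.5):
    """First-order autoregressive: correlation rho ** |i - j|, equal variances."""
    return [[variance * rho ** abs(i - j) for j in range(p)] for i in range(p)]


def block_diagonal(blocks):
    """Place square blocks along the diagonal and zeros elsewhere."""
    p = sum(len(block) for block in blocks)
    out = [[0.0] * p for _ in range(p)]
    offset = 0
    for block in blocks:
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                out[offset + i][offset + j] = value
        offset += len(block)
    return out


p = 4
sigma_1 = spherical(p)             # a spherical matrix under the null (I_p assumed)
sigma_2 = spherical(p, scale=3.0)  # 3 * I_p under the alternative
bd = block_diagonal([compound_symmetric(2), compound_symmetric(2)])
for row in bd:
    print(row)
```

Keeping each structure in its own small function makes it easy to change $\Sigma_2$ while leaving the structure itself unchanged, which mirrors the design of the alternatives described above.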

C. BD
Under the null hypothesis, set $\Sigma_1 = \Sigma_2$ with a block-diagonal structure.
For each condition, 1,000 samples were generated at a nominal significance level of 0.05. The performance of the tests was assessed using the attained significance level (ASL) and empirical power. Under the null hypothesis, the ASL was calculated as the proportion of the 1,000 replications in which the calculated test statistic fell inside the critical region. The ASL was evaluated using Bradley's liberal criterion [8]. A test was regarded as acceptable when its ASL fell within the interval [0.04, 0.06]. To obtain the empirical power, the simulation study was conducted under the alternative hypothesis of unequal covariance matrices with the same covariance structure.
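The evaluation step described above can be sketched as follows. The function names and the example counts are hypothetical; the interval [0.04, 0.06] and the 1,000 replications are taken from the text. The same proportion, computed from replications generated under the alternative hypothesis, gives the empirical power.

```python
def attained_significance_level(rejections):
    """Proportion of replications in which the null hypothesis was rejected."""
    return sum(1 for rejected in rejections if rejected) / len(rejections)


def is_acceptable(asl, lower=0.04, upper=0.06):
    """Check the ASL against the acceptance interval stated in the text."""
    return lower <= asl <= upper


# Hypothetical outcome: 52 rejections in 1,000 replications under the null.
rejections = [True] * 52 + [False] * 948
asl = attained_significance_level(rejections)
print(asl, is_acceptable(asl))
```

Separating the proportion from the acceptance check keeps the criterion explicit, so a different interval can be substituted without touching the simulation loop.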

IV. RESULTS
The simulation results are presented in terms of the attained significance level (ASL) and empirical power under four covariance structures: sphericity, CS, BD, and AR(1), as shown in Tables 1-4, respectively. Table 2 shows that all three tests in the study, $T_S$, $T_{SY}$, and $T_{LC}$, performed unacceptably under the compound symmetric covariance matrix. Table 3 illustrates that $T_S$ performs acceptably and $T_{LC}$ performs well with a sample size of at least 10, but $T_{SY}$ does not. Table 4 shows that under the AR(1) covariance structure, $T_S$ and $T_{LC}$ perform well, whereas $T_{SY}$ performs poorly. In addition, when the number of variables ($p$) is increased with a fixed sample size, $T_{LC}$ performs slightly better than $T_S$.

V. CONCLUSION AND DISCUSSION
The purpose of this study is to assess and compare the performance of tests for the equality of covariance matrices in high-dimensional data. The tests considered were those of Schott [3], Srivastava and Yanagihara [4], and Li and Chen [5], and the simulation study was conducted under four covariance structures: sphericity, compound symmetry, block-diagonal structure, and first-order autoregressive structure with homogeneous variances, or AR(1).

Conclusion
When the data follow a multivariate normal distribution with a spherical, block-diagonal, or AR(1) covariance structure, the tests presented by Schott [3], Srivastava and Yanagihara [4], and Li and Chen [5] perform differently. Overall, Li and Chen's test outperforms the others, although it is only slightly better than Schott's test. For sample sizes of at least 10, Li and Chen's and Schott's tests can be used effectively. When the number of variables is increased with a fixed sample size, Li and Chen's test performs better, while Schott's test performs worse. In addition, when the covariance structure is compound symmetric, none of the three tests in this study performs well. To test the equality of covariance matrices in high-dimensional data, the structure of the covariance matrices should be examined first, which can be done by considering the pattern of the sample covariance matrix. When the covariance structure is spherical, block-diagonal, or first-order autoregressive, the guidelines are:
Case 1: When the sample size ($n_i$) is at least 10 and the number of variables ($p$) is substantially greater than the sample size, such as $p > 7n_i$, Li and Chen's test should be applied.
Case 2: When the sample size ($n_i$) is at least 10 and the number of variables ($p$) is not substantially greater than the sample size, such as $n_i < p < 7n_i$, either Li and Chen's or Schott's test should be used.
Case 3: When the sample size is smaller than 10, Schott's test is recommended.

Discussion
The findings of this study are corroborated by Li and Chen [5], whose simulation study was conducted under spherical and block-diagonal covariance structures and found that the ASLs were close to the nominal significance level. The results also show that the test by Srivastava and Yanagihara [4] did not perform well under any of the covariance structures in this study; the test might be suitable only for particular covariance structures. For instance, the covariance structure considered by Srivastava and Yanagihara [4] involved a matrix $D$, where $D$ was a diagonal matrix, together with a uniform distribution on the interval $[a, b]$; under the alternative hypothesis, $\Sigma_2$ was modified while $\Sigma_1$ was the same as in the null hypothesis. This result leads to a suggestion for developing a new test: if a test performs acceptably under a variety of covariance structures, analysts and researchers will find it easier to select a test for the equality of covariance matrices.
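The case-based selection guidelines stated in the conclusion can be written as a small decision function. This is a sketch under stated assumptions: the function name `recommend_test` and the structure labels are hypothetical, the multiplier 7 is the example threshold given in the text, and the boundary $p = 7n_i$, which the stated cases do not cover, is grouped with Case 2 here.

```python
def recommend_test(n_i, p, structure="spherical"):
    """Return the test suggested by the guidelines for sample size n_i and dimension p."""
    if structure == "compound symmetric":
        # None of the three tests performed well under compound symmetry.
        return "none recommended"
    if structure not in {"spherical", "block-diagonal", "ar1"}:
        raise ValueError(f"guidelines do not cover structure: {structure!r}")
    if p <= n_i:
        raise ValueError("guidelines assume high-dimensional data (p > n_i)")
    if n_i < 10:
        return "Schott"                 # Case 3
    if p > 7 * n_i:
        return "Li and Chen"            # Case 1
    return "Li and Chen or Schott"      # Case 2 (p == 7 * n_i included by assumption)


for n_i, p in [(20, 200), (20, 100), (5, 100)]:
    print(n_i, p, recommend_test(n_i, p))
```

Checking the covariance structure before the sample size reflects the order of the recommendation itself: the structure is examined first, and the cases apply only to the three structures under which the tests behaved acceptably.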

International Journal of Current Science Research and Review (IJCSRR), ISSN: 2581-8341, Volume 05, Issue 04, April 2022, Page No. 1073-1080. DOI: 10.47191/ijcsrr/V5-i4-28. Impact Factor: 5.995. Corresponding Author: Knavoot Jiamwattanapong. Available at: ijcsrr.org
In the rejection rule for Li and Chen's test, $Z$ is a standard normal random variable. To obtain the value of the test statistic $T_{LC}$, the terms in the summations must be expanded to the fourth order in the sample sizes.

Table 1. ASL and empirical power of the tests with spherical covariance structure at nominal level 0.05

Table 1 shows that when the sample size is at least 10, both $T_S$ and $T_{LC}$ are acceptable under the spherical covariance structure, while $T_{SY}$ is not. When the number of variables is increased while the sample size remains constant, $T_{LC}$ still performs better than $T_S$ in terms of empirical power. Overall, $T_{LC}$ outperforms the others when the covariance structure is spherical.

Table 2. ASL and empirical power of the tests with compound symmetric covariance structure at nominal level 0.05

Table 3. ASL and empirical power of the tests with block-diagonal covariance structure at nominal level 0.05

Table 4. ASL and empirical power of the tests with AR(1) covariance structure at nominal level 0.05