Publications by authors named "Nathaniel Schenker"

20 Publications

Small Area Estimation of Cancer Risk Factors and Screening Behaviors in US Counties by Combining Two Large National Health Surveys.

Prev Chronic Dis 2019 Aug 29;16:E119. Epub 2019 Aug 29.

Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania.

Background: National health surveys, such as the National Health Interview Survey (NHIS) and the Behavioral Risk Factor Surveillance System (BRFSS), collect data on cancer screening and smoking-related measures in the US noninstitutionalized population. These surveys are designed to produce reliable estimates at the national and state levels. However, county-level data are often needed for cancer surveillance and related research.

Methods: To use the large sample sizes of BRFSS and the high response rates and better coverage of NHIS, we applied multilevel models that combined information from both surveys. We also used relevant sources such as census and administrative records. By using these methods, we generated estimates for several cancer risk factors and screening behaviors that are more precise than design-based estimates.

Results: We produced reliable, modeled estimates for 11 outcomes related to smoking and to screening for female breast cancer, cervical cancer, and colorectal cancer. The estimates were produced for 3,112 counties in the United States for the data period from 2008 through 2010.

Conclusion: The modeled estimates corrected for potential noncoverage bias and nonresponse bias in the BRFSS and reduced the variability in NHIS estimates that is attributable to small sample size. The small area estimates produced in this study can serve as a useful resource to the cancer surveillance community.

Source
DOI: http://dx.doi.org/10.5888/pcd16.190013
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6716412
August 2019

Multiple imputation of completely missing repeated measures data within person from a complex sample: application to accelerometer data in the National Health and Nutrition Examination Survey.

Stat Med 2016 Dec 2;35(28):5170-5188. Epub 2016 Aug 2.

National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD, U.S.A.

The Physical Activity Monitor component was introduced into the 2003-2004 National Health and Nutrition Examination Survey (NHANES) to collect objective information on physical activity, including both movement intensity counts and ambulatory steps. Because of an error in the accelerometer device initialization process, the steps data were missing for all participants in several primary sampling units, typically a single county or group of contiguous counties, who had intensity count data from their accelerometers. To avoid potential bias and loss in efficiency in estimation and inference involving the steps data, we considered methods to accurately impute the missing values for steps collected in the 2003-2004 NHANES. The objective was to develop an efficient imputation method that minimized model-based assumptions. We adopted a multiple imputation approach based on additive regression, bootstrapping, and predictive mean matching methods. This method fits alternating conditional expectation (ACE) models, which use an automated procedure to estimate optimal transformations for both the predictor and response variables. This paper describes the approaches used in this imputation and evaluates the methods by comparing the distributions of the original and the imputed data. A simulation study using the observed data is also conducted as part of the model diagnostics. Finally, some real data analyses are performed to compare the before and after imputation results. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.
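
The bootstrapping and predictive mean matching steps named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps the additive-regression (ACE) fit for an ordinary least-squares line, and the function name `pmm_impute`, the donor-pool size `k`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def pmm_impute(x, y, k=5, rng=None):
    """Fill missing y (np.nan) by predictive mean matching on a linear fit of y on x.

    Hypothetical helper: a plain linear model stands in for the paper's
    additive regression with ACE transformations.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)
    obs_idx = np.flatnonzero(observed)
    design = np.column_stack([np.ones_like(x), x])

    # Refit on a bootstrap resample of the observed cases so that each
    # imputation reflects uncertainty in the fitted coefficients.
    boot = rng.choice(obs_idx, size=obs_idx.size, replace=True)
    coef, *_ = np.linalg.lstsq(design[boot], y[boot], rcond=None)
    predicted = design @ coef

    imputed = y.copy()
    for i in np.flatnonzero(~observed):
        # Donors are the k observed cases whose predicted means are closest.
        distance = np.abs(predicted[obs_idx] - predicted[i])
        donors = obs_idx[np.argsort(distance)[:k]]
        # Copy an actually observed value from a randomly chosen donor.
        imputed[i] = y[rng.choice(donors)]
    return imputed

# Simulated data (illustrative only): every fifth response is set to missing.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + rng.normal(0, 1, 50)
y[::5] = np.nan

# Five completed versions of y, one per imputation.
completed = [pmm_impute(x, y, k=5, rng=rng) for _ in range(5)]
```

Because every imputed value is copied from an observed donor, the imputations stay within the range of the observed data, which is one reason predictive mean matching is often described as relying on fewer model-based assumptions.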

Source
DOI: http://dx.doi.org/10.1002/sim.7049
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5096983
December 2016

MULTIPLE IMPUTATION FOR MISSINGNESS DUE TO NONLINKAGE AND PROGRAM CHARACTERISTICS: A CASE STUDY OF THE NATIONAL HEALTH INTERVIEW SURVEY LINKED TO MEDICARE CLAIMS.

J Surv Stat Methodol 2016 Sep 21;4(3):316-338. Epub 2016 May 21.

National Center for Health Statistics, Hyattsville, MD 20782, USA.

Record linkage is a valuable and efficient tool for connecting information from different data sources. The National Center for Health Statistics (NCHS) has linked its population-based health surveys with administrative data, including Medicare enrollment and claims records. However, the linked NCHS-Medicare files are subject to missing data: first, not all survey participants agree to record linkage, and second, Medicare claims data are only consistently available for beneficiaries enrolled in the Fee-for-Service (FFS) program, not in Medicare Advantage (MA) plans. In this research, we examine the usefulness of multiple imputation for handling missing data in linked National Health Interview Survey (NHIS)-Medicare files. The motivating example is a study of mammography status from 1999 to 2004 among women aged 65 years and older enrolled in the FFS program. In our example, mammography screening status and FFS/MA plan type are missing for NHIS survey participants who were not linkage eligible. Mammography status is also missing for linked participants in an MA plan. We explore three imputation approaches: (i) imputing screening status first, (ii) imputing FFS/MA plan type first, and (iii) imputing the two longitudinal processes simultaneously. We conduct simulation studies to evaluate these methods and compare them using the linked NHIS-Medicare files. The imputation procedures described in our paper would also be applicable to other public health-related research using linked data files with missing data issues arising from program characteristics (e.g., intermittent enrollment or data collection) reflected in administrative data and linkage eligibility by survey participants.

Source
DOI: http://dx.doi.org/10.1093/jssam/smw002
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6444366
September 2016

A Note on the Effect of Data Clustering on the Multiple-Imputation Variance Estimator: A Theoretical Addendum to Lewis et al. (2014).

J Off Stat 2016 10;32(1):147-164. Epub 2016 Mar 10.

National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD, 20782, U.S.A.

Multiple imputation is a popular approach to handling missing data. Although it was originally motivated by survey nonresponse problems, it has been readily applied to other data settings. However, its general behavior remains unclear when applied to survey data with complex sample designs, including clustering. Recently, Lewis et al. (2014) compared single- and multiple-imputation analyses for certain incomplete variables in the 2008 National Ambulatory Medical Care Survey, which has a nationally representative, multistage, and clustered sampling design. Their study results suggested that the increase of the variance estimate due to multiple imputation compared with single imputation largely disappears for estimates with large design effects. We complement their empirical research by providing some theoretical reasoning. We consider data sampled from an equally weighted, single-stage cluster design and characterize the process using a balanced, one-way normal random-effects model. Assuming that the missingness is completely at random, we derive analytic expressions for the within- and between-multiple-imputation variance estimators for the mean estimator, and thus conveniently reveal the impact of design effects on these variance estimators. We propose approximations for the fraction of missing information in clustered samples, extending previous results for simple random samples. We discuss some generalizations of this research and its practical implications for data release by statistical agencies.
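
For readers unfamiliar with design effects in the balanced, single-stage cluster setting described above, the following sketch computes the textbook design effect for a sample mean, 1 + (m - 1) * rho, where m is the common cluster size and rho is the intraclass correlation implied by a one-way random-effects model. It does not reproduce the paper's variance-estimator derivations, and the function names and variance components are illustrative assumptions.

```python
def intraclass_correlation(var_between, var_within):
    """Share of total variance that lies between clusters."""
    return var_between / (var_between + var_within)

def design_effect(cluster_size, rho):
    """Variance inflation of a mean under equal-size clusters versus simple random sampling."""
    return 1.0 + (cluster_size - 1) * rho

def effective_sample_size(n, deff):
    """Size of the simple random sample that would give the same precision."""
    return n / deff

# Hypothetical variance components and design (illustrative numbers only).
rho = intraclass_correlation(var_between=1.0, var_within=9.0)
deff = design_effect(cluster_size=21, rho=rho)
n_eff = effective_sample_size(n=2100, deff=deff)
```

With these assumed inputs the intraclass correlation is 0.1 and the design effect is about 3, so 2,100 clustered observations carry roughly the information of 700 independent ones.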

Source
DOI: http://dx.doi.org/10.1515/jos-2016-0007
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6444354
March 2016

Identifying implausible gestational ages in preterm babies with Bayesian mixture models.

Stat Med 2013 May 4;32(12):2097-113. Epub 2012 Nov 4.

National Center for Health Statistics, Hyattsville, MD, 20782, USA.

Infant birth weight and gestational age are two important variables in obstetric research. The primary measure of gestational age used in US birth data is based on a mother's recall of her last menstrual period, which has been shown to introduce random or systematic errors. To mitigate some of those errors, Oja et al., Platt et al., and Tentoni et al. estimated the probabilities of gestational ages being misreported under the assumption that the distribution of infant birth weights for a true gestational age is approximately Gaussian. From this assumption, Oja et al. fitted a three-component mixture model, and Tentoni et al. and Platt et al. fitted two-component mixture models. We build on their methods and develop a Bayesian mixture model. We then extend our methods using reversible jump Markov chain Monte Carlo to incorporate the uncertainty in the number of components in the model. We conduct simulation studies and apply our methods to singleton births with reported gestational ages of 23-32 weeks using 2001-2008 US birth data. Results show that a three-component mixture model fits the birth data better for gestational ages reported as 25 weeks or less; and a two-component mixture model fits better for the higher gestational ages. Under the assumption that our Bayesian mixture models are appropriate for US birth data, our research provides useful statistical tools to identify records with implausible gestational ages, and the techniques can be used in part of a multiple-imputation procedure for missing and implausible gestational ages.

Source
DOI: http://dx.doi.org/10.1002/sim.5657
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6426294
May 2013

Estimating standard errors for life expectancies based on complex survey data with mortality follow-up: A case study using the National Health Interview Survey Linked Mortality Files.

Stat Med 2011 May 22;30(11):1302-11. Epub 2011 Mar 22.

National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD 20782, USA.

Life expectancy is an important measure for health research and policymaking. Linking individual survey records to mortality data can overcome limitations in vital statistics data used to examine differential mortality by permitting the construction of death rates based on information collected from respondents at the time of interview and facilitating estimation of life expectancies for subgroups of interest. However, use of complex survey data linked to mortality data can complicate the estimation of standard errors. This paper presents a case study of approaches to variance estimation for life expectancies based on life tables, using the National Health Interview Survey Linked Mortality Files. The approaches considered include application of Chiang's traditional method, which is straightforward but does not account for the complex design features of the data; balanced repeated replication (BRR), which is more complicated but accounts more fully for the design features; and compromise, 'hybrid' approaches, which can be less difficult to implement than BRR but still account partially for the design features. Two tentative conclusions are drawn. First, it is important to account for the effects of the complex sample design, at least within life-table age intervals. Second, accounting for the effects within age intervals but not across age intervals, as is done by the hybrid methods, can yield reasonably accurate estimates of standard errors, especially for subgroups of interest with more homogeneous characteristics among their members.

Source
DOI: http://dx.doi.org/10.1002/sim.4219
May 2011

Multiple imputation of missing dual-energy X-ray absorptiometry data in the National Health and Nutrition Examination Survey.

Stat Med 2011 Feb 30;30(3):260-76. Epub 2010 Nov 30.

National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD, USA.

In 1999, dual-energy x-ray absorptiometry (DXA) scans were added to the National Health and Nutrition Examination Survey (NHANES) to provide information on soft tissue composition and bone mineral content. However, in 1999-2004, DXA data were missing in whole or in part for about 21 per cent of the NHANES participants eligible for the DXA examination; and the missingness is associated with important characteristics such as body mass index and age. To handle this missing-data problem, multiple imputation of the missing DXA data was performed. Several features made the project interesting and challenging statistically, including the relationship between missingness on the DXA measures and the values of other variables; the highly multivariate nature of the variables being imputed; the need to transform the DXA variables during the imputation process; the desire to use a large number of non-DXA predictors, many of which had small amounts of missing data themselves, in the imputation models; the use of lower bounds in the imputation procedure; and relationships between the DXA variables and other variables, which helped both in creating and evaluating the imputations. This paper describes the imputation models, methods, and evaluations for this publicly available data resource and demonstrates properties of the imputations via examples of analyses of the data. The analyses suggest that imputation helps to correct biases that occur in estimates based on the data without imputation, and that it helps to increase the precision of estimates as well. Moreover, multiple imputation usually yields larger estimated standard errors than those obtained with single imputation.

Source
DOI: http://dx.doi.org/10.1002/sim.4080
February 2011

The use of covariates to identify records with implausible gestational ages using the birthweight distribution.

Paediatr Perinat Epidemiol 2010 Sep;24(5):424-32

Office of Analysis and Epidemiology, National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD 20782-2003, USA.

The objective of this study was to evaluate the usefulness of covariates in identifying birth records with implausible values of gestational age. Birthweight distributions for births with early reported gestational ages are markedly bimodal, suggesting a mixture of two distributions. Most births form a normal-shaped left-hand (primary) distribution and a smaller number form the right-hand (secondary) distribution. The births in the secondary distribution are thought to have gestational age mistakenly reported. Prior work has found that births in the secondary distribution are at higher risk of poor outcomes than those in the primary distribution. Using 2002 US Natality data for gestational ages 26-35 weeks, we fit normal mixture models to birthweight with and without covariates (maternal race, education, parity, age, region of the country, prenatal care initiation) by reported gestational age. Additional models were stratified by infant sex. This approach allowed for the relationship between the covariates and birthweight to differ between the components. Mixture models fit reasonably well for reported gestational ages <33 weeks, but not for later weeks. Counter to the hypothesis, results were similar for models with and without covariates or stratification or both, although stratified models without covariates predicted slightly more girls and slightly fewer boys in the secondary distribution than did the corresponding unstratified models. For reported gestational ages <33 weeks, predictions from the four sets of models were highly correlated and predictions were similar for subgroups defined by the clinical estimates of gestational age and other covariates. For births with reported gestational ages of 29 or more weeks, the proportion in the secondary distribution exceeded 30%, although this varied by maternal characteristics. 
The use of covariates and stratification complicated model fitting without materially improving identification of implausible gestational age values, supporting inferences from prior studies using data 'cleaned' without consideration of maternal or infant characteristics.
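
To show how a fitted two-component normal mixture can flag a record, the sketch below computes the posterior probability that a birthweight belongs to the secondary (right-hand) distribution. The component means, standard deviations, and mixing proportion are hypothetical placeholders, not estimates from the study, and covariates and stratification are omitted.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def prob_secondary(weight, p_secondary, primary, secondary):
    """Posterior probability that a birthweight comes from the secondary component.

    primary and secondary are (mean, sd) pairs in grams; p_secondary is the
    mixing proportion of the secondary component.
    """
    num = p_secondary * normal_pdf(weight, *secondary)
    den = num + (1.0 - p_secondary) * normal_pdf(weight, *primary)
    return num / den

# Hypothetical parameters for one reported gestational age (grams).
primary = (1200.0, 250.0)
secondary = (3200.0, 500.0)

p_low = prob_secondary(1100.0, 0.15, primary, secondary)   # weight typical of the primary component
p_high = prob_secondary(3300.0, 0.15, primary, secondary)  # weight typical of the secondary component
```

Under these assumed parameters a 3,300 g record is assigned almost entirely to the secondary distribution, so its reported gestational age would be treated as implausible, while a 1,100 g record would not.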

Source
DOI: http://dx.doi.org/10.1111/j.1365-3016.2010.01138.x
September 2010

State-based estimates of mammography screening rates based on information from two health surveys.

Public Health Rep 2010 Jul-Aug;125(4):567-78

Statistical Research & Applications Branch, National Cancer Institute, Bethesda, MD, USA.

Objectives: We compared national and state-based estimates for the prevalence of mammography screening from the National Health Interview Survey (NHIS), the Behavioral Risk Factor Surveillance System (BRFSS), and a model-based approach that combines information from the two surveys.

Methods: At the state and national levels, we compared the three estimates of prevalence for two time periods (1997-1999 and 2000-2003) and the estimated difference between the periods. We included state-level covariates in the model-based approach through principal components.

Results: The national mammography screening prevalence estimate based on the BRFSS was substantially larger than the NHIS estimate for both time periods. This difference may have been due to nonresponse and noncoverage biases, response mode (telephone vs. in-person) differences, or other factors. However, the estimated change between the two periods was similar for the two surveys. Consistent with the model assumptions, the model-based estimates were more similar to the NHIS estimates than to the BRFSS prevalence estimates. The state-level covariates (through the principal components) were shown to be related to the mammography prevalence with the expected positive relationship for socioeconomic status and urbanicity. In addition, several principal components were significantly related to the difference between NHIS and BRFSS telephone prevalence estimates.

Conclusions: Model-based estimates, based on information from the two surveys, are useful tools in representing combined information about mammography prevalence estimates from the two surveys. The model-based approach adjusts for the possible nonresponse and noncoverage biases of the telephone survey while using the large BRFSS state sample size to increase precision.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2882608
DOI: http://dx.doi.org/10.1177/003335491012500412
August 2010

Improving on analyses of self-reported data in a large-scale health survey by using information from an examination-based survey.

Stat Med 2010 Feb;29(5):533-45

National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD 20782, USA.

Common data sources for assessing the health of a population of interest include large-scale surveys based on interviews that often pose questions requiring a self-report, such as, 'Has a doctor or other health professional ever told you that you have [health condition of interest]?' or 'What is your (height/weight)?' Answers to such questions might not always reflect the true prevalences of health conditions (for example, if a respondent misreports height/weight or does not have access to a doctor or other health professional). Such 'measurement error' in health data could affect inferences about measures of health and health disparities. Drawing on two surveys conducted by the National Center for Health Statistics, this paper describes an imputation-based strategy for using clinical information from an examination-based health survey to improve on analyses of self-reported data in a larger interview-based health survey. Models predicting clinical values from self-reported values and covariates are fitted to data from the National Health and Nutrition Examination Survey (NHANES), which asks self-report questions during an interview component and also obtains clinical measurements during a physical examination component. The fitted models are used to multiply impute clinical values for the National Health Interview Survey (NHIS), a larger survey that obtains data solely via interviews. Illustrations involving hypertension, diabetes, and obesity suggest that estimates of health measures based on the multiply imputed clinical values are different from those based on the NHIS self-reported data alone and have smaller estimated standard errors than those based solely on the NHANES clinical data. The paper discusses the relationship of the methods used in the study to two-phase/two-stage/validation sampling and estimation, along with limitations, practical considerations, and areas for future research.

Source
DOI: http://dx.doi.org/10.1002/sim.3809
February 2010

Comparisons of percentage body fat, body mass index, waist circumference, and waist-stature ratio in adults.

Am J Clin Nutr 2009 Feb 30;89(2):500-8. Epub 2008 Dec 30.

National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD 20782, USA.

Background: Body mass index (BMI), waist circumference (WC), and the waist-stature ratio (WSR) are considered to be possible proxies for adiposity.

Objective: The objective was to investigate the relations between BMI, WC, WSR, and percentage body fat (measured by dual-energy X-ray absorptiometry) in adults in a large nationally representative US population sample from the National Health and Nutrition Examination Survey (NHANES).

Design: BMI, WC, and WSR were compared with percentage body fat in a sample of 12,901 adults.

Results: WC, WSR, and BMI were significantly more correlated with each other than with percentage body fat (P < 0.0001 for all sex-age groups). Percentage body fat tended to be significantly more correlated with WC than with BMI in men but significantly more correlated with BMI than with WC in women (P < 0.0001 except in the oldest age group). WSR tended to be slightly more correlated with percentage body fat than was WC. Percentile values of BMI, WC, and WSR are shown that correspond to percentiles of percentage body fat increments of 5 percentage points. More than 90% of the sample could be categorized to within one category of percentage body fat by each measure.

Conclusions: BMI, WC, and WSR perform similarly as indicators of body fatness and are more closely related to each other than to percentage body fat. These variables may be inaccurate measures of percentage body fat for an individual, but they correspond fairly well overall with percentage body fat within sex-age groups and distinguish categories of percentage body fat.

Source
DOI: http://dx.doi.org/10.3945/ajcn.2008.26847
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2647766
February 2009

Evaluation of a method for fitting a semi-Markov process model in the presence of left-censored spells using the Cardiovascular Health Study.

Stat Med 2008 Nov;27(26):5509-24

National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD 20782, USA.

We used a longitudinal data set covering 13 years from the Cardiovascular Health Study to evaluate the properties of a recently developed approach to deal with left censoring that fits a semi-Markov process (SMP) model by using an analog to the stochastic EM algorithm--the SMP-EM approach. It appears that the SMP-EM approach gives estimates of duration-dependent probabilities of health changes similar to those obtained by using SMP models that have the advantage of actual duration data. SMP-EM estimates of duration-dependent transition probabilities also appear more accurate and less variable than multi-state life table estimates.

Source
DOI: http://dx.doi.org/10.1002/sim.3382
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2878178
November 2008

Multiple imputation for national public-use datasets and its possible application for gestational age in United States Natality files.

Paediatr Perinat Epidemiol 2007 Sep;21 Suppl 2:97-105

National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD 20782, USA.

Multiple imputation (MI) is a technique that can be used for handling missing data in a public-use dataset. With MI, two or more completed versions of the dataset are created, containing possibly different but reasonable replacements for the missing data. Users analyse the completed datasets separately with standard techniques and then combine the results using simple formulae in a way that allows the extra uncertainty due to missing data to be assessed. An advantage of this approach is that the resulting public-use data can be analysed by a variety of users for a variety of purposes, without each user needing to devise a method to deal with the missing data. A recent example for a large public-use dataset is the MI of the family income and personal earnings variables in the National Health Interview Survey. We propose an approach to utilise MI to handle the problems of missing gestational ages and implausible birthweight-gestational age combinations in national vital statistics datasets. This paper describes MI and gives examples of MI for public-use datasets, summarises methods that have been used for identifying implausible gestational age values on birth records, and combines these ideas by setting forth scenarios for identifying and then imputing missing and implausible gestational age values multiple times. Because missing and implausible gestational age values are not missing completely at random, using multiple imputations and, thus, incorporating both the existing relationships among the variables and the uncertainty added from the imputation, may lead to more valid inferences in some analytical studies than simply excluding birth records with inadequate data.
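
The abstract refers to combining results from the completed datasets "using simple formulae" without stating them. The sketch below assumes the standard combining rules usually attributed to Rubin: average the point estimates, average the within-imputation variances, and add an inflated between-imputation variance. The function name `combine_mi` and the input numbers are illustrative assumptions.

```python
import math

def combine_mi(estimates, variances):
    """Combine point estimates and their variances from m completed datasets."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    q_bar = sum(estimates) / m                               # combined point estimate
    u_bar = sum(variances) / m                               # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)   # between-imputation variance
    t = u_bar + (1 + 1 / m) * b                              # total variance
    return q_bar, t, math.sqrt(t)

# Hypothetical results from five completed datasets (illustrative numbers only).
est = [10.0, 11.0, 9.0, 10.5, 9.5]
var = [4.0, 4.0, 4.0, 4.0, 4.0]
q_bar, total_var, se = combine_mi(est, var)
```

Here the combined estimate is 10.0 and the total variance is 4.75 rather than 4.0; the difference is the extra uncertainty due to missing data that a single imputation would hide.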

Source
DOI: http://dx.doi.org/10.1111/j.1365-3016.2007.00866.x
September 2007

Combining information from multiple surveys to enhance estimation of measures of health.

Stat Med 2007 Apr;26(8):1802-11

National Center for Health Statistics, Hyattsville, MD 20782, USA.

Survey estimates are often affected by non-sampling errors due to missing data, coverage error, and measurement or response error. Such non-sampling errors can be difficult to assess, and possibly correct for, using information from a single survey. Thus, combining information from multiple surveys can be beneficial. In addition, combining information from multiple surveys can help to reduce sampling error. This article describes four examples of projects undertaken by researchers within and outside the National Center for Health Statistics of the Centers for Disease Control and Prevention, in which information from multiple surveys was combined to adjust for non-sampling errors and thereby enhance estimation of various measures of health. The four projects can be described briefly as follows: (1) combining estimates from a survey of households and a survey of nursing homes to extend coverage; (2) using information from an interview survey to bridge the transition in race reporting in the United States census; (3) combining information from an examination survey and an interview survey to improve on analyses of self-reported data; and (4) combining information from two interview surveys to enhance small-area estimation. The article highlights the goals, techniques, and results from the four projects and discusses issues that can arise when information is combined from multiple surveys. Published in 2007 by John Wiley & Sons, Ltd.

Source
DOI: http://dx.doi.org/10.1002/sim.2801
April 2007

Overlapping confidence intervals or standard error intervals: what do they mean in terms of statistical significance?

J Insect Sci 2003 30;3:34. Epub 2003 Oct 30.

Department of Statistics, Oklahoma State University, 301 MSCS Building, Stillwater, OK 74078-1056, USA.

We investigate the procedure of checking for overlap between confidence intervals or standard error intervals to draw conclusions regarding hypotheses about differences between population parameters. Mathematical expressions and algebraic manipulations are given, and computer simulations are performed to assess the usefulness of confidence and standard error intervals in this manner. We make recommendations for their use in situations in which standard tests of hypotheses do not exist. An example is given that tests this methodology for comparing effective dose levels in independent probit regressions, an application that is also pertinent to derivations of LC50s for insect pathogens and of detectability half-lives for prey proteins or DNA sequences in predator gut analysis.
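
A small numerical sketch can make the overlap question concrete. Assuming two independent estimates with normal sampling distributions and known standard errors (assumptions of this illustration, not a summary of the paper's derivations), two 95% confidence intervals can overlap even though a two-sided z-test of the difference is significant at the 5% level. All names and numbers below are hypothetical.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% critical value of the standard normal

def ci(estimate, se, z=Z95):
    """Normal-theory confidence interval as a (lower, upper) pair."""
    return estimate - z * se, estimate + z * se

def intervals_overlap(a, b):
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

def z_test_significant(est1, se1, est2, se2, z=Z95):
    """Two-sided z-test for a difference between independent estimates."""
    z_stat = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return abs(z_stat) > z

# Hypothetical estimates with equal standard errors.
est1, se1 = 0.0, 1.0
est2, se2 = 3.0, 1.0

overlap = intervals_overlap(ci(est1, se1), ci(est2, se2))
significant = z_test_significant(est1, se1, est2, se2)
```

In this example the intervals overlap between roughly 1.04 and 1.96, yet the test statistic is about 2.12 in absolute value, so non-overlap is a stricter criterion than the test itself.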

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC524673
DOI: http://dx.doi.org/10.1093/jis/3.1.34
November 2006

Bridging between two standards for collecting information on race and ethnicity: an application to Census 2000 and vital rates.

Public Health Rep 2004 Mar-Apr;119(2):192-205

Office of Analysis, Epidemiology, and Health Promotion, National Center for Health Statistics, 3311 Toledo Rd., Rm. 6415, Hyattsville, MD 20782, USA.

Objectives: The 2000 Census, which provides denominators used in calculating vital statistics and other rates, allowed multiple-race responses. Many other data systems that provide numerators used in calculating rates collect only single-race data. Bridging is needed to make the numerators and denominators comparable. This report describes and evaluates the method used by the National Center for Health Statistics to bridge multiple-race responses obtained from Census 2000 to single-race categories, creating single-race population estimates that are available to the public.

Methods: The authors fitted logistic regression models to multiple-race data from the National Health Interview Survey (NHIS) for 1997-2000. These fitted models, and two bridging methods previously suggested by the Office of Management and Budget, were applied to the public-use Census Modified Race Data Summary file to create single-race population estimates for the U.S. The authors also compared death rates for single-race groups calculated using these three approaches.

Results: Parameter estimates differed between the NHIS models for the multiple-race groups. For example, as the percentage of multiple-race respondents in a county increased, the likelihood of stating black as a primary race increased among black/white respondents but decreased among American Indian or Alaska Native/black respondents. The inclusion of county-level contextual variables in the regression models as well as the underlying demographic differences across states led to variation in allocation percentages; for example, the allocation of black/white respondents to single-race white ranged from nearly zero to more than 50% across states. Death rates calculated using bridging via the NHIS models were similar to those calculated using other methods, except for the American Indian/Alaska Native group, which included a large proportion of multiple-race reporters.

Conclusion: Many data systems do not currently allow multiple-race reporting. When such data systems are used with Census counts to produce race-specific rates, bridging methods that incorporate geographic and demographic factors may lead to better rates than methods that do not consider such factors.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1497618
DOI: http://dx.doi.org/10.1177/003335490411900213
June 2004

United States Census 2000 population with bridged race categories.

Vital Health Stat 2 2003 Sep(135):1-55

Objectives: The objectives of this report are to document the methods developed at the National Center for Health Statistics (NCHS) to bridge the Census 2000 multiple-race resident population to single-race categories and to describe the resulting bridged race resident population estimates.

Method: Data from the pooled 1997-2000 National Health Interview Surveys (NHIS) were used to develop models for bridging the Census 2000 multiple-race population to single-race categories. The bridging models included demographic and contextual covariates, some at the person level and some at the county level. Allocation probabilities were obtained from the regression models and applied to the Census Bureau's April 1, 2000, Modified Race Data Summary File population counts to assign multiple-race persons to single-race categories.

Results: Bridging has the most impact on the American Indian and Alaska Native (AIAN) and Asian or Pacific Islander (API) populations, a small impact on the Black population, and a negligible impact on the White population. For the United States as a whole, the AIAN, API, Black, and White bridged population counts are 12.0, 5.0, 2.5, and 0.5 percent higher, respectively, than the corresponding Census 2000 single-race counts. At the sub-national level, there is considerably more variation than observed at the national level. The bridged single-race population counts have been used to calculate birth and death rates produced by NCHS for 2000 and 2001 and to revise previously published rates for the 1990s, 2000, and 2001. The bridging methodology will be used to bridge postcensal population estimates for later years. The bridged population counts presented here and in subsequent years may be updated as additional data become available for use in the bridging process.
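
The allocation step described in the Method paragraph above can be illustrated with a minimal Python sketch. It assumes that each multiple-race group already has a set of allocation probabilities that sum to one. The function name `bridge_counts`, the group labels, and all numbers are hypothetical and are not taken from the report; the regression models that would produce the probabilities are not shown.

```python
from typing import Dict, Tuple

RaceGroup = Tuple[str, ...]


def bridge_counts(
    single_race: Dict[str, float],
    multiple_race: Dict[RaceGroup, float],
    alloc_probs: Dict[RaceGroup, Dict[str, float]],
) -> Dict[str, float]:
    """Distribute multiple-race counts to single-race categories.

    Each multiple-race count is split across single-race categories in
    proportion to that group's allocation probabilities (hypothetical
    inputs here; in the report they come from fitted regression models).
    """
    bridged = dict(single_race)  # start from the single-race counts
    for group, count in multiple_race.items():
        probs = alloc_probs[group]
        if abs(sum(probs.values()) - 1.0) > 1e-9:
            raise ValueError(f"allocation probabilities for {group} must sum to 1")
        for race, p in probs.items():
            # Expected number of persons in this group assigned to `race`.
            bridged[race] = bridged.get(race, 0.0) + count * p
    return bridged


# Illustrative (made-up) counts for one geographic cell.
example = bridge_counts(
    single_race={"White": 1000.0, "Black": 200.0},
    multiple_race={("Black", "White"): 40.0},
    alloc_probs={("Black", "White"): {"Black": 0.6, "White": 0.4}},
)
```

Because each set of probabilities sums to one, the bridged counts add up to the same total population as the single-race and multiple-race counts combined.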

September 2003

From single-race reporting to multiple-race reporting: using imputation methods to bridge the transition.

Stat Med 2003 May;22(9):1571-87

Office of Research and Methodology, National Center for Health Statistics, Hyattsville, MD 20782, USA.

In 1997, the Office of Management and Budget issued revised standards for the collection of race information within the Federal statistical system. One revision allows individuals to choose more than one race group when responding to Federal surveys and other Federal data collections. This paper explores methods that impute single-race categories for those who have given multiple-race responses. Such imputations would be useful when it is desired to conduct analyses involving only single-race categories, such as when trends over time are being examined by race group so that data collected under the old and new standards are being combined. The National Health Interview Survey has allowed multiple-race responses for several years, while also asking respondents to specify one race as their primary race. Exploratory analyses of data from the survey suggest that imputation methods that use demographic and contextual covariate information to predict primary race can have advantages with respect to lower bias and improved variance estimation compared to simpler methods discussed by the Office of Management and Budget. It also appears, however, that the relationships between primary race and covariates might be changing over time. Thus, caution should be exercised if an imputation model fitted to data from one time period is to be applied to data from another time period. Published in 2003 by John Wiley & Sons, Ltd.

Source
DOI: http://dx.doi.org/10.1002/sim.1512
May 2003

Combining estimates from complementary surveys: a case study using prevalence estimates from national health surveys of households and nursing homes.

Public Health Rep 2002 Jul-Aug;117(4):393-407

Office of Research and Methodology, National Center for Health Statistics, Hyattsville, MD 20782, USA.

Objectives: When a single survey does not cover a domain of interest, estimates from two or more complementary surveys can be combined to extend coverage. The purposes of this article are to discuss and demonstrate the benefits of combining estimates from complementary surveys and to provide a catalog of the analytic issues involved.

Methods: The authors present a case study in which data from the National Health Interview Survey and the National Nursing Home Survey were combined to obtain prevalence estimates for several chronic health conditions for the years 1985, 1995, and 1997. The combined prevalences were estimated by ratio estimation, and the associated variances were estimated by Taylor linearization. The survey weights, stratification, and clustering were reflected in the estimation procedures.

Results: In the case study, for the age group of 65 and older, the combined prevalence estimates for households and nursing homes are close to those for households alone. For the age group of 85 and older, however, the combined estimates are sometimes substantially different from the household estimates. Such differences are seen both for estimates within a single year and for estimates of trends across years.

Conclusions: Several general issues regarding comparability arise when there is a goal of combining complementary survey data. As illustrated by this case study, combining estimates can be very useful for improving coverage and avoiding misleading conclusions.
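
As a rough illustration of the Methods paragraph above, the following Python sketch combines estimated totals from two complementary surveys into one prevalence by ratio estimation and attaches a first-order Taylor linearization standard error. It assumes the two surveys are independent and that each survey's variances and covariance were already estimated in a way that reflects its weights, stratification, and clustering. The class `SurveyTotals`, the function `combined_prevalence`, and the formula shown are the textbook linearization for a ratio of totals, not necessarily the exact estimator used in the article.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SurveyTotals:
    """Design-based estimates from one survey (all fields hypothetical inputs)."""
    cases: float        # weighted total of persons with the condition
    persons: float      # weighted total of persons in the domain
    var_cases: float    # estimated variance of `cases`
    var_persons: float  # estimated variance of `persons`
    cov: float          # estimated covariance of `cases` and `persons`


def combined_prevalence(a: SurveyTotals, b: SurveyTotals) -> Tuple[float, float]:
    """Return (prevalence, standard error) for the combined population.

    Assumes the two surveys are independent, so variances and covariances
    of the summed totals are sums of the per-survey quantities.
    """
    y = a.cases + b.cases      # combined numerator total
    x = a.persons + b.persons  # combined denominator total
    r = y / x                  # ratio estimate of prevalence

    var_y = a.var_cases + b.var_cases
    var_x = a.var_persons + b.var_persons
    cov_yx = a.cov + b.cov

    # First-order Taylor linearization of Var(y / x).
    var_r = (var_y + r * r * var_x - 2.0 * r * cov_yx) / (x * x)
    return r, math.sqrt(max(var_r, 0.0))
```

A call such as `combined_prevalence(household, nursing_home)` would take one `SurveyTotals` object per survey for the same age group and year.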

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1497448
DOI: http://dx.doi.org/10.1093/phr/117.4.393
January 2003

Survival analysis using auxiliary variables via multiple imputation, with application to AIDS clinical trial data.

Biometrics 2002 Mar;58(1):37-47

Department of Biostatistics, UCLA School of Public Health, Los Angeles, California 90095-1772, USA.

We develop an approach, based on multiple imputation, to using auxiliary variables to recover information from censored observations in survival analysis. We apply the approach to data from an AIDS clinical trial comparing ZDV and placebo, in which CD4 count is the time-dependent auxiliary variable. To facilitate imputation, a joint model is developed for the data, which includes a hierarchical change-point model for CD4 counts and a time-dependent proportional hazards model for the time to AIDS. Markov chain Monte Carlo methods are used to multiply impute event times for censored cases. The augmented data are then analyzed and the results combined using standard multiple-imputation techniques. A comparison of our multiple-imputation approach to simply analyzing the observed data indicates that multiple imputation leads to a small change in the estimated effect of ZDV and smaller estimated standard errors. A sensitivity analysis suggests that the qualitative findings are reproducible under a variety of imputation models. A simulation study indicates that improved efficiency over standard analyses and partial corrections for dependent censoring can result. An issue that arises with our approach, however, is whether the analysis of primary interest and the imputation model are compatible.
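
The combining step mentioned in the abstract above can be sketched in Python, assuming that "standard multiple-imputation techniques" refers to Rubin's combining rules: the pooled estimate is the mean of the per-imputation estimates, and the total variance is the average within-imputation variance plus an inflated between-imputation variance. The function name `pool_imputations` is hypothetical, and the sketch covers a single scalar quantity only.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_imputations(
    estimates: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Pool scalar results from m completed-data analyses (Rubin's rules).

    estimates: point estimate from each imputed data set
    variances: squared standard error from each imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need matching estimates and variances from m >= 2 imputations")

    q_bar = mean(estimates)   # pooled point estimate
    u_bar = mean(variances)   # average within-imputation variance
    b = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = u_bar + (1.0 + 1.0 / m) * b
    return q_bar, total
```

The square root of the returned total variance is the pooled standard error for the quantity of interest, for example a log hazard ratio estimated from each augmented data set.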

Source
DOI: http://dx.doi.org/10.1111/j.0006-341x.2002.00037.x
March 2002