Li Y, Wei Y, Li B, Alterovitz G. Modified Anderson-Darling test-based target detector in non-homogenous environments. Sensors (Basel) [Internet]. 2014;14 (9) :16046-61.

A constant false alarm rate (CFAR) target detector in non-homogenous backgrounds is proposed. Based on K-sample Anderson-Darling (AD) tests, the method re-arranges the reference cells by merging homogenous sub-blocks surrounding the cell under test (CUT) into a new reference window to estimate the background statistics. Double partition test, clutter edge refinement and outlier elimination are used as an anti-clutter processor in the proposed Modified AD (MAD) detector. Simulation results show that the proposed MAD test based detector outperforms cell-averaging (CA) CFAR, greatest of (GO) CFAR, smallest of (SO) CFAR, order-statistic (OS) CFAR, variability index (VI) CFAR, and CUT inclusive (CI) CFAR in most non-homogenous situations.
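
For orientation, the cell-averaging (CA) CFAR baseline named in the abstract above can be sketched as follows. This is not the proposed MAD detector; it is a minimal illustration that assumes square-law detected samples with exponentially distributed clutter power in a homogenous background. The function name `ca_cfar` and its parameters are hypothetical and do not come from the paper.

```python
def ca_cfar(power, num_train=8, num_guard=2, pfa=1e-3):
    """Cell-averaging CFAR over a 1-D sequence of power samples.

    num_train: training (reference) cells on EACH side of the cell under test (CUT).
    num_guard: guard cells on EACH side of the CUT, excluded from the estimate.
    pfa: design probability of false alarm (assumes exponential clutter power).
    Returns the indices of cells declared as detections.
    """
    n = 2 * num_train  # total number of reference cells
    # Standard CA-CFAR threshold multiplier for exponentially distributed noise power
    alpha = n * (pfa ** (-1.0 / n) - 1.0)
    half = num_train + num_guard
    detections = []
    # Only test cells that have a full reference window on both sides
    for cut in range(half, len(power) - half):
        leading = power[cut - half : cut - num_guard]
        lagging = power[cut + num_guard + 1 : cut + half + 1]
        noise_estimate = (sum(leading) + sum(lagging)) / n
        if power[cut] > alpha * noise_estimate:
            detections.append(cut)
    return detections
```

Because the noise estimate is a plain average over both halves of the window, a clutter edge or an interfering target inside the window biases the threshold. That is the weakness that window re-arrangement schemes such as the one described in the abstract aim to address.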

Villa A, Zollanvari A, Alterovitz G, Cagetti MG, Strohmenger L, Abati S. Prevalence of halitosis in children considering oral hygiene, gender and age. Int J Dent Hyg [Internet]. 2014;12 (3) :208-12.

BACKGROUND: To date, few studies have addressed halitosis in the paediatric population. As such, the aim of the present study was to investigate symptoms, signs and risk factors associated with halitosis in healthy children and to present a model based on the clinical data that predicts the presence of halitosis. METHODS: A total of 101 individuals were included. All patients received a questionnaire that queried on sociodemographic characteristics, self-reported halitosis and dental treatment history. Individuals received a thorough intra-oral examination, and the volatile sulphur compounds (VSC) were measured to test the presence of halitosis with a portable sulphide monitor (Halimeter®; Interscan Co., Chatsworth, CA, USA). The distribution of the sociodemographic characteristics, self-reported halitosis, dental treatment history and other oral features was evaluated. Finally, a statistical model was constructed with the best set of features to predict halitosis in children. RESULTS: The median age was 12.0 years (mean: 11.7 ± SD 2.7) with 54.5% males. Halitosis (VSC > 100 parts per billion, or ppb) was objectively measured in 37.6% of patients. For comparison purposes, a Bayesian network was obtained using clinical and demographic data. The model consisted of four variables (sex, age, oral hygiene status and self-reported halitosis) directly related to the presence of halitosis (VSC > 100 ppb). This model achieved 76.4% area under receiver operating characteristics curve (AUROC). Overall, female patients, individuals with dental plaque on more than 25% of the dental surfaces, and patients older than 13 years were more prone to present with halitosis. CONCLUSIONS: The results suggest that halitosis in the paediatric population is related to poor oral hygiene and may be more common in females and older individuals. This specific predictive model may be useful to identify subgroups to target for intervention to treat oral halitosis.
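
As a reminder of what the AUROC figure quoted above measures, it can be computed directly from predicted scores and true labels as the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case. The sketch below is a generic illustration, not the study's code; the function name `auroc` and the example data in the usage note are made up.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted scores (higher means more likely positive).
    labels: true labels, 1 for positive and 0 for negative.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

For example, `auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])` evaluates to 0.75, since three of the four positive-negative pairs are ranked correctly. The quadratic loop is fine for a cohort of about a hundred patients; larger datasets would normally use a rank-based formulation.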

Warner JL, Denny JC, Kreda DA, Alterovitz G. Seeing the forest through the trees: uncovering phenomic complexity through interactive network visualization. J Am Med Inform Assoc [Internet]. 2014.

Our aim was to uncover unrecognized phenomic relationships using force-based network visualization methods, based on observed electronic medical record data. A primary phenotype was defined from actual patient profiles in the Multiparameter Intelligent Monitoring in Intensive Care II database. Network visualizations depicting primary relationships were compared to those incorporating secondary adjacencies. Interactivity was enabled through a phenotype visualization software concept: the Phenomics Advisor. Subendocardial infarction with cardiac arrest was demonstrated as a sample phenotype; there were 332 primarily adjacent diagnoses, with 5423 relationships. Primary network visualization suggested a treatment-related complication phenotype and several rare diagnoses; re-clustering by secondary relationships revealed an emergent cluster of smokers with the metabolic syndrome. Network visualization reveals phenotypic patterns that may have remained occult in pairwise correlation analysis. Visualization of complex data, potentially offered as point-of-care tools on mobile devices, may allow clinicians and researchers to quickly generate hypotheses and gain deeper understanding of patient subpopulations.

Warner J, Yang P, Alterovitz G. Automated synthesis and visualization of a chemotherapy treatment regimen network. Stud Health Technol Inform [Internet]. 2013;192 :62-6.

Cytotoxic treatments for cancer remain highly toxic, expensive, and variably efficacious. Many chemotherapy regimens are never directly compared in randomized clinical trials (RCTs); as a result, the vast majority of guideline recommendations are ultimately derived from human expert opinion. We introduce an automated network meta-analytic approach to this clinical problem, with nodes representing regimens and edges direct comparison via RCT(s). A chemotherapy regimen network is visualized for the primary treatment of chronic myelogenous leukemia (CML). Node and edge color, size, and opacity are all utilized to provide additional information about the quality and strength of the depicted evidence. Historical versions of the network are also created. With this approach, we were able to compactly compare the results of 17 CML regimens involving RCTs of 9700 patients, representing the accumulation of 45 years of evidence. Our results closely parallel the recommendations issued by a professional guidelines organization, the National Comprehensive Cancer Network (NCCN). This approach offers a novel method for interpreting complex clinical data, with potential implications for future objective guideline development.
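
The network structure described above (regimens as nodes, direct RCT comparisons as edges) can be sketched with a simple edge table. This is only an illustration of the data structure, not the authors' pipeline; `build_regimen_network` is a hypothetical name, and any trial tuples passed to it here are placeholders rather than real trial data.

```python
from collections import defaultdict

def build_regimen_network(trials):
    """Aggregate head-to-head trials into an undirected regimen network.

    trials: iterable of (regimen_a, regimen_b, n_patients) tuples, one per RCT.
    Returns a dict mapping a sorted (regimen, regimen) pair to the number of
    trials and the total number of patients supporting that edge.
    """
    edges = defaultdict(lambda: {"trials": 0, "patients": 0})
    for regimen_a, regimen_b, n_patients in trials:
        # Sort the pair so that A-vs-B and B-vs-A land on the same edge
        key = tuple(sorted((regimen_a, regimen_b)))
        edges[key]["trials"] += 1
        edges[key]["patients"] += n_patients
    return dict(edges)
```

Edge attributes such as the trial count and patient total are the kind of quantities that could drive the edge size or opacity mentioned in the abstract, although the exact mapping used in the paper is not stated here.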

Warner JL, Alterovitz G, Bodio K, Joyce RM. External phenome analysis enables a rational federated query strategy to detect changing rates of treatment-related complications associated with multiple myeloma. J Am Med Inform Assoc [Internet]. 2013;20 (4) :696-9.

Electronic health records (EHRs) are increasingly useful for health services research. For relatively uncommon conditions, such as multiple myeloma (MM) and its treatment-related complications, a combination of multiple EHR sources is essential for such research. The Shared Health Research Information Network (SHRINE) enables queries for aggregate results across participating institutions. Development of a rational search strategy in SHRINE may be augmented through analysis of pre-existing databases. We developed a SHRINE query for likely non-infectious treatment-related complications of MM, based upon an analysis of the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC II) database. Using this query strategy, we found that the rate of likely treatment-related complications significantly increased from 2001 to 2007, by an average of 6% a year (p=0.01), across the participating SHRINE institutions. This finding is in keeping with increasingly aggressive strategies in the treatment of MM. This proof of concept demonstrates that a staged approach to federated queries, using external EHR data, can yield potentially clinically meaningful results.

Ting C, Alterovitz G, Merlob A, Abdi R. Genomic studies of GVHD-lessons learned thus far. Bone Marrow Transplant [Internet]. 2013;48 (1) :4-9.

GVHD remains the most significant complication of hematopoietic SCT, despite advances in HLA matching and the identification of various risk factors. To account for the variation in the incidence and severity of this disease, many genetic association studies have been performed in order to explore the role of immunoregulatory gene polymorphisms. These genes include those that encode cytokines, chemokines, and costimulatory molecules. Polymorphisms in other classes of genes such as those involved in drug metabolism, protein folding, and DNA replication have also been studied. In this review, we address the current knowledge of the role of genetic polymorphisms in GVHD. We also discuss the potential pitfalls inherent in genetic association testing and alternative strategies to address these problems.

Kallenbach J, Hsu W-L, Dunker KA, Alterovitz G. Order-disorder interface characterization reveals critical factors for disease and drug targets. AMIA Jt Summits Transl Sci Proc [Internet]. 2013;2013 :101.

Signal transduction pathways are of critical importance in disease and regulation of cellular functions. Proteins that do not fold to a state of stable tertiary structure, known as intrinsically disordered proteins, are highly represented in signaling pathways and protein interaction networks. Important examples of disordered signaling proteins include p53 and BRCA1, and approximately 40% of eukaryotic proteins are estimated to have significant disordered regions. Certain regions within these disordered proteins, however, can take on an ordered structure upon binding to a partner. The nature of the resulting protein-protein interactions has not yet been established. Here we categorize and identify interactions between binding segments of disordered proteins and their ordered partners using a Bayesian network framework, constructed on a test set of 964 proteins mined for Molecular Recognition Feature (MoRF) characteristics from the PDB. This framework, more specifically Bayesian network learning, enables us to investigate the underlying biological processes involved, including the sequential and structural determinants of these interactions. After the construction of the training set (80% of data), features were successively eliminated to determine relative significances. The Bayesian network model was validated on the test set with excellent accuracy (>90% AUC). Examining features underlying the model provides a plethora of new and potentially useful biological information. The results also lend themselves to a strategy for rational drug design whereby disordered regions can be targeted with a high degree of specificity and small molecule peptide mimetics of their binding regions can be utilized as drugs.

Warner JL, Yang P, Alterovitz G. Reversal of medical practices. Mayo Clin Proc [Internet]. 2013;88 (10) :1182-3.

K Koppula S, Zollanvari A, An N, Alterovitz G. Robust prediction-based analysis for genome-wide association and expression studies. AMIA Jt Summits Transl Sci Proc [Internet]. 2013;2013 :104.

Here we describe a prediction-based framework to analyze omic data and generate models for both disease diagnosis and identification of cellular pathways which are significant in complex diseases. Our framework differs from previous analyses in its use of underlying biology (cellular pathways/gene-sets) to produce predictive feature-disease models. In our study of alcoholism, lung cancer, and schizophrenia, we demonstrate the framework's ability to robustly analyze omic data of multiple types and sources, identify significant feature sets, and produce accurate predictive models.

Sonis S, Antin J, Tedaldi M, Alterovitz G. SNP-based Bayesian networks can predict oral mucositis risk in autologous stem cell transplant recipients. Oral Dis [Internet]. 2013;19 (7) :721-7.

OBJECTIVE: Approximately 40% of patients receiving conditioning chemotherapy prior to autologous hematopoietic stem cell transplants (aHSCT) develop severe oral mucositis (SOM). Aside from disabling pain, ulcerative lesions associated with SOM predispose to poor health and economic outcomes. Our objective was to develop a probabilistic graphical model in which a cluster of single-nucleotide polymorphisms (SNPs) derived from salivary DNA could be used as a tool to predict SOM risk. METHODS: Salivary DNA was extracted from 153 HSCT patients and applied to Illumina BeadChips. Using sequential data analysis, we filtered extraneous SNPs, selected loci, and identified a predictive SNP network for OM risk. We then tested the predictive validity of the network using SNP array outputs from an independent HSCT cohort. RESULTS: We identified an 82-SNP Bayesian network (BN) that was related to SOM risk with a 10-fold cross-validation accuracy of 99.3% and an area under the ROC curve of 99.7%. Using samples from a small independent patient cohort (n = 16), we demonstrated the network's predictive validity with an accuracy of 81.2% in the absence of any false positives. CONCLUSIONS: Our results suggest that SNP-based BN developed from saliva-sourced DNA can predict SOM risk in patients prior to aHSCT.

Warner JL, Zollanvari A, Ding Q, Zhang P, Snyder GM, Alterovitz G. Temporal phenome analysis of a large electronic health record cohort enables identification of hospital-acquired complications. J Am Med Inform Assoc [Internet]. 2013;20 (e2) :e281-7.

OBJECTIVE: To develop methods for visual analysis of temporal phenotype data available through electronic health records (EHR). MATERIALS AND METHODS: 24 580 adults from the multiparameter intelligent monitoring in intensive care V.6 (MIMIC II) EHR database of critically ill patients were analyzed, with significant temporal associations visualized as a map of associations between hospital length of stay (LOS) and ICD-9-CM codes. An expanded phenotype, using ICD-9-CM, microbiology, and computerized physician order entry data, was defined for hospital-acquired Clostridium difficile (HA-CDI). LOS, estimated costs, 30-day post-discharge mortality, and antecedent medication provider order entry were evaluated for HA-CDI cases compared to randomly selected controls. RESULTS: Temporal phenome analysis revealed 191 significant codes (p value, adjusted for false discovery rate, ≤0.05). HA-CDI was identified in 414 cases, and was associated with longer median LOS, 20 versus 9 days, and adjusted HR 0.33 (95% CI 0.28 to 0.39). This prolongation carries an estimated annual incremental cost increase of US$1.2-2.0 billion in the USA alone. DISCUSSION: Comprehensive EHR data have made large-scale phenome-based analysis feasible. Time-dependent pathological disease states have dynamic phenomic evolution, which may be captured through visual analytical approaches. Although MIMIC II is a single institutional retrospective database, our approach should be portable to other EHR data sources, including prospective 'learning healthcare systems'. For example, interventions to prevent HA-CDI could be dynamically evaluated using the same techniques. CONCLUSIONS: The new visual analytical method described in this paper led directly to the identification of numerous hospital-acquired conditions, which could be further explored through an expanded phenotype definition.

Parikh N, Zollanvari A, Alterovitz G. An automated bayesian framework for integrative gene expression analysis and predictive medicine. AMIA Jt Summits Transl Sci Proc [Internet]. 2012;2012 :95-104.

MOTIVATION: This work constructs a closed loop Bayesian Network framework for predictive medicine via integrative analysis of publicly available gene expression findings pertaining to various diseases. RESULTS: An automated pipeline was successfully constructed. Integrative models were made based on gene expression data obtained from GEO experiments relating to four different diseases using Bayesian statistical methods. Many of these models demonstrated a high level of accuracy and predictive ability. The approach described in this paper can be applied to any complex disorder and can include any number and type of genome-scale studies.

Deng M, Zollanvari A, Alterovitz G. A bayesian translational framework for knowledge propagation, discovery, and integration under specific contexts. AMIA Jt Summits Transl Sci Proc [Internet]. 2012;2012 :25-34.

The immense corpus of biomedical literature existing today poses challenges in information search and integration. Many links between pieces of knowledge occur or are significant only under certain contexts, rather than under the entire corpus. This study proposes using networks of ontology concepts, linked based on their co-occurrences in annotations of abstracts of biomedical literature and descriptions of experiments, to draw conclusions based on context-specific queries and to better integrate existing knowledge. In particular, a Bayesian network framework is constructed to allow for the linking of related terms from two biomedical ontologies under the queried context concept. Edges in such a Bayesian network allow associations between biomedical concepts to be quantified and inference to be made about the existence of some concepts given prior information about others. This approach could potentially be a powerful inferential tool for context-specific queries, applicable to ontologies in other fields as well.

Marwah K, Katzin D, Zollanvari A, Noy NF, Ramoni M, Alterovitz G. Context-specific ontology integration: a bayesian approach. AMIA Jt Summits Transl Sci Proc [Internet]. 2012;2012 :79-86.

We introduce a principled computational framework and methodology for automated discovery of context-specific functional links between ontologies. Our model leverages disparate free-text literature resources to score the model of dependency linking two terms under a context against their model of independence. We identify linked terms as those having a significant Bayes factor (p < 0.01). To scale our algorithm over massive ontologies, we propose a heuristic pruning technique as an efficient algorithm for inferring such links. We have applied this method to translationalize Gene Ontology to all other ontologies available at the National Center for Biomedical Ontology (NCBO) BioPortal under the context of Human Disease ontology. Our results show that in addition to broadening the scope of hypotheses for researchers, our work can potentially be used to explore the continuum of relationships among ontologies to guide various biological experiments.

Yu Y-H, Chiou G-Y, Huang P-I, Lo W-L, Wang C-Y, Lu K-H, Yu C-C, Alterovitz G, Huang W-C, Lo J-F, et al. Network biology of tumor stem-like cells identified a regulatory role of CBX5 in lung cancer. Sci Rep [Internet]. 2012;2 :584.

Mounting evidence links cancers possessing stem-like properties with worse prognosis. Network biology with signal processing mechanics was explored here using expression profiles of a panel of tumor stem-like cells (TSLCs). The profiles were compared to their parental tumor cells (PTCs) and the human embryonic stem cells (hESCs), for the identification of gene chromobox homolog 5, CBX5, as a potential target for lung cancer. CBX5 was found to regulate the stem-like properties of lung TSLCs and was predictive of lung cancer prognosis. The investigation was facilitated by finding target genes based on modeling epistatic signaling mechanics via a predictive and scalable network-based survival model. Topologically-weighted measurements of CBX5 were synchronized with those of BIRC5, DNMT1, E2F1, ESR1, MLH1, MSH2, RB1, SMAD1 and TAF5. We validated our findings in another Taiwanese lung cancer cohort, as well as in knockdown experiments using sh-CBX5 RNAi both in vitro and in vivo.

Warner JL, Alterovitz G. Phenome based analysis as a means for discovering context dependent clinical reference ranges. AMIA Annu Symp Proc [Internet]. 2012;2012 :1441-9.

Robust electronic medical records (EMRs) have made large-scale phenome-based analysis feasible. The context-dependent phenome of a large ICU-based EMR database (MIMIC II) was explored, as a function of a clinical feature: white blood cell count (WBC). Phenome visualization led to the discovery that peak WBC in the range 15-45 K/μl was highly associated with the diagnoses of Clostridium difficile and bacterial sepsis; thus, it is conceivable that clinicians might delay ordering targeted antimicrobials towards C. difficile for patients with peak WBC in this range. This hypothesis was confirmed, with significant delays in this group (median 135 vs. 85 hours, p = 0.002). These delays could be associated with adverse effects on patient health and high hospitalization costs (e.g. an additional $3,000,000 for the MIMIC II cohort). In conclusion, context-dependent clinical reference ranges are critical to clinical decision making; furthermore, important findings can be discovered through EMR-driven phenome association studies.

Quo CF, Kaddi C, Phan JH, Zollanvari A, Xu M, Wang MD, Alterovitz G. Reverse engineering biomolecular systems using -omic data: challenges, progress and opportunities. Brief Bioinform [Internet]. 2012;13 (4) :430-45.

Recent advances in high-throughput biotechnologies have led to rapidly growing research interest in reverse engineering of biomolecular systems (REBMS). 'Data-driven' approaches, i.e. data mining, can be used to extract patterns from large volumes of biochemical data at molecular-level resolution while 'design-driven' approaches, i.e. systems modeling, can be used to simulate emergent system properties. Consequently, both data- and design-driven approaches applied to -omic data may lead to novel insights in reverse engineering biological systems that could not be expected before using low-throughput platforms. However, there exist several challenges in this fast growing field of reverse engineering biomolecular systems: (i) to integrate heterogeneous biochemical data for data mining, (ii) to combine top-down and bottom-up approaches for systems modeling and (iii) to validate system models experimentally. In addition to reviewing progress made by the community and opportunities encountered in addressing these challenges, we explore the emerging field of synthetic biology, which is an exciting approach to validate and analyze theoretical system models directly through experimental synthesis, i.e. analysis-by-synthesis. The ultimate goal is to address the present and future challenges in reverse engineering biomolecular systems (REBMS) using an integrated workflow of data mining, systems modeling and synthetic biology.

Alterovitz G, Tuthill C, Rios I, Modelska K, Sonis S. Personalized medicine for mucositis: Bayesian networks identify unique gene clusters which predict the response to gamma-D-glutamyl-L-tryptophan (SCV-07) for the attenuation of chemoradiation-induced oral mucositis. Oral Oncol [Internet]. 2011;47 (10) :951-5.

Gamma-D-glutamyl-L-tryptophan (SCV-07) demonstrated an overall efficacy signal in ameliorating oral mucositis (OM) in a clinical trial of head and neck cancer patients. However, not all SCV-07-treated subjects responded positively. Here we determined if specific gene clusters could discriminate between subjects who responded to SCV-07 and those who did not. Microarrays were done using peripheral blood RNA obtained at screening and on the last day of radiation from 28 subjects enrolled in the SCV-07 trial. An analytical technique was applied that relied on learned Bayesian networks to identify gene clusters which discriminated between individuals who received SCV-07 and those who received placebo, and which differentiated subjects for whom SCV-07 was an effective OM intervention from those for whom it was not. We identified 107 genes that discriminated SCV-07 responders from non-responders using four models and applied Akaike Information Criteria (AIC) and Bayes Factor (BF) analysis to evaluate predictive accuracy. AIC was superior to BF: the accuracy of predicting placebo vs. treatment was 78% using BF, but 91% using the AIC score. Our ability to differentiate responders from non-responders using the AIC score was dramatic and ranged from 93% to 100% depending on the dataset that was evaluated. Predictive Bayesian networks were identified and functional cluster analyses were performed. A specific 10-gene cluster was a critical contributor to the predictability of the dataset. Our results demonstrate proof of concept in which the application of a genomics-based analytical paradigm was capable of discriminating responders and non-responders for an OM intervention.

Zollanvari A, Saccone NL, Bierut LJ, Ramoni MF, Alterovitz G. Is the reduction of dimensionality to a small number of features always necessary in constructing predictive models for analysis of complex diseases or behaviours? Conf Proc IEEE Eng Med Biol Soc [Internet]. 2011;2011 :3573-6.

Gene expression and genome wide association data have provided researchers the opportunity to study many complex traits and diseases. When designing prognostic and predictive models capable of phenotypic classification in this area, significant reduction of dimensionality through stringent filtering and/or feature selection is often deemed imperative. Here, this work challenges this presumption through both theoretical and empirical analysis. This work demonstrates that by a proper compromise between structure of the selected model and the number of features, one is able to achieve better performance even in large dimensionality. The inclusion of many genes/variants in the classification rules can help shed new light on the analysis of complex traits, that is, traits that are typically determined by many causal variants with small effect size.

Alterovitz G, Muso T, Ramoni MF. The challenges of informatics in synthetic biology: from biomolecular networks to artificial organisms. Brief Bioinform [Internet]. 2010;11 (1) :80-95.

The field of synthetic biology holds an inspiring vision for the future; it integrates computational analysis, biological data and the systems engineering paradigm in the design of new biological machines and systems. These biological machines are built from basic biomolecular components analogous to electrical devices, and the information flow among these components requires the augmentation of biological insight with the power of a formal approach to information management. Here we review the informatics challenges in synthetic biology along three dimensions: in silico, in vitro and in vivo. First, we describe the state of the art of the in silico support of synthetic biology, from the specific data exchange formats, to the most popular software platforms and algorithms. Next, we cast in vitro synthetic biology in terms of information flow, and discuss genetic fidelity in DNA manipulation, development strategies of biological parts and the regulation of biomolecular networks. Finally, we explore how the engineering chassis can manipulate biological circuitries in vivo to give rise to future artificial organisms.