Arkansas Bioinformatics Consortium Home Page

 

Poster Abstracts

AR-BIC Second Annual Conference - April 17-18, 2016

 





Poster Number | Presenter | Affiliation | Title
AR-BIC-1 | Kakraba, Samuel | UALR-UAMS | Effects of Small Molecules on Protein Aggregation and Paralysis in C. elegans Expressing Aβ1–42 in the Muscle
AR-BIC-2 | Townsend, TA | NCTR/DGMT | The Development of a Modified Comet Assay for High-Throughput Assessment of DNA Methylation Status
AR-BIC-3 | Fang, Hong | NCTR | FDALabel Database: A Rich Resource for Study of Pharmacogenomics Biomarkers to Facilitate Precision Medicine and Drug Safety
AR-BIC-4 | Xu, Joshua | NCTR/DBB | Bioinformatics Choices Substantially Impact Isoform Analysis of RNA-seq Data from a Toxicogenomics Study
AR-BIC-5 | Liu, Zhichao | NCTR/DBB | Potential Reuse of Oncologic Drugs for the Treatment of Rare Diseases
AR-BIC-6 | Gong, Binsheng | NCTR/DBB | Landscape of circRNA Candidates Across 11 Organs and 4 Developmental Stages in Fischer 344 Rat
AR-BIC-7 | Chen, Minjun | NCTR/DBB-UAMS | The Largest Reference Drug List Ranked by the Risk for Developing Drug-Induced Liver Injury in Humans
AR-BIC-8 | Wu, Leihong | NCTR/DBB | Investigating the Effect of Reads Coverage on Discovery of Single Nucleotide Variations in Human Genome with Alignment-Based and Assembly-Based Approaches
AR-BIC-9 | Gu, Qiang | NCTR/DNT | Antibody Microarray Analysis of Protein Level Changes in an In Vitro Blood-Brain Barrier Model Following Exposures to Silver-Nanoparticles: Focusing on Apoptosis Signaling Proteins
AR-BIC-10 | Li, Dan | UALR | An Integrative Method for Comprehensively Reconstructing Transcripts and Long Non-Coding RNA Identification
AR-BIC-11 | Wang, Yan | UALR-UAMS | Association of Age at Menarche with Composition and Diversity of Gut Microbiota among Women in the Twins UK Study
AR-BIC-12 | Nookaew, Intawat | UAMS & more | Biomarker Discovery for Kidney Cancer Diagnosis Based on a Unique Signature of Metabolic Reprogramming
AR-BIC-13 | Yavas, Gokhan | NCTR/DBB | A Framework for Evaluating the Quality of the Personal Genomes Generated by De Novo Assembly Tools
AR-BIC-14 | Lee, Un Jung | NCTR/DBB | Tree-Based Recursive Partitioning Methods for Subgroup Selection in Precision Medicine
AR-BIC-15 | Bayraktar, M | UAMS | A Novel Image Interpreting System for Clinics: From Tumor Response Tracking to Similar Image Retrieval
AR-BIC-16 | Rhoads, Douglas | UAF | Genetic and Genomic Analyses of Staphylococcus agnetis, an Agent of Bacterial Chondronecrosis with Osteomyelitis in Broilers
AR-BIC-17 | Thakkar, Shraddha | NCTR/DBB | Liver Toxicity Knowledge Base (LTKB): A Comprehensive Database to Understand Multiple Dimensions of Drug-Induced Liver Injury
AR-BIC-18 | Walker, Cameron L | UAPB | LCS-Based Protein Structure Prediction
AR-BIC-19 | Gupta, Chirag | UAF/VA Tech | Network Analysis Pipelines for Systems Biology
AR-BIC-20 | Thomas, J | UAF/CO State | Alternative Splicing in Environmental Stress Regulated Genes
AR-BIC-21 | Zou, W | NCTR/DBB | Best Practices in Mining Meaningful Topics from Regulatory Textual Documents
AR-BIC-22 | Saraf, Manish K | ACH & more | Formula Milk Alters Microbial Diversity and Impacts Immune Response in Porcine Neonatal Model
       
Poster Abstracts
(Presenter name bolded)

Effects of Small Molecules on Protein Aggregation and Paralysis in C. elegans Expressing Aβ1–42 in the Muscle
Samuel Kakraba (1), Narsimha R. Penthala (4), Peter A. Crooks (4), Robert J. Shmookler Reis (2,3) and Srinivas Ayyadevara (2,3)

(1) UALR-UAMS Joint Program in Bioinformatics, University of Arkansas, Little Rock, AR; (2) Central Arkansas Veterans Healthcare System, Little Rock AR; (3) University of Arkansas for Medical Sciences, Little Rock AR; (4) Department of Pharmaceutical Sciences, College of Pharmacy, UAMS, Little Rock, AR

Proteins require correct folding and maintenance in order to function effectively and efficiently. Most or all common neurological disorders, such as Alzheimer's and Parkinson's diseases, and possibly a wide range of other age-associated diseases, are attributable to protein aggregation that is cytotoxic, especially to nerve cells. Protein aggregation is a biological phenomenon in which misfolded proteins aggregate (i.e., adhere together in large conglomerates) either intra- or extracellularly. Our goal is to determine whether anti-inflammatory compounds (i.e., parthenolide, sclareol, combretastatin, and thiadiazolidinone (TDZD) analogs) are effective at reducing protein aggregation as well as preventing paralysis in C. elegans strain CL4176, which expresses a human Aβ1–42 transgene in body-wall muscle. In addition to conducting studies on a library of small molecules as a first step in an iterative process of drug optimization, we have also assessed dose-response functions for active lead compounds in reducing protein aggregates. This project was made possible by the Arkansas INBRE program, supported by grant funding from the National Institutes of Health (NIH) National Institute of General Medical Sciences (NIGMS) (P20 GM103429) (formerly P20RR016460).
AR-BIC-1


The Development of a Modified Comet Assay for High-Throughput Assessment of DNA Methylation Status
TA Townsend and MG Manjanatha

National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR

Assessing DNA damage and epigenetic modifications, including DNA methylation, is critical in predicting carcinogenicity of pharmacological and biological agents. The single cell gel electrophoresis (Comet) assay is validated for regulatory use, with an OECD Test Guideline (TG489) approved in 2014 for conducting the in vivo Comet assay. Here, we utilized the methylation-dependent restriction endonuclease, McrBC, to develop a modified alkaline Comet assay that allows the use of a single platform to evaluate genotoxicity and the global and region-specific DNA methylation status of single cells under various conditions. First, we confirmed dose-dependent induction of DNA damage using a known genotoxicant, methyl methanesulfonate (MMS), in human cell lines derived from breast (Mean % Tail DNA±SEM, Control: 12.5±2.8; 20 µg/ml MMS Treatment (3h): 53.9±1.9, p=0.01), cervix (4.7±1.3; 51.9±2.4, p=0.006), liver (8.0±4.1; 60.6±0.8, p=0.004), and spleen (3.2±0.7; 50.1±0.7, p=0.001). Next, we defined background levels of global (5-mC) methylation in these cell lines (5-mC %±SEM for breast: 1.8±0.56%, cervix: 2.2±0.31%, spleen: 0.9±0.28%, liver: 1.5±0.47%), and characterized the dose-response kinetics to several agents of interest to the FDA, including chemotherapeutic, environmental and novel agents. We then demonstrated proof-of-principle for our assay by detecting hypermethylation after 20 µM hydroxyurea treatment (Difference in % Tail DNA with McrBC vs. buffer±SEM: 30.1±3.1, p=0.002), and hypomethylation with 0.1 mM 5-Azacytidine treatment (-6.3±0.9, p=0.03). To date, these results demonstrate high sensitivity of the modified Comet assay for detecting as little as a 14% reduction in global DNA methylation in single cells. The successful application of this novel technology will aid in the hazard identification and risk characterization of FDA-regulated products. Furthermore, this assay will have utility in investigating the potential epigenetic mode of action of agents in target organs, since the assay is amenable to cells in culture or cells from any tissue.
AR-BIC-2


FDALabel Database: A Rich Resource for Study of Pharmacogenomics Biomarkers to Facilitate Precision Medicine and Drug Safety
Hong Fang, Joshua Xu, Zhichao Liu, Stephen Harris, Shraddha Thakkar, Guangxu Zhou, Daojun Liu, Paul Howard and Weida Tong

National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR

Pharmacogenomics (PGx) is the study of individual genetic differences (acquired and inherited) in relation to drug response. Understanding the association between PGx markers and phenotypes improves knowledge of underlying mechanisms of diseases and treatment responses for enhanced drug safety and precision medicine. Research on this topic has been challenging because PGx data are not easily accessible. We have developed the FDALabel database (http://www.fda.gov/ScienceResearch/BioinformaticsTools/ucm289739.htm), which allows users to perform customizable, full-text searches in over 80,000 drug labeling documents for 1600 small molecule drugs. FDALabel provides PGx information contained in the FDA-approved drug labeling (package insert). The prescription drug insert information provides consensus and combined information about product indications, target populations, and adverse drug reactions (ADRs) from FDA regulators, drug manufacturers, and scientific experts. In this study, biomarker information was used as the query against the relevant PGx sections in FDALabel. As a result, we identified more than 170 drugs with genetic biomarker information. Furthermore, 36 biomarkers were identified and divided into three categories: (1) drug metabolism variability (e.g., CYP enzymes); (2) increased risk of adverse events (e.g., G6PD, TPMT, HLA-B); and (3) the drug’s mechanism of action (e.g., CD30). These biomarkers are likely to affect the response of the specified patient sub-population to the drug. Network analysis and visualization were used to illustrate the relationships among drugs, biomarkers, and associated adverse effects. In summary, the case of PGx biomarkers has demonstrated the potential of using FDALabel for the study of ADRs (i.e., to identify new trends in, and the frequency of, genetic variability associated with increased risks to public health) in pursuit of improved pharmacovigilance and precision medicine.
AR-BIC-3


Bioinformatics Choices Substantially Impact Isoform Analysis of RNA-seq Data from a Toxicogenomics Study
Joshua Xu (1), Xi Chen (2), Suresh Subraman (1), Binsheng Gong (1) and Weida Tong (1)

(1) Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR; (2) Department of Histology and Embryology, Harbin Medical University, Harbin, Heilongjiang, China

Alternative splicing events greatly increase the diversity of proteins, which might respond differently to toxic insults. Identification and quantification of isoforms are important for toxicological research, as they improve understanding of the underlying mechanisms of toxicity. The advent of RNA-seq technology and related bioinformatics tools enables detection of novel isoforms and quantification of relative transcript abundances, and thus enhances research on alternative splicing. However, recent work has shown highly inconsistent results for quantitative analysis, and there is no clear understanding of the source of variation. In this study, we investigated the impact of five potential factors on isoform detection and quantification: the choice of bioinformatics methods, sequencing depth, library preparation, transcript abundance and treatment effect. Liver RNA samples from six rats (three treated with aflatoxin B1 for 5 days and three matched controls) were profiled with two libraries prepared for each sample. The first batch of libraries was sequenced twice, while the second batch was sequenced only once. Five bioinformatics pipelines were used, with two mapping tools (TopHat2 and STAR) followed by three isoform analysis approaches (Cufflinks, IsoLasso and FlipFlop). We evaluated the consistency of isoform detection for each potential factor and compared the differential analysis results of isoform expression and splicing for each factor. In summary, the choice of bioinformatics pipeline has a substantial impact on all aspects of quantitative isoform analysis of RNA-seq data, including the number of isoforms detected and the differentially expressed isoforms, with low concordance (about 41% on average) between pipelines. Differentially used transcription start sites or alternative splicing events for the short-term aflatoxin B1 treatment cannot be reliably detected.
AR-BIC-4
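
The abstract above (AR-BIC-4) reports an average concordance between pipelines but does not say how it was computed. As a minimal sketch only, the snippet below assumes concordance means the Jaccard overlap of the isoform sets detected by two pipelines, averaged over all pipeline pairs. The pipeline labels reuse tool names from the abstract; the isoform IDs are invented for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of isoforms found by both pipelines among those found by either."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_concordance(calls):
    """Average Jaccard overlap across every pair of pipelines."""
    pairs = list(combinations(sorted(calls), 2))
    if not pairs:
        raise ValueError("need at least two pipelines to compare")
    return sum(jaccard(calls[p], calls[q]) for p, q in pairs) / len(pairs)


# Toy isoform IDs, invented purely for illustration (not study data).
calls = {
    "TopHat2+Cufflinks": {"iso1", "iso2", "iso3"},
    "STAR+Cufflinks": {"iso2", "iso3", "iso4"},
    "STAR+IsoLasso": {"iso3", "iso5"},
}
mean_concordance = mean_pairwise_concordance(calls)
print(f"mean pairwise concordance: {mean_concordance:.3f}")
```

Other definitions (for example, overlap relative to the smaller set) would give different numbers, so the metric should be stated alongside any concordance figure.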


Potential Reuse of Oncologic Drugs for the Treatment of Rare Diseases
Zhichao Liu, Hong Fang, William Slikker and Weida Tong

Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR

Cancer research has been a focus in the biomedical field, resulting in many oncologic drugs in clinical use. In contrast, very few treatment options are available for rare diseases, although they are progressive, disabling and life threatening. Therefore, we investigated the potential use of oncologic drugs for the treatment of rare diseases. A strong association between cancer and rare diseases was observed at the molecular level. Specifically, an overlap of approximately 60% was shown between 127 genes associated with cancer of many kinds and 2976 rare disease genes, and the same degree of overlap was also obtained when the analysis was conducted at the pathway level. When both gene lists were placed in a gene-gene network, over 95% of gene pairs (one gene from each list) had their two genes located fewer than three genes apart in the network, indicating that cancer genes and rare disease genes likely involve similar biological processes. In addition, many drug targets for cancer were found to relate to rare diseases. The molecular-level association between cancer and rare diseases was further substantiated with existing clinical trial data and literature review. In summary, we ranked the rare disease classes by their potential to be treated with oncologic drugs. The study demonstrated that anticancer drugs are potential sources for the treatment of rare diseases, and the proposed framework offers an opportunity to identify potential therapeutics from cancer research for use in rare diseases.
AR-BIC-5
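
The network proximity statistic in the abstract above (AR-BIC-5) can be illustrated with a breadth-first search over a gene-gene network. This is a hedged sketch, not the authors' method: it assumes an undirected, unweighted network and reads "fewer than three genes apart" as a shortest path of fewer than three edges. The gene names and edges are made up.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Undirected adjacency map from a list of (gene, gene) edges."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def bfs_distances(graph, source):
    """Shortest path length (in edges) from source to every reachable gene."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def fraction_close_pairs(graph, list_a, list_b, max_dist):
    """Fraction of (a, b) pairs whose shortest path is shorter than max_dist."""
    close = total = 0
    for a in list_a:
        dist = bfs_distances(graph, a)
        for b in list_b:
            total += 1
            if dist.get(b, float("inf")) < max_dist:
                close += 1
    return close / total if total else 0.0


# Hypothetical toy network: C* are "cancer" genes, R* are "rare disease" genes.
edges = [("C1", "G1"), ("G1", "R1"), ("R1", "G2"), ("G2", "R2"), ("C2", "R2")]
graph = build_graph(edges)
fraction = fraction_close_pairs(graph, ["C1", "C2"], ["R1", "R2"], max_dist=3)
print(f"fraction of close pairs: {fraction:.2f}")
```

Unreachable pairs are treated as infinitely far apart, so they never count as close.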


Landscape of circRNA Candidates Across 11 Organs and 4 Developmental Stages in Fischer 344 Rat
Binsheng Gong, Joshua Xu and Weida Tong

Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR

Circular RNAs (circRNAs) are a class of endogenous noncoding RNAs that have attracted great attention due to their potential biological function as regulators of microRNAs, as recently reported in some studies. Next-generation sequencing technologies and novel bioinformatics approaches enable the detection of circRNAs in many species. Thousands of novel circRNA candidates have been revealed in mammals as well as nematodes. This study provides an overview of circRNA candidates detected through an RNA-seq dataset across 11 organs of Fischer 344 rats from 4 developmental stages. The induction of circRNA candidates displays clear organ-specific patterns and gender differences for some organs. Liver and muscle have the lowest numbers of circRNA candidates and brain has the most; this pattern was also observed for expressed genes and transcripts in our previous study. Among the 1,793 parental genes, only 58 were detected with backspliced junctions in all eleven organs, and 63 were detected in ten organs but absent in one. A total of 333 genes were detected with backspliced junctions in only one organ. The overlap of the induced circRNAs between males and females is less than 50% in each non-sexual organ, except for brain, with up to 67% concordance observed in aged rats. A trend of increase in circRNA candidates along the four developmental stages was observed in brain and liver for both sexes. In contrast, there is a drop in circRNA candidates in thymus for aged rats of both sexes. The number of circRNA candidates was stable in heart and lung. In the sex organs, the number of circRNA candidates remained stable across the aging points in uterus, increased in the younger ages (juvenile through adult) in thymus and then significantly dropped for aged rats. Further knowledge of circRNA candidates in rat will undoubtedly advance the study of drug toxicity at the RNA regulation level.
AR-BIC-6


The Largest Reference Drug List Ranked by the Risk for Developing Drug-Induced Liver Injury in Humans
Minjun Chen (1), Ayako Suzuki (2), Shraddha Thakkar (1), Ke Yu (1), ChuChu Hu (1), and Weida Tong (1)

(1) Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR; (2) Department of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR

Recently, personalized medicine has received great attention as a way to improve safety and effectiveness in drug development. Personalized medicine aims to provide medical treatment that is tailored to the patient’s characteristics, such as genomic biomarkers and disease history, so that the benefit of treatment can be optimized. Subpopulation identification divides patients into several subgroups, each of which corresponds to an optimal treatment. For two subgroups, a multivariate Cox proportional hazards model is traditionally fitted and used to calculate the risk score when the outcome is a survival time endpoint. The median is commonly chosen as the cutoff value to separate patients. Here we propose a novel tree-based method that adopts the algorithm of relative risk trees to identify patient subgroups. After growing a relative risk tree, we apply k-means clustering to group the terminal nodes based on the averaged covariates. We adopt an ensemble bagging method to improve on the performance of a single tree, since a single tree is well known to be quite unstable. A simulation study is conducted to compare the performance of our proposed method with that of the multivariate Cox model. Applications of our proposed method to three public cancer data sets are also presented for illustration.
AR-BIC-7


Investigating the Effect of Reads Coverage on Discovery of Single Nucleotide Variations in Human Genome with Alignment-Based and Assembly-Based Approaches
Leihong Wu, Gokhan Yavas, Huixiao Hong, Weida Tong and Wenming Xiao

National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR RD, Jefferson, AR

Discovering genetic variants is one of the major objectives of Next Generation Sequencing (NGS) in human genome research. Currently, the commonly preferred practice for variant discovery is short sequence alignment against a known reference genome. This alignment-based variant calling approach has some limitations that might be overcome by alternatives. Recent studies with de novo assembled personal genomes have reported a large list of novel variants, indicating that assembly-based variant calling might be an alternative strategy to identify genetic variants. However, due to a lack of ground truth and a limited range of laboratory validation in reported studies of assembly-based variant discovery, a comprehensive assessment is critically needed to determine whether the assembly-based approach is reliable. In this study, we use a set of simulated data to evaluate the validity of single nucleotide variants (SNVs) uncovered with contigs assembled by SOAPdenovo2, one of the most widely used tools for short-read assembly. Combining varSim and ART, we simulated ~3 million variants in short reads at various coverages between 2x and 50x. We then used both alignment-based and assembly-based approaches to identify SNVs and compared recall and precision at each coverage. Our results suggest that: (1) at least 30x read coverage is needed to obtain assembled contigs with good coverage of the genome and genes; (2) with 30x read coverage, more than 99% of variants can be recovered by the alignment-based approach; (3) compared to alignment-based variant calling, the assembly-based approach has much lower recall and precision; (4) however, the assembly-based approach can recover up to 12% of true SNVs that would be missed by the alignment-based approach. Although the assembly-based approach can serve as a complementary way to discover SNVs, with SOAPdenovo as the assembly tool it is associated with a great risk of erroneous calls for novel variants. Variants called from assembled contigs are not reliable unless assembly outcomes are much improved, with good genome completeness, resolved haplotypes and high fidelity of the assembled sequences.
AR-BIC-8
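
The recall and precision comparison described in the abstract above (AR-BIC-8) can be sketched as set arithmetic against a simulated truth set. This is an illustration under assumptions, not the authors' evaluation code: variants are modelled as hypothetical (chromosome, position, ref, alt) tuples, and a call counts as correct only on an exact match.

```python
def recall_precision(called, truth):
    """Return (recall, precision) for a set of called variants against the truth set."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    recall = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(called) if called else 0.0
    return recall, precision


def rescued_fraction(assembly_calls, alignment_calls, truth):
    """Fraction of true variants found by assembly but missed by alignment."""
    truth = set(truth)
    rescued = (set(assembly_calls) & truth) - set(alignment_calls)
    return len(rescued) / len(truth) if truth else 0.0


# Toy variants, invented for illustration only.
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 50, "G", "A"), ("chr2", 75, "T", "C")}
alignment_calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
                   ("chr2", 50, "G", "A")}
assembly_calls = {("chr1", 100, "A", "G"), ("chr2", 75, "T", "C"),
                  ("chr3", 10, "A", "T")}  # the last call is a false positive

align_recall, align_precision = recall_precision(alignment_calls, truth)
asm_recall, asm_precision = recall_precision(assembly_calls, truth)
rescued = rescued_fraction(assembly_calls, alignment_calls, truth)
print(align_recall, align_precision, asm_recall, asm_precision, rescued)
```

Repeating the calculation at each simulated coverage would give the per-coverage curves the abstract refers to.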


Antibody Microarray Analysis of Protein Level Changes in an In Vitro Blood-Brain Barrier Model Following Exposures to Silver-Nanoparticles: Focusing on Apoptosis Signaling Proteins
Qiang Gu (1), Susan Lantz (1), Elvis Cuevas (1), Syed F. Ali (1), Jyotshna Kanungo (1), Merle G. Paule (1), Yongbin Zhang (2), and Victor Krauthamer (3)

(1) Division of Neurotoxicology, National Center for Toxicological Research, FDA, Jefferson, AR; (2) Nanotechnology Core Facility, National Center for Toxicological Research, FDA, Jefferson, AR; (3) Division of Biomedical Physics, Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, FDA, Silver Spring, MD

Microarray experiments are a centerpiece of postgenomics life sciences and the current efforts to develop systems diagnostics for personalized medicine. In the present study, antibody microarrays were utilized to detect proteomics changes in an in vitro model of the blood-brain barrier following exposures to nanoparticles. Micro-vessel endothelial cells (MVECs) were isolated from adult rat brains and primary cell cultures were made. When cells became confluent, typically two weeks post-seeding, they were exposed to various concentrations (0.01 – 50 µg/mL) of 20 nm diameter citrate-coated silver nanoparticles (AgNPs). The physicochemical properties of the AgNPs (size, size distribution, surface charge, shape, etc.) were characterized using transmission electron microscopy and dynamic light scattering. The dose-dependent cytotoxic effects of AgNPs were determined using lactate dehydrogenase (LDH), 2,3-bis-(2-methoxy-4-nitro-5-sulfophenyl)-2H-tetrazolium-5-carboxanilide (XTT), and FluoroJade-C assays. Based on the cytotoxicity profile, a toxic dose of AgNPs (10 µg/mL) was applied to MVECs for subsequent proteomics analyses. After 24 hours of treatment, proteins were extracted from AgNP-treated and control cultures and relative protein levels were quantified using antibody microarrays that targeted 1,358 proteins from a variety of biological signaling pathways. Our initial focus was on apoptosis signaling pathways because of the known cytotoxic effects of AgNPs. Among the two dozen apoptosis pathway-associated proteins examined, fourteen were significantly down-regulated while three showed significant up-regulation, indicating that these proteins may play an important role in AgNP-induced toxicity. To further confirm these antibody microarray results, seven rat protein-specific antibodies were selected and used for capillary electrophoresis-based immuno-blot analyses of AgNP-treated and control samples. The results confirmed changes in the levels of expression of these proteins, which include BAD, BAX, caspase 2, caspase 9, cytochrome C, IκBα, and MCL-1. The changes in expression of apoptosis-associated proteins may represent molecular signature biomarkers of AgNP-induced cytotoxicity. Identifying such proteins should further elucidate the molecular mechanisms associated with nanoparticle-induced cytotoxicity and aid in the effective characterization and regulatory review of potential toxicities following human exposure to nanomaterials.
AR-BIC-9


 
An Integrative Method for Comprehensively Reconstructing Transcripts and Long Non-Coding RNA Identification
Dan Li and Mary Yang

The University of Arkansas at Little Rock

A reference-guided approach is often used to reconstruct the human transcriptome. Without using a reference genome, the de novo method also enables discovery of novel transcripts. Here we assessed assemblers built on these two types of assembly methods, using simulated data and experimental RNA-seq data, for identification of long non-coding RNAs (lncRNAs). Moreover, we developed an integrative approach that combines the two different assemblers to identify a more comprehensive lncRNA set. Compared to mRNAs, lncRNAs are typically shorter, with fewer exons and lower abundance. Using the Polyester R package, we generated RNA-seq reads based on known lncRNA annotations. The reference-guided and de novo assemblers identified 62.5% and 72.9% of the known lncRNAs, respectively. In our integrative approach, all transfrags from multiple samples assembled by the two assemblers were used as input for the Cuffmerge utility. Then, the resulting assemblies were merged together, which resulted in a more comprehensive single collection for the subsequent lncRNA identification procedure. Using the integrative approach, over 75% of known lncRNAs were identified; 88.1% of these identified lncRNAs overlapped >80% of the length of the known lncRNAs. The relatively low discovery rates may be attributable to the rigorous filters applied to lncRNA candidates. To reduce false positives, we removed single exons that did not overlap with known annotations and transfrags mapped to regions of low mappability and alignment. Additionally, an experimental RNA-seq data set, consisting of RNA-seq reads from 57 human tissue samples, was analyzed. The lncRNAs detected by the integrative method showed more comprehensive features, such as completeness, overlapping and splicing. Thus our integrative approach outperformed the individual methods.
AR-BIC-10


Association of Age at Menarche with Composition and Diversity of Gut Microbiota Among Women in the Twins UK Study
Yan Wang (1), Robert Delongchamp (2), Philip Williams (1), Mohammed El Faramawi (2), Mohammed Orloff (2), Galina Glazko (3), Yasir Rahmatallah (3), Jordana Bell (4), Tim Spector (4) and Barbara Fuhrman (2)

(1) Joint Bioinformatics Program UALR/UAMS, Little Rock, AR; (2) Department of Epidemiology, UAMS, Little Rock, AR; (3) Department of Biomedical Informatics, UAMS, Little Rock, AR; (4) Department of Twin Research and Genetic Epidemiology, King’s College London, UK

Early menarche is associated with increased risks of cardiovascular disease (CVD) incidence and mortality. We have hypothesized that this association results from a shared cause: alterations in the gut microbiome. We tested this hypothesis in a study of 908 female adult twins (542 dizygotic and 366 monozygotic) drawn from the TwinsUK registry. Age at menarche was self-reported. Microbial amplicons from the V3-V4 hypervariable regions of the 16S rRNA gene were sequenced from fecal samples. Demultiplexed RNA sequences were downloaded from the European Bioinformatics Institute (EBI). We carried out quality filtering and open-reference Operational Taxonomic Unit (OTU) picking using QIIME. Sequences were aligned against the reference sequences of the Greengenes database. Sequences that failed the alignment were clustered de novo, and cluster centroids were chosen as new reference sequences. Alpha diversity was measured in samples rarefied to a sequencing depth of 8,136. Mixed model regression analysis was used to model the association of menarche with microbial diversity while adjusting for covariance in twinned measures and for the effects of potential confounders, including birth cohort and BMI. Compared to the second and the third quantiles, women in the first and last quantiles for age at menarche had reduced alpha diversity. Age at menarche and alpha diversity of the fecal microbiome each accounted for some variation in CVD risk factors. The mechanisms underlying the observed associations remain to be elucidated.
AR-BIC-11
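
The abstract above (AR-BIC-11) measures alpha diversity in samples rarefied to a fixed depth. The sketch below shows the idea in plain Python and is not the QIIME implementation: it subsamples reads without replacement to a chosen depth and then computes the Shannon index, which is an assumed metric since the abstract does not name one. The OTU names and counts are invented.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample OTU read counts without replacement to a fixed depth."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    rng = random.Random(seed)
    return Counter(rng.sample(pool, depth))


def shannon(counts):
    """Shannon diversity index (natural log) of an OTU count table."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)


# Toy OTU table for one sample, invented for illustration.
sample = {"OTU_a": 500, "OTU_b": 300, "OTU_c": 150, "OTU_d": 50}
rarefied = rarefy(sample, depth=200)
print(sum(rarefied.values()), round(shannon(rarefied), 3))
```

Rarefying every sample to the same depth keeps diversity estimates comparable when sequencing effort differs between samples; samples below the chosen depth have to be dropped.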


Biomarker Discovery for Kidney Cancer Diagnosis Based on a Unique Signature of Metabolic Reprogramming
Intawat Nookaew (1,2), Francesco Gatto (2), Nicola Volpi (3), Helén Nilsson (4), Marco Maruzzo (5), Anna Roma (5), Martin E. Johansson (4), Ulrika Stierner (6), Sven Lundstam (7), Umberto Basso (4) and Jens Nielsen (2)

(1) Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, Arkansas; (2) Department of Biology and Biological Engineering, Chalmers University of Technology, Göteborg, Sweden; (3) Department of Life Sciences, University of Modena and Reggio Emilia, Modena, Italy; (4) Department of Translational Medicine Malmö, Center for Molecular Pathology, Lund University, Skåne University Hospital, Malmö, Sweden; (5) Medical Oncology Unit 1, Istituto Oncologico Veneto IOV - IRCCS, Padova, Italy; (6) Department of Oncology, Institute of Clinical Sciences, Sahlgrenska Academy at the University of Gothenburg, Sahlgrenska University Hospital, Göteborg, Sweden; (7) Department of Urology, Sahlgrenska University Hospital and Sahlgrenska Academy, University of Gothenburg, Göteborg, Sweden

Otto Warburg first proposed aerobic glycolysis as a key metabolic reprogramming of cancer cells in 1956, and it became a famous hallmark of cancer. However, in recent years, many studies have shown different forms of metabolic reprogramming, indirectly associated with proliferation processes, that are believed to be additional hallmarks of cancer. High-throughput technologies have been used to generate a large amount of -omics data that is shared across the research community, enabling a powerful holistic comparison of cancer metabolism. To search for the signatures of metabolic reprogramming in different cancer types, high-dimensional datasets derived from many cancer types, including SNP analysis, RNA-seq, and protein profiles, were obtained from The Cancer Genome Atlas (TCGA) and The Human Protein Atlas (HPA). The Human Metabolic Atlas (HMA), our well-curated, comprehensive collection of human metabolism, was used as the scaffold for multilevel omics data mapping and integration. Through the computational pipelines and tools we developed (e.g., PIANO, INIT), we identified divergences of kidney cancer metabolism from other cancer types. Metabolism of kidney cancer correlated with loss of the von Hippel-Lindau tumor suppressor (VHL) located on chromosome 3p. Strikingly, the GAG pathway was found to be strongly associated with coordinated regulation and progression only in kidney cancer, and could therefore be used as a biomarker for clinical diagnosis. The GAG profile measured in both plasma and urine samples was distinctively altered in the cancer patients relative to healthy controls in a discovery cohort, with accuracy greater than 82%. Furthermore, the biomarker was successfully validated in another independent cohort, strongly indicating its robustness. Applying systems biology to dissect the biological problem enables high-quality biomarker discovery that can be translated into clinical diagnosis in practice.
AR-BIC-12


A Framework for Evaluating the Quality of the Personal Genomes Generated by De Novo Assembly Tools
Gokhan Yavas, Leihong Wu, Huixiao Hong, Weida Tong and Wenming Xiao

National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR

With the advent of Next Generation Sequencing (NGS) technologies, it is now possible to generate millions of sequencing reads from a human genome, which can then be used in many applications such as identifying single nucleotide variations (SNVs) and structural variations. Another promising application is the de novo assembly of reads to build a personalized genome. For this purpose, many tools have been developed to build a novel genome or to evaluate the quality of assembly outcomes. Evaluation of an assembly usually depends on the alignment of contigs to a reference genome, which places heavy demands on computational resources such as runtime memory, CPU and storage space. Based on several comparative studies of existing assembly evaluation tools, it remains a big challenge to architect a framework with good runtime performance. In this study, we present a framework that maximizes the use of the available computational environment by performing contig alignment and post-processing in parallel. Our flexible design allows split jobs to be run either on a high performance computing (HPC) cluster or on a multi-core workstation. The input, given in the form of a set of contigs, can be partitioned into a user-defined number of chunks, each of which can then be aligned and processed on either separate nodes of an HPC cluster or separate cores of a workstation. Based on carefully filtered alignments, it generates statistics such as the total genome coverage, gene and exon coverage, contig duplication and continuity, as well as SNVs embedded in the assembly and SNV-related statistics. Our framework also provides stand-alone quality statistics such as contig size distribution, Nx statistics, etc. We compared multiple genomes assembled via various assembly algorithms such as SOAPdenovo, Falcon, and Celera assembler. The results demonstrated the capability of our tool to provide a complete package of quality metrics with high performance in different computing environments.
AR-BIC-13
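
As an illustration of two ideas in the abstract above, the following Python sketch splits a set of contigs into a user-defined number of chunks and computes an Nx statistic (for example N50) from contig lengths. It is a minimal sketch and not the authors' framework: the function names `partition_contigs` and `nx_statistic`, the round-robin split, and the example lengths are all assumptions made for illustration.

```python
def partition_contigs(contigs, n_chunks):
    """Split contigs into n_chunks lists (round-robin; an assumed strategy)."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    # Chunk i receives items i, i + n_chunks, i + 2 * n_chunks, ...
    return [contigs[i::n_chunks] for i in range(n_chunks)]


def nx_statistic(contig_lengths, x=50):
    """Return Nx: the largest length L such that contigs of length >= L
    together cover at least x percent of the total assembly length."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in the interval (0, 100]")
    ordered = sorted(contig_lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= threshold:
            return length
    return ordered[-1]


# Hypothetical contig lengths, for illustration only.
lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
n50 = nx_statistic(lengths, 50)
chunks = partition_contigs(lengths, 2)
```

Each chunk could then be handed to a separate worker process or cluster job; the alignment and filtering steps themselves are not shown here because the abstract does not specify them.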


Tree-Based Recursive Partitioning Methods for Subgroup Selection in Precision Medicine
Un Jung Lee, Yu-Chuan Chen and James J. Chen

Division of Bioinformatics & Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR

Precision medicine aims to customize medical care, using new tools and therapies to select the best treatments tailored to individual patients. Patient subgroup selection plays an important role in precision medicine for assessing treatment effects in subgroups; it provides useful information for optimizing treatment assignment. In this study, we propose using tree-based recursive partitioning to identify patient subgroups with an enhanced treatment effect in clinical trials. Two subgroup identification strategies are presented. The first is based on the Subgroup Identification based on Differential Effect Search (SIDES) algorithm, in which subgroups are identified by maximizing the treatment effect between the treatment group and the control group. SIDES generates multiple candidate subgroups, but it is desirable to have a single subgroup to use for treatment assignment. We evaluate several methods for identifying "optimal" subgroups from the list of subgroups identified. The second strategy is an ensemble tree-based method. For a given terminal node in a tree, the patients in that terminal node are assigned a score equal to the proportion of responders in the node. A patient's composite score is calculated as the sum of that patient's scores over all trees in the ensemble. A change-point algorithm is then applied to separate responder and non-responder subgroups. We conduct simulation experiments to evaluate these methods and compare them with the CN2-SD algorithm in terms of sensitivity, specificity, and accuracy.
AR-BIC-14
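
The composite score described in the abstract above (per-tree score equal to the responder proportion in the patient's terminal node, summed over all trees) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `composite_scores`, the input layout (one list of terminal-node labels per tree), and the toy data are hypothetical, and the tree fitting and change-point steps are not shown because the abstract does not specify them.

```python
from collections import defaultdict


def composite_scores(leaf_assignments, responders):
    """Sum, over all trees, the responder proportion of each patient's terminal node.

    leaf_assignments: one list per tree; entry i is the terminal-node label of patient i.
    responders: entry i is 1 (or True) if patient i responded, else 0 (or False).
    """
    n_patients = len(responders)
    scores = [0.0] * n_patients
    for leaves in leaf_assignments:
        if len(leaves) != n_patients:
            raise ValueError("each tree must assign a node to every patient")
        node_size = defaultdict(int)
        node_responders = defaultdict(int)
        for leaf, is_responder in zip(leaves, responders):
            node_size[leaf] += 1
            node_responders[leaf] += int(is_responder)
        # Per-tree score: proportion of responders in the patient's terminal node.
        for i, leaf in enumerate(leaves):
            scores[i] += node_responders[leaf] / node_size[leaf]
    return scores


# Toy example: two trees, four patients (hypothetical data).
toy_leaves = [[0, 0, 1, 1], [0, 1, 1, 1]]
toy_responders = [1, 0, 1, 1]
toy_scores = composite_scores(toy_leaves, toy_responders)
```

A higher composite score indicates that the patient tends to fall into terminal nodes rich in responders; a change-point method would then look for a cut on these scores.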


A Novel Image Interpreting System for Clinics: From Tumor Response Tracking to Similar Image Retrieval
Bayraktar M (1), Topaloglu U (1), McDonald J (2), and Hutchins LF (3)

University of Arkansas for Medical Sciences, Little Rock, AR - (1) Department of Biomedical Informatics; (2) Division of Nuclear Medicine, Department of Radiology; (3) Division of Hematology/Oncology

Clinicians rely on radiologists for imaging interpretation as well as annotation and markup. Due to software disconnects and communication hurdles around the ordering of images, clinics usually do not receive imaging study results with the precision they need for treating the patient. This precision is vital in chronic diseases such as cancer, where tumor progression should be closely monitored. Although great progress has been made to date in overcoming synchronization problems, personnel training deficiencies, inconsistent adoption of accepted standards, and weak communication between the two parties often mean that the desired information cannot be captured or conveyed properly, and that the information obtained cannot be analyzed in a timely fashion or for its intended purposes. This paper attempts to provide a solution to these problems by creating a system with four distinct functionalities. First, we propose a referent-tracking-based method for tracking the response of lesions over time and across treatment modalities. Second, we suggest an agile auto-correction system to help those interpreting medical images make better use of the accepted vocabulary and markup patterns defined by the RECIST standards (Response Evaluation Criteria in Solid Tumors) recommended by the NIH (National Institutes of Health). Third, to reinforce accuracy, we will then automatically reprocess these slices and segment lesions, comparing them to the radiologists' manual delineations previously interpreted by RECIST.
AR-BIC-15


Genetic and genomic analyses of Staphylococcus agnetis, an agent of Bacterial Chondronecrosis with Osteomyelitis in Broilers
Adnan A. Al-Rubaye, Sura Zaki, Nnamdi Ekesi, Abdulkarim Shwani, Robert F. Wideman, Young Min Kwon and Douglas Rhoads

University of Arkansas, Fayetteville, AR, USA

Lameness is a significant problem in the poultry industry, resulting in millions of dollars in lost revenue annually. In broilers, a common cause of lameness is bacterial chondronecrosis with osteomyelitis (BCO). Using a wire flooring model to induce lameness, we identified Staphylococcus agnetis as the principal species isolated from BCO lesions on our research farm. Administration of S. agnetis isolates in drinking water at 20 days of age can induce a high incidence of BCO in birds on wire flooring. Our data support a model in which rearing chicks on wire flooring leads to bacterial translocation across the intestinal epithelium into the blood. S. agnetis appears to colonize the susceptible proximal femoral and tibial growth plates, inducing necrosis and lameness by 40-56 days. As this species has not previously been associated with BCO in poultry, it may have emerged as a result of our protracted experiments inducing high levels of lameness. We have sequenced, assembled, and annotated the S. agnetis genome from chicken isolates. Current work is aimed at understanding the phylogenomic relationships between our poultry isolates and isolates from other sources. Through genomic analysis of the bacterium, we seek to identify genetic determinants associated with the transition to a chicken pathogen. Our hypothesis is that defining the likely route of transmission to broilers, together with genomic analyses, will contribute substantially to the development of measures for mitigating BCO losses in poultry.
Keywords: broiler; lameness; bacteria; pathogen; genome; leg
AR-BIC-16


Liver Toxicity Knowledge Base (LTKB): A comprehensive database to understand multiple dimensions of Drug-Induced Liver Injury
Shraddha Thakkar, Minjun Chen, Hong Fang, Zhichao Liu, Gerry Zhou, Jie Zhang and Weida Tong

Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR

Drug-Induced Liver Injury (DILI) is one of the foremost causes of acute liver failure, as well as a cause of the termination of many clinical trials and the discontinuation of many approved drugs. DILI is therefore a major concern during both drug development and regulatory review. To address this issue, the Liver Toxicity Knowledge Base (LTKB) was developed to improve our understanding of the mechanisms underlying DILI and its prediction. This knowledge base includes DILI-related information for ~3000 prescription drugs, such as physicochemical properties, dosage, side effects, and therapeutic uses. In addition, mechanistically relevant cellular end points from various in vitro assays (conventional, high-throughput, and high-content assays), drug-elicited toxicogenomic responses from both primary hepatocytes and animals, and histopathology were incorporated in the database. We also linked the LTKB data to data from the ToxCast (from EPA) and Tox21 (from NIH, EPA, National Toxicology Program, and FDA) projects. LTKB drugs were analyzed for their potential to cause DILI and classified by severity. DILI annotations were also identified from FDA-approved drug labels, which provide safety information from clinical trials and post-marketing surveillance. As a result, 749 drugs were identified with some level of DILI concern, and 619 drugs were further annotated with DILI information from the LiverTox database (from NIH). The database was developed on the Accelrys Isentris 4.0 platform and provides comprehensive DILI information and findings at various levels of biological complexity in one location. In summary, this poster presents use cases for extracting the desired information from the database to build a better understanding of DILI. This knowledge base can serve as a resource for improving DILI predictive models for drug discovery and drug safety.
AR-BIC-17


LCS-Based Protein Structure Prediction
Cameron L. Walker, Venkata Kiran Melapu, Sravanthi Joginipelli and Karl Walker

Department of Bioinformatics, University of Arkansas at Pine Bluff, Pine Bluff, AR

There is an ever-increasing number of unsolved protein structures that are considered to be of low homology to entries in the Protein Data Bank. Even the most accurate template-based protein structure prediction software shows marginal performance against them. In this case, the accuracy of structure predictions is always a function of the interdisciplinary knowledge shared by the research group. Sequence motifs are short, recurring patterns in DNA that are conjectured to have biological significance; these motifs often indicate specific binding sites for proteins such as nucleases and transcription factors. This tool gathers sequence segments shared between a target and its respective template(s) using the LCS (Longest Common Substring), which is the longest substring shared between two or more strings. It cross-references these sequence segments against an interactive database of sequence motifs created as part of this project from various public repositories. This allows users to access templates with reasonable potential for future modeling. Although proteins are poly-amino acid sequences, for all practical purposes in this research we treated them as character strings. This is an attempt to identify sequence motifs shared by the target and its template(s), which is crucial for better construction of 3D models of proteins. This project was made possible by the Arkansas INBRE program, supported by grant funding from the National Institutes of Health (NIH) National Institute of General Medical Sciences (NIGMS) (P20 GM103429) (formerly P20RR016460).
AR-BIC-18
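
To make the longest common substring idea in the abstract above concrete, here is a small Python sketch using the standard dynamic-programming approach for two strings. It is not the authors' tool: the function name `longest_common_substring` and the two example sequences are hypothetical, and the motif database lookup described in the abstract is not shown.

```python
def longest_common_substring(a, b):
    """Return the longest contiguous substring shared by strings a and b.

    Uses dynamic programming in O(len(a) * len(b)) time, keeping only one
    previous row of the table. If several substrings tie for the longest
    length, the one that ends earliest in `a` is returned.
    """
    best_len = 0
    best_end = 0  # exclusive end index of the best match within `a`
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


# Hypothetical amino-acid strings treated as plain character strings.
target = "MKTAYIAKQR"
template = "GGTAYIAKLL"
shared = longest_common_substring(target, template)
```

For comparisons against many templates at once, a generalized suffix tree or suffix automaton would scale better than this pairwise table, but the pairwise version is enough to show the principle.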


Network Analysis Pipelines for Systems Biology
Chirag Gupta (1), Andrew Schneider (2), Eva Collakova (2) and Andy Pereira (1,2)

(1) Crop, Soil and Environmental Sciences, University of Arkansas, Fayetteville
(2) Virginia Tech, Blacksburg, VA

Seed development is an evolutionarily important phase of the plant life cycle that governs the fate of the next generation. Distinct sub-regions within seeds have diverse roles, from protecting and nourishing the embryo as it enlarges to synthesizing storage reserves that serve as an important source of human nutrition. Previous studies have revealed fine coordination between transcription factors (TFs) that genetically interact to ensure proper maintenance and development of the embryo. Here we present the first genome-wide predictions of regulatory interactions between TFs and target genes in the context of Arabidopsis seed development. Our gene regulatory network is based on a panel of high-resolution seed-specific gene expression data. Querying the network with a list of genes with evidence of seed-specific activity revealed, with varying levels of confidence, several transcriptional regulators associated with different developmental programs. We identified functional gene modules active during embryo, endosperm, and seed coat formation, and delineated the topological architecture of tissue-specific networks that differ from non-tissue-specific gene interaction networks. Our easily adaptable network analysis pipeline can be used to discover regulatory programs in other organisms, including in human disease-specific genomic datasets.
AR-BIC-19


Alternative Splicing in Environmental Stress Regulated Genes
Thomas J, Hamilton M, Ramegowda V, Srivastava S, Basu S, Reddy AS and Pereira A

Crop, Soil and Environmental Sciences, University of Arkansas, Fayetteville, AR;
Colorado State University, Fort Collins, CO

Alternative splicing (AS) expands the transcriptome of humans and other metazoans, and its mis-regulation is responsible for many human diseases. Plants cope with environmental stresses by expanding the transcriptome through AS, primarily through the mechanism of intron retention. However, most bioinformatics tools developed to analyze human AS are based on exon skipping. To analyze AS in plants, new tools have been implemented specifically for intron retention. To study the response of rice to environmental stress, the rice cultivar Nipponbare (reference genome) was subjected to drought and well-watered conditions at the vegetative flag-leaf (V4) and early reproductive (R3) stages, as well as to high temperature at the reproductive (R3) and grain filling (R6) stages. The response of rice to drought and temperature stress was studied using RNA-Seq, followed by bioinformatics analyses to identify and quantify differentially expressed genes and splice junctions with TopHat/Cufflinks and SpliceGrapher. Predicted AS isoforms were validated by experimental approaches, and drought-responsive AS transcripts were quantified by qPCR using isoform-specific primers. The isoform structure of AS genes was characterized by amplification, cloning, and sequencing of individual isoforms of specific genes of interest, using primers designed across unique splice junctions that are differentially expressed under stress. The AS data for regulatory and downstream genes were used to develop models of the regulation of drought tolerance by AS.
AR-BIC-20


Best Practice in Mining Meaningful Topics from Regulatory Textual Documents
Weizhong Zhao, Yijun Ding, James J. Chen, Weida Tong, Roger Perkins and Wen Zou

Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR

Probabilistic topic modeling offers a viable approach to structuring huge textual document collections into latent topic themes to aid text mining. FDA lore describes drug applications arriving in eighteen-wheelers. Today the agency handles vast digital textual information from submissions representing some 25% of U.S. GDP, and untold terabytes of information from post-market surveillance. Where experts are too few or too slow, the means to extract information germane to regulatory questions is paramount. Here we describe extensive sensitivity studies to determine best practices for generating effective topic models. To test the effectiveness and validity of topic models, we constructed a ground truth data set from PubMed that contained some 40 health-related themes, including negative controls, and mixed it with a data set of unstructured documents. The most useful models, tuned to the desired sensitivity versus specificity, require an iterative process in which the preprocessing steps, the type of topic modeling algorithm, and the algorithm's model parameters are systematically varied. Models need to be compared using both qualitative, subjective assessments and quantitative, objective assessments, and care is required to ensure that Gibbs sampling in model estimation is sufficient to yield stable solutions. With a high-quality model, documents can be rank-ordered according to their probability of being associated with a complex regulatory query string, greatly lessening text mining work. Importantly, topic models are agnostic about how words and documents are defined, and thus our findings are extensible to topic models where samples are defined as documents, and genes, proteins, or their sequences are words.
AR-BIC-21
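
The rank-ordering step mentioned in the abstract above can be illustrated with a short Python sketch. The abstract does not say how query relevance is computed, so this sketch assumes one simple option: each document and the query are represented by topic probability vectors from an already fitted topic model, and documents are ranked by the dot product of the two vectors. The function name `rank_documents`, the document IDs, and the probabilities are all hypothetical.

```python
def rank_documents(doc_topics, query_topics):
    """Rank document IDs by topic overlap with a query, highest first.

    doc_topics: dict mapping document ID -> list of topic probabilities.
    query_topics: list of topic probabilities for the query.
    The dot-product relevance score is an assumed choice for illustration.
    """
    for doc_id, probs in doc_topics.items():
        if len(probs) != len(query_topics):
            raise ValueError(f"topic vector length mismatch for {doc_id!r}")

    def relevance(doc_id):
        return sum(d * q for d, q in zip(doc_topics[doc_id], query_topics))

    return sorted(doc_topics, key=relevance, reverse=True)


# Hypothetical two-topic model output.
example_docs = {
    "doc_a": [0.9, 0.1],
    "doc_b": [0.2, 0.8],
    "doc_c": [0.5, 0.5],
}
example_query = [0.0, 1.0]  # the query is entirely about the second topic
ranking = rank_documents(example_docs, example_query)
```

Reviewers could then read documents from the top of the ranking down, which is where the reduction in text mining effort would come from.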


Formula Milk Alters Microbial Diversity and Impacts Immune Response in Porcine Neonatal Model
Manish K. Saraf (1,2), Anne K. Bowlin (1,2), Sree V. Chintapalli (1,2), Kartik Shankar (1,2), Tanya LeRoith (3), Martin J. Ronis (4), Thomas M. Badger (1,2) and Laxmi Yeruva (1,2)

(1) Arkansas Children's Nutrition Center, Arkansas Children's Hospital Research Institute; (2) The University of Arkansas for Medical Sciences, Department of Pediatrics, Little Rock, Arkansas; (3) Department of Biomedical Sciences & Pathobiology, Virginia-Maryland College of Veterinary Medicine; (4) Department of Pharmacology & Experimental Therapeutics, Louisiana State University Health Sciences Center, New Orleans, LA

Formula feeding in infants is associated with an increased risk of upper respiratory tract infections, allergies, and gut dysfunction, possibly by compromising the gut immune system. We hypothesized that formula feeding alters the host microbiome, resulting in morphological changes and modulation of the large intestine immune response. This was tested in piglets that were fed a formula diet or sow-fed from postnatal day (PND) 2 to 21 (n=12/group). Principal component analyses of microbial data indicated clear separation of the formula-fed group from the control sow-fed group, and greater microbial diversity (p=0.01) was observed in the formula-fed group. At the family level within the Bacteroidetes phylum, formula feeding showed 2- to 8-fold higher abundance of Bacteroidaceae, Porphyromonadaceae, Rikenellaceae, and Odoribacteraceae in comparison to the sow-fed group. However, 5-fold higher Paraprevotellaceae was observed in the sow-fed group (p<0.01). In Firmicutes, only Streptococcus spp. were 5-fold higher in formula-fed piglets (p<0.05). Furthermore, 11-fold higher Gammaproteobacteria and 20-fold higher Verrucomicrobiae were observed in formula-fed piglets (p<0.01) in comparison to sow-fed piglets, indicating that formula-fed piglets show more microbial diversity than sow-fed piglets in the distal colon (DC). Cytokine analyses showed 1.5- to 2.0-fold increases (p<0.05) in gene expression of BMP4, CCL21, CCL25, CSF3, and VEGF-A, and 1.5- to 4-fold decreases (p<0.05) in CXCL-11 and IL-27, in the proximal colon (PC) and DC of the formula group, suggesting the participation of specific cytokines in alteration of the mucosal barrier and activation of the gut-associated immune response. A significant increase in crypt density (p<0.05) was noted in the PC of the formula-fed group, suggesting alterations in colon morphology. In summary, formula diet-driven microbiome changes were accompanied by alterations in colon crypt density and cytokine response. (USDA-ARS Project 6026-51000-010-05S).
AR-BIC-22