Poster Number | Presenter | Affiliation | Title |
| Hong, Huixiao | NCTR | Advancing Regulatory Science through Bioinformatics |
| Chen, Tao | NCTR | Discovery of Novel MicroRNAs in Rat Kidney Using Next Generation Sequencing, Microarray and Bioinformatics Technologies |
| Bisgin, Halil | NCTR | Exploring the impact of miRNA-seq pipelines on downstream analysis |
| Ng, Huiwen | NCTR | Development of a competitive molecular docking approach for predicting estrogen receptor agonists and antagonists |
| Luo, Heng | NCTR | Collection and molecular docking identification of associations between drugs and class I human leukocyte antigens for predicting idiosyncratic drug reactions |
| Ye, Hao | NCTR | Deciphering adverse outcome pathways through network analysis of ToxCast data |
| Chen, Yu-Chuan | NCTR | Ensemble Survival Trees for Identifying Subpopulations in Personalized Medicine |
| Chen, Minjun | NCTR | The development of Liver Toxicity Knowledge Base (LTKB) for research and review of drug-induced liver injury |
| Liu, Zhichao | NCTR | Genome-wide comparison of four toxicogenomics assay systems |
| Wei, Yu-Chung | NCTR | Detecting Copy Number Variations via a Bayesian Approach Adapting to Both Whole Genome and Targeted Exome NGS |
| Beger, Richard | NCTR | 3D-SDAR analysis of a diverse dataset of 180 hERG inhibitors: Structural factors determining the binding potential |
| Walker, Cameron | NCTR | Structural identification of unknown protein structures |
| Hunt, Adrian | NCTR | The Effects of Carbon Emissions on Coral Reefs |
| Machooka, Daniel | NCTR | Phylogenetic analysis of enzymes of amino acid biosynthetic pathways |
| Shang, Zhenhua | UAPB | Down-regulation of genes involved in lignin biosynthesis and a genomic approach to deciphering lignin biosynthesis in rice |
| Vijay, Vikrant | UAPB | Sexual dimorphism in the expression of genes encoding drug metabolizing enzymes/transporters may influence a drug’s disposition in adult F344 rats |
| Zhao, Weizhong | NCTR | Data Mining for Signal Detection from Adverse Event Reporting System Database |
| Li, Dan | UALR | Systematically identifying and annotating long non-coding RNAs |
| Gomez-Acevedo, Horacio | UAMS | Bioinformatics challenges for research centers in Arkansas: ACNC case |
| Crabtree, Nathan | NCTR | Building a computational evolution system to identify genes of interest in multi-class, RNA-seq data |
| Gokulan, Kuppan | NCTR | Structure and specificity of L,D-Transpeptidase from Mycobacterium tuberculosis |
AR-BIC-22 | Barabote, Ravi D. | UAF | Omics analyses of microbial response to environment |
AR-BIC-23 | Wu, Leihong | NCTR | A novel clustering approach to find biomarkers in breast cancer subtypes based on expression network profiles |
AR-BIC-24 | Yu, Ke | NCTR | Making Tractable the Use of Vast Quantities of Regulatory-Related Textual Data |
AR-BIC-25 | Smith, Sidney | UAPB | Peptide Sequence Patterns Related to Omega Angles in Cis Conformation |
Abstracts |
|||
Advancing Regulatory Science through Bioinformatics
In 2010, the US FDA launched its Advancing Regulatory Science (ARS)
initiative aimed at developing new tools, standards,
and approaches to assessing
safety, efficacy, quality, and performance across FDA-regulated products.
The initiative identifies eight scientific areas that affect multiple
regulated product domains or human populations, in which bioinformatics
plays a paramount role. The Division of Bioinformatics and Biostatistics (DBB)
at FDA’s National Center for Toxicological Research (NCTR) engages in bioinformatics
applicable to such areas as biomarker development and validation, drug
safety and repurposing, and personalized medicine. This poster will highlight
selected bioinformatics research as well as selected databases and software
tools that have been developed both in past years and more recently in
support of FDA regulatory sciences. The DBB has led a large international
consortium for the past eight years that has assessed the reliability
of clinical and toxicological biomarkers derived from emerging microarray
and next generation sequencing technologies. Knowledge bases have been developed that
aggregate diverse data associated with a disease, toxicity or phenotype,
providing a means for mechanistic studies and development of predictive
models. The Liver Toxicity Knowledge Base integrates in vitro, in vivo,
gene expression data and textual data. The Endocrine Disruptor Knowledge
Base contains in vitro and in vivo data for thousands of chemicals to
build models to predict endocrine activity mediated by estrogen and androgen
hormone receptors based solely on chemical structure. The Food-Borne
Pathogen Genomics Knowledge Base provides tools to detect and characterize
microbial isolates from gene expression data during pathogen outbreaks.
ArrayTrack is a genomics tool widely used within FDA, as well as in the
public, private and academic research community worldwide. ArrayTrack
provides an integrated means to manage, analyze and interpret omics data.
It contains many statistical and visualization tools as well as libraries
for gene and protein function and biological pathways. FDALabel is a
web-based database containing the entire set of 40,000 FDA-approved drug
labels. It contains a powerful and flexible search capability, and much
other functionality valuable to researchers, regulators, drug developers
and clinicians. FDALabel will provide an improved bridge for transparent
drug safety knowledge exchange between the public and FDA. A common element
of the databases and bioinformatics tools cited above is that they either
are or will be openly available on the Internet, including an FDA external
Cloud when available, thus advancing FDA data liberation. Many of NCTR’s
bioinformatics tools can be accessed through the following link: FDA
Bioinformatics Tools. |
|||
Discovery of Novel MicroRNAs in Rat Kidney Using
Next Generation Sequencing, Microarray and Bioinformatics Technologies
Tao Chen1, Fanxue Meng1, Michael Hackenberg2, Zhiguang Li1, Jian Yan1
1Division of Genetic and Molecular Toxicology, National Center for Toxicological Research, Food and Drug Administration, Jefferson. 2Dpto. de Genetica, Facultad de Ciencias, Universidad de Granada, Granada, Spain
MicroRNAs (miRNAs) are small non-coding RNAs that regulate a variety
of biological processes. Release 18 of the miRBase database
includes 1,157 mouse and 680 rat mature miRNAs. Only one new rat
mature miRNA was added to the rat miRNA database from version 16 to version
18 of miRBase, suggesting that many rat miRNAs remain to be discovered.
Given the importance of the rat as a model organism, discovery of the complete
set of rat miRNAs is necessary for understanding rat miRNA regulation.
In this study, next generation sequencing (NGS), microarray analysis
and bioinformatics technologies were applied to discover novel miRNAs
in rat kidneys. MiRanalyzer was utilized to analyze the sequences of
the small RNAs generated from NGS analysis of rat kidney samples. Hundreds
of novel miRNA candidates were examined according to the mappings of
their reads to the rat genome, presence of sequences that can form a
miRNA hairpin structure around the mapped locations, Dicer cleavage patterns,
and the levels of their expression determined by both NGS and microarray
analyses. Nine novel rat hairpin precursor miRNAs (pre-miRNA) were discovered
with high confidence. Five of the novel pre-miRNAs are also reported
in other species while four of them are rat specific. In summary, 9 novel
pre-miRNAs and 14 novel mature miRNAs were identified via a combination
of NGS, microarray and bioinformatics high-throughput technologies. |
|||
Exploring the impact of miRNA-seq pipelines
on downstream analysis
Halil Bisgin, Binsheng Gong, Yuping Wang, Weida Tong
Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration
Background: The development of next-generation sequencing (NGS) techniques
opened a new era in genomic research and led to several studies in RNA-Seq.
Despite the excitement, concerns have arisen about profiling tools and
defining the standards. In recent years, the FDA SEQC consortium took the
initiative to address technical and statistical challenges in RNA-seq.
However, similar issues have not been extensively studied for miRNA-Seq
in the research community. |
|||
Development of a competitive molecular docking approach for predicting estrogen receptor agonists and antagonists
Molecular docking is a well-established molecular modeling
technique commonly used in ligand screening and drug design. This method
attempts to predict the binding mode and molecular interactions between a protein
and a ligand as well as rank the predicted poses with scoring functions.
The protein-ligand association in vivo is characterized by a dynamic
process whereby protein-ligand binding is accompanied by a conformational
change in the complex, a phenomenon commonly referred to as “induced-fit”.
However, due to high computational costs, fully flexible docking remains
impractical. In light of this, rigid docking and limited flexible docking
have become the most commonly practiced methods. The estrogen receptors (ERs)
adopt distinctly different conformations upon binding to agonists
and antagonists. Using the ER subtype α agonist and antagonist conformations,
we designed an in silico approach that more closely mimics the biological
process, and used it to differentiate the agonist versus antagonist status
of potential binders. The ability of this approach was first evaluated
using true agonists and antagonists extracted from the crystal structures
available in the protein data bank (PDB), and then further validated
using a larger set of ligands from the literature. The usefulness of
the approach was demonstrated with enrichment analysis in data sets with
a large number of decoy ligands. The performance of individual agonist
and antagonist docking conformations was found comparable to that of similar
models in the literature. When combined in a competitive docking approach,
they provided the ability to discriminate agonists from antagonists with
good accuracy, as well as the ability to efficiently select true agonists
and antagonists from decoys during enrichment analysis. In conclusion,
this approach offers potential applications not only in drug discovery
projects in the pharmaceutical industry but also in the screening of
potential endocrine disrupting compounds (EDCs) by regulatory authorities
for risk assessment. |
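The competitive docking idea described above can be illustrated with a minimal sketch. It assumes each ligand has already been docked into both the agonist-conformation and the antagonist-conformation receptor by some docking program, and that a lower (more negative) score means better predicted binding. The `DockingResult` class, the `classify` function, the `margin` parameter and the example scores are hypothetical and are not taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class DockingResult:
    ligand: str
    agonist_score: float     # best pose score in the agonist-conformation receptor
    antagonist_score: float  # best pose score in the antagonist-conformation receptor


def classify(result: DockingResult, margin: float = 0.0) -> str:
    """Assign the ligand to whichever receptor conformation it docks into better.

    Assumption: lower score = stronger predicted binding. Ligands whose two
    scores differ by no more than `margin` are left undetermined.
    """
    diff = result.agonist_score - result.antagonist_score
    if diff < -margin:
        return "agonist"
    if diff > margin:
        return "antagonist"
    return "undetermined"


# Hypothetical scores, for illustration only.
example = DockingResult("ligand_1", agonist_score=-10.2, antagonist_score=-7.5)
print(example.ligand, classify(example))
```

A non-zero `margin` is one simple way to avoid over-interpreting small score differences; the appropriate value would depend on the scoring function used.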
|||
Collection and molecular docking identification of associations between drugs and class I human leukocyte antigens for predicting idiosyncratic drug reactions
Heng Luo1,2, Huixiao Hong1
Idiosyncratic drug reactions (IDRs) are rare, somewhat dose-independent, patient-specific
and hard to predict. Human leukocyte antigens (HLAs),
the major histocompatibility complex (MHC) in humans, are highly
polymorphic and are associated with specific IDRs. Therefore, it is
important to identify potential drug-HLA associations so that individuals
who would develop IDRs can be identified before drug exposure. We harvested the
associations between drugs and HLAs from the literature and built up
a database named HLADR. Molecular docking was used to explore the known
associations. From the analysis of docking scores between the 17 drugs
and 74 class I HLAs, it was observed that the significantly associated
drug-HLA pairs had statistically lower docking scores than those not
reported to be significantly associated (t-test p < 0.05). This indicates
that molecular docking can be utilized for screening drug-HLA interactions
and predicting potential IDRs, and may improve drug safety and the implementation
of personalized medicine. Examining the binding modes of drugs in the
docked HLAs suggested several distinct binding sites inside class I HLAs,
expanding our knowledge of the underlying interaction mechanisms between
drugs and HLAs. |
|||
Deciphering adverse outcome pathways through network analysis of ToxCast data
Hao Ye1, Heng Luo2, Hui Wen Ng1, Weigong Ge1, Weida Tong1, Huixiao Hong1*
ToxCast data have been demonstrated to be efficient in characterizing
the toxicological profiles of environmental chemicals. An adverse outcome
pathway (AOP) is a group of molecular events related at higher levels
of biological organization (e.g. cell or tissue) that ultimately lead
to an adverse outcome. Network analysis is frequently used to investigate
the group properties of networks such as social networks, electronic
commerce networks, and biological networks. We first constructed a network
in which the assays and chemicals assayed in ToxCast data were treated
as nodes and the positive assay results were used to connect the nodes.
We then applied network analysis to aid the understanding of
ToxCast data and to identify potential AOPs. We also demonstrated that the
activity of untested chemicals in the ToxCast assays could be
predicted using the network analysis. We found that the compound-assay network
could be decomposed into seven densely connected modules based on its
topological properties. Moreover, each of the seven modules was associated
with different AOPs. For example, most of ER, AR, and GR related assays
were significantly enriched in module one. We will present our results
and discuss the implications, limitations and perspectives of the network
analysis on ToxCast data. |
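As a rough illustration of the network construction described above, the sketch below builds a bipartite chemical-assay network from positive assay results and scores an untested chemical-assay pair by neighbor similarity. The Jaccard-based scoring rule is an assumption chosen for simplicity, not the authors' prediction method, and the chemical and assay names are placeholders.

```python
from collections import defaultdict


def build_network(positive_results):
    """Build a bipartite network from (chemical, assay) pairs with a positive call."""
    chem_to_assays = defaultdict(set)
    assay_to_chems = defaultdict(set)
    for chem, assay in positive_results:
        chem_to_assays[chem].add(assay)
        assay_to_chems[assay].add(chem)
    return chem_to_assays, assay_to_chems


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_activity(chem, assay, chem_to_assays, assay_to_chems):
    """Score an untested pair as the highest similarity between `chem` and any
    chemical already active in `assay` (an assumed, simplified rule)."""
    own = chem_to_assays.get(chem, set()) - {assay}
    scores = [
        jaccard(own, chem_to_assays[other] - {assay})
        for other in assay_to_chems.get(assay, set())
        if other != chem
    ]
    return max(scores, default=0.0)


# Placeholder data: chemicals A-C, assays named after receptor families.
edges = [("A", "ER1"), ("A", "ER2"), ("A", "AR1"),
         ("B", "ER1"), ("B", "ER2"), ("C", "GR1")]
c2a, a2c = build_network(edges)
print(predict_activity("B", "AR1", c2a, a2c))
print(predict_activity("C", "AR1", c2a, a2c))
```

Module detection on such a network would normally use a dedicated community-detection algorithm; that step is omitted here.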
|||
Ensemble Survival Trees for Identifying Subpopulations in Personalized Medicine
Yu-Chuan Chen, James J. Chen
Recently, personalized medicine has received great attention as a way to improve safety
and effectiveness in drug development. Personalized medicine
aims to provide medical treatment that is tailored to the patient’s
characteristics such as genomic biomarkers, disease history, etc.,
so that the benefit of treatment can be optimized. Subpopulation identification
divides patients into several different subgroups where each subgroup
corresponds to an optimal treatment. For two subgroups, a
multivariate Cox proportional hazards model is traditionally fitted and used to calculate
the risk score when the outcome is a survival time endpoint. The median is commonly
chosen as the cutoff value to separate patients. Here we propose a
novel tree-based method that adopts the algorithm of relative risk
trees to identify patient subgroups. After growing a relative risk
tree, we apply k-means clustering to group the terminal nodes based
on the averaged covariates. We adopt an ensemble bagging method to
improve on the performance of a single tree, which is well known
to be quite unstable. A simulation study
is conducted to compare the performance between our proposed method
and the multivariate Cox model. The proposed method is also applied
to three public cancer data sets for illustration. |
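The step of grouping terminal nodes by their averaged covariates can be sketched with a plain k-means implementation. This is only an illustration under stated assumptions: the relative risk tree is taken as already grown, each terminal node is summarized by a tuple of covariate means, and the deterministic initialization (first k points) and the example values are hypothetical choices, not the authors' settings.

```python
import math


def kmeans(points, k, iters=100):
    """Plain k-means on a list of equal-length numeric tuples.

    Assumption: centers are initialized from the first k points so that the
    result is deterministic; real analyses usually use random restarts.
    """
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assign every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical averaged covariates for four terminal nodes of one tree.
node_means = [(0.1, 0.2), (5.0, 5.1), (0.2, 0.1), (5.2, 4.9)]
labels, centers = kmeans(node_means, k=2)
print(labels)
```

In a bagging setting, this grouping would be repeated for each tree grown on a bootstrap sample and the subgroup assignments aggregated across trees.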
|||
The development of Liver Toxicity Knowledge
Base (LTKB) for research and review of drug-induced liver injury
Minjun Chen, Eileen E Navarro Almario, Guangxu Zhou, Ruyi He, Chuchu
Hu, Marc Stone, Tina M Burgess, Shashi Amur, Victor Crentsi, Hong Fang,
Weida Tong
Drug-induced liver injury (DILI) presents a significant challenge to
drug development and regulatory application. The Liver Toxicity Knowledge
Base (LTKB) aims to provide literature data and regulatory information
about DILI to support research and review of drug safety. The LTKB contains
~3000 unique prescription drugs, including ~1400 drugs approved by the
FDA, ~1300 drugs approved by other agencies like EMA, and 210 drugs withdrawn
from the worldwide market. The following data are available for most
drugs in the LTKB: chemical structure, therapeutic use, PD/PK, DILI
types and severity, DILI mechanisms, histopathology, drug targets, side
effects, etc. The LTKB can serve as (1) a reference database when drug/DILI-related
data need to be queried; (2) an assessment tool of DILI risk in humans
for new chemical entities in the review process; and (3) a tool to support
biomarker studies using emerging technologies (e.g., genomics, in vitro
studies). |
|||
Genome-wide comparison of four toxicogenomics assay systems
Zhichao Liu1, Hong Fang2, Joshua Xu1, Weida Tong1*
Disclaimer: The views presented in this article do not necessarily reflect current or future opinion or policy of the US Food and Drug Administration. Any mention of commercial products is for clarification and not intended as endorsement.
Assessing genome-wide differences and similarities between the in vitro and
in vivo responses to drug treatment is essential for choosing relevant toxicogenomics
assays in drug safety studies. We used the Japanese Toxicogenomics Project
dataset that profiles 131 compounds (most are drugs) with microarrays
in four assay systems for liver - two in vitro methods (rat and human
primary hepatocytes) and two in vivo experiments (single dose and repeat
dose studies). For each testing system, the drug-drug similarity score
between any two drugs was calculated based on their shared gene expression
patterns and ranked from the most similar to least similar pairs. Then,
the testing systems were compared pairwise based on ROC curve analysis
to quantify the extent to which ranking is preserved between two ordered similarity
lists. The two in vivo systems (AUC=0.90) and the two in vitro systems (AUC=0.77)
scored highest, indicating that the experimental platform (i.e., in vitro
or in vivo) is the most important factor affecting the assay results.
The results also implied that (a) an expensive assay testing system (i.e.,
in vivo repeat dose study) could be replaced by an inexpensive one (i.e.,
a short-term in vivo single dose study) and (b) species difference (i.e.,
rat in vitro and human in vitro) was less pronounced within the same
testing system. We also found good concordance (AUC=0.70) between
rat in vitro and in vivo repeat dose studies, indicating a potential
replacement of animal-based testing method with an animal-free in vitro
assay. Furthermore, we correlated the ranking preservation between assays
against various liver related toxicological endpoints. For all of these
endpoints examined, the concordance between rat and human was significantly
improved (over 10% on average), highlighting that the extrapolation of
rat data to humans was endpoint dependent. The proposed method
has several advantages over traditional approaches, such as insensitivity
to the batch effects that are common in microarray data. |
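One way to picture the ranking-preservation comparison is sketched below. The abstract does not state how the ROC analysis was set up, so this sketch makes an assumption: the top fraction of drug pairs in one assay system is treated as the positive class, and the similarity scores from the other system are used as the predictor. The function name, the `top_fraction` parameter and the example scores are hypothetical.

```python
def ranking_preservation_auc(sim_a, sim_b, top_fraction=0.1):
    """AUC for how well system B's scores recover system A's top-ranked pairs.

    sim_a, sim_b: dicts mapping a drug pair to its similarity score in each
    assay system (same keys in both). The AUC is computed as the probability
    that a positive pair outscores a negative pair, counting ties as 0.5.
    """
    pairs = sorted(sim_a, key=sim_a.get, reverse=True)
    n_pos = max(1, int(len(pairs) * top_fraction))
    positives, negatives = pairs[:n_pos], pairs[n_pos:]
    if not negatives:
        raise ValueError("top_fraction leaves no negative pairs to compare against")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if sim_b[p] > sim_b[n]:
                wins += 1.0
            elif sim_b[p] == sim_b[n]:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical similarity scores for four drug pairs in two systems.
system_a = {("d1", "d2"): 0.9, ("d1", "d3"): 0.5, ("d2", "d3"): 0.2, ("d1", "d4"): 0.1}
system_b = {("d1", "d2"): 0.8, ("d1", "d3"): 0.6, ("d2", "d3"): 0.3, ("d1", "d4"): 0.2}
print(ranking_preservation_auc(system_a, system_b, top_fraction=0.25))
```

Because only the rank order of the scores enters the calculation, a measure of this kind does not depend on the absolute scale of each system's similarity scores.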
|||
Detecting Copy Number Variations via a Bayesian Approach Adapting to Both Whole Genome and Targeted Exome NGS
Yu-Chung Wei1,2, Ching-Wei Chang1, Guan-Hua Huang2*
Copy number variations (CNVs) are genomic structural mutations with
abnormal gene fragment copies. Current CNV detection algorithms for
next generation sequencing (NGS) are developed for specific genome
targets, including whole genome sequencing and targeted exome sequencing
based on the different data types and corresponding assumptions.
Many whole genome tools assume continuity of the search space and
uniform read coverage across the genome. However, these assumptions break
down in exome capture because of discontinuous segments and exome-specific
functional biases. In order to develop a method that adapts to
both data types, we specify the large unconsidered genomic fragments
as gaps to preserve the true location information. A Bayesian hierarchical
model was built and an efficient reversible jump Markov chain Monte
Carlo inference algorithm was utilized to incorporate the gap information.
The performance of gap settings for the Bayesian procedure was evaluated
and compared with competing approaches using both simulations and real
data from the 1000 Genomes Project. The proposed approach outperforms
other existing methods in accuracy for both whole genome and targeted
exome data. |
|||
3D-SDAR analysis of a diverse dataset of 180 hERG inhibitors: Structural factors determining the binding potential
Iva Slavova1, Svetoslav H. Slavov1, Dan A. Buzatu1, Jon G. Wilkes1, Richard D. Beger1
3D-SDAR is a three dimensional spectral-data activity relationship (3D-QSDAR)
approach utilizing fingerprints constructed from 13C and 15N NMR chemical
shifts augmented with interatomic distances. 3D-QSDAR was used to model
a diverse dataset of human Ether-a-go-go-Related Gene (hERG) blockers,
some of which were drugs that can cause heart beat arrhythmia. After
setting a commonly accepted IC50 threshold of 1 µM, the 180 chemicals
forming the initial dataset were split into two classes: 67 were defined
as hERG blockers (or hERG+) while the remaining 113 compounds were labeled
as hERG-. A simple IC50 distribution based rule was used to split the
initial set of 180 compounds into a balanced modeling set (61 hERG+ and
57 hERG-) and an external test set (6 hERG+ and 56 hERG-). A total of
100 randomized PLS models, each splitting the modeling set into training (80%)
and hold-out test (20%) sets, were built. At each step the compounds
randomly assigned to the hold-out test and those in the external test
were predicted. At the end, the quantitative predictions for each compound
were averaged and a threshold of 0.5 was used to categorize them into
hERG+ and hERG-. Different grid granularities and fixed ratios (derived
from the gyromagnetic ratios of C and N) for the bin sizes in the C-C,
C-N and N-N regions were explored. A model with 4 latent variables (LVs) based
on 6 ppm x 6 ppm x 1 Å bins for the C-C region, 6 ppm x 30 ppm
x 1 Å bins for the C-N region and 30 ppm x 30 ppm x 1 Å bins
for the N-N region performed best. In the external test set of 62 compounds,
84% were classified correctly (sensitivity
= 1.00, specificity = 0.82 and area under the curve = 0.91). The bins
with the highest frequencies of occurrence from the top two LVs of the
randomized PLS model allowed the construction of a hERG toxicophore consisting
of an aromatic ring and an amino group. It was demonstrated that a second aromatic
ring would increase the hERG blocking potential. |
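The consensus step and the reported external-test statistics can be checked with a short sketch. The confusion-matrix counts below are inferred, not reported: with 6 hERG+ and 56 hERG- external compounds, a sensitivity of 1.00 implies 6 true positives, and 46 true negatives is the count that reproduces both the 0.82 specificity and the 84% accuracy. The function names and the choice of `>=` at the 0.5 threshold are assumptions.

```python
def consensus_label(predictions, threshold=0.5):
    """Average the quantitative predictions from the randomized models and
    apply the classification threshold (>= is an assumed tie rule)."""
    mean_prediction = sum(predictions) / len(predictions)
    return "hERG+" if mean_prediction >= threshold else "hERG-"


def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Inferred counts for the external test set (6 hERG+, 56 hERG-).
metrics = classification_metrics(tp=6, fn=0, tn=46, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")

# Hypothetical predictions for one compound across three models.
print(consensus_label([0.7, 0.4, 0.6]))
```

With these counts, 52 of the 62 external compounds are classified correctly, which matches the reported 84%.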
|||
Structural identification of unknown protein structures
Cameron Walker and Karl A. Walker
This research involves the analysis of unknown protein structures. The
purpose of this research is to develop improved algorithms that aid in
the prediction of protein structure. It is our hope to provide alternatives
to existing algorithmic approaches to bioinformatics, specifically protein
threading. By matching the sequences of proteins of unknown structure
to those of known structures stored in our database, we can determine
the longest common subsequences among proteins. Once data has been generated
from our protein-threading algorithm, we perform statistical analysis
upon that data to draw inferences that could lead to the identification
of new, useful enzymes. |
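The longest common subsequence computation mentioned above can be sketched with the standard dynamic-programming algorithm. This is a generic textbook version, not the authors' protein-threading code, and the two example strings are arbitrary rather than real protein sequences.

```python
def longest_common_subsequence(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (O(len(a) * len(b)))."""
    m, n = len(a), len(b)
    # dp[i][j] = length of the LCS of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Trace back from the bottom-right corner to recover the subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(longest_common_subsequence("AGGTAB", "GXTXAYB"))
```

For database-scale comparisons the quadratic table is usually the bottleneck, so practical tools restrict the search with heuristics or banded alignment.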
|||
The Effects of Carbon Emissions on Coral Reefs
Adrian Hunt, Britney Bolar and Karl Walker
Coral zooxanthellae (coral reefs) are one of the most diverse ecosystems
in the world. A major problem that is killing these species is carbon emission.
Comparing the DNA sequences of coral reefs that are affected with those that
are unaffected may help us to understand how some species have developed
resistance to these emissions, thus providing further knowledge on how
to protect and save these special species. |
|||
Phylogenetic analysis of enzymes of amino acid biosynthetic pathways
Daniel Machooka1, Andrea Carpenter1, Joseph Onyilagha1, Richard Walker1, Karl Walker1, Stephen Freeland2, Serhan Dagtas3
Understanding how life formed from only a few molecules remains a great
mystery. The biosynthetic pathways of the standard twenty amino acids
may hold the key to solving this puzzle. In this research, we have closely
examined the enzymes involved in these pathways and have conducted phylogenetic
analysis of them in order to better understand how each amino acid evolved
to become a part of the genetic code of life. |
|||
Down-regulation of genes involved in lignin biosynthesis and a genomic approach to deciphering lignin biosynthesis in rice
Zhenhua Shang1, Sathish Kumar Ponniah1, Vibha Srivastava2, and Muthusamy Manoharan1*
The objective of this project was to reduce lignin by down-regulating
genes involved in lignin biosynthesis in rice. A strategy of down-regulation
of lignin biosynthetic genes, cinnamate 4-hydroxylase (C4H), hydroxycinnamoyl
CoA: shikimate hydroxycinnamoyl transferase (HCT), coumarate 3-hydroxylase
(C3'H), cinnamoyl CoA reductase (CCR), and cinnamyl alcohol dehydrogenase
(CAD) has been used to decrease lignin content in rice. A novel binary
vector (TL), in which the truncated lignin gene(s) is driven only by a
promoter with no terminator, was constructed and transferred to Agrobacterium
tumefaciens for infecting rice calli. Putative transgenic rice plants
were regenerated after selection in regeneration medium (N6 medium containing
2.0 mg/L Kinetin, 0.02 mg/L NAA, 100 mg/L geneticin (G418) and 500 mg/L
Carbenicillin) and confirmed by Polymerase Chain Reaction (PCR). Seeds
were collected and germinated on MS medium containing 200 mg/L geneticin
for segregation analysis. RNA was isolated from the segregated plants
and Real-time qPCR was conducted. The results indicated a 50% reduction
of some of the genes (such as CAD) involved in lignin biosynthesis and
may |
|||
Sexual dimorphism in the expression of genes encoding drug metabolizing enzymes/transporters may influence a drug’s disposition in adult F344 rats
Vikrant Vijay, Kejian Wang, Qiang Shi, James C Fuscoe
A crucial step in developing a safe and effective drug is assessing how
the body processes the drug. During non-clinical drug development,
a drug candidate is often evaluated in adult animals of a single sex
(males). If there are age- and/or sex-differences in the enzymes that
metabolize the drug, there may be unrecognized age- and/or sex-related
differences in the disposition, safety, and efficacy of the drug. Drug
metabolizing enzymes including transporters (DME/T) play a major role
in a drug’s detoxification, excretion, and/or activation, and
thus differences in the DME/T expression profiles may play a key role
in drug safety. Therefore, a rat (F344) model was used to identify
differences in the basal hepatic transcriptional profiles of DME/T
genes in adult males and females at 4 different ages (8, 15, 21 and
52 weeks). A comprehensive list of 336 rat DME/T genes was prepared
using Pharmapendium as a key resource. In-house rat liver gene expression
data (normalized) was obtained for 298 out of 336 DME/T genes. Genes
were considered to be significantly differentially expressed between
females (F) and males (M) at a given age if the t-test
p-value was <0.05 and the fold ratio (F/M or M/F) was >2. In total, 112 genes were
significantly differentially expressed between the sexes at
one or more ages, and 29 genes at all four ages. All 29 genes showed consistently
higher expression in either females (12 genes) or males (17 genes)
at all four ages. The genes with the highest expression in females compared
to males were Abcc3, Cyp3a9, Sult2a1, Adh6 and Cyp2c12 with a range
of fold-differences of ~3-378. Genes with the highest expression in
males compared to females were Cyp2a2, Sult1e1, Cyp2c11, Cyp2c13 and
Cyp3a2 with a range of fold-differences of ~127-3876. The 29 enzymes
encoded by these differentially expressed genes metabolize more than
600 drugs. Based on these findings, the disposition of these drugs
may be different in the two sexes. In vivo studies in rats will be
conducted to confirm these predictions of differential disposition
of selected drugs metabolized by the differentially expressed enzymes.
Once confirmed by in vivo studies, the results could be translated
to humans in order to identify potential sexually dimorphic drug safety
issues. |
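The selection rule described above (t-test p-value below 0.05 and a fold ratio above 2 in either direction) can be sketched as a simple filter. The sketch assumes the p-values and the linear-scale mean expression values per sex have already been computed; the data layout, function names and gene identifiers are hypothetical.

```python
def is_sex_biased(p_value, female_mean, male_mean, p_cutoff=0.05, fold_cutoff=2.0):
    """Apply the p-value and fold-ratio criteria at a single age.

    Assumption: means are positive, linear-scale expression values, so the
    fold ratio is the larger of F/M and M/F.
    """
    fold = max(female_mean / male_mean, male_mean / female_mean)
    return p_value < p_cutoff and fold > fold_cutoff


def genes_biased_at_all_ages(records, ages=(8, 15, 21, 52)):
    """records: {gene: {age_in_weeks: (p_value, female_mean, male_mean)}}."""
    return sorted(
        gene
        for gene, by_age in records.items()
        if all(age in by_age and is_sex_biased(*by_age[age]) for age in ages)
    )


# Hypothetical values for two placeholder genes.
records = {
    "geneA": {8: (0.01, 50.0, 10.0), 15: (0.02, 60.0, 12.0),
              21: (0.001, 55.0, 11.0), 52: (0.03, 40.0, 9.0)},
    "geneB": {8: (0.01, 30.0, 10.0), 15: (0.20, 12.0, 10.0),
              21: (0.04, 25.0, 10.0), 52: (0.01, 5.0, 20.0)},
}
print(genes_biased_at_all_ages(records))
```

Note that this filter does not check that the direction of the difference is the same at every age; the abstract reports that it was consistent for all 29 genes.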
|||
Data Mining for Signal Detection from Adverse Event Reporting System Database
Weizhong Zhao1,2, Zhichao Liu1, Yuping Wang1, James J. Chen1, Wen Zou1*
The FDA centers receive reports from consumers, health care professionals,
manufacturers, and others regarding the safety of various regulated
products, such as drugs, vaccines, artificial hearts, surgical lasers,
and nutritional supplements. It is a challenge to extract the information
in these reports for better assessment of product safety and rapid
detection of adverse event signals. In this study, we collected adverse
event reports from the FDA Adverse Event Reporting System (FAERS) from the first
quarter of 2004 to the first quarter of 2014. During preprocessing,
we cleaned the dataset and normalized the drug names using RxNorm,
a standard nomenclature developed by the United States National
Library of Medicine (NLM). The empirical Bayes geometric mean (EBGM)
approach was utilized to identify the safety signals in the adverse
event reports related to 996 FDA-approved drugs. New safety signals were
identified when compared with the currently available information
in various sources. The outcome of this study is expected to enhance
information input to the decision making process for drug safety
detection and postmarketing surveillance. |
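The quantity that disproportionality methods such as EBGM start from is the ratio of observed to expected report counts for a drug-event pair. The sketch below computes only that raw relative reporting ratio under an independence assumption; it is not the EBGM itself, which additionally shrinks the ratio with an empirical Bayes prior fitted across all pairs. The function name and the counts are hypothetical.

```python
def relative_reporting_ratio(n_drug_event, n_drug, n_event, n_total):
    """Observed / expected count for one drug-event pair.

    expected = (reports mentioning the drug) * (reports mentioning the event)
               / (all reports), i.e. the count expected if drug and event
    were reported independently.
    """
    expected = n_drug * n_event / n_total
    return n_drug_event / expected


# Hypothetical counts: 20 reports mention both the drug and the event,
# 100 mention the drug, 500 mention the event, out of 10,000 reports.
print(relative_reporting_ratio(20, 100, 500, 10_000))
```

The raw ratio is unstable when counts are small, which is the motivation for the shrinkage that empirical Bayes approaches apply.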
|||
Systematically identifying and annotating long non-coding RNAs
Dan Li, Mary Yang
Long non-coding RNAs (lncRNAs) have been shown to play important roles
in various biological processes and have been implicated in disease.
Although lncRNAs have gained substantial attention in recent years,
their regulatory mechanisms remain to be elucidated. High-throughput
RNA sequencing (RNA-Seq) provides the unprecedented ability to annotate
lncRNA transcripts, which can potentially advance our understanding
of their |
|||
Bioinformatics challenges for research centers in Arkansas: ACNC case
Horacio Gomez-Acevedo1,2, Brian D. Piccolo1,2, Sudeepa Bhattacharyya2
The use of high-throughput technologies in basic and translational research has
grown steadily at the Arkansas Children’s Nutrition Center (ACNC).
Genomic research has included a broad spectrum of technologies including
RNA-seq, ChIP-Seq, Methyl-Seq, Human Methylation 450 Beadchip and traditional
Affymetrix microarrays. Also, the center has acquired a UHPLC-Q Orbitrap to carry
out metabolomics and lipidomics analyses. This increase in big data collection
has concomitantly created bioinformatics challenges at different levels,
namely keeping up with changes in software, methodologies, statistical approaches,
data management, data storage, and data presentation. Based on our experience
at ACNC, we highlight some of our current solutions to these challenges
in genomics and metabolomics. We also present bottleneck areas in which
a more integrative collaboration with other centers or institutes in the region
may synergistically increase the research quality in life sciences as well
as in bioinformatics. |
|||
Building a computational evolution system to identify genes of interest in multi-class, RNA-seq data
Nathan Crabtree1, John Bowyer1, Nysia George1, Jason Moore2
Computational evolution systems (CESs) are knowledge discovery engines that use post-processing,
Pareto optimization, and expert knowledge to
identify novel, unexpected, and interesting relationships in large datasets. CESs
have been developed to identify single nucleotide polymorphisms (SNPs)
that are associated with prostate cancer. Existing CESs discriminate
between binary-class datasets, e.g. treatment vs control or healthy vs diseased.
Although previous work provides a great foundation, technological advancements
and complex experimental designs have made it necessary to accommodate
datasets with multiple classes. Multi-class discrimination can be done
using a one-versus-one or one-versus-all approach where the dataset is
broken down into multiple binary datasets. The other, better approach
is to discriminate between all classes simultaneously. In this study,
we develop a CES for multi-class data using the simultaneous approach. We demonstrate
the performance of the proposed CES on a multiclass RNA sequencing (RNA-seq)
dataset that was generated from blood samples harvested from rats in
five different treatment groups. Results of the multiclass CES were
compared
to other machine learning approaches including random forests (RF) and
support vector machines (SVM). Methods were evaluated based on pre-processing
strategies such as minimum redundancy maximum relevancy and expert knowledge.
Classifiers were assessed based on accuracy and their ability to identify
discriminant genes that play a role in immune system function. |
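The one-versus-one and one-versus-all decompositions mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' CES; the function names and the toy group labels are assumptions made for illustration only. It shows how a five-class dataset is broken into binary datasets, which is the step the simultaneous approach avoids.

```python
from itertools import combinations


def one_vs_one(samples, labels):
    """Build one binary dataset per pair of classes (hypothetical helper).

    Only samples belonging to the two classes of a pair are kept;
    the first class of the pair is relabelled 1, the second 0.
    """
    classes = sorted(set(labels))
    datasets = {}
    for a, b in combinations(classes, 2):
        datasets[(a, b)] = [
            (s, 1 if l == a else 0)
            for s, l in zip(samples, labels)
            if l in (a, b)
        ]
    return datasets


def one_vs_all(samples, labels):
    """Build one binary dataset per class (hypothetical helper).

    Every sample is kept; the class of interest is relabelled 1
    and all other classes are relabelled 0.
    """
    classes = sorted(set(labels))
    return {
        c: [(s, 1 if l == c else 0) for s, l in zip(samples, labels)]
        for c in classes
    }


# Toy data: ten samples spread over five made-up treatment groups.
toy_labels = ["g1", "g2", "g3", "g4", "g5"] * 2
toy_samples = list(range(len(toy_labels)))

ovo = one_vs_one(toy_samples, toy_labels)
ova = one_vs_all(toy_samples, toy_labels)
print(len(ovo), "one-versus-one datasets;", len(ova), "one-versus-all datasets")
```

With k classes, one-versus-one yields k(k-1)/2 binary problems and one-versus-all yields k, so five treatment groups give ten and five binary datasets respectively.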
|||
Structure and specificity of L,D-Transpeptidase from Mycobacterium tuberculosis
Kuppan Gokulan1*, Sangeeta Khare1, Carl E. Cerniglia1, Steven L. Foley1, and Kottayil I. Varughese2*
The final step of peptidoglycan (PG) synthesis in all bacteria is the formation
of cross-linkage between PG stems. The cross-linking between amino acids
in different PG chains gives the peptidoglycan cell wall a 3-dimensional
structure and adds strength and rigidity to it. There are two distinct types
of cross-linkages in bacterial cell walls. D,D-transpeptidases (D,D-TP) generate
the classical 4→3 linkages and the L,D-transpeptidases (L,D-TP) generate 3→3
non-classical peptide cross-linkages. The percentage of 3→3 cross-linkages
is higher in non-replicating and multi-drug resistant bacteria than in replicating
and drug-susceptible bacteria. Penicillin and cephalosporin classes of β-lactams
cannot inhibit L,D-TP function; however, carbapenems inactivate its function.
We analyzed the structure of L,D-TP in the apo form and in complex with meropenem
and imipenem. The periplasmic region of L,D-TP folds into three domains.
The catalytic residues are situated in the C-terminal domain. The acylation
reaction occurs between carbapenem antibiotics and the catalytic Cys-354
forming a covalent complex. This adduct formation mimics the acylation of
L,D-TP with the donor PG stem. A novel aspect of this study is that in the
crystal structures of the apo and the carbapenem complexes, the N-terminal
domain has a muropeptide unit non-covalently bound to it. Another interesting
observation is that the calcium complex crystallized as a dimer through head
and tail interactions between the monomers. |
|||
AR-BIC-22 | |||
A novel clustering approach to find biomarkers in breast cancer subtypes based on expression network profiles
Leihong Wu1, Zhichao Liu1, Joshua Xu1, Minjun Chen1, Hong Fang2, Weida Tong1*, Wenming Xiao1*
Introduction: Identifying biomarkers in breast cancer subtypes for
improved clinical prognosis and precision treatment is a major goal
of breast cancer research. Gene expression analysis has long been applied
to find biomarkers, but it remains challenging in breast cancer owing
to the disease's high heterogeneity. Recent advances in network-based methodologies
offer an enhanced approach to systematically study breast cancer
subtypes and identify biomarkers at the whole-genome scale. |
|||
Making Tractable the Use of Vast Quantities of Regulatory-Related Textual Data
Ke Yu, Yijun Ding, Weizhong Zhao, Shi-Heng Wang, Wen Zou, James J. Chen, Roger Perkins, and Weida Tong*
For FDA to carry out its regulatory mission, prodigious quantities of largely
or poorly structured textual information must be digested and
interpreted. Agency lore describes new drug applications before the digital
age arriving in 18-wheel tractor trailers. Post market safety surveillance
of digital media constitutes untold terabytes of unstructured textual
data. Where expert human eyes are too few, slow and/or expensive, the
means to sift out information germane to regulatory questions is paramount.
Probabilistic topic modeling offers a viable approach, where unstructured
documents are characterized as probability distributions of latent topic
themes that, in turn, are probability distributions of words. With such
a model, the untenable process of searching and reading for answers to
a regulatory question in a vast corpus reduces to more careful scrutiny
of a small set of documents thematically related to the question. To
test the effectiveness and validity of topic modelling, we constructed
a ground truth data set with 59201 abstracts from PubMed that contained
39 tobacco use-related themes, and two entirely unrelated negative control
themes. Latent Dirichlet allocation (LDA) and Pachinko Allocation Model
(PAM) algorithms were separately applied to building topic models with
the ground truth data set. Both approaches segregated documents into
proper thematic truth categories, even those containing small fractions
(<0.1%) of the documents, demonstrating high specificity and sensitivity
of thematic characterization. We found that the sub-topics in PAM are closely
aligned with the LDA topics. The differentiation of super-topics in PAM was
not observed in this study, which may be data-dependent. The findings
demonstrate the applicability of topic modeling in exploring FDA textual
data, and offer a promising way to handle the documents accumulated
at FDA. |
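The retrieval idea described in this abstract, where a fitted topic model narrows a large corpus to a small set of thematically related documents, can be sketched in Python as follows. This is a minimal illustration that assumes the model has already been fitted; the function names, the uniform topic prior, the smoothing constant, and all numbers are hypothetical toy values, not results from the study.

```python
import math


def query_topic_weights(query_words, topic_word, smoothing=1e-6):
    """Weight each topic by how well it explains the query words.

    topic_word is a list of dicts (one per topic) mapping word -> probability.
    Words missing from a topic get a small smoothing probability (assumption).
    A uniform prior over topics is assumed.
    """
    log_scores = [
        sum(math.log(word_probs.get(w, smoothing)) for w in query_words)
        for word_probs in topic_word
    ]
    peak = max(log_scores)  # subtract the maximum for numerical stability
    unnormalised = [math.exp(s - peak) for s in log_scores]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def rank_documents(doc_topic, topic_weights, top_n=2):
    """Return the top_n document ids whose topic mixture best matches the weights."""
    scores = {
        doc: sum(t * w for t, w in zip(theta, topic_weights))
        for doc, theta in doc_topic.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy model: two topics (distributions over words) and three documents
# (distributions over topics). All values are invented for illustration.
toy_topic_word = [
    {"nicotine": 0.4, "smoking": 0.4, "cessation": 0.2},
    {"coral": 0.5, "reef": 0.5},
]
toy_doc_topic = {
    "doc_a": [0.9, 0.1],
    "doc_b": [0.2, 0.8],
    "doc_c": [0.6, 0.4],
}

weights = query_topic_weights(["smoking", "cessation"], toy_topic_word)
shortlist = rank_documents(toy_doc_topic, weights)
print(shortlist)
```

Only the shortlisted documents would then need careful reading, which is the reduction in effort the abstract describes.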
|||
Peptide Sequence Patterns Related to Omega Angles in Cis Conformation
Sidney Smith1, Adrian Hunt1, Karl Walker1, Jerry Darsey2
Predicting the backbone conformation of proteins involves estimating the configurations
of three torsion angles: phi, psi, and omega. Most
of the flexibility in protein backbones is accounted for by torsion
angles phi and psi because they correspond to covalent single bonds.
Due to the partial double bond characteristic of peptide bonds, the
omega torsion angles of a protein are largely found in trans conformation
(close to 180°). The cis-trans isomerization of omega dihedral
angles is directly involved in the folding of proteins and many functional
aspects of proteins such as auto-inhibition control, channel gating,
membrane binding, and dimerization interfaces (Craveur et al., 2013).
In this study, we have analyzed the amino acid sequences relative to omega
torsion angles found in cis conformation (close to 0°) in order
to better understand the mechanism of isomerization and to improve
prediction of omega dihedrals in cis conformation. |
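As a minimal sketch of how an omega angle can be labelled cis or trans, the Python below computes a dihedral angle from four atom positions (for omega these are conventionally CA(i), C(i), N(i+1), CA(i+1)) and classifies it. The 30° tolerance, the function names, and the idealized planar coordinates are assumptions for illustration, not values or methods taken from the study.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle, in degrees, defined by four points."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


def classify_omega(angle, tolerance=30.0):
    """Label an omega angle; the tolerance is an assumed cutoff."""
    if abs(angle) <= tolerance:
        return "cis"
    if abs(angle) >= 180.0 - tolerance:
        return "trans"
    return "other"


# Idealized planar geometries (not real protein coordinates).
cis_angle = dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans_angle = dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
print(classify_omega(cis_angle), classify_omega(trans_angle))
```

In the first geometry the two outer atoms lie on the same side of the central bond, giving an angle near 0°; in the second they lie on opposite sides, giving an angle near 180°.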
|||