Poster Number | Presenter | Affiliation | Title |
| Kakraba, Samuel | UALR-UAMS | Effects of Small Molecules on Protein Aggregation and Paralysis in C. elegans Expressing Aβ1–42 in the Muscle |
| Townsend, TA | NCTR/DGMT | The Development of a Modified Comet Assay for High-Throughput Assessment of DNA Methylation Status |
| Fang, Hong | NCTR | FDALabel Database: A Rich Resource For Study Of Pharmacogenomics Biomarkers To Facilitate Precision Medicine And Drug Safety |
| Xu, Joshua | NCTR/DBB | Bioinformatics Choices Substantially Impact Isoform Analysis of RNA-seq Data from a Toxicogenomics Study |
| Liu, Zhichao | NCTR/DBB | Bioinformatics Choices Substantially Impact Isoform Analysis of RNA-seq Data from a Toxicogenomics Study |
| Gong, Binsheng | NCTR/DBB | Landscape of circRNA Candidates Across 11 Organs and 4 Developmental Stages in Fischer 344 Rat |
| Chen, Minjun | NCTR/DBB-UAMS | The Largest Reference Drug List Ranked by the Risk for Developing Drug-Induced Liver Injury in Humans |
| Wu, Leihong | NCTR/DBB | Investigating the Effect of Reads Coverage on Discovery of Single Nucleotide Variations in Human Genome with Alignment-Based and Assembly-Based Approaches |
| Gu, Qiang | NCTR/DNT | Antibody Microarray Analysis of Protein Level Changes in an In Vitro Blood-Brain Barrier Model Following Exposures to Silver-Nanoparticles: Focusing on Apoptosis Signaling Proteins |
| Li, Dan | UALR | An Integrative Method for Comprehensively Reconstructing Transcripts and Long Non-Coding RNA Identification |
| Wang, Yan | UALR-UAMS | Association of Age at Menarche with Composition and Diversity of Gut Microbiota among Women in the Twins UK Study |
| Nookaew, Intawat | UAMS & more | Biomarker Discovery for Kidney Cancer Diagnosis Based on a Unique Signature of Metabolic Reprogramming |
| Yavas, Gokhan | NCTR/DBB | A Framework for Evaluating the Quality of the Personal Genomes Generated by De Novo Assembly Tools |
| Lee, Un Jung | NCTR/DBB | Tree-Based Recursive Partitioning Methods for Subgroup Selection in Precision Medicine |
| Bayraktar, M | UAMS | A Novel Image Interpreting System for Clinics: From Tumor Response Tracking to Similar Image Retrieval |
| Rhoads, Douglas | UAF | Genetic and Genomic Analyses of Staphylococcus agnetis, an Agent of Bacterial Chondronecrosis with Osteomyelitis in Broilers |
| Thakkar, Shraddha | NCTR/DBB | Liver Toxicity Knowledge Base (LTKB): A comprehensive database to understand multiple dimensions of Drug-Induced Liver Injury |
| Walker, Cameron L | UAPB | LCS-Based Protein Structure Prediction |
| Gupta, Chirag | UAF/VA Tech | Network Analysis Pipelines for Systems Biology |
| Thomas, J | UAF/CO State | Alternative Splicing in Environmental Stress Regulated Genes |
| Zou, W | NCTR/DBB | Best Practices in Mining Meaningful Topics from Regulatory Textual Documents |
AR-BIC-22 | Saraf, Manish K | ACH & more | Formula Milk Alters Microbial Diversity and Impacts Immune Response in Porcine Neonatal Model |
Poster Abstracts (Presenter name bolded) |
|||
Effects of Small Molecules on Protein Aggregation and Paralysis in C. elegans Expressing Aβ1–42 in the Muscle
Proteins require correct folding and maintenance in order to function effectively and efficiently. Most or all common neurological disorders, such as Alzheimer's and Parkinson's diseases, and possibly a wide range of other age-associated diseases, are attributable to protein aggregation that is cytotoxic, especially to nerve cells. Protein aggregation is a biological phenomenon in which misfolded proteins aggregate (i.e., adhere together in large conglomerates) either intra- or extracellularly. Our goal is to determine whether anti-inflammatory compounds (i.e., parthenolide, sclareol, combretastatin, and thiadiazolidinone (TDZD) analogs) are effective at reducing protein aggregation as well as preventing paralysis in C. elegans strain CL4176, which expresses a human Aβ1–42 transgene in body-wall muscle. In addition to conducting studies on a library of small molecules as a first step in an iterative process of drug optimization, we have also assessed dose-response functions for active lead compounds in reducing protein aggregates. This project was made possible by the Arkansas INBRE program, supported by grant funding from the National Institutes of Health (NIH) National Institute of General Medical Sciences (NIGMS) (P20 GM103429) (formerly P20RR016460). |
|||
The Development of a Modified Comet Assay for High-Throughput Assessment of DNA Methylation Status
TA Townsend and MG Manjanatha
National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR
Assessing DNA damage and epigenetic modifications, including DNA methylation, is critical
in predicting the carcinogenicity of pharmacological
and biological agents. The single cell gel electrophoresis (Comet) assay
is validated for regulatory use, with an OECD Test Guideline (TG489) approved
in 2014 for conducting the in vivo Comet assay. Here, we utilized the methylation-dependent
restriction endonuclease, McrBC, to develop a modified alkaline Comet assay
that allows the use of a single platform to evaluate genotoxicity and the global
and region-specific DNA methylation status of single cells under various
conditions. First, we confirmed dose-dependent induction of DNA damage
using a known genotoxicant, methyl methanesulfonate (MMS), in human cell
lines derived from breast (Mean % Tail DNA±SEM, Control: 12.5±2.8;
20 µg/ml MMS Treatment (3h): 53.9±1.9, p=0.01), cervix (4.7±1.3;
51.9±2.4, p=0.006), liver (8.0±4.1; 60.6±0.8, p=0.004),
and spleen (3.2±0.7; 50.1±0.7, p=0.001). Next, we defined
background levels of global (5-mC) methylation in these cell lines (5-mC
%±SEM for breast: 1.8±0.56%, cervix: 2.2±0.31%, spleen:
0.9±0.28%, liver: 1.5±0.47%), and characterized the dose-response
kinetics to several agents of interest to the FDA, including chemotherapeutic,
environmental and novel agents. We then demonstrated proof-of-principle
for our assay by detecting hypermethylation after 20 µM hydroxyurea
treatment (Difference in % Tail DNA with McrBC vs. buffer±SEM: 30.1±3.1,
p=0.002), and hypomethylation with 0.1 mM 5-Azacytidine treatment (-6.3±0.9,
p=0.03). To date, these results demonstrate high sensitivity of the modified
Comet assay for detecting as little as a 14% reduction in global DNA methylation
in single cells. The successful application of this novel technology will
aid in the hazard identification and risk characterization of FDA-regulated
products. Furthermore, this assay will have utility in investigating the
potential epigenetic mode of action of agents in target organs, since the
assay is amenable to cells in culture or cells from any tissue. |
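
The McrBC-versus-buffer comparison reported above can be sketched as a paired difference in % tail DNA. This is a minimal illustration only: the abstract does not state how replicates were paired or how the SEM was computed, so the pairing scheme, the function names, and the replicate values below are all assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) for replicate measurements."""
    return mean(values), stdev(values) / sqrt(len(values))


def paired_difference(tail_dna_mcrbc, tail_dna_buffer):
    """Mean and SEM of per-replicate differences in % tail DNA.

    Assumption: each McrBC-digested slide is paired with a buffer-only
    slide from the same replicate. A larger positive difference means
    McrBC cut more sites, i.e. more methylated DNA.
    """
    diffs = [m - b for m, b in zip(tail_dna_mcrbc, tail_dna_buffer)]
    return mean_sem(diffs)


# Hypothetical replicate values, not data from the poster.
mcrbc = [40.0, 44.0, 42.0]
buffer_only = [10.0, 12.0, 14.0]
diff, sem = paired_difference(mcrbc, buffer_only)
print(f"Difference in % tail DNA: {diff:.1f} ± {sem:.1f}")
```

Comparing this difference between treated and untreated cells would then indicate hyper- or hypomethylation relative to control.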
|||
FDALabel Database: A Rich Resource for Study Of Pharmacogenomics Biomarkers To Facilitate Precision Medicine And Drug Safety
Hong Fang, Joshua Xu, Zhichao Liu, Stephen Harris, Shraddha Thakkar, Guangxu Zhou, Daojun Liu, Paul Howard and Weida Tong
National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR
Pharmacogenomics
(PGx) is the study of individual genetic differences (acquired and inherited)
in relation to drug response. Understanding the association between PGx
markers and phenotypes improves knowledge of underlying mechanisms of diseases
and treatment responses for enhanced drug safety and precision medicine.
Research on this topic has been challenging because of a lack of easy access
to PGx data. We have developed the FDALabel database (http://www.fda.gov/ScienceResearch/BioinformaticsTools/ucm289739.htm),
which allows users to perform customizable, full-text searches in over 80,000
drug labeling documents for 1600 small molecule drugs. FDALabel provides
PGx information contained in the FDA-approved drug labeling (package insert).
The prescription drug insert information provides consensus and combined
information about product indications, target populations, and adverse drug
reactions (ADRs) from FDA regulators, drug manufacturers, and scientific
experts. In this study, biomarker information was used to query the relevant
PGx sections in FDALabel. As a result, we identified more than 170 drugs
with genetic biomarker information. Furthermore, 36 biomarkers were identified
and divided into three categories: (1) drug metabolism variability (e.g.,
CYP enzymes); (2) increased risk of adverse events (e.g., G6PD, TPMT, HLA-B);
and (3) the drug’s mechanism of action (e.g., CD30). These biomarkers are likely
to affect the response of the specified patient sub-population to the drug. Network
analysis and visualization were used to illustrate the relationships among
drugs, biomarkers, and associated adverse effects. In summary, the case of
PGx biomarkers has demonstrated the potential of using FDALabel for the study
of ADRs (i.e., to identify new trends and the frequency of genetic variability
associated with increased risks to public health) in pursuit of improved
pharmacovigilance and precision medicine. |
|||
Bioinformatics Choices Substantially Impact Isoform Analysis of RNA-seq Data from a Toxicogenomics Study
Alternative splicing events greatly increase the diversity of proteins, which might respond differently to toxic insults. Identification and quantification of isoforms are important for toxicological research because they improve understanding of the underlying mechanisms of toxicity. The advent of RNA-seq technology and related bioinformatics tools enables detection of novel isoforms and quantification of relative transcript abundances, and thus enhances research on alternative splicing. However, recent work has shown highly inconsistent results for quantitative analysis, and there is no clear understanding of the source of variation. In this study, we investigated the impact of five potential factors on isoform detection and quantification: the choice of bioinformatics methods, sequencing depth, library preparation, transcript abundance, and treatment effect. Liver RNA samples from six rats (three treated with aflatoxin B1 for 5 days and three matched controls) were profiled, with two libraries prepared for each sample. The first batch of libraries was sequenced twice, while the second batch was sequenced only once.
Five bioinformatics pipelines were used, with two mapping tools (TopHat2
and STAR) followed by three isoform analysis approaches (Cufflinks, IsoLasso
and FlipFlop). We evaluated the consistency of isoform detection for
each potential factor and compared the differential analysis results
of isoform
expression and splicing for each factor. In summary, the choice of bioinformatics
pipelines has a substantial impact on all aspects of quantitative isoform
analysis of RNA-seq data, including the number of isoforms detected,
differentially expressed isoforms, and their low concordance (about
41% on average) between
pipelines. Differentially used transcription starting sites or alternative
splicing events for the short-term aflatoxin B1 treatment cannot be reliably
detected. |
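
One way to quantify agreement between pipelines is the average pairwise overlap of the isoform sets they report. The abstract does not define how its concordance figure was computed, so the Jaccard index used here is an assumption, and the pipeline labels and isoform identifiers are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_concordance(calls):
    """Average Jaccard index over every pair of pipelines.

    `calls` maps a pipeline label to the set of isoform IDs it reports
    (for example, the isoforms called differentially expressed).
    Requires at least two pipelines.
    """
    pairs = list(combinations(sorted(calls), 2))
    return sum(jaccard(calls[p], calls[q]) for p, q in pairs) / len(pairs)


# Hypothetical calls from three pipelines.
calls = {
    "pipeline_A": {"iso1", "iso2", "iso3"},
    "pipeline_B": {"iso2", "iso3", "iso4"},
    "pipeline_C": {"iso3"},
}
print(f"Mean pairwise concordance: {mean_pairwise_concordance(calls):.2f}")
```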
|||
Potential Reuse of Oncologic Drugs for the Treatment of Rare Diseases
Cancer research has been a focus in the biomedical field, resulting in
many oncologic drugs in clinical use. In contrast, very few treatment
options are available for rare diseases although they are progressive,
disabling and life threatening. Therefore, we investigated the potential
use of oncologic drugs for the treatment of rare diseases. A strong association
between cancer and rare diseases was observed at the molecular level.
Specifically, an overlap of approximately 60% was shown between 127 genes
associated with cancer of many kinds and 2976 rare disease genes, and
the same degree of overlap was also obtained when the analysis was conducted
at the pathway level. When both gene lists were placed in a
gene-gene network, over 95% of gene pairs (one gene from each list) were
located less than three genes apart in the network, indicating
that cancer genes and rare disease genes likely involve similar biological
processes. In addition, many drug targets for cancer were found to relate
to rare diseases. The molecular level of association between cancer and
rare diseases was further substantiated with existing clinical trial
data and literature review. In summary, we ranked the rare disease classes
by their potential to be treated with oncologic drugs. The study demonstrated
that anticancer drugs are potential sources for the treatment of rare
diseases, and the proposed framework offers an opportunity to identify
potential therapeutics from cancer research for use in rare diseases. |
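
The network proximity analysis described above can be sketched with a breadth-first search over a gene-gene network. The abstract does not say exactly how "less than three genes apart" was measured; this sketch assumes it means a shortest path of at most three edges (fewer than three intermediate genes), and the toy network and gene names are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (gene, gene) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs_distances(graph, source):
    """Shortest path length (in edges) from source to every reachable gene."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def fraction_close_pairs(graph, list_a, list_b, max_edges=3):
    """Fraction of cross-list gene pairs within max_edges of each other."""
    close = total = 0
    for a in list_a:
        dist = bfs_distances(graph, a)
        for b in list_b:
            total += 1
            if dist.get(b, float("inf")) <= max_edges:
                close += 1
    return close / total if total else 0.0


# Hypothetical linear network g1 - g2 - g3 - g4 - g5 - g6.
graph = build_graph([("g1", "g2"), ("g2", "g3"), ("g3", "g4"),
                     ("g4", "g5"), ("g5", "g6")])
print(fraction_close_pairs(graph, ["g1"], ["g3", "g6"]))
```

Running one search per gene in the first list keeps the cost manageable when the second list is much larger, as it is for the rare disease genes.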
|||
Landscape of circRNA Candidates Across 11 Organs and 4 Developmental Stages in Fischer 344 Rat
Circular RNA (circRNA) is a class of endogenous noncoding RNAs that has attracted great attention because of its potential biological function as a regulator of microRNAs, as recently reported in some studies. Next-generation sequencing technologies and novel bioinformatics approaches enable the detection of circRNAs in many species. Thousands of novel circRNA candidates have been revealed in mammals as well as nematodes. This study provides an overview of circRNA candidates detected in an RNA-seq dataset across 11 organs of Fischer 344 rats at 4 developmental stages. The induction of circRNA candidates displays clear organ-specific patterns and, for some organs, gender differences. Liver and muscle have the lowest numbers of circRNA candidates and brain has the most; the same pattern was observed for expressed genes and transcripts in our previous study. Among the 1,793 parental genes, only 58 were detected with backspliced junctions in all eleven organs, and 63 were detected in ten organs but absent in one. A total of 333 genes were detected with backspliced junctions in only one organ. The overlap of the induced circRNAs between males and females is less than 50% in each non-sexual organ, except for brain, where up to 67% concordance was observed in aged rats. An increasing trend in circRNA candidates along the four developmental stages was observed in brain and liver for both sexes. In contrast, there is a drop in circRNA candidates in thymus for aged rats of both sexes. The number of circRNA candidates was stable in heart and lung. In the sex organs, the number of circRNA candidates remained stable across the aging points in uterus, increased in the younger ages (juvenile through adult) in thymus, and then significantly dropped for aged rats. Further knowledge of circRNA candidates in rat will help advance the study of drug toxicity at the RNA regulation level. |
|||
The Largest Reference Drug List Ranked by the Risk for Developing Drug-Induced Liver Injury in Humans
Recently, personalized medicine has received great attention as a way to improve safety and effectiveness in drug development. Personalized medicine aims to provide medical treatment that is tailored to the patient’s characteristics, such as genomic biomarkers and disease history, so that the benefit of treatment can be optimized. Subpopulation identification divides patients into several subgroups, each of which corresponds to an optimal treatment. For two subgroups, a multivariate Cox proportional hazards model is traditionally fitted and used to calculate a risk score when the outcome is a survival time endpoint. The median is commonly chosen as the cutoff value to separate patients. Here we propose a novel tree-based method that adopts the relative risk tree algorithm to identify patient subgroups. After growing a relative risk tree, we apply k-means clustering to group the terminal nodes based on the averaged covariates. We adopt an ensemble bagging method to improve on the performance of a single tree, since the performance of a single tree is well known to be quite unstable. A simulation study is conducted to compare the performance of our proposed method with that of the multivariate Cox model. Applications of our proposed method to three public cancer data sets are also presented for illustration. |
|||
Investigating the Effect of Reads Coverage on Discovery of Single Nucleotide Variations in Human Genome with Alignment-Based and Assembly-Based Approaches
Leihong Wu, Gokhan Yavas, Huixiao Hong, Weida Tong and Wenming Xiao
National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR RD, Jefferson, AR
Discovering genetic variants is one of the major objectives of Next Generation Sequencing (NGS) in human genome research. Currently, the commonly preferred practice for variant discovery is short-sequence alignment against a known reference genome. This alignment-based variant calling approach has some limitations that might be overcome by alternatives. Recent studies with de novo assembled personal genomes have reported a large list of novel variants, indicating that assembly-based variant calling might be an alternative strategy for identifying genetic variants. However, because reported studies of assembly-based variant discovery lack ground truth and offer only a limited range of laboratory validation, a comprehensive assessment is critically needed to determine whether the assembly-based approach is reliable. In this study, we used a set of simulated data to evaluate the validity of single nucleotide variants (SNVs) uncovered from contigs assembled by SOAPdenovo2, one of the most widely used tools for short-read assembly. Combining VarSim and ART, we simulated ~3 million variants in short reads at various coverages between 2x and 50x. We then used both alignment-based and assembly-based approaches to identify SNVs and compared the recall and precision at each coverage. Our results suggested that: (1) at least 30x read coverage is needed to obtain assembled contigs with good coverage of the genome and genes; (2) at 30x read coverage, more than 99% of variants could be recovered by the alignment-based approach; (3) compared with alignment-based variant calling, the assembly-based approach has much lower recall and precision; and (4) however, the assembly-based approach can recover up to 12% of true SNVs that would be missed by the alignment-based approach. Although the assembly-based approach can serve as a complementary way to discover SNVs, with SOAPdenovo as the assembly tool it is associated with a great risk of erroneous calls for novel variants. Variants called from assembled contigs are not reliable unless assembly outcomes are much improved, with good genome completeness, resolved haplotypes, and high fidelity of assembled sequences. |
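
With simulated data the true variant set is known, so recall and precision follow directly from set operations. The sketch below assumes each SNV is keyed by (chromosome, position, ref, alt) and interprets the "rescued" figure as the share of alignment-missed true SNVs that assembly recovers; that interpretation, the function names, and the example variants are illustrative assumptions.

```python
def recall_precision(called, truth):
    """Return (recall, precision) of a call set against the simulated truth."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    recall = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(called) if called else 0.0
    return recall, precision


def rescued_fraction(assembly_calls, alignment_calls, truth):
    """Share of true SNVs missed by alignment that assembly recovers."""
    missed = set(truth) - set(alignment_calls)
    if not missed:
        return 0.0
    return len(missed & set(assembly_calls)) / len(missed)


# Hypothetical variants keyed by (chromosome, position, ref, alt).
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 50, "G", "A"), ("chr2", 75, "T", "C")}
alignment = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
             ("chr3", 5, "A", "C")}
assembly = {("chr2", 50, "G", "A"), ("chr9", 1, "A", "T")}

print(recall_precision(alignment, truth))
print(rescued_fraction(assembly, alignment, truth))
```

Repeating the calculation at each simulated coverage level gives the recall and precision curves that the comparison relies on.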
|||
Antibody Microarray Analysis of Protein Level Changes in an In Vitro Blood-Brain Barrier Model Following Exposures to Silver-Nanoparticles: Focusing on Apoptosis Signaling Proteins
Qiang Gu (1), Susan Lantz (1), Elvis Cuevas (1), Syed F. Ali (1), Jyotshna Kanungo (1), Merle G. Paule (1), Yongbin Zhang (2), and Victor Krauthamer (3)
(1) Division of Neurotoxicology, National Center for Toxicological Research, FDA, Jefferson, AR; (2) Nanotechnology Core Facility, National Center for Toxicological Research, FDA, Jefferson, AR; (3) Division of Biomedical Physics, Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, FDA, Silver Spring, MD
Microarray
experiments are a centerpiece of postgenomics life sciences and
the current efforts to develop systems diagnostics for personalized
medicine. In the present study, antibody microarrays were utilized
to detect proteomic changes in an in vitro model of the blood-brain
barrier following exposures to nanoparticles. Micro-vessel endothelial
cells (MVECs) were isolated from adult rat brains and primary cell
cultures were made. When cells became confluent, typically two
weeks post-seeding, they were exposed to various concentrations
(0.01–50
µg/mL) of 20 nm diameter citrate-coated silver nanoparticles (AgNPs).
The physicochemical properties of the AgNPs (size, size distribution,
surface charge, shape, etc.) were characterized using transmission
electron microscopy and dynamic light scattering. The dose-dependent
cytotoxic effects of AgNPs were determined using lactate dehydrogenase
(LDH), 2,3-bis-(2-methoxy-4-nitro-5-sulfophenyl)-2H-tetrazolium-5-carboxanilide
(XTT), and FluoroJade-C assays. Based on the cytotoxicity profile,
a toxic dose of AgNPs (10 µg/mL) was applied to MVECs for subsequent
proteomic analyses. After 24 hours of treatment, proteins were
extracted from AgNP-treated and control cultures and relative protein
levels
were quantified using antibody microarrays that targeted 1,358
proteins from a variety of biological signaling pathways. Our initial
focus
was on apoptosis signaling pathways because of the known cytotoxic
effects of AgNPs. Among the two-dozen apoptosis pathway-associated
proteins examined, fourteen were significantly down-regulated while
three showed significant up-regulation, indicating that these proteins
may play an important role in AgNP-induced toxicity. To further
confirm these antibody microarray results, seven rat protein-specific
antibodies
were selected and used for capillary electrophoresis based immuno-blot
analyses of AgNP-treated and control samples. The results confirmed
changes in the levels of expression of these proteins which include
BAD, BAX, caspase 2, caspase 9, cytochrome C, IκBα, and MCL-1.
The changes in expression of apoptosis-associated proteins may
represent
molecular signature biomarkers of AgNP-induced cytotoxicity. Identifying
such proteins should further elucidate the molecular mechanisms
associated with nanoparticle-induced cytotoxicity and aid in the
effective characterization
and regulatory review of potential toxicities following human exposure
to nanomaterials. |
|||
An Integrative Method for Comprehensively Reconstructing Transcripts and Long Non-Coding RNA Identification
Dan Li and Mary Yang
The University of Arkansas at Little Rock
A reference-guided approach is often used to reconstruct the human transcriptome. De novo methods, which do not use a reference genome, also enable the discovery of novel transcripts. Here we assessed assemblers built on these two types of assembly method, using simulated data and experimental RNA-seq data, for long non-coding RNA (lncRNA) identification. Moreover, we developed an integrative approach that combines the two different assemblers to identify a more comprehensive lncRNA set. Compared with mRNAs, lncRNAs are typically shorter, with fewer exons and lower abundance. Using the Polyester R package, we generated RNA-seq reads based on known lncRNA annotations. The reference-guided and de novo assemblers identified 62.5% and 72.9% of the known lncRNAs, respectively. In our integrative approach, all transfrags from multiple samples assembled by the two assemblers were used as input for the Cuffmerge utility. The resulting assemblies were then merged, producing a more comprehensive single collection for the subsequent lncRNA identification procedure. Using the integrative approach, over 75% of known lncRNAs were identified, and 88.1% of these identified lncRNAs overlapped more than 80% of the length of the known lncRNAs. The relatively low discovery rates may be attributable to the rigorous filters applied to lncRNA candidates. To reduce false positives, we removed single exons that did not overlap known annotations and transfrags mapped to regions of low mappability and alignment. Additionally, an experimental RNA-seq data set, consisting of RNA-seq reads from 57 human tissue samples, was analyzed. The lncRNAs detected by the integrative method showed more comprehensive features, such as completeness, overlapping, and splicing. Thus, our integrative approach outperformed the individual methods.
AR-BIC-10 |
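
The ">80% of length" criterion can be illustrated by computing how much of a known lncRNA's exonic length is covered by an assembled transcript. The abstract does not specify the exact overlap computation, so this is a sketch under assumptions: exons are half-open (start, end) intervals on the same strand and chromosome, and the coordinates are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_fraction(known_exons, assembled_exons):
    """Fraction of the known transcript's exonic bases covered by the assembly."""
    known = merge_intervals(known_exons)
    assembled = merge_intervals(assembled_exons)
    known_length = sum(end - start for start, end in known)
    covered = 0
    for k_start, k_end in known:
        for a_start, a_end in assembled:
            covered += max(0, min(k_end, a_end) - max(k_start, a_start))
    return covered / known_length if known_length else 0.0


# Hypothetical exon coordinates for one known lncRNA and one transfrag.
known = [(0, 100), (200, 300)]
transfrag = [(10, 100), (200, 300)]
fraction = covered_fraction(known, transfrag)
print(f"{fraction:.2f} covered; passes 80% threshold: {fraction > 0.8}")
```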
|||
Association of Age at Menarche with Composition and Diversity of Gut Microbiota Among Women in the Twins UK Study
Early menarche is associated with increased risks of cardiovascular
disease (CVD) incidence and mortality. We have hypothesized that this
association results from a shared cause: alterations in the gut microbiome.
We tested this hypothesis in a study of 908 female adult twins (542 dizygotic
and 366 monozygotic) drawn from the TwinsUK registry. Age at menarche
was self-reported. Microbial amplicons from the V3-V4 hypervariable regions
of the 16S-rRNA gene were sequenced from fecal samples. Demultiplexed
RNA sequences were downloaded from the European Bioinformatics Institute
(EBI). We carried out quality filtering and open-reference Operational
Taxonomic Unit (OTU) picking using QIIME. Sequences were aligned against
the reference sequences of the Greengenes database. Sequences that failed
the alignment were clustered de novo, and cluster centroids were chosen
as new reference sequences. Alpha diversity was measured in samples rarefied
to a sequencing depth of 8,136. Mixed model regression analysis was used
to model the association of menarche with microbial diversity while adjusting
for covariance in twinned measures and for the effects of potential confounders,
including birth cohort and BMI. Compared to the second and the third
quantiles, women in the first and last quantiles for age at menarche
had reduced alpha diversity. Age at menarche and alpha diversity of the
fecal microbiome each accounted for some variations in CVD risk factors.
The mechanisms underlying the observed associations remain to be elucidated. |
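
Rarefaction followed by an alpha diversity calculation can be sketched in plain Python. The abstract does not name the alpha diversity metric, so the Shannon index here is an assumption; the OTU counts and the small depth are hypothetical, and this is not QIIME's implementation.

```python
import math
import random


def rarefy(counts, depth, seed=0):
    """Subsample OTU counts without replacement to a fixed depth.

    Returns None when the sample has fewer reads than `depth`, which
    mirrors the usual practice of dropping shallow samples.
    """
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    rng = random.Random(seed)
    rarefied = {}
    for otu in rng.sample(pool, depth):
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied


def shannon(counts):
    """Shannon diversity index (natural log) of an OTU count table."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)


# Hypothetical OTU table for one sample; the study used a depth of 8,136.
sample = {"otu_1": 50, "otu_2": 30, "otu_3": 20}
rarefied = rarefy(sample, depth=60)
print(rarefied, round(shannon(rarefied), 3))
```

Rarefying every sample to the same depth keeps diversity estimates comparable across samples with different sequencing yields.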
|||
Biomarker Discovery for Kidney Cancer Diagnosis Based on a Unique Signature of Metabolic Reprogramming
Otto Warburg first proposed aerobic glycolysis as a key metabolic reprogramming of cancer cells in 1956, and it became a famous hallmark of cancer. However, in the past years, many studies have shown different metabolic reprogramming indirectly associated with proliferation processes, which is believed to represent additional hallmarks of cancer. High-throughput technologies have been used to generate a large amount of -omics data, shared across the research community, enabling a powerful holistic comparison of cancer metabolism. To search for signatures of metabolic reprogramming in different cancer types, high-dimensional datasets derived from many cancer types, including SNP analysis, RNA-seq, and protein profiles, were obtained from The Cancer Genome Atlas (TCGA) and The Human Protein Atlas (HPA). The Human Metabolic Atlas (HMA), our well-curated, comprehensive collection of human metabolism, was used as the scaffold for multilevel omics data mapping and integration.
Through our developed computational pipelines/tools (e.g., PIANO, INIT),
we identified divergences of kidney cancer metabolism from other cancer
types. Metabolism of kidney cancer correlated with loss of von Hippel-Lindau
tumor suppressor (VHL) located on chromosome 3p. Strikingly, the GAG
pathway was discovered to be associated strongly with coordinated regulation
and progression only in kidney cancer, and could, therefore, be used
as a biomarker for clinical diagnosis. The GAG profile measured in
both plasma and urine samples was distinctively altered in the cancer
patients relative to healthy controls in a discovery cohort with accuracy
greater than 82%. Furthermore, the biomarker was successfully validated
in another independent cohort, strongly indicating the robustness of
the biomarker. Applying systems biology to dissect the biological problem
enables high quality biomarker discovery that can be translated into
clinical diagnosis in practice. |
|||
A Framework for Evaluating the Quality of the Personal Genomes Generated by De Novo Assembly Tools
Gokhan Yavas, Leihong Wu, Huixiao Hong, Weida Tong and Wenming Xiao
National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR
With the advent of Next Generation Sequencing (NGS) technologies, it is now possible to generate millions of sequencing reads from a human genome, which can then be used for many applications, such as identifying single nucleotide variations (SNVs) and structural variations. Another promising application is the de novo assembly of reads to build a personalized genome. For this purpose, many tools have been developed to build a novel genome or to evaluate the quality of assembly outcomes. Evaluation of an assembly usually depends on the alignment of contigs to a reference genome, which places heavy demands on computational resources such as runtime memory, CPU, and storage space. Based on several comparative studies of existing assembly evaluation tools, it remains a major challenge to design a framework with good runtime performance. In this study, we present a framework that maximizes the use of the available computational environment by performing contig alignment and post-processing in parallel. Our flexible design allows split jobs to be run on either a high performance computing (HPC) cluster or a multi-core workstation. The input, given as a set of contigs, can be partitioned into a user-defined number of chunks, each of which can then be aligned and processed on a separate node of an HPC cluster or a separate core of a workstation. Based on carefully filtered alignments, the framework generates statistics such as total genome coverage, gene and exon coverage, contig duplication and continuity, as well as SNVs embedded in the assembly and SNV-related statistics. Our framework also provides stand-alone quality statistics such as contig size distribution and Nx statistics. We compared multiple genomes assembled with various assembly algorithms, such as SOAPdenovo, Falcon, and Celera Assembler. The results demonstrated the capability of our tool to provide a complete package of quality metrics with high performance in different computing environments. |
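
Two of the ideas above, Nx statistics and splitting contigs into a user-defined number of chunks, are easy to illustrate. This is a minimal sketch, not the framework's code: the round-robin partitioning scheme, the function names, and the contig lengths are assumptions.

```python
def nx(contig_lengths, x=50):
    """Nx statistic: the length L such that contigs of length >= L
    together contain at least x% of all assembled bases."""
    ordered = sorted(contig_lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return 0


def partition(contigs, n_chunks):
    """Split contigs round-robin into n_chunks lists for parallel jobs."""
    chunks = [[] for _ in range(n_chunks)]
    for index, contig in enumerate(contigs):
        chunks[index % n_chunks].append(contig)
    return chunks


# Hypothetical contig lengths in base pairs.
lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print("N50:", nx(lengths, 50), "N90:", nx(lengths, 90))
print(partition(["c1", "c2", "c3", "c4", "c5"], 2))
```

Each chunk could then be handed to a separate worker process or cluster job, with the per-chunk statistics combined afterwards.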
|||
Tree-Based Recursive Partitioning Methods for Subgroup Selection in Precision Medicine
Un Jung Lee, Yu-Chuan Chen and James J. Chen
Division of Bioinformatics & Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR
Precision medicine aims to customize medical care, using new tools and therapies to select the best treatments tailored to individual patients. Selecting patient subgroups plays an important role in precision medicine because it allows treatment effects to be assessed within subgroups and provides useful information for optimizing treatment assignment. In this study, we propose using tree-based recursive partitioning to identify patient subgroups with an enhanced treatment effect in clinical trials. Two subgroup identification strategies are presented. The first uses the Subgroup Identification based on Differential Effect Search (SIDES) algorithm, in which subgroups are identified by maximizing the differential effect between the treatment group and the control group. SIDES generates multiple candidate subgroups, whereas it is desirable to have a single subgroup for treatment assignment. We evaluate several methods to identify “optimal” subgroups from the list of subgroups identified. The second strategy is an ensemble tree-based method. For a given terminal node in a tree, each patient in that node is assigned a score equal to the proportion of responders in the node. A patient’s composite score is calculated as the sum of these scores over all trees in the ensemble. A change-point algorithm is then applied to separate responder and non-responder subgroups. We conduct simulation experiments to evaluate these methods and compare them with the CN2-SD algorithm in terms of sensitivity, specificity, and accuracy. |
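
The ensemble scoring step can be sketched as follows: within each tree a patient's score is the responder proportion of the terminal node they fall in, and scores are summed across trees. The abstract does not name the change-point algorithm, so the single-split rule below (minimizing within-group squared error over sorted scores) is an assumption, as are the toy leaf assignments.

```python
def composite_scores(leaf_assignments, responder):
    """Sum, over trees, of the responder proportion in each patient's leaf.

    leaf_assignments: one list per tree, giving each patient's leaf ID.
    responder: 0/1 flag per patient.
    """
    scores = [0.0] * len(responder)
    for leaves in leaf_assignments:
        size, hits = {}, {}
        for patient, leaf in enumerate(leaves):
            size[leaf] = size.get(leaf, 0) + 1
            hits[leaf] = hits.get(leaf, 0) + responder[patient]
        for patient, leaf in enumerate(leaves):
            scores[patient] += hits[leaf] / size[leaf]
    return scores


def sum_squared_error(values):
    """Sum of squared deviations from the mean."""
    center = sum(values) / len(values)
    return sum((v - center) ** 2 for v in values)


def change_point(scores):
    """Cutoff on sorted scores that minimizes within-group squared error."""
    ordered = sorted(scores)
    best_cut, best_cost = None, float("inf")
    for k in range(1, len(ordered)):
        cost = sum_squared_error(ordered[:k]) + sum_squared_error(ordered[k:])
        if cost < best_cost:
            best_cost = cost
            best_cut = (ordered[k - 1] + ordered[k]) / 2
    return best_cut


# Hypothetical ensemble of two trees over four patients.
leaves = [[0, 0, 1, 1], [0, 1, 1, 1]]
responder = [1, 1, 0, 0]
scores = composite_scores(leaves, responder)
cut = change_point(scores)
print(scores, cut, [s > cut for s in scores])
```

Patients whose composite score exceeds the cutoff would form the predicted responder subgroup.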
|||
A Novel Image Interpreting System for Clinics: From Tumor Response Tracking to Similar Image Retrieval
Bayraktar M (1), Topaloglu U (1), McDonald J (2), and Hutchins LF (3)
University of Arkansas for Medical Sciences, Little Rock AR - (1) Department of Biomedical Informatics; (2) Division of Nuclear Medicine, Department of Radiology; (3) Division of Hematology/Oncology
Clinicians rely on radiologists for image interpretation as well
as annotation and markup. Because of software disconnects and
communication hurdles around image ordering, clinics usually
do not receive imaging study results with the precision they
need to treat the patient. That precision is vital in chronic
diseases such as cancer, where tumor progression must be
closely monitored. Although great progress has been made in
overcoming synchronization problems, personnel training
deficiencies, inconsistent adoption of accepted standards, and
weak communication between the two parties mean that the
desired information often cannot be captured or conveyed
properly, and the information obtained cannot be analyzed in a
timely fashion or for its intended purposes. This paper attempts
to solve this problem by creating a system with four distinct
functions. First, we propose a referent-tracking-based method
for tracking lesion response over time together with the
treatment modality. Second, we suggest an agile auto-correction
system that helps those interpreting medical images make better
use of the accepted vocabulary and markup patterns defined by
the RECIST (Response Evaluation Criteria in Solid Tumors)
standards recommended by the National Institutes of Health (NIH).
Third, to reinforce accuracy, we automatically reprocess the
image slices and segment the lesions, comparing the results with
the radiologists' manual delineations previously interpreted
according to RECIST. |
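As a rough illustration of tracking lesion response over time against RECIST, the sketch below classifies each follow-up visit from the sum of target-lesion diameters. It is not the authors' system. The abstract does not state which RECIST version is used, so the thresholds here are assumed to follow RECIST 1.1 as commonly summarized (at least a 30% decrease from baseline for partial response; at least a 20% and 5 mm increase from the smallest sum on study for progression) and should be checked against the published guideline. Non-target lesions, new lesions, and the lymph-node rule are ignored, and the function names are hypothetical.

```python
def classify_response(baseline_sum, nadir_sum, current_sum):
    """Classify target-lesion response from sums of diameters in mm.

    Thresholds are an assumption (RECIST 1.1 as commonly summarized);
    this simplified rule covers target lesions only.
    """
    if current_sum == 0:
        return "CR"  # complete response: all target lesions gone
    increase = current_sum - nadir_sum
    if increase >= 5 and increase >= 0.20 * nadir_sum:
        return "PD"  # progressive disease, measured from the nadir
    if baseline_sum - current_sum >= 0.30 * baseline_sum:
        return "PR"  # partial response, measured from baseline
    return "SD"      # stable disease: neither PR nor PD


def track_response(sums_mm):
    """Label every follow-up visit; sums_mm[0] is the baseline measurement."""
    baseline = sums_mm[0]
    nadir = baseline
    labels = []
    for current in sums_mm[1:]:
        labels.append(classify_response(baseline, nadir, current))
        nadir = min(nadir, current)  # smallest sum seen so far on study
    return labels


print(track_response([100, 60, 50, 65]))
```

Measuring progression from the nadir rather than from baseline is what lets the last visit in the example count as progression even though the sum is still below the baseline value.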
|||
Genetic and genomic analyses of Staphylococcus agnetis,
an agent of Bacterial Chondronecrosis with Osteomyelitis in Broilers
Adnan A. Al-Rubaye, Sura Zaki, Nnamdi Ekesi, Abdulkarim Shwani, Robert F. Wideman, Young Min Kwon and Douglas Rhoads
University of Arkansas, Fayetteville, AR, USA
Lameness is a significant problem in the poultry industry resulting
in millions of dollars in lost revenue annually. In broilers, a common
cause of lameness is bacterial chondronecrosis with osteomyelitis (BCO).
Using a wire flooring model to induce lameness we identified Staphylococcus
agnetis as the principal species isolated from BCO lesions on our research
farm. Administration of S. agnetis isolates in drinking water at 20
days of age can induce a high incidence of BCO in birds on wire flooring.
Our data support a model in which rearing chicks on wire flooring leads
to bacterial translocation across the intestinal epithelium into the
blood. S. agnetis appears to colonize the susceptible proximal femoral
and tibial growth plates inducing necrosis and lameness by 40-56 days.
As this species has not previously been associated with BCO in poultry,
it may have emerged as a result of our protracted experiments inducing
high levels of lameness. We have sequenced, assembled, and annotated
the S. agnetis genome from chicken isolates. Current work is aimed
at understanding the phylogenomic relationships between our poultry
isolates and isolates from other sources. Through genomic analysis
of the bacterium we seek to identify genetic determinants associated
with the transition to a chicken pathogen. Our hypothesis is that defining
the likely route of transmission to broilers, together with genomic analyses,
will contribute substantially to the development of measures for mitigating
BCO losses in poultry.
Keywords: broiler; lameness; bacteria; pathogen; genome; leg |
|||
Liver Toxicity Knowledge Base (LTKB): A comprehensive database to understand
multiple dimensions of Drug-Induced Liver Injury
Shraddha Thakkar, Minjun Chen, Hong Fang, Zhichao Liu, Gerry Zhou, Jie Zhang and Weida Tong
Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR
Drug-induced liver injury (DILI) is one of the leading causes of acute
liver failure, and it is also a cause of the termination of many clinical
trials and the discontinuation of many approved drugs. DILI is therefore
a major concern during both drug development and the review process.
To address this issue, the Liver Toxicity Knowledge Base (LTKB) was
developed to improve our understanding of the mechanisms underlying DILI
and to support its prediction. The knowledge base includes DILI-related
information for ~3000 prescription drugs, such as each drug's
physicochemical properties, dosage, side effects, and therapeutic uses.
In addition, mechanistically relevant cellular endpoints from various
in vitro assays (conventional, high-throughput and high-content assays),
drug-elicited toxicogenomic responses from both primary hepatocytes
and animals, and histopathology data were incorporated in the database.
We also linked the LTKB data to data from the ToxCast (from EPA) and
Tox21 (from NIH, EPA, the National Toxicology Program and FDA) projects.
LTKB drugs were analyzed for their potential to cause DILI and classified
by severity. DILI annotations were also extracted from FDA-approved drug
labels, which provide safety information from clinical trials and
post-marketing surveillance. As a result, 749 drugs were identified with
some level of DILI concern, and 619 drugs were further annotated with
DILI information from the LiverTox database (from NIH). The database was
developed on the Accelrys Isentris 4.0 platform and provides comprehensive
DILI information and findings at various levels of biological complexity
in one location. In summary, this poster presents use cases for extracting
the desired information from the database to build a better
understanding of DILI. The knowledge base can serve as a resource for
improving DILI predictive models for drug discovery and drug safety. |
|||
LCS-Based Protein Structure Prediction
There is an ever-increasing number of proteins with unsolved structures
that are considered to have low homology to structures in the Protein Data Bank.
Even the most accurate template-based protein structure prediction
software shows only marginal performance on them. In this case, the accuracy
of structure prediction is always a function of the interdisciplinary
knowledge shared by the research group. Sequence motifs are short,
recurring patterns in DNA that are conjectured to have biological significance;
these motifs often indicate specific binding sites for proteins such
as nucleases and transcription factors. This tool gathers sequence
segments shared between a target and its respective template(s) using
LCS (Longest Common Substring). The longest common substring is the
longest substring shared between two or more strings. The tool cross-references
these sequence segments against an interactive
database of sequence motifs created as a part of this project from
various public repositories. This allows users to access the templates
with reasonable potential for future modeling. Although proteins are
poly-amino acid sequences, for all practical purposes in this research,
we considered them as character strings. The aim is to identify
sequence motifs shared by the target and its template(s), which is
crucial for constructing better 3D models of proteins. This project
was made possible by the Arkansas INBRE program, supported by grant
funding from the National Institutes of Health (NIH) National Institute
of General Medical Sciences (NIGMS) (P20 GM103429) (formerly P20RR016460). |
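To make the longest-common-substring step concrete, here is a minimal sketch that treats two protein sequences as plain character strings, as the abstract describes. It uses the standard dynamic-programming formulation and is not the project's actual tool. The function name and the example sequences are hypothetical and chosen only for illustration.

```python
def longest_common_substring(a, b):
    """Return the longest contiguous substring shared by strings a and b.

    Row-by-row dynamic programming: curr[j] holds the length of the common
    suffix of a[:i] and b[:j]. If several substrings tie for the longest,
    the first one found in a is returned.
    """
    best_len = 0
    best_end = 0  # end index (exclusive) of the best match in a
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


# Hypothetical target and template sequences (one-letter amino acid codes).
target = "MKTAYIAKQRQISFVK"
template = "GGAYIAKQRWW"
segment = longest_common_substring(target, template)
print(segment)
```

The shared segment returned here is the kind of string that could then be looked up in a motif database. Keeping only two rows of the table holds memory proportional to the length of the shorter dimension rather than the full product of the two lengths.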
|||
Network Analysis Pipelines for Systems Biology
Seed development is an evolutionarily important phase of the plant life
cycle that governs the fate of the next generation. Distinct sub-regions
within seeds have diverse roles, from protecting and nourishing the embryo
as it enlarges to synthesizing storage reserves that serve as an important
source of human nutrition. Previous studies have revealed fine coordination
between transcription factors (TFs) that genetically interact to ensure
proper maintenance and development of the embryo. Here we present the first
genome-wide predictions of regulatory interactions between TFs and
target genes in the context of Arabidopsis seed development. Our gene
regulatory network is based on a panel of high-resolution seed-specific
gene expression data. Querying the network with a list of genes with
evidence of seed-specific activity revealed several transcriptional
regulators that are associated, with varying levels of confidence, with
different developmental programs. We identified functional gene modules
active during embryo, endosperm and seed coat formation, and delineated
the topological architecture of tissue-specific networks that differ from
non-tissue-specific gene interaction networks. Our easily adaptable network
analysis pipeline can be used to discover regulatory programs in other
organisms, including in human disease-specific genomic datasets. |
|||
Alternative Splicing in Environmental Stress Regulated Genes
Alternative splicing (AS) expands the transcriptome of humans and other
metazoans, and its mis-regulation is responsible for many human diseases.
Plants cope with environmental stresses by expanding the transcriptome
through AS, primarily through the mechanism of intron retention. However,
most bioinformatics tools developed to analyze human AS are based on
exon skipping. To analyze AS in plants, new tools have been implemented
specifically for intron retention. To study the response of rice to
environmental stress, the rice cultivar Nipponbare (reference genome) was
subjected to drought and well-watered conditions at the vegetative
flag-leaf (V4) and early reproductive (R3) stages, as well as to high
temperature at the reproductive (R3) and grain-filling (R6) stages. The
response to drought and temperature stress was studied using RNA-Seq,
followed by bioinformatics analyses to identify and quantify differentially
expressed genes and splice junctions with TopHat/Cufflinks and
SpliceGrapher. Predicted AS isoforms were validated experimentally, and
drought-responsive AS transcripts were quantified by qPCR using
isoform-specific primers. The isoform structure of AS genes was
characterized by amplification, cloning and sequencing of individual
isoforms of specific genes of interest, using primers designed across
unique splice junctions that are differentially expressed under stress.
The AS data for regulatory and downstream genes were used to develop
models of the regulation of drought tolerance by AS. |
|||
Best Practice in Mining Meaningful Topics from Regulatory Textual Documents
Probabilistic topic modeling offers a viable approach to structuring huge
textual document collections into latent topic themes to aid text mining.
FDA lore describes drug applications arriving in eighteen wheelers. Today
the agency handles vast digital textual information from submissions
representing some 25% of U.S. GDP, and untold terabytes of information
from post market surveillance. Where experts are too few or slow, the
means to extract information germane to regulatory questions is paramount.
Here we describe extensive sensitivity studies to determine best practices
for generating effective topic models. To test effectiveness and validity
of topic models, we constructed a ground truth data set from PubMed that
contained some 40 health related themes including negative controls,
and mixed it with a data set of unstructured documents. The most useful
models, tuned to desired sensitivity versus specificity, require an iterative
process wherein preprocessing steps, the type of topic modeling algorithm,
and the algorithm’s model parameters are systematically varied.
Models need to be compared with both qualitative, subjective assessments
and quantitative, objective assessments, and care is required that Gibbs
sampling in model estimation is sufficient to assure stable solutions.
With a high-quality model, documents can be rank-ordered according to
their probability of being associated with a complex regulatory query string,
greatly lessening text mining work. Importantly, topic models are agnostic
about how words and documents are defined, and thus our findings are
extensible to topic models where samples are defined as documents, and
genes, proteins or their sequences are words. |
|||
Formula Milk Alters Microbial Diversity and Impacts Immune Response in Porcine Neonatal Model
Formula feeding in infants is associated with increased risk of upper
respiratory tract infections, allergies and gut dysfunction, possibly
by compromising the gut immune system. We hypothesized that formula feeding
alters the host microbiome, resulting in morphological changes and modulation
of the large intestine immune response. This hypothesis was tested in piglets
that were fed a formula diet or were sow-fed from postnatal day (PND) 2 to 21
(n=12/group). Principal component analyses of microbial data indicated clear
separation of the formula-fed group from the control sow-fed group, and
greater microbial diversity (p=0.01) was observed in the formula-fed group.
Formula feeding resulted in a 2- to 8-fold higher abundance of
Bacteroidaceae, Porphyromonadaceae, Rikenellaceae and Odoribacteraceae,
families within the Bacteroidetes phylum, in comparison to the sow-fed group.
However, 5-fold higher Paraprevotellaceae was observed in the sow-fed group
(p<0.01). In Firmicutes, only Streptococcus spp. was 5-fold higher in
formula-fed piglets (p<0.05). Furthermore, 11-fold higher Gammaproteobacteria
and 20-fold higher Verrucomicrobiae were observed in formula-fed piglets
(p<0.01) in comparison to sow-fed piglets, indicating that formula-fed piglets
show more microbial diversity than sow-fed piglets in the distal colon. Cytokine
analyses showed 1.5- to 2.0-fold increases (p<0.05) in gene expression
of BMP4, CCL21, CCL25, CSF3 and VEGF-A and 1.5- to 4-fold decreases (p<0.05)
in CXCL-11 and IL-27 in the proximal colon (PC) and distal colon (DC) of the
formula group, suggesting the participation of specific cytokines in the
alteration of the mucosal barrier and the activation of the gut-associated
immune response. A significant increase in crypt density (p<0.05) was
observed in the PC of the formula-fed group, suggesting alterations
in colon morphology. In summary, formula diet-driven microbiome changes
were accompanied by alterations in colon crypt density and cytokine response.
(USDA-ARS Project 6026-51000-010-05S). |
|||