AURP: An AUV-Aided Underwater Routing Protocol for Underwater Acoustic Sensor Networks
Seokhoon Yoon, Abul K. Azad, Hoon Oh, Sunghwan Kim
Sensors, 2012, DOI: 10.3390/s120201827
Abstract: Deploying a multi-hop underwater acoustic sensor network (UASN) over a large area brings new challenges in reliable data transmission and network survivability, due to the limited underwater communication range/bandwidth and the limited energy of underwater sensor nodes. To address these challenges and achieve the objectives of maximizing the data delivery ratio and minimizing the energy consumption of underwater sensor nodes, this paper proposes a new underwater routing scheme, AURP (AUV-aided underwater routing protocol), which uses not only heterogeneous acoustic communication channels but also the controlled mobility of multiple autonomous underwater vehicles (AUVs). In AURP, the total number of data transmissions is minimized by using AUVs as relay nodes, which collect sensed data from gateway nodes and then forward it to the sink. Moreover, the controlled mobility of the AUVs makes it possible to use a short-range, high-data-rate underwater channel to transmit large amounts of data. To the best of our knowledge, this work is the first attempt to employ multiple AUVs as relay nodes in a multi-hop UASN to improve network performance in terms of data delivery ratio and energy consumption. Simulations incorporating a realistic underwater acoustic communication channel model are carried out to evaluate the performance of the proposed scheme, and the results indicate that a high delivery ratio and low energy consumption can be achieved.
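The abstract does not say which channel model AURP's simulations use. To illustrate why a high-data-rate (high-frequency) acoustic channel is only practical over short ranges, such as an AUV close to a gateway node, the sketch below assumes Thorp's empirical absorption formula plus a practical spreading exponent k = 1.5, a common textbook choice. The function names are hypothetical; this is not the paper's model.

```python
import math


def thorp_absorption_db_per_km(f_khz: float) -> float:
    """Thorp's empirical absorption coefficient in dB/km (frequency in kHz)."""
    f2 = f_khz * f_khz
    return (0.11 * f2 / (1.0 + f2)
            + 44.0 * f2 / (4100.0 + f2)
            + 2.75e-4 * f2
            + 0.003)


def path_loss_db(distance_m: float, f_khz: float, k: float = 1.5) -> float:
    """Transmission loss = spreading loss (re 1 m) + absorption over the path.

    k = 1.5 ("practical spreading") is an assumption, not taken from AURP.
    """
    spreading = k * 10.0 * math.log10(distance_m)
    absorption = (distance_m / 1000.0) * thorp_absorption_db_per_km(f_khz)
    return spreading + absorption


if __name__ == "__main__":
    # A long low-frequency link versus high-frequency links at long and short range.
    for dist, freq in [(2000.0, 10.0), (2000.0, 100.0), (100.0, 100.0)]:
        print(f"{dist:6.0f} m @ {freq:5.1f} kHz: {path_loss_db(dist, freq):6.1f} dB")
```

Under these assumptions absorption is about 1.19 dB/km at 10 kHz but about 34.1 dB/km at 100 kHz, so a 100 kHz link loses roughly 68 dB to absorption alone over 2 km versus about 3.4 dB over 100 m. That gap is what controlled AUV mobility is meant to exploit.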
Abbreviation definition identification based on automatic precision estimates
Sunghwan Sohn, Donald C Comeau, Won Kim, W John Wilbur
BMC Bioinformatics, 2008, DOI: 10.1186/1471-2105-9-402
Abstract: On the Medstract corpus our algorithm produced 97% precision and 85% recall, which is higher than previously reported results. We also annotated 1250 randomly selected MEDLINE records as a gold standard. On this set we achieved 96.5% precision and 83.2% recall. This compares favourably with the well-known Schwartz and Hearst algorithm. We developed an algorithm for abbreviation identification that uses a variety of strategies to identify the most probable definition for an abbreviation and also produces an estimated accuracy of the result. This process is purely automatic. Abbreviations are widely used in biomedical text. The amount of biomedical text is growing faster than ever. In early 2007, MEDLINE included about 17 million references. For common technical terms in biomedical text, people tend to use an abbreviation rather than the full term [1,2]. In this paper we use the term short form (SF) interchangeably with abbreviation, and long form (LF) interchangeably with its definition. Along with the growing volume of biomedical text, the number of resulting SF-LF pairs will also increase. The presence of unrecognized words in text affects information retrieval and information extraction in the biomedical domain [3-5]. This creates a continual need to keep up with new information, such as new SF-LF pairs. A robust method that identifies SFs and their corresponding LFs within the same article can resolve the meaning of an SF used later in the article. In addition, an automatic method enables one to construct an abbreviation and definition database from a large data set. Another challenging issue is how to evaluate the pairs found by an automatic abbreviation identification algorithm, especially when dealing with a large and growing database such as MEDLINE. It is impractical to manually annotate the whole database to evaluate the accuracy of the pairs found by the algorithm.
An automatic way to estimate the accuracy of extracted SF-LF pairs helps to save human labor and to …
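The abstract compares the new algorithm against the Schwartz and Hearst algorithm but does not describe either one. For reference, here is a simplified Python sketch of the Schwartz and Hearst matching step, which is the baseline and not the authors' method. It matches short-form characters right to left inside the words preceding a parenthesis, and the first short-form character must begin a word. The short-form checks and the candidate window of min(|SF| + 5, 2·|SF|) words follow our reading of that algorithm and are simplified. All function names are hypothetical.

```python
import re


def is_valid_short_form(sf: str) -> bool:
    """Simplified SF checks: 2-10 chars, starts alphanumeric, contains a letter."""
    return (2 <= len(sf) <= 10
            and sf[0].isalnum()
            and any(ch.isalpha() for ch in sf))


def find_best_long_form(sf: str, candidate: str):
    """Match SF characters right-to-left in the candidate text.

    The first SF character must start a word. Returns the long form or None.
    """
    s_idx = len(sf) - 1
    l_idx = len(candidate) - 1
    while s_idx >= 0:
        c = sf[s_idx].lower()
        if not c.isalnum():  # skip punctuation inside the SF, e.g. "IL-2"
            s_idx -= 1
            continue
        while l_idx >= 0 and (
            candidate[l_idx].lower() != c
            or (s_idx == 0 and l_idx > 0 and candidate[l_idx - 1].isalnum())
        ):
            l_idx -= 1
        if l_idx < 0:
            return None
        l_idx -= 1
        s_idx -= 1
    # Extend back to the start of the word holding the first matched character.
    start = candidate.rfind(" ", 0, l_idx + 1) + 1
    return candidate[start:]


def extract_pairs(sentence: str):
    """Find 'long form (SF)' patterns in one sentence; return (SF, LF) pairs."""
    pairs = []
    for m in re.finditer(r"\(([^()]+)\)", sentence):
        sf = m.group(1).strip()
        if not is_valid_short_form(sf):
            continue
        words = sentence[: m.start()].split()
        max_words = min(len(sf) + 5, 2 * len(sf))  # candidate LF window
        candidate = " ".join(words[-max_words:])
        lf = find_best_long_form(sf, candidate)
        if lf is not None and len(lf) > len(sf):
            pairs.append((sf, lf))
    return pairs
```

For example, `extract_pairs("We measured heat shock protein (HSP) levels.")` yields `[("HSP", "heat shock protein")]`. The paper's contribution of attaching an automatically estimated precision to each pair is not modeled here.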
PubChem3D: Similar conformers
Evan E Bolton, Sunghwan Kim, Stephen H Bryant
Journal of Cheminformatics, 2011, DOI: 10.1186/1758-2946-3-13
Abstract: The first two diverse 3-D conformers of 26.1 million PubChem Compound records were compared to each other using a shape Tanimoto (ST) of 0.8 or greater and a color Tanimoto (CT) of 0.5 or greater, yielding 8.16 billion conformer neighbor pairs and 6.62 billion compound neighbor pairs, with an average of 253 "Similar Conformers" compound neighbors per compound. Comparing the 3-D neighboring relationship with the corresponding 2-D neighboring relationship ("Similar Compounds") for molecules such as caffeine, aspirin, and morphine, one finds unique sets of related chemical structures, providing additional significant biological annotation. The PubChem 3-D neighboring relationship is also shown to be able to group a set of non-steroidal anti-inflammatory drugs (NSAIDs), despite limited PubChem 2-D similarity. In a study of 4,218 chemical structures of biomedical interest, including many known drugs, using more diverse conformers per compound results in more 3-D compound neighbors per compound; however, the compound neighbor lists of the individual conformers also increasingly resemble each other, being 38% identical at three conformers and 68% at ten conformers. Perhaps surprisingly, the average count of conformer neighbors per conformer increases rather slowly as a function of the number of diverse conformers considered, with only a 70% increase for a tenfold growth in conformers per compound (a 68-fold increase in the conformer pairs considered). Neighboring 3-D conformers on this scale, if implemented naively, is an intractable problem for a modest-sized compute cluster. The methodology developed in this work relies on a series of filters that avoid 3-D superposition optimization whenever it can be determined that two conformers cannot possibly be neighbors.
Most filters are based on volume constraints derived from the Tanimoto equation, which exclude incompatible conformers; others consider a preliminary superposition between conformers using reference shapes. The …
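One volume constraint of the kind mentioned above can be derived directly from the Tanimoto equation. ST = V_AB / (V_A + V_B - V_AB) increases with the overlap V_AB, and the overlap can never exceed the smaller volume, so ST <= min(V_A, V_B) / max(V_A, V_B). A pair whose volume ratio falls below the ST threshold can therefore be skipped without any superposition. The sketch below assumes this bound as an illustrative filter. The paper's actual filter series is not reproduced, and the function names and volumes are hypothetical. The neighbor criterion (ST >= 0.8 and CT >= 0.5) is taken from the abstract.

```python
from typing import Dict, List, Tuple


def shape_tanimoto(v_a: float, v_b: float, v_ab: float) -> float:
    """Shape Tanimoto from the two conformer volumes and their overlap volume."""
    return v_ab / (v_a + v_b - v_ab)


def max_possible_shape_tanimoto(v_a: float, v_b: float) -> float:
    """Upper bound on ST, reached when the smaller shape lies entirely inside the larger."""
    return min(v_a, v_b) / max(v_a, v_b)


def is_similar_conformer(st: float, ct: float) -> bool:
    """'Similar Conformers' criterion quoted in the abstract."""
    return st >= 0.8 and ct >= 0.5


def candidate_pairs(volumes: Dict[str, float], st_threshold: float = 0.8) -> List[Tuple[str, str]]:
    """Pairs that survive the volume filter; only these need superposition optimization."""
    ids = sorted(volumes)
    kept = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if max_possible_shape_tanimoto(volumes[a], volumes[b]) >= st_threshold:
                kept.append((a, b))
    return kept
```

With hypothetical volumes `{"c1": 300.0, "c2": 350.0, "c3": 500.0}`, only `("c1", "c2")` survives, because the other two ratios (0.6 and 0.7) cannot reach ST = 0.8.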
PubChem3D: Biologically relevant 3-D similarity
Sunghwan Kim, Evan E Bolton, Stephen H Bryant
Journal of Cheminformatics, 2011, DOI: 10.1186/1758-2946-3-26
Abstract: The similarity value distributions of 269.7 billion unique conformer pairs from 734,486 biologically tested compounds (all-against-all) in PubChem were used to work toward an answer to the question: what is a biologically meaningful 3-D similarity score? The averages and standard deviations for the six similarity measures STST-opt, CTST-opt, ComboTST-opt, STCT-opt, CTCT-opt, and ComboTCT-opt were 0.54 ± 0.10, 0.07 ± 0.05, 0.62 ± 0.13, 0.41 ± 0.11, 0.18 ± 0.06, and 0.59 ± 0.14, respectively. Because this random distribution of biologically tested compounds was constructed using a single theoretical conformer per compound (the "default" conformer provided by PubChem), further study using multiple diverse conformers per compound may be necessary; however, given the breadth of the compound set, the single-conformer-per-compound results may still apply to multi-conformer-per-compound 3-D similarity value distributions. As such, this work is a critical step, covering a very wide corpus of chemical structures and biological assays and creating a statistical framework to build upon. The second part of this study explored whether it is possible to realize a statistically meaningful 3-D similarity value separation between reputed biological assay "inactives" and "actives". Using the terminology of noninactive-noninactive (NN) pairs and noninactive-inactive (NI) pairs to represent comparisons within the "active/active" and "active/inactive" spaces, respectively, each of the 1,389 biological assays was examined for 3-D similarity score differences between its NN and NI pairs, and these differences were analyzed across all assays and by assay category type. While a consistent trend of separation was observed, this result was not statistically unambiguous once the respective standard deviations were taken into account.
While not all "actives" in a biological assay are amenable to this type of analysis, e.g., due to different mechanisms of action or binding …
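The per-assay NN/NI comparison described above can be sketched as follows. NN scores come from all unordered pairs of non-inactive compounds, and NI scores come from every non-inactive/inactive combination. Each set is summarized as mean ± standard deviation, the form in which the abstract reports its statistics. The similarity function is assumed to return a precomputed 3-D score such as STST-opt, and the names and toy scores are hypothetical.

```python
from itertools import combinations, product
from statistics import mean, stdev
from typing import Callable, List, Tuple


def nn_ni_scores(
    non_inactives: List[str],
    inactives: List[str],
    sim: Callable[[str, str], float],
) -> Tuple[List[float], List[float]]:
    """Collect similarity scores for the NN and NI pairs of one assay."""
    nn = [sim(a, b) for a, b in combinations(non_inactives, 2)]
    ni = [sim(a, b) for a, b in product(non_inactives, inactives)]
    return nn, ni


def summarize(scores: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (reported as mean ± SD)."""
    return mean(scores), stdev(scores)


if __name__ == "__main__":
    # Hypothetical precomputed scores for a toy assay (not real data).
    table = {
        frozenset({"A1", "A2"}): 0.7, frozenset({"A1", "A3"}): 0.6,
        frozenset({"A2", "A3"}): 0.8, frozenset({"A1", "I1"}): 0.5,
        frozenset({"A2", "I1"}): 0.4, frozenset({"A3", "I1"}): 0.6,
    }
    nn, ni = nn_ni_scores(["A1", "A2", "A3"], ["I1"], lambda a, b: table[frozenset({a, b})])
    print("NN: %.2f ± %.2f" % summarize(nn), "NI: %.2f ± %.2f" % summarize(ni))
```

A separation is only convincing when the difference between the NN and NI means is large relative to their standard deviations. The abstract reports that this condition was not met unambiguously.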
PubChem3D: Diversity of shape
Evan E Bolton, Sunghwan Kim, Stephen H Bryant
Journal of Cheminformatics, 2011, DOI: 10.1186/1758-2946-3-9
Abstract: The diversity of shape space was investigated by determining the shape similarity threshold that maximizes the count of reference shapes per unit of conformer volume. The rate of growth in shape space, as represented by a decreasing shape similarity threshold, was found to be remarkably smooth as a function of volume. There was no apparent correlation between the count of conformers per unit volume and their diversity, meaning that a single reference shape can describe the shape space of many chemical structures. The ability of a volume to describe the shape space of lesser volumes was also examined. A given volume was shown to describe 40-70% of the shape diversity of lesser volumes for the majority of the volume range considered in this study. The relative growth of shape diversity as a function of volume and shape similarity is surprisingly uniform. Given the distribution of chemicals in PubChem relative to what is theoretically possible synthetically, the results of this analysis should be considered a conservative estimate of the true diversity of shape space. Virtual screening of large chemical databases is now a routine practice in modern drug discovery [1-8]. One successful virtual screening approach is to compare the 3-D shape similarity of chemical structures using atom-centered Gaussian functions [9-11], e.g., as implemented in ROCS [12]. While this Gaussian-based approach can perform hundreds or even thousands of 3-D shape superposition computations per second per Central Processing Unit (CPU) core, even faster approaches with similar efficacy would be welcome when searching a database of millions of chemical structures and (potentially) billions of conformers. Attempts [13,14] have been made to use ROCS to identify reference shapes, which are then used to compute 3-D shape similarities at dramatically enhanced rates.
One approach [13] created a binary "shape fingerprint" used much like traditional 2-D …
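The abstract refers to shape comparison with atom-centered Gaussian functions but does not give the internals of ROCS. The sketch below shows the textbook first-order form of that idea. Each atom is modeled as a Gaussian p·exp(-α|r - R|²), with p = 2√2 and α chosen so that an isolated atom's self-overlap equals its hard-sphere volume. Overlap volumes are summed over atom pairs, and ST = V_AB / (V_AA + V_BB - V_AB) is evaluated at a fixed alignment, with no superposition optimization. Coordinates, radii, and function names are hypothetical, and ROCS's actual implementation may differ.

```python
import math
from typing import List, Tuple

Atom = Tuple[float, float, float, float]  # (x, y, z, radius in angstroms)

P = 2.0 * math.sqrt(2.0)  # conventional Gaussian amplitude (assumption)


def alpha(radius: float) -> float:
    """Exponent that makes a lone atom's self-overlap equal 4/3 * pi * r^3."""
    return math.pi * (3.0 * P / (4.0 * math.pi * radius ** 3)) ** (2.0 / 3.0)


def pair_overlap(a: Atom, b: Atom) -> float:
    """Analytic overlap integral of two spherical Gaussians."""
    ai, aj = alpha(a[3]), alpha(b[3])
    d2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2
    return P * P * math.exp(-ai * aj * d2 / (ai + aj)) * (math.pi / (ai + aj)) ** 1.5


def overlap_volume(mol_a: List[Atom], mol_b: List[Atom]) -> float:
    """First-order (atom-pair) approximation of the Gaussian overlap volume."""
    return sum(pair_overlap(a, b) for a in mol_a for b in mol_b)


def shape_tanimoto(mol_a: List[Atom], mol_b: List[Atom]) -> float:
    """ST at the current alignment; a real search maximizes this over rotations/translations."""
    v_ab = overlap_volume(mol_a, mol_b)
    return v_ab / (overlap_volume(mol_a, mol_a) + overlap_volume(mol_b, mol_b) - v_ab)
```

Every ST evaluation costs O(N_A·N_B) Gaussian products, and a superposition search repeats it many times. That cost is why precomputed reference shapes, as in [13,14], are attractive at PubChem scale.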
PubChem3D: Conformer generation
Evan E Bolton, Sunghwan Kim, Stephen H Bryant
Journal of Cheminformatics, 2011, DOI: 10.1186/1758-2946-3-4
Abstract: Using the software package OMEGA from OpenEye Scientific Software, Inc., theoretical 3-D conformer models were generated for 25,972 small-molecule ligands whose 3-D structures were experimentally determined. Different values for the primary conformer generation parameters were systematically tested to find optimal settings. Employing a greater fragment sampling rate than the default did not improve the accuracy of the theoretical conformer model ensembles. An ever-increasing energy window did increase the overall average accuracy, with rapid convergence observed at 10 kcal/mol and 15 kcal/mol for model building and torsion search, respectively; however, a subsequent study showed that an energy threshold of 25 kcal/mol for torsion search gave slightly improved results for larger and more flexible structures. Excluding Coulomb terms from the 94s variant of the Merck molecular force field (MMFF94s) in the torsion search stage gave more accurate conformer models at lower energy windows. The overall average accuracy in reproducing bioactive conformations was remarkably linear with respect to both the non-hydrogen atom count ("size") and the effective rotor count ("flexibility"). Using these as independent variables, a regression equation was developed to predict how accurately, in terms of RMSD, a theoretical ensemble reproduces bioactive conformations. The equation was modified to give a minimum RMSD conformer sampling value, to help ensure that 90% of the sampled theoretical models contain at least one conformer within the RMSD sampling value of a "bioactive" conformation. Optimal parameters for conformer generation using OMEGA were explored and determined. An equation was developed that provides an RMSD sampling value based on the relative accuracy with which bioactive conformations are reproduced.
The optimal conformer generation parameters and RMSD sampling values determined here are used by the PubChem3D project to generate theoretical conformer models. PubChem [1-4] is an open …
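The abstract describes a linear regression of RMSD accuracy on size and flexibility, later shifted so that about 90% of models fall within the predicted value. A minimal sketch of that idea follows. It fits rmsd ≈ b0 + b1·size + b2·flex by ordinary least squares using the normal equations. It then raises the intercept by the 90th-percentile residual, which is one simple way to obtain such a coverage guarantee and an assumption here, not the published procedure. The function names are hypothetical, and PubChem3D's actual coefficients are not reproduced.

```python
import math
from typing import List, Sequence


def solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_rmsd_model(size: Sequence[float], flex: Sequence[float], rmsd: Sequence[float]) -> List[float]:
    """Least-squares fit of rmsd ~ b0 + b1*size + b2*flex (normal equations)."""
    rows = [(1.0, float(s), float(f)) for s, f in zip(size, flex)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, rmsd)) for i in range(3)]
    return solve3(xtx, xty)


def coverage_offset(coef, size, flex, rmsd, coverage: float = 0.9) -> float:
    """Intercept shift so that `coverage` of observed RMSDs lie at or below the fitted line."""
    residuals = sorted(y - (coef[0] + coef[1] * s + coef[2] * f)
                       for s, f, y in zip(size, flex, rmsd))
    k = min(len(residuals) - 1, max(0, math.ceil(coverage * len(residuals)) - 1))
    return residuals[k]


def rmsd_sampling_value(coef, offset: float, size: float, flex: float) -> float:
    """Predicted RMSD sampling value for a molecule of the given size and flexibility."""
    return coef[0] + offset + coef[1] * size + coef[2] * flex
```

Here "size" stands for the non-hydrogen atom count and "flex" for the effective rotor count. Any data fed to the fit would come from ensembles like the 25,972 experimentally determined ligands described above.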
PubChem3D: Shape compatibility filtering using molecular shape quadrupoles
Sunghwan Kim, Evan E Bolton, Stephen H Bryant
Journal of Cheminformatics, 2011, DOI: 10.1186/1758-2946-3-25
Abstract: Using a basis set of 4.18 billion 3-D neighbor pairs identified from single-conformer-per-compound neighboring of 17.1 million molecules, shape descriptors were computed for all conformers. These steric shape descriptors included several forms of molecular volume and shape quadrupoles, which essentially embody the length, width, and height of a conformer. For a given 3-D neighbor conformer pair, the volume and each quadrupole component (Qx, Qy, and Qz) were binned and their frequency of occurrence was examined. For each molecular volume type, this effectively produced three different maps, one per quadrupole component (Qx, Qy, and Qz), of the values allowed under the similarity criterion shape Tanimoto (ST) ≥ 0.8. The efficiency of these relationships (in terms of true positives, true negatives, false positives, and false negatives) as a function of ST threshold was determined in a test run of 13.2 billion conformer pairs not previously considered in the 3-D neighbor set. At ST ≥ 0.8, a filtering efficiency of 40.4% of true negatives was achieved with only 32 false negatives out of 24 million true positives when the separate Qx, Qy, and Qz maps were applied in series (Qxyz). This efficiency increased linearly as a function of ST threshold in the range 0.8-0.99. The Qx filter was consistently the most efficient, followed by Qy and then Qz.
Use of a monopole volume showed the best overall performance, followed by the self-overlap volume and then the analytic volume. Applying the monopole-based Qxyz filter in a "real world" test of 3-D neighboring of 4,218 chemicals of biomedical interest against 26.1 million molecules in PubChem reduced the total CPU cost of neighboring by 24-38% and, if used as the initial filter, removed 48.3% of all conformer pairs from consideration at almost negligible computational overhead. Basic shape descriptors, such as those embodied by size, length, width, and height, can be highly effective in identifying shape-incompatible compound …
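The abstract says the binned volume and quadrupole values of known neighbor pairs define maps of allowed combinations, and that the Qx, Qy, and Qz maps are applied in series (Qxyz). It does not specify the map layout. The sketch below assumes one plausible layout, keyed by (query volume bin, query Q bin, candidate Q bin) for each component. The bin widths and names are hypothetical, and computing the quadrupoles themselves is omitted. A pair is rejected as soon as one component produces a combination never seen among neighbors.

```python
from typing import Dict, Iterable, Set, Tuple

Shape = Tuple[float, float, float, float]  # (volume, Qx, Qy, Qz) of one conformer

VOLUME_BIN = 10.0  # hypothetical bin widths
Q_BIN = 0.5


def vbin(v: float) -> int:
    return int(v // VOLUME_BIN)


def qbin(q: float) -> int:
    return int(q // Q_BIN)


def build_maps(neighbor_pairs: Iterable[Tuple[Shape, Shape]]) -> Dict[int, Set[Tuple[int, int, int]]]:
    """Record the bin combinations observed among known neighbor pairs.

    Keys 1, 2, 3 index the Qx, Qy, Qz components of a Shape tuple.
    """
    maps: Dict[int, Set[Tuple[int, int, int]]] = {1: set(), 2: set(), 3: set()}
    for a, b in neighbor_pairs:
        for comp in (1, 2, 3):
            # Store both orientations so that either conformer can be the query.
            maps[comp].add((vbin(a[0]), qbin(a[comp]), qbin(b[comp])))
            maps[comp].add((vbin(b[0]), qbin(b[comp]), qbin(a[comp])))
    return maps


def passes_qxyz(maps: Dict[int, Set[Tuple[int, int, int]]], a: Shape, b: Shape) -> bool:
    """Apply the Qx, Qy, Qz maps in series; reject at the first unseen combination."""
    for comp in (1, 2, 3):
        if (vbin(a[0]), qbin(a[comp]), qbin(b[comp])) not in maps[comp]:
            return False
    return True
```

Checking Qx first mirrors the abstract's observation that Qx was the most efficient filter, so most incompatible pairs are rejected by the first set lookup.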
Effects of multiple conformers per compound upon 3-D similarity search and bioassay data analysis
Sunghwan Kim, Evan E Bolton, Stephen H Bryant
Journal of Cheminformatics, 2012, DOI: 10.1186/1758-2946-4-28
Abstract: Background To improve the utility of PubChem, a public repository containing biological activities of small molecules, the PubChem3D project adds computationally derived three-dimensional (3-D) descriptions to the small-molecule records contained in the PubChem Compound database and provides various search and analysis tools that exploit 3-D molecular similarity. The efficient use of PubChem3D resources therefore requires an understanding of the statistical and biological meaning of computed 3-D molecular similarity scores between molecules. Results The present study investigated the effects of employing multiple conformers per compound upon the 3-D similarity scores between ten thousand randomly selected biologically tested compounds (10-K set) and between non-inactive compounds in a given biological assay (156-K set). When the “best-conformer-pair” approach, in which the 3-D similarity score between two compounds is represented by the greatest similarity score among all possible conformer pairs arising from the compound pair, was employed with ten diverse conformers per compound, the average 3-D similarity scores for the 10-K set increased by 0.11, 0.09, 0.15, 0.16, 0.07, and 0.18 for STST-opt, CTST-opt, ComboTST-opt, STCT-opt, CTCT-opt, and ComboTCT-opt, respectively, relative to the corresponding averages computed using a single conformer per compound. Interestingly, the best-conformer-pair approach also increased the average 3-D similarity scores for the non-inactive–non-inactive (NN) pairs in a given assay by amounts comparable to those for the random compound pairs, although some assays showed a pronounced increase in the per-assay NN-pair 3-D similarity scores compared to the average increase for the random compound pairs.
Conclusion These results suggest that the use of ten diverse conformers per compound in PubChem bioassay data analysis using 3-D molecular similarity is not expected to increase the separation of non-inactive from random and inactive spaces “on average”, although some assays show a noticeable separation between the non-inactive and random spaces when multiple conformers are used for each compound. The present study is a critical next step toward understanding the effects of the conformational diversity of molecules upon 3-D molecular similarity and its application to biological activity data analysis in PubChem. The results of this study may help in building search and analysis tools that more efficiently exploit 3-D molecular similarity between compounds archived in PubChem and other molecular libraries.
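The best-conformer-pair approach defined above reduces to taking a maximum over the Cartesian product of two conformer lists. With ten conformers per compound that product has 100 pairs. In the sketch below, the similarity function stands in for any 3-D score (e.g., STST-opt), and the names and toy inputs are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence, TypeVar

C = TypeVar("C")


def best_conformer_pair_score(
    confs_a: Sequence[C],
    confs_b: Sequence[C],
    sim: Callable[[C, C], float],
) -> float:
    """Compound-level 3-D similarity = maximum score over all conformer pairs."""
    return max(sim(a, b) for a, b in product(confs_a, confs_b))


if __name__ == "__main__":
    # Toy 'conformers' are plain numbers with a hypothetical similarity function.
    toy_sim = lambda a, b: 1.0 / (1.0 + abs(a - b))
    print(best_conformer_pair_score([0.0], [4.0], toy_sim))            # single conformer each
    print(best_conformer_pair_score([0.0, 5.0], [4.0, 9.0], toy_sim))  # two conformers each
```

A maximum taken over a superset of pairs can never be lower than the maximum over a subset. This is consistent with the abstract's finding that adding conformers raised average scores for random pairs as well as NN pairs, which is why it does not by itself improve separation.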
PubChem3D: conformer ensemble accuracy
Sunghwan Kim, Evan E Bolton, Stephen H Bryant
Journal of Cheminformatics, 2013, DOI: 10.1186/1758-2946-5-1
Abstract: Background PubChem is a free and publicly available resource containing substance descriptions and their associated biological activity information. PubChem3D is an extension to PubChem containing computationally derived three-dimensional (3-D) structures of small molecules. All the tools and services that are part of PubChem3D rely upon the quality of the 3-D conformer models. Construction of the conformer models currently available in PubChem3D involves a clustering stage to sample the conformational space spanned by the molecule. While this stage reduces the conformer models to a more manageable size, it may result in a loss of the ability to reproduce experimentally determined “bioactive” conformations, for example, those found for PDB ligands. This study examines the extent of this accuracy loss and considers its effect on the 3-D similarity analysis of molecules. Results Conformer models consisting of up to 100,000 conformers per compound were generated for 47,123 small molecules whose structures were experimentally determined, and the conformers in each conformer model were clustered to reduce the size of the model to a maximum of 500 conformers per molecule. The accuracy of the conformer models before and after clustering was evaluated using five different measures: root-mean-square distance (RMSD), shape-optimized shape-Tanimoto (STST-opt) and combo-Tanimoto (ComboTST-opt), and color-optimized color-Tanimoto (CTCT-opt) and combo-Tanimoto (ComboTCT-opt). On average, clustering decreased the conformer model accuracy, increasing the conformer ensemble’s RMSD to the bioactive conformer (by 0.18 ± 0.12 Å) and decreasing the STST-opt, ComboTST-opt, CTCT-opt, and ComboTCT-opt scores (by 0.04 ± 0.03, 0.16 ± 0.09, 0.09 ± 0.05, and 0.15 ± 0.09, respectively). Conclusion This study shows that the PubChem3D conformer models perform as designed in terms of RMSD accuracy.
In addition, the effect of PubChem3D sampling on 3-D similarity measures shows that there is a linear degradation of average accuracy with respect to molecular size and flexibility. Generally speaking, one can likely expect the worst-case minimum accuracy of 90% or more of the PubChem3D ensembles to be 0.75, 1.09, 0.43, and 1.13, in terms of STST-opt, ComboTST-opt, CTCT-opt, and ComboTCT-opt, respectively. This expected accuracy improves linearly as the molecule becomes smaller or less flexible.
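The accuracy measure and the clustering step described above can be illustrated with a small sketch. Ensemble accuracy is taken as the RMSD of the ensemble's closest conformer to the bioactive pose. Clustering is stood in for by greedy leader clustering with an RMSD threshold and a size cap (500 in the abstract). PubChem3D's actual clustering procedure is not given here, and the RMSD below assumes conformers are already superposed (a real evaluation would align them first, e.g., with the Kabsch algorithm). All names and coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coords = Sequence[Tuple[float, float, float]]


def rmsd(a: Coords, b: Coords) -> float:
    """Coordinate RMSD with matched atom order; assumes pre-aligned conformers."""
    sq = sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2
             for p, q in zip(a, b))
    return math.sqrt(sq / len(a))


def ensemble_accuracy(ensemble: List[Coords], bioactive: Coords) -> float:
    """Accuracy of a conformer model = RMSD of its closest conformer to the bioactive pose."""
    return min(rmsd(c, bioactive) for c in ensemble)


def leader_cluster(ensemble: List[Coords], threshold: float, max_size: int = 500) -> List[Coords]:
    """Greedy leader clustering: keep a conformer only if it is farther than
    `threshold` from every representative kept so far, up to `max_size` representatives."""
    reps: List[Coords] = []
    for conf in ensemble:
        if len(reps) >= max_size:
            break
        if all(rmsd(conf, r) > threshold for r in reps):
            reps.append(conf)
    return reps
```

In this sketch the clustered model is a subset of the full ensemble, so its best RMSD to the bioactive pose can only stay the same or get worse. This is the kind of accuracy loss the study quantifies.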
An efficient finite element method applied to quantum billiard systems
Woo-Sik Son,Sunghwan Rim,Chil-Min Kim
Physics, 2009
Abstract: An efficient finite element method (FEM) for calculating eigenvalues and eigenfunctions of quantum billiard systems is presented. We consider an FEM based on triangular elements with $C_1$-continuous quartic interpolation. Various shapes of quantum billiards, including the integrable unit circle, are treated. The numerical results show that the method provides accurate eigenvalues for more than a thousand levels for any shape of quantum billiard on a personal computer. Comparison with results from the FEM based on the well-known $C_0$-continuous quadratic interpolation demonstrates the efficiency of the method.
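The integrable unit circle is useful as a test case because its spectrum is known exactly, so FEM eigenvalues can be checked level by level. Assuming Dirichlet boundary conditions and units with ħ²/2m = 1, the levels are E = j_{m,n}², the squared positive zeros of the Bessel functions J_m, and every level with m > 0 is doubly degenerate. The sketch below generates these reference values using only the standard library. It is a benchmark generator, not the authors' $C_1$ quartic FEM, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def bessel_j(m: int, x: float, terms: int = 40) -> float:
    """J_m(x) from its power series; adequate for the range 0 < x <= 12 used here."""
    total = 0.0
    for k in range(terms):
        sign = -1.0 if k % 2 else 1.0
        total += sign * (x / 2.0) ** (2 * k + m) / (math.factorial(k) * math.factorial(k + m))
    return total


def bessel_zeros(m: int, x_max: float = 12.0, step: float = 0.05) -> List[float]:
    """Positive zeros of J_m below x_max, found by sign changes plus bisection."""
    zeros = []
    x, f_prev = step, bessel_j(m, step)
    while x + step <= x_max:
        x_next = x + step
        f_next = bessel_j(m, x_next)
        if f_prev * f_next < 0.0:
            lo, hi, f_lo = x, x_next, f_prev
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                f_mid = bessel_j(m, mid)
                if f_lo * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            zeros.append(0.5 * (lo + hi))
        x, f_prev = x_next, f_next
    return zeros


def circle_levels(n_levels: int) -> List[Tuple[float, int, int]]:
    """Lowest Dirichlet levels (E, m, n) of the unit-circle billiard, with multiplicity.

    The list is complete for E < 144 (x_max = 12) because the first zero of J_m exceeds m.
    """
    levels = []
    for m in range(12):
        for n, j in enumerate(bessel_zeros(m), start=1):
            levels.extend([(j * j, m, n)] * (1 if m == 0 else 2))
    levels.sort()
    return levels[:n_levels]
```

The first three levels come out as E ≈ 5.7832 (m = 0) followed by the degenerate pair E ≈ 14.6820 (m = 1). An FEM solver's lowest eigenvalues on a circular mesh can be compared against these values, and the relative error indicates its accuracy.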
