Abstract:
We investigate the time evolution of the power spectrum of density fluctuations long after the first appearance of caustics in an expanding one-dimensional universe. When the initial power spectrum is scale-free with power index $n$, the time evolution of the power spectrum becomes self-similar. The power spectrum can be separated roughly into three regimes according to its shape: the linear regime ($k < k_{nl}$: regime 1), the single-caustic regime ($k_{nl} < k < k_{snl}$: regime 2), and the multi-caustics regime ($k > k_{snl}$: regime 3). The power index in these regimes is $n$, $-1$, and $\mu$, respectively, where $\mu$ depends on $n$. For an initial power-law spectrum with a cutoff scale, self-similar evolution after the appearance of caustics might also seem possible; we find, however, that self-similarity is not achieved in this case. The shape of the power spectrum on scales smaller than the cutoff scale can be separated roughly into two regimes: the virialized regime ($k_{cut} < k < k_{cs}$: regime 4) and the smallest-single-caustic regime ($k > k_{cs}$: regime 5). In regime 4 the power index is $\nu$, which may be determined by the distribution of singular points; in regime 5 the power index is $-1$. Moreover, we show the general properties of the shape of a power spectrum for a general initial condition.

Abstract:
We investigate whether non-linear effects on the large-scale power spectrum of dark matter, namely the increase in small-scale power and the smearing of baryon acoustic oscillations, can be decreased by a log-transformation or emulated by an exponential transformation of the linear spectrum. To that end, we present a formalism to convert the power spectrum of a log-normal field to the power spectrum of the logarithmic Gaussian field and vice versa. All ingredients of our derivation can already be found in various publications in cosmology and other fields. We follow a more pedagogical approach, providing a detailed derivation, application examples, and a discussion of implementation subtleties in one text. We use the formalism to show that the non-linear increase in small-scale power in the matter power spectrum is significantly smaller for the log-transformed spectrum, which fits the linear spectrum (with less than 20% error) for redshifts down to 1 and $k\leq1.0\,h\,\mathrm{Mpc}^{-1}$. For lower redshifts the fit to the linear spectrum is not as good, but the reduction of non-linear effects is still significant. Similarly, we show that applying the linear growth factor to the logarithmic density leads to an automatic increase in small-scale power for low redshifts, fitting third-order perturbation spectra and Cosmic Emulator spectra with an error of less than $20\%$. Smearing of baryon acoustic oscillations is at least three times weaker, but still present.
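The conversion between a log-normal field and its underlying Gaussian field is simplest to state in configuration space. As a minimal sketch, assume the standard log-normal model $\delta = \exp(G - \sigma^2/2) - 1$, where $G$ is a zero-mean Gaussian field with variance $\sigma^2$; the two correlation functions are then related by $\xi_\delta(r) = \exp(\xi_G(r)) - 1$. The function names below are illustrative and not taken from the paper, and the Fourier transforms that connect correlation functions to power spectra are omitted.

```python
import math


def gaussian_to_lognormal_xi(xi_gauss):
    """Map the correlation function of the Gaussian field G to that of
    the log-normal overdensity: xi_delta = exp(xi_G) - 1."""
    return [math.expm1(x) for x in xi_gauss]


def lognormal_to_gaussian_xi(xi_lognormal):
    """Inverse map: xi_G = ln(1 + xi_delta).

    Only defined for xi_delta > -1, which any log-normal field satisfies.
    """
    result = []
    for x in xi_lognormal:
        if x <= -1.0:
            raise ValueError("log-normal correlation must be greater than -1")
        result.append(math.log1p(x))
    return result


# Illustrative values only (not data from the paper).
xi_g = [0.5, 0.1, 0.01, 0.0]
xi_delta = gaussian_to_lognormal_xi(xi_g)
xi_back = lognormal_to_gaussian_xi(xi_delta)
```

Because the map is applied point by point in $r$, it is non-linear in Fourier space: small-scale power in the log-normal field is boosted relative to the Gaussian field, which is the effect the abstract exploits.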

Abstract:
In large-scale text classification, labeling a large number of documents for training places a considerable burden on human experts, who need to read each document and assign it to appropriate categories. With this problem in mind, our goal was to develop a text categorization system that needs fewer labeled training examples to achieve a given level of performance, using a similarity-based learning algorithm and thresholding strategies. Experimental results show that the proposed model is useful for building document categorization systems. The system has been designed as a small-scale implementation, given the size of the corpus used. It can be extended to larger data sets, and its efficiency can then be compared against the performance of currently available methods such as SVM and naive Bayes. On the whole, this approach concentrates on categorizing small-scale document collections and completes the assigned task.

Abstract:
This paper implements a text categorization system based on the Vector Space Model (VSM) and Naive Bayes (NB). When estimating the category, the authors improve the accuracy of the parent-category estimate using a correction from the sub-category, and judge whether a document has multiple classifications and multiple labels by estimating the difference between the classifier's final values. The experiment shows that VSM is better than NB in text representation: MicroF1 increases by 25.2 percent for the parent category and by 26.3 percent for the sub-category.

Abstract:
This paper proposes a new indicator of text structure, called the lexical cohesion profile (LCP), which locates segment boundaries in a text. A text segment is a coherent scene; the words in a segment are linked together via lexical cohesion relations. LCP records mutual similarity of words in a sequence of text. The similarity of words, which represents their cohesiveness, is computed using a semantic network. Comparison with the text segments marked by a number of subjects shows that LCP closely correlates with the human judgments. LCP may provide valuable information for resolving anaphora and ellipsis.
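The idea of a cohesion profile can be sketched as a sliding window over the word sequence, with low points in the profile suggesting segment boundaries. The sketch below is only an illustration under stated assumptions: the semantic network used in the paper is replaced by a hypothetical lookup table `related` of word-pair similarities, and the window score is taken to be the mean pairwise similarity, which is not necessarily the paper's formula.

```python
from itertools import combinations


def word_similarity(a, b, related):
    """Toy stand-in for a semantic-network similarity (assumption)."""
    if a == b:
        return 1.0
    return related.get(frozenset((a, b)), 0.0)


def cohesion_profile(words, window, related):
    """Mean pairwise similarity inside each sliding window (window >= 2)."""
    profile = []
    for start in range(len(words) - window + 1):
        pairs = list(combinations(words[start:start + window], 2))
        total = sum(word_similarity(a, b, related) for a, b in pairs)
        profile.append(total / len(pairs))
    return profile


def local_minima(profile):
    """Indices where cohesion dips; these are candidate segment boundaries."""
    return [
        i for i in range(1, len(profile) - 1)
        if profile[i] < profile[i - 1] and profile[i] <= profile[i + 1]
    ]


# Hypothetical similarity table and word sequence, for illustration only.
related = {
    frozenset(("cat", "dog")): 0.8,
    frozenset(("car", "road")): 0.8,
}
words = ["cat", "dog", "cat", "dog", "car", "road", "car", "road"]
profile = cohesion_profile(words, window=4, related=related)
boundaries = local_minima(profile)
```

In this toy sequence the profile dips where the window straddles the change of topic from animals to traffic, which is the behavior a boundary detector based on lexical cohesion relies on.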

Abstract:
Computational methods have been used to find duplicate biomedical publications in MEDLINE. Full text articles are becoming increasingly available, yet the similarities among them have not been systematically studied. Here, we quantitatively investigated the full text similarity of biomedical publications in PubMed Central.

Abstract:
As the Internet helps us cross cultural borders by providing access to diverse information, plagiarism is bound to arise. As a result, plagiarism detection is increasingly in demand. Different plagiarism detection tools have been developed based on various detection techniques, and fingerprint matching now plays an important role in them. However, when handling articles with large content, fingerprint matching has weaknesses, especially in space and time consumption. In this paper, we propose a new approach to plagiarism detection that integrates fingerprint matching with four key features to assist in the detection process. These features are able to select the main points or key sentences in the articles to be compared. The selected sentences then undergo the fingerprint matching process in order to detect the similarity between them. Hence, the time and space used for the comparison are reduced without affecting the effectiveness of the plagiarism detection.
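A minimal sketch of fingerprint matching between two sentences is given below. It assumes word-level k-grams hashed with SHA-1 and a Jaccard overlap of the resulting fingerprint sets; these choices, the function names, and the default `k=3` are illustrative assumptions, and the four sentence-selection features from the paper are not reproduced here.

```python
import hashlib


def fingerprints(sentence, k=3):
    """Hash every run of k consecutive words into a fingerprint set."""
    words = sentence.lower().split()
    grams = (" ".join(words[i:i + k]) for i in range(len(words) - k + 1))
    return {hashlib.sha1(g.encode("utf-8")).hexdigest() for g in grams}


def fingerprint_similarity(a, b, k=3):
    """Jaccard overlap of the two fingerprint sets, in [0, 1]."""
    fa, fb = fingerprints(a, k), fingerprints(b, k)
    if not fa or not fb:
        # A sentence shorter than k words yields no fingerprints.
        return 0.0
    return len(fa & fb) / len(fa | fb)


# Illustrative sentences only.
score = fingerprint_similarity(
    "the quick brown fox jumps",
    "the quick brown fox sleeps",
)
```

Comparing only a few selected sentences per article, rather than every k-gram of the full text, is what keeps the fingerprint sets small and the comparison cheap in time and space.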

Abstract:
In the current era of rapidly advancing technology, efficient and effective text document classification is becoming a challenging and highly sought-after capability for categorizing text documents into mutually exclusive categories. Fuzzy similarity provides a way to find the similarity of features among various documents. In this paper, a technical review of various fuzzy similarity based models is given. These models are discussed and compared to outline their use and necessity. A tour of different methodologies based upon fuzzy similarity is provided. It shows how text and web documents are categorized efficiently into different categories. Various experimental results of these models are also discussed. The technical comparisons among the models' parameters are shown in the form of a 3-D chart. This study and technical review provide a strong base of research work done on fuzzy similarity based text document categorization.

Abstract:
In text mining, popular methods use bag-of-words models, which represent a document as a vector. These methods ignore word sequence information, and good clustering results are limited to some special domains. This paper proposes a new similarity measure based on the suffix tree model of text documents. It analyzes the word sequence information and then computes the similarity between the text documents of a corpus by applying a suffix tree similarity combined with the TF-IDF weighting method. Experimental results on the standard document benchmark corpora Reuters and BBC indicate that the new text similarity measure is effective. Compared with the results of two other frequent word sequence based methods, our proposed method achieves an improvement of about 15% in average F-Measure score.
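The sketch below illustrates how word sequence information can enter a TF-IDF similarity. It is a simplification and not a suffix tree implementation: every contiguous word sequence of up to `max_len` words is enumerated directly and stands in for the phrases that suffix tree nodes would represent. The function names, the length limit, and the smoothed weight `log(1 + N/df)` are assumptions made for illustration.

```python
import math
from collections import Counter


def phrases(text, max_len=3):
    """All contiguous word sequences of length 1..max_len."""
    words = text.lower().split()
    return [
        tuple(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    ]


def tfidf_vectors(docs, max_len=3):
    """One sparse phrase -> weight dictionary per document."""
    counts = [Counter(phrases(d, max_len)) for d in docs]
    # Document frequency: in how many documents each phrase occurs.
    df = Counter(p for c in counts for p in c)
    n_docs = len(docs)
    return [
        {p: tf * math.log(1 + n_docs / df[p]) for p, tf in c.items()}
        for c in counts
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dictionaries."""
    dot = sum(w * v.get(p, 0.0) for p, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Illustrative documents only.
docs = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "stock markets fell sharply today",
]
vectors = tfidf_vectors(docs)
sim_close = cosine(vectors[0], vectors[1])
sim_far = cosine(vectors[0], vectors[2])
```

Unlike a plain bag-of-words vector, shared multi-word phrases such as "the cat sat" contribute their own dimensions, so documents that share word order score higher than documents that merely share vocabulary.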