Abstract:
The study of open-end fund is conducted in this paper in terms of the theory of Random process and the theory of Sequential Decision, which based on the benefit of investors and the cost of transaction (commission occurred in the transaction). In addition the thesis introduces the method of factor of random discounting, by which investors can choose the optimal way of investment, which is calculated in an analogue case.

Abstract:
AdaBoost is one of the most popular machine-learning algorithms. It is simple to implement and often found very effective by practitioners, while still being mathematically elegant and theoretically sound. AdaBoost's behavior in practice, and in particular the test-error behavior, has puzzled many eminent researchers for over a decade: It seems to defy our general intuition in machine learning regarding the fundamental trade-off between model complexity and generalization performance. In this paper, we establish the convergence of "Optimal AdaBoost," a term coined by Rudin, Daubechies, and Schapire in 2004. We prove the convergence, with the number of rounds, of the classifier itself, its generalization error, and its resulting margins for fixed data sets, under certain reasonable conditions. More generally, we prove that the time/per-round average of almost any function of the example weights converges. Our approach is to frame AdaBoost as a dynamical system, to provide sufficient conditions for the existence of an invariant measure, and to employ tools from ergodic theory. Unlike previous work, we do not assume AdaBoost cycles; actually, we present empirical evidence against it on real-world datasets. Our main theoretical results hold under a weaker condition. We show sufficient empirical evidence that Optimal AdaBoost always met the condition on every real-world dataset we tried. Our results formally ground future convergence-rate analyses, and may even provide opportunities for slight algorithmic modifications to optimize the generalization ability of AdaBoost classifiers, thus reducing a practitioner's burden of deciding how long to run the algorithm.

Abstract:
Numerous data mining problems involve an investigation of associations between features in heterogeneous datasets, where different prediction models can be more suitable for different regions. We propose a technique of boosting localized weak learners; rather than having constant weights attached to each learner (as in standard boosting approaches), we allow weights to be functions over the input domain. In order to find out these functions, we recognize local regions having similar characteristics and then build local experts on each of these regions describing the association between the data characteristics and the target value. We performed a comparison with other well known combining methods on standard classification and regression benchmark datasets using decision stump as based learner, and the proposed technique produced the most accurate results.

Abstract:
One of the objectives of designing feature selection learning algorithms is to obtain classifiers that depend on a small number of attributes and have verifiable future performance guarantees. There are few, if any, approaches that successfully address the two goals simultaneously. Performance guarantees become crucial for tasks such as microarray data analysis due to very small sample sizes resulting in limited empirical evaluation. To the best of our knowledge, such algorithms that give theoretical bounds on the future performance have not been proposed so far in the context of the classification of gene expression data. In this work, we investigate the premise of learning a conjunction (or disjunction) of decision stumps in Occam's Razor, Sample Compression, and PAC-Bayes learning settings for identifying a small subset of attributes that can be used to perform reliable classification tasks. We apply the proposed approaches for gene identification from DNA microarray data and compare our results to those of well known successful approaches proposed for the task. We show that our algorithm not only finds hypotheses with much smaller number of genes while giving competitive classification accuracy but also have tight risk guarantees on future performance unlike other approaches. The proposed approaches are general and extensible in terms of both designing novel algorithms and application to other domains.

Abstract:
There is a large literature explaining why AdaBoost is a successful classifier. The literature on AdaBoost focuses on classifier margins and boosting's interpretation as the optimization of an exponential likelihood function. These existing explanations, however, have been pointed out to be incomplete. A random forest is another popular ensemble method for which there is substantially less explanation in the literature. We introduce a novel perspective on AdaBoost and random forests that proposes that the two algorithms work for similar reasons. While both classifiers achieve similar predictive accuracy, random forests cannot be conceived as a direct optimization procedure. Rather, random forests is a self-averaging, interpolating algorithm which creates what we denote as a "spikey-smooth" classifier, and we view AdaBoost in the same light. We conjecture that both AdaBoost and random forests succeed because of this mechanism. We provide a number of examples and some theoretical justification to support this explanation. In the process, we question the conventional wisdom that suggests that boosting algorithms for classification require regularization or early stopping and should be limited to low complexity classes of learners, such as decision stumps. We conclude that boosting should be used like random forests: with large decision trees and without direct regularization or early stopping.

Abstract:
We propose a method and a tool for automatic generation of hardware implementation of a decision rule based on the Adaboost algorithm. We review the principles of the classification method and we evaluate its hardware implementation cost in terms of FPGA's slice, using different weak classifiers based on the general concept of hyperrectangle. The main novelty of our approach is that the tool allows the user to find automatically an appropriate tradeoff between classification performances and hardware implementation cost, and that the generated architecture is optimized for each training process. We present results obtained using Gaussian distributions and examples from UCI databases. Finally, we present an example of industrial application of real-time textured image segmentation.

Abstract:
We reformulate Pratt's tableau decision procedure of checking satisfiability of a set of formulas in PDL. Our formulation is simpler and more direct for implementation. Extending the method we give the first EXPTIME (optimal) tableau decision procedure not based on transformation for checking consistency of an ABox w.r.t. a TBox in PDL (here, PDL is treated as a description logic). We also prove the new result that the data complexity of the instance checking problem in PDL is coNP-complete.

Abstract:
The decision tree is one of the most fundamental programming abstractions. A commonly used type of decision tree is the alphabetic binary tree, which uses (without loss of generality) ``less than'' versus ''greater than or equal to'' tests in order to determine one of $n$ outcome events. The process of finding an optimal alphabetic binary tree for a known probability distribution on outcome events usually has the underlying assumption that the cost (time) per decision is uniform and thus independent of the outcome of the decision. This assumption, however, is incorrect in the case of software to be optimized for a given microprocessor, e.g., in compiling switch statements or in fine-tuning program bottlenecks. The operation of the microprocessor generally means that the cost for the more likely decision outcome can or will be less -- often far less -- than the less likely decision outcome. Here we formulate a variety of $O(n^3)$-time $O(n^2)$-space dynamic programming algorithms to solve such optimal binary decision tree problems, optimizing for the behavior of processors with predictive branch capabilities, both static and dynamic. In the static case, we use existing results to arrive at entropy-based performance bounds. Solutions to this formulation are often faster in practice than ``optimal'' decision trees as formulated in the literature, and, for small problems, are easily worth the extra complexity in finding the better solution. This can be applied in fast implementation of decoding Huffman codes.

Abstract:
The effects of collar diameter and lifting period on shoot biomass production of Teak (Tectona grandis Linn. F) stumps were investigated during the 2006 dry season on the research farm of Kwame Nkrumah University of Science and Technology, Kumasi, Ghana. The lifting period is time when planting of stumps is delayed. A 3x5 split-plot factorial experiment in randomized complete block design was used. The main aim was to determine suitable methods of converting teak seedlings into stumps and evaluate the consequences of delaying planting of teak stumps to enhance high amount of teak shoot production thus address the persistent problem of poor sprouting and low biomass productivity in the early stages of planted teak in large-scale plantation development in Ghana. Collar diameter of stumps, and time as well as the interaction of collar diameter and period of delay of planting stumps and the combined effects of these factors had significant effect on number of shoots per plot. The highest number of teak shoots was obtained from teak stumps of collar diameter of 2.75 cm planted immediately after harvesting. For teak stumps of collar diameter of 2.75 cm if planting is delayed for 4 weeks the number of shoots reduces to 25 % of its value if planted immediately. Teak stumps stored in the open air when planting is delayed beyond 15 weeks may not produce shoots.

Abstract:
In this article, a general problem of sequential statistical inference for general discrete-time stochastic processes is considered. The problem is to minimize an average sample number given that Bayesian risk due to incorrect decision does not exceed some given bound. We characterize the form of optimal sequential stopping rules in this problem. In particular, we have a characterization of the form of optimal sequential decision procedures when the Bayesian risk includes both the loss due to incorrect decision and the cost of observations.