Abstract:
Learning representations for semantic relations is important for various tasks such as analogy detection, relational search, and relation classification. Although there have been several proposals for learning representations for individual words, learning word representations that explicitly capture the semantic relations between words remains underdeveloped. We propose an unsupervised method for learning vector representations for words such that the learnt representations are sensitive to the semantic relations that exist between two words. First, we extract lexical patterns from the co-occurrence contexts of two words in a corpus to represent the semantic relations that exist between those two words. Second, we represent a lexical pattern as the weighted sum of the representations of the words that co-occur with that lexical pattern. Third, we train a binary classifier to detect relationally similar vs. non-similar lexical pattern pairs. The proposed method is unsupervised in the sense that the lexical pattern pairs we use as training data are automatically sampled from a corpus, without requiring any manual intervention. Our proposed method statistically significantly outperforms the current state-of-the-art word representations on three benchmark datasets for proportional analogy detection, demonstrating its ability to accurately capture the semantic relations among words.

Abstract:
Benjamini, Kalai and Schramm (2001) showed that weighted majority functions of $n$ independent unbiased bits are uniformly stable under noise: when each bit is flipped with probability $\epsilon$, the probability $p_\epsilon$ that the weighted majority changes is at most $C\epsilon^{1/4}$. They asked what is the best possible exponent that could replace 1/4. We prove that the answer is 1/2. The upper bound obtained for $p_\epsilon$ is within a factor of $\sqrt{\pi/2}+o(1)$ from the known lower bound when $\epsilon \to 0$ and $n\epsilon\to \infty$.
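
To make the quantity $p_\epsilon$ concrete, the following is a minimal Monte Carlo sketch, not the proof technique of the paper. It draws unbiased $\pm 1$ bits, flips each independently with probability $\epsilon$, and counts how often the sign of the weighted sum changes. The function name `flip_probability`, the trial count, and the convention of treating a zero sum as negative are assumptions made for illustration.

```python
import random


def flip_probability(weights, eps, trials=2000, seed=0):
    """Estimate the probability that the weighted majority changes under noise.

    Hypothetical helper: each bit is +1 or -1 with probability 1/2, and each
    bit is flipped independently with probability eps.
    """
    rng = random.Random(seed)
    changed = 0
    for _ in range(trials):
        # Original unbiased bits.
        x = [rng.choice((-1, 1)) for _ in weights]
        # Noisy copy: flip each bit independently with probability eps.
        y = [-xi if rng.random() < eps else xi for xi in x]
        sx = sum(w * xi for w, xi in zip(weights, x))
        sy = sum(w * yi for w, yi in zip(weights, y))
        # Assumption: a sum of exactly zero is counted as a negative outcome.
        if (sx > 0) != (sy > 0):
            changed += 1
    return changed / trials


if __name__ == "__main__":
    simple_majority = [1.0] * 101  # odd length, so no ties occur
    for eps in (0.01, 0.04, 0.16):
        print(eps, flip_probability(simple_majority, eps))
```

For simple majority with an odd number of bits, the estimates should grow with $\epsilon$, which is in line with the square-root behaviour described in the abstract; the sketch does not verify the constant.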

Abstract:
We revisit the classical decision-theoretic problem of weighted expert voting from a statistical learning perspective. In particular, we examine the consistency (both asymptotic and finitary) of the optimal Nitzan-Paroush weighted majority and related rules. In the case of known expert competence levels, we give sharp error estimates for the optimal rule. When the competence levels are unknown, they must be empirically estimated. We provide frequentist and Bayesian analyses for this situation. Some of our proof techniques are non-standard and may be of independent interest. The bounds we derive are nearly optimal, and several challenging open problems are posed. Experimental results are provided to illustrate the theory.
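
As an illustration of the known-competence case, the sketch below implements the standard form of the Nitzan-Paroush rule: each expert with competence $p_i$ receives the log-odds weight $\log(p_i/(1-p_i))$, and the decision is the sign of the weighted vote sum. It assumes independent experts, votes coded as $\pm 1$, and competences strictly between 0 and 1. The function names and the tie-breaking toward $+1$ are assumptions for illustration, not taken from the paper.

```python
import math


def nitzan_paroush_weights(competences):
    """Log-odds weight for each expert; requires 0 < p < 1 for every p."""
    return [math.log(p / (1.0 - p)) for p in competences]


def weighted_majority(votes, weights):
    """Return +1 or -1 according to the sign of the weighted vote sum.

    Assumption: a total of exactly zero is resolved in favour of +1.
    """
    total = sum(w * v for w, v in zip(weights, votes))
    return 1 if total >= 0 else -1


if __name__ == "__main__":
    competences = [0.9, 0.6, 0.6]
    weights = nitzan_paroush_weights(competences)
    votes = [1, -1, -1]  # the strongest expert disagrees with the other two
    print(weighted_majority(votes, weights))
```

In this example the single expert with competence 0.9 outweighs the two experts with competence 0.6, so the weighted rule returns +1 where a simple majority would return -1.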

Abstract:
Majority-minority relations in Ukraine, as in any other country, are a complex phenomenon. What differentiates the Ukrainian case from many old polities and from some recently established ones is that the identities of both majority and minority groups probably have been settled to a much lesser degree than is usually the case in Europe. The process of defining what it means to be a majority or a minority group in Ukraine goes along with all the other identity-related processes that a newly independent country has to face. The fact that the identity of both majority and minority is still 'in the making' has numerous implications for how the Ukrainian state positions itself with regard to various international standards and mechanisms of minority protection and how international bodies - both intergovernmental and nongovernmental - approach the issue of Ukraine's adherence to these standards and mechanisms.

Abstract:
Majority voting is the most ancient, primitive, divisive and inaccurate measure of collective opinion ever invented. Yet many people believe it to be the very foundation of democracy. The consequences are widespread. Firstly, the outcomes of binary referendums are often held to be “the will of the people”. Secondly, in the wake of general elections, the new intake of elected representatives then forms a majority administration, with some of them having all the power while others have none. And thirdly, in numerous plural societies, ethno-religious minorities/majorities feel justified in resorting to violence against that which they perceive to be majority/minority oppression. Accordingly, this article first compares binary voting with other decision-making voting procedures before then discussing what could be the methodology, the implications and the potential consequences of a more accurate non-majoritarian procedure.

Abstract:
Consider an election between two candidates in which the voters' choices are random and independent and the probability of a voter choosing the first candidate is $p>1/2$. Condorcet's Jury Theorem, which he derived from the weak law of large numbers, asserts that if the number of voters tends to infinity then the probability that the first candidate will be elected tends to one. The notion of influence of a voter, or its voting power, is relevant for extensions of the weak law of large numbers to voting rules which are more general than simple majority. In this paper we point out two different ways to extend the classical notions of voting power and influences to arbitrary probability distributions. The extension relevant to us is the ``effect'' of a voter, which is a weighted version of the correlation between the voter's vote and the election's outcomes. We prove an extension of the weak law of large numbers to weighted majority games when all individual effects are small and show that this result does not apply to any voting rule which is not based on weighted majority.
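
The simple-majority case of the Jury Theorem can be checked numerically. The sketch below computes, for an odd number of voters $n$, the exact binomial probability that more than half of them choose the first candidate. The function name `majority_win_probability` is an assumption for illustration, and the sketch covers simple majority only, not the weighted majority games studied in the paper.

```python
import math


def majority_win_probability(n, p):
    """Probability that more than n/2 of n independent voters pick candidate 1.

    Assumes n is odd (so there are no ties) and that each voter picks
    candidate 1 independently with probability p.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd to avoid ties")
    threshold = n // 2 + 1  # smallest number of votes that is a strict majority
    return sum(
        math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(threshold, n + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 11, 101, 1001):
        print(n, majority_win_probability(n, 0.6))
```

With $p = 0.6$ the printed probabilities increase with $n$ and approach one, as the theorem asserts.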

Abstract:
This work has been motivated by two long-term goals: to understand how humans learn language and to build programs that can understand language. Using a representation that makes the relevant features explicit is a prerequisite for successful learning and understanding. Therefore, I chose to represent relations between individual words explicitly in my model. Lexical attraction is defined as the likelihood of such relations. I introduce a new class of probabilistic language models named lexical attraction models which can represent long-distance relations between words and I formalize this new class of models using information theory. Within the framework of lexical attraction, I developed an unsupervised language acquisition program that learns to identify linguistic relations in a given sentence. The only explicitly represented linguistic knowledge in the program is lexical attraction. There is no initial grammar or lexicon built in and the only input is raw text. Learning and processing are interdigitated. The processor uses the regularities detected by the learner to impose structure on the input. This structure enables the learner to detect higher-level regularities. Using this bootstrapping procedure, the program was trained on 100 million words of Associated Press material and was able to achieve 60% precision and 50% recall in finding relations between content words. Using knowledge of lexical attraction, the program can identify the correct relations in syntactically ambiguous sentences such as ``I saw the Statue of Liberty flying over New York.''

Abstract:
Voluntary control of information processing is crucial to allocate resources and prioritize the processes that are most important under a given situation; the algorithms underlying such control, however, are often not clear. We investigated possible algorithms of control for the performance of the majority function, in which participants searched for and identified one of two alternative categories (left or right pointing arrows) as composing the majority in each stimulus set. We manipulated the amount (set size of 1, 3, and 5) and content (ratio of left and right pointing arrows within a set) of the inputs to test competing hypotheses regarding mental operations for information processing. Using a novel measure based on computational load, we found that reaction time was best predicted by a grouping search algorithm as compared to alternative algorithms (i.e., exhaustive or self-terminating search). The grouping search algorithm involves sampling and resampling of the inputs before a decision is reached. These findings highlight the importance of investigating the implications of voluntary control via algorithms of mental operations.

Abstract:
In this paper, a novel approach for the optimal combination of binary classifiers is proposed. The classifier combination problem is approached from a Game Theory perspective. The proposed framework of adapted weighted majority rules (WMR) is tested against common rank-based, Bayesian and simple majority models, as well as two soft-output averaging rules. Experiments with ensembles of Support Vector Machines (SVM), Ordinary Binary Tree Classifiers (OBTC) and weighted k-nearest-neighbor (w/k-NN) models on benchmark datasets indicate that this new adaptive WMR model, employing local accuracy estimators and the analytically computed optimal weights, outperforms all the other simple combination rules.

Abstract:
This paper explores the kinds of probabilistic relations that are important in syntactic disambiguation. It proposes that two widely used kinds of relations, lexical dependencies and structural relations, have complementary disambiguation capabilities. It presents a new model based on structural relations, the Tree-gram model, and reports experiments showing that structural relations should benefit from enrichment by lexical dependencies.