How to perform research in Hadoop environment not losing mental equilibrium - case study

Conducting research in an efficient, repeatable, evaluable, but also convenient (in terms of development) way has always been a challenge. To satisfy these requirements in the long term while minimizing the costs of the software engineering process, one has to follow a certain set of guidelines. This article describes such guidelines, based on the research environment called Content Analysis System (CoAnSys) created in the Center for Open Science (CeON). It presents best practices and tools for working in the Apache Hadoop environment, as well as the process by which these rules were established.


