Replication techniques have been widely used to improve large-scale content distribution systems. Replicating the source of information across geographically distributed sites brings the requested content closer to the clients, which in turn reduces both access latency and network traffic. Replication also improves system scalability by distributing the load across multiple servers. Clients experience low access latency when a copy of the requested object (e.g., a web page or an image) is located close to them. The effectiveness of replication therefore depends to a large extent on where the replicas are placed. In this paper, we propose a QoS-based overlay network architecture that incorporates an intelligent replica placement algorithm. Its main goal is to improve the network utilization and fault tolerance of the P2P system. In addition to replica placement, the architecture includes a caching technique to reduce search latency. Simulation results show that the proposed architecture attains lower latency and better throughput with reduced bandwidth usage.