Containerised databases, complex application initialisation and use of SSH with containers

Advice sought by a newbie on assembling a complex application using Docker!

We currently have an application which runs on a single Linux instance and comprises the following high-level services:

  • A Node.js server for the frontend, with associated nginx configuration
  • A Ruby on Rails application
  • A Neo4j graph database behind a custom API (JVM)
  • An Elasticsearch document database (JVM)
  • 3 x Postgres databases
  • A Python stack exposing a large set of application tasks
  • A standalone Python Flask web service

Our normal operating model (cloud-based SaaS) is to spin up a Linux instance and then build and configure the application via Puppet. However, we are attempting to sell the application into large enterprises which insist that it be installed on-premises, and which furthermore restrict the flavours/versions of Linux that can be used (our Puppet config currently targets a rather old version of CentOS).

We are therefore investigating a container-based approach (Docker) to delivering this software - the objective being ease and flexibility of installation for these large enterprises. This has, though, raised several questions:

(1) Docker appears to favour stateless services. Many of the services above are databases (the opposite of stateless!), and there appear to be some dissenting arguments about running containerised databases in production. My main concern is the scope for data corruption (e.g. disk flushing, dirty termination of containers). Are these concerns valid?
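For what it's worth, my current (possibly naive) understanding is that keeping the data on a named volume and allowing a generous stop timeout addresses most of the dirty-termination worry. A rough sketch of what I mean (the container/volume names and image tag below are just placeholders):

    # create a named volume so the data lives outside the container's filesystem
    docker volume create pgdata

    # run Postgres with its data directory on that volume, and a generous
    # grace period before Docker escalates SIGTERM to SIGKILL
    docker run -d --name pg-main \
      --stop-timeout 120 \
      -v pgdata:/var/lib/postgresql/data \
      -e POSTGRES_PASSWORD=changeme \
      postgres:9.6

    # a clean shutdown: SIGTERM, then wait up to the stop timeout for
    # Postgres to flush and exit before the container is killed
    docker stop pg-main

Is that a reasonable way to think about it, or am I missing something?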

(2) I am confused about best practice for initialising the application. Beyond component installation, the Puppet config is also responsible for seeding the various databases, which often entails e.g. linking the ID of a row in one database to a newly created row in another. Assuming that the state of the database services lives outside the containers, what is the favoured approach to this initialisation step?
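The only pattern I have come up with so far is a one-shot "seeder" container that talks to the database containers over a shared Docker network, runs our existing seeding logic, and exits. Purely a sketch - the network, image and script names below are made up:

    # run the seeding logic once against the already-running database
    # containers, and remove the seeder container when it exits
    docker run --rm \
      --network app-net \
      -e PG_HOST=pg-main \
      -e NEO4J_HOST=neo4j \
      our-registry/app-seeder:latest \
      python /seed/initialise.py

Is that the idiomatic way to do it, or is there a better-established pattern?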

(3) Our application also requires some bespoke scripting in order to ingest customer-specific data. At the moment we log into the server and build customer-specific scripts (shell and/or Python) which read/transform the data by calling internal APIs or Python tasks. SSHing into a container appears to be a no-no, and in any case the scripts are state and must therefore be injected somehow. It appears that an architecture change is required. Has anybody solved a similar issue?
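The closest I have got to an answer is to keep the customer scripts on the host (or in a customer-specific image) and either bind-mount them into a short-lived task container or run them via docker exec, rather than SSHing in. Again just a sketch, with made-up paths and image/container names:

    # bind-mount the customer's scripts (read-only) into a task container that
    # already has our Python stack and API clients installed, and run them there
    docker run --rm \
      --network app-net \
      -v /opt/customers/acme/scripts:/scripts:ro \
      our-registry/app-tasks:latest \
      python /scripts/ingest_acme.py

    # or, if a long-running task container with /scripts mounted is already up:
    docker exec app-tasks python /scripts/ingest_acme.py

Does that sound sane, or is there a cleaner architecture for this sort of per-customer scripting?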

Sorry for the rather long post. Any advice would be gratefully received!