Prestodb | Configuring for accessing Mysql, SqlServer, Redshift and Hive

Im trying to configure PrestoDB ( distributed query engine) for my testing/understanding purposes. Here are the steps I did to deploy prestodb in my env.

PrestoDB Setup:
  1. Download the latest prestoDB server(presto-server-<version>.tar.gz) from https://prestodb.io/docs/current/installation/deployment.html
  2. Download presto-CLI(command line interface)- ( presto-cli-<version>-executable.jar) from https://prestodb.io/docs/current/installation/cli.html
  3. Below is my master-worker setup
    • MASTER ( or coordinator)
      • IP- 10.x.x.123
      • works as both Coordinator and worker
    • WORKER
      • IP- 10.x.x.122
      • works are worker
      • presto CLI works from this machine
  4. DBs configured - 
    • Mysql 
    • MSSql-server
    • Redshift
    • Hive

Deployment Steps on MASTER / COORDINATOR
  1. create a folder called "prestodb". I would like to keep all my directories, related to prestodb, under one folder. but in reality you can keep your folder anywhere you want.
  2. now create a folder like "data" in dir - "prestodb"
  3. Copy the file presto-server-<version>.tar.gz into dir prestodb
  4. unzip the above file and you will see a folder like "presto-server-<version>"
  5. Now create a "etc" folder under "presto-server-<version>"
  6. Under "etc" folder create 3 empty config files
    • config.properties
    • jvm.config
    • node.properties
  7. Create a folder "catalog" within dir "etc"
  8. Create the .properties files for the required dbs. 
  9. do the above in both MASTER and WORKER nodes
  10. My folder structure
   /home/vsubr/
              |__prestodb
                        |---- data
                        |---- presto-server-<version>  ( this is the folder created after unziping the .tar.gz file) 
                                            |----etc
                                            |       |----catalog
                                            |                  |----- hive.properties
                                            |                  |----- mysql.properties
                                            |                  |----- rsdd.properties
                                            |                  |----- ss.properties
                                            |---- config.properties
                                            |---- jvm.config
                                            |---- node.properties
                                            
---------------------------------
..../etc/config.properties
---------------------------------

Master configuration ( ip=x.x.x.123)
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=8GB
discovery-server.enabled=true
discovery.uri=http://x.x.x.123:8080
#datasources=mysql,rsdd

Worker configuration ( ip=x.x.x.122)
coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=8GB
discovery.uri=http://x.x.x.123:8080

---------------------------------
..../etc/node.properties
---------------------------------

Master configuration
node.environment=dev
node.id=87ef5936-fbeb-11e7-938a-005056806f7d   ( unique id can be generated using uuid tool)
node.data-dir=/home/vsubr/prestodb/data    ( data folder can be anywhere )

Worker configuration
node.environment=dev
node.id=8e643b74-fbeb-11e7-a668-005056803f70
node.data-dir=/home/vsubr/prestodb/data

---------------------------------
..../etc/jvm.config
---------------------------------

Master and Worker configuration
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError

prestoDB configuration for the databases.

Do this on both the worker and master nodes.

----------------------------------------
..../etc/catalog/hive.properties
-----------------------------------------
connector.name=hive-hadoop2
hive.metastore.uri=thrift://<hive-server-ip>:9083   (you can also find this info under the Hive metastore config)

----------------------------------------
..../etc/catalog/rsdd.properties
-----------------------------------------
connector.name=redshift
connection-url=jdbc:postgresql://<redshift-endpoint-from-aws>:5439/<db-name>  (not schema)
connection-user=<rs-user>
connection-password=<password>

----------------------------------------
..../etc/catalog/ss.properties
-----------------------------------------
connector.name=sqlserver
connection-url=jdbc:sqlserver://<ms-sqlserver-ip/name>:<port>;DatabaseName=<db-name>
connection-user=<user>
connection-password=<pwd>

----------------------------------------
..../etc/catalog/mysql.properties
-----------------------------------------
connector.name=mysql
connection-url=jdbc:mysql://<mysql-ip/name>:3306
connection-user=<user-id>
connection-password=<pwd>

Starting MASTER and WORKER nodes.
  1. run this command ./prestodb/presto-server-0.192/bin/launcher run
  2. Do the same on both master and worker node
Configuring the prestoDB CLI
  1. Get to the machine where you want the CLI to be configured. It can be any machine, but I'm using my worker node(x.x.x.122) to do the job.
  2. copy the file  presto-cli-<version>-executable.jar into prestodb dir
  3. create a softlink (presto)to the above jar as below
    • presto -> ./presto-cli-0.192-executable.jar
  4. now run   ./presto --server x.x.x.123:8080
  5. you can now run the sqls here. 

Comments

Popular posts from this blog

Tableau - Accessing Tableau's DB

react-bootstrap-table | header column alignment fix

Tableau : Convert ESRI shapes into Tableau Format