Prestodb | Configuring for accessing Mysql, SqlServer, Redshift and Hive
Im trying to configure PrestoDB ( distributed query engine) for my testing/understanding purposes. Here are the steps I did to deploy prestodb in my env.
PrestoDB Setup:
PrestoDB Setup:
- Download the latest prestoDB server(presto-server-<version>.tar.gz) from https://prestodb.io/docs/current/installation/deployment.html
- Download presto-CLI(command line interface)- ( presto-cli-<version>-executable.jar) from https://prestodb.io/docs/current/installation/cli.html
- Below is my master-worker setup
- MASTER ( or coordinator)
- IP- 10.x.x.123
- works as both Coordinator and worker
- WORKER
- IP- 10.x.x.122
- works are worker
- presto CLI works from this machine
- DBs configured -
- Mysql
- MSSql-server
- Redshift
- Hive
Deployment Steps on MASTER / COORDINATOR
- create a folder called "prestodb". I would like to keep all my directories, related to prestodb, under one folder. but in reality you can keep your folder anywhere you want.
- now create a folder like "data" in dir - "prestodb"
- Copy the file presto-server-<version>.tar.gz into dir prestodb
- unzip the above file and you will see a folder like "presto-server-<version>"
- Now create a "etc" folder under "presto-server-<version>"
- Under "etc" folder create 3 empty config files
- config.properties
- jvm.config
- node.properties
- Create a folder "catalog" within dir "etc"
- Create the .properties files for the required dbs.
- do the above in both MASTER and WORKER nodes
- My folder structure
/home/vsubr/
|__prestodb
|---- data
|---- presto-server-<version> ( this is the folder created after unziping the .tar.gz file)
|----etc
| |----catalog
| |----- hive.properties
| |----- mysql.properties
| |----- rsdd.properties
| |----- ss.properties
|---- config.properties
|---- jvm.config
|---- node.properties
---------------------------------
..../etc/config.properties
---------------------------------
Master configuration ( ip=x.x.x.123)
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=8GB
discovery-server.enabled=true
discovery.uri=http://x.x.x.123:8080
#datasources=mysql,rsdd
Worker configuration ( ip=x.x.x.122)
coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=8GB
discovery.uri=http://x.x.x.123:8080
---------------------------------
..../etc/node.properties
---------------------------------
Master configuration
node.environment=dev
node.id=87ef5936-fbeb-11e7-938a-005056806f7d ( unique id can be generated using uuid tool)
node.data-dir=/home/vsubr/prestodb/data ( data folder can be anywhere )
Worker configuration
node.environment=dev
node.id=8e643b74-fbeb-11e7-a668-005056803f70
node.data-dir=/home/vsubr/prestodb/data
---------------------------------
..../etc/jvm.config
---------------------------------
Master and Worker configuration
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
prestoDB configuration for the databases.
Do this on both the worker and master nodes.
----------------------------------------
..../etc/catalog/hive.properties
-----------------------------------------
connector.name=hive-hadoop2
hive.metastore.uri=thrift://<hive-server-ip>:9083 (you can also find this info under the Hive metastore config)
----------------------------------------
..../etc/catalog/rsdd.properties
-----------------------------------------
connector.name=redshift
connection-url=jdbc:postgresql://<redshift-endpoint-from-aws>:5439/<db-name> (not schema)
connection-user=<rs-user>
connection-password=<password>
----------------------------------------
..../etc/catalog/ss.properties
-----------------------------------------
connector.name=sqlserver
connection-url=jdbc:sqlserver://<ms-sqlserver-ip/name>:<port>;DatabaseName=<db-name>
connection-user=<user>
connection-password=<pwd>
----------------------------------------
..../etc/catalog/mysql.properties
-----------------------------------------
connector.name=mysql
connection-url=jdbc:mysql://<mysql-ip/name>:3306
connection-user=<user-id>
connection-password=<pwd>
Starting MASTER and WORKER nodes.
- run this command ./prestodb/presto-server-0.192/bin/launcher run
- Do the same on both master and worker node
Configuring the prestoDB CLI
- Get to the machine where you want the CLI to be configured. It can be any machine, but I'm using my worker node(x.x.x.122) to do the job.
- copy the file presto-cli-<version>-executable.jar into prestodb dir
- create a softlink (presto)to the above jar as below
- presto -> ./presto-cli-0.192-executable.jar
- now run ./presto --server x.x.x.123:8080
- you can now run the sqls here.
Comments