Posts

AWS CodeCommit - How To Set Up

We use the AWS platform for our data warehouse, so we wanted to use AWS CodeCommit for source control. It is a private Git repository hosted on AWS, and it is very easy to use.
Install Git on your local working machine
For Linux: sudo yum install git-all
For Windows: download Git from https://git-scm.com/downloads and install it.
When the installation is done, type the command below to check that it was successful:
git help
Install the AWS CLI
If you already work on the AWS platform, you have probably installed it already. If not, install it. To check whether it is installed, type the "help" command given below.
For Windows: download and install from http://docs.aws.amazon.com/cli/latest/userguide/installing.html#install-msi-on-windows
For Linux: http://docs.aws.amazon.com/cli/latest/userguide/installing.html#install-bundle-other-os
Finally, type this command to make sure the AWS CLI is installed: aws hel...
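The install checks described above can also be scripted. Here is a minimal Python sketch, assuming only that the executables are called `git` and `aws` and that a tool counts as installed when it is on the PATH. `find_tools` and `print_versions` are hypothetical helpers, not part of Git or the AWS CLI.

```python
import shutil
import subprocess


def find_tools(names):
    """Return {tool name: full path, or None if it is not on the PATH}."""
    return {name: shutil.which(name) for name in names}


def print_versions(tools):
    """Print a version line for each tool that was found."""
    for name, path in tools.items():
        if path is None:
            print(f"{name}: NOT installed (not found on PATH)")
            continue
        # Both `git --version` and `aws --version` print a single version line.
        result = subprocess.run([path, "--version"], capture_output=True, text=True)
        # Some older AWS CLI builds write the version to stderr, so fall back to it.
        print(f"{name}: {(result.stdout or result.stderr).strip()}")


if __name__ == "__main__":
    print_versions(find_tools(["git", "aws"]))
```

If either tool shows up as NOT installed, go back to the matching install step.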

Py | Rundeck Delete Executions

We were running a Rundeck instance on a file-based DB (not MySQL), and after a month we found that the Rundeck web pages had become so slow that they were unusable. This was because we had a lot of jobs that ran repeatedly at short intervals, which built up a huge execution history and slowed down the whole Rundeck web front end. I tried the normal bulk delete operation through the API, but getting all the execution IDs and then deleting them was still very slow. So I used a workaround: get the execution IDs from the log filenames instead of from the Rundeck API. It was still slow, but since fetching the IDs was the time-consuming part, it cut the overall time by roughly 60-70%. Below is the code:
#!/usr/bin/python -tt
'''
Rundeck delete execution
--------------- change-history ---------------
1.0|21-oct-2015|vsubr|created
'''
__version__ = '1.0'

import sys
import time
from datetime import datetime
import re
from os import listdir
from os.path import isfile, join...
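The filename workaround described above can be sketched roughly as follows. This is not the author's (truncated) script. It assumes the file-based log store keeps one log per execution named `<execution id>.rdlog`, so check the file names on your own server first. `execution_ids_from_logs` and `batches` are hypothetical helpers.

```python
import re
from os import listdir
from os.path import isfile, join

# Assumed naming convention: one log file per execution, e.g. "1234.rdlog".
LOG_NAME = re.compile(r"^(\d+)\.rdlog$")


def execution_ids_from_logs(log_dir):
    """Return the sorted execution IDs found in the log filenames of log_dir."""
    ids = []
    for name in listdir(log_dir):
        match = LOG_NAME.match(name)
        # Only count regular files whose name fits the assumed pattern.
        if match and isfile(join(log_dir, name)):
            ids.append(int(match.group(1)))
    return sorted(ids)


def batches(ids, size=100):
    """Split the IDs into chunks so each bulk-delete request stays small."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]
```

Each chunk from `batches(execution_ids_from_logs(log_dir))` would then go to Rundeck's bulk delete API. Reading a directory listing avoids the slow API calls that fetch the IDs.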

Py | Generic XML 2 Relational Data Convertor - Advanced Options

Please have a look at the basic options first: http://venkat-echo.blogspot.com.au/2015/12/py-generic-xml-2-relational-data.html
1. Forcing a specific XML level into a separate table. In the example above, if you want all topping details in a separate table, with each topping as a separate row in that table, use the following option:
# Force a specific XML level into a separate table.
xml2rd.arr_predefined_xmlPath4Tables = ['/items/item/topping']
Result of the above parameter
2. Adding/merging/inserting different XML paths into the same table. Use the config below, but note that the merged XML paths must be at the same DEPTH. In the example above, if you want to merge <batters> and <batterscost>, use this config:
# Put multiple XML paths into the same table. Make sure all XML paths are at the same level.
xml2rd.d_common_table_4XmlPaths = { '/items/item/batters':{'table':'common_tab...
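To make the two options more concrete, here is a minimal sketch of the underlying idea using Python's `xml.etree.ElementTree`. It is not XML2RD's code. `rows_for_path`, `merge_paths` and the sample XML are hypothetical. A matched element's leaf children become columns, and if a leaf tag repeats, the last value wins.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, loosely modelled on the items/item example the post refers to.
SAMPLE = """<items>
  <item id="0001">
    <batters><batter>Regular</batter></batters>
    <batterscost><cost>0.5</cost></batterscost>
    <topping id="5001">None</topping>
    <topping id="5002">Glazed</topping>
  </item>
</items>"""


def rows_for_path(root, xpath):
    """One row per element matching an absolute path such as '/items/item/topping'."""
    parts = xpath.strip("/").split("/")
    if parts[0] != root.tag:
        return []
    relative = "/".join(parts[1:])
    # ElementTree paths are relative to the root element, e.g. 'item/topping'.
    elements = root.iterfind(relative) if relative else [root]
    rows = []
    for elem in elements:
        row = {"_xpath": xpath, **elem.attrib}
        if elem.text and elem.text.strip():
            row[elem.tag] = elem.text.strip()
        for child in elem:
            if len(child) == 0:
                # Leaf children become columns; a repeated leaf tag keeps the last value.
                row[child.tag] = (child.text or "").strip()
        rows.append(row)
    return rows


def merge_paths(root, xpaths):
    """Collect rows from several XML paths into one table; all paths must share a depth."""
    depths = {len(p.strip("/").split("/")) for p in xpaths}
    if len(depths) != 1:
        raise ValueError("all XML paths merged into one table must be at the same depth")
    table = []
    for xpath in xpaths:
        table.extend(rows_for_path(root, xpath))
    return table


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(rows_for_path(root, "/items/item/topping"))  # one row per topping
    print(merge_paths(root, ["/items/item/batters", "/items/item/batterscost"]))
```

The depth check mirrors the post's rule that merged paths must be at the same level. Mixing levels would put parent and child rows into one table.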

Py | Generic XML 2 Relational Data Convertor - Basic

Why I need this generic XML parser
It is often necessary to load XML data into tables so that business users can access the data and use those tables to write reports. The XMLs I get normally come from third parties, and most of the time I don't get the XSD for them. We also receive different XMLs from time to time, and if we hard-coded the paths to read a specific XML, I would have to write code for every XML I received. So I thought it would be better to build a GENERIC XML PARSER that can take any XML file, convert it into RELATIONAL DATA style and write the result into CSVs (which can then be used to load the tables).
Logic applied for XML 2 Relational Data conversion
Every level in the XML is a table, i.e. items/item is a table and items/item/batters is a table.
Namespaces cannot be handled.
All key columns and reference columns created by XML2RD get the prefix '_' (e.g. _xid, _xpath).
The <Element> name becomes the column name. In case of...
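Here is a minimal Python sketch of the conversion rules listed above. It is not XML2RD's actual code. `xml_to_tables`, `tables_to_csv`, the `_parent_xid` column and the table-naming scheme are all hypothetical. An element becomes a row in its own table when it has child elements or attributes, and a plain leaf element becomes a column of its parent's row. Like the original, it does not handle namespaces, and a repeated leaf tag simply overwrites the earlier value.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import count


def xml_to_tables(xml_text):
    """Flatten XML into {xpath: [row, ...]}, one table per nested XML level."""
    root = ET.fromstring(xml_text)
    tables = defaultdict(list)
    next_id = count(1)

    def visit(elem, path, parent_xid):
        xid = next(next_id)
        # Generated key/reference columns start with '_' so they cannot clash with data.
        row = {"_xid": xid, "_parent_xid": parent_xid, "_xpath": path, **elem.attrib}
        if elem.text and elem.text.strip():
            row[elem.tag] = elem.text.strip()
        for child in elem:
            if len(child) == 0 and not child.attrib:
                # Plain leaf element: its name becomes a column of this row.
                row[child.tag] = (child.text or "").strip()
            else:
                # Nested level: a row in its own table, linked back by _parent_xid.
                visit(child, f"{path}/{child.tag}", xid)
        tables[path].append(row)

    visit(root, "/" + root.tag, None)
    return dict(tables)


def tables_to_csv(tables):
    """Render each table as CSV text, keyed by a file-friendly table name."""
    result = {}
    for path, rows in tables.items():
        # Union of all column names, in first-seen order.
        columns = list(dict.fromkeys(key for row in rows for key in row))
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)
        result[path.strip("/").replace("/", "_")] = buffer.getvalue()
    return result
```

Since no path is hard-coded, the same two calls work for any namespace-free XML. That is the main point of making the parser generic.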

My First Infographic poster


R | Vectors vs List vs Array vs Matrix vs dataframe

For non-R programmers, some of the R data types might be a bit confusing.
R: In R, vectors, lists and arrays are different and have different properties.
Non-R: Lists can be named or unnamed. Unnamed lists are arrays and named lists are dictionaries. An array or dictionary can contain another array or dictionary, but it is still called an array or dictionary.
Vector
R: The simplest form of R object. All values in a vector must be of the same type/class, and a vector cannot contain another vector, array or list.
Non-R terms: It is like an array in which all values have the same data type.
List
R: Like a vector, but it can contain objects of any type/class. Its elements can be named, and a list can contain another list, and so on.
Non-R terms: It is like an array (if not named) or a dictionary (if named).
Array
R: A vector with one, two or more dimensions; the dimensions can be named.
(non-R programmers te...
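For readers coming from Python, the comparison above can be sketched with Python's built-in types. This is only an analogy, not how R stores objects. `array.array` stands in for an R vector because it also forces every element to one type. A plain list stands in for an unnamed R list and a dict for a named R list. `is_vector_like` is a hypothetical helper.

```python
from array import array

# R vector: all values share one type. array("d", ...) only accepts floats,
# so adding a string fails (R would instead coerce everything to one type).
vec = array("d", [1.0, 2.5, 3.0])

# Unnamed R list: may mix types and nest other lists.
unnamed = [1, "a", True, [2, 3]]

# Named R list: behaves like a dictionary.
named = {"name": "Fred", "scores": [90, 85], "active": True}

# R matrix (a 2-D array): here as nested lists. R itself stores it as one
# vector plus a dim attribute, filled column by column.
matrix = [[1, 2, 3],
          [4, 5, 6]]


def is_vector_like(values):
    """True if every element has the same Python type, like an R atomic vector."""
    return len({type(v) for v in values}) <= 1
```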

R | Rcmdr; Rattle on R 3.1 and Mac OS X Yosemite

Today I tried to install Rcmdr on my MBP (Yosemite) and found that the installation was not successful: it kept throwing a lot of errors, and some of the dependency packages were not getting installed. I also noticed that some of the dependencies are not available for R 3.1 on Yosemite. After a lot of googling and trial and error, I now have everything working.
Steps
Since Rcmdr uses the X Window System, we need an OS X version of it, so make sure you have installed XQuartz (X11) from this site: http://xquartz.macosforge.org/landing/
Install MacPorts: https://www.macports.org/install.php
Make sure you have already installed Xcode.
Since binaries for some of the dependencies are not available for the current R version, we might have to compile them from source.
Download the RGtk2 source and the cairoDevice source from CRAN (download the "packaged source"):
http://cran.r-project.org/web/packages/RGtk2/
http://cran.r-pro...
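Once the source packages are downloaded, each tarball can be compiled with `R CMD INSTALL <tarball>`. Here is a minimal Python sketch, assuming both tarballs sit in one download folder and keep CRAN's usual `<package>_<version>.tar.gz` names. `install_commands`, `run` and the `dry_run` flag are hypothetical, and the install order just follows the `packages` tuple.

```python
import glob
import os
import subprocess


def install_commands(download_dir, packages=("RGtk2", "cairoDevice")):
    """Build one `R CMD INSTALL` command per package tarball found in download_dir."""
    commands = []
    for pkg in packages:
        # CRAN source tarballs are named <package>_<version>.tar.gz.
        matches = sorted(glob.glob(os.path.join(download_dir, f"{pkg}_*.tar.gz")))
        if not matches:
            raise FileNotFoundError(f"no {pkg}_*.tar.gz found in {download_dir}")
        if len(matches) > 1:
            # A plain name sort is not version-aware, so refuse to guess.
            raise ValueError(f"several {pkg} tarballs found: {matches}")
        commands.append(["R", "CMD", "INSTALL", matches[0]])
    return commands


def run(commands, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Running with `dry_run=True` first shows which files were picked up before anything is compiled.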