Improve Python + Influxdb import performance

Mon Apr 3 12:22:04 EDT 2017

You can reuse connection, instead of creating for each request. (HTTP
keep-alive).

On Tue, Apr 4, 2017 at 1:11 AM, Prathamesh <prathamesh.nimkar at gmail.com> wrote:
> Hello World
>
> The following script is an extract from
>
> https://github.com/RittmanMead/obi-metrics-agent/blob/master/obi-metrics-agent.py
>
> <<Code begins>>
>
> import calendar, time
> import sys
> import getopt
>
> print '---------------------------------------'
>
> # Check the arguments to this script are as expected.
> # argv[0] is script name.
> argLen = len(sys.argv)
> if argLen -1 < 2:
>     print "ERROR: got ", argLen -1, " args, must be at least two."
>     print '$FMW_HOME/oracle_common/common/bin/wlst.sh obi-metrics-agent.py  <AdminUserName> <AdminPassword> [<AdminServer_t3_url>] [<Carbon|InfluxDB>] [<target host>] [<target port>] [targetDB influx db>'
>     exit()
>
> outputFormat='CSV'
> url='t3://localhost:7001'
> targetHost='localhost'
> targetDB='obi'
> targetPort='8086'
>
> try:
>     wls_user = sys.argv[1]
>     wls_pw = sys.argv[2]
>     url  = sys.argv[3]
>     outputFormat=sys.argv[4]
>     targetHost=sys.argv[5]
>     targetPort=sys.argv[6]
>     targetDB=sys.argv[7]
> except:
>     print ''
>
> print wls_user, wls_pw,url, outputFormat,targetHost,targetPort,targetDB
>
> now_epoch = calendar.timegm(time.gmtime())*1000
>
> if outputFormat=='InfluxDB':
>     import httplib
>     influx_msgs=''
>
> connect(wls_user,wls_pw,url)
> results = displayMetricTables('Oracle_BI*','dms_cProcessInfo')
> for table in results:
>     tableName = table.get('Table')
>     rows = table.get('Rows')
>     rowCollection = rows.values()
>     iter = rowCollection.iterator()
>     while iter.hasNext():
>         row = iter.next()
>         rowType = row.getCompositeType()
>         keys = rowType.keySet()
>         keyIter = keys.iterator()
>         inst_name= row.get('Name').replace(' ','-')
>         try:
>             server= row.get('Servername').replace(' ','-').replace('/','_')
>         except:
>             try:
>                 server= row.get('ServerName').replace(' ','-').replace('/','_')
>             except:
>                 server='unknown'
>         try:
>             host= row.get('Host').replace(' ','-')
>         except:
>             host=''
>         while keyIter.hasNext():
>             columnName = keyIter.next()
>             value = row.get(columnName )
>             if columnName.find('.value')>0:
>                 metric_name=columnName.replace('.value','')
>                 if value is not None:
>                     if value != 0:
>                         if outputFormat=='InfluxDB':
>                             influx_msg= ('%s,server=%s,host=%s,metric_group=%s,metric_instance=%s value=%s %s') % (metric_name,server,host,tableName,inst_name,  value,now_epoch*1000000)
>                             influx_msgs+='\n%s' % influx_msg
>                             conn = httplib.HTTPConnection('%s:%s' % (targetHost,targetPort))
>                             ## TODO pretty sure should be urlencoding this ...
>                             a=conn.request("POST", ("/write?db=%s" % targetDB), influx_msg)
>                             r=conn.getresponse()
>
> <<Code ends>>
>
> It currently takes about 3 minutes to execute completely and I was thinking of a way to make it run faster
>
> Data alignment (Influx line protocol) & data loading - done together takes up most of the time
>
> Influxdb is currently loading data at around 3 points/second
>
> Any way to align the data separately, store it and load it as a batch?
>
> I feel that would help improve performance
>
> Please let me know if you have any pointers
> I can send the data sheet if required
>
> Thanks P
> --
> https://mail.python.org/mailman/listinfo/python-list