Python ETL Developer/Data Engineer (Remote)
IPV Holdings Ltd
Remote
Job Title: Python ETL Developer/Data Engineer (Remote)
- Review, design, and develop ETL jobs to ingest data into the Data Lake, load data into data marts, and extract data for integration with various business applications.
- Parse unstructured and semi-structured data, such as XML.
- Design and develop efficient mappings and workflows to load data into data marts.
- Map XML DTD schemas to customized table definitions in Python.
- Write efficient queries and reports in Hive or Impala to extract data on an ad hoc basis for data analysis.
- Identify performance bottlenecks in ETL jobs and tune their performance by enhancing or redesigning them.
- Responsible for performance tuning of ETL mappings and queries.
- Import tables and all necessary lookup tables to support the ETL process for daily XML files, in addition to processing very large (multi-terabyte) historical XML data files.
Compensation: Commensurate with experience.
- Telecommuting is OK
- No Agencies Please
- Self-starter
- Very organized, with a high aptitude for learning and solving complex problems
- Proficiency in SQL, R, and Python
- Proficiency in using Hive, Hadoop, and Impala
- Deep knowledge of performance tuning for ETL jobs, Hadoop jobs, and SQL queries, including partitioning, indexing, and other techniques