Senior Software Engineer (Back End)

Location: Cambridge, MA
Date Posted: 04-10-2018
As a Senior Software Engineer, you will play a key role in developing the next-generation data mining and search platform. You’ll design and implement abstractions, data schemas, and APIs while working closely with our product engineers, product designers, and data scientists on new products. You’ll help scale our data infrastructure to seamlessly integrate an order of magnitude more data sources by building systems that automate and optimize large-scale distributed computing jobs. Your solutions will leverage distributed computing technologies so that machine learning and NLP algorithms can run on large-scale data. As a data engineer, you will work with world-class data scientists, product designers, and engineers to create products that solve important real-world business problems in a collaborative, fast-paced, and fun startup environment.

Responsibilities

  • Building and managing highly reliable distributed data pipelines with high throughput
  • Working with our data scientists to turn large-scale, messy, and diverse unstructured data into structured, normalized data
  • Maintaining data integrity across various data sources
  • Optimizing slow-running database queries and data pipelines
  • Helping enhance our search engine, capable of running sophisticated user queries quickly and efficiently
  • Building internal tools and back-end services that help our data scientists and product engineers work more efficiently

Requirements

  • BS, MS, or PhD in Computer Science or related field, or equivalent work experience
  • 3+ years of experience working with large-scale data in a production environment
  • Significant performance engineering experience (e.g., profiling slow code, understanding complicated query plans)
  • Experience with monitoring and profiling tools such as Nagios, perf, and htop
  • Database skills
  • Deep knowledge of at least one dynamic language (e.g., Python or Ruby)
  • Experience with scripting languages (Bash, Python, Ruby)
  • Experience developing software on Linux-based OSes
  • Experience with distributed version control systems

Good to Have

  • Statistics knowledge
  • Familiarity with column store databases
  • Familiarity with database internals, especially PostgreSQL
  • Familiarity with large-scale, distributed data processing systems such as MapReduce pipelines, Mesos, and Spark
  • Contributions to open-source software

 