As a data engineer on our team, you will play a key role in developing our next-generation data science infrastructure and underlying core technologies. You will work with Panjiva's world-class data scientists, analysts, and engineers to create products that solve important real-world business problems in a collaborative, fast-paced, and fun environment.
You'll work closely with our data science team to develop new platforms, infrastructure, and tools that will allow for machine learning applications at production scale over ever-growing datasets.
You'll design and leverage distributed computing technologies, data schemas, and APIs to construct data science pipelines. In addition, you'll be expected to participate in augmenting our infrastructure to seamlessly integrate new data sets through constant R&D of the technologies and systems we use.
Join us in building the next generation of products as we continue to deliver valuable and actionable insights to decision-makers in the $15 trillion global trade industry.
Architect and implement distributed systems that perform complex transformations, processing, and analysis over very large scale datasets
Develop processes to monitor and automate detection of quality regressions in raw data or in the output of Panjiva's machine learning models
Working with our data scientists to turn large-scale messy, diverse, and often unstructured data into a source of meaningful insights for our customers
Optimizing slow-running database queries and data pipelines
Helping enhance our search engine, capable of running sophisticated user queries quickly and efficiently
Building internal tools and Back End services to enable our data scientists and product engineers to improve efficiency
S&P Global states that the anticipated base salary range for this position is $91,500 to $190,100. Base salary ranges may vary by geographic location.
B.S., M.S., or Ph.D. in Computer Science (or a related field) or equivalent work experience
3+ years of experience working with data-at-scale in a production environment
Experience designing and implementing large-scale, distributed systems
Experience in multi-threaded software development (or some form of parallelism)
Significant performance engineering experience (eg, profiling slow code, understanding complicated query plans, etc.)
Solid understanding of core algorithms and data structures, including the ability to select (and apply) the optimal ones to computationally expensive operations over data-at-scale
Strong understanding of relational databases and proficiency with SQL
Deep knowledge of at least one compiled language (eg, Scala, C++, Java, Go)
Experience developing software on Linux-based operating systems
Experience with distributed version control systems
Familiarity with relational database internals (especially PostgreSQL)
Proficiency with cloud computing platforms, specifically AWS
Working knowledge of probability & statistics
Contributions to open-source software
Experience building customer-centric products