REMOTE POSITION, BUT CANDIDATE MUST BE NEAR A FRONTIER OFFICE LOCATION. The *** network generates a large volume of data each day in the form of communications data, network device data, log files, customer interaction data, etc. This resource will architect a "big data" platform to store and process this data for use by data scientists and machine learning engineers. You will have the opportunity to work with a small team of data scientists and machine learning engineers to build products and services that improve the state of the *** network and elevate the customer experience.
• You will design, develop, test, and maintain big data infrastructure in the cloud and in on-prem locations.
• You will develop ETL pipelines to collect data from various sources, transform and store it, and enable stakeholders to consume it.
• You will develop pipelines to support the machine learning application development process.
• You will work with different parts of a large organization to locate, understand, and extract data from a diverse set of systems and load it into the big data platform.
• You will monitor data performance and modify infrastructure as needed.
• You will define data retention policies.
o Computer Science degree or relevant experience.
o 5+ years of industry experience in Data Engineering, not necessarily in Telecom.
o Strong experience with MapReduce development for large datasets (Hadoop, HDFS, YARN).
o Strong background in Linux.
o 5+ years of experience in Python.
o Industry experience in developing ETL pipelines to manage large datasets.
o Working knowledge of the machine learning development process.
o Master's degree in Computer Science.
o Experience in terabyte-scale data manipulation.
o Experience with ingestion tools such as Sqoop and Flume is a plus.
o Experience with Kafka and Airflow is a plus.
o Experience with processing frameworks such as Spark and Hive is a plus.
o Experience with Apache HBase is a plus.
o AWS Big Data certification is a plus.