Distributed Systems Engineer
At Daltix (Full-time), in Lisbon, Portugal
Expires at: 2019-11-10
We are looking for talented profiles to help build and maintain the distributed data collection system that is at the heart of our business.
We are a data-driven company that collects and processes more than 500GB of raw data daily. We leverage big data technologies such as Serverless and Spark on AWS EMR to crunch these volumes of data and make them queryable.
In this role, you will ensure that our data collection engine, which consists of distributed web crawlers, is state of the art and ahead of our competition. You will ensure that we can scrape any webshop, regardless of the ban-detection measures in place. You will also make sure that proper monitoring tools are in place. We are currently scraping 60 sites and your goal is to at least triple that without losing completeness and quality.
Your responsibilities will include:
Creating and implementing distributed web crawling architectures
Implementing cost-effective data processing architectures
Creating advanced system monitoring solutions & dashboards
Designing advanced ways of interpreting scraped HTML
Managing advanced proxies
Main requirements
At least 5 years of experience in object-oriented software engineering & design in any object-oriented programming language
Experience with and understanding of large-scale web crawling
Experience with databases, SQL
Experience with infrastructure such as load-balancers, caches
Highly proficient in spoken and written English
You never stop learning
Nice to have
Have experience building on top of Amazon Web Services
Have programming experience with Python
Expert knowledge of web-scraping & web-scraping architectures
Experience with GoLang & JavaScript (Node.js) is a plus
Experience with big data technologies (such as Hadoop, Spark, Airflow, Cassandra, Elasticsearch) is a plus
Have a deep understanding of cloud possibilities and limitations in the areas of distributed systems, load balancing and networking, massive data storage, and security
Get energy from working in a highly complex and challenging startup environment with a high tech product
Knowledge of DevOps & automation (Terraform, Ansible)
Data analysis using Pandas (Python)
Perks
Work with the latest tech stack
We're a quickly growing company, so your personal growth can be huge too.
Health Benefits – Comprehensive coverage for medical needs
Meal allowance – Monthly meal card, a fully stocked kitchen with plenty of coffee and fruit, and monthly team drinks and dinners
Work-Life Balance – We trust you to know your schedule and work when you feel most productive
Learning and Development – Attend meet-ups, conferences, and events that interest you and benefit your personal and career growth.
Apply for this position