Dmitry Kireev
Verified Expert in Engineering
Cloud Architect Developer
Dmitry is a cloud architect and site reliability engineer with over a decade of intense professional experience strictly adhering to the DevOps methodology. He has architected and built multiple platform-agnostic infrastructures from scratch for modern cloud systems. Dmitry has a proven track record of hands-on operations in high-scale environments. He is also proficient with IaC, automation, and scripting, as well as monitoring and observability.
Preferred Environment
Terraform, Linux, GitHub, Docker, Amazon Web Services (AWS), DevOps, Serverless, Amazon Elastic Container Service (Amazon ECS), Amazon EKS, CI/CD Pipelines, Architecture
The most amazing...
...thing I've architected, deployed, and managed is a scalable, highly available cloud for an IoT security product alongside the software engineering team.
Work Experience
Head of Site Reliability Engineering | Consultant
HazelOps
- Took on multiple consulting positions in different projects.
- Built scalable infrastructures for startups: multi-environment, with infrastructure as code, self-healing, scalable, and predictable environments on AWS.
- Took care of legacy code by Dockerizing JVM, PHP, and Python apps.
- Analyzed and audited infrastructure performance, producing dozens of full-cycle reports covering key performance factors and proposed action items.
- Helped software engineers implement DevOps, including close communication, strategy, and process improvement.
- Instrumented site reliability practices by owning SLAs, SLOs, and SLIs; eliminating toil; and increasing observability through automation, monitoring, and error budgeting.
- Implemented CI/CD, facilitating a streamlined deployment pipeline for dozens of different projects using GitLab, Jenkins, and CircleCI. Utilized Docker, registry, and multi-stage builds.
- Created OPS procedures in customers' environments, including service-based alerting, on-call rotation, and escalations.
- Deployed and maintained Apache Kafka, including full-cycle management via Terraform, Ansible, and Docker.
EKS Expert
SimplyWise, Inc.
- Decomposed a complex transient bug into a series of hypotheses.
- Analyzed the current EKS/Django configuration on AWS with minimal documentation.
- Tracked down the source of odd errors in Django/EKS using troubleshooting methods with Datadog and CloudWatch.
- Proposed an optimal solution to the issue and supported the team in implementing it.
Senior Cloud Architect
Sport Betting B2B SaaS Provider (US)
- Improved the current infrastructure to pass the GLI certification.
- Implemented IAM and role-based PostgreSQL password-less access managed via Terraform.
- Improved remote access patterns; migrated to OpenVPN.
Cloud Architect | Site Reliability Engineer
Game Asset Marketplace (Stealth)
- Improved the security and reliability of the current EKS cluster.
- Designed and implemented the CI/CD system to deploy/roll back the application with zero downtime.
- Handled troubleshooting and maintenance support for legacy systems.
- Designed security improvements for the infrastructure overall.
- Designed branching and staging improvements to facilitate faster QA.
Cloud Architect | Site Reliability Engineer
ONFO, LLC
- Designed and implemented a multi-environment AWS architecture.
- Designed and implemented a CI/CD system to deploy/roll back the application with zero downtime.
- Updated and migrated a .NET application to the new environment.
- Updated and migrated a TypeScript/Serverless application to the new environment.
- Designed and deployed an immutable infrastructure for private Stellar nodes.
Cloud Architect | Site Reliability Engineer
Tatango, Inc.
- Designed and implemented a multi-environment AWS architecture.
- Designed and implemented a CI/CD system to deploy/roll back the application with zero downtime.
- Updated and migrated a Rails application to the new environment.
- Updated and migrated a TypeScript/Serverless application to the new environment.
Lead Site Reliability Engineer
Flo Technologies
- Designed and executed a complex IoT infrastructure from scratch on AWS: multi-tier, multi-subnet scalable cloud AWS infrastructure, multi-application stateless stack with Elastic Beanstalk and ECS and Docker, platform-agnostic local workspaces with Docker.
- Created and administered Ansible infrastructure: idempotent plays and roles to support infrastructure needs and wrote community-available roles for multiple platforms under Apache Foundation.
- Designed and implemented CI/CD: complete application lifecycle with green deployments of high-traffic services, platform-agnostic framework to support SaaS or hosted CI servers, and hassle-free pipelines for software engineers.
- Constructed and administered monitoring solutions: log and data aggregation from multiple sources (ELK), on-prem monitoring via TICK and Grafana, and SaaS monitoring with Datadog and New Relic when needed.
- Devised and executed operational procedures: service-oriented OLA, PagerDuty with monitoring solutions, and PagerDuty "Service Owner First" policy.
- Created and maintained an upgrade procedure for critical distributed systems that allowed no-downtime, no-data-loss upgrades throughout the three-year time span.
Senior Member of Technical Staff
Delphix
- Architected and implemented multi-tier hybrid cloud AWS infrastructure for a new project for a high-scale testing framework.
- Constructed log and data aggregation from multiple sources (ELK).
- Created a virtual and bare-metal host provisioning system (Foreman).
- Designed and implemented Nmap-based inventory software.
- Contributed to company-wide IT processes and improvements.
- Contributed major portions of the on-call rotation, monitoring, SOA, and OLA designs and implementations.
Senior DevOps Engineer
Intuit
- Managed a hybrid cloud with around 300 nodes: AWS, VMware, and bare metal.
- Implemented automation, config management, and provisioning, with 90% of the environment managed in Puppet and Git.
- Managed the lifecycle of legacy systems in .NET and C# and the automation of manually deployed systems.
- Provided CI in configuration management and IaC: GitFlow, reusable code, and open-source contribution.
- Managed and mentored junior IT staff, including separation of concerns and easy onboarding.
- Led most of the post-acquisition infrastructure integration projects.
DevOps Engineer
Docstoc (Acquired by Intuit)
- Supported colocation with 180+ Windows and Linux dedicated servers as well as new server deployment.
- Managed network security and performance (Juniper SSG, SRX Firewalls, A10 networks load balancer, Radius, IPsec, NAT, and Amazon EC2 VPC).
- Implemented proactive monitoring using Nagios, ELK, and New Relic.
- Optimized Linux and Windows server performance for high scale.
- Deployed and maintained on-premise MySQL databases.
- Introduced and implemented an ELK stack (Elasticsearch, Logstash, and Kibana).
Experience
ICMK - Infrastructure as Code Make Framework
https://github.com/hazelops/icmk
The idea is to use GNU Make as a vehicle for wrapping the complexity and presenting a nice runner experience. This way, a coherent set of commands can be used locally or on the CI, as simple as "make deploy."
Article: Runner Experience Design
https://automationd.com/developer-experience-design/
While such a poetic way of calling idempotent infrastructure has many important technical characteristics, this time, I'd like to talk about the other side of it: runners (anyone or anything with sufficient permissions) and their experience.
Article: How to Avoid Human Bottlenecks in Production
https://automationd.com/how-to-avoid-human-bottlenecks-in-production/
Generally speaking, running a larger business requires multiple humans to perform ideation, design, project management, development, QA, marketing, and infrastructure operations. When a single human limits the capacity of a team, that person becomes a human bottleneck.
In this post, I'd like to highlight two distinct types of human bottlenecks, both of which can have a negative impact on the productivity of the team from the perspective of operations and site reliability.
OpenVPN AS Docker with DUO Security
https://github.com/AutomationD/docker-openvpnas
Duo Security is optional but is highly recommended since the basic account is free. All you need to do is get API credentials and enable the post-auth script.
Windows Imaging Toolkit
https://github.com/AutomationD/wimaging
All relevant configuration files like unattend.xml are rendered by Foreman and downloaded at build time.
IZE: Opinionated Infrastructure Tool
https://github.com/hazelops/ize
It combines infra, build, and deploy workflows in one and is too simple to be considered sophisticated. So let's not do it but rather embrace the simplicity and minimalism.
Skills
Tools
Git, GNU Make, Ansible, AWS CloudFormation, ELK (Elastic Stack), GitLab, GitLab CI/CD, Terraform, Docker Compose, Grafana, Telegraf, CircleCI, Travis CI, Traefik, Amazon CloudWatch, Amazon Elastic Container Service (Amazon ECS), GitHub, VPN, AWS Fargate, Amazon CloudFront CDN, AWS IAM, Amazon Virtual Private Cloud (VPC), Docker Swarm, NGINX, Puppet, Jenkins, Amazon EKS, Amazon Simple Queue Service (SQS), RabbitMQ, Celery, TeamCity, Nagios, Makefile, AWS CodeDeploy, Jira, Helm, Confluence, Splunk, Stellar SDK, AWS CodeBuild
Paradigms
Agile, Continuous Delivery (CD), Continuous Integration (CI), DevOps, Microservices, Microservices Architecture, Serverless Architecture, Azure DevOps, Automation, Agile Software Development
Platforms
Linux, Docker, Amazon Web Services (AWS), AWS Elastic Beanstalk, Amazon EC2, Kubernetes, AWS Lambda, Windows, Apache Kafka, JVM, Heroku, Azure, WordPress, New Relic, Windows Server, Google Cloud Platform (GCP), Blockchain, Rancher
Storage
Datadog, Amazon Aurora, MySQL, MongoDB, InfluxDB, Redis, On-premise, Amazon DynamoDB, Redis Cache, Elasticsearch, MySQL/MariaDB, Databases, PostgreSQL, Redshift
Industry Expertise
Network Security, Cybersecurity
Other
Site Reliability Engineering (SRE), GitHub Actions, AWS DevOps, SSL Certificates, Digital Certificates, CI/CD Pipelines, Amazon RDS, Software as a Service (SaaS), Infrastructure as Code (IaC), SSL, Cloud, Containerization, AWS Cloud Architecture, Containers, Deployment, Cloud Architecture, Container Orchestration, Agile DevOps, Monitoring, Infrastructure Monitoring, Application Monitoring, Cloud Infrastructure, Architecture, Infrastructure, Load Balancers, Scaling, Networking, Internet of Things (IoT), AWS NAT Gateway, APIs, Enterprise Architecture, CORS, TICK Stack, Transport Layer Security (TLS), Foreman, Juniper, LB, ECS, Serverless, HAProxy, Communication, English, Business, Economics, Software Development, Business Planning, Cloudflare, Hospitality, Network Monitoring, ChatGPT, Machine Learning
Languages
Python, Bash, Java, PHP, Markdown, Go, JavaScript, SQL, TypeScript
Frameworks
Flask, Django, Ruby on Rails (RoR), Windows PowerShell, .NET, Serverless Framework
Libraries/APIs
Amazon API, Node.js
Education
Bachelor's Degree in Business Communication (English)
Tula State University - Tula, Russia
Bachelor's Degree in Business Administration
Tula State University - Tula, Russia