AuFeminin.com / Smartadserver
- Site Reliability Engineer
Paris
2012 - 2015
Site Reliability Engineering (SRE) emerged from the Internet industry and usually sits in the organisation as an interface between Infra/Ops, QA and Dev. Its goal is to guarantee quality of service in terms of availability, performance and scalability.
The role usually requires communication skills, a broad technical background, and the ability to specify processes and methodologies so that new features integrate smoothly and on time.
The SRE team's mission is to provide the means, tools, methods and technical skills needed to keep the core services of Smartadserver, Aufeminin and Marmiton running optimally.
Scope
all aufeminin group sites and infrastructure (including aufeminin.com across all brands and countries, marmiton.org and smartadserver.com)
Monitoring
- Monitor the overall delivery quality of production platforms: define KPIs and provide tooling
- assess service provider performance and SLAs (at least annually)
Methodology / Team communication
- write and put processes in place, and check that they are applied; work on what-if scenarios
- Share common issues and their resolutions across teams
- define RFP processes
- run and oversee RFPs for third-party services supporting delivery (DNS, CDN, security, monitoring tooling)
- contract management and legal reviews
- Scrum methodology
- documentation
Implementation
- install the application layer on new front-end and SQL servers
- follow up on the dev team's technology choices
- 3rd-party integration (CDN, Cedexis, Keynote...)
Security & Audits
- IT security audits
- IT general audits
- risk management (identification, evaluation and treatment) and reporting
- documentation and implementation of an IT internal control system, based on the risk management process and on the group's central/corporate IT specifications, to ensure transparent control processes and secure business processes
- documentation and implementation of an ISMS (information security management system) in line with the group's central/corporate IT specifications and standards
- documentation of a detailed disaster recovery concept and plan, including recovery point and recovery time objectives (RPO/RTO), in line with the group's central/corporate IT specifications and standards; tests should be performed regularly to verify that the plan remains adequate
Investigation
- Investigate and resolve technical issues impacting production: latency, connectivity, SSL
- fine-tuning of Smart/Aufem front servers (IIS configuration, load balancer configuration)
- fine-tuning of SQL Server configuration, with help from the database team
- participation in infrastructure projects