Anthony J. Foiani
Physical: 1543 Rainier Ave / Napa, California, USA / 94558
Phone: +1 650 296 9563
Also available via Zoom or Google Meet; happy to try others.
“... the most important function that software builders do for their clients is the iterative extraction and refinement of the product requirements. ... in planning any software activity, it is necessary to allow for an extensive iteration between the client and the designer as part of the system definition.”
— Fred Brooks, The Mythical Man-Month
Communication is a critical component of any successful system; we must understand the problem we're solving before we can build a solution.
I've been privileged to work for companies of many sizes, on projects ranging from single-person to teams of hundreds. In every case, communication was vital: requirements, constraints, stakeholders, plans, prototypes, reviews, implementation, maintenance, and evolution.
In many of these cases, my ability to understand both technical systems and their human and organizational sides has allowed me to translate between multiple groups and dramatically reduce misunderstandings.
Between users and implementors: Does this do what needs to be done? Is the usage clear? What are the corner cases? Where do you think this might go in the future?
Between different teams of implementors: What are the interfaces? What really needs to be exposed? What technologies are we assuming / preferring / avoiding? What platforms will be used?
Within a single team: How can we structure this for simplicity? Can we generalize it? Can we re-use existing tools? How do we document this, especially for maintenance?
Between individuals, there's mentoring and education. I've discovered that I'm good at this, and it surprised me how fulfilling I found this aspect of my roles.
Instruction and Consulting
I have a knack for extracting requirements from users and managers; an ability to help different parties gain a shared understanding; the technical background to understand systems deeply; and a passion for encouraging the best solution for each situation while minimizing overall complexity.
Operating Systems and Platforms
Amazon Web Services
Data Representation and Interchange
Source Code / Configuration Management
Google Internal Tools
Expert (but probably slightly dated as of 2022); Skilled
Many older skills have been moved to another document.
Firstup: Senior Site Reliability Engineer
San Francisco, California, USA; August 2022 — October 2023
Responsible for all infrastructure, including multiple production AWS accounts / EKS clusters, as well as staging clusters. Reverse-engineered a complicated setup that had evolved over time and whose original authors had mostly departed, with an eye to updating it to modern practices.
- Assisted with product version upgrades and releases
- Upgraded multiple EKS clusters
- Managed our SendGrid email configuration (DKIM, SPF, DMARC)
- Managed, evolved, and streamlined our solution for provisioning custom domains for customers
Helped teammates across the organization understand the benefits and limitations of our platform. Collaborated to deliver solutions that were secure, compliant, effective, and efficient.
- Promoted uniform monitoring so all engineers could see how well existing systems were running
- Consulted with multiple other teams to provide secure and compliant solutions for specific needs.
Enabled our compliance team to achieve and maintain SOC 2 and ISO 27001 certifications. We also maintained a clean separation for data subject to the GDPR.
- Wrote custom scripts to probe the boundaries of a complicated deployment
- Acted as security point-of-contact on the CloudOps team
Production Access: AWS SSO
Most of the effort involved in the AWS SSO Migration was ensuring that any new solution satisfied existing access requirements. Given that the environment setup was legacy and under-documented, this was a substantial challenge.
- Worked closely with our Corp IT team to use our Okta instance for authentication
- Iterated to ensure the SSO roles were sufficient but not overly broad
- Migrated all ad hoc users to AWS SSO for all AWS accounts
Airtable: Senior Site Reliability Engineer
Mountain View, California, USA; November 2020 — June 2022
At the time of my departure, Production Engineering was still the only 24×7 rotation within the company; while daytime alerts were directed at more specific teams, we were the only ones available outside business hours.
- Was primary oncall for week-long 24×7 rotations with a 5-minute time-to-action SLO
- Handled incident response (ongoing management, postmortems, reviews)
- Triaged (and sometimes handled) miscellaneous requests from other users ("interrupts")
- Updated documentation (playbooks, checklists)
- Mentored teammates as they went oncall
- Ran "table top" oncall exercises, complete with postmortem writeups
Airtable maintained SOC 2 and ISO 27001 certifications. Keeping these certifications required regular work: some scheduled (e.g., reviewing who has access to which systems every quarter), and some on demand (evidence gathering, security patching).
- Was the Prod Eng team's technical contact for the Compliance team
- Performed and streamlined Quarterly Access Reviews
- Provided evidence for SOC 2 and ISO 27001 audits
- Helped remediate security concerns
Production Access Onboarding / Mentoring
Airtable restricted access to sensitive environments to a small number of engineers. This access required a separate laptop and specific Security Team approval; coordinating that process for dozens of users required documenting, revising, and finally optimizing the steps required. (This became especially important as duties that once belonged to a single team were spread across almost a dozen.)
- Enabled dozens of users to have "full production access", including training
- Evolved the onboarding process: optimization, documentation
- Handled tool evolution
Google: Senior Site Reliability Engineer
Mountain View, California, USA; October 2013 — July 2020
YouTube Trust & Safety SRE
Founding member of a new team of 2 SREs, which grew to 5 within a year:
- Created infrastructure (permissions, mailing lists, group memberships, etc.)
- Established team culture
- Initiated regular meetings with product developers
- Reviewed incident postmortems alongside product developers and management, and helped clear a backlog of prior incidents
- Investigated multiple aspects of existing developer-supported systems, then presented that knowledge to multiple groups as well as to the new SREs
- Explored multiple options for consolidating and hosting those systems (investigation and initial feasibility)
- Assisted with the emergency response to the emerging COVID-19 situation
YouTube Search & Discovery SRE
- Worked closely with developers to deploy a custom instance of the Google WebSearch technology stack for YouTube content
- Optimized that search stack by using more advanced container / cluster features (saved ~20% out of O(1M) CPU cores)
- Senior member of a 24×7 oncall rotation with a 5-minute response SLA
- Mentored 10+ new/junior SREs to full solo oncall capability
- Managed services deployed globally on millions of CPU cores: cluster migrations, organic growth, feature launches, multiple releases per week
- Handled multiple large public-facing incidents, including postmortem creation, analysis, and follow-up
- Worked closely with our sister team in Zürich, Switzerland (multiple in-person trips, weekly video conferences, daily status handoff emails)
- Co-owned responsibility for our “panic room” (providing privileged access to production networks in case of an on-corp / in-office network outage)
Internal Consulting, Educating, Mentoring, and Interviewing
This wasn't a distinct role; instead, it calls out the areas where I specialized and provided extra value to my teams.
Mentored many peers on a 1-to-1 basis:
- Brought 10+ SREs to full solo oncall ability
- Maintained a list of resources for new SREs
Supported other oncallers during their shifts:
- Was often the designated contact person for new SREs during their first few solo shifts
- Was the YouTube SRE group expert in multiple technologies (e.g., Search, Laelaps, GoogleSQL).
- As a senior member of the overall team, often helped manage and resolve massive incidents involving multiple shards of YouTube SRE.
Developed, presented, and trained others to present courses on topics including:
- GCL — a custom configuration language with highly unusual semantics
- GSLB — Google's global load balancer, which routes trillions of requests per second
- ST-BTI — an obsolescent storage and indexing solution that a team had not yet been able to migrate away from
- YouTube Search — a custom instance of the Google WebSearch stack
- “Going On-Call for Developer Rotations” — introduced hundreds of developers to the principles of going oncall
Helped fellow Googlers across the company with questions in my fields of expertise:
- GCL — the most widely used configuration language within Google
- BigQuery / GoogleSQL — Google's implementation of standard SQL on top of petabytes of protocol buffers (as well as gqui, an ad hoc query engine for those same files)
- Production Management — especially with rarely-used edge cases at the intersection of virtual and physical machines
- Regularly received “peer bonuses” for this assistance, as it went beyond my regular job duties
Interviewed 50+ candidates:
- Specifically volunteered for interviews with candidates from more diverse backgrounds (and received a “peer bonus” for using inclusive language in my feedback)
- Became a “calibrated” interviewer (my scores were generally aligned with other interviewers' and ultimately with those of the hiring committees)
Helped build and maintain team culture and cohesiveness:
- Researched and initiated inclusive events
- Planned and ran multiple off-site activities
- Took photographs to share with the team, organization, and company
Foiani LLC: Sole Proprietor
Albuquerque, New Mexico; July 2009 — October 2013
Universal Non-Destructive Assay Platform: Software Architect / Implementor
- Worked with a multidisciplinary, international group including:
- Adapted Linux and supporting libraries to custom embedded processor
- Designed and built custom software for realtime data acquisition
- Provided high security data transfer and device configuration
- Wrote and generated in-depth API / extension documentation
- Advised a team new to Linux and many other current technologies (XML, HTTP, TLS, NTP, etc.)
- Integrated many technologies while creating a long-running unattended data acquisition system, including:
- Busses: I²C, USB, PCI
- Security: OpenSSL+OpenSC+PKCS11, tamper sensors
- Real-Time Processing: Threads, Watchdogs, Optimizations
- Worked with electrical engineers and digital designers
- Provided project administration: version control, builds, and release management
Yahoo!: Technical Yahoo
San Diego, California; October 2004 — July 2009
Worldwide Data Distribution System: Architect / Implementor
- Adapted existing system for serving data to hundreds of thousands of nodes around the world
- Documented a substantial corpus of existing code
- Created a system for publishing that documentation to company standards
Mobile Entertainment Provisioning: Architect / Implementor
- Put Yahoo! Music onto mobile phones:
- Multi-tier architecture (J2EE, Tomcat, AXIS, AJP)
- Multi-client presentation (WAP, XHTML, JSP)
- Dealt with non-traditional (Mobile vs. PC) browser factors: memory, display, latency, bandwidth
- Used custom packaging / deployment technology (similar to RPMs); became the site expert on it (among 70–100 engineers)
- Pushed the technology envelope within a large company
- Early Linux (RHEL4) adopter
- Early J2EE (Apache/Tomcat/AXIS) adopter
Backoffice Data Reorganization: Manager / Architect / Implementor
- Re-organized some 60TiB of business-critical live data
- Coordinated 10 people working on various aspects of the effort
- Wrote before/after comparison scripts to validate the operation
- Designed and wrote helper utilities to make the move transparent to client processes
Miscellaneous Knowledge Sharing
- Shared in-depth knowledge of Perl and Unix on company mailing lists
- Facilitated an “add your own map” mashup on the Y! Maps site
- Implemented a fast graph search algorithm for a remote colleague
- Contributed to various open-source packages
San Diego, California; November 2001 — acquired by Yahoo! in October 2004
Digital Audio Processing Engineer
Inherited and extended a distributed audio processing system:
- Updated and enhanced a heterogeneous cluster of processing nodes
- Rewrote DSP core in C++
- Handled multiple standards (MP3, AAC, WMA; DRM / no-DRM; tagging)
- Managed 60+ terabytes of audio data and associated metadata
- Helped build rules-based metadata engine for popular and classical audio tracks
- Helped evaluate various encoding / processing schemes
Database Application Programmer
- Worked with Oracle (versions 8, 9, 10)
- Wrote, debugged, supported, and optimized DDL, DML, and bulk loading
- Implemented and supported a mod_perl-based administrative interface (including DHTML features)
Streaming Digital Audio Engineer
- Generated weekly metadata builds providing streaming audio to 30M desktops
- Optimized a legacy system to accommodate 100× its original design capacity
- Helped scale related subsystems
- Extended and optimized browser-based administration tools
Miscellaneous Knowledge Sharing
- Supported and extended existing systems
- Answered Unix / shell / Perl questions
- Optimized database queries
Older entries have been moved to the historical file.
The Perl Journal
A Web Spider in One Line?, Fall 1999
(republished in The Best of The Perl Journal, Volume 2: Web, Graphics, & Perl/Tk Programming)
Bachelor of Science
Bachelor of Science in Computer Science and Math, with a minor in German.
New Mexico State University
Las Cruces, New Mexico
Date of graduation: May 1995; GPA: 3.00 out of 4.00
I follow and contribute answers to many lists, including:
- Boulder Linux Users' Group
- Northern Colorado Linux Users' Group
- San Diego Perl Mongers
- Portland Perl Mongers
I contribute answers and a few patches to many lists, including:
- Boost C++ Libraries
- OpenSC (Smart Card support)
- Linux (Kernel, USB, PowerPC, Networking, Filesystems)
References are available upon request.