Anthony J. Foiani
Note: As of 2022-08-09, I have found a new role
and am not currently looking for work. Thank you very much for
your interest, regardless!
A printable, one-page summary of this resumé is available in
A4 or US Letter format.
Email: anthony@foiani.com
Physical: 1543 Rainier Ave / Napa, California, USA / 94558
Phone: +1 650 296 9563
Also available via Zoom or Google Meet.
“... the most important function that software builders
do for their clients is the iterative extraction and refinement
of the product requirements. ... in planning any software
activity, it is necessary to allow for an extensive iteration
between the client and the designer as part of the system
definition.”
— Fred Brooks, The Mythical Man-Month
Communication is a critical component of any successful system;
we must understand the problem we're solving before we can build
a solution.
I've been privileged to work for companies of many sizes, on
projects ranging from single-person to teams of hundreds. In
every case, communication was vital: requirements, constraints,
stakeholders, plans, prototypes, reviews, implementation,
maintenance, and evolution.
In many of these cases, my ability to understand technical
systems together with the human and organizational side has
allowed me to translate across multiple groups and dramatically
reduce misunderstanding.
Between users and implementors: Does this do what needs to be
done? Is the usage clear? What are the corner cases? Where do
you think this might go in the future?
Between different teams of implementors: What are the
interfaces? What really needs to be exposed? What technologies
are we assuming / preferring / avoiding? What platforms will be
used?
Within a single team: How can we structure this for simplicity?
Can we generalize it? Can we re-use existing tools? How do we
document this, especially for maintenance?
Between individuals, there's mentoring and education. I've
discovered that I'm good at this, and it surprised me how
fulfilling I found this aspect of my roles.
I have a knack for extracting requirements from users and
managers; an ability to help different parties gain a shared
understanding; the technical background to understand systems
deeply; and a passion for encouraging the best solution for each
situation while minimizing overall complexity.
Expert:
- Mentoring:
  - New employees
  - Internal transfers
- Training:
  - Research & Development
  - Composition & Presentation
  - Training-the-Trainer
- Facilitating Communication:
  - Understanding multiple parties
  - Translating between them
- Promoting Best Solution:
  - Identify strengths and weaknesses of systems
  - Evaluate groups’ needs against those systems
  - Optimize and compromise to minimize complexity
Skilled:
- Extracting Requirements: user studies, similar systems,
design patterns
Expert: |
Skilled: |
Some Experience: |
- C
- C++ (+ Boost)
- Makefiles
- Perl
- Regular Expressions
- Shell Scripts
- Terraform
|
- awk
- Emacs-Lisp
- Java
- Javascript / Typescript
- Python
- sed
- SQL (+ Optimization)
- Visual Basic for Applications (VBA)
|
|
Expert:
- Linux (1995+): userspace, kernel, and embedded
- Amazon Web Services (depth)
- U-Boot firmware loader
Skilled:
- Containers / Virtual Machines
- Amazon Web Services (breadth)
- Monitoring: Cronitor, DataDog
- Microsoft Windows (3.1+): power user, some programming
- Other Unix-derived systems: Solaris, SunOS, HP/UX, AIX
- Mac OS X (2001+): power user, unix-mode programming
- X11 (1988+): power user, some programming
- Older mainframe, micro, and embedded systems
- (and some older ones)
Skilled:
- IAM
- EC2
- Networking (VPCs, security groups, subnets, gateways)
- RDS (Multi-AZ, MySQL)
- S3
- Load Balancers (ELB, ALB)
- CloudFront
- ACM
- CodeBuild
Some Experience:
- ECS / Fargate
- CloudFormation
- Elastic Beanstalk
- Multi-Region Deployment
- ElastiCache (Redis)
- AWS Single Sign-On
Expert:
- JSON
- Email (SMTP, DKIM, SPF)
- Unicode (e.g., UTF-8)
- XML, HTML
- ASN.1
- CSV, other ETL
Skilled:
- Compression Algorithms and Archive Formats
- Protocol Buffers
- PNG, GIF, PBM, JPEG
- SVG
- PDF, PostScript
- LaTeX, nroff
- RTF
Skilled:
- Threat Modeling
- Identity Providers (SSO, SAML)
- PKI (Certificates, CA, X509)
- Hardware Tokens / U2F
- Secure Shell (SSH)
- OpenSSL (command-line and API)
- CMS
- ASN.1
Skilled:
- Technical Controls
- Creating and improving processes
- Maintaining Certification (SOC 2, ISO 27001)
- Generating evidence for audits
Expert:
- Linux Kernel customization
- Toolchains Creation and Use
- I2C Bus
- Flash Memory (NOR vs NAND, MTD / UBI etc)
- Realtime Constraints
- Hardware Interfaces
Skilled:
- Boot Loader
- Device Tree
- Serial Ports (RS-232, RS-485, etc)
- Oscilloscopes / Logic Analyzers
Expert:
- Test-Driven Development
- Refactoring
- POSIX Threads
- C++ RAII
- Profiling / Optimization (both low- and high-level)
Skilled:
- Design Patterns
- Pair Programming
- Input Fuzzing
- Packaging
- Disassembly / Reverse Engineering
- Google RPC
Expert: |
Skilled: |
Some Experience: |
|
|
- Mercurial (hg)
- Perforce (g4)
|
Expert: |
Skilled: |
|
- IP (TCP, UDP)
- BSD Sockets API
- HTTP
- Telnet
- FTP
|
- FTP
- NTP
- SMTP
- SNMP
- SSH
- SSL/TLS
- TFTP
|
Expert (but probably slightly dated as of 2022):
- Production Deployments: GCL/BCL, Borg (especially
Dedicated Machines and SSD), GSLB, cluster migration,
capacity planning
- Monitoring: GMon, Monarch, Mash, Viceroy, BorgMon, Nebgua
- Data Storage: Colossus, Effingo, Placer, BigTable, Piper
- Data Analysis: GoogleSQL (Dremel, F1), gqui
- Search Technology: SuperRoot, Laelaps, Muppet, Raffia, Union, ST-BTI, FBM
Skilled:
- KeyStore
- Piccolo
- Spanner (especially Spanner Queues / Manifold)
Skilled:
- Nuclear Safeguards Instrumentation (Neutron Counting hardware and software)
- Open Source licensing
- Journeyman-level Electronics
Some Experience:
- Computer Graphics
- Basic competency in German
Many older skills have been moved to
another document.
Mountain View, California, USA; November 2020 — June 2022
At the time of my departure, Production Engineering was still
the only 24×7 rotation within the company; while daytime
alerts were directed at more specific teams, we were the only
ones available outside business hours.
- Was primary oncall for 24×7 week-long rotations with a
5-minute time-to-action SLO
- Handled incident response (ongoing management, postmortems, reviews)
- Triaged (and sometimes handled) miscellaneous requests from other users ("interrupts")
- Updated documentation (playbooks, checklists)
- Mentored teammates as they went oncall
- Ran "table top" oncall exercises, complete with postmortem writeups
Airtable maintained SOC 2 and ISO 27001
certifications. Keeping these certifications required regular
work; some scheduled (e.g., review who has access to which
systems every quarter), and some on demand (evidence gathering,
security patching).
- Was technical contact on the Prod Eng team for Compliance team
- Performed and streamlined Quarterly Access Reviews
- Provided evidence for SOC 2 and ISO 27001 audits
- Helped remediate security concerns
Airtable restricted access to sensitive environments to a small
number of engineers. This access required a separate laptop and
specific Security Team approval; coordinating that process for
dozens of users required documenting, revising, and finally
optimizing the steps required. (This was especially true as the
duties that used to be on a single team were spread out to
almost a dozen.)
- Enabled dozens of users to have "full production access", including training
- Evolved the onboarding process: optimization, documentation
- Handled tool evolution
Mountain View, California, USA; October 2013 — July 2020
Original Tech Lead for the Site Reliability Engineer (SRE) team
formed to support and productionize the YouTube Trust & Safety
tools (for managing abuse, fraud, child safety, etc).
- Founding member of a new team of 2 SREs, which grew to 5 within a year:
- Created infrastructure (permissions, mailing lists, group memberships, etc)
- Established team culture
- Initiated regular meetings with product developers
- Reviewed incident postmortems alongside product developers
and management, and helped clear a backlog of prior incidents
- Investigated multiple aspects of existing
developer-supported systems, then presented that knowledge to
multiple groups as well as to the new SREs
- Explored multiple options for consolidating and hosting
those systems (investigation and initial feasibility)
- Assisted emergency response to the emerging COVID-19 situation
First member of the SRE team dedicated to managing YouTube's
“content discovery” systems: Search,
Personalization, Watch Next, Recommendations, etc.
- Worked closely with developers to deploy a custom instance of
the Google WebSearch technology stack for YouTube content
- Optimized that search stack by using more advanced container /
cluster features (saved ~20% out of O(1M) CPU cores)
- Senior member of a 24×7 oncall rotation with a 5-minute
response SLA
- Mentored 10+ new/junior SREs to full solo oncall capability
- Managed services deployed globally on millions of CPU cores:
cluster migrations, organic growth, feature launches, multiple
releases per week
- Handled multiple large public-facing incidents, including
postmortem creation, analysis, and followup
- Worked closely with our sister team in Zürich,
Switzerland (multiple in-person trips, weekly video
conferences, daily status handoff emails)
- Co-owned responsibility for our “panic room”
(providing privileged access to production networks in case of
an on-corp / in-office network outage)
This wasn't a distinct role; instead, it calls out the areas
where I specialized and provided extra value to my teams.
- Mentored many peers on a 1-to-1 basis:
- Brought 10+ SREs to full solo oncall ability
- Maintained a list of resources for new SREs
- Supported other oncallers during their shifts:
- Was often the designated contact person for new SREs
during their first few solo shifts
- Was the YouTube SRE group expert in multiple technologies
(e.g., Search, Laelaps, GoogleSQL).
- As a senior member of the overall team, often helped
manage and resolve massive incidents involving multiple
shards of YouTube SRE.
- Developed, presented, and trained others to present courses on topics including:
- GCL — a custom configuration language with highly unusual semantics
- GSLB — Google's global load balancer, which routes trillions of requests per second
- ST-BTI — an obsolescent storage and indexing solution which a team wasn't yet able to migrate away from
- YouTube Search — a custom instance of the Google WebSearch stack
- “Going On-Call for Developer Rotations” —
introduced hundreds of developers to the principles of going oncall
- Helped fellow Googlers across the company with questions in my fields of expertise:
- GCL: the most widely used configuration language
within Google
- BigQuery / GoogleSQL: Google's implementation of
standard SQL on top of petabytes of protocol buffers
(and gqui, an ad hoc query engine for those same files)
- Production Management: especially with rarely-used
edge cases at the intersection of virtual and physical
machines
- Regularly received “peer bonuses” for this
assistance (being beyond my regular job duties)
- Interviewed 50+ candidates:
- Specifically volunteered for interviews with candidates
from more diverse backgrounds (and received a “peer
bonus” for using inclusive language in my
feedback)
- Became a “calibrated” interviewer (my scores
were generally aligned with other interviewers' and
ultimately with those of the hiring committees)
- Helped build and maintain team culture and cohesiveness:
- Researched and initiated inclusive events
- Planned and ran multiple off-site activities
- Took photographs to share with the team, organization, and company
Albuquerque, New Mexico; July 2009 — October 2013
- Worked with a multidisciplinary, international group; contributions included:
- Adapted Linux and supporting libraries to custom embedded processor
- Designed and built custom software for realtime data acquisition
- Provided high security data transfer and device configuration
- Wrote and generated in-depth API / extension documentation
- Advised a team new to Linux and many other
current technologies (XML, HTTP, TLS, NTP, etc).
- Integrated many technologies while creating a long-running
unattended data acquisition system, including:
- Busses: I²C, USB, PCI
- Security: OpenSSL+OpenSC+PKCS11, tamper sensors
- Web-Based UI: HTML, JavaScript, AJAX, CSS
- Real-Time Processing: Threads, Watchdogs, Optimizations
- Worked with electrical engineers and digital designers
- Provided project administration: version control, builds, and
release management
San Diego, California; October 2004 — July 2009
- Adapted existing system for serving data to hundreds of
thousands of nodes around the world
- Documented a substantial corpus of existing code
- Created a system for publishing that documentation to company standards
- Put Yahoo! Music onto mobile phones:
- Multi-tier architecture (J2EE, Tomcat, AXIS, AJP)
- Multi-client presentation (WAP, XHTML, JSP)
- Dealt with non-traditional (Mobile vs. PC) browser factors: memory,
display, latency, bandwidth
- Used custom packaging / deployment technology (similar to RPMs);
became the site expert on it (out of 70-100 engineers)
- Pushed technology envelope within a large company
- Early Linux (RHEL4) adopter
- Early J2EE (Apache/Tomcat/AXIS) adopter
- Re-organized some 60TiB of business-critical live data
- Coordinated 10 people doing various aspects of necessary work
- Wrote before/after comparison scripts to validate the operation
- Designed and wrote helper utilities to make the move
transparent to client processes
- Shared in-depth knowledge of Perl and Unix on company
mailing lists
- Facilitated an “add your own map” mashup on the Y! Maps site
- Implemented a fast graph search algorithm for a remote colleague
- Contributed to various open-source packages
San Diego, California; November 2001 — acquired by Yahoo! in October 2004
Inherited and extended a distributed audio processing system:
- Updated and enhanced a heterogeneous cluster of processing nodes
- Rewrote DSP core in C++
- Handled multiple standards (MP3, AAC, WMA; DRM / no-DRM; tagging)
- Managed 60+ terabytes of audio data and associated metadata
- Helped build rules-based metadata engine for popular and classical audio tracks
- Helped evaluate various encoding / processing schemes
- Worked with Oracle (versions 8, 9, 10)
- Wrote, debugged, supported, and optimized DDL, DML, and bulk loading
- Implemented and supported a mod_perl-based administrative interface
(including DHTML features)
- Generated weekly metadata builds providing streaming audio to 30M desktops
- Optimized legacy system to accommodate 100x original design capability
- Helped scale related subsystems
- Extended and optimized browser-based administration tools
- Supported and extended existing systems
- Answered Unix / shell / Perl questions
- Optimized database queries
Older entries have been moved to the historical file.
Bachelor of Science in Computer Science and
Math, with a minor in German.
New Mexico State University
Las Cruces, New Mexico
Date of graduation: May 1995
GPA: 3.00 out of 4.00
I follow and contribute answers to many lists, including:
I contribute answers and a few patches to many lists, including:
References are available upon request.