Anthony J. Foiani
[A printable, one-page summary of this resumé is available in A4 or US Letter format.]
Email: anthony@foiani.com
USPS: 414 Kehoe Ave / Half Moon Bay, California, USA / 94019
Phone: +1 650 296 9563
Also available via Google Hangouts / Meet, and Zoom
My most recent role was as a Senior Site Reliability Engineer at
YouTube / Google. I enjoyed working with their amazing
technology, learning the ins and outs of global service
provisioning, and dealing with incidents (managing and
mitigating them as they occur, and then running a postmortem
process to ensure the systems never fail that way again).
For me, though, the most fulfilling part of the job was the
educational and mentoring aspect. Google was all about scaling
sub-linearly -- having more impact than a single person can have
with their own action. I found teaching others to be a wonderful
example of this: I can't do everything, and I can't write all
the tools to do everything, but if we have enough smart
engineers, we can get closer to that goal.
I also enjoy working with problem space owners -- "application
domain experts" -- and helping them translate their needs onto
available technology, or finding out if there are gaps where we
need new technology.
My favorite roles have combined those two features: education,
and expertise. I very much enjoy researching and investigating a
system in depth, and then finding ways to help others within the
company apply that technology stack to solve their
problems. (And I make sure to learn the complementary systems,
so I know when I can honestly recommend some *other* system to
teams. I'm not about growing my "turf"; I'm about using the best
technology for the job.)
I'm currently based out of the San Francisco Bay Area, but I'm
open to many different locations, both within the USA and
abroad, and also remote work opportunities.
Within the USA, I'd be open to almost any of the typical tech
hubs -- Seattle, Portland, Bay Area, San Diego, Denver, Austin,
etc. I haven't spent as much time on the East Coast, so I don't
have strong opinions there, but I'd probably be ok anywhere I
can live with trees around. :-)
I can speak a bit of German, and am comfortable everywhere I've
been in Western Europe.
(Regardless, I expect most work to be remote through most of
2021, so discussions on eventual location can probably be
deferred.)
I have a knack for extracting requirements from users and
managers; an ability to help different parties gain a shared
understanding; the technical background to understand systems
deeply; and a passion for encouraging the best solution for each
situation while minimizing overall complexity.
- Expert:
-
Mentoring: new employees, or internal transfers
Creating Training: research, development, presentation, training-the-trainer
Facilitating Communication: understanding each party
individually, then helping create a mapping between them
Promoting Best Solution: learning the strengths and weaknesses
of multiple systems, then helping other groups evaluate
their needs against those systems, encouraging compromises
to reduce final system complexity
- Skilled:
-
Extracting Requirements: user studies, similar systems, design patterns
- Expert:
- C++ (+ Boost), Java, Perl, C, Shell Scripts, Makefiles,
Regular Expressions
- Skilled:
- Python, JavaScript, Oracle SQL (+ DDL Optimization), Emacs-Lisp, awk, sed, VBA
- Some Experience:
- Go, Tcl, Fortran (and many more...)
- Expert:
-
Linux (1995+): userspace, kernel, and embedded
U-Boot firmware loader
- Skilled:
-
Containers / Virtual Machines
Microsoft Windows (3.1+): power user, some programming
Other Unix-derived systems: Solaris, SunOS, HP/UX, AIX
Mac OS X (2001+): power user, unix-mode programming
X11 (1988+): power user, some programming
Older mainframe, micro, and embedded systems
- Expert:
- Protocol Buffers, XML, JSON, HTML, ASN.1, UTF-8, CSV
- Skilled:
-
Character Encodings, Compression Algorithms and
Archive Formats, PNG, SVG, PDF, PostScript, LaTeX,
RTF, nroff
- Skilled:
-
PKI (X509, CA)
Smart Cards / Crypto Tokens / U2F
Secure Shell (SSH)
OpenSSL (command-line and API)
CMS, ASN.1
- Expert:
-
I2C Bus
Toolchains Creation and Use
Linux Kernel customization
Flash Memory (NOR vs NAND, MTD / UBI etc)
Realtime Constraints
Hardware Interfaces
- Skilled:
-
Boot Loader
Device Tree
Serial Ports (RS-232, RS-485, etc)
Oscilloscopes / Logic Analyzers
- Expert:
-
POSIX Threads
Test-Driven Design
C++ RAII
Google RPC
Profiling / Optimization (both low- and high-level)
- Skilled:
-
Design Patterns
Pair Programming
Input Fuzzing
Disassembly / Reverse Engineering
- Expert:
- git, SVN, Perforce (g4)
- Skilled:
- Mercurial (hg), CVS
- Expert:
-
IP (TCP, UDP), BSD Sockets API
HTTP, Telnet, FTP
- Skilled:
- SSL/TLS, SSH, FTP, NTP, TFTP, SNMP, SMTP
- Expert:
-
Production Deployments: GCL/BCL, Borg (especially Dedicated Machines and SSD),
GSLB, cluster migration, capacity planning;
Monitoring: GMon, Monarch, Mash, Viceroy, BorgMon, Nebgua;
Data Storage: Colossus, Effingo, Placer, BigTable, Piper;
Data Analysis: GoogleSQL (Dremel, F1), gqui;
Search Technology: SuperRoot, Laelaps, Muppet, Raffia, Union, ST-BTI, FBM
- Skilled:
-
KeyStore, Piccolo, Spanner (especially Spanner Queues / Manifold)
- Skilled:
-
Nuclear Safeguards Instrumentation (Neutron Counting hardware
and software)
Open Source licensing
Journeyman-level Electronics
- Some skill with:
-
Computer Graphics
Basic competency in German
Many older skills have been moved to another document.
Mountain View, California, USA; October 2013 — July 2020
Original Tech Lead for the Site Reliability Engineer (SRE) team formed
to support and productionize the YouTube Trust & Safety tools (for
managing abuse, fraud, child safety, etc).
-
Founding member of a new team of 2 SREs, which grew to 5 within a year:
- Created infrastructure (permissions, mailing lists, group memberships, etc)
- Established team culture
- Initiated regular meetings with product developers
- Reviewed incident postmortems alongside product developers
and management, and helped clear a backlog of prior incidents
- Investigated multiple aspects of existing
developer-supported systems, then presented that knowledge to
multiple groups as well as to the new SREs
- Explored multiple options for consolidating and hosting
those systems (investigation and initial feasibility)
- Assisted emergency response to the emerging COVID-19 situation
First member of the SRE team dedicated to managing YouTube's
“content discovery” systems: Search, Personalization, Watch Next,
Recommendations, etc.
-
Worked closely with developers to deploy a custom instance of
the Google WebSearch technology stack for YouTube content
-
Optimized that search stack by using more advanced container /
cluster features (saved ~20% out of O(1M) CPU cores)
-
Senior member of a 24×7 oncall rotation with a 5-minute
response SLA
-
Mentored 10+ new/junior SREs to full solo oncall capability
-
Managed services deployed globally on millions of CPU cores:
cluster migrations, organic growth, feature launches, multiple
releases per week
-
Handled multiple large public-facing incidents, including
postmortem creation, analysis, and followup
-
Worked closely with our sister team in Zürich,
Switzerland (multiple in-person trips, weekly video
conferences, daily status handoff emails)
-
Co-owned responsibility for our “panic room”
(providing privileged access to production networks in case of
an on-corp / in-office network outage)
This wasn't a distinct role; instead, it calls out the areas
where I ended up specializing and providing extra value to my
teams.
-
Mentored many peers on a 1-to-1 basis:
- Brought 10+ SREs to full solo oncall ability
- Maintained a list of resources for new SREs
-
Supported other oncallers during their shifts:
-
Was often the designated contact person for new SREs
during their first few solo shifts
-
Was the YouTube SRE group expert in multiple technologies
(e.g., Search, Laelaps, GoogleSQL).
-
As a senior member of the overall team, often helped
manage and resolve massive incidents involving multiple
shards of YouTube SRE.
-
Developed, presented, and trained others to present courses on topics including:
- GCL — a custom configuration language with highly unusual semantics
- GSLB — Google's global load balancer, which routes trillions of requests per second
- ST-BTI — an obsolescent storage and indexing solution which a team wasn't yet able to migrate away from
- YouTube Search — a custom instance of the Google WebSearch stack
- “Going On-Call for Developer Rotations” —
introduced hundreds of developers to the principles of going oncall
-
Helped fellow Googlers across the company with questions in my fields of expertise:
-
GCL — the most widely used configuration language
within Google
-
BigQuery / GoogleSQL — Google's implementation of
standard SQL on top of petabytes of protocol buffers
(and gqui, an ad hoc query engine for
those same files)
-
Production Management — especially with rarely-used
edge cases at the intersection of virtual and physical
machines
-
Regularly received “peer bonuses” for this
assistance (being beyond my regular job duties)
-
Interviewed 50+ candidates:
-
Specifically volunteered for interviews with candidates
from more diverse backgrounds (and received a “peer
bonus” for using inclusive language in my
feedback)
-
Became a “calibrated” interviewer (my scores
were generally aligned with other interviewers' and
ultimately with those of the hiring committees)
-
Helped build and maintain team culture and cohesiveness:
- Researched and initiated inclusive events
- Planned and ran multiple off-site activities
- Took photographs to share with the team, organization, and company
Albuquerque, New Mexico; July 2009 — October 2013
-
Worked with a multidisciplinary, international group; contributions included:
- Adapted Linux and supporting libraries to custom embedded processor
- Designed and built custom software for realtime data acquisition
- Provided high security data transfer and device configuration
- Wrote and generated in-depth API / extension documentation
- Advised a team new to Linux and many other
current technologies (XML, HTTP, TLS, NTP, etc).
- Integrated many technologies while creating a long-running
unattended data acquisition system, including:
- Busses: I²C, USB, PCI
- Security: OpenSSL+OpenSC+PKCS11, tamper sensors
- Web-Based UI: HTML, JavaScript, AJAX, CSS
- Real-Time Processing: Threads, Watchdogs, Optimizations
- Worked with electrical engineers and digital designers
- Provided project administration: version control, builds, and
release management
San Diego, California; October 2004 — July 2009
- Adapted existing system for serving data to hundreds of
thousands of nodes around the world
- Documented a substantial corpus of existing code
- Created a system for publishing that documentation to company standards
- Put Yahoo! Music onto mobile phones:
- Multi-tier architecture (J2EE, Tomcat, AXIS, AJP)
- Multi-client presentation (WAP, XHTML, JSP)
- Dealt with non-traditional (Mobile vs. PC) browser factors: memory,
display, latency, bandwidth
- Used custom packaging / deployment technology (similar to RPMs);
became the site expert on that technology (out of 70-100 engineers)
- Pushed technology envelope within a large company
- Early Linux (RHEL4) adopter
- Early J2EE (Apache/Tomcat/AXIS) adopter
- Re-organized some 60TiB of business-critical live data
- Coordinated 10 people doing various aspects of necessary work
- Wrote before/after comparison scripts to validate the operation
- Designed and wrote helper utilities to make the move
transparent to client processes
- Shared in-depth knowledge of Perl and Unix on company
mailing lists
- Facilitated an “add your own map” mashup on the Y! Maps site
- Implemented a fast graph search algorithm for a remote colleague
- Contributed to various open-source packages
San Diego, California; November 2001 — acquired by Yahoo! in October 2004
Inherited and extended a distributed audio processing system:
- Updated and enhanced a heterogeneous cluster of processing nodes
- Rewrote DSP core in C++
- Handled multiple standards (MP3, AAC, WMA; DRM / no-DRM; tagging)
- Managed 60+ terabytes of audio data and associated metadata
- Helped build rules-based metadata engine for popular and classical audio tracks
- Helped evaluate various encoding / processing schemes
- Worked with Oracle (versions 8, 9, 10)
- Wrote, debugged, supported, and optimized DDL, DML, and bulk loading
- Implemented and supported a mod_perl-based administrative interface
(including DHTML features)
- Generated weekly metadata builds providing streaming audio to 30M desktops
- Optimized legacy system to accommodate 100x original design capability
- Helped scale related subsystems
- Extended and optimized browser-based administration tools
- Supported and extended existing systems
- Answered Unix / shell / Perl questions
- Optimized database queries
Older entries have been moved to the historical file.
Bachelor of Science in Computer Science and
Math, with a minor in German.
New Mexico State University
Las Cruces, New Mexico
Date of graduation: May 1995
GPA: 3.00 out of 4.00
I follow and contribute answers to many lists, including:
I contribute answers and a few patches to many lists, including:
References are available upon request.