Monday, August 22nd
9:00am
A walk to remember: Debugging a distributed system failure
Debugging distributed systems comes with a different set of complications than other fields in our industry. Each system may behave differently depending on the environment it runs in, and this nondeterministic behavior makes the process more challenging. When the debugging happens in a production environment, the risk increases and our nerves get the better of us.
The debugging process for a distributed system is rarely the same twice. We therefore need a toolbelt ready to attack the problem from different fronts, but we also need to be ready to back off once we've gathered enough information to do a proper analysis.
This talk will walk you through the debugging process for an issue on an OpenStack deployment, and the strategy used, from both a technical and a non-technical perspective.
Principal Software Engineer
Flavio spends most of his time hacking on storage and messaging modules. Prior to Red Hat, he worked on Big Data oriented applications, search engines, and messaging systems. He was also an active member of GNOME's a11y team, where he contributed to Orca and created MouseTrap, a head-tracking application. Outside Red Hat, Flavio likes to take pictures, swim, run, travel, hang out with family and friends, and do whatever seems interesting. He has both Italian and Venezuelan roots and is currently based in Italy, where he works remotely for Red Hat. Flavio is also an active open-source contributor, part of the MongoDB Masters group, and an active Rust language contributor.
10:00am
Getting the Word Out: Membership, Dissemination and Population protocols
We are building an instrumentation platform that runs across dozens of datacenters to provide operational visibility for internal systems and applications. This platform must remain up as much as possible and allow support and operations staff to understand and diagnose problems quickly. They must be able to ask questions like "what machines and applications are publishing metrics?", "what systems appear to be offline?", "what order did these errors occur in?", all without consulting every datacenter. Furthermore, they must be able to change configuration quickly, with confidence that every affected system will receive and act upon it.
To help with these problems, we are implementing several recently developed protocols for cluster membership, epidemic broadcast, and monotonic time. These protocols, respectively, let us know which nodes are peers, disseminate configuration and status information, and agree on the approximate relative order of events. Best of all, they are all synchronization-free, meaning we can achieve our goals while remaining highly available. In this talk, we'll discuss the protocols we chose, the challenges of implementing them, and some preliminary results from deploying them across our infrastructure.
Software Engineer
Sean Cribbs is a distributed systems and web architecture enthusiast, currently building innovative cloud and infrastructure software at Comcast Cable. Previously, Sean spent five years with Basho Technologies contributing to nearly every part of Riak including client libraries, CRDTs and tools. In his free time, he has ported Basho’s Webmachine HTTP server toolkit from Erlang to Ruby, created a popular parser-generator for Erlang, and has contributed to many other open-source projects, including Chef, Homebrew, and Radiant CMS.
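The abstract does not name the specific protocols. As a rough illustration of why epidemic (gossip) broadcast suits dissemination without synchronization, here is a minimal Python simulation, assuming a simple push-only model in which every informed node forwards an update to a fixed number of random peers each round. The function `push_gossip_rounds` and its `fanout` parameter are hypothetical names for this sketch; it ignores failures, message loss, and membership changes that real protocols must handle.

```python
import random


def push_gossip_rounds(num_nodes: int, fanout: int, seed: int = 0) -> int:
    """Simulate push-style epidemic broadcast (a hypothetical, simplified model).

    Node 0 starts with an update. In each round, every informed node pushes the
    update to `fanout` distinct random peers. Returns the number of rounds until
    every node has the update.
    """
    if num_nodes < 1 or fanout < 1:
        raise ValueError("need at least one node and a fanout of at least one")
    fanout = min(fanout, num_nodes - 1)  # cannot contact more peers than exist
    rng = random.Random(seed)  # seeded so runs are reproducible
    informed = {0}
    rounds = 0
    while len(informed) < num_nodes:
        rounds += 1
        newly_informed = set()
        for node in sorted(informed):
            # Sample indices among the other num_nodes - 1 nodes, skipping `node` itself.
            for k in rng.sample(range(num_nodes - 1), fanout):
                newly_informed.add(k if k < node else k + 1)
        informed |= newly_informed
    return rounds
```

No node coordinates with any other, yet the number of rounds grows only roughly logarithmically with `num_nodes`, which is what makes gossip attractive for pushing configuration and status across many datacenters.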
11:00am
Conquering Chaos to Land Humans on Mars
Landing on Mars is challenging for many reasons, but primarily because a vehicle cannot be tested in a Mars environment before it flies at the planet. Unable to do flight tests at Mars, we rely on Earth-based ground and flight tests, wind tunnels, and computer simulations for mission design, test, and verification. The simulations need to include models of all variables that affect flight. The models include characterizations of the atmosphere, aerodynamics, surface terrain, vehicle mass properties, engines, and landing sensors, to name a few. Additionally, uncertainties are included in each model to account for known unknowns, and margin is added to protect against unknown unknowns. This process for coping with chaos has been used successfully by NASA five times in the past 20 years to land robotic missions on the surface of Mars, most recently the 900 kg Curiosity rover in 2012.
This presentation will review the overall Mars entry, descent and landing flight design process, including simulation development and the various approaches used to land rovers on Mars. Additionally, the presentation will describe how the techniques developed for robotic missions are being applied to landing humans on Mars.
Alicia M. Dwyer Cianciolo, Aerospace Engineer
Alicia Dwyer Cianciolo is an aerospace engineer at the NASA Langley Research Center. She specializes in developing simulations to analyze vehicle flight through different atmospheres in the solar system. Primarily focusing on Mars over the past 15 years, she has worked on several missions to the planet, including the Odyssey and Reconnaissance Orbiter aerobraking operations, the Exploration Rovers, and as a member of the Entry, Descent and Landing Team that successfully landed the Curiosity rover on Mars in August 2012. She is currently supporting NASA's next lander mission to Mars, InSight, and is working to analyze entry technologies that will enable human exploration of the planet. She holds a Bachelor of Science degree in Physics from Creighton University and a Master of Science degree in Mechanical Engineering from The George Washington University.
1:00pm
Fetching Moths from the Works: Correctness Methods in Software
We live in a nice world. There's a wealth of historical thought on achieving correctness in software (shipping code that does only what is intended, no less and no more), and there are a whole bunch of methods available to us as practitioners. Some of these are hard to apply, some are easy. For instance, case testing is widely used and considered standard practice. Property testing is understood to exist but not widely used. The application of advanced logics? Way out there.
If you look around, you'll find that a lot of software fails a lot of the time. Why is that?
In this talk, I'll give an overview of the methods for producing correct systems and discuss each in its historical context. For each method, we'll keep an eye out for present-day applications and how difficult they are to pull off. We'll discuss why there's so much buggy software in the world. I expect there will be a bit of talk about spaceships.
By the end of this talk you ought to be able to make reasoned decisions about applying correctness methods in your own work and have a good shot at building better software.
2:00pm
Voice Controlled ChatOps
So you've seen Tony Stark (Iron Man) talk to his computer J.A.R.V.I.S. (Just A Rather Very Intelligent System), right? Voice-controlled ChatOps is pretty much like talking to J.A.R.V.I.S. It is a much more natural interface than a keyboard and mouse. You talk to the hardware (Alexa/Raspberry Pi), and it sends the commands off through the Alexa Skills Kit (ASK) to AWS Lambda. From there, they make it to the chat room, where Hubot takes over and integrates with the rest of the operations toolchain (PagerDuty, New Relic, deployment, etc.).
Senior Automation and System Engineer
Aaron Blythe has worked with software for over a decade. He is currently a Sr. Automation and Release Engineer working remotely for Hearst. He is genuinely curious and interested in understanding things and making them better. He has co-organized the Kansas City DevOps community meetup for the past few years.
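The pipeline described above (voice device, then Alexa Skills Kit, then AWS Lambda, then the chat room where Hubot acts) can be sketched as a Lambda-style handler in Python. This is a minimal sketch, not the speaker's implementation: it assumes the standard Alexa Skills Kit request/response JSON shape, and the `DeployIntent` intent, its `service` slot, the `hubot deploy` command text, and the `post_to_chat` callable (standing in for whatever integration delivers a message to the room Hubot listens on) are all hypothetical.

```python
from typing import Any, Callable, Dict


def make_handler(post_to_chat: Callable[[str], None]):
    """Build a Lambda-style handler that turns an Alexa intent into a chat command.

    `post_to_chat` is a hypothetical hook (for example, a call to a chat webhook
    that Hubot listens on); it is injected so the handler can be tested offline.
    """

    def speak(text: str) -> Dict[str, Any]:
        # Minimal Alexa Skills Kit response envelope with plain-text speech.
        return {
            "version": "1.0",
            "response": {
                "outputSpeech": {"type": "PlainText", "text": text},
                "shouldEndSession": True,
            },
        }

    def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
        request = event.get("request", {})
        if request.get("type") != "IntentRequest":
            return speak("What would you like me to do?")
        intent = request.get("intent", {})
        if intent.get("name") == "DeployIntent":  # hypothetical intent name
            service = intent.get("slots", {}).get("service", {}).get("value")
            if not service:
                return speak("Which service should I deploy?")
            # Hubot picks the command up from the chat room and runs the deployment.
            post_to_chat(f"hubot deploy {service}")
            return speak(f"Asked Hubot to deploy {service}.")
        return speak("Sorry, I don't know that command.")

    return handler
```

Injecting the chat hook keeps the handler free of network calls, so it can be exercised with a hand-built event before being wired to a real skill and chat room.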
3:00pm
Replacing a Jet Engine Mid-flight, or How We Launched New Architecture for a Planet-Scale Distributed System at Google
As our systems evolve and establish themselves as the go-to solution for a problem domain, the need eventually arises to rethink the system's architecture to better support the most popular (and potentially unanticipated) use cases and growth. Often, this results in a significant rewrite of the system. In globally distributed systems, like Google's distributed build system, which serves millions of requests per day, downtime is not an option. In this talk, we'll look at how we replaced the previous production system with a new architecture, and how we did so with no downtime and no user-visible effects.
4:00pm
Practical Accommodations for Mental Health
We have accommodations for many physical health issues. But when it comes to mental health, things are pretty abysmal, even though mental health is incredibly important. Based on conversations with other folks and my own experience, I will present some practical ways that mental health can be accommodated at work.
Laura Ku, Software Engineer
Laura is a software engineer at a company called CarbonFive during the day and a super hero fighting monsters at night. One of those two is probably a lie. She also thinks that React is starting to make everything look like a nail.
Tuesday, August 23rd
9:00am
The Edge of Chaos
In the software industry, we are regularly faced with understanding complex systems--which can only be understood holistically, and not as the sum of their parts--with the limited capacity of our human brains. Moreover, our complex systems are developed and operated by people, which makes our overall sociotechnical systems complex *adaptive* systems with emergent behavior that wasn't contemplated when the system was first designed.
In this talk, we'll describe the nature of this problem and cover coping strategies drawn from control theory and other non-software contexts. We'll use this point of view to describe *why* Agile software development, DevOps, and microservices are business imperatives for avoiding falling over the "Edge of Chaos".
Senior Fellow
Jon Moore is a Senior Fellow at Comcast Cable, where he leads the Core Application Platforms group that focuses on building scalable, performant, robust software components for the company's varied software product development groups. His current interests include distributed systems, fault tolerance, building healthy and engaging engineering cultures, and Texas Hold'em. Jon received his Ph.D. in Computer and Information Science from the University of Pennsylvania and currently resides in West Philadelphia, although he was neither born there nor raised there and does not spend most of his days on playgrounds.
10:00am
Mesos: Automate your Data Center with Containers
Maybe you've heard of Mesos, that thing like Kubernetes? Or perhaps you read that Twitter is powered by open-source infrastructure? Is Mesos meant for continuous delivery or for microservices? In this talk, David Greenberg, author of the O'Reilly book “Building Applications on Mesos,” will introduce you to Mesos. Through the development of a hypothetical sample application, we'll learn how Mesos makes it easy to host scalable, fault-tolerant application servers, continuous integration, and even your databases. We'll also learn about the open-source and proprietary software that Mesos can automatically and reliably deploy for you, such as Spark, WordPress, and Jenkins. By the end of this talk, you'll be equipped to evaluate Mesos and understand its place in your software development projects.
Author, Consultant
David Greenberg loves learning new things. He is an independent consultant who previously worked at Two Sigma, where he led the effort to rebuild their computing infrastructure. His desire to learn has led him to study Russian, and he enjoys practicing cooking techniques. He's interested in high-performance software and distributed systems with Mesos. He's the author of the O'Reilly book "Building Applications on Mesos" and the designer of Cook, a Mesos framework written in Clojure and Datomic that coordinates containers to optimize task scheduling.
11:00am
Assigning Meaning to Programs
In 1967, future Turing Award winner Robert Floyd wrote a seminal paper in formal program validation entitled "Assigning Meanings to Programs." In the paper, Floyd describes a technique that bounds each step in a computation with logical predicates on its input and output conditions.
In this talk, we will use the same technique to understand how and why a program behaves the way it does. We'll explore techniques to automate call graph visualizations, create logical predicates for the steps in program execution, and discuss what conclusions we can draw for our own work.
By the time you leave, you will feel comfortable applying this technique to gain a deep understanding of the behavior of your code. This approach will help you squash a well-hidden bug, refactor overly complex code into simpler modular units, or better comprehend a code base that is new to you.
This talk is suitable for programmers of all skill levels from novice to seasoned.
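As a small illustration of Floyd's idea, here is a Python sketch that attaches predicates to points in a simple loop: a precondition on the input, an invariant at the loop head, and a postcondition on the result. Floyd's method proves that such assertions hold for all inputs; the runtime `assert` statements below only check them on the inputs actually executed, so treat this as a dynamic approximation of the technique rather than a proof. The function `sum_below` is a hypothetical example, not taken from the talk.

```python
def sum_below(n: int) -> int:
    """Return 0 + 1 + ... + (n - 1), annotated with Floyd-style assertions."""
    # Precondition (predicate on the input).
    assert n >= 0, "precondition: n must be non-negative"
    total, i = 0, 0
    while True:
        # Loop invariant at the loop head (the "cut point" in the flowchart):
        # 0 <= i <= n and total == 0 + 1 + ... + (i - 1) == i * (i - 1) // 2.
        assert 0 <= i <= n and total == i * (i - 1) // 2, "invariant violated"
        if i == n:
            break
        total += i
        i += 1
    # Postcondition: the invariant plus the exit condition (i == n) give the result.
    assert total == n * (n - 1) // 2, "postcondition violated"
    return total
```

Each assertion plays the role of a predicate Floyd would attach to an edge of the program's flowchart; proving the program correct means showing that every path between two annotated points takes a state satisfying the first predicate to one satisfying the second.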
1:00pm
Murphy's Law for Conferences
Don't believe in Murphy's Law? Throw a conference and get back to me. Want to learn about organizing a conference, or just pick up a few tips for your next event? Join me, Amanda Harlin, and I'll share some insight on how we make Thunder Plains happen without losing our cool.
2:00pm
Living on the Edge
Joshua Bloch said, "Public APIs, like diamonds, are forever," and Antoine de Saint-Exupéry said, "Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." However, good APIs don't always spring to life fully formed. How, as a client or a server, can you deal with changes? We'll demonstrate with real-world lessons learned while implementing the ever-changing FHIR standard.
Principal Architect
Jenni is a Principal Architect at Cerner, and has worked on various service architectures for 15 years. Since 2008, Jenni has been working specifically with web services and now concentrates on implementing and providing input to the evolving Fast Healthcare Interoperability Resources (FHIR) standard. When not working, Jenni chases her two children around, reads, and plays video games.
3:00pm
Testing the hard stuff and staying sane
Even the best test suites can't entirely prevent nasty surprises: race conditions, unexpected interactions, faults in distributed protocols, and so on still slip past them into production. Yet writing even more tests of the same kind quickly runs into diminishing returns. I'll talk about new automated techniques that can dramatically improve your testing, letting you focus on what your code should do rather than on which cases should be tested, with plenty of war stories from the likes of Ericsson, Volvo Cars, and Basho Technologies to show how these new techniques really enable us to nail the hard stuff.
Professor & Founder of QuviQ
John Hughes has been a functional programming enthusiast for more than thirty years, at the universities of Oxford and Glasgow and, since 1992, at Chalmers University in Gothenburg, Sweden. He served on the Haskell design committee, co-chairing the committee for Haskell 98, and is the author of more than 75 papers, including "Why Functional Programming Matters," one of the classics of the area. With Koen Claessen, he created QuickCheck, the most popular testing tool among Haskell programmers, and in 2006 he founded Quviq to commercialise the technology using Erlang.
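To make the idea behind QuickCheck-style testing concrete, here is a hand-rolled Python sketch of its two core ingredients: random test-case generation against a stated property, and shrinking a failing input down to a minimal counterexample. It is not the API of QuickCheck or of Quviq's Erlang tool; `check_property`, `shrink`, and the deliberately buggy `dedup_sort` are hypothetical names for this illustration, and production tools (including the state-machine models used to hunt race conditions) go much further.

```python
import random
from typing import Callable, List, Optional

Prop = Callable[[List[int]], bool]


def shrink(prop: Prop, xs: List[int]) -> List[int]:
    """Greedily remove single elements while the property keeps failing."""
    changed = True
    while changed:
        changed = False
        for i in range(len(xs)):
            smaller = xs[:i] + xs[i + 1:]
            if not prop(smaller):
                xs, changed = smaller, True
                break
    return xs


def check_property(prop: Prop, trials: int = 200, seed: int = 0) -> Optional[List[int]]:
    """Test `prop` on random integer lists; return a shrunk counterexample or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-10, 10) for _ in range(rng.randint(0, 8))]
        if not prop(xs):
            return shrink(prop, xs)
    return None


def dedup_sort(xs: List[int]) -> List[int]:
    # Deliberately buggy "sort": converting to a set silently drops duplicates.
    return sorted(set(xs))


# Property: dedup_sort must agree with Python's built-in sorted() on every input.
counterexample = check_property(lambda xs: dedup_sort(xs) == sorted(xs))
print(counterexample)  # a minimal counterexample: two equal integers
```

The test author states only what the code should do (agree with `sorted`); the tool chooses the cases, and shrinking turns a noisy random failure into the smallest input that still exposes the bug.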
4:00pm
Data-Driven Software Mastery
What if we could measure the indirect costs of pain building up on a software project? What if we could measure the loss of productivity, the escalating costs and risks, and could steer our projects with a data-driven feedback loop?
By measuring the friction in "Idea Flow", the flow of ideas between the developer and the software, we can create a data-driven feedback loop for learning what works. Rather than making decisions based on anecdote and gut feel, we can start driving our improvement decisions with real data.
Data-Driven Software Mastery is about learning and improving faster than ever.
Find out how you can:
- Identify the biggest causes of productivity loss on your software project
- Avoid spending tons of time solving the wrong problems
- Collaborate with other industry professionals in the art of data-driven software mastery
Idea Flow gives us a universal language for describing our experience, and sharing our knowledge of what works. With a feedback loop, we can even run experiments!
Idea Flow turns the development community into a scientific community.
CTO
Janelle is an NFJS Tour Speaker, author of the book Idea Flow: How to Measure the PAIN in Software Development (leanpub.com/ideaflow), and founder of Open Mastery (openmastery.org), an industry collaborative learning network focused on mastering the art of software development with a data-driven feedback loop.
She founded Open Mastery to rally the industry to work and learn together, breaking down the wall of ignorance between managers and developers that drives our software projects into the ground. By making the pain visible with Idea Flow, we have a universal definition of effective practice, a universal language for sharing our experiences, and an opportunity to learn together like never before. Open Mastery is about taking the industry to a whole new level of effectiveness by working together.
Aside from Open Mastery, Janelle has been working with New Iron for the last 10 years, as a developer, consultant, and now as CTO. Her development background is specialized in data-intensive analytic systems from financial core processors to factory automation, supply chain optimization and statistical process control (SPC). Her consulting work has focused on Continuous Delivery infrastructure, database automation, test automation strategies, and helping companies identify and solve their biggest problems.