K Young

Founder at Mortar. 6'8, so I’m the biggest big data expert in the world.

Read this first

How’s that data lake?


What’s a data lake

A “data lake” is a single data store that ideally holds all of an enterprise’s data. The benefit of a lake architecture is that you can safely and easily access your data from many end-points such as dashboards, user-facing applications, or even your CRM. Edd Dumbill has a good overview in Forbes.

If you’ve never been able to use all your data together before—and most companies have not—then a data lake is a huge improvement. So the idea is popular.

I don’t love data lakes

I don’t love data lakes for three reasons:

  1. They leave thorny questions like “reuse” and “validity” to individual application implementers, rather than providing an architecturally consistent framework to address those problems, and that means that too often they are dumping grounds for data. Sure, everything ends up there, and if you’re willing to take the time to pick through it, you...


Reports of AWS’s death are greatly exaggerated

Right now AWS has the most complete and mature cloud offering, far and away.

But a few weeks back there was buzz that AWS might be in trouble. Earnings were off, and people felt that this down quarter might mark the end of AWS’s awesome ascent. They reasoned:

  1. Competition from other clouds is heating up
  2. AWS services are poor knock-offs of other, better services
  3. AWS has terrible support
  4. AWS isn’t clearly the lowest cost public cloud
  5. Once a company gets to a certain scale, it is more cost-effective to move to private hardware.

Many of these arguments were proposed by Brad Feld in his excellent AWS scorpion post. By coincidence, I ran into Brad on the street the day I wrote the first draft of this article, so we discussed our different points of view. More on that discussion at the end of the post.

Anyway, let me address these points one by one.

1. Competition from other clouds is



I’m freakishly good with email. You can be, too.


I haven’t lost track of an email in three years. Since at Mortar I’m the CEO (Chief Email Officer), that gives me a huge advantage. But I wasn’t always so good. In fact I was downright bad until I discovered one stupid trick: date-labels.

How it works

Sometimes I look at my inbox and it has an email in it. So I do one of three things:

  1. answer immediately
  2. ignore forever
  3. label the email with a date (which could be today if I plan to answer later)

I then remove the email from my inbox.
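
If you think in code, here is a minimal sketch of that triage loop in Python. It is a toy model and not tied to any mail client: `Mailbox`, `Action`, `triage`, and `due` are all made-up names for illustration. The only idea it encodes is that every email leaves the inbox, and a deferred one gets parked under a date label.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class Action(Enum):
    ANSWER_NOW = "answer immediately"
    IGNORE = "ignore forever"
    DEFER = "label with a date"


@dataclass
class Mailbox:
    # Hypothetical in-memory stand-in for an inbox plus date labels.
    inbox: list[str] = field(default_factory=list)
    labels: dict[date, list[str]] = field(default_factory=dict)

    def triage(self, email: str, action: Action, on: Optional[date] = None) -> None:
        """Apply one of the three choices, then remove the email from the inbox."""
        if action is Action.DEFER:
            if on is None:
                raise ValueError("a deferred email needs a date label")
            self.labels.setdefault(on, []).append(email)
        # Answered, ignored, or deferred: either way it leaves the inbox.
        self.inbox.remove(email)

    def due(self, today: date) -> list[str]:
        """Open today's label and hand back everything queued under it."""
        return self.labels.pop(today, [])


box = Mailbox(inbox=["invoice", "newsletter", "intro request"])
box.triage("invoice", Action.ANSWER_NOW)
box.triage("newsletter", Action.IGNORE)
box.triage("intro request", Action.DEFER, on=date(2015, 1, 5))
```

After those three calls the inbox is empty, and `box.due(date(2015, 1, 5))` returns the one deferred email so it can go through the same three choices again.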

Here’s what my labels looked like on June 24, 2014 (look at the left side-bar):
[Screenshot: date labels in the left side-bar, June 24, 2014]

Yes, I have at least one email I’ll act on in January 2015, all queued up for 6 months from now.

Ongoing usage

When I come in to work in the morning, I open the label for today’s date, and deal with every email in there, making the same choice as above: answer it now, ignore it forever, kick it down the road to a...


Don’t start an algorithm business.


If you’re thinking about starting a company around proprietary algorithms… don’t!

The types of companies I’m most concerned about are:

  • recommendation engine companies
  • predictive analytics companies
  • natural language processing companies

Any strictly technical advantage is a very tough thing to maintain, but I believe that algorithms are the most tenuous advantage of them all. Why? Algorithms are often easy to swap, and their advantages can be esoteric. Right now you can get high-quality, open source algorithms for all of the applications above for free, and open innovation is continuing at a dizzying pace.

This last point, “open”, is the most important. If you sell algorithms, then you probably sell them closed-source, which means your customers can’t, you know, customize them. So your value prop is a “good enough” implementation that’ll get your customers to a solution fast. The problem...


Devops for data science. Or: Repeated failure FTW!

Darwin never actually said the following quote, but it’s truthy so I’ll use it:

It is not the strongest of the species that survive, nor the most intelligent, but the one most responsive to change. —Darwin-ish

I was watching a talk by Josh Wills the other day. He was applying Lean engineering concepts to data science. To illustrate how important rapid learning is, he told a story about the team that built the Gossamer Condor and won the Kremer Prize for human-powered flight. They won because they failed more often than their competition. I’m sure their competition was brilliant, and they definitely had several years’ head start, but they lost because it took them maybe a year to iterate from design, to build, to test flight, to crash and destroy, and go back to design again. The Gossamer Condor team’s breakthrough insight was this: if the Condor could be repaired and improved in...


Coffee / exercise / food / sleep. Or: Finding balance at a startup.

There are just four pages in Drastic Change, the opening chapter of Hoffer’s “The Ordeal of Change”, but they’ve stuck with me for years. The chapter was written in the context of social change in the 1960s, but its succinct insights about human nature and reaction to change go far beyond that time and place.

I’ve worked at startups for my entire career. And startups are one of the most drastic change-y things you can throw yourself into: your role, the market, the technical infrastructure, the requirements—everything is in constant, radical flux. And in reaction to that flux, many people find themselves working very, very hard. Sometimes people are proud of how hard they work. Hoffer comments:

“A workingman sure of his skill goes leisurely about his job and accomplishes much though he works as if at play. On the other hand, the workingman new to his trade attacks his work as if he...


Data is a (bad) representation of reality. So data science will save the world.

Regardless of your worldview, I think there is near-universal agreement that humanity is facing a bristling array of existential threats. Here are a few obvious ones:

  • An airborne, massively transmissible, deadly disease
  • Nuclear war
  • True economic meltdown
  • Environmental collapse

In fact, it seems to me that the possibility of none of these threats materializing in the next 100 years is damn slim… unless we collectively get a lot smarter about detection, decision, and action.

But I think we may be able to save ourselves. Because we have lots of data, and (to a lesser degree) we have data scientists.


Most data is an abstract representation of the world, whether it is the decoded genome of a disease, untold reams of data that represent possible future states of the world in large-scale simulations, or measurements of honeybee populations. Unfortunately, data is a very thin...
