Thursday, July 18, 2013

Functional Programming and Stupid Examples

Sometimes it's hard to see why functional programming is good.

People say it's elegant, but to understand why, you might have to take a whole online course, read an entire book, decrypt a research paper, or study a whole functional language.

I've found some nice articles and Q&A threads here and there, but the info is scattered. Also, I'm not happy with the examples given. I wanted something "stupid" and informal.

This is my good-faith effort to summarize the main characteristics and benefits of functional programming. I include examples that favor easier understanding over real-world usefulness, in the hope that you'll be able to apply functional programming to your own use cases.

I will assume you have programming experience and are familiar with the concepts of abstraction and reuse. Also, the code used in this post is pseudocode. It just kind of looks like JavaScript, so please don't yell at me if you can't run the code. :)

What is functional programming?


Functional programming means functions are everywhere. For example, functions can be passed as arguments and can be return values.

A function is something that you "call" to get results. When you call a function with the same parameters, it always returns the same result.

Data in functional programming is immutable, because if data could change between function calls, a function couldn't guarantee that it returns the same result.
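Here is a minimal Python sketch of the difference, using hypothetical names: `add` is pure, while `add_to_counter` depends on (and changes) state outside its body, so calling it twice with the same argument gives different results. The tuple shows immutable data: instead of changing it, you build a new one.

```python
# Pure: the result depends only on the arguments.
def add(a: int, b: int) -> int:
    return a + b


# Impure: the result depends on mutable state outside the function.
counter = 0


def add_to_counter(a: int) -> int:
    global counter
    counter += a
    return counter


# Immutable data: a tuple can't be changed in place, so we build a new one.
prices = (3, 5, 8)
discounted = tuple(p - 1 for p in prices)

print(add(2, 3), add(2, 3))                  # 5 5
print(add_to_counter(1), add_to_counter(1))  # 1 2
print(prices, discounted)                    # (3, 5, 8) (2, 4, 7)
```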

Why is functional programming good?


You would probably agree that reuse is good. Abstraction leads to reuse. Functional programming gives you more ways to abstract stuff than non-functional programming does.

Since there are a lot of ways to abstract stuff, you get to use abstraction a lot. When you use it more, you get better at it.

When you use abstraction a lot, you tend to think about solving problems at a higher level. Your code looks more like a solution to the problem than a list of instructions telling the computer what to do. In other words, your code is more declarative.

In short, I think functional programming makes you a better programmer, and that's why it's good.

Common functional programming features


When functions are everywhere, these common features follow naturally.

Functions can be saved to variables.

You would save a number to a variable.

You can save a function to a variable too.
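A minimal Python sketch of both cases. `boil`, `minutes`, and `cook` are hypothetical names chosen for illustration:

```python
def boil(food: str) -> str:
    return "boiled " + food


# Save a number to a variable.
minutes = 10

# Save a function to a variable: no parentheses, so boil is not called here.
cook = boil

print(minutes)      # 10
print(cook("egg"))  # boiled egg
```

`cook` and `boil` now name the same function object, so calling either one does the same thing.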

Functions as parameters.

You would pass an object as a function parameter.

You can pass a function to a function as well.
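A minimal Python sketch, with hypothetical cooking functions: `grill_meat` only lets the caller choose the meat, while `cook` also takes the cooking method as a function parameter.

```python
from typing import Callable


def grill(meat: str) -> str:
    return "grilled " + meat


def fry(meat: str) -> str:
    return "fried " + meat


# Passing data: the caller picks the meat, but the method is fixed.
def grill_meat(meat: str) -> str:
    return grill(meat)


# Passing a function: the caller picks the meat AND how to cook it.
def cook(meat: str, method: Callable[[str], str]) -> str:
    return method(meat)


print(grill_meat("pork"))   # grilled pork
print(cook("pork", fry))    # fried pork
print(cook("fish", grill))  # grilled fish
```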

As you can see, non-functional programming allows you to abstract over data (what kind of meat to cook), but functional programming allows you to abstract over actions (how to cook a given type of meat).

Functions as return values.

You would return an object from a function.

You can return functions as well.
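A minimal Python sketch, assuming a hypothetical `make_cooker` that builds and returns a new cooking function for a given method:

```python
from typing import Callable


def make_cooker(method: str) -> Callable[[str], str]:
    # Build and return a new function; nothing is cooked yet.
    def cooker(food: str) -> str:
        return method + " " + food

    return cooker


steam = make_cooker("steamed")
roast = make_cooker("roasted")
print(steam("fish"))  # steamed fish
print(roast("pork"))  # roasted pork
```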

Lambdas

Sometimes, you don't need to name all of your values. Instead of this.

You would rather write this.

Same for functions. Sometimes, you don't need to name all your functions. Instead of this.

You would rather write this.
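A minimal Python sketch of both cases; `dishes`, `limit`, and `by_length` are hypothetical names. The lambda passes the same sorting logic as `by_length` without giving it a name.

```python
dishes = ["pork over rice", "fish", "soup"]

# Naming a value you only use once...
limit = 2
print(dishes[:limit])
# ...versus using the value directly.
print(dishes[:2])


# Naming a function you only use once...
def by_length(dish: str) -> int:
    return len(dish)


print(sorted(dishes, key=by_length))
# ...versus an anonymous function (a lambda).
print(sorted(dishes, key=lambda dish: len(dish)))  # ['fish', 'soup', 'pork over rice']
```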

Closures

When functions can be used everywhere, we need to decide what the function bodies can see.

It turns out that it's useful when a function can see variables defined outside its body.

Suppose we want to make some sauce from some ingredient.

Every time we want to make some secret sauce, we would then have to pass the secret ingredient.

But if we can define functions that can see outside variables, we would be able to hide the secret_ingredient.

And if you need to make secret sauce in many places in the code, you won't need to pass secret_ingredient around.

A function bundled together with the outside variables it can see is called a closure. Closures are another way programmers can hide data. Hiding data like this is called encapsulation, which is another form of abstraction.
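A minimal Python sketch of the secret-sauce example. `make_sauce`, `make_secret_sauce_maker`, and the `"anchovy"` ingredient are hypothetical:

```python
from typing import Callable


def make_sauce(ingredient: str, secret_ingredient: str) -> str:
    return ingredient + " sauce with " + secret_ingredient


# Without a closure, every caller has to know and pass the secret ingredient.
print(make_sauce("tomato", "anchovy"))


def make_secret_sauce_maker(secret_ingredient: str) -> Callable[[str], str]:
    # The inner function can see secret_ingredient from the enclosing scope,
    # and keeps it after make_secret_sauce_maker returns.
    def make_secret_sauce(ingredient: str) -> str:
        return make_sauce(ingredient, secret_ingredient)

    return make_secret_sauce


make_secret_sauce = make_secret_sauce_maker("anchovy")
print(make_secret_sauce("tomato"))  # tomato sauce with anchovy
print(make_secret_sauce("chili"))   # chili sauce with anchovy
```

Code that calls `make_secret_sauce` never sees the secret ingredient. It lives only inside the closure.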

Currying

Let's make dinner.

Suppose we want to make
1) pork over white rice.
2) fish over white rice.
3) pork over brown rice.
4) fish over brown rice.

We could write a function like this.

However, you can see that white rice is used twice and brown rice is used twice. We can refactor the function like this instead.
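A minimal Python sketch of both versions. `make_dinner` and `make_dinner_curried` are hypothetical names. Python doesn't curry automatically, so the curried version explicitly returns a function.

```python
from functools import partial
from typing import Callable

white_rice = "white rice"
brown_rice = "brown rice"


# One function, two arguments: rice is passed on every call.
def make_dinner(rice: str, meat: str) -> str:
    return meat + " over " + rice


print(make_dinner(white_rice, "pork"))  # pork over white rice


# Curried: take the rice now, return a function that takes the meat later.
def make_dinner_curried(rice: str) -> Callable[[str], str]:
    def with_meat(meat: str) -> str:
        return make_dinner(rice, meat)

    return with_meat


over_white_rice = make_dinner_curried(white_rice)
over_brown_rice = make_dinner_curried(brown_rice)
print(over_white_rice("pork"))  # pork over white rice
print(over_white_rice("fish"))  # fish over white rice
print(over_brown_rice("pork"))  # pork over brown rice
print(over_brown_rice("fish"))  # fish over brown rice

# The standard library can fix the first argument for you (partial application).
also_over_white_rice = partial(make_dinner, white_rice)
print(also_over_white_rice("pork"))  # pork over white rice
```

Strictly speaking, `functools.partial` does partial application rather than currying, but in practice it gets you the same convenience here.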

The benefits will be clearer if you make 10 more dishes with white rice. The refactored version lets you avoid passing white_rice every time you cook a white rice dish.

Passing one argument at a time to a multiple-argument function, creating new functions along the way, is called currying.

The function implementation above might look a little weird, but some programming languages support currying natively, so you don't have to explicitly return a function from a function.

Currying is also another kind of abstraction: abstracting away parameters.

Recursion

Since data is immutable in functional programming, you can't really loop from i=0 to i=100, because you have no way to increment i.

However, when you think about it, one of the main reasons we loop is to process a data structure: we want to do something to each "node" in it. This forces us to think at a higher level, which is good because it maps better to the problem we're trying to solve than to how a computer works.

Moreover, when data structures are more complex, recursion generally gives simpler code to reason about. You can easily loop over a list, but what about a tree or a graph?

The code written with recursion usually doesn't have any "boilerplate code" such as setting up the variable i to have the correct value.
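A minimal Python sketch of recursion over a list and over a tree. `total`, `Node`, and `tree_total` are hypothetical names. Neither function sets up or increments a counter.

```python
from dataclasses import dataclass
from typing import List, Tuple


def total(numbers: List[int]) -> int:
    # Base case: an empty list sums to 0. No loop counter i to set up.
    if not numbers:
        return 0
    # Recursive case: the first number plus the total of the rest.
    return numbers[0] + total(numbers[1:])


@dataclass(frozen=True)
class Node:
    value: int
    children: Tuple["Node", ...] = ()


def tree_total(node: Node) -> int:
    # Same idea for a tree: this node's value plus the totals of its subtrees.
    return node.value + sum(tree_total(child) for child in node.children)


print(total([1, 2, 3]))  # 6
tree = Node(1, (Node(2), Node(3, (Node(4),))))
print(tree_total(tree))  # 10
```

Python limits recursion depth and doesn't optimize tail calls, so for very long lists a loop is the idiomatic choice there. Languages built around functional programming handle recursion like this much better.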

Recursion is another exercise that forces you to think high-level, which makes you a better programmer.

More features


In functional programming, you apply the idea of using functions to everything. These are some high-level descriptions of other features often found in functional programming. Since I want to keep this post "stupid", I won't dive into the details of these features.

Calling functions on types

When you can call functions on types to create new types, you have algebraic data types and generics.

Determining which function is called

You can create objects with different constructors. At some point, you need to determine which constructor was called to create a given object and extract data from that object. Pattern matching is a nice way to do that, and it has more general uses than this too.

Lazy evaluation

Since functional programming guarantees that functions called with the same parameters always give the same results, we have freedom in the order of evaluation. In a nutshell, we can be "lazy" and evaluate expressions only when their values are actually used. This has many benefits, such as skipping work whose result is never needed and being able to work with infinite data structures.
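Python evaluates eagerly by default, but generators let you opt in to laziness. Here is a minimal sketch of an infinite sequence where only the requested values are ever computed. `squares` is a hypothetical name, and the `computed` list exists only to record which values were evaluated.

```python
from itertools import count, islice
from typing import Iterator, List

computed: List[int] = []  # instrumentation: which n were actually evaluated


def squares() -> Iterator[int]:
    # Conceptually infinite: each square is computed only when requested.
    for n in count(1):
        computed.append(n)
        yield n * n


first_three = list(islice(squares(), 3))
print(first_three)  # [1, 4, 9]
print(computed)     # [1, 2, 3]
```

In a lazily evaluated language such as Haskell, this behavior is the default rather than something you opt in to.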

That's it


Functional programming gives us more ways to abstract stuff. It also forces us to think at a higher level. These make us better programmers.

One thing to watch out for is trying to be too clever and applying these abstractions everywhere, even when they're not needed.

As an extreme closing example, if you wanted to get 0, you would just write this.

I hope you don't write something like this.
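For illustration, here is a deliberately over-engineered, hypothetical way to get 0 in Python, next to the obvious one:

```python
from functools import reduce

# Just write this.
zero = 0

# Please don't write this: start at 0, then apply +1, *2, and -2 in turn.
steps = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 2]
also_zero = reduce(lambda acc, step: step(acc), steps, 0)

print(zero, also_zero)  # 0 0
```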



Monday, May 13, 2013

The startup secret sauce


Through some reading, I found that profitable startups
1) make sure there's a market
2) solve a problem for the market with a solution
3) make the market know about their solution
4) make money from selling their solution
5) tweak the solution continuously

To achieve those things, you need a few basic skills: market research, building stuff, advertising, pricing, and risk taking.

There's only one rule you need to know for each of these skills.

The golden rule for market research: know your customers.
The golden rule for building stuff: use the right tools.
The golden rule for advertising: customers care about themselves.
The golden rule for pricing: price based on value that customers receive.
The golden rule for risk taking: validate ideas inexpensively.

Done. Follow these rules and you have a profitable startup.

So why am I not rich?

Because I don't go out there and do it. I've never run a startup before, and I'm writing as if I know everything. I just read stuff and bluff. I'm a phony.

It's easy to know how to do things, but it's a different story to actually do them. Skilled musicians don't just read books; they practice.

There's no "secret sauce" in creating a profitable business. It just requires a lot of practice.

I think spending all day reading books and blogs on starting a business is a waste of time.

Yet so many people spend all day reading books and blogs on starting a business.

If there's one thing I've learned from reading, it's to stop reading too much and just do it. Read when you need references or best practices, but only when you're about to do it or after you've done it.

P.S. Just to be clear, when I say read stuff, I'm talking about stuff related to starting a business. Of course, I would still read other stuff for entertainment, inspiration, etc.

Friday, November 23, 2012

All Productivity Systems Are The Same

I've been reading here and there about productivity, both on the web and on paper. A lot of the info out there tends to either:
  • list very specific things to increase productivity (e.g. don't check your email every 10 seconds)
  • list less specific things to increase productivity (e.g. find a time you can focus and work during that time)

Maybe I haven't looked around enough, but that's kind of the pattern I noticed.

The way I see it, at the end, it all boils down to 4 things:
  1. increase available time
  2. do what's important
  3. do it efficiently
  4. do it sustainably
It's up to you to know how to achieve these 4 things in your specific circumstance, and the tips you find online or in books can serve very well as brainstorming tools.

Several months ago, I mentioned what agile was. Now I'm going to take a step back and look at the more generic view of productivity and how several development techniques, as well as day-to-day examples, fall into this framework.

(1) Increasing Time

A person given 2 hours will be able to watch more episodes of Family Guy than a person given 20 minutes. The example is contrived, but I hope you wouldn't argue against the idea that more time means a higher probability of getting more work done.

The main things people do to increase available time are automating or delegating tasks and reducing "waste".
  • Automated testing saves hours of manual testing.
  • Delegating tasks to developers frees up time for the lead developer to work on important issues.
  • Having an assistant reduces time needed to keep track of things and context switching.
  • Disconnecting cable TV saves many hours a day.

(2) Doing what's important

There's not much point in working on something that gives little or no value.

The main thing people do here is prioritize. Now, prioritizing isn't the easiest thing to do. Priorities shift a lot, and sometimes only somebody else can tell you what is important. But there are general fixes. Priority shifts can be handled by frequent reflection; in the software world, we call these tight feedback loops. As for unknown priorities, you just ask someone who knows.

Examples of prioritizing:
  • Asking clients which feature is most important to them when they can't have them all.
  • Finishing presentation slide contents before making them look beautiful.
  • Fixing bugs found in unit tests instead of waiting for them to reach production.
  • Having daily meetings and re-prioritizing issues if something comes up.

(3) Doing it efficiently

Once you know what to do, you learn how to do it efficiently.

The main things people do here are improving skills, leveraging tools, and being healthy.
  • Use IDEs for code editing.
  • Learn to factor code well.
  • Learn to set your favorite channel instead of switching channels one by one.
  • Learn how to use email templates.
  • Have enough sleep and eat a good breakfast. Vitamins also stimulate the brain.

(4) Doing it sustainably

All of the above can be accomplished but for how long? You need to keep yourself motivated.

This has more science to it than I originally thought. It turns out that our brains are programmed to work in a certain way. There's the emotion part and the logical part. The emotion part favors immediate and positive results. It doesn't care for long term goals. And most importantly in the long run, the emotion part almost always wins. That's why we hit snooze when we know we should wake up and work. And that's why we can force ourselves to wake up early but not for too long.

If you are a human, your brain works this way. No exception (unless you're not a human). So take advantage of how your brain works.

An effective way of keeping yourself motivated is to keep enticing yourself with small wins. They are easy to achieve, and you feel good about it right away. A lot of small wins snowball into big wins and that keeps you motivated further.

Examples of motivation techniques:
  • Daily to-do lists instead of monthly ones.
  • Working on small and easy tasks when stuck on a more complex one.
  • Starting to work out 5 minutes a day instead of an hour a week.

How I got started

I'm not a productivity expert. This is just information I deduced from reading and experience. But so far, what has worked for me is asking myself these questions every day and tweaking my process based on what works well and what doesn't.

The most important thing was I started small. I listed something that I could finish in 10 minutes and I got it done. The next day, I would add a little more. I accomplished something small and let it snowball.

It all started from there. And now I've been able to manage much more than before. A LOT more than before.

Maybe it will work for you too.

Tuesday, November 13, 2012

Hack the scope, Engineer the solution


Lately, it seems like startups have become the center of attention. Coworking spaces and tech meetups have sprung up everywhere. The term "hack", which used to have a negative connotation, is now viewed in a more positive light. And given the agile and lean movements, it appears to me that hacking things up is considered cool.

Before I get to my point, it's worth talking about two great companies with different principles.

After some reading and talking to people, my understanding is that Facebook has the "hacker culture", where people try to make things happen as quickly as possible, test it out, and fail fast if it doesn't work. I think video on Facebook was hacked in one day or so.

On the other hand, Google "engineers" software. If you don't know Jeff Dean, look him up. I bring him up to give you a peek into the amount of science that goes into building software at Google.

This is pure speculation. What I think about the two companies is possibly inaccurate, but that doesn't change what I'm trying to get at. I wanted to talk about hacking versus engineering.

Given these definitions:

  • hack = come up with something quickly that works
  • engineer = create something based on well-thought and sound principles

When you code, how often do you hack and how often do you engineer?
1) always hack
2) try to engineer, but hack when there are time constraints
3) always engineer

Test-driven development has made (1) a viable solution. We can come up with something quickly, and if something needs changing in the future, we have test cases to prevent regressions. I think a lot of startups go with (1) just to validate ideas or create demos for investors.

In established companies, I would think (2) is what a lot of people do. I do it too. During slow days, I have all the time to think very carefully about the design of the software I'm writing. But when there are tight deadlines, I tend to write something that works, and add a TODO comment to come back and finish it later (if I don't forget).

I think governments and financial institutions do (3).

Although I do (2) a lot, I'm not really happy with it, because most of the time there are time constraints. I feel bad when I have code that works but could have been written better with a little more time. Well, everybody would write better code given more time, but I recently came to the realization that we can still write good code even when we have the same amount of time.

It is possible to always engineer software. What we need to do is hack the scope instead. So I'll add (4) hack the scope and engineer the solution.

So what does this mean? Let me give an example.

Suppose I need to write a noob-detector. It's a device whose alarm goes off when noobs are around. I have this HUGE third-party library that claims that it implements the best noob-detecting algorithm in the world. However, its API is so complicated that it takes months to learn how to use it properly, and I need it finished this weekend.

If I went the (2) route, I would randomly tinker with the API until it kind of did what I wanted it to do. At first glance, it would seem to work: it detects noobs when I test it. However, I might not be aware that the way I used the library only detects Asian noobs. (By the way, Asian noobs are hard to find.)

But if I went the (4) route, I would see that the scope is too large, so I hack the scope instead. There's another library that's fairly simple to use but uses an inefficient algorithm. Still, its slowness is tolerable. I loosen the requirement that the software must run the state-of-the-art algorithm, and use this library. With this library, I can write beautiful code that does the right thing. In the end, this version is less buggy and correct according to the new requirement.

In a sense, I am tackling the problem as early as possible to save time, and this goes along well with productivity principles.

Sure, it sounds like common sense, but even after writing code for so many years, I still fall into trap (2) all the time. I'm getting better at avoiding it, though. I hope this helps others as well.

Saturday, July 21, 2012

Multi-Threaded Development Checklist

I've noticed a recurring pattern when developing multi-threaded applications:
  • application is single-threaded
  • throughput and latency are unacceptable
  • make application multi-threaded with coarse-grained synchronization
  • throughput and latency are still unacceptable
  • introduce finer-grained synchronization
  • deadlock
  • fix deadlock
A lot of synchronization is locking, but there are other primitives too, such as condition variables.

Even for locking alone, where the way to avoid deadlocks is fairly simple (just acquire the locks in a consistent order), implementing lock ordering turns out to be pretty hard to get right, and there are many pitfalls.

So I've compiled a "checklist" that would hopefully help reduce some frustration when writing multi-threaded applications. These are more like "what worked for me" and not necessarily rules to follow. Also I'm not an expert, so keep that in mind!

Here goes, the "Multi-threaded-development-checklist-that-works-for-Um" checklist.

Are you locking on an arbitrarily long event?

Are you writing to disk, sending network packets, or invoking user-supplied callbacks while holding a lock? If you do, whoever is waiting for that lock might be waiting for an arbitrarily long time as well.
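A minimal Python sketch of one way to avoid this, assuming a hypothetical `EventLog` class whose slow step is a caller-supplied `write` callback. The lock is held only long enough to swap out the pending list, and the slow call happens after the lock is released.

```python
import threading
from typing import Callable, List


class EventLog:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._events: List[str] = []

    def add(self, event: str) -> None:
        with self._lock:
            self._events.append(event)

    def flush(self, write: Callable[[List[str]], None]) -> None:
        # Hold the lock only long enough to take the pending events...
        with self._lock:
            pending = self._events
            self._events = []
        # ...then run the possibly slow write (disk, network, user callback)
        # without the lock, so add() callers never wait on it.
        write(pending)


log = EventLog()
log.add("started")
log.add("stopped")
log.flush(print)  # ['started', 'stopped']
```

Because `write` runs unlocked, a callback that itself calls `log.add()` doesn't deadlock on the non-reentrant `threading.Lock`.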

Is your application layered?

If you're using fine-grained locks, layer your application so the locks have levels as well. It's easier to reason about lock ordering if your classes are well-layered. For example, if class Parent holds a collection of class Child, the lock order Parent -> Child is intuitive to remember.

Are layers in the same level interacting?

From the example above, Child objects should not interact with each other. If they need to share some information, do it in the Parent class. Otherwise, finding the right lock order would be hard.

Do lower layers release their locks before calling upper layers?

If Child holds its own lock while calling a method in Parent that locks the parent, that's basically a violation of the Parent -> Child lock order.
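A minimal Python sketch of the fix, with hypothetical `Parent.on_child_changed` and `Child.increment` methods. The child copies what the parent needs while holding its own lock, releases that lock, and only then calls up into the parent.

```python
import threading
from typing import List


class Parent:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.updates: List[int] = []

    def on_child_changed(self, value: int) -> None:
        with self._lock:
            self.updates.append(value)


class Child:
    def __init__(self, parent: Parent) -> None:
        self._lock = threading.Lock()
        self._parent = parent
        self._value = 0

    def increment(self) -> None:
        with self._lock:
            self._value += 1
            value = self._value  # copy what the parent needs while locked
        # The child lock is released here, so calling up into Parent
        # cannot take the locks in Child -> Parent order.
        self._parent.on_child_changed(value)


parent = Parent()
child = Child(parent)
child.increment()
child.increment()
print(parent.updates)  # [1, 2]
```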

Are you aware of any locking inside your third-party libraries?

If you're using a third-party library that says it's thread-safe, then that library is probably using synchronization primitives as well. In our example, the layering could be Parent -> Child -> third-party object. What if the third-party object can invoke some method Child::Callback()? Then there might be a lock order violation, e.g. the third-party library holds its lock while invoking Child::Callback(), which locks the child.

Is anything blocking forever?

If you have condition variables, do all code paths notify them at some point? If you're blocking to wait for an event (for example, via epoll), do you have a way to preempt the wait or put a timeout on it?
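A minimal Python sketch covering both questions, assuming a hypothetical `Job` class. `run` notifies in a `finally` block so that every code path (including exceptions) wakes the waiters, and `wait` uses `threading.Condition.wait_for` with a timeout so it can't block forever.

```python
import threading
from typing import Callable


class Job:
    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._done = False

    def run(self, work: Callable[[], None]) -> None:
        try:
            work()
        finally:
            # Notify on every code path, including when work() raises.
            with self._cond:
                self._done = True
                self._cond.notify_all()

    def wait(self, timeout: float) -> bool:
        # Never block forever: wait_for returns False if the timeout expires first.
        with self._cond:
            return self._cond.wait_for(lambda: self._done, timeout=timeout)


job = Job()
print(job.wait(timeout=0.1))  # False: nobody has run the job yet
worker = threading.Thread(target=job.run, args=(lambda: None,))
worker.start()
print(job.wait(timeout=5.0))  # True
worker.join()
```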

Is your application modifying a snapshot when it's supposed to modify shared data (or vice versa)?

This bit us pretty hard recently at work, and we spent a couple of hours trying to find the cause. Suppose Parent contains a collection of Child. If, for example, you want to remove a Child from the collection but only remove it from a copy of the collection, the real shared data won't be modified at all, and all kinds of weirdness follow. So make sure the data you're acting on is actually the data you want to act on.
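A minimal Python sketch of the bug and the fix. Here Parent holds children as plain strings for brevity, and `snapshot`, `remove_child_buggy`, and `remove_child` are hypothetical names.

```python
import threading
from typing import List


class Parent:
    def __init__(self, children: List[str]) -> None:
        self._lock = threading.Lock()
        self._children = list(children)

    def snapshot(self) -> List[str]:
        # A copy that callers can read without holding the lock.
        with self._lock:
            return list(self._children)

    def remove_child_buggy(self, child: str) -> None:
        # Bug: this removes from a copy, so the shared list never changes.
        self.snapshot().remove(child)

    def remove_child(self, child: str) -> None:
        # Fix: modify the shared list itself, under its lock.
        with self._lock:
            self._children.remove(child)


parent = Parent(["alice", "bob"])
parent.remove_child_buggy("alice")
print(parent.snapshot())  # ['alice', 'bob']  (nothing was removed!)
parent.remove_child("alice")
print(parent.snapshot())  # ['bob']
```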

That's all I have for now.

Thursday, May 31, 2012

Life at a startup

Hurray! I've been working at a tech startup for exactly a year already. This is one of the best things that ever happened in my life. Let’s just say it’s a super awesome and fun experience.

This post will basically be a condensed note to myself in the future, but it might benefit you as well if you're interested in starting your own company. (Or if you already have a company.)

The topics are pretty random and are not ordered in any way.

startup != consulting != ecommerce


I personally don't think opening an online web design firm or selling stuff online is essentially a startup. You might, but that's your opinion.

I think startups are companies that try to solve problems for the masses. Facebook, for example, changed the way people interact. I don’t have the energy to call every one of my friends in Thailand every day, so I learn about them through Facebook. Facebook solved one of my problems.

If you have a good vision, then you'll be rewarded when you see your product help people.

When To Hire


My company has about 7 roles. At the beginning, the founder had to do all of them. As a result, he was too overwhelmed. That's when the hiring happened.

In established companies, you hire on a continuous basis because people leave all the time. However, in early-stage startups, you should hire based on need. If you spend too much time on customer support, you don't have time to improve your product. But if you don't answer customer calls, you'll end up losing business. You need a balance. If you're out of balance, it's probably time to hire.

Environment


Before I left my old job, I never thought work environment would be such a big deal, but it actually is. A big part of why I left my old job was that I was in a cube in a room with no windows and I developed wrist, back, and neck pain from sitting in that cube.

You don’t have to have a fancy office (and you probably won't have money to pay for an interior designer anyway), but a better-looking office will attract developers.

My advice is to at least make the office look nice, where "nice" means that if you invited your mom to the office, she wouldn’t complain.

Here are some concrete things that are good to have:

  • computers that are fast enough
  • sufficient lighting
  • big tables
  • private rooms for meetings
  • air conditioning + heaters that work well
  • clean bathrooms
  • earphones
  • cupboards enough for everyone
  • chairs with neck rests
  • microwave(s) and fridge(s)
  • big windows
  • coffee/tea
  • water coolers or ice makers
  • coat hangers

Do I need an accountant?


You certainly do, but you don't have to hire one. You only have to know where the money comes from and where it goes, so you know whether you’re going out of business or not. Oh, and you need to pay taxes correctly too. In my company, this is managed by another developer. She said she needed about three courses from a local community college to get from zero to workable knowledge. She spends one day each month on accounting.

Network Admin


It's good to have one if you can afford it, but you probably can't. Server/network setup (including contacting the cable company) is something programmers can learn easily. It's just not fun when you do it a lot. I remember seeing the founder driving to Chicago (2 hours away) at 8pm to switch a hard disk in the data center.

One thing to note is that when you're small, there aren't many security concerns to worry about. But once you're big enough to become a target of hackers (the bad type), you might need a network admin, or more of a network security expert.

How to grow? Do I need Sales / Marketing / Advertising?


You can probably do it by yourself for the first 10 customers, but if you want to scale, you need people to help. I'm stereotyping but most software developers don't have a gift for marketing.

At the first stage of your startup, try to get customers in the most straightforward way. It’s like hacking up a prototype. It doesn’t need to scale yet.

This includes cold calling as many people in your target audience as you can. Ask your mom and friends to sign up and spread the word. Go to conferences and make direct connections. Ads don’t make sense yet at this stage. If you have enough income, hire a sales rep to help with this.

Customer Service


Again, you won’t need any until you get overwhelmed by calls. We tried outsourcing, but it didn’t really work out: we lost the opportunity to know our customers better and improve our product.

So I think at an early stage, when you don’t have a lot of numbers to crunch, the best source of customer feedback is support calls, so you want somebody in-house.

Administrator (this is different from network admin)


A euphemism for “errand runner”. You as the company owner have to do everything from basic plumbing, to lighting installation, to making sure there's stationery and enough napkins. This is not too bad, because these things don’t happen too often. It’s just a little more work than taking care of your own two-bedroom apartment with 5 other roommates.

Software Developers


Every role is important, but since I like to self-promote what I do, I personally feel that a company needs great software developers.

You’ll probably be looking for generalists who are not tied to a specific set of technologies, and don’t mind working on anything (including tedious stuff). You want somebody who can tolerate imperfect code to get things done, and at the same time, feel bad about imperfect code because it's not beautiful.

Perks


Many startups can afford to buy their employees all the meals they want; some just can’t. But that doesn’t mean you have to be cheap. Once in a while, you can take your employees out. My company buys us lunch once a week. That’s a good perk given the size of our company. As a general rule, be generous with perks if they improve work efficiency and teamwork, as long as they're reasonable.

Pay


Of course you don’t have 250k to pay your employees, but do research to find out what competitive market pay is in your area and at least match that. You should never lose a good employee because your pay was too low. If you hired the right person, they'll likely work for the job more than for the pay, as long as the pay can maintain their lifestyle.

Flexible work hours


A reasonable way of thinking is: as long as you get the job done, it doesn’t really matter that much what your work hours are. If your employee needs to see the dentist or take his/her cat to the vet, by all means, be flexible and your employees will love the job.

Funding


Angel investors and venture capitalists (VCs) are not the only sources of seed funding. If you start a business in a market that exists, you will get your first paying customers, and you can use that revenue to bootstrap your business.

VCs have pros and cons. You are advised by very experienced people, so you have a higher rate of success. On the other hand, since VCs invested and have equity, you have to listen to them and sometimes do something you don’t entirely want to do.

Everybody is involved


One of the best things about startups is you have input on anything, be it the processes to use, the tools we use, the color of the lights, which soda to buy, everything.

I find it really rewarding when somebody as smart as my company leader asks for my input. If you’re the company leader, you should listen to your employees as much as you can.

Coworkers


This is one of the most important things about startups. Of course, you would need somebody with skills. But as importantly, you have to make sure everybody is a great fit culturally.

In startups, you can’t afford to hire the wrong person. Make sure that cultural fit is one of the highest priorities. When you don’t have 200 teams, you can’t avoid seeing the wrong person every day.

When interviewing a potential coworker, include questions that will reveal their personality. For example, try finding something the candidate totally disagrees with and see his/her reaction to it. If the candidate uses strong words like “stupid” or “hate”, that’s a bad sign.

Use open source and open your source when you can


Most of the time, open source code is a great way to save money. Try to find open source libraries as much as you can, but if buying software licenses costs less than the time you'd waste finding the right open source software, by all means, spend your money on it.

You want to look for liberal licenses which don’t force you to release your source code. My favorites are BSD, MIT, and Apache licenses. I’ll blog about the differences one day.

And at some point, don’t forget to give back to society and open source what you can. Just be careful not to open source stuff that your competitors can use to kick you out of business.

Have a research mind


There’s a lot of knowledge out there you can leverage. Having at least one person on your team with an MS or preferably higher, who is used to reading research papers, will widen your points of view. We’ve already applied solutions from at least 3 papers in our system and have plans for future directions.

Not only research papers: generally, if you’re looking to solve hard problems, including “site:edu” in your Google searches can often show you how academia solves them.

Even if you end up using nothing, just reading for the sake of knowledge is fun enough (if you really enjoy what you do).

Meetings


Don’t meet unless you have to. Multiple short meetings are better than one big meeting. Anything over 35 minutes might lose people’s attention.

The end


I hope you find this post useful. Feel free to send me comments on what you agree or disagree with. I’d love to hear from other people who work at startups too, especially about what went well and what didn’t.

But anyway, happy one year (and 40000+ lines of code) anniversary to me! XD

Monday, May 21, 2012

NoSQL vs SQL

I've been exposed to NoSQL for about a year now, and I think I am finally able to describe what it is to other people.

NoSQL is a marketing term, well-named to catch buzz, and ill-named to describe what it is.

It kind of implies that we should throw SQL away and move to this hip database. Before all the NoSQL fuss, the most popular databases were relational databases, which supported a language called SQL. The marketing idea was to say, we're the opposite team, so we're "NO SQL". I think it worked.

The Traits


Relational databases and NoSQL databases each tend to have certain traits. I think viewing them this way gives a better understanding of the differences.

How to Represent Data


In the end, a database is a system where you give it a question and it returns results. If you think of it as a map, you give it a key, and it returns a value.

A relational database's key is the primary key of the tables you specify.

For NoSQL databases, the key can be anything you want (including something like primary keys).

What people choose to use as a key depends on their design decisions. For example, Google's Bigtable indexes each value by a row key, a column key, and a timestamp.
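Here's a stupid little Python sketch of the "give it a key, get back a value" idea, using a plain dict keyed by (row, column, timestamp) tuples in the spirit of Bigtable. The helper `read_latest` and the sample rows are made up for illustration; this is not Bigtable's actual client API.

```python
# A toy "database" as a map: (row, column, timestamp) -> value.
# Only the key shape mimics Bigtable; everything else is hypothetical.
store = {
    ("com.example/home", "contents", 1): "<html>v1</html>",
    ("com.example/home", "contents", 2): "<html>v2</html>",
    ("com.example/home", "anchor:about", 1): "About us",
}


def read_latest(store, row, column):
    """Return the value with the highest timestamp for (row, column), or None."""
    versions = [
        (ts, value)
        for (r, c, ts), value in store.items()
        if r == row and c == column
    ]
    if not versions:
        return None
    # Pick the version with the largest timestamp.
    return max(versions, key=lambda v: v[0])[1]


# Exact-key lookup: give it a full key, get back a value.
print(store[("com.example/home", "contents", 1)])
# Newest-version lookup built on top of the plain map.
print(read_latest(store, "com.example/home", "contents"))
```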

And nothing stops you from creating a relational table whose primary key is the combination of row name, column name, and timestamp.
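For example, here's a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table and column names (`cells`, `row_name`, `column_name`, `ts`) are my own picks for illustration.

```python
import sqlite3

# In-memory SQLite database, just for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cells (
        row_name    TEXT    NOT NULL,
        column_name TEXT    NOT NULL,
        ts          INTEGER NOT NULL,
        value       TEXT,
        PRIMARY KEY (row_name, column_name, ts)  -- composite key
    )
    """
)
conn.executemany(
    "INSERT INTO cells VALUES (?, ?, ?, ?)",
    [
        ("com.example/home", "contents", 1, "<html>v1</html>"),
        ("com.example/home", "contents", 2, "<html>v2</html>"),
    ],
)
conn.commit()

# Newest version of one cell, Bigtable style.
latest = conn.execute(
    "SELECT value FROM cells WHERE row_name = ? AND column_name = ? "
    "ORDER BY ts DESC LIMIT 1",
    ("com.example/home", "contents"),
).fetchone()[0]
print(latest)

# The primary key rejects a second value for the same (row, column, ts).
duplicate_rejected = False
try:
    conn.execute(
        "INSERT INTO cells VALUES (?, ?, ?, ?)",
        ("com.example/home", "contents", 2, "oops"),
    )
except sqlite3.IntegrityError:
    duplicate_rejected = True
print("duplicate rejected:", duplicate_rejected)
```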

The API


All relational databases I know of support SQL plus bindings for various programming languages, and that's how you talk to the database.

For NoSQL databases, the API can be anything, but usually is some language binding or web service.
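To give a feel for the "language binding" flavor, here's a hypothetical key-value client in Python. `KeyValueClient` and its `put`/`get`/`delete` methods are invented for this sketch (real NoSQL clients each have their own APIs), and it keeps everything in a local dict instead of talking to a server.

```python
from typing import Any, Dict, Optional


class KeyValueClient:
    """Hypothetical language binding for a key-value NoSQL store.

    A real client would send these calls over the network; this sketch
    keeps the data in a local dict so it runs on its own.
    """

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


client = KeyValueClient()
# No query language: you just hand over a key and a value.
client.put("user:42", {"name": "Ada", "langs": ["python", "sql"]})
print(client.get("user:42"))
client.delete("user:42")
print(client.get("user:42"))  # None
```

The point of the contrast: with SQL you describe *what* you want and the database figures out how to find it, while a bare key-value API only answers "what's stored under this key?"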

Note that nothing stops a NoSQL database from supporting SQL. Many of them just don't support it out of the box.

Scalability


Relational databases weren't designed from the beginning to be deployed in a distributed environment. They usually run as one process on one computer.

So if you want to support data partitioning like this:

  • for row 1 to row 100, I want to contact the database on machine 1
  • for row 101 to row 200, I want to contact the database on machine 2
  • although two database processes are running, we view them as one logical database,
then your application has to be smart enough to do it.
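Here's roughly what "smart enough" means, as a minimal Python sketch. Two in-memory SQLite databases stand in for the database processes on machine 1 and machine 2, and a routing function `shard_for` (a name I made up) decides which one to talk to. Everything here is an assumption for illustration, not how any particular product does it.

```python
import sqlite3

# Hypothetical setup: two separate in-memory SQLite databases stand in
# for the database processes on machine 1 and machine 2.
shards = {
    "machine1": sqlite3.connect(":memory:"),
    "machine2": sqlite3.connect(":memory:"),
}
for db in shards.values():
    db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")


def shard_for(row_id):
    """Application-level routing: rows 1-100 on machine1, 101-200 on machine2."""
    if 1 <= row_id <= 100:
        return "machine1"
    if 101 <= row_id <= 200:
        return "machine2"
    raise ValueError(f"no shard holds row {row_id}")


def insert(row_id, name):
    db = shards[shard_for(row_id)]
    db.execute("INSERT INTO items VALUES (?, ?)", (row_id, name))
    db.commit()


def lookup(row_id):
    db = shards[shard_for(row_id)]
    row = db.execute("SELECT name FROM items WHERE id = ?", (row_id,)).fetchone()
    return row[0] if row else None


def count_all():
    # A query over the whole logical database has to visit every shard.
    return sum(
        db.execute("SELECT COUNT(*) FROM items").fetchone()[0]
        for db in shards.values()
    )


insert(7, "apple")
insert(150, "banana")
print(lookup(7), lookup(150), count_all())
```

Notice that even "count all rows" now needs code that asks both shards and adds up the answers. That's the kind of application-level work that piles up.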

Still, that means relational databases *can* scale with the help of application-level code.

Many NoSQL databases were designed from the beginning to be distributed, which means you get the partitioning above for free. This is why people say NoSQL databases scale better: you don't need application code to scale.

Note that I said *many* NoSQL databases are distributed. A bunch of them aren't, but they're still called NoSQL because they have other NoSQL traits.

Schema


Relational databases need a schema and they strictly check that data conforms to that schema.

NoSQL databases might or might not have a schema, depending on the design. If you want a schema but the NoSQL database doesn't support one, then you need application-level code to enforce it.
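A minimal sketch of that application-level schema check in Python, assuming a store that happily accepts any dict. `USER_SCHEMA`, `validate`, and `save_user` are hypothetical names; a real project might reach for a validation library instead.

```python
# Hypothetical schema, enforced by application code because the
# (hypothetical) store itself accepts any dict.
USER_SCHEMA = {"name": str, "age": int}

users = {}  # stands in for a schemaless NoSQL collection


def validate(doc, schema):
    """Raise ValueError if doc is missing a field or has the wrong type.

    Extra fields are allowed; only the fields listed in the schema are checked.
    """
    for field, expected_type in schema.items():
        if field not in doc:
            raise ValueError(f"missing field: {field}")
        if not isinstance(doc[field], expected_type):
            raise ValueError(f"field {field!r} should be {expected_type.__name__}")


def save_user(key, doc):
    validate(doc, USER_SCHEMA)  # check before writing
    users[key] = doc


save_user("user:1", {"name": "Ada", "age": 36})
try:
    save_user("user:2", {"name": "Bob", "age": "unknown"})
except ValueError as e:
    print("rejected:", e)
```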

Transaction Guarantees


Relational databases have the ACID guarantee (you don't need to know the details), which means you can run transactions any way you like. For example, a single transaction could modify every row in the database in one shot. But note that whenever you make a relational database distributed through application code, you lose this guarantee for transactions that span machines, just like with distributed NoSQL databases.
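Here's a small Python sketch of that all-or-nothing behavior using the built-in `sqlite3` module (SQLite is a relational database with ACID transactions). The `accounts` table and the numbers are made up. The second UPDATE touches every row and would push one balance below zero, so the CHECK constraint fails and the first UPDATE gets rolled back too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50), (3, 10)])
conn.commit()

try:
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + 1000 WHERE id = 1")
        # Touches every row; account 3 would drop to -10 and violate the CHECK.
        conn.execute("UPDATE accounts SET balance = balance - 20")
except sqlite3.IntegrityError as e:
    print("transaction rolled back:", e)

balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
print(balances)  # [(1, 100), (2, 50), (3, 10)]: the +1000 was undone too
```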

Distributed NoSQL databases normally have the BASE guarantee. Again, you don't need to know the details, but it means that if your data lives on two machines and you run a transaction on it, there might be a period of time during which the data on the two machines is not consistent. (It will become consistent eventually, just not immediately.)
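To picture that time lapse, here's a toy Python model with two replicas. It is not how BASE is specified or how any real database replicates; `Replica`, `EventuallyConsistentStore`, and the explicit `replicate()` step are all hypothetical, with `replicate()` standing in for background replication.

```python
from collections import deque


class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    """Toy model: a write lands on the first replica immediately and is
    copied to the other replicas later, when replicate() runs."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.pending = deque()  # (replica, key, value) copies not yet applied

    def write(self, key, value):
        primary, *others = self.replicas
        primary.data[key] = value
        for r in others:
            self.pending.append((r, key, value))

    def replicate(self):
        """Apply all pending copies, in the order they were written."""
        while self.pending:
            r, key, value = self.pending.popleft()
            r.data[key] = value


a, b = Replica("machine1"), Replica("machine2")
cluster = EventuallyConsistentStore([a, b])
cluster.write("x", 1)
before = (a.data.get("x"), b.data.get("x"))
print(before)  # (1, None): the two machines disagree for now
cluster.replicate()
after = (a.data.get("x"), b.data.get("x"))
print(after)  # (1, 1): eventually consistent
```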

This relaxed guarantee is a tradeoff between performance and consistency. Technically, it's possible to have ACID guarantees for distributed transactions, but it doesn't perform well. On the other hand, you can have a pretty efficient distributed database if you relax this guarantee.

That said, if the transaction only involves data living on one machine in a distributed system, ACID can still be guaranteed. This also means that if the NoSQL database is a non-distributed one, then it can provide ACID guarantees easily, because data is on one machine.

So the guarantee boils down to this: if all the data your transaction touches is on the same machine, ACID can be provided. If the data is on different machines and you still need a transaction, then you might see some inconsistencies for a short while if you want reasonable performance.
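Put as code, the rule is basically "count how many machines the transaction touches". This Python sketch assumes simple range partitioning along the lines of the rows 1-100 / 101-200 example; `shard_for`, `machines_touched`, and `needs_distributed_transaction` are hypothetical helpers.

```python
def shard_for(row_id):
    """Hypothetical range partitioning: rows 1-100 on machine1, the rest on machine2."""
    return "machine1" if row_id <= 100 else "machine2"


def machines_touched(row_ids):
    return {shard_for(r) for r in row_ids}


def needs_distributed_transaction(row_ids):
    # One machine: local ACID is easy. More than one: either pay for a
    # slower distributed protocol or accept brief inconsistency.
    return len(machines_touched(row_ids)) > 1


print(needs_distributed_transaction([3, 42, 99]))  # False
print(needs_distributed_transaction([42, 150]))    # True
```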

When to Choose Which


If you're deciding between a relational database and a NoSQL database, consider which traits you want your database to have.

Trait | Relational Databases | NoSQL Databases
Represent data with tables and primary keys | yes | maybe
Represent data with anything else | no | yes
SQL support | yes | maybe
Scalability | maybe | yes
Schemas | yes | maybe
Transactions on the same machine | yes | yes
Transactions across many machines | choose between relaxed guarantees and efficiency | choose between relaxed guarantees and efficiency

Anything other than "yes" means you might need to write your own application-level support.

As you can see, with some amount of application-level code, relational and NoSQL databases can support pretty much the same things.

From what I've seen, most people choose NoSQL for the flexible schema and the scalability, because (1) both take a lot of application-level code to handle otherwise, and (2) the current fashion in software is shipping fast (which might mean changing the schema often) and building scalable software.

Other things that might affect your decision are community support for the database, price, installation difficulty, and whether your boss says so. These topics are pretty self-explanatory, so I didn't include them in this post.

I hope now you have a good answer when the next person asks you why you're using a NoSQL database.