Semantics Matters

The last few weeks have been intense, but ideas still come to mind sporadically, especially while running or doing other outdoor activities. During the last two weeks, my focus has been on language semantics, and especially on why the semantics of our natural languages has not improved over time as much as that of programming languages.

One issue in programming languages is semantics. Poor semantics limits analysis and verification because it is difficult to know what the program is really doing. This is one motivation for modeling languages and one of the biggest messages I try to communicate when talking about AADL: better semantics leads to better analysis, which ultimately helps you deliver better software at a lower cost. Over the decades, the semantics of programming languages has evolved to reduce ambiguity (at least, some people tried hard) and to make programs deterministic.

For example, in C (created in 1972 but still one of the most used languages: the core of smartphone operating systems is implemented in C), you create a task (a POSIX thread) by calling a function:

int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg);

But from a semantics point of view, this is not a task creation; it is a function call, just like a call to floor() from the math library:

double floor(double x);

So, from a language perspective, calling floor() is the same as creating a task: you just call a function. The language does not let you distinguish these concepts (you can try with a syntax analysis that looks for known function names, but it will not catch everything; there might be some hacks and workarounds, but they do not address the root cause of the problem). To overcome this semantics issue and improve program analysis, researchers have created languages with better semantics. For example, in Ada, the task is a built-in language concept. You define a task using the task keyword. You have two different keywords for a function and a task, so you can clearly distinguish a function call from a task creation.

To summarize, over the years, to avoid misunderstanding concepts and to improve our understanding of a program, we have reduced language ambiguity so that the different actions performed by the machine can be told apart.

Now, let's come back to our natural language, the one we use every day to communicate. Let's have a look at how we speak and how poor the underlying semantics is. This is not because the language lacks accurate and precise concepts but because we choose not to use them. We have thousands of words but we use only a few of them every day. Our vocabulary is really poor, and so is the way we use it. For example, most people will say "Let's watch TV" and not communicate what they really mean, such as "I would like to watch the latest Star Wars movie". You will tell your colleague that you have "stuff to do" rather than "I have to finish writing the report about the project review". Beyond the vocabulary issue, the way we articulate our thinking matters, because the message being communicated can be interpreted differently by the receiver. This is also one of the reasons for many plane crashes (yes, planes crash because the pilot and air traffic control do not understand each other). What we really mean is not only a matter of what we want to communicate but also of how we communicate it.

Surprisingly, over the centuries, it seems that very little effort has been made to improve natural languages and to reduce potential misinterpretation and semantic gaps between languages. And it seems that what started decades ago in computer science has not been carried over to natural language (even though there is a real motivation, the plane crashes being a good illustration; you can also think about the semantic gaps between languages when talking to somebody from another country). The interesting part is to find out whether this has already been considered and, if so, what the outcomes were and why it has not been adopted so far. Just food for thought, but definitely something fun to investigate.