@Veetaha

Welcome! This is the place where I share my knowledge with people. 99% of it is about programming, with the Rust language dominating.

You can find any blog posts of interest in the sidebar to the left.

Acknowledgment

I owe a lot of my knowledge to these people:

  • @DevInCube, a teacher who introduced me to the awesome art of programming during my first two years at university and inspired me to grow;
  • @matklad, who ultimately helped me become the developer I am today by reviewing my code in their open source project rust-analyzer;
  • @anelson, who hired me for a startup project and shared invaluable experience of making a commercial product with a team.

You guys are as passionate and as awesome as possible, thank you so much for investing in me 😋!



Programming Language Definition

Welcome! We are about to find out what programming languages are and how they work. We are probably eager to start with a code example, so to satisfy that whim, here is one written in Rust:

fn main() {
  println!("Hello world");
}

It is a classic program that prints the words "Hello world" to the terminal window.

We are going to use Rust as our example programming language. However, some references and examples further on are written in other languages.

Formal vs Natural Language

What we've seen in the code example above is a formal language called Rust. We may wonder what the difference between a programming language and a human language like English is. That's a good question.

First of all, we must understand that classic programming languages are text-based. That is, all code that programmers write is just plain text, and all code editors are text editors as well.

The main feature of a programming language is that it is formal, which means that it is:

  • Unambiguous - each word of the language has a single interpretation within the context where it is written. For example, the computer knows that println! instructs it to write to the console, and "Hello world" defines what should be printed. The language's syntax, grammar, and semantics fully define which alphabet it uses, which punctuation, keywords, and other symbols are valid, and in what order. This leads to the next feature.
  • Strict - the language doesn't tolerate grammar mistakes (e.g. a misspelled word or wrong punctuation). If our code has any mistakes, we'll see errors when building or running our program.
  • Little redundancy - the language is very laconic. It doesn't have redundant constructions that aren't necessary for understanding. For example, the word fn in the code snippet shown at the beginning stands for "function". The language designers shortened it because it is a widely used keyword, and we'd like to spend less time typing it. Also, we wouldn't write "the fn main" in a programming language: articles are usually considered a redundancy of natural language, so we omit a/an/the in code to keep it succinct.

Purpose

Programming languages are mostly used to write applications that run on different platforms (servers, PCs, laptops, mobile phones, etc.).

We already know what programs are capable of (web browsers, file browsers, games, torrent clients, etc.). We can make the computer do almost anything we want, but before we harness its full power, we will begin with the basics. Knowing just the basics may make us feel like a child playing in a sandbox, but this groundwork is of the highest importance for mastering any language.

General-Purpose Programming Language Structure

Direction

Classic programming languages are written in the same direction as many natural languages, such as English: we read them left-to-right and top-to-bottom. Regular programming languages like Rust, TypeScript, C/C++, C#, Java, Kotlin, Python, etc. share a similar structure.

Comments

To address the elephant in the room, we'll begin with what makes our code readable: snippets of natural language (99% of the time it is English) that programmers insert to explain what happens in the formal language around them. They are called comments, and the computer completely ignores them when it runs a program.

Different languages have different syntax for comments, but the most common is to use the // symbol (Rust, C/C++, Java...) or the # symbol (Python, Ruby, YAML) to denote the beginning of a comment.

A comment doesn't have to start on a new line. Any occurrence of // or # (outside of string literals) denotes the beginning of our natural-language prattle. Such a comment is called a single-line comment because it ends at the end of the line.


#![allow(unused)]
fn main() {
// Blah, bruh, this code is dope, just look at this =)
println!("Hello world");

println!("Twigly"); // Comments can also be written after real code
}
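To illustrate the "outside of string literals" caveat, here is a small sketch: the // inside a string is ordinary text, while the // after the string starts a real comment. The function name link and the URL are just illustrative.

```rust
// This whole line is a comment and is ignored by the compiler.
fn link() -> &'static str {
    "https://example.com" // `//` inside the string literal above is ordinary text
}

fn main() {
    // The slashes inside the string are printed as-is.
    println!("{}", link());
}
```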

Whitespace

Most languages ignore redundant whitespace. For example, the following 4 programs written in Rust do exactly the same thing.


#![allow(unused)]
fn main() {
println!("Hello world");
}

#![allow(unused)]
fn main() {
println!( "Hello world"    );
}

#![allow(unused)]
fn main() {
println!( "Hello world"    )
;
}

#![allow(unused)]
fn main() {
println!(

      "Hello world"

                            )

             ;
}

Recall that formal languages were said to have little redundancy. That "little" is what still allows trivial redundancy like this. It makes writing code easier and more flexible for humans, and in some cases extra whitespace makes our code prettier.

For example, when formatting a long list of things, we would like to spread it across several lines, even though this makes the text of the program longer.

Good programmers strive to write their code so that it is easy for other people to read and understand!

This code, for instance, is hard to read because it is all on one line.


#![allow(unused)]
fn main() {
let vocabulary = ["Programming", "Rust", "Language", "Niko", "Veetaha", "Dictionary", "List", "Array", "Blackjack", "Morning Glory", "Rampage", "P21", "Project", "Horizons", "Octavia", "Vynil", "University", "Days"];
}

This code is easier to read, even though it takes up much more screen space, because humans perceive short lines of text better than long ones.


#![allow(unused)]
fn main() {
let vocabulary = [
  "Programming",
  "Rust",
  "Language",
  "Niko",
  "Veetaha",
  "Dictionary",
  "List",
  "Array",
  // When code is written vertically it's easier to add comments to it
  "Blackjack",
  "Morning Glory",
  "Rampage",
  "P21",
  "Project",
  "Horizons",

  // We can even use whitespace to group related parts of code together visually
  "Octavia",
  "Vynil",
  "University",
  "Days"
];
}

Languages like Python or Ruby give programmers less freedom in how they use whitespace (line breaks, and in Python also indentation, are meaningful). In exchange, they need fewer punctuation symbols (e.g. ;).

Expressions

Expressions are parts of the code that evaluate to some value. They are highly composable, and it is very important to understand how expressions compose into an expression tree and how the computer evaluates them.

Literals

The simplest expression is just a literal value:


#![allow(unused)]
fn main() {
99
;
}

#![allow(unused)]
fn main() {
"Hello world"
;
}

#![allow(unused)]
fn main() {
true
;
}

These values are the atoms of our LEGO. They are called literals.

Mathematical expressions

Mathematical expressions are the most widely-known kind of expressions:


#![allow(unused)]
fn main() {
2 + 2
;
}

Like LEGO pieces, all expressions are composed of smaller pieces, and this relationship is best described with a tree.

For instance, a computer will build the following expression tree for a simple 2 + 2.

The tree is then evaluated from the bottom to the top. Operators sit in the inner nodes of the tree (the root included), and literal values are always at the bottom, in the leaves.

To compute the result of this tree the computer will begin from the bottom and look at the left 2.

Since that value is already known (it is literally 2, duh...), the computer looks at the right branch of the tree, which is also 2.

Then it concludes that both parameters of the plus (+) operator are known, so it can compute their sum, producing 4 at the end.
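To make the bottom-up evaluation more concrete, here is a minimal sketch of how such a tree could be modelled in Rust. The Expr enum and the eval function are hypothetical names chosen for illustration; real compilers use far richer representations.

```rust
// A hypothetical, minimal expression tree: leaves are literals,
// inner nodes are operators with two children.
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
}

// Evaluate bottom-up: both children are computed first,
// then the operator combines their values.
fn eval(expr: &Expr) -> i64 {
    match expr {
        Expr::Num(value) => *value, // a literal is already known
        Expr::Add(left, right) => {
            let left = eval(left); // left branch first
            let right = eval(right); // then the right branch
            left + right // finally apply `+`
        }
    }
}

// The tree for `2 + 2`: `+` at the root, two literal leaves.
fn two_plus_two() -> Expr {
    Expr::Add(Box::new(Expr::Num(2)), Box::new(Expr::Num(2)))
}

fn main() {
    println!("{}", eval(&two_plus_two())); // prints 4
}
```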

Let's take another example:


#![allow(unused)]
fn main() {
(2 + 6) * 9 / 3 > 46
;
}

We can see how mathematical expressions can grow in complexity without bounds. What's important is to understand how a computer gets the result of an expression. Our tree representation can scale to this arbitrarily complex math expression. The computer builds the following tree for it:

We've already seen how the computer copes with a simple 2 + 2. Similarly, it evaluates 2 + 6 here to 8:

We can see that the parentheses operator has a single parameter and evaluates directly to that parameter's value. From a semantics standpoint, parentheses are used only for grouping parts of the expression tree that have lower operator precedence, to give them a higher one. Operator precedence should be familiar to us from school (e.g. * goes before +). So we just replace the parentheses operator with the value of its argument.

Then the computer evaluates the right branch of the multiplication (*), which is the literal value 9 that is already known.

Now the operator * has fully evaluated values of its parameters, so it is substituted with 72. Note that * and / have the same precedence and are evaluated left to right, so the multiplication is the left branch of the division (/).

Then the computer continues with the right branch of the division and encounters the literal value 3, which is also known.

Likewise, the operator / has fully evaluated values of its parameters, and the result is substituted with 24.

But we are not done yet! The computer must evaluate the right branch of the expression tree. We as humans see that the value is already 46, but the computer doesn't have eyesight, so it reads whatever is in the right branch to confirm that.

Now the result of the expression is substituted with the result of a logical comparison operation. 24 is not greater than (>) 46, so false is returned.
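We can double-check the final result with Rust itself. In Rust, * and / have the same precedence and are left-associative, so the unparenthesized expression groups as ((2 + 6) * 9) / 3. The function names below are just for illustration.

```rust
// `*` and `/` share the same precedence and are left-associative,
// so these two functions build the same expression tree.
fn original() -> bool {
    (2 + 6) * 9 / 3 > 46
}

fn fully_parenthesized() -> bool {
    (((2 + 6) * 9) / 3) > 46
}

fn main() {
    // Intermediate values of the bottom-up evaluation.
    let sum = 2 + 6; // 8
    let product = sum * 9; // 72
    let quotient = product / 3; // 24
    let result = quotient > 46; // false
    println!("{sum} {product} {quotient} {result}"); // prints "8 72 24 false"

    // Grouping matters with integer division: this is (2 * 10) / 4 == 5,
    // whereas 2 * (10 / 4) would be 2 * 2 == 4.
    println!("{}", 2 * 10 / 4); // prints 5
}
```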

Arbitrary expressions

As we can see, computers work not only with numbers; they work with values of different types, including strings, booleans, lists, dictionaries (maps), etc. All of them also have their representation in the expression tree.

For example, suppose we have a dictionary and we would like to get the value of the key "blackjack" from it.

dict["blackjack"]

The expression tree for this would be

If we want to call a function print that outputs this value from the dictionary, we would write it this way:

print(dict["blackjack"])

resulting in the following expression tree:
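In Rust, the same idea could look like the sketch below. The dictionary contents are made up for illustration, and since Rust has no print function, the println! macro plays its role (so the exact tree rustc builds differs in details): the indexing expression dict["blackjack"] is a subtree that is evaluated first and then passed as an argument.

```rust
use std::collections::HashMap;

// Hypothetical dictionary; the keys and values are made up for illustration.
fn make_dict() -> HashMap<&'static str, i32> {
    HashMap::from([("blackjack", 21), ("poker", 5)])
}

fn main() {
    let dict = make_dict();
    // `dict["blackjack"]` is the indexing expression (it panics if the key is missing);
    // its value becomes the argument of `println!`.
    println!("{}", dict["blackjack"]); // prints 21
}
```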

Statements

Statements are the next level of LEGO. They consist of expressions.

A statement usually occupies a single line, but it can span multiple lines, and it is delimited with a semicolon (;). Python-like languages are an exception: there, a statement usually ends at the end of the line and needs no semicolon.


#![allow(unused)]
fn main() {
let blackjack = 42;

println!("Hello world");

std::process::exit(0);
}

Statements usually represent the execution of a single command. They are similar to sentences in natural languages. For example, if we translate the code written in Rust language above to English we would get:

Initialize a variable called blackjack with the integer value 42.

Output the string "Hello world" to the terminal.

Shut down the application with the status code of zero.

One similarity, which is also a difference, is that statements end with a semicolon, while English sentences end with a period (.), a question mark (?), or an exclamation point (!).

Special Kinds of Statements

A semicolon-delimited statement is the most frequent kind of statement we will see in any code. There exist also special kinds of statements that build on top of the regular statement allowing us to compose our program like a LEGO figurine.

There are a bunch of them that can be found in many programming languages:

  • Conditional statement (e.g. if statement)
  • Loop statement (e.g. while/for statement)

#![allow(unused)]
fn main() {
if 2 + 2 == 4 {
  println!("Yes that's true!");
}

for counter in 0..5 {
  println!("Counter value: {counter}");
}

while std::io::stdin().lines().next().is_some() {
  println!("You've input a line!");
  println!("Reading your next input...");
}
}

They usually aren't delimited by a semicolon, and they create irregular execution paths (conditional execution, loops). We can see how they are composed with regular statements: they usually delimit a block of code, and the regular statements inside it are written after a small whitespace gap (indentation). The more statements we nest in them, the more nested our code looks:


#![allow(unused)]
fn main() {
if 2 + 2 == 4 {
  if 4 + 4 == 8 {
    if 2 * 2 == 4 {
      println!("All checks passed!");
    } else {
      println!("Last check didn't pass...");
    }
  }
}
}

Variables

Variables are a little bit different in different programming languages. Here are examples of how we would create a variable in different programming languages:

Rust uses the introducer keyword let to distinguish between variable declarations and reassignments.


#![allow(unused)]
fn main() {
let variable = "Hello world";
}

Python doesn't use an introducer keyword. However, this makes it easier to accidentally create a new variable instead of reassigning an existing one if we make a typo in the variable name.

variable = "Hello world"

# for example, suppose we want to reassign to `variable`
# but we make a typo here `voriable`
voriable = "Equestria"

# This program doesn't work as expected: it prints "Hello world", not "Equestria"
print(variable)

Java uses the variable's type as the introducer:

String variable = "Hello world";

// With introducer syntax, a typo can't silently create a new variable.
// This results in an error when building our application,
// because such a variable wasn't previously declared
voriable = "Equestria";
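For comparison, here is a sketch of the same scenario in Rust. Reassignment requires the variable to be declared with let mut, and a typo like voriable = ... would fail to compile, just like in Java. The function name reassigned is only for illustration.

```rust
fn reassigned() -> &'static str {
    // `mut` allows the variable to be reassigned later.
    let mut variable = "Hello world";
    println!("{variable}"); // prints "Hello world"

    // Reassignment: no `let`, so this updates the existing variable.
    variable = "Equestria";

    // A typo such as `voriable = "Equestria";` would not compile:
    // rustc reports that it cannot find a value named `voriable` in this scope.
    variable
}

fn main() {
    println!("{}", reassigned()); // prints "Equestria"
}
```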

We can use variables in place of literal values. Their main purpose is to serve as storage for intermediate calculations as well as to reduce the repetition of literal values in a program.
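As a small illustration, the hypothetical function below keeps intermediate results in variables and names a value once instead of scattering the literal around; the function name and the 20% rate are made up for the example. If the rate were needed in several places, the variable would be the single place to change it.

```rust
// Hypothetical example: computes a total price with tax using intermediate variables.
fn total_price(unit_price: u32, quantity: u32) -> u32 {
    let tax_percent = 20; // the literal is written once and referred to by name
    let subtotal = unit_price * quantity; // intermediate result
    let tax = subtotal * tax_percent / 100; // another intermediate result
    subtotal + tax
}

fn main() {
    // subtotal = 150, tax = 30
    println!("{}", total_price(50, 3)); // prints 180
}
```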

Conclusion

These were the most common parts of any language. Once we understand how they work, learning other classic programming languages becomes much easier, because they differ mostly in syntactic peculiarities, which are relatively easy to study.


2022-04-12

Impl Trait Parameters And Turbofish

Agenda

We will go over how impl Trait desugars, how turbofish syntax works, and the limitation of using them together.

If you want to skip to the limitation itself, then go to "Turbofish Limitation Aftermath".

❗This post covers only impl Trait in function parameter position.

impl Trait in function return types is a totally different concept that has almost nothing to do with what's in this article.

Impl Trait Desugaring

Rust provides a shortcut for defining a function with generic parameters bounded by traits and lifetimes.

Here is an example of impl Trait usage in function parameter types:


#![allow(unused)]
fn main() {
trait Trait1 {} trait Trait2 {}

fn foo<T>(
    a: impl Trait1,
    b: T,
    c: impl Trait1 + Trait2 + 'static,
    d: impl Trait1,
) {
    /**/
}
}

Under the hood this probably desugars into the following code:

The word "probably" here means that this is not necessarily the exact desugaring that rustc does, which is its private implementation detail.


#![allow(unused)]
fn main() {
trait Trait1 {} trait Trait2 {}

fn foo<
    T,
    __T1: Trait1,
    __T2: Trait1 + Trait2 + 'static,
    __T3: Trait1
>(
    a: __T1,
    b: T,
    c: __T2,
    d: __T3,
)
{
    /**/
}
}

Here we have a regular generic parameter T, and three more generic parameters generated for us by impl Trait syntax automatically (__T1, __T2, __T3).

Even if the function uses the same impl Trait in several function parameters, they still generate different generic parameter types. That's why even though a and d have impl Trait1 type annotation, they still use different __T1 and __T3 generic parameters in their desugaring shown above.
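To see that each impl Trait gets its own generic parameter, here is a small sketch that uses the standard Display trait instead of the Trait1 placeholder. The names describe and describe_desugared are hypothetical, and the second function only approximates what the compiler does.

```rust
use std::fmt::Display;

// Each `impl Display` is its own anonymous generic parameter,
// so `a` and `d` may have different concrete types.
fn describe(a: impl Display, d: impl Display) -> String {
    format!("{a} and {d}")
}

// A roughly equivalent signature with explicitly named generic parameters.
fn describe_desugared<A: Display, D: Display>(a: A, d: D) -> String {
    format!("{a} and {d}")
}

fn main() {
    // `a` is an `i32` and `d` is a `&str`: two different types for the "same" `impl Display`.
    println!("{}", describe(42, "blackjack")); // prints "42 and blackjack"
}
```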

The symbols __T1, __T2, __T3 can't be named in Rust code, which leads to the limitations described in the following paragraphs.

Type Inference And Turbofish Syntax

Let's recap what turbofish (::<>) syntax gives us. Take this generic function as an example:


#![allow(unused)]
fn main() {
fn bar<T, U>(a: T, b: U) { /**/ }
}

Suppose we want to provide it with T = bool and U = i32. We can do this in various ways.


#![allow(unused)]
fn main() {
fn bar<T, U>(a: T, b: U) { /**/ }
bar             (false, 99); // infer all generic params
bar::<bool, i32>(false, 99); // specify all generic params explicitly with turbofish
bar::<_, _>     (false, 99); // infer 2 generic params
bar::<_, i32>   (false, 99); // infer the first param, but specify the second
bar::<bool, _>  (false, 99); // specify the first param, but infer the second
}

Generic parameters in type definitions and type aliases can use default values. It is not possible to set default values for functions though!
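As a quick illustration of a default on a type alias, here is a sketch with hypothetical names: FlagResult<T> uses the default error type, while FlagResult<T, String> overrides it.

```rust
// Hypothetical error type used only for this illustration.
#[derive(Debug, PartialEq)]
struct ParseFlagError;

// The second generic parameter of this type alias defaults to `ParseFlagError`.
type FlagResult<T, E = ParseFlagError> = Result<T, E>;

// `FlagResult<bool>` means `Result<bool, ParseFlagError>`.
fn parse_flag(text: &str) -> FlagResult<bool> {
    match text {
        "yes" => Ok(true),
        "no" => Ok(false),
        _ => Err(ParseFlagError),
    }
}

// The default can still be overridden: this is `Result<bool, String>`.
fn parse_flag_verbose(text: &str) -> FlagResult<bool, String> {
    parse_flag(text).map_err(|_| format!("unexpected input: {text}"))
}

fn main() {
    println!("{:?}", parse_flag("yes")); // prints Ok(true)
    println!("{:?}", parse_flag_verbose("maybe")); // prints Err("unexpected input: maybe")
}
```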


#![allow(unused)]
fn main() {
enum Baz<A, B, C = u32> {
    A(A),
    B(B),
    C(C),
}

// Now we can create the value of the enum as such:

Baz::A::<_, ()>       (false); // (1) Baz<bool, (), u32>
Baz::B::<String, _>   (false); // (2) Baz<String, bool, u32>
Baz::C::<bool, u32, _>(false); // (3) Baz<bool, u32, bool>
}

By omitting the third argument in the turbofish syntax in cases (1) and (2) we opted in to using the default u32 type for the generic parameter C.

When the third generic parameter is overridden, even with a wildcard (_), it means that the default value is ignored. The wildcard merely specifies that the generic type parameter has to be inferred from usage.

This means that the following produces a compile error:


#![allow(unused)]
fn main() {
enum Baz<A, B, C = u32> {
    A(A),
    B(B),
    C(C),
}

Baz::C::<bool, u32>(false);
//                  ^^^^^ expected `u32`, found `bool`
}

This is because, with explicit turbofish syntax, all required type parameters must be specified explicitly, and the omitted optional ones are set to their default values. If we want an optional parameter to be inferred instead of defaulted, we have to pass a wildcard _ for it explicitly.

The same is true even if some of the required type parameters could be inferred. For instance, we can't use this syntax to have the value type of the HashMap inferred:


#![allow(unused)]
fn main() {
use std::collections::HashMap;

let map = HashMap::<String>::from_iter([
//     |           ^^^^^^^   ------ supplied 1 generic argument
//     |           |
//     |           expected at least 2 generic arguments
    ("key".to_owned(), true)
]);
}

We are forced to enumerate all remaining deduced generic parameters with _.


#![allow(unused)]
fn main() {
use std::collections::HashMap;

let map = HashMap::<String, _>::from_iter([("key".to_owned(), true)]);
}

Turbofish Limitation Aftermath

Based on what impl Trait desugars to and how turbofish works, it should be clear that impl Trait in function parameter type annotations disables the turbofish (::<>) call syntax and requires the generic parameters to be inferred. There is simply no way to specify the values for implicit generic parameters (denoted in previous paragraphs as __T1, __T2, etc.).

trait Trait1 {}
fn blackjack<T>(a: impl Trait1, b: T, c: impl Trait1) { /**/ }

blackjack::<bool, /* no way to pass two params for impl Trait 🤔*/>(/*...*/);

How the compiler generates implicit generic parameters for each impl Trait occurrence is its private implementation detail. It guarantees neither the order nor the position of these implicit generic parameters, so we can't explicitly specify their values.

The only way for rustc to know which type to use for each impl Trait is type inference. This also means we can't specify the regular generic parameters either; they have to be inferred too.

For example, it is impossible to call the following function at all.


#![allow(unused)]
fn main() {
trait Trait1 {}
impl Trait1 for i32 {}
fn voldemort<T: Default>(a: impl Trait1) {
    T::default();
}

// No syntax exists to call `voldemort` 😣

voldemort::<bool>(99); // (compile error) can't use turbofish
voldemort(99); // (compile error) can't infer `T` type parameter
}

We are forced to replace all usages of impl Trait in function parameters with regular generic types.


#![allow(unused)]
fn main() {
trait Trait1 {}
impl Trait1 for i32 {}
fn voldemort<T: Default, U: Trait1>(a: U) {
    T::default();
}

// It's callable, yay 😄!
// But now ugly `_` is required on each call 😖
voldemort::<bool, _>(99);
}

Even if all the remaining generic parameters can be trivially inferred, we have to enumerate them all with _. I recommend never designing an API that forces users to always write a turbofish with a bunch of _ for generic parameters that could be inferred. Unfortunately, there isn't a better universal workaround for this problem.

There exists an initiative to fix this by letting us use turbofish syntax with impl Trait parameters being inferred, though I guess it has low priority at the time of this writing 🤔. That initiative was later stabilized in Rust 1.63: on newer compilers, explicit generic arguments can be given for the regular generic parameters, while the impl Trait parameters still can only be inferred.
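As a sketch of what that change allows, here is a variant of voldemort that returns the value it creates so the result can be observed (the return type is our modification). It assumes Rust 1.63 or newer; older compilers reject the turbofish call.

```rust
trait Trait1 {}
impl Trait1 for i32 {}

// Same shape as `voldemort` above, except that it returns `T`.
fn voldemort<T: Default>(_a: impl Trait1) -> T {
    T::default()
}

fn main() {
    // Requires Rust 1.63+: `T` is given explicitly,
    // while the `impl Trait1` parameter is still inferred (here as `i32`).
    let value = voldemort::<bool>(99);
    println!("{value}"); // prints "false"
}
```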

Real World Example

I ran into this problem when writing an extension trait, but I will show it as a free function here for simplicity. This function maps one collection into another.


#![allow(unused)]
fn main() {
fn map_collect<O: FromIterator<T>, I: IntoIterator, T>(
    iter: I,
    map: impl FnMut(I::Item) -> T
) -> O {
    iter.into_iter().map(map).collect()
}
}

Because this function uses impl Trait syntax, it's impossible to call it with turbofish. For example, we can't instruct rustc to use Result<Vec<_>, Error> for the first type parameter that easily.


#![allow(unused)]
use std::io::Error;

fn map_collect<O: FromIterator<T>, I: IntoIterator, T>(
    iter: I,
    map: impl FnMut(I::Item) -> T
) -> O {
    iter.into_iter().map(map).collect()
}

fn main() -> Result<(), Error> {
    // Can't use turbofish to specify that the first type param is `Result<Vec<_>, Error>`
    map_collect([false, true], |val| Ok::<bool, Error>(val))?;
    //                                                      ^ cannot infer type
    Ok(())
}

If we replace impl FnMut(I::Item) -> T with a fourth generic parameter, we will be able to call the function with turbofish, but it will be as ugly as this:


#![allow(unused)]
use std::io::Error;

fn map_collect<O: FromIterator<T>, I: IntoIterator, T, F: FnMut(I::Item) -> T>(
    iter: I,
    map: F
) -> O {
    iter.into_iter().map(map).collect()
}

fn main() -> Result<(), Error> {
    map_collect::<Result<Vec<_>, Error>, _, _, _>([false, true], |val| Ok(val))?;
    Ok(())
}
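Another workaround worth knowing, sketched here with the impl Trait version of map_collect, is to annotate the type of the binding that receives the result, so rustc infers O from it without any turbofish. collect_flags is a hypothetical helper for illustration.

```rust
use std::io::Error;

fn map_collect<O: FromIterator<T>, I: IntoIterator, T>(
    iter: I,
    map: impl FnMut(I::Item) -> T,
) -> O {
    iter.into_iter().map(map).collect()
}

fn collect_flags() -> Result<Vec<bool>, Error> {
    // The annotation on `result` tells rustc what `O` is; no turbofish needed.
    let result: Result<Vec<bool>, Error> = map_collect([false, true], |val| Ok(val));
    result
}

fn main() -> Result<(), Error> {
    let flags = collect_flags()?;
    println!("{flags:?}"); // prints [false, true]
    Ok(())
}
```

The trade-off is that this only works where there is a place to put the annotation, e.g. not directly in front of a ? operator.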

Conclusions

Now you know the limitations of impl Trait in function parameters, and how to define a function that is impossible to call in Rust.

I hope you learned something new today 😉.



2022-07-22