Runes

Maybe it’s the nerd in me 🤓 but when I hear rune my mind goes to the dragon-born 🐲 FUS-RO-DAH!! 🌬️ Just me? 🙃 anyways, time to level up ⬆️ on our rune game.

It cannot be overstated. A rune is an int32. Plain and simple. The type is there to help guide us into understanding it is an int32 specifically for strings, but it’s an int32, yeah? Well, 🤔 Maybe we can step back and find out what it means to be int32 and then understand how runes fit into that. 💡 Before that lets take a look at a byte or a uint8.

One Byte At A Time 😹

A uint8 or unsigned integer 8 is called that because it has 8 slots and all of them are for values. If it was signed that means that one of the slots would be 0 for positive or 1 for negative. Lets look at uint8 through the eyes 👁️👁️ of a computer

0000 0000

Note that I separated it into groups of 4 because it’s easier to read, but a computer 🖥️ crams them all together.

We have 8 slots and only 2 possible numbers that go in those 8 slots, either 0 or 1. This is known as binary (base 2). We 🤖PUNY🤖 humans use decimal (base 10).

With those 8 places, 🤔 how many numbers can we represent in decimal? Well let’s do something easier, in decimal if I gave you 0 how many different numbers could you fit in that one slot? 0,1,2,3,4,5,6,7,8,9 – Ten! – You could fit ten in decimal. OK, now 00 0,1,2,3,4…97,98,99 – One Hundred! – Alright one more time with 000, how many different numbers could you fit? 0,1,2,3,4,5…997,998,999 – One Thousand!

🤔 Hmm, I think I’m seeing a pattern, don’t you? We went from 10 to 100 to 1000. It looks like were adding another zero every time, but how do we express that with math? Well it was 10 then it was 10 * 10 then it was 10 * 10 * 10 and instead of repeating that we can raise 10 to a power. 10¹ 10² 10³ we can raise 10 to a power of how many digits we have.

Back to our original question, there are 2 values and 8 slots, how many numbers can we fit in a byte? 2⁸ 👍 and what’s that number? 🤨 Well funny you ask because the Extended Ascii Table has the exact amount that a byte can represent 2⁸ or 256 values. 🤯 So that means everything not in those 256 values (0 is a value, that’s why it only goes to 255) cannot be represented with a byte or a uint8, not even emojis 😢

Setup

Let’s make our directory runes and the files we want inside of that directory example_test.go runes.go

mkdir runes
touch runes/example_test.go runes/runes.go

Now let’s open up runes.go and for the very first line we’ll add

package runes

Next for example_test.go for the very first line we’ll add

package runes_test

We can import basics/runes into cmd/main.go and run functions from there with go run cmd/main.go and also to run our example_test.go 👍 we use go test runes/example_test.go in the commandline.

Deepen Understanding of Runes

So again a rune is just int32 meaning it looks like 4 times the size of a byte or a uint8

1️⃣ 2️⃣ 3️⃣ 4️⃣

0000 0000 0000 0000 0000 0000 0000 0000

👀 That’s a lot of zeroes! Yes, 32 in fact 🙂 So now we can see that a rune is an int32 which is also the same as four uint8 or four byte

OK… So it’s just a number, but what number makes up a letter? Like a maybe? Good question! 😁 Let’s represent it in both byte and rune format.

It’s not a big deal to understand all this stuff at once and if what you see confuses you, come back at a later time or look at other resources who might do a better job explaining, it’s always good to have other inputs to fill in the gaps.

The letter a according to the Extended Ascii Table is 97 in decimal. So the byte would look like

0110 0001

7654 3210 Slots to insert 0 or 1

2⁶ (64) + 2⁵ (32) + 2⁰ (1) = 97

I got that from slot 6 + slot 5 + slot 0

And there you go, your very first look at what a computer sees…. Aren’t you glad we have byte and rune so we can just write 'a'? I know I am 😂 Speaking of rune it would look the same, just with a lot more wasted space.

0000 0000 0000 0000 0000 0000 0110 0001

and finally it should now make sense why we’re allowed to compare “letters” to “letters”, because they aren’t letters, it’s all int. (Wait, it’s all ints?😲 Always has been 🔫🧑‍🚀 )

Coding Time!

`runes.go`

// ByteAndRuneAreInt shows that we can compare characters to their integer
// counterpart in any common base we want, e.g. binary, octal, decimal,
// hexadecimal, or even Unicode code point. And that `byte` fits in `rune`
// making it a smaller version of a `rune`.
func ByteAndRuneAreInt() {
	var myByte byte = 'a'
	if myByte == 'a' {
		fmt.Println("It's a")
	}
	if myByte == 97 {
		fmt.Printf("dec: %d is %c\n", myByte, myByte)
	}
	if myByte == 0b01100001 {
		fmt.Printf("bin: %b is %c\n", myByte, myByte)
	}
	if myByte == 0o141 {
		fmt.Printf("oct: %o is %c\n", myByte, myByte)
	}
	if myByte == 0x61 {
		fmt.Printf("hex: %x is %c\n", myByte, myByte)
	}
	if myByte == []byte("\u0061")[0] {
		fmt.Printf("Unicode: %#U is %c\n", myByte, myByte)
	}
	myRune := rune(myByte)
	if myRune == 'a' {
		fmt.Println("It's a")
	}
	if myRune == 97 {
		fmt.Printf("dec: %d is %c\n", myRune, myRune)
	}
	if myRune == 0b01100001 {
		fmt.Printf("bin: %b is %c\n", myRune, myRune)
	}
	if myRune == 0o141 {
		fmt.Printf("oct: %o is %c\n", myRune, myRune)
	}
	if myRune == 0x61 {
		fmt.Printf("hex: %x is %c\n", myRune, myRune)
	}
	if myRune == rune([]byte("\u0061")[0]) {
		fmt.Printf("Unicode: %#U is %c\n", myRune, myRune)
	}
}

`example_test.go`

func ExampleByteAndRuneAreInt() {
	runes.ByteAndRuneAreInt()
	// Output:
	// It's a
	// dec: 97 is a
	// bin: 1100001 is a
	// oct: 141 is a
	// hex: 61 is a
	// Unicode: U+0061 'a' is a
	// It's a
	// dec: 97 is a
	// bin: 1100001 is a
	// oct: 141 is a
	// hex: 61 is a
	// Unicode: U+0061 'a' is a
}

Byte Count

We just had our lesson on range and it shows that we will get rune types when we range through a string, so what happens when we do a normal for loop? We get byte types instead, this is usually bad for us 😬 because we want to count the amount of characters, not the amount of space they take up.

Coding Time!

`runes.go`

// ByteCount shows how to get the amount of bytes in a string and using a for
// loop won't work if the bytes do not fall in the ASCII Table.
// For the complete table you can visit https://asciitable.com
func ByteCount() {
	fmt.Printf("Len %s: %d\n", helloWorldHindi, len(helloWorldHindi))
	fmt.Printf("Len %s: %d\n", helloWorldRussian, len(helloWorldRussian))
	fmt.Printf("Len %s: %d\n", helloWorldJapanese, len(helloWorldJapanese))
	fmt.Println()
	for i := 0; i < len(helloWorldJapanese); i++ {
		fmt.Printf("Byte in hexadecimal: %x and decimal: %d and as string: %s\n",
			helloWorldJapanese[i], helloWorldJapanese[i], string(helloWorldJapanese[i]))
	}
}

`example_test.go`

⚠️ Attention ⚠️ Some lines look like they end in a blank space string: but it’s a non-printing ASCII character. It would be best to copy and paste this output into your test or you’re very likely going to have problems. This is why we represent things as rune to avoid such problems 🤐

func ExampleByteCount() {
	runes.ByteCount()
	// Output:
	// Len नमस्ते दुनिया: 37
	// Len Привет мир: 19
	// Len こんにちは世界: 21
	//
	// Byte in hexadecimal: e3 and decimal: 227 and as string: ã
	// Byte in hexadecimal: 81 and decimal: 129 and as string: 
	// Byte in hexadecimal: 93 and decimal: 147 and as string: 
	// Byte in hexadecimal: e3 and decimal: 227 and as string: ã
	// Byte in hexadecimal: 82 and decimal: 130 and as string: 
	// Byte in hexadecimal: 93 and decimal: 147 and as string: 
	// Byte in hexadecimal: e3 and decimal: 227 and as string: ã
	// Byte in hexadecimal: 81 and decimal: 129 and as string: 
	// Byte in hexadecimal: ab and decimal: 171 and as string: «
	// Byte in hexadecimal: e3 and decimal: 227 and as string: ã
	// Byte in hexadecimal: 81 and decimal: 129 and as string: 
	// Byte in hexadecimal: a1 and decimal: 161 and as string: ¡
	// Byte in hexadecimal: e3 and decimal: 227 and as string: ã
	// Byte in hexadecimal: 81 and decimal: 129 and as string: 
	// Byte in hexadecimal: af and decimal: 175 and as string: ¯
	// Byte in hexadecimal: e4 and decimal: 228 and as string: ä
	// Byte in hexadecimal: b8 and decimal: 184 and as string: ¸
	// Byte in hexadecimal: 96 and decimal: 150 and as string: 
	// Byte in hexadecimal: e7 and decimal: 231 and as string: ç
	// Byte in hexadecimal: 95 and decimal: 149 and as string: 
	// Byte in hexadecimal: 8c and decimal: 140 and as string: 
}

Rune Count

Now we’ll see how we can get what we’re probably aiming 🎯 for, the amount of characters (rune) in a string. It’s important to note that we bring in a package from the standard library, unicode/utf8 that will get the amount of runes in the string.

Coding Time!

`runes.go`

import (
  "fmt"
  "unicode/utf8"
)

const helloWorldHindi = "नमस्ते दुनिया"
const helloWorldRussian = "Привет мир"
const helloWorldJapanese = "こんにちは世界"

// RuneCount shows how to get the actual count of characters (runes) in a
// string and how to loop through a string and get the runes using the range
// keyword.
func RuneCount() {
	fmt.Printf("Rune count in %s: %d\n",
		helloWorldHindi, utf8.RuneCountInString(helloWorldHindi))
	fmt.Printf("Rune count in %s: %d\n",
		helloWorldRussian, utf8.RuneCountInString(helloWorldRussian))
	fmt.Printf("Rune count in %s: %d\n",
		helloWorldJapanese, utf8.RuneCountInString(helloWorldJapanese))
	fmt.Println()
	for i, r := range helloWorldHindi {
		fmt.Printf("Rune verbose unicode value: %#U Its index: %d As a string %q\n",
			r, i, string(r))
	}
}

`example_test.go`

func ExampleRuneCount() {
	runes.RuneCount()
	// Output:
	// Rune count in नमस्ते दुनिया: 13
	// Rune count in Привет мир: 10
	// Rune count in こんにちは世界: 7
	//
	// Rune verbose unicode value: U+0928 'न' Its index: 0 As a string "न"
	// Rune verbose unicode value: U+092E 'म' Its index: 3 As a string "म"
	// Rune verbose unicode value: U+0938 'स' Its index: 6 As a string "स"
	// Rune verbose unicode value: U+094D '्' Its index: 9 As a string "्"
	// Rune verbose unicode value: U+0924 'त' Its index: 12 As a string "त"
	// Rune verbose unicode value: U+0947 'े' Its index: 15 As a string "े"
	// Rune verbose unicode value: U+0020 ' ' Its index: 18 As a string " "
	// Rune verbose unicode value: U+0926 'द' Its index: 19 As a string "द"
	// Rune verbose unicode value: U+0941 'ु' Its index: 22 As a string "ु"
	// Rune verbose unicode value: U+0928 'न' Its index: 25 As a string "न"
	// Rune verbose unicode value: U+093F 'ि' Its index: 28 As a string "ि"
	// Rune verbose unicode value: U+092F 'य' Its index: 31 As a string "य"
	// Rune verbose unicode value: U+093E 'ा' Its index: 34 As a string "ा"
}

What does Range do?

Let’s peel back some of the magic 🌈 of what range is doing for us in place of a regular for loop. Remember we established that a rune is int32 which would be four bytes or four int8 (8 + 8 + 8 + 8 = 32). So depending on how big the rune is depends on how many bytes you need. Well the unicode/utf8 package has a function just for that!

Coding Time!

`runes.go`

// ByteToRuneForLoop shows what the range keyword does by counting the amount
// of bytes and moving forward by that width
func ByteToRuneForLoop() {
	for i, width := 0, 0; i < len(helloWorldRussian); i += width {
		r, w := utf8.DecodeRuneInString(helloWorldRussian[i:])
		fmt.Printf("Rune verbose unicode value: %#U Its index: %d As a string %q\n",
			r, i, string(r))
		width = w
	}
}

`example_test.go`

func ExampleByteToRuneForLoop() {
	runes.ByteToRuneForLoop()
	// Output:
	// Rune verbose unicode value: U+041F 'П' Its index: 0 As a string "П"
	// Rune verbose unicode value: U+0440 'р' Its index: 2 As a string "р"
	// Rune verbose unicode value: U+0438 'и' Its index: 4 As a string "и"
	// Rune verbose unicode value: U+0432 'в' Its index: 6 As a string "в"
	// Rune verbose unicode value: U+0435 'е' Its index: 8 As a string "е"
	// Rune verbose unicode value: U+0442 'т' Its index: 10 As a string "т"
	// Rune verbose unicode value: U+0020 ' ' Its index: 12 As a string " "
	// Rune verbose unicode value: U+043C 'м' Its index: 13 As a string "м"
	// Rune verbose unicode value: U+0438 'и' Its index: 15 As a string "и"
	// Rune verbose unicode value: U+0440 'р' Its index: 17 As a string "р"
}

Source File 📄

The Source File

Test File 📝

The Test File

One Byte At A Time 😹#

Setup#

Deepen Understanding of Runes#

Coding Time!#

runes.go#

example_test.go#

Byte Count#

Coding Time!#

runes.go#

example_test.go#

Rune Count#

Coding Time!#

runes.go#

example_test.go#

What does Range do?#

Coding Time!#

runes.go#

example_test.go#

Source File 📄#

Test File 📝#

One Byte At A Time 😹

Setup

Deepen Understanding of Runes

Coding Time!

`runes.go`

`example_test.go`

Byte Count

Coding Time!

`runes.go`

`example_test.go`

Rune Count

Coding Time!

`runes.go`

`example_test.go`

What does Range do?

Coding Time!

`runes.go`

`example_test.go`

Source File 📄

Test File 📝