10.String Type - programming in go
String Type
Let’s have a look at String Type. String has some complexity to it, but let’s have a look at the simplicity of String. Remember, Go has some goals: efficient compilation, efficient execution, and ease of programming.
Let’s create a variable that is of type String. A string can be created with double quotes ""
or with backticks ````. A string created with backticks is a raw string literal, so it will include any returns, spaces, whatever.
package main
import (
"fmt"
)
var string_literal string
func main() {
s := "Hello, playground"
string_literal = `"Hello,
playground"`
fmt.Println(s)
fmt.Printf("%T\n", s)
fmt.Println(string_literal)
fmt.Printf("%T\n", string_literal)
}
What the Go spec says about String Types is,
A string type represents the set of string values. A string value is a (possibly empty) sequence of bytes. Strings are immutable: once created, it is impossible to change the contents of a string. The predeclared string type is string.
So, it’s a sequence of bytes. What that means is that it is a slice of bytes. A byte
is a type. It is an alias for an int8
. Let’s use conversion to convert this string to a slice of bytes, bs := []byte(s)
:
package main
import (
"fmt"
)
var string_literal string
func main() {
s := "Hello, playground"
fmt.Println(s)
fmt.Printf("%T\n", s)
bs := []byte(s)
fmt.Println(bs)
fmt.Printf("%T\n", bs)
}
Notice that this returns the slice of bytes [72 101 108 108 111 44 32 112 108 97 121 103 114 111 117 110 100]
If we look at the ASCII coding scheme, we can see that 72 is a capital “H”, and 101
, 108
, 108
, 111
are "e"
, "l"
, "l"
, and "o"
, respectively. This shows that in this string, we have used a coding scheme to represent letters of the alphabet. Those letters are being represented by the numbers in this slice of bytes. In this case, they are being represented as decimal numbers. In programming, we also represent things in hexadecimal numbers. Hexadecimal is just another way of representing numbers; it’s base 16 whereas decimal numbers are base 10.
In UTF-8, a UTF-8 code point is 1 to 4 bytes. Each code point corresponds to a character.
We have just seen that 72
corresponds to "H"
in this case. And in the ASCII coding scheme, 72
corresponds to the binary 100 1000
, which is 48
in hexadecimal.
To take a look deeper into the internals, and the hexadecimal, UTF-8 way, we can have a look in the fmt
package documentation, particularly %x
%s the uninterpreted bytes of the string or slice
%q a double-quoted string safely escaped with Go syntax
%x base 16, lower-case, two characters per byte
%X base 16, upper-case, two characters per byte
And if we scroll down to the “Other flags:” section
+ always print a sign for numeric values;
guarantee ASCII-only output for %q (%+q)
- pad with spaces on the right rather than the left (left-justify the field)
# alternate format: add leading 0 for octal (%#o), 0x for hex (%#x);
0X for hex (%#X); suppress 0x for %p (%#p);
for %q, print a raw (backquoted) string if strconv.CanBackquote
returns true;
always print a decimal point for %e, %E, %f, %F, %g and %G;
do not remove trailing zeros for %g and %G;
write e.g. U+0078 'x' if the character is printable for %U (%#U).
' ' (space) leave a space for elided sign in numbers (% d);
put spaces between bytes printing strings or slices in hex (% x, % X)
0 pad with leading zeros rather than spaces;
for numbers, this moves the padding after the sign
Particularly, the like write e.g. U+0078 'x' if the character is printable for %U (%#U).
Let’s have a look at some of the UTF-8 characters, using %#U
:
package main
import (
"fmt"
)
var string_literal string
func main() {
s := "Hello, playground"
fmt.Println(s)
fmt.Printf("%T\n", s)
bs := []byte(s)
fmt.Println(bs)
fmt.Printf("%T\n", bs)
for i := 0; i < len(s); i++ {
fmt.Printf("%#U ", s[i])
}
}
Another thing we can do is loop over the range of s
and print out the index position and the hex value
package main
import (
"fmt"
)
var string_literal string
func main() {
s := "Hello, playground"
fmt.Println(s)
fmt.Printf("%T\n", s)
bs := []byte(s)
/* Show each character from string s in decimal */
fmt.Println(bs)
fmt.Printf("%T\n", bs)
/* Show each character from string s in UTF-8 code point */
for i := 0; i < len(s); i++ {
fmt.Printf("%#U ", s[i])
}
fmt.Println("")
/* Show each character from string s in hexadecimal */
for i, v := range s {
fmt.Printf("At index position %d we have hex %#x\n", i, v)
}
}
You can verify in the output with the ASCII coding scheme and see it matches up:
Binary | Hex | Glyph |
---|---|---|
100 1000 | 48 | H |
110 0101 | 65 | e |
110 1100 | 6C | l |
110 1100 | 6C | l |
110 1111 | 6F | o |
010 1100 | 2C | , |
010 0000 | 20 | space |
111 0000 | 70 | p |
110 1100 | 6C | l |
110 0001 | 61 | a |
111 1001 | 79 | y |
110 0111 | 67 | g |
111 0010 | 72 | r |
110 1111 | 6F | o |
111 0101 | 75 | u |
110 1110 | 6E | n |
110 0100 | 64 | d |
Each character from the string "Hello, playground"
in:
Decimal notation
[72 101 108 108 111 44 32 112 108 97 121 103 114 111 117 110 100]
UTF-8 code point
U+0048 'H' U+0065 'e' U+006C 'l' U+006C 'l' U+006F 'o' U+002C ',' U+0020 ' ' U+0070 'p' U+006C 'l' U+0061 'a' U+0079 'y' U+0067 'g' U+0072 'r' U+006F 'o' U+0075 'u' U+006E 'n' U+0064 'd'
Hexadecimal
0x48 0x65 0x6c 0x6c 0x6f 0x2c 0x20 0x70 0x6c 0x61 0x79 0x67 0x72 0x6f 0x75 0x6e 0x64
Each code point is known as a rune
, which is an alias for int32
. Each rune
is a code point in UTF-8.
Diving Deeper
To recap, we write strings either enclosed in double quotes or as string literals in backticks. All the code you write in Go is encoded as UTF-8, but that doesn’t mean that all of your strings are going to be UTF-8 code points. You could have bytes which don’t correspond to a code point.
A string is a slice of bytes. It is immutable, which means its value cannot be modified. We can assign a new value:
package main
import (
"fmt"
)
var string_literal string
func main() {
s := "Hello, playground"
s = "Hello, go programmer"
fmt.Println(s)
}
So, you can assign a new value, but you cannot change the bytes of a string. So a string is an immutable slice of bytes. You can do conversion and see that slice of bytes, and see how it translates back to the ASCII UTF-8 coding scheme.
On the Golang website they have Asian characters in the example. Go ahead and use the techniques from this lesson and see that those characters are more than one byte.
package main
import "fmt"
func main() {
fmt.Println("Hello,Sangam")
}