HENNG

BookKeeper

I previously wrote an introduction to Pulsar. Pulsar uses BookKeeper for its storage layer, so here is a brief introduction to BookKeeper as well. These notes were originally written in English.

Introduction / Key Terms

BookKeeper is a replicated log service which can be used to build replicated state machines.

Main Features

  • efficient writes (sequential appends)
  • replication for fault tolerance
  • scalable throughput

Key Terms:

  • bookie: a BookKeeper storage server
  • ledger: a log stream (a sequence of entries)
  • ledger entry: a single unit (record) of a log
  • ensemble: the group of bookies that stores a ledger
  • quorum: the number of bookies each entry is replicated to
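
To make the ensemble and quorum terms concrete, here is a minimal sketch of how entries might be spread across an ensemble. It assumes simple round-robin placement; the class `QuorumSketch` and the parameter names `ensembleSize` and `writeQuorumSize` are mine for illustration, not BookKeeper's API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of round-robin striping; not one of BookKeeper's own classes.
class QuorumSketch {
    // Indices (within the ensemble) of the bookies that store a given entry.
    static List<Integer> writeSet(long entryId, int ensembleSize, int writeQuorumSize) {
        List<Integer> bookies = new ArrayList<>();
        int first = (int) (entryId % ensembleSize);
        for (int i = 0; i < writeQuorumSize; i++) {
            bookies.add((first + i) % ensembleSize);
        }
        return bookies;
    }

    public static void main(String[] args) {
        // Ensemble of 3 bookies, each entry replicated to 2 of them.
        for (long entryId = 0; entryId < 4; entryId++) {
            System.out.println("entry " + entryId + " -> bookies " + writeSet(entryId, 3, 2));
        }
    }
}
```

With three bookies and two replicas per entry, entry 2 lands on bookies 2 and 0, so consecutive entries spread the write load over the whole ensemble.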

Here is a brief architecture diagram of BookKeeper.

[Figure: BookKeeper architecture]

Write / Read

Typically, we use BookKeeper in the following steps.

  1. Create a bookkeeper client;

  2. Create a ledger;

  3. Write to the ledger (add entries);

  4. Close the ledger;

  5. Open the same ledger for reading (by ledgerId);

  6. Read from the ledger;

  7. Close the ledger again;

  8. Close the bookkeeper client.

A quick demo is ready for you.

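import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;

import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerEntry;
import org.apache.bookkeeper.client.LedgerHandle;
import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BookKeeperDemo {

    private static final Logger logger = LoggerFactory.getLogger(BookKeeperDemo.class);

    private static final String PASSWD = "passwd";
    private static final Charset ENCODING = StandardCharsets.UTF_8;

    @Test
    public void sampleTest() throws Exception {
        // Steps 1-2: connect through ZooKeeper (assumed to run on localhost:2181) and create a ledger.
        BookKeeper bookKeeper = new BookKeeper("localhost:2181");
        LedgerHandle ledger = bookKeeper.createLedger(BookKeeper.DigestType.MAC, PASSWD.getBytes(ENCODING));
        long ledgerId = ledger.getId();
        logger.info("Writing to ledger: {}", ledgerId);

        // Step 3: append ten entries.
        for (int i = 0; i < 10; i++) {
            String content = "entry-" + i;
            ledger.addEntry(content.getBytes(ENCODING));
        }

        // Step 4: closing the ledger makes it read-only.
        ledger.close();

        // Step 5: reopen the same ledger by id, with the same digest type and password.
        ledger = bookKeeper.openLedger(ledgerId, BookKeeper.DigestType.MAC, PASSWD.getBytes(ENCODING));

        // Step 6: read every entry from the first to the last confirmed one.
        Enumeration<LedgerEntry> entries = ledger.readEntries(0, ledger.getLastAddConfirmed());
        while (entries.hasMoreElements()) {
            LedgerEntry entry = entries.nextElement();
            String content = new String(entry.getEntry(), ENCODING);
            logger.info("Entry {} length={} content='{}'", entry.getEntryId(), entry.getLength(), content);
        }

        // Steps 7-8: release the ledger handle and the client.
        ledger.close();
        bookKeeper.close();
    }

}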

It is worth noting that only one process can write (append) to a ledger, and that the ledger becomes read-only once it is closed, whether normally or because of an exception. What happens when an entry is added? The bookie first appends it to the EntryLog, then records its index in the LedgerCache, which is used for fast lookup, and finally appends a transaction record to the Journal. The LedgerHandle also creates a node in ZooKeeper for the ledger's metadata, so that clients know which bookies hold the data.
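
The three steps of the write path can be sketched with a small in-memory model. This is only an illustration under my own assumptions: `ToyBookie` and its fields are hypothetical stand-ins for the EntryLog, LedgerCache and Journal, and nothing here touches disk or uses BookKeeper's classes.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of the bookie write path; all names are hypothetical.
class ToyBookie {
    // Stands in for the EntryLog file: entries of all ledgers appended back to back.
    private final ByteArrayOutputStream entryLog = new ByteArrayOutputStream();
    // Stands in for the LedgerCache: "ledgerId:entryId" -> {offset, length} in the entry log.
    private final Map<String, int[]> ledgerCache = new HashMap<>();
    // Stands in for the Journal: one record per add operation.
    private final List<String> journal = new ArrayList<>();

    void addEntry(long ledgerId, long entryId, byte[] data) {
        int offset = entryLog.size();
        entryLog.write(data, 0, data.length);                                        // 1. append to the entry log
        ledgerCache.put(ledgerId + ":" + entryId, new int[] {offset, data.length});  // 2. index the entry
        journal.add("ADD " + ledgerId + ":" + entryId);                              // 3. record the transaction
    }

    // Look the entry up through the index, then slice it out of the entry log.
    byte[] readEntry(long ledgerId, long entryId) {
        int[] location = ledgerCache.get(ledgerId + ":" + entryId);
        if (location == null) {
            return null; // unknown entry
        }
        return Arrays.copyOfRange(entryLog.toByteArray(), location[0], location[0] + location[1]);
    }

    int journalSize() {
        return journal.size();
    }
}
```

The index is what makes lookups cheap: a read never scans the log, it jumps straight to the recorded offset.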

In this workflow, a ledger is read only after it has been closed. If the first read returns an invalid entry, the request is retried on the next bookie.
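
Here is a minimal sketch of that fallback, assuming each stored copy carries a checksum recorded at write time. I use CRC32 as a stand-in for the digest; `ReadFallbackSketch` and `Replica` are hypothetical names, not the BookKeeper client API.

```java
import java.util.List;
import java.util.zip.CRC32;

// Toy model of read fallback across replicas; not the BookKeeper client API.
class ReadFallbackSketch {
    // One stored copy of an entry: the payload plus the checksum recorded at write time.
    static final class Replica {
        final byte[] payload;
        final long checksum;

        Replica(byte[] payload, long checksum) {
            this.payload = payload;
            this.checksum = checksum;
        }
    }

    static long crc(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Try each replica in order and return the first payload whose checksum still matches.
    static byte[] read(List<Replica> replicas) {
        for (Replica replica : replicas) {
            if (replica != null && crc(replica.payload) == replica.checksum) {
                return replica.payload;
            }
        }
        throw new IllegalStateException("no valid replica found");
    }
}
```

A `null` element models a bookie that does not have the entry at all, which is handled the same way as a corrupted copy.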

Recovery

Since closing a ledger essentially consists of writing the id of the last entry written to the ledger to ZooKeeper, the recovery procedure simply finds the last entry that was written correctly and writes it to ZooKeeper.
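
The idea can be sketched as a forward probe from the last entry known to be confirmed. This is a simplified model under my own assumptions: the set of readable entry ids and the `metadataStore` map (standing in for ZooKeeper) are hypothetical, and the real procedure involves more than this.

```java
import java.util.Map;
import java.util.Set;

// Toy model of ledger recovery; the metadata map stands in for ZooKeeper.
class RecoverySketch {
    // Starting after the last entry known to be confirmed, probe forward until an entry is missing.
    static long findLastEntry(long lastConfirmed, Set<Long> readableEntries) {
        long last = lastConfirmed;
        while (readableEntries.contains(last + 1)) {
            last++;
        }
        return last;
    }

    // "Closing" the ledger here just means recording the last entry id in the metadata store.
    static void recoverAndClose(long ledgerId, long lastConfirmed,
                                Set<Long> readableEntries, Map<Long, Long> metadataStore) {
        metadataStore.put(ledgerId, findLastEntry(lastConfirmed, readableEntries));
    }
}
```

Entries after the first gap are ignored, because a log is only valid as a contiguous sequence.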

There are two kinds of recovery: automatic, which is the default, and manual. An Auditor is elected as leader to watch ZooKeeper, scan the ledgers, and decide which ones need recovery. When recovery is needed, the Auditor creates nodes in ZooKeeper; each ReplicationWorker watches the nodes created by the Auditor and starts the corresponding tasks.
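
The division of labour between the two roles might look roughly like this. It is a toy model only: the method names, the ledger-to-bookies map and the replacement policy are my assumptions, and plain return values stand in for the nodes the Auditor would create in ZooKeeper.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of auto-recovery; names and data structures are hypothetical.
class AutoRecoverySketch {
    // Auditor role: list every ledger with a replica on a bookie that is no longer alive.
    static List<Long> audit(Map<Long, List<String>> ledgerToBookies, Set<String> liveBookies) {
        List<Long> underReplicated = new ArrayList<>();
        for (Map.Entry<Long, List<String>> e : ledgerToBookies.entrySet()) {
            if (!liveBookies.containsAll(e.getValue())) {
                underReplicated.add(e.getKey());
            }
        }
        Collections.sort(underReplicated);
        return underReplicated;
    }

    // ReplicationWorker role: swap each dead bookie in an ensemble for a spare live one.
    static List<String> rereplicate(List<String> ensemble, Set<String> liveBookies) {
        Deque<String> spares = new ArrayDeque<>();
        for (String bookie : new TreeSet<>(liveBookies)) {
            if (!ensemble.contains(bookie)) {
                spares.add(bookie);
            }
        }
        List<String> repaired = new ArrayList<>();
        for (String bookie : ensemble) {
            if (liveBookies.contains(bookie) || spares.isEmpty()) {
                repaired.add(bookie);        // keep live bookies (or dead ones if no spare exists)
            } else {
                repaired.add(spares.poll()); // replace a dead bookie with the next spare
            }
        }
        return repaired;
    }
}
```

Splitting detection from repair means a single Auditor can hand out work to many workers.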

Performance

There is a reference by Yahoo at Hadoop in China 2011.

Pulsar / Managed-Ledger

How does Pulsar use BookKeeper? Pulsar builds a library on top of BookKeeper ledgers called ManagedLedger, which spans multiple ledgers. It opens a ledger (creating a new one if none exists) when a persistent topic is created, adds entries when messages are published, and reads entries (from the cache first, falling back to the ledger when the cache cannot serve them) when messages are dispatched.
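
To show what "multiple ledgers behind one log" could look like, here is a small in-memory sketch. `ToyManagedLedger`, the rollover threshold `maxEntriesPerLedger` and the `{ledgerIndex, entryId}` position are my assumptions for illustration; this is not Pulsar's ManagedLedger API and it leaves the cache out.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a log that spans several ledgers; not Pulsar's ManagedLedger API.
class ToyManagedLedger {
    private final int maxEntriesPerLedger;
    // Each inner list stands in for one BookKeeper ledger.
    private final List<List<String>> ledgers = new ArrayList<>();

    ToyManagedLedger(int maxEntriesPerLedger) {
        this.maxEntriesPerLedger = maxEntriesPerLedger;
        ledgers.add(new ArrayList<>()); // a first ledger is created along with the topic
    }

    // Publish: append to the current ledger, rolling over to a new one when it is full.
    // Returns the position of the message as {ledgerIndex, entryId}.
    long[] addEntry(String message) {
        List<String> current = ledgers.get(ledgers.size() - 1);
        if (current.size() >= maxEntriesPerLedger) {
            current = new ArrayList<>();
            ledgers.add(current);
        }
        current.add(message);
        return new long[] {ledgers.size() - 1, current.size() - 1};
    }

    // Dispatch: resolve a position back to the message it points at.
    String readEntry(long[] position) {
        return ledgers.get((int) position[0]).get((int) position[1]);
    }

    int ledgerCount() {
        return ledgers.size();
    }
}
```

Because a position names both the ledger and the entry, readers can keep a stable cursor even as the log rolls over to new ledgers.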

