Kubernetes Redis cluster vs native Redis cluster benchmark

The Kubernetes Redis cluster has 3 masters and 3 slaves.

The native Redis cluster is deployed on 3 servers; each server holds one master and one slave.

The conclusion is that the Kubernetes cluster is around 30% slower than the native cluster.

This is understandable since there are many layers in Kubernetes (container runtime, kube-dns, ...), but the benefit is that Kubernetes handles crashes for us. Which one to choose depends on your business logic.

# ./redis-benchmark -h 10.111.79.75 -p 6379 -n 250000 -

| Test Case | Kubernetes Redis Cluster (requests/s) | Native Redis Cluster (requests/s) |
| --- | --- | --- |
| PING_INLINE | 34886.96 | 44389.20 |
| PING_BULK | 37335.72 | 49309.66 |
| SET | 34373.71 | 47801.15 |
| GET | 37174.72 | 47258.98 |
| INCR | 36148.06 | 46772.69 |
| LPUSH | 36374.22 | 45821.12 |
| RPUSH | 34760.85 | 50854.36 |
| LPOP | 36437.84 | 49504.95 |
| RPOP | 35350.68 | 50484.65 |
| SADD | 37832.93 | 50617.53 |
| HSET | 36411.30 | 50771.73 |
| SPOP | 36748.49 | 48694.98 |
| LPUSH (needed to benchmark LRANGE) | 37358.04 | 51020.41 |
| LRANGE_100 (first 100 elements) | 36851.41 | 49144.88 |
| LRANGE_300 (first 300 elements) | 31277.37 | 45737.29 |
| LRANGE_500 (first 450 elements) | 23182.49 | 49603.18 |
| LRANGE_600 (first 600 elements) | 23992.32 | 50566.34 |
| MSET (10 keys) | 35582.12 | 50885.41 |

Tensorflow

Tensorflow record

  1. Download Tensorflow

  2. Create a simple model

    https://www.tensorflow.org/tutorials/keras/save_and_restore_models

  3. Training

    The primary use case is to automatically save checkpoints during and at the end of training. This way you can use a trained model without having to retrain it, or pick up training where you left off in case the training process was interrupted.

    tf.keras.callbacks.ModelCheckpoint is a callback that performs this task. The callback takes a couple of arguments to configure checkpointing.

    A. Deploy using Docker

    1. Install TensorFlow Serving with Docker

    2. Run the Docker image

      docker run -p 8501:8501 \
      --mount type=bind,source=<YOUR PATH>/tensorflow/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_cpu,target=/models/half_plus_two \
      -e MODEL_NAME=half_plus_two -t tensorflow/serving &

    B. Deploy with Kubernetes

    1. Export the Inception model

      git clone https://github.com/tensorflow/serving
      cd serving
      rm -rf ./models/inception

    Build TensorFlow Serving Inception model exporter

    tools/bazel_in_docker.sh bazel build -c opt tensorflow_serving/example:inception_saved_model

    Error: /tensorflow/serving/.cache/_bazel_jfan/7a4a59242df6fd82e0e4108ffd6fce39/external/org_tensorflow/tensorflow/core/BUILD:2101:1: no such package '@zlib_archive//': java.io.IOException: Error downloading [https://mirror.bazel.build/zlib.net/zlib-1.2.11.tar.gz, https://zlib.net/zlib-1.2.11.tar.gz] to /tensorflow/serving/.cache/_bazel_jfan/7a4a59242df6fd82e0e4108ffd6fce39/external/zlib_archive/zlib-1.2.11.tar.gz: All mirrors are down: [sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target] and referenced by '@org_tensorflow//tensorflow/core:lib_internal_impl'

OAuth2

http://www.rfcreader.com/#rfc6749

OAuth 2.0

OAuth 2.0 defines four grant types (modes). Since all of our resources are for internal use, we use the password mode (resource owner password credentials grant).

Authorization server:

  1. Authorization endpoint.
  2. Token endpoint.

Resource server

We deploy the authorization server and the resource servers separately for scalability. Since we have many separate resource servers, each of them has its own client ID and client secret.

A user first registers a username and password together with the client info. Then, each time, the user asks the authorization server for an access token using these credentials, and then uses the access token to access the corresponding resource server and get resources.
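As a minimal sketch of the password mode described above, the client POSTs a form-encoded body to the token endpoint; the credentials below are hypothetical placeholders, and the HTTP transport (with the client ID and secret in a Basic Authorization header) is only described in comments:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PasswordGrant {
    // Build the form-encoded body for an RFC 6749 section 4.3
    // (resource owner password credentials) token request.
    public static String tokenRequestBody(String username, String password) {
        return "grant_type=password"
                + "&username=" + URLEncoder.encode(username, StandardCharsets.UTF_8)
                + "&password=" + URLEncoder.encode(password, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // In a real flow this body is POSTed to the authorization server's
        // token endpoint, with the client id and client secret sent in an
        // HTTP Basic Authorization header; the JSON response carries the
        // access token used against the resource server.
        System.out.println(tokenRequestBody("alice", "s3cret"));
    }
}
```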

Union Find

1. Quick Find

Data structure.

  • Integer array id[] of size N.
  • Interpretation: p and q are connected if they have the same id.

Find

Check if p and q have the same id.

Time complexity: O(1)

Union

To merge the components containing p and q, change all entries whose id equals id[p] to id[q].

Time complexity: O(N)

public class QuickFind {
    private int[] id;

    public QuickFind(int N) {
        id = new int[N];
        for (int i = 0; i < N; i++)
            id[i] = i; // set id of each object to itself
    }

    public boolean find(int p, int q) {
        return id[p] == id[q];
    }

    public void union(int p, int q) {
        int pid = id[p];
        for (int i = 0; i < id.length; i++)
            if (id[i] == pid) id[i] = id[q];
    }
}

2. Quick Union

Data structure.

  • Integer array id[] of size N.
  • Interpretation: id[i] is parent of i.
  • Root of i is id[id[id[...id[i]...]]].

Find

Check if p and q have the same root.

Time complexity: O(N)

Union

Set the id of p's root to the id of q's root.

Time complexity: O(N)

In practice it is often cheaper than quick-find's union because we only follow the two paths to the roots instead of scanning the entire array, but in the worst case (a tall tree) it is still O(N).

public class QuickUnion {
    private int[] id;

    public QuickUnion(int N) {
        id = new int[N];
        for (int i = 0; i < N; i++) id[i] = i;
    }

    private int root(int i) {
        while (i != id[i]) i = id[i];
        return i;
    }

    public boolean find(int p, int q) {
        return root(p) == root(q);
    }

    public void union(int p, int q) {
        int i = root(p);
        int j = root(q);
        id[i] = j;
    }
}

3. Weighted Quick Union

Data structure.

  • Almost identical to quick-union.
  • Maintain extra array sz[] to count number of elements in the tree rooted at i.

Find

Check if p and q have the same root.

Time complexity: O(logN)

Union

  • merge smaller tree into larger tree
  • update the sz[] array.

Time complexity: O(logN)

...
if (sz[i] < sz[j]) {
    id[i] = j;
    sz[j] += sz[i];
} else {
    id[j] = i;
    sz[i] += sz[j];
}
...

4. Weighted Quick Union with Path Compression

Path compression. Just after computing the root of i, set the id of each examined node to root(i).

Standard implementation: add second loop to root() to set the id of each examined node to the root.

Simpler one-pass variant: make every other node in path point to its grandparent.

Data structure: same as weighted quick union, with path compression applied in root().

Find

Same as QU. Time complexity: any sequence of M union-find operations on N objects takes O(N + M log* N) array accesses, where log* is the iterated logarithm.

Union

Same as QU. Time complexity: O(N + M log* N).

public int root(int i) {
    while (i != id[i]) {
        id[i] = id[id[i]]; // point i to its grandparent (path compression)
        i = id[i];
    }
    return i;
}

Kafka

Kafka is a distributed publish-subscribe messaging system. Publish-subscribe refers to a pattern of communication in distributed systems where the producers/publishers of data produce data categorized in different classes without any knowledge of how the data will be used by the subscribers. The consumers/subscribers can express interest in specific classes of data and receive only those messages. Kafka uses a commit log to persist data. The commit log is an ordered, immutable, append-only data structure that is the main abstract data structure that Kafka manages. The main advantage of Kafka is the fact that it provides a unifying data backbone from which all systems in the organization can consume data independently and reliably.

Topics

Topics represent a user-defined category to which messages are published. An example topic one might find at an advertising company could be AdClickEvents. All consumers of data read from one or more topics. Topics are generally maintained as a partitioned log (see below).

Producers

Producers are processes that publish messages to one or more topics in the Kafka cluster.

Consumers

Consumers are processes that subscribe to topics and read messages from the Kafka cluster.

Partitions

Topics are divided into partitions. A partition represents the unit of parallelism in Kafka. In general, a higher number of partitions means higher throughput. Within each partition each message has a specific offset that consumers use to keep track of how far they have progressed through the stream. Consumers may use Kafka partitions as semantic partitions as well.

Brokers

Brokers in Kafka are responsible for message persistence and replication. Producers talk to brokers to publish messages to the Kafka cluster and consumers talk to brokers to consume messages from the Kafka cluster.

Replication

Kafka uses replication for fault tolerance. For each partition of a Kafka topic, Kafka will choose a leader and zero or more followers from servers in the cluster and replicate the log across them. The number of replicas, including the leader, is determined by a replication factor. The leader of the partition handles all the reads and writes while the followers consume messages from the leader in order to replicate the log. Since both the leader and followers may fail when a server in the cluster fails, the leader keeps track of alive followers (the in-sync replicas, ISR) and removes unhealthy ones. If the leader dies, an alive follower becomes the new leader of the partition. This mechanism allows Kafka to remain functional when failures exist.
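To illustrate how a producer connects to brokers and chooses a durability level, here is a sketch of a minimal producer configuration; the broker addresses are hypothetical placeholders, and the keys are the standard Kafka producer configuration names:

```java
import java.util.Properties;

public class ProducerConfigSketch {
    // Build a minimal Kafka producer configuration.
    // The broker list below is a hypothetical placeholder.
    public static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        // acks=all waits until the leader and all in-sync replicas (ISR)
        // have acknowledged the message, matching the replication
        // mechanism described above.
        props.put("acks", "all");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps().getProperty("acks"));
    }
}
```

In a real producer these properties would be passed to a KafkaProducer instance; with acks=all a publish only succeeds once the whole ISR has the message, trading latency for durability.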

ARIMA

ARIMA: Autoregressive Integrated Moving Average model

ARIMA(p,d,q)

Advantage: simple to build and apply.

Disadvantages:

  1. The time series (possibly after differencing) must be stationary.

  2. It captures only linear relationships.

Stationary means no trend and no seasonality.

p: number of autoregressive (AR) lags.

d: number of times the series must be differenced to become stationary (Integrated).

q: number of lagged forecast errors in the moving average (MA) part.

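The missing formula image presumably showed the standard ARIMA(p, d, q) equation. In lag-operator form, with L the lag operator, phi the AR coefficients, theta the MA coefficients, and epsilon_t white noise, it is:

```latex
\left(1 - \sum_{i=1}^{p} \phi_i L^i\right)(1 - L)^d X_t
  = \left(1 + \sum_{j=1}^{q} \theta_j L^j\right)\varepsilon_t
```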

Multi-thread

  1. synchronized & volatile

    To safely share a variable between multiple threads, you must use synchronized or volatile.

  2. mutual exclusion & visibility

    Mutual exclusion: only one thread can hold a given lock at a time.

    Visibility: a later thread is guaranteed to see the changes an earlier thread made to shared data.

  3. Atomicity

    An operation that looks atomic in source code (such as i++) may not be thread safe, because it is really a read-modify-write.

    Each thread may keep its own cached copy of a variable in its working memory, so without synchronization updates may never become visible to other threads.
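The points above can be sketched with a shared counter: without the synchronized keyword on increment(), the two threads below could interleave their read-modify-write of count and lose updates; with it, the result is deterministic.

```java
public class SyncCounter {
    private int count = 0;

    // synchronized gives both mutual exclusion and visibility:
    // only one thread runs increment() at a time, and the updated
    // value is published to whichever thread reads count next.
    public synchronized void increment() { count++; }

    public synchronized int get() { return count; }

    public static void main(String[] args) throws InterruptedException {
        SyncCounter c = new SyncCounter();
        Runnable task = () -> { for (int i = 0; i < 100000; i++) c.increment(); };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(c.get()); // 200000 with synchronized; may be less without it
    }
}
```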

Sum problem

1. Two Sum

Given an array of integers, return indices of the two numbers such that they add up to a specific target.

Solution: the first thought might be binary search, but it is not a good fit, since it needs an O(NlogN) sort first plus O(NlogN) for the searches, and sorting loses the original indices.

A HashMap is a better way: one pass, O(N).

// requires java.util.HashMap
public int[] twoSum(int[] nums, int target) {
    int[] result = new int[2];
    HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
    for (int i = 0; i < nums.length; i++) {
        if (map.get(target - nums[i]) != null) {
            result[0] = map.get(target - nums[i]);
            result[1] = i;
            return result;
        }
        map.put(nums[i], i);
    }
    return result;
}

2. Three Sum

Given an array S of n integers, are there elements a, b, c in S such that a + b + c = 0? Find all unique triplets in the array which gives the sum of zero.

Solution: Brute force would be O(N^3).

Either way, we must first fix one number; the rest of the problem then reduces to two sum.

HashSet: O(N^2)

Sort + two pointers: O(NlogN + N^2) = O(N^2)
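Since no code is given for this one, here is a sketch of the sort-plus-two-pointers approach described above (the class and method names are mine):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ThreeSum {
    public static List<List<Integer>> threeSum(int[] nums) {
        List<List<Integer>> res = new ArrayList<>();
        Arrays.sort(nums);
        for (int i = 0; i < nums.length - 2; i++) {
            // skip duplicate choices of the first (fixed) number
            if (i > 0 && nums[i] == nums[i - 1]) continue;
            int lo = i + 1, hi = nums.length - 1;
            while (lo < hi) { // two-sum with two pointers on the sorted remainder
                int sum = nums[i] + nums[lo] + nums[hi];
                if (sum == 0) {
                    res.add(Arrays.asList(nums[i], nums[lo], nums[hi]));
                    while (lo < hi && nums[lo] == nums[lo + 1]) lo++; // skip duplicates
                    while (lo < hi && nums[hi] == nums[hi - 1]) hi--;
                    lo++;
                    hi--;
                } else if (sum < 0) {
                    lo++;
                } else {
                    hi--;
                }
            }
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(threeSum(new int[]{-1, 0, 1, 2, -1, -4}));
        // [[-1, -1, 2], [-1, 0, 1]]
    }
}
```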

Eight Queen

Eight Queen Problem

This idea is from https://www.cnblogs.com/bigmoyan/p/4521683.html. It greatly reduces the space complexity compared to any 2D-array solution.

public class EightQueen {
    // board size / number of queens
    int n = 8;
    // r[row] = column of the queen placed in that row
    int[] r = new int[n];
    int count = 0;

    public void queen(int row) {
        if (row == n) {
            count++;
            return;
        }
        for (int i = 0; i < n; i++) {
            r[row] = i;
            if (isValid(row)) {
                queen(row + 1);
            }
        }
    }

    private boolean isValid(int row) {
        for (int i = 0; i < row; i++) {
            // same column or same diagonal as an earlier queen
            if (r[row] == r[i] || row - r[row] == i - r[i] || row + r[row] == i + r[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        EightQueen eq = new EightQueen();
        eq.queen(0);
        System.out.println(eq.count); // 92
    }
}