Pakistan's First Oracle Blog

Blog By Fahd Mirza Chughtai
Updated: 4 hours 59 min ago

Oracle Cloud is Rich with Spatial

18 hours 58 min ago
Spatial is a technology of the future. Massive amounts of location data will be generated, cleansed, and stored, and Oracle's autonomous databases in Oracle Cloud are ready to take on that challenge with a solid, time-tested database offering.

Machine learning and graph are just the elementary building blocks of this spatial offering from Oracle.

Oracle Database includes native spatial data support, rich location query and analysis, native geocoding and routing, and map visualization to support location-enabled business intelligence applications and services. With a network data model, raster and gridded data analysis, 3D and point cloud operations, a location tracking server, and topology management, Oracle Database provides a comprehensive platform for GIS.
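As a small illustration of the location query support, here is a hedged sketch of a proximity search in plain SQL; the customers table, its SDO_GEOMETRY column location, and the spatial index on that column are all assumptions for the example:

-- find customers within 10 km of a point (longitude/latitude in SRID 4326)
SELECT c.cust_name
  FROM customers c
 WHERE SDO_WITHIN_DISTANCE(
         c.location,
         SDO_GEOMETRY(2001, 4326, SDO_POINT_TYPE(151.21, -33.87, NULL), NULL, NULL),
         'distance=10 unit=KM') = 'TRUE';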

It’s easy for developers to add spatial capabilities to applications – with standards-based SQL and Java APIs, JSON and REST support, and integration with database tools, Oracle Business Intelligence, and applications.

With dramatically fast spatial index and query performance, Exadata integration, and support for database features such as partitioning, security, distributed transactions, and sharding, Oracle Database powers the most demanding, large-scale geospatial applications – from cloud-based location services to transportation, utilities, agriculture, asset management, LiDAR analysis, energy and natural resources, and planning.

With 12.2, Oracle continues to deliver the most advanced spatial and graph database platform for applications from geospatial and location services to Internet of Things, social media, and big data.

Categories: DBA Blogs

Data Encryption in Oracle Cloud

19 hours 4 min ago
The world's leading financial institutions run their mission-critical databases on Oracle, and their biggest concerns about moving those databases to the cloud are encryption in transit and at rest.
Oracle TDE prevents attacks from users attempting to read sensitive data from tablespace files, and from users attempting to read information from acquired disks or backups, by denying access to the clear text data. Oracle TDE uses a two-tier encryption key architecture to enforce clear separation of keys from encrypted data. The encryption keys for this feature are all managed by Oracle TDE, and the encryption algorithm used is AES128.
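For example, on a database where you manage TDE yourself, creating an encrypted tablespace is a single statement once the keystore is open; the tablespace name, datafile path, and size below are arbitrary placeholders:

-- encrypt everything stored in this tablespace with AES128
CREATE TABLESPACE secure_data
  DATAFILE '/u01/app/oracle/oradata/secure_data01.dbf' SIZE 100M
  ENCRYPTION USING 'AES128'
  DEFAULT STORAGE (ENCRYPT);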

Redaction is the process of censoring or obscuring part of a text for legal or security purposes. The Data Redaction feature redacts customer data in Responsys to obfuscate consumers' Personally Identifiable Information (PII) from Responsys users. 

For example, Responsys accounts may want to redact customer data such as Email Addresses and Mobile Phone Numbers in the profile list to ensure customer data is hidden from Responsys end users. Data redaction ensures that Responsys accounts are compliant with data protection regulations to keep consumers' PII or medical records (for HIPAA compliance) confidential.
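Inside the Oracle database itself, the analogous capability is the DBMS_REDACT package. A minimal sketch, assuming a hypothetical APP.CUSTOMERS table with an EMAIL column, that redacts the mailbox portion of every email for querying users:

BEGIN
  -- redact the user-name portion of email addresses for all queries (expression 1=1)
  DBMS_REDACT.ADD_POLICY(
    object_schema         => 'APP',
    object_name           => 'CUSTOMERS',
    column_name           => 'EMAIL',
    policy_name           => 'redact_customer_email',
    function_type         => DBMS_REDACT.REGEXP,
    regexp_pattern        => DBMS_REDACT.RE_PATTERN_EMAIL_ADDRESS,
    regexp_replace_string => DBMS_REDACT.RE_REDACT_EMAIL_NAME,
    expression            => '1=1');
END;
/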

It is imperative that you test your database migrations to the cloud with these redaction techniques, and your well-architected review must include these use cases.

Oracle has implemented a “ubiquitous encryption” program with the goal of encrypting all data, everywhere, always. For customer tenant data, we use encryption both at-rest and in-transit. The Block Volumes and Object Storage services enable at-rest data encryption by default, by using the Advanced Encryption Standard (AES) algorithm with 256-bit encryption. In-transit control plane data is encrypted by using Transport Layer Security (TLS) 1.2 or later.
Categories: DBA Blogs

MLOps and Data Mining in Oracle 19c

19 hours 10 min ago
Machine learning operations, aka MLOps, is quickly gaining traction even in the database arena, and Oracle is not behind. It is heavily applying MLOps in its data mining features and introducing new algorithms and frameworks.




Data mining is the practice of automatically searching large stores of data to discover previously unknown relationships, patterns, and trends that go beyond simple analysis. It uses sophisticated mathematical algorithms to segment the data and to predict the likelihood of future events based on past events. Data mining is also known as Knowledge Discovery in Data (KDD).

This is especially pertinent when it comes to OLAP. On-Line Analytical Processing (OLAP) can be defined as fast analysis of multidimensional data. OLAP and data mining are different but complementary activities, and they can be integrated in a number of ways: OLAP can be used to analyze data mining results at different levels of granularity, while data mining can help you construct more interesting and useful cubes.

Data mining does not automatically discover information without guidance. The patterns you find through data mining are very different depending on how you formulate the problem. Each data mining model is produced by a specific algorithm. Some data mining problems can best be solved by using more than one algorithm. This necessitates the development of more than one model. For example, you might first use a feature extraction model to create an optimized set of predictors, then a classification model to make a prediction on the results.

In Oracle Data Mining, scoring is performed by SQL language functions, and it is worth understanding the different ways SQL function scoring can be invoked. Oracle Data Mining supports attributes in nested columns: a transactional table can be cast as a nested column and included in a table of single-record case data, and star schemas can similarly be cast as nested columns. With nested data transformations, Oracle Data Mining can effectively mine data originating from multiple sources and configurations.
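As a hedged sketch of SQL function scoring, assuming a classification model named dt_model and an apply view mining_data_apply_v already exist:

-- score each row with the model and return the predicted class and its probability
SELECT cust_id,
       PREDICTION(dt_model USING *)             AS predicted_class,
       PREDICTION_PROBABILITY(dt_model USING *) AS probability
  FROM mining_data_apply_v
 ORDER BY probability DESC;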



Categories: DBA Blogs

CPU Conundrum in Oracle 19c

19 hours 16 min ago
If you have managed Oracle databases on any kind of hardware, you know how important the CPU is for the optimal performance of the database. Over the years, Oracle has tried hard to come up with efficient strategies to make sure that CPU is utilized efficiently or caged properly.


Configuring the machine to optimally leverage CPU is a big ask. Normally, on top of the Bare Metal or Virtual instances, each instance of an Oracle database is configured to use a number of vCPUs by enabling Oracle Database Resource Manager (DBRM) and setting the CPU_COUNT parameter. If DBRM is not configured, the CPU_COUNT setting simply reflects the total vCPUs on the system. Enabling DBRM allows the CPU_COUNT setting to control the number of vCPUs available to the database. This applies at both the CDB (Container Database) and PDB (Pluggable Database) levels.

The most common approach to managing CPU resources is to NOT over-provision and simply allocate CPU according to what’s available. Whether CPU is allocated to Virtual Machines that each contain a single database, or CPU is allocated to individual databases residing on a single Virtual Machine, the result is the same.

Oracle offers the ability to configure “shares” for each Pluggable Database within a Container Database.  Each instance of a Container Database is given an amount of vCPU to use by enabling DBRM and setting CPU_COUNT. The Pluggable Databases within that Container Database are then given “shares” of the vCPU available to the Container Database. Each Pluggable Database then receives the designated share of CPU resources, and the system is not over-subscribed.

DBRM constantly monitors demand for CPU resources within each Pluggable Database, as well as the overall availability of CPU resources at the Container Database level. DBRM allows each Pluggable Database to automatically and immediately scale up to use more CPU resources if available in the Container Database. The ability to use Dynamic CPU Scaling is a new feature of Oracle Database 19c that allows Pluggable Databases to automatically scale CPU resources up and down in response to user demand.
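A hedged SQL sketch of these knobs – the container name and vCPU numbers are just examples – caps the CDB instance and then gives one PDB a dynamic scaling range using the 19c CPU_MIN_COUNT parameter:

-- in the CDB root: cap the instance at 16 vCPUs
ALTER SYSTEM SET cpu_count = 16 SCOPE = BOTH;

-- in the pluggable database: guarantee 2 vCPUs, allow scaling up to 8
ALTER SESSION SET CONTAINER = PDB1;
ALTER SYSTEM SET cpu_min_count = 2 SCOPE = BOTH;
ALTER SYSTEM SET cpu_count = 8 SCOPE = BOTH;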

Categories: DBA Blogs

Converged Oracle

19 hours 20 min ago
It's very interesting to note that, while some database providers, especially in the cloud, are diverging their database offerings into separate products for various use cases, Oracle is talking about converged or unified databases for on-prem and cloud use cases alike.


Converged databases support Spatial data for location awareness, JSON for document stores, IoT for device integration, in-memory technologies for real-time analytics, and of course, traditional relational data.

Mainly, these so-called converged databases are aimed at supporting mixed workloads and keeping your data in one location for disparate applications. You won't have to worry about managing and, more importantly, integrating different systems. You will synergize everything into one.

Now that looks great in theory, but we have seen adverse impacts when you try to combine things such as OLTP and DWH, or graph databases with spatial, and so on. The marriage of these different use cases might become a performance nightmare if not handled correctly.

Oracle Database at its core is a good manifestation of a converged database, as it provides support for Machine Learning, Blockchain, Graph, Spatial, JSON, REST, Events, Editions, and IoT Streaming as part of the core database.
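For instance, the JSON support is exposed through plain SQL; the orders table and its document below are made-up examples:

-- a relational table holding JSON documents, constrained to valid JSON
CREATE TABLE orders (
  id  NUMBER PRIMARY KEY,
  doc VARCHAR2(4000) CHECK (doc IS JSON)
);

INSERT INTO orders VALUES (1, '{"customer":"Acme","total":120.5}');

-- query inside the documents with standard SQL/JSON functions
SELECT JSON_VALUE(doc, '$.customer') AS customer
  FROM orders
 WHERE JSON_VALUE(doc, '$.total' RETURNING NUMBER) > 100;
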
Categories: DBA Blogs

SSIS in AWS RDS

Fri, 2020-05-22 03:38
Whenever migrating a SQL Server database from on-prem to AWS Cloud, my first preference is always to move it to AWS RDS, the managed database service. So whenever a client asks me to migrate an on-prem SQL Server database, my first question is:


Do you need to access the filesystem as part of this database's operations?

(I'm secretly wishing the answer will be NO.) But more often than not, SSIS is the deal breaker in such a migration, and the database ends up on an EC2 instance, which is still better than having it on-prem.

Managing a SQL Server on EC2 seems like a heavy chore when your other SQL Servers are humming smoothly on RDS and you know you don't have to nurse and babysit them. Well, the prayers have been answered, and the days of those EC2-based SQL Servers running SSIS are numbered.

AWS has announced SSIS support on RDS. For now, it's only compatible with SQL Server 2016 and 2017, which is a bit of a bummer, but still a welcome thing. SSIS is enabled through option groups in RDS, and you have to do the S3 integration, which is fairly straightforward. You can find step by step instructions here.
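For reference, adding the option from the AWS CLI looks roughly like the sketch below; the option group name is a placeholder, and the option group still has to be attached to your instance:

aws rds add-option-to-option-group \
    --option-group-name my-sqlserver-ssis-og \
    --options "OptionName=SSIS" \
    --apply-immediately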

Looking forward to migrating my SSIS-stuck EC2-based SQL Servers to RDS now.


Categories: DBA Blogs

Cloud Vanity: A Weekly Carnival of AWS, GCP and Azure - Edition 1

Thu, 2020-05-07 18:46
This is the first edition of this weekly collection about what is happening in the rapidly evolving cloud sphere. This will mainly focus on news, blogs, articles, tidbits, and views from AWS, Azure and GCP but will also include other Cloud providers from time to time. Enjoy Reading!!!




AWS:

Amazon Relational Database Service (RDS) for SQL Server now supports distributed transactions using Microsoft Distributed Transaction Coordinator (MSDTC). With MSDTC, you can run distributed transactions involving RDS for SQL Server DB instances.

Despite the Kubernetes and Serverless hypes, the vast majority of cloud workloads still happen on virtual machines. AWS offers the Amazon Elastic Compute Cloud (EC2) service, where you can launch virtual machines (AWS calls them instances).

James Beswick shows how you can import large amounts of data to DynamoDB using a serverless approach.

Amazon Lightsail provides an easy way to get started with AWS for many customers. The service balances ease of use, security, and flexibility. The Lightsail firewall now offers additional features to help customers secure their Lightsail instances.

This week AWS Security Hub launched a new security standard called AWS Foundational Security Best Practices.

GCP:

As organizations look to modernize their Windows Server applications to achieve improved scalability and smoother operations, migrating them into Windows containers has become a leading solution. And orchestrating these containers with Kubernetes has become the industry norm, just as it has with Linux.

“Keep calm and carry on.” While the words may resonate with the public, carrying on with business as usual these days is not an option for most enterprises—especially not application development and delivery teams.

During times of challenge and uncertainty, businesses across the world must think creatively and do more with less in order to maintain reliable and effective systems for customers in need.

COVID-19 is forcing us all to adapt to new realities. This is especially true for the healthcare industry. From large healthcare providers to pharmaceutical companies to small, privately run practices, nearly every customer in the healthcare industry is re-evaluating and shifting their strategies.

Protecting users and data is a big job for organizations, especially as attackers continue to attempt to access enterprise credentials and gain control of corporate machines. Google has been working hard to help protect corporate passwords with features like Password Checkup and a variety of other Chrome functionalities.

Azure:

Modern applications are increasingly built using containers, which are microservices packaged with their dependencies and configurations. For this reason, many companies are either containerizing their existing applications or creating new complex applications that are composed of multiple containers.

In the past few months, there has been a dramatic and rapid shift in the speed at which organizations of all sizes have enabled remote work amidst the global health crisis. Companies examining priorities and shifting resources with agility can help their employees stay connected from new locations and devices, allowing for business continuity essential to productivity.

Whether you're a new student, thriving startup, or the largest enterprise, you have financial constraints and you need to know what you're spending, where, and how to plan for the future. Nobody wants a surprise when it comes to the bill, and this is where Azure Cost Management + Billing comes in.

Azure Container Registry announces dedicated data endpoints, enabling tightly scoped client firewall rules to specific registries, minimizing data exfiltration concerns.

Azure Backup uses Recovery Services vault to hold customers' backup data which offers both local and geographic redundancy. To ensure high availability of backed up data, Azure Backup defaults storage settings to geo-redundancy.
Categories: DBA Blogs

Kubernetes Taints/Tolerations and Node Affinity for Dummies

Thu, 2020-04-09 21:26

In order to control which pod goes to which node in a Kubernetes cluster, the concepts of taints/tolerations and node affinity are used. With taints and tolerations, we taint a node with a specific key/value pair and effect, and then add a matching toleration in the pod manifest; any pod that doesn't have that toleration won't be scheduled on the tainted node. To ensure that the tolerated pod only goes to the tainted node, we also add a node affinity within the pod manifest.


So in other words, taints/tolerations are used to repel undesired pods, whereas node affinity is used to guide the Kubernetes scheduler to use a specific node for a specific pod.

So why do we need both taints/tolerations and node affinity? To guarantee that a pod lands on the intended node. Taints/tolerations ensure that undesired pods stay away from a node, but they don't ensure that the desired pod will actually be placed on that node. To guarantee that, we use node affinity.

Following is a complete example of 4 deployments: red, blue, green, other. We have 4 worker nodes node01, node02, node03, node04.

We have labelled the nodes with their respective colors, and we have also added a taint with the same key/value pair. Then we added a toleration in each deployment for the respective key/value pair. For example, for node01 the label is color=red and the taint is also color=red, so any pod that doesn't tolerate the red taint won't be scheduled on this node. Then we added a node affinity which ensures that red pods will only be placed on the node with label color=red. The same logic is used for the other deployments.


For Node red:

kubectl label node node01 color=red
kubectl taint node node01 color=red:NoSchedule

For Deployment red:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: red
spec:
  replicas: 1
  selector:
    matchLabels:
      color: red
  template:
    metadata:
      labels:
        color: red
    spec:
      containers:
      - name: nginx
        image: nginx
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: color
                operator: In
                values:
                - red
      tolerations:
      - key: "color"
        operator: "Equal"
        value: "red"
        effect: "NoSchedule"


For Node blue:

kubectl label node node02 color=blue
kubectl taint node node02 color=blue:NoSchedule

For Deployment blue:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: blue
spec:
  replicas: 1
  selector:
    matchLabels:
      color: blue
  template:
    metadata:
      labels:
        color: blue
    spec:
      containers:
      - name: nginx
        image: nginx
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: color
                operator: In
                values:
                - blue
      tolerations:
      - key: "color"
        operator: "Equal"
        value: "blue"
        effect: "NoSchedule"

For Node green:

kubectl label node node03 color=green
kubectl taint node node03 color=green:NoSchedule

For Deployment green:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: green
spec:
  replicas: 1
  selector:
    matchLabels:
      color: green
  template:
    metadata:
      labels:
        color: green
    spec:
      containers:
      - name: nginx
        image: nginx
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: color
                operator: In
                values:
                - green
      tolerations:
      - key: "color"
        operator: "Equal"
        value: "green"
        effect: "NoSchedule"


For Node Other:

kubectl label node node04 color=other

For Deployment other:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: other
spec:
  replicas: 2
  selector:
    matchLabels:
      color: other
  template:
    metadata:
      labels:
        color: other
    spec:
      containers:
      - name: nginx
        image: nginx
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: color
                operator: In
                values:
                - other
      tolerations:
      - key: "color"
        operator: "Equal"
        value: "other"
        effect: "NoSchedule"
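
As a quick check – assuming the manifests above are saved as red.yaml, blue.yaml, green.yaml, and other.yaml – you can apply them and confirm each pod landed on its intended node:

# apply all four deployments and list the pods with the node they were scheduled on
kubectl apply -f red.yaml -f blue.yaml -f green.yaml -f other.yaml
kubectl get pods -o wide --sort-by=.spec.nodeName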

Hope this helps!!!
Categories: DBA Blogs

BigQuery Materialized Views and Oracle Materialized Views

Wed, 2020-04-08 18:33
One of the common one-to-many replication setups in Oracle databases involves, at a high level, a master transactional database that holds the transactions, with a materialized view (MView) log created on the transaction table.


Then all the other reporting databases subscribe their respective materialized views (MViews) to this log table. These MViews remain in sync with the master log table through incremental refresh or through complete refresh. As long as it runs fine, it runs fine, but when things break, it becomes ugly, and I mean ugly. The MViews at the reporting databases could lag behind the master log due to a network issue or if the master database goes down. Doing a complete refresh is also a nightmare, and you have to do lots of purging and tinkering. The more subscribing MViews, the more hassle it is when things break.

BigQuery is Google's managed data warehousing service, which now offers materialized views. If you have managed Oracle MViews, it brings you to tears when you learn that BigQuery materialized views offer the following:

Zero maintenance: A materialized view is recomputed in background once the base table has changed. All incremental data changes from the base tables are automatically added to the materialized views. No user inputs are required.

Always fresh: A materialized view is always consistent with the base table, including BigQuery streaming tables. If a base table is modified via update, merge, partition truncation, or partition expiration, BigQuery will invalidate the impacted portions of the materialized view and fully re-read the corresponding portion of the base table. For an unpartitioned materialized view, BigQuery will invalidate the entire materialized view and re-read the entire base table. For a partitioned materialized view, BigQuery will invalidate the affected partitions of the materialized view and re-read the entire corresponding partitions from the base table. Partitions that are append-only are not invalidated and are read in delta mode. In other words, there will never be a situation when querying a materialized view results in stale data.

Smart tuning: If a query or part of a query against the source table can instead be resolved by querying the materialized view, BigQuery will rewrite (reroute) the query to use the materialized view for better performance and/or efficiency.
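Creating one is a single DDL statement. A minimal sketch, with made-up dataset, table, and column names:

-- aggregate sales per store; BigQuery keeps this in sync with mydataset.sales automatically
CREATE MATERIALIZED VIEW mydataset.sales_by_store_mv AS
SELECT store_id,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_amount
FROM mydataset.sales
GROUP BY store_id;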

In my initial testing, things work like a charm and a refresh takes at most a couple of minutes. I will be posting some tests here very soon, but suffice it to say that delegating management of MView refresh to Google is reason enough to move to BigQuery.


Categories: DBA Blogs

How To: Reduce Cloud Bill in Half for Any Cloud Provider

Tue, 2020-03-31 17:09
“But you said, we would be only paying what we use then how come my Cloud bill is way more than my on-prem cost?”
Management teams across the globe are now asking Cloud Operations teams the above question as recession looms, revenues drop, businesses grind almost to a standstill, budgets shrink, and the economic outlook becomes grimmer by the day.



The above statement about pay-as-you-go is still true of the cloud, but savings don't happen just by migrating; usage has to be configured, monitored, managed, and audited. If you start a virtual machine in the cloud, use it from 9 to 5, and leave it running until the next morning, you will be charged for 24 hours instead of just the 9-to-5 window in which you were supposedly “using” it. In the cloud, running means using, and you pay for it. The same is true for managed database services in the cloud.
The following are the top 3 things you can do right now to significantly reduce your cloud bill.
  1. Control Resource Creation:
Resource creation privileges must be given to a single small team in your organization, and that team should create resources through a pipeline after proper approvals. So revoke resource creation rights from everyone else, and let one team do it through a pipeline with a single click.
2. Tag Resources
Tag your resources; no resource in the cloud should live without the following tags:
StartTime, StopTime, LifeSpan
These tags are self-explanatory: they record when the resource should start, when it should stop, and how long it should remain in the cloud. You can either write your own serverless functions to enforce these timings, or use any third-party or cloud-native solution to implement it. (Drop me a line if you want further guidance on this.) This point is a game-changer for your bill, especially for VMs and managed databases. A tagging sketch follows after these three points.
3. Remove the Clutter
No matter how little it costs, if it's not needed or you cannot find the owner, just delete it. Orphan persistent disk volumes, ownerless compute images, dangling object stores, and all stray services must be stopped and deleted.
If followed, these 3 points can reduce your cloud bill by half or even more with any cloud provider. The cloud's honeymoon period is over, and with all this belt tightening around, it's time to save.
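As an illustration of point 2, here is a minimal AWS CLI sketch; the instance ID and tag values are hypothetical, and equivalent tagging commands exist for RDS and other services:

# attach the three lifecycle tags to an EC2 instance
aws ec2 create-tags \
    --resources i-0123456789abcdef0 \
    --tags Key=StartTime,Value=08:00 Key=StopTime,Value=17:00 Key=LifeSpan,Value=30d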
Categories: DBA Blogs

Basic DynamoDB Data Modelling

Sat, 2020-03-28 01:58



The key differentiator between a NoSQL database like AWS DynamoDB and a relational database like Oracle is the way data is modelled. In a relational model, data is normalized and de-duplicated, and relationships are established between entities. In a NoSQL database like DynamoDB, data can be duplicated and entities can be fused together. This is done to facilitate scalability, which is the hallmark of today's hyperscale, cloud-based applications.


One of the most common relationships in data modelling is one-to-many. For example, we have 2 entities: Museum and Painting. A painting can only reside in one museum, whereas a museum can have multiple paintings. This is a one-to-many relationship. In the relational modelling world, we would have 2 tables, Museum and Painting, such as:


Museum

MuseumId   MuseumName
M1         Sofía
M2         NSW Gallery
M3         Louvre

Painting

PaintingId   PaintingName            MuseumId
P1           Guernica                M1
P2           The Great Masturbator   M1
P3           Sofala                  M2
P4           Mona Lisa               M3

In the above one-to-many relationship, we have joined Museum and Painting with a primary key / foreign key relationship. So, for example, if we want to check all paintings in museum M1, the query would be:

select p.PaintingName, m.MuseumName from Painting p, Museum m where p.MuseumId=m.MuseumId and m.MuseumId='M1';

Now, joins are expensive, and they become more expensive as the data grows. In huge cloud-scale databases this hampers scalability, and at times it becomes impractical to have these joins; hence NoSQL databases like DynamoDB.

So how do you model One-to-Many relationship in a Dynamodb table when we cannot have joins and aggregations?

We know what the entities are, and we have an access pattern, so now let’s design our table.


MuseumPaintingTable

Primary Key                  Attributes
PK          SK               MuseumName    Title                   Artist
Museum#M1   Painting#P1      Sofía         Guernica                Picasso
Museum#M1   Painting#P2      Sofía         The Great Masturbator   Salvador Dali
Museum#M2   Painting#P3      NSW Gallery   Sofala                  Russel Drysdale
Museum#M3   Painting#P4      Louvre        Mona Lisa               Leonardo

If you are coming from a relational world, the above table may seem like blasphemy even though I haven’t yet used different attributes for each item to keep it simple. This is where you need to appreciate the flexibility of NoSQL databases to facilitate scalability.

In the above table, we have defined a composite primary key consisting of a partition key (PK) and a sort key (SK). A primary key in a DynamoDB table defines a unique item, where an item corresponds to a row. Remember that all access patterns are driven through the primary key, which means that the entry point for your queries must be the primary key. Every museum and painting has its own unique id, so we have combined them to generate a unique item.

Note how we have defined a one-to-many relationship in the above table using the partition key and sort key within the primary key. For the one museum Museum#M1, we have 2 paintings, Painting#P1 and Painting#P2, and both of these paintings have their own attributes, which can even differ in structure. We have fused the Painting and Museum entities together.
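
If you want to try this yourself, here is a sketch of creating such a table with the composite key from the AWS CLI (on-demand billing is chosen purely for convenience):

aws dynamodb create-table \
    --table-name MuseumPaintingTable \
    --attribute-definitions AttributeName=PK,AttributeType=S AttributeName=SK,AttributeType=S \
    --key-schema AttributeName=PK,KeyType=HASH AttributeName=SK,KeyType=RANGE \
    --billing-mode PAY_PER_REQUEST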

Now how would you write the previous relational query in DynamoDB, where we wanted to check all paintings in the Museum M1? It would be like this:

aws dynamodb query \
    --table-name MuseumPaintingTable \
    --key-condition-expression "PK = :MID" \
    --expression-attribute-values '{":MID":{"S":"Museum#M1"}}' \
    --projection-expression 'Title,MuseumName'

The above CLI command invokes the DynamoDB Query API against the table.
Categories: DBA Blogs

Why on Earth Would You Manage Your Own Kubernetes Control Plane?

Sat, 2020-03-14 21:40
The word Kubernetes comes from the Greek word for 'helmsman'. A helmsman is the person who navigates the ship, ensuring a steady and desired course. Whether the seas are turbulent or not, whether some component of the ship fails, or the load of containers on the ship needs to be re-balanced, it's the job of the helmsman, or Kubernetes, to make sure the ship maintains its desired state until it reaches its destination.





For managing software containers, then, Kubernetes seems like an apt name. These containers comprise the ready-to-run code, the code's runtime, and the code's dependencies. Containers are encapsulated within the abstraction of Pods; a Pod is a wrapper around one or more tightly coupled containers. Pods are then abstracted into ReplicaSets, which maintain copies of the pods for high availability. ReplicaSets are abstracted into Deployments, the Kubernetes object for stateless workloads. Then there is the Service object of Kubernetes, which exposes the application to the networks connected to the Kubernetes cluster. All of these objects run in the data plane section of Kubernetes.

In order to manage all the deployments, services, pods, and other objects, Kubernetes has master nodes in something called the control plane. This control plane interfaces with the data plane through the API server. Then there are controllers to ensure the desired state of nodes, pods, and other objects in the data plane, the scheduler to place new pods on pertinent nodes, the etcd key-value persistent data store to hold the configuration and state of your whole Kubernetes cluster, a cloud-provider-specific cloud controller manager, and so on.

Just step back a bit and think about why we are doing this. All of this complexity just to provision software in a steady and reliable way? Isn't developing software complicated enough without resorting to such complexity to manage it? That is why it makes sense to delegate management of the Kubernetes control plane to a managed Kubernetes service provider like Google Kubernetes Engine (GKE), AWS Elastic Kubernetes Service (EKS), Microsoft Azure Kubernetes Service (AKS), Linode Kubernetes Engine (LKE), and a few others.

Why not focus more and more on aligning software to user expectations and experience, rather than on managing the software needed to provision and maintain it? Other than Kubernetes, modern software also has to reside in some Git repository, follow an agile methodology, go through a pipeline, get packaged into a container image, and then be deployed on Kubernetes. Handing over management of the Kubernetes control plane, and some part of the data plane, to a managed service only makes sense. All the managed service providers mentioned above, and a few others, are Cloud Native Computing Foundation (CNCF) certified. They go through rigorous security and compliance checks all the time, they keep the Kubernetes versions updated and stable, and much more.

If you are running Kubernetes on-prem or in the cloud and are still managing its control plane yourself, I would be very interested to learn the reason behind it. Please share in the comments. Thanks.
Categories: DBA Blogs

What is Nomos in Anthos?

Sat, 2020-03-07 19:41
In Kubernetes, everything is code. For code, the best place to reside is a Git repository, which should act as the single source of truth. GCP Anthos enables us to extend the power of Kubernetes to multi-cloud and hybrid environments. With this power comes the added complexity of managing multiple Kubernetes clusters scattered across GKE, GKE on-prem, or other cloud providers.



Anthos Config Management solves this issue elegantly by declaring the desired state of the clusters in YAML or JSON files called configs. Anthos then ensures, top-down, that all clusters stay reconciled to the desired state. This is the same principle of configuration as code which is the soul of Kubernetes manifests.

In the context of Anthos and Google Kubernetes Engine (GKE), nomos is a CLI tool which checks the config files in your Git repository for syntax errors. The nomos tool is installed locally, and with it you can check configs for syntax problems before pushing them to your single source of truth, the Git repository.

The usage of Nomos is quite intuitive. For help use:

nomos --help

To check installation status of Nomos:

nomos status

To initialize a new Anthos Config Management repo, create an empty directory, change to it, initialize a new Git repository, then run the nomos init command:

mkdir my-repo
cd my-repo
git init
nomos init

This creates the basic directory structure of your repo, including the system/, cluster/, and namespaces/ directories.

Before you commit a config to the repo, use the nomos vet command to check the syntax and validity of the configs in your repo:

nomos vet

If syntax errors are found, the nomos vet command exits with a non-zero status.

You can read more about nomos here.
Categories: DBA Blogs

Anthos is Icing on GKE Cake

Thu, 2020-03-05 21:31
If you have ever installed and managed all of Kubernetes by yourself, then you can appreciate how relieving it is to hand over the control plane and node components to services like GKE, AKS, and EKS, or any similar managed Kubernetes service.
So far, for me, Google Kubernetes Engine (GKE) is the winner hands down when it comes to managed Kubernetes services, due to its maturity, ease of use, and, more importantly, seamless integration with the rest of the Google Cloud Platform (GCP) services. AWS and Azure are also catching up fast but still have some distance to go.

Google is one leap ahead with Kubernetes by offering the fragrant Anthos. Anthos enables you to run managed Kubernetes anywhere, including on-prem, GKE, AWS, Azure, or any other supported cloud provider. Google is very actively improving Anthos, as it clearly seems to be the next big thing in the realm of DevOps. It's heralding a new era in hybrid and multi-cloud environments, whereas cloud providers like AWS aren't much of a fan of the term multi-cloud.

While Anthos is great at blending Kubernetes into your on-prem and cloud environments and providing a consistent application management framework, it could also introduce bad practices in a big way. One feature which could be badly exploited is transferring all of your on-prem garbage to the cloud. For instance, with Anthos Migrate, you can simply convert your VMs running on-prem to containers and then move them into Kubernetes.

This is against the very spirit of lightweight, immutable, micro-service containers. This lift and shift is powerful yet dangerous, as no Kubernetes cluster can mask a heavy VM's monolithic burden. So it should be imperative to prune the VMs before migrating them as containers to GKE through Anthos.

Having said that, Anthos is a thing of beauty and is here to stay!!!!
Categories: DBA Blogs

Rename SQL Server Database in AWS RDS

Tue, 2020-03-03 15:06
In AWS RDS-based SQL Server, you cannot rename a database from SQL Server Management Studio; you will get an error about permissions. You need to use the rdsadmin procedure to accomplish that.



If you get error:

Msg 5030, Level 16, State 2, Line 1
The database could not be exclusively locked to perform the operation.

Then you need to set the database to single-user mode first, rename it, and then revert back to multi-user. Use the following steps to rename a SQL Server database in AWS RDS:

use master
ALTER DATABASE OldDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE    
exec rdsadmin.dbo.rds_modify_db_name 'OldDB','NewDB'
ALTER DATABASE NewDB SET MULTI_USER

Hope that helps.


Categories: DBA Blogs

RDS Resource in Terraform

Fri, 2020-01-17 21:19
Just marveling again how easy and clean it is to create an RDS resource in Terraform.

# Create the RDS instance
resource "aws_db_instance" "databaseterra" {
  allocated_storage    = "${var.database_size}"
  engine               = "${var.DbEngineType}"
  engine_version       = "${var.DbMajorEngineVersion}"
  instance_class       = "${var.DBInstanceClass}"
  identifier             = "testdbterra"
  username             = "${var.database_user}"
  password             = "${var.database_password}"
  db_subnet_group_name = "${aws_db_subnet_group.rdssubnetgroupterra.id}"
  parameter_group_name = "${aws_db_parameter_group.rdsparametergroupterra.id}"
  multi_az           = "${var.db_multiaz}"
  vpc_security_group_ids = ["${aws_security_group.RdsSecurityGroupterra.id}"]
  publicly_accessible    = "false"
  backup_retention_period = "2"
  license_model           = "license-included"
  apply_immediately = "true"
  tags = {
    Env = "Non-Prod"
  }
}

Just have your vars ready, and that's it.
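A minimal usage sketch, assuming the referenced variables are defined in your variables file:

terraform init
terraform plan
terraform apply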
Categories: DBA Blogs

Use Conda to Generate Requirements.txt for Docker Containers

Fri, 2019-12-27 00:01
pip is the standard Python package manager. A requirements.txt file can be generated in one environment and installed by pip in a new environment. Conda, by contrast, replicates its own installation. pip produces a list of the packages that were installed on top of the standard library to make the package you wrote work.

Following are the steps to generate a requirements.txt file to be used inside a Dockerfile for Docker containers:



Go to your project environment: conda activate <env-name>

conda list gives you the list of packages installed in the environment.

conda list -e > requirements.txt saves the package information to a requirements.txt file in your project folder.

conda env export > <env-name>.yml exports the full environment, including conda packages, to a YAML file.

Alternatively, in a pip-based environment, pip freeze > requirements.txt produces the equivalent list.

Hope that helps.
Categories: DBA Blogs

Urdu in AWS

Fri, 2019-10-04 00:24
Urdu is arguably one of the most beautiful and poetic languages on the planet. AWS Translate now supports Urdu along with 31 other languages, which is awesome.



AWS Translate is growing by leaps and bounds and has matured quite a lot over the last few months. There are now hundreds of supported translation pairs, and it's now available in all the regions.


Amazon Translate is a text translation service that uses advanced machine learning technologies to provide high-quality translation on demand. You can use Amazon Translate to translate unstructured text documents or to build applications that work in multiple languages.


Amazon Translate provides translation between a source language (the input language) and a target language (the output language). A source language-target language combination is known as a language pair.
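As a quick sketch, translating a string from English to Urdu with the AWS CLI looks like this (the text is just an example):

aws translate translate-text \
    --source-language-code en \
    --target-language-code ur \
    --text "Welcome to the blog"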


As with other AWS products, there are no contracts or minimum commitments for using Amazon Translate.



Categories: DBA Blogs

How to do a quick health check of AWS RDS database

Fri, 2019-10-04 00:16
Just because the database is on AWS RDS, it doesn't mean that it won't run slow or get stuck. So when your users complain about the slowness of your RDS database, do the following quick health check:
1- From the AWS console, in the RDS section, go to your database and then to the Logs and Events tab. From Logs, check the alert log in case of Oracle, the Error log in case of SQL Server, the postgres log for PostgreSQL, and the error log for a MySQL database. Check for any errors or warnings and proceed accordingly as per that database engine.


2- If you don't see any errors or warnings, or if you want to check further, then first check which database instance type you are using. For example, for one of my test Oracle databases, it is db.r4.4xlarge.


Go to https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.DBInstanceClass.html and check specifications of this instance type.

For instance, for db.r4.4xlarge, it is :


Instance Class    vCPU   ECU   Memory (GiB)   VPC Only   EBS Optimized   Max. Bandwidth (Mbps)   Network Performance
db.r4.4xlarge     16     53    122            Yes        Yes             3,500                   Up to 10 Gbps


So this db.r4.4xlarge has a max bandwidth (throughput) of 437.5 MB/s  (3500 Mbps/8 = 437.5 MB/s). The throughput limit is separate for read and write, which means you’ll get 437.5 MB/s for read and 437.5 MB/s for write.


3- Now go to the Monitoring tab of this RDS instance in your console and check Read Throughput and Write Throughput to see whether your instance is touching the above threshold – in this case, 437.5 MB/s. If yes, then you know that IO is the issue, and you may need to either tune the SQLs responsible or increase the instance size. (A CLI way of pulling the same metrics is sketched after this list.)


4- Similarly, from the same monitoring tab check for CPU usage, free memory available and Free storage space to make sure no other threshold is being reached.
5- Also check the Disk Queue Depth. The disk queue depth is the number of IO requests waiting to be serviced; time spent waiting in the queue is a component of latency and service time. Ideally a disk queue depth of 15 or less should be fine, but if you notice latency greater than 10 milliseconds accompanied by a high disk queue depth, that could cause performance issues.


6- Last but not least, reviewing and tuning your SQLs is the biggest optimization gain you can achieve whether your database is in RDS or not.
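
For reference, here is a hedged CLI sketch of pulling one of the same metrics from CloudWatch; the instance identifier and time window are placeholders:

# average WriteThroughput (bytes/sec) in 5-minute buckets for one day
aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name WriteThroughput \
    --dimensions Name=DBInstanceIdentifier,Value=mydbinstance \
    --start-time 2019-10-03T00:00:00Z \
    --end-time 2019-10-04T00:00:00Z \
    --period 300 \
    --statistics Average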


Hope it helps.
Categories: DBA Blogs

Cloning of RDS Instance to Another Account

Wed, 2019-08-28 21:01
Frequently, we need to refresh our development RDS-based Oracle database from production, which is in another AWS account. So we take a snapshot from production, share it with the other account, and then restore it in the target from the snapshot.

I will post the full process in a later post, but for now I'm just sharing an issue we encountered today. While trying to share a snapshot with another account, I got the following error:


Sharing snapshots encrypted with the default service key for RDS is currently not supported.


Now, this snapshot was encrypted with the default RDS key, and that is not supported. So in order to share it, we need a customer managed key, copy the snapshot with that new key, and only then can we share it. You don't have to do anything at the target, as the customer managed key becomes part of that snapshot. You can create customer managed keys in the KMS console and perhaps assign them to the IAM user you are using.
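For reference, the copy-and-share sequence looks roughly like this from the AWS CLI; the snapshot identifiers, region, key ARN, and account number are placeholders:

# copy the snapshot, re-encrypting it with a customer managed KMS key
aws rds copy-db-snapshot \
    --source-db-snapshot-identifier prod-oracle-snap \
    --target-db-snapshot-identifier prod-oracle-snap-cmk \
    --kms-key-id arn:aws:kms:us-east-1:111111111111:key/11111111-2222-3333-4444-555555555555

# share the copied snapshot with the development account
aws rds modify-db-snapshot-attribute \
    --db-snapshot-identifier prod-oracle-snap-cmk \
    --attribute-name restore \
    --values-to-add 222222222222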


I hope it helps.
Categories: DBA Blogs
