DETAILED TERADATA ARCHITECTURE AND QUERY FLOW
Continuing from the bird’s-eye view of Teradata’s architecture, we now take a closer look at each of its components.
PARSING ENGINE (PE):
The parsing engine is responsible for accepting client connections and receiving requests from users. A single parsing engine can handle a maximum of 120 sessions, and it is important to know that one Teradata node hosts many parsing engines and AMPs. Once a session is accepted by the parsing engine, the SQL requests sent on that session are received by the same parsing engine. We will walk through the flow of a user request (SQL) step by step at the end, once all the other components have been discussed.
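The 120-session limit makes connection capacity a matter of simple arithmetic. The sketch below is only an illustration of that arithmetic; the function names are made up for this example and are not part of any Teradata tool.

```python
# Session limit per parsing engine, as stated in the text above.
SESSIONS_PER_PE = 120


def max_sessions(pe_count: int) -> int:
    """Upper bound on concurrent sessions for a given number of PEs."""
    if pe_count < 0:
        raise ValueError("pe_count must be non-negative")
    return pe_count * SESSIONS_PER_PE


def pes_needed(sessions: int) -> int:
    """Smallest number of PEs able to hold `sessions` concurrent sessions."""
    if sessions < 0:
        raise ValueError("sessions must be non-negative")
    return -(-sessions // SESSIONS_PER_PE)  # ceiling division


if __name__ == "__main__":
    print(max_sessions(4))   # capacity of a hypothetical system with 4 PEs
    print(pes_needed(121))   # one session over the limit needs a second PE
```

For example, a system with 4 parsing engines could accept at most 4 x 120 = 480 sessions, and the 121st session on a single-PE system would need a second PE.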
MESSAGE PASSING LAYER (BYNET):
The Message Passing Layer, known as BYNET in Teradata, takes instructions from the parsing engine and propagates them over the network to all the AMPs. As noted in the previous article, AMPs are the workhorses, and every AMP needs to receive the instructions before the work can be completed. For fault tolerance there are two BYNETs, so that if one of them goes down, database operations continue over the other.
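The failover idea can be sketched in a few lines. This is a toy model under stated assumptions, not how BYNET is implemented: the `Link` class, the `send_to_all_amps` function, and the representation of each AMP as a plain inbox list are all hypothetical.

```python
class Link:
    """Hypothetical stand-in for one BYNET network."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up

    def broadcast(self, message, amps):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        for inbox in amps:
            inbox.append(message)  # each AMP is modelled as a plain inbox list
        return self.name


def send_to_all_amps(message, amps, links):
    """Try each link in order; the first healthy one delivers the message."""
    for link in links:
        try:
            return link.broadcast(message, amps)
        except ConnectionError:
            continue  # this link is down, fall through to the next one
    raise ConnectionError("no BYNET link available")


if __name__ == "__main__":
    amps = [[], [], []]
    links = [Link("bynet0", up=False), Link("bynet1")]
    used = send_to_all_amps("step 1", amps, links)
    print(used, amps)
```

With the first link marked as down, the message still reaches every AMP through the second link, which is the behaviour the two-BYNET design is meant to guarantee.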
ACCESS MODULE PROCESSORS (AMPs):
Access Module Processors, also known as AMPs, are the workhorses of Teradata, i.e. they carry out the main execution of user requests on the data stored in the database. AMPs are virtual processors, i.e. they are not hardware but programs running inside Teradata. It is essential to remember that all AMPs contribute to the execution of a user’s request, because the data is distributed across the AMPs. The execution time of any SQL is therefore equal to the time taken by the slowest AMP to carry out its task. We will discuss how and why an AMP can become slow much later in the course, but for now it is important to remember that an SQL request is deemed complete only when all the AMPs have finished executing it.
AMPs have threads known as AMP Worker Tasks (AWTs) that do the actual work. When no AWTs are available on a busy system, incoming work waits until an AWT becomes free. If more work keeps queuing up and the system runs out of AWTs, the system enters a state known as flow control, in which the performance of the database is impacted. The total number of AWTs depends on the number of AMPs in the configuration, and the maximum number of available AWTs is tunable through system-level settings.
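Two of the points above lend themselves to a small illustration: elapsed time is set by the slowest AMP, and work that cannot get an AWT has to wait. The sketch below is a simplified model with hypothetical function names; it does not reflect how Teradata actually schedules AWTs.

```python
def query_elapsed(amp_times):
    """A request finishes only when every AMP has finished,
    so the elapsed time is the maximum of the per-AMP times."""
    return max(amp_times)


def admit(requests, free_awts):
    """Split incoming work into (running, waiting) given the free AWT count.

    Anything beyond the number of free AWTs has to wait for one to be released.
    """
    free = max(free_awts, 0)
    return requests[:free], requests[free:]


if __name__ == "__main__":
    # Per-AMP times in seconds for one request; the third AMP is the slow one.
    print(query_elapsed([1.2, 0.9, 4.5, 1.0]))
    print(admit(["q1", "q2", "q3"], free_awts=2))
```

In this model three fast AMPs do not help if the fourth takes 4.5 seconds: the request as a whole takes 4.5 seconds.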
Every AMP has a logical disk (virtual disk) attached to it, which maps to a section of the hardware storage. Each virtual disk is managed by one particular AMP and is not shared with any other AMP, which is what enforces the shared-nothing architecture. Data inside the database is stored as data blocks at the file system level.
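A minimal sketch of the shared-nothing idea follows. The `Amp` class and the helper functions are hypothetical, and the placement rule (CRC32 of a key modulo the number of AMPs) is purely an assumption chosen for illustration, not Teradata's actual distribution scheme.

```python
import zlib


class Amp:
    """Toy AMP that owns a private virtual disk."""

    def __init__(self, amp_id):
        self.amp_id = amp_id
        self.vdisk = []  # private storage; no other AMP reads or writes it


def owning_amp(key, amps):
    """Pick the single AMP responsible for a row.

    CRC32 modulo the AMP count is an illustrative placeholder only.
    """
    return amps[zlib.crc32(key.encode("utf-8")) % len(amps)]


def store(key, row, amps):
    """Write the row to the vdisk of its owning AMP and return that AMP's id."""
    amp = owning_amp(key, amps)
    amp.vdisk.append((key, row))
    return amp.amp_id


if __name__ == "__main__":
    amps = [Amp(i) for i in range(4)]
    for k in range(100):
        store(str(k), {"value": k}, amps)
    print([len(a.vdisk) for a in amps])  # rows held by each AMP
```

Each row lands on exactly one AMP's disk, so any request that touches the whole table needs every AMP to work on its own slice.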
QUERY FLOW IN TERADATA:
Once a user connection is established and a request (SQL) is submitted to Teradata, the PE receives it and passes it through the various stages shown in the diagram:
The Lexer takes the request and breaks it down into tokens (lexemes), which are forwarded to the next phase of the parsing engine, the Syntaxer. The Syntaxer checks that the user request is syntactically correct; if it is not, an error is reported and execution ends.
If the request is syntactically correct, it is passed on to the next phase, the Resolver. The Resolver checks that the objects mentioned in the request are present in the database, i.e. it performs the semantic checking of the request. If the request is syntactically correct but references objects that are not in the database, the request fails in the Resolver. Once the request is resolved, it is passed to the next phase, called Query Rewrite (QRW). QRW rewrites the user request in a more optimized form, using mechanisms such as view folding, and passes it to the Optimizer.
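The first three stages can be modelled as a small pipeline. This is a toy sketch under heavy assumptions: it only understands requests of the form SELECT <columns> FROM <table>, the `CATALOG` set stands in for the database's list of objects, and the Query Rewrite stage is left out entirely.

```python
import re

CATALOG = {"employee", "department"}  # hypothetical set of existing tables


def lexer(sql):
    """Break the request text into tokens."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*|\*|,", sql)


def syntaxer(tokens):
    """Accept only the toy grammar: SELECT <columns> FROM <table>."""
    upper = [t.upper() for t in tokens]
    if len(tokens) < 4 or upper[0] != "SELECT" or upper[-2] != "FROM":
        raise SyntaxError("expected: SELECT <columns> FROM <table>")
    columns = [t for t in tokens[1:-2] if t != ","]
    return {"columns": columns, "table": tokens[-1]}


def resolver(tree, catalog=CATALOG):
    """Semantic check: the referenced table must exist in the catalog."""
    if tree["table"].lower() not in catalog:
        raise LookupError(f"object {tree['table']!r} does not exist")
    return tree


def parse(sql):
    """Run the request through lexer, syntaxer, and resolver in order."""
    return resolver(syntaxer(lexer(sql)))


if __name__ == "__main__":
    print(parse("SELECT name, salary FROM employee"))
```

A misspelled keyword such as SELEC is rejected by the syntaxer, while a well-formed request against a table that is not in the catalog passes the syntaxer and fails only in the resolver, mirroring the two kinds of failure described above.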
The Optimizer generates the most cost-effective plan for executing the user request and lays it out as a step-by-step plan known as an Explain Plan. An Explain Plan is the set of steps that the database will perform to execute the user’s SQL. The Dispatcher then picks the first step and passes it over the BYNET to all the AMPs. All the AMPs perform the task and send an acknowledgement to the Dispatcher, which then sends the next step. The steps are executed one by one in this way, and once the last step has been executed, the Dispatcher sends a successful-execution message to the user along with the output.
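The step-by-step handshake between the Dispatcher and the AMPs can be sketched as a simple loop. This is an illustrative model only: the `dispatch` function is hypothetical, each AMP is represented as a callable that returns True as its acknowledgement, and the plan steps are made-up strings.

```python
def dispatch(plan, amps):
    """Send each step to every AMP and wait for all acknowledgements
    before moving on to the next step."""
    log = []
    for number, step in enumerate(plan, start=1):
        acks = [amp(step) for amp in amps]  # broadcast the step to all AMPs
        if not all(acks):
            raise RuntimeError(f"step {number} failed on at least one AMP")
        log.append(f"step {number} done: {step}")
    return log


if __name__ == "__main__":
    def healthy_amp(step):
        return True  # acknowledge every step

    plan = ["lock table", "scan rows", "return result"]
    for line in dispatch(plan, [healthy_amp] * 3):
        print(line)
```

The next step is never sent until every AMP has acknowledged the current one, which is why a single slow AMP holds up the whole request.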
That covers the architecture of Teradata and the basic flow of a query through it. Please post your questions, if any.