Teradata Performance Analysis Tips
Key Points:
- Stat Collection
- Pack Disk
- Data Skew Analysis
- Lock monitoring (Locking Logger)
- Session tuning (DBS Control parameters)
1. Stat Collection:
Statistics on loaded tables are very important from a performance perspective: they help the optimizer generate accurate, faster query plans. Stale statistics can lead the optimizer to poor plans that slow down query processing, so statistics should be refreshed periodically, especially in a warehousing environment. Identify the tables that are frequently accessed and modified by inserts, updates, and deletes, and refresh their statistics after roughly every 10% of data change. Statistics can be collected at the column level or at the index level.
Syntax:
COLLECT STATISTICS ON table_name COLUMN (column_name_1, ..., column_name_n); OR
COLLECT STATISTICS ON table_name INDEX (column_name_1, ..., column_name_n);
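As a housekeeping aid, the "refresh after ~10% data change" rule above can be scripted. The following is a minimal Python sketch that generates COLLECT STATISTICS statements only for tables whose changed-row ratio exceeds the threshold; the table names, column lists, and change counts are hypothetical.

```python
# Sketch: emit COLLECT STATISTICS statements for tables whose data has
# changed by more than a threshold since the last stats refresh.
# All table/column names and counts below are made-up examples.

def stale_stats_ddl(tables, threshold=0.10):
    """Return COLLECT STATISTICS statements for tables whose changed-row
    ratio exceeds `threshold` (10% by default, per the rule of thumb above)."""
    stmts = []
    for t in tables:
        changed_ratio = t["rows_changed"] / t["row_count"]
        if changed_ratio > threshold:
            cols = ", ".join(t["stats_columns"])
            stmts.append(f"COLLECT STATISTICS ON {t['name']} COLUMN ({cols});")
    return stmts

tables = [
    {"name": "sales_fact", "row_count": 1_000_000,
     "rows_changed": 150_000, "stats_columns": ["sale_date", "store_id"]},
    {"name": "store_dim", "row_count": 500,
     "rows_changed": 10, "stats_columns": ["store_id"]},
]

for stmt in stale_stats_ddl(tables):
    print(stmt)
# sales_fact changed 15% -> statement emitted; store_dim changed 2% -> skipped
```

The generated statements can then be fed to BTEQ or any other SQL client on a schedule.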
2. Pack Disk:
PackDisk is a utility that frees up cylinder space in the database. It should be run periodically: in a warehouse environment, large volumes of inserts and updates fragment the physical storage over time. PackDisk restructures and physically reorders the data, freeing up space much like defragmentation. Teradata also runs mini CYLPACKs automatically if free cylinder space drops below the prescribed limit. Cylinder space is required for merge operations during inserts, deletes, updates, and so on.
PackDisk is run through the Ferret utility provided by Teradata, either from the Teradata Manager tool or over a telnet session on a node.
The set of commands that starts the packdisk utility is given below; a cron job can be created to schedule and run it periodically.
Commands to run the defrag & packdisk utilities:
~ferret
defrag
Y
packdisk fsp=8
Y
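For scheduling, the command sequence above can be generated by a script and piped into the console session. A minimal Python sketch, assuming Ferret's interactive prompts are answered with "Y" and that "quit" exits the session (both assumptions, not taken from the original text):

```python
def ferret_script(fsp=8, run_defrag=True):
    """Build the command/answer sequence for a scheduled Ferret session.
    The 'Y' lines answer Ferret's confirmation prompts; fsp is the
    free-space-percent argument to packdisk (8 in the example above)."""
    lines = []
    if run_defrag:
        lines += ["defrag", "Y"]
    lines += [f"packdisk fsp={fsp}", "Y", "quit"]  # 'quit' assumed to exit
    return "\n".join(lines) + "\n"

print(ferret_script())
```

The resulting text could be redirected into the node session started by a cron job, rather than typing the commands by hand.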
3. Skew Analysis:
The primary index of a Teradata table determines how its rows are distributed across the AMPs. Even data distribution is required for parallel processing: Teradata follows a shared-nothing architecture in which all AMPs work in parallel, so if data is distributed evenly, every AMP does an equal share of the work and the job finishes sooner. If, in contrast, only one or two AMPs are flooded with data (data skew), those AMPs do all the work on that job while the others sit idle, and the parallel processing power of the system goes unused.
To avoid such skew, analyze the primary indexes of the tables periodically; over time, data can accumulate on a few AMPs, which has an adverse effect on ETL as well as overall system performance.
To analyze the data distribution of a table, use the built-in hash functions provided by Teradata. The following query shows the distribution:
SELECT HASHAMP(HASHBUCKET(HASHROW(column_1, ..., column_n))) AS amp_num,
       COUNT(*)
FROM   table_name
GROUP BY 1;
This query returns the number of rows on each AMP. The same query can also be used to evaluate candidate primary indexes by predicting how their values would distribute across the AMPs.
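Once the per-AMP counts are in hand, skew can be reduced to a single number. A minimal sketch (the formula and sample counts are illustrative, not from the original text): skew factor = 1 - (average rows per AMP / maximum rows per AMP), so 0.0 means perfectly even distribution and values near 1.0 mean one AMP holds nearly everything.

```python
# Sketch: given per-AMP row counts (as returned by the HASHAMP query above),
# compute a simple skew factor. 0.0 = perfectly even; near 1.0 = heavy skew.

def skew_factor(amp_counts):
    avg = sum(amp_counts) / len(amp_counts)
    return 1 - avg / max(amp_counts)

even   = [250, 250, 250, 250]   # evenly distributed table
skewed = [10, 10, 10, 970]      # one AMP holds almost all the rows

print(round(skew_factor(even), 3))    # 0.0
print(round(skew_factor(skewed), 3))  # 0.742
```

Running this over every large table (or over candidate PI columns) gives a quick ranking of which tables most need a primary index review.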
4. Lock Monitoring:
Locking Logger is a utility that enables us to monitor locking on tables. It populates a table with entries for the locks applied during processing, which lets us analyze the regular ETL runs and find jobs that are blocked at particular times when no one is watching. By analyzing these locking situations we can modify the jobs and avoid the waiting periods they cause.
To use it, first enable Locking Logger via the DBS console window or the cnsterm subsystem. The setting does not take effect until the database is restarted.
- LockLogger: this field defines the system default for the Locking Logger. It allows the DBA to log the delays caused by database locks to help identify lock conflicts. Set the field to TRUE to enable the feature and FALSE to disable it.
After a database restart with the LockLogger flag set to TRUE, Locking Logger begins accumulating lock information in a 64 KB circular memory buffer. Depending on how frequently the system encounters lock contention this buffer will wrap, but it usually spans several days. Following a period of lock contention, run the dumplocklog utility to move the data from the memory buffer to a database table where it can be queried and analyzed.
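Once the lock log has been dumped to a table, the analysis is mostly aggregation. A minimal sketch of that step, with a completely hypothetical record layout (the real dumplocklog table has its own column names and many more fields):

```python
from collections import Counter

# Sketch: summarize dumped lock-log rows to find the tables that cause the
# most blocking. The row layout here is hypothetical, for illustration only.

lock_rows = [
    {"table": "sales_fact", "delay_secs": 42, "lock_level": "WRITE"},
    {"table": "sales_fact", "delay_secs": 15, "lock_level": "WRITE"},
    {"table": "store_dim",  "delay_secs": 3,  "lock_level": "READ"},
]

def blocking_hotspots(rows):
    """Total lock-delay seconds per table, worst offender first."""
    totals = Counter()
    for r in rows:
        totals[r["table"]] += r["delay_secs"]
    return totals.most_common()

print(blocking_hotspots(lock_rows))
# [('sales_fact', 57), ('store_dim', 3)]
```

The tables at the top of this list are the first candidates for rescheduling the jobs that touch them.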
5. Session Tuning:
Session tuning is done to run the load utilities in parallel. It requires analyzing some DBS Control parameters and tuning them to get the best parallel processing out of the load utilities. Two parameters, MaxLoadAWT and MaxLoadTasks, enable this parallel job management; a short note on each follows.
The MaxLoadAWT internal field serves two purposes:
1) Enabling a higher limit for the MaxLoadTasks field beyond the default limit of 15.
2) Specifying the AMP Worker Task (AWT) limit for concurrent FastLoad and MultiLoad jobs when a higher limit is enabled.
In effect, this field allows more FastLoad, MultiLoad, and FastExport utilities to run concurrently while controlling AWT usage, preventing excessive consumption and possible AWT exhaustion.
• The default value is zero. When MaxLoadAWT is zero, the concurrency limit operates as it did prior to V2R6.1: MaxLoadTasks specifies the concurrency limit for all three utilities (FastLoad, MultiLoad, and FastExport), and its valid range is 0 to 15.
• When MaxLoadAWT is non-zero (higher limit enabled), it specifies the maximum number of AWTs that can be used by FastLoads and MultiLoads. The maximum allowable value is 60% of the total AWTs. In this mode the valid range for MaxLoadTasks is 0 to 30.
A new FastLoad/MultiLoad job is allowed to start only if BOTH the MaxLoadTasks AND MaxLoadAWT limits have not been reached, so jobs may be rejected before the MaxLoadTasks limit is exceeded. In this mode MaxLoadTasks specifies the concurrency limit for only two utilities, FastLoad and MultiLoad; FastExport is managed differently and is no longer controlled by the MaxLoadTasks field.
A FastExport job is rejected only if the total number of active utility jobs is 60, so at least 30 FastExport jobs can run at any time, and a FastExport job may be able to run even when FastLoad and MultiLoad jobs are being rejected. When a Teradata Dynamic Workload Manager (TDWM) utility throttle rule is enabled, the MaxLoadAWT field is overridden and TDWM uses the highest allowable value, 60% of total AWTs. An update to MaxLoadAWT becomes effective as soon as the DBS Control record has been written; no DBS restart is required. Note that when the total number of AWTs (specified by the internal field MaxAMPWorkerTasks) has been modified but a DBS restart has not occurred, there may be a discrepancy between the actual number of AWTs and the DBS Control record; the system may internally reduce the effective value of MaxLoadAWT to prevent AWT exhaustion.
• AWT usage of load utilities: all load/unload utilities require and consume AWTs at different rates depending on the execution phase:
FastLoad: Phase 1 (Loading): 3 AWTs; Phase 2 (End Loading): 1 AWT
MultiLoad*: Acquisition phase (and before): 2 AWTs; Application phase (and after): 1 AWT
FastExport: all phases
*This description is for the single-target-table case, which is the most common.
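The dual-limit admission rule described above can be sketched in a few lines. This is an illustrative model, not Teradata's actual scheduler: the per-job AWT costs use the worst case at job start from the phase table above (FastLoad 3 AWTs, MultiLoad 2 AWTs), and all limit values are made-up examples.

```python
# Sketch of the dual-limit rule: a new FastLoad or MultiLoad may start only
# if BOTH MaxLoadTasks AND MaxLoadAWT have headroom. Costs and limits are
# illustrative assumptions, not actual Teradata internals.

AWT_COST = {"fastload": 3, "multiload": 2}  # worst-case AWTs at job start

def can_start(job_type, running_jobs, max_load_tasks=15, max_load_awt=0):
    """running_jobs: list of active job types ('fastload'/'multiload')."""
    if max_load_awt == 0:
        # Higher limit disabled: only the pre-V2R6.1 task count applies.
        return len(running_jobs) < max_load_tasks
    awt_in_use = sum(AWT_COST[j] for j in running_jobs)
    return (len(running_jobs) < max_load_tasks and
            awt_in_use + AWT_COST[job_type] <= max_load_awt)

running = ["fastload", "fastload", "multiload"]   # 3 jobs, 8 AWTs in use
print(can_start("fastload",  running, max_load_tasks=30, max_load_awt=10))  # False: 8+3 > 10
print(can_start("multiload", running, max_load_tasks=30, max_load_awt=10))  # True: 8+2 <= 10
```

Note how the second call succeeds while the first fails even though the task count is identical: the AWT budget, not the job count, is the binding limit here, which is exactly why jobs can be rejected before MaxLoadTasks is reached.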
The parameters explained above can be analyzed and tuned to achieve the expected performance from a Teradata system. Regular maintenance/housekeeping activities should also be in place to avoid the performance impact of physical data issues such as data skew and low cylinder space.